SlideShare une entreprise Scribd logo
1  sur  19
Télécharger pour lire hors ligne
JDG 7 & Spark
Integration
Infinispan Spark connector
tedwon
Agenda
● Brief Introduction to JBoss Data Grid 7
● Brief Introduction to Apache Spark
● Features of Infinispan Spark connector
● Demo
What is JBoss Data Grid
● Distributed cache
● In-memory NoSQL
● Key/value data store
● Built from the Infinispan community project
● In architecture, there is no master node
● Peer-to-peer membership clustering
● The most strength is extremely highly availability and scalability as in-memory
○ In my personal opinion
● Good harmony with real-time data processing framework for data services
● As a metaphor, In-memory HDFS for real-time processing
What is JBoss Data Grid 7
● JDG 7 was released at July, 2016
● JDG 7 is based on Infinispan 8.3.0 and EAP 7.0.0.GA
○ The previous version JDG 6.6.0 is based on Infinispan 6.4.0
● JDG 7 now becomes much more powerful as a competitive software product
● There are new features and enhancements
○ New GUI Admin console
○ Easy to install and configure clustering and cache
○ Easy to monitor and change configuration
○ Easy to create a new cache through admin console even in runtime
○ Provides API for integration with Apache Spark
JDG 7 - New Features and Enhancements
1. DISTRIBUTED STREAMS
2. REMOTE TASK EXECUTION
3. APACHE SPARK INTEGRATION
4. APACHE HADOOP INTEGRATION
5. NEW ADMINISTRATION CONSOLE FOR SERVER DEPLOYMENTS
6. CONTROLLED SHUTDOWN AND RESTART OF CLUSTER
7. NODE.JS (JAVASCRIPT) HOT ROD CLIENT
8. CASSANDRA CACHE STORE
9. HOT ROD C++ ENHANCEMENTS
10. HOT ROD C# ENHANCEMENTS
What is Apache Spark
● General-purpose Lightning-fast cluster computing framework
● General-purpose
○ One common data processing engine
○ Batch, SQL, Streaming, MLlib, Graph
○ One platform to rule them all
● Lightning-fast
○ RDD - Resilient Distributed Dataset
○ Read-only multiset of data items distributed over a cluster of machines
● Works with Hadoop HDFS and process that data in parallel
○ Distributed file System
● MapReduce interfaces with funtional programming style
○ No MR chaining workflow in Hadoop
What is Apache Spark
Apache Spark - One platform to rule them all
What is Apache Spark RDD
● RDD consists of mulitle data partitions over a cluster of machines
● RDD is intermediate data during a Spark data processing job
● RDD is distributed in memory of each worker node JVM
● Data processing job is transformations of RDDs
● RDDs are disappeared after finishing a job
● Not able to share RDDs between Spark jobs
What is Apache Spark RDD
Infinispan Spark connector
JDG 7’s new feautre
What is Infinispan Spark connector
● https://github.com/infinispan/infinispan-spark
● JDG 7 Document https://goo.gl/9BXp98
● RDD and DStream integration with Apache Spark 1.6
○ DStream is a continuous sequence of RDDs for real-time stream processing
● Use JDG as a data source for Spark
● Easy to read & write cache data in a Spark job
● Provides seamless funtional programming style and syntactic sugar
● Good to share RDD with other Spark jobs
Supported Configurations
Features of Infinispan Spark connector
1. Create an Spark RDD from a JDG cache data
○ Read cache data from Spark job
2. Write any key/value based RDD to a JDG cache
○ Write intermediate or final data to cache
3. Create a Spark DStream from cache-level events
○ Insert, Modify and Delete event in a cache
4. Write any key/value DStream to JDG
○ Write any DStream to cache
5. Use JDG server side filters to create a cache based RDD
○ Using Infinispan Query DSL
Prerequisite & Version compatibility
● JDK 8, Scala 2.10.4, Spark 1.6
● JBoss Data Grid 7.0.0 Server
○ Infinispan Server 8.2.4.Final
● Run JDG in Remote Client-Server mode
Demo
https://github.com/tedwon/infinispan-spark-connector-examples
Performance Considerations
● The number of Spark workers should be greater than JDG nodes
○ To take advantage of the parallelism
● Support locality with co-located in the same node as JDG and Spark worker
○ Spark worker only processes data in the local node with connector
Thank you
References
1. https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/7.0/html-single/Developer_Guide/index.
html#Integration_with_Apache_Spark
2. http://spark.apache.org/docs/1.6.2/programming-guide.html
3. https://github.com/infinispan/infinispan-spark
4. https://github.com/infinispan/infinispan
5. https://github.com/tedwon/infinispan-spark-connector-examples
6. https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/7.0/html-single/7.0.0_Release_Notes/in
dex.html#chap-New_Features_and_Enhancements
7. https://en.wikipedia.org/wiki/Apache_Spark
8. https://hub.docker.com/r/gustavonalle/infinispan-spark/

Contenu connexe

Tendances

Infinispan, transactional key value data grid and nosql database
Infinispan, transactional key value data grid and nosql databaseInfinispan, transactional key value data grid and nosql database
Infinispan, transactional key value data grid and nosql databaseAlexander Petrov
 
ClustrixDB at Samsung Cloud
ClustrixDB at Samsung CloudClustrixDB at Samsung Cloud
ClustrixDB at Samsung CloudMariaDB plc
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewPierre Baillet
 
MongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin HansonMongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin Hansonhungarianhc
 
GridFS: The Perfect Solution for Media Storage
GridFS: The Perfect Solution for Media StorageGridFS: The Perfect Solution for Media Storage
GridFS: The Perfect Solution for Media StorageMongoDB
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8MongoDB
 
Conquering Data Migration from Oracle to Postgres
Conquering Data Migration from Oracle to PostgresConquering Data Migration from Oracle to Postgres
Conquering Data Migration from Oracle to PostgresEDB
 
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
Summer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointSummer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointChristopher Dubois
 
WiredTiger Overview
WiredTiger OverviewWiredTiger Overview
WiredTiger OverviewWiredTiger
 
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applicationsFOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applicationsAshnikbiz
 
Azure document db/Cosmos DB
Azure document db/Cosmos DBAzure document db/Cosmos DB
Azure document db/Cosmos DBMohit Chhabra
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at HotstarKafkaZone
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon
 
An Overview of Apache Spark
An Overview of Apache SparkAn Overview of Apache Spark
An Overview of Apache SparkYasoda Jayaweera
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDBElieHannouch
 
MongoDB Administration 101
MongoDB Administration 101MongoDB Administration 101
MongoDB Administration 101MongoDB
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep divelucenerevolution
 

Tendances (20)

Infinispan, transactional key value data grid and nosql database
Infinispan, transactional key value data grid and nosql databaseInfinispan, transactional key value data grid and nosql database
Infinispan, transactional key value data grid and nosql database
 
ClustrixDB at Samsung Cloud
ClustrixDB at Samsung CloudClustrixDB at Samsung Cloud
ClustrixDB at Samsung Cloud
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of view
 
MongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin HansonMongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin Hanson
 
Cosmosdb graph
Cosmosdb graphCosmosdb graph
Cosmosdb graph
 
GridFS: The Perfect Solution for Media Storage
GridFS: The Perfect Solution for Media StorageGridFS: The Perfect Solution for Media Storage
GridFS: The Perfect Solution for Media Storage
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
 
Conquering Data Migration from Oracle to Postgres
Conquering Data Migration from Oracle to PostgresConquering Data Migration from Oracle to Postgres
Conquering Data Migration from Oracle to Postgres
 
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
 
Summer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointSummer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpoint
 
WiredTiger Overview
WiredTiger OverviewWiredTiger Overview
WiredTiger Overview
 
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applicationsFOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
 
Spark Core
Spark CoreSpark Core
Spark Core
 
Azure document db/Cosmos DB
Azure document db/Cosmos DBAzure document db/Cosmos DB
Azure document db/Cosmos DB
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at Hotstar
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
 
An Overview of Apache Spark
An Overview of Apache SparkAn Overview of Apache Spark
An Overview of Apache Spark
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDB
 
MongoDB Administration 101
MongoDB Administration 101MongoDB Administration 101
MongoDB Administration 101
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 

En vedette

Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with EsperTed Won
 
RHQ 공감 Seminar 6th
RHQ 공감 Seminar 6thRHQ 공감 Seminar 6th
RHQ 공감 Seminar 6thTed Won
 
JBoss Community's Application Monitoring Platform
JBoss Community's Application Monitoring PlatformJBoss Community's Application Monitoring Platform
JBoss Community's Application Monitoring PlatformTed Won
 
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기Ted Won
 
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...Ted Won
 
Nara - Personalized Web Recommendation Service Quick Review
Nara - Personalized Web Recommendation Service Quick ReviewNara - Personalized Web Recommendation Service Quick Review
Nara - Personalized Web Recommendation Service Quick ReviewTed Won
 
Real-time Big Data Analytics Practice with Unstructured Data
Real-time Big Data Analytics Practice with Unstructured DataReal-time Big Data Analytics Practice with Unstructured Data
Real-time Big Data Analytics Practice with Unstructured DataTed Won
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Cloudera, Inc.
 
Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with EsperTed Won
 
How to Avoid Problems with Lump-sum Relocation Allowances
How to Avoid Problems with Lump-sum Relocation AllowancesHow to Avoid Problems with Lump-sum Relocation Allowances
How to Avoid Problems with Lump-sum Relocation AllowancesParsifal Corporation
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingSamuel Lampa
 
지금 핫한 Real-time In-memory Stream Processing 이야기
지금 핫한 Real-time In-memory Stream Processing 이야기지금 핫한 Real-time In-memory Stream Processing 이야기
지금 핫한 Real-time In-memory Stream Processing 이야기Ted Won
 
Tools For jQuery Application Architecture (Extended Slides)
Tools For jQuery Application Architecture (Extended Slides)Tools For jQuery Application Architecture (Extended Slides)
Tools For jQuery Application Architecture (Extended Slides)Addy Osmani
 
SimplifyStreamingArchitecture
SimplifyStreamingArchitectureSimplifyStreamingArchitecture
SimplifyStreamingArchitectureMaheedhar Gunturu
 
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...Ververica
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfCharles Givre
 
Going bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data typesGoing bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data typesPawel Szulc
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiLev Brailovskiy
 

En vedette (20)

Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with Esper
 
RHQ 공감 Seminar 6th
RHQ 공감 Seminar 6thRHQ 공감 Seminar 6th
RHQ 공감 Seminar 6th
 
JBoss Community's Application Monitoring Platform
JBoss Community's Application Monitoring PlatformJBoss Community's Application Monitoring Platform
JBoss Community's Application Monitoring Platform
 
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
 
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
 
Nara - Personalized Web Recommendation Service Quick Review
Nara - Personalized Web Recommendation Service Quick ReviewNara - Personalized Web Recommendation Service Quick Review
Nara - Personalized Web Recommendation Service Quick Review
 
Real-time Big Data Analytics Practice with Unstructured Data
Real-time Big Data Analytics Practice with Unstructured DataReal-time Big Data Analytics Practice with Unstructured Data
Real-time Big Data Analytics Practice with Unstructured Data
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5
 
Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with Esper
 
How to Avoid Problems with Lump-sum Relocation Allowances
How to Avoid Problems with Lump-sum Relocation AllowancesHow to Avoid Problems with Lump-sum Relocation Allowances
How to Avoid Problems with Lump-sum Relocation Allowances
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based Programming
 
지금 핫한 Real-time In-memory Stream Processing 이야기
지금 핫한 Real-time In-memory Stream Processing 이야기지금 핫한 Real-time In-memory Stream Processing 이야기
지금 핫한 Real-time In-memory Stream Processing 이야기
 
Tools For jQuery Application Architecture (Extended Slides)
Tools For jQuery Application Architecture (Extended Slides)Tools For jQuery Application Architecture (Extended Slides)
Tools For jQuery Application Architecture (Extended Slides)
 
SimplifyStreamingArchitecture
SimplifyStreamingArchitectureSimplifyStreamingArchitecture
SimplifyStreamingArchitecture
 
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
 
Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
 
Going bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data typesGoing bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data types
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFi
 

Similaire à JDG 7 & Spark Integration - Infinispan Spark Connector

Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analyticsinoshg
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Sparkdatamantra
 
Apache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource ManagerApache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource Managerharidasnss
 
Retour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantRetour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantSwiss Data Forum Swiss Data Forum
 
Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009fschupp
 
Apache Spark e AWS Glue
Apache Spark e AWS GlueApache Spark e AWS Glue
Apache Spark e AWS GlueLaercio Serra
 
Zeppelin and spark sql demystified
Zeppelin and spark sql demystifiedZeppelin and spark sql demystified
Zeppelin and spark sql demystifiedOmid Vahdaty
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for BeginnersAnirudh
 
Putting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech AnalysticsPutting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech AnalysticsGareth Rogers
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark Juan Pedro Moreno
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015Robbie Strickland
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingPetr Zapletal
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...Databricks
 

Similaire à JDG 7 & Spark Integration - Infinispan Spark Connector (20)

Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analytics
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource ManagerApache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource Manager
 
Retour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantRetour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenant
 
Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009
 
Apache Spark e AWS Glue
Apache Spark e AWS GlueApache Spark e AWS Glue
Apache Spark e AWS Glue
 
Zeppelin and spark sql demystified
Zeppelin and spark sql demystifiedZeppelin and spark sql demystified
Zeppelin and spark sql demystified
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for Beginners
 
Spark Worshop
Spark WorshopSpark Worshop
Spark Worshop
 
Putting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech AnalysticsPutting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech Analystics
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Spark
SparkSpark
Spark
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, Streaming
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 

Plus de Ted Won

Undertow RequestBufferingHandler 소개
Undertow RequestBufferingHandler 소개Undertow RequestBufferingHandler 소개
Undertow RequestBufferingHandler 소개Ted Won
 
JBoss EAP 7 & JDG 7 최신 기술 소개
JBoss EAP 7 & JDG 7 최신 기술 소개JBoss EAP 7 & JDG 7 최신 기술 소개
JBoss EAP 7 & JDG 7 최신 기술 소개Ted Won
 
JBoss Modules Internal
JBoss Modules InternalJBoss Modules Internal
JBoss Modules InternalTed Won
 
오픈 소스 컨트리뷰션 가이드
오픈 소스 컨트리뷰션 가이드오픈 소스 컨트리뷰션 가이드
오픈 소스 컨트리뷰션 가이드Ted Won
 
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...Ted Won
 
Jenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Jenkins X - automated CI/CD solution for cloud native applications on KubernetesJenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Jenkins X - automated CI/CD solution for cloud native applications on KubernetesTed Won
 
Hawkular overview
Hawkular overviewHawkular overview
Hawkular overviewTed Won
 
Building Real-time CEP Application with Open Source Projects
Building Real-time CEP Application with Open Source Projects Building Real-time CEP Application with Open Source Projects
Building Real-time CEP Application with Open Source Projects Ted Won
 
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링Ted Won
 

Plus de Ted Won (9)

Undertow RequestBufferingHandler 소개
Undertow RequestBufferingHandler 소개Undertow RequestBufferingHandler 소개
Undertow RequestBufferingHandler 소개
 
JBoss EAP 7 & JDG 7 최신 기술 소개
JBoss EAP 7 & JDG 7 최신 기술 소개JBoss EAP 7 & JDG 7 최신 기술 소개
JBoss EAP 7 & JDG 7 최신 기술 소개
 
JBoss Modules Internal
JBoss Modules InternalJBoss Modules Internal
JBoss Modules Internal
 
오픈 소스 컨트리뷰션 가이드
오픈 소스 컨트리뷰션 가이드오픈 소스 컨트리뷰션 가이드
오픈 소스 컨트리뷰션 가이드
 
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
 
Jenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Jenkins X - automated CI/CD solution for cloud native applications on KubernetesJenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Jenkins X - automated CI/CD solution for cloud native applications on Kubernetes
 
Hawkular overview
Hawkular overviewHawkular overview
Hawkular overview
 
Building Real-time CEP Application with Open Source Projects
Building Real-time CEP Application with Open Source Projects Building Real-time CEP Application with Open Source Projects
Building Real-time CEP Application with Open Source Projects
 
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
 

Dernier

Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxRTS corp
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxRTS corp
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingShane Coughlan
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencessuser9e7c64
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 

Dernier (20)

Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 

JDG 7 & Spark Integration - Infinispan Spark Connector

  • 1. JDG 7 & Spark Integration Infinispan Spark connector tedwon
  • 2. Agenda ● Brief Introduction to JBoss Data Grid 7 ● Brief Introduction to Apache Spark ● Features of Infinispan Spark connector ● Demo
  • 3. What is JBoss Data Grid ● Distributed cache ● In-memory NoSQL ● Key/value data store ● Built from the Infinispan community project ● In architecture, there is no master node ● Peer-to-peer membership clustering ● The most strength is extremely highly availability and scalability as in-memory ○ In my personal opinion ● Good harmony with real-time data processing framework for data services ● As a metaphor, In-memory HDFS for real-time processing
  • 4. What is JBoss Data Grid 7 ● JDG 7 was released at July, 2016 ● JDG 7 is based on Infinispan 8.3.0 and EAP 7.0.0.GA ○ The previous version JDG 6.6.0 is based on Infinispan 6.4.0 ● JDG 7 now becomes much more powerful as a competitive software product ● There are new features and enhancements ○ New GUI Admin console ○ Easy to install and configure clustering and cache ○ Easy to monitor and change configuration ○ Easy to create a new cache through admin console even in runtime ○ Provides API for integration with Apache Spark
  • 5. JDG 7 - New Features and Enhancements 1. DISTRIBUTED STREAMS 2. REMOTE TASK EXECUTION 3. APACHE SPARK INTEGRATION 4. APACHE HADOOP INTEGRATION 5. NEW ADMINISTRATION CONSOLE FOR SERVER DEPLOYMENTS 6. CONTROLLED SHUTDOWN AND RESTART OF CLUSTER 7. NODE.JS (JAVASCRIPT) HOT ROD CLIENT 8. CASSANDRA CACHE STORE 9. HOT ROD C++ ENHANCEMENTS 10. HOT ROD C# ENHANCEMENTS
  • 7. ● General-purpose Lightning-fast cluster computing framework ● General-purpose ○ One common data processing engine ○ Batch, SQL, Streaming, MLlib, Graph ○ One platform to rule them all ● Lightning-fast ○ RDD - Resilient Distributed Dataset ○ Read-only multiset of data items distributed over a cluster of machines ● Works with Hadoop HDFS and process that data in parallel ○ Distributed file System ● MapReduce interfaces with funtional programming style ○ No MR chaining workflow in Hadoop What is Apache Spark
  • 8. Apache Spark - One platform to rule them all
  • 9. What is Apache Spark RDD ● RDD consists of mulitle data partitions over a cluster of machines ● RDD is intermediate data during a Spark data processing job ● RDD is distributed in memory of each worker node JVM ● Data processing job is transformations of RDDs ● RDDs are disappeared after finishing a job ● Not able to share RDDs between Spark jobs
  • 10. What is Apache Spark RDD
  • 11. Infinispan Spark connector JDG 7’s new feautre
  • 12. What is Infinispan Spark connector ● https://github.com/infinispan/infinispan-spark ● JDG 7 Document https://goo.gl/9BXp98 ● RDD and DStream integration with Apache Spark 1.6 ○ DStream is a continuous sequence of RDDs for real-time stream processing ● Use JDG as a data source for Spark ● Easy to read & write cache data in a Spark job ● Provides seamless funtional programming style and syntactic sugar ● Good to share RDD with other Spark jobs
  • 14. Features of Infinispan Spark connector 1. Create an Spark RDD from a JDG cache data ○ Read cache data from Spark job 2. Write any key/value based RDD to a JDG cache ○ Write intermediate or final data to cache 3. Create a Spark DStream from cache-level events ○ Insert, Modify and Delete event in a cache 4. Write any key/value DStream to JDG ○ Write any DStream to cache 5. Use JDG server side filters to create a cache based RDD ○ Using Infinispan Query DSL
  • 15. Prerequisite & Version compatibility ● JDK 8, Scala 2.10.4, Spark 1.6 ● JBoss Data Grid 7.0.0 Server ○ Infinispan Server 8.2.4.Final ● Run JDG in Remote Client-Server mode
  • 17. Performance Considerations ● The number of Spark workers should be greater than JDG nodes ○ To take advantage of the parallelism ● Support locality with co-located in the same node as JDG and Spark worker ○ Spark worker only processes data in the local node with connector
  • 19. References 1. https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/7.0/html-single/Developer_Guide/index. html#Integration_with_Apache_Spark 2. http://spark.apache.org/docs/1.6.2/programming-guide.html 3. https://github.com/infinispan/infinispan-spark 4. https://github.com/infinispan/infinispan 5. https://github.com/tedwon/infinispan-spark-connector-examples 6. https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/7.0/html-single/7.0.0_Release_Notes/in dex.html#chap-New_Features_and_Enhancements 7. https://en.wikipedia.org/wiki/Apache_Spark 8. https://hub.docker.com/r/gustavonalle/infinispan-spark/