Hadoop Ecosystem and Low Latency Streaming Architecture
InSemble Inc.
http://www.insemble.com
Agenda
1. What is Big Data and why is it relevant?
2. Hadoop Ecosystem
3. Reference Architecture for Low Latency Streaming
4. Flume, Kafka and Storm
5. Demo
Big Data Definitions
• Wikipedia defines it as “Data Sets with sizes beyond the ability of
commonly used software tools to capture, curate, manage and process
data within a tolerable elapsed time”
• Gartner defines it as data with the following characteristics:
– High Velocity
– High Variety
– High Volume
• Another definition: “Big Data is a large volume of unstructured data that cannot be handled by traditional database management systems.”
Why It Is a Game Changer
• Schema on Read
– Data is interpreted at processing time
– Keys and values are not intrinsic properties of the data; they are chosen by the person analyzing it
• Move code to data
– With traditional systems, we bring data to the code, and I/O becomes a bottleneck
– With conventional distributed systems, we have to handle checkpointing and recovery ourselves
• More data beats better algorithms
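As a minimal illustration of schema on read (the record layout and function names below are hypothetical, not from the talk), raw lines are stored untouched, and each analysis parses them only at read time and picks its own key:

```python
from collections import Counter

# Raw records are stored exactly as they arrived; no schema is enforced at write time.
# The layout (timestamp, user, channel, amount) is a hypothetical example.
RAW_LINES = [
    "2014-05-01T10:00:00,alice,web,19.99",
    "2014-05-01T10:05:00,bob,mobile,5.00",
    "2014-05-01T11:00:00,alice,mobile,7.50",
]

def parse(line):
    """Interpret a raw line only when it is read (schema on read)."""
    ts, user, channel, amount = line.split(",")
    return {"ts": ts, "user": user, "channel": channel, "amount": float(amount)}

def totals_by(key, lines):
    """Sum amounts grouped by whichever field the analyst chooses as the key."""
    totals = Counter()
    for record in map(parse, lines):
        totals[record[key]] += record["amount"]
    return dict(totals)

# The same stored data answers two different questions with two different keys.
by_channel = totals_by("channel", RAW_LINES)
by_user = totals_by("user", RAW_LINES)
```

Nothing about the stored lines had to change to support the second question; only the read-time interpretation did.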
Enterprise Relevance
• Missed Opportunities
– Channels
– Data that is analyzed
• The constraint was high cost
– Storage
– Processing
• Future-proof your business
– Schema on Read
– Access patterns are less relevant up front
– Not just future-proofing your architecture
Hadoop Ecosystem
Source: Apache Hadoop Documentation
Hadoop 2 with YARN
Source: Hadoop In Practice by Alex Holmes
Big Data Journey
[Figure: business value (y-axis) over the journey (x-axis); effects progress from GOOD to GREAT]
• Aware of Benefits
– Scout for opportunities
• Execute
– Pilot project
• Expand
– Multiple use cases
• Managed
– Governance model
• Optimized
– Core competency
Capabilities shown along the journey:
➢ Real-time insight from all channels
➢ IT is a key differentiator for your business
➢ Perfect alignment of business and IT
➢ Ad hoc data exploration
➢ Batch, interactive, and real-time use cases
➢ Predictive analytics, machine learning
➢ Consolidated analytics
➢ ETL
➢ Time constraints
➢ Security standards defined
➢ Governance standards defined
➢ Integrated with the enterprise
➢ Evaluate business benefits
➢ Understand the ecosystem
➢ Identify a platform
Real-Time Stream Processing Architecture with Hadoop
Flume Architecture
• Distributed system for collecting and aggregating data from multiple sources into a centralized data store
• An agent is a JVM process that hosts the Flume components (source, channel, sink)
• The channel stores each event until a sink takes it
• Many types of Flume sources are available
• Sources and sinks are decoupled through the channel
Consolidation Architecture
Multiplexing Architecture
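The agent model above, where a channel buffers events between a source and a sink, and the multiplexing variant, where a selector routes each event to a channel based on a header value, can be illustrated with a toy in-process Python simulation. The class names and the `type` header are hypothetical; real Flume agents are configured declaratively (including the multiplexing channel selector) rather than coded like this.

```python
import queue

class Channel:
    """Buffers events until a sink takes them, decoupling source from sink."""
    def __init__(self, capacity=100):
        self._q = queue.Queue(maxsize=capacity)

    def put(self, event):
        self._q.put(event)

    def take(self):
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

class MultiplexingSelector:
    """Routes each event to a channel chosen by the value of one header."""
    def __init__(self, header, mapping, default):
        self.header, self.mapping, self.default = header, mapping, default

    def select(self, event):
        value = event["headers"].get(self.header)
        return self.mapping.get(value, self.default)

class Sink:
    """Drains a channel into a destination (here, just a list)."""
    def __init__(self, channel):
        self.channel, self.delivered = channel, []

    def process(self):
        while (event := self.channel.take()) is not None:
            self.delivered.append(event["body"])

# One "agent": the source hands events to the selector, which fills two channels.
tweets, logs = Channel(), Channel()
selector = MultiplexingSelector("type", {"tweet": tweets}, default=logs)
for event in [
    {"headers": {"type": "tweet"}, "body": "#hadoop rocks"},
    {"headers": {"type": "log"}, "body": "GET /index.html"},
    {"headers": {}, "body": "untyped event"},  # no header: goes to the default channel
]:
    selector.select(event).put(event)

tweet_sink, log_sink = Sink(tweets), Sink(logs)
tweet_sink.process()
log_sink.process()
```

Because each sink only talks to its channel, a slow or restarted sink does not block the source until the channel fills up.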
Kafka Introduction
• Distributed, partitioned and replicated messaging system
• Kafka brokers run as a cluster
• Producers and consumers can be written in many languages
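Topic
• A topic is divided into partitions; each partition is an ordered, immutable sequence of messages identified by sequential numbers (offsets)
• Messages are retained for a configurable period of time, whether or not they have been consumed
• Each consumer controls its own position (“offset”) in the log
• Each partition is replicated and has one “leader” and zero or more “followers”; all reads and writes go through the leader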
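A minimal Python model of a single partition illustrates these properties: each appended message gets the next sequential offset, messages expire by age whether or not anyone has read them, and each consumer tracks its own offset. It is a toy in-memory sketch with hypothetical class names and explicit integer timestamps, not Kafka's client API.

```python
class PartitionLog:
    """Append-only, ordered log for one partition (toy model, not Kafka's API)."""
    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.entries = []      # list of (offset, timestamp, message)
        self.next_offset = 0

    def append(self, message, timestamp):
        offset = self.next_offset
        self.entries.append((offset, timestamp, message))
        self.next_offset += 1
        return offset

    def expire(self, now):
        """Drop messages older than the retention period, consumed or not."""
        cutoff = now - self.retention_seconds
        self.entries = [e for e in self.entries if e[1] >= cutoff]

    def read(self, from_offset, max_messages=10):
        """Return (offset, message) pairs starting at the consumer-supplied offset."""
        return [(o, m) for o, _, m in self.entries if o >= from_offset][:max_messages]

class Consumer:
    """The consumer, not the log, remembers how far it has read."""
    def __init__(self, log):
        self.log, self.offset = log, 0

    def poll(self):
        batch = self.log.read(self.offset)
        if batch:
            self.offset = batch[-1][0] + 1
        return [m for _, m in batch]

log = PartitionLog(retention_seconds=60)
for t, msg in enumerate(["a", "b", "c"]):
    log.append(msg, timestamp=t * 40)   # timestamps 0, 40, 80

fast, slow = Consumer(log), Consumer(log)
first = fast.poll()        # fast reads everything currently in the log
log.expire(now=100)        # "a" (timestamp 0) ages out, even though slow never read it
late = slow.poll()
slow.offset = 2            # a consumer can rewind or skip by resetting its offset
replay = slow.poll()
```

Rewinding is just resetting an integer, which is why replaying retained data is cheap in this design.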
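Producers and Consumers
• The producer controls which partition each message goes to (e.g., round-robin, or by a key-based partition function)
• Supports both queuing and pub/sub
– Through a single abstraction called the consumer group
• Ordering is guaranteed only within a partition
– A subscriber sees a partition's messages in order because only one consumer in its group reads that partition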
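Producer-side partitioning and consumer-group semantics can be illustrated with a small sketch. It uses CRC32 of the key modulo the partition count as a stand-in partitioner (Kafka's own default partitioner uses a different hash) and a simple round-robin assignment of partitions to the members of a group; all function names are hypothetical.

```python
import zlib
from collections import defaultdict

def partition_for(key, num_partitions):
    """Keyed messages: the same key always maps to the same partition (stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def assign(partitions, members):
    """Round-robin partitions across the consumers of ONE group; each partition gets one owner."""
    owners = defaultdict(list)
    for i, p in enumerate(sorted(partitions)):
        owners[members[i % len(members)]].append(p)
    return dict(owners)

NUM_PARTITIONS = 4
messages = [("user-1", "login"), ("user-2", "click"), ("user-1", "logout")]

# Producer side: route by key, so all events for one key stay in order in one partition.
log = defaultdict(list)
for key, value in messages:
    log[partition_for(key, NUM_PARTITIONS)].append((key, value))

# Queuing: two consumers in the same group split the partitions between them.
queue_style = assign(range(NUM_PARTITIONS), ["c1", "c2"])

# Pub/sub: each group is assigned independently, so every group sees every partition.
pubsub_style = {g: assign(range(NUM_PARTITIONS), [g + "-c1"]) for g in ["analytics", "archive"]}
```

The same mechanism yields both models: one group with many members behaves like a queue, and many groups with one member each behave like pub/sub.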
Storm Introduction
• Distributed real-time computation system
– Processes unbounded streams of data
– Can be used with multiple programming languages
– Scalable and fault-tolerant; guarantees that data will be processed (at least once, when tuples are acked)
• Use Cases
– Real-time analytics, online machine learning
– Continuous computation
– Distributed RPC
– ETL
• Concepts
– Topology
– Spouts
– Bolts
Concepts
• Storm Cluster
– Master node (Nimbus)
• Distributes code
• Assigns tasks to machines
• Monitors for failures
– Worker nodes (Supervisors)
• Start and stop worker processes
• Each worker process executes a subset of a topology
– ZooKeeper
• Coordinates between Nimbus and the Supervisors
• Nimbus and the Supervisors are fail-fast and stateless
• State is kept in ZooKeeper or on local disk
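The division of labor above can be sketched as a toy scheduler: a stateless "Nimbus" function reads supervisor heartbeats from a shared store, spreads tasks round-robin over the live supervisors, and writes the assignment back. This is not Storm's actual scheduler or heartbeat protocol; a plain dict stands in for ZooKeeper, and all names are hypothetical.

```python
def schedule(tasks, supervisors, zk):
    """Toy Nimbus: assign tasks round-robin to live supervisors and record it in `zk`.

    Nimbus keeps no state of its own; it reads heartbeats from and writes the
    assignment to `zk` (a dict standing in for ZooKeeper), so a restarted
    Nimbus can simply run again and reach the same result.
    """
    live = sorted(s for s in supervisors if zk["heartbeats"].get(s, False))
    if not live:
        raise RuntimeError("no live supervisors")
    assignment = {task: live[i % len(live)] for i, task in enumerate(tasks)}
    zk["assignment"] = assignment
    return assignment

SUPERVISORS = ["sup-a", "sup-b", "sup-c"]
zk = {"heartbeats": {s: True for s in SUPERVISORS}}
tasks = ["spout-0", "bolt-0", "bolt-1", "bolt-2"]
before = schedule(tasks, SUPERVISORS, zk)

# sup-b stops heartbeating; the next scheduling pass moves its tasks elsewhere.
zk["heartbeats"]["sup-b"] = False
after = schedule(tasks, SUPERVISORS, zk)
```

Because the only state lives in the shared store, either daemon can crash and restart without losing track of what should be running.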
Details
• Stream
– Unbounded sequence of tuples
• Spout (user-written logic)
– Source of a stream; emits tuples
• Bolt (user-written logic)
– Consumes streams, processes them, and may emit new tuples
• Topology
– DAG of spouts and bolts
– Submitted to a Storm cluster to run
– Each node runs as parallel tasks; the degree of parallelism is configurable
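The spout/bolt/topology vocabulary maps onto a simple dataflow. The single-process Python sketch below is not Storm's API (a real topology is typically built in Java with `TopologyBuilder`, or through Storm's multi-language protocol); the class and method names are hypothetical, and parallelism is omitted.

```python
from collections import Counter

class SentenceSpout:
    """Spout: the source of a stream; emits one tuple per sentence."""
    def __init__(self, sentences):
        self.sentences = sentences

    def emit_all(self):
        for s in self.sentences:
            yield (s,)

class SplitBolt:
    """Bolt: consumes sentence tuples and emits one (word,) tuple per word."""
    def process(self, tup):
        for word in tup[0].lower().split():
            yield (word,)

class CountBolt:
    """Bolt: keeps a running count per word and emits (word, count)."""
    def __init__(self):
        self.counts = Counter()

    def process(self, tup):
        word = tup[0]
        self.counts[word] += 1
        yield (word, self.counts[word])

def run_topology(spout, bolts):
    """Push every spout tuple through a linear chain of bolts (a DAG with one path)."""
    outputs = []
    for tup in spout.emit_all():
        stream = [tup]
        for bolt in bolts:
            stream = [out for t in stream for out in bolt.process(t)]
        outputs.extend(stream)
    return outputs

counter = CountBolt()
results = run_topology(SentenceSpout(["Storm is fast", "storm scales"]), [SplitBolt(), counter])
```

In Storm, each bolt in this chain would run as several parallel tasks on different workers, which is exactly what stream groupings have to account for.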
Stream groupings
• Tell a topology how to send tuples between two components
• Because each component runs as many parallel tasks, a grouping controls which task receives each tuple (e.g., shuffle grouping, fields grouping)
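Two common groupings can be contrasted in a few lines: shuffle grouping picks a random target task, while fields grouping hashes a chosen field so equal values always reach the same task. This is a hedged sketch, not Storm code; CRC32 is a stand-in for whatever hash Storm uses internally, and the function names are hypothetical.

```python
import random
import zlib

def shuffle_grouping(num_tasks, rng):
    """Shuffle grouping: pick a random target task, spreading load evenly on average."""
    return rng.randrange(num_tasks)

def fields_grouping(tup, field_index, num_tasks):
    """Fields grouping: hash the chosen field so equal values always reach the same task."""
    key = str(tup[field_index]).encode("utf-8")
    return zlib.crc32(key) % num_tasks

NUM_COUNT_TASKS = 3
words = [("storm",), ("kafka",), ("storm",), ("flume",), ("storm",)]

# Fields grouping on the word: every "storm" tuple lands on one task, so that
# task's local counter holds the complete count for "storm".
by_field = [fields_grouping(t, 0, NUM_COUNT_TASKS) for t in words]

# Shuffle grouping: the same word may be split across tasks (fine for stateless work).
rng = random.Random(42)
by_shuffle = [shuffle_grouping(NUM_COUNT_TASKS, rng) for _ in words]
```

A counting bolt therefore needs fields grouping, while a stateless parsing bolt can use shuffle grouping for better load balance.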
Why Use Twitter as a Data Source
Demo - Twitter TopN Trending Topic
• Method 1: Flume with an interceptor
• Method 2: Storm with a custom Twitter spout
• Method 3: Flume + Kafka + Storm
Demo - Twitter TopN Trending Topic
• Use the Flume Twitter source to ingest data and publish events to a Kafka topic
• Use Kafka as the messaging backbone
• Use Storm as a real-time event processing system to calculate the top-N trending topics
• Use Redis to store the top-N results
• Use Node.js/jQuery for visualization
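One common way to compute top-N trending topics is a rolling count over a sliding window of time slots followed by a ranking. The single-process Python sketch below illustrates that idea; it is not the demo's actual bolt code, and the class name, slot scheme, and alphabetical tie-breaking are assumptions.

```python
from collections import Counter, deque

class RollingTopN:
    """Counts hashtags over the last `window_slots` time slots and reports the top N.

    Each slot holds the counts for one interval; when the window slides, the oldest
    slot's counts are subtracted, so the totals cover only the recent window.
    """
    def __init__(self, window_slots, n):
        self.slots = deque([Counter()], maxlen=window_slots)
        self.totals = Counter()
        self.n = n

    def add(self, hashtag):
        self.slots[-1][hashtag] += 1
        self.totals[hashtag] += 1

    def advance(self):
        """Start a new slot; un-count the oldest slot if the window is already full."""
        if len(self.slots) == self.slots.maxlen:
            self.totals -= self.slots[0]   # Counter subtraction drops zero counts
        self.slots.append(Counter())       # a full deque discards its oldest slot

    def top(self):
        # Descending count, then alphabetical, so ties are deterministic.
        ranked = sorted(self.totals.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[: self.n]

tracker = RollingTopN(window_slots=2, n=2)
for tag in ["hadoop", "hadoop", "storm"]:
    tracker.add(tag)
tracker.advance()
for tag in ["storm", "kafka"]:
    tracker.add(tag)
early = tracker.top()

tracker.advance()          # the first slot falls out of the window
tracker.add("kafka")
late = tracker.top()
```

In the demo architecture, a ranked list like this would then be written to Redis for the Node.js front end; that step is omitted here.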
Flow Chart
Demo: Start Redis Server
Demo: Start Node.js server
Demo: Start Storm
Demo: Start Flume Agent
Demo: Storm Console Output
Demo: Trending Result
Flume Agent — Source
Flume Agent — Channel
Flume Agent — Sink
Storm Topology Design
Submit Topology to Storm Production Cluster
Submit Topology to Test Cluster
ParseTweetBolt Code
ParseTweetBolt Code
ParseTweetBolt Code
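The ParseTweetBolt source appears only as images in the original slides, so its exact logic is not recoverable here. As a hedged sketch of what such a bolt might do, the function below extracts hashtags from one tweet, assuming the message carries Twitter's v1.1 JSON, where hashtags appear under `entities.hashtags[].text`. The function name and the lowercasing are assumptions.

```python
import json

def parse_tweet(raw):
    """Hypothetical ParseTweetBolt logic: return lowercase hashtags from one tweet.

    Assumes Twitter's v1.1 JSON layout (entities.hashtags[].text). Malformed input
    or tweets without hashtags yield an empty list instead of failing the bolt.
    """
    try:
        tweet = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []
    if not isinstance(tweet, dict):
        return []
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return [h["text"].lower() for h in hashtags if isinstance(h, dict) and "text" in h]

sample = json.dumps({
    "text": "Streaming with #Storm and #Kafka",
    "entities": {"hashtags": [{"text": "Storm", "indices": [15, 21]},
                              {"text": "Kafka", "indices": [26, 32]}]},
})
tags = parse_tweet(sample)
```

Each returned hashtag would be emitted as its own tuple, typically with fields grouping on the hashtag toward the counting bolt.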
Questions?


Vijay Mandava: vijay@insemble.com
Lan Jiang: lan@insemble.com / @Lan_Jiang



Contenu connexe

Tendances

Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Christopher Curtin
 
Apache pulsar - storage architecture
Apache pulsar - storage architectureApache pulsar - storage architecture
Apache pulsar - storage architectureMatteo Merli
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...confluent
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafkaemreakis
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Introduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed StorageIntroduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed StorageStreamlio
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streamingdatamantra
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free FridayOtávio Carvalho
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scalePulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scaleMatteo Merli
 
Apache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanApache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanStreamNative
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Gwen (Chen) Shapira
 
Build a custom metrics on aws cloud
Build a custom metrics on aws cloudBuild a custom metrics on aws cloud
Build a custom metrics on aws cloudAhmad karawash
 

Tendances (20)

Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
 
Apache pulsar - storage architecture
Apache pulsar - storage architectureApache pulsar - storage architecture
Apache pulsar - storage architecture
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Introduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed StorageIntroduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed Storage
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
 
Message queues
Message queuesMessage queues
Message queues
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scalePulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scale
 
Kafka aws
Kafka awsKafka aws
Kafka aws
 
Apache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanApache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! Japan
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
 
Build a custom metrics on aws cloud
Build a custom metrics on aws cloudBuild a custom metrics on aws cloud
Build a custom metrics on aws cloud
 

En vedette

Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...inside-BigData.com
 
Hssc i objective workbook
Hssc i objective workbookHssc i objective workbook
Hssc i objective workbookEngin Basturk
 
Iman kepada Malaikat
Iman kepada MalaikatIman kepada Malaikat
Iman kepada MalaikatNafika E.R.C
 
hivve.me - Collaborative messeneger
hivve.me - Collaborative messeneger hivve.me - Collaborative messeneger
hivve.me - Collaborative messeneger hivve
 
Public Sector Show - Speakers Presentation
Public Sector Show  - Speakers PresentationPublic Sector Show  - Speakers Presentation
Public Sector Show - Speakers Presentationacademiesshow
 
hivve.me - The first collaborative learning messenger
hivve.me - The first collaborative learning messengerhivve.me - The first collaborative learning messenger
hivve.me - The first collaborative learning messengerhivve
 
hivve.me Project Based Learning Messenger
hivve.me  Project Based Learning Messengerhivve.me  Project Based Learning Messenger
hivve.me Project Based Learning Messengerhivve
 
The Academies Show Birmingham 2014 - Session on Pupil Premium
The Academies Show Birmingham 2014 - Session on Pupil PremiumThe Academies Show Birmingham 2014 - Session on Pupil Premium
The Academies Show Birmingham 2014 - Session on Pupil Premiumacademiesshow
 
JessupJamesBIAComprehensiveAssignmentFINAL
JessupJamesBIAComprehensiveAssignmentFINALJessupJamesBIAComprehensiveAssignmentFINAL
JessupJamesBIAComprehensiveAssignmentFINALJames Jessup
 
VCR Presentation Jessup
VCR Presentation JessupVCR Presentation Jessup
VCR Presentation JessupJames Jessup
 

En vedette (20)

Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
 
NEGOSIASI
NEGOSIASINEGOSIASI
NEGOSIASI
 
Bunga
BungaBunga
Bunga
 
Hssc i objective workbook
Hssc i objective workbookHssc i objective workbook
Hssc i objective workbook
 
Iman kepada Malaikat
Iman kepada MalaikatIman kepada Malaikat
Iman kepada Malaikat
 
hivve.me - Collaborative messeneger
hivve.me - Collaborative messeneger hivve.me - Collaborative messeneger
hivve.me - Collaborative messeneger
 
Pharmacy slide share
Pharmacy slide sharePharmacy slide share
Pharmacy slide share
 
MATT CV ROEVIN
MATT CV ROEVINMATT CV ROEVIN
MATT CV ROEVIN
 
Public Sector Show - Speakers Presentation
Public Sector Show  - Speakers PresentationPublic Sector Show  - Speakers Presentation
Public Sector Show - Speakers Presentation
 
Luxury Wedding Venues in MA
Luxury Wedding Venues in MALuxury Wedding Venues in MA
Luxury Wedding Venues in MA
 
hivve.me - The first collaborative learning messenger
hivve.me - The first collaborative learning messengerhivve.me - The first collaborative learning messenger
hivve.me - The first collaborative learning messenger
 
hivve.me Project Based Learning Messenger
hivve.me  Project Based Learning Messengerhivve.me  Project Based Learning Messenger
hivve.me Project Based Learning Messenger
 
ENFERMERÍA
ENFERMERÍAENFERMERÍA
ENFERMERÍA
 
ankita cv final (2)
ankita cv final (2)ankita cv final (2)
ankita cv final (2)
 
Bunga
BungaBunga
Bunga
 
The Academies Show Birmingham 2014 - Session on Pupil Premium
The Academies Show Birmingham 2014 - Session on Pupil PremiumThe Academies Show Birmingham 2014 - Session on Pupil Premium
The Academies Show Birmingham 2014 - Session on Pupil Premium
 
Q distance
Q distanceQ distance
Q distance
 
JessupJamesBIAComprehensiveAssignmentFINAL
JessupJamesBIAComprehensiveAssignmentFINALJessupJamesBIAComprehensiveAssignmentFINAL
JessupJamesBIAComprehensiveAssignmentFINAL
 
Untitled Presentation
Untitled PresentationUntitled Presentation
Untitled Presentation
 
VCR Presentation Jessup
VCR Presentation JessupVCR Presentation Jessup
VCR Presentation Jessup
 

Similaire à Hadoop Ecosystem and Low Latency Streaming Architecture

HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.pptvijayapraba1
 
Crossing Analytics Systems: Case for Integrated Provenance in Data Lakes
Crossing Analytics Systems: Case for Integrated Provenance in Data LakesCrossing Analytics Systems: Case for Integrated Provenance in Data Lakes
Crossing Analytics Systems: Case for Integrated Provenance in Data LakesIsuru Suriarachchi
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesDavid Martínez Rego
 
Apache flume - an Introduction
Apache flume - an IntroductionApache flume - an Introduction
Apache flume - an IntroductionErik Schmiegelow
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...ssuserd3a367
 
Data Streaming For Big Data
Data Streaming For Big DataData Streaming For Big Data
Data Streaming For Big DataSeval Çapraz
 
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzDatabricks
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Bryan Bende
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013StampedeCon
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big DataJoe Alex
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming ArchitecturesCloudera, Inc.
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemInSemble
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindAvere Systems
 
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Spark Summit
 

Similaire à Hadoop Ecosystem and Low Latency Streaming Architecture (20)

HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
Crossing Analytics Systems: Case for Integrated Provenance in Data Lakes
Crossing Analytics Systems: Case for Integrated Provenance in Data LakesCrossing Analytics Systems: Case for Integrated Provenance in Data Lakes
Crossing Analytics Systems: Case for Integrated Provenance in Data Lakes
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Apache flume - an Introduction
Apache flume - an IntroductionApache flume - an Introduction
Apache flume - an Introduction
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Hadoop
HadoopHadoop
Hadoop
 
Data Streaming For Big Data
Data Streaming For Big DataData Streaming For Big Data
Data Streaming For Big Data
 
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
 
Algorithmic Trading
Algorithmic TradingAlgorithmic Trading
Algorithmic Trading
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
 
Kafka & Hadoop in Rakuten
Kafka & Hadoop in RakutenKafka & Hadoop in Rakuten
Kafka & Hadoop in Rakuten
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming Architectures
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
 
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
 

Dernier

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 

Dernier (20)

Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand

Hadoop Ecosystem and Low Latency Streaming Architecture

  • 1. Hadoop Ecosystem and Low Latency Streaming Architecture InSemble Inc. http://www.insemble.com
  • 2. Agenda – (1) What is Big Data and why is it relevant? – (2) Hadoop Ecosystem – (3) Reference Architecture for Low Latency Streaming – (4) Flume, Kafka and Storm – (5) Demo
  • 3. Big Data Definitions • Wikipedia defines it as “data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage and process data within a tolerable elapsed time” • Gartner characterizes it as data with the following properties – High Velocity – High Variety – High Volume • Another definition: “Big Data is large-volume, unstructured data that cannot be handled by traditional database management systems”
  • 4. Why It Is a Game Changer • Schema on Read – Data is interpreted at processing time – Keys and values are not intrinsic properties of the data; they are chosen by the person analyzing it • Move Code to Data – With traditional systems, we bring data to the code, and I/O becomes the bottleneck – With traditional distributed systems, we have to handle checkpointing and recovery ourselves • More data beats better algorithms
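A minimal Python sketch of schema on read, assuming a hypothetical comma-separated log layout (timestamp, user, channel, amount). The raw lines are stored untouched, and each analysis chooses its own key and value only when it reads the data. The function name `read_with_schema` and the sample records are illustrative, not taken from any Hadoop API.

```python
from collections import Counter

# Raw records are stored exactly as they arrived; no schema is applied on write.
# The layout (timestamp, user, channel, amount) is a hypothetical example.
RAW_LOG = [
    "2014-05-01T10:00:00,alice,web,30",
    "2014-05-01T10:05:00,bob,mobile,12",
    "2014-05-01T11:00:00,alice,mobile,8",
]

COLUMNS = ["timestamp", "user", "channel", "amount"]


def read_with_schema(lines, key_field, value_field):
    """Apply a schema at read time: pick the key and value columns for this analysis."""
    totals = Counter()
    for line in lines:
        record = dict(zip(COLUMNS, line.split(",")))
        totals[record[key_field]] += int(record[value_field])
    return totals


# Two analyses over the same raw data choose different keys.
spend_by_user = read_with_schema(RAW_LOG, "user", "amount")
spend_by_channel = read_with_schema(RAW_LOG, "channel", "amount")
```

The same raw lines answer both questions; adding the second analysis did not require changing how the data was stored.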
  • 5. Enterprise Relevance • Missed Opportunities – Channels – Data that is analyzed • The constraint was high cost – Storage – Processing • Future-proof your business – Schema on Read – Access patterns are less relevant – Not just future-proofing your architecture
  • 6. Hadoop Ecosystem Source: Apache Hadoop Documentation
  • 7. Hadoop 2 with YARN Source: Hadoop In Practice by Alex Holmes
  • 8. Big Data Journey (diagram: business value grows over time, from GOOD to GREAT) • Stages – Aware of Benefits: scout for opportunities – Execute: pilot project – Expand: multiple use cases – Managed: governance model – Optimized: core competency • Characteristics shown along the journey ➢ Real-time insight from all channels ➢ IT is a key differentiator for your business ➢ Perfect alignment of business and IT ➢ Ad hoc data exploration ➢ Batch, interactive and real-time use cases ➢ Predictive analytics, machine learning ➢ Consolidated analytics ➢ ETL ➢ Time constraints ➢ Security standards defined ➢ Governance standards defined ➢ Integrated with the enterprise ➢ Evaluate business benefits ➢ Understand the ecosystem ➢ Identify a platform
  • 9. Real time Stream Processing Architecture with Hadoop
  • 10. Flume Architecture • Distributed system for collecting and aggregating data from multiple sources into a centralized data store • An agent is a JVM process that hosts the Flume components (sources, channels and sinks) • A channel stores events until they are taken by a sink • Many types of Flume sources are available • Sources and sinks are decoupled by the channel
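The decoupling between source and sink can be illustrated with a small plain-Python sketch. The class and function names are hypothetical and this is not Flume's API: a real agent is configured in a properties file, and its channels (memory, file and others) are transactional. The only point shown is that the source writes into a bounded channel, the sink drains it independently, and events wait in the channel in between.

```python
from collections import deque


class MemoryChannel:
    """Bounded buffer that holds events until a sink takes them (a stand-in for a Flume channel)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._events = deque()

    def put(self, event):
        # Refuse the event when full, so the source has to back off or retry.
        if len(self._events) >= self.capacity:
            return False
        self._events.append(event)
        return True

    def take(self):
        # Return the oldest event, or None when the channel is empty.
        return self._events.popleft() if self._events else None


def run_source(events, channel):
    """Push events into the channel and return the ones that did not fit."""
    rejected = []
    for event in events:
        if not channel.put(event):
            rejected.append(event)
    return rejected


def run_sink(channel, store):
    """Drain the channel into a centralized store (here just a list)."""
    while (event := channel.take()) is not None:
        store.append(event)


channel = MemoryChannel(capacity=2)
rejected = run_source(["e1", "e2", "e3"], channel)  # "e3" does not fit
central_store = []
run_sink(channel, central_store)                     # sink runs on its own schedule
```

Because the sink only talks to the channel, it can be swapped (for example from a file sink to an HDFS sink) without touching the source, which is what the slide means by decoupling.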
  • 13. Kafka Introduction • A messaging system that is distributed, partitioned and replicated • Kafka brokers run as a cluster • Producers and consumers can be written in many languages
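  • 14. Topic • Each partition of a topic is an ordered, immutable sequence of messages, each identified by a sequential number called the offset • Messages are retained for a configurable period of time • Each consumer controls its own position (“offset”) in the log • Each partition is replicated and has one “leader” and zero or more “followers”; all reads and writes go through the leader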
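A toy in-memory model of a single partition helps make offsets and consumer-controlled positions concrete. This is a sketch under our own assumptions, not Kafka's broker implementation or client API: the class names are hypothetical, replication is omitted, and retention is modeled as a simple age check.

```python
class Partition:
    """Append-only log: each message gets the next sequential offset and is never modified."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._log = []  # list of (offset, timestamp, message)
        self._next_offset = 0

    def append(self, message, timestamp):
        offset = self._next_offset
        self._log.append((offset, timestamp, message))
        self._next_offset += 1
        return offset

    def expire(self, now):
        # Time-based retention: old messages are dropped whether or not anyone read them.
        self._log = [e for e in self._log if now - e[1] < self.retention_seconds]

    def read(self, from_offset, max_messages):
        # The log keeps no per-consumer cursor; the caller says where to start.
        return [(o, m) for o, _, m in self._log if o >= from_offset][:max_messages]


class Consumer:
    """Tracks its own offset, so it can also rewind and re-read."""

    def __init__(self, partition):
        self.partition = partition
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.partition.read(self.offset, max_messages)
        if batch:
            self.offset = batch[-1][0] + 1  # advance past the last message returned
        return [m for _, m in batch]


partition = Partition(retention_seconds=60)
for ts, msg in enumerate(["a", "b", "c"]):
    partition.append(msg, timestamp=ts)

consumer = Consumer(partition)
first = consumer.poll(max_messages=2)
second = consumer.poll()
consumer.offset = 0          # rewind: the consumer, not the log, owns the position
replay = consumer.poll()
partition.expire(now=61)     # drops messages that are 60 seconds old or older
```

After expiry the surviving message keeps its original offset (2); offsets are never reassigned.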
  • 15. Producers and Consumers • The producer controls which partition each message goes to • Supports both queuing and publish/subscribe – through an abstraction called the consumer group • Ordering is guaranteed only within a partition – a subscriber sees messages in order only if it is the sole consumer of that partition within its group
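The sketch below illustrates both ideas on this slide: a producer choosing a partition from a message key, and a consumer group splitting partitions among its members. It is hypothetical plain Python, not the Kafka client: CRC32 stands in for the hash (the Java producer's default partitioner hashes keys with murmur2 instead), and the round-robin assignment is a simplified stand-in for Kafka's assignment strategies.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a message key to a partition; the same key always lands in the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, consumers):
    """Round-robin partitions over the consumers of ONE group.

    Each partition goes to exactly one consumer in the group, which is what
    preserves per-partition ordering for that group.
    """
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# Queuing: one group with several consumers splits the partitions between them.
queue_style = assign_partitions(4, ["c1", "c2"])

# Pub/sub: each group independently receives all partitions, so every group sees every message.
pubsub_style = {
    group: assign_partitions(4, [f"{group}-c1"]) for group in ["analytics", "archive"]
}
```

Keying by, say, user ID keeps all of one user's messages in one partition, so they are consumed in order by a single member of each group.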
  • 16. Storm Introduction • Distributed real-time computation system – Processes unbounded streams of data – Can be used with multiple programming languages – Scalable, fault-tolerant, and guarantees that data will be processed • Use Cases – Real-time analytics, online machine learning – Continuous computation – Distributed RPC – ETL • Concepts – Topology – Spouts – Bolts
  • 17. Concepts • Storm Cluster – Master node (Nimbus): distributes code, assigns tasks to machines, monitors for failures – Worker nodes (Supervisor): start and stop worker processes; each worker process executes a subset of a topology – ZooKeeper: coordinates between Nimbus and the Supervisors • Nimbus and the Supervisors are completely stateless • State is maintained in ZooKeeper or on local disk
  • 18. Details • Stream – Unbounded sequence of tuples • Spout (you write the logic) – Source of a stream; emits tuples • Bolt (you write the logic) – Processes streams and emits tuples • Topology – DAG of spouts and bolts – Submitted to a Storm cluster to run – Each node runs in parallel, and the degree of parallelism is configurable
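To show how spouts, bolts and a topology fit together, here is a single-process sketch of a spout -> split bolt -> count bolt DAG using Python generators. It is not Storm's API: a real topology is typically written in Java by implementing spout and bolt interfaces and wiring them with `TopologyBuilder`, and each node would run as many parallel tasks. The function and class names below are hypothetical.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: the source of the stream; emits one tuple per sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream):
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)


class CountBolt:
    """Bolt: keeps a running count per word and emits (word, count) tuples."""

    def __init__(self):
        self.counts = Counter()

    def process(self, stream):
        for (word,) in stream:
            self.counts[word] += 1
            yield (word, self.counts[word])


# Wire the DAG: spout -> split -> count. Generators stand in for an unbounded stream.
counter = CountBolt()
topology = counter.process(split_bolt(sentence_spout(["the cow jumped", "the moon"])))
emitted = list(topology)
```

Because each stage only pulls tuples from the previous one, the same wiring would keep working if the spout produced an endless stream; `list(...)` is used here only to make the finite example observable.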
  • 19. Stream Groupings • A stream grouping tells a topology how to send tuples between two components • Because each component runs as many parallel tasks, the grouping controls which task receives each tuple
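Two of Storm's built-in groupings, shuffle grouping and fields grouping, can be approximated in a few lines of plain Python. This is a sketch of the routing idea only, not Storm's implementation: the function names are hypothetical, CRC32 stands in for whatever hash Storm uses, and the shuffle version reshuffles the task list on every pass so that each task receives an equal share, in line with how Storm documents shuffle grouping.

```python
import random
import zlib


def shuffle_grouping(tuples, num_tasks, seed=0):
    """Spread tuples over tasks in random order while giving each task an equal share."""
    rng = random.Random(seed)
    order = []
    routed = []
    for t in tuples:
        if not order:
            # Start a new pass: every task appears once, in a freshly shuffled order.
            order = list(range(num_tasks))
            rng.shuffle(order)
        routed.append((order.pop(), t))
    return routed


def fields_grouping(tuples, field_index, num_tasks):
    """Send tuples with the same value in the chosen field to the same task."""
    routed = []
    for t in tuples:
        key = str(t[field_index]).encode("utf-8")
        routed.append((zlib.crc32(key) % num_tasks, t))
    return routed


words = [("storm",), ("kafka",), ("storm",), ("flume",), ("storm",)]
by_field = fields_grouping(words, field_index=0, num_tasks=3)
storm_tasks = {task for task, t in by_field if t[0] == "storm"}

shuffled = shuffle_grouping(list(range(6)), num_tasks=3)
per_task = sorted(task for task, _ in shuffled)
```

Fields grouping is what a word-count or TopN bolt needs: every occurrence of a word reaches the same task, so that task's count is complete. Shuffle grouping suits stateless bolts where only load balance matters.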
  • 20. Why Use Twitter as a Data Source
  • 21. Demo - Twitter TopN Trending Topic • Method 1 — Flume with interceptor • Method 2 — Storm with custom Twitter Spout • Method 3 — Flume + Kafka + Storm
  • 22. Demo - Twitter TopN Trending Topic • Use the Flume Twitter source to ingest data and publish events to a Kafka topic • Use Kafka as the messaging backbone • Use Storm as a real-time event processing system to calculate the top N trending topics • Use Redis to store the TopN result • Use Node.js/jQuery for visualization
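The TopN calculation in the Storm step could be organized as a rolling window of counts. The sketch below is a single-process Python illustration under our own assumptions (a fixed number of time slots, ties broken alphabetically by hashtag); the slides do not say how the demo computes its window. In the demo this logic would run inside Storm bolts fed from Kafka, with the result written to Redis, and none of those APIs are shown here. `RollingTopN` is a hypothetical name.

```python
import heapq
from collections import Counter, deque


class RollingTopN:
    """Count hashtags over a sliding window of fixed-size slots and report the top N."""

    def __init__(self, num_slots, n):
        self.n = n
        # One Counter per time slot; the oldest slot falls off when the window advances.
        self.slots = deque([Counter() for _ in range(num_slots)], maxlen=num_slots)

    def add(self, hashtag):
        # New events always go into the newest slot.
        self.slots[-1][hashtag] += 1

    def advance(self):
        # Start a new slot; deque(maxlen=...) discards the oldest slot automatically.
        self.slots.append(Counter())

    def top_n(self):
        totals = Counter()
        for slot in self.slots:
            totals.update(slot)
        # Highest count first, then hashtag name, so the result is deterministic.
        return heapq.nsmallest(self.n, totals.items(), key=lambda item: (-item[1], item[0]))


ranker = RollingTopN(num_slots=2, n=2)
for tag in ["#hadoop", "#storm", "#storm"]:
    ranker.add(tag)
ranker.advance()
for tag in ["#kafka", "#kafka", "#kafka", "#hadoop"]:
    ranker.add(tag)
window_1 = ranker.top_n()   # both slots are still in the window
ranker.advance()            # the first slot (with both "#storm" events) drops out
window_2 = ranker.top_n()
```

Using time slots rather than per-event timestamps keeps memory bounded by the number of slots times the number of distinct hashtags, at the cost of a coarser window boundary.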
  • 30. Flume Agent — Source
  • 31. Flume Agent — Channel
  • 34. Submit Topology to Storm Production Cluster
  • 35. Submit Topology to Test Cluster
  • 39. Questions? 
 Vijay Mandava: vijay@insemble.com Lan Jiang: lan@insemble.com / @Lan_Jiang