SlideShare une entreprise Scribd logo
1  sur  46
Télécharger pour lire hors ligne
®
© 2015 MapR Technologies
®
© 2015 MapR Technologies
@kingmesal #StrataHadoop
Jim Scott – Director, Enterprise Strategy & Architecture
®
© 2015 MapR Technologies
The Landscape
•  Stream Processing is Fundamentally Simple
–  Inputs -> Outputs
–  But it is WAY more complicated than this…
•  Optimization can be complicated
•  This space is very confused
–  Performance of different options is dependent upon source
•  Lots of misinformation
–  e.g. performance comparisons that are not apples-to-apples
®
© 2015 MapR Technologies
Semantics
•  There are three general categories of delivery patterns:
–  At-most-once: messages may be lost. This is usually the least desirable
outcome.
–  At-least-once: messages may be redelivered (no loss, but duplicates).
This is good enough for many use cases.
–  Exactly-once: each message is delivered once and only once (no loss,
no duplicates). This is a desirable feature although difficult to
guarantee in all cases.
®
© 2015 MapR Technologies
Today’s Options – Apache Style
•  Samza
•  Storm
•  Spark Streaming
®
© 2015 MapR Technologies© 2015 MapR Technologies
®
Apache Samza
®
© 2015 MapR Technologies
Apache Samza
•  Originally developed in LinkedIn (Chris Riccomini/Jay Kreps),
now ASF top-level project
•  Distributed stream processing framework (YARN/Kafka)
http://samza.apache.org/
®
© 2015 MapR Technologies
Concepts
•  Streams & Partitions
•  Jobs & Tasks
•  Dataflow Graphs
•  Containers
samza.apache.org/learn/documentation/latest/introduction/concepts.html
®
© 2015 MapR Technologies
Streams & Partitions
•  Stream: immutable messages
•  Each stream comprises one
or more partitions
•  Partition: totally ordered
sequence of messages
®
© 2015 MapR Technologies
Jobs & Tasks
•  Job: logical unit of stream
processing, a collection of tasks
•  Task: unit of parallelism
–  in the context of a job, each task
consumes data from one
partition
–  processes messages from each
of its input partitions sequentially
®
© 2015 MapR Technologies
Dataflow Graphs
•  Dataflow graph: logical, directed
graph of jobs
•  Jobs in DG are decoupled
–  can be developed independently
–  don’t impact up/downstream jobs
•  DG can contain cycles
®
© 2015 MapR Technologies
Samza Architecture
•  Processing layer à Samza API
•  Pluggable execution layer
(default: YARN)
•  Pluggable streaming layer
(default: Kafka)
samza.apache.org/learn/documentation/latest/introduction/architecture.html
®
© 2015 MapR Technologies
Samza Execution: Containers
•  Partitions and tasks are both logical
units of parallelism
•  Containers are the unit of physical
parallelism, essentially a Unix process
(or Linux cgroup)
®
© 2015 MapR Technologies
Samza Resources
•  http://www.jfokus.se/jfokus15/preso/ApacheSamza.pdf
•  http://www.berlinbuzzwords.de/session/samza-linkedin-taking-
stream-processing-next-level
•  http://www.infoq.com/articles/linkedin-samza
Kudos to Chris Riccomini and Martin Kleppmann
for their invaluable support concerning Samza!
®
© 2015 MapR Technologies© 2015 MapR Technologies
®
Apache Storm
®
© 2015 MapR Technologies
Apache Storm
•  Originally developed by Nathan Marz at Backtype/Twitter, now
ASF top-level project
•  Distributed, fault-tolerant stream-processing platform
http://storm.apache.org/
®
© 2015 MapR Technologies
Concepts
•  Tuples and Streams
•  Spouts, Bolts, Topologies
•  Tasks and Workers
•  Stream Grouping
storm.apache.org/documentation/Tutorial.html
®
© 2015 MapR Technologies
Tuples and Streams
•  Tuple: ordered list of elements
•  Stream: unbounded sequence of tuples
(b20ea50, nathan@nathanmarz.com) (0f663d2, derekd@yahoo-inc.com)(064874b, andy.feng@gmail.com)
stream
tuple
®
© 2015 MapR Technologies
Spouts
•  The sources of streams
•  Can talk with
–  Queues (Kafka, Kestrel, etc.)
–  Web logs
–  API calls
–  Filesystem (MapR-FS / HDFS)
–  Etc.
S
®
© 2015 MapR Technologies
Bolts
•  Process tuples and create new streams
•  Implement business logic via …
–  Transform
–  Filter
–  Aggregate
–  Join
–  Access datastores & DBs
–  Access APIs (e.g., geo location look-up)
B
®
© 2015 MapR Technologies
Topologies
•  Directed graph of spouts and bolts
S2
B1
B2
S1
B3
B4
B3
®
© 2015 MapR Technologies
Stream Grouping
•  Shuffle grouping: tuples are randomly
distributed across all of the tasks
running the bolt
•  Fields grouping: groups tuples by
specific name field and routes to the
same task
®
© 2015 MapR Technologies
Tasks and Workers
•  Task: each spout/bolt executes as many threads of execution
across the cluster
•  Worker: a physical JVM that executes a subset of all the tasks for
the topology
®
© 2015 MapR Technologies
Trident—the ‘Cascading’ of Storm
•  High-level abstraction processing library on top of Storm
•  Rich API with joins, aggregations, grouping, etc.
•  Provides stateful, exactly-once processing primitives
storm.apache.org/documentation/Trident-tutorial.html
compiled into
Trident topology Storm topology
®
© 2015 MapR Technologies
Execution
master node worker node
Nimbus Supervisor
Worker
Worker
Worker
Zk
0MQ/Netty
®
© 2015 MapR Technologies
Storm Resources
•  https://www.udacity.com/course/ud381
•  http://www.manning.com/sallen/
•  https://github.com/tdunning/storm-counts
®
© 2015 MapR Technologies© 2015 MapR Technologies
®
Apache Spark
®
© 2015 MapR Technologies
Apache Spark
Continued innovation bringing new functionality, such as:
•  Tachyon (Shared RDDs, off-heap solution)
•  BlinkDB (approximate queries)
•  SparkR (R wrapper for Spark)
Spark SQL
(SQL/HQL)
Spark Streaming
(stream processing)
MLlib
(machine learning)
Spark (core execution engine—RDDs)
GraphX
(graph processing)
Mesos
file system (local, MapR-FS, HDFS, S3) or data store (HBase, Elasticsearch, etc.)
YARNStandalone
http://spark.apache.org/
®
© 2015 MapR Technologies
Sweet spot …
•  Iterative Algorithms
–  machine learning
–  graph processing beyond DAG
•  Interactive Data Mining
•  Streaming Applications
®
© 2015 MapR Technologies
Interfacing to permanent storage
•  Local Files
–  file:///opt/httpd/logs/access_log
•  Object Stores (e.g. Amazon S3)
•  MapR-FS, HDFS
–  text files, sequence files, any other Hadoop InputFormat
•  Key-Value datastores (e.g. Apache HBase)
•  Elasticsearch
®
© 2015 MapR Technologies
Cluster Managers
YARN
Standalone
®
© 2015 MapR Technologies
Resilient Distributed Datasets (RDD)
•  RDD: core abstraction of Spark execution engine
•  Collections of elements that can be operated on in parallel
•  Persistent in memory between operations
www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
®
© 2015 MapR Technologies
RDD Operations
•  Lazy evaluation is key to Spark
•  Transformations
–  Creation of a new dataset from an existing:
map, filter, distinct, union, sample, groupByKey, join, etc.
•  Actions
–  Return a value after running a computation:
collect, count, first, takeSample, foreach, etc.
®
© 2015 MapR Technologies
Spark Streaming
•  High-level language operators for streaming data
•  Fault-tolerant semantics
•  Support for merging streaming data with historical data
spark.apache.org/docs/latest/streaming-programming-guide.html
®
© 2015 MapR Technologies
Spark Streaming
Run a streaming computation as a series of small, deterministic
batch jobs.
•  Chop up live stream into batches of X
seconds (DStream)
•  Spark treats each batch of data as RDDs
and processes them using RDD ops
•  Finally, processed results of the RDD
operations are returned in batches
Spark	
  
Spark	
  
Streaming	
  
batches	
  of	
  X	
  seconds	
  
live	
  data	
  stream	
  
processed	
  results	
  
®
© 2015 MapR Technologies
Spark Streaming
Run a streaming computation as a series of small, deterministic
batch jobs.
•  Batch sizes as low as ½ second,
latency of about 1 second
•  Potential for combining batch processing
and streaming processing in the same
system
Spark	
  
Spark	
  
Streaming	
  
batches	
  of	
  X	
  seconds	
  
live	
  data	
  stream	
  
processed	
  results	
  
®
© 2015 MapR Technologies
Spark Streaming: Transformations
•  Stateless transformations
•  Stateful transformations
–  checkpointing
–  windowed transformations
•  window duration
•  sliding duration
spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams
®
© 2015 MapR Technologies
Spark Streaming: Execution
®
© 2015 MapR Technologies
Spark Resources
•  http://shop.oreilly.com/product/0636920028512.do
•  http://databricks.com/spark-training-resources
•  http://oreilly.com/go/sparkcert
•  http://spark-stack.org
•  https://www.mapr.com/blog/getting-started-spark-mapr-sandbox
®
© 2015 MapR Technologies© 2015 MapR Technologies
®
Comparison
®
© 2015 MapR Technologies
Samza vs Storm vs Spark
Samza Storm Spark Streaming
Stream Source Consumers Spouts Receivers
Stream Primitive Message Tuple DStream
Stream Computation Tasks Bolts Transformations
®
© 2015 MapR Technologies
Samza vs Storm vs Spark
Samza Storm Spark Streaming
processing model one record at a time one record at a time micro-batch
latency milliseconds milliseconds seconds
throughput 100k+ records per node per
second
10k+ records per node per
second
100k+ records per node
per second
processing guarantees at-least-once delivery;
support for exactly-once
planned
at-least-once / exactly once
(with Trident)
exactly once
stateful operations yes no / yes (with Trident) yes
language support + +++ ++
community size/
committer & user base
+ +++ ++
special agile, state management,
Kappa-native
distributed RPC unified processing (batch,
SQL, etc.)
®
© 2015 MapR Technologies
When to use what?
use case Samza Storm Spark Streaming
filtering
✓ ✓ ✓
counting
(incl. aggregations)
✓ ✓
joins ✓
distributed RPC ✓
re-processing
(aka Kappa architecture)
✓ ✓
materialized view maintenance
(cache invalidation)
✓ ✓
®
© 2015 MapR Technologies
Spark vs Storm: Throughput
•  Spark Streaming: 670k records/sec/node
•  Storm: 115k records/sec/node
•  Commercial systems: 100-500k records/sec/node
0	
  
10	
  
20	
  
30	
  
100	
   1000	
  
Throughput	
  per	
  node	
  
(MB/s)	
  
Record	
  Size	
  (bytes)	
  
WordCount	
  
Spark	
  
Storm	
  
0	
  
20	
  
40	
  
60	
  
100	
   1000	
  
Throughput	
  per	
  node	
  
(MB/s)	
  
Record	
  Size	
  (bytes)	
  
Grep	
  
Spark	
  
Storm	
  
®
© 2015 MapR Technologies
Comparison Resources
•  http://www.zdatainc.com/2014/09/apache-storm-apache-spark/
•  http://www.slideshare.net/ChicagoHUG/yahoo-compares-storm-and-spark
•  http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming
•  http://xinhstechblog.blogspot.ie/2014/06/storm-vs-spark-streaming-side-by-side.html
•  http://samza.apache.org/learn/documentation/0.8/comparisons/storm.html
•  http://samza.apache.org/learn/documentation/0.8/comparisons/spark-streaming.html
•  https://www.mapr.com/blog/spark-streaming-vs-storm-trident-whiteboard-walkthrough
®
© 2015 MapR Technologies
$50M$50Min Free Training
®
© 2015 MapR Technologies
jscott@mapr.com
Engage with us!
@MapR maprtech
+MaprTechnologies
maprtech
MapR Technologies
Q&A

Contenu connexe

Tendances

Next-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msNext-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msIlya Ganelin
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an examplehadooparchbook
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupGwen (Chen) Shapira
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamDataWorks Summit/Hadoop Summit
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming AnalyticsGuido Schmutz
 
Real time analytics with Kafka and SparkStreaming
Real time analytics with Kafka and SparkStreamingReal time analytics with Kafka and SparkStreaming
Real time analytics with Kafka and SparkStreamingAshish Singh
 
Lessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and MicroservicesLessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and MicroservicesAlexis Seigneurin
 
Streaming architecture patterns
Streaming architecture patternsStreaming architecture patterns
Streaming architecture patternshadooparchbook
 
Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationsTop 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationshadooparchbook
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Chris Fregly
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeGuido Schmutz
 
Apache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms comparedApache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms comparedGuido Schmutz
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingGwen (Chen) Shapira
 
Bringing complex event processing to Spark streaming
Bringing complex event processing to Spark streamingBringing complex event processing to Spark streaming
Bringing complex event processing to Spark streamingDataWorks Summit
 

Tendances (20)

Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Next-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msNext-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2ms
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an example
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka Meetup
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 
Real time analytics with Kafka and SparkStreaming
Real time analytics with Kafka and SparkStreamingReal time analytics with Kafka and SparkStreaming
Real time analytics with Kafka and SparkStreaming
 
Lessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and MicroservicesLessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and Microservices
 
Streaming architecture patterns
Streaming architecture patternsStreaming architecture patterns
Streaming architecture patterns
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationsTop 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applications
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
 
Have your cake and eat it too
Have your cake and eat it tooHave your cake and eat it too
Have your cake and eat it too
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtime
 
Apache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms comparedApache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms compared
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
 
Bringing complex event processing to Spark streaming
Bringing complex event processing to Spark streamingBringing complex event processing to Spark streaming
Bringing complex event processing to Spark streaming
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 

En vedette

Introduction to Real-time data processing
Introduction to Real-time data processingIntroduction to Real-time data processing
Introduction to Real-time data processingYogi Devendra Vyavahare
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsVincenzo Gulisano
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming AnalyticsGuido Schmutz
 
Introduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache KafkaIntroduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache Kafkaconfluent
 
Real-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureReal-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureKhalid Salama
 
Hive Poster
Hive PosterHive Poster
Hive Posterragho
 
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)tsliwowicz
 
AWS Lambda: Event-driven Code in the Cloud
AWS Lambda: Event-driven Code in the CloudAWS Lambda: Event-driven Code in the Cloud
AWS Lambda: Event-driven Code in the CloudAmazon Web Services
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesDavid Martínez Rego
 
Real Time Analytics with Apache Cassandra - Cassandra Day Munich
Real Time Analytics with Apache Cassandra - Cassandra Day MunichReal Time Analytics with Apache Cassandra - Cassandra Day Munich
Real Time Analytics with Apache Cassandra - Cassandra Day MunichGuido Schmutz
 
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitReal-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitGyula Fóra
 
KDD 2016 Streaming Analytics Tutorial
KDD 2016 Streaming Analytics TutorialKDD 2016 Streaming Analytics Tutorial
KDD 2016 Streaming Analytics TutorialNeera Agarwal
 
RBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingRBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingGyula Fóra
 
Real Time Analytics with Apache Cassandra - Cassandra Day Berlin
Real Time Analytics with Apache Cassandra - Cassandra Day BerlinReal Time Analytics with Apache Cassandra - Cassandra Day Berlin
Real Time Analytics with Apache Cassandra - Cassandra Day BerlinGuido Schmutz
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemGyula Fóra
 
Real-time analytics as a service at King
Real-time analytics as a service at King Real-time analytics as a service at King
Real-time analytics as a service at King Gyula Fóra
 
Data encoding and Metadata for Streams
Data encoding and Metadata for StreamsData encoding and Metadata for Streams
Data encoding and Metadata for Streamsunivalence
 
Stream Analytics in the Enterprise
Stream Analytics in the EnterpriseStream Analytics in the Enterprise
Stream Analytics in the EnterpriseJesus Rodriguez
 

En vedette (20)

Introduction to Real-time data processing
Introduction to Real-time data processingIntroduction to Real-time data processing
Introduction to Real-time data processing
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
 
Introduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache KafkaIntroduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache Kafka
 
Real-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureReal-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS Azure
 
Hive Poster
Hive PosterHive Poster
Hive Poster
 
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
 
AWS Lambda: Event-driven Code in the Cloud
AWS Lambda: Event-driven Code in the CloudAWS Lambda: Event-driven Code in the Cloud
AWS Lambda: Event-driven Code in the Cloud
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Real Time Analytics with Apache Cassandra - Cassandra Day Munich
Real Time Analytics with Apache Cassandra - Cassandra Day MunichReal Time Analytics with Apache Cassandra - Cassandra Day Munich
Real Time Analytics with Apache Cassandra - Cassandra Day Munich
 
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitReal-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop Summit
 
KDD 2016 Streaming Analytics Tutorial
KDD 2016 Streaming Analytics TutorialKDD 2016 Streaming Analytics Tutorial
KDD 2016 Streaming Analytics Tutorial
 
RBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingRBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at King
 
Real Time Analytics with Apache Cassandra - Cassandra Day Berlin
Real Time Analytics with Apache Cassandra - Cassandra Day BerlinReal Time Analytics with Apache Cassandra - Cassandra Day Berlin
Real Time Analytics with Apache Cassandra - Cassandra Day Berlin
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
Real-time analytics as a service at King
Real-time analytics as a service at King Real-time analytics as a service at King
Real-time analytics as a service at King
 
Data encoding and Metadata for Streams
Data encoding and Metadata for StreamsData encoding and Metadata for Streams
Data encoding and Metadata for Streams
 
Streaming Analytics
Streaming AnalyticsStreaming Analytics
Streaming Analytics
 
Stream Analytics in the Enterprise
Stream Analytics in the EnterpriseStream Analytics in the Enterprise
Stream Analytics in the Enterprise
 

Similaire à Stream Processing Everywhere - What to use?

Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkVince Gonzalez
 
New York Storm Users Group 2014-01-28 - Using Storm with MapR M7 for Real-Tim...
New York Storm Users Group 2014-01-28 - Using Storm with MapR M7 for Real-Tim...New York Storm Users Group 2014-01-28 - Using Storm with MapR M7 for Real-Tim...
New York Storm Users Group 2014-01-28 - Using Storm with MapR M7 for Real-Tim...Gna Phetsarath
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexApache Apex
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)Spark Summit
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APIshareddatamsft
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and FutureJianfeng Zhang
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and FutureRajesh Balamohan
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream DataDataWorks Summit
 
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Data Con LA
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingJack Gudenkauf
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architectureSohil Jain
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architectureSohil Jain
 
How jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxHow jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxjKool
 
How jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxHow jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxDataStax
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Intro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoIntro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoMapR Technologies
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformYao Yao
 
20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooks20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooksAndrey Vykhodtsev
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overviewKaran Alang
 
Streaming meetup
Streaming meetupStreaming meetup
Streaming meetupkarthik_krk
 

Similaire à Stream Processing Everywhere - What to use? (20)

Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - Spark
 
New York Storm Users Group 2014-01-28 - Using Storm with MapR M7 for Real-Tim...
New York Storm Users Group 2014-01-28 - Using Storm with MapR M7 for Real-Tim...New York Storm Users Group 2014-01-28 - Using Storm with MapR M7 for Real-Tim...
New York Storm Users Group 2014-01-28 - Using Storm with MapR M7 for Real-Tim...
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp API
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream Data
 
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream Processing
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
How jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxHow jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStax
 
How jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxHow jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStax
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Intro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoIntro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of Twingo
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
 
20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooks20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooks
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
 
Streaming meetup
Streaming meetupStreaming meetup
Streaming meetup
 

Plus de MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLMapR Technologies
 

Plus de MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Dernier

Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Dernier (20)

Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Stream Processing Everywhere - What to use?

  • 1. ® © 2015 MapR Technologies ® © 2015 MapR Technologies @kingmesal #StrataHadoop Jim Scott – Director, Enterprise Strategy & Architecture
  • 2. ® © 2015 MapR Technologies The Landscape •  Stream Processing is Fundamentally Simple –  Inputs -> Outputs –  But it is WAY more complicated than this… •  Optimization can be complicated •  This space is very confused –  Performance of different options is dependent upon source •  Lots of misinformation –  e.g. performance comparisons that are not apples-to-apples
  • 3. ® © 2015 MapR Technologies Semantics •  There are three general categories of delivery patterns: –  At-most-once: messages may be lost. This is usually the least desirable outcome. –  At-least-once: messages may be redelivered (no loss, but duplicates). This is good enough for many use cases. –  Exactly-once: each message is delivered once and only once (no loss, no duplicates). This is a desirable feature although difficult to guarantee in all cases.
  • 4. ® © 2015 MapR Technologies Today’s Options – Apache Style •  Samza •  Storm •  Spark Streaming
  • 5. ® © 2015 MapR Technologies© 2015 MapR Technologies ® Apache Samza
  • 6. ® © 2015 MapR Technologies Apache Samza •  Originally developed in LinkedIn (Chris Riccomini/Jay Kreps), now ASF top-level project •  Distributed stream processing framework (YARN/Kafka) http://samza.apache.org/
  • 7. ® © 2015 MapR Technologies Concepts •  Streams & Partitions •  Jobs & Tasks •  Dataflow Graphs •  Containers samza.apache.org/learn/documentation/latest/introduction/concepts.html
  • 8. ® © 2015 MapR Technologies Streams & Partitions •  Stream: immutable messages •  Each stream comprises one or more partitions •  Partition: totally ordered sequence of messages
  • 9. ® © 2015 MapR Technologies Jobs & Tasks •  Job: logical unit of stream processing, a collection of tasks •  Task: unit of parallelism –  in the context of a job, each task consumes data from one partition –  processes messages from each of its input partitions sequentially
  • 10. ® © 2015 MapR Technologies Dataflow Graphs •  Dataflow graph: logical, directed graph of jobs •  Jobs in DG are decoupled –  can be developed independently –  don’t impact up/downstream jobs •  DG can contain cycles
  • 11. ® © 2015 MapR Technologies Samza Architecture •  Processing layer à Samza API •  Pluggable execution layer (default: YARN) •  Pluggable streaming layer (default: Kafka) samza.apache.org/learn/documentation/latest/introduction/architecture.html
  • 12. ® © 2015 MapR Technologies Samza Execution: Containers •  Partitions and tasks are both logical units of parallelism •  Containers are the unit of physical parallelism, essentially a Unix process (or Linux cgroup)
  • 13. ® © 2015 MapR Technologies Samza Resources •  http://www.jfokus.se/jfokus15/preso/ApacheSamza.pdf •  http://www.berlinbuzzwords.de/session/samza-linkedin-taking- stream-processing-next-level •  http://www.infoq.com/articles/linkedin-samza Kudos to Chris Riccomini and Martin Kleppmann for their invaluable support concerning Samza!
  • 14. ® © 2015 MapR Technologies© 2015 MapR Technologies ® Apache Storm
  • 15. ® © 2015 MapR Technologies Apache Storm •  Originally developed by Nathan Marz at Backtype/Twitter, now ASF top-level project •  Distributed, fault-tolerant stream-processing platform http://storm.apache.org/
  • 16. ® © 2015 MapR Technologies Concepts •  Tuples and Streams •  Spouts, Bolts, Topologies •  Tasks and Workers •  Stream Grouping storm.apache.org/documentation/Tutorial.html
  • 17. ® © 2015 MapR Technologies Tuples and Streams •  Tuple: ordered list of elements •  Stream: unbounded sequence of tuples (b20ea50, nathan@nathanmarz.com) (0f663d2, derekd@yahoo-inc.com)(064874b, andy.feng@gmail.com) stream tuple
  • 18. ® © 2015 MapR Technologies Spouts •  The sources of streams •  Can talk with –  Queues (Kafka, Kestrel, etc.) –  Web logs –  API calls –  Filesystem (MapR-FS / HDFS) –  Etc. S
  • 19. ® © 2015 MapR Technologies Bolts •  Process tuples and create new streams •  Implement business logic via … –  Transform –  Filter –  Aggregate –  Join –  Access datastores & DBs –  Access APIs (e.g., geo location look-up) B
  • 20. ® © 2015 MapR Technologies Topologies •  Directed graph of spouts and bolts S2 B1 B2 S1 B3 B4 B3
  • 21. ® © 2015 MapR Technologies Stream Grouping •  Shuffle grouping: tuples are randomly distributed across all of the tasks running the bolt •  Fields grouping: groups tuples by specific name field and routes to the same task
  • 22. ® © 2015 MapR Technologies Tasks and Workers •  Task: each spout/bolt executes as many threads of execution across the cluster •  Worker: a physical JVM that executes a subset of all the tasks for the topology
  • 23. ® © 2015 MapR Technologies Trident—the ‘Cascading’ of Storm •  High-level abstraction processing library on top of Storm •  Rich API with joins, aggregations, grouping, etc. •  Provides stateful, exactly-once processing primitives storm.apache.org/documentation/Trident-tutorial.html compiled into Trident topology Storm topology
  • 24. ® © 2015 MapR Technologies Execution master node worker node Nimbus Supervisor Worker Worker Worker Zk 0MQ/Netty
  • 25. ® © 2015 MapR Technologies Storm Resources •  https://www.udacity.com/course/ud381 •  http://www.manning.com/sallen/ •  https://github.com/tdunning/storm-counts
  • 26. ® © 2015 MapR Technologies© 2015 MapR Technologies ® Apache Spark
  • 27. ® © 2015 MapR Technologies Apache Spark Continued innovation bringing new functionality, such as: •  Tachyon (Shared RDDs, off-heap solution) •  BlinkDB (approximate queries) •  SparkR (R wrapper for Spark) Spark SQL (SQL/HQL) Spark Streaming (stream processing) MLlib (machine learning) Spark (core execution engine—RDDs) GraphX (graph processing) Mesos file system (local, MapR-FS, HDFS, S3) or data store (HBase, Elasticsearch, etc.) YARNStandalone http://spark.apache.org/
  • 28. ® © 2015 MapR Technologies Sweet spot … •  Iterative Algorithms –  machine learning –  graph processing beyond DAG •  Interactive Data Mining •  Streaming Applications
  • 29. ® © 2015 MapR Technologies Interfacing to permanent storage •  Local Files –  file:///opt/httpd/logs/access_log •  Object Stores (e.g. Amazon S3) •  MapR-FS, HDFS –  text files, sequence files, any other Hadoop InputFormat •  Key-Value datastores (e.g. Apache HBase) •  Elasticsearch
  • 30. ® © 2015 MapR Technologies Cluster Managers YARN Standalone
  • 31. ® © 2015 MapR Technologies Resilient Distributed Datasets (RDD) •  RDD: core abstraction of Spark execution engine •  Collections of elements that can be operated on in parallel •  Persistent in memory between operations www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
  • 32. ® © 2015 MapR Technologies RDD Operations •  Lazy evaluation is key to Spark •  Transformations –  Creation of a new dataset from an existing: map, filter, distinct, union, sample, groupByKey, join, etc. •  Actions –  Return a value after running a computation: collect, count, first, takeSample, foreach, etc.
  • 33. ® © 2015 MapR Technologies Spark Streaming •  High-level language operators for streaming data •  Fault-tolerant semantics •  Support for merging streaming data with historical data spark.apache.org/docs/latest/streaming-programming-guide.html
  • 34. ® © 2015 MapR Technologies Spark Streaming Run a streaming computation as a series of small, deterministic batch jobs. •  Chop up live stream into batches of X seconds (DStream) •  Spark treats each batch of data as RDDs and processes them using RDD ops •  Finally, processed results of the RDD operations are returned in batches Spark   Spark   Streaming   batches  of  X  seconds   live  data  stream   processed  results  
  • 35. ® © 2015 MapR Technologies Spark Streaming Run a streaming computation as a series of small, deterministic batch jobs. •  Batch sizes as low as ½ second, latency of about 1 second •  Potential for combining batch processing and streaming processing in the same system Spark   Spark   Streaming   batches  of  X  seconds   live  data  stream   processed  results  
  • 36. ® © 2015 MapR Technologies Spark Streaming: Transformations •  Stateless transformations •  Stateful transformations –  checkpointing –  windowed transformations •  window duration •  sliding duration spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams
  • 37. ® © 2015 MapR Technologies Spark Streaming: Execution
  • 38. ® © 2015 MapR Technologies Spark Resources •  http://shop.oreilly.com/product/0636920028512.do •  http://databricks.com/spark-training-resources •  http://oreilly.com/go/sparkcert •  http://spark-stack.org •  https://www.mapr.com/blog/getting-started-spark-mapr-sandbox
  • 39. ® © 2015 MapR Technologies© 2015 MapR Technologies ® Comparison
  • 40. ® © 2015 MapR Technologies Samza vs Storm vs Spark Samza Storm Spark Streaming Stream Source Consumers Spouts Receivers Stream Primitive Message Tuple DStream Stream Computation Tasks Bolts Transformations
  • 41. ® © 2015 MapR Technologies Samza vs Storm vs Spark Samza Storm Spark Streaming processing model one record at a time one record at a time micro-batch latency milliseconds milliseconds seconds throughput 100k+ records per node per second 10k+ records per node per second 100k+ records per node per second processing guarantees at-least-once delivery; support for exactly-once planned at-least-once / exactly once (with Trident) exactly once stateful operations yes no / yes (with Trident) yes language support + +++ ++ community size/ committer & user base + +++ ++ special agile, state management, Kappa-native distributed RPC unified processing (batch, SQL, etc.)
  • 42. ® © 2015 MapR Technologies When to use what? use case Samza Storm Spark Streaming filtering ✓ ✓ ✓ counting (incl. aggregations) ✓ ✓ joins ✓ distributed RPC ✓ re-processing (aka Kappa architecture) ✓ ✓ materialized view maintenance (cache invalidation) ✓ ✓
  • 43. ® © 2015 MapR Technologies Spark vs Storm: Throughput •  Spark Streaming: 670k records/sec/node •  Storm: 115k records/sec/node •  Commercial systems: 100-500k records/sec/node 0   10   20   30   100   1000   Throughput  per  node   (MB/s)   Record  Size  (bytes)   WordCount   Spark   Storm   0   20   40   60   100   1000   Throughput  per  node   (MB/s)   Record  Size  (bytes)   Grep   Spark   Storm  
  • 44. ® © 2015 MapR Technologies Comparison Resources •  http://www.zdatainc.com/2014/09/apache-storm-apache-spark/ •  http://www.slideshare.net/ChicagoHUG/yahoo-compares-storm-and-spark •  http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming •  http://xinhstechblog.blogspot.ie/2014/06/storm-vs-spark-streaming-side-by-side.html •  http://samza.apache.org/learn/documentation/0.8/comparisons/storm.html •  http://samza.apache.org/learn/documentation/0.8/comparisons/spark-streaming.html •  https://www.mapr.com/blog/spark-streaming-vs-storm-trident-whiteboard-walkthrough
  • 45. ® © 2015 MapR Technologies $50M$50Min Free Training
  • 46. ® © 2015 MapR Technologies jscott@mapr.com Engage with us! @MapR maprtech +MaprTechnologies maprtech MapR Technologies Q&A