Veldkant 33A, Kontich ● info@infofarm.be ● www.infofarm.be
Data Science Company
Real Time Big Data
InfoFarm Seminar
18/11/2015
About Me
About InfoFarm
Agenda
•  Typical Big Data Landscape
•  The need for Real Time Big Data
•  Real Time Data Ingestion
•  Tools for Real Time Big Data
– Apache Spark
– Apache Storm
– Search
•  Q&A
•  Lunch
A Typical Big Data Landscape
•  Data Silo
•  Batch environment
•  Periodical Analytics/statistics
•  Data Source for new systems
The need for Real Time Big Data
•  Obtaining analytical results faster
–  Processing faster than once a day
•  Load evens out over the day
•  Past/Present/Future
–  Alert for certain events
–  Updating Prediction models on-the-fly
•  Allow faster feedback to end users
–  See results of your actions right away
Perfect fits for Real Time Processing
•  Anomaly Detection
–  Abnormal readings of sensors
–  Abnormal amounts of log files
–  Fraud detection
•  Real Time updates to Recommender models
–  Fast new recommendations in e-commerce
–  Support for trending items
–  Fast responses to events happening right now
•  Real Time updates of clustering models
•  Improving classification based on current events
•  Can be run side-by-side with traditional historical models
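As a rough illustration of anomaly detection on a stream of sensor readings, the sketch below keeps a running mean and standard deviation and flags readings that fall far outside them. The class and function names, the threshold of three standard deviations, the warm-up length and the sample readings are all assumptions made for this example, not something taken from the seminar.

```python
import math


class RunningStats:
    """Running mean and variance, updated one reading at a time (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def stddev(self):
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))


def detect_anomalies(readings, threshold=3.0, warmup=5):
    """Yield (index, value) for readings far from the running mean.

    Each reading is compared with the statistics of the readings seen
    before it, and is then folded into those statistics.
    """
    stats = RunningStats()
    for index, value in enumerate(readings):
        if stats.count >= warmup:
            spread = stats.stddev()
            if spread > 0 and abs(value - stats.mean) > threshold * spread:
                yield index, value
        stats.update(value)


# Hypothetical temperature readings with one obvious outlier.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 35.0, 20.1]
anomalies = list(detect_anomalies(readings))
```

Each reading is processed once and only three numbers are kept as state, which is what makes this kind of check cheap enough to run on every incoming event.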
Ingestion → Processing → Output
Ingestion
Apache Kafka
•  Fast
•  Scalable
•  Durable
•  Distributed
Apache Kafka - Overview
•  Producers write messages to Kafka topics
•  Consumers process messages from a topic
•  Kafka runs on a cluster of servers, where each server is called a broker
Apache Kafka - Topics
•  Topics are split into partitions
•  Partitions are replicated across the cluster
•  Order of messages is guaranteed within a partition
•  Messages are stored for a configurable period of time
•  Producers decide which partition they write to
•  Consumers keep track of the offset of the messages they have read
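The partition and offset ideas on this slide can be mimicked with a small in-memory model. This is only a sketch and not the Kafka client API: the `Topic` and `Consumer` classes, the CRC32-based choice of partition and the sensor keys are assumptions made for illustration.

```python
import zlib
from collections import defaultdict


class Topic:
    """In-memory stand-in for a topic: a fixed set of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # The producer side picks the partition; here via a stable hash of the key.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition


class Consumer:
    """Reads a topic and remembers, per partition, the next offset to read."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self):
        records = []
        for partition, log in enumerate(self.topic.partitions):
            start = self.offsets[partition]
            records.extend(log[start:])
            self.offsets[partition] = len(log)
        return records


topic = Topic(num_partitions=3)
for i in range(4):
    topic.send("sensor-a", i)        # same key -> same partition -> order kept
    topic.send("sensor-b", i * 10)

consumer = Consumer(topic)
first_poll = consumer.poll()
second_poll = consumer.poll()        # nothing new since the stored offsets

sensor_a_values = [value for key, value in first_poll if key == "sensor-a"]
```

Messages stay in the log after they are read; the consumer only moves its own offsets forward, so a second consumer with fresh offsets would see all messages again.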
DEMO
Ingestion → Processing → Output
The Hadoop Ecosystem
•  HDFS (distributed file system), Amazon S3, local FS
•  YARN (resource management)
•  MapReduce
•  HBase (NoSQL)
•  Hive (data mart)
•  Pig (scripting)
•  Sqoop (SQL import/export)
•  Mahout (machine learning)
•  …
The Hadoop Ecosystem
•  HDFS (distributed file system), Amazon S3, local FS
•  YARN (resource management)
•  MapReduce
•  HBase (NoSQL)
•  Hive (data mart)
•  Pig (scripting)
•  Sqoop (SQL import/export)
•  Mahout (machine learning)
•  …
•  Spark, Storm, …
•  Spark SQL
•  Spark MLlib
Apache Storm
Spouts
•  Source of streams into the topology
•  Can be reliable or unreliable
•  Support for:
–  Kafka
–  Kestrel
–  RabbitMQ
–  JMS
–  Amazon Kinesis
–  Build your own (e.g. twitter)
Bolts
•  Where all the processing happens
•  Filtering, functions, aggregations, joins, database updates, …
•  Bolts subscribe to the streams of other components (spouts or other bolts)
•  Must ack every tuple they process
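To make the ack requirement concrete, here is a heavily simplified, single-process sketch of a reliable spout feeding one bolt. It is not the Storm API: `ListSpout`, `WordCountBolt`, the `fail_once` parameter and the direct `ack`/`fail` calls on the spout are assumptions made for illustration (in Storm itself the bolt acks through its output collector).

```python
class ListSpout:
    """Reliable spout stand-in: remembers every emitted tuple until it is acked."""

    def __init__(self, items):
        self.queue = list(enumerate(items))   # (tuple_id, payload)
        self.pending = {}                     # emitted but not yet acked

    def next_tuple(self):
        if not self.queue:
            return None
        tuple_id, payload = self.queue.pop(0)
        self.pending[tuple_id] = payload
        return tuple_id, payload

    def ack(self, tuple_id):
        self.pending.pop(tuple_id, None)

    def fail(self, tuple_id):
        # Put the tuple back in the queue so that it is emitted again.
        self.queue.append((tuple_id, self.pending.pop(tuple_id)))


class WordCountBolt:
    """Bolt stand-in: counts words and acks (or fails) every tuple it receives."""

    def __init__(self, spout, fail_once=()):
        self.spout = spout
        self.counts = {}
        self.fail_once = set(fail_once)       # words whose first attempt fails

    def execute(self, tuple_id, word):
        if word in self.fail_once:
            self.fail_once.discard(word)
            self.spout.fail(tuple_id)         # simulated transient error
            return
        self.counts[word] = self.counts.get(word, 0) + 1
        self.spout.ack(tuple_id)


def run(spout, bolt):
    """Drive the toy topology until the spout has nothing left to emit."""
    while True:
        emitted = spout.next_tuple()
        if emitted is None:
            break
        bolt.execute(*emitted)


spout = ListSpout(["kafka", "storm", "kafka", "spark"])
bolt = WordCountBolt(spout, fail_once=["storm"])
run(spout, bolt)
```

The failed "storm" tuple is replayed and counted on its second attempt, and once every tuple has been acked the spout has nothing left pending.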
Parallelism
•  Spouts & bolts actually run as multiple instances on different machines
•  Making sure that the correct messages go to the correct instance is up to the developer
Stream Groupings
•  Defines how a stream should be partitioned among the bolt's tasks
•  Some examples:
–  Round robin (shuffle grouping)
–  Based on key (fields grouping)
–  All (all grouping)
–  Specific instance (direct grouping)
–  …
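The routing idea behind the first three groupings in the list can be sketched as plain functions that map a tuple to the task index or indices that should receive it. This is not Storm code: the function names, the CRC32 hash and the sample events are assumptions made for illustration.

```python
import itertools
import zlib


def round_robin_grouping(num_tasks):
    """Hand tuples to the tasks in turn, so that the load is spread evenly."""
    counter = itertools.cycle(range(num_tasks))
    return lambda tup: [next(counter)]


def key_grouping(num_tasks, field):
    """Send tuples with the same value of `field` to the same task."""
    def route(tup):
        key = str(tup[field]).encode("utf-8")
        return [zlib.crc32(key) % num_tasks]
    return route


def all_grouping(num_tasks):
    """Replicate every tuple to every task."""
    return lambda tup: list(range(num_tasks))


def dispatch(tuples, grouping, num_tasks):
    """Return, per task index, the list of tuples that task receives."""
    inboxes = [[] for _ in range(num_tasks)]
    for tup in tuples:
        for task in grouping(tup):
            inboxes[task].append(tup)
    return inboxes


# Hypothetical payment events.
events = [{"user": user, "amount": amount}
          for user, amount in [("ann", 5), ("bob", 7), ("ann", 1),
                               ("cy", 2), ("bob", 3), ("ann", 9)]]

by_round_robin = dispatch(events, round_robin_grouping(3), 3)
by_user = dispatch(events, key_grouping(3, "user"), 3)
to_all = dispatch(events, all_grouping(3), 3)
```

Grouping by key is what makes per-key state such as a running total per user possible, because all events for one user arrive at the same task.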
Storm Ups and Downs
•  Really real time
•  Very Powerful
•  Built for performance
•  Very low level (comparable to MapReduce)
•  Trivial tasks can become hard (sorting, joins, …)
Spark Streaming
Spark Architecture
Spark Streaming Concepts
Spark Streaming Input
•  Kafka
•  Flume
•  Kinesis
•  Twitter
•  ZeroMQ
•  HDFS
•  TCP Sockets
Windowing
•  You can group multiple batches together into a sliding window.
•  E.g. all the events from the last 60 seconds
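The sliding window can be pictured without Spark as a fixed-length buffer of recent micro-batches. The sketch below is plain Python, not the Spark Streaming API; the function name, its parameters and the sample batches are assumptions made for illustration.

```python
from collections import deque


def sliding_windows(batches, window_length, slide_interval):
    """Yield the events of the last `window_length` batches, once every
    `slide_interval` batches."""
    window = deque(maxlen=window_length)   # the oldest batch drops out automatically
    for number, batch in enumerate(batches, start=1):
        window.append(batch)
        if number % slide_interval == 0:
            yield [event for b in window for event in b]


# Hypothetical micro-batches, one list per batch interval.
batches = [["a", "b"], ["c"], [], ["d", "e"], ["f"], ["g"]]
windows = list(sliding_windows(batches, window_length=3, slide_interval=2))
```

If each batch covered 20 seconds (an assumed interval), a window of three batches would correspond to the slide's example of all events from the last 60 seconds, recomputed every 40 seconds.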
Spark Streaming Strengths
•  Works much like regular Spark processing: use a StreamingContext instead of a SparkContext
•  Full integration with other Spark libraries (Spark SQL, Spark MLlib, …)
•  Ease of development
•  Scalable, fault-tolerant, …
Spark Streaming Example
Ingestion → Processing → Output
Getting to Your Data
Data output bottlenecks
•  Pig & Hive are quite slow
•  No visual feedback from results
•  Specific calculations (cubing) of metrics
–  Reporting tools cannot handle the dimensions of the data
Elasticsearch
•  Document store (ideal for denormalized data)
•  Distributed
•  Highly Available
•  Open Source
•  Real Time (Inserts & Searches)
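To show what "denormalized" means for a document store, the sketch below joins two normalised record sets into self-contained JSON documents shaped like the `artists` example used in the integration slides. The sample rows, the `artist_id` field and the `denormalize` helper are assumptions made for illustration, and no Elasticsearch client is involved.

```python
import json

# Hypothetical normalised source rows, as they might sit in two relational tables.
artists = [
    {"id": 1, "name": "Example Artist"},
    {"id": 2, "name": "Another Artist"},
]
links = [
    {"artist_id": 1, "url": "http://example.org/1", "picture": "http://example.org/1.png"},
    {"artist_id": 2, "url": "http://example.org/2", "picture": "http://example.org/2.png"},
]


def denormalize(artists, links):
    """Merge each artist with its links row into one nested document."""
    links_by_artist = {row["artist_id"]: row for row in links}
    documents = []
    for artist in artists:
        link = links_by_artist.get(artist["id"], {})
        documents.append({
            "id": artist["id"],
            "name": artist["name"],
            "links": {"url": link.get("url"), "picture": link.get("picture")},
        })
    return documents


documents = denormalize(artists, links)
as_json_lines = [json.dumps(doc, sort_keys=True) for doc in documents]
```

Each resulting document carries everything needed to display or search an artist, so no join is required at query time.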
ES-Hadoop
Hive Integration
•  Writing to Elasticsearch from Hive
CREATE EXTERNAL TABLE artists (
    id      BIGINT,
    name    STRING,
    links   STRUCT<url:STRING, picture:STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists');
-- insert data to Elasticsearch from another table called 'source'
INSERT OVERWRITE TABLE artists
    SELECT NULL, s.name, named_struct('url', s.url, 'picture', s.picture)
    FROM source s;
•  Reading from Elasticsearch in Hive
CREATE EXTERNAL TABLE artists (
    id      BIGINT,
    name    STRING,
    links   STRUCT<url:STRING, picture:STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists',
              'es.query' = '?q=me*');
-- stream data from Elasticsearch
SELECT * FROM artists;
Pig Integration
•  Writing to Elasticsearch from Pig
-- load data from HDFS into Pig using a schema
A = LOAD 'src/test/resources/artists.dat' USING PigStorage()
    AS (id:long, name, url:chararray, picture:chararray);
-- transform data
B = FOREACH A GENERATE name, TOTUPLE(url, picture) AS links;
-- save the result to Elasticsearch
STORE B INTO 'radio/artists' USING org.elasticsearch.hadoop.pig.EsStorage();
•  Reading from Elasticsearch in Pig
-- execute Elasticsearch query and load data into Pig
A = LOAD 'radio/artists'
    USING org.elasticsearch.hadoop.pig.EsStorage('es.query=?q=me*');
DUMP A;
Spark Integration
•  Writing to Elasticsearch from Spark
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
val conf = ...
val sc = new SparkContext(conf)
// Create RDD here
rdd.saveToEs("spark/docs")
•  Reading from Elasticsearch in Spark
...
import org.elasticsearch.spark._
...
val conf = ...
val sc = new SparkContext(conf)
sc.esRDD("radio/artists", "?q=me*")
Storm Integration
•  Writing to Elasticsearch from Storm
import org.elasticsearch.storm.EsBolt;
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 10);
builder.setBolt("es-bolt", new EsBolt("storm/docs"), 5)
       .shuffleGrouping("spout");
•  Reading from Elasticsearch in Storm
import org.elasticsearch.storm.EsSpout;
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("es-spout", new EsSpout("storm/docs", "?q=me*"), 5);
builder.setBolt("bolt", new PrinterBolt()).shuffleGrouping("es-spout");
Visualizing data
Kibana
•  Visualization tool on top of Elasticsearch
•  Allows ad-hoc querying & graphing
•  Support for real-time updates
•  Create your own dashboards
Demo
Wrap Up
Ingestion → Processing → Output
Veldkant 33A, Kontich ● info@infofarm.be ● www.infofarm.be
Veldkant 33A, Kontich ● info@infofarm.be ● www.infofarm.be
Data Science Company
Real Time Big Data
InfoFarm Seminar
18/11/2015

Contenu connexe

Tendances

Big Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al EssaBig Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al EssaSpark Summit
 
Pandas/Data Analysis at Baypiggies
Pandas/Data Analysis at BaypiggiesPandas/Data Analysis at Baypiggies
Pandas/Data Analysis at BaypiggiesAndy Hayden
 
Flink Case Study: Bouygues Telecom
Flink Case Study: Bouygues TelecomFlink Case Study: Bouygues Telecom
Flink Case Study: Bouygues TelecomFlink Forward
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
Extending Spark Graph for the Enterprise with Morpheus and Neo4j
Extending Spark Graph for the Enterprise with Morpheus and Neo4jExtending Spark Graph for the Enterprise with Morpheus and Neo4j
Extending Spark Graph for the Enterprise with Morpheus and Neo4jDatabricks
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Icebergkbajda
 
Presto Summit 2018 - 08 - FINRA
Presto Summit 2018  - 08 - FINRAPresto Summit 2018  - 08 - FINRA
Presto Summit 2018 - 08 - FINRAkbajda
 
H2O Big Join Slides
H2O Big Join SlidesH2O Big Join Slides
H2O Big Join SlidesSri Ambati
 
Real-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache PinotReal-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache PinotXiang Fu
 
Red hat infrastructure for analytics
Red hat infrastructure for analyticsRed hat infrastructure for analytics
Red hat infrastructure for analyticsKyle Bader
 
Drupal and the Semantic Web - ESIP Webinar
Drupal and the Semantic Web - ESIP WebinarDrupal and the Semantic Web - ESIP Webinar
Drupal and the Semantic Web - ESIP Webinarscorlosquet
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Zhenxiao Luo
 
Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Zhenxiao Luo
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dan Lynn
 
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and BeyondGetting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and BeyondDatabricks
 
OU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterOU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterEnrico Daga
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3DataWorks Summit
 
Rdf saturator
Rdf saturatorRdf saturator
Rdf saturatorINRIA-OAK
 
Presto Apache BigData 2017
Presto Apache BigData 2017Presto Apache BigData 2017
Presto Apache BigData 2017Zhenxiao Luo
 

Tendances (20)

Big Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al EssaBig Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al Essa
 
Pandas/Data Analysis at Baypiggies
Pandas/Data Analysis at BaypiggiesPandas/Data Analysis at Baypiggies
Pandas/Data Analysis at Baypiggies
 
Flink Case Study: Bouygues Telecom
Flink Case Study: Bouygues TelecomFlink Case Study: Bouygues Telecom
Flink Case Study: Bouygues Telecom
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Extending Spark Graph for the Enterprise with Morpheus and Neo4j
Extending Spark Graph for the Enterprise with Morpheus and Neo4jExtending Spark Graph for the Enterprise with Morpheus and Neo4j
Extending Spark Graph for the Enterprise with Morpheus and Neo4j
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
Presto Summit 2018 - 08 - FINRA
Presto Summit 2018  - 08 - FINRAPresto Summit 2018  - 08 - FINRA
Presto Summit 2018 - 08 - FINRA
 
H2O Big Join Slides
H2O Big Join SlidesH2O Big Join Slides
H2O Big Join Slides
 
Real-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache PinotReal-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache Pinot
 
Red hat infrastructure for analytics
Red hat infrastructure for analyticsRed hat infrastructure for analytics
Red hat infrastructure for analytics
 
Drupal and the Semantic Web - ESIP Webinar
Drupal and the Semantic Web - ESIP WebinarDrupal and the Semantic Web - ESIP Webinar
Drupal and the Semantic Web - ESIP Webinar
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017
 
ISAX
ISAXISAX
ISAX
 
Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
 
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and BeyondGetting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
 
OU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterOU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data Cluster
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3
 
Rdf saturator
Rdf saturatorRdf saturator
Rdf saturator
 
Presto Apache BigData 2017
Presto Apache BigData 2017Presto Apache BigData 2017
Presto Apache BigData 2017
 

En vedette

Real time data services
Real time data servicesReal time data services
Real time data servicesRelevate
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data EcosystemIvo Vachkov
 
Retail Detail OmniChannel Congress 2015 - Data Science for e-commerce
Retail Detail OmniChannel Congress 2015 - Data Science for e-commerceRetail Detail OmniChannel Congress 2015 - Data Science for e-commerce
Retail Detail OmniChannel Congress 2015 - Data Science for e-commerceInfoFarm
 
Data Driven Decisions seminar
Data Driven Decisions seminarData Driven Decisions seminar
Data Driven Decisions seminarInfoFarm
 
Big Data with Apache Hadoop
Big Data with Apache HadoopBig Data with Apache Hadoop
Big Data with Apache HadoopInfoFarm
 
KNOWLEDGE SCIENCE; NOT INFORMATION SCIENCE OR TECHNOLOGY- SCOPE,THEORIES AND...
KNOWLEDGE SCIENCE; NOT INFORMATION SCIENCE OR TECHNOLOGY-  SCOPE,THEORIES AND...KNOWLEDGE SCIENCE; NOT INFORMATION SCIENCE OR TECHNOLOGY-  SCOPE,THEORIES AND...
KNOWLEDGE SCIENCE; NOT INFORMATION SCIENCE OR TECHNOLOGY- SCOPE,THEORIES AND...Dr. Raju M. Mathew
 
Towards Neuro–Information Science
Towards Neuro–Information ScienceTowards Neuro–Information Science
Towards Neuro–Information Sciencejacekg
 
Big Data and Hadoop - key drivers, ecosystem and use cases
Big Data and Hadoop - key drivers, ecosystem and use casesBig Data and Hadoop - key drivers, ecosystem and use cases
Big Data and Hadoop - key drivers, ecosystem and use casesJeff Kelly
 
Big data + data science startup focus points
Big data + data science startup focus pointsBig data + data science startup focus points
Big data + data science startup focus pointsTom Zorde
 
First impressions of SparkR: our own machine learning algorithm
First impressions of SparkR: our own machine learning algorithmFirst impressions of SparkR: our own machine learning algorithm
First impressions of SparkR: our own machine learning algorithmInfoFarm
 
Sharing & Sustaining Ecosystem Data
Sharing & Sustaining Ecosystem DataSharing & Sustaining Ecosystem Data
Sharing & Sustaining Ecosystem DataTERN Australia
 
Semiotics and Information Science
Semiotics and Information ScienceSemiotics and Information Science
Semiotics and Information ScienceFlorence Paisey
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Perficient, Inc.
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystemSlideCentral
 
Machine learning
Machine learningMachine learning
Machine learningInfoFarm
 
Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...
Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...
Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...Kristin Wolff
 
Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...
Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...
Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...Paolo Nesi
 
Data science fin_tech_2016
Data science fin_tech_2016Data science fin_tech_2016
Data science fin_tech_2016iECARUS
 

En vedette (20)

Real time data services
Real time data servicesReal time data services
Real time data services
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Real-Time Analytics: The Future of Big Data in the Agency
Real-Time Analytics: The Future of Big Data in the AgencyReal-Time Analytics: The Future of Big Data in the Agency
Real-Time Analytics: The Future of Big Data in the Agency
 
Retail Detail OmniChannel Congress 2015 - Data Science for e-commerce
Retail Detail OmniChannel Congress 2015 - Data Science for e-commerceRetail Detail OmniChannel Congress 2015 - Data Science for e-commerce
Retail Detail OmniChannel Congress 2015 - Data Science for e-commerce
 
Data Driven Decisions seminar
Data Driven Decisions seminarData Driven Decisions seminar
Data Driven Decisions seminar
 
Big Data with Apache Hadoop
Big Data with Apache HadoopBig Data with Apache Hadoop
Big Data with Apache Hadoop
 
KNOWLEDGE SCIENCE; NOT INFORMATION SCIENCE OR TECHNOLOGY- SCOPE,THEORIES AND...
KNOWLEDGE SCIENCE; NOT INFORMATION SCIENCE OR TECHNOLOGY-  SCOPE,THEORIES AND...KNOWLEDGE SCIENCE; NOT INFORMATION SCIENCE OR TECHNOLOGY-  SCOPE,THEORIES AND...
KNOWLEDGE SCIENCE; NOT INFORMATION SCIENCE OR TECHNOLOGY- SCOPE,THEORIES AND...
 
Towards Neuro–Information Science
Towards Neuro–Information ScienceTowards Neuro–Information Science
Towards Neuro–Information Science
 
Big Data and Hadoop - key drivers, ecosystem and use cases
Big Data and Hadoop - key drivers, ecosystem and use casesBig Data and Hadoop - key drivers, ecosystem and use cases
Big Data and Hadoop - key drivers, ecosystem and use cases
 
Big data + data science startup focus points
Big data + data science startup focus pointsBig data + data science startup focus points
Big data + data science startup focus points
 
First impressions of SparkR: our own machine learning algorithm
First impressions of SparkR: our own machine learning algorithmFirst impressions of SparkR: our own machine learning algorithm
First impressions of SparkR: our own machine learning algorithm
 
Sharing & Sustaining Ecosystem Data
Sharing & Sustaining Ecosystem DataSharing & Sustaining Ecosystem Data
Sharing & Sustaining Ecosystem Data
 
Semiotics and Information Science
Semiotics and Information ScienceSemiotics and Information Science
Semiotics and Information Science
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
 
Machine learning
Machine learningMachine learning
Machine learning
 
Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...
Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...
Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...
 
Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...
Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...
Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...
 
Big Data + Social Graph
Big Data + Social GraphBig Data + Social Graph
Big Data + Social Graph
 
Data science fin_tech_2016
Data science fin_tech_2016Data science fin_tech_2016
Data science fin_tech_2016
 

Similaire à Real Time Big Data

Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven productsLars Albertsson
 
Scalable crawling with Kafka, scrapy and spark - November 2021
Scalable crawling with Kafka, scrapy and spark - November 2021Scalable crawling with Kafka, scrapy and spark - November 2021
Scalable crawling with Kafka, scrapy and spark - November 2021Max Lapan
 
Experience with Kafka & Storm
Experience with Kafka & StormExperience with Kafka & Storm
Experience with Kafka & StormOtto Mok
 
big data fest building modern data streaming apps
big data fest building modern data streaming appsbig data fest building modern data streaming apps
big data fest building modern data streaming appsTimothy Spann
 
BigDataFest_ Building Modern Data Streaming Apps
BigDataFest_  Building Modern Data Streaming AppsBigDataFest_  Building Modern Data Streaming Apps
BigDataFest_ Building Modern Data Streaming Appsssuser73434e
 
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...confluent
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsVoltDB
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Codemotion
 
Lightning Fast SCADA Development with Open Library for WinCC OA
Lightning Fast SCADA Development with Open Library for WinCC OA Lightning Fast SCADA Development with Open Library for WinCC OA
Lightning Fast SCADA Development with Open Library for WinCC OA DMC, Inc.
 
OSMC 2016 - Monitor your infrastructure with Elastic Beats by Monica Sarbu
OSMC 2016 - Monitor your infrastructure with Elastic Beats by Monica SarbuOSMC 2016 - Monitor your infrastructure with Elastic Beats by Monica Sarbu
OSMC 2016 - Monitor your infrastructure with Elastic Beats by Monica SarbuNETWAYS
 
OSMC 2016 | Monitor your Infrastructure with Elastic Beats by Monica Sarbu
OSMC 2016 | Monitor your Infrastructure with Elastic Beats by Monica SarbuOSMC 2016 | Monitor your Infrastructure with Elastic Beats by Monica Sarbu
OSMC 2016 | Monitor your Infrastructure with Elastic Beats by Monica SarbuNETWAYS
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017Monal Daxini
 
Open Source Automated Documentation in a Development Environment
Open Source Automated Documentation in a Development EnvironmentOpen Source Automated Documentation in a Development Environment
Open Source Automated Documentation in a Development Environmentnealemorison
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
 
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Landon Robinson
 
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data StreamingOracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data StreamingMichael Rainey
 
Workshop: Big Data Visualization for Security
Workshop: Big Data Visualization for SecurityWorkshop: Big Data Visualization for Security
Workshop: Big Data Visualization for SecurityRaffael Marty
 
Spark Summit EU: IBM Keynote
Spark Summit EU: IBM KeynoteSpark Summit EU: IBM Keynote
Spark Summit EU: IBM Keynotesparktc
 
ITCamp 2018 - Damian Widera U-SQL in great depth
ITCamp 2018 - Damian Widera U-SQL in great depthITCamp 2018 - Damian Widera U-SQL in great depth
ITCamp 2018 - Damian Widera U-SQL in great depthITCamp
 




Real Time Big Data

  • 1. Veldkant 33A, Kontich ● info@infofarm.be ● www.infofarm.be Data Science Company Real Time Big Data InfoFarm Seminar 18/11/2015
  • 2. About Me
  • 3. About InfoFarm
  • 4. Agenda
      • Typical Big Data Landscape
      • The need for Real Time Big Data
      • Real Time Data Ingestion
      • Tools for Real Time Big Data
          – Apache Spark
          – Apache Storm
          – Search
      • Q&A
      • Lunch
  • 5. A Typical Big Data Landscape
  • 6. A Typical Big Data Landscape
  • 7. A Typical Big Data Landscape
      • Data Silo
      • Batch environment
      • Periodical Analytics/statistics
      • Data Source for new systems
  • 8. The need for Real Time Big Data
      • Obtaining analytical results faster
          – Processing faster than once a day
      • Load evens out over the day
      • Past/Present/Future
          – Alert for certain events
          – Updating prediction models on the fly
      • Allow faster feedback to end users
          – See results of your actions right away
  • 9. Perfect fits for Real Time Processing
      • Anomaly Detection
          – Abnormal readings of sensors
          – Abnormal amounts of log files
          – Fraud detection
      • Real Time updates to Recommender models
          – Fast new recommendations in e-commerce
          – Support for trending items
          – Fast responses to events happening right now
      • Real Time updates of clustering models
      • Improving Classification based on current events
      • Can be run side-by-side with traditional historical models
  • 10. Ingestion, Processing, Output
  • 11. Ingestion
  • 12. Ingestion
  • 14. Apache Kafka
      • Fast
      • Scalable
      • Durable
      • Distributed
  • 15. Apache Kafka - Overview
      • Producers write messages to Kafka topics
      • Consumers process messages from a topic
      • Kafka runs on a cluster of servers, where each server is called a broker
  • 16. Apache Kafka - Topics
      • Topics are split into partitions
      • Partitions are replicated across the cluster
      • Order of messages is guaranteed within a partition
      • Messages are stored for a period of time
      • Producers decide which partition they write to
      • Consumers keep the offset of the messages they have read
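The points on the Kafka topics slide (partitions, per-partition ordering, producer-chosen partitions, consumer-kept offsets) can be sketched with a small in-memory model. This is plain Python and not the Kafka client API: the `Topic` and `Consumer` classes, the CRC32-based partition choice and the topic name `sensor-readings` are all assumptions made for illustration.

```python
from collections import defaultdict
from zlib import crc32


class Topic:
    """Toy in-memory stand-in for a Kafka topic (hypothetical, not the Kafka API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # Each partition is an append-only list; a message's offset is its index.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # The producer picks the partition; hashing the key is one common choice.
        return crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


class Consumer:
    """Tracks, per partition, the offset of the next message to read."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self, partition, max_messages=10):
        log = self.topic.partitions[partition]
        start = self.offsets[partition]
        batch = log[start:start + max_messages]
        # The consumer, not the broker, remembers how far it has read.
        self.offsets[partition] = start + len(batch)
        return batch


topic = Topic("sensor-readings", num_partitions=3)
for i in range(5):
    topic.send("sensor-a", i)
    topic.send("sensor-b", i * 10)

consumer = Consumer(topic)
p = topic.partition_for("sensor-a")
first_poll = consumer.poll(p, max_messages=100)
values_a = [value for key, value in first_poll if key == "sensor-a"]
second_poll = consumer.poll(p)
```

Because every message with the same key lands in the same partition, `values_a` comes back in the order it was sent, and the second poll returns nothing because the consumer's offset already points past the end of that partition.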
  • 17. Demo
  • 18. Ingestion, Processing, Output
  • 19. The Hadoop Ecosystem: HDFS (Distributed File System), Amazon S3, Local FS; YARN (Resource Management); MapReduce; HBase (NoSQL); Hive (Data Mart); Pig (Scripting); Sqoop (SQL Import/Export); Mahout (Machine Learning); …
  • 20. The Hadoop Ecosystem: HDFS (Distributed File System), Amazon S3, Local FS; YARN (Resource Management); MapReduce; HBase (NoSQL); Hive (Data Mart); Pig (Scripting); Sqoop (SQL Import/Export); Mahout (Machine Learning); …; Spark; Storm; …; Spark SQL; Spark MLlib
  • 22. Apache Storm
  • 23. Spouts
  • 24. Spouts
      • Source of streams into the topology
      • Can be reliable or unreliable
      • Support for:
          – Kafka
          – Kestrel
          – RabbitMQ
          – JMS
          – Amazon Kinesis
          – Build your own (e.g. Twitter)
  • 25. Bolts
  • 26. Bolts
      • Where all the processing happens
      • Filtering, functions, aggregations, joins, database updates, …
      • You subscribe to streams of a different component (other bolts/spouts)
      • Must ack every tuple they process
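Why bolts must ack every tuple becomes clearer with a toy model of a reliable spout, which keeps each emitted tuple until it is acked and re-emits it on failure. The sketch below is plain Python, not the Storm API: `ReliableSpout`, `uppercase_bolt` and the `flaky` parameter are hypothetical names, and in real Storm the ack travels through the framework rather than by a direct call from bolt to spout.

```python
class ReliableSpout:
    """Toy reliable spout: remembers emitted tuples until they are acked."""

    def __init__(self, messages):
        self.queue = list(messages)
        self.pending = {}  # msg_id -> value, emitted but not yet acked
        self.next_id = 0

    def next_tuple(self):
        if not self.queue:
            return None
        msg_id, self.next_id = self.next_id, self.next_id + 1
        self.pending[msg_id] = self.queue.pop(0)
        return msg_id, self.pending[msg_id]

    def ack(self, msg_id):
        # Fully processed: the spout can forget the tuple.
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        # Not processed: put the value back so it is emitted again.
        self.queue.append(self.pending.pop(msg_id))


def uppercase_bolt(spout, flaky=frozenset()):
    """Process every tuple; fail once for values in `flaky`, ack otherwise."""
    failed_once = set()
    results = []
    while (emitted := spout.next_tuple()) is not None:
        msg_id, value = emitted
        if value in flaky and value not in failed_once:
            failed_once.add(value)
            spout.fail(msg_id)
            continue
        results.append(value.upper())
        spout.ack(msg_id)
    return results


spout = ReliableSpout(["a", "b", "c"])
results = uppercase_bolt(spout, flaky={"b"})
```

The failed value is replayed after the others, so nothing is lost but the order changes. A tuple that is never acked or failed would simply stay in `pending`, which is the situation the slide warns about.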
  • 27. Parallelism
      • Spouts & Bolts actually run as multiple instances on different machines
      • Making sure that the correct messages go to the correct instance is up to the developer
  • 28. Stream Groupings
  • 29. Stream Groupings
      • Define how a stream should be partitioned among the bolt's tasks
      • Some examples:
          – Round Robin
          – Based on key
          – All
          – Specific instance
          – …
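Three of the groupings named on the slide (round robin, based on key, all) can be sketched as routing functions that map a tuple to the task or tasks that should receive it. This is an illustrative plain-Python model, not Storm code: the function names, the CRC32 hash and the `word` field are assumptions.

```python
from itertools import cycle
from zlib import crc32


def round_robin_grouping(num_tasks):
    """Round robin: spread tuples evenly, ignoring their content."""
    tasks = cycle(range(num_tasks))
    return lambda tup: [next(tasks)]


def key_grouping(num_tasks, field):
    """Based on key: equal values of `field` always reach the same task."""
    return lambda tup: [crc32(str(tup[field]).encode("utf-8")) % num_tasks]


def all_grouping(num_tasks):
    """All: every task receives a copy of every tuple."""
    return lambda tup: list(range(num_tasks))


def route(tuples, grouping, num_tasks):
    """Deliver each tuple to the task(s) chosen by the grouping."""
    inboxes = [[] for _ in range(num_tasks)]
    for tup in tuples:
        for task in grouping(tup):
            inboxes[task].append(tup)
    return inboxes


words = [{"word": w} for w in ["spark", "storm", "kafka", "storm", "spark", "storm"]]
by_round_robin = route(words, round_robin_grouping(3), 3)
by_key = route(words, key_grouping(3, "word"), 3)
broadcast = route(words, all_grouping(3), 3)
```

Grouping by key is what makes a per-word count correct when a bolt runs as several instances: every occurrence of a given word ends up in the same inbox, whereas round robin balances load but scatters equal keys across tasks.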
  • 30. Storm Ups and Downs
      • Really real time
      • Very Powerful
      • Built for performance
      • Very low level (comparable to MapReduce)
      • Trivial tasks can become hard (sorting, joins, …)
  • 31. Spark Streaming
  • 32. Spark Architecture
  • 33. Spark Streaming Concepts
  • 34. Spark Streaming Input
      • Kafka
      • Flume
      • Kinesis
      • Twitter
      • ZeroMQ
      • HDFS
      • TCP Sockets
  • 35. Windowing
      • You can group multiple batches together into a sliding window
      • E.g. all the events from the last 60 seconds
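The windowing idea, grouping the most recent batches into one sliding window, can be sketched without Spark at all. The `SlidingWindow` class below is a hypothetical plain-Python model, and the 20-second batch interval is an assumption chosen so that three batches cover the 60 seconds mentioned on the slide.

```python
from collections import deque


class SlidingWindow:
    """Keeps the most recent `window_batches` micro-batches."""

    def __init__(self, window_batches):
        # A bounded deque drops the oldest batch when a new one arrives.
        self.batches = deque(maxlen=window_batches)

    def add_batch(self, batch):
        """Add the newest batch and return all events currently in the window."""
        self.batches.append(list(batch))
        return [event for b in self.batches for event in b]


# Assumed setup: 20-second batches, so 3 batches span the last 60 seconds.
window = SlidingWindow(window_batches=3)
w1 = window.add_batch(["a", "b"])
w2 = window.add_batch(["c"])
w3 = window.add_batch(["d", "e"])
w4 = window.add_batch(["f"])
```

Once the window is full, each new batch pushes the oldest one out, so `w4` no longer contains the events of the first batch.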
  • 36. Spark Streaming Strengths
      • Works just like regular Spark processing: just replace SparkContext with StreamingContext
      • Full integration with other Spark libraries (Spark SQL, Spark MLlib, …)
      • Ease of development
      • Scalable, fault-tolerant, …
  • 37. Spark Streaming Example
  • 38. Ingestion, Processing, Output
  • 39. Getting to Your Data
  • 40. Getting to Your Data
  • 41. Data output bottlenecks
      • Pig & Hive are quite slow
      • No visual feedback from results
      • Specific calculations (cubing) of metrics
          – Reporting tools cannot handle the dimensions of the data
  • 43. Elasticsearch
      • Document store (ideal for denormalized data)
      • Distributed
      • Highly Available
      • Open Source
      • Real Time (Inserts & Searches)
  • 44. ES-Hadoop
  • 45. Hive Integration
      • Writing to Elasticsearch from Hive

          CREATE EXTERNAL TABLE artists (
              id      BIGINT,
              name    STRING,
              links   STRUCT<url:STRING, picture:STRING>)
          STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
          TBLPROPERTIES('es.resource' = 'radio/artists');

          -- insert data to Elasticsearch from another table called 'source'
          INSERT OVERWRITE TABLE artists
              SELECT NULL, s.name, named_struct('url', s.url, 'picture', s.picture)
              FROM source s;

  • 46. Hive Integration
      • Reading from Elasticsearch in Hive

          CREATE EXTERNAL TABLE artists (
              id      BIGINT,
              name    STRING,
              links   STRUCT<url:STRING, picture:STRING>)
          STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
          TBLPROPERTIES('es.resource' = 'radio/artists',
                        'es.query' = '?q=me*');

          -- stream data from Elasticsearch
          SELECT * FROM artists;

  • 47. Pig Integration
      • Writing to Elasticsearch from Pig

          -- load data from HDFS into Pig using a schema
          A = LOAD 'src/test/resources/artists.dat' USING PigStorage()
                  AS (id:long, name, url:chararray, picture:chararray);
          -- transform data
          B = FOREACH A GENERATE name, TOTUPLE(url, picture) AS links;
          -- save the result to Elasticsearch
          STORE B INTO 'radio/artists' USING org.elasticsearch.hadoop.pig.EsStorage();

  • 48. Pig Integration
      • Reading from Elasticsearch in Pig

          -- execute Elasticsearch query and load data into Pig
          A = LOAD 'radio/artists'
              USING org.elasticsearch.hadoop.pig.EsStorage('es.query=?me*');
          DUMP A;

  • 49. Spark Integration
      • Writing to Elasticsearch from Spark

          import org.apache.spark.SparkContext
          import org.apache.spark.SparkContext._
          import org.elasticsearch.spark._

          val conf = ...
          val sc = new SparkContext(conf)

          // Create RDD here
          rdd.saveToEs("spark/docs")

  • 50. Spark Integration
      • Reading from Elasticsearch in Spark

          ...
          import org.elasticsearch.spark._
          ...
          val conf = ...
          val sc = new SparkContext(conf)

          sc.esRDD("radio/artists", "?q=me*")

  • 51. Storm Integration
      • Writing to Elasticsearch from Storm

          import org.elasticsearch.storm.EsBolt;

          TopologyBuilder builder = new TopologyBuilder();
          builder.setSpout("spout", new RandomSentenceSpout(), 10);
          builder.setBolt("es-bolt", new EsBolt("storm/docs"), 5)
                 .shuffleGrouping("spout");

  • 52. Storm Integration
      • Reading from Elasticsearch in Storm

          import org.elasticsearch.storm.EsSpout;

          TopologyBuilder builder = new TopologyBuilder();
          builder.setSpout("es-spout", new EsSpout("storm/docs", "?q=me*"), 5);
          builder.setBolt("bolt", new PrinterBolt()).shuffleGrouping("es-spout");

  • 53. Visualizing data
  • 54. Kibana
      • Visualization tool on top of Elasticsearch
      • Allows ad-hoc querying & graphing
      • Support for real time updates
      • Create your own dashboards
  • 55. Demo
  • 56. Wrap Up: Ingestion, Processing, Output
  • 58. Real Time Big Data, InfoFarm Seminar, 18/11/2015