2013 © Trivadis
BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

WELCOME
Big Data and Fast Data
big and fast combined – is it possible?
Guido Schmutz and Albert Blarer
24 April 2013
Guido Schmutz
•  Working for Trivadis for more than 16 years
•  Oracle ACE Director for Fusion Middleware and SOA
•  Co-author of several books
•  Consultant, trainer and software architect for Java, Oracle, SOA and EDA
•  Member of the Trivadis Architecture Board
•  Technology Manager @ Trivadis
•  More than 25 years of software development experience
•  Contact: guido.schmutz@trivadis.com
•  Blog: http://guidoschmutz.wordpress.com
•  Twitter: gschmutz
Trivadis – the company
With more than 600 IT and business experts on site with you.
•  11 Trivadis branch offices with more than 600 employees
•  200 service level agreements
•  More than 4,000 training participants
•  Research and development budget: CHF 5.0 million / EUR 4 million
•  Financially independent and sustainably profitable
•  Experience from more than 1,900 projects per year for more than 800 customers
•  As of 12/2012
Locations: Hamburg, Düsseldorf, Frankfurt, Freiburg, München, Wien, Basel, Zürich, Bern, Lausanne, Stuttgart
Credits
Nathan Marz
Author of “Big Data – Principles and best practices of scalable realtime data systems” – Manning Press
Previously worked at BackType and Twitter
Creator of
•  Storm
•  Cascalog
•  ElephantDB
Agenda
1.  Big Data, what is it?
2.  Motivation
3.  The Lambda Architecture
4.  Implementing the Lambda Architecture
5.  Summary
Big Data Definition (Gartner et al)
Velocity
Tera-, Peta-, Exa-, Zetta-, Yottabytes and constantly growing
“Traditional” computing in RDBMS is not scalable enough. We search for “linear scalability”
“Only … structured information is not enough” – “95% of produced data is unstructured”
Characteristics of Big Data: its Volume, Velocity and Variety in combination
+ Veracity (IBM) - information uncertainty
+ Time to action? – Big Data + Event Processing = Fast Data
Big Data Emerging Technologies
§  MapReduce (e.g. Apache Hadoop)
§  Event Stream Processing & CEP (e.g. Storm or Esper)
§  New messaging systems (e.g. Apache Kafka)
§  Integration tools (e.g. Spring or Camus)
§  New database paradigms (e.g. NoSQL or NewSQL)
§  Data mining tools (e.g. Apache Mahout)
§  Data extraction and detection tools (e.g. Apache Tika)
Volume Development
[Figure: global data volume in exabytes and aggregate uncertainty in % by year, 2005 to 2015. Data sources shown: sensors (“internet of things”); social media (video, audio, text); VoIP (Skype, MSN, ICQ, ...); enterprise data (data dictionary, ERD, ...)]
Velocity
§  Velocity requirement examples:
§  Recommendation Engine
§  Predictive Analytics
§  Marketing Campaign Analysis
§  Customer Retention and Churn Analysis
§  Social Graph Analysis
§  Capital Markets Analysis
§  Risk Management
§  Rogue Trading
§  Fraud Detection
§  Retail Banking
§  Network Monitoring
§  Research and Development
2.  Motivation
What is a data system?
•  A system that manages the storage and querying of data with a
lifetime measured in years encompassing every version of the
application to ever exist, every hardware failure and every human
mistake ever made.
•  A data system answers questions based on information that was
acquired in the past
•  Not all bits of information are equal
•  Some information is derived from other information
Desired Properties of a (Big) Data System
Robust and fault-tolerant
Low latency reads and updates
Scalable
General
Extensible
Allows ad hoc queries
Minimal maintenance
Debuggable
Typical problem in today’s architecture/systems
Bugs will be deployed to production over the lifetime of a data system
Operational mistakes will be made
Humans are part of the overall system
•  Just like hard disks, CPUs, memory, software
•  design for human error like you design for any other fault
Examples of human error
•  Deploy a bug that increments counters by two instead of by one
•  Accidentally delete data from database
•  Accidental DOS on important internal service
Worst two consequences: data loss or data corruption
As long as an error doesn’t lose or corrupt good data, you can fix what
went wrong
Lack of Human Fault Tolerance
Mutability
The U and D in CRUD
A mutable system updates the current state of the world
Mutable systems inherently lack human fault-tolerance
Easy to corrupt or lose data
Capturing change traditionally
Lack of Human Fault Tolerance
Name    City
Guido   Berne
Albert  Zurich

Name    City
Guido   Basel
Albert  Zurich
Immutability
An immutable system captures historical records of events
Each event happens at a particular time and is always true
Capturing change by storing events
Lack of Human Fault Tolerance
Name    City    Timestamp
Guido   Berne   1.8.1999
Albert  Zurich  10.5.1988

Name    City    Timestamp
Guido   Berne   1.8.1999
Albert  Zurich  10.5.1988
Guido   Basel   1.4.2013
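The two tables above can be sketched in a few lines of Python. This is only an illustration under assumptions: an in-memory list stands in for the immutable store, and `Move`, `record_move` and `current_city` are hypothetical names, not part of any product mentioned in this deck. Dates are read as day.month.year.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)  # frozen: a recorded event can never be changed
class Move:
    name: str
    city: str
    timestamp: date


def record_move(log: list, name: str, city: str, when: date) -> None:
    """Capture a change by appending a new event; nothing is updated or deleted."""
    log.append(Move(name, city, when))


def current_city(log: list, name: str) -> Optional[str]:
    """Derive the current city as the city of the most recent event for `name`."""
    moves = [m for m in log if m.name == name]
    return max(moves, key=lambda m: m.timestamp).city if moves else None


log = []
record_move(log, "Guido", "Berne", date(1999, 8, 1))
record_move(log, "Albert", "Zurich", date(1988, 5, 10))
record_move(log, "Guido", "Basel", date(2013, 4, 1))

print(current_city(log, "Guido"))  # Basel
print(len(log))                    # 3: the Berne record is still there
```

The move to Basel adds a third record instead of overwriting the first one, so a wrong entry can later be corrected by appending another event rather than by repairing lost state.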
Immutability
Immutability greatly restricts the range of errors that can cause data loss or
data corruption
Vastly more human fault-tolerant
Much easier to reason about systems based on immutability
Conclusion: Your source of truth should always be immutable
Lack of Human Fault Tolerance
What about traditional/today’s architectures? Source of Truth is mutable!
Rather than build systems like this …
[Diagram: clients (Mobile, Web, RIA, Rich Client) call the Application (Query), which works on a Mutable Database (RDBMS, NoSQL, NewSQL); the mutable database is the Source of Truth]
A different kind of architecture with immutable source of truth
… why not build them like this?
[Diagram: Immutable data (HDFS) is the Source of Truth; Views on Data (NoSQL, NewSQL, RDBMS) are derived from it; the Application (Query) reads the views and serves the clients (Mobile, Web, RIA, Rich Client)]
How to create the views on the Immutable data?
On the fly?
Materialized, i.e. pre-computed?
[Diagram: either the Query runs against a View computed on the fly from the Immutable data, or the Query runs against Pre-Computed Views derived from the Immutable data]
Data = the most raw information
Data is information which is not derived from anywhere else
•  The most raw form of information
•  Data is the special information from which everything else is derived
Questions on data can be answered by running functions that take data
as input
The most general purpose data system can answer questions by running
functions that take the entire dataset as input
query = function (all data)
The lambda architecture provides a general purpose approach for
implementing arbitrary functions on arbitrary datasets
Data = the most raw information
Favorite Product List Changes (raw information => data)
1.2.13   Add iPAD 64GB
10.3.13  Add Sony RX-100
11.3.13  Add Canon GX-10
11.3.13  Remove Sony RX-100
12.3.13  Add Nikon S-100
14.4.13  Add BoseQC-15
15.4.13  Add MacBook Pro 15
20.4.13  Remove Canon GX-10
derive
Current Favorite Product List (information => derived)
iPAD 64GB
Nikon S-100
BoseQC-15
MacBook Pro 15
derive
Current Product Count (information => derived): 4
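The "derive" steps of this slide can be sketched as plain functions over the complete change list, in the spirit of query = function (all data). The sketch assumes that "Canon GX10" and "Canon GX-10" name the same product; `CHANGES`, `current_favorites` and `favorite_count` are hypothetical names chosen for illustration.

```python
# (date, action, product) records copied from the slide; dates stay plain
# strings because only the order of the list matters here.
CHANGES = [
    ("1.2.13", "Add", "iPAD 64GB"),
    ("10.3.13", "Add", "Sony RX-100"),
    ("11.3.13", "Add", "Canon GX-10"),
    ("11.3.13", "Remove", "Sony RX-100"),
    ("12.3.13", "Add", "Nikon S-100"),
    ("14.4.13", "Add", "BoseQC-15"),
    ("15.4.13", "Add", "MacBook Pro 15"),
    ("20.4.13", "Remove", "Canon GX-10"),
]


def current_favorites(changes):
    """Replay every change from the beginning to derive the current list."""
    favorites = []
    for _date, action, product in changes:
        if action == "Add" and product not in favorites:
            favorites.append(product)
        elif action == "Remove" and product in favorites:
            favorites.remove(product)
    return favorites


def favorite_count(changes):
    """A second derived value, computed from the same raw data."""
    return len(current_favorites(changes))


print(current_favorites(CHANGES))
print(favorite_count(CHANGES))  # 4
```

Both derived values can be thrown away and recomputed at any time, because the change list itself is never modified.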
Big Data and Batch Processing
[Diagram: Incoming Data -> Immutable data -> Batch View -> Query]
How to compute the batch views?
How to compute queries from the views?
Big Data and Batch Processing
[Figure: timeline up to “now”. Data up to the last full batch period is fully (batch-)processed; data that arrived during the time for the batch job is still non-processed]
§  Using only batch processing always leaves you with a portion of non-processed data.
But we are not done yet …
Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20
Adding Real-Time Processing
[Diagram: Incoming Data is stored as Immutable data, from which the Batch Views are computed, and also flows as a Data Stream into the Realtime Views; the Query reads both kinds of views]
How to compute queries from the views?
How to compute real-time views?
Adding Real-Time Processing
Favorite Product List Changes (immutable data)
1.2.13   Add iPAD 64GB
10.3.13  Add Sony RX-100
11.3.13  Add Canon GX-10
11.3.13  Remove Sony RX-100
12.3.13  Add Nikon S-100
14.4.13  Add BoseQC-15
15.4.13  Add MacBook Pro 15
20.4.13  Remove Canon GX-10
Stream of Favorite Product List Changes (data stream)
Now      Add Canon Scanner
compute
Current Favorite Product List (views)
iPAD 64GB
Nikon S-100
BoseQC-15
MacBook Pro 15
Canon Scanner (added now)
compute
Current Product Count (views): 5
Query: reads the computed views
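The real-time part of this slide can be sketched as an incremental update: the view starts from the list the last batch run derived and applies each new event as it arrives, without replaying the history. `RealtimeFavorites` and `on_event` are hypothetical names used only for this illustration.

```python
class RealtimeFavorites:
    """Keeps the favorite list up to date one event at a time."""

    def __init__(self, batch_favorites):
        # Start from the list that the last batch run derived.
        self.favorites = list(batch_favorites)

    def on_event(self, action, product):
        """Apply a single change without looking at the history again."""
        if action == "Add" and product not in self.favorites:
            self.favorites.append(product)
        elif action == "Remove" and product in self.favorites:
            self.favorites.remove(product)

    @property
    def count(self):
        return len(self.favorites)


view = RealtimeFavorites(["iPAD 64GB", "Nikon S-100", "BoseQC-15", "MacBook Pro 15"])
view.on_event("Add", "Canon Scanner")
print(view.favorites)
print(view.count)  # 5
```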
Big Data and Real Time Processing
[Figure: timeline up to “now”. Batch processing (e.g. Hadoop) worked fine for the fully processed data up to the last full batch period; real time processing works for the data that arrives during the time for the batch job; the end user gets a blended view]
Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20
3.  The Lambda Architecture
Lambda Architecture
[Diagram: Incoming Data flows to the Batch Layer (Immutable data) and to the Speed Layer (Data Stream); the Serving Layer holds the Batch View and the Realtime View, which answer the Query. The labels A to G mark the points explained on the next slide]
Lambda Architecture
A.  All data is sent to both the batch and speed layer
B.  Master data set is an immutable, append-only set of data
C.  Batch layer pre-computes query functions from scratch, result is called Batch
Views. Batch layer constantly re-computes the batch views.
D.  Batch views are indexed and stored in a scalable database to get particular
values very quickly. Swaps in new batch views when they are available
E.  Speed layer compensates for the high latency of updates to the Batch Views in
the Serving layer.
F.  Uses fast incremental algorithms and read/write databases to produce real-
time views
G.  Queries are resolved by getting results from both batch and real-time views
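Points A to G can be sketched end to end with simple per-key counts. This is a deliberately simplified, single-process illustration: the batch run is treated as instantaneous, whereas a real batch run takes hours and the real-time view must keep covering the data that arrives meanwhile. `LambdaSketch` and its methods are hypothetical names.

```python
from collections import Counter


class LambdaSketch:
    def __init__(self):
        self.master = []                # B: immutable, append-only master data set
        self.batch_view = Counter()     # D: indexed result of the last batch run
        self.realtime_view = Counter()  # F: increments since the last batch run

    def ingest(self, key):
        """A: every incoming record goes to both the batch and the speed layer."""
        self.master.append(key)         # the batch layer only appends
        self.realtime_view[key] += 1    # E/F: the speed layer updates incrementally

    def run_batch(self):
        """C: recompute the view from scratch, then swap it in (D)."""
        self.batch_view = Counter(self.master)
        # Everything ingested so far is now covered by the batch view.
        self.realtime_view = Counter()

    def query(self, key):
        """G: merge the results of the batch and the real-time view."""
        return self.batch_view[key] + self.realtime_view[key]


system = LambdaSketch()
for name in ["guido", "albert", "guido"]:
    system.ingest(name)
before_batch = system.query("guido")  # answered by the real-time view alone
system.run_batch()
system.ingest("guido")
after_batch = system.query("guido")   # batch view plus one real-time increment
print(before_batch, after_batch)      # 2 3
```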
Layered Architecture
Batch Layer
•  Stores the immutable, constantly growing dataset
•  Computes arbitrary views from this dataset using Big Data technologies (can take hours)
•  Can always be recreated
Serving Layer
•  Responsible for indexing and exposing the pre-computed batch views so that they can be queried
•  Exposes the incremented real-time views
•  Merges the batch and the real-time views into a consistent result
Speed Layer
•  Computes the views from the constant stream of data it receives
•  Needed to compensate for the high latency of the batch layer
•  Incremental model and views are transient
4.  Implementing the Lambda Architecture
Lambda Architecture
[Diagram: Incoming Data goes to the Batch Layer (All data -> Batch recompute -> Precomputed information, precompute views) and to the Speed Layer (Process stream -> Realtime increment -> Incremented information). The Serving Layer holds the batch views and the real time views, which are merged to answer a query]
Source: Marz, N. & Warren, J. (2013) Big Data. Manning.
Lambda Architecture
one possible product/framework mapping
[Diagram: the Lambda Architecture layers (Batch Layer, Speed Layer, Serving Layer) with one possible product/framework mapping]
Implementing Batch Layer
Immutable Data
•  Append only
•  Normalized
•  Stores master copy of all data
Pre-computed information
•  Function that takes all data as input
query = function(all-data)
•  High Latency, Batch processing
•  Unrestrained computation
•  Horizontally scalable
[Diagram: Batch Layer: immutable data (all data) -> batch recompute -> precomputed information (batch views), handed over to the Serving Layer]
Apache Hadoop HDFS
HDFS = the Hadoop Distributed File System
A distributed file storage system
Redundant storage
Designed to reliably store data using commodity hardware
Designed to expect hardware failures
Intended for large files
Designed for batch inserts
Batch Layer
Apache Hadoop Map Reduce
§  Hadoop MapReduce is an open source implementation of the MapReduce framework.
§  MapReduce is
   §  a programming model, introduced by Google, for processing large data sets in a distributed environment
   §  the de-facto standard to compute huge amounts of data
   §  an execution framework for organizing and performing such computations
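The programming model can be illustrated without a cluster. The following single-process sketch uses word counting as an assumed example (it is not taken from the slides) and does not use the Hadoop API; `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are hypothetical names.

```python
from collections import defaultdict


def map_phase(line):
    """MAP: emit a (key, value) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """REDUCE: combine all values of one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = map_reduce(["big data", "fast data", "big and fast"])
print(counts)
```

In a real framework the map calls run on different worker nodes and the grouping step moves data across the network; the functions themselves stay this simple.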
[Diagram: a master node distributes the problem data to worker nodes 1 to 3 (MAP) and their results are combined into the solution data (REDUCE)]
Batch Layer
Hadoop MapReduce Flow
Source: Bill Graham, Twitter Inc.
Batch Layer
Cascading
Application framework for Java developers to easily develop robust data
analytics and data management applications on Apache Hadoop
Adds an abstraction layer over the Hadoop API
Core concepts of the Cascading API:
•  Pipe: a series of processing steps (parsing, looping, filtering, etc.) defining the
data processing to be done
•  Flow: association of a pipe (or set of pipes) with a data source and a data sink
Batch Layer
Apache Pig
Apache Pig is a platform for analyzing large data sets
Key Properties
•  Ease of programming
•  Optimization opportunities
•  Extensibility
Batch Layer
Implementing Serving Layer for Batch Views
Need a database that
•  Is batch-writable
•  Adding new information is atomic
•  Has fast random reads
•  Is scalable
•  Is highly available
•  Can be optimized for Storage
•  Information can be de-normalized
•  But no Random writes required!
•  Can be a simple database
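These requirements can be sketched with a tiny in-memory store: a whole new view is written in one batch and swapped in atomically, reads are random lookups by key, and there is no operation for random writes. `BatchViewStore`, `swap_in` and `get` are hypothetical names, and a plain dictionary stands in for a scalable database.

```python
class BatchViewStore:
    """Read-only key/value view that is replaced as a whole by each batch run."""

    def __init__(self):
        self._view = {}

    def swap_in(self, new_view):
        # Build a private copy first, then replace the reference in one step,
        # so readers see either the old or the new view, never a mix.
        self._view = dict(new_view)

    def get(self, key, default=None):
        """Fast random read; there is deliberately no method for random writes."""
        return self._view.get(key, default)


store = BatchViewStore()
store.swap_in({"Guido": 4, "Albert": 2})  # result of one batch run
first = store.get("Guido")
store.swap_in({"Guido": 5, "Albert": 2})  # the next batch run replaces everything
second = store.get("Guido")
print(first, second)  # 4 5
```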
[Diagram: Batch Layer (immutable data -> compute -> precomputed information) writes the batch views into the Serving Layer]
SploutSQL
Full SQL => unlike NoSQL
For BigData => unlike RDBMS
Web latency & throughput => unlike Apache Hive, Apache Drill
Why does it scale
•  Data is partitioned
•  Partitions are distributed across nodes
•  Adding more nodes increases capacity
•  Generation does not impact serving
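The scaling argument rests on partitioning. The sketch below shows the general idea with hash partitioning; it is not SploutSQL's actual partitioning scheme or API, the use of `zlib.crc32` as the hash is an assumption, and `partition_for`, `build_partitions` and `lookup` are hypothetical names.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (crc32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def build_partitions(rows, num_partitions):
    """Generation step: split the full data set into independent partitions."""
    partitions = [{} for _ in range(num_partitions)]
    for key, value in rows.items():
        partitions[partition_for(key, num_partitions)][key] = value
    return partitions


def lookup(partitions, key):
    """Serving step: only the one partition that owns the key is consulted."""
    return partitions[partition_for(key, len(partitions))].get(key)


rows = {"Guido": "Basel", "Albert": "Zurich"}
partitions = build_partitions(rows, 3)
print(lookup(partitions, "Guido"))  # Basel
```

Because each partition is self-contained, partitions can be placed on different nodes, and a new set of partitions can be generated offline while the old set keeps serving.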
Serving Layer
Source: Datasalt.
Implementing Speed Layer
Stream Processing
Continuous computation
Transactional
Storing a limited window of data
•  Compensating for the last few hours of data
All the complexity is isolated in the speed layer
•  If anything goes wrong, it’s autocorrected by the next batch run
[Diagram: Speed Layer: Data Stream -> process stream -> realtime increment -> incremented information, derived into the Realtime Views of the Serving Layer]
Apache Kafka
A high throughput distributed messaging system
Originated at LinkedIn
Sequential disk access
Twitter Storm – the “real-time Hadoop”
§  Storm is a distributed and fault-tolerant real-time computing platform
§  Data flow model: data flows through a network of transformation entities
§  Key concepts
§  Tuple: ordered list of elements
§  Streams: unbounded sequence of tuples
§  Spouts: Source of streams
§  Bolts: Process tuples and create new streams
§  Topologies: directed graph of Spouts and Bolts
§  Use Cases
§  Stream Processing
§  Continuous Computation
§  Distributed RPC
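The key concepts can be mimicked with Python generators. This is a conceptual sketch only and does not use the Storm API: a finite list stands in for an unbounded stream, and `sentence_spout`, `split_bolt` and `count_bolt` are hypothetical names.

```python
from collections import Counter


def sentence_spout():
    """Spout: source of a stream; a finite list stands in for an unbounded one."""
    for sentence in ["big data", "fast data"]:
        yield (sentence,)  # a tuple: ordered list of elements


def split_bolt(stream):
    """Bolt: processes tuples and creates a new stream of (word,) tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt: keeps running counts and emits (word, count) after every tuple."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


# Topology: a directed graph spout -> split_bolt -> count_bolt.
topology = count_bolt(split_bolt(sentence_spout()))
results = list(topology)
print(results)
```

Unlike a batch job, the counting bolt emits an updated result after every incoming tuple instead of once at the end.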
[Diagram: data source -> SPOUT -> BOLT (“MAP”) -> BOLT (“REDUCE”) -> BOLT (“PERSIST”), turning problem data into solution data]
Speed Layer
Serving Layer
Twitter Trident
Higher level abstraction over Storm
Trident State
Grouped Stream
Functions, Filters
Aggregators
Query
Similar to Pig and Cascading
Speed Layer
Serving Layer
Implementing Serving Layer for Real-Time Views
Incremental updates are made available as real-time views
Requires a database that supports random reads and random writes
•  Relational, NoSQL or NewSQL (in memory) databases can be used
•  Here we are typically not in the BigData range
Results are only needed until the data has made it through the batch layer
Complexity isolation
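The statement that results are only needed until the data has made it through the batch layer can be sketched as an expiry step. The sketch assumes counts are bucketed by hour so that whole hours can be dropped once a batch run covers them; `RealtimeView`, `increment`, `total` and `expire_up_to` are hypothetical names, and a dictionary stands in for the read/write database.

```python
class RealtimeView:
    """Random-read/random-write store for increments not yet in a batch view."""

    def __init__(self):
        self._counts = {}  # (key, hour) -> count

    def increment(self, key, hour):
        """Random write: bump one counter as an event arrives."""
        self._counts[(key, hour)] = self._counts.get((key, hour), 0) + 1

    def total(self, key):
        """Random read: sum of all increments still held for `key`."""
        return sum(c for (k, _hour), c in self._counts.items() if k == key)

    def expire_up_to(self, batch_covered_hour):
        """Drop results for hours that the batch layer has now absorbed."""
        self._counts = {
            (k, h): c for (k, h), c in self._counts.items() if h > batch_covered_hour
        }


realtime = RealtimeView()
for hour in (9, 10, 11):
    realtime.increment("Guido", hour)
before_expiry = realtime.total("Guido")
realtime.expire_up_to(10)  # the batch views now cover everything up to hour 10
after_expiry = realtime.total("Guido")
print(before_expiry, after_expiry)  # 3 1
```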
[Diagram: Speed Layer: Data Stream -> incremented information, derived into the Realtime Views (real time views) of the Serving Layer]
Cassandra
Fully distributed, no single-point-of-failure
Linearly scalable
Fault tolerant
Performant
Durable
Integrated caching
Tunable consistency
Serving Layer
Implementing Serving Layer
Merge of Batch and Realtime Views
An interesting feature of Storm / Trident is the ability to execute distributed RPC (DRPC) calls in parallel
This can be used to implement the merge functionality when a query is executed
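The merge idea can be sketched with two lookups that run in parallel and whose partial results are combined. This uses Python's standard thread pool as a stand-in and is not the Storm / Trident DRPC API; `BATCH_VIEW`, `REALTIME_VIEW` and the functions are hypothetical, with made-up counts.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_VIEW = {"Guido": 4}     # hypothetical pre-computed counts
REALTIME_VIEW = {"Guido": 1}  # hypothetical increments since the last batch run


def read_batch(key):
    return BATCH_VIEW.get(key, 0)


def read_realtime(key):
    return REALTIME_VIEW.get(key, 0)


def merged_query(key):
    """Ask both views in parallel and merge the two partial results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        batch = pool.submit(read_batch, key)
        realtime = pool.submit(read_realtime, key)
        return batch.result() + realtime.result()


print(merged_query("Guido"))  # 5
```

Addition works as the merge function here because counts are additive; other kinds of views need their own merge logic.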
[Diagram: Serving Layer: a query is answered by a Merge of the Batch Views and the Realtime Views]
Storm / Trident DRPC
Serving Layer
5.  Summary
Summary – The lambda architecture
§  The Lambda Architecture
§  Can discard batch views and real-time views and recreate everything from
scratch
§  Mistakes corrected via re-computation
§  Data storage layer optimized independently from query resolution layer
§  Still in a very early …. But a very interesting idea!
-  Today a zoo of technologies is needed => Operations won’t like it
§  Different query language for batch and real time
§  An abstraction over batch and speed layer needed
-  Cascading and Trident are already similar
§  Industry standards needed!
THANK YOU.
Trivadis AG
Guido Schmutz & Albert Blarer
Europa-Strasse 5
CH-8095 Glattbrugg
info@trivadis.com
www.trivadis.com

Contenu connexe

Tendances

Introduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big DataIntroduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big Datawaheed751
 
Big data trends challenges opportunities
Big data trends challenges opportunitiesBig data trends challenges opportunities
Big data trends challenges opportunitiesMohammed Guller
 
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Chris Dagdigian
 
Fixing data science & Accelerating Artificial Super Intelligence Development
 Fixing data science & Accelerating Artificial Super Intelligence Development Fixing data science & Accelerating Artificial Super Intelligence Development
Fixing data science & Accelerating Artificial Super Intelligence DevelopmentManojKumarR41
 
Introduction to big data and apache spark
Introduction to big data and apache sparkIntroduction to big data and apache spark
Introduction to big data and apache sparkMohammed Guller
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataMohammed Guller
 
Practical Petabyte Pushing
Practical Petabyte PushingPractical Petabyte Pushing
Practical Petabyte PushingChris Dagdigian
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Chris Dagdigian
 
The rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingThe rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingMinhazul Arefin
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopDavid Yahalom
 
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...Kai Wähner
 
2015 CDC Workshop on ScienceDMZ
2015 CDC Workshop on ScienceDMZ2015 CDC Workshop on ScienceDMZ
2015 CDC Workshop on ScienceDMZChris Dagdigian
 
The Evolution of Big Data Frameworks
The Evolution of Big Data FrameworksThe Evolution of Big Data Frameworks
The Evolution of Big Data FrameworkseXascale Infolab
 
Cloud Security for Life Science R&D
Cloud Security for Life Science R&DCloud Security for Life Science R&D
Cloud Security for Life Science R&DChris Dagdigian
 
How to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contendersHow to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contendersAkmal Chaudhri
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
Trends from the Trenches: 2019
Trends from the Trenches: 2019Trends from the Trenches: 2019
Trends from the Trenches: 2019Chris Dagdigian
 
Cloud-Based Big Data Analytics
Cloud-Based Big Data AnalyticsCloud-Based Big Data Analytics
Cloud-Based Big Data AnalyticsSateeshreddy N
 
Top 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionTop 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionDataStax
 

Tendances (20)

Introduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big DataIntroduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big Data
 
Big data trends challenges opportunities
Big data trends challenges opportunitiesBig data trends challenges opportunities
Big data trends challenges opportunities
 
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
 
Fixing data science & Accelerating Artificial Super Intelligence Development
 Fixing data science & Accelerating Artificial Super Intelligence Development Fixing data science & Accelerating Artificial Super Intelligence Development
Fixing data science & Accelerating Artificial Super Intelligence Development
 
Introduction to big data and apache spark
Introduction to big data and apache sparkIntroduction to big data and apache spark
Introduction to big data and apache spark
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Practical Petabyte Pushing
Practical Petabyte PushingPractical Petabyte Pushing
Practical Petabyte Pushing
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)
 
The rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingThe rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computing
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera Hadoop
 
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
 
2015 CDC Workshop on ScienceDMZ
2015 CDC Workshop on ScienceDMZ2015 CDC Workshop on ScienceDMZ
2015 CDC Workshop on ScienceDMZ
 
The Evolution of Big Data Frameworks
The Evolution of Big Data FrameworksThe Evolution of Big Data Frameworks
The Evolution of Big Data Frameworks
 
Data vault what's Next: Part 2
Data vault what's Next: Part 2Data vault what's Next: Part 2
Data vault what's Next: Part 2
 
Cloud Security for Life Science R&D
Cloud Security for Life Science R&DCloud Security for Life Science R&D
Cloud Security for Life Science R&D
 
How to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contendersHow to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contenders
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Trends from the Trenches: 2019
Trends from the Trenches: 2019Trends from the Trenches: 2019
Trends from the Trenches: 2019
 
Cloud-Based Big Data Analytics
Cloud-Based Big Data AnalyticsCloud-Based Big Data Analytics
Cloud-Based Big Data Analytics
 
Top 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionTop 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data Solution
 

En vedette

Fast Data – the New Big Data
Fast Data – the New Big DataFast Data – the New Big Data
Fast Data – the New Big DataVoltDB
 
1%2 b inteligencias%2bm%25c3%25baltiples%2bok
1%2 b inteligencias%2bm%25c3%25baltiples%2bok1%2 b inteligencias%2bm%25c3%25baltiples%2bok
1%2 b inteligencias%2bm%25c3%25baltiples%2boknereidavaleria_1792
 
Archivos Web y Economía Digital. María Fernández Rancaño
Archivos Web y Economía Digital. María Fernández RancañoArchivos Web y Economía Digital. María Fernández Rancaño
Archivos Web y Economía Digital. María Fernández RancañoBiblioteca Nacional de España
 
Beefcious quotes slideshare
Beefcious quotes slideshareBeefcious quotes slideshare
Beefcious quotes slideshareBeefcious
 
Buenas prácticas Dinamización Parques "Yo soy tetuan"
Buenas prácticas Dinamización Parques  "Yo soy tetuan"Buenas prácticas Dinamización Parques  "Yo soy tetuan"
Buenas prácticas Dinamización Parques "Yo soy tetuan"LA RUECA Asociación
 
Detection of brown dwarf like objects in the core of ngc3603
Detection of brown dwarf like objects in the core of ngc3603Detection of brown dwarf like objects in the core of ngc3603
Detection of brown dwarf like objects in the core of ngc3603Sérgio Sacani
 
La pobreza en el distrito federal en el 2004
La pobreza en el distrito federal en el 2004La pobreza en el distrito federal en el 2004
La pobreza en el distrito federal en el 2004Cristobal
 
CONOCIENDO LA CAPITAL DE MI PROVINCIA
CONOCIENDO LA CAPITAL DE MI PROVINCIACONOCIENDO LA CAPITAL DE MI PROVINCIA
CONOCIENDO LA CAPITAL DE MI PROVINCIAguestff5f53
 
Taller de velomancia Sabado 12 de Mayo
Taller de velomancia Sabado 12 de MayoTaller de velomancia Sabado 12 de Mayo
Taller de velomancia Sabado 12 de Mayopablomistico
 
Condiciones Generales ADESLAS COMPLETA
Condiciones Generales ADESLAS COMPLETACondiciones Generales ADESLAS COMPLETA
Condiciones Generales ADESLAS COMPLETAComercial-APPSalud
 
Herausforderung „Multi-Channel“-Architektur
Herausforderung „Multi-Channel“-ArchitekturHerausforderung „Multi-Channel“-Architektur
Herausforderung „Multi-Channel“-ArchitekturOPEN KNOWLEDGE GmbH
 
African Leadership in ICT and Knowledge Societies: Issues, Tensions and Oppor...
African Leadership in ICT and Knowledge Societies: Issues, Tensions and Oppor...African Leadership in ICT and Knowledge Societies: Issues, Tensions and Oppor...
African Leadership in ICT and Knowledge Societies: Issues, Tensions and Oppor...Wesley Schwalje
 
Big Data and Fast Data - big and fast combined, is it possible?

  • 1. 2013 © Trivadis BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
 WELCOME: Big Data and Fast Data, big and fast combined, is it possible? Guido Schmutz and Albert Blarer, 24 April 2013
  • 2. Guido Schmutz
      • Working for Trivadis for more than 16 years
      • Oracle ACE Director for Fusion Middleware and SOA
      • Co-author of several books
      • Consultant, trainer and software architect for Java, Oracle, SOA and EDA
      • Member of the Trivadis Architecture Board
      • Technology Manager @ Trivadis
      • More than 25 years of software development experience
      • Contact: guido.schmutz@trivadis.com
      • Blog: http://guidoschmutz.wordpress.com
      • Twitter: gschmutz
  • 4. Trivadis: the company. More than 600 IT and subject-matter experts on site with you.
      • 11 Trivadis offices with more than 600 employees
      • 200 service level agreements
      • More than 4,000 training participants
      • Research and development budget: CHF 5.0 million / EUR 4 million
      • Financially independent and sustainably profitable
      • Experience from more than 1,900 projects per year with over 800 customers
      • Locations: Hamburg, Düsseldorf, Frankfurt, Freiburg, München, Wien, Basel, Zürich, Bern, Lausanne, Stuttgart
      • As of 12/2012
  • 5. Credits: Nathan Marz
      • Author of „Big Data: Principles and best practices of scalable realtime data systems“ (Manning Publications)
      • Previously worked at BackType and Twitter
      • Creator of Storm, Cascalog and ElephantDB
  • 6. Agenda
      1. Big Data, what is it?
      2. Motivation
      3. The Lambda Architecture
      4. Implementing the Lambda Architecture
      5. Summary
  • 7. Big Data Definition (Gartner et al)
      • Characteristics of Big Data: its Volume, Velocity and Variety in combination
      • Tera-, peta-, exa-, zetta-, yottabytes and constantly growing
      • “Traditional” computing in RDBMS is not scalable enough. We search for “linear scalability”
      • “Only … structured information is not enough”: “95% of produced data is unstructured”
      • + Veracity (IBM): information uncertainty
      • + Time to action? Big Data + Event Processing = Fast Data
  • 8. Big Data Emerging Technologies
      • MapReduce (e.g. Apache Hadoop)
      • Event Stream Processing & CEP (e.g. Storm or Esper)
      • New messaging systems (e.g. Apache Kafka)
      • Integration tools (e.g. Spring or Camus)
      • New database paradigms (e.g. NoSQL or NewSQL)
      • Data mining tools (e.g. Apache Mahout)
      • Data extraction and detection tools (e.g. Apache Tika)
  • 10. Volume Development
      [Chart: global data volume in exabytes and aggregate uncertainty in %, by year, 2005 to 2015]
      • Sensors: “internet of things”
      • Social Media: video, audio, text
      • VoIP: Skype, MSN, ICQ, ...
      • Enterprise Data: data dictionary, ERD, ...
  • 11. Velocity
      Velocity requirement examples:
      • Recommendation Engine
      • Predictive Analytics
      • Marketing Campaign Analysis
      • Customer Retention and Churn Analysis
      • Social Graph Analysis
      • Capital Markets Analysis
      • Risk Management
      • Rogue Trading
      • Fraud Detection
      • Retail Banking
      • Network Monitoring
      • Research and Development
  • 13. What is a data system? A system that manages the storage and querying of data with a lifetime measured in years, encompassing every version of the application to ever exist, every hardware failure and every human mistake ever made. A data system answers questions based on information that was acquired in the past. Not all bits of information are equal: some information is derived from other information.
  • 14. Desired Properties of a (Big) Data System: robust and fault-tolerant; low-latency reads and updates; scalable; general; extensible; allows ad hoc queries; minimal maintenance; debuggable.
  • 15. Typical problem in today's architecture/systems: lack of human fault tolerance. Bugs will be deployed to production over the lifetime of a data system, and operational mistakes will be made. Humans are part of the overall system, just like hard disks, CPUs, memory and software, so design for human error like you design for any other fault. Examples of human error: deploying a bug that increments counters by two instead of by one; accidentally deleting data from the database; an accidental DoS on an important internal service. The two worst consequences are data loss and data corruption. As long as an error doesn't lose or corrupt good data, you can fix what went wrong.
  • 16. Mutability (lack of human fault tolerance): the U and D in CRUD. A mutable system updates the current state of the world. Mutable systems inherently lack human fault tolerance: it is easy to corrupt or lose data. Capturing change traditionally: a table (Name, City) holding (Guido, Berne), (Albert, Zurich) is overwritten in place and then holds (Guido, Basel), (Albert, Zurich).
  • 17. Immutability (lack of human fault tolerance): an immutable system captures historical records of events. Each event happens at a particular time and is always true. Capturing change by storing events: a table (Name, City, Timestamp) holding (Guido, Berne, 1.8.1999), (Albert, Zurich, 10.5.1988) gains the new row (Guido, Basel, 1.4.2013); the existing rows stay untouched.
  • 18. Immutability (lack of human fault tolerance): immutability greatly restricts the range of errors that can cause data loss or data corruption. It is vastly more human fault-tolerant, and systems based on immutability are much easier to reason about. Conclusion: your source of truth should always be immutable.
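The contrast between the two preceding slides can be sketched in a few lines of Python. This is only an illustration using the names and dates from the slides; the function `current_cities` and the variable names are assumptions made for this example, not part of the talk.

```python
from datetime import date

# Mutable approach: the current state is overwritten, the old value is gone.
mutable_table = {"Guido": "Berne", "Albert": "Zurich"}
mutable_table["Guido"] = "Basel"  # "Berne" can no longer be recovered

# Immutable approach: every change is appended as a timestamped fact.
event_log = [
    ("Guido", "Berne", date(1999, 8, 1)),
    ("Albert", "Zurich", date(1988, 5, 10)),
]
event_log.append(("Guido", "Basel", date(2013, 4, 1)))


def current_cities(events, as_of=date.max):
    """Derive the city per person from all events up to `as_of` (hypothetical helper)."""
    view = {}
    for name, city, timestamp in sorted(events, key=lambda e: e[2]):
        if timestamp <= as_of:
            view[name] = city
    return view


print(current_cities(event_log))
print(current_cities(event_log, as_of=date(2012, 12, 31)))
```

Because nothing is overwritten, the state at any earlier point in time can still be derived, which is what makes recovery from a human error possible.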
  • 19. What about traditional/today's architectures? The source of truth is mutable! Rather than build systems like this … [Diagram: applications (mobile, web, RIA, rich client) query a mutable database (RDBMS, NoSQL, NewSQL), which is the source of truth.]
  • 20. A different kind of architecture with an immutable source of truth … why not build them like this? [Diagram: immutable data (HDFS) is the source of truth; views on the data (NoSQL, NewSQL, RDBMS) are queried by applications (mobile, web, RIA, rich client).]
  • 21. How to create the views on the immutable data? On the fly? Or materialized, i.e. pre-computed? [Diagram: a query either goes through a view computed directly on the immutable data, or reads pre-computed views derived from it.]
  • 22. Data = the most raw information. Data is information which is not derived from anywhere else: it is the most raw form of information, the special information from which everything else is derived. Questions on data can be answered by running functions that take data as input. The most general-purpose data system can answer questions by running functions that take the entire dataset as input: query = function(all data). The Lambda Architecture provides a general-purpose approach for implementing arbitrary functions on arbitrary datasets.
  • 23. Data = the most raw information. Raw information (data), the favorite product list changes: 1.2.13 Add iPAD 64GB; 10.3.13 Add Sony RX-100; 11.3.13 Add Canon GX-10; 11.3.13 Remove Sony RX-100; 12.3.13 Add Nikon S-100; 14.4.13 Add BoseQC-15; 15.4.13 Add MacBook Pro 15; 20.4.13 Remove Canon GX-10. Derived information: the current favorite product list (iPAD 64GB, Nikon S-100, BoseQC-15, MacBook Pro 15) and, derived from that, the current product count (4).
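The idea query = function(all data) can be made concrete with the favorite product example. The sketch below is an illustration only: the tuple layout and the function names `current_favorites` and `product_count` are assumptions chosen for this example.

```python
# Raw data: the immutable list of favorite product list changes from the slide.
changes = [
    ("1.2.13", "Add", "iPAD 64GB"),
    ("10.3.13", "Add", "Sony RX-100"),
    ("11.3.13", "Add", "Canon GX-10"),
    ("11.3.13", "Remove", "Sony RX-100"),
    ("12.3.13", "Add", "Nikon S-100"),
    ("14.4.13", "Add", "BoseQC-15"),
    ("15.4.13", "Add", "MacBook Pro 15"),
    ("20.4.13", "Remove", "Canon GX-10"),
]


def current_favorites(all_data):
    """Derived information: replay every change to get the current list."""
    favorites = []
    for _date, action, product in all_data:
        if action == "Add":
            favorites.append(product)
        elif action == "Remove" and product in favorites:
            favorites.remove(product)
    return favorites


def product_count(all_data):
    """Information derived from other derived information."""
    return len(current_favorites(all_data))


print(current_favorites(changes))
print(product_count(changes))
```

Both functions take the entire dataset as input, so a bug in either one can be fixed and the result recomputed without any loss of raw data.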
  • 24. Big Data and Batch Processing. [Diagram: incoming data is added to the immutable data; a batch view is computed from it and answers queries.] Open questions: how to compute the batch views? How to compute queries from the views?
  • 25. Big Data and Batch Processing. Using only batch processing always leaves you with a portion of non-processed data. [Diagram: on a timeline ending at "now", data up to the last full batch period is fully processed (batch-processed data); data that arrived during the time needed for the batch job is non-processed.] Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20 But we are not done yet …
  • 26. Adding Real-Time Processing. [Diagram: incoming data feeds both the immutable data, from which batch views are computed, and a data stream, from which realtime views are computed; queries read the views.] Open questions: how to compute real-time views? How to compute queries from the views?
  • 27. Adding Real-Time Processing. Immutable data, the favorite product list changes: 1.2.13 Add iPAD 64GB; 10.3.13 Add Sony RX-100; 11.3.13 Add Canon GX-10; 11.3.13 Remove Sony RX-100; 12.3.13 Add Nikon S-100; 14.4.13 Add BoseQC-15; 15.4.13 Add MacBook Pro 15; 20.4.13 Remove Canon GX-10. Data stream, the stream of favorite product list changes: Now, Add Canon Scanner. Views: the current favorite product list computed from the immutable data (iPAD 64GB, Nikon S-100, BoseQC-15, MacBook Pro 15) plus Canon Scanner computed from the stream; current product count: 5. Queries read the views.
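One way to picture the real-time part of this slide is a small view that only tracks the changes that arrived after the last batch computation. The following is a rough sketch under that assumption; the class `RealtimeFavorites` and the function `query_favorites` are hypothetical names, and the sketch ignores cases such as adding a product that is already in the batch view.

```python
class RealtimeFavorites:
    """Speed-layer view: only the changes not yet absorbed by a batch run."""

    def __init__(self):
        self.added = []
        self.removed = []

    def on_event(self, action, product):
        """Apply a single change incrementally, without re-reading older data."""
        if action == "Add":
            if product in self.removed:
                self.removed.remove(product)  # cancels an earlier removal
            else:
                self.added.append(product)
        elif action == "Remove":
            if product in self.added:
                self.added.remove(product)  # cancels an earlier addition
            else:
                self.removed.append(product)


def query_favorites(batch_view, realtime):
    """Combine the pre-computed batch view with the recent changes."""
    kept = [p for p in batch_view if p not in realtime.removed]
    return kept + realtime.added


batch_view = ["iPAD 64GB", "Nikon S-100", "BoseQC-15", "MacBook Pro 15"]
realtime = RealtimeFavorites()
realtime.on_event("Add", "Canon Scanner")

favorites = query_favorites(batch_view, realtime)
print(favorites, len(favorites))
```

The batch view stays untouched; the incremental state is small and can be thrown away once a later batch run has processed the same events.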
  • 28. Big Data and Real Time Processing. [Diagram: on a timeline ending at "now", batch processing worked fine (e.g. Hadoop) for the fully processed data up to the last full batch period; real-time processing works for the time taken by the batch job up to now; the end user gets a blended view of both.] Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20
  • 30. Lambda Architecture. [Diagram: incoming data feeds the batch layer (immutable data, batch view) and the speed layer (data stream, realtime view); the serving layer answers queries. The parts of the diagram are labelled A to G.]
  • 31. Lambda Architecture: A. All data is sent to both the batch and the speed layer. B. The master data set is an immutable, append-only set of data. C. The batch layer pre-computes query functions from scratch; the results are called batch views. The batch layer constantly re-computes the batch views. D. Batch views are indexed and stored in a scalable database so that particular values can be retrieved very quickly; new batch views are swapped in when they are available. E. The speed layer compensates for the high latency of updates to the batch views in the serving layer. F. The speed layer uses fast incremental algorithms and read/write databases to produce real-time views. G. Queries are resolved by getting results from both batch and real-time views.
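Steps A to G can be traced in a toy, in-memory model. The sketch below counts page views per URL; the class `LambdaSketch` and its method names are assumptions made for illustration, and it simplifies heavily: the batch run is treated as instantaneous, whereas in a real system events arriving during the batch run would have to stay in the real-time view.

```python
from collections import Counter


class LambdaSketch:
    """Toy in-memory model of the three layers (illustration only)."""

    def __init__(self):
        self.master_dataset = []        # B: immutable, append-only master data
        self.batch_view = Counter()     # C/D: pre-computed and indexed batch view
        self.realtime_view = Counter()  # E/F: incrementally updated view

    def ingest(self, url):
        """A: every event is sent to both the batch and the speed layer."""
        self.master_dataset.append(url)
        self.realtime_view[url] += 1

    def run_batch(self):
        """C/D: recompute the view from scratch, then swap it in."""
        new_view = Counter(self.master_dataset)
        self.batch_view = new_view
        # The real-time results are only needed until the batch has caught up.
        self.realtime_view = Counter()

    def query(self, url):
        """G: resolve the query from both batch and real-time views."""
        return self.batch_view[url] + self.realtime_view[url]


system = LambdaSketch()
for url in ["/home", "/about", "/home"]:
    system.ingest(url)
before_batch = system.query("/home")

system.run_batch()
system.ingest("/home")
after_batch = system.query("/home")
print(before_batch, after_batch)
```

Note that `run_batch` never updates the old view in place: it builds a complete replacement, which is what allows a buggy view to be discarded and recreated.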
  • 32. Layered Architecture. Batch layer: stores the immutable, constantly growing dataset; computes arbitrary views from this dataset using big data technologies (can take hours); can always be recreated. Serving layer: responsible for indexing and exposing the pre-computed batch views so that they can be queried; exposes the incremented real-time views; merges the batch and the real-time views into a consistent result. Speed layer: computes the views from the constant stream of data it receives; needed to compensate for the high latency of the batch layer; the incremental model and views are transient.
  • 34. Lambda Architecture. [Diagram: incoming data goes to the batch layer, which stores all data and pre-computes views by batch recompute (precomputed information), and to the speed layer, which processes the stream by realtime increment (incremented information); the serving layer holds the batch views and real-time views and merges them to answer a query.] Source: Marz, N. & Warren, J. (2013) Big Data. Manning.
  • 35. Lambda Architecture: one possible product/framework mapping. [Diagram: the architecture from the previous slide, with products and frameworks mapped onto the batch, speed and serving layers.]
  • 36. Implementing Batch Layer. Immutable data: append only; normalized; stores the master copy of all data. Pre-computed information: a function that takes all data as input, query = function(all data); high latency, batch processing; unrestrained computation; horizontally scalable. [Diagram: in the batch layer, batch views are computed from the immutable data by batch recompute and passed to the serving layer.]
  • 37. Apache Hadoop HDFS (batch layer). HDFS = the Hadoop Distributed File System: a distributed file storage system with redundant storage; designed to reliably store data using commodity hardware; designed to expect hardware failures; intended for large files; designed for batch inserts.
  • 38. Apache Hadoop MapReduce (batch layer). Hadoop MapReduce is an open source implementation of the MapReduce framework. MapReduce is a programming model, introduced by Google, for processing large data sets in a distributed environment; the de facto standard for computing over huge amounts of data; and an execution framework for organizing and performing such computations. [Diagram: a master node distributes the problem data to worker nodes 1 to 3 (MAP) and combines their results into the solution data (REDUCE).]
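The programming model itself can be shown without a cluster. The following single-process word count is a sketch of the map, shuffle and reduce steps only; it does not use the Hadoop API, and all function names are assumptions chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """MAP: emit a (key, value) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """REDUCE: combine all values of one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = map_reduce(["big data fast data", "fast data"])
print(counts)
```

In a real deployment the map and reduce calls run on different worker nodes and the shuffle moves data over the network; the functions themselves stay this simple, which is the point of the model.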
  • 39. Hadoop MapReduce Flow (batch layer). Source: Bill Graham, Twitter Inc.
  • 40. Hadoop MapReduce (batch layer).
  • 41. Cascading (batch layer): an application framework for Java developers to simply develop robust data analytics and data management applications on Apache Hadoop. It adds an abstraction layer over the Hadoop API. Core concepts of the Cascading API: Pipe, a series of processing steps (parsing, looping, filtering, etc.) defining the data processing to be done; Flow, the association of a pipe (or set of pipes) with a data source and a data sink.
  • 42. Cascading
  • 43. Apache Pig (batch layer): a platform for analyzing large data sets. Key properties: ease of programming; optimization opportunities; extensibility.
  • 44. Implementing Serving Layer for Batch Views. Need a database that is batch-writable; adds new information atomically; has fast random reads; is scalable; is highly available; can be optimized for storage; allows information to be de-normalized. No random writes are required, so it can be a simple database. [Diagram: the batch views computed in the batch layer from the immutable data are exposed in the serving layer.]
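The requirements "batch-writable", "adding new information is atomic" and "no random writes" can be illustrated with a tiny in-memory store that only ever replaces its whole content. This is a sketch under those assumptions, not a description of any particular product; the class name `BatchViewStore` and its methods are hypothetical.

```python
from types import MappingProxyType


class BatchViewStore:
    """Read-only view store: content changes only by swapping in a full new view."""

    def __init__(self):
        self._view = MappingProxyType({})

    def swap(self, new_view):
        """Batch write: publish a complete new view in one reference assignment."""
        # Copy first so later changes to the caller's dict cannot leak in.
        self._view = MappingProxyType(dict(new_view))

    def get(self, key, default=None):
        """Fast random read against whichever view is currently published."""
        return self._view.get(key, default)


store = BatchViewStore()
store.swap({"/home": 2, "/about": 1})
first = store.get("/home")

# The next batch run produces a complete replacement view.
store.swap({"/home": 3, "/about": 1, "/contact": 1})
second = store.get("/home")
print(first, second)
```

Readers either see the old view or the new one, never a half-written mix, and because there is no update method the store stays simple.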
  • 45. SploutSQL (serving layer). Full SQL, unlike NoSQL. For big data, unlike an RDBMS. Web latency and throughput, unlike Apache Hive or Apache Drill. Why does it scale? Data is partitioned; partitions are distributed across nodes; adding more nodes increases capacity; generation does not impact serving. Source: Datasalt.
  • 46. SploutSQL (serving layer).
  • 47. Implementing Speed Layer. Stream processing; continuous computation; transactional; stores a limited window of data, compensating for the last few hours of data. All the complexity is isolated in the speed layer: if anything goes wrong, it is auto-corrected by the next batch run. [Diagram: the speed layer processes the data stream by realtime increment (incremented information) and derives the realtime views, which are exposed in the serving layer.]
  • 48. Apache Kafka: a high-throughput distributed messaging system. Originated at LinkedIn. Sequential disk access.
  • 49. Twitter Storm, the "real-time Hadoop" (speed layer, serving layer). Storm is a distributed and fault-tolerant real-time computing platform with a data flow model: data flows through a network of transformation entities. Key concepts: Tuple, an ordered list of elements; Stream, an unbounded sequence of tuples; Spout, a source of streams; Bolt, processes tuples and creates new streams; Topology, a directed graph of spouts and bolts. Use cases: stream processing; continuous computation; distributed RPC. [Diagram: a data source feeds problem data into a spout; bolts perform "map", "reduce" and "persist" steps and produce the solution data.]
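The key concepts can be mimicked with plain Python generators to show how they fit together. This is a conceptual toy, not the Storm API: the names `sentence_spout`, `split_bolt`, `count_bolt` and `run_topology` are made up for this sketch, and the stream here is finite, whereas a real stream is unbounded.

```python
def sentence_spout():
    """Spout: the source of a stream of tuples (finite here for illustration)."""
    for sentence in ["big data", "fast data"]:
        yield (sentence,)


def split_bolt(stream):
    """Bolt ("map"): consumes tuples and emits a new stream of word tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt ("reduce"): keeps running counts and emits (word, count) tuples."""
    counts = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])


def run_topology():
    """Topology: spout and bolts wired into a graph; the last step persists."""
    store = {}
    for word, count in count_bolt(split_bolt(sentence_spout())):
        store[word] = count  # "persist" step writing the real-time view
    return store


realtime_counts = run_topology()
print(realtime_counts)
```

Each stage handles one tuple at a time, so results are updated as soon as an event arrives rather than after a full batch run.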
  • 50. Twitter Storm (speed layer, serving layer).
  • 51. Twitter Trident (speed layer, serving layer): a higher-level abstraction over Storm. Concepts: Trident state; grouped streams; functions and filters; aggregators; queries. Similar to Pig and Cascading.
  • 52. Twitter Trident (speed layer, serving layer).
  • 53. Implementing Serving Layer for Real-Time Views. Incremental updates are made available as real-time views. This requires a database that supports random reads and random writes: relational, NoSQL or NewSQL (in-memory) databases can be used, and here we are typically not in the big data range. Results are only needed until the data has made it through the batch layer. Complexity isolation. [Diagram: the speed layer derives realtime views from the data stream (incremented information); the serving layer exposes them.]
  • 54. Cassandra (serving layer): fully distributed, no single point of failure; linearly scalable; fault tolerant; performant; durable; integrated caching; tunable consistency.
  • 55. Implementing Serving Layer: Merge of Batch and Realtime Views. An interesting feature of Storm / Trident is the ability to execute distributed RPC (DRPC) calls in parallel. This can be used to implement the merge functionality when a query is executed. [Diagram: a query against the serving layer merges the batch views and the real-time views.]
  • 56. Storm / Trident DRPC (serving layer).
  • 58. Summary: the Lambda Architecture. Batch views and real-time views can be discarded and everything recreated from scratch. Mistakes are corrected via re-computation. The data storage layer is optimized independently from the query resolution layer. Still in a very early … but a very interesting idea! Today a zoo of technologies is needed, so operations won't like it. Different query languages for batch and real time. An abstraction over the batch and speed layers is needed; Cascading and Trident are already similar. Industry standards needed!
  • 59. THANK YOU. Trivadis AG, Guido Schmutz & Albert Blarer, Europa-Strasse 5, CH-8095 Glattbrugg, info@trivadis.com, www.trivadis.com