SlideShare une entreprise Scribd logo
1  sur  92
Ricardo Fanjul, letgo
Designing a Horizontally
Scalable Event-Driven Big
Data Architecture with
Apache Spark
#SAISExp2
2 0 1 8 : DATA O DY S S E Y
Ricardo Fanjul
Data Engineer
Founded in 2015
100MM+ downloads
400MM+ listings
#SAISExp2
L E T G O D ATA P L AT F O R M I N N U M B E R S
500GB
Data daily
Events Processed Daily
1 billion
50K
Peaks of events per Second
600+
Event Types
200TB
Storage (S3)
< 1sec
NRT Processing Time
#SAISExp2
T H E DAW N O F L E T G O
#SAISExp2
C L A S S I C A L B I P L AT F O R M
T H E D A W N O F L E T G O
#SAISExp2
C L A S S I C A L B I P L AT F O R M
T H E D A W N O F L E T G O
#SAISExp2
M OV I N G TO - S E RV I C E S A N D
E V E N T S
T H E D A W N O F L E T G O
μ
#SAISExp2
M OV I N G TO - S E RV I C E S A N D
E V E N T S
T H E D A W N O F L E T G O
Domain EventsTracking Events
μ
#SAISExp2
Data Ingestion
Storage
INGEST
Stream
Batch
PROCESS
Query
Data exploitation
Orchestration
DISCOVER
#SAISExp2
I N G E S T
Data Ingestion
Storage
INGEST PROCESS
Stream
Batch
DISCOVER
Query
Data exploitation
Orchestration
Data Ingestion
#SAISExp2
O U R G OA L
DATA I N G E S T I O N
#SAISExp2
T H E D I S C OV E RY
DATA I N G E S T I O N
#SAISExp2
K A F K A C O N N E C T
DATA I N G E S T I O N
Amazon Aurora
#SAISExp2
T H E J O U R N E Y B E G I N S
Data Ingestion
INGEST PROCESS
Stream
Batch
DISCOVER
Query
Data exploitation
Orchestration
Storage
#SAISExp2
B U I L D I N G T H E DATA L A K E
S TO R AG E
#SAISExp2
B U I L D I N G T H E DATA L A K E
S TO R AG E
We want to store all events coming from Kafka to S3.
#SAISExp2
B U I L D I N G T H E DATA L A K E
S TO R AG E
#SAISExp2
S O M E T I M E S … S H I T H A P P E N S
S TO R AG E
#SAISExp2
D U P L I C AT E D E V E N T S
S TO R AG E
#SAISExp2
D U P L I C AT E D E V E N T S
S TO R AG E
#SAISExp2
( V E RY ) L AT E E V E N T S
S TO R AG E
#SAISExp2
( V E RY ) L AT E E V E N T S
S TO R AG E
1
2
3
4
5
6
Dirty Buckets
1 Read batch of events from Kafka.
2 Write each event to Cassandra.
3 Write “dirty hours” to compact topic:
Key=(event_type, hour).
4 Read dirty “hours” topic.
5 Read all events with dirty hours.
6 Store in S3
#SAISExp2
S 3 P RO B L E M S
S TO R AG E
#SAISExp2
S 3 P RO B L E M S
S TO R AG E
1. Eventual consistency
2. Very slow renames
S O M E S 3 B I G DATA P RO B L E M S :
S 3 P RO B L E M S :
E V E N T UA L C O N S I S T E N C Y
S TO R AG E
#SAISExp2
S 3 P RO B L E M S :
E V E N T UA L C O N S I S T E N C Y
S TO R AG E
S3GUARD
S3AFileSystem
The Hadoop FileSystem for Amazon S3
FileSystem
Operations
S3 ClientDynamoDB Client
1
WRITE
fs metadata
READ
fs metadata
WRITE
object data
LIST
object info
LIST
object data
Write Path
Read Path 2 3
1 2
2 1 1 2 3
S 3 P RO B L E M S :
S L OW R E N A M E S
S TO R AG E


¿Job freeze?
#SAISExp2
S 3 P RO B L E M S :
S L OW R E N A M E S
S TO R AG E
New Hadoop 3.1 S3A committers:
• Directory
• Partitioned
• Magic
#SAISExp2
P R O C E S S
INGEST PROCESS
Batch
DISCOVER
Query
Data exploitation
Orchestration
StreamData Ingestion
Storage
#SAISExp2
R E A L T I M E U S E R S E G M E N TAT I O N
S T R E A M
Stream Journal
User bucket’s changed
1 2
#SAISExp2
R E A L T I M E U S E R S E G M E N TAT I O N
S T R E A M
JOURNAL
User bucket’s changed
1 2
STREAM
#SAISExp2
R E A L T I M E PAT T E R N D E T E C T I O N
S T R E A M
Is it still available?
What condition is it in?
Could we meet at……?
Is the price negotiable?
I offer you….$
#SAISExp2
R E A L T I M E PAT T E R N D E T E C T I O N
S T R E A M
{
"type": "meeting_proposal",
"properties": {
"location_name": “Letgo HQ",
"geo": {
"lat": "41.390205",
"lon": "2.154007"
},
"date": "1511193820350",
"meeting_id": "23213213213"
}
}
Structured data
#SAISExp2
R E A L T I M E PAT T E R N D E T E C T I O N
S T R E A M
Meeting proposed + meeting accepted = emit accepted-meeting event
Meeting proposed + nothing in X time = “You have a proposal to meet”
#SAISExp2
Data Ingestion
Storage
INGEST PROCESS DISCOVER
Query
Data exploitation
Orchestration
Stream
Batch
#SAISExp2
G E O DATA E N R I C H M E N T
B AT C H
#SAISExp2
G E O DATA E N R I C H M E N T
B AT C H
{
"data": {
"id": "105dg3272-8e5f-426f-
bca0-704e98552961",
"type": "some_event",
"attributes": {
"latitude": 42.3677203,
"longitude": -83.1186093
}
},
"meta": {
"created_at": 1522886400036
}
}


Technically correct but
not very actionable
#SAISExp2
G E O DATA E N R I C H M E N T
B AT C H
•City: Detroit
•Postal code: 48206
•State: Michigan
•DMA: Detroit
•Country: US
What we know:
(42.3677203, -83.1186093)
#SAISExp2
G E O DATA E N R I C H M E N T
B AT C H
How we do it:
• Populating JTS indices from WKT polygon data
• Custom Spark SQL UDF
SELECT geodata.dma_name,
geodata.dma_number AS dma_number,
geodata.city AS city,
geodata.state AS state ,
geodata.zip_code AS zip_code
FROM (
SELECT
geodata(longitude, latitude) AS geodata
FROM ….
)


#SAISExp2
D I S C O V E R
Data Ingestion
Storage
INGEST PROCESS DISCOVER
Data exploitation
Orchestration
Stream
Batch
Query
#SAISExp2
Stream Journal
User bucket’s changed
1 2
QU E RY I N G DATA
QU E RY
#SAISExp2
Stream Journal
User bucket’s changed
1 2
QU E RY I N G DATA
QU E RY
#SAISExp2
Stream Journal
User bucket’s changed
1 2
QU E RY I N G DATA
QU E RY
Amazon Aurora
QU E RY I N G DATA
QU E RY
METASTORE
SPECTRUM
QU E RY I N G DATA
QU E RY
#SAISExp2
QU E RY I N G DATA
QU E RY
METASTORE
Thrift Server
Amazon Aurora
QU E RY I N G DATA
QU E RY
CREATE TABLE IF NOT EXISTS
database_name.table_name(
some_column STRING,
...
dt DATE
)
USING json
PARTITIONED BY (`dt`)
CREATE TEMPORARY VIEW table_name
USING org.apache.spark.sql.cassandra
OPTIONS (
table "table_name",
keyspace "keyspace_name")
CREATE EXTERNAL TABLE IF NOT EXISTS
database_name.table_name(
some_column STRING...,
dt DATE
)
PARTITIONED BY (`dt`)
USING PARQUET
LOCATION 's3a://bucket-name/database_name/table_name'
CREATE TABLE IF NOT EXISTS database_name.table_name
using com.databricks.spark.redshift
options (
dbtable 'schema.redshift_table_name',
tempdir 's3a://redshift-temp/',
url 'jdbc:redshift://xxxx.redshift.amazonaws.com:5439/letgo?
user=xxx&password=xxx',
forward_spark_s3_credentials 'true')
QU E RY I N G DATA
QU E RY
CREATE TABLE …
STORED AS…
CREATE TABLE …
USING [parquet,json,csv…]
70%
Higher performance!
QU E RY I N G DATA : B AT C H E S W I T H S Q L
QU E RY
Creating the
table
Inserting data
1 2
#SAISExp2
QU E RY I N G DATA : B AT C H E S W I T H S Q L
QU E RY
Creating the
table
1 CREATE EXTERNAL TABLE IF NOT EXISTS database.some_name(
  user_id STRING,
  column_b STRING,
...
)
USING PARQUET
PARTITIONED BY (`dt` STRING)
LOCATION 's3a://example/some_table'
#SAISExp2
QU E RY I N G DATA : B AT C H E S W I T H S Q L
QU E RY
Inserting data
2 INSERT OVERWRITE TABLE database.some_name PARTITION(dt)
SELECT
user_id,
column_b,
dt
FROM other_table
...
#SAISExp2
QU E RY I N G DATA : B AT C H E S W I T H S Q L
QU E RY
Problem?
#SAISExp2
QU E RY I N G DATA : B AT C H E S W I T H S Q L
QU E RY

 200 files because default value of
“spark.sql.shuffle.partition”
QU E RY I N G DATA : B AT C H E S W I T H
S Q L
QU E RY
INSERT OVERWRITE TABLE database.some_name PARTITION(dt)
SELECT
user_id,
column_b,
dt
FROM other_table
...
?
#SAISExp2
QU E RY I N G DATA : B AT C H E S W I T H S Q L
QU E RY
?
DISTRIBUTE BY (dt):
Only one file not Sorted
CLUSTERED BY (dt, user_id, column_b):
Multiple files
DISTRIBUTE BY (dt) SORT BY (user_id, column_b):
Only one file sorted by user_id, column_b.
Good for joins using this properties.
#SAISExp2
QU E RY I N G DATA : B AT C H E S W I T H S Q L
QU E RY
INSERT OVERWRITE TABLE database.some_name
PARTITION(dt)
SELECT
user_id,
column_b,
dt
FROM other_table
...
DISTRIBUTE BY (dt) SORT BY (user_id)
#SAISExp2
QU E RY I N G DATA : B AT C H E S W I T H S Q L
QU E RY
#SAISExp2
Data Ingestion
Storage
INGEST PROCESS DISCOVER
Orchestration
Stream
Batch
Query
Data exploitation
#SAISExp2
O U R A N A LY T I C A L S TAC K
DATA E X P L O I TAT I O N
#SAISExp2
O U R A N A LY T I C A L S TAC K
DATA E X P L O I TAT I O N
Amazon Aurora
METASTORE
Thrift Server
DATA S C I E N T I S T S T E A M ?
DATA E X P L O I TAT I O N
#SAISExp2
DATA S C I E N T I S T S A S I S E E T H E M
DATA E X P L O I TAT I O N
#SAISExp2
DATA S C I E N T I S T S S I N S
DATA E X P L O I TAT I O N
#SAISExp2
DATA S C I E N T I S T S S I N S
DATA E X P L O I TAT I O N
Too many
small files!
#SAISExp2
DATA S C I E N T I S T S S I N S
DATA E X P L O I TAT I O N
Huge Query!
#SAISExp2
DATA S C I E N T I S T S S I N S
DATA E X P L O I TAT I O N
Too much
shuffle!
#SAISExp2
Data Ingestion
Storage
INGEST PROCESS DISCOVER
Stream
Batch
Query
Data exploitation
Orchestration
#SAISExp2
A I R F L OW
O R C H E S T R AT I O N
#SAISExp2
A I R F L OW
O R C H E S T R AT I O N
#SAISExp2
A I R F L OW
O R C H E S T R AT I O N
I’m happy!!
#SAISExp2
M OV I N G TO S TAT E L E S S
C L U S T E R
I ’ M S O R RY R I C A R D O. I ’ M
A F R A I D I C A N ’ T D O T H AT.
P L AT F O R M L I M I TAT I O N S
M OV I N G TO S TAT E L E S S C L U S T E R
#SAISExp2
P L A N N I N G T H E S O L U T I O N
M OV I N G TO S TAT E L E S S C L U S T E R
#SAISExp2
P L A N N I N G T H E S O L U T I O N
M OV I N G TO S TAT E L E S S C L U S T E R
#SAISExp2
P L A N N I N G T H E S O L U T I O N
M OV I N G TO S TAT E L E S S C L U S T E R
#SAISExp2
A L O N G A N D W I N D I N G P RO C E S S
M OV I N G TO S TAT E L E S S C L U S T E R
#SAISExp2
A L O N G A N D W I N D I N G P RO C E S S
M OV I N G TO S TAT E L E S S C L U S T E R
#SAISExp2
N E W C A PA B I L I T I E S O F T H E
P L AT F O R M
M OV I N G TO S TAT E L E S S C L U S T E R
#SAISExp2
N E W C A PA B I L I T I E S O F T H E
P L AT F O R M
M OV I N G TO S TAT E L E S S C L U S T E R
#SAISExp2
J U P I T E R A N D B E YO N D
T H E I N F I N I T E
S O M E T H I N G WO N D E R F U L W I L L
H A P P E N
J U P I T E R A N D B E YO N D T H E I N F I N I T E
#SAISExp2
R E V E A L
J U P I T E R A N D B E YO N D T H E I N F I N I T E
https://youtu.be/CInMDMuSFwc
– A R T H U R C . C L A R K E
“The only way to discover the limits of the possible
is to go beyond them into the impossible.”
#SAISExp2
D O YO U WA N T TO J O I N U S ?
T H E F U T U R E … ?
https://boards.greenhouse.io/letgo

Contenu connexe

Tendances

Let’s spread Phishing and escape the blocklists
Let’s spread Phishing and escape the blocklistsLet’s spread Phishing and escape the blocklists
Let’s spread Phishing and escape the blocklistsAndrea Draghetti
 
Smart Contracts Technical Overview - Meetup Roma - 17/09/19
Smart Contracts Technical Overview - Meetup Roma - 17/09/19Smart Contracts Technical Overview - Meetup Roma - 17/09/19
Smart Contracts Technical Overview - Meetup Roma - 17/09/19Federico Tenga
 
19. stretnutie komunity kubernetes
19. stretnutie komunity kubernetes19. stretnutie komunity kubernetes
19. stretnutie komunity kubernetesJuraj Hantak
 
Logs Are Magic: Why Git Workflows and Commit Structure Should Matter To You
Logs Are Magic: Why Git Workflows and Commit Structure Should Matter To YouLogs Are Magic: Why Git Workflows and Commit Structure Should Matter To You
Logs Are Magic: Why Git Workflows and Commit Structure Should Matter To YouJohn Anderson
 
Uncovering and Visualizing Malicious Infrastructure
Uncovering and Visualizing Malicious InfrastructureUncovering and Visualizing Malicious Infrastructure
Uncovering and Visualizing Malicious InfrastructureAndrea Scarfo
 
On the development and distribution of R packages
On the development and distribution of R packagesOn the development and distribution of R packages
On the development and distribution of R packagesTom Mens
 
April Wensel - Crafting Compassionate Code
April Wensel - Crafting Compassionate CodeApril Wensel - Crafting Compassionate Code
April Wensel - Crafting Compassionate CodeApril Wensel
 
The net is dark and full of terrors - James Bennett
The net is dark and full of terrors - James BennettThe net is dark and full of terrors - James Bennett
The net is dark and full of terrors - James BennettLeo Zhou
 
Decoupled APIs through Microservices
Decoupled APIs through MicroservicesDecoupled APIs through Microservices
Decoupled APIs through MicroservicesDavid Simons
 
Footprinting-and-the-basics-of-hacking
Footprinting-and-the-basics-of-hackingFootprinting-and-the-basics-of-hacking
Footprinting-and-the-basics-of-hackingSathishkumar A
 
Drupal Decoupled on ARTE
Drupal Decoupled on ARTEDrupal Decoupled on ARTE
Drupal Decoupled on ARTEjolidog
 
Bristol Uni - Use Cases of NoSQL
Bristol Uni - Use Cases of NoSQLBristol Uni - Use Cases of NoSQL
Bristol Uni - Use Cases of NoSQLDavid Simons
 
Statistical Programming with JavaScript
Statistical Programming with JavaScriptStatistical Programming with JavaScript
Statistical Programming with JavaScriptDavid Simons
 
Python for Data-driven Storytelling
Python for Data-driven StorytellingPython for Data-driven Storytelling
Python for Data-driven StorytellingHamlet Batista
 
Lyhyt somepala - sosiaalisen median analytiikkaa ja päivitysten tekoa
Lyhyt somepala - sosiaalisen median analytiikkaa ja päivitysten tekoaLyhyt somepala - sosiaalisen median analytiikkaa ja päivitysten tekoa
Lyhyt somepala - sosiaalisen median analytiikkaa ja päivitysten tekoaMaria Rajakallio
 

Tendances (16)

Jenkins 20
Jenkins 20Jenkins 20
Jenkins 20
 
Let’s spread Phishing and escape the blocklists
Let’s spread Phishing and escape the blocklistsLet’s spread Phishing and escape the blocklists
Let’s spread Phishing and escape the blocklists
 
Smart Contracts Technical Overview - Meetup Roma - 17/09/19
Smart Contracts Technical Overview - Meetup Roma - 17/09/19Smart Contracts Technical Overview - Meetup Roma - 17/09/19
Smart Contracts Technical Overview - Meetup Roma - 17/09/19
 
19. stretnutie komunity kubernetes
19. stretnutie komunity kubernetes19. stretnutie komunity kubernetes
19. stretnutie komunity kubernetes
 
Logs Are Magic: Why Git Workflows and Commit Structure Should Matter To You
Logs Are Magic: Why Git Workflows and Commit Structure Should Matter To YouLogs Are Magic: Why Git Workflows and Commit Structure Should Matter To You
Logs Are Magic: Why Git Workflows and Commit Structure Should Matter To You
 
Uncovering and Visualizing Malicious Infrastructure
Uncovering and Visualizing Malicious InfrastructureUncovering and Visualizing Malicious Infrastructure
Uncovering and Visualizing Malicious Infrastructure
 
On the development and distribution of R packages
On the development and distribution of R packagesOn the development and distribution of R packages
On the development and distribution of R packages
 
April Wensel - Crafting Compassionate Code
April Wensel - Crafting Compassionate CodeApril Wensel - Crafting Compassionate Code
April Wensel - Crafting Compassionate Code
 
The net is dark and full of terrors - James Bennett
The net is dark and full of terrors - James BennettThe net is dark and full of terrors - James Bennett
The net is dark and full of terrors - James Bennett
 
Decoupled APIs through Microservices
Decoupled APIs through MicroservicesDecoupled APIs through Microservices
Decoupled APIs through Microservices
 
Footprinting-and-the-basics-of-hacking
Footprinting-and-the-basics-of-hackingFootprinting-and-the-basics-of-hacking
Footprinting-and-the-basics-of-hacking
 
Drupal Decoupled on ARTE
Drupal Decoupled on ARTEDrupal Decoupled on ARTE
Drupal Decoupled on ARTE
 
Bristol Uni - Use Cases of NoSQL
Bristol Uni - Use Cases of NoSQLBristol Uni - Use Cases of NoSQL
Bristol Uni - Use Cases of NoSQL
 
Statistical Programming with JavaScript
Statistical Programming with JavaScriptStatistical Programming with JavaScript
Statistical Programming with JavaScript
 
Python for Data-driven Storytelling
Python for Data-driven StorytellingPython for Data-driven Storytelling
Python for Data-driven Storytelling
 
Lyhyt somepala - sosiaalisen median analytiikkaa ja päivitysten tekoa
Lyhyt somepala - sosiaalisen median analytiikkaa ja päivitysten tekoaLyhyt somepala - sosiaalisen median analytiikkaa ja päivitysten tekoa
Lyhyt somepala - sosiaalisen median analytiikkaa ja päivitysten tekoa
 

Similaire à Designing a Horizontally Scalable Event-Driven Big Data Architecture with Apache Spark

Meteor - not just for rockstars
Meteor - not just for rockstarsMeteor - not just for rockstars
Meteor - not just for rockstarsStephan Hochhaus
 
Spring Roo 2.0 Preview at Spring I/O 2016
Spring Roo 2.0 Preview at Spring I/O 2016 Spring Roo 2.0 Preview at Spring I/O 2016
Spring Roo 2.0 Preview at Spring I/O 2016 DISID
 
Choosing the right database
Choosing the right databaseChoosing the right database
Choosing the right databaseDavid Simons
 
Choosing the Right Database
Choosing the Right DatabaseChoosing the Right Database
Choosing the Right DatabaseDavid Simons
 
Data Modelling at Scale
Data Modelling at ScaleData Modelling at Scale
Data Modelling at ScaleDavid Simons
 
Parcours de professionnalisation
Parcours de professionnalisationParcours de professionnalisation
Parcours de professionnalisationSbastienGuritey
 
Code GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limitersCode GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limitersMarina Kolpakova
 
Writing (Meteor) Code With Style
Writing (Meteor) Code With StyleWriting (Meteor) Code With Style
Writing (Meteor) Code With StyleStephan Hochhaus
 
Innovations and trends in Cloud. Connectfest Porto 2019
Innovations and trends in Cloud. Connectfest Porto 2019Innovations and trends in Cloud. Connectfest Porto 2019
Innovations and trends in Cloud. Connectfest Porto 2019javier ramirez
 
SEO orientado a Ventas - DSMVALENCIA 2017
SEO orientado a Ventas - DSMVALENCIA 2017SEO orientado a Ventas - DSMVALENCIA 2017
SEO orientado a Ventas - DSMVALENCIA 2017Luis M Villanueva
 
Creating Modern Metadata Systems with New Relic, Dow Jones [FutureStack16]
Creating Modern Metadata Systems with New Relic, Dow Jones [FutureStack16]Creating Modern Metadata Systems with New Relic, Dow Jones [FutureStack16]
Creating Modern Metadata Systems with New Relic, Dow Jones [FutureStack16]New Relic
 
Whitepaper - Spinbackup Overview
Whitepaper - Spinbackup OverviewWhitepaper - Spinbackup Overview
Whitepaper - Spinbackup OverviewSpinbackup
 
Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014StampedeCon
 
AI Deeplearning Programming
AI Deeplearning ProgrammingAI Deeplearning Programming
AI Deeplearning ProgrammingPaulSombat
 
Drones - What's next?
Drones - What's next?Drones - What's next?
Drones - What's next?Speck&Tech
 

Similaire à Designing a Horizontally Scalable Event-Driven Big Data Architecture with Apache Spark (20)

Meteor - not just for rockstars
Meteor - not just for rockstarsMeteor - not just for rockstars
Meteor - not just for rockstars
 
Spring Roo 2.0 Preview at Spring I/O 2016
Spring Roo 2.0 Preview at Spring I/O 2016 Spring Roo 2.0 Preview at Spring I/O 2016
Spring Roo 2.0 Preview at Spring I/O 2016
 
Fast api
Fast apiFast api
Fast api
 
Meteor WWNRW Intro
Meteor WWNRW IntroMeteor WWNRW Intro
Meteor WWNRW Intro
 
Choosing the right database
Choosing the right databaseChoosing the right database
Choosing the right database
 
Choosing the Right Database
Choosing the Right DatabaseChoosing the Right Database
Choosing the Right Database
 
airflow_aws_snow.pptx
airflow_aws_snow.pptxairflow_aws_snow.pptx
airflow_aws_snow.pptx
 
Data Modelling at Scale
Data Modelling at ScaleData Modelling at Scale
Data Modelling at Scale
 
Parcours de professionnalisation
Parcours de professionnalisationParcours de professionnalisation
Parcours de professionnalisation
 
eHarmony @ Phoenix Con 2016
eHarmony @ Phoenix Con 2016eHarmony @ Phoenix Con 2016
eHarmony @ Phoenix Con 2016
 
Code GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limitersCode GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limiters
 
Writing (Meteor) Code With Style
Writing (Meteor) Code With StyleWriting (Meteor) Code With Style
Writing (Meteor) Code With Style
 
GraphQL, l'avenir du REST ?
GraphQL, l'avenir du REST ?GraphQL, l'avenir du REST ?
GraphQL, l'avenir du REST ?
 
Innovations and trends in Cloud. Connectfest Porto 2019
Innovations and trends in Cloud. Connectfest Porto 2019Innovations and trends in Cloud. Connectfest Porto 2019
Innovations and trends in Cloud. Connectfest Porto 2019
 
SEO orientado a Ventas - DSMVALENCIA 2017
SEO orientado a Ventas - DSMVALENCIA 2017SEO orientado a Ventas - DSMVALENCIA 2017
SEO orientado a Ventas - DSMVALENCIA 2017
 
Creating Modern Metadata Systems with New Relic, Dow Jones [FutureStack16]
Creating Modern Metadata Systems with New Relic, Dow Jones [FutureStack16]Creating Modern Metadata Systems with New Relic, Dow Jones [FutureStack16]
Creating Modern Metadata Systems with New Relic, Dow Jones [FutureStack16]
 
Whitepaper - Spinbackup Overview
Whitepaper - Spinbackup OverviewWhitepaper - Spinbackup Overview
Whitepaper - Spinbackup Overview
 
Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014
 
AI Deeplearning Programming
AI Deeplearning ProgrammingAI Deeplearning Programming
AI Deeplearning Programming
 
Drones - What's next?
Drones - What's next?Drones - What's next?
Drones - What's next?
 

Dernier

Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 

Dernier (20)

Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 

Designing a Horizontally Scalable Event-Driven Big Data Architecture with Apache Spark

  • 1. Ricardo Fanjul, letgo Designing a Horizontally Scalable Event-Driven Big Data Architecture with Apache Spark #SAISExp2
  • 2. 2 0 1 8 : DATA O DY S S E Y
  • 4. Founded in 2015 100MM+ downloads 400MM+ listings #SAISExp2
  • 5. L E T G O D ATA P L AT F O R M I N N U M B E R S 500GB Data daily Events Processed Daily 1 billion 50K Peaks of events per Second 600+ Event Types 200TB Storage (S3) < 1sec NRT Processing Time #SAISExp2
  • 6. T H E DAW N O F L E T G O
  • 7. #SAISExp2 C L A S S I C A L B I P L AT F O R M T H E D A W N O F L E T G O #SAISExp2
  • 8. C L A S S I C A L B I P L AT F O R M T H E D A W N O F L E T G O #SAISExp2
  • 9. M OV I N G TO - S E RV I C E S A N D E V E N T S T H E D A W N O F L E T G O μ #SAISExp2
  • 10. M OV I N G TO - S E RV I C E S A N D E V E N T S T H E D A W N O F L E T G O Domain EventsTracking Events μ #SAISExp2
  • 12. I N G E S T
  • 13. Data Ingestion Storage INGEST PROCESS Stream Batch DISCOVER Query Data exploitation Orchestration Data Ingestion #SAISExp2
  • 14. O U R G OA L DATA I N G E S T I O N #SAISExp2
  • 15. T H E D I S C OV E RY DATA I N G E S T I O N #SAISExp2
  • 16. K A F K A C O N N E C T DATA I N G E S T I O N Amazon Aurora #SAISExp2
  • 17.
  • 18. T H E J O U R N E Y B E G I N S
  • 19. Data Ingestion INGEST PROCESS Stream Batch DISCOVER Query Data exploitation Orchestration Storage #SAISExp2
  • 20. B U I L D I N G T H E DATA L A K E S TO R AG E #SAISExp2
  • 21. B U I L D I N G T H E DATA L A K E S TO R AG E We want to store all events coming from Kafka to S3. #SAISExp2
  • 22. B U I L D I N G T H E DATA L A K E S TO R AG E #SAISExp2
  • 23. S O M E T I M E S … S H I T H A P P E N S S TO R AG E #SAISExp2
  • 24. D U P L I C AT E D E V E N T S S TO R AG E #SAISExp2
  • 25. D U P L I C AT E D E V E N T S S TO R AG E #SAISExp2
  • 26. ( V E RY ) L AT E E V E N T S S TO R AG E #SAISExp2
  • 27. ( V E RY ) L AT E E V E N T S S TO R AG E 1 2 3 4 5 6 Dirty Buckets 1 Read batch of events from Kafka. 2 Write each event to Cassandra. 3 Write “dirty hours” to compact topic: Key=(event_type, hour). 4 Read dirty “hours” topic. 5 Read all events with dirty hours. 6 Store in S3 #SAISExp2
  • 28. S 3 P RO B L E M S S TO R AG E #SAISExp2
  • 29. S 3 P RO B L E M S S TO R AG E 1. Eventual consistency 2. Very slow renames S O M E S 3 B I G DATA P RO B L E M S :
  • 30. S 3 P RO B L E M S : E V E N T UA L C O N S I S T E N C Y S TO R AG E #SAISExp2
  • 31. S 3 P RO B L E M S : E V E N T UA L C O N S I S T E N C Y S TO R AG E S3GUARD S3AFileSystem The Hadoop FileSystem for Amazon S3 FileSystem Operations S3 ClientDynamoDB Client 1 WRITE fs metadata READ fs metadata WRITE object data LIST object info LIST object data Write Path Read Path 2 3 1 2 2 1 1 2 3
  • 32. S 3 P RO B L E M S : S L OW R E N A M E S S TO R AG E 
 ¿Job freeze? #SAISExp2
  • 33. S 3 P RO B L E M S : S L OW R E N A M E S S TO R AG E New Hadoop 3.1 S3A committers: • Directory • Partitioned • Magic #SAISExp2
  • 34. P R O C E S S
  • 36. R E A L T I M E U S E R S E G M E N TAT I O N S T R E A M Stream Journal User bucket’s changed 1 2 #SAISExp2
  • 37. R E A L T I M E U S E R S E G M E N TAT I O N S T R E A M JOURNAL User bucket’s changed 1 2 STREAM #SAISExp2
  • 38. R E A L T I M E PAT T E R N D E T E C T I O N S T R E A M Is it still available? What condition is it in? Could we meet at……? Is the price negotiable? I offer you….$ #SAISExp2
  • 39. R E A L T I M E PAT T E R N D E T E C T I O N S T R E A M { "type": "meeting_proposal", "properties": { "location_name": “Letgo HQ", "geo": { "lat": "41.390205", "lon": "2.154007" }, "date": "1511193820350", "meeting_id": "23213213213" } } Structured data #SAISExp2
  • 40. R E A L T I M E PAT T E R N D E T E C T I O N S T R E A M Meeting proposed + meeting accepted = emit accepted-meeting event Meeting proposed + nothing in X time = “You have a proposal to meet” #SAISExp2
  • 41. Data Ingestion Storage INGEST PROCESS DISCOVER Query Data exploitation Orchestration Stream Batch #SAISExp2
  • 42. G E O DATA E N R I C H M E N T B AT C H #SAISExp2
  • 43. G E O DATA E N R I C H M E N T B AT C H { "data": { "id": "105dg3272-8e5f-426f- bca0-704e98552961", "type": "some_event", "attributes": { "latitude": 42.3677203, "longitude": -83.1186093 } }, "meta": { "created_at": 1522886400036 } } 
 Technically correct but not very actionable #SAISExp2
  • 44. G E O DATA E N R I C H M E N T B AT C H •City: Detroit •Postal code: 48206 •State: Michigan •DMA: Detroit •Country: US What we know: (42.3677203, -83.1186093) #SAISExp2
  • 45. G E O DATA E N R I C H M E N T B AT C H How we do it: • Populating JTS indices from WKT polygon data • Custom Spark SQL UDF SELECT geodata.dma_name, geodata.dma_number AS dma_number, geodata.city AS city, geodata.state AS state , geodata.zip_code AS zip_code FROM ( SELECT geodata(longitude, latitude) AS geodata FROM …. ) 
 #SAISExp2
  • 46. D I S C O V E R
  • 47. Data Ingestion Storage INGEST PROCESS DISCOVER Data exploitation Orchestration Stream Batch Query #SAISExp2
  • 48. Stream Journal User bucket’s changed 1 2 QU E RY I N G DATA QU E RY #SAISExp2
  • 49. Stream Journal User bucket’s changed 1 2 QU E RY I N G DATA QU E RY #SAISExp2
  • 50. Stream Journal User bucket’s changed 1 2 QU E RY I N G DATA QU E RY Amazon Aurora
  • 51. QU E RY I N G DATA QU E RY METASTORE SPECTRUM
  • 52. QU E RY I N G DATA QU E RY #SAISExp2
  • 53. QU E RY I N G DATA QU E RY METASTORE Thrift Server Amazon Aurora
  • 54. QU E RY I N G DATA QU E RY CREATE TABLE IF NOT EXISTS database_name.table_name( some_column STRING, ... dt DATE ) USING json PARTITIONED BY (`dt`) CREATE TEMPORARY VIEW table_name USING org.apache.spark.sql.cassandra OPTIONS ( table "table_name", keyspace "keyspace_name") CREATE EXTERNAL TABLE IF NOT EXISTS database_name.table_name( some_column STRING..., dt DATE ) PARTITIONED BY (`dt`) USING PARQUET LOCATION 's3a://bucket-name/database_name/table_name' CREATE TABLE IF NOT EXISTS database_name.table_name using com.databricks.spark.redshift options ( dbtable 'schema.redshift_table_name', tempdir 's3a://redshift-temp/', url 'jdbc:redshift://xxxx.redshift.amazonaws.com:5439/letgo? user=xxx&password=xxx', forward_spark_s3_credentials 'true')
  • 55. QU E RY I N G DATA QU E RY CREATE TABLE … STORED AS… CREATE TABLE … USING [parquet,json,csv…] 70% Higher performance!
  • 56. QU E RY I N G DATA : B AT C H E S W I T H S Q L QU E RY Creating the table Inserting data 1 2 #SAISExp2
  • 57. QU E RY I N G DATA : B AT C H E S W I T H S Q L QU E RY Creating the table 1 CREATE EXTERNAL TABLE IF NOT EXISTS database.some_name(   user_id STRING,   column_b STRING, ... ) USING PARQUET PARTITIONED BY (`dt` STRING) LOCATION 's3a://example/some_table' #SAISExp2
  • 58. QU E RY I N G DATA : B AT C H E S W I T H S Q L QU E RY Inserting data 2 INSERT OVERWRITE TABLE database.some_name PARTITION(dt) SELECT user_id, column_b, dt FROM other_table ... #SAISExp2
  • 59. QU E RY I N G DATA : B AT C H E S W I T H S Q L QU E RY Problem? #SAISExp2
  • 60. QU E RY I N G DATA : B AT C H E S W I T H S Q L QU E RY 
 200 files because default value of “spark.sql.shuffle.partition”
  • 61. QU E RY I N G DATA : B AT C H E S W I T H S Q L QU E RY INSERT OVERWRITE TABLE database.some_name PARTITION(dt) SELECT user_id, column_b, dt FROM other_table ... ? #SAISExp2
  • 62. QU E RY I N G DATA : B AT C H E S W I T H S Q L QU E RY ? DISTRIBUTE BY (dt): Only one file not Sorted CLUSTERED BY (dt, user_id, column_b): Multiple files DISTRIBUTE BY (dt) SORT BY (user_id, column_b): Only one file sorted by user_id, column_b. Good for joins using this properties. #SAISExp2
  • 63. QU E RY I N G DATA : B AT C H E S W I T H S Q L QU E RY INSERT OVERWRITE TABLE database.some_name PARTITION(dt) SELECT user_id, column_b, dt FROM other_table ... DISTRIBUTE BY (dt) SORT BY (user_id) #SAISExp2
  • 64. QU E RY I N G DATA : B AT C H E S W I T H S Q L QU E RY #SAISExp2
  • 65. Data Ingestion Storage INGEST PROCESS DISCOVER Orchestration Stream Batch Query Data exploitation #SAISExp2
  • 66. O U R A N A LY T I C A L S TAC K DATA E X P L O I TAT I O N #SAISExp2
  • 67. O U R A N A LY T I C A L S TAC K DATA E X P L O I TAT I O N Amazon Aurora METASTORE Thrift Server
  • 68. DATA S C I E N T I S T S T E A M ? DATA E X P L O I TAT I O N #SAISExp2
  • 69. DATA S C I E N T I S T S A S I S E E T H E M DATA E X P L O I TAT I O N #SAISExp2
  • 70. DATA S C I E N T I S T S S I N S DATA E X P L O I TAT I O N #SAISExp2
  • 71. DATA S C I E N T I S T S S I N S DATA E X P L O I TAT I O N Too many small files! #SAISExp2
  • 72. DATA S C I E N T I S T S S I N S DATA E X P L O I TAT I O N Huge Query! #SAISExp2
  • 73. DATA S C I E N T I S T S S I N S DATA E X P L O I TAT I O N Too much shuffle! #SAISExp2
  • 74. Data Ingestion Storage INGEST PROCESS DISCOVER Stream Batch Query Data exploitation Orchestration #SAISExp2
  • 75. A I R F L OW O R C H E S T R AT I O N #SAISExp2
  • 76. A I R F L OW O R C H E S T R AT I O N #SAISExp2
  • 77. A I R F L OW O R C H E S T R AT I O N I’m happy!! #SAISExp2
  • 78. M OV I N G TO S TAT E L E S S C L U S T E R
  • 79. I ’ M S O R RY R I C A R D O. I ’ M A F R A I D I C A N ’ T D O T H AT.
  • 80. P L AT F O R M L I M I TAT I O N S M OV I N G TO S TAT E L E S S C L U S T E R #SAISExp2
  • 81. P L A N N I N G T H E S O L U T I O N M OV I N G TO S TAT E L E S S C L U S T E R #SAISExp2
  • 82. P L A N N I N G T H E S O L U T I O N M OV I N G TO S TAT E L E S S C L U S T E R #SAISExp2
  • 83. P L A N N I N G T H E S O L U T I O N M OV I N G TO S TAT E L E S S C L U S T E R #SAISExp2
  • 84. A L O N G A N D W I N D I N G P RO C E S S M OV I N G TO S TAT E L E S S C L U S T E R #SAISExp2
  • 85. A L O N G A N D W I N D I N G P RO C E S S M OV I N G TO S TAT E L E S S C L U S T E R #SAISExp2
  • 86. N E W C A PA B I L I T I E S O F T H E P L AT F O R M M OV I N G TO S TAT E L E S S C L U S T E R #SAISExp2
  • 87. N E W C A PA B I L I T I E S O F T H E P L AT F O R M M OV I N G TO S TAT E L E S S C L U S T E R #SAISExp2
  • 88. J U P I T E R A N D B E YO N D T H E I N F I N I T E
  • 89. S O M E T H I N G WO N D E R F U L W I L L H A P P E N J U P I T E R A N D B E YO N D T H E I N F I N I T E #SAISExp2
  • 90. R E V E A L J U P I T E R A N D B E YO N D T H E I N F I N I T E https://youtu.be/CInMDMuSFwc
  • 91. – A R T H U R C . C L A R K E “The only way to discover the limits of the possible is to go beyond them into the impossible.” #SAISExp2
  • 92. D O YO U WA N T TO J O I N U S ? T H E F U T U R E … ? https://boards.greenhouse.io/letgo