SlideShare une entreprise Scribd logo
1  sur  30
© 2019 Ververica
Fabian Hueske – Software Engineer
Apache Flink® SQL in Action
© 2019 Ververica2
About Me
• Apache Flink PMC member & ASF member
‒Contributing since day 1 at TU Berlin
‒Focusing on Flink’s relational APIs since ~3.5 years
• Co-author of “Stream Processing with Apache Flink”
• Co-founder of Ververica (formerly data Artisans)
© 2019 Ververica3
What is Apache Flink?
Stateful computations over streams
real-time and historic
fast, scalable, fault tolerant, in-memory
event time, large state, exactly-once
© 2019 Ververica4
Hardened at Scale
Streaming Platform Service
billions messages per day
A lot of Stream SQL
Streaming Platform as a Service
3700+ container running Flink,
1400+ nodes, 22k+ cores, 100s of jobs,
3 trillion events / day, 20 TB state
Fraud detection
Streaming Analytics Platform
1000s jobs, 100.000s cores,
10 TBs state, metrics, analytics,
real time ML,
Streaming SQL as a platform
© 2019 Ververica5
Powered by Apache Flink
© 2019 Ververica6
Flink’s Powerful Abstractions
Process Function (events, state, time)
DataStream API (streams, windows)
SQL / Table API (dynamic tables)
Stream- & Batch
Data Processing
High-level
Analytics API
Stateful Event-
Driven Applications
val stats = stream
.keyBy("sensor")
.timeWindow(Time.seconds(5))
.sum((a, b) -> a.add(b))
def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {
// work with event and state
(event, state.value) match { … }
out.collect(…) // emit events
state.update(…) // modify state
// schedule a timer callback
ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
}
Layered abstractions to
navigate simple to complex use cases
© 2019 Ververica7
Flink’s Relational APIs
Unified APIs for batch & streaming data
A query specifies exactly the same result
regardless whether its input is
static batch data or streaming data.
tableEnvironment
.scan("clicks")
.groupBy('user)
.select('user, 'url.count as 'cnt)
SELECT user, COUNT(url) AS cnt
FROM clicks
GROUP BY user
LINQ-style Table APIANSI
SQL
© 2019 Ververica8
Query Translation
tableEnvironment
.scan("clicks")
.groupBy('user)
.select('user, 'url.count)
SELECT user, COUNT(url)
FROM clicks
GROUP BY user
Input data is
bounded
(batch)
Input data is
unbounded
(streaming)
© 2019 Ververica9
What if “Clicks” is a File?
Clicks
user cTime url
Mary 12:00:00 https://…
Bob 12:00:00 https://…
Mary 12:00:02 https://…
Liz 12:00:03 https://…
user cnt
Mary 2
Bob 1
Liz 1
SELECT
user,
COUNT(url) as cnt
FROM clicks
GROUP BY user
Input data is
read at once
Result is produced
at once
© 2019 Ververica10
What if “Clicks” is a Stream?
user cTime url
user cnt
SELECT
user,
COUNT(url) as cnt
FROM clicks
GROUP BY user
Clicks
Mary 12:00:00 https://…
Bob 12:00:00 https://…
Mary 12:00:02 https://…
Liz 12:00:03 https://…
Bob 1
Liz 1
Mary 1Mary 2
Input data is
continuously read
Result is
continuously updated
The result is the same!
© 2019 Ververica11
Why is Stream-Batch Unification Important?
• Usability
‒ ANSI SQL syntax: No custom “StreamSQL” syntax.
‒ ANSI SQL semantics: No stream-specific result semantics.
• Portability
‒ Run the same query on bounded & unbounded data
‒ Run the same query on recorded & real-time data
‒ Bootstrapping query state or backfilling results from historic data
now
bounded query
unbounded query
past future
bounded query
start of the stream
unbounded query
© 2019 Ververica12
Database Systems Run Queries on Streams
• Materialized views (MV) are similar to regular views,
but persisted to disk or memory
‒Used to speed-up analytical queries
‒MVs need to be updated when the base tables change
• MV maintenance is very similar to SQL on streams
‒Base table updates are a stream of DML statements
‒MV definition query is evaluated on that stream
‒MV is query result and continuously updated
© 2019 Ververica13
Continuous Queries in Flink
• Core concept is a “Dynamic Table”
‒ Dynamic tables are changing over time
• Queries on dynamic tables
‒ produce new dynamic tables (which are updated based on input)
‒ do not terminate
• Stream ↔ Dynamic table conversions
© 2019 Ververica14
Stream ↔ Dynamic Table Conversions
•A stream is the changelog of a dynamic table
–As change messages are ingested from a stream, a table evolves
–As a table evolves, change messages are emitted to a stream
•Different changelog interpretations
–Append-only change messages
–Upsert change messages
–Add/Retract change messages
© 2019 Ververica15
How Can I Use It?
• Embed SQL queries in regular Flink applications
‒ Tight integration with DataStream and DataSet APIs
‒ Mix and match with other libraries (CEP, ProcessFunction, Gelly)
‒ Package and operate queries like any other Flink application
• Run SQL queries via Flink’s SQL CLI Client
‒ Interactive mode: Submit query and inspect results
‒ Detached mode: Submit query and write results to sink system
© 2019 Ververica16
Show Results
Enter Query
SELECT
user,
COUNT(url)
FROM clicks
GROUP BY user
Database /
HDFS
Event Log
Query
State
ResultsChangelog
or Table SQL CLI Client
Catalog
Optimizer
CLI
Result Server
Submit Query Job
SQL CLI Client – Interactive Queries
© 2019 Ververica17
The New York Taxi Rides Data Set
• A public data set about taxi rides in New York City
• Rides are ingested as append-only (streaming) table
‒ Each ride is represented by a start and an end event
• Table: Rides
rideId: BIGINT // ID of the taxi ride
taxiId: BIGINT // ID of the taxi
isStart: BOOLEAN // flag for pick-up (true) or drop-off (false) event
lon: DOUBLE // longitude of pick-up or drop-off location
lat: DOUBLE // latitude of pick-up or drop-off location
rideTime: TIMESTAMP // time of pick-up or drop-off event
psgCnt: INT // number of passengers
© 2019 Ververica18
Compute Basic Statistics
SELECT
psgCnt,
COUNT(*) as cnt
FROM Rides
WHERE isStart
GROUP BY
psgCnt
▪ Count rides per number of passengers.
© 2019 Ververica19
Identify Popular Pick-Up / Drop-Off Locations
SELECT
area,
isStart,
TUMBLE_END(rideTime, INTERVAL '5' MINUTE) AS cntEnd,
COUNT(*) AS cnt
FROM (SELECT rideTime, isStart, toAreaId(lon, lat) AS area
FROM Rides)
GROUP BY
area,
isStart,
TUMBLE(rideTime, INTERVAL '5' MINUTE)
▪ Compute every 5 minutes for each area the
number of departing and arriving taxis.
© 2019 Ververica20
Average Tip Per Hour of Day
SELECT
HOUR(r.rideTime) AS hourOfDay,
AVG(f.tip) AS avgTip
FROM
Rides r,
Fares f
WHERE
NOT r.isStart AND
r.rideId = f.rideId AND
f.payTime BETWEEN r.rideTime - INTERVAL '5' MINUTE AND r.rideTime
GROUP BY
HOUR(r.rideTime);
▪ Compute the average tip per hour of day. Fare data is
stored in a separate table Fares that needs to be joined.
© 2019 Ververica21
SQL Feature Set in Flink 1.8.0
STREAMING ONLY
• OVER / WINDOW
‒ UNBOUNDED / BOUNDED
PRECEDING
• INNER JOIN with time-versioned
table
• MATCH_RECOGNIZE
‒ Pattern Matching/CEP (SQL:2016)
BATCH ONLY
• UNION / INTERSECT / EXCEPT
• ORDER BY
STREAMING & BATCH
• SELECT FROM WHERE
• GROUP BY [HAVING]
‒ Non-windowed
‒ TUMBLE, HOP, SESSION windows
• JOIN
‒ Time-Windowed INNER + OUTER
JOIN
‒ Non-windowed INNER + OUTER JOIN
• User-Defined Functions
‒ Scalar
‒ Aggregation
‒ Table-valued
© 2019 Ververica22
What Can I Build With That?
• Data Pipelines & Low-latency ETL
‒ Transform, aggregate, and move events in real-time
‒ Write streams to file systems, DBMS, K-V stores, …
‒ Ingest appearing files to produce streams
• Stream & Batch Analytics
‒ Run analytical queries over bounded and unbounded data
‒ Query and compare historic and real-time data
• Power Live Dashboards
‒ Compute and update data to visualize in real-time
© 2019 Ververica23
SQL CLI Client – Detached Queries
SQL CLI Client
Catalog
Optimizer
Database /
HDFS
Event Log
Query
Submit Query Job
State
CLI
Result Server
Database /
HDFS
Event Log
Enter Query
INSERT INTO
sinkTable
SELECT
user,
COUNT(url) AS cnt
FROM clicks
GROUP BY user
Cluster & Job ID
© 2019 Ververica24
Serving a Dashboard
Elastic
Search
Kafka
INSERT INTO AreaCnts
SELECT
toAreaId(lon, lat) AS areaId,
COUNT(*) AS cnt
FROM TaxiRides
WHERE isStart
GROUP BY toAreaId(lon, lat)
© 2019 Ververica25
Try Flink SQL Yourself!
•The demo setup is available as a free online training
–Slides to learn the basics
–Exercises on streaming data
https://github.com/ververica/sql-training
© 2019 Ververica26
There’s a Lot More To Come!
• Alibaba is contributing features of its Flink fork Blink back
• Major improvements for SQL and Table API
–Better coverage of SQL features: Full TPC-DS support
–Competitive performance: 10x compared to current state
–Improved connectivity: External catalogs (Hive) and connectors
• Extending the scope of Table API
–Expand support for user-defined functions
–Add support for machine-learning learning pipelines & algorithm library
© 2019 Ververica27
Summary
• Unification of stream and batch is important.
• Flink’s SQL solves many streaming and batch use cases.
• In production at Alibaba, Huawei, Lyft, Uber, and others.
• Query deployment as application or via CLI
• Expect major improvements for batch SQL soon!
© 2019 Ververica
© 2019 Ververica
@VervericaDatawww.ververica.com
© 2019 Ververica30
Average Ride Duration Per Pick-Up Location
SELECT pickUpArea,
AVG(timeDiff(s.rowTime, e.rowTime) / 60000) AS avgDuration
FROM (SELECT rideId, rowTime, toAreaId(lon, lat) AS pickUpArea
FROM TaxiRides
WHERE isStart) s
JOIN
(SELECT rideId, rowTime
FROM TaxiRides
WHERE NOT isStart) e
ON s.rideId = e.rideId AND
e.rowTime BETWEEN s.rowTime AND s.rowTime + INTERVAL '1' HOUR
GROUP BY pickUpArea
▪ Join start ride and end ride events on rideId and
compute average ride duration per pick-up location.

Contenu connexe

Tendances

Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud ServicesBuild a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Servicesconfluent
 
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...HostedbyConfluent
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyKairo Tavares
 
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...confluent
 
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...HostedbyConfluent
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...confluent
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®confluent
 
Why and how to leverage the power and simplicity of SQL on Apache Flink
Why and how to leverage the power and simplicity of SQL on Apache FlinkWhy and how to leverage the power and simplicity of SQL on Apache Flink
Why and how to leverage the power and simplicity of SQL on Apache FlinkFabian Hueske
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Kai Wähner
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Guido Schmutz
 
MongoDB .local London 2019: Streaming Data on the Shoulders of Giants
MongoDB .local London 2019: Streaming Data on the Shoulders of GiantsMongoDB .local London 2019: Streaming Data on the Shoulders of Giants
MongoDB .local London 2019: Streaming Data on the Shoulders of GiantsLisa Roth, PMP
 
From data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyondFrom data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyondVasia Kalavri
 
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...Flink Forward
 
Partner Development Guide for Kafka Connect
Partner Development Guide for Kafka ConnectPartner Development Guide for Kafka Connect
Partner Development Guide for Kafka Connectconfluent
 
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...Flink Forward
 
Stream Analytics with SQL on Apache Flink
 Stream Analytics with SQL on Apache Flink Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache FlinkFabian Hueske
 
Concepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaConcepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaQAware GmbH
 
Data Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache KafkaData Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache Kafkaconfluent
 
Kakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming appKakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming appNeil Avery
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Kai Wähner
 

Tendances (20)

Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud ServicesBuild a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
 
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
 
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
 
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®
 
Why and how to leverage the power and simplicity of SQL on Apache Flink
Why and how to leverage the power and simplicity of SQL on Apache FlinkWhy and how to leverage the power and simplicity of SQL on Apache Flink
Why and how to leverage the power and simplicity of SQL on Apache Flink
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
MongoDB .local London 2019: Streaming Data on the Shoulders of Giants
MongoDB .local London 2019: Streaming Data on the Shoulders of GiantsMongoDB .local London 2019: Streaming Data on the Shoulders of Giants
MongoDB .local London 2019: Streaming Data on the Shoulders of Giants
 
From data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyondFrom data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyond
 
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
 
Partner Development Guide for Kafka Connect
Partner Development Guide for Kafka ConnectPartner Development Guide for Kafka Connect
Partner Development Guide for Kafka Connect
 
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
 
Stream Analytics with SQL on Apache Flink
 Stream Analytics with SQL on Apache Flink Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache Flink
 
Concepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaConcepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with Kafka
 
Data Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache KafkaData Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache Kafka
 
Kakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming appKakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming app
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
 

Similaire à Flink SQL in Action

Webinar: Flink SQL in Action - Fabian Hueske
 Webinar: Flink SQL in Action - Fabian Hueske Webinar: Flink SQL in Action - Fabian Hueske
Webinar: Flink SQL in Action - Fabian HueskeVerverica
 
Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkDataWorks Summit
 
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"Flink Forward
 
Unified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
Unified Data Processing with Apache Flink and Apache Pulsar_Seth WiesmanUnified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
Unified Data Processing with Apache Flink and Apache Pulsar_Seth WiesmanStreamNative
 
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...Flink Forward
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Flink Forward
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Ver...
Towards Flink 2.0:  Unified Batch & Stream Processing - Aljoscha Krettek, Ver...Towards Flink 2.0:  Unified Batch & Stream Processing - Aljoscha Krettek, Ver...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Ver...Flink Forward
 
Flink Forward San Francisco 2019: Towards Flink 2.0: Rethinking the stack and...
Flink Forward San Francisco 2019: Towards Flink 2.0: Rethinking the stack and...Flink Forward San Francisco 2019: Towards Flink 2.0: Rethinking the stack and...
Flink Forward San Francisco 2019: Towards Flink 2.0: Rethinking the stack and...Flink Forward
 
ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!Timo Walther
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward
 
Meteor Workshop - Open Sanca
Meteor Workshop - Open SancaMeteor Workshop - Open Sanca
Meteor Workshop - Open SancaPaulo Hecht
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
What's new for Apache Flink's Table & SQL APIs?
What's new for Apache Flink's Table & SQL APIs?What's new for Apache Flink's Table & SQL APIs?
What's new for Apache Flink's Table & SQL APIs?Timo Walther
 
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021StreamNative
 
Service Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices ArchitectureService Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices ArchitecturePLUMgrid
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
MuleSoft Online Meetup - MuleSoft integration with snowflake and kafka
MuleSoft Online Meetup - MuleSoft integration with snowflake and kafkaMuleSoft Online Meetup - MuleSoft integration with snowflake and kafka
MuleSoft Online Meetup - MuleSoft integration with snowflake and kafkaRoyston Lobo
 
Pivotal microservices spring_pcf_skillsmatter.pptx
Pivotal microservices spring_pcf_skillsmatter.pptxPivotal microservices spring_pcf_skillsmatter.pptx
Pivotal microservices spring_pcf_skillsmatter.pptxSufyaan Kazi
 
PLNOG15: The Power of the Open Standards SDN API’s - Mikael Holmberg
PLNOG15: The Power of the Open Standards SDN API’s - Mikael Holmberg PLNOG15: The Power of the Open Standards SDN API’s - Mikael Holmberg
PLNOG15: The Power of the Open Standards SDN API’s - Mikael Holmberg PROIDEA
 
Spring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - BostonSpring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - BostonVMware Tanzu
 

Similaire à Flink SQL in Action (20)

Webinar: Flink SQL in Action - Fabian Hueske
 Webinar: Flink SQL in Action - Fabian Hueske Webinar: Flink SQL in Action - Fabian Hueske
Webinar: Flink SQL in Action - Fabian Hueske
 
Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on Flink
 
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
 
Unified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
Unified Data Processing with Apache Flink and Apache Pulsar_Seth WiesmanUnified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
Unified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
 
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Ver...
Towards Flink 2.0:  Unified Batch & Stream Processing - Aljoscha Krettek, Ver...Towards Flink 2.0:  Unified Batch & Stream Processing - Aljoscha Krettek, Ver...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Ver...
 
Flink Forward San Francisco 2019: Towards Flink 2.0: Rethinking the stack and...
Flink Forward San Francisco 2019: Towards Flink 2.0: Rethinking the stack and...Flink Forward San Francisco 2019: Towards Flink 2.0: Rethinking the stack and...
Flink Forward San Francisco 2019: Towards Flink 2.0: Rethinking the stack and...
 
ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
Meteor Workshop - Open Sanca
Meteor Workshop - Open SancaMeteor Workshop - Open Sanca
Meteor Workshop - Open Sanca
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
What's new for Apache Flink's Table & SQL APIs?
What's new for Apache Flink's Table & SQL APIs?What's new for Apache Flink's Table & SQL APIs?
What's new for Apache Flink's Table & SQL APIs?
 
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
 
Service Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices ArchitectureService Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices Architecture
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
MuleSoft Online Meetup - MuleSoft integration with snowflake and kafka
MuleSoft Online Meetup - MuleSoft integration with snowflake and kafkaMuleSoft Online Meetup - MuleSoft integration with snowflake and kafka
MuleSoft Online Meetup - MuleSoft integration with snowflake and kafka
 
Pivotal microservices spring_pcf_skillsmatter.pptx
Pivotal microservices spring_pcf_skillsmatter.pptxPivotal microservices spring_pcf_skillsmatter.pptx
Pivotal microservices spring_pcf_skillsmatter.pptx
 
PLNOG15: The Power of the Open Standards SDN API’s - Mikael Holmberg
PLNOG15: The Power of the Open Standards SDN API’s - Mikael Holmberg PLNOG15: The Power of the Open Standards SDN API’s - Mikael Holmberg
PLNOG15: The Power of the Open Standards SDN API’s - Mikael Holmberg
 
Spring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - BostonSpring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - Boston
 

Plus de Fabian Hueske

Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache FlinkStream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache FlinkFabian Hueske
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Fabian Hueske
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkFabian Hueske
 
Juggling with Bits and Bytes - How Apache Flink operates on binary data
Juggling with Bits and Bytes - How Apache Flink operates on binary dataJuggling with Bits and Bytes - How Apache Flink operates on binary data
Juggling with Bits and Bytes - How Apache Flink operates on binary dataFabian Hueske
 
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data ProcessingApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data ProcessingFabian Hueske
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityFabian Hueske
 
Apache Flink - A Sneek Preview on Language Integrated Queries
Apache Flink - A Sneek Preview on Language Integrated QueriesApache Flink - A Sneek Preview on Language Integrated Queries
Apache Flink - A Sneek Preview on Language Integrated QueriesFabian Hueske
 
Apache Flink - Akka for the Win!
Apache Flink - Akka for the Win!Apache Flink - Akka for the Win!
Apache Flink - Akka for the Win!Fabian Hueske
 
Apache Flink - Community Update January 2015
Apache Flink - Community Update January 2015Apache Flink - Community Update January 2015
Apache Flink - Community Update January 2015Fabian Hueske
 

Plus de Fabian Hueske (9)

Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache FlinkStream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache Flink
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Juggling with Bits and Bytes - How Apache Flink operates on binary data
Juggling with Bits and Bytes - How Apache Flink operates on binary dataJuggling with Bits and Bytes - How Apache Flink operates on binary data
Juggling with Bits and Bytes - How Apache Flink operates on binary data
 
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data ProcessingApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce Compatibility
 
Apache Flink - A Sneek Preview on Language Integrated Queries
Apache Flink - A Sneek Preview on Language Integrated QueriesApache Flink - A Sneek Preview on Language Integrated Queries
Apache Flink - A Sneek Preview on Language Integrated Queries
 
Apache Flink - Akka for the Win!
Apache Flink - Akka for the Win!Apache Flink - Akka for the Win!
Apache Flink - Akka for the Win!
 
Apache Flink - Community Update January 2015
Apache Flink - Community Update January 2015Apache Flink - Community Update January 2015
Apache Flink - Community Update January 2015
 

Dernier

Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfsimulationsindia
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 

Dernier (20)

Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 

Flink SQL in Action

  • 1. © 2019 Ververica Fabian Hueske – Software Engineer Apache Flink® SQL in Action
  • 2. © 2019 Ververica2 About Me • Apache Flink PMC member & ASF member ‒Contributing since day 1 at TU Berlin ‒Focusing on Flink’s relational APIs since ~3.5 years • Co-author of “Stream Processing with Apache Flink” • Co-founder of Ververica (formerly data Artisans)
  • 3. © 2019 Ververica3 What is Apache Flink? Stateful computations over streams real-time and historic fast, scalable, fault tolerant, in-memory event time, large state, exactly-once
  • 4. © 2019 Ververica4 Hardened at Scale Streaming Platform Service billions messages per day A lot of Stream SQL Streaming Platform as a Service 3700+ container running Flink, 1400+ nodes, 22k+ cores, 100s of jobs, 3 trillion events / day, 20 TB state Fraud detection Streaming Analytics Platform 1000s jobs, 100.000s cores, 10 TBs state, metrics, analytics, real time ML, Streaming SQL as a platform
  • 5. © 2019 Ververica5 Powered by Apache Flink
  • 6. © 2019 Ververica6 Flink’s Powerful Abstractions Process Function (events, state, time) DataStream API (streams, windows) SQL / Table API (dynamic tables) Stream- & Batch Data Processing High-level Analytics API Stateful Event- Driven Applications val stats = stream .keyBy("sensor") .timeWindow(Time.seconds(5)) .sum((a, b) -> a.add(b)) def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = { // work with event and state (event, state.value) match { … } out.collect(…) // emit events state.update(…) // modify state // schedule a timer callback ctx.timerService.registerEventTimeTimer(event.timestamp + 500) } Layered abstractions to navigate simple to complex use cases
  • 7. © 2019 Ververica7 Flink’s Relational APIs Unified APIs for batch & streaming data A query specifies exactly the same result regardless whether its input is static batch data or streaming data. tableEnvironment .scan("clicks") .groupBy('user) .select('user, 'url.count as 'cnt) SELECT user, COUNT(url) AS cnt FROM clicks GROUP BY user LINQ-style Table APIANSI SQL
  • 8. © 2019 Ververica8 Query Translation tableEnvironment .scan("clicks") .groupBy('user) .select('user, 'url.count) SELECT user, COUNT(url) FROM clicks GROUP BY user Input data is bounded (batch) Input data is unbounded (streaming)
  • 9. © 2019 Ververica9 What if “Clicks” is a File? Clicks user cTime url Mary 12:00:00 https://… Bob 12:00:00 https://… Mary 12:00:02 https://… Liz 12:00:03 https://… user cnt Mary 2 Bob 1 Liz 1 SELECT user, COUNT(url) as cnt FROM clicks GROUP BY user Input data is read at once Result is produced at once
  • 10. © 2019 Ververica10 What if “Clicks” is a Stream? user cTime url user cnt SELECT user, COUNT(url) as cnt FROM clicks GROUP BY user Clicks Mary 12:00:00 https://… Bob 12:00:00 https://… Mary 12:00:02 https://… Liz 12:00:03 https://… Bob 1 Liz 1 Mary 1Mary 2 Input data is continuously read Result is continuously updated The result is the same!
  • 11. © 2019 Ververica11 Why is Stream-Batch Unification Important? • Usability ‒ ANSI SQL syntax: No custom “StreamSQL” syntax. ‒ ANSI SQL semantics: No stream-specific result semantics. • Portability ‒ Run the same query on bounded & unbounded data ‒ Run the same query on recorded & real-time data ‒ Bootstrapping query state or backfilling results from historic data now bounded query unbounded query past future bounded query start of the stream unbounded query
  • 12. © 2019 Ververica12 Database Systems Run Queries on Streams • Materialized views (MV) are similar to regular views, but persisted to disk or memory ‒Used to speed-up analytical queries ‒MVs need to be updated when the base tables change • MV maintenance is very similar to SQL on streams ‒Base table updates are a stream of DML statements ‒MV definition query is evaluated on that stream ‒MV is query result and continuously updated
  • 13. © 2019 Ververica13 Continuous Queries in Flink • Core concept is a “Dynamic Table” ‒ Dynamic tables are changing over time • Queries on dynamic tables ‒ produce new dynamic tables (which are updated based on input) ‒ do not terminate • Stream ↔ Dynamic table conversions
  • 14. © 2019 Ververica14 Stream ↔ Dynamic Table Conversions •A stream is the changelog of a dynamic table –As change messages are ingested from a stream, a table evolves –As a table evolves, change messages are emitted to a stream •Different changelog interpretations –Append-only change messages –Upsert change messages –Add/Retract change messages
  • 15. © 2019 Ververica15 How Can I Use It? • Embed SQL queries in regular Flink applications ‒ Tight integration with DataStream and DataSet APIs ‒ Mix and match with other libraries (CEP, ProcessFunction, Gelly) ‒ Package and operate queries like any other Flink application • Run SQL queries via Flink’s SQL CLI Client ‒ Interactive mode: Submit query and inspect results ‒ Detached mode: Submit query and write results to sink system
  • 16. © 2019 Ververica16 Show Results Enter Query SELECT user, COUNT(url) FROM clicks GROUP BY user Database / HDFS Event Log Query State ResultsChangelog or Table SQL CLI Client Catalog Optimizer CLI Result Server Submit Query Job SQL CLI Client – Interactive Queries
  • 17. © 2019 Ververica17 The New York Taxi Rides Data Set • A public data set about taxi rides in New York City • Rides are ingested as append-only (streaming) table ‒ Each ride is represented by a start and an end event • Table: Rides rideId: BIGINT // ID of the taxi ride taxiId: BIGINT // ID of the taxi isStart: BOOLEAN // flag for pick-up (true) or drop-off (false) event lon: DOUBLE // longitude of pick-up or drop-off location lat: DOUBLE // latitude of pick-up or drop-off location rideTime: TIMESTAMP // time of pick-up or drop-off event psgCnt: INT // number of passengers
  • 18. © 2019 Ververica18 Compute Basic Statistics SELECT psgCnt, COUNT(*) as cnt FROM Rides WHERE isStart GROUP BY psgCnt ▪ Count rides per number of passengers.
  • 19. © 2019 Ververica19 Identify Popular Pick-Up / Drop-Off Locations SELECT area, isStart, TUMBLE_END(rideTime, INTERVAL '5' MINUTE) AS cntEnd, COUNT(*) AS cnt FROM (SELECT rideTime, isStart, toAreaId(lon, lat) AS area FROM Rides) GROUP BY area, isStart, TUMBLE(rideTime, INTERVAL '5' MINUTE) ▪ Compute every 5 minutes for each area the number of departing and arriving taxis.
  • 20. © 2019 Ververica20 Average Tip Per Hour of Day SELECT HOUR(r.rideTime) AS hourOfDay, AVG(f.tip) AS avgTip FROM Rides r, Fares f WHERE NOT r.isStart AND r.rideId = f.rideId AND f.payTime BETWEEN r.rideTime - INTERVAL '5' MINUTE AND r.rideTime GROUP BY HOUR(r.rideTime); ▪ Compute the average tip per hour of day. Fare data is stored in a separate table Fares that needs to be joined.
  • 21. © 2019 Ververica21 SQL Feature Set in Flink 1.8.0 STREAMING ONLY • OVER / WINDOW ‒ UNBOUNDED / BOUNDED PRECEDING • INNER JOIN with time-versioned table • MATCH_RECOGNIZE ‒ Pattern Matching/CEP (SQL:2016) BATCH ONLY • UNION / INTERSECT / EXCEPT • ORDER BY STREAMING & BATCH • SELECT FROM WHERE • GROUP BY [HAVING] ‒ Non-windowed ‒ TUMBLE, HOP, SESSION windows • JOIN ‒ Time-Windowed INNER + OUTER JOIN ‒ Non-windowed INNER + OUTER JOIN • User-Defined Functions ‒ Scalar ‒ Aggregation ‒ Table-valued
  • 22. © 2019 Ververica22 What Can I Build With That? • Data Pipelines & Low-latency ETL ‒ Transform, aggregate, and move events in real-time ‒ Write streams to file systems, DBMS, K-V stores, … ‒ Ingest appearing files to produce streams • Stream & Batch Analytics ‒ Run analytical queries over bounded and unbounded data ‒ Query and compare historic and real-time data • Power Live Dashboards ‒ Compute and update data to visualize in real-time
  • 23. © 2019 Ververica23 SQL CLI Client – Detached Queries SQL CLI Client Catalog Optimizer Database / HDFS Event Log Query Submit Query Job State CLI Result Server Database / HDFS Event Log Enter Query INSERT INTO sinkTable SELECT user, COUNT(url) AS cnt FROM clicks GROUP BY user Cluster & Job ID
  • 24. © 2019 Ververica24 Serving a Dashboard Elastic Search Kafka INSERT INTO AreaCnts SELECT toAreaId(lon, lat) AS areaId, COUNT(*) AS cnt FROM TaxiRides WHERE isStart GROUP BY toAreaId(lon, lat)
  • 25. © 2019 Ververica25 Try Flink SQL Yourself! •The demo setup is available as a free online training –Slides to learn the basics –Exercises on streaming data https://github.com/ververica/sql-training
  • 26. © 2019 Ververica26 There’s a Lot More To Come! • Alibaba is contributing features of its Flink fork Blink back • Major improvements for SQL and Table API –Better coverage of SQL features: Full TPC-DS support –Competitive performance: 10x compared to current state –Improved connectivity: External catalogs (Hive) and connectors • Extending the scope of Table API –Expand support for user-defined functions –Add support for machine-learning learning pipelines & algorithm library
  • 27. © 2019 Ververica27 Summary • Unification of stream and batch is important. • Flink’s SQL solves many streaming and batch use cases. • In production at Alibaba, Huawei, Lyft, Uber, and others. • Query deployment as application or via CLI • Expect major improvements for batch SQL soon!
  • 30. © 2019 Ververica30 Average Ride Duration Per Pick-Up Location SELECT pickUpArea, AVG(timeDiff(s.rowTime, e.rowTime) / 60000) AS avgDuration FROM (SELECT rideId, rowTime, toAreaId(lon, lat) AS pickUpArea FROM TaxiRides WHERE isStart) s JOIN (SELECT rideId, rowTime FROM TaxiRides WHERE NOT isStart) e ON s.rideId = e.rideId AND e.rowTime BETWEEN s.rowTime AND s.rowTime + INTERVAL '1' HOUR GROUP BY pickUpArea ▪ Join start ride and end ride events on rideId and compute average ride duration per pick-up location.