Everything is connected: How watermarking, scaling, and
exactly once impact one another in Pravega
Flavio Junqueira
Senior Director, Senior Distinguished Engineer
Tom Kaitchuck
Distinguished Engineer
Flink Forward, April 2020 – Pravega http://pravega.io
Pravega: Storage for data streams
• Pravega is
– A stream store: stream is the storage primitive
– The foundation is segments
– Segments enable a flexible composition of streams
– Segments enable watermarking, scaling, transactions, and state replication
• Pravega is open source http://pravega.io
https://github.com/pravega/pravega
Landscape: Data pipelines
• Data sources
– User status
– Online transactions
– Server telemetry data
– Sensor samples
– Connected cars
– Drone videos
• Core elements
– Storage (Pravega)
– Stream processor (Apache Flink)
– Arbitrary directed acyclic processing graphs
• Visualization
• Alerts
• Near real-time insights
• Recommendations
• Actionable instructions
Data Streams
Data streams
Social networks
Online shopping
Server monitoring
IoT, Edge
Stream of user events
• Status updates
• Online transactions
Telemetry streams
• CPU, memory, disk utilization
Stream of sensor events
• Temperature samples
• Radar and image
Tools and services
Jira, Git, Jenkins
• Logs
• Results
Drone video streams + telemetry
[Figure: a drone fleet sends a video stream and telemetry into unified ingest, processing, and analytics]
• From cattle health
• To airplane inspection between flights
Industrial IoT
[Figure: manufacturing flow (decoiler, leveler, lubricator, press) feeding an ingestion engine, unified processing, and unified analytics]
Source: https://www.dellemc.com/resources/en-us/asset/customer-profiles-case-studies/solutions/delltechnologies-customer-profile-rwth.pdf
Data streams: Sequential
[Figure: events e1, e2, …, ek+3 from servers, sensors, etc., in a single sequence between head and tail]
Data streams: Parallelism
[Figure: events from servers, sensors, etc., appended as several parallel sequences between head and tail]
Data streams: Workload variations
[Figure: parallel event sequences from servers, sensors, etc., between head and tail, annotated "Load increases" and "Load drops"]
Data streams: Unbounded
[Figure: events between head and tail, split into "Fresh, recently ingested" data and "Older, historical data"; labeled "The Lambda way"]
Data streams: Unbounded
Ideally, no such distinction between fresh and historical data
Data streams: Read scalability
[Figure: a stream processor reading the parallel event sequences between head and tail]
Data stream: Cloud native, storage primitive
Streaming storage (stream in, stream out)
• Unbounded data
• Elastic
• Consistent
• Tailing and historical data analytics
• Cloud native
The recipe for effective stream processing
• Exactly once processing
– Don’t miss, don’t duplicate
• Checkpoints
– Enable rewinding
• Durability
– Enable replaying
• Scaling
– Workloads change dynamically and provisioning follows the changes
• Watermarking
– Advance event, ingest, processing time
Everything is connected …
Exactly-once semantics
Durability
Checkpoints
Scaling
Watermarking
Time windows
Source: emits samples, records, messages (e1, e2, e3, …, en)
Examples:
• Sensors in IoT
• End users in social networks
• Server metrics
[Figure: window aggregation (count) over event time for the stream … e8 e6 e7 e5 e4 e3 e2 e1. Window 3 (closed): e1 e2 e3. Window 6 (open): e4 e5. Window 9 (open): e7.]
Time windows
Writer needs to supply time information
Low watermark
Source: emits events
[Figure: window aggregation (count) over event time. Window 3 (closed): e1 e2 e3. Window 6 (open): e4 e5. Window 9 (open): e7.]
Watermark W(t)
• Has an associated timestamp t
• Contract: all events with a timestamp smaller
than or equal to t have been received
• Closes windows whose ending timestamp is smaller than or equal to t
• Late events violate contract
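The watermark contract above can be sketched in a few lines of Java. This is a minimal illustration, not Pravega or Flink code: the class `WindowCounter`, its methods, and the assumption that windows are tumbling and cover the range (end - size, end] are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: counts events per tumbling window and closes windows on watermarks. */
final class WindowCounter {
    private final long size;                                     // window length in time units
    private final TreeMap<Long, Integer> open = new TreeMap<>(); // window end -> event count
    private long watermark = Long.MIN_VALUE;                     // latest watermark seen

    WindowCounter(long size) {
        this.size = size;
    }

    /** End timestamp of the window containing t, assuming windows cover (end - size, end]. */
    private long windowEnd(long t) {
        return Math.floorDiv(t + size - 1, size) * size;
    }

    /** Adds an event; returns false if it is late, i.e. its timestamp is <= the watermark. */
    boolean onEvent(long timestamp) {
        if (timestamp <= watermark) {
            return false; // late event: the contract of W(t) was violated
        }
        open.merge(windowEnd(timestamp), 1, Integer::sum);
        return true;
    }

    /** Handles W(t): closes every window ending at or before t and returns "end=count" entries. */
    List<String> onWatermark(long t) {
        watermark = Math.max(watermark, t);
        List<String> closed = new ArrayList<>();
        while (!open.isEmpty() && open.firstKey() <= watermark) {
            Map.Entry<Long, Integer> e = open.pollFirstEntry();
            closed.add(e.getKey() + "=" + e.getValue());
        }
        return closed;
    }
}
```

With a window size of 3 and event timestamps equal to the event index, feeding e1 e2 e3 followed by W(3) closes Window 3 with a count of 3, while out-of-order arrivals such as e7 before e6 are still counted as long as they precede W(6).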
[Figure: the source emits e1, e2, e3, …, en; in the stream … e8 | e6 e7 e5 e4 | e3 e2 e1 each bar marks a watermark, first W(3) and then W(6)]
Order
• Out-of-order events are expected
• Low watermarks must be able to accommodate out-of-orderness
[Figure: two arrival orders for the same source, each with watermarks W(3) and W(6): … e8 | e6 e7 e5 e4 | e3 e2 e1 and … e8 e6 | e7 e5 e4 | e3 e2 e1; annotated "OK"]
Global order
[Figure: three sources emitting e1 e3 e5 …, e2 e4 e6 …, and e4 e7 e10 …; each resulting stream carries the watermarks W(3), W(6), W(9), W(12) at different positions]
Multiple sources
Watermarks must reflect time
progress across all sources
• Requires global
coordination
• Aggregate watermarks
• Compare timestamps
from many sources
• Sources need to report:
• Event time
• Position
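One common way to aggregate watermarks across sources is to take the minimum of the times they report. The sketch below illustrates that idea only; `GlobalWatermark` and its methods are hypothetical names, and the position that sources also report is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: derives a global watermark from per-source event-time reports. */
final class GlobalWatermark {
    private final Map<String, Long> reported = new HashMap<>();

    /** Registers all known sources; a source that has not reported yet holds time back. */
    GlobalWatermark(String... sourceIds) {
        for (String id : sourceIds) {
            reported.put(id, Long.MIN_VALUE);
        }
    }

    /** Records a report from one source; a source's time never moves backwards. */
    void report(String sourceId, long eventTime) {
        reported.merge(sourceId, eventTime, Long::max);
    }

    /** Minimum over all sources: no source is expected to emit an earlier timestamp. */
    long current() {
        long min = Long.MAX_VALUE;
        for (long t : reported.values()) {
            min = Math.min(min, t);
        }
        return min;
    }
}
```

The slowest source determines the result, which is why a single silent source keeps the global watermark at its initial value until it reports.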
Tracking source position
• Source connects to stream store to append e1, e2, e3, e4
• Appends e1 e2 e3 successfully
• Disconnects
• Reconnects and determines that the last event written is e3
• Appends e4
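The reconnect sequence above can be sketched as a store that remembers the last event number per writer and ignores anything it has already seen. This is an illustrative toy under stated assumptions, not Pravega's implementation: `ToyStreamStore`, the writer id string, and the per-writer event numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a store that tracks, per writer, the number of the last appended event. */
final class ToyStreamStore {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Long> lastEventNumber = new HashMap<>();

    /** Called on connect or reconnect: returns the last event number stored, or 0 if none. */
    long connect(String writerId) {
        return lastEventNumber.getOrDefault(writerId, 0L);
    }

    /** Appends only the next expected event; events already written are dropped as duplicates. */
    boolean append(String writerId, long eventNumber, String payload) {
        long last = connect(writerId);
        if (eventNumber <= last) {
            return false; // duplicate of an event that was already appended
        }
        if (eventNumber != last + 1) {
            throw new IllegalStateException("gap: expected event " + (last + 1));
        }
        log.add(payload);
        lastEventNumber.put(writerId, eventNumber);
        return true;
    }

    /** Copy of the stored events, in append order. */
    List<String> contents() {
        return new ArrayList<>(log);
    }
}
```

After a reconnect the writer asks for the last event number (3 in the slides) and resumes with e4; a retry of e3 is rejected, so the stream holds each event once.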
[Figure: the store's counter for the source goes from 0 to 3 as e1, e2, e3 are appended; on reconnect the store reports "Last is 3", and the counter reaches 4 once e4 is appended]
Time windows: Accuracy
Exactly once is needed for accurate windows
[Figure: source emitting events; stream … e8 | e6 e7 e5 e4 | e3 e2 e1 with watermarks W(3) and W(6); window aggregation (count) over event time with Window 3 (closed), Window 6 (open), Window 9 (open)]
Upon crashes
[Figure: the window aggregation crashes while Window 6 and Window 9 are still open; after the crash only Window 3 (closed, holding e1 e2 e3) is shown]
• To guarantee exactly-once semantics
– Connector enables Flink to track progress
– Flink determines where to resume from upon recovery
• Ability to rewind to a safe position
• Replay events
• Checkpoints
Checkpointing
• Crash failures
– Hosts across the system can crash at any time
– Ability to back up and re-read data.
• Checkpoints
– A point in the application execution
– All intermediate state is persisted
– Computation can be resumed from checkpoint
• Implications to sources
– Ability to replay (durability)
– Ability to control the position and persist it
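The interplay of checkpoints, position, and replay can be sketched as follows. The example is a hypothetical toy, not Flink's checkpointing mechanism: `CheckpointedCounter` keeps its "persisted" checkpoint in memory and reads from a list that stands in for a durable, replayable stream.

```java
import java.util.List;

/** Hypothetical sketch: a counter that persists position and state together at checkpoints. */
final class CheckpointedCounter {
    private final List<String> source; // stands in for a durable, replayable stream
    private int position = 0;          // index of the next event to read
    private long count = 0;            // intermediate state of the computation
    private int savedPosition = 0;     // position captured by the last checkpoint
    private long savedCount = 0;       // state captured by the last checkpoint

    CheckpointedCounter(List<String> source) {
        this.source = source;
    }

    /** Reads up to n events, advancing position and state in step. */
    void process(int n) {
        for (int i = 0; i < n && position < source.size(); i++) {
            position++;
            count++;
        }
    }

    /** Captures state and source position as one consistent point in the execution. */
    void checkpoint() {
        savedPosition = position;
        savedCount = count;
    }

    /** After a crash: rewinds to the checkpoint; later events are replayed by process(). */
    void recover() {
        position = savedPosition;
        count = savedCount;
    }

    long count() {
        return count;
    }

    int position() {
        return position;
    }
}
```

Because the position is restored together with the state, events processed after the checkpoint are read again exactly once after recovery, rather than being skipped or counted twice.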
Checkpoint complicates sinks
• Sinks
– Output data of a job
• When rolling back to a checkpoint
– Preserve exactly-once semantics
– Data needs to be “unwritten” or “deduped”
• Also applies to time
– Output time can only advance on checkpoints
Scaling and rebalancing
[Figure: key space partitions assigned to workers, shown for two cases: scaling and rebalancing]
• Consistent assignment
• Time advances consistently for watermarking
• Exactly-once guarantee
The bottom line:
Computing watermarks is complicated…
Support for watermarks in Pravega
Stream S
[Figure: segments 0 to 7 of stream S plotted over time (t0 to t4) and key space (0 to 1.0, with boundaries at .25, .5, and .75)]
StreamCuts
[Figure: segments 0 to 7 plotted over time (t0 to t4) and key space (0 to 1.0), with a stream cut drawn across the segments]
Watermarking (writer side)
[Figure: three writers and a shared watermark state, shown over four animation steps]
Watermarking (reader side)
[Figure: readers receiving watermarks (W), shown over three animation steps; one step adds two more readers]
Pravega + Flink
Connectors
• Source connector (uses Pravega as a source), e.g., Flink job input data
• Sink connector (dumps data onto Pravega), e.g., Flink job output data
https://github.com/pravega/flink-connectors
[Figure: four writers and four readers shown alongside the source and sink connectors]
Reader Groups
• Reader group automatically assigns and re-balances segments
• Hides complexity from app
• Stream scaling
[Figure: a Pravega stream with segments 1 to 4, read by Pravega readers that run in the source tasks of a Flink job across four task managers]
Pravega time windows
Pravega reader:
    TimeWindow getCurrentTimeWindow();
Time window
• Two attributes:
1. Lower bound
2. Upper bound
• Lower bound
• Timestamp less than or equal to the most recent time
noted (writer API)
• Upper bound
• Timestamp greater than or equal to the most recent
time noted (writer API)
Watermarks
    public abstract class LowerBoundAssigner<T>
            implements AssignerWithTimeWindows<T> {
        …
        @Override
        public abstract long extractTimestamp(T element,
                                              long previousElementTimestamp);

        // built-in watermark implementation which emits the lower bound
        @Override
        public Watermark getWatermark(TimeWindow timeWindow) {
            if (timeWindow == null || timeWindow.isNearHeadOfStream()) {
                return null;
            }
            return timeWindow.getLowerTimeBound() == Long.MIN_VALUE ?
                    new Watermark(Long.MIN_VALUE) :
                    new Watermark(timeWindow.getLowerTimeBound());
        }
    }
• Timestamp assigner
• LowerBoundAssigner
• Gets the time window
• Watermark is the lower
bound
With a Pravega reader source
    FlinkPravegaReader<Sample> source = FlinkPravegaReader.<Sample>builder()
        .withPravegaConfig(pravegaConfig)
        .forStream(Stream.of("scope", "stream"))
        .withDeserializationSchema(…)
        .withTimestampAssigner(new LowerBoundAssigner<Sample>() {
            @Override
            public long extractTimestamp(Sample sample,
                                         long previousElementTimestamp) {
                long timestamp = sample.getTimestamp();
                return timestamp;
            }
        })
        .build();
Upon a Flink checkpoint
1. Master
• Initiates checkpoint
• Invokes call on ReaderGroup API
• Implementation of MasterTriggerRestoreHook
2.
• Coordinate via state synchronizer (revisioned stream)
• Readers emit checkpoint event
3.
• Receives checkpoint
[Figure: a Flink job whose source tasks run Pravega readers across four task managers, reading segments 1 to 4 of a Pravega stream; the master triggers the checkpoint (1), the readers coordinate through a revisioned stream and emit checkpoint events C (2), and the checkpoint is received (3)]
Exactly-once with Transactions
[Figure: sink tasks of the Flink job perform transactional (txn) writes to Pravega]
• Transactional writes for job output
• Executes a 2PC to commit results
• Option to not use transactions
• At-least-once semantics
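The transactional option can be illustrated with a toy sink whose output becomes visible only on commit. This is a simplified sketch, not the Pravega connector: `ToyTransactionalSink` and its methods are hypothetical, and `abort()` models only a failure that happens before the checkpoint completes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a sink whose output becomes visible only when its transaction commits. */
final class ToyTransactionalSink {
    private final List<String> committed = new ArrayList<>(); // visible to readers
    private List<String> open = new ArrayList<>();            // transaction being written
    private List<String> prepared = null;                     // sealed, waiting for commit

    /** Buffers an output event in the open transaction. */
    void write(String event) {
        open.add(event);
    }

    /** Phase 1, when a checkpoint starts: seals the open transaction and starts a new one. */
    void prepare() {
        if (prepared != null) {
            throw new IllegalStateException("previous transaction not resolved");
        }
        prepared = open;
        open = new ArrayList<>();
    }

    /** Phase 2, when the checkpoint completes: makes the prepared output visible. */
    void commit() {
        if (prepared == null) {
            return;
        }
        committed.addAll(prepared);
        prepared = null;
    }

    /** Failure before the checkpoint completes: discards all uncommitted output. */
    void abort() {
        prepared = null;
        open = new ArrayList<>();
    }

    /** Copy of the output that readers can see. */
    List<String> visible() {
        return new ArrayList<>(committed);
    }
}
```

If the job fails after writing r3 but before the checkpoint completes, the write is discarded and redone after recovery, so readers see r3 once. Without transactions the first r3 would already be visible and the replay would duplicate it, which is the at-least-once case.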
Exactly-once with Transactions
2-Phase commit protocol
[Figure: the Flink master coordinates checkpointing with the sink tasks (S) and Pravega in four steps: start checkpoint (prepare), ack prepare, complete checkpoint, commit txn]
Demo
Demo setup
Input files: 0.avro, 1.avro, 2.avro, …, 49.avro
Event: {
    Id: String
    Counter: int
    Timestamp: byte[]
}
Ingested to Pravega Stream
    // The arguments of aggregate(...) are elided on the slide
    DataStream<Long> sampleStream =
        env.addSource(source)
           .name("Pravega Stream")
           .timeWindowAll(Time.milliseconds(1))
           .aggregate(...);
    // writeAsText returns a DataStreamSink, so it is applied in a separate statement
    sampleStream.writeAsText("/tmp/flink-windows.txt");
Wrap up
Everything is connected
• Relevant features for stream processing
– Exactly-once semantics
– Checkpointing
– Durability
– Scaling
– Watermarking
– … all connected in different ways
• Pravega
– Stream store, open source
– Supports all above features
• Pravega + Flink
– Source + Sink connector
– Enables reading and writing Pravega streams
Q&A
• Email: fpj@pravega.io, tom.kaitchuck@dell.com
• Twitter: @ProjectPravega
• Pravega Web site: http://pravega.io
• Slack: pravega-io.slack.com
• GitHub: https://github.com/pravega/pravega
• Flink connector: https://github.com/pravega/flink-connectors

Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
 
Mark Hughes Annual Seminar Presentation on Open Source
Mark Hughes Annual Seminar Presentation on Open Source Mark Hughes Annual Seminar Presentation on Open Source
Mark Hughes Annual Seminar Presentation on Open Source
 
Flink. Pure Streaming
Flink. Pure StreamingFlink. Pure Streaming
Flink. Pure Streaming
 
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
 
Apache Beam (incubating)
Apache Beam (incubating)Apache Beam (incubating)
Apache Beam (incubating)
 
Keynote session - LOD2014 W3C event
Keynote session - LOD2014 W3C eventKeynote session - LOD2014 W3C event
Keynote session - LOD2014 W3C event
 
JDD2014: Real Big Data - Scott MacGregor
JDD2014: Real Big Data - Scott MacGregorJDD2014: Real Big Data - Scott MacGregor
JDD2014: Real Big Data - Scott MacGregor
 
Memory Forensics for IR - Leveraging Volatility to Hunt Advanced Actors
Memory Forensics for IR - Leveraging Volatility to Hunt Advanced ActorsMemory Forensics for IR - Leveraging Volatility to Hunt Advanced Actors
Memory Forensics for IR - Leveraging Volatility to Hunt Advanced Actors
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
 
Overview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Overview of Apache Fink: the 4 G of Big Data Analytics FrameworksOverview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Overview of Apache Fink: the 4 G of Big Data Analytics Frameworks
 
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksOverview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Stream Scaling in Pravega
Stream Scaling in PravegaStream Scaling in Pravega
Stream Scaling in Pravega
 
FLiP Into Trino
FLiP Into TrinoFLiP Into Trino
FLiP Into Trino
 
Introduction to Apache Flink at Vienna Meet Up
Introduction to Apache Flink at Vienna Meet UpIntroduction to Apache Flink at Vienna Meet Up
Introduction to Apache Flink at Vienna Meet Up
 

Plus de Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

Plus de Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Dernier

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Dernier (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Virtual Flink Forward 2020: Everything is connected: How watermarking, scaling, and exactly once impact one another in Pravega - Flavio Paiva Junqueira, Tom Kaitchuck

  • 1. Everything is connected: How watermarking, scaling, and exactly once impact one another in Pravega. Flavio Junqueira (Senior Director, Senior Distinguished Engineer) and Tom Kaitchuck (Distinguished Engineer).
  • 2. Flink Forward, April 2020 – Pravega (http://pravega.io). Pravega: Storage for data streams. Pravega is a stream store: the stream is the storage primitive; the foundation is segments; segments enable a flexible composition of streams; segments enable watermarking, scaling, transactions, and state replication. Pravega is open source: http://pravega.io and https://github.com/pravega/pravega
  • 3. Data streams: landscape. Data sources: user status, online transactions, server telemetry data, sensor samples, connected cars, drone videos. Core elements of data pipelines: storage (Pravega), stream processor (Apache Flink), arbitrary directed acyclic processing graphs. Visualization, alerts, near real-time insights, recommendations, actionable instructions.
  • 4. Data Streams
  • 5. Data streams. Social networks and online shopping: stream of user events (status updates, online transactions). Server monitoring: telemetry streams (CPU, memory, disk utilization). IoT, Edge: stream of sensor events (temperature samples, radar and image). Tools and services (Jira, Git, Jenkins): logs, results.
  • 6. Drone video streams + telemetry. Drone fleet: video stream, telemetry. Unified analytics, process, ingest. From cattle health to airplane inspection between flights. Telemetry, video feed.
  • 7. Industrial IoT. Unified analytics, unified processing, ingestion engine. Manufacturing flow: decoiler, leveler, lubricator, press. Source: https://www.dellemc.com/resources/en-us/asset/customer-profiles-case-studies/solutions/delltechnologies-customer-profile-rwth.pdf
  • 8. Data streams: sequential. [Diagram: a single sequence of events e1, e2, …, ek+3 from a server, sensor, etc., laid out between the head and the tail of the stream.]
  • 9. Data streams: parallelism. [Diagram: events e1, e2, …, ek+10 from servers, sensors, etc., arranged in parallel between the head and the tail of the stream.]
  • 10. Data streams: workload variations (traffic fluctuation). [Diagram: parallel events from servers, sensors, etc., between head and tail, annotated where load increases and where load drops.]
  • 11. Data streams: unbounded. [Diagram: parallel events from servers, sensors, etc., between head and tail, divided into fresh, recently ingested data and older, historical data.] The Lambda way.
  • 12. Data streams: unbounded. [Diagram: parallel events from servers, sensors, etc., between head and tail.] Ideally, no such distinction between fresh and historical.
  • 13. Data streams: read scalability. [Diagram: parallel events between head and tail, with a stream processor.]
  • 14. Data stream: cloud native, storage primitive. Stream in, streaming storage, stream out. Unbounded data; elastic; consistent; tailing and historical data analytics; cloud native.
  • 15. The recipe for effective stream processing. Exactly-once processing: don't miss, don't duplicate. Checkpoints: enable rewinding. Durability: enable replaying. Scaling: workloads change dynamically and provisioning follows the changes. Watermarking: advance event, ingest, processing time.
  • 16. Everything is connected … Exactly-once semantics, durability, checkpoints, scaling, watermarking.
  • 17. Time windows. Source: emits samples, records, messages (e1, e2, e3, …, en). Examples: sensors in IoT, end users in social networks, server metrics. Window aggregation: count. [Diagram: on the event-time axis, Window 3 (closed) holds e1 e2 e3, Window 6 (open) holds e4 e5, Window 9 (open) holds e7; the stream reads … e8 e6 e7 e5 e4 e3 e2 e1.]
  • 18. Time windows. Source: emits samples, records, messages (e1, e2, e3, …, en). Examples: sensors in IoT, end users in social networks, server metrics. Window aggregation: count. [Diagram: on the event-time axis, Window 3 (closed) holds e1 e2 e3, Window 6 (open) holds e4 e5, Window 9 (open) holds e7; the stream reads … e8 e6 e7 e5 e4 e3 e2 e1.] Writer needs to supply time information.
  • 19. Low watermark. Source: emits events (e1, e2, e3, …, en). Window aggregation: count. [Diagram: on the event-time axis, Window 3 (closed) holds e1 e2 e3, Window 6 (open) holds e4 e5, Window 9 (open) holds e7; the stream reads … e8 | e6 e7 e5 e4 | e3 e2 e1, with watermarks W(3) and W(6) at the | marks.] Watermark W(t): has an associated timestamp t; contract: all events with a timestamp smaller than or equal to t have been received; closes window with smaller ending timestamp; late events violate the contract.
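The watermark contract on the slide above can be illustrated with a minimal Python sketch. It is not Pravega or Flink code. The class `WindowCounter` and its methods are hypothetical names, and it assumes that a window labeled N covers event times N-2 through N and closes once a watermark W(t) with t ≥ N arrives, which is how the Window 3 / W(3) picture reads.

```python
from collections import defaultdict


class WindowCounter:
    """Counts events per fixed-size event-time window (hypothetical helper)."""

    def __init__(self, size):
        self.size = size
        self.open = defaultdict(int)   # window end -> running count
        self.closed = {}               # window end -> final count
        self.watermark = 0
        self.late = []                 # events that violated the contract

    def window_end(self, ts):
        # Assumption: windows are labeled by their inclusive end timestamp,
        # so with size 3, timestamps 1..3 map to 3 and 4..6 map to 6.
        return -(-ts // self.size) * self.size

    def on_event(self, ts):
        if ts <= self.watermark:
            # W(t) promised that every event with timestamp <= t had arrived.
            self.late.append(ts)
            return
        self.open[self.window_end(ts)] += 1

    def on_watermark(self, t):
        self.watermark = max(self.watermark, t)
        ready = sorted(end for end in self.open if end <= self.watermark)
        for end in ready:
            self.closed[end] = self.open.pop(end)
```

Feeding the slide's sequence (e1 e2 e3, W(3), e4 e5 e7 e6, W(6), e8) closes Window 3 and Window 6 with three events each and leaves Window 9 open.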
  • 20. Order. [Diagram: two orderings of the stream from the same source (e1, e2, e3, …, en), each with watermarks W(3) and W(6) at the | marks: … e8 | e6 e7 e5 e4 | e3 e2 e1 and … e8 e6 | e7 e5 e4 | e3 e2 e1; an OK mark appears on the slide.] Out-of-order events are expected. Low watermarks must be able to accommodate out-of-orderness.
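One common way to make a watermark tolerate out-of-order events is to hold it a fixed delay behind the largest timestamp seen so far. The slides do not say that Pravega does this. The Python sketch below is only an illustration of that general idea, and the names `BoundedDelayWatermark`, `max_delay`, and `late_events` are hypothetical.

```python
class BoundedDelayWatermark:
    """Watermark that trails the largest observed timestamp by max_delay."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.max_seen = None

    def current(self):
        # No watermark until at least one event has been observed.
        if self.max_seen is None:
            return None
        return self.max_seen - self.max_delay

    def observe(self, ts):
        """Record an event timestamp; return True if it arrived late."""
        wm = self.current()
        late = wm is not None and ts <= wm
        if self.max_seen is None or ts > self.max_seen:
            self.max_seen = ts
        return late


def late_events(timestamps, max_delay):
    """Return the timestamps that arrive at or below the current watermark."""
    wm = BoundedDelayWatermark(max_delay)
    return [ts for ts in timestamps if wm.observe(ts)]
```

With the arrival order 1, 2, 3, 4, 5, 7, 6, 8 from the slide, a delay of 2 absorbs the swap of e6 and e7, while a delay of 0 reports e6 as late. A larger delay tolerates more disorder at the cost of closing windows later.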
  • 21. Global order. [Diagram: three sources emit e1 e3 e5 … en, e2 e4 e6 … en, and e4 e7 e10 … en; their streams read … e15 | e13 e11 | e9 | e7 e5 | e3 e1, … e16 e14 | e12 | e10 e8 | e6 | e4 e2, and … e25 | e22 | e19 | e16 | e13 | e10 | e7 | e4, interleaved with watermarks W(3), W(6), W(9), W(12).] Multiple sources: watermarks must reflect time progress across all sources. Requires global coordination. Aggregate watermarks: compare timestamps from many sources. Sources need to report event time and position.
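As a rough illustration of aggregating watermarks across sources, the Python sketch below takes the minimum of the latest event times reported by every known source. Taking the minimum is an assumption about how "compare timestamps from many sources" could work, not a description of Pravega's implementation. `WatermarkAggregator` and its methods are hypothetical names.

```python
class WatermarkAggregator:
    """Combines per-source time reports into one global watermark."""

    def __init__(self, source_ids):
        # Latest (event_time, position) per source; None until first report.
        self.reports = {sid: None for sid in source_ids}

    def report(self, source_id, event_time, position):
        previous = self.reports[source_id]
        # Ignore stale reports so a source's time never moves backwards.
        if previous is None or event_time >= previous[0]:
            self.reports[source_id] = (event_time, position)

    def global_watermark(self):
        # Time has only progressed as far as the slowest source.
        if any(r is None for r in self.reports.values()):
            return None
        return min(event_time for event_time, _ in self.reports.values())

    def positions(self):
        """Positions that go with the latest reports, per source."""
        return {sid: r[1] for sid, r in self.reports.items() if r is not None}
```

Until every source has reported at least once there is no global watermark. Afterwards it advances only when the slowest source advances, which is why the slide calls for global coordination.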
  • 22. Flink Forward, April 2020 – Pravega http://pravega.io Last is 3Counter:0 Tracking source position • Source connects to stream store to append e1e4 e2e3 e4 e3 e2 e1
  • 23. Flink Forward, April 2020 – Pravega http://pravega.io e3 e2 e1 Tracking source position • Source connects to stream store to append e1e4 e2e3 • Appends e1 e2 e3 successfuly e4 e3 e2 e1 Counter:3
  • 24. Flink Forward, April 2020 – Pravega http://pravega.io e3 e2 e1 Last is 3Counter:0 Tracking source position • Source connects to stream store to append e1e4 e2e3 • Appends e1 e2 e3 successfuly • Disconnects e4 e3 e2 e1 Counter:1Counter:2Counter:3
  • 25. Tracking source position
    • Source connects to the stream store to append e1, e2, e3, e4
    • Appends e1, e2, e3 successfully
    • Disconnects
    • Reconnects and determines that the last event written is e3
    • Counter: 3; last is 3
  • 26. Tracking source position
    • Source connects to the stream store to append e1, e2, e3, e4
    • Appends e1, e2, e3 successfully
    • Disconnects
    • Reconnects and determines that the last event written is e3
    • Appends e4
    • Counter: 4
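The reconnect sequence above can be sketched as a store that remembers, per writer, the number of the last event it appended. This is an illustration under stated assumptions rather than Pravega's actual protocol: `PositionTrackingStore`, `connect`, and `append` are hypothetical names, and event numbers are assumed to start at 1 and increase by one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a store that tracks each writer's last appended event number. */
final class PositionTrackingStore {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Long> lastEventNumber = new HashMap<>();

    /** Called on connect or reconnect: returns the last event number stored, 0 if none. */
    long connect(String writerId) {
        return lastEventNumber.getOrDefault(writerId, 0L);
    }

    /** Appends the event if it is the next one; a retry of a stored event is ignored. */
    boolean append(String writerId, long eventNumber, String event) {
        long last = lastEventNumber.getOrDefault(writerId, 0L);
        if (eventNumber <= last) {
            return false; // duplicate: already stored before the disconnect
        }
        if (eventNumber != last + 1) {
            throw new IllegalStateException("gap: expected event number " + (last + 1));
        }
        log.add(event);
        lastEventNumber.put(writerId, eventNumber);
        return true;
    }

    /** Returns a copy of everything stored so far, in append order. */
    List<String> contents() {
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        PositionTrackingStore store = new PositionTrackingStore();
        store.append("writer-1", 1, "e1");
        store.append("writer-1", 2, "e2");
        store.append("writer-1", 3, "e3");
        // After a disconnect, the writer asks where it left off and resumes from there.
        long last = store.connect("writer-1");
        store.append("writer-1", last + 1, "e4");
        System.out.println(store.contents());
    }
}
```

Because the store rejects event numbers it has already seen, a writer that retries after an uncertain failure cannot introduce duplicates.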
  • 27. Time windows: Accuracy
    • Source: emits events (e1, e2, e3, …, en)
    • Stream with watermarks W(3) and W(6): … e8 | e6 e7 e5 e4 | e3 e2 e1
    • Window aggregation (count) over event time: Window 3 (closed), Window 6 (open), Window 9 (open)
    • Exactly-once is needed for accurate windows
  • 28. Upon crashes
    • Source: emits events (e1, e2, e3, …, en)
    • Stream with watermarks W(3) and W(6): … e8 | e6 e7 e5 e4 | e3 e2 e1
    • Window aggregation (count) over event time: Window 3 (closed), Window 6 (open), Window 9 (open)
    • Crash
  • 29. Upon crashes
    • Source: emits events (e1, e2, e3, …, en)
    • Stream with watermarks W(3) and W(6): … e8 | e6 e7 e5 e4 | e3 e2 e1
    • Window aggregation (count): Window 3 (closed), Window 6 (open), Window 9 (open)
    • Crash
    • Watermark W(t)
      – Has an associated timestamp t
      – Contract: all events with a timestamp smaller than or equal to t have been received
      – Closes windows with an ending timestamp no greater than t
      – Late events violate the contract
  • 30. Upon crashes
    • Source: emits events (e1, e2, e3, …, en)
    • Stream with watermarks W(3) and W(6): … e8 | e6 e7 e5 e4 | e3 e2 e1
    • Window aggregation (count): Window 3 (closed), Window 6 (open), Window 9 (open)
    • Crash
    • To guarantee exactly-once semantics
      – Connector enables Flink to track progress
      – Flink determines where to resume from upon recovery
  • 31. Upon crashes
    • Source: emits events (e1, e2, e3, …, en)
    • Stream with watermarks W(3) and W(6): … e8 | e6 e7 e5 e4 | e3 e2 e1
    • Window aggregation (count) over event time: Window 3 (closed)
    • Ability to rewind to a safe position
    • Replay events
    • Checkpoints
  • 32. Checkpointing
    • Crash failures
      – Hosts across the system can crash at any time
      – Ability to back up and re-read data
    • Checkpoints
      – A point in the application execution
      – All intermediate state is persisted
      – Computation can be resumed from checkpoint
    • Implications for sources
      – Ability to replay (durability)
      – Ability to control the position and persist it
    • Stream with watermarks W(3) and W(6): … e8 | e6 e7 e5 e4 | e3 e2 e1
      – Ability to rewind to a safe position
      – Replay events
      – Checkpoints
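The interplay of checkpoint, rewind, and replay can be sketched with a toy operator that sums a replayable input. Everything here is hypothetical and greatly simplified compared to Flink's checkpointing: the class `CheckpointedSum` keeps its "persisted" checkpoint in memory and the input is a plain list standing in for a durable stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: checkpoint the read position together with the operator state. */
final class CheckpointedSum {
    private final List<Long> stream; // stands in for a durable, replayable stream
    private int position = 0;        // index of the next event to read
    private long sum = 0;            // intermediate operator state

    // Last checkpoint: position and state are captured together.
    private int checkpointPosition = 0;
    private long checkpointSum = 0;

    CheckpointedSum(List<Long> stream) {
        this.stream = new ArrayList<>(stream);
    }

    /** Reads and processes one event; returns false at the end of the stream. */
    boolean processNext() {
        if (position >= stream.size()) {
            return false;
        }
        sum += stream.get(position);
        position++;
        return true;
    }

    /** Persists the current position and state as one consistent snapshot. */
    void checkpoint() {
        checkpointPosition = position;
        checkpointSum = sum;
    }

    /** After a crash: rewind to the checkpointed position and restore the state. */
    void recover() {
        position = checkpointPosition;
        sum = checkpointSum;
    }

    long sum() {
        return sum;
    }

    public static void main(String[] args) {
        CheckpointedSum job = new CheckpointedSum(Arrays.asList(1L, 2L, 3L, 4L, 5L));
        job.processNext();
        job.processNext();
        job.checkpoint();
        job.processNext();
        job.recover(); // the event read after the checkpoint is replayed, not lost
        while (job.processNext()) {
            // keep reading until the end of the stream
        }
        System.out.println(job.sum());
    }
}
```

Restoring the state and the position together is what keeps replayed events from being counted twice.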
  • 33. Checkpointing complicates sinks
    • Sinks
      – Output data of a job
    • When rolling back to a checkpoint
      – Preserve exactly-once semantics
      – Data needs to be “unwritten” or “deduped”
    • Also applies to time
      – Output time can only advance on checkpoints
  • 34. Scaling and rebalancing
    • [Diagram: key space assigned to workers under scaling and rebalancing]
    • Consistent assignment
    • Time advances consistently for watermarking
    • Exactly-once guarantee
  • 35. The bottom line: Computing watermarks is complicated…
  • 36. Support for watermarks in Pravega
  • 37. [Diagram: Stream S, key space (0 to 1.0) versus time (t0 to t4), split into Segments 0 through 7]
  • 38. StreamCuts [Diagram: Segments 0 through 7 over key space (0 to 1.0) versus time (t0 to t4), with a stream cut marked]
  • 39–42. Watermarking (writer side) [Diagram, repeated over four slides: watermark state and three writers]
  • 43–45. Watermarking (reader side) [Diagram over three slides: readers and watermarks (W)]
  • 46. Pravega + Flink
  • 47. Connectors
    • Source connector (uses Pravega as a source): Pravega readers, e.g., Flink job input data
    • Sink connector (dumps data onto Pravega): Pravega writers, e.g., Flink job output data
    • https://github.com/pravega/flink-connectors
  • 48. Reader groups
    • [Diagram: a Pravega stream (Segments 1 through 4) read by Pravega readers in the source tasks of a Flink job, across four Task Managers]
    • Reader group automatically assigns and rebalances segments
    • Hides complexity from the app
    • Stream scaling
  • 49. Pravega time windows
    • Pravega reader API: TimeWindow getCurrentTimeWindow();
    • A time window has two attributes:
      1. Lower bound: timestamp less than or equal to the most recent time noted (writer API)
      2. Upper bound: timestamp greater than or equal to the most recent time noted (writer API)
  • 50. Watermarks
    • Timestamp assigner: LowerBoundAssigner
      – Gets the time window
      – Watermark is the lower bound

      public abstract class LowerBoundAssigner<T> implements AssignerWithTimeWindows<T> {
          …
          @Override
          public abstract long extractTimestamp(T element, long previousElementTimestamp);

          // built-in watermark implementation which emits the lower bound
          @Override
          public Watermark getWatermark(TimeWindow timeWindow) {
              if (timeWindow == null || timeWindow.isNearHeadOfStream()) {
                  return null;
              }
              return timeWindow.getLowerTimeBound() == Long.MIN_VALUE
                      ? new Watermark(Long.MIN_VALUE)
                      : new Watermark(timeWindow.getLowerTimeBound());
          }
      }
  • 51. With a Pravega reader source

      FlinkPravegaReader<Sample> source = FlinkPravegaReader.<Sample>builder()
          .withPravegaConfig(pravegaConfig)
          .forStream(Stream.of("scope", "stream"))
          .withDeserializationSchema(…)
          .withTimestampAssigner(new LowerBoundAssigner<Sample>() {
              @Override
              public long extractTimestamp(Sample sample, long previousElementTimestamp) {
                  return sample.getTimestamp();
              }
          })
          .build();
  • 52. Upon a checkpoint
    • [Diagram: Flink job whose source tasks run Pravega readers across four Task Managers, reading Segments 1 through 4 of a Pravega stream]
    • Step 1, master:
      – Initiates checkpoint
      – Invokes call on ReaderGroup API
      – Implementation of MasterTriggerRestoreHook
  • 53. Upon a Flink checkpoint
    • [Diagram: Flink job whose source tasks run Pravega readers across four Task Managers, reading Segments 1 through 4 of a Pravega stream; checkpoint events (C) and a revisioned stream]
    • Step 1, master:
      – Initiates checkpoint
      – Invokes call on ReaderGroup API
      – Implementation of MasterTriggerRestoreHook
    • Step 2, readers:
      – Coordinate via state synchronizer
      – Readers emit checkpoint event
  • 54. Upon a Flink checkpoint
    • [Diagram: Flink job whose source tasks run Pravega readers across four Task Managers, reading Segments 1 through 4 of a Pravega stream; checkpoint events (C) and a revisioned stream]
    • Step 2, readers:
      – Coordinate via state synchronizer
      – Readers emit checkpoint event
    • Step 3:
      – Receives checkpoint
  • 55. Exactly-once with Transactions
    • [Diagram: sink tasks of a Flink job issuing transactional (Txn) writes to Pravega]
    • Transactional writes for job output
    • Executes a 2PC to commit results
    • Option to not use transactions
      – At-least-once semantics
  • 56. Exactly-once with Transactions
    • 2-Phase commit protocol between the Flink master, the sink tasks (S), and Pravega
    • Flink master coordinates checkpointing:
      – Start checkpoint (Prepare)
      – Ack Prepare
      – Complete checkpoint
      – Commit txn
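The prepare and commit steps above can be sketched as a sink that buffers output in a transaction per checkpoint. This is a simplified illustration, not the Pravega connector's code: `TwoPhaseCommitSink` and its methods are hypothetical, transactions live in memory, and recovery is assumed to know the id of the last completed checkpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: a sink that makes output visible only when its checkpoint completes. */
final class TwoPhaseCommitSink {
    private final List<String> committed = new ArrayList<>(); // visible to readers
    private List<String> open = new ArrayList<>();            // current transaction
    // Prepared transactions awaiting commit, keyed by checkpoint id.
    private final TreeMap<Long, List<String>> prepared = new TreeMap<>();

    /** Writes an event into the open transaction; it is not visible yet. */
    void write(String event) {
        open.add(event);
    }

    /** Phase 1 (start checkpoint): seal the open transaction and begin a new one. */
    void prepare(long checkpointId) {
        prepared.put(checkpointId, open);
        open = new ArrayList<>();
    }

    /** Phase 2 (checkpoint complete): commit the transaction prepared for it. */
    void commit(long checkpointId) {
        List<String> txn = prepared.remove(checkpointId);
        if (txn != null) {
            committed.addAll(txn);
        }
    }

    /** Recovery: commit what belongs to completed checkpoints, abort everything else. */
    void recover(long lastCompletedCheckpointId) {
        for (Map.Entry<Long, List<String>> txn : prepared.entrySet()) {
            if (txn.getKey() <= lastCompletedCheckpointId) {
                committed.addAll(txn.getValue());
            }
        }
        prepared.clear();
        open = new ArrayList<>();
    }

    List<String> committed() {
        return new ArrayList<>(committed);
    }

    public static void main(String[] args) {
        TwoPhaseCommitSink sink = new TwoPhaseCommitSink();
        sink.write("a");
        sink.write("b");
        sink.prepare(1);
        sink.commit(1);
        sink.write("c");
        sink.recover(1); // "c" was written after checkpoint 1, so it is aborted
        System.out.println(sink.committed());
    }
}
```

After recovery the job replays its input from the checkpoint, so the aborted output is produced again and committed exactly once.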
  • 57. Demo
  • 58. Demo setup
    • Input files: 0.avro, 1.avro, 2.avro, …, 49.avro
    • Event: { Id: String, Counter: int, Timestamp: byte[] }
    • Ingested to Pravega Stream

      DataStream<Long> sampleStream = env.addSource(source)
          .name("Pravega Stream")
          .timeWindowAll(Time.milliseconds(1))
          .aggregate(...);
      // writeAsText returns a sink, so it is called on the aggregated stream separately
      sampleStream.writeAsText("/tmp/flink-windows.txt");
  • 59. Wrap up
  • 60. Everything is connected
    • Relevant features for stream processing
      – Exactly-once semantics
      – Checkpointing
      – Durability
      – Scaling
      – Watermarking
      – … all connected in different ways
    • Pravega
      – Stream store, open source
      – Supports all above features
    • Pravega + Flink
      – Source + Sink connector
      – Enables reading and writing Pravega streams
  • 61. Q&A
    • Email: fpj@pravega.io, tom.kaitchuck@dell.com
    • Twitter: @ProjectPravega
    • Pravega Web site: http://pravega.io
    • Slack: pravega-io.slack.com
    • GitHub: https://github.com/pravega/pravega
    • Flink connector: https://github.com/pravega/flink-connectors