Flink acceptance testing and state compatibility checking

Catlyn Kong catlynk@yelp.com

Yelp’s Mission
Connecting people with great local businesses

FLINK AT YELP
Powering
● Data Enrichment and Transformation as a Service: StreamSQL manipulations and multi-stream unwindowed joins as a service
● Bot Detection: customized filters and ML features to provide trustworthy data
● User Activity Sessions: multi-platform user activity sessions from event logs

FLINK AT YELP
Powering
● Connector Ecosystem: Cassandra, Elasticsearch, Redshift, S3, MySQL, etc.
● Apache Beam: all Python stream processing

OUTLINE
What you’ll see
● Flink acceptance testing framework (a.k.a. Flink Compose) at Yelp
● Lessons learned
● State compatibility checking using Flink Compose

ACCEPTANCE TESTING
What is acceptance testing?
● A test conducted to determine whether the requirements of a specification or contract are met.
● Often involves orchestrating several services, creating fixture data, and running some type of test driver.

ACCEPTANCE TESTING
Why is it hard?
Too many moving parts!
● Flink Service
● Kafka
● Schema Registry
● Database
● Dependencies

ACCEPTANCE TESTING
Our solution?
● Built on top of yelp-compose, which provides better integration with Yelp infrastructure
● Provides a set of libraries that take care of common tasks
● A great way for developers to verify correctness
● Lowers the testing overhead across applications

ACCEPTANCE TESTING
What does it look like?
[Diagram: Flink Compose Sandbox. The test_script submits the *.jar job to the Flink Standalone Cluster, writes to the Input Test Stream, and reads from the Output Test Stream. Dependencies shown in the sandbox: Kafka and Schema Registration.]

LESSONS LEARNED
Ordering of Operations
Make sure the assumption of ordering is met:
1) Write: the test writes to Kafka
2) Process: the Flink app processes the messages
3) Read: the test reads from Kafka
Ordering of operations is important
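The write, process, read ordering above can be sketched as a small polling helper. This is only an illustration under stated assumptions: `wait_for_messages`, `poll_output`, and the in-memory deques are hypothetical stand-ins for a real Kafka producer and consumer and for the Flink app, not part of Flink Compose.

```python
import time
from collections import deque


def wait_for_messages(poll, expected_count, timeout_s=5.0, interval_s=0.01):
    """Call poll() until expected_count messages arrive or timeout_s elapses."""
    received = []
    deadline = time.monotonic() + timeout_s
    while len(received) < expected_count:
        received.extend(poll())
        if len(received) >= expected_count:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"got {len(received)} of {expected_count} messages")
        time.sleep(interval_s)
    return received


# Hypothetical in-memory stand-ins for the input and output Kafka topics.
input_topic, output_topic = deque(), deque()


def fake_flink_step():
    # 2) Process: move one message from input to output, upper-casing it.
    if input_topic:
        output_topic.append(input_topic.popleft().upper())


def poll_output():
    # Each poll lets the fake app make progress, then drains the output topic.
    fake_flink_step()
    batch = list(output_topic)
    output_topic.clear()
    return batch


# 1) Write
input_topic.extend(["a", "b", "c"])
# 3) Read, retrying until processing has produced all expected results
results = wait_for_messages(poll_output, expected_count=3)
```

Polling with a deadline, rather than reading once right after the write, keeps the test from asserting on output the app has not produced yet.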

LESSONS LEARNED
Deterministic results using Event Time.
SELECT business_id,
       COUNT(*) AS review_count
FROM biz_reviews
GROUP BY business_id,
         TUMBLE(rowtime, INTERVAL '2' MINUTE)
Event time is the timestamp associated with the message; maxOutOfOrderness is 30 sec.
[Diagram: Kafka messages msg1 to msg5 laid out along the Event Time axis.]

LESSONS LEARNED
Deterministic results using Event Time.
Count the number of reviews per business in 2-minute non-overlapping windows.
Event time is the timestamp associated with the message; maxOutOfOrderness is 30 sec.
Kafka messages along the Event Time axis:
● biz_id: 1, review_id: 1, time: 35
● biz_id: 2, review_id: 2, time: 65
● biz_id: 1, review_id: 3, time: 95
● biz_id: 1, review_id: 4, time: 125
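As a worked example of the counts a test should expect for these four messages, the sketch below assigns each message to its 2-minute tumbling window and counts reviews per business and window. It assumes the `time` values are seconds and that windows are aligned to 0; `window_for` is a hypothetical helper, not a Flink API.

```python
from collections import Counter

WINDOW_SIZE_S = 120  # 2-minute tumbling window, in seconds (assumed unit)


def window_for(timestamp_s, size_s=WINDOW_SIZE_S):
    """Return the [start, end) tumbling window containing timestamp_s."""
    start = timestamp_s - (timestamp_s % size_s)
    return (start, start + size_s)


messages = [
    {"biz_id": 1, "review_id": 1, "time": 35},
    {"biz_id": 2, "review_id": 2, "time": 65},
    {"biz_id": 1, "review_id": 3, "time": 95},
    {"biz_id": 1, "review_id": 4, "time": 125},
]

# Equivalent of COUNT(*) grouped by business_id and tumbling window.
review_counts = Counter((m["biz_id"], window_for(m["time"])) for m in messages)

for (biz_id, window), count in sorted(review_counts.items()):
    print(f"biz_id={biz_id} window={window} review_count={count}")
```

Because the assignment depends only on the timestamps carried by the messages, the expected counts are the same on every run.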

LESSONS LEARNED
Event Time
currentWatermark = highestTimestamp - maxOutOfOrderness
[Diagram: messages along the time axis t, with the current watermark (cW) after each message and the window state shown under it.]
Message (biz_id, review_id, time)    cW     Window state shown
biz_id: 1, review_id: 1, time: 35    5      biz_id: 1, # review: 1, window: [0, 120)
biz_id: 2, review_id: 2, time: 65    35     biz_id: 2, # review: 1, window: [0, 120)
biz_id: 1, review_id: 3, time: 95    65     biz_id: 1, # review: 2, window: [0, 120)
biz_id: 1, review_id: 4, time: 125   95     biz_id: 1, # review: 1, window: [120, 240)
biz_id: 1, review_id: 4, time: 155   125    biz_id: 1, # review: 1, window: [120, 240)
Event time? Careful with watermark manipulation!
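The watermark arithmetic on this slide can be checked with a short simulation. It uses the slide's formula (currentWatermark = highestTimestamp - maxOutOfOrderness) and, as a simplifying assumption, treats a window as ready to fire once the watermark reaches the window end; `watermarks` and `first_firing_index` are hypothetical helpers, not Flink APIs.

```python
MAX_OUT_OF_ORDERNESS_S = 30  # from the slides


def watermarks(timestamps, max_out_of_orderness=MAX_OUT_OF_ORDERNESS_S):
    """Watermark after each message: highest timestamp seen minus the bound."""
    highest = float("-inf")
    result = []
    for ts in timestamps:
        highest = max(highest, ts)
        result.append(highest - max_out_of_orderness)
    return result


def first_firing_index(timestamps, window_end):
    """Index of the first message whose watermark reaches window_end, else None.

    Assumption: the window fires once watermark >= window_end.
    """
    for i, wm in enumerate(watermarks(timestamps)):
        if wm >= window_end:
            return i
    return None


times = [35, 65, 95, 125, 155]
wms = watermarks(times)
fires_at = first_firing_index(times, window_end=120)
without_last = first_firing_index(times[:4], window_end=120)
print(wms, fires_at, without_last)
```

Under these assumptions the window [0, 120) only fires after the message with time 155 pushes the watermark to 125. With just the first four messages the watermark stops at 95, so a test waiting for that window's output would wait indefinitely.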

LESSONS LEARNED
Deterministic results using Processing Time.
SELECT business_id,
       COUNT(*) AS review_count
FROM biz_reviews
GROUP BY business_id,
         TUMBLE(proctime, INTERVAL '2' MINUTE)
[Diagram: Kafka messages msg1 to msg5 laid out along the Processing Time axis.]

LESSONS LEARNED
Processing Time
Deterministic results using Processing Time.
Count the number of reviews per business in 2-minute non-overlapping windows.
Kafka messages:
● biz_id: 1, review_id: 1, time: 35
● biz_id: 2, review_id: 2, time: 65
● biz_id: 1, review_id: 3, time: 95
● biz_id: 1, review_id: 4, time: 125

LESSONS LEARNED
Processing Time
[Diagram: the four messages (time: 35, 65, 95, 125) placed on the processing-time axis between two Window Start markers.]
We have no control over when Flink sees the messages.
Proctime? Wait till the start of a window to produce!
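One way to follow the "wait till the start of a window" advice is to compute how long the test should sleep before producing. The sketch below assumes 2-minute tumbling windows aligned to the epoch; `seconds_until_next_window` is a hypothetical helper, not part of Flink or Flink Compose.

```python
def seconds_until_next_window(now_s, window_size_s=120):
    """Seconds to wait so that producing starts at a window boundary.

    Returns 0 when now_s is already exactly on a boundary.
    """
    remainder = now_s % window_size_s
    return 0 if remainder == 0 else window_size_s - remainder


# A test could call time.sleep(seconds_until_next_window(time.time()))
# right before writing its fixture messages.
for now in (119, 125, 240):
    print(now, seconds_until_next_window(now))
```

Producing right after a boundary gives the whole batch the best chance of landing in a single processing-time window, although the test still cannot control exactly when Flink sees each message.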

Best Practices
LESSONS LEARNED
Publish common testing images
Generalize common functionalities
● Setting up consumer & producer
● Schema registration
● flink-clientlib to accommodate upgrades
Run your tests in parallel

Another dimension
STATE COMPATIBILITY

STATE COMPATIBILITY
State compatibility checking
The test_script:
1) Submits job_1 with v1 of the service
2) Cancels job_1 with a savepoint
3) Submits job_2 with v2 of the service
4) Checks for potential issues of state restoration
Just another test
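The first three steps above map onto the standard Flink command-line client. The sketch below only builds the argument lists and does not run them. How Flink Compose actually drives the cluster is not shown in the slides, so this is an assumption; the function name, jar paths, job id, and savepoint locations are hypothetical placeholders.

```python
def compatibility_check_commands(v1_jar, v2_jar, job_id, savepoint_dir, savepoint_path):
    """Build the Flink CLI calls for a v1 -> savepoint -> v2 restore check.

    In a real run, job_id comes from the output of the first command and
    savepoint_path from the output of the second; both are passed in here
    so the sketch stays side-effect free.
    """
    return [
        # 1) Submit job_1 with v1 of the service (detached).
        ["flink", "run", "-d", v1_jar],
        # 2) Cancel job_1 with a savepoint written under savepoint_dir.
        #    Newer Flink versions prefer "flink stop" for this step.
        ["flink", "cancel", "-s", savepoint_dir, job_id],
        # 3) Submit job_2 with v2 of the service, restoring from the savepoint.
        ["flink", "run", "-d", "-s", savepoint_path, v2_jar],
    ]


commands = compatibility_check_commands(
    v1_jar="service-v1.jar",              # hypothetical artifact names
    v2_jar="service-v2.jar",
    job_id="<job_1 id>",                  # placeholder, read from CLI output
    savepoint_dir="/tmp/savepoints",
    savepoint_path="/tmp/savepoints/<savepoint>",
)
for cmd in commands:
    print(" ".join(cmd))
```

Step 4 would then inspect the exit status and logs of the last command: if job_2 cannot restore from the savepoint, the state of v2 is not compatible with v1.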

CI/CD integration
STATE COMPATIBILITY

WHAT’S NEXT
Looking Forward
● Ensure every stateful service is guarded by a compatibility check
● Leverage the state API, available starting from Flink 1.9, for smoother state migration
● Automate test message generation
● Provide test template generation

@YelpEngineering
fb.com/YelpEngineers
engineeringblog.yelp.com
github.com/yelp

Questions/Suggestions?
catlynk@yelp.com
