This document discusses Flink acceptance testing and state compatibility checking at Yelp. It describes how Yelp built a testing framework called Flink Compose on top of Yelp-compose to make it easier to test Flink jobs. The framework handles common tasks like setting up dependencies and allows submitting jobs to a Flink cluster. It also discusses lessons learned around ordering of operations, event vs processing time, and best practices like publishing common images and running tests in parallel. The document concludes by discussing how state compatibility checking is another important test to check for issues when restoring state from a savepoint during upgrades.
3. FLINK AT YELP
Powering
Data Enrichment and Transformation as a Service
● StreamSQL manipulations and multi-stream unwindowed joins as a service
Bot Detection
● Customized filters and ML features to provide trustworthy data
User Activity Sessions
● Multi-platform user activity sessions from event logs
4. FLINK AT YELP
Powering
Connector Ecosystem
● Cassandra, Elasticsearch, Redshift, S3, MySQL, etc.
Apache Beam
● All Python stream processing
5. OUTLINE
What you’ll see
● Flink acceptance testing framework (a.k.a. Flink Compose) at Yelp
● Lessons learned
● State compatibility checking using Flink Compose
7. ACCEPTANCE TESTING
What is acceptance testing?
A test conducted to determine if the requirements of a specification or contract are met.
Often involves orchestrating several services, creating fixture data, and running some type of test driver.
8. ACCEPTANCE TESTING
Why is it hard?
Too many moving blocks!
● Flink Service
● Kafka
● Schema Registry
● Database
● Dependencies
9. ACCEPTANCE TESTING
Our solution?
● Built on top of yelp-compose, which provides better integration with Yelp infrastructure
● Provides a set of libraries that take care of common tasks
● Great way for developers to verify correctness
● Lowers the overhead across applications
10. ACCEPTANCE TESTING
What does it look like?
[Diagram: Flink Compose Sandbox. The test_script submits the *.jar job to the Flink Standalone Cluster, writes to the Input Test Stream, and reads from the Output Test Stream. Dependencies shown in the sandbox: Kafka, Schema Registration.]
13. LESSONS LEARNED
Deterministic results using Event Time.
SELECT business_id,
       COUNT(*) AS review_count
FROM biz_reviews
GROUP BY business_id,
         TUMBLE(rowtime, INTERVAL '2' MINUTE)
Event time is the timestamp associated with the message;
maxOutOfOrderness is 30 sec.
[Diagram: Kafka messages msg1 to msg5 laid out along an Event Time axis]
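To make the role of maxOutOfOrderness concrete, here is a minimal sketch of a bounded-out-of-orderness watermark. It is a simplified model, not Flink's implementation: it assumes the watermark is the highest event time seen minus maxOutOfOrderness, that a window fires once the watermark reaches the window end, and it ignores millisecond-level details. The event times (in seconds) and function names are hypothetical.

```python
MAX_OUT_OF_ORDERNESS = 30  # seconds, as stated on the slide
WINDOW_SIZE = 120          # seconds, the 2-minute tumbling window


def watermark_after(event_times, max_out_of_orderness=MAX_OUT_OF_ORDERNESS):
    """Simplified watermark: highest event time seen minus the allowed lateness."""
    return max(event_times) - max_out_of_orderness


def window_has_fired(window_end, event_times):
    """In this model a window fires once the watermark reaches its end."""
    return watermark_after(event_times) >= window_end


# Hypothetical event times, in seconds.
seen = [35, 65, 95, 125]

# Watermark is 125 - 30 = 95, so the window ending at 120 is still open.
fired_before = window_has_fired(WINDOW_SIZE, seen)

# A later message at time 150 moves the watermark to 120 and closes it.
fired_after = window_has_fired(WINDOW_SIZE, seen + [150])
```

Under these assumptions, a test that expects output for the first window may need to produce one extra message with a late enough timestamp to push the watermark past the window end.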
14. LESSONS LEARNED
Deterministic results using Event Time.
Count # reviews for biz in 2-minute non-overlapping windows.
Event time is the timestamp associated with the message;
maxOutOfOrderness is 30 sec.
Kafka messages along the Event Time axis:
● biz_id: 1, review_id: 1, time: 35
● biz_id: 2, review_id: 2, time: 65
● biz_id: 1, review_id: 3, time: 95
● biz_id: 1, review_id: 4, time: 125
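The following sketch works through the expected counts for the four messages on this slide. It is plain Python rather than Flink, and it assumes the `time` field is in seconds and that 2-minute windows are aligned to multiples of 120; the helper names are hypothetical. The point is that with event time the window each message falls into depends only on its timestamp, so the result is the same whatever the arrival order.

```python
from collections import Counter

WINDOW_SIZE = 120  # seconds (2-minute tumbling window)

messages = [
    {"biz_id": 1, "review_id": 1, "time": 35},
    {"biz_id": 2, "review_id": 2, "time": 65},
    {"biz_id": 1, "review_id": 3, "time": 95},
    {"biz_id": 1, "review_id": 4, "time": 125},
]


def window_start(event_time, size=WINDOW_SIZE):
    """Start of the tumbling window that contains event_time."""
    return event_time - (event_time % size)


def count_reviews(msgs):
    """Count reviews per (biz_id, window_start) using event time only."""
    counts = Counter()
    for msg in msgs:
        counts[(msg["biz_id"], window_start(msg["time"]))] += 1
    return dict(counts)


in_order = count_reviews(messages)
shuffled = count_reviews(list(reversed(messages)))
# Both are {(1, 0): 2, (2, 0): 1, (1, 120): 1}
```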
16. LESSONS LEARNED
Deterministic results using Processing Time.
SELECT business_id,
       COUNT(*) AS review_count
FROM biz_reviews
GROUP BY business_id,
         TUMBLE(proctime, INTERVAL '2' MINUTE)
[Diagram: Kafka messages msg1 to msg5 laid out along a Processing Time axis]
17. LESSONS LEARNED
Deterministic results using Processing Time.
Count # reviews for biz in 2-minute non-overlapping windows.
Kafka messages along the Processing Time axis:
● biz_id: 1, review_id: 1, time: 35
● biz_id: 2, review_id: 2, time: 65
● biz_id: 1, review_id: 3, time: 95
● biz_id: 1, review_id: 4, time: 125
18. LESSONS LEARNED
Kafka messages along the Processing Time axis:
● biz_id: 1, review_id: 1, time: 35
● biz_id: 2, review_id: 2, time: 65
● biz_id: 1, review_id: 3, time: 95
● biz_id: 1, review_id: 4, time: 125
[Diagram: two Window Start markers on the Processing Time axis; the proctime at which each message is seen is unknown]
We have no control over when Flink sees the messages.
Wait till the start of a window to produce!
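One way a test driver could act on this advice is sketched below. It assumes processing-time windows are aligned to multiples of the window size on the wall clock, and that the test's clock is close enough to the Flink cluster's clock; the function names are hypothetical and not part of Flink Compose.

```python
import time

WINDOW_SIZE = 120  # seconds (2-minute tumbling window)


def seconds_until_next_window(now, size=WINDOW_SIZE):
    """Seconds from `now` until the next window boundary (0 if on a boundary)."""
    remainder = now % size
    return 0.0 if remainder == 0 else size - remainder


def wait_for_window_start(clock=time.time, sleep=time.sleep, size=WINDOW_SIZE):
    """Block until the next window boundary; returns how long it waited."""
    delay = seconds_until_next_window(clock(), size)
    sleep(delay)
    return delay
```

A test would call `wait_for_window_start()` and then produce all of its messages right away, so that they are likely to be processed inside a single window. The `clock` and `sleep` parameters are injectable so that the helper itself can be tested without real waiting.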
19. LESSONS LEARNED
Best Practices
Publish common testing images
Generalize common functionalities
● Setting up consumer & producer
● Schema registration
● flink-clientlib to accommodate upgrades
Run your tests in parallel
24. WHAT’S NEXT
Looking Forward
● Ensure every stateful service is guarded by a compatibility check
● Leverage the state API starting from Flink 1.9 for smoother state migration
● Automate test message generation
● Provide test template generation
Since the introduction of Flink in 2017, real-time data processing has really taken off at Yelp. We now have tens of services running thousands of Flink jobs that drive all aspects of business value.
The last thing we want is for any of them to catch fire. No piece of software is problem-free, but there are things we can do to minimize problems.
According to Wikipedia …, in the context of software development ...
This provides us a sanitized environment to run tests in a repeatable manner without polluting production traffic.
maxOutOfOrderness is 30 seconds
Stateful applications have yet another dimension to test: can I successfully resume from my savepoint if I switch to a new version of my app? Without the ability to control what’s stored in the state, it is sometimes hard to reason about whether certain changes break state compatibility. We need a mechanism to catch such changes.
By extending the existing acceptance testing framework, this is as easy as adding another test.
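As a rough illustration of what such a test could drive, the sketch below builds the sequence of Flink CLI invocations for a savepoint round trip: run the old version, take a savepoint, cancel the job, and start the new version from that savepoint. This is not Yelp's implementation. The function names, jar names, job ID, and paths are hypothetical placeholders; only the `flink run`, `flink savepoint`, and `flink cancel` subcommands and the `-d` and `-s` options come from the Flink CLI.

```python
def savepoint_command(job_id, target_dir):
    """CLI call that triggers a savepoint for a running job."""
    return ["flink", "savepoint", job_id, target_dir]


def restore_command(savepoint_path, jar_path):
    """CLI call that submits a jar in detached mode, restoring from a savepoint."""
    return ["flink", "run", "-d", "-s", savepoint_path, jar_path]


def compatibility_check_plan(old_jar, new_jar, job_id, savepoint_dir, savepoint_path):
    """Ordered commands for a state compatibility check (placeholders only).

    In a real test, job_id and savepoint_path would be read from the output
    of the earlier commands rather than passed in up front.
    """
    return [
        ["flink", "run", "-d", old_jar],            # 1. start the old version
        savepoint_command(job_id, savepoint_dir),   # 2. snapshot its state
        ["flink", "cancel", job_id],                # 3. stop the old version
        restore_command(savepoint_path, new_jar),   # 4. resume with the new one
    ]


plan = compatibility_check_plan(
    old_jar="app-v1.jar",
    new_jar="app-v2.jar",
    job_id="<job-id>",
    savepoint_dir="/tmp/savepoints",
    savepoint_path="/tmp/savepoints/<savepoint>",
)
```

If the final submission fails to restore, or the job fails shortly after starting, the new version is likely not state compatible with the old one.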