45. Natural Choice: Apache Kafka
- Low latency and high throughput
- Persistent events
- Distributes a topic across partitions
- Organizes consumers into consumer groups
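
The partition and consumer-group bullets above can be sketched in a few lines of Python. This is only a toy model, not Kafka client code: `partition_for` and `assign_partitions` are hypothetical names, CRC32 stands in for the hash (Kafka's default partitioner uses murmur2), and round-robin is just one of several possible assignment strategies.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition (stand-in hash, not Kafka's murmur2)."""
    return zlib.crc32(key) % num_partitions


def assign_partitions(partitions, consumers):
    """Round-robin the partitions over the members of ONE consumer group.

    Each partition ends up owned by exactly one consumer in the group.
    """
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


if __name__ == "__main__":
    # Same key always lands on the same partition, preserving per-key order.
    print(partition_for(b"user-1", 4) == partition_for(b"user-1", 4))
    print(assign_partitions(range(4), ["c1", "c2"]))
```

Because a key always maps to the same partition, and a partition is read by only one consumer in a group, events for one key are processed in order by a single consumer.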
56. Why Apache Samza?
- Processing DAG built on Kafka topics
- Excellent integration with Kafka
- Built-in checkpointing
- Built-in state management
- Excellent support from our data team
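
To illustrate what checkpointing and state management buy us, here is a minimal Python sketch of a task that counts keys, periodically saves its position and state, and resumes after a restart. It is a conceptual model only and does not use the Samza API: `CountingTask`, `process`, and `commit_every` are hypothetical names, and saving the offset and the state together in one tuple is a simplification.

```python
class CountingTask:
    """Counts keys from an append-only log and resumes from a checkpoint."""

    def __init__(self, checkpoint=None):
        # A checkpoint is a (next_offset, state) pair saved by an earlier run.
        self.next_offset, state = checkpoint or (0, {})
        self.counts = dict(state)

    def process(self, log, commit_every=2):
        """Consume the log from the saved offset; return the last checkpoint."""
        checkpoint = (self.next_offset, dict(self.counts))
        for offset in range(self.next_offset, len(log)):
            key = log[offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.next_offset = offset + 1
            if self.next_offset % commit_every == 0:
                # Snapshot position and state together so a replay
                # never counts a message twice.
                checkpoint = (self.next_offset, dict(self.counts))
        return checkpoint


if __name__ == "__main__":
    log = ["a", "b", "a", "c", "a"]
    saved = CountingTask().process(log[:3])  # "crash" after three messages
    restarted = CountingTask(saved)          # resume from the checkpoint
    restarted.process(log)
    print(restarted.counts)
```

The restarted task replays only the messages after the checkpointed offset, so the final counts match a run that never failed.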
81. We Use Lambda
- Spark + HDFS/S3 for batch processing
- Yes, it is painful, but:
- We may need to go far back in history when business requirements change
- Batch processing can run faster; batch and stream jobs scale differently
- It was not easy to start a new stream processing instance
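
The Lambda setup above can be sketched as two views that are merged at query time. This is a toy Python model under stated assumptions: events are `(timestamp, key)` pairs, `cutoff` marks how far the batch job has processed, and `batch_view`, `speed_view`, and `query` are hypothetical names rather than parts of Spark or Samza.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Batch layer: recompute counts from full history up to the cutoff."""
    return Counter(key for ts, key in events if ts < cutoff)


def speed_view(events, cutoff):
    """Speed layer: count only the recent events the batch job has not seen."""
    return Counter(key for ts, key in events if ts >= cutoff)


def query(key, batch, speed):
    """Serving layer: merge both views to answer a query."""
    return batch[key] + speed[key]


if __name__ == "__main__":
    events = [(1, "click"), (2, "view"), (3, "click"), (5, "click")]
    batch = batch_view(events, cutoff=4)
    speed = speed_view(events, cutoff=4)
    print(query("click", batch, speed))
```

When business logic changes, only `batch_view` has to be rerun over the stored history; the speed layer keeps covering the gap until the batch job catches up.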
84. Dealing with Limitations of Samza
- No broadcasting. We have to override SystemStreamPartitionGrouper
- No dynamic topology. Can’t have an arbitrary number of nested CEP queries
- Tedious configuration and deployment of jobs. We use an in-house code-gen and deployment solution
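
The broadcasting workaround can be pictured as a grouper that hands every partition of a broadcast stream to every task. The Python below is a conceptual model of that idea, not the Samza Java interface and not our production grouper: `group_with_broadcast`, the `(stream, partition)` tuples, and the task naming are all assumptions made for illustration.

```python
from collections import defaultdict


def group_with_broadcast(ssps, broadcast_streams):
    """Group (stream, partition) pairs into tasks by partition id.

    Partitions of streams listed in broadcast_streams are added to
    every task, so each task sees all broadcast messages.
    """
    tasks = defaultdict(set)
    broadcast = set()
    for stream, partition in ssps:
        if stream in broadcast_streams:
            broadcast.add((stream, partition))
        else:
            tasks[f"Partition {partition}"].add((stream, partition))
    return {name: members | broadcast for name, members in tasks.items()}


if __name__ == "__main__":
    ssps = [("events", 0), ("events", 1), ("rules", 0)]
    for task, members in sorted(group_with_broadcast(ssps, {"rules"}).items()):
        print(task, sorted(members))
```

Here the hypothetical `rules` stream is consumed by both tasks, while each `events` partition still belongs to exactly one task.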