Storm makes it easy to write and scale complex realtime computations on a cluster of computers, doing for realtime processing what Hadoop did for batch processing. Storm guarantees that every message will be processed. And it’s fast — you can process millions of messages per second with a small cluster. Best of all, you can write Storm topologies using any programming language. Storm was open-sourced by Twitter in September of 2011 and has since been adopted by many companies around the world.
Storm has a wide range of use cases, from stream processing to continuous computation to distributed RPC. In this talk I'll introduce Storm and show how easy it is to use for realtime computation.
2. Basic info
• Open sourced September 19th
• Implementation is 12,000 lines of code
• Used by over 25 companies
• >2280 watchers on Github (most watched
JVM project)
• Very active mailing list
• >1700 messages
• >520 members
26. Stream grouping
• Shuffle grouping: pick a random task
• Fields grouping: mod hashing on a
subset of tuple fields
• All grouping: send to all tasks
• Global grouping: pick task with lowest id
72. • Will be available in next version of Storm
(0.7.0)
• Requires a source queue that can replay
identical batches of messages
• Aiming for first TransactionalSpout
implementation to use Kafka
Transactional topologies