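23. [Diagram: Sentence Spout → Split Sentence Bolt → Word Count Bolt. The spout emits ["the cow jumped over the moon"]; the split bolt emits ["the"], ["cow"], ["jumped"], ["over"], ["the"], ["moon"]; the count bolt emits ["the",1], ["cow",1], ["jumped",1], ["over",1], ["the",2], ["moon",1]. Together these tuples form the tuple tree.]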
Storm considers a tuple coming off a spout
"fully processed" when the tuple tree has been exhausted
and every message in the tree has been processed
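As a minimal sketch of this definition, the tuple tree can be modeled with plain Java objects. The names TupleNode, child, and fullyProcessed are hypothetical and not part of the Storm API, and Storm itself does not keep the whole tree in memory like this; the sketch only illustrates what "fully processed" means.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one tuple in a tuple tree; not part of the Storm API.
final class TupleNode {
    final String value;
    final List<TupleNode> children = new ArrayList<>();
    boolean processed = false;

    TupleNode(String value) {
        this.value = value;
    }

    // Adds a tuple derived from this one and returns it.
    TupleNode child(String childValue) {
        TupleNode c = new TupleNode(childValue);
        children.add(c);
        return c;
    }

    // The tree is fully processed only when this tuple and every descendant are processed.
    boolean fullyProcessed() {
        if (!processed) {
            return false;
        }
        for (TupleNode c : children) {
            if (!c.fullyProcessed()) {
                return false;
            }
        }
        return true;
    }
}

final class TupleTreeDemo {
    public static void main(String[] args) {
        TupleNode sentence = new TupleNode("the cow jumped over the moon");
        sentence.processed = true;
        TupleNode cow = sentence.child("cow");
        TupleNode cowCount = cow.child("cow,1");
        cow.processed = true;
        System.out.println(sentence.fullyProcessed()); // false: the count tuple is still pending
        cowCount.processed = true;
        System.out.println(sentence.fullyProcessed()); // true
    }
}
```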
24. Tuple Lifecycle (API Layer)
[Diagram: the same Sentence Spout → Split Sentence Bolt → Word Count Bolt tuple tree as on slide 23.]
A tuple coming off of a spout:
collector.emit("split", new Values("the cow jumped over the moon"), 1)
"split" is the stream-id: the tuple is emitted to one of the output streams.
1 is the msgId: it is used to identify the tuple later.
25. Tuple Lifecycle (API Layer)
[Diagram: the same tuple tree as on slide 23.]
collector.emit("split", new Values("the cow jumped over the moon"), 1)
Tuple tree fully processed (we'll talk about this later).
26. Tuple Lifecycle (API Layer)
[Diagram: the same tuple tree as on slide 23, with two tuples marked × (failed).]
collector.emit("split", new Values("the cow jumped over the moon"), 1)
Tuple tree failed (time-out).
29. Tuple Lifecycle (Program Layer)
YOU:
1. tell Storm whenever you're creating a new link in the tree of tuples
2. tell Storm when you have finished processing an individual tuple
Storm:
1. can detect when the tree of tuples is fully processed
2. can ack or fail the spout tuple appropriately.
[Diagram: Kestrel/Kafka → Sentence Spout → Split Sentence Bolt → Word Count Bolt. The spout tuple ["the cow jumped over the moon"] is the input tuple of the split bolt; the six word tuples are its output tuples, each marked "anchored".]
Each word tuple is anchored to the sentence tuple.
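The two things YOU do (anchor on emit, ack when done) and the one thing Storm does (detect completion) can be sketched in plain Java. TrackedTuple and SketchCollector are hypothetical names, not Storm classes. In a real bolt the corresponding calls are, to the best of my knowledge, OutputCollector.emit(Tuple anchor, List<Object> tuple) and OutputCollector.ack(Tuple). The sketch simply counts un-acked tuples per spout tuple, which is an assumption made for readability and not how Storm tracks trees internally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a tuple; "root" is the spout tuple at the root of its tree.
final class TrackedTuple {
    final String value;
    final TrackedTuple root;

    TrackedTuple(String value, TrackedTuple root, boolean isSpoutTuple) {
        this.value = value;
        this.root = isSpoutTuple ? this : root;
    }
}

// Hypothetical collector that counts the un-acked tuples in each spout tuple's tree.
final class SketchCollector {
    private final Map<TrackedTuple, Integer> pending = new HashMap<>();

    // The spout tuple starts a new tree with one pending tuple: itself.
    TrackedTuple emitFromSpout(String value) {
        TrackedTuple t = new TrackedTuple(value, null, true);
        pending.put(t, 1);
        return t;
    }

    // Anchored emit: the new tuple becomes a new link in the anchor's tree.
    TrackedTuple emit(TrackedTuple anchor, String value) {
        TrackedTuple child = new TrackedTuple(value, anchor.root, false);
        pending.merge(child.root, 1, Integer::sum);
        return child;
    }

    // Ack: this individual tuple has been processed.
    void ack(TrackedTuple t) {
        pending.merge(t.root, -1, Integer::sum);
    }

    // The tree is fully processed when no tuple in it is still pending.
    boolean fullyProcessed(TrackedTuple spoutTuple) {
        return pending.getOrDefault(spoutTuple, 0) == 0;
    }
}

final class AnchoringDemo {
    public static void main(String[] args) {
        SketchCollector collector = new SketchCollector();
        TrackedTuple sentence = collector.emitFromSpout("the cow");
        TrackedTuple the = collector.emit(sentence, "the");
        TrackedTuple cow = collector.emit(sentence, "cow");
        collector.ack(sentence); // acked only after both word tuples were emitted
        collector.ack(the);
        System.out.println(collector.fullyProcessed(sentence)); // false: "cow" is still pending
        collector.ack(cow);
        System.out.println(collector.fullyProcessed(sentence)); // true
    }
}
```

Because the input tuple is acked only after its output tuples have been emitted, the pending count cannot reach zero while work is still outstanding.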
30. Tuple Lifecycle (Program Layer)
[Diagram: same topology as on slide 29. Each word tuple is the input tuple of the Word Count Bolt, and each word-count tuple is its output tuple, marked "anchored".]
Each word-count tuple is anchored to its word tuple.
31. Tuple Lifecycle (Program Layer)
[Diagram: same anchored tuple tree; one word tuple is marked ✅.]
ack word tuple: ["the"]
32. Tuple Lifecycle (Program Layer)
[Diagram: same anchored tuple tree; two word tuples are marked ✅.]
ack word tuple: ["cow"]
33. Tuple Lifecycle (Program Layer)
[Diagram: same anchored tuple tree; all six word tuples are marked ✅.]
ack word tuple: ["moon"]
34. Tuple Lifecycle (Program Layer)
[Diagram: same anchored tuple tree; the six word tuples and the sentence tuple (the input tuple) are marked ✅.]
ack sentence tuple: ["the cow jumped over the moon"]
The input tuple is acked after all the word tuples are emitted.
35. Tuple Lifecycle (Program Layer)
[Diagram: same anchored tuple tree; the six word tuples and the sentence tuple are marked ✅.]
Tuple tree fully processed: ack(msgId=1)
36. Tuple Lifecycle (Program Layer)
[Diagram: same as slide 35, except that the message ["the cow jumped over the moon"] is no longer shown next to Kestrel/Kafka.]
Tuple tree fully processed: ack(msgId=1)
37. Tuple Lifecycle (Program Layer)
[Diagram: same anchored tuple tree; five tuples are marked ✅ and a failed tuple is marked ×.]
this.collector.fail(tuple)
Tuple tree failed: fail(msgId=1)
Since the word tuple is anchored, the spout tuple at the root of the tree
will be replayed later on if the word tuple fails to be processed downstream.
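One way a spout might use the msgId to support replay is sketched below in plain Java. ReplayingSource and its methods are hypothetical; a real spout would do this bookkeeping inside the nextTuple(), ack(Object msgId), and fail(Object msgId) callbacks of Storm's ISpout interface, and the queue here merely stands in for Kestrel/Kafka.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical spout-side bookkeeping; the queue stands in for Kestrel/Kafka.
final class ReplayingSource {
    private final Queue<String> queue = new ArrayDeque<>();
    private final Map<Integer, String> pending = new HashMap<>();
    private int nextMsgId = 1;

    void offer(String message) {
        queue.add(message);
    }

    // Takes the next message, remembers it under a fresh msgId, and returns that id (-1 if empty).
    int nextTuple() {
        String message = queue.poll();
        if (message == null) {
            return -1;
        }
        int msgId = nextMsgId++;
        pending.put(msgId, message);
        return msgId;
    }

    // Tuple tree fully processed: the message can be forgotten.
    void ack(int msgId) {
        pending.remove(msgId);
    }

    // Tuple tree failed or timed out: put the message back so it is emitted again.
    void fail(int msgId) {
        String message = pending.remove(msgId);
        if (message != null) {
            queue.add(message);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    int queuedCount() {
        return queue.size();
    }
}

final class ReplayDemo {
    public static void main(String[] args) {
        ReplayingSource source = new ReplayingSource();
        source.offer("the cow jumped over the moon");
        int first = source.nextTuple();
        source.fail(first);               // a word tuple failed downstream
        int second = source.nextTuple();  // the same sentence is emitted again
        source.ack(second);
        System.out.println(source.pendingCount() + " pending, " + source.queuedCount() + " queued");
    }
}
```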
41. ONE MORE THING
A common bolt pattern:
+ reading an input tuple,
+ emitting tuples based on it,
+ and then acking the tuple at the end of execute().
Every tuple you process must be acked or failed.
Storm uses memory to track each tuple, so if you don't
ack/fail every tuple, the task will eventually run out of memory (OOM).
For bolts that follow this pattern, STORM DOES IT FOR YOU!
YOU DON'T NEED to pay attention to Anchor & Ack anymore ✅
66. [Diagram: Spout → Bolt1 → Bolt2 → Bolt3 pass tuple1, tuple2, and tuple3 along via emit. Acks go to the AckerBolt, which keeps an ack_value for the tuple tree 🌲 and decides between ack() and fail() for the spout.]
67-72. [Diagram: further animation steps of the slide 66 diagram, with the same labels.]
73. How Storm Implements Acker…
How does Storm implement reliability in an efficient way?
74. A Storm topology has a set of special "acker" tasks
that track the DAG of tuples for every spout tuple.
When an acker sees that a DAG is complete,
it sends a message to the spout task
that created the spout tuple to ack the message.
1. An acker can have many tasks, just like a spout or bolt.
2. The DAG of tuples is a tuple tree, which
3. is generated from a spout tuple (emitted by one of the spout tasks).
4. The spout tuple is associated with a message id.
5. When all tuples in the tuple tree are fully processed,
6. the acker sends a message to the spout task from #3.
7. The spout can then ack the message associated with that spout tuple.
75. The best way to understand Storm's reliability implementation is to look at the lifecycle of tuples and tuple DAGs.
When a tuple is created in a topology, whether in a spout or a bolt, it is given a random 64-bit id. These ids are
used by ackers to track the tuple DAG for every spout tuple.
Every tuple knows the ids of all the spout tuples for which it exists in their tuple trees (the root tuple id of a
tuple tree is fixed). When you emit a new tuple in a bolt, the spout tuple ids from the tuple's anchors are copied
into the new tuple. When a tuple is acked, it sends a message to the appropriate acker tasks with information about
how the tuple tree changed. In particular it tells the acker "I am now completed within the tree for this spout
tuple, and here are the new tuples in the tree that were anchored to me".
76. When a tuple is acked, it sends a message to the appropriate acker tasks
with information about how the tuple tree changed. In particular it tells the acker
"I am now completed within the tree for this spout tuple,
and here are the new tuples in the tree that were anchored to me"
For example, if tuples "D" and "E" were created based on tuple "C",
here's how the tuple tree changes when "C" is acked:
Since "C" is removed from the tree at the same time that "D" and "E" are added to it,
the tree can never be prematurely completed.
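The "C", "D", "E" example can be sketched with the XOR-based "ack val" that Storm's documentation describes: the acker keeps one 64-bit value per spout tuple, XORs every reported tuple id into it, and treats zero as "tree complete". AckerSketch and the ids below are hypothetical illustration values (real ids are random 64-bit numbers), and this is a simplified model, not Storm's acker code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical acker model: one 64-bit "ack val" per spout tuple id.
final class AckerSketch {
    private final Map<Long, Long> ackVals = new HashMap<>();

    // Called when the spout emits: the tree starts with just that tuple's id.
    void init(long spoutTupleId, long tupleId) {
        ackVals.put(spoutTupleId, tupleId);
    }

    // Called when a bolt acks: xorUpdate = acked tuple id XOR ids of all tuples anchored to it.
    // Returns true when the ack val reaches zero, meaning the tree is complete.
    boolean update(long spoutTupleId, long xorUpdate) {
        long value = ackVals.merge(spoutTupleId, xorUpdate, (a, b) -> a ^ b);
        if (value == 0L) {
            ackVals.remove(spoutTupleId);
            return true;
        }
        return false;
    }
}

final class AckerDemo {
    public static void main(String[] args) {
        long root = 1L;                         // spout tuple id (illustrative)
        long c = 0x1111L, d = 0x2222L, e = 0x4444L; // tuple ids (illustrative)
        AckerSketch acker = new AckerSketch();
        acker.init(root, c);
        // Acking "C" removes C and adds D and E in a single update.
        System.out.println(acker.update(root, c ^ d ^ e)); // false: D and E are pending
        System.out.println(acker.update(root, d));         // false: E is pending
        System.out.println(acker.update(root, e));         // true: tree complete
    }
}
```

Since C's removal and the addition of D and E arrive as one XOR update, there is no moment at which the ack val is zero while D and E are still outstanding.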
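1. A bolt does not send a message to the acker when it emits; it sends one only when it acks.
2. At ack time, the bolt knows the id of the input tuple being acked and the ids of all output tuples produced by its emits.
3. So at ack time it can first combine the input tuple id with the ids of all emitted output tuples,
and only then send a message to the acker.
4. When the acker receives the bolt's ack message, it combines its current ack val with the received message;
the result represents how the tuple tree has changed.
5. Once a bolt has acked an input tuple, nothing needs to be kept for any tuple
from that input tuple back up to the root tuple.
The acker only needs to track the most recently emitted output tuples.
Why is there no need to record ancestor tuple ids (not only the spout tuple id, but also the upstream input tuples)?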
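The bolt-side half of this can be sketched as follows: emits only update a local accumulator, and the ack produces the single value that would be sent to the acker. BoltAckBuffer, onEmit, and onAck are hypothetical names, and XOR is assumed as the combining operation, following the ack val description in Storm's documentation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bolt-side buffer: input tuple id -> XOR of the ids of tuples anchored to it.
final class BoltAckBuffer {
    private final Map<Long, Long> pendingXor = new HashMap<>();

    // Emit: no message goes to the acker; the new tuple's id is only folded in locally.
    void onEmit(long inputId, long outputId) {
        pendingXor.merge(inputId, outputId, (a, b) -> a ^ b);
    }

    // Ack: returns the one value to send to the acker (input id XOR all output ids),
    // after which nothing about the input tuple needs to be kept.
    long onAck(long inputId) {
        Long outputs = pendingXor.remove(inputId);
        return inputId ^ (outputs == null ? 0L : outputs);
    }
}

final class BoltAckDemo {
    public static void main(String[] args) {
        long c = 0x1111L, d = 0x2222L, e = 0x4444L; // illustrative tuple ids
        BoltAckBuffer buffer = new BoltAckBuffer();
        buffer.onEmit(c, d);
        buffer.onEmit(c, e);
        long message = buffer.onAck(c);
        System.out.println(message == (c ^ d ^ e)); // true
    }
}
```

Because the acker only ever holds the XOR of outstanding ids, acked ancestors leave no trace in it, which is one way to read the closing question above.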