Storm In Pictures
http://zqhxuyuan.github.io/
2016-7-15
Storm's basic building blocks (What Makes Storm)
DAG
Tuple Tuple Tuple Tuple Tuple
Stream
Spout Bolt
Topology、Stream、Spout、Bolt
network of spouts and bolts
DAG
Topology、Stream、Spout、Bolt
unbounded sequence of tuples
Tuple Tuple Tuple Tuple Tuple Tuple Tuple Tuple Tuple
Topology、Stream、Spout、Bolt
Source of Stream
Topology、Stream、Spout、Bolt
Processes input streams, produces new streams; Sink
Topology、Stream、Spout、Bolt
Processes input streams, produces new streams
Message/Tuple Transform
Tuple Tuple Tuple … (diagram: tuples being transformed as they flow through the topology)
The lifecycle of a Tuple
1. Emitted by the Spout
2. Flows through the Stream
3. Processed by a Bolt
4. Re-emitted by the Bolt
5. Enters the message stream again
6. Until it is fully processed
①
②
③
④
⑤
⑥
Tuple
Tuple
Tuple
Tuple
Tuple
Tuple
✖️
✖️
✖️
✖️
✖️
Guaranteeing Message Processing
1. At Least Once: Acker
2. Exactly Once: Trident
If message processing fails, how does Storm make sure the message is processed again?
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
["the cow jumped
over the moon"]
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
Storm considers a tuple coming off a spout
"fully processed" when the tuple tree has been exhausted
and every message in the tree has been processed
tuple tree
🐂 (the cow soars sky-high)
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
["the cow jumped
over the moon"]
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
collector.emit("split", new Values("the cow jumped over the moon"), 1)
stream-id: emit a tuple to one of the output streams
msgId: used to identify the tuple later
Tuple Lifecycle(API Layer)
a tuple coming off of a spout
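For orientation, a minimal sketch of where an emit call like the one above could sit inside a spout; the class name, stream name, and field name below are illustrative assumptions, not taken from the deck.

import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class SentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // "split" is the stream-id; the trailing 1 is the msgId Storm hands back
        // to ack(1)/fail(1) once the tuple tree succeeds or fails
        collector.emit("split", new Values("the cow jumped over the moon"), 1);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream("split", new Fields("sentence"));
    }
}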
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
["the cow jumped
over the moon"]
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
collector.emit("split", new Values("the cow jumped over the moon"), 1)
tuple tree fully processed
Tuple Lifecycle(API Layer)
we'll talk about this later
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
["the cow jumped
over the moon"]
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
collector.emit("split", new Values("the cow jumped over the moon"), 1)
tuple tree failed (time-out)
×
×
Tuple Lifecycle(API Layer)
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
Kestrel
/Kafka
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
Kestrel
/Kafka
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
Kestrel
/Kafka
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
Kestrel
/Kafka
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
Kestrel
/Kafka
ack(1)
tuple's msgId=1
take the message off the queue
Tuple Lifecycle(State Machine)
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
Kestrel
/Kafka
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
Kestrel
/Kafka
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
Kestrel
/Kafka
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
Kestrel
/Kafka
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
Kestrel
/Kafka
×
fail(1): put the message back on the queue
tuple's msgId=1
Tuple Lifecycle(State Machine)
1. tell Storm whenever you're creating a new link in the tree of tuples
2. tell Storm when you have finished processing an individual tuple
1. can detect when the tree of tuples is fully processed
2. can ack or fail the spout tuple appropriately.
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
["the cow jumped
over the moon"]
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
anchored
anchored
anchored
anchored
anchored
anchored
each word tuple is anchored to the sentence tuple
Storm:
YOU:
spout tuple
word tuple
Tuple Lifecycle(Program Layer)
Kestrel
/Kafka
["the cow jumped over the moon"]
input tuple
output tuple
input tuple
output tuple
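A hedged sketch of the two obligations listed above as they might look in a split-sentence bolt (class and field names are assumptions): each word tuple is emitted with the input tuple as its anchor, and the input tuple is then acked.

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitSentenceBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        for (String word : tuple.getString(0).split(" ")) {
            // anchored emit: passing `tuple` as the anchor creates a new link
            // in the tuple tree (point 1 above)
            collector.emit(tuple, new Values(word));
        }
        // tell Storm this individual tuple is finished here (point 2 above)
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}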
1. tell Storm whenever you're creating a new link in the tree of tuples
2. tell Storm when you have finished processing an individual tuple
1. can detect when the tree of tuples is fully processed
2. can ack or fail the spout tuple appropriately.
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
anchored
anchored
anchored
anchored
anchored
anchored
each word-count tuple is anchored to its word tuple
Storm:
YOU:
anchored
anchored
anchored
anchored
anchored
anchored
["the cow jumped
over the moon"]
word-count tuple
Tuple Lifecycle(Program Layer)
Kestrel
/Kafka
["the cow jumped over the moon"]
word tuple
input tuple output tuple
input tuple
output tuple
1. tell Storm whenever you're creating a new link in the tree of tuples
2. tell Storm when you have finished processing an individual tuple
1. can detect when the tree of tuples is fully processed
2. can ack or fail the spout tuple appropriately.
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
anchored
anchored
anchored
anchored
anchored
anchored
Storm:
YOU:
anchored
anchored
anchored
anchored
anchored
anchored
["the cow jumped
over the moon"]
✅
Tuple Lifecycle(Program Layer)
Kestrel
/Kafka
["the cow jumped over the moon"]
ack word tuple: [“the”]
1. tell Storm whenever you're creating a new link in the tree of tuples
2. tell Storm when you have finished processing an individual tuple
1. can detect when the tree of tuples is fully processed
2. can ack or fail the spout tuple appropriately.
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
anchored
anchored
anchored
anchored
anchored
anchored
Storm:
YOU:
anchored
anchored
anchored
anchored
anchored
anchored
["the cow jumped
over the moon"]
✅
✅
Tuple Lifecycle(Program Layer)
Kestrel
/Kafka
["the cow jumped over the moon"]
ack word tuple: [“cow”]
1. tell Storm whenever you're creating a new link in the tree of tuples
2. tell Storm when you have finished processing an individual tuple
1. can detect when the tree of tuples is fully processed
2. can ack or fail the spout tuple appropriately.
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
anchored
anchored
anchored
anchored
anchored
anchored
Storm:
YOU:
anchored
anchored
anchored
anchored
anchored
anchored
["the cow jumped
over the moon"]
✅
✅
✅
✅
✅
✅
Tuple Lifecycle(Program Layer)
Kestrel
/Kafka
["the cow jumped over the moon"]
ack word tuple: [“moon”]
1. tell Storm whenever you're creating a new link in the tree of tuples
2. tell Storm when you have finished processing an individual tuple
1. can detect when the tree of tuples is fully processed
2. can ack or fail the spout tuple appropriately.
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
["the cow jumped
over the moon"]
✅
✅
✅
✅
✅
✅
Storm:
YOU:
Tuple Lifecycle(Program Layer)
Kestrel
/Kafka
["the cow jumped over the moon"]
✅
ack sentence tuple: [“the cow jumped over the moon”]
the input tuple is acked after all the word tuples are emitted
input tuple
word tuples
1. tell Storm whenever you're creating a new link in the tree of tuples
2. tell Storm when you have finished processing an individual tuple
1. can detect when the tree of tuples is fully processed
2. can ack or fail the spout tuple appropriately.
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
["the cow jumped
over the moon"]
✅
✅
✅
✅
✅
✅
Storm:
YOU:
Kestrel
/Kafka
tuple tree fully processed
ack(msgId=1)
Tuple Lifecycle(Program Layer)
["the cow jumped over the moon"]
✅
1. tell Storm whenever you're creating a new link in the tree of tuples
2. tell Storm when you have finished processing an individual tuple
1. can detect when the tree of tuples is fully processed
2. can ack or fail the spout tuple appropriately.
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
["the cow jumped
over the moon"]
✅
✅
✅
✅
✅
✅
Storm:
YOU:
Kestrel
/Kafka
tuple tree fully processed
ack(msgId=1)
Tuple Lifecycle(Program Layer)
✅
1. tell Storm whenever you're creating a new link in the tree of tuples
2. tell Storm when you have finished processing an individual tuple
1. can detect when the tree of tuples is fully processed
2. can ack or fail the spout tuple appropriately.
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
anchored
["the cow jumped
over the moon"]
✅
✅
✅
✅
✅
Storm:
YOU:
Kestrel
/Kafka
Tuple Lifecycle(Program Layer)
Since the word tuple is anchored,
the spout tuple at the root of the tree
will be replayed later on if the word tuple
fails to be processed downstream
["the cow jumped over the moon"]
tuple tree failed
fail(msgId=1)
××
this.collector.fail(tuple)
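A minimal sketch of the spout side of this state machine, with assumed class and field names: Storm calls ack(msgId) when the tuple tree completes and fail(msgId) when it fails or times out, so the spout can drop the source message or replay it.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class ReliableSentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    // msgId -> original message, kept until the tuple tree completes or fails
    private final Map<Object, String> pending = new ConcurrentHashMap<>();

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // in the deck the sentence would be read from Kestrel/Kafka
        String sentence = "the cow jumped over the moon";
        Object msgId = 1;
        pending.put(msgId, sentence);
        collector.emit(new Values(sentence), msgId);
    }

    @Override
    public void ack(Object msgId) {
        pending.remove(msgId);                 // "take the message off the queue"
    }

    @Override
    public void fail(Object msgId) {
        String sentence = pending.get(msgId);  // "put the message back on the queue"
        collector.emit(new Values(sentence), msgId);  // replay the tuple tree
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}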
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
["the cow jumped
over the moon"]
Kestrel
/Kafka
["the cow jumped over the moon"]
Sentence
Spout
Split
Sentence
Bolt
Word
Count
Bolt
[“cow”]
[“the”]
["jumped”]
["over”]
["the”]
["moon”]
["the”,1]
["jumped”,1]
["cow”,1]
["the”,2]
["over”,1]
["moon”,1]
["the cow jumped
over the moon"]
Kestrel
/Kafka
×
×
×
tuple1
tuple2
tuple3
input tuple
output tuple
multi-anchored tuple
tuple1
tuple2
tuple3
×
tuple1
tuple2
tuple3
replay…tuple3 failed
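A sketch of the multi-anchoring shown above, assuming a hypothetical bolt that merges three input tuples into one output tuple; because the emitted tuple is anchored to all three inputs, a downstream failure replays all three spout tuples.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class MergeBolt extends BaseRichBolt {
    private OutputCollector collector;
    private final List<Tuple> batch = new ArrayList<>();

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        batch.add(input);
        if (batch.size() == 3) {
            // the output tuple is anchored to tuple1, tuple2 and tuple3 at once
            collector.emit(new ArrayList<>(batch), new Values("merged-result"));
            for (Tuple t : batch) {
                collector.ack(t);
            }
            batch.clear();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("merged"));
    }
}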
ONE MORE THING
+ reading an input tuple,
+ emitting tuples based on it
+ and then acking the tuple
at the end of the execute()
Every tuple you process must be acked or failed.
Storm uses memory to track each tuple, so if you don't
ack/fail every tuple, the task will eventually run out of memory (OOM).
STORM DOES IT FOR YOU!
YOU DON'T NEED to pay attention to
Anchor & Ack anymore ✅
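The deck does not name the API it alludes to here; assuming it means Storm's IBasicBolt/BaseBasicBolt convenience classes, this is a sketch of the same split bolt where anchoring and acking happen automatically.

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitSentenceBasicBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        for (String word : tuple.getString(0).split(" ")) {
            collector.emit(new Values(word));   // automatically anchored to `tuple`
        }
        // no explicit ack: Storm acks `tuple` for you after execute() returns
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}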
Acker
When the Spout data source emits a Tuple, what does it mean for that Tuple to be fully processed?
Spout Bolt1 Bolt2 Bolt3tuple1
tuple1
SentenceSpout
tuple1 tuple3
SplitBolt
["the cow jumped..”] tuple4
tuple2
tuple6
tuple7
tuple5
tuple3
tuple4
tuple2
[“the”]
["cow”]
["jumped”]
["cow”,1]
["the”,1]
["jumped”,1]
["the cow jumped.”]
tuple6
tuple7
tuple5
WordCountBolt PrintBolt
Tuple Tree 🌲
When the Spout emits a new source Tuple,
it can be given a MessageId.
Multiple source Tuples can share the same MessageId,
which means they form a single message unit
and are placed in the same Tuple tree
tuple1
tuple2
Spout
tuple1 tuple3
Bolt1
tuple2 tuple4
Bolt2
tuple3 tuple5
tuple4 tuple6
Bolt3
Bolt4
tuple5
tuple6
Bolt5
collector.emit(new Values(tuple1), Message1); // tuple1 and tuple2 share Message1:
collector.emit(new Values(tuple2), Message1); // one message unit, one tuple tree
collector.emit(new Values(tuple1), Message1); // different MessageIds:
collector.emit(new Values(tuple2), Message2); // two separate tuple trees
Tuple Tree 🌲 🌲
🌲
Message1
1. In the Spout, Message1 is bound to both tuple1 and tuple2 (they share one MessageId)
2. tuple1 is sent to Bolt1 for processing, tuple2 to Bolt2
3. Bolt1 processes tuple1 and produces tuple3; Bolt2 processes tuple2 and produces tuple4
4. tuple3 from Bolt1 flows to Bolt3; tuple4 from Bolt2 flows to Bolt4
5. Bolt3 processes tuple3 and produces tuple5; Bolt4 processes tuple4 and produces tuple6
6. tuple5 from Bolt3 and tuple6 from Bolt4 both flow to the same Bolt5
7. Once Bolt5 has processed tuple5 and tuple6, Message1 is fully processed
tuple1
tuple2
Spout
tuple1 tuple3
Bolt1
tuple2 tuple4
Bolt2
tuple3 tuple5
tuple4 tuple6
Bolt3
Bolt4
tuple5
tuple6
Bolt5
Message1 ✅
Spout Bolt1 Bolt2 Bolt3tuple1 tuple2 tuple3
Fully processed: the source Tuple, and every Tuple derived from it, has been processed by every Bolt it was supposed to reach in the Topology
tuple1
tuple1 tuple2
tuple2 tuple3
tuple3
Spout emits Tuple1
Bolt1 receives Tuple1
Bolt1 processes Tuple1
Bolt1 emits Tuple2
Bolt2 receives Tuple2
Bolt2 processes Tuple2
Bolt2 emits Tuple3
Bolt3 receives Tuple3
Bolt3 processes Tuple3
spout-tuple-1 processed table: only when every entry is Y is the tuple considered fully processed
Spout Bolt1 Bolt2 Bolt3tuple1 tuple2tuple1
tuple1
tuple2
Spout Bolt1 Bolt2 Bolt3tuple1 tuple2 tuple3tuple1
tuple1 tuple2
tuple2 tuple3
✅ × ×
×
×
✅
Spout Bolt1 Bolt2 Bolt3tuple1 tuple24tuple1
tuple23
tuple25
tuple26
tuple22
tuple21
tuple27
……
……
tuple33
tuple32
tuple34
tuple35
tuple31
What would spout-tuple-1 processing table like?
A REALLY LARGE/HUGE TABLE!!!
Spout Bolt1 Bolt2 Bolt3
AckerBolt
tuple1 tuple2 tuple3
ack_value
tuple1 tuple2 tuple3
ack()/fail()?
em
it
ack
emit
em
it
ack
ack
The Acker component: tracks the tuple tree 🌲 of every Tuple emitted by the Spout
🌲
1. emit(tuple, …)
2. ack(tuple)
Solution 1: chained (zipper-style) tracking
Spout Bolt1 Bolt2 Bolt3
AckerBolt
tuple1 tuple2 tuple3
ack_value
tuple1 tuple2 tuple3
ack()/fail()?
em
it
ack
emit
em
it
ack
ack
🌲
Solution 2: incremental tracking
Spout Bolt1 Bolt2 Bolt3
AckerBolt
tuple1 tuple2 tuple3
ack_value
tuple1 tuple2 tuple3
ack()/fail()?
em
it
ack
emit
em
it
ack
ack
🌲
How Storm Implements Acker…
How does Storm implement reliability in an efficient way?
A Storm topology has a set of special "acker" tasks
that track the DAG of tuples for every spout tuple.
When an acker sees that a DAG is complete,
it sends a message to the spout task
that created the spout tuple to ack the message.
1. The Acker can have many tasks, just like a Spout/Bolt
2. The DAG of tuples is a Tuple Tree,
3. generated from a Spout tuple (by one of the Spout tasks)
4. That Spout tuple is associated with a MessageId
5. When all tuples in the Tuple Tree are fully processed,
6. the Acker sends a message to the Spout task from #3,
7. so the Spout can ack the Message along with that tuple
The best way to understand Storm's reliability is to look at the lifecycle of tuples and tuple trees. When a tuple is created, whether by a spout or a bolt,
it is given a 64-bit id, and the acker uses this id to track every tuple. Every tuple knows the id of its root tuple (the id of the tuple emitted from the
spout; the root tuple-id of a tuple tree is fixed), and whenever you emit a new tuple, that root id is passed along to it. When a tuple is acked, a message
is sent to the acker describing how the tuple tree changed.
In effect it tells the acker: "I am done, and here are my child tuples, please track them."
The best way to understand Storm's reliability implementation is to look at the lifecycle of tuples and tuple DAGs.
When a tuple is created in a topology, whether in a spout or a bolt, it is given a random 64 bit id. These ids are
used by ackers to track the tuple DAG for every spout tuple.
Every tuple knows the ids of all the spout tuples for which it exists in their tuple trees. When you emit a new
tuple in a bolt, the spout tuple ids from the tuple's anchors are copied into the new tuple. When a tuple is
acked, it sends a message to the appropriate acker tasks with information about how the tuple tree changed.
In particular it tells the acker "I am now completed within the tree for this spout tuple, and here are the
new tuples in the tree that were anchored to me".
When a tuple is acked, it sends a message to the appropriate acker tasks
with information about how the tuple tree changed. In particular it tells the acker
"I am now completed within the tree for this spout tuple,
and here are the new tuples in the tree that were anchored to me"
For example, if tuples "D" and "E" were created based on tuple "C",
here's how the tuple tree changes when "C" is acked:
Since "C" is removed from the tree at the same time that "D" and "E" are added to it,
the tree can never be prematurely completed.
1. A Bolt does not send a message to the Acker when it emits; it only sends one when it acks
2. At ack time the Bolt knows the id of the input tuple being acked and the ids of all output tuples it emitted
3. So at ack time it can first combine the input tuple id with all emitted output tuple ids
and then send a single message to the Acker
4. When the Acker receives the Bolt's ack message, it combines the current ack_val with the received value;
the result reflects how the tuple tree has changed
5. Once a Bolt has acked an input tuple, nothing needs to be kept for the path from that input tuple
back up to the root tuple
Only the most recently emitted output tuples need to be kept in the Acker
Why ancestor tuple ids (not just the spout tuple id, but also upstream input tuples) never need to be recorded
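A toy model (not Storm's actual internals) of the bookkeeping just described: one ack_val per spout tuple, updated by XORing in the acked tuple's id together with the ids of its newly emitted children.

import java.util.HashMap;
import java.util.Map;

public class ToyAcker {
    // one 64-bit ack_val per tracked spout tuple (keyed by its msgId)
    private final Map<Object, Long> ackVals = new HashMap<>();

    // the spout reports a new root tuple
    public void initSpoutTuple(Object msgId, long rootTupleId) {
        ackVals.put(msgId, rootTupleId);
    }

    // a bolt acks `ackedId` and reports the XOR of its newly emitted child ids
    public void ack(Object msgId, long ackedId, long xorOfChildIds) {
        long updated = ackVals.get(msgId) ^ ackedId ^ xorOfChildIds;
        ackVals.put(msgId, updated);
        if (updated == 0) {
            System.out.println("tuple tree for " + msgId + " fully processed");
        }
    }
}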
Spout Bolt1 Bolt2 Bolt3
AckerBolt
tuple1 tuple2 tuple3
ack_value
tuple1 tuple2 tuple3
ack()/fail()?
em
it
ack
ack
ack
The Acker component: tracks the tuple tree 🌲 of every Tuple emitted by the Spout
🌲
1. emit(tuple, …)
2. ack(tuple)
emit emit
Spout Bolt1 Bolt2 Bolt3
AckerBolt
tuple1 tuple2 tuple3
ack_value
tuple1 tuple2 tuple3
ack()/fail()?
em
it
ack
ack
ack
🌲
emit emit
tuple1
Spout Bolt1 Bolt2 Bolt3
AckerBolt
tuple1 tuple2 tuple3
ack_value
tuple1 tuple2 tuple3
ack()/fail()?
em
it
ack
ack
ack
🌲
emit emit
tuple1 tuple2
×
×
×
×: the parent tuple is finished; the Acker now needs to track its child tuples
Spout Bolt1 Bolt2 Bolt3
AckerBolt
tuple1 tuple2 tuple3
ack_value
tuple1 tuple2 tuple3
ack()/fail()?
em
it
ack
ack
ack
🌲
emit emit
tuple1 tuple2
× × tuple3
× ×
× ×
Spout Bolt1 Bolt2 Bolt3
AckerBolt
tuple1 tuple2 tuple3
ack_value
tuple1 tuple2 tuple3
ack()/fail()?
em
it
ack
ack
ack
🌲
emit emit
tuple1 tuple2
× × tuple3
× × ×
×
× ××
Spout Bolt1 Bolt2 Bolt3
AckerBolt
tuple1 tuple2 tuple3
ack_value
tuple1 tuple2 tuple3
ack()/fail()?
em
it
ack
ack
ack
🌲
emit emit
tuple1 tuple2
× × tuple3
× × ×
×
× ××
✅
A little bit of algebra
XOR of any value with itself is always 0
0000
^ 0000
———
0000
0001
^ 0001
———
0000
0010
^ 0010
———
0000
0011
^ 0011
———
0000
0100
^ 0100
———
0000
010100110110010011
^ 010100110110010011
———————————
000000000000000000
XOR of two different values (not a value with itself) is never 0
0000
^ 0001
———
0001
0001
^ 1001
———
1000
0010
^ 0110
———
0100
0011
^ 0010
———
0001
1100
^ 0100
———
1000
010100110110010011
^ 010100111110010011
———————————
000000001000000000
So, is there a way to end up with 0?
0000
^ 0001
———
0001
0001
^ 1100
———
1101
1101
^ 0010
———
1111
1111
^ 1001
———
0110
0110
^ 0110
———
0000
0^X1=X1
X1^X2=X3
X3^X4=X5
X5^X6=X7
X7^X7= 0
X1
X1
X2
X3
X4
X5
X6
X7
X7
XOR of a value with itself is always 0
With X1=0001, X2=1100, X4=0010, X6=1001, X7=0110, XOR them in one after another:
0000 ^ 0001 = 0001, 0001 ^ 1100 = 1101, 1101 ^ 0010 = 1111, 1111 ^ 1001 = 0110, 0110 ^ 0110 = 0000
X1 X2 X4 X6 X7
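A quick plain-Java check (my own sketch, not from the deck) of the property the following slides rely on: if every id is XORed into an accumulator exactly twice, in any order, the accumulator returns to 0.

public class XorDemo {
    public static void main(String[] args) {
        long[] ids = {0b0001, 0b1100, 0b0010, 0b1001, 0b0110};
        long acc = 0;
        for (long id : ids) acc ^= id;   // XOR each id in once (emit/anchor)
        for (long id : ids) acc ^= id;   // ...and once more, in any order (ack)
        System.out.println(acc);         // prints 0: every id cancelled itself
    }
}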
Spout → Bolt1 (0001) → Bolt2 (1010) → Bolt3 (0011)
Whenever a Spout/Bolt emits a Tuple, an ID is generated for that Tuple.
If a Spout/Bolt emits a Tuple downstream, some Bolt must receive it.
The last Bolt emits no new Tuple, which marks the end of the Topology.
Each tuple id enters the XOR twice, once on emit and once on receive/ack:
(0001 ^ 0001) ^ (1010 ^ 1010) ^ (0011 ^ 0011) = 0000 ^ 0000 ^ 0000 = 0000
tuple1 = 0001, tuple2 = 1010, tuple3 = 0011
Spout → Bolt1 (0001) → Bolt2 (1010) → Bolt3 (0011)
The Spout emits a Tuple with id=0001; the Acker starts tracking this spout tuple
Spout → Bolt1 (0001) → Bolt2 (1010) → Bolt3 (0011)
Bolt1 receives the input tuple from the Spout but has not yet processed it, so it does not communicate with the Acker
Spout → Bolt1 (0001) → Bolt2 (1010) → Bolt3 (0011)
Bolt1 emits a new Tuple (id=1010) and acks the input tuple (tuple1), which communicates with the Acker
Spout → Bolt1 (0001) → Bolt2 (1010) → Bolt3 (0011)
The Acker keeps only the id of the newly created child tuple (tuple2); ancestor tuple ids are not recorded
Spout → Bolt1 (0001) → Bolt2 (1010) → Bolt3 (0011)
Bolt2 receives tuple2, processes it, emits child tuple3, and acks tuple2
Spout → Bolt1 (0001) → Bolt2 (1010) → Bolt3 (0011)
The Acker keeps only the id of the newly created child tuple (tuple3); ancestor tuple ids are not recorded
Spout → Bolt1 (0001) → Bolt2 (1010) → Bolt3 (0011)
Bolt3 receives tuple3, processes it, emits no new tuple, and acks tuple3
Spout → Bolt1 (0001) → Bolt2 (1010) → Bolt3 (0011)
No new tuples are created, so the Acker's ack_val is 0: the TupleTree is fully processed
✅
(spout-tuple-id, tmp-ack-val)
tmp-ack-val = spout-tuple-id ^ (child-tuple-id1 ^ child-tuple-id2 ... )
tmp-ack-val is the XOR of the id of the tuple being acked with the ids of all tuples it newly created.
As an example, the Spout produces spout-tuple-id (tuple1), Bolt1 produces bolt1-tuple-id (tuple2),
Bolt2 produces bolt2-tuple-id (tuple3), and Bolt3 produces no tuple.
The Spout emits tuple1; the Acker records tuple1's id to track the spout tuple:
tmp-ack-val = spout-tuple-id
Bolt1 processes the Spout's tuple1, emits tuple2, and acks tuple1:
tmp-ack-val = spout-tuple-id ^ (spout-tuple-id ^ bolt1-tuple-id)
= (spout-tuple-id ^ spout-tuple-id) ^ bolt1-tuple-id
= 0 ^ bolt1-tuple-id
= bolt1-tuple-id
Bolt2 processes Bolt1's tuple2, emits tuple3, and acks tuple2:
tmp-ack-val = spout-tuple-id ^ (spout-tuple-id ^ bolt1-tuple-id) ^ (bolt1-tuple-id ^ bolt2-tuple-id)
= (spout-tuple-id ^ spout-tuple-id) ^ (bolt1-tuple-id ^ bolt1-tuple-id) ^ bolt2-tuple-id
= 0 ^ 0 ^ bolt2-tuple-id
= bolt2-tuple-id
Bolt3 processes Bolt2's tuple3, emits no new tuple, and acks tuple3:
tmp-ack-val = spout-tuple-id ^ (spout-tuple-id ^ bolt1-tuple-id) ^ (bolt1-tuple-id ^ bolt2-tuple-id) ^ bolt2-tuple-id
= (spout-tuple-id ^ spout-tuple-id) ^ (bolt1-tuple-id ^ bolt1-tuple-id) ^ (bolt2-tuple-id ^ bolt2-tuple-id)
= 0 ^ 0 ^ 0
= 0
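The same derivation with the concrete ids used in the earlier slides (0001, 1010, 0011), as a small runnable sketch of the arithmetic only; this is not Storm code.

public class AckValDemo {
    public static void main(String[] args) {
        long spoutTupleId = 0b0001;   // tuple1
        long bolt1TupleId = 0b1010;   // tuple2
        long bolt2TupleId = 0b0011;   // tuple3

        // Spout emits tuple1: the acker starts tracking it
        long ackVal = spoutTupleId;

        // Bolt1 acks tuple1 and reports its child tuple2
        ackVal ^= spoutTupleId ^ bolt1TupleId;   // = bolt1TupleId

        // Bolt2 acks tuple2 and reports its child tuple3
        ackVal ^= bolt1TupleId ^ bolt2TupleId;   // = bolt2TupleId

        // Bolt3 acks tuple3 and reports no children
        ackVal ^= bolt2TupleId;                  // = 0

        System.out.println("ack_val = " + ackVal);  // 0: tree fully processed
    }
}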
Spout Bolt1 Bolt2 Bolt3
Acker
Task1
tuple11 tuple12 tuple13
ack_value
tuple11 tuple12 tuple13
ack()/fail()?
em
it
ack
ack
ack
🌲
emit emit
Acker
Task2
Acker
Task3
Spout Bolt1 Bolt2 Bolt3
Acker
Task2
tuple21 tuple22 tuple23
ack_value
tuple21 tuple22 tuple23
ack()/fail()?
em
it
ack
ack
ack
🌲
emit emit
Acker
Task1
Acker
Task3
Spout Bolt1 Bolt2 Bolt3
Acker
Task3
tuple31 tuple32 tuple33
ack_value
tuple31 tuple32 tuple33
ack()/fail()?
em
it
ack
ack
ack
🌲
emit emit
Acker
Task1
Acker
Task2
Spout Bolt1 Bolt2 Bolt3
Acker
Task1
tuple11 tuple12 tuple13
ack_value
tuple11 tuple12 tuple13
ack()/fail()?
em
it
ack
ack
ack
🌲
emit emit
Acker
Task2
Acker
Task2
1. When a tuple needs to be acked, which Acker task should the ack message be sent to?
2. How does the Acker know which Spout task each spout tuple should be handed back to?
1. Set Config.TOPOLOGY_ACKERS to 1 or more (by default there is one Acker per Worker)
2. Specify a messageId when emitting the tuple so that that particular Spout tuple can be tracked
3. If you care about every Tuple in a tuple tree succeeding, anchor those tuples when you emit them
Spout ack(msgId) is different from Bolt ack(tuple)
What we should do when we want to use the reliability of Storm's Acker
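Putting the three points above together, a hedged wiring sketch that reuses the hypothetical ReliableSentenceSpout and SplitSentenceBolt classes sketched earlier; Config.setNumAckers sets Config.TOPOLOGY_ACKERS.

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;

public class ReliableTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // spout emits with a messageId (point 2), bolt anchors and acks (point 3)
        builder.setSpout("sentence-spout", new ReliableSentenceSpout());
        builder.setBolt("split-bolt", new SplitSentenceBolt())
               .shuffleGrouping("sentence-spout");

        Config conf = new Config();
        conf.setNumAckers(1);   // Config.TOPOLOGY_ACKERS (point 1); 0 disables tracking

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("reliable-word-count", conf, builder.createTopology());
        Thread.sleep(10_000);
        cluster.shutdown();
    }
}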
References
http://blog.csdn.net/zhangzhebjut/article/details/38467145
http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html
http://www.cnblogs.com/foreach-break/p/storm_at_least_once.html
http://blog.jassassin.com/2014/10/22/storm/storm-ack/

  • 99. 1. 设置Config.TOPOLOGY_ACKERS=1或者更⼤大,默认⼀一个Worker⼀一个Acker 2. 在发射tuple的时候指定messageId来达到跟踪某个特定的Spout tuple的⺫⽬目的 3. 对⼀一个tuple树的所有Tuple执⾏行成功都很关⼼心,发射这些tuple时anchor它们 Spout ack(msgId) different from Bolt ack(tuple) What We Should Do When We Want Use Reliability Of Storm Acker