Storm (Distributed Stream Processing System)
Storm
Reliable Distributed Message Processing
Presented by Majid Hajibaba
12 April 2015
Architecture of Storm
Relationships of workers, executors and tasks
[Diagram ("Storm Real World"): a worker process contains executor threads; each executor thread runs one or more task objects.]
Does Storm guarantee Message Processing?
All messages will be processed.
If there is an error in any executor (thread), the related worker will restart and everything will start from the beginning.
This is because the spout executor restarts too.
This behavior does not depend on the number of Bolts or Spouts.
If an executor dies, the whole worker dies.
Without Message id, One Worker
[Diagram: Spout → Bolt A, both inside Worker 1]
If we kill the bolt's worker:
Single machine: all operations stop until the supervisor recreates another worker.
Multiple machines: operations on that machine wait until the restart.
But the spout resumes sending from the remaining messages' offset, so some messages may be missed.
Without Message id, Multiple Worker
[Diagram: Spout in Worker 1; Bolt A instances in Worker 2 and Worker 3]
If we kill one bolt worker, all operations stop until the supervisor recreates another worker, but the spout resumes sending from the remaining messages' offset.
In this case some messages may be missed, but the user can be aware of it if the user has implemented the ack and fail methods that the spout calls for each message.
The user should do the required processing so that failed messages can be emitted again by the nextTuple method.
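The replay bookkeeping described above can be sketched in plain Java. This is an illustrative model, not the Storm API: the class and method names are invented, but the logic mirrors what a reliable spout does — keep every in-flight message in a pending map keyed by message id, and on fail() re-queue it so the next nextTuple() call emits it again.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of spout-side replay bookkeeping (not Storm code).
class ReplayableSource {
    private final Deque<String> queue = new ArrayDeque<>();    // waiting to be emitted
    private final Map<Long, String> pending = new HashMap<>(); // in flight, keyed by message id
    private long nextId = 0;

    void offer(String msg) { queue.addLast(msg); }

    // Plays the role of nextTuple(): emit one message and remember it until acked.
    Long nextTuple() {
        String msg = queue.pollFirst();
        if (msg == null) return null;
        long id = nextId++;
        pending.put(id, msg);
        return id; // in Storm this id would be passed to emit() as the message id
    }

    // ack(): the tuple tree completed, so forget the message.
    void ack(long id) { pending.remove(id); }

    // fail(): re-queue the message so it is emitted again.
    void fail(long id) {
        String msg = pending.remove(id);
        if (msg != null) queue.addFirst(msg);
    }

    int pendingCount() { return pending.size(); }
    int queuedCount()  { return queue.size(); }
}
```

With this bookkeeping, a failed message is never lost on the spout side: it simply travels back from the pending map to the head of the queue.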
With Message id, Multiple Worker
[Diagram: Spout in Worker 1; Bolt A instances in Worker 2 and Worker 3]
1. Tag each tuple emitted by the spout with a unique message id.
2. Emit each new tuple anchored to the original tuple.
3. Send an acknowledgment on success, otherwise a failure signal.
4. Implement the fail method in your code.
Programmers: what to do?

Tag each tuple with a message id when the spout emits it:

spoutOutputCollector.emit(new Values(tuple), generateMsgId(tuple));

Emit new tuples anchored to the input tuple:

collector.emit(inputTuple, transform(inputTuple));

Ack the input tuple on success, fail it on error:

public void execute(Tuple input, OutputCollector collector) {
  try {
    . . .
    collector.ack(input);
  } catch (Exception e) {
    collector.fail(input);
  }
}

Implement the fail method to re-emit the message:

// assuming tuples is a map of in-flight messages keyed by message id
@Override
public void fail(Object msgId) {
  collector.emit(tuples.get(msgId), msgId);
}

With a basic bolt, tuples are automatically anchored to the input tuple, and the input tuple is acked automatically when the execute method completes.
With BaseRichBolt we have the responsibility to send the ack ourselves.
BasicOutputCollector / BasicBoltExecutor
Specifying a link in the tuple tree is called anchoring.
Anchoring

The tuple tree here is A → B → {C, D} → F:

collector.emit(TupleA, id);        // the spout emits A with a message id
collector.emit(TupleA, TupleB);    // B is anchored to A
collector.emit(TupleB, TupleC);    // C is anchored to B
collector.emit(TupleB, TupleD);    // D is anchored to B

A tuple can also be anchored to multiple tuples at once:

List<Tuple> anchors = new ArrayList<Tuple>();
anchors.add(tupleC);
anchors.add(tupleD);
collector.emit(anchors, TupleF);   // F is anchored to both C and D
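The bookkeeping that anchoring creates can be modeled in plain Java. This is an illustrative sketch, not Storm internals: each anchored emit links a new tuple into the tree, each ack removes a node, and the spout's ack is called only once the whole tree is processed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model of the tuple tree built by anchoring (not Storm code).
class TupleTree {
    private final Map<String, Set<String>> children = new HashMap<>();
    private final Set<String> live = new HashSet<>();

    // The root tuple emitted by the spout.
    void root(String tuple) { live.add(tuple); }

    // Like emit(anchors, newTuple): link the new tuple to every anchor.
    void emit(List<String> anchors, String newTuple) {
        live.add(newTuple);
        for (String a : anchors) {
            children.computeIfAbsent(a, k -> new HashSet<>()).add(newTuple);
        }
    }

    // Acking a tuple removes it from the set of live tree nodes.
    void ack(String tuple) { live.remove(tuple); }

    // The spout tuple is fully processed once every node has been acked.
    boolean fullyProcessed() { return live.isEmpty(); }
}
```

Running the slide's tree through this model: A anchors B, B anchors C and D, and F is anchored to both C and D, so the tree is only complete after F itself is acked.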
Acking Work Flow
[Diagram: a source spout in Worker 1 sends tuples through in/out queues to bolt tasks and sender tasks in Workers 2 and 3; at every emit and ack, the task reports a (taskId, tupleId) pair to the Acker Bolt, which tracks each spout tuple's tree until it is fully acked.]
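The Acker Bolt does not store the whole tuple tree. Storm's acker keeps a single 64-bit value per spout tuple and XORs in each tuple id when it is anchored and again when it is acked; the value returns to zero exactly when every tuple has been acked. A plain-Java sketch of that trick (illustrative, not Storm internals):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Sketch of the acker's XOR bookkeeping: 64 bits per spout tuple,
// no matter how large the tuple tree grows.
class Acker {
    private final Map<Long, Long> trees = new HashMap<>(); // spout tuple -> running XOR
    private final Random rng = new Random(42);

    // Spout emits a root tuple: start tracking with a random 64-bit id.
    long init(long spoutTupleId) {
        long rootId = rng.nextLong();
        trees.put(spoutTupleId, rootId);
        return rootId;
    }

    // A bolt acked one tuple and anchored zero or more new ones:
    // XOR out the acked id and XOR in the new ids in one update.
    void ack(long spoutTupleId, long ackedId, long... newlyAnchoredIds) {
        long v = trees.get(spoutTupleId) ^ ackedId;
        for (long id : newlyAnchoredIds) v ^= id;
        trees.put(spoutTupleId, v);
    }

    // Zero means every anchored tuple was acked exactly once.
    boolean complete(long spoutTupleId) { return trees.get(spoutTupleId) == 0L; }
}
```

Because x ^ x == 0, the order of acks does not matter, and memory stays constant even for trees with millions of tuples.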
Is Storm fault tolerant?
Cluster View
The cluster works normally; Nimbus monitors states.
Storm is fail-fast: the processes halt whenever an unexpected error is encountered.
Storm can safely halt at any point and recover correctly when the process is restarted.
Fault Tolerance

Nimbus downs
Topology will continue to run
Workers will still continue to function
Supervisors will continue to restart workers if they die
No workers are affected by the death of Nimbus
But we no longer have the state of the topology
Workers won't be reassigned to other machines when necessary
Nimbus is "sort of" a SPOF

Supervisor node downs
The tasks assigned to that machine will time out
Nimbus will reassign those tasks to other machines
Workers will be restarted on other nodes

Worker downs
All other operations will continue
The supervisor will automatically restart the worker
Processing will continue
If the worker continuously fails on startup and is unable to heartbeat to Nimbus, Nimbus will reassign the worker to another machine

Supervisor downs
Processing will continue
No workers are affected by the death of a Supervisor
But Nimbus is unaware
Any Questions?
End