Storm (Distribute Stream Processing System)

Storm: Reliable Distributed Message Processing. Presented by Majid Hajibaba, 12 April 2015.

Transcript of Storm (Distribute Stream Processing System)

Page 1: Storm (Distribute Stream Processing System)

Storm: Reliable Distributed Message Processing

Presented by Majid Hajibaba

12 April 2015

Page 2: Storm (Distribute Stream Processing System)

Architecture of Storm

Page 3: Storm (Distribute Stream Processing System)

Relationships of workers, executors and tasks

Storm concept: real-world counterpart

Worker: process

Executor: thread

Task: object
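The mapping above can be pictured with plain Java threads. This is only a toy model under stated assumptions: the JVM plays the worker process, each single-threaded pool plays one executor, and each submitted job plays one task. `WorkerModelSketch` and `run` are hypothetical names and nothing here uses the Storm API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model: the JVM is the worker process, each thread an executor, each job a task. */
final class WorkerModelSketch {
    /** Runs each group of tasks on its own thread and returns "threadName:taskName" records. */
    static List<String> run(List<List<String>> tasksPerExecutor) {
        List<String> log = Collections.synchronizedList(new ArrayList<String>());
        List<ExecutorService> executors = new ArrayList<>();
        for (int i = 0; i < tasksPerExecutor.size(); i++) {
            final String executorName = "executor-" + i;
            // One thread per executor, so the tasks of an executor run one after another.
            ExecutorService executor =
                    Executors.newSingleThreadExecutor(r -> new Thread(r, executorName));
            for (String task : tasksPerExecutor.get(i)) {
                executor.execute(() -> {
                    log.add(Thread.currentThread().getName() + ":" + task);
                });
            }
            executors.add(executor);
        }
        for (ExecutorService executor : executors) {
            executor.shutdown();
            try {
                executor.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
            }
        }
        return log;
    }
}
```

Calling `run` with two groups, for example `[t0, t1]` and `[t2]`, executes `t0` and `t1` in order on the first thread while `t2` runs independently on the second.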

Page 4: Storm (Distribute Stream Processing System)

Does Storm guarantee message processing?

Page 5: Storm (Distribute Stream Processing System)

Without Message ID, One Worker

All messages will be processed

If there is an error in any executor (thread), the related worker will restart

Everything will then start again from the beginning, because the spout executor is restarted as well

This behavior does not depend on the number of bolts or spouts

If an executor dies, the whole worker dies

[Figure: Spout and Bolt A running in Worker 1]

Page 6: Storm (Distribute Stream Processing System)

Without Message ID, Multiple Workers

If we kill one bolt worker:

Single machine: all operations stop until the supervisor creates a replacement worker

Multiple machines: operations on that machine wait until the worker is restarted

But the spout keeps sending from the offset of the remaining messages

Some messages may be lost

[Figure: a spout and two Bolt A instances spread across Workers 1, 2 and 3]

Page 7: Storm (Distribute Stream Processing System)

With Message ID, Multiple Workers

If we kill one bolt worker, all operations stop until the supervisor creates a replacement worker

But the spout keeps sending from the offset of the remaining messages

In this case some messages may be lost, but the user can be aware of it, because the spout calls the user-implemented ack and fail methods for each message

The user should do the required processing so that the lost messages are emitted again by the nextTuple method

[Figure: a spout and two Bolt A instances spread across Workers 1, 2 and 3]
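One way to do "the required processing" is to remember every emitted message until it is acked and to queue failed message IDs for replay. The sketch below is a plain-Java stand-in, not a Storm spout: `ReplayingSpoutSketch`, its in-memory queues and the numeric message IDs are all assumptions made for illustration. Only the method names `nextTuple`, `ack` and `fail` mirror the spout methods mentioned in the slides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Plain-Java stand-in for a reliable spout; it does not use the Storm API. */
final class ReplayingSpoutSketch {
    private final Queue<String> source = new ArrayDeque<>();    // unread messages
    private final Map<Long, String> inFlight = new HashMap<>(); // msgId -> payload awaiting an ack
    private final Queue<Long> toReplay = new ArrayDeque<>();    // msgIds reported as failed
    private final List<String> emitted = new ArrayList<>();     // everything sent downstream, in order
    private long nextId = 1;

    ReplayingSpoutSketch(List<String> messages) {
        source.addAll(messages);
    }

    /** Replays failed messages first, then reads new ones. Returns the msgId, or -1 if idle. */
    long nextTuple() {
        Long replayId;
        while ((replayId = toReplay.poll()) != null) {
            String payload = inFlight.get(replayId);
            if (payload != null) {          // skip ids that were acked in the meantime
                emitted.add(payload);
                return replayId;
            }
        }
        String payload = source.poll();
        if (payload == null) {
            return -1;
        }
        long id = nextId++;
        inFlight.put(id, payload);
        emitted.add(payload);
        return id;
    }

    /** The message was fully processed, so it can be forgotten. */
    void ack(long msgId) {
        inFlight.remove(msgId);
    }

    /** The message failed somewhere downstream, so schedule it for re-emission. */
    void fail(long msgId) {
        if (inFlight.containsKey(msgId)) {
            toReplay.add(msgId);
        }
    }

    List<String> emitted() { return emitted; }

    int inFlightCount() { return inFlight.size(); }
}
```

Keeping the payload until the ack arrives is the design choice that makes replay possible; the cost is memory proportional to the number of unacked messages.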

Page 8: Storm (Distribute Stream Processing System)

Programmers: what to do?

1. Tag each tuple emitted by a spout with a unique message ID.

    spoutOutputCollector.emit(new Values(tuple), generateMsgId(tuple));

2. In a bolt, emit each new tuple anchored to the original (input) tuple.

    collector.emit(inputTuple, transform(inputTuple));

3. Send an acknowledgment when processing succeeds, otherwise a failure signal.

    // collector is the OutputCollector that the bolt received in prepare()
    @Override
    public void execute(Tuple input) {
        try {
            . . .
            collector.ack(input);
        } catch (Exception e) {
            collector.fail(input);
        }
    }

4. Implement the fail method in your spout so that failed tuples are emitted again.

    // tuples maps each message ID to the values that were emitted under it
    @Override
    public void fail(Object id) {
        collector.emit(tuples.get(id), id);
    }

Page 9: Storm (Distribute Stream Processing System)

BasicOutputCollector and BasicBoltExecutor

Emitted tuples are automatically anchored to the input tuple

The input tuple is acked automatically when the execute method completes

With BaseRichBolt, by contrast, the bolt itself is responsible for sending the ack
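The automatic behavior described on this slide can be imitated by a small wrapper that runs the bolt logic, anchors whatever it emits to the input, and then acks or fails the input on the bolt's behalf. This is a simplified plain-Java sketch, not the real BasicBoltExecutor: `AutoAckExecutorSketch` and its nested `Bolt` interface are hypothetical, and as a simplification the output of a failed call is dropped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper that acks or fails on behalf of the bolt; not a Storm class. */
final class AutoAckExecutorSketch {
    /** Stand-in for the user's bolt logic: read one input, add zero or more outputs. */
    interface Bolt {
        void execute(String input, List<String> out) throws Exception;
    }

    final List<String> acked = new ArrayList<>();
    final List<String> failed = new ArrayList<>();
    final List<String> emitted = new ArrayList<>(); // entries look like "anchor->value"

    /** Runs the bolt, anchors its output to the input, then acks or fails the input. */
    void process(Bolt bolt, String input) {
        List<String> out = new ArrayList<>();
        try {
            bolt.execute(input, out);
            for (String value : out) {
                emitted.add(input + "->" + value); // automatic anchoring to the input
            }
            acked.add(input);                      // automatic ack once execute returns
        } catch (Exception e) {
            failed.add(input);                     // an exception becomes a fail signal
        }
    }
}
```

With a rich bolt there is no such wrapper, which is why the ack and fail calls have to appear in the bolt's own execute method.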

Page 10: Storm (Distribute Stream Processing System)

Anchoring

Specifying a link in the tuple tree is called anchoring

    collector.emit(tupleA, id);        // spout: emit A with a message ID (root of the tree)
    collector.emit(tupleA, tupleB);    // B is anchored to A
    collector.emit(tupleB, tupleC);    // C is anchored to B
    collector.emit(tupleB, tupleD);    // D is anchored to B

    // F is anchored to both C and D
    List<Tuple> anchors = new ArrayList<Tuple>();
    anchors.add(tupleC);
    anchors.add(tupleD);
    collector.emit(anchors, tupleF);

Page 11: Storm (Distribute Stream Processing System)

Acking Work Flow

[Figure: acking workflow. A spout reads tuples 1 to 6 from a basic source and distributes them across two bolt tasks (1, 3, 5 and 2, 4, 6), each with its own queues and sender task, spread over Workers 1, 2 and 3. The bolt tasks send acks, identified by (taskId, tupleId), to the acker bolt.]
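The figure does not show how the acker bolt decides that every tuple descended from a spout tuple has been acked. Storm's acker does this with one XOR checksum per spout tuple: each tuple ID is XORed in once when the tuple is created and once when it is acked, so the checksum returns to zero exactly when the tree is complete. The plain-Java sketch below models that idea only. `AckerSketch` and its methods are hypothetical names, and the small fixed IDs in the usage note stand in for the random 64-bit IDs Storm generates.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of an acker: one running XOR value per spout tuple (root). */
final class AckerSketch {
    // rootId -> XOR of every tuple id announced or acked so far
    private final Map<Long, Long> pending = new HashMap<>();

    /** The spout registers a new root together with the id of its first tuple. */
    void init(long rootId, long tupleId) {
        pending.put(rootId, tupleId);
    }

    /**
     * A bolt acks ackedId and reports the ids of the tuples it anchored to it.
     * Returns true when the whole tuple tree of rootId is complete.
     */
    boolean ack(long rootId, long ackedId, long... anchoredIds) {
        Long current = pending.get(rootId);
        if (current == null) {
            return false; // unknown root, or tree already completed
        }
        long value = current ^ ackedId;
        for (long id : anchoredIds) {
            value ^= id;
        }
        if (value == 0L) {
            pending.remove(rootId); // every created tuple has also been acked
            return true;
        }
        pending.put(rootId, value);
        return false;
    }

    boolean isPending(long rootId) {
        return pending.containsKey(rootId);
    }
}
```

For a tree in which tuple 0x11 produces 0x22, which in turn produces 0x33 and 0x44, the calls `ack(1, 0x11, 0x22)`, `ack(1, 0x22, 0x33, 0x44)` and `ack(1, 0x33)` leave the root pending, and the final `ack(1, 0x44)` completes it. The memory cost is a single long per spout tuple, regardless of how large the tree grows.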

Page 12: Storm (Distribute Stream Processing System)

Is Storm fault tolerant?

Page 13: Storm (Distribute Stream Processing System)

Cluster View

Page 14: Storm (Distribute Stream Processing System)

Fault Tolerance

While the cluster works normally, Nimbus monitors its state

Storm is fail-fast

The processes will halt whenever an unexpected error is encountered

Storm can safely halt at any point and recover correctly when the process is restarted

Page 15: Storm (Distribute Stream Processing System)

Nimbus goes down

The topology will continue to run

Workers will continue to function

Supervisors will continue to restart workers if they die

No workers are affected by the death of Nimbus

We no longer have the state of the topology

Workers won't be reassigned to other machines when necessary

Nimbus is "sort of" a SPOF (single point of failure)

Page 16: Storm (Distribute Stream Processing System)

Supervisor node goes down

The tasks assigned to that machine will time out

Nimbus will reassign those tasks to other machines

The workers will be restarted on another node
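The time-out and reassignment steps can be illustrated with a small heartbeat table. This is a toy model, not Storm's scheduler: `ReassignSketch`, the explicit timestamps, the timeout value and the "first live node in sorted order" policy are all assumptions chosen to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy heartbeat monitor; names and the reassignment policy are assumptions. */
final class ReassignSketch {
    private final long timeoutMs;
    private final Map<String, Long> lastHeartbeat = new HashMap<>(); // node -> last heartbeat time
    private final Map<String, String> assignment = new HashMap<>();  // task -> node

    ReassignSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void heartbeat(String node, long nowMs) {
        lastHeartbeat.put(node, nowMs);
    }

    void assign(String task, String node) {
        assignment.put(task, node);
    }

    String nodeOf(String task) {
        return assignment.get(task);
    }

    /** Moves every task on a timed-out node to a live node and returns the moved tasks. */
    List<String> reassignTimedOut(long nowMs) {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMs - e.getValue() <= timeoutMs) {
                live.add(e.getKey());
            }
        }
        List<String> moved = new ArrayList<>();
        if (live.isEmpty()) {
            return moved; // nowhere to move the tasks to
        }
        Collections.sort(live); // deterministic choice, for the sketch only
        int next = 0;
        for (Map.Entry<String, String> e : assignment.entrySet()) {
            if (!live.contains(e.getValue())) {
                e.setValue(live.get(next++ % live.size()));
                moved.add(e.getKey());
            }
        }
        return moved;
    }
}
```

With a 30-second timeout, a node whose last heartbeat is 45 seconds old is treated as dead, and its tasks move to a node that reported more recently.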

Page 17: Storm (Distribute Stream Processing System)

Worker goes down

All other operations will continue

The supervisor will automatically restart the worker

Processing will continue

If the worker continuously fails on startup and is unable to heartbeat to Nimbus, Nimbus will reassign the worker to another machine

Page 18: Storm (Distribute Stream Processing System)

Supervisor goes down

Processing will continue

No workers are affected by the death of a supervisor

Nimbus is unaware of the failure

Page 19: Storm (Distribute Stream Processing System)

Any questions?

End