STORM


Description

These are the slides from a presentation given at the University of Southern Denmark, March 2012.

Transcript of STORM

Page 1: STORM

STORM: COMPARISON – INTRODUCTION – CONCEPTS

PRESENTATION BY KASPER MADSEN

MARCH - 2012

Page 2: STORM

HADOOP VS STORM

Hadoop:

• Batch processing
• Jobs run to completion
• JobTracker is a SPOF*
• Stateful nodes
• Scalable
• Guarantees no data loss
• Open source

STORM:

• Real-time processing
• Topologies run forever
• No single point of failure
• Stateless nodes
• Scalable
• Guarantees no data loss
• Open source

* Hadoop 0.21 added some checkpointing. SPOF: Single Point Of Failure.

Page 3: STORM

COMPONENTS

The Nimbus daemon is comparable to the Hadoop JobTracker. It is the master.

The Supervisor daemon spawns workers; it is comparable to the Hadoop TaskTracker.

A worker is spawned by the supervisor, one per port defined in the storm.yaml configuration.

A task is run as a thread in a worker.

Zookeeper* is a distributed system used to store metadata. The Nimbus and Supervisor daemons are fail-fast and stateless; all state is kept in Zookeeper.

* Zookeeper is an Apache top-level project

Notice that all communication between Nimbus and the Supervisors is done through Zookeeper.

On a cluster with 2k+1 Zookeeper nodes, the system can recover when at most k nodes fail.
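As an illustration of the one-worker-per-port rule, here is a minimal storm.yaml sketch (the host names and port numbers are made up; supervisor.slots.ports is the setting that defines the worker ports):

storm.zookeeper.servers:
    - "zk1.example.com"
    - "zk2.example.com"
    - "zk3.example.com"
nimbus.host: "nimbus.example.com"
supervisor.slots.ports:    # one worker is spawned per port listed here
    - 6700
    - 6701
    - 6702
    - 6703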

Page 4: STORM

STREAMS

A stream is an unbounded sequence of tuples.

A topology is a graph where each node is a spout or a bolt, and the edges indicate which bolts subscribe to which streams.

• A spout is a source of a stream
• A bolt consumes a stream (and possibly emits a new one)
• An edge represents a grouping

[Diagram: two spouts are the sources of streams A and B. One bolt subscribes to A and emits C; another subscribes to A and emits D; a third subscribes to A & B; a fourth subscribes to C & D.]

Page 5: STORM

GROUPINGS

Each spout or bolt runs X instances in parallel (called tasks).

Groupings are used to decide which task in the subscribing bolt a tuple is sent to:

• Shuffle grouping is a random grouping
• Fields grouping groups by value, such that equal values result in the same task
• All grouping replicates to all tasks
• Global grouping makes all tuples go to one task
• None grouping makes the bolt run in the same thread as the bolt/spout it subscribes to
• Direct grouping lets the producer (the task that emits) control which consumer will receive the tuple

A sketch of how groupings are declared follows the diagram below.

[Diagram: a topology whose components run 2, 2, 4, and 3 tasks in parallel.]
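A minimal sketch of declaring some of these groupings (WordCountBolt, PrinterBolt, and the component names are hypothetical; the grouping methods belong to Storm's TopologyBuilder API):

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("words", new TestWordSpout(), 2);    // 2 spout tasks
builder.setBolt("count", new WordCountBolt(), 4)      // 4 bolt tasks
       .fieldsGrouping("words", new Fields("word"));  // equal words go to the same task
builder.setBolt("printer", new PrinterBolt(), 1)
       .globalGrouping("count");                      // all tuples go to one task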

Page 6: STORM

EXAMPLE

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("words", new TestWordSpout(), 10);
builder.setBolt("exclaim1", new ExclamationBolt(), 3)
       .shuffleGrouping("words");
builder.setBolt("exclaim2", new ExclamationBolt(), 2)
       .shuffleGrouping("exclaim1");

The source code for this example is part of the storm-starter project on GitHub.

• setSpout("words", ...) creates a stream called "words" and runs 10 tasks.
• The first setBolt call creates a stream called "exclaim1", runs 3 tasks, and subscribes to the stream "words" using shuffle grouping.
• The second setBolt call creates a stream called "exclaim2", runs 2 tasks, and subscribes to the stream "exclaim1" using shuffle grouping.

A bolt can subscribe to an unlimited number of streams, by chaining groupings.

[Diagram: TestWordSpout → ExclamationBolt → ExclamationBolt]
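To actually run the topology, a minimal sketch (the topology name and the Config settings are illustrative, not part of the original example):

Config conf = new Config();
conf.setNumWorkers(2);                         // illustrative worker count

// Submit to a real cluster:
StormSubmitter.submitTopology("exclamation", conf, builder.createTopology());

// ...or run in-process while testing:
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("exclamation", conf, builder.createTopology());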

Page 7: STORM

EXAMPLE – 1

TestWordSpout

public void nextTuple() {
    Utils.sleep(100);
    final String[] words = new String[] {"nathan", "mike", "jackson", "golda", "bertels"};
    final Random rand = new Random();
    final String word = words[rand.nextInt(words.length)];
    _collector.emit(new Values(word));
}

The TestWordSpout emits a random string from the words array every 100 milliseconds.

[Diagram: TestWordSpout → ExclamationBolt → ExclamationBolt]

Page 8: STORM

EXAMPLE – 2

ExclamationBolt

OutputCollector _collector;

public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    _collector = collector;
}

public void execute(Tuple tuple) {
    _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
    _collector.ack(tuple);
}

public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word"));
}

declareOutputFields is used to declare streams and their schemas. It is possible to declare several streams and to specify which stream to use when emitting tuples in the emit call; a sketch follows below.

prepare is called when the bolt is created.

execute is called for each tuple.

declareOutputFields is called when the bolt is created.
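A minimal sketch of declaring several streams and choosing one when emitting (the stream names "word" and "error" are hypothetical; declareStream and the stream-qualified emit are part of Storm's API):

public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declareStream("word", new Fields("word"));
    declarer.declareStream("error", new Fields("error"));
}

// In execute, pick the stream when emitting:
_collector.emit("error", tuple, new Values("bad input: " + tuple.getString(0)));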

[Diagram: TestWordSpout → ExclamationBolt → ExclamationBolt]

Page 9: STORM

FAULT TOLERANCE

Zookeeper stores metadata in a very robust way.

Nimbus and Supervisor are stateless and only need the metadata from Zookeeper to work/restart.

When a node dies

• The tasks will time out and be reassigned to other workers by Nimbus.

When a worker dies

• The supervisor will restart the worker.
• Nimbus will reassign the worker to another supervisor, if no heartbeats are sent.
• If that is not possible (no free ports), the tasks will be run on other workers in the topology. If more capacity is added to the cluster later, STORM will automatically initialize a new worker and spread out the tasks.

When Nimbus or a Supervisor dies

• Workers will continue to run.
• Workers cannot be reassigned without Nimbus.
• Nimbus and Supervisor should be run under a process monitoring tool that restarts them automatically if they fail.

Page 10: STORM

AT-LEAST-ONCE PROCESSING

STORM guarantees at-least-once processing of tuples.

A message id is assigned to a tuple when it is emitted from a spout or bolt; it is 64 bits long.

The tree of tuples is the set of tuples generated (directly and indirectly) from a spout tuple.

Ack is called on the spout when the tree of tuples for a spout tuple is fully processed.

Fail is called on the spout if one of the tuples in the tree of tuples fails, or if the tree of tuples is not fully processed within a specified timeout (the default is 30 seconds).

It is possible to specify the message id when emitting a tuple. This can be useful for replaying tuples from a queue, as in the sketch below.

The ack/fail method is called when the tree of tuples has been fully processed, or has failed / timed out.
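A minimal sketch of a spout that emits with an explicit message id so failed tuples can be replayed from a queue (the queue client and its methods are hypothetical; the two-argument emit, ack, and fail are Storm's spout API):

public void nextTuple() {
    QueueMessage msg = queue.poll();           // hypothetical queue client
    if (msg == null) return;
    // Use the queue's own id as the message id, so fail() can replay it.
    _collector.emit(new Values(msg.body()), msg.id());
}

public void ack(Object msgId) {
    queue.delete(msgId);                       // hypothetical: fully processed, drop it
}

public void fail(Object msgId) {
    queue.requeue(msgId);                      // hypothetical: put it back for replay
}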

Page 11: STORM

AT-LEAST-ONCE PROCESSING – 2

Anchoring is used to copy the spout tuple message id(s) to the new tuples generated. In this way, every tuple knows the message id(s) of all its spout tuples.

Multi-anchoring is when multiple tuples are anchored. If the tuple tree fails, then multiple spout tuples will be replayed. This is useful for streaming joins and more.

Ack, called from a bolt, indicates the tuple has been processed as intended.

Fail, called from a bolt, replays the spout tuple(s).

Every tuple must be acked or failed, or the task will run out of memory at some point.

_collector.emit(tuple, new Values(word));    // uses anchoring
_collector.emit(new Values(word));           // does NOT use anchoring

Page 12: STORM

AT-LEAST-ONCE PROCESSING – 3

Acker tasks track the tree of tuples for every spout tuple.

• The acker task responsible for a given spout tuple is determined by a modulo on the message id. Since all tuples carry all spout tuple message ids, it is easy to call the correct acker task.
• The acker task stores a map; the format is {spoutMsgId, {spoutTaskId, "ack val"}}.
• "ack val" is the representation of the state of the entire tree of tuples. It is the XOR of all tuple message ids created and acked in the tree of tuples.
• When "ack val" is 0, the tuple tree is fully processed.
• Since message ids are random 64-bit numbers, the chance of "ack val" becoming 0 by accident is extremely small.

It is important to set the number of acker tasks in the topology when processing large amounts of tuples (it defaults to 1).

Page 13: STORM

AT-LEAST-ONCE PROCESSING – 4

[Diagram: spout task 1 emits "hey" (msgId 10) to bolt task 2, which emits "h" (msgId 2) and "ey" (msgId 3), each carrying spoutIds: 10, to bolt tasks 3 and 4.]

The example below shows what happens in the acker task for one spout tuple. The format is {spoutMsgId, {spoutTaskId, "ack val"}}.

1. After emit "hey": {10, {1, 0000 XOR 1010 = 1010}}
2. After emit "h": {10, {1, 1010 XOR 0010 = 1000}}
3. After emit "ey": {10, {1, 1000 XOR 0011 = 1011}}
4. After ack "hey": {10, {1, 1011 XOR 1010 = 0001}}
5. After ack "h": {10, {1, 0001 XOR 0010 = 0011}}
6. After ack "ey": {10, {1, 0011 XOR 0011 = 0000}}
7. Since "ack val" is 0, the spout tuple with id 10 must be fully processed, so ack is called on the spout (task 1).

(In reality the ids are 64 bits; short values are used here for readability.)

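A runnable sketch of the acker's XOR bookkeeping (a simplification for illustration, not Storm's actual implementation; ids are kept as longs and the map is keyed by spout message id):

import java.util.HashMap;
import java.util.Map;

public class AckerSketch {
    // spoutMsgId -> running XOR ("ack val") of all created and acked tuple ids
    private final Map<Long, Long> ackVals = new HashMap<>();

    // Called whenever a tuple anchored to spoutMsgId is created or acked.
    public void xorIn(long spoutMsgId, long tupleMsgId) {
        long val = ackVals.getOrDefault(spoutMsgId, 0L) ^ tupleMsgId;
        if (val == 0L) {
            ackVals.remove(spoutMsgId);
            System.out.println("Tree for spout msg " + spoutMsgId + " fully processed");
        } else {
            ackVals.put(spoutMsgId, val);
        }
    }

    public static void main(String[] args) {
        AckerSketch acker = new AckerSketch();
        acker.xorIn(10, 0b1010);  // emit "hey" (msgId 10)
        acker.xorIn(10, 0b0010);  // emit "h"   (msgId 2)
        acker.xorIn(10, 0b0011);  // emit "ey"  (msgId 3)
        acker.xorIn(10, 0b1010);  // ack  "hey"
        acker.xorIn(10, 0b0010);  // ack  "h"
        acker.xorIn(10, 0b0011);  // ack  "ey"  -> "ack val" becomes 0
    }
}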

Page 14: STORM

AT-LEAST-ONCE PROCESSING – 5

A tuple isn't acked because the task died:

The spout tuple(s) at the root of the tree of tuples will time out and be replayed.

Acker task dies:

All the spout tuples the acker was tracking will time out and be replayed.

Spout task dies:

In this case the source that the spout talks to is responsible for replaying the messages. For example, queues like Kestrel and RabbitMQ will place all pending messages back on the queue when a client disconnects.

Page 15: STORM

AT-LEAST-ONCE PROCESSING – 6

At-least-once processing might process a tuple more than once.

Example

[Diagram: spout task 1 emits to bolt tasks 2 and 3 using an all grouping.]

1. A spout tuple is emitted to tasks 2 and 3.
2. The worker responsible for task 3 fails.
3. The supervisor restarts the worker.
4. The spout tuple is replayed and emitted to tasks 2 and 3.
5. Task 2 will now have executed the same bolt twice.

Consider why the all grouping is not important in this example.

Page 16: STORM

EXACTLY-ONCE-PROCESSING

Transactional topologies (TT) are an abstraction built on STORM primitives.

TT guarantees exactly-once processing of tuples.

Acking is optimized in TT; there is no need to do anchoring or acking manually.

Bolts execute as new instances per attempt at processing a batch.

Example

[Diagram: spout task 1 emits to bolt tasks 2 and 3 using an all grouping.]

1. A spout tuple is emitted to tasks 2 and 3.
2. The worker responsible for task 3 fails.
3. The supervisor restarts the worker.
4. The spout tuple is replayed and emitted to tasks 2 and 3.
5. Tasks 2 and 3 initiate new bolt instances because of the new attempt.
6. Now there is no problem.

Page 17: STORM

EXACTLY-ONCE-PROCESSING – 2

For efficiency, batch processing of tuples is introduced in TT.

A batch has two states: processing or committing.

Many batches can be in the processing state concurrently.

Only one batch can be in the committing state, and a strong ordering is imposed: batch 1 will always be committed before batch 2, and so on.

Types of bolts for TT: BasicBolt, BatchBolt, and BatchBolt marked as a committer.

• BasicBolt processes one tuple at a time.
• BatchBolt processes batches; finishBatch is called when all tuples of the batch have been executed. A sketch follows below.
• A BatchBolt marked as a committer has finishBatch called only when the batch is in the committing state.
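A minimal BatchBolt sketch that counts the tuples in a batch, loosely modeled on the storm-starter transactional example (the class name, field names, and output fields are illustrative):

public static class BatchCount extends BaseBatchBolt {
    Object _id;                       // the id of the batch being processed
    BatchOutputCollector _collector;
    int _count = 0;

    @Override
    public void prepare(Map conf, TopologyContext context, BatchOutputCollector collector, Object id) {
        _collector = collector;
        _id = id;
    }

    @Override
    public void execute(Tuple tuple) {
        _count++;                     // called for every tuple in the batch
    }

    @Override
    public void finishBatch() {
        // Called once the whole batch is executed (for committers: only in the committing state).
        _collector.emit(new Values(_id, _count));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "count"));
    }
}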

Page 18: STORM

EXACTLY-ONCE-PROCESSING – 3

A transactional spout has the capability to replay exact batches of tuples.

[Diagram: a transactional spout feeds Bolt A (BatchBolt); Bolt A feeds Bolt B (committer BatchBolt) and Bolt C (BatchBolt); Bolt B also feeds Bolt C; Bolt C feeds Bolt D (committer BatchBolt).]

BATCH IS IN PROCESSING STATE

• Bolt A: execute is called for all tuples received from the spout; finishBatch is called as soon as the whole batch has been received.
• Bolt B: execute is called for all tuples received from Bolt A; finishBatch is NOT called, because the batch is in the processing state.
• Bolt C: execute is called for all tuples received from Bolt A (and B); finishBatch is NOT called, because Bolt B has not called finishBatch.
• Bolt D: execute is called for all tuples received from Bolt C; finishBatch is NOT called, because the batch is in the processing state.

BATCH CHANGES TO COMMITTING STATE

• Bolt B: finishBatch is called.
• Bolt C: finishBatch is called, because we know we got all tuples from Bolt B now.
• Bolt D: finishBatch is called, because we know we got all tuples from Bolt C now.

Page 19: STORM

EXACTLY-ONCE-PROCESSING – 4

[Diagram: a transactional spout consists of a coordinator (a regular spout with a parallelism of 1, defining the streams "batch" and "commit") and emitters (regular bolts with a parallelism of P), connected by all groupings on the batch stream.]

When a batch should enter the processing state:
• The coordinator emits a tuple with the TransactionAttempt and the metadata for that transaction to the "batch" stream.
• All emitter tasks receive the tuple and begin to emit their portion of the tuples for the given batch.

When the processing phase of the batch is done (determined by the acker task):
• Ack gets called on the coordinator.

When ack gets called on the coordinator and all prior transactions have committed:
• The coordinator emits a tuple with the TransactionAttempt to the "commit" stream.
• All bolts marked as committers subscribe to the commit stream of the coordinator using an all grouping.
• Bolts marked as committers now know the batch is in the committing phase.

When the batch is fully processed again (determined by the acker task):
• Ack gets called on the coordinator.
• The coordinator knows the batch is now committed.

Page 20: STORM

STORM LIBRARIES

STORM uses a lot of libraries. The most prominent are:

• Clojure – a Lisp programming language. A crash course follows.
• Jetty – an embedded web server, used to host the UI of Nimbus.
• Kryo – a fast serializer, used when sending tuples.
• Thrift – a framework for building services. Nimbus is a Thrift daemon.
• ZeroMQ – a very fast transport layer.
• Zookeeper – a distributed system for storing metadata.