Post on 09-May-2015
STORM
COMPARISON – INTRODUCTION – CONCEPTS
PRESENTATION BY KASPER MADSEN
MARCH - 2012
HADOOP VS STORM
Hadoop:
Batch processing
Jobs run to completion
JobTracker is a SPOF*
Stateful nodes
Scalable
Guarantees no data loss
Open source
Storm:
Real-time processing
Topologies run forever
No single point of failure
Stateless nodes
Scalable
Guarantees no data loss
Open source
* Hadoop 0.21 added some checkpointing.
SPOF: Single Point Of Failure
COMPONENTS
The Nimbus daemon is comparable to the Hadoop JobTracker. It is the master.
The Supervisor daemon spawns workers; it is comparable to the Hadoop TaskTracker.
A worker is spawned by the supervisor, one per port defined in the storm.yaml configuration.
A task is run as a thread in a worker.
Zookeeper* is a distributed system used to store metadata. The Nimbus and Supervisor daemons are fail-fast and stateless; all state is kept in Zookeeper.
* Zookeeper is an Apache top-level project.
Notice that all communication between Nimbus and the Supervisors is done through Zookeeper.
On a cluster with 2k+1 Zookeeper nodes, the system can recover when at most k nodes fail.
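The 2k+1 rule is majority-quorum arithmetic: the ensemble keeps working while a strict majority of its nodes is alive. A minimal Java sketch of that arithmetic follows; QuorumMath and its methods are hypothetical helpers, not part of the Zookeeper API.

```java
/** Hypothetical helper showing majority-quorum arithmetic for a Zookeeper ensemble. */
final class QuorumMath {
    /** Smallest number of live nodes that forms a strict majority of n. */
    static int quorumSize(int n) {
        return n / 2 + 1;
    }

    /** Number of nodes that may fail while a majority of n is still alive. */
    static int maxFailures(int n) {
        return n - quorumSize(n);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 7; n++) {
            System.out.println(n + " nodes: quorum " + quorumSize(n)
                    + ", tolerates " + maxFailures(n) + " failure(s)");
        }
    }
}
```

For 2k+1 = 5 nodes the quorum is 3, so 2 nodes may fail; a sixth node raises the quorum to 4 while still tolerating only 2 failures, which is why odd ensemble sizes are the usual choice.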
STREAMS
A stream is an unbounded sequence of tuples.
A topology is a graph where each node is a spout or bolt, and the edges indicate which bolts subscribe to which streams.
• A spout is a source of a stream
• A bolt consumes a stream (and possibly emits a new one)
• An edge represents a grouping
(Figure: two spouts are the sources of streams A and B; one bolt subscribes to A and emits C, another subscribes to A and emits D, a third subscribes to A & B, and a fourth subscribes to C & D.)
GROUPINGS
Each spout or bolt runs X instances in parallel (called tasks).
Groupings are used to decide which task in the subscribing bolt a tuple is sent to.
Shuffle grouping is a random grouping.
Fields grouping groups by value, such that equal values go to the same task.
All grouping replicates each tuple to all tasks.
Global grouping makes all tuples go to one task.
None grouping means you don't care how the stream is grouped; it currently behaves like shuffle grouping (running the bolt in the same thread as the bolt/spout it subscribes to is a planned optimization).
Direct grouping lets the producer (the task that emits) decide which consumer task will receive the tuple.
(Figure: example topology whose components run 2, 2, 4 and 3 tasks.)
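The groupings above can be thought of as a function from a tuple to one or more task indices of the subscribing bolt. The sketch below assumes a plain hash-modulo scheme for fields grouping (Storm's exact hash function may differ) and sends global-grouped tuples to the lowest-numbered task; GroupingSketch is hypothetical and leaves out the none and direct groupings.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/**
 * Hypothetical task selection for four of the groupings above.
 * Tasks of the subscribing bolt are numbered 0..numTasks-1.
 */
final class GroupingSketch {
    private static final Random RANDOM = new Random();

    /** Shuffle grouping: a random task. */
    static int shuffle(int numTasks) {
        return RANDOM.nextInt(numTasks);
    }

    /** Fields grouping: equal values of the grouping fields always map to the same task. */
    static int fields(List<?> groupingValues, int numTasks) {
        // Assumed scheme: hash of the grouping values, modulo the task count (never negative).
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    /** All grouping: every task receives a copy of the tuple. */
    static List<Integer> all(int numTasks) {
        return IntStream.range(0, numTasks).boxed().collect(Collectors.toList());
    }

    /** Global grouping: every tuple goes to a single task, here the lowest-numbered one. */
    static int global() {
        return 0;
    }
}
```

For example, fields(Arrays.asList("nathan"), 4) returns the same task index on every call, which is what lets a counting bolt keep all counts for one word in a single task.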
EXAMPLE
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("words", new TestWordSpout(), 10);
builder.setBolt("exclaim1", new ExclamationBolt(), 3)
        .shuffleGrouping("words");
builder.setBolt("exclaim2", new ExclamationBolt(), 2)
        .shuffleGrouping("exclaim1");
The source code for this example is part of the storm-starter project on GitHub.
Run 10 tasks
Run 3 tasks
Run 2 tasks
Create stream called "words"
Create stream called "exclaim1"
Subscribe to stream "words", using shuffle grouping
Create stream called "exclaim2"
Subscribe to stream "exclaim1", using shuffle grouping
A bolt can subscribe to any number of streams by chaining groupings.
(Figure: TestWordSpout → ExclamationBolt → ExclamationBolt)
EXAMPLE – 1
TestWordSpout
public void nextTuple() {
    Utils.sleep(100); // emit at most one tuple every 100 ms
    final String[] words = new String[] {"nathan", "mike", "jackson", "golda", "bertels"};
    final Random rand = new Random();
    final String word = words[rand.nextInt(words.length)]; // pick a random word
    _collector.emit(new Values(word));
}
TestWordSpout emits a random string from the words array every 100 milliseconds.
EXAMPLE – 2
ExclamationBolt
OutputCollector _collector;

public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    _collector = collector;
}

public void execute(Tuple tuple) {
    _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
    _collector.ack(tuple);
}

public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word"));
}
declareOutputFields is used to declare streams and their schemas. It
is possible to declare several streams and specify the stream to use
when outputting tuples in the emit function call.
prepare is called when the bolt is initialized
execute is called for each tuple
declareOutputFields is called when the topology is defined, to record the bolt's output schema
FAULT TOLERANCE
Zookeeper stores metadata in a very robust way.
Nimbus and Supervisor are stateless and only need the metadata in Zookeeper (ZK) to work or restart.
When a node dies
• The tasks will time out and be reassigned to other workers by Nimbus.
When a worker dies
• The supervisor will restart the worker.
• Nimbus will reassign the worker to another supervisor if no heartbeats are sent.
• If that is not possible (no free ports), the tasks will be run on other workers in the topology. If more capacity is added to the cluster later, STORM will automatically initialize a new worker and spread out the tasks.
When Nimbus or a Supervisor dies
• Workers will continue to run.
• Workers cannot be reassigned without Nimbus.
• Nimbus and Supervisor should be run under a process monitoring tool that restarts them automatically if they fail.
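Detecting a dead worker comes down to noticing that its heartbeats have stopped. Below is a hypothetical Java sketch of that idea; HeartbeatMonitor, the worker names and the timeout value are assumptions for illustration, not Storm's code or defaults. The monitor records the last heartbeat time per worker and reports the workers that have been silent longer than the timeout, i.e. the candidates for reassignment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical heartbeat-timeout failure detector; timestamps are plain milliseconds. */
final class HeartbeatMonitor {
    private final long timeoutMs;
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    HeartbeatMonitor(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Record a heartbeat from a worker at time nowMs. */
    void heartbeat(String workerId, long nowMs) {
        lastHeartbeat.put(workerId, nowMs);
    }

    /** Workers whose last heartbeat is older than the timeout: candidates for reassignment. */
    List<String> timedOut(long nowMs) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }
}
```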
AT-LEAST-ONCE PROCESSING
STORM guarantees at-least-once processing of tuples.
Message id: assigned to a tuple when it is emitted from a spout or bolt. It is 64 bits long.
Tree of tuples: the tuples generated (directly and indirectly) from a spout tuple.
Ack is called on the spout when the tree of tuples for a spout tuple has been fully processed.
Fail is called on the spout if one of the tuples in the tree fails, or if the tree is not fully processed within a specified timeout (default 30 seconds).
It is possible to specify the message id when emitting a tuple. This can be useful for replaying tuples from a queue.
The ack/fail method is called when the tree of tuples has been fully processed, or has failed or timed out.
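From the spout's point of view, at-least-once processing means remembering every emitted message until its tree of tuples is acked, and replaying it on fail or timeout. A minimal sketch, assuming a hypothetical PendingTracker class driven by explicit millisecond timestamps (in a real spout, Storm calls the spout's ack and fail methods for you):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical spout-side bookkeeping for at-least-once delivery. */
final class PendingTracker<M> {
    private final long timeoutMs;
    private final Map<Long, M> pending = new LinkedHashMap<>();
    private final Map<Long, Long> emittedAt = new LinkedHashMap<>();
    private final List<M> replayQueue = new ArrayList<>();

    PendingTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** A message was emitted with this id: keep it until its tree completes. */
    void emitted(long msgId, M message, long nowMs) {
        pending.put(msgId, message);
        emittedAt.put(msgId, nowMs);
    }

    /** Tree of tuples fully processed: the message can be forgotten. */
    void ack(long msgId) {
        pending.remove(msgId);
        emittedAt.remove(msgId);
    }

    /** Tree of tuples failed: queue the message for replay. */
    void fail(long msgId) {
        M message = pending.remove(msgId);
        emittedAt.remove(msgId);
        if (message != null) {
            replayQueue.add(message);
        }
    }

    /** Treat every message not completed within the timeout as failed. */
    void expire(long nowMs) {
        List<Long> expired = new ArrayList<>();
        for (Map.Entry<Long, Long> e : emittedAt.entrySet()) {
            if (nowMs - e.getValue() >= timeoutMs) {
                expired.add(e.getKey());
            }
        }
        for (Long id : expired) {
            fail(id);
        }
    }

    List<M> replayQueue() { return replayQueue; }

    int pendingCount() { return pending.size(); }
}
```

With the 30-second default timeout, a message emitted at t = 0 and still pending at t = 30 s is handled exactly like an explicit fail.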
AT-LEAST-ONCE PROCESSING – 2
Anchoring is used to copy the spout tuple message id(s) to the newly generated tuples. In this way, every tuple knows the message ids of all the spout tuples it descends from.
Multi-anchoring is anchoring a new tuple to multiple input tuples. If the tuple tree fails, multiple spout tuples will be replayed. This is useful for streaming joins and more.
Ack called from a bolt indicates that the tuple has been processed as intended.
Fail called from a bolt replays the spout tuple(s).
Every tuple must be acked or failed, or the task will eventually run out of memory.
_collector.emit(tuple, new Values(word)); Uses anchoring
_collector.emit(new Values(word)); Does NOT use anchoring
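Anchoring can be modelled as each tuple carrying the set of spout message ids at the root of its tree. In this hypothetical sketch (AnchoredTuple is not a Storm class), an anchored emit copies the union of its anchors' ids, so multi-anchoring simply merges several sets, while an unanchored emit produces a tuple with no ids that no acker would track.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of anchoring: a tuple plus the spout message ids of its tree(s). */
final class AnchoredTuple {
    final String value;
    final Set<Long> spoutMsgIds;

    private AnchoredTuple(String value, Set<Long> spoutMsgIds) {
        this.value = value;
        this.spoutMsgIds = Collections.unmodifiableSet(spoutMsgIds);
    }

    /** A spout tuple is the root of its own tree. */
    static AnchoredTuple fromSpout(String value, long msgId) {
        Set<Long> roots = new HashSet<>();
        roots.add(msgId);
        return new AnchoredTuple(value, roots);
    }

    /** Emit anchored to zero or more inputs; more than one anchor is multi-anchoring. */
    static AnchoredTuple emit(String value, AnchoredTuple... anchors) {
        Set<Long> roots = new HashSet<>();
        for (AnchoredTuple anchor : anchors) {
            roots.addAll(anchor.spoutMsgIds);
        }
        return new AnchoredTuple(value, roots);
    }
}
```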
AT-LEAST-ONCE PROCESSING – 3
Acker tasks track the tree of tuples for every spout tuple.
• The acker task responsible for a given spout tuple is determined by taking the message id modulo the number of acker tasks. Since every tuple carries the message ids of its spout tuples, it is easy to reach the correct acker tasks.
• An acker task stores a map of the form {spoutMsgId, {spoutTaskId, "ack val"}}.
• "ack val" represents the state of the entire tree of tuples. It is the XOR of all tuple message ids created and acked in the tree.
• When "ack val" is 0, the tuple tree is fully processed.
• Since message ids are random 64-bit numbers, the chance of "ack val" becoming 0 by accident is extremely small.
It is important to set the number of acker tasks in the topology when processing large amounts of tuples (the default is 1).
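The acker bookkeeping above fits in a few lines. The following hypothetical AckerSketch (not Storm's implementation) keeps the {spoutMsgId, {spoutTaskId, "ack val"}} map, XORs each created or acked tuple id into "ack val", and reports the spout task to ack once the value returns to 0. Each id is XORed in exactly twice, once on creation and once on ack, so the two occurrences cancel.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical acker task: tracks "ack val" per spout tuple. */
final class AckerSketch {
    private static final class Entry {
        final int spoutTaskId;
        long ackVal; // XOR of every tuple id created or acked so far in this tree

        Entry(int spoutTaskId) {
            this.spoutTaskId = spoutTaskId;
        }
    }

    private final Map<Long, Entry> pending = new HashMap<>();

    /** Which of numAckers acker tasks tracks this spout tuple (modulo on the id). */
    static int ackerFor(long spoutMsgId, int numAckers) {
        return (int) Math.floorMod(spoutMsgId, (long) numAckers);
    }

    /**
     * XOR a created or acked tuple id into the tree of spoutMsgId.
     * Returns the spout task id to ack when the tree completes, otherwise -1.
     */
    int update(long spoutMsgId, int spoutTaskId, long tupleId) {
        Entry e = pending.computeIfAbsent(spoutMsgId, id -> new Entry(spoutTaskId));
        e.ackVal ^= tupleId;
        if (e.ackVal == 0) {
            pending.remove(spoutMsgId);
            return e.spoutTaskId;
        }
        return -1;
    }
}
```

Feeding it the ids from the example that follows (10, 2 and 3 on emit, then 10, 2 and 3 again on ack) returns 1, the spout task id, on the final update.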
AT-LEAST-ONCE PROCESSING – 4
(Figure: spout task 1 emits "hey" with msgId 10; bolt tasks 2, 3 and 4 process its tree, in which "h" (msgId 2) and "ey" (msgId 3) are emitted, both carrying spoutIds: 10.)
What happens in the acker task for one spout tuple. Format: {spoutMsgId, {spoutTaskId, "ack val"}}
1. After emit "hey": {10, {1, 0000 XOR 1010 = 1010}}
2. After emit "h": {10, {1, 1010 XOR 0010 = 1000}}
3. After emit "ey": {10, {1, 1000 XOR 0011 = 1011}}
4. After ack "hey": {10, {1, 1011 XOR 1010 = 0001}}
5. After ack "h": {10, {1, 0001 XOR 0010 = 0011}}
6. After ack "ey": {10, {1, 0011 XOR 0011 = 0000}}
7. Since "ack val" is 0, the spout tuple with id 10 must be fully processed. Call ack on the spout (task 1).
The example uses 4-bit ids for readability; in reality the ids are 64 bits.
AT-LEAST-ONCE PROCESSING – 5
A tuple isn't acked because the task died:
The spout tuple(s) at the root of the tree of tuples will time out and be replayed.
Acker task dies:
All the spout tuples the acker was tracking will time out and be replayed.
Spout task dies:
In this case the source that the spout talks to is responsible for replaying the messages. For example, queues like Kestrel and RabbitMQ will place all pending messages back on the queue when a client disconnects.
AT-LEAST-ONCE PROCESSING – 6
At-least-once processing might process a tuple more than once.
Example
(Figure: spout task 1 emits to bolt tasks 2 and 3 using an all grouping.)
1. A spout tuple is emitted to tasks 2 and 3.
2. The worker responsible for task 3 fails.
3. The supervisor restarts the worker.
4. The spout tuple is replayed and emitted to tasks 2 and 3.
5. Task 2 has now processed the same spout tuple twice.
Consider why the all grouping is not important in this example.
EXACTLY-ONCE-PROCESSING
Transactional topologies (TT) are an abstraction built on STORM primitives.
TT guarantees exactly-once processing of tuples.
Acking is optimized in TT; there is no need to do anchoring or acking manually.
Bolts execute as new instances for each attempt at processing a batch.
Example
(Figure: spout task 1 emits to bolt tasks 2 and 3 using an all grouping.)
1. A spout tuple is emitted to tasks 2 and 3.
2. The worker responsible for task 3 fails.
3. The supervisor restarts the worker.
4. The spout tuple is replayed and emitted to tasks 2 and 3.
5. Tasks 2 and 3 create new bolt instances because this is a new attempt.
6. Now there is no problem.
EXACTLY-ONCE-PROCESSING – 2
For efficiency, TT introduces batch processing of tuples.
A batch has two states: processing or committing.
Many batches can be in the processing state concurrently.
Only one batch can be in the committing state, and a strong ordering is imposed: batch 1 will always be committed before batch 2, and so on.
Types of bolts for TT: BasicBolt, BatchBolt, and BatchBolt marked as committer.
BasicBolt processes one tuple at a time.
BatchBolt processes batches. finishBatch is called when all tuples of the batch have been executed.
A BatchBolt marked as committer has finishBatch called only when the batch is in the committing state.
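The strong commit ordering is what makes exactly-once updates to external state possible. The Storm wiki's transactional-topologies page describes storing the transaction id next to each value and skipping an update whose id matches the stored one. Below is a plain-Java sketch of that idea, with a hypothetical TxCountStore and an in-memory map standing in for the database.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical committer-side store: each count remembers the txid that last wrote it. */
final class TxCountStore {
    private static final class Value {
        final long count;
        final long txId;

        Value(long count, long txId) {
            this.count = count;
            this.txId = txId;
        }
    }

    private final Map<String, Value> db = new HashMap<>(); // stands in for a real database

    /** Apply a batch's partial count for key while committing transaction txId. */
    void commit(String key, long partialCount, long txId) {
        Value current = db.get(key);
        if (current != null && current.txId == txId) {
            return; // this transaction was already applied: a replay must not count twice
        }
        long base = current == null ? 0 : current.count;
        db.put(key, new Value(base + partialCount, txId));
    }

    long count(String key) {
        Value v = db.get(key);
        return v == null ? 0 : v.count;
    }
}
```

The skip is only safe because commits arrive in strict transaction order: if the stored txid equals the current one, that exact batch has already been applied, and no later batch can have been committed in between.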
EXACTLY-ONCE-PROCESSING – 3
A transactional spout can replay exact batches of tuples.
(Figure: the transactional spout feeds bolts A to D; A and C are batch bolts, B and D are committer batch bolts.)
BATCH IS IN PROCESSING STATE
Bolt A: execute method is called for all tuples received from spout
finishBatch is called once all tuples of the batch have been received
Bolt B: execute method is called for all tuples received from bolt A
finishBatch is NOT called because batch is in processing state
Bolt C: execute method is called for all tuples received from bolt A (and B)
finishBatch is NOT called, because bolt B has not called finishBatch
Bolt D: execute method is called for all tuples received from bolt C
finishBatch is NOT called because batch is in processing state
BATCH CHANGES TO COMMITTING STATE
Bolt B: finishBatch is called
Bolt C: finishBatch is called, because we know we got all tuples from Bolt B now
Bolt D: finishBatch is called, because we know we got all tuples from Bolt C now
EXACTLY-ONCE-PROCESSING – 4
Transactional spout
• Coordinator: a regular spout with a parallelism of 1. Defined streams: "batch" and "commit".
• Emitter: a regular bolt with a parallelism of P. Emitters subscribe to the coordinator's "batch" stream using an all grouping.
When a batch should enter the processing state:
• The coordinator emits a tuple with the TransactionAttempt and the metadata for that transaction to the "batch" stream.
• All emitter tasks receive the tuple and begin to emit their portion of tuples for the given batch.
When the processing phase of the batch is done (determined by the acker task):
• Ack gets called on the coordinator.
When ack gets called on the coordinator and all prior transactions have committed:
• The coordinator emits a tuple with the TransactionAttempt to the "commit" stream.
• All bolts marked as committers subscribe to the coordinator's "commit" stream using an all grouping.
• Bolts marked as committers now know the batch is in the committing phase.
When the batch is fully processed again (determined by the acker task):
• Ack gets called on the coordinator.
• The coordinator knows the batch is now committed.
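The coordinator's rule can be sketched as a small state machine: many batches may be processing, but commits are emitted strictly in transaction order and only one at a time. CommitOrderSketch is hypothetical; it tracks only transaction ids and records which ids it would emit on the commit stream.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the coordinator's commit ordering. */
final class CommitOrderSketch {
    private long nextToCommit = 1;                          // lowest txid not yet committed
    private Long committing = null;                         // txid currently committing, if any
    private final Set<Long> processed = new HashSet<>();    // processing acked, waiting for their turn
    private final List<Long> commitStream = new ArrayList<>(); // txids emitted on "commit"

    /** The processing phase of txId was acked. */
    void processingAcked(long txId) {
        processed.add(txId);
        tryStartCommit();
    }

    /** The committing phase of txId was acked: the batch is committed. */
    void commitAcked(long txId) {
        if (committing == null || committing != txId) {
            throw new IllegalStateException("transaction " + txId + " is not committing");
        }
        committing = null;
        nextToCommit++;
        tryStartCommit();
    }

    /** Start the next commit only if none is running and its predecessors are committed. */
    private void tryStartCommit() {
        if (committing == null && processed.remove(nextToCommit)) {
            committing = nextToCommit;
            commitStream.add(nextToCommit);
        }
    }

    List<Long> commitStream() { return commitStream; }
}
```

In the usage below, batch 2 finishes processing before batch 1, yet it reaches the commit stream only after batch 1 has been committed.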
STORM LIBRARIES
STORM uses a lot of libraries. The most prominent are:
Clojure: a new Lisp dialect for the JVM. A crash course follows.
Jetty: an embedded web server, used to host the Nimbus UI.
Kryo: a fast serializer, used when sending tuples.
Thrift: a framework for building services. Nimbus is a Thrift daemon.
ZeroMQ: a very fast transport layer.
Zookeeper: a distributed system for storing metadata.
LEARN MORE
Wiki (https://github.com/nathanmarz/storm/wiki)
Storm-starter (https://github.com/nathanmarz/storm-starter)
Mailing list (http://groups.google.com/group/storm-user)
#storm-user room on freenode