Developing Java Streaming Applications with Apache Storm
Page 1
Developing Java Streaming Applications with Apache Storm
Lester Martin www.ajug.org - Nov 2017
Page 2
Connection before Content
Lester Martin – Hadoop/Spark/Storm Trainer & Consultant
http://lester.website (links to blog, twitter, github, LI, FB, etc)
Page 3
Agenda
• What is Storm?
• Conceptual Model
• Compile Time
• DEMO: Develop Word Count Topology
• Runtime
• DEMO: Submit Word Count Topology
• Additional Features
• DEMO: Kafka > Storm > HBase Topology in Local Cluster
Page 4
What is Storm?
Page 5
Storm is …
• Streaming – key enabler of the Lambda Architecture
• Fast – clocked at 1M+ messages per second per node
• Scalable – thousands of workers per cluster
• Fault tolerant – failure is expected, and embraced
• Reliable – guaranteed message processing (at-least-once); exactly-once semantics with Trident
Page 6
Storm in the Lambda Architecture
[Diagram: real-time data feeds flow into Storm; Hadoop persists data and handles batch processing of batch feeds to update event models; pattern templates, key-performance indicators, and alerts feed dashboards and applications.]
Page 7
Conceptual Model
Page 8
TUPLE
{…}
Page 9
Tuple
• Unit of work to be processed
• Immutable, ordered set of serializable values
• Each field must have an assigned name
{…}
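To make these three properties concrete, here is a minimal plain-Java sketch of a tuple: an immutable, ordered list of values paired with field names, readable by position or by name. The class `SimpleTuple` is hypothetical and only models the idea. Storm's real `Tuple` (in `org.apache.storm.tuple`) offers similar accessors such as `getValueByField`, and its values must also be serializable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a Storm-style tuple: ordered values, each with a field name.
final class SimpleTuple {
    private final List<String> fields;
    private final List<Object> values;

    SimpleTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("every value needs a field name");
        }
        // Defensive, unmodifiable copies keep the tuple itself immutable.
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
        this.values = Collections.unmodifiableList(new ArrayList<>(values));
    }

    Object getValue(int index) {
        return values.get(index);
    }

    Object getValueByField(String field) {
        int index = fields.indexOf(field);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(index);
    }

    List<Object> getValues() {
        return values;
    }

    public static void main(String[] args) {
        SimpleTuple t = new SimpleTuple(List.of("word", "count"), List.of("storm", 3));
        System.out.println(t.getValue(0) + " -> " + t.getValueByField("count")); // prints "storm -> 3"
    }
}
```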
Page 10
Stream
• Core abstraction of Storm
• Unbounded sequence of Tuples
{…} {…} {…} {…} {…} {…} {…}
Page 11
SPOUT
Page 12
Spout
• Source of Streams
• Wrap an event source and emit Tuples
Page 13
Message Queues
Message queues are often the source of the data processed by Storm. Storm Spouts integrate with many types of message queues.
[Diagram: real-time data sources (operating systems, services and applications, sensors) send log entries, events, errors, status messages, etc. to a message queue (Kestrel, RabbitMQ, AMQP, Kafka, JMS, others…); Storm reads the data from the queue.]
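The reading side of this picture can be sketched in plain Java. The class below is hypothetical. It wraps a `java.util.concurrent.BlockingQueue` as a stand-in for a real message queue and mimics the spout contract: the runtime calls `nextTuple()` repeatedly, and each call emits at most one tuple without blocking. A real spout would use a Kafka, JMS, or similar client and Storm's `SpoutOutputCollector` instead of the `Consumer` callback assumed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical spout-like wrapper around a message queue (not Storm's API).
final class QueueSpout {
    private final BlockingQueue<String> queue;
    private final Consumer<List<Object>> emitter; // stands in for the output collector

    QueueSpout(BlockingQueue<String> queue, Consumer<List<Object>> emitter) {
        this.queue = queue;
        this.emitter = emitter;
    }

    // Called repeatedly by the runtime; must not block, so poll() rather than take().
    void nextTuple() {
        String message = queue.poll();
        if (message != null) {
            emitter.accept(List.of(message)); // one-field tuple holding the raw message
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("ERROR disk full");
        queue.add("INFO heartbeat");

        List<List<Object>> emitted = new ArrayList<>();
        QueueSpout spout = new QueueSpout(queue, emitted::add);
        for (int i = 0; i < 3; i++) { // the third call finds the queue empty and emits nothing
            spout.nextTuple();
        }
        System.out.println(emitted); // prints [[ERROR disk full], [INFO heartbeat]]
    }
}
```

Using `poll()` rather than `take()` matters because Storm's spout contract expects `nextTuple` to return quickly when there is nothing to emit.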
Page 14
BOLT
Page 15
Bolt
• Core unit of computation
• Receive Tuples and do stuff
• Optionally, emit additional Tuples
Page 16
Bolt
• Write to a data store
Page 17
Bolt
• Read from a data store
Page 18
Bolt
• Perform arbitrary computation
Page 19
Bolt
• (Optionally) Emit additional Stream(s)
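The bolt roles above (compute, then optionally emit to one or more streams) can be illustrated with a small plain-Java sketch. Everything here is hypothetical. `StreamEmitter` stands in for Storm's `OutputCollector`, which does have an `emit` overload that takes a stream ID. The stream names `words` and `numbers` were invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an output collector that supports named streams.
interface StreamEmitter {
    void emit(String streamId, List<Object> values);
}

// Hypothetical bolt: splits a sentence and routes each token to one of two streams.
final class RoutingBolt {
    private final StreamEmitter collector;

    RoutingBolt(StreamEmitter collector) {
        this.collector = collector;
    }

    void execute(String sentence) {
        for (String token : sentence.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input yields a single empty token
            }
            // Arbitrary computation decides which stream receives the token.
            String stream = token.chars().allMatch(Character::isDigit) ? "numbers" : "words";
            collector.emit(stream, List.of(token));
        }
    }

    public static void main(String[] args) {
        Map<String, List<List<Object>>> streams = new LinkedHashMap<>();
        RoutingBolt bolt = new RoutingBolt(
            (id, values) -> streams.computeIfAbsent(id, k -> new ArrayList<>()).add(values));
        bolt.execute("four score and 7 years 87 ago");
        System.out.println(streams);
        // prints {words=[[four], [score], [and], [years], [ago]], numbers=[[7], [87]]}
    }
}
```

In a real Storm bolt, each additional stream would also have to be declared in `declareOutputFields`, for example with `declarer.declareStream(...)`.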
Page 20
TOPOLOGY
Page 21
Topology
• DAG of Spouts and Bolts
• Data Flow Representation
• Streaming Computation
Page 22
Topology
• Storm executes Spouts and Bolts as Tasks that run in parallel on multiple machines
Page 23
Parallel Execution of Topology Components
[Diagram: a logical topology (spout A, bolt A, bolt B, bolt C) shown next to its physical implementation on machines A through G, where spout A, bolt A and bolt B each run two tasks and bolt C runs one task.]
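The mapping from logical components to parallel tasks can be sketched in plain Java, using the task counts from the figure: two tasks each for spout A, bolt A and bolt B, and one for bolt C. The round-robin placement onto machines is a simplifying assumption for illustration. Storm's own scheduler assigns executors to worker slots and need not follow this order, so the machine each task lands on here is illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: expand components into tasks, then place tasks round-robin.
final class TaskPlacement {
    static Map<String, List<String>> place(Map<String, Integer> tasksPerComponent, List<String> machines) {
        Map<String, List<String>> placement = new LinkedHashMap<>();
        for (String machine : machines) {
            placement.put(machine, new ArrayList<>());
        }
        int next = 0;
        for (Map.Entry<String, Integer> component : tasksPerComponent.entrySet()) {
            for (int task = 1; task <= component.getValue(); task++) {
                String machine = machines.get(next % machines.size());
                placement.get(machine).add(component.getKey() + "#" + task);
                next++;
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        Map<String, Integer> tasks = new LinkedHashMap<>();
        tasks.put("spout A", 2);
        tasks.put("bolt A", 2);
        tasks.put("bolt B", 2);
        tasks.put("bolt C", 1);
        List<String> machines = List.of("A", "B", "C", "D", "E", "F", "G");
        System.out.println(place(tasks, machines));
        // prints {A=[spout A#1], B=[spout A#2], C=[bolt A#1], D=[bolt A#2], E=[bolt B#1], F=[bolt B#2], G=[bolt C#1]}
    }
}
```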
Page 24
Stream Groupings
Stream Groupings determine how Storm routes Tuples between Tasks.
• Shuffle: randomized round-robin (evenly distributes load to downstream Bolts)
• Fields: ensures all Tuples with the same Field value(s) are always routed to the same Task
• All: replicates the Stream across all of the Bolt’s Tasks (use with care)
• Other options: including custom roll-your-own (RYO) grouping logic
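The routing rules above can be sketched in plain Java. This is an illustration under stated assumptions, not Storm's source code:
• Shuffle is modeled as walking a shuffled list of task indices and reshuffling after each full pass.
• Fields grouping is modeled as the hash of the grouping values modulo the number of tasks. Storm's exact hash may differ.
• All grouping returns every task.

The class `Groupings` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch of three stream groupings; each method returns target task indices.
final class Groupings {
    private final int numTasks;
    private final Random random;
    private final List<Integer> shuffleOrder = new ArrayList<>();
    private int position = 0;

    Groupings(int numTasks, long seed) {
        this.numTasks = numTasks;
        this.random = new Random(seed);
        for (int i = 0; i < numTasks; i++) {
            shuffleOrder.add(i);
        }
        Collections.shuffle(shuffleOrder, random);
    }

    // Shuffle: randomized round-robin, so every task receives one tuple per pass.
    List<Integer> shuffle() {
        if (position == numTasks) {
            Collections.shuffle(shuffleOrder, random);
            position = 0;
        }
        return List.of(shuffleOrder.get(position++));
    }

    // Fields: equal grouping values always hash to the same task.
    List<Integer> fields(List<Object> groupingValues) {
        return List.of(Math.floorMod(groupingValues.hashCode(), numTasks));
    }

    // All: replicate the tuple to every task.
    List<Integer> all() {
        return IntStream.range(0, numTasks).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Groupings g = new Groupings(3, 42L);
        List<Integer> onePass = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            onePass.addAll(g.shuffle());
        }
        Collections.sort(onePass);
        System.out.println("one shuffle pass hits " + onePass); // prints one shuffle pass hits [0, 1, 2]
        System.out.println(g.fields(List.of("apple")).equals(g.fields(List.of("apple")))); // prints true
        System.out.println(g.all()); // prints [0, 1, 2]
    }
}
```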
Page 25
Compile Time
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("sentence"));
}
Page 26
Example Spout Code (1 of 2)
public class RandomSentenceSpout extends BaseRichSpout {
    SpoutOutputCollector _collector;
    Random _rand;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
        _rand = new Random();
    }

    @Override
    public void nextTuple() {
        Utils.sleep(100);
        String[] sentences = new String[]{ "the cow jumped over the moon", "an apple a day keeps the doctor away",
            "four score and seven years ago", "snow white and the seven dwarfs", "i am at two with nature" };
        String sentence = sentences[_rand.nextInt(sentences.length)];
        _collector.emit(new Values(sentence));
    }
Continued next page…
Storm uses open to open the spout and provide it with its configuration, a context object providing information about components in the topology, and an output collector used to emit tuples.
Storm calls nextTuple to request that the spout emit its next tuple.
The spout uses emit to send a tuple to one or more bolts.
Name of the spout class, which extends BaseRichSpout, a Storm base class used as a “template”.
Page 27
Example Spout Code (2 of 2)
    @Override
    public void ack(Object id) {
    }

    @Override
    public void fail(Object id) {
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}
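Storm calls the spout’s ack method to signal that a tuple has been fully processed. This only happens for tuples emitted with a message ID; this example emits without one, so ack and fail are never called.
Storm calls the spout’s fail method to signal that a tuple has not been fully processed.
The declareOutputFields method names the fields in a tuple.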
Continued…
Page 28
Example Bolt Code
public static class ExclamationBolt extends BaseRichBolt {
    OutputCollector _collector;
    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }
    @Override
    public void execute(Tuple tuple) {
        _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
        _collector.ack(tuple);
    }
    @Override
    public void cleanup() {
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
The prepare method provides the bolt with its configuration and an OutputCollector used to emit tuples.
The execute method receives a tuple from a stream and emits a new tuple anchored to it. It then calls the collector’s ack method to signal that the input tuple was processed successfully.
The cleanup method releases system resources when the bolt is shut down (Storm does not guarantee it is called if the worker process is killed).
Names the fields in the output tuples. More detail later.
Name of the bolt class, which extends BaseRichBolt, a Storm base class used as a “template.”
Page 29
Example Topology Code
public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("words", new TestWordSpout());
    builder.setBolt("exclaim1", new NewExclamationBolt()).shuffleGrouping("words");
    builder.setBolt("exclaim2", new NewExclamationBolt()).shuffleGrouping("exclaim1");
    Config conf = new Config();
    StormSubmitter.submitTopology("add-exclamation", conf, builder.createTopology());
}
This code builds this topology:
words → exclaim1 → exclaim2 (each connection uses shuffleGrouping)
• words runs the code in TestWordSpout()
• exclaim1 runs the code in NewExclamationBolt()
• exclaim2 runs the code in NewExclamationBolt()
Page 30
DEMO: Develop Word Count Topology
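Readers without a cluster at hand can simulate the data flow of a word count topology in a single JVM. This is a plain-Java sketch, not Storm code:
• A fixed list of sentences stands in for a spout such as the RandomSentenceSpout shown earlier.
• A split step plays the role of a split bolt.
• Counting happens per task after a fields-grouping-style hash on the word, so each word's count lives in exactly one task.

The class name, the number of count tasks and the input sentences are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-JVM simulation of spout -> split bolt -> count bolt (fields grouping on "word").
final class WordCountSimulation {
    static Map<String, Integer> run(List<String> sentences, int countTasks) {
        // One independent count map per task, as each bolt task keeps its own state.
        List<Map<String, Integer>> taskState = new ArrayList<>();
        for (int i = 0; i < countTasks; i++) {
            taskState.add(new HashMap<>());
        }
        for (String sentence : sentences) {                 // "spout" emits sentences
            for (String word : sentence.split("\\s+")) {     // "split bolt" emits words
                if (word.isEmpty()) {
                    continue;
                }
                int task = Math.floorMod(word.hashCode(), countTasks); // fields grouping on the word
                taskState.get(task).merge(word, 1, Integer::sum);      // "count bolt" updates its state
            }
        }
        // Merge per-task state only for printing; in a topology it would stay distributed.
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> state : taskState) {
            totals.putAll(state); // safe: fields grouping means no word appears in two tasks
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("the cow jumped over the moon", "snow white and the seven dwarfs");
        System.out.println(run(sentences, 3));
        // prints {and=1, cow=1, dwarfs=1, jumped=1, moon=1, over=1, seven=1, snow=1, the=3, white=1}
    }
}
```

The fields-grouping step is what makes per-task counting correct. With a shuffle grouping, the same word could be counted in several tasks, and each task would hold only a partial total.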
Page 31
Runtime
[Diagram: one Nimbus node and four Supervisor nodes]
Page 32
Physical View
Page 33
Topology Deployment
Topology Submitter uploads topology:
• topology.jar
• topology.ser
• conf.ser
Page 34
Topology Deployment
Nimbus calculates assignments and sends them to ZooKeeper
Page 35
Topology Deployment
Supervisor nodes receive assignment information via ZooKeeper watches
Page 36
Topology Deployment
Supervisor nodes download topology from Nimbus:
• topology.jar
• topology.ser
• conf.ser
Page 37
Topology Deployment
Supervisors spawn workers (JVM processes)
Page 38
DEMO: Submit Topology to Storm Cluster
Page 39
Additional Features
Page 40
Local Versus Distributed Storm Clusters
The topology program code submitted to Storm using storm jar is different when submitting to local mode versus a distributed cluster. The submitTopology method is used in both cases.
• The difference is the class that contains the submitTopology method.
Config conf = new Config();
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("mytopology", conf, topology);
Config conf = new Config();
StormSubmitter.submitTopology("mytopology", conf, topology);
Instantiate a local cluster object.
Submit a topology to a local cluster.
Submit a topology to a distributed cluster.
Same method name, different classes.
Page 41
Reliable Processing
Bolts may emit Tuples anchored to a Tuple they received. Tuple “B” is a descendant of Tuple “A”.
Page 42
Reliable Processing
Multiple anchorings form a Tuple tree (bolts not shown).
Page 43
Reliable Processing
Bolts can Acknowledge that a tuple has been processed successfully.
ACK
Page 44
Reliable Processing
Bolts can also Fail a tuple to trigger a spout to replay the original.
FAIL
Page 45
Reliable Processing
Any failure in the Tuple tree will trigger a replay of the original tuple (provided the spout emitted it with a message ID and can re-read it from its source).
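Storm's documentation on guaranteeing message processing describes how a whole tuple tree can be tracked without storing the tree. Each tuple gets a random 64-bit ID. An acker keeps one value per spout tuple and XORs every tuple ID into it twice: once when the tuple is anchored into the tree and once when it is acked. The value therefore returns to zero exactly when every tuple has been acked, barring a very unlikely ID collision. The plain-Java class below is a simplified model of that idea. Its names and the replay bookkeeping are hypothetical, a bolt's ack-and-anchor is shown as separate calls, and timeouts are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical, simplified model of XOR-based tuple-tree tracking (not Storm's acker code).
final class XorAcker {
    private final Map<Long, Long> ackVals = new HashMap<>(); // root (spout tuple) id -> running XOR
    private final List<Long> completed = new ArrayList<>();
    private final List<Long> replays = new ArrayList<>();
    private final Random random = new Random(1);

    long newId() {
        return random.nextLong();
    }

    // A tuple joins the tree rooted at rootId (the spout emit itself, or an anchored bolt emit).
    void anchored(long rootId, long tupleId) {
        ackVals.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    // A tuple in the tree was acked; when the XOR returns to 0, the whole tree is done.
    void acked(long rootId, long tupleId) {
        if (!ackVals.containsKey(rootId)) {
            return; // tree already completed or failed
        }
        long val = ackVals.merge(rootId, tupleId, (a, b) -> a ^ b);
        if (val == 0L) {
            ackVals.remove(rootId);
            completed.add(rootId); // here Storm would call the spout's ack(msgId)
        }
    }

    // Any failure in the tree fails the root, so the spout can replay the original tuple.
    void failed(long rootId) {
        if (ackVals.remove(rootId) != null) {
            replays.add(rootId); // here Storm would call the spout's fail(msgId)
        }
    }

    List<Long> completed() { return completed; }

    List<Long> replays() { return replays; }

    public static void main(String[] args) {
        XorAcker acker = new XorAcker();
        long root = acker.newId();
        acker.anchored(root, root);      // spout emits a sentence tuple
        long w1 = acker.newId(), w2 = acker.newId();
        acker.anchored(root, w1);        // split bolt emits two word tuples anchored to it
        acker.anchored(root, w2);
        acker.acked(root, root);         // split bolt acks the sentence
        acker.acked(root, w1);           // count bolt acks the first word
        System.out.println("done after first word? " + acker.completed().contains(root));  // prints false
        acker.acked(root, w2);           // count bolt acks the second word
        System.out.println("done after second word? " + acker.completed().contains(root)); // prints true
    }
}
```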
Page 46
More Stuff
• Topology description/deployment options
– Flux
– Storm SQL
• Polyglot development
• Micro-batching with Trident
• Fault tolerance & deployment isolation
• Integrations
– Messaging: Kafka, Redis, Kestrel, Kinesis, MQTT, JMS
– Databases: HBase, Hive, Druid, Cassandra, MongoDB, JDBC
– Search Engines: Solr, Elasticsearch
– HDFS
– And more!
Page 47
DEMO: Kafka > Storm > HBase Topology in a Local Cluster
Page 48
Kafka > Storm > HBase Example
Requirements:
• Land simulated server logs into Kafka
• Configure a Kafka Spout to consume the server log messages
• Ignore all messages that are not either WARN or ERROR
• Persist WARN and ERROR messages into HBase
– Keep the 10 most recent messages for each server
– Maintain a running total of these concerning messages
• Publish these messages back to Kafka
[Diagram: Kafka → Parse → Filter → HBase and Kafka]
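The parse, filter and persistence requirements can be prototyped in plain Java before the real Kafka and HBase integrations are wired in. In this sketch, the log line format (`server|LEVEL|message`), the class name, and the in-memory maps standing in for HBase tables are all assumptions. It shows three pieces:
• filtering to WARN and ERROR
• a bounded deque that keeps the 10 most recent messages per server
• a running total of the concerning messages

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical prototype of the Parse -> Filter -> persist logic; maps stand in for HBase.
final class LogFilterPipeline {
    static final int KEEP_RECENT = 10;

    private final Map<String, Deque<String>> recentByServer = new HashMap<>();
    private long concerningTotal = 0;
    private final List<String> republished = new ArrayList<>(); // stands in for the outbound Kafka topic

    // Assumed line format: server|LEVEL|message
    void process(String line) {
        String[] parts = line.split("\\|", 3);
        if (parts.length != 3) {
            return; // unparseable line: ignore
        }
        String server = parts[0];
        String level = parts[1];
        String message = parts[2];
        if (!level.equals("WARN") && !level.equals("ERROR")) {
            return; // filter: only WARN and ERROR are concerning
        }
        Deque<String> recent = recentByServer.computeIfAbsent(server, s -> new ArrayDeque<>());
        recent.addFirst(level + " " + message);
        if (recent.size() > KEEP_RECENT) {
            recent.removeLast(); // keep only the 10 most recent per server
        }
        concerningTotal++;
        republished.add(line);
    }

    // Most recent first.
    List<String> recent(String server) {
        return new ArrayList<>(recentByServer.getOrDefault(server, new ArrayDeque<>()));
    }

    long concerningTotal() { return concerningTotal; }

    List<String> republished() { return republished; }

    public static void main(String[] args) {
        LogFilterPipeline p = new LogFilterPipeline();
        p.process("web01|INFO|started");
        p.process("web01|WARN|slow response");
        p.process("db01|ERROR|disk full");
        System.out.println(p.recent("web01") + " total=" + p.concerningTotal());
        // prints [WARN slow response] total=2
    }
}
```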
Page 49
Questions?
Lester Martin – Hadoop/Spark/Storm Trainer & Consultant
http://lester.website (links to blog, twitter, github, LI, FB, etc)
THANKS FOR YOUR TIME!!