Apache Storm and Twitter Streaming API integration
Uploaded by udayaprasad-v
Welcome
Integration of Storm and Twitter Streaming API
Agenda
• What is Storm?
• Storm Benefits
• How Storm differentiates from Hadoop
• Storm vs. Flume
• Storm Example using Twitter Streaming API
• Quiz
What is Storm?
• Storm is a fault-tolerant, distributed, real-time computation system.
• It is non-persistent: Storm itself does not store the data it processes.
• On a Storm cluster we execute topologies, which process streams of tuples (data).
• Each topology is a graph consisting of spouts (which produce tuples) and bolts (which transform tuples).
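The spout-and-bolt graph described above can be modelled in a few lines of plain Java. This is only a conceptual sketch: it does not use the Storm API, and the names `MiniTopology`, `Spout`, and `Bolt` are hypothetical stand-ins chosen for illustration. It also simplifies the graph to a linear chain and processes a small finite batch, whereas a real Storm topology runs on an unbounded stream and in parallel across a cluster.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Conceptual model only: these types are hypothetical and are NOT the Storm API.
class MiniTopology {
    // A spout produces tuples; in this sketch a tuple is simply a String.
    interface Spout {
        Iterator<String> tuples();
    }

    // A bolt transforms one input tuple into zero or more output tuples.
    interface Bolt extends Function<String, List<String>> {
    }

    private final Spout spout;
    private final List<Bolt> bolts = new ArrayList<>();

    MiniTopology(Spout spout) {
        this.spout = spout;
    }

    // Appends a bolt to the end of the chain.
    MiniTopology then(Bolt bolt) {
        bolts.add(bolt);
        return this;
    }

    // Pushes every tuple from the spout through the bolts, in order.
    List<String> run() {
        List<String> current = new ArrayList<>();
        spout.tuples().forEachRemaining(current::add);
        for (Bolt bolt : bolts) {
            List<String> next = new ArrayList<>();
            for (String tuple : current) {
                next.addAll(bolt.apply(tuple));
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Spout spout = () -> List.of("storm is fast", "storm scales").iterator();
        Bolt splitter = line -> List.of(line.split(" "));
        Bolt upper = word -> List.of(word.toUpperCase());
        // Prints: [STORM, IS, FAST, STORM, SCALES]
        System.out.println(new MiniTopology(spout).then(splitter).then(upper).run());
    }
}
```

The point of the sketch is the division of labour: the spout is the only source of tuples, and each bolt sees only the output of the stage before it.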
Storm Benefits
• Once a Storm topology is submitted, and provided the computation logic written in the bolts is correct, it just works.
Storm vs. Hadoop
Storm | Hadoop
Distributed and fault tolerant | Distributed and fault tolerant
Real-time computation system | Batch processing system
Non-persistent | Persistent, uses HDFS for file storage
Storm vs. Flume
Storm | Flume
Real-time streaming system | Real-time streaming system
Real-time computation system | Not a real-time computation system
Does not use any message broker for real-time processing of data | Uses a Channel as a message broker between Source and Sink
Storm Example using Twitter Streaming API
Topology scenario: this project uses one spout (TwitterSampleSpout) and three bolts (WordSplitterBolt, IgnoreWordsBolt, WordCounterBolt).
The spout (TwitterSampleSpout) downloads tweets from Twitter and emits them to WordSplitterBolt.
WordSplitterBolt splits the tweet text into words, using a space as the delimiter, and sends those words to IgnoreWordsBolt.
IgnoreWordsBolt acts as a filter: it drops determiners (a, an, the, etc.) and sends the remaining words to WordCounterBolt, where the actual counting happens. The console shows the most frequently counted words, much like Twitter trends.
The process runs continuously, aggregating the incoming words and updating their counts.
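The split, filter, and count steps of this scenario can be sketched in plain Java, leaving Storm and the Twitter connection aside. This is not the project's code and does not use the Storm API: the class name `TweetWordCount`, its methods, the lower-casing of words, and the alphabetical tie-break are all assumptions made for illustration. The ignore list contains only the three determiners the slides mention.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java sketch of the three bolt steps; hypothetical, not the Storm API.
class TweetWordCount {
    // Assumed ignore list: the slides only name "a, an, the".
    static final Set<String> IGNORED = Set.of("a", "an", "the");

    // Running word counts, as WordCounterBolt would keep them.
    private final Map<String, Integer> counts = new HashMap<>();

    // WordSplitterBolt step: split the tweet text on spaces.
    // Lower-casing is an assumption so that "The" and "the" match.
    static List<String> split(String tweet) {
        List<String> words = new ArrayList<>();
        for (String word : tweet.split(" ")) {
            if (!word.isEmpty()) {
                words.add(word.toLowerCase());
            }
        }
        return words;
    }

    // IgnoreWordsBolt step: keep only words that are not on the ignore list.
    static boolean keep(String word) {
        return !IGNORED.contains(word);
    }

    // Runs one tweet through split, filter, and count.
    void process(String tweet) {
        for (String word : split(tweet)) {
            if (keep(word)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    // Returns the n most frequent words; ties are broken alphabetically.
    List<Map.Entry<String, Integer>> top(int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        TweetWordCount counter = new TweetWordCount();
        counter.process("the storm is a storm");
        counter.process("a bolt in the storm");
        // Prints: [storm=3, bolt=1]
        System.out.println(counter.top(2));
    }
}
```

In the topology itself each of these steps lives in its own bolt, so the counter never sees a determiner and the splitter never needs to know what is being counted.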
TwitterSampleSpout
WordSplitterBolt
IgnoreWordsBolt
WordCounterBolt
Topology
Thanks to all