Storm overview & integration
Uploaded by vanja-radovanovic
Category: Technology
![Page 1: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/1.jpg)
STORM
Buckle up Dorothy !!!
![Page 2: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/2.jpg)
ABOUT
Distributed real-time computation
By Nathan Marz
BackType => Twitter => Apache
![Page 3: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/3.jpg)
WHAT IS IT GOOD FOR?
Real-time analytics
Online machine learning
Continuous computation
Distributed RPC
ETL (Extract, Transform, Load)
…
![Page 4: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/4.jpg)
PROMISES
No data loss
Fault-tolerant
Scalable
Robust
![Page 5: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/5.jpg)
VIEW FROM ABOVE
[Diagram: Source (Kafka, *MQ, …) -> Stream -> Topology (Storm Cluster) -> Storage. The cluster pulls from the source and reads from / writes to storage.]
![Page 6: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/6.jpg)
PRIMITIVES
Tuple
Field 1 / Value 1
Field 2 / Value 2
Field 3 / Value 3
Field 4 / Value 4
Field 5 / Value 5
Stream
Tuple Tuple Tuple Tuple
![Page 7: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/7.jpg)
PRIMITIVES
[Diagram: a Topology in which Spouts emit tuples (T) to Bolts, and Bolts emit tuples (T) to further Bolts.]
![Page 8: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/8.jpg)
ABSTRACTION
PRIMITIVES
Tuples
Spouts
Bolts
Filters
Transformation
Functions
Joins
Chaining streams
EFFECTS
Incremental
Distributed
Scalable
Small components
![Page 9: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/9.jpg)
CLUSTER
[Diagram: Nimbus, a Zookeeper Cluster, and three Worker Nodes, each with one Supervisor and three Executors.]
![Page 10: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/10.jpg)
CLUSTER
NIMBUS / NODES
Small
No state
Robust
Kill / Restart easy
ZOOKEEPER
Communication
State
![Page 11: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/11.jpg)
AS PROMISED?
No data loss
Fault-tolerant
Scalable
Robust
![Page 12: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/12.jpg)
GUARANTEES
Message transforms into a tuple tree
Storm tracks tuple tree
Fully processed when tree exhausted
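One way to picture "Storm tracks tuple tree" is the XOR bookkeeping that Storm's documentation describes for acker tasks: every tuple id is XORed into a per-tree value once when the tuple is created and once when it is acked, so the value returns to zero exactly when the tree is exhausted. The sketch below is a plain-Java illustration of that idea only. The class and method names (`TupleTreeTracker`, `rootEmitted`, `anchored`, `acked`) are hypothetical and are not Storm's API, and the small fixed ids stand in for the random 64-bit ids a real system would use.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model of XOR-based tuple-tree tracking; not Storm's own classes. */
final class TupleTreeTracker {
    // One running XOR value per spout tuple (the root of a tuple tree).
    private final Map<Long, Long> ackVals = new HashMap<>();

    /** The spout emitted a root tuple: start tracking its tree. */
    void rootEmitted(long rootId, long tupleId) {
        ackVals.put(rootId, tupleId);
    }

    /** A bolt emitted a new tuple anchored to the tree: XOR its id in. */
    void anchored(long rootId, long newTupleId) {
        ackVals.put(rootId, ackVals.getOrDefault(rootId, 0L) ^ newTupleId);
    }

    /**
     * A bolt acked a tuple: XOR its id in again, which cancels it out.
     * Returns true once every id has been seen twice, i.e. the tree is exhausted.
     */
    boolean acked(long rootId, long tupleId) {
        long value = ackVals.getOrDefault(rootId, 0L) ^ tupleId;
        if (value == 0L) {
            ackVals.remove(rootId);
            return true;
        }
        ackVals.put(rootId, value);
        return false;
    }

    public static void main(String[] args) {
        TupleTreeTracker tracker = new TupleTreeTracker();
        tracker.rootEmitted(1L, 10L);   // spout emits tuple 10
        tracker.anchored(1L, 11L);      // a bolt emits tuples 11 and 12, anchored to 10
        tracker.anchored(1L, 12L);
        System.out.println(tracker.acked(1L, 10L)); // false: 11 and 12 still pending
        System.out.println(tracker.acked(1L, 11L)); // false: 12 still pending
        System.out.println(tracker.acked(1L, 12L)); // true: tree fully processed
    }
}
```

The appeal of this scheme is that the tracker needs constant memory per tree, no matter how many tuples the tree contains.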
![Page 13: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/13.jpg)
FAILURES
Task died – failed tuples replayed
Acker task died – related tuples time out and are replayed
Spout task died – source replays, e.g. pending messages are placed back on the queue
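The replay behaviour listed above can be sketched from the source's point of view: a message stays "pending" until it is acked, and a failed (or timed-out) message goes back on the queue. This is a minimal plain-Java model under stated assumptions, not Storm or Kafka code. `ReplayingSource` and all of its methods are hypothetical names, and timeouts are left out (a caller would simply invoke `fail` when one expires).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Illustrative source-side bookkeeping for replays; names are hypothetical. */
final class ReplayingSource {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Map<Long, String> pending = new HashMap<>();
    private long nextId = 1;

    /** Adds a new message to the end of the queue. */
    void offer(String message) {
        queue.addLast(message);
    }

    /** Emits the next message and returns its id, or -1 if the queue is empty. */
    long emitNext() {
        String message = queue.pollFirst();
        if (message == null) {
            return -1L;
        }
        long id = nextId++;
        pending.put(id, message);   // remember it until it is acked or failed
        return id;
    }

    /** The tuple tree completed: the message can be forgotten. */
    void ack(long id) {
        pending.remove(id);
    }

    /** The tuple tree failed or timed out: place the message back on the queue. */
    void fail(long id) {
        String message = pending.remove(id);
        if (message != null) {
            queue.addFirst(message);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    int queuedCount() {
        return queue.size();
    }

    public static void main(String[] args) {
        ReplayingSource source = new ReplayingSource();
        source.offer("a");
        source.offer("b");
        long first = source.emitNext();
        long second = source.emitNext();
        source.ack(first);    // "a" is done
        source.fail(second);  // "b" goes back on the queue
        System.out.println(source.queuedCount());  // 1
        System.out.println(source.pendingCount()); // 0
    }
}
```

A replayed message gets a fresh id here, so a late ack for the old id is simply ignored.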
![Page 14: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/14.jpg)
WHAT DO I HAVE TO DO?
Inform about new links in tree
Inform when finished with a tuple
Every tuple must be acked or failed
![Page 15: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/15.jpg)
ANYTHING SIMPLER?
TRIDENT
High-level abstraction
Stateful persistence primitives
Exactly-once semantics
![Page 16: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/16.jpg)
AS PROMISED?
YES
![Page 17: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/17.jpg)
USER DASHBOARD
PROBLEM
Bad performance
Uses core storage
IDEA
Pre-compute
Customize
Fast
Isolate
Quarterly agg.
![Page 18: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/18.jpg)
ARCHITECTURE
[Diagram: Core pushes events to the Events Queue (Kafka, 4 partitions, 2 replicas); Storm (4 workers) pulls from Kafka and writes to MS SQL (4 staging); the Dashboard reads from MS SQL.]
State in source
![Page 19: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/19.jpg)
KAFKA
[Diagram: a Topic shown as a stack of messages numbered 9 (New) down to 1 (Old), with a Client reading from it.]
Stacked
Flushed
Client offset
Replicated
Partitioned
Fast
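The "Client offset" point can be illustrated with a toy append-only log in which the reader, not the log, remembers how far it has read. This is a simplified plain-Java sketch and not Kafka's client API. `PartitionLog`, `append` and `fetch` are hypothetical names, and real partitions add persistence, replication and retention on top of this idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of one partition: an append-only log read by client-held offsets. */
final class PartitionLog {
    private final List<String> messages = new ArrayList<>();

    /** Appends a message to the end of the log and returns its offset. */
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    /** Returns up to maxCount messages starting at the offset the client asks for. */
    List<String> fetch(long offset, int maxCount) {
        if (offset < 0 || offset >= messages.size() || maxCount <= 0) {
            return Collections.emptyList();
        }
        int from = (int) offset;
        int to = Math.min(messages.size(), from + maxCount);
        return new ArrayList<>(messages.subList(from, to));
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("a");
        log.append("b");
        log.append("c");

        long clientOffset = 0;                       // state lives in the client
        List<String> batch = log.fetch(clientOffset, 2);
        clientOffset += batch.size();
        System.out.println(batch);                   // [a, b]
        System.out.println(log.fetch(clientOffset, 2)); // [c]
        System.out.println(log.fetch(0, 1));         // [a]  (re-reading is just a smaller offset)
    }
}
```

Because the log never tracks readers, replaying after a failure only requires the client to rewind its own offset.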
![Page 20: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/20.jpg)
TRANSFORMATION
ORIGINAL
{ id: df45er87c78df, sender: "Info", destination: "39345123456", parts: 2, price: 100, client: "Demo", time: "2014-06-02 14:47:58", country: "IT", network: "Wind", type: "SMS", …}
COMPUTED
{ client: "Demo", type: "SMS", country: "IT", network: "Wind", bucket: "2014-06-02 14:45:00", traffic: 2, expenses: 200}
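The ORIGINAL to COMPUTED step can be sketched in plain Java. The bucket is the event time rounded down to the quarter hour, which matches the sample (14:47:58 becomes 14:45:00). The formulas for the other two fields are an assumption: traffic = parts and expenses = parts × price fit the sample values (2 and 200), but the slides do not state them. `QuarterBucketSketch` and its methods are hypothetical names, not the deck's `QuarterTimeBucket` class.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

/** Illustrative quarter-hour bucketing; not the deck's QuarterTimeBucket implementation. */
final class QuarterBucketSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    /** Rounds a timestamp down to the start of its 15-minute bucket. */
    static String quarterBucket(String time) {
        LocalDateTime parsed = LocalDateTime.parse(time, FORMAT);
        int bucketMinute = (parsed.getMinute() / 15) * 15;
        return parsed.truncatedTo(ChronoUnit.HOURS)
                .plusMinutes(bucketMinute)
                .format(FORMAT);
    }

    /** Assumed formula: expenses = parts * price (consistent with the sample only). */
    static long expenses(long parts, long price) {
        return parts * price;
    }

    public static void main(String[] args) {
        System.out.println(quarterBucket("2014-06-02 14:47:58")); // 2014-06-02 14:45:00
        System.out.println(expenses(2, 100));                      // 200
    }
}
```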
![Page 21: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/21.jpg)
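CODE
// Trident topology: parse Kafka messages, bucket them by quarter hour, aggregate per group.
TridentState tridentState = topology
    .newStream("CoreEvents", buildKafkaSpout())
    .parallelismHint(4)
    // Decode the raw Kafka payload into named fields.
    .each(new Fields("bytes"), new CoreEventMessageParser(),
          new Fields("time", "client", "network", "country", "type", "parts", "price"))
    // Round each event time down to its quarter-hour bucket.
    .each(new Fields("time"), new QuarterTimeBucket(), new Fields("bucket"))
    // Note: no step shown here emits "traffic" or "expenses"; presumably they are
    // derived from "parts" and "price" in a step the slide omits.
    .project(new Fields("bucket", "client", "network", "country", "type", "traffic", "expenses"))
    .groupBy(new Fields("bucket", "client", "network", "country", "type"))
    .persistentAggregate(getStateFactory(), new Fields("traffic", "expenses"),
                         new Sum(), new Fields("trafficExpenses"))
    .parallelismHint(8);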
![Page 22: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/22.jpg)
PERFORMANCE
              REGULAR    PEAK
KAFKA         1.500      60.000
(unlabeled)   4.500      160.000
STORAGE       2.000      10.000
DASHBOARD     1          1
![Page 23: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/23.jpg)
TUNING STORAGE
1st Issue - Storage
Random access – 1.500 w/s limit
Staged approach – 30.000 w/s limit
No locks – isolated
Scalable – each worker has its own stage
Main table indexing nicely
Doesn’t affect reading
![Page 24: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/24.jpg)
STAGED WRITES
[Diagram: Worker 1 writes to Stage Table 1 and Worker 2 writes to Stage Table 2; each stage table is merged into the Main Table.]
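The staged-write idea can be sketched with in-memory maps standing in for database tables: each worker writes only to its own stage, and a separate merge step folds a stage into the main table, so writers never contend with each other or with readers. This is an illustration under assumptions, not the deck's SQL. `StagedWrites` and its methods are hypothetical, and summing values on merge is assumed because the topology aggregates traffic and expenses.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative staged writes: one private stage per worker, merged into a main table. */
final class StagedWrites {
    private final Map<String, Long> mainTable = new HashMap<>();
    private final Map<Integer, Map<String, Long>> stageTables = new HashMap<>();

    /** A worker adds an amount under a key in its own stage table only. */
    void write(int worker, String key, long amount) {
        stageTables.computeIfAbsent(worker, w -> new HashMap<>())
                .merge(key, amount, Long::sum);
    }

    /** Folds one worker's stage into the main table and empties that stage. */
    void merge(int worker) {
        Map<String, Long> stage = stageTables.remove(worker);
        if (stage == null) {
            return;
        }
        for (Map.Entry<String, Long> entry : stage.entrySet()) {
            mainTable.merge(entry.getKey(), entry.getValue(), Long::sum);
        }
    }

    /** Readers see only merged data in the main table. */
    long read(String key) {
        return mainTable.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        StagedWrites tables = new StagedWrites();
        tables.write(1, "2014-06-02 14:45:00|Demo", 2);
        tables.write(2, "2014-06-02 14:45:00|Demo", 3);
        System.out.println(tables.read("2014-06-02 14:45:00|Demo")); // 0 (nothing merged yet)
        tables.merge(1);
        tables.merge(2);
        System.out.println(tables.read("2014-06-02 14:45:00|Demo")); // 5
    }
}
```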
![Page 25: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/25.jpg)
TUNING TOPOLOGY
2nd Issue - Serialization
[Chart: series Raw/s, Expanded/s and Writes/s; y-axis 0 to 160,000; x-axis 200 KB, 1 MB, 4 MB, 8 MB, 16 MB, 24 MB; annotation: Plateauing.]
![Page 26: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/26.jpg)
SERIALIZATION
[Chart: S [s], S [byte], S [% CPU], D [s] and D [% CPU] on a 0 to 1,200 scale, compared for CSV (Plain), CSV (Deflate), CSV (GZip), Jackson (Plain), Jackson (GZip), Jackson Smile, Java Object and Kryo.]
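A comparison like this can be approximated with a small harness that serializes the same payload in several ways and records size and time. The sketch below covers only two of the listed formats, plain CSV and GZip-compressed CSV, using the JDK alone. The sample row, the row count and the class name `SerializationSizeSketch` are made up for illustration, and the timing is a single unwarmed run, so it says nothing about the deck's actual measurements.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustrative size/time comparison of plain vs GZip-compressed CSV. */
final class SerializationSizeSketch {
    /** Serializes rows as newline-separated CSV lines in UTF-8. */
    static byte[] csvPlain(String[] rows) {
        return String.join("\n", rows).getBytes(StandardCharsets.UTF_8);
    }

    /** Compresses a byte array with GZip. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Decompresses a GZip byte array. */
    static byte[] gunzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gz.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String[] rows = new String[1000];
        Arrays.fill(rows, "Demo,SMS,IT,Wind,2014-06-02 14:45:00,2,200");
        byte[] plain = csvPlain(rows);
        long start = System.nanoTime();
        byte[] packed = gzip(plain);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println("plain bytes: " + plain.length);
        System.out.println("gzip bytes:  " + packed.length + " (" + elapsedMicros + " us)");
        System.out.println("round trip ok: " + Arrays.equals(plain, gunzip(packed)));
    }
}
```

Smaller payloads cost CPU to produce, which is why the chart tracks bytes, seconds and CPU side by side.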
![Page 27: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/27.jpg)
MEASURE
AXIS
Max spout pending
SQL workers
DB batch size
Kafka fetch size
Serialization
…
METRICS
Kafka fetch speed
DB write speed
Kafka / DB ratio
Capacity
Latency
![Page 28: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/28.jpg)
MONITOR
STORM UI TOPOLOGY
![Page 29: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/29.jpg)
METRICS
GRAPHITE
![Page 30: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/30.jpg)
GOTCHAS
Version 0.9.1
Partially in flux
Kafka integration
Message & topology versioning
Performance tuning
![Page 31: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/31.jpg)
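NEXT?
Lambda Architecture
[Diagram: New Data flows into the Batch Layer (Master Dataset) and the Speed Layer (Real-time Views); the Serving Layer holds Batch Views; Queries read from both the Batch Views and the Real-time Views.]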
![Page 32: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/32.jpg)
RESOURCES
http://storm.incubator.apache.org
http://lambda-architecture.net
http://kafka.apache.org
![Page 33: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/33.jpg)
PRESENTATION TOOLS
http://www.gimp.org
http://www.pictaculous.com
http://www.colourlovers.com
http://www.easycalculation.com
http://paletton.com
![Page 34: Storm overview & integration](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6dda74a7959261a8b45be/html5/thumbnails/34.jpg)
QUESTIONS?