Post on 17-Jan-2016
1
Gearpump: Real-time DAG Processing at Scale
Strata Singapore 2015
Sean Zhong, Xiang.zhong@intel.com, Intel Software
2
What is Gearpump
• Akka-based, lightweight, real-time data processing platform.
• Apache License, http://gearpump.io, version 0.7
• Akka:
  • Communication, concurrency, isolation, and fault tolerance
Simple and powerful
Message-level streaming
Long-running daemons
What is Akka?
3
What is Akka?
• Micro-service (Actor) oriented
• Message driven
• Lock-free
• Location-transparent
It is like human society: driven by messages, and able to scale to a population of 7 billion!
4
Micro-service oriented: a higher abstraction
• Break your application into micro-services instead of objects.
• Throw away locks.
• Use immutable, asynchronous messages to exchange information between micro-services instead of shared objects.
5
Gearpump in Big Data Stack
[Diagram: big data stack in four layers. Storage: store. Engine: batch and stream engines, including Storm and Gearpump (marked "Here!"). Analytics: SQL (Catalyst, StreamSQL, Impala), machine learning, GraphX, data exploration. Visualization & management: visualization, cluster manager, monitor/alert/notify, Cloudera Manager.]
6
Why another streaming platform?
• The requirements are not fully met.
• A higher abstraction, such as a micro-service orientation, can greatly simplify the problem.
7
What we want
• Meet "The 8 Requirements of Real-Time Stream Processing" (2006); circled numbers refer to those requirements.
Flexible: anywhere, any size, any source, any use case, dynamic DAG, ② StreamSQL
Volume: high throughput, ⑦ scale linearly
Speed: ① in-stream, zero latency, ⑥ HA, ⑧ responsive
Accuracy: exactly-once, ③ message loss/delay/out of order, ④ predictable
Visual: easy to debug, WYSIWYG
8
Overview
9
DAG representation and API
Low level Graph API syntax:
Graph(A~>B~>C~>D, B~>E~>D)
[Diagram: a DAG of five Processors, A → B → C → D and B → E → D. Edges are labeled with partitioners such as shuffle and field grouping.]
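The `Graph(A~>B~>C~>D, B~>E~>D)` syntax builds one DAG out of several paths. As a rough, self-contained sketch of how a `~>` operator can collect edges, consider the following. `Node`, `Path`, `MiniGraph`, and `graph` are hypothetical names used only for illustration; they are not Gearpump's `Processor` or `Graph` classes.

```scala
object DagSketch {
  // Hypothetical stand-ins for illustration; not Gearpump's own classes.
  final case class Node(name: String) {
    // Node ~> Node starts a path of two nodes.
    def ~>(next: Node): Path = Path(List(this, next))
  }

  // A path is an ordered chain of nodes, e.g. A ~> B ~> C.
  final case class Path(nodes: List[Node]) {
    // Path ~> Node appends one more hop.
    def ~>(next: Node): Path = Path(nodes :+ next)
    // Consecutive pairs of the chain are the edges of the path.
    def edges: List[(Node, Node)] = nodes.zip(nodes.drop(1))
  }

  final case class MiniGraph(edges: Set[(Node, Node)]) {
    def downstream(n: Node): Set[Node] = edges.filter(_._1 == n).map(_._2)
  }

  // A graph is the union of the edges of all its paths.
  def graph(paths: Path*): MiniGraph = MiniGraph(paths.flatMap(_.edges).toSet)

  val a = Node("A")
  val b = Node("B")
  val c = Node("C")
  val d = Node("D")
  val e = Node("E")

  // Same shape as the slide: A~>B~>C~>D plus the branch B~>E~>D.
  val dag: MiniGraph = graph(a ~> b ~> c ~> d, b ~> e ~> d)
}
```

Because edges are kept in a set, a hop that appears in more than one path is stored only once.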
10
Architecture - Actor Hierarchy
• Client: hooks in and queries state.
• Master cluster: HA design.
• Runs as a general service or on YARN.
• Each app has one isolated AppMaster, and uses an Actor supervision tree for error handling.
11
Architecture - Master HA (no SPOF)
• Akka Cluster for a centerless HA system
• Conflict-free replicated data types (CRDT) for consistency
• Decentralized: does not rely on a single central meta server
[Diagram: an Akka Cluster with a leader Master and two standby Masters exchanging state by gossip, plus three Workers.]
CRDT data type example: [figure]
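One widely used CRDT is the grow-only counter (G-Counter). The sketch below is a generic illustration of why such a type converges under gossip without a central server; it is an assumption chosen for illustration, not the data type that Gearpump's Master actually replicates, and the node names are made up.

```scala
object GCounterSketch {
  // Grow-only counter: one monotonically increasing slot per node.
  final case class GCounter(slots: Map[String, Long] = Map.empty) {
    // A node only ever increments its own slot.
    def increment(node: String, by: Long = 1L): GCounter =
      GCounter(slots.updated(node, slots.getOrElse(node, 0L) + by))

    // The counter value is the sum over all slots.
    def value: Long = slots.values.sum

    // Merge takes the per-node maximum. This is commutative, associative,
    // and idempotent, so replicas converge in any gossip order.
    def merge(other: GCounter): GCounter =
      GCounter((slots.keySet ++ other.slots.keySet).map { node =>
        node -> math.max(slots.getOrElse(node, 0L), other.slots.getOrElse(node, 0L))
      }.toMap)
  }
}
```

Two masters can update their replicas independently and exchange them in either order; both end up with the same merged state.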
12
Feature Highlights
• Akka/Akka-stream/Storm compatible
• Throughput: 14 million/s (*)
• Latency: 2 ms (*)
• Exactly-once
• Dynamic DAG
• Out-of-order messages
• Flexible DSL
• DAG visualization
• Internet of Things
[Diagram labels: function, usability]
[*] Test environment: Intel® Xeon™ Processor E5-2690, 4 nodes; each node has 32 CPU cores, 64 GB memory, and a 10GbE network. We use the default configuration of the Gearpump 0.7 release; see the backup page for more details. We use the SOL workload with a message size of 100 bytes (https://github.com/intel-hadoop/storm-benchmark). Tests carried out by the Intel team.
13
Using Gearpump
14
Three steps to use it
1. Download the binary from http://gearpump.io
2. Submit a jar through the UI
3. Monitor status
15
Application Submission Flow
1. Submit a jar
2. Create AppMaster
3. Request resource (ask YARN for resource when running on YARN)
4. Report Executor to AppMaster
Works on YARN or without YARN.
[Diagram: Master, Workers, AppMaster, and Executors shown at each of the four steps.]
16
Low level Graph API - WordCount
val context = new ClientContext()
val split = Processor[Split](splitParallelism)
val sum = Processor[Sum](sumParallelism)
val app = StreamApplication("wordCount", Graph(split ~> sum), UserConfig.empty)
val appId = context.submit(app)
context.close()

class Split(taskContext: TaskContext, conf: UserConfig) extends Task(taskContext, conf) {
  override def onNext(msg: Message): Unit = { /* split the line */ }
}

class Sum(taskContext: TaskContext, conf: UserConfig) extends Task(taskContext, conf) {
  val count = /** count of words **/
  override def onNext(msg: Message): Unit = { /* do aggregation on word */ }
}
Scala Java
17
High Level DSL API - WordCount
val context = ClientContext()
val app = new StreamApp("dsl", context)
val data = "This is a good start, bingo!! bingo!!"
app.fromCollection(data.lines)
  // word => (word, count = 1)
  .flatMap(line => line.split("[\\s]+")).map((_, 1))
  // (word, count1), (word, count2) => (word, count1 + count2)
  .groupByKey().sum.log
val appId = context.submit(app)
context.close()
18
Akka-stream API – WordCount
implicit val system = ActorSystem("akka-test")
implicit val materializer = new GearpumpMaterializer(system)
val echo = system.actorOf(Props(new Echo()))
val sink = Sink.actorRef(echo, "COMPLETE")
val source = GearSource.from[String](new CollectionDataSource(lines))
source.mapConcat { line => line.split(" ").toList }
  .groupBy2(x => x)
  .map(word => (word, 1))
  .reduce { (a, b) => (a._1, a._2 + b._2) }
  .log("word-count")
  .runWith(sink)
Available at branch https://github.com/gearpump/gearpump/tree/akkastream
19
UI Portal - DAG Visualization
DAG Page
Tracks the global min-clock of all messages.
DAG:
• Node size reflects throughput
• Edge width represents flow rate
• A red node means something has gone wrong
20
UI Portal – Processor Detail
Processor Page
• Data skew distribution
• Task throughput and latency
21
Performance optimization
22
Throughput and Latency
Throughput: 11 million messages/second
Latency: 17 ms at full load
SOL shuffle test, 32 tasks -> 32 tasks
[*] Test environment: Intel® Xeon™ Processor E5-2680, 4 nodes; each node has 32 CPU cores, 64 GB memory, and a 10GbE network. We use the default configuration of the Gearpump 0.2 release and the SOL workload with a message size of 100 bytes (https://github.com/intel-hadoop/storm-benchmark). Tests carried out by the Intel team.
23
Scalability
• Test run on 100 nodes (*) and 3000 tasks
• Gearpump performance scales:
[Chart: performance scaling up to 100 nodes]
[*] We use 8 machines to simulate 100 worker nodes. Test environment: Intel® Xeon™ Processor E5-2680; each node has 32 CPU cores, 64 GB memory, and a 10GbE network. We use the default configuration of the Gearpump 0.3.5 release and the SOL workload with a message size of 100 bytes (https://github.com/intel-hadoop/storm-benchmark). Tests carried out by the Intel team.
24
Fault-Tolerance: Recovery time
[Chart: recovery time (*) with 91 worker nodes and 1000 tasks]
[*] Recovery time is the interval between (a) a failure happening and (b) all tasks in the topology resuming data processing.
Test environment: Intel® Xeon™ Processor E5-2680; 91 worker nodes and 1000 tasks (we use 7 machines to simulate 91 worker nodes). Each node has 32 CPU cores, 64 GB memory, and a 10GbE network. We use the default configuration of the Gearpump 0.3.5 release and the SOL workload (https://github.com/intel-hadoop/storm-benchmark). Tests by the Intel team.
25
High performance Messaging Layer
• Akka remote messages carry a big overhead (sender + receiver address).
• Reduce that overhead by 95% (400 bytes to ~20 bytes).
[Diagram labels: convert to short address, effective batching, convert from short address, sync with other executors]
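The saving quoted above is (400 - 20) / 400 = 95%. The slide does not give the wire format, so the following is only a sketch of the general idea behind short addresses: every executor shares a table that maps long actor paths to small integer ids, and only the ids travel with each message. `AddressBook`, `encodeHeader`, and `decodeHeader` are hypothetical names, and the 8-byte header covers the two addresses only.

```scala
object ShortAddressSketch {
  // Hypothetical registry, assumed to be synced to every executor, that
  // assigns each long actor path a small integer id.
  final class AddressBook(paths: Seq[String]) {
    private val toShort: Map[String, Int] = paths.zipWithIndex.toMap
    private val toLong: Map[Int, String] = toShort.map(_.swap)

    def shorten(path: String): Int = toShort(path)
    def expand(id: Int): String = toLong(id)
  }

  // Sender side: write two 4-byte ids instead of two full path strings.
  def encodeHeader(book: AddressBook, sender: String, receiver: String): Array[Byte] =
    java.nio.ByteBuffer.allocate(8)
      .putInt(book.shorten(sender))
      .putInt(book.shorten(receiver))
      .array()

  // Receiver side: read the ids back and expand them to full paths.
  def decodeHeader(book: AddressBook, header: Array[Byte]): (String, String) = {
    val buf = java.nio.ByteBuffer.wrap(header)
    val sender = book.expand(buf.getInt())
    val receiver = book.expand(buf.getInt())
    (sender, receiver)
  }
}
```

The table only has to be exchanged when tasks are created or moved, so its cost is paid once rather than per message.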
26
Effective batching
• Network idle: flush as fast as we can.
• Network busy: smart batching until the network is open again.
This feature is ported from Storm-297.
[Chart: network bandwidth doubled for 100-byte messages. Test environment: same as Storm-297.]
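The two batching rules can be sketched as a buffer that is flushed immediately while the network is writable and accumulates otherwise. This is a minimal illustration under that assumption; `Batcher` and its `networkWritable` flag are hypothetical and do not mirror Gearpump's or Storm's transport code.

```scala
object BatchingSketch {
  // `flush` stands in for the actual network write of one batch.
  final class Batcher[A](flush: Seq[A] => Unit) {
    private val buffer = scala.collection.mutable.ArrayBuffer.empty[A]

    // Network idle: the message goes out at once.
    // Network busy: the message waits in the buffer and the batch grows.
    def offer(msg: A, networkWritable: Boolean): Unit = {
      buffer += msg
      if (networkWritable) flushNow()
    }

    // Called when the network becomes writable again.
    def onNetworkWritable(): Unit = flushNow()

    private def flushNow(): Unit =
      if (buffer.nonEmpty) {
        flush(buffer.toList)
        buffer.clear()
      }
  }
}
```

Batch size therefore adapts to load on its own: small batches at low load keep latency down, larger batches under pressure amortize the per-write cost.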
27
High performance flow control
• Pass back-pressure level by level
• About 1% throughput impact
1. No central ack nodes
2. Each level knows the network status, and so can make the best use of the network
Back-pressure: sliding window
[Diagram: four levels of three Tasks each, with back-pressure between adjacent levels.]
Another option (not used): big-loop-feedback flow control
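Level-by-level back-pressure with a sliding window can be sketched as follows: a sender keeps at most a fixed number of unacknowledged messages in flight to its downstream task, so a slow downstream stalls its upstream, which in turn stalls the level above it. `SlidingWindow` and the window size are assumptions made for illustration; the slide does not show Gearpump's actual implementation.

```scala
object FlowControlSketch {
  // Per-edge window: at most `windowSize` unacknowledged messages in flight.
  final class SlidingWindow(windowSize: Int) {
    private var sent = 0L
    private var acked = 0L

    def canSend: Boolean = sent - acked < windowSize

    // Returns true if the message was admitted, false if the sender must wait.
    def trySend(): Boolean =
      if (canSend) { sent += 1; true } else false

    // Downstream reports how many messages it has processed so far.
    // The value is clamped so that the window never moves backwards.
    def onAck(ackedCount: Long): Unit =
      acked = math.max(acked, math.min(ackedCount, sent))

    def inFlight: Long = sent - acked
  }
}
```

No central ack node is involved: each edge only needs its own counters and the acks from its direct downstream.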
28
Use cases
29
What is a good use case for Gearpump?
• When you want exactly-once message processing with millisecond latency.
• When you want to integrate with Akka and Akka-stream transparently.
• When you want dynamic modification of an online DAG.
• When you want to connect with IoT edge devices, with location transparency.
• When you want a simple scheduler to distribute customized applications, such as log collection or distributed cron.
• When you want to use monoid state, such as Count-Min Sketch.
Besides, it can integrate with YARN, Kafka, Storm, etc.
30
IoT Transparent Cloud
Target problem: large gap between edge devices and the data center.
• Location transparent.
• Unified programming model across the boundary.
Case: intelligent traffic system with 3000 traffic lights: travel time, overspeed detection…
[Diagram labels: DAG on device side, log, data center]
31
Exactly-once: Financial use cases
Target problem: both real-time processing and accuracy are important.
Examples: real-time stock index, program trading.
[Diagram labels: data sources (transaction, account, other), crawlers, rules, process, alerts, actions, reports]
32
Transformers: Dynamic DAG
Target problem: no existing way to manipulate the DAG on the fly.
It can:
• change parallelism online to scale out without message loss
• add/remove source/sink processors dynamically
Each Processor can have its own independent jar.
[Diagram labels: Add, Delete, Replace; processor B]
33
Eve: Online Machine Learning
Target problem: ML training and prediction online, for real-time decision support.
• Decide immediately based on online learning results
[Diagram labels: sensor, input, learn, predict, decide, output]
TAP analytics platform integration: http://trustedanalytics.org/
34
Gearpump Internals
35
General ideas
Normal Flow
• Every message in the system has an application timestamp (the message's birth time): Message(timestamp).
• The MinClock service tracks the minimum timestamp of all pending messages in the system, now and in the future.
[Diagram labels: replayable source, DAG, state, MinClock service]
36
General ideas
Recovery Flow
① Detect message loss at Tp
② Pause the clock at Tp
③ Replay from Tp
④ Exactly-once: state can be reconstructed by message replay
[Diagram labels: replayable source, DAG, Message(timestamp), state, checkpoint store, MinClock service]
37
1. Detect message loss
2. Pause the clock at Tp when a message is lost
3. Replay from clock Tp
4. Exactly-once
38
Detect Failure in time
• AckRequest and Ack are used to detect message loss.
• Easy to troubleshoot: when an error happens, we know when, where, and why.
[Diagram: supervision chain of Master, AppMaster, Executor, and Task, with a Failure arrow at each level.]
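One way an AckRequest/Ack exchange can reveal message loss is by comparing counters: the sender states how many messages it has sent, and the receiver answers with how many it has received. The sketch below assumes in-order delivery on each channel, and the message fields are hypothetical; the slide does not show what Gearpump's AckRequest and Ack actually carry.

```scala
object LossDetectionSketch {
  // Hypothetical message shapes; the field names are assumptions.
  final case class AckRequest(sentSoFar: Long)
  final case class Ack(receivedSoFar: Long)

  final class Receiver {
    private var received = 0L

    def onMessage(): Unit = received += 1

    // Reply with the number of messages seen before this request arrived.
    def onAckRequest(request: AckRequest): Ack = Ack(received)
  }

  // With in-order delivery, every message sent before the AckRequest should
  // have arrived before it, so a smaller count means messages were lost.
  def messageLost(request: AckRequest, ack: Ack): Boolean =
    ack.receivedSoFar < request.sentSoFar
}
```

Because the check names a specific sender, receiver, and point in the stream, a detected loss also tells you where and when it happened.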
39
1. Quarantine
Recover the runtime when a machine has crashed.
① Error detected
② Fence zombie: use a dynamic session ID to fence zombies
[Diagram labels: AppMaster, two Executors with Tasks, source, store, global clock service, send message]
40
DAG Recovery: Quarantine and Recover
2. Recover the executor JVM, and replay messages
① Error detected
② Isolate zombie
③ Recover
[Diagram labels: AppMaster, Executors with Tasks, source replay, store, global clock service, send message]
42
Application’s Clock Service
Definition: the task min-clock is the minimum of
• the minimum timestamp of pending messages in the current task, and
• the task min-clock of all upstream tasks.
Each task reports its task min-clock. The clock is ever-increasing.
[Diagram: level clocks for tasks A to E, from later (1000) through 800 and 600 to earlier (400), with the clock of D marked.]
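The definition of the task min-clock translates directly into a recursive computation over the DAG. The sketch below is a minimal illustration: `TaskInfo` and the task names are hypothetical, and falling back to `now` when nothing is pending anywhere upstream is an assumption, not something the slide states.

```scala
object MinClockSketch {
  // Hypothetical task description: timestamps of the task's pending
  // messages plus the names of its directly upstream tasks.
  final case class TaskInfo(pending: Seq[Long], upstream: Seq[String])

  // Task min-clock = min(pending timestamps in this task,
  //                      min-clock of every upstream task).
  // `now` is returned when there is nothing pending here or upstream.
  // The recursion terminates because the graph is acyclic.
  def minClock(task: String, tasks: Map[String, TaskInfo], now: Long): Long = {
    val info = tasks(task)
    val candidates = info.pending ++ info.upstream.map(minClock(_, tasks, now))
    if (candidates.isEmpty) now else candidates.min
  }
}
```

A task's min-clock can never be later than that of any task upstream of it, which is why the clock of a downstream task is a safe lower bound for everything that can still reach it.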
44
Source-based Message Replay
Normal Flow
Replay from the very beginning: the source.
The source maps an offset (like the offset of a Kafka queue) to a message timestamp.
45
Source-based Message Replay
Recovery Flow
Replay from the very beginning: the source.
① Resolve the offset with timestamp Tp (from the global clock service)
② Replay from that offset
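The two recovery steps can be sketched against an in-memory log. This assumes the source is an append-only log ordered by offset with non-decreasing timestamps; `Record` and `ReplayableSource` are hypothetical names, not Gearpump or Kafka APIs.

```scala
object ReplaySketch {
  final case class Record(offset: Long, timestamp: Long, payload: String)

  // Hypothetical replayable source backed by an in-memory, append-only log.
  final class ReplayableSource(log: Vector[Record]) {
    // Step 1: resolve the first offset whose timestamp is >= tp.
    def resolveOffset(tp: Long): Option[Long] =
      log.find(_.timestamp >= tp).map(_.offset)

    // Step 2: replay every record from that offset onward.
    def replayFrom(tp: Long): Vector[Record] =
      resolveOffset(tp) match {
        case Some(start) => log.filter(_.offset >= start)
        case None        => Vector.empty
      }
  }
}
```

Replaying from Tp is enough because the clock paused at Tp guarantees that every message older than Tp has already been fully processed.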
47
Exactly-once message processing
Key: ensure State(t) only contains messages with timestamp <= t.
How? Append-only checkpoints.
[Diagram labels: DAG runtime, checkpoint store]
48
Exactly-once message processing
Two states
Normal Flow
• State accepting messages with t < Tc
• State accepting all messages
[Diagram labels: streaming system, messages, checkpoint store]
49
Exactly-once message processing
Two states
Recovery Flow
• State accepting messages with t < Tc
• State accepting all messages
Recover the checkpoint on failure.
[Diagram labels: streaming system, messages, checkpoint store]
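The two-state idea can be sketched with a simple sum. While a checkpoint at time Tc is open, one copy of the state accepts only messages older than Tc and the other accepts everything. Persisting the first copy once the min-clock has reached Tc is this sketch's reading of how the clock service fits in, and `TwoStateSum` is a hypothetical name.

```scala
object CheckpointSketch {
  // Two copies of a sum state while a checkpoint at `tc` is open.
  final class TwoStateSum(tc: Long) {
    private var beforeTc = 0L // accepts only messages with timestamp < tc
    private var all = 0L      // accepts every message

    def onMessage(timestamp: Long, value: Long): Unit = {
      if (timestamp < tc) beforeTc += value
      all += value
    }

    // Once the min-clock has reached tc, no message older than tc can
    // still arrive, so `beforeTc` is final and can be persisted as (tc, state).
    def checkpoint(minClock: Long): Option[(Long, Long)] =
      if (minClock >= tc) Some((tc, beforeTc)) else None

    def current: Long = all
  }
}
```

Out-of-order arrival does not matter: a late message with a timestamp below Tc still lands in the checkpointed copy, while newer messages never leak into it.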
50
How to do Dynamic DAG?
Multiple-Version DAG
Target problem: replace processor B with B’ at time Tc.
• DAG (version = 0): A → B → C, for messages with Message.time < Tc
• DAG (version = 1): A → B’ → C, for messages with Message.time >= Tc
No message loss during the transition.
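The transition rule amounts to routing each message by its timestamp while both DAG versions are alive. A minimal sketch, with `route` as a hypothetical helper rather than a Gearpump API:

```scala
object DynamicDagSketch {
  // Pick the processor version for a message during a transition at time tc:
  // older messages finish on the old version, newer ones go to the new one.
  def route[P](messageTime: Long, tc: Long, oldVersion: P, newVersion: P): P =
    if (messageTime < tc) oldVersion else newVersion
}
```

Since every message has exactly one destination, nothing is dropped or duplicated while B is being replaced.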
51
Demo
52
Live demo
53
References
• 钟翔 (Zhong Xiang), "Software architecture paradigm for the big data era: Reactive architecture and Akka in practice", Programmer magazine (程序员期刊), 2015, issue 2A
• Gearpump whitepaper: http://typesafe.com/blog/gearpump-real-time-streaming-engine-using-akka
• 吴甘沙 (Wu Gansha), "The comeback of low-latency stream processing systems", Programmer magazine (程序员期刊), 2013, issue 10
• Stonebraker: http://cs.brown.edu/~ugur/8rulesSigRec.pdf
• Gearpump: https://github.com/intel-hadoop/gearpump
• http://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/
• https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
• SQLstream: http://www.sqlstream.com/customers/
• http://www.statsblogs.com/2014/05/19/a-general-introduction-to-stream-processing/
• http://www.statalgo.com/2014/05/28/stream-processing-with-messaging-systems/
• Gartner report on IoT: http://www.zdnet.com/article/internet-of-things-devices-will-dwarf-number-of-pcs-tablets-and-smartphones/
54
Legal Disclaimers
1. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
2. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com].
3. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
§ For more information go to http://www.intel.com/performance.
Intel, the Intel logo, Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2015 Intel Corporation
55
Latest Performance evaluation on Gearpump 0.7
This is the latest performance test on version 0.7. Please see the embedded document on the right to see the configuration details.