1 Gearpump Real time DAG-Processing at Scale Strata Singapore 2015 Sean Zhong Xiang.zhong@intel.com,...

Post on 17-Jan-2016

233 views 0 download

Tags:

Transcript of 1 Gearpump Real time DAG-Processing at Scale Strata Singapore 2015 Sean Zhong Xiang.zhong@intel.com,...

1

GearpumpReal time DAG-Processing at Scale

Strata Singapore 2015

Sean ZhongXiang.zhong@intel.com, Intel Software

2

What is Gearpump• Akka based lightweight Real time data processing platform.• Apache License http://gearpump.io version 0.7

• Akka: • Communication, concurrency, Isolation, and fault-tolerant

Simple and PowerfulMessage level streamingLong running daemons

What is Akka?

3

What is Akka?• Micro-service(Actor) oriented.

• Message Driven• Lock-free• Location-transparent

It is like our human society, driven by messageWhich can scale to 7 billion population!

4

Micro-service orientedhigher abstraction

• Break your application into Micro services instead of object.

• Throw away locks• Use Immutable Async message to exchange

information between micro-service instead of shared object.

5

Gearpump in Big Data Stack

store

visualization

batch stream

SQLCatalystStreamSQLImpala

Cluster manager monitor/alert/notify

Machine learning

Graphx

ClouderaManager

Storage

Engine

Analytics

Visualization& management

storm

Here !

Data explore

Gearpump

6

Why another streaming platform?

• The requirements are not fully met.• A higher abstraction like micro-service oriented can largely simplify

the problem.

7

What we want• Meet The 8 Requirements of Real-Time Stream

Processing (2006)

7

Flexible Volume Speed Accuracy VisualAny whereAny sizeAny sourceAny use casedynamic DAG②StreamSQL

High throughput⑦Scale linearly

①In-StreamZero latency⑥HA⑧Responsive

Exactly-once③Message loss/delay/out of order④Predictable

Easy to debugWYSWYG

8

Overview

9

DAG representation and API

Graph(A~>B~>C~>D, B~>E~>D)

Low level Graph API Syntax:

A B C D

E

Processor

Processor Processor Processor

ProcessorDAG

ShuffleField grouping

Field grouping

10

Architecture - Actor Hierarchy

Client

Hook in and query state

As general service

YARNEach App has one isolated AppMaster, and use Actor Supervision tree for error handling.

Master ClusterHA Design

11

WorkerWorkerWorker

Akka Cluster

Master

standbyMaster

StandbyMaster

State

Goss

ip Gossip

Gossip

Architecture - Master HA (no SPOF)• Akka Cluster for a centerless HA system• Conflict free data types(CRDT) for consistency

CRDT Data type example:

Decentralized: Not rely on single central meta server

leader

12

Feature HighlightsAkka/Akka-

stream/Stormcompatible

Throughput14 million/s

(*)

2ms Latency(*)

Exactly-once Dynamic DAG Out of OrderMessage

Flexible DSL DAGVisualization

Internet of Thing

function

usability

[*] Test environment: Intel® Xeon™ Processor E5-2690, 4 nodes, each node has 32 CPU cores, and 64GB memory, 10GbE network. We use default configuration of Gearpump 0.7 release. See backup page for more details.We use the SOL workload, message size 100 bytes, (https://github.com/intel-hadoop/storm-benchmark) (tested carried by Intel team)

13

Using Gearpump

14

Three steps to use it1. Download binary from http://gearpump.io2. Submit jar by UI3. Monitor Status

15

Application Submission Flow

Master

Workers1. Submit a Jar

Master

Workers

AppMaster 2. Create AppMaster

Master

Workers

AppMaster

3. Request Resource YARNMaster

Workers

AppMaster Executor Executor Executor

Ask YARN

for Resource1 2

43

154. Report Executor to AppMaster

YARN or without YARN

16

Low level Graph API - WordCountval context = new ClientContext() val split = Processor[Split](splitParallism) val sum = Processor[Sum](sumParallism) val app = StreamApplication("wordCount", Graph(split ~> sum), UserConfig.empty) val appId = context.submit(app) context.close() class Split(taskContext : TaskContext, conf: UserConfig) extends Task(taskContext, conf) { override def onNext(msg : Message) : Unit = { /* split the line */ } } class Sum (taskContext : TaskContext, conf: UserConfig) extends Task(taskContext, conf) { val count = /**count of words **/ override def onNext(msg : Message) : Unit = {/* do aggregation on word*/} }

Scala Java

17

High Level DSL API - WordCount

val context = ClientContext() val app = new StreamApp("dsl", context)val data = "This is a good start, bingo!! bingo!!" app.fromCollection(data.lines) // word => (word, count = 1) .flatMap(line => line.split("[\\s]+")).map((_, 1)) // (word, count1), (word, count2) => (word, count1 + count2) .groupByKey().sum.log

val appId = context.submit(app) context.close()

18

Akka-stream API – WordCountimplicit val system = ActorSystem("akka-test")implicit val materializer = new GearpumpMaterializer(system)val echo = system.actorOf(Props(new Echo()))val sink = Sink.actorRef(echo, "COMPLETE")val source = GearSource.from[String](new CollectionDataSource(lines))source.mapConcat{line => line.split(" ").toList} .groupBy2(x=>x) .map(word => (word, 1)) .reduce {(a, b) => (a._1, a._2 + b._2)} .log("word-count").runWith(sink)

Available at branch https://github.com/gearpump/gearpump/tree/akkastream

19

DAG PageUI Portal - DAG Visualization

Track global min-Clock of all message DAG:

• Node size reflect throughput• Edge width represents flow rate• Red node means something goes wrong

19

20

UI Portal – Processor DetailProcessor Page

Data skew distribution Task throughput and latency

21

Performance optimization

22

Throughput and LatencyThroughput: 11 million message/second

Latency: 17ms on full load

SOL Shuffle test32 tasks->32 tasks

[*] Test environment: Intel® Xeon™ Processor E5-2680, 4 nodes, each node has 32 CPU cores, and 64GB memory, 10GbE network. (tested carried by Intel team)We use default configuration of Gearpump 0.2 release. We use the SOL workload, message size 100 bytes, (https://github.com/intel-hadoop/storm-benchmark)

23

Scalability• Test run on 100 nodes(*) and 3000 tasks• Gearpump performance scales:

100 nodes

[*] We use 8 machines to simulate 100 worker nodesTest environment: Intel® Xeon™ Processor E5-2680, each node has 32 CPU cores, and 64GB memory, 10GbE network. (tested carried by Intel team)We use default configuration of Gearpump 0.3.5 release. We use the SOL workload, message size 100 bytes, (https://github.com/intel-hadoop/storm-benchmark)

24

Fault-Tolerance: Recovery time

[*]: Recovery time is the time interval between: a) failure happen b) all tasks in topology resume processing data.

91 worker nodes, 1000 tasks

Test environment: Intel® Xeon™ Processor E5-2680 91 worker nodes, 1000 tasks (We use 7 machines to simulate 91 worker nodes). Each node has 32 CPU cores, and 64GB memory, 10GbE network. We use default configuration of Gearpump 0.3.5 release. We use the SOL workload (https://github.com/intel-hadoop/storm-benchmark) (by Intel team)

25

High performance Messaging Layer• Akka remote message has a big overhead, (sender + receiver address)• Reduce 95% overhead (400 bytes to ~20 bytes)

Effective batching

convert from short address

convert to short address

Sync with other executors

26

Effective batchingNetwork Idle: Flush as fast as we canNetwork Busy: Smart batching until the network is open again.

This feature is ported from Storm-297

Network BandwidthDoubled

For 100 byte per message

Test environment: Same as storm-297

27

High performance flow Control

Pass back-pressure level-by-levelAbout ~1% throughput impact

TaskTaskTask TaskTaskTask TaskTaskTask TaskTaskTask

Back-pressure Sliding window

Another option(not used): big-loop-feedback flow control

1. NO central ack nodes2. Each level knows

network status, thus can optimize the network at best

28

Use cases

29

What is a good use case for Gearpump?

When you want exactly-once message processing, with millisecond latency. When you want to integrate with Akka and Akka-stream transparently. When you want dynamic modification of online DAGWhen you want to connect with IoT edge device, location tranparency.When you want a simple scheduler to distribute customized application, like collecting logs, distributed cron…When you want to use Monoid state like Count-Min Sketch…Besides, it can integrate with: YARN, Kafka, Storm and etc..

30

IoT Transparent Cloud

Location transparent. Unified programming model across the boundary.

log

Data Center

dag on device side

Target Problem Large gap between edge devices and data center

case: Intelligent Traffic System, 3000 traffic lights, travel time, overspeed detection…

31

Exactly-once: Financial use casesTarget Problem both real-time and accuracy are important

realtime Stock index

CrawlersProcess Alerts Actions Reports

Rules

TransactionAccountOther

data source

Programing trading

32

Delete

Transformers: Dynamic DAG

Add

change parallelism online to scale out without message lossadd/remove source/sink processor dynamically

B

Target Problem No existing way to manipulate the DAG on the fly

Each Processor can has its own independent jar

Replace

It can

33

Eve: Online Machine Learning

• Decide immediately based online learning result

sensor

Learn

Predict

Decide

Input

Output

Target Problem ML train and predict online for real-time decision support.

TAP analytics platform integration: http://trustedanalytics.org/

34

Gearpump Internals

35

General ideas

State

Minclock service

Replayable Source DAG

MinClock service track the min timestamp of all pending messages in the system now and future

Message(timestamp)

Every message in the systemHas a application timestamp(message birth time)

Normal Flow

36

General ideas

State

Minclock service

Replayable Source

Checkpoint Store

Message(timestamp)

④Exactly-once State can be reconstructed by message replay:

①Detect Message loss at Tp

② clock pause at Tp

③ replay from Tp

Recovery Flow

DAG

37

1. Detect Message loss2. Pause the clock at Tp when message is lost3. Replay from clock Tp4. Exactly-once

38

Detect Failure in timeAckRequest and Ack to detect Message loss:

Easy to trouble-shootWhen An error happen, we know When Where Why

Master

AppMaster

Executor

Task

Failure

Failure

Failure

Super

vision ch

ain

39

Executor Executor

AppMaster

TaskTaskTask

Store

Source

Global clock service

① error detected

②Fence zombie

Recover the runtime when machine crashed

Use dynamic session ID to fence zombies

Send message

1. Quarantine

40

Executor

AppMaster

Task

Store

SourceReplay

Global clock service

① error detected

②isolate zombie

Executor

Task

③ Recover

DAG Recovery: Quarantine and Recover2. Recover the executor JVM, and replay message

Send message

41

1. Detect Message loss2. Pause the clock at Tp when message is lost3. Replay from clock Tp4. Exactly-once

42

Application’s Clock Service

Definition: Task min-clock isMinimum of ( min timestamp of pending-messages in current task Task min-Clock of all upstream tasks)

Report task min-clock

Level ClockA

B

C

D

E

Later

Earlier

1000

800

600

400

Ever incremental

Clock of D

43

1. Detect Message loss2. Pause the clock at Tp when message is lost3. Replay from clock Tp4. Exactly-once

44

Source-based Message Replay

Normal Flow

Replay from the very-beginning source

Source

Like offset of kafka queue -> message timestamp

45

Source-based Message ReplayReplay from the very-beginning source

Global Clock Service

①Resolve offset with timestamp Tp

②Replay from offset

Source

Recovery Flow

46

1. Detect Message loss2. Clock pause at Tp when message is lost3. Replay from clock Tp4. Exactly-once

47

Checkpoint Store

DAG runtime

Key: Ensure State(t) only contains message(timestamp <= t)

Exactly-once message processing

How?

append only checkpoint

48

Exactly-once message processingTwo states

StateAccept (t <

Tc)

StateAccept all

Checkpoint Store

Streaming SystemMessages

Normal Flow

49

Exactly-once message processingTwo states

StateAccept (t <

Tc)

StateAccept all

Checkpoint Store

Streaming SystemMessages

Recover checkpoint in failures

Recovery Flow

50

How to do Dynamic DAG?Multiple-Version DAG

A B C

B’Message.time >= Tc

Target Problem: Replace processor B with B’ at time Tc

DAG(Version = 0) DAG(Version = 1)transit

Message.time < Tc

NO message loss during the transition

A B C

51

Demo

52

Live demo

53

References• 钟翔 大数据时代的软件架构范式: Reactive 架构及 Akka 实践 , 程序员期刊 2015 年2A期• Gearpump whitepaper http://typesafe.com/blog/gearpump-real-time-streaming-engine-using-akka• 吴甘沙 低延迟流处理系统的逆袭 , 程序员期刊 2013 年10期• Stonebraker http://cs.brown.edu/~ugur/8rulesSigRec.pdf• https://github.com/intel-hadoop/gearpump• Gearpump: https://github.com/intel-hadoop/gearpump• http://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/• https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap

-machines• Sqlstream http://www.sqlstream.com/customers/• http://www.statsblogs.com/2014/05/19/a-general-introduction-to-stream-processing/• http://www.statalgo.com/2014/05/28/stream-processing-with-messaging-systems/• Gartner report on IOT

http://www.zdnet.com/article/internet-of-things-devices-will-dwarf-number-of-pcs-tablets-and-smartphones/

54

Legal Disclaimers1 This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.2 Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com].3 Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.§ For more information go to http://www.intel.com/performance. Intel, the Intel logo, Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.© 2015 Intel Corporation

55

Latest Performance evaluation on Gearpump 0.7

Microsoft Word Document

This is the latest performance test on version 0.7. Please see the embedded document on the right to see the configuration details.