Streaming OLAP Applications - HPTS

Streaming OLAP Applications C. Scott Andreas | HPTS 2013 @cscotta From square one to multi-gigabit streams and beyond

Page 2: Streaming OLAP Applications - HPTS

Roadmap
– Framing the problem
– Four phases of an architecture’s evolution
– Code: A general-purpose lockless aggregator
– Demonstration
– Further reading

Page 6: Streaming OLAP Applications - HPTS

A journey of up and out
– Started at ~7,000 flows/second on one node
– Added distribution, bringing us to 7,000 flows/second/node
– Implemented a custom OLAP engine: 1.6 MM flows/second/node
– Further work remains on a streaming OLAP map/reduce, demonstrated on an 80 Gbps stream

Page 7: Streaming OLAP Applications - HPTS

[Chart: Single-Node Scalability vs. Many-Node Scalability; “x” marks “A good place to be”]

Page 8: Streaming OLAP Applications - HPTS

[Chart: Single-Customer Scalability vs. Many-Customer Scalability; “x” marks “A good place to be”]

Page 9: Streaming OLAP Applications - HPTS

Four phases
– Up: Off-the-shelf CEP software
– Out: Distribution
– Up: Custom streaming OLAP engine
– Out: Evolution toward a streaming map/reduce

Page 10: Streaming OLAP Applications - HPTS

[1] Off-the-Shelf CEP
– Single customer, single node
– Exists, works!

Page 12: Streaming OLAP Applications - HPTS

A sample EPL statement that returns the average price per symbol over the last 100 stock ticks:

select symbol, avg(price) as avgPrice from StockTickEvent.win:length(100) group by symbol;

Source: http://esper.codehaus.org/tutorials/tutorial/tutorial.html
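
To make the semantics of that statement concrete, here is a plain-Java sketch of what it computes. It is a hypothetical illustration, not how Esper works internally: the class and method names (`LengthWindowAverage`, `onTick`, `avgPrice`) are made up. The window holds the last N ticks across all symbols, and a running sum and count per symbol make each `avg(price)` read O(1).

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java illustration of `win:length(N) ... group by symbol`;
// not Esper's implementation.
class LengthWindowAverage {
    private static final class Tick {
        final String symbol;
        final double price;

        Tick(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    private final int capacity; // N in win:length(N)
    private final ArrayDeque<Tick> window = new ArrayDeque<>();
    private final Map<String, Double> sums = new HashMap<>();
    private final Map<String, Integer> counts = new HashMap<>();

    LengthWindowAverage(int capacity) {
        this.capacity = capacity;
    }

    /** Adds a tick; once more than N ticks are held, the oldest one (of any symbol) leaves. */
    void onTick(String symbol, double price) {
        window.addLast(new Tick(symbol, price));
        sums.merge(symbol, price, Double::sum);
        counts.merge(symbol, 1, Integer::sum);
        if (window.size() > capacity) {
            Tick old = window.removeFirst();
            sums.merge(old.symbol, -old.price, Double::sum);
            // Drop a group entirely once none of its ticks remain in the window.
            if (counts.merge(old.symbol, -1, Integer::sum) == 0) {
                counts.remove(old.symbol);
                sums.remove(old.symbol);
            }
        }
    }

    /** avg(price) for one symbol over ticks still in the window, or NaN if none. */
    double avgPrice(String symbol) {
        Integer n = counts.get(symbol);
        return n == null ? Double.NaN : sums.get(symbol) / n;
    }
}
```

Note that the window is shared by all symbols: a burst of ticks for one symbol can push every other symbol's ticks out of the window.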

Page 16: Streaming OLAP Applications - HPTS

[Chart: Single-Node Scalability vs. Many-Node Scalability; “x” marks “you are here”]

7,000 events/second, one node, no HA

Page 17: Streaming OLAP Applications - HPTS

[2] Distribution: Designing an HA multi-tenant analytics engine to map M customers onto N nodes.

Page 18: Streaming OLAP Applications - HPTS

[Architecture diagram. Collectors: coll01 through coll06. Stream buffering: kafka01 through kafkaNN. OLAP filtering + aggregation: olap01 through olapNN. Storage 0 through NN. Client API 0 through NN. A ZooKeeper ensemble (five nodes shown).]

Page 23: Streaming OLAP Applications - HPTS

Self-Organization
• ZooKeeper broadcasts a consistently-ordered view of cluster state changes for all nodes, all active streams, and who owns what.
• “Claim streams until I have at least my fair share.”
• If I have too much, “hand off streams until I’m doing my fair share.”
• If I’m shutting down, tell others, hand streams off, and don’t claim any more.

github.com/boundary/ordasity
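
The claim/hand-off rules above can be sketched as a pure function over one snapshot of cluster state. This is a hypothetical illustration, not Ordasity's API: the `FairShare` class, the rounded-up fair-share formula, and the choice of which streams to claim or release are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of Ordasity-style self-balancing (not Ordasity's real API).
 * Input: one consistent snapshot of cluster state, as ZooKeeper would broadcast it.
 * Output: what this node should claim or hand off.
 */
final class FairShare {
    /** Streams to claim and streams to hand off, in snapshot order. */
    static final class Plan {
        final List<String> claim = new ArrayList<>();
        final List<String> handOff = new ArrayList<>();
    }

    /** Assumed fair share: total streams divided by live nodes, rounded up (liveNodes > 0). */
    static int target(int totalStreams, int liveNodes) {
        return (totalStreams + liveNodes - 1) / liveNodes;
    }

    static Plan plan(String me, List<String> allStreams, Map<String, String> owners,
                     int liveNodes, boolean shuttingDown) {
        Plan plan = new Plan();
        List<String> mine = new ArrayList<>();
        for (String stream : allStreams) {
            if (me.equals(owners.get(stream))) {
                mine.add(stream);
            }
        }
        if (shuttingDown) {
            // Draining: hand everything off and claim nothing more.
            plan.handOff.addAll(mine);
            return plan;
        }
        int share = target(allStreams.size(), liveNodes);
        if (mine.size() > share) {
            // Too much: hand off the excess until I'm doing my fair share.
            plan.handOff.addAll(mine.subList(share, mine.size()));
        } else {
            // Claim unowned streams until I have at least my fair share.
            for (String stream : allStreams) {
                if (mine.size() + plan.claim.size() >= share) {
                    break;
                }
                if (!owners.containsKey(stream)) {
                    plan.claim.add(stream);
                }
            }
        }
        return plan;
    }
}
```

Every node computes its plan from the same ordered snapshot, so the cluster can converge without a central coordinator. The claim itself still has to be made atomically through ZooKeeper, because two nodes may see the same unowned stream at the same time.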

Page 28: Streaming OLAP Applications - HPTS

[Chart: Single-Node Scalability vs. Many-Node Scalability; “x” marks “you are here”]

7,000 flows/second, any number of nodes, HA

Page 31: Streaming OLAP Applications - HPTS

[Chart: Single-Customer Scalability vs. Many-Customer Scalability; “x” marks “but you are still here”]

7,000 flows/second, any number of nodes, HA

Page 32: Streaming OLAP Applications - HPTS

[3] Custom Streaming OLAP: Lockless aggregation of event streams

Page 33: Streaming OLAP Applications - HPTS

[Diagram: Timestamp, Dimension Key, Rollup Object]
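
One plausible reading of the three labels is a nested index: each event is bucketed by a truncated timestamp, then by a dimension key, and folded into a rollup object holding that cell's aggregates. The single-threaded sketch below is hypothetical. The bucket width, the `String` dimension key, and the count/sum/min/max fields are illustrative assumptions, not the deck's actual schema.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical rollup cell: running aggregates for one (time bucket, dimension key). */
final class Rollup {
    long count;
    long sum;
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;

    void add(long value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }
}

/** Timestamp bucket -> dimension key -> rollup object (single-threaded sketch). */
final class RollupTable {
    private final long bucketMillis;
    // TreeMap keeps buckets in time order so finished ones can be flushed oldest-first.
    private final TreeMap<Long, Map<String, Rollup>> buckets = new TreeMap<>();

    RollupTable(long bucketMillis) {
        this.bucketMillis = bucketMillis;
    }

    void record(long timestampMillis, String dimensionKey, long value) {
        long bucket = timestampMillis - Math.floorMod(timestampMillis, bucketMillis);
        buckets.computeIfAbsent(bucket, b -> new HashMap<>())
               .computeIfAbsent(dimensionKey, k -> new Rollup())
               .add(value);
    }

    Rollup get(long bucket, String dimensionKey) {
        Map<String, Rollup> byKey = buckets.get(bucket);
        return byKey == null ? null : byKey.get(dimensionKey);
    }

    /** Removes and returns every bucket that starts before cutoffMillis. */
    Map<Long, Map<String, Rollup>> flushBefore(long cutoffMillis) {
        Map<Long, Map<String, Rollup>> done = new TreeMap<>(buckets.headMap(cutoffMillis));
        buckets.headMap(cutoffMillis).clear(); // clearing the view removes from the backing map
        return done;
    }
}
```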

Page 35: Streaming OLAP Applications - HPTS

Methodology: Launch the process with a configured thread count, preload all data into memory, run for 10 minutes, then exit, printing the final mean processing rate. Batch size: 10,000. Hardware: EC2 cc2.8xlarge (2x Xeon E5-2670; 32 vcores, 16 physical cores). Software: Java 1.7.0_40-b43, -Xmx24G, CMS + ParNew; EC2 Linux 3.4.43-43.43.amzn1.x86_64 (ami-a73758ce).

[Chart: Lockless Aggregator. Mean processing rate (y-axis, 0 to 5,000,000) by thread count (x-axis, 1 to 32).]

Page 36: Streaming OLAP Applications - HPTS

[Chart: Lock-Striping Aggregator. Mean processing rate (y-axis, 0 to 2,400,000) by thread count (x-axis, 1 to 32).]

Page 37: Streaming OLAP Applications - HPTS

[Chart: Lockless Aggregator (NonBlockingHashMap) vs. Lock-Striping Aggregator (ConcurrentHashMap). Mean processing rate (y-axis, 0 to 5,000,000) by thread count (x-axis, 1 to 32).]
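
The chart compares a lockless aggregator built on `NonBlockingHashMap` (from Cliff Click's high-scale-lib) with a lock-striping one built on `ConcurrentHashMap`. To keep the sketch dependency-free, it swaps in the JDK's `ConcurrentHashMap` as the map and shows the lockless part as compare-and-swap updates on per-key atomic cells. It is an illustration under that assumption, not the deschutes code. Note also that since Java 8, `ConcurrentHashMap` no longer uses the segment-based lock striping of the Java 7 release benchmarked above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of lock-free aggregation: per-key atomic cells updated with CAS.
 * Assumption: JDK ConcurrentHashMap stands in for NonBlockingHashMap.
 */
final class CasAggregator {
    /** Hypothetical rollup cell with a lock-free count, sum, and max. */
    static final class Cell {
        final AtomicLong count = new AtomicLong();
        final AtomicLong sum = new AtomicLong();
        final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

        void add(long value) {
            // Fields update independently, so a concurrent reader may see a count
            // and sum from slightly different moments; read after writers quiesce.
            count.incrementAndGet();
            sum.addAndGet(value);
            // Explicit CAS retry loop: contended threads spin and retry here,
            // which is one reason CAS is "no silver bullet".
            long current;
            while (value > (current = max.get())) {
                if (max.compareAndSet(current, value)) {
                    break;
                }
            }
        }
    }

    private final ConcurrentMap<String, Cell> cells = new ConcurrentHashMap<>();

    void record(String dimensionKey, long value) {
        // Fast path: a plain read, with no write to the map once the key exists.
        Cell cell = cells.get(dimensionKey);
        if (cell == null) {
            // putIfAbsent is atomic: if two threads race, both end up using the winner's cell.
            Cell fresh = new Cell();
            Cell prior = cells.putIfAbsent(dimensionKey, fresh);
            cell = prior != null ? prior : fresh;
        }
        cell.add(value);
    }

    Cell get(String dimensionKey) {
        return cells.get(dimensionKey);
    }
}
```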

Page 43: Streaming OLAP Applications - HPTS

[Chart: Single-Customer Scalability vs. Many-Customer Scalability; “x” marks “moving on up!”]

1.6MM flows/second/node, any number of nodes, HA

Page 44: Streaming OLAP Applications - HPTS

Example Implementation

Page 50: Streaming OLAP Applications - HPTS

demo

Page 52: Streaming OLAP Applications - HPTS

[Chart: Many-Node and Large-Customer Scalability vs. Many-Node and Many-Customer Scalability; “x” marks “what gets us here?”]

High processing rate, HA, any number of nodes, no “single-node” sharding limit.

Page 53: Streaming OLAP Applications - HPTS

[4] Streaming OLAP Map/Reduce: Incremental lockless filtering/aggregation of event streams, with final rollups of total streams

Page 54: Streaming OLAP Applications - HPTS

[Diagram: Input Sources (high velocity, partitioned streams) → Map ×5 (many, high velocity filtering and aggregation; low velocity incremental output) → Reduce (final top-level aggregation) → Output]
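
A minimal single-process sketch of the shape in that diagram: each mapper filters and pre-aggregates its own partition of the high-velocity input into small per-bucket partials, and the reducer merges those low-velocity partials into the final top-level rollup. The names (`StreamingMapReduce`, `mapPartition`, `reduce`), the sum-only aggregate, and the filter predicate are hypothetical. Systems such as Samza or Storm add transport, partitioning, and fault tolerance on top of this.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class StreamingMapReduce {
    /** Hypothetical event: a time bucket, a dimension key, and a numeric value. */
    static final class Event {
        final long bucket;
        final String key;
        final long value;

        Event(long bucket, String key, long value) {
            this.bucket = bucket;
            this.key = key;
            this.value = value;
        }
    }

    /** Map side: filter and pre-aggregate one partition into bucket -> key -> partial sum. */
    static Map<Long, Map<String, Long>> mapPartition(List<Event> partition, Predicate<Event> filter) {
        Map<Long, Map<String, Long>> partial = new HashMap<>();
        for (Event e : partition) {
            if (!filter.test(e)) {
                continue;
            }
            partial.computeIfAbsent(e.bucket, b -> new HashMap<>())
                   .merge(e.key, e.value, Long::sum);
        }
        return partial;
    }

    /** Reduce side: merge every mapper's partials into the final rollup. */
    static Map<Long, Map<String, Long>> reduce(List<Map<Long, Map<String, Long>>> partials) {
        Map<Long, Map<String, Long>> total = new HashMap<>();
        for (Map<Long, Map<String, Long>> partial : partials) {
            partial.forEach((bucket, byKey) -> {
                Map<String, Long> dest = total.computeIfAbsent(bucket, b -> new HashMap<>());
                byKey.forEach((key, sum) -> dest.merge(key, sum, Long::sum));
            });
        }
        return total;
    }
}
```

This split only works for aggregates that can be merged in any order (sums, counts, min, max). Something like an exact median cannot be pre-aggregated this way.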

Page 62: Streaming OLAP Applications - HPTS

Streaming Map/Reduce
• Higher latency, but much higher velocity
• Challenging for time-windowed aggregations (case of the slow mapper)
• Implementations: Apache Samza atop YARN (LinkedIn), Storm (Twitter), Summingbird (Twitter)
• Papers: MillWheel (Google, at VLDB): http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p734-akidau.pdf
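
The slow-mapper problem can be illustrated with a low watermark. The reducer may only finalize a time window once every mapper has reported progress past that window's end, so one lagging mapper holds back every window. The sketch below is a hypothetical simplification in the spirit of MillWheel's low watermarks. It omits persistence and late data, and `WindowFinalizer` with its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical reducer-side window finalizer driven by per-mapper watermarks. */
final class WindowFinalizer {
    private final long windowMillis;
    // Per mapper: event time below which it has emitted everything.
    private final Map<String, Long> mapperWatermarks = new HashMap<>();
    // Window start -> partial sum merged so far.
    private final TreeMap<Long, Long> openWindows = new TreeMap<>();

    WindowFinalizer(long windowMillis, List<String> mappers) {
        this.windowMillis = windowMillis;
        for (String m : mappers) {
            mapperWatermarks.put(m, Long.MIN_VALUE); // no progress reported yet
        }
    }

    /** Merges a mapper's partial result for the window starting at windowStart. */
    void onPartial(long windowStart, long partialSum) {
        openWindows.merge(windowStart, partialSum, Long::sum);
    }

    /** Records a mapper's progress; returns {windowStart, total} for each window now complete. */
    List<long[]> onWatermark(String mapper, long watermarkMillis) {
        mapperWatermarks.merge(mapper, watermarkMillis, Long::max);
        // The slowest mapper bounds what the reducer may safely emit.
        long low = Long.MAX_VALUE;
        for (long w : mapperWatermarks.values()) {
            low = Math.min(low, w);
        }
        List<long[]> done = new ArrayList<>();
        while (!openWindows.isEmpty() && openWindows.firstKey() + windowMillis <= low) {
            Map.Entry<Long, Long> e = openWindows.pollFirstEntry();
            done.add(new long[] {e.getKey(), e.getValue()});
        }
        return done;
    }
}
```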

Page 68: Streaming OLAP Applications - HPTS

Parallel OLAP Aggregation
• Fundamental problem: contention
• Lockless data structures reduce contention, but CAS is no silver bullet
• One approach: thread-local aggregation with TreeMaps/HashMaps, with combining operations run once per second
• “Flat Combining and the Synchronization-Parallelism Tradeoff” (Hendler, Incze, Shavit, and Tzafrir, SPAA 2010)
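
A minimal sketch of the thread-local approach: each worker aggregates into its own unsynchronized `HashMap`, so the hot path has no shared writes at all. A periodic combine step (once per second on the slide) then merges the partials. The hand-off design here is an assumption: workers publish their map through a lock-free queue and start a fresh one. The names `ThreadLocalAggregation`, `Worker`, `publish`, and `combine` are hypothetical. Flat combining, as in the cited paper, is a different technique, in which a single combiner thread applies operations published by other threads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Hypothetical thread-local aggregation: each worker owns a private HashMap and
 * periodically publishes it; a single combiner thread merges published maps.
 */
final class ThreadLocalAggregation {
    // Published partials awaiting the combiner. The queue gives the happens-before
    // edge between a worker's writes and the combiner's reads.
    private final ConcurrentLinkedQueue<Map<String, Long>> published = new ConcurrentLinkedQueue<>();
    // Touched only by the combiner thread.
    private final Map<String, Long> totals = new HashMap<>();

    /** Per-worker state; never shared, so recording needs no locks or CAS. */
    final class Worker {
        private Map<String, Long> local = new HashMap<>();

        void record(String key, long value) {
            local.merge(key, value, Long::sum);
        }

        /** Hands the current partial to the combiner and starts a fresh one. */
        void publish() {
            if (!local.isEmpty()) {
                published.add(local);
                local = new HashMap<>();
            }
        }
    }

    Worker newWorker() {
        return new Worker();
    }

    /** Combiner step (call from one thread): drain partials and return a snapshot of totals. */
    Map<String, Long> combine() {
        Map<String, Long> partial;
        while ((partial = published.poll()) != null) {
            partial.forEach((k, v) -> totals.merge(k, v, Long::sum));
        }
        return new HashMap<>(totals);
    }
}
```

In a service, `combine()` could run on a single scheduled thread, for example via `ScheduledExecutorService.scheduleAtFixedRate` with a one-second period. Workers would call `publish()` at the end of each batch.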

Page 71: Streaming OLAP Applications - HPTS

Code
• Streaming Aggregation: https://github.com/cscotta/deschutes
• Cluster Coordination: https://github.com/boundary/ordasity
• Documentation: http://taco.cat/deschutes
