Storm 2012 03-29

Real-time and long-time: Fun with Hadoop + Storm

Description

Detailed design for a robust counter, as well as a design for a completely on-line multi-armed bandit implementation that uses the new Bayesian Bandit algorithm. By Ted Dunning.

Transcript of Storm 2012 03-29

Page 1: Storm 2012 03-29

Real-time and long-time

Fun with Hadoop + Storm

Page 2: Storm 2012 03-29

The Challenge

• Hadoop is great for processing vats of data
– But sucks for real-time (by design!)

• Storm is great for real-time processing
– But lacks any way to deal with batch processing

• It sounds like there isn’t a solution
– Neither fashionable solution handles everything

Page 3: Storm 2012 03-29

This is not a problem.

It’s an opportunity!

Page 4: Storm 2012 03-29

Hadoop is Not Very Real-time

[Figure: timeline running from the past up to now. Data is fully processed only through the latest full period; the Hadoop job takes as long as that period to run, so the most recent data remains unprocessed.]

Page 5: Storm 2012 03-29

Need to Plug the Hole in Hadoop

• We have real-time data with limited state
– Exactly what Storm does
– And what Hadoop does not

• Can Storm and Hadoop be combined?

Page 6: Storm 2012 03-29

Real-time and Long-time together

[Figure: timeline running up to now. Hadoop works great on older data; Storm works on the most recent data; a blended view spans both.]

Page 7: Storm 2012 03-29

An Example

• I want to know how many queries I get
– Per second, minute, day, week

• Results should be available
– within <2 seconds 99.9+% of the time
– within 30 seconds almost always

• History should last >3 years
• Should work for 0.001 q/s up to 100,000 q/s
• Failure tolerant, yadda, yadda

Page 8: Storm 2012 03-29

Rough Design – Data Flow

[Figure: data-flow diagram. A Search Engine feeds Query Event Spouts, which feed Counter Bolts. Counter Bolts emit semi-aggregated counts ("Semi Agg") to Logger Bolts, which write Raw Logs. A Hadoop Aggregator turns the logged semi-aggregates into snapshots ("Snap") and long-term aggregates ("Long agg").]

Page 9: Storm 2012 03-29

Counter Bolt Detail

• Input: Labels to count
• Output: Short-term semi-aggregated counts
– (time-window, label, count)

• Non-zero counts emitted if
– event count reaches threshold (typically 100K)
– time since last count reaches threshold (typically 1s)

• Tuples acked when counts emitted
• Double count probability is > 0 but very small
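The emission rule above (flush when either the event-count threshold or the time threshold is reached, whichever comes first) can be sketched in Python. `CounterBolt`, its thresholds, and the `emitted` list are illustrative stand-ins for this slide's logic, not the actual Storm bolt API:

```python
import time
from collections import defaultdict

class CounterBolt:
    """Semi-aggregating counter sketch: buffers counts per (window, label)
    and flushes when either threshold is hit."""

    def __init__(self, count_threshold=100_000, time_threshold=1.0, window=60):
        self.count_threshold = count_threshold  # events buffered before flush
        self.time_threshold = time_threshold    # seconds between flushes
        self.window = window                    # time-window size in seconds
        self.counts = defaultdict(int)
        self.pending = 0                        # events since last flush
        self.last_flush = time.time()
        self.emitted = []                       # stand-in for emitting tuples

    def execute(self, label, now=None):
        now = time.time() if now is None else now
        window_start = int(now) - int(now) % self.window
        self.counts[(window_start, label)] += 1
        self.pending += 1
        if (self.pending >= self.count_threshold
                or now - self.last_flush >= self.time_threshold):
            self.flush(now)

    def flush(self, now):
        # Emit non-zero semi-aggregated counts. The same (window, label) pair
        # may be emitted many times across flushes; downstream must sum them.
        for (window, label), n in self.counts.items():
            self.emitted.append((window, label, n))
        self.counts.clear()
        self.pending = 0
        self.last_flush = now
        # ... this is where the buffered tuples would be acked ...
```

Because tuples are acked at flush time rather than at window close, the ack latency stays around the flush interval even when the counting window is much longer.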

Page 10: Storm 2012 03-29

Counter Bolt Counterintuitivity

• Counts are emitted for the same label and same time window many times
– these are semi-aggregated
– this is a feature
– tuples can be acked within 1s
– time windows can be much longer than 1s

• No need to send the same label to the same bolt
– speeds failure recovery

Page 11: Storm 2012 03-29

Design Flexibility

• Counter can persist a short-term transaction log
– counter can recover state on failure
– log is normally burned after write

• Count flush interval can be extended without extending tuple timeout
– Decreases currency of counts
– System is still real-time at a longer time-scale

• Total bandwidth for log is typically not huge

Page 12: Storm 2012 03-29

Counter Bolt No-nos

• Cannot accumulate entire period in-memory
– Tuples must be ack’ed much sooner
– State must be persisted before ack’ing
– State can easily grow too large to handle without disk access

• Cannot persist entire count table at once
– Incremental persistence required

Page 13: Storm 2012 03-29

Guarantees

• Counter output volume is small-ish
– the greater of k tuples per 100K inputs or k tuple/s
– 1 tuple/s/label/bolt for this exercise

• Persistence layer must provide guarantees
– distributed against node failure
– must have either readable flush or closed-append
– HDFS is distributed, but provides neither guarantee
– MapRfs is distributed, provides both guarantees

Page 14: Storm 2012 03-29

Failure Modes

• Bolt failure
– buffered tuples will go un-acked
– after timeout, tuples will be resent
– timeout ≈ 10s
– if failure occurs after persistence, before acking, then double-counting is possible

• Storage (with MapR)
– most failures invisible
– a few continue within 0-2s, some take 10s
– catastrophic cluster restart can take 2-3 min
– logger can buffer this much easily

Page 15: Storm 2012 03-29

Presentation Layer

• Presentation must
– read recent output of Logger bolt
– read relevant output of Hadoop jobs
– combine semi-aggregated records

• User will see
– counts that increment within 0-2 s of events
– seamless meld of short and long-term data
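Because every record on both paths is a semi-aggregated (time-window, label, count) triple, combining them is just a per-key sum. A minimal sketch; the `blend` helper is hypothetical, not part of any slide:

```python
from collections import defaultdict

def blend(storm_records, hadoop_records):
    """Merge semi-aggregated (window, label, count) records from the
    real-time (Storm) path and the batch (Hadoop) path into one view.
    The counter bolt may emit the same (window, label) many times, so
    blending is simply summation per key."""
    totals = defaultdict(int)
    for window, label, count in list(storm_records) + list(hadoop_records):
        totals[(window, label)] += count
    return dict(totals)
```

This is why duplicate emissions for a (window, label) pair are "a feature": summation makes the presentation layer indifferent to how many pieces a window's count arrived in.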

Page 16: Storm 2012 03-29


Mobile Network Monitor

[Figure: geo-dispersed ingest servers collect transaction data; batch aggregation (map) feeds a retro-analysis interface, while the same data drives a real-time dashboard and alerts.]

Page 17: Storm 2012 03-29

Example 2 – Real-time learning

• My system has to
– learn a response model, and
– select training data
– in real-time

• Data rate up to 100K queries per second

Page 18: Storm 2012 03-29

Door Number 3

• I have 15 versions of my landing page
• Each visitor is assigned to a version
– Which version?

• A conversion or sale or whatever can happen
– How long to wait?

• Some versions of the landing page are horrible
– Don’t want to give them traffic

Page 19: Storm 2012 03-29

Real-time Constraints

• Selection must happen in <20 ms almost all the time

• Training events must be handled in <20 ms
• Failover must happen within 5 seconds
• Client should time out and back off
– no need for an answer after 500 ms

• State persistence required

Page 20: Storm 2012 03-29

Rough Design

[Figure: a DRPC Spout and a Query Event Spout feed a Selector Layer and a Timed Join Model backed by Model State; a Conversion Detector feeds conversions back, and Logger Bolts plus a Counter Bolt write Raw Logs.]

Page 21: Storm 2012 03-29

A Quick Diversion

• You see a coin
– What is the probability of heads?
– Could it be larger or smaller than that?

• I flip the coin and, while it is in the air, ask again
• I catch the coin and ask again
• I look at the coin (and you don’t) and ask again
• Why does the answer change?
– And did it ever have a single value?

Page 22: Storm 2012 03-29

A First Conclusion

• Probability as expressed by humans is subjective and depends on information and experience

Page 23: Storm 2012 03-29

A Second Conclusion

• A single number is a bad way to express uncertain knowledge

• A distribution of values might be better

Page 24: Storm 2012 03-29

I Dunno

Page 25: Storm 2012 03-29

5 and 5

Page 26: Storm 2012 03-29

2 and 10

Page 27: Storm 2012 03-29

Bayesian Bandit

• Compute distributions based on data
• Sample p1 and p2 from these distributions
• Put a coin in bandit 1 if p1 > p2
• Else, put the coin in bandit 2
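The four steps above are Thompson sampling with Beta posteriors. A minimal Python sketch, generalized from two bandits to n landing-page versions; the class and method names are illustrative, not from the talk:

```python
import random

class BayesianBandit:
    """Bayesian Bandit sketch: keep a Beta posterior per version, sample a
    conversion rate from each posterior, and send the visitor to the
    version whose sample wins."""

    def __init__(self, n_versions):
        # Beta(1, 1) prior = "I dunno": uniform over conversion rates.
        self.wins = [1] * n_versions
        self.losses = [1] * n_versions

    def select(self):
        # One sample per version; the argmax unifies exploration and
        # exploitation: uncertain versions sometimes win, bad ones rarely do.
        samples = [random.betavariate(w, l)
                   for w, l in zip(self.wins, self.losses)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, version, converted):
        if converted:
            self.wins[version] += 1
        else:
            self.losses[version] += 1
```

As evidence accumulates, the posteriors sharpen and horrible versions almost never produce the winning sample, so they naturally stop receiving traffic.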

Page 28: Storm 2012 03-29

And it works!

Page 29: Storm 2012 03-29

The Basic Idea

• We can encode a distribution by sampling
• Sampling allows unification of exploration and exploitation
• Can be extended to more general response models

Page 30: Storm 2012 03-29

• Contact:
– tdunning@maprtech.com
– @ted_dunning

• Slides and such:
– http://info.mapr.com/ted-storm-2012-03