Playing in Tune: How We Refactored Cube to Terabyte Scale
Harmony in Tune
Philip (flip) Kromer and Huston Hoburg, infochimps.com
Feb 15 2013
How we Refactored Cube to Terabyte Scale
Big Data for All
why dashboards?
Lightweight Dashboards
• Understand what’s happening
• Understand data in context
• NOT exploratory analytics
• real-time insight...but not just about real-time
mainline: j.mp/sqcube
hi-scale branch: j.mp/icscube
The “Church of Graphs”
Predictive Kvetching
Lightweight Dashboards
Approach to Tuning
• Measure: “Why can’t it be faster?”
• Harmonize: “Use it right”
• Tune: “Align it to production resources”
cube is awesome
What’s so great?
• Streaming, real-time
• Ad-hoc data: write whatever you want
• Ad-hoc queries: make up new queries whenever
• Efficient (“pyramidal”) calculations
Event Stream
• { time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }
• { time: "2013-02-15T01:02:03Z", type: "tweet", id: 8675309, data: { text: "MongoDB talk yay", retweet_count: 121, user: { screen_name: "infochimps", followers_count: 7851, lang: "en", ...} } }
Events vs Metrics
Event:
• { time: "2013-02-15T01:02:03Z", type: "tweet", id: 8675309, data: { text: "MongoDB talk yay", retweet_count: 121, user: { screen_name: "infochimps", followers_count: 7851, lang: "en", ...} } }
Metrics:
• “# of tweets in 10s bucket at 1:02:10 on 2013-02-15”
• “# of non-english-language tweets in 1hr bucket at ...”
Events vs Metrics
Event:
• { time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }
Metrics:
• “# of requests in 10s bucket at 3:05:10 on 2013-02-15”
• “Average duration of requests with 4xx status in the 5 minute bucket at 3:05:00 on 2013-02-15”
Events vs Metrics
• Events:
• baskets of facts
• narcissistic
• LOTS AND LOTS
• Metrics:
• a timestamped number
• look like the graph
• one per time bucket
{ time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }
{ time: "2013-02-15T01:02:03Z", value: 90 }
billions and billions
3000 events/second
tuning methodology
Monkey See Monkey Do
Google for the #s the cool kids use
Spinal Tap
Turn everything to 11!!!!
Hillbilly Mechanic
Rewrite for memcached / HBase on Cassandra!!!
Moneybags
SSD plz
Moar CPU
Moar RAM
Moar Replica
Tuning: How to do it
• Measure: “Why can’t it be faster?”
• Harmonize: “Use it right”
• Tune: “Align it to production resources”
see through the magic
• Why can’t it be faster than it is now?
• dstat (http://j.mp/dstatftw): dstat -drnycmf -t 5
• htop
• mongostat
Grok: client-side
• Made a sprayer to inject data
• invalidate a time range at max speed
• writes variously-shaped data: noise, ramp, sine, etc
• Or just reach into the DB and poke
• delete range of metrics, leave events
• delete range of events, leave metrics
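A minimal sketch of such a sprayer, assuming the collector's HTTP endpoint (POST /1.0/event/put on the default port 1080); the batch size, rate, and waveform helpers are illustrative, not the tool we actually ran:

// Event "sprayer": floods the collector with synthetic, variously-shaped
// data so you can watch write throughput under load.
var http = require('http');

function wave(shape, t) {
  switch (shape) {
    case 'noise': return Math.random() * 100;
    case 'ramp':  return t % 100;
    case 'sine':  return 50 + 50 * Math.sin(t / 30);
  }
}

var t = 0;
setInterval(function () {
  var batch = [];
  for (var i = 0; i < 100; i++, t++) {
    batch.push({
      type: 'sprayer',
      time: new Date().toISOString(),
      data: { shape: 'sine', value: wave('sine', t) }
    });
  }
  var req = http.request({
    host: 'localhost', port: 1080, method: 'POST', path: '/1.0/event/put',
    headers: { 'Content-Type': 'application/json' }
  });
  req.end(JSON.stringify(batch));
}, 100);  // 100 events every 100 ms ≈ 1000 ev/sec; crank it up to find the ceiling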
Fault injection
• raise when packet comes in with certain flag
• { time: "2013...", data: {...}, _raise:"db_write" }
• (only in development mode, obvs.)
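A sketch of how a _raise flag like that might be honored in the collector's write path; the helper name and its placement are illustrative, not Cube's actual code:

// Fault-injection hook: if an incoming event names a lifecycle stage in
// `_raise`, throw at that stage so you can watch how the pipeline fails.
// Guarded so it can never fire outside development.
function maybeRaise(event, stage) {
  if (process.env.NODE_ENV !== 'development') return;
  if (event && event._raise === stage) {
    throw new Error('fault injected at ' + stage);
  }
}

// e.g. just before persisting an event:
//   maybeRaise(event, 'db_write');
//   collection.insert(event, ...);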
app-side tracing
• “Metalog” announces lifecycle progress:
• writes to log...
• ... or as cube metrics!
metalog.event('connect', { method: 'ws', ip: connection.remoteAddress, path: request.url }, 'minor');
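The metalog itself is small; a sketch of the idea, where the routing (a 'minor' event goes to the log, anything bigger goes back into Cube as an event) is our approximation of the behavior described above:

// "Metalog": announce lifecycle progress either to the log or back into
// Cube itself as events, so the pipeline can observe its own behavior.
var metalog = {
  putter: null,   // wired to the collector's event putter at boot
  event: function (name, data, importance) {
    if (importance === 'minor') {
      console.log(new Date().toISOString(), name, JSON.stringify(data));
    } else if (this.putter) {
      this.putter({ type: 'cube_' + name, time: new Date(), data: data });
    }
  }
};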
app-side tracing
fits on machine
• Rate:
• 3000 ev/sec ≈ 250 M ev/day ≈ 2 BILLION/wk
• Expensive. Difficult.
• 250 GB accumulated per day (@1000 bytes/ev)
• 95 TB accumulated per year (@1000 bytes/ev)
3000 events/second
Metrics
• Rate:
• 3 M ten-second buckets/year (π·10^7 sec/year)
• < 100 bytes/metric ...
• Manageable!
• a 30-metric dashboard is ~10 GB/year @ 10 sec
• a 30-metric dashboard is ~170 MB/year @ 5 min
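The back-of-envelope arithmetic behind those numbers, taking ~1000 bytes per event and ~100 bytes per metric as stated above:

// Event volume: scales with traffic.
var evPerSec  = 3000;
var evPerDay  = evPerSec * 86400;             // ~2.6e8  -> ~250 M events/day
var evPerWeek = evPerDay * 7;                 // ~1.8e9  -> ~2 billion/week
var evGBday   = evPerDay * 1000 / 1e9;        // ~260 -> the "250 GB/day" above
var evTByear  = evGBday * 365 / 1000;         // ~95 TB accumulated per year

// Metric volume: one record per time bucket, no matter how busy you are.
var tensecBuckets  = Math.PI * 1e7 / 10;      // ~3.15 M ten-second buckets/year
var fiveminBuckets = Math.PI * 1e7 / 300;     // ~105 K five-minute buckets/year
var dash10s  = 30 * tensecBuckets  * 100 / 1e9;  // ~9.4 GB/year, 30-metric dashboard @10s
var dash5min = 30 * fiveminBuckets * 100 / 1e6;  // ~300 MB/year @5min (less with sub-100-byte metrics)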
20% gains are boring
At scale, your first barriers are either:
• Easy
• Impossible
Metrics: 10 GB/year
Events: 10 TB/month
Scalability sí, Performance no
Still CPU and Memory Use
• Problem
• Mongo seems to be working
• but high resident memory and fault rate
• Memory-mapped Files
• 1 TB of data served by 4 GB of RAM is no good
Capped Collections
[diagram: a capped collection as a ring buffer holding records A B C D E F]
• Fixed size circular queue
• records are in order of insertion
• oldest records are discarded when full
[diagram: when the collection fills, the newest records (G, H) overwrite the oldest (A, B)]
Capped Collections
• Extremely efficient on write
• Extremely efficient for insertion-order reads
• Very efficient if queries are ‘local’
• events in the same time bucket typically arrived at nearby times and so are nearby on disk
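Creating one is a single mongo-shell call; the collection name and size here are illustrative:

// mongo shell: a fixed-size, insertion-ordered (capped) collection.
// Pick a size that fits comfortably in RAM.
db.createCollection("webreq_events", {
  capped: true,
  size: 64 * 1024 * 1024 * 1024   // 64 GB circular buffer
  // optionally add  max: <N>  to also cap the document count
});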
don’t like the answer?
change the question.
mainline: uncapped events, capped metrics (metrics are a view on the data)
hi-scale branch: capped events, uncapped metrics (events are ephemeral)
Harmony
• Make your pattern of access match your system’s strengths and rhythm
Validate Mental Model
Easy fixes
• Duplicate requests = duplicate calculations
• Cube patch for request queues exists
• Easy fix!
• Non-pyramidal aggregates are inefficient
• Remove them until things are under control
• (solve paralyzing problems first)
cube 101
Cube Systems
Collector
• Receives events
• writes to MongoDB
• marks metrics for re-calculation (“invalidates”)
Evaluator
• receives, parses requests for metrics
• calculates metrics “pyramidally”
• then stores them, cached
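Roughly, the collector's write path looks like this (a sketch with illustrative names and schema, not Cube's actual code):

// Persist the event, then flag any cached metrics covering that time
// bucket as invalid so the evaluator recomputes them on the next read.
async function collectEvent(db, event) {
  await db.collection(event.type + '_events').insertOne(event);

  // "invalidation": mark cached metric documents for this 10s bucket stale
  var bucket = new Date(Math.floor(+new Date(event.time) / 10000) * 10000);
  await db.collection(event.type + '_metrics').updateMany(
    { '_id.t': bucket },          // metrics cached for this bucket...
    { $set: { i: true } }         // ...are invalid: recompute before serving
  );
}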
Pyramidal Aggregation
events: ev ev ev ev ev ev ...
10s:  1 5 2 0 2 0 | 6 4 7 1 0 2 | 2 3 2 4 2 2 | 5 5 4 6 4 1 | 2 7 0 0 0 1 | 6 0 0 1 0 3
1min: 10 | 20 | 15 | 25 | 10 | 10
5min: 90
Uses Cached Results
[diagram: the same rollup, now reusing already-cached 10s and 1min sub-metrics instead of re-reading raw events]
Pyramidal Aggregation
[diagram: 5 min metrics computed from 1 min metrics, from 10 sec metrics, from raw events]
• calculates metrics...
• from metrics and constants ... from metrics ...
• from events
• (then stores them, cached)
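In code, the pyramidal idea looks roughly like this (a sketch, not Cube's evaluator):

// A 1-minute count is the sum of six cached 10-second counts; a 5-minute
// count is the sum of cached 1-minute counts. Only buckets with no cached
// value fall through to counting raw events.
function rollup(cached, bucketStart, span, subSpan, countEventsIn) {
  var total = 0;
  for (var t = bucketStart; t < bucketStart + span; t += subSpan) {
    if (cached[t] != null) total += cached[t];       // reuse cached sub-metric
    else total += countEventsIn(t, t + subSpan);     // fall back to raw events
  }
  return total;
}

// e.g. one 1-minute bucket assembled from 10-second sub-buckets:
//   rollup(tenSecCache, 1360890000000, 60000, 10000, countEvents)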
fast writes
how fast can we write?
FAST! Streaming writes: way efficient
locked out
Writes and Invalidations
Inserts Stop Every 5s
[mongostat output (simulated): inserts working, working, then ANGRY, ANGRY, then working, working ... repeating every ~5 seconds]
Thanks, mongostat!
Inserts Stop Every 5s
• What’s really going on?
• Database write locks
• Events and metrics have conflicting locks
• Solution: split the databases (see the sketch below)
[diagram: Events Collection (capped): hi-speed writes, localized reads, hi-speed deletes; Metrics Collection: randomish reads, updates]
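Splitting them is mostly a matter of opening two database handles at boot; a sketch with illustrative database names:

// Point events and metrics at separate databases so a burst of event
// inserts never waits behind the metrics database's write lock
// (MongoDB locking was per-database at the time).
var MongoClient = require('mongodb').MongoClient;

async function connectCube(url) {
  var client = await new MongoClient(url).connect();
  return {
    events:  client.db('cube_events'),    // capped collections, hi-speed writes
    metrics: client.db('cube_metrics')    // updates, deletes, randomish reads
  };
}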
fast reads
Pre-cache Metrics
• Keep metrics fresh (Warmer)
• Only calculate recent updates (Horizons)
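A sketch of the idea; the warmer loop and the getMetric call are our approximation, not the branch's actual code:

// Metric warmer: on a timer, re-request each dashboard metric over a
// recent "horizon", so the cache is hot when a dashboard asks for it and
// we never recompute ancient, already-settled buckets.
var HORIZON_MS    = 5 * 60 * 1000;   // only recompute the last 5 minutes
var WARM_EVERY_MS = 10 * 1000;       // refresh cadence

function warm(evaluator, expressions) {
  setInterval(function () {
    var stop  = new Date();
    var start = new Date(stop - HORIZON_MS);
    expressions.forEach(function (expr) {
      // hypothetical evaluator API: compute-and-cache the metric for [start, stop)
      evaluator.getMetric(expr, start, stop, 10, function () {});
    });
  }, WARM_EVERY_MS);
}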
fancy metrics
Non-pyramidal Aggregates
• Can’t calculate from warmed metrics
• Store values with counts in metrics
• Counts can be vivified for aggregations
• Smaller footprint than full events
• Works best for dense, finite values
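A sketch of the values-with-counts idea; the histogram shape and the median example are illustrative:

// Keep a compact {value: count} histogram per time bucket instead of full
// events; "vivify" it back into a value list when a non-pyramidal
// aggregate (here, a median) is requested. Works best when the values are
// dense and finite, e.g. HTTP status codes or small integer durations.
function addToHistogram(hist, value) {
  hist[value] = (hist[value] || 0) + 1;
  return hist;
}

function vivify(hist) {
  var values = [];
  Object.keys(hist).forEach(function (v) {
    for (var i = 0; i < hist[v]; i++) values.push(Number(v));
  });
  return values;
}

function median(hist) {
  var values = vivify(hist).sort(function (a, b) { return a - b; });
  return values[Math.floor(values.length / 2)];
}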
finally, scaling
Multicore
• MongoDB
• Writes limited to single core
• Requires sharding for multicore
Multicore
• Cube (node.js)
• Concurrent, but not multi-threaded
• Easy solution
• Multiple collectors on different ports
• Produces redundant invalidations
• Requires external load balancing
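One way to sketch it with node's cluster module; the startCollector boot function is hypothetical, and an external balancer (HAProxy, nginx) still has to spread incoming events across the ports:

// Fork one collector per CPU core, each listening on its own port.
var cluster = require('cluster');
var os = require('os');
var BASE_PORT = 1080;

if (cluster.isMaster) {
  os.cpus().forEach(function (_, i) {
    cluster.fork({ COLLECTOR_PORT: BASE_PORT + i });
  });
} else {
  var port = Number(process.env.COLLECTOR_PORT);
  startCollector(port);   // hypothetical: boots one collector on `port`
}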
Multicore
Hardware
• High Memory
• Capped events collection size scales with available memory
• CPU
• Mongo / cube not optimized for multicore
• Faster cores
• EC2 Best value: m2.2xlarge
• < $700/mo, 34.2 GB RAM, 13 “bogo-hertz” (EC2 compute units)
Cloud helps
• Tune machines to application
• Dedicating databases for each application makes life a lot easier
github.com/infochimps-labs
good ideas that didn’t help
Queues
• Different queueing methods
• In theory, should optimize metric calculations
• No significant improvement
Locks: update vs. remove
• Uncapped metrics allow ‘remove’ as invalidation option
• Remove doesn’t help with database locks
• It was a stupid idea anyway: that’s OK
• “Hey, poke it and see what happens!”
Mongo Aggregations
• Mongo has aggregations!
• Node ends up working better
• Mongo aggregations aren’t faster
• Less flexible
• Would require query language rewrite
Why not Graphite?
• Data model
• Metrics-centric vs events-centric (metrics code not intertwingled with app code)
• Environment familiarity
• Cube: d3, node.js, mongo
• Graphite: Django, Whisper, C