Making Every Bit Count in Wide Area Analytics
1
Making Every Bit Count in Wide Area Analytics
Ariel Rabkin
Joint work with: Matvey Arye, Siddhartha Sen, Michael J. Freedman, and Vivek Pai
2
Global Systems Have Global Data
3
The Rise of Big Distributed Data
• CDNs:
  – Akamai has ~20 million requests per second
  – CloudFlare has about 300 MB/s of logs; volume doubles every 4 months
• Sensor data (e.g., power grid, highways)
• Smart camera networks
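The CloudFlare figure becomes concrete with a little arithmetic. At a steady 300 MB/s, one day of logs is 300 × 86,400 s ≈ 25.9 TB. Doubling every 4 months compounds to 2^3 = 8× per year. This sketch assumes decimal units (1 TB = 10^6 MB) and assumes the doubling rate holds exactly; both are simplifications, not claims from the talk.

```python
SECONDS_PER_DAY = 86_400

def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Convert a sustained log rate in MB/s to TB per day (decimal units assumed)."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1_000_000

def projected_rate(rate_mb_per_s: float, months: float, doubling_months: float = 4.0) -> float:
    """Rate after `months`, assuming volume doubles every `doubling_months`."""
    return rate_mb_per_s * 2 ** (months / doubling_months)

print(f"{daily_volume_tb(300):.2f} TB/day today")         # 25.92 TB/day today
print(f"{projected_rate(300, 12):.0f} MB/s after a year")  # 2400 MB/s after a year
```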
4
Trends
[Figure: amount per dollar over time, for data volumes and for wide-area bandwidth]
5
Analyzing Low-rate Events is Easy
Server Crashed!
Alert me when server crashes!
6
High-rate Events can be Costly
Every minute, compute request counts by URL
[Figure: request streams at two sites]
7
Backhaul has Bad Dynamics
Example: backhaul count of events every 5 minutes
Choice of summaries is made statically, up front
• Buyer’s remorse: Chose to collect unnecessary and expensive data
• Analyst’s remorse: Summaries insufficient for analysis. No way to retroactively get more data
8
Local Storage!
Every minute, compute request counts by URL
[Figure: request streams at two sites, each feeding local aggregation and storage]
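This is a minimal sketch of local aggregation and storage for the query "every minute, compute request counts by URL". Each site buckets its own requests by (minute, URL) and keeps the counts locally instead of backhauling raw requests. The `LocalStore` class and the `(timestamp, url)` request format are hypothetical and serve only as illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

class LocalStore:
    """Hypothetical per-site store of request counts keyed by (minute, url)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def ingest(self, requests: Iterable[Tuple[int, str]]) -> None:
        # Each request is (unix_timestamp_seconds, url); truncate to the minute.
        for ts, url in requests:
            self.counts[(ts - ts % 60, url)] += 1

    def counts_for_minute(self, minute_start: int) -> dict:
        """Return {url: count} for one minute, answered from local storage."""
        return {url: c for (m, url), c in self.counts.items() if m == minute_start}

store = LocalStore()
store.ingest([(0, "/a"), (15, "/a"), (59, "/b"), (60, "/a")])
print(store.counts_for_minute(0))   # {'/a': 2, '/b': 1}
print(store.counts_for_minute(60))  # {'/a': 1}
```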
9
Challenge: Bandwidth Scarcity
I want the request count for every URL every second.
I can't do that, Ari. That costs 100 MB/sec. You only have 12 MB/sec. Want to impose a rank cutoff, value cutoff, or change frequency?
Can I get the top 1000 URLs every second?
I can do that for 900 KB/sec.
Great, do it!
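The dialogue names three knobs: a rank cutoff (keep only the top k URLs), a value cutoff (keep only URLs whose count reaches a threshold), and a change of frequency. Below is a small sketch of the first two, plus a back-of-the-envelope bandwidth estimate. The per-entry size is an assumption. The slide's 900 KB/sec for 1000 URLs per second works out to roughly 900 bytes per entry, but the talk does not say how entries are encoded.

```python
import heapq
from typing import Dict

def rank_cutoff(counts: Dict[str, int], k: int) -> Dict[str, int]:
    """Keep only the k URLs with the highest counts."""
    return dict(heapq.nlargest(k, counts.items(), key=lambda kv: kv[1]))

def value_cutoff(counts: Dict[str, int], min_count: int) -> Dict[str, int]:
    """Keep only URLs whose count is at least min_count."""
    return {url: c for url, c in counts.items() if c >= min_count}

def bytes_per_second(entries_per_report: int, bytes_per_entry: int,
                     reports_per_second: float) -> float:
    """Rough cost of shipping summaries; bytes_per_entry is an assumed encoding size."""
    return entries_per_report * bytes_per_entry * reports_per_second

counts = {"/a": 50, "/b": 7, "/c": 20, "/d": 3}
print(rank_cutoff(counts, 2))             # {'/a': 50, '/c': 20}
print(value_cutoff(counts, 7))            # {'/a': 50, '/b': 7, '/c': 20}
print(bytes_per_second(1000, 900, 1.0))   # 900000.0, i.e. about 900 KB/sec
```

Lowering the reporting frequency, the third knob, corresponds to a smaller `reports_per_second`.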
10
Challenge: Varying Scarcity
[Figure: bandwidth needed vs. bandwidth available over time]
Can do. First aggregate over longer time periods, up to 30 seconds. Then only keep the top URLs.
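One way to read this answer is as an ordered degradation policy. First coarsen the time granularity, up to 30 seconds. Only if that is still too expensive, apply a rank cutoff. The sketch below picks the mildest setting that fits the available bandwidth. Several pieces are assumptions for illustration: the candidate granularities, the candidate cutoffs, the cost model (one report of all entries per time bucket), and the name `choose_degradation`.

```python
from typing import Optional, Tuple

GRANULARITIES_S = [1, 2, 5, 10, 30]       # assumed time buckets, up to 30 s
TOP_K_OPTIONS = [10_000, 1_000, 100, 10]  # assumed rank cutoffs, mildest first

def cost_bytes_per_s(entries: int, bytes_per_entry: int, granularity_s: int) -> float:
    # Simplifying assumption: one report of `entries` rows per time bucket.
    return entries * bytes_per_entry / granularity_s

def choose_degradation(n_urls: int, bytes_per_entry: int,
                       available_bytes_per_s: float) -> Optional[Tuple[int, Optional[int]]]:
    """Return (granularity_s, top_k or None) for the mildest setting that fits."""
    # Step 1: coarsen time alone, keeping every URL.
    for g in GRANULARITIES_S:
        if cost_bytes_per_s(n_urls, bytes_per_entry, g) <= available_bytes_per_s:
            return g, None
    # Step 2: at the coarsest granularity, keep only the top URLs.
    g = GRANULARITIES_S[-1]
    for k in TOP_K_OPTIONS:
        if k < n_urls and cost_bytes_per_s(k, bytes_per_entry, g) <= available_bytes_per_s:
            return g, k
    return None  # nothing fits; another policy is needed

print(choose_degradation(100_000, 100, 2_000_000))  # (5, None)
print(choose_degradation(100_000, 100, 100_000))    # (30, 10000)
```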
12
Data Processing Requirements
• Aggregatable
• Merge-able
• Reducible
[Figure: Data + Data = Merged Representation; Stored Data + Data = Update]
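Here is a sketch of the three requirements on the simplest summary, a table of counts keyed by URL:

- Aggregatable: a new observation updates stored data in place.
- Merge-able: two summaries combine into one of the same form.
- Reducible: a summary can be shrunk while staying the same type.

Reading "reducible" as "can be shrunk, e.g. by dropping small entries" is an interpretation; the slide does not spell it out.

```python
from collections import Counter

def update(stored: Counter, url: str, n: int = 1) -> None:
    """Aggregatable: fold a new observation into stored data in place."""
    stored[url] += n

def merge(a: Counter, b: Counter) -> Counter:
    """Merge-able: two summaries combine into a summary of the same form."""
    return a + b

def reduce_summary(s: Counter, min_count: int) -> Counter:
    """Reducible (one interpretation): drop entries below min_count to save space."""
    return Counter({k: v for k, v in s.items() if v >= min_count})

site1 = Counter({"/a": 3, "/b": 1})
site2 = Counter({"/a": 2, "/c": 4})
update(site1, "/b")
merged = merge(site1, site2)
print(merged)                     # Counter({'/a': 5, '/c': 4, '/b': 2})
print(reduce_summary(merged, 3))  # Counter({'/a': 5, '/c': 4})
```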
13
| | Raw byte strings (e.g., MapReduce) | Database tables |
|---|---|---|
| High-level API | X | √ |
| Merge + Aggregate | X | X |
| Predictable performance | √ | X |
| Arbitrary joins | X | √ |
14
The Data Cube Model
Counts by URL:
| URL | 12:00 | 12:01 | 12:02 |
|---|---|---|---|
| www.mysite.com | 3 | 5 | … |
| www.yoursite.com | 5 | 4 | … |
| www.hersite.com | 8 | 12 | … |
Roll-up of mysite.com by time from 12:00 to 12:01: 8 (3 + 5)
Roll-up of sites at time 12:00: 16 (3 + 5 + 8)
Cube: A multidimensional array, with one or more aggregates, indexed by a set of dimensions
Aggregation function used for:
• Updates
• Roll-ups
• Merging cubes
• Degrading cubes
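A tiny cube sketch reproduces the roll-ups on the slide. It is a mapping from (url, time) cells to counts, with sum as the aggregation function. The `Cube` class and its method names are hypothetical; only the cell values come from the slide. The elided 12:02 column is left out.

```python
from typing import Dict, Iterable, Optional, Tuple

Cell = Tuple[str, str]  # (url, minute)

class Cube:
    """Toy two-dimensional cube (url x time) with sum as the aggregation function."""

    def __init__(self) -> None:
        self.cells: Dict[Cell, int] = {}

    def update(self, url: str, minute: str, count: int) -> None:
        # The same aggregation function (sum) folds updates into an existing cell.
        self.cells[(url, minute)] = self.cells.get((url, minute), 0) + count

    def roll_up(self, urls: Optional[Iterable[str]] = None,
                minutes: Optional[Iterable[str]] = None) -> int:
        """Sum over the selected urls and minutes (None means all)."""
        url_set = set(urls) if urls is not None else None
        minute_set = set(minutes) if minutes is not None else None
        return sum(v for (u, m), v in self.cells.items()
                   if (url_set is None or u in url_set)
                   and (minute_set is None or m in minute_set))

cube = Cube()
for url, row in {"www.mysite.com": (3, 5),
                 "www.yoursite.com": (5, 4),
                 "www.hersite.com": (8, 12)}.items():
    for minute, count in zip(("12:00", "12:01"), row):
        cube.update(url, minute, count)

print(cube.roll_up(urls=["www.mysite.com"], minutes=["12:00", "12:01"]))  # 8
print(cube.roll_up(minutes=["12:00"]))                                    # 16
```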
15
Data Cube
| | Raw byte strings (e.g., MapReduce) | Database tables | Data Cube |
|---|---|---|---|
| High-level API | X | √ | √ |
| Merge + Aggregate | X | X | √ |
| Predictable performance | √ | X | √ |
| Arbitrary joins | X | √ | X |
16
A Vision for Wide-Area Analytics
Dataflow adapted to bandwidth
[Figure: two sites, each running dataflow operators, a local cube, and further dataflow operators, send data across a network bottleneck to dataflow operators, a merged cube, and further dataflow operators]
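Below is a minimal sketch of this vision, assuming each local cube is a mapping from (url, minute) to count. Each site may shrink its cube before it crosses the bottleneck. The receiver then merges the cubes with the same aggregation function (sum). The `summarize` step is a hypothetical rank cutoff; the talk leaves the choice of summarization to policy.

```python
from collections import Counter
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (url, minute)

def summarize(local_cube: Dict[Key, int], max_cells: int) -> Dict[Key, int]:
    """Hypothetical bandwidth-saving step: keep only the largest cells."""
    top = sorted(local_cube.items(), key=lambda kv: kv[1], reverse=True)[:max_cells]
    return dict(top)

def merge_cubes(cubes: List[Dict[Key, int]]) -> Dict[Key, int]:
    """Merge cubes cell by cell with sum, the same function used for updates."""
    merged: Counter = Counter()
    for cube in cubes:
        merged.update(cube)  # Counter.update adds counts for a mapping
    return dict(merged)

site_a = {("/a", "12:00"): 10, ("/b", "12:00"): 1, ("/c", "12:00"): 6}
site_b = {("/a", "12:00"): 4, ("/c", "12:00"): 2}
# Site A sits behind a tight link, so it sends only its two largest cells.
central = merge_cubes([summarize(site_a, 2), site_b])
print(central)  # {('/a', '12:00'): 14, ('/c', '12:00'): 8}
```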
17
Adaptivity
[Figure: dataflow operators, a local cube, and further dataflow operators, sending data across a network bottleneck]
18
Adaptivity
[Figure: dataflow operators, a local cube, and further dataflow operators produce a summarized cube that crosses a network bottleneck; label: "Feedback control"]
• Key ingredients:
  – Cube summarization as mechanism
  – User-defined policies
  – Feedback control
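The three ingredients can fit together roughly as follows. A user-defined policy lists summarization levels from most to least detailed, and a feedback loop moves along that list based on what the network is observed to carry. The sketch below shows one simple controller. It assumes the sender can observe how many bytes are queued awaiting transmission. The level names, the thresholds, and the queue signal itself are illustrative assumptions, not the system's actual design.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DegradationController:
    """Steps through a user-defined policy ladder using queue-size feedback."""
    levels: List[str]        # policy: most detailed level first
    high_water_bytes: int    # queue above this: degrade further
    low_water_bytes: int     # queue below this: try more detail
    index: int = 0           # current position in `levels`

    def observe(self, queued_bytes: int) -> str:
        if queued_bytes > self.high_water_bytes and self.index < len(self.levels) - 1:
            self.index += 1  # backlog growing: send less data
        elif queued_bytes < self.low_water_bytes and self.index > 0:
            self.index -= 1  # link has slack: restore detail
        return self.levels[self.index]

ctl = DegradationController(
    levels=["all URLs, 1 s", "all URLs, 30 s", "top 1000 URLs, 30 s"],
    high_water_bytes=1_000_000, low_water_bytes=100_000)
for q in [0, 5_000_000, 5_000_000, 500_000, 10_000]:
    print(q, "->", ctl.observe(q))
```

The gap between the two thresholds keeps the controller from flapping between levels when the queue hovers near a single cutoff.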
19
Backup Slides
20
Conclusions
• The hard problems in wide-area analysis:
  – Reasoning about bandwidth/data quality tradeoffs
  – Optimizing data quality under changing conditions
  – Jointly optimizing bandwidth and other resources
• We are building a system.
  – We call it JetStream. Stay tuned…
23
Bandwidth Costs do not Decline Smoothly
[TeleGeography's Global Bandwidth Research Service]
24
2012 Bandwidth Price Shifts
[Figure: chart of 2012 bandwidth price shifts; recoverable labels: "Frankfurt-London", "20%", "20%"]
[TeleGeography's Global Bandwidth Research Service]
25
Diurnal Load Makes Overprovisioning Expensive
• Leased lines waste capacity during off-peak
• Public internet gets congested during peak
29
Benefit: Iteration
Can iteratively pose different queries
[Figure: request streams at two sites, each feeding local aggregation and storage; label: "A revised query"]
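Each site keeps its aggregated data. So an analyst who finds the first answer insufficient can send a revised query, which is answered from what is already stored rather than from newly collected data. Below is a minimal sketch. The stored counts and both query functions are hypothetical, and a revised query can only be answered at the granularity that was stored.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical locally stored per-minute counts: (minute, url) -> count.
STORED: Dict[Tuple[str, str], int] = {
    ("12:00", "/img/logo.png"): 40, ("12:00", "/index.html"): 25,
    ("12:01", "/img/logo.png"): 35, ("12:01", "/api/search"): 60,
}

def counts_by_url(stored: Dict[Tuple[str, str], int]) -> Counter:
    """Original query: total requests per URL."""
    totals: Counter = Counter()
    for (_, url), c in stored.items():
        totals[url] += c
    return totals

def counts_by_minute_for_prefix(stored: Dict[Tuple[str, str], int], prefix: str) -> Counter:
    """Revised query, posed later: per-minute requests under a URL prefix."""
    totals: Counter = Counter()
    for (minute, url), c in stored.items():
        if url.startswith(prefix):
            totals[minute] += c
    return totals

print(counts_by_url(STORED).most_common(1))          # [('/img/logo.png', 75)]
print(counts_by_minute_for_prefix(STORED, "/img/"))  # Counter({'12:00': 40, '12:01': 35})
```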
30
Benefit: Adaptation
Can adapt data volume collected to available bandwidth
[Figure: request streams at two sites, each feeding local aggregation and storage; label: "Limited Bandwidth"]
31
Benefit: Adaptation
Can adapt data volume collected to available bandwidth
[Figure: request streams at two sites, each feeding local aggregation and storage; label: "Ample Bandwidth"]
32
A dataflow model for wide-area analytics
• Operator: Defines a data transformation on tuples. Can do input or output.
• Cube: Structured storage of data.
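Below is a minimal sketch of the two building blocks, assuming tuples are plain Python tuples and operators are chained generators. A source operator produces tuples (input), a transform operator rewrites them, and a cube stores them by key with an aggregation function. The names `source`, `extract_url`, and `TupleCube`, and the log-line format, are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator, Tuple

def source(lines: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Input operator: turn raw log lines into tuples."""
    for line in lines:
        yield tuple(line.split())

def extract_url(tuples: Iterable[Tuple[str, ...]]) -> Iterator[Tuple[str, int]]:
    """Transform operator: map (timestamp, method, url) tuples to (url, 1)."""
    for _ts, _method, url in tuples:
        yield (url, 1)

class TupleCube:
    """Structured storage: values keyed by the tuple's first field, combined with `agg`."""

    def __init__(self, agg: Callable[[int, int], int]) -> None:
        self.agg = agg
        self.cells: Dict[Hashable, int] = {}

    def ingest(self, tuples: Iterable[Tuple[Hashable, int]]) -> None:
        for key, value in tuples:
            self.cells[key] = self.agg(self.cells[key], value) if key in self.cells else value

log = ["1 GET /a", "2 GET /b", "3 POST /a"]
cube = TupleCube(agg=lambda x, y: x + y)
cube.ingest(extract_url(source(log)))
print(cube.cells)  # {'/a': 2, '/b': 1}
```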
33
Generated data ingested into local cubes
[Figure: two sites, each with labels "Source", "Processing", and "Cube"; processed data crosses a network bottleneck]
34
Processed Data
Processing