Hands on with Hyper-V Clustering Maintenance Mode & Cluster Aware Updating
The Power of Choice in Data-Aware Cluster …...The Power of Choice in ! Data-Aware Cluster...
Transcript of The Power of Choice in Data-Aware Cluster …...The Power of Choice in ! Data-Aware Cluster...
The Power of Choice in !Data-Aware Cluster Scheduling
Shivaram Venkataraman1 , Aurojit Panda1 Ganesh Ananthanarayanan2, Michael Franklin1, Ion Stoica1
1UC Berkeley, 2Microsoft Research
amplab
Trends: Big Data
2 [Kathy Yelick LBNL, VLDBJ 2012, Dhruba Borthakur]
Challenge&1:&Data&is&Big&Projected&Growth&
Increa
se&ove
r&201
0&
0$
10$
20$
30$
40$
50$
60$
2010$ 2011$ 2012$ 2013$ 2014$ 2015$
Moore's$Law$
Overall$Data$
Particle$Accel.$
DNA$Sequencers$
Data$Grows$faster$than$Moore’s$Law$[IDC$report,$Kathy$Yelick,$LBNL]$
Data grows faster than Moore’s Law!
Trends: Big Data
3 [Kathy Yelick LBNL, VLDBJ 2012, Dhruba Borthakur]
Projected&Growth&Increa
se&ove
r&201
0&
0$
10$
20$
30$
40$
50$
60$
2010$ 2011$ 2012$ 2013$ 2014$ 2015$
Moore's$Law$
Overall$Data$
Particle$Accel.$
DNA$Sequencers$
Facebook Hive cluster Last 4 years:
data growth 2500x ! queries/day 60x!
Microsoft Scope Cluster “The number of daily jobs has doubled every six months for the past two years.”
Trends: Low Latency
4
10 min
2004: MapReduce
batch job
2009: Hive
2010: Dremel
2012: In-memory
Spark
1 min
10s 2s
Applications
Machine learning algorithms stochastic gradient, coordinate descent
Approximate Query Processing blinkdb, presto, minitable
7
1
4
Rack
Available (N) = 2 Required (K) = 2
Time
Available Data Running
Busy
Existing
11
2
3
Unavailable Data
1
4
Rack
Available (N) = 4 Required (K) = 2
Time
Choice-Aware
2
3
Launched (M) = 3
13
Available Data Running
Busy
KMN Scheduler - How much can KMN improve locality - Propagate benefits across stages - Handling stragglers
14
Locality, K=100
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Prob
. of L
ocali
ty
Utilization
K/N=1.0 K/N=0.5 K/N=0.1 K – Number of blocks chosen N – Number of blocks available
18
KMN significantly improves locality
Bottleneck Link
R3
M1 M2 M4 M5
R2
M3
6
Core
2 4 2
R1
4 2
Bottleneck Link Link with Max. transfers
Cross Rack Data Skew
Minimum transfers Maximum transfers
22
=62= 3
Facebook Trace Cross Rack Data Skew
Minimum transfers
Maximum transfers
0 0.2 0.4 0.6 0.8
1
0 5 10 15 20 25 30
CDF
Cross Rack Data Skew
<50 tasks 50-150 tasks >150 tasks
23
Power of Choice Load balancing: balls and bins Insight: Run extra tasks (M > K) M1
M2
M4 M5
M3
M6 M7
Cross Rack Data Skew = 3
24
Power of Choice
Technique: Spread out choice of K tasks to reduce skew
M1
M2
M4 M5
M3
M6 M7
M = 7, K = 5 Cross Rack Data Skew = 2
25
Using KMN
// Create Spark RDD file = sc.textFile(“tpc-‐h.data”) // Select a 10% sample using KMN sample = file.blockSample(0.1) // RDD operations sample.map { li => (li.linestatus, li.quantity) }.collect()
27
Evaluation
Baseline: Use a pre-selected random sample Setup: 100 m2.4xlarge EC2 machines, 60GB RAM/mc
Facebook traces replay Long DAGs (Stochastic Gradient Descent) SQL queries from Conviva Reducer placement Varying Utilization
29
Facebook Overall
0 10 20 30 40 50
0-10
11-100
>100
Job Completion Time (s)
Job
Size
Baseline KMN-M/K=1.05
30
Cross Rack Skew
0 10 20 30
<=4
4-to-8
>8
Shuffle Stage Time (seconds)
Cros
s Ra
ck S
kew
Baseline KMN-M/K=1.0 KMN-M/K=1.05 KMN-M/K=1.1
31
How many extra tasks ?
32
0
0.2
0.4
0.6
0.8
1
0 10 20 Cross-Rack Skew
M/K=1.0 M/K=1.1 M/K=2.0
0
0.2
0.4
0.6
0.8
1
0 10 20 30 Cross Rack Skew
M/K=1.0 M/K=1.1 M/K=2.0
50 – 150 tasks > 150 tasks
KMN: How many stages ?
34 Gradient
Aggregate1
Aggregate2
Aggregate3
KMN Stages Time (s)
Gradient 15.27
KMN: How many stages ?
35 Gradient
Aggregate1
Aggregate2
Aggregate3
KMN Stages Time (s)
Gradient 15.27
Gradient + Agg1 12.72
KMN: How many stages ?
36 Gradient
Aggregate1
Aggregate2
Aggregate3
KMN Stages Time (s)
Gradient 15.27
Gradient + Agg1 12.72
Gradient + Agg2 11.79
KMN: How many stages ?
37 Gradient
Aggregate1
Aggregate2
Aggregate3
KMN Stages Time (s)
Gradient 15.27
Gradient + Agg1 12.72
Gradient + Agg2 11.79
Gradient + Agg3 12.09
Related Work Power of Choice
Power-of-Two choices [TPDS’01] Sparrow [SOSP’13]
Improving Cluster Scheduling
Quincy [SOSP’09] alsched [SOCC’12] Dolly [NSDI’13]
38