GoodFit: Multi-Resource Packing of Tasks with Dependencies
Uploaded by: hadoop-summit
Category: Technology
Transcript of GoodFit: Multi-Resource Packing of Tasks with Dependencies
![Page 1: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/1.jpg)
GoodFit: Multi-Resource Packing of Tasks with
Dependencies
![Page 2: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/2.jpg)
Cluster Scheduling for Jobs
Jobs: Tasks, Dependencies
E.g., BigData (Hive, SCOPE, Spark)
E.g., CloudBuild
Machines, file-system, network
Cluster Scheduler matches tasks to resources
Goals
• High cluster utilization
• Fast job completion time
• Predictable perf./fairness
• Need not keep resource “buffers”
• More dynamic than VM placement (tasks last seconds)
• Aggregate properties are important (e.g., all tasks in a job should finish)
![Page 3: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/3.jpg)
Problem: Need careful multi-resource planning

| Issue | Current Schedulers | Packer Scheduler |
| --- | --- | --- |
| Fragmentation | 2 tasks/T | 3 tasks/T (+50%) |
| Over-allocation of net/disk | 2 tasks/2T | 2 tasks/T (+100%) |
![Page 4: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/4.jpg)
Problem 2: … worse with dependencies
[Figure: example DAG whose nodes are labeled {duration, resource demand}, with labels such as {Tt, r}, {t, 1-r}, {(T-2)t, r}, {(T-4)t, r}. Two resource-vs-time schedules are shown: the critical-path schedule (Crit. Path) takes ~nTt, the best schedule takes ~Tt.]
Critical path scheduling is n times off since it ignores resource demands
Packers can be d times off since they ignore future work [d resources]
![Page 5: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/5.jpg)
Typical job scheduler infrastructure
[Figure: each DAG has an AM with a Schedule Constructor; NMs send node heartbeats to the RM, which replies with task assignments.]
Additions: + packing, + bounded unfairness, + merge schedules, + overbook
![Page 6: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/6.jpg)
Main ideas in multi-resource packing
Task packing ~ Multi-dimensional bin packing, but
* Very hard problem (“APX-hard”)
* Available heuristics do not directly apply [task demands change with placement]
A packing heuristic
* Task’s resource demand vector: D; machine resource vector: R
* Fit: $D \le R$
* Alignment score: $A = D \cdot R$
A job completion time heuristic: shortest remaining work, P
* $P$ = (remaining # tasks) × (tasks' avg. duration) × (tasks' avg. resource demand)
Trade-offs: Packing Efficiency vs. Job Completion Time vs. Fairness
* Packing efficiency alone: delays job completion
* Job completion time alone: loses packing efficiency
* Fairness alone: loses both
We show that: {best “perf” | bounded unfairness} ~ best “perf”
![Page 7: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/7.jpg)
Main ideas in packing dependent tasks
1. Identify troublesome tasks (meat) and place them first
2. Systematically place other tasks without deadlocks
3. At runtime, use a precedence order from the computed schedule + heuristics to (a) overbook, (b) previous slide.
4. Better lower bounds for DAG completion time
[Figure: resource vs. time. The meat tasks (M) span from "meat begin" to "meat end"; the remaining tasks are grouped into parents (P), children (C), and O.]
![Page 8: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/8.jpg)
Results - 1
Packing
Packing + Deps.
Lower bound
[20K DAGs from Cosmos]
![Page 9: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/9.jpg)
Results - 2
Tez + Packing
Tez + Pack + Deps
[200 jobs from TPC-DS, 200 server cluster]
![Page 10: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/10.jpg)
Bundling
![Page 11: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/11.jpg)
Temporal relaxation of fairness
![Page 12: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/12.jpg)
[Figure: two identical jobs, each with a Map (disk) phase and a Reduce (netw.) phase. Panel "Instantaneous fairness" (fair share among two identical jobs): each phase runs at 50%, time axis marked 2T and 4T. Panel "Best": each phase runs at 100%, time axis marked T, 2T, 3T.]
Problem: Instantaneous fairness can be up to dx worse on makespan (d resources)
1) Temporal relaxation of fairness: a job will finish within (1+f)x the time it takes given strict share, where f is the fairness slack
2) Optimal trade-off with performance: x fairness costs x on make-span
3) A simple (offline) algorithm that achieves the above trade-off

| Fairness slack | Perf loss |
| --- | --- |
| 0 (perfectly fair) | 2x |
| 1 (<2x longer) | 1.1x |
| 2 (<3x longer) | 1.07x |
![Page 13: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/13.jpg)
Bare metal
VM Allocation
Data-parallel Jobs
Job: Tasks, Dependencies
E.g., HDInsight, AzureBatch
E.g., BigData (Yarn, Cosmos, Spark)
E.g., CloudBuild
3500 servers, 3500 users, >20M targets/day
~100K servers (40K at Yahoo)
>50K servers, >2EB stored, >6K devs
![Page 14: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/14.jpg)
1) Job scheduling has specific aspects
• Tasks are short-lived (10s of seconds)
• Have peculiar shaped demands
• Composites are important (job needs all tasks to finish)
• OK to kill and restart tasks
• Locality
2) will speed-up the average job (and reduce resource cost)
3) research + practice
![Page 15: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/15.jpg)
Resource aware scheduling improves SLOs and Return/$
![Page 16: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/16.jpg)
Cluster Scheduling for Jobs
Jobs: Tasks, Dependencies
E.g., HDInsight, AzureBatch
E.g., BigData (Hive, SCOPE, Spark)
E.g., CloudBuild
Machines, file-system, network
Cluster Scheduler matches tasks to resources
Goals
• High cluster utilization
• Fast job completion time
• Predictable perf./fairness
• Efficient (milliseconds…)
![Page 17: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/17.jpg)
![Page 18: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/18.jpg)
![Page 19: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/19.jpg)
![Page 20: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/20.jpg)
Main ideas in packing dependent tasks
1. Identify troublesome tasks (T) and place them first
2. Systematically place other tasks without dead-ends
3. At runtime, enforce computed schedule + heuristics to (a) overbook, (b) previous slide.
4. Better lower bounds for DAG completion time
[Figure: resource vs. time. The troublesome tasks (T) span from "Trouble begin" to "Trouble end"; the remaining tasks are grouped into parents (P), children (C), and O.]
![Page 21: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/21.jpg)
Results - 1
Packing
Packing + Deps.
Lower bound
[20K DAGs from Cosmos]
[Figure annotations: 2X, 1.5X]
![Page 22: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/22.jpg)
Results - 2
Tez + Packing
Tez + Pack + Deps
[200 jobs from TPC-DS, 200 server cluster]
![Page 23: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/23.jpg)
Tetris
Multi-Resource Packing for Cluster Schedulers
![Page 24: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/24.jpg)
Performance of cluster schedulers
We observe that:
• Resources are fragmented, i.e., machines are running below capacity
• Even at 100% usage, goodput is much smaller due to over-allocation
• Even Pareto-efficient multi-resource fair schemes result in much lower performance
Tetris: up to 40% improvement in makespan (time to finish a set of jobs) and job completion time with near-perfect fairness
![Page 25: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/25.jpg)
Findings from Bing and Facebook traces analysis
Diversity in multi-resource requirements:
• Tasks need varying amounts of each resource
• Demands for resources are weakly correlated
This matters because no single bottleneck resource:
• Enough cross-rack network bandwidth to use all CPU cores
• Multiple resources become tight
Upper bounding potential gains:
• reduce makespan (time to finish a set of jobs) by up to 49%
• reduce avg. job compl. time by up to 46%
![Page 26: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/26.jpg)
Why so bad #1
Production schedulers neither pack tasks nor consider all their relevant resource demands
#1 Resource Fragmentation
#2 Over-allocation
![Page 27: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/27.jpg)
Resource Fragmentation (RF)
Setup: Machine A and Machine B, 4 GB memory each; tasks T1: 2 GB, T2: 2 GB, T3: 4 GB
Current Schedulers
• Resources are allocated in terms of slots
• Free resources cannot be assigned to tasks (T3 has to wait)
• Avg. task compl. time = 1.33 t
“Packer” Scheduler
• Avg. task compl. time = 1 t
RF increases with the number of resources being allocated!
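The two averages on this slide can be reproduced with a small round-based simulation. This is only a sketch under stated assumptions: every task runs for one time unit t, `spread` (place on the machine with the most free memory) stands in for a current scheduler, and `pack` (best fit) stands in for a packer. None of these function names come from the talk.

```python
def simulate(tasks, machines, choose):
    """Run unit-length tasks in rounds and return the average completion time."""
    pending = list(tasks)              # (name, memory_gb), in arrival order
    finish = {}
    now = 0
    while pending:
        now += 1
        free = dict(machines)          # every machine is empty at the start of a round
        waiting = []
        for name, mem in pending:
            fitting = [m for m in free if free[m] >= mem]
            if fitting:
                free[choose(fitting, free)] -= mem
                finish[name] = now     # a task started in round k completes at time k
            else:
                waiting.append((name, mem))
        if len(waiting) == len(pending):
            raise ValueError("a task does not fit on any empty machine")
        pending = waiting
    return sum(finish.values()) / len(finish)


def spread(fitting, free):
    """Hypothetical stand-in for current schedulers: most free memory wins."""
    return max(fitting, key=lambda m: free[m])


def pack(fitting, free):
    """Hypothetical stand-in for a packer: least free memory that still fits wins."""
    return min(fitting, key=lambda m: free[m])


machines = {"A": 4, "B": 4}                      # GB of memory per machine
tasks = [("T1", 2), ("T2", 2), ("T3", 4)]        # GB of memory per task

print("spread:", simulate(tasks, machines, spread))   # T3 waits one round
print("pack:  ", simulate(tasks, machines, pack))     # all three run in round 1
```

With spreading, T1 and T2 land on different machines and leave 2 GB free on each, so the 4 GB task cannot start until the next round; with best fit, T1 and T2 share Machine A and T3 takes Machine B.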
![Page 28: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/28.jpg)
Over-Allocation
Setup: Machine A with 4 GB memory and 20 MB/s network; tasks T1: 2 GB memory, 20 MB/s network; T2: 2 GB memory, 20 MB/s network; T3: 2 GB memory
Current Schedulers
• Not all of the tasks' resource demands are explicitly allocated
• Disk and network are over-allocated
• Avg. task compl. time = 2.33 t
“Packer” Scheduler
• Avg. task compl. time = 1.33 t
![Page 29: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/29.jpg)
Why so bad #2
Multi-resource fairness schemes do not help either
Work conserving != no fragmentation, over-allocation
• Treat cluster as a big bag of resources: hides the impact of resource fragmentation
• Assume job has a fixed resource profile: different tasks in the same job have different demands
The schedule impacts job’s current resource profiles
• Can schedule to create complementary profiles
Pareto efficient (no job can increase share without decreasing the share of another) != performant
• Packer Scheduler vs. DRF: Avg. job compl. time: 50%; makespan: 33%
![Page 30: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/30.jpg)
Competing objectives
Cluster efficiency vs. Job completion time vs. Fairness
Current Schedulers
1. Resource Fragmentation
2. Over-Allocation
3. Fair allocations sacrifice performance
![Page 31: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/31.jpg)
Tetris #1
Pack tasks along multiple resources to improve cluster efficiency and reduce makespan
![Page 32: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/32.jpg)
Theory
Multi-Resource Packing of Tasks is similar to Multi-Dimensional Bin Packing
• Balls could be tasks
• Bin could be machine, time
• APX-Hard (APX-Hard is a strict subset of NP-hard)
Avoiding fragmentation looks like:
• Tight bin packing
• Reduces # of bins used -> reduce makespan
Practice
Existing heuristics do not directly apply here:
• Assume balls of a fixed size, but task demands vary with time / machine placed and are elastic
• Assume balls are known a priori, but the scheduler must cope with online arrival of jobs, dependencies, cluster activity
![Page 33: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/33.jpg)
Tetris #1: Packing heuristic
A packing heuristic
• Task's resource demand vector, machine resource vector
• Fit: task's resource demand vector < machine resource vector
• Alignment score (A)
“A” works because:
1. Check for fit ensures no over-allocation (Over-Allocation)
2. Bigger balls get bigger scores
3. Abundant resources used first (Resource Fragmentation)
4. Can spread load across machines
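A minimal sketch of the fit check and alignment score, assuming the score is the dot product of the task's demand vector and the machine's free-resource vector, with every resource normalized to the range 0 to 1. The machine names, the three-resource layout, and the helper names are illustrative assumptions, not part of the talk.

```python
def fits(demand, free):
    """Fit check: the task's demand must not exceed what is free in any dimension."""
    return all(d <= f for d, f in zip(demand, free))


def alignment(demand, free):
    """Alignment score as a dot product of the demand and free-resource vectors."""
    return sum(d * f for d, f in zip(demand, free))


def pick_machine(demand, machines):
    """Return the fitting machine with the highest alignment score, or None."""
    scores = {name: alignment(demand, free)
              for name, free in machines.items()
              if fits(demand, free)}
    return max(scores, key=scores.get) if scores else None


# Free resources per machine as (cpu, memory, network), normalized.
machines = {
    "m1": (0.8, 0.6, 0.1),   # plenty of CPU, almost no network left
    "m2": (0.3, 0.2, 0.9),   # network is abundant
    "m3": (0.5, 0.5, 0.5),
}
network_heavy_task = (0.2, 0.1, 0.5)

print(pick_machine(network_heavy_task, machines))
```

The network-heavy task does not fit on `m1`, and it scores higher on `m2` than on `m3` because `m2` has the most of the resource the task needs most, which is the "abundant resources used first" property.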
![Page 34: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/34.jpg)
Tetris #2
Faster average job completion time
![Page 35: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/35.jpg)
Tetris, challenge #2: Job Completion Time Heuristic
Shortest Remaining Time First (SRTF) schedules jobs in ascending order of their remaining time
(SRTF: M. Harchol-Balter et al., Connection Scheduling in Web Servers [USITS’99])
Q: What is the shortest “remaining time”?
“remaining work” = remaining # tasks & tasks' durations & tasks' resource demands
A job completion time heuristic
• Gives a score P to every job
• Extended SRTF to incorporate multiple resources
![Page 36: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/36.jpg)
Tetris, challenge #2: Job Completion Time Heuristic
Packing efficiency vs. completion time?
A: delays job completion time
P: loss in packing efficiency
Combine A and P scores!
1: among J runnable jobs
2: score(j) = A(t, R) + P(j)
3: max task t in j, demand(t) ≤ R (resources free)
4: pick j*, t* = argmax score(j)
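The four steps can be sketched as follows. Several pieces are assumptions made for illustration: P(j) is modeled as the negated remaining work times a `weight` (so that less remaining work gives a higher score), remaining work is (# tasks) × (avg. duration) × (avg. total demand), and the `Task` and `Job` classes are hypothetical. The slides do not say how the two scores are normalized against each other.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    demand: tuple       # normalized per-resource demand
    duration: float


@dataclass
class Job:
    name: str
    tasks: list         # tasks that still have to be scheduled


def alignment(demand, free):
    return sum(d * f for d, f in zip(demand, free))


def remaining_work(job):
    """(# remaining tasks) x (avg. duration) x (avg. total resource demand)."""
    n = len(job.tasks)
    avg_duration = sum(t.duration for t in job.tasks) / n
    avg_demand = sum(sum(t.demand) for t in job.tasks) / n
    return n * avg_duration * avg_demand


def pick(jobs, free, weight=0.1):
    """Return (score, job name, task name) with the best A + P, or None."""
    best = None
    for job in jobs:
        if not job.tasks:
            continue
        p = -weight * remaining_work(job)          # assumed form of P(j)
        for task in job.tasks:
            if all(d <= f for d, f in zip(task.demand, free)):   # demand(t) <= R
                score = alignment(task.demand, free) + p
                if best is None or score > best[0]:
                    best = (score, job.name, task.name)
    return best


free = (0.6, 0.6)
jobs = [
    Job("big", [Task(f"b{i}", (0.5, 0.5), 1.0) for i in range(10)]),
    Job("small", [Task("s0", (0.2, 0.2), 1.0)]),
]
print(pick(jobs, free))                # remaining work tips the choice to "small"
print(pick(jobs, free, weight=0.0))    # alignment alone prefers the bigger task
```

Setting `weight` to zero reduces the rule to pure packing, which illustrates the trade-off named on the slide: alignment alone favors the large task and delays the nearly finished job.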
![Page 37: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/37.jpg)
Tetris #3
Achieve performance and fairness
![Page 38: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/38.jpg)
Tetris #3: Fairness Heuristic
A says: “task i should go here to improve packing efficiency”
P says: “schedule job j next to improve job completion time”
Fairness says: “this set of jobs should be scheduled next”
Feasible solution which typically can satisfy all of them
Performance and fairness do not mix well in general
But… we can get “perfect fairness” and much better performance
![Page 39: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/39.jpg)
Tetris #3: Fairness Heuristic
Fairness is not a tight constraint
• Long-term fairness, not short-term fairness
• Lose a bit of fairness for a lot of gains in performance
Heuristic: Fairness Knob, F ∈ [0, 1)
• Pick the best-for-perf. task from among 1-F fraction of jobs furthest from fair share
• F = 0: most efficient scheduling
• F → 1: close to perfect fairness
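A sketch of how the fairness knob could restrict the candidate set. The `deficits` map (how far each job is below its fair share), the `perf_score` map (for example a combined packing and remaining-work score), and the rounding rule that always keeps at least one job are assumptions introduced here, not details given in the talk.

```python
import math


def eligible_jobs(deficits, fairness_knob):
    """Keep the (1 - F) fraction of jobs that are furthest below their fair share."""
    if not 0 <= fairness_knob < 1:
        raise ValueError("F must be in [0, 1)")
    ranked = sorted(deficits, key=deficits.get, reverse=True)
    keep = max(1, math.ceil((1 - fairness_knob) * len(ranked)))
    return ranked[:keep]


def pick_job(deficits, perf_score, fairness_knob):
    """Best-for-performance job among the eligible ones."""
    return max(eligible_jobs(deficits, fairness_knob), key=perf_score.get)


deficits = {"a": 0.30, "b": 0.10, "c": -0.05, "d": 0.20}    # larger = more starved
perf_score = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.4}       # larger = better for perf.

for knob in (0.0, 0.5, 0.99):
    print(knob, eligible_jobs(deficits, knob), pick_job(deficits, perf_score, knob))
```

At F = 0 every job is a candidate and the choice is purely performance-driven; as F approaches 1 only the most starved job remains, which is the behavior the two bullet points describe.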
![Page 40: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/40.jpg)
Putting it all together
We saw:
• Packing efficiency
• Prefer small remaining work
• Fairness knob
Other things in the paper:
• Estimate task demands
• Deal with inaccuracies, barriers
• Ingestion / evacuation
Yarn architecture; changes to add Tetris (shown in orange):
• Job Manager: multi-resource asks; barrier hint
• Node Manager: track resource usage; enforce allocations
• Cluster-wide Resource Manager: new logic to match tasks to machines (+packing, +SRTF, +fairness)
• Messages: asks, offers, allocations, resource availability reports
![Page 41: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/41.jpg)
Evaluation
• Pluggable scheduler in Yarn 2.4
• 250 machine cluster deployment
• Replay Bing and Facebook traces
![Page 42: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/42.jpg)
Efficiency

| Tetris vs. | Makespan | Avg. Job Compl. Time |
| --- | --- | --- |
| DRF | 28 % | 35 % |
| Capacity Scheduler | 29 % | 30 % |

Gains from
• avoiding fragmentation
• avoiding over-allocation
[Figures: utilization (%) vs. time (s) for CPU, Mem, In, St, one plot for Tetris and one for Capacity Scheduler. Annotations: "Over-allocation"; "Lower value => higher resource fragmentation".]
![Page 43: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/43.jpg)
Fairness
Fairness Knob quantifies the extent to which Tetris adheres to fair allocation

| Setting | Makespan | Job Compl. Time | Avg. Slowdown [over impacted jobs] |
| --- | --- | --- | --- |
| No Fairness, F = 0 | 50 % | 40 % | 25 % |
| F = 0.25 | 25 % | 35 % | 5 % |
| Full Fairness, F → 1 | 10 % | 23 % | 2 % |
![Page 44: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/44.jpg)
Tetris
• Pack efficiently along multiple resources
• Prefer jobs with less “remaining work”
• Incorporate fairness
combine heuristics that improve packing efficiency with those that lower average job completion time
achieving desired amounts of fairness can coexist with improving cluster performance
implemented inside YARN; trace-driven simulations and deployment show encouraging initial results
We are working towards a Yarn check-in
http://research.microsoft.com/en-us/UM/redmond/projects/tetris/
![Page 45: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/45.jpg)
Backup slides
![Page 46: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/46.jpg)
Estimating resource requirements
Estimating Resource Demands
Peak demand estimates from:
o finished tasks in the same phase
o collecting statistics from recurring jobs
o inputs size/location of tasks
Placement impacts network/disk requirements
Under-utilization (peak usage demand estimates)
Resource Tracker
o report unused resources
o aware of other cluster activities: ingestion and evacuation
[Figure: Machine1 - In Network; MBytes/sec (axis 0, 512, 1024, with 850 marked) vs. time (sec); series: In Network Used, In Network Free.]
![Page 47: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/47.jpg)
Packer Scheduler vs. DRF
Dominant Resource Fairness (DRF) computes the dominant share (DS) of every user and seeks to maximize the minimum DS across all users
Cluster: [18 Cores, 36 GB Memory]
Job: [Task Prof.], # tasks
A: [1 Core, 2 GB], 18
B: [3 Cores, 1 GB], 6
C: [3 Cores, 1 GB], 6
DRF allocation, with $q_A$, $q_B$, $q_C$ the number of tasks of each job running at a time:
$\max\ (q_A, q_B, q_C)$ (Maximize allocations)
$1 q_A + 3 q_B + 3 q_C \le 18$ (CPU constraint)
$2 q_A + 1 q_B + 1 q_C \le 36$ (Memory constraint)
$DS = \tfrac{1}{3}$

DRF Scheduler

| Time | A | B | C | Resources used |
| --- | --- | --- | --- | --- |
| t | 6 tasks | 2 tasks | 2 tasks | 18 cores, 16 GB |
| 2t | 6 tasks | 2 tasks | 2 tasks | 18 cores, 16 GB |
| 3t | 6 tasks | 2 tasks | 2 tasks | 18 cores, 16 GB |

Durations: A: 3t, B: 3t, C: 3t

Packer Scheduler

| Time | A | B | C | Resources used |
| --- | --- | --- | --- | --- |
| t | 18 tasks | 0 tasks | 0 tasks | 18 cores, 36 GB |
| 2t | 0 tasks | 6 tasks | 0 tasks | 18 cores, 6 GB |
| 3t | 0 tasks | 0 tasks | 6 tasks | 18 cores, 6 GB |

Durations: A: t, B: 2t, C: 3t
33% improvement
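The arithmetic behind this comparison can be checked directly. The sketch below assumes unit-length tasks and encodes the two schedules as one dictionary per time step; it verifies that each step fits the cluster and computes per-job completion times. The variable and function names are illustrative.

```python
CLUSTER = (18, 36)                                        # cores, GB of memory
PROFILE = {"A": (1, 2), "B": (3, 1), "C": (3, 1)}         # per-task (cores, GB)
TOTAL = {"A": 18, "B": 6, "C": 6}                         # tasks per job

# One entry per time step: how many tasks of each job run in that step.
drf = [{"A": 6, "B": 2, "C": 2}] * 3
packer = [{"A": 18}, {"B": 6}, {"C": 6}]


def usage(step):
    """Cores and memory used by one time step of a schedule."""
    cores = sum(n * PROFILE[job][0] for job, n in step.items())
    mem = sum(n * PROFILE[job][1] for job, n in step.items())
    return cores, mem


def completion_times(schedule):
    """Time step at which each job has run all of its tasks."""
    done = dict.fromkeys(TOTAL, 0)
    finish = {}
    for t, step in enumerate(schedule, start=1):
        cores, mem = usage(step)
        if cores > CLUSTER[0] or mem > CLUSTER[1]:
            raise ValueError(f"step {t} exceeds cluster capacity")
        for job, n in step.items():
            done[job] += n
            if done[job] == TOTAL[job] and job not in finish:
                finish[job] = t
    return finish


def average(finish):
    return sum(finish.values()) / len(finish)


drf_avg = average(completion_times(drf))
packer_avg = average(completion_times(packer))
print(completion_times(drf), drf_avg)
print(completion_times(packer), packer_avg)
print("improvement:", 1 - packer_avg / drf_avg)
```

Both schedules finish the whole workload at 3t, so the gain in this example is in average job completion time (3t under DRF versus 2t under the packer), not in makespan.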
![Page 48: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/48.jpg)
Packing efficiency does not achieve everything
Achieving packing efficiency does not necessarily improve job completion time
Machine 1, 2: [2 Cores, 4 GB]
Job: [Task Prof.], # tasks
A: [2 Cores, 3 GB], 6
B: [1 Core, 2 GB], 2

Pack

| Time | Job schedule | Resources used |
| --- | --- | --- |
| t | 2 tasks | 4 cores, 6 GB |
| 2t | 2 tasks | 4 cores, 6 GB |
| 3t | 2 tasks | 4 cores, 6 GB |
| 4t | 2 tasks | 2 cores, 4 GB |

Durations: A: 3t, B: 4t

No Pack

| Time | Job schedule | Resources used |
| --- | --- | --- |
| t | 2 tasks | 2 cores, 4 GB |
| 2t | 2 tasks | 4 cores, 6 GB |
| 3t | 2 tasks | 4 cores, 6 GB |
| 4t | 2 tasks | 4 cores, 6 GB |

Durations: A: 4t, B: t
29% improvement
![Page 49: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/49.jpg)
Ingestion / evacuation
Other cluster activities which produce background traffic
ingestion = storing incoming data for later analytics
• e.g. some clusters report volumes of up to 10 TB per hour
evacuation = data evacuated and re-replicated before maintenance operations
• e.g. rack decommission for machine re-imaging
Resource Tracker reports are used by Tetris to avoid contention between its tasks and these activities
![Page 50: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/50.jpg)
Workload analysis
![Page 51: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/51.jpg)
Alternative Packing Heuristics
![Page 52: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/52.jpg)
Fairness vs. Efficiency
![Page 53: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/53.jpg)
Fairness vs. Efficiency
![Page 54: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/54.jpg)
Virtual Machine Packing != Tetris
Virtual Machine Packing
• Consolidating VMs, with multi-dimensional resource requirements, on to the fewest number of servers
But focus on different challenges and not task packing:
• balance load across servers
• ensure VM availability in spite of failures
• allow for quick software and hardware updates
NO corresponding entity to a job and hence job completion time is inexpressible
Explicit resource requirements (e.g. small VM) make VM packing simpler
![Page 55: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/55.jpg)
Barrier knob, b ∈ [0, 1)
Tetris gives preference for last tasks in a stage
Offer resources to tasks in a stage preceding a barrier, where b fraction of tasks have finished
b = 1: no tasks preferentially treated
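One way to read the barrier knob is as a threshold on the fraction of finished tasks in a stage. The sketch below assumes exactly that interpretation (preference starts once at least a fraction b of the stage has finished); the function name and arguments are hypothetical.

```python
def stage_gets_preference(finished, total, b):
    """True if the remaining tasks of a pre-barrier stage should be preferred.

    Assumed rule: prefer the stage once at least a fraction b of its tasks
    have finished; b = 1 disables the preference entirely.
    """
    if total <= 0 or b >= 1:
        return False
    return finished / total >= b


print(stage_gets_preference(9, 10, 0.8))    # only stragglers left: prefer them
print(stage_gets_preference(5, 10, 0.8))    # stage is only half done: no preference
print(stage_gets_preference(10, 10, 1.0))   # b = 1: never prefer
```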
![Page 56: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/56.jpg)
Starvation Prevention
Could it take a long time to accommodate large tasks?
But…
1. most tasks have demands within one order of magnitude of one another
2. machines report resource availability to the scheduler periodically; the scheduler learns about all the resources freed up by tasks that finish in the preceding period together => can do reservation for large tasks
![Page 57: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/57.jpg)
Cluster load vs. Tetris performance
![Page 58: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/58.jpg)
Packing and Dependency-aware Scheduling for Data-Parallel Clusters
Graphene
![Page 59: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/59.jpg)
Performance of cluster schedulers
We observe that:
• Typically cluster schedulers do dependency-aware scheduling OR multi-resource packing
• None of the existing solutions are close to optimal for more than 50% of the production jobs
Graphene: > 30% improvements in makespan (time to finish a set of jobs) and job completion time for more than 50% of the jobs
![Page 60: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/60.jpg)
Findings from Bing traces analysis
Job structures have evolved into complex DAGs of tasks
Median job DAG has:
• depth 7
• 103 tasks
A good cluster scheduler should be aware of dependencies
![Page 61: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/61.jpg)
Findings from Bing traces analysis
Applications have (very) diverse resource needs: CPU, Memory, Network and Disk
• High coefficient of variation (~1) for many resources
• Demands for resources are weakly correlated
This matters because no single bottleneck resource:
• Enough cross-rack network bandwidth to use all CPU cores
• Multiple resources become tight
A good cluster scheduler should pack resources
![Page 62: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/62.jpg)
Why so bad
Production schedulers DON’T pack tasks AND/OR consider dependencies
![Page 63: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/63.jpg)
Dependency-aware: Critical Path Scheduling (CPSched), Breadth First Search (BFS)
• Consider the DAG structure during the schedule
• Do not account for tasks' resource demands, OR, if so, they assume tasks have homogeneous demands
Packing: Tetris
• Handle tasks with multiple resource requirements
• Ignore dependencies
• Takes local greedy choices
Any scheduler that is not packing is up to n x OPTIMAL (n = number of tasks)
Any scheduler that ignores dependencies is d x OPTIMAL (d = number of resource dimensions)
![Page 64: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/64.jpg)
Where does the “work” lie in a DAG?
“Work”: stages in a DAG where the largest amount of resources × time is spent
Large DAGs that are neither a bunch of unrelated stages nor a chain of stages:
• > 40% of the DAGs have most of the “work” on the critical path: CPSched performs well
• > 30% of the DAGs have most of the “work” such that packers perform well
For ~50% of the DAGs, neither packers nor criticality-based schedulers may perform well
![Page 65: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/65.jpg)
Graphene
Pack tasks along multiple resources while considering task dependencies
• State-of-the-art techniques are suboptimal
• Key ideas in Graphene
• Conclusion
![Page 66: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/66.jpg)
State-of-the-art scheduling techniques are suboptimal
CPSched / Tetris: 3 X Optimal
[Figure: example DAG with tasks t0 to t5, each labeled duration{rsrc.1, rsrc.2}; the labels shown are 1{.7, .31}, .01{.95, .01}, .01{.1, .7}, .96{.2, .68}, .98{.1, .01}, and .01{.01, .01}. Total capacity in any dimension = 1. Three schedules are drawn on a time axis marked t, 2t, 3t: CPSched (time ~3T), Tetris (time ~3T), and Optimal (time ~T).]
Key insights: t0, t2, t5 are troublesome tasks; schedule them as soon as possible
![Page 67: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/67.jpg)
Graphene #1
Schedule construction: identify troublesome tasks and place them accordingly on a virtual resource-time space.
![Page 68: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/68.jpg)
Schedule Construction
[Figure: two resources-vs-time plots showing the troublesome tasks (T) placed first, with the P, C, and O sets overlaid around them.]
• Identify tasks that can lead to a poor schedule (troublesome tasks), T: more likely to be on the critical path, more difficult to pack
• Break the other tasks into P, C, O sets based on their relationship with tasks from T
• Place tasks in T on a virtual time space; overlay the others to fill any resultant holes in this space
Nearly optimal for over three quarters of our analyzed production DAGs
![Page 69: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/69.jpg)
Graphene #2
Online component: enforces the desired schedule of the various DAGs.
![Page 70: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/70.jpg)
Online Scheduling
[Figure: each DAG passes through Schedule Construction to produce a preference order; the runtime component in the Resource Manager merges the schedules and turns node heartbeats into task assignments]
Goals: job completion time, makespan, being fair
Prefer jobs with less remaining work
Enforces priority ordering
Local placement
Multi-resource packing
Judicious overbooking of malleable resources
Deficit counters to bound unfairness
Enables implementation of different fairness schemes
- bound unfairness
- packing + overbooking
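One way to picture how deficit counters can bound unfairness while still preferring jobs with less remaining work is the sketch below. It is a generic deficit-counter scheme written for illustration, not Graphene's implementation: the `Job` fields, the `bound` parameter, and the update rule (every job accrues its fair share per allocation, the served job pays 1) are all assumptions. For simplicity, `remaining_work` is not decremented as jobs are served.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    remaining_work: float  # hypothetical estimate of work left
    fair_share: float      # fraction of allocations the job is entitled to
    deficit: float = 0.0   # how far the job has fallen behind its fair share


def pick_job(jobs: List[Job], bound: float) -> Job:
    """Choose the job to serve next, then update every deficit counter."""
    most_starved = max(jobs, key=lambda j: j.deficit)
    if most_starved.deficit >= bound:
        # Fairness takes over once a job has fallen too far behind.
        chosen = most_starved
    else:
        # Otherwise follow the performance preference: least remaining work.
        chosen = min(jobs, key=lambda j: j.remaining_work)
    for job in jobs:
        job.deficit += job.fair_share
    chosen.deficit -= 1.0
    return chosen


jobs = [Job("short", remaining_work=1.0, fair_share=0.5),
        Job("long", remaining_work=10.0, fair_share=0.5)]
order = [pick_job(jobs, bound=2.0).name for _ in range(7)]
print(order)
```

A larger `bound` lets the scheduler follow its preferred order for longer before fairness intervenes; a bound of zero degenerates to serving whichever job is furthest behind.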
![Page 71: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/71.jpg)
Evaluation
Implemented in YARN and Tez
250-machine cluster deployment
Replay Bing traces and TPC-DS / TPC-H workloads
![Page 72: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/72.jpg)
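Graphene vs. Tetris: makespan 29%, avg. job completion time 27%
Graphene vs. Critical Path: makespan 31%, avg. job completion time 33%
Graphene vs. BFS: makespan 23%, avg. job completion time 24%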
Gains from: a view of the entire DAG; placing the troublesome tasks first
Efficiency: more compact schedule; better packing; overbooking
![Page 73: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/73.jpg)
Graphene combines various mechanisms to improve packing efficiency and consider task dependencies:
constructs a good schedule by placing tasks on a virtual resource-time space
uses online heuristics that softly enforce the desired schedules
Implemented inside YARN and Tez; trace-driven simulations and deployment show encouraging initial results
![Page 74: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/74.jpg)
[Figure: running tasks, Graphene vs. BFS]
![Page 75: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/75.jpg)
![Page 76: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/76.jpg)
![Page 77: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/77.jpg)
![Page 78: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/78.jpg)
![Page 79: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/79.jpg)
![Page 80: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/80.jpg)
![Page 81: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/81.jpg)
![Page 82: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/82.jpg)
![Page 83: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/83.jpg)
![Page 84: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/84.jpg)
![Page 85: GoodFit: Multi-Resource Packing of Tasks with Dependencies](https://reader034.fdocuments.in/reader034/viewer/2022042722/58a9ac561a28ab9c758b598d/html5/thumbnails/85.jpg)