Fair Share Scheduling
Ethan Bolker
Mathematics & Computer Science UMass Boston
www.cs.umb.edu/~eb
Queen’s University
March 23, 2001
2
References
• www.bmc.com/patrol/fairshare
• www.cs.umb.edu/~eb/goalmode
Acknowledgements
• Aaron Ball
• Tom Larard
• Anatoliy Rikun
• Liying Song
• Yiping Ding
• Jeff Buzen
• Dan Keefe
• Oliver Chen
• Chris Thornley
3
Coming Attractions
• Queueing theory primer
• Fair share semantics
• Priority scheduling; conservation laws
• Predicting response times from shares
  – analytic formula
  – experimental validation
  – applet simulation
• Implementation geometry
4
Transaction Workload
• Stream of jobs visiting a server (ATM, time shared CPU, printer, …)
• Jobs queue when server is busy
• Input:
  – Arrival rate: λ jobs/sec
  – Service demand: s sec/job
• Performance metrics:
  – server utilization: u = λs (must be ≤ 1)
  – response time: r = ??? sec/job (average)
  – degradation: d = r/s
5
Response time computations
• r, d measure queueing delay: r ≥ s (d ≥ 1), unless parallel processing possible
• Randomness really matters
  – r = s (d = 1) if arrivals scheduled (best case, no waiting)
  – r >> s for bulk arrivals (worst case, maximum delays)
• Theorem. If arrivals are Poisson and service is exponentially distributed (M/M/1) then d = 1/(1-u), r = s/(1-u)
• Think: virtual server with speed 1-u
6
M/M/1
• Essential nonlinearity often counterintuitive
  – at u = 95% average degradation is 1/(1-0.95) = 20,
  – but 1 customer in 20 has no wait at all (5% idle time)
• A useful guide even when hypotheses fail
  – accurate enough (±30%) for real computer systems
  – d depends only on u: many small jobs have same impact as few large jobs
  – faster system ⇒ smaller s ⇒ smaller u ⇒ r = s/(1-u) double win: less service, less wait
  – waiting costly, server cheap (telephones): want u ≈ 0
  – server costly (doctors): want u ≈ 1 but scheduled
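A minimal Python sketch of the M/M/1 formulas from the previous two slides. The function name `mm1_metrics` and its parameter names are illustrative, not from the talk.

```python
def mm1_metrics(arrival_rate, service_demand):
    """Return (utilization, degradation, response_time) for an M/M/1 server.

    arrival_rate is in jobs/sec, service_demand in sec/job.
    """
    u = arrival_rate * service_demand          # u = (arrival rate) * s
    if not 0 <= u < 1:
        raise ValueError("utilization must satisfy 0 <= u < 1")
    d = 1.0 / (1.0 - u)                        # d = 1/(1-u)
    r = service_demand * d                     # r = s/(1-u)
    return u, d, r


# The slide's example: at 95% utilization the average degradation is about 20.
u, d, r = mm1_metrics(arrival_rate=0.95, service_demand=1.0)
print(f"u = {u:.2f}, d = {d:.1f}, r = {r:.1f} sec")
```

Halving the service demand in this sketch halves u as well, which is the "double win" noted above.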
7
Scheduling for Performance
• Customers want good response times
• Decreasing u is expensive
• High end Unix offerings from HP, IBM, Sun offer fair share scheduling packages that allow an administrator to allocate scarce resources (CPU, processes, bandwidth) among workloads
• How do these packages behave?
• Model as a black box, independent of internals
• Limit study to CPU shares on a uniprocessor
8
Multiple Job Streams
• Multiple workloads, utilizations u1, u2, …
• U = Σ ui < 1
• If no workload prioritization then all degradations are equal: di = 1/(1-U)
• Share allocations are de facto prioritizations
• Study degradation vector V = (d1, d2, …)
9
Share Semantics
• Suppose workload w has CPU share fw
• Normalize shares so that Σw fw = 1
• w gets fraction fw of CPU time slices when at least one of its jobs is ready for service
• Can it use more if competing workloads idle?
No : think share = cap
Yes : think share = guarantee
10
Shares As Caps
• Good for accounting (sell fraction of web server)
• Available now from IBM, HP, soon from Sun
• Straightforward (boring) - workloads are isolated
• Each runs on a virtual processor with speed *= f

                 dedicated system    share f
utilization      u                   u/f (need f > u!)
response time    r                   r(1-u)/(f-u)
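A short sketch of the capped case, assuming the M/M/1 formula r = s/(1-u) applies to the slowed-down virtual processor. The function name `capped_response_time` is hypothetical.

```python
def capped_response_time(u, s, f):
    """Response time of a workload capped at CPU share f.

    u: utilization on a dedicated system, s: service demand (sec/job),
    f: share used as a hard cap. Assumes M/M/1 on a virtual processor
    whose speed is multiplied by f.
    """
    if not u < f <= 1:
        raise ValueError("need u < f <= 1")
    virtual_service = s / f                    # jobs take longer on the slower CPU
    virtual_utilization = u / f                # utilization rises to u/f
    return virtual_service / (1.0 - virtual_utilization)   # equals s/(f-u)


u, s, f = 0.3, 1.0, 0.5
dedicated = s / (1.0 - u)
capped = capped_response_time(u, s, f)
# Same value as the table entry r(1-u)/(f-u)
print(capped, dedicated * (1.0 - u) / (f - u))
```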
11
Shares As Guarantees
• Good for performance + economy (use otherwise idle resources)
• Shares make a difference only when there are multiple workloads
• Large share resembles high priority: share may be less than utilization
• Workload interaction is subtle, often unintuitive, hard to explain
12
Modeling
[Figure: diagram relating Performance Goals, the OS (complex scheduling software, workload, measure frequently), and the Model (fast computation, analytic algorithms); arrow labels: query, update, report, response time]
13
Modeling
• Real system
  – Complex, dynamic, frequent state changes
  – Hard to tease out cause and effect
• Model
  – Static snapshot, deals in averages and probabilities
  – Fast enlightening answers to “what if” questions
• Abstraction helps you understand real system
• Start with a study of priority scheduling
14
Priority Scheduling
• Priority state: order workloads by priority (ties OK)
  – two workloads, 3 states: 12, 21, [12]
  – three workloads, 13 states:
    • 123 (6 = 3! of these ordered states),
    • [12]3 (3 of these),
    • 1[23] (3 of these),
    • [123] (1 state with no priorities)
  – n wkls, f(n) states, n! ordered (simplex lock combos)
• p(s) = prob( state = s ) = fraction of time in state s
• V(s) = degradation vector when state = s (measure this, or compute it using queueing theory)
• V = Σs p(s) V(s) (time avg is convex combination)
• Achievable region is convex hull of vectors V(s)
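The state counts quoted above (3 states for two workloads, 13 for three) can be checked with a short recurrence: pick the set of workloads tied for top priority, then rank the rest. This is a sketch; the function name `priority_state_count` is illustrative.

```python
from math import comb, factorial


def priority_state_count(n):
    """Number of priority states for n workloads when ties are allowed."""
    counts = [1]                               # one way to rank zero workloads
    for m in range(1, n + 1):
        # k workloads tie for top priority; the other m-k are ranked below them
        counts.append(sum(comb(m, k) * counts[m - k] for k in range(1, m + 1)))
    return counts[n]


for n in (2, 3):
    print(n, "workloads:", priority_state_count(n), "states,",
          factorial(n), "of them strictly ordered")
```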
15
Two workloads
[Figure: axes d1 and d2 with the line d1 = d2; points V(12) (wkl 1 high prio), V(21), and V([12]) (no priorities); achievable region marked]
16
Two workloads
[Figure: axes d1 and d2 with the line d1 = d2; points V(12) (wkl 1 high prio), V(21), and V([12]) (no priorities); labels 0.5 V(12) + 0.5 V(21) and V([12])]
17
Two workloads
[Figure: axes d1 and d2 with the line d1 = d2; points V(12) (wkl 1 high prio), V(21), and V([12]) (no priorities)]
note: u1 < u2 ⇒ wkl 2 effect on wkl 1 large
18
Conservation
• No Free Lunch Theorem. Weighted average degradation is constant, independent of priority scheduling scheme:
  Σi (ui/U) di = 1/(1-U)
• Provable from some hypotheses
• Observable in some real systems
• Sometimes false: shortest job first minimizes average response time (printer queues, supermarket express checkout lines)
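A sketch that checks the conservation law numerically for strict priority orders. It assumes preemptive priorities, Poisson arrivals, and the same exponential service-time distribution for every workload; under those assumptions a workload's degradation is 1/((1-a)(1-b)), where a is the utilization of the workloads above it and b adds its own. For two workloads this reproduces the vertex V(1,2) = (1/(1-u1), 1/((1-u1)(1-U))) used later in the talk. Function names are illustrative and workloads are indexed from 0.

```python
def priority_degradations(utilizations, order):
    """Degradation of each workload under a strict priority order.

    order lists workload indices from highest to lowest priority.
    Assumption of this sketch: preemptive priority, M/M/1, equal service rates.
    """
    d = [0.0] * len(utilizations)
    above = 0.0                                # utilization of higher priorities
    for w in order:
        through = above + utilizations[w]
        d[w] = 1.0 / ((1.0 - above) * (1.0 - through))
        above = through
    return d


def weighted_average(utilizations, d):
    """Sum over i of (ui/U) * di."""
    total = sum(utilizations)
    return sum(u / total * di for u, di in zip(utilizations, d))


utilizations = [0.2, 0.5]
for order in ([0, 1], [1, 0]):
    d = priority_degradations(utilizations, order)
    print(order, d, weighted_average(utilizations, d))
print("1/(1-U) =", 1.0 / (1.0 - sum(utilizations)))
```

Both priority orders give the same weighted average, which is the sense in which there is no free lunch.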
19
Conservation
• For any proper set A of workloads, write U(A) = Σ i∈A ui. Imagine giving those workloads top priority. Then can pretend other wkls don’t exist. In that case
  Σ i∈A (ui/U(A)) di = 1/(1-U(A))
  When wkls in A have lower priorities they have higher degradations, so in general
  Σ i∈A (ui/U(A)) di ≥ 1/(1-U(A))
• These 2^n - 2 linear inequalities determine the convex achievable region R
• R is a permutahedron: only n! vertices
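A sketch of a membership test for the achievable region: one conservation equality for the full set of workloads plus one inequality for each proper nonempty subset A. The function name `in_achievable_region` and the tolerance parameter are illustrative.

```python
from itertools import combinations


def in_achievable_region(utilizations, d, tol=1e-9):
    """Check the conservation equality and the 2^n - 2 subset inequalities."""
    n = len(utilizations)
    total = sum(utilizations)
    # Equality for the full set of workloads
    full = sum(u / total * di for u, di in zip(utilizations, d))
    if abs(full - 1.0 / (1.0 - total)) > tol:
        return False
    # One inequality for every proper, nonempty subset A
    for size in range(1, n):
        for subset in combinations(range(n), size):
            u_a = sum(utilizations[i] for i in subset)
            lhs = sum(utilizations[i] / u_a * d[i] for i in subset)
            if lhs < 1.0 / (1.0 - u_a) - tol:
                return False
    return True


utilizations = [0.2, 0.5]
no_priorities = [1.0 / (1.0 - 0.7)] * 2        # all degradations equal 1/(1-U)
print(in_achievable_region(utilizations, no_priorities))
```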
20
Two Workloads
conservation law: (d1, d2) lies on the line u1 d1 + u2 d2 = U/(1-U)
[Figure: axes d1 (workload 1 degradation) and d2 (workload 2 degradation), with the conservation line]
21
Two Workloads
d1 ≥ 1/(1-u1): constraint resulting from workload 1
[Figure: axes d1 (workload 1 degradation) and d2 (workload 2 degradation)]
22
Two Workloads
Workload 1 runs at high priority:
V(1,2) = ( 1/(1-u1), 1/((1-u1)(1-U)) )
d1 ≥ 1/(1-u1): constraint resulting from workload 1
[Figure: axes d1 (workload 1 degradation) and d2 (workload 2 degradation)]
23
Two Workloads
u1 d1 + u2 d2 = U/(1-U)
d2 ≥ 1/(1-u2)
[Figure: axes d1 (workload 1 degradation) and d2 (workload 2 degradation), with the point V(2,1)]
24
Two Workloads
u1 d1 + u2 d2 = U/(1-U)
V(1,2) = ( 1/(1-u1), 1/((1-u1)(1-U)) )
d1 ≥ 1/(1-u1)
d2 ≥ 1/(1-u2)
[Figure: axes d1 (workload 1 degradation) and d2 (workload 2 degradation), with the points V(1,2) and V(2,1) and the achievable region R]
25
Three Workloads
• Degradation vector (d1, d2, d3) lies on plane u1 d1 + u2 d2 + u3 d3 = C
• We know a constraint for each workload w: uw dw ≥ Cw
• Conservation applies to each pair of wkls as well: u1 d1 + u2 d2 ≥ C12
• Achievable region has one vertex for each priority ordering of workloads: 3! = 6 in all
• Hence its name: the permutahedron
26
Three Workload Permutahedron
3! = 6 vertices (priority orders)
2^3 - 2 = 6 edges (conservation constraints)
[Figure: hexagon in the plane u1 d1 + u2 d2 + u3 d3 = C, drawn on axes d1, d2, d3, with vertices V(1,2,3) and V(2,1,3) labeled]
27
Experimental evidence
28
Four workload permutahedron
4! = 24 vertices (ordered states)
2^4 - 2 = 14 facets (proper subsets) (conservation constraints)
74 faces (states)
Simplicial geometry and transportation polytopes, Trans. Amer. Math. Soc. 217 (1976) 138.
29
Map shares to degradations - two workloads -
• Suppose f1 and f2 > 0, f1 + f2 = 1
• Model: System operates in state
  – 12 with probability f1
  – 21 with probability f2
  (independent of who is on queue)
• Average degradation vector: V = f1 V(12) + f2 V(21)
30
Predict Degradations From Shares (Two Workloads)
• Reasonable modeling assumption: f1 = 1, f2 = 0 means workload 1 runs at high priority
• For arbitrary shares: workload priority order is
  – (1,2) with probability f1
  – (2,1) with probability f2
  (probability = fraction of time)
• Compute average workload degradation:
  d1 = f1 (wkl 1 degradation at high priority) + f2 (wkl 1 degradation at low priority)
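A sketch of the two-workload prediction, using the vertex V(1,2) = (1/(1-u1), 1/((1-u1)(1-U))) given earlier in the talk and its mirror image for V(2,1). The function name `two_workload_degradations` is hypothetical.

```python
def two_workload_degradations(u1, u2, f1):
    """Predicted (d1, d2) when workload 1 has share f1 and workload 2 has 1-f1."""
    f2 = 1.0 - f1
    total = u1 + u2
    # Vertex V(1,2): workload 1 at high priority
    v12 = (1.0 / (1.0 - u1), 1.0 / ((1.0 - u1) * (1.0 - total)))
    # Vertex V(2,1): workload 2 at high priority
    v21 = (1.0 / ((1.0 - u2) * (1.0 - total)), 1.0 / (1.0 - u2))
    # Time average: V = f1 V(12) + f2 V(21)
    d1 = f1 * v12[0] + f2 * v21[0]
    d2 = f1 * v12[1] + f2 * v21[1]
    return d1, d2


for f1 in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f1, two_workload_degradations(0.2, 0.5, f1))
```

As f1 moves from 0 to 1, the predicted point slides along the segment from V(2,1) to V(1,2).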
31
Model validation
32
Model validation
33
Map shares to degradations - three (n) workloads -
• prob(123) = f1 f2 f3 / [ (f1 + f2 + f3) (f2 + f3) (f3) ]
• Theorem: These n! probabilities sum to 1
  – interesting identity generalizing adding fractions
  – prove by induction, or by coupon collecting
• V = Σ over ordered states s of prob(s) V(s)
• O(n!), Ω(n!), good enough for n ≤ 9 (12)
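A sketch of the order probabilities, assuming every share is positive. Each factor is the share of the next workload divided by the total share of the workloads not yet placed, which matches the prob(123) formula. The name `order_probability` is illustrative and workloads are indexed from 0.

```python
from itertools import permutations


def order_probability(shares, order):
    """Probability that the priority order is `order` (highest priority first)."""
    p = 1.0
    for position, w in enumerate(order):
        remaining = sum(shares[v] for v in order[position:])
        p *= shares[w] / remaining
    return p


shares = [0.32, 0.48, 0.20]
total = 0.0
for order in permutations(range(len(shares))):
    p = order_probability(shares, order)
    total += p
    print([w + 1 for w in order], round(p, 4))   # print 1-based workload labels
print("sum of all n! probabilities:", round(total, 12))
```

With these shares the order 3, 1, 2 gets probability 0.20 × (0.32/0.80) = 0.08.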
34
Model validation
35
Model validation
36
The Fair Share Applet
• Screen captures on next slides are from www.bmc.com/patrol/fairshare
• Experiment with “what if” fair share modeling
• Watch a simulation
• Random virtual job generator for the simulation is the same one used to generate random real jobs for our benchmark studies
37
Three Transaction Workloads
• Three workloads, each with utilization 0.32 jobs/second × 1.0 seconds/job = 0.32 = 32%
• CPU 96% busy, so average (conserved) response time is 1.0/(1 - 0.96) = 25 seconds
• Individual workload average response times depend on shares
[Figure: applet screen capture with workloads 1, 2, 3; response times shown as ???]
38
Three Transaction Workloads
• Normalized f3 = 0.20 means 20% of the time workload 3 (development) would be dispatched at highest priority
• During that time, workload priority order is (3,1,2) for 32/80 of the time, (3,2,1) for 48/80
• Probability( priority order is 312 ) = 0.20 × (32/80) = 0.08
[Figure: applet screen capture with workloads 1, 2, 3; values 0.32, 0.48 (sum 0.80), and 0.20]
39
Three Transaction Workloads
• Formulas on previous slide
• Average predicted response time weighted by throughput = 25 seconds (as expected)
• Hard to understand intuitively
• Software helps
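A sketch that puts the pieces together for this example: three workloads at 32% utilization each, 1.0 second of service per job, and shares 0.32, 0.48, 0.20. The per-order degradations assume preemptive priority with identical exponential service times, which is an assumption of this sketch and not necessarily the applet's internals. All function names are illustrative.

```python
from itertools import permutations


def order_probability(shares, order):
    """Probability of a priority order, highest priority first."""
    p = 1.0
    for position, w in enumerate(order):
        p *= shares[w] / sum(shares[v] for v in order[position:])
    return p


def priority_degradations(utilizations, order):
    """Degradations under a strict priority order (assumed M/M/1, preemptive)."""
    d = [0.0] * len(utilizations)
    above = 0.0
    for w in order:
        through = above + utilizations[w]
        d[w] = 1.0 / ((1.0 - above) * (1.0 - through))
        above = through
    return d


def predicted_degradations(utilizations, shares):
    """V = sum over ordered states s of prob(s) V(s)."""
    n = len(utilizations)
    v = [0.0] * n
    for order in permutations(range(n)):
        p = order_probability(shares, order)
        d = priority_degradations(utilizations, order)
        for w in range(n):
            v[w] += p * d[w]
    return v


utilizations = [0.32, 0.32, 0.32]
shares = [0.32, 0.48, 0.20]
service = 1.0                                  # seconds per job
response = [service * d for d in predicted_degradations(utilizations, shares)]
total = sum(utilizations)
average = sum(u / total * r for u, r in zip(utilizations, response))
print("predicted response times (sec):", [round(r, 2) for r in response])
print("throughput-weighted average (sec):", round(average, 6))
```

The weighted average comes out at 25 seconds whatever the shares are, because every ordered state satisfies the conservation law on its own.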
40
Three Transaction Workloads
note change from 32%
41
Simulation
jobs currently on run queue
42
When the Model Fails
• Real CPU uses round robin scheduling to deliver time slices
• Short jobs never wait for long jobs to complete
• That resembles shortest job first, so response time conservation law fails
• At high utilization, simulation shows smaller response times than predicted by model
• Response time conservation law yields conservative predictions
43
Scaling Degradation Predictions
• V = Σ over ordered states s of prob(s) V(s)
• Each s is a permutation of (1, 2, …, n)
• Think of it as a vector in n-space
• Those n! vectors lie on the surface of a sphere
• For n large they are pretty densely packed
• Think of prob(s) as a discrete approximation to a probability distribution on the sphere
• V is an integral
44
Monte Carlo
• loop sampleSize times
    choose a permutation s at random from the distribution determined by the shares
    compute degradation vector V(s)
    accumulate V += V(s)
• divide V by sampleSize (each s is already drawn with probability prob(s))
• sampleSize = 40000 works well independent of n!
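A Monte Carlo sketch of this loop. Permutations are drawn by repeatedly picking the next workload with probability proportional to its share, which reproduces the product formula for prob(s). Because each s is already sampled according to prob(s), the sketch averages V(s) with equal weights. The per-order degradation formula assumes preemptive priority with identical exponential service times. Function names and the fixed seed are illustrative.

```python
import random


def sample_order(shares, rng):
    """Draw a priority order; each pick is proportional to the remaining shares."""
    remaining = list(range(len(shares)))
    order = []
    while remaining:
        weights = [shares[w] for w in remaining]
        pick = rng.choices(remaining, weights=weights)[0]
        order.append(pick)
        remaining.remove(pick)
    return order


def priority_degradations(utilizations, order):
    """Degradations under a strict priority order (assumed M/M/1, preemptive)."""
    d = [0.0] * len(utilizations)
    above = 0.0
    for w in order:
        through = above + utilizations[w]
        d[w] = 1.0 / ((1.0 - above) * (1.0 - through))
        above = through
    return d


def monte_carlo_degradations(utilizations, shares, sample_size=40000, seed=0):
    """Estimate V by averaging V(s) over sampled priority orders."""
    rng = random.Random(seed)
    n = len(utilizations)
    total = [0.0] * n
    for _ in range(sample_size):
        d = priority_degradations(utilizations, sample_order(shares, rng))
        for w in range(n):
            total[w] += d[w]
    return [t / sample_size for t in total]


estimate = monte_carlo_degradations([0.32, 0.32, 0.32], [0.32, 0.48, 0.20])
print([round(d, 2) for d in estimate])
```

The cost per sample grows polynomially with n, so the loop stays practical where enumerating all n! orders does not.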
45
Map shares to degradations (geometry)
• Interpret shares as barycentric coordinates in the (n-1)-simplex
• Study the geometry of the map from the simplex to the (n-1)-dimensional permutahedron
• Easy when n=2: each is a line segment and map is linear
46
Mapping a triangle to a hexagon
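[Figure: the map M from the share triangle (labels f1 = 1, f1 = 0, f3 = 0, f3 = 1) to the hexagon with vertices 123, 132, 213, 231, 312, 321; regions marked wkl 1 high priority and wkl 1 low priority]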
47
Mapping a triangle to a hexagon
[Figure: triangle-to-hexagon map with labels f1 = 1, f1 = 0, and {23}]
48
Mapping a triangle to a hexagon