Day 3 - Jawaharlal Nehru University · • The assignment of the processor to the jobs must...


Day 3


Agenda for Today

• Formulate simple problem statement

• Revisit the workload characterization problem.

• Present detailed (step by step) derivation of the workload.


Workload Modeling

• Workload modeling is used to generate synthetic workloads based on real-life job execution observations.

• The goal is typically to create workloads that can be used in performance evaluation studies.

[Figure: workload pipeline with the stages Workload Generation (jobs J1 ... Jn), Workload Submission (workload description, may include QoS), SLA Negotiation, Analysis, and Consolidate result]


Problem Formulation

• Given $P$ available processors and $m$ jobs, $\mathcal{J} = \{J_1, J_2, \dots, J_m\}$, waiting in a queue to be processed

• Problem: allocate $p_j$ processors to job $J_j$ such that the overall execution time is minimized

$$\text{minimize} \quad \sum_{j=1}^{m} T_j(p_j)$$

$$\text{subject to} \quad \sum_{j=1}^{m} p_j \le P, \qquad p_j \in \{1, 2, \dots, p_{max}\}$$

• Execution starts only when $p = p_{max}$ processors are allocated

where

– $T_j(n)$ is the execution time function of job $j$,

– $p_{max}$ is the maximum parallelism job $j$ can have, and

– $p_j$ is the unknown processor allocation to job $j$.
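For very small instances, the formulation above can be explored by exhaustive search. The following is only a minimal sketch: the function name `best_allocation` and the sample execution-time functions (12 / p) are assumptions made for illustration and do not come from the lecture.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def best_allocation(
    exec_time: Sequence[Callable[[int], float]],
    total_procs: int,
    p_max: int,
) -> Tuple[Tuple[int, ...], float]:
    """Brute-force search for the allocation minimizing sum_j T_j(p_j)
    subject to sum_j p_j <= total_procs and 1 <= p_j <= p_max."""
    best_alloc, best_cost = None, float("inf")
    for alloc in product(range(1, p_max + 1), repeat=len(exec_time)):
        if sum(alloc) > total_procs:
            continue  # violates the processor budget
        cost = sum(t(p) for t, p in zip(exec_time, alloc))
        if cost < best_cost:
            best_alloc, best_cost = alloc, cost
    if best_alloc is None:
        raise ValueError("no feasible allocation")
    return best_alloc, best_cost


# Hypothetical example: two identical jobs whose execution time is 12 / p.
jobs = [lambda p: 12 / p, lambda p: 12 / p]
allocation, cost = best_allocation(jobs, total_procs=4, p_max=3)
print(allocation, cost)
```

The search space grows as $p_{max}^m$, so this is only useful for checking small examples by hand.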


Problem Formulation

• Assume that we have four jobs, $\mathcal{J} = \{J_1, J_2, J_3, J_4\}$.

• For simplicity, assume that each job requests 3 processors and that the service demand of each job is 20 time units.

• Suppose we have $P = 7$ available homogeneous processors to be assigned to the 4 jobs.

• The assignment of the processor to the jobs must minimize the overall completion time of the jobs.

• Assume only space sharing allocation


Allocation Possibility

• One possible assignment is

• J1 = 3

• J2 = 3

• J3 = 1

• J4 = 0

[Figure: timeline. Initialization takes, say, 3 time units; J1 and J2 are started; at time 20, J1 and J2 are completed and J3 and J4 are started; at time 40, J3 and J4 are completed.]

• The total execution time is 43 units.

• Can we do better?
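The 43 time units can be reproduced with a small calculation. This is a sketch under assumptions that the slide does not state explicitly: each job runs only once all of its 3 requested processors are available, the 20 time units are elapsed time per job, and the 3-unit initialization is paid once. The function name `batch_makespan` is hypothetical.

```python
import math


def batch_makespan(num_jobs, procs_per_job, service_time, total_procs, init_time):
    """Completion time when rigid jobs run in equal-length batches."""
    concurrent = total_procs // procs_per_job  # jobs that fit side by side
    if concurrent == 0:
        raise ValueError("not enough processors to run a single job")
    batches = math.ceil(num_jobs / concurrent)
    return init_time + batches * service_time


# 4 jobs, 3 processors each, 20 time units, 7 processors, 3 units of initialization
total = batch_makespan(4, 3, 20, 7, 3)
print(total)  # 43
```

With 7 processors only two 3-processor jobs fit at once, so one processor stays idle in every batch, which is what the question "Can we do better?" is pointing at.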


Job Description

• We have a set of jobs to be executed

$$\mathcal{J} = \{J_1, J_2, \dots, J_n\}$$

• Number of tasks per job

– Each job $J_i$ has a set of tasks

$$J_i = \{T_1, T_2, \dots, T_m\}$$

– Note that a task represents a part of the work that must be done serially by a single processor

– Interdependence among job tasks is important to consider

• A job is said to be “small” (or “large”) if it consists of a small (or large) number of tasks.

• Tasks with a large service demand may introduce large queuing delays for queued jobs.


Job Description

• Based on the analysis of real workload logs used in production:

– The percentage of small jobs, with a small number of tasks, is higher than that of large jobs, with a large number of tasks.

– For this reason, we examine the following distribution for the number of tasks per job.

[Figure: distribution of the number of tasks per job, with regions labeled "Small jobs" and "Large jobs"]


Job Description

• A job is completely described by the following parameters:

– Cumulative job service demand (W)

– The arrival time

– The number of tasks

• A job with one task is called a sequential job, and a job with multiple tasks is called a parallel job.


Workload Modeling

• We will consider the following workloads:

– WK1 consists of curves with relatively good speedup.

– WK2 consists of curves whose speedup is not as good as that of WK1.

– WK3 consists of curves with poor speedup.

– WK4 contains jobs with all three speedup types, each appearing with approximately equal frequency.


Workload Generation

• Sevcik proposed the following model to represent the execution time function of a job that can run on 𝑝 processors:

$$T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$$

• It has been shown that a wide range of representative applications can be modeled by utilizing the above execution time function.


Workload Generation

• The execution time function captures both the scale-up and the overhead:

$$T(p) = \underbrace{\phi(p)\,\frac{W}{p}}_{\text{scale-up}} + \underbrace{\alpha + \beta \cdot p}_{\text{overhead}}$$

• The runtime function allows us to create different workloads by choosing different values for the parameters φ, W, β, and α.
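The runtime function $T(p) = \phi(p)\,W/p + \alpha + \beta \cdot p$ translates directly into code. The sketch below treats $\phi$ as a constant, and the parameter values in the example are chosen only for illustration; the function name `execution_time` is hypothetical.

```python
def execution_time(p, work, alpha, beta, phi=1.0):
    """T(p) = phi * W / p + alpha + beta * p, with phi taken as a constant."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return phi * work / p + alpha + beta * p


# Illustrative values only: W = 16, alpha = 2, beta = 1
for p in (1, 2, 4, 8, 16):
    print(p, execution_time(p, work=16.0, alpha=2.0, beta=1.0))
```

For these values the runtime first falls as processors are added and then rises again once the β · p term dominates.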


Workload Generation

• There is a certain level of load imbalance when running a job on multiple processors.

• This is captured by the $\phi(p)$ parameter in the equation

$$T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$$

– $\phi(p)$ represents the degree to which the work is not evenly spread across the $p$ processors (i.e., load imbalance)

– Real measurements conducted by Wu show that its value is in the range $1.1 \le \phi(p) \le 1.2$

– Since these values are close to 1, $\phi(p)$ can be considered approximately equal to 1.0.


Workload Generation

• Note that adding processors to a job reduces computation time, but there is also a certain increase in completion time due to sequential execution

• The $\alpha$ parameter in the equation captures this

$$T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$$

– $\alpha$ represents the increase of the work per processor due to parallelization (i.e., overhead)


Workload Generation

• Note that adding processors to a job reduces computation time but increases communication time

• The $\beta$ parameter in the equation captures this issue

$$T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$$

– $\beta$ represents the communication and congestion delays that increase with the number of processors assigned to a job.

– In other words, the more processors are given to a job, the higher the communication cost and congestion delays
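The trade-off between the $W/p$ term and the $\beta \cdot p$ term can be made explicit with a short derivation. This is a sketch under two simplifying assumptions that are not stated on the slides: $\phi(p) = 1$ and $p$ is treated as a continuous variable.

```latex
T(p) = \frac{W}{p} + \alpha + \beta p
\quad\Longrightarrow\quad
\frac{dT}{dp} = -\frac{W}{p^{2}} + \beta = 0
\quad\Longrightarrow\quad
p^{*} = \sqrt{\frac{W}{\beta}}
```

Under these assumptions, allocating more than $p^{*}$ processors makes the job slower, because the added communication cost outweighs the reduced computation time.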


Workload Generation

• $W$ in the runtime function represents the total service demand of the job

$$T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$$

– The mean value of $W$ is 13.76 and the coefficient of variation must be greater than one (e.g., 3.5, 10.0).

– Small jobs: average service demand = 1.3; they account for 7/8 of the jobs in the system

– Large jobs: average service demand = 101; they account for 1/8 of the jobs in the system
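The stated mean of 13.76 follows from the two job classes by weighting each class mean by its share of the jobs:

```latex
E(W) = \frac{7}{8}\cdot 1.3 + \frac{1}{8}\cdot 101
     = 1.1375 + 12.625
     = 13.7625 \approx 13.76
```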


Workload Speed Up

• We can determine the speedup of the application on $p$ processors as follows:

$$S(p) = \frac{1 + \dfrac{1}{p_{max}^{2}} + \dfrac{1}{\mu\, p_{max}^{2}}}{\dfrac{1}{p} + \dfrac{p}{p_{max}^{2}} + \dfrac{1}{\mu\, p_{max}^{2}}}$$

• Where

– $\mu \in \{\infty, 0.4, 0.2\}$

– $p_{max}$ is the maximum number of processors assigned to the workload (WK1, WK2, WK3).


Workload Speed Up

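• We can determine the speedup for WK3 with $\mu = 0.2$ and $p_{max} = 1, 4, 6, 9$, $p = 32$

$$S(p) = \frac{1 + \dfrac{1}{p_{max}^{2}} + \dfrac{1}{\mu\, p_{max}^{2}}}{\dfrac{1}{p} + \dfrac{p}{p_{max}^{2}} + \dfrac{1}{\mu\, p_{max}^{2}}}$$

• Substituting $p_{max} = 4$ and $p = 32$ into the above gives

$$S(32) = \frac{1 + \dfrac{1}{4^{2}} + \dfrac{1}{\mu \cdot 4^{2}}}{\dfrac{1}{32} + \dfrac{32}{4^{2}} + \dfrac{1}{\mu \cdot 4^{2}}}$$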
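The substitution can be checked numerically. The sketch below assumes the speedup formula reads $S(p) = \bigl(1 + 1/p_{max}^2 + 1/(\mu\,p_{max}^2)\bigr) / \bigl(1/p + p/p_{max}^2 + 1/(\mu\,p_{max}^2)\bigr)$, which is one reading of the slide and should be checked against the original source. The function name `speedup` is hypothetical.

```python
import math


def speedup(p, p_max, mu):
    """Speedup on p processors under the assumed reading of the formula."""
    # mu = infinity removes the overhead term entirely (the WK1 case)
    overhead = 0.0 if math.isinf(mu) else 1.0 / (mu * p_max ** 2)
    numerator = 1.0 + 1.0 / p_max ** 2 + overhead
    denominator = 1.0 / p + p / p_max ** 2 + overhead
    return numerator / denominator


s_wk3 = speedup(32, 4, 0.2)          # p = 32, p_max = 4, mu = 0.2
s_wk1 = speedup(4, 4, float("inf"))  # p = p_max = 4, mu = infinity
print(round(s_wk3, 4), s_wk1)
```

Under this reading, running the job on 32 processors when $p_{max} = 4$ yields a speedup below 1, i.e. the job runs slower than on a single processor.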


Workload Speed Up

• The results for the speedup curve when $p_{max} = 4$


Workload Speed Up

• The results for the speedup curve when 𝑝𝑚𝑎𝑥 = 16


Workload Speed Up

• The results for the speedup curve when 𝑝𝑚𝑎𝑥 = 64


Workload WK1

• Consists of curves with relatively good speedup.

• They correspond to $\mu = +\infty$

• Example: matrix multiplication application

• The number of possible tasks is given by

$$n = \frac{W}{\beta}$$

• β reflects the communication and congestion delays that increase with the number of processors.

[Figure: structure of matrix multiplication application, showing Task 1, Task 2, Task 3, ..., Task n-1, Task n]


Workload WK1

• The actual job service demand value is obtained from a two-stage hyper-exponential distribution depending on the coefficient of variation of the service time.

$$f(w) = \underbrace{P\left(1 - e^{-w/101}\right)}_{\text{large jobs}} + \underbrace{(1 - P)\left(1 - e^{-w/1.3}\right)}_{\text{small jobs}}$$


Workload WK1

• The actual job service demand value is obtained from a two-stage hyper-exponential distribution depending on the coefficient of variation of the service time.

$$f(w) = P\left(1 - e^{-w/101}\right) + (1 - P)\left(1 - e^{-w/1.3}\right)$$

Where

– $P = 0.125$

– The mean value of 𝑊 is 13.76


Workload WK1

• The service demand of WK1 can now be computed as

$$f(13.76) = 0.125\left(1 - e^{-13.76/101}\right) + (1 - 0.125)\left(1 - e^{-13.76/1.3}\right)$$
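The two-stage hyper-exponential service demand can be evaluated and sampled with the standard library. This is a minimal sketch, assuming the parameters from the slides (P = 0.125, class means 101 and 1.3); the function names, the seed, and the sample size are arbitrary choices for illustration.

```python
import math
import random

P_LARGE = 0.125      # probability that a job is "large"
MEAN_LARGE = 101.0   # mean service demand of large jobs
MEAN_SMALL = 1.3     # mean service demand of small jobs


def service_demand_cdf(w):
    """Cumulative form of the two-stage hyper-exponential, evaluated at w."""
    return (P_LARGE * (1.0 - math.exp(-w / MEAN_LARGE))
            + (1.0 - P_LARGE) * (1.0 - math.exp(-w / MEAN_SMALL)))


def sample_service_demand(rng):
    """Pick a stage with probability P_LARGE, then draw from its exponential."""
    mean = MEAN_LARGE if rng.random() < P_LARGE else MEAN_SMALL
    return rng.expovariate(1.0 / mean)  # expovariate takes the rate 1/mean


rng = random.Random(42)
samples = [sample_service_demand(rng) for _ in range(200_000)]
sample_mean = sum(samples) / len(samples)
cdf_at_mean = service_demand_cdf(13.76)
print(round(cdf_at_mean, 3), round(sample_mean, 2))
```

The sample mean should land close to 13.76, and the value of the expression at 13.76 is roughly 0.89, meaning that most jobs have a service demand below the mean.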


Workload WK2

• These curves correspond to $\mu = 0.4$, which indicates that the speedup is not as good as that of WK1

• Example: n-body simulations of stellar or planetary movements, in which the movement of each body is governed by the gravitational forces produced by the system as a whole

• The number of possible tasks is given by

$$n = \frac{W}{\beta}$$


Workload WK3

• These curves correspond to $\mu = 0.2$, which indicates that the speedup is poorer than that of WK1 and WK2

• Example: mean value analysis kinds of applications exhibit this property.

• The number of possible tasks is given by

$$n = \frac{W}{\beta}$$



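The Arrival Process

• Jobs arrive at the system in a stochastic manner

[Figure: arriving jobs J1, J2, J3, J4]

• Inter-arrival times follow an exponential distribution, whose mean inter-arrival time can be derived as follows:

$$f(x) = \lambda \cdot e^{-\lambda \cdot x}$$

• where

$$\frac{1}{\lambda} = \frac{E(T(1))}{N \cdot Load}$$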


The Arrival Process

• We already know that E(W) = 13.76.

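$$E(T(1)) = E(W) + E(\beta) + E(\alpha)$$

• By using the theorem of total expectation across the three different values of $p_{max}$, we find $E(\beta) = 0.30$

$$E(\alpha) = \begin{cases}
0.0 & \text{if } \mu = +\infty \\
2.0 & \text{if } \mu = 0.4 \\
5.0 & \text{if } \mu = 0.2 \\
2.3 & \text{mixed speedup}
\end{cases}$$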
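Putting the pieces together, the mean inter-arrival time $1/\lambda = E(T(1)) / (N \cdot Load)$ can be computed per workload and used to draw arrival times. The sketch below makes several assumptions for illustration: $N$ is read as the number of processors, WK1 to WK4 are mapped to the four $E(\alpha)$ cases in the order given on the slides, and the values N = 10 and Load = 0.5 are arbitrary.

```python
import random

E_W = 13.76      # mean service demand, from the slides
E_BETA = 0.30    # mean communication parameter, from the slides
# E(alpha) per workload: mu = +inf, 0.4, 0.2, and mixed speedup (assumed mapping)
E_ALPHA = {"WK1": 0.0, "WK2": 2.0, "WK3": 5.0, "WK4": 2.3}


def mean_interarrival(workload, num_procs, load):
    """1 / lambda = E(T(1)) / (N * Load), with N read as the processor count."""
    e_t1 = E_W + E_BETA + E_ALPHA[workload]
    return e_t1 / (num_procs * load)


def arrival_times(n, mean_gap, rng):
    """Cumulative arrival times with exponential inter-arrival gaps."""
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0 / mean_gap)  # rate lambda = 1 / mean gap
        times.append(t)
    return times


gap = mean_interarrival("WK1", num_procs=10, load=0.5)
arrivals = arrival_times(5, gap, random.Random(1))
print(round(gap, 3), [round(a, 2) for a in arrivals])
```

A higher load or a larger number of processors shortens the mean gap, so jobs arrive more frequently.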


Today's Lab

• We have seen the cloudlet in CloudSim. It just runs with fixed values. We want to change it.

• Task 1: Change the constant value so that the arrival process is based on an exponential distribution.

• Task 2: Currently the execution time of the cloudlet is fixed. Change it to be generated using a two-stage hyper-exponential distribution.


Thank you.

Questions, Comments, …?


Exponential Distribution

• The exponential distribution:

– $\lambda$ is the rate: it measures how many things happen per unit

$$f(x) = \lambda \cdot e^{-\lambda \cdot x}$$

[Figure: exponential probability density functions plotted against X for Lambda = 0.5, 1, and 2]


Hyper-exponential distribution

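• The hyper-exponential distribution is used in service demand modeling

• The hyper-exponential distribution is obtained by selecting from a mixture of several exponential distributions.

• The simplest variant has only two stages:

[Figure: two parallel exponential stages; the stage with rate $\lambda_1$ is chosen with probability $P$ and the stage with rate $\lambda_2$ with probability $1 - P$, where $\lambda_1 \ne \lambda_2$]

$$f(x) = \sum_{i=1}^{n} p_i \cdot \lambda_i \cdot e^{-\lambda_i x}$$

$$0 \le p_i \le 1, \qquad \sum_{i=1}^{n} p_i = 1$$

$$0 \le \lambda_1, \lambda_2 \le 1$$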


Coefficient of Variation

• Coefficient of variation is defined as follows

$$C_v = \frac{\sigma}{\mu}$$

• Where

– 𝜎 is the standard deviation

– 𝜇 is the mean
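The coefficient of variation of a set of observations can be computed directly from this definition. The sketch below uses the population standard deviation; the function name and the sample data are illustrative assumptions.

```python
import statistics


def coefficient_of_variation(values):
    """C_v = sigma / mu, using the population standard deviation."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for mean 0")
    return statistics.pstdev(values) / mean


# Illustrative data with mean 5 and population standard deviation 2
cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
print(cv)
```

For reference, an exponential distribution has a coefficient of variation of exactly 1, so a requirement of $C_v > 1$ calls for a more variable distribution such as the hyper-exponential.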


Long-lived vs short-lived Jobs

• Long-lived processes

• Short-lived processes

$$P(T > \tau) \propto \tau^{-\alpha}, \qquad \alpha \approx 1$$

• This means that most processes are short, but a small number are very long.



Performance Measurements

• Classical performance metrics:

– Response time,

– throughput,

– scalability,

– resource/cost/energy,

– efficiency,

– Elasticity

– Availability,

– reliability, and

– security

– SLA violation


References

• Thyagaraj Thanalapati, Sivarama P. Dandamudi: An Efficient Adaptive Scheduling Scheme for Distributed Memory Multicomputers. IEEE Trans. Parallel Distrib. Syst. 12(7): 758-768 (2001)

• A. Iosup and D.H.J. Epema, Grid Computing Workloads, IEEE Internet Computing 15(2): 19-26 (2011)

• Feitelson DG. Workload modeling for computer systems performance evaluation. Cambridge University Press; 2015.