Performance and Availability Models for IaaS Cloud and Their Applications Rahul Ghosh


Page 1: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Performance and Availability Models for IaaS Cloud and Their Applications

Rahul Ghosh
Duke High Availability Assurance Lab
Dept. of Electrical and Computer Engineering
Duke University, Durham, NC 27708
www.ee.duke.edu/~rg51

Collaborators: Vijay K. Naik, Murthy Devarakonda (IBM), Kishor S. Trivedi, DongSeong Kim and Francesco Longo (Duke)

IBM Student Workshop for Frontiers of Cloud Computing
Hawthorne, NY, USA
September 10, 2010

Page 2: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Introduction

Key problems of interest:
- Characterize cloud services as a function of arrival rate, available capacity, service requirements, and failure properties
- Apply these characteristics in SLA analysis and management, admission control, cloud capacity planning, and cloud economics

Approach: Performability (Performance + Availability) analysis
- We use an approach based on interacting stochastic sub-models
- Solving the models costs relatively little while covering a large parameter space, compared to measurement-based analysis

Two key quality-of-service measures for an IaaS cloud: (1) service availability and (2) provisioning response delay

Page 3: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Novelty of our approach

Single monolithic model vs. interacting sub-models approach

- Even in a simple case of 6 physical machines and 1 virtual machine per physical machine, a monolithic model has 126720 states.

- In contrast, our interacting sub-models approach has only 41 states.

Clearly, for a real cloud, a naïve modeling approach leads to a very large analytical model. Solving such a model is practically impossible.

The interacting sub-models approach is scalable, tractable and of high fidelity. Also, adding a new feature in an interacting sub-models approach does not require reconstruction of the entire model.

What are the different sub-models? How do they interact?

Page 4: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

System model

Main assumptions:
- All requests are homogeneous: each request is for one virtual machine (VM) with a fixed size in CPU cores, RAM and disk capacity.
- We use the term "job" to denote a user request for provisioning a VM.
- Submitted requests are served on an FCFS basis by the resource provisioning decision engine (RPDE).
- If a request can be accepted, it goes to a specific physical machine (PM) for VM provisioning. After getting the VM, the request runs in the cloud and releases the VM when it finishes.
- To reduce the cost of operations, PMs can be grouped into multiple pools. We assume three pools: hot (running, with VMs instantiated), warm (turned on, but VMs not instantiated) and cold (turned off).
- All PMs in a particular type of pool are identical.

Page 5: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Life-cycle of a job inside an IaaS cloud

Provisioning and servicing steps: (i) resource provisioning decision, (ii) VM provisioning and (iii) run-time execution

We translate these steps into analytical sub-models

[Figure: job life-cycle diagram. Components: Resource Provisioning Decision Engine, Instance Creation, Deploy, Run-time Execution. Stages: Arrival, Queuing, Provisioning Decision, Instantiation, VM deployment, Actual Service, Out. Annotations: admission control, provisioning response delay, job rejection due to buffer full, job rejection due to insufficient capacity.]

Page 6: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Resource provisioning decision

[Figure: job life-cycle diagram, as on Page 5]

Page 7: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Resource provisioning decision

A request is provisioned on a hot PM if a pre-instantiated but unassigned VM exists. If none exists, a PM from the warm pool is used. If all warm machines are busy, a PM from the cold pool is used.
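The search order described on this slide can be sketched as a small function. This is only an illustration: the function name, its arguments and the use of simple free-resource counters are assumptions, and returning None stands in for the "job rejection due to insufficient capacity" path shown in the life-cycle diagram.

```python
from typing import Optional


def choose_pool(free_hot_vms: int, free_warm_pms: int, free_cold_pms: int) -> Optional[str]:
    """Pick the pool for the next request, trying hot, then warm, then cold.

    All three counters are hypothetical inputs; the slides do not define
    how the decision engine tracks free resources.
    """
    if free_hot_vms > 0:    # a pre-instantiated but unassigned VM exists
        return "hot"
    if free_warm_pms > 0:   # no such VM, so fall back to the warm pool
        return "warm"
    if free_cold_pms > 0:   # all warm machines are busy, so use the cold pool
        return "cold"
    return None             # no pool can take the job: insufficient capacity


print(choose_pool(0, 2, 5))
```

With no free hot VM but two free warm PMs, the call above selects the warm pool.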

Page 8: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Resource provisioning decision model

Continuous Time Markov Chain (CTMC)

i = number of jobs in queue, s = pool (hot, warm or cold)

[Figure: CTMC for the provisioning decision of a single job]

Page 9: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Output measures
- Job rejection probability due to buffer full (Pblock)
- Job rejection probability due to insufficient capacity (Pdrop)
- Total job rejection probability (Preject = Pblock + Pdrop)
- Mean queuing delay (E[Tq_dec])
- Mean decision delay (E[Tdecision])

Techniques:
- Reward rate based approach (attach a reward rate to each state of the Markov chain)
- Little's law (connecting mean number in the queue with mean waiting time)
- 3-stage Coxian distribution
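To make the reward-rate idea and Little's law concrete, here is a minimal sketch on a much simpler chain than the one in the talk: a single-server queue with a finite buffer (M/M/1/K), used purely as a stand-in. The rates and buffer size are hypothetical. A reward of 1 on the "buffer full" state gives a blocking probability, a reward equal to the number of waiting jobs gives the mean queue length, and Little's law turns that length into a mean waiting time.

```python
def mm1k_steady_state(lam, mu, K):
    """Steady-state probabilities pi[0..K] of an M/M/1/K birth-death CTMC."""
    rho = lam / mu
    weights = [rho ** i for i in range(K + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_reward(pi, reward):
    """Expected steady-state reward rate: sum over states of pi[i] * reward(i)."""
    return sum(p * reward(i) for i, p in enumerate(pi))


# Hypothetical parameters (not from the talk): arrival rate, service rate, capacity.
lam, mu, K = 8.0, 10.0, 5
pi = mm1k_steady_state(lam, mu, K)

# Reward 1 in the full state, 0 elsewhere: probability that an arrival is blocked.
p_block = expected_reward(pi, lambda i: 1.0 if i == K else 0.0)

# Reward = number of jobs waiting (state i has one job in service when i > 0).
mean_queue_len = expected_reward(pi, lambda i: float(max(i - 1, 0)))

# Little's law, using the rate of jobs that actually enter the queue.
mean_wait = mean_queue_len / (lam * (1.0 - p_block))

print(p_block, mean_queue_len, mean_wait)
```

The same pattern carries over to any CTMC once its steady-state probabilities are known: only the reward function changes from one output measure to the next.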

Page 10: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

VM provisioning

[Figure: job life-cycle diagram, as on Page 5]

Page 11: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

VM provisioning model

[Figure: VM provisioning across pools. Labels: Resource Provisioning Decision Engine, accepted jobs, hot pool, warm pool, cold pool, hot PM, running VMs, idle resources in hot / warm / cold machine, service out.]

Page 12: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

VM provisioning model for each hot PM

State (i,j,k): i = number of jobs in the queue, j = number of VMs being provisioned, k = number of VMs running

Lh is the buffer size and m is the max. # VMs that can run simultaneously on a PM

[Figure: CTMC state-transition diagram for a hot PM, with states ranging from (0,0,0) to (Lh,0,m)]
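The (i,j,k) notation can be illustrated by enumerating a candidate state space. The constraints below are assumptions inferred from the state labels visible on the slide, not rules stated in the talk: at most one VM is provisioned at a time (j is 0 or 1), provisioned plus running VMs never exceed m, and the queue is empty whenever nothing is being provisioned and the PM still has spare capacity.

```python
def hot_pm_states(L_h, m):
    """Enumerate candidate (i, j, k) states for one hot PM.

    i = jobs in the queue (0..L_h), j = VMs being provisioned (0 or 1),
    k = VMs running (0..m). The filtering rules are assumptions inferred
    from the slide's state labels.
    """
    states = []
    for k in range(m + 1):
        for j in (0, 1):
            if j + k > m:
                continue  # cannot provision beyond the PM's capacity
            for i in range(L_h + 1):
                # Assumed: with spare capacity and nothing being provisioned,
                # a waiting job would start provisioning, so the queue is empty.
                if j == 0 and k < m and i > 0:
                    continue
                states.append((i, j, k))
    return states


states = hot_pm_states(L_h=2, m=2)  # hypothetical small buffer and capacity
print(len(states), states)
```

Under these assumptions the number of states grows as (m + 1)(Lh + 1) + m, that is, linearly in the buffer size for a fixed m.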

Page 13: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

VM provisioning model for each warm PM

[Figure: CTMC state-transition diagram for a warm PM over states (i,j,k), with states ranging from (0,0,0) to (Lw,0,m). With no VM running (k = 0), the diagram also contains states in which j is marked 1* and 1**.]

Page 14: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Output measures from VM provisioning models
- Prob. that a job can be accepted in the hot/warm/cold pool (Ph/Pw/Pc)
- Weighted mean queuing delay for VM provisioning (E[Tq_vm])
- Weighted mean provisioning delay (E[Tprov])

Page 15: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Run-time execution

[Figure: job life-cycle diagram, as on Page 5]

Page 16: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Run-time model

Model outputs: Mean job service time / resource holding time

Page 17: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Output measures from pure performance models

All these models are used for pure performance analysis, since we do not consider any failures.

Output of resource provisioning decision model:
- Job rejection probability due to buffer full (Pblock)
- Job rejection probability due to insufficient capacity (Pdrop)
- Mean queuing delay (E[Tq_dec])
- Mean decision delay (E[Tdecision])

Output of VM provisioning models:
- Probability that at least one machine in the hot/warm/cold pool can accept a job for provisioning; these probabilities are denoted by Ph, Pw and Pc for the hot, warm and cold pool respectively
- Weighted mean queuing delay for VM provisioning (E[Tq_vm])
- Weighted mean provisioning delay (E[Tprov])

Output of run-time model:
- Mean job service time

Output of pure performance models:
- Total job rejection probability (Preject = Pblock + Pdrop)
- Net mean response delay (E[Tresp] = E[Tq_dec] + E[Tdecision] + E[Tq_vm] + E[Tprov])
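The two combined measures are plain sums of the sub-model outputs, which the short sketch below spells out. The function names and every input value are hypothetical and chosen only to show the arithmetic; they are not results from the talk.

```python
def total_rejection_probability(p_block, p_drop):
    """Preject = Pblock + Pdrop."""
    return p_block + p_drop


def net_mean_response_delay(t_q_dec, t_decision, t_q_vm, t_prov):
    """E[Tresp] = E[Tq_dec] + E[Tdecision] + E[Tq_vm] + E[Tprov]."""
    return t_q_dec + t_decision + t_q_vm + t_prov


# Hypothetical sub-model outputs (probabilities, and delays in seconds).
p_reject = total_rejection_probability(p_block=0.01, p_drop=0.02)
t_resp = net_mean_response_delay(t_q_dec=1.5, t_decision=0.5, t_q_vm=2.0, t_prov=300.0)

print(p_reject, t_resp)
```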

Page 18: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh


Availability model

Model outputs: Probability that the cloud service is available, downtime in minutes per year
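The two outputs are related by a unit conversion: the unavailable fraction of time multiplied by the number of minutes in a year. A minimal sketch, assuming a 365-day year; the availability value in the usage line is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, assuming a 365-day year


def downtime_minutes_per_year(availability):
    """Convert steady-state availability (0..1) to expected downtime in minutes per year."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must lie between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


print(round(downtime_minutes_per_year(0.999), 1))
```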

Page 19: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh


Model interactions: Performability
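The conclusions of this talk state that the overall solution is obtained by iterating over the individual sub-model solutions. The sketch below shows the general shape of such a fixed-point iteration on a toy pair of sub-models. Everything in it is a stand-in assumed for illustration: the decision stage is an M/M/1/K queue whose mean service time depends on the hot-pool acceptance probability, and the hot pool is an Erlang loss system. These are not the CTMCs from the slides.

```python
def mm1k_block(lam, mu, K):
    """Blocking probability of an M/M/1/K queue (toy decision sub-model)."""
    rho = lam / mu
    weights = [rho ** i for i in range(K + 1)]
    return weights[K] / sum(weights)


def erlang_b(servers, offered_load):
    """Erlang-B loss probability via the standard recursion (toy pool sub-model)."""
    b = 1.0
    for n in range(1, servers + 1):
        b = offered_load * b / (n + offered_load * b)
    return b


def solve_fixed_point(lam, K, t_hot, t_warm, servers, mu, tol=1e-9, max_iter=1000):
    """Alternate between the two toy sub-models until p_hot stops changing."""
    p_hot = 1.0  # initial guess: the hot pool always accepts
    for iteration in range(1, max_iter + 1):
        # Decision sub-model: a failed hot search adds a warm search (assumed).
        mean_decision = t_hot + (1.0 - p_hot) * t_warm
        p_block = mm1k_block(lam, 1.0 / mean_decision, K)
        # Pool sub-model: only jobs that were not blocked reach the pool.
        lam_eff = lam * (1.0 - p_block)
        new_p_hot = 1.0 - erlang_b(servers, lam_eff / mu)
        if abs(new_p_hot - p_hot) < tol:
            return new_p_hot, p_block, iteration
        p_hot = new_p_hot
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical parameters, in arbitrary consistent time units.
print(solve_fixed_point(lam=10.0, K=10, t_hot=0.01, t_warm=0.05, servers=20, mu=1.0))
```

Each sub-model stays small because it sees the other only through a few scalar outputs, which is the property the interacting sub-models approach relies on.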

Page 20: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh


Numerical Results

Page 21: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh


Effect of increasing job arrival rate

Page 22: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh


Effect of increasing job service time

Page 23: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh


Effect of increasing # VMs

Page 24: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh


Effect of increasing MTTF of a PM

Page 25: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh


Applications of the models

Page 26: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Admission control

Distribution of PMs across different pools (all delays are in seconds)

Arrival rate  (15, 15, 15)        (30, 30, 30)        (45, 45, 0)         (90, 0, 0)
(jobs/hr)     E[Tresp]  E[Tprov]  E[Tresp]  E[Tprov]  E[Tresp]  E[Tprov]  E[Tresp]  E[Tprov]
250           484.37    477.83    314.26    310.27    304.03    300.24    303.79    300.00
500           697.98    656.92    354.87    347.83    312.00    306.62    305.14    300.00
550           5146.12   666.07    363.95    355.66    315.00    309.06    305.54    300.00
600           13825.85  670.52    373.99    364.03    318.42    311.80    306.00    300.00

What is the maximum job arrival rate that can be supported by the cloud service?

Increasing the arrival rate increases the response delay. Adding more PMs reduces this delay.
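One way to read the table for admission control is to fix a response-delay limit and look up the largest tabulated arrival rate that still meets it. The sketch below does that with the E[Tresp] values from the table. The delay limits used in the usage lines are hypothetical, reading each tuple as (hot, warm, cold) PM counts is an assumption, and the result is restricted to the four arrival rates that were tabulated.

```python
# E[Tresp] in seconds, copied from the table above.
# Keys: pool configuration (assumed order: hot, warm, cold), then arrival rate in jobs/hr.
E_T_RESP = {
    (15, 15, 15): {250: 484.37, 500: 697.98, 550: 5146.12, 600: 13825.85},
    (30, 30, 30): {250: 314.26, 500: 354.87, 550: 363.95, 600: 373.99},
    (45, 45, 0): {250: 304.03, 500: 312.00, 550: 315.00, 600: 318.42},
    (90, 0, 0): {250: 303.79, 500: 305.14, 550: 305.54, 600: 306.00},
}


def max_supported_rate(pools, max_delay):
    """Largest tabulated arrival rate whose E[Tresp] stays within max_delay, or None."""
    feasible = [rate for rate, delay in E_T_RESP[pools].items() if delay <= max_delay]
    return max(feasible) if feasible else None


# Hypothetical SLA limit of 700 seconds on the mean response delay.
print(max_supported_rate((15, 15, 15), 700.0))
print(max_supported_rate((30, 30, 30), 700.0))
```

For the smallest configuration the limit is already exceeded at 550 jobs/hr, while the (30, 30, 30) configuration stays within it across the whole tabulated range.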

Page 27: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Response time – energy trade-off

Distribution of PMs across different pools (all delays are in seconds)

Arrival rate  (15, 15, 15)        (30, 30, 30)        (45, 45, 0)         (90, 0, 0)
(jobs/hr)     E[Tresp]  E[Tprov]  E[Tresp]  E[Tprov]  E[Tresp]  E[Tprov]  E[Tresp]  E[Tprov]
250           484.37    477.83    314.26    310.27    304.03    300.24    303.79    300.00
500           697.98    656.92    354.87    347.83    312.00    306.62    305.14    300.00
550           5146.12   666.07    363.95    355.66    315.00    309.06    305.54    300.00
600           13825.85  670.52    373.99    364.03    318.42    311.80    306.00    300.00

What is the optimal # PMs across different pools that minimizes response time for a given energy budget?

Increasing capacity reduces the gap between actual provisioning delay and response delay.

Page 28: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

SLA driven capacity planning

What should be the size of each pool, so that total cost is minimized and SLA (maximum rejection probability or response delay) is upheld?
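A brute-force version of this question can be sketched over the four pool configurations tabulated earlier in the talk: keep the configurations that meet a response-delay limit, then pick the cheapest. The E[Tresp] values are the 600 jobs/hr row of that table. The per-PM cost weights and the SLA limits are hypothetical, reading each tuple as (hot, warm, cold) is an assumption, and the rejection-probability side of the SLA is left out because the table does not report it.

```python
# E[Tresp] in seconds at 600 jobs/hr, from the table on the admission control slide.
E_T_RESP_AT_600 = {
    (15, 15, 15): 13825.85,
    (30, 30, 30): 373.99,
    (45, 45, 0): 318.42,
    (90, 0, 0): 306.00,
}

# Hypothetical relative cost per PM in the (hot, warm, cold) pools.
UNIT_COST = (1.0, 0.5, 0.1)


def pool_cost(pools, unit_cost=UNIT_COST):
    """Total cost of a configuration under the hypothetical per-PM costs."""
    return sum(n * c for n, c in zip(pools, unit_cost))


def cheapest_config(max_delay):
    """Cheapest tabulated configuration whose E[Tresp] meets the SLA limit, or None."""
    feasible = [p for p, delay in E_T_RESP_AT_600.items() if delay <= max_delay]
    return min(feasible, key=pool_cost) if feasible else None


print(cheapest_config(400.0))  # hypothetical SLA: mean response delay of at most 400 s
print(cheapest_config(310.0))  # a tighter hypothetical SLA
```

A real planner would replace the lookup table with calls to the performance models and search over many more pool sizes; the structure of the search stays the same.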

Page 29: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh


Recent work on IaaS cloud resiliency

Page 30: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Resiliency Analysis

Definition of resiliency: Resiliency is the persistence of service delivery that can justifiably be trusted when facing changes*

Changes of interest in the context of IaaS cloud:
- Increase in workload, faultload
- Decrease in system capacity
- Security attacks
- Accidents or disasters

Our contributions:
- Quantifying resiliency of IaaS cloud
- Resiliency analysis approach using performance analysis models

* [1] J. Laprie, "From Dependability to Resiliency", DSN 2008
  [2] L. Simoncini, "Resilient Computing: An Engineering Discipline", IPDPS 2009

Page 31: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh


Effect of changing demand

Page 32: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh


Effect of changing capacity

Page 33: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh

Conclusions

- Stochastic models can be an inexpensive alternative to measurement-based evaluation of cloud QoS
- To reduce the complexity of modeling, we use an interacting sub-models approach; the overall solution of the model is obtained by iterating over the individual sub-model solutions
- The proposed approach is general and applicable to a variety of IaaS clouds
- Results show that IaaS cloud service quality is affected by variations in workload (job arrival rate, job service rate), faultload (machine failure rate) and available system capacity
- This approach can be extended to solve specific cloud problems such as capacity planning of public, private and hybrid clouds
- In the future, the models will be validated using real data collected from clouds

Page 34: Performance and Availability Models for  IaaS Cloud and Their Applications Rahul Ghosh


Thanks!