Optimization of IaaS Cloud including Performance ...

76
1 Optimization of IaaS Cloud including Performance, Availability, Power Analysis Prof. Kishor S. Trivedi Duke High Availability Assurance Lab (DHAAL) Department of Electrical and Computer Engineering Duke University, Durham, NC 27708-0291 Phone: (919) 660-5269 E-mail: [email protected] URL: www.ee.duke.edu/~ktrivedi Networking 2014 Trondheim, Norway June 2, 2014 Copyright © 2014 by K.S. Trivedi

Transcript of Optimization of IaaS Cloud including Performance ...

Page 1: Optimization of IaaS Cloud including Performance ...

1

Optimization of IaaS Cloud including

Performance, Availability, Power Analysis

Prof. Kishor S. TrivediDuke High Availability Assurance Lab (DHAAL)

Department of Electrical and Computer EngineeringDuke University, Durham, NC 27708-0291

Phone: (919) 660-5269 E-mail: [email protected]: www.ee.duke.edu/~ktrivedi

Networking 2014

Trondheim, NorwayJune 2, 2014

Copyright © 2014 by K.S. Trivedi

Page 2: Optimization of IaaS Cloud including Performance ...

2

Duke University

Research Triangle Park (RTP)

UNC-CH

Duke

NC state

USANorth

Carolina

2

Copyright © 2014 by K.S. Trivedi

Page 3: Optimization of IaaS Cloud including Performance ...

3

HARP (NASA), SAVE (IBM),

IRAP (Boeing)

SHARPE, SPNP, SREPT

Theory

Applications

Books:Blue, Red,

White

Software

Packages

Stochastic modeling methods & numerical solution methods:

•Large Fault trees, Stochastic Petri Nets,

•Large/stiff Markov & non-Markov models

•Fluid stochastic Petri Nets

•Performability & Markov reward models

•Software aging and rejuvenation

•Attack countermeasure trees

DHAAL Research Triangle

Reliability/availability/performanceAvionics, Space, Power systems, Transportation systems,Automobile systems Computer systems (hardware/software)Telco systemsComputer NetworksVirtualized Data centerCloud computing

Copyright © 2014 by K.S. Trivedi

Page 4: Optimization of IaaS Cloud including Performance ...

4

Books Authored by TrivediProbability and Statistics with Reliability, Queuing, and Computer Science Applications, first edition, Prentice-Hall, 1982; Second edition, John Wiley, 2001 (Bluebook)

Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package, Kluwer, 1996 (Redbook)

Queuing Networks and Markov Chains, John Wiley, first edition, 1996; second edition, 2006 (White book)

Copyright © 2014 by K.S. Trivedi

Page 5: Optimization of IaaS Cloud including Performance ...

5

Overview of Cloud Computing

Cloud Capacity Planning

� Availability Model for IaaS Cloud

� Performance Model for IaaS Cloud

� Power Model for IaaS Cloud

Talk outline

Copyright © 2014 by K.S. Trivedi

Page 6: Optimization of IaaS Cloud including Performance ...

6

An Overview of Cloud Computing

Copyright © 2014 by K.S. Trivedi

Page 7: Optimization of IaaS Cloud including Performance ...

7

On-demand self-service: Provisioning of computing capabilities without human intervention

Resource pooling:Shared physical and virtualized environment

Rapid elasticity:Through standardization and automation, quick scaling

Metered Service: Pay-as-you-go model of computing

Source: P. Mell and T. Grance, “The NIST Definition of Cloud Computing”, October 7, 2009

Key characteristics

Many of these characteristics are borrowed from Cloud’s predecessors!

Copyright © 2014 by K.S. Trivedi

Page 8: Optimization of IaaS Cloud including Performance ...

8

Time line of evolution

Evolution of cloud computing

*Source: http://seekingalpha.com/article/167764-tipping-point-gartner-annoints-cloud-computing-top-strategic-technology

Cluster computing

Grid computing

Utility computing

Cloud computing

Early 80s

Early 90s

Around 2000

Around 2005-06

Copyright © 2014 by K.S. Trivedi

Page 9: Optimization of IaaS Cloud including Performance ...

9

Infrastructure-as-a-Service (IaaS) Cloud: Examples: Amazon EC2

Platform-as-a-Service (PaaS) Cloud: Examples: Microsoft Windows Azure, Google AppEngine

Software-as-a-Service (SaaS) Cloud:Examples: Gmail, Google Docs

Cloud Service models

Copyright © 2014 by K.S. Trivedi

Page 10: Optimization of IaaS Cloud including Performance ...

10

Private Cloud: � Cloud infrastructure solely for an organization

� Managed by the organization or third party

� May exist on premise or off-premise

Public Cloud:� Cloud infrastructure available for use for general users

� Owned by an organization providing cloud services

Hybrid Cloud:� Composition of two or more clouds (private or public)

Deployment models

Copyright © 2014 by K.S. Trivedi

Page 11: Optimization of IaaS Cloud including Performance ...

11

Stochastic Model Driven Capacity Planning for an IaaS Cloud, R. Ghosh, F. Longo, R. Xia, V. Naik, and K. Trivedi, IEEE Trans. On Services Computing, 2014 (to appear)

Copyright © 2014 by K.S. Trivedi

Page 12: Optimization of IaaS Cloud including Performance ...

12

What is the optimal

#PMs so that total

cost is minimized?

SLA driven capacity planning

Large sized cloud, large # configurations to searchCopyright © 2014 by K.S. Trivedi

Page 13: Optimization of IaaS Cloud including Performance ...

13

Capacity Planning Problem

Determine the number of Physical Machines

That

Minimize the overall cost

Copyright © 2014 by K.S. Trivedi

Page 14: Optimization of IaaS Cloud including Performance ...

14

Joint work withRahul Ghosh, Ruofan Xia and Dong Seong Kim (Duke),

Francesco Longo (Univ. of Messina) Vijay Naik, Murthy Devarakonda and Daniel Dias

(IBM T. J. Watson Research Center)

Duke/IBM project on cloud computing

Copyright © 2014 by K.S. Trivedi

Page 15: Optimization of IaaS Cloud including Performance ...

15

Cost components

Capital Expenditure (CapEx)

• Infrastructure cost

Operational Expenditure (OpEx)

• Penalty due to violation of different SLA metrics

– Cost of job rejection due to insufficient resources

– Cost of downtime

• Cost of carrying out repairs

• Power usage cost

Copyright © 2014 by K.S. Trivedi

Page 16: Optimization of IaaS Cloud including Performance ...

16

Three Pools of Servers (PMs)

Copyright © 2014 by K.S. Trivedi

To reduce power usage costs, physical machines are

divided into three pools [IBM Research Cloud]

� Hot pool (high performance & high power usage)

� Warm pool (medium performance & power usage)

� Cold pool (lowest performance & power usage)

Page 17: Optimization of IaaS Cloud including Performance ...

17

System Operation Details

Copyright © 2014 by K.S. Trivedi

Failure/Repair (Availability):

� Servers may fail and get repaired.

� A minimum number of operational hot servers are required for the system to function.

� Servers in other pools may be temporarily assigned to the hot pool to maintain system

operation (migration).

Job Arrival/Service (Performance):

� New jobs may be rejected if existing workload is heavy so all resources are occupied.

Server operation consumes power depending on the server status (i.e., the pool it is in and the number of active VMs)

Page 18: Optimization of IaaS Cloud including Performance ...

18

Optimization Problem

Determine the number of PMs in each pool: nh , nw , nc so as to minimize CapEx(nh , nw , nc) + OpEx(nh , nw , nc)

� nh , nw , nc: number of servers in the hot, warm, cold pool

CapEx function can be “easily” determined

OpEx for each vector of PMs in each pool needs to be computed

For a search-based optimization algorithm, this OpEx computation needs to be done many times

� We need an efficient algorithm for doing this

Scalable models are developed for such a computation

Copyright © 2014 by K.S. Trivedi

Page 19: Optimization of IaaS Cloud including Performance ...

19

High level view of developed models for OpEx

Copyright © 2014 by K.S. Trivedi

Performance model

Availability model

Downtime

cost

Repair

costJob

rejection

cost

OpEx

Power & cooling

cost

• Number of servers

in the three pools;

• System operation

period length.

• Mean time to failure/

repair of servers in

the three pools;

• Unit downtime cost;

• Unit repair cost

• External Job arrival rate;

• Mean job execution time;

• Power consumption of a

given server

• Unit rejection cost;

• Unit power cost

Page 20: Optimization of IaaS Cloud including Performance ...

20

Cost Component – Part I

Infrastructure cost:

Cf : cost of each server

nh , nw , nc : number of servers in the hot, warm, cold pool

ns : the number of servers on a chassis

C’f : the cost of a server chassis

Rejection cost:

ρreject : task rejection rate from the performance model

Ct : cost of each task rejection

L : length of the operation period

Copyright © 2014 by K.S. Trivedi

Page 21: Optimization of IaaS Cloud including Performance ...

21

Cost Components – Part II

Repair cost:

rh, rw, rc : Mean number of repairs per time unit in the hot, warm and cold pool respectively

Cr : Cost of each repair

L: Length of time of operation

Downtime cost:

Cd : Revenue loss per time unit because of downtime

DT: Total steady state downtime in minutes for the Cloud during operational period L (hr) (from Availability model)

DTth: Threshold on downtime beyond which downtime cost is incurred

Copyright © 2014 by K.S. Trivedi

Page 22: Optimization of IaaS Cloud including Performance ...

22

Cost Components – Part III

Power and cooling cost:

ph(x), pw(y), pc(z): the probability that there are x, y, z servers in the hot, warm, cold pool respectively. Computed from the availability model.

W(x, y, z): the power consumption when there are x, y, z servers running in the hot, warm, cold respectively. Computed from the performance model.

Cp: cost of per unit of power consumption

L: length of the operation period

Copyright © 2014 by K.S. Trivedi

Page 23: Optimization of IaaS Cloud including Performance ...

23

zyx ′′′′′′ ,,zyx ′′′ ,,zyx ,,

Performance model

zyxW ,,

zyxW ′′′ ,,

zyxW ′′′′′′ ,,

States of availability model

Overall power consumption and cooling cost as expected steady state reward rates

… …

Power and cooling cost

Page 24: Optimization of IaaS Cloud including Performance ...

24

Optimization Problems and Solution Approach

The problems are nonlinear and (in general) non-convex.

We use Simulated Annealing but other search algorithms

can be used as well.

For each vector of values of the number of severs in

each pool, we need and efficient method of computing

the job rejection probability, downtime and the power

usage cost � scalable availability, performance and

power models are needed

Copyright © 2014 by K.S. Trivedi

Page 25: Optimization of IaaS Cloud including Performance ...

25

Optimal configurations in different problem instances

Sample Results

Copyright © 2014 by K.S. Trivedi

Page 26: Optimization of IaaS Cloud including Performance ...

26

Comparison with intuition based approach

Consider a case where OpEx involves only:

� Power consumption and cooling costs

An example of intuition based capacity planning in such

scenario:

� More PMs in hot pool higher power cost

� More PMs in cold pool lower power cost

In our previous paper (DSN workshop DCDV 2011), we

showed, such intuition based approach does not always hold true

� When the workload arrival rate is high, PMs in the cold pool will act as PMs in the hot pool

� Cold pool power consumption will be almost same as the hot pool

Copyright © 2014 by K.S. Trivedi

Page 27: Optimization of IaaS Cloud including Performance ...

27

How do we develop scalable performance and

availability and power models to compute OpEx ?

Copyright © 2014 by K.S. Trivedi

Page 28: Optimization of IaaS Cloud including Performance ...

28

Our goals in the IBM Cloud project

Develop a comprehensive analytic modeling approach

High fidelity

Scalable and tractable

Copyright © 2014 by K.S. Trivedi

Page 29: Optimization of IaaS Cloud including Performance ...

29

Our approach

Monolithic analytic (Markov) models will not work as

they will suffer largeness and hence not scalable

Our approach:� overall system model decomposed into a set of sub-models

� sub-model solutions composed via an interacting Markov chain approach

� Fixed-Point problem solved via successive substitution

� scalable and tractable

Copyright © 2014 by K.S. Trivedi

Page 30: Optimization of IaaS Cloud including Performance ...

30

Scalable Analytic Model for IaaS Cloud

Availability and Downtime[paper in IEEE Trans. On Cloud Computing 2014]

Copyright © 2014 by K.S. Trivedi

Page 31: Optimization of IaaS Cloud including Performance ...

31

Analytic model

Markov model (CTMC) is too large to construct by hand.

� The number of PMs in each pool can be large

� PMs can migrate among pools

We use a high level formalism of stochastic Petri net (the flavor

known as stochastic reward net (SRN)).

SRN models can be automatically converted into underlying

Markov (reward) model and solved for the measures of interest

such as DT (downtime)

For very large number of PMs even decomposed models are not

enough; we resort to discrete-even simulation; same SRN model

can be simulated via our software package (SPNP)

Copyright © 2014 by K.S. Trivedi

Page 32: Optimization of IaaS Cloud including Performance ...

32

Monolithic SRN Model

Copyright © 2014 by K.S. Trivedi

Page 33: Optimization of IaaS Cloud including Performance ...

33

Monolithic Model

Copyright © 2014 by K.S. Trivedi

Monolithic SRN model is automatically translated into CTMC or

Markov Reward Model

However the model not scalable as state-space size of this model is

extremely large

#PMs per pool #states #non-zero matrix entries

3 10, 272 59, 560

4 67,075 453, 970

5 334,948 2, 526, 920

6 1,371,436 11, 220, 964

7 4,816,252 41, 980, 324

8 Memory overflow Memory overflow

10 - -

Page 34: Optimization of IaaS Cloud including Performance ...

34

Decompose into Interacting Sub-models

SRN sub-model for hot pool

SRN sub-model for warm poolSRN sub-model for cold pool

Copyright © 2014 by K.S. Trivedi

Page 35: Optimization of IaaS Cloud including Performance ...

35

Import graph and model outputs

Model outputs:

� mean number of PMs in each pool (E[#Ph], E[#Pw], and E[#Pc])

� Downtime in minutes per year

Copyright © 2014 by K.S. Trivedi

Page 36: Optimization of IaaS Cloud including Performance ...

36

Many questions

Existence of Fixed Point (easy)

Uniqueness

Rate of convergence

Accuracy

Scalability

Copyright © 2014 by K.S. Trivedi

Page 37: Optimization of IaaS Cloud including Performance ...

37

Monolithic vs. interacting sub-models

#states, #non-zero entries

Copyright © 2014 by K.S. Trivedi

Page 38: Optimization of IaaS Cloud including Performance ...

38

Monolithic vs. interacting sub-models

Downtime [minutes per year]

� k is the #PM in hot pool

to have the Cloud

available

� results differ only after the

8th significant figure (not

reported in table)

Copyright © 2014 by K.S. Trivedi

Page 39: Optimization of IaaS Cloud including Performance ...

39

Interacting sub-models vs. simulation

Mean number of non-failed PMs

� n is the initial #PMs in each pool

Numeric solution of interacting sub-models in the c.i.

of simulation solution

Copyright © 2014 by K.S. Trivedi

Page 40: Optimization of IaaS Cloud including Performance ...

40

Analytic-Numeric vs. simulative solutions

Downtime [minutes per year]

Copyright © 2014 by K.S. Trivedi

Page 41: Optimization of IaaS Cloud including Performance ...

41

Analytic-Numeric vs. simulative solution

Solution times [seconds]

Copyright © 2014 by K.S. Trivedi

Page 42: Optimization of IaaS Cloud including Performance ...

42

Availability Model Summary

Copyright © 2014 by K.S. Trivedi

Page 43: Optimization of IaaS Cloud including Performance ...

43

Performance Modeling and

Analysis for IaaS Cloud[paper in Proc. IEEE PRDC 2010; FGCS 2013]

Copyright © 2014 by K.S. Trivedi

Page 44: Optimization of IaaS Cloud including Performance ...

44

System model

Current Assumptions [will be relaxed soon]

� Homogenous requests

� All physical machines (PMs) are identical.

Copyright © 2014 by K.S. Trivedi

Page 45: Optimization of IaaS Cloud including Performance ...

Provisioning and servicing steps:

(i) resource provisioning decision,

(ii) VM provisioning and

(iii) run-time execution

Life-cycle of a job inside a IaaS cloud

Run-time

Execution

Job rejection

due to buffer full

Job rejection due to

insufficient capacity

Arrival Queuing InstantiationVM

deploymentActual Service Out

Resource

Provisioning

Decision

Engine

Provisioning

Decision

Provisioning response delay

Copyright © 2014 by K.S. Trivedi

Page 46: Optimization of IaaS Cloud including Performance ...

Resource provisioning decision engine (RPDE)

Run-time

Execution

Job rejection

due to buffer full

Job rejection due to

insufficient capacity

Arrival Queuing InstantiationVM

deploymentActual Service Out

Resource

Provisioning

Decision

Engine

Provisioning

Decision

Provisioning response delay

Copyright © 2014 by K.S. Trivedi

Page 47: Optimization of IaaS Cloud including Performance ...

47

Flow-chart:

Resource provisioning decision engine (RPDE)

Copyright © 2014 by K.S. Trivedi

Page 48: Optimization of IaaS Cloud including Performance ...

48Copyright © 2014 by K.S. Trivedi

CTMC model for RPDE

0,0 0,h

λ1,h

λ

1,w

λ

1,c

λ

hhPδ hhPδ hhPδ hhPδ

0,w

)1( hh P−δ )1( hh P−δ )1( hh P−δ

)1( cc P−δ

)1( cc P−δ

0,c

)1( ww P−δ )1( ww P−δ )1( ww P−δ

)1( cc P−δ

)1( cc P−δwwPδ

wwPδ

wwPδ

wwPδ

ccPδccPδ

ccPδ

ccPδ

λN-1,h

λ…

λλN-1,w

N-1,cλ λ

i = number of jobs in queue,

s = pool (hot, warm or cold)i,s

Page 49: Optimization of IaaS Cloud including Performance ...

49

Generator Matrix of the RPDE model

Copyright © 2014 by K.S. Trivedi

"

3

"

2

"

1

3

2

1

3

2

1

3

2

1

3

2

1

0

*,1

)1(*,1

)1(*,1

*,2

)1(*,2

)1(*,2

*,2

)1(*,2

)1(*,2

*,1

)1(*,1

)1(*,1

*,0

)1(*,0

)1(*,0

*0,0

,1,1,1,2,2,2,2,2,2,1,1,1,0,0,00,0

c

wwww

hhhh

ww

hh

c

wwww

hhhh

c

wwww

hhhh

c

wwww

hhhh

cN

PPwN

PPhN

cN

PwN

PhN

c

PPw

PPh

c

PPw

PPh

c

PPw

PPh

cNwNhNcNwNhNcwhcwhcwh

δ

δδ

δδ

λ

λδ

λδ

δ

δδ

δδ

λδ

λδδ

λδδ

λδ

λδδ

λδδ

λ

−−

−−

−−

−−

−−−−−−

OM

O

O

O

L

The generator matrix possesses significant structure and may be solved through matrix geometric method.

Page 50: Optimization of IaaS Cloud including Performance ...

50

Closed form solution of RPDE sub-model

Copyright © 2014 by K.S. Trivedi

Let,

)1( ww PZ

−=

δ

λ

)1( ww

c

PX

+=

δ

λδ

)1( hh

w

PY

+=

δ

δλ

)1( hh PW

−=

δ

λ

It can be shown:

Page 51: Optimization of IaaS Cloud including Performance ...

51

Closed form solution of RPDE sub-model

Copyright © 2014 by K.S. Trivedi

Page 52: Optimization of IaaS Cloud including Performance ...

52

Closed form solution of RPDE sub-model

Copyright © 2014 by K.S. Trivedi

Similarly, other state probabilities can be derived in terms of )0,0(π

Where,

Finally, normalization is provided by:

Page 53: Optimization of IaaS Cloud including Performance ...

53

RPDE model: parameters & measures

Input Parameters:� –arrival rate: data collected from cloud

� – mean search delays for resource provisioning

decision engine: from searching algorithms or measurements

� – probability of being able to provision: computed from

VM provisioning model

� N – maximum # jobs in RPDE: from system/server specification

Output Measures:

� Job rejection probability due to buffer full (Pblock)

� Job rejection probability due to insufficient capacity (Pdrop)

� Sum of the above two is the overall rejection probability (ρreject )

� Mean decision delay for an accepted job (E[Tdecision])

� Mean queuing delay for an accepted job (E[Tq_dec])

cwh δδδ /1,/1,/1

λ

cwh PPP ,,

Copyright © 2014 by K.S. Trivedi

Page 54: Optimization of IaaS Cloud including Performance ...

VM provisioning

Run-time

Execution

Job rejection

due to buffer full

Job rejection due to

insufficient capacity

Arrival Queuing InstantiationVM

deploymentActual Service Out

Resource

Provisioning

Decision

Engine

Provisioning

Decision

Provisioning response delay

Copyright © 2014 by K.S. Trivedi

Page 55: Optimization of IaaS Cloud including Performance ...

55

VM provisioning model

Service

out

Resource

Provisioning

Decision

Engine

Accepted jobs

Running VMs

Idle resources on hot machine

Idle resources on warm machine

Idle resources on cold machine

Hot PMpool

Warm pool

Cold pool

Hot PM

Copyright © 2014 by K.S. Trivedi

Page 56: Optimization of IaaS Cloud including Performance ...

56

VM provisioning model for each hot PM

0,0,0 0,1,0 Lh,1,0

0,0,1 (Lh-1),1,1 Lh,1,1

1,0,m Lh,0,m

Lh,1,(m-1)

0,0,m

(Lh-1),1,(m-1)0,0,(m-1) 0,1,(m-1)

hλ hλ hλ

hλhλ hλ

hλhλ

hλhλ

hλhλhλ

hβ hβ

hβhβ hβ

hβhβ

hβ hβhβ hβ

µ

µ µ µ

µ2µ2

µ2

µ)1( −m

µm

µ)1( −m

µmµm

µ)1( −mµ)1( −m

……

Lh is the buffer size and m is max. # VMs that can run simultaneously on a PM

i,j,k i = number of jobs in the queue, j = number of VMs being provisioned,

k = number of VMs running

Copyright © 2014 by K.S. Trivedi

Page 57: Optimization of IaaS Cloud including Performance ...

57

Generator Matrix of the hot PM model

Copyright © 2014 by K.S. Trivedi

22

21

20

13

12

11

10

13

12

11

10

13

12

11

10

03

02

01

0

*2312

*311

*310

*3303

*2212

*211

*210

*3203

*2112

*111

*110

*3103

*2012

*011

*010

*3003

*2002

*001

*000

312311310303212211210203112111110103012011010003002001000

µβ

µβ

β

µ

λµβ

λµβ

λβ

λµ

λµβ

λµβ

λβ

λµ

λµβ

λµβ

λβ

λµ

λµ

λµ

λ

The generator matrix when Lh

= 3 and m = 3. It possesses a block structure that facilitates a matrix geometric solution.

Page 58: Optimization of IaaS Cloud including Performance ...

58

Input Parameters:

� can be measured experimentally� obtained from the lower level run-time model� obtained from the resource provisioning decision model

Hot pool model is the set of independent hot PM models

Output Measure:

� = prob. that a job is accepted in the hot pool =

where, is the steady state probability that a PM can not accept job for provisioning - from the solution of the Markov model of a hot PM on the previous slide

h

blockh

n

P )1( −=

λλ

hβ/1

µ/1

hP h

hh

nh

mL

m

i

h

iL )(1 )(

),0,(

1

0

)(

),1,( ϕϕ +− ∑−

=

blockP

VM provisioning model (for each hot PM)

hn

)( )(

),0,(

1

0

)(

),1,(

h

mL

m

i

h

iL hhϕϕ +∑

=

Copyright © 2014 by K.S. Trivedi

Page 59: Optimization of IaaS Cloud including Performance ...

59

VM provisioning model for each warm PM

0,0,0 0,1*,0 Lw,1*,0

0,0,1 (Lw-1),1,1 Lw,1,1

hβhβ

µ

µµ

µ

µ2 µ2µ2

1,0,m Lw,0,m

Lw,1,(m-1)

0,0,m

(Lw-1),1,(m-

1)0,0,(m-1) 0,1,(m-1)

hβhβ hβ

hβ hβhβ hβ

µ)1( −m

µm

µ)1( −m

µmµm

µ)1( −mµ)1( −m…

… …

0,1,0 Lw,1,0

0,1**,

0 Lw, 1**,0

wλ wλ

wλwλ

wγ wγ

wβhβhβ

wλ…

wλ wλ

wλwλ

wλwλ

wλ wλwλ

0,1,1

Copyright © 2014 by K.S. Trivedi

Page 60: Optimization of IaaS Cloud including Performance ...

60

Generator Matrix for warm pool model

Copyright © 2014 by K.S. Trivedi

22

21

**

13

12

11

10

**

13

12

11

10

**

13

12

11

10

**

13

12

11

*

01

00

*

01

00

*

01

00

*

************

*2312

*311

031

*3303

*2212

*211

*021

*3203

*2112

*111

*011

*3103

*2012

*011

*001

*3003

*2002

*001

000

310

031

*210

*021

*110

*011

*010

*001

312311031303212211021203112111011103012011001003002001000310031210021110011010001

µβ

µβ

ββ

µ

λµβ

λµβ

λβ

λµ

λµβ

λµβ

λβ

λµ

λµβ

λµβ

λβ

λµ

λµ

λµ

λλ

ββ

γγ

βλ

λγ

βλ

λγ

βλ

λγ

h

h

hh

wh

wh

wh

w

wh

wh

wh

w

wh

wh

wh

w

w

w

ww

ww

ww

ww

ww

ww

ww

ww

ww

Lw

= 3 and m = 3. The matrix may be solved using matrix-analytical method.

Page 61: Optimization of IaaS Cloud including Performance ...

61

VM provisioning model for each cold PM

0,0,0 0,1*,0Lc,1*,

0

0,0,1 (Lc-1),1,1Lc,1,

1

hβhβ

µ

µµ

µ

µ2 µ2µ2

1,0,

m Lc,0,m

Lc,1,(m-1)

0,0,

m

(Lc-1),1,(m-1)0,0,(m-1) 0,1,(m-1)

hβhβ hβ

hβ hβhβ hβ

µ)1( −m

µm

µ)1( −m

µmµm

µ)1( −mµ)1( −m…

… …

0,1,0 Lc,1,0

0,1**,

0Lc,

1**,0

hβhβ

cλ cλ

cλcλ

cλcλ

cλ cλ

cλcλ

cλ cλ

cλ cλcλ

0,1,1cλ

Copyright © 2014 by K.S. Trivedi

Page 62: Optimization of IaaS Cloud including Performance ...

62

Warm/cold PM model is similar to hot PM, except:� Effective job arrival rate

� For first job, warm/cold PM requires additional start-up time

� Mean provisioning delay for a VM for the first job is longer

Outputs of hot, warm and cold pool models:� Probabilities ( ) that at least one PM in hot/warm/cold

pool can accept a job

VM provisioning model: Summary

cwh PPP ,,

Copyright © 2014 by K.S. Trivedi

Page 63: Optimization of IaaS Cloud including Performance ...

63

Import graph for performance models

VM

provisioning

models

hP

cP

hP

Hot pool

model

Warm pool

model

Cold pool

model

RPDE model

blockP

hP

wP

wP

blockP

blockP

job rejection probability

and mean response delay

Copyright © 2014 by K.S. Trivedi

Page 64: Optimization of IaaS Cloud including Performance ...

64

To solve hot, warm and cold PM models, we need from resource

provisioning decision model

To solve provisioning decision model, we need from hot, warm

and cold pool model respectively

This leads to a cyclic dependency among the resource provisioning

decision model and VM provisioning models (hot, warm, cold)

We resolve this dependency via fixed-point iteration

Observe, our fixed-point variable is and corresponding fixed-point

equation is of the form:

Fixed-point iteration

blockP

cwhPPP ,,

blockP

)(blockblock

PfP =

Copyright © 2014 by K.S. Trivedi

Page 65: Optimization of IaaS Cloud including Performance ...

65

Many questions

Existence of Fixed Point (easy)

Uniqueness

Rate of convergence

Accuracy

Scalability

Copyright © 2014 by K.S. Trivedi

Page 66: Optimization of IaaS Cloud including Performance ...

66

1 PM per pool and 1 VM per PM

The error is between e-03 and e-07 for all the results.

The number of states in monolithic model is 912 while in ISP model it is 21

Performance measures comparison with monolithic model

Jobs/hr Mean RPDE queue length Rejection probability

IMC monolithic IMC Monolithic

1 9.0332e-07 9.2321e-07 9.8899e-06 1.1221e-03

5 4.1622e-05 4.3364e-05 4.2334e-02 8.0500e-02

10 2.3731e-04 2.4225e-04 2.3496e-01 2.6587e-01

15 6.3539e-04 6.4377e-04 3.9860e-01 4.1493e-01

20 1.2526e-03 1.2655e-03 5.1069e-01 5.1969e-01

25 2.0990e-03 2.1179e-03 5.8915e-01 5.9449e-01

30 3.1826e-03 3.2091e-01 6.4648e-01 6.4985e-01

35 4.5106e-03 4.5462e-03 6.8999e-01 6.9223e-01

Copyright © 2014 by K.S. Trivedi

Page 67: Optimization of IaaS Cloud including Performance ...

67

Power Quantification for IaaS Cloud[paper in Proc. IEEE/IFIP DSN workshop DCDV 2011]

Copyright © 2014 by K.S. Trivedi

Page 68: Optimization of IaaS Cloud including Performance ...

68

Hot PM idle power consumption (no VM): hl

Additional power consumption of each running VM with average resource utilization: va

For each state (i, j, k) of the CTMC, we assign a reward rate: r(i, j, k) = hl + kva

Power Consumption from Hot PM Model

0,0,0 0,1,0 Lh,1,0

0,0,1(Lh-1),1,1

Lh,1,1

1,0,m Lh,0,m

Lh,1,(m-1)

0,0,m

(Lh-1),1,(m-1)

0,0,(m-1) 0,1,(m-1)

hλ hλ hλ

hλhλ hλ

hλhλ

hλhλ

hλhλhλ

hβ hβ

hβhβ hβ

hβhβ

hβ hβhβ hβ

µ

µ µ µ

µ2µ2

µ2

µ)1( −m

µm

µ)1( −m

µmµm

µ)1( −mµ)1( −m

… …

Copyright © 2014 by K.S. Trivedi

Page 69: Optimization of IaaS Cloud including Performance ...

69

Power Consumption from Warm PM Model

llll hwww ≤≤≤321

Warm PM CTMC states Reward rates

Copyright © 2014 by K.S. Trivedi

Page 70: Optimization of IaaS Cloud including Performance ...

70

Power Consumption from Cold PM Model

llll hccc ≤≤≤321

Net power consumption is sum of power consumptions in hot, warm and cold pool

Cold PM CTMC states Reward rates

Copyright © 2014 by K.S. Trivedi

Page 71: Optimization of IaaS Cloud including Performance ...

71

zyx ′′′′′′ ,,zyx ′′′ ,,zyx ,,

Performance model

zyxP ,,

zyxP ′′′ ,,

zyxP ′′′′′′ ,,

States of availability model

Overall power consumption and cooling cost as expected steady state reward rates

… …

Power and cooling cost

Page 72: Optimization of IaaS Cloud including Performance ...

72

Conclusions

Copyright © 2014 by K.S. Trivedi

Page 73: Optimization of IaaS Cloud including Performance ...

73

Conclusions

Analytic models are powerful for the construction and numerical solution of various reliability, availability, performance, and power models

For very complex systems such as clouds, hierarchical, fixed-point iterative and approximate solutions are needed.� Performance, availability and power consumption analysis can be

done using such an approach

Simulative and hybrid models/solutions should be used when absolutely necessary

Models can then be used in � capacity planning

� a feedback control setting for adapting to changes

Copyright © 2014 by K.S. Trivedi

Page 74: Optimization of IaaS Cloud including Performance ...

74

� Performance analysis:

� Developed scalable interacting stochastic sub-models for large Clouds

� Analysis of provisioning delay and impact of different factors, e.g., arrival rate, system capacity, resource holding time etc.

� R. Ghosh, F. Longo, V. K. Naik, and K. S. Trivedi, “Modeling and Performance Analysis of Large Scale IaaS Clouds”, Elsevier Future Generation Computing Systems, July 2013.

Summary

Copyright © 2014 by K.S. Trivedi

Page 75: Optimization of IaaS Cloud including Performance ...

75

� Scalable Availability analysis:

� Interacting stochastic sub-models for failure-repair analysis

� F. Longo, R. Ghosh, V. K. Naik, and K. S. Trivedi, “A Scalable Availability Model for Infrastructure-as-a-Service Cloud”, DSN, June 2011.

� R. Ghosh, F. Longo, F. Frattini, S. Russo and K. S. Trivedi, “Scalable Analytics for IaaS Cloud Analytics”, IEEE Trans. On Cloud Computing, accepted Feb 2014.

� Cost analysis, optimization and Cloud capacity planning:

� Developed and solved optimization problems to minimize the total cost without violating the SLAs

� R. Ghosh, F. Longo, R. Xia, V. K. Naik, and K. S. Trivedi, “Stochastic Model Driven Capacity Planning for an Infrastructure-as-a-Service Cloud”, IEEE Trans. On Services Computing, accepted August 2013.

Summary (contd.)

Copyright © 2014 by K.S. Trivedi

Page 76: Optimization of IaaS Cloud including Performance ...

76

Thanks!

Copyright © 2014 by K.S. Trivedi