© 2009 IBM Corporation 1 Improving Consolidation of Virtual Machines with Risk-aware Bandwidth Oversubscription in Compute Clouds Amir Epstein Joint work with David Breitgand

Improving Consolidation of Virtual Machines with Risk-aware Bandwidth Oversubscription in Compute Clouds

Amir Epstein Joint work with David Breitgand

Motivation

Network bandwidth is a critical data center resource

Network bandwidth may become a bottleneck for consolidation

Accurate and efficient network bandwidth demand estimation is difficult

Common practice: fully provision for peak loads

Consequences: resource waste

Full Provisioning VS. Multiplexing

The aggregate demand of VMs may be much smaller than the sum of the maximum demand of each VM: $\sum_i \max_t d_i(t) \gg \max_t \sum_i d_i(t)$

Max(VM1)+Max(VM2)=110

[Figure: two panels, VM1 and VM2, each plotting Capacity against Time with the VM's maximum (VM1-Max, VM2-Max) marked.]

Full Provisioning VS. Multiplexing

Max(VM1+VM2)=71 < Max(VM1)+Max(VM2)=110

[Figure: Capacity against Time for VM1, VM2, and the aggregate VM1+VM2, with the maximum of VM1+VM2 marked.]
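
The gap between full provisioning and multiplexing can be reproduced with a few lines of Python. This is only an illustrative sketch: the two traces below are made up (sinusoids with opposite phase), not the data behind the slides, and the helper names `sum_of_peaks` and `peak_of_sum` are hypothetical.

```python
import math

# Hypothetical demand traces (not the data behind the slides): two VMs whose
# peaks are deliberately out of phase.
T = 100
vm1 = [35 + 25 * math.sin(2 * math.pi * t / T) for t in range(T)]
vm2 = [30 - 25 * math.sin(2 * math.pi * t / T) for t in range(T)]


def sum_of_peaks(traces):
    """Capacity needed when every VM is provisioned for its own peak."""
    return sum(max(trace) for trace in traces)


def peak_of_sum(traces):
    """Capacity needed when demands are added per time step on a shared link."""
    return max(sum(step) for step in zip(*traces))


full = sum_of_peaks([vm1, vm2])
shared = peak_of_sum([vm1, vm2])
print(f"full provisioning: {full:.1f}, multiplexed: {shared:.1f}")
```

Real traces are rarely this anti-correlated, so the saving is usually smaller, but the inequality `peak_of_sum <= sum_of_peaks` always holds.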

Statistical Multiplexing

Consider each VM's dynamic bandwidth demand as a random variable

Consider the aggregate bandwidth demand, which is the sum of the random variables representing the VMs' bandwidth demands

As the number of VMs increases:

– The ratio between the standard deviation of the aggregate bandwidth demand and its mean decreases
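
A small numeric sketch of this effect, under the simplifying assumption (not stated on this slide) that the demands are independent and identically distributed: the variance of the sum grows linearly in n while the mean also grows linearly, so the ratio falls as 1/sqrt(n). The function name `aggregate_cv` and the values of `mu` and `sigma` are hypothetical.

```python
import math


def aggregate_cv(n, mu, sigma):
    """Std/mean ratio of the sum of n independent demands (mean mu, std sigma)."""
    mean = n * mu
    std = math.sqrt(n * sigma ** 2)  # variances add under independence
    return std / mean


for n in (1, 4, 16, 64):
    print(n, round(aggregate_cv(n, mu=10.0, sigma=5.0), 4))
```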

Overcommit

Cloud provider aims at improving cost-efficiency

Overcommit resources using statistical multiplexing

Our focus is bandwidth

Stochastic Bin Packing Problem (SBP)

S={X1,…, Xn} – Set of items

Xi – random variable representing the size (bandwidth demand) of item i

p – overflow probability

Goal: Partition the set S into the smallest number of subsets (bins) S1,…,Sk such that

$\Pr\left[\sum_{i: X_i \in S_j} X_i > 1\right] \le p$ for $1 \le j \le k$

p represents a probabilistic SLA / policy

SBP with Normal Distribution

We assume that each item i independently follows a normal distribution $N(\mu_i, \sigma_i^2)$.

When σi = 0 for all i, Xi = μi and the problem reduces to the classical bin packing problem

The focus of this work is SBP with normal variables

Related Work – Bin Packing

The problem is NP-hard

Bin packing is hard to approximate to a factor better than 3/2 unless P=NP.

First Fit Decreasing (FFD) has asymptotic approximation ratio of 11/9 and (absolute) approximation ratio of 3/2.

MFFD algorithm has asymptotic approximation ratio of 71/60.

AFPTAS exists.

Online bin packing

– First Fit (FF) has competitive ratio of 17/10.

– Best upper and lower bounds are 1.58899 and 1.54014, respectively.

Related Work – Stochastic Bin Packing

$O\left(\sqrt{\log p^{-1} / \log\log p^{-1}}\right)$-approximation for SBP with Bernoulli variables [Kleinberg et al. 1997]

SBP with Poisson, Exponential and Bernoulli variables [Goel and Indyk 1999]

– PTAS exists for Poisson and exponential distributions.

– Quasi-PTAS exists for Bernoulli variables.

– These results relax bin capacity and overflow probability constraints by a factor 1+ε.

$(1+\sqrt{2})(1+\varepsilon)$-competitive algorithm for SBP with normal variables [Wang et al. 2011]

Our Results

2-approximation algorithm for SBP with normal variables

(2+ε)-competitive algorithm for online SBP with normal variables

Observe the existence of a dual PTAS for SBP with normal variables.

Definitions

Definition: The effective load of bin j is

$l_j = \sum_{i: X_i \in S_j} \mu_i + \beta \sqrt{\sum_{i: X_i \in S_j} \sigma_i^2}$

where $\beta = \Phi^{-1}(1-p)$ and the quantile function $\Phi^{-1}$ is the inverse function of the CDF Ф of N(0,1).

Observation: A packing is feasible for a given overflow probability p iff for every bin j,

$l_j = \sum_{i: X_i \in S_j} \mu_i + \beta \sqrt{\sum_{i: X_i \in S_j} \sigma_i^2} \le 1$

The load of bin j is normally distributed with mean $\sum_{i: X_i \in S_j} \mu_i$ and variance $\sum_{i: X_i \in S_j} \sigma_i^2$
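
The feasibility test for a single bin can be sketched as follows, assuming independent normal items and a unit bin capacity as in the problem statement. Items are represented as `(mu, sigma)` pairs; the function names and the sample values are hypothetical and only serve the illustration.

```python
from math import sqrt
from statistics import NormalDist


def beta_for(p):
    """Quantile beta = Phi^{-1}(1 - p) of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - p)


def effective_load(items, p):
    """Effective load of a bin holding items given as (mu, sigma) pairs."""
    mean = sum(mu for mu, _ in items)
    var = sum(sigma ** 2 for _, sigma in items)  # independence: variances add
    return mean + beta_for(p) * sqrt(var)


def is_feasible(items, p, capacity=1.0):
    """True iff the bin overflows with probability at most p."""
    return effective_load(items, p) <= capacity


bin_items = [(0.2, 0.05), (0.3, 0.05), (0.25, 0.1)]  # made-up demands
print(effective_load(bin_items, p=0.05), is_feasible(bin_items, p=0.05))
print(effective_load(bin_items, p=0.01), is_feasible(bin_items, p=0.01))
```

Tightening the overflow probability raises the quantile and therefore the effective load, so the same set of items can be feasible at one SLA level and infeasible at a stricter one.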

Simple solution approach

Reduce the problem to the classical bin packing problem with item sizes $\mu_i + \beta\sigma_i$, thus $P(X_i \le \mu_i + \beta\sigma_i) = 1 - p$

A feasible solution to the classical bin packing problem is a feasible solution to SBP, since

$\sum_{i: X_i \in S_j} \mu_i + \beta \sqrt{\sum_{i: X_i \in S_j} \sigma_i^2} \le \sum_{i: X_i \in S_j} (\mu_i + \beta\sigma_i) \le 1$

The optimum for the classical bin packing instance with the new sizes may be significantly larger than the optimum for SBP.
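
How much the per-item reduction can lose is easy to see on identical items. The sketch below counts how many copies of one item fit in a unit bin under each rule; the item parameters (mean 0.05, standard deviation 0.05) and p = 0.01 are made-up values, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def max_items_deterministic(mu, sigma, beta):
    """Identical items per bin when each item is sized mu + beta * sigma."""
    n = 0
    while (n + 1) * (mu + beta * sigma) <= 1.0:
        n += 1
    return n


def max_items_stochastic(mu, sigma, beta):
    """Identical items per bin when the bin is checked by its effective load."""
    n = 0
    # The effective load grows with n, so the first failure marks the limit.
    while (n + 1) * mu + beta * sigma * sqrt(n + 1) <= 1.0:
        n += 1
    return n


beta = NormalDist().inv_cdf(1.0 - 0.01)  # overflow probability p = 0.01
print(max_items_deterministic(0.05, 0.05, beta),
      max_items_stochastic(0.05, 0.05, beta))
```

With these values the deterministic sizing admits 6 items per bin while the effective-load test admits 11, because the standard deviation of the sum grows with the square root of the number of items rather than linearly.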

Effective Size

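$l_j = \sum_{i: X_i \in S_j} \mu_i + \beta \sqrt{\sum_{i: X_i \in S_j} \sigma_i^2} = \sum_{i: X_i \in S_j} \left( \mu_i + \frac{\beta\sigma_i^2}{\sqrt{\sum_{k: X_k \in S_j} \sigma_k^2}} \right)$

Thus, the effective size of item i on bin j can be viewed as

$\mu_i + \frac{\beta\sigma_i^2}{\sqrt{\sum_{k: X_k \in S_j} \sigma_k^2}}$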
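
The step behind this per-item view is a one-line identity: a square root equals the quantity divided by its own square root. Writing $V_j$ for the total variance of bin j (a shorthand introduced here for the illustration, assuming $V_j > 0$) and $\beta = \Phi^{-1}(1-p)$:

```latex
V_j = \sum_{k: X_k \in S_j} \sigma_k^2, \qquad
\sqrt{V_j} = \frac{V_j}{\sqrt{V_j}} = \sum_{i: X_i \in S_j} \frac{\sigma_i^2}{\sqrt{V_j}}
\quad\Longrightarrow\quad
\sum_{i: X_i \in S_j} \mu_i + \beta\sqrt{V_j}
  = \sum_{i: X_i \in S_j} \left( \mu_i + \frac{\beta\sigma_i^2}{\sqrt{V_j}} \right)
```

The share charged to item i shrinks as the total variance of its bin grows, which is why the same item can be cheaper to place in a bin that already holds high-variance items.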

Approximation Algorithm

Algorithm 1: First Fit VMR Decreasing

Order the items in non-increasing order of VMR

Place the next item in the first bin into which it can be feasibly packed

If no such bin exists, open a new bin to pack this item

Variance to Mean Ratio (VMR) is $d_i = \sigma_i^2 / \mu_i$
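
A minimal sketch of these steps in Python, not the authors' implementation. It assumes items are `(mu, sigma)` pairs with `mu > 0`, unit bin capacity, and that every item fits into an empty bin on its own; the feasibility test is the effective-load condition for independent normal items. The function name and the dictionary layout of a bin are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def first_fit_vmr_decreasing(items, p):
    """Pack (mu, sigma) items into unit-capacity bins; returns the list of bins."""
    beta = NormalDist().inv_cdf(1.0 - p)
    # Non-increasing variance-to-mean ratio sigma^2 / mu (assumes mu > 0).
    order = sorted(items, key=lambda it: it[1] ** 2 / it[0], reverse=True)
    bins = []  # each bin tracks its total mean, total variance and its items
    for mu, sigma in order:
        for b in bins:
            # Feasible iff the effective load with the new item stays within 1.
            if b["mean"] + mu + beta * sqrt(b["var"] + sigma ** 2) <= 1.0:
                b["mean"] += mu
                b["var"] += sigma ** 2
                b["items"].append((mu, sigma))
                break
        else:
            # No open bin can take the item: open a new one.
            bins.append({"mean": mu, "var": sigma ** 2, "items": [(mu, sigma)]})
    return bins


demo = first_fit_vmr_decreasing([(0.2, 0.02), (0.1, 0.1), (0.3, 0.05)], p=0.05)
print(len(demo), demo[0]["items"])
```

Keeping only the running mean and variance per bin makes each feasibility check constant time, so the whole pass costs a sort plus one scan of the open bins per item.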

Approximation Algorithm

Theorem 1: Algorithm 1 is a 2-approximation algorithm for SBP with normal variables.

Integer Program for SBP

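$\sum_{i=1}^{n} \mu_i x_{ij} + \beta \sqrt{\sum_{i=1}^{n} \sigma_i^2 x_{ij}} \le 1, \quad 1 \le j \le m$

$\sum_{j=1}^{m} x_{ij} = 1, \quad 1 \le i \le n$

$x_{ij} \in \{0,1\}, \quad 1 \le i \le n,\ 1 \le j \le m$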

Mathematical Program Relaxation

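$\sum_{i=1}^{n} \mu_i x_{ij} + \beta \sqrt{\sum_{i=1}^{n} \sigma_i^2 x_{ij}} \le 1, \quad 1 \le j \le m$

$\sum_{j=1}^{m} x_{ij} = 1, \quad 1 \le i \le n$

$x_{ij} \ge 0, \quad 1 \le i \le n,\ 1 \le j \le m$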

Fractional Algorithm (Algorithm 2)

Order the items in non-increasing order of VMR

Place the next item in the bin with remaining capacity. If the item causes an overflow to the bin, assign the maximum feasible fraction of this item to the bin. Then, open a new bin to pack the remaining part of this item.

Variance to Mean Ratio (VMR) is $d_i = \sigma_i^2 / \mu_i$
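
One way to sketch the fractional procedure, under stated assumptions rather than as the authors' code: a fraction f of an item contributes f·mu to the bin's mean and f·sigma² to its variance, the overflow probability satisfies p < 0.5 so that the quantile is non-negative, and every item has mu > 0 and is small enough that a non-trivial fraction fits in an empty bin. The maximum fraction is found by bisection, which is a choice made for this illustration; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def max_fraction(mean, var, mu, sigma, beta, iters=60):
    """Largest f in [0, 1] that keeps the bin's effective load at most 1."""
    def load(f):
        return mean + f * mu + beta * sqrt(var + f * sigma ** 2)

    if load(1.0) <= 1.0:
        return 1.0
    lo, hi = 0.0, 1.0  # load is increasing in f; load(lo) <= 1 < load(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if load(mid) <= 1.0:
            lo = mid
        else:
            hi = mid
    return lo


def fractional_pack(items, p, tol=1e-9):
    """Returns bins as lists of (mu, sigma, fraction) triples."""
    beta = NormalDist().inv_cdf(1.0 - p)
    order = sorted(items, key=lambda it: it[1] ** 2 / it[0], reverse=True)
    bins, mean, var = [[]], 0.0, 0.0  # mean/var describe the last (open) bin
    for mu, sigma in order:
        remaining = 1.0
        while remaining > tol:
            f = min(remaining, max_fraction(mean, var, mu, sigma, beta))
            if f > tol:
                bins[-1].append((mu, sigma, f))
                mean, var = mean + f * mu, var + f * sigma ** 2
                remaining -= f
            if remaining > tol:
                bins.append([])  # current bin is full: open a new one
                mean, var = 0.0, 0.0
    return bins


frac_bins = fractional_pack([(0.3, 0.1)] * 5, p=0.05)
print(len(frac_bins), [round(sum(f for _, _, f in b), 3) for b in frac_bins])
```

Only one bin is open at a time, so the procedure is a single pass over the sorted items.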

Analysis

Lemma: There exists a feasible solution to the MP with the following property. For any pair of items k,l and a pair of bins i<j, if xkj>0 and xli>0, then dl ≥ dk.

Observation: Fractional algorithm produces a feasible fractional solution to the MP.

This implies that collocating items with high VMR (bursty) minimizes the total effective size of the items

Variance to Mean Ratio (VMR) is $d_i = \sigma_i^2 / \mu_i$

Proof Outline

Consider a feasible solution to the MP with lexicographically maximal standard deviation (STD) vector of the bins S=(S1,…,Sm), where $S_j = \sqrt{\sum_{i: X_i \in S_j} \sigma_i^2}$

Assume by contradiction that the items are not packed into the bins according to non-increasing order of VMR

Thus, there exists at least one pair of items that are not placed in this order (i.e., the item with smaller VMR is packed into a bin with smaller index than the other item).

We show that we can exchange fractions of these items between the bins, such that

– the new solution is feasible

– the STD vector of the bins in the new solution is lexicographically greater than the one in the original solution

Contradiction

Online Algorithm

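VMR: $d_i = \sigma_i^2 / \mu_i$

Items are divided into classes by VMR:

Class 0: $d_i$ below $\varepsilon^2$

Class $1 \le k \le C$: $d_i$ between $\varepsilon^2(1+\varepsilon)^{k-1}$ and $\varepsilon^2(1+\varepsilon)^{k}$

Class C+1: $d_i$ above the range of Class C

[Equation: definition of C, not recoverable from the extracted text.]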

Online Algorithm

Algorithm 3:

Classify next item according to the VMR classes

Place the next item in the first bin of its class into which it can be feasibly packed

If no such bin exists, open a new bin to pack this item

Theorem 2: Algorithm 3 is a (2+O(ε))-approximation algorithm for SBP with normal variables.
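
The online procedure can be sketched as below. Two things here are assumptions rather than facts from the slides: the class thresholds are taken to be geometric, ε²(1+ε)^k, and the number of intermediate classes is a caller-supplied parameter `num_classes` standing in for the deck's constant C. Items are `(mu, sigma)` pairs with `mu > 0`; all names are hypothetical.

```python
from math import floor, log, sqrt
from statistics import NormalDist


def vmr_class(mu, sigma, eps, num_classes):
    """Class index in 0..num_classes+1 from geometric VMR thresholds (assumed)."""
    d = sigma ** 2 / mu
    if d < eps ** 2:
        return 0
    k = floor(log(d / eps ** 2, 1.0 + eps)) + 1
    return min(k, num_classes + 1)


class OnlinePacker:
    """First fit restricted to bins of the arriving item's VMR class."""

    def __init__(self, p, eps, num_classes):
        self.beta = NormalDist().inv_cdf(1.0 - p)
        self.eps = eps
        self.num_classes = num_classes
        self.bins = {}  # class index -> list of [mean, var], one per open bin

    def place(self, mu, sigma):
        """Packs one item; returns (class index, bin index within the class)."""
        c = vmr_class(mu, sigma, self.eps, self.num_classes)
        class_bins = self.bins.setdefault(c, [])
        for idx, b in enumerate(class_bins):
            if b[0] + mu + self.beta * sqrt(b[1] + sigma ** 2) <= 1.0:
                b[0] += mu
                b[1] += sigma ** 2
                return c, idx
        class_bins.append([mu, sigma ** 2])
        return c, len(class_bins) - 1

    def bin_count(self):
        return sum(len(class_bins) for class_bins in self.bins.values())


packer = OnlinePacker(p=0.05, eps=0.1, num_classes=50)
for item in [(0.2, 0.01), (0.2, 0.1), (0.2, 0.01)]:
    print(item, packer.place(*item))
```

Grouping by class keeps items of similar VMR together, which mimics the sorted order of the offline algorithm without seeing the whole input in advance.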

Simulation Study

Compare our proposed algorithms to previously reported ones

Data set

– Real trace from a production data center, used to compute the mean and standard deviation of bandwidth consumption of 6000 VMs over a period of a few hours.

– Synthetic traces with statistical properties similar to those of the real traces

Algorithms

Algorithms 1-3

First Fit (FF) with deterministic item sizes μi+βσi

First Fit Decreasing (FFD) with deterministic item sizes μi+βσi

Group Packing (GP) [Wang et al. 2011]

For the online algorithms (Algorithm 3 and Group Packing), we set ε=0.1.

Real Instance

[Figures: three charts for the real instance, each with series labelled (Online), (Approx.), and (L.B).]

Online Algorithms

Large synthetic instances

[Figure: chart for the online algorithms, with annotations of 8% and 9%.]

Summary

We studied SBP under the assumption that virtual machines' bandwidth demand follows a normal distribution

We showed a 2-approximation algorithm

We showed a (2+ε)-competitive algorithm

We observed the existence of a dual PTAS for SBP

We studied the performance and applicability of our algorithms using synthetic and real data

The performance evaluation showed that our proposed algorithms considerably reduce the number of bins compared to the best known algorithms for the problem