Optimizing Client-side Resource Utilization in Public...

51
Optimizing Client-side Resource Utilization in Public Clouds Swapnil Haria, Mihir Patil, Haseeb Tariq, Anup Rathi

Transcript of Optimizing Client-side Resource Utilization in Public...

Page 1: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Optimizing Client-side Resource Utilization inPublic Clouds

Swapnil Haria, Mihir Patil, Haseeb Tariq, Anup Rathi

Page 2: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Outline

• Motivation

• Solution

• Implementation

• Evaluation

• Conclusion

Page 3: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Outline

• Motivation

• Solution

• Implementation

• Evaluation

• Conclusion

Page 4: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Cloud Services ( Not a distraction anymore1)

[1] Jeff Bezos' Risky Bet, November 2006, http://www.bloomberg.com/bw/stories/2006-11-12/jeff-bezos-risky-bet

Page 5: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

• 30 % of total cloud revenue• Annual revenues crossed $5 Billion

Cloud Services ( Not a distraction anymore1)

[1] Jeff Bezos' Risky Bet, November 2006, http://www.bloomberg.com/bw/stories/2006-11-12/jeff-bezos-risky-bet

Page 6: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Cloud Services ( Not a distraction anymore1)

• 30 % of total cloud revenue• Annual revenues crossed $5 Billion

Other Players :

[1] Jeff Bezos' Risky Bet, November 2006, http://www.bloomberg.com/bw/stories/2006-11-12/jeff-bezos-risky-bet

Page 7: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Popularity

• ZERO up-front capital expenses

• On-demand hardware availability

• Flexible pricing options

Page 8: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Popularity

• ZERO up-front capital expenses

• On-demand hardware availability

• Flexible pricing options

"Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use."

Page 9: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Popularity

• ZERO up-front capital expenses

• On-demand hardware availability

• Flexible pricing options

"Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use."

Elastic Cloud Compute

Page 10: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Popularity

• ZERO up-front capital expenses

• On-demand hardware availability

• Flexible pricing options

"Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use."

Elastic Cloud Compute

Page 11: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Limitations

• Allocate resources in fixed sized chunks (EC2 Instances)

• 1 core , 1GB RAM -> 36 core, 244 GB RAM

• Accurately predict application requirements

• Undersized VM - Performance degradation

• Oversized VM - Extra costs

Multiple applications, multiple VMs, no peace

Page 12: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Challenges

• Application requirements vary widely

• Black Friday for e-commerce websites

http://www.xad.com/media-mentions/mobile-activity-on-xmas-eve-24-pct-higher-than-black-friday/

Page 13: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Challenges

• Application requirements vary widely

• Black Friday for e-commerce websites

• Evenings and late nights for Netflix

http://www.techspot.com/news/46048-netflix-represents-327-of-north-americas-peak-web-traffic.html

Page 14: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Challenges

• Application requirements vary widely

• Black Friday for e-commerce websites

• Evenings and late nights for Netflix

• Slashdot effect!

CMUSphinx Project

Page 15: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Challenges

• Humans are bad at estimating workload requirements2

• Study of developers at Twitter submitting jobs to datacenter

• 70% overestimated by 10x

• 20% underestimated by 5x

terrible

[2]Quasar: Resource-Efficient and QoS-Aware Cluster Management. Christina Delimitrou and Christos Kozyrakis. ASPLOS 2014.

Page 16: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Outline

• Motivation

• Solutions

• Implementation

• Evaluation

• Conclusion

Page 17: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Resource as a Service3

1. Fine grained cloud reservations

2. CPU (cycles), memory (pages), I/O (bandwidth), Time (seconds)

• Where does it stop?

• Reduces wasted costs, but difficult to reason about

• Hardware feasibility issues for service providers

[3] The rise of RaaS: the resource-as-a-service cloud. Orna Agmon Ben-Yehuda et al. Commun. ACM 2014

Page 18: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Proposal

Page 19: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Tell me more!

Application Mobility Real-time Management

Page 20: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Application Mobility

• On-demand application migration across machines

• Conventional issues -

• Application state stored in kernel (file descriptors, sockets)

• Residual dependencies left on source machine

• Execution Continuity

We need

- Process Isolation (even from kernel)

- Minimal state in kernel

Page 21: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Now where did I see that before?

Image Source - Wikipedia

Page 22: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Where do I find one of these?

Old idea, but making a comeback in Cloud OS• Drawbridge from Microsoft Research

• MirageOS from University of Cambridge

Both (claim to) support application-migration!

Page 23: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Real-time Management

• Monitor application requirements in real-time

• Use application migration to organize processes on VMs

Page 24: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Real-time Management

• Monitor application requirements in real-time • Relatively easy• Working set sizes, idle cycles

• Use application migration to organize processes on VMs• Complex • Varying configurations and prices of VMs• Identifying processes to migrate • Downtime / Budgets!

Page 25: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Policies

Steps

• Determine migration events

• Identify process(es) for migration

• Choose target from existing VMs, if possible

• Figure out instance types for creating new VMs

Page 26: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Policies

Metrics (in order of priority)

• Maximize VM utilization

• Satisfy performance guarantees

• Minimize costs

User-Defined Parameters

• Upper limit on cost

• Max downtime per process

Page 27: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Policies

• Single Application per VM

• Easy to reason about

• Use naive best fit model to find target VMs

• Multiple Applications per VM

• Highly complex optimization problem (NP-Hard)

• Use Heuristics!

• Use best fit and explore nearby options to find target VMs

Page 28: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Software Architecture

Page 29: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Software Architecture

Page 30: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Software Architecture

Page 31: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Software Architecture

Page 32: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Outline

• Motivation

• Solutions

• Implementation

• Evaluation

• Conclusion

Page 33: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Proof of Concept Model

• Linux Containers (lxc)

• Emulate isolated processes on Drawbridge/MirageOS

• Checkpoint/Restore in Userspace (CRIU)

• Checkpoint containers on VM A

• Migrate files to VM B

• Restore on VM B

Page 34: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Simulator

• Rapidly validate migration policies

• Evaluate the influence of policy parameters on results

• Written in about 2000 lines of Java code

Page 35: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Outline

• Motivation

• Solutions

• Implementation

• Evaluation

• Conclusion

Page 36: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Experimental Setup

• Proof of concept model(WIP)• Live migrating SPEC benchmarks running in LXC

• Observed downtime – 30 seconds (depending of process size)

• Migration Policy Simulations• Used our own random workload generator

• 2 workloads of each type – static, high variability and low variability

Page 37: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Capping Costs

0

10

20

30

40

50

60

3 4 4.5 5

Max spending limit per day (dollars)

Number of Migrations

Single app Multiple apps

0

50

100

150

200

250

300

350

400

3 4 4.5 5

Max spending limit per day (dollars)

Overcommitment

Single app Multiple apps

Page 38: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Constraining Downtime

12.5

13

13.5

14

14.5

15

15.5

2 3 4 5

Max migrations per process per day

Total Cost

Single app Multiple apps

0

100

200

300

400

500

600

2 3 4 5

Max migrations per process per day

Overcommitment

Single app Multiple apps

Page 39: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Suppressing Spikes

0

10

20

30

40

50

60

1 4 8

Median window size

Number of Migrations

Single app Multiple apps

0

50

100

150

200

250

1 4 8

Median window size

Overcommitment

Single app Multiple apps

Page 40: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Show me the money

• Baseline• Used same workloads as the simulation

• Picked from available VMs that would best fit the workloads

• No migrations!

• Cost for 3 days - $45.36

• Our solution• No migration policy requires more than $15 for 3 days

• 66% money saved!

Page 41: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Conclusions

• Streamlining cloud operations important with increasing scale

• Current IaaS reservation models insufficient

• Better support needed from cloud providers • Amazon EC2 Container Service

• Migration policies have to optimize in a multi-dimensional space

• Simple ones offer savings too!

Page 42: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Questions?

Page 43: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

BACKUPS

Page 44: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Single application per VM

Page 45: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Effect of cost per day

0

5

10

15

20

25

30

35

40

45

3 4 4.5 5

Max amount allowed per day (dollars)

Migrations and Cost

Migrations Cost

0

20

40

60

80

100

120

140

3 4 4.5 5

Max amount allowed per day (dollars)

Overcommitment

Overcommitment

Page 46: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Migrations cap

0

5

10

15

20

25

30

35

2 3 4

Max number of migrations per process per day

Migrations and Cost

Migrations Cost

0

100

200

300

400

500

600

2 3 4

Max number of migrations per process per day

Overcommitment

Overcommitment

Page 47: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Median window variations

0

5

10

15

20

25

30

35

40

45

50

1 4 8

Migrations and Cost

Migrations Cost

0

20

40

60

80

100

120

140

160

180

200

1 4 8

Overcommitment

Overcommitment

Page 48: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Multiple applications per VM

Page 49: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Effect of cost per day

0

10

20

30

40

50

60

3 4 4.5 5

Max amount allowed per day (dollars)

Migrations and Cost

Migrations Cost

0

50

100

150

200

250

300

350

400

3 4 4.5 5

Max amount allowed per day (dollars)

Overcommitment

Overcommitment

Page 50: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Migrations cap

0

5

10

15

20

25

30

35

40

45

50

3 4 5

Max number of migrations per process per day

Migrations and Cost

Migrations Cost

0

100

200

300

400

500

600

3 4 5

Max number of migrations per process per day

Overcommitment

Overcommitment

Page 51: Optimizing Client-side Resource Utilization in Public Cloudspages.cs.wisc.edu/~swapnilh/resources/CS736 Final Presentation.pdf•Live migrating SPEC benchmarks running in LXC •Observed

Median window variations

0

10

20

30

40

50

60

1 4 8

Migrations and Cost

Migrations Cost

0

50

100

150

200

250

1 4 8

Overcommitment

Overcommitment