SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market. Prateek Sharma, Stephen Lee, Tian Guo, David Irwin, Prashant Shenoy. EuroSys 2015.

Transcript of SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Page 1: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Prateek Sharma, Stephen Lee, Tian Guo, David Irwin, Prashant Shenoy

EuroSys 2015

Page 2: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Motivation System Design Evaluation Related Work Conclusion

Infrastructure Clouds

Cloud computing is popular and offers many benefits:
- Ease of deployment, scalability, pay-as-you-go
- Infrastructure, Platform, Software as a Service
- Infrastructure clouds offer computing and storage resources

Page 3: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Cost vs. Availability Tradeoffs

Cloud servers have different cost vs. availability tradeoffs:

On-demand servers:
- Fixed price per unit time
- Non-revocable

Spot servers:
- Variable prices based on market conditions
- Revocable ⇒ lower availability
- Surplus capacity sold at lower price

[Figure: price ($/hr) of m3.medium and m3.large servers, on-demand vs. spot (average).]


Page 5: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Exploiting Spot Servers

Spot servers are ideal for batch jobs:
- Batch jobs are disruption tolerant
- Checkpoint-restart to tolerate spot revocations

Spot servers for interactive applications?
- Revocable ⇒ downtime
- Mitigate impact of revocation?
- Potentially lower costs


Page 7: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Running Interactive Applications on Spot Servers

[Figure: an application running on a spot server is migrated to an on-demand server.]

Naive solution:
- Spot server revoked ⇒ migrate to on-demand
- Revocation ⇒ risk of losing application state
- Users cannot manage complexity & risk of migration

Problem Statement: Design a system to transparently use a mix of spot & on-demand servers to run interactive applications


Page 11: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Talk Outline

1. Motivation
2. System Design
3. Evaluation
4. Related Work
5. Conclusion

Page 12: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Derivative Cloud
- Cloud middleware derived from infrastructure clouds
- Buy servers from IaaS and resell to users
- Use servers of different types ⇒ different SLA from IaaS
- Example: spot & on-demand pool to lower costs and increase availability

[Figure: the derivative cloud leases servers from the native IaaS cloud (a spot pool and an on-demand pool of VMs) and resells them to customers, who request servers from it.]

Page 13: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

SpotCheck
- SpotCheck: Derivative cloud using spot & on-demand servers
- Provide low-cost, non-revocable servers to run unmodified applications
- Key idea: Run on spot. When revoked, migrate to on-demand
- Requirement: Transparently migrate VMs without losing state

[Figure: SpotCheck leases servers from the native IaaS cloud into a spot pool and an on-demand pool of VMs, with a Migrate arrow between the pools.]

Page 14: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

VM Migration

Key idea: Run on spot. When revoked, migrate to on-demand

Naive approach: Virtual Machine Live Migration
- Migration may not complete
- Small termination warning (~2 minutes on EC2)
- Incomplete migration ⇒ loss of state
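Why live migration may not complete inside the termination warning can be illustrated with a simplified pre-copy model. This is a sketch under stated assumptions, not the authors' analysis: the function names, the stop threshold, and all input values are hypothetical; only the ~2 minute warning comes from the slide.

```python
WARNING_SECONDS = 120  # ~2 minute termination warning on EC2 (from the slide)


def precopy_migration_time(memory_mb, bandwidth_mb_s, dirty_rate_mb_s,
                           stop_threshold_mb=50.0, max_rounds=30):
    """Estimate pre-copy live migration time in seconds.

    Simplified model (an assumption): every round resends the memory that
    was dirtied while the previous round was in flight. Returns None when
    the remaining dirty memory never drops below the stop threshold.
    """
    to_send = memory_mb
    elapsed = 0.0
    for _ in range(max_rounds):
        round_time = to_send / bandwidth_mb_s
        elapsed += round_time
        to_send = dirty_rate_mb_s * round_time  # dirtied during this round
        if to_send <= stop_threshold_mb:
            # Final stop-and-copy of the small remainder.
            return elapsed + to_send / bandwidth_mb_s
    return None


def completes_within_warning(memory_mb, bandwidth_mb_s, dirty_rate_mb_s):
    """True if the modeled migration finishes before the warning expires."""
    t = precopy_migration_time(memory_mb, bandwidth_mb_s, dirty_rate_mb_s)
    return t is not None and t <= WARNING_SECONDS


# Hypothetical inputs: (memory MB, bandwidth MB/s, dirty rate MB/s).
print(completes_within_warning(4096, 100, 20))    # small VM, low dirty rate
print(completes_within_warning(16384, 100, 50))   # first round alone outlasts the warning
print(completes_within_warning(4096, 100, 100))   # memory dirtied as fast as it is sent
```

In this model the migration time grows with memory size and dirty rate, which is the dependence that bounded-time migration is meant to remove.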

Page 15: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Bounded-time VM Migration
- Guaranteed bound on VM migration time
- Independent of memory size, application behaviour
- Residual dirty pages sent in bounded time
- Lazy restore: quickly restore VM

[Figure: the VM on the spot server continuously checkpoints its memory to a backup server; the VM on the on-demand server lazily restores its memory.]
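One way to picture the bound on residual dirty pages is a checkpointer that never lets the un-checkpointed memory exceed what the link can drain within the time bound. The class below is a minimal sketch under that assumption, not the SpotCheck implementation; the class name, the 30-second bound, and the bandwidth figure are hypothetical, and re-dirtied pages are counted naively.

```python
class BoundedCheckpointer:
    """Keeps residual dirty memory small enough to flush within a time bound."""

    def __init__(self, bandwidth_mb_s, time_bound_s):
        self.bandwidth_mb_s = bandwidth_mb_s
        # Largest residual that can still be sent within the time bound.
        self.max_residual_mb = bandwidth_mb_s * time_bound_s
        self.residual_mb = 0.0  # dirty memory not yet on the backup server

    def on_dirty(self, dirtied_mb):
        """Record newly dirtied memory.

        Returns the amount (MB) that must be flushed to the backup server
        right away so the residual stays within the bound.
        """
        self.residual_mb += dirtied_mb
        excess = max(0.0, self.residual_mb - self.max_residual_mb)
        self.residual_mb -= excess
        return excess

    def time_to_drain(self):
        """Seconds needed to send the residual when a revocation arrives."""
        return self.residual_mb / self.bandwidth_mb_s


# Hypothetical setup: 100 MB/s to the backup server, 30 s bound.
ckpt = BoundedCheckpointer(bandwidth_mb_s=100, time_bound_s=30)
ckpt.on_dirty(2000)            # under the 3000 MB limit, nothing forced out
forced = ckpt.on_dirty(1500)   # 500 MB over the limit must be flushed now
print(forced, ckpt.time_to_drain())
```

Because the residual is capped by bandwidth times the bound, the drain time at revocation does not depend on total memory size in this sketch.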



Page 19: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Nested Virtualization

Problem:
- Bounded-time VM migration needs hypervisor modification
- IaaS clouds don't allow hypervisor modifications

Solution: Nested Virtualization
- User VMs run on a nested hypervisor (XenBlanket)
- Bounded-time migration, VM lazy restore implemented inside nested hypervisor

[Figure: layered stack of cloud server, IaaS hypervisor, nested hypervisor, and user VM.]

Page 20: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Spreading Revocation Risk

Problem: Revocation storms
- Spot pool revoked ⇒ migrate all VMs to on-demand

Solution: Multiple independent pools
- Revocations from one pool don't affect other pools

[Figure: two spot pools and an on-demand pool, each shown with VM 1, VM 2, and VM 3.]


Page 24: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Pool Management Policies
- Distribute VMs among multiple pools
- Example: availability zones, server sizes (small, large)
- Spot prices in different pools uncorrelated
- Pool distribution policies:
  1. Equally
  2. Weighted by pool cost
  3. Weighted by pool availability
- Centralized controller dynamically selects pool for new VMs
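The three distribution policies can be sketched as weighted random selection over pools. This is an illustrative sketch, not the controller from the talk: the pool names, prices, and availability figures are hypothetical, and reading "weighted by pool cost" as "cheaper pools get more VMs" (inverse-price weights) is an assumption.

```python
import random

# Hypothetical pools: price in $/hr, availability as a fraction of time.
EXAMPLE_POOLS = {
    "zone-a/m3.medium": {"price": 0.008, "availability": 0.99},
    "zone-b/m3.medium": {"price": 0.016, "availability": 0.999},
    "zone-a/m3.large": {"price": 0.020, "availability": 0.95},
}


def pool_weights(pools, policy):
    """Return a selection weight per pool for the given policy."""
    if policy == "equal":
        return {name: 1.0 for name in pools}
    if policy == "cost":
        # Assumption: cheaper pools should receive proportionally more VMs.
        return {name: 1.0 / p["price"] for name, p in pools.items()}
    if policy == "availability":
        return {name: p["availability"] for name, p in pools.items()}
    raise ValueError(f"unknown policy: {policy}")


def choose_pool(pools, policy, rng=random):
    """Pick the pool for a new VM by weighted random choice."""
    weights = pool_weights(pools, policy)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(0)  # seeded only to make the demo repeatable
for policy in ("equal", "cost", "availability"):
    print(policy, choose_pool(EXAMPLE_POOLS, policy, rng))
```

Randomized selection spreads new VMs across pools over time, so a revocation in one pool affects only the share of VMs placed there.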

Page 25: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Evaluation

SpotCheck prototype implemented on Amazon EC2

- Application performance
- Cost of SpotCheck VMs
- Availability

- Benchmarks: SpecJBB, TPC-W
- Experiments on EC2 m3.medium instances
- Spot prices from April to Oct 2014

Page 26: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Impact on Application Performance

Performance degradation due to continuous checkpointing:

Benchmark   Degradation
SpecJBB     0.0015%
TPC-W       16.7%

[Figure: SpecJBB throughput (bops) vs. number of VMs per backup server (1 to 50).]

- Backup server can support up to 40 VMs
- Each VM shares 1/40th the cost of a backup server


Page 28: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Cost of Running VMs on SpotCheck

[Figure: average cost per hour ($) with live migration and with bounded-time migration, for the 1-Pool, 2-Pools, 4-Pools Equal Distributed, 4-Pools Cost, and 4-Pools Stability policies. Bar labels: $0.014 and $0.0086.]

- Cost with different pool management policies is ~$0.014/hr
- Cost saving of 80% compared to on-demand

Page 29: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Availability

[Figure: unavailability (%) of SpotCheck with full restore and with lazy restore, for the 1-Pool, 2-Pools, 4-Pools Equal Distributed, 4-Pools Cost, and 4-Pools Stability policies. Bar labels: 0.002%, 0.019%, 0.16%, and 0.02%.]

- Availability of 99.998% relative to on-demand servers
- Downtime (~20 seconds) during migrations
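To put the unavailability figure in more familiar terms, the arithmetic below converts a percentage into downtime over a period. It is a back-of-the-envelope sketch: the one-year period and the assumption that all downtime comes from ~20-second migrations are illustrative, not results from the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_seconds(unavailability_percent, period_seconds=SECONDS_PER_YEAR):
    """Downtime implied by an unavailability percentage over a period."""
    return unavailability_percent / 100.0 * period_seconds


def implied_migrations(unavailability_percent, downtime_per_migration_s,
                       period_seconds=SECONDS_PER_YEAR):
    """Migrations per period if each one causes the given downtime."""
    total = downtime_seconds(unavailability_percent, period_seconds)
    return total / downtime_per_migration_s


# 0.002% unavailability (99.998% availability) and ~20 s per migration,
# both taken from the slide; the one-year horizon is an assumption.
yearly = downtime_seconds(0.002)
print(f"{yearly:.1f} s/year, about {yearly / 60:.1f} minutes")
print(f"{implied_migrations(0.002, 20):.1f} migrations/year at 20 s each")
```

Under these assumptions, 0.002% unavailability is on the order of ten minutes of downtime per year.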

Page 30: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Related Work

- Spot instances: [Javadi et al., 2011, Ben-Yehuda et al., 2011]
- Derivative clouds: PiCloud, Heroku
- Nested virtualization: XenBlanket [Williams et al., 2012], Turtles [Ben-Yehuda et al., 2010]
- Bounded-time migration: Yank [Singh et al., 2013], Remus [Cully et al., 2008]
- Lazy VM restoration: Post-Copy migration [Hines et al., 2009], SnowFlock [Lagar-Cavilla et al., 2009]

Page 31: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Conclusion

- Different cost vs. availability tradeoffs in infrastructure clouds
- Spot servers: cheap but volatile
- Derivative cloud: abstraction layer between user and cloud
- SpotCheck: derivative cloud with spot & on-demand servers
- Prototyped on Amazon EC2
- Cost savings of 80%, nearly five 9's of availability (99.998%)

Page 32: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Thank You

Page 33: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Spot price spikes

[Figure: spot price and on-demand price ($/hr) over time.]

Page 34: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Spot Prices

[Figure: availability CDF vs. the ratio of spot price to on-demand price, for m3.medium, m3.large, m3.xlarge, and m3.2xlarge.]

Page 35: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Time line

[Figure: spot price and bid price ($/hr) over time.]

Page 36: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Price Correlation

[Figure: heatmap of spot-price correlation between pairs of zone IDs 1 to 19, with a color scale from -1.0 to 1.0.]

Page 37: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

What if everybody starts using this?

- Public cloud server market is large
- More incentive to expand spot markets
- Bidding is truth revealing
- Benefits both users and cloud providers

Page 38: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Cost of running VMs on SpotCheck

Factors which affect cost:

- p: probability of revocation
- E(S): expected spot price based on price traces
- D: on-demand price
- B: backup server cost = 1 large server
- N: number of VMs sharing a backup server = 40

Expected cost of running a VM on SpotCheck:

\[ E(\text{Cost}) = (1 - p) \cdot E(S) + p \cdot D + \frac{B}{N} \]

Expected cost = 0.2 × on-demand = $0.014/hour
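The expected-cost formula can be evaluated directly. In the sketch below only N = 40 and the formula itself come from the slide; D = $0.07/hr is implied by "0.2 × on-demand = $0.014", while the values of p, E(S), and B are hypothetical placeholders chosen for illustration, not the authors' measurements.

```python
def expected_cost(p_revocation, expected_spot_price, on_demand_price,
                  backup_cost, vms_per_backup):
    """E(Cost) = (1 - p) * E(S) + p * D + B / N, all prices in $/hr."""
    if not 0.0 <= p_revocation <= 1.0:
        raise ValueError("p_revocation must be a probability")
    if vms_per_backup <= 0:
        raise ValueError("vms_per_backup must be positive")
    spot_part = (1.0 - p_revocation) * expected_spot_price
    on_demand_part = p_revocation * on_demand_price
    backup_part = backup_cost / vms_per_backup
    return spot_part + on_demand_part + backup_part


cost = expected_cost(
    p_revocation=0.01,         # hypothetical
    expected_spot_price=0.01,  # hypothetical E(S)
    on_demand_price=0.07,      # implied by 0.014 / 0.2 on the slide
    backup_cost=0.14,          # hypothetical price of one large server
    vms_per_backup=40,         # N from the slide
)
print(f"${cost:.4f}/hr, {cost / 0.07:.0%} of on-demand")
```

With these placeholder inputs the backup term B/N contributes $0.0035/hr, which shows why sharing one backup server among many VMs matters for the total.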

Page 39: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Availability

Max. num. of concurrent revocations:

Pools    N/4           N/2            3N/4           N
1-Pool   0             0              0              1.74 × 10^-4
2-Pool   0             3.75 × 10^-3   0              2.25 × 10^-5
4-Pool   7.4 × 10^-3   7.71 × 10^-5   1.92 × 10^-5   0

Probability of the maximum number of concurrent revocations for different pools. N is the number of VMs.

Page 40: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Nested Virtualization overhead

- Nested virtualization overhead is dependent on application characteristics
- Can use containers
- Migration mechanisms in containers need to be implemented

Page 41: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Storage & Networking

- Direct access to EBS and S3
- Bridged or NAT networking inside nested hypervisor
- VPC to isolate user VMs

Page 42: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Restore overhead

[Figure: TPC-W response time (ms) vs. number of VMs being concurrently lazily restored (0, 1, 5, 10).]

Page 43: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Backup Slides

1. Spot pricing, CDF, availability.
2. What if everyone starts using this? The market is large. More incentive for IaaS.
3. Storage and networking. EBS/S3. Bridged or NAT network interfaces.
4. Inter-cloud operation. Exploring practical challenges with inter-cloud networking.
5. Nested overhead. Indeed significant for certain workloads. 1) Implementations will mature. 2) Only use this because IaaS doesn't expose migration functionality. 3) Looking at using OS-level virtualization and containers.

Page 44: SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

References

(2014). Heroku. http://www.heroku.com.

(2014). PiCloud. http://www.multyvac.com.

Ben-Yehuda, M., Day, M., Dubitzky, Z., Factor, M., Har’El, N., Gordon, A., Liguori, A., Wasserman, O., and Yassour, B. (2010). The Turtles Project: Design and Implementation of Nested Virtualization. In OSDI.

Ben-Yehuda, O., Ben-Yehuda, M., Schuster, A., and Tsafrir, D. (2011). Deconstructing Amazon EC2 Spot Instance Pricing. In CloudCom.

Cully, B., Lefebvre, G., Meyer, D., Feeley, M., Hutchinson, N., and Warfield, A. (2008). Remus: High Availability via Asynchronous Virtual Machine Replication. In NSDI.

Hines, M. R., Deshpande, U., and Gopalan, K. (2009). Post-copy Live Migration of Virtual Machines. SIGOPS Operating Systems Review, 43(3).

Javadi, B., Thulasiram, R., and Buyya, R. (2011). Statistical Modeling of Spot Instance Prices in Public Cloud Environments. In UCC.

Lagar-Cavilla, H. A., Whitney, J. A., Scannell, A. M., Patchin, P., Rumble, S. M., De Lara, E., Brudno, M., and Satyanarayanan, M. (2009). SnowFlock: Rapid Virtual Machine Cloning for Cloud Computing. In EuroSys.

Singh, R., Irwin, D., Shenoy, P., and Ramakrishnan, K. (2013). Yank: Enabling Green Data Centers to Pull the Plug. In NSDI.

Williams, D., Jamjoom, H., and Weatherspoon, H. (2012). The Xen-Blanket: Virtualize Once, Run Everywhere. In EuroSys.