SpotADAPT: Spot-Aware (re-)Deployment of Analytical Processing Tasks on Amazon EC2
by Dalia Kaulakiene, Aalborg University (Denmark)
Christian Thomsen, Aalborg University (Denmark)
Torben Bach Pedersen, Aalborg University (Denmark)
Ugur Çetintemel, Brown University (USA)
Tim Kraska, Brown University (USA)
Amazon Web Services EC2 cloud
Pricing model        Contract                      Price per hour (*)
Reserved instances   1-year or 3-year contract     $0.0581
On-demand instances  No contract                   $0.128
Spot instances       No contract, can be revoked   $0.0365 (**)
(*) c3.large instance type (Linux) in ap-northeast-1 region
(**) Average price over 1 week, Mar 23 - Mar 30, 2015
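To make the price gap concrete, the short sketch below computes how much each pricing model saves relative to on-demand, using only the three hourly prices quoted on this slide. The function name is illustrative and not part of SpotADAPT.

```python
# Hourly prices quoted on the slide (c3.large, Linux, ap-northeast-1).
PRICES = {
    "reserved": 0.0581,
    "on-demand": 0.128,
    "spot": 0.0365,  # one-week average, Mar 23 - Mar 30, 2015
}

def saving_vs_on_demand(model: str) -> float:
    """Fraction of the on-demand hourly price saved by the given model."""
    return 1.0 - PRICES[model] / PRICES["on-demand"]

for model, price in PRICES.items():
    print(f"{model:10s} ${price:.4f}/h  saving {saving_vs_on_demand(model):.1%}")
```

With these numbers the average spot price is roughly 71% below on-demand, which is the motivation for targeting spot instances despite the risk of revocation.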
Amazon Spot market
• User bids for the machine
  – "I need 8 vCPUs machine in region A"
  – "maximum I will pay $0.5 per hour"
• If the spot price < $0.5:
  – The user gets an instance
• If (and when) the spot price > $0.5:
  – AWS takes back the instance
[Figure: series labeled ap-northeast-1c and ap-northeast-1a]
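The bidding rule on this slide can be sketched as a small simulation. This is a simplified illustration, assuming one price sample per hour and treating a price equal to the bid as still acceptable (the slide only covers the strictly lower and strictly higher cases). The price trace is made up for the example.

```python
from typing import List

def held_hours(spot_prices: List[float], bid: float) -> int:
    """Count consecutive hours an instance is kept, starting at hour 0.

    The instance is kept while the spot price does not exceed the bid and
    is taken back as soon as the price rises above it.
    """
    hours = 0
    for price in spot_prices:
        if price > bid:  # out-of-bid: AWS takes the instance back
            break
        hours += 1
    return hours

# Hypothetical hourly price trace in USD/hour, not real market data.
trace = [0.03, 0.04, 0.45, 0.62, 0.05]
print(held_hours(trace, bid=0.5))
```

In this trace the instance is lost in the fourth hour, even though the price drops again afterwards, which is why a workload on spot instances needs a re-deployment plan.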
Problems
The user needs to execute an analytical workload on spot instances
• Hadoop job
• Data in Amazon S3
Problem 1. Execution time is unknown
• AWS organizes instances into 9 families with 4-5 instance types each
  – General purpose: T2, M4, M3
  – Compute optimized: C4, C3
  – Memory optimized: R3
  – GPU: G2
  – Storage optimized: I2, D2
Problem 2. Execution cost is unknown (and varies)
• 7 regions, 2-4 availability zones in each
• Spot prices change in real time
SpotADAPT
• Estimates execution time on AWS instances
• Estimates execution price in AWS regions
• Proposes deployment w/ optimization goals:
  – Fastest execution within budget, or
  – Cheapest execution within time constraints
• Monitors execution
• Proposes re-deployment if:
  – Instance is taken away by AWS
  – Cheaper or faster deployment is available
Execution time estimation
1. Dataset size increase effect
  – Increasing (sampled) input size
  – Executing on same machine
  – More micro-runs do not improve accuracy!
  – SpotADAPT takes a few micro-runs to estimate the time of the large dataset
AWS instance family:
  – Slowest machine (2 vCPUs)
  – More powerful machine (4 vCPUs)
  – … (.. vCPUs)
  – Most powerful machine (32 vCPUs)
[Figure: Wordcount]
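The data size effect can be illustrated with a small extrapolation. The slides do not state which model SpotADAPT fits, so this sketch assumes execution time grows linearly with input size and uses an ordinary least-squares line. The micro-run measurements and the 50 GB target size are hypothetical.

```python
from typing import List, Tuple

def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of time = slope * size + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical micro-runs on the base instance: (input size in GB, seconds).
micro_runs = [(1.0, 70.0), (2.0, 130.0), (4.0, 250.0)]

slope, intercept = fit_line(micro_runs)
full_size_gb = 50.0  # assumed size of the full dataset
estimate_s = slope * full_size_gb + intercept
print(f"estimated time on base instance: {estimate_s:.0f} s")
```

Three micro-runs are enough to fix a line, which is consistent with the slide's point that adding more micro-runs does not improve accuracy.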
Execution time estimation (cont.)
2. Scale-up effect
  – Increasing machine power (# vCPUs)
  – Using same dataset
  – SpotADAPT takes 1 micro-dataset, executes workload on few instance types in the family, estimates the time of large dataset on all instances
3. Combine
  – Estimate execution time on all machines using large dataset
[Figure: Wordcount]
SpotADAPT flow
SpotADAPT. Step 1
• Hadoop job
• Data: Bucket in AWS S3
• Optimization goals:
  – Cheapest execution within time boundaries, or
  – Fastest execution within budget boundaries
SpotADAPT. Step 2
• Setup:
  – Prepare data for micro-runs
    for data size effect estimation
    for scale-up effect estimation
  – Execute micro-runs for each AWS instance family:
    On base instance type (for data size effect)
    On other instance types using one micro-dataset (for scale-up effect)
SpotADAPT. Step 2 (cont.)
• Execution time estimation:
  – Data size effect: Execution time (slowest instance, large dataset)
  – Scale-up effect: Scale factor (time on slower instance / time on 2x powerful instance)
  – Combining both: Execution time (slowest instance, large dataset), Execution time (2x instance, large dataset), …, Execution time (most powerful instance, large dataset)
AWS instance family:
  – Slowest machine (2 vCPUs)
  – More powerful machine (4 vCPUs)
  – … (.. vCPUs)
  – Most powerful machine (32 vCPUs)
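The combining step can be sketched as follows: start from the large-dataset estimate on the slowest instance and divide by the scale factor for each step up the family. This assumes a scale factor measured on a micro-dataset also holds for the large dataset, which the slides imply but do not spell out. The function name and all numbers are hypothetical; the instance names are C3 family types listed from smallest to largest.

```python
from typing import Dict, List

def estimate_family_times(base_time: float,
                          instance_types: List[str],
                          scale_factors: List[float]) -> Dict[str, float]:
    """Propagate the base-instance estimate up the instance family.

    scale_factors[i] is time on instance_types[i] divided by time on
    instance_types[i + 1], as measured on one micro-dataset.
    """
    if len(scale_factors) != len(instance_types) - 1:
        raise ValueError("need one scale factor per adjacent pair of types")
    times = {instance_types[0]: base_time}
    current = base_time
    for name, factor in zip(instance_types[1:], scale_factors):
        current = current / factor  # faster instance, shorter time
        times[name] = current
    return times

# Hypothetical inputs: 3600 s on the base type, then two scale-up steps.
family = ["c3.large", "c3.xlarge", "c3.2xlarge"]
times = estimate_family_times(3600.0, family, [2.0, 1.6])
for name, seconds in times.items():
    print(f"{name:12s} {seconds:8.1f} s")
```

A scale factor below 2 for a machine with twice the vCPUs, as in the second step here, would mean the workload does not scale up perfectly.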
SpotADAPT. Step 2 (cont.)
• Execution price estimation
  – For each instance family
    For each instance type in the family
      For each region
        For each availability zone
          For on-demand
          For spot (assuming start time = current time)
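The nested enumeration above can be sketched as one pass over a table of hourly rates. This is an illustration under stated assumptions: the execution price is taken as estimated hours times the current hourly rate, the spot rate is assumed to stay at its most recent value, and the data layout and names are hypothetical. The c3.large rates come from the pricing slide; the c3.xlarge rate is made up.

```python
from typing import Dict, List, NamedTuple, Tuple

class PricedDeployment(NamedTuple):
    instance_type: str
    zone: str      # availability zone; the region is its prefix
    market: str    # "on-demand" or "spot"
    hours: float   # estimated execution time
    price: float   # estimated execution price in USD

def estimate_prices(times_h: Dict[str, float],
                    hourly_rate: Dict[Tuple[str, str, str], float]
                    ) -> List[PricedDeployment]:
    """Price every (instance type, zone, market) combination with a known rate."""
    results = []
    for (itype, zone, market), rate in hourly_rate.items():
        if itype not in times_h:
            continue  # no time estimate for this instance type
        hours = times_h[itype]
        results.append(PricedDeployment(itype, zone, market, hours, hours * rate))
    return results

times_h = {"c3.large": 2.0, "c3.xlarge": 1.1}  # hypothetical estimates
hourly_rate = {
    ("c3.large", "ap-northeast-1a", "on-demand"): 0.128,
    ("c3.large", "ap-northeast-1a", "spot"): 0.0365,
    ("c3.xlarge", "ap-northeast-1c", "spot"): 0.08,  # made-up rate
}
priced = estimate_prices(times_h, hourly_rate)
for d in sorted(priced, key=lambda d: d.price):
    print(f"{d.instance_type:10s} {d.zone:16s} {d.market:10s} ${d.price:.3f}")
```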
SpotADAPT. Step 3
• Initial deployment
  – Choose best combination:
    AWS region, zone
    Instance type
    Pricing model
  – For fastest execution:
    1. Choose fastest instance
    2. Find the deployment which gives cheaper execution than the budget
    3. If nothing found, choose second fastest, repeat
  – For cheapest execution:
    1. Choose cheapest deployment
    2. If execution time exceeds the deadline, choose second best deployment, repeat
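The two selection procedures can be sketched as a filter followed by a minimum. Walking the candidates from fastest (or cheapest) and stopping at the first one that meets the constraint gives the same result as discarding infeasible candidates and taking the best of the rest. The candidate list and names below are hypothetical, and ties are broken by the other criterion as an assumption.

```python
from typing import List, NamedTuple, Optional

class Candidate(NamedTuple):
    name: str
    hours: float  # estimated execution time
    price: float  # estimated execution price in USD

def fastest_within_budget(cands: List[Candidate],
                          budget: float) -> Optional[Candidate]:
    """Fastest candidate whose price fits the budget, or None."""
    feasible = [c for c in cands if c.price <= budget]
    return min(feasible, key=lambda c: (c.hours, c.price), default=None)

def cheapest_within_deadline(cands: List[Candidate],
                             deadline_h: float) -> Optional[Candidate]:
    """Cheapest candidate that finishes before the deadline, or None."""
    feasible = [c for c in cands if c.hours <= deadline_h]
    return min(feasible, key=lambda c: (c.price, c.hours), default=None)

# Hypothetical candidates, one per (instance type, zone, pricing model).
cands = [
    Candidate("c3.8xlarge spot", 0.5, 0.60),
    Candidate("c3.2xlarge spot", 1.2, 0.30),
    Candidate("c3.large spot", 4.0, 0.15),
]
print(fastest_within_budget(cands, budget=0.5))
print(cheapest_within_deadline(cands, deadline_h=2.0))
```

Returning None when nothing is feasible corresponds to the case where no deployment can meet the budget or the deadline.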
SpotADAPT. Step 3 (cont.)
• Adaptive (re-)Deployment:
  – When instance is taken back by Amazon (out-of-bid re-deployment)
  – When prices in current region increase
  – When prices in other region decrease
  – Aligned with optimization goals
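A re-deployment check along these lines might look like the sketch below. It covers only the cheapest-execution goal and assumes that moving the workload has a fixed overhead, `migration_cost`, which is a hypothetical parameter not mentioned in the slides. All names and numbers are illustrative.

```python
def should_redeploy(instance_revoked: bool,
                    remaining_cost_here: float,
                    remaining_cost_elsewhere: float,
                    migration_cost: float = 0.0) -> bool:
    """Decide whether to move the remaining work to another deployment.

    Re-deploy when the instance was taken back (out-of-bid), or when
    finishing elsewhere is cheaper even after paying for the move.
    """
    if instance_revoked:
        return True  # nothing left to run on, must re-deploy
    return remaining_cost_elsewhere + migration_cost < remaining_cost_here

# Prices rose in the current region: finishing here now costs $1.00,
# while another zone would cost $0.70 plus $0.10 to move.
print(should_redeploy(False, 1.00, 0.70, migration_cost=0.10))
```

For the fastest-execution goal the same shape applies with remaining time in place of remaining cost, subject to the budget.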
Simulation
• Workloads
  – Wordcount
  – Selfjoin
• Spot price traces
  – Jan 8, 2015 to April 8, 2015
  – 9 AWS regions, 21 availability zones in total
• Strategies:
  – Fastest execution: SpotADAPT, Oracle time, Fast compute, Fast mem
  – Cheapest execution: SpotADAPT, Oracle time, Oracle time+price, Cheap vCPU
[Legend: default, oracle]
Results (Fastest execution)
[Figure: panels for Budget <= $0.1 and Budget <= $0.5; rows for Wordcount and Selfjoin. Annotations: "Default strategies FAIL", "SpotADAPT == Oracle"]
Results (Cheapest execution)
[Figure: panels for Wordcount (Deadline 9.5h) and Selfjoin (Deadline 6h, Deadline 1h). Annotations: "SpotADAPT == Oracle", "Cheap vCPU fails 60% of times", "SpotADAPT is 0.3% more expensive"]
Results – Adaptive re-deployment
[Figure: Initial deployment, Re-deployment]
Summary
• SpotADAPT estimates time on AWS instances
  – Only few micro-runs on some instances in the family
• Estimates execution price in AWS regions
  – Using the most recent price is as good as knowing all future prices
• Proposes deployment w/ optimization goals:
  – Fastest execution within budget, or
  – Cheapest execution within time constraints
• Monitors execution
• Proposes re-deployment if:
  – Instance is taken away by AWS
  – Cheaper or faster deployment is available
Thank you!
Future work
• More diverse workloads
• Larger input datasets
Backup slides
• Setup time:
  – For 50GB Wordcount, for 80GB Selfjoin
  – Setup time is ~15% of execution time on slowest machine
• Setup price:
  – Setup price is ~50% of execution price on on-demand market