EC2 Performance, Spot Instance ROI and EMR Scalability

Posted on 06-Dec-2014


description

The presentation accompanying my research into Amazon Web Services EC2 performance, Spot instance ROI and EMR (Hadoop) scalability.

Transcript of EC2 Performance, Spot Instance ROI and EMR Scalability

EC2 PERFORMANCE, SPOT INSTANCE ROI AND EMR SCALABILITY

Jesse Anderson

AMAZON WEB SERVICES (AWS)

Elastic Compute Cloud (EC2): virtual machine in the cloud

Simple Storage Service (S3): network share in the cloud

Elastic MapReduce (EMR): cluster of EC2 instances running Hadoop

EC2 PRICE TYPES

Spot Instances:
System for bidding on unused instances
Same performance
Go away (abruptly) if outbid

On Demand:
Ad hoc starting

Reserved:
Not covered

SPOT INSTANCE SAVINGS

MILLION MONKEYS PROJECT

Randomly recreated Shakespeare
Open source
Good metric for CPU and memory

EC2 SPECIFICATIONS

Instance Name     Memory    EC2 Compute Units / Cores   Platform   I/O Performance
Small             1.7 GB    1 ECU on 1 core             32-bit     Moderate
Large             7.5 GB    4 ECU on 2 cores            64-bit     High
Extra Large       15 GB     8 ECU on 8 cores            64-bit     High
High-CPU Medium   1.7 GB    5 ECU on 2 cores            32-bit     Moderate
High-CPU Large    7 GB      20 ECU on 8 cores           64-bit     High
Quad XL           23 GB     33.5 ECU on 8 cores         64-bit     Very High

EC2 Compute Unit (ECU) – One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

EC2 PERFORMANCE

My 2.66 GHz Core 2 Duo did 50,000,000,000 character groups

EC2 COST PER HOUR ON DEMAND/SPOT

PRICE PER UNIT

EMR (HADOOP) CLUSTERING

Tests of 1, 2, 3, 4, 5, 10, 20 node clusters

Price Scalability

EMR COST

PRICE PER UNIT IN A CLUSTER

CLUSTERED CHARACTER GROUPS

EMR/HADOOP SCALABILITY PERCENTAGE

EMR/HADOOP SCALABILITY ABSOLUTE

BREAKDOWNS

Original project would have run in 3 days 9 hours
Took 1.5 months before

20 node cluster costs $45.44 per day
5 day run cost $317
11 day run cost $528

ENGINEERING FOR THE CLOUD

Establish if a good fit
Test the EC2 performance
Figure out a unit or widget
Find the most cost efficient EC2 performer with price per unit/widget
Engineer with Spot Instances in mind
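The "price per unit/widget" step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the instance names, hourly prices and throughput figures below are hypothetical placeholders, not the measurements from the talk. The idea is to divide each instance type's hourly price by the work units it completes per hour and pick the lowest result.

```python
from dataclasses import dataclass


@dataclass
class InstanceType:
    """One benchmarked instance type (all values here are hypothetical)."""
    name: str
    price_per_hour: float  # USD per hour, On Demand or Spot
    units_per_hour: float  # measured work units ("widgets") completed per hour

    @property
    def price_per_unit(self) -> float:
        # Cost of producing a single unit of work on this instance type.
        return self.price_per_hour / self.units_per_hour


def most_cost_efficient(candidates):
    """Return the candidate with the lowest price per unit of work."""
    return min(candidates, key=lambda c: c.price_per_unit)


# Placeholder numbers for illustration only, not data from the slides.
candidates = [
    InstanceType("type-a", price_per_hour=0.10, units_per_hour=40.0),
    InstanceType("type-b", price_per_hour=0.40, units_per_hour=200.0),
    InstanceType("type-c", price_per_hour=0.80, units_per_hour=320.0),
]

best = most_cost_efficient(candidates)
print(f"{best.name}: ${best.price_per_unit:.4f} per unit")
```

Ranking by price per unit rather than by raw speed or raw hourly price is what lets a cheaper, slower instance win when it does more work per dollar.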

CONCLUSIONS

Spot Instance Saves:
From $2.20 to $1.30 per hour
Saved $1,000 in one run
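As a rough check of the hourly figures above, the arithmetic can be written out as follows. The two prices come from the slide; whether they apply to a single instance or to a whole cluster is not stated, so the "hours" below are simply billed hours at that rate. The function names are illustrative.

```python
ON_DEMAND_PER_HOUR = 2.20  # USD, from the slide
SPOT_PER_HOUR = 1.30       # USD, from the slide


def hourly_saving(on_demand: float, spot: float) -> float:
    """Absolute saving per billed hour."""
    return on_demand - spot


def saving_fraction(on_demand: float, spot: float) -> float:
    """Saving as a fraction of the On Demand price."""
    return (on_demand - spot) / on_demand


def hours_to_save(target: float, on_demand: float, spot: float) -> float:
    """Billed hours needed at these rates to accumulate a given saving."""
    return target / hourly_saving(on_demand, spot)


saving = hourly_saving(ON_DEMAND_PER_HOUR, SPOT_PER_HOUR)
fraction = saving_fraction(ON_DEMAND_PER_HOUR, SPOT_PER_HOUR)
print(f"Saving: ${saving:.2f} per hour ({fraction:.1%})")
```

At these rates the saving is about 41% of the On Demand price.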

Hadoop/EMR Scalability:
95% efficiency at 2-5 nodes
87% efficiency at 10 nodes
84% efficiency at 20 nodes
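One way to read these efficiency percentages is sketched below. The slides do not state the formula used, so this assumes the common definition: cluster throughput divided by the node count times single-node throughput. Under that assumption, multiplying the node count by the efficiency gives the number of "ideal" nodes the cluster is worth.

```python
# Efficiency figures from the slide, keyed by cluster size.
EFFICIENCY_BY_NODES = {2: 0.95, 3: 0.95, 4: 0.95, 5: 0.95, 10: 0.87, 20: 0.84}


def scaling_efficiency(single_node_throughput: float,
                       cluster_throughput: float,
                       nodes: int) -> float:
    """Assumed definition: actual throughput over perfect linear scaling."""
    return cluster_throughput / (nodes * single_node_throughput)


def effective_nodes(nodes: int, efficiency: float) -> float:
    """Number of perfectly scaling nodes the cluster is equivalent to."""
    return nodes * efficiency


for nodes, efficiency in EFFICIENCY_BY_NODES.items():
    equivalent = effective_nodes(nodes, efficiency)
    print(f"{nodes:>2} nodes at {efficiency:.0%} ~ {equivalent:.2f} ideal nodes")
```

Under this reading, a 20 node cluster at 84% efficiency does the work of roughly 16.8 perfectly scaling nodes, so about three nodes' worth of capacity goes to clustering overhead.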