AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Featuring IFTTT and Mapbox...

Post on 06-Jan-2017


© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Boyd McGeachie, Business Development

November 29, 2016

Save Up to 90% and Run Production Workloads on Spot, Featuring IFTTT and Mapbox

CMP307

Amazon EC2 Consumption Models

On-Demand
• Pay for compute capacity by the hour with no long-term commitments
• For spiky workloads, or to define needs

Reserved
• Make a low, one-time payment and receive a significant discount on the hourly charge
• For committed utilization

Spot
• Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand
• For time-insensitive or transient workloads

With Spot, the rules are simple

Markets where the price of compute changes based on supply and demand.

You’ll never pay more than your bid. When the market exceeds your bid, you get 2 minutes to wrap up your work.

Show me the markets!

C3 Spot prices by instance size and Availability Zone

Size   1b      1c      1a      On Demand
8XL    $0.27   $0.29   $0.50   $1.76
4XL    $0.30   $0.16   $0.21   $0.88
2XL    $0.07   $0.08   $0.08   $0.44
XL     $0.05   $0.04   $0.04   $0.22
L      $0.01   $0.04   $0.01   $0.11

Each instance family, each instance size, and each Availability Zone, in every region, is a separate Spot market.

Bid Price vs. Market Price

[Chart labels: 25% Bid, 50% Bid, 75% Bid]

You pay the market price
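The rules stated earlier (you pay the market price, never more than your bid, and you are interrupted once the market exceeds your bid) can be sketched as a small simulation. This is a simplified model, not AWS's billing logic: the function name `simulate_spot_run` and the hourly price series are hypothetical, and partial-hour billing rules are ignored.

```python
def simulate_spot_run(hourly_prices, bid):
    """Walk an hourly price series until the market price exceeds the bid.

    Returns (hours_run, total_cost, interrupted).
    """
    hours_run = 0
    total_cost = 0.0
    for price in hourly_prices:
        if price > bid:
            # Market exceeded the bid: the instance gets its two-minute
            # warning and stops. Nothing is ever charged above the bid.
            return hours_run, round(total_cost, 4), True
        total_cost += price  # charged the market price, not the bid
        hours_run += 1
    return hours_run, round(total_cost, 4), False


# Hypothetical hourly market prices with one spike in the fourth hour.
prices = [0.05, 0.04, 0.04, 0.22, 0.05]
result_low_bid = simulate_spot_run(prices, bid=0.11)
result_high_bid = simulate_spot_run(prices, bid=0.22)
print(result_low_bid)
print(result_high_bid)
```

In this sketch a higher bid does not raise the price paid in quiet hours; it only determines whether the instance survives the spike.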

EC2 Best practices

Fault tolerance

for Spot

Stateless, Multi-AZ, Loosely coupled, Instance Flexibility

Maps, Directions, Geocoding, Mobile, Developer tools, Analysis

Mapbox maps power over 5,000 platforms ranging from social to mobility apps

21,014,573,573 probes per week

605,940,678 miles per week

[Architecture diagram. Components shown: Mobile Apps, Amazon Route 53, API, Amazon Kinesis, Processing, Persistence, S3 Bucket, Amazon DynamoDB, and two Auto Scaling group / Spot Fleet pairs]

Consumption of compute hours has increased by 1044% since last year

We will do over 500 million hours of compute this year

But margins increased

Q: How much did you have to change in order to switch to Spot?

A: Following general best practices makes it easy to use Spot Instances.

Best Practices

• Diversify across Availability Zones, regions, and the types of instances you use to ensure stability
• Reduce your cold start time
• Break up large jobs into smaller pieces

Diversify for stability

• 10 completely isolated regions, multiple Availability Zones, and multiple instance types
• The Spot market is segmented by these dimensions, so each one adds an additional layer of protection against price spikes and interruptions
• Use Spot Fleet: bid on multiple types of instances
• ECS makes it easy to deploy applications across multiple types of instances

Spotswap

• Manages Spot priceouts for a Spot Fleet by activating backup On-Demand capacity.
• Polls the termination notification endpoint on each instance and tags instances that are about to be terminated
• When enough Spot Instances are about to be terminated, we automatically launch On-Demand instances to pick up the slack
• When Spot prices go back down, we shut down the more expensive On-Demand instances
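The polling and fallback steps above can be sketched roughly as follows. This is not Spotswap's actual code. It assumes the termination notice is read from the instance metadata path `spot/termination-time`, which returns 404 until a notice is issued; instances that enforce token-based metadata access would need an extra step. The function names, the `threshold` parameter, and the instance IDs are hypothetical.

```python
import urllib.error
import urllib.request

# Assumed metadata path for the Spot termination notice.
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def fetch_termination_time(url=TERMINATION_URL, timeout=1.0):
    """Return the termination timestamp as text, or None if none is pending."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode().strip() or None
    except urllib.error.HTTPError as error:
        if error.code == 404:  # no termination notice has been issued
            return None
        raise


def backup_capacity_needed(termination_times, threshold=0.2):
    """Count the instances to replace with On-Demand capacity.

    termination_times maps instance ID -> timestamp string or None.
    threshold is a hypothetical fraction of the fleet that must be flagged
    before any backup capacity is launched.
    """
    if not termination_times:
        return 0
    flagged = [i for i, t in termination_times.items() if t is not None]
    if len(flagged) / len(termination_times) >= threshold:
        return len(flagged)
    return 0


# Hypothetical fleet state as collected from each instance's poller.
fleet = {
    "i-aaa": None,
    "i-bbb": "2016-11-29T18:02:00Z",
    "i-ccc": "2016-11-29T18:02:00Z",
    "i-ddd": None,
}
print(backup_capacity_needed(fleet))
```

Keeping the decision function separate from the metadata call means the fallback logic can be exercised off EC2, where the metadata address is not reachable.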

Reduce your cold start time

• Bake software into an AMI or Docker image for fast loading
• Minimize state on the EC2 instance by using S3, DynamoDB, or Amazon Kinesis
• Goal should be to have your application bootstrap in less than 2 minutes
• ECS helps us quickly deploy applications and move them around to different EC2 instances: seconds instead of minutes

Break up large jobs

Long running job?
Instance 1: 15 hours

Break it up!
Instance 1: 15 hours

And finish it faster
Instance 1, Instance 2, Instance 3: 5 hours

Use the same number of compute hours, but how many EC2 instances you run decides how quickly the work gets done.

Minimize lost work when there is a failure

[Diagram labels: Long jobs, Short jobs, Failure]

Example: how to break up a text file

• Take a text file with 10 million lines
• Divide the number of lines by 10,000
• Generate an SQS message for each chunk of 10,000 lines (1,000 messages)
• Have your workers start reading at the specified point in the file
• Deliver output to S3
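The chunking steps above can be sketched as below. The message fields (`key`, `start_line`, `line_count`), the function names, and the S3 key are hypothetical, and the SQS send and S3 upload are left out so the sketch runs anywhere; in practice each message body would be sent to a queue and each worker would write its result to S3.

```python
import json
from itertools import islice


def build_chunk_messages(total_lines, chunk_size, s3_key):
    """Return one JSON message body per chunk of lines."""
    messages = []
    for start in range(0, total_lines, chunk_size):
        # The last chunk may be shorter if the sizes do not divide evenly.
        count = min(chunk_size, total_lines - start)
        messages.append(json.dumps(
            {"key": s3_key, "start_line": start, "line_count": count}))
    return messages


def read_chunk(lines, start_line, line_count):
    """Return the lines a worker should process for one message."""
    return list(islice(lines, start_line, start_line + line_count))


# 10 million lines in chunks of 10,000, as in the example above.
messages = build_chunk_messages(10_000_000, 10_000, "input/big.txt")
print(len(messages))

# A worker handling one message against a small stand-in file.
sample = [f"line {n}" for n in range(25)]
print(read_chunk(sample, 10, 5))
```

Because each message carries its own offset, a worker that is interrupted loses at most one chunk of work, and the message can be picked up again by another worker.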

Impact

• Spot interruptions are relatively rare for the instance types we use, so the fallback is only triggered 1-2 times per month.
• We are running on discounted Spot Instances more than 98% of the time.
• On our maps service alone, this has resulted in a 90% savings on our EC2 costs each month.

$$$ Gotchas

Spot enables you to run many more instances than you normally would, so watch out for any small costs that add up

Detailed monitoring at 1-minute granularity costs about $0.005 per hour, but if you do 2 million hours, it adds up to almost $10K

EBS volumes are still full price on Spot, so use instance store
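The monitoring figure above is a simple rate-times-hours product, sketched here with the slide's rounded numbers. The function name is hypothetical; since the slide gives the rate only as "about $0.005", the result is an approximation of the real bill.

```python
def add_on_cost(rate_per_hour, instance_hours):
    """Total cost of a per-instance-hour add-on across a fleet."""
    return rate_per_hour * instance_hours


# Rounded rate and hours from the slide.
monitoring = add_on_cost(0.005, 2_000_000)
print(f"${monitoring:,.0f}")
```

The same function can be pointed at any other per-hour charge to see what it becomes at fleet scale.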

Amazon EC2 Spot – in the wild

1) We make this easy, using the Spot Bid Advisor.

2) With deliberate pool selection and bidding, you keep your Spot Instance as long as you need to.

3) And with new features like Spot Fleet diversified, we do the heavy lifting for you...

Spot Bid Advisor – aws-spot-labs

Spot Fleet helps you

Launch Thousands of Spot Instances: with one RequestSpotFleet call.

Get Best Price: Find the lowest priced horsepower that works for you.

or

Get Diversified Resources: Diversify your fleet. Grow your availability.

And

Apply Custom Weighting: Create your own capacity unit based on your application needs

It is easy!

aws ec2 request-spot-fleet --spot-fleet-request-config file://config.json

{
  "IamFleetRole": "arn:aws:iam::781603563322:role/fleet-role",
  "TargetCapacity": 100,
  "SpotPrice": "0.03",
  "ValidFrom": "2015-09-15T00:56:19Z",
  "ValidUntil": "2016-09-14T07:00:00Z",
  "TerminateInstancesWithExpiration": true,
  "LaunchSpecifications": [
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.large", "WeightedCapacity": 2, "SubnetId": "subnet-d0dc51fb" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.large", "WeightedCapacity": 2, "SubnetId": "subnet-64531413" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.large", "WeightedCapacity": 2, "SubnetId": "subnet-0b1b8052" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.xlarge", "WeightedCapacity": 4, "SubnetId": "subnet-d0dc51fb" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.xlarge", "WeightedCapacity": 4, "SubnetId": "subnet-64531413" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.xlarge", "WeightedCapacity": 4, "SubnetId": "subnet-0b1b8052" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.4xlarge", "WeightedCapacity": 16, "SubnetId": "subnet-d0dc51fb" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.4xlarge", "WeightedCapacity": 16, "SubnetId": "subnet-64531413" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.4xlarge", "WeightedCapacity": 16, "SubnetId": "subnet-0b1b8052" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.8xlarge", "WeightedCapacity": 32, "SubnetId": "subnet-d0dc51fb" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.8xlarge", "WeightedCapacity": 32, "SubnetId": "subnet-64531413" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.8xlarge", "WeightedCapacity": 32, "SubnetId": "subnet-0b1b8052" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.2xlarge", "WeightedCapacity": 8, "SubnetId": "subnet-d0dc51fb" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.2xlarge", "WeightedCapacity": 8, "SubnetId": "subnet-64531413" },
    { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.2xlarge", "WeightedCapacity": 8, "SubnetId": "subnet-0b1b8052" }
  ]
}
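The configuration above repeats one launch specification for every pairing of instance type and subnet, so it can be generated rather than written by hand. The sketch below does that with the values from the slide, which should be treated as placeholders. The helper name `build_launch_specifications` is hypothetical, and the `ValidFrom` and `ValidUntil` fields are left out here for brevity.

```python
import json
from itertools import product

# Values copied from the slide's config; treat them as placeholders.
AMI_ID = "ami-0d4cfd66"
SUBNETS = ["subnet-d0dc51fb", "subnet-64531413", "subnet-0b1b8052"]
WEIGHTS = {
    "c3.large": 2,
    "c3.xlarge": 4,
    "c3.2xlarge": 8,
    "c3.4xlarge": 16,
    "c3.8xlarge": 32,
}


def build_launch_specifications(ami_id, weights, subnets):
    """Return one launch specification per (instance type, subnet) pair."""
    return [
        {
            "ImageId": ami_id,
            "InstanceType": instance_type,
            "WeightedCapacity": weight,
            "SubnetId": subnet,
        }
        for (instance_type, weight), subnet in product(weights.items(), subnets)
    ]


config = {
    "IamFleetRole": "arn:aws:iam::781603563322:role/fleet-role",
    "TargetCapacity": 100,
    "SpotPrice": "0.03",
    "TerminateInstancesWithExpiration": True,
    "LaunchSpecifications": build_launch_specifications(AMI_ID, WEIGHTS, SUBNETS),
}

# Text suitable for saving as config.json.
config_json = json.dumps(config, indent=2)
print(len(config["LaunchSpecifications"]))
```

Adding a subnet or an instance type then becomes a one-line change, and the number of Spot markets the fleet can draw from grows with it.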

EC2 Spot Console

An easy-to-use interface that lets you launch spare EC2 instances in seconds

Helps you select and bid on the EC2 instances that meet your application’s requirements

Simple-to-use dashboard lets you modify and manage your application’s compute capacity

Spot Fleet – Plays by your rules

Spot Fleet – Your preferences

Diversification with EC2 Spot Fleet

Multiple EC2 Spot instances selected

Multiple Availability Zones selected

Pick the instances with similar performance characteristics, e.g. c3.large, m3.large, m4.large, r3.large, c4.large

One connection, countless possibilities

43 Million: Applets created

9.5 Million: Users on the platform

360+: Services launched

1 Billion: Runs per month

80 Million: Service activations

IFTTT Infrastructure 2014

• Ruby on Rails

• Dedicated web instances

• Dedicated worker instances

IFTTT Infrastructure 2014

• Experimenting with Spot bidding for workers

• Spot Auto Scaling groups with fixed bid prices

• Started to depend on Spot capacity

• ~50% savings over On-Demand

• Approach vulnerable to market fluctuations

Enter Spot Fleet

IFTTT on Spot Fleet

• Launch worker AMI via Spot Fleet into a mix of markets

• AMI configured to scale based on instance type

• Moved all workers over to Spot Fleet management

Good, but not great...

IFTTT Infrastructure Now

• All applications in Docker containers

• Containers scheduled by Mesos and Marathon

• Heterogeneous (web containers alongside worker, etc.)

IFTTT Infrastructure Today

• Reserved instances for Zookeeper & leaders

• Cluster managed entirely by Spot Fleet

• Initially in 48 different Availability Zones

• Mesos instance AMI auto-boots and connects to cluster

• Mesos nodes advertise available resources

IFTTT Instance Weights

• Looked at memory and CPU usage across cluster

• Calculated max containers per instance type

• Set instance weights to relative calculations

• Allocation metrics reported to CloudWatch
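The weighting steps above can be sketched as follows. This is an illustration, not IFTTT's calculation: the C3 family is used only as an example, the container footprint (0.5 vCPU and 1 GiB) is hypothetical, and the function names are made up for this sketch. Weights are expressed relative to the smallest instance type that fits at least one container.

```python
# (vCPUs, memory in GiB) for the C3 sizes, used here only as an example mix.
INSTANCE_RESOURCES = {
    "c3.large": (2, 3.75),
    "c3.xlarge": (4, 7.5),
    "c3.2xlarge": (8, 15.0),
    "c3.4xlarge": (16, 30.0),
    "c3.8xlarge": (32, 60.0),
}


def max_containers(vcpus, memory_gib, container_cpu, container_mem_gib):
    """Containers that fit on one instance, limited by CPU or memory."""
    return int(min(vcpus // container_cpu, memory_gib // container_mem_gib))


def instance_weights(resources, container_cpu, container_mem_gib):
    """Weight each instance type relative to the smallest usable one."""
    capacity = {
        name: max_containers(cpu, mem, container_cpu, container_mem_gib)
        for name, (cpu, mem) in resources.items()
    }
    # Drop instance types too small to hold a single container.
    usable = {name: count for name, count in capacity.items() if count > 0}
    if not usable:
        return {}
    smallest = min(usable.values())
    return {name: count / smallest for name, count in usable.items()}


# Hypothetical container footprint: half a vCPU and 1 GiB of memory.
weights = instance_weights(INSTANCE_RESOURCES, container_cpu=0.5,
                           container_mem_gib=1.0)
for name, weight in weights.items():
    print(name, round(weight, 2))
```

With this footprint memory is the limiting resource on every size, so the weights do not simply double from one size to the next. That is the kind of difference a measured calculation captures and a vCPU-based guess would miss.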

IFTTT Results

• Estimated cost is 75% less than On-Demand

• Bin-packing adds extra savings over previous system

• Spot Fleet manages capacity automatically

• Infrastructure abstraction frees up developers

Thank you!

Remember to complete your evaluations!