AWS to Bare Metal: Motivation, Pitfalls, and Results

Transcript of AWS to Bare Metal: Motivation, Pitfalls, and Results

Page 1: AWS to Bare Metal: Motivation, Pitfalls, and Results

AWS CLOUD TO BARE METAL

Page 2: AWS to Bare Metal: Motivation, Pitfalls, and Results

Wish saved 35% on MongoDB costs

Improved latency by 20%

And reduced latency variance

Page 3: AWS to Bare Metal: Motivation, Pitfalls, and Results

HI, I’M ADAM. (I’m a software engineer; I also run production…)

Page 4: AWS to Bare Metal: Motivation, Pitfalls, and Results

I WORK AT WISH. (we’re a mobile eCommerce platform)

Page 5: AWS to Bare Metal: Motivation, Pitfalls, and Results

I WORK AT WISH. (we also grow really fast…)

Page 6: AWS to Bare Metal: Motivation, Pitfalls, and Results

AWS TO BARE METAL

• The Why

• The Scope

• The Servers

• The Network

• The Operations

• The Results

Page 7: AWS to Bare Metal: Motivation, Pitfalls, and Results

THE THEME

Page 8: AWS to Bare Metal: Motivation, Pitfalls, and Results

The Why

Page 9: AWS to Bare Metal: Motivation, Pitfalls, and Results

In the beginning

there was spinning disk EBS

Page 10: AWS to Bare Metal: Motivation, Pitfalls, and Results

EBS LATENCY SPIKE

DB slows to a crawl

Replica set detects failure

Election kills the app for 30s

App slows down

Page 11: AWS to Bare Metal: Motivation, Pitfalls, and Results

Summer 2012

Provisioned IOPS EBS launches

Page 12: AWS to Bare Metal: Motivation, Pitfalls, and Results
Page 13: AWS to Bare Metal: Motivation, Pitfalls, and Results

But - super expensive!

Page 14: AWS to Bare Metal: Motivation, Pitfalls, and Results

Maybe time for bare metal?

Page 15: AWS to Bare Metal: Motivation, Pitfalls, and Results

So we modeled the costs…

Page 16: AWS to Bare Metal: Motivation, Pitfalls, and Results
Page 17: AWS to Bare Metal: Motivation, Pitfalls, and Results

The Scope

Page 18: AWS to Bare Metal: Motivation, Pitfalls, and Results
Page 19: AWS to Bare Metal: Motivation, Pitfalls, and Results
Page 20: AWS to Bare Metal: Motivation, Pitfalls, and Results

?

Page 21: AWS to Bare Metal: Motivation, Pitfalls, and Results
Page 22: AWS to Bare Metal: Motivation, Pitfalls, and Results

The Servers

Page 23: AWS to Bare Metal: Motivation, Pitfalls, and Results

Server Specs?

Page 24: AWS to Bare Metal: Motivation, Pitfalls, and Results
Page 25: AWS to Bare Metal: Motivation, Pitfalls, and Results

GOAL

Find lowest cost per query

for your workload

Page 26: AWS to Bare Metal: Motivation, Pitfalls, and Results

THROUGHPUT & LATENCY

• Typically: more throughput → more latency

• Application dictates max latency (p95?)

• For each hardware config…

• Find highest throughput under max latency
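The goal and the bullets above can be read as a small selection procedure. The sketch below is one possible reading, not the method used in the talk: for each hardware config, take the highest benchmarked throughput whose p95 latency stays under the cap, then compare configs by cost per query. The config names, monthly costs, benchmark numbers, and the 50 ms cap are all made-up placeholders.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run: offered throughput and the p95 latency observed."""
    ops_per_sec: float
    p95_ms: float


def max_throughput_under(runs, max_p95_ms):
    """Highest throughput whose p95 stays within the cap (None if no run does)."""
    ok = [r.ops_per_sec for r in runs if r.p95_ms <= max_p95_ms]
    return max(ok) if ok else None


def cost_per_million_queries(monthly_cost, ops_per_sec):
    """Monthly cost divided by the queries served in a 30-day month at that rate."""
    queries_per_month = ops_per_sec * 60 * 60 * 24 * 30
    return monthly_cost / queries_per_month * 1_000_000


def cheapest(configs, max_p95_ms):
    """Return (name, cost per million queries) of the cheapest config that meets the cap."""
    best = None
    for name, (monthly_cost, runs) in configs.items():
        tput = max_throughput_under(runs, max_p95_ms)
        if tput is None:
            continue  # this config never meets the latency requirement
        cost = cost_per_million_queries(monthly_cost, tput)
        if best is None or cost < best[1]:
            best = (name, cost)
    return best


# Hypothetical numbers, for illustration only.
MAX_P95_MS = 50
configs = {
    "config_a": (1000.0, [Run(2000, 12), Run(4000, 30), Run(6000, 90)]),
    "config_b": (1500.0, [Run(4000, 10), Run(8000, 40), Run(12000, 70)]),
}

print(cheapest(configs, MAX_P95_MS))
```

With these placeholder numbers the more expensive box wins, because it sustains twice the throughput under the same latency cap.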

Page 27: AWS to Bare Metal: Motivation, Pitfalls, and Results
Page 28: AWS to Bare Metal: Motivation, Pitfalls, and Results
Page 29: AWS to Bare Metal: Motivation, Pitfalls, and Results
Page 30: AWS to Bare Metal: Motivation, Pitfalls, and Results

THE WORKLOAD

• db.setProfilingLevel(2)

• Snapshot the DB volume

• Dump system.profile after 1 hour

Page 31: AWS to Bare Metal: Motivation, Pitfalls, and Results

OUR TOOL

• Restore the snapshot

• Clear filesystem caches

• Replay ops at configured throughput

• Report on latency / MongoDB stats
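The replay and reporting steps might look roughly like the sketch below. It is not the tool from the talk: `execute` is a hypothetical callable standing in for whatever runs one recorded operation against the restored database, and the loop is single-threaded, so a real replay tool would likely need concurrency to hold the offered rate when operations are slow.

```python
import math
import time


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def replay(ops, ops_per_sec, execute, clock=time.monotonic, sleep=time.sleep):
    """Issue ops at a fixed target rate; return each op's latency in milliseconds.

    `execute` is a hypothetical callable that runs one recorded operation.
    """
    interval = 1.0 / ops_per_sec
    start = clock()
    latencies_ms = []
    for i, op in enumerate(ops):
        # Pace against the overall schedule rather than the previous op,
        # so one slow op does not permanently lower the offered rate.
        delay = start + i * interval - clock()
        if delay > 0:
            sleep(delay)
        t0 = clock()
        execute(op)
        latencies_ms.append((clock() - t0) * 1000)
    return latencies_ms


if __name__ == "__main__":
    # Stand-in workload: each "op" just sleeps briefly instead of hitting a database.
    fake_ops = [0.001] * 20
    lat = replay(fake_ops, ops_per_sec=200, execute=time.sleep)
    print("p95 latency (ms):", round(percentile(lat, 95), 2))
```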

Page 32: AWS to Bare Metal: Motivation, Pitfalls, and Results

LATEST SPECS

• 2x Ivy Bridge 3.3 GHz (32 hyperthreads)

• 256 GB RAM

• 3.2 TB LSI WarpDrive PCI-e

YOUR MILEAGE MAY VARY!

Page 33: AWS to Bare Metal: Motivation, Pitfalls, and Results

The Network

Page 34: AWS to Bare Metal: Motivation, Pitfalls, and Results

NETWORKS ARE WEIRD

• Network engineering is weird for software people

• Need to master a few, big pieces

• We wasted a lot of time improvising…

Page 35: AWS to Bare Metal: Motivation, Pitfalls, and Results
Page 36: AWS to Bare Metal: Motivation, Pitfalls, and Results

PLAN TO FAIL

• Every component and connection fails

• Switch dies?

• NIC dies?

• Switch ⟷ switch connection dies?

• DirectConnect dies?

Page 37: AWS to Bare Metal: Motivation, Pitfalls, and Results

The Operations

Page 38: AWS to Bare Metal: Motivation, Pitfalls, and Results

THE OPERATIONS

• Migration / Rollback

• Backups

• Processes

• Documentation

Page 39: AWS to Bare Metal: Motivation, Pitfalls, and Results

MIGRATION (PREP)

• Add new nodes to replica set

• hidden: true, priority: 0

• Wait for them to sync

Page 40: AWS to Bare Metal: Motivation, Pitfalls, and Results

MIGRATION (READ-ONLY)

• Unhide nodes:

• hidden: false, priority: 0

Page 41: AWS to Bare Metal: Motivation, Pitfalls, and Results

MIGRATION (READ-WRITE)

• Force primary into colo:

• hidden: false, priority: 2

Page 42: AWS to Bare Metal: Motivation, Pitfalls, and Results

MIGRATION (DONE)

• Hide old AWS nodes:

• hidden: true, priority: 0
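The four migration phases above are all edits to the `hidden` and `priority` fields of replica set members. The sketch below models them on a plain dictionary shaped like a trimmed-down replica set config; it does not talk to MongoDB. The host names and the `set_members` helper are hypothetical, and applying the result to a live set would still require a reconfig through the shell or a driver.

```python
import copy


def set_members(config, hosts, hidden, priority):
    """Return a new config with `hidden` and `priority` set on the given hosts."""
    if hidden and priority != 0:
        raise ValueError("MongoDB requires hidden members to have priority 0")
    new = copy.deepcopy(config)
    for member in new["members"]:
        if member["host"] in hosts:
            member["hidden"] = hidden
            member["priority"] = priority
    new["version"] += 1  # a reconfig is expected to carry a higher version
    return new


# Hypothetical host names; the colo nodes start hidden (the PREP phase).
AWS = {"aws-1:27017", "aws-2:27017"}
COLO = {"colo-1:27017", "colo-2:27017"}
config = {
    "_id": "rs0",
    "version": 1,
    "members": [
        {"_id": i, "host": h, "hidden": False, "priority": 1}
        for i, h in enumerate(sorted(AWS))
    ] + [
        {"_id": i + 2, "host": h, "hidden": True, "priority": 0}
        for i, h in enumerate(sorted(COLO))
    ],
}

read_only = set_members(config, COLO, hidden=False, priority=0)      # colo serves reads
read_write = set_members(read_only, COLO, hidden=False, priority=2)  # primary moves to colo
done = set_members(read_write, AWS, hidden=True, priority=0)         # old AWS nodes hidden

for m in done["members"]:
    print(m["host"], m["hidden"], m["priority"])
```

Because each phase is a small, reversible change to the same two fields, a rollback is the same kind of call with the earlier values.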

Page 43: AWS to Bare Metal: Motivation, Pitfalls, and Results

ROLLBACK

• No big deal

• Adjust hidden/priority to move traffic back

Page 44: AWS to Bare Metal: Motivation, Pitfalls, and Results

BACKUPS

• EBS snapshots rock!

• Hidden member in EC2 for backup

• Nice for DR too…

Page 45: AWS to Bare Metal: Motivation, Pitfalls, and Results

PROCESSES

• No RackServer() API

• Ensure consistency:

• Checklists

• Verification tools
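A verification tool can be as simple as diffing what the checklist says a server should look like against what is actually observed on it. The sketch below assumes both sides are already available as dictionaries; how the observed values are collected is left out, and the field names and the VLAN numbers are hypothetical.

```python
def verify_host(expected, observed):
    """Return human-readable mismatches between documented and observed values."""
    problems = []
    for key, want in expected.items():
        got = observed.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, found {got!r}")
    return problems


# Hypothetical checklist entry and observed facts for one server.
expected = {"ram_gb": 256, "hyperthreads": 32, "vlan": 20}
observed = {"ram_gb": 256, "hyperthreads": 32, "vlan": 30}

for problem in verify_host(expected, observed):
    print(problem)
```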

Page 46: AWS to Bare Metal: Motivation, Pitfalls, and Results

DOCUMENTATION

• No DescribeInstances either…

• Consider life without AWS Management Console

• Worse: consider it being occasionally wrong

Page 47: AWS to Bare Metal: Motivation, Pitfalls, and Results

DOCUMENTATION

• Wiremaps

• Network maps (IPs, VLANs, etc)

• Equipment specs

• Serial numbers

Page 48: AWS to Bare Metal: Motivation, Pitfalls, and Results

The Results

Page 49: AWS to Bare Metal: Motivation, Pitfalls, and Results

Big project - took about 6 months

Page 50: AWS to Bare Metal: Motivation, Pitfalls, and Results

Savings made it worthwhile

Page 51: AWS to Bare Metal: Motivation, Pitfalls, and Results

Bonus: it got faster!

Page 52: AWS to Bare Metal: Motivation, Pitfalls, and Results

Budget a lot of time for learning

Page 53: AWS to Bare Metal: Motivation, Pitfalls, and Results

Benchmark & validate your assumptions

Page 54: AWS to Bare Metal: Motivation, Pitfalls, and Results

Obsess over the details

Page 55: AWS to Bare Metal: Motivation, Pitfalls, and Results
Page 56: AWS to Bare Metal: Motivation, Pitfalls, and Results

Thanks!

[email protected]