AWS to Bare Metal: Motivation, Pitfalls, and Results
![Page 1: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/1.jpg)
AWS CLOUD TO
BARE METAL
![Page 2: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/2.jpg)
Wish saved 35% on MongoDB costs
Improved latency by 20%
And reduced latency variance
![Page 3: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/3.jpg)
HI, I’M ADAM. (I’m a software engineer; I also run production…)
![Page 4: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/4.jpg)
I WORK AT WISH. (we’re a mobile eCommerce platform)
![Page 5: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/5.jpg)
I WORK AT WISH. (we also grow really fast…)
![Page 6: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/6.jpg)
AWS TO BARE METAL
• The Why
• The Scope
• The Servers
• The Network
• The Operations
• The Results
![Page 7: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/7.jpg)
THE THEME
![Page 8: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/8.jpg)
The Why
![Page 9: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/9.jpg)
In the beginning
there was spinning disk EBS
![Page 10: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/10.jpg)
EBS LATENCY SPIKE
• DB slows to a crawl
• Replica set detects failure
• Election kills the app for 30s
• App slows down
![Page 11: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/11.jpg)
Summer 2012
Provisioned IOPS EBS launches
![Page 12: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/12.jpg)
![Page 13: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/13.jpg)
But - super expensive!
![Page 14: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/14.jpg)
Maybe time for bare metal?
![Page 15: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/15.jpg)
So we modeled the costs…
![Page 16: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/16.jpg)
![Page 17: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/17.jpg)
The Scope
![Page 18: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/18.jpg)
![Page 19: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/19.jpg)
![Page 20: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/20.jpg)
?
![Page 21: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/21.jpg)
![Page 22: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/22.jpg)
The Servers
![Page 23: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/23.jpg)
Server Specs?
![Page 24: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/24.jpg)
![Page 25: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/25.jpg)
GOAL
Find lowest cost per query
for your workload
![Page 26: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/26.jpg)
THROUGHPUT & LATENCY
• Typically: more throughput → more latency
• Application dictates max latency (p95?)
• For each hardware config…
• Find highest throughput under max latency
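The selection rule above can be sketched in a few lines. This is only an illustration, not the tool used in the talk: the function names, the shape of the `runs` array, and the 30-day month are all assumptions made for the example.

```javascript
// Nearest-rank percentile of an array of latencies in milliseconds.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// runs: [{ opsPerSec, latenciesMs }], one entry per benchmark run
// of a single hardware config (hypothetical shape).
// Returns the highest throughput whose p95 is at or under maxP95Ms, or null.
function maxThroughputUnderLatency(runs, maxP95Ms) {
  let best = null;
  for (const run of runs) {
    const ok = percentile(run.latenciesMs, 95) <= maxP95Ms;
    if (ok && (best === null || run.opsPerSec > best)) best = run.opsPerSec;
  }
  return best;
}

// Cost per million queries if the box ran at that throughput all month.
// Assumes a 30-day month; real utilization will be lower.
function costPerMillionQueries(monthlyCostUsd, opsPerSec) {
  const secondsPerMonth = 30 * 24 * 3600;
  return (monthlyCostUsd / (opsPerSec * secondsPerMonth)) * 1e6;
}
```

Running this for each candidate config and comparing the resulting cost figures gives a rough "cost per query" ranking for one workload and one latency limit.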
![Page 27: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/27.jpg)
![Page 28: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/28.jpg)
![Page 29: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/29.jpg)
![Page 30: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/30.jpg)
THE WORKLOAD
• db.setProfilingLevel(2)
• Snapshot the DB volume
• Dump system.profile after 1 hour
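As a rough sketch of what happens to the dumped profile data next, the function below turns `system.profile` documents into an ordered replay log. The fields `ts`, `op`, and `ns` are standard profiler output; the function name, the output shape, and the choice to drop reads of the profile collection itself are assumptions for illustration. Note that the default profile collection is a small capped collection, so an hour-long capture at level 2 most likely needs it recreated at a larger size first.

```javascript
// profileDocs: documents dumped from system.profile, each with at least
// ts (Date), op (e.g. "query", "insert", "update") and ns ("db.collection").
// Returns the ops in time order with offsets relative to the first op.
function toReplayLog(profileDocs) {
  const ops = profileDocs
    // Reads of the profile collection are profiled too; skip them (assumption).
    .filter((d) => !d.ns.endsWith(".system.profile"))
    .sort((a, b) => a.ts - b.ts);
  if (ops.length === 0) return [];
  const start = ops[0].ts.getTime();
  return ops.map((d) => ({
    offsetMs: d.ts.getTime() - start,
    op: d.op,
    ns: d.ns,
  }));
}
```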
![Page 31: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/31.jpg)
OUR TOOL
• Restore the snapshot
• Clear filesystem caches
• Replay ops at configured throughput
• Report on latency / MongoDB stats
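The talk does not show the tool itself, so the following is only a guess at the core of "replay at configured throughput": assign each captured op a send time at a fixed rate, then summarize the latencies observed. Both function names and the report fields are hypothetical.

```javascript
// Space ops evenly so the stream runs at opsPerSec, ignoring original timing.
function paceOps(ops, opsPerSec) {
  const intervalMs = 1000 / opsPerSec;
  return ops.map((op, i) => ({ sendAtMs: i * intervalMs, op }));
}

// Summarize the latencies measured while replaying (milliseconds).
function latencyReport(latenciesMs) {
  const count = latenciesMs.length;
  const mean = latenciesMs.reduce((sum, x) => sum + x, 0) / count;
  const max = Math.max(...latenciesMs);
  return { count, mean, max };
}
```

A real replayer would also need to issue the ops against the restored snapshot and record a latency per op; that part depends on the driver and is left out here.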
![Page 32: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/32.jpg)
LATEST SPECS
• 2x Ivy Bridge 3.3 GHz (32 hyperthreads)
• 256 GB RAM
• 3.2 TB LSI WarpDrive PCI-e
YOUR MILEAGE MAY VARY!
![Page 33: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/33.jpg)
The Network
![Page 34: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/34.jpg)
NETWORKS ARE WEIRD
• Network engineering is weird for software people
• Need to master a few, big pieces
• We wasted a lot of time improvising…
![Page 35: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/35.jpg)
![Page 36: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/36.jpg)
PLAN TO FAIL
• Every component and connection fails
• Switch dies?
• NIC dies?
• Switch ⟷ switch connection dies?
• DirectConnect dies?
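One way to make the "what if it dies?" questions checkable is to run them against a wiremap. The sketch below assumes a hypothetical wiremap format of `{ server, nic, switch }` rows and flags any server that would lose connectivity if a single NIC or a single switch failed. It does not cover switch-to-switch links or DirectConnect.

```javascript
// links: [{ server, nic, switch }] rows from a wiremap (hypothetical format).
// Returns servers that lack two NICs or are not cabled to two distinct switches.
function serversWithoutRedundancy(links) {
  const byServer = new Map();
  for (const { server, nic, switch: sw } of links) {
    const entry = byServer.get(server) || { nics: new Set(), switches: new Set() };
    entry.nics.add(nic);
    entry.switches.add(sw);
    byServer.set(server, entry);
  }
  return [...byServer]
    .filter(([, e]) => e.nics.size < 2 || e.switches.size < 2)
    .map(([server]) => server);
}
```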
![Page 37: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/37.jpg)
The Operations
![Page 38: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/38.jpg)
THE OPERATIONS
• Migration / Rollback
• Backups
• Processes
• Documentation
![Page 39: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/39.jpg)
MIGRATION (PREP)
• Add new nodes to replica set
• hidden: true, priority: 0
• Wait for them to sync
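A minimal sketch of the prep step, written as plain functions over the objects that `rs.conf()` and `rs.status()` return in the mongo shell. The function names and hostnames are made up for the example. Treating `SECONDARY` as "synced" is also a simplification: it means initial sync has finished, but replication lag should still be checked.

```javascript
// cfg: replica set config as returned by rs.conf(): { _id, version, members: [...] }.
// Returns a new config with the given hosts appended as hidden, priority-0 members.
function addHiddenMembers(cfg, hosts) {
  let nextId = cfg.members.reduce((max, m) => Math.max(max, m._id), -1) + 1;
  const added = hosts.map((host) => ({
    _id: nextId++,
    host,
    hidden: true, // invisible to clients, so no reads land here yet
    priority: 0,  // can never be elected primary
  }));
  return { ...cfg, members: [...cfg.members, ...added] };
}

// status: result of rs.status(). True once every listed host reports SECONDARY.
function allSecondary(status, hosts) {
  return hosts.every((host) =>
    status.members.some((m) => m.name === host && m.stateStr === "SECONDARY")
  );
}

// In the mongo shell this might be used as (hostname is hypothetical):
//   rs.reconfig(addHiddenMembers(rs.conf(), ["colo-db1.example:27017"]))
//   allSecondary(rs.status(), ["colo-db1.example:27017"])
```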
![Page 40: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/40.jpg)
MIGRATION (READ-ONLY)
• Unhide nodes:
• hidden: false, priority: 0
![Page 41: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/41.jpg)
MIGRATION (READ-WRITE)
• Force primary into colo:
• hidden: false, priority: 2
![Page 42: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/42.jpg)
MIGRATION (DONE)
• Hide old AWS nodes:
• hidden: true, priority: 0
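The four migration slides boil down to changing `hidden` and `priority` on two groups of members. The sketch below encodes those stages as data and applies one to a config object shaped like the output of `rs.conf()`. The stage names, the `isColo` predicate, and the decision to leave AWS members untouched until the last stage are assumptions for illustration; the `hidden`/`priority` values are the ones from the slides.

```javascript
// Settings per stage. null means "leave those members as they are".
const STAGES = {
  prep:      { colo: { hidden: true,  priority: 0 }, aws: null },
  readOnly:  { colo: { hidden: false, priority: 0 }, aws: null },
  readWrite: { colo: { hidden: false, priority: 2 }, aws: null },
  done:      { colo: { hidden: false, priority: 2 }, aws: { hidden: true, priority: 0 } },
};

// cfg: replica set config as returned by rs.conf().
// isColo: function (member) => boolean, true for the new bare-metal nodes.
// Returns a new config; the input is not modified.
function applyStage(cfg, stage, isColo) {
  const settings = STAGES[stage];
  if (!settings) throw new Error(`unknown stage: ${stage}`);
  const members = cfg.members.map((m) => {
    const change = isColo(m) ? settings.colo : settings.aws;
    return change ? { ...m, ...change } : m;
  });
  return { ...cfg, members };
}

// In the mongo shell this might be used as (host prefix is hypothetical):
//   rs.reconfig(applyStage(rs.conf(), "readOnly", (m) => m.host.startsWith("colo")))
```

Raising the colo members to priority 2 makes them preferred over members at the default priority of 1, which is what moves the primary into the colo.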
![Page 43: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/43.jpg)
ROLLBACK
• No big deal
• Adjust hidden/priority to move traffic back
![Page 44: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/44.jpg)
BACKUPS
• EBS snapshots rock!
• Hidden member in EC2 for backup
• Nice for DR too…
![Page 45: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/45.jpg)
PROCESSES
• No RackServer() API
• Ensure consistency:
• Checklists
• Verification tools
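A verification tool can be as simple as comparing what a freshly racked server reports against what the checklist says it should be. The sketch below is hypothetical: the field names are invented, and how the `actual` values are collected from the machine is left out.

```javascript
// expected: spec from the checklist; actual: values collected from the server.
// Returns a list of human-readable mismatches (empty when the server matches).
function verifyServer(expected, actual) {
  const problems = [];
  for (const [key, want] of Object.entries(expected)) {
    if (actual[key] !== want) {
      problems.push(`${key}: expected ${want}, found ${actual[key]}`);
    }
  }
  return problems;
}

// Example spec using the numbers from the LATEST SPECS slide.
const expectedSpec = { hyperthreads: 32, ramGb: 256 };
```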
![Page 46: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/46.jpg)
DOCUMENTATION
• No DescribeInstances either…
• Consider life without AWS Management Console
• Worse: consider it being occasionally wrong
![Page 47: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/47.jpg)
DOCUMENTATION
• Wiremaps
• Network maps (IPs, VLANs, etc.)
• Equipment specs
• Serial numbers
![Page 48: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/48.jpg)
The Results
![Page 49: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/49.jpg)
Big project - took about 6 months
![Page 50: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/50.jpg)
Savings made it worthwhile
![Page 51: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/51.jpg)
Bonus: it got faster!
![Page 52: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/52.jpg)
Budget a lot of time for learning
![Page 53: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/53.jpg)
Benchmark & validate your assumptions
![Page 54: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/54.jpg)
Obsess over the details
![Page 55: AWS to Bare Metal: Motivation, Pitfalls, and Results](https://reader031.fdocuments.in/reader031/viewer/2022032115/55b38bbabb61eb5d748b4612/html5/thumbnails/55.jpg)