Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013
![Page 1: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/1.jpg)
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Maximizing Amazon S3 Performance
Craig Carl, AWS
November 15, 2013
![Page 2: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/2.jpg)
Trillions Of Unique Customer Objects
[Chart: objects stored, by quarter, Q4 2006 through Q3 2013]
![Page 3: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/3.jpg)
1.5 Million+ Peak Transactions Per Second
![Page 4: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/4.jpg)
• Architecture – Choosing a region
– Building a naming scheme
– Considering LISTs
• Optimizing PUTs – Multipart upload
• Optimizing GETs – Using CloudFront
– Range-based GETs
![Page 5: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/5.jpg)
Choosing a Region
• Performance – Proximity to your users
– Co-locating with compute, other AWS resources
• Other things to think about – Legal and regulatory requirements
– Costs vary by region
![Page 6: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/6.jpg)
Pay Attention to Your Naming Scheme If:
• You want consistent performance from a bucket
• You want a bucket capable of routinely exceeding 100 TPS
http://amzn.to/18oF5LC
![Page 7: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/7.jpg)
Transactions Per Second (TPS)
100/8 = 12.5 events/sec
100,000 users @ 10 events an hour ≈ 278 TPS
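As a rough check on this kind of estimate, the average rate is total events divided by the seconds in the window. The sketch below assumes load is spread evenly across the hour, which real traffic rarely is, so peak TPS will be higher; the function name is illustrative only.

```python
def average_tps(users: int, events_per_user_per_hour: float) -> float:
    """Average requests per second, assuming events are spread evenly over the hour."""
    return users * events_per_user_per_hour / 3600


# 100,000 users each generating 10 events an hour
print(round(average_tps(100_000, 10), 1))
```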
![Page 8: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/8.jpg)
Distributing Key Names
• Don’t do this
<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-051033564.jpg
<my_bucket>/2013_11_13-061133789.jpg
<my_bucket>/2013_11_13-051033458.jpg
<my_bucket>/2013_11_12-063433125.jpg
<my_bucket>/2013_11_12-021033564.jpg
<my_bucket>/2013_11_12-065533789.jpg
<my_bucket>/2013_11_12-011033458.jpg
<my_bucket>/2013_11_11-022333125.jpg
<my_bucket>/2013_11_11-153433564.jpg
<my_bucket>/2013_11_11-065233789.jpg
<my_bucket>/2013_11_11-065633458.jpg
![Page 9: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/9.jpg)
Distributing Key Names
• Add randomness to the beginning of the key name
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
![Page 10: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/10.jpg)
Other Techniques for Distributing Key Names
• Store objects as a hash of their name – add the original name as metadata
• “deadmau5_mix.mp3” -> 0aa316fb000eae52921aab1b4697424958a53ad9
– watch for duplicate names!
– prepend keyname with short hash
• 0aa3-deadmau5_mix.mp3
• Epoch time (reverse) – 5321354831-deadmau5_mix.mp3
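The three techniques on this slide can be sketched in a few lines. This is an illustration under assumptions, not the presenter's code: the slide does not name the hash function, so SHA-1 is assumed here (its 40-character hex digest matches the length of the example), and the function names are hypothetical.

```python
import hashlib
import time


def hashed_key(name: str) -> str:
    """Use the hex digest as the key; keep `name` as object metadata.

    SHA-1 is an assumption. Identical names produce identical keys,
    so watch for duplicates.
    """
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


def short_hash_key(name: str, length: int = 4) -> str:
    """Prepend a short hash so the original name stays readable."""
    return f"{hashed_key(name)[:length]}-{name}"


def reversed_epoch_key(name: str, epoch_seconds=None) -> str:
    """Prepend the epoch timestamp with its digits reversed.

    The fastest-changing digits come first, so consecutive uploads
    land on different leading characters.
    """
    if epoch_seconds is None:
        epoch_seconds = int(time.time())
    return f"{str(epoch_seconds)[::-1]}-{name}"


print(reversed_epoch_key("deadmau5_mix.mp3", 1384531235))
```

With the timestamp 1384531235 (15 November 2013), the last call prints `5321354831-deadmau5_mix.mp3`, the same shape as the slide's example.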
![Page 11: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/11.jpg)
Randomness in a Key Name Can Be an Anti-Pattern
• Lifecycle policies
• LISTs with prefix filters
• Maintaining thumbnails of images – craig.jpg -> stored as orig-09329jed0fc
– thumb-09329jed0fc
• When you need to recover a file with its original name
![Page 12: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/12.jpg)
Solving for the Anti-Pattern
• Add additional prefixes to help sorting
• Amazon S3 maintains keys lexicographically in its internal indices
<my_bucket>/images/521335461-2013_11_13.jpg
<my_bucket>/images/465330151-2013_11_13.jpg
<my_bucket>/movies/293924440-2013_11_13.jpg
<my_bucket>/movies/987331160-2013_11_13.jpg
<my_bucket>/thumbs-small/838434842-2013_11_13.jpg
<my_bucket>/thumbs-small/342532454-2013_11_13.jpg
<my_bucket>/thumbs-small/345233453-2013_11_13.jpg
<my_bucket>/thumbs-small/345453454-2013_11_13.jpg
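A small sketch of how keys shaped like these might be generated, and why the leading category keeps prefix filters usable. The nine-digit random token and the `prefixed_key` helper are assumptions for illustration; the slide does not say how its numbers were produced.

```python
import secrets


def prefixed_key(category: str, date: str, ext: str = "jpg") -> str:
    """Build '<category>/<random token>-<date>.<ext>'.

    The category comes first so a prefix filter can select it;
    the random token spreads keys out within that category.
    """
    token = f"{secrets.randbelow(10**9):09d}"  # hypothetical 9-digit token
    return f"{category}/{token}-{date}.{ext}"


keys = [
    prefixed_key(category, "2013_11_13")
    for category in ("movies", "images", "thumbs-small", "images")
]

# A prefix filter still works, which fully random keys would not allow.
images = sorted(k for k in keys if k.startswith("images/"))
print(images)
```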
![Page 13: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/13.jpg)
Distributing Your Key Names Is Always a Good Idea!
It can take some time for improvements to manifest
Open a support case if you need an immediate bump or if you’ve got any questions!
http://amzn.to/18oF5LC
![Page 14: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/14.jpg)
Amazon CloudFront
![Page 15: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/15.jpg)
Using Amazon CloudFront for Distribution
• Caches objects from Amazon S3
• Reduces the number of Amazon S3 GETs
• Low latency with multiple endpoints
• High transfer rate
• Two flavors: – Web distribution (static content)
– RTMP distribution (on-demand streaming of media)
![Page 16: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/16.jpg)
Multipart Upload Provides Parallelism
• Allows faster, more flexible uploads
• Allows you to upload a single object as a set of parts
• Once all parts are uploaded, Amazon S3 presents them as a single object
• Enables parallel uploads, pausing and resuming an object upload, and beginning uploads before you know the total object size
![Page 17: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/17.jpg)
Choose the Right Part Size
• Strike a balance between part size and number of parts
– Lots of small parts increase connection overhead, cancelling out the benefits of parallelism
– Too few large parts give up much of the benefit of multipart upload, including resiliency to network errors
• We recommend parts of 25–50 MB on higher-bandwidth networks and parts of 10 MB on mobile networks
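To make the trade-off concrete, here is a minimal sketch that splits an object into parts of a chosen size. It relies on two Amazon S3 multipart limits: every part except the last must be at least 5 MB, and an upload can have at most 10,000 parts. The `plan_parts` helper and its 25 MB default are illustrative assumptions, not part of any SDK.

```python
MB = 1024 * 1024
MIN_PART_SIZE = 5 * MB  # S3 minimum for every part except the last
MAX_PARTS = 10_000      # S3 maximum number of parts per upload


def plan_parts(object_size: int, part_size: int = 25 * MB):
    """Return a list of (part_number, offset, length) tuples."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MB minimum")
    # Grow the part size if the object would otherwise need too many parts.
    part_size = max(part_size, -(-object_size // MAX_PARTS))
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((len(parts) + 1, offset, length))
        offset += length
    return parts


for number, offset, length in plan_parts(60 * MB):
    print(number, offset, length)
```

Each tuple can then be handed to a separate worker, so a failed part is retried on its own rather than restarting the whole upload.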
![Page 18: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/18.jpg)
You Can Parallelize Your GETs, Too
• Use range-based GETs to get multithreaded performance when downloading objects
• Compensates for unreliable networks
• Benefits of multithreaded parallelism
• Align your ranges with your parts!
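Aligning ranges with parts comes down to using the upload part size as the range step. The sketch below builds the HTTP `Range` header values for one object; byte ranges are inclusive at both ends. The `range_headers` helper is hypothetical, and the actual GET requests (one per header, issued from separate threads) are left out.

```python
def range_headers(object_size: int, part_size: int):
    """Return one 'bytes=start-end' value per part-aligned range."""
    headers = []
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size) - 1  # inclusive end
        headers.append(f"bytes={start}-{end}")
    return headers


# A 25-byte object fetched in 10-byte ranges
print(range_headers(25, 10))
```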
![Page 19: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/19.jpg)
If you’re using SSL and parallelizing…
• You’re likely to become CPU-constrained because encryption is CPU-intensive
• Amazon S3 recommends using AES-256 to optimize for security and performance
• You can leverage AES-NI hardware on your host to improve your performance
![Page 20: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/20.jpg)
If Your Application Relies on LIST…
• Getting the objects your customers have stored
• Seeing sets of files (all animations, videos)
• Getting logs
• Viewing inventories
• Sorting keys based on metadata
![Page 21: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/21.jpg)
What Should You Do?
• Parallelize LIST when you need a sequential list of your keys
• You should build a secondary index of your keys, such as with Amazon DynamoDB, to get a faster alternative to LIST when a sequential list isn’t sufficient – Sorting by metadata
– Looking up by category
– Objects by time stamp
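One way to parallelize LIST is to give each worker its own key prefix and concatenate the results in prefix order. The sketch below assumes every key starts with a hex character (as it would with a hashed prefix) and takes a hypothetical `list_prefix` callable that returns the sorted keys under one prefix; an in-memory stand-in replaces the real bucket so the example runs without AWS access.

```python
from concurrent.futures import ThreadPoolExecutor

# One shard per leading hex character; keys starting with anything else
# would be missed, so this assumes hashed (hex) key prefixes.
HEX_PREFIXES = "0123456789abcdef"


def parallel_list(list_prefix, prefixes=HEX_PREFIXES, workers=8):
    """Call list_prefix(prefix) concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as `prefixes`,
        # so the merged list stays in lexicographic order.
        chunks = list(pool.map(list_prefix, prefixes))
    return [key for chunk in chunks for key in chunk]


# Stand-in for a real bucket listing.
FAKE_BUCKET = ["0aa3-a.mp3", "9f10-b.mp3", "c2d4-c.mp3", "0b71-d.mp3"]


def fake_list(prefix):
    return sorted(k for k in FAKE_BUCKET if k.startswith(prefix))


keys = parallel_list(fake_list)
print(keys)
```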
![Page 22: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/22.jpg)
LIST Operations with Amazon DynamoDB
• Maintain metadata in DynamoDB – Keep data about what’s in your buckets in DynamoDB
• On PUTs, enter data about your objects in DynamoDB
• On GETs, use DynamoDB to assist in your search for specific objects
• You can use DynamoDB to give you “LIST” based on specific criteria
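The bookkeeping described above can be sketched without any AWS calls. Here an in-memory dictionary stands in for the DynamoDB table, purely to show what gets written on PUT and queried later; the `ObjectIndex` class, its methods, and the category/timestamp attributes are all assumptions for illustration.

```python
from collections import defaultdict


class ObjectIndex:
    """In-memory stand-in for a DynamoDB table of object metadata."""

    def __init__(self):
        # category -> list of (uploaded_at, s3_key)
        self._by_category = defaultdict(list)

    def record_put(self, key: str, category: str, uploaded_at: int) -> None:
        """Call alongside each S3 PUT to keep the index current."""
        self._by_category[category].append((uploaded_at, key))

    def list_category(self, category: str, since: int = 0):
        """A 'LIST' by category, ordered by upload time."""
        entries = self._by_category.get(category, [])
        return [key for ts, key in sorted(entries) if ts >= since]


index = ObjectIndex()
index.record_put("0aa3-deadmau5_mix.mp3", "music", 1384531235)
index.record_put("7c1e-intro.mp3", "music", 1384500000)
index.record_put("521335461-2013_11_13.jpg", "images", 1384400000)

print(index.list_category("music"))
```

The S3 keys can stay fully randomized, because lookups by category or time go through the index rather than through LIST.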
![Page 23: Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022042607/55642528d8b42a6e298b5182/html5/thumbnails/23.jpg)
Wrap up: Maximizing Amazon S3 Performance
• Architecture – Choosing a region
– Building a naming scheme
– Considering LISTs
• Optimizing PUTs – Multipart upload
• Optimizing GETs – Using CloudFront
– Range-based GETs