Deep Dive on Amazon S3


Transcript of Deep Dive on Amazon S3


Susan Chan

Senior Product Manager, Amazon S3

August 2016

Deep Dive on Amazon S3

Recent innovations on S3

New storage offering
• Standard - Infrequent Access

Visibility & control of your data
• Amazon CloudWatch integration
• AWS CloudTrail integration
• New lifecycle policies
• Event notifications
• Bucket limit increases
• Read-after-write consistency
• IPv6 support

More data ingestion options
• AWS Snowball (80 TB)
• S3 Transfer Acceleration
• Amazon Kinesis Firehose
• Partner integration

Choice of storage classes on S3

• Standard: active data
• Standard - Infrequent Access: infrequently accessed data
• Amazon Glacier: archive data

Use cases for Standard-Infrequent Access

• File sync and share + consumer file storage
• Backup and archive + disaster recovery
• Long-retained data

Standard - Infrequent Access storage

• Durable: designed for 11 9s of durability
• Available: designed for 99.9% availability
• High performance: same as Standard storage
• Secure: bucket policies, AWS Identity and Access Management (IAM) policies, many encryption options
• Integrated: lifecycle management, versioning, event notifications, metrics
• Easy to use: no impact on user experience, simple REST API

Standard - Infrequent Access storage
Integrated: Lifecycle management

• Directly PUT to Standard - IA
• Transition Standard to Standard - IA
• Transition Standard - IA to Amazon Glacier storage
• Expiration lifecycle policy
• Versioning support

Standard - Infrequent Access storage
Lifecycle policy

Transition older objects to Standard - IA

Standard Storage -> Standard - IA

<LifecycleConfiguration>
  <Rule>
    <ID>sample-rule</ID>
    <Prefix>documents/</Prefix>
    <Status>Enabled</Status>
    <Transition>
      <Days>30</Days>
      <StorageClass>STANDARD_IA</StorageClass>
    </Transition>
    <Transition>
      <Days>365</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
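The same rule can be applied programmatically. Below is a minimal boto3 sketch (the bucket name is a placeholder); note that the API spells the storage class STANDARD_IA:

import boto3

s3 = boto3.client("s3")

# Equivalent of the XML rule above: objects under documents/ move to
# Standard - IA after 30 days and to Amazon Glacier after 365 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="examplebucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "sample-rule",
                "Filter": {"Prefix": "documents/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)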


Standard - IA Storage -> Amazon Glacier

The same lifecycle policy covers this second step: its second <Transition> rule moves objects from Standard - IA to Amazon Glacier after 365 days.

S3 support for IPv6

• Dual-stack endpoints support both IPv4 and IPv6
• Same high performance
• Integrated with most S3 features
• Manage access with IPv6 addresses
• Easy to adopt: just change your endpoint
• No additional charges

IPv6 - Getting started

Update your endpoint to either:

• The virtual hosted-style address: http://bucketname.s3.dualstack.aws-region.amazonaws.com
• The path-style address: http://s3.dualstack.aws-region.amazonaws.com/bucketname
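If you use an AWS SDK instead of building URLs by hand, the dual-stack endpoint can be requested through client configuration. A minimal boto3 sketch, with region and bucket names as placeholders:

import boto3
from botocore.config import Config

# Route requests through the dual-stack (IPv4 + IPv6) endpoint.
s3 = boto3.client(
    "s3",
    region_name="us-west-2",  # placeholder region
    config=Config(s3={"use_dualstack_endpoint": True}),
)

s3.list_objects_v2(Bucket="bucketname")  # placeholder bucket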

Restricting access by IP addresses

Bucket policy with IPv4:

{
  "Version": "2012-10-17",
  "Id": "S3PolicyId1",
  "Statement": [
    {
      "Sid": "IPAllow",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::examplebucket/*",
      "Condition": {
        "IpAddress": {"aws:SourceIp": "54.240.143.0/24"},
        "NotIpAddress": {"aws:SourceIp": "54.240.143.188/32"}
      }
    }
  ]
}

Updating the bucket policy with IPv6:

{
  "Version": "2012-10-17",
  "Id": "S3PolicyId1",
  "Statement": [
    {
      "Sid": "IPAllow",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::examplebucket/*",
      "Condition": {
        "IpAddress": {"aws:SourceIp": ["54.240.143.0/24", "2001:DB8:1234:5678::/64"]},
        "NotIpAddress": {"aws:SourceIp": ["54.240.143.128/30", "2001:DB8:1234:5678:ABCD::/80"]}
      }
    }
  ]
}

John Brzozowski

Fellow and Chief Architect, IPv6


IPV6 @ COMCAST

"Route 6 runs uncertainly from nowhere to nowhere, scarcely to be followed from one end to the other, except by some devoted eccentric”

George R. Stewart

AWS NYC 2016


BACKGROUND

• The IPv6 program at Comcast began in 2005

• Seamlessness is a cornerstone of our program

• Motivation

• IPv4 is not adequate and could not support near- or long-term growth requirements

• IPv6 is inevitable

• Scope

• Everything, over time!


THE FIRST IPV6 ONLY SERVICE…

• 98+% of devices are managed using IPv6 only

• Management use of IPv6 (only) is one of the largest deployments of IPv6 worldwide

• Trending towards 100% of all new and existing devices managed using IPv6 only, no IPv4

GROWTH


BROADBAND: 89%

X1: ~50%


NEXT…

• Minimizing and reducing IPv4 dependencies

• IPv6 is used to manage the majority (and growing) of our business needs today

• IPv6 utilization continues to grow

• Currently ~30% of our Internet-facing communications is over IPv6

• Leverage IPv6 as a platform for innovation


STAY TUNED…

Data ingestion into S3

S3 Transfer Acceleration

[Diagram: uploader -> AWS edge location -> S3 bucket, with optimized throughput]

• Typically 50%-400% faster
• Change your endpoint, not your code
• No firewall exceptions
• No client software required
• 59 global edge locations

How fast is S3 Transfer Acceleration?

[Chart: time in hours for a 500 GB upload over the public Internet from edge locations in Rio de Janeiro, Warsaw, New York, Atlanta, Madrid, Virginia, Melbourne, Paris, Los Angeles, Seattle, Tokyo, and Singapore to a bucket in Singapore]

S3 Transfer Acceleration

Getting started

1. Enable S3 Transfer Acceleration on your S3 bucket.

2. Update your endpoint to <bucket-name>.s3-accelerate.amazonaws.com.

3. Done!

How much will it help me?

s3speedtest.com
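Both steps can also be done through the API. A minimal boto3 sketch with a placeholder bucket and object; the use_accelerate_endpoint option routes requests to the <bucket-name>.s3-accelerate.amazonaws.com endpoint:

import boto3
from botocore.config import Config

# Step 1: enable Transfer Acceleration on the bucket.
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket="examplebucket",  # placeholder bucket name
    AccelerateConfiguration={"Status": "Enabled"},
)

# Step 2: send requests through the accelerate endpoint.
s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3.upload_file("large-file.dat", "examplebucket", "large-file.dat")  # placeholder names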

Tip: Parallelizing PUTs with multipart uploads (Best Practice)

• Increase aggregate throughput by parallelizing PUTs on high-bandwidth networks
• Move the bottleneck to the network, where it belongs
• Increase resiliency to network errors; fewer large restarts on error-prone networks
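The high-level transfer manager in boto3 parallelizes multipart PUTs for you. A minimal sketch; the part size, concurrency, and file/bucket names below are illustrative choices, not values from the talk:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Upload in 64 MB parts, with up to 10 parts in flight at once.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=10,
)

s3.upload_file("backup.tar", "examplebucket", "backup.tar", Config=config)  # placeholder names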

Incomplete multipart upload expiration policy (Best Practice)

• Partial uploads do incur storage charges
• Set a lifecycle policy to automatically expire incomplete multipart uploads after a predefined number of days

Enable the policy with the AWS Management Console.

Example lifecycle policy

<LifecycleConfiguration>
  <Rule>
    <ID>sample-rule</ID>
    <Prefix>MyKeyPrefix/</Prefix>
    <Status>Enabled</Status>
    <AbortIncompleteMultipartUpload>
      <DaysAfterInitiation>7</DaysAfterInitiation>
    </AbortIncompleteMultipartUpload>
  </Rule>
</LifecycleConfiguration>

Or enable the policy with the API.
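For example, a minimal boto3 sketch of the same rule (bucket name and prefix are placeholders):

import boto3

# Abort multipart uploads that are still incomplete 7 days after initiation.
boto3.client("s3").put_bucket_lifecycle_configuration(
    Bucket="examplebucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "sample-rule",
                "Filter": {"Prefix": "MyKeyPrefix/"},
                "Status": "Enabled",
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)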

Tip #1: Use versioning (Best Practice)

• Protects from accidental overwrites and deletes
• New version with every upload
• Easy retrieval of deleted objects and roll back to previous versions
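Versioning is enabled per bucket. A minimal boto3 sketch with a placeholder bucket name:

import boto3

# Every subsequent PUT creates a new version instead of overwriting,
# and DELETE adds a delete marker rather than removing data.
boto3.client("s3").put_bucket_versioning(
    Bucket="examplebucket",  # placeholder bucket name
    VersioningConfiguration={"Status": "Enabled"},
)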

Tip #2: Use lifecycle policies

• Automatic tiering and cost controls
• Includes two possible actions:
  • Transition: archives to Standard - IA or Amazon Glacier based on the object age you specify
  • Expiration: deletes objects after the specified time
• Actions can be combined
• Set policies at the bucket or prefix level
• Set policies for current or noncurrent versions

Versioning + lifecycle policies

Expired object delete marker policy

• Deleting a versioned object makes a delete marker the current version of the object
• Removing expired object delete markers can improve list performance
• A lifecycle policy automatically removes the current-version delete marker when previous versions of the object no longer exist

Enable policy with the console
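The same cleanup can also be set through the lifecycle API. A minimal boto3 sketch with a placeholder bucket name and an illustrative rule ID:

import boto3

# Remove current-version delete markers once no previous versions remain.
boto3.client("s3").put_bucket_lifecycle_configuration(
    Bucket="examplebucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expired-delete-marker-cleanup",  # illustrative rule ID
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    },
)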


Tip #3: Restrict deletes (Best Practice)

• Bucket policies can restrict deletes
• For additional security, enable MFA (multi-factor authentication) delete, which requires additional authentication to:
  • Change the versioning state of your bucket
  • Permanently delete an object version
• MFA delete requires both your security credentials and a code from an approved authentication device

Tip #4: Distribute key names

Use a key-naming scheme with randomness at the beginning for high TPS:
• Most important if you regularly exceed 100 TPS on a bucket
• Avoid starting with a date
• Consider adding a hash or reversed timestamp (ssmmhhddmmyy)

Don't do this:

<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg
<my_bucket>/2013_11_11-164533134.jpg
<my_bucket>/2013_11_11-164533135.jpg
<my_bucket>/2013_11_11-164533136.jpg

Distributing key names

Add randomness to the beginning of the key name:

<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
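One simple way to generate such keys is to prefix each name with a short hash. A minimal Python sketch; the helper name and 8-character prefix length are illustrative, not from the talk:

import hashlib

def distributed_key(original_name: str) -> str:
    """Prefix the key with a short hash so names spread evenly."""
    prefix = hashlib.md5(original_name.encode("utf-8")).hexdigest()[:8]
    return f"{prefix}-{original_name}"

# e.g. "2013_11_13-164533125.jpg" -> "a1b2c3d4-2013_11_13-164533125.jpg"
print(distributed_key("2013_11_13-164533125.jpg"))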

Remember to complete your evaluations!

Thank you!