Deep Dive on Amazon Glacier Covering New Retrieval Features - December 2016 Monthly Webinar Series


Transcript of Deep Dive on Amazon Glacier Covering New Retrieval Features - December 2016 Monthly Webinar Series

Page 1: Deep Dive on Amazon Glacier Covering New Retrieval Features - December 2016 Monthly Webinar Series

Mas Kubo, Senior Product Manager, Amazon Glacier

December 12, 2016

Deep Dive on Amazon Glacier

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Page 2

Media Content Distribution – Sony DADC

Storing 20 PB and 1M+ hours of motion picture and television content, growing 1 PB per quarter

Single copy on Glacier

Over $10MM in savings

Replaced legacy tape solution

Higher performance, higher durability, lower cost

Page 3

Patient data – Philips Healthcare

HealthSuite digital platform powered by AWS

15 PB of patient data

Archives patient records and medical images produced across over 1,500 hospitals

Securely stored for decades (lifetime of patients)

Uses HIPAA-eligible AWS services

Page 4

Batches and streams (data transfer): Direct Connect; Snowball, Snowball Edge, Snowmobile; third-party connectors; Transfer Acceleration; Storage Gateway; Kinesis Firehose

File: Amazon EFS

Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)

Object: Amazon S3, Amazon Glacier

Page 5

Data Storage Demand

Media assets (4K, 8K), healthcare/life sciences, financial services, regulated industries, oil and gas/geospatial, digital preservation, long-term backups, logs

Solution requirements: secure and durable, scalable, cost-effective, flexible data access, compliant

Page 6

Amazon Glacier

Durable: 11 9s of durability (99.999999999%), 5 orders of magnitude better than 2 copies on tape

Secure: all data encrypted at rest

Scalable: from gigabytes to exabytes

Cost-effective: starting at $0.004 per GB per month

Flexible data access: three retrieval options, from minutes to hours

Management features: Vault Lock, retrieval policies, CloudTrail

Page 7

Amazon Glacier

Metered usage: pay as you go

No capital investment, no commitment

No risky capacity planning

Avoid risks of physical media handling

Control your geographic locality for performance and compliance

Page 8

Key Terms and Concepts

Vaults – container for archives, up to 1,000 vaults per account per AWS Region

Archives – basic unit, write-once, 40 TB max, unlimited archives

Inventory – cold index of archives refreshed every 24 hours

1. Access – three ways to access Amazon Glacier

2. Uploads – multipart, lifecycle, cost optimizations, AWS Snowball

3. Data management – Vault Lock, tagging, audit logs

4. Retrievals – retrieval policies, range retrievals, new retrieval features

Page 9

Accessing Amazon Glacier

1. Direct Amazon Glacier API/SDK

2. Amazon S3 lifecycle integration

3. Third-party tools and gateways (for example, FastGlacier)

Page 10

Uploading data: Internet or sneaker-net

AWS Direct Connect: dedicated bandwidth between your site and AWS

Internet: transfer data in a secure SSL tunnel over the public Internet

Snowball, Snowball Edge, Snowmobile: physical transfer of media into and out of AWS
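Choosing between the network and physical transfer mostly comes down to how long an upload would take. The back-of-the-envelope sketch below estimates transfer time; the link speed and the achievable utilization are inputs you supply (the 80% default is an illustrative guess, not an AWS figure), and `transfer_days` is a hypothetical helper.

```python
def transfer_days(dataset_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate days needed to move a dataset over a network link.

    dataset_tb  -- dataset size in decimal terabytes (1 TB = 10**12 bytes)
    link_gbps   -- nominal link speed in gigabits per second (10**9 bits/s)
    utilization -- assumed fraction of the nominal speed actually achieved
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = dataset_tb * 10**12 * 8
    effective_bps = link_gbps * 10**9 * utilization
    seconds = bits / effective_bps
    return seconds / 86_400


if __name__ == "__main__":
    # 100 TB over a 1 Gbps link: about 9.3 days at full speed,
    # about 11.6 days at an assumed 80% utilization.
    for util in (1.0, 0.8):
        print(f"100 TB @ 1 Gbps, {util:.0%}: {transfer_days(100, 1, util):.1f} days")
```

If the estimate runs to weeks, the physical options on this slide usually become the practical choice.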

Page 11

Uploading data: archive descriptions

Use the archive description field for metadata

If the local index is corrupted or destroyed, use archive descriptions to reconstruct critical mappings

For example, create a local index entry, then add its primary key to the archive description on upload

Local index entry:
Primary key: 12345
Description: 2014Audit
Dept: FinanceDept
ArchiveID: 9FG23…

UploadArchive(data, ArchiveDescription="12345, 2014Audit, FinanceDept") -> ArchiveID = 9FG23…
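The description-as-backup idea can be sketched in Python. The sketch assumes the comma-separated layout from the slide (primary key, description, department) and reads a vault inventory in the JSON format returned by an inventory-retrieval job, whose `ArchiveList` entries carry `ArchiveId` and `ArchiveDescription`. The helper names and separator are hypothetical; the 1,024 printable-ASCII-character limit on archive descriptions is enforced up front.

```python
import json

FIELD_SEPARATOR = ","  # hypothetical layout: "primary key, description, dept"


def make_description(primary_key: str, label: str, dept: str) -> str:
    """Pack local-index fields into an archive description string."""
    fields = [primary_key, label, dept]
    if any(FIELD_SEPARATOR in f for f in fields):
        raise ValueError("fields must not contain the separator")
    desc = FIELD_SEPARATOR.join(fields)
    # Glacier limits descriptions to 1,024 printable ASCII characters.
    if len(desc) > 1024 or any(not 32 <= ord(c) <= 126 for c in desc):
        raise ValueError("description must be <= 1,024 printable ASCII characters")
    return desc


def rebuild_index(inventory_json: str) -> dict:
    """Rebuild {primary_key: entry} from a vault inventory in JSON format."""
    inventory = json.loads(inventory_json)
    index = {}
    for archive in inventory.get("ArchiveList", []):
        raw = archive.get("ArchiveDescription", "")
        fields = [f.strip() for f in raw.split(FIELD_SEPARATOR)]
        if len(fields) != 3:
            continue  # not written with this layout; skip rather than guess
        primary_key, label, dept = fields
        index[primary_key] = {
            "description": label,
            "dept": dept,
            "archive_id": archive["ArchiveId"],
        }
    return index
```

Because the inventory is refreshed roughly daily, a rebuilt index reflects the vault as of the last inventory, not uploads made since.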

Page 12

Uploading data: optimizing costs

Every archive carries 32 KB of additional billed overhead for index and metadata, and some operations are charged per request

For a 3.2 MB archive, overhead adds about 1% to cost

For a 1 KB archive, about 97% of the cost goes to overhead

The solution is aggregation: make archives at least several megabytes in size
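The percentages follow from treating the 32 KB as extra billed storage per archive, so the overhead share is 32 / (archive size + 32), with sizes in KB. This sketch reproduces the slide's two data points (taking 3.2 MB as 3,200 KB) and compares storing many small files individually versus aggregated; the file counts are illustrative.

```python
import math

OVERHEAD_KB = 32  # per-archive overhead stated on the slide


def overhead_fraction(archive_kb: float) -> float:
    """Share of billed storage that is per-archive overhead."""
    return OVERHEAD_KB / (archive_kb + OVERHEAD_KB)


def billed_kb(file_count: int, file_kb: float, files_per_archive: int) -> float:
    """Billed KB when files are packed files_per_archive to an archive."""
    archives = math.ceil(file_count / files_per_archive)
    return file_count * file_kb + archives * OVERHEAD_KB


if __name__ == "__main__":
    print(f"3.2 MB archive: {overhead_fraction(3200):.1%} overhead")  # ~1%
    print(f"1 KB archive:   {overhead_fraction(1):.1%} overhead")     # ~97%
    # One million 1 KB files: 33,000,000 KB billed individually,
    # 1,032,000 KB billed when packed 1,000 files per archive.
    print(billed_kb(1_000_000, 1, 1), billed_kb(1_000_000, 1, 1_000))
```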

Page 13

Uploading data: aggregating archives

[Diagram: several files (File 1, File 2, …) packed into one archive, each with its own checksum and metadata, plus an index/directory inside the archive; a local index records each file's offset within the archive (File 1 offset, File 2 offset, File 3 offset)]
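The aggregation layout above can be sketched as a simple pack/unpack routine. This is one possible layout, not a Glacier format: files are concatenated, a local index records each file's offset, length, and SHA-256 checksum, and a copy of that index is embedded at the end of the archive as the "index/directory" (here per-file checksums live in the directory rather than next to each file). All function names are hypothetical; any tar-like container would serve the same purpose.

```python
import hashlib
import json
import struct
from typing import Dict, Tuple


def pack(files: Dict[str, bytes]) -> Tuple[bytes, dict]:
    """Concatenate files into one archive body with an embedded directory.

    Returns the body and a local index mapping each file name to its byte
    offset, length and SHA-256 checksum within the body.
    """
    body = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = {
            "offset": len(body),
            "length": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        body.extend(data)
    # Embed a copy of the index at the end of the archive, followed by its
    # length as an 8-byte big-endian integer, so the archive can still be
    # unpacked if the local index is lost.
    directory = json.dumps(index).encode("utf-8")
    body.extend(directory)
    body.extend(struct.pack(">Q", len(directory)))
    return bytes(body), index


def read_directory(body: bytes) -> dict:
    """Recover the embedded directory from a packed archive body."""
    (dir_len,) = struct.unpack(">Q", body[-8:])
    return json.loads(body[-8 - dir_len:-8].decode("utf-8"))


def extract(body: bytes, index: dict, name: str) -> bytes:
    """Return one file's bytes, verifying its checksum."""
    entry = index[name]
    data = body[entry["offset"]:entry["offset"] + entry["length"]]
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError(f"checksum mismatch for {name}")
    return data
```

Keeping offsets in the local index also lets you later fetch a single file with a range retrieval instead of the whole archive; Glacier range retrievals must be megabyte-aligned, so round the requested range out to 1 MB boundaries and slice locally.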

Page 14

Best practices: multipart uploads

Improve throughput and reliability, and get idempotency: a failed part can be re-uploaded without restarting the whole archive

1. InitiateMultipartUpload(partSize) → uploadId

2. UploadPart(uploadId, data)

3. CompleteMultipartUpload(uploadId) → archiveId

[Diagram: an archive split into parts that are uploaded in parallel]
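Glacier checks each uploaded part, and the finished archive, against a SHA-256 tree hash, and part sizes must be 1 MiB multiplied by a power of two (1 MiB to 4 GiB), with at most 10,000 parts. The sketch below computes the tree hash in pure Python and plans the byte range each UploadPart call would carry, in the "bytes start-end/*" form the API uses. `combine` and `plan_parts` are hypothetical helpers, and the SDK calls themselves are left out.

```python
import hashlib
from typing import List

MiB = 1024 * 1024


def combine(digests: List[bytes]) -> bytes:
    """Hash adjacent pairs of digests level by level until one remains.

    An unpaired digest at the end of a level is carried up unchanged.
    """
    level = list(digests)
    while len(level) > 1:
        nxt = [hashlib.sha256(level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def tree_hash(data: bytes) -> bytes:
    """SHA-256 tree hash: hash each 1 MiB chunk, then combine the digests."""
    chunks = [hashlib.sha256(data[i:i + MiB]).digest()
              for i in range(0, len(data), MiB)]
    return combine(chunks or [hashlib.sha256(b"").digest()])


def plan_parts(archive_size: int, part_size: int) -> List[str]:
    """Byte ranges for UploadPart, one per part."""
    mb = part_size // MiB
    if part_size % MiB or mb < 1 or mb > 4096 or mb & (mb - 1):
        raise ValueError("part size must be 1 MiB times a power of two, up to 4 GiB")
    ranges = []
    for start in range(0, archive_size, part_size):
        end = min(start + part_size, archive_size) - 1
        ranges.append(f"bytes {start}-{end}/*")
    if len(ranges) > 10_000:
        raise ValueError("multipart uploads are limited to 10,000 parts")
    return ranges
```

Because part sizes are powers of two in MiB, the per-part tree hashes combine into the tree hash of the whole archive. Parts can therefore be hashed and uploaded in parallel, and the final checksum for CompleteMultipartUpload is assembled from them without rereading the data.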

Page 15

Amazon Glacier: Amazon S3 lifecycle policies

Seamlessly move data from Amazon S3 to Amazon Glacier

Automated lifecycle rules

Transition based on object age

Page 16

Amazon Glacier: Amazon S3 lifecycle policies

Object-level tagging for S3 objects

Apply lifecycle rules based on object tags

Example: transition objects to Amazon Glacier when they are 1 year old and have the object tags ‘Project=Delta’ and ‘Data type=HPI’.
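A tag-based rule like this example can be expressed as an S3 lifecycle configuration. The sketch builds the JSON shape accepted by the S3 PutBucketLifecycleConfiguration API (for instance boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`); the helper name and rule ID are hypothetical. With no tags the rule is purely age-based, as on the previous slide. With two or more tags the filter combines them under `And`, so an object must carry every tag to transition.

```python
import json
from typing import Dict, Optional


def glacier_transition_rule(rule_id: str, days: int,
                            tags: Optional[Dict[str, str]] = None) -> dict:
    """One S3 lifecycle rule that transitions objects to the GLACIER class."""
    tag_list = [{"Key": k, "Value": v} for k, v in (tags or {}).items()]
    if not tag_list:
        rule_filter = {"Prefix": ""}           # every object in the bucket
    elif len(tag_list) == 1:
        rule_filter = {"Tag": tag_list[0]}     # a single tag
    else:
        rule_filter = {"And": {"Tags": tag_list}}  # all tags must match
    return {
        "ID": rule_id,
        "Filter": rule_filter,
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


lifecycle_configuration = {
    "Rules": [
        glacier_transition_rule("delta-hpi-to-glacier", 365,
                                {"Project": "Delta", "Data type": "HPI"}),
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```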

Page 17

Management features: vault tagging

Page 18

Management features: AWS CloudTrail

Enable AWS CloudTrail in console

Control plane events: vault activities

Data plane events: archive activities

Page 19

Management features: vault access policies

Manage access to a vault in a single location: one vault access policy, written in the AWS Identity and Access Management (IAM) policy language

Grant or revoke access for internal business units and teams; for example, "Marketing_Vault" has an access policy distinct from that of "DevOps_Vault"

Easily manage cross-account access for your business partners: simply add a section for each partner to the same policy
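A cross-account grant like the one described can be written as a vault access policy. The sketch below builds one that lets a partner account upload archives (single-request and multipart) to a vault without any right to delete them. The account IDs, region, and vault name are placeholders, the helper functions are hypothetical, and the serialized JSON is what SetVaultAccessPolicy expects (boto3: `set_vault_access_policy(vaultName=..., policy={"Policy": ...})`).

```python
import json
from typing import List


def vault_arn(region: str, account_id: str, vault: str) -> str:
    """ARN of a Glacier vault: arn:aws:glacier:<region>:<account>:vaults/<name>."""
    return f"arn:aws:glacier:{region}:{account_id}:vaults/{vault}"


def partner_upload_policy(vault_resource: str, partner_accounts: List[str]) -> dict:
    """Vault access policy letting partner accounts upload, but not delete, archives."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "partner-upload-only",
                "Effect": "Allow",
                "Principal": {"AWS": [f"arn:aws:iam::{acct}:root"
                                      for acct in partner_accounts]},
                "Action": [
                    "glacier:UploadArchive",
                    "glacier:InitiateMultipartUpload",
                    "glacier:UploadMultipartPart",
                    "glacier:CompleteMultipartUpload",
                    "glacier:AbortMultipartUpload",
                ],
                "Resource": vault_resource,
            }
        ],
    }


# Placeholder account IDs and vault name.
policy = partner_upload_policy(
    vault_arn("us-east-1", "111122223333", "Marketing_Vault"), ["444455556666"])
print(json.dumps(policy, indent=2))
```

Adding another partner is a matter of appending a statement (or an account ID) to the same document, which is the single-location management the slide describes.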

Page 20

Management features: Vault Lock

Non-overwrite, non-erasable records

Time-based retention with “ArchiveAgeInDays” control

Policy lockdown (strong governance)

Legal hold with vault-level tags

Configure optional designated third-party access and grant temporary access

Page 21

Management features: Vault Lock

Vault Lock: two-step locking

InitiateVaultLock: attaches the retention policy to the vault in an in-progress state for testing and returns a unique lock ID, which expires after 24 hours

AbortVaultLock: deletes an in-progress policy, so you can modify the policy before locking it down

CompleteVaultLock: locks down the vault using the matching lock ID; once locked, a Vault Lock policy cannot be aborted
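To make the two-step protocol concrete, here is a toy, in-memory model of the state transitions described above. It is not an AWS client: `VaultLockSimulator` and its methods are hypothetical, and the clock is injectable so the 24-hour expiry can be tested. As in the service, an in-progress lock that is not completed within 24 hours is discarded and the vault returns to the unlocked state.

```python
import time
import uuid
from typing import Optional

LOCK_ID_TTL_SECONDS = 24 * 60 * 60  # an in-progress lock ID expires after 24 hours


class VaultLockSimulator:
    """Toy model of the two-step Vault Lock workflow (not an AWS client)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self.state = "Unlocked"
        self.policy: Optional[str] = None
        self._lock_id: Optional[str] = None
        self._initiated_at = 0.0

    def _expire_if_needed(self) -> None:
        # An uncompleted lock past its TTL silently reverts to Unlocked.
        if (self.state == "InProgress"
                and self._clock() - self._initiated_at > LOCK_ID_TTL_SECONDS):
            self.state, self.policy, self._lock_id = "Unlocked", None, None

    def initiate(self, policy: str) -> str:
        self._expire_if_needed()
        if self.state != "Unlocked":
            raise RuntimeError(f"cannot initiate from state {self.state}")
        self.state, self.policy = "InProgress", policy
        self._lock_id = uuid.uuid4().hex
        self._initiated_at = self._clock()
        return self._lock_id

    def abort(self) -> None:
        self._expire_if_needed()
        if self.state == "Locked":
            raise RuntimeError("a locked Vault Lock policy cannot be aborted")
        self.state, self.policy, self._lock_id = "Unlocked", None, None

    def complete(self, lock_id: str) -> None:
        self._expire_if_needed()
        if self.state != "InProgress":
            raise RuntimeError(f"cannot complete from state {self.state}")
        if lock_id != self._lock_id:
            raise ValueError("lock ID does not match the in-progress lock")
        self.state = "Locked"
```

The intended testing loop is initiate, exercise the policy, abort, revise, and initiate again; complete is called only once the policy behaves as expected.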

Page 22

Management features: Vault Lock

Legal hold with vault-level tags

Set up a legal hold tag: configure a vault-level tag "LegalHold" and set its initial value to "False"

Add a compliance control for legal hold in a Vault Lock policy: deny the delete-archive operation to everybody (root, administrators, users, business partners) when the LegalHold tag = "True"

Place or lift a legal hold by updating the tag value

Page 23

Management features: Vault Lock

Example control: legal hold
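A Vault Lock policy combining the two controls from these slides, time-based retention and the LegalHold tag, might look like the following. It is a sketch built on the Glacier condition keys `glacier:ArchiveAgeInDays` and `glacier:ResourceTag/<key>`; the vault ARN, account ID, and 365-day retention are placeholders and `vault_lock_policy` is a hypothetical helper. The serialized JSON is what you would pass to InitiateVaultLock and test before calling CompleteVaultLock; the tag itself is set on the vault with AddTagsToVault.

```python
import json


def vault_lock_policy(vault_resource: str, retention_days: int) -> dict:
    """Deny DeleteArchive to everyone for young archives and during a legal hold."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-within-retention",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "glacier:DeleteArchive",
                "Resource": vault_resource,
                "Condition": {
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            },
            {
                "Sid": "deny-delete-under-legal-hold",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "glacier:DeleteArchive",
                "Resource": vault_resource,
                "Condition": {
                    # IAM string matching is case-sensitive, so the tag value
                    # must be exactly "True" for this statement to apply.
                    "StringEquals": {"glacier:ResourceTag/LegalHold": "True"}
                },
            },
        ],
    }


# Placeholder ARN for a hypothetical 1-year retention vault.
policy = vault_lock_policy("arn:aws:glacier:us-east-1:111122223333:vaults/Records_1yr", 365)
print(json.dumps(policy, indent=2))
```

Because both statements are explicit denies with `"Principal": "*"`, no allow in a vault access policy or IAM policy can override them once the lock is complete.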

Page 24

Management features: Vault Lock

Vault Lock best practices

Map one vault to a single retention range: group regulatory data by retention (1-year vault, 6-year vault, etc.)

Create a new vault and lock it before storing production data: this enforces the full ArchiveAgeInDays on all new archives and leaves no "gap" on existing archives

Thoroughly test a Vault Lock policy before locking it down (initiate, then abort and re-initiate as needed)

Implement only the most restrictive controls with Vault Lock; leave the flexible controls to the vault access policy

Page 25

Management features: Vault Lock

Third-party assessment

Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c)

Page 26

Data retrievals: basic concepts

1. Initiate job (ArchiveId: AE99F…, Vault: Films) -> Job ID

2. Retrieval processing (minutes or hours, depending on the retrieval option)

3. Job completion notification

4. Download output
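The four steps map onto the Glacier job APIs: InitiateJob with job parameters (step 1), DescribeJob or an SNS notification to learn when processing is done (steps 2 and 3), and GetJobOutput (step 4). The sketch builds the `jobParameters` for an archive retrieval and polls any DescribeJob-style callable, for example `lambda: glacier.describe_job(vaultName="Films", jobId=job_id)` with a boto3 Glacier client. `archive_retrieval_params` and `wait_for_job` are hypothetical helpers; in production an SNS topic is usually preferable to polling.

```python
import time
from typing import Callable, Optional

TIERS = ("Expedited", "Standard", "Bulk")


def archive_retrieval_params(archive_id: str, tier: str = "Standard",
                             sns_topic: Optional[str] = None,
                             byte_range: Optional[str] = None) -> dict:
    """Build jobParameters for an archive-retrieval job (step 1)."""
    if tier not in TIERS:
        raise ValueError(f"tier must be one of {TIERS}")
    params = {"Type": "archive-retrieval", "ArchiveId": archive_id, "Tier": tier}
    if sns_topic:
        params["SNSTopic"] = sns_topic  # step 3: completion notification
    if byte_range:
        # e.g. "0-1048575"; range retrievals must be megabyte-aligned
        params["RetrievalByteRange"] = byte_range
    return params


def wait_for_job(describe: Callable[[], dict], poll_seconds: float = 900,
                 timeout_seconds: float = 13 * 3600,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reports Completed (step 2), then return its status."""
    waited = 0.0
    while True:
        status = describe()
        if status.get("Completed"):
            if status.get("StatusCode") != "Succeeded":
                raise RuntimeError(f"job failed: {status.get('StatusMessage')}")
            return status  # step 4: now call GetJobOutput
        if waited >= timeout_seconds:
            raise TimeoutError("retrieval job did not finish in time")
        sleep(poll_seconds)
        waited += poll_seconds
```

The default 13-hour timeout is an arbitrary margin over the slowest (Bulk) tier; adjust it to the tier you request.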

Page 27

Data retrievals: restoring via lifecycle


Page 28

Data retrievals: restoring via lifecycle

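On the lifecycle path, the restore is requested on the S3 object itself, with the retrieval tier passed in `GlacierJobParameters`, and progress is reported in the object's `x-amz-restore` header. A sketch, assuming boto3's `restore_object(Bucket=..., Key=..., RestoreRequest=...)` and `head_object`, whose response exposes that header as `Restore`; the helper names are hypothetical.

```python
import re
from typing import Optional

TIERS = ("Expedited", "Standard", "Bulk")


def restore_request(days: int, tier: str = "Standard") -> dict:
    """RestoreRequest body for restoring an S3 object stored in Glacier."""
    if days < 1:
        raise ValueError("days must be at least 1")
    if tier not in TIERS:
        raise ValueError(f"tier must be one of {TIERS}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def parse_restore_header(value: Optional[str]) -> dict:
    """Parse the x-amz-restore header (the 'Restore' field of HeadObject).

    Example value:
      ongoing-request="false", expiry-date="Fri, 23 Dec 2016 00:00:00 GMT"
    """
    if not value:
        return {"requested": False, "ongoing": False, "expiry": None}
    fields = dict(re.findall(r'([\w-]+)="([^"]*)"', value))
    return {
        "requested": True,
        "ongoing": fields.get("ongoing-request") == "true",
        "expiry": fields.get("expiry-date"),
    }
```

`Days` controls how long the temporary restored copy stays readable in S3; once `ongoing` is false and an expiry date is present, the object can be read with a normal GET until that date.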

Page 29

Data retrievals: data retrieval policies

Provide transparency and cost control for data retrievals

Govern all retrieval activities for an account in a region

Synchronously accept or reject each retrieval request

Account for in-flight retrieval operations
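A data retrieval policy is set per account and region as a single rule. The sketch builds the policy document accepted by SetDataRetrievalPolicy (boto3: `set_data_retrieval_policy(Policy=...)`), whose strategies are `BytesPerHour`, `FreeTier`, and `None`. It also adds a deliberately simplified accept/reject check to show why in-flight jobs matter. The `admit` function and its fixed 4-hour spreading window are assumptions for illustration, not Glacier's exact accounting.

```python
from typing import Iterable, Optional

STRATEGIES = ("BytesPerHour", "FreeTier", "None")


def retrieval_policy(strategy: str, bytes_per_hour: Optional[int] = None) -> dict:
    """Policy document for SetDataRetrievalPolicy (one rule per policy)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"strategy must be one of {STRATEGIES}")
    rule = {"Strategy": strategy}
    if strategy == "BytesPerHour":
        if not bytes_per_hour or bytes_per_hour <= 0:
            raise ValueError("BytesPerHour needs a positive rate")
        rule["BytesPerHour"] = bytes_per_hour
    return {"Rules": [rule]}


def admit(request_bytes: int, in_flight_bytes: Iterable[int],
          limit_bytes_per_hour: float, window_hours: float = 4.0) -> bool:
    """Simplified synchronous accept/reject decision (illustrative only).

    Models every job, including those already in flight, as retrieving its
    bytes evenly over `window_hours`; the new request is accepted only if the
    combined rate stays within the limit.
    """
    total = sum(in_flight_bytes) + request_bytes
    return total / window_hours <= limit_bytes_per_hour
```

The point the model captures is that the same request can be accepted on an idle account and rejected while other retrievals are still running.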

Page 30

Data retrievals: expedited and bulk retrievals

Three flexible and powerful retrieval options to access any of your Amazon Glacier data

                     Expedited           Standard                  Bulk
Data access time     1-5 minutes         3-5 hours                 5-12 hours
Data retrievals      $0.03 per GB        $0.01 per GB              $0.0025 per GB
Retrieval requests   $0.01 per request   $0.05 per 1,000 requests  $0.025 per 1,000 requests

Expedited: designed for occasional urgent access to a small number of archives

Standard: low-cost option for retrieving data in just a few hours

Bulk: lowest-cost option, optimized for large retrievals of up to petabytes of data in 12 hours
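Using the prices in this table, retrieval cost is (GB retrieved × per-GB price) + (number of requests × per-request price). The sketch uses `Decimal` to avoid floating-point rounding; the price table is the December 2016 list from the slide (check current pricing before relying on it), and the 10 TB scenario is illustrative.

```python
from decimal import Decimal

# December 2016 prices from the slide, in USD.
PRICES = {
    "Expedited": {"per_gb": Decimal("0.03"), "per_request": Decimal("0.01")},
    "Standard": {"per_gb": Decimal("0.01"), "per_request": Decimal("0.05") / 1000},
    "Bulk": {"per_gb": Decimal("0.0025"), "per_request": Decimal("0.025") / 1000},
}


def retrieval_cost(tier: str, gb: Decimal, requests: int) -> Decimal:
    """Data-retrieval charge plus per-request charge for one tier."""
    p = PRICES[tier]
    return Decimal(gb) * p["per_gb"] + requests * p["per_request"]


if __name__ == "__main__":
    # 10 TB (10,000 GB) retrieved as 10,000 archives.
    # Expected totals: Expedited $400, Standard $100.50, Bulk $25.25.
    for tier in PRICES:
        print(tier, retrieval_cost(tier, Decimal(10_000), 10_000))
```

For large restores the per-GB price dominates, which is why Bulk is the natural choice when a 5-12 hour wait is acceptable.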

Page 31

Data retrievals: expedited retrievals

Expedited retrievals come in two types:

On-demand: like EC2 On-Demand instances, available the vast majority of the time

Provisioned: guaranteed capacity

Provisioned capacity

Guarantees that expedited retrieval capacity is available when needed

Ensures at least 3 expedited retrievals every 5 minutes and provides up to 150 MB/s of retrieval throughput

$100 per month per unit
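Sizing provisioned capacity means covering both the request rate and the throughput you need. The sketch below assumes capacity units add linearly (n units give 3n expedited retrievals per 5 minutes and 150n MB/s), which is an assumption rather than something stated on the slide; `units_needed` and `monthly_cost` are hypothetical helpers.

```python
import math

REQUESTS_PER_UNIT = 3           # expedited retrievals per 5 minutes, per unit
THROUGHPUT_MB_S_PER_UNIT = 150  # retrieval throughput per unit
PRICE_PER_UNIT_MONTH = 100      # USD per unit per month (December 2016)


def units_needed(requests_per_5_min: int, peak_mb_per_s: float) -> int:
    """Units needed to cover both request rate and throughput.

    Assumes units add linearly; whichever requirement is larger wins.
    """
    by_requests = math.ceil(requests_per_5_min / REQUESTS_PER_UNIT)
    by_throughput = math.ceil(peak_mb_per_s / THROUGHPUT_MB_S_PER_UNIT)
    return max(by_requests, by_throughput, 0)


def monthly_cost(units: int) -> int:
    return units * PRICE_PER_UNIT_MONTH


if __name__ == "__main__":
    # 10 urgent retrievals per 5 minutes at up to 200 MB/s:
    # the request rate needs 4 units, throughput only 2, so 4 units ($400/month).
    units = units_needed(10, 200)
    print(units, monthly_cost(units))
```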

Page 32

Thank you!

Page 33

Q&A