Data Storage for the Long Haul: Compliance and Archive

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Henry Zhang, Senior Product Manager, Amazon Glacier

August 11, 2016


Page 1: Data Storage for the Long Haul: Compliance and Archive


Page 2: Data Storage for the Long Haul: Compliance and Archive

AWS storage maturity

• File: Amazon EFS

• Block: Amazon Elastic Block Store, Amazon EC2 Instance Store

• Object: Amazon S3, Amazon Glacier

• Data transfer: AWS Direct Connect, AWS Snowball, ISV connectors, Amazon Kinesis Firehose, Amazon S3 Transfer Acceleration, AWS Storage Gateway

Page 3: Data Storage for the Long Haul: Compliance and Archive

Audio archives–SoundCloud

• World’s leading social sound platform

• Audio files transcoded and stored in multiple formats

• Stores petabytes (PBs) of data

• Transcoded files served from S3

• Originals moved to Amazon Glacier for long-term retention

Page 4: Data Storage for the Long Haul: Compliance and Archive

Video archives

• Media distribution backbone (Ve.nue platform)

• Over-the-top (OTT) broadcast service

• PBs of media assets

• Assets to be archived and retained for decades

Page 5: Data Storage for the Long Haul: Compliance and Archive

Patient data–Philips Healthcare

• HealthSuite digital platform powered by AWS

• 15 petabytes of patient data

• Archived for decades (beyond the lifetime of patients)

• Uses HIPAA-eligible AWS services covered by the AWS Business Associate Addendum (BAA)

Page 6: Data Storage for the Long Haul: Compliance and Archive

Public sector–King County

• Most populous county in Washington state

• Replaced a tape backup solution for 17 agencies

• Meets compliance requirements

• Saved $1 million in the first year; no more tape refreshes or management churn

Page 7: Data Storage for the Long Haul: Compliance and Archive

Archive: data retained for the long term, for compliance or potential future reference

Data archiving needs are growing everywhere

• Media assets, 4K, 8K

• Health care/life sciences

• Financial services

• Regulated industries

• Oil and gas/geospatial

• Digital preservation

• Long-term backups

• Logs

Page 8: Data Storage for the Long Haul: Compliance and Archive

Traditional archiving approaches

• Storage arrays/disk arrays

• Tape silos/tape libraries

• Tape drives (LTO-X/DLT/etc.)

• Virtual tape libraries (VTLs)

• Tape out/vaulting

• Specialized software and personnel

Page 9: Data Storage for the Long Haul: Compliance and Archive

How can AWS help with your archiving?

• Metered usage: pay as you go

• No capital investment

• No commitment

• No risky capacity planning

• Avoid the risks of physical media handling

• Control your geographic locality for performance and compliance

Page 10: Data Storage for the Long Haul: Compliance and Archive

Archive Options–Storage Tiers and Data Lifecycle

Page 11: Data Storage for the Long Haul: Compliance and Archive

Object storage options

Tier | Typical data | Access latency | Storage price
S3 Standard | Active data | Milliseconds | $0.03/GB/mo.
S3 Standard - Infrequent Access (S3-IA) | Infrequently accessed data | Milliseconds | $0.0125/GB/mo.
Amazon Glacier | Archive data | 3-5 hours | $0.007/GB/mo.
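To make the price gap concrete, here is a minimal Python sketch that applies the per-GB list prices from this slide to a dataset. It assumes flat per-GB pricing at the slide's rates and ignores request, retrieval, and minimum-duration charges as well as S3's volume pricing tiers; the dictionary and function names are illustrative.

```python
# Storage list prices from the slide, in USD per GB-month (2016).
# Assumption: flat pricing; request, retrieval, and minimum-duration
# charges and S3 volume tiers are ignored.
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.03,
    "S3 Standard-IA": 0.0125,
    "Amazon Glacier": 0.007,
}


def monthly_storage_cost(tier: str, gigabytes: float) -> float:
    """Storage-only monthly cost of keeping `gigabytes` in `tier`."""
    return PRICE_PER_GB_MONTH[tier] * gigabytes


def savings_percent(cheaper: str, baseline: str) -> float:
    """Percentage saved per GB-month by using `cheaper` instead of `baseline`."""
    return 100 * (1 - PRICE_PER_GB_MONTH[cheaper] / PRICE_PER_GB_MONTH[baseline])


if __name__ == "__main__":
    for tier in PRICE_PER_GB_MONTH:
        print(f"{tier}: ${monthly_storage_cost(tier, 1000):.2f} per month for 1,000 GB")
    print(f"Standard-IA vs Standard: {savings_percent('S3 Standard-IA', 'S3 Standard'):.0f}% saved")
    print(f"Glacier vs Standard-IA: {savings_percent('Amazon Glacier', 'S3 Standard-IA'):.0f}% saved")
```

For 1,000 GB this prints $30.00, $12.50, and $7.00 per month. The two ratios round to 58% (Standard-IA vs. Standard) and 44% (Glacier vs. Standard-IA), which appear to be the per-GB savings figures quoted on the "Save money on storage" slide.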

Page 12: Data Storage for the Long Haul: Compliance and Archive

A closer look: S3-IA and Amazon Glacier

S3-IA

• Same durability and throughput as S3 Standard

• Instant access

• $0.01 per GB retrieved

Amazon Glacier

• Same 11 9s (99.999999999%) durability as S3 Standard

• 3-5 hour data retrieval latency

• Suitable for cold archives, such as data that would otherwise go to offsite tape
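The retrieval fee is what decides whether S3-IA is actually cheaper than S3 Standard. A rough break-even sketch in Python, using only the slide's prices ($0.03 and $0.0125 per GB-month, $0.01 per GB retrieved) and ignoring request fees, S3-IA's 30-day minimum storage charge, and its 128 KB minimum billable object size:

```python
STANDARD_PER_GB = 0.03      # $/GB-month, S3 Standard (slide)
IA_PER_GB = 0.0125          # $/GB-month, S3-IA (slide)
IA_RETRIEVAL_PER_GB = 0.01  # $/GB retrieved from S3-IA (slide)


def monthly_cost_standard(stored_gb: float, retrieved_gb: float) -> float:
    # S3 Standard has no per-GB retrieval fee; retrieved_gb is kept for symmetry.
    return STANDARD_PER_GB * stored_gb


def monthly_cost_ia(stored_gb: float, retrieved_gb: float) -> float:
    # Lower storage price, plus a fee on every GB read back.
    return IA_PER_GB * stored_gb + IA_RETRIEVAL_PER_GB * retrieved_gb


def break_even_retrieval_ratio() -> float:
    """GB retrieved per GB stored, per month, at which both classes cost the same."""
    return (STANDARD_PER_GB - IA_PER_GB) / IA_RETRIEVAL_PER_GB


print(f"break-even: {break_even_retrieval_ratio():.2f}x the stored volume per month")
print(monthly_cost_standard(1000, 100), monthly_cost_ia(1000, 100))
```

The ratio works out to 1.75: under these assumptions S3-IA stays cheaper unless you read back more than 1.75 times the stored volume in a month. For 1,000 GB stored with 100 GB read back, S3-IA comes to about $13.50 against $30.00 for Standard.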


Page 13: Data Storage for the Long Haul: Compliance and Archive

Data lifecycle management

• Transition Standard to Standard-IA

• Transition Standard-IA to Amazon Glacier

• Expiration lifecycle policy

• Versioning support

[Chart: Data access frequency over time, from T to T + 365 days]

Page 14: Data Storage for the Long Haul: Compliance and Archive

Set up lifecycle policy

Page 15: Data Storage for the Long Haul: Compliance and Archive

Transition older videos to Standard-IA

Page 16: Data Storage for the Long Haul: Compliance and Archive

Archive to S3-IA after 30 days

Lifecycle policy

Standard Storage->Standard-IA

<LifecycleConfiguration>
  <Rule>
    <ID>sample-rule</ID>
    <Prefix>documents/</Prefix>
    <Status>Enabled</Status>
    <!-- Move objects under documents/ to Standard-IA 30 days after creation -->
    <Transition>
      <Days>30</Days>
      <StorageClass>STANDARD_IA</StorageClass>
    </Transition>
    <!-- Move them on to Amazon Glacier 365 days after creation -->
    <Transition>
      <Days>365</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
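One detail worth checking in this policy: the S3 API spells the infrequent-access class `STANDARD_IA`, with an underscore. The sketch below, using only Python's standard library, parses the rule, rejects storage-class names outside a deliberately small allow-list (just the two targets discussed in this talk), and reports which class an object of a given age should be in. The age model is a simplification: S3 applies transitions asynchronously, so an object moves some time after it crosses the threshold rather than exactly on that day.

```python
import xml.etree.ElementTree as ET

# The lifecycle rule from the slide, with the storage class spelled STANDARD_IA.
LIFECYCLE_XML = """
<LifecycleConfiguration>
  <Rule>
    <ID>sample-rule</ID>
    <Prefix>documents/</Prefix>
    <Status>Enabled</Status>
    <Transition><Days>30</Days><StorageClass>STANDARD_IA</StorageClass></Transition>
    <Transition><Days>365</Days><StorageClass>GLACIER</StorageClass></Transition>
  </Rule>
</LifecycleConfiguration>
"""

# Deliberately limited to the two target classes discussed in this talk.
VALID_TARGETS = {"STANDARD_IA", "GLACIER"}


def parse_transitions(xml_text: str) -> list[tuple[int, str]]:
    """Return (days, storage_class) pairs sorted by days; reject unknown classes."""
    root = ET.fromstring(xml_text.strip())
    transitions = []
    for transition in root.iter("Transition"):
        days = int(transition.findtext("Days"))
        storage_class = transition.findtext("StorageClass")
        if storage_class not in VALID_TARGETS:
            raise ValueError(f"unknown storage class: {storage_class!r}")
        transitions.append((days, storage_class))
    return sorted(transitions)


def storage_class_at_age(transitions: list[tuple[int, str]], age_days: int) -> str:
    """Class an object of the given age should end up in (objects start in STANDARD)."""
    current = "STANDARD"
    for days, storage_class in transitions:
        if age_days >= days:
            current = storage_class
    return current


rule = parse_transitions(LIFECYCLE_XML)
for age in (10, 100, 400):
    print(age, storage_class_at_age(rule, age))
```

Running it on the misspelled `STANDARD-IA` raises a `ValueError`, which is the kind of check worth doing before submitting a lifecycle configuration.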

Page 17: Data Storage for the Long Haul: Compliance and Archive

Archive to Amazon Glacier after 365 days

Lifecycle policy

Standard Storage->Standard-IA


Standard-IA Storage->Amazon Glacier

Page 18: Data Storage for the Long Haul: Compliance and Archive

Save money on storage

• S3 Standard-IA: 58% savings over S3 Standard

• Amazon Glacier: 44% savings over S3 Standard-IA

* Assumes the highest public pricing tier

Page 19: Data Storage for the Long Haul: Compliance and Archive

Example backup software integration

• Commvault: native integration with S3 and Amazon Glacier

• Deduplication and encryption

• Single-console management

Page 20: Data Storage for the Long Haul: Compliance and Archive

Compliance Use Case 1–Regulatory Retention

Page 21: Data Storage for the Long Haul: Compliance and Archive

Compliance storage with Vault Lock

Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy.

• Time-based retention

• MFA authentication

• Controls govern all records in a vault

• Immutable policy

• Two-step locking

Page 22: Data Storage for the Long Haul: Compliance and Archive

Vault Lock for compliance storage

• Non-overwrite, non-erasable records

• Time-based retention with the glacier:ArchiveAgeInDays condition key

• Policy lockdown (strong governance)

• Legal hold with vault-level tags

• Optionally designate third-party access and grant temporary access

Page 23: Data Storage for the Long Haul: Compliance and Archive

Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC Rule 17a-4(f) and CFTC Rule 1.31(b)-(c).

Page 24: Data Storage for the Long Haul: Compliance and Archive

Example control: 1-year record retention

Page 25: Data Storage for the Long Haul: Compliance and Archive

Example control: 1-year record retention
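The policy behind this control was shown as a slide image. A sketch of what a 1-year retention policy can look like, modeled on the deny-by-archive-age example in the Amazon Glacier documentation: it denies `glacier:DeleteArchive` to everyone while `glacier:ArchiveAgeInDays` is below 365. The region, account ID, and vault name in the ARN are placeholders, and `retention_policy` is a hypothetical helper.

```python
import json


def retention_policy(vault_arn: str, min_age_days: int) -> str:
    """Vault Lock policy denying DeleteArchive for archives younger than min_age_days."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-based-on-archive-age",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    # Written as a string, as in the AWS documentation example.
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(min_age_days)}
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder ARN: region, account ID, and vault name are illustrative.
VAULT_ARN = "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"
print(retention_policy(VAULT_ARN, 365))
```

Because the statement is a Deny, it only blocks deletion; permission to delete older archives still has to come from an Allow elsewhere (for example, the vault access policy or IAM).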

Page 26: Data Storage for the Long Haul: Compliance and Archive

Vault Lock: Two-step locking
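Two-step locking works as follows: InitiateVaultLock attaches the policy, puts the lock in an in-progress state, and returns a lock ID. You then have 24 hours to test the policy and either call CompleteVaultLock with that lock ID, which makes the policy immutable, or call AbortVaultLock. If the 24 hours pass, the lock ID expires and the in-progress policy is removed. The toy Python model below captures only those state transitions; the class, its methods, and the lock-ID format are hypothetical and do not talk to AWS.

```python
from datetime import datetime, timedelta

# Window between InitiateVaultLock and CompleteVaultLock.
LOCK_WINDOW = timedelta(hours=24)


class VaultLockModel:
    """Toy, in-memory model of the Vault Lock state machine (not an AWS client)."""

    def __init__(self) -> None:
        self.state = "Unlocked"
        self.policy = None
        self._lock_id = None
        self._initiated_at = None

    def _reset(self) -> None:
        # Abort or expiry removes the in-progress policy.
        self.state, self.policy, self._lock_id, self._initiated_at = "Unlocked", None, None, None

    def _expire_if_needed(self, now: datetime) -> None:
        if self.state == "InProgress" and now - self._initiated_at > LOCK_WINDOW:
            self._reset()

    def initiate(self, policy: str, now: datetime) -> str:
        """Step 1: attach the policy and enter the InProgress state."""
        self._expire_if_needed(now)
        if self.state != "Unlocked":
            raise RuntimeError(f"cannot initiate while {self.state}")
        self.state, self.policy, self._initiated_at = "InProgress", policy, now
        self._lock_id = f"lock-{now:%Y%m%d%H%M%S}"  # hypothetical lock ID format
        return self._lock_id

    def abort(self, now: datetime) -> None:
        """Back out during testing; only possible before the lock is completed."""
        self._expire_if_needed(now)
        if self.state != "InProgress":
            raise RuntimeError("no lock in progress")
        self._reset()

    def complete(self, lock_id: str, now: datetime) -> None:
        """Step 2: make the policy immutable, using the lock ID from step 1."""
        self._expire_if_needed(now)
        if self.state != "InProgress" or lock_id != self._lock_id:
            raise RuntimeError("no valid in-progress lock for this lock ID")
        self.state = "Locked"


start = datetime(2016, 8, 11, 9, 0)
vault = VaultLockModel()
lock_id = vault.initiate('{"Version": "2012-10-17"}', start)
vault.complete(lock_id, start + timedelta(hours=2))
print(vault.state)
```

The point of the two steps is the testing window: a mistake found within 24 hours can be aborted, while a completed lock cannot be changed.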

Page 27: Data Storage for the Long Haul: Compliance and Archive

Legal hold with vault-level tags

Page 28: Data Storage for the Long Haul: Compliance and Archive

Example control: Legal hold
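For legal hold, the locked policy can deny deletion while a vault-level tag is set, using the `glacier:ResourceTag/<key>` condition key. A sketch, assuming a hypothetical tag key `LegalHold` with value `true`, plus a small local helper that approximates how the retention and legal-hold deny rules combine. Because the tag, not the policy, turns the hold on and off, who may call AddTagsToVault and RemoveTagsFromVault also needs to be restricted; that part is not shown.

```python
import json

LEGAL_HOLD_TAG = "LegalHold"  # hypothetical vault tag key


def legal_hold_statement(vault_arn: str) -> dict:
    """Deny DeleteArchive on the vault while its LegalHold tag is "true"."""
    return {
        "Sid": "deny-delete-under-legal-hold",
        "Principal": "*",
        "Effect": "Deny",
        "Action": "glacier:DeleteArchive",
        "Resource": [vault_arn],
        "Condition": {
            "StringEquals": {f"glacier:ResourceTag/{LEGAL_HOLD_TAG}": "true"}
        },
    }


def deletion_allowed(archive_age_days: int, vault_tags: dict, min_age_days: int = 365) -> bool:
    """Local approximation of how the retention and legal-hold deny rules combine."""
    if archive_age_days < min_age_days:
        return False  # still inside the retention period
    if vault_tags.get(LEGAL_HOLD_TAG) == "true":
        return False  # legal hold in effect for the whole vault
    return True


arn = "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"  # placeholder
print(json.dumps(legal_hold_statement(arn), indent=2))
```

`deletion_allowed` is only a reasoning aid: the real decision is made by IAM policy evaluation, where any matching Deny wins.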

Page 29: Data Storage for the Long Haul: Compliance and Archive


Rich Sutton, VP of Engineering

Digital Risk, Social Media Security, and Compliance

Proofpoint SocialPatrol Archive

Amazon Glacier and Vault Lock Use Case

Page 30: Data Storage for the Long Haul: Compliance and Archive

Proofpoint

• Cloud-based security and compliance for the enterprise: threat research, email, mobile, social, digital risk

• Founded 2002, public in 2012

• $350M annual revenue, $3B market cap

• Huge AWS user

Page 31: Data Storage for the Long Haul: Compliance and Archive

Proofpoint SocialPatrol

Policy controls and enforcement for social

• Combats fraudulent brand impersonation

• Moderates content at scale

• Ensures compliance in publishing

• Integrates with social APIs

• 150+ classifiers using NLP and ML

• Text, links, images, metadata

• Ingesting >1M social posts per day

• Built in AWS

Page 32: Data Storage for the Long Haul: Compliance and Archive

Proofpoint SocialPatrol

How it works:

[Diagram: Social → PFPT in AWS (policy engine; MySQL/C*/Solr) → Enterprise archive]

“Awesome. Help me with retention by integrating with my existing email archive.”

Page 33: Data Storage for the Long Haul: Compliance and Archive

Proofpoint SocialPatrol archiving integration

Imperfect …

• Social != email

• Every archive is different

• Requires internal collaboration

Page 34: Data Storage for the Long Haul: Compliance and Archive

Proofpoint SocialPatrol Archive

SEC Rule 17a-4(f)-compliant archive, purpose-built for social, enabled by Amazon Glacier and Vault Lock

[Diagram: Social → PFPT in AWS (policy engine; MySQL/C*/Solr) → Amazon Glacier & Vault Lock]

Page 35: Data Storage for the Long Haul: Compliance and Archive

Proofpoint SocialPatrol Archive

The customer specifies the retention period in Proofpoint Social.

Page 36: Data Storage for the Long Haul: Compliance and Archive

Proofpoint SocialPatrol Archive

Via the AWS API, we create a vault for that customer.

Page 37: Data Storage for the Long Haul: Compliance and Archive

Proofpoint SocialPatrol Archive

Via the AWS API, we lock the vault and specify a policy that observes a legal hold set via a tag.
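Proofpoint's code isn't shown in the talk. A rough sketch of the create, lock, and tag sequence using method names from boto3's Glacier client (`create_vault`, `initiate_vault_lock`, `complete_vault_lock`, `add_tags_to_vault`, `remove_tags_from_vault`); the vault name, the placeholder policy, and the `LegalHold` tag key are hypothetical, and `accountId="-"` means the account that owns the credentials. `RecordingClient` is a dry-run stand-in so the sequence can be inspected without AWS access; passing `boto3.client("glacier")` instead would make real calls.

```python
import json


class RecordingClient:
    """Dry-run stand-in for boto3.client("glacier"): records calls instead of sending them."""

    def __init__(self) -> None:
        self.calls = []

    def __getattr__(self, name):
        def record(**kwargs):
            self.calls.append((name, kwargs))
            # Mimic the one response field this sketch reads.
            return {"lockId": "dry-run-lock-id"} if name == "initiate_vault_lock" else {}
        return record


def provision_customer_vault(glacier, vault_name: str, policy: dict) -> str:
    """Create a per-customer vault and lock `policy` onto it; returns the lock ID."""
    glacier.create_vault(accountId="-", vaultName=vault_name)
    # Step 1: attach the policy; the lock enters the InProgress state.
    lock_id = glacier.initiate_vault_lock(
        accountId="-", vaultName=vault_name, policy={"Policy": json.dumps(policy)}
    )["lockId"]
    # In practice, test the policy here; completion must happen within 24 hours.
    # Step 2: make the policy immutable.
    glacier.complete_vault_lock(accountId="-", vaultName=vault_name, lockId=lock_id)
    return lock_id


def set_legal_hold(glacier, vault_name: str, on: bool) -> None:
    """Turn the (hypothetical) LegalHold vault tag on or off."""
    if on:
        glacier.add_tags_to_vault(accountId="-", vaultName=vault_name, Tags={"LegalHold": "true"})
    else:
        glacier.remove_tags_from_vault(accountId="-", vaultName=vault_name, TagKeys=["LegalHold"])


client = RecordingClient()
placeholder_policy = {"Version": "2012-10-17", "Statement": []}  # real policy goes here
provision_customer_vault(client, "customer-123-social", placeholder_policy)
print([name for name, _ in client.calls])
```

In practice the policy would combine the 1-year retention and legal-hold controls described earlier.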

Page 38: Data Storage for the Long Haul: Compliance and Archive

Proofpoint SocialPatrol Archive

As social content flows in, we record its purge date and surface that to the user. Each piece of social content is an archive in the vault.
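Recording a purge date can be as simple as adding the retention period to the ingest date. A minimal sketch, assuming the retention period is kept in days (which matches how `glacier:ArchiveAgeInDays` counts) and that `purge_date` is a hypothetical helper:

```python
from datetime import date, timedelta


def purge_date(ingested_on: date, retention_days: int) -> date:
    """First date on which an archive ingested on `ingested_on` may be purged."""
    return ingested_on + timedelta(days=retention_days)


print(purge_date(date(2016, 8, 11), 365))  # 2017-08-11
```

Counting days rather than calendar years matters around leap years: a 365-day retention starting 1 January 2016 ends on 31 December 2016.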

Page 39: Data Storage for the Long Haul: Compliance and Archive

Proofpoint SocialPatrol Archive

Search UI uses the copy of the data we already had. As archives expire, we purge them.
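The purge step can be sketched as a filter over the recorded purge dates. The record type and function names are hypothetical; each returned ID would then be passed to a delete call such as boto3's `delete_archive(accountId="-", vaultName=..., archiveId=...)`, which the locked policy refuses for archives still inside their retention period. Since the legal hold is a vault-level tag, the sketch treats it as a single flag for the whole vault.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchiveRecord:
    archive_id: str  # ID returned by Glacier when the archive was uploaded
    purge_on: date   # purge date recorded at ingest


def archives_to_purge(records: list[ArchiveRecord], today: date, legal_hold: bool) -> list[str]:
    """IDs of archives whose purge date has arrived; nothing is purged under legal hold."""
    if legal_hold:
        return []
    return [r.archive_id for r in records if r.purge_on <= today]


records = [
    ArchiveRecord("archive-1", date(2016, 1, 1)),
    ArchiveRecord("archive-2", date(2030, 1, 1)),
]
print(archives_to_purge(records, date(2017, 1, 1), legal_hold=False))
```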

Page 40: Data Storage for the Long Haul: Compliance and Archive

Proofpoint SocialPatrol Archive

• Legal hold can be put in place by Proofpoint Support

• Data can be exported from Amazon Glacier by Proofpoint Support when necessary

• Amazon Glacier with Vault Lock allowed us to build a product that complies with SEC Rule 17a-4(f) and CFTC Rule 1.31(b)-(c)

What would it have cost for us to build a WORM data store, get it certified, and scale it … ?

Page 41: Data Storage for the Long Haul: Compliance and Archive

Compliance Use Case 2–Auditing and Alerts

Page 42: Data Storage for the Long Haul: Compliance and Archive

Audit logging with AWS CloudTrail

• S3 and Amazon Glacier can log API calls for audit via CloudTrail

• Enable CloudTrail in the AWS console and designate your log bucket

• S3 logs bucket-level activities; object-level activities are supported via event notifications

• Amazon Glacier logs all API calls for vaults and archives
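CloudTrail delivers its logs as gzip-compressed JSON files, each holding a `Records` array whose entries carry fields such as `eventSource`, `eventName`, `eventTime`, and `userIdentity`. A minimal sketch that picks out Glacier events of interest; the sample records are made up and trimmed to those fields, and a real file would be decompressed (for example with `gzip.open`) before parsing.

```python
import json

# Made-up records, trimmed to a few fields of the CloudTrail log-file layout.
SAMPLE_LOG = """{"Records": [
  {"eventSource": "glacier.amazonaws.com", "eventName": "DeleteArchive",
   "eventTime": "2016-08-11T17:00:00Z",
   "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"}},
  {"eventSource": "glacier.amazonaws.com", "eventName": "UploadArchive",
   "eventTime": "2016-08-11T17:05:00Z",
   "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"}},
  {"eventSource": "s3.amazonaws.com", "eventName": "PutBucketPolicy",
   "eventTime": "2016-08-11T17:10:00Z",
   "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"}}
]}"""


def glacier_events(log_text: str, event_names: set[str]) -> list[tuple[str, str, str]]:
    """Return (time, event name, caller ARN) for matching Glacier API calls."""
    records = json.loads(log_text)["Records"]
    return [
        (r["eventTime"], r["eventName"], r.get("userIdentity", {}).get("arn", "unknown"))
        for r in records
        if r.get("eventSource") == "glacier.amazonaws.com" and r.get("eventName") in event_names
    ]


for event in glacier_events(SAMPLE_LOG, {"DeleteArchive", "DeleteVault", "AbortVaultLock"}):
    print(event)
```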

Page 43: Data Storage for the Long Haul: Compliance and Archive

Access policy for a storage container

• Control access to a storage container in a single location

– S3 bucket or Amazon Glacier vault access policy

– Grant/revoke access to internal business units/teams

– “Marketing_Vault” has a distinct access policy from “DevOps_Vault”

• Easily manage cross-account access for your business partner

– Simply add a section for your business partner in the same policy

– Cross-account activities (API calls) also show up in CloudTrail logs
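A sketch of a single vault access policy with one section for an internal team and one for a business partner's account. The ARNs, account IDs, and choice of actions (upload for the team, retrieval-job actions for both) are assumptions for illustration; a policy like this would be attached with the SetVaultAccessPolicy API (boto3: `set_vault_access_policy`).

```python
import json

# Retrieval-related Glacier actions (retrievals run as asynchronous jobs).
READ_ACTIONS = ["glacier:InitiateJob", "glacier:DescribeJob", "glacier:GetJobOutput", "glacier:ListJobs"]


def vault_access_policy(vault_arn: str, team_principal_arn: str, partner_account_id: str) -> str:
    """One policy, two sections: an internal team and a business partner's account."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "internal-team-upload-and-retrieve",
                "Effect": "Allow",
                "Principal": {"AWS": team_principal_arn},
                "Action": ["glacier:UploadArchive"] + READ_ACTIONS,
                "Resource": vault_arn,
            },
            {
                # Cross-account section: the partner can retrieve but not upload.
                "Sid": "partner-retrieve-only",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{partner_account_id}:root"},
                "Action": READ_ACTIONS,
                "Resource": vault_arn,
            },
        ],
    }
    return json.dumps(policy, indent=2)


print(vault_access_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/Marketing_Vault",  # placeholder
    "arn:aws:iam::123456789012:role/marketing-archivists",  # hypothetical role
    "999999999999",  # hypothetical partner account
))
```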

Page 44: Data Storage for the Long Haul: Compliance and Archive

S3 event notifications

[Diagram: S3 events delivered to an Amazon SNS topic, an Amazon SQS queue, or an AWS Lambda function]

• Notifications when objects are created via PUT, POST, Copy, or Multipart Upload, or deleted via DELETE

• Filtering on prefixes and suffixes for all types of notifications

Page 45: Data Storage for the Long Haul: Compliance and Archive

Request specific notifications

Request notifications on specific PUT APIs:

• s3:ObjectCreated:*

• s3:ObjectCreated:Put

• s3:ObjectCreated:Post

• s3:ObjectCreated:Copy

• s3:ObjectCreated:CompleteMultipartUpload

Request notifications on specific DELETE APIs:

• s3:ObjectRemoved:*

• s3:ObjectRemoved:Delete

• s3:ObjectRemoved:DeleteMarkerCreated
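A sketch of a notification configuration that sends delete events for `.pdf` objects under `documents/` to a Lambda function, in the shape boto3's `put_bucket_notification_configuration` accepts, plus a local helper that approximates which events and keys would match. The function ARN, prefix, and suffix are placeholders, and `would_notify` is a hypothetical helper, not an S3 API.

```python
import fnmatch

# Shape of the NotificationConfiguration argument; the Lambda ARN is a placeholder.
NOTIFICATION_CONFIG = {
    "LambdaFunctionConfigurations": [
        {
            "LambdaFunctionArn": "arn:aws:lambda:us-west-2:123456789012:function:audit-deletes",
            "Events": ["s3:ObjectRemoved:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "documents/"},
                {"Name": "suffix", "Value": ".pdf"},
            ]}},
        }
    ]
}


def would_notify(config_entry: dict, event_type: str, key: str) -> bool:
    """Approximate whether an entry matches an event type and object key."""
    # "s3:ObjectRemoved:*" is treated as a wildcard over the event subtype.
    if not any(fnmatch.fnmatchcase(event_type, pattern) for pattern in config_entry["Events"]):
        return False
    for rule in config_entry.get("Filter", {}).get("Key", {}).get("FilterRules", []):
        name, value = rule["Name"].lower(), rule["Value"]
        if name == "prefix" and not key.startswith(value):
            return False
        if name == "suffix" and not key.endswith(value):
            return False
    return True


entry = NOTIFICATION_CONFIG["LambdaFunctionConfigurations"][0]
print(would_notify(entry, "s3:ObjectRemoved:Delete", "documents/q3.pdf"))
```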

Page 46: Data Storage for the Long Haul: Compliance and Archive

Compliance Use Case 3–Geographic Redundancy

Page 47: Data Storage for the Long Haul: Compliance and Archive

S3 cross-region replication: Automated, fast, and reliable asynchronous replication of data across AWS regions

• Secure: remote replicas managed by separate AWS accounts

• Lower latency: distribute data to regional customers

• Compliance: store copies hundreds of miles apart

Page 48: Data Storage for the Long Haul: Compliance and Archive

Cross-region replication: Details

Cost

• Usual charges for storage, requests, and inter-region data transfer for the replicated copy of data

• Replicate into Standard-IA or Amazon Glacier

Replication status

• HEAD operation on a source object to determine replication status

• Replicated objects will not be re-replicated

• Use S3 COPY to replicate existing objects

Delete operation

• DELETE without object version ID: delete marker replicated

• DELETE of a specific object version ID: deletion NOT replicated

Access control

• Object ACL updates are replicated

• Objects encrypted with Amazon-managed encryption keys (SSE-S3) are replicated

• Objects encrypted with AWS KMS are not replicated
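Replication status is reported on HEAD as the `x-amz-replication-status` header (boto3's `head_object` returns it as `ReplicationStatus`). A small sketch that interprets the header from a dictionary of response headers; the helper name and messages are illustrative.

```python
# Values of the x-amz-replication-status header.
REPLICATION_STATUS = {
    "PENDING": "source object is waiting to be replicated",
    "COMPLETED": "source object has been replicated",
    "FAILED": "replication of the source object failed",
    "REPLICA": "this object is a replica in the destination bucket",
}


def describe_replication(head_headers: dict) -> str:
    """Interpret replication status from HEAD response headers (names are case-insensitive)."""
    headers = {name.lower(): value for name, value in head_headers.items()}
    status = headers.get("x-amz-replication-status")
    if status is None:
        return "no replication status (for example, written before replication was enabled)"
    return REPLICATION_STATUS.get(status, f"unrecognized status {status!r}")


print(describe_replication({"x-amz-replication-status": "PENDING"}))
```

The REPLICA value is also why replicated objects are not re-replicated: a replica is never treated as a new source object.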

Page 49: Data Storage for the Long Haul: Compliance and Archive

Versioning with cross-region replication

[Diagram: versions Vid1-v1 through Vid1-v4 in bucket A (Key: A/vid1) are replicated version by version to bucket B (Key: B/vid1)]

Page 50: Data Storage for the Long Haul: Compliance and Archive

Cross-region replication with lifecycle archiving

[Diagram: S3 bucket A, S3 bucket B, and Amazon Glacier, combining cross-region replication with lifecycle archiving]

Page 51: Data Storage for the Long Haul: Compliance and Archive

Data ingestion into AWS storage services

• Snowball: accelerate PBs with AWS-provided appliances; NEW 80 TB model

• Storage Gateway: instant hybrid cloud; up to 120 MB/s cloud upload rate (4x improvement)

• Firehose: ingest data streams directly into AWS data stores

• Direct Connect: colocation facility (colo) to AWS

• ISV connectors: Commvault, Veritas, etc.

• NEW S3 Transfer Acceleration: accelerate object transfers by up to 300% using AWS’s private network
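A quick way to see why bulk transfer options matter is to compute how long a sustained network rate takes. A sketch assuming decimal units (1 TB = 1,000,000 MB) and that the quoted 120 MB/s Storage Gateway upload rate is sustained end to end, which real links rarely achieve:

```python
def transfer_hours(terabytes: float, megabytes_per_second: float) -> float:
    """Hours needed to move `terabytes` at a sustained rate, in decimal units."""
    total_megabytes = terabytes * 1_000_000  # 1 TB = 1,000,000 MB
    return total_megabytes / megabytes_per_second / 3600


for tb in (1, 1000):
    hours = transfer_hours(tb, 120)  # the slide's 120 MB/s Storage Gateway upload rate
    print(f"{tb} TB: {hours:,.1f} hours ({hours / 24:,.1f} days)")
```

At 120 MB/s, 1 TB takes about 2.3 hours, and 1 PB (1,000 TB) about 2,315 hours, or roughly 96 days.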

Page 52: Data Storage for the Long Haul: Compliance and Archive

What is Snowball? Petabyte-scale data transport

• E-ink shipping label

• Ruggedized case (“8.5G impact”)

• All data encrypted end-to-end

• 50 TB or 80 TB

• 10G network

• Rain- and dust-resistant

• Tamper-resistant case and electronics

Page 53: Data Storage for the Long Haul: Compliance and Archive

Pricing

Dimension | Price
Usage charge per job | $250.00
Extra day charge (first 10 days* are free) | $15.00 per day
Data transfer in | $0.00/GB
Data transfer out | $0.02/GB
Shipping** | Varies
Amazon S3 charges | Standard storage and request fees apply

* Starts one day after the appliance is delivered to you. The first day the appliance is received at your site and the last day the appliance is shipped out are also free and not included in the 10-day free usage time.

** Shipping charges are based on your shipment destination and the shipping option (e.g., overnight, 2-day) you choose.

Transfer 1 PB with 13 devices in parallel in 1 week!
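The 1 PB example can be checked with a little arithmetic. A sketch using the slide's figures ($250 per job, $15 per extra day after 10 free days, no charge for data transfer in), assuming one job per appliance, the same number of days on site for every appliance, and the nominal 80 TB capacity the slide uses (usable capacity per appliance is somewhat lower, which could raise the count). Shipping and S3 storage and request charges are excluded; the function names are illustrative.

```python
import math

JOB_FEE = 250.00       # usage charge per job (slide)
EXTRA_DAY_FEE = 15.00  # per day beyond the free period (slide)
FREE_DAYS = 10


def devices_needed(total_tb: float, device_tb: float = 80) -> int:
    """Appliances needed at nominal capacity."""
    return math.ceil(total_tb / device_tb)


def import_cost(total_tb: float, days_on_site: int, device_tb: float = 80) -> float:
    """Service fees for an import, assuming one job per appliance.

    Data transfer in is $0.00/GB; shipping and S3 charges are not included.
    """
    n = devices_needed(total_tb, device_tb)
    extra_days = max(0, days_on_site - FREE_DAYS)
    return n * (JOB_FEE + extra_days * EXTRA_DAY_FEE)


print(devices_needed(1000), import_cost(1000, 7))
```

`devices_needed(1000)` returns 13, matching the slide, and an import that keeps all 13 appliances for a week costs 13 × $250 = $3,250 in service fees.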

Page 54: Data Storage for the Long Haul: Compliance and Archive

Remember to complete your evaluations!

Page 55: Data Storage for the Long Haul: Compliance and Archive

Thank you!