AWS re:Invent 2016: Deep Dive on Amazon Elastic Block Store (STG301)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Rob Alexander, AWS Principal Solutions Architect

November 30, 2016

Deep Dive on Amazon

Elastic Block Store

STG301

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Who: Lead Software Development Engineers, Architects, and Technical PMs

Where: Storage Booth Walk-up Bar

When: Exhibit hours (Tues 5-7pm, Wed & Thurs 10:30a-6:00p)

What: Architecture best practices, code reviews, feature requests

Storage “Office Hours”Meet the People who Build AWS Storage

Storage service options

Amazon

Elastic Block Store

Amazon

Elastic File SystemAmazon S3

Block File Object

AWS

block storage

offerings

EC2

instance

store

What is Amazon EC2 instance store?

EC2 instances • Local to instance

• Non-persistent data store

• Data not replicated (by default)

• No snapshot support

• SSD or HDD

Physical Host

Instance Store

or

AWS

block storage

offerings

EC2

instance

store

sc1st1

io1gp2

EBS

SSD-backed

volumes

EBS

HDD-backed

volumes

What is EBS?

EBS

volume

EC2

instance

• Block storage as a service

• Create, attach volumes through an API

• Service accessed over the network

What is EBS?

EBS

volume

EC2

instance

!=

What is EBS?

EBS

volume

EC2

instance

What is EBS?

EBS

volume

Availability Zone

AWS Region

EC2

instance

What is EBS?

EBS

volume

EC2

instance

EC2

instance

• Volumes persist

independent of EC2

• Detach and attach between

instances

• Volume and instance must

be in the same AZ

Availability Zone

AWS Region

What is EBS?

EBS

boot

volume

Availability Zone

AWS Region

EC2

instance

EBS

data

volume

EBS

data

volume

• Volumes attach to one

instance at a time

• Many volumes can attach

to an instance

• Separate boot volume from

data volumes

What is EBS?

EBS

volume

Availability Zone Availability Zone

AWS Region

Replica

EBS is designed for:

What is EBS?

99.999% service availability

0.1% to 0.2% annual failure rate (AFR)

What is an EBS snapshot?

EBS

volume

Availability Zone

AWS Region

Amazon

S3EBS snapshot

Availability Zone

Replica

How does an EBS snapshot work?

EBS

volume

• Point-in-time backup of modified volume blocks

• Stored in S3, accessed via EBS APIs

• Subsequent snapshots are incremental

• Deleting snapshot will only remove data

exclusive to that snapshot

EBS

snapshot

What can you do with a snapshot?

EBS

volume

Availability Zone

AWS Region

EC2 instance

EBS snapshot

AMI


EBS

volume

Availability Zone

AWS Region

Amazon

S3EBS snapshot

Availability Zone

EBS

volume

Replica Replica


EBS

volume

Availability Zone

AWS Region

Amazon

S3EBS snapshot

EBS

volume

Availability Zone

AWS Region

EBS snapshot

Replica Replica


AWS Region

Public datasets on

AWS available as

EBS snapshots:

Availability Zone

EBS

volume

https://aws.amazon.com/public-data-sets/

• Genomic

• Census

• Global weather

• Transportation

Replica

What is an EBS-optimized instance?

EBS

volume

Availability Zone

AWS Region

EBS-optimized

EC2 instance


EBS

EC2

instances

Internet

Databases

~ 125 MB/s

S3

Shared

c3.2xlarge


EBS

EC2

instances InternetDatabasesc3.2xlarge

S3

~ 125 MB/s

Shared


More details:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html

• Dedicated network bandwidth for EBS I/O

• Enabled by default on c4, d2, m4, p2, and x1 instances

• Can be enabled at instance launch or on a running instance

• Not an option on some 10 Gbps instance types

(c3.8xlarge, r3.8xlarge, i2.8xlarge)

What is EBS encryption?

Encryption

• Attach both encrypted and unencrypted

• No volume performance impact

• Any current generation instance

• Supported by all EBS volume types

• Snapshots also encrypted

• No extra cost

• Boot and data volumes can be encrypted

EBS volume types

EBS volume types

Hard disk driveSolid state drive

EBS volume types

General Purpose

SSD

gp2

Provisioned IOPS

SSD

io1

Throughput Optimized

HDD

st1

Cold

HDD

sc1

SSD HDD

Throughput?

orIOPS

Choosing an EBS volume type

What is more important to your workload:

Don’t know

Choosing an EBS volume

your workload

yet?

EBS volume types: I/O Provisioned

General Purpose SSD

gp2

Throughput: 160 MB/s

Latency: Single-digit ms

Capacity: 1 GB to 16 TB

Baseline: 3 IOPS per GB up to 10,000

Burst: 3,000 IOPS (for volumes up to 1 TB)

Great for boot volumes, low-latency applications, and bursty databases

Burst & baseline: General Purpose SSD (gp2)IO

PS

0 1 16

1,000

2,000

3,000

8,000

10,000

BASELINE IOPS(Baseline of 3 IOPS/GB)

Burstable to

3,000 IOPS

3 90.5

Volume size (TB)

~ 3334 GB

100 IOPS

Minimum

300 GB volume

Burst bucket: General Purpose SSD (gp2)

Max I/O credit per bucket is 5.4M

You can spend up to

3000 IOPS per second

Baseline performance = 3 IOPS per GiB or 100 IOPS

Always accumulating

3 IOPS per GiB per second

gp2

How long can I burst on gp2?

0

100

200

300

400

500

600

700

1 8 30 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800 850 900 950

Min

ute

s o

f B

urs

t

Volume size in GB

43 min 1 hour

10 hours

How do I monitor gp2 burst balance?

VolumeWriteOpsBurstBalance500 GB gp2 volume

900,000

write IOs

over 5 min =

3000 IOPS

450,000

write IOs

over 5 min =

1500 IOPS

Throughput?

orIOPS


What is more important to your workload:

i2

gp2 io1


Latency ?

< 1 ms Single-digit ms

Which is more important ?

Cost Performance

IOPS

≤ 65,000> 65,000

is more important

EBS volume types: I/O Provisioned

Provisioned IOPS SSD

io1

Baseline: 100 to 20,000 IOPS

Throughput: 320 MB/s

Latency: Single-digit ms


Ideal for critical applications and databases with sustained IOPS

Scaling Provisioned IOPS SSD (io1)IO

PS

0 2 16

1,000

5,000

10,000

15,000

20,000

6 90.4

MAX PROVISIONED IOPS(Maximum IOPS:GB ratio of 50:1)

Available Provisioned IOPS

Volume Size (TB)

~ 400 GB

i2

gp2 io1


Latency ?

< 1 ms Single-digit ms


Cost Performance

IOPS

≤ 65,000> 65,000

is more important

Throughput?

Throughputis more important

Small, random I/O Large, sequential I/O

i2

gp2 io1 st1

d2


Latency ?

< 1 ms Single-digit ms ≤ 1,250 MB/s

Aggregate throughput?

> 1,250 MB/s


Cost Performance

IOPS

≤ 65,000> 65,000

is more important


Cost Performance

EBS volume types: Throughput Provisioned

Throughput

Optimized HDD

st1

Baseline: 40 MB/s per TB up to 500 MB/s


Burst: 250 MB/s per TB up to 500 MB/s

Ideal for large-block, high-throughput sequential workloads

Throughput Optimized HDD – burst and base

0

100

200

300

400

500

600

0.5 1 2 4 6 8 10 12 14 16

Thro

ug

hput in

MB

/s

Volume Size in TB

Burst Base

320

ST1

Burst bucket: Throughput Optimized HDD (st1)

Max I/O bucket credit is 1 TB of

credit per TB in volume

You can spend up to

250 MB/s per TB

Baseline performance = 40 MB/s per TB

Always accumulating 40 MB/s per TB

st1

Up to 8 TB in I/O credit

Always accumulating 320 MB/s

You can spend up

to 500 MB/s

Burst bucket: example 8 TB st1 volume

Baseline performance = 320 MB/s

st1


Small, random I//O Large, sequential I/O

Which is more important?

Latency?

i2

gp2 io1 sc1 st1

d2


IOPS

≤ 65,000> 65,000



> 1,250 MB/s

is more important

Cost Performance


Cost Performance

Cold HDD

sc1

EBS volume types: Throughput Provisioned

Baseline: 12 MB/s per TB up to 192 MB/s


Burst: 80 MB/s per TB up to 250 MB/s

Ideal for sequential throughput workloads, such as logging and backup

Cold HDD – burst and base

0

50

100

150

200

250

300

0.5 1 2 4 6 8 10 12 14 16

Thro

ug

hput in

MB

/s

Volume size in TB

Burst Base

192

SC1

Burst bucket: Cold HDD (sc1)

Max I/O bucket credit is 1 TB of

credit per TB in volume

You can spend up to 80

MB/s per TB

Baseline performance = 12 MB/s per TB

Always accumulating 12 MB/s per TB

sc1


Small, random I/O Large, sequential I/O


Latency?

i2

gp2 io1 sc1 st1

d2


IOPS

≤ 65,000> 65,000



> 1,250 MB/s

is more important

Cost Performance


Cost Performance

I/O Provisioned Volumes Throughput Provisioned Volumes

sc1st1io1gp2

$0.10 per GB $0.125 per GB

$0.065 per PIOPS

* All prices are per month, and from the us-west-2 Region as of April 2016

$0.045 per GB $0.025 per GB

Snapshot storage for all volume types is $0.05 per GB per month

Hybrid volume use cases

c4

gp2

st1

STG205Case Study:

Librato’s Experience Running

Cassandra Using Amazon EBS

Data files

Commit log

i2


gp2 st1

STG311Case Study:

How Videology and Zendesk

Modernized Their Big Data Platforms on

Amazon EBS

Hot data

0–7 DaysWarm data

8–30 days

sc1

Cold data

31–60 days

Tiered Elasticsearch data:


st1

Case Study:

Info: https://aws.amazon.com/solutions/case-

studies/infor-ebs/

Transaction logs

“We’ve seen much stronger performance for our database

backup workloads with the Amazon EBS ST1 volumes, and

we’re also saving 75 percent on our monthly backup costs.”

Randy Young, Director of Cloud Operations, Infor

i2st1

Full backups

st1

Partial backups

SQL Server

Database

EBS

snapshots


gp2

st1

Amazon EMR Apache Hadoop

Example

Frameworks on YARN

HDFS

sc1

EMR cluster

instance

• Random, small I/O

• Shuffle, spill, and temp operations

• Large, sequential I/O

• Multiple volumes for more parallelism

or

Hadoop with multiple EBS volume types

[{

"classification":"yarn-site","properties":{

"yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage":"99.9","yarn.nodemanager.local-dirs":"/mnt/yarn,/mnt1/yarn"

}},{

"classification":"core-site","properties":{

"fs.s3.buffer.dir":"/mnt/s3,/mnt1/s3"}

},{

"classification":"hdfs-site","properties":{

"dfs.namenode.name.dir":"file:///mnt2/namenode,file:///mnt3/namenode,file:///mnt4/namenode","dfs.name.dir":"/mnt2/namenode,/mnt3/namenode,/mnt4/namenode","dfs.data.dir":"/mnt2/hdfs,/mnt3/hdfs,/mnt4/hdfs","dfs.datanode.data.dir":"file:///mnt2/hdfs,file:///mnt3/hdfs,file:///mnt4/hdfs"

}},{

"classification":"mapred-site","properties":{

"mapred.local.dir":"/mnt/mapred,/mnt1/mapred","mapreduce.cluster.local.dir":"/mnt/mapred,/mnt1/mapred"

}}

]

GP2 = mnt, mnt1 ST1 / SC1 = mnt2, mnt3, mnt4

EBS deep dive

EBS deep dive

Performance

How do we count I/Os for GP2 and IO1?

When possible, we merge sequential I/Os (up to 256 KB in

size)

...To minimize I/O charges on IO1

and maximize burst on GP2


Example 1: Random I/Os

• 4 random I/Os (i.e., non sequential I/Os)

• Each I/O 64 KB

Up to 256 KBEC2

instanceEBS

Counted as 4 I/Os


Example 2: Sequential I/O

• 4 sequential I/Os

• Each I/O 64 KB

Up to 256 KBEC2

instanceEBS

Counted as 1 I/O


Example 3: Large I/O

• 1 I/O

• 1024 KB

Up to 256 KBEC2

instanceEBS

Counted as 4 I/Os

How do we count I/Os for ST1 and SC1?

• When possible, we merge sequential I/Os (up to 1 MB in size)

• Workloads with primarily large, sequential I/Os perform best on

ST1 and SC1

• Ex: Big Data/EMR, Hadoop, Kafka, Log Processing, Data

Warehouses


Example 1: Random I/Os

• 4 random I/Os

• Each I/O 64 KB

Up to 1024 KBEC2

instanceEBS

Counted as 4 I/Os or 4 MB/s of burst


Example 2: Sequential I/O

• 4 sequential I/Os

• Each I/O 1024 KB

Up to 1024 KBEC2

instanceEBS

Counted as 4 I/Os or 4 MB/s of burst


Example 3: Mixed I/O

• 2 * 512 KB sequential I/Os

• 2 * 64 KB random I/Os

• 2 * 128 KB sequential I/Os

Up to 1024 KBEC2

instanceEBS

Counted as 4 I/Os or 4 MB/s of burst (but only ~ 1.4 MB of data transferred)

Burst balance for ST1 and SC1

0

20

40

60

80

100

120

0 1 2 3 4 5 6 7 8 9 10

Burs

t B

ala

nce %

Time in Hours

1 MB Sequential 16 KB Random

4 TB ST1 volume

1 MB Sequential:

500 MB/s for 3 hours

16 KB Random:

8 MB/s for 3 hours

Burst balance for ST1 and SC1

4 TB ST1 Volume

0

1000

2000

3000

4000

5000

6000

Data Transferred in GB

1 MB Sequential 16 KB Random

1 MB Sequential:

5.4 TB transferred

16 KB Random:

87 GB transferred

2046 sectors x 512 bytes/sector = ~1024 KiB

$ iostat –xm

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util

xvdf 0.00 0.20 0.00 523.40 0.00 523.00 2046.44 3.99 7.62 1.61 100.00

Verify workload I/O patterns

iostat for Linux

perfmon for Windows

128 KiB

Verify ST1 & SC1 workloads

Amazon

CloudWatch

Console

Under

64 KiB?


Small or

random IOPS

likely

CloudWatch

Console

Stuck

around

44 KiB?


CloudWatch

Console

Upgrade your Linux

kernel to at least 3.8

Responses

RequestsInstance

Ring queue

userspace process

kernel

request

queuenoop

deadline

cfq

scheduler

I/O requests in a Linux virtual world

I/O Driver Domain

Hypervisor

I/O requests in a Linux virtual world: 3.8+ kernel

Instance

EBS

userspace process

kernel

request

queue

scheduler

noop

deadline

cfq

pre 3.8:

44 KB

post 3.8:

128 KB to

1024 KB

I/O Driver Domain

Hypervisor

Up to 32 requests in queue

I/O requests in a Linux virtual world: 4.2+ kernel

Instance

EBS

userspace process

kernel

request

queue

per core

blk-mq

pre 3.8:

44 KB

post 3.8:

128 KB to

1024 KB

I/O Driver Domain

Hypervisor

Up to 32 requests in queue

ST1 & SC1: Linux performance tuning

Increase maximum request size:

• Recommended for ST1, SC1 on a 4.2+ Linux kernel

• Memory allocated per device

• Default is 32, max for EC2 is 256

For example with GRUB’s /boot/grub/menu.lst configuration:

kernel /boot/vmlinuz-4.4.5-15.26.amzn1.x86_64 root=LABEL=/ console=ttyS0 xen_blkfront.max=256

Verify setting:

/sys/module/xen_blkfront/parameters/max

• OS boot command line configuration

ST1 & SC1: Linux performance tuning

Increase read-ahead buffer:

• Recommended for high-throughput read workloads

• Per device configuration

• Default is 128 KiB (256 sectors) for Amazon Linux

• Smaller or random I/O will degrade performance

For example:

$ sudo blockdev –setra 2048 /dev/xvdf

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html

IOPS vs. throughput

20,000 IOPS

16 KB

320 MB/s

10,000 IOPS

64 KB

640 MB/s

10,000 IOPS

32 KB

320 MB/s

I/O request size

1,250 IOPS

256 KB

320 MB/s

Example:

io1 volume

20,000 PIOPS

Performance: EBS-optimized

c4.large

Dedicated to EBS

500 Mbps ~ 62.5 MB/s

2 TB GP2 volume:6,000 IOPS

160 MB/s max throughput4,000 16K IOPS

c4.2xlarge

Dedicated to EBS

1 Gbps ~ 125 MB/s

8,000 16K IOPS

2 TB GP2 volume:6,000 IOPS

160 MB/s max throughput

Performance: throughput workload

EC2

instances InternetDatabases

m4.16xlarge

S3

10 Gbps ~ 1250 MB/s

EBS-optimized

10 Gbps

~1,250 MB/s

65,000 16K IOPS

8 TB ST1 volume:320 MB/s base

500 MB/s burst

Shared

Performance: throughput workload

EC2

instances InternetDatabases

m4.16xlarge

S3

3 * 8 TB ST1 volume:960 MB/s base

1,250 MB/s burstRAID0

10 Gbps ~ 1250 MB/s

Shared

EBS-optimized

10 Gbps

~1,250 MB/s

65,000 16K IOPS

Best practice: RAID

When to RAID?

• Storage requirement > 16 TB

• Throughput requirement > 500 MB/s

• IOPS requirement > 20,000 @ 16K

Best practice: RAID

EBS

volume

Availability Zone

AWS Region

EC2

instance

EBS

volume

RAID0RAID10

Replica Replica

Best practice: RAID

Avoid RAID for redundancy

• EBS data is already replicated

• RAID5/6 loses 20 – 30% of usable I/O to parity

• RAID1 halves available EBS bandwidth

EBS deep dive

Performance

Reliability

What about EC2 instance failure?

Availability Zone

AWS Region

EBS

volume

EC2

instance

Replica

What about EC2 instance failure?

Availability Zone

AWS Region

EBS

volume

New

EC2

instance

Replica

EBS enables EC2 auto recovery

RECOVER Instance

Instance ID

Instance metadata

Private IP addresses

Elastic IP addresses

EBS volume attachments

Instance retains:

* Supported on C3, C4, M3, M4, P2, R3, T2, and X1 instance types with EBS-only storage

StatusCheckFailed_System

Amazon CloudWatch

per-instance metric alarm:

When alarm triggers?

What about EC2 instance termination?

Availability Zone

EBS

volume

EC2

instance

DeleteOnTermination = True

DeleteOnTermination = False

AWS Region

Replica

Best practice: taking snapshots from Linux

Quiesce I/O

1. Database: FLUSH and LOCK tables

2. Filesystem: sync and fsfreeze

3. EBS: snapshot all volumes

4. When CreateSnapshot API returns

success, it is safe to resume

Best practice: taking snapshots from Windows

1. sync equivalent available

2. Use Volume Shadow Copy Service-

(VSS) aware utilities for backups

3. EBS: backups on dedicated volume

for snapshots

Best practice: taking EBS snapshots from Windows

EBS

boot

volume

Windows

EC2

instance

EBS

data

volume

EBS

backup

volume

Windows Server Backup

EBS snapshot

EBS volume initialization

New EBS volume? New EBS volume from snapshot?

• Attach and it’s ready to go • Initialize for best performance

• Random read across volume

Best practice: EBS volume initialization

$ sudo yum install –y fio

$ sudo fio --filename=/dev/xvdf --rw=randread --bs=128k --iodepth=32

--ioengine=libaio --direct=1 --name=volume-initialize

Fio-based example:

Best practice: automate snapshots

Key ingredients:

AWS Lambda Amazon EC2

Run commandTagging

https://aws.amazon.com/ec2/run-command/

Best practice: automate snapshots

Lambda

scheduled event:

daily snapshots

EC2

instances

Backup

Retention30 days

Search for instances

tagged “Backup”

EC2 Run commands to

quiesce file systemSnapshot attached

volumes

Tag snapshots with

expire date

1. 2. 3. 4.

Best practice: automate snapshot expiration

Lambda

scheduled event:

daily expire

Search for snapshots

tagged to “Expire On”

today

Delete expired

snapshots

1. 2.

EBS

snapshots

BackupExpireOn

Date

https://github.com/dR0ski/lambda-ebs-snapshot-custodian

Prototype for EBS Snapshot Custodian:

EBS best practices

Performance

Reliability

Security

Best practice: encryption

EBS encryption:

data volumes


Create a new AWS KMS master key for EBS

• Define key rotation policy

• Enable AWS CloudTrail auditing

• Control who can use key

• Control who can administer key


EBS encryption:

data volumes

How does EBS encryption work?

EBS volume 1

EBS

master

key

KMS

Data key 1

Data key 2

Data key 3Envelope encryption

EBS volume 2

EBS volume 3


EBS volume 1

EBS

master

key

AWS KMS

Envelope encryption

EBS volume 2

EBS volume 3


EBS volume 1

EBS

master

key

KMS

Envelope encryption

• Limits exposure risk

• Performance

• Simplifies key management

EBS volume 2

EBS volume 3

Summary

Use encryption if

you need it

Take snapshots,

tag snapshotsSelect the right

instance for your

workload

Select the right

volume for your

workload

Thank you!

Remember to complete

your evaluations!

AWS re:Invent 2016: Deep Dive on Amazon Elastic Block Store (STG301)

Technology

Transcript of AWS re:Invent 2016: Deep Dive on Amazon Elastic Block Store (STG301)