Deep Dive on Delivering Amazon EC2 Instance Performance


Page 1: Deep Dive on Delivering Amazon EC2 Instance Performance

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

John Phillips, Sr. Manager, Amazon EC2

April 2016

Deep Dive on Delivering

Amazon EC2 Instance Performance

Page 2: Deep Dive on Delivering Amazon EC2 Instance Performance

Amazon Elastic Compute Cloud is Big

[Diagram: EC2 surrounded by Instances, API, Networking, and Purchase options]

Page 3: Deep Dive on Delivering Amazon EC2 Instance Performance

Amazon EC2 Instances

[Diagram: a host server runs a hypervisor, which hosts Guest 1, Guest 2, through Guest n]

Page 4: Deep Dive on Delivering Amazon EC2 Instance Performance

Amazon EC2 Instances History

[Timeline, 2006 to 2016, showing the instance types m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge, m2.xlarge, m2.4xlarge, m2.2xlarge, cc1.4xlarge, t1.micro, cg1.4xlarge, cc2.8xlarge, m1.medium, hi1.4xlarge, m3.xlarge, m3.2xlarge, hs1.8xlarge, cr1.8xlarge, c3.large, c3.xlarge, c3.2xlarge, c3.4xlarge, c3.8xlarge, g2.2xlarge, i2.xlarge, i2.2xlarge, i2.4xlarge, i2.8xlarge, m3.medium, m3.large, r3.large, r3.xlarge, r3.2xlarge, r3.4xlarge, r3.8xlarge, t2.micro, t2.small, t2.medium, c4.large, c4.xlarge, c4.2xlarge, c4.4xlarge, c4.8xlarge, d2.xlarge, d2.2xlarge, d2.4xlarge, d2.8xlarge, g2.8xlarge, t2.large, m4.large, m4.xlarge, m4.2xlarge, m4.4xlarge, m4.10xlarge]

Page 5: Deep Dive on Delivering Amazon EC2 Instance Performance

What to Expect from the Session

• Defining system performance and how it is characterized for different workloads

• How Amazon EC2 instances deliver performance while providing flexibility and agility

• How to make the most of your EC2 instance experience through the lens of several instance types

Page 6: Deep Dive on Delivering Amazon EC2 Instance Performance

Defining Performance

Page 7: Deep Dive on Delivering Amazon EC2 Instance Performance

Hiring a Server

• Servers are hired to do jobs

• Performance is measured differently depending on the job

Page 8: Deep Dive on Delivering Amazon EC2 Instance Performance

Defining Performance: Perspective Matters

• What performance means depends on your perspective:

– Response time

– Throughput

– Consistency

[Diagram: the workload applied to a stack of application, system libraries, system calls, kernel, and devices]

Page 9: Deep Dive on Delivering Amazon EC2 Instance Performance

Performance Factors

Resource | Performance factors | Key indicators
CPU | Sockets, number of cores, clock frequency, bursting capability | CPU utilization, run queue length
Memory | Memory capacity | Free memory, anonymous paging, thread swapping
Network interface | Max bandwidth, packet rate | Receive throughput, transmit throughput over max bandwidth
Disks | Input/output operations per second, throughput | Wait queue length, device utilization, device errors
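Several of the key indicators in this table can be sampled from inside a Linux guest. The sketch below is a minimal example, assuming the standard /proc/loadavg and /proc/meminfo formats; the helper names are hypothetical. The runnable count from /proc/loadavg is only a rough proxy for run queue length, since it includes the reading process itself.

```python
# Sketch: sample run-queue and memory indicators inside a Linux guest.
# Assumes the standard /proc/loadavg and /proc/meminfo formats; the helper
# names are hypothetical, not part of any library.
import os


def parse_loadavg(text):
    """Parse /proc/loadavg text such as '0.20 0.18 0.12 1/80 11206'."""
    fields = text.split()
    load1, load5, load15 = (float(f) for f in fields[:3])
    # Fourth field: currently runnable scheduling entities / total entities.
    runnable, total = (int(n) for n in fields[3].split("/"))
    return {"load1": load1, "load5": load5, "load15": load15,
            "runnable": runnable, "tasks": total}


def parse_meminfo(text):
    """Parse /proc/meminfo lines such as 'MemFree:   123456 kB' (values in kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def read_indicators(proc="/proc"):
    """Run-queue and memory indicators, or None if /proc cannot be read."""
    try:
        with open(os.path.join(proc, "loadavg")) as f:
            load = parse_loadavg(f.read())
        with open(os.path.join(proc, "meminfo")) as f:
            mem = parse_meminfo(f.read())
    except OSError:
        return None
    return {
        "runnable": load["runnable"],
        "load1": load["load1"],
        "mem_free_kb": mem.get("MemFree"),
        "mem_available_kb": mem.get("MemAvailable"),
        "swap_used_kb": mem.get("SwapTotal", 0) - mem.get("SwapFree", 0),
    }


if __name__ == "__main__":
    print(read_indicators())
```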

Page 10: Deep Dive on Delivering Amazon EC2 Instance Performance

Resource Utilization

• For a given level of performance, how efficiently are resources being used?

• A resource at 100% utilization can't accept any more work

• Low utilization can indicate that more resources are being purchased than needed

Page 11: Deep Dive on Delivering Amazon EC2 Instance Performance

Example: Web Application

• MediaWiki installed on Apache with 140 pages of content

• Load increased in intervals over time

Page 12: Deep Dive on Delivering Amazon EC2 Instance Performance

Example: Web Application

• Memory stats

Page 13: Deep Dive on Delivering Amazon EC2 Instance Performance

Example: Web Application

• Disk stats

Page 14: Deep Dive on Delivering Amazon EC2 Instance Performance

Example: Web Application

• Network stats

Page 15: Deep Dive on Delivering Amazon EC2 Instance Performance

Example: Web Application

• CPU stats

Page 16: Deep Dive on Delivering Amazon EC2 Instance Performance

Instance Selection = Performance Tuning

• Picking an instance is tantamount to resource performance tuning

• Give back instances as easily as you can acquire new ones

• Find an ideal instance type and workload combination

Page 17: Deep Dive on Delivering Amazon EC2 Instance Performance

Delivering Compute Performance with Amazon EC2 Instances

Page 18: Deep Dive on Delivering Amazon EC2 Instance Performance

CPU Instructions and Protection Levels

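• The CPU has at least two protection levels (user mode and kernel mode).

• Privileged instructions can't be executed in user mode, which protects the system. Applications make system calls to the kernel to perform privileged operations.

[Diagram: an application running in user mode above the kernel]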

Page 19: Deep Dive on Delivering Amazon EC2 Instance Performance

Example: Web application system calls

Page 20: Deep Dive on Delivering Amazon EC2 Instance Performance

x86 CPU Virtualization: Prior to Intel VT-x

• Binary translation for privileged instructions

• Para-virtualization (PV)

• PV requires going through the VMM, adding latency

• Applications that are system call-bound are most affected

[Diagram: application, kernel, and VMM layers, labeled PV]
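To see why system call-bound applications feel virtualization overhead most, it helps to measure what one user-to-kernel round trip costs on a given instance. A minimal sketch, assuming Linux, where os.getppid() issues a real getppid system call on every invocation while len() never leaves user mode. The timings include Python interpreter overhead, so compare them between instances (for example a PV and an HVM instance) rather than reading them as absolute costs.

```python
# Sketch: compare a user-mode call with a system call (Linux assumed).
# os.getppid() enters the kernel on every call; len() stays in user mode.
# Timings include interpreter overhead; use them for relative comparison.
import os
import time


def ns_per_call(fn, iterations=100_000):
    """Average wall-clock nanoseconds per call of fn()."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    return (time.perf_counter_ns() - start) / iterations


if __name__ == "__main__":
    t = (1, 2, 3)
    user = ns_per_call(lambda: len(t))
    kernel = ns_per_call(os.getppid)
    print(f"user-mode call: {user:.0f} ns, getppid syscall: {kernel:.0f} ns")
```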

Page 21: Deep Dive on Delivering Amazon EC2 Instance Performance

x86 CPU Virtualization: After Intel VT-x

• Hardware-assisted virtualization (HVM)

• PV-HVM uses PV drivers opportunistically for operations that are slow when emulated, e.g., network and block I/O

[Diagram: application, kernel, and VMM layers, labeled PV-HVM]

Page 22: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: Use PV-HVM AMIs with EBS
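One way to apply this tip is to filter image metadata for the hvm virtualization type and an EBS root device. A minimal sketch, assuming the dictionaries have the shape of the Images list returned by boto3's ec2.describe_images(); the image IDs are placeholders. The AMI attribute only shows that an image uses HVM; whether the guest kernel also loads PV drivers depends on the OS inside it.

```python
# Sketch: keep only HVM, EBS-backed images from DescribeImages-style data.
# VirtualizationType is "hvm" or "paravirtual"; RootDeviceType is "ebs" or
# "instance-store". Image IDs below are placeholders.


def hvm_ebs_images(images):
    """Return the images that are hardware-virtualized and EBS-backed."""
    return [
        img for img in images
        if img.get("VirtualizationType") == "hvm"
        and img.get("RootDeviceType") == "ebs"
    ]


if __name__ == "__main__":
    sample = [
        {"ImageId": "ami-example1", "VirtualizationType": "paravirtual", "RootDeviceType": "ebs"},
        {"ImageId": "ami-example2", "VirtualizationType": "hvm", "RootDeviceType": "instance-store"},
        {"ImageId": "ami-example3", "VirtualizationType": "hvm", "RootDeviceType": "ebs"},
    ]
    print([img["ImageId"] for img in hvm_ebs_images(sample)])  # ['ami-example3']
```

The same selection can also be pushed to the API with the virtualization-type and root-device-type filters of DescribeImages.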

Page 23: Deep Dive on Delivering Amazon EC2 Instance Performance

Timekeeping Explained

• Timekeeping in an instance is deceptively hard

• gettimeofday(), clock_gettime(), QueryPerformanceCounter()

• The TSC (time stamp counter)

• CPU counter, accessible from userspace

• Requires calibration, vDSO

• Invariant on Sandy Bridge+ processors

• Xen pvclock; does not support vDSO

• On current generation instances, use TSC as clocksource

Page 24: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: Use TSC as clocksource
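The active clocksource can be inspected through sysfs, and timing clock_gettime() makes the difference described above visible: with tsc the call is served by the vDSO in user space, while with Xen pvclock (per the timekeeping slide) it enters the kernel. A minimal sketch, assuming Linux and the standard clocksource sysfs files; the per-call figure includes Python overhead. Switching requires root, either by writing tsc to current_clocksource or persistently with the clocksource=tsc kernel boot parameter.

```python
# Sketch: inspect the Linux clocksource and time clock_gettime() calls.
# Uses the standard sysfs clocksource files; switching (as root):
#   echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
import time

CS_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksource(cs_dir=CS_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    try:
        with open(f"{cs_dir}/current_clocksource") as f:
            current = f.read().strip()
        with open(f"{cs_dir}/available_clocksource") as f:
            available = f.read().split()
    except OSError:
        return None, []
    return current, available


def tsc_recommended(current, available):
    """True when TSC is offered but a different clocksource is active."""
    return "tsc" in available and current != "tsc"


def ns_per_clock_call(iterations=100_000):
    """Average cost of clock_gettime(CLOCK_MONOTONIC) as seen from Python."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    return (time.perf_counter_ns() - start) / iterations


if __name__ == "__main__":
    current, available = read_clocksource()
    print("current:", current, "available:", available)
    print("switch to tsc?", tsc_recommended(current, available))
    print(f"clock_gettime: {ns_per_clock_call():.0f} ns per call")
```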

Page 25: Deep Dive on Delivering Amazon EC2 Instance Performance

CPU Performance and Scheduling

• Hypervisor ensures every guest receives CPU time

• Fixed allocation

• Uncapped vs. capped

• Variable allocation

• Different schedulers can be used depending on the goal

• Fairness

• Response time / deadline

• Shares
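The difference between capped and uncapped allocation can be illustrated with a simplified share-based model: each guest gets CPU in proportion to its weight, capacity a guest cannot use is redistributed to the others, and a cap is a hard ceiling even when the host is otherwise idle. This is a minimal water-filling sketch under those assumptions, not the algorithm of any particular hypervisor scheduler; guest demands, weights, and caps are hypothetical inputs.

```python
# Sketch: share-based CPU allocation with optional caps (simplified model).
# Not the algorithm of any real hypervisor scheduler; inputs are hypothetical.


def allocate(capacity, guests):
    """guests: list of (demand, weight, cap or None); returns CPUs per guest.

    Capacity is split in proportion to weight. A guest never receives more
    than min(demand, cap); whatever it cannot use is redistributed to the
    remaining guests (water-filling), so uncapped guests can absorb idle CPU.
    """
    alloc = [0.0] * len(guests)
    limits = [min(d, c) if c is not None else d for d, _, c in guests]
    active = [i for i in range(len(guests)) if limits[i] > 0]
    remaining = float(capacity)
    while active and remaining > 0:
        total_weight = sum(guests[i][1] for i in active)
        fair = {i: remaining * guests[i][1] / total_weight for i in active}
        saturated = [i for i in active if fair[i] >= limits[i]]
        if not saturated:
            # Nobody hits a limit: everyone takes its proportional share.
            for i in active:
                alloc[i] = fair[i]
            break
        # Satisfy the saturated guests, then re-split what is left.
        for i in saturated:
            alloc[i] = float(limits[i])
            remaining -= limits[i]
        active = [i for i in active if i not in saturated]
    return alloc


if __name__ == "__main__":
    # Two busy guests with equal weight split four CPUs evenly.
    print(allocate(4, [(4, 1, None), (4, 1, None)]))  # [2.0, 2.0]
    # With one guest idle, an uncapped neighbour can use all four CPUs.
    print(allocate(4, [(0, 1, None), (4, 1, None)]))  # [0.0, 4.0]
    # A capped guest stops at its cap even though CPUs sit idle.
    print(allocate(4, [(0, 1, None), (4, 1, 1)]))     # [0.0, 1.0]
```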

Page 26: Deep Dive on Delivering Amazon EC2 Instance Performance

Review: C4 Instances

• Custom Intel Xeon E5-2666 v3 at 2.9 GHz

• P-state and C-state controls

Model | vCPU | Memory (GiB) | Dedicated EBS bandwidth (Mbps)
c4.large | 2 | 3.75 | 500
c4.xlarge | 4 | 7.5 | 750
c4.2xlarge | 8 | 15 | 1,000
c4.4xlarge | 16 | 30 | 2,000
c4.8xlarge | 36 | 60 | 4,000
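Because instance selection doubles as performance tuning, a small helper can turn the C4 table into a sizing decision. A minimal sketch using the figures from the table; treating the size with the fewest vCPUs as the right choice is an assumption, and a real decision would also weigh price and measured utilization.

```python
# Sketch: pick the smallest C4 size that meets vCPU, memory, and EBS needs.
# Figures are copied from the C4 table; "smallest" = fewest vCPUs (assumption).

C4_SIZES = [
    # (name, vCPU, memory GiB, dedicated EBS Mbps)
    ("c4.large", 2, 3.75, 500),
    ("c4.xlarge", 4, 7.5, 750),
    ("c4.2xlarge", 8, 15, 1000),
    ("c4.4xlarge", 16, 30, 2000),
    ("c4.8xlarge", 36, 60, 4000),
]


def smallest_c4(vcpus, mem_gib, ebs_mbps=0):
    """Return the first (smallest) size meeting all three minimums, or None."""
    for name, cpu, mem, ebs in C4_SIZES:
        if cpu >= vcpus and mem >= mem_gib and ebs >= ebs_mbps:
            return name
    return None


if __name__ == "__main__":
    print(smallest_c4(vcpus=4, mem_gib=10))               # c4.2xlarge
    print(smallest_c4(vcpus=2, mem_gib=2, ebs_mbps=900))  # c4.2xlarge
    print(smallest_c4(vcpus=64, mem_gib=8))               # None
```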

Page 27: Deep Dive on Delivering Amazon EC2 Instance Performance

What’s new in C4: P-state and C-state control

• When idle cores enter deeper idle states, the non-idle cores can reach clock frequencies up to 300 MHz higher

• However, deeper idle states take longer to exit and may not be appropriate for latency-sensitive workloads
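On Linux, the available idle states and their exit latencies are exposed through the cpuidle sysfs interface, which shows which states a latency-sensitive workload might want to avoid. A minimal sketch, assuming the standard /sys/devices/system/cpu/cpuN/cpuidle/stateM/name and latency files (latency in microseconds); the 20-microsecond budget is hypothetical. Deeper states can then be limited at boot, for example with the intel_idle.max_cstate kernel parameter.

```python
# Sketch: list C-states and their exit latencies from the Linux cpuidle
# sysfs interface, then flag states deeper than a latency budget.
# The budget is a hypothetical, workload-specific number.
import glob
import os


def read_idle_states(cpu=0):
    """[(name, exit_latency_us)] for one CPU, or [] if cpuidle is absent."""
    pattern = f"/sys/devices/system/cpu/cpu{cpu}/cpuidle/state*"
    # Sort numerically so state10 follows state9, not state1.
    paths = sorted(glob.glob(pattern),
                   key=lambda p: int(p.rsplit("state", 1)[1]))
    states = []
    for path in paths:
        try:
            with open(os.path.join(path, "name")) as f:
                name = f.read().strip()
            with open(os.path.join(path, "latency")) as f:
                latency = int(f.read().strip())
        except (OSError, ValueError):
            continue
        states.append((name, latency))
    return states


def too_deep(states, budget_us):
    """Names of idle states whose exit latency exceeds budget_us."""
    return [name for name, latency in states if latency > budget_us]


if __name__ == "__main__":
    states = read_idle_states()
    print(states or "cpuidle is not exposed on this system")
    print("exit latency above 20 us:", too_deep(states, 20))
```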

Page 28: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: P-state control for AVX2

• If an application makes heavy use of AVX2 on all cores, the processor may attempt to draw more power than it should

• The processor will then transparently reduce its frequency

• Frequent changes of CPU frequency can slow an application

Page 29: Deep Dive on Delivering Amazon EC2 Instance Performance

Review: T2 Instances

• Lowest cost EC2 instance at $0.013 per hour

• Burstable performance

• Fixed allocation enforced with CPU credits

Model | vCPU | CPU credits / hour | Memory (GiB) | Storage
t2.micro | 1 | 6 | 1 | EBS only
t2.small | 1 | 12 | 2 | EBS only
t2.medium | 2 | 24 | 4 | EBS only
t2.large | 2 | 36 | 8 | EBS only

Page 30: Deep Dive on Delivering Amazon EC2 Instance Performance

How Credits Work

• A CPU credit provides the performance of a full CPU core for one minute

• An instance earns CPU credits at a steady rate

• An instance consumes credits when active

• Credits expire (leak) after 24 hours

[Diagram: credit balance over time, labeled with the baseline rate and the burst rate]
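The credit rules translate into simple arithmetic. A t2.micro earns 6 credits per hour, and one credit is one CPU for one minute, so its baseline is 6 / 60 = 10% of a vCPU; a t2.large earns 36 credits per hour, a baseline of 60% of one vCPU spread across its two vCPUs. The hour-by-hour sketch below assumes that credits earned in an hour can be spent in the same hour and approximates the 24-hour expiry by capping the balance at 24 hours of earnings; the usage pattern is a hypothetical input.

```python
# Sketch: hour-by-hour CPU credit balance for T2 instances (simplified).
# Earn rates come from the T2 table; the usage pattern is hypothetical.
CREDITS_PER_HOUR = {"t2.micro": 6, "t2.small": 12, "t2.medium": 24, "t2.large": 36}


def baseline_vcpu(model):
    """Baseline in vCPUs: credits earned per hour / 60 credit-minutes."""
    return CREDITS_PER_HOUR[model] / 60


def simulate(model, hourly_demand, start_balance=0.0):
    """hourly_demand: vCPUs the workload wants busy each hour (1.0 = one vCPU).

    Returns [(balance_after_hour, vcpus_delivered)]. Assumes credits earned
    during an hour can be spent in that hour, and models the 24-hour expiry
    as a cap of 24 hours of earnings on the balance.
    """
    rate = CREDITS_PER_HOUR[model]
    cap = 24 * rate
    balance = float(start_balance)
    history = []
    for demand in hourly_demand:
        available = balance + rate            # credits spendable this hour
        spent = min(demand * 60, available)   # 1 credit = 1 vCPU-minute
        balance = min(available - spent, cap)
        history.append((balance, spent / 60))
    return history


if __name__ == "__main__":
    print(baseline_vcpu("t2.micro"))  # 0.1
    # One idle day, then three hours at one full vCPU.
    print(simulate("t2.micro", [0.0] * 24 + [1.0] * 3)[-3:])
    # [(90.0, 1.0), (36.0, 1.0), (0.0, 0.7)]
```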

Page 31: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: Monitor CPU credit balance
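The credit balance is published to CloudWatch as the CPUCreditBalance metric in the AWS/EC2 namespace, per instance. A minimal sketch, assuming boto3 and AWS credentials are available for fetch_balance(); hours_until_empty() is a hypothetical helper that extrapolates linearly from the first and last data points, which is only a rough warning signal.

```python
# Sketch: fetch CPUCreditBalance from CloudWatch and estimate time to zero.
# fetch_balance() needs boto3 and AWS credentials; hours_until_empty() is a
# hypothetical helper that works on any [(timestamp, value)] series.
from datetime import datetime, timedelta, timezone


def fetch_balance(instance_id, hours=6):
    """Return [(timestamp, average_balance)] sorted by time (requires boto3)."""
    import boto3  # imported lazily so the rest of the sketch runs without it
    cw = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    resp = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUCreditBalance",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,
        Statistics=["Average"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"]) for p in points]


def hours_until_empty(series):
    """Linear extrapolation from first to last point; None if not declining."""
    if len(series) < 2:
        return None
    (t0, b0), (t1, b1) = series[0], series[-1]
    elapsed_h = (t1 - t0).total_seconds() / 3600
    if elapsed_h <= 0 or b1 >= b0:
        return None
    drain_per_hour = (b0 - b1) / elapsed_h
    return b1 / drain_per_hour
```

For example, hours_until_empty(fetch_balance("i-0123456789abcdef0")), with a placeholder instance ID, estimates how long the current drain rate can continue.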

Page 32: Deep Dive on Delivering Amazon EC2 Instance Performance

Monitoring CPU Performance in Guest

• Indicators that work is being done

• User time

• System time (kernel mode)

• Wait I/O, threads blocked on disk I/O

• Otherwise, idle

• What happens if the OS is scheduled off the CPU?

Page 33: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: How to Interpret Steal Time

• Fixed allocations of CPU can be offered through CPU caps

• Steal time accrues when the CPU cap is enforced

• Use CloudWatch metrics
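Inside a Linux guest, steal time appears as its own column in /proc/stat next to user, system, iowait, and idle time, so the breakdown described on the previous slide can be computed directly. A minimal sketch, assuming the standard layout of the aggregate cpu line; percentages come from the difference between two samples taken one second apart. A sustained steal percentage on an instance with a fixed allocation is consistent with the cap being enforced rather than with a problem in the application.

```python
# Sketch: break CPU time into user, system, iowait, steal, and idle.
# Uses the aggregate "cpu" line of /proc/stat (Linux), whose first eight
# columns are user, nice, system, idle, iowait, irq, softirq, steal (ticks).
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(text):
    """Return the first eight counters of the aggregate 'cpu' line."""
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:9]]
            values += [0] * (8 - len(values))  # very old kernels lack steal
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate cpu line found")


def breakdown(before, after):
    """Percent of elapsed ticks spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values()) or 1
    return {
        "user": 100 * (delta["user"] + delta["nice"]) / total,
        "system": 100 * (delta["system"] + delta["irq"] + delta["softirq"]) / total,
        "iowait": 100 * delta["iowait"] / total,
        "steal": 100 * delta["steal"] / total,
        "idle": 100 * delta["idle"] / total,
    }


def sample(interval=1.0, path="/proc/stat"):
    """Take two readings interval seconds apart and return the breakdown."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return breakdown(before, after)


if __name__ == "__main__":
    try:
        print(sample(1.0))
    except OSError:
        print("/proc/stat is not available")
```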

Page 34: Deep Dive on Delivering Amazon EC2 Instance Performance

Delivering Memory Performance with Amazon EC2 Instances

Page 35: Deep Dive on Delivering Amazon EC2 Instance Performance

Announced: X1 Instances

• Largest memory instance with 2 TB of DRAM

• Quad socket, Intel E7 processors with 128 vCPUs

Model | vCPU | Memory (GiB) | Local storage
x1.32xlarge | 128 | 1,952 | 2 x 1,920 GB

Page 36: Deep Dive on Delivering Amazon EC2 Instance Performance

Virtualized Address Spaces

[Diagram: the current guest process and the guest OS virtual address spaces (0 to 4 GB) map onto the guest physical address space (0 to 4 GB), which contains virtual RAM, virtual ROM, virtual devices, and a virtual frame buffer]

Source: [1]

Page 37: Deep Dive on Delivering Amazon EC2 Instance Performance

Memory Address Translation

[Diagram: a process's virtual address is translated to a physical address through the TLB and the page table, with the operating system's page fault handler invoked when no mapping exists]

Source: [1]
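The translation path in this diagram can be modeled in a few lines. A minimal sketch with a toy single-level page table and 4 KiB pages; the class and exception names are hypothetical, and real x86-64 hardware walks a four-level table, but the order of steps is the same: try the TLB, walk the page table on a miss, and raise a page fault when no mapping exists.

```python
# Sketch: virtual-to-physical translation with a TLB and a page table.
# A toy single-level table with 4 KiB pages; real x86-64 hardware walks a
# four-level table, but the order of steps is the same.
PAGE_SIZE = 4096


class PageFault(Exception):
    """Raised when no mapping exists; a real OS would run its fault handler."""


class Mmu:
    def __init__(self, page_table):
        self.page_table = page_table   # virtual page number -> physical frame
        self.tlb = {}                  # cache of recent translations
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:                      # fast path: TLB hit
            self.hits += 1
            frame = self.tlb[vpn]
        else:                                    # TLB miss: walk the page table
            self.misses += 1
            if vpn not in self.page_table:
                raise PageFault(hex(vaddr))      # OS would map the page here
            frame = self.page_table[vpn]
            self.tlb[vpn] = frame                # refill the TLB
        return frame * PAGE_SIZE + offset


if __name__ == "__main__":
    mmu = Mmu({0: 7, 1: 3})
    print(hex(mmu.translate(0x10)), hex(mmu.translate(0x20)))  # 0x7010 0x7020
    print(mmu.hits, mmu.misses)                                 # 1 1
```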

Page 38: Deep Dive on Delivering Amazon EC2 Instance Performance

Virtual Machine Memory

[Diagram: the current guest process and the guest OS virtual address spaces (0 to 4 GB) map onto the guest physical address space (virtual RAM, virtual ROM, virtual devices, virtual frame buffer), which in turn maps onto the machine address space (RAM, ROM, devices, frame buffer)]

Source: [1]

Page 39: Deep Dive on Delivering Amazon EC2 Instance Performance

Before Intel EPT: Shadow Page Tables

• The hypervisor maintains shadow page tables that map guest virtual pages directly to machine pages.

• Guest modifications to its virtual-to-physical page tables must be synced with the shadow page tables.

• Shadow page tables are loaded into the MMU on context switch.

Page 40: Deep Dive on Delivering Amazon EC2 Instance Performance

Address Translation Before EPT

[Diagram: translation of a virtual address to a machine address through the guest page table, the machine map, an emulated TLB, and the hardware TLB, in numbered steps]

• Shadow page tables are loaded into the MMU on context switch

Source: [1]

Page 41: Deep Dive on Delivering Amazon EC2 Instance Performance

Drawbacks: Shadow Page Tables

• Maintaining consistency between guest page tables and shadow page tables leads to hypervisor traps.

• Loss of performance due to a TLB flush on every context switch.

• Memory overhead due to shadow copies of guest page tables.

Page 42: Deep Dive on Delivering Amazon EC2 Instance Performance

Extended Page Tables

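• Extended page tables (EPT) add a hardware-walked translation from guest physical to machine addresses, so guest virtual addresses are translated all the way to machine addresses in hardware

‒ No need to trap to the hypervisor when the guest OS updates its page tables

• TLB entries tagged with virtual-processor identifiers (VPIDs)

‒ No need to flush the TLB on a VM or hypervisor context switch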
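The contrast between shadow page tables and EPT can be sketched with two toy MMUs built from the same tables: a guest page table (guest virtual to guest physical pages) and a hypervisor machine map (guest physical to machine pages). A minimal model, assuming single-level tables; the class names and the trap counter are illustrative, not Xen or Intel interfaces. With shadow paging every guest page-table write costs a trap to keep the precomputed table in sync; with EPT the write is an ordinary memory store and the hardware performs the two-stage walk, while VPID-tagged TLB entries let translations from different VMs coexist without a flush.

```python
# Sketch: shadow page tables vs. extended (nested) page tables.
# Toy single-level tables keyed by page number:
#   guest_pt: guest virtual page -> guest physical page (owned by the guest)
#   p2m:      guest physical page -> machine page (owned by the hypervisor)
# Class names and counters are illustrative, not Xen or Intel interfaces.


class ShadowMmu:
    """The hypervisor keeps a precomputed guest-virtual -> machine table."""

    def __init__(self, guest_pt, p2m):
        self.guest_pt, self.p2m = dict(guest_pt), dict(p2m)
        self.traps = 0
        self.shadow = {v: self.p2m[g] for v, g in self.guest_pt.items()}

    def guest_update(self, vpn, gpn):
        # The guest's page-table write traps so the shadow can be resynced.
        self.traps += 1
        self.guest_pt[vpn] = gpn
        self.shadow[vpn] = self.p2m[gpn]

    def translate(self, vpn):
        return self.shadow[vpn]                 # single lookup in the shadow


class EptMmu:
    """Hardware walks both tables, so guest updates need no trap."""

    def __init__(self, guest_pt, p2m):
        self.guest_pt, self.p2m = dict(guest_pt), dict(p2m)
        self.traps = 0

    def guest_update(self, vpn, gpn):
        self.guest_pt[vpn] = gpn                # ordinary memory write

    def translate(self, vpn):
        return self.p2m[self.guest_pt[vpn]]     # two-stage walk


class TaggedTlb:
    """TLB keyed by (vpid, vpn): entries from different VMs coexist,
    so switching between VMs does not require a flush."""

    def __init__(self):
        self.entries = {}

    def lookup(self, vpid, vpn):
        return self.entries.get((vpid, vpn))

    def fill(self, vpid, vpn, machine_page):
        self.entries[(vpid, vpn)] = machine_page
```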

Page 43: Deep Dive on Delivering Amazon EC2 Instance Performance

Address Translation w/ Extended Page Tables

[Diagram: translation of a virtual address to a machine address through the guest page table and the machine map, with the TLB caching the result]

Source: [1]

Page 44: Deep Dive on Delivering Amazon EC2 Instance Performance

NUMA

• Non-uniform memory access

• Each processor in a multi-socket system has its own local memory, and the processors are linked by a fast interconnect

• Each processor can also access memory attached to other CPUs, but local memory access is much faster than remote access

• Performance depends on the number of CPU sockets and how they are connected, e.g., by Intel QuickPath Interconnect (QPI)

Page 45: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: Kernel Support for NUMA Balancing

• An application performs best when the threads of its processes access memory on the same NUMA node they run on.

• NUMA balancing moves tasks closer to the memory they are accessing.

• The Linux kernel does this automatically when automatic NUMA balancing is active (Linux 3.13 and later).

• Windows support for NUMA first appeared in the Enterprise and Data Center SKUs of Windows Server 2003.
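Automatic balancing can be checked, and complemented by explicit placement, from user space. A minimal sketch, assuming Linux with the kernel.numa_balancing sysctl and the standard /sys/devices/system/node/nodeN/cpulist files; pinning a process to one node's CPUs with os.sched_setaffinity keeps its threads on that node, and under the kernel's default local allocation policy its new memory tends to be allocated there too. The numactl tool offers the same control from the command line.

```python
# Sketch: check automatic NUMA balancing and pin a process to one node.
# Assumes Linux sysfs/procfs paths; pin_to_node() changes this process's
# CPU affinity, so call it deliberately.
import os


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8,10-11' into a set."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def numa_balancing_enabled(path="/proc/sys/kernel/numa_balancing"):
    """True/False from the sysctl, or None if the kernel does not expose it."""
    try:
        with open(path) as f:
            return f.read().strip() != "0"
    except OSError:
        return None


def pin_to_node(node, sysfs="/sys/devices/system/node"):
    """Restrict this process to the CPUs of `node`; returns the CPU set."""
    with open(f"{sysfs}/node{node}/cpulist") as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)
    return cpus


if __name__ == "__main__":
    print("automatic NUMA balancing:", numa_balancing_enabled())
```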

Page 46: Deep Dive on Delivering Amazon EC2 Instance Performance

Delivering I/O Performance with Amazon EC2 Instances

Page 47: Deep Dive on Delivering Amazon EC2 Instance Performance

I/O and Device Virtualization

• Scheduling I/O requests between virtual devices and shared physical hardware

• Split driver model

• Intel VT-d

• Direct pass-through and IOMMU for dedicated devices

• Enhanced networking

Page 48: Deep Dive on Delivering Amazon EC2 Instance Performance

Review: I2 Instances

16 vCPU: 3.2 TB SSD; 32 vCPU: 6.4 TB SSD

365K random read IOPS for 32 vCPU instance

Model | vCPU | Memory (GiB) | Storage | Read IOPS | Write IOPS
i2.xlarge | 4 | 30.5 | 1 x 800 GB SSD | 35,000 | 35,000
i2.2xlarge | 8 | 61 | 2 x 800 GB SSD | 75,000 | 75,000
i2.4xlarge | 16 | 122 | 4 x 800 GB SSD | 175,000 | 155,000
i2.8xlarge | 32 | 244 | 8 x 800 GB SSD | 365,000 | 315,000

Page 49: Deep Dive on Delivering Amazon EC2 Instance Performance

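Split Driver Model

[Diagram: hardware (physical CPU, physical memory, network device) below the VMM, which schedules virtual CPUs and virtual memory onto it; a driver domain runs the backend driver and the device driver, and each guest domain runs a frontend driver; an application's socket I/O passes through the frontend driver, the backend driver, and the device driver in numbered steps 1 to 5]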

Page 50: Deep Dive on Delivering Amazon EC2 Instance Performance

Split Driver Model

• Each virtual device has two main components

• Communication ring buffer

• An event channel signaling activity in the ring buffer

• Data is transferred through shared pages

• Sharing pages requires inter-domain permissions, known as granting
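The ring buffer and event channel can be illustrated with a small producer/consumer model. A simplified sketch, assuming one direction only: the shared page is a Python list, the event channel is a threading.Event, and the req_prod/req_cons names are borrowed loosely from Xen ring terminology. Real rings also carry responses, and bulk data travels in granted pages rather than in the ring itself.

```python
# Sketch: a split-driver style request ring with an event-channel notification.
# Simplified model: the "shared page" is a Python list and the event channel
# is a threading.Event. Real Xen rings also carry responses and keep
# separate producer/consumer indices for each direction.
import threading


class Ring:
    def __init__(self, size=8):
        self.slots = [None] * size
        self.req_prod = 0            # advanced only by the frontend
        self.req_cons = 0            # advanced only by the backend
        self.event = threading.Event()

    def push(self, request):
        """Frontend: publish a request, then signal the backend."""
        if self.req_prod - self.req_cons == len(self.slots):
            raise BufferError("ring full")
        self.slots[self.req_prod % len(self.slots)] = request
        self.req_prod += 1           # publish only after the slot is written
        self.event.set()             # the "event channel" notification

    def drain(self):
        """Backend: consume every request published so far."""
        self.event.clear()           # clear first so no later signal is lost
        out = []
        while self.req_cons < self.req_prod:
            out.append(self.slots[self.req_cons % len(self.slots)])
            self.req_cons += 1
        return out


def backend(ring, handled, stop):
    """Backend loop: sleep until notified, then handle queued requests."""
    while not stop.is_set():
        if ring.event.wait(timeout=0.05):
            handled.extend(ring.drain())
    handled.extend(ring.drain())     # pick up anything sent before stopping


if __name__ == "__main__":
    ring, handled, stop = Ring(size=4), [], threading.Event()
    worker = threading.Thread(target=backend, args=(ring, handled, stop))
    worker.start()
    for block in range(3):
        ring.push(("read", block))
    stop.set()
    worker.join()
    print(handled)  # [('read', 0), ('read', 1), ('read', 2)]
```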

Page 51: Deep Dive on Delivering Amazon EC2 Instance Performance

Granting in pre-3.8.0 Kernels

• Requires “grant mapping” prior to 3.8.0

• Grant mappings are expensive operations due to TLB flushes

read(fd, buffer,…)

Page 52: Deep Dive on Delivering Amazon EC2 Instance Performance

Granting in 3.8.0+ Kernels, Persistent and Indirect

• Grant mappings are set up in a pool once

• Data is copied in and out of the grant pool

read(fd, buffer…)

Copy to and from grant pool

Page 53: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: Use a 3.8+ kernel

• Amazon Linux 13.09 or later

• Ubuntu 14.04 or later

• RHEL 7 or later

• Other distributions with a 3.8 or newer kernel
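Whether a guest qualifies can be checked from the kernel release string. A minimal sketch using platform.release(); the parsing assumes the usual major.minor prefix (for example 3.13.0-85-generic), and the same helper covers the 3.13 threshold for automatic NUMA balancing. Distribution kernels sometimes backport features, so the version number is a guide rather than proof.

```python
# Sketch: check whether the running Linux kernel is at least a given version.
# Assumes release strings start with "major.minor" (e.g. "3.13.0-85-generic").
import platform
import re


def kernel_version(release):
    """Extract (major, minor) from a release string like '3.10.0-327.el7.x86_64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def kernel_at_least(release, major, minor):
    """True if the release is major.minor or newer."""
    return kernel_version(release) >= (major, minor)


if __name__ == "__main__":
    rel = platform.release()
    try:
        print(rel, "is 3.8+ (persistent grants):", kernel_at_least(rel, 3, 8))
    except ValueError as err:
        print(err)
```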

Page 54: Deep Dive on Delivering Amazon EC2 Instance Performance

Device Pass Through: Enhanced Networking

• SR-IOV eliminates the need for a driver domain

• The physical network device exposes a virtual function directly to the instance

• Requires a specialized driver, which means:

• Your instance OS needs to know about it

• EC2 needs to be told your instance can use it

Page 55: Deep Dive on Delivering Amazon EC2 Instance Performance

After Enhanced Networking

[Diagram: the same hardware, VMM, driver domain, and guest domains as in the split driver model, but each guest domain runs its own NIC driver that talks directly to the SR-IOV network device; an application's socket I/O takes numbered steps 1 to 3]

Page 56: Deep Dive on Delivering Amazon EC2 Instance Performance

Tip: Use Enhanced Networking

• Highest packets per second

• Lowest variance in latency

• Instance OS must support it

• Look for the SR-IOV (sriovNetSupport) attribute of the instance or image
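Inside a Linux guest, the driver bound to an interface shows whether enhanced networking is in use. A minimal sketch, assuming the interface is eth0 and that the SR-IOV virtual function driver is ixgbevf (the Intel 82599 VF driver); the paravirtual Xen driver reports a different name, typically vif. The command ethtool -i eth0 shows the same information, and on the EC2 side the sriovNetSupport attribute (value simple when enabled) can be read with aws ec2 describe-instance-attribute.

```python
# Sketch: check whether a network interface is bound to an SR-IOV virtual
# function driver. The interface name "eth0" and the accepted driver set
# are assumptions; adjust both for your instance.
import os

SRIOV_DRIVERS = {"ixgbevf"}   # Intel 82599 virtual function driver


def interface_driver(iface="eth0"):
    """Driver bound to iface (basename of the sysfs symlink), or None."""
    link = f"/sys/class/net/{iface}/device/driver"
    try:
        return os.path.basename(os.readlink(link))
    except OSError:
        return None


def uses_sriov(driver, accepted=SRIOV_DRIVERS):
    """True if the driver name is one of the accepted SR-IOV VF drivers."""
    return driver in accepted


if __name__ == "__main__":
    drv = interface_driver()
    print(f"eth0 driver: {drv}; SR-IOV: {uses_sriov(drv)}")
```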

Page 57: Deep Dive on Delivering Amazon EC2 Instance Performance

Summary

Page 58: Deep Dive on Delivering Amazon EC2 Instance Performance

Instance Selection = Performance Tuning

• Find an instance type and workload combination

– Define performance

– Monitor resource utilization

– Make changes

Page 59: Deep Dive on Delivering Amazon EC2 Instance Performance

Virtualization Themes

• Bare-metal performance is the goal, and in many scenarios EC2 is already there

• A history of eliminating hypervisor intermediation and driver domains

– Hardware-assisted virtualization

– Scheduling and granting efficiencies

– Device pass-through

Page 60: Deep Dive on Delivering Amazon EC2 Instance Performance

Recap: Getting the Most Out of EC2 Instances

• PV-HVM

• Timekeeping: use TSC

• C-state and P-state controls

• Monitor T2 CPU credits

• NUMA balancing

• Persistent grants for I/O performance

• Enhanced networking

Page 61: Deep Dive on Delivering Amazon EC2 Instance Performance

Next steps

• Visit the Amazon EC2 documentation

• Come visit us in the Developer Chat to hear more

Page 62: Deep Dive on Delivering Amazon EC2 Instance Performance
Page 63: Deep Dive on Delivering Amazon EC2 Instance Performance

References

[1] labs.vmware.com/download/46/