(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014

• The best benchmark
• Absolute vs. relative measures
• Fixed time or fixed work
• What’s different?
• Use a good AMI

[Chart: average CPU result and coefficient of variation (C.O.V.) for five AMIs: Ubuntu 12.04 ami-…, AWS CentOS 5.4 ami-…, and three CentOS 5.4 ami-… images]
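The AMI comparison, like every results table that follows, summarizes repeated runs as an average plus a coefficient of variation (C.O.V. = standard deviation / mean). Here is a minimal sketch of that summary. It assumes the scores for one AMI or instance are on stdin, one number per line, and it uses the sample standard deviation (divide by n - 1). The deck does not say which form it used, so treat that as an assumption.

```shell
# Summarize repeated benchmark scores (one number per line on stdin)
# as a mean and a coefficient of variation: C.O.V. = stddev / mean.
# Assumption: sample standard deviation (divide by n - 1).
cov_summary() {
  awk '
    { x[NR] = $1; sum += $1 }
    END {
      if (NR < 2) { print "need at least 2 samples"; exit 1 }
      mean = sum / NR
      for (i = 1; i <= NR; i++) ss += (x[i] - mean) ^ 2
      sd = sqrt(ss / (NR - 1))
      printf "mean=%.2f cov=%.2f%%\n", mean, 100 * sd / mean
    }'
}
```

For the scores 10, 12 and 14, `printf '10\n12\n14\n' | cov_summary` prints `mean=12.00 cov=16.67%`.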

• Application runs on premises

• Primary requirement is integer CPU performance

• Application is complex to set up, no benchmark tests exist, limited time

• What instance would work best?

1. Choose a synthetic benchmark

2. Baseline: Build, configure, tune, and run it on premises

3. Run the same test (or tests) on a set of instance types

4. Use results from the instance tests to choose the best match
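Step 4 compares instances through relative measures rather than raw scores. In the tables below, cc2.8xlarge always has a single-CPU ratio of 1.00, which suggests it served as the reference. The sketch below divides each score by a baseline score, such as the on-premises result from step 2. The input format and the example numbers are illustrative, and the sketch assumes a higher-is-better score, as produced by a fixed-time benchmark.

```shell
# Express instance scores relative to a baseline score (e.g. the
# on-premises result from step 2). Input lines: "<instance-type> <score>".
# Assumption: higher scores are better.
ratio_to_baseline() {
  awk -v base="$1" '{ printf "%s %.2f\n", $1, $2 / base }'
}
```

For example, `printf 'c3.large 113\ncc2.8xlarge 100\n' | ratio_to_baseline 100` prints `c3.large 1.13` and `cc2.8xlarge 1.00`.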

Geekbench workloads

Integer: Twofish, BZip2 compress, BZip2 decompress, JPEG compress, JPEG decompress, PNG compress, PNG decompress, Dijkstra

Floating point: Black-Scholes, Mandelbrot, Sharpen image, Blur image, N-Body, Ray trace

Memory: STREAM copy, STREAM scale, STREAM add, STREAM triad

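# Record which instance produced the results (EC2 instance metadata service)
ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
# Run Geekbench without uploading results; $GBTXT names the output file
./geekbench_x86_64 --no-upload >"$GBTXT"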
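To quantify variability, each benchmark has to run several times, and every result should be kept. The sketch below is one way to wrap a command like the Geekbench call so that each run lands in its own file, tagged with the instance type and ID recorded above. The function name, file-name pattern and run count are hypothetical.

```shell
# Run a benchmark command several times and keep every result file,
# named by instance type, instance ID and run number, so results from
# many instances can be pooled later. Naming scheme is illustrative.
run_repeated() {
  itype="$1"; iid="$2"; nruns="$3"; shift 3
  i=1
  while [ "$i" -le "$nruns" ]; do
    out="${itype}_${iid}_run${i}.txt"
    "$@" > "$out"            # benchmark output for this run
    echo "$out"              # report where it was saved
    i=$((i + 1))
  done
}
```

On an instance, this might be called as `run_repeated "$TYPE" "$ID" 5 ./geekbench_x86_64 --no-upload`.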

Geekbench     1CPU ratio  C.O.V.  NCPU ratio  C.O.V.  RT (min)
m3.xlarge     0.93        1.04%   2.04        2.31%   2.06
m3.2xlarge    0.93        1.40%   3.80        1.46%   2.08
m2.xlarge     0.80        2.84%   1.54        4.06%   1.99
m2.2xlarge    0.80        1.34%   2.82        1.21%   2.04
m2.4xlarge    0.76        2.28%   5.11        1.71%   2.01
c3.large      1.13        0.93%   1.32        0.71%   1.76
c3.xlarge     1.13        0.39%   2.51        1.81%   1.74
c3.2xlarge    1.13        0.19%   4.88        0.25%   1.70
cc2.8xlarge   1.00        0.71%   15.46       1.93%   2.21

Geekbench (m3.xlarge)  1CPU ratio  C.O.V.
instance-1             0.93        0.31%
instance-2             0.97        0.23%
instance-3             0.94        0.17%
instance-4             0.94        0.10%
instance-5             0.94        0.32%
instance-6             0.94        0.10%
instance-7             0.93        0.25%
instance-8             0.93        0.38%
instance-9             0.94        0.11%
instance-10            0.94        0.09%
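Across the ten m3.xlarge instances, the single-CPU ratios range from 0.93 to 0.97. Each instance's own C.O.V. is 0.38% or less, so instance-to-instance differences are larger than run-to-run noise. The sketch below reports that spread, defined here (our choice) as (max - min) / min, with one ratio per line on stdin.

```shell
# Instance-to-instance spread: given one ratio per instance on stdin,
# report min, max and (max - min) / min as a percentage.
spread() {
  awk '
    NR == 1 { min = max = $1 }
    { if ($1 < min) min = $1; if ($1 > max) max = $1 }
    END {
      if (NR == 0) exit 1
      printf "min=%.2f max=%.2f spread=%.2f%%\n", min, max, 100 * (max - min) / min
    }'
}
```

With the ten ratios from the table, the function prints `min=0.93 max=0.97 spread=4.30%`.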

gb-integer    1CPU ratio  C.O.V.  NCPU ratio  C.O.V.  RT (min)
c3.large      1.12        0.50%   1.37        0.43%   NA
c3.xlarge     1.13        0.38%   2.72        0.41%   NA
c3.2xlarge    1.12        0.38%   5.35        0.51%   NA
cc2.8xlarge   1.00        0.20%   17.88       3.31%   NA

Geekbench     1CPU ratio  C.O.V.  NCPU ratio  C.O.V.  RT (min)
c3.large      1.13        0.93%   1.32        0.71%   1.76
c3.xlarge     1.13        0.39%   2.51        1.81%   1.74
c3.2xlarge    1.13        0.19%   4.88        0.25%   1.70
cc2.8xlarge   1.00        0.71%   15.46       1.93%   2.21

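# Record which instance produced the results (EC2 instance metadata service)
ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
# UnixBench: run the suite once with 1 copy, then with $COPIES copies
./Run -c 1 -c $COPIES >$FN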
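The commands in this deck take the copy or thread count from `$COPIES` (UnixBench, runspec) and `$TDS` (sysbench), but the deck never shows how those values are set. One plausible choice is one copy per online vCPU. The sketch below makes that assumption and reads the count with `nproc` (GNU coreutils), falling back to `getconf`.

```shell
# Assumption: one benchmark copy/thread per online vCPU.
COPIES="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"
TDS="$COPIES"    # same count for sysbench --num-threads=$TDS
echo "copies=$COPIES"
```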

UnixBench     1CPU ratio  C.O.V.  NCPU ratio  C.O.V.  RT (min)
m3.xlarge     1.38        1.90%   2.49        1.36%   28.25
m3.2xlarge    1.42        1.85%   4.21        1.99%   28.29
m2.xlarge     0.40        5.82%   0.76        1.28%   28.30
m2.2xlarge    0.42        1.71%   1.23        1.75%   28.32
m2.4xlarge    0.48        3.31%   2.02        1.71%   28.34
c3.large      1.10        1.33%   1.91        1.54%   28.17
c3.xlarge     1.06        1.48%   2.85        1.26%   28.21
c3.2xlarge    1.10        0.54%   4.50        1.02%   28.96
cc2.8xlarge   1.00        2.97%   6.44        2.65%   30.20

UB-Integer    1CPU ratio  C.O.V.  NCPU ratio  C.O.V.  RT (min)
c3.large      1.05        0.24%   1.10        0.30%   0.17
c3.xlarge     1.05        0.27%   2.20        0.28%   0.17
c3.2xlarge    1.05        0.07%   4.34        0.23%   0.17
cc2.8xlarge   1.00        0.10%   15.54       0.95%   0.17

UnixBench     1CPU ratio  C.O.V.  NCPU ratio  C.O.V.  RT (min)
c3.large      1.10        1.33%   1.91        1.54%   28.17
c3.xlarge     1.06        1.48%   2.85        1.26%   28.21
c3.2xlarge    1.10        0.54%   4.50        1.02%   28.96
cc2.8xlarge   1.00        2.97%   6.44        2.65%   30.20

www.spec.org

Benchmark        Language  Category
400.perlbench    C         Programming language
401.bzip2        C         Compression
403.gcc          C         C compiler
429.mcf          C         Combinatorial optimization
445.gobmk        C         Artificial intelligence
456.hmmer        C         Search gene sequence
458.sjeng        C         Artificial intelligence
462.libquantum   C         Physics / quantum computing
464.h264ref      C         Video compression
471.omnetpp      C++       Discrete event simulation
473.astar        C++       Path-finding algorithms
483.xalancbmk    C++       XML processing

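# Record which instance produced the results (EC2 instance metadata service)
ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
# SPEC CPU2006 integer run: $COPIES concurrent copies, base tuning, reference inputs, one iteration
runspec --noreportable --tune=base --size=ref --rate=$COPIES --iterations=1 \
    400 403 445 456 458 462 464 471 473 483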

SPECint       1CPU ratio  C.O.V.  RT (min)  NCPU ratio  C.O.V.  RT (min)
m3.xlarge     1.01        1.06%   54.39     2.24        1.15%   104.18
m3.2xlarge    1.01        1.67%   54.49     4.25        1.63%   109.22
m2.xlarge     0.76        1.97%   70.83     1.39        2.45%   85.37
m2.2xlarge    0.79        0.94%   68.85     2.76        1.24%   85.42
m2.4xlarge    0.78        0.16%   68.73     5.21        1.26%   89.91
c3.large      1.11        1.95%   50.00     1.25        1.47%   94.22
c3.xlarge     1.10        1.96%   50.29     2.39        1.28%   97.66
c3.2xlarge    1.08        0.87%   50.87     4.67        0.25%   100.22
cc2.8xlarge   1.00        0.29%   54.92     14.92       0.52%   125.74

# sysbench CPU test: $TDS threads, 30000 requests, primes up to 100000
sysbench --num-threads=$TDS --max-requests=30000 --test=cpu \
    --cpu-max-prime=100000 run > $FN

sysbench      Default  C.O.V.  RT (min)  Tuned ratio  C.O.V.  RT (min)
m3.xlarge     3.21     1.44%   0.06      1.69         1.29%   3.86
m3.2xlarge    6.41     1.38%   0.03      3.38         1.41%   1.93
m2.xlarge     1.59     0.75%   0.11      0.80         0.23%   8.16
m2.2xlarge    3.19     0.64%   0.06      1.60         0.76%   4.07
m2.4xlarge    8.83     0.62%   0.02      4.71         0.20%   1.38
c3.large      1.78     0.26%   0.10      0.91         0.09%   7.13
c3.xlarge     3.55     0.53%   0.05      1.83         0.02%   3.57
c3.2xlarge    6.55     8.45%   0.03      3.54         3.31%   1.85
cc2.8xlarge   25.34    2.30%   0.01      13.69        1.10%   0.48
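The sysbench CPU run stops after `--max-requests=30000`, so it is a fixed-work test. The natural measurement is elapsed time, and a faster instance finishes sooner. To express fixed-work times as higher-is-better ratios like the other tables, the baseline time can be divided by the measured time. This is a sketch, and the deck does not say exactly how its sysbench ratios were derived. Times are in seconds, and the baseline run is hypothetical.

```shell
# Fixed-work ratio: the same work is done, elapsed time is measured,
# so ratio = baseline_time / measured_time (higher is better).
fixed_work_ratio() {
  awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}
```

For example, `fixed_work_ratio 120 60` prints `2.00` because the instance finished the same work in half the baseline time.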

NCPU results by benchmark
Instance      Geekbench  gb-integer  UnixBench  UB-Integer  SPECint  sysbench default  sysbench tuned
m3.xlarge     2.04       2.01        2.49       1.88        2.24     3.21              1.69
m3.2xlarge    3.80       3.96        4.21       3.77        4.25     6.41              3.38
m2.xlarge     1.54       1.52        0.76       1.59        1.38     1.59              0.80
m2.2xlarge    2.82       3.02        1.23       3.19        2.76     3.19              1.60
m2.4xlarge    5.11       5.54        2.02       6.48        5.21     8.83              4.71
c3.large      1.32       1.37        1.91       1.10        1.25     1.78              0.91
c3.xlarge     2.51       2.72        2.85       2.20        2.39     3.55              1.83
c3.2xlarge    4.88       5.35        4.50       4.34        4.67     6.55              3.54
cc2.8xlarge   15.46      17.88       6.44       15.5        14.92    25.34             13.69

• Application runs on premises

• Primary requirement: memory throughput of 20K MB/sec

• What instance would work best?

1. Choose a synthetic benchmark

2. Baseline: Build, configure, tune, and run it on premises

3. Run the same test (or tests) on a set of instance types

4. Use results from the instance tests to choose the best match

www.cs.virginia.edu/stream/top20/Bandwidth.html

https://github.com/gregs1104/stream-scaling

name    kernel                  bytes/iter  FLOPS/iter
COPY:   a(i) = b(i)             16          0
SCALE:  a(i) = q*b(i)           16          1
SUM:    a(i) = b(i) + c(i)      24          1
TRIAD:  a(i) = b(i) + q*c(i)    24          2
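The bytes/iter column is what turns a STREAM timing into bandwidth. Total bytes moved equal bytes/iter × array length N, and STREAM reports MB/s as 10^-6 × bytes / time, counting a MB as 10^6 bytes and using the best time over its trials. The sketch below applies that formula. The array length and time in the example are illustrative and do not come from the deck.

```shell
# STREAM-style bandwidth in MB/s (MB = 10^6 bytes).
# Args: bytes per iteration (16 or 24), array length N, seconds.
stream_mbps() {
  awk -v b="$1" -v n="$2" -v t="$3" 'BEGIN { printf "%.2f\n", 1e-6 * b * n / t }'
}
```

A TRIAD pass over N = 10,000,000 elements in 0.012 s gives `stream_mbps 24 10000000 0.012` = `20000.00`, which is exactly the 20K MB/sec requirement in this scenario.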

* McCalpin, John D.: "STREAM: Sustainable Memory Bandwidth in High Performance Computers",

./stream | egrep \

./sysbench --num-threads=$TDS --test=memory run >$FN

Instance      STREAM     Geekbench Memory-Triad  sysbench (default)
m3.xlarge     23640.56   15375.64                302.95
m3.2xlarge    26046.17   14999.27                603.40
m2.xlarge     18766.58   17365.76                528.16
m2.2xlarge    22421.91   17600.00                1019.08
m2.4xlarge    19634.50   14405.82                1576.30
c3.large      11434.83   9967.96                 2116.84
c3.xlarge     21141.30   13972.65                2643.33
c3.2xlarge    30235.78   20657.49                2944.91
cc2.8xlarge   55200.86   37067.32                1195.90
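Step 4 for the memory scenario is a threshold test against the 20K MB/sec requirement. The sketch below keeps the instance types whose result meets a required value, with one "<instance-type> <value>" pair per line on stdin. Applied to the STREAM column, it keeps m3.xlarge, m3.2xlarge, m2.2xlarge, c3.xlarge, c3.2xlarge and cc2.8xlarge. Applied to the Geekbench Memory-Triad column, and reading it in the same units, it keeps only c3.2xlarge and cc2.8xlarge. The choice of benchmark therefore changes the answer.

```shell
# Keep instance types whose measured throughput meets the requirement.
# Input lines: "<instance-type> <value>"; $1 = required value.
meets_requirement() {
  awk -v need="$1" '$2 + 0 >= need + 0 { print $1 }'
}
```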

sysbench memory defaults

--memory-block-size [1K]

--memory-total-size [100G]

--memory-scope {global,local} [global]

--memory-hugetlb [off]

--memory-oper {read, write, none} [write]

--memory-access-mode {seq,rnd} [seq]
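With these defaults, sysbench writes 100G of memory in 1K blocks, which amounts to a very large number of small operations. That is a different access pattern from STREAM's long array sweeps. The sketch below computes the operation count. It assumes the K/M/G suffixes are binary multiples (1K = 1024 bytes), which should be checked against the documentation for your sysbench version.

```shell
# Convert sizes like 1K, 1M, 100G to bytes (assumed binary multiples).
to_bytes() {
  num="${1%[KkMmGg]}"          # strip a trailing unit letter, if any
  unit="${1#"$num"}"           # the stripped unit (may be empty)
  case "$unit" in
    K|k) echo $((num * 1024)) ;;
    M|m) echo $((num * 1024 * 1024)) ;;
    G|g) echo $((num * 1024 * 1024 * 1024)) ;;
    *)   echo "$num" ;;
  esac
}
# Number of --memory-block-size operations needed to move --memory-total-size.
block_count() {
  echo $(( $(to_bytes "$1") / $(to_bytes "$2") ))
}
```

Under that assumption, `block_count 100G 1K` prints `104857600`.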

• I/O metrics
  – IOPS
  – Throughput
  – Latency
• Test parameters
  – Read %
  – Write %
  – Sequential
  – Random
  – Queue depth
• Storage configuration
  – Volume(s)
  – RAID
  – LVM
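The test parameters above map fairly directly onto fio command-line options: `--rw` (read, write, rw, randread, randwrite, randrw), `--rwmixread` for the read percentage, `--iodepth` for queue depth, and `--bs` for block size. The sketch below only prints the command so it can be reviewed before running. The job name, runtime, size, target path, and the choice of `--ioengine=libaio --direct=1` (Linux, bypassing the page cache) are all assumptions.

```shell
# Build (but do not run) a fio command line from the test parameters:
#   $1 pattern: seq or rand    $2 read percentage (0-100)
#   $3 queue depth             $4 block size (e.g. 2k, 1m)
#   $5 target file or device (hypothetical path)
fio_cmd() {
  case "$1:$2" in
    seq:100)  rw=read ;;
    seq:0)    rw=write ;;
    seq:*)    rw=rw ;;          # mixed sequential read/write
    rand:100) rw=randread ;;
    rand:0)   rw=randwrite ;;
    rand:*)   rw=randrw ;;      # mixed random read/write
    *) echo "pattern must be seq or rand" >&2; return 1 ;;
  esac
  echo fio --name=bench --filename="$5" --rw="$rw" --rwmixread="$2" \
    --bs="$4" --iodepth="$3" --ioengine=libaio --direct=1 \
    --runtime=60 --time_based --size=1g
}
```

For example, `fio_cmd rand 70 16 1m /data1/fiotest` describes a 70% read / 30% write random 1M workload at queue depth 16. After checking it, the command can be run with `sh -c "$(fio_cmd rand 70 16 1m /data1/fiotest)"`.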

[Chart: PIOPS 2K queue depth; series 1D PIOPS 2K, 1D PIOPS 2K QD2, 2D PIOPS 2K, 2D PIOPS 2K QD2; workloads Seq.Read, Seq.Write, MixedSeq, MixedSeqWrite, RandRead, RandWrite, MixedRandRead, MixedRandWrite]

• disk copy
  – cp file1 /disk1/file1
• dd
  – dd if=/dev/zero of=/data1/testile1 bs=1048 count=1024000
• fio (flexible I/O tester)
  – fio simple.cfg

Command                                           Seconds  MB/sec
cp f1 f2                                          17.248   59.37
rm -rf f2; cp f1 f2                               0.853    1200.47
cp f1 f3                                          0.880    1164.96
dd if=/dev/zero bs=1048 count=1024000 of=d1       0.722    1419.01
dd if=/dev/urandom bs=1048 count=1024000 of=d2    79.710   12.84
fio simple.cfg                                    NA       61.55
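Each MB/sec figure is data size divided by elapsed seconds. For the dd rows, the size is bs × count = 1048 × 1,024,000 = 1,073,152,000 bytes, and in every row seconds × MB/sec comes to roughly 1024. This is consistent with files of about 1 GiB. The same cp drops from 17.248 s to 0.853 s on a repeat, which suggests caching rather than disk speed. The sketch below performs the arithmetic. It assumes the table's MB is 1,048,576 bytes, which is our inference from those products.

```shell
# Bytes written by dd: block size * block count.
dd_bytes() { echo $(( $1 * $2 )); }
# Throughput in MiB/s from a byte count and elapsed seconds.
mbps() {
  awk -v bytes="$1" -v s="$2" 'BEGIN { printf "%.2f\n", bytes / 1048576 / s }'
}
```

For example, `mbps "$(dd_bytes 1048 1024000)" 0.722` estimates the /dev/zero row from its byte count and time.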

Random 1M I/O, PIOPS, 16 disks
write     904.03
r70w30    1005.91

If benchmarking your application is not practical, synthetic benchmarks can be used if you are careful.

• Choose the benchmark that best represents your application
• Analysis – what does “best” mean?
• Run enough tests to quantify variability
• Baseline – what is a “good result”?
• Samples – keep all of your results; more is better!

tech.just-eat.com @justeat_tech

https://loadtestingtool.com

https://github.com/etsy/statsd

https://graphite.readthedocs.org
