Large Scale Load Testing Amazon.com’s Traffic on AWS (CPN102) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Large-Scale Load Testing Amazon.com’s Traffic on AWS Carlos Arguelles, Amazon.com November 15, 2013

description

It’s 4 a.m. and you don’t know it, but you’re about to get three times the traffic you were expecting. Is your service ready to handle it? Systems are only as scalable as their weakest component. Large-scale load testing in production is the best (and surest) way to ensure that services can truly scale to the unexpected. But the load generator itself can be difficult to scale, expensive to run on hundreds or thousands of hosts, challenging to secure, and time-consuming to develop. The Amazon.com retail site is one of the most heavily used sites in the world, and it has to be ready for anything, at any time. How do you design a load test for this in record time while keeping it cost-effective? Well, you use AWS! Come learn best practices for using Amazon SQS, Amazon S3, Amazon EC2, Amazon CloudWatch, Auto Scaling, and Amazon DynamoDB to design horizontally scalable, large-scale load tests that can simulate the load that millions of users put on your site. We met a tight schedule and came in under budget thanks to AWS, and you can too!

Transcript of Large Scale Load Testing Amazon.com’s Traffic on AWS (CPN102) | AWS re:Invent 2013

Page 1: Large Scale Load Testing Amazon.com’s Traffic on AWS (CPN102) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Large-Scale Load Testing Amazon.com’s Traffic on AWS

Carlos Arguelles, Amazon.com

November 15, 2013

Page 2:
Page 3:

What I’d like you to get out of this

Load and performance issues cost

Page 4:

What I’d like you to get out of this

Page 5:

What I’d like you to get out of this

How you can leverage AWS for load and stress tests

Page 6:

About me

Page 7:

Amazon.com retail site

Amazon.com receives a LOT of traffic

Page 8:

Amazon.com retail site

Significant fluctuation throughout the day

(not to scale)

Page 9:

Amazon.com retail site

Significant fluctuation throughout the year

(not to scale)

Page 10:

Amazon.com retail site

Significant growth year to year

(not to scale)

Page 11:

Some load-related issues

can

Page 12:

[Chart: CPU usage on our fleet. Series: regular day (off-peak), 1st test (cancelled), 2nd test (successful). Labeled levels: 50%, 100%, 85%.]

Page 13:

Some load-related issues

can only

Page 14:

[Diagram: Ingestion Fleet, Database, Output Fleet, Amazon S3, Hadoop, Database, Amazon DynamoDB]

Page 15:

Some load-related issues

cannot

Page 16:

[Chart: disk usage over time. Labels: 5%, 20%, 5 hours, "Start load…"]

Page 17:

What do you really want to do?

• Resilience Testing

• Load Testing

• Stress Testing

• Performance Testing

Page 18:

Load Testing

Page 19:

Stress Testing

Page 20:

Resilience Testing

Page 21:

Performance Testing

Page 22:

How does AWS help us?

Page 23:

Generating load

• Replays from real-world traffic

• Artificial rate, blend of operations

Page 24:

Most useful AWS design pattern, ever

Page 25:

Distributing load, the hard way

[Diagram: a Master drives four Slaves toward a 12,000 TPS target]

• All four Slaves running: 3000 TPS, 3000 TPS, 3000 TPS, 3000 TPS

• One Slave at 0 TPS: 4000 TPS, 4000 TPS, 4000 TPS, 0 TPS
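The arithmetic behind the figures on this slide can be sketched as follows. This is only an illustration of why a master that hands out fixed per-host rates has to track host health and recompute every share when one host drops out; the function name `per_host_rates` and its parameters are hypothetical and do not come from the talk.

```python
def per_host_rates(total_tps, hosts_alive):
    """Split total_tps evenly across live hosts; dead hosts get 0.

    hosts_alive is a list of booleans, one per load-generating host.
    (Hypothetical helper for illustration only.)
    """
    live = sum(hosts_alive)
    if live == 0:
        raise ValueError("no live hosts left to carry the load")
    share = total_tps / live
    return [share if alive else 0 for alive in hosts_alive]


# Four healthy hosts share 12,000 TPS evenly.
all_up = per_host_rates(12_000, [True, True, True, True])

# One host stops: the master must notice and raise every other host's rate.
one_down = per_host_rates(12_000, [True, True, True, False])
```

With the slide's numbers, the first call gives each host 3000 TPS and the second gives the three surviving hosts 4000 TPS each, which is the rebalancing work the "hard way" pushes onto the master.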

Page 26:

Distributing load, the easy way

[Diagram: a Controller, a set of Jobs, and a pool of Workers]
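A minimal sketch of the controller, job, and worker arrangement in this diagram, assuming the jobs sit in a shared queue that workers pull from. The slide shows no code, so everything here is illustrative: Python's in-process `queue.Queue` stands in for a real distributed queue, and `run_load` is a hypothetical name.

```python
import queue
import threading


def run_load(num_jobs, num_workers):
    """Controller enqueues jobs; workers pull until the queue is empty."""
    jobs = queue.Queue()      # in-process stand-in for a shared job queue
    done = []                 # (worker_id, job_id) pairs, for inspection
    lock = threading.Lock()

    # Controller: publish small, independent units of work.
    for job_id in range(num_jobs):
        jobs.put(job_id)

    def worker(worker_id):
        while True:
            try:
                job_id = jobs.get_nowait()
            except queue.Empty:
                return        # nothing left to do
            # A real worker would send requests to the service under test here.
            with lock:
                done.append((worker_id, job_id))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


completed = run_load(num_jobs=7, num_workers=3)
```

The design point is that the controller never assigns a rate to a specific host. If a worker disappears, the remaining workers simply keep pulling jobs, so no rebalancing logic is needed.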

Page 27:

Replaying traffic to generate load

[Diagram: Test Data Repository, Controller, Jobs, Workers, Service under test, Metrics & Dashboards]

Page 28:

[Diagram: Test Data Repository, Controller, Jobs, Workers, Metrics & Dashboards, annotated with AWS services]

• Amazon S3 for storing data

• Amazon DynamoDB for indexing

• Amazon SQS for state, resilience

• Amazon EC2 & Auto Scaling for hardware

• Amazon CloudWatch

• Reactive auto scaling based on queue size
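The slide does not say which scaling policy was used, so the following is only a sketch of the decision a queue-driven policy has to make: given the current backlog, how many workers should be running? The function `desired_workers` and all of its parameters (`jobs_per_worker`, `min_workers`, `max_workers`) are assumptions made for illustration. In a real deployment this decision would normally be expressed as alarms and scaling policies rather than hand-written code.

```python
import math


def desired_workers(queue_depth, jobs_per_worker, min_workers, max_workers):
    """Pick a worker count proportional to backlog, clamped to fleet limits.

    All parameters are hypothetical tuning knobs, not values from the talk.
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    needed = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


# Empty queue: shrink to the floor. Deep queue: grow, but never past the cap.
idle = desired_workers(queue_depth=0, jobs_per_worker=50,
                       min_workers=1, max_workers=100)
busy = desired_workers(queue_depth=1_000, jobs_per_worker=50,
                       min_workers=1, max_workers=100)
```

Scaling on backlog rather than on CPU fits this architecture because the queue depth directly measures how far the workers are behind the controller.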

Page 29:

Generating load

• Replays from real-world traffic

• Artificial rate, blend of operations

Page 30:

Artificial traffic to generate load

• Why?

– You do not have real-world data

– You expect a change in traffic

• How?

– Control rate

– Control blend

– Control duration

Page 31:

Artificial traffic to generate load

• 50,000 TPS for 20 minutes, 99% Read, 1% Writes

• 95,000 TPS for 3 hours, 80% Read, 20% Writes

• 85,000 TPS for 45 minutes, 90% Read, 10% Writes

[Diagram: the first phase is split by minute, Minute#1 through Minute#20, each at 50,000 TPS, 99% Read, 1% Writes. Minute#1 is then split into jobs numbered 1, 2, ... 5000, each 10 TPS for 1 minute, 99% R 1% W.]
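The decomposition on this slide can be checked with a short sketch: one phase at 50,000 TPS for 20 minutes becomes 20 one-minute slices, and each slice becomes 5000 jobs of 10 TPS. The `Job` type and `split_phase` function are hypothetical names introduced here, and the 10 TPS job size is taken from the slide's figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """One small unit of load: a fixed rate and blend for one minute."""
    minute: int
    tps: int
    read_pct: int
    write_pct: int


def split_phase(total_tps, minutes, read_pct, job_tps=10):
    """Break a load phase into per-minute jobs of job_tps each."""
    if total_tps % job_tps != 0:
        raise ValueError("total_tps must be a multiple of job_tps")
    jobs_per_minute = total_tps // job_tps
    return [
        Job(minute, job_tps, read_pct, 100 - read_pct)
        for minute in range(1, minutes + 1)
        for _ in range(jobs_per_minute)
    ]


# First phase from the slide: 50,000 TPS for 20 minutes, 99% reads.
jobs = split_phase(total_tps=50_000, minutes=20, read_pct=99)
minute_one = [job for job in jobs if job.minute == 1]
```

Here `minute_one` holds 5000 jobs whose rates sum to 50,000 TPS, and the whole phase yields 100,000 jobs. Keeping each job small is what lets any number of workers share the phase without coordinating with one another.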

Page 32:

Artificial traffic to generate load

[Diagram: a Controller, a set of Jobs, and a pool of Workers]

Page 33:

Amazon EC2 Spot Instances

• A great way to inexpensively test

– Up to 90% off regular price (name your price)

– Interruption-tolerant, time-flexible tasks

• Approaches

– Combine with on-demand instances (burst)

– Try Spot Instances first, then fall back to on-demand
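The second approach, Spot first with an on-demand fallback, can be sketched as below. No real AWS API is called: `request_spot` and `request_on_demand` are hypothetical callables that take a requested instance count and return how many were actually granted, and `provision` is an illustrative name.

```python
def provision(count, request_spot, request_on_demand):
    """Ask for Spot capacity first, then cover any shortfall on-demand.

    request_spot and request_on_demand are hypothetical callables:
    each takes a requested count and returns the number granted.
    """
    granted = request_spot(count)
    spot = max(0, min(count, granted))   # guard against odd return values
    shortfall = count - spot
    on_demand = request_on_demand(shortfall) if shortfall > 0 else 0
    return {"spot": spot, "on_demand": on_demand}


# Example: the Spot request only yields 6 of the 10 hosts we want.
fleet = provision(
    10,
    request_spot=lambda n: 6,
    request_on_demand=lambda n: n,
)
```

Because the workers only pull jobs from a queue, losing a Spot host mid-test costs capacity, not correctness, which is why this workload suits interruption-tolerant pricing.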

Page 34:

Takeaways
