Large Scale Load Testing Amazon.com’s Traffic on AWS (CPN102) | AWS re:Invent 2013

Post on 23-Jun-2015


It’s 4am and you don’t know it, but you’re about to get three times the traffic you were expecting. Is your service ready to handle it? Systems are only as scalable as their weakest component. Large-scale load testing in production is the best (and surest) way to ensure that services can truly scale to the unexpected. But the load generator itself can be difficult to scale, expensive to run on hundreds or thousands of hosts, challenging to keep the data secure, and time-consuming to develop. The Amazon.com retail site is one of the most heavily used sites in the world, and has to be ready for anything, at any time. How do you design a load test for this in record time while keeping it cost-effective? Well, you use AWS! Come learn best practices on how you can use Amazon SQS, Amazon S3, Amazon EC2, Amazon CloudWatch, Auto Scaling, and Amazon DynamoDB to design horizontally scalable large-scale load tests that can simulate the load that millions of users are putting onto your site. We met a tight schedule and did it under budget thanks to AWS, and you can too!


© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Large-Scale Load Testing

Amazon.com’s Traffic on AWS

Carlos Arguelles, Amazon.com

November 15, 2013

What I’d like you to get out of this

Load and performance issues cost

How you can leverage AWS for load and stress tests

About me

Amazon.com retail site

Amazon.com receives a LOT of traffic

Significant fluctuation throughout the day (chart, not to scale)

Significant fluctuation throughout the year (chart, not to scale)

Significant growth year to year (chart, not to scale)

Some load-related issues

can

Chart: CPU usage on our fleet. Series: regular day (off-peak), 1st test (cancelled), 2nd test (successful). Values shown: 50%, 100%, 85%.

Some load-related issues

can only

Diagram components: Ingestion Fleet, Database, Output Fleet, Amazon S3, Hadoop, Database, Amazon DynamoDB.

Some load-related issues

cannot

Chart: disk usage. Labels shown: 5%, 20%, 5 hours, "Start load…".

What do you really want to do?

Load Testing

Stress Testing

Resilience Testing

Performance Testing

How does AWS help us?

Generating load

Replays from real-world traffic

Artificial rate, blend of operations

Most useful AWS design pattern, ever

Distributing load, the hard way

Diagram: a Master pushes 12,000 TPS to four Slaves.

With all four Slaves healthy: 3000 TPS each.

With one Slave down (0 TPS): 4000 TPS on each of the remaining three.
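The numbers on this slide follow from dividing a fixed target rate across however many hosts are still healthy. A minimal sketch of that arithmetic is below; the function name `per_host_tps` is made up for illustration and is not taken from the talk.

```python
def per_host_tps(total_tps: int, healthy_hosts: int) -> float:
    """Rate each healthy host must carry when a fixed total is pushed to the fleet."""
    if healthy_hosts <= 0:
        raise ValueError("no healthy hosts left to carry the load")
    return total_tps / healthy_hosts


# Four healthy hosts share 12,000 TPS evenly.
print(per_host_tps(12_000, 4))
# One host drops out, so the remaining three absorb its share.
print(per_host_tps(12_000, 3))
```

In a push model, something has to notice the failed host and redo this division, which is the bookkeeping the next slide avoids.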

Distributing load, the easy way

Diagram components: Controller, Jobs, Workers, Metrics & Dashboards.
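One way to read this diagram is as a pull model: the controller puts small jobs on a queue and each worker takes the next job whenever it is free. The sketch below illustrates that idea with Python's in-process `queue.Queue` standing in for a real queueing service; `run_jobs` and its parameters are assumptions for illustration, not code from the talk.

```python
import queue
import threading


def run_jobs(jobs, worker_count):
    """Let workers pull jobs from a shared queue; return the jobs each worker handled."""
    job_queue = queue.Queue()
    for job in jobs:
        job_queue.put(job)  # the "controller" side: enqueue every job up front

    completed = {worker_id: [] for worker_id in range(worker_count)}

    def worker(worker_id):
        while True:
            try:
                job = job_queue.get_nowait()  # pull the next job, if any
            except queue.Empty:
                return  # queue drained, worker exits
            completed[worker_id].append(job)  # stand-in for actually running the job

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return completed


result = run_jobs(list(range(100)), worker_count=4)
print(sum(len(done) for done in result.values()))  # total jobs handled across workers
```

Because workers pull, a worker that disappears simply stops taking jobs and the others keep draining the queue; nobody has to recompute per-host rates.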

Replaying traffic to generate load

Diagram components: Test Data Repository, Controller, Jobs, Workers, Service under test, Metrics & Dashboards.

Amazon S3 for storing data

Amazon DynamoDB for indexing

Amazon SQS for state, resilience

Amazon EC2 & Auto Scaling for hardware

Amazon CloudWatch

Reactive auto scaling based on queue size
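"Reactive auto scaling based on queue size" can be pictured as a rule that maps the current backlog to a worker count. The sketch below is one assumed form of such a rule in plain Python; the function `desired_workers`, the `jobs_per_worker` ratio, and the min/max bounds are all hypothetical and not taken from the talk or from any AWS API.

```python
import math


def desired_workers(queue_depth, jobs_per_worker, min_workers=1, max_workers=100):
    """Pick a worker count proportional to the backlog, clamped to fleet limits."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    needed = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


print(desired_workers(queue_depth=95, jobs_per_worker=10))    # backlog needs 10 workers
print(desired_workers(queue_depth=0, jobs_per_worker=10))     # never below the minimum
print(desired_workers(queue_depth=5000, jobs_per_worker=10))  # capped at the maximum
```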

Generating load

Replays from real-world traffic

Artificial rate, blend of operations

Artificial traffic to generate load

• Why?

– You do not have real-world data

– You expect a change in traffic

• How?

– Control rate

– Control blend

– Control duration
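The three controls above (rate, blend, duration) can be sketched as a function that plans when each request is sent and which kind it is. This is an illustrative sketch only: `plan_operations`, the "read"/"write" labels, and the seeded random blend are assumptions, not the generator described in the talk.

```python
import random


def plan_operations(tps, read_fraction, duration_s, seed=0):
    """Return (send_time_seconds, operation) pairs for a fixed rate, blend and duration."""
    rng = random.Random(seed)      # seeded so a plan can be reproduced
    interval = 1.0 / tps           # rate: evenly spaced send times
    total = tps * duration_s       # duration: total number of requests
    plan = []
    for i in range(total):
        # blend: each request is a read with probability read_fraction
        operation = "read" if rng.random() < read_fraction else "write"
        plan.append((i * interval, operation))
    return plan


plan = plan_operations(tps=10, read_fraction=0.99, duration_s=60)
print(len(plan))   # 10 requests per second for 60 seconds
print(plan[0])     # first request is sent at time 0.0
```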

Artificial traffic to generate load

50,000 TPS for 20 minutes, 99% Read, 1% Writes

95,000 TPS for 3 hours, 80% Read, 20% Writes

85,000 TPS for 45 minutes, 90% Read, 10% Writes

Minute#1 through Minute#20: 50,000 TPS, 99% Read, 1% Writes each

Minute#1: jobs 1, 2, through 5000, each 10 TPS for 1 minute, 99% R 1% W
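The breakdown on this slide (a phase split into minutes, each minute split into 5000 jobs of 10 TPS) can be sketched as follows. The `Job` dataclass, `split_phase`, and the default `job_tps=10` are illustrative names chosen to match the slide's numbers; they are not code from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    minute: int      # which minute of the phase this job belongs to (1-based)
    tps: int         # rate this single job generates
    duration_s: int  # how long the job runs
    read_pct: int    # percentage of operations that are reads


def split_phase(total_tps, minutes, read_pct, job_tps=10):
    """Split one phase into one-minute jobs that each generate job_tps."""
    if total_tps % job_tps:
        raise ValueError("total_tps must be a multiple of job_tps")
    jobs_per_minute = total_tps // job_tps
    return [
        Job(minute=minute, tps=job_tps, duration_s=60, read_pct=read_pct)
        for minute in range(1, minutes + 1)
        for _ in range(jobs_per_minute)
    ]


jobs = split_phase(total_tps=50_000, minutes=20, read_pct=99)
print(sum(1 for job in jobs if job.minute == 1))  # jobs needed for Minute#1
print(len(jobs))                                  # jobs for the whole 20-minute phase
```

Small, uniform jobs like these are what make the pull model work: any worker can take any job, and losing a worker costs at most the jobs it was running.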

Artificial traffic to generate load

Diagram components: Controller, Jobs, Workers.

Amazon EC2 Spot Instances

• A great way to inexpensively test

– Up to 90% off regular price (name your price)

– Interruption-tolerant, time-flexible tasks

• Approaches

– Combine with on-demand instances (burst)

– Try Spot Instances first, then fall back to on-demand
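The "Spot first, then on-demand" approach can be sketched as a small fallback routine. Everything here is hypothetical: `acquire_hosts` is an illustrative name, and `request_spot` / `request_on_demand` are placeholder callables that return how many hosts were actually granted; they do not correspond to any real AWS API call.

```python
def acquire_hosts(count, request_spot, request_on_demand):
    """Ask for Spot capacity first and cover any shortfall with on-demand hosts."""
    spot_granted = min(count, request_spot(count))
    shortfall = count - spot_granted
    on_demand_granted = request_on_demand(shortfall) if shortfall > 0 else 0
    return {"spot": spot_granted, "on_demand": on_demand_granted}


# Hypothetical providers: Spot can only supply 60 hosts, on-demand grants what is asked.
fleet = acquire_hosts(
    100,
    request_spot=lambda wanted: min(wanted, 60),
    request_on_demand=lambda wanted: wanted,
)
print(fleet)
```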

Takeaways
