Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Speed and Reliability at Any Scale Combining SQS and DB Services Jonathan Desrocher, Amazon Web Services Colin Vipurs, Shazam Entertainment Ltd. November 14, 2013

description

Amazon Simple Queue Service (Amazon SQS) makes it easy and inexpensive to enhance the scalability and reliability of your cloud application. In this session, we demonstrate design patterns for using Amazon SQS in conjunction with Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, Amazon Elastic MapReduce, Amazon Relational Database Service, and Amazon Redshift. Shazam will share their experience of combining Amazon SQS with Amazon DynamoDB to support a Super Bowl advertising campaign.

Transcript of Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Page 1: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Speed and Reliability at Any Scale: Combining SQS and DB Services

Jonathan Desrocher, Amazon Web Services

Colin Vipurs, Shazam Entertainment Ltd.

November 14, 2013

Page 2: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

AWS Messaging = Amazon SQS + Amazon SNS

Page 3: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Amazon SQS Core Features

• Payload size of up to 256KB

• Message batching for higher throughput and reduced costs

• Supports long polling for reduced costs and latency

• Cross-origin resource sharing support

[Slide callout: New and improved]

Page 4: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

SQS Core mechanics

Page 5: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Basic Message Lifecycle

[Animated diagram, pages 5 to 15. Labels: Writer, Reader A, Reader B, messages A and B, ReceiveMessage.]
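
The slides carry only the diagram labels, so what follows is a minimal sketch of the usual SQS message lifecycle, not a reconstruction of the animation. It assumes the standard behavior that a received message stays on the queue but is hidden for a visibility timeout, and is gone only after an explicit delete. SketchQueue and its methods are illustrative in-memory stand-ins, not AWS SDK calls, and time is passed in by hand to keep the example deterministic.

```ruby
# In-memory stand-in for a queue with a visibility timeout.
# Not the AWS SDK: class and method names are illustrative only.
class SketchQueue
  Message = Struct.new(:id, :body, :invisible_until)

  def initialize(visibility_timeout:)
    @visibility_timeout = visibility_timeout
    @messages = []
    @next_id = 0
  end

  # Writer side: store the message and return its id.
  def send_message(body)
    @next_id += 1
    @messages << Message.new(@next_id, body, 0)
    @next_id
  end

  # Reader side: hand out the first visible message and hide it from
  # other readers until the visibility timeout expires.
  def receive_message(now)
    msg = @messages.find { |m| m.invisible_until <= now }
    return nil unless msg
    msg.invisible_until = now + @visibility_timeout
    msg
  end

  # Acknowledge successful processing; only now is the message removed.
  def delete_message(id)
    @messages.reject! { |m| m.id == id }
  end

  def size
    @messages.size
  end
end

queue = SketchQueue.new(visibility_timeout: 30)
queue.send_message("A")
queue.send_message("B")

a = queue.receive_message(0)         # Reader A takes "A"
b = queue.receive_message(0)         # Reader B takes "B"
queue.delete_message(a.id)           # Reader A finishes and deletes "A"

# Suppose Reader B fails before deleting "B".
hidden  = queue.receive_message(10)  # inside the timeout: nothing visible
retried = queue.receive_message(31)  # timeout expired: "B" is handed out again
queue.delete_message(retried.id)
```

The point of the design is that a reader failure costs a delay of one visibility timeout rather than a lost message.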

Page 16: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

That covers reliability.

Now let’s go for the scale!

Page 17: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Bulk Transactional Reads

[Animated diagram, pages 17 to 22. Reader A issues three ReceiveMessage calls of 10 messages each, holding 10, then 20, then 30 messages in RAM, and then issues DeleteMessage.]
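
As a rough sketch of the bulk read shown in the diagram: the reader fills a 30-message buffer with three receive calls, processes the buffer as one unit, and only then acknowledges. BulkSketchQueue is a hypothetical in-memory stand-in; the one property borrowed from the real service is the ceiling of 10 messages per receive call and 10 entries per batch delete.

```ruby
# Hypothetical in-memory queue. receive and delete_batch mimic the
# 10-message ceiling of ReceiveMessage and DeleteMessageBatch.
class BulkSketchQueue
  MAX_BATCH = 10

  def initialize(bodies)
    @in_flight = {}
    @visible = bodies.each_with_index.map { |body, i| ["handle-#{i}", body] }
  end

  # Returns up to `max` [receipt_handle, body] pairs and marks them in flight.
  def receive(max = MAX_BATCH)
    batch = @visible.shift([max, MAX_BATCH].min)
    batch.each { |handle, body| @in_flight[handle] = body }
    batch
  end

  def delete_batch(handles)
    raise ArgumentError, "at most #{MAX_BATCH} per call" if handles.size > MAX_BATCH
    handles.each { |h| @in_flight.delete(h) }
  end

  def remaining
    @visible.size + @in_flight.size
  end
end

queue = BulkSketchQueue.new((1..30).map { |n| "msg-#{n}" })

# Three receive calls fill the buffer: 10, 20, then 30 messages in RAM.
buffer = []
3.times { buffer.concat(queue.receive) }

# Process the whole buffer as one unit of work (here: just count it).
processed = buffer.map { |_handle, body| body }.size

# Only after processing succeeds, acknowledge in slices of 10.
delete_calls = 0
buffer.map(&:first).each_slice(10) do |handles|
  queue.delete_batch(handles)
  delete_calls += 1
end
```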

Page 23: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Let’s take it to real life!

Page 24: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Scalability example: market trade volume by half hour

[Chart: trade volume per half hour; axis from 0 to 350,000.]

Page 25: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Scalability example: market trade volume by half hour

[Chart: trade volume per half hour; axis from 0 to 350,000; annotations: 15% and 85%.]

Page 26: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Design pattern #1:

Batch processing

Page 27: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Batch Processing

• Use SQS as a scalable and resilient short-term storage solution.

• Simply configure the appropriate retention period and send away!

[Diagram labels: HTTP PUT, Elastic Beanstalk Application, SendMessage]

Page 28: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Batch Processing

• When appropriate, launch a fleet of Amazon EC2 workers and process the messages en masse.

[Diagram label: ReceiveMessage]

Page 29: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Design pattern #2:

IAM Roles for Amazon EC2

Page 30: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Using IAM Roles for Amazon EC2

• Create an IAM role with the appropriate permissions to Amazon SQS.

• Launch EC2 instances with this role.

• Done!

– Audit logs will correlate the EC2 instance ID to the SQS API calls.

– IAM will automatically rotate the credentials on our behalf.

{
  "Statement": [
    {
      "Sid": "Stmt1384277213171",
      "Action": [
        "sqs:ChangeMessageVisibility",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl",
        "sqs:ListQueues",
        "sqs:ReceiveMessage"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:sqs:us-east-1:455320512810:Sensor_Ingestion"
    }
  ]
}

Page 31: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Using IAM Roles for Amazon EC2

• Use the AWS SDK on the instance

• No need to type credentials

– Not in code

– Not in a configuration file

– Not via the console either

require 'rubygems'
require 'aws-sdk'

# No credentials are passed here: the SDK obtains them from the instance's IAM role.
sqs = AWS::SQS.new
myqueue = sqs.queues.named("Sensor_Ingestion")

# Long-running loop: the block runs once per received message.
myqueue.poll do |msg|
  # Do something with the message
end

Page 32: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Design pattern #3:

Using SQS to durably batch writes

Page 33: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Using SQS to durably batch writes

• The application:

– An AWS Elastic Beanstalk application.

– Clients upload data to the application through HTTP PUTs.

– Each upload is 100KB in size.

– Amazon S3 will be used as the permanent data store.

[Diagram labels: HTTP PUT, Elastic Beanstalk Application, S3 PUT, S3 Bucket]

Page 34: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Using SQS to durably batch writes

• The challenge:

– We have an external constraint that requires us to batch the upload into Amazon S3. For example:

• Amazon EMR best practices call for Amazon S3 object size of >10MB.

• Hourly Amazon Redshift batch inserts.

[Diagram labels: HTTP PUT, Elastic Beanstalk Application, S3 PUT, S3 Bucket, EMR Cluster, Redshift Database]

Page 35: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Using SQS to durably batch writes

• Enter SQS:

– Persist individual client PUTs as SQS messages.

– Have an Amazon EC2 worker role that performs the following logic:

• Receive SQS message and add to an in-memory buffer.

• Once buffer is full, upload to Amazon S3.

• Upon acknowledgement from S3, delete SQS messages from queue.

[Diagram labels: HTTP PUT, Elastic Beanstalk Application, SendMessage, ReceiveMessage, S3 PUT, S3 Bucket, EMR Cluster, Redshift Database]
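
The worker logic above can be sketched as follows, under stated assumptions: "buffer is full" is taken to mean a byte threshold (10MB here, echoing the EMR guideline), and the queue and store objects are hypothetical stand-ins for SQS and S3 clients with made-up delete and put methods. The part that matters is the ordering: messages are deleted only after the upload call returns.

```ruby
# Illustrative names only; `queue` and `store` stand in for SQS and S3 clients.
class BatchingWorker
  def initialize(queue:, store:, flush_bytes:)
    @queue = queue
    @store = store
    @flush_bytes = flush_bytes
    @buffer = []          # [receipt_handle, body] pairs not yet uploaded
    @buffered_bytes = 0
  end

  # Add one received message; flush once the buffer reaches the threshold.
  def handle(handle, body)
    @buffer << [handle, body]
    @buffered_bytes += body.bytesize
    flush if @buffered_bytes >= @flush_bytes
  end

  def flush
    return if @buffer.empty?
    @store.put(@buffer.map(&:last).join)        # raises if the upload fails
    @buffer.each { |h, _body| @queue.delete(h) } # reached only after the upload succeeded
    @buffer = []
    @buffered_bytes = 0
  end
end

# Test doubles that record what they were asked to do.
class FakeStore
  attr_reader :objects
  def initialize; @objects = []; end
  def put(object); @objects << object; end
end

class FakeQueue
  attr_reader :deleted
  def initialize; @deleted = []; end
  def delete(handle); @deleted << handle; end
end

store = FakeStore.new
queue = FakeQueue.new
worker = BatchingWorker.new(queue: queue, store: store, flush_bytes: 10 * 1024 * 1024)

upload = "x" * (100 * 1024)   # one 100KB client upload
103.times { |i| worker.handle("h#{i}", upload) }
```

With 100KB uploads the threshold is crossed on the 103rd message, so one object of roughly 10MB is written. If the upload raises, nothing is deleted, and the messages become visible on the queue again for another attempt.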

Page 36: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Using SQS to durably batch writes

• Also to consider:

– Some data stores are optimized for read workloads.

– Buffering the writes with Simple Queue Service will ensure both speed and reliability of data ingestion.

[Diagram labels: HTTP PUT, Elastic Beanstalk Application, SendMessage, ReceiveMessage, BatchWriteItem, RDS Database]

Page 37: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Design pattern #4:

Discarding stale messages

Page 38: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Discarding stale messages

• Controlled via the MessageRetentionPeriod property.

• Useful when there is no business value for data older than X minutes.

– “Transactions that don’t complete within 5 minutes are abandoned, enabling client-side failure handling”.

[Diagram labels: HTTP PUT, Elastic Beanstalk Application, SendMessage, ReceiveMessage, Stale Messages, S3 PUT, S3 Bucket, EMR Cluster, Redshift Database]
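
To make the retention rule concrete, here is a small sketch. MessageRetentionPeriod is expressed in seconds and accepts values from 60 (1 minute) to 1,209,600 (14 days). The helper names, the message hashes and the timestamps are illustrative assumptions; drop_stale only models what the queue does on our behalf and is not an API call.

```ruby
# Bounds of the SQS MessageRetentionPeriod attribute, in seconds.
MIN_RETENTION = 60            # 1 minute
MAX_RETENTION = 1_209_600     # 14 days

# Convert a business rule in minutes to a valid attribute value.
def retention_seconds(minutes)
  seconds = minutes * 60
  unless (MIN_RETENTION..MAX_RETENTION).cover?(seconds)
    raise ArgumentError, "retention must be between 60 and 1209600 seconds"
  end
  seconds
end

# Model of the queue's behavior: messages older than the retention
# period are dropped and never reach a reader.
def drop_stale(messages, now, retention)
  messages.select { |m| now - m[:sent_at] <= retention }
end

retention = retention_seconds(5)   # the 5-minute rule quoted on the slide
messages = [
  { body: "fresh", sent_at: 900 },
  { body: "stale", sent_at: 100 }
]
kept = drop_stale(messages, 1_000, retention)
```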

Page 39: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Design pattern #5:

Simple Notification Service Fan-out

Page 40: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Simple Notification Service Fan-out

• Atomically distribute a message to multiple subscribers over different transport methods

– SQS queues

– HTTP/S endpoints

– SMS

– Email

• Also used to abstract different mobile device push providers (MBL308)

– Apple Push Notification Service

– Google Cloud Messaging for Android

– Amazon Device Messaging

Publish

Page 41: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Simple Notification Service Fan-out

• Perform different operations on the same data

– Split different facets of the payload into different systems.

– Duplicate the data into short-term and long-term storage systems.

[Diagram labels: HTTP PUT, Elastic Beanstalk Application, SNS Publish; two branches, each with ReceiveMessage, S3 PUT and S3 Bucket; EMR Cluster, Redshift Database]

Page 42: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Simple Notification Service Fan-out

• Deliver the same data to different environments

[Diagram labels: HTTP PUT, Elastic Beanstalk Application, SNS Publish; Production and Test, each with ReceiveMessage, S3 PUT, S3 Bucket and DynamoDB Table]

Page 43: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Simple Notification Service Fan-out

• Distribute the same data to multiple external environments:

– Push data to different locations worldwide.

– Seamlessly synchronize AWS and on-premises environments.

– Pro tip: MessageID field is consistent across locations.

[Diagram labels: HTTP PUT, Elastic Beanstalk Application, SNS Publish, us-east-1, eu-west-1, ap-northeast-1, On-premises Data Center]

Page 44: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Simple Notification Service Fan-out

• Each recipient can have its own preferred transport protocol:

– SQS for guaranteed delivery

– Email for human-friendly delivery

– HTTP/S for real-time push

Elastic Beanstalk Application

HTTP PUT SNS Publish

Partner/Process A

Partner/Process B

Partner/Process C

Partner/Process D
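
The fan-out idea across these slides can be sketched in memory. SketchTopic is a hypothetical stand-in for an SNS topic, and the lambdas stand in for SQS, HTTP/S or email subscribers. One publish reaches every subscriber, and each copy carries the same message id, which is what lets receivers in different locations recognize copies of the same message.

```ruby
require "securerandom"

# Hypothetical in-memory topic; real subscriptions would be SQS queues,
# HTTP/S endpoints, email addresses and so on.
class SketchTopic
  def initialize
    @subscribers = []
  end

  # `deliver` is any callable that accepts (message_id, body).
  def subscribe(protocol, deliver)
    @subscribers << [protocol, deliver]
  end

  # Every subscriber gets the same body under the same message id.
  def publish(body)
    message_id = SecureRandom.uuid
    @subscribers.each { |_protocol, deliver| deliver.call(message_id, body) }
    message_id
  end
end

production = []
test_env   = []
emails     = []

topic = SketchTopic.new
topic.subscribe(:sqs,   ->(id, body) { production << [id, body] })
topic.subscribe(:sqs,   ->(id, body) { test_env << [id, body] })
topic.subscribe(:email, ->(_id, body) { emails << "New upload: #{body}" })

published_id = topic.publish("sensor-reading-42")
```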

Page 45: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Design pattern #6:

Send messages from the browser

Page 46: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Send messages from the browser

• Call AWS services such as SQS and DynamoDB directly from the user’s browser.

• Authentication is based on STS tokens.

• Supports S3, SQS, SNS and DynamoDB.

Page 47: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Send messages from the browser

• Back to our sample architecture:

– Browser authenticates against the Elastic Beanstalk application.

– Response includes the location of the SQS queue and an STS token for direct authentication.

[Diagram labels: GetToken, Elastic Beanstalk Application, SendMessage, ReceiveMessage, S3 PUT, S3 Bucket, EMR Cluster, Redshift Database]
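
One way to picture the GetToken step, as a sketch only: the application scopes the temporary credentials to sending on a single queue and returns them together with the queue location. The response shape, the helper names and the EXAMPLE credential values are assumptions made for illustration; the actual STS call is left out, and the queue URL simply reuses the account and queue name from the earlier policy example.

```ruby
require "json"

# Policy that allows nothing but sending to one queue.
def send_only_policy(queue_arn)
  {
    "Version" => "2012-10-17",
    "Statement" => [
      { "Effect" => "Allow", "Action" => "sqs:SendMessage", "Resource" => queue_arn }
    ]
  }
end

# Hypothetical GetToken response body. `credentials` stands in for whatever
# the application obtained from STS using the policy above.
def get_token_response(queue_url, credentials)
  JSON.generate({ "queue_url" => queue_url, "credentials" => credentials })
end

policy = send_only_policy("arn:aws:sqs:us-east-1:455320512810:Sensor_Ingestion")
response = get_token_response(
  "https://sqs.us-east-1.amazonaws.com/455320512810/Sensor_Ingestion",
  { "access_key_id" => "EXAMPLE", "secret_access_key" => "EXAMPLE", "session_token" => "EXAMPLE" }
)
```

Limiting the browser to SendMessage means a leaked token cannot be used to read or delete what other clients have queued.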

Page 48: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Colin Vipurs Shazam Entertainment Ltd.

Page 49: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013
Page 50: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013
Page 51: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013
Page 52: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

375 MILLION

USERS

10 MILLION

NEW USERS PER MONTH

75 MILLION

MONTHLY ACTIVE USERS

Page 53: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

2004 2005 2006 2007 2008 2009 2010 2011 2012 2013

Page 54: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Amazon SQS for surge protection

SQS shield

Page 55: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Facebook Realtime Updates

[Diagram labels: c3po, queue worker, user updater, DynamoDB; single event, message, batch, update request, user data, data req.]

Page 56: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Facebook Realtime Updates

Page 57: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Queue Worker Anatomy

[Diagram labels: pollers, unmarshallers, handlers, target system; read msg, versioned msg, data object, operation]
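
The slide names the stages but not their implementation, so the following is only a guess at how such a worker might be wired, not Shazam's code. The message envelope (version, operation, payload), the UNMARSHALLERS table and SketchWorker are all hypothetical. The poller is reduced to a process call per raw message; the unmarshaller is chosen by message version and the handler by operation.

```ruby
require "json"

# Hypothetical envelope: {"version": 1, "operation": "...", "payload": {...}}
# One unmarshaller per message version turns a payload into a data object.
UNMARSHALLERS = {
  1 => ->(payload) { { user_id: payload["uid"] } },
  2 => ->(payload) { { user_id: payload["user"]["id"] } }
}.freeze

class SketchWorker
  def initialize(handlers)
    @handlers = handlers   # operation name => callable taking a data object
  end

  # One poller iteration: raw message -> data object -> handler.
  def process(raw)
    envelope = JSON.parse(raw)
    unmarshal = UNMARSHALLERS.fetch(envelope["version"])
    data = unmarshal.call(envelope["payload"])
    @handlers.fetch(envelope["operation"]).call(data)
  end
end

updated = []   # stands in for the target system
worker = SketchWorker.new("update_user" => ->(data) { updated << data[:user_id] })

worker.process('{"version":1,"operation":"update_user","payload":{"uid":"u1"}}')
worker.process('{"version":2,"operation":"update_user","payload":{"user":{"id":"u2"}}}')
```

Versioning the message format lets old and new producers share a queue while the worker is upgraded.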

Page 58: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

SQS for SLAs

[Diagram labels: data, ingester]

Page 59: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

SQS for SLAs

[Diagram labels: data, ingester, queue worker, batch; queues: high priority, std priority, low priority]
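
A minimal sketch of one way a worker could serve three priority queues: always take from the highest-priority queue that has work. The slide does not say how the worker chooses, so strict priority is an assumption, and plain arrays stand in for the queues.

```ruby
# Queues are plain arrays here; hash insertion order is priority order.
def next_message(queues_by_priority)
  queues_by_priority.each do |name, queue|
    return [name, queue.shift] unless queue.empty?
  end
  nil
end

queues = {
  high: ["h1"],
  std:  ["s1", "s2"],
  low:  ["l1"]
}

first  = next_message(queues)   # high-priority work goes first
second = next_message(queues)   # high is empty, so standard is served
queues[:high] << "h2"           # new high-priority work arrives
third  = next_message(queues)   # and jumps ahead of the remaining backlog
```

Strict priority can starve the low queue under sustained load; whether that is acceptable depends on the SLA attached to each tier.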

Page 60: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

SQS as DynamoDB Buffer

[Diagram labels: API, data, DynamoDb]

Page 61: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

[Chart: Traffic Volume over Time]

Page 62: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

[Chart: Traffic Volume over Time; annotation: large volume event]

Page 63: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

SQS as DynamoDB Buffer

[Diagram labels: API, data, DynamoDb, throughput exceeded, data message, queue worker, data]

Page 64: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

SQS as DynamoDB Buffer

[Diagram labels: API, data, queue worker, DynamoDb; client fails fast, client retries]

Page 65: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

SQS as DynamoDB Buffer

public interface Writer<T> {
    void write(T t);
}
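
A Ruby sketch of how a single write method can hide the buffering, following the diagram labels (API, throughput exceeded, data message, queue worker). All class names here are hypothetical, and ThroughputExceeded stands in for whatever error the data store client raises when provisioned throughput is exceeded.

```ruby
class ThroughputExceeded < StandardError; end

# Writes to the primary store; on a throughput error, parks the item on a queue.
class BufferedWriter
  def initialize(primary:, overflow:)
    @primary = primary
    @overflow = overflow
  end

  def write(item)
    @primary.write(item)
  rescue ThroughputExceeded
    @overflow.write(item)
  end
end

# Test doubles: a table that accepts `capacity` writes, and an array-backed queue.
class LimitedTable
  attr_reader :rows
  def initialize(capacity)
    @capacity = capacity
    @rows = []
  end

  def write(item)
    raise ThroughputExceeded if @rows.size >= @capacity
    @rows << item
  end
end

class QueueWriter
  attr_reader :messages
  def initialize; @messages = []; end
  def write(item); @messages << item; end
end

table = LimitedTable.new(2)
queue = QueueWriter.new
writer = BufferedWriter.new(primary: table, overflow: queue)
%w[a b c d].each { |item| writer.write(item) }

# Later, a queue worker drains the overflow once capacity is available again.
recovered_table = LimitedTable.new(10)
queue.messages.each { |item| recovered_table.write(item) }
```

Because every participant exposes the same write method, callers do not need to know whether an item went straight to the table or took the detour through the queue.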

Page 66: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

SNS/SQS Datastore Segregation

Page 67: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

SNS/SQS Datastore Segregation

[Diagram labels: API, tag data, API, Reporting, S3]

Page 68: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

SNS/SQS Datastore Segregation

[Diagram labels: API, tag data, API, Dev Hacking, Reporting, S3]

Page 69: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

SQS for Shazam is…

• Protection from the outside world

• Short term, unbounded persistence

• Cost effective elastic capacity

• Scalable data segregation

Page 70: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Thank you Colin!

Page 71: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Design patterns recap

1. Batch processing

2. IAM Roles for EC2

3. Using SQS to durably batch writes

4. Discard stale messages

5. Simple Notification Service Fan-out

6. Send messages from the browser

Page 72: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Additional messaging resources

• Application Services Booth

• re:Invent sessions:

– ARC301 Controlling the Flood: Massive Message Processing with AWS SQS and DynamoDB

– MBL308 Engage Your Customers with Amazon SNS Mobile Push

• AWS Support and Discussion Forums

• AWS Architecture Center: http://aws.amazon.com/architecture

• Documentation: http://aws.amazon.com/documentation/sqs

Page 73: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013

Next stop:

ARC301 - Controlling the Flood! Right here in this room.

Page 74: Speed and Reliability at Any Scale: Amazon SQS and Database Services (SVC206) | AWS re:Invent 2013
