Post on 21-May-2020

Scaling the Platform for your Startup

Andreas Chatzakis, AWS Solutions Architecture

Peter Mounce, Senior Software Developer at JUST EAT

15th April 2015, AWS London Summit

Why are you here?

• Building the technology platform for your startup

• You want to prepare for success

• Learn about design patterns & scalability

• A pragmatic approach for startups

Priorities for startups

• Racing within a window of opportunity

• Small team with no legacy

• Focus on solving a problem

• Avoid over-engineering & re-engineering

• Reduce risk of failure when you go viral

A scalable architecture

• Can support growth in users, traffic, data size

• Without practical limits

• Without a drop in performance

• Seamlessly - just by adding more resources

• Efficiently - in terms of cost per user

Day 1 – Dev & private beta

Single host

[Diagram: www.example.com, Amazon Route 53 DNS service, Elastic IP, THE server (e.g. Apache, MySQL), server image (AMI)]

Day 2 - Public beta

We need a bigger server

• Add larger & faster storage (EBS)

• Use the right instance type

• Easy to change instance sizes

• Not our long term strategy

• Will hit a limit eventually

• No fault tolerance

Separating web and DB

• More capacity

• Scale each tier individually

• Tailor instance for each tier

– Instance type

– Storage

• Security

– Security groups

– DB in a private VPC subnet

But how do I choose what DB technology I need?

SQL? NoSQL?

Why start with a Relational DB?

• SQL is versatile & feature-rich

• Lots of existing code, tools, knowledge

• Clear patterns to scalability (for read-heavy apps)

• Reality: eventually you will have a polyglot data layer

– There will be workloads where NoSQL is a better fit

– Combination of both Relational and NoSQL

– Use the right tool for each workload

Key Insight: Relational Databases are Complex

• Our experience running Amazon.com taught us that relational databases can be a pain to manage and operate with high availability

• Poorly managed relational databases are a leading cause of lost sleep and downtime in the IT world!

• Especially for startups with small teams

Relational Databases

MySQL, Aurora, PostgreSQL, Oracle, SQL Server

Fully managed; zero admin

Amazon RDS

Aurora

Improving efficiency

Offload static content

• Amazon S3: highly available hosting that scales

– Static files (JavaScript, CSS, images)

– User uploads

• S3 URLs – serve directly from S3

• Let the web server focus on dynamic content

Amazon CloudFront

• Worldwide network of edge locations

• Cache on the edge

– Reduce latency

– Reduce load on origin servers

– Static and dynamic content

– Even a few seconds of caching of popular content can have a huge impact

• Connection optimizations

– Optimize transfer route

– Reuse connections

– Benefits even non-cacheable content

CloudFront for static & dynamic content

[Diagram: Amazon Route 53 points to a CloudFront distribution. Path patterns css/*, js/* and Images/* (static content) go to the S3 bucket; the Default (*) pattern (dynamic content) goes to the EC2 instance(s).]
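The path-pattern routing in the diagram above can be sketched as a first-match lookup. This is only an illustration in Python: the origin names and the `pick_origin` helper are hypothetical, and in practice this routing is configured on the CloudFront distribution, not written as application code.

```python
from fnmatch import fnmatchcase

# Ordered (path pattern, origin) pairs mirroring the slide; origin names are hypothetical.
BEHAVIOURS = [
    ("css/*", "s3-bucket"),
    ("js/*", "s3-bucket"),
    ("Images/*", "s3-bucket"),
    ("*", "ec2-instances"),  # default pattern: dynamic content
]


def pick_origin(path: str) -> str:
    """Return the origin of the first pattern that matches the request path."""
    path = path.lstrip("/")
    for pattern, origin in BEHAVIOURS:
        if fnmatchcase(path, pattern):  # case-sensitive match
            return origin
    raise LookupError(path)  # unreachable while the "*" default is present
```

Order matters in this sketch: the catch-all `*` entry has to come last, otherwise it would shadow the static-content patterns.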

Database caching

• Faster response from RAM

• Reduce load on database

1. If data in cache, return result

2. If not in cache, read from DB

3. And store in cache

[Diagram: application server, Amazon ElastiCache, RDS database]
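The three numbered steps above are often called the cache-aside pattern. A minimal Python sketch follows, assuming a plain dictionary as a stand-in for an ElastiCache client and a caller-supplied function as a stand-in for an RDS query; the `CacheAside` class and its names are hypothetical.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Cache-aside reads: check the cache, fall back to the DB, then populate the cache."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}   # stand-in for an ElastiCache client
        self._load_from_db = load_from_db  # stand-in for an RDS query
        self.db_reads = 0                  # counts how often the DB was hit

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:             # 1. if data in cache, return result
            return self._cache[key]
        self.db_reads += 1
        value = self._load_from_db(key)    # 2. if not in cache, read from DB
        if value is not None:
            self._cache[key] = value       # 3. and store in cache
        return value
```

The sketch leaves out expiry and invalidation on writes, which a real cache would need to keep stale entries from living forever.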

Amazon ElastiCache: in-memory cache

• Simple to Deploy

• Managed

– Automatically replaces failed nodes

– Patch management

• Elastic

• Compatible

ElastiCache

Day 3 – Paying customers

High Availability

[Diagram, step 1: Amazon Route 53 DNS service (www.example.com), Amazon CloudFront, an S3 bucket for static assets, and Availability Zone a with one web server, ElastiCache node 1 and the RDS DB instance]

[Diagram, step 2: a second web server is added, with Availability Zone b now shown]

[Diagram, step 3: Elastic Load Balancing is added for the two web servers]

Elastic Load Balancing

• Managed Load Balancing Service

• Fault tolerant

• Health Checks

• Distributes traffic across AZs

• Elastic – automatically scales its capacity

High Availability

[Diagram, step 1: Amazon Route 53 DNS service (www.example.com), Amazon CloudFront, Elastic Load Balancing, two web servers, an S3 bucket for static assets, ElastiCache node 1 and the RDS DB instance, across Availability Zones a and b]

[Diagram, step 2: an RDS DB standby is added]

Data layer HA

[Diagram, step 1: Amazon Route 53 DNS service (www.example.com), Elastic Load Balancing, two web servers, an S3 bucket for static assets, ElastiCache node 1, the RDS DB instance and the RDS DB standby, across Availability Zones a and b]

[Diagram, step 2: ElastiCache node 2 is added]

User sessions

• Problem: Often stored on local disk (not shared)

• Quick fix: ELB session stickiness

• Solution: DynamoDB

[Diagram: Elastic Load Balancing with two web servers, one labelled "Logged in" and the other "Logged out"]

Amazon DynamoDB

• Managed document and key-value store

• Simple to launch and scale

– To millions of IOPS

– Both reads and writes

• Consistent, fast performance

• Durable: perfect for storage of session data

https://github.com/aws/aws-dynamodb-session-tomcat

http://docs.aws.amazon.com/aws-sdk-php/guide/latest/feature-dynamodb-session-handler.html
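To illustrate why a shared store fixes the session problem, here is a minimal Python sketch. It assumes an in-memory dictionary as a stand-in for a DynamoDB table; `SharedSessionStore` and `WebServer` are hypothetical names, not part of the libraries linked above.

```python
import uuid
from typing import Dict, Optional


class SharedSessionStore:
    """In-memory stand-in for a key-value table such as one hosted in DynamoDB."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, str]] = {}

    def create(self, data: Dict[str, str]) -> str:
        session_id = uuid.uuid4().hex
        self._items[session_id] = dict(data)
        return session_id

    def load(self, session_id: str) -> Optional[Dict[str, str]]:
        return self._items.get(session_id)


class WebServer:
    """A stateless web server: it keeps no session data of its own."""

    def __init__(self, store: SharedSessionStore) -> None:
        self._store = store

    def login(self, user: str) -> str:
        return self._store.create({"user": user})

    def current_user(self, session_id: str) -> Optional[str]:
        session = self._store.load(session_id)
        return session["user"] if session else None
```

Because both servers read the same store, a user who logs in through one server is still logged in when the load balancer sends the next request to the other.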

Day 4 – Let’s go viral!

Replace guesswork with elastic IT

[Chart: "Startups pre-AWS" shows traditional capacity against demand, with "Unhappy Customers" and "Waste $$$" regions; "AWS Cloud" shows capacity against demand]

Scaling the web tier

[Diagram, step 1: Amazon Route 53 DNS service (www.example.com), Elastic Load Balancing, two web servers, an S3 bucket for static assets, ElastiCache nodes 1 and 2, the RDS DB instance and the RDS DB standby, across Availability Zones a and b]

[Diagram, step 2: two more web servers are added]

Auto Scaling

Automatic resizing of compute clusters based on demand

Feature | Details
Control | Define minimum and maximum instance pool sizes and when scaling and cool down occurs.
Integrated to Amazon CloudWatch | Use metrics gathered by CloudWatch to drive scaling.
Instance types | Run Auto Scaling for on-demand and Spot Instances. Compatible with VPC.

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name MyGroup \
  --launch-configuration-name MyConfig \
  --min-size 4 \
  --max-size 200 \
  --availability-zones us-west-2c us-west-2b

[Diagram: Amazon CloudWatch triggers the auto-scaling policy]
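The idea of a metric-driven policy bounded by minimum and maximum pool sizes can be sketched in a few lines of Python. The minimum and maximum come from the command above; the CPU thresholds, step size and cool-down period are hypothetical values chosen for illustration, and the real service evaluates policies itself.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    min_size: int = 4            # matches --min-size in the command above
    max_size: int = 200          # matches --max-size in the command above
    scale_out_cpu: float = 70.0  # hypothetical threshold (percent)
    scale_in_cpu: float = 30.0   # hypothetical threshold (percent)
    step: int = 2                # hypothetical number of instances per adjustment
    cooldown_s: int = 300        # hypothetical cool-down in seconds

    def desired_capacity(self, current: int, avg_cpu: float, seconds_since_last: int) -> int:
        """Return the new group size for one evaluation of the metric."""
        if seconds_since_last < self.cooldown_s:
            return current                      # still cooling down: no change
        if avg_cpu > self.scale_out_cpu:
            current += self.step
        elif avg_cpu < self.scale_in_cpu:
            current -= self.step
        # Never leave the configured pool size bounds.
        return max(self.min_size, min(self.max_size, current))
```

The cool-down check comes first so that a burst of alarms does not trigger several adjustments before earlier ones have taken effect.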

Prerequisite

Decompose into small, loosely coupled, stateless building blocks

What does this mean in practice?

• Only store transient data on local disk

• Needs to persist beyond a single http request?

– Then store it elsewhere

User uploads → Amazon S3

User sessions → Amazon DynamoDB

Application data → Amazon RDS

Having decomposed into small, loosely coupled, stateless building blocks…

You can now scale out with ease

We can also scale back with ease

Take the shortcut

• While this architecture is simple you still need to deal with:

– Configuration details

– Deploying code to multiple instances

– Maintaining multiple environments (Dev, Test, Prod)

– Maintaining different versions of the application

• Solution: Use AWS Elastic Beanstalk

AWS Elastic Beanstalk (EB)

• Easily deploy, monitor, and scale three-tier web applications and services.

• Infrastructure provisioned and managed by EB

• You maintain control.

• Preconfigured application containers

• Easily customizable.

• Support for these platforms:

Loose coupling with SQS

• Place asynchronous tasks into Amazon SQS

• SQS – buffer that protects backend systems

• Process at own pace

• Respond quickly to end users

[Diagram: "Tight coupling" contrasted with loose coupling, where the front end EC2 instance puts a message into SQS and the back end EC2 instance gets the message]
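The put/get flow can be sketched with an in-process queue. This is an illustration only: Python's `queue.Queue` stands in for an SQS queue, and the function names and message format are hypothetical.

```python
import queue
from typing import List

tasks: "queue.Queue[str]" = queue.Queue()  # in-process stand-in for an SQS queue


def front_end_handle(order_id: str) -> str:
    """Put the slow work on the queue and respond to the user straight away."""
    tasks.put(order_id)
    return f"accepted {order_id}"


def back_end_drain(batch_size: int) -> List[str]:
    """Get up to batch_size messages, so the back end processes at its own pace."""
    done: List[str] = []
    while len(done) < batch_size:
        try:
            order_id = tasks.get_nowait()
        except queue.Empty:
            break  # nothing left to process right now
        done.append(f"processed {order_id}")
    return done
```

The front end never waits for the back end here; if the back end falls behind, messages simply accumulate in the queue until it catches up.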

Day 5 – Add more features

Your Applications

Mobile: Push Notifications, Mobile Analytics, Cognito, Cognito Sync

Analytics: Kinesis, Data Pipeline, Redshift, EMR

Application: SQS, SWF, AppStream, Elastic Transcoder, SES, CloudSearch, SNS

Deployment & Management: Elastic Beanstalk, OpsWorks, CloudFormation, CodeDeploy, CodePipeline, CodeCommit

Security & Administration: CloudWatch, Config, CloudTrail, IAM, Directory, KMS

Enterprise Applications: WorkSpaces, WorkMail, WorkDocs

Compute: EC2, ELB, Auto Scaling, Lambda, ECS

Storage: EBS, S3, Glacier, CloudFront

Database: DynamoDB, RDS, ElastiCache

Network: VPC, Direct Connect, Route 53

AWS Global Infrastructure

AWS building blocks

Inherently Scalable & Highly Available / Scalable & Highly Available

Elastic Load Balancing

Amazon CloudFront

Amazon Route 53

Amazon S3

Amazon SQS

Amazon SES

Amazon CloudSearch

AWS Lambda

Amazon DynamoDB

Amazon Redshift

Amazon RDS

Amazon ElastiCache

Amazon EC2

Amazon VPC

Automated Configurable With the right architecture

Stay focused as you scale your team

[Chart: On-Premise Infrastructure: 30% Your Business, 70% Managing All of the "Undifferentiated Heavy Lifting". AWS Cloud-Based Infrastructure: 70% More Time to Focus on Your Business, 30% Configuring Your Cloud Assets.]

Day 6 – Growing fast

Scaling Relational DBs

• Increase RDS instance specs

– Larger instance type

– More storage / more PIOPS

• Read Replicas (Master – Slave)

– Scale out beyond capacity of single DB instance

– Available in Amazon RDS for MySQL, PostgreSQL and Amazon Aurora

– Replication lag

– Writes => master

– Reads with tolerance to stale data => read replica (slave)

– Reads with need for most recent data => master
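The three routing rules above can be sketched as a small helper that picks an endpoint per query. This is a hedged illustration in Python: `ReplicaRouter` and the endpoint names are hypothetical, and round-robin selection between replicas is an assumption, not something the slide prescribes.

```python
import itertools
from typing import Iterator, List


class ReplicaRouter:
    """Pick a database endpoint per query, following the rules on the slide."""

    def __init__(self, master: str, replicas: List[str]) -> None:
        self._master = master
        # Round-robin over replicas; fall back to the master if there are none.
        self._replicas: Iterator[str] = itertools.cycle(replicas or [master])

    def for_write(self) -> str:
        return self._master                 # writes => master

    def for_read(self, tolerate_stale: bool) -> str:
        if tolerate_stale:
            return next(self._replicas)     # stale-tolerant reads => read replica
        return self._master                 # reads needing the latest data => master
```

The `tolerate_stale` flag makes replication lag an explicit decision at each call site instead of a surprise in production.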

Scaling the DB

[Diagram, step 1: Amazon Route 53 DNS service (www.example.com), Elastic Load Balancing, four web servers, an S3 bucket for static assets, ElastiCache nodes 1 and 2, the RDS DB instance and the RDS DB standby, across Availability Zones a and b]

[Diagram, step 2: an RDS read replica is added]

[Diagram, step 3: a second RDS read replica is added]

What if your app is write-heavy?

Challenge: You will eventually hit the write throughput or storage limit of the master node

Solutions:

• Federation (splitting into multiple DBs based on function)

• Sharding (splitting one data set up across multiple hosts)

Database federation

• Split tables into smaller autonomous databases

• Harder to do cross-function queries

• Essentially delaying the need for sharding

• Won’t help with single huge functions/tables

[Diagram: Forums DB, Users DB, Products DB]

Sharded horizontal scaling

• Each partition hosts a portion of the rows of a table

• More complex at the application layer

• ORM support can help

• No practical limit on scalability

• Operational complexity

• Shard by key space

• RDBMS or NoSQL

User | ShardID
002345 | A
002346 | B
002347 | C
002348 | B
002349 | A

[Diagram: Shard A, Shard B, Shard C]
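A minimal Python sketch of shard selection follows. The directory reproduces the User/ShardID table above; the hash-based alternative is an assumption added for comparison and is not described on the slide. All function names are hypothetical.

```python
import hashlib

# Directory-style mapping copied from the slide's table.
SHARD_DIRECTORY = {
    "002345": "A",
    "002346": "B",
    "002347": "C",
    "002348": "B",
    "002349": "A",
}
SHARDS = ["A", "B", "C"]


def shard_from_directory(user_id: str) -> str:
    """Look the shard up in an explicit directory, as the table suggests."""
    return SHARD_DIRECTORY[user_id]


def shard_from_hash(user_id: str) -> str:
    """Alternative (an assumption, not from the slide): derive the shard from a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

A directory allows individual users to be moved between shards at the cost of an extra lookup, while a hash needs no lookup but remaps many keys whenever the number of shards changes.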

NoSQL data stores

• Trade query & integrity features of Relational DBs for

– More flexible data model

– Horizontal scalability & predictable performance

DynamoDB

Provisioned read/write performance per table

Massive and Seamless Scale

• Distributed system that can scale both reads and writes

– Sharding + Replicas

• Automatic & transparent partitioning:

– Data set size growth

– Provisioned capacity increases

Summary

[Diagram: the full architecture. Amazon Route 53 DNS service (www.example.com), Elastic Load Balancing, an S3 bucket for static assets, Availability Zones a and b, the RDS DB instance, the RDS DB standby, four RDS read replicas, ElastiCache nodes 1 to 4, DynamoDB, CloudSearch, Lambda, SES and SQS; labelled "No limit"]

A quick review

• Keep it simple and stateless

• Make use of managed self-scaling services

• Multi-AZ and AutoScale your EC2 infrastructure

• Use the right DB for each workload

• Cache data at multiple levels

• Simplify operations with deployment tools

Next steps?

READ!

• aws.amazon.com/documentation

• aws.amazon.com/architecture

• aws.amazon.com/start-ups

ASK FOR HELP!

• forums.aws.amazon.com

• aws.amazon.com/support

Performance testing @ JUST EAT

(Or: DoS yourself every night in production to prove you can take it)

@justeat_tech + @petemounce http://tech.just-eat.com

Please wait while I start my DoS attack...

(Demo - start fake load, show dashboards)

The problem with performance tests & continuous delivery

● Don’t want to sacrifice continuous delivery & decoupled teams

● Don’t want performance to suffer

All the usual problems:

● Bottleneck through single environment

● Individual tests take too long

Why?

Continuously test

● performance

● capacity

If we find a problem Thursday night:

1. don’t run fake load over the weekend

2. enjoy weekend as normal

3. fix it next week at leisure

Gamble!

OH: “We deploy tens of small changes a day. I bet we won’t break production...”

OH: “Let’s just do it in production with fake traffic at the same time as customers!”

Not that much of a gamble, really

We have tight feedback loops at this point.

Engineers being on call... highly invested in not regressing performance.

How?

Pick scenarios we care about

Pick data variations to exercise

Add header(s) to discriminate fake load vs customer load

And then:

● Run it every night during peak time

● If no alerts fire, we’re good
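The header-based discrimination between fake load and customer load might look like the following Python sketch. The slides do not name the header(s) used, so `X-Synthetic-Load` and both helper functions are hypothetical.

```python
from typing import Dict, Mapping

FAKE_LOAD_HEADER = "X-Synthetic-Load"  # hypothetical header name


def tag_fake_request(headers: Mapping[str, str]) -> Dict[str, str]:
    """Load generator side: mark a request as fake load."""
    tagged = dict(headers)
    tagged[FAKE_LOAD_HEADER] = "true"
    return tagged


def is_fake_load(headers: Mapping[str, str]) -> bool:
    """Service side: header names are case-insensitive, so compare in lower case."""
    wanted = FAKE_LOAD_HEADER.lower()
    return any(name.lower() == wanted and value.lower() == "true"
               for name, value in headers.items())
```

A service can then use `is_fake_load` to keep test traffic out of business metrics or to skip side effects such as placing real orders, while still exercising the same production code paths.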

What did we gain?

● Continuous confidence in capacity

● Continuous confidence in dealing with spikes

● Performance as a 1st-class concern

● Tests become independent of environments’ data

(Remind me to stop my DoS attack now)

(Demo - stop fake load, show dashboards)

Yes, we’re recruiting too.

http://tech.just-eat.com/jobs