AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS

Post on 08-May-2015


Technical deep dive into 10 AWS Cloud best practices, with an in-depth look at the tips and tricks of architecting on the AWS platform.

Transcript of AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS

AWS Government, Education, & Nonprofits Symposium

Canberra, Australia | May 20, 2014

Managing Seasonal Workloads on AWS

Clayton Brown, Ecosystem Solution Architect

Why are customers adopting cloud computing?

Variable expense: Replace capital expenditure with variable expense

Average of 400 servers replaced per customer

Source: IDC Whitepaper, sponsored by Amazon, “The Business Value of Amazon Web Services Accelerates Over Time,” July 2012

Economies of scale: Lower variable expense than companies can achieve themselves

Why are customers adopting cloud computing?

Saved $34m on SmartHub application

Tens of millions of dollars saved with first 12 apps migrated to AWS

50% reduction in analytics costs

Multiple global regions help build highly available applications

[Diagram: web servers spread across Availability Zone 1 and Availability Zone 2]

Regional AWS design provides high availability as a baseline

Corporate Data Center

Which can be fully integrated with existing assets

[Chart: demand over Week 1 to Week 5 against capacity steps of 1m, 1.5m and 2.0m; regions labelled "Wasted Capacity" and "Lost Customers, Rush Hardware"]

Scaling on-premises infrastructure can be a challenge

Sizing capacity for peak is harder still

[Chart: capacity of resources versus actual demand over time, Q1 to the following Q1, with levels of 200k, 300k and 600k; regions labelled "Wasted Capacity" and "Lost Customers, Order Hardware"]

[Chart: number of cores by day of the week; 3000 cores for risk management processes, 300 cores on weekends]

Different workloads have different usage patterns
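The gap between the two figures on the chart can be turned into a rough utilization number. The sketch below assumes the simplest reading of the slide, which the deck does not state explicitly: 3000 cores are needed on each of five working days and 300 cores on each of two weekend days. The function name and constants are illustrative only.

```python
# Assumed weekly profile built from the two figures on the slide:
# 3000 cores on working days, 300 cores on weekend days.
WEEKDAY_CORES = 3000
WEEKEND_CORES = 300


def weekly_core_days(weekday_cores, weekend_cores, weekdays=5, weekend_days=2):
    """Core-days consumed in one week when capacity follows demand."""
    return weekday_cores * weekdays + weekend_cores * weekend_days


elastic = weekly_core_days(WEEKDAY_CORES, WEEKEND_CORES)
fixed = WEEKDAY_CORES * 7  # provisioned for peak on all seven days
saving = 1 - elastic / fixed

print(f"fixed={fixed} elastic={elastic} saving={saving:.1%}")
# prints: fixed=21000 elastic=15600 saving=25.7%
```

Under these assumptions, roughly a quarter of the core-days bought for a fixed peak-sized fleet would sit idle over the weekend.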

[Chart: typical weekly traffic to Amazon.com, Sunday to Saturday, against provisioned capacity]

[Chart: November traffic to Amazon.com against provisioned capacity, annotated 76% and 24%]

[Chart: actual demand versus predicted demand over time; gaps labelled "Customer dissatisfaction" and "Waste"]

Elastic capacity: No need to guess capacity requirements and over-provision

AWS enables companies to match resources to demand

[Chart: elastic capacity tracking demand over time]

AWS enables companies to match costs to demand

November 10th, 2010: Turned off last physical web server of Amazon.com

October 31st, 2011: Turned off last web servers supporting European business


[Chart: number of EC2 instances, 4/12/2008 to 4/20/2008]

40 servers to 5000 in 3 days

Steady state of ~40 instances

“Techcrunched”

Launch of Facebook modification

EC2 scaled to peak of 5000 instances

Automation is a key enabler to elastic usage

Bootstrapping or DevOps: The process of automatically configuring the software and settings on your machines as they boot, each time they boot. Your infrastructure as code.

[Diagram: Amazon Route 53, Elastic Load Balancer, S3 bucket with CloudFront distribution, web servers in a Web Auto Scaling group (Elastic Beanstalk), app servers, master and standby databases with read replicas RR 1 to RR 4, and an ElastiCache cluster]

This is a stack

In AWS everything can be automated; everything is an API

Resources are no longer finite; they are elastic in AWS

CloudFormation is a great cookie cutter

Your infrastructure as code.

This is a STACK. JavaScript Object Notation (JSON). A template of your datacenter / workload. Your infrastructure as code.

Headers, Parameters, Mappings, Resources, Outputs

Git, Subversion, Mercurial
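The template sections listed above can be sketched as a Python dictionary serialized to JSON. This is a minimal illustration, not the template used in the talk: the logical names `EnvType`, `EnvToSize`, `WebLaunchConfig` and `WebGroup`, and the instance sizes in the mapping, are assumptions made for the example. The resource types and intrinsic functions (`Ref`, `Fn::FindInMap`, `Fn::GetAZs`) are standard CloudFormation.

```python
import json

# Hypothetical logical names: EnvType, EnvToSize, WebLaunchConfig, WebGroup.
template = {
    # "Headers"
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch of a web tier stack",
    # Inputs supplied when the stack is created
    "Parameters": {
        "EnvType": {
            "Type": "String",
            "AllowedValues": ["dev", "test", "prod"],
            "Default": "dev",
        }
    },
    # Lookup table keyed by environment (values are illustrative)
    "Mappings": {
        "EnvToSize": {
            "dev": {"InstanceType": "m3.medium", "MaxSize": "2"},
            "test": {"InstanceType": "m3.medium", "MaxSize": "4"},
            "prod": {"InstanceType": "m3.large", "MaxSize": "100"},
        }
    },
    "Resources": {
        "WebLaunchConfig": {
            "Type": "AWS::AutoScaling::LaunchConfiguration",
            "Properties": {
                "ImageId": "ami-0535d66c",
                "InstanceType": {
                    "Fn::FindInMap": ["EnvToSize", {"Ref": "EnvType"}, "InstanceType"]
                },
            },
        },
        "WebGroup": {
            "Type": "AWS::AutoScaling::AutoScalingGroup",
            "Properties": {
                "AvailabilityZones": {"Fn::GetAZs": ""},
                "LaunchConfigurationName": {"Ref": "WebLaunchConfig"},
                "MinSize": "1",
                "MaxSize": {
                    "Fn::FindInMap": ["EnvToSize", {"Ref": "EnvType"}, "MaxSize"]
                },
            },
        },
    },
    "Outputs": {"GroupName": {"Value": {"Ref": "WebGroup"}}},
}

print(json.dumps(template, indent=2))
```

Because the result is plain JSON text, it can be kept under version control like any other source file.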

Dev

Test

Prod

CloudFormation is context aware

Your infrastructure as code.

Create: DEV (dev.mysite.com)

Create: TEST (test.mysite.com)

Create: PROD (prod.mysite.com)

Elastic resources require utility pricing

Enabling customers to Optimize Costs based on Utilization

Meeting base workload, variable and peak with different pricing models

Architecting Tips for scaling to meet Seasonal Patterns

Auto Scaling groups are useful for more than just fault tolerance

•  Vertical Scaling

•  Horizontal Scaling

•  Auto Scaling

•  Scheduled Scaling

•  Programmatic Scaling

•  Database Tier Scaling

•  Asynchronous Process Scaling

•  Event Scaling

ASG == Minimum unit of deployment

myAutoScalingGroup -  myLaunchConfig -  Min 1 -  max 1 -  desired 1

Launch Configuration

ami-0535d66c

ap-southeast-2a ap-southeast-2b

myElasticLoadBalancer

myLaunchConfig - ami-0535d66c - m3.large

Minimum instance of 1 creates Auto Healing Groups

Vertical Scaling (Scale UP)

Vertical Scaling using different instance types

[Chart: DB instance type by day of the month]

End of the Month Scaling

75% Savings

Small: 1.7 GB, 1 ECU, 1 virtual core

Large: 7.5 GB, 4 ECUs, 2 virtual cores

Extra Large: 15 GB, 8 ECUs, 4 virtual cores

Hi-Mem XL: 17.1 GB, 6.5 ECUs, 2 virtual cores

Hi-Mem 2XL: 34.2 GB, 13 ECUs, 4 virtual cores

Hi-Mem 4XL: 68.4 GB, 26 ECUs, 8 virtual cores

High-CPU Med: 1.7 GB, 5 ECUs, 2 virtual cores

High-CPU XL: 7 GB, 20 ECUs, 8 virtual cores

Micro: 613 MB, up to 2 ECUs (for short bursts)

Cluster GPU 4XL: 22 GB, 33.5 ECUs, 8 Nehalem virtual cores, 2 x NVIDIA Tesla “Fermi” M2050 GPUs

Cluster Compute 4XL: 23 GB, 33.5 ECUs, 8 Nehalem virtual cores

Cluster Compute 8XL: 60.5 GB, 88 ECUs, 8-core, 2 x Intel Xeon

Medium: 3.75 GB, 2 ECUs, 1 virtual core

Memory intensive Cluster Compute

Processor Intensive

Average Applications

Minimal resources

Multiple Family Types, optimized for different uses

Multiple sizes of instance within a family type

Vertical Scaling using Launch Configurations

myAutoScalingGroup -  smallConfig -  Min 1 -  Max 2 -  desired 1 -  TP: Oldest Instance

ami-0535d66c

ElasticIP (EIP) / Elastic NIC (ENI)

Launch Config A

smallConfig - ami-0535d66c - small

ap-southeast-2a

Launch Config B

bigConfig - ami-0535d66c - large

UPDATE myAutoScalingGroup -  bigConfig -  Min 1 -  Max 2 -  Desired 2 -  TP: Oldest Instance

Vertical Scaling

UPDATE Desired = 1
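The sequence on these slides (switch the launch configuration, raise desired capacity to 2, then drop it back to 1 with an oldest-instance termination policy) can be traced with a toy model. This is a simulation written for illustration, not the AWS API: the `AutoScalingGroup` class and its `update` and `reconcile` methods are hypothetical, and real Auto Scaling applies further rules, such as Availability Zone balancing, before the termination policy.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Instance:
    launch_order: int  # lower number means launched earlier
    config: str        # launch configuration the instance was started from


@dataclass
class AutoScalingGroup:
    """Toy model of an Auto Scaling group; not the AWS API."""
    launch_config: str
    min_size: int
    max_size: int
    desired: int
    instances: list = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def reconcile(self):
        """Launch or terminate instances until the fleet matches desired."""
        target = max(self.min_size, min(self.max_size, self.desired))
        while len(self.instances) < target:
            # New instances always use the current launch configuration.
            self.instances.append(Instance(next(self._ids), self.launch_config))
        while len(self.instances) > target:
            # Termination policy: oldest instance first.
            oldest = min(self.instances, key=lambda i: i.launch_order)
            self.instances.remove(oldest)

    def update(self, launch_config=None, desired=None):
        if launch_config is not None:
            self.launch_config = launch_config
        if desired is not None:
            self.desired = desired
        self.reconcile()


asg = AutoScalingGroup("smallConfig", min_size=1, max_size=2, desired=1)
asg.reconcile()                                    # one small instance running
asg.update(launch_config="bigConfig", desired=2)   # a large instance joins
asg.update(desired=1)                              # the oldest (small) one is terminated
print([i.config for i in asg.instances])           # prints ['bigConfig']
```

The group never drops below one running instance during the swap, which is the point of raising desired capacity before lowering it.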

Database Tier scaling is automated when using RDS

Push Button Scaling: UP / DOWN

Read Only Replica: IN / OUT

Snapshot & Restore: ON / OFF

Database Tier management is heavily automated using RDS

High Availability

Host Replacement

High Scalability

Asynchronous Replication

Horizontal Scaling (Scale OUT)

ap-southeast-2a ap-southeast-2b

Launch Configuration

ami-0535d66c

myLaunchConfig - ami-0535d66c - m3.large

myAutoScalingGroup -  myLaunchConfig -  Min 2 -  max 100 -  Desired 2

elb-cname.amazonaws.com

ASG UPDATE Desired = 4

Elastic Load Balancing (ELB) over multiple Availability Zones (AZs)

ASG UPDATE Desired = 2

HOST LEVEL METRICS

AGGREGATE LEVEL METRICS

LOG ANALYSIS

EXTERNAL SITE PERFORMANCE

Auto Scaling (Elastic Usage)

ap-southeast-2a ap-southeast-2b

Launch Configuration

ami-0535d66c

myLaunchConfig - ami-0535d66c - m3.large

myAutoScalingGroup -  myLaunchConfig -  Min 2 -  max 100 -  Desired 2

Desired = 4

Auto Scaling using Policies to Scale Out

Scale UP +1

Scale DOWN -1
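A policy of "+1" or "-1" only changes the desired count within the group's minimum and maximum. The sketch below shows that clamping with a simple CPU trigger. The thresholds of 70% and 30% are assumptions for the example and do not come from the deck; the minimum of 2 and maximum of 100 are taken from the group shown on the slide.

```python
def apply_adjustment(desired, adjustment, min_size, max_size):
    """Apply a +N / -N scaling adjustment, clamped to the group's bounds."""
    return max(min_size, min(max_size, desired + adjustment))


def evaluate(cpu_percent, desired, *, high=70.0, low=30.0, min_size=2, max_size=100):
    """Return the new desired capacity for one evaluation period.

    high and low are hypothetical alarm thresholds chosen for illustration.
    """
    if cpu_percent > high:
        return apply_adjustment(desired, +1, min_size, max_size)
    if cpu_percent < low:
        return apply_adjustment(desired, -1, min_size, max_size)
    return desired


print(evaluate(85.0, 2))    # busy fleet: 2 -> 3
print(evaluate(10.0, 2))    # idle fleet already at the minimum: stays at 2
print(evaluate(50.0, 4))    # inside the band: unchanged
```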

ap-southeast-2a ap-southeast-2b

Launch Configuration

ami-0535d66c

myLaunchConfig - ami-0535d66c - m3.large

myAutoScalingGroup -  myLaunchConfig -  Min 2 -  max 100 -  Desired 2

API Update Desired = 4

Auto Scaling using API to Scale In / Out

Scale UP +1

Scale DOWN -1

AutoScalingGroups* -  myLaunchConfig -  Min 0 -  max 100 -  Desired 0

Launch Configuration

ami-0535d66c

ap-southeast-2a ap-southeast-2b

launchWhenCheap - ami-0535d66c -  m3.large -  Spot-price : 0.05

Automate Workload Patterns using Scheduled Scaling

as-put-scheduled-update-group-action ScaleUp --auto-scaling-group my-test-asg --recurrence "30 0 1 1,6,12 0" --desired-capacity 20

as-put-scheduled-update-group-action ScaleOff --auto-scaling-group my-test-asg --start-time "2013-05-13T08:00:00Z" --desired-capacity 0
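The `--recurrence` value in the first command is a five-field cron expression. A small parser makes the fields explicit. This is a sketch that handles only plain numbers, comma-separated lists and `*`, which is enough for the expression on the slide; ranges and step values are not supported, and the function name is made up for the example.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_recurrence(expr):
    """Split a five-field cron expression into named fields.

    Each field becomes a sorted list of integers, or None for '*'.
    Only numbers, comma lists and '*' are handled in this sketch.
    """
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    parsed = {}
    for name, part in zip(CRON_FIELDS, parts):
        parsed[name] = None if part == "*" else sorted(int(v) for v in part.split(","))
    return parsed


print(parse_recurrence("30 0 1 1,6,12 0"))
```

For the slide's expression this yields minute 30, hour 0, day of month 1, months 1, 6 and 12, and day of week 0. How the day-of-month and day-of-week fields combine depends on the cron implementation, so the sketch only separates the fields and does not compute the next run time.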

Auto Scaling with Alarms & Policies

Achieve High Utilization with this style of architecture, eliminating waste

Trigger auto-scaling policy

Reserved Instances On Demand Spot Pricing

Scheduled Adaptive Predictive

Optimize delivery using S3 static hosting and CloudFront

[Diagram: CloudFront edge locations in London, Paris and NY]

1. Single CNAME www.mysite.com

2. Served from EC2: *.php

3. Served from S3: /images/*
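The split shown in the diagram, one hostname with `/images/*` served from S3 and `*.php` served from EC2, amounts to matching the request path against an ordered list of patterns. The sketch below imitates that with Python's `fnmatch` module. It is an illustration of the idea, not CloudFront's implementation; the origin names and the choice of default origin are assumptions.

```python
from fnmatch import fnmatchcase

# Ordered (path pattern, origin) pairs; the first match wins.
# Origin names are hypothetical labels for the S3 bucket and the EC2 web tier.
BEHAVIORS = [
    ("/images/*", "s3-static-bucket"),
    ("*.php", "ec2-web-tier"),
]
DEFAULT_ORIGIN = "ec2-web-tier"  # assumed fallback for everything else


def pick_origin(path):
    """Return the origin that should serve the given request path."""
    for pattern, origin in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return origin
    return DEFAULT_ORIGIN


for path in ("/images/logo.png", "/checkout.php", "/"):
    print(path, "->", pick_origin(path))
```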

Lower Cost Lower Latency Higher Scale

Fault Tolerance High Availability High Utilization

Scaling Asynchronous Processing

Asynchronous Process Scaling with SQS Messaging

•  Amazon managed queue service

•  Decouple your components

•  Think parallel

•  Implement elasticity

•  Drive Auto Scaling fleets using Queue Depth
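Driving a fleet from queue depth can be reduced to one calculation: how many workers are needed to clear the current backlog, bounded by the group's minimum and maximum. The sketch below assumes a target of a fixed number of messages per instance, a figure that would have to be measured for a real workload; the function name and the numbers used are illustrative.

```python
import math


def desired_from_queue(queue_depth, msgs_per_instance, min_size, max_size):
    """Desired worker count for a given backlog, clamped to the group bounds.

    msgs_per_instance is an assumed backlog each worker can absorb.
    """
    if msgs_per_instance <= 0:
        raise ValueError("msgs_per_instance must be positive")
    needed = math.ceil(queue_depth / msgs_per_instance)
    return max(min_size, min(max_size, needed))


print(desired_from_queue(0, 100, min_size=2, max_size=50))       # empty queue: minimum fleet
print(desired_from_queue(1250, 100, min_size=2, max_size=50))    # 13 workers for 1250 messages
print(desired_from_queue(10_000, 100, min_size=2, max_size=50))  # capped at the maximum
```

In practice the queue depth would come from a monitoring metric, and the result would be applied through a scaling policy or an API call that sets desired capacity.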

[Diagram: Controllers A, B and C tightly coupled, compared with the same controllers loosely coupled through queues (Q)]

Tight Coupling

Loose Coupling using Queues

Amazon  SQS  

Processing  task/processing  trigger  

Processing  results  

Min 5 Min 10 Min 2

[Diagram: User; S3 bucket for ingest; SNS topic; SQS queues for thumbnail, mobile and web image sizes, each with an Auto Scaling group of instances; S3 bucket for originals; RRS S3 bucket serving content to a CloudFront download distribution]

Asynchronous Process Scaling with SQS Messaging (SQS)

[Diagram: User; S3 bucket for ingest; SWF with an instance running the decider; three Auto Scaling groups of instances; S3 bucket for originals; RRS S3 bucket serving content to a CloudFront download distribution]

Asynchronous Process Scaling with Simple Workflow (SWF)

AutoScalingGroups* -  myLaunchConfig -  Min 0 -  max 100 -  Desired 0

Launch Configuration

ami-0535d66c

ap-southeast-2a ap-southeast-2b

launchWhenCheap - ami-0535d66c -  m3.large -  Spot-price : 0.05

Optimize costs using Auto Bidding groups and spot pricing

aws autoscaling create-launch-configuration --launch-configuration-name launchWhenCheap --image-id ami-0535d66c --instance-type m3.large --spot-price 0.05
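The effect of a fixed bid can be seen with a toy market. The sketch below assumes the bid-based Spot model in use at the time of the talk: capacity runs only while the market price is at or below the bid, and each running hour is charged at the market price rather than the bid. The hourly prices are invented for illustration; only the bid of 0.05 comes from the slide.

```python
from decimal import Decimal

BID = Decimal("0.05")  # from --spot-price 0.05 on the slide

# Hypothetical hourly market prices, invented for this example.
HOURLY_PRICES = [Decimal(p) for p in ("0.03", "0.04", "0.06", "0.05", "0.10")]


def simulate(prices, bid):
    """Return (hours_running, cost_per_instance) for a toy spot market."""
    hours = 0
    cost = Decimal("0")
    for price in prices:
        if price <= bid:   # capacity runs only while the market is at or below the bid
            hours += 1
            cost += price  # charged the market price for that hour, not the bid
    return hours, cost


hours, cost = simulate(HOURLY_PRICES, BID)
print(hours, cost)  # prints: 3 0.12
```

Work placed on such a group needs to tolerate interruption, which is why the deck pairs spot capacity with queue-driven processing.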

[Diagram: a producer writes to an SQS queue that is read by groups of consumers]

[Diagram: Amazon Elastic MapReduce Hadoop cluster. Inputs: logs from EC2 instances via Flume/Fluentd (log aggregator) and code/scripts from Amazon S3. Cluster: name node, core nodes (HDFS) and task nodes, running multiple JobFlow steps (mapper, reducer) written in HiveQL, Pig Latin or Cascading. Outputs: Amazon S3 and Amazon DynamoDB/RDS; BI apps query over JDBC/ODBC using HiveQL or Pig Latin]

Scale to 1000s of nodes when needed and back to zero using EMR

Optionally using a Spot Pricing strategy on task nodes

Event Based Scaling

Parameterized Scaling via CloudFormation

myAutoScalingGroup -  myLaunchConfig -  Min 2 -  max 100 -  Desired inputParameter
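The "Desired inputParameter" on this slide corresponds to a CloudFormation parameter that the Auto Scaling group references. The sketch below builds such a template fragment as a Python dictionary and shows the key/value form in which a parameter is passed when a stack is created or updated. The parameter name `DesiredCapacity` and the helper `stack_parameters` are assumptions for the example; the group and launch configuration names and the bounds of 2 and 100 come from the slide.

```python
import json

# Hypothetical parameter name: DesiredCapacity.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "DesiredCapacity": {
            "Type": "Number",
            "MinValue": 2,
            "MaxValue": 100,
            "Default": 2,
            "Description": "Fleet size to run for the coming event",
        }
    },
    "Resources": {
        "myAutoScalingGroup": {
            "Type": "AWS::AutoScaling::AutoScalingGroup",
            "Properties": {
                "AvailabilityZones": {"Fn::GetAZs": ""},
                # Assumes a launch configuration with this name already exists.
                "LaunchConfigurationName": "myLaunchConfig",
                "MinSize": "2",
                "MaxSize": "100",
                "DesiredCapacity": {"Ref": "DesiredCapacity"},
            },
        }
    },
}


def stack_parameters(desired):
    """Parameter list in the key/value form used when creating or updating a stack."""
    return [{"ParameterKey": "DesiredCapacity", "ParameterValue": str(desired)}]


print(json.dumps(stack_parameters(20)))
```

Re-running the same template with a different parameter value before and after an event changes the fleet size without editing the template itself.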

Are you confident you're N+1?

February, 2012

Automated failover using pilot light configurations

[Diagram: pilot light. A primary environment with web server, application server, database server and data volume; data mirroring/replication to a second environment where the web and application servers are labelled "Not Running" and the database server is labelled "Smaller Instance"; Amazon Route 53 in front of the user or system]

UPDATE: Desired = 0 → 1, Desired = 0 → 1, Desired = 1 → 1

Just in Time systems which can be during an event

•  ~30th biggest E-commerce operation, globally

•  ~200 distinct applications, many mobile

•  Hundreds of new, untested analytical approaches

•  Processing hundreds of TB of data on thousands of servers

•  Spikes of hundreds of thousands of concurrent users

•  Critically compressed budget

•  Less than a year to execute

•  Core systems will be used for a single critical day

•  Constitutionally-mandated completion date

Support Systems which can be retired immediately after an event

THANK YOU Please give us your feedback by filling out the Feedback Forms
