AWS fault tolerant architecture

+ Dynamic Fault Tolerant Applications using AWS, Sumit Kadyan, University of Victoria

description

Describes how to achieve fault tolerance for web applications by making the website architecture itself fault tolerant. Several architectures are explained, and one of the best approaches is picked at the end, with reasons why it is a good one.

Transcript of AWS fault tolerant architecture

Page 1: AWS fault tolerant architecture

Dynamic Fault Tolerant Applications using AWS

Sumit Kadyan, University of Victoria

Page 2: AWS fault tolerant architecture

+Agenda

Motivation

How do we design fault-tolerant (FT) web services on AWS?

Research in Load Balancing Algorithms

Future Study

Questions!!

Page 3: AWS fault tolerant architecture

+Motivation

Not everything on the cloud is fault tolerant!!

You have to design it to be Fault Tolerant

AWS offers Dynamic Fault tolerance

Around 40% of the users using AWS do not deploy any redundancy in their setup.

The price of using resources on the cloud has fallen by roughly 2500% in 7 years.

The AWS service level agreement claims 99.95% availability. That's around 4.4 hours of downtime in a year.
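The downtime figure follows directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative, not from the slides):

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def yearly_downtime_hours(availability_pct):
    """Hours per year a service may be down at the given availability."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


downtime = yearly_downtime_hours(99.95)
print(f"{downtime:.2f} hours/year")  # 4.38 hours/year
```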

Page 4: AWS fault tolerant architecture

+Inherent Fault tolerant components

Amazon Simple Storage Service (S3)

Amazon Elastic Load Balancing (ELB)

Amazon Elastic Compute Cloud (EC2)

Amazon Elastic Block Store (EBS)

“The above inherently fault-tolerant components provide features such as Availability Zones (AZs), Elastic IPs and snapshots that a fault-tolerant, highly available (HA) system must take advantage of and use correctly.”

Simply put, AWS gives you the resources to build HA / FT applications.

Page 5: AWS fault tolerant architecture

+AWS Components

Amazon EC2 (Amazon Elastic Compute Cloud) :- Web service that provides computing resources, i.e. server instances, to host your software.

AMI (Amazon Machine Image) :- Template containing the s/w & h/w configuration applied to an instance type.

EBS (Elastic Block Store) :- Block-level storage volumes for EC2 instances. Volumes are not tied to the lifetime of an instance. Annual failure rate (AFR) is around 0.1% to 0.5%.

Page 6: AWS fault tolerant architecture

+Availability Zones

Amazon Availability Zones (AZs) are distinct zones within the same region.

Engineered to be insulated from failures in other AZs.

Independent power, cooling, network & security.

Page 7: AWS fault tolerant architecture

+Elastic IP Addresses

Public IP addresses that can be mapped to any EC2 Instance within a particular EC2 region.

Addresses are associated with the AWS account, not with the instance.

If an EC2 instance fails, detach the Elastic IP from the failed instance and map it to a standby EC2 instance.

Remapping causes around 1-2 minutes of downtime.
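The remapping step can be illustrated with a small in-memory model. This is only a sketch: `ElasticIpTable`, `fail_over` and the instance names are hypothetical stand-ins, not the AWS API, and the address is taken from a documentation-only IP range.

```python
class ElasticIpTable:
    """Hypothetical stand-in for the account-level Elastic IP mapping."""

    def __init__(self):
        self._owner = {}  # elastic IP -> instance id

    def associate(self, ip, instance_id):
        self._owner[ip] = instance_id

    def disassociate(self, ip):
        self._owner.pop(ip, None)

    def lookup(self, ip):
        return self._owner.get(ip)


def fail_over(table, ip, standby_id):
    """Detach the address from the failed instance and map it to the standby."""
    table.disassociate(ip)
    table.associate(ip, standby_id)


table = ElasticIpTable()
table.associate("203.0.113.10", "i-primary")
fail_over(table, "203.0.113.10", "i-standby")
print(table.lookup("203.0.113.10"))  # i-standby
```

Because the address belongs to the account rather than the instance, clients keep using the same public IP after the failover.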

Page 8: AWS fault tolerant architecture

+Auto Scaling

Auto Scaling enables you to automatically scale your EC2 capacity up or down.

You define your own rules to achieve this, e.g. when the no. of running EC2 instances < X, launch Y instances.

Use metrics from Amazon CloudWatch to launch/terminate EC2 instances, e.g. resource utilization above a certain threshold.
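The two kinds of rules above can be sketched as one decision function. This is a simplified illustration, not how Auto Scaling policies are actually expressed; the parameter names and the 70% / 20% CPU thresholds are assumptions.

```python
def scaling_decision(running, min_instances, launch_step, avg_cpu_pct,
                     cpu_high=70.0, cpu_low=20.0):
    """Return how many instances to add (positive) or remove (negative)."""
    # Rule from the slide: when running < X, launch Y instances.
    if running < min_instances:
        return launch_step
    # Metric-based rule: utilization above a threshold adds capacity.
    if avg_cpu_pct > cpu_high:
        return 1
    # Scale back in when idle, but never below the minimum.
    if avg_cpu_pct < cpu_low and running > min_instances:
        return -1
    return 0


print(scaling_decision(running=1, min_instances=2, launch_step=2, avg_cpu_pct=50))  # 2
print(scaling_decision(running=2, min_instances=2, launch_step=2, avg_cpu_pct=85))  # 1
```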

Example of Auto Scaling (AS) & ELB on the next slides.

Page 9: AWS fault tolerant architecture

+Elastic Load Balancing

Elastic Load Balancer distributes incoming traffic across available EC2 instances.

Monitors EC2 instances and removes failed ones from rotation.

Works in parallel with Auto Scaling to provide FT.

Page 10: AWS fault tolerant architecture

+Implement N+1 Redundancy

Auto Scaling & ELB. Let's say N = 1.

Define rule X :- 2 instances of the defined AMI are always available.

ELB distributes load between the 2 servers. Each server has enough capacity to handle the entire load, i.e. N = 1.

Server 1 goes down.

Server 2 can process the entire traffic.

Auto Scaling identifies the failure and launches a healthy EC2 instance from the AMI to fulfill rule X.
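The sequence above can be walked through with a toy model. `Pool` and its methods are hypothetical names for illustration; real Auto Scaling groups are configured through AWS, not through code like this.

```python
import itertools


class Pool:
    """Toy model of a group that keeps `desired` instances alive (rule X)."""

    def __init__(self, desired):
        self.desired = desired
        self._ids = itertools.count(1)
        self.healthy = set()
        self.reconcile()

    def reconcile(self):
        """Launch replacements until the desired count is met again."""
        launched = []
        while len(self.healthy) < self.desired:
            name = f"server-{next(self._ids)}"
            self.healthy.add(name)
            launched.append(name)
        return launched

    def fail(self, name):
        self.healthy.discard(name)


pool = Pool(desired=2)            # N + 1 with N = 1
pool.fail("server-1")             # Server 1 goes down
survivors = sorted(pool.healthy)  # server-2 carries all traffic
launched = pool.reconcile()       # a replacement restores rule X
print(survivors, launched)        # ['server-2'] ['server-3']
```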

Page 11: AWS fault tolerant architecture

+Fault Tolerance Web Design

Architecting High Availability in AWS

High Availability in the Web/App Layer

High Availability in the Load Balancing Layer

High Availability in the Database Layer

Page 12: AWS fault tolerant architecture

+Web/App Layer

It is common practice to launch the Web/App layer on more than one EC2 instance to avoid a single point of failure (SPOF).

How would user session information be shared between the EC2 servers?

It is hence necessary to synchronize session data among EC2 servers.

Not every user can work with stateless server configurations.

Page 13: AWS fault tolerant architecture

+Web/App Layer

Page 14: AWS fault tolerant architecture

+Web/App Layer

Option 1 : JGroups

Toolkit for reliable messaging

Can be used by Java based servers.

Suited to a maximum of around 5-10 EC2 instances.

Not suited for larger architectures.

Page 15: AWS fault tolerant architecture

+Web/App Layer

Option 3 : RDBMS

Many use it, but it is considered poor design.

The master will be overwhelmed by session requests.

An m1.RDS MySQL master has a maximum of 600 connections. If 400 online users generate session requests, only 200 connections are left to serve transaction/user authentication requests.

This can cause intermittent web service downtime.
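The connection budget above is simple subtraction. A minimal sketch, assuming each online user's session traffic holds one database connection (an assumption the slide implies but does not state):

```python
def connections_left(max_connections, session_connections):
    """Connections remaining for non-session work; never negative."""
    return max(max_connections - session_connections, 0)


remaining = connections_left(max_connections=600, session_connections=400)
print(remaining)  # 200
```

Once session traffic approaches the connection limit, the remaining budget reaches zero and other requests start to fail.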

Page 16: AWS fault tolerant architecture

+Web/App Layer

Option 2 :- Memcached

Widely used; supports multiple platforms.

Save user session data on multiple nodes to avoid a SPOF (the trade-off is the latency of writing to multiple nodes).

Depending on requirements, create high-memory EC2 instances for Memcached/ElastiCache.

Can scale up to tens of thousands of requests.
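Writing each session to several nodes can be sketched as follows. `Node` and `ReplicatedSessionStore` are hypothetical in-memory stand-ins (plain dicts), not a Memcached client; a real deployment would use a client library and handle timeouts.

```python
class Node:
    """Stand-in for one cache node: a dict plus an up/down flag."""

    def __init__(self):
        self.data = {}
        self.up = True


class ReplicatedSessionStore:
    def __init__(self, nodes):
        self.nodes = nodes

    def put(self, session_id, value):
        """Write to every live node; more writes cost latency but remove the SPOF."""
        written = 0
        for node in self.nodes:
            if node.up:
                node.data[session_id] = value
                written += 1
        return written

    def get(self, session_id):
        """Read from the first live node that holds the session."""
        for node in self.nodes:
            if node.up and session_id in node.data:
                return node.data[session_id]
        return None


nodes = [Node(), Node()]
store = ReplicatedSessionStore(nodes)
store.put("sess-42", {"user": "alice"})
nodes[0].up = False              # one cache node fails
session = store.get("sess-42")   # still served by the second node
print(session)                   # {'user': 'alice'}
```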

Page 17: AWS fault tolerant architecture

+Load Balancing Layer

It balances the load among the available EC2 instances.

A SPOF in the LB can bring down the entire site during an outage.

Making the LB redundant is as important as replicating servers, databases, etc.

There are many ways to build a highly available load balancing tier.

Page 18: AWS fault tolerant architecture

+Load Balancing Tier

Option 1: Elastic Load Balancer

Inherently Fault Tolerant.

Automatically distributes incoming traffic among EC2 Instances.

Automatically adds more ELB EC2 instances when load increases, to avoid a SPOF.

Detects health of EC2 Instances and routes to only healthy instances.
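The health-check behaviour can be sketched as a balancer that probes its backends before routing. This is an illustration only: `HealthCheckedBalancer` and the `probe` callback are hypothetical, and ELB's internal logic is not described in the slides.

```python
class HealthCheckedBalancer:
    """Round-robins over the backends that currently pass the health probe."""

    def __init__(self, backends, probe):
        self.backends = list(backends)
        self.probe = probe  # callable: backend -> True if healthy
        self._next = 0

    def route(self):
        healthy = [b for b in self.backends if self.probe(b)]
        if not healthy:
            raise RuntimeError("no healthy backends")
        backend = healthy[self._next % len(healthy)]
        self._next += 1
        return backend


status = {"web-1": True, "web-2": False, "web-3": True}
lb = HealthCheckedBalancer(status, probe=status.get)
routed = [lb.route() for _ in range(4)]
print(routed)  # ['web-1', 'web-3', 'web-1', 'web-3']
```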

Page 19: AWS fault tolerant architecture

+ELB Implementation Architecture

Single Server Setup

Not recommended, yet the most commonly followed!!

What is there to balance?!

No fault tolerance benefit.

SPOF in terms of both the LB & the EC2 instance.

Page 20: AWS fault tolerant architecture

+ELB Implementation Architecture

Multi-Server Setup (in AZ)

HTTP/S requests are directed to EC2 by the ELB.

Multiple EC2 instances sit in the same AZ under the ELB tier.

ELB load balances the requests between the Web/App EC2 instances.

Page 21: AWS fault tolerant architecture

+ELB Implementation Architecture

ELB with Auto Scaling(inside AZ)

Web/App EC2 instances are configured with Auto Scaling to scale out/in.

Amazon ELB can direct the load seamlessly to the EC2 instances configured with AutoScaling.

Page 22: AWS fault tolerant architecture

+ELB Implementation Architecture

Multiple AZ’s inside a Region

Multiple Web/App EC2 instances can reside across multiple AZs inside an AWS region.

ELB performs multi-AZ load balancing.

Page 23: AWS fault tolerant architecture

+ELB Implementation Architecture

ELB with Amazon AutoScaling across AZ’s

EC2 instances can be configured with Amazon Auto Scaling to scale out/in across AZs.

Highly recommended. Offers the highest availability of all the ELB implementations.

Page 24: AWS fault tolerant architecture

+ Issues with ELB

Supports only round-robin & sticky session algorithms. Weighted as of 2013.

Designed to handle incremental traffic. Sudden flash traffic can lead to unavailability until scaling up occurs.

The ELB needs to be “pre-warmed” to handle sudden traffic. Currently not configurable from the AWS console.

Known to be “non-round-robin” when requests are generated from a single IP or a specific range of IPs, e.g. multiple requests from within a company operating on a specific IP range.

Page 25: AWS fault tolerant architecture

+3rd party Load Balancer

3rd Party Load Balancers

Nginx & HAProxy can work as load balancers.

Use your own scripts to scale up EC2 instances & LBs.

Auto Scaling works best with ELB.

Page 26: AWS fault tolerant architecture

+Load Balancing Algorithms

Random :- Sends each connection request to a server chosen at random (simple but inefficient).

Round Robin :- Passes each new connection request to the next server in line, eventually distributing connections evenly.

Weighted Round Robin :- Assigns weights to machines based on their capacity; the no. of connections each machine receives depends on its weight.

More algorithms exist, such as Least Connections, Fastest, etc.
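The three algorithms named above can be sketched in a few lines. The weighted variant here is the naive form (each server is repeated `weight` times per cycle); production balancers typically interleave picks more smoothly. Server names and weights are made up for illustration.

```python
import itertools
import random
from collections import Counter


def random_choice(servers, rng=random):
    """Random: pick any server, ignoring load and history."""
    return rng.choice(servers)


def round_robin(servers):
    """Round Robin: endless iterator over the servers in order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Naive weighted round robin: each server appears `weight` times per cycle."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


servers = ["a", "b", "c"]
rr = round_robin(servers)
rr_picks = [next(rr) for _ in range(6)]
print(rr_picks)  # ['a', 'b', 'c', 'a', 'b', 'c']

wrr = weighted_round_robin({"big": 3, "small": 1})
wrr_counts = Counter(next(wrr) for _ in range(8))
print(wrr_counts["big"], wrr_counts["small"])  # 6 2
```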

Page 27: AWS fault tolerant architecture

+Proposed Research

A load balancing algorithm that dynamically adapts its strategy for allocating web requests.

Prober :- Gathers status info from the web servers every 50 ms:

CPU load on the server

Server's response rate

No. of requests served

Allocator :- Based on the prober's updates, the allocator updates the weights allocated to the servers.

The proposed algorithm differs by considering both local & global information at each web server to choose the best server for each request.

Page 28: AWS fault tolerant architecture

+Real Time Server Stats Load Balancing (RTSLB)

Deciding Factors used in algorithm

Weighted metric of cache hits on different servers.

CPU Load of Web Server

Server Response Rate

No. of client requests being handled
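How an allocator might turn these four factors into weights can be sketched as below. The slides do not give RTSLB's actual formula, so the equal 0.25 coefficients, the normalization and all names (`ServerStats`, `score`, `allocate_weights`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    cache_hit_ratio: float  # 0..1, higher is better
    cpu_load: float         # 0..1, lower is better
    response_rate: float    # responses per second, higher is better
    active_requests: int    # client requests in flight, lower is better


def score(stats, max_response_rate, max_requests):
    # Assumed equal-weight blend; not the formula from the RTSLB work.
    return (
        0.25 * stats.cache_hit_ratio
        + 0.25 * (1.0 - stats.cpu_load)
        + 0.25 * (stats.response_rate / max_response_rate)
        + 0.25 * (1.0 - stats.active_requests / max_requests)
    )


def allocate_weights(snapshot):
    """Turn one prober snapshot into normalized per-server weights."""
    max_rate = max(s.response_rate for s in snapshot.values()) or 1.0
    max_reqs = max(s.active_requests for s in snapshot.values()) or 1
    scores = {name: score(s, max_rate, max_reqs) for name, s in snapshot.items()}
    total = sum(scores.values()) or 1.0
    return {name: sc / total for name, sc in scores.items()}


snapshot = {
    "web-1": ServerStats(0.9, 0.2, 120.0, 10),
    "web-2": ServerStats(0.5, 0.8, 60.0, 40),
}
weights = allocate_weights(snapshot)
best = max(weights, key=weights.get)
print(best)  # web-1
```

In a running system the prober would refresh `snapshot` on its polling interval and the allocator would recompute the weights each time.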

Page 29: AWS fault tolerant architecture

+Architecture

Page 30: AWS fault tolerant architecture

+Algorithm

Page 31: AWS fault tolerant architecture

+Results

RTSLB outperforms the other load-based algorithms. The gap would be much larger if the no. of connections increased.

Page 32: AWS fault tolerant architecture

+Future Study

Neural-network-based LB algorithms have a promising future.

Increasing availability by further improving existing LB Algorithms.

Studying the results in a cloud environment.

Page 33: AWS fault tolerant architecture

+Questions