Designing Fault-Tolerant Applications
Miles Ward, Enterprise Solutions Architect
Building Fault-Tolerant Applications on AWS
White paper published last year
Sharing best practices
We’d like to hear your best practices as well
Copyright © 2011 Amazon Web Services
http://media.amazonwebservices.com/AWS_Building_Fault_Tolerant_Applications.pdf
AWS Fault-Tolerant Building Blocks
Two approaches:
1) AWS services that are inherently fault-tolerant and highly available:
• Amazon Simple Storage Service (S3)
• Amazon SimpleDB
• Amazon SQS, SNS, SES, CloudWatch, CloudFront, and more.
2) AWS services that offer tools and features to design fault-tolerant and highly available systems:
• Amazon Elastic Compute Cloud (EC2)
– Availability Zones, Elastic IPs, EBS, etc.
– Flexible to trade off budget vs. time to recovery
• Amazon Relational Database Service (RDS)
– Multi-AZ Deployments
– Backup/Restore
Amazon EC2 Architecture
[Diagram: Region; Availability Zone; EC2 Instance; Elastic IP Address; Security Group(s); Load Balancing; Elastic Block Storage; CloudWatch; Auto Scaling; Ephemeral Storage; Amazon S3; EBS Snapshot; Amazon Machine Image (AMI)]
EC2 Features
• AMI
– Packaged, reusable functionality
• On-Instance Storage
– Lifetime tied to instance lifetime
– Annual failure rate (AFR) like a standard hard disk (around 5%)
• EBS Volumes
– Lifetime independent of any particular EC2 instance
– Redundant within an AZ
– AFR is 0.1% to 0.5%
– Incorporate volume mappings into your architecture
– Use EBS snapshot backups
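The failure rates above can be turned into a rough fleet-level estimate. The sketch below assumes failures are independent and uses a hypothetical fleet of 20 volumes; neither assumption comes from the slides.

```python
def prob_any_failure(afr: float, volumes: int) -> float:
    """Chance that at least one of `volumes` fails within a year,
    assuming each fails independently with probability `afr`."""
    return 1.0 - (1.0 - afr) ** volumes


# AFR figures are the ones quoted on the slide; the fleet size is illustrative.
for label, afr in [
    ("on-instance storage (5%)", 0.05),
    ("EBS, low end (0.1%)", 0.001),
    ("EBS, high end (0.5%)", 0.005),
]:
    print(f"{label}: {prob_any_failure(afr, 20):.3f}")
```

Even at the low EBS rates the chance of some failure grows with fleet size, which is why snapshot backups remain on the list.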
EC2 Features
• Elastic IP Addresses
– Map to any EC2 instance within a given Region
– Detach from failed instance; map to replacement
• Auto Scaling (two ways to use it)
– Respond to changing conditions by adding or terminating EC2 instances (attach to CloudWatch metrics)
– Maintain a fixed number of instances running, replacing them if they fail or become unhealthy
• Reserved Instances
– Guarantee capacity for when it’s needed
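The second Auto Scaling mode, keeping a fixed number of instances running, can be pictured as a reconcile step. This is a toy in-memory model, not the Auto Scaling API; `Instance`, `launch`, and `reconcile` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import count
from typing import List


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


_ids = count(1)


def launch() -> Instance:
    """Stand-in for launching a replacement instance."""
    return Instance(f"i-{next(_ids):04d}")


def reconcile(fleet: List[Instance], desired: int) -> List[Instance]:
    """Drop unhealthy instances, then launch replacements up to `desired`."""
    survivors = [inst for inst in fleet if inst.healthy]
    while len(survivors) < desired:
        survivors.append(launch())
    return survivors[:desired]
```

Calling `reconcile` on a schedule keeps the fleet at the desired size without anyone deciding which instance to replace.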
EC2 Features
CloudWatch Alarms
EC2 Features
Elastic Load Balancing
Distributes incoming traffic across multiple instances
Sends traffic only to healthy instances
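The two behaviours above, distributing traffic and skipping unhealthy instances, can be sketched as a round-robin router with a health set. This is an illustrative model only; the class and backend names are hypothetical and it is not how Elastic Load Balancing is implemented.

```python
class LoadBalancer:
    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def mark(self, backend, is_healthy):
        """Record the result of a health check for one backend."""
        if is_healthy:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def route(self):
        """Return the next healthy backend in round-robin order."""
        for _ in range(len(self.backends)):
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")


lb = LoadBalancer(["web-a", "web-b", "web-c"])
lb.mark("web-b", False)  # a failed health check takes web-b out of rotation
```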
Amazon EC2 Regions and Availability Zones
[Diagram: the US East (Northern Virginia) and EU (Dublin) regions, each divided into Availability Zones; zone labels shown are A, B and A, B, C, D]
Amazon EC2 Regions:
US East (Northern Virginia) / US West (Northern California) /
EU (Ireland) / Asia Pacific (Singapore) / Asia Pacific (Tokyo)
Availability Zone Characteristics and Advice
Distinct physical locations
Low-latency network connections between AZs
Independent power, cooling, network, security
Always partition app stacks across 2 or more AZs
Use Elastic Load Balancing across instances in multiple AZs
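To see why the advice is to partition across two or more AZs, it helps to compute how much capacity survives the loss of one zone. The sketch below uses hypothetical zone names and instance counts.

```python
from collections import Counter


def spread(instance_count, zones):
    """Assign instances to zones round-robin; returns zone -> instance count."""
    return Counter(zones[i % len(zones)] for i in range(instance_count))


def capacity_after_zone_loss(placement, failed_zone):
    """Fraction of instances still running if `failed_zone` goes away."""
    total = sum(placement.values())
    return (total - placement.get(failed_zone, 0)) / total


two_zones = spread(6, ["az-a", "az-b"])
three_zones = spread(6, ["az-a", "az-b", "az-c"])
print(capacity_after_zone_loss(two_zones, "az-a"))
print(capacity_after_zone_loss(three_zones, "az-a"))
```

With everything in one zone the surviving fraction is zero; each added zone reduces the share lost in a single-zone failure.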
Proper Use of Multiple Availability Zones
[Diagram: Incoming Requests arrive at an Elastic Load Balancer, which sends Requests and Health Checks to Availability Zone A and Availability Zone B. Each zone contains a Web Server, an App Server, and a Database Server or RDS DB Instance. Also shown: Centralized Services (S3 Backups, SimpleDB, etc.)]
Region Characteristics and Advice
Regions are:
Functionally separate
Composed of 2 or more AZs
Connected via the public internet
Use regions to:
Have functionality geographically close to customers
Comply with national laws and practices
Implement a DR strategy
RDS Fault-Tolerant Features
• Multi-AZ Deployments
– Synchronous replication across AZs
– Automatic fail-over to standby replica
• Automated Backups
– Enables point-in-time recovery of the DB instance
– Retention period configurable
• Snapshots
– User-initiated full backup of DB
– New DB can be created from snapshots
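The point of synchronous replication is that a write is acknowledged only after the standby also has it, so fail-over loses no acknowledged data. The toy model below illustrates that property with two dictionaries; it says nothing about how RDS implements Multi-AZ, and the class name is hypothetical.

```python
class MultiAZDatabase:
    def __init__(self):
        self.primary = {}
        self.standby = {}

    def write(self, key, value):
        """Apply the write to the standby before acknowledging it."""
        self.standby[key] = value
        self.primary[key] = value
        return True  # acknowledgement returned only after both copies exist

    def fail_over(self):
        """Promote the standby; a fresh standby would be rebuilt afterwards."""
        self.primary, self.standby = self.standby, {}


db = MultiAZDatabase()
db.write("order-1", "paid")
db.fail_over()
print(db.primary["order-1"])
```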
AWS Architectural Guidance
Design For Failure – Basic Principles
Avoid single points of failure
Assume everything fails, and design backwards
Goal: Applications should continue to function even if the underlying physical hardware fails or is removed or replaced.
Design your recovery process
Trade off business needs vs. cost of high-availability
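The value of avoiding single points of failure can be estimated with simple availability arithmetic. The figures below (0.99 per component) are hypothetical, and the formulas assume components fail independently.

```python
from functools import reduce


def series(*availabilities):
    """Every component must be up: multiply the availabilities."""
    return reduce(lambda a, b: a * b, availabilities, 1.0)


def redundant(availability, copies):
    """At least one of `copies` identical, independent components must be up."""
    return 1.0 - (1.0 - availability) ** copies


single_stack = series(0.99, 0.99, 0.99)  # web, app, database, one of each
paired_stack = series(*(redundant(0.99, 2) for _ in range(3)))
print(f"single: {single_stack:.4f}  paired: {paired_stack:.4f}")
```

Doubling every tier also roughly doubles its cost, which is the trade-off between business needs and the cost of high availability.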
Design For Failure – Use AWS Building Blocks
Use Elastic IP addresses for consistent and re-mappable routes
Use multiple Amazon EC2 Availability Zones (AZs)
Replicate data across multiple AZs (example: Amazon RDS Multi-AZ mode)
Use real-time monitoring (Amazon CloudWatch)
Use Amazon Elastic Block Store (EBS) for persistent file systems
Take EBS Snapshots and use S3 for backups
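Re-mappable routes with Elastic IP addresses amount to pointing a stable address at whichever instance is currently healthy. The sketch below models that in memory; `ElasticIP` and `remap_if_failed` are hypothetical names, not AWS API calls, and the address comes from a documentation-only range.

```python
class ElasticIP:
    def __init__(self, address):
        self.address = address
        self.instance_id = None

    def associate(self, instance_id):
        """Point the address at a different instance."""
        self.instance_id = instance_id


def remap_if_failed(eip, health, replacement_id):
    """Move the address to `replacement_id` if its current instance is unhealthy."""
    if not health.get(eip.instance_id, False):
        eip.associate(replacement_id)
    return eip.instance_id


eip = ElasticIP("203.0.113.10")
eip.associate("i-primary")
remap_if_failed(eip, {"i-primary": False}, "i-standby")
```

Clients keep using the same address throughout; only the mapping behind it changes.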
Build Loosely Coupled Systems
Use independent components
Design everything as a Black Box
Load-balance and scale clusters
Think about graceful degradation
[Diagram: Tight Coupling (Controller A, Controller B, Controller C) compared with Loose Coupling using Queues (Controller A, Controller B, Controller C with queues, Q, between them); Amazon SQS as Buffers]
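Using a queue as a buffer means the producer never calls the consumer directly, so work survives while the consumer is down. The sketch below uses Python's standard `queue.Queue` as a stand-in for Amazon SQS; the function names are hypothetical.

```python
import queue


def produce(q, jobs):
    """Controller A: enqueue work without knowing who will process it."""
    for job in jobs:
        q.put(job)


def consume(q, handler):
    """Controller B: drain whatever is waiting, whenever it comes back up."""
    results = []
    while True:
        try:
            job = q.get_nowait()
        except queue.Empty:
            return results
        results.append(handler(job))


buffer = queue.Queue()
produce(buffer, ["resize-1", "resize-2"])  # consumer may be offline here
print(consume(buffer, str.upper))          # later, the consumer catches up
```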
Implement Elasticity
Don’t assume health or fixed location of components
Use designs that are resilient to reboot and re-launch
Bootstrap your instances – “Who am I and what is my role?”
Enable dynamic configuration
Use configurations in SimpleDB for bootstrapping
Use Auto Scaling
Use Elastic Load Balancing on each tier
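Bootstrapping from external configuration can be sketched as a lookup performed at start-up. Here plain dictionaries stand in for a SimpleDB domain, and every role, package, and instance name is hypothetical.

```python
# Hypothetical configuration store; a real one would live outside the instance.
ROLE_CONFIG = {
    "web": {"packages": ["nginx"], "register_with_load_balancer": True},
    "worker": {"packages": ["queue-consumer"], "register_with_load_balancer": False},
}
INSTANCE_ROLES = {"i-0001": "web", "i-0002": "worker"}


def bootstrap(instance_id, roles=INSTANCE_ROLES, config=ROLE_CONFIG):
    """Answer "who am I and what is my role?" and return the setup steps."""
    role = roles.get(instance_id)
    if role is None:
        raise LookupError(f"no role assigned to {instance_id}")
    settings = config[role]
    steps = [f"install {pkg}" for pkg in settings["packages"]]
    if settings["register_with_load_balancer"]:
        steps.append("register with load balancer")
    return role, steps


print(bootstrap("i-0001"))
```

Because nothing role-specific is baked into the image, the same AMI can be re-launched into any role.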
Implementing Elasticity: Elastic Load Balancing, CloudWatch, and Auto Scaling
[Diagram: Elastic Load Balancing; CloudWatch; Auto Scaling; Utilization Metrics]
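The link between utilization metrics and scaling can be sketched as a threshold rule. The thresholds, limits, and function name below are all hypothetical; actual Auto Scaling policies are configured in the service rather than written like this.

```python
def desired_capacity(current, avg_cpu, scale_out_at=70.0, scale_in_at=30.0,
                     minimum=2, maximum=10):
    """Add an instance when busy, remove one when idle, stay within limits."""
    if avg_cpu > scale_out_at:
        target = current + 1
    elif avg_cpu < scale_in_at:
        target = current - 1
    else:
        target = current
    return max(minimum, min(maximum, target))


for cpu in (85.0, 50.0, 10.0):
    print(cpu, desired_capacity(4, cpu))
```

Keeping the minimum at two or more leaves room to place instances in more than one Availability Zone.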
Use a Chaos Monkey
From the Netflix blog:
Simple monkey: Kill any instance in the account
Complex monkey: Kill instances with specific tags; introduce other faults (e.g. connectivity via Security Group)
Human monkey: Kill instances from the AWS Management Console
http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
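The simple and complex monkeys differ only in how the victim is chosen. The sketch below shows just that selection step against a hypothetical in-memory fleet; it is not Netflix's implementation and does not terminate anything.

```python
import random

# Hypothetical fleet: instance id -> tags.
FLEET = {
    "i-0001": {"tier": "web"},
    "i-0002": {"tier": "web"},
    "i-0003": {"tier": "db"},
}


def simple_monkey(instances, rng):
    """Pick any instance in the account."""
    return rng.choice(sorted(instances))


def complex_monkey(instances, tag_key, tag_value, rng):
    """Pick only among instances carrying a specific tag; None if no match."""
    candidates = sorted(
        iid for iid, tags in instances.items() if tags.get(tag_key) == tag_value
    )
    return rng.choice(candidates) if candidates else None


rng = random.Random()
print("simple:", simple_monkey(FLEET, rng))
print("complex:", complex_monkey(FLEET, "tier", "web", rng))
```

Passing the random generator in makes a run repeatable when a seed is supplied, which helps when a failure found this way needs to be reproduced.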
AWS Architecture Center
aws.amazon.com/architecture
White papers:
Cloud architectures
Building fault-tolerant applications
Web hosting best practices
Leveraging different storage options
AWS security best practices
Thank You!