Architecting for the Cloud: Hoping for the Best, Prepared for the Worst
AWS Loft: Behind the scenes with Cotap
Infrastructure as Code
● Current state
● Past decisions
● Tracking the evolution
● CloudFormation
● Design -> JSON
● Version Control!
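The "Design -> JSON" and "Version Control!" bullets could be sketched as follows: a small script builds a CloudFormation template as a dictionary and serializes it deterministically, so that every change shows up as a small diff in version control. This is a minimal sketch, not the speaker's tooling; the resource name `AppInstance`, the function names, and the placeholder AMI ID are assumptions made for illustration.

```python
import json


def build_template(instance_type: str, ami_id: str) -> dict:
    """Return a minimal CloudFormation template as a plain dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Hypothetical single-instance stack",
        "Resources": {
            # "AppInstance" is an illustrative logical ID, not from the talk.
            "AppInstance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "InstanceType": instance_type,
                    "ImageId": ami_id,
                },
            }
        },
    }


def render(template: dict) -> str:
    """Serialize with sorted keys and fixed indentation so diffs stay small."""
    return json.dumps(template, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    # "ami-00000000" is a placeholder, not a real image ID.
    print(render(build_template("t2.micro", "ami-00000000")))
```

Committing the rendered file alongside the script that generates it keeps both the current state and the past decisions visible in the repository history.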
Rule #1
All changes have to be under Version Control.
Design for automation
● AutoScalingGroups
● Hardware: CloudFormation
● Software: Configuration management
● Cattle not Cats
Rule #2
No instances should be launched manually.
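One way to audit Rule #2 is to look for instances that no Auto Scaling group launched. EC2 Auto Scaling tags the instances it starts with `aws:autoscaling:groupName`, so an instance without that tag was probably started by hand. The sketch below assumes a hypothetical input shape (a list of dicts with `InstanceId` and a `Tags` mapping); a real audit would first fetch and reshape the data from the EC2 API.

```python
# Tag that EC2 Auto Scaling adds to the instances it launches.
ASG_TAG = "aws:autoscaling:groupName"


def manually_launched(instances):
    """Return the IDs of instances that carry no Auto Scaling group tag.

    `instances` is an assumed, simplified shape:
    [{"InstanceId": "i-...", "Tags": {"key": "value"}}, ...]
    """
    return [
        inst["InstanceId"]
        for inst in instances
        if ASG_TAG not in inst.get("Tags", {})
    ]


if __name__ == "__main__":
    fleet = [
        {"InstanceId": "i-a", "Tags": {ASG_TAG: "web-asg", "Role": "web"}},
        {"InstanceId": "i-b", "Tags": {"Role": "debug-box"}},
    ]
    print(manually_launched(fleet))
```

Running such a check on a schedule and posting the result to chat would turn the rule into something that is verified rather than merely agreed on.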
Monitoring & Alerting
● Cost of
o Interruptions
o Waking somebody up
● Channels
● Self-healing infrastructure
● External monitoring
● Page only when critical
Situation                        Channel                  Page
Disk full 60%                    Chat, Email              ✗
Disk full 90%                    Chat, Email, PagerDuty   ✓
Chef not running for > 30m       Chat, Email              ✗
Redis not running for > 3 x 5s   Chat, Email, PagerDuty   ✓
ElasticSearch N-1                Chat, Email              ✗
ElasticSearch N-2                Chat, Email, PagerDuty   ✓
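The table above amounts to a two-level policy: warnings go to chat and email, and only critical conditions also page someone. A minimal sketch of that policy follows, using the disk and ElasticSearch rows as examples. The function names, the severity labels, and the choice of inclusive thresholds (`>=`) are assumptions; the slides give only the numbers.

```python
from typing import List, Optional


def disk_severity(percent_full: float) -> Optional[str]:
    """Map disk usage to a severity using the 60% and 90% rows of the table."""
    if percent_full >= 90:
        return "critical"
    if percent_full >= 60:
        return "warning"
    return None


def es_severity(expected_nodes: int, alive_nodes: int) -> Optional[str]:
    """One missing node (N-1) is a warning; two or more (N-2) is critical."""
    missing = expected_nodes - alive_nodes
    if missing >= 2:
        return "critical"
    if missing == 1:
        return "warning"
    return None


def channels(severity: Optional[str]) -> List[str]:
    """Warnings notify chat and email; critical alerts also page."""
    if severity is None:
        return []
    notify = ["chat", "email"]
    if severity == "critical":
        notify.append("pagerduty")
    return notify


if __name__ == "__main__":
    print(channels(disk_severity(72)))   # warning level
    print(channels(es_severity(3, 1)))   # two nodes missing
```

Keeping the paging decision in one function makes it easy to review which situations are allowed to wake somebody up.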
Platform to fail
● Easy creation of temporary “Stacks”
● Branches can get their own hardware
● Clients can talk to a branch
● QA happens on Sandbox
● Exact copy of Production
● Scale up/down based on needs
● Different Region (us-east-1)
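Giving each branch its own temporary stack needs a predictable mapping from branch names to stack names. CloudFormation stack names may contain only letters, digits, and hyphens, must start with a letter, and are limited to 128 characters, so branch names such as `feature/login_v2` have to be normalized first. The sketch below is one possible convention; the `sandbox` prefix and the function name are assumptions, and the prefix is assumed to start with a letter.

```python
import re

MAX_STACK_NAME_LEN = 128  # CloudFormation's limit on stack name length


def stack_name(branch: str, prefix: str = "sandbox") -> str:
    """Derive a CloudFormation-safe stack name from a branch name."""
    # Collapse every run of disallowed characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", branch).strip("-")
    name = f"{prefix}-{slug}" if slug else prefix
    # Truncate, then drop a trailing hyphen the cut may have exposed.
    return name[:MAX_STACK_NAME_LEN].rstrip("-")


if __name__ == "__main__":
    print(stack_name("feature/login_v2"))
```

Because the mapping is deterministic, the same function can be used when tearing the stack down after the branch is merged.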
Rule #3
All changes have to go through Sandbox.
Rule #4
Production is just a more powerful Sandbox.
Disaster Recovery
● Multi-AZs
● Traffic routing
● Multi-Regions (S3 too)
● AutoScalingGroups Min:1 Max:1
● Off-site backups (VPN + Disks)
● RPO + RTO
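RPO (recovery point objective) bounds how much data may be lost, and RTO (recovery time objective) bounds how long service may be down. A disaster-recovery drill can be scored against both with a little arithmetic, as in this sketch. The function names and the example timestamps are illustrative assumptions, not figures from the talk.

```python
from datetime import datetime, timedelta


def data_loss_window(backups, failure):
    """Time between the last backup taken before the failure and the failure."""
    earlier = [b for b in backups if b <= failure]
    if not earlier:
        raise ValueError("no backup precedes the failure")
    return failure - max(earlier)


def meets_objectives(backups, failure, restored, rpo, rto):
    """True if both the data loss and the downtime stay within target."""
    downtime = restored - failure
    return data_loss_window(backups, failure) <= rpo and downtime <= rto


if __name__ == "__main__":
    backups = [datetime(2015, 1, 1, h) for h in (0, 6, 12)]  # every 6 hours
    failure = datetime(2015, 1, 1, 14, 30)
    restored = datetime(2015, 1, 1, 15, 0)
    print(data_loss_window(backups, failure))
    print(meets_objectives(backups, failure, restored,
                           rpo=timedelta(hours=6), rto=timedelta(hours=1)))
```

With backups every six hours the worst-case loss approaches six hours, so a tighter RPO means backing up or replicating more often.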
Security
● MFA
● Public key distribution
● Root key rotation
● Private/Public Subnets
● ACLs/Security Groups
● Update AMIs
● Trusted Advisor!
Scaling
● Preemptive
● Automatic
● Vertically
● Horizontally
● Bottlenecks
Cost Control
● Tags
o Role
o Environment
● Cost explorer
● Threshold alerting
● Share monthly
● Export to CSV
● Right-Scale (ASG)
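The tagging, threshold-alerting, and CSV bullets fit together naturally: once every resource carries a Role and an Environment tag, costs can be grouped by tag, compared with a budget, and exported for the monthly report. The sketch below works on a hypothetical, simplified list of line items (dicts with `cost` and `tags`); it does not read a real AWS billing export, and all names in it are assumptions.

```python
import csv
import io
from collections import defaultdict


def cost_by_tag(line_items, tag):
    """Sum cost per value of `tag`; items missing the tag count as 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item["tags"].get(tag, "untagged")] += item["cost"]
    return dict(totals)


def over_threshold(totals, threshold):
    """Return the tag values whose total cost exceeds the threshold."""
    return sorted(key for key, cost in totals.items() if cost > threshold)


def to_csv(totals, tag):
    """Render the totals as CSV text, one row per tag value."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([tag, "cost_usd"])
    for key in sorted(totals):
        writer.writerow([key, f"{totals[key]:.2f}"])
    return buf.getvalue()


if __name__ == "__main__":
    items = [
        {"cost": 10.0, "tags": {"Role": "api", "Environment": "production"}},
        {"cost": 5.5, "tags": {"Role": "api", "Environment": "sandbox"}},
        {"cost": 3.0, "tags": {"Role": "worker", "Environment": "production"}},
        {"cost": 2.0, "tags": {}},
    ]
    totals = cost_by_tag(items, "Role")
    print(over_threshold(totals, 12.0))
    print(to_csv(totals, "Role"))
```

The "untagged" bucket is worth watching on its own, since spend that cannot be attributed to a role or environment is the hardest to control.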
4 rules of 5 nines.
● All changes have to be under VC
● No instance should be launched manually
● All changes are deployed to Sandbox first
● Production is just a more powerful Sandbox