Availability & Scalability with Elastic Load Balancing & Route 53 (CPN204) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. David Brown (Elastic Load Balancing) Sean Meckley (Amazon Route 53) Paul Kearney (InfoSpace) November 15, 2013 Architecting for Availability & Scalability with Elastic Load Balancing and Amazon Route 53 Thursday, November 21, 13

description

Elastic Load Balancing provides a scalable and highly-available load balancer that automatically distributes incoming application traffic across multiple Amazon EC2 instances. It enables you to achieve even greater fault tolerance in your applications, seamlessly providing the amount of load balancing capacity needed in response to incoming application traffic. In this session, we take a deeper look at some of the existing and newer features that enable application developers to architect highly-available architectures that are resilient to load spikes and application failures. We also explore some of the features that allow seamless integration with services such as Auto Scaling and Amazon Route 53 to further improve the scalability and resilience of your applications.
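The session slides describe requests being sent to the healthy instance with the fewest outstanding requests, and 503 errors being returned when no instance is healthy. That behaviour can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Elastic Load Balancing's implementation: `Instance`, `pick_instance`, `route`, and `NoHealthyInstances` are hypothetical names, and in-flight requests are modelled as a counter that never decreases.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    healthy: bool = True
    outstanding: int = 0  # requests currently in flight (assumed never to complete)


class NoHealthyInstances(Exception):
    """Raised when every registered instance is failing its health check."""


def pick_instance(instances):
    """Return the healthy instance with the fewest outstanding requests."""
    healthy = [i for i in instances if i.healthy]
    if not healthy:
        # The slides note that 503 errors are returned in this situation.
        raise NoHealthyInstances("503: no healthy instances")
    # min() returns the first of any tied instances, so ties follow list order.
    return min(healthy, key=lambda i: i.outstanding)


def route(instances, n_requests):
    """Assign n_requests one at a time and return per-instance in-flight counts."""
    for _ in range(n_requests):
        pick_instance(instances).outstanding += 1
    return {i.name: i.outstanding for i in instances}


fleet = [Instance("a"), Instance("b"), Instance("c")]
before = route(fleet, 6)     # all three instances healthy: load is shared equally

fleet[1].healthy = False     # a health check marks "b" as failed
after = route(fleet, 4)      # new requests go only to "a" and "c"

print(before)
print(after)
```

After the simulated failure, instance "b" receives no new requests and the two healthy instances carry the additional load, which mirrors the health-check slides.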

Transcript of Availability & Scalability with Elastic Load Balancing & Route 53 (CPN204) | AWS re:Invent 2013

  • 1. Architecting for Availability & Scalability with Elastic Load Balancing and Amazon Route 53. David Brown (Elastic Load Balancing), Sean Meckley (Amazon Route 53), Paul Kearney (InfoSpace). November 15, 2013.

2. Welcome
3. Everything fails all the time. (Werner Vogels, CTO, Amazon.com)
4. Avoid single points of failure.
5. Elastic Load Balancing and Amazon Route 53 are critical components when building scalable and highly-available applications.
6. [ What is Elastic Load Balancing? ] Load Balancer: Elastic, Secure, Integrated, Cost-Effective.
7. [ What is Elastic Load Balancing? ] (Diagram: Client, Elastic Load Balancing, Elastic Load Balancing (Internal), and EC2 instances in Availability Zones 1a and 1b.)
8. 3 Levels of Availability
9. 1: Instance Availability
10. 1: Instance Availability. 2: Zonal Availability.
11. 1: Instance Availability. 2: Zonal Availability. 3: Regional Availability.
12. 1: Instance Availability
13. First step in increasing the availability of a system or application.
14. [ Instance Redundancy ] Load balancer used to route incoming requests to multiple EC2 instances. (Diagram: Client, EC2 instance.)
15. [ Instance Redundancy ] Load balancer used to route incoming requests to multiple EC2 instances. (Diagram: Client, Elastic Load Balancing, three EC2 instances.)
16. Incoming request load shared by all instances behind the load balancer.
17. [ Request Routing ] Leastconns used to spread requests across healthy instances. (Diagram: Client, Elastic Load Balancing, three EC2 instances.)
18. [ Request Routing ] Leastconns used to spread requests across healthy instances. Equal utilization on each instance.
19.
[ Request Routing ] Leastconns used to spread requests across healthy instances. Targets instances with fewest outstanding requests; adjusts to request response times; smooths request load across all instances; equal utilization on each instance.
20. Instances that fail can be replaced seamlessly while other instances continue to operate.
21. [ Health Checks ] Application level health checks ensure that request traffic is shifted away from a failed instance. (Diagram: Client, Elastic Load Balancing, three EC2 instances.)
22. [ Health Checks ] Failure detected.
23. [ Health Checks ] Failure detected; traffic shifted.
24. [ Health Checks ] Failure detected; traffic shifted; healthy instances carry additional request load.
25. [ Health Checks ] Application level health checks ensure that request traffic is shifted away from a failed instance. TCP and HTTP; used to determine the health of the instance and application; consider the depth and accuracy of your health checks; customize frequency and failure thresholds; healthy instances carry additional request load; 503 errors returned if no healthy instances.
26.
Auto Scaling can be used to automatically adjust instance capacity up or down depending on conditions you define.
27. [ ELB & Auto Scaling ] (Diagram: Elastic Load Balancing, three EC2 instances.)
28. [ ELB & Auto Scaling ] Load increases.
29. [ ELB & Auto Scaling ] Instances added for increased load. (Diagram: five EC2 instances.)
30. [ ELB & Auto Scaling ] Load decreases.
31. [ ELB & Auto Scaling ] Instances removed as load decreases. (Diagram: three EC2 instances.)
32. [ ELB & Auto Scaling ] Instances removed as load decreases. Automatically scales instances up or down; automatically replaces failed instances; custom scaling metrics; reduces costs.
33. 2: Zonal Availability
34. Availability Zones are distinct geographical locations that are engineered to be insulated from failures in other zones.
35. (Diagram: Region, Availability Zone.)
36. It is important to run application stacks in more than one zone.
37. Avoid unnecessary dependencies between zones.
38. [ Availability Zone Redundancy ] Load balancer used to balance across instances in multiple Availability Zones. (Diagram: Client, Elastic Load Balancing, EC2 instances in Zone 1a and Zone 1b.)
39. Each load balancer will contain one or more DNS records, one for each load balancer node.
40.
[ Understanding DNS ] (Diagram: Client and Elastic Load Balancing, with load balancer nodes 192.0.2.1 and 192.0.2.2, each in front of three EC2 instances.)
41. [ Understanding DNS ] DNS round robin used to balance traffic between Availability Zones; expect DNS records to change over time; each load balancer domain name may contain multiple A records.
42. Using multiple Availability Zones does bring a few challenges.
43. [ Multiple Zone Challenges ] Availability Zones may see traffic imbalances due to clients caching DNS records. (Chart: requests / minute over time.)
44. [ Multiple Zone Challenges ] An unequal number of instances per zone can lead to over-utilization of instances in a zone. (Diagram: Client, Elastic Load Balancer, 2 EC2 instances in Zone 1a, 3 EC2 instances in Zone 1b.)
45. Problem solved.
46. Cross-Zone Load Balancing distributes traffic across all healthy instances, regardless of Availability Zone.
47. [ Cross-Zone Load Balancing ] Effectively balances the request load across all instances behind the load balancer. (Diagram: Client, Elastic Load Balancing, 2 EC2 instances in Zone 1a, 3 EC2 instances in Zone 1b.)
48. [ Cross-Zone Load Balancing ] Traffic is spread evenly across each of the active Availability Zones. (Chart: requests / minute over time.)
49. [ Cross-Zone Load Balancing ] Requests distributed equally to all instances regardless of zone; eliminates imbalances in instance utilization; reduces impact of clients caching DNS records; no bandwidth charge for cross-zone traffic. (Chart: requests / minute over time.)
50. 3: Regional Redundancy
51.
Elastic Load Balancing and Amazon Route 53 have been integrated to support a single application across multiple regions.
52. (Diagram: Region, Availability Zone.)
53. [ What is Amazon Route 53? ] AWS's authoritative Domain Name Service (DNS); health checking service; highly available and scalable; offers tools that provide flexible, high-performance, and highly available architectures on AWS.
54. [ What is Amazon Route 53? ] Improves availability by health checking load balancer nodes and rerouting traffic to avoid failures, and by supporting multi-region and backup architectures for high availability.
55. [ What is DNS failover? ] Health Checks: automated requests sent over the Internet to your application to verify that your application is reachable, available, and functional. Failover: only returns answers for resources that are healthy and reachable from the outside world, so end users are routed away from a failed application.
56. [ How does it work? ] Work on Failure: when nothing is failing, volume of API calls is zero; when failure occurs, volume of API calls spikes. Constant Work: health checkers and edge locations perform the same volume of activity whether endpoints are healthy or unhealthy. (Charts: system activity over time, with time to react.)
57. [ Global Health Check Network ] Amazon Route 53 conducts health checks from within each AWS region.
58. Network partition
59. [ How does it work? ] 150 seconds vs. manual failover: operator receives an alarm; operator manually configures DNS update; wait for DNS changes to propagate.
60.
[ How does it work? ] 150 seconds vs. manual failover (operator receives an alarm; operator manually configures DNS update; wait for DNS changes to propagate). No control plane involvement required for failover to occur; edge locations pull health results directly from globally distributed health checker fleet; don't have to wait for API requests to succeed and then propagate; failover happens entirely within the Amazon Route 53 data plane.
61. [ Simple Failover Scenario ] E-commerce site: example.com. Running application stack in multiple Availability Zones in a single AWS region. Wants a backup in case: own application goes down across multiple Availability Zones; some parts of the world experience degraded connectivity to this AWS region. (Diagram: Region, Elastic Load Balancing, EC2 instances.)
62. [ Simple Failover Scenario ] (Diagram: Route 53 with a health check on the primary, Elastic Load Balancing and EC2 instances in a region, and a secondary on S3.)
63. [ Simple Failover Scenario ] Health check fails; failover. (Diagram: primary marked failed, Route 53 answers with the secondary on S3.)
64. [ Static Backup Site Options ] Static site; static vs. dynamic content.
65. [ Latency Based Routing ] Provides your globally-distributed end users with faster performance. Tag each destination end-point to the Amazon EC2 region that it's located in. Amazon Route 53 will route end users to the end-point that provides the lowest latency.
66. [ LBR Benefits ] Better performance than running in a single region; improved reliability relative to running in one region; easier implementation than traditional DNS solutions; much lower prices than traditional DNS solutions. Customer quote: Our customers bid on video ad inventory in real time and our system must evaluate the content they're sponsoring and respond with a decision in less than 50ms, or they'll lose the auction.
Route 53's Latency Based Routing lets us easily run multiple stacks of our whole targeting platform in each AWS region so we can meet our customers' latency needs. (Jonathan Dodson, Vice President of Engineering at Affine)
67. [ Multi-Region Failover ] example.com wants faster page load for customers. Launches application stack in additional AWS regions. Uses Amazon Route 53 Latency Based Routing. Amazon Route 53 DNS Failover ensures that end users are only routed to a region where the application is healthy. (Diagram: Region 1 and Region 2, each with Elastic Load Balancing and EC2 instances.)
68. [ Multi-Region Failover ] (Diagram: Route 53 with a primary health check on Elastic Load Balancing in each of Region 1 and Region 2.)
69. [ Multi-Region Failover ] Health check fails and traffic shifts away.
70. [ Multi-Region & S3 Failover ] (Diagram: Region 1 and Region 2, each with Elastic Load Balancing and EC2 instances, plus S3.)
71. [ Configuring DNS Failover ]
72. AWS & InfoSpace: Elastic Load Balancing & Amazon Route 53 for High-Availability
73. InfoSpace Search: Since 1996, our mission has been to make it fast and easy for users to find what they need online.
74. InfoSpace Search
75. InfoSpace Search: Search Sites
76. InfoSpace Search: Search Sites, Search API
77. Types of Users
78.
Types of Users. Search Site Users: 400 million queries per month; broad geographical distribution.
79. Types of Users. Search API Partners: 150+ partners worldwide; located primarily in US and EU; 2 billion queries/month.
80. Types of Users. Click Users: 6.5 billion clicks/month; broad geographical distribution.
81. Global Distribution of Traffic
82. Global Distribution of Traffic
83-88. Global Distribution of Traffic (figure builds with Availability Zone markers labelled AZ#).
89. Key Statistics: 4.5 billion requests/month; migrated from 2 data centers to AWS in 5 months; deployed in 4 regions; approximately 500 EC2 instances; approximately 50 load balancers; approximately 70 Amazon Route 53 zones.
90. AWS Infrastructure (Diagram labels: Route 53, Private Subnet, Public Subnet, NAT, TSG, Supporting Services, Search API, Search Sites, Outbound via NAT.)
91. Fire and Forget
92-93. Fire and Forget (Diagram: Production, System under test.)
94.
Fire and Forget: Asynchronous (Diagram: Production, System under test.)
95-96. Fire and Forget (Diagram: Production, System under test.)
97-99. Fire and Forget
100-101. Fire and Forget (Diagram labels: LBR, LBR.)
102. Fire and Forget (Diagram label: LBR.)
103. Results: regional failover in 150 seconds consistently; decreased latency, 25% less latent worldwide; can easily reroute individual partners to a different region to avoid routing problems; replaced expensive network gear from datacenter.
104. What next? Expanding to additional regions; integration of monitoring data with traffic routing.
105. 3 Levels of Availability
106. 1: Instance Availability
107. 1: Instance Availability. 2: Zonal Availability.
108. 1: Instance Availability. 2: Zonal Availability. 3: Regional Availability.
109. Please give us your feedback on this presentation: CPN104. As a thank you, we will select prize winners daily for completed surveys! Thank You
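
The multiple-zone slides (44 to 49) show 2 instances in Zone 1a and 3 in Zone 1b, and state that an unequal number of instances per zone can over-utilize one zone unless Cross-Zone Load Balancing is used. The arithmetic can be checked with a short sketch. It assumes an idealized model that is not stated in the slides: DNS round robin splits traffic exactly evenly between zones, clients do not cache records, and every instance is healthy. The function names are hypothetical.

```python
from fractions import Fraction


def per_zone_shares(zones):
    """Per-instance traffic share when traffic is split evenly by zone first.

    zones maps a zone name to its number of healthy instances.
    """
    active = {zone: count for zone, count in zones.items() if count > 0}
    zone_share = Fraction(1, len(active))  # each active zone gets an equal slice
    return {zone: zone_share / count for zone, count in active.items()}


def cross_zone_shares(zones):
    """Per-instance traffic share when requests are spread over all instances."""
    total = sum(zones.values())
    return {zone: Fraction(1, total) for zone, count in zones.items() if count > 0}


zones = {"1a": 2, "1b": 3}            # instance counts taken from the slides
without = per_zone_shares(zones)      # 1a: 1/4 per instance, 1b: 1/6 per instance
with_cz = cross_zone_shares(zones)    # 1/5 per instance in both zones

for zone in zones:
    print(zone, "per-zone:", without[zone], "cross-zone:", with_cz[zone])
```

Under these assumptions each Zone 1a instance handles 25% of the traffic without cross-zone balancing, against roughly 16.7% for each Zone 1b instance. With cross-zone balancing every instance handles 20%.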