LearnBop Blue Green AWS Deployments - October 2015
Uploaded by alec-lazarescu
LearnBop Blue/Green Deployments
October 2015
whoami
utcnow
CTO at
www.learnbop.com
Algorithmic individual tutoring tuned by veteran teachers
Common Core and state standards supported
Currently enjoyed in schools. Sign up to be notified when the parent-led version is live: http://go.learnbop.com/amazon-parents
Common sample architecture
General release good practices
Continuous integration - build, test, etc.
Scripted environment creation/update (ideally in source control)
Scripted “one-click” deploy
New code, APIs AND database schema should be backwards compatible
Why not rolling releases?
Not immutable infrastructure
❖ Opportunities for config creep
❖ Rollback risks - Code-only releases are likely easy. What if you patch the OS, update a few libraries, etc.?
Complexity of tracking version state, whether manual or automatic
Some big change will require new servers/environment anyway
Why blue/green?
Immutable infrastructure
❖ Ensures your environment build process is up to date each release
❖ Old environment is guaranteed untouched if rollback or comparison needed
Rollback is FAST
Same process for minor or major changes (OS updates? no problem)
One button to spin up & deploy, plus one button to shift traffic. Either old or new. No complex in-between risk.
Swap CNAMEs to the rescue?
Web Request Path - Round 1
Maybe in 1993…
GET / HTTP/1.0
Web Request Path - Round 2
GET / HTTP/1.1
Web Request Path - Round 3
Web Request Path - Round 4
Web Request Path - Round 5
Web Request Path - Round 6
Swap CNAMEs to the rescue?
Swap CNAME worst case
Bad Scenario 1 - Users stuck on old pre-swap version longer than a few minutes
❖ A user actively clicking on the site will keep the HTTP keep-alive sockets active and won’t get a chance to check DNS again
❖ Browser and OS DNS caches can keep the old value longer than a minimal DNS TTL
❖ Some DNS servers or apps may be configured/misconfigured with an abnormally high TTL
Bad Scenario 2 - Users stuck on old pre-swap version INDEFINITELY
❖ Long polling, websockets, notification refresh will keep re-using the same HTTP keep-alive socket
❖ It never goes back to a DNS server to get a new address as long as they don’t lose internet access/close the browser
❖ I’ve seen it happen 12+ hours
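The keep-alive problem in Scenarios 1 and 2 can be illustrated with a toy simulation. This is only a sketch: `FakeDNS`, `KeepAliveClient`, and the hostname are hypothetical stand-ins, not real networking code, and the "socket" is just a remembered target.

```python
class FakeDNS:
    """Toy resolver: answers with whichever environment the CNAME points at."""

    def __init__(self, target):
        self.target = target
        self.lookups = 0

    def resolve(self, host):
        self.lookups += 1
        return self.target


class KeepAliveClient:
    """Toy browser: resolves DNS only when it has to open a new socket."""

    def __init__(self, dns):
        self.dns = dns
        self.connected_to = None  # stands in for an open keep-alive socket

    def request(self, host):
        if self.connected_to is None:
            self.connected_to = self.dns.resolve(host)
        return self.connected_to  # socket is reused, no new lookup

    def close(self):
        self.connected_to = None


dns = FakeDNS("blue")
client = KeepAliveClient(dns)
before = client.request("www.example.com")

dns.target = "green"  # the CNAME swap happens here
during = [client.request("www.example.com") for _ in range(100)]

client.close()  # browser closed or connection dropped
after = client.request("www.example.com")

print(before, set(during), after, dns.lookups)
```

Every request made while the socket stays open still lands on the old environment, no matter how low the DNS TTL is. Only a closed connection forces a fresh lookup.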
Swap CNAME worst case
Bad Scenario 3 - Semi-permanent stale data
❖ CDN caches old version of file during your swap
❖ Browser gets old file with Cache-Control: max-age=3600 and caches it for a YEAR
Emergency Workarounds
❖ Tell your users to clear cache (not a great move for public websites)
❖ Change your cachebuster ?build= # and re-publish
❖ Disable CDN
Bad Scenario 4 - User requests going from old → new → old servers
❖ Request hits one bank of DNS servers and gets new IP
❖ Next request hits a different bank of DNS servers and gets old IP
❖ Could send new form data to old server backend...
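The cachebuster workaround from Scenario 3 amounts to changing the `?build=` value on every asset URL so that the CDN and browsers treat it as a new file. A minimal sketch, assuming asset URLs are rewritten at publish time; the helper name `with_build` and the example URLs are hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_build(url, build):
    """Return url with its ?build= parameter set to the given build number."""
    parts = urlsplit(url)
    # Keep every other query parameter, drop any previous build value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "build"]
    query.append(("build", str(build)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_build("https://cdn.example.com/app.js?build=41", 42))
print(with_build("/static/site.css", 42))
```

Because the URL changes, caches holding the stale copy under the old build number are bypassed instead of having to be purged.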
How do we know what version users are hitting?
How do we know what version CDN is hitting?
Discarded Alternatives
Try to reuse ELB OR put servers in a 3rd ELB (not in blue or green env)
❖ Complex to manage which servers should be in and out
❖ If using Elastic Beanstalk and auto-scaling, complex to manage new servers or putting servers in
Trick Beanstalk into switching the ELB it’s using (swap ELB for pre and post)
❖ Error: Tag keys starting with ‘aws:’ are reserved for internal use
Swap CNAMEs first and then put new nodes in both new and old ELB. Remove old nodes from old ELB after
❖ Not bad but still need to leave old ELB up in case of old DNS
❖ Pre-rollback testing hard as old nodes are not reachable
Final Solution Attributes
Only possible relatively recently with new AWS attach/detach ELB to AutoScaling Group (ASG) feature out June 11th - see blog post
Fully scripted and one click (bash script run through RunDeck)
Rollback is as simple as running it again to swap back
No CNAME/DNS changes!
Old environment not hit more than 3 minutes after new servers come online
No one hitting new server has any risk of future request hitting old server (unless you roll back)
Final Solution Environment Setup
Environment work
Initial state: Beanstalk application with two environments running and green (staging and production)
Create two new ELBs outside of Elastic Beanstalk (PROD and STAGING)
Attach STAGING ELB to staging (pre-swap to prod) Autoscaling Group
CNAME dualstack DNS name of STAGING ELB to your staging web site address
Attach PROD ELB to production Autoscaling Group
CNAME dualstack DNS name of PROD ELB to your production site address
Ensure Connection Draining is enabled on all four ELBs with a timeout of 120 seconds
Ensure application sets a session type cookie on EVERY request
Create an ELB application controlled session stickiness cookie policy
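The connection draining and stickiness policy parts of the setup could be scripted roughly as follows. The talk's own script is bash and is not shown in the deck, so this is a sketch under assumptions, not the author's code. It uses the method names of the boto3 classic ELB client (`boto3.client("elb")`) but runs here against a small recording stub so no AWS account is needed. The Beanstalk ELB names, the policy name `app-sticky`, and the cookie name `SESSIONID` are hypothetical.

```python
def enable_connection_draining(elb, elb_names, timeout=120):
    """Turn on connection draining with the given timeout on each ELB."""
    for name in elb_names:
        elb.modify_load_balancer_attributes(
            LoadBalancerName=name,
            LoadBalancerAttributes={
                "ConnectionDraining": {"Enabled": True, "Timeout": timeout}
            },
        )


def create_sticky_policy(elb, elb_name, policy_name, cookie_name):
    """Create an application-controlled stickiness policy (not yet enabled)."""
    elb.create_app_cookie_stickiness_policy(
        LoadBalancerName=elb_name,
        PolicyName=policy_name,
        CookieName=cookie_name,
    )


class RecordingClient:
    """Stub that records calls instead of talking to AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, method):
        def record(**kwargs):
            self.calls.append((method, kwargs))
            return {}
        return record


elb = RecordingClient()  # assumption: replace with boto3.client("elb") for real use
# Two ELBs created outside Beanstalk plus the two Beanstalk-managed ones
# (the "awseb-*" names are made up for this example).
enable_connection_draining(elb, ["PROD", "STAGING", "awseb-blue", "awseb-green"])
create_sticky_policy(elb, "PROD", "app-sticky", "SESSIONID")

for method, kwargs in elb.calls:
    print(method, kwargs)
```

Creating the policy does not apply it to a listener. It is only switched on for the duration of a swap.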
Final Solution Steps - Sanity Checks
First Do No Harm! Lots of sanity checks before proceeding.
1. Confirm two environments exist in application and one has the PROD ELB attached to its ASG and the other has the STAGING ELB attached to its ASG.
2. Confirm both environments are Health: Green
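The two checks above can be sketched as a pure function over data already fetched from AWS. The input shapes are assumptions: `environments` mimics entries from Elastic Beanstalk's `describe_environments` (`EnvironmentName`, `Health`), and `asg_elbs` is a hypothetical mapping from environment name to the ELB names attached to its ASG.

```python
def sanity_check(environments, asg_elbs, prod_elb="PROD", staging_elb="STAGING"):
    """Return a list of problems; an empty list means it is safe to proceed."""
    problems = []
    if len(environments) != 2:
        problems.append("expected 2 environments, found %d" % len(environments))
    for env in environments:
        if env.get("Health") != "Green":
            problems.append("%s is %s, not Green"
                            % (env["EnvironmentName"], env.get("Health")))
    # Exactly one environment must hold each ELB, and not the same one.
    with_prod = [n for n, elbs in asg_elbs.items() if prod_elb in elbs]
    with_staging = [n for n, elbs in asg_elbs.items() if staging_elb in elbs]
    if len(with_prod) != 1:
        problems.append("PROD ELB attached to %d ASGs" % len(with_prod))
    if len(with_staging) != 1:
        problems.append("STAGING ELB attached to %d ASGs" % len(with_staging))
    if len(with_prod) == 1 and with_prod == with_staging:
        problems.append("PROD and STAGING ELBs are on the same ASG")
    return problems


envs = [
    {"EnvironmentName": "blue", "Health": "Green"},
    {"EnvironmentName": "green", "Health": "Yellow"},
]
attached = {"blue": ["PROD"], "green": ["STAGING"]}
print(sanity_check(envs, attached))
```

Returning every problem at once, instead of stopping at the first, makes the abort message more useful when a deploy is refused.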
Final Solution Steps
1. Enable ELB application sticky cookie policy on PROD ELB (both HTTP and HTTPS if applicable! - avoid users hitting new servers then old)
2. Set PROD ELB Connection Idle Timeout to 20 seconds (to close connection and thwart WebSockets, Long Polling, HTTP keep-alive)
3. Attach PROD ELB to new code environment ASG (loop until complete)
4. Detach PROD ELB from old code environment ASG (loop until complete)
5. Disable ELB application sticky cookie policy on PROD ELB
6. Set PROD ELB Connection Idle Timeout back to 60 seconds
7. Attach STAGING ELB to old code environment ASG (loop until complete)
8. Detach STAGING ELB from new code environment ASG (loop until complete)
9. Flag old code environment for termination (separate script 2 hours later)
10. Flag deployment successful in 3rd party tools/monitoring
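Steps 1 through 8 can be sketched in Python with injected clients. The original is a bash script that is not reproduced in the deck, so everything here is an assumption layered on the listed steps. Method names follow the boto3 `elb` and `autoscaling` clients; the ASG names, the policy name `app-sticky`, and the listener ports are hypothetical. One simplification to note: `set_load_balancer_policies_of_listener` replaces the listener's whole policy set, so this sketch assumes the listeners carry no other policies that need preserving.

```python
import time


def wait_for(autoscaling, asg, elb_name, attached, poll=5, sleep=time.sleep):
    """Loop until the ELB is attached to (or gone from) the ASG."""
    while True:
        lbs = autoscaling.describe_load_balancers(
            AutoScalingGroupName=asg)["LoadBalancers"]
        states = {lb["LoadBalancerName"]: lb["State"] for lb in lbs}
        state = states.get(elb_name)
        # A stricter script might insist on "InService" before moving on.
        if attached and state in ("Added", "InService"):
            return
        if not attached and state in (None, "Removed"):
            return
        sleep(poll)


def swap(elb, autoscaling, new_asg, old_asg, prod="PROD", staging="STAGING",
         policy="app-sticky", ports=(80, 443), sleep=time.sleep):
    def set_policies(names):
        for port in ports:
            elb.set_load_balancer_policies_of_listener(
                LoadBalancerName=prod, LoadBalancerPort=port, PolicyNames=names)

    def set_idle_timeout(seconds):
        elb.modify_load_balancer_attributes(
            LoadBalancerName=prod,
            LoadBalancerAttributes={"ConnectionSettings": {"IdleTimeout": seconds}})

    def attach(name, asg):
        autoscaling.attach_load_balancers(
            AutoScalingGroupName=asg, LoadBalancerNames=[name])
        wait_for(autoscaling, asg, name, attached=True, sleep=sleep)

    def detach(name, asg):
        autoscaling.detach_load_balancers(
            AutoScalingGroupName=asg, LoadBalancerNames=[name])
        wait_for(autoscaling, asg, name, attached=False, sleep=sleep)

    set_policies([policy])    # 1. sticky cookie on, HTTP and HTTPS
    set_idle_timeout(20)      # 2. shake off long-lived connections
    attach(prod, new_asg)     # 3.
    detach(prod, old_asg)     # 4.
    set_policies([])          # 5. sticky cookie off
    set_idle_timeout(60)      # 6.
    attach(staging, old_asg)  # 7.
    detach(staging, new_asg)  # 8.
```

Because the function only takes "new" and "old" ASG names, calling it again with the two arguments reversed performs the rollback. Steps 9 and 10 are left out since they depend on tooling the deck does not describe.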
Rollback if needed is running the same script
Q&A / Thank you!
Always Be Shipping!
Email: [email protected]: alec1a
Slide Deck (posted by Sunday, Oct 4th): http://tinyurl.com/bluegreen2015
LearnBop for Parents: http://go.learnbop.com/amazon-parents