(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. November 13, 2014 | Las Vegas APP307 - Leveraging the Cloud with a Blue-Green Deployment Architecture Jim Plush, Sr. Director of Engineering, CrowdStrike - @jimplush Sean Berry, Principal Software Engineer, CrowdStrike - @schleprachaun

description

Minimizing customer impact is a key feature in successfully rolling out frequent code updates. Learn how to leverage the AWS cloud so you can minimize bug impacts, test your services in isolation with canary data, and easily roll back changes. Learn to love deployments, not fear them, with a blue/green architecture model. This talk walks you through the reasons it works for us and how we set up our AWS infrastructure, including package repositories, Elastic Load Balancing load balancers, Auto Scaling groups, internal tools, and more to help orchestrate the process. Learn to view thousands of servers as resources at your command to help improve your engineering environment, take bigger risks, and not spend weekends firefighting bad deployments.

Transcript of (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Page 1: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

November 13, 2014 | Las Vegas

APP307 - Leveraging the Cloud with a Blue-Green Deployment Architecture

Jim Plush, Sr. Director of Engineering, CrowdStrike - @jimplush

Sean Berry, Principal Software Engineer, CrowdStrike - @schleprachaun

Page 2: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

About us

Page 3: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Cybersecurity startup

• Founded in September 2011

• ~150 employees

• Detection/prevention

– Advanced cyber threats

– Real-time detection

– Real-time analytics

Page 4: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Published experts

Page 5: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Event Stream Processing

Sensor

Targeted Malicious

Malware

The “CLOUD”

{"date":"11/14/2014 08:03", "path": "C:\WINDOWS\Programs\Word.exe", "id": 49, "parentId": 48}

{"date":"11/14/2014 08:03", "path": "C:\WINDOWS\System32\cmd.exe", "id": 50, "parentId": 49}

{"date":"11/14/2014 08:03", "path": "C:\WINDOWS\Programs\Word.exe", "id": 51, "parentId": 50}

DNS Lookup

{"date":"11/14/2014 08:03", "dns": "badapple.cc", "id": 52, "parentId": 51}

TCP Connect

{"date":"11/14/2014 08:03", "tcp_connect": "10.10.10.10", "id": 53, "parentId": 51}

FTP Download

{"date":"11/14/2014 08:03", "download": "10.10.10.10/badstuff.exe", "id": 54, "parentId": 51}

Document Exfiltration

{"date":"11/14/2014 08:03", "scp": "C:\Documents\TradeSecrets.doc", "id": 55, "parentId": 54}

Page 6: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014
Page 7: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Tactical UI

Page 8: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

[Architecture diagram. Components shown: Sensors; external-service Elastic Load Balancing load balancer; Termination servers; Content Routers; Processor 1 and Processor 2; Data ingestion; UI; API. Data plane: Kafka, DynamoDB, Redis, Amazon RDS, Amazon Redshift, Amazon Glacier, Amazon S3.]

Page 9: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

High scale, big data

• Fortune 500, Think Tanks, Non-Profits

• 100K+ events per second

– Expected to hit 500K EPS by end of 2015

• Each enterprise customer can generate 2-4 TB of data per day

• Microservice architecture

• Polyglot environment

Page 10: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Our tech stack is complicated

Page 11: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

…but possible because of AWS

Page 12: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Motivation

Page 13: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Solving for the problems

• OMG, all servers need to be patched??

• I’m afraid to restart that service; it’s been running for 2 years

• Large rolling restarts

• Deployment fear

– Friday night deploys

• B/G for event processing?

Page 14: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Our primary objectives for deployments

• Minimize customer impact

– Customers should have no indication that anything has changed

• Maximize engineers’ weekends

– Avoid burnout

• Reduce dependencies of rollouts

– Everything goes out together: 50+ services, 1000+ VMs

Page 15: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Leveraging AWS

• Programmable data centers

• Nodes are ephemeral

• It should be easier to re-create an environment than to fix it

— Think like the cloud

Page 16: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

What is blue-green?

[Diagram: a router in front of two parallel stacks, Application v1 and Application v2, each with its own web server and app server, and both using a shared database.]

Page 17: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

What is blue-green?

• Full cluster BG

– Everything goes out together

– Indiana Jones: “idol switch”

• App-based BG

– Each app or team controls their own blue-green deployments

Page 18: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

The data plane: can’t blue-green all the things

[Diagram: Blue cluster and Green cluster above a single data plane: Kafka, DynamoDB, Redis, Amazon RDS (pgsql), Amazon Redshift, Amazon Glacier, Amazon S3.]

Page 19: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

When do we deploy?

• Teams deploy end-of-sprint releases together

• Hot-fixes/upgrades are performed frequently via rolling-restart deployments

• Early on, deployments took an entire day

– Lack of automation

• Deploys today generally take 45 minutes

– Everyone has run a deployment

Page 20: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Sustaining engineer

• Every team member, including QA, has run deployments

• Builds confidence, understanding, and redundancy

• Ensures documentation is up to date and everything that can be automated is automated

Sustaining engineer badge-of-honor shirt after their tour of duty

Page 21: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Deployment day

• Apt repo synchronized and locked down

• Data plane migrations applied

• “Green” cluster is launched (1000s of machines)

• IT tests run

• Canary customers

• Logging and error checks

• Active-active

• “Blue” marked as inactive, decommissioned

Page 22: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Keys to success

Pro tip: It’s not just flipping load balancers

Page 23: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Keys to success

Automate all the things

• Junior devs should be able to run your deploy system

Page 24: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Keys to success

Instrumentation & Metrics

https://github.com/codahale/metrics

https://github.com/rcrowley/go-metrics

Page 25: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Keys to success

Use a provisioning system

• Chef

• Puppet

• Salt

• baked AMIs

Page 26: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Keys to success

Live integration / regression test suites

[Diagram: the Test sends deterministic input values to the System, then verifies the processed state.]
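The send-then-verify loop on this slide can be sketched roughly as follows. This is a minimal illustration, not the speakers' test suite: `fake_system`, `verify`, and `CASES` are hypothetical names, and the "system" here is a local function standing in for a deployed pipeline.

```python
def fake_system(event):
    """Hypothetical stand-in for the deployed system under test."""
    return {
        "id": event["id"],
        "parent": event["parentId"],
        "exe": event["path"].split("\\")[-1].lower(),
    }


def verify(system, cases):
    """Send each deterministic input and compare the processed state.

    Returns a list of (event id, expected, actual) for every mismatch.
    """
    failures = []
    for event, expected in cases:
        actual = system(event)
        if actual != expected:
            failures.append((event["id"], expected, actual))
    return failures


# Deterministic input paired with the state we expect afterwards.
CASES = [
    (
        {"id": 50, "parentId": 49, "path": "C:\\WINDOWS\\System32\\cmd.exe"},
        {"id": 50, "parent": 49, "exe": "cmd.exe"},
    ),
]

failures = verify(fake_system, CASES)
```

Because the inputs are fixed, any non-empty `failures` list points at a behavior change in the new code rather than at noisy data.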

Page 27: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Keys to success

Canary Customers

V1 App V2 App

Page 28: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Keys to success

Feature Flags

Page 29: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Keys to success

Unified app requirements

Page 30: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Keys to success

Deployment History

Page 31: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

“Thank God we have blue-green”

– every team member

Page 32: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Implementation

Page 33: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

How we blue-green

Page 34: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Elevator pitch on Kafka

• Distributed commit log

• Similar to a message queue

• Allows for replaying messages from earlier in the stream in case of failure
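The replay property is the part that matters for blue-green. A toy in-memory model shows the idea; this is not Kafka's API, and `CommitLog`, `append`, and `read_from` are names made up for this sketch.

```python
class CommitLog:
    """Toy append-only log: messages stay in place and are read by offset."""

    def __init__(self):
        self._entries = []

    def append(self, message):
        """Store a message and return the offset it was written at."""
        self._entries.append(message)
        return len(self._entries) - 1

    def read_from(self, offset):
        """Return every message at or after the given offset."""
        return list(self._entries[offset:])


log = CommitLog()
for message in ["event-1", "event-2", "event-3"]:
    log.append(message)

# A consumer that failed after handling offset 0 can resume at offset 1.
replayed = log.read_from(1)
```

Unlike a queue that deletes a message once it is delivered, reading here does not consume anything, so a second cluster can read the same stream independently.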

Page 35: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

It all starts with a running cluster

• Blue is running; normal operation

• Content Routers are writing to the “active” topics in Kafka

• Blue processors read from the “active” topics

[Diagram: Sensors, external-service ELB load balancer, four Termination servers, two Content Routers, Active topics, and Processors 1-4. Data plane: Kafka, DynamoDB, Redis, Amazon RDS, Amazon Redshift, Amazon Glacier, Amazon S3.]

Page 36: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Main management page for blue-green

Page 37: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Launching new cluster

• Green cluster is launched

• Termination servers are kept out of the ELB load balancer by failing health checks

• Content Routers write to the “active” topics

• Processors in green read from the “inactive” topics

[Diagram: blue and green clusters side by side, each with Termination servers, Content Routers, and Processors 1-4; Sensors behind the external-service ELB load balancer; Active topics and an Inactive topic. Data plane: Kafka, DynamoDB, Redis, Amazon RDS, Amazon Redshift, Amazon Glacier, Amazon S3.]

Page 38: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Sizing the new cluster

Page 39: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Getting the size right

• Sizing of our Auto Scaling groups is determined programmatically

– Admin page allows for setting min / max

– Script determines appropriate desired-capacity based on running cluster

• Launching is then as simple as updating the Auto Scaling groups to the new sizes

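from subprocess import PIPE, Popen


def current_counts(region='us-east-1'):
    # Ask the Auto Scaling command line tools for every group in the region.
    # The command is a list because shell=False does not split a string.
    proc = Popen(
        ["as-describe-auto-scaling-groups",
         "--region", region,
         "--max-records=600"],
        shell=False, stdout=PIPE, stderr=PIPE,
        universal_newlines=True)  # decode output to text
    out, err = proc.communicate()
    if err:
        raise Exception(err)
    counts = {}
    for line in out.splitlines():
        # Only the group summary rows carry the sizes.
        if "AUTO-SCALING-GROUP" not in line:
            continue
        parts = line.split()
        group = parts[1]
        current = parts[-2]  # column the slide reads as the current size
        counts[group] = int(current)
    return counts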
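The slide does not show how the desired capacity is derived from those counts. One plausible reading, sketched below, is to start from the running size of each group and clamp it to the min / max set on the admin page. The function name `plan_green_sizes`, the group names, and the clamping rule itself are assumptions made for illustration.

```python
def plan_green_sizes(blue_counts, limits):
    """Pick a desired capacity per group for the new cluster.

    blue_counts: {group name: instances currently running}
    limits:      {group name: (min, max)} as set by an operator
    Groups without limits simply mirror the running size.
    """
    plan = {}
    for group, running in blue_counts.items():
        low, high = limits.get(group, (running, running))
        plan[group] = max(low, min(running, high))
    return plan


# Hypothetical group names and sizes.
blue_counts = {"processor-1": 40, "content-router": 6, "termination": 12}
limits = {"processor-1": (10, 30), "content-router": (8, 20)}

plan = plan_green_sizes(blue_counts, limits)
```

Here `processor-1` is capped at its maximum, `content-router` is raised to its minimum, and `termination` mirrors the running cluster.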

Page 40: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Tuning size before we launch

Page 41: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Bootstrapping

Page 42: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

User data and Chef get things rolling

• Inside out Chef bootstrapping

– Didn’t feel comfortable running `wget … | bash`

• Custom version of Chef installer

– Version of Chef

– Where to find the Chef servers

– Which role to run

– Which environment (dev, integ, blue, green)
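The four inputs listed above could be handed to an instance as a small structured payload in its user data. The sketch below only builds and validates such a payload; it does not invoke Chef, and the function name, field names, and example values are all hypothetical.

```python
import json

# Environments named on the slide.
ENVIRONMENTS = {"dev", "integ", "blue", "green"}


def build_bootstrap_config(chef_version, chef_server_url, role, environment):
    """Serialize the bootstrap inputs as JSON for instance user data."""
    if environment not in ENVIRONMENTS:
        raise ValueError("unknown environment: %r" % (environment,))
    return json.dumps(
        {
            "chef_version": chef_version,
            "chef_server_url": chef_server_url,
            "role": role,
            "environment": environment,
        },
        sort_keys=True,
    )


# Placeholder values; the real version, URL, and role are not in the slides.
user_data = build_bootstrap_config(
    "X.Y.Z", "https://chef.example.internal", "processor", "green")
```

Rejecting an unknown environment at build time keeps a typo from launching machines that register as neither blue nor green.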

Page 43: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Testing the new stuff

• Test customer(s) are canaried

• Integration test suite is run by connecting to a termination server directly

• Tests pass; then we canary real customers

[Diagram: blue and green clusters, each with four Termination servers, two Content Routers, and Processors 1-4; Sensors behind the external-service ELB load balancer; integration tests attached to the Termination servers; Active topics and an Inactive topic. Data plane: Kafka, DynamoDB, Redis, Amazon RDS, Amazon Redshift, Amazon Glacier, Amazon S3.]

Page 44: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Canary customers

Page 45: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Canary customers

• Canary information is stored in ZooKeeper

• Fortunately we dogfood our own tech

• This affords us the ability to use ourselves as canaries for new code

• The inactive processing cluster is set to read from the .inactive topics

• The standard Kafka topics with .inactive appended

• The ingestion layer has a watcher on that znode and routes any canaried customer to the .inactive topics

• Ex. regular traffic goes to foo.bar, canary traffic goes to foo.bar.inactive

• When we are ready to test real traffic, we mark several customers as canaries and start the monitoring process to determine any issues
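The routing rule at the ingestion layer reduces to a lookup plus a suffix. A minimal sketch, assuming the canary set has already been loaded from the znode into memory; `topic_for` and the customer IDs are illustrative names.

```python
INACTIVE_SUFFIX = ".inactive"


def topic_for(customer_id, base_topic, canaries):
    """Return the Kafka topic an event should be written to.

    Canaried customers go to the .inactive variant, which the new
    cluster reads; everyone else stays on the standard topic.
    """
    if customer_id in canaries:
        return base_topic + INACTIVE_SUFFIX
    return base_topic


# Hypothetical canary set, as it might look after reading the znode.
canaries = {"customer-123", "customer-456"}

regular = topic_for("customer-789", "foo.bar", canaries)
canary = topic_for("customer-123", "foo.bar", canaries)
```

Flipping a customer back is just removing them from the set, which is why canarying can be undone without touching either cluster.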

Page 46: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Canary customers

[Diagram: Sensors, external-service ELB load balancer, Event ingestor, Kafka. Regular traffic goes to the Active topics, read by the Blue Processors. Canary traffic (Customer 123, Customer 456) goes to the Inactive topics, read by the Green Processors.]

Page 47: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Let’s canary some customers

Page 48: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

That was easy

Page 49: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Testing

Page 50: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

IT tests run

• Integration tests are run

– ~3000 tests in total

– Test customer must be “canaried”

• If any tests fail, we triage and determine if it is still possible to move forward

• Testing is only done when we are passing 100%: no exceptions!

Page 51: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Sean is mad - we have work to do

Page 52: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Sean is happy - so we all are happy

Page 53: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Trust, but verify!

• Monitor green services

• Verify health of the cluster by inspecting graphical data and log outputs

• Rerun tests with load

[Diagram: Sensors, external-service ELB load balancer, four Termination servers, two Content Routers, two sets of Processors 1-4, Active topics and Inactive topics. Data plane: Kafka, DynamoDB, Redis, Amazon RDS, Amazon Redshift, Amazon Glacier, Amazon S3.]

Page 54: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Monitoring

Page 55: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Logging and error checking

• Every server forwards its relevant logs to Splunk

• Several dashboards have been set up with common things to watch for

• Raw logs are streamed in near real-time and we watch specifically for log-level ERROR

• This is one of our most important steps, as it gives us the most insight into the health of the system as a whole
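Watching for log-level ERROR can be approximated locally with a small filter. This is not a Splunk query; it is a sketch that assumes a made-up line format of `<host> <LEVEL> <message>`, and `count_errors` and the host names are hypothetical.

```python
from collections import Counter


def count_errors(lines):
    """Count ERROR lines per host, assuming '<host> <LEVEL> <message>'."""
    errors = Counter()
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[1] == "ERROR":
            errors[parts[0]] += 1
    return errors


sample = [
    "green-proc-1 INFO started",
    "green-proc-1 ERROR decode failed",
    "green-proc-2 ERROR upstream timeout",
    "green-proc-1 ERROR decode failed",
]

errors = count_errors(sample)
```

Grouping by host makes it easier to tell one bad machine from a fault in the new code that shows up everywhere.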

Page 56: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Logging / Error Checking

Page 57: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Moving customers over

• Flip all customers back away from canary

• Activate green cluster

• Event processors and consuming services in blue and green now write to and consume the “active” topics

• We are in a state of active-active for a few minutes

[Diagram: Sensors behind the external-service ELB load balancer; blue and green clusters, each with four Termination servers, two Content Routers, and Processors 1-4, all on Active topics. Data plane: Kafka, DynamoDB, Redis, Amazon RDS, Amazon Redshift, Amazon Glacier, Amazon S3.]

Page 58: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Active-active

Each node in the data processing layer has a watcher on a particular znode, which tells the environment whether it is active (use standard Kafka topics) or inactive (append .inactive to the topics).

[Diagram: Ingestion, Kafka, an Active topic and an Inactive topic, and two sets of Processors 1-4.]
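On the processor side, the watcher only needs to flip one flag and recompute topic names. The sketch below models that without a ZooKeeper client: `on_znode_change` stands in for whatever callback a real watch would invoke, and the `b"active"` payload is an assumed encoding, not something the slides specify.

```python
class EnvironmentState:
    """Tracks whether this cluster is active and which topics it reads."""

    def __init__(self, base_topics, active=False):
        self.base_topics = list(base_topics)
        self.active = active

    def on_znode_change(self, value):
        """Stand-in for a watch callback; payload format is an assumption."""
        self.active = value == b"active"

    def topics(self):
        """Standard topics when active, '.inactive' variants otherwise."""
        if self.active:
            return list(self.base_topics)
        return [topic + ".inactive" for topic in self.base_topics]


green = EnvironmentState(["foo.bar"])
before = green.topics()

green.on_znode_change(b"active")
after = green.topics()
```

The same object covers both directions: sending `b"inactive"` to blue later moves it onto the `.inactive` topics to drain.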

Page 59: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Active-active

When we are ready to make the switch, we start by making the new cluster active and enter an active-active state where both processing clusters are doing work.

Green, switch to active!

This is where it is paramount that code is forward compatible, since two different code bases will be doing work simultaneously.

[Diagram: Ingestion, Kafka, Active topics and an Inactive topic, and two sets of Processors 1-4.]

Page 60: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Active-active

However, blue and green are fully partitioned and there is no intercommunication between the clusters. This allows for things like changes in serialization for inter-service communication.

[Diagram: Ingestion, Kafka, Active topics, and two sets of Processors 1-4.]

Page 61: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Flipping the switch

• We deactivate Blue, which forces Termination Servers in Blue to fail health checks and all Blue sensors disconnect

• Blue processors switch to read from the “inactive” topic

• Once all consumers of the “inactive” topic have caught up to the head of the stream, Blue can be decommissioned

[Diagram: Sensors behind the external-service ELB load balancer; blue and green clusters, each with four Termination servers, two Content Routers, and Processors 1-4; Active topics and one Inactive topic. Data plane: Kafka, DynamoDB, Redis, Amazon RDS, Amazon Redshift, Amazon Glacier, Amazon S3.]
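The "caught up to the head of the stream" condition is a per-partition offset comparison. A minimal sketch, assuming the head and committed offsets have already been fetched into plain dictionaries; the function names and the numbers are illustrative.

```python
def lagging_partitions(head_offsets, committed_offsets):
    """Return {partition: messages still unread} for partitions behind head."""
    lag = {}
    for partition, head in head_offsets.items():
        committed = committed_offsets.get(partition, 0)
        if committed < head:
            lag[partition] = head - committed
    return lag


def safe_to_decommission(head_offsets, committed_offsets):
    """Blue can go away only when no partition has unread messages."""
    return not lagging_partitions(head_offsets, committed_offsets)


head = {0: 100, 1: 250}
still_draining = lagging_partitions(head, {0: 100, 1: 240})
drained = safe_to_decommission(head, {0: 100, 1: 250})
```

Treating a partition with no committed offset as fully behind errs on the side of keeping Blue alive a little longer rather than dropping data.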

Page 62: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Out with the old…

• Green is now the active cluster

• If we need to roll back code, we have a snapshot of the repository in Amazon S3

• We haven’t had to roll back code… yet

[Diagram: Sensors, external-service ELB load balancer, four Termination servers, two Content Routers, Active topics, and Processors 1-4. Data plane: Kafka, DynamoDB, Redis, Amazon RDS, Amazon Redshift, Amazon Glacier, Amazon S3.]

Page 63: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Easing the pain

Page 64: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Bootstrapping faster

Page 65: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Half-baked AMIs

We use a process to create “half-baked” AMIs, which speed up deployments

• JVM (for our Scala code base)

• Common tools and configurations

• Latest updates to make sure patches are up to date

• Build plan is run twice daily

[Diagram: half-baked AMI, Amazon S3, and Auto Scaling groups with Blue servers and Green servers.]

Page 66: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Getting code ready

Page 67: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

How code graduates - Development

[Flow diagram: Commit on main; Development apt repo; Auto deploy changed roles; Development cluster.]

Page 68: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

How code graduates - Production

[Flow diagram: Create release-X.X.X or hotfix-X.X.X branches; Sync specified packages for integ; Integration apt repo; Integration cluster; Same exact binary; Production apt repo; New production cluster.]

Page 69: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Choosing what goes out

Page 70: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Viewing Debian details

Page 71: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Integration is synced

Page 72: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Integration is synced

Page 73: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Production is synced from Integ

Page 74: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Updating the data plane

Page 75: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Data plane migrations

• Migrations applied to the database are forward only

• We have past experience with two-way migrations, but the costs outweigh the benefits

• Code must be forward compatible in case rollbacks are necessary

• Database schemas are only modified via migrations, even in development and integration environments

• We use an in-house migration service (based on Flyway) to parallelize the process
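Forward-only means the runner never needs a "down" path: it applies every version above the current one, in order. The sketch below shows that rule in isolation. It is not Flyway's API and does no parallelization; the function names and migration names are invented for the example.

```python
def pending_migrations(applied_version, migrations):
    """Return (version, name) pairs newer than applied_version, in order."""
    return [
        (version, migrations[version])
        for version in sorted(migrations)
        if version > applied_version
    ]


def migrate(applied_version, migrations, apply):
    """Apply pending migrations in order; there is no downgrade path."""
    for version, name in pending_migrations(applied_version, migrations):
        apply(version, name)
        applied_version = version
    return applied_version


# Hypothetical migration set; the schema is currently at version 1.
migrations = {1: "create_events", 2: "add_parent_id", 3: "add_dns_index"}
applied = []

new_version = migrate(1, migrations, lambda version, name: applied.append(version))
```

Since the schema only moves forward, a code rollback leaves the old code running against the newer schema, which is why the previous release has to tolerate it.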

Page 76: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Final Thoughts

• Blue-green deployments can be done in many ways

• Our requirement of never losing customer data made this the best solution for us

• The automation and tooling around our deployment system were built over many months and were a lot of work (built by 2 people – Hi Dennis!)

• But it is completely worth it, knowing we have a very reliable, fault-tolerant system

Page 77: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

Thank you

Page 78: (APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS re:Invent 2014

http://bit.ly/awsevals

Jim: @jimplush

Sean: @schleprachaun