Zero to Production in Crazy Time: Adobe’s Transformation

Zero to Prod in Crazy Time John Martinez | Adobe Cloud Services

description

Adobe has quickly scaled from nothing to a huge presence in the AWS cloud. This is the story from the trenches: how we screwed up, learned and evolved our use of Chef to help get us to today. Taming Chef to work in the AWS cloud while trying to build a platform at a large scale was not as easy as we originally planned, and we’re consistently trying to make it better. We’ll share some tips and tricks from our experience.

Transcript of Zero to Production in Crazy Time: Adobe’s Transformation

Page 1: Zero to Production in Crazy Time: Adobe’s Transformation

Zero to Prod in Crazy Time

John Martinez | Adobe Cloud Services

Page 2: Zero to Production in Crazy Time: Adobe’s Transformation

About Me

• Currently working as a Cloud Operations Engineer at Adobe

• I get to figure out new stuff, and make really old stuff work in AWS

• 20+ years doing UNIX/Linux work

• Learned about cloud computing at Netflix

• Working at Adobe feeds my habit - photography

Page 3: Zero to Production in Crazy Time: Adobe’s Transformation

About Ops People

Some people see us as ninjas; I really see us as Storm Troopers

Page 4: Zero to Production in Crazy Time: Adobe’s Transformation

Cloud Platforms @ Adobe

• Creative Cloud

• Marketing Cloud

• Digital Publishing Suite

• PhoneGap

• Typekit

• Acrobat.com

• EchoSign

• Revel

• ...and growing...

Page 5: Zero to Production in Crazy Time: Adobe’s Transformation

How We Got Started

• Creative Cloud went live in late April 2012

• AWS from the start

• We needed to do SOMETHING

• Yes, it was really that scientific of a decision

• Chef vs. Puppet

• That learning curve

Page 6: Zero to Production in Crazy Time: Adobe’s Transformation

#EPICFAIL #1

• Not socializing the need for Chef to the dev team

• Once sold, keep momentum going

• The “let’s make this more complicated than it needs to be” syndrome

• Start with easy stuff first, then graduate

• Ops guy admits: the dev people know how to use software engineering methods for creating and maintaining infrastructure code: USE IT

Page 7: Zero to Production in Crazy Time: Adobe’s Transformation

Tweaking Knobs

• EC2 AMIs: bake or configure?

• Baking positive: fast boot times

• Baking negative: too static

• Configure positive: very dynamic

• Configure negative: can take forever to boot

• We settled on a mostly dynamic configuration, with some static baking

• knife-ec2 is great, but what about autoscale?

• The CloudFormation connection

Page 8: Zero to Production in Crazy Time: Adobe’s Transformation

#EPICFAIL #2

• Get Chef, don’t actually use it

• Back to that learning curve (Hint: Training)

• Issue with compressed timelines and small staff

• In the heat of deploying prod, doing stupid things

• Losing track of what got deployed where

• Who’s doing what?

• Not sleeping sucks

Page 9: Zero to Production in Crazy Time: Adobe’s Transformation

Out of the Rubble

• Now that we’re live: refactor time (a.k.a. Fix all the broken stuff)

• Chef development for reals

• OMG: WINDOWS?!?!

• Not a lot of expertise in-house or outside

• Ops guy admits: learned to love dev tools like Jenkins and Git

Page 10: Zero to Production in Crazy Time: Adobe’s Transformation

It’s Alive!

• Did it gradually over time

• Started with simple recipes, graduated to more complicated ones

• Using Environments to deploy the right thing in the right place

• It’s AWS, stupid: you SHOULD kill your instances

• CloudFormation to AutoScale to Chef Client
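
The chain in the last bullet (CloudFormation creates an Auto Scaling group, whose instances run Chef client at boot) can be sketched as a generated template. This is a minimal illustration in Python, not the templates Adobe actually used: the resource names, AMI ID, instance type, sizes and the one-line boot script are all assumptions, and it presumes an AMI with Chef client already baked in.

```python
import json


def build_template(ami_id, instance_type, zones, user_data):
    """Return a CloudFormation template: one launch configuration plus one Auto Scaling group."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Hypothetical logical name; describes how each new instance is launched.
            "AppLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": ami_id,
                    "InstanceType": instance_type,
                    # Fn::Base64 makes CloudFormation encode the script for EC2.
                    "UserData": {"Fn::Base64": user_data},
                },
            },
            # Hypothetical logical name; the group replaces instances that get killed.
            "AppGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "AvailabilityZones": zones,
                    "LaunchConfigurationName": {"Ref": "AppLaunchConfig"},
                    "MinSize": "2",
                    "MaxSize": "4",
                },
            },
        },
    }


# Assumed first-boot script: run Chef client in a named Chef environment.
USER_DATA = "#!/bin/bash\nchef-client --environment production\n"

template = build_template(
    "ami-00000000",  # placeholder AMI ID
    "m1.small",
    ["us-east-1a", "us-east-1b"],
    USER_DATA,
)
print(json.dumps(template, indent=2))
```

Because every instance the group launches runs the same boot script, a killed instance comes back configured the same way, which is what makes "you SHOULD kill your instances" safe.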

Page 11: Zero to Production in Crazy Time: Adobe’s Transformation

It’s Alive (v1)

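[Diagram; labels recovered from the slide]

• Components: EC2 instances, S3 bucket (validator key), CloudFormation, Auto Scale group, Hosted Chef

• Step 0 (manual): editor (vi), Perforce, cfn-create-stack

• Step 1: knife upload of cookbooks, environments, roles and data bags

• Steps 2 and 3: numbered in the diagram without text labels

• Step 4: Chef client bootstrap (data bag key, recipes)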

Page 12: Zero to Production in Crazy Time: Adobe’s Transformation

More Automation (v2)

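[Diagram; labels recovered from the slide]

• Components: EC2 instances, S3 bucket (validator key), CloudFormation, Auto Scale group, Hosted Chef

• Step 0 (automated): Git, Jenkins, Jenkins CFN

• Step 1: knife upload of cookbooks, environments, roles and data bags

• Steps 2 and 3: numbered in the diagram without text labels

• Step 4: Chef client bootstrap (data bag key, recipes)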

Page 13: Zero to Production in Crazy Time: Adobe’s Transformation

On Bootstrapping EC2 Instances

• Biggest issue with Chef in AWS: straying from knife-ec2

• Read the bootstrap document and reverse engineer it

• http://wiki.opscode.com/display/chef/Client+Bootstrap+Fast+Start+Guide

• http://wiki.opscode.com/display/chef/EC2+Bootstrap+Fast+Start+Guide

• user-data is your friend

• Use it for node identity

• Resist the devil: don’t send any API keys or passwords or embarrassing things via user-data!!!

• Windows works this way, too, but learn PowerShell
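
As a rough illustration of the user-data advice above, the sketch below builds a Linux boot script that carries only node identity (role and environment) and refuses to produce output that looks like it contains credentials. It is not taken from the talk: the function names, the bucket layout, the `validation.pem` object name, the use of the AWS CLI to fetch the key, and the secret-detection patterns are all assumptions, and the patterns are far from exhaustive.

```python
import json
import re

# Illustrative patterns for things that should never travel in user-data.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)(password|secret)\s*[=:]"),        # inline credentials
]


def find_secrets(text):
    """Return the patterns that match, so callers can refuse to launch."""
    return [p.pattern for p in SECRET_PATTERNS if p.search(text)]


def build_user_data(role, environment, bucket, chef_server_url, validator_name):
    """Build a boot script that registers the node with Chef using identity only."""
    first_boot = json.dumps({"run_list": ["role[{}]".format(role)]})
    lines = [
        "#!/bin/bash",
        "set -e",
        "# Node identity comes from instance metadata.",
        "INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)",
        "mkdir -p /etc/chef",
        "# The validator key is fetched from S3 via the instance's IAM role.",
        "aws s3 cp s3://{}/validation.pem /etc/chef/validation.pem".format(bucket),
        "cat > /etc/chef/client.rb <<EOF",
        'chef_server_url "{}"'.format(chef_server_url),
        'validation_client_name "{}"'.format(validator_name),
        'node_name "{}-$INSTANCE_ID"'.format(role),
        'environment "{}"'.format(environment),
        "EOF",
        "echo '{}' > /etc/chef/first-boot.json".format(first_boot),
        "chef-client -j /etc/chef/first-boot.json",
    ]
    script = "\n".join(lines) + "\n"
    leaks = find_secrets(script)
    if leaks:
        raise ValueError("refusing to build user-data, matched: {}".format(leaks))
    return script


print(build_user_data(
    "web",
    "production",
    "example-chef-bootstrap",                           # placeholder bucket
    "https://chef.example.com/organizations/example",   # placeholder URL
    "example-validator",
))
```

The point of the design is that anyone who can read user-data learns only which role and environment the node belongs to; the key itself stays in S3 behind an IAM policy.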

Page 14: Zero to Production in Crazy Time: Adobe’s Transformation

#EPICFAIL #3

Oh crap, Opscode is DOWN!!!

Page 15: Zero to Production in Crazy Time: Adobe’s Transformation

#EPICFAIL #3

• Failing to architect for failure (double BAM)

• Even though we built a hot AWS architecture, we still got bit

• What does it mean when Hosted Chef is down for us?

• Talk to Opscode...really, talk to them, they want to help
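
One way to think about "what happens when Hosted Chef is down" is to let a booting node choose from an ordered list of Chef servers instead of depending on a single one. The sketch below is purely illustrative and is not described in the talk: the function names and URLs are hypothetical, and an open TCP port is only a crude stand-in for a real health check.

```python
import socket
from urllib.parse import urlsplit


def tcp_probe(url, timeout=3.0):
    """Crude health check: can a TCP connection be opened to the server's port?"""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_chef_server(candidates, is_healthy=tcp_probe):
    """Return the first candidate that passes the health check, or None."""
    for url in candidates:
        if is_healthy(url):
            return url
    return None


# Demonstration with a fake probe so no network access is needed.
servers = ["https://hosted.example.com", "https://private.example.com"]
down = {"https://hosted.example.com"}
chosen = pick_chef_server(servers, lambda url: url not in down)
print(chosen)
```

Returning `None` when nothing is reachable forces the caller to decide explicitly what a node should do with no Chef server at all, which is the question the slide asks.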

Page 16: Zero to Production in Crazy Time: Adobe’s Transformation

How We’re Trying to Improve

• Mostly around availability

• Augment Hosted Chef with Private Chef

• Mostly around security

• Use the tools at your disposal

• IAM policies for EC2 roles and S3 bucket security

• Mostly around performance

• Refactoring AWS-related code to use AWS SDK for Ruby

• AMI factory from base Amazon Linux or Ubuntu AMIs (bonus points for Windows)
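
For the IAM and S3 bullet above, a least-privilege policy for an EC2 role might look like the sketch below: the instance may read one object (the Chef validator key) and nothing else. The bucket and key names are placeholders and the helper name is hypothetical; this shows the shape of such a policy rather than Adobe's actual configuration.

```python
import json


def validator_key_policy(bucket, key):
    """Build an IAM policy document that allows reading a single S3 object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Scope the grant to one object, not the whole bucket.
                "Resource": ["arn:aws:s3:::{}/{}".format(bucket, key)],
            }
        ],
    }


policy = validator_key_policy("example-chef-bootstrap", "validation.pem")
print(json.dumps(policy, indent=2))
```

Attached to the role in an instance profile, a policy like this lets the boot script fetch the key without any credentials appearing in user-data.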

Page 17: Zero to Production in Crazy Time: Adobe’s Transformation

The End

• Operational scripts, template examples and other bits

• https://github.com/Adobe-CloudOps

• Contact me:

• @johnmartinez


• Questions? Suggestions? Come talk to me after!