Cloud Operations Bootcamp: Culture - Jesse Robbins

Posted on 10-May-2015

Transcript of Cloud Operations Bootcamp: Culture - Jesse Robbins

Speaker: Jesse Robbins, CEO

‣ jesse@opscode.com
‣ @jesserobbins
‣ www.opscode.com

Operations Culture

Today

‣ Operations is Culture
‣ Failure Happens
‣ The OODA Loop
‣ Do Fire Drills

Operations is Culture


“You don’t choose the moment, the moment chooses you. You only get to choose how prepared you are when it does.” - Fire Chief Mike Burtch

Cloud Operations is the ability to consistently create and deploy reliable software to an unreliable platform that scales horizontally.

http://radar.oreilly.com/2007/10/operations-is-a-competitive-ad.html

“It’s not my code, it’s your machines!”

Spock
‣ Little bit weird
‣ Sits closer to the boss
‣ Thinks too hard

Scotty
‣ Pulls levers & turns knobs
‣ Easily excited
‣ Yells a lot in emergencies

http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr

Copyright © 2010 Opscode, Inc - All Rights Reserved

No fingerpointing

http://www.flickr.com/photos/rocketjim54/2955889085/
http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr

Fingerpointyness

Over time, after "problem!!! argggh!":
‣ freaking out, not talking, finding fault
‣ blaming, covering ass
‣ whining, hiding, hurt egos
‣ figuring it out
‣ fixing things
‣ fixed

http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr

Being productive

Over time, after "problem!!! argggh!":
‣ figuring it out
‣ fixing things
‣ fixed
‣ feeling guilty
‣ move on with life

http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr

This will be on the test: FAILURE HAPPENS!

Good Book!

"Catastrophic Potential" adapted from Normal Accidents by Charles Perrow

[Chart: Catastrophic Potential. Axes: Complexity (Simple to Complex) and Coupling (Loose to Tight). Created by Jesse Robbins.]

KEEP OUT!!!

define:Nines (roughly), downtime per year

99%         5,256 min (3.65 days)
99.9%       526 min (8.8 hours)
99.99%      53 min
99.999%     5 min
99.9999%    30 seconds
99.99999%   3 seconds
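The table is plain arithmetic: a non-leap year has 365 × 24 × 60 = 525,600 minutes, and the downtime budget is that figure multiplied by (1 - availability). A minimal Python sketch of the calculation, assuming a 365-day year; the function name `downtime_minutes_per_year` is just an illustrative choice:

```python
# A non-leap year: 365 days * 24 hours * 60 minutes = 525,600 minutes.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of downtime per year allowed by an availability given in percent."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.0, 99.9, 99.99, 99.999, 99.9999, 99.99999):
    minutes = downtime_minutes_per_year(pct)
    # Show seconds too, since the higher nines leave well under a minute.
    print(f"{pct}%: {minutes:,.1f} min ({minutes * 60:,.0f} s)")
```

Each extra nine divides the downtime budget by ten, which is why the last rows shrink from minutes to seconds.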

99.9% × 99.9% × 99.9% ≈ 99.7%
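Read as availability in series: if a request depends on three components that are each up 99.9% of the time, and (an assumption) they fail independently, the whole path is up only about 0.999³ ≈ 99.7% of the time. A minimal Python sketch; `serial_availability` is a hypothetical helper name:

```python
from math import prod


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up.

    Inputs are fractions in [0, 1]. Assumes component failures are independent,
    which shared power, networks, or providers can easily violate.
    """
    return prod(components)


overall = serial_availability(0.999, 0.999, 0.999)
print(f"{overall:.2%}")  # prints 99.70%
```

Because every factor is below 1, each additional serial dependency can only lower the product.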

Internet Routing... won’t.

http://radar.oreilly.com/2008/10/sprint-blocking-cogent-network.html

#googlefail

YOU

Continuous Power... isn’t

365 364.96 Main SF

http://radar.oreilly.com/2007/07/failure-happens-a-summary-of-t.html

Failure happens

A single datacenter is the problem
• Since they all fail at some point

Recovery procedures after failure
• Power was gone ~45 minutes
• Most services took hours to come back
• Some unnamed ones more than 12 hours

Geography is a Single Point of Failure

Providers are baskets too.

Failure Happens.

Anyone promising otherwise is either foolish or lying (or both).

OODA: Observe, Orient, Decide, Act

http://en.wikipedia.org/wiki/OODA_loop

http://www.flickr.com/photos/dnorman/2678090600
http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
