DevOps Incident Handling - Making friends not enemies.

How to win friends when handling outages and downtime David Mytton London DevOps - Oct 2014 blog.serverdensity.com

David Mytton, CEO of Server Density, presented this talk at the DevOps Meetup in London. It takes you through how to handle DevOps incidents, outages and downtime, and more specifically how to make friends, not enemies, in the process.

Transcript of DevOps Incident Handling - Making friends not enemies.

Page 1: DevOps Incident Handling - Making friends not enemies.

How to win friends when handling outages and downtime

David Mytton

London DevOps - Oct 2014

blog.serverdensity.com

Page 2: DevOps Incident Handling - Making friends not enemies.

David Mytton

Page 3: DevOps Incident Handling - Making friends not enemies.

Server monitoring, cloud management, dashboards and alerting

serverdensity.com

Page 4: DevOps Incident Handling - Making friends not enemies.

Slides: twitter.com/davidmytton

Page 5: DevOps Incident Handling - Making friends not enemies.

Let’s talk about downtime

Page 6: DevOps Incident Handling - Making friends not enemies.

2013 Spend: ~$5bn

Page 7: DevOps Incident Handling - Making friends not enemies.

2013 Spend: ~$6bn

Page 8: DevOps Incident Handling - Making friends not enemies.

2013 Spend: ~$4bn

Page 9: DevOps Incident Handling - Making friends not enemies.

You will have downtime

How much do you spend?

Page 11: DevOps Incident Handling - Making friends not enemies.

Preparation

Page 14: DevOps Incident Handling - Making friends not enemies.

Preparation - On Call

● Primary?

● Secondary?

● Reachability - Tube, 3G/4G (edge?!), Do Not Disturb mode, at the gym, family emergency, system updates

Page 18: DevOps Incident Handling - Making friends not enemies.

Preparation - On Call

● Off call

● Rotations

● Illness

● Work the next day?
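
The on-call questions above (primary, secondary, rotations, illness, time off after a night incident) can be sketched as a small rota function. This is only an illustration, not something shown in the talk: the function name `on_call_for`, the weekly rotation rule and the roster names are all assumptions.

```python
from datetime import date


def on_call_for(day, roster, unavailable=frozenset()):
    """Return (primary, secondary) for the ISO week that contains `day`.

    Assumed policy: the roster rotates by one position per week, and anyone
    in `unavailable` (ill, on holiday, resting after a night incident) is
    skipped so the next person in the rotation takes the slot.
    """
    if not roster:
        raise ValueError("roster is empty")
    week = day.isocalendar()[1]  # ISO week number, 1..53
    start = week % len(roster)
    rotated = roster[start:] + roster[:start]
    available = [name for name in rotated if name not in unavailable]
    if len(available) < 2:
        raise ValueError("need at least two available people for primary and secondary")
    return available[0], available[1]


# Hypothetical roster; the names are placeholders.
roster = ["alice", "bob", "carol"]
primary, secondary = on_call_for(date(2014, 10, 6), roster)
```

Passing `unavailable={"carol"}` for the same week moves the primary slot to the next person, which is one way to cover illness without rewriting the whole rota.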

Page 23: DevOps Incident Handling - Making friends not enemies.

Preparation - Documentation

● Searchable

● Easy to edit

● Independent of your infrastructure

● Up to date

Page 28: DevOps Incident Handling - Making friends not enemies.

Preparation - Key Info

● Team contacts

● Key vendor contacts

● Credentials to key systems

Page 32: DevOps Incident Handling - Making friends not enemies.

Unexpected failures

● Communication systems

● Network connectivity

● Access to support

Page 38: DevOps Incident Handling - Making friends not enemies.

ALERT!

1. Load up incident response checklist

2. Log incident in JIRA

3. Log into Ops War Room

4. Public status post

5. Initial investigation
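
One way to make the checklist hard to skip under pressure is to walk it in order and timestamp each step as it is completed. The sketch below is an assumption-laden illustration: `IncidentLog` is a hypothetical in-memory class, and it does not talk to JIRA, a chat room or a status page.

```python
from datetime import datetime, timezone

# The five steps from the slide, in numbered order.
CHECKLIST = [
    "Load up incident response checklist",
    "Log incident in JIRA",
    "Log into Ops War Room",
    "Public status post",
    "Initial investigation",
]


class IncidentLog:
    """Hypothetical in-memory record of which checklist steps were done, and when."""

    def __init__(self, clock=None):
        # `clock` is injectable so the log can be tested with fixed times.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._next = 0
        self.entries = []

    def complete_next(self, note=""):
        """Mark the next step as done and record a timestamped entry."""
        if self._next >= len(CHECKLIST):
            raise IndexError("all checklist steps are already complete")
        step = CHECKLIST[self._next]
        self.entries.append((self._clock(), self._next + 1, step, note))
        self._next += 1
        return step

    def remaining(self):
        """Steps that have not been completed yet, in order."""
        return CHECKLIST[self._next:]


log = IncidentLog()
log.complete_next()
log.complete_next(note="ticket opened")  # the note text is a placeholder
```

The timestamped entries double as raw material for the postmortem timeline.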

Page 43: DevOps Incident Handling - Making friends not enemies.

Key response principles

● Log everything

● Frequent public status updates

● Gather the team

● Escalate!
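
"Escalate!" is easier to follow when the rule is written down in advance. The following is a minimal sketch of one possible rule, not the policy from the talk: the function name `who_to_page`, the chain of roles and the 10-minute step are all assumptions.

```python
from datetime import datetime, timedelta


def who_to_page(alert_at, now, acknowledged, chain, step=timedelta(minutes=10)):
    """Return everyone who should have been paged by `now`.

    Assumed policy: the first person in `chain` is paged immediately, and
    each further `step` without an acknowledgement adds the next person.
    """
    if acknowledged or not chain:
        return []
    elapsed = max(now - alert_at, timedelta(0))
    level = int(elapsed / step)  # whole escalation steps that have passed
    return chain[: min(level, len(chain) - 1) + 1]


# Hypothetical escalation chain and alert time.
chain = ["primary", "secondary", "team lead"]
alert_at = datetime(2014, 10, 1, 3, 0)
paged = who_to_page(alert_at, alert_at + timedelta(minutes=12), False, chain)
```

With these assumed values, twelve minutes without an acknowledgement means both the primary and the secondary have been paged.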

Page 48: DevOps Incident Handling - Making friends not enemies.

Postmortem

● Within a few days

● Tell the story

● Provide technical detail

● Explain what failed and why

Page 49: DevOps Incident Handling - Making friends not enemies.

Postmortem

● How it’s going to be fixed
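
The postmortem points can be turned into a simple completeness check before publishing. This is a hedged sketch only: the `Postmortem` class, its field names and the three-day threshold used for "within a few days" are assumptions, not part of the talk.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Postmortem:
    """Hypothetical container for the sections listed on the slides."""

    incident_date: date
    story: str = ""
    technical_detail: str = ""
    what_failed_and_why: str = ""
    how_it_will_be_fixed: str = ""

    def missing_sections(self):
        """Names of sections that are still empty, in slide order."""
        sections = {
            "story": self.story,
            "technical detail": self.technical_detail,
            "what failed and why": self.what_failed_and_why,
            "how it will be fixed": self.how_it_will_be_fixed,
        }
        return [name for name, text in sections.items() if not text.strip()]

    def is_overdue(self, today, max_days=3):
        """True once more than `max_days` have passed since the incident."""
        return (today - self.incident_date).days > max_days


# Placeholder content; a real postmortem would carry the full narrative.
draft = Postmortem(date(2014, 10, 1), story="Timeline of the outage.")
todo = draft.missing_sections()
```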

Page 50: DevOps Incident Handling - Making friends not enemies.

stspg.io/ZDC

Page 51: DevOps Incident Handling - Making friends not enemies.

Summary

● Preparation

● Communication

● Checklists

● Documentation

● Postmortem

Page 52: DevOps Incident Handling - Making friends not enemies.

Thank you very much

@davidmytton

[email protected]

blog.serverdensity.com

www.serverdensity.com