Leveling up monitoring: A decade of automating and scaling Nagios

Posted on 14-Apr-2017

Leveling Up Monitoring:

A Decade of Automating and Scaling Nagios

Katherine Daniels and Laurie Denness

@beerops - @lozzd Velocity 2016

Katherine Daniels @beerops

Senior Operations Engineer, Etsy

Co-Author of Effective DevOps

Laurie Denness @lozzd

Staff Operations Engineer, Etsy

Official Graph Enthusiast

Agenda

1. In The Beginning...

2. Automation

3. Deployinator

4. Scaling + Tooling

About Etsy

• 25M Active Buyers

• 1.6M Active Sellers

• $2.39B 2015 Annual GMS

(As of March 31, 2016)

Monitoring!

bit.ly/yaynagios

https://kartar.net/2015/08/monitoring-survey-2015---tools/

In The Beginning

LESSONS LEARNED:

Templates are awesome.

define service {
    use                  generic-service
    hostgroups           Linux_hosts,!email-only-servers
    service_description  SSH
    check_command        check_ssh
}

define service {
    use             disk-space-service
    hostgroup_name  email-only-servers
    contact_groups  ops_nonurgent
}

LESSONS LEARNED:

Start small.

Nagios and Chef

LESSONS LEARNED:

Automation is awesome!

HA HA JUST KIDDING

LESSONS LEARNED:

Trust but verify.

How Many Repos?

LESSONS LEARNED:

?!?!?!?!??!?!

LESSONS LEARNED:

Try, fail, learn, and try again.

Problems

• Four git repos, inconsistent mess, duplication

• Broken semi-useful automation - need to regain trust

• Some shared config, some unique

• Gain confidence in changes

• Stop editing on the production box

Nagios and Chef and Deployinator!

Solution 1:

Merge everything: find and remove duplication, shared configs

Thanks Murphy!

Super Secret Option!!!

Solution 2:

Using Jenkins CI to test changes before production
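
A minimal sketch of what such a pre-production check might look like. Nagios has a built-in pre-flight check, `nagios -v <main config file>`, that a CI job could run against a checkout of the config. The binary path, config path, and function names here are assumptions for illustration and are not taken from the talk.

```python
import subprocess
from typing import List


def build_verify_command(nagios_bin: str, main_cfg: str) -> List[str]:
    """Return the argv for Nagios's built-in config check (`nagios -v`)."""
    return [nagios_bin, "-v", main_cfg]


def config_is_valid(nagios_bin: str, main_cfg: str) -> bool:
    """Run the pre-flight check; a zero exit status is treated as success."""
    result = subprocess.run(
        build_verify_command(nagios_bin, main_cfg),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return result.returncode == 0
```

A CI job could call `config_is_valid("/usr/sbin/nagios", "checkout/nagios.cfg")` (both paths hypothetical) and fail the build when it returns `False`, so a broken config never reaches the deploy step.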

Solution 3:

Use Deployinator to run Chef recipe to generate automated configs

Solution 4:

Use Deployinator to rsync config to all boxes

• git pull repo on deploy host

• Run Chef recipe to add automated pieces

• Re-run the try-nagios script against that

• rsync copy from deploy box to Nagios hosts

• Create symlink for nagios.cfg

• Restart Nagios
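
The steps above can be sketched as an ordered list of commands. This is a rough illustration under stated assumptions, not Deployinator's actual implementation: the Chef recipe name, the `try-nagios` script location, the remote directories, and the restart command are all hypothetical placeholders.

```python
import subprocess
from typing import List

Command = List[str]


def deploy_plan(repo_dir: str, nagios_hosts: List[str]) -> List[Command]:
    """Build the ordered commands for one deploy; nothing is executed here."""
    remote_dir = "/etc/nagios/deployed"      # hypothetical target directory
    live_cfg = "/etc/nagios/nagios.cfg"      # hypothetical live config path
    plan: List[Command] = [
        # 1. Update the config checkout on the deploy host.
        ["git", "-C", repo_dir, "pull"],
        # 2. Generate the automated pieces (recipe name is made up).
        ["chef-client", "-o", "recipe[nagios_config::generate]"],
        # 3. Re-validate the combined config (script path is assumed).
        [repo_dir + "/bin/try-nagios"],
    ]
    # 4. Copy the validated config to every Nagios host.
    for host in nagios_hosts:
        plan.append(["rsync", "-a", "--delete", repo_dir + "/", f"{host}:{remote_dir}/"])
    # 5. Point nagios.cfg at the freshly copied config.
    for host in nagios_hosts:
        plan.append(["ssh", host, "ln", "-sfn", f"{remote_dir}/nagios.cfg", live_cfg])
    # 6. Restart Nagios so it picks up the new config.
    for host in nagios_hosts:
        plan.append(["ssh", host, "service", "nagios", "restart"])
    return plan


def run_plan(plan: List[Command]) -> None:
    """Execute each step in order, stopping at the first failure."""
    for cmd in plan:
        subprocess.run(cmd, check=True)
```

Building the plan separately from running it makes the sequence easy to inspect or log before anything touches a production Nagios host.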

LESSONS LEARNED:

Use the tools you have.

Scaling things up!

Core Workers

LESSONS LEARNED:

If at first you don’t succeed, rub some webscale on it.

Iterating and Iterating

LESSONS LEARNED:

Iterate

Iterate

Iterate

To Infinity and Beyond

http://github.com/etsy/opsweekly

Final Lessons Learned

• Templates are awesome

• Start small

• Automation is awesome

• Trust but verify

• Learn from (y)our mistakes

• Iterate on the tools you have

Open Source Summary

• http://github.com/etsy/deployinator

• http://github.com/etsy/pushbot

• http://github.com/etsy/trylib

• http://github.com/etsy/opsweekly

• http://github.com/etsy/nagios-herald

• http://github.com/RJ/irccat

THANK YOU!
