Automating System Administration Recovering your life · Operations Linux/Hosting/Colo...

20
Recovering your life Automating System Administration Recovering your life

Transcript of Automating System Administration Recovering your life · Operations Linux/Hosting/Colo...

Page 1: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Recovering your lifeAutomating System AdministrationRecovering your life

Page 2: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

A note on the group structure

Sysad

Operations Linux/Hosting/Colo Windows/Internal IT

16 L1 16 L2/L3 16 L1/L2/L3

Page 3: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Selling upwards

Page 4: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

The 10000 foot view

Page 5: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

● Management is generally not too interested in the technical details

● Show the benefits to the business– People not burning out

– People being more productive

– More income per person (or reduced costs)

Business wins

Page 6: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Selling to your peers

Page 7: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Show them the view

Page 8: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Everyone moves together

Page 9: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Be clear about what you want at the end

Page 10: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Baby steps

Page 11: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

The first steps

● Choose your tools– Puppet, cfengine, BCFG2, LCFG, Config, ...

● We started out by managing only three identical mail exchangers

● We only managed some common files on all the three boxes

Page 12: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Getting people comfortable

● For a few days, files were still edited on the end boxes, only to be overwritten

● Then editing files on the puppetmaster became a given

● Once this happened, we moved NagIOS check definitions, event handlers and some more config files into the puppetmaster

Page 13: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Danger awaitsthe unwary

Especially if you don'thave change controlor QA

Page 14: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Disaster tales

● Any sysadmin can blow up a box, but to blow up an entire bunch of machines takes a sysadmin with a configuration management system

● We had a junior admin send out a hosts.deny file with sshd: ALL

● Recovering from that was ... fun, and instructive

Page 15: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Benefits of the disaster

● The entire team saw how the configuration management system could reduce their work

● There was a major wave of support for change control and a revision control system for configurations

Page 16: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Package management

● Package deployment via RPM or YUM● YUM repository for RPMs

– Easy task

● All packages standardised and converted into RPM format– This was a hard task, taking about three months of

effort

Page 17: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Partial victories as milestones

Page 18: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Major improvements

● Reduced number of telephonic support calls● Used Jabber conference rooms far more

effectively● Reduced effective work hours from 12+ to a far

more manageable 8/day● As we introduced package management, host

deployment time per project went from three weeks to one.

Page 19: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Current status, and further goals

Page 20: Automating System Administration Recovering your life · Operations Linux/Hosting/Colo Windows/Internal IT 16 L1 16 L2/L3 16 L1/L2/L3. Selling upwards. The 10000 foot view ... system

Currently missing

● An effective change control system● An effective review process for changes● ITIL?● Internal Developer support● Windows tooling and integration