DevOpsDays Sydney 2016 Report-back fromfiles.meetup.com/19537394/Devopsdays Sydney 2016 v2.pdf ·...

Report-back from DevOpsDays Sydney 2016 Scott McLauchlan [email protected]




About the Conference

● “The conference that brings development and operations together.”

● Held in Sydney on Thursday 1 and Friday 2 December.

● 8 conventional presentations/keynotes.

● 6 “Open Spaces”, each comprising three attendee-driven breakout sessions.

● 5 “Ignite” talks (20 slides, 15 seconds each, automatically advanced).

● Several hundred attendees from private industry, government and academia.

● Key theme - DevOps is a mindset/culture, not a product or technology.


Containers will not fix your broken culture (and other hard truths)
Bridget Kromhout, Principal Technologist for Cloud Foundry at Pivotal

● Precis: “Containers will not fix your broken culture. Microservices won’t prevent your two-pizza teams from needing to have conversations with one another over that pizza. No amount of industrial-strength job scheduling makes your organization immune to Conway's Law”

● Work out what problem you’re trying to solve first, then choose a technology to help solve it. Don’t end up with: “We replaced our monolith with micro services so that every outage could be more like a murder mystery.” (@honest_update)


Containers will not fix your broken culture (cont.)

● Conway’s Law: Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.

● What should be done:
  ○ Let your developers see production metrics, logs, etc.
  ○ Be careful of “shared” assumptions - defaults can be dangerous.
  ○ When writing comments remember “why” is more useful than “what”.
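
To illustrate the point about comments, here is a minimal sketch. The function, the retry limit and the scenario are all invented for this example and do not come from the talk. The first comment only restates the code; the second records the reasoning a later reader cannot recover from the code alone.

```python
MAX_RETRIES = 3  # hypothetical value, chosen only for illustration


def should_retry(attempt: int) -> bool:
    # "What" comment (adds little): return True if attempt is below MAX_RETRIES.
    #
    # "Why" comment (more useful): the upstream service is assumed to drop
    # connections briefly during deploys, so a few retries hide that blip;
    # retrying more often would mask a real outage from whoever is on call.
    return attempt < MAX_RETRIES
```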

● Process is the scar-tissue from past failure.


Why?: the forgotten word of DevOps
Hannah Browne, General Manager of Cevo

● “DevOps”, “Digital”, “Agile Transformation” are all just jargon meaning “doing software better”.

● DevOps is a culture/mindset

● Agile is another word for “BizDev”.

● Agile is how you build an application, DevOps is managing the application throughout its life.

● The value in quickly iterating is that it allows you to validate your work with the customer.


Why?: the forgotten word of DevOps (cont.)

● Goals should be measurable.

● Prioritisation is asking “Why” - the hard part is finding the right person to ask.

● Post mortems and peer reviews are important places to ask “why”.


The 10 Rules of Automation
Patrick Robinson, System/DevOps Engineer at Envato

1. Reduce toil.
2. Build resilient automation.
3. Build systems that are easy to reason about.
4. Prefer small tools.
5. Five stages of immutability
   i. Pets
   ii. Wagyu beef (mutable)
   iii. Frankenstein (application deployment separate from immutable server deployment)
   iv. Sir Nutkin (server and application are both immutable but deployed separately)
   v. Unicorn (100% immutable)


The 10 Rules of Automation (cont.)

6. Design modular systems.
7. Systems should be composed of loosely coupled components.
8. Avoid automation fatigue - there should be no unnecessary manual intervention.
9. Don’t just automate a complex, error-prone process - fix the process first.
10. Avoid reinventing the wheel - if a tool does 95% of the job just do the other 5% in Bash.


Deepening our people, to weather the org
Lindsay Holmwood, Engineering Manager at the DTA

● DevOps is about changing technical culture, but technical culture is tightly linked to the broader organisational culture.

● Culture comes from artifacts, beliefs and assumptions.

● Organisations have values, but are they lived or aspirational?

● Hindsight is a culture-killer. Explain in terms of foresight, not hindsight.

● Systems are a snapshot of values and assumptions.


Functions as a Service: Beyond the rainbow
Peter Hall, Operations Delivery Lead at REA Group

● Functions as a service, a.k.a. microservices
  ○ AWS Lambda
  ○ Google Cloud Functions
  ○ Azure Functions

● Useful to replace cron boxes.

● Use as glue between event-based services.

● If using AWS, store artifacts in versioning-enabled S3 buckets.

● Lambda is not always cheaper than EC2.

● Take care that Lambda doesn’t use all the IPs in your VPC through scaling.
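
The warning about VPC addresses can be turned into a rough capacity check. The sketch below is not from the talk: the subnet ranges, the usage figures and the function names are hypothetical, and it assumes the worst case where every concurrent execution holds one IP address. The only AWS-specific fact it relies on is that five addresses in each subnet are reserved.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Number of addresses that can be assigned in a subnet with this CIDR."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def ip_headroom(subnet_cidrs, already_used, peak_concurrency):
    """Addresses left if each concurrent execution takes one IP (worst-case assumption)."""
    total = sum(usable_ips(cidr) for cidr in subnet_cidrs)
    return total - already_used - peak_concurrency


# Hypothetical numbers: two /24 subnets, 100 addresses in use, 500 concurrent executions.
print(ip_headroom(["10.0.1.0/24", "10.0.2.0/24"], already_used=100, peak_concurrency=500))
```

A negative result, as with these made-up figures, suggests the subnets are too small for the expected scaling and that other resources in the VPC could be starved of addresses.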


Do small data sets dream of big data?
Mujtaba Hussain, Infrastructure Engineer at Fillr

● Important to understand how your users interact with your product

● Be very careful of assumptions.

● Use data visualisation to see patterns in the way your users use your product.

● Introduce experimentation and rapid failure to every facet of your product lifecycle, based on the data accumulated.

● Automate the testing of your backups.
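
One way to read "automate the testing of your backups" is to restore each backup into a scratch location on a schedule and compare it with the source. The sketch below only illustrates that idea under simplifying assumptions: a plain file copy stands in for the real backup and restore steps, and the file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def backup_restores_cleanly(original: Path, backup: Path) -> bool:
    """Restore the backup into a scratch directory and compare it with the original."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / original.name
        shutil.copy2(backup, restored)  # stand-in for a real restore step
        return sha256(restored) == sha256(original)


# Demonstration with throwaway files.
with tempfile.TemporaryDirectory() as work:
    data = Path(work) / "data.db"
    data.write_bytes(b"important records")
    backup = Path(work) / "data.db.bak"
    shutil.copy2(data, backup)  # stand-in for the real backup job
    print(backup_restores_cleanly(data, backup))
```

For a live database the comparison would more likely be a row count or an application-level query against the restored copy, since the source keeps changing after the backup is taken.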


Three years of breaking things to make them better.
Donny Nadolny, Developer at PagerDuty

● “Failure Friday” - weekly exercise using simple failures to expose problems in systems and alerting.

● Don’t automate straight away.

● Switch it up when it gets boring.

● Failures include stopping and restarting processes, suspending then resuming processes, rebooting servers, isolating network segments, adding network latency and packet loss, ramping up traffic to DR site, failing over DB, “taking down” a datacentre.


Three years of breaking things to make them better (cont.)

● 100% is the wrong reliability target for everything.

● Error budget = permissible downtime per month:
  ○ 90% uptime = 3 days per month
  ○ 99% uptime = 7 hours per month
  ○ 99.9% uptime = 43 minutes per month
  ○ 99.99% uptime = 4 minutes per month
  ○ 99.999% uptime = 26 seconds per month

● If under error budget for 3 months in a row, consider taking down the service to use most of the budget:
  ○ Find hidden dependencies.
  ○ “Gut check” target availability.
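
The error budget figures above follow from simple arithmetic, which can be sketched as below. The 30-day month is an assumption made here to match the rounded numbers on the slide (7.2 hours, 43.2 minutes, 4.32 minutes and 25.92 seconds before rounding); the function name is hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def error_budget_seconds(uptime_percent: float) -> float:
    """Permissible downtime per 30-day month, in seconds, for an uptime target."""
    return SECONDS_PER_MONTH * (100.0 - uptime_percent) / 100.0


for target in (90, 99, 99.9, 99.99, 99.999):
    seconds = error_budget_seconds(target)
    print(f"{target}% uptime -> {seconds / 60:.2f} minutes per month")
```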


Abstractions and Metaphors: words matter in operations
Nigel Kersten, Puppet

● Thoughts and actions are shaped by the metaphors we use.

● Metaphors are ubiquitous in IT, especially in automation and operations.

● Argument is war - “win”, “lose”, “right on target”, “indefensible”, “shot down”.

● Time is money - “spend”, “cost”, “invested”, “borrowed”, “worth”.


Abstractions and Metaphors: words matter in operations (cont.)

● Examples in computing - “throw and catch errors”, “threads”, “containers”, “ruby gems”, etc.

● Good is up, bad is down.
  Happy is up, sad is down.
  Health/life is up, sickness/death is down.
  Rational is up, emotional is down.
  Unknown is up, known is down (contrasting with the others).

● Time is a moving object.

● The mind is a machine.


Abstractions and Metaphors: words matter in operations (cont.)

● The mind is a brittle object.

● Personification - “the system is not happy”, “the application got confused”, etc.

● Why are containers the great new paradigm, when Solaris zones and BSD jails have been around for years? Is it because “jails” are unattractive and “zones” are meaningless?


Open Spaces

Open Space #1
● Ops Team planning/Tracking Metrics
● How to sell the "Why" to our organisations
● How to run integration tests with microservices

Open Space #2
● Patching all the immutable things
● Teaching Ops
● Infrastructure as code with multiple teams


Open Spaces (cont.)

Open Space #3
● How to do documentation right
● Improving Communication & Mentoring Junior team members
● Container orchestration

Open Space #4
● Arrested DevOps Podcast
● More 'dev' in the 'ops'
● What is traditional project management’s role in DevOps?
● DevOps for DB admins


Open Spaces (cont.)

Open Space #5
● Management and Leadership
● War Stories
● Building better containers with Habitat

Open Space #6
● Convincing the business to retire the old stuff
● Security and DevOps
● Hiring Best Practice


Open Spaces (cont.)

Ops Team planning/Tracking Metrics

● Use the concept of a “goalkeeper” to shield other team members from ad hoc work

● Have an “ops budget” for outside teams (e.g. one person for 2.5 days per week).

● Embed team members with outside teams rather than bringing work in.

Patching all the immutable things

● Bake AMI/Docker images and automate regular rebuilds of the images.

● Keep a base image with OS and security patches, and rebuild the application image when the base image is updated.
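
The rebuild rule above can be sketched as a comparison between the current base image and the base each application image was built on. Everything named here is hypothetical (the `AppImage` record, the digests and the image names); a real pipeline would read this information from its registry or build metadata.

```python
from dataclasses import dataclass


@dataclass
class AppImage:
    name: str
    built_from_base: str  # digest of the base image this application image was built on


def images_needing_rebuild(current_base_digest: str, app_images: list) -> list:
    """Return the names of application images built on an older base image."""
    return [img.name for img in app_images if img.built_from_base != current_base_digest]


# Hypothetical digests and image names.
images = [AppImage("web", "sha256:aaa"), AppImage("worker", "sha256:bbb")]
print(images_needing_rebuild("sha256:bbb", images))
```

Run on a schedule after each base image build, a check like this gives the list of application images to rebuild and redeploy.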


Open Spaces (cont.)

How to do documentation right

● Expose a newbie to your documentation.

● Regular “gardening” (e.g. go to a random wiki page and update/correct it).

● Use templates where possible.

● Remember to answer what, why, who and how.

● Python MkDocs is a useful tool.

Arrested DevOps Podcast

● https://www.arresteddevops.com/devopsdays-sydney-2016/


Open Spaces (cont.)

Management and Leadership

● Your management needs to know what you do.

● Common problems:
  ○ Time to manage, stepping away from a previous tech role
  ○ Not letting go of old tech role - not showing trust in team by delegating.
  ○ Lack of one-on-ones

Convincing the business to retire the old stuff

● Get the Finance department on your side.

● Explain and expose technical debt to management.

● Break down the costs on an application by application basis.

● Get a security audit of the old systems.

● Make sure all new systems have a sunset strategy.


Ignite Talks

● FinOps
  ○ Get your Finance people to attend your stand-ups.
  ○ Shut down AWS resources that aren’t accounted for financially (e.g. don’t have a cost centre assigned).

● Improving DevOps Monitoring and Alerting
  ○ SMS is the best method to use for alerting.
  ○ Make sure your alerts go to the right people.
  ○ Automate escalation (when alerts aren’t acknowledged).
  ○ Make sure your on-call documentation is accessible, up-to-date and accurate.
  ○ Ensure all of your logs are rotated.
  ○ Collect your logs centrally.

● When DNS Fails
  ○ If multiple components need DNS to communicate between them then a primary DNS failure will cause large latency between components as the DNS requests time out and the components fall back to the secondary DNS server.
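
The DNS point above is easy to quantify with a back-of-the-envelope sketch. The figures are assumptions rather than numbers from the talk: a 5 second resolver timeout (a common default on Linux resolvers, though it is configurable) and one uncached lookup per hop between components. The function name is hypothetical.

```python
def added_latency_seconds(dns_lookups: int, resolver_timeout: float = 5.0,
                          failed_servers: int = 1) -> float:
    """Extra wait when each lookup times out on `failed_servers` before one answers."""
    return dns_lookups * resolver_timeout * failed_servers


# Hypothetical request that crosses four components, each doing one uncached lookup.
print(added_latency_seconds(dns_lookups=4))
```

With these assumed values a single request picks up 20 seconds of extra latency, which is why a primary DNS outage can look like a full service outage even though the secondary server is healthy.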


Ignite Talks (cont.)

● Putting the Engineering in Ops
  ○ Problems are opportunities for improvement.
  ○ Make sure your metrics align with business needs.
  ○ Don’t put business decisions in the hands of people that don’t understand the business.
  ○ An out-of-date Configuration Management Database (CMDB) is worse than no CMDB.
  ○ A process is not tested until it has been validated by a non-subject matter expert.
  ○ If a customer reports a problem before you notice it, you have failed.


Ignite Talks (cont.)

● A Guide to On-call
  ○ Include developers in your on-call roster.
  ○ Remember that it is not necessary (or possible) for everyone on the on-call roster to know everything about every system.
  ○ Make sure that there is plenty of documentation, good access to logs and reliable monitoring.
  ○ Hold handover meetings, discussing the alerts that have occurred in the previous on-call period.
  ○ Keep managers informed of the costs of on-call.
  ○ Hold post-incident reviews.
  ○ Hold retros for each on-call period.


Phone: +61 2 6249 9598
Web: www.ga.gov.au
Email: [email protected]
Address: Cnr Jerrabomberra Avenue and Hindmarsh Drive, Symonston ACT 2609
Postal Address: GPO Box 378, Canberra ACT 2601

Questions?