DOES15 - Randy Shoup - Ten (Hard-Won) Lessons of the DevOps Transition

Post on 14-Feb-2017



Ten (Hard-Won) Lessons of the DevOps Transition

Randy Shoup @randyshoup

linkedin.com/in/randyshoup

1. Reorganize Teams Around Ownership

• End-to-end Ownership
  o Small, cross-functional team owns application / service from design to deployment to retirement
  o Team has inside it all skill sets needed to do the job
  o Depends on other teams for supporting services
  o Able to move very rapidly and independently

• “You build it, you run it”
  o The same team that builds the software operates the software
  o No separate maintenance or sustaining engineering team

1. Reorganize Teams Around Ownership

• E.g., KIXEYE and MySQL
  o Development team wrote the SQL, issued all the queries
  o DBA / Ops team responsible for performance and uptime
  o Splitting ownership between teams was counterproductive and disruptive

• Alternative strategies
  o Centrally-maintained persistence service
  OR
  o Customer manages its own persistence

2. Lose the Ticket Culture

Ticket Culture | Ownership Culture
Do what is asked for | Do what is needed
One-way communication | Two-way collaboration
Goal is to close the ticket | Goal is product success
Reactive approach | Proactive approach
Reinforces silos | Reinforces collaboration
Prioritizes process | Prioritizes results

3. Replace Approvals With Code

• Reduce or eliminate approval bodies
  o E.g., eBay Architecture Review Board
  o (-) Too late
  o (-) Too slow
  o (-) Too disengaged from details

• Package expertise in code
  o Smart, experienced people build their knowledge into code
  o Teams with specialized skills (databases, security, compliance, etc.) provide a service, library, or tool

3. Replace Approvals With Code

• E.g., Security at Google
  o Provide secure foundations by maintaining lower-level libraries and services
  o Provide self-service penetration tests, vulnerability assessments, etc.

The easiest way to “enforce” a standard practice is with working code.
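
As a rough illustration of packaging expertise in code, a security team could publish a small helper in which the safe behavior is the default, so callers do not have to remember it. The function name `render_comment` and the HTML layout are hypothetical; only `html.escape` from the Python standard library is a real API. This is a sketch, not how any team named in the talk does it.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Hypothetical helper a security team might ship to product teams.

    Escaping is built into the helper, so a caller cannot forget to
    neutralize markup in user-supplied text.
    """
    safe_author = html.escape(author)
    safe_body = html.escape(body)
    return f"<p><b>{safe_author}</b>: {safe_body}</p>"


if __name__ == "__main__":
    # User input containing markup is rendered as inert text.
    print(render_comment("mallory", "<script>alert(1)</script>"))
```

A product team that calls the helper gets the standard practice without an approval step; the review happens once, when the helper is written.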

4. Enforce a Service Mentality

• Vendor-Customer Discipline
  o Service team is a vendor; the products are its customers
  o Service is useful only to the extent it provides value to its customers

• Customer can choose to use service or not (!)
  o Customer team is responsible for deciding what is best for their use case
  o Use the right tool for the right job

• Provides powerful incentives
  o Service must be *strictly better* than the alternatives of build, buy, borrow

5. Charge for Usage

• Charge customers for *usage* of the service
  o Aligns economic incentives of customer and provider
  o Motivates both sides to optimize efficiency

• Free usage leads to waste
  o No incentive to control usage or find more efficient alternatives

• E.g., App Engine usage at Google
  o Charging particularly egregious internal customer led to 10x reduction in usage
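
The chargeback idea above can be sketched as a simple metered bill. Everything here is an assumption made for illustration: the rates, the resource names, and the usage numbers are invented and are not figures from the talk.

```python
from dataclasses import dataclass

# Hypothetical internal rates, in cents, chosen only for illustration.
CPU_HOUR_RATE_CENTS = 5
STORAGE_GB_MONTH_RATE_CENTS = 2


@dataclass(frozen=True)
class MonthlyUsage:
    team: str
    cpu_hours: int
    storage_gb_months: int


def monthly_charge_cents(usage: MonthlyUsage) -> int:
    """Bill a customer team in proportion to what it actually consumed."""
    return (
        usage.cpu_hours * CPU_HOUR_RATE_CENTS
        + usage.storage_gb_months * STORAGE_GB_MONTH_RATE_CENTS
    )


if __name__ == "__main__":
    before = MonthlyUsage("team-a", cpu_hours=1000, storage_gb_months=500)
    after = MonthlyUsage("team-a", cpu_hours=100, storage_gb_months=50)
    print(monthly_charge_cents(before))  # 6000
    print(monthly_charge_cents(after))   # 600
```

Because the bill scales linearly with usage, a team that cuts its consumption sees the saving directly, which is the incentive the slide describes.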

6. Prioritize Quality

• Quality, Performance, and Reliability are “Priority-0 features”
  o “Stop the line” if there is a degradation
  o Equally important to users as product features or engaging user experience

• Developers write tests and code together
  o Continuous testing of features, performance, load
  o Confidence to make risky changes

• “Slow down to speed up”
  o Catch bugs earlier, fail faster

6. Prioritize Quality

• E.g., Development Process at Google
  o Code reviews before submission
  o Automated tests for everything
  o Single searchable source code repository

• Internal Open Source Model
  o Not “here is a bug report”
  o Instead “here is the bug; here is the code fix; here is the test that verifies the fix”

7. Start Investing in Testing

• Write functional tests around a component
  o If you can only write a few tests, they should be meaningful ones
  o End-to-end tests exercise more meaningful customer-visible capabilities than unit tests

• Fail any build that breaks a test

• Keep ratcheting up the tests
  o For every new feature, add tests for that feature
  o For every new bug, add a test that reproduces the bug and verifies the fix
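
A minimal sketch of the ratchet described above, using Python's standard `unittest` module. The function `apply_discount` and the bug it once had are invented for this example: one test covers the feature, and a second test was added when the (hypothetical) bug was found, so the bug cannot silently return.

```python
import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test.

    Invented bug history: a percent above 100 used to produce a negative
    price. The fix clamps percent to the range 0..100.
    """
    percent = max(0, min(percent, 100))
    return price_cents * (100 - percent) // 100


class ApplyDiscountTest(unittest.TestCase):
    def test_feature_basic_discount(self):
        # Added together with the feature.
        self.assertEqual(apply_discount(1000, 25), 750)

    def test_bug_discount_over_100_percent_is_not_negative(self):
        # Added with the bug fix: reproduces the bug and verifies the fix.
        self.assertEqual(apply_discount(1000, 150), 0)
```

Running `python -m unittest` against the file in a build step exits with a non-zero status when any test fails, which is one simple way to fail the build on a broken test.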

8. Actively Manage Technical Debt

• Maintain sustainable and well-understood level of debt
  o Denominated in engineering effort to fix
  o Plan for how and when you will pay it off
  o Track feature work vs. accrued debt over time

• “Don’t have time to do it right”?
  o WRONG – Don’t have time to do it twice (!)
  o The more constrained you are on time and resources, the more important it is to do it solidly the first time

Vicious Cycle of Technical Debt

[Cycle diagram with nodes: Technical Debt; “No time to do it right”; Quick-and-dirty]

Virtuous Cycle of Investment

[Cycle diagram with nodes: Solid Foundation; Confidence; Faster and Better; Invest in Quality]

9. Share On-call Duties

• All members of the team rotate on-call responsibilities
  o Strongest motivator to build in solid monitoring and diagnosis capabilities
  o Best way to learn the real-world behavior of the system
  o Best way to develop empathy for customers and other team members

• Train via on-call “apprenticeship”
  o 1. Apprentice starts as secondary on-call, experienced engineer is primary
  o 2. Apprentice is primary, experienced engineer is secondary
  o 3. Apprentice graduates
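
The three apprenticeship steps above can be sketched as a small state machine. The names `Stage`, `on_call_pair`, and `next_stage` are hypothetical, and the pairing for a graduate (primary with no assigned secondary) is an assumption, since the slide only says the apprentice graduates.

```python
from enum import Enum
from typing import Optional, Tuple


class Stage(Enum):
    SECONDARY = 1   # step 1: apprentice is secondary, mentor is primary
    PRIMARY = 2     # step 2: apprentice is primary, mentor is secondary
    GRADUATED = 3   # step 3: apprentice joins the regular rotation


def on_call_pair(stage: Stage, apprentice: str, mentor: str) -> Tuple[str, Optional[str]]:
    """Return (primary, secondary) for one shift at the given stage."""
    if stage is Stage.SECONDARY:
        return (mentor, apprentice)
    if stage is Stage.PRIMARY:
        return (apprentice, mentor)
    # Assumption: after graduating, the engineer is primary and the
    # secondary comes from the normal rotation, so none is fixed here.
    return (apprentice, None)


def next_stage(stage: Stage) -> Stage:
    """Advance one step; GRADUATED is the final stage."""
    return Stage(min(stage.value + 1, Stage.GRADUATED.value))


if __name__ == "__main__":
    stage = Stage.SECONDARY
    for _ in range(3):
        print(stage.name, on_call_pair(stage, "apprentice", "mentor"))
        stage = next_stage(stage)
```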

10. Make Post-Mortems Truly Blameless

• Overcoming blame culture takes work
  o Institutional memory of blame is long
  o E.g., Initial post-mortems at KIXEYE elicited tons of fear

• Constantly reinforce learning over blame
  o When you say “blameless”, you have to really mean it (!)
  o Don’t ask “what did you do?”, ask “what did you learn?”

10. Make Post-Mortems Truly Blameless

• Open and Honest Discussion
  o Document exactly what happened
  o What went right
  o What went wrong

• Focus on Learning and Improvement
  o How should we change process, technology, documentation, etc.?
  o How could we have automated the problems away?
  o How could we have diagnosed more quickly?

• Take fear and personalization out of it
  o Engineers will compete to take personal responsibility (!)
  o “Finally we can fix that broken system”

Top Five Takeaways

• 1. Reorganize Teams Around Ownership

• 2. Replace Approvals With Code

• 3. Prioritize Quality

• 4. Actively Manage Technical Debt

• 5. Make Post-Mortems Truly Blameless

What I Could Use Help With

• Encouraging leaders to lose the blame culture

• Measuring productivity in a principled way

• Overcoming resistance to taking the pager

Thank You!

• @randyshoup

• linkedin.com/in/randyshoup