Stop Getting CRUSHED - UberConf · Our “Strategy” for Success High Quality Code Low Technical...

Post on 28-Jun-2020

Stop Getting CRUSHED by Business Pressure

Janelle Klein

janelle@openmastery.org @janellekz

How do we break down the wall of ignorance?

This is a HARD Problem.

Focus of this Talk:

My Goals:

How do we repair our relationship with management?

How do we convince management to give us time to work on improvements?

How do we get our team on board with measuring their PAIN?

We Start with the Best of Intentions

High Quality Code

Low Technical Debt

Easy to Maintain

Good Code Coverage

Then This Happens!

Driving to the Edge of Chaos

Difficulty of Work Increases

Edge of Chaos

Human Limitations

PAIN

Product Owner: “We’ve got more important things to do.”

Deferring Problems

Driving to the Edge of Chaos

Edge of Chaos

Human Limitations

Difficulty of Work Increases

PAIN

Manager: “Good job everyone! Keep up that great work ethic!”

Driving to the Edge of Chaos

Painful Releases

Edge of Chaos

Human Limitations

Difficulty of Work Increases

PAIN

Manager: “We need to go faster! Let’s hire more developers.”

Driving to the Edge of Chaos

Thrashing

Edge of Chaos

Human Limitations

Difficulty of Work Increases

PAIN

Developer: “I give up. I don’t care anymore if the project fails.”

Project Meltdown

Driving to the Edge of Chaos

Edge of Chaos

Human Limitations

Difficulty of Work Increases

PAIN

Across Our Industry

Every few years we rewrite our software…

Start Over

Unmaintainable Software

Our “Solution”

Culture of Hopelessness

The Biggest Cause of FAILURE

Across our Industry

Control, No Visibility

Visibility, No Control

RESET

“A description of the goal is not a strategy.”

-- Richard P. Rumelt

What’s wrong with our current strategy?

Our “Strategy” for Success

High Quality Code

Low Technical Debt

Easy to Maintain

Good Code Coverage

RESET

“A good strategy is a specific and coherent response to—and approach for overcoming—the obstacles to progress.”

-- Richard P. Rumelt

The problem is we don’t have a strategy...

What are the obstacles?

Obstacle 1: Management doesn’t care about interest payments.

Obstacle 2: Management would rather you shut up and do your job.

Obstacle 3: The Problem is outside anyone’s control.

What are the obstacles?

Obstacle 1: Your manager doesn’t care about interest payments.

Obstacle 2: Management would rather you shut up and do your job.

Obstacle 3: The Problem is outside anyone’s control.

“Let’s rewrite the software!”

My new project: “Awesome in Disguise”

I had full control.

Continuous Delivery from day 1.

Then they announced “the plan”

My project was moved under different management…

Crazy Deadlines

Constant Urgency

Compromise Safety for Speed

Time Pressure

Compromise Safety for Speed

Increase Number & Severity of Hazards

More Pain and Higher Task Effort

Constant Urgency

Cycle of Chaos High-Risk Decision Habits

I Tried to Explain “Technical Debt”

Manager said:

“The project is already behind schedule!!”

“How can you possibly justify working on anything other than the deliverables?!”

Our Solution…

So we did what we were told.

Until we couldn’t take it anymore.

Explained the problem of Technical Debt

Business Coaching

The Response:

“That doesn’t sound so bad.”

??

??

WHAT?!

Loans are a Predictable Financial Tool

Revenue

- Cost

Profit + 10%

Increase Price?

Increase Sales?

Reduce Cost?

What makes investment decisions harder isn’t higher costs, it’s lower predictability.

Investment Strategy

Obstacle 1: Your manager doesn’t care about interest payments.

But… Managers care A LOT about RISK.

The gradual loss of predictability is much scarier than the gradual increase in cost.

What are the obstacles?

Obstacle 1: Your manager doesn’t care about interest payments.

Obstacle 2: Management would rather you shut up and do your job.

Obstacle 3: The Problem is outside anyone’s control.

What are the obstacles?

Obstacle 1: Your Manager doesn’t care about interest payments.

Obstacle 2: Your manager would rather you shut up and do your job.

Obstacle 3: The System is set up to fail.

Another new project…

“Don’t ask for permission, ask for forgiveness.”

Then We Got New Management!

I put together “a plan”…

Management said (behind my back):

“What is Janelle trying to pull?! Who does she think she is?!”

Get Back Inside Your Box! (or else)

Severe Violation of SOCIAL PROTOCOL

SOCIAL PROTOCOL

Never talk to your manager’s boss about a problem.

Never try to convince your manager to take a different path by getting others to gang up to override their decisions.

If you want to convince your manager to change decisions, it is your responsibility to BRING DATA.

Developers: “We’re going to hit the wall!”

Managers: “I have no choice but to keep going!”

Then I got into Consulting…

Developers

Manager

Consultant

“We’re going to hit the wall!”

Keynote

“We better invest money in this!”

The Consulting World

The Job of a Consultant

Why do they need my help?!

RESET

Consultants Bridge the Divide

Consultant

Keynote

Obstacle 2: Your manager would rather you shut up and do your job.

Follow SOCIAL PROTOCOL

Stay (Mostly) Inside Developer Box + Communicate in Manager-Speak

What are the obstacles?

Obstacle 2: Your manager would rather you follow social protocol.

Lesson 3: The system is set up to fail.

Obstacle 1: Your manager doesn’t care about interest payments.

What are the obstacles?

Obstacle 1: Your manager doesn’t care about interest payments.

Obstacle 2: Your manager would rather you follow social protocol.

Obstacle 3: The system is set up to fail.

Software Development is a Discovery Process

Learn & Adapt

If we don’t have a feedback loop to respond to problems

We will CRASH.

Component

Component

Hub

Hub

Hub

“Human Systems Architecture” is a design problem

Ignorant Leader

The Challenge: Decision-Making is Distributed.

What if our organization was a robot?

Fire x1

Dev Team

Management

Nothing’s happening…

What if our organization was a robot?

Fire x1

Dev Team

Management

Fire x10

Management

Dev Team

Fire x10

What if our organization was a robot?

If the feedback loop is broken, we burn.

Learn & Adapt

Visibility and Decisions Coupled

Role: Decision Type + Required Knowledge

Visibility and Decisions De-coupled

Role A: Required Knowledge

Role B: Decision Type

Broken Feedback Loop is Baked into the Design

Manager

Allocation Decisions

Knowledge of Risks

Risk Mgmt Decisions

Developer

Communication Breakdown

Broken Feedback Loop (Manager Role)

Developer interacts with Actual Risks

Product Owner interacts with Actual Customers

Knowledge of Customers

Knowledge of Risks

Trade-off Decisions depend on Knowledge of Customers and Knowledge of Risks

Communication Breakdown

Broken Feedback Loop (Product Owner Role)

Now we can steer!!

Software Task

Mitigate the Risks

Product Development

Product Work Queue

Risk Mgmt Work Queue

Product Owner

Product Decisions

Knowledge of Customers

Technical Risk Manager

Risk Mgmt Decisions

Knowledge of Risks

Dev Team Capacity

Manager

Allocation Decisions

Fix: Refactor the Organizational Architecture

This is the design that typically emerges when we have:

Trust.

Visibility and Decisions Coupled

Role: Decision Type + Required Knowledge

Visibility and Decisions De-coupled

Role A: Required Knowledge

Role B: Decision Type

Obstacle 3: The system is set up to fail.

We’ve got to fix the machine (even though it’s not our responsibility)

What are the obstacles?

Obstacle 1: Management doesn’t care about interest payments.

Obstacle 2: Management would rather you follow social protocol.

Obstacle 3: The system is set up to fail.

RESET

“A good strategy is a specific and coherent response to—and approach for overcoming—the obstacles to progress.”

-- Richard P. Rumelt

What’s the Strategy?

Obstacle 1: Management doesn’t care about interest payments.

Obstacle 2: Management would rather you follow social protocol.

Obstacle 3: The system is set up to fail.

Become a Risk Translator: Make visibility part of your job.

Refactor the Organization: From Translator, to Partner, to Owner.

Measure the Pain: Explain the problems with data.

How to Measure the PAIN in Software Development

Janelle Klein

leanpub.com/ideaflow

Open Standard for Measuring “PAIN”

PAIN occurs during the process of understanding and extending the software

Complex Software

PAIN

Not the Code.

Optimize “Idea Flow”

Unexpected Behavior

Problem Resolved

Measure Painful Experience with the Code

Troubleshooting

Progress

5 hours and 18 minutes of troubleshooting...

PAINFUL
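One way to read the timeline above is that each stretch between an "Unexpected Behavior" event and the next "Problem Resolved" event counts as troubleshooting time. The sketch below is only an illustration of that idea in Python, not the Idea Flow tooling itself. The event names, the log format, and the `troubleshooting_time` helper are assumptions made for this example, and the timestamps are invented sample data chosen so the total matches the slide.

```python
from datetime import datetime, timedelta

# Hypothetical event log of (timestamp, event) pairs. The event names mirror
# the slide labels; the format and the timestamps are made up for illustration.
EVENTS = [
    (datetime(2020, 1, 6, 9, 0), "unexpected_behavior"),
    (datetime(2020, 1, 6, 11, 30), "problem_resolved"),
    (datetime(2020, 1, 6, 13, 0), "unexpected_behavior"),
    (datetime(2020, 1, 6, 15, 48), "problem_resolved"),
]


def troubleshooting_time(events):
    """Sum the time between each unexpected behavior and its resolution."""
    total = timedelta()
    started = None  # start of the troubleshooting stretch currently open
    for timestamp, event in sorted(events):
        if event == "unexpected_behavior" and started is None:
            started = timestamp
        elif event == "problem_resolved" and started is not None:
            total += timestamp - started
            started = None
    return total


if __name__ == "__main__":
    print(troubleshooting_time(EVENTS))  # 5:18:00 for the sample data
```

Everything outside those stretches would count as progress (or learning) on the timeline.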

“What caused the pain in this case?”

Categorize the Problems with #HashTags

#ReportingEngine

#Hibernate

#MergeHell

1. Problem A

2. Problem B

3. Problem C

Add up the Pain by Category

What’s the biggest problem to solve?
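Adding up the pain by category can be as simple as summing troubleshooting time per hashtag. The following is a minimal sketch under stated assumptions: the hashtags come from the slide, but the incident list, the minute values, and the `pain_by_category` helper are hypothetical and exist only to show the aggregation step.

```python
from collections import Counter

# Hypothetical incidents: (hashtag, minutes spent troubleshooting).
# The minute values are invented sample data.
INCIDENTS = [
    ("#ReportingEngine", 318),
    ("#Hibernate", 120),
    ("#MergeHell", 210),
    ("#ReportingEngine", 240),
    ("#MergeHell", 90),
]


def pain_by_category(incidents):
    """Return (hashtag, total minutes) pairs, biggest problem first."""
    totals = Counter()
    for tag, minutes in incidents:
        totals[tag] += minutes
    return totals.most_common()


if __name__ == "__main__":
    for tag, minutes in pain_by_category(INCIDENTS):
        print(f"{tag}: {minutes // 60}h {minutes % 60}m")
```

The first entry in the result is the answer to "what's the biggest problem to solve?" for that sample.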

OSS “Idea Flow Mapping” Tools

github.com/openmastery/tools

Idea Flow Map legend: Subtask, Troubleshooting, Progress, Learning

Timelines: 0:00 to 7:07 and 0:00 to 19:52

12 year old project after all original developers left.

Case Study: Huge Mess with Great Team

70-90% of dev capacity on “friction”

The Team’s Improvement Focus: Increasing unit test coverage by 5%

Case Study: Huge Mess with Great Team

“What are the specific problems that are causing the team’s pain?”

Add up the Pain by Category

1. Test Data Generation

2. Merging Problems

3. Repairing Tests

1000 hours/month

The Biggest Problem: ~700 hours/month generating test data

This is SURPRISINGLY EASY to do:

Spend tons of time working on improvements, but our improvements don’t make much difference.

How to Avoid This @5pm:

Top 5 Reasons Why Improvement Efforts FAIL

18 months after a Micro-Services/Continuous Delivery rewrite.

40-60% of dev capacity on “friction”

Idea Flow Map legend: Troubleshooting, Progress, Learning

Timelines: 0:00 to 28:15 and 0:00 to 12:23

Case Study: From Monolith to Microservices

The Architecture Looks SOOO Good on Paper!

Team A Team B Team C

Complexity Moved Here

WTF?! WTF?!

They had this problem:

Time Pressure

Compromise Safety for Speed

Increase Number & Severity of Hazards

More Pain and Higher Task Effort

Constant Urgency

Which led to this problem…

Edge of Chaos

Human Limitations

Deferring Problems

Painful Releases

Thrashing

Project Meltdown

Driving to the Edge of Chaos

Difficulty of Work Increases

PAIN

The Long-Term Costs of Chaos

Chart: Percentage of capacity spent on Troubleshooting (red) and Learning (blue), from 0% to 100%, across Release 1, Release 2, and Release 3 (extrapolated from samples). Legend: Troubleshooting, Progress, Learning.

Figure out what to do: Learning is front-loaded.

Rush Before the Deadline: Validation is deferred.

Pain Builds: Baseline friction keeps rising.

Chaos Reigns: Unpredictable work stops fitting in the timebox.

The cost of bad architecture decisions in the microservices world is EXTREMELY HIGH.

Visibility gives us a way to make the case for architecture changes

The Challenge:

How do I get my team to collect data??

Would you be willing to collect data if you knew your management would give you dedicated time to work on the biggest problems?

Here’s What You Do:

1. Don’t ask for Permission

2. State your Goal (make the goal clear to your team): “I want to make the business case to management for fixing things around here. No more chaos and working on weekends; this needs to stop. But I need data to make the case, so I need everyone’s help.”

3. State the Plan (take responsibility): “Here’s what I’m thinking. I want to run an experiment to record data for one month on all the time we spend troubleshooting. We can look at the data together and identify our biggest problems, then I’ll write it up and present the case to management to get things fixed.”

4. Enlist the Team: “Will you guys help me make this happen?”

What’s the Strategy?

Become a Risk Translator: Make visibility part of your job.

Refactor the Organization: From Translator, to Partner, to Owner.

Measure the Pain: Explain the problems with data.

The Constraints

Stay (Mostly) Inside Developer Box + Communicate in Manager-Speak

Manager

Allocation Decisions

Knowledge of Risks

Risk Mgmt Decisions

Developer

Risk Translator

Risk Summary

Risk Translator Role

Fits Within Developer Box

Your Job is to Repair the Broken Feedback Loop

Risk Translator

Engineering (Execution)

Management (Coordination)

Risk is the bridge language.

Quality Risk (Troubleshooting)

Likelihood of Unexpected Behavior

Cost to Troubleshoot and Repair

High Frequency, Low Impact

Low Frequency, Low Impact

Low Frequency, High Impact

PAIN
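The two axes of the Quality Risk chart can be combined into a single number for ranking. The sketch below assumes, beyond what the slide states, that likelihood is approximated by incidents per month, cost by hours per incident, and that "PAIN" is the high-frequency, high-impact corner. The `QualityRisk` class, the `rank_by_pain` helper, and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityRisk:
    label: str                # quadrant label (assumed mapping to the chart)
    incidents_per_month: int  # stand-in for likelihood of unexpected behavior
    hours_per_incident: int   # stand-in for cost to troubleshoot and repair

    @property
    def hours_per_month(self) -> int:
        """Expected monthly troubleshooting cost: frequency times impact."""
        return self.incidents_per_month * self.hours_per_incident


def rank_by_pain(risks):
    """Order risks from most to least expected hours per month."""
    return sorted(risks, key=lambda risk: risk.hours_per_month, reverse=True)


# Invented sample values, one per quadrant.
RISKS = [
    QualityRisk("high frequency, low impact", 20, 1),
    QualityRisk("low frequency, low impact", 2, 1),
    QualityRisk("low frequency, high impact", 2, 16),
    QualityRisk("high frequency, high impact (PAIN)", 20, 16),
]

if __name__ == "__main__":
    for risk in rank_by_pain(RISKS):
        print(f"{risk.label}: {risk.hours_per_month} hours/month")
```

The same shape would apply to Familiarity Risk and Assumption Risk by swapping in their own likelihood and cost measures.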

Familiarity Risk (Learning)

Likelihood of working with Unfamiliar Code

Cost to Learn

High Frequency, Easy to Learn

Low Frequency, Easy to Learn

Low Frequency, Hard to Learn

PAIN

Assumption Risk (Rework)

Likelihood of making a Bad Assumption

Cost to Correct Decisions

High Uncertainty, Low Delay

Low Uncertainty, Low Delay

Low Uncertainty, High Delay

PAIN

Decisions that save a few hours

Side-effects that cost several hours

Save 40 hours in direct costs (leave the toy on the stairs)

Increase chances of losing 1000 hours by 20% (tripping and falling)

Explain Problems in Terms of Risk (Gambling)

Distribution of Development Capacity

Over the long-term, probability wins.
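The gamble in the toy-on-the-stairs example can be worked through as an expected value. This is a rough sketch, assuming "by 20%" means the chance of the 1000-hour loss goes up by 20 percentage points; the slide does not say which reading is intended. The constant names and the `expected_net_hours` helper are made up for this illustration.

```python
from fractions import Fraction

HOURS_SAVED = 40      # direct cost avoided by taking the shortcut (from the slide)
HOURS_AT_RISK = 1000  # size of the possible loss (from the slide)
# Assumption: "by 20%" is read as 20 percentage points of added probability.
ADDED_LOSS_CHANCE = Fraction(20, 100)


def expected_net_hours(decisions: int = 1) -> Fraction:
    """Expected hours gained (negative means lost) over repeated shortcuts."""
    expected_loss = ADDED_LOSS_CHANCE * HOURS_AT_RISK  # 200 hours per decision
    return decisions * (HOURS_SAVED - expected_loss)


if __name__ == "__main__":
    print(expected_net_hours())    # one shortcut
    print(expected_net_hours(10))  # ten shortcuts of the same kind
```

Under that reading, each shortcut saves 40 hours up front but carries an expected loss of 200 hours, so the expected result is 160 hours lost per decision. Any single decision may get lucky; repeated over many decisions, the average dominates.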

Send “Project Visibility Updates”

Subject: Project Visibility Update

Hi Larry, I know it’s really hard to stay in the loop on all the different project risks, so I wanted to send you a summarized update of some of our recent findings.

We started collecting data during development to track where all of our time was going, and made some pretty frightening discoveries.

See attached. Let me know if you’d like to talk.

Risk Translators build Trust by making sense.

What’s the Strategy?

Become a Risk Translator: Make visibility part of your job.

Refactor the Organization: From Translator, to Partner, to Owner.

Measure the Pain: Explain the problems with data.

Manager

Allocation Decisions

Knowledge of Risks

Risk Mgmt Decisions

Developer

Risk Translator

Risk Summary

Refactor Step 1: Risk Translator

Software Task

Mitigate the Risks

Product Development

Product Work Queue

Risk Mgmt Work Queue

Product Owner

Product Decisions

Knowledge of Customers

Dev Team Capacity

Allocation Decisions

Manager-Translator Partnership

Risk Mgmt Decisions

Knowledge of Risks

Refactor Step 2: Partnership

Now we can steer!!

Software Task

Mitigate the Risks

Product Development

Product Work Queue

Risk Mgmt Work Queue

Product Owner

Product Decisions

Knowledge of Customers

Technical Risk Manager

Risk Mgmt Decisions

Knowledge of Risks

Dev Team Capacity

Manager

Allocation Decisions

Refactor Step 3: Owner

Option 1: Stay the Course

or

Option 2: Change

This is Safer (less risky)

Make the Case for Partnership

Keys to Success

Use specific examples then generalize effects.

Don’t negotiate time, explain the risks.

Use consultant +1 effect.

To create action, use lots of RED.

1. Explain Why You Decided to Collect Data

Saw this talk/read this book about…

(Blame Me)

How to Measure the PAIN in Software Development

Janelle Klein

Time Pressure

Compromise Safety for Speed

Increase Number & Severity of Hazards

More Pain and Higher Task Effort

Constant Urgency

“In the book, Janelle talks about this “Cycle of Chaos”…

“As the problems build, they introduce Quality Risk…

Likelihood of Unexpected Behavior

Cost to Troubleshoot and Repair

High Frequency, Low Impact

Low Frequency, Low Impact

Low Frequency, High Impact

PAIN

Likelihood of Mistakes

Cost to Recover

Quality Risk

Our application is more likely to be in a BROKEN state.

“This can lead to developers spending 50, 70, even 90% of development capacity dealing with chaos.”

Chart: Percentage of capacity spent on Troubleshooting (red) and Learning (blue), from 0% to 100%, across Release 1, Release 2, and Release 3.

Chaos Reigns: Over 70% of development capacity spent dealing with problems.

Example Thrashing Project

“We’re measuring the time we spend resolving chaos while we work…

Problems Measured in HOURS.

2. Here’s What We Found…

Pick your WORST offending examples.

Use lots of RED.

Save time by skipping diagnostic tools (~80 hours)

Side-effects of Troubleshooting time (~700 hours/month)

Timeline: 0:00 to 36h 25m

Legend: Troubleshooting, Progress

11 hours and 15 minutes of troubleshooting...

Creating a New Customer Report

“This is a timeline that shows all the time we spend troubleshooting…

Save time by deferring architecture fixes (~100 hours/month)

Side-effects of Environment Downtime

25% of capacity

“When the problems build up, they have a really big impact…

25 developers down for 2 days

“These are the top 3 problems consuming the majority of the team’s development capacity…

1000 hours/month

1. Test Data Generation

2. Merging Problems

3. Repairing False Alarms

Top Three Problems

“The deadline is coming either way…”

80% of features 100% done?

100% of features 80% done?

“Here’s what we were thinking…”

3-Month Improvement Trial

Dedicated resources (1 or 2 developers)

Dev team identifies highest-leverage improvement opportunities and prioritizes with management

Continue to share Project Visibility Updates each month

“Will you help us turn this project around?”

Refactor Step 3: Owner

Power Move: For a job well done, you get a promotion

Sometimes getting what you want is just a matter of taking responsibility.

Big Takeaway

What’s the Strategy?

Become a Risk Translator: Make visibility part of your job.

Refactor the Organization: From Translator, to Partner, to Owner.

Measure the Pain: Explain the problems with data.

What do you see as the biggest obstacle to success?

Discussion:

Janelle Klein

@janellekz

Upcoming Talks:

Top 5 Reasons Why Improvement Efforts Fail

Keynote: A Programmer’s Guide to Humans

Automated Developer Insight — The Next Frontier

janelle@openmastery.org