Monkeys in Lab Coats - QCon · Better experiment selection Search prioritization Richer lineage...


Page 1:

Monkeys in Lab Coats

Automating Failure Testing Research at

Page 2:

The whole is greater than the sum of its parts.

- Aristotle [Metaphysics]

Page 3:

The Professor vs The Practitioner

Peter Alvaro

Ex-Berkeley, Ex-Industry

Assistant Prof @ Santa Cruz

Misses the calm of PhD life

Likes prototyping stuff

Kolton Andrus

Ex-Netflix, Ex-Amazon

‘Chaos’ Engineer

Misses his actual pager

Likes breaking stuff

Page 4:

Measures of Success

Academic

H-Index

Grant warchest

Department ranking

Industry

Availability (e.g. 99.99% uptime)

Number of Incidents

Reduce Operational Burden

Page 5:

An Unlikely Team?

Page 7:

Works Great!

but ... it’s manual

Page 8:

Surely there is a better way ...

Page 10:

Free lunch?

Page 11:

The End?

(Academia + Industry)

Page 12:

Let’s build it

“Can we, pretty please?”

Page 13:

Freedom and Responsibility

Core Value

Page 14:

Responsibility

Academic Industry

Prove that it works

Show that it scales

Find real bugs

Page 15:

The Big Idea

Lineage Driven Fault Injection

Page 16:

What could possibly go wrong?

Consider computation involving 100 services

Search Space: 2^100 executions

Page 17:

“Depth” of bugs

Single Faults

Search Space: 100 executions

Page 18:

“Depth” of bugs

Combination of 4 faults

Search Space: 3M executions

Page 19:

“Depth” of bugs

Combination of 7 faults

Search Space: 16B executions
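The counts on these “Depth” of bugs slides follow from basic combinatorics, assuming each of the 100 services is either failed or healthy in a given experiment: there are 2^100 fault sets overall, and C(100, k) of them fail exactly k services. The sketch below is illustrative only (the names `SERVICES` and `search_space` are not from the talk); it reproduces the orders of magnitude the slides round to 3M and 16B.

```python
from math import comb

SERVICES = 100  # number of services in the computation, as on the slide


def search_space(depth, services=SERVICES):
    """Number of fault sets that fail exactly `depth` of `services` services."""
    return comb(services, depth)


single = search_space(1)      # single faults
four = search_space(4)        # combinations of 4 faults
seven = search_space(7)       # combinations of 7 faults
everything = 2 ** SERVICES    # every subset of services may fail

print(f"depth 1: {single:,}")
print(f"depth 4: {four:,}")
print(f"depth 7: {seven:,}")
print(f"all subsets: {everything:.3e}")
```

The exact values at depths 4 and 7 are 3,921,225 and 16,007,560,800, so the cost of exhaustive search grows very quickly with bug depth.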

Page 20:

Random Search

Search Space: 2^100 executions

Page 21:

Engineer-guided Search

Search Space: ???

Page 22:

Fault-tolerance “is just” redundancy

Page 23:

How do we find the redundancy?

Could a bad ‘thing’ ever happen?

Why did a good ‘thing’ happen?

Page 24:

Lineage-driven fault injection

Why did a good thing happen?

Consider its lineage.

The write is stable

Stored on RepA

Stored on RepB

Bcast1 Bcast2

Client Client

Page 25:

Lineage-driven fault injection

Why did a good thing happen?

Consider its lineage.

What could have gone wrong?

Faults are cuts in the lineage graph.

Is there a cut that breaks all supports?

The write is stable

Stored on RepA

Stored on RepB

Bcast1 Bcast2

Client Client

Page 27:

What would have to go wrong?

(RepA OR Bcast1)

The write is stable

Stored on RepA

Stored on RepB

Bcast1 Bcast2

Client Client

Page 28:

What would have to go wrong?

(RepA OR Bcast1)

AND (RepA OR Bcast2)

The write is stable

Stored on RepA

Stored on RepB

Bcast1 Bcast2

Client Client

Page 29:

What would have to go wrong?

(RepA OR Bcast1)

AND (RepA OR Bcast2)

AND (RepB OR Bcast2)

The write is stable

Stored on RepA

Stored on RepB

Bcast1 Bcast2

Client Client

Page 30:

What would have to go wrong?

(RepA OR Bcast1)

AND (RepA OR Bcast2)

AND (RepB OR Bcast2)

AND (RepB OR Bcast1)

The write is stable

Stored on RepA

Stored on RepB

Bcast1 Bcast2

Client Client

Page 31:

Lineage-driven fault injection

The write is stable

Stored on RepA

Stored on RepB

Bcast1 Bcast2

Client Client

Hypothesis: {Bcast1, Bcast2}
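The formula built up on the preceding slides has one clause per support of the good outcome: a clause is broken if any component in it fails, and a fault set is a candidate hypothesis if it breaks every clause. A minimal brute-force sketch of that check follows. The names `CLAUSES`, `breaks_all_supports`, and `minimal_fault_sets` are illustrative assumptions, and a real system would hand the formula to a SAT solver rather than enumerate subsets.

```python
from itertools import combinations

# One clause per support from the slides:
# (RepA OR Bcast1) AND (RepA OR Bcast2) AND (RepB OR Bcast2) AND (RepB OR Bcast1)
CLAUSES = [
    {"RepA", "Bcast1"},
    {"RepA", "Bcast2"},
    {"RepB", "Bcast2"},
    {"RepB", "Bcast1"},
]


def breaks_all_supports(faults, clauses=CLAUSES):
    """True if the fault set contains at least one component of every clause."""
    return all(faults & clause for clause in clauses)


def minimal_fault_sets(clauses=CLAUSES):
    """Enumerate fault sets smallest-first, keeping only the minimal ones."""
    components = sorted(set().union(*clauses))
    minimal = []
    for size in range(1, len(components) + 1):
        for combo in combinations(components, size):
            faults = set(combo)
            if any(m <= faults for m in minimal):
                continue  # a smaller solution already covers this one
            if breaks_all_supports(faults, clauses):
                minimal.append(faults)
    return minimal


print(minimal_fault_sets())
```

For this formula the minimal solutions are {Bcast1, Bcast2} and {RepA, RepB}; the first is the hypothesis shown on the slide.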

Page 32:

Search Space Reduction

Each Experiment finds a bug, OR

Reduces the Search space

Page 33:

The prototype system “Molly”

Recipe:

1. Start with a successful outcome. Work backwards.

2. Ask why it happened: Lineage

3. Convert lineage to a boolean formula and solve

4. Lather, rinse, repeat

[Diagram: 1. Success, Why?, 2. Lineage, Encode, 3. CNF, Solve, Fail, 4. REPEAT]
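The recipe can be sketched as a loop over a toy system. This is not Molly's implementation: `run` is a hypothetical stand-in for executing the system with faults injected and collecting lineage, the retry behaviour of `Bcast2` is an assumption made so the loop has something to discover, and brute-force enumeration stands in for the solver.

```python
from itertools import combinations


def minimal_hitting_sets(supports, max_size):
    """Fault sets of at most max_size components that break every support."""
    components = sorted(set().union(*supports))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(components, size):
            candidate = frozenset(combo)
            if any(h <= candidate for h in found):
                continue  # not minimal
            if all(candidate & s for s in supports):
                found.append(candidate)
    return found


def run(faults):
    """Toy system (assumed): Bcast2 is a retry sent only when Bcast1 is lost.

    Returns the supports observed in this execution; empty means failure.
    """
    bcast = "Bcast1" if "Bcast1" not in faults else "Bcast2"
    if bcast in faults:
        return set()
    return {frozenset({rep, bcast}) for rep in ("RepA", "RepB") if rep not in faults}


def ldfi(run, max_faults):
    """Return (failing fault set or None, number of experiments performed)."""
    supports = set(run(frozenset()))  # 1. start with a successful outcome
    assert supports, "the fault-free run must succeed"
    tried = set()
    while True:
        # 2-3. encode the lineage seen so far and solve for hypotheses
        hypotheses = [h for h in minimal_hitting_sets(supports, max_faults)
                      if h not in tried]
        if not hypotheses:
            return None, len(tried)  # no bug within the fault budget
        hypothesis = hypotheses[0]
        tried.add(hypothesis)
        observed = run(hypothesis)
        if not observed:
            return hypothesis, len(tried)  # the experiment found a bug
        supports |= observed  # 4. new redundancy learned: repeat


print(ldfi(run, max_faults=2))
```

In this toy run the first experiment fails `Bcast1`, observes the retry, and adds the `Bcast2` supports to the formula; the second experiment fails both broadcasts and the write is lost. Each experiment either finds a bug or shrinks the set of hypotheses left to test.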

Page 34:

The Big Idea Meets Production

Page 35:

1. Start with a successful outcome

[Diagram: 1. Success, Why?, 2. Lineage, Encode, 3. CNF, Solve, Fail, 4. REPEAT]

Page 36:

What is success?

Page 38:

“Start with the customer and work backwards”

Leadership Principle

Page 41:

Lesson 1

Work backwards from what you know

Page 42:

2. Ask why it happened

[Diagram: 1. Success, Why?, 2. Lineage, Encode, 3. CNF, Solve, Fail, 4. REPEAT]

Page 43:

Request Tracing

Page 45:

Request Tracing

Page 46:

Alternate Execution

Page 47:

Evolution over time

Page 48:

Redundancy through History

Page 49:

Lesson 2

Meet in the middle

Page 50:

3. Solve

[Diagram: 1. Success, Why?, 2. Lineage, Encode, 3. CNF, Solve, Fail, 4. REPEAT]

Page 51:

A “small” matter of code

Page 52:

4. Lather, Rinse, Repeat

[Diagram: 1. Success, Why?, 2. Lineage, Encode, 3. CNF, Solve, Fail, 4. REPEAT]

Page 53:

Turn the crank, right?

Page 54:

Idempotence

Page 57:

Bins and Balls

Request

Class 1

Class 2

Class 3

Class n

[...]

r’ r

Page 58:

Class n

Predicting Request Graphs

Request

Page 59:

Class n

Predicting Request Graphs

Request

Some function f: Requests → Classes

Page 60:

Predicting Request Graphs

f(Request) = Class n

Page 62:

Solve the Machine Learning problem?

or the Failure Testing one?

Page 63:

Simplest thing that will work?

Page 64:

Falcor Path Mapping

["bookmarks", "recent"]

["playlist", 0, "name"]

["ratings"]

=> "bookmarks,playlist,ratings"
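The mapping on this slide can be read as a very simple choice of f: Requests → Classes, where the class label is built from the first key of each Falcor path in the request. The sketch below is one way to write that down. The function name `request_class` is hypothetical, and de-duplicating and sorting the keys is an assumption added so that equivalent requests always get the same label; the slide only shows the three first keys joined with commas.

```python
def request_class(paths):
    """Map a request's Falcor paths to a class label.

    Keeps only the first key of each path, then de-duplicates and sorts
    the keys (an assumption) so the label does not depend on path order.
    """
    roots = {str(path[0]) for path in paths if path}
    return ",".join(sorted(roots))


paths = [
    ["bookmarks", "recent"],
    ["playlist", 0, "name"],
    ["ratings"],
]
print(request_class(paths))
```

Requests that touch the same top-level keys land in the same class, which is the "simplest thing that will work" in place of a learned classifier.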

Page 65:

Lesson 3

Adapt the theory to the reality

Page 66:

Many moons passed...

Page 67:

Does it work? YES!

Page 68:

Case study: “Netflix AppBoot”

Services: ~100

Search space (executions): 2^100 (roughly 1,000,000,000,000,000,000,000,000,000,000)

Experiments performed: 200

Critical bugs found: 11

Page 69:

Future Work

Richer device metrics

Request class creation

Better experiment selection

Search prioritization

Richer lineage collection

Exploring temporal interleavings

Page 79:

Lessons

Work backwards from what you know

Meet in the middle

Adapt the theory to the reality

Page 80:

Academia + Industry

Page 81:

Academia + Industry

Academia Industry

Page 82:

Thank You!

Peter Alvaro

@palvaro

Kolton Andrus

@KoltonAndrus

Page 83:

References

● Netflix Blog on ‘Automated Failure Testing’: http://techblog.netflix.com/2016/01/automated-failure-testing.html

● Netflix Blog on ‘Failure Injection Testing’: techblog.netflix.com/2014/10/fit-failure-injection-testing.html

● ‘Lineage Driven Fault Injection’: http://people.ucsc.edu/~palvaro/molly.pdf

● ‘Automating Failure Testing Research at Scale’: https://people.ucsc.edu/~palvaro/socc16.pdf

Page 84:

Photo Credits

● http://etc.usf.edu/clipart/4000/4048/children_7_lg.gif

● http://cdn.c.photoshelter.com/img-get2/I0000MIN8fL0q8AA/fit=1000x750/taiwan-hiking-river-tracing-walking.jpg

● http://i.imgur.com/iWKad22.jpg

● https://blogs.endjin.com/2014/05/event-stream-manipulation-using-rx-part-2/

● http://youpivot.com/category/features/

● https://www.cloudave.com/33427/boards-need-evolve-time/

● https://www.linkedin.com/pulse/amelia-packager-missing-data-imputation-ramprakash-veluchamy