Planning for Change in a Formal Verification of the Raft...

52
Planning for Change in a Formal Verification of the Raft Consensus Protocol James Wilcox Steve Anton Zach Tatlock Mike Ernst Tom Anderson Doug Woos

Transcript of Planning for Change in a Formal Verification of the Raft...

Page 1: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Planning for Change in a Formal Verification of the

Raft Consensus Protocol

James Wilcox

Steve Anton

Zach Tatlock

Mike Ernst

Tom Anderson

Doug Woos

Page 2: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Contributions

First formal proof of Raft’s safety first verified implementation!

Large-scale Verdi case studystress test; reverification inevitable

Proof engineering lessonsaffinity lemmas, etc.

Page 3: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Distributed Systems

Page 4: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Reliably deliver procrastination

Page 5: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Also serious infrastructure

Page 6: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

One day last summer...

Page 7: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

One day last summer...

Page 8: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

One day last summer...

Page 9: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

How distributed systems fail

Page 10: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Related Work

IronFleet [SOSP15]

EventML [LADA12, AVoCS15]

liveness, log compaction, serialization

language for verified distributed systems

Verdi [PLDI15]network semantics, transformers, higher-order

Page 11: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Verdi backgroundNetwork semantics

operational semantics define network behavior

Verified system transformersprove property transfer to adversarial network

VSTApp

App App

App

App App

Page 12: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Big Picture

Past: Verdi Frameworkcompositional fault tolerance

Present: Verified Raftcritical piece of infrastructure

Future:dynamically upgrading systemsprogram logic

Page 13: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Outline

Verification Challenge

Raft Algorithm

Proof Overview

state machine replication

implemented in Verdi

and lessons learned

)

Page 14: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Replication for fault tolerance

critical components must not fail

Page 15: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Replication for fault tolerance

)available if n/2 nodes are up

replicas must be consistent with

each other

Page 16: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Replication for fault tolerance

)

Page 17: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

)Replication correctness

Page 18: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Replication correctness

⇡linearizability

cluster presents consistent order of operations to clients

Page 19: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

⇡Internal Correctness

linearizability follows from internal correctness:

state machine safety

Page 20: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Goal: Verify Raft

)Reduce linearizability to

State Machine Safety [PLDI15]

Prove State Machine Safety

Page 21: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Goal: Verify Raft

)

Lin. SMS

LOC

45k

5k

Page 22: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Outline

Verification Challenge

Raft Algorithm

Proof Overview

state machine replication

implemented in Verdi

and lessons learned

)

Page 23: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Formalizing the networkstate of the world

packets in flight history of I/O

data @ nodes

Page 24: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Formalizing the network

Page 25: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Formalizing the network

Page 26: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Defining network semantics

Hnet(dst, ⌃[dst], src, m)=(�0, o, P 0) ⌃0=⌃[dst 7! �0]

({(src, dst, m)} ] P, ⌃, T ) (P ] P 0, ⌃0, T ++ hoi)Deliver

Page 27: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Defining network semantics

Hnet(dst, ⌃[dst], src, m)=(�0, o, P 0) ⌃0=⌃[dst 7! �0]

({(src, dst, m)} ] P, ⌃, T ) (P ] P 0, ⌃0, T ++ hoi)Deliver

p 2 P

(P, ⌃, T ) (P ] {p}, ⌃, T )Duplicate

({p} ] P, ⌃, T ) (P, ⌃, T )Drop

Htmt(n, ⌃[n]) = (�0, o, P 0) ⌃0 = ⌃[n 7! �0]

(P, ⌃, T ) (P ] P 0, ⌃0, T ++ htmt, oi)Timeout

Page 28: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Defining network semantics

Hnet(dst, ⌃[dst], src, m)=(�0, o, P 0) ⌃0=⌃[dst 7! �0]

({(src, dst, m)} ] P, ⌃, T ) (P ] P 0, ⌃0, T ++ hoi)Deliver

p 2 P

(P, ⌃, T ) (P ] {p}, ⌃, T )Duplicate

({p} ] P, ⌃, T ) (P, ⌃, T )Drop

Htmt(n, ⌃[n]) = (�0, o, P 0) ⌃0 = ⌃[n 7! �0]

(P, ⌃, T ) (P ] P 0, ⌃0, T ++ htmt, oi)Timeout

systems defined by handlers

Page 29: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

election

replication

...

Term 3Term 2Term 1

Implementing Raft

Page 30: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Implementing Raft: Leader Election

Candidate

Followers

ReqVote Vote

...

Term 3Term 2Term 1

Page 31: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Implementing Raft

...

Term 3Term 2Term 1

Page 32: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Term 3Term 2Term 1

...

Leader

Followers

Append AppendAck

Implementing Raft: Log Replication

Leader commits entry when receiving

n/2 acks

Page 33: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Outline

Verification Challenge

Raft Algorithm

Proof Overview

state machine replication

implemented in Verdi

and lessons learned

)

Page 34: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Verifying Raft: Show linearizability

Page 35: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Verifying Raft: Approach

)

Page 36: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

State Machine Safety

Nodes agree about committed entries

proof by induction on an execution

since only committed entries executed

)

Page 37: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

State Machine Safety: Proof

I) Inot inductive!

Page 38: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

State Machine Safety: Proof

I) II Itrue initially preserved

Lemma Lemma Lemma …90 invariants in total

Page 39: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

The burden of proof

P) PP with ghost state

P true initially P preserved

Lemma Lemma …Lemma

Re-verification is the primary challenge: - invariants are not inductive - not-yet-verified code is wrong - need additional invariants

Page 40: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

The burden of proof

P) PP with ghost state

P true initially P preserved

Lemma Lemma …LemmaRe-verification is the primary challenge

Proof engineering techniques help: - affinity lemmas - intermediate reachability - structural tactics - information hiding

Page 41: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Ghost State: ExampleCapture all entries received by a node

Leader

Follower

Append

Log (real) allEntries (ghost)

A,B,C

[A],B,C

A,D {A,D}A,B,C {A,B,C,D}

{A,B,C}

Page 42: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Affinity Lemmas: Example

Affinity Lemma

every invariant of entries in logs is invariant of entries in allEntries

)e.term > 0

e log2

) e.term > 0e allEntries 2

Page 43: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Affinity Lemmas: Example

Affinity Lemma

every invariant of entries in logs is invariant of entries in allEntries

)P e

e log2

) P ee allEntries 2

Page 44: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Affinity Lemmas

Ex 1: Relate ghost state to real statetransfer properties once and for all

Ex 2: Relate current messages to pastresponse => past request

Page 45: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Structured Handlers: Examplehandler = update_state ; respond

handler

net

net’

update_statenet

net’

netirespond

Page 46: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Structured Handlers: Examplehandler = update_state ; respond

handler

net

net’

I

I

update_statenet

net’

netirespond

Page 47: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Structured Handlers: Examplehandler = update_state ; respond

handler

net

net’

update_statenet

net’

netirespond

I

I

I

I

I

Page 48: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

The burden of proof

P) PP with ghost state

P true initially P preserved

Lemma Lemma …LemmaRe-verification is the primary challenge

Proof engineering techniques help: - affinity lemmas - intermediate reachability - structural tactics - information hiding

Page 49: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Contributions

First formal proof of Raft’s safety first verified implementation!

Large-scale Verdi case studystress test; reverification inevitable

Proof engineering lessonsaffinity lemmas, etc.

Page 50: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and
Page 51: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and

Planning for Change in a Formal Verification of the

Raft Consensus Protocol

James Wilcox

Steve Anton

Zach Tatlock

Michael Ernst

Tom Anderson

Doug Woos

Page 52: Planning for Change in a Formal Verification of the Raft ...ztatlock/pubs/verdi-woos-cpp16-slides.pdfRaft Algorithm Proof Overview state machine replication implemented in Verdi and