Experiences with Formal Specifications of Fault-Tolerant File Systems

Page 1: Experiences with Formal Specifications of Fault-Tolerant File Systems

Experiences with Formal Specifications of Fault-Tolerant File Systems

Roxana Geambasu (University of Washington)

Andrew Birrell (Microsoft Research)

John MacCormick (Dickinson College)

Page 2: Experiences with Formal Specifications of Fault-Tolerant File Systems

Fault-Tolerant File Systems (FTFSs)

- FTFSs are crucial components in today’s datacenters
- They underlie most of what we do on the Web
- Dependability & correctness of FTFSs are paramount

[Figure: FTFSs such as the Google File System (GFS), Niobe, and Dynamo underlie Web services such as Google Earth, Google Analytics, and Amazon services.]

Page 3: Experiences with Formal Specifications of Fault-Tolerant File Systems

FTFSs Are Extremely Complex

- FTFSs contain sophisticated protocols for:
  - replica consistency
  - recovery (replica addition to compensate for failures)
  - reconfiguration (replica removal due to failure)
  - load balancing, etc.
- Hence, FTFS protocols and implementations are hard to get right

Page 4: Experiences with Formal Specifications of Fault-Tolerant File Systems

Formal Methods (FM)

- Formal methods have been used extensively to increase trust in complex systems
  - Formal specification languages are unambiguous
  - Model checking and formal proofs are reliable
- However, FTFS designers still rely solely on prose and intuitive reasoning
  - Prose may be ambiguous or inaccurate
  - Intuitive reasoning may be faulty

Page 5: Experiences with Formal Specifications of Fault-Tolerant File Systems

FTFS Design and Analysis Challenges

Without formal methods, it is hard to:

- Understand FTFS behavior and semantics
  - Intuitive reasoning is hard and error-prone
- Explore alternative designs
  - Alternative designs may affect semantics in complex ways
- Compare various FTFSs
  - Prose is ambiguous and code bases are huge (tens of thousands of lines of code)

Page 6: Experiences with Formal Specifications of Fault-Tolerant File Systems

Goal: Convince FTFS Builders to Use FM

- Previous studies showed how and for what purposes to use FM for many classes of systems, e.g.:
  - Local/distributed FSs, processor caches, TCP congestion
- Our work:
  - Shows how and for what purposes to use FM for another specific class of important systems: fault-tolerant file systems
  - Identifies convenient ways in which FM help in understanding, designing & comparing FTFSs

Page 7: Experiences with Formal Specifications of Fault-Tolerant File Systems

Our Experience

- We wrote TLA+ specifications for three protocols:
  - Chain replication (Cornell University)
  - Niobe (Microsoft)
  - GFS (Google)

Our experience shows that FM help solve FTFS challenges:

1. Comparing system mechanisms & tradeoffs

2. Understanding and proving semantics

3. Exploring alternative designs

Page 8: Experiences with Formal Specifications of Fault-Tolerant File Systems

Outline

Specification effort

Experiences with formal specifications for FTFS:

1. Comparing system mechanisms

2. Understanding and proving semantics

3. Exploring alternative designs

Conclusions

Page 9: Experiences with Formal Specifications of Fault-Tolerant File Systems

Specification Effort

Question: How hard is it to build specifications?

Answer: Moderately precise specifications are reasonably easy to produce

                Chain      Niobe     GFS
Time to write   3 weeks    1 week    2 weeks

Page 11: Experiences with Formal Specifications of Fault-Tolerant File Systems

1. Comparing System Mechanisms

Case study: GFS vs. Niobe

- From prose, they seemed to be very different systems
  - GFS: trades some consistency for throughput
  - Niobe: designed for strong consistency
- Our TLA+ specifications highlight significant mechanism overlap and also key differences

Page 12: Experiences with Formal Specifications of Fault-Tolerant File Systems

Capturing Similarities & Differences

[Figure: overlap diagram of the Niobe and GFS TLA+ specifications. The common part covers single-master, primary-secondary replication. The three size labels in the diagram read 291 lines, 287 lines, and 189 lines.]

- More than half of the TLA+ code base is common
- Specifications are small due to TLA+ expressiveness
  - Compare their total sizes to the tens of thousands of LOC of the systems’ implementations

Page 13: Experiences with Formal Specifications of Fault-Tolerant File Systems

Differences Stand Out Clearly in TLA+

[Figure: message diagram with labels 1 to 4, write messages (w), and ACKs.]

Example: Write completion in GFS and Niobe

Page 14: Experiences with Formal Specifications of Fault-Tolerant File Systems

Differences Stand Out Clearly in TLA+

[Figure: two message diagrams, each with labels 1 to 4, write messages (w), and ACKs. One of them is annotated "Group reconfiguration".]

Example: Write completion in GFS and Niobe

Page 15: Experiences with Formal Specifications of Fault-Tolerant File Systems

Understanding Tradeoffs

Example: Write completion in GFS and Niobe

Tradeoff:

- GFS: smaller latency, but writes may leave the group inconsistent
- Niobe: a write never leaves the replica group in an inconsistent state
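The tradeoff can be made concrete with a minimal Python sketch. It assumes, as a simplification, that the GFS-like path reports an error when a replica misses a write and leaves the group as it is, while the Niobe-like path removes that replica from the group before acknowledging. The `Group` class, the replica names, and both functions are hypothetical and are not taken from either system.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A toy replica group mapping replica name to stored value (hypothetical)."""
    replicas: dict = field(
        default_factory=lambda: {"r1": None, "r2": None, "r3": None})
    failed: set = field(default_factory=set)

    def apply(self, value):
        """Apply the write on every live replica; return the replicas that missed it."""
        missed = []
        for name in self.replicas:
            if name in self.failed:
                missed.append(name)
            else:
                self.replicas[name] = value
        return missed

    def consistent(self):
        """True when all replicas in the group hold the same value."""
        return len(set(self.replicas.values())) <= 1


def write_report_error(group, value):
    """GFS-like completion (assumed): report the failure, leave the group as is."""
    missed = group.apply(value)
    return "error" if missed else "ok"


def write_reconfigure(group, value):
    """Niobe-like completion (assumed): drop replicas that missed the write, then ack."""
    for name in group.apply(value):
        del group.replicas[name]      # the extra reconfiguration step costs latency
        group.failed.discard(name)
    return "ok"
```

With one failed replica, `write_report_error` returns quickly but leaves replicas holding different values, whereas `write_reconfigure` pays for a membership change and ends with a consistent, smaller group.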

Page 16: Experiences with Formal Specifications of Fault-Tolerant File Systems

Lesson: Formalism Helps in Comparison

Formal specifications distill key differences and similarities between systems

Understanding the key differences enables us to understand tradeoffs

Page 18: Experiences with Formal Specifications of Fault-Tolerant File Systems

2. Understanding FTFS Consistency

- It is hard to prove consistency models for FTFSs
  - For weakly consistent systems, it can be even harder
- Solution: use a refinement mapping
  1. Reduce the system to a really simple model (the SimpleStore)
  2. Prove the correctness of the reduction
  3. Reason about the SimpleStore
- For convenience, we use model checking instead of full manual proofs at Step 2

[Figure: the System reduces to the SimpleStore; each is labeled with a consistency model.]
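The three steps can be illustrated with a toy check in Python, in the spirit of using model checking at Step 2. This is a sketch under strong assumptions: a two-replica system with a two-value domain, a SimpleStore that holds a single value, and a mapping that exposes the primary's value. All names (`system_steps`, `mapping`, `check_reduction`, and so on) are hypothetical and are not taken from the authors' TLA+ specifications.

```python
from collections import deque

VALUES = (0, 1)  # tiny value domain (assumption) so the state space stays finite


def system_steps(state):
    """Successors of a toy primary/secondary system; state = (primary, secondary)."""
    primary, secondary = state
    if primary == secondary:
        # No write in flight: the primary may apply a new value.
        return [(v, secondary) for v in VALUES if v != primary]
    # A write is in flight: the secondary catches up with the primary.
    return [(primary, primary)]


def read_primary(state):
    """Values a client read may return when reads go to the primary only."""
    return {state[0]}


def read_any(state):
    """Values a client read may return when reads go to any replica."""
    return set(state)


def mapping(state):
    """Refinement mapping: the SimpleStore value is the primary's value."""
    return state[0]


def store_step_ok(old, new):
    """SimpleStore step relation: a stutter, or a write of a legal value."""
    return old == new or new in VALUES


def check_reduction(initial, reads):
    """Explore every reachable state; return a counterexample or None."""
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        # Every client-visible read must be one the SimpleStore could return.
        if not reads(state) <= {mapping(state)}:
            return ("read", state)
        for nxt in system_steps(state):
            # Every system step must map to a SimpleStore step or a stutter.
            if not store_step_ok(mapping(state), mapping(nxt)):
                return ("step", state, nxt)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None
```

Calling `check_reduction((0, 0), read_primary)` finds no counterexample, while passing `read_any` fails at the state where the secondary lags the primary. In this toy model, a single-value SimpleStore is therefore too strong an abstraction once reads may go to any replica.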

Page 19: Experiences with Formal Specifications of Fault-Tolerant File Systems

SimpleStores for the Three FTFSs

- SimpleStores capture only client-visible behaviors and abstract out all protocol mechanisms
- SimpleStores are easy to reason about

[Figure: three reductions, Chain to Chain_SS, Niobe to Niobe_SS, and GFS to GFS_SS.]

Page 20: Experiences with Formal Specifications of Fault-Tolerant File Systems

Chain’s Consistency Semantics

[Figure: Chain reduces to Chain_SS; both are labeled linearizable.]

- Using convenient methods, we gained reliable insight into Chain’s consistency model
- Proof is straightforward (half a page)

Page 21: Experiences with Formal Specifications of Fault-Tolerant File Systems

Niobe’s Consistency Semantics

[Figure: Chain reduces to Chain_SS and Niobe reduces to Niobe_SS, all labeled linearizable. The GFS and GFS_SS pair is marked "??".]

- Similar experience as with Chain
- Thus, formal methods help in verifying standard consistency models for strongly consistent FTFSs

Page 22: Experiences with Formal Specifications of Fault-Tolerant File Systems

GFS’ Consistency Semantics

- Formal methods proved helpful in several ways
- An interesting conclusion (details in the paper):
  - Using refinement mappings, we were able to show that, under a small set of assumptions, GFS has regular-register semantics

[Figure: GFS plus assumptions reduces to GFS_SS; both are labeled regular register, a well-defined intermediate-level consistency model.]

Page 23: Experiences with Formal Specifications of Fault-Tolerant File Systems

Lesson: Formalism Helps Understand Semantics

Refinement mappings help in understanding & reliably verifying consistency models of FTFSs

They are useful for both strongly consistent and weakly consistent FTFSs

Page 25: Experiences with Formal Specifications of Fault-Tolerant File Systems

3. Exploring Alternative Designs

Exploring alternative designs is much easier using our framework (TLA+ specs, SimpleStores, reductions)

[Figure: a System model connected by a reduction to its SimpleStore.]

Page 26: Experiences with Formal Specifications of Fault-Tolerant File Systems

Case Study: Changing Niobe’s Design

- Currently, Niobe’s clients read from the primary only
- Reading from any replica may improve throughput

Design question:

What happens to Niobe if it adopts a read-any policy?

[Figure: Niobe_SS is labeled linearizable and GFS_SS is labeled regular register. Niobe with read-any is first marked "regular register?" and then labeled regular register.]

Page 27: Experiences with Formal Specifications of Fault-Tolerant File Systems

Conclusions

- FTFSs are extremely important in today’s Web
- We showed how formal methods can help improve our understanding of and trust in FTFSs
- Lessons from our experience with three FTFSs:
  - Writing formal specifications is relatively easy
  - Formal methods enable:
    - Insightful comparison of mechanisms & tradeoffs
    - Reliable verification of consistency properties
    - Convenient investigation of alternative designs

Page 28: Experiences with Formal Specifications of Fault-Tolerant File Systems

Appendix

Page 29: Experiences with Formal Specifications of Fault-Tolerant File Systems

Related Work

- FM are extensively used to reason about software [Bickford et al., 96] and hardware [Shimizu et al., 02]
  - However, FTFS builders have not adopted them yet
  - By sharing our experience, we hope to convince FTFS builders of the utility of specifying their systems formally
- Using FM to improve understanding and trust in systems:
  - Previous works apply FM to various classes of systems: [Chkliaev et al., 00], [Crow et al., 98], [Joshi et al., 03], [Houston et al., 91]
  - The closest works are those looking at distributed FSs (AFS, Coda): [Sivathanu et al., 05], [Wing et al., 97], [Yang et al., 04]
  - We show how to apply them in the specific context of FTFSs
- Reducing complex systems to simple ones in order to reason about semantics has been used before [Joshi et al., 03]
  - We apply this method to FTFSs

Page 30: Experiences with Formal Specifications of Fault-Tolerant File Systems

GFS Assumptions

If:

1. A write never crosses chunk boundaries
   - GFS client library offers chunk-level operations
2. A write never goes to a stale replica
   - Implement this assumption using a lease mechanism

Then:

[Figure: GFS plus assumptions reduces to GFS_SS; both are labeled regular register.]

Page 31: Experiences with Formal Specifications of Fault-Tolerant File Systems

Standard Consistency Models

- Linearizability (atomic register semantics)
  - Any client-visible history H generated by the system is equivalent to a legal sequential interleaving S
  - The sequential interleaving S preserves the real-time ordering of operations from H
- Serializability
  - Any client-visible history H generated by the system is equivalent to a legal sequential interleaving S
- Regular register semantics
  - A read not concurrent with any write returns the most recently written value
  - A read concurrent with some writes returns either the value of one of the in-progress writes or the most recently written value
- Safe register semantics
  - A read not concurrent with any write returns the most recently written value
  - A read concurrent with some writes can return anything
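The regular and safe register definitions can be turned into a small history checker. The Python sketch below assumes that writes do not overlap one another, so that "most recently written value" is well defined, and that each operation carries a start and an end time. The `Op` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str      # "write" or "read"
    value: int     # value written, or value returned by the read
    start: float   # invocation time
    end: float     # response time


def _overlaps(a, b):
    """True when the two operations are concurrent."""
    return a.start < b.end and b.start < a.end


def allowed_regular(read, writes, initial):
    """Values a regular register may return for this read."""
    done = [w for w in writes if w.end <= read.start]
    latest = max(done, key=lambda w: w.end).value if done else initial
    concurrent = {w.value for w in writes if _overlaps(w, read)}
    return {latest} | concurrent


def is_regular(history, initial=0):
    """Check every read against the regular register definition."""
    writes = [op for op in history if op.kind == "write"]
    return all(op.value in allowed_regular(op, writes, initial)
               for op in history if op.kind == "read")


def is_safe(history, initial=0):
    """Check every read against the safe register definition."""
    writes = [op for op in history if op.kind == "write"]
    for op in history:
        if op.kind != "read":
            continue
        if any(_overlaps(w, op) for w in writes):
            continue  # concurrent with a write: any value is acceptable
        if op.value not in allowed_regular(op, writes, initial):
            return False
    return True
```

For example, with a write of 1 spanning times 0 to 10, a read at 1 to 2 returning 1 followed by a read at 3 to 4 returning the initial value 0 passes `is_regular`. Linearizability would reject that history because the second read goes back to the old value, which is what separates the two models.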

Page 32: Experiences with Formal Specifications of Fault-Tolerant File Systems

Summary of Contributions

- Identified a new important class of extremely complex systems: FTFSs
- Showed three aspects of FTFS design & analysis for which FM prove especially valuable
  - Mechanism comparison, semantics understanding, and design space exploration
- Showed how to apply specific FMs to FTFSs
  - Showed how to construct SimpleStores and what can be learned from them
  - SimpleStores are reusable between systems
- We believe that our study, tailored toward FTFSs, can be more relevant to FTFS designers than more general studies

Page 33: Experiences with Formal Specifications of Fault-Tolerant File Systems

Lessons from Our Experience

- Building high-level specifications for FTFSs is relatively easy
  - It is also remarkably useful for understanding a system
- The exercise of writing specifications exposes similarities in seemingly dissimilar systems (GFS, Niobe)
  - Formal specifications also distill the key design differences
- Specifications enable convenient verification of consistency for both strongly and weakly consistent systems
  - Niobe and Chain are both linearizable
  - GFS can be upgraded to regular register via a clear set of assumptions
  - GFS’ design to read from any replica heavily influences its consistency
- Intuition can often fail
  - Niobe seemed to be reducible to Chain_SS, but actually was not

Page 34: Experiences with Formal Specifications of Fault-Tolerant File Systems

Chain SimpleStore

[Figure: Chain_SS. Requests (writes w5, w6, w7 and reads r1, r2, r3) pass through a write channel and a read channel to a SerialDB. The diagram shows commit(w5), drop(w7), and read(), with responses to writes and reads returned.]
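A SimpleStore of this shape could be sketched in Python as follows. The sketch is reconstructed only from the labels in the figure (write channel, commit, drop, SerialDB, read), so the class, its method names, and details such as whether writes must commit in order are assumptions rather than the authors' definition.

```python
class ChainSimpleStore:
    """Toy SimpleStore: each pending write is either committed or dropped."""

    def __init__(self, initial=None):
        self.value = initial       # contents of the serial database
        self.write_channel = []    # writes submitted but not yet decided

    def submit(self, write_id, value):
        """A client write request enters the write channel."""
        self.write_channel.append((write_id, value))

    def commit(self, write_id):
        """Apply a pending write to the database and acknowledge it."""
        self.value = self._take(write_id)
        return ("ack", write_id)

    def drop(self, write_id):
        """Discard a pending write; the database is unchanged."""
        self._take(write_id)
        return ("dropped", write_id)

    def read(self):
        """Reads are served from the serial database."""
        return self.value

    def _take(self, write_id):
        """Remove a pending write from the channel and return its value."""
        for i, (wid, value) in enumerate(self.write_channel):
            if wid == write_id:
                del self.write_channel[i]
                return value
        raise KeyError(write_id)
```

Because every client-visible effect goes through `commit`, `drop`, or `read` on a single value, reasoning about the model's consistency is far simpler than reasoning about the replication protocol itself.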

Page 35: Experiences with Formal Specifications of Fault-Tolerant File Systems

The Temporal Logic of Actions (TLA+)

- Formalism that combines a temporal logic with a logic of actions
- Especially designed for specification of distributed asynchronous systems
- TLA+ specifications model the system as a state machine:
  - Define system variables (state)
  - Model actions that the system can take as state transitions
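The state-machine view can be mimicked in Python, although this is only an analogy and not TLA+ itself. The sketch below assumes a toy two-node chain in which the head accepts writes and the tail copies them in order. State variables are a pair of logs, each action yields successor states, and an invariant is checked over every reachable state. All names and the `MAX_WRITES` bound are hypothetical.

```python
MAX_WRITES = 3  # bound (assumption) so that exploration terminates


def client_write(state):
    """Action: the head appends the next write to its log."""
    head, tail = state
    if len(head) < MAX_WRITES:
        yield (head + (len(head),), tail)


def propagate(state):
    """Action: the tail copies the next write it has not seen yet."""
    head, tail = state
    if len(tail) < len(head):
        yield (head, tail + (head[len(tail)],))


ACTIONS = (client_write, propagate)


def prefix_invariant(state):
    """The tail's log is always a prefix of the head's log."""
    head, tail = state
    return head[:len(tail)] == tail


def reachable(initial):
    """Enumerate every state reachable from the initial state."""
    seen, frontier = {initial}, [initial]
    while frontier:
        state = frontier.pop()
        for action in ACTIONS:
            for nxt in action(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen
```

Evaluating `all(prefix_invariant(s) for s in reachable(((), ())))` checks the invariant over the whole bounded state space, which is roughly the kind of exhaustive check a model checker performs on a specification.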

Page 36: Experiences with Formal Specifications of Fault-Tolerant File Systems

Understanding Tradeoffs

- GFS: smaller write latency, but writes may leave the group inconsistent
- Niobe: a write never leaves the replica group in an inconsistent state

[Figure: two diagrams with labels 1 to 4. One is annotated with "Error" and with reads that return an old value.]