Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of...
![Page 1: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/1.jpg)
Experiences with Formal Specifications of Fault-Tolerant
File Systems
Roxana Geambasu (University of Washington)
Andrew Birrell (Microsoft Research)
John MacCormick (Dickinson College)
![Page 2: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/2.jpg)
Fault-Tolerant File Systems (FTFSs)
FTFSs are crucial components in today’s datacenters
They underlie most of what we do on the Web
Dependability & correctness of FTFSs are paramount
[Figure: FTFSs (Google File System (GFS), Niobe, Dynamo) underlying Web services (Google Earth, Google Analytics, Amazon services)]
![Page 3: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/3.jpg)
FTFSs Are Extremely Complex
They contain sophisticated protocols for:
  replica consistency, recovery (replica addition to compensate for failures), reconfiguration (replica removal due to failure), load balancing, etc.
Hence, FTFS protocols and implementations are hard to get right
![Page 4: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/4.jpg)
Formal Methods (FM)
Formal methods have been used extensively to increase trust in complex systems
  Formal specification languages are unambiguous
  Model checking and formal proofs are reliable
However, FTFS designers still rely solely on prose and intuitive reasoning
  Prose may be ambiguous or inaccurate
  Intuitive reasoning may be faulty
![Page 5: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/5.jpg)
FTFS Design and Analysis Challenges
Without formal methods, it is hard to:
Understand FTFS behavior and semantics
  Intuitive reasoning is hard and error-prone
Explore alternative designs
  Alternative designs may affect semantics in complex ways
Compare various FTFSs
  Prose is ambiguous and code bases are huge (tens of thousands of lines of code)
![Page 6: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/6.jpg)
Goal: Convince FTFS Builders to Use FM
Previous studies showed how and for what purposes to use FM for many classes of systems, e.g.:
  Local/distributed FSs, processor caches, TCP congestion
Our work:
  Shows how and for what purposes to use FM for another specific class of important systems: fault-tolerant file systems
  Identifies convenient ways in which FM help in understanding, designing & comparing FTFSs
![Page 7: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/7.jpg)
Our Experience
We wrote TLA+ specifications for three protocols:
  Chain replication (Cornell University)
  Niobe (Microsoft)
  GFS (Google)
Our experience shows that FM help solve FTFS challenges:
1. Comparing system mechanisms & tradeoffs
2. Understanding and proving semantics
3. Exploring alternative designs
![Page 8: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/8.jpg)
Outline
Specification effort
Experiences with formal specifications for FTFS:
1. Comparing system mechanisms
2. Understanding and proving semantics
3. Exploring alternative designs
Conclusions
![Page 9: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/9.jpg)
Specification Effort
Question: How hard is it to build specifications?
Answer: Moderately precise specifications are reasonably easy to produce
| | Chain | Niobe | GFS |
|---|---|---|---|
| Time to write | 3 weeks | 1 week | 2 weeks |
![Page 10: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/10.jpg)
![Page 11: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/11.jpg)
1. Comparing System Mechanisms
Case study: GFS vs. Niobe
From prose, they seemed very different systems
  GFS: trades some consistency for throughput
  Niobe: designed for strong consistency
Our TLA+ specifications highlight significant mechanism overlap and also key differences
![Page 12: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/12.jpg)
Capturing Similarities & Differences
[Figure: diagram of the Niobe and GFS specifications and their common part (single-master, primary-secondary replication); sizes shown in the diagram: 291 lines, 287 lines, 189 lines]
More than half of the TLA+ code-base is common
Specifications are small due to TLA+ expressiveness
  Compare their total sizes to the tens of thousands of LOC of the systems’ implementations
![Page 13: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/13.jpg)
Differences Stand Out Clearly in TLA+
[Figure: message diagram showing write (w) and ACK messages among nodes 1-4]
Example: Write completion in GFS and Niobe
![Page 14: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/14.jpg)
Differences Stand Out Clearly in TLA+
[Figure: two message diagrams, each showing write (w) and ACK messages among nodes 1-4; the figure includes the label "Group reconfiguration"]
Example: Write completion in GFS and Niobe
![Page 15: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/15.jpg)
Understanding Tradeoffs
Example: Write completion in GFS and Niobe
Tradeoff:
  Smaller latency, but writes may leave group inconsistent
  A write never leaves replica group in inconsistent state
![Page 16: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/16.jpg)
Lesson: Formalism Helps in Comparison
Formal specifications distill key differences and similarities between systems
Understanding the key differences enables us to understand tradeoffs
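
The write-completion tradeoff can be illustrated with a toy model. The Python sketch below is not the GFS or Niobe protocol and not the authors' TLA+ specification. The names `Replica`, `write_error_on_failure`, and `write_with_reconfiguration` are hypothetical, and the two policies are assumptions inferred only from the deck's summary: GFS trades some consistency for throughput, while Niobe is designed for strong consistency and its diagram mentions group reconfiguration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    value: str = "old"
    up: bool = True

    def apply(self, value: str) -> bool:
        """Apply a write; the return value plays the role of the ACK."""
        if not self.up:
            return False
        self.value = value
        return True


def write_error_on_failure(group, value):
    """Assumed GFS-like policy: forward to every replica and report an
    error if any ACK is missing. The group membership is left unchanged."""
    acks = [replica.apply(value) for replica in group]
    return ("ok" if all(acks) else "error"), group


def write_with_reconfiguration(group, value):
    """Assumed Niobe-like policy: replicas that do not ACK are removed
    from the group before the write completes."""
    survivors = []
    for replica in group:
        if replica.apply(value):
            survivors.append(replica)
    return "ok", survivors


def consistent(group):
    """True when all replicas in the group hold the same value."""
    return len({replica.value for replica in group}) <= 1


def make_group():
    # Hypothetical scenario: one secondary is down when the write arrives.
    return [Replica("primary"), Replica("s1"), Replica("s2", up=False)]


status_a, group_a = write_error_on_failure(make_group(), "new")
status_b, group_b = write_with_reconfiguration(make_group(), "new")
```

In this scenario the first policy returns an error and leaves the three-replica group holding two different values, while the second pays for an extra reconfiguration step and ends with a smaller but consistent group.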
![Page 17: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/17.jpg)
![Page 18: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/18.jpg)
2. Understanding FTFS Consistency
Hard to prove consistency models for FTFSs
  For weakly consistent systems, it can be even harder
Solution: use refinement mapping
1. Reduce system to a really simple model (SimpleStore)
2. Prove the correctness of the reduction
3. Reason about the SimpleStore
For convenience, we use model-checking instead of full manual proofs at Step 2
[Figure: the System is related to a SimpleStore by a reduction; each is labeled with a consistency model]
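
The three steps above can be sketched on a toy example. The following Python code is an illustration under stated assumptions, not the authors' TLA+ specifications or tooling: the primary/backup system, the single-register `simple_store_steps`, the `mapping` function, and the `read_any` switch are all hypothetical. It exhaustively explores the system's reachable states and checks that every system step corresponds either to a SimpleStore step with the same client-visible label or to a stuttering step.

```python
from collections import deque

VALUES = (0, 1)  # small value domain so the state space stays finite


def simple_store_steps(value):
    """SimpleStore (assumed): a single register. Yields (label, next_value)."""
    for v in VALUES:
        yield ("write", v), v
    yield ("read", value), value


def system_steps(state, read_any=False):
    """Toy system: state = (primary, backup, pending).
    A write lands on the primary first and is propagated later."""
    primary, backup, pending = state
    if pending is None:
        for v in VALUES:
            yield ("write", v), (v, backup, v)
    else:
        yield ("internal",), (primary, pending, None)  # propagate to backup
    yield ("read", primary), state
    if read_any:
        yield ("read", backup), state


def mapping(state):
    """Refinement mapping (assumed): the abstract value is the primary's."""
    return state[0]


def check_refinement(init, read_any=False):
    """Return None if every reachable step maps to a SimpleStore step or a
    stutter; otherwise return the offending (state, label, next_state)."""
    seen, queue = {init}, deque([init])
    while queue:
        s = queue.popleft()
        allowed = set(simple_store_steps(mapping(s)))
        for label, t in system_steps(s, read_any):
            if label == ("internal",):
                ok = mapping(t) == mapping(s)
            else:
                ok = (label, mapping(t)) in allowed
            if not ok:
                return (s, label, t)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return None


primary_only_result = check_refinement((0, 0, None))
read_any_result = check_refinement((0, 0, None), read_any=True)
```

With primary-only reads the check finds no violation. With `read_any=True` it returns a counterexample in which a read served by the backup returns the old value while a write is still pending, so this toy system no longer reduces to the same SimpleStore.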
![Page 19: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/19.jpg)
SimpleStores for the Three FTFSs
SimpleStores capture only client-visible behaviors and abstract out all protocol mechanisms
SimpleStores are easy to reason about
[Figure: Chain, Niobe, and GFS are each related by a reduction to a SimpleStore: Chain_SS, Niobe_SS, and GFS_SS]
![Page 20: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/20.jpg)
Chain’s Consistency Semantics
Using convenient methods, we gained reliable insight into Chain’s consistency model
[Figure: Chain is related by a reduction to Chain_SS; both are labeled linearizable]
Proof is straightforward (half a page)
![Page 21: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/21.jpg)
Niobe’s Consistency Semantics
[Figure: Chain is related by a reduction to Chain_SS, and Niobe by a reduction to Niobe_SS; all four are labeled linearizable. GFS and GFS_SS are marked "??"]
Similar experience as with Chain
Thus, formal methods help in verifying standard consistency models for strongly-consistent FTFSs
![Page 22: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/22.jpg)
GFS’ Consistency Semantics
Formal methods proved helpful in several ways
An interesting conclusion (details in the paper):
  Using refinement mappings, we were able to show that, under a small set of assumptions, GFS has regular-register semantics
[Figure: GFS with assumptions is related by a reduction to GFS_SS; both are labeled regular register, a well-defined intermediate-level consistency model]
![Page 23: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/23.jpg)
Lesson: Formalism Helps Understand Semantics
Refinement mappings help in understanding & reliably verifying consistency models of FTFS
They are useful for both strongly consistent and weakly consistent FTFSs
![Page 24: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/24.jpg)
![Page 25: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/25.jpg)
3. Exploring Alternative Designs
Exploring alternative designs is much easier using our framework (TLA+ specs, SimpleStores, reductions)
[Figure: diagram with the labels System, SimpleStore, System model, and reduction]
![Page 26: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/26.jpg)
Case-Study: Changing Niobe’s Design
Currently, Niobe’s clients read from primary only
Reading from any replica may improve throughput
Design question:
What happens to Niobe if it adopts read-any policy?
[Figure: labels include Chain, GFS with assumptions, Niobe_SS (linearizable), GFS_SS (regular register), and Niobe with read-any (regular register?)]
![Page 27: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/27.jpg)
Conclusions
FTFSs are extremely important in today’s Web
We showed how formal methods can help improve our understanding of and trust in FTFSs
Lessons from our experience with three FTFSs:
  Writing formal specifications is relatively easy
  Formal methods enable:
    Insightful comparison of mechanisms & tradeoffs
    Reliable verification of consistency properties
    Convenient investigation of alternative designs
![Page 28: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/28.jpg)
Appendix
![Page 29: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/29.jpg)
Related Work
FM are extensively used to reason about software [Bickford, et al., 96] and hardware [Shimizu, et al., 02]
  However, FTFS builders have not adopted them yet
  By sharing our experience, we hope to convince FTFS builders of the utility of specifying their systems formally
Using FM to improve understanding and trust in systems:
  Previous works apply FM to various classes of systems: [Chkliaev, et al., 00], [Crow, et al., 98], [Joshi, et al., 03], [Houston, et al., 91]
  The closest works are those looking at distributed FS (AFS, Coda): [Sivathanu, et al., 05], [Wing, et al., 97], [Yang, et al., 04]
  We show how to apply them in the specific context of FTFS
Reducing complex systems to simple ones in order to reason about semantics has been used before [Joshi, et al., 03]
  We apply this method to FTFSs
![Page 30: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/30.jpg)
GFS Assumptions
If:
1. A write never crosses chunk boundaries
   GFS client library offers chunk-level operations
2. A write never goes to a stale replica
   Implement this assumption using a lease mechanism
Then:
[Figure: GFS with assumptions is related by a reduction to GFS_SS; both are labeled regular register]
![Page 31: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/31.jpg)
Standard Consistency Models
Linearizability (atomic register semantics)
  Any client-visible history H generated by the system is equivalent to a legal sequential interleaving S
  The sequential interleaving S preserves the real-time ordering of operations from H
Serializability
  Any client-visible history H generated by the system is equivalent to a legal sequential interleaving S
Regular register semantics
  Read not concurrent with any write returns most recently written value
  Read concurrent with some writes returns either the value of one of the in-process writes or the most recently written value
Safe register semantics
  Read not concurrent with any write returns most recently written value
  Read concurrent with some writes can return anything
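
The regular register definition above can be turned into a small history checker. This Python sketch is illustrative only: the `Op` record, the numeric start and end times, and the `initial` value are assumptions, and the checker further assumes that writes never overlap one another (a single writer), so that "most recently written value" is well defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str     # "write" or "read"
    value: int    # value written, or value returned by the read
    start: float  # invocation time
    end: float    # response time


def concurrent(a, b):
    """Two operations are concurrent when their time intervals overlap."""
    return a.start < b.end and b.start < a.end


def allowed_values(read, writes, initial):
    """Values a regular register may return for this read: the most
    recently completed write, or any write in progress during the read."""
    finished = [w for w in writes if w.end <= read.start]
    latest = max(finished, key=lambda w: w.end).value if finished else initial
    in_progress = {w.value for w in writes if concurrent(w, read)}
    return {latest} | in_progress


def is_regular(history, initial=0):
    writes = [op for op in history if op.kind == "write"]
    return all(
        op.value in allowed_values(op, writes, initial)
        for op in history
        if op.kind == "read"
    )


writes = [Op("write", 1, 0, 2), Op("write", 2, 5, 8)]
# Two reads overlap the second write: the first sees the new value and the
# later one sees the old value. A regular register permits this.
inversion_history = writes + [Op("read", 2, 5.5, 6), Op("read", 1, 6.5, 7)]
# A read that starts after the second write completed must not return 1.
stale_history = writes + [Op("read", 1, 9, 10)]
```

Here `is_regular(inversion_history)` holds even though the two reads observe the values in new-then-old order, which linearizability would forbid, while `is_regular(stale_history)` fails because the read is not concurrent with any write yet returns a value that is no longer the most recent.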
![Page 32: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/32.jpg)
Summary of Contributions
Identified a new important class of extremely complex systems: FTFSs
Showed three aspects of FTFS design & analysis for which FM prove especially valuable
  Mechanism comparison, semantics understanding, and design space exploration
Showed how to apply specific FMs to FTFSs
  Showed how to construct SimpleStores and what can be learned from them
  SimpleStores are reusable between systems
We believe that our study, tailored toward FTFSs, can be more relevant to FTFS designers than more general studies
![Page 33: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/33.jpg)
Lessons from Our Experience
Building high-level specifications for FTFS is relatively easy
  It is also remarkably useful for understanding the system
The exercise of writing specifications exposes similarities in seemingly dissimilar systems (GFS, Niobe)
  Formal specifications also distill the key design differences
Specifications enable convenient verification of consistency for both strongly and weakly consistent systems
  Niobe and Chain are both linearizable
  GFS can be upgraded to regular register via a clear set of assumptions
  GFS’ design to read from any replica heavily influences its consistency
Intuition can often fail
  Niobe seemed to be reducible to Chain_SS, but actually was not
![Page 34: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/34.jpg)
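Chain SimpleStore
[Figure: Chain_SS. Write requests (w5, w6, w7) sit in a write channel, with the actions commit(w5) and drop(w7) shown next to a SerialDB; read requests (r1, r2, r3) sit in a read channel served by read(); requests and responses are shown for both writes and reads]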
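
One possible reading of the Chain_SS diagram is sketched below in Python. Only the labels (write channel, read channel, SerialDB, commit, drop, read) come from the slide. Everything else is an assumption made for illustration: the class and method names, modelling SerialDB as a single value, letting any pending write be committed or dropped, and producing no response for a dropped write.

```python
class ChainSimpleStore:
    """Hypothetical sketch of a SimpleStore with write and read channels."""

    def __init__(self, initial=None):
        self.serial_db = initial   # assumed: SerialDB holds one value
        self.write_channel = []    # pending (write_id, value) requests
        self.read_channel = []     # pending read ids
        self.responses = []        # client-visible responses, in order

    def request_write(self, write_id, value):
        self.write_channel.append((write_id, value))

    def request_read(self, read_id):
        self.read_channel.append(read_id)

    def _take_write(self, write_id):
        """Remove a pending write from the channel and return its value."""
        for index, (pending_id, value) in enumerate(self.write_channel):
            if pending_id == write_id:
                del self.write_channel[index]
                return value
        raise KeyError(write_id)

    def commit(self, write_id):
        """Apply a pending write to SerialDB and acknowledge it."""
        self.serial_db = self._take_write(write_id)
        self.responses.append((write_id, "committed"))

    def drop(self, write_id):
        """Discard a pending write; assumed to produce no response."""
        self._take_write(write_id)

    def read(self, read_id):
        """Serve a pending read from the current SerialDB contents."""
        self.read_channel.remove(read_id)
        self.responses.append((read_id, self.serial_db))


store = ChainSimpleStore(initial="v0")
store.request_write("w5", "a")
store.request_write("w6", "b")
store.request_write("w7", "c")
store.request_read("r1")
store.commit("w5")
store.drop("w7")
store.read("r1")
```

After this sequence the store has responded to `w5` and `r1`, the read returned the committed value `"a"`, and only `w6` is still pending. Because the model exposes nothing but requests and responses, reasoning about its consistency is far simpler than reasoning about a full replication protocol.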
![Page 35: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/35.jpg)
The Temporal Logic of Actions (TLA+)
Formalism that combines a temporal logic with a logic of actions
Especially designed for specification of distributed asynchronous systems
TLA+ specifications model the system as a state machine:
  Define system variables (state)
  Model actions that the system can take as state transitions
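
The state-machine view can be mimicked in a few lines of Python. This is a stand-in for illustration, not TLA+ syntax and not how a TLA+ model checker is implemented: the two-replica version counter, the `Write` and `Sync` actions, and the `explore` helper are all hypothetical. The state is a tuple of variables, each action is a rule producing successor states, and a breadth-first search checks an invariant over every reachable state.

```python
from collections import deque

MAX_VERSION = 2  # bound that keeps the state space finite


def init_states():
    """Initial states: (primary_version, backup_version)."""
    return [(0, 0)]


def next_states(state):
    """All successor states, one rule per action."""
    primary, backup = state
    if primary == backup and primary < MAX_VERSION:
        yield (primary + 1, backup)   # Write: primary accepts a new version
    if backup < primary:
        yield (primary, primary)      # Sync: backup catches up


def explore(initial, successors, invariant):
    """Breadth-first search of the reachable states.
    Returns the first state violating the invariant, or None."""
    seen = set(initial)
    queue = deque(seen)
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state
        for succ in successors(state):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return None


def bounded_lag(state):
    """The backup is never more than one version behind the primary."""
    return 0 <= state[0] - state[1] <= 1


def always_in_sync(state):
    """A deliberately too-strong property: replicas always agree."""
    return state[0] == state[1]


lag_violation = explore(init_states(), next_states, bounded_lag)
sync_violation = explore(init_states(), next_states, always_in_sync)
```

The bounded-lag invariant holds in every reachable state, so `lag_violation` is `None`. The too-strong property fails as soon as the primary accepts a write, and the search reports the state `(1, 0)` as a counterexample.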
![Page 36: Experiences with Formal Specifications of Fault-Tolerant File Systems Roxana Geambasu(University of Washington) Andrew Birrell(Microsoft Research) John.](https://reader033.fdocuments.in/reader033/viewer/2022051620/56649ef35503460f94c05fcb/html5/thumbnails/36.jpg)
Understanding Tradeoffs
Smaller write latency, but writes may leave group inconsistent
A write never leaves replica group in inconsistent state
[Figure: two diagrams of nodes 1-4; labels include "Error", "read" (twice), and "Old value"]