Why panic()? Improving Reliability through Restartable File Systems
Transcript of Why panic()? Improving Reliability through Restartable File Systems
Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale, Andrea C.
Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift
Applications require data and use the file system to reliably store it.
Both hardware and software can fail.
Typical solution: large clusters for availability, with reliability through replication.
[Figure: GFS architecture, with a GFS Master and slave nodes; each node runs an OS and a file system]
Replication is infeasible for desktop environments.
Wouldn't RAID work? RAID can only tolerate hardware failures.
File-system crashes are more severe: services and applications are killed, requiring an OS reboot and recovery.
Need: better reliability in the event of file-system failures.
[Figure: desktop environment with applications running on an OS above a RAID controller and disks]
Outline: motivation, background, restartable file systems, advantages and limitations, conclusions.
    int journal_mark_dirty(...)
    {
        struct reiserfs_journal_cnode *cn = NULL;
        if (!cn) {
            cn = get_cnode(p_s_sb);
            if (!cn) {
                reiserfs_panic(p_s_sb, "get_cnode failed!\n");
            }
        }
    }

    void reiserfs_panic(struct super_block *sb, ...)
    {
        BUG();  /* this is not actually called, but makes
                   reiserfs_panic() "noreturn" */
        panic("REISERFS: panic %s\n", error_buf);
    }
ReiserFS
File systems already detect failures.
Recovery could be simplified by a generic recovery mechanism.
1. Write code to recover from all failures: not feasible in reality.
2. Restart on failure: previous work has taken this approach.
File systems need stateful and lightweight recovery.
[Figure: design space of recovery approaches along two axes, lightweight vs. heavyweight and stateless vs. stateful; prior systems shown include Nooks/Shadow Drivers, Xen, Minix, L4, Nexus, SafeDrive, Singularity, CuriOS, and EROS, none of which occupies the lightweight-and-stateful quadrant]
Goal: build a lightweight and stateful solution to tolerate file-system failures.
Solution: a single, generic recovery mechanism for any file-system failure.
1. Detect failures through assertions.
2. Clean up resources used by the file system.
3. Restore the file-system state from before the crash.
4. Continue to service new file-system requests.
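A minimal sketch of how these four steps might chain together in code; every name here (fs_instance, membrane_fault, and the helper functions) is invented for illustration and is not the actual implementation:

    /* Hypothetical outline of the generic recovery path.
     * All names below (fs_instance, membrane_*) are illustrative only. */

    struct fs_instance;                             /* one mounted file system */

    void membrane_cleanup(struct fs_instance *fs);  /* step 2: release locks/memory */
    void membrane_restore(struct fs_instance *fs);  /* step 3: roll back to last checkpoint */
    void membrane_replay(struct fs_instance *fs);   /* step 4: replay logged operations */

    /* Step 1: a failed assertion inside the FS transfers control here
     * instead of calling panic(), turning the fault into fail-stop. */
    void membrane_fault(struct fs_instance *fs)
    {
        membrane_cleanup(fs);
        membrane_restore(fs);
        membrane_replay(fs);
        /* requests queued during recovery can now be serviced */
    }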
FS Failures: completely transparent to applications
Transparency: multiple applications may be using the FS when it crashes, with intertwined execution.
Fault tolerance: handle a gamut of failures by transforming them into fail-stop failures.
Consistency: the OS and FS could be left in an inconsistent state; FS consistency is required to prevent data loss.
Not all file systems support crash consistency, and FS state is constantly modified by applications.
Approach: periodically checkpoint the FS state, mark dirty blocks as copy-on-write, and ensure each checkpoint is written atomically.
On a crash: revert to the last checkpoint.
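A rough sketch of the epoch-based copy-on-write idea, in user-space C with invented names (epoch_page, page_write, take_checkpoint) and error handling elided; the real system operates on kernel page-cache pages:

    /* Hypothetical sketch of epoch-based COW checkpointing. */

    #include <stdlib.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    struct epoch_page {
        int   epoch;        /* epoch that last dirtied this page */
        char *data;         /* current contents of the page      */
    };

    static int current_epoch;   /* advanced at every checkpoint */

    /* A write must not touch a page frozen by the last checkpoint:
     * copy the page first, then modify the copy. */
    void page_write(struct epoch_page *pg, const char *buf,
                    size_t off, size_t len)
    {
        if (pg->epoch < current_epoch) {
            char *copy = malloc(PAGE_SIZE);
            memcpy(copy, pg->data, PAGE_SIZE);
            pg->data = copy;    /* old copy stays frozen; the
                                   checkpoint writer frees it */
            pg->epoch = current_epoch;
        }
        memcpy(pg->data + off, buf, len);
    }

    /* Start a new epoch: pages dirtied in the old epoch are frozen and
     * can be written to disk as one atomically committed checkpoint. */
    void take_checkpoint(void)
    {
        current_epoch++;
        /* ... flush frozen pages, then commit the checkpoint ... */
    }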
[Figure: timeline split into epoch 0 and epoch 1 by a checkpoint; an application issues open("file"), write(), read(), write(), and close() through the VFS to the file system; completed and in-progress operations are marked, and a crash strikes mid-epoch]

1. Periodically create checkpoints.
2. File-system crash.
3. Unwind in-flight processes.
4. Move back to the most recent checkpoint.
5. Replay completed operations.
6. Re-execute the unwound processes.
File systems are constantly modified, so it is hard to identify a consistent recovery point.
Naïve solution: prevent any new FS operation and call sync; this is inefficient and carries unacceptable overhead.
[Figure: applications sit above the VFS layer; below it, file systems such as ext3 and VFAT write to disk through the page cache]

File systems write to disk through the page cache.
All requests go through the VFS layer.
These two layers are the points at which requests to the FS, and dirty pages headed to disk, can be controlled.
[Figure: regular operation vs. Membrane. In regular operation, application requests flow through the VFS to the file system and dirty pages flow from the page cache to disk; under Membrane, both flows can be stopped: requests at the VFS boundary, and writeback between the page cache and the disk]
Many file systems have a built-in crash-consistency mechanism, such as journaling or snapshotting.
Membrane integrates seamlessly with these mechanisms; file systems need only indicate the beginning and end of a transaction.
This works for the data and ordered journaling modes; writeback mode must be combined with copy-on-write.
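The integration point might look like a pair of hooks called around each journal transaction; this is a sketch with hypothetical names (membrane_txn_begin/membrane_txn_end are not the real API):

    /* Sketch: hypothetical hooks a journaling file system would call so
     * that recovery checkpoints align with transaction boundaries. */

    struct fs_instance;

    void membrane_txn_begin(struct fs_instance *fs); /* no checkpoint may cut in here */
    void membrane_txn_end(struct fs_instance *fs);   /* safe checkpoint boundary      */

    /* Example call sites in a simplified transaction path: */
    void fs_run_transaction(struct fs_instance *fs)
    {
        membrane_txn_begin(fs);
        /* ... modify metadata, write journal records, commit ... */
        membrane_txn_end(fs);
    }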
Operations are logged at the VFS level, so existing file systems need not be modified.
Operations logged: open, close, read, write, symlink, unlink, seek, etc.
Logs are thrown away after each checkpoint.
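As an illustration, a VFS-level log record for these operations might look like the following; the structure and field names are invented, not taken from the paper:

    /* Sketch of a VFS-level operation log record; all names invented. */

    enum vfs_op { OP_OPEN, OP_CLOSE, OP_READ, OP_WRITE,
                  OP_SYMLINK, OP_UNLINK, OP_SEEK };

    struct op_log_record {
        enum vfs_op op;         /* which VFS entry point was invoked  */
        int         fd;         /* file the operation applied to      */
        long long   offset;     /* file offset, where applicable      */
        long long   count;      /* byte count, where applicable       */
        long long   retval;     /* result returned to the application */
    };

    /* Each VFS entry point appends a record before returning; the
     * whole log is discarded once the next checkpoint commits. */
    void op_log_append(const struct op_log_record *rec);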
What about logging writes?
The write log is mainly used for replaying writes.
Goal: reduce the overhead of logging writes.
Solution: grab the data from the page cache during recovery instead of logging it separately.
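A sketch of that recovery-time lookup, written against (roughly) the Linux 2.6-era page-cache API; the function name replay_write_from_cache is invented, and error handling is elided:

    /* Sketch: during replay of a logged write(), recover the written
     * data from the page cache instead of from a data log. */

    #include <linux/pagemap.h>
    #include <linux/highmem.h>
    #include <linux/string.h>
    #include <linux/errno.h>

    int replay_write_from_cache(struct address_space *mapping,
                                pgoff_t index, loff_t off_in_page,
                                size_t count, char *out)
    {
        struct page *page = find_get_page(mapping, index);
        char *kaddr;

        if (!page)
            return -ENOENT;         /* page evicted: cannot recover it */

        kaddr = kmap(page);         /* map the cached page */
        memcpy(out, kaddr + off_in_page, count);
        kunmap(page);
        page_cache_release(page);   /* drop reference from find_get_page */
        return 0;
    }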
[Figure: a write(fd, buf, offset, count) call travels through the VFS into the file system and lands in the page cache; during recovery, the written data is retrieved from the page cache rather than from a log]
Setup
Restart ext2 during a random-read microbenchmark:

Data (MB)    Recovery time (ms)
    10            12.9
    20            13.2
    40            16.1
Improves tolerance to file-system failures and builds trust in new file systems (e.g., ext4, btrfs).
Enables quick-fix bug patching: a developer can transform corruptions into restarts instead of undertaking extensive code restructuring.
Encourages more integrity checks in FS code: assertions can be seamlessly transformed into restarts (see the sketch below), making file systems more robust to failures and crashes.
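Concretely, such a transformation could be as small as a wrapper macro that routes failed assertions to the recovery path instead of panic(); FS_ASSERT is invented here, and membrane_fault() is the hypothetical entry point from the earlier sketch:

    /* Sketch: route an FS's panic-style assertions into a restart.
     * FS_ASSERT and membrane_fault() are illustrative names. */

    struct fs_instance;
    void membrane_fault(struct fs_instance *fs);

    #define FS_ASSERT(fs, cond)                 \
        do {                                    \
            if (!(cond))                        \
                membrane_fault(fs);             \
        } while (0)

    /* e.g., the reiserfs_panic() call in journal_mark_dirty() becomes:
     *         FS_ASSERT(fs, cn != NULL);
     * restarting the file system rather than the whole machine. */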
Only fail-stop failures are tolerated: the isolation is not address-space based, so faults could still corrupt other kernel components.
An FS restart may also be visible to applications; e.g., inode numbers could change after a restart.
[Figure: an application issues create("file1"), stat("file1"), and write("file1", 4k) through the VFS. Before the crash, create("file1") is assigned one inode number (inode# 12); after crash recovery, the re-executed create("file1") is assigned a different one (inode# 15), so the application observes an inode# mismatch]
Failures are inevitable in file systems; learn to cope with them rather than hope to avoid them.
A generic recovery mechanism for FS failures improves FS reliability and the availability of data.
Users: install new file systems with confidence. Developers: ship file systems faster, since not every exception case is now a show-stopper.
Questions and Comments
Advanced Systems Lab (ADSL), University of Wisconsin-Madison
http://www.cs.wisc.edu/adsl