File Systems: Why, How and Where
kernel-tlv
Philip Derbeko, enSilo, 2017
The Tragedy of File Systems
• Scale and scalability
• Reliability
• Recovery
• Complexity
• Flexibility for developers
Challenges
1. Metadata performance
2. Reliability and recovery
3. Small-file performance
4. Large-file performance
5. Storage management
Components
• Block allocation
• Directory management
• File and directory operations
• Inode handling
• Transactions and journaling
• Superblock handling
• FS tree
• Other
Beginning – Sequential
Sequential File System: files are stored back to back, each as
Header: name, length
DATA
Footer: name, CRC
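The layout above can be sketched in Python. This is a minimal illustration under details the slide does not give and that are assumed here: little-endian length fields, UTF-8 names and zlib's CRC-32 in the footer. Because there is no index, finding any file means scanning every record from the start of the medium.

```python
import io
import struct
import zlib

# Assumed record layout for this sketch (not a real tape format):
#   header: u16 name length, u32 data length, name bytes
#   data:   raw bytes
#   footer: u16 name length, name bytes, u32 CRC-32 of the data
HEADER = struct.Struct("<HI")

def write_file(stream, name: str, data: bytes) -> None:
    """Append one file record at the current end of the medium."""
    raw_name = name.encode("utf-8")
    stream.write(HEADER.pack(len(raw_name), len(data)) + raw_name)
    stream.write(data)
    stream.write(struct.pack("<H", len(raw_name)) + raw_name)
    stream.write(struct.pack("<I", zlib.crc32(data)))

def read_files(stream):
    """Scan the whole medium; with no index, every lookup is a full scan."""
    files = []
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of medium
        name_len, data_len = HEADER.unpack(header)
        name = stream.read(name_len).decode("utf-8")
        data = stream.read(data_len)
        (footer_len,) = struct.unpack("<H", stream.read(2))
        footer_name = stream.read(footer_len).decode("utf-8")
        (crc,) = struct.unpack("<I", stream.read(4))
        # The footer lets a reader detect a truncated or corrupted record.
        if footer_name != name or crc != zlib.crc32(data):
            raise ValueError(f"corrupt record for {name!r}")
        files.append((name, data))
    return files

medium = io.BytesIO()
write_file(medium, "a.txt", b"hello")
write_file(medium, "b.bin", bytes(10))
medium.seek(0)
print(read_files(medium))
```

Appending is cheap, but deleting or growing a file in the middle means rewriting everything after it, which is the limitation the next slides move away from.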
UNTIL …
Disk – The king of storage
Disk - Anatomy
[Figure: disk anatomy diagram with a “You are here” marker]
Simplest possible FS – FAT
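A FAT is essentially a table of "next cluster" links, so each file is a linked list of clusters whose head is kept in the directory entry. The sketch below models that with a Python list. The FREE and END_OF_CHAIN values are placeholders chosen for this example rather than the on-disk encodings, and the first-fit allocator is an assumption for illustration.

```python
# Toy FAT: fat[i] holds the number of the cluster that follows cluster i.
# FREE and END_OF_CHAIN are placeholder values for this sketch; real FAT16,
# for example, marks end-of-chain with 0xFFF8-0xFFFF.
FREE = 0
END_OF_CHAIN = -1
FIRST_DATA_CLUSTER = 2  # clusters 0 and 1 are reserved in FAT

def allocate(fat, count):
    """First-fit: link `count` free clusters into a chain and return the first one."""
    free = [i for i in range(FIRST_DATA_CLUSTER, len(fat)) if fat[i] == FREE][:count]
    if count <= 0 or len(free) < count:
        raise OSError("not enough free clusters")
    for cur, nxt in zip(free, free[1:]):
        fat[cur] = nxt
    fat[free[-1]] = END_OF_CHAIN
    return free[0]  # stored as the file's first cluster in its directory entry

def cluster_chain(fat, first):
    """Follow the linked list of clusters belonging to one file."""
    chain, seen, cur = [], set(), first
    while cur != END_OF_CHAIN:
        if cur in seen or fat[cur] == FREE:
            raise ValueError(f"corrupt chain at cluster {cur}")
        seen.add(cur)
        chain.append(cur)
        cur = fat[cur]
    return chain

def free_chain(fat, first):
    """Release every cluster of a file back to the free pool."""
    for cluster in cluster_chain(fat, first):
        fat[cluster] = FREE
```

Freeing one file and then allocating a larger one shows how fragmentation appears: the new chain fills the hole first and then jumps past the other file, and every step along the chain costs a table lookup.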
ext2
How it is done
[Figure: open files point to dentries, and dentries point to inodes]
Inode contents:
• File attributes
• Direct blocks
• Indirect blocks
• Double indirect blocks
• Triple indirect blocks
Each indirect level points to blocks filled with further pointers, ending in direct block pointers.
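The pointer tree in the inode can be made concrete by computing which level maps a given logical block. This sketch assumes ext2's 12 direct pointers and takes the number of pointers per indirect block as a parameter (256 for 1 KiB blocks with 4-byte block numbers). `block_path` and `max_blocks` are hypothetical helper names.

```python
NDIR = 12  # ext2 inodes hold 12 direct pointers, then one indirect, one double, one triple

def block_path(n, ptrs_per_block, ndir=NDIR):
    """Return the pointer level that maps logical block n and the index at each level."""
    p = ptrs_per_block
    if n < 0:
        raise ValueError("negative block number")
    if n < ndir:
        return ("direct", [n])
    n -= ndir
    if n < p:
        return ("indirect", [n])
    n -= p
    if n < p ** 2:
        return ("double", [n // p, n % p])
    n -= p ** 2
    if n < p ** 3:
        return ("triple", [n // p ** 2, (n // p) % p, n % p])
    raise ValueError("beyond the triple-indirect range")

def max_blocks(ptrs_per_block, ndir=NDIR):
    """Number of blocks addressable through the whole pointer tree."""
    p = ptrs_per_block
    return ndir + p + p ** 2 + p ** 3

# 1 KiB blocks with 4-byte block pointers give 256 pointers per indirect block.
print(block_path(5000, 256))  # ('double', [18, 124])
print(max_blocks(256))        # 16843020
```

With 1 KiB blocks the tree addresses 16,843,020 blocks, a little over 16 GiB; other on-disk limits can cap the real maximum file size lower. The sketch also shows the cost of the scheme: a block behind the triple-indirect pointer needs three extra metadata reads when nothing is cached.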
Ext2 - fsck
• Triggered by an unclean unmount or by the mount counter
• Not everything can be repaired
• Plan:
  • Superblock check
  • Free blocks
  • Inode sanity
  • Inode links
  • Duplicates
  • Bad blocks
  • Directory checks
SLOW!!!
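Three of the checks in the plan (inode links, duplicates, free blocks) can be illustrated on a toy in-memory image. The data layout (`inodes`, `dirs`, `bitmap`) and the name `check_fs` are assumptions for this sketch, not e2fsck's structures or passes, and it only reports problems without repairing anything.

```python
from collections import Counter

def check_fs(inodes, dirs, bitmap):
    """Toy consistency checker over an in-memory image.

    inodes: {ino: {"links": int, "blocks": [block numbers]}}
    dirs:   {dir_ino: {name: child_ino}}, including "." and ".." entries
    bitmap: list of bools, True if the block is marked in use
    """
    problems = []

    # Inode links: stored link count vs. directory entries that reference the inode.
    refs = Counter(child for entries in dirs.values() for child in entries.values())
    for ino, meta in inodes.items():
        if meta["links"] != refs[ino]:
            problems.append(f"inode {ino}: link count {meta['links']}, found {refs[ino]} entries")

    # Duplicates: a block may belong to at most one inode.
    owner = {}
    for ino, meta in inodes.items():
        for blk in meta["blocks"]:
            if blk in owner:
                problems.append(f"block {blk}: claimed by inodes {owner[blk]} and {ino}")
            else:
                owner[blk] = ino

    # Free blocks: the bitmap must match the blocks actually referenced by inodes.
    for blk, in_use in enumerate(bitmap):
        if in_use != (blk in owner):
            problems.append(f"block {blk}: bitmap says {'used' if in_use else 'free'}")
    return problems
```

Every check walks all inodes or all blocks, so the running time grows with the size of the file system rather than with the amount of damage, which is where the slowness comes from.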
Other consistency options – Soft Updates
• Dependency rules:
1. Never point to a structure before it has been initialized.
2. Never reuse a resource before nullifying all previous pointers to it.
3. Never reset the old pointer to a live resource before the new one has been set.
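The first two rules amount to ordering constraints between dirty blocks in the cache. The sketch below records those constraints and flushes blocks in a topological order using Python's `graphlib` (3.9+). The class and block names are hypothetical; real soft updates track dependencies per field and roll back in-memory changes to break cycles, which this toy does not do (it simply raises `CycleError`).

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

class SoftUpdatesCache:
    """Toy delayed-write cache: for each dirty block, remember which other
    dirty blocks must reach the disk first, and flush in that order."""

    def __init__(self):
        self.deps = {}  # block -> set of blocks that must be written before it

    def dirty(self, block, after=()):
        self.deps.setdefault(block, set()).update(after)

    def flush(self):
        # static_order() raises CycleError if the dependencies form a cycle.
        # Prerequisites that are not dirty are already on disk, so they are skipped.
        order = [b for b in TopologicalSorter(self.deps).static_order() if b in self.deps]
        self.deps = {}
        return order  # a real implementation would issue the writes in this order

cache = SoftUpdatesCache()
# Rule 1 (create): the initialized inode must be on disk before the entry pointing to it.
cache.dirty("inode_blk7")
cache.dirty("dir_blk3", after={"inode_blk7"})
print(cache.flush())
```

The point of the design is that ordinary writes stay asynchronous and batched; only the order in which blocks leave the cache is constrained, so no fsck-style repair is needed after a crash beyond reclaiming leaked resources.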
Other consistency options – COW
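Copy-on-write keeps the file system consistent by never overwriting a live block: new versions are written elsewhere and a single root pointer is switched at the end. This minimal two-level sketch (a root map plus data blocks) uses hypothetical names; real COW file systems such as ZFS and BTRFS copy the whole path from the changed leaf up to the root and reclaim unreferenced blocks, both of which are omitted here.

```python
class CowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = {}              # block id -> contents
        self._next_id = 0
        self.root = self._write({})   # the "superblock" pointer to the current root block

    def _write(self, contents):
        block_id = self._next_id      # always a fresh block; existing blocks stay untouched
        self._next_id += 1
        self.blocks[block_id] = contents
        return block_id

    def write_file(self, name, data):
        data_block = self._write(data)
        new_root = dict(self.blocks[self.root])  # copy the root instead of modifying it
        new_root[name] = data_block
        new_root_id = self._write(new_root)
        # Single pointer switch: a crash before this line leaves the old tree intact.
        self.root = new_root_id
        return new_root_id

    def read_file(self, name, root=None):
        root_id = self.root if root is None else root  # any old root id acts as a snapshot
        return self.blocks[self.blocks[root_id][name]]

fs = CowStore()
v1 = fs.write_file("a", b"v1")
fs.write_file("a", b"v2")
print(fs.read_file("a"), fs.read_file("a", root=v1))
```

Keeping an old root id is enough to read the old version, which is why snapshots come almost for free with COW; the price is that every update rewrites the path to the root and scatters blocks over the disk.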
ext3
1. Journaling
2. Online file system growth
3. Directory indexing (not really new: it was done for ext2 as well)
Ext3 - journaling
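[Figure: Writeback mode. The journal holds TxB, Inode, Bitmap and TxE; Data is written to its home location with no ordering against the transaction.]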
Ext3 - journaling
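[Figure: Ordered mode. The journal holds TxB, Inode, Bitmap and TxE; Data is written to its home location before the transaction commits.]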
Ext3 - journaling
[Figure: Full Journal mode. TxB, Inode, Bitmap, Data and TxE all go through the journal.]
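The three modes differ only in where the data block goes and what it has to wait for. The sketch encodes one transaction (inode, bitmap, data) as ordered stages; the mode names follow ext3's `data=writeback`, `data=ordered` and `data=journal` mount options, while the stage model itself is a simplification made for illustration.

```python
def commit_stages(mode):
    """Toy model of one ext3-style transaction touching an inode, a bitmap and
    a data block. Returns (stages, unordered): writes inside one stage may be
    issued together, and a stage must be on disk before the next one starts.
    `unordered` lists writes with no ordering guarantee at all."""
    meta = ["Inode", "Bitmap"]
    if mode == "writeback":
        # Only metadata is journaled; Data may reach its home location before or after commit.
        return [["TxB"] + meta, ["TxE"], ["checkpoint:" + m for m in meta]], ["Data"]
    if mode == "ordered":
        # Data goes to its home location before the commit record (TxE) is written.
        return [["Data", "TxB"] + meta, ["TxE"], ["checkpoint:" + m for m in meta]], []
    if mode == "journal":
        # Data is logged too, then checkpointed to its home location with the metadata.
        logged = meta + ["Data"]
        return [["TxB"] + logged, ["TxE"], ["checkpoint:" + b for b in logged]], []
    raise ValueError(f"unknown mode: {mode}")

for mode in ("writeback", "ordered", "journal"):
    print(mode, commit_stages(mode))
```

The separate TxE stage is the expensive wait: the commit record may only be written once the rest of the transaction is on disk. Full journaling writes data twice, and writeback can expose stale data after a crash; ordered mode sits between the two. ext4's journal checksums let recovery detect a torn transaction, which is what allows this wait to be relaxed.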
Ext3 – Journal final comments
• Journal-assisted recovery: redo logging
• Commit batching
• Journal cleaning: mark the last checkpoint in the journal superblock
• Deletes and reuse
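Redo logging means recovery only rolls forward: every transaction with a matching TxE after the last checkpoint is replayed, and an incomplete one is dropped. The record format below is invented for this sketch; ext3's real journal (JBD) uses descriptor and commit blocks, and its revoke records, which handle the deletes-and-reuse case, are not modeled.

```python
def recover(journal, disk, start=0):
    """Redo-log recovery over a toy journal.

    journal: list of records ("TxB", tid), ("Block", block_no, data), ("TxE", tid)
    disk:    dict block_no -> data, updated in place
    start:   index of the first record after the last checkpoint
             (ext3 keeps this position in the journal superblock)
    Returns the number of transactions replayed.
    """
    pending, tid, replayed = None, None, 0
    for record in journal[start:]:
        kind = record[0]
        if kind == "TxB":
            pending, tid = {}, record[1]           # open a new transaction
        elif kind == "Block" and pending is not None:
            pending[record[1]] = record[2]         # logged image of the block
        elif kind == "TxE" and pending is not None and record[1] == tid:
            disk.update(pending)                   # committed: redo the writes
            replayed += 1
            pending, tid = None, None
    # A transaction still pending here never committed and is dropped.
    return replayed

journal = [("TxB", 1), ("Block", 5, "old"), ("TxE", 1),
           ("TxB", 2), ("Block", 5, "new"), ("Block", 9, "x"), ("TxE", 2),
           ("TxB", 3), ("Block", 7, "lost")]
disk = {}
print(recover(journal, disk), disk)
```

Running recovery twice gives the same disk image, because redo writes whole block images; this is what makes a crash during recovery harmless. Moving `start` forward after a checkpoint is what keeps the journal from growing without bound.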
Ext4
• Backward and forward compatible, up to a certain point
• Scalability
• “Sequentiality” improvements:
  • Extent-based allocations
  • Journal checksum speed-up
  • Delayed allocations
• Transparent encryption
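Extents replace one pointer per block with (logical start, length, physical start) runs, which is where much of the "sequentiality" gain comes from. The sketch converts a per-block map into extents and looks blocks up by binary search. The 32,768-block cap matches ext4's limit for an initialized extent, but the in-memory list stands in for ext4's on-disk extent tree.

```python
from bisect import bisect_right
from typing import NamedTuple

MAX_LEN = 32768  # ext4's limit on blocks in one initialized extent

class Extent(NamedTuple):
    logical: int   # first logical block of the file covered by this extent
    length: int    # number of contiguous blocks
    physical: int  # first physical block on disk

def to_extents(phys_blocks):
    """Compress a per-block map (one pointer per block, as in ext2/ext3) into extents."""
    extents = []
    for lblk, pblk in enumerate(phys_blocks):
        last = extents[-1] if extents else None
        if last is not None and last.physical + last.length == pblk and last.length < MAX_LEN:
            extents[-1] = last._replace(length=last.length + 1)  # extend the current run
        else:
            extents.append(Extent(lblk, 1, pblk))                # start a new run
    return extents

def map_block(extents, lblk):
    """Binary-search extents sorted by logical start; None means a hole."""
    i = bisect_right([e.logical for e in extents], lblk) - 1
    if i >= 0 and lblk < extents[i].logical + extents[i].length:
        return extents[i].physical + (lblk - extents[i].logical)
    return None

contiguous = to_extents(list(range(100, 1100)))
print(contiguous, map_block(contiguous, 999))
```

A 1000-block contiguous file collapses into a single extent, while the same file under ext2/ext3 needs 1000 block pointers. Delayed allocation helps here: allocating at flush time, when the final size is known, makes long contiguous runs more likely.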
Performance Optimizations
1. Synchronization of operations (the less, the better)
2. Locality of allocations
3. I/O scheduling
4. Scalability
5. Caching
6. Pre-fetching
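Items 5 and 6 can be shown together with a small block cache: an LRU cache avoids repeated device reads, and read-ahead prefetches the next blocks once access looks sequential. `PageCache`, its `capacity` and `readahead` parameters, and the "previous block + 1" detection heuristic are hypothetical simplifications, far from the real Linux page cache.

```python
from collections import OrderedDict

class PageCache:
    """LRU block cache with naive sequential read-ahead (a toy model)."""

    def __init__(self, read_block, capacity=64, readahead=4):
        self.read_block = read_block  # callback that performs the slow device read
        self.capacity = capacity
        self.readahead = readahead
        self.cache = OrderedDict()    # block number -> data, least recently used first
        self.last = None              # previous block read, to detect sequential access
        self.hits = 0
        self.disk_reads = 0

    def _load(self, n):
        if n in self.cache:
            self.cache.move_to_end(n)       # mark as most recently used
            return
        self.cache[n] = self.read_block(n)
        self.disk_reads += 1
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block

    def read(self, n):
        if n in self.cache:
            self.hits += 1
        self._load(n)
        data = self.cache[n]
        # Pre-fetching: a sequential pattern suggests the next blocks are wanted soon.
        if self.last is not None and n == self.last + 1:
            for ahead in range(n + 1, n + 1 + self.readahead):
                if ahead not in self.cache:
                    self._load(ahead)
        self.last = n
        return data

pc = PageCache(lambda n: f"block{n}", capacity=64, readahead=4)
for i in range(10):
    pc.read(i)
print(pc.hits, pc.disk_reads)
```

In a sequential scan only the first two reads reach the device synchronously; the rest are served from blocks that were fetched ahead, at the price of a few extra reads past the end of the scan.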
New Sheriff in town
New Features
• Snapshotting
• Versioning
• Backups
• Deduplication
• Data and metadata checksums
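Checksums and deduplication can share one mechanism: hash each block, use the digest as its address, store identical blocks once and verify the hash on every read. The `DedupStore` class and its use of SHA-256 are assumptions for this sketch; ZFS and BTRFS lay out checksums and deduplication metadata quite differently on disk.

```python
import hashlib
from collections import Counter

class DedupStore:
    """Toy content-addressed block store: the SHA-256 digest is both the
    deduplication key and the checksum verified on every read."""

    def __init__(self):
        self.blocks = {}       # digest -> block contents (stored once)
        self.refs = Counter()  # digest -> number of logical references

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data  # first copy is stored; later copies add a reference
        self.refs[digest] += 1
        return digest

    def get(self, digest: str) -> bytes:
        data = self.blocks[digest]
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError(f"checksum mismatch for block {digest[:12]}")
        return data

    def release(self, digest: str) -> None:
        self.refs[digest] -= 1
        if self.refs[digest] == 0:     # last reference gone: the block can be freed
            del self.refs[digest]
            del self.blocks[digest]

store = DedupStore()
first = store.put(b"A" * 4096)
second = store.put(b"A" * 4096)
print(first == second, len(store.blocks))
```

Reference counting is what makes deduplication (and snapshots) tricky in practice: a block can only be freed when no file or snapshot points to it any more.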
BTRFS (“Better FS”)
Newer is better?
FS Sizes
FS Patches (Linux 2.6, 5079 patches studied)
• Maintenance (45%)
• Bugs (35%): constant bug fixing over the life of the FS
• Performance
• Reliability
• Features
FAST 2013 – “A Study of Linux File System Evolution”
Bug Consequences
• Corruption
• Crash
• Failure of operation
• Deadlock
• Hang
• Memory leak
• Other
FAST 2013 – “A Study of Linux File System Evolution”
38% of bugs are on failure paths
“Timeline” – facts should not mess up a story
Berkeley FFS
ext2
ext3
ReiserFS
ext4
ZFS
BTRFS
FAT
FAT32
NTFS
WinFS (dead)
ReFS
HFS
HFS+
APFS
What was not covered
• Shared, network, distributed and clustered file systems:
  • WAFL
  • AFS
  • GFS and DFS
  • WebDAV
• Volume management
• UnionFS (Knoppix CD+HDD, Docker layers)
The End
Keep in touch: [email protected]