Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and...
Transcript of Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and...
![Page 1: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/1.jpg)
1
Yiying Zhang, Leo Prasath Aruraj,
Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau
University of Wisconsin - Madison
![Page 2: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/2.jpg)
Indirection Reference an object with a different name
Flexible, simple, and modular
Indirection in computer systems Virtual memory: virtual to physical memory address
Hard disks: bad sectors to nearby locations
RAID arrays: logical to array physical address
SSDs: logical to SSD physical address
2 * Usually attributed to Butler Lampson
![Page 3: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/3.jpg)
Excess indirection Redundant levels of indirection in a system
e.g. OS on top of hypervisor(s)
e.g. File system on top of RAID
Are all indirections really necessary? Some indirection can be removed
Space and performance cost
What about flash-based SSDs? File system: file offset to logical address (F -> L)
Device: logical address to physical address (L -> P)
3
B
A
C
C L
F
P
![Page 4: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/4.jpg)
Indirection in SSDs (L->P) Mapping from logical to physical address
Hides erase-before-write and wear leveling
Implemented in Flash Translation Layer (FTL)
Cost of indirection RAM space to maintain indirection table
Hybrid: small page-mapped area + big block-mapped area
Performance cost of garbage collection
Performance impact on random writes [Kim ’12]
4 4
FTL L -> P
P
FS
SSD
LBA+Data
L
![Page 5: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/5.jpg)
Solution: De-indirection Remove indirection in SSDs (L->P) Store physical addresses directly in file system (F->P)
New interface: Nameless Write
Write without a name (logical address) Device allocates and returns physical address File system stores physical address
Advantages
Reduces space and performance cost of indirection Device maintains critical controls
5
L
F
P
![Page 6: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/6.jpg)
Designed nameless writing interfaces
Implemented a nameless-writing system Built a nameless-writing SSD emulator
Ported ext3 to nameless writes
Evaluation results Evaluated against two other FTLs
Small indirection table, ~20x reduction over traditional SSDs
Better random write throughput, ~20x over traditional SSDs
6
![Page 7: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/7.jpg)
Introduction
Problems of basic interfaces and solutions
Nameless-writing device and ext3
Results
Conclusion
7
![Page 8: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/8.jpg)
8
Nameless Write Writes only data and no name
Physical Read Reads using physical address
Free/Trim Invalidates block at physical address
FTL
P
P
FS
SSD
P Data
P: Addr the structure points to
P: Addr of the block
![Page 9: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/9.jpg)
P1: Cost of straw-man nameless-write approach How to reduce the overheads of complete de-indirection?
P2: Migration during wear leveling How to reflect physical address change in the file system?
P3: Locating metadata structures
How to find metadata structures efficiently?
9
![Page 10: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/10.jpg)
Overwrite a data block in a file in ext3
10
P0
P0 FS
SSD
P0
P1 P2
P1 P2
Data Inode Directory
Metadata
Data
Red: Addr the structure points to
Blue: Addr of the block
Problems Overhead of updating along FS tree
FS more complex and less flexible
P1+ offset
![Page 11: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/11.jpg)
Problem of recursive updates Writes propagate to reflect physical addresses
Solution: Two segments of address space
Stop recursive updates
Physical address space Nameless write, physical read Contains data blocks
Virtual address space Traditional (virtual) read/write Small indirection table in device Contains metadata blocks (typically small metadata [Agrawal’07])
11
![Page 12: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/12.jpg)
12
Data Blocks Super
Inode Bitmap
Journal
Data Bitmap
Inode
L -> P
Dir
FTL allocates physical addresses
Data + logical address Data
Virtual address space Physical address space
SSD physical flash memory
Physical Address
![Page 13: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/13.jpg)
13
P0
P0 FS
SSD
P0
P1
I#
Data Inode Directory
L1 -> P1
Inode + L1
Overwrite a data block with segmented address space
Advantages One level of update propagation
Simple implementation
Metadata
Data
Red: Addr the structure points to
Blue: Addr of the block
![Page 14: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/14.jpg)
Block wear in SSDs Uneven wear among blocks with data of different access frequency
Wear leveling SSD moves data to distribute block erases evenly
Physical address change File system needs to be informed
Only address change in the physical space
14
P1
P1
P1
FS
SSD
P2
Read P1
P1
![Page 15: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/15.jpg)
New interface: Migration Callbacks Device informs FS about physical address change
Temporary remapping table
Reads and overwrites to old address Remapped to new address
FS processes callbacks in background Acknowledges device when metadata updated
15
FTL
P1
P1
FS
SSD
P1 ->P2
P2
P2
P1 -> P2
Ack
![Page 16: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/16.jpg)
Problem: Locating metadata structures e.g. During callbacks
e.g. During recovery
Naive approach: traversing all metadata
Solution: Associated Metadata Small amount of metadata used to locate metadata
e.g. Inode number, inode generation number, block offset
Sent with nameless writes and migration callbacks
Stored adjacent to data pages on device, e.g. OOB area
16
![Page 17: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/17.jpg)
17
P1
FS
SSD
P1->P2, Assoc Meta1 P3->P4, Assoc Meta2 P5->P6, Assoc Meta 3
P2
Ack P1->P2 P3->P4 P5->P6
Hash Table Key: Assoc Metadata
Value: Callback Entries
P1->P2 P3->P4 P5->P6
P3 P4
P5 P6
Inode1 Inode2
Xact Commit
Callback Entry: Old physical addr New physical addr Associated metadata Timestamp
P1 P3 P5
P2 P4 P6
![Page 18: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/18.jpg)
Introduction
Nameless write interfaces
Results
Conclusion 18
![Page 19: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/19.jpg)
Supports nameless write interfaces
Flexible device allocation
Maintains small mapping table Indirection of the virtual address space
Temporary remapping table for callbacks
Control of garbage collection and wear leveling Minimize physical address migration (In-place GC)
19
![Page 20: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/20.jpg)
Ext3: Journaling file system extending ext2
Ordered journal mode Metadata always written after data
Fits well with nameless writes
Interface support Segmented address space
Nameless write
Physical read
Free/trim
Callback
20
![Page 21: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/21.jpg)
Total: 4360
Ext3: 1530
JBD: 480
Generic I/O: 2020
Headers: 340
21
![Page 22: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/22.jpg)
Introduction
Nameless write interfaces
Nameless-writing device and ext3
Conclusion 22
![Page 23: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/23.jpg)
SSD emulator Linux pseudo block device
Data stored in memory
FTLs studied
Page mapping: log-structured allocation
ideal in performance, unrealistic in indirection space
Hybrid mapping: small page-mapped area + block-mapped area
models real SSDs, realistic in indirection space
Nameless-writing
23
![Page 24: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/24.jpg)
24
Mapping table sizes for typical file system images [Agrawal’09]
Nameless writes use 2% - 7% mapping table space of traditional hybrid SSDs
100
1024
11 118
0.2 2.2
![Page 25: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/25.jpg)
Sequential and sustained 4KB random write
25 25
Nameless writes deliver 20x random write throughput over traditional hybrid SSDs
Performance of nameless writes is close to page FTL (upper-bound)
![Page 26: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/26.jpg)
Varmail, FileServer, and WebServer from Filebench
26
Similar performance when workload is read or sequential-write intensive
Performance of hybrid FTL is worse than the other two FTLs when workload has random writes
![Page 27: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/27.jpg)
Introduction
Nameless write interfaces
Nameless-writing device and ext3
Results
27
![Page 28: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/28.jpg)
Problem: Excess indirection in flash-based SSDs
Solution: De-indirection with Nameless Writes
Implementation of a nameless-writing system Built an emulated nameless-writing SSD
Ported ext3 to nameless writes
Advantages of nameless writes
Reduce the space cost of indirection over traditional SSDs
Improve random write performance over traditional SSDs Reduce energy cost, simplify SSD firmware
28
![Page 29: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/29.jpg)
“All problems in computer science can be solved by another level of indirection” Usually attributed to Butler Lampson Lampson attributes it to David Wheeler
And Wheeler usually added: “but that usually will create another problem”
29
![Page 30: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/30.jpg)
Too much: Excess indirection e.g. file offset => logical address => physical address
Partial indirection e.g. nameless writes with segmented address space
Too little: Cost of (complete) de-indirection e.g. overheads of recursive update
30
![Page 31: Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau ... · Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison ... P5 P6 Inode1 Inode2 XactP2](https://reader036.fdocuments.in/reader036/viewer/2022070913/5fb48b2fe1ab3f67826d39b3/html5/thumbnails/31.jpg)
31
Thank you !
Questions ?
The ADvanced Systems Laboratory (ADSL) http://www.cs.wisc.edu/adsl/