DEDUPLICATION IN YAFFS KARTHIK NARAYAN PAVITHRA SESHADRIVIJAYAKRISHNAN.
DEDUPLICATION IN YAFFS
KARTHIK NARAYAN PAVITHRA SESHADRIVIJAYAKRISHNAN
Data Deduplication
• Intelligent compression
• Addresses storage space requirements
Solid State Devices
• A cutting-edge storage technology
• Addresses I/O performance requirements
Data Deduplication + SSD => Perfect match?
• Storage systems need far more capacity than necessary because they hold plenty of redundant data
• Redundant data increases storage cost and degrades performance
• SSDs can be more cost-effective than managing a group of mechanical hard drives
What have we done?
• Deduplication in YAFFS2 (Yet Another Flash File System), a NAND flash file system
• Deduplication addresses the problems caused by redundant data; we implemented it using content-based fingerprinting
• Properties of flash are harnessed to reduce overhead and implementation complexity
• We show that both the write time for duplicate data and the storage space used are greatly reduced
SSD ENTERS THE PICTURE
• High-performance storage
• SSDs use microchips that retain data in non-volatile memory
• As of 2010, most SSDs use NAND-based flash memory, which retains data even without power
• It has been the single biggest change to drive technology in recent years, with the storage medium showing up in data centers, laptops, and memory cards in mobile devices
Some properties of Flash/SSD
• Faster access time than a disk, because data can be accessed randomly without waiting for a read/write head to synchronize with a rotating platter
• Greater resilience to vibration, shock and extreme temperature fluctuations, because there are no moving parts
DESIGN
• Source deduplication: takes place within the file system
• Content-based fingerprinting
• Files are divided into chunks
• A fingerprint is computed for each chunk before every write operation
• Redundant copies of a chunk are redirected to the same device location
• Read performance is not affected
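The design points above can be illustrated with a minimal sketch. This is not the authors' YAFFS2 code: the `DedupStore` class, its field names, the 2048-byte chunk size and the choice of SHA-1 are all assumptions made for illustration, and the "device" is just a Python list.

```python
import hashlib

CHUNK_SIZE = 2048  # assumed chunk size in bytes; the slides do not state one


class DedupStore:
    """Toy content-addressed store: identical chunks share one location."""

    def __init__(self):
        self.locations = []        # simulated device: location index -> chunk bytes
        self.by_fingerprint = {}   # fingerprint -> location index
        self.files = {}            # file name -> list of location indices

    def write_file(self, name, data):
        refs = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            fp = hashlib.sha1(chunk).digest()   # fingerprint computed before the write
            loc = self.by_fingerprint.get(fp)
            if loc is None:                     # new content: store it once
                loc = len(self.locations)
                self.locations.append(chunk)
                self.by_fingerprint[fp] = loc
            refs.append(loc)                    # a duplicate only adds a reference
        self.files[name] = refs

    def read_file(self, name):
        # Reads follow the stored references directly; no hashing is involved.
        return b"".join(self.locations[loc] for loc in self.files[name])
```

Writing the same payload under two file names stores each distinct chunk once, and `read_file` never touches the fingerprint table, which is why the read path is unaffected in this sketch.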
A note on choice of hash function
• The hash functions used include standards such as SHA-1, SHA-256 and others. In most cases these carry a far lower probability of data loss than the risk of an undetected or uncorrected hardware error
• Some cite the computational cost of the process as a drawback of data deduplication
• To improve performance, we can use weak hashes. Weak hashes are much faster to calculate, but there is a greater risk of a hash collision
• Systems that use weak hashes subsequently calculate a strong hash or compare the actual data, and use that result to decide whether the data is really the same
• We can afford to compare the actual data, because a read on an SSD is fast and this avoids spending precious CPU power on a strong hash
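The weak-hash approach described above might look like the following sketch. The `WeakHashIndex` class is hypothetical, CRC32 is only one possible weak hash (the slides do not name the one used), and the byte comparison against an in-memory list stands in for reading the candidate chunk back from flash.

```python
import zlib


class WeakHashIndex:
    """Dedup index keyed by a cheap hash, with a byte compare as the final check."""

    def __init__(self, weak_hash=zlib.crc32):
        self.weak_hash = weak_hash   # assumed weak hash; swappable for experiments
        self.chunks = []             # simulated device: chunk id -> chunk bytes
        self.buckets = {}            # weak hash value -> chunk ids sharing that value

    def store(self, chunk):
        """Return (chunk_id, was_duplicate)."""
        key = self.weak_hash(chunk)
        for chunk_id in self.buckets.get(key, []):
            # A matching weak hash is only a hint; comparing the stored bytes
            # (a cheap read on flash) decides whether the data is the same.
            if self.chunks[chunk_id] == chunk:
                return chunk_id, True
        chunk_id = len(self.chunks)
        self.chunks.append(chunk)
        self.buckets.setdefault(key, []).append(chunk_id)
        return chunk_id, False
```

Because every weak-hash match is confirmed against the stored bytes, a collision costs an extra read but can never cause two different chunks to be merged.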
DESIGN contd..
• The chunk fingerprints and the corresponding chunk IDs are maintained as in-memory structures
• A back store typically has a large number of chunks
• This led to the idea of storing the hashes on the device and maintaining a cache of them in memory
• A combination of LFU (Least Frequently Used) and LRU (Least Recently Used) cache replacement policies should yield good results. We have implemented LFU.
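A minimal sketch of an LFU cache in front of an on-device hash table follows. The `LFUFingerprintCache` name, its methods and the use of a plain dict as the "device" table are assumptions for illustration; the slides do not describe the actual data structures.

```python
class LFUFingerprintCache:
    """In-memory LFU cache of fingerprint -> chunk ID, backed by a device table."""

    def __init__(self, capacity, device_index):
        self.capacity = capacity          # assumed to be at least 1
        self.device_index = device_index  # stand-in for the on-device hash table
        self.entries = {}                 # fingerprint -> chunk ID
        self.uses = {}                    # fingerprint -> use count

    def lookup(self, fp):
        if fp in self.entries:            # cache hit: bump the frequency
            self.uses[fp] += 1
            return self.entries[fp]
        chunk_id = self.device_index.get(fp)   # miss: consult the device
        if chunk_id is not None:
            self._insert(fp, chunk_id)
        return chunk_id

    def add(self, fp, chunk_id):
        """Record a newly written chunk on the device and in the cache."""
        self.device_index[fp] = chunk_id
        self._insert(fp, chunk_id)

    def _insert(self, fp, chunk_id):
        if fp not in self.entries:
            if len(self.entries) >= self.capacity:
                # Evict the least frequently used entry.
                victim = min(self.uses, key=self.uses.get)
                del self.entries[victim]
                del self.uses[victim]
            self.uses[fp] = 1
        self.entries[fp] = chunk_id
```

A pure LFU policy like this one can hold on to entries that were popular long ago; blending in recency, as the slide suggests, is one way to age such entries out.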
Implementation
• We chose YAFFS2 to implement deduplication.
• It is a popular, robust, commercially used file system for NAND flash
• Our testing environment is the Android emulator, which runs a virtual CPU called Goldfish
• Goldfish executes ARM926T instructions and has hooks for input and output, such as reading key presses or displaying video output in the emulator
Implementation
• Primarily, we tweaked the function yaffs_WriteChunkDataToObject, which writes chunk data to the NAND
During every chunk write:
• Determine the fingerprint for the chunk
• Check if the fingerprint exists in the chunk cache
• If it is not present, fetch the fingerprint and corresponding chunk IDs from the device
Implementation Contd...
• If the device holds a chunk ID for the hash, evict the least frequently used entry from the chunk cache and replace it with the entry fetched from the device
• Update the metadata for this chunk to point to the existing chunk ID corresponding to its fingerprint value
• If no chunk ID is present for the hash, write the chunk to NAND and update the hash entry
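The per-write steps listed above can be put together in one simplified sketch. It is a Python model of the described flow, not the modified `yaffs_WriteChunkDataToObject`: the `DedupFlash` class, the `chunk_map` metadata table, the tiny cache capacity and the decision to cache newly written fingerprints are all assumptions.

```python
import hashlib

CACHE_CAPACITY = 2  # hypothetical; kept tiny so eviction is easy to observe


class DedupFlash:
    def __init__(self):
        self.nand = []            # simulated NAND: chunk ID -> data
        self.device_hashes = {}   # on-device table: fingerprint -> chunk ID
        self.cache = {}           # in-memory cache: fingerprint -> [chunk ID, use count]
        self.chunk_map = {}       # metadata: (object ID, chunk index) -> chunk ID

    def _cache_put(self, fp, chunk_id):
        if fp not in self.cache and len(self.cache) >= CACHE_CAPACITY:
            # Evict the least frequently used cache entry.
            victim = min(self.cache, key=lambda k: self.cache[k][1])
            del self.cache[victim]
        self.cache[fp] = [chunk_id, 1]

    def write_chunk(self, object_id, chunk_index, data):
        """Return True if data was physically written, False if deduplicated."""
        fp = hashlib.sha1(data).digest()         # 1. fingerprint the chunk
        entry = self.cache.get(fp)
        if entry is not None:                    # 2. found in the chunk cache
            entry[1] += 1
            chunk_id, written = entry[0], False
        elif fp in self.device_hashes:           # 3. found on the device
            chunk_id, written = self.device_hashes[fp], False
            self._cache_put(fp, chunk_id)
        else:                                    # 4. new data: write to NAND
            chunk_id, written = len(self.nand), True
            self.nand.append(data)
            self.device_hashes[fp] = chunk_id
            self._cache_put(fp, chunk_id)        # assumption: cache new entries too
        self.chunk_map[(object_id, chunk_index)] = chunk_id   # 5. update metadata
        return written
```

The sketch leaves out everything a real flash file system must also handle, such as reference counting for shared chunks when files are deleted and interaction with garbage collection.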
RESULTS
CONCLUSION
• Decade’s most important data storage technology
• Deduplication on SSDs would be at the forefront of backup solutions in the future
• These two technologies together can control storage costs without sacrificing reliability or performance
• Dedupe technology continues to spread, and as SSD costs drop, those benefits will become even more apparent