Flash-aware File System

Flash-aware Computing

Instructor: Prof. Sungjin Lee ([email protected])

Source: ocw.snu.ac.kr/sites/default/files/NOTE/Week17.pdf

Today
- File System Basics
- Traditional Flash File Systems
- SSD-Friendly Flash File Systems
- Reference

What is a File System?
- Provides a virtualized logical view of information stored on various storage media, such as disks, tapes, and flash-based SSDs
- Two key abstractions have developed over time in the virtualization of storage
  - File: A linear array of bytes, each of which you can read or write
    - Its contents are defined by a creator (e.g., text and binary)
    - It is identified by a low-level name, often referred to as its inode number
  - Directory: A special file that is a collection of files and other directories
    - Its contents are quite specific: it contains a list of (user-readable name, inode #) pairs (e.g., ("foo", 10))
    - It has a hierarchical organization (e.g., tree, acyclic-graph, and graph)
    - It is also identified by an inode number
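
The directory abstraction can be observed from user space. The sketch below is only an illustration: it lists the (user-readable name, inode #) pairs of a scratch directory with Python's `os.scandir`. The helper name `list_dir_entries` and the scratch files are assumptions made for this example, and the actual inode numbers depend on the underlying file system.

```python
import os
import tempfile

def list_dir_entries(path):
    """Return the (name, inode number) pairs stored in the directory at path."""
    with os.scandir(path) as entries:
        return sorted((entry.name, entry.inode()) for entry in entries)

# Build a small scratch directory holding one file and one sub-directory.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "foo"), "w").close()
    os.mkdir(os.path.join(tmp, "bar"))
    pairs = list_dir_entries(tmp)

# Each pair is (user-readable name, inode #); the numbers vary per system.
print(pairs)
```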

Operations on Files and Directories

POSIX Operations on Files
POSIX API      Description
creat()        Create a file
open()         Create/open a file
write()        Write bytes to a file
read()         Read bytes from a file
lseek()        Move byte position inside a file
unlink()       Remove a file
truncate()     Resize a file
close()        Close a file

POSIX Operations on Directories
POSIX API      Description
opendir()      Open a directory for reading
closedir()     Close a directory
readdir()      Read one directory entry
rewinddir()    Rewind a directory so it can be reread
mkdir()        Create a new directory
rmdir()        Remove a directory
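
As a hedged illustration of the calls listed above, the sketch below drives them through Python's `os` module, which wraps the POSIX interface. The function names `posix_file_demo` and `posix_dir_demo` and the scratch paths are assumptions for this example. Note that `os.ftruncate` corresponds to `ftruncate()` (the descriptor-based variant of `truncate()`), and `os.listdir` sits on top of the `opendir()`/`readdir()`/`closedir()` family.

```python
import os
import tempfile

def posix_file_demo(path):
    """Exercise open, write, lseek, read, ftruncate, close, and unlink on one file."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)  # open(): create/open a file
    try:
        os.write(fd, b"hello flash")       # write(): write bytes to the file
        os.lseek(fd, 6, os.SEEK_SET)       # lseek(): move the byte position
        tail = os.read(fd, 5)              # read(): read bytes from the file
        os.ftruncate(fd, 5)                # resize the file through its descriptor
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)                       # close(): close the file
    os.unlink(path)                        # unlink(): remove the file
    return tail, size

def posix_dir_demo(parent):
    """Create a directory, list its entries, and remove it again."""
    d = os.path.join(parent, "dir1")
    os.mkdir(d)                            # mkdir(): create a new directory
    open(os.path.join(d, "foo"), "w").close()
    names = os.listdir(d)                  # read the directory entries
    os.unlink(os.path.join(d, "foo"))
    os.rmdir(d)                            # rmdir(): remove the (empty) directory
    return names

with tempfile.TemporaryDirectory() as tmp:
    tail, size = posix_file_demo(os.path.join(tmp, "a.bin"))
    names = posix_dir_demo(tmp)

print(tail, size, names)
```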

Virtual File System
- The POSIX API targets the VFS interface, rather than any specific type of file system

[Figure: open(), close(), read(), and write() calls enter the VFS, which dispatches them to file-system-specific implementations]

File System Implementation
- UNIX File System
- Journaling File System
- Log-structured or Copy-on-Write File Systems

UNIX File System
- A traditional file system first developed for UNIX systems

[Figure: on-disk layout: Boot Sector | Super Block | Inode Bmap | Data Bmap | Inode table (0, 1, 2, …) | Data (4 KiB Blocks)]

- Boot sector: Information to be loaded into RAM to boot up the OS
- Superblock: File system's metadata (e.g., file system type, size, …)
- Inode & data Bmaps: Keep the status of blocks belonging to the inode table and the data blocks
- Inode table: Keeps each file's metadata (e.g., size, permission, …) and data block pointers
- Data blocks: Keep users' file data

Inode & Block Pointers

[Figure: an inode in the inode table stores the last modified time, last access time, file size, permission, link count, and block numbers. Direct block pointers refer to data blocks; indirect and double indirect block pointers refer to blocks that hold further block numbers]

Consistent Update Problem
- What happens if a sudden power loss occurs while writing data to a file?

    write(0, "foo", strlen("foo"));

[Figure: the on-disk layout (Boot Sector | Super Block | Inode Bmap | Data Bmap | Inode table | Data), with the write of "foo" only partially applied]

- The file system will be inconsistent! This is the consistent update problem

Journaling File System
- Journaling file systems address the consistent update problem by adopting the idea of write-ahead logging (or journaling) from database systems
- Ext3, Ext4, ReiserFS, XFS, and NTFS are based on journaling
- Double writes could degrade overall write performance!

[Figure: the on-disk layout extended with a journaling space. For write(0, "foo", strlen("foo")), the update is first written to the journal between TxB and TxE markers (journal write & commit) and is later copied to its final location (checkpoint)]
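
The journal write, commit, and checkpoint steps can be sketched as follows. This is a minimal model under stated assumptions, not the on-disk format of any real journaling file system: the journal is a Python list, the disk is a dict from block name to contents, and `crash_after` is a hypothetical knob that cuts a journal write short to mimic power loss.

```python
def journal_append(journal, tid, updates, crash_after=None):
    """Append TxB, the block updates, and TxE for one transaction.

    crash_after (hypothetical) keeps only that many records, which mimics
    a power loss in the middle of the journal write.
    """
    records = [("TxB", tid)]
    records += [("write", tid, block, data) for block, data in updates.items()]
    records.append(("TxE", tid))           # commit marker
    if crash_after is not None:
        records = records[:crash_after]
    journal.extend(records)

def recover(journal, disk):
    """Checkpoint only the transactions whose TxE reached the journal."""
    committed = {rec[1] for rec in journal if rec[0] == "TxE"}
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _, block, data = rec
            disk[block] = data             # copy to the final location
    return disk

journal = []
# Transaction 1 commits fully: inode update plus the data block "foo".
journal_append(journal, 1, {"inode0": "size=3", "data0": "foo"})
# Transaction 2 is interrupted after TxB and one write; TxE never lands.
journal_append(journal, 2, {"inode0": "size=6", "data1": "bar"}, crash_after=2)

disk = recover(journal, {})
print(disk)
```

Because the uncommitted transaction is ignored during recovery, the disk ends up reflecting either all of an update or none of it. The price is that every committed block is written twice, once to the journal and once to its final location.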

Log-structured File System
- Log-structured file systems (LFS) treat the storage space as a huge log, appending all files and directories sequentially
- State-of-the-art file systems are based on LFS or CoW
  - e.g., Sprite LFS, F2FS, NetApp's WAFL, Btrfs, ZFS, …

[Figure: a log containing the boot sector, the superblock, check points #1 and #2, File A and the inode for A, File B and the inode for B, and the inodeMap; all the files and inodes are written sequentially]

Log-structured File System (Cont.)
- Advantages
  - (+) No consistent update problem
  - (+) No double writes: an LFS itself is a log!
  - (+) Provides excellent write performance: disks are optimized for sequential I/O operations
  - (+) Reduces the movements of disk heads further (e.g., inode update and file updates)
- Disadvantages
  - (-) Expensive garbage collection cost
  - (-) Slow read performance

Disadvantages of LFS
- Expensive garbage collection cost: invalid blocks must be reclaimed for future writes; otherwise, free disk space will be exhausted
- Slow read performance: involves more head movements for future reads (e.g., when reading the file A)

[Figure: after Files A and B are updated, new copies of the file data, the inodes for A and B, and the inodeMap are written sequentially at the end of the log, and the older copies become invalid]

Write Cost
- Write cost with GC is modeled as follows
  - Note: a segment (seg) is a unit of space allocation and GC
  - N is the number of segments
  - µ is the utilization of the segments (0 ≤ µ < 1)
  - If segments have no live data (µ = 0), write cost becomes 1.0
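
For reference, the write-cost model in the Rosenblum and Ousterhout paper listed under Reference can be written as below. The notation is reconstructed from that paper, so treat the exact form as an assumption: cleaning reads N segments, writes back Nµ segments' worth of live data, and leaves N(1 − µ) segments' worth of space for new data.

```latex
\text{write cost}
  = \frac{\text{total bytes read and written}}{\text{new data written}}
  = \frac{N + N\mu + N(1-\mu)}{N(1-\mu)}
  = \frac{2}{1-\mu}
```

The µ = 0 case is special: a segment with no live data does not need to be read at all, so the cost is 1.0 rather than the 2 that the formula would give.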

Write Cost Comparison

[Figure: write cost comparison plot; surviving legend annotations: "(measured)" and "(delayed writes, sorting)"]

Greedy Policy
- The cleaner chooses the least-utilized segments and sorts the live data by age before writing it out again
- Workloads: 4 KB files with two overwrite patterns
  - (1) Uniform: No locality; every file is equally likely to be overwritten
  - (2) Hot-and-cold: Locality; 10:90 (10% of the files receive 90% of the overwrites)

[Figure: write cost under the greedy policy; annotations: "The variance in segment utilization" and "Worse than a system with no locality"]

Cost-Benefit Policy
- Hot segments are frequently selected as victims even though their utilizations would drop further
  - It is necessary to delay cleaning and let more of the blocks die
  - On the other hand, free space in cold segments is valuable
- Cost-benefit policy:
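
In the Rosenblum and Ousterhout paper listed under Reference, the cleaner rates each segment by benefit/cost = (1 − u) × age / (1 + u), where u is the segment utilization and age is the age of the data in the segment, and cleans the segment with the highest ratio. The sketch below contrasts this with the greedy policy. The segment names and numbers are made up for illustration, and the dictionary layout is an assumption of this example.

```python
def greedy_victim(segments):
    """Greedy policy: pick the segment with the lowest utilization."""
    return min(segments, key=lambda s: s["u"])["name"]

def cost_benefit_score(u, age):
    """Free space generated (1 - u) times age, divided by cleaning cost (1 + u)."""
    return (1.0 - u) * age / (1.0 + u)

def cost_benefit_victim(segments):
    """Cost-benefit policy: pick the segment with the highest benefit/cost ratio."""
    return max(segments, key=lambda s: cost_benefit_score(s["u"], s["age"]))["name"]

# Hypothetical segments: a hot one that keeps changing and a cold, stable one.
segments = [
    {"name": "hot", "u": 0.50, "age": 1},
    {"name": "cold", "u": 0.75, "age": 100},
]

greedy_choice = greedy_victim(segments)
cost_benefit_choice = cost_benefit_victim(segments)
print(greedy_choice, cost_benefit_choice)
```

Greedy takes the hot segment because it is emptier right now, whereas cost-benefit prefers the cold segment: its free space is unlikely to shrink again soon, so reclaiming it pays off for longer.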

Cost-Benefit Policy (Cont.)

LFS Performance

[Figure: LFS performance results; surviving annotations: "Inode updates" and "Random reads"]

Today
- File System Basics
- Traditional Flash File Systems
  - JFFS2: Journaling Flash File System
  - YAFFS2: Yet Another Flash File System
  - UBIFS: Unsorted Block Image File System
- SSD-Friendly File Systems
- Reference

Traditional File Systems for Flash
- Originally designed for block devices like HDDs
  - e.g., ext2/3/4, FAT32, and NTFS
- But NAND flash memory is not a block device
  - The FTL presents a block-device view to the outside, hiding the unique properties of NAND flash memory

[Figure: a traditional file system for HDDs (e.g., ext2/3/4, FAT32, and NTFS) issues reads and writes over the block I/O interface to a flash-based SSD; inside the SSD, the Flash Translation Layer issues reads, writes, and erases to NAND flash through NAND control (e.g., ONFI)]

Flash File Systems
- Directly manage raw NAND flash memory
  - Internally perform address mapping, garbage collection, and wear-leveling by themselves
- Representative flash file systems
  - JFFS2, YAFFS2, and UBIFS

[Figure: a flash file system (e.g., JFFS2, YAFFS2, and UBIFS) issues reads, writes, and erases to a NAND-specific low-level device driver (e.g., MTD and UBI), which issues reads, writes, and erases to NAND flash through NAND control (e.g., ONFI)]

Memory Technology Device (MTD)
- MTD is the lowest level for accessing flash chips
  - Offers the same APIs for different flash types and technologies
  - e.g., NAND, OneNAND, and NOR
- JFFS2 and YAFFS2 run on top of MTD

[Figure: JFFS2, YAFFS2, and FTLs (with a typical FS on top of the FTL) call mtd_read(), mtd_write(), … on MTD, which issues device-specific commands to NAND, OneNAND, NOR, …]

Traditional File Systems vs. Flash File Systems

        File System + FTL                                                     Flash File System
Method  - Access a flash device via FTL                                       - Access a flash device directly
Pros    - High interoperability                                               - High-level optimization with system-level information
        - No difficulties in managing recent NAND flash with new constraints  - Flash-aware storage management
Cons    - Lack of system-level information                                    - Low interoperability
        - Flash-unaware storage management                                    - Must be redesigned to handle new NAND constraints

- Flash file systems have now become obsolete because of the difficulty of adapting them to new types of NAND devices

JFFS2: Journaling Flash File System
- A log-structured file system (LFS) for use with NAND flash
  - Unlike LFS, however, it does not allow any in-place updates!
- Main features of JFFS2
  - File data and metadata are stored as nodes in NAND flash memory
  - Keeps an inode cache in DRAM holding the information of nodes
  - A greedy garbage collection algorithm
    - Selects the cheapest blocks as victims for garbage collection
  - A simple wear-leveling algorithm combined with GC
    - Considers the wearing rate of flash blocks when choosing a victim block for GC
  - Optional data compression

JFFS2: Write Operation

[Figure: updates on File A are appended to NAND flash memory as nodes; each node carries a version, an offset, a length, a 32-bit CRC, and the data]

Nodes written for File A, in log order:
Ver   Offset   Len
1     0        200
2     200      200
3     75       50
4     0        75
5     125      75

Logical view of File A in the inode cache (DRAM) after Ver 3:
0-75: Ver 1
75-125: Ver 3
125-200: Ver 1
200-400: Ver 2

Logical view of File A after Ver 4 and Ver 5:
0-75: Ver 4
75-125: Ver 3
125-200: Ver 5
200-400: Ver 2

- All data are written sequentially to a log which records all changes

JFFS2: Read Operation

[Figure: the node log for File A (Ver 1 to Ver 5) in NAND flash memory and the inode cache in DRAM, which maps 0-75 to Ver 4, 75-125 to Ver 3, 125-200 to Ver 5, and 200-400 to Ver 2]

- Read data from File A (offset: 75-200)
- The latest data can be read from NAND flash by referring to the inode cache in DRAM

JFFS2: Mount
- Scan the flash memory medium after rebooting
- Check the CRC for written data and mark the obsolete data
- Build the inode cache

[Figure: the nodes Ver 1 to Ver 5 in NAND flash memory are scanned in turn and the inode cache is rebuilt step by step. After Ver 1: 0-200: Ver 1. After Ver 2: 200-400: Ver 2 is added. After Ver 3: 0-75: Ver 1, 75-125: Ver 3, 125-200: Ver 1, 200-400: Ver 2. After Ver 4 and Ver 5: 0-75: Ver 4, 75-125: Ver 3, 125-200: Ver 5, 200-400: Ver 2]
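
The way the inode cache is rebuilt from the node log can be sketched as follows. This is a simplified model, not JFFS2's actual in-memory structures: nodes are (version, offset, length) tuples taken from the File A example, CRC checking is omitted, and the helper names `apply_node`, `build_inode_cache`, and `lookup` are hypothetical.

```python
def apply_node(extents, ver, offset, length):
    """Overlay one node on the extent list [(start, end, ver), ...]."""
    start, end = offset, offset + length
    result = []
    for s, e, v in extents:
        if e <= start or s >= end:       # no overlap: keep the extent as is
            result.append((s, e, v))
            continue
        if s < start:                    # keep the part before the new node
            result.append((s, start, v))
        if e > end:                      # keep the part after the new node
            result.append((end, e, v))
    result.append((start, end, ver))     # the new node covers its own range
    return sorted(result)

def build_inode_cache(nodes):
    """Replay nodes in version order, as a mount-time scan would."""
    extents = []
    for ver, offset, length in sorted(nodes):
        extents = apply_node(extents, ver, offset, length)
    return extents

def lookup(extents, start, end):
    """Return the (clipped) extents that serve a read of [start, end)."""
    return [(max(s, start), min(e, end), v)
            for s, e, v in extents if s < end and e > start]

# The five nodes written for File A: (version, offset, length).
nodes = [(1, 0, 200), (2, 200, 200), (3, 75, 50), (4, 0, 75), (5, 125, 75)]

cache = build_inode_cache(nodes)
print(cache)    # [(0, 75, 4), (75, 125, 3), (125, 200, 5), (200, 400, 2)]
read_plan = lookup(cache, 75, 200)
print(read_plan)  # [(75, 125, 3), (125, 200, 5)]
```

Ver 1 no longer appears in the final view, so its node is obsolete and becomes a candidate for garbage collection.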

JFFS2: Problems
- Slow mount time
  - All nodes must be scanned at mount time
  - Mount time increases in proportion to flash size and file system contents
- High memory consumption
  - All node information must be maintained in DRAM
  - Memory consumption depends linearly on file system contents
- Low scalability
  - Infeasible for a large-scale flash device
  - Mount time and memory consumption increase with flash size

YAFFS2: Yet Another Flash File System
- Another log-structured file system for flash memory
  - Stores data to flash memory like a log, with a unique sequence number, like JFFS2
  - Reads and writes are performed similarly to JFFS2
- Mitigates the problems raised by JFFS2
  - Relatively frugal with memory resources
  - Checkpoints for a fast file system mount
  - Dynamic wear-leveling
- Supports multiple platforms
  - Linux, WinCE, RTOSs, etc.

YAFFS2: Physical Layout
- The entries in the log are all one chunk (one page) in size and can hold one of two types of chunk
  - Data chunk: A chunk holding regular data file contents
  - Object header: A descriptor for an object, such as a directory, a regular file, etc.; similar to struct stat but includes the dentry

[Figure: a block made up of pages (chunks), in log order]
Type            Object ID   Chunk ID   Seq
Object header   500         0
Data chunk      500         1          1
Data chunk      500         2          1
Data chunk      500         3          1
Data chunk      500         4          1
Data chunk      500         1          2

- Object ID: Identifies which object the chunk belongs to
- Chunk ID: Identifies where in the file this chunk belongs
- Seq: A sequence number of a data chunk
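
To show how these tags identify the live copy of each chunk, the sketch below scans the data chunks of object 500 and keeps the newest copy per (Object ID, Chunk ID). It follows the simplified per-chunk sequence numbers used on this slide; the function name `latest_chunks` and the tuple layout are assumptions of this example rather than YAFFS2's real structures.

```python
def latest_chunks(log):
    """Map (object_id, chunk_id) to the log position of its newest copy.

    log is a list of (object_id, chunk_id, seq) tags in on-flash order.
    A copy wins if its sequence number is higher, or equal and later in the log.
    """
    newest = {}
    for pos, (obj, chunk, seq) in enumerate(log):
        key = (obj, chunk)
        if key not in newest or seq >= newest[key][0]:
            newest[key] = (seq, pos)
    return {key: pos for key, (seq, pos) in newest.items()}

# Data chunks of object 500 in log order; chunk 1 is rewritten with Seq 2.
log = [
    (500, 1, 1),
    (500, 2, 1),
    (500, 3, 1),
    (500, 4, 1),
    (500, 1, 2),
]

live = latest_chunks(log)
print(live)  # {(500, 1): 4, (500, 2): 1, (500, 3): 2, (500, 4): 3}
```

The first copy of chunk 1 (log position 0) is superseded by the copy with Seq 2, so it is obsolete and can be reclaimed by garbage collection.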

YAFFS2: File System Layout
- Maintain the information about objects and chunks in DRAM

[Figure: NAND flash memory holds, in log order, object header (/), object header (file1), two data chunks (file1), object header (dir1), object header (file2), and a data chunk (file2). Main memory holds Object(/), Object(file1) with Tnode(file1), Object(dir1), and Object(file2) with Tnode(file2)]

YAFFS2: Block Summary
- A block summary (including object id and chunk id) for the chunks in the block is written to the last chunk
  - This allows all the tags for chunks in that block to be read in one hit, avoiding a full disk scan
  - If a block summary is not available for a block, then a full scan is used to get the tags on each chunk

[Figure: a block holding an object header (Object ID: 500, Chunk ID: 0) followed by data chunks of object 500, with a block summary in its last chunk. The summary lists the tags per page, e.g., 1st page: object header, Object ID: 500, Chunk ID: 0; 2nd page: data chunk, …]

YAFFS2: Checkpoints
- DRAM structures are saved on flash at unmount
- Structures are re-read at remount, avoiding the boot scan
- Lazy loading also reduces mount time

[Figure: the log of object 500 (an object header followed by data chunks) is followed by the on-flash RAM structures, which are saved at unmount and re-read at remount]

UBIFS: Unsorted Block Image File System
- A new flash file system developed by Nokia
  - Considered as the next generation of JFFS2
- New features of UBIFS
  - Scalability
    - Scales well with respect to flash size
    - Memory size and mount time do not depend on flash size
  - Fast mount
    - Does not have to scan the whole media when mounting
  - Write-back support
    - Dramatically improves the throughput of the file system in many workloads
  - …

UBI: Unsorted Block Image
- UBIFS runs on top of a UBI volume
  - UBI supports multiple volumes, bad block management, wear-leveling, and bit-flip error management
  - The upper-level software can be simpler with UBI

[Figure: JFFS2 and YAFFS2 call mtd_read(), mtd_write(), … on MTD directly, while UBIFS calls ubi_read(), ubi_write(), … on UBI, which sits on top of MTD; MTD issues device-specific commands to NAND, OneNAND, NOR, …]

How UBI works
- Logical erase blocks (LEBs) are mapped to physical erase blocks (PEBs)
  - Any LEB can be mapped to any PEB

[Figure: the UBI layer exposes Volume A (LEB 0 to LEB 4) and Volume B (LEB 0 to LEB 2), whose LEBs are mapped onto PEB 0 to PEB 10 of the MTD device. The operations shown are erase, read, and write; a read after an erase returns 0xFFs]

How UBI works
- UBI has its own wear-leveling algorithm that moves static data out of blocks with a low erase counter into blocks with a high erase counter, so that the lightly worn blocks can take new writes

[Figure: Volume A (LEB 0 to LEB 4) and Volume B (LEB 0 to LEB 2) in the UBI layer are mapped onto PEB 0 to PEB 10 of the MTD device. Static read-only data sits in a PEB with a low erase counter; UBI moves the data to a PEB with a high erase counter and re-maps the LEB]
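
The LEB-to-PEB mapping and the wear-leveling move can be sketched with a toy model. This is not the real UBI implementation: the class `TinyUBI`, its methods, and the policy of always picking the least-erased free PEB are assumptions chosen to keep the example short.

```python
class TinyUBI:
    """Toy LEB-to-PEB mapper with erase counters (not the real UBI code)."""

    def __init__(self, num_pebs, peb_size=4):
        self.peb_size = peb_size
        self.data = [None] * num_pebs        # None means the PEB is erased
        self.erase_count = [0] * num_pebs
        self.leb_to_peb = {}                 # (volume, leb) -> peb

    def _free_pebs(self):
        used = set(self.leb_to_peb.values())
        return [p for p in range(len(self.data)) if p not in used]

    def unmap(self, volume, leb):
        """Erase the PEB behind a LEB (if any) and drop the mapping."""
        peb = self.leb_to_peb.pop((volume, leb), None)
        if peb is not None:
            self.data[peb] = None
            self.erase_count[peb] += 1

    def write(self, volume, leb, payload):
        """Map the LEB to the least-erased free PEB and store the payload."""
        self.unmap(volume, leb)
        peb = min(self._free_pebs(), key=lambda p: self.erase_count[p])
        self.data[peb] = payload
        self.leb_to_peb[(volume, leb)] = peb

    def read(self, volume, leb):
        """Unmapped LEBs read back as 0xFF bytes, like erased flash."""
        peb = self.leb_to_peb.get((volume, leb))
        if peb is None:
            return b"\xff" * self.peb_size
        return self.data[peb]

    def wear_level(self):
        """Move data from the least-erased mapped PEB to the most-erased free PEB."""
        free = self._free_pebs()
        if not self.leb_to_peb or not free:
            return None
        key, src = min(self.leb_to_peb.items(),
                       key=lambda kv: self.erase_count[kv[1]])
        dst = max(free, key=lambda p: self.erase_count[p])
        if self.erase_count[dst] <= self.erase_count[src]:
            return None                      # nothing to gain from a move
        self.data[dst] = self.data[src]      # move data
        self.leb_to_peb[key] = dst           # re-map LEB
        self.data[src] = None                # erase the old PEB
        self.erase_count[src] += 1
        return src, dst

ubi = TinyUBI(num_pebs=3)
ubi.write("A", 0, b"stat")                   # static data, written once
for i in range(5):                           # a frequently rewritten LEB
    ubi.write("A", 1, b"hot%d" % i)
counts_before = list(ubi.erase_count)
moved = ubi.wear_level()
print(counts_before, moved, ubi.erase_count)
```

In this run the static LEB pins PEB 0 at zero erases while the other two PEBs absorb every rewrite. The wear-leveling step relocates the static data, which frees the barely used PEB for future writes.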

UBIFS: Indexing with B+ Tree
- UBIFS index is a B+ tree and is stored on NAND flash
  - c.f., JFFS2 does not store the index on flash
  - Leaf level contains data
- Full scanning is not needed
- B+ tree is cached in RAM
  - Shrunk in case of memory pressure
- Realizes good scalability and fast mount

[Figure: a B+ tree whose upper levels form the index and whose leaf-level nodes hold the data]

UBIFS: Wandering Tree
- How to find the root of the tree?

[Figure: a tree A → B → C → D stored in flash; updating D appends new copies of D, C, B, and A to the log]

- Write data node "D"
  - Old "D" becomes obsolete
- Write indexing node "C"
  - Old "C" becomes obsolete
- Write indexing node "B"
  - Old "B" becomes obsolete
- Write indexing node "A"
  - Old "A" becomes obsolete
- The position of the tree in flash is changed
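
The chain of rewrites can be reproduced with a few lines. The sketch below is a toy model under stated assumptions: the log is a Python list, `parent_of` records the tree A → B → C → D from the example, and the helper `update_leaf` is hypothetical.

```python
def update_leaf(log, parent_of, locations, leaf):
    """Append a new copy of leaf and of every ancestor up to the root.

    Each parent stores the flash address of its child, so moving a child
    forces the parent to be rewritten as well.
    """
    written = []
    node = leaf
    while node is not None:
        log.append(node)                 # out-of-place write at the log head
        locations[node] = len(log) - 1   # new flash position of this node
        written.append(node)
        node = parent_of.get(node)
    return written

parent_of = {"D": "C", "C": "B", "B": "A", "A": None}
log = ["A", "B", "C", "D"]               # initial on-flash order (assumed)
locations = {"A": 0, "B": 1, "C": 2, "D": 3}

written = update_leaf(log, parent_of, locations, "D")
print(written)            # ['D', 'C', 'B', 'A']
print(locations["A"])     # 7: the root has moved
```

One data update therefore costs four node writes here, and the root ends up at a new address, which is why a fixed place is needed to record where the current root lives.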

UBIFS: Master Area
- LEBs 1/2 are reserved for a master area pointing to a root index node
  - The master area can be quickly found on mount since its location is fixed (LEBs 1 and 2)
  - Using the root index node, the B+ tree can be quickly constructed
- A master area could have multiple nodes
  - A valid master node is found by scanning the master area

[Figure: the master area (LEBs 1 and 2) holds master nodes M (ver. 1) and M (ver. 2); the valid master node points to the root index node in the log]

Mount Time Comparison

Summary
- JFFS2: Journaling Flash File System version 2
  - Commonly used for low-volume flash devices
  - Compression is supported
  - Long mount time & high memory consumption
- YAFFS2: Yet Another Flash File System version 2
  - Fast mount time with check-pointing
- UBIFS: Unsorted Block Image File System
  - Fast mount time and low memory consumption by adopting B+ tree indexing

Today
- File System Basics
- Traditional Flash File Systems
- SSD-Friendly Flash File Systems
  - F2FS: Flash-friendly File System
- Reference

F2FS: Flash-friendly File System
- Log-structured file system for FTL devices
  - Unlike other flash file systems, it runs atop FTL-based flash storage and is optimized for it
  - Exploits system-level information for better performance and reliability (e.g., better hot-cold separation, background GC, …)

[Figure: traditional flash file system: JFFS2/YAFFS2 run on MTD, and UBIFS runs on UBI over MTD, on top of NAND flash memory. F2FS: F2FS runs on a block device driver over a flash device that contains the FTL and NAND flash memory]

Design Concept of F2FS
- Designed to take advantage of both approaches
  - Exploit high-level system information
  - Flash-aware storage management
  - Handle new NAND flash without redesign

        File System + FTL                                                     Flash File System
Method  - Access a flash device via FTL                                       - Access a flash device directly
Pros    - High interoperability                                               - High-level optimization with system-level information
        - No difficulties in managing recent NAND flash with new constraints  - Flash-aware storage management
Cons    - Lack of system-level information                                    - Low interoperability
        - Flash-unaware storage management                                    - Must be redesigned to handle new constraints

Index Structure of LFS

[Figure: an update on file data propagates up through the index structure]

- LFS and UBIFS suffer from the wandering tree problem
  - An update on a file causes several extra writes

Index Structure of F2FS
- Introduces a Node Address Table (NAT) containing the locations of all the node blocks, including indirect and direct nodes
- The NAT allows us to eliminate the wandering tree problem

[Figure: the F2FS index structure; the NAT is updated in place (annotation: "In-place update")]
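
Why a table of node addresses stops the chain of rewrites can be sketched as follows. This is a toy model, not F2FS's on-disk format: the class `NatIndex`, the node IDs, and the dict-based NAT are assumptions for illustration. The key point is that parents refer to children by node ID, and only the NAT maps an ID to its current address.

```python
class NatIndex:
    """Toy index where parents refer to children by node ID, not by address."""

    def __init__(self):
        self.log = []     # out-of-place log of node blocks
        self.nat = {}     # node ID -> current log address, updated in place

    def write_node(self, node_id, content):
        """Append the node to the log and update only its NAT entry."""
        self.log.append((node_id, content))
        self.nat[node_id] = len(self.log) - 1
        return 1          # one node block written; ancestors stay untouched

    def read_node(self, node_id):
        """Resolve the node ID through the NAT and return the node content."""
        return self.log[self.nat[node_id]][1]

idx = NatIndex()
idx.write_node(1, {"child": 2})       # inode refers to indirect node ID 2
idx.write_node(2, {"child": 3})       # indirect node refers to direct node ID 3
idx.write_node(3, "block pointers v1")

# Updating the direct node appends one block and touches one NAT entry.
blocks_written = idx.write_node(3, "block pointers v2")

indirect = idx.read_node(idx.read_node(1)["child"])
resolved = idx.read_node(indirect["child"])
print(blocks_written, resolved, idx.nat)
```

The inode and the indirect node keep their old addresses because the IDs they store are still valid, in contrast to an index where each parent stores the flash address of its child.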

Logical Storage Layout
- File system's metadata is located together for locality
  - Use an "in-place update" strategy for metadata
- Files are written sequentially for performance
  - Use an "out-of-place update" strategy to exploit the high throughput of multiple NAND chips
  - Six active logs for static hot and cold data separation

[Figure: layout: Superblock 0 | Superblock 1 | Checkpoint area | Segment Info. Table (SIT) | Node Address Table (NAT) | Segment Summary Area (SSA) | Main Area (hot/warm/cold node segments and hot/warm/cold data segments). The metadata areas receive random writes; the main area receives sequential writes]

Cleaning Process
- Background cleaning process
  - A kernel thread does the cleaning job periodically at idle time
- Victim selection policies
  - Greedy algorithm for foreground cleaning job
    - Reduce the amount of data moved for cleaning
  - Cost-benefit algorithm for background cleaning job
    - Reclaim obsolete space in a file system
      - Improve the lifetime of a storage device
      - Improve the overall I/O performance

F2FS Performance

Today
- File System Basics
- Traditional Flash File Systems
- SSD-Friendly Flash File Systems
- Reference

Reference
- Rosenblum, Mendel, and John K. Ousterhout. "The design and implementation of a log-structured file system." ACM Transactions on Computer Systems (TOCS) 10.1 (1992): 26-52.
- Woodhouse, David. "JFFS2: The Journalling Flash File System, Version 2." 2001.
- YAFFS2 Specification. http://www.yaffs.net/yaffs-2-specification
- Lee, Changman, et al. "F2FS: A New File System for Flash Storage." USENIX FAST. 2015.
- Arpaci-Dusseau, Remzi H., and Andrea C. Arpaci-Dusseau. Operating systems: Three easy pieces. Vol. 151. Wisconsin: Arpaci-Dusseau Books, 2014.
- Hunter, Adrian. "A brief introduction to the design of UBIFS." Rapport technique, March (2008).