
Local Filesystems (part 1)

CPS 210, Spring 2006

Papers

The Design and Implementation of a Log-Structured File System (Rosenblum and Ousterhout)

File System Logging Versus Clustering: A Performance Comparison (Seltzer et al.)

Surface organized into tracks

Parallel tracks form cylinders

Tracks broken up into sectors

Disk head position

Rotation is counter-clockwise

About to read a sector

[Animation: disk head servicing the RED request after reading the BLUE sector]

After the BLUE read, the RED request is scheduled next

Seek to RED's track (SEEK)

Wait for the RED sector to reach the head (ROTATE, rotational latency)

Read the RED sector
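To make the cost of that sequence concrete, here is a rough back-of-the-envelope calculation; the seek time, spindle speed, and transfer rate below are assumed values for illustration, not numbers from the lecture.

```c
/* Back-of-the-envelope disk service time: seek + rotational latency + transfer.
   The parameter values are illustrative assumptions, not from the slides. */
#include <stdio.h>

int main(void) {
    double seek_ms       = 8.0;     /* average seek (assumed) */
    double rpm           = 7200.0;  /* spindle speed (assumed) */
    double transfer_mb_s = 40.0;    /* sustained transfer rate (assumed) */
    double request_kb    = 8.0;     /* one 8 KB block */

    double full_rotation_ms = 60000.0 / rpm;        /* ms per revolution */
    double rotational_ms    = full_rotation_ms / 2; /* expect half a rotation */
    double transfer_ms      = request_kb / 1024.0 / transfer_mb_s * 1000.0;

    printf("seek %.2f ms + rotate %.2f ms + transfer %.2f ms = %.2f ms\n",
           seek_ms, rotational_ms, transfer_ms,
           seek_ms + rotational_ms + transfer_ms);
    return 0;
}
```

With numbers like these, seek plus rotational latency dominates the transfer time of a small block, which is why scheduling and on-disk layout matter so much.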

Unix index blocks: intuition

Many files are small: length = 0, length = 1, length < 80, ...

Some files are huge (3 gigabytes)

"Clever heuristic" in the Unix FFS inode:

12 direct block pointers: 12 * 8 KB = 96 KB; availability is "free" - you need the inode to open() the file anyway

3 indirect block pointers: single, double, triple
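A small sketch of the lookup this layout implies: given a file offset, decide which pointer level holds the block. The 12 direct pointers and 8 KB blocks come from the slide; the 4-byte block numbers (2048 pointers per indirect block) are an assumption for illustration.

```c
/* Which pointer level holds logical block `b` of a file?
   Assumes 12 direct pointers, 8 KB blocks, and 4-byte block numbers
   (so 2048 pointers per indirect block); real FFS parameters vary. */
#include <stdio.h>

#define NDIRECT  12
#define BLOCK_SZ 8192
#define PTRS_PER (BLOCK_SZ / 4)   /* pointers per indirect block */

static const char *level_for_block(long b) {
    if (b < NDIRECT) return "direct";
    b -= NDIRECT;
    if (b < (long)PTRS_PER) return "single indirect";
    b -= PTRS_PER;
    if (b < (long)PTRS_PER * PTRS_PER) return "double indirect";
    return "triple indirect";
}

int main(void) {
    long offsets_kb[] = { 0, 90, 100, 20000, 40000000 };
    for (int i = 0; i < 5; i++) {
        long b = offsets_kb[i] / (BLOCK_SZ / 1024);   /* logical block number */
        printf("offset %ld KB -> block %ld -> %s\n",
               offsets_kb[i], b, level_for_block(b));
    }
    return 0;
}
```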

Unix index blocks

[Figure: example inode with direct and indirect blocks mapping a file's data blocks]

Unix index blocks

[Figure: small file; only direct block pointers in use, with the indirect, double-indirect, and triple-indirect pointers set to -1 (unused)]

Unix index blocks

[Figure: file grows; the single-indirect pointer now references an indirect block]

Unix index blocks

[Figure: file grows further; additional indirect blocks come into use]

Unix index blocks

[Figure: large file; direct, single-, double-, and triple-indirect blocks all in use]

Log-structured file system

What is the high-level motivation?

Caches are getting bigger, so disk reads are less important; disk traffic will be dominated by writes

Why a log? Eliminate seeks (make all disk writes sequential) and get easy crash recovery

Most writes are small meta-data updates; consecutive small writes trigger seeks, and some file systems perform these writes synchronously
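A minimal sketch of the "make all writes sequential" idea, under assumed sizes: dirty blocks are batched in an in-memory segment and flushed with one large sequential write instead of many small seeking writes. The structure names and segment size are illustrative, not LFS's actual layout.

```c
/* Sketch: batch small dirty blocks in memory and write them out as one
   large sequential segment, instead of seeking for each small update. */
#include <stdio.h>
#include <string.h>

#define BLOCK_SZ   4096
#define SEG_BLOCKS 128            /* 512 KB segment (assumed) */

struct segment_buf {
    char data[SEG_BLOCKS][BLOCK_SZ];
    int  used;
};

/* Append one dirty block to the in-memory segment buffer. */
static int log_write(struct segment_buf *seg, const char *block) {
    if (seg->used == SEG_BLOCKS) return -1;        /* caller must flush first */
    memcpy(seg->data[seg->used++], block, BLOCK_SZ);
    return 0;
}

/* Flush the whole segment with a single sequential write. */
static void log_flush(struct segment_buf *seg, FILE *disk) {
    fwrite(seg->data, BLOCK_SZ, seg->used, disk);  /* one big write, no seeks */
    seg->used = 0;
}

int main(void) {
    struct segment_buf seg = { .used = 0 };
    char block[BLOCK_SZ] = "small metadata update";
    FILE *disk = fopen("log.img", "ab");           /* stand-in for the disk */
    if (!disk) return 1;
    for (int i = 0; i < 100; i++)                  /* many small writes... */
        log_write(&seg, block);
    log_flush(&seg, disk);                         /* ...one sequential flush */
    fclose(disk);
    return 0;
}
```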

LFS challenges

Writes are easier; what about reads?

How do we ensure large open spaces? Why does this matter?

LFS on-disk structures

Segment cleaner

LFS requires large open spaces; fragmentation will kill performance

Use the notion of segments: large contiguous areas of live/dead data, 512 KB or 1 MB

The segment cleaner defragments the disk

Separate the old from the young: old data rarely changes, so clean the two differently

Threading vs copying

Thread the log between segments; copy live data within a segment when cleaning it

Segment cleaning

Three steps:

Read segments into memory

Identify the live blocks in those segments

Write the live blocks to a smaller number of clean segments

Must also update the file inodes that point to the moved blocks

Segment summary block for each segment

Contains info about the blocks in the segment

For each file data block, the summary records the file number and block number

Also used to identify live/dead blocks: check the summary's file number and block number against the actual inode; if the inode still points to this block, it is live; if not, it is dead

Optimization: just keep version numbers for inodes, so stale blocks can be discarded without consulting the inode
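A toy sketch of the liveness check described above, with an in-memory array standing in for the inode map; the structures, addresses, and inode contents are made up for illustration.

```c
/* Liveness check using a per-segment summary: a data block is live only if
   the owning file's inode still maps that logical block to this disk address. */
#include <stdbool.h>
#include <stdio.h>

#define SEG_BLOCKS 4   /* tiny segment for illustration */

struct summary_entry { int inum; int blockno; };   /* one per data block */
struct inode { int direct[12]; };                  /* disk addr per logical block */

/* Toy in-memory inode table standing in for the inode map. */
static struct inode inodes[2] = {
    { .direct = { 100, 101, -1 } },   /* inode 0: blocks live at addrs 100, 101 */
    { .direct = { 107, -1 } },        /* inode 1: block 0 has moved to addr 107 */
};

static bool block_is_live(int addr, const struct summary_entry *e) {
    /* Live iff the owning inode still maps this logical block to this address. */
    return inodes[e->inum].direct[e->blockno] == addr;
}

int main(void) {
    int seg_start = 100;              /* this segment holds addresses 100..103 */
    struct summary_entry summary[SEG_BLOCKS] = {
        { 0, 0 }, { 0, 1 }, { 1, 0 }, { 0, 2 },
    };
    for (int i = 0; i < SEG_BLOCKS; i++) {
        int addr = seg_start + i;
        printf("block at %d (file %d, block %d): %s\n",
               addr, summary[i].inum, summary[i].blockno,
               block_is_live(addr, &summary[i]) ? "live" : "dead");
    }
    return 0;
}
```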

Cleaner policy questions

When should the cleaner run? Continuously, at night, at high utilization, ...

How many segments per cleaning? Tens, hundreds, …

Which segments should be cleaned?

How should live blocks be grouped?

Write cost

Total bytes moved to and from the disk per byte of new data written

1.0 is perfect (we only write new data); 10.0 isn't great (only 1/10 of the data written is new)

Ideal: a bimodal distribution of segments

Old/cold segments at high utilization, young/hot segments at low utilization

Combines high overall disk utilization with low write cost
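For reference, the write-cost formula from the LFS paper, assuming the cleaner must read segments in their entirety and that cleaned segments have live fraction u; a quick tabulation:

```c
/* Write cost when cleaned segments have live fraction u: for every segment
   of new data, the cleaner reads one full segment and writes back the live
   fraction u, so write cost = (1 + u + (1 - u)) / (1 - u) = 2 / (1 - u). */
#include <stdio.h>

int main(void) {
    double utils[] = { 0.0, 0.2, 0.4, 0.6, 0.8, 0.9 };
    for (int i = 0; i < 6; i++) {
        double u = utils[i];
        printf("u = %.1f  ->  write cost = %.1f\n", u, 2.0 / (1.0 - u));
    }
    return 0;
}
```

Note that cleaning segments at u = 0.8 already gives a write cost of 10, the "isn't great" case above.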

Initial simulation results

Surprise!

Why the surprise?

Cold segments decay slowly, collecting dead blocks; in aggregate, they contain a lot of the free blocks

Instead of greatest yield, use cost-benefit analysis:

benefit/cost = (yield * age) / cost = (1 - u) * age / (1 + u)

Result: clean cold segments at higher utilization
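A small sketch of how a cleaner could rank candidate segments with the cost-benefit policy above; the example segments and their utilizations and ages are invented for illustration.

```c
/* Rank segments by benefit/cost = (1 - u) * age / (1 + u), where u is the
   live fraction of the segment and age is the time since its youngest
   block was written. The example segments are made up. */
#include <stdio.h>

struct segment { const char *name; double u; double age; };

static double benefit_cost(const struct segment *s) {
    return (1.0 - s->u) * s->age / (1.0 + s->u);
}

int main(void) {
    struct segment segs[] = {
        { "hot, fairly empty", 0.30,   10.0 },
        { "hot, fairly full",  0.75,   10.0 },
        { "cold, fairly full", 0.75, 1000.0 },
    };
    for (int i = 0; i < 3; i++)
        printf("%-18s u=%.2f age=%6.0f  benefit/cost=%8.2f\n",
               segs[i].name, segs[i].u, segs[i].age, benefit_cost(&segs[i]));
    /* The cold, highly utilized segment ranks highest: its free space stays
       tied up for a long time, so it pays to clean it despite the high u. */
    return 0;
}
```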

Result

What do LFS results really mean?

Workloads matter: when is LFS better than FFS? When is FFS better than LFS?

More terminology

Blocks; partial blocks, aka fragments; contiguous ranges of blocks, aka clusters

Want to allocate inodes + data in the same cylinder (no seeks)

Want data in clusters on the same track (also no seeks)

Challenges for FFS

Reduce (eliminate?) synchronous writes

Avoid fragmentation

Why are meta-data writes synchronous?

Sequential performance vs file size

Four-phase benchmark: create, read, overwrite, delete

Ideal conditions: blank file system, no cleaner
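A rough sketch of such a four-phase benchmark driver, with a fixed file count and size (the actual benchmark varies file size); the file names and sizes are made up.

```c
/* Sketch of a four-phase small-file benchmark: create, read, overwrite,
   delete. File count, size, and names are illustrative only. */
#include <stdio.h>

#define NFILES  100
#define FILE_SZ (8 * 1024)

static void each_file(const char *mode, int do_remove) {
    static char buf[FILE_SZ];
    char name[64];
    for (int i = 0; i < NFILES; i++) {
        snprintf(name, sizeof name, "benchfile.%d", i);
        if (do_remove) { remove(name); continue; }
        FILE *f = fopen(name, mode);
        if (!f) continue;
        if (mode[0] == 'r') fread(buf, 1, FILE_SZ, f);
        else                fwrite(buf, 1, FILE_SZ, f);
        fclose(f);
    }
}

int main(void) {
    each_file("wb", 0);   /* phase 1: create    */
    each_file("rb", 0);   /* phase 2: read      */
    each_file("wb", 0);   /* phase 3: overwrite */
    each_file("",   1);   /* phase 4: delete    */
    return 0;
}
```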

Create results

Read results

Overwrite results

Delete results