Deciding When to Forget in the Elephant File System

University of British Columbia: Douglas S. Santry, Michael J. Feeley, Norman C. Hutchinson, Ross W. Carton, and Jacob Ofir. Hewlett-Packard Laboratories: Alistair C. Veitch. December 1999. Presented by: David Allen, May 31st, 2005.


Page 1: Deciding When to Forget in the Elephant File System

Deciding When to Forget in the Elephant File System

University of British Columbia: Douglas S. Santry, Michael J. Feeley, Norman C. Hutchinson, Ross W. Carton, and Jacob Ofir

Hewlett-Packard Laboratories: Alistair C. Veitch

December 1999

Presented by: David Allen

May 31st, 2005

Page 2: Deciding When to Forget in the Elephant File System

Elephant File System: Overview

• Undo and Long-Term History: A file system that helps protect data by keeping histories of file and directory changes.

• User Control: Gives the user control over retention policies, which can be applied at the file level.

• Storage Reclamation: Separates storage reclamation from file operations such as write and delete. A cleaner runs in the background to reclaim storage and support the retention policy.

Page 3: Deciding When to Forget in the Elephant File System

Elephant file system: Why

• User Failures: There is already good protection from network, system, and media failures. Now we need protection from user mistakes.

rm *.o is not the same as rm * o (the stray space makes the second command delete every file in the directory).

Page 4: Deciding When to Forget in the Elephant File System

[Figure: High-End Disk Capacity by Year. x-axis: Year, 1999 to 2005; y-axis: Capacity (MB), 0 to 450.]

Elephant file system: Why

• Cheap Disk Space: Single inexpensive disks were approaching 50GB at the time of the paper in 1999. Now, in 2005, they are approaching 500GB. They are projected to reach 2TB by 2010.
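The three capacity figures on this slide imply a compound annual growth rate, which is easy to check. The following is a rough sketch that uses only the slide's numbers (50GB in 1999, 500GB in 2005, 2TB forecast for 2010); the function name is illustrative.

```python
def annual_growth(start_gb: float, end_gb: float, years: int) -> float:
    """Compound annual growth factor that turns start_gb into end_gb."""
    return (end_gb / start_gb) ** (1.0 / years)

# Capacities come from the slide; the growth rates are derived from them.
observed = annual_growth(50, 500, 2005 - 1999)     # 1999 -> 2005
projected = annual_growth(500, 2000, 2010 - 2005)  # 2005 -> 2010 (the slide's forecast)

print(f"observed growth:  {observed - 1:.0%} per year")
print(f"projected growth: {projected - 1:.0%} per year")
```

Under these numbers, the 2TB forecast assumes somewhat slower growth than the 1999 to 2005 period showed.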

Page 5: Deciding When to Forget in the Elephant File System

Elephant file system: Why

• Cheap Disk Space: In addition to high-end disk capacity increasing 10x in 6 years, the price per GB has fallen by more than a factor of 10.

[Figure: High-End Disk Price per GB by Year. x-axis: Year, 1999 to 2005; y-axis: Price/GB, $0.00 to $16.00.]

[Figure: Rough Price for High-End Disk Drive. x-axis: Year, 1996 to 2005; y-axis: Price, $0.00 to $800.00.]

Page 6: Deciding When to Forget in the Elephant File System

Elephant file system: Why

• Cheap Disk Space: Other types of media are growing as well.

8GB compact flash

6GB micro drives

(Useful for that 16.7MP Canon camera, which produces 42MB images.)

Page 7: Deciding When to Forget in the Elephant File System

Elephant file system: Why

• Capacity: Disk capacities are large, human productivity is constant, and only a relatively small set of files needs protection.

It makes sense to support revision histories on files and directories.

Page 8: Deciding When to Forget in the Elephant File System

Elephant file system: Change

• Change in pattern of use: Does this paper stand up to changes in disk usage? There has been an explosion of large files from still and video digital cameras, MP3 CD rips, and DivX DVD rips.

I have 17.8GB of pictures and video from one trip, which I need to prune and edit to a final form.

How would people in the class use this system?

Page 9: Deciding When to Forget in the Elephant File System

Elephant file system: Policies

• Keep One (no versioning): Just like the FFS. File changes can overwrite existing data and are permanent.

Page 10: Deciding When to Forget in the Elephant File System

Elephant file system: Policies

• Keep All (complete versioning): Like revision control systems. The entire history is maintained.

Page 11: Deciding When to Forget in the Elephant File System

Elephant file system: Policies

• Keep Safe (undo protection): Keeps recent changes for a specified undo period.

[Figure: diagram labeled "undo period".]
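One way to read the undo period is as a sliding window over a file's versions. The sketch below is an interpretation, not the file system's actual code: it assumes that undoing to any instant inside the window requires every version created inside the window plus the newest version created before it. The `Version` type and `keep_safe` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Version:
    created: float  # time at which this version became current

def keep_safe(versions: list[Version], now: float, undo_period: float) -> list[Version]:
    """Return the versions a Keep Safe-style policy would retain.

    Assumption: we keep every version created inside the undo window,
    plus the newest version created before it (that version was still
    current when the window opened).
    """
    ordered = sorted(versions, key=lambda v: v.created)
    cutoff = now - undo_period
    inside = [v for v in ordered if v.created >= cutoff]
    before = [v for v in ordered if v.created < cutoff]
    return before[-1:] + inside

history = [Version(t) for t in (10, 20, 30, 90, 95)]
kept = keep_safe(history, now=100, undo_period=15)
print([v.created for v in kept])  # [30, 90, 95]
```

The versions created at 10 and 20 are the ones a background cleaner could reclaim; the current version is always retained.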

Page 12: Deciding When to Forget in the Elephant File System

Elephant file system: Policies

• Keep Landmarks (long-term history): In addition to Keep Safe protection, retains important file versions.

[Figure: diagram labeled "undo period".]
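A landmark policy can be sketched as undo-window retention plus a rule for which older versions count as important. The heuristic below is an assumption made for illustration and is not taken from the slides: a version is treated as an automatic landmark if it stayed current for at least `quiet_gap` time units. The names `Version`, `user_landmark`, and `keep_landmarks` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Version:
    created: float               # time the version became current
    user_landmark: bool = False  # explicitly marked by the user

def keep_landmarks(versions, now, undo_period, quiet_gap):
    """Return versions retained by a simplified Keep Landmarks policy.

    Assumed heuristic: a version is an automatic landmark when no newer
    version replaced it for at least quiet_gap time units.
    """
    ordered = sorted(versions, key=lambda v: v.created)
    cutoff = now - undo_period
    kept = []
    for i, v in enumerate(ordered):
        is_current = i == len(ordered) - 1
        replaced_at = now if is_current else ordered[i + 1].created
        in_undo_window = replaced_at >= cutoff  # was current inside the window
        long_lived = replaced_at - v.created >= quiet_gap
        if is_current or in_undo_window or v.user_landmark or long_lived:
            kept.append(v)
    return kept

history = [Version(0), Version(5, user_landmark=True), Version(6),
           Version(7), Version(50), Version(90), Version(95)]
kept = keep_landmarks(history, now=100, undo_period=15, quiet_gap=30)
print([v.created for v in kept])  # [5, 7, 50, 90, 95]
```

Here the version at 5 survives because the user marked it, the version at 7 because it stayed current for a long stretch, and the rest because the undo window still needs them.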

Page 13: Deciding When to Forget in the Elephant File System

Elephant file system: Policies

• Application Defined (user specified): A custom policy implemented at the user level.
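A user-level policy can be pictured as a function that the cleaner calls to decide which versions of a file to keep. The interface below is purely hypothetical (the slides do not describe the real one): a policy maps a list of version timestamps and the current time to the timestamps to retain, and files without a policy are assumed to fall back to keeping only the newest version.

```python
from typing import Callable

# A policy maps (version timestamps, current time) to the timestamps to keep.
Policy = Callable[[list[float], float], list[float]]

def keep_last(n: int) -> Policy:
    """Hypothetical custom policy: retain only the n newest versions (n >= 1)."""
    def policy(versions: list[float], now: float) -> list[float]:
        return sorted(versions)[-n:]
    return policy

def clean(files: dict[str, list[float]], policies: dict[str, Policy],
          now: float) -> dict[str, list[float]]:
    """Apply each file's policy; files without one keep only the newest version."""
    default = keep_last(1)
    return {name: policies.get(name, default)(versions, now)
            for name, versions in files.items()}

files = {"thesis.tex": [1, 2, 3, 4], "core": [7, 8]}
result = clean(files, {"thesis.tex": keep_last(3)}, now=10)
print(result)  # {'thesis.tex': [2, 3, 4], 'core': [8]}
```

Passing policies as plain callables keeps the cleaner independent of any particular retention rule.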

Page 14: Deciding When to Forget in the Elephant File System

Elephant file system: Features for Comparison

• User Control: History is retained only on user-selected files, with user-selected policies. Custom policies can be created. Landmarks can be user specified.

• Automation: Implemented within the file system. Revisions are maintained automatically as the files are used. Landmarks can be determined automatically. Cleaning is done in the background.

Page 15: Deciding When to Forget in the Elephant File System

Elephant file system: Features for Comparison

• Granularity: Every file and directory change can be kept. Full or partial long-term histories can be maintained. Files can be grouped to maintain consistency for landmarking. Versioning on files is done at the block level.

• Access: A specific version can be selected with a file and date pair. Only the current version can be written to. The most recent revision is fastest, but all versions can be accessed relatively quickly. Only a single version exists at a time.
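Selecting a version by file and date amounts to finding the newest version created at or before the requested time. A minimal sketch, assuming each file keeps a sorted list of version creation times (the function name `version_at` is illustrative):

```python
from bisect import bisect_right
from typing import Optional

def version_at(created_times: list[float], when: float) -> Optional[int]:
    """Index of the version that was current at time `when`, or None.

    created_times must be sorted ascending; entry i is the time at which
    version i became current.
    """
    i = bisect_right(created_times, when)
    return i - 1 if i else None

times = [100.0, 250.0, 400.0]
print(version_at(times, 300.0))  # 1: the version written at t=250
print(version_at(times, 50.0))   # None: the file did not exist yet
```

A binary search keeps lookups of old versions cheap even for files with long histories.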

Page 16: Deciding When to Forget in the Elephant File System

Elephant file system: Features for Comparison

• Storage: Files with no versions are stored as efficiently as files in a system without versioning. Revisions to inodes are stored in an inode log, which uses full blocks and is much larger than a single inode. Directories are stored as name histories.
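A name history can be sketched as a directory whose entries carry a creation time and an optional deletion time, so that removing a name only stamps it. This is an illustrative model with hypothetical field and method names, not the on-disk layout.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NameEntry:
    name: str
    inode: int
    created: float
    deleted: Optional[float] = None  # None while the name still exists

class Directory:
    """Directory that remembers names after they are removed."""
    def __init__(self) -> None:
        self.entries: list[NameEntry] = []

    def create(self, name: str, inode: int, now: float) -> None:
        self.entries.append(NameEntry(name, inode, created=now))

    def remove(self, name: str, now: float) -> None:
        # Deleting only stamps the entry; reclaiming it is left to a cleaner.
        for e in self.entries:
            if e.name == name and e.deleted is None:
                e.deleted = now
                return
        raise FileNotFoundError(name)

    def listing(self, when: float) -> list[str]:
        """Names visible at time `when`."""
        return sorted(e.name for e in self.entries
                      if e.created <= when and (e.deleted is None or when < e.deleted))

d = Directory()
d.create("a.o", inode=1, now=10)
d.create("main.c", inode=2, now=12)
d.remove("main.c", now=20)  # e.g. an accidental delete
print(d.listing(25))  # ['a.o']
print(d.listing(15))  # ['a.o', 'main.c']
```

Listing the directory as of an earlier time shows the deleted file again, which is what makes undoing a mistaken delete possible.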

Page 17: Deciding When to Forget in the Elephant File System

Elephant file system vs. the Trash Can

• User Control: Users manually empty the trash can. This gives files different levels of protection depending on when they were deleted and when the trash can was emptied.

• Automation: Files are automatically moved to the trash can on delete.

• Granularity: Very coarse-grained. Only protects files against accidental deletion, and only until the trash can is emptied. No directory protection.

• Access: Files can be retrieved from the trash can, but the user needs to determine where to put them.

• Storage: A copy of the entire file is kept in the trash can.

Page 18: Deciding When to Forget in the Elephant File System

Elephant file system vs. Backups

• User Control: Typically no control over system backups. Users can manually copy files.

• Automation: System backups are usually automatic.

• Granularity: Very coarse over time. No fine-grained revisioning. No protection between backups. Typically limited by the backup retention policy (number of tapes).

• Access: System backups are usually very expensive to retrieve. Manual user backups are usually closer at hand, but not always convenient.

• Storage: Usually full or differential copies of the data.

Page 19: Deciding When to Forget in the Elephant File System

Elephant file system vs. Checkpoints

• User Control: Typically no user control over checkpoints.

• Automation: Checkpoints are usually automatic.

• Granularity: Very coarse over time. No fine-grained revisioning. No protection between checkpoints. Typically limited by the checkpoint retention policy (space).

• Access: Typically on-line and easy to get to.

• Storage: Efficient. A copy-on-write policy maintains changes to the file system after the checkpoint.
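Copy-on-write checkpointing can be illustrated with a toy block store: taking a checkpoint copies nothing, and a block's old contents are saved only the first time it is overwritten afterwards. The class and method names here are invented for the sketch and do not correspond to any particular file system.

```python
class CowVolume:
    """Toy block store with copy-on-write checkpoints."""
    def __init__(self, blocks: dict[int, bytes]) -> None:
        self.live = dict(blocks)
        self.checkpoints: list[dict[int, bytes]] = []

    def checkpoint(self) -> int:
        # A new checkpoint starts empty: nothing is copied up front.
        self.checkpoints.append({})
        return len(self.checkpoints) - 1

    def write(self, block: int, data: bytes) -> None:
        # Preserve the old contents once, in the newest checkpoint only.
        if self.checkpoints and block not in self.checkpoints[-1]:
            self.checkpoints[-1][block] = self.live.get(block, b"")
        self.live[block] = data

    def read_at(self, cp: int, block: int) -> bytes:
        # The first saved copy at or after `cp` is the value at checkpoint time.
        for saved in self.checkpoints[cp:]:
            if block in saved:
                return saved[block]
        return self.live.get(block, b"")

vol = CowVolume({0: b"A", 1: b"X"})
cp = vol.checkpoint()
vol.write(0, b"B")
vol.write(0, b"C")  # second write after the checkpoint: nothing more is saved
print(vol.read_at(cp, 0), vol.live[0])  # b'A' b'C'
```

The intermediate contents `b"B"` cannot be recovered, which is the "no protection between checkpoints" limitation in miniature.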

Page 20: Deciding When to Forget in the Elephant File System

Elephant file system vs. Revision Control System

• User Control: History is retained only on user-selected files, but it is usually best to use revision control on all files in a directory. There are no policies to select; the entire history is retained. File groups can be "tagged" to establish a consistent version (like landmarks and grouping).

• Automation: No automation. Usually a set of command-line tools that are initiated by the user (checkout, commit, ...).

• Granularity: Medium granularity. Only committed changes are kept. All versions are retained. It is often difficult or impossible to remove old versions. Revision control typically does not include directories (CVS). Renaming or moving files will often break file histories (CVS, SourceSafe).

Page 21: Deciding When to Forget in the Elephant File System

Elephant file system vs. Revision Control System

• Access: Files can be accessed by name and version. Only the most recent files can be modified. Older versions can be branched. Branches can be merged. Multiple branches (versions) can exist at a time.

• Storage: Text files are usually stored efficiently as differentials. Access is fast for recent versions and slow for old versions. Binary file storage is usually inefficient, using full copies.
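The fast-recent, slow-old access pattern is what one would expect from a store that keeps the newest text in full and older versions as reverse differentials. The sketch below shows that general idea using Python's standard `difflib`; it is an assumption about how such tools commonly work, not a description of any specific revision control system, and `DeltaStore` is a made-up name.

```python
from difflib import SequenceMatcher

def reverse_delta(new: list[str], old: list[str]) -> list[tuple]:
    """Edit script that rebuilds `old` from `new`."""
    ops = SequenceMatcher(a=new, b=old, autojunk=False).get_opcodes()
    # "equal" ranges point into `new`; only differing lines of `old` are stored.
    return [(tag, i1, i2, old[j1:j2] if tag in ("replace", "insert") else [])
            for tag, i1, i2, j1, j2 in ops]

def apply_delta(new: list[str], delta: list[tuple]) -> list[str]:
    out: list[str] = []
    for tag, i1, i2, lines in delta:
        out.extend(new[i1:i2] if tag == "equal" else lines)
    return out

class DeltaStore:
    def __init__(self, lines: list[str]) -> None:
        self.head = list(lines)         # newest version, stored in full
        self.deltas: list[list] = []    # deltas[0] rebuilds the previous version

    def commit(self, lines: list[str]) -> None:
        lines = list(lines)
        self.deltas.insert(0, reverse_delta(lines, self.head))
        self.head = lines

    def checkout(self, steps_back: int = 0) -> list[str]:
        text = self.head
        for delta in self.deltas[:steps_back]:  # older versions apply more deltas
            text = apply_delta(text, delta)
        return text

store = DeltaStore(["a", "b"])
store.commit(["a", "b", "c"])
store.commit(["a", "x", "c"])
print(store.checkout(0))  # ['a', 'x', 'c']
print(store.checkout(2))  # ['a', 'b']
```

Checking out the head applies no deltas, while each step back in history applies one more, so retrieval cost grows with a version's age.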

Page 22: Deciding When to Forget in the Elephant File System

Elephant file system: Summary

• Most files don't need versioning, so the impact is low.

• Performance is very close to that of a system with no versioning.

• Storage cost of metadata is high in the prototype implementation.

• Disk capacity has increased as predicted in this paper, but so has the need for capacity due to digital music and imaging.

• Usage patterns have also changed for the same reasons.

• Does this system still make as much sense in the face of these changes? Definitely!

Page 23: Deciding When to Forget in the Elephant File System

References

• "Deciding When to Forget in the Elephant File System." D. S. Santry, M. J. Feeley, N. C. Hutchinson, A. C. Veitch, R. W. Carton, and J. Ofir. In Proceedings of the Seventeenth ACM Symposium on Operating Systems Principles, December 12-15, 1999, Charleston, SC, pp. 110-123.

• Historic disk capacity and price data: http://www.littletechshoppe.com/ns1625/winchest.html

• Current media capacities and prices: http://froogle.google.com