Unit 7

File Systems

Reading:
• Text: 6.1.2 & 6.1.3
• File Systems section from any new book on Operating Systems (like Tanenbaum's in the course reference books)

Original slides by Patrice Belleville ; Changes by George Tsiknis

Unit Outline

• Disk characteristics
  – Rotating disks (6.1.2)
  – Solid state disks (6.1.3)
• Files and Directories
• File System implementation and layout
  – ISO 9660 (CD-ROMs)
  – MS-DOS
  – Linux
• Virtual File Systems
• Robustness and recovery

Main Memory vs Disk

Differences between memory and disk:

• Access
  – Memory locations are accessible individually.
  – Data on disk can only be accessed one chunk (block) at a time.
  – Block size is typically between 512 B and 8 KB.
• Naming
  – Variables are accessed using their address.
  – Data on disk is normally accessed through a file name.
  – The OS translates the name and offset into a logical block number.
  – The disk controller maps that number to the location on disk.
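The first half of that translation is plain arithmetic. The sketch below is only an illustration: the name `locate` and the 4096-byte block size are assumptions (any size in the 512 B to 8 KB range works the same way). It turns a byte offset into the index of the block within the file plus a position inside that block; the OS then uses the file's metadata to turn that index into a logical block number.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes (the slide gives 512 B to 8 KB)

def locate(offset, block_size=BLOCK_SIZE):
    """Return (index of the block within the file, byte position inside it)."""
    return divmod(offset, block_size)

print(locate(10_000))  # (2, 1808): byte 10000 lives in the file's third block
```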

A Disk Drive

[Figure: a disk drive with its parts labeled: spindle, arm, actuator, platters, electronics (including a processor and memory!), SCSI connector. Image courtesy of Seagate Technology.]

Disk Structure

Hard disks: platter view (from the side)

[Figure: platters 0 to 2 stacked on a spindle, giving surfaces 0 to 5; cylinder k is marked across the surfaces.]

Disk Structure

Hard disks: surface layout

[Figure: one surface around the spindle, divided into tracks (track k is marked); each track is divided into sectors separated by gaps.]

Disk Structure

A hard disk in action:

[Figure: a disk surface rotating around the spindle.]

Disk Operation

What affects the time needed to retrieve data from a hard disk?

• Seek time: time to position the arm on the right track
  – T_avg_seek ~ 9 ms
• Rotational latency: time to position the head at the right sector
  – T_avg_rotation = 1/2 × 1/RPM × 60 s
• Average transfer time: time to transfer a sector
  – T_avg_transfer = 1/RPM × 1/(avg # sectors per track) × 60 s
• Then T_access = T_avg_seek + T_avg_rotation + T_avg_transfer

Example: disk with 15000 RPM, 10 ms avg seek and 500 sectors/track. T_access = ?
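The example can be worked through directly from the three formulas. The sketch below is only a calculator for them; the function name `access_time_ms` is an assumption made for illustration.

```python
def access_time_ms(rpm, avg_seek_ms, sectors_per_track):
    """Average access time in milliseconds, following the three terms above."""
    rotation_ms = 60_000 / rpm                         # one full rotation
    avg_rotation_ms = rotation_ms / 2                  # on average, half a turn
    avg_transfer_ms = rotation_ms / sectors_per_track  # one sector passes the head
    return avg_seek_ms + avg_rotation_ms + avg_transfer_ms

t = access_time_ms(rpm=15000, avg_seek_ms=10, sectors_per_track=500)
print(f"{t:.3f} ms")  # 12.008 ms = 10 ms seek + 2 ms rotation + 0.008 ms transfer
```

Seek time and rotational latency dominate; the transfer of the sector itself is negligible in comparison.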

Logical Disk Blocks

• Modern disks present a simpler abstract view of the complex sector geometry:
  – The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...).
• Mapping between logical blocks and actual (physical) sectors
  – Maintained by a hardware/firmware device called the disk controller.
  – Converts requests for logical blocks into (surface, track, sector) triples.
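A rough sketch of such a mapping, under strongly simplifying assumptions: one logical block per sector, the same number of sectors on every track, and blocks numbered so that a whole cylinder is filled before the arm moves. Real controllers use more complex layouts and remap bad sectors; the function name and the geometry below are made up for illustration.

```python
def block_to_triple(block, surfaces, sectors_per_track):
    """Map a logical block number to a (surface, track, sector) triple."""
    # All tracks with the same number (one cylinder) are filled first.
    track, rest = divmod(block, surfaces * sectors_per_track)
    surface, sector = divmod(rest, sectors_per_track)
    return surface, track, sector

print(block_to_triple(4100, surfaces=6, sectors_per_track=500))  # (2, 1, 100)
```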

Accessing Disk: Direct Memory Access (DMA)

• The disk controller transfers data to/from main memory independently of the CPU.
• The process is initiated by the CPU using programmed I/O (PIO)
  – send request to controller with addresses and sizes
• Data is transferred to memory without CPU involvement.
• The controller signals the CPU with an interrupt when the transfer is complete.
• Can transfer large amounts of data with one request.

1: PIO data transfer, CPU -> Controller, initiated by CPU
2: DMA data transfer, Controller <-> Memory, initiated by Controller
3: Interrupt control transfer, Controller -> CPU, initiated by Controller


Solid State Disks (SSDs)

• Used in USB sticks, digital cameras, iPods, etc.
• Pages: 512 B to 4 KB; blocks: 32 to 128 pages.
• Data is read/written in units of pages.
• A page can be written only after its block has been erased.
• A block wears out after about 100,000 repeated writes.

[Figure: a solid state disk (SSD). Requests to read and write logical disk blocks arrive over the I/O bus at the flash translation layer, which sits in front of the flash memory: blocks 0 to B-1, each containing pages 0 to P-1.]

SSD Performance Characteristics

Why are random writes so slow?
• Need to erase a block (takes around 1 ms).
• Must copy all useful pages in the block:
  – Find a used block (new block) and erase it
  – Write the page into the new block
  – Copy other pages from old block to the new block

Sequential read throughput    250 MB/s    Sequential write throughput   170 MB/s
Random read throughput        140 MB/s    Random write throughput        14 MB/s
Random read access time        30 us      Random write access time      300 us

SSD Tradeoffs vs Rotating Disks

• Advantages
  – No moving parts, so faster and less power
• Disadvantages
  – Have the potential to wear out
    • Mitigated by "wear leveling logic" in the flash translation layer
    • E.g. Intel X25 guarantees 1 petabyte (10^15 bytes) of random writes before they wear out
  – In 2010, about 100 times more expensive per byte
• Applications
  – MP3 players, smart phones, laptops
  – Beginning to appear in desktops and servers


File System Issues

What issues are relevant to the design of a file system?
• How files are named.
• Where information about a file is stored.
• How to find a file's data, given its name.
• How space for new files is allocated.
• How to recover from hardware and software failures.

Files

• In both Windows and Unix, a file is a sequence of bytes.
  – Very flexible.
  – These bytes are given meaning by user programs.
• How do we determine the type of data in a file?
  – Using the file name (e.g. file extension in Windows)
  – By looking at the first few bytes (e.g. Unix)
• Attributes are associated with each file:
  – These vary depending on the operating system.

Common File Attributes

• File size
• File owner and group
• Location of the file's data
• Time of creation/last access/last update
• File permissions (who can read/write/execute it)
• Assorted flags (hidden/system/archive/lock/etc.)

File Names

• A file is accessed using its name.
• Rules for names depend on the operating system.
• MS-DOS/Windows up to Windows ME (1981)
  – Up to 8 ASCII characters, followed by "." and an extension of up to 3 characters.
  – Case insensitive (that is, MYFILE.DOC is the same as myfile.doc)
• ISO 9660 CD-ROM (1988)
  – Same as for MS-DOS.
  – Design goal was to support the lowest common denominator.
  – Extensions to the standard allow the file names of Windows NT to 7 and of Unix/Linux.

File Names (cont')

• Windows NT to 7 (1993)
  – 255 Unicode characters, case sensitive (can be switched off).
  – Many Windows tools are case insensitive!
• Unix/Linux
  – 255 ASCII characters (except NUL and /), case sensitive
  – UTF-8 can be used with recent versions of Linux.

Directories

A directory is just a file whose data contains a list of entries.

Each entry contains information about one file or directory.

Each file or directory is an entry in some directory, except for the top-level directory.


ISO 9660 CD-ROM File System

• CD-ROMs are read-only.
• A CD-ROM consists of a sequence of blocks of 2048 data bytes.
• Because the medium is read-only, the file system layout is made simpler.
  – Files are stored using contiguous blocks.
• A CD-ROM contains:
  – 16 blocks with various info set by the manufacturer
  – a primary volume descriptor block containing the root directory

Directory entry:

Field                                Bytes
Directory entry length               1
Extended attributes record length    1
Location                             8
Size                                 8
Date/time                            7
Flags                                1
(field not labeled on the slide)     2
CD #                                 4
Name length                          1
Name; version                        4 – 15
(fields not labeled on the slide)    ? ?


MS-DOS File System

• No longer normally used with computers, but still used in
  – Most digital cameras
  – MP3 players
  – iPods (unless reformatted differently).

Directory entry (32 bytes):

Field             Bytes
File name         8
Extension         3
Attributes        1
Unused            10
Time              2
Date              2
First cluster #   2
Size              4
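The 32-byte layout maps directly onto a fixed-size record. The sketch below unpacks one entry with Python's `struct` module, using the field widths listed above and little-endian integers. The sample entry is made up for illustration, the function name `parse_entry` is an assumption, and the time and date fields are left undecoded.

```python
import struct

# name 8, extension 3, attributes 1, unused 10, time 2, date 2,
# first cluster 2, size 4 (integers are little-endian)
ENTRY = struct.Struct("<8s3sB10sHHHI")

def parse_entry(raw):
    """Decode one 32-byte MS-DOS directory entry into a dictionary."""
    name, ext, attrs, _unused, _time, _date, first_cluster, size = ENTRY.unpack(raw)
    return {
        "name": name.decode("ascii").rstrip(),  # names are padded with spaces
        "ext": ext.decode("ascii").rstrip(),
        "attrs": attrs,
        "first_cluster": first_cluster,
        "size": size,
    }

# Made-up entry for MYFILE.DOC: 1234 bytes long, data starting at cluster 5.
sample = ENTRY.pack(b"MYFILE  ", b"DOC", 0x20, bytes(10), 0, 0, 5, 1234)
entry = parse_entry(sample)
print(len(sample), entry["name"], entry["ext"], entry["first_cluster"], entry["size"])
```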

MS-DOS File System (cont')

• Space is managed using a File Allocation Table (FAT).
• Each block (called a cluster) is represented by a 12, 16 or 32-bit word.
• A word contains the number of the next block in the file.
  – In other words: each file is a linked list of blocks.
• Two (usually) copies of the FAT are stored on disk.
• A copy is always kept in memory.
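The linked-list structure can be sketched with a dictionary standing in for the FAT. The end-of-file marker `END`, the block numbers and the function names below are made up for illustration; a real FAT reserves special word values for the end of a chain.

```python
END = -1  # hypothetical end-of-file marker for this sketch

def file_blocks(fat, first_block):
    """Follow the chain from the first block and list every block of the file."""
    blocks = []
    block = first_block
    while block != END:
        blocks.append(block)
        block = fat[block]  # each FAT word names the next block
    return blocks

def block_for_offset(fat, first_block, offset, block_size=512):
    """Find the block holding a byte offset (assumed to lie inside the file)."""
    block = first_block
    for _ in range(offset // block_size):  # one hop per block skipped
        block = fat[block]
    return block

fat = {4: 7, 7: 2, 2: 9, 9: END}       # a 4-block file starting at block 4
print(file_blocks(fat, 4))             # [4, 7, 2, 9]
print(block_for_offset(fat, 4, 1100))  # 2, reached after two hops
```

Reaching a given offset means walking the chain from the start, so the number of hops grows with the offset.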

MS-DOS File System (cont')

• Pros: ______________________________
• Cons:
  – the FAT takes a lot of memory space
  – random access to large files is __________
  – fragmentation can occur frequently
    • the blocks of a file are scattered all over the disk

MS-DOS File System (cont')

Fragmentation example

[Figure: disk blocks drawn in rows starting at blocks 0, 12, 24 and 36. Each block shows the number of the next block of its file; * marks the last block of a file. The figure is redrawn after each step.]

Initial state: 5 files (sizes are 6, 6, 12, 12, 7 blocks)
Step 1: deleting the green file
Step 2: creating a new 7-block file
Step 3: creating a new 8-block file
Step 4: creating a new 1-block file
Step 5: deleting the blue file
Step 6: appending 4 blocks to the gray file


Linux File System

Overall disk structure:

Disk:        | Group Block 0 | Group Block 1 | ... | Group Block n-1 | Group Block n |
Each group:  | Super Block | Group Attributes | Block Bitmap | Inode Bitmap | Inode Table | Data Blocks |

• The superblock contains information about the file system.
• Each group block contains a copy of the superblock, so if one copy is damaged the information can be recovered.
• Information about free/occupied blocks is kept separate from the information used to locate data.

Linux File System (cont')

Example of a superblock:

Filesystem OS type:       Linux
Inode count:              8060928
Block count:              16113187
Reserved block count:     805659
Free blocks:              15164036
Free inodes:              8021502
First block:              0
Block size:               4096
Blocks per group:         32768
Inodes per group:         16384
Inode blocks per group:   512
First inode:              11
Inode size:               128
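The fields of this superblock are consistent with each other, which a few lines of arithmetic can confirm. The dictionary keys below are names chosen for this sketch, not the on-disk field names.

```python
import math

sb = {
    "inode_count": 8060928,
    "block_count": 16113187,
    "block_size": 4096,
    "blocks_per_group": 32768,
    "inodes_per_group": 16384,
    "inode_size": 128,
}

# The last group may be only partly filled, hence the rounding up.
groups = math.ceil(sb["block_count"] / sb["blocks_per_group"])
inodes_per_block = sb["block_size"] // sb["inode_size"]
inode_blocks_per_group = sb["inodes_per_group"] // inodes_per_block

print(groups)                           # 492
print(groups * sb["inodes_per_group"])  # 8060928, the inode count above
print(inode_blocks_per_group)           # 512, the "Inode blocks per group" above
```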

Linux File Structure

A file consists of
• An inode
  – Contains the file's attributes (but not its name).
  – Contains direct and indirect pointers to data blocks.
  – A disk block contains multiple inodes.
• Indirect blocks
  – These contain pointers to data blocks, or to other indirect blocks.
• Data blocks

Linux File Structure

inode:

• Type/Permissions
• Owner info
• File size
• Timestamps
• Data block #s (12)
• Indirect block #
• 2-indirect block #
• 3-indirect block #

[Figure: the 12 direct entries point straight to data blocks. The indirect block points to data blocks; the 2-indirect block points to indirect blocks; the 3-indirect block points to 2-indirect blocks.]
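How many levels of indirection a read needs depends only on the position of the block within the file. The sketch below assumes 4096-byte blocks (as in the superblock example) and 4-byte block numbers, so each indirect block holds 1024 pointers; both values and the function name are assumptions for illustration.

```python
BLOCK_SIZE = 4096    # assumed block size in bytes
POINTER_SIZE = 4     # assumed size of one block number in bytes
DIRECT = 12          # direct pointers in the inode
PTRS = BLOCK_SIZE // POINTER_SIZE  # pointers per indirect block (1024)

def indirection_level(block_index):
    """Indirect blocks to read before reaching data block number block_index."""
    if block_index < DIRECT:
        return 0
    block_index -= DIRECT
    for level in (1, 2, 3):
        if block_index < PTRS ** level:
            return level
        block_index -= PTRS ** level
    raise ValueError("beyond the range of the 3-indirect block")

max_blocks = DIRECT + PTRS + PTRS**2 + PTRS**3

print(indirection_level(5), indirection_level(12), indirection_level(1036))  # 0 1 2
print(max_blocks)  # 1074791436 blocks addressable through one inode
```

At most three extra block reads are needed no matter how large the file is.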

Linux Directories

• A directory contains entries for other directories or files.
• A directory entry consists of
  – the file name, and
  – the inode number for the file.
• The directory contains no other information.
• The first entry of every directory is . : a reference to the directory itself.
• The second entry of every directory is .. : a reference to the parent directory.

Sharing Files in Linux

• It is possible for several directory entries to refer to the same inode.
  – This is called a hard link.
  – This is the case for . and ..
  – Hard links can be used to give a program several names.
    • Example:

      % ls -ali /bin
      1308482 -rwxr-xr-x 3 root root 31112 2010-09-11 06:48 bunzip2*
      1308482 -rwxr-xr-x 3 root root 31112 2010-09-11 06:48 bzcat*
      1308482 -rwxr-xr-x 3 root root 31112 2010-09-11 06:48 bzip2*

      All three entries refer to the same inode, 1308482.
  – Can be used to share files.
• All files must belong to the same file system.
  – Why?
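The same situation can be reproduced in a scratch directory. This is a small sketch using Python's `os.link`; it assumes a Unix-like system whose temporary directory supports hard links, and the file names simply mirror the listing above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "bzip2")
    with open(original, "w") as f:
        f.write("program bytes")

    # Two more directory entries for the same inode.
    for alias in ("bunzip2", "bzcat"):
        os.link(original, os.path.join(d, alias))

    stats = [os.stat(os.path.join(d, n)) for n in ("bzip2", "bunzip2", "bzcat")]
    same_inode = len({s.st_ino for s in stats}) == 1
    link_count = stats[0].st_nlink

print(same_inode, link_count)  # True 3, matching the link count column of ls
```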

Sharing Files in Linux (cont')

• Unix/Linux also support symbolic (soft) links.
  – A symbolic link is a file whose contents is the name of another file.
  – Example:

    % ls -al /lib
    -rw-r--r-- 1 root root 534832 2010-10-21 19:02 libm-2.12.1.so
    lrwxrwxrwx 1 root root     14 2010-10-22 18:42 libm.so.6 -> libm-2.12.1.so

  – The second file may be on a different file system.
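A short sketch of the same idea with `os.symlink`, assuming a Unix-like system. The link gets its own inode, and its size is the length of the name it stores, which is where the 14 in the listing comes from.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "libm-2.12.1.so")
    link = os.path.join(d, "libm.so.6")
    with open(target, "w") as f:
        f.write("library bytes")

    os.symlink("libm-2.12.1.so", link)  # the link stores only this name

    stored_name = os.readlink(link)
    link_size = os.lstat(link).st_size  # lstat looks at the link itself
    own_inode = os.lstat(link).st_ino != os.stat(target).st_ino
    with open(link) as f:               # opening the link follows it
        followed = f.read()

print(stored_name, link_size, own_inode, followed)
```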

Reading Data

• To read data from a file myfile.txt
  – Find the directory containing myfile.txt.
  – Read the inode for file myfile.txt.
  – Read the data
    • either by accessing the direct blocks,
    • or by going through up to 3 layers of indirect blocks.
• Random access to large files is much faster than for the MS-DOS file system.
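The steps above can be sketched with two dictionaries standing in for the inode table and the data blocks. Everything here is a toy model: the inode numbers, block numbers, path and function names are made up, and only direct blocks are handled.

```python
ROOT_INODE = 2  # root inode number chosen for this sketch

inodes = {
    2:  {"type": "dir",  "entries": {".": 2, "..": 2, "home": 11}},
    11: {"type": "dir",  "entries": {".": 11, "..": 2, "myfile.txt": 12}},
    12: {"type": "file", "blocks": [40, 41]},  # direct block numbers only
}
disk = {40: b"hello ", 41: b"world"}

def lookup(path):
    """Walk the directories from the root and return the inode number."""
    inode_no = ROOT_INODE
    for name in path.strip("/").split("/"):
        if name:  # skip the empty component produced by "/"
            inode_no = inodes[inode_no]["entries"][name]
    return inode_no

def read_file(path):
    """Read the inode, then every data block it points to."""
    inode = inodes[lookup(path)]
    return b"".join(disk[b] for b in inode["blocks"])

print(lookup("/home/myfile.txt"))     # 12
print(read_file("/home/myfile.txt"))  # b'hello world'
```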

Fragmentation

• Unlike the MS-DOS file system, modern file systems (NTFS, etc.) try to keep files together.
• For Linux:
  – files are kept within a block group if possible.
  – large files are written to large free areas, whereas small files are stored in smaller free areas.
• Fragmentation still happens, but much more slowly, and normally only becomes a problem if the file system is very full.


Virtual File Systems

• How do we handle multiple disks, devices or partitions of one disk with possibly different file systems (e.g. NTFS, FAT32, CD-ROM, etc.)?
• MS-DOS, Windows
  – Each disk is assigned a letter name
    • A:\ : floppy disk
    • C:\ : primary hard disk
    • Z:\ : drive on a server somewhere on the network
  – This letter is used to decide which file system to pass the request to.
  – Hence the user must know which file system contains the file they want to access.

Virtual File Systems

• Unix/Linux
  – There is a root file system / at the top of the hierarchy.
  – Every other file system appears as a subdirectory in that file system.
    • Example:

      # ls /mnt/cdrom
      # mount -t iso9660 /dev/cdrom /mnt/cdrom
      mount: block device /dev/sr0 is write-protected, mounting read-only
      # ls /mnt/cdrom
      Autorun.arn Autorun.exe Autorun.inf docs forms ReadMe.txt

  – The user need not even be aware that multiple file systems are involved.

Virtual File Systems

How this is done:
• User programs make system calls to access various operations.
• A layer called the Virtual File System (VFS) performs the parts of the operations that are common to all file systems.
• The virtual file system calls low-level functions to accomplish specific tasks.
• Each file system must implement these low-level functions appropriately.

Virtual File Systems

Pictorially:

[Figure: user programs 1, 2, ... sit on top of the Virtual File System layer, which sits on top of the individual file systems: ISO 9660, Ext4, VFAT, ...]
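The layering can be sketched as one common interface with one implementation per file system and a dispatcher that picks the implementation by mount point. This is a toy model, not the Linux kernel's VFS: the class names, the method `read_file` and the dictionary-backed file systems are all made up for illustration.

```python
class FileSystem:
    """The low-level operations every file system must implement."""
    def read_file(self, path):
        raise NotImplementedError

class DictFS(FileSystem):
    """Stand-in for a real file system: files live in a dictionary."""
    def __init__(self, name, files):
        self.name = name
        self.files = files

    def read_file(self, path):
        return f"{self.name}: {self.files[path]}"

class VFS:
    """The common layer: finds the responsible file system and forwards the call."""
    def __init__(self):
        self.mounts = {}

    def mount(self, mount_point, fs):
        self.mounts[mount_point.rstrip("/") or "/"] = fs

    def read(self, path):
        # The longest mount point that is a prefix of the path wins.
        matches = [m for m in self.mounts
                   if path == m or path.startswith(m.rstrip("/") + "/")]
        best = max(matches, key=len)
        return self.mounts[best].read_file(path[len(best):].lstrip("/"))

vfs = VFS()
vfs.mount("/", DictFS("ext4", {"etc/hostname": "myhost"}))
vfs.mount("/mnt/cdrom", DictFS("iso9660", {"ReadMe.txt": "welcome"}))

print(vfs.read("/etc/hostname"))          # ext4: myhost
print(vfs.read("/mnt/cdrom/ReadMe.txt"))  # iso9660: welcome
```

The caller uses the same `read` for both paths and never names a file system, which is the point of the layer.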


Robustness and Recovery

• File systems contain critical information.
• Events occur that may cause updates to fail:
  – Operating system crash (caused by a bug).
  – Mechanical/electrical failures of the disk.
  – Power failures.
• Consequences:
  – The information about to be written may be lost.
  – The file system may become inconsistent.
    • There is a risk of losing other information.

File System Consistency Check

• When the operating system shuts down:
  – It saves a file-system-is-clean bit to disk.
• During the boot process:
  – The operating system checks this bit.
  – If it's not set, then the file system may be in an inconsistent state.
  – So the operating system needs to check and repair it.

File System Consistency Check

Example: Linux file system check (e2fsck). Works in 5 stages.

• Stage 1: reads the inodes and determines
  – which inodes are in use
  – the type of file each inode is used for
  – whether blocks are in use or free
  – which blocks contain directories
  – which blocks are used by no inode or by more than one inode.
• Stage 2: verifies that directory entries are valid
  – all of the fields must have sensible values.
  – entries for . and .. should be present.

File System Consistency Check (cont')

• Stage 3: checks the directory structure
  – It must form a tree.
  – So reconnect disconnected pieces, and break any loops.
• Stage 4: checks and corrects reference counts
  – Multiple directory entries can point to the same inode.
  – The inode keeps track of this number.
    • Why?
  – Make sure the reference count in the inode is correct.
• Stage 5: checks bitmaps
  – Compare the block and inode bitmaps computed during the check against the on-disk bitmaps.
  – Update these if necessary.
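Stage 4 amounts to recounting. The sketch below is a simplified model, not e2fsck's actual code: directories are dictionaries from names to inode numbers, the `.` and `..` entries are left out, and the data and function name are made up for illustration.

```python
from collections import Counter

def fix_link_counts(directories, inodes):
    """Recount directory entries per inode and repair stored counts that differ."""
    actual = Counter(ino for entries in directories.values()
                     for ino in entries.values())
    fixed = []
    for ino, meta in inodes.items():
        if meta["links"] != actual[ino]:
            meta["links"] = actual[ino]
            fixed.append(ino)
    return fixed

directories = {
    "/bin": {"bunzip2": 1308482, "bzcat": 1308482, "bzip2": 1308482},
    "/tmp": {"notes": 77},
}
inodes = {1308482: {"links": 2}, 77: {"links": 1}}  # first count is stale

print(fix_link_counts(directories, inodes))  # [1308482]
print(inodes[1308482]["links"])              # 3
```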

File System Recovery

• File system recovery
  – Takes a long time for large file systems.
  – Does not always restore the file system perfectly.
• How do databases handle this problem?
  – They log the transactions being performed.
  – If a transaction is interrupted, it can be undone or redone by executing the logged operations.
• Some file systems do the same thing.

Journaling File Systems

• A journaling file system has a hidden file called a journal (NTFS, Linux ext3).
• Each operation is broken down into atomic steps. Example: to delete a file
  – Free each data block.
  – Decrement the inode's reference count (free it if it becomes 0).
  – Remove the directory entry for the file.
• Before performing the operation
  – Write the sequence of steps to the journal.
  – Add an end-of-operation indicator to the journal.
• After the operation completes
  – The steps can be deleted from the journal.

Journaling File Systems (cont')

• Each step must be idempotent
  – That is, executing the step multiple times should have the same effect as executing it only once.
  – Why?
  – Examples (good or bad?):
    • Increment reference count for inode #786453
    • Set reference count for inode #786453 to 2
    • Mark block #9168734 free
• When the file system isn't clean on reboot
  – Replay every operation from the journal that has an end-of-operation indicator.
  – This is much faster than a full check.
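Replay and idempotence can be sketched with an in-memory journal. This is a toy model under stated assumptions: the record layout, the step names and the `complete` flag (standing in for the end-of-operation indicator) are all made up for illustration.

```python
def apply_step(step, state):
    """Apply one journaled step to the in-memory file system state."""
    op = step[0]
    if op == "set_refcount":          # idempotent: repeating changes nothing
        _, ino, value = step
        state["refcount"][ino] = value
    elif op == "mark_free":           # idempotent: a free block stays free
        _, block = step
        state["free_blocks"].add(block)
    elif op == "increment_refcount":  # NOT idempotent: each run adds one more
        _, ino = step
        state["refcount"][ino] += 1

def replay(journal, state):
    """Redo every operation that has its end-of-operation indicator."""
    for record in journal:
        if record["complete"]:
            for step in record["steps"]:
                apply_step(step, state)

journal = [
    {"steps": [("set_refcount", 786453, 2), ("mark_free", 9168734)], "complete": True},
    {"steps": [("mark_free", 1)], "complete": False},  # never finished: skipped
]

state = {"refcount": {786453: 1}, "free_blocks": set()}
replay(journal, state)
replay(journal, state)  # a second crash and replay leaves the state unchanged
print(state)            # {'refcount': {786453: 2}, 'free_blocks': {9168734}}

bad = {"refcount": {786453: 1}, "free_blocks": set()}
apply_step(("increment_refcount", 786453), bad)
apply_step(("increment_refcount", 786453), bad)
print(bad["refcount"][786453])  # 3: replaying the increment corrupts the count
```

Since a crash can happen during replay itself, a step may run more than once, which is why only the idempotent forms are safe to journal.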