File System Implementations
Presented by:
Gaurav Gupta
Department of CSEE
University of Maryland Baltimore County
Introduction
• Local file systems and remote file systems
• Two general-purpose file systems in modern Unix
  • System V file system (s5fs), the original
  • Berkeley Fast File System (FFS), 4.2BSD, better
• The vnode/vfs interface supports multiple file systems.
• This chapter summarizes and compares the two file systems.
System V File System (s5fs)
• Resides on a single logical disk or partition, one file system per partition
• Each file system has its own root, subdirectories, files, data, and metadata
• Disk block size = 512 * n bytes; the block is the granularity of disk allocation for a file
• Block numbers are translated by the disk driver into tracks, sectors, and cylinders
[Figure: s5fs disk layout, in order: boot area (B), superblock (S), inode list, data blocks]
Layout:
• Boot area: bootstrap code
• Superblock: attributes and metadata of the file system
• Inode list: one 64-byte inode per file; its size is fixed when the file system is created
• Data area: files, directories, and indirect blocks, which hold pointers to other file data blocks
Directories:
• A directory is a file containing a list of files and subdirectories
• Fixed-size records of 16 bytes
• 2-byte inode number (2^16, so at most 65535 files) and 14-byte file name
• An inode number of 0 means the file no longer exists
• The root directory and its parent both have inode number 2
Inode number   File name
73             .
38             ..
9              File1
0              Deleted file
110            Subdirectory1
65             File2
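The 16-byte record format can be made concrete with a short sketch. It packs and parses the sample directory listed above. The byte order and the helper names (`DIRENT`, `pack_entry`, `parse_directory`) are assumptions made for illustration, not the s5fs source.

```python
import struct

# Assumption: little-endian 2-byte inode number followed by a 14-byte,
# NUL-padded name. The byte order is chosen for illustration only.
DIRENT = struct.Struct("<H14s")   # 16 bytes per record


def pack_entry(inode, name):
    """Build one 16-byte directory record."""
    raw = name.encode("ascii")
    if len(raw) > 14:
        raise ValueError("s5fs file names are limited to 14 characters")
    return DIRENT.pack(inode, raw)   # struct pads the name with NULs


def parse_directory(data):
    """Return (inode, name) pairs, skipping entries whose inode number is 0."""
    entries = []
    for offset in range(0, len(data), DIRENT.size):
        inode, raw = DIRENT.unpack_from(data, offset)
        if inode == 0:               # deleted file: the slot can be reused
            continue
        entries.append((inode, raw.rstrip(b"\0").decode("ascii")))
    return entries


# The sample directory from the slide.
sample = b"".join(
    pack_entry(ino, name)
    for ino, name in [(73, "."), (38, ".."), (9, "File1"),
                      (0, "Deleted file"), (110, "Subdirectory1"), (65, "File2")]
)
live_entries = parse_directory(sample)
```

Parsing skips the record whose inode number is 0, so the deleted file does not appear in `live_entries`.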
Inodes (index nodes):
• Each file has one unique inode
• The inode contains the metadata of the file
• There is an on-disk inode and an in-core inode
Field     Size (bytes)   Description
di_mode   2              File type, permissions
di_uid    2              Owner UID
di_gid    2              Owner GID
di_size   4              Size in bytes
di_addr   39             Array of block addresses
:         :              :
di_addr:
• A file is not stored in contiguous blocks, which prevents fragmentation
• An array of block addresses is therefore required; it is stored in the inode to avoid an extra read
• The number of block addresses needed depends on the size of the file
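[Figure: di_addr array with entries 0 to 12: entries 0-9 are direct block addresses, entry 10 is single indirect, entry 11 is double indirect, entry 12 is triple indirect]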
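As a rough sketch of how the di_addr array is used, the function below maps a logical block number of a file to the di_addr entry that covers it and the indices to follow through any indirect blocks. The 1024-byte block size and 4-byte pointers inside indirect blocks are assumptions for illustration, and `classify_block` is a hypothetical name.

```python
NDIRECT = 10            # di_addr entries 0-9 point straight at data blocks
# Assumptions for illustration: 1024-byte blocks and 4-byte block numbers
# inside indirect blocks, giving 256 pointers per indirect block.
BLOCK_SIZE = 1024
PTRS_PER_BLOCK = BLOCK_SIZE // 4


def classify_block(lbn):
    """Map a logical block number to (di_addr index, indices into indirect blocks)."""
    if lbn < 0:
        raise ValueError("negative block number")
    if lbn < NDIRECT:
        return lbn, []
    lbn -= NDIRECT
    span = PTRS_PER_BLOCK          # blocks reachable through this level
    for level in (1, 2, 3):        # single, double, triple indirect
        if lbn < span:
            path = []
            for _ in range(level):
                span //= PTRS_PER_BLOCK
                path.append(lbn // span)
                lbn %= span
            return NDIRECT + level - 1, path
        lbn -= span
        span *= PTRS_PER_BLOCK
    raise ValueError("block number beyond triple indirect range")
```

Under these assumptions, the first ten blocks cost no extra disk read, and each further level of indirection adds one more block read before the data block is reached.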
Superblock
• Metadata about the file system
• One superblock per file system
• The kernel reads the superblock when mounting the file system
• The superblock contains the following information:
  • Size in blocks of the file system
  • Size in blocks of the inode list
  • Number of free blocks and inodes
  • Free block list (partial)
  • Free inode list (full)
Kernel Organization
In-Core Inodes
• Represented by struct inode
• Contains all fields of the on-disk inode and the following extra fields:
  • vnode: contains the vnode of the file
  • Device ID of the partition containing the file
  • Inode number of the file
  • Flags for synchronization and cache management
  • Pointers to keep the inode on a free list
  • Pointers to keep the inode on a hash queue
  • Block number of the last block read
Inode Lookup
• lookuppn(), a file-system-independent function, performs pathname parsing
• When searching an s5fs directory, it translates to a call to s5lookup()
• s5lookup() first checks the directory name lookup cache
• On a miss, it reads the directory one block at a time
• If the directory contains a valid entry for the file name, s5lookup() obtains the inode number of the file
• iget() is called to locate the inode
• iget() searches the appropriate hash table to get the inode
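The lookup steps above can be sketched as follows. This is a simplified model, not kernel code: every name is a hypothetical stand-in, the directory is passed as an already-parsed list rather than read block by block, and the in-core inode is a plain dictionary.

```python
# All names below are hypothetical stand-ins, not the kernel's real data structures.
name_cache = {}      # (directory inode number, name) -> inode number
inode_hash = {}      # (device id, inode number) -> in-core inode (a dict here)
disk_reads = []      # records which directories had to be scanned


def s5lookup_sketch(dev, dir_ino, directory_entries, name):
    """Resolve one pathname component; return its in-core inode or None."""
    ino = name_cache.get((dir_ino, name))
    if ino is None:                               # cache miss: scan the directory
        disk_reads.append(dir_ino)
        for entry_ino, entry_name in directory_entries:
            if entry_ino != 0 and entry_name == name:
                ino = entry_ino
                name_cache[(dir_ino, name)] = ino
                break
        else:
            return None                           # no valid entry with that name
    return iget_sketch(dev, ino)


def iget_sketch(dev, ino):
    """Find the inode in the hash table, or create an in-core copy."""
    key = (dev, ino)
    if key not in inode_hash:
        inode_hash[key] = {"dev": dev, "ino": ino, "refcount": 0}
    inode_hash[key]["refcount"] += 1
    return inode_hash[key]
```

A second lookup of the same name is served from `name_cache` and never touches the directory, which is the point of checking the cache first.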
File I/O
• The read and write system calls accept:
  • a file descriptor, a user buffer address, and a count of the number of bytes to transfer
• The offset is obtained from the open file object
• The offset is advanced by the number of bytes transferred
• For random I/O, lseek is used to set the offset to the desired location
• The kernel verifies the file mode and puts an exclusive lock on the inode for serialized access
• In a read, s5read() translates the starting offset to a logical block number in the file
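The translation from a byte offset to logical block numbers is simple arithmetic. The sketch below splits one read request into per-block pieces and returns the advanced offset. `split_read` is a hypothetical helper and the 1024-byte block size is an assumption.

```python
BLOCK_SIZE = 1024   # assumption for illustration


def split_read(offset, count, file_size):
    """Break a read into (logical block, offset in block, byte count) pieces."""
    count = max(0, min(count, file_size - offset))   # never read past end of file
    pieces = []
    while count > 0:
        lbn, start = divmod(offset, BLOCK_SIZE)
        n = min(count, BLOCK_SIZE - start)
        pieces.append((lbn, start, n))
        offset += n
        count -= n
    return pieces, offset    # new offset to store back in the open file object
```

For example, a 100-byte read at offset 1000 straddles a block boundary, so it is served as 24 bytes from block 0 followed by 76 bytes from block 1.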
Allocating and Reclaiming Inodes
• An inode remains active as long as its vnode has a non-zero reference count
• Newer implementations put the inactive inode on a free list
• Inode caching uses an LRU replacement algorithm (suboptimal)
• While a file is actively used, its inode is pinned (ineligible for freeing)
  • When the file becomes inactive, some of its pages may still be in memory
• An inode is freed only when none of its pages are present in memory
• New inodes are allocated from the head of the free list
Analysis
• Simple design
• The single superblock can be corrupted
• Grouping the inodes at the beginning of the disk requires a long seek between reading an inode and accessing the file
• Fixed block size wastes space
• File names are limited to 14 characters
• The number of inodes is limited to 65535
The Berkeley Fast File System
• Improves performance, reliability, and functionality
• Provides all the functionality of s5fs, with the same system call handling algorithms and kernel data structures
• Differs in disk layout, on-disk structures, and free block allocation methods
Data layout on hard disk
[Figure: disk geometry showing platters, heads 0-2, tracks 0-2, cylinders 0-1, and sectors 0-1]
• Sector size is 512 bytes
• The Unix view of the disk is a linear array of blocks
• Number of sectors per block = 2^n, where n is a small number
• The device driver translates a block number to a logical sector number and then to the physical track, head, and sector number
• Each cylinder contains a sequential set of block numbers
• Access time depends on head seek time and rotational latency
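The translation done by the device driver can be illustrated with a little arithmetic. The geometry constants below are hypothetical, chosen only to make the numbers concrete, and `block_to_chs` is not a real driver function.

```python
# Hypothetical geometry, chosen only to make the arithmetic concrete.
SECTORS_PER_TRACK = 32
HEADS = 4                 # tracks per cylinder
SECTORS_PER_BLOCK = 2     # 2**1 sectors of 512 bytes, i.e. 1024-byte blocks


def block_to_chs(block):
    """Translate a block number to (cylinder, head, sector) of its first sector."""
    logical_sector = block * SECTORS_PER_BLOCK
    cylinder, rest = divmod(logical_sector, SECTORS_PER_TRACK * HEADS)
    head, sector = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, head, sector
```

With this geometry, blocks 0 to 63 all fall in cylinder 0, so reading them in order needs no head seek. This is why a cylinder holds a sequential set of block numbers.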
On-disk organization
• A disk partition comprises a set of consecutive cylinders on the disk
• FFS further divides the partition into one or more cylinder groups (sets of consecutive cylinders)
• The traditional superblock is divided into two structures
• The FFS superblock contains information such as the number, size, and location of cylinder groups, the block size, inodes, etc.
• The superblock does not change unless the file system is rebuilt
• Every cylinder group holds information about the group, including free inodes, free block lists, etc.
• Each group has a copy of the superblock
Blocks and fragments
• Choice of block size has advantages and disadvantages
• FFS divides blocks into fragments
• Block size is 2^n bytes, minimum 4096, much larger than in s5fs (512 or 1024 bytes)
• Fragments are useful for small files
• Lower bound on fragment size = 512 bytes
• A file consists of complete disk blocks except for the last one
• The first block should be a single block, not a set of fragments
• Data occasionally has to be recopied when the file grows in size
• FFS controls this by allowing only direct blocks to contain fragments
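A small calculation shows why fragments help. The sketch below computes how many full blocks and trailing fragments a file needs and how many bytes are wasted, using the 4096-byte block and 512-byte fragment sizes mentioned above as defaults. `allocation` is a hypothetical helper, and it ignores the rule that only direct blocks may contain fragments.

```python
def allocation(file_size, block_size=4096, frag_size=512):
    """Return (full blocks, trailing fragments, wasted bytes) for one file."""
    if block_size % frag_size != 0:
        raise ValueError("a block must hold a whole number of fragments")
    full_blocks, tail = divmod(file_size, block_size)
    fragments = -(-tail // frag_size)          # ceiling division
    allocated = full_blocks * block_size + fragments * frag_size
    return full_blocks, fragments, allocated - file_size
```

A 5000-byte file takes one full block plus two fragments and wastes 120 bytes. With the fragment size set equal to the block size, the same file wastes 3192 bytes.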
Allocation Policies
• In s5fs, the order of the free inode and free block lists is random except at file system creation time
• FFS aims to co-locate related information on the disk to optimize sequential access
• FFS places the inodes of all the files in a single directory in the same cylinder group (improves commands like ls -l)
• A new directory is created in a different cylinder group from its parent (for uniform distribution)
• Data blocks of a file are placed in the same cylinder group as its inode
• The cylinder group is changed when the file reaches 48 KB in size and again at every MB
• Sequential blocks of a file are allocated at rotationally optimal positions
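The rule for spreading large files can be sketched as a list of switch points. The reading of "again at every MB" as every whole megabyte of file size is an assumption, and `group_switch_points` is a hypothetical helper rather than the FFS allocator.

```python
KB = 1024
MB = 1024 * KB


def group_switch_points(file_size):
    """List the file offsets at which allocation moves to another cylinder group."""
    points = []
    if file_size > 48 * KB:
        points.append(48 * KB)
    # Assumption: "every MB" means every whole megabyte of file size.
    points.extend(range(MB, file_size, MB))
    return points
```

Small files never leave the cylinder group of their inode, while a large file is spread over several groups so that it cannot fill any single one.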
FFS Functional Enhancements
• Long file names: up to 255 characters, with variable directory entry length
• Symbolic links: a symbolic link is a file that points to another file
  • The type field of the inode identifies the file as a symbolic link
Analysis
• Read throughput increases from 29 KB/s in s5fs to 221 KB/s in FFS
• CPU utilization increases from 11% to 43%
• Write throughput increases from 48 KB/s to 142 KB/s
• Average wastage in data blocks is half a block per file in s5fs and half a fragment per file in FFS
  • The two are the same when the fragment size equals the block size
  • There is overhead to maintain fragments