Chapter 6 Distributed File Systems. Topics Review of UNIX Sun NFS VFS architecture caching.
Chapter 6
Distributed File Systems
Topics
• Review of UNIX
• Sun NFS
• VFS architecture
• Caching
Layered Structure
• Directory service
  • Mapping: file name to unique file ID
  • Access control
• File service
  • Mapping: file ID to inode
  • File access
• Block service
  • Block management
  • Device access
[Figure: three stacked layers: directory service, file service, block service]
Hierarchical Directory Systems
• A general hierarchy: a tree of directories
[Figure: a tree with the root directory at the top, user directories and further directories below it, and files as leaves]
File System Layout
• Disk is divided up into several partitions
  • Each partition has one file system
• MBR: master boot record
  • Boots the computer & contains the partition table
• Partition table
  • Starting & ending addresses of each partition
  • One partition is marked as active
• Within each partition
  • Boot block: first block, a program that loads the OS
  • Superblock: key parameters about the file system
[Figure: the disk holds the MBR followed by Partitions 1 to 4; a partition holds the boot block, superblock, free space management, i-nodes, root directory, and files and directories]
Implementing Files
• Key issue: how to keep track of which disk sectors go with which file?
  • E.g., block size = 512 B, file size = 2014 B: the file needs ⌈2014/512⌉ = 4 blocks, so where are these 4 blocks on disk?
• Many methods
  • Contiguous allocation
  • Linked list allocation
  • I-nodes
• Each one has its own pros and cons
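The block count in the example above is a ceiling division. A minimal sketch in Python; the function name blocks_needed is made up for illustration:

```python
def blocks_needed(file_size: int, block_size: int) -> int:
    """Number of whole blocks required to store file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    # Ceiling division using integers only: -(-a // b) == ceil(a / b).
    return -(-file_size // block_size)


# The slide's example: a 2014-byte file on 512-byte blocks.
example_blocks = blocks_needed(2014, 512)
```

The last block is only partly used (2014 - 3 * 512 = 478 bytes), which is why the division has to round up.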
Index Nodes (i-nodes)
• An i-node lists the attributes and the disk addresses of the file’s blocks
• An i-node needs to be in memory only while its file is open
  • Takes much less memory than a FAT
  • The memory needed is independent of the size of the disk
[Figure: an i-node holding the file attributes, the addresses of disk blocks 0 to 3, and the address of a block of pointers; that block is a disk block containing additional disk addresses]
i-node and 3-level index
[Figure: an i-node with entries 1 to 12, 13, 14 and 15; index blocks of 1K pointers each; 4 KB blocks]
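As a rough worked example of the 3-level index, the sketch below assumes one reading of the figure: entries 1 to 12 are direct pointers, entries 13, 14 and 15 are single, double and triple indirect, each index block holds 1K pointers, and blocks are 4 KB. The constant and function names are invented for illustration.

```python
BLOCK_SIZE = 4 * 1024      # 4 KB blocks (from the figure)
PTRS_PER_BLOCK = 1024      # 1K pointers per index block (from the figure)
DIRECT = 12                # assumption: entries 1..12 are direct pointers


def index_level(block_no: int) -> int:
    """How many index blocks must be followed to reach logical block block_no."""
    if block_no < 0:
        raise ValueError("block_no must be >= 0")
    limit = DIRECT
    if block_no < limit:
        return 0                       # reached directly from the i-node
    for level in (1, 2, 3):            # single, double, triple indirect
        limit += PTRS_PER_BLOCK ** level
        if block_no < limit:
            return level
    raise ValueError("block number beyond the triple indirect range")


def max_file_size() -> int:
    """Largest file size in bytes under the assumptions above."""
    blocks = DIRECT + sum(PTRS_PER_BLOCK ** level for level in (1, 2, 3))
    return blocks * BLOCK_SIZE
```

Under these assumptions the limit is a little over 4 TiB, and almost all of it comes from the triple indirect entry.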
Managing open files in File Service layer
OFT: Open File Table (one entry per open)
[Figure: the parent’s OFT and the child’s OFT (per process, swappable) refer to the system OFT, which stores position pointers; the system OFT refers to the in-core inode table (kernel-resident), which refers to the inode and data (disk-resident)]
Implementing Directories
• Directory system: map the ASCII file name onto the info needed to locate the data
• Directory entry
• Where are the attributes stored?
  • In the directory entry (MS-DOS/Windows)
  • In the i-nodes (UNIX)
[Figure: DOS/Windows: each directory entry (Games, Mail, News, Work) holds the attributes. UNIX: each directory entry refers to an i-node holding the file attributes and the addresses of disk blocks 0, 1, …]
Implementing Directories: Example
[Figure: example directories mapping names (foo, bin, bin2, usr, vmunix, local) to i-node numbers; /usr/bin; a file containing “Hello world!”; labels Lnk_cnt=2 and Lnk_cnt=1; VMUNIX]
Locate A File: /usr/ast/mbox

Root directory
   1  .
   1  ..
   4  bin
   7  dev
  14  lib
   9  etc
   6  usr
   8  tmp

I-node 6 is for /usr
  Attr.
  132

Block 132 is /usr dir.
   6  .
   1  ..
  19  dick
  30  erik
  51  jim
  26  ast
  45  bal

I-node 26 is for /usr/ast
  Attr.
  406

Block 406 is /usr/ast dir.
  26  .
   6  ..
  64  grants
  92  books
  60  mbox
  81  simix
  17  src

Steps
• Looking up usr yields i-node 6
• I-node 6 says that /usr is in block 132
• /usr/ast is i-node 26
• I-node 26 says that /usr/ast is in block 406
• /usr/ast/mbox is i-node 60
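The lookup of /usr/ast/mbox can be replayed with a small sketch. The directory contents and block numbers are copied from the figure; the table and function names (ROOT_DIR, DIR_BLOCKS, INODE_TO_BLOCK, lookup) are invented, and real i-nodes hold attributes and several block addresses rather than a single block number.

```python
# Root directory: name -> i-node number.
ROOT_DIR = {".": 1, "..": 1, "bin": 4, "dev": 7, "lib": 14,
            "etc": 9, "usr": 6, "tmp": 8}

# Directory blocks on disk: block number -> {name: i-node number}.
DIR_BLOCKS = {
    132: {".": 6, "..": 1, "dick": 19, "erik": 30, "jim": 51,
          "ast": 26, "bal": 45},
    406: {".": 26, "..": 6, "grants": 64, "books": 92, "mbox": 60,
          "simix": 81, "src": 17},
}

# Simplified i-nodes: i-node number -> block holding that directory.
INODE_TO_BLOCK = {6: 132, 26: 406}


def lookup(path: str) -> int:
    """Resolve an absolute path to an i-node number, one component at a time."""
    inode = ROOT_DIR["."]
    directory = ROOT_DIR
    parts = [p for p in path.split("/") if p]
    for i, name in enumerate(parts):
        inode = directory[name]                # search the current directory
        if i < len(parts) - 1:                 # more components follow:
            block = INODE_TO_BLOCK[inode]      # read the i-node, find its block
            directory = DIR_BLOCKS[block]      # read the directory block
    return inode
```

Each component costs one i-node read and one directory-block read, which is why long path names are comparatively expensive to resolve.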
How to Share A File?
• If the directory entry has the addresses of the blocks
  • How about newly appended blocks?
• Addresses of disk blocks stored separately
  • UNIX i-node approach
• Symbolic linking: create a link file containing the path name
[Figure: File 1 shared under Dir A through Dir B and Dir C, in three variants: the directory entry contains the disk address; the directory entries refer to an i-node; symbolic linking, with a link file containing ../Dir C/File1]
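The difference between the i-node approach (hard links) and symbolic linking can be observed with Python's os module. This is a small sketch run in a temporary directory; it assumes a POSIX-style file system that supports both link types, and the file names are arbitrary.

```python
import os
import tempfile


def demo_links():
    """Create a hard link and a symbolic link, then delete the original name."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "file1")
        hard = os.path.join(d, "hard_link")
        soft = os.path.join(d, "soft_link")
        with open(original, "w") as f:
            f.write("Hello world!")

        os.link(original, hard)       # second directory entry, same i-node
        os.symlink(original, soft)    # link file that stores the path name

        same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
        link_count = os.stat(original).st_nlink

        os.remove(original)           # drops one name; the i-node survives
        with open(hard) as f:
            via_hard_link = f.read()
        dangling = not os.path.exists(soft)   # the stored path no longer resolves
        return same_inode, link_count, via_hard_link, dangling
```

The hard link keeps the data reachable after the original name is removed, while the symbolic link is left dangling because it only records a path.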
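Caching
• Reserve a set of blocks in main memory as a cache of disk sectors
• How does the cache work?
• Maintenance of the cache
  • Like page replacement: FIFO, LRU, etc.
[Figure: a hash table indexes the cached blocks, which are also chained in a list from the front (LRU) to the rear (MRU)]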
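A minimal sketch of an LRU block cache, assuming a caller-supplied read_block function that stands in for the disk. Python's OrderedDict plays both roles from the figure: hash lookup by block number and an ordering from least to most recently used. The class name BlockCache is invented.

```python
from collections import OrderedDict


class BlockCache:
    """Fixed-size cache of disk blocks with LRU replacement."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block     # hypothetical function: block_no -> data
        self.blocks = OrderedDict()      # front = LRU, rear = MRU
        self.hits = 0
        self.misses = 0

    def get(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)    # now the most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_block(block_no)         # go to the "disk"
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used
        return data
```

For example, with a capacity of 2 the access sequence 1, 2, 1, 3 evicts block 2, since block 1 was touched more recently.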
Write Important Blocks Back First
• Write critical blocks back to disk immediately after they are updated (write-through)
  • Greatly reduces the probability of inconsistency
• Write-through cache: modified blocks are written back immediately
  • Compared to delayed write
• Don’t keep data blocks in memory for too long
  • Force synchronization periodically (every 30 sec)
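The policy above can be sketched as a cache that writes critical blocks through at once and leaves ordinary data blocks dirty until the next sync. The class name, the critical flag and the dictionary standing in for the disk are all assumptions for illustration; a real kernel would call sync from a timer rather than by hand.

```python
class WriteBackCache:
    """Write-through for critical blocks, delayed write for the rest."""

    def __init__(self):
        self.disk = {}        # stand-in for the disk: block_no -> data
        self.cache = {}       # cached copies
        self.dirty = set()    # blocks modified in the cache but not yet on disk

    def write(self, block_no, data, critical=False):
        self.cache[block_no] = data
        if critical:
            self.disk[block_no] = data    # write-through: on disk immediately
            self.dirty.discard(block_no)
        else:
            self.dirty.add(block_no)      # delayed write: flushed by sync()

    def sync(self):
        """Flush all dirty blocks; meant to run periodically (e.g. every 30 s)."""
        for block_no in sorted(self.dirty):
            self.disk[block_no] = self.cache[block_no]
        self.dirty.clear()
```

A crash between two syncs loses at most the delayed data blocks, while structures written with critical=True are already on disk.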
Block Read Ahead
• If a file is read sequentially, read block (k+1) while block k is in use by a process
• If a file is accessed randomly, read ahead wastes bandwidth
• Detect the access pattern of each open file
  • Switch read ahead on or off according to the current pattern
• Q: how to use it on stateless or stateful servers?
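One simple way to detect the access pattern is to remember the last block read on each open file and prefetch only after a sequential step has been seen. This is a sketch of that heuristic, not the method of any particular kernel; the class name ReadAhead is invented.

```python
class ReadAhead:
    """Per-open-file detector that decides whether to prefetch block k+1."""

    def __init__(self):
        self.last_block = None
        self.sequential = False

    def on_read(self, k):
        """Record a read of block k; return the block to prefetch, or None."""
        self.sequential = (self.last_block is not None
                           and k == self.last_block + 1)
        self.last_block = k
        return k + 1 if self.sequential else None
```

Note that the detector keeps state per open file (the last block read), which bears on the slide's question: a stateless server keeps no such record between requests, so that state would have to live in the client or be carried in the requests.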
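Mapping file systems to physical devices
Mounting
[Figure: the root file system (/) contains bin (cc, date, sh), etc (passwd, getty) and usr; the file system on /dev/sd0g has its own root (/) containing bin (yacc, ban, awk), src (uts) and include (stdio.h), and is attached at the mount point]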
man mount
• mount attaches a file system to the file system hierarchy at the mount_point, which is the pathname of a directory. If mount_point has any contents prior to the mount operation, these are hidden until the file system is unmounted.
• The table of currently mounted file systems can be found by examining the mounted file system information file. This is provided by a file system that is usually mounted on /etc/mnttab.
NFS Architecture
Stateless File Server
• Robust in the face of failures, but
  • Not all operations are idempotent
    • E.g., the lock operation
  • Longer messages
  • Longer processing time
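To see why a stateless server needs longer messages but tolerates retries, compare a read request that carries the file handle, offset and count with a read against a server-side position pointer. This is an illustrative sketch; the handle "fh-1", the FILES table and both function/class names are invented and are not part of the NFS protocol.

```python
FILES = {"fh-1": b"hello distributed world"}   # hypothetical handle -> contents


def stateless_read(handle, offset, count):
    """Every request is self-contained, so repeating it is harmless."""
    return FILES[handle][offset:offset + count]


class StatefulServer:
    """Keeps a position pointer per handle, like a local open file."""

    def __init__(self):
        self.position = {}

    def read(self, handle, count):
        pos = self.position.get(handle, 0)
        data = FILES[handle][pos:pos + count]
        self.position[handle] = pos + len(data)   # the state changes on every call
        return data
```

If a reply is lost and the client retransmits, the stateless read returns the same bytes again, whereas the stateful read silently skips ahead; and if the stateful server crashes, its position table is lost.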
Transparency
• Location transparency
  • Path name (i.e. full name of file) does not say where the file is located.
• Location independence
  • Path name is independent of the server. Hence you can move a file from server to server without changing its name.
  • Have a namespace of files and then have some (dynamically) assigned to certain servers. This namespace would be the same on all machines in the system.
• Root transparency (made-up name)
  • / is the same on all systems
  • This would ruin some conventions like /tmp
NFS Protocols
• Mounting
  • Analyze the pathname
  • Request & store the file handle
  • Static & auto mounting
• Directory and file access
  • Supports most UNIX calls
  • No support for open() and close()
VFS/v-node Architecture
• Motivation: share a common file server by an arbitrary collection of clients and servers
  • Requires a file-system independent framework for file access
• v-node (virtual i-node): one for every open file in the VFS layer
  • Checks if a directory or file is local
  • Contains a pointer to an r-node (remote i-node) in the NFS client
• VFS: represents any file system
  • Well-defined interface
  • One for each file system
Virtual File System
v-node
• Data fields (struct vnode), FS-independent part: v_flag, v_count, v_type, v_vfsmountedhere, …
• v_data: refers to the FS-dependent data (e.g., the r-node)
• v_op: refers to the methods (struct vnodeops)
  • Interface definition: vop_open, vop_lookup, vop_read, vop_mkdir, vop_getaddr, …
  • FS-dependent implementation of vnodeops (shared among UNIX vnodes)
vfs
• Data fields (struct vfs), FS-independent part: vfs_next, vfs_fstype, vfs_vnodecovered, …
• vfs_data: refers to the FS-dependent data
• vfs_op: refers to the methods (struct vfsops)
  • Interface definition: vfs_mount, vfs_root, vfs_unmount, vfs_sync, vfs_statvfs, …
  • FS-dependent implementation of vfsops
VFS implementation
• struct vfs instance
  • vfs_data
  • vfs_op
  • vfs_next: pointer to the next FS mounted
  • vfs_fstype: ufs, nfs, ext2fs, etc.
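The v_op indirection can be mimicked in a few lines: the FS-independent layer calls through an operations table, and each file system supplies its own implementation. This is a loose Python analogy of the C structures, with invented tables (UFS_VNODEOPS, NFS_VNODEOPS) and dictionaries standing in for the local i-node and the r-node.

```python
class Vnode:
    """FS-independent part: a type, an operations table, and private data."""

    def __init__(self, v_type, v_op, v_data):
        self.v_type = v_type
        self.v_op = v_op        # table of FS-dependent methods (like vnodeops)
        self.v_data = v_data    # FS-dependent data (local i-node or r-node)


def ufs_read(vnode, offset, count):
    # Local file system: v_data stands in for the in-core i-node.
    return vnode.v_data["contents"][offset:offset + count]


def nfs_read(vnode, offset, count):
    # NFS client: v_data stands in for the r-node holding the file handle.
    rnode = vnode.v_data
    return rnode["server_files"][rnode["handle"]][offset:offset + count]


UFS_VNODEOPS = {"vop_read": ufs_read}   # shared by all local vnodes
NFS_VNODEOPS = {"vop_read": nfs_read}   # shared by all NFS vnodes


def vop_read(vnode, offset, count):
    """FS-independent entry point: dispatch through the vnode's v_op table."""
    return vnode.v_op["vop_read"](vnode, offset, count)


local = Vnode("VREG", UFS_VNODEOPS, {"contents": b"local data"})
remote = Vnode("VREG", NFS_VNODEOPS,
               {"handle": "fh-7", "server_files": {"fh-7": b"remote data"}})
```

The caller of vop_read never checks whether the file is local or remote; that decision was made once, when the vnode was created with the matching operations table.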
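Mounting
[Figure: rootvfs heads the list of vfs structures, one for the root file system and one for the mounted file system. Each vfs refers to its ROOT vnode (/). The vnode of the mount point /usr belongs to the root file system and is marked “mounted here”; the vfs of the mounted file system covers that vnode.]
v-nodes for mounted-on directories are kept in main memory.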
Implementation
• Server: export one or more of its directories for access by remote clients
  • /etc/exports file, e.g.,
      /usr/local -access=hostA:hostB
      /usr/bin -ro
• Client: mount the exported directories
  • They become part of its directory hierarchy
  • No difference between a local file and a remote file
• Two clients can communicate by sharing files in their common directories.
Mount A Remote File System
• Call the mount program, specifying the remote directory and the local mount point
  • E.g., mount -t msdos /dev/ad0s1 /mnt/windows
  • E.g., mount indus:/usr/src /usr/src
• Parse the name and find the server
• Contact the server
• Receive the file handle
• Create a v-node for the remote directory in the vfs layer
• Create an r-node in the NFS client, pointed to by the v-node
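The mount steps can be walked through in a toy model. Everything here is hypothetical: the EXPORTS table stands in for contacting the server, the handle string is made up, and the v-node and r-node are plain dictionaries rather than kernel structures.

```python
# Hypothetical server-side state: host -> {exported directory: file handle}.
EXPORTS = {"indus": {"/usr/src": "fh-usr-src"}}


def parse_remote(spec):
    """Split 'host:/path' into (host, path)."""
    host, sep, path = spec.partition(":")
    if not sep or not host or not path.startswith("/"):
        raise ValueError("expected host:/absolute/path, got %r" % spec)
    return host, path


def nfs_mount(spec, mount_point, mount_table):
    host, path = parse_remote(spec)          # parse the name, find the server
    handle = EXPORTS[host][path]             # contact the server, get the handle
    rnode = {"server": host, "handle": handle}     # r-node in the NFS client
    vnode = {"v_type": "VDIR", "v_data": rnode}    # v-node pointing to the r-node
    mount_table[mount_point] = vnode
    return vnode
```

For instance, nfs_mount("indus:/usr/src", "/usr/src", table) mirrors the second mount command on the slide and leaves a v-node for /usr/src in the table.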
Mount (1)
Mounting (part of) a remote file system in NFS.
Mount (2)
Mounting nested directories from multiple servers in NFS.
Automounting (1)
ps -fe | grep automount
Automounting (2)
Using symbolic links with automounting.
• Can also be used with file replication.
Open A Remote File
• Parse the file name
  • Get the v-node and r-node of the mounted file system
• Ask the NFS client to open the file
  • Contact the server and get the file handle for the opened file
  • The NFS client creates an r-node for the file
  • vfs creates a v-node for the file
File Attributes (1)
Attribute   Description
TYPE        The type of the file (regular, directory, symbolic link)
SIZE        The length of the file in bytes
CHANGE      Indicator for a client to see if and/or when the file has changed
FSID        Server-unique identifier of the file's file system
Some general mandatory file attributes in NFS.
File Attributes (2)
Attribute     Description
ACL           An access control list associated with the file
FILEHANDLE    The server-provided file handle of this file
FILEID        A file-system unique identifier for this file
FS_LOCATIONS  Locations in the network where this file system may be found
OWNER         The character-string name of the file's owner
TIME_ACCESS   Time when the file data were last accessed
TIME_MODIFY   Time when the file data were last modified
TIME_CREATE   Time when the file was created
Some general recommended file attributes.
Semantics of File Sharing (1)
a) On a single processor, when a read follows a write, the value returned by the read is the value just written.
b) In a distributed system with caching, obsolete values may be returned.
Semantics of File Sharing (2)
Method             Comment
UNIX semantics     Every operation on a file is instantly visible to all processes
Session semantics  No changes are visible to other processes until the file is closed
Immutable files    No updates are possible; simplifies sharing and replication
Transaction        All changes occur atomically
Modified session semantics: changes to an open file are initially visible only to the processes on the same machine. When the file is closed, the changes become visible to other machines.
UNIX Semantics
• UNIX itself probably doesn't quite do this.
• If a write is large (several blocks), it does a seek for each block
• During a seek, the process sleeps (in the kernel)
• Meanwhile, another process can be writing a range of blocks that intersects the blocks of the first write.
• Depending on disk scheduling, the result could be a mix of the two writes, so there is no single “last write”.
• Perhaps UNIX semantics means: a read returns the value stored by the last write, provided one exists.
File Locking in NFS
Operation  Description
Lock       Creates a lock for a range of bytes
Lockt      Test whether a conflicting lock has been granted
Locku      Remove a lock from a range of bytes
Renew      Renew the lease on a specified lock
More complicated with file replication.
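The first three operations in the table can be illustrated with a toy byte-range lock table. This sketch treats every lock as exclusive and ignores leases, lock types and replication, so it is far simpler than the NFS protocol; the class and method names only echo the operation names.

```python
class LockTable:
    """Exclusive byte-range locks on one file, kept as (owner, start, end)."""

    def __init__(self):
        self.locks = []    # end is exclusive

    def lockt(self, owner, start, length):
        """Return a conflicting lock held by another owner, or None."""
        end = start + length
        for held in self.locks:
            held_owner, held_start, held_end = held
            # Two ranges overlap when each one starts before the other ends.
            if held_owner != owner and held_start < end and start < held_end:
                return held
        return None

    def lock(self, owner, start, length):
        """Grant the lock unless it conflicts; return True on success."""
        if self.lockt(owner, start, length) is not None:
            return False
        self.locks.append((owner, start, start + length))
        return True

    def locku(self, owner, start, length):
        """Remove a lock previously granted for exactly this range."""
        self.locks.remove((owner, start, start + length))
```

A second client asking for bytes 50 to 149 is refused while the first holds bytes 0 to 99, and succeeds once the first client calls locku.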
Client Caching (1)
Q: where to put the cache? a) user space b) kernel space
Client Caching (2)
Using the NFS version 4 callback mechanism to recall file delegation.
Lease
• When a client wants a file, the server gives it a lease that specifies how long the copy is valid
• The client renews the lease before it expires
• No message is sent when a lease expires
• What about a client crash? What about a server crash?
  • Lease time and reboot time
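A lease is essentially a timestamp that both sides can check without exchanging messages. The sketch below passes the current time in explicitly so the behaviour is easy to follow; the class name and the 30-second duration in the usage note are arbitrary choices, not values from NFS.

```python
class Lease:
    """A time-limited permission to use a cached copy."""

    def __init__(self, duration, now):
        self.duration = duration
        self.expires = now + duration

    def valid(self, now):
        # Expiry needs no message: each side just compares clocks.
        return now < self.expires

    def renew(self, now):
        """Extend the lease; refused if it has already expired."""
        if not self.valid(now):
            return False
        self.expires = now + self.duration
        return True
```

For example, a lease granted at time 0 for 30 seconds and renewed at time 20 stays valid until time 50. A crashed client simply stops renewing, and the server can reclaim the file once the lease time has passed.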
Cache Management Algorithms
Write through        Works, but heavy network traffic
Delayed write        Better performance but possibly ambiguous semantics
Write on close       Matches session semantics
Centralized control  UNIX semantics, but not robust and scales poorly
General Principles for DS
Proposed by Satyanarayanan
• Clients have cycles to burn
• Cache whenever possible
• Exploit the usage properties
• Minimize system-wide knowledge and change
• Trust the fewest possible entities
• Batch work where possible
Possible Trends
• Main memory file system
• Fiber optic network
  • Effects on cache
• Mobile users
  • Disconnection
  • Geographic location
• Multimedia application
  • VOD