Distributed Operating Systems
CS551
Colorado State University
at Lockheed-Martin
Lecture 8 -- Spring 2001
4 April 2001 CS-551, Lecture 8 2
CS551: Lecture 8
Topics
– Distributed File Systems (Chapter 8)
  Distributed Name Service
  Distributed File Service
  Distributed Directory Service
  NFS
  X.500
– Distributed Synchronization (Chapter 10)
  Global Time
  Physical Clocks
  Network Time Protocol (NTP)
  Logical Clocks
Definitions
DFSs “support the sharing of information in the form of files throughout an intranet. A well-designed file service provides access to files stored at a server with performance and reliability similar to … files stored on local disks. A distributed file system enables programs to store and access remote files exactly as they do local ones, allowing users to access files from any computer in an intranet.” (Coulouris, Dollimore, Kindberg, 2001)
Definitions, continued
“…in a DS, it is important to distinguish between the concepts of the file service and the file server. The file service is the specification of what the file system offers to its clients … the file system’s interface to the clients. A file server, in contrast, is a process that runs on some machine and helps implement the file service. A system may have one file server or several.” (Tanenbaum, 1995)
Upload/Download Model
[Figure: the client obtains the original file from the server, works on its own copy, and returns the updated file to the server. Adapted from Tanenbaum (1995)]
Remote Access Model
[Figure: the client requests access to a remote file held by the server; the file does not move. Adapted from Tanenbaum (1995)]
Terms
File system
– “an abstract view of secondary storage”
– “responsible for
  Global naming
  File access
  Overall file organization”
Distributed Name Service
– “focuses on the issues related to filenames”
Basic File Systems
File Storage
– Structured versus non-structured
File Attributes
– File name, size, owner, creation/modification dates, version, protection information
File Protection Modes
– Read, write, execute, append, truncate, delete
Goals of a DFS
Network Transparency
– Looks like a traditional file system on a mainframe
– User need not know a file’s location
High Availability
– Users should have easy access to files, wherever the users or files are located
– Tolerant of failures
Architecture
On the Network
– File servers: hold the files
– Clients: make accesses to the servers
Name Server (does name resolution)
– Maps names to directories/files
Cache Manager
– Implements file caching
– Often at both server and clients
– Coordinates to avoid inconsistent file copies
Mechanisms of a DFS
Mounting
– Binding together of different filename spaces to form a single name space
– A name space is mounted on (or bound to) a mount point (a node in the name space)
– Need to maintain mount information
  Keep it at the clients
  Keep it at the servers
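The mounting idea above can be sketched as a table lookup. This is a minimal illustration, not how any particular DFS implements it: the mount table contents, the server names, and the `resolve` helper are all hypothetical, and the longest-matching-mount-point rule is an assumption chosen for the example.

```python
# Hypothetical mount table: mount point -> (server, directory exported by that server).
MOUNTS = {
    "/": ("server_x", "/"),
    "/usr/shared": ("server_y", "/export/shared"),
    "/usr/shared/docs": ("server_z", "/export/docs"),
}


def resolve(path, mounts=MOUNTS):
    """Return (server, remote_path) using the longest mount point that covers path."""
    best = None
    for point in mounts:
        # A mount point covers the path if it is "/", equals the path,
        # or is a directory prefix of the path.
        if point == "/" or path == point or path.startswith(point + "/"):
            if best is None or len(point) > len(best):
                best = point
    if best is None:
        raise KeyError(f"no mount covers {path}")
    server, remote_root = mounts[best]
    rest = path[len(best):].lstrip("/")
    remote = remote_root.rstrip("/") + "/" + rest if rest else remote_root
    return server, remote
```

Whether this table lives at each client or at the servers is exactly the choice listed in the last bullet; the lookup itself is the same either way.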
Name Space Hierarchy
[Figure: hierarchical name space with nodes a through k, partitioned among Server X, Server Y, and Server Z. Adapted from Singhal & Shivaratri (1994)]
Mechanisms: Mounting, cont.
Keep it at the clients
– Client must mount each required file system
  e.g. Sun’s NFS
– Each client can see a different filename space
– When moving files, each client may need updating
Keep it at the servers
– Each client sees identical filename space
– If files are moved between servers, only need to update servers’ information
Mechanisms, continued
Caching
– Clients get copy of remote file information
  Local memory, local disk, server memory
– Improves performance
Hints
– Guaranteeing that all data in cache is always valid is expensive
– Some cached data can be used as a hint
  If shown valid, then time is saved
  If found invalid, can recover without serious problems
– E.g. cache location of a file
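A small sketch of the hint idea, using the cached-file-location example from the last bullet. Everything here is hypothetical: the server names, the `StaleHint` exception, and the stand-in functions for remote calls are assumptions made only to keep the example self-contained.

```python
class StaleHint(Exception):
    """Raised by a server that no longer holds the requested file."""


# Hypothetical authoritative state: which server currently stores each file.
ACTUAL_LOCATION = {"report.txt": "server_b"}
# Cached hints, never validated in advance, so they may be stale.
location_hints = {"report.txt": "server_a"}


def fetch_from(server, name):
    """Stand-in for a remote read; fails if the file is not on that server."""
    if ACTUAL_LOCATION.get(name) != server:
        raise StaleHint(name)
    return f"contents of {name} from {server}"


def lookup_slow(name):
    """Stand-in for the expensive authoritative lookup (e.g. asking a name server)."""
    return ACTUAL_LOCATION[name]


def read_file(name):
    hint = location_hints.get(name)
    if hint is not None:
        try:
            return fetch_from(hint, name)   # hint valid: the slow lookup is skipped
        except StaleHint:
            pass                            # hint invalid: recover via the slow path
    server = lookup_slow(name)
    location_hints[name] = server           # refresh the hint for next time
    return fetch_from(server, name)
```

The key property is that a wrong hint costs one wasted request, never a wrong answer.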
Mechanisms, concluded
Bulk Data Transfer
– Big cost of communication is the communication protocol
– So send multiple data blocks on each transfer
  Less communication overhead
  Less context switching
  Fewer acknowledgements
Encryption
– Enforce security
– Before communication between two entities, use an authentication server to provide a key
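The bulk data transfer argument above is easy to check with arithmetic. The cost model below is an assumption for illustration only: a fixed protocol overhead per message plus a per-block cost, in arbitrary units.

```python
import math


def transfer_cost(num_blocks, blocks_per_message, per_message_overhead, per_block_cost):
    """Total cost under a simple assumed model: fixed overhead per message plus per-block cost."""
    messages = math.ceil(num_blocks / blocks_per_message)
    return messages * per_message_overhead + num_blocks * per_block_cost


# 64 blocks, overhead 10 units per message, 1 unit per block (all values assumed).
one_at_a_time = transfer_cost(64, 1, 10, 1)   # 64 messages
bulk = transfer_cost(64, 8, 10, 1)            # 8 messages
```

With these assumed numbers, sending eight blocks per message cuts the total from 704 units to 144, because the per-message overhead is paid 8 times instead of 64.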
DFS Design Issues
Naming and Name Resolution
Caches on Disk or Main Memory
Writing Policy
Cache Consistency
Availability
Scalability
Semantics
Naming and Name Resolution
Name Resolution
– “The process of mapping a name to an object, or in the case of replication, multiple objects” (Singhal & Shivaratri, 1994)
Name Space
– “a collection of names which may or may not share an identical resolution mechanism” (Singhal & Shivaratri, 1994)
Naming Definitions
Location independent: A file can be moved without changing the filename
Location transparent: Filename does not tell where the file is located
Location Transparency
Must be provided via global naming
Dependent on a name being location independent
– E.g. a universal name
Example: social security number versus home street address
Global Naming and Name Transparency
A global name space requires
– Name resolution
– Location resolution
Name resolution maps symbolic filenames to computer file names
Location resolution involves mapping global names to a location
Difficult if both name transparency and location transparency are supported
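The two resolution steps above can be sketched as two separate table lookups. The tables, the identifier format `fid-0042`, and the function names are hypothetical; the point is only that a file can migrate by changing the location table while its symbolic name stays the same.

```python
# Name resolution: symbolic filename -> system-wide file identifier (hypothetical data).
NAME_TABLE = {"/projects/plan.txt": "fid-0042"}
# Location resolution: file identifier -> servers currently holding the file.
LOCATION_TABLE = {"fid-0042": ["server_y"]}


def resolve_name(symbolic_name):
    return NAME_TABLE[symbolic_name]


def resolve_location(file_id):
    return LOCATION_TABLE[file_id]


def locate(symbolic_name):
    """Full lookup: name resolution followed by location resolution."""
    return resolve_location(resolve_name(symbolic_name))


def migrate(file_id, new_servers):
    """Move a file: only the location table changes, so the name is location independent."""
    LOCATION_TABLE[file_id] = list(new_servers)
```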
Naming Approaches
Add host name to names of files on that host
– Provides unique names
– Loses network transparency
– Loses location transparency
– Moving file to a different host causes change of filename
  Possible changes to applications using that file
– Easy to find a file
Naming Approaches, continued
Mount remote directories onto local directories
– To do the mount, need to know host
– Once mounted, references are location transparent
– Can resolve filenames easily
– However, a difficult approach to do
  Not fault tolerant
  File migration requires lots of updates
Naming Approaches, concluded
Use a single global directory
– Does not have disadvantages of previous approaches
– Variations found in Sprite and Apollo
– Need a single computing facility or a few with lots of cooperation
  Need system-wide unique filenames
– Not good on a heterogeneous system
– Not good on a wide geographic system
Naming Issues, continued
Contexts
– Used to partition a name space
  To avoid problems with system-wide unique names
  Geographical, organizational, etc.
– A name space in which to resolve a name
– A filename has two parts
  Context
  Local filename
– Almost like another level of directory
– Used in x-Kernel logical file system
Naming Issues, concluded
Name Server
– Maps names to files and directories
– Centralized
  Easy to use
  A bottleneck
  Not fault tolerant
– Distributed
  Servers deal with different domains
  Several servers may be needed to deal with all the components in a filename
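The distributed case above, where several servers each handle part of a filename, can be sketched as follows. The server names, their tables, and the referral format are all hypothetical; real name services differ in how referrals are returned and followed.

```python
# Hypothetical name servers: each maps one name component either to a file
# identifier or to the name server responsible for the next domain.
SERVERS = {
    "root": {"eng": ("server", "eng_ns"), "sales": ("server", "sales_ns")},
    "eng_ns": {"docs": ("server", "docs_ns"), "todo.txt": ("file", "fid-7")},
    "docs_ns": {"spec.txt": ("file", "fid-9")},
    "sales_ns": {},
}


def resolve(path, start="root"):
    """Resolve path one component at a time; return (file_id, servers consulted)."""
    server = start
    visited = [server]
    components = [c for c in path.split("/") if c]
    for i, comp in enumerate(components):
        kind, target = SERVERS[server][comp]   # KeyError if the name is unknown
        if kind == "file":
            if i != len(components) - 1:
                raise NotADirectoryError(path)
            return target, visited
        server = target                        # referral to the next domain's server
        visited.append(server)
    raise IsADirectoryError(path)
```

Resolving `/eng/docs/spec.txt` consults three servers in this sketch, which is the cost the last bullet refers to; a centralized server would answer in one step but becomes the bottleneck.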
DFS Design Issues, continued
File Cache Location
– Main Memory
  Can support diskless workstations
  Faster
  Similar to design of server memory cache
  Competes with virtual memory system for space
    – Try to avoid data blocks being in both cache and virtual memory
  Can’t cache a large file
    – So needs to be able to handle blocks (block-oriented)
DFS Design Issues, continued
Cache Location, continued
– Local Disk
  Able to handle large files without affecting performance
  Doesn’t affect virtual memory system
  Permits incorporation of portable workstations into distributed system
    – As per Coda
DFS Design Issues, continued
Cache Writing Policy
– When should a modified cache block be sent to the server?
– Write-through
  Send all writes immediately to the servers
  Reliable, little lost if there is a crash
  Lose advantage of having a cache
– Delayed writing
DFS Design Issues, continued
Cache Writing Policy, continued
– Delayed writing
  Forward writes to server after a delay
    – E.g. when a block is full
    – E.g. when the file is closed
    – E.g. when timer goes off (say every 30 seconds)
  Takes advantage of cache
  Crash could lose some data
– What about short-lived files (e.g. temps)?
  Perhaps server need not know about these
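The two writing policies can be compared with a toy client cache. This is a sketch under simplifying assumptions: the `Server` and `ClientCache` classes are hypothetical, blocks are keyed by an integer, and `flush` stands in for whichever trigger (close, full block, timer) the delayed policy uses.

```python
class Server:
    def __init__(self):
        self.blocks = {}
        self.writes_received = 0

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.writes_received += 1


class ClientCache:
    def __init__(self, server, write_through):
        self.server = server
        self.write_through = write_through
        self.cache = {}
        self.dirty = set()   # blocks modified locally but not yet sent to the server

    def write(self, block_id, data):
        self.cache[block_id] = data
        if self.write_through:
            self.server.write(block_id, data)   # every write goes to the server at once
        else:
            self.dirty.add(block_id)            # delayed: remember it for a later flush

    def flush(self):
        """Delayed-write trigger, e.g. file close or a periodic timer."""
        for block_id in sorted(self.dirty):
            self.server.write(block_id, self.cache[block_id])
        self.dirty.clear()

    def crash(self):
        """Simulate a client crash; return how many modified blocks were lost."""
        lost = len(self.dirty)
        self.cache.clear()
        self.dirty.clear()
        return lost
```

Three writes to the same block cost three server messages under write-through but only one after a delayed flush; the price is that a crash before the flush loses the block. A temporary file that is deleted before any flush never reaches the server at all.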
DFS Design Issues, continued
Cache Consistency
– Server-Initiated
  Server tells client that data needs to be updated
    – I.e. server needs good records
  Client cache managers invalidate old data
– Client-Initiated
  Client cache manager makes sure client’s data is okay with server before using
    – Then why bother with cache at all?
– Both these are expensive and require cooperation between clients and servers
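Both approaches above can be shown side by side in one toy model. The classes and the version-number scheme are assumptions made for the sketch; the `notify` flag exists only so the example can switch server-initiated invalidation off and show what client-initiated validation then has to do.

```python
class FileServer:
    def __init__(self):
        self.files = {}     # name -> (version, data)
        self.sharers = {}   # name -> set of clients holding a cached copy

    def fetch(self, client, name):
        self.sharers.setdefault(name, set()).add(client)   # the "good records"
        return self.files[name]

    def current_version(self, name):
        return self.files[name][0]

    def store(self, name, data, notify=True):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)
        if notify:
            # Server-initiated: tell every recorded sharer to drop its copy.
            for client in self.sharers.pop(name, set()):
                client.invalidate(name)


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.validations = 0

    def invalidate(self, name):
        self.cache.pop(name, None)

    def read(self, name, validate=False):
        if name in self.cache and validate:
            # Client-initiated: one round trip to the server before each use.
            self.validations += 1
            if self.server.current_version(name) != self.cache[name][0]:
                del self.cache[name]
        if name not in self.cache:
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name][1]
```

The `validations` counter makes the slide's objection concrete: with client-initiated checking, every cached read still costs a message to the server.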
DFS Design Issues, continued
Cache Consistency, continued
– Alternative
  Do not allow file caching of shared, writeable files
    – As a concurrent-write sharing file may be open at multiple clients with at least one client writing
  Server needs to keep track of clients sharing files
  Can be avoided by locking files
DFS Design Issues, continued
Cache Consistency, concluded
– Issue: Sequential-write sharing
  Occurs when a client opens a file that has been modified recently and closed by another client
  Problem 1
    – When client opens a file, it may have outdated blocks in its cache
    – Solution: use timestamps on files and cached blocks
  Problem 2
    – When client opens a file, current data blocks may still be waiting to be flushed in another client’s cache
    – Solution: Require all clients to flush modified file blocks when a new client opens file for writing
DFS Design Issues, continued
Availability
– Files can be unavailable due to server failures
– Availability achieved through replication
  Copies at different servers
  Problems
    – Overhead (file space)
    – Consistency
      • Need to maintain
      • Need to detect and correct inconsistencies
Availability, continued
Unit of replication
– A file is the most common unit
  Cedar, Roe, Sprite
  Overall replica management is harder
    – Directory information about file may need to be stored (e.g. protection info)
    – Replicas of files belonging to a common directory may not have common file servers, requiring extra name resolutions
Availability, continued
Unit of replication, continued
– A group of files or Volume
  Used by Coda
  Easier to associate information with the group
  A waste if most of the files are not really shared
– Compromise
  Used in Locus
  A user’s files are a file group (primary pack)
  A replica may just contain a subset of the pack
Availability, concluded
Replication Management
– Keeps mutual consistency among the copies
– Suggest a weighted voting scheme
  Reads/writes can happen only by votes from current copies
  Timestamps are kept on current copies
– Designate one or more processes as agents for controlling access to copies
  Locus: each file group has a synchronization site
  Harp: a primary file server controls access
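The weighted voting bullet can be sketched with read and write quorums. The slides do not give the quorum rules, so this example assumes the usual ones for weighted voting: the read and write quorums must together exceed the total votes, and a write quorum must exceed half the total. Replica names, vote weights, and the use of integer version numbers in place of timestamps are all illustrative.

```python
def valid_quorums(votes, r, w):
    """Check the assumed overlap rules: r + w > total votes and 2 * w > total votes."""
    total = sum(votes.values())
    return r + w > total and 2 * w > total


def collect(votes, available, needed):
    """Gather replicas in the given order until their votes reach `needed`; None if they cannot."""
    gathered, used = 0, []
    for replica in available:
        gathered += votes[replica]
        used.append(replica)
        if gathered >= needed:
            return used
    return None


def read(votes, versions, available, r):
    """Return the quorum member holding the newest version."""
    quorum = collect(votes, available, r)
    if quorum is None:
        raise RuntimeError("read quorum not available")
    # Because read and write quorums overlap, some member has the latest version.
    return max(quorum, key=lambda rep: versions[rep])


def write(votes, versions, available, w):
    """Install a new version on every member of a write quorum."""
    quorum = collect(votes, available, w)
    if quorum is None:
        raise RuntimeError("write quorum not available")
    new_version = max(versions[rep] for rep in quorum) + 1
    for rep in quorum:
        versions[rep] = new_version
    return new_version
```

For example, with votes `{"a": 2, "b": 1, "c": 1}`, a read quorum of 2 and a write quorum of 3 satisfy both rules, so any read is guaranteed to see at least one copy touched by the latest write.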
Figure 8.8 Employing a Mapping Table for Intermediate File Handles.
Figure 8.9 Distributed File Replication Employing Group Communication.
DFS Design Issues: Scalability
Can the design cope with the system as it grows?
Caching is used to improve client response time
But it introduces cache consistency problems
Scalability, continued
Server-initiated invalidation
– Server keeps track of sharers
  Notifies them if file is changed
  Large system => busy server
  Helps to note if file is read-only
– Form a tree
  Server deals with only delta clients directly
  Each of these clients can serve delta clients
  Etc. – forming a tree for messages to propagate
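The tree idea above can be sketched by building a fan-out-limited tree and counting how many messages each node sends. The construction used here (filling the tree level by level) is one simple assumption; the slides do not say how the tree is formed. Client names are hypothetical.

```python
from collections import deque


def build_tree(clients, delta):
    """Return {node: children} with the server as root and at most delta children per node."""
    children = {"server": []}
    for client in clients:
        children[client] = []
    order = ["server"] + list(clients)
    for i, client in enumerate(clients):
        parent = order[i // delta]   # fill the tree level by level
        children[parent].append(client)
    return children


def propagate(children, root="server"):
    """Deliver an invalidation down the tree; return messages sent by each node."""
    sent = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        sent[node] = len(children[node])   # one message per direct child
        queue.extend(children[node])
    return sent


clients = [f"c{i}" for i in range(12)]
sent = propagate(build_tree(clients, delta=3))
```

With 12 sharers and delta = 3, the server sends 3 messages instead of 12; the remaining 9 are sent by clients, so no single node handles more than delta.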
Scalability, continued
Server structure
– Decides how many clients a server can support
– Single process that blocks during the I/O
  Horrible – all clients must wait
– Separate process per client
  Context switching overhead from frequent requests from different clients
– Thread per client
  Cheaper context switching
Scalability, continued
Principle:
– Minimize cross-machine interaction
– Use caching, hints, relaxed sharing semantics
  Stringent semantics are less scalable
– Avoid central control and central resources
  Central authentication service, name server, etc.
– Desire symmetry and autonomy
  Each machine has equal role
– Decentralized system administration
Scalability, concluded
Principle, concluded:
– Clustering
  Partition system into a collection of clusters
    – Cluster = set of machines plus cluster server
  Hope most requests are satisfied by local cluster server
– Balance and locality
  With reasonable locality, clusters can be a scalable building block
DFS Design Issues: Semantics
Characterizes the effects of accesses on files
Basic (Unix semantics)
– A read operation returns the data stored by the last write operation
– Expensive
  Need a single coordinating server OR no sharing
  Or users need to use locks
Semantics, concluded
Session semantics
– Writes are visible immediately to local clients
– Changes to a file are visible to remote clients only after closing the file
– No attempt to maintain consistency
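Session semantics as described above can be sketched with a private copy taken at open and written back at close. The `Server` and `Session` classes are hypothetical and treat a whole file as a single string; a real system would also have to decide what happens when two sessions close with different contents, which this sketch resolves as "last close wins".

```python
class Server:
    def __init__(self):
        self.files = {}


class Session:
    """One client's open file: a private copy from open until close."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.local = server.files.get(name, "")   # copy taken at open

    def read(self):
        return self.local

    def write(self, data):
        self.local = data                          # visible only within this session

    def close(self):
        self.server.files[self.name] = self.local  # now visible to later opens
```

A session that was already open keeps reading its own copy even after another session closes, which is the "no attempt to maintain consistency" point.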
Distributed Directory Service
Directory Structures
– Hierarchical
– Acyclic
  E.g., Unix
– Cyclic
Directory Management
– List of active directories with files
– Storage of directory structure
Directory Tree on one machine
[Figure: directory tree containing directories A, B, C, D, and E. Adapted from Tanenbaum (1995)]
Directory Graph on two machines
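[Figure: directory graph spanning two machines, with directories A, B, C, D, and E and numeric node labels 0, 1, 1, 1, 2. Adapted from Tanenbaum (1995)]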
Distributed Directory Service
Directory Operations
– Directory service
  Create, rename, delete directories, etc.
– File service
  Create, rename, delete files, etc.
File Types
Library files (.lib, .dll)
Program files (.c, .cpp, .p, .java, .f)
Object-code files (.o, .obj)
Compressed files (.zip, .Z, .gz)
Archive files (.arc, .tar, .jar)
Graphic files (.gif, .jpeg, .ps, .dvi)
Sound files (.wav, .midi)
Index files (.idx)
Document files (.doc, .tex, .wp)