Dr Markus Hagenbuchner [email protected] CSCI319markus/SIM/CSCI319/11_FileSystems.pdf · NFS...

CSCI319 Chapter 11 Page: 1 Dr Markus Hagenbuchner [email protected] CSCI319 Distributed Systems

CSCI319 Chapter 11 Page: 1

Dr Markus Hagenbuchner

[email protected]

CSCI319

Distributed Systems

CSCI319 Chapter 11 Page: 2

DISTRIBUTED FILE SYSTEMS

Lecture notes based on the textbook by Tanenbaum

Study objectives:

1. Understand the role of distributed file systems.

2. Understand the requirements and concepts of distributed file system design.

3. Understand how the eight design principles are applied in the realization of distributed file systems.

4. Obtain a better understanding of the workings of NFS.

CSCI319 Chapter 11 Page: 3 of 54

Content

• File system models

• Typical client-server architectures

• Communication

• Naming and Mounting

• Synchronization

• File sharing

• File locking

• Caching

• Fault tolerance

• Security

This chapter therefore looks ahead to material that is covered later in this subject. The aim is to obtain an overview of how the various design principles are applied.

CSCI319 Chapter 11 Page: 4 of 49

Distributed File Systems

• Sharing data is fundamental to distributed systems.

• Distributed file systems form the basis for many distributed applications.

• Distributed file systems allow multiple processes to share data over long periods of time (in a secure and reliable way).

• Examples: NFS, Coda, Plan 9, etc.

CSCI319 Chapter 11 Page: 5 of 49

Client-Server Architectures (1)

The two most common models in DFS are the remote access model and the upload/download model.

[Figure: the remote access model and the upload/download model.]
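The difference between the two models can be sketched in a few lines of Python. This is a minimal illustration only: the class `FileServer`, its methods, and the file name `notes.txt` are hypothetical and do not correspond to any real DFS API.

```python
class FileServer:
    """Toy in-memory file server that counts the requests it receives."""

    def __init__(self):
        self.files = {}      # file name -> bytearray
        self.requests = 0    # number of client requests served

    # Remote access model: the server executes each operation itself.
    def read(self, name, offset, length):
        self.requests += 1
        return bytes(self.files[name][offset:offset + length])

    def write(self, name, offset, data):
        self.requests += 1
        buf = self.files.setdefault(name, bytearray())
        buf[offset:offset + len(data)] = data

    # Upload/download model: the server only ships whole files.
    def download(self, name):
        self.requests += 1
        return bytearray(self.files[name])

    def upload(self, name, content):
        self.requests += 1
        self.files[name] = bytearray(content)


server = FileServer()
server.upload("notes.txt", b"hello world")

# Remote access: the file stays on the server; each operation is a request.
server.write("notes.txt", 0, b"H")
first_word = server.read("notes.txt", 0, 5)

# Upload/download: fetch the file, edit the local copy, send it back.
local_copy = server.download("notes.txt")
local_copy[6:11] = b"WORLD"
server.upload("notes.txt", local_copy)

print(first_word, bytes(server.files["notes.txt"]), server.requests)
```

In the remote access part every read and write is a separate request to the server, whereas in the upload/download part the edit itself happens on the client and only the two whole-file transfers reach the server.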

CSCI319 Chapter 11

Interactive slide

We have already spoken briefly about centralized and decentralized architectures. Which one of the two terms {centralized, decentralized} is correct in the following two sentences?

In DFS,

• the remote access model as depicted on the previous slide uses a decentralized architecture.

• the upload/download model uses a centralized architecture.


CSCI319 Chapter 11 Page: 8 of 49

Client-Server Architectures (2)

NFS realizes the remote access model using a layered architecture.

The basic NFS architecture for UNIX systems can be illustrated as follows:

CSCI319 Chapter 11 Page: 9 of 49

Interactive Slide

What is the purpose of the VFS layer?

* To achieve transparency. Clients of the VFS are unaware of the physical attributes (e.g. physical location, media) of a file.

Which elements can call the VFS layer (in NFS systems)?

1. The system call layer (part of the OS)

2. The NFS client

What is a “stub” in the context of RPC?

* A communication “end-point”
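The transparency provided by the VFS layer can be illustrated with a small Python sketch. All names here (`ToyVfs`, `LocalFileSystem`, `NfsClientStub`, the mount points and file contents) are hypothetical, and the "NFS client" only counts calls where a real one would issue RPCs to a server.

```python
class LocalFileSystem:
    """Stands in for the local file system interface."""

    def __init__(self, files):
        self.files = files

    def read(self, path):
        return self.files[path]


class NfsClientStub:
    """Stands in for the NFS client; a real one would send an RPC here."""

    def __init__(self, remote_files):
        self.remote_files = remote_files
        self.rpc_calls = 0

    def read(self, path):
        self.rpc_calls += 1
        return self.remote_files[path]


class ToyVfs:
    """Routes each request to whichever file system owns the path."""

    def __init__(self):
        self.mounts = {}  # mount point -> file system object

    def mount(self, mount_point, fs):
        self.mounts[mount_point] = fs

    def read(self, path):
        # Pick the longest mount point that is a prefix of the path.
        candidates = [m for m in self.mounts
                      if path == m or path.startswith(m.rstrip("/") + "/")]
        best = max(candidates, key=len)
        relative = path[len(best):].lstrip("/")
        return self.mounts[best].read(relative)


vfs = ToyVfs()
nfs = NfsClientStub({"data.txt": "remote bytes"})
vfs.mount("/", LocalFileSystem({"etc/hosts": "127.0.0.1 localhost"}))
vfs.mount("/remote", nfs)

# The caller uses the same read() call for local and remote files.
local_result = vfs.read("/etc/hosts")
remote_result = vfs.read("/remote/data.txt")
```

The caller never learns which of the two file systems served the request, which is the transparency the answer above refers to.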

CSCI319 Chapter 11 Page: 11 of 49

File System Model (1)

• NFS is a protocol, based on RPC, for the realization of distributed file systems.

• NFS is actively being maintained and developed.

• There exist several versions of NFS. Mainstream versions are NFSv3 and NFSv4.

• NFSv4 is designed to improve performance, and breaks with traditional views on what constitutes a file.

• Let's compare the NFSv3 and NFSv4 protocols on a subset of supported primitives:

CSCI319 Chapter 11 Page: 12 of 49

File System Model (2)

An incomplete list of file system operations supported by NFS.

Operation   NFSv3   NFSv4   Description
Create      Yes     No      Create a regular file
Create      No      Yes     Create a non-regular file
Link        Yes     Yes     Create a hard link to a file
Symlink     Yes     No      Create a symbolic link to a file
Mkdir       Yes     No      Create a subdirectory
Mknod       Yes     No      Create a special file
Rename      Yes     Yes     Change the name of a file
Remove      Yes     Yes     Remove a file from the file system
Rmdir       Yes     No      Remove an empty subdirectory
Open        No      Yes     Open a file
Close       No      Yes     Close a file
Lookup      Yes     Yes     Look up a file by means of a file name

CSCI319 Chapter 11 Page: 13 of 49

Interactive slide

In general, what does NFS provide?

NFS provides high level primitives which allow the creation, modification, and removal of possibly remote files or directories.

NFSv4 takes the concept further by generalizing the concept of “file”, and simplifying the handling of remote files.

What are non-regular files in NFSv4?

Symbolic links, directories, and special files such as a mount point.

How does a “lookup” differ between NFSv3 and NFSv4?

NFSv4 can resolve beyond a mount point. NFSv3 can not.

CSCI319 Chapter 11 Page: 15 of 49

More on files in Distributed File Systems (1)

Data is generally stored in blocks. This allows us to think about whether to distribute the files or the data blocks.

The difference between (a) distributing whole files across several servers and (b) striping files for parallel access.
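The two placement strategies can be sketched as follows. This is an illustration under simple assumptions: files are measured in blocks, servers are numbered from 0, and both function names are hypothetical.

```python
def place_whole_files(file_sizes, num_servers):
    """(a) Whole-file distribution: all blocks of a file live on one server."""
    placement = {}
    for i, name in enumerate(sorted(file_sizes)):
        server = i % num_servers
        placement[name] = [server] * file_sizes[name]
    return placement


def place_striped(file_sizes, num_servers):
    """(b) Striping: block b of each file goes to server b mod num_servers."""
    return {name: [b % num_servers for b in range(n)]
            for name, n in file_sizes.items()}


sizes = {"a": 4, "b": 2}           # file name -> number of blocks
whole = place_whole_files(sizes, 3)
striped = place_striped(sizes, 3)
print(whole)    # each list holds the server number of every block
print(striped)
```

With striping, the four blocks of file `a` are spread over all three servers and can be fetched in parallel; with whole-file distribution they all come from a single server.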

CSCI319 Chapter 11 Page: 17 of 54

Interactive slide

Name advantages of file striping:

• Reduces risk of catastrophic data loss (a failed disk holds only part of each file).

• Allows parallelization of data access (scalable).

Name disadvantages of file striping:

• Increases risk of data loss (the failure of any one disk damages every striped file).

• More difficult to manage (e.g. the number of blocks may not be a multiple of the number of disks).

In practice, these disadvantages can be addressed through the introduction of redundancy (e.g. parity), and through byte-level striping. This will be a topic in one of the laboratory classes.
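How parity recovers a lost stripe can be shown with byte-wise XOR. This is a minimal sketch that assumes equally sized blocks and a single failed disk; the helper name `xor_blocks` and the sample data are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


data = [b"ABCD", b"EFGH", b"IJKL"]   # one block per data disk
parity = xor_blocks(data)            # stored on an extra parity disk

# Disk 1 fails: rebuild its block from the surviving blocks plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

XOR-ing the surviving blocks with the parity block cancels everything except the missing block, so one extra disk is enough to tolerate one disk failure.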

CSCI319 Chapter 11 Page: 18 of 54

Files in Distributed File Systems (2)

File striping scales well, but extremely large file systems need a different approach.

Example: Web search engines. All general-purpose search engines for the Web require a local copy of Web content. As of 2009, it is estimated that the WWW consists of over 60 billion Web pages with a combined size of 1.12 petabytes. The Web continues to grow at an exponential rate. Therefore, a search engine which covers a sizeable portion of the WWW requires a file system that scales extremely well.

Example: Google's file system. Google introduced a cluster-based distributed file system to achieve scalability.

CSCI319 Chapter 11 Page: 19 of 49

Cluster-Based Distributed File Systems (1)

The Google File System (GFS) stores data in 64 MB chunks distributed across a number of chunk servers. A Google cluster of servers is organized as follows.

CSCI319 Chapter 11 Page: 21 of 49

Interactive slide

Explain why GFS scales.

• The master is mostly passive.

• Data is distributed and balanced over the chunk servers.

• The master uses a hash table (the chunk table).

• The chunk table fits into main memory.

What are the potential bottlenecks in GFS, and how can they be addressed?

• Network: Introduce dedicated lines between chunk servers and GFS clients.

• Disk speed on the chunk servers: Use striping.
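The role of the chunk table can be sketched as a plain in-memory dictionary. This is an illustration only: the class `ToyMaster`, the function `locate_byte`, the file name, the chunk handle, and the server names are all hypothetical and not taken from GFS itself. Only the 64 MB chunk size comes from the slides.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks


class ToyMaster:
    """Keeps the chunk table in memory and answers location queries."""

    def __init__(self):
        # (file name, chunk index) -> (chunk handle, list of chunk servers)
        self.chunk_table = {}

    def add_chunk(self, name, index, handle, servers):
        self.chunk_table[(name, index)] = (handle, list(servers))

    def locate(self, name, index):
        return self.chunk_table[(name, index)]


def locate_byte(master, name, byte_offset):
    """Client side: turn a byte offset into a chunk index, then ask the master."""
    index, offset_in_chunk = divmod(byte_offset, CHUNK_SIZE)
    handle, servers = master.locate(name, index)
    return handle, servers, offset_in_chunk


master = ToyMaster()
master.add_chunk("crawl.dat", 3, "chunk-0x2f", ["cs1", "cs4", "cs7"])

handle, servers, offset = locate_byte(master, "crawl.dat", 200 * 1024 * 1024)
print(handle, servers, offset)
```

The master only performs a dictionary lookup; the data transfer itself then takes place between the client and one of the returned chunk servers, which is why the master stays mostly passive.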

CSCI319 Chapter 11 Page: 22 of 49

Communication in DFS

• Communication in DFS is mostly based on RPC.

– RPC makes the DFS independent of the OS, network, transport protocols, etc.

• Using RPC in NFS as an example:

CSCI319 Chapter 11 Page: 23 of 49

Remote Procedure Calls in NFS

Example: NFSv4 supports compound RPCs. For example, reading a file in NFS version 3 (a), and reading it with a compound procedure in version 4 (b).

A compound RPC is faster because it needs fewer network round trips, and the network is often slower than disk access.
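The saving in round trips can be made concrete with a toy server that counts message exchanges. This is a sketch under simplifying assumptions: the class `ToyNfsServer` and its methods are hypothetical, the operations are reduced to LOOKUP, OPEN and READ, and a failing operation ends the compound request.

```python
class ToyNfsServer:
    def __init__(self, files):
        self.files = files       # path -> bytes
        self.round_trips = 0     # one per request/reply exchange

    def _execute(self, state, op, arg):
        if op == "LOOKUP":
            if arg not in self.files:
                raise FileNotFoundError(arg)
            state["handle"] = arg
            return "ok"
        if op == "OPEN":
            state["open"] = True
            return "ok"
        if op == "READ":
            return self.files[state["handle"]][:arg]
        raise ValueError(op)

    def call(self, state, op, arg=None):
        """One operation per round trip."""
        self.round_trips += 1
        return self._execute(state, op, arg)

    def compound(self, ops):
        """Several operations in one round trip; stop at the first failure."""
        self.round_trips += 1
        state, results = {}, []
        for op, arg in ops:
            try:
                results.append(self._execute(state, op, arg))
            except (FileNotFoundError, KeyError, ValueError) as exc:
                results.append(exc)
                break
        return results


# (a) Separate calls: LOOKUP, then READ.
separate = ToyNfsServer({"/a": b"hello world"})
session = {}
separate.call(session, "LOOKUP", "/a")
data_separate = separate.call(session, "READ", 5)

# (b) One compound call: LOOKUP, OPEN, READ.
combined = ToyNfsServer({"/a": b"hello world"})
results = combined.compound([("LOOKUP", "/a"), ("OPEN", None), ("READ", 5)])

# A failed LOOKUP ends the compound request before READ is attempted.
failed = combined.compound([("LOOKUP", "/missing"), ("READ", 5)])
```

Both variants return the same data, but the compound variant crosses the network once instead of once per operation.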

CSCI319 Chapter 11 Page: 24 of 49

The RPC2 Subsystem (2)

RPC2 aims at offering more flexible and reliable RPC:

1. The server can send messages back to the client to let the client know that it is still working on a request (this avoids timeouts).

2. It allows the embedding (injection) of application-side protocols in RPC. These are called “side effects”.

3. It allows parallel RPC calls.

This will be covered in more detail during the laboratory classes.

CSCI319 Chapter 11 Page: 25 of 49

The RPC2 Subsystem (1)

Support for “side effects” in Coda’s RPC2 system.

CSCI319 Chapter 11 Page: 26 of 49

The RPC2 Subsystem (3)

The efficiency of RPCs can be enhanced by allowing mutually independent tasks to occur in parallel. Example: sending invalidation messages in RPCv1 (a) versus RPCv2 (b).

Note that RPC2 calls are still blocking calls.
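The idea of sending invalidations in parallel while the caller still blocks can be sketched with a thread pool. This is an analogy in plain Python, not RPC2 itself: the function names, the client names, and the 0.05 s delay that stands in for a network round trip are all assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def invalidate(client, delay=0.05):
    """Pretend to send one invalidation and wait for the reply."""
    time.sleep(delay)                 # stands in for the network round trip
    return f"{client}: invalidated"


def invalidate_sequentially(clients):
    """(a) One invalidation at a time."""
    return [invalidate(c) for c in clients]


def invalidate_in_parallel(clients):
    """(b) All invalidations at once; the caller blocks until every reply is in."""
    with ThreadPoolExecutor(max_workers=len(clients)) as pool:
        return list(pool.map(invalidate, clients))


clients = ["client1", "client2", "client3", "client4"]
sequential_replies = invalidate_sequentially(clients)
parallel_replies = invalidate_in_parallel(clients)
```

Both functions return only after all replies have arrived, so the call is still blocking; the parallel version simply overlaps the waiting times.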

CSCI319 Chapter 11 Page: 27 of 49

Naming in DFS

• Names are (almost) always organized as hierarchical (structured) name spaces in DFS (see textbook, chapter 5).

• NFS is defined for structured name spaces.

• We will now look at how NFS handles naming.

CSCI319 Chapter 11 Page: 28 of 49

Naming in NFS (1)

Example: Mounting (part of) a remote file system in NFS.

Only sub-trees explicitly “exported” by the server can be mounted by a client.
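The export rule can be sketched as a check the server performs on each mount request. This is a simplified illustration: the classes, server name, and paths are hypothetical, and the sketch assumes that any directory inside an exported sub-tree may be mounted.

```python
class ToyNfsServer:
    def __init__(self, name, exports):
        self.name = name
        self.exports = exports   # roots of the exported sub-trees

    def mount(self, path):
        """Return a handle for path if it lies inside an exported sub-tree."""
        for root in self.exports:
            if path == root or path.startswith(root.rstrip("/") + "/"):
                return f"{self.name}:{path}"
        raise PermissionError(f"{path} is not exported by {self.name}")


class ToyClient:
    def __init__(self):
        self.mount_table = {}    # local mount point -> remote handle

    def mount(self, server, remote_path, mount_point):
        self.mount_table[mount_point] = server.mount(remote_path)


server = ToyNfsServer("server1", exports=["/users"])
client = ToyClient()

client.mount(server, "/users/steen", "/remote/vu")      # inside an export

try:
    client.mount(server, "/etc", "/remote/etc")          # not exported
    refused = False
except PermissionError:
    refused = True
```

The client's mount table only gains an entry when the server accepts the request, so a non-exported directory never becomes visible in the client's name space.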

CSCI319 Chapter 11 Page: 29 of 49

Naming in NFS (2)

Example 2: Mounting nested directories from multiple servers in NFS.

CSCI319 Chapter 11 Page: 30 of 49

Automounting (1)

(Static) mounting can be troublesome with large directory structures (e.g. home directories). NFS counters this with an automounter.

CSCI319 Chapter 11 Page: 31 of 49

Automounting (2)

To avoid going through the automounter every time a mount point (here alice) is accessed, automounting can be combined with symbolic links.
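The on-demand behaviour of an automounter can be sketched as follows. This is a simplified model that leaves out the symbolic-link step: the class `ToyAutomounter`, the user names, and the remote locations are hypothetical.

```python
class ToyAutomounter:
    """Mounts a user's home directory the first time it is accessed."""

    def __init__(self, home_map):
        self.home_map = home_map      # user -> remote location of the home directory
        self.mounted = {}             # user -> location, filled in on first access
        self.mount_operations = 0

    def resolve(self, path):
        parts = path.strip("/").split("/")
        if len(parts) < 2 or parts[0] != "home":
            raise ValueError("this automounter only manages /home")
        user, rest = parts[1], parts[2:]
        if user not in self.mounted:              # first access: mount on demand
            self.mounted[user] = self.home_map[user]
            self.mount_operations += 1
        return "/".join([self.mounted[user], *rest])


automounter = ToyAutomounter({
    "alice": "server1:/export/alice",
    "bob": "server2:/export/bob",
})

first = automounter.resolve("/home/alice/notes.txt")    # triggers the mount
second = automounter.resolve("/home/alice/todo.txt")    # reuses the mount
```

Only the home directories that are actually used get mounted, which avoids statically mounting every entry of a large directory structure.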

CSCI319 Chapter 11 Page: 32 of 49

Constructing a Global Name Space

A distributed file server may have to deal with several different name spaces. This is addressed by the Global Name Space (GNS) service, which introduces the notion of junctions. In GNS, clients maintain a virtual tree in which each node is either a directory or a junction. There are five types of junctions:

CSCI319 Chapter 11 Page: 34 of 49

Interactive slide

What is the role of the 5 junctions in GNS?

• GNS junction: Refers to another GNS which may be hosted on another system or by another process.

• Logical file-system name and logical file name: Required to contact a location service (which provides a handle or address of a file). Example: http://www.uow.edu.au/research/2012/index.html

• Physical file-system name and physical file name: Refer to a file system on another server. Example: C:\data\pub\index.html may be the name of a physically existing file.

CSCI319 Chapter 11 Page: 35 of 49

Synchronization in DFS

Issues that require attention:

• File sharing: A file may be accessed by multiple clients simultaneously.

• File locking: Deny concurrent accesses.

• Caching: Replication of files to where the processes are located.

CSCI319 Chapter 11 Page: 36 of 49

File Sharing Semantics (1)

Example: Read-follows-write semantics. On a single processor, when a read follows a write, the value returned by the read is the value just written (UNIX semantics).

NFS uses UNIX semantics; thus fast successive writes followed by a read maintain the correct order.

CSCI319 Chapter 11 Page: 37 of 49

File Sharing Semantics (2)

Example 2: Session semantics allow client-side caching. In a distributed system with caching, obsolete values may be returned, which can result in inconsistencies.

UNIX semantics only work on systems where there is:

• Only one file server

• No client-side caching
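The difference between the two semantics can be sketched with two toy client classes. This is an illustration only: the class names and the string-based "files" are hypothetical, and with session semantics the sketch assumes that the last client to close a file determines its final content.

```python
class ToyServer:
    def __init__(self):
        self.files = {}   # file name -> content


class UnixClient:
    """UNIX semantics: every operation acts on the single server copy."""

    def __init__(self, server):
        self.server = server

    def write(self, name, data):
        self.server.files[name] = self.server.files.get(name, "") + data

    def read(self, name):
        return self.server.files[name]


class SessionClient:
    """Session semantics: work on a cached copy between open and close."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def open(self, name):
        self.cache[name] = self.server.files.get(name, "")

    def write(self, name, data):
        self.cache[name] += data

    def read(self, name):
        return self.cache[name]

    def close(self, name):
        self.server.files[name] = self.cache.pop(name)


# UNIX semantics: a read sees all earlier writes, in order.
unix_server = ToyServer()
u1, u2 = UnixClient(unix_server), UnixClient(unix_server)
u1.write("f", "a")
u1.write("f", "b")
unix_read = u2.read("f")

# Session semantics: B keeps reading its own, possibly obsolete, copy.
session_server = ToyServer()
session_server.files["f"] = "a"
a, b = SessionClient(session_server), SessionClient(session_server)
a.open("f")
b.open("f")
a.write("f", "b")
stale_read = b.read("f")       # B does not see A's write
a.close("f")                   # server copy becomes "ab"
b.write("f", "c")
b.close("f")                   # B's copy overwrites A's update
final_content = session_server.files["f"]
```

With the session clients, B reads an obsolete value and A's update is lost when B closes the file, which is the kind of inconsistency the slide refers to.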

CSCI319 Chapter 11 Page: 38 of 49

File Sharing Semantics (3)

In fact, there are four common ways of dealing with shared files in a distributed system.

Immutable files cannot change their content; an update atomically replaces the file with a new one.

With transactions, all changes between begin_transaction and end_transaction are atomic.

CSCI319 Chapter 11 Page: 40 of 49

Interactive slide

What happens with using immutable files when one file is

replaced while another process is reading it?

1. Maintain a copy of the old file until all reads are complete, or

2. Refuse subsequent reads from the old file.

CSCI319 Chapter 11 Page: 41 of 49

File Locking (1)

Transaction semantics differ from file locking. File locking is supported with NFSv4. Operations in NFSv4 related to file locking are:

With locking, there are two cases to consider:

1. Accessing a resource which may or may not already be locked.

2. Requesting a lock on a resource which may already be accessed by another process.

CSCI319 Chapter 11 Page: 42 of 49

File Locking (2)

Case 1: A client requests shared access given the current denial state. The result of an open operation with share reservations in NFSv4 is as follows:

CSCI319 Chapter 11 Page: 43 of 49

File Locking (3)

Case 2: A client requests a denial state given the current file access state. The result of an open operation with share reservations in NFSv4 is as follows:
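Both cases can be captured by one compatibility check, sketched here in Python. The constants, the class `SharedFile`, and its `open` method are hypothetical names; the sketch assumes an open succeeds only if the requested access is not denied by an existing open and the requested denial does not cover an access mode already in use.

```python
NONE, READ, WRITE = 0, 1, 2
BOTH = READ | WRITE


class SharedFile:
    def __init__(self):
        self.opens = []   # (access, deny) pairs of the current opens

    def open(self, access, deny):
        """Return True and record the open if it conflicts with no existing open."""
        for held_access, held_deny in self.opens:
            # Case 1: requested access versus the current denial state.
            if access & held_deny:
                return False
            # Case 2: requested denial versus the current access state.
            if deny & held_access:
                return False
        self.opens.append((access, deny))
        return True


f = SharedFile()
r1 = f.open(access=READ, deny=WRITE)    # a reader that denies writers
r2 = f.open(access=WRITE, deny=NONE)    # fails: writing is currently denied
r3 = f.open(access=READ, deny=NONE)     # succeeds: reading is not denied
r4 = f.open(access=READ, deny=READ)     # fails: others already read the file
```

Encoding the modes as bit flags lets a single bitwise AND test each of the two cases.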

CSCI319 Chapter 11 Page: 44 of 49

Client-Side Caching (1)

A more detailed look at the effects of client-side caching in NFS.

Problem: The local cache may be inconsistent with the associated file on the FS.

NFSv4 aims to reduce the inconsistencies that arise when caching data.

CSCI319 Chapter 11 Page: 45 of 49

Client-Side Caching (2)

NFSv4 addresses this problem by using file delegation, and a callback mechanism to recall file delegation.
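Delegation and recall can be sketched with two small classes. This is a heavily simplified model: the class and method names are hypothetical, the server manages a single file, and a recalled client is assumed to write its cached changes back before giving up the delegation.

```python
class ToyServer:
    def __init__(self, content):
        self.content = content
        self.delegate = None          # client currently holding the delegation

    def open(self, client):
        """Return (content, delegation_granted)."""
        if self.delegate is not None and self.delegate is not client:
            self.delegate.recall(self)     # callback to the current holder
            self.delegate = None
            return self.content, False     # no new delegation in this sketch
        self.delegate = client
        return self.content, True


class ToyClient:
    def __init__(self):
        self.cached = None
        self.has_delegation = False

    def open(self, server):
        self.cached, self.has_delegation = server.open(self)

    def write_locally(self, data):
        if not self.has_delegation:
            raise PermissionError("local writes require a delegation")
        self.cached = data

    def recall(self, server):
        """Invoked by the server: write back changes and return the delegation."""
        server.content = self.cached
        self.has_delegation = False


server = ToyServer("v1")
a, b = ToyClient(), ToyClient()

a.open(server)                 # A receives the delegation
a.write_locally("v2")          # no server contact needed
b.open(server)                 # server recalls A's delegation first
```

Because the recall happens before the server answers B, client B sees A's latest data rather than the obsolete server copy.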

CSCI319 Chapter 11 Page: 46 of 49

Sharing Files in Coda (with respect to caching)

Example: The transactional behavior in sharing files in Coda. Note that transaction semantics utilizes the upload/download model.

From a transactional point of view there is no problem, since SA precedes SB.

CSCI319 Chapter 11 Page: 47 of 49

Client-Side Caching in Coda

One way to overcome the problems with caching is to use a “callback promise”, as in Coda. For example, the use of local copies when opening a session in Coda.
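The callback-promise idea can be sketched as follows. This is a toy model and not Coda's actual interface: the classes and methods are hypothetical, and the sketch assumes that a client may reuse its local copy for as long as the server has not broken the promise.

```python
class ToyCodaServer:
    def __init__(self):
        self.files = {}
        self.promises = {}    # file name -> set of clients holding a promise
        self.fetches = 0

    def fetch(self, client, name):
        self.fetches += 1
        self.promises.setdefault(name, set()).add(client)   # callback promise
        return self.files[name]

    def store(self, writer, name, content):
        self.files[name] = content
        for client in self.promises.pop(name, set()):
            if client is not writer:
                client.callback_break(name)                  # break the promise
        self.promises[name] = {writer}


class ToyCodaClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.valid = set()    # files with an unbroken callback promise

    def open(self, name):
        if name not in self.valid:               # no promise: fetch a fresh copy
            self.cache[name] = self.server.fetch(self, name)
            self.valid.add(name)
        return self.cache[name]

    def close(self, name, content):
        self.cache[name] = content
        self.valid.add(name)
        self.server.store(self, name, content)

    def callback_break(self, name):
        self.valid.discard(name)


server = ToyCodaServer()
server.files["f"] = "v1"
a, b = ToyCodaClient(server), ToyCodaClient(server)

a.open("f")                      # fetch 1
a.open("f")                      # promise still valid: local copy is used
b.open("f")                      # fetch 2
b.close("f", "v2")               # server breaks A's promise
reopened = a.open("f")           # fetch 3: A gets the new version
```

Opening a session is free of server traffic while the promise holds, and a broken promise forces exactly one refetch.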

CSCI319 Chapter 11 Page: 48 of 49

Fault tolerance

RAID is used to achieve fault tolerance on a centralized FS. RAID is not suitable for DFS. In DFS, fault tolerance can be achieved with, e.g., the Byzantine method. For example, the different phases in Byzantine fault tolerance:

CSCI319 Chapter 11 Page: 50 of 49

Interactive slide

Give one solution for achieving transparency in the Byzantine method.

For example:

• use a trusted coordinator (or master) process.

• implement in the client-side middleware layer.

• implement within the application layer (not recommended).

CSCI319 Chapter 11 Page: 51

Security in NFS

NFSv3 has very limited support for security. The NFSv3 security architecture is founded on SSL, which serves as a secured tunnel for NFS data. For example:

CSCI319 Chapter 11 Page: 52

Secure RPCs

This is improved significantly in NFSv4, where a security layer has become part of the FS. This is illustrated in the following:

CSCI319 Chapter 11 Page: 53

Access Control

This allows NFSv4 to distinguish various kinds of users and processes with respect to access control. For example, a selection of valid users in NFSv4 is as follows:

The first three are also known in NFSv3 but not consistently implemented.

CSCI319 Chapter 11 Page: 54

Summary on DFS

• File system models

• Typical client-server architectures

• Communication

• Naming and Mounting

• Synchronization

– File sharing

– File locking

– Caching

– Fault tolerance

– Security considerations