Information Management NTU Distributed File Systems.

Page 1: Information Management NTU Distributed File Systems.

Information Management NTU

Distributed File Systems

Page 2: Information Management NTU Distributed File Systems.

Purposes of a Distributed File System

Sharing of storage and information across a network

Convenience (and efficiency) of a conventional file system

Persistent storage that most other services (e.g., Web servers) need

Page 3: Information Management NTU Distributed File Systems.

Properties of Storage Systems

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Other properties include availability, timing guarantees, etc.

Page 4: Information Management NTU Distributed File Systems.

Files

Files are an abstraction of permanent storage.

A file is typically defined as a sequence of similar-sized data items along with a set of attributes.

A directory is a file that provides a mapping from text names to internal file identifiers.

Page 5: Information Management NTU Distributed File Systems.

File Attributes

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Page 6: Information Management NTU Distributed File Systems.

File Systems

Responsible for the (a) organization, (b) storage, (c) retrieval, (d) naming, (e) sharing, and (f) protection of files.

Provide a set of programming operations that characterize the file abstraction, particularly operations to read and write subsequences of data items beginning at any point of a file.

Page 7: Information Management NTU Distributed File Systems.

File System Modules

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

A basic distributed file system implements all of the above plus modules for client-server communication and distributed naming and location of files.

Page 8: Information Management NTU Distributed File Systems.

UNIX File Operations

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Page 9: Information Management NTU Distributed File Systems.

Distributed File System Requirements

Transparency: access, location, mobility, performance, and scaling transparency.

Concurrency (and Consistency)

Replication/Caching (and Consistency)

Hardware/operating system heterogeneity

Fault-Tolerance

Security (Access Control, Authentication)

Efficiency

Page 10: Information Management NTU Distributed File Systems.

A File Service Architecture

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Note: The modules communicate with one another by remote procedure calls.

Page 11: Information Management NTU Distributed File Systems.

File Service Components

Flat file service: implementing operations on the contents of files, which are referred to by unique file identifiers (UFIDs)

Directory service: mapping text names of files (including directories) to their UFIDs

Client module: integrating and extending the previous two services under a single application programming interface

* Why is this structure more open and configurable?
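One way to see how the three components fit together is the following minimal Python sketch. All class and method names here (`FlatFileService`, `DirectoryService`, `ClientModule`, `make_dir`, `add_name`, `lookup`) are assumptions made for illustration, not the interface from the textbook figures, and everything runs in one process instead of over RPC.

```python
import itertools


class FlatFileService:
    """Operates on file contents only; files are known by UFID, never by name."""

    def __init__(self):
        self._files = {}
        self._next = itertools.count(1)  # stand-in for a system-wide unique UFID

    def create(self):
        ufid = next(self._next)
        self._files[ufid] = bytearray()
        return ufid

    def read(self, ufid, pos, n):
        return bytes(self._files[ufid][pos:pos + n])

    def write(self, ufid, pos, data):
        buf = self._files[ufid]
        if len(buf) < pos:
            buf.extend(b"\0" * (pos - len(buf)))  # pad any gap with zeros
        buf[pos:pos + len(data)] = data


class DirectoryService:
    """Maps text names to UFIDs; it is itself a client of the flat file service."""

    def __init__(self, files):
        self._files = files
        self._tables = {}  # directory UFID -> {name: UFID}
        self.root = self.make_dir()

    def make_dir(self):
        dir_id = self._files.create()  # a directory has a UFID like any file
        self._tables[dir_id] = {}
        return dir_id

    def add_name(self, dir_id, name, ufid):
        self._tables[dir_id][name] = ufid

    def lookup(self, dir_id, name):
        return self._tables[dir_id][name]


class ClientModule:
    """Hides both services behind one path-based interface."""

    def __init__(self, files, dirs):
        self._files, self._dirs = files, dirs

    def _resolve(self, path):
        ufid = self._dirs.root
        for part in path.strip("/").split("/"):
            ufid = self._dirs.lookup(ufid, part)  # one lookup per path component
        return ufid

    def create(self, path):
        parent, _, name = path.strip("/").rpartition("/")
        dir_id = self._resolve(parent) if parent else self._dirs.root
        ufid = self._files.create()
        self._dirs.add_name(dir_id, name, ufid)
        return ufid

    def write(self, path, pos, data):
        self._files.write(self._resolve(path), pos, data)

    def read(self, path, pos, n):
        return self._files.read(self._resolve(path), pos, n)
```

Because the flat file service never sees a name, a different directory service or client module (for example, one emulating another operating system's naming rules) could be layered on the same storage without changing it.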

Page 12: Information Management NTU Distributed File Systems.

Flat File Service Operations

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Page 13: Information Management NTU Distributed File Systems.

Difference from UNIX

Immediate access to files using UFIDs (without open or close)

Read or write starts at the position indicated by a parameter

All operations except create are repeatable (idempotent)

Allows a stateless implementation
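The link between an explicit position parameter and repeatability can be illustrated with a small Python sketch. The classes and the `call_with_retry` helper are hypothetical; the helper simulates a lost reply by executing the same request twice and handing back the second result.

```python
class StatefulFile:
    """UNIX-style read: the server keeps a read-write pointer per open file."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read(self, n):
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)  # server-side state changes on every call
        return chunk


class StatelessFile:
    """Flat-file-style read: the caller supplies the position each time."""

    def __init__(self, data):
        self._data = data

    def read(self, pos, n):
        return self._data[pos:pos + n]


def call_with_retry(op, *args):
    """Simulate a lost reply: the request runs twice, the client sees the second reply."""
    op(*args)
    return op(*args)


stateful = StatefulFile(b"abcdef")
stateless = StatelessFile(b"abcdef")
wrong = call_with_retry(stateful.read, 3)      # pointer moved twice
right = call_with_retry(stateless.read, 0, 3)  # same answer on every attempt
```

In this sketch the retried stateful read skips the first three bytes, while the positional read returns the same bytes however often it is repeated, which is what lets a server forget about its clients between requests.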

Page 14: Information Management NTU Distributed File Systems.

Access Control

Conventional access rights checks (at open calls) not feasible

Two ‘stateless’ approaches:

* Capability (by manipulating the UFID)

* User identity sent with every request (adopted in NFS and AFS)

Main problem: forged requests; some authentication mechanism is needed

Page 15: Information Management NTU Distributed File Systems.

Capabilities and UFIDs

A capability is a binary value that acts as an access key; it can be encoded in the UFID.

Basic construction of a UFID:

file group id + file number + random number

Additional field: permissions

Additional field: encryption of the permission field
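The UFID layout above can be sketched in Python as follows. This is an illustration under stated assumptions: the slide speaks of encrypting the permission field, whereas the sketch substitutes a keyed digest (HMAC-SHA256) that serves the same purpose of making the permissions tamper-evident. The names `SERVER_KEY`, `issue`, `restrict`, and `allows` are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, replace

SERVER_KEY = secrets.token_bytes(32)  # hypothetical secret held only by the file server


@dataclass(frozen=True)
class UFID:
    group_id: int
    file_number: int
    random_part: bytes  # makes a valid UFID hard to guess
    permissions: str    # e.g. "rw" or "r"
    seal: bytes         # keyed digest binding the permissions to this file


def _seal(group_id, file_number, random_part, permissions):
    message = f"{group_id}:{file_number}:{permissions}:".encode() + random_part
    return hmac.new(SERVER_KEY, message, hashlib.sha256).digest()


def _genuine(ufid):
    expected = _seal(ufid.group_id, ufid.file_number, ufid.random_part, ufid.permissions)
    return hmac.compare_digest(expected, ufid.seal)


def issue(group_id, file_number, permissions):
    """Server-side: mint a capability for a newly created file."""
    random_part = secrets.token_bytes(8)
    return UFID(group_id, file_number, random_part, permissions,
                _seal(group_id, file_number, random_part, permissions))


def restrict(ufid, permissions):
    """Server-side: derive a capability carrying a subset of the rights."""
    if not _genuine(ufid) or not set(permissions) <= set(ufid.permissions):
        raise PermissionError("cannot derive this capability")
    return UFID(ufid.group_id, ufid.file_number, ufid.random_part, permissions,
                _seal(ufid.group_id, ufid.file_number, ufid.random_part, permissions))


def allows(ufid, right):
    """Checked on every request, so no per-client state is needed."""
    return _genuine(ufid) and right in ufid.permissions
```

A client that edits the `permissions` field of a read-only UFID (for instance with `replace(reader, permissions="rw")`) ends up with a seal that no longer matches, so the server rejects it.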

Page 16: Information Management NTU Distributed File Systems.

Directory Service Operations

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Note: Each directory is stored as an ordinary file with a UFID.

Page 17: Information Management NTU Distributed File Systems.

The Network File System (NFS)

Introduced by Sun Microsystems in 1985, now an Internet standard

Runs on top of RPC (RFC 1831)

Implemented on most operating systems

Version described here: UNIX implementation of NFS Version 3 (RFC 1813, June 1995)

Most recent version: NFS Version 4 (RFC 3010, December 2000)

Page 18: Information Management NTU Distributed File Systems.

NFS Architecture

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Note: Each computer can act as both a client and a server.

Page 19: Information Management NTU Distributed File Systems.

The Virtual File System Module

Access transparency

File handles (file identifiers):

‘filesystem identifier’ + ‘i-node number’ + ‘i-node generation number’

One VFS structure for each mounted filesystem relates a remote filesystem (identified by its file handle obtained at mount time) to a local directory on which it is mounted

One v-node per open file indicates whether a file is local or remote, etc.
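The role of the generation number in a file handle can be shown with a short Python sketch. It assumes, as is usual for UNIX file systems, that i-node numbers are reused after a file is deleted and that the generation number is incremented on each reuse. `FileHandle`, `Server`, `allocate`, and `is_stale` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    filesystem_id: int
    inode_number: int
    generation: int


class Server:
    def __init__(self, filesystem_id):
        self.filesystem_id = filesystem_id
        self._generation = {}  # i-node number -> current generation number

    def allocate(self, inode_number):
        """Called when an i-node is (re)used for a new file."""
        generation = self._generation.get(inode_number, 0) + 1
        self._generation[inode_number] = generation
        return FileHandle(self.filesystem_id, inode_number, generation)

    def is_stale(self, handle):
        """A handle is stale if it names another filesystem or an older file."""
        return (handle.filesystem_id != self.filesystem_id
                or self._generation.get(handle.inode_number) != handle.generation)
```

A client still holding the handle of a deleted file is thereby prevented from silently reading or writing whatever new file now occupies the same i-node.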

Page 20: Information Management NTU Distributed File Systems.

The NFS Client Module in UNIX

Integrated with the kernel

Emulates the UNIX file system primitives

A single client module serves all user-level processes

The encryption key for authentication is stored in the kernel

Caches file blocks

Page 21: Information Management NTU Distributed File Systems.

NFS Server Operations

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Page 22: Information Management NTU Distributed File Systems.

NFS Server Operations (cont’d)

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Page 23: Information Management NTU Distributed File Systems.

Remote File Accesses

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Page 24: Information Management NTU Distributed File Systems.

File System Information in UNIX

saturn:~ 35 % df -k
Filesystem                      kbytes   capacity  Mounted on
/dev/dsk/c0t3d0s0               143903   91%       /
/dev/dsk/c0t3d0s6               267943   99%       /usr
/dev/dsk/c0t3d0s3               15383    3%        /tmp
galaxy:/usr/local.real          4030440  53%       /usr/local
lucky:/var/mail.real            564648   86%       /var/mail
cosmos:/home.real/student/xxx   3941760  60%       /home/xxx
galaxy:/home.real/faculty/yyy   2964512  51%       /home/yyy

* Note: The output of ‘df -k’ has been edited.

Page 25: Information Management NTU Distributed File Systems.

Caching

Server caching

* read-ahead

* write-through

* delayed-write with the commit operation

Client caching

* cache validation (freshness interval and validation timestamp, modification timestamp and getattr, …)

* bio-daemon (for read-ahead and delayed-write caching at the client side)
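Client cache validation is commonly stated as follows: a cached entry is considered valid at time T if T - Tc < t, or else if the modification time recorded at the client equals the one at the server, where Tc is the time the entry was last validated and t is the freshness interval. The Python sketch below illustrates that check; the names `CacheEntry`, `is_valid`, and `getattr_tm` (a stand-in for a getattr call that returns the server's modification time) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    data: bytes
    tc: float         # time the entry was last validated
    tm_client: float  # modification time recorded when the data was fetched


def is_valid(entry, now, freshness_interval, getattr_tm):
    # Cheap test first: within the freshness interval no server contact is needed.
    if now - entry.tc < freshness_interval:
        return True
    # Otherwise ask the server (one getattr round trip) and compare timestamps.
    if getattr_tm() == entry.tm_client:
        entry.tc = now  # revalidated: start a new freshness interval
        return True
    return False        # the caller must refetch the block
```

A short freshness interval keeps clients closer to the server's state but multiplies getattr traffic, which is the efficiency problem noted for NFS.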

Page 26: Information Management NTU Distributed File Systems.

Achievements of NFS

Access and location transparency

Mobility transparency (partially)

Read-only file replication: the automounter

Fault-tolerance: stateless servers, the automounter

Efficiency: caching of disk blocks (main problem: frequent use of getattr)

Nonachievements: scalability, concurrency and consistency, security (Kerberos), ...

Page 27: Information Management NTU Distributed File Systems.

The Andrew File System (AFS)

Developed at CMU

Current versions: AFS-2, AFS-3

Compatible with NFS

Main achievement over (older) NFS: better scalability by minimizing client-server communication

Key characteristics: whole-file serving and caching (partial file caching allowed in AFS-3)

Page 28: Information Management NTU Distributed File Systems.

Observations on UNIX File Usage

Files are mostly small

Read operations are more common

Sequential accesses are more common

Most files are written by one user

Files are referenced in bursts

Page 29: Information Management NTU Distributed File Systems.

AFS Architecture

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Page 30: Information Management NTU Distributed File Systems.

AFS File Name Space

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Page 31: Information Management NTU Distributed File Systems.

System Call Interception in AFS

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Page 32: Information Management NTU Distributed File Systems.

AFS System Calls Implementation

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Page 33: Information Management NTU Distributed File Systems.

Cache Consistency

A callback promise is provided when Vice supplies a copy of a file to a Venus process

The callback promise stored with the cached copy is in either valid or cancelled state

When Venus handles an open, it checks the cache.
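The callback mechanism can be sketched in Python as below. This is a single-process illustration, not the Vice interface: the method names (`fetch`, `store`, `break_callback`) and the `fetches` counter are assumptions, and whole files are passed around as byte strings.

```python
class Vice:
    """Server: holds master copies and remembers who was given a callback promise."""

    def __init__(self, files):
        self._files = dict(files)
        self._promises = {}  # file name -> set of Venus processes holding a promise

    def fetch(self, venus, name):
        self._promises.setdefault(name, set()).add(venus)
        return self._files[name]

    def store(self, venus, name, data):
        self._files[name] = data
        # Notify every other holder that its cached copy is now out of date.
        for other in self._promises.get(name, set()) - {venus}:
            other.break_callback(name)
        self._promises[name] = {venus}


class Venus:
    """Client cache manager: serves opens from the cache while the promise is valid."""

    def __init__(self, vice):
        self._vice = vice
        self._cache = {}  # file name -> [data, promise state]
        self.fetches = 0  # counts round trips to Vice, for illustration

    def open(self, name):
        entry = self._cache.get(name)
        if entry is None or entry[1] == "cancelled":
            self.fetches += 1
            entry = [self._vice.fetch(self, name), "valid"]
            self._cache[name] = entry
        return entry[0]

    def close(self, name, data):
        """Called when a locally modified file is closed: ship it back to Vice."""
        self._cache[name] = [data, "valid"]
        self._vice.store(self, name, data)

    def break_callback(self, name):
        if name in self._cache:
            self._cache[name][1] = "cancelled"
```

With two Venus processes caching the same file, repeated opens cost no server traffic until one of them closes a modified copy; the other then finds its promise cancelled on its next open and fetches the new version.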

Page 34: Information Management NTU Distributed File Systems.

The Vice Service Interface

Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Page 35: Information Management NTU Distributed File Systems.

Enhancements to NFS and AFS

Spritely NFS

* add open and close, use callbacks

NQNFS (Not Quite NFS)

* use callbacks and leases

WebNFS

* allow browsers and other applications to interact with an NFS server directly

NFS Version 4 (RFC 3010, December 2000)

* incorporating all of the above and more

DCE/DFS (based on AFS)

* use callbacks and write tokens (with a lifetime)

Page 36: Information Management NTU Distributed File Systems.

New Features of NFS Version 4

Adoption of the RPCSEC_GSS (RFC 2203) security protocol

Multiple operations in one request

Better migration and replication abilities

* A client may query the location(s) of a file system.

Introduction of open and close operations

Lease-based file locking

Callback-based delegation of files
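The idea behind lease-based locking can be illustrated with a generic Python sketch: a lock is only honoured for a bounded period unless its holder renews it, so a crashed client cannot block others forever. This is not the NFS Version 4 protocol; `LeaseLockTable` and its methods are hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```python
class LeaseLockTable:
    def __init__(self, lease_seconds):
        self._lease = lease_seconds
        self._locks = {}  # path -> (owner, expiry time)

    def acquire(self, path, owner, now):
        holder = self._locks.get(path)
        if holder and holder[0] != owner and holder[1] > now:
            return False  # someone else holds an unexpired lease
        self._locks[path] = (owner, now + self._lease)
        return True

    def renew(self, path, owner, now):
        holder = self._locks.get(path)
        if holder and holder[0] == owner and holder[1] > now:
            self._locks[path] = (owner, now + self._lease)
            return True
        return False      # lease already expired or lock held by another owner
```

The server needs no message from a failed client to recover the lock; it simply waits for the lease to run out.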

Page 37: Information Management NTU Distributed File Systems.

New Design Approaches

Background

* high-performance storage technology (e.g., RAID)

* log-structured file systems (e.g., Sprite, BSD LFS)

* high-performance switched networks (e.g., ATM, high-speed Ethernet)

Goals: high scalability and fault-tolerance

Main ideas: distribute file data among many nodes, separate responsibilities, …

Constraints: high level of trust

Page 38: Information Management NTU Distributed File Systems.

More Recent File System Designs

xFS

* Serverless: all data, metadata, and control can be located anywhere in the system; any machine can take over the responsibilities of a failed one

Frangipani

* Two-layer structure: the Petal distributed virtual disk system and the Frangipani server module

Both designs utilize RAID-style striping, log-structured file storage, etc.
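RAID-style striping with parity can be sketched in a few lines of Python. The example splits one log segment into equal fragments, computes an XOR parity fragment, and rebuilds a lost fragment from the survivors. The function names and the zero-byte padding are assumptions for illustration; the actual xFS and Petal layouts are more elaborate.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe(segment, n_data):
    """Split a segment into n_data fragments (one per storage node) plus parity."""
    size = -(-len(segment) // n_data)  # ceiling division
    padded = segment.ljust(size * n_data, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(n_data)]
    return fragments, xor_blocks(fragments)


def recover(fragments, parity, lost):
    """Rebuild fragment number `lost` from the surviving fragments and the parity."""
    survivors = [f for i, f in enumerate(fragments) if i != lost]
    return xor_blocks(survivors + [parity])
```

Writing whole segments at once, as a log-structured design does, means the parity can be computed from data already in memory, without first reading old data back from the storage nodes.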

Page 39: Information Management NTU Distributed File Systems.

Log-based Striping in xFS

Source: T.E. Anderson et al., Serverless Network File Systems, ACM TOCS 1996

Page 40: Information Management NTU Distributed File Systems.

An xFS Configuration

Source: T.E. Anderson et al., Serverless Network File Systems, ACM TOCS 1996

Page 41: Information Management NTU Distributed File Systems.

A Frangipani Configuration

Source: C.A. Thekkath et al., Frangipani: A Scalable Distributed File System, ACM SOSP 1997