03-10-09
Some slides are taken from Professor Grimshaw, Ranveer Chandra, Krasimira Kapitnova, etc.
3rd year graduate student working with Professor Grimshaw
Interests lie in Operating Systems, Distributed Systems, and more recently Cloud Computing
Also: trumpet, sporty things, hardware junkie. I like tacos … a lot
File System refresher
Basic Issues: Naming / Transparency, Caching, Coherence, Security, Performance
Case Studies: NFS v3 - v4, Lustre, AFS 2.0
What is a file system?
Why have a file system?
Mmmm, refreshing File Systems
Must have: a Name, e.g. “/home/sosa/DFSSlides.ppt”, and Data – some structured sequence of bytes
Tend to also have: Size, Protection Information, Non-symbolic identifier, Location, Times, etc.
A container abstraction to help organize files
Generally hierarchical (tree) structure
Often a special type of file
Directories have a Name and contain files and directories (if hierarchical)
A large container for tourists
Two approaches to sharing files:
Copy-based – the application explicitly copies files between machines; examples: UUCP, FTP, gridFTP, {.*}FTP, rcp, scp, etc.
Access transparency – i.e. Distributed File Systems
Sharing is caring
Basic idea:
Find a copy – naming is based on the machine name of the source (viper.cs.virginia.edu), user id, and path
Transfer the file to the local file system, e.g. scp grimshaw@viper.cs.virginia.edu:fred.txt .
Read/write locally
Copy back if modified
Pros and Cons?
Pros: Semantics are clear; No OS/library modification
Cons: Users have to deal with the copy model; Have to copy the whole file; Inconsistent copies all over the place; Others?
Mechanism to access remote files is the same as for local files (i.e. through the file system hierarchy)
Why is this better?
… enter Distributed File Systems
A Distributed File System is a file system that may have files on more than one machine
Distributed File Systems take many forms: Network File Systems, Parallel File Systems, Access-Transparent Distributed File Systems
Why distribute?
Sharing files with other users – others can access your files; you can have access to files you wouldn't regularly have access to
Keeping files available for yourself on more than one computer – small amount of local resources; high failure rate of local resources; can eliminate version problems (same file copied around with local edits)
Basic issues: Naming, Performance, Caching, Consistency Semantics, Fault Tolerance, Scalability
What does a DFS look like to the user?
Mount-like protocol, e.g. /../mntPointToBobsSharedFolder/file.txt
Unified namespace – everything looks like it is in the same namespace
Pros and Cons?
Location transparency – the name does not hint at physical location; mount points are not transparent
Location independence – the file name does not need to be changed when the file's physical storage location changes
Independence without transparency?
Generally we trade off the benefits of DFS's against some performance hit
How much depends on the workload – always look at the workload to figure out what mechanisms to use
What are some ways to improve performance?
Caching: the single architectural feature that contributes most to performance in a DFS!!!
Also the single greatest cause of heartache for programmers of DFS's – maintaining consistency semantics is more difficult, and it has a large potential impact on scalability
Size of the cached units of data:
Larger sizes make more efficient use of the network – spatial locality, latency
Whole files simplify semantics but very large files can't be stored locally
Small files
Who does what: push vs pull – important for maintaining consistency
Different DFS's have different consistency semantics:
UNIX semantics
On-close semantics
Timeout semantics (at least x-seconds up to date)
Pros / Cons?
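As a concrete illustration of the timeout option, here is a minimal Python sketch (names like CacheEntry and fetch_from_server are invented for the example, not part of any real DFS client): a cached entry is served without contacting the server while it is younger than the timeout, and revalidated otherwise.

    import time

    TIMEOUT_SECS = 5.0   # illustrative "x-second" freshness window

    class CacheEntry:
        def __init__(self, data):
            self.data = data
            self.fetched_at = time.time()

        def is_fresh(self):
            # Served without contacting the server while younger than the timeout.
            return (time.time() - self.fetched_at) < TIMEOUT_SECS

    cache = {}

    def read(path, fetch_from_server):
        # Revalidate with the server only once the cached entry has timed out.
        entry = cache.get(path)
        if entry is None or not entry.is_fresh():
            entry = CacheEntry(fetch_from_server(path))
            cache[path] = entry
        return entry.data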
Can replicate for Fault Tolerance and Performance
Replication is inherently location-opaque, i.e. we need location independence in naming
Different forms of replication mechanisms, different consistency semantics – tradeoffs, tradeoffs, tradeoffs
Mount-based DFS: NFS version 3; others include SMB, CIFS, NFS version 4
Parallel DFS: Lustre; others include HDFS, Google File System, etc.
Non-parallel unified-namespace DFS's: Sprite, AFS version 2.0 (basis for many other DFS's), Coda, AFS 3.0
Most commonly used DFS ever!
Goals: Machine & OS independence; Crash recovery; Transparent access; “Reasonable” performance
Design: All machines are both clients and servers; RPC (on top of UDP in v.1, on TCP in v.2+); Open Network Computing Remote Procedure Call (ONC RPC); External Data Representation (XDR); Stateless protocol
Client sends a path name to the server with a request to mount
If the path is legal and exported, the server returns a file handle – contains FS type, disk, i-node number of the directory, security info
Subsequent accesses use the file handle
Mount can happen either at boot or via automount (directories mounted on first use) – why is that helpful?
Mount only affects the client's view
Mounting (part of) a remote file system in NFS.
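A rough Python sketch of the mount exchange described above, ignoring the real ONC RPC/XDR machinery; the MountServer and FileHandle names, and the handle fields, are purely illustrative:

    from collections import namedtuple

    # Opaque to the client: it only stores the handle and echoes it back later.
    FileHandle = namedtuple("FileHandle", ["fs_type", "disk", "inode", "security"])

    class MountServer:
        def __init__(self, exports):
            self.exports = exports   # directory paths the administrator exported

        def mount(self, path):
            # Return a file handle only if the path is legal and exported.
            if path not in self.exports:
                raise PermissionError(path + " is not exported")
            # A real server fills these fields from its local file system.
            return FileHandle(fs_type="ufs", disk=0, inode=hash(path) & 0xFFFF, security=None)

    server = MountServer(exports={"/export/home"})
    handle = server.mount("/export/home")   # later NFS requests carry this handle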
Mounting nested directories from multiple servers in NFS.
Supports directory and file access via remote procedure calls (RPCs)
All UNIX system calls supported other than open & close
Open and close are intentionally not supported
For a read, the client sends a lookup message to the server; lookup returns a file handle but does not copy info into internal system tables
Subsequently, each read contains the file handle, offset, and number of bytes
Each message is self-contained – flexible, but?
(a) Reading data from a file in NFS version 3. (b) Reading data using a compound procedure in version 4.
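To make the stateless design concrete, a toy Python sketch of the server side: every request carries the file handle, offset, and count, so no per-client state is kept and a crash loses nothing. The handle table and method names are assumptions, not the real protocol:

    class StatelessServer:
        # The server keeps no per-client state: a crash loses nothing, because
        # the next request is self-contained (handle + offset + count).
        def __init__(self, handles):
            self.handles = handles          # e.g. {42: "/export/home/fred.txt"}

        def lookup(self, name):
            # LOOKUP returns a handle; nothing is recorded about who asked.
            for handle, path in self.handles.items():
                if path.endswith("/" + name):
                    return handle
            raise FileNotFoundError(name)

        def read(self, handle, offset, count):
            # Each READ names the file handle, offset, and byte count explicitly.
            with open(self.handles[handle], "rb") as f:
                f.seek(offset)
                return f.read(count)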
Some general mandatory file attributes in NFS.
Attribute Description
TYPE The type of the file (regular, directory, symbolic link)
SIZE The length of the file in bytes
CHANGE Indicator for a client to see if and/or when the file has changed
FSID Server-unique identifier of the file's file system
Some general recommended file attributes.
Attribute Description
ACL An access control list associated with the file
FILEHANDLE The server-provided file handle of this file
FILEID A file-system unique identifier for this file
FS_LOCATIONS Locations in the network where this file system may be found
OWNER The character-string name of the file's owner
TIME_ACCESS Time when the file data were last accessed
TIME_MODIFY Time when the file data were last modified
TIME_CREATE Time when the file was created
All communication is done in the clear
The client sends the userid and group id of the request to the NFS server
Discuss
Consistency semantics are dirty: non-dirty items are checked every 5 seconds; things marked dirty are flushed within 30 seconds
Performance under load is horrible – why?
Cross-mount hell – paths to files differ on different machines
ID mismatch between domains
Goals: Improved access and good performance on the Internet; Better scalability; Strong security; Cross-platform interoperability and ease of extension
Stateful protocol (Open + Close)
Compound operations (fully utilize bandwidth)
Lease-based locks (locking built in)
“Delegation” to clients (less work for the server)
Close-open cache consistency (timeouts still used for attributes and directories)
Better security
Borrowed model from CIFS (Common Internet File System) – see MS
Open/Close: opens do lookup, create, and lock all in one (what a deal!)
Locks / delegation (explained later) are released on file close
There is always a notion of a “current file handle”, i.e. see pwd
Problem: normal filesystem semantics require too many RPC's (boo)
Solution: group many calls into one call (yay)
Semantics: operations run sequentially; the compound fails on the first failure; the status of each individual RPC in the compound is returned in the response (up to the failure, or all on success)
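A minimal sketch, in Python rather than the NFSv4 wire format, of the compound semantics just listed: operations run in order, execution stops at the first failure, and one status per attempted operation is returned:

    def run_compound(operations):
        # Execute zero-argument operations in order; stop at the first failure
        # and return one status per operation actually attempted.
        statuses = []
        for op in operations:
            try:
                statuses.append(("OK", op()))
            except Exception as exc:        # a real server maps this to an NFS error code
                statuses.append(("ERR", str(exc)))
                return False, statuses      # remaining operations are not attempted
        return True, statuses

    # One round trip instead of three separate RPCs: lookup, get attributes, read.
    ok, statuses = run_compound([
        lambda: "file-handle-for-fred",
        lambda: {"size": 1024},
        lambda: b"first bytes of the file",
    ])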
Compound Kitty
Both byte-range and file locks
Heartbeats keep locks alive (renew the lock)
If the server fails, it waits at least the agreed-upon lease time (a constant) before accepting any other lock requests
If a client fails, its locks are released by the server at the end of the lease period
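A hedged sketch of the lease rule (LEASE_SECS and the class names are made up): the server stamps each lock with an expiry one lease period ahead, the client's heartbeat pushes that expiry forward, and a silent (crashed) client loses its locks when the lease runs out:

    import time

    LEASE_SECS = 90.0   # the agreed-upon lease period (illustrative constant)

    class LockServer:
        def __init__(self):
            self.locks = {}   # (path, byte_range) -> (client_id, expiry_time)

        def acquire(self, client_id, path, byte_range):
            key = (path, byte_range)
            holder = self.locks.get(key)
            if holder and holder[0] != client_id and holder[1] > time.time():
                raise RuntimeError("lock held by another live client")
            # Grant (or re-grant) the lock for one lease period.
            self.locks[key] = (client_id, time.time() + LEASE_SECS)

        def heartbeat(self, client_id):
            # Renew every lock the client holds; a crashed client stops renewing,
            # so its locks simply expire at the end of the lease period.
            now = time.time()
            for key, (holder, _) in list(self.locks.items()):
                if holder == client_id:
                    self.locks[key] = (holder, now + LEASE_SECS)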
Delegation: the server tells the client that no one else has the file; the client exposes callbacks so the delegation can be recalled
Any open that happens after a close finishes is consistent with the information from that last close
The last close wins the competition
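The close-to-open rule can be sketched as follows (assumed names, not real NFS client code): dirty data is flushed when the file is closed and every open revalidates against the server, so an open that starts after a close completes sees that close's data:

    class CloseToOpenClient:
        def __init__(self, server):
            self.server = server      # assumed object exposing read_all()/write_all()
            self.data = None
            self.dirty = False

        def open(self):
            # Every open revalidates against the server, so it observes whatever
            # the last completed close wrote.
            self.data = self.server.read_all()
            self.dirty = False

        def write(self, data):
            self.data = data          # buffered locally between open and close
            self.dirty = True

        def close(self):
            if self.dirty:
                # Flush on close; only now do other clients' opens see this data.
                self.server.write_all(self.data)
                self.dirty = False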
Uses the GSS-API framework
All id's are of the form user@domain and group@domain
Every implementation must have Kerberos v5
Every implementation must have LIPKEY
Meow
Replication / migration mechanism added
Special error messages indicate migration
A special attribute, used for both replication and migration, gives the location of the other / new copy
May have read-only replicas
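As one way these pieces might fit together on the client (the attribute name follows the FS_LOCATIONS entry from the attribute table earlier, but the code itself is hypothetical): when the old server reports a migration, fetch the location attribute and retry there:

    class Migrated(Exception):
        # Stand-in for the special "file system has moved" error from the old server.
        pass

    def read_with_migration(path, server, locate_server):
        # locate_server(location) is assumed to return a handle to another server.
        try:
            return server.read(path)
        except Migrated:
            # The special attribute (FS_LOCATIONS) says where the file system went.
            new_location = server.get_attr(path, "FS_LOCATIONS")[0]
            return locate_server(new_location).read(path)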
People don't like to move
Requires Kerberos (the death of many good distributed file systems)
Looks just like v3 to the end-user, and v3 is good enough
Need for a file system for large clusters with the following attributes: highly scalable (> 10,000 nodes); petabytes of storage; high throughput (100 GB/sec)
Datacenters have different needs, so we need a general-purpose back-end file system
Open-source object-based cluster file system
Fully POSIX compliant
Features (i.e. what I will discuss): Object protocols; Intent-based locking; Adaptive locking policies; Aggressive caching
Policy depends on context
Mode 1: performing operations on something that mostly only they use (e.g. /home/username)
Mode 2: performing operations on a highly contended resource (e.g. /tmp)
The DLM is capable of granting locks on an entire subtree as well as on whole files
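One simplified way to picture the two modes, not Lustre's actual DLM code: grant a subtree lock when a client is effectively the only user of a tree (mode 1) and fall back to narrow per-file locks when the resource is contended (mode 2). The threshold and return values are invented:

    def choose_lock(resource, recent_conflicts, threshold=3):
        # recent_conflicts: conflicting requests (e.g. revocations) seen lately
        # for this resource; threshold is an arbitrary illustrative cutoff.
        if recent_conflicts < threshold:
            # Mode 1: effectively private data such as /home/username -> lock the
            # whole subtree so later operations need no further server round trips.
            return ("SUBTREE_LOCK", resource)
        # Mode 2: hot shared data such as /tmp -> lock only the single file,
        # leaving other clients' operations unblocked.
        return ("FILE_LOCK", resource)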
POSIX semantics
Keeps a local journal of updates for locked files, one entry per file operation
Hard-linked files get special treatment with subtree locks
Lock revoked -> updates are flushed and replayed
Uses subtree change times to validate cache entries
Additionally features collaborative caching -> referrals to other dedicated cache services
Security
Supports GSS-API; supports (but does not require) Kerberos; supports PKI mechanisms
Did not want to be tied down to one mechanism
Named after Andrew Carnegie and Andrew Mellon. Transarc Corp. and then IBM took over development of AFS; in 2000 IBM made OpenAFS available as open source
Goals: Large scale (thousands of servers and clients); User mobility; Scalability; Heterogeneity; Security; Location transparency; Availability
Features: Uniform name space; Location-independent file sharing; Client-side caching with cache consistency; Secure authentication via Kerberos; High availability through automatic switchover of replicas; Scalability to span 5000 workstations
Based on the upload/download model: clients download and cache files; the server keeps track of clients that cache the file; clients upload files at end of session
Whole-file caching is key – later amended to block operations (v3); simple and effective
Kerberos for security
AFS servers are stateful: they keep track of clients that have cached files and recall files that have been modified
Clients have a partitioned name space: local name space and shared name space
A cluster of dedicated servers (Vice) presents the shared name space
Clients run the Virtue protocol to communicate with Vice
AFS's storage is arranged in volumes, usually associated with the files of a particular client
An AFS dir entry maps Vice files/dirs to a 96-bit fid: Volume number; Vnode number (index into the i-node array of a volume); Uniquifier (allows reuse of vnode numbers)
Fids are location transparent – file movements do not invalidate fids
Location information is kept in a volume-location database
Volumes are migrated to balance available disk space and utilization; volume movement is atomic, and the operation is aborted on server crash
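The 96-bit fid can be pictured as three 32-bit fields. A small sketch (with an invented volume-location table) of why a fid survives a volume move – only the volume-to-server mapping changes:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Fid:
        volume: int       # which volume the object lives in
        vnode: int        # index into the volume's i-node array
        uniquifier: int   # lets vnode numbers be reused safely

    # Volume-location database: volume number -> server currently holding it.
    vldb = {7: "vice1.cs.virginia.edu"}

    def server_for(fid):
        # The fid never names a server, so moving volume 7 to another machine
        # only updates the VLDB entry; every existing fid remains valid.
        return vldb[fid.volume]

    f = Fid(volume=7, vnode=42, uniquifier=1)
    vldb[7] = "vice2.cs.virginia.edu"   # volume migrated; fid f is untouched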
User process opens file F
The kernel resolves that it is a Vice file and passes it to Venus
D (a directory on the path) is in the cache & has a callback – use it without any network communication
D is in the cache but has no callback – contact the appropriate server for a new copy; establish a callback
D is not in the cache – fetch it from the server; establish a callback
File F is identified – create a current cache copy
Venus returns to the kernel, which opens F and returns its handle to the process
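The decision sequence above can be condensed into a small cache-lookup sketch; names such as open_via_venus and fetch are placeholders, and the real Venus is considerably more involved:

    def open_via_venus(name, cache, server):
        # Cached copy with a valid callback promise: no network traffic at all.
        entry = cache.get(name)
        if entry is not None and entry["has_callback"]:
            return entry["data"]
        # Not cached, or cached without a callback: fetch a fresh copy and have
        # the server record a callback promise for it.
        data = server.fetch(name)
        cache[name] = {"data": data, "has_callback": True}
        return data

    def break_callback(name, cache):
        # Invoked when the server tells us another client has changed `name`.
        if name in cache:
            cache[name]["has_callback"] = False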
AFS caches entire files from servers; the client interacts with servers only during open and close
The OS on the client intercepts calls and passes them to Venus, a client process that caches files from servers
Venus contacts Vice only on open and close; reads and writes bypass Venus
This works due to callbacks: the server updates its state to record caching; the server notifies the client before allowing another client to modify the file; clients lose their callback when someone writes the file
Venus caches directories and symbolic links for path translation
The use of local copies when opening a session in Coda.
A descendant of AFS v2 (AFS v3 went another way, with large-chunk caching)
Goals: More resilient to server and network failures; Constant data availability; Portable computing
Keeps whole-file caching, callbacks, end-to-end encryption
Adds full server replication
General Update Protocol, known as the Coda Optimistic Protocol: COP1 (first phase) performs the actual semantic operation at the servers (using multicast if available); COP2 sends a data structure called an update set, which summarizes the client's knowledge. These messages are piggybacked on later COP1's
Disconnected Operation (KEY)
Hoarding – periodically reevaluates which objects merit retention in the cache (hoard walking); relies on both implicit and a lot of explicit info (profiles etc)
Emulating – i.e. maintaining a replay log
Reintegration – re-playing the replay log
Conflict resolution – provides a repair tool and a log to give to the user to manually fix the issue
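A minimal sketch of the emulation/reintegration idea, with invented names: while disconnected the client appends each mutating operation to a replay log, and on reconnection the log is replayed at the server, with rejected operations handed to the repair tool:

    class DisconnectedClient:
        def __init__(self):
            self.replay_log = []   # one entry per file operation done while disconnected

        def write(self, path, data):
            # Emulating: operate on the local cache and remember the operation.
            self.replay_log.append(("write", path, data))

        def reintegrate(self, server):
            # Replay the log on reconnection; anything the server rejects is
            # handed back as a conflict for the repair tool / manual fixing.
            conflicts = []
            for op in self.replay_log:
                try:
                    server.apply(op)            # assumed server-side entry point
                except Exception as reason:
                    conflicts.append((op, str(reason)))
            self.replay_log.clear()
            return conflicts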
The state-transition diagram of a Coda client with respect to a volume.
AFS deployments in academia and government (100's)
Security model required Kerberos – many organizations not willing to make the costly switch
AFS (but not Coda) was not integrated into the Unix FS: separate “ls”, different – though similar – API
Session semantics not appropriate for many applications
Goals: Efficient use of large main memories; Support for multiprocessor workstations; Efficient network communication; Diskless operation; Exact emulation of UNIX FS semantics
Location-transparent UNIX FS
Naming: a local prefix table maps path-name prefixes to servers (cached locations); otherwise location is embedded in remote stubs in the tree hierarchy
Caching: needs sequential consistency; if one client wants to write, caching is disabled on all open clients – assumed not to hurt much since this doesn't happen often
Security: no security between kernels; everything runs over a trusted network
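The prefix table amounts to a longest-matching-prefix lookup from path name to server, roughly as in the sketch below (the table contents are made up):

    prefix_table = {
        "/":           "server-a",
        "/users":      "server-b",
        "/users/sosa": "server-c",
    }

    def server_for_path(path):
        # Longest-prefix match (simplified: ignores component boundaries).
        best = ""
        for prefix in prefix_table:
            if path.startswith(prefix) and len(prefix) >= len(best):
                best = prefix
        return prefix_table[best]

    # e.g. server_for_path("/users/sosa/notes.txt") -> "server-c"
    #      server_for_path("/etc/passwd")           -> "server-a"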
The best way to implement something depends very highly on the goals you want to achieve
Always start with goals before deciding on consistency semantics