03-10-09
Some slides are taken from Professor Grimshaw, Ranveer Chandra, Krasimira Kapitnova, etc.
3rd year graduate student working with Professor Grimshaw
Interests lie in Operating Systems, Distributed Systems, and more recently Cloud Computing
Also: trumpet, sporty things, hardware junkie. I like tacos … a lot
File System refresher
Basic Issues: Naming / Transparency, Caching, Coherence, Security, Performance
Case Studies: NFS v3 - v4, Lustre, AFS 2.0
What is a file system?
Why have a file system?
Mmmm, refreshing File Systems
Must have: a Name, e.g. “/home/sosa/DFSSlides.ppt”, and Data – some structured sequence of bytes
Tend to also have: Size, Protection Information, Non-symbolic identifier, Location, Times, etc.
A container abstraction to help organize files
Generally hierarchical (tree) structure
Often a special type of file
Directories have a Name and contain files and directories (if hierarchical)
A large container for tourists
Two approaches to sharing files:
Copy-based – the application explicitly copies files between machines; examples: UUCP, FTP, gridFTP, {.*}FTP, rcp, scp, etc.
Access transparency – i.e. Distributed File Systems
Sharing is caring
Basic idea:
Find a copy – naming is based on the machine name of the source (viper.cs.virginia.edu), user id, and path
Transfer the file to the local file system, e.g. scp grimshaw@viper.cs.virginia.edu:fred.txt .
Read/write locally
Copy back if modified
Pros and Cons?
Pros: Semantics are clear; No OS/library modification
Cons: Users have to deal with the copy model; Have to copy the whole file; Inconsistent copies all over the place; Others?
Mechanism to access remote files is the same as for local files (i.e. through the file system hierarchy)
Why is this better?
… enter Distributed File Systems
A Distributed File System is a file system that may have files on more than one machine
Distributed File Systems take many forms: Network File Systems, Parallel File Systems, Access-Transparent Distributed File Systems
Why distribute?
Sharing files with other users – others can access your files; you can have access to files you wouldn't regularly have access to
Keeping files available for yourself on more than one computer – small amount of local resources; high failure rate of local resources; can eliminate version problems (same file copied around with local edits)
Basic issues: Naming, Performance, Caching, Consistency Semantics, Fault Tolerance, Scalability
What does a DFS look like to the user?
Mount-like protocol, e.g. /../mntPointToBobsSharedFolder/file.txt
Unified namespace – everything looks like it is in the same namespace
Pros and Cons?
Location transparency – the name does not hint at physical location; mount points are not transparent
Location independence – the file name does not need to be changed when the file's physical storage location changes
Independence without transparency?
Generally we trade off the benefits of DFS's against some performance hit
How much depends on the workload – always look at the workload to figure out what mechanisms to use
What are some ways to improve performance?
Caching: the single architectural feature that contributes most to performance in a DFS!!!
Also the single greatest cause of heartache for programmers of DFS's – maintaining consistency semantics is more difficult, and it has a large potential impact on scalability
Size of the cached units of data:
Larger sizes make more efficient use of the network – spatial locality, latency
Whole files simplify semantics but very large files can't be stored locally
Small files
Who does what: push vs pull – important for maintaining consistency
Different DFS's have different consistency semantics:
UNIX semantics
On-close semantics
Timeout semantics (at least x-seconds up to date)
Pros / Cons?
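As a concrete illustration of the timeout option, here is a minimal Python sketch (names like CacheEntry and fetch_from_server are invented for the example, not part of any real DFS client): a cached entry is served without contacting the server while it is younger than the timeout, and revalidated otherwise.

    import time

    TIMEOUT_SECS = 5.0   # illustrative "x-second" freshness window

    class CacheEntry:
        def __init__(self, data):
            self.data = data
            self.fetched_at = time.time()

        def is_fresh(self):
            # Served without contacting the server while younger than the timeout.
            return (time.time() - self.fetched_at) < TIMEOUT_SECS

    cache = {}

    def read(path, fetch_from_server):
        # Revalidate with the server only once the cached entry has timed out.
        entry = cache.get(path)
        if entry is None or not entry.is_fresh():
            entry = CacheEntry(fetch_from_server(path))
            cache[path] = entry
        return entry.data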
Can replicate for Fault Tolerance and Performance
Replication is inherently location-opaque, i.e. we need location independence in naming
Different forms of replication mechanisms, different consistency semantics – tradeoffs, tradeoffs, tradeoffs
Mount-based DFS: NFS version 3; others include SMB, CIFS, NFS version 4
Parallel DFS: Lustre; others include HDFS, Google File System, etc.
Non-parallel unified-namespace DFS's: Sprite, AFS version 2.0 (basis for many other DFS's), Coda, AFS 3.0
Most commonly used DFS ever!
Goals: Machine & OS independence; Crash recovery; Transparent access; “Reasonable” performance
Design: All machines are both clients and servers; RPC (on top of UDP in v.1, on TCP in v.2+); Open Network Computing Remote Procedure Call (ONC RPC); External Data Representation (XDR); Stateless protocol
Client sends a path name to the server with a request to mount
If the path is legal and exported, the server returns a file handle – contains FS type, disk, i-node number of the directory, security info
Subsequent accesses use the file handle
Mount can happen either at boot or via automount (directories mounted on first use) – why is that helpful?
Mount only affects the client's view
Mounting (part of) a remote file system in NFS.
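A rough Python sketch of the mount exchange described above, ignoring the real ONC RPC/XDR machinery; the MountServer and FileHandle names, and the handle fields, are purely illustrative:

    from collections import namedtuple

    # Opaque to the client: it only stores the handle and echoes it back later.
    FileHandle = namedtuple("FileHandle", ["fs_type", "disk", "inode", "security"])

    class MountServer:
        def __init__(self, exports):
            self.exports = exports   # directory paths the administrator exported

        def mount(self, path):
            # Return a file handle only if the path is legal and exported.
            if path not in self.exports:
                raise PermissionError(path + " is not exported")
            # A real server fills these fields from its local file system.
            return FileHandle(fs_type="ufs", disk=0, inode=hash(path) & 0xFFFF, security=None)

    server = MountServer(exports={"/export/home"})
    handle = server.mount("/export/home")   # later NFS requests carry this handle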
Mounting nested directories from multiple servers in NFS.
Supports directory and file access via remote procedure calls (RPCs)
All UNIX system calls supported other than open & close
Open and close are intentionally not supported
For a read, the client sends a lookup message to the server; lookup returns a file handle but does not copy info into internal system tables
Subsequently, each read contains the file handle, offset, and number of bytes
Each message is self-contained – flexible, but?
(a) Reading data from a file in NFS version 3. (b) Reading data using a compound procedure in version 4.
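To make the stateless design concrete, a toy Python sketch of the server side: every request carries the file handle, offset, and count, so no per-client state is kept and a crash loses nothing. The handle table and method names are assumptions, not the real protocol:

    class StatelessServer:
        # The server keeps no per-client state: a crash loses nothing, because
        # the next request is self-contained (handle + offset + count).
        def __init__(self, handles):
            self.handles = handles          # e.g. {42: "/export/home/fred.txt"}

        def lookup(self, name):
            # LOOKUP returns a handle; nothing is recorded about who asked.
            for handle, path in self.handles.items():
                if path.endswith("/" + name):
                    return handle
            raise FileNotFoundError(name)

        def read(self, handle, offset, count):
            # Each READ names the file handle, offset, and byte count explicitly.
            with open(self.handles[handle], "rb") as f:
                f.seek(offset)
                return f.read(count)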
Some general mandatory file attributes in NFS.
Attribute Description
TYPE The type of the file (regular, directory, symbolic link)
SIZE The length of the file in bytes
CHANGE Indicator for a client to see if and/or when the file has changed
FSID Server-unique identifier of the file's file system
Some general recommended file attributes.
Attribute Description
ACL An access control list associated with the file
FILEHANDLE The server-provided file handle of this file
FILEID A file-system unique identifier for this file
FS_LOCATIONS Locations in the network where this file system may be found
OWNER The character-string name of the file's owner
TIME_ACCESS Time when the file data were last accessed
TIME_MODIFY Time when the file data were last modified
TIME_CREATE Time when the file was created
All communication is done in the clear
The client sends the userid and group id of the request to the NFS server
Discuss
Consistency semantics are dirty: non-dirty items are checked every 5 seconds; things marked dirty are flushed within 30 seconds
Performance under load is horrible – why?
Cross-mount hell – paths to files differ on different machines
ID mismatch between domains
Goals: Improved access and good performance on the Internet; Better scalability; Strong security; Cross-platform interoperability and ease of extension
Stateful protocol (Open + Close)
Compound operations (fully utilize bandwidth)
Lease-based locks (locking built in)
“Delegation” to clients (less work for the server)
Close-open cache consistency (timeouts still used for attributes and directories)
Better security
Borrowed model from CIFS (Common Internet File System) – see MS
Open/Close: opens do lookup, create, and lock all in one (what a deal!)
Locks / delegation (explained later) are released on file close
There is always a notion of a “current file handle”, i.e. see pwd
Problem: normal filesystem semantics require too many RPC's (boo)
Solution: group many calls into one call (yay)
Semantics: operations run sequentially; the compound fails on the first failure; the status of each individual RPC in the compound is returned in the response (up to the failure, or all on success)
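A minimal sketch, in Python rather than the NFSv4 wire format, of the compound semantics just listed: operations run in order, execution stops at the first failure, and one status per attempted operation is returned:

    def run_compound(operations):
        # Execute zero-argument operations in order; stop at the first failure
        # and return one status per operation actually attempted.
        statuses = []
        for op in operations:
            try:
                statuses.append(("OK", op()))
            except Exception as exc:        # a real server maps this to an NFS error code
                statuses.append(("ERR", str(exc)))
                return False, statuses      # remaining operations are not attempted
        return True, statuses

    # One round trip instead of three separate RPCs: lookup, get attributes, read.
    ok, statuses = run_compound([
        lambda: "file-handle-for-fred",
        lambda: {"size": 1024},
        lambda: b"first bytes of the file",
    ])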
Compound Kitty
Both byte-range and file locks
Heartbeats keep locks alive (renew the lock)
If the server fails, it waits at least the agreed-upon lease time (a constant) before accepting any other lock requests
If a client fails, its locks are released by the server at the end of the lease period
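A hedged sketch of the lease rule (LEASE_SECS and the class names are made up): the server stamps each lock with an expiry one lease period ahead, the client's heartbeat pushes that expiry forward, and a silent (crashed) client loses its locks when the lease runs out:

    import time

    LEASE_SECS = 90.0   # the agreed-upon lease period (illustrative constant)

    class LockServer:
        def __init__(self):
            self.locks = {}   # (path, byte_range) -> (client_id, expiry_time)

        def acquire(self, client_id, path, byte_range):
            key = (path, byte_range)
            holder = self.locks.get(key)
            if holder and holder[0] != client_id and holder[1] > time.time():
                raise RuntimeError("lock held by another live client")
            # Grant (or re-grant) the lock for one lease period.
            self.locks[key] = (client_id, time.time() + LEASE_SECS)

        def heartbeat(self, client_id):
            # Renew every lock the client holds; a crashed client stops renewing,
            # so its locks simply expire at the end of the lease period.
            now = time.time()
            for key, (holder, _) in list(self.locks.items()):
                if holder == client_id:
                    self.locks[key] = (holder, now + LEASE_SECS)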
Delegation: the server tells the client that no one else has the file; the client exposes callbacks so the delegation can be recalled
Any open that happens after a close finishes is consistent with the information from that last close
The last close wins the competition
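The close-to-open rule can be sketched as follows (assumed names, not real NFS client code): dirty data is flushed when the file is closed and every open revalidates against the server, so an open that starts after a close completes sees that close's data:

    class CloseToOpenClient:
        def __init__(self, server):
            self.server = server      # assumed object exposing read_all()/write_all()
            self.data = None
            self.dirty = False

        def open(self):
            # Every open revalidates against the server, so it observes whatever
            # the last completed close wrote.
            self.data = self.server.read_all()
            self.dirty = False

        def write(self, data):
            self.data = data          # buffered locally between open and close
            self.dirty = True

        def close(self):
            if self.dirty:
                # Flush on close; only now do other clients' opens see this data.
                self.server.write_all(self.data)
                self.dirty = False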
Uses the GSS-API framework
All id's are of the form user@domain and group@domain
Every implementation must have Kerberos v5
Every implementation must have LIPKEY
Meow
Replication / migration mechanism added
Special error messages indicate migration
A special attribute, used for both replication and migration, gives the location of the other / new copy
May have read-only replicas
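As one way these pieces might fit together on the client (the attribute name follows the FS_LOCATIONS entry from the attribute table earlier, but the code itself is hypothetical): when the old server reports a migration, fetch the location attribute and retry there:

    class Migrated(Exception):
        # Stand-in for the special "file system has moved" error from the old server.
        pass

    def read_with_migration(path, server, locate_server):
        # locate_server(location) is assumed to return a handle to another server.
        try:
            return server.read(path)
        except Migrated:
            # The special attribute (FS_LOCATIONS) says where the file system went.
            new_location = server.get_attr(path, "FS_LOCATIONS")[0]
            return locate_server(new_location).read(path)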
People don't like to move
Requires Kerberos (the death of many good distributed file systems)
Looks just like v3 to the end-user, and v3 is good enough
Need for a file system for large clusters with the following attributes: highly scalable (> 10,000 nodes); petabytes of storage; high throughput (100 GB/sec)
Datacenters have different needs, so we need a general-purpose back-end file system
Open-source object-based cluster file system
Fully POSIX compliant
Features (i.e. what I will discuss): Object protocols; Intent-based locking; Adaptive locking policies; Aggressive caching
Policy depends on context
Mode 1: performing operations on something that mostly only they use (e.g. /home/username)
Mode 2: performing operations on a highly contended resource (e.g. /tmp)
The DLM is capable of granting locks on an entire subtree as well as on whole files
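One simplified way to picture the two modes, not Lustre's actual DLM code: grant a subtree lock when a client is effectively the only user of a tree (mode 1) and fall back to narrow per-file locks when the resource is contended (mode 2). The threshold and return values are invented:

    def choose_lock(resource, recent_conflicts, threshold=3):
        # recent_conflicts: conflicting requests (e.g. revocations) seen lately
        # for this resource; threshold is an arbitrary illustrative cutoff.
        if recent_conflicts < threshold:
            # Mode 1: effectively private data such as /home/username -> lock the
            # whole subtree so later operations need no further server round trips.
            return ("SUBTREE_LOCK", resource)
        # Mode 2: hot shared data such as /tmp -> lock only the single file,
        # leaving other clients' operations unblocked.
        return ("FILE_LOCK", resource)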
POSIX semantics
Keeps a local journal of updates for locked files, one entry per file operation
Hard-linked files get special treatment with subtree locks
Lock revoked -> updates are flushed and replayed
Uses subtree change times to validate cache entries
Additionally features collaborative caching -> referrals to other dedicated cache services
Security
Supports GSS-API; supports (but does not require) Kerberos; supports PKI mechanisms
Did not want to be tied down to one mechanism
Named after Andrew Carnegie and Andrew Mellon. Transarc Corp. and then IBM took over development of AFS; in 2000 IBM made OpenAFS available as open source
Goals: Large scale (thousands of servers and clients); User mobility; Scalability; Heterogeneity; Security; Location transparency; Availability
Features: Uniform name space; Location-independent file sharing; Client-side caching with cache consistency; Secure authentication via Kerberos; High availability through automatic switchover of replicas; Scalability to span 5000 workstations
Based on the upload/download model: clients download and cache files; the server keeps track of clients that cache the file; clients upload files at end of session
Whole-file caching is key – later amended to block operations (v3); simple and effective
Kerberos for security
AFS servers are stateful: they keep track of clients that have cached files and recall files that have been modified
Clients have a partitioned name space: local name space and shared name space
A cluster of dedicated servers (Vice) presents the shared name space
Clients run the Virtue protocol to communicate with Vice
AFS's storage is arranged in volumes, usually associated with the files of a particular client
An AFS dir entry maps Vice files/dirs to a 96-bit fid: Volume number; Vnode number (index into the i-node array of a volume); Uniquifier (allows reuse of vnode numbers)
Fids are location transparent – file movements do not invalidate fids
Location information is kept in a volume-location database
Volumes are migrated to balance available disk space and utilization; volume movement is atomic, and the operation is aborted on server crash
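The 96-bit fid can be pictured as three 32-bit fields. A small sketch (with an invented volume-location table) of why a fid survives a volume move – only the volume-to-server mapping changes:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Fid:
        volume: int       # which volume the object lives in
        vnode: int        # index into the volume's i-node array
        uniquifier: int   # lets vnode numbers be reused safely

    # Volume-location database: volume number -> server currently holding it.
    vldb = {7: "vice1.cs.virginia.edu"}

    def server_for(fid):
        # The fid never names a server, so moving volume 7 to another machine
        # only updates the VLDB entry; every existing fid remains valid.
        return vldb[fid.volume]

    f = Fid(volume=7, vnode=42, uniquifier=1)
    vldb[7] = "vice2.cs.virginia.edu"   # volume migrated; fid f is untouched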
User process opens file F
The kernel resolves that it is a Vice file and passes it to Venus
D (a directory on the path) is in the cache & has a callback – use it without any network communication
D is in the cache but has no callback – contact the appropriate server for a new copy; establish a callback
D is not in the cache – fetch it from the server; establish a callback
File F is identified – create a current cache copy
Venus returns to the kernel, which opens F and returns its handle to the process
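The decision sequence above can be condensed into a small cache-lookup sketch; names such as open_via_venus and fetch are placeholders, and the real Venus is considerably more involved:

    def open_via_venus(name, cache, server):
        # Cached copy with a valid callback promise: no network traffic at all.
        entry = cache.get(name)
        if entry is not None and entry["has_callback"]:
            return entry["data"]
        # Not cached, or cached without a callback: fetch a fresh copy and have
        # the server record a callback promise for it.
        data = server.fetch(name)
        cache[name] = {"data": data, "has_callback": True}
        return data

    def break_callback(name, cache):
        # Invoked when the server tells us another client has changed `name`.
        if name in cache:
            cache[name]["has_callback"] = False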
AFS caches entire files from servers; the client interacts with servers only during open and close
The OS on the client intercepts calls and passes them to Venus, a client process that caches files from servers
Venus contacts Vice only on open and close; reads and writes bypass Venus
This works due to callbacks: the server updates its state to record caching; the server notifies the client before allowing another client to modify the file; clients lose their callback when someone writes the file
Venus caches directories and symbolic links for path translation
The use of local copies when opening a session in Coda.
A descendant of AFS v2 (AFS v3 went another way, with large-chunk caching)
Goals: More resilient to server and network failures; Constant data availability; Portable computing
Keeps whole-file caching, callbacks, end-to-end encryption
Adds full server replication
General Update Protocol, known as the Coda Optimistic Protocol: COP1 (first phase) performs the actual semantic operation at the servers (using multicast if available); COP2 sends a data structure called an update set, which summarizes the client's knowledge. These messages are piggybacked on later COP1's
Disconnected Operation (KEY)
Hoarding – periodically reevaluates which objects merit retention in the cache (hoard walking); relies on both implicit and a lot of explicit info (profiles etc)
Emulating – i.e. maintaining a replay log
Reintegration – re-playing the replay log
Conflict resolution – provides a repair tool and a log to give to the user to manually fix the issue
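A minimal sketch of the emulation/reintegration idea, with invented names: while disconnected the client appends each mutating operation to a replay log, and on reconnection the log is replayed at the server, with rejected operations handed to the repair tool:

    class DisconnectedClient:
        def __init__(self):
            self.replay_log = []   # one entry per file operation done while disconnected

        def write(self, path, data):
            # Emulating: operate on the local cache and remember the operation.
            self.replay_log.append(("write", path, data))

        def reintegrate(self, server):
            # Replay the log on reconnection; anything the server rejects is
            # handed back as a conflict for the repair tool / manual fixing.
            conflicts = []
            for op in self.replay_log:
                try:
                    server.apply(op)            # assumed server-side entry point
                except Exception as reason:
                    conflicts.append((op, str(reason)))
            self.replay_log.clear()
            return conflicts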
The state-transition diagram of a Coda client with respect to a volume.
AFS deployments in academia and government (100's)
Security model required Kerberos – many organizations not willing to make the costly switch
AFS (but not Coda) was not integrated into the Unix FS: separate “ls”, different – though similar – API
Session semantics not appropriate for many applications
Goals: Efficient use of large main memories; Support for multiprocessor workstations; Efficient network communication; Diskless operation; Exact emulation of UNIX FS semantics
Location-transparent UNIX FS
Naming: a local prefix table maps path-name prefixes to servers (cached locations); otherwise location is embedded in remote stubs in the tree hierarchy
Caching: needs sequential consistency; if one client wants to write, caching is disabled on all open clients – assumed not to hurt much since this doesn't happen often
Security: no security between kernels; everything runs over a trusted network
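The prefix table amounts to a longest-matching-prefix lookup from path name to server, roughly as in the sketch below (the table contents are made up):

    prefix_table = {
        "/":           "server-a",
        "/users":      "server-b",
        "/users/sosa": "server-c",
    }

    def server_for_path(path):
        # Longest-prefix match (simplified: ignores component boundaries).
        best = ""
        for prefix in prefix_table:
            if path.startswith(prefix) and len(prefix) >= len(best):
                best = prefix
        return prefix_table[best]

    # e.g. server_for_path("/users/sosa/notes.txt") -> "server-c"
    #      server_for_path("/etc/passwd")           -> "server-a"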
The best way to implement something depends very highly on the goals you want to achieve
Always start with goals before deciding on consistency semantics