Lecture 7 – Distributed File Systems 1 1 15-440 Distributed Systems.
Distributed File Systems
Thomas Hollstegge
Distributed File Systems
Distributed file systems
Agenda
- Motivation
- Distributed file system basics
- Case studies
- Summary and outlook
Motivation
- ICT allows for distributed work
  - Users work at different times and in different places
  - They need access to common data collections
  - Provided by distributed file systems (DFS)
- Distributed work leads to new business models
  - 24/7 customer service
  - Analysis of worldwide financial information (stock prices etc.)
  - Economic relevance!
- Different DFSs were developed in the past, so a structured discussion is necessary
Basics – Storage fundamentals
- "Storage": fundamental abstraction in computing
  - Data encapsulated in objects
  - Explicit creation and deletion
  - Unaffected by system failures
- "File system": refinement of this abstraction
- Three different usage dimensions
  - Single user vs. multiple users
  - Single-threaded vs. multi-threaded OS
  - Single site vs. multiple sites
Basics – Requirements for DFS (1/2)
- Transparency
  - Users must be unaware of the internal separation of components
  - Access, performance, location, scaling transparency
- Availability
  - System should be fault tolerant
- Concurrent updates
  - Simultaneous access to a single resource
- Replication
  - A file may be present at different locations
  - Shares load between servers, enhances fault tolerance
Basics – Requirements for DFS (2/2)
- Hardware and software heterogeneity
  - Support for various platforms
- Consistency
  - Data integrity has to be maintained
- Security
  - Access control, user authentication, confidentiality
- Efficiency
  - Performance should be comparable to local file systems
Basics – Abstract file service model (1/3)
Source: [CDK01], p. 318
Basics – Abstract file service model (2/3)
Directory service operations
- Lookup(Dir, Name) → FileId – throws NotFound
- AddName(Dir, Name, File) – throws NameDuplicate
- UnName(Dir, Name) – throws NotFound
- GetNames(Dir, Pattern) → NameSeq
Flat file service operations
- Read(FileId, i, n) → Data – throws BadPosition
- Write(FileId, i, Data) – throws BadPosition
- Create() → FileId
- Delete(FileId)
- GetAttributes(FileId) → Attr
- SetAttributes(FileId, Attr)
Source: [CDK01], p. 319-322
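The two services above can be sketched as in-memory Python classes. This is only an illustration of the operation signatures listed on the slide, not the model's reference implementation: the class names, the snake_case method names, and the use of dictionaries and `fnmatch` patterns are assumptions made for this example.

```python
import fnmatch
import itertools


class NotFound(Exception):
    """A name is not present in the given directory."""


class NameDuplicate(Exception):
    """A name already exists in the given directory."""


class BadPosition(Exception):
    """A read or write position lies outside the file."""


class FlatFileService:
    """Stores contents and attributes, addressed only by FileId."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._data = {}   # FileId -> bytearray
        self._attrs = {}  # FileId -> dict

    def create(self):
        file_id = next(self._ids)
        self._data[file_id] = bytearray()
        self._attrs[file_id] = {}
        return file_id

    def delete(self, file_id):
        del self._data[file_id]
        del self._attrs[file_id]

    def read(self, file_id, i, n):
        data = self._data[file_id]
        if i < 0 or i > len(data):
            raise BadPosition(i)
        return bytes(data[i:i + n])

    def write(self, file_id, i, new_data):
        data = self._data[file_id]
        if i < 0 or i > len(data):
            raise BadPosition(i)
        # Overwrites existing bytes and extends the file if needed.
        data[i:i + len(new_data)] = new_data

    def get_attributes(self, file_id):
        return dict(self._attrs[file_id])

    def set_attributes(self, file_id, attr):
        self._attrs[file_id] = dict(attr)


class DirectoryService:
    """Maps text names to FileIds; knows nothing about file contents."""

    def __init__(self):
        self._dirs = {}  # Dir -> {Name: FileId}

    def lookup(self, dir_id, name):
        try:
            return self._dirs[dir_id][name]
        except KeyError:
            raise NotFound(name) from None

    def add_name(self, dir_id, name, file_id):
        entries = self._dirs.setdefault(dir_id, {})
        if name in entries:
            raise NameDuplicate(name)
        entries[name] = file_id

    def un_name(self, dir_id, name):
        entries = self._dirs.get(dir_id, {})
        if name not in entries:
            raise NotFound(name)
        del entries[name]

    def get_names(self, dir_id, pattern):
        # Pattern syntax (shell-style wildcards) is an assumption.
        return sorted(fnmatch.filter(self._dirs.get(dir_id, {}), pattern))


# Usage: a client module would combine both services.
files = FlatFileService()
dirs = DirectoryService()
fid = files.create()
files.write(fid, 0, b"hello")
dirs.add_name("root", "notes.txt", fid)
```

Keeping the two services separate means the flat file service never sees names, and the directory service never sees data; only the client module ties them together.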
Basics – Abstract file service model (3/3)
- Access control
  - Server-side user authorisation
  - Access rights checked upon directory lookup or on every request
- Hierarchical file structure
  - Realised within the client module
  - Directories may store references to other directories
- File groups
  - Set of files that can be moved between servers
  - Similar to a file system
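A hierarchical name space realised in the client module can be sketched as repeated calls to the directory service's Lookup operation, one per path component. The directory table, the `ROOT_ID` constant, and the function names below are hypothetical and exist only for this example.

```python
class NotFound(Exception):
    """A path component does not exist in its parent directory."""


# Hypothetical directory table: directory FileId -> {name: FileId}.
# Directories store references to other directories (1 -> 2 -> 3).
DIRECTORIES = {
    1: {"home": 2},
    2: {"alice": 3},
    3: {"notes.txt": 42},
}
ROOT_ID = 1


def lookup(dir_id, name):
    """Stand-in for the directory service's Lookup(Dir, Name)."""
    try:
        return DIRECTORIES[dir_id][name]
    except KeyError:
        raise NotFound(name) from None


def resolve(path, root=ROOT_ID):
    """Client-side path resolution: one Lookup per path component."""
    current = root
    for component in path.strip("/").split("/"):
        if component:  # skip empty components such as in "/"
            current = lookup(current, component)
    return current


file_id = resolve("/home/alice/notes.txt")
```

The server side only ever sees flat (Dir, Name) pairs; the notion of a path exists purely in the client.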
Agenda
- Motivation
- Distributed file system basics
- Case studies
  - Network File System (NFS)
  - Andrew File System (AFS)
  - Lustre
- Summary and outlook
NFS – History
- 198?: NFSv1
  - Developed at Sun Microsystems, unreleased
- 1984: NFSv2
  - Developed at Sun Microsystems
  - First released version, widely accepted
  - Supports files < 4GB, synchronous writes
- 1992: NFSv3
  - Developed by a group of researchers
  - Overcomes drawbacks of NFSv2 (larger files, asynchronous writes)
- 2002: NFSv4
  - Enhanced security, user authentication
  - Better Windows support
NFS – General description (1/2)
Source: [CDK01], p. 324
NFS – General description (2/2)
- Stateless protocol
  - Server does not maintain client state
  - Client requests are blocking (exception: asynchronous writes)
- User authentication
  - Default: UNIX user ID (insecure!)
  - Optional: Kerberos, DES
- Caching
  - Read cache: yes
  - Write cache: no!
- Server file system
  - Not restricted, but should support unique file IDs
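The idea of a stateless protocol can be illustrated with a small sketch: every request carries everything the server needs (file handle, offset, count), and the read position lives only in the client. This is not the NFS wire protocol; the class names and the in-memory file table are assumptions for illustration.

```python
class StatelessServer:
    """Holds file data but no per-client state (no open files, no offsets)."""

    def __init__(self, files):
        self._files = files  # file handle -> bytes

    def read(self, handle, offset, count):
        # The request is self-describing, so any server instance with
        # the same files can answer it.
        return self._files[handle][offset:offset + count]


class Client:
    """Keeps the read position locally and sends it with every request."""

    def __init__(self, server, handle):
        self.server = server
        self.handle = handle
        self.offset = 0

    def read_next(self, count):
        data = self.server.read(self.handle, self.offset, count)
        self.offset += len(data)
        return data


files = {"fh-1": b"hello world"}
client = Client(StatelessServer(files), "fh-1")
first = client.read_next(5)

# Simulate a server crash and restart: a fresh server object has no
# memory of the client, yet the client simply carries on.
client.server = StatelessServer(files)
rest = client.read_next(6)
```

Because the server forgets nothing that matters when it restarts, crash recovery needs no special protocol; the price is that every request must repeat its full context.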
NFS – Abstract model (1/2)
vs.
NFS – Abstract model (2/2)
- Operations
  - Similar to UNIX file system calls
  - All abstract operations can be represented
- Access control
  - Checked upon every request
- Hierarchical file system
  - Realised within the client module
- File groups
  - Not supported, only manual movement of files
NFS – Requirements
- Transparency
- Availability
- Concurrent updates
- Replication
- Heterogeneity
- Consistency
- Security
- Efficiency
AFS – History
- 1982: Initial version
  - Developed at Carnegie Mellon University (CMU), Pittsburgh
  - Part of the Andrew distributed computing environment
  - Provides support for teaching and research
- 1989: Spin-off
  - Development outsourced to Transarc Inc.
- 1994: Transarc acquired by IBM
  - All rights owned by IBM
- 2000: Open source
  - Code was released under an open source license
  - Since then: continuous development
AFS – General description (1/3)
AFS – Name spaces
AFS – General description (2/3)
Cached? No!
AFS – General description (3/3)
- Caching
  - "Callback promises"
  - Workstations are notified when cached files change
- Stateful protocol
  - Server maintains client state
  - Problematic when a client fails
- User authentication
  - Kerberos
- Server file system
  - Not restricted, but should support unique file IDs
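The callback-promise idea can be sketched as follows: the server remembers which clients cache which file and notifies them when the file changes. This is a simplified illustration, not the AFS implementation; the class and method names (`CallbackServer`, `fetch`, `store`, `break_callback`) are assumptions for this example.

```python
class CallbackServer:
    """Stateful server: tracks which clients hold a promise per file."""

    def __init__(self):
        self._files = {}     # name -> bytes
        self._promises = {}  # name -> set of clients caching that file

    def fetch(self, client, name):
        # Handing out a copy includes a promise to notify on change.
        self._promises.setdefault(name, set()).add(client)
        return self._files[name]

    def store(self, writer, name, data):
        self._files[name] = data
        # Break the promise for every other client caching the file.
        for client in self._promises.pop(name, set()):
            if client is not writer:
                client.break_callback(name)
        if writer is not None:
            self._promises[name] = {writer}


class Client:
    """Workstation with a local cache that trusts callback promises."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.fetches = 0  # counts round trips to the server

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.fetch(self, name)
            self.fetches += 1
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data
        self.server.store(self, name, data)

    def break_callback(self, name):
        # Invalidate the cached copy; the next read refetches it.
        self.cache.pop(name, None)


server = CallbackServer()
server.store(None, "report.txt", b"v1")
a, b = Client(server), Client(server)
a.read("report.txt")
a.read("report.txt")          # served from the cache, no server contact
b.write("report.txt", b"v2")  # server breaks a's callback promise
latest = a.read("report.txt")
```

The `_promises` table is exactly the client state that makes the protocol stateful: if a client crashes or becomes unreachable, the server still holds promises for it and must decide how to handle them.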
AFS – Abstract model (1/2)
vs.
AFS – Abstract model (2/2)
- Operations
  - Differ from the abstract model
  - Some operations combined, callback promises added
- Access control
  - Rights checked upon every request
  - Extended access lists per directory
- Hierarchical file system
  - Realised within the client module
- File groups
  - File identifier contains a link to the file group
  - Location database maps file groups to servers
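The role of the location database can be shown with a short sketch: since the file identifier names its file group, a client can find the responsible server with one table lookup, and moving a group only changes that table. The `FileId` layout, the group numbers, and the host names are hypothetical.

```python
from typing import NamedTuple


class FileId(NamedTuple):
    group: int   # file group the file belongs to
    number: int  # file number within that group


# Hypothetical location database: file group -> server.
LOCATION_DB = {
    7: "server-a.example",
    8: "server-b.example",
}


def server_for(file_id):
    """Find the server that currently holds the file's group."""
    return LOCATION_DB[file_id.group]


def move_group(group, new_server):
    """Relocate a whole file group; file identifiers stay unchanged."""
    LOCATION_DB[group] = new_server


fid = FileId(group=7, number=1)
before = server_for(fid)
move_group(7, "server-b.example")
after = server_for(fid)
```

Clients never need new identifiers after a move, which is what gives the system location transparency for whole groups of files.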
AFS – Requirements
- Transparency
- Availability
- Concurrent updates
- Replication
- Heterogeneity
- Consistency
- Security
- Efficiency
Lustre (1/3)
- "Lustre": Linux Cluster
  - File system especially suited for clusters
  - Easily handles thousands of clients and servers
- Uses object-based storage
  - Objects offer methods for data access, attributes, policies
  - High-level abstraction
  - Lower performance than block-based storage
- Three system roles
  - Object Storage Targets (OST)
  - Metadata Servers (MDS)
  - Clients
Lustre (2/3)
[Figure: Lustre architecture with three roles: Object Storage Targets (OST), Metadata Servers (MDS), and Clients. Connection labels: file operations and locking; recovery and file status; directory metadata.]
Source: [BS02], p. 51
Lustre (3/3)
- Lustre partly follows the abstract model
  - Separation of directory and flat file service
  - File attributes managed by OSTs
- Hierarchical file systems
  - Realised within the client module
- High availability
  - Heavy use of redundancy
  - Caching of metadata
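The separation of roles can be sketched as three small classes: a metadata server that only knows names and object locations, storage targets that hold object data and attributes, and a client that talks to both. This is a toy model, not Lustre's actual interfaces; the class and method names and the round-robin placement of objects are assumptions for illustration.

```python
class ObjectStorageTarget:
    """Stores objects together with their attributes."""

    def __init__(self):
        self._objects = {}  # object id -> {"data": bytes, "attrs": dict}

    def create(self, object_id):
        self._objects[object_id] = {"data": b"", "attrs": {"size": 0}}

    def write(self, object_id, data):
        obj = self._objects[object_id]
        obj["data"] = data
        obj["attrs"]["size"] = len(data)  # attributes live on the OST

    def read(self, object_id):
        return self._objects[object_id]["data"]

    def get_attributes(self, object_id):
        return dict(self._objects[object_id]["attrs"])


class MetadataServer:
    """Maps names to object locations; never touches file data."""

    def __init__(self, osts):
        self._osts = osts
        self._names = {}  # path -> (OST index, object id)
        self._next_id = 0

    def create(self, path):
        object_id = self._next_id
        self._next_id += 1
        ost_index = object_id % len(self._osts)  # assumed placement policy
        self._osts[ost_index].create(object_id)
        self._names[path] = (ost_index, object_id)
        return self._names[path]

    def lookup(self, path):
        return self._names[path]


class Client:
    """Asks the MDS where a file lives, then talks to the OST directly."""

    def __init__(self, mds, osts):
        self.mds = mds
        self.osts = osts

    def write(self, path, data):
        try:
            ost_index, object_id = self.mds.lookup(path)
        except KeyError:
            ost_index, object_id = self.mds.create(path)
        self.osts[ost_index].write(object_id, data)

    def read(self, path):
        ost_index, object_id = self.mds.lookup(path)
        return self.osts[ost_index].read(object_id)


osts = [ObjectStorageTarget(), ObjectStorageTarget()]
mds = MetadataServer(osts)
client = Client(mds, osts)
client.write("/scratch/a.dat", b"abc")
client.write("/scratch/b.dat", b"defgh")
```

After the initial lookup, data traffic bypasses the metadata server entirely, which is one reason this layout suits clusters with many clients.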
Summary and outlook
- Abstract file service model
  - Developed to meet many requirements for DFSs
- Different implementations
  - NFS: stateless, concurrency control
  - AFS: stateful, heavy use of caching, better performance
- Other approach: Lustre
  - Modularised approach, especially suited for clusters
- Future developments
  - Large-scale environments
  - Cloud computing
  - Issues: data security, privacy
Any questions?
Thank you for your attention!
Literature
[BS02] Peter J. Braam, Philip Schwan: Lustre: The intergalactic file system, Proceedings of the 2002 Ottawa Linux Symposium, pp. 50–54, 2002.
[CDK01] George Coulouris, Jean Dollimore, Tim Kindberg: Distributed Systems: Concepts and Design, 3rd ed., Addison-Wesley, 2001.
[Kir06] Olaf Kirch: Why NFS Sucks, Proceedings of the Linux Symposium, 2nd ed., pp. 51–63, 2006.
[MSC+86] James H. Morris, Mahadev Satyanarayanan, Michael H. Conner, John H. Howard, David S. H. Rosenthal, F. Donelson Smith: Andrew: A distributed personal computing environment, Communications of the ACM, 29(3), pp. 184–201, Association for Computing Machinery, 1986.
[PJS+94] Brian Pawlowski, Chet Juszczak, Peter Staubach, Carl Smith, Diane Lebel, David Hitz: NFS Version 3: Design and Implementation, Proceedings of the Summer 1994 USENIX Technical Conference, pp. 137–151, 1994.
[Sat89] Mahadev Satyanarayanan: Distributed file systems, Distributed Systems, S. Mullender (ed.), pp. 149–188, ACM Press, 1989.
[Sch03] Philip Schwan: Lustre: Building a file system for 1000-node clusters, Proceedings of the 2003 Ottawa Linux Symposium, pp. 380–386, 2003.
[Tan03] Andrew S. Tanenbaum: Moderne Betriebssysteme, 2nd ed., Prentice Hall, 2003.
[Tv07] Andrew S. Tanenbaum, Maarten van Steen: Distributed Systems: Principles and Paradigms, 2nd ed., Prentice Hall, 2007.