Distributed File System By Manshu Zhang. Outline Basic Concepts Current project Hadoop Distributed...
Distributed File System
By Manshu Zhang
Outline
Basic Concepts
Current Project: Hadoop Distributed File System
Future Work
Reference
DFS
A distributed implementation of the classical time-sharing model of a file system, in which multiple users share files and storage resources.
Key Characteristics of DFS
Dispersion of clients and files
Multiplicity of clients and files
Primary issues of DFS
Naming and Transparency
Fault Tolerance
Naming
Naming: the mapping between logical and physical objects.
Multilevel mapping.
Replicas and their locations are transparent to users.
Naming Schemes: Three Main Approaches
Host name + local name: combining the two guarantees a unique system-wide name.
Mounting: remote directories are mounted onto local directories; once mounted, files can be referenced in a location-transparent manner.
Total integration of the component file systems: a single global name structure spans all files. If a server is unavailable, some arbitrary set of directories on different machines also becomes unavailable.
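The mounting approach can be illustrated with a small mount-table lookup. This is a minimal sketch, not how any particular DFS implements it: the `MountTable` class, the host name `serverA`, and all paths are hypothetical names chosen for illustration.

```python
from typing import Dict, Tuple


class MountTable:
    """Hypothetical table mapping local mount points to (host, remote directory)."""

    def __init__(self) -> None:
        self._mounts: Dict[str, Tuple[str, str]] = {}

    def mount(self, local_dir: str, host: str, remote_dir: str) -> None:
        # Store paths without a trailing slash so prefix matching is uniform.
        self._mounts[local_dir.rstrip("/")] = (host, remote_dir.rstrip("/"))

    def resolve(self, path: str) -> Tuple[str, str]:
        """Return (host, path on that host) for a logical path."""
        # Try the longest mount point first so nested mounts resolve correctly.
        for mount_point in sorted(self._mounts, key=len, reverse=True):
            if path == mount_point or path.startswith(mount_point + "/"):
                host, remote_dir = self._mounts[mount_point]
                return host, remote_dir + path[len(mount_point):]
        # Not under any mount point: treat it as a local file.
        return "localhost", path


table = MountTable()
table.mount("/home/shared", "serverA", "/export/shared")
print(table.resolve("/home/shared/notes.txt"))
print(table.resolve("/tmp/scratch.txt"))
```

The client only ever uses the logical path `/home/shared/notes.txt`; the host that actually stores the file appears only inside the table, which is what makes the reference location-transparent.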
Transparency(1)
Login Transparency: A user can log in at any host with a uniform login procedure and perceive a uniform view of the file system.
Access Transparency: A client process on a host has a uniform mechanism to access all files in the system, regardless of whether the files are on a local or a remote host.
Location Transparency: The names of files do not reveal their physical location.
Transparency(2)
Concurrency Transparency: An update to a file should not affect the correct execution of other processes that are concurrently sharing the file.
Replication Transparency: Files may be replicated to provide redundancy for availability and also to permit concurrent access for efficiency.
Fault Tolerance
Stateful vs. stateless service: whether the server maintains information on its clients
File replication
Distinctions Between Stateful & Stateless Service
Failure recovery: A stateful server loses all its volatile state in a crash. With a stateless server, the effects of server failure and recovery are almost unnoticeable.
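The difference in failure recovery can be made concrete with two toy servers. This is a sketch under simplifying assumptions: both classes, the in-memory `files` dictionary, and the `crash_and_restart` method are hypothetical and only model the loss of volatile state.

```python
class StatefulServer:
    """Keeps an open-file table (handle -> [path, offset]) in volatile memory."""

    def __init__(self, files):
        self.files = files
        self.open_table = {}
        self._next_handle = 0

    def open(self, path):
        handle = self._next_handle
        self._next_handle += 1
        self.open_table[handle] = [path, 0]
        return handle

    def read(self, handle, count):
        # Raises KeyError if the handle was lost in a crash.
        path, offset = self.open_table[handle]
        data = self.files[path][offset:offset + count]
        self.open_table[handle][1] = offset + len(data)
        return data

    def crash_and_restart(self):
        # All volatile state disappears; clients must reopen their files.
        self.open_table = {}


class StatelessServer:
    """Every request carries the path, offset, and count it needs."""

    def __init__(self, files):
        self.files = files

    def read(self, path, offset, count):
        return self.files[path][offset:offset + count]

    def crash_and_restart(self):
        pass  # nothing volatile to lose


files = {"/a.txt": b"hello world"}

stateless = StatelessServer(files)
print(stateless.read("/a.txt", 0, 5))
stateless.crash_and_restart()
print(stateless.read("/a.txt", 5, 6))  # the client simply continues

stateful = StatefulServer(files)
handle = stateful.open("/a.txt")
print(stateful.read(handle, 5))
stateful.crash_and_restart()
try:
    stateful.read(handle, 6)
except KeyError:
    print("handle lost in crash; client must reopen the file")
```

The stateless client pays for this robustness by sending the full request context every time, while the stateful client sends shorter requests but has to take part in recovery.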
File Replication
Several copies of a file's contents at different locations enable multiple servers to share the load of providing the service.
The naming scheme maps a replicated file name to a particular replica.
Updates
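The points above can be sketched as follows. The sketch assumes round-robin replica selection for reads and assumes that an update must be applied to every replica; both are illustrative choices, and the `ReplicatedFile` class and server names are hypothetical.

```python
import itertools


class ReplicatedFile:
    """One logical file name backed by a copy on each of several servers."""

    def __init__(self, name, servers):
        self.name = name
        self.replicas = {server: b"" for server in servers}
        self._round_robin = itertools.cycle(list(servers))

    def pick_replica(self):
        # Map the file name to one particular replica; rotating through the
        # servers spreads the read load across them.
        return next(self._round_robin)

    def read(self):
        server = self.pick_replica()
        return server, self.replicas[server]

    def write(self, data):
        # Assumption: an update is propagated to all replicas so that every
        # copy keeps representing the same logical file.
        for server in self.replicas:
            self.replicas[server] = data


report = ReplicatedFile("report.txt", ["s1", "s2", "s3"])
report.write(b"version 1")
for _ in range(4):
    print(report.read())
```

Synchronous propagation to every copy is the simplest policy to reason about; a real system may propagate updates lazily and accept temporarily stale replicas in exchange for faster writes.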
Current Project
HDFS: Hadoop Distributed File System
A distributed, parallel, fault-tolerant file system, designed to reliably store very large files across machines in a large cluster.
Efficient, reliable, and open source
Naming: central metadata server
Synchronization: write-once-read-many; locks on objects are given to clients using leases
Consistency and replication: server-side replication, asynchronous replication, checksums
Fault tolerance: failure is treated as the norm
Security: no dedicated security mechanism
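The synchronization item (write-once-read-many with leases) can be illustrated with a toy lease manager. This is not the HDFS API: the `LeaseManager` class, its method names, the 10-second lease length, and the injectable `clock` are all assumptions made for the sketch.

```python
import time


class LeaseManager:
    """Toy model: one writer per file, enforced by time-limited leases."""

    def __init__(self, lease_seconds=60.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.leases = {}     # path -> (holder, expiry time)
        self.closed = set()  # files whose single write phase is over

    def acquire(self, path, client):
        """Grant or renew a write lease; return False if another client holds it."""
        if path in self.closed:
            raise PermissionError(f"{path} is closed (write-once)")
        now = self.clock()
        lease = self.leases.get(path)
        if lease is not None and lease[0] != client and lease[1] > now:
            return False
        self.leases[path] = (client, now + self.lease_seconds)
        return True

    def close(self, path, client):
        """Finish writing; afterwards the file can only be read."""
        lease = self.leases.get(path)
        if lease is None or lease[0] != client or lease[1] <= self.clock():
            raise PermissionError(f"{client} holds no valid lease on {path}")
        del self.leases[path]
        self.closed.add(path)


now = [0.0]  # fake clock so the example does not need to sleep
manager = LeaseManager(lease_seconds=10.0, clock=lambda: now[0])

print(manager.acquire("/data/log", "client1"))  # first writer gets the lease
print(manager.acquire("/data/log", "client2"))  # refused while the lease is live
now[0] = 11.0                                   # client1 stopped renewing
print(manager.acquire("/data/log", "client2"))  # expired lease can be taken over
manager.close("/data/log", "client2")
```

Because a lease expires on its own, a crashed writer cannot block a file forever, which fits the "failure as norm" assumption listed above.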
Future Work
Robustness of the data sharing model
The preceding section, architecture, naming, synchronization, availability, heterogeneity and support for databases
Security
Reference
[1] Thanh, T.D.; Mohan, S.; Choi, E.; SangBum Kim; Pilsung Kim. 2008. Networked Computing and Advanced Information Management. "A Taxonomy and Survey on Distributed File Systems".
[2] Randy Chow. 1997. Distributed Operating Systems & Algorithms.
[3] Eliezer Levy; Abraham Silberschatz. December 1990. Computing Surveys (CSUR), Volume 22, Issue 4. "Distributed file systems: concepts and examples".
[4] http://hadoop.apache.org/common/docs/current/hdfs_design.html#Introduction
[5] http://www.snia.org/events/wintersymp2009/cloud/dhruba_hadoop_snia.pdf
[6] http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_file_systems
[7] http://en.wikipedia.org/wiki/Hadoop#Hadoop_Distributed_File_System
[8] http://www.cs.gsu.edu/~cscyqz/courses/aos/slides08/ch6.1-Fall08.pptx
Q&A?
Thank you!