Towards a Scalable File System
Progress on adapting BlobSeer to WAN scale for the HGMDS distributed metadata system
Viet-Trung Tran, Gabriel Antoniu, Alexandru Costan (INRIA - Rennes)
In collaboration with Kohei Hiraga, Osamu Tatebe (U Tsukuba)
FP3C meeting Bordeaux, 2 – 3 September 2011
Plan
1. Background and context
2. Goal
3. Approach and solution
4. Preliminary evaluation
5. Conclusion
1. Background: BlobSeer & HGMDS
BlobSeer: A large-scale data management service
Generic data-management platform for huge, unstructured data
• Huge data (TB): BLOBs
• Highly concurrent, fine-grain access (MB): read/write/append
• Prototype available
Key design features
• Decentralized metadata management
• Beyond MVCC: multiversioning exposed to the user
• Lock-free write access through versioning
A back-end for higher-level, sophisticated data-management systems
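The multiversioning and lock-free write ideas can be sketched as follows. This is a minimal Python illustration under assumed semantics, not BlobSeer's actual API: every write publishes a new immutable version of a BLOB, so readers pin a snapshot and never block writers.

```python
# Illustrative sketch (not BlobSeer's actual API): each write publishes a new
# immutable version of a BLOB, so readers never block writers.
import itertools

class VersionedBlob:
    def __init__(self):
        self._counter = itertools.count(1)
        self._versions = {}          # version number -> page map (offset -> data)
        self._latest = 0

    def write(self, offset, data, base=None):
        """A write creates a NEW version instead of mutating in place."""
        base = self._latest if base is None else base
        pages = dict(self._versions.get(base, {}))   # copy-on-write page map
        pages[offset] = data
        v = next(self._counter)                      # version assigned at publish time
        self._versions[v] = pages
        self._latest = v
        return v

    def read(self, offset, version=None):
        """Readers pin a version: concurrent writes cannot affect them."""
        version = self._latest if version is None else version
        return self._versions.get(version, {}).get(offset)

blob = VersionedBlob()
v1 = blob.write(0, b"hello")
v2 = blob.write(0, b"world")
assert blob.read(0, version=v1) == b"hello"   # old snapshot still readable
assert blob.read(0) == b"world"               # latest version by default
```

Because each version is immutable once published, no locks are needed on the write path; concurrent writers simply produce distinct versions.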
BlobSeer: Architecture
Clients: perform fine-grain BLOB accesses
Providers: store the pages of the BLOBs
Provider manager: monitors the providers and favours data load balancing
Metadata providers: store information about page location
Version manager: ensures concurrency control
HGMDS: A distributed metadata management system for global file systems
• Multi-master file system metadata server (MDS)
• Manages the inode structure
• High-latency networks do not affect metadata operation performance, for both reads and writes
• One MDS per site
• Metadata versioning using vector clocks for collision detection
• Automatic collision resolution on the system side
[Figure: HGMDS over the Internet. Sites A, B and C each run an MDS; file system clients issue mkdir/rmdir/create/stat/unlink to their local MDS, which propagates updates to the other sites in the background.]
2. Goal: a joint architecture integrating BlobSeer and HGMDS
Goal

            BlobSeer                       HGMDS
  Role      Data management                Metadata management
  Scope     Typically on a single site     Global scale, multiple sites

Idea: build a global file system deployed on multiple sites by integrating BlobSeer with HGMDS
Potential benefits:
• HGMDS: efficient multi-site file metadata management
• BlobSeer: concurrency-optimized access to globally shared data
3. Our approach and solution
Two approaches
1. Multiple BlobSeer instances: one BlobSeer per site
2. A single BlobSeer-WAN instance over geographically distributed sites
1st approach: 1 BlobSeer instance / site
1st approach: Zoom
High latency when accessing remote BLOBs:
• Too many remote requests for small metadata
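The cost of remote metadata resolution can be illustrated with a back-of-envelope calculation. The RTT and tree-depth figures below are assumptions chosen for illustration, not measurements from the evaluation: each level of a distributed metadata tree resolved across the WAN costs a full round trip, which dominates the transfer time of the small metadata records themselves.

```python
# Back-of-envelope illustration (all numbers are assumptions, not measurements):
# resolving BLOB metadata through a remote tree costs one round trip per level.
wan_rtt_ms = 100.0        # assumed inter-site round-trip time
lan_rtt_ms = 0.2          # assumed intra-site round-trip time
tree_depth = 10           # assumed depth of the distributed metadata tree

remote_cost = tree_depth * wan_rtt_ms   # every level crosses the WAN
local_cost = tree_depth * lan_rtt_ms    # every level stays on-site

print(f"remote metadata resolution: {remote_cost:.0f} ms")
print(f"local  metadata resolution: {local_cost:.1f} ms")
```

Under these assumed numbers, remote resolution is three orders of magnitude slower than local resolution, which motivates the second approach below.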
2nd approach: one BlobSeer-WAN instance over geographically distributed sites
Multiple version managers: one version manager per site
Multiple provider managers: one provider manager per site
On each site:
• Multiple data providers and metadata servers
• Data providers are under the control of the local provider manager
Idea: leverage locality for remote metadata accesses
Metadata I/O is resolved locally
2nd approach: I/O scheme in BlobSeer-WAN
Writing
• Publish the version on the local version manager
• Write metadata locally on the local metadata servers
• Write data locally on the local data providers
Reading (read-your-writes in many cases)
• Request a version from the local version manager
• Local metadata accesses
• Access remote/local providers only if necessary
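The write and read paths above can be sketched as a minimal runnable model. All names and data structures here are hypothetical, not the actual BlobSeer-WAN API: each site runs its own version manager, metadata servers and data providers; a write touches only local services, and a read crosses the WAN only when the requested page lives on another site.

```python
# Hypothetical model of the BlobSeer-WAN I/O scheme (not the real API):
# one Site object stands in for a site's version manager, metadata servers
# and data providers.
class Site:
    def __init__(self, name):
        self.name = name
        self.versions = {}   # blob_id -> latest locally published version
        self.metadata = {}   # (blob_id, offset, version) -> (site_name, page_id)
        self.pages = {}      # page_id -> data

SITES = {}

def write(site_name, blob_id, offset, data):
    site = SITES[site_name]
    page_id = len(site.pages)
    site.pages[page_id] = data                                  # data written locally
    v = site.versions.get(blob_id, 0) + 1
    site.versions[blob_id] = v                                  # version published locally
    site.metadata[(blob_id, offset, v)] = (site_name, page_id)  # metadata written locally
    return v

def read(site_name, blob_id, offset):
    site = SITES[site_name]
    v = site.versions[blob_id]                             # ask the LOCAL version manager
    owner, page_id = site.metadata[(blob_id, offset, v)]   # local metadata access
    return SITES[owner].pages[page_id]                     # WAN fetch only if owner differs

SITES["rennes"] = Site("rennes")
write("rennes", "blob0", 0, b"data")
assert read("rennes", "blob0", 0) == b"data"   # read-your-write stays on-site
```

The sketch shows why read-your-writes is the common case: a client's own writes always leave both metadata and data on its local site, so the subsequent read never leaves the site.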
Vector clocks and optimistic metadata replication
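Vector-clock collision detection can be sketched as follows. This is a generic illustration of the technique, not HGMDS's actual implementation: each MDS increments its own slot on a local update, and two replicated updates collide exactly when neither clock causally dominates the other.

```python
# Generic vector-clock sketch (illustrative, not HGMDS's implementation).
def happens_before(a, b):
    """True if clock a causally precedes clock b (a <= b component-wise, a != b)."""
    keys = set(a) | set(b)
    return all(a.get(k, 0) <= b.get(k, 0) for k in keys) and a != b

def collides(a, b):
    """Concurrent updates: neither precedes the other -> collision to resolve."""
    return a != b and not happens_before(a, b) and not happens_before(b, a)

site_a = {"A": 2, "B": 1}    # update published at MDS A
site_b = {"A": 1, "B": 2}    # concurrent update published at MDS B
assert collides(site_a, site_b)            # conflict: automatic resolution kicks in
assert happens_before({"A": 1}, {"A": 2})  # causally ordered: no conflict
```

With optimistic replication, updates are applied locally first and propagated in the background; collisions detected this way are then resolved automatically by a system-side rule, as described for HGMDS above.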
Expected benefits
• On WAN: BlobSeer coordinates with HGMDS to provide a global versioning file system
  - Low-latency metadata I/O
  - Eventual consistency model
  - Load balancing / fault tolerance
• On LAN:
  - Distributed version management
  - Load balancing / fault tolerance
4. Preliminary evaluation: BlobSeer-WAN on G5K
Testbed
Using 2 sites of G5K:
• Rennes: 40 nodes (30 reserved for BlobSeer services, 10 for clients)
• Grenoble: 40 nodes (30 reserved for BlobSeer services, 10 for clients)
• 10 Gbps interconnect between the sites
Concurrent appending: 512 MB/client
5. Conclusion and ongoing work
Summary
Discussed the integration of BlobSeer and HGMDS:
• The BlobSeer-WAN extension is required
BlobSeer-WAN
• Preliminary results look encouraging
• Performance of BlobSeer-WAN on two sites is similar to that of vanilla BlobSeer on a single site
• Prototype available at BlobSeer's repository/branches/BlobSeer-WAN-dev/
HGMDS
• Implementation almost done
• Works on multiple sites
• Collisions automatically resolved by a rule
Next steps
• A more extensive evaluation of BlobSeer-WAN
• Integrate BlobSeer-WAN with HGMDS
• Preliminary evaluation of HGMDS + BlobSeer-WAN on Grid5000 and on the Japanese clusters
• Submit a co-authored paper by Spring 2012
• Next internships: Kohei @ Inria Rennes
Thank you!
FP3C meeting 2 – 3 September 2011