Plank
description
Transcript of Plank
The Storage Fabric of the Grid:The Network Storage Stack
James S. Plank
Director:Logistical Computing and Internetworking
(LoCI) Laboratory
Department of Computer ScienceUniversity of Tennessee
Cluster and Computational Grids for Scientific Computing:September 12, 2002, Le Chateau de Faverges de la Tour, France
Grid Research & The Fabric Layer
Middleware
Application
Resources
The“Fabric”
Layer
What is the Fabric Layer?
• Networking: TCP/IP
• Storage: Files in a file system
• Computation: Processes managed by an OS
What is the Fabric Layer?
• Networking: TCP/IP
• Storage: Files in a file system
• Computation: Processes managed by an OS
Most Grid research accepts these as givens.(Examples: MPI, GridFTP)
LoCI’s Research Agenda
Redefine the fabric layer based onEnd-to-End Principles
Communication Storage Computation
Data / Link /Physical
Network
Transport
Application
Access /Physical
IBP Depot
exNode
LoRS
Application
Access /Physical
IBP NFU
exProc
LoRS
Application
What Should This Get You?
• Scalabililty
• Flexibility
• Fault-tolerance
• Composability
I.E. Better Grids
LoCI Lab Personnel
Directors:Jim PlankMicah Beck
Exec Director:Terry Moore
Grad Students:Erika FuentesSharmila KancherlaXiang LiLinzhen Xuan
Research Staff:Scott AtchleyAlexander BassiYing Ding Hunter HagewoodJeremy MillarStephen SolteszYong Zheng
Undergrad Students:Isaac CharlesRebecca CollinsKent GalbraithDustin Parr
Collaborators
• Jack Dongarra (UT - NetSolve, Linear Algebra)
• Rich Wolski (UCSB - Network Weather Service)
• Fran Berman (UCSD/NPACI - Scheduling)
• Henri Casanova (UCSD/NPACI - Scheduling)
• Laurent LeFevre (INRAI/ENS - Multicast, Active Networking)
The Network Storage Stack
Applications
Logistical File System
Logistical Tools
L-Bone
IBP
Local Access
Physical
exNode
• A Fundamental Organizing Principle
• Like the IP Stack
• Each level encapsulates details from the lower levels, while still exposing details to higher levels
The Network Storage Stack
Applications
Logistical File System
Logistical Tools
L-Bone
IBP
Local Access
Physical
exNode
• A Fundamental Organizing Principle
• Like the IP Stack
• Each level encapsulates details from the lower levels, while still exposing details to higher levels
The Network Storage Stack
The L-bone:Resource discovery& proximity queries
IBP (Internet Backplane Protocol): Allocating and managing network storage
The exNode:A data structurefor aggregation
LoRS: The Logistical Runtime System:Aggregation tools and methodologies
IBP: The Internet Backplane Protocol
• Managing and using state in the network.
• Inserting storage in the network so that:– Applications may use it advantageously.
– Storage owners do not lose control of their resources.
– The whole system is truly scalable and fault-tolerant
Low-level primitives and software for:
The Byte Array:IBP’s Unit of Storage
• You can think of it as a “buffer”.
• You can think of it as a “file”.
• Append-only semantics.
• Transience built in.
The IBP Client API
• Can be used by anyone* who can talk to the server.
• Seven procedure calls in three categories:– Allocation (1)– Data transfer (4)– Management (2)
• * not really, but close...
Client API: Allocation• IBP_allocate(char *host, int maxsize, IBP_attributes attr)
• Like a network malloc()
• Returns a trio of capabilities.– Read / Write / Manage– ASCII Strings (obfuscated)
• No user-defined file names:– Big flat name space.– No registration required to pass capabilities.
Allocation Attributes
• Time-Limited or Permanent
• Soft or Hard
• Read/Write semantics:– Byte Array– Pipe– Circular Queue
Client API: Data Transfer
• IBP_store(write-cap, bytes, size, ...)• IBP_deliver(read-cap, pointer, size, ...)
• IBP_copy(read-cap, write-cap, size, ...)
• IBP_mcopy(...)
2-party:
3-party:
N-party/other things:
IBP Client API: Management
• IBP_manage()/IBP_status()
• Allows for resizing byte arrays.• Allows for extending/shortening the time limit on
time-limited allocations.• Manages reference counts on the read/write
capabilities.• State probing.
IBP Servers
• Daemons that serve local disk or memory.
• Root access not required.
• Can specify sliding time limits or revokability.
• Encourages resource sharing.
Typical IBP usage scenario
Sender ReceiverIBP Network
Logistical Networking Strategies
Sender ReceiverIBPNetwork
Sender ReceiverIBPIBP IBP
#2
#1
#3
Sender ReceiverIBP#4 IBPIBP
XSuffrage on MCell/APST
University of Tennessee, Knoxville
NetSolve+
IBP
University of California, San Diego
GRAM+
GASS
Tokyo Institute of Technology
NetSolve+
NFS
NetSolve+
IBP
APST DaemonAPST Client
[(NetSolve+IBP) + (GRAM+GASS) + (NetSolve+NFS)] + NWS
MCell/APST Experimental Results
Experimental Setting:
MCell simulation with 1,200 tasks:• composed of 6 Monte-Carlo Simulations• input files: 1, 20, 100 MB
4 scenarios: Initially(a) all input files are only in Japan(b) 100MB files staged in California(c) in addition, one 100MB file staged in Tennessee(d) all input files replicated everywhere
workqueue
XSufferage
Scheduling Heuristics match data and tasks in appropriate locations- Automatic staging with IBP effective- Improved overall performance
The Network Storage Stack
The L-bone:Resource Discovery& Proximity queries
IBP: Allocating and managing networkstorage (like a network malloc)
The exNode:A data structurefor aggregation
LoRS: The Logistical Runtime System:Aggregation tools and methodologies
The Logistical Backbone (L-Bone)
• LDAP-based storage resource discovery.
• Query by capacity, network proximity, geographical proximity, stability, etc.
• Periodic monitoring of depots.
• Uses the Network Weather Service (NWS) for live measurements and forecasting.
Snapshot: August, 2002
Approximately 1.6 TB of publicly accessible storage(Scaling to a petabyte someday…)
The Network Storage Stack
The L-bone:Resource Discovery& Proximity queries
IBP: Allocating and managing networkstorage (like a network malloc)
The exNode:A data structurefor aggregation
LoRS: The Logistical Runtime System:Aggregation tools and methodologies
The exNode
• The Network “File” Pointer.• Analogous to the Unix inode.• Map byte-extents to IBP buffers (or other allocations).
• XML-based data structure/serialization.• Allows for replication, flexible decomposition of data.• Also allows for “end-to-end services.”• Arbitrary metadata.
The exNode (XML-based)
A B C
0
300
200
100
IBPDepots
Network
The Network Storage Stack
The L-bone:Resource Discovery& Proximity queries
IBP: Allocating and managing networkstorage (like a network malloc)
The exNode:A data structurefor aggregation
LoRS: The Logistical Runtime System:Aggregation tools and methodologies
Logistical Runtime System
• Aggregation for:– Capacity– Performance (striping)– More performance (caching)– Reliability (replication)– More reliability (ECC)– Logistical purposes (routing)
Logistical Runtime System
• Basic Primitives:
– Upload: Create a network file from local data
– Download: Get bytes from a network file.
– Augment: Add more replicas to a network file.
– Trim: Remove replicas from a network file.
– Stat: Get information about the network file.
– Refresh: Alter the time limits of the IBP buffers.
Upload
Augment to Tennessee
Augment to Santa Barbara
Stat (ls)
Failures do
happen.
Download
Trimming(dead capability removal)
End-To-End Services:
• MD5 Checksums stored per exNode block to detect corruption.
• Encryption is a per-block option.
• Compression is an per-block option.
• Parity/Coding is in the design.
Parity / Coding
IBPBuffers
Network
= + +
= + 2 + 3
ExNode with Coding
Scalability
• No bottlenecks• Really hard problems left unsolved, but for
the most part, the lower levels shouldn’t need changing.– Naming
– Good scheduling
– Consistency / File System semantics
– Computation
Status
Applications
Logistical File System
Logistical Tools
L-Bone
IBP
Local Access
Physical
exNode
• IBP/L-Bone/exNode/Tools all supported.
• Apps: Mail, IBP-ster, Video IBP-ster, IBPvo -- demo at SC-02
• Other institutions (see L-Bone)
What’s Coming Up?
• More nodes on the L-Bone
• More collaboration with applications groups
• Research on performance and scheduling
• Logistical File System
• A Computation Stack
• Code / Information at loci.cs.utk.edu
The Storage Fabric of the Grid:The Network Storage Stack
James S. Plank
Director:Logistical Computing and Internetworking
(LoCI) Laboratory
Department of Computer ScienceUniversity of Tennessee
Cluster and Computational Grids for Scientific Computing:September 12, 2002, Le Chateau de Faverges de la Tour, France
Replication: Experiment #1
UCSB
UCSDTAMU
UTK UNC
Harvard
Turin, ITStuttgart, DE
3 MB file
0
3 MB
UTK 2
UTK 5
UTK 6
UTK 3
UTK 4
UTK 1
UCSB 1
UCSB 2
UCSB 3
UCSD 1
UCSD 3
Harvard
UNC
UTK 5
UTK 2
UTK 5
UTK 6
UTK 3
UCSB 1
UCSB 2
UCSB 3
Replication: Experiment #1
01020304050
708090
100
60
UTK
99.8
5U
CS
DU
CS
BH
arva
rd
UN
C
99.7
1
95.3
1
59.7
7
99.8
8
Fra
gm
ent
Ava
ilab
ility
(%
)
01020304050
708090
100
60
UTK
96.2
7U
CS
DU
CS
BH
arva
rd
UN
C
98.6
0
88. 6
057
.29
97.2
0
Fra
gm
ent
Ava
ilab
ility
(%
)
01020304050
708090
100
60
UTK
99.8
7U
CS
DU
CS
BH
arva
rd
UN
C
99.8
0
90.4
7
57.4
5
99.8
7
Fra
gm
ent
Ava
ilab
ility
(%
)
Depot Availability at UTK
Depot Availability at UCSD
Depot Availability at Harvard
860 Download Attempts
100% Success
857 Download Attempts
100% Success
751 Download Attempts
100% Success
Most Frequent Download Path
From UTK From Harvard
From UCSD
Replication: Experiment #2
• Deleted 12 of the 21 IBP allocations
• Downloaded from UTK
3 MB file
0
3 MB
UTK 2
UTK 5 100%
UTK 6
UTK 3
UTK 4 99.84%
UTK 1
UCSB 1 93.88%
UCSB 2
UCSB 3
UCSD 1
UCSD 3 100%
Harvard 48.24%
UNC
UTK 5 99.78%
UTK 2
UTK 5 100%
UTK 6 100%
UTK 3
UCSB 1
UCSB 2 94.69%
UCSB 3
1,225 Attempts
93.88% Success