Transcript of "The Storage Fabric of the Grid: The Network Storage Stack"

Page 1:

The Storage Fabric of the Grid: The Network Storage Stack

James S. Plank

Director: Logistical Computing and Internetworking (LoCI) Laboratory

Department of Computer Science, University of Tennessee

Cluster and Computational Grids for Scientific Computing: September 12, 2002, Le Chateau de Faverges de la Tour, France

Page 2:

Grid Research & The Fabric Layer

[Diagram: three layers (Application, Middleware, Resources), with the Resources layer labeled "The 'Fabric' Layer"]

Pages 3-4:

What is the Fabric Layer?

• Networking: TCP/IP

• Storage: Files in a file system

• Computation: Processes managed by an OS


Most Grid research accepts these as givens. (Examples: MPI, GridFTP)

Page 5:

LoCI’s Research Agenda

Redefine the fabric layer based on End-to-End Principles

[Diagram: three side-by-side protocol stacks, top to bottom]

Communication: Application, Transport, Network, Data/Link/Physical

Storage: Application, LoRS, exNode, IBP Depot, Access/Physical

Computation: Application, LoRS, exProc, IBP NFU, Access/Physical

Page 6:

What Should This Get You?

• Scalability

• Flexibility

• Fault-tolerance

• Composability

I.e., better Grids

Page 7:

LoCI Lab Personnel

Directors: Jim Plank, Micah Beck

Exec Director: Terry Moore

Grad Students: Erika Fuentes, Sharmila Kancherla, Xiang Li, Linzhen Xuan

Research Staff: Scott Atchley, Alexander Bassi, Ying Ding, Hunter Hagewood, Jeremy Millar, Stephen Soltesz, Yong Zheng

Undergrad Students: Isaac Charles, Rebecca Collins, Kent Galbraith, Dustin Parr

Page 8:

Collaborators

• Jack Dongarra (UT - NetSolve, Linear Algebra)

• Rich Wolski (UCSB - Network Weather Service)

• Fran Berman (UCSD/NPACI - Scheduling)

• Henri Casanova (UCSD/NPACI - Scheduling)

• Laurent LeFevre (INRIA/ENS - Multicast, Active Networking)

Pages 9-10:

The Network Storage Stack

[Diagram: the Network Storage Stack, top to bottom: Applications; Logistical File System; Logistical Tools; L-Bone and the exNode; IBP; Local Access; Physical]

• A Fundamental Organizing Principle

• Like the IP Stack

• Each level encapsulates details from the lower levels, while still exposing details to higher levels


Page 11:

The Network Storage Stack

The L-bone: Resource discovery & proximity queries

IBP (the Internet Backplane Protocol): Allocating and managing network storage

The exNode: A data structure for aggregation

LoRS (the Logistical Runtime System): Aggregation tools and methodologies

Page 12:

IBP: The Internet Backplane Protocol

Low-level primitives and software for:

• Managing and using state in the network.

• Inserting storage in the network so that:
  – Applications may use it advantageously.
  – Storage owners do not lose control of their resources.
  – The whole system is truly scalable and fault-tolerant.

Page 13:

The Byte Array: IBP's Unit of Storage

• You can think of it as a “buffer”.

• You can think of it as a “file”.

• Append-only semantics.

• Transience built in.

Page 14:

The IBP Client API

• Can be used by anyone* who can talk to the server.

• Seven procedure calls in three categories:
  – Allocation (1)
  – Data transfer (4)
  – Management (2)

• * not really, but close...

Page 15:

Client API: Allocation

• IBP_allocate(char *host, int maxsize, IBP_attributes attr)

• Like a network malloc()

• Returns a trio of capabilities:
  – Read / Write / Manage
  – ASCII strings (obfuscated)

• No user-defined file names:
  – Big flat name space.
  – No registration required to pass capabilities.
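A minimal C sketch of the network malloc() idea, using the signature on this slide. The header name, the shape of the returned capability set, and its field names are assumptions for illustration, not the published API:

#include <stdio.h>
#include "ibp.h"                      /* assumed header name */

int main(void)
{
    IBP_attributes attr = { 0 };      /* attribute fields (next slide) elided */

    /* Ask a depot (hypothetical hostname) for up to 1 MB of storage. */
    IBP_set_of_caps caps = IBP_allocate("depot.example.edu", 1024 * 1024, attr);

    /* The trio of capabilities: obfuscated ASCII strings that can be
       handed to anyone, with no registration step. */
    printf("read:   %s\n", caps.read_cap);
    printf("write:  %s\n", caps.write_cap);
    printf("manage: %s\n", caps.manage_cap);
    return 0;
}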

Page 16:

Allocation Attributes

• Time-Limited or Permanent

• Soft or Hard

• Read/Write semantics:
  – Byte Array
  – Pipe
  – Circular Queue

Page 17:

Client API: Data Transfer

2-party:

• IBP_store(write-cap, bytes, size, ...)
• IBP_deliver(read-cap, pointer, size, ...)

3-party:

• IBP_copy(read-cap, write-cap, size, ...)

N-party/other things:

• IBP_mcopy(...)
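A sketch of how the two- and three-party calls compose, under the same assumptions as above; the IBP_cap type name and the final "timeout" argument stand in for the details the slide elides ("..."):

#include "ibp.h"                      /* assumed header name */

/* Capabilities come from earlier IBP_allocate() calls; "timeout" is a
   hypothetical stand-in for the elided arguments. */
void move_bytes(IBP_cap read_cap, IBP_cap write_cap,
                IBP_cap dest_write_cap, int timeout)
{
    char buf[4096];

    /* 2-party: push local bytes into a byte array (appended at the end)... */
    IBP_store(write_cap, buf, sizeof(buf), timeout);

    /* ...or pull bytes from a byte array into local memory. */
    IBP_deliver(read_cap, buf, sizeof(buf), timeout);

    /* 3-party: depot-to-depot transfer; the bytes never touch the client. */
    IBP_copy(read_cap, dest_write_cap, sizeof(buf), timeout);
}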

Page 18:

IBP Client API: Management

• IBP_manage() / IBP_status()

• Allows for resizing byte arrays.

• Allows for extending/shortening the time limit on time-limited allocations.

• Manages reference counts on the read/write capabilities.

• State probing.
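The slide gives no argument lists, so the sketch below is hypothetical throughout (command constants and argument order are invented for illustration); it only shows the kinds of management operations listed:

#include "ibp.h"                      /* assumed header name */

void keep_alive(IBP_cap manage_cap)
{
    /* Extend a time-limited allocation by another day
       (hypothetical command constant and argument order). */
    IBP_manage(manage_cap, IBP_CHNG, 24 * 60 * 60);

    /* Probe the allocation's state: size, time remaining, refcounts. */
    IBP_status(manage_cap, IBP_PROBE);
}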

Page 19:

IBP Servers

• Daemons that serve local disk or memory.

• Root access not required.

• Can specify sliding time limits or revocability.

• Encourages resource sharing.

Page 20:

Typical IBP usage scenario

Page 21:

Logistical Networking Strategies

[Diagram: four sender-to-receiver scenarios (#1-#4): moving data directly across the network, or staging it through one or more intermediate IBP depots along the way]

Page 22:

XSufferage on MCell/APST

[Diagram: the APST Client and APST Daemon coordinating NetSolve+IBP at the University of Tennessee, Knoxville; GRAM+GASS at the University of California, San Diego; and NetSolve+NFS plus NetSolve+IBP at the Tokyo Institute of Technology]

[(NetSolve+IBP) + (GRAM+GASS) + (NetSolve+NFS)] + NWS

Page 23:

MCell/APST Experimental Results

Experimental setting: an MCell simulation with 1,200 tasks:
• composed of 6 Monte-Carlo simulations
• input files: 1, 20, 100 MB

4 scenarios. Initially:
(a) all input files are only in Japan
(b) 100 MB files staged in California
(c) in addition, one 100 MB file staged in Tennessee
(d) all input files replicated everywhere

[Plot: turnaround times under the workqueue and XSufferage schedulers for each scenario]

Scheduling heuristics match data and tasks to appropriate locations:
- Automatic staging with IBP is effective
- Improved overall performance

Page 24:

The Network Storage Stack

The L-bone: Resource discovery & proximity queries

IBP: Allocating and managing network storage (like a network malloc)

The exNode: A data structure for aggregation

LoRS (the Logistical Runtime System): Aggregation tools and methodologies

Page 25:

The Logistical Backbone (L-Bone)

• LDAP-based storage resource discovery.

• Query by capacity, network proximity, geographical proximity, stability, etc.

• Periodic monitoring of depots.

• Uses the Network Weather Service (NWS) for live measurements and forecasting.
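A hypothetical sketch of what such a query looks like from a client; every name below is illustrative rather than the published L-Bone API. The point is that clients ask for depots by requirements (capacity, proximity) instead of naming depots directly:

#include <stdio.h>
#include "lbone_client.h"             /* assumed header name */

int main(void)
{
    lbone_request req;                /* illustrative type and fields     */
    req.num_depots = 3;               /* want three candidate depots      */
    req.stable_mb  = 50;              /* each with 50 MB of stable space  */
    req.location   = "zip= 37996";    /* geographic proximity: Knoxville  */

    /* The L-Bone server consults its directory plus NWS measurements. */
    lbone_depot *depots = lbone_get_depots("lbone.example.edu", req);

    for (int i = 0; depots != NULL && depots[i].hostname != NULL; i++)
        printf("candidate depot: %s\n", depots[i].hostname);
    return 0;
}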

Page 26:

Snapshot: August, 2002

Approximately 1.6 TB of publicly accessible storage. (Scaling to a petabyte someday…)

Page 27:

The Network Storage Stack

The L-bone: Resource discovery & proximity queries

IBP: Allocating and managing network storage (like a network malloc)

The exNode: A data structure for aggregation

LoRS (the Logistical Runtime System): Aggregation tools and methodologies

Page 28:

The exNode

• The network "file" pointer.

• Analogous to the Unix inode.

• Maps byte-extents to IBP buffers (or other allocations).

• XML-based data structure/serialization.

• Allows for replication and flexible decomposition of data.

• Also allows for "end-to-end services."

• Arbitrary metadata.
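As a sketch of the aggregation idea (illustrative C, not the exNode library's actual types): an exNode is essentially a list of byte-extent-to-capability mappings, where overlapping extents are replicas and each mapping can carry its own metadata:

/* Illustrative only: models what an exNode records, per this slide. */
typedef struct {
    long  offset;              /* starting byte of the extent in the file */
    long  length;              /* number of bytes covered                 */
    char *read_capability;     /* IBP read capability (ASCII string)      */
    char *metadata;            /* arbitrary per-mapping metadata, e.g. an
                                  MD5 checksum for end-to-end checks      */
} exnode_mapping;

typedef struct {
    long            file_size; /* logical length of the network "file"    */
    int             n_mappings;
    exnode_mapping *mappings;  /* overlapping extents are replicas        */
} exnode;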

Page 29:

The exNode (XML-based)

[Diagram: bytes 0-300 of a logical file, with extent boundaries at 100 and 200, mapped onto allocations on IBP depots A, B, and C across the network]

Page 30:

The Network Storage Stack

The L-bone: Resource discovery & proximity queries

IBP: Allocating and managing network storage (like a network malloc)

The exNode: A data structure for aggregation

LoRS (the Logistical Runtime System): Aggregation tools and methodologies

Page 31:

Logistical Runtime System

• Aggregation for:
  – Capacity
  – Performance (striping)
  – More performance (caching)
  – Reliability (replication)
  – More reliability (ECC)
  – Logistical purposes (routing)

Page 32:

Logistical Runtime System

• Basic Primitives:

– Upload: Create a network file from local data.

– Download: Get bytes from a network file.

– Augment: Add more replicas to a network file.

– Trim: Remove replicas from a network file.

– Stat: Get information about the network file.

– Refresh: Alter the time limits of the IBP buffers.
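A hypothetical sketch of the primitives in sequence; the function names are illustrative only (LoRS also exposes these operations as command-line tools):

#include "lors.h"                     /* assumed header name */

int main(void)
{
    /* Upload: cut local data into IBP allocations, get back an exNode
       recording where every byte extent lives. */
    exnode *xnd = lors_upload("simulation.dat");

    /* Augment: add a replica of each extent on additional depots. */
    lors_augment(xnd, "zip= 37996");  /* hypothetical proximity hint */

    /* Refresh: extend the time limits on the underlying IBP buffers. */
    lors_refresh(xnd, 24 * 60 * 60);

    /* Download: reassemble the bytes, failing over among replicas. */
    lors_download(xnd, "simulation.copy.dat");

    /* Trim: drop replicas (e.g., dead capabilities) from the exNode. */
    lors_trim(xnd);
    return 0;
}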

Page 33:

Upload

Page 34:

Augment to Tennessee

Page 35:

Augment to Santa Barbara

Page 36:

Stat (ls)

Page 37:

Failures do happen.

Page 38:

Download

Page 39:

Trimming (dead capability removal)

Page 40:

End-To-End Services:

• MD5 Checksums stored per exNode block to detect corruption.

• Encryption is a per-block option.

• Compression is a per-block option.

• Parity/Coding is in the design.
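A small sketch of the checksum service: record an MD5 digest per mapped block at upload time, then re-check after download. This uses OpenSSL's MD5() for illustration; how LoRS actually stores the digest is not shown on the slide:

#include <string.h>
#include <openssl/md5.h>

/* Returns 1 if a downloaded block matches the digest recorded in the
   exNode's metadata, 0 if it was corrupted in storage or in transit. */
int block_ok(const unsigned char *block, size_t len,
             const unsigned char expected[MD5_DIGEST_LENGTH])
{
    unsigned char digest[MD5_DIGEST_LENGTH];
    MD5(block, len, digest);
    return memcmp(digest, expected, MD5_DIGEST_LENGTH) == 0;
}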

Page 41:

Parity / Coding

[Diagram: an exNode with coding. IBP buffers across the network hold the data blocks plus two coded blocks: one is the sum of the three data blocks, the other is the first block plus 2 times the second plus 3 times the third]
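A sketch of why the two coded blocks in the figure buy fault tolerance, assuming block arithmetic is done in a field where the coefficients are invertible (e.g., GF(2^8); the field choice is an assumption here, not stated on the slide). With data blocks d1, d2, d3, the coded blocks are:

c1 = d1 + d2 + d3
c2 = d1 + 2*d2 + 3*d3

If, say, d2 and d3 are both lost, the survivors d1, c1, c2 leave a solvable 2x2 system: d2 + d3 = c1 - d1 and 2*d2 + 3*d3 = c2 - d1, so d3 = (c2 - d1) - 2*(c1 - d1) and then d2 = (c1 - d1) - d3. The other two-block loss patterns solve the same way, so any two of the three data blocks can be rebuilt from what remains.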

Page 42:

Scalability

• No bottlenecks

• Really hard problems are left unsolved, but for the most part the lower levels shouldn't need changing:
  – Naming
  – Good scheduling
  – Consistency / file system semantics
  – Computation

Page 43:

Status

[Diagram: the Network Storage Stack, top to bottom: Applications; Logistical File System; Logistical Tools; L-Bone and the exNode; IBP; Local Access; Physical]

• IBP/L-Bone/exNode/Tools all supported.

• Apps: Mail, IBP-ster, Video IBP-ster, IBPvo -- demo at SC-02

• Other institutions (see L-Bone)

Page 44:

What’s Coming Up?

• More nodes on the L-Bone

• More collaboration with applications groups

• Research on performance and scheduling

• Logistical File System

• A Computation Stack

• Code / Information at loci.cs.utk.edu

Page 45:

The Storage Fabric of the Grid: The Network Storage Stack

James S. Plank

Director: Logistical Computing and Internetworking (LoCI) Laboratory

Department of Computer Science, University of Tennessee

Cluster and Computational Grids for Scientific Computing: September 12, 2002, Le Chateau de Faverges de la Tour, France

Page 46:

Replication: Experiment #1

[Diagram: depots at UCSB, UCSD, TAMU, UTK, UNC, Harvard, Turin (IT), and Stuttgart (DE). A 3 MB file (bytes 0 to 3 MB) is broken into extents and spread, with replication, over 21 IBP allocations at UTK (1-6), UCSB (1-3), UCSD (1, 3), Harvard, and UNC]

Page 47:

Replication: Experiment #1

[Bar charts: fragment availability (%) of the allocations at each depot site, measured from three download locations]

Measured from   UTK      UCSD     UCSB     Harvard   UNC      Downloads       Success
UTK             99.85%   99.71%   95.31%   59.77%    99.88%   860 attempts    100%
UCSD            96.27%   98.60%   88.60%   57.29%    97.20%   857 attempts    100%
Harvard         99.87%   99.80%   90.47%   57.45%    99.87%   751 attempts    100%

Page 48:

Most Frequent Download Path

[Maps: the most frequent download path as observed from UTK, from Harvard, and from UCSD]

Page 49:

Replication: Experiment #2

• Deleted 12 of the 21 IBP allocations

• Downloaded from UTK

[Diagram: the same 3 MB file (bytes 0 to 3 MB). Measured availability of the nine surviving allocations: UTK 5: 100%, UTK 4: 99.84%, UCSB 1: 93.88%, UCSD 3: 100%, Harvard: 48.24%, UTK 5: 99.78%, UTK 5: 100%, UTK 6: 100%, UCSB 2: 94.69%]

1,225 download attempts, 93.88% success.