GridFTP: File Transfer Protocol in Grid Computing Networks

Caitlin Minteer

Agenda

• Grid Computing

• Globus Toolkit

• GridFTP

• Advantages of GridFTP

• Disadvantages of GridFTP

• Using GridFTP

• GridFTP Clients

Grid Computing and the Globus Toolkit

• Grid computing is an emerging networking infrastructure designed to offer access to computational, data, and human resources spread over wide-area environments.

• The Globus Toolkit is an open-source toolkit for building computing grids.

GridFTP

• GridFTP is a file transfer protocol for grid computing networks:

– high-performance

– secure

– reliable

– optimized for high-bandwidth wide-area networks

• Based on the Internet FTP protocol

• Implements extensions for high-performance operation

GridFTP

• Provides:

– a highly extensible server

– a scriptable command-line client

– a set of development libraries for custom solutions

Advantages of GridFTP

• Security

• Parallel Streams

• Striping

• Partial File Transfer

• Reliable and Restartable data transfer

• Data Extensibility

• Protocol Extensibility

GridFTP: Security

• Authentication of users or services

• Protecting communications

• Determining authorization

• Managing user credentials and maintaining group membership information

• The Globus GridFTP server and client use the Grid Security Infrastructure (GSI) protocol, which allows a secure Public Key Infrastructure (PKI) interface and adds the capability of delegated authority through certificates.

Parallel Streams

• GridFTP supports multiple TCP streams in parallel between a single source and destination.

• This can improve aggregate bandwidth compared with what a single stream achieves.
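
The idea of several streams carrying one file can be sketched as a toy model in Python. This is only an illustration: threads stand in for TCP streams, an in-memory buffer stands in for the network, and the names `split_ranges` and `parallel_copy` are hypothetical. It does not reproduce GridFTP's wire format.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, streams):
    """Divide [0, size) into `streams` contiguous (offset, length) ranges."""
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        # The first `extra` ranges get one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def parallel_copy(source, streams=4):
    """Copy `source` into a new buffer using one worker per range."""
    dest = bytearray(len(source))

    def send(byte_range):
        offset, length = byte_range
        # Each worker stands in for one TCP stream carrying its own range.
        dest[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=streams) as pool:
        list(pool.map(send, split_ranges(len(source), streams)))
    return bytes(dest)


data = bytes(range(256)) * 10
copied = parallel_copy(data, streams=4)
```

Because every range carries its own offset, the pieces can arrive in any order and still land in the right place, which is what lets the streams run independently.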

Striping

• Striping: several network endpoints at the source, destination, or both participate in the transfer of the same file.

• Typically done with a cluster that has a parallel shared file system.

• Each node in the cluster reads a section of the file and sends it over the network.

• Striping and parallelism may be used together, with more than one TCP stream open between each of the servers participating in the transfer.
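
One way to picture how sections of a file might be divided among cluster nodes is the sketch below. The round-robin assignment, the block size, and the function name `stripe_layout` are assumptions made for illustration; the slides only state that each node reads a section of the file.

```python
def stripe_layout(file_size, block_size, nodes):
    """Map each node index to the (offset, length) blocks it would send.

    Blocks are handed out round-robin, which is an assumed policy.
    """
    layout = {node: [] for node in range(nodes)}
    for index, offset in enumerate(range(0, file_size, block_size)):
        # The final block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        layout[index % nodes].append((offset, length))
    return layout


layout = stripe_layout(file_size=10, block_size=4, nodes=2)
```

With these inputs, node 0 would send the blocks at offsets 0 and 8 and node 1 the block at offset 4. Each node could in turn open several parallel streams for its own blocks.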

Partial File Transfer

• Partial file access: regions of a file may be accessed by specifying an offset into the file and the length of the block desired.

• GridFTP supports this by letting the client specify the byte position in the file at which the transfer begins.
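
The offset-plus-length idea can be shown with an ordinary local file. This is a minimal sketch of the concept only, not of how a GridFTP server implements it; `read_region` is a hypothetical helper name.

```python
import os
import tempfile


def read_region(path, offset, length):
    """Return up to `length` bytes of the file starting at byte `offset`."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


# Write a small file, then fetch 4 bytes starting at offset 7.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello, grid world")
region = read_region(tmp.name, offset=7, length=4)
os.remove(tmp.name)
```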

Third Party Control and Reliable/Restartable Data Transfer

• To enable reliability, the GridFTP server automatically sends restart markers (checkpoints) to the client.

• If the transfer has a fault, the client may restart the transfer and provide the markers received.

• The server will restart the transfer, picking up where it left off based on the markers.

• The Reliable File Transfer (RFT) service goes one step further by providing a service interface (similar to job submission) and writing the restart markers to a database so that it can survive a local fault.

• Clients can also act as a third party, initiating transfers between remote sites.
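
The restart-marker flow described above can be simulated in a few lines. This is a sketch under simplifying assumptions: markers are modelled as a set of block offsets already received, the fault is injected artificially, and `transfer` is a hypothetical function rather than any Globus API.

```python
def transfer(source, dest, markers, block_size=4, fail_after=None):
    """Copy blocks not yet covered by `markers`; optionally fail part-way.

    `markers` holds the offsets of blocks already received (the checkpoints).
    Returns the number of blocks sent during this call.
    """
    sent = 0
    for offset in range(0, len(source), block_size):
        if offset in markers:
            continue  # delivered before the fault, so skip it
        if fail_after is not None and sent == fail_after:
            raise ConnectionError("simulated fault")
        dest[offset:offset + block_size] = source[offset:offset + block_size]
        markers.add(offset)  # record a restart marker for this block
        sent += 1
    return sent


data = b"0123456789"
dest = bytearray(len(data))
markers = set()

try:
    transfer(data, dest, markers, fail_after=2)  # fault after two blocks
except ConnectionError:
    pass

resent = transfer(data, dest, markers)  # restart with the markers received
```

In this model the markers live only in the client's memory, so losing the client loses them too. Writing them to a database, as RFT does, is what lets a restart survive that kind of local fault.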

Data Extensibility

• The Data Storage Interface (DSI) module knows how to read and write to the local storage system and can optionally transform the data.

• It completely abstracts away the underlying storage.
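
The abstraction can be illustrated with a small Python interface. The real DSI is a C module interface in the Globus Toolkit; every class and function name below is hypothetical and exists only to show the idea of a server that talks to storage through a narrow interface, with an optional transformation in between.

```python
from abc import ABC, abstractmethod


class StorageInterface(ABC):
    """Hypothetical backend interface; not the real Globus DSI C API."""

    @abstractmethod
    def read(self, name):
        """Return the stored bytes for `name`."""

    @abstractmethod
    def write(self, name, data):
        """Store `data` under `name`."""


class MemoryStorage(StorageInterface):
    """Keeps files in a dict, standing in for a local storage system."""

    def __init__(self):
        self._files = {}

    def read(self, name):
        return self._files[name]

    def write(self, name, data):
        self._files[name] = data


class ReversingStorage(StorageInterface):
    """Wraps another backend and transforms data on the way in and out."""

    def __init__(self, inner):
        self._inner = inner

    def read(self, name):
        return self._inner.read(name)[::-1]

    def write(self, name, data):
        self._inner.write(name, data[::-1])


def put_then_get(storage, name, data):
    """Server-side logic that sees only the interface, never the backend."""
    storage.write(name, data)
    return storage.read(name)


inner = MemoryStorage()
round_trip = put_then_get(ReversingStorage(inner), "f", b"abc")
```

The caller gets back the bytes it wrote, while the wrapped backend holds the transformed form. Swapping the backend requires no change to `put_then_get`.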

Protocol Extensibility

• Globus XIO is the eXtensible Input/Output (XIO) framework in the Globus Toolkit.

• It provides a simple abstraction layer over runtime-loadable I/O implementations.

• The system uses a read, write, open, close abstraction that Globus GridFTP is able to leverage in order to be transport-protocol agnostic.

• Therefore, protocols much more aggressive than TCP can be used. To meet more specific extensibility needs, the toolkit also provides easy-to-use development libraries.

Disadvantages of GridFTP

• The client needs to remain active at all times until the transfer finishes, so when the client state is lost, the rich set of recovery features of GridFTP cannot be used.

• In the event of client state loss, the transfer has to restart from scratch.

• GridFTP’s many features are tied to its protocol and implementation. Reimplementation and re-engineering would be required to provide these features to other file transfer services.

• Several memory leaks, unclear error responses, and bugs have caused many issues in the use of GridFTP.

Using GridFTP: Put, Get, & Third Party

• “Putting” – move a file from one system to a server:

globus-url-copy -vb -tcp-bs 2097152 -p 4 file:///filename gsiftp://hostnameofserver/filename

• “Getting” – move a file from the server to the local machine:

globus-url-copy -vb -tcp-bs 2097152 -p 4 gsiftp://hostnameofserver/filename file:///filename

• Third-party transfers – move a file between two GridFTP servers:

globus-url-copy -vb -tcp-bs 2097152 -p 4 gsiftp://othermachinehostname/filename gsiftp://localhostname/filename
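
Since globus-url-copy is scriptable, the three invocations above differ only in their source and destination URLs, and a script could assemble them from one helper. The sketch below only builds and prints the argument list and does not run the client. The helper name `build_copy_command` is hypothetical; the flags and placeholder URLs are the ones shown on this slide.

```python
import shlex


def build_copy_command(src_url, dst_url, streams=4, tcp_buffer=2097152):
    """Return the argv list for a globus-url-copy invocation."""
    return [
        "globus-url-copy",
        "-vb",                       # show bytes transferred and rate
        "-tcp-bs", str(tcp_buffer),  # TCP buffer size in bytes
        "-p", str(streams),          # number of parallel streams
        src_url,
        dst_url,
    ]


put = build_copy_command("file:///filename",
                         "gsiftp://hostnameofserver/filename")
get = build_copy_command("gsiftp://hostnameofserver/filename",
                         "file:///filename")
third_party = build_copy_command("gsiftp://othermachinehostname/filename",
                                 "gsiftp://localhostname/filename")

print(shlex.join(put))
```

Passing a list like `put` to `subprocess.run` on a machine with the Globus client installed and a valid proxy would perform the transfer. Building the list rather than a single string avoids shell-quoting problems with unusual file names.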

GridFTP Clients

• globus-url-copy – the provided scriptable command-line client

– Easy to use

– Can access multiple protocols, specified in the URL

• To use globus-url-copy, a certificate must be obtained.

• Then a temporary proxy must be generated.

• Globus does not provide an interactive client for GridFTP, either GUI or text based.

• Regular FTP clients will work with GridFTP but will not take advantage of all the features of GridFTP.

• UberFTP is the first interactive, GridFTP-enabled FTP client. It supports:

– GSI authentication

– parallel data channels

– third-party transfers

Summary

• Grid Computing

• Globus Toolkit

• GridFTP

• Advantages of GridFTP

– Security

– Parallel Streams

– Striping

– Partial File Transfer

– Reliable and Restartable data transfer

– Data Extensibility

– Protocol Extensibility

• Disadvantages of GridFTP

• Using GridFTP

• GridFTP Clients