The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj...

32
The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu 2 Ian Foster 1,2 1 Math & Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A. 2 Dept of Computer Science, University of Chicago, Chicago, IL 60615, U.S.A.

Transcript of The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj...

Page 1: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

The Globus Striped GridFTP Framework and Server

Bill Allcock1 (presenting) John Bresnahan1

Raj Kettimuthu1 Mike Link2 Catalin Dumitrescu2 Ioan Raicu2 Ian Foster1,2

1 Math & Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A.

2 Dept of Computer Science, University of Chicago, Chicago, IL 60615, U.S.A.

Page 2: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Introduction Problem we are addressing A brief discussion of the GridFTP Protocol Design / Architecture of our implementation Performance results

Page 3: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Technology Drivers Internet revolution: 100M+ hosts

Collaboration & sharing the norm Universal Moore’s law: x103/10 yrs

Sensors as well as computers Petascale data tsunami

Gating step is analysis & our old infrastructure?

114 genomes735 in progress

You are here

Page 4: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

What issues are we addressing?

Striping storage systems are often clusters, and we need

to be able to utilize all of that parallelism Collective Operations

essentially, the striping should be invisible to the outside world

Uniform interface Ideally, any data source can be treated the

same way

Page 5: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

What issues are we addressing? Network Protocol Independence

TCP has well known issues with high Bandwidth-Delay Product networks

Need to be able to take advantage of aggressive protocols on circuits.

Diverse Failure Modes Much happening under the covers, so must be

resilient to failures End-to-End Performance

We need to be able to manage performance for a wide range of resources

Page 6: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

The GridFTP Protocol

Page 7: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

What is GridFTP? A secure, robust, fast, efficient, standards based,

widely accepted data transfer protocol A Protocol

Multiple Independent implementation can interoperate This works. Fermi Lab has an implementation with their

DCache system and U. Virginia has a .Net implementation that work with ours.

Lots of people have developed clients independent of the Globus Project.

The Globus Toolkit supplies a reference implementation: Server Client tools (globus-url-copy) Development Libraries

Page 8: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

GridFTP: The Protocol

Existing standards RFC 959: File Transfer Protocol RFC 2228: FTP Security Extensions RFC 2389: Feature Negotiation for the File

Transfer Protocol Draft: FTP Extensions GridFTP: Protocol Extensions to FTP for the Grid

Grid Forum Recommendation GFD.20 http://www.ggf.org/documents/GWD-R/GFD-R.020.pdf

Page 9: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

What did the GridFTP protocol add? Extended Block Mode

data is sent in “packets” with a header containing a 64 bit offset and length

allows out-of-order reception of packets Restart and Performance Markers

allows for robust restart and perf monitoring SPAS/SPOR

striped PASV and striped PORT allows a list of IP/ports to be returned

Page 10: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

What did the GridFTP Protocol add? Data Channel Authentication

Needed since in third party transfer, you don’t know who will connect to the listener.

ESTO/ERET allows for additional processing on the data prior

to storage/transmission We use this for partial file transfers

SBUF/ABUF manual and automatic TCP buffer tuning

Options to set parallelism/striping parameters

Page 11: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

1Establish control connection

2Establish data connection

Client

Server Server Server

Client

1Establish control connection

2Establish control connection

3Establish data connection

Client Server Model 3rd Party Transfer Model

Client/Server vs 3rd Party

Page 12: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Parallelism vs Striping

Page 13: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Architecture / Design of our Implementation

Page 14: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Data Channel

Server PI

DTP

Description of transfer: completely server-internal communication. Protocol is unspecified and left up to the implementation.

Server PI

DTP

Internal IPC API Internal IPC API

Client PIInfo on transfer: restart markers, performance markers, etc. Server PI optionally processes these, then sends them to the client PI

ControlChannels

Overall Architecture

Page 15: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Possible Configurations

Control

Data

Typical Installation

Control

Data

Separate Processes

Striped Server

Control

Data

Control

Striped Server (future)

Data

Page 16: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Data Storage Interface

Data Processing Module

Data Channel Protocol Module

Datasourceor sink

Data channel

Data Transfer Processor

Page 17: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Data Storage Interface This is a very powerful abstraction Several can be available and loaded

dynamically via the ERET/ESTO commands Anything that can implement the interface

can be accessed via the GridFTP protocol We have implemented

POSIX file (used for performance testing) HPSS (tape system; IBM) Storage Resource Broker (SRB; SDSC) NeST (disk space reservation; UWis/Condor)

Page 18: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Extensible IO (XIO) system Provides a framework that implements a

Read/Write/Open/Close Abstraction Drivers are written that implement the functionality (file,

TCP, UDP, GSI, etc.) Different functionality is achieved by building protocol

stacks GridFTP drivers will allow 3rd party applications to easily

access files stored under a GridFTP server Other drivers could be written to allow access to other

data stores. Changing drivers requires minimal change to the

application code. Ported GridFTP to use UDT in less than a day

AFTER the UDT driver was written

Page 19: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Network ProtocolNetwork

Protocol

Globus XIO Approach

ApplicationDisk

Network Protocol

Special Device

Glo

bus

XIO

Driver

Driver

Driver

Page 20: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Globus XIO Framework

Moves the data from user to driver stack.

Manages the interactions between drivers.

Assist in the creation of drivers.

Asynchronous support. Close and EOF Barriers. Error checking Internal API for passing

operations down the stack.

User API

Fram

ework

Driver Stack

Transform

Transform

Transport

Page 21: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

What issues are we addressing?

Striping storage systems are often clusters, and we need

to be able to utilize all of that parallelism Collective Operations

essentially, the striping should be invisible to the outside world

Uniform interface Ideally, any data source can be treated the

same way

Page 22: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

What issues are we addressing? Network Protocol Independence

TCP has well known issues with high Bandwidth-Delay Product networks

Need to be able to take advantage of aggressive protocols on circuits.

Diverse Failure Modes Much happening under the covers, so must be

resilient to failures End-to-End Performance

We need to be able to manage performance for a wide range of resources

Page 23: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Performance Results

Page 24: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Comparison in Stream Mode

Throughput comparison on a LAN (0.2 ms RTT and 612 Mbps bandwidth)

0

20

40

60

80

100

120

140

160

1M 10M 100M 1000M

File size

Thro

ughp

ut in

mbp

s

NCFTP

ZEBRA

WUFTP

Throughput comparison on a WAN (60 ms RTT and 1 Gbps bandwidth)

0

20

40

60

80

100

120

140

160

180

1M 10M 100M 1000M

File size

Thro

ughp

ut in

mbp

s

NCFTP

ZEBRA

WUFTP

Page 25: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Parallel Stream PerformanceThroughput on a LAN (0.2 ms RTT and 612 Mbps bandwidth)

0

100

200

300

400

500

600

700

0 5 10 15 20 25 30 35

Number of Streams

Th

rou

gh

pu

t in

mb

ps

iperf

zebra mem

zebra disk

bonnie

Throughput on DOT cluster (2.2 ms RTT and 1 Gbps bandwidth)

0

100

200

300

400

500

600

700

800

900

1000

0 5 10 15 20 25 30 35

Number of Streams

Th

rou

gh

pu

t in

mb

ps

iperf

zebra mem

zebra disk

bonnie

0

500

1000

1500

2000

2500

3000

3500

4000

4500

0 5 10 15 20 25 30 35

Streams

See

k co

un

t

Zebra (disk-disk)

Bandwidth vs Streams

0

200

400

600

800

1000

1200

1400

0 5 10 15 20 25 30 35

Streams

Ba

nd

wid

th (

in M

bp

s)

( Iperf)memory-memory

( Zebra)memory-memory

( Zebra)disk-disk

Bonnie

Page 26: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Memory to MemoryStriping Performance

BANDWIDTH Vs STRIPING

0

5000

10000

15000

20000

25000

30000

0 10 20 30 40 50 60 70

Degree of Striping

Ban

dw

idth

(M

bp

s)

# Stream = 1 # Stream = 2 # Stream = 4 # Stream = 8 # Stream = 16 # Stream = 32

Page 27: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Disk to Disk Striping PerformanceBANDWIDTH Vs STRIPING

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

20000

0 10 20 30 40 50 60 70

Degree of Striping

Ban

dw

idth

(M

bp

s)

# Stream = 1 # Stream = 2 # Stream = 4 # Stream = 8 # Stream = 16 # Stream = 32

Page 28: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Storage Performance SDSC

0

200

400

600

800

1000

1200

1400

1600

1800

0 1 2 3 4 5 6 7 8 9

Number of nodes

Dis

k t

hro

ug

hp

ut

(in

Mb

ps)

Write

Read

Page 29: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Storage Performance NCSA

0

200

400

600

800

1000

1200

1400

1600

1800

0 1 2 3 4 5 6 7 8 9

Number of nodes

Dis

k th

rou

gh

pu

t (i

n M

bp

s)

Write

Read

Page 30: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Scalability ResultsGridFTP Server Performance

Upload 10MB file from 1800 clients to ned-6.isi.edu:/dev/null

0

100

200

300

400

500

600

700

800

900

1000

1100

1200

1300

1400

1500

1600

1700

1800

1900

2000

0 500 1000 1500 2000 2500 3000 3500

Time (sec)

# o

f Co

ncu

ren

t Ma

chin

es

/ Re

spo

nse

Tim

e (

sec)

0

10

20

30

40

50

60

70

80

90

100

CP

U %

/ Th

rou

gh

pu

t (M

B/s

))

Throughput

Response Time

Load

CPU %

Memory Used

Page 31: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Summary

The GridFTP protocol provides a good set of features for data movement requirements in the Grid.

The Globus Striped Server (Zebra) implementation of this protocol provides a flexible design / architecture for integrating with different communities, storage systems, and protocols.

Our implementation is robust and performant over a range of environments.

Page 32: The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.

Questions?