10 May 2007 HTTP - www.gridsite.org - Andrew.McNab@manchester.ac.uk

User data via HTTP(S)

Andrew McNab

University of Manchester

Outline

• Protocol structure

• Advantages

• “Missing” features

• Performance

• APIs

• SlashGrid

• Summary

Protocol structure

• HTTP uses a single control and data channel

– cf separate control channel in (Grid)FTP

• Multiple requests can be sent down the same TCP connection

– Each request starts with a block of headers, giving the request URI, cookies etc.

– No need to rebuild TCP connection: more requests are cheap

– RFCs define many headers: partial fetches, redirections etc.

• HTTPS puts HTTP inside an encrypted SSL/TLS stream

– SSL session reuse avoids the need to rebuild the SSL context even if the TCP connection is closed
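The point about cheap repeated requests can be illustrated with a minimal sketch using only the Python standard library. The toy handler `EchoPathHandler`, the helper `fetch_many` and the file names are hypothetical, chosen for illustration; a local throwaway server stands in for a real data server. The handler counts how many TCP connections it accepts, so the reuse is visible.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    """Toy handler (hypothetical): replies with the requested path."""

    protocol_version = "HTTP/1.1"  # allows persistent (keep-alive) connections
    connections = 0                # number of TCP connections accepted

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        super().setup()
        type(self).connections += 1

    def do_GET(self):
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_many(host, port, paths):
    """Send several GET requests down one HTTPConnection object."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # The body must be read fully before the next request is sent.
            results.append((response.status, response.read().decode("ascii")))
    finally:
        conn.close()
    return results


def demo():
    """Fetch three paths from a local server; return results and connection count."""
    EchoPathHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPathHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        results = fetch_many("127.0.0.1", port, ["/a.dat", "/b.dat", "/c.dat"])
    finally:
        server.shutdown()
        server.server_close()
    return results, EchoPathHandler.connections
```

Calling `demo()` returns the three responses together with the number of connections the server saw. For HTTPS, `http.client.HTTPSConnection` can be used in the same way.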

Advantages

• Simple protocol, with independent TCP connections

– No special effort needed for firewalls

• Clients exist in almost all languages / environments

• Very good quality implementations due to the Web

– eg Apache with a huge developer community

• Integrates seamlessly into Web portals

– eg poster 58, with HTTPS added to DPM

• Reuses users' “common knowledge” about the Web

What's missing?

• GSI Proxies?

– Clients can use GSI proxies without modification.

– GridSite adds GSI proxy support to Apache webserver.

• Third-party transfers?

– COPY is defined in WebDAV RFC

– Implemented by GridSite using cookies (“onetime passcodes”)

• Multichannel / parallel transfers?

– Make parallel partial requests for blocks of a file

– Apache supports these partial requests out of the box
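As a rough sketch of the parallel partial-request idea: a client can split a file of known size into byte ranges, request each block with a standard `Range` header, and join the blocks in order. The function names and the `fetch_block` callable below are hypothetical; `fetch_block(first, last)` stands for whatever code performs one ranged GET and returns the bytes of that block.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size, parts):
    """Split a file of `size` bytes into up to `parts` inclusive (first, last) ranges."""
    if size <= 0 or parts <= 0:
        raise ValueError("size and parts must be positive")
    parts = min(parts, size)
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first blocks
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(first, last):
    """Header for one partial request; byte positions are inclusive."""
    return {"Range": f"bytes={first}-{last}"}


def parallel_fetch(fetch_block, size, parts):
    """Fetch all blocks concurrently and reassemble them in file order.

    fetch_block is a caller-supplied (hypothetical) callable taking
    (first, last) and returning the bytes of that block.
    """
    ranges = byte_ranges(size, parts)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # map() yields results in the order of `ranges`, so a plain join is enough.
        blocks = pool.map(lambda r: fetch_block(*r), ranges)
        return b"".join(blocks)
```

A server that honours the request answers each block with status 206 (Partial Content); a client should check for that status, since a server that ignores `Range` returns the whole file with status 200.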

“GridHTTP”

• A profile for using HTTP in a Grid environment

– Doesn't define any new headers etc.

• Clients use GSI Proxies over HTTPS to authenticate

• Can request an HTTP data transfer, using “Upgrade” header.

– Server may redirect to an HTTP version of the file

– Includes a onetime passcode HTTP cookie in the response

• Client makes an HTTP GET request using the passcode cookie

– Naïve clients like curl respond this way automatically!

• For third party transfers, instead of GET, client issues COPY to destination site with passcode, so it can pull the file instead.
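The onetime passcode step can be pictured with a small server-side sketch. This is an illustration under stated assumptions, not GridSite's implementation: the class `OnetimePasscodes`, the cookie name `ONETIME_PASSCODE` and the in-memory storage are all hypothetical, and a real server would also need expiry and persistent state.

```python
import secrets

COOKIE_NAME = "ONETIME_PASSCODE"  # hypothetical name, for illustration only


class OnetimePasscodes:
    """In-memory store of passcodes, each valid once for one file path."""

    def __init__(self):
        self._pending = {}  # passcode -> path it was issued for

    def issue(self, path):
        """Called after the client has authenticated over HTTPS."""
        passcode = secrets.token_urlsafe(32)
        self._pending[passcode] = path
        return passcode

    def redeem(self, passcode, path):
        """Return True the first time a matching passcode is presented, then discard it."""
        if self._pending.get(passcode) != path:
            return False
        del self._pending[passcode]
        return True


def redirect_headers(http_url, passcode):
    """Headers for the HTTPS response that redirects to the plain HTTP copy."""
    return [
        ("Location", http_url),
        ("Set-Cookie", f"{COOKIE_NAME}={passcode}; Path=/"),
    ]
```

In this sketch, the HTTPS side calls `issue()` and sends the redirect with the cookie; the HTTP side calls `redeem()` with the cookie value from the follow-up GET and serves the file only if it returns `True`.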

Firewalls

• Some sites block outgoing port 80 (HTTP) and port 443 (HTTPS)

– Risk of denial of service attacks on mainstream Web sites?

• Some sites also use transparent HTTP caches on port 80 to reduce interactive web traffic

– This is decreasing due to “Web 2.0” and uncacheable pages

• To sidestep this, we advocate using two unused, reserved ports:

– Port 488 (“gss-http”) for HTTPS

– Port 777 (“multiling-http”) for HTTP

• Apache virtual hosts can readily listen on multiple ports:

– <VirtualHost host.domain:488 host.domain:443>
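On the client side, using the alternative ports only means putting the port number in the URL. A minimal sketch, assuming the server really does listen on ports 488 and 777 as advocated above; the helper name `use_alternative_port` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Ports advocated in the slide above: 488 for HTTPS, 777 for HTTP.
ALTERNATIVE_PORTS = {"https": 488, "http": 777}


def use_alternative_port(url):
    """Add the alternative port to a URL that does not already name a port."""
    parts = urlsplit(url)
    port = ALTERNATIVE_PORTS.get(parts.scheme)
    if port is None or parts.port is not None:
        return url  # unknown scheme, or the caller chose a port explicitly
    netloc = f"{parts.netloc}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, `use_alternative_port("https://node42.site.name/dir/file.dat")` gives `https://node42.site.name:488/dir/file.dat`, which any standard HTTP(S) client accepts.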

Performance (1)

Mean of 5 * 100MB from Manchester to 21 EGEE sites

[Plot: (mean GridHTTP time / mean GridFTP time) vs mean GridHTTP time. Annotations: 960s is ncp.edu.pk; 9s is man.ac.uk]

Performance (2)

Mean of 5 * 100MB from Manchester to 17 EGEE sites

[Plot: (mean GridHTTP time / mean HTTPS time) vs mean GridHTTP time. Annotations: 500s is indiacms.res.in; 10s is man.ac.uk]

APIs

• Languages / environments a big issue for new applications

– ie not everyone uses C++ !

• Many environments have “native” HTTP(S) support

– eg libxml, ROOT, PHP, Java, Gnome

• (Virtually) all languages have HTTP(S) libraries

– eg curl supports everything from Ada to wxWidgets

• Command-line (wget, curl, ...) and file browser tools for (virtually) all operating systems

• Since GridHTTP uses standard HTTP concepts like cookies, standard client libraries work without modification.

SlashGrid

• This is the simplest API: a POSIX-like filesystem

– open(), read(), write(), mkdir(), ftruncate(), unlink(), stat(), readdir(), rename()

• Now part of GridSite

– Uses FUSE kernel module on Linux, which is included in SL 4.4 and available for all 2.4.x/2.6.x kernels

• HTTP(S) to retrieve remote files, with GSI proxy if available

• URLs mapped to local paths:

– /grid/https/node42.site.name/dir/file.dat
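The URL-to-path mapping can be sketched as a pair of small helpers. The function names are hypothetical, and the layout is inferred from the single example path above; in particular, the `/grid/http/...` form for plain HTTP is an assumption.

```python
from urllib.parse import urlsplit

GRID_ROOT = "/grid"            # mount point taken from the example path above
SCHEMES = ("http", "https")    # "/grid/http/..." is assumed by analogy


def url_to_local_path(url):
    """Map an HTTP(S) URL to the corresponding path under the mount point."""
    parts = urlsplit(url)
    if parts.scheme not in SCHEMES or not parts.netloc:
        raise ValueError(f"not an HTTP(S) URL: {url!r}")
    return f"{GRID_ROOT}/{parts.scheme}/{parts.netloc}{parts.path}"


def local_path_to_url(path):
    """Inverse mapping: recover the URL from a path under the mount point."""
    prefix = GRID_ROOT + "/"
    if not path.startswith(prefix):
        raise ValueError(f"not under {GRID_ROOT}: {path!r}")
    scheme, _, rest = path[len(prefix):].partition("/")
    if scheme not in SCHEMES or not rest:
        raise ValueError(f"unrecognised layout: {path!r}")
    return f"{scheme}://{rest}"
```

With the filesystem mounted, an application would then read a remote file with ordinary calls such as `open(url_to_local_path(url), "rb")`, with no HTTP code of its own.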

Summary

• HTTP(S) are viable protocols for bulk data transfer

• Considerable advantages in terms of ubiquity of client tools and quality of servers

• “Missing” features provided using headers etc defined by the RFCs

– In particular, “GridHTTP” profile

• A wide variety of APIs available for ~all languages

• SlashGrid is a POSIX-like filesystem HTTP(S) client

– That uses GSI proxies if available