Data Grid Interactions with Firewalls. Michael Wan, Reagan Moore. {mwan,moore}@sdsc.edu. http://www.npaci.edu/DICE/SRB/. SDSC/UCSD/NPACI.

Data Grid Interactions with Firewalls

Michael Wan, Reagan Moore

{mwan,moore}@sdsc.edu

http://www.npaci.edu/DICE/SRB/

SDSC/UCSD/NPACI

A Quick Overview of SRB Data Grid

• Federated server system
  – Single client sign-on
  – Access to all resources in the federation
  – Data grid owns all files

• Context management
  – MCAT server: metadata catalog
  – Uses a traditional DBMS

• Four logical name spaces
  – Logical resource name space (operations on sets of resources)
  – Distinguished user name space
  – Logical file name space
  – Metadata attribute name space (state information)

Federated Servers and Resources

[Figure: Federated Data Grids. Labels: Data Grid 1 (MCAT1, Server1.1, Server1.2); Data Grid 2 (MCAT2, Server2.1, Server2.2); Data Grid 3 (MCAT3, Server3.1).]

Types of Data Loss Risks

• Media corruption

• Vendor systemic failure

• Operational error

• Malicious user

• Natural disaster

• Solutions - replication, firewalls, federation

National Archives Persistent Archive

• NARA (MCAT): principal copy stored at NARA with complete metadata catalog

• U Md (MCAT): replicated copy at U Md for improved access, load balancing and disaster recovery

• SDSC (MCAT): deep archive at SDSC, no user access, but complete copy

BIRN Virtual Data Grid

Source: Mark Ellisman

• Defines a Distributed Data Handling System

• Integrates Storage Resources in the BIRN network

• Integrates Access to Data, to Computational and Visualization Resources

• Acts as a Virtual Platform for Knowledge-based Data Integration Activities

• Provides a Uniform Interface to Users

Worldwide Universities Network

David De Roure, University of Southampton

http://www.ecs.soton.ac.uk/~dder

• Implement data grid linking academic universities

• Support collaborative research and education

– HASTAC: Humanities, Arts, Science and Technology Advanced Collaboratory

– Geo-referenced social science data collections

– Earth Science data collections

• Provide data grid registry to promote federation of international data grids

Foundation of the WUN Grid

• SDSC
• Manchester
• Southampton
• White Rose
• NCSA

• A functioning, general-purpose international Grid

• A hub for federating other data grids

[Figure label: Manchester-SDSC mirror]

Authentication

• User authenticates to a data grid server
  – GSI or challenge response
  – Access controls map constraints between user distinguished names and logical file names

• Data grid server authenticates to remote data grid server

• Remote data grid server authenticates to remote storage repository under data grid ID
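
The challenge-response option can be illustrated with a minimal sketch. It assumes an HMAC-SHA256 over a random nonce keyed by a shared secret; the slides do not say which scheme SRB uses, and the function names below are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: issue a fresh random nonce for each login attempt."""
    return secrets.token_bytes(16)


def respond(secret: bytes, challenge: bytes) -> str:
    """Client side: prove knowledge of the secret without sending it."""
    return hmac.new(secret, challenge, hashlib.sha256).hexdigest()


def verify(secret: bytes, challenge: bytes, response: str) -> bool:
    """Server side: recompute the expected answer and compare in constant time."""
    expected = hmac.new(secret, challenge, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)
```

In the chain described on this slide, the same kind of exchange would be repeated at each hop: user to data grid server, then server to remote server.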

Firewall Interactions

• Client behind a firewall
  – Client initiated parallel I/O
  – Client initiated bulk file load

• Server behind a firewall
  – Paired servers inside and outside the firewall
  – Server inside the firewall only responds to messages from outside server
  – Server initiated parallel I/O

• Federated data grids
  – Need to add metadata to forward messages from a paired front-end server to the back-end server

Client behind firewall

[Figure: serial upload with Sput (srbObjCreate, srbObjWrite), steps 1-6. Labels: SRB server1 and SRB server2, each with an SRB agent; MCAT (1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control); peer-to-peer request; server(s) spawning; data transfer; resource R.]

Client Initiated Parallel I/O

[Figure: parallel upload with Sput -M (srbObjPut), steps 1-8. Labels: SRB server1 and SRB server2, each with an SRB agent; MCAT (1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control); return socket addr., port and cookie; connect to server; data transfer; resource R.]
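
The client-initiated exchange on this slide (the server returns a socket address, port and cookie, then the client connects and sends data) can be sketched on localhost as follows. This is an illustration only: the wire format, the 8-byte cookie and all function names are assumptions, and a single stream stands in for the parallel streams.

```python
import secrets
import socket
import threading


def open_data_port():
    """Server side: listen on a free port and mint a one-time cookie."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    cookie = secrets.token_bytes(8)
    host, port = listener.getsockname()
    return listener, (host, port, cookie)


def recv_exact(conn, n):
    """Read exactly n bytes unless the peer closes early."""
    buf = b""
    while len(buf) < n:
        part = conn.recv(n - len(buf))
        if not part:
            break
        buf += part
    return buf


def serve_one(listener, cookie, received):
    """Server side: accept one connection and keep its data only if the cookie matches."""
    with listener:
        conn, _ = listener.accept()
        with conn:
            if recv_exact(conn, len(cookie)) != cookie:
                return  # unknown peer: drop the connection
            while chunk := conn.recv(65536):
                received.append(chunk)


def client_send(host, port, cookie, payload):
    """Client side: the client opens the connection, so no inbound port is needed."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(cookie + payload)


def demo(payload=b"x" * 100_000):
    listener, (host, port, cookie) = open_data_port()
    received = []
    server = threading.Thread(target=serve_one, args=(listener, cookie, received))
    server.start()
    client_send(host, port, cookie, payload)
    server.join()
    return b"".join(received)
```

Because the client makes the outbound connection, a firewall that blocks inbound connections to the client does not get in the way; a parallel transfer would repeat the connect step once per stream.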

Client Initiated - Third Party Data Transfer

[Figure: server-to-server copy with Scp (srbObjCopy), steps 1-6. Labels: SRB server, SRB server1 and SRB server2 with SRB agents; MCAT; dataPut with socket addr., port and cookie; connect to server2; data transfer; resources R.]

Client Initiated - Bulk Load Operation

[Figure: bulk load with Sput -b, steps 1-6. Labels: SRB server1 and SRB server2, each with an SRB agent; MCAT (1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control); query resource; return resource location; bulk data transfer thread with 8 Mb buffer; bulk registration threads; bulk register; store data in a temp file; unfold temp file; resource R.]

Server behind firewall

[Figure: serial upload with Sput (srbObjCreate, srbObjWrite), steps 1-6. Labels: SRB server1 and SRB server2, each with an SRB agent; MCAT (1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control); peer-to-peer request; server(s) spawning; data transfer; resource R.]

Server Initiated Parallel I/O

[Figure: parallel upload with Sput -m, steps 1-6. Labels: SRB server1 and SRB server2, each with an SRB agent; srbObjPut + socket addr., port and cookie; MCAT (1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control); peer-to-peer request; connect to client; data transfer; resource R.]

Federated Data Grids

Automating redirection to a server in front of a firewall

[Figure: a client accessing three federated data grids. Labels: Data Grid 1 (MCAT1, Server1.1, Server1.2); Data Grid 2 (MCAT2, Server2.1, Server2.2); Data Grid 3 (MCAT3, Server3.1); Client.]

Container - Archival of Small files

• Performance issues with storing/retrieving a large number of small files to/from tape

• Container design
  – Physical grouping of small files
  – Implemented with a Logical Resource
    • A pool of Cache Resources for the front-end resource
    • An Archival Resource for the back-end resource
  – Read/write I/O is always done on the Cache Resource and synced to the Archival Resource
    • Stage to cache if a cache copy does not exist
    • The entire container is moved between cache and archival and written to tape

• Bulk operation with container - faster
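
The idea of physically grouping small files can be sketched as one concatenated blob plus an offset index. This is an assumption-laden illustration: the slides do not describe SRB's actual container layout, and the function names are hypothetical.

```python
import io


def pack_container(files: dict) -> tuple:
    """Concatenate small files into one blob; the index maps name -> (offset, length)."""
    blob = io.BytesIO()
    index = {}
    for name, data in files.items():
        index[name] = (blob.tell(), len(data))
        blob.write(data)
    return blob.getvalue(), index


def read_from_container(blob: bytes, index: dict, name: str) -> bytes:
    """Extract one member file from a container that is already staged to cache."""
    offset, length = index[name]
    return blob[offset:offset + length]
```

Writing the blob to tape as a single object replaces many small tape operations with one large one, which is the performance argument made on this slide.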

Examples of using container

• Make a container with name “myCont”
  – Smkcont -S cont-sdsc myCont

• Put a file into “myCont”
  – Sput -c myCont myLocalSrcFile mySRBTargFile

• Bulk load a local directory into “myCont”
  – Sbload -c myCont myLocalSrcDir mySRBTargColl

• Sync “myCont” to archival and purge the cache copy
  – Ssyncont -d myCont

• Download a file stored in “myCont”
  – Sget mySRBsrcFile myLocalTargFile

• List existing containers and contents
  – Slscont

Summary of Data Transfer modes

• Serial - default mode

• Parallel - for large files

• Bulk - for large number of small files

• Container - Archiving small files (to tapes).

• Container + bulk - faster archival of small files
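
As a rough sketch, the five modes can be mapped to command lines using only the flags that appear in this deck (-m, -b, -c, and Sbload -c). The helper name is hypothetical, and the exact argument order for a real installation is an assumption that should be checked against the Scommand manual pages.

```python
def upload_command(mode: str, src: str, dst: str, container: str = "") -> list:
    """Return an argument list for one of the transfer modes on this slide."""
    if mode == "serial":
        return ["Sput", src, dst]
    if mode == "parallel":
        # -m is server initiated; use -M when the client is behind a firewall
        return ["Sput", "-m", src, dst]
    if mode == "bulk":
        return ["Sput", "-b", src, dst]
    if mode == "container":
        return ["Sput", "-c", container, src, dst]
    if mode == "container+bulk":
        return ["Sbload", "-c", container, src, dst]
    raise ValueError(f"unknown transfer mode: {mode}")
```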

Types of Data Transfer

• Local to SRB - Sput, Srsync

• SRB to Local - Sget, Srsync

• SRB to SRB - Scp, Sreplicate, Sbkupsrb, Srsync

– Third party transfer

• Server to Server data transfer, client not involved

• Parallel I/O

Other useful Data Management Scommands

• Srsync, Schksum
  – Data synchronization using checksum values
  – Similar to UNIX’s rsync

• Sreplicate, Sbkupsrb
  – Generate multiple copies of data using replica
  – Replica - multiple copies of the same file

• same Logical Path Name - e.g., /home/srb.sdsc/foo

• replica on different resources

• Each replica has different replNum

• Most recently modified flag

Commands Using Checksum

• Registering checksum values into MCAT
  – At the time of upload
    • Sput -k - compute checksum of local source file and register with MCAT
    • Sput -K - checksum verification mode
      – After upload, compute checksum by reading back the uploaded file
      – Compare with the checksum generated locally
  – Existing SRB files
    • Schksum - compute and register the checksum if it does not already exist
    • Srsync - if the checksum does not exist
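
The read-back verification described for Sput -K can be sketched as below. SHA-256 is an assumption (the slides do not name SRB's checksum algorithm), plain dictionaries stand in for the storage resource and MCAT, and registering the checksum only on a successful match is also an assumption.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; the slides do not name the algorithm SRB uses.
    return hashlib.sha256(data).hexdigest()


def put_with_verification(local_data: bytes, store: dict, catalog: dict, name: str) -> bool:
    """Upload, read the stored copy back, and register the checksum only if both match."""
    local_sum = checksum(local_data)
    store[name] = local_data             # stand-in for the upload
    remote_sum = checksum(store[name])   # stand-in for reading the uploaded file back
    if remote_sum != local_sum:
        return False
    catalog[name] = local_sum            # stand-in for registering with MCAT
    return True
```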

Srsync command

• Synchronize the data
  – from a local copy to SRB
    • Srsync myLocalFile s:mySrbFile
  – from an SRB copy to a local file system
    • Srsync s:mySrbFile myLocalFile
  – between two SRB paths
    • Srsync s:mySrbFile1 s:mySrbFile2

• Similar to rsync
  – Compare the checksum values of source and target
  – Upload/download source to target if the target does not exist or the checksums differ
  – Save checksum values to MCAT

Srsync command (cont)

• Some Srsync options
  – -r --- recursively synchronize a directory/collection
  – -s --- use size instead of checksum value for determining synchronization
    • Faster - no checksum computation
    • Less accurate
  – -m, -M --- parallel I/O
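
The Srsync decision rule (transfer when the target is missing or differs), together with the -s size shortcut, can be sketched as a single predicate. The function name is hypothetical and SHA-256 stands in for whatever checksum SRB actually uses.

```python
import hashlib
from typing import Optional


def needs_transfer(source: bytes, target: Optional[bytes], use_size: bool = False) -> bool:
    """Return True when the source should be copied over the target."""
    if target is None:  # target does not exist
        return True
    if use_size:        # -s: compare sizes only, no checksum computation
        return len(source) != len(target)
    return hashlib.sha256(source).digest() != hashlib.sha256(target).digest()
```

With `use_size=True`, two files of equal length but different content are treated as synchronized, which is why the slide calls the -s option less accurate.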

Sreplicate, Sbkupsrb commands

• Generate multiple copies of data using replica

• Sreplicate - Generate a new replica each time

• Sbkupsrb
  – Backs up the SRB data/collection to the specified backupResource with a replica
  – If an up-to-date replica already exists in the backupResource, nothing will be done
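
The difference between the two commands can be sketched with a small in-memory replica list. The class, the field names and the numbering of replNum from 0 are assumptions made for illustration, not SRB's data model.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    repl_num: int
    resource: str
    up_to_date: bool = True


def replicate(replicas: list, resource: str) -> Replica:
    """Sreplicate-like behaviour: always add a new replica with the next replNum."""
    next_num = max((r.repl_num for r in replicas), default=-1) + 1
    new = Replica(next_num, resource)
    replicas.append(new)
    return new


def backup(replicas: list, backup_resource: str):
    """Sbkupsrb-like behaviour: skip if an up-to-date replica is already on the backup resource."""
    if any(r.resource == backup_resource and r.up_to_date for r in replicas):
        return None
    return replicate(replicas, backup_resource)
```

Calling `backup` twice with the same resource creates one replica, while calling `replicate` twice creates two.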

Data and Resource Virtualisation

• Data and Collections Organisation
  – File Logical Name space
    • UNIX-like directories (collections) and files (data)
    • Mapping of logical name to physical attributes - host address, physical path
    • UNIX-like API and utilities for making collections (mkdir) and data creation (creat)

• Virtualisation of Resources
  – Mapping of a logical resource name to physical attributes: Resource Location, Type
  – Clients use a single logical name to reference a resource
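
The logical-to-physical mapping can be pictured as a catalog lookup. In this minimal sketch a dictionary stands in for MCAT; the host names and vault paths are made up for illustration, and only the logical path /home/srb.sdsc/foo comes from the slides.

```python
from typing import NamedTuple


class PhysicalCopy(NamedTuple):
    host: str
    path: str


# Hypothetical catalog contents: hosts and physical paths are invented for this example.
CATALOG = {
    "/home/srb.sdsc/foo": [
        PhysicalCopy("host-a.example.org", "/vault/a/foo.123"),
        PhysicalCopy("host-b.example.org", "/vault/b/foo.456"),
    ],
}


def resolve(logical_name: str) -> list:
    """Look up every physical copy registered under one logical name."""
    try:
        return CATALOG[logical_name]
    except KeyError:
        raise FileNotFoundError(logical_name) from None
```

The client only ever handles the logical name; which host and path serve the request is decided after the lookup.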

Listing Resources

• SgetR - List Configured Resources

    SgetR
    --------------------------- RESULTS ------------------------------
    rsrc_name: unix-sdsc
    netprefix: srb.sdsc.edu:NULL:NULL
    rsrc_typ_name: unix file system
    default_path: /misc/srb/srb/SRBVault/?USER.?DOMAIN/?SPLITPATH/TEST.?PATH?DATANAME.?RANDOM.?TIMESEC
    phy_default_path: /misc/srb/srb/SRBVault/?USER.?DOMAIN/?SPLITPATH/TEST.?PATH?DATANAME.?RANDOM.?TIMESEC
    phy_rsrc_name: unix-sdsc
    rsrc_typ_name: unix file system
    rsrc_class_name: permanent
    user_name: srb
    domain_desc: sdsc
    zone_id: sdscdemo
    -----------------------------------------------------------------

Serial Mode Data Transfer

• Simple to Implement and Use
  – Unix-like API - srbObjCreate, srbObjWrite

• Performance Issue
  – 2 hops data transfer
  – Single data stream
  – One file at a time - overhead relatively high for small files
    • MCAT interaction - query and registration
    • Small buffer transfer

• Large files - Single Hop, multiple data streams

• Small files - Single Hop, multiple files at a time

Upload a File to a SRB Resource

• Sput -S unix-sdsc localFile srbFile
  – Default data transfer mode - serial

• Sls -l srbFile
  – srb 0 unix-sdsc 2764364 2004-08-21-18.19 % srbFile

Small files Data Transfer (Bulk operation)

• Upload/download a large number of small files
  – One file at a time - relatively high overhead
    • MCAT interaction, small buffer transfer
    • <= 0.5 sec/file for LAN, > 1 sec/file for WAN

• Bulk Operation
  – Bulk data transfer
    • Transfer multiple files in a single large buffer (8 Mb)
  – Bulk Registration
    • Register a large number of files (1,000) in a single call
  – Multiple threads for transfer and registration
  – Single Hop
  – 3-10 times speedup
  – All or nothing type operation
  – Specify -b in Sput/Sget
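
The batching behind the bulk operation can be sketched as two small helpers. This is a simplified model: the slide's "8 Mb" is read here as 8 MiB (an assumption), the function names are hypothetical, and threading is left out.

```python
from typing import Iterable, Iterator

BUFFER_BYTES = 8 * 1024 * 1024  # "8 Mb" on the slide, read here as 8 MiB (assumption)
REGISTER_BATCH = 1000           # files registered per call, from the slide


def fill_buffers(files: Iterable, limit: int = BUFFER_BYTES) -> Iterator:
    """Group (name, size) pairs into batches whose total size fits in one buffer."""
    batch, used = [], 0
    for name, size in files:
        if batch and used + size > limit:
            yield batch
            batch, used = [], 0
        batch.append(name)  # a file larger than the limit ends up alone in its batch
        used += size
    if batch:
        yield batch


def registration_calls(n_files: int, per_call: int = REGISTER_BATCH) -> int:
    """Number of bulk-registration calls needed for n_files."""
    return -(-n_files // per_call)  # ceiling division
```

As a worked figure from the numbers on this slide: 10,000 files at 0.5 sec/file take about 5,000 seconds one at a time, and a 3-10 times speedup brings that to roughly 500-1,667 seconds, with 10 registration calls instead of 10,000.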

Parallel Mode Data Transfer

• For large file transfer
  – Multiple data streams
  – Single hop data transfer

• Two sub-modes
  – Server initiated
  – Client initiated (for clients behind firewall)

• Up to 5 times speed up for WAN

• Two simple API - srbObjPut and srbObjGet

• Use -m (Server initiated), -M (Client initiated) options

• Available to all Scommands involving data transfer
  – As an option - Sput, Sget, Srsync
  – Automatic - Sreplicate, Scp, Sbkupsrb, SsyncD, Ssyncont
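
One way to picture "multiple data streams" is to split a file into contiguous byte ranges, one per stream. How SRB actually divides a file among streams is not described in the slides, so the equal-split rule and the function name below are assumptions.

```python
def stream_ranges(file_size: int, n_streams: int) -> list:
    """Split [0, file_size) into n_streams contiguous (offset, length) ranges."""
    if n_streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(file_size, n_streams)
    ranges, offset = [], 0
    for i in range(n_streams):
        # the first `extra` streams carry one additional byte each
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges
```

Each range can then be sent over its own connection, opened by the server with -m or by the client with -M.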