The Data Replication Service Ann Chervenak Robert Schuler USC Information Sciences Institute.

The Data Replication Service

Ann Chervenak

Robert Schuler

USC Information Sciences Institute

The Data Replication Service

Included in the Tech Preview of GT4.0 release

Design is based on the publication component of the Lightweight Data Replicator system

Developed by Scott Koranda from U. Wisconsin at Milwaukee

Functionality: replicate a set of files in the Grid on a local site

Users identify a set of desired files

DRS queries Replica Location Service to discover current locations of these files

Creates local replicas of desired files using the Reliable File Transfer Service

Registers new replicas in Replica Location Service for discovery

Motivation for DRS

Need for higher-level data management services that integrate lower-level Grid functionality

Efficient data transfer (GridFTP, RFT)

Replica registration and discovery (RLS)

Eventually validation of replicas, etc.

Goal is to generalize the custom data management systems developed by several application communities

Eventually plan to provide a suite of general, configurable, higher-level data management services

DRS is the first of these services

Relationship to Other Globus Services

At requesting site, deploy:

WS-RF Services: Data Replication Service, Delegation Service, Reliable File Transfer Service

Pre WS-RF Components: Replica Location Service (Local Replica Catalog and Replica Location Index), GridFTP Server

[Figure: components deployed at the Local Site. Labels: Web Service Container, Data Replication Service, Replicator Resource, Reliable File Transfer Service, RFT Resource, Delegation Service, Delegated Credential, Local Replica Catalog, Replica Location Index, GridFTP Server]

DRS Functionality

Initiate a DRS Request

Discover and select among replicas that act as source locations for data copies

Transfer data to local site to create new replicas

Register new replicas in catalogs

Initiating a DRS Request

Client uses GT4 Delegation Service to create a delegated credential that may be used by other services to act on behalf of user

Client creates a request file containing a replication request description, including the desired logical files and their destination URLs

Client sends message to DRS to create the Replicator resource and passes the request file’s URL

Replicator retrieves the request file
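The slides do not show the request file's format, so the following is only an illustrative sketch of a request description that pairs each desired logical file with a destination URL. The tab-separated layout, the `ReplicationItem` type, the helper names, and the example host are all hypothetical and are not the actual DRS format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationItem:
    """One entry of a replication request (hypothetical structure)."""
    logical_file: str     # logical name of the desired file
    destination_url: str  # where the new local replica should be written


def format_request(items):
    """Serialize items as one 'logical-name<TAB>destination-URL' line each.

    This layout is an assumption made for illustration only.
    """
    return "".join(f"{i.logical_file}\t{i.destination_url}\n" for i in items)


def parse_request(text):
    """Parse the hypothetical request text back into ReplicationItem objects."""
    items = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        logical_file, destination_url = line.split("\t")
        items.append(ReplicationItem(logical_file, destination_url))
    return items


# Placeholder names and host; not taken from the slides.
items = [
    ReplicationItem("lfn-run42-part0", "gsiftp://local.example.org/data/run42-part0"),
    ReplicationItem("lfn-run42-part1", "gsiftp://local.example.org/data/run42-part1"),
]
request_text = format_request(items)
round_trip = parse_request(request_text)
```

Under these assumptions, the client would write `request_text` to a file and pass that file's URL when creating the Replicator resource.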

Replica Discovery and Selection

Replicator queries the Globus Replica Location Service in a two-step process to discover locations of desired files:

Query local site’s Replica Location Index to find the catalogs at remote sites that contain mappings for the requested files

Query remote Local Replica Catalogs to get the physical file names of the replicas

Replicator selects source file for each file to be copied

Current implementation chooses randomly

A callout is provided for more sophisticated replica selection decisions based on state of Grid
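The two-step lookup and the pluggable selection step can be sketched with in-memory dictionaries standing in for the Replica Location Index and the remote Local Replica Catalogs. This does not use the real RLS client API; the catalog names, URLs, and function names are hypothetical, and the `selector` parameter only illustrates the idea of a selection callout.

```python
import random

# Hypothetical stand-in for the local Replica Location Index:
# logical file name -> names of remote catalogs that hold mappings for it.
rli = {
    "lfn-a": ["lrc-site1", "lrc-site2"],
    "lfn-b": ["lrc-site2"],
}

# Hypothetical stand-ins for remote Local Replica Catalogs:
# catalog name -> {logical file name -> physical file names}.
lrcs = {
    "lrc-site1": {"lfn-a": ["gsiftp://site1.example.org/data/a"]},
    "lrc-site2": {
        "lfn-a": ["gsiftp://site2.example.org/data/a"],
        "lfn-b": ["gsiftp://site2.example.org/data/b"],
    },
}


def discover(logical_name):
    """Step 1: ask the index which catalogs know the file.
    Step 2: ask each of those catalogs for the physical file names."""
    replicas = []
    for catalog in rli.get(logical_name, []):
        replicas.extend(lrcs[catalog].get(logical_name, []))
    return replicas


def random_selector(candidates):
    """Default policy, mirroring the slide: pick a source at random."""
    return random.choice(candidates)


def select_sources(logical_names, selector=random_selector):
    """Choose one source replica per requested file using the given selector."""
    sources = {}
    for name in logical_names:
        candidates = discover(name)
        if not candidates:
            raise LookupError(f"no replica found for {name}")
        sources[name] = selector(candidates)
    return sources


sources = select_sources(["lfn-a", "lfn-b"])
# A different policy can be swapped in, e.g. always take the first candidate.
first_choice = select_sources(["lfn-a"], selector=lambda c: c[0])
```

Passing the selector as a function keeps the discovery code unchanged when a smarter policy (for example, one based on network state) replaces random choice.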

File Transfers to Create New Replicas

The Replicator initiates a request with the Globus Reliable File Transfer Service

Creates RFT resource that holds state for each data transfer

Control passes from DRS to RFT, which also retrieves the delegated credential from the Delegation Service

RFT coordinates the file transfers

Transfers are performed by GridFTP servers at the source and destination sites

After transfers complete, the Replicator checks status of each file in the transfer request
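The per-file status check after the transfers finish might look like the sketch below. The status values and function name are hypothetical and are not RFT's actual state names. The slides do not say what happens to failed files; this sketch assumes that only successfully transferred files would go on to registration.

```python
from enum import Enum


class TransferStatus(Enum):
    """Hypothetical per-file outcomes; not the real RFT state names."""
    FINISHED = "finished"
    FAILED = "failed"


def partition_by_status(statuses):
    """Split {destination_url: TransferStatus} into (succeeded, failed) lists."""
    succeeded = sorted(
        url for url, status in statuses.items() if status is TransferStatus.FINISHED
    )
    failed = sorted(
        url for url, status in statuses.items() if status is not TransferStatus.FINISHED
    )
    return succeeded, failed


# Placeholder results for a three-file request.
statuses = {
    "gsiftp://local.example.org/data/a": TransferStatus.FINISHED,
    "gsiftp://local.example.org/data/b": TransferStatus.FAILED,
    "gsiftp://local.example.org/data/c": TransferStatus.FINISHED,
}
succeeded, failed = partition_by_status(statuses)
```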

Registration of New Replicas

Replicator adds mappings for the newly created replicas to its Globus RLS Local Replica Catalog

Local Replica Catalog updates Replica Location Indexes to make new replicas visible throughout Grid
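The registration flow (add mappings to the local catalog, then let the catalog update the indexes) can be illustrated with two small in-memory classes. These are hypothetical stand-ins, not the RLS API, and the way updates are pushed here is a simplification chosen for the example.

```python
class ReplicaLocationIndex:
    """Hypothetical index: remembers which catalogs hold each logical name."""

    def __init__(self):
        self.catalogs_by_lfn = {}

    def update(self, catalog_name, logical_names):
        for lfn in logical_names:
            self.catalogs_by_lfn.setdefault(lfn, set()).add(catalog_name)


class LocalReplicaCatalog:
    """Hypothetical catalog: maps logical names to physical file names."""

    def __init__(self, name, indexes):
        self.name = name
        self.indexes = indexes  # indexes this catalog sends updates to
        self.mappings = {}

    def add_mapping(self, lfn, pfn):
        self.mappings.setdefault(lfn, []).append(pfn)

    def send_updates(self):
        """Tell every index which logical names this catalog now holds."""
        for index in self.indexes:
            index.update(self.name, list(self.mappings))


index = ReplicaLocationIndex()
local_catalog = LocalReplicaCatalog("lrc-local", [index])

# Register a newly created replica, then make it discoverable via the index.
local_catalog.add_mapping("lfn-a", "gsiftp://local.example.org/data/a")
local_catalog.send_updates()
```

After `send_updates()`, a lookup in the index for `lfn-a` points at `lrc-local`, which is what lets other sites find the new replica.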

Performance Measurements: Wide Area Testing

The destination for the pull-based transfers is located in Los Angeles

Dual-processor, 1.1 GHz Pentium III workstation with 1.5 GBytes of memory and a 1 Gbit Ethernet

Runs a GT4 container and deploys services including RFT and DRS as well as GridFTP and RLS

The remote site where desired data files are stored is located at Argonne National Laboratory in Illinois

Dual-processor, 3 GHz Intel Xeon workstation with 2 gigabytes of memory and 1.1 terabytes of disk

Runs a GT4 container as well as GridFTP and RLS services

DRS Operations Measured

Create the DRS Replicator resource

Discover source files for replication using local RLS Replica Location Index and remote RLS Local Replica Catalogs

Initiate a Reliable File Transfer operation by creating an RFT resource

Perform RFT data transfer(s)

Register the new replicas in the RLS Local Replica Catalog

Experiment 1: Replicate 10 Files of Size 1 Gigabyte

Component of Operation        Time (milliseconds)
Create Replicator Resource    317.0
Discover Files in RLS         449.0
Create RFT Resource           808.6
Transfer Using RFT            1186796.0
Register Replicas in RLS      3720.8

Data transfer time dominates

Wide area data transfer rate of 67.4 Mbits/sec
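As a rough check on the reported rate, the sketch below divides the total payload by the RFT transfer time from the table. It assumes a total payload of 10 decimal gigabytes (10 x 10^9 bytes), which is the volume at which the reported figure is reproduced; the function name is illustrative.

```python
def rate_mbits_per_sec(total_bytes, elapsed_ms):
    """Average rate in megabits per second (decimal mega)."""
    megabits = total_bytes * 8 / 1e6
    seconds = elapsed_ms / 1000.0
    return megabits / seconds


# Assumption: 10 decimal gigabytes moved in total during Experiment 1.
exp1_total_bytes = 10 * 10**9
exp1_transfer_ms = 1186796.0  # "Transfer Using RFT" from the table

exp1_rate = rate_mbits_per_sec(exp1_total_bytes, exp1_transfer_ms)
print(f"{exp1_rate:.1f} Mbits/sec")  # prints "67.4 Mbits/sec"
```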

Experiment 2: Replicate 1000 Files of Size 10 Megabytes

Component of Operation        Time (milliseconds)
Create Replicator Resource    1561.0
Discover Files in RLS         9.8
Create RFT Resource           1286.6
Transfer Using RFT            963456.0
Register Replicas in RLS      11278.2

Time to create Replicator and RFT resources is larger

Need to store state for 1000 outstanding transfers

Data transfer time still dominates

Wide area data transfer rate of 85 Mbits/sec
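To put "data transfer time still dominates" in numbers, the following sketch computes each component's share of the total measured time for Experiment 2, using only the values from the table. The variable names are illustrative, and treating the five components as a simple sum is an assumption about how the measurements combine.

```python
# Experiment 2 timings in milliseconds, copied from the table.
exp2_ms = {
    "Create Replicator Resource": 1561.0,
    "Discover Files in RLS": 9.8,
    "Create RFT Resource": 1286.6,
    "Transfer Using RFT": 963456.0,
    "Register Replicas in RLS": 11278.2,
}

# Assumption: the components run one after another, so their times add up.
exp2_total_ms = sum(exp2_ms.values())
exp2_share = {name: ms / exp2_total_ms for name, ms in exp2_ms.items()}

# Registration cost averaged over the 1000 files in the request.
register_ms_per_file = exp2_ms["Register Replicas in RLS"] / 1000

for name, share in exp2_share.items():
    print(f"{name}: {share:.2%}")
```

Under that assumption the transfer step accounts for roughly 98.6% of the measured time, and registration averages about 11 ms per file.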

Future Work

We will continue performance testing of DRS:

Increasing the size of the files being transferred

Increasing the number of files per DRS request

Add and refine DRS functionality as it is used by applications

E.g., add a push-based replication capability

We plan to develop a suite of general, configurable, composable, high-level data management services