Replica Management Services in the European DataGrid Project
Work Package 2, European DataGrid
Outline
• The need for the European DataGrid and replica management
• Overview of replica management services
• Performance evaluation of services
• Future work – replica management in EGEE
• Conclusion
Why do we need a Grid?
100s MB/s data output -> several PB of data per year.
Equivalent to 2 million CDs of data per year, needing 20,000 PCs per experiment to analyse.
Distributed Grid computing…
The European DataGrid
• Ran from Jan 2001 – March 2004
• Aim: to develop a Grid infrastructure for data-intensive scientific applications
  – High energy physics, biology and Earth observation producing several PB of data per year
• Developed Grid middleware for job, data and fabric management, information and monitoring
Grid Architecture
[Figure: Grid architecture diagram marking the scope of EDG middleware and the scope of EDG WP2]
Data Management
• Requirements:
  – Enable secure access to massive amounts of data in a global name space
  – Move and replicate data at high speed from one geographical site to another
• 1st generation: GDMP + edg-replica-manager
  – Used Globus for secure file transfer
  – C++ based – gave basic replication functionality and cataloging
Data Management
• 2nd generation – uses web services
  – Easy and standardised way to connect distributed services via XML
• Services include
  – Replica Manager Client
    • main user interface
  – Replica Location Service
    • stores physical locations of replicas
  – Replica Metadata Catalog
    • stores logical file name mappings and metadata attributes
  – Replica Optimization Service
    • provides optimised access to replicas
  – Security
    • HTTPS + Globus’ GSI
Replica Location Service
• Implementation of RLS framework co-developed with Globus
• Maps unique identifier (GUID) to multiple replicas (SURLs)
• Local catalog (LRC) with distributed index (RLI)
[Figure: each LRC stores GUID:SURL mappings and sends soft-state updates to the RLIs, which store GUID:LRC mappings]
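The split between local catalogs and a distributed index can be illustrated with a minimal in-memory sketch. The class and method names below (LocalReplicaCatalog, ReplicaLocationIndex, soft_state_update, lookup) are hypothetical and are not the EDG or Globus API; the timeout-based expiry and the example SURLs are assumptions made for illustration only.

```python
import time


class LocalReplicaCatalog:
    """Hypothetical LRC: holds GUID -> set of SURLs for one site."""

    def __init__(self, name):
        self.name = name
        self.mappings = {}

    def add(self, guid, surl):
        self.mappings.setdefault(guid, set()).add(surl)

    def guids(self):
        return set(self.mappings)


class ReplicaLocationIndex:
    """Hypothetical RLI: holds GUID -> {LRC name: time of last update}."""

    def __init__(self, timeout_s=60.0):
        self.timeout_s = timeout_s  # assumed expiry period for soft state
        self.index = {}

    def soft_state_update(self, lrc, now=None):
        """Record that the given LRC currently knows each of its GUIDs."""
        now = time.time() if now is None else now
        for guid in lrc.guids():
            self.index.setdefault(guid, {})[lrc.name] = now

    def lookup(self, guid, now=None):
        """Return names of LRCs whose last update has not expired."""
        now = time.time() if now is None else now
        seen = self.index.get(guid, {})
        return sorted(n for n, t in seen.items() if now - t <= self.timeout_s)


guid = "guid:131f9940-f501-11d8-9669-0800200c9a66"
lrc_a = LocalReplicaCatalog("lrc-a")
lrc_b = LocalReplicaCatalog("lrc-b")
lrc_a.add(guid, "srm://se-a.example.org/data/file1")  # made-up SURLs
lrc_b.add(guid, "srm://se-b.example.org/data/file1")

rli = ReplicaLocationIndex(timeout_s=60.0)
rli.soft_state_update(lrc_a, now=0.0)
rli.soft_state_update(lrc_b, now=50.0)
fresh = rli.lookup(guid, now=55.0)   # both updates are still valid
stale = rli.lookup(guid, now=100.0)  # the update from lrc-a has expired
```

In this sketch the index never stores SURLs. A client asks the RLI which LRCs know a GUID and then queries those LRCs for the physical locations, and an LRC that stops sending updates simply ages out of the index.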
Replica Metadata Catalog
• GUIDs are unfriendly and non-intuitive
  – guid:131f9940-f501-11d8-9669-0800200c9a66
• Use user-definable Logical File Names
  – lfn:cal-test-data-2004-09-01-005a
• RMC stores LFN:GUID mappings (n:1)
• Can also store ~10 metadata attributes
  – e.g. file owner, file size
• Together with RLS gives complete LFN:GUID:SURL view
[Figure: the RMC maps several LFNs to one GUID; the RLS maps that GUID to several SURLs]
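The combined LFN:GUID:SURL view can be sketched with two dictionaries standing in for the catalogs. This is only an illustration: the dictionaries, the second LFN, the SURLs and the helper names (resolve, aliases) are hypothetical, and the real services are accessed through web service calls rather than local lookups.

```python
GUID = "guid:131f9940-f501-11d8-9669-0800200c9a66"

# Stand-in for the RMC: many LFNs may point at one GUID (n:1).
rmc = {
    "lfn:cal-test-data-2004-09-01-005a": GUID,
    "lfn:my-calibration-alias": GUID,  # hypothetical second alias
}

# Stand-in for the RLS: one GUID points at many SURLs (1:n).
rls = {
    GUID: [
        "srm://se1.example.org/data/005a",  # made-up storage locations
        "srm://se2.example.org/data/005a",
    ],
}


def resolve(lfn):
    """Follow LFN -> GUID (RMC) and then GUID -> SURLs (RLS)."""
    guid = rmc[lfn]
    return guid, rls.get(guid, [])


def aliases(guid):
    """List every LFN registered for a GUID."""
    return sorted(lfn for lfn, g in rmc.items() if g == guid)


resolved_guid, surls = resolve("lfn:my-calibration-alias")
all_names = aliases(GUID)
```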
Replica Optimization Service
• Gives optimised access to replicas by choosing replicas with quickest access (based on network measurements)
• Automatically replicates files to sites on which they are needed
• Simulation research (OptorSim) continues to investigate more complex replica management strategies
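Choosing the replica with the quickest access can be sketched as a minimisation over estimated transfer times. The function below is an assumption about how such a ranking might work, not the ROS implementation: the cost model (file size divided by measured bandwidth), the host names and the bandwidth figures are all hypothetical.

```python
from urllib.parse import urlparse


def list_best_file(surls, dest_host, size_bytes, bandwidth):
    """Return the SURL with the lowest estimated transfer time.

    bandwidth maps (source host, destination host) to a measured rate in
    bytes per second; pairs without a measurement are ranked last.
    """
    def estimated_seconds(surl):
        source_host = urlparse(surl).hostname
        rate = bandwidth.get((source_host, dest_host))
        return size_bytes / rate if rate else float("inf")

    return min(surls, key=estimated_seconds)


# Hypothetical replicas and network measurements.
replicas = [
    "srm://se1.example.org/data/005a",
    "srm://se3.example.org/data/005a",
    "srm://se4.example.org/data/005a",  # no measurement available
]
measured = {
    ("se1.example.org", "se2.example.org"): 10e6,  # 10 MB/s
    ("se3.example.org", "se2.example.org"): 40e6,  # 40 MB/s
}
best = list_best_file(replicas, "se2.example.org", 2e9, measured)
```

With these numbers the replica on se3.example.org is chosen, because a 2 GB file would take about 50 s from there against about 200 s from se1.example.org.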
Replica Manager
• Client-side tool acts as user interface to services (although services can also be accessed directly)
• Coordinates service interactions
• Interfaces with external services
  – information service (MDS, R-GMA)
  – storage services (SRM, EDG-SE)
  – file transfer services (GridFTP)
Implementation
• Servers written in Java, clients auto-generated (Java, C++ etc.) from WSDL
• Web services run on Apache Axis inside Java servlet engine (Tomcat/Oracle AS)
• Use MySQL/Oracle as back-end DB to store persistent information
• RLS used already in production for LCG (Oracle AS/DB)
  – CMS Data Challenge 04 – 2 million entries stored
Service Interactions
[Figure: sequence diagram between User Interface, Replica Manager, Replica Metadata Catalog, Replica Optimization Service, Replica Location Service, Storage Element 1 and Storage Element 2]
1. replicateFile(LFN, SE2)
2. getGuid(LFN)
3. listReplicas(GUID)
4. listBestFile(SURLs, SE2)
5. copyFile(SE1, SE2)
6. registerFile(GUID, SURL)
“Make a replica of the file specified by LFN to SE2”
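The six steps above can be traced with a small sketch in which plain functions stand in for the services. Only the operation names come from the slide; the Python signatures, the dictionaries, the host names and the way the new SURL is built are assumptions, and no data is actually transferred. In particular, copy_file here takes a source SURL and a destination host, which is a simplification of copyFile(SE1, SE2).

```python
from urllib.parse import urlparse

GUID = "guid:131f9940-f501-11d8-9669-0800200c9a66"
LFN = "lfn:cal-test-data-2004-09-01-005a"

calls = []  # records the order in which the services are called
rmc = {LFN: GUID}                                  # stand-in for the RMC
rls = {GUID: ["srm://se1.example.org/data/005a"]}  # stand-in for the RLS


def get_guid(lfn):
    calls.append("getGuid")
    return rmc[lfn]


def list_replicas(guid):
    calls.append("listReplicas")
    return list(rls[guid])


def list_best_file(surls, dest_se):
    calls.append("listBestFile")
    return surls[0]  # placeholder; a real service would rank by network cost


def copy_file(source_surl, dest_se):
    calls.append("copyFile")
    path = urlparse(source_surl).path
    return f"srm://{dest_se}{path}"  # pretend the file now exists here


def register_file(guid, surl):
    calls.append("registerFile")
    rls[guid].append(surl)


def replicate_file(lfn, dest_se):
    """Coordinate the services as the Replica Manager does in steps 1 to 6."""
    calls.append("replicateFile")
    guid = get_guid(lfn)
    surls = list_replicas(guid)
    best = list_best_file(surls, dest_se)
    new_surl = copy_file(best, dest_se)
    register_file(guid, new_surl)
    return new_surl


new_replica = replicate_file(LFN, "se2.example.org")
```

After the call, the stand-in RLS holds two SURLs for the GUID, and the calls list reproduces the numbered sequence from the diagram.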
RLS performance
• In production use, only a single LRC used so far
  – Test performance using Java and C++ API to insert and query GUID:SURL mappings
[Plots: Java vs C++ insert; C++ query]
• Excellent query performance, C++ more stable than Java
RLS performance
Using Java API and multiple concurrent threads
[Plots: insert 500,000 mappings; 5 insert and 5 query threads]
• Throughput peaks at ~20 threads, again stable query performance
Security
• Security adds significant overheads!
• Problem caused by new connection for each transaction
• Could be reduced by using bulk operations
RLS inserts    Secure client (s)    Insecure client (s)
1              0.77                 0.07
10             7.07                 0.54
100            55.44                3.38
1000           527.12               28.61
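The size of the overhead can be read off the table with a little arithmetic. The sketch below assumes the first column is the number of inserts and the other two are total elapsed seconds; under that reading the secure client is roughly 11 to 18 times slower, and its cost per insert stays above half a second even at 1000 inserts.

```python
# (number of inserts, secure total seconds, insecure total seconds)
timings = [
    (1, 0.77, 0.07),
    (10, 7.07, 0.54),
    (100, 55.44, 3.38),
    (1000, 527.12, 28.61),
]

rows = []
for n, secure_s, insecure_s in timings:
    rows.append({
        "inserts": n,
        "ratio": secure_s / insecure_s,       # slowdown factor with security
        "secure_per_insert": secure_s / n,    # seconds per secure insert
        "insecure_per_insert": insecure_s / n,
    })

for row in rows:
    print(
        f"{row['inserts']:>5} inserts: "
        f"{row['ratio']:.1f}x slower, "
        f"{row['secure_per_insert']:.3f} s vs "
        f"{row['insecure_per_insert']:.4f} s per insert"
    )
```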
RMC performance
• Test multiple LFNs per GUID and multiple metadata attributes
[Plots: C++ query; Java insert]
• Scales well with no. of LFNs per GUID and no. of attributes
RMC Performance
• Command Line Interface: edg-rmc addAlias
Time (s)     Operation
0 - 1.0      Start-up script and JVM start-up
1.0 - 1.1    Parse command and options
1.1 - 2.1    Get RMC service locator
2.1 - 2.3    Get RMC object
2.3 - 3.0    Call to rmc.addAlias() method
3.0          End
• Very slow compared to API calls (2 orders of magnitude slower)
• Recommended for testing an installation only
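The timeline for edg-rmc addAlias shows where the command-line time goes. The short calculation below uses only the interval boundaries from the table; the grouping into "overhead" and "call" is an interpretation added here, not part of the original measurement.

```python
# Phase boundaries in seconds, taken from the timeline.
phases = [
    ("Start-up script and JVM start-up", 0.0, 1.0),
    ("Parse command and options", 1.0, 1.1),
    ("Get RMC service locator", 1.1, 2.1),
    ("Get RMC object", 2.1, 2.3),
    ("Call to rmc.addAlias() method", 2.3, 3.0),
]

total = phases[-1][2] - phases[0][1]
durations = {name: end - start for name, start, end in phases}
call_s = durations["Call to rmc.addAlias() method"]
overhead_s = total - call_s  # everything before the actual service call

for name, seconds in durations.items():
    print(f"{name:<34} {seconds:4.1f} s  {100 * seconds / total:5.1f}%")
print(f"Overhead before the call: {overhead_s:.1f} s of {total:.1f} s")
```

Under this reading about 2.3 s of the 3.0 s is spent before rmc.addAlias() is invoked at all, with JVM start-up and obtaining the service locator taking about a second each.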
The Future of EDG Services
• G-Lite - middleware (re)engineering and integration
  – using many concepts/experience from EDG
  – but geared towards service-oriented architecture
• EGEE - building production quality Grids
• Lessons learned from EDG:
  – Less is more: stability and usability most important
  – User interface and documentation difficult to get right first time
  – Need easy integration of different providers
EGEE Data Mgt Services
• Replica Manager -> Data Scheduler + Transfer Fetcher + File Placement Service + File Transfer Service
From EGEE Middleware Architecture and Planning (Release 1.0) DJRA1.1
EGEE Data Mgt Services
• RLS + RMC -> Combined Catalog Interface to: File Catalog + Replica Catalog (+ Metadata Catalog)
From EGEE Middleware Architecture and Planning (Release 1.0) DJRA1.1
Conclusion
• EDG WP2 has developed a set of integrated replica management services
• Can cope with demanding Grid conditions
  – already used in production environment
• A lot of concepts now being taken forward into EGEE project