Post on 01-Apr-2015
INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
Data Management System
Jean Salzemann CNRS/IN2P3
ACGRID School, Hanoi (Vietnam), November 6th, 2007
Credits: Giuseppe Misurelli
Outline
• Grid Data Management Challenge
• Storage Elements and SRM
• LFC File Catalog
• Data Movement Utils
Grid DM Challenge
The Grid DM Challenge /1
• Heterogeneous: data are stored on different storage systems using different technologies.
– Requirement: a common interface to storage resources, in order to hide the underlying complexity.
– Solution: Storage Resource Manager (SRM) interface (gLite File I/O Server).
• Distributed: data are stored in different locations; in most cases there is no shared file system or common namespace.
– Requirements: data need to be moved between different locations; there is a need to keep track of where data are stored.
– Solutions: File Transfer Service (FTS) to move files among Grid sites; Catalog to keep track of where data are stored.
• Data Retrieving: applications are located in different places from where data are stored.
– Requirement: a scheduled, reliable file transfer service.
– Solutions: File Transfer Service, Data Scheduler, File Placement Service, Transfer Agent, File Transfer Library.
• Security: data must be managed according to the VO membership access control policy.
– Requirement: a centralized access control service.
– Solution: File Authorization Service.
The Grid DM Challenge /2
• DM works with files. This assumption rests on the following reasons:
– the semantics of a file are well understood by everyone
– a file is the smallest granularity of data.
Data Management Services
• Storage Element – common interface to storage
– Storage Resource Manager: Castor, dCache, DPM, …
– POSIX I/O: gLite-I/O
– Native access protocols: rfio, dcap
– Transfer protocols: gsiftp
• Catalogs – keep track of where data are stored (LFC; Metadata Catalog, e.g. AMGA)
– File Catalog
– Replica Catalog
– File Authorization Service
– Metadata Catalog
• File Transfer – schedules reliable file transfers (lcg-utils, gLite FTS)
– Data Scheduler
– File Transfer Service
SE and SRM
SRM in an example /1
She is running a job which needs:
– Data for physics event reconstruction
– Simulated data
– Some data analysis files
She will write files remotely too.
The files are stored at different sites:
– They are at CERN, in dCache
– They are at Fermilab, in a disk array
– They are at Nikhef, in a classic SE
SRM in an example /2
– dCache: own system, own protocols and parameters
– Castor: no connection with dCache or DPM
– gLite DPM: independent system from dCache or Castor
You as a user need to know all the systems!
SRM: I talk to them on your behalf. I will even allocate space for your files. And I will use transfer protocols to send your files there.
Storage Resource Management
• Data are stored on disk pool servers or Mass Storage Systems
• Storage resource management needs to take into account:
– Transparent access to files (migration to/from disk pool)
– File pinning
– Space reservation
– File status notification
– Lifetime management
• The SRM (Storage Resource Manager) takes care of all these details
– The SRM is a single interface that takes care of local storage interaction and provides a Grid interface to the outside world
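The "single interface" idea can be illustrated with a small toy model. This is only a sketch in Python: the class and method names (ToyBackend, ToySrm, reserve_space, pin, is_pinned) and the capacities are hypothetical and are not the SRM API. It only shows how space reservation and file pinning could be offered uniformly in front of different storage systems.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBackend:
    """Hypothetical stand-in for one storage system (dCache, Castor, DPM)."""
    name: str
    capacity: int                      # free bytes, illustrative value only
    pinned: set = field(default_factory=set)

    def reserve(self, size: int) -> bool:
        # Space reservation: succeed only if enough free space remains.
        if size > self.capacity:
            return False
        self.capacity -= size
        return True


class ToySrm:
    """Hypothetical facade: one interface in front of several back ends."""

    def __init__(self, backends):
        self._backends = {b.name: b for b in backends}

    def reserve_space(self, backend: str, size: int) -> bool:
        return self._backends[backend].reserve(size)

    def pin(self, backend: str, path: str) -> None:
        # File pinning: mark a file so that it stays on disk while in use.
        self._backends[backend].pinned.add(path)

    def is_pinned(self, backend: str, path: str) -> bool:
        return path in self._backends[backend].pinned


srm = ToySrm([ToyBackend("dCache", 100), ToyBackend("Castor", 50)])
```

The caller talks only to `srm`; which back end holds the file is a parameter, not a different client library.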
gLite SE types /1
• gLite 3.0 data access protocols:
– File Transfer: GSIFTP (GridFTP)
– File I/O (remote file access): gsidcap, insecure RFIO, secured RFIO (gsirfio)
• Classic SE:
– GridFTP server
– Insecure RFIO daemon (rfiod): file access limited to the LAN only
– Single disk or disk array
– No quota management
– Does not support the SRM interface
gLite SE types /2
• Mass Storage Systems (Castor)
– Files migrated between front-end disk and back-end tape storage hierarchies
– GridFTP server
– Insecure RFIO (Castor)
– Provide an SRM interface with all the benefits
• Disk pool managers (dCache and gLite DPM)
– Manage distributed storage servers in a centralized way
– Physical disks or arrays are combined into a common (virtual) file system
– Disks can be dynamically added to the pool
– GridFTP server
– Secure remote access protocols (gsidcap for dCache, gsirfio for DPM)
– SRM interface
File Catalog and DM Tools
Files & replicas: Naming Conventions
• Logical File Name (LFN)
– An alias created by a user to refer to some item of data, e.g. “lfn:cms/20030203/run2/track1”
• Globally Unique Identifier (GUID)
– A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
• Site URL (SURL) (or Physical File Name (PFN) or Site FN)
– The location of an actual piece of data on a storage system, e.g.
“srm://pcrd24.cern.ch/flatfiles/cms/output10_1” (SRM)
“sfn://lxshare0209.cern.ch/data/alice/ntuples.dat” (Classic SE)
• Transport URL (TURL)
– Temporary locator of a replica plus access protocol, understood by an SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”
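The four kinds of name can be told apart by their prefix. The following Python sketch is illustrative only: the function name classify_grid_name is hypothetical, and the set of TURL protocols is assumed from the access and transfer protocols named elsewhere in these slides.

```python
def classify_grid_name(name: str) -> str:
    """Return 'LFN', 'GUID', 'SURL' or 'TURL' based on the name's prefix."""
    scheme = name.split(":", 1)[0].lower()
    if scheme == "lfn":
        return "LFN"
    if scheme == "guid":
        return "GUID"
    if scheme in ("srm", "sfn"):
        # srm:// for SRM-managed storage, sfn:// for a Classic SE
        return "SURL"
    if scheme in ("rfio", "gsirfio", "dcap", "gsidcap", "gsiftp"):
        # Assumption: a TURL starts with one of the access/transfer protocols
        return "TURL"
    raise ValueError(f"unrecognised name: {name!r}")


for example in (
    "lfn:cms/20030203/run2/track1",
    "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "srm://pcrd24.cern.ch/flatfiles/cms/output10_1",
    "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat",
):
    print(classify_grid_name(example), example)
```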
LFC - Description
• Provides
– Bulk operations
– Cursors for large queries
– Timeouts and retries for client operations
• Features
– User-exposed transaction API
– Hierarchical namespace and namespace operations
– Integrated GSI Authentication and Authorization
– Access Control Lists (Unix permissions and POSIX ACLs)
– Checksums
Supported database backends: Oracle and MySQL
LFC - Architecture
• LFC stores both logical and physical mappings for the file in the same database, which speeds up operations
• Treats all entities as files in a UNIX-like filesystem
• File API also similar to UNIX (create(), mkdir(), chown(), …)
• Hierarchical namespace of LFNs mapped to the GUIDs
• GUIDs mapped to the physical locations of file replicas in the storage
• System attributes of files (creation time, file size, checksum, …) stored as LFN attributes
• One field for user-defined metadata
• Multiple LFNs per GUID allowed as symbolic links to the primary LFN
Diagram (catalog entities):
– File Metadata: Logical File Name (LFN), GUID, System Metadata (ACLs, ownership, etc.)
– Symlinks: Link name
– User Metadata: User-defined metadata
– File Replica: Storage File Name, Storage Host
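The mappings described on this slide (LFN to GUID, GUID to replicas, symbolic links to the primary LFN) can be sketched as a small in-memory model. This is a toy illustration in Python, not the LFC schema or API: the names CatalogEntry and ToyCatalog, their methods, and the example.org hosts in the usage note are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    guid: str
    primary_lfn: str
    size: int = 0                      # a system attribute
    user_metadata: str = ""            # the single user-defined field
    symlinks: list = field(default_factory=list)   # extra LFNs for this GUID
    replicas: list = field(default_factory=list)   # SURLs of the replicas


class ToyCatalog:
    """Logical and physical mappings kept together in one store."""

    def __init__(self):
        self._by_guid = {}
        self._lfn_to_guid = {}

    def register(self, lfn: str, surl: str, size: int = 0) -> str:
        guid = str(uuid.uuid4())
        self._by_guid[guid] = CatalogEntry(guid, lfn, size, replicas=[surl])
        self._lfn_to_guid[lfn] = guid
        return guid

    def add_symlink(self, guid: str, link: str) -> None:
        self._by_guid[guid].symlinks.append(link)
        self._lfn_to_guid[link] = guid

    def add_replica(self, guid: str, surl: str) -> None:
        self._by_guid[guid].replicas.append(surl)

    def list_replicas(self, lfn: str) -> list:
        # Works for the primary LFN and for any symlink to it.
        return list(self._by_guid[self._lfn_to_guid[lfn]].replicas)
```

For instance, after `guid = catalog.register("/grid/gilda/misurelli/a.dat", "srm://se1.example.org/a.dat")`, a call to `catalog.add_replica(guid, "srm://se2.example.org/a.dat")` makes both SURLs visible through `list_replicas`, whichever LFN is used for the lookup.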
File Catalog and DM Tools
GFAL: Grid File Access Library
Interactions with an SE require some components:
→ File catalog services to locate replicas
→ SRM
→ File access mechanism to access files from the SE on the WN
GFAL does all these tasks for you:
→ Hides all these operations
→ Presents a POSIX interface for the I/O operations
→ User can create all commands needed for storage management
→ It also offers an interface to SRM
Supported protocols:
→ file (local or NFS-like access)
→ dcap, gsidcap and kdcap (dCache access)
→ rfio (Castor access) and gsirfio (DPM)
20
Enabling Grids for E-sciencE
INFSO-RI-508833
lcg-utils DM tools
• High-level interface (CL tools and APIs) to
– Upload/download files to/from the Grid (UI, CE and WN <---> SEs)
– Replicate data between SEs and locate the best replica available
– Interact with the file catalog
• Definition: A file is considered to be a Grid File if it is both physically present in an SE and registered in the File Catalog
– lfc commands to interact with file catalog features
– lcg-utils commands ensure the consistency between files in the Storage Elements and entries in the File Catalog
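The Grid File definition above amounts to a membership test on two sets. The Python sketch below is a toy illustration: the function names and the way the two sets are obtained are assumptions, and real consistency checks are performed by the lcg-utils commands themselves.

```python
def is_grid_file(surl: str, files_on_se: set, catalog_replicas: set) -> bool:
    """A Grid File is both physically on an SE and registered in the catalog."""
    return surl in files_on_se and surl in catalog_replicas


def find_inconsistencies(files_on_se: set, catalog_replicas: set) -> dict:
    """Report the two ways storage and catalog can drift apart."""
    return {
        # present on storage, but unknown to the catalog
        "unregistered": sorted(files_on_se - catalog_replicas),
        # listed in the catalog, but missing from storage
        "dangling": sorted(catalog_replicas - files_on_se),
    }
```

Both kinds of drift are what bypassing the high-level tools risks producing.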
21
Enabling Grids for E-sciencE
INFSO-RI-508833
LFC commands
lfc-chmod Change access mode of the LFC file/directory
lfc-chown Change owner and group of the LFC file/directory
lfc-delcomment Delete the comment associated with the file/directory
lfc-getacl Get file/directory access control lists
lfc-ln Make a symbolic link to a file/directory
lfc-ls List file/directory entries in a directory
lfc-mkdir Create a directory
lfc-rename Rename a file/directory
lfc-rm Remove a file/directory
lfc-setacl Set file/directory access control lists
lfc-setcomment Add/replace a comment
LFC Catalog commands
22
Enabling Grids for E-sciencE
INFSO-RI-508833
lfc-ls
Listing the entries of an LFC directory:
lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path…
where path specifies the LFN pathname (mandatory)
– Remember that LFC has a directory tree structure: /grid/<VO_name>/<you create it>
(/grid/<VO_name> is the LFC namespace; the rest is defined by the user)
All members of a VO have read-write permissions for their own directory
– You can set LFC_HOME to use relative paths:
> lfc-ls /grid/gilda/misurelli
> export LFC_HOME=/grid/gilda
> lfc-ls -l misurelli
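The effect of LFC_HOME on relative paths can be sketched as follows. This Python fragment is only a model of the behaviour described above: the function name resolve_lfc_path is hypothetical, and raising an error when LFC_HOME is unset is an assumption made for the sketch, not documented client behaviour.

```python
import posixpath
from typing import Optional


def resolve_lfc_path(path: str, lfc_home: Optional[str] = None) -> str:
    """Turn a path given to an lfc-* command into an absolute LFN path."""
    if path.startswith("/"):
        # Absolute paths are used as they are.
        return posixpath.normpath(path)
    if not lfc_home:
        # Assumption for this sketch: a relative path needs LFC_HOME.
        raise ValueError("relative path given but LFC_HOME is not set")
    return posixpath.normpath(posixpath.join(lfc_home, path))


print(resolve_lfc_path("misurelli", "/grid/gilda"))
```

With LFC_HOME=/grid/gilda, the relative path misurelli refers to the same directory as /grid/gilda/misurelli, which is why the first and last lfc-ls commands on this slide address the same directory.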
23
Enabling Grids for E-sciencE
INFSO-RI-508833
lfc-mkdir
Creating directories in the LFC:
lfc-mkdir [-m mode] [-p] path...
• Where path specifies the LFC pathname
• Remember that while registering a new file (using lcg-cr, for example) the corresponding destination directory must be created in the catalog beforehand:
– lfc-mkdir /grid/gilda/misurelli/practise
– lfc-ls -l /grid/gilda/misurelli
24
Enabling Grids for E-sciencE
INFSO-RI-508833
lcg-utils commands
Replica Management
lcg-cp Copies a grid file to a local destination
lcg-cr Copies a file to a SE and registers the file in the catalog
lcg-del Delete one file
lcg-rep Replication between SEs and registration of the replica
lcg-gt Gets the TURL for a given SURL and transfer protocol
lcg-sd Sets file status to “Done” for a given SURL in a SRM request
File Catalog Interaction
lcg-aa Add an alias in LFC for a given GUID
lcg-ra Remove an alias in LFC for a given GUID
lcg-rf Registers in LFC a file placed in a SE
lcg-uf Unregisters in LFC a file placed in a SE
lcg-la Lists the aliases for a given SURL, GUID or LFN
lcg-lg Get the GUID for a given LFN or SURL
lcg-lr Lists the replicas for a given GUID, SURL or LFN
25
Enabling Grids for E-sciencE
INFSO-RI-508833
lcg-utils: lcg-cr
• Upload a file to an SE and register it in the catalog
lcg-cr -d dest_file | dest_host [-g guid] [-l lfn] [-v | --verbose] --vo vo src_file
where:
– dest_host is the fully qualified hostname of the destination SE
– dest_file is a valid SURL (both sfn:// and srm:// formats are valid)
– guid specifies the Grid Unique IDentifier. If this option is not present, a GUID is generated internally
– lfn specifies the Logical File Name associated with the file
– vo specifies the Virtual Organization the user belongs to
– src_file specifies the source file name: the protocol can be file:/// or gsiftp:///
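The synopsis above can be turned into a small helper that assembles the argument list for a script. This Python sketch only builds the command line and does not run it; the function name build_lcg_cr_argv is hypothetical, it uses only the options shown in the synopsis, and the host, file and LFN values in the usage note are made up for illustration.

```python
from typing import List, Optional


def build_lcg_cr_argv(
    vo: str,
    src_file: str,
    dest: str,
    lfn: Optional[str] = None,
    guid: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble an lcg-cr argument list following the synopsis on the slide."""
    if not src_file.startswith(("file:", "gsiftp:")):
        raise ValueError("src_file must use the file: or gsiftp: protocol")
    argv = ["lcg-cr", "-d", dest]      # dest is a dest_host or a dest_file SURL
    if guid:
        argv += ["-g", guid]           # otherwise a GUID is generated internally
    if lfn:
        argv += ["-l", lfn]
    if verbose:
        argv.append("-v")
    argv += ["--vo", vo, src_file]
    return argv
```

For example, `build_lcg_cr_argv("gilda", "file:///home/user/data.txt", "se.example.org", lfn="lfn:/grid/gilda/misurelli/data.txt")` yields a list that could be passed to `subprocess.run` on a machine where lcg-utils is installed. The destination directory in the catalog must already exist, as noted on the lfc-mkdir slide.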
26
Enabling Grids for E-sciencE
INFSO-RI-508833
Advanced utilities: gridftp commands
edg-gridftp-exists TURL                Checks if file/dir exists on a SE
edg-gridftp-ls TURL                    Lists a directory on a SE
globus-url-copy srcTURL dstTURL        Copies files between SEs
edg-gridftp-mkdir TURL                 Creates a directory on a SE
edg-gridftp-rename srcTURL dstTURL     Renames a file on a SE
edg-gridftp-rm TURL                    Removes a file from a SE
edg-gridftp-rmdir TURL                 Removes a directory on a SE
Used for low-level management of files/directories in SEs
27
Enabling Grids for E-sciencE
INFSO-RI-508833
Globus-url-copy
• globus-url-copy srcTURL destTURL
– low-level file transfer
• Interaction with RLS components
– edg-lrc command (actions on LRC)
– edg-rmc command (actions on RMC)
– C++ and Java API for all catalog operations
http://edg-wp2.web.cern.ch/edg-wp2/replication/docu/r2.1/edg-lrc-devguide.pdf
http://edg-wp2.web.cern.ch/edg-wp2/replication/docu/r2.1/edg-rmc-devguide.pdf
• Using the low-level CLI and API is STRONGLY discouraged
– Risk: losing consistency between SEs and catalogues
– REMEMBER: a file is in the Grid if it is BOTH:
stored in a Storage Element
registered in the file catalog
28
Enabling Grids for E-sciencE
INFSO-RI-508833
References
• gLite documentation homepage
– http://glite.web.cern.ch/glite/documentation/default.asp
• LFC and DPM documentation
– https://uimon.cern.ch/twiki/bin/view/LCG/DataManagementDocumentation
29
Enabling Grids for E-sciencE
INFSO-RI-508833
Questions…