Grid Data Management
Week #4
Basics of Grid and Cloud computing
Hardi Teder
University of Tartu
March 6th 2013


Overview

● Grid Data Management
● Where does the data come from?
● Grid Data Management tools


Grid foundations


Where does the data come from?

● CERN's LHC CMS experiment example
● CERN – European Organization for Nuclear Research
● LHC – Large Hadron Collider
● CMS – Compact Muon Solenoid


Grid acronyms

● EGI Glossary
    ● http://www.egi.eu/about/glossary/
    ● Google search helps

● EGI Security Policy Glossary of Terms
    ● https://documents.egi.eu/public/ShowDocument?docid=71


Large Hadron Collider (LHC)


Smash things together, see what happens!


Discover particles

● Quarks: up, down, charm, strange, top, bottom

● Leptons: electron, muon, tau, electron neutrino, muon neutrino, tau neutrino


Large Hadron Collider (LHC)


CMS detector

● Took ~2000 scientists and engineers more than 20 years to design and build

● Is about 15 metres wide and 21.5 metres long

● Weighs twice as much as the Eiffel Tower – about 14,000 t

● Uses the largest, most powerful magnet of its kind ever made


Collisions in CMS


CMS in production

● Volume: ~250 TB/day moved among dozens of Tier sites

● Number of files: ~19M logical files (the total number of replicas so far is ~27M)

● Throughput: 2-2.5 GB/s aggregate (weekly averages) in peak weeks in 2012
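
As a rough cross-check of the figures above, a daily volume can be converted to an average transfer rate and back. This is only a sketch: it assumes decimal units (1 TB = 1000 GB), which the slide does not specify, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def tb_per_day_to_gb_per_s(tb_per_day: float) -> float:
    """Average rate in GB/s for a daily volume in TB (decimal units assumed)."""
    return tb_per_day * 1000 / SECONDS_PER_DAY


def gb_per_s_to_tb_per_day(gb_per_s: float) -> float:
    """Daily volume in TB for a sustained rate in GB/s (decimal units assumed)."""
    return gb_per_s * SECONDS_PER_DAY / 1000


if __name__ == "__main__":
    print(f"250 TB/day is about {tb_per_day_to_gb_per_s(250):.2f} GB/s")
    print(f"2.0 GB/s is about {gb_per_s_to_tb_per_day(2.0):.1f} TB/day")
    print(f"2.5 GB/s is about {gb_per_s_to_tb_per_day(2.5):.1f} TB/day")
```

Under these assumptions, 250 TB/day averages out to roughly 2.9 GB/s, which is the same order of magnitude as the quoted 2-2.5 GB/s weekly averages.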


Worldwide LHC Computing Grid (WLCG)

● Tier0 at CERN

● 11 Tier1 sites

● 138 Tier2 sites


WLCG

● 15 petabytes of data generated annually


There are more projects

● DNA experiments

● Radio telescopes

● Sensor networks

● Digitizing data: books, documents, images


Grid foundations


Data management

● Data access and transfer
    – Simple, automatic multi-protocol file transfer tools, integrated with the Resource Management service
        ● Move data from the local machine to the remote machine where the job is executed (input file staging)
        ● Move the output files from the remote computer to the local machine (output file staging)
        ● Pull the executable from a remote location
    – Secure, high-performance, reliable file transfer over modern WANs: GridFTP

● Data replication and management
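
The staging steps above can be sketched as follows. This is a toy model, not grid middleware: local directories stand in for the local and remote machines, plain file copies stand in for GridFTP transfers, and all function and file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def stage_in(inputs, session_dir):
    """Input file staging: copy inputs to where the job will run."""
    session_dir.mkdir(parents=True, exist_ok=True)
    for src in inputs:
        shutil.copy2(src, session_dir / Path(src).name)


def run_job(session_dir):
    """Stand-in for the real job: upper-case input.txt into output.txt."""
    text = (session_dir / "input.txt").read_text()
    (session_dir / "output.txt").write_text(text.upper())


def stage_out(session_dir, names, dest_dir):
    """Output file staging: copy the named results back to the user."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for name in names:
        target = dest_dir / name
        shutil.copy2(session_dir / name, target)
        staged.append(target)
    return staged


def demo():
    """Run stage-in, the job, and stage-out inside a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        local = root / "local"
        remote = root / "remote_session"
        results = root / "results"
        local.mkdir()
        (local / "input.txt").write_text("grid data\n")

        stage_in([local / "input.txt"], remote)
        run_job(remote)
        staged = stage_out(remote, ["output.txt"], results)
        return staged[0].read_text()


if __name__ == "__main__":
    print(demo(), end="")
```

In a real grid the copies would cross a WAN, which is why a secure and reliable transfer protocol matters for the same three steps.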


ARC Computing Element (CE)

● Universal frontend for different batch systems

● Standard and custom interfaces

● Status information publishing

● File handling


ARC CE and data handling

● Data are moved by the users and/or by ARC

● Frequently used files are cached at the execution sites

● Cached files are indexed
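
The idea of an indexed cache can be illustrated with a small sketch. It is not ARC's implementation: naming cached files by the SHA-1 of the source URL, the `UrlCache` class, and the example URL are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


class UrlCache:
    """Toy cache: one file per source URL, plus an index from URL to path."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.index = {}      # source URL -> cached file path
        self.downloads = 0   # how many times a file was actually fetched

    def _path_for(self, url):
        # Assumption: the cached file is named after a hash of its URL.
        return self.cache_dir / hashlib.sha1(url.encode("utf-8")).hexdigest()

    def get(self, url, fetch):
        """Return the cached path for url, calling fetch(url) only on a miss."""
        path = self.index.get(url)
        if path is None or not path.exists():
            path = self._path_for(url)
            path.write_bytes(fetch(url))
            self.index[url] = path
            self.downloads += 1
        return path


def demo():
    """Request the same (hypothetical) URL twice; only the first call fetches."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = UrlCache(tmp)
        url = "gsiftp://se.example.org/data/input.dat"
        first = cache.get(url, lambda u: b"payload")
        second = cache.get(url, lambda u: b"should not be fetched")
        return first == second, cache.downloads, second.read_bytes()


if __name__ == "__main__":
    print(demo())
```

The second request is served from the cache, so a job that reuses a frequently needed input does not transfer it again.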


ARC CE internals

● All services are only in the frontend

● Grid users are mapped to local identities

● Use /tmp/user for files which are actively used
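
The slide does not say how the mapping from Grid users to local identities is configured. One common convention in Globus-based middleware is a grid-mapfile, where each line holds a quoted certificate subject (DN) followed by a local account name. The sketch below assumes that format; the DNs and account names are made up.

```python
import shlex


def parse_gridmap(text):
    """Parse grid-mapfile style lines into a {subject DN: local user} dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the DN
        if len(parts) != 2:
            raise ValueError(f"unexpected grid-mapfile line: {line!r}")
        subject_dn, local_user = parts
        mapping[subject_dn] = local_user
    return mapping


# Hypothetical example content, not taken from any real site.
EXAMPLE = '''
# subject DN                          local account
"/DC=org/DC=example/CN=Jane Doe" griduser01
"/DC=org/DC=example/CN=John Roe" griduser02
'''

if __name__ == "__main__":
    for dn, user in parse_gridmap(EXAMPLE).items():
        print(f"{dn} -> {user}")
```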


ARC UI data manipulation

● arcls – lists the contents of a remote directory specified by a URL and shows some attributes of its objects

● arccp – a tool to copy files over the Grid

● arcrm – allows users to erase files and directories at any location specified by a valid URL

● arcmkdir – allows users to create directories, if the protocol of the specified URL supports it
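
When these tools are driven from a script, it helps to build the command lines as argument lists. The sketch below only constructs the commands and does not run them, since that would need an installed ARC client and valid Grid credentials. The helper names are illustrative, and the only flag used is `-l`, taken from the `arcls -l` example in these slides.

```python
import shlex


def arcls(url, long_listing=False):
    """Argument list for listing a remote directory."""
    argv = ["arcls"]
    if long_listing:
        argv.append("-l")
    argv.append(url)
    return argv


def arccp(source, destination):
    """Argument list for copying a file between two locations."""
    return ["arccp", source, destination]


def arcrm(url):
    """Argument list for erasing a remote file or directory."""
    return ["arcrm", url]


def arcmkdir(url):
    """Argument list for creating a remote directory."""
    return ["arcmkdir", url]


if __name__ == "__main__":
    base = "gsiftp://se.grid.eenet.ee/storage/balticgrid/BGCC2013"
    for argv in (
        arcls(base, long_listing=True),
        arcmkdir(base + "/user/"),
        arccp("job.out", base + "/user/job.out"),
        arcrm(base + "/user/job.out"),
    ):
        # shlex.join (Python 3.8+) quotes arguments for display in a shell.
        print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` on a machine where the ARC client is available.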


ARC URLs

● ftp – ordinary File Transfer Protocol (FTP)

● gsiftp – GridFTP, the Globus-enhanced FTP protocol with security, encryption, etc., developed by The Globus Alliance

● http – ordinary Hyper-Text Transfer Protocol (HTTP) with PUT and GET methods using multiple streams

● https – HTTP with SSL v3

● httpg – HTTP with Globus GSI

● ldap – ordinary Lightweight Directory Access Protocol (LDAP) [9]

● lfc – LFC catalog and indexing service of gLite [1]

● srm – Storage Resource Manager (SRM) service [7]

● root – Xrootd protocol (read-only, available in ARC 2.0.0 and later)

● file – file name local to the host, with a full path


A URL can be used:

● In standard form:
    ● protocol://[host[:port]]/file

● Or, to enhance the performance:
    ● protocol://[host[:port]][;option[;option[...]]]/file
    ● protocol://[url[|url[...]]@]host[:port][;option[;option[...]]]/lfn[:metadataoption[:metadataoption[...]]]
    ● protocol://[;commonoption[;commonoption]|][url[|url[...]]@]host[:port][;option[;option[...]]]/lfn[:metadataoption[:metadataoption[...]]]
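
To make the first two forms concrete, here is a minimal parser sketch. It handles only `protocol://[host[:port]][;option[;option[...]]]/file`; the forms with replica lists (`url|url@`), common options, and metadata options are deliberately left out. The `ArcUrl` type and `parse_arc_url` function are hypothetical names, and the `threads=4` option in the usage lines is an illustrative value.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ArcUrl:
    protocol: str
    host: str
    port: Optional[int]
    options: Dict[str, str]
    path: str


def parse_arc_url(url: str) -> ArcUrl:
    """Split protocol://[host[:port]][;option...]/file into its parts."""
    protocol, sep, rest = url.partition("://")
    if not sep or not protocol:
        raise ValueError(f"not a protocol://... URL: {url!r}")

    # Everything up to the first "/" is host, port and per-URL options.
    authority, _, path = rest.partition("/")
    host_port, *raw_options = authority.split(";")
    host, _, port = host_port.partition(":")

    options = {}
    for item in raw_options:
        key, _, value = item.partition("=")
        options[key] = value

    return ArcUrl(protocol, host, int(port) if port else None, options, "/" + path)


if __name__ == "__main__":
    print(parse_arc_url("gsiftp://lscf.nbi.dk:2811/jobs/1323842831451666535/job.out"))
    print(parse_arc_url("gsiftp://se.grid.eenet.ee:2811;threads=4/storage/balticgrid/BGCC2013"))
```

Options sit between the host and the path, so the parser has to split on `;` before the first `/` rather than treat them like an HTTP query string.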


URL examples

● ARC UI
    ● arcls lfc://lfc.balticgrid.org/grid/balticgrid/BGCC2013/Lab4/
    ● arcls -l gsiftp://se.grid.eenet.ee/storage/balticgrid/BGCC2013

● xRSL
    ● To store the job output to storage:
    ● (outputFiles=("jobHugeOutputFile.tgz" "gsiftp://se.grid.eenet.ee/storage/balticgrid/BGCC2013/user/"))
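
When job descriptions are generated by a script, the file-list attribute can be rendered from (file name, URL) pairs. This is a small sketch with a hypothetical helper name; it does no escaping, so it assumes names and URLs contain no double quotes.

```python
def xrsl_file_list(attribute, pairs):
    """Render an xRSL file-list attribute such as outputFiles from (name, url) pairs."""
    for name, url in pairs:
        if '"' in name or '"' in url:
            raise ValueError("double quotes are not handled by this sketch")
    items = "".join(f'("{name}" "{url}")' for name, url in pairs)
    return f"({attribute}={items})"


if __name__ == "__main__":
    print(
        xrsl_file_list(
            "outputFiles",
            [
                (
                    "jobHugeOutputFile.tgz",
                    "gsiftp://se.grid.eenet.ee/storage/balticgrid/BGCC2013/user/",
                )
            ],
        )
    )
```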


GridFTP

● The GSIFTP protocol offers the functionalities of FTP, but with support for GSI.

● Supported by all VOs in the Grid

● arccp gsiftp://lscf.nbi.dk:2811/jobs/1323842831451666535/job.out job.out


File Catalogue (LFC)

● Users and applications need to locate files (or replicas) on the Grid.

● The File Catalogue is the service which maintains mappings between LFN(s), GUID and SURL(s).

● lfc://lfc.balticgrid.org/grid/balticgrid/BGCC2013/Lab4/P4_data.test

● lfc:P4_data.test
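
The mappings the catalogue maintains can be modelled in a few lines. This is an in-memory toy, not the LFC API: one or more logical file names (LFNs) point to a single GUID, and the GUID points to the storage URLs (SURLs) of its replicas. The class, method names, and storage hosts are hypothetical.

```python
import uuid


class FileCatalogue:
    """Toy catalogue: LFN(s) -> GUID -> SURL(s)."""

    def __init__(self):
        self._guid_by_lfn = {}
        self._surls_by_guid = {}

    def register(self, lfn, surl):
        """Register a replica for lfn, creating a GUID on first use."""
        guid = self._guid_by_lfn.get(lfn)
        if guid is None:
            guid = str(uuid.uuid4())
            self._guid_by_lfn[lfn] = guid
            self._surls_by_guid[guid] = []
        if surl not in self._surls_by_guid[guid]:
            self._surls_by_guid[guid].append(surl)
        return guid

    def add_alias(self, existing_lfn, new_lfn):
        """Give an already registered file a second logical name."""
        self._guid_by_lfn[new_lfn] = self._guid_by_lfn[existing_lfn]

    def replicas(self, lfn):
        """Return the SURLs of all replicas of the file named lfn."""
        return list(self._surls_by_guid[self._guid_by_lfn[lfn]])


if __name__ == "__main__":
    catalogue = FileCatalogue()
    lfn = "/grid/balticgrid/BGCC2013/Lab4/P4_data.test"
    catalogue.register(lfn, "srm://se1.example.org/data/P4_data.test")
    catalogue.register(lfn, "srm://se2.example.org/data/P4_data.test")
    catalogue.add_alias(lfn, "/grid/balticgrid/BGCC2013/Lab4/alias.test")
    print(catalogue.replicas("/grid/balticgrid/BGCC2013/Lab4/alias.test"))
```

Users work with the logical name only; the catalogue resolves it to whichever physical replicas exist.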


Relationships between tables


LFC environment

#!/bin/bash
# Point the LFC client tools at the BalticGrid information system and catalogue.
export LCG_GFAL_INFOSYS=bdii.balticgrid.org:2170
export LCG_CATALOG_TYPE=lfc
export LFC_HOST=lfc.balticgrid.org
# Print the variables to confirm they are set.
echo -e 'Printing variables: LCG_GFAL_INFOSYS; LCG_CATALOG_TYPE; LFC_HOST \n'
echo $LCG_GFAL_INFOSYS; echo $LCG_CATALOG_TYPE; echo $LFC_HOST
# Default catalogue directory for relative LFNs.
export LFC_HOME=/grid/balticgrid/BGCC2012/Hardi_Teder


Clean up after yourself

● Delete the files you don't use any more


References

● I used several pictures from:
    ● CMS experiment public presentations:
        – http://cms.web.cern.ch/org/cms-presentations-public
    ● NorduGrid repository
        – http://svn.nordugrid.org/trac/nordugrid/browser/doc/trunk/figures
    ● FREEIMAGES.co.uk
        – www.freeimages.co.uk

● More information about ARC Data Management:
    ● http://www.nordugrid.org/papers.html


Thank you

● More information from:
    ● Hardi Teder
    ● http://courses.cs.ut.ee/2013/cloud