Grids and OO New directions in computing for HEP


Transcript of Grids and OO New directions in computing for HEP

Page 1: Grids and OO  New directions in computing for HEP

17-10-01 M.Mazzucato – Como Villa Olmo 1

Grids and OO New directions in computing for HEP

Mirco Mazzucato

INFN-Padova

Page 2: Grids and OO  New directions in computing for HEP

Main conclusions of the "LHC Computing Review"

The Panel recommends the multi-tier hierarchical model proposed by Monarc as one key element of the LHC computing model, with the majority of the resources not based at CERN: 1/3 in, 2/3 out.

About equal share between the Tier0 at CERN, the Tier1's, and the lower-level Tiers down to desktops.

All experiments should perform Data Challenges of increasing size and complexity until LHC start-up, also involving the Tier2's.

EU Testbed: 30-50% of one LHC experiment by 2004.

Limit heterogeneity: OS = Linux, Persistency = 2 tools max.

General consensus that the GRID technologies developed by DataGrid can provide the way to efficiently realize this infrastructure.

Page 3: Grids and OO  New directions in computing for HEP

HEP Monarc Regional Centre Hierarchy

[Figure: the Monarc multi-tier hierarchy. Tier 0 at CERN connects at 2.5 Gbps to the national Tier 1 centres (France, INFN, UK, Fermilab). Tier 1's connect to Tier 2 centres at >=622 Mbps; Tier 2's connect to Tier 3 institute sites at 622 Mbps; Tier 4 desktops attach at 100 Mbps-1 Gbps. The Italian branch of the hierarchy is labelled INFN-GRID.]

Page 4: Grids and OO  New directions in computing for HEP

NICE PICTURE

…BUT WHAT DOES IT MEAN?

Page 5: Grids and OO  New directions in computing for HEP

The real Challenge: the software

How to put together all these WAN-distributed resources in a way that is "transparent" for the users?

"Transparent" means that the user should not notice the presence of the network and of the many WAN-distributed sources of resources, as with the WEB when network connectivity is good.

How to group them dynamically to satisfy the tasks of virtual organizations?

Here comes the Grid paradigm. End of '99 for the EU and LHC Computing: start of the DataGrid project + US.

GRIDS: enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, in the absence of central control, omniscience, trust relationships (Ian Foster & Carl Kesselman, CERN, January 2001).

Just in time to answer the question opened by the Monarc model.

Page 6: Grids and OO  New directions in computing for HEP

The Grid concept

Each resource (our "farms" in '90s language) is transformed by the Grid middleware into a GridService, which:

is accessible via the network
speaks a well-defined protocol
has standard APIs
contains information on itself, made available to an index (accessible via the network) when the service registers itself
has a policy which controls its access
can be used to form more complex GridServices
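The GridService idea above can be sketched in a few lines of Python. This is purely illustrative: the class names, attributes and the toy index below are invented for the example and are not Globus or DataGrid APIs.

```python
# Toy model of a GridService: a resource wraps itself in a service object,
# publishes its self-description to a network-accessible index when it
# registers, and enforces an access policy. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class GridService:
    name: str                                      # e.g. "cnaf-farm-01"
    kind: str                                      # e.g. "ComputingElement"
    info: dict = field(default_factory=dict)       # self-description (CPUs, OS, ...)
    allowed_vos: set = field(default_factory=set)  # access policy

    def authorize(self, vo: str) -> bool:
        """Policy check: only members of an allowed VO may use the service."""
        return vo in self.allowed_vos

class ServiceIndex:
    """Stand-in for the index in which services register themselves."""
    def __init__(self):
        self._services = {}

    def register(self, service: GridService):
        # On registration the service's self-description becomes discoverable.
        self._services[service.name] = service

    def lookup(self, kind: str):
        """Find all registered services of a given kind."""
        return [s for s in self._services.values() if s.kind == kind]

index = ServiceIndex()
ce = GridService("cnaf-farm-01", "ComputingElement",
                 info={"cpus": 64, "os": "Linux"}, allowed_vos={"cms"})
index.register(ce)
print([s.name for s in index.lookup("ComputingElement")])  # -> ['cnaf-farm-01']
print(ce.authorize("cms"), ce.authorize("babar"))          # -> True False
```

Composition into "more complex GridServices" would then mean building a new service whose implementation calls lookup() and authorize() on simpler ones.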

Page 7: Grids and OO  New directions in computing for HEP

The Globus Team: Layered Grid Architecture

Application
Fabric: "Controlling things locally" - access to, & control of, resources
Connectivity: "Talking to things" - communication (Internet protocols) & security
Resource: "Sharing single resources" - negotiating access, controlling use
Collective: "Coordinating multiple resources" - ubiquitous infrastructure services, app-specific distributed services

[The figure maps these layers onto the Internet Protocol Architecture: Application, Internet/Transport, Link.]

The Anatomy of the Grid: Enabling Scalable Virtual Organizations, I. Foster, C. Kesselman, S. Tuecke, Intl J. Supercomputer Applns, 2001. www.globus.org/research/papers/anatomy.pdf

Page 8: Grids and OO  New directions in computing for HEP

The GridServices

ComputingElement (CE)
StorageElement (SE)
GridScheduler
Information and Monitoring
ReplicaManager (RM)
FileMover
ReplicaCatalog

But also…
UserRunTimeEnvironment
Network
SecurityPolicyService
Accounting

Well-defined interfaces, simple dependencies, well-defined interactions
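"Well-defined interfaces, simple dependencies" can be illustrated with a minimal sketch: each service type is a small interface, and a higher-level service such as the GridScheduler depends only on those interfaces, never on a concrete farm or disk system. The classes and the site-selection rule below are assumptions made up for the example, not EDG code.

```python
# Illustrative sketch: CE and SE as interfaces, a scheduler composed on top.
from abc import ABC, abstractmethod

class ComputingElement(ABC):
    @abstractmethod
    def submit(self, job: str) -> str: ...
    @abstractmethod
    def free_slots(self) -> int: ...

class StorageElement(ABC):
    @abstractmethod
    def has_file(self, lfn: str) -> bool: ...

class FarmCE(ComputingElement):
    def __init__(self, name, slots):
        self.name, self.slots = name, slots
    def submit(self, job):
        self.slots -= 1                 # consume one batch slot
        return f"{job}@{self.name}"
    def free_slots(self):
        return self.slots

class DiskSE(StorageElement):
    def __init__(self, files):
        self.files = set(files)
    def has_file(self, lfn):
        return lfn in self.files

class GridScheduler:
    """Toy brokering rule: pick the CE with most free slots among the
    sites whose SE already holds the input file."""
    def __init__(self, sites):
        self.sites = sites              # list of (CE, SE) pairs
    def submit(self, job, input_lfn):
        candidates = [(ce, se) for ce, se in self.sites if se.has_file(input_lfn)]
        if not candidates:
            raise LookupError(f"no site holds {input_lfn}")
        ce, _ = max(candidates, key=lambda p: p[0].free_slots())
        return ce.submit(job)

ce1, se1 = FarmCE("padova", 10), DiskSE({"run42.root"})
ce2, se2 = FarmCE("cern", 100), DiskSE(set())
sched = GridScheduler([(ce1, se1), (ce2, se2)])
print(sched.submit("reco-job", "run42.root"))  # -> reco-job@padova
```

Because the scheduler sees only the two abstract interfaces, swapping LSF for PBS behind a CE (or Castor for HPSS behind an SE) leaves it unchanged.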

Page 9: Grids and OO  New directions in computing for HEP

EU-DataGrid Architecture

[Figure: the EU-DataGrid layered architecture.
Local computing: Local Application, Local Database.
Grid Application Layer: Job Management, Data Management, Metadata Management, Object to File Mapping, Service Index.
Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler.
Underlying Grid Services: Computing Element Services, Storage Element Services, Replica Catalog, SQL Database Services, Authorization, Authentication and Accounting.
Fabric services: Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management.]

Page 10: Grids and OO  New directions in computing for HEP

The available Basic Services (Globus + EDG + …)

Computing and Storage Element Services
Represent the basic and essential services required in a Grid environment. These services include the ability to submit jobs on remote clusters (Globus GRAM), to transfer files efficiently between sites (Globus GridFTP, GDMP), and to schedule jobs on Grid services (EDG Broker).

The Replica Catalog and Replica Manager (Globus)
Store information about the physical files stored on any given Storage Element and manage replicas.

The Information Service (Globus MDS2)
Provides information on the available resources.

SQL Database Service (EDG)
Provides the ability to store Grid metadata.

Service Index (EDG)
Stores information on Grid services and their access URLs.

Security: Authentication, Authorization and Accounting (Globus + EDG)
All the services concerning security on the Grid.

Fabric (EDG)
Transforms hardware into a Grid service.
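The replica catalog and replica manager just described can be reduced to a very small model: a map from a logical file name to the set of its physical copies, updated whenever a file is replicated. The sketch below is only that model (the real Globus replica catalog of this era was LDAP-based, and a real replica manager would invoke GridFTP for the copy); all names are illustrative.

```python
# Toy replica catalog: logical file name -> set of physical file URLs.
class ReplicaCatalog:
    def __init__(self):
        self._replicas = {}

    def add_replica(self, lfn, pfn):
        """Record one physical copy (pfn) of a logical file (lfn)."""
        self._replicas.setdefault(lfn, set()).add(pfn)

    def locate(self, lfn):
        """Return all known physical copies of a logical file."""
        return sorted(self._replicas.get(lfn, set()))

def replicate(catalog, lfn, src_se, dst_se):
    # A real replica manager would move the bytes with GridFTP here;
    # this sketch only records the new location in the catalog.
    assert any(p.startswith(src_se) for p in catalog.locate(lfn)), \
        "source SE holds no copy to replicate from"
    catalog.add_replica(lfn, f"{dst_se}/{lfn}")

rc = ReplicaCatalog()
rc.add_replica("run42.root", "gsiftp://cern.ch/data/run42.root")
replicate(rc, "run42.root", "gsiftp://cern.ch", "gsiftp://infn.it")
print(rc.locate("run42.root"))
# -> ['gsiftp://cern.ch/data/run42.root', 'gsiftp://infn.it/run42.root']
```

A broker can then pick, among the locations returned by locate(), the one closest to the Computing Element that will run the job.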

Page 11: Grids and OO  New directions in computing for HEP

The status

Interest in Grid technology started to grow significantly in the HENP physics community at the end of 1999.
Chep2000 (February): GRID technology is launched in HENP; the invited talk of I. Foster at the plenary session introduced the basic Grid concepts.
The Saturday and Sunday after the end of Chep2000, ~100 people gathered in Padova for the first Globus tutorial given to the HENP community in Europe.
Summer 2000, the turning point: "approval" of the HENP Grid projects GriPhyN and DataGrid, plus many national Grid projects (INFN Grid, UK eScience Grid, …). The HENP Grid community increases significantly.
2001: approval of PPDG, iVDGL, DataTAG, …
Autumn 2001: approval of the LHC Computing Grid Project.
Chep2001: ~50 abstracts on Grids.

Page 12: Grids and OO  New directions in computing for HEP

Grid progress review: Experiments

Experiments are increasingly integrating Grid technology in their core software: Alice, Atlas, CMS, LHCb, D0, Cosmology.
Extensive tests of the available Grid tools using existing environments: STAR (10-032) has GridFTP in production BNL -> LBL.
First modifications of the experiments' application environments to integrate the available Grid software.
Definition of architectures for the experiments' Grid-aware applications.
Definition of requirements for future Grid middleware development.

Page 13: Grids and OO  New directions in computing for HEP

ATLAS ATHENA Grid-enabled Data Management using the Globus Replica Catalog

When an Athena job creates an event collection in a physical database file, it registers the data in a grid-enabled collection:
add the filename to the (replica catalog) collection
add the filename to the location object describing Site A
(OutputDatabase from the job options can be used as the filename)

The command-line equivalent of what needs to be done is:

globus-replica-catalog … -collection -add-filenames XXX
globus-replica-catalog … -location "Site A" -add-filenames XXX

(The "…" elides the LDAP URL of the collection and the authentication information.)

Page 14: Grids and OO  New directions in computing for HEP

[Figure: distributed ALICE production (P. Cerello, CHEP2001, Beijing, 3-7/9/2001). Linux farms (LSF, PBS, BQS) at Catania, CERN, Lyon, Torino, … run jobs submitted through Globus; data are moved with bbftp to HPSS at CCIN2P3 and CASTOR at CERN; the run DB is at Catania (MySQL); a monitoring server at Bari collects stdout and stderr. The cartoon shows three roles: the impatient ALICE user looking for available events anywhere, the local surveyor, and the production manager.]

Page 15: Grids and OO  New directions in computing for HEP

Alice/Grid: Sites & Resources

[Map of participating sites: BIRMINGHAM, COLUMBUS (US), CAPETOWN (ZA), YEREVAN, DUBNA, PADOVA, TORINO, CAGLIARI, NIKHEF, CATANIA, LYON, BARI, BOLOGNA, CALCUTTA (IN), GSI, MEXICO CITY (MX), IRB, SACLAY, CERN.]

Page 16: Grids and OO  New directions in computing for HEP

G-Tools Integration into the CMS Environment

[Figure: integration of GDMP with the CMS production environment across two sites, distinguishing three layers: CMS environment, GDMP system, and the CMS/GDMP interface. At Site A, the physics software writes into the production federation; a CheckDB script performs a DB completeness check and triggers the stage & purge scripts (copy file to MSS, update catalog, purge file); a new GDMP export catalog is generated and published to the subscribers' list. At Site B, the GDMP server generates an import catalog, replicates the files over the WAN, and transfers & attaches them to the user federation, with local stage & purge scripts (copy file to MSS, optional stage, purge file) managing the MSS.]

Page 17: Grids and OO  New directions in computing for HEP

Distributed MC production in future (using DataGRID middleware) – LHC-b 10-011

[Figure: the LHCb production chain, annotated with the DataGrid work packages supporting each step:
submit jobs remotely via Web – WP1 job submission tools
execute on farm – WP4 environment, WP1 job submission tools
monitor performance of farm via Web – WP3 monitoring tools
update bookkeeping database – WP2 meta data tools
transfer data to CASTOR (and HPSS, RAL Datastore) – WP2 data replication, WP5 API for mass storage
data quality check 'online' – online histogram production using GRID pipes, WP1 tools]

Page 18: Grids and OO  New directions in computing for HEP

Workflow Management for Cosmology

Approach:
Use the Grid for the coordination of remote facilities, including telescopes, computing and storage.
Use the Grid directory-based information service to find the needed computing and storage resources and to discover the access methods appropriate to their use.
The supernova search analysis is now running on the prototype DOE Science Grid based at Berkeley Lab.
They will implement a set of workflow management services aimed at the DOE Science Grid.

Implementation:
A SWAP-based (Simplified Workflow Access Protocol) engine for job submission, tracking and completion notification.
Condor to manage analysis and categorization tasks, with "ClassAds" to match needs to resources.
DAGMan (Directed Acyclic Graph Manager) to schedule parallel execution constrained by tree-like dependencies.
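What DAGMan contributes can be shown with a toy scheduler: run every parent before its children, leaving independent jobs free to run in parallel. The code below is an illustrative stand-in (Kahn's topological sort), not Condor code, and the pipeline job names are invented for the example.

```python
# Toy DAG scheduler: returns one valid execution order for a job DAG.
def run_dag(jobs, parents):
    """jobs: iterable of job names.
    parents: {job: set of prerequisite jobs}.
    Raises ValueError if the dependencies contain a cycle."""
    pending = {j: set(parents.get(j, ())) for j in jobs}
    order = []
    while pending:
        # Jobs whose prerequisites are all done could run in parallel;
        # here we just emit them in a deterministic (sorted) order.
        ready = sorted(j for j, deps in pending.items() if not deps)
        if not ready:
            raise ValueError("cycle in DAG")
        for j in ready:
            order.append(j)
            del pending[j]
        for deps in pending.values():
            deps.difference_update(ready)
    return order

# A supernova-search-like pipeline: fetch images, then two independent
# analysis jobs, then a final categorization step.
order = run_dag(["fetch", "subtract", "detect", "categorize"],
                {"subtract": {"fetch"}, "detect": {"fetch"},
                 "categorize": {"subtract", "detect"}})
print(order)  # -> ['fetch', 'detect', 'subtract', 'categorize']
```

In Condor terms, each `ready` batch is what DAGMan would hand to the scheduler at once, with ClassAds then matching each job to a concrete resource.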

Page 19: Grids and OO  New directions in computing for HEP

D0 SAM and PPDG – 10-037

[Figure: the SAM architecture mapped onto the Grid layers.
Client Applications: Web, Python codes, Java codes, command line, D0 Framework C++ codes; a Request Formulator and Planner.
Collective Services: Significant Event Logger, Naming Service, Database Manager, Catalog Manager, SAM Resource Management, batch systems (LSF, FBS, PBS, Condor), Data Mover, Job Services, Storage Manager, Job Manager, Cache Manager, Request Manager; "Dataset Editor", "File Storage Server", "Project Master", "Station Master", "Stager", "Optimiser".
Connectivity and Resource: CORBA, UDP, file transfer protocols (ftp, bbftp, rcp, GridFTP), mass storage system protocols (e.g. encp, hpss), catalog protocols.
Authentication and Security: GSI, SAM-specific user, group, node, station registration, bbftp 'cookie'.
Fabric: tape Storage Elements, disk Storage Elements, Compute Elements, LANs and WANs, Resource and Services Catalog, Replica Catalog, Meta-data Catalog, Code Repository.
A name in "quotes" is the SAM-given software component name; marked components will be replaced, or added/enhanced using PPDG and Grid tools.]

Page 20: Grids and OO  New directions in computing for HEP

The New DataGrid Middleware

To be delivered October 2001

Page 21: Grids and OO  New directions in computing for HEP

Status of Grid middleware

Software and middleware: the evaluation phase is concluded. The basic Grid services (Globus and Condor) are installed in several testbeds: INFN, France, UK, US, …
In general more robustness, reliability and scalability are needed (HEP has hundreds of users, hundreds of jobs, enormous data sets, …).
But the DataGrid and US Testbeds 0 are up and running; the problems of multiple CAs, authorization, … are solved.
Release 1 of the DataGrid middleware is expected this week. Real experiment applications will use GRID software in production (ALICE, ATLAS, CMS, LHC-B, but also EO, biology, Virgo/LIGO, …).
DataGrid Testbed 1 in November will include the major Tier1…Tiern centres in Europe and will soon be extended to the US…

Page 22: Grids and OO  New directions in computing for HEP

Summary on Grid developments

Activities are still mainly concentrated on strategies, architectures, tests.
General adoption of the Globus concept of a layered architecture and of the Globus basic services:
core Data Grid services: transport (GridFTP), Replica Management and Replica Catalog
resource management (GRAM), information services (MDS)
security and policy for collaborative groups (PKI)
…but new middleware tools start to appear and to be largely used: Broker, GDMP, Condor-G, …
In general good collaboration between EU and US Grid developers: GDMP, Condor-G, improvements in Globus Resource Management, …
Progress facilitated by the largely shared Open Source approach.
Experiments are getting on top of Grid activities: CMS requirements for the Grid, the DataGrid WP8 requirements document (100 pages for the LHC expts, EO and Biology).
Need to plan carefully the next iteration of Grid middleware development (realistic application requirements, results of testbeds, …).

Page 23: Grids and OO  New directions in computing for HEP

Grids and Mass Storage

The HENP world has adopted many different MSS solutions: Castor, ADSM/TSM, ENSTORE, Eurostore, HPSS, JASMine.
All offer the same (good) functionality, but with different client APIs, different data handling and distribution, and different hardware support and monitoring.
…and many different database solutions: Objectivity (OO DB), Root (file based), Oracle, …
Difficult to interoperate.
Possible way out:
Adopt a neutral database object description that allows movement between platforms and DBs: e.g. the (Atlas) Data Dictionary & Description Language (DDDL).
Adopt a Grid standard access layer on top of the different native access methods, as GRAM over LSF, PBS, Condor, …
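The "standard access layer over different native access methods" idea can be sketched as a thin adapter per mass storage system behind one neutral interface, in the same spirit as GRAM sitting over LSF, PBS and Condor. The backend classes and access-path prefixes below are stand-ins for illustration, not real Castor or HPSS client APIs.

```python
# Neutral access layer over heterogeneous mass storage systems (sketch).
from abc import ABC, abstractmethod

class MassStorage(ABC):
    @abstractmethod
    def stage_in(self, path: str) -> str:
        """Bring a file online and return an access path for it."""

class CastorBackend(MassStorage):
    def stage_in(self, path):
        return f"rfio:{path}"     # Castor-flavoured access path (illustrative)

class HPSSBackend(MassStorage):
    def stage_in(self, path):
        return f"hpss:{path}"     # HPSS-flavoured access path (illustrative)

def get_backend(site: str) -> MassStorage:
    # The neutral layer: callers never learn which MSS a site runs.
    backends = {"cern": CastorBackend(), "ccin2p3": HPSSBackend()}
    return backends[site]

print(get_backend("cern").stage_in("/data/run42.root"))     # -> rfio:/data/run42.root
print(get_backend("ccin2p3").stage_in("/data/run42.root"))  # -> hpss:/data/run42.root
```

Adding a new MSS (ENSTORE, JASMine, …) then means writing one more adapter class, with no change to the Grid services that call stage_in().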

Page 24: Grids and OO  New directions in computing for HEP

Grid and OO Simulation & Reconstruction

Geant4 (the OO simulation toolkit) is slowly reaching the HENP experiments: extensive debugging of the hadronic models with test beams, geometry descriptions, low-energy e.m. descriptions, …
Expected to be adopted soon as the basic production simulation tool by many experiments: Babar, the LHC expts, …
CMS has the OSCAR (Geant4) simulation and the ORCA reconstruction fully integrated in their framework COBRA.
Preliminary tests of simulation and reconstruction on the Grid done by all LHC expts + Babar, D0, …
Need to plan now a Grid-aware framework to fully profit from the Grid middleware.

Page 25: Grids and OO  New directions in computing for HEP

Conclusions

Large developments are ongoing on Grid middleware, in parallel in the EU and the US: Workflow and Data Management, Information Services, … All adopt the Open Source approach.
Several experiments are developing Job and Meta Data Managers: natural and safe.
…but strong coordination is needed to avoid divergent solutions: the InterGrid organization EU-US-Asia for the HENP world, the Global Grid Forum for the general standardization of protocols and APIs.
The Grid projects should develop a new world-wide "standard engine" providing transparent access to resources (computing, storage, network, …), as the WEB engine did for information in the early '90s.
Since the source codes are available, it is better to improve an existing tool than to start a parallel divergent solution. Big Science like HENP owes this to the worldwide taxpayers.
The HENP Grid infancy ends with the LHC Computing Grid project and Chep2001.