Grids and OO: New directions in computing for HEP
17-10-01 M.Mazzucato – Como Villa Olmo 1
Grids and OO New directions in computing for HEP
Mirco Mazzucato
INFN-Padova
Main conclusion of the "LHC Computing Review"
The Panel recommends the multi-tier hierarchical model proposed by Monarc as one key element of the LHC computing model, with the majority of the resources not based at CERN: 1/3 in, 2/3 out.
About equal shares between the Tier0 at CERN, the Tier1s, and the lower-level Tiers down to desktops: Tier0 ≈ all Tier1s ≈ Tier2s + ...
All experiments should perform Data Challenges of increasing size and complexity until LHC start-up, involving also the Tier2s.
EU Testbed: 30-50% of one LHC experiment by 2004.
Limit heterogeneity: OS = Linux, Persistency = 2 tools max.
General consensus that the GRID technologies developed by DataGrid can provide the way to efficiently realize this infrastructure.
HEP Monarc Regional Centre Hierarchy
[Figure: CERN Tier 0 at the top, connected at 2.5 Gbps to Tier 1 centres (France, INFN, UK, Fermilab); Tier 1 centres to Tier 2 centres at >= 622 Mbps; Tier 2 centres to Tier 3 sites at 622 Mbps; Tier 4 desktops attached at 100 Mbps - 1 Gbps. INFN-GRID highlighted.]
NICE PICTURE
...BUT WHAT DOES IT MEAN?
The real challenge: the software
How to put together all these WAN-distributed resources in a way that is "transparent" for the users?
"Transparent" means that the user should not notice the presence of the network and of the many WAN-distributed sources of resources, just as with the WEB given good network connectivity.
How to group them dynamically to satisfy the tasks of virtual organizations?
Here comes the Grid paradigm. End of '99 for EU and LHC computing: start of the DataGrid project + US.
GRIDS: enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, in the absence of central control, omniscience, trust relationships (Ian Foster & Carl Kesselman, CERN, January 2001).
Just in time to answer the question opened by the Monarc model.
The Grid concept
Each resource (our "farms", in '90s language) is transformed by the Grid middleware into a GridService which:
is accessible via the network;
speaks a well-defined protocol;
has standard APIs;
contains information on itself, which is made available to an index (accessible via the network) when it registers itself;
has a policy which controls its access;
can be used to form more complex GridServices.
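The bullet points above can be sketched in a few lines of Python. This is an illustrative toy, not Globus or DataGrid code: all class, field and user names are invented for the example.

```python
# Illustrative sketch only: a resource wrapped as a self-describing,
# access-controlled service that registers itself with a network index.
class GridService:
    def __init__(self, name, protocol, info, allowed_users):
        self.name = name
        self.protocol = protocol                 # well-defined protocol it speaks
        self.info = info                         # information on itself
        self.allowed_users = set(allowed_users)  # access policy

    def authorize(self, user):
        # the policy controlling access to the service
        return user in self.allowed_users

class ServiceIndex:
    """The network-accessible index a GridService registers with."""
    def __init__(self):
        self.entries = {}

    def register(self, service):
        # the service publishes its self-description on registration
        self.entries[service.name] = service.info

    def lookup(self, name):
        return self.entries.get(name)

index = ServiceIndex()
farm = GridService("cnaf-farm", "GRAM", {"cpus": 64, "os": "Linux"}, ["mazzucato"])
index.register(farm)
print(index.lookup("cnaf-farm"))  # -> {'cpus': 64, 'os': 'Linux'}
```

More complex GridServices would be built by composing such objects, each discoverable through the same index.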
The Globus Team: Layered Grid Architecture
Application
Collective: "Coordinating multiple resources": ubiquitous infrastructure services, app-specific distributed services
Resource: "Sharing single resources": negotiating access, controlling use
Connectivity: "Talking to things": communication (Internet protocols) & security
Fabric: "Controlling things locally": access to, & control of, resources
(The figure draws the analogy with the Internet Protocol Architecture: Application / Transport / Internet / Link.)
The Anatomy of the Grid: Enabling Scalable Virtual Organizations, I. Foster, C. Kesselman, S. Tuecke, Intl J. Supercomputer Applns, 2001. www.globus.org/research/papers/anatomy.pdf
The GridServices
ComputingElement (CE), StorageElement (SE), GridScheduler, Information and Monitoring, ReplicaManager (RM), FileMover, ReplicaCatalog.
But also: UserRunTimeEnvironment, Network, SecurityPolicyService, Accounting.
Well-defined interfaces, simple dependencies, well-defined interactions.
EU-DataGrid Architecture
[Figure: layered architecture.
Grid Application Layer: Data Management, Job Management, Metadata Management, Object to File Mapping.
Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler.
Underlying Grid Services: Computing Element Services, Storage Element Services, Replica Catalog, Authorization Authentication and Accounting, SQL Database Services, Service Index.
Fabric services: Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management.
Local Applications and Local Databases sit on top; Local Computing runs alongside the Grid fabric.]
The available basic services (Globus + EDG + ...)
Basic and essential services required in a Grid environment:
Computing and Storage Element Services: the basic and essential services required in a Grid environment. These include the ability to submit jobs on remote clusters (Globus GRAM), to transfer files efficiently between sites (Globus GridFTP, GDMP), and to schedule jobs on Grid services (EDG Broker).
The Replica Catalog and Replica Manager (Globus): store information about the physical files stored on any given Storage Element and manage replicas.
The Information Service (Globus MDS2): provides information on the available resources.
SQL Database Service (EDG): provides the ability to store Grid metadata.
Service Index (EDG): stores information on Grid services and their access URLs.
Security: Authentication, Authorization and Accounting (Globus + EDG): all the services concerning security on the Grid.
Fabric: transforms hardware into a Grid service (EDG).
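As an illustration of the scheduling point, the following toy Python broker matches a job's requirements against resource descriptions of the kind an Information Service publishes. The resource records, field names and CE names are invented for the example; this is not the EDG Broker's actual logic.

```python
# Hypothetical resource records, as a broker might obtain them from the
# Information Service. All names and numbers are invented.
resources = [
    {"ce": "lyon-ce", "os": "Linux",   "free_cpus": 4},
    {"ce": "cnaf-ce", "os": "Linux",   "free_cpus": 120},
    {"ce": "ral-ce",  "os": "Solaris", "free_cpus": 200},
]

def broker(job, resources):
    """Pick the matching Computing Element with the most free CPUs."""
    candidates = [r for r in resources
                  if r["os"] == job["os"] and r["free_cpus"] >= job["cpus"]]
    if not candidates:
        return None  # no resource satisfies the job's requirements
    return max(candidates, key=lambda r: r["free_cpus"])["ce"]

job = {"os": "Linux", "cpus": 10}
print(broker(job, resources))  # -> cnaf-ce
```

The point of the sketch is the separation of concerns: the broker never talks to a cluster directly, it only reasons over the self-descriptions the services published.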
The status
Interest in Grid technology started to grow significantly in the HENP physics community at the end of 1999.
Chep2000 (February): GRID technology is launched in HENP; an invited talk by I. Foster at the plenary session introduced the basic Grid concepts.
On the Saturday and Sunday after the end of Chep2000, ~100 people gathered in Padova for the first Globus tutorial for the HENP community in Europe.
Summer 2000, the turning point: "approval" of the HENP Grid projects GriPhyN and DataGrid; many national Grid projects (INFN Grid, UK eScience Grid, ...).
The HENP Grid community increases significantly.
2001: approval of PPDG, iVDGL, DataTAG, ...
Autumn 2001: approval of the LHC Computing Grid Project.
Chep2001: ~50 abstracts on Grids.
Grid progress review: Experiments
Experiments are increasingly integrating Grid technology in their core software: Alice, Atlas, CMS, LHCb, D0, cosmology.
Extensive tests of the available Grid tools using existing environments: STAR (10-032) has GridFTP in production BNL -> LBL.
First modifications of the experiments' application environments to integrate the available Grid software.
Definition of architectures for the experiments' Grid-aware applications.
Definition of requirements for future Grid middleware development.
ATLAS ATHENA Grid-enabled Data Management using the Globus Replica Catalog
When an Athena job creates an event collection in a physical database file, it registers the data in a grid-enabled collection: it adds the filename to the (replica catalog) collection, and adds the filename to the location object describing Site A (the OutputDatabase from the job options can be used as the filename).
The command-line equivalent of what needs to be done is:
globus-replica-catalog ... -collection -add-filenames XXX
globus-replica-catalog ... -location "Site A" -add-filenames XXX
(The "..." elides the LDAP URL of the collection and the authentication information.)
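The two registration steps above can be mimicked with a toy in-memory catalog. This sketch only models the collection/location structure; the real Globus Replica Catalog is an LDAP-based service, and the class, method and file names here are invented.

```python
# Toy model of a replica catalog: a collection of logical filenames plus
# per-site location objects recording which site holds which files.
class ReplicaCatalog:
    def __init__(self):
        self.collection = set()  # logical filenames in the collection
        self.locations = {}      # site -> set of filenames held there

    def add_to_collection(self, filename):
        # equivalent of: globus-replica-catalog ... -collection -add-filenames
        self.collection.add(filename)

    def add_to_location(self, site, filename):
        # equivalent of: globus-replica-catalog ... -location "Site" -add-filenames
        self.locations.setdefault(site, set()).add(filename)

    def sites_holding(self, filename):
        # later, any job can ask where replicas of a file live
        return [s for s, files in self.locations.items() if filename in files]

rc = ReplicaCatalog()
rc.add_to_collection("events.001.db")
rc.add_to_location("Site A", "events.001.db")
print(rc.sites_holding("events.001.db"))  # -> ['Site A']
```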
[Figure: ALICE distributed production (P. Cerello, CHEP2001, Beijing, 3-7/9/2001). Linux farms (LSF, PBS, BQS) at Catania, CERN, Lyon, Torino, ...; run DB at Catania (MySQL); mass storage in HPSS at CCIN2P3 and CASTOR at CERN; input and output files moved with bbftp; stdout and stderr collected by a monitoring server at Bari; job submission via Globus. Three user views: "I'm the production manager", "I'm the local surveyor", and "I'm the impatient ALICE user looking for available events anywhere".]
Alice/Grid: Sites & Resources
[Map: Birmingham, Columbus (US), Cape Town (ZA), Yerevan, Dubna, Padova, Torino, Cagliari, NIKHEF, Catania, Lyon, Bari, Bologna, Calcutta (IN), GSI, Mexico City (MX), IRB, Saclay, CERN.]
G-Tools Integration into the CMS Environment
[Figure: CMS/GDMP integration across two sites, distinguishing the CMS environment, the GDMP system and the CMS/GDMP interface. At Site A the CMS environment (physics software writing to a production federation and a user federation, with stage & purge scripts and a CheckDB script acting on the MSS and catalogs: copy file to MSS, update catalog, purge file, DB completeness check) triggers the CMS/GDMP interface, which generates a new GDMP export catalog and publishes it to a subscribers' list. At Site B the GDMP server reads the export catalog, generates an import catalog, replicates the files over the WAN, and transfers & attaches them to the local user federation and catalog, with local stage & purge scripts copying files to the MSS (optional staging) and purging them.]
Distributed MC production in future (using DataGRID middleware) - LHCb 10-011
[Figure: production workflow annotated with DataGrid work packages. Submit jobs remotely via Web (WP1 job submission tools); execute on farm (WP4 environment); monitor performance of the farm via Web (WP3 monitoring tools); transfer data to CASTOR (and HPSS, RAL Datastore) (WP5 API for mass storage, WP2 data replication); update the bookkeeping database (WP2 metadata tools); data quality check "online" and online histogram production using GRID pipes (WP1 tools).]
Workflow Management for Cosmology
Approach:
Use the Grid for coordination of remote facilities, including telescopes, computing and storage.
Use the Grid directory-based information service to find the needed computing and storage resources and to discover the access methods appropriate to their use.
The supernova search analysis is now running on the prototype DOE Science Grid based at Berkeley Lab.
They will implement a set of workflow management services aimed at the DOE Science Grid.
Implementation:
A SWAP-based (Simplified Workflow Access Protocol) engine for job submission, tracking and completion notification.
Condor to manage analysis and categorization tasks, with "ClassAds" to match needs to resources.
DAGman (Directed Acyclic Graph Job Manager) to schedule parallel execution constrained by tree-like dependencies.
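The DAGman idea, running tasks in an order that respects tree-like dependencies, is essentially a topological sort over the dependency graph. A minimal sketch in Python (the task names are invented; this is not DAGman code):

```python
# Sketch of DAG scheduling: order tasks so that every task runs only
# after all of its predecessors. Task names are invented examples.
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# task -> set of tasks that must complete before it
deps = {
    "calibrate": set(),
    "subtract":  {"calibrate"},
    "detect":    {"subtract"},
    "classify":  {"detect"},
    "archive":   {"detect"},
}

# static_order() yields every task after all of its predecessors;
# independent tasks ("classify", "archive") could run in parallel
order = list(TopologicalSorter(deps).static_order())
print(order)
```

A real workflow manager would additionally dispatch each ready task to a matched resource and react to failures, but the ordering constraint is exactly this.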
D0 SAM and PPDG - 10-037
[Figure: SAM layered architecture; a name in "quotes" is a SAM-given software component name; marked components will be replaced, or added/enhanced, using PPDG and Grid tools.
Fabric: tape storage elements, disk storage elements, compute elements, LANs and WANs, code repository.
Connectivity and resource layer: CORBA, UDP, file transfer protocols (ftp, bbftp, rcp, GridFTP), mass storage system protocols (e.g. encp, hpss), batch systems (LSF, FBS, PBS, Condor); authentication and security via GSI, SAM-specific user, group, node and station registration, and a bbftp "cookie".
Collective services: catalog protocols, significant event logger, naming service, database manager, catalog manager, SAM resource management, data mover, job services, storage manager, job manager, cache manager, request manager; SAM components include the "Dataset Editor", "File Storage Server", "Project Master", "Station Master", "Stager" and "Optimiser".
Above these: the resource and services catalog, replica catalog and metadata catalog, the request formulator and planner, and the client applications (Web, Python and Java codes, command line, D0 Framework C++ codes).]
The New DataGrid Middleware (to be delivered October 2001)
Status of Grid middleware
Software and middleware: the evaluation phase is concluded. Basic Grid services (Globus and Condor) are installed in several testbeds: INFN, France, UK, US, ...
In general they need more robustness, reliability and scalability (HEP has hundreds of users, hundreds of jobs, enormous data sets...).
But the DataGrid and US Testbeds 0 are up and running; the problems of multiple CAs, authorization, ... have been solved.
Release 1 of the DataGrid middleware is expected this week. Real experiment applications will use GRID software in production (ALICE, ATLAS, CMS, LHC-B, but also EO, biology, Virgo/LIGO, ...).
DataGrid Testbed 1 in November will include the major Tier1..Tiern centres in Europe and will soon be extended to the US...
Summary on Grid developments
Activities are still mainly concentrated on strategies, architectures and tests.
General adoption of the Globus concept of a layered architecture, and of the Globus basic services:
Core Data Grid services: transport (GridFTP), Replica Management and Replica Catalog;
Resource management (GRAM), information services (MDS);
Security and policy for collaborative groups (PKI).
...but new middleware tools start to appear and are being widely used: Broker, GDMP, Condor-G, ...
In general, good collaboration between EU and US Grid developers: GDMP, Condor-G, improvements in Globus resource management, ...
Progress facilitated by a largely shared Open Source approach.
Experiments are getting on top of Grid activities: CMS requirements for the Grid; the DataGrid WP8 requirement document (100 pages, for the LHC experiments, EO and biology).
Need to plan carefully the next iteration of Grid middleware development (realistic application requirements, results of testbeds, ...).
Grids and Mass Storage
The HENP world has adopted many different MSS solutions: Castor, ADSM/TSM, ENSTORE, Eurostore, HPSS, JASMine.
All offer the same (good) functionality, but differ in: client API; data handling and distribution; hardware support and monitoring.
... and many different database solutions: Objectivity (OO DB), Root (file based), Oracle, ...
Difficult to interoperate... Possible way out:
Adopt a neutral database object description that allows movement between platforms and DBs: e.g. the (Atlas) Data Dictionary & Description Language (DDDL).
Adopt a Grid standard access layer on top of the different native access methods, as GRAM is layered over LSF, PBS, Condor, ...
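The "standard access layer" idea is the classic adapter pattern: clients code against one neutral interface while each MSS keeps its native machinery underneath, the way GRAM fronts LSF, PBS or Condor. A hypothetical Python sketch (the backend classes and return strings are invented placeholders, not real Castor or HPSS APIs):

```python
# Neutral access layer over heterogeneous mass-storage systems.
# Each backend hides its native staging mechanics behind one interface.
class MassStorage:
    def fetch(self, filename):
        raise NotImplementedError

class CastorBackend(MassStorage):
    def fetch(self, filename):
        # a real adapter would drive Castor's native client here
        return f"castor staged {filename}"

class HPSSBackend(MassStorage):
    def fetch(self, filename):
        # a real adapter would drive HPSS's native client here
        return f"hpss staged {filename}"

def grid_fetch(backend: MassStorage, filename: str) -> str:
    # clients code against the neutral layer, never a specific MSS
    return backend.fetch(filename)

print(grid_fetch(CastorBackend(), "run1.dat"))
print(grid_fetch(HPSSBackend(), "run1.dat"))
```

Adding a new MSS then means writing one adapter, not touching every client.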
Grid and OO Simulation & Reconstruction
Geant4 (the OO simulation toolkit) is slowly reaching the HENP experiments: extensive debugging of hadronic models with test beams, geometry descriptions, low-energy e.m. descriptions, ...
It is expected to be adopted soon as the basic production simulation tool by many experiments: Babar, the LHC experiments, ...
CMS has the OSCAR (Geant4) simulation and the ORCA reconstruction fully integrated in its framework COBRA.
Preliminary tests of simulation and reconstruction on the Grid have been done by all the LHC experiments + Babar, D0, ...
Need to plan now a Grid-aware framework to fully profit from the Grid middleware.
Conclusions
Large developments on Grid middleware are ongoing in parallel in the EU and US: workflow and data management, information services, ... All adopt the Open Source approach.
Several experiments are developing their own Job and Meta Data Managers: natural and safe...
...but strong coordination is needed to avoid divergent solutions: the InterGrid organization (EU-US-Asia) for the HENP world; the Global Grid Forum for the general standardization of protocols and APIs.
Grid projects should develop a new world-wide "standard engine" to provide transparent access to resources (computing, storage, network, ...), as the WEB engine did for information in the early '90s.
Since the source codes are available, it is better to improve an existing tool than to start a parallel divergent solution.
Big Science like HENP owes this to the worldwide taxpayers.
HENP Grid infancy ends with the LHC Computing Grid project and Chep2001.