ATLAS-specific functionality in Ganga

- Requirements for distributed analysis
- ATLAS considerations
- DIAL submission from Ganga
- Graphical interfaces
- Next steps
K. Harrison, CERN, 26th May 2005
Requirements for distributed analysis
- General requirements for distributed analysis are similar for ATLAS and LHCb:
  • Deploy and run software based on the Gaudi/Athena framework and developed in terms of CMT packages
  • Locate data for analysis
  • Run on a variety of backends
  • Manage job output
- The experiments differ in how they choose to do the above, hence the need for experiment-specific functionality in Ganga
ATLAS considerations (1)
- Installation of ATLAS software is performed using pacman
  • A binary installation is 8-10 GB, so installation is not to be taken lightly
  • The pacman tool is also being tested by LHCb
- The ATLAS data-management system (Don Quijote) is still under development
  • It must manage data distributed over different Grids; today: LCG, NorduGrid, Grid 3 (US)
  • ATLAS has a metadata catalogue (AMI), similar to the Bookkeeping Database of LHCb, and various other catalogues
ATLAS considerations (2)
- Two systems for distributed data processing have been developed:
  • The production system, used for collaboration-wide managed activities; it has similarities with DIRAC in LHCb
  • The ATLAS Distributed Analysis (ADA) system, based on DIAL
- There are overlaps between the two systems, and the optimal way of using them is still being understood
- In Ganga, the two systems can be treated as different backends
ATLAS production system

- In order to handle the task of ATLAS DC2, an automated production system was designed
- The ATLAS production system consists of 4 components:
  • The production database, which contains abstract job definitions
  • The Windmill supervisor, which reads job definitions from the production database and presents them to the different Grid executors in an easy-to-parse XML format
  • The executors, one for each Grid flavour, which receive the job definitions in XML format and convert them to the job-description language of that particular Grid
  • Don Quijote, the ATLAS data-management system, which moves files from their temporary output locations to their final destination on some Storage Element, and registers the files in the Replica Location Service of that Grid
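The supervisor/executor split can be illustrated with a minimal sketch: an executor receives a job definition as XML and converts it to the description language of one Grid flavour. The XML layout, field names and JDL attributes below are illustrative assumptions, not the real Windmill schema.

```python
# Toy executor: convert an XML job definition (as handed out by a
# supervisor) into an LCG-style JDL string. All names are assumptions.
import xml.etree.ElementTree as ET

JOB_XML = """
<job id="1234">
  <transformation>AtlasG4</transformation>
  <inputfile>dc2.003007.evgen.root</inputfile>
  <outputfile>dc2.003007.simul.root</outputfile>
</job>
"""

def to_lcg_jdl(xml_text):
    """Translate a toy XML job definition into a toy JDL description."""
    job = ET.fromstring(xml_text)
    return "\n".join([
        'Executable = "%s";' % job.findtext("transformation"),
        'InputData = {"%s"};' % job.findtext("inputfile"),
        'OutputSandbox = {"%s"};' % job.findtext("outputfile"),
    ])

print(to_lcg_jdl(JOB_XML))
```

A second executor for another Grid flavour would parse the same XML but emit a different description (e.g. an RSL string for NorduGrid), which is what makes the common XML format useful.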
ADA model

[Diagram: GUI and command-line clients (ROOT, Python) communicate via AJDL with high-level services for cataloguing, job submission and monitoring (DIAL AS, ATPROD AS, ARDA AS, plus catalogues), which in turn drive workload-management systems (LSF, Condor, ATPROD, gLite WMS) through sh, SQL and gLite interfaces.]
Ganga in ATLAS
- Ganga is the user interface for job-related operations: configuration, submission, splitting, merging, monitoring, output retrieval, etc.

[Diagram: Ganga sits above a set of backends — LCG, Grid 3, NorduGrid, LSF, PBS, BQS, Condor, DIAL, the Production System and others — and connects to the ATLAS Metadata Interface (AMI), the DIAL catalogues and repositories, and other catalogues and repositories.]
Characteristics of DIAL
- For Ganga 4, we have first concentrated on job submission to the DIAL backend
- DIAL makes computing facilities available via web services
  • Server and client functionality is implemented in C++
  • The PyDial package has been developed to provide Python bindings to the C++ classes, and functions to simplify their use
- A job in DIAL is defined in terms of:
  • Application: specifies the software to be run
  • Task: specifies configuration information (e.g. job options)
  • Dataset: specifies the data to be processed
  • The above map to DIAL objects, each with an XML representation
- Information for applications, tasks and datasets is catalogued
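The Application/Task/Dataset decomposition, with each object carrying an XML representation, can be sketched as follows. The class and field names are illustrative assumptions, not the PyDial API.

```python
# Minimal sketch of DIAL-style job-description objects, each able to
# emit an XML representation. Names are assumptions, not PyDial.
import xml.etree.ElementTree as ET

class DialObject:
    tag = "object"

    def __init__(self, **fields):
        self.fields = fields

    def to_xml(self):
        """Serialise the object's fields as child elements of its tag."""
        elem = ET.Element(self.tag)
        for name, value in self.fields.items():
            ET.SubElement(elem, name).text = str(value)
        return ET.tostring(elem, encoding="unicode")

class Application(DialObject):   # software to be run
    tag = "application"

class Task(DialObject):          # configuration, e.g. job options
    tag = "task"

class Dataset(DialObject):       # data to be processed
    tag = "dataset"

app = Application(name="atlasopt")
print(app.to_xml())  # <application><name>atlasopt</name></application>
```

An XML form of this kind is what allows the objects to be both catalogued and stored in a job repository.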
DIAL submission from Ganga (1)
- To enable DIAL submission from Ganga, we basically needed to implement the following:
  • A plug-in that uses the ADA/DIAL job description
  • A plug-in that interacts with the DIAL backend
  • A mapping between the two plug-ins
- To be able to store DIAL objects in the Ganga job repository, object-to-XML converters were needed
- In Ganga, we create instances of the DIAL catalogues
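The mapping between an application plug-in and a backend plug-in can be pictured as a small registry that, for a given pair, returns the handler that knows how to submit that combination. This is a sketch of the idea only; the function and plug-in names are assumptions, not the actual Ganga 4 plug-in API.

```python
# Sketch of a plug-in mapping: pair an application plug-in with a
# backend plug-in and look up the submission handler for that pair.
# All names here are illustrative assumptions.
_handlers = {}

def register(application, backend, handler):
    """Associate a submission handler with an (application, backend) pair."""
    _handlers[(application, backend)] = handler

def get_handler(application, backend):
    """Return the handler registered for this pair (KeyError if none)."""
    return _handlers[(application, backend)]

def submit_ada_to_dial(job_name):
    # In a real implementation this would build the DIAL
    # Application/Task/Dataset XML and pass it to a remote DIAL service.
    return "submitted %s to DIAL" % job_name

register("ADA", "DIAL", submit_ada_to_dial)
handler = get_handler("ADA", "DIAL")
print(handler("job-1"))
```

Keeping the job description and the backend interaction in separate plug-ins, joined only by such a mapping, is what lets the same description later target other backends.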
DIAL submission from Ganga (2)
- From Ganga the following can be done:
  • Query the DIAL catalogues
  • Submit jobs to (remote) DIAL services
  • Retrieve job output (also partial results)
  • Display output histograms via PyROOT
  • Keep track of jobs from one session to the next
- This works well
DIAL submission from Ganga (3)
- The current syntax is slightly different, but the idea is to be able to do the following:

>>> j = AdaJob()
>>> j.application = "atlasopt"
>>> j.task = "atlas_release jo.py"
>>> j.dataset = "hma.dc2.003007.digit.A1.z_ee.aod-904.10files"
>>> j.backend.schedulerURL = "lxgate21.cern.ch.20014"
>>> j.submit()
>>> print j.status
>>> print j.result
>>> j.copyResult( "myDirectory" )

- The commands shown are given at the Python prompt; the same commands can be used in a script
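Used from a script, the same commands might be wrapped in a function. The `AdaJob` interface is stubbed below so the sketch runs standalone; in a real session `AdaJob` would come from the Ganga environment, and the stub's behaviour is an assumption.

```python
# Sketch of scripted submission. AdaJob is a stand-in stub so the
# example is self-contained; the real class is provided by Ganga.
class _Backend:
    def __init__(self):
        self.schedulerURL = ""

class AdaJob:                      # stand-in, not the real Ganga AdaJob
    def __init__(self):
        self.backend = _Backend()
        self.status = "new"

    def submit(self):
        self.status = "submitted"

def submit_job(scheduler_url, dataset):
    """Configure and submit an ADA job, mirroring the interactive session."""
    j = AdaJob()
    j.application = "atlasopt"
    j.task = "atlas_release jo.py"
    j.dataset = dataset
    j.backend.schedulerURL = scheduler_url
    j.submit()
    return j

job = submit_job("lxgate21.cern.ch.20014",
                 "hma.dc2.003007.digit.A1.z_ee.aod-904.10files")
print(job.status)
```

A script like this could then poll the job status and call the output-retrieval commands once the job completes.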
Ganga 4: graphical interfaces
- The priority for Ganga 4 has been to have the infrastructure in place, and to have functionality available at the command line
  • The Ganga 4 public release is planned for June, without a GUI
- Graphical tools for job configuration and monitoring have been developed for ADA (Alvin Tan)
  • For jobs using the ADA/DIAL job description, we expect these to be reused directly in Ganga
  • They provide a starting point for developing further graphical tools for Ganga

[Screenshots: graphical job builder; job monitoring]
Next steps (1)
- Priorities need to be discussed, but there are a number of possibilities:
  • Enable Ganga submission to the production system; Python tools for entering requests in the production database have already been written (Frederic Brochu)
  • Enable direct submission from Ganga to one or more Grid flavours
  • Connect the ADA graphical tools to Ganga
  • Understand the interaction with the data-management system
  • Get a release out, so that people can try it
Next steps (2)
- An ATLAS review of distributed analysis is tentatively scheduled for July
  • We must have a release for this
- We have a summer student at the Cavendish Laboratory for 6-8 weeks from the end of June (Ruth Dixon del Tufo)
  • Will test the usability of Ganga, and help with improvements