David Adams
ATLAS
ATLAS Distributed Analysis
David Adams, BNL
March 18, 2004
ATLAS Software Workshop, Grid session
ATLAS Distributed Analysis USATLAS Grid March 18, 2004 2
Contents
Definitions
Architecture
AJDL
Analysis service
Catalog services
Strategy
ARDA
More information
Definitions
Analysis (not necessarily distributed)
• Supports the manipulation and extraction of summary data (e.g. histograms) from any type of event data
– AOD, ESD, …
• Supports user-level production of event data
– e.g. MC generation, simulation and reconstruction
Distributed analysis
• Extends the extraction and production support to include distributed users, data and processing
• Natural extension of non-distributed analysis
• Easily invoked from any ATLAS analysis environment
– including Python, ROOT, command line
– easily ported to any future environment (e.g. JAS)
Architecture
[Figure: layered architecture diagram]
Client tools
• GANGA GUI
• GANGA command-line client
• GANGA task editor
• GANGA job submission
• GANGA job monitor
• ROOT command-line client
High-level service interfaces (AJDL)
High-level services
• Analysis services: DIAL, GANGA, ATPROD, DIRAC, ARDA
• Catalog services
• Dataset splitter
• Dataset merger
Middleware service interfaces
Middleware services
• CE, WMS, file catalog, etc.
AJDL
Acronym: Analysis Job Definition Language
Used to define the interface for high-level services
Components include:
• Application – executable to process data
• Task – user configuration of application
• Dataset – describes input and output data
• Job – application, task and input dataset → output dataset
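The four components above can be pictured as plain data types. The following is a minimal Python sketch, not the AJDL definition itself: every class field, the `run` method, the `make_histograms` helper and the sample values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Application:
    """Executable that processes data (fields are hypothetical)."""
    name: str
    version: str


@dataclass(frozen=True)
class Task:
    """User configuration of an application, e.g. scripts or code."""
    application_name: str
    files: Tuple[str, ...]


@dataclass(frozen=True)
class Dataset:
    """Description of input or output data."""
    dataset_id: str
    content: str  # e.g. "AOD", "ESD" or "histograms"


@dataclass
class Job:
    """Application + task + input dataset, yielding an output dataset."""
    application: Application
    task: Task
    input_dataset: Dataset
    output_dataset: Optional[Dataset] = None

    def run(self, process: Callable[[Application, Task, Dataset], Dataset]) -> Dataset:
        # The processing step is injected so the sketch stays independent
        # of any real execution backend.
        self.output_dataset = process(self.application, self.task, self.input_dataset)
        return self.output_dataset


def make_histograms(app: Application, task: Task, ds: Dataset) -> Dataset:
    """Stand-in for real processing: it only derives an output description."""
    return Dataset(dataset_id=ds.dataset_id + ".hist", content="histograms")


app = Application(name="athena", version="0.0")
task = Task(application_name=app.name, files=("analysis.py",))
job = Job(app, task, Dataset(dataset_id="ds-001", content="AOD"))
result = job.run(make_histograms)
```

Keeping the job as the only mutable object reflects the idea that a job records which output dataset came from which application, task and input.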
AJDL (cont)
Components must be extensible
• Use types
– E.g. HistogramDataset, EventDataset, AtlasEventDataset
• Generic interface
– For use by (shared) generic high-level services
• Experiment-specific interface
– Used by application
Nature of components
• Persistent representation of data (e.g. XML)
• Classes to interpret this data (C++, Python, Java, …)
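One way to read "persistent representation plus classes to interpret it" is a typed class hierarchy that round-trips through XML. The sketch below assumes an invented XML layout (element name equal to the type name, an `id` attribute, `file` children, an `events` attribute); none of this is the actual AJDL schema, and `REGISTRY` and `parse_dataset` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


class Dataset:
    """Generic interface, the part a shared high-level service would use."""
    type_name = "Dataset"

    def __init__(self, dataset_id, files=()):
        self.dataset_id = dataset_id
        self.files = list(files)

    def attributes(self):
        return {"id": self.dataset_id}

    def to_xml(self):
        # The element tag carries the type so a reader can pick the class.
        root = ET.Element(self.type_name, self.attributes())
        for name in self.files:
            ET.SubElement(root, "file", {"name": name})
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_element(cls, elem):
        return cls(elem.get("id"), [f.get("name") for f in elem.findall("file")])


class EventDataset(Dataset):
    """Extended type: adds an event count to the generic description."""
    type_name = "EventDataset"

    def __init__(self, dataset_id, files=(), event_count=0):
        super().__init__(dataset_id, files)
        self.event_count = event_count

    def attributes(self):
        attrs = super().attributes()
        attrs["events"] = str(self.event_count)
        return attrs

    @classmethod
    def from_element(cls, elem):
        files = [f.get("name") for f in elem.findall("file")]
        return cls(elem.get("id"), files, int(elem.get("events", "0")))


# Hypothetical registry mapping the persistent type name to its class.
REGISTRY = {cls.type_name: cls for cls in (Dataset, EventDataset)}


def parse_dataset(xml_text):
    elem = ET.fromstring(xml_text)
    return REGISTRY[elem.tag].from_element(elem)


original = EventDataset("ds-001", ["events_1.root", "events_2.root"], event_count=500)
restored = parse_dataset(original.to_xml())
```

A generic service only needs the `Dataset` methods, while an application can rely on the extra `event_count` of the derived type.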
Analysis service
Example scenario for processing a high-level job
• Input is application, task, dataset and job configuration
• Map input virtual dataset to concrete representation
• Split into sub-datasets
• Create sub-job for each sub-dataset
• Stage files for each sub-job
• Locate and possibly install application
• Build (e.g. compile) task
• Run sub-jobs
• Gather and merge results (output datasets)
• Output is dataset and job performance description
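The split, run and merge steps of this scenario can be sketched in a few lines of Python. This is an illustration under strong simplifications: a "file" is just a list of numbers standing in for event data, the task is a function that histograms one file, sub-jobs run sequentially in-process, and staging, installation and building are left out. All function names are hypothetical.

```python
from collections import Counter


def split_dataset(files, n_sub):
    """Split a list of files into at most n_sub sub-datasets (round-robin)."""
    n_sub = max(1, min(n_sub, len(files)))
    return [files[i::n_sub] for i in range(n_sub)]


def run_sub_job(task, sub_dataset):
    """Apply the task to each file of one sub-dataset; return a partial result."""
    partial = Counter()
    for events in sub_dataset:
        partial.update(task(events))
    return partial


def merge_results(partials):
    """Gather the partial histograms into one output histogram."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def process_job(task, files, n_sub=2):
    """Return the merged result and the number of sub-jobs that were run."""
    sub_datasets = split_dataset(files, n_sub)
    partials = [run_sub_job(task, sub) for sub in sub_datasets]
    return merge_results(partials), len(sub_datasets)


def histogram_task(events):
    """Toy task: count values in bins of width 10."""
    return Counter(value // 10 for value in events)


files = [[1, 12, 25], [3, 14], [27, 8]]
merged, n_sub_jobs = process_job(histogram_task, files, n_sub=2)
```

Because histogram counts add, merging the partial results gives the same answer as processing the whole dataset in one job, which is what makes this kind of splitting safe.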
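[Figure: job flow between the analysis framework (e.g. ROOT) and the analysis service, through the ADA/DIAL user interface]
Entities: Analysis Framework, Analysis Service, Application (e.g. athena; exe, pkgs), Task (scripts, code), Dataset 1, Dataset 2, Job 1, Job 2, Result
Numbered steps in the figure:
1. Locate
2. Select
3. Create or select
4. Select
5. submit(app,tsk,ds)
6. splitDataset
7. Create
9. Create (one per Result)
10. Gather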
Catalog services
Repositories
• Store AJDL components indexed by ID
Selection (metadata) catalogs
• Help user to select input data, task, …
VDC – Virtual Dataset Catalog
• Prescriptions for creating datasets
– Application, task, input dataset
DRC – Dataset Replica Catalog
• Mapping between virtual and concrete datasets
Job catalog
• Detailed provenance for concrete datasets
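The interplay between the VDC and the DRC can be illustrated with two in-memory dictionaries. This is only a sketch: the class and method names, the `resolve` helper and the ID strings are hypothetical, and a real catalog would sit behind a database rather than a Python dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prescription:
    """Recipe for creating a dataset: application, task and input dataset IDs."""
    application_id: str
    task_id: str
    input_dataset_id: str


class VirtualDatasetCatalog:
    """Maps a virtual dataset ID to the prescription that creates it."""

    def __init__(self):
        self._prescriptions = {}

    def register(self, virtual_id, prescription):
        self._prescriptions[virtual_id] = prescription

    def prescription(self, virtual_id):
        return self._prescriptions.get(virtual_id)


class DatasetReplicaCatalog:
    """Maps a virtual dataset ID to the IDs of its concrete replicas."""

    def __init__(self):
        self._replicas = {}

    def add_replica(self, virtual_id, concrete_id):
        self._replicas.setdefault(virtual_id, []).append(concrete_id)

    def replicas(self, virtual_id):
        return list(self._replicas.get(virtual_id, []))


def resolve(virtual_id, vdc, drc):
    """Prefer an existing concrete replica; otherwise return the recipe."""
    concrete = drc.replicas(virtual_id)
    if concrete:
        return ("replica", concrete[0])
    prescription = vdc.prescription(virtual_id)
    if prescription is None:
        raise KeyError(virtual_id)
    return ("create", prescription)


vdc = VirtualDatasetCatalog()
drc = DatasetReplicaCatalog()
recipe = Prescription("app-1", "task-1", "ds-in")
vdc.register("vds-1", recipe)

first = resolve("vds-1", vdc, drc)   # no replica yet, so the recipe is returned
drc.add_replica("vds-1", "cds-42")
second = resolve("vds-1", vdc, drc)  # a concrete replica now exists
```

Looking in the replica catalog first means a dataset is only regenerated from its prescription when no concrete copy is already known.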
Strategy
Define AJDL
• Components, nature, interfaces
Implement catalogs
• Tables in AMI
• Programmatic interface
– (C++ with Python binding)
Analysis services
• Start with existing services or analogs
– DIAL, ATCOM, Capone, GANGA, …
• Different implementations for different strategies
• At least one using ARDA middleware
Strategy (cont)
User interface
• Programmatic interface to high-level services and AJDL components
– C++, Python and eventually Java bindings
• GANGA will provide the Python binding and use it to deliver a GUI
– Extensible design: client tools plug into Python bus
Middleware
• Whatever works to begin with
• ARDA services will be used in that context
– Would like to see better integration with other middleware efforts
Strategy (cont)
Web service infrastructure
• Short term: use independent persistent services
• Mid-term: follow ARDA strategy
– GAS – grid access service
• Long term: follow standards such as WSRF
– Dataset becomes a resource?
ARDA
ARDA begins April 1
Two areas in LCG:
• Middleware development (1st report delivered)
• Integration team
Other participants
• Implementation team(s) from each experiment
– Use ARDA middleware to provide analysis system
• Tool providers: POOL, SEAL, ROOT, GANGA
• Users in each experiment to try out implementations
• Regional centers deploy services and analysis systems
• GAG to advise
More information
ADA home page:
• http://www.usatlas.bnl.gov/ADA
• This page has links to other projects