
Atlas Computing
Alessandro De Salvo <[email protected]>

Terzo workshop sul calcolo dell’INFN, 27-5-2004

Outline

• Computing model
• Activities in 2004
• Conclusions


Atlas Data Rates per year

                                  Rate (Hz)   sec/year   Events/year   Size (MB)   Total (TB)
Raw Data                             200      1.00E+07    2.00E+09       1.6          3200
ESD (Event Summary Data)             200      1.00E+07    2.00E+09       0.5          1000
General ESD                          180      1.00E+07    1.80E+09       0.5           900
General AOD (Analysis Object Data)   180      1.00E+07    1.80E+09       0.1           180
General TAG                          180      1.00E+07    1.80E+09       0.001           2
Calibration                                                                            40
MC Raw                                                    1.00E+08       2             200
ESD Sim                                                   1.00E+08       0.5            50
AOD Sim                                                   1.00E+08       0.1            10
TAG Sim                                                   1.00E+08       0.001           0
Tuple                                                                    0.01

Nominal year: 10^7 s. Accelerator efficiency: 50%.
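The totals in the table follow directly from rate × live time × event size. A minimal Python cross-check (variable and key names are mine):

# Cross-check of the yearly volumes:
# Total (TB) = rate (Hz) * live seconds/year * event size (MB) / 1e6
SEC_PER_YEAR = 1.0e7  # nominal year at 50% accelerator efficiency

streams = {
    # stream: (rate_hz, size_mb)
    "Raw Data":    (200, 1.6),
    "ESD":         (200, 0.5),
    "General ESD": (180, 0.5),
    "General AOD": (180, 0.1),
    "General TAG": (180, 0.001),
}

for name, (rate, size_mb) in streams.items():
    events = rate * SEC_PER_YEAR
    print(f"{name:12s} {events:.2e} events/year {events * size_mb / 1e6:8.1f} TB")
# Raw Data: 2.00e+09 events/year, 3200.0 TB -- matching the table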


Processing times

Reconstruction
• Time/event for reconstruction now: 60 kSI2k s
• We could recover a factor 4:
  • factor 2 from running only one default algorithm
  • factor 2 from optimization
• Foreseen reference: 15 kSI2k s/event

Simulation
• Time/event for simulation now: 400 kSI2k s
• We could recover a factor 4:
  • factor 2 from optimization (work already in progress)
  • factor 2 on average from the mixture of different physics processes (and rapidity ranges)
• Foreseen reference: 100 kSI2k s/event

Number of simulated events needed: 10^8 events/year
• Generate samples about 3-6 times the size of their streamed AOD samples
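As a rough illustration of what these reference times imply (my arithmetic, not from the slides), simulating 10^8 events per year at the foreseen cost corresponds to a few hundred kSI2k of sustained CPU:

# Back-of-the-envelope CPU sizing for the yearly simulation load,
# using the foreseen reference cost quoted above.
SIM_COST_KSI2K_S = 100   # foreseen kSI2k*s per simulated event
N_SIM_EVENTS = 1e8       # simulated events needed per year
CALENDAR_YEAR_S = 3.15e7

total = N_SIM_EVENTS * SIM_COST_KSI2K_S  # kSI2k*s
print(f"{total:.1e} kSI2k*s ~ {total / CALENDAR_YEAR_S:.0f} kSI2k sustained")
# -> ~317 kSI2k if spread over a full calendar year, before any
#    scheduling-efficiency factors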


Production/analysis model

Central analysis
• Central production of tuples and TAG collections from ESD
• Estimated data reduction to 10% of the full AOD
  • About 720 GB/group/year; 0.5 kSI2k·s per event (estimate), quasi-real-time: 9 MSI2k

User analysis
• Tuples/streams analysis
• New selections
• Each user will perform 1/N of the non-central MC simulation load
  • analysis of WG samples and AOD
  • private simulations
• Total requirement: 4.7 kSI2k and 1.5/1.5 TB disk/tape
• Assume this is all done at Tier-2s

DC2 will provide very useful information in this domain


Computing centers in Atlas

Tiers are defined by capacity and level of service.

Tier-0 (CERN)
• Holds a copy of all raw data on tape
• Copies in real time all raw data to the Tier-1s (the second copy is also useful for later reprocessing)
• Keeps calibration data on disk
• Runs first-pass calibration/alignment and reconstruction
• Distributes ESDs to the external Tier-1s (1/3 to each one of 6 Tier-1s)

Tier-1s (at least 6)
• Regional centers
• Keep on disk 1/3 of the ESDs and a full copy of the AODs and TAGs
• Keep on tape 1/6 of the raw data
• Keep on disk 1/3 of the currently simulated ESDs and on tape 1/6 of the previous versions
• Provide facilities for physics-group-controlled ESD analysis
• Calibration and/or reprocessing of real data (once per year)

Tier-2s (about 4 per Tier-1)
• Keep on disk a full copy of the TAGs and roughly one full AOD copy per four Tier-2s
• Keep on disk a small selected sample of ESDs
• Provide facilities (CPU and disk space) for user analysis and user simulation (~25 users/Tier-2)
• Run central simulation


Tier-1 Requirements

External T1: storage requirement

                        Disk (TB)   Tape (TB)   Fraction
General ESD (curr.)        429         150        1/3
General ESD (prev.)        214         150        1/6
AOD                        257         180        1/1
TAG                          3           2        1/1
RAW Data (sample)            6         533        1/6
RAW Sim                    0.0        33.3        1/6
ESD Sim (curr.)           23.8         8.3        1/3
ESD Sim (prev.)           11.9         8.3        1/6
AOD Sim                     14          10        1/1
TAG Sim                      0           0        1/1
User Data (20 groups)      171         120        1/3
Total                     1130        1195

(Source: R. Jones, Atlas Software Workshop)

CPU: processing for physics groups 1760 kSI2k; reconstruction 588 kSI2k.
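The disk column can be reproduced from the yearly volumes in the data-rates table, the fractions above, and the 70% disk-usage efficiency quoted on the “Tier 0/1/2 sizes” slide below; a sketch under that assumption (the helper name is mine):

# Tier-1 disk to install = yearly volume (TB) * fraction / 0.70
DISK_EFFICIENCY = 0.70

def t1_disk_tb(total_tb, fraction):
    return total_tb * fraction / DISK_EFFICIENCY

print(round(t1_disk_tb(900, 1 / 3)))    # General ESD (curr.): 429
print(round(t1_disk_tb(900, 1 / 6)))    # General ESD (prev.): 214
print(round(t1_disk_tb(180, 1)))        # AOD: 257
print(round(t1_disk_tb(50, 1 / 3), 1))  # ESD Sim (curr.): 23.8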


Tier-2 Requirements

(Source: R. Jones, Atlas Software Workshop)

External T2: storage requirement

                                 Disk (TB)   Tape (TB)   Fraction
General ESD (curr.)                  26          0         1/50
General ESD (prev.)                   0         18         1/50
AOD                                  64          0         1/4
TAG                                   3          0         1/1
ESD Sim (curr.)                     1.4          0         1/50
ESD Sim (prev.)                       0          1         1/50
AOD Sim                              14         10         1/1
User Data (600/6/4 = 25 users)       37         26
Total                               146         57

CPU: simulation 21 kSI2k; reconstruction 2 kSI2k; users 176 kSI2k. Total: 199 kSI2k.


Tier 0/1/2 sizes

 

                  CERN T0+T1/2   All T1 (6)   All T2 (24)   Total
Auto tape (PB)         4.4           7.2           1.4       12.9
Shelf tape (PB)        3.2           0.0           0.0        3.2
Disk (PB)              1.9           6.8           3.5       12.2
CPU (MSI2k)            4.8          14.2           4.8       23.8

Efficiencies (LCG numbers, Atlas sw workshop May 2004 – R. Jones):
• Scheduled CPU activity: 85% efficient
• Chaotic CPU activity: 60% efficient
• Disk usage: 70% efficient
• Tape: assumed 100% efficient



Atlas Computing System

(Source: R. Jones, Atlas Software Workshop)

[Diagram: the Atlas computing system hierarchy. Event Builder → Event Filter (~159 kSI2k) → Tier-0 (~5 MSI2k) → regional Tier-1 centres (UK: RAL; US; France; Italy, feeding RM1/MI/NA/LNF) → Tier-2 centres (~200 kSI2k each) → workstations/desktops. Link-rate labels: ~PB/s off the detector, 10 GB/s, 450 Mb/s, ~300 MB/s per Tier-1 per experiment, 622 Mb/s to the Tier-2s, and 100-1000 MB/s between the physics data cache and the workstations. Capacity labels: ~7.7 MSI2k, ~2 PB/year and ~9 PB/year per Tier-1 (no simulation at the Tier-1s); ~200 TB/year per Tier-2; PC (2004) = ~1 kSpecInt2k.]

• Some data for calibration and monitoring go to the institutes; calibrations flow back
• Each Tier-2 has ~25 physicists working on one or more channels
• Each Tier-2 should have the full AOD, TAG and relevant physics-group summary data
• Tier-2s do the bulk of the simulation


Atlas computing in 2004

“Collaboration” activities
• Data Challenge 2
  • May-August 2004
  • Real test of the computing model for the Computing TDR (end 2004)
  • Simulation, reconstruction, analysis & calibration
• Combined test-beam activities
  • Combined test-beam operation concurrent with DC2 and using the same tools

“Local” activities
• Single-muon simulation (Rome1, Naples)
• Tau studies (Milan)
• Higgs production (LNF)
• Other ad-hoc productions


Goals in 2004

DC2/test-beam
• Computing model studies
• Pile-up digitization in Athena
• Deployment of the complete Event Data Model and the Detector Description
• Simulation of the full Atlas detector and of the 2004 Combined Test Beam
• Test of the calibration and alignment procedures
• Full use of Geant4, POOL and other LCG applications
• Wide use of the GRID middleware and tools
• Large-scale physics analysis
• Run as much of the production as possible on the GRID
  • Test the integration of multiple GRIDs

“Local” activities
• Run local, ad-hoc productions using the LCG tools


DC2 timescale
• September 03: Release 7 — put in place, understand & validate: Geant4; POOL; LCG applications; Event Data Model; digitization; pile-up; byte-stream; conversion of DC1 data to POOL; large-scale persistency tests and reconstruction
• Mid-November 03: pre-production release — testing and validation; run test-production
• March 17th 04: Release 8 (production) — testing and validation; continuous testing of s/w components; improvements to the Distribution/Validation Kit; start final validation; intensive test of the “Production System”
• May 17th 04: event generation ready
• June 23rd 04: simulation ready
• July 15th 04: data preparation and data transfer; reconstruction ready
• August 1st 04: Tier-0 exercise

Then: physics and computing model studies; analysis (distributed); reprocessing; alignment & calibration.

(Slide from Gilbert Poulard)


DC2 resources

Process                      Events   Months   CPU (kSI2k)   Data (TB)   At CERN (TB)   Off-site (TB)

Phase I (May-June-July)
Simulation                    10^7      2         1000           20            4              16
RDO                           10^7      2          100           20            4              16
Pile-up & digitization        10^7      2          100           30           30              24
Event mixing & byte-stream    10^7      2        (small)         20           20               0
Total Phase I                 10^7      2         1200           90           58              56

Phase II (> July)
Reconstruction Tier-0         10^7     0.5         600            5            5              10
Reconstruction Tier-1         10^7      2          600            5            0               5

Total                         10^7                               100           63              71
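The “Total Phase I” row is just the column sums; a quick check (my arithmetic, taking the “(small)” CPU entry as zero):

# Column sums for Phase I: CPU (kSI2k), data, at-CERN and off-site TB.
phase1 = [
    (1000, 20,  4, 16),  # Simulation
    ( 100, 20,  4, 16),  # RDO
    ( 100, 30, 30, 24),  # Pile-up & digitization
    (   0, 20, 20,  0),  # Event mixing & byte-stream
]
print([sum(col) for col in zip(*phase1)])  # [1200, 90, 58, 56]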


Tiers in DC2

Country          “Tier-1”   Sites   Grid        kSI2k (ATLAS DC)
Australia                           NG               12
Austria                             LCG               7
Canada           TRIUMF       7     LCG             331
CERN             CERN         1     LCG             700
China                               LCG              30
Czech Republic                      LCG              25
France           CCIN2P3      1     LCG            ~140
Germany          GridKa       3     LCG              90
Greece                              LCG              10
Israel                              LCG              23
Italy            CNAF         5     LCG             200
Japan            Tokyo        1     LCG             127
Netherlands      NIKHEF       1     LCG              75
NorduGrid        NG          30     NG              380
Poland                              LCG              80
Russia                              LCG             ~70
Slovakia                            LCG
Slovenia                            NG
Spain            PIC          4     LCG              50
Switzerland                         LCG              18
Taiwan           ASTW         1     LCG              78
UK               RAL          8     LCG           ~1000
US               BNL         28     Grid3/LCG     ~1000

More than 23 countries involved


DC2 tools

Installation tools
• Atlas software distribution kit
• Validation suite

Production system
• Atlas production system interfaced to LCG, US Grid, NorduGrid and legacy systems (batch systems)
• Tools:
  • Production management
  • Data management
  • Cataloguing
  • Bookkeeping
  • Job submission

GRID distributed analysis
• ARDA domain: test services and implementations


Software installation

Software installation and configuration via PACMAN
• Full use of the Atlas Code Management Tool (CMT)
• Relocatable, multi-release distribution
• No root privileges needed to install

GRID-enabled installation
• Grid installation via submission of a job to the destination sites
• Software validation tools, integrated with the GRID installation procedure
• A site is marked as validated after the installed software is checked with the validation tools

Distribution format
• Pacman packages (tarballs)

Kit creation
• Building scripts (Deployment package)
• Built in about 3 hours, after the release is built

Kit requirements
• RedHat 7.3
• >= 512 MB of RAM
• Approx. 4 GB of disk space, plus 2 GB during the installation phase, for a full installation of a single release

Kit installation
pacman -get http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/pacman/cache:7.5.0/AtlasRelease

Documentation (building, installing and using)
http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/sit/Distribution
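A sketch of the install-then-validate flow described above; only the pacman command and the cache URL come from the slide, while the validation command name is a stand-in for the real validation suite.

# Payload of a Grid installation job sent to a destination site:
# install the kit with pacman, then run the validation checks; the
# site is marked as validated only if the checks pass.
import subprocess

RELEASE = ("http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/"
           "pacman/cache:7.5.0/AtlasRelease")

def install_and_validate() -> bool:
    if subprocess.run(["pacman", "-get", RELEASE]).returncode != 0:
        return False
    # "./validation-suite" is a placeholder, not the real tool's name
    return subprocess.run(["./validation-suite"]).returncode == 0

print("validated" if install_and_validate() else "not validated")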


Atlas Production System components

Production database
• Oracle based
• Holds the definitions of the job transformations
• Holds the relevant data on the jobs’ life cycle

Supervisor (Windmill)
• Consumes jobs from the production database
• Dispatches the work to the executors
• Collects info on the job life cycle
• Interacts with the DMS for data registration and movement among the systems

Executors
• One for each Grid flavour and legacy system:
  • LCG (Lexor)
  • NorduGrid (Dulcinea)
  • US Grid (Capone)
  • LSF
• Communicate with the supervisor
• Execute the jobs on the specific subsystems:
  • Flavour-neutral job definitions are specialized for the specific needs
  • Submission to the GRID/legacy system
  • Access to GRID-flavour-specific tools

Data Management System (Don Quijote)
• Global cataloguing system
• Allows global data management
• Common interface on top of the system-specific facilities
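A minimal sketch of the supervisor/executor split described above; the class and method names are illustrative, not the real Windmill or executor APIs (in the real system the components exchange messages over Jabber and the jobs live in the Oracle production database):

# Illustrative sketch of the supervisor/executor pattern
# (hypothetical classes, not the actual production-system API).
from abc import ABC, abstractmethod

class Executor(ABC):
    """One executor per Grid flavour or legacy system."""
    @abstractmethod
    def submit(self, job: dict) -> str:
        """Specialize the flavour-neutral job definition and submit it."""

class LCGExecutor(Executor):   # the real LCG executor is called Lexor
    def submit(self, job: dict) -> str:
        # ...translate the neutral definition into LCG terms and submit...
        return f"lcg-{job['id']}"

class LSFExecutor(Executor):   # legacy batch flavour
    def submit(self, job: dict) -> str:
        return f"lsf-{job['id']}"

class Supervisor:
    """Consumes jobs from the production DB and dispatches them."""
    def __init__(self, executor: Executor):
        self.executor = executor

    def run_once(self, jobs):
        for job in jobs:                    # stand-in for the Oracle DB
            job["handle"] = self.executor.submit(job)
            job["status"] = "submitted"     # life-cycle info collected back

jobs = [{"id": 1}, {"id": 2}]
Supervisor(LCGExecutor()).run_once(jobs)
print(jobs)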


Atlas Production System architectureAtlas Production System architecture

RBChimera

RB

Task(Dataset)

PartitionTransf.

Definition

TaskTransf.

Definition+ physics signature

Transformation infosRelease versionsignature

Supervisor 1 Supervisor 2 Supervisor 4

US Grid LCG NG Local Batch

Task = [job]*Task = [job]*Dataset = [partition]*Dataset = [partition]* JOB DESCRIPTION

Humanintervention

DataManagement

System

US GridExecuter

LCGExecuter

NGExecuter

LSFExecuter

Supervisor 3

JobRun Info

LocationHint

(Task)

LocationHint(Job)

Job(Partition)

Jabber Jabber Jabber Jabber


DC2 status

DC2 first phase started May 3rd
• Test of the production system
• Start of the event generation/simulation tests

Full production should start next week
• Full use of the 3 GRIDs and of the legacy systems

DC2 jobs will be monitored via GridICE and an ad-hoc monitoring system, interfaced to the production DB and to the production systems


Atlas Computing & INFN (1)

Coordinators & managers
• D. Barberis
  • Genoa; initially a member of the Computing Steering Group as Inner Detector software coordinator, now ATLAS Computing Coordinator
• G. Cataldi
  • Lecce; new coordinator of Moore, the OO muon-reconstruction program
• S. Falciano
  • Rome1; in charge of TDAQ/LVL2
• A. Farilla
  • Rome3; initially in charge of Moore and scientific secretary of SCASI, now Muon Reconstruction Coordinator and software coordinator for the Combined Test Beam
• L. Luminari
  • Rome1; INFN representative in the ICB and contact person for computing-model activities in Italy
• A. Nisati
  • Rome1; representing the LVL1 simulation, and Chair of the TDAQ Institute Board
• L. Perini
  • Milan; chair, ATLAS Grid Co-convener, ATLAS representative in various LCG and EGEE bodies
• G. Polesello
  • Pavia; ATLAS Physics Coordinator
• A. Rimoldi
  • Pavia; ATLAS Simulation Coordinator and member of the Software Project Management Board
• V. Vercesi
  • Pavia; PESA Coordinator and member of the Computing Management Board


Atlas Computing & INFN (2)

Atlas INFN sites LCG-compliant for DC2
• Tier-1
  • CNAF (G. Negri)
• Tier-2
  • Frascati (M. Ferrer)
  • Milan (L. Perini, D. Rebatto, S. Resconi, L. Vaccarossa)
  • Naples (G. Carlino, A. Doria, L. Merola)
  • Rome1 (A. De Salvo, A. Di Mattia, L. Luminari)

Activities
• Development of the LCG interface to the Atlas Production Tool
  • F. Conventi, A. De Salvo, A. Doria, D. Rebatto, G. Negri, L. Vaccarossa
• Participation in DC2 using the GRID middleware (May - July 2004)
• Local productions with GRID tools
• Atlas VO management (A. De Salvo)
• Atlas code distribution (A. De Salvo)
  • Atlas code distribution model (PACMAN based) fully deployed
  • The current installation system/procedure easily allows the Atlas software to coexist with other experiments’ environments
• Atlas distribution kit validation (A. De Salvo)
• Transformations for DC2 (A. De Salvo)


Conclusions

The first real test of the Atlas computing model is starting
• DC2 tests started at the beginning of May
• “Real” production starts in June
• This will provide important information for the Computing TDR

Very intensive use of the GRIDs
• Atlas Production System interfaced to LCG, NG and US Grid (Grid3)
• Global data management system

Getting closer to the real experiment computing model