ATLAS Data Challenges
NorduGrid Workshop, Uppsala, November 11-13, 2002
Gilbert Poulard, ATLAS DC coordinator, CERN EP-ATC
Outline
- Introduction
- DC0
- DC1
- Grid activities in ATLAS
- DCn's
- Summary

DC web page: http://atlasinfo.cern.ch/Atlas/GROUPS/SOFTWARE/DC/index.html
Data Challenges: why?
In the context of the CERN Computing Review it was recognized that computing for the LHC is very complex and requires a huge amount of resources. Several recommendations were made, among them:
- Create the LHC Computing Grid (LCG) project
- Ask the experiments to launch a set of Data Challenges to understand and validate
  • their computing model, data model and software suite
  • their technology choices
  • the scalability of the chosen solutions
ATLAS Data Challenges
In ATLAS it was decided:
- To foresee a series of DCs of increasing complexity:
  • start with data that look like real data
  • run the filtering and reconstruction chain
  • store the output data in the ad-hoc persistent repository
  • run the analysis
  • produce physics results
- To study performance issues, persistency technologies, analysis scenarios, ...
- To identify weaknesses, bottlenecks, etc. (but also good points)
- To use both the hardware (prototype), the software and the middleware developed and/or deployed by the LCG project
ATLAS Data Challenges
But it was also acknowledged that today we do not have 'real data': we first need to produce 'simulated data'. So:
- physics event generation
- detector simulation
- pile-up
- reconstruction and analysis
will be part of the first Data Challenges. We also need to satisfy the requirements of the ATLAS communities: HLT, physics groups, ...
ATLAS Data Challenges
In addition it is understood that the results of the DCs should be used to:
- prepare a Computing MoU in due time
- perform a new Physics TDR about one year before the real data taking
The retained schedule was to start with DC0 in late 2001, considered at that time as a preparation exercise, and to continue with one DC per year.
DC0
Was defined as:
- a readiness and continuity test: have the full chain running from the same release
- a preparation for DC1; in particular, one of the main emphases was to put in place the full infrastructure with Objectivity (the baseline technology for persistency at that time)
It should also be noted that there was a strong request from the physicists to be able to reconstruct and analyze the "old" Physics TDR data within the new Athena framework.
DC0: Readiness & continuity tests (December 2001 – June 2002)
"3 lines" for "full" simulation:
1) Full chain with new geometry (as of January 2002):
   Generator -> (Objy) -> Geant3 -> (Zebra -> Objy) -> Athena rec. -> (Objy) -> Analysis
2) Reconstruction of 'Physics TDR' data within Athena:
   (Zebra -> Objy) -> Athena rec. -> (Objy) -> Simple analysis
3) Geant4 robustness test:
   Generator -> (Objy) -> Geant4 -> (Objy)
"1 line" for "fast" simulation:
   Generator -> (Objy) -> Atlfast -> (Objy)
Continuity test: everything from the same release for the full chain (3.0.2).
[Figure: schematic view of the task flow for DC0. Event generation (Pythia 6, H -> 4 mu) writes HepMC to Objectivity/DB; detector simulation (Atlsim/G3) produces Hits/Digits/MCTruth in Zebra; a data-conversion step copies Hits/Digits/MCTruth into Objectivity/DB; Athena reconstruction then produces AOD.]
DC0: Readiness & continuity tests (December 2001 – June 2002)
Took longer than foreseen, due to several reasons:
- introduction of new tools
- change of the baseline for persistency, whose major consequence was to divert some of the manpower
- under-evaluation of the statement "have everything from the same release"
Nevertheless we learnt a lot. DC0 was completed in June 2002.
ATLAS Data Challenges: DC1
Original goals (November 2001):
- Reconstruction & analysis on a large scale: learn about the data model and I/O performance; identify bottlenecks, ...
- Data management: use/evaluate persistency technology (AthenaRoot I/O); learn about distributed analysis
- Need to produce data for HLT & Physics groups
  • HLT TDR has been delayed to mid 2003
  • study performance of Athena and algorithms for use in HLT; high statistics needed
  • scale: a few samples of up to 10^7 events in 10-20 days, O(1000) PCs
  • simulation & pile-up will play an important role
- Introduce the new Event Data Model
- Check Geant4 against Geant3
- Involvement of CERN & outside-CERN sites: a worldwide exercise
- Use GRID middleware as and when possible and appropriate
To cope with different sets of requirements and for technical reasons (including software development and access to resources) it was decided to split DC1 into two phases.
ATLAS DC1
Phase I (April – August 2002):
- primary concern is the delivery of events to the HLT community
- put in place the MC event generation & detector simulation chain
- put in place the distributed Monte Carlo production
Phase II (October 2002 – January 2003):
- provide data with (and without) 'pile-up' for HLT studies
- introduction & testing of the new Event Data Model (EDM)
- evaluation of the new persistency technology
- use of Geant4
- production of data for Physics and Computing Model studies
- testing of the computing model & of distributed analysis using AOD
- wider use of GRID middleware
DC1 preparation
The first major issue was to get the software ready:
- new geometry (compared to the December-DC0 geometry)
- new persistency mechanism
- ... validated ... distributed
- "ATLAS kit" (rpm) to distribute the software
and to put in place the production scripts and tools (monitoring, bookkeeping):
- standard scripts to run the production
- AMI bookkeeping database (Grenoble)
- Magda replica catalog (BNL)
DC1 preparation: software (1)
New geometry (compared to the December-DC0 geometry):
Inner Detector
- beam pipe
- Pixels: services and material updated; more information in hits; better digitization
- SCT tilt angle reversed (to minimize clusters)
- TRT barrel: modular design
- realistic field
Calorimeters
- ACBB: material and readout updates
- ENDE: dead material and readout updated (last-minute update, to be avoided if possible)
- HEND: dead material updated
- FWDC: detailed design
- end-cap calorimeters shifted by 4 cm; cryostats split into barrel and end-cap
Muon system
- AMDB p.03 (more detailed chamber cutouts)
- muon shielding update
[Figure: ATLAS geometry - Inner Detector, Calorimeters, Muon System]
ATLAS/G3: a few numbers at a glance
- 25.5 million distinct volume copies
- 23 thousand different volume objects
- 4,673 different volume types
- a few hundred pile-up events possible
- about 1 million hits per event on average
DC1 preparation: software (2)
New persistency mechanism: AthenaROOT/IO
- used for generated events
- readable by Atlfast and Atlsim
Simulation still uses Zebra.
DC1/Phase I preparation: kit, scripts & tools
Kit: "ATLAS kit" (rpm) to distribute the software
- it installs release 3.2.1 (all binaries) without any need of AFS
- it requires:
  • Linux OS (RedHat 6.2 or RedHat 7.2)
  • CERNLIB 2001 (from the DataGrid repository): cern-0.0-2.i386.rpm (~289 MB)
- it can be downloaded:
  • from a multi-release page (22 rpm's; global size ~250 MB)
  • a "tar" file is also available
Scripts and tools (monitoring, bookkeeping):
- standard scripts to run the production
- AMI bookkeeping database
(A minimal installation sketch follows below.)
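A minimal sketch of how such a kit could be installed on a node without AFS, assuming the rpm's have already been downloaded to a local directory (the directory path is a hypothetical placeholder; the official kit may ship its own install procedure):

    import glob
    import subprocess

    KIT_DIR = "/data/atlas-kit"  # hypothetical directory holding the 22 downloaded kit rpm's
    # CERNLIB 2001 (cern-0.0-2.i386.rpm from the DataGrid repository) is assumed to be installed already.

    for rpm in sorted(glob.glob(f"{KIT_DIR}/*.rpm")):
        # plain "rpm -ivh": no AFS access is needed at any point
        subprocess.run(["rpm", "-ivh", rpm], check=True)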
DC1/Phase I Task Flow
[Figure: Pythia 6 di-jet generation writes HepMC via Athena-Root I/O in partitions of 10^5 events; each Atlsim/Geant3 + filter job reads 5000 generated events and writes the Hits/Digits/MCTruth of the ~450 events that pass the filter to Zebra.]
As an example, for one sample of di-jet events:
- event generation: 1.5 x 10^7 events in 150 partitions
- detector simulation: 3000 jobs
(A back-of-the-envelope cross-check of these numbers follows below.)
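A rough consistency check of the quoted numbers (a sketch only; the 5000-in / ~450-out figures are read off the task-flow diagram above):

    generated_events = 1.5e7
    generation_parts = 150
    events_per_part = generated_events / generation_parts      # 100,000 generated events per partition

    events_per_sim_job = 5000                                   # generated events read by one simulation job
    simulation_jobs = generated_events / events_per_sim_job     # 3000 jobs, as quoted

    filter_efficiency = 450 / 5000                              # ~9% of di-jet events survive the filter
    simulated_events = generated_events * filter_efficiency     # ~1.35 million fully simulated events

    print(f"{events_per_part:.0f} events/partition, "
          f"{simulation_jobs:.0f} simulation jobs, "
          f"{simulated_events:.2e} simulated events")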
DC1 preparation: validation & quality control
We defined two types of validation.
Validation of the sites:
- we processed the same data in the various centres and compared the results, to ensure that the same software was running in all production centres
- we also checked the random number sequences
Validation of the simulation:
- we used both "old" generated data & "new" data
  • validation datasets: di-jets, single particles (μ, e, π) and Higgs -> 4-lepton samples
  • about 10^7 events were reconstructed in June, July and August
- we compared the "old" and the "new" simulated data
DC1 preparation: validation & quality control
This was a very "intensive" activity, with many findings: simulation problems or software installation problems at sites (all eventually solved).
We should increase the number of people involved: it is a "key issue" for the success!
Example: jet distributions (di-jet sample)
[Figure: comparison of the "old" and "new" simulated samples (χ² comparison of the jet distributions). The check revealed the reappearance of an old DICE version in the software installed at one site.]
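A hedged sketch of the kind of automated old-vs-new comparison used for such checks: bin a kinematic quantity (e.g. jet pT) for the two samples and form a chi-squared between the normalised histograms. Variable names and binning are illustrative only, not the actual validation code.

    import numpy as np

    def histogram_chi2(old_values, new_values, bins=50, value_range=(0.0, 500.0)):
        old_counts, edges = np.histogram(old_values, bins=bins, range=value_range)
        new_counts, _ = np.histogram(new_values, bins=bins, range=value_range)

        # scale the new sample to the statistics of the old one
        scale = old_counts.sum() / max(new_counts.sum(), 1)
        new_scaled = new_counts * scale

        # Poisson errors added in quadrature; empty bins are skipped
        variance = old_counts + new_scaled * scale
        mask = variance > 0
        chi2 = np.sum((old_counts[mask] - new_scaled[mask]) ** 2 / variance[mask])
        ndof = int(mask.sum()) - 1
        return chi2, ndof

    # a chi2/ndof far from 1 flags problems such as a wrong simulation version at a site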
Data Samples I
Validation samples (740k events):
- single particles (e, μ, π, γ), jet scans, Higgs events
Single-particle production (30 million events):
- single μ (low pT; pT = 1000 GeV with 2.8 < |η| < 3.2)
- single π (pT = 3, ..., 100 GeV)
- single e and γ: different energies (E = 5, 10, ..., 200, 1000 GeV); fixed η points; η scans (|η| < 2.5); crack scans (1.3 < |η| < 1.8)
- standard beam spread (σz = 5.6 cm); fixed vertex z-positions (z = 0, 4, 10 cm)
Minimum-bias production (1.5 million events):
- different η regions (|η| < 3, 5, 5.5, 7)
Data Samples II
QCD di-jet production (5.2 million events):
- different cuts on ET (hard scattering) during generation
- large production of ET > 11, 17, 25, 55 GeV samples, applying particle-level filters
- large production of ET > 17, 35 GeV samples without filtering, fully simulated within |η| < 5
- smaller production of ET > 70, 140, 280, 560 GeV samples
Physics events requested by various HLT groups (e/γ, μ, Level-1, jet/ETmiss, B-physics, b-jet; 4.4 million events):
- large samples for the b-jet trigger, simulated with the default (3 pixel layers) and the staged (2 pixel layers) layouts
- B-physics (PL) events taken from old TDR tapes
ATLAS DC1/Phase I: July-August 2002
Goals:
- produce the data needed for the HLT TDR
- get as many ATLAS institutes involved as possible: a worldwide collaborative activity
Participation: 39 institutes in 18 countries:
Australia, Austria, Canada, CERN, Czech Republic, Denmark, France, Germany, Israel, Italy, Japan, Norway, Russia, Spain, Sweden, Taiwan, UK, USA
ATLAS DC1 Phase I: July-August 2002
CPU resources used:
- up to 3200 processors (equivalent to 5000 PIII/500)
- 110 kSI95 (~50% of one Regional Centre at LHC startup)
- 71,000 CPU-days
- to simulate one di-jet event: 13,000 SI95*sec
Data volume:
- 30 TB in 35,000 files
- output size for one di-jet event: 2.4 MB
- data kept at the production site for further processing (pile-up, reconstruction, analysis)
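A back-of-the-envelope cross-check of these figures (illustrative only; it applies the di-jet cost to the whole 10^7-event sample, so it is just an order-of-magnitude estimate):

    si95_per_event = 13_000                              # SI95*sec to simulate one di-jet event
    events = 1.0e7                                       # order of the full simulated sample
    total_si95_sec = si95_per_event * events             # 1.3e11 SI95*sec

    farm_power_si95 = 110_000                            # ~110 kSI95 available
    wall_seconds = total_si95_sec / farm_power_si95      # ~1.2e6 s
    print(f"~{wall_seconds / 86400:.0f} days of continuous running on the full farm")
    # roughly two weeks, consistent with "10^7 events in 10-20 days on O(1000) PCs"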
[Figure: contribution to the overall CPU time (%) per country for ATLAS DC1 Phase I; the individual shares ranged from below 0.1% to about 29%.]
ATLAS DC1 Phase I (July-August 2002) in numbers: 3200 CPUs, 110 kSI95, 71,000 CPU-days; 5 x 10^7 events generated, 1 x 10^7 events simulated, 3 x 10^7 single particles; 30 TB in 35,000 files; 39 institutions in 18 countries.
[Figure: maximum number of normalised processors per country in use for ATLAS DC1 Phase I (July-August 2002), ranging from 9 to 900 per country.]
ATLAS DC1 Phase II
- Provide data with and without 'pile-up' for HLT studies:
  • new data samples (a huge amount of requests)
  • pile-up done in Atlsim
  • "byte stream" format to be produced
- Introduction & testing of the new Event Data Model (EDM), including the new Detector Description
- Evaluation of the new persistency technology
- Use of Geant4
- Production of data for Physics and Computing Model studies:
  • both ESD and AOD will be produced from Athena reconstruction
  • we would like to get the 'large scale reconstruction' and the 'data-flow' studies ready, but they are not part of Phase II
- Testing of the computing model & of distributed analysis using AOD
- Wider use of GRID middleware (a test in November)
Pile-up
The first issue is to produce the pile-up data for HLT. We intend to do this now:
- the code is ready
- validation is in progress
- no "obvious" problems
Luminosity Effect Simulation
Aim: study the interesting processes at different luminosities L.
- Separate simulation of physics events & minimum-bias events
- Merging of:
  • the primary stream (physics)
  • the background stream (pile-up)
[Diagram: the primary stream (KINE, HITS) and the background stream (KINE, HITS) are merged at digitization into bunch crossings (DIGI); 1 physics event is overlaid with N(L) minimum-bias events, where N depends on the luminosity L.]
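An illustrative sketch (not the Atlsim implementation) of the merging step described above: the number N of minimum-bias events overlaid on a physics event is drawn per bunch crossing and grows linearly with the luminosity L. The mean of 23 interactions per crossing at design luminosity is an assumed ballpark figure.

    import numpy as np

    rng = np.random.default_rng()

    DESIGN_LUMI = 1.0e34          # cm^-2 s^-1
    MU_AT_DESIGN_LUMI = 23.0      # assumed mean minimum-bias interactions per crossing at design luminosity

    def n_min_bias(luminosity):
        """Draw N(L), the number of minimum-bias events for one bunch crossing."""
        return rng.poisson(MU_AT_DESIGN_LUMI * luminosity / DESIGN_LUMI)

    def merge_crossing(physics_hits, min_bias_stream, luminosity):
        """Merge the primary stream (physics) with N(L) events from the background stream."""
        merged = list(physics_hits)
        for _ in range(n_min_bias(luminosity)):
            merged.extend(next(min_bias_stream))   # each item: the hits of one minimum-bias event
        return merged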
Pile-up features
Different detectors have different memory times, requiring very different numbers of minimum-bias events to be read in:
- silicon detectors, Tile calorimeter: t < 25 ns
- straw tracker (TRT): t < ~40-50 ns
- LAr calorimeters: 100-400 ns
- muon drift tubes: 600 ns
Still, we want the pile-up events to be the same in the different detectors! (A sketch of this idea follows below.)
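A sketch of the point above (an assumed structure, not the production code): every sub-detector draws from the same shared list of overlaid minimum-bias events, but each keeps only the bunch crossings that fall inside its own memory-time window.

    BUNCH_SPACING_NS = 25.0

    # approximate memory times quoted above, in ns
    MEMORY_TIME = {"silicon_tile": 25.0, "trt": 50.0, "lar": 400.0, "mdt": 600.0}

    def crossings_in_window(detector):
        """How many bunch crossings can still leave a signal in this detector."""
        return int(MEMORY_TIME[detector] // BUNCH_SPACING_NS)

    def overlay_for_detector(detector, shared_min_bias):
        """shared_min_bias: list of (crossing_index, event) pairs, common to all detectors.
        Only the events whose crossing falls inside this detector's window are kept."""
        window = crossings_in_window(detector)
        return [event for crossing, event in shared_min_bias if abs(crossing) <= window]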
Pile-up production
Scheduled for October-November 2002. Both low-luminosity (2 x 10^33 cm^-2 s^-1) and high-luminosity (10^34 cm^-2 s^-1) data will be prepared.
Resource estimate:
- 10,000 CPU-days (NCU)
- 70 TB of data
- 100,000 files
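Two quick derived numbers from this estimate (illustrative arithmetic only):

    total_tb, total_files, cpu_days = 70.0, 100_000, 10_000
    print(f"average file size  ~ {total_tb * 1e6 / total_files:.0f} MB")   # about 700 MB per file
    print(f"output per CPU-day ~ {total_tb * 1e3 / cpu_days:.0f} GB")      # about 7 GB per CPU-day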
ATLAS DC1 Phase II (2)
The next step will be to run the reconstruction within the Athena framework:
- most functionality should be there with release 5.0.0, but probably not ready for 'massive' production
- reconstruction ready by the end of November: produce the "byte-stream" data, perform the analysis of the AOD
In parallel, the dedicated code for HLT studies is being prepared (PESA release 3.0.0).
Geant4 tests with a fairly complete geometry should be available by mid-December.
A large-scale Grid test is scheduled for December. The expected "end" date is 31 January 2003. "Massive" reconstruction is not part of DC1 Phase II.
ATLAS DC1 Phase II (3)
Compared to Phase I:
- more automated production
- "pro-active" use of the AMI bookkeeping database to prepare the jobs and possibly to monitor the production
- "pro-active" use of the Magda replica catalog
We intend to run the pile-up production as much as possible where the data already is, but we also have newcomers (countries and institutes). We do not intend to send all the pile-up data to CERN. Scenarios to access the data for reconstruction and analysis are being studied, and the use of Grid tools is 'seriously' considered. (A sketch of how the catalogues could drive job preparation follows below.)
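A hedged illustration of this "pro-active" use of the catalogues. The two query functions are stand-ins for the real AMI and Magda interfaces, whose actual APIs are not shown in this talk.

    def ami_partitions(dataset):
        """Return the logical file names of a dataset from the bookkeeping database (stub)."""
        raise NotImplementedError

    def magda_replica_sites(logical_file_name):
        """Return the list of sites holding a replica of this file (stub)."""
        raise NotImplementedError

    def prepare_pileup_jobs(dataset):
        """Build one pile-up job per input partition, placed where its data already is."""
        jobs = []
        for lfn in ami_partitions(dataset):
            sites = magda_replica_sites(lfn)
            site = sites[0] if sites else "CERN"   # fall back to CERN only if no replica is known
            jobs.append({"input": lfn, "site": site, "task": "pile-up"})
        return jobs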
ATLAS DC1/Phase II: October 2002 - January 2003
Goals:
- produce the data needed for the HLT TDR
- get as many ATLAS institutes involved as possible: a worldwide collaborative activity
Participation: 43 institutes:
Australia, Austria, Canada, CERN, China, Czech Republic, Denmark, France, Germany, Greece, Israel, Italy, Japan, Norway, Russia, Spain, Sweden, Taiwan, UK, USA
ATLAS Planning for Grid Activities
Advantages of using the Grid:
- possibility to do worldwide production in a perfectly coordinated way, using identical software, scripts and databases
- possibility to distribute the workload adequately and automatically, without logging in explicitly to each remote system
- possibility to execute tasks and move files over a distributed computing infrastructure using one single personal certificate (no need to memorize dozens of passwords)
Where we are now:
- several Grid toolkits are on the market
- EDG is probably the most elaborate, but still in development; this development goes much faster with the help of users running real applications
Present Grid Activities
ATLAS already used Grid test-beds in DC1 Phase I: 11 out of 39 sites (~5% of the total production) used Grid middleware:
- NorduGrid (Bergen, Grendel, Ingvar, ISV, NBI, Oslo, Lund, LSCF): all production done on the Grid
- US Grid test-bed (Arlington, LBNL, Oklahoma; more sites will join in the next phase): used for ~10% of the US DC1 production (10% = 900 CPU-days)
.... in addition: the ATLAS-EDG task force
- 40 members from ATLAS and EDG (led by Oxana Smirnova)
- used the EU-DataGrid middleware to rerun 350 DC1 jobs at some Tier1 prototype sites: CERN, CNAF, Lyon, RAL, NIKHEF and Karlsruhe (a CrossGrid site); done in the first half of September
Good results have been achieved:
- a team of hard-working people across Europe
- ATLAS software is packed into relocatable RPMs, distributed and validated
- the DC1 production script is "gridified" and a submission script has been produced (a hedged example of what such a job description can look like is given below)
- jobs are run at a site chosen by the resource broker
Work is still needed (in progress) to reach sufficient stability and ease of use. ATLAS-EDG continues until the end of 2002; an interim report with recommendations is being drafted.
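For illustration, a minimal EDG-style JDL job description of the kind a "gridified" DC1 submission script could generate. The attribute names (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox) are standard EDG JDL; the script and file names are hypothetical placeholders, not the actual DC1 ones.

    [
      Executable    = "dc1.simulation.sh";
      Arguments     = "2000 1145";
      StdOutput     = "dc1.002000.simul.01145.log";
      StdError      = "dc1.002000.simul.01145.err";
      InputSandbox  = {"dc1.simulation.sh"};
      OutputSandbox = {"dc1.002000.simul.01145.log", "dc1.002000.simul.01145.err"};
    ]

The resource broker then matches the job to a site that advertises the required environment (e.g. a validated ATLAS software installation) and runs it there.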
Grid in ATLAS DC1/1
[Figure: map of the Grid sites used in DC1 Phase I - US-ATLAS test-bed, EDG test-bed production and NorduGrid.]
Plans for the near future
In preparation for the reconstruction phase (spring 2003) we foresee further Grid tests in November:
- perform more extensive Grid tests
- extend the EDG to more ATLAS sites, not only in Europe
- test a basic implementation of a worldwide Grid
- test the inter-operability between the different Grid flavours
  • inter-operation = submit a job in region A; the job is run in region B if the input data are in B; the produced data are stored; the job log is made available to the submitter (a toy illustration follows below)
- the EU project DataTag has a Work Package devoted specifically to inter-operation, in collaboration with the US iVDGL project; the results of these projects are expected to be taken up by LCG (GLUE framework)
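A toy illustration of the inter-operation scenario defined above (not any real middleware; all names are made up): the job submitted in region A is routed to the region that holds its input data, the output stays there, and the log comes back to the submitter.

    REPLICA_CATALOG = {"dc1.002000.simul.01145.zebra": "region_B"}   # hypothetical catalogue entry

    def run_in_region(job, region):
        return f"{job['name']} ran in {region}"                      # stand-in for remote execution

    def store_output(job, region):
        pass                                                         # stand-in: register output in that region

    def submit(job, from_region):
        data_region = REPLICA_CATALOG.get(job["input"], from_region)
        log = run_in_region(job, data_region)    # executed where the data already is
        store_output(job, data_region)           # produced data stay in that region
        return log                               # log made available to the submitter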
Plans for the near future (continued)
- ATLAS is collaborating with DataTag-iVDGL on interoperability demonstrations in November. How far we can go we will see in the next week(s), when we discuss with the technical experts.
- The DC1 data will be reconstructed (using Athena) early in 2003; the scope and the way of using Grids for distributed reconstruction will depend on the results of the November/December tests.
- ATLAS is fully committed to LCG and to its Grid middleware selection process; our "early tester" role has been recognized as very useful for EDG, and we are confident that it will be the same for LCG.
Long Term Planning
- Worldwide Grid tests are essential to define in detail the ATLAS distributed Computing Model.
- ATLAS members are already involved in various Grid activities and also take part in inter-operability tests. In the forthcoming DCs this will become an important issue.
- All these tests will be done in close collaboration with the LCG and the different Grid projects.
DC2-3-4-...
DC2: Q3/2003 - Q2/2004
Goals:
- full deployment of EDM & Detector Description
- Geant4 replacing Geant3 (fully?)
- pile-up in Athena
- test the calibration and alignment procedures
- use LCG common software (POOL, ...)
- use GRID middleware widely
- perform large-scale physics analysis
- further tests of the computing model
Scale: as for DC1, ~10^7 fully simulated events
DC3: Q3/2004 - Q2/2005; goals to be defined; scale: 5 x DC2
DC4: Q3/2005 - Q2/2006; goals to be defined; scale: 2 x DC3
Summary (1)
We learnt a lot from DC0 and the preparation of DC1:
- the involvement of all people concerned is a success
- the full production chain has been put in place
- the validation phase was "intensive" and "stressing", but it is a "key issue" in the process
We have in hand the simulated events required for the HLT TDR.
Use of Grid tools looks very promising.
Summary (2)
For DC1/Phase II:
- pile-up preparation is in good shape
- the introduction of the new EDM is a challenge in itself
- release 5 (November 12) should provide the requested functionality
- Grid tests are scheduled for November/December
- Geant4 tests should be ready by mid-December
Summary (3)
After DC1:
- new Grid tests are foreseen in 2003
- ATLAS is fully committed to LCG: as soon as LCG-1 is ready (June 2003) we intend to participate actively in the validation effort
- the dates for the next DCs should be aligned with the deployment of the LCG and Grid software and middleware
Summary (4): thanks to all DC-team members
- A-WP1: Event generation
- A-WP2: Geant3 simulation
- A-WP3: Geant4 simulation
- A-WP4: Pile-up
- A-WP5: Detector response
- A-WP6: Data conversion
- A-WP7: Event filtering
- A-WP8: Reconstruction
- A-WP9: Analysis
- A-WP10: Data management
- A-WP11: Tools
- A-WP12: Teams (production, validation, ...)
- A-WP13: Tier centres
- A-WP14: Fast simulation