LHCb Computing Project Status report to LHCC referees J.Harvey Oct 22, 1998.
LHCb report to LHCC and C-RSG
![Page 1: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/1.jpg)
LHCb report to LHCC and C-RSG
Philippe Charpentier, CERN
on behalf of LHCb
![Page 2: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/2.jpg)
LHCb to LHCC and C-RSG review, PhC 2
Activities in 2009-Q3/Q4
- Core Software
  - Stable versions of Gaudi and LCG-AA
- Applications
  - Stable as of September for real data
  - Fast minor releases to cope with reality of life…
- Monte-Carlo
  - Intensive MC09 simulation (@ 5 TeV)
    - Minimum bias
    - b- and c- inclusive
    - b signal channels
  - Few events in foreseen 2009 configuration (450 GeV)
  - MC09 stripping (2 passes)
    - Trigger stripping
    - Physics stripping
- Real data reconstruction and stripping
  - As of November 20th …
![Page 3: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/3.jpg)
Resource usage
![Page 4: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/4.jpg)
139 sites hit, 4.2 million jobs
- Start in June: start of MC09
![Page 5: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/5.jpg)
Job failure: 15% (17% at Tier1s)
![Page 6: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/6.jpg)
Failure breakdown
![Page 7: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/7.jpg)
Production and user jobs
![Page 8: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/8.jpg)
Jobs at Tier1s
![Page 9: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/9.jpg)
Job types at Tier1s
![Page 10: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/10.jpg)
CPU used (not normalised)
- Average job duration
  - 5.6 hours for all jobs
  - 20 min for user jobs (20%)
  - 6.6 hours for production jobs
![Page 11: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/11.jpg)
- Average job duration
  - 5.6 hours for all jobs
  - 20 min for user jobs
  - 6.6 hours for production jobs
![Page 12: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/12.jpg)
CPU usage (not normalised)
![Page 13: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/13.jpg)
WLCG vs LHCb accounting (unnormalised)
- 13% more in WLCG than in DIRAC (unnormalised)
  - 1.26 Mdays vs 1.1 Mdays
  - Overhead of non reporting jobs + pilot/LCG/batch frameworks
- Average CPU power: 1.5 kSI2k (from WLCG accounting)
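The 13% quoted above can be reproduced from the two totals if the difference is taken relative to the WLCG number; relative to the DIRAC number it comes out nearer 14.5%. The following is a minimal sketch of that arithmetic. The variable and function names are illustrative and do not come from the slides.

```python
# CPU-time totals quoted on the slide, in millions of CPU-days (unnormalised).
wlcg_mdays = 1.26   # WLCG accounting
dirac_mdays = 1.10  # DIRAC (LHCb) accounting


def excess_fraction(larger, smaller, relative_to):
    """Difference between two totals, expressed as a fraction of `relative_to`."""
    return (larger - smaller) / relative_to


# Same difference, two possible reference values.
excess_vs_wlcg = excess_fraction(wlcg_mdays, dirac_mdays, wlcg_mdays)
excess_vs_dirac = excess_fraction(wlcg_mdays, dirac_mdays, dirac_mdays)

print(f"relative to WLCG:  {excess_vs_wlcg:.1%}")
print(f"relative to DIRAC: {excess_vs_dirac:.1%}")
```

Which reference the slide used is not stated; the match with 13% only suggests it was the WLCG total.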
![Page 14: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/14.jpg)
Normalised CPU usage in 2009
- Ramping up of pilot role in summer
- Resource usage decreased since LHC restarted
  - Concentrate on (few) real data
  - Wait for data analysis for continuing MC simulation
- Group 1: production
- Group 2: pilot
- Group 3 & 4: user
- Group 5: lcgadmin
![Page 15: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/15.jpg)
Resource usage
- Note: the CERN figure in the table does not include non-Grid usage
  - From WLCG accounting: 32% is non-Grid at CERN
  - CERN number should then read: 2.18 kHS06.years
- CPU usage within 10% of requests
- Distribution not exactly as expected
  - More non-Tier1 resources available
    - Less MC ran at CERN + Tier1s
  - Almost no real data: fewer resources used at CERN
    - CAF not used as much as expected
| Site | Used (kHS06.years) | Requested (kHS06.years) |
|---|---|---|
| CERN | 1.48 | 8.54 |
| Tier1s | 8.24 | 11.7 |
| Tier2s | 24.44 | 17.12 |
| Total | 34.16 | 37.36 |
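The corrected CERN number and the "within 10%" statement both follow from the table. A minimal sketch of the arithmetic, assuming the 1.48 kHS06.years is the Grid-only part (68%) of CERN usage; the dictionary and constant names are illustrative.

```python
# Values from the table, in kHS06.years.
used = {"CERN": 1.48, "Tier1s": 8.24, "Tier2s": 24.44}
requested = {"CERN": 8.54, "Tier1s": 11.7, "Tier2s": 17.12}

NON_GRID_FRACTION_CERN = 0.32  # from WLCG accounting, as quoted on the slide

# The table holds only the Grid share (1 - 0.32) of CERN usage,
# so the full CERN usage is the Grid number divided by 0.68.
cern_total = used["CERN"] / (1.0 - NON_GRID_FRACTION_CERN)

total_used = sum(used.values())
total_requested = sum(requested.values())
usage_ratio = total_used / total_requested

print(f"CERN including non-Grid: {cern_total:.2f} kHS06.years")
print(f"total used / requested:  {usage_ratio:.1%}")
for site in used:
    print(f"{site}: {used[site] / requested[site]:.0%} of request")
```

The per-site ratios make the redistribution visible: Tier2s delivered more than requested while CERN and the Tier1s were used well below their requests.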
![Page 16: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/16.jpg)
Storage usage
- *) From Castor queries today
- **) From WLCG accounting end December
- ***) Including 420 TB for T1D0 cache
- Sites provided slightly more than the pledges
  - Thanks!
  - At CERN, some disk pools (default, T1D0) were not included in the requests but are in the accounting
| Site | Requested (TB) | Allocated (TB) | Used (TB) |
|---|---|---|---|
| CERN*) TxD1 | 650 | 696.5 | 482.7 |
| CERN*) T1D0 | 70 | 148.5 | irrelevant |
| CERN**) | 720 | 721 | 478 |
| Tier1s**) | 1740***) | 1915 | 633 |
![Page 17: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/17.jpg)
Experience with real data
![Page 18: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/18.jpg)
First experience with real data
- Very low crossing rate
  - Maximum 8 bunches colliding (88 kHz crossing)
  - Very low luminosity
  - Minimum bias trigger rate: from 0.1 to 10 Hz
  - Data taken with single beam and with collisions

No zero-suppression in VELO
Otherwise ~25 GB only!
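The 88 kHz crossing rate is consistent with the number of colliding bunches multiplied by the LHC revolution frequency of roughly 11 kHz. The sketch below assumes the slide used that rounded value; the more precise 11.245 kHz would give about 90 kHz. The function name is illustrative.

```python
# Rounded LHC revolution frequency in kHz. Using the rounded value is an
# assumption about how the slide's 88 kHz was obtained.
REVOLUTION_FREQUENCY_KHZ = 11.0


def crossing_rate_khz(colliding_bunches, f_rev_khz=REVOLUTION_FREQUENCY_KHZ):
    """Each colliding bunch pair crosses once per machine revolution."""
    return colliding_bunches * f_rev_khz


print(crossing_rate_khz(8))           # rounded revolution frequency
print(crossing_rate_khz(8, 11.245))   # more precise revolution frequency
```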
![Page 19: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/19.jpg)
Real data processing
- Iterative process
  - Small changes in reconstruction application
  - Improved alignment
  - In total 7 sets of processing conditions
    - Only last files were all processed 4 times now (twice in 2010)
- Processing submission
  - Automatic job creation and submission after:
    - File is successfully migrated in Castor
    - File is successfully replicated at Tier1
  - If job fails for a reason other than application crash
    - The file is reset as “to be processed”
    - New job is created / submitted (automatic)
  - Processing more efficient at CERN (see later)
    - Eventually after few trials at Tier1, the file is processed at CERN
  - No stripping ;-)
    - DST files distributed to all Tier1s for analysis
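The submission logic described above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the DIRAC implementation: the class and function names, the status strings other than "to be processed", the limit of three Tier1 attempts standing in for "few trials", and the overall attempt cap are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Outcome(Enum):
    DONE = auto()
    APPLICATION_CRASH = auto()
    OTHER_FAILURE = auto()  # e.g. a site or data-access problem


@dataclass
class RawFile:
    name: str
    migrated_to_castor: bool = False
    replicated_at_tier1: bool = False
    status: str = "new"
    tier1_attempts: int = 0
    history: list = field(default_factory=list)


MAX_TIER1_ATTEMPTS = 3  # hypothetical value for "few trials at Tier1"


def is_ready(raw_file):
    """A job is created only after both migration and replication succeeded."""
    return raw_file.migrated_to_castor and raw_file.replicated_at_tier1


def choose_site(raw_file):
    """Try the Tier1 first, fall back to CERN after repeated failures."""
    return "Tier1" if raw_file.tier1_attempts < MAX_TIER1_ATTEMPTS else "CERN"


def process(raw_file, run_job, max_attempts=10):
    """Submit jobs for one file; `run_job(raw_file, site)` returns an Outcome."""
    if not is_ready(raw_file):
        return
    raw_file.status = "to be processed"
    for _ in range(max_attempts):  # cap added here to keep the sketch finite
        site = choose_site(raw_file)
        outcome = run_job(raw_file, site)
        raw_file.history.append((site, outcome))
        if outcome is Outcome.DONE:
            raw_file.status = "processed"
            return
        if outcome is Outcome.APPLICATION_CRASH:
            # Application crashes are not resubmitted automatically.
            raw_file.status = "application crash"
            return
        # Any other failure: the file stays "to be processed" and a new job follows.
        if site == "Tier1":
            raw_file.tier1_attempts += 1
```

Passing the job runner in as a function keeps the retry policy separate from how jobs are actually executed, so the policy can be exercised with a stub runner.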
![Page 20: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/20.jpg)
Reconstruction jobs
![Page 21: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/21.jpg)
Issues with real data
- Castor migration
  - Very low rate: had to change the migration algorithm for more frequent migration (1 hour instead of 8 hours)
- Issue with large files (above 2 GB)
  - Real data files are not ROOT files but are opened by ROOT
  - There was an issue with a compatibility library for slc4-32 bit on slc5 nodes
    - Fixed within a day
- Wrong magnetic field sign
  - Due to different coordinate systems for LHCb and LHC ;-)
  - Fixed within hours
- Data access problem (by protocol, directly from server)
  - Still dCache issue at IN2P3 and NIKHEF
    - dCache experts working on it
  - Moved to copy mode paradigm for reconstruction
  - Still a problem for user jobs: a pain!
    - Sites are regularly banned for analysis
![Page 22: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/22.jpg)
Transfers and job latency
- No problem observed during file transfers
  - Files randomly distributed to Tier1s
  - Will move to distribution by runs (a few hundred files)
  - For 2009, runs were never longer than 4-5 files!
  - Max file size set to 3 GB
- Very good Grid latency
  - Time between submission and jobs starting to run
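The difference between the 2009 random distribution and the planned distribution by runs can be illustrated as follows. This is a sketch only: the site names are placeholders, and the round-robin choice of site per run is an assumption, since the slides do not say how a run's Tier1 is selected.

```python
import random

TIER1S = ["T1-A", "T1-B", "T1-C"]  # placeholder site names


def distribute_randomly(files, sites, rng):
    """2009 behaviour: every file independently goes to a random Tier1."""
    return {name: rng.choice(sites) for name, _run in files}


def distribute_by_run(files, sites):
    """Planned behaviour: all files of a run go to the same Tier1.

    Runs are assigned round-robin in order of first appearance (assumption).
    """
    site_of_run = {}
    placement = {}
    for name, run in files:
        if run not in site_of_run:
            site_of_run[run] = sites[len(site_of_run) % len(sites)]
        placement[name] = site_of_run[run]
    return placement


# Four runs of five files each, as (file name, run number) pairs.
files = [(f"run{run}_file{i}", run) for run in range(4) for i in range(5)]

random_placement = distribute_randomly(files, TIER1S, random.Random(0))
by_run_placement = distribute_by_run(files, TIER1S)
```

With runs of only 4-5 files, as in 2009, the two schemes differ little in practice; grouping matters once a run spans hundreds of files.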
![Page 23: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/23.jpg)
Resource requests
![Page 24: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/24.jpg)
Resource requests for 2010-12
- 2010 running
  - The requests were made in April-June 2009
    - No additional resources expected
    - Try to fit within those requests
  - Running scenario for LHCb
    - March: 35% LHC efficiency @ 100 Hz
    - April-May-June: 50% LHC efficiency @ 1 kHz on average
    - July-August-September-half October: 50% @ 2 kHz
    - No Heavy Ion run for LHCb
    - This corresponds to 6.1 × 10^6 seconds @ 2 kHz
    - The 2009-10 request accounted precisely by chance for 6.1 × 10^6 seconds (0.5+5.6)
    - Therefore we use 6.1 × 10^6 seconds for 2010 at 2 kHz trigger rate
- 2011 running
  - Use the recommendation of MB
    - March: 35% LHC efficiency @ 2 kHz
    - April to mid-October: 50% LHC efficiency @ 2 kHz
    - Total running time: 8.9 × 10^6 seconds
- 2012: no run
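Converting the 2010 scenario into equivalent seconds at 2 kHz means weighting each period's live time by its trigger rate relative to 2 kHz. The sketch below shows that weighting. The day counts are an assumption (full calendar months, no technical stops); with them the result is about 6.6 × 10^6 seconds, somewhat above the quoted 6.1 × 10^6, so the slide evidently assumed fewer running days than this. The names are illustrative.

```python
SECONDS_PER_DAY = 86_400
REFERENCE_RATE_HZ = 2_000

# (label, calendar days, LHC efficiency, trigger rate in Hz)
# Day counts assume full calendar months with no stops: an assumption,
# the slides do not give the number of running days.
SCENARIO_2010 = [
    ("March", 31, 0.35, 100),
    ("April-June", 30 + 31 + 30, 0.50, 1_000),
    ("July to mid-October", 31 + 31 + 30 + 15, 0.50, 2_000),
]


def equivalent_seconds(periods, reference_rate_hz=REFERENCE_RATE_HZ):
    """Live time of each period, scaled by its trigger rate over the reference."""
    total = 0.0
    for _label, days, efficiency, rate_hz in periods:
        live_seconds = days * SECONDS_PER_DAY * efficiency
        total += live_seconds * rate_hz / reference_rate_hz
    return total


print(f"{equivalent_seconds(SCENARIO_2010):.2e} s at 2 kHz")
```

The low-rate months contribute little: March at 100 Hz adds only about 5 × 10^4 equivalent seconds, so the total is dominated by the 2 kHz period.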
![Page 25: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/25.jpg)
Resource requirements for 2010-12
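| kHEP06*year | 2010 (old) Integrated | 2010 (confirmed) Integrated | 2010 (confirmed) Power | 2011 (prelim.) Integrated | 2011 (prelim.) Power | 2012 (very prelim.) Integrated | 2012 (very prelim.) Power |
|---|---|---|---|---|---|---|---|
| CERN T0 | | 5.70 | | 4.50 | | 4.07 | |
| CERN CAF - Analysis/Calib/Alignment | | 11.56 | | 11.91 | | 15.46 | |
| CERN T0 + T1 | 17.19 | 17.26 | 21 | 16.41 | 20 | 19.53 | 24 |
| Tier1s | 32.99 | 33.84 | 41 | 57.49 | 70 | 65.55 | 80 |
| Tier2s | 31.74 | 31.74 | 46 | 31.48 | 46 | 31.48 | 46 |
| Total | 81.91 | 82.83 | 108 | 105.38 | 136 | 116.57 | 150 |

| Disk (TB) | 2010 (old) | 2010 (confirmed) | 2011 (prelim.) | 2012 (very prelim.) |
|---|---|---|---|---|
| CERN T0 + T1 | 1290 | 1270 | 1685 | 1776 |
| Tier1s | 3290 | 3350 | 4215 | 4458 |
| Tier2s | 20 | 20 | 20 | 20 |
| Total | 4600 | 4640 | 5920 | 6254 |

| Tape (TB) | 2010 (old) | 2010 (confirmed) | 2011 (prelim.) | 2012 (very prelim.) |
|---|---|---|---|---|
| CERN T0 + T1 | 1500 | 1462 | 3020 | 3723 |
| Tier1s | 1800 | 1922 | 4271 | 5605 |
| Total | 3300 | 3384 | 7290 | 9328 |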
![Page 26: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/26.jpg)
Comments on resources
- Very uncertain and fluctuating running plans!
- Depending on LHC running, MC requests may be different
  - Minimum bias, charm physics, b physics…
- Only after (at least) one year of experience can we see how running analysis on the Grid works
  - Analysis at CERN?
  - Analysis at Tier3s?
  - Reliability for analysis?
- 2012 is still very uncertain
  - No LHC running
  - Will the MC requests be the same as in previous years?
  - How many reprocessings?
    - Currently assume 1 full reprocessing of 2010 and 2 of 2011
![Page 27: LHCb report to LHCC and C-RSG](https://reader036.fdocuments.in/reader036/viewer/2022062500/56815a5e550346895dc79290/html5/thumbnails/27.jpg)
Conclusions
- Real data in 2009
  - So few that it didn’t impact resource usage
  - Was extremely valuable for
    - Setting procedures
    - Start understanding the detector
      - Already very promising performance after a few days
      - π0 peak, Λ and K0 reconstruction…
    - Exercising automatic processes
- 2010
  - Still expect somewhat chaotic running
    - Frequent changes in LHC settings, LHCb trigger commissioning
  - No change in LHCb resource requests w.r.t. June 2009
- 2011
  - More precise requests with experience from 2010
- 2012
  - Still very preliminary, but small increase only compared to 2011