CPass0/CPass1 on LHC12f/e/d/c Updated at 10:00 on 28/08
C. Zampolli
Ever tried. Ever failed. No matter. Try Again. Fail again. Fail better.
(S. Beckett)
+To be followed up
LHC12f being processed smoothly
LHC12e done, waiting for LHC12d
LHC12d being processed with Rev-22, failures in T0 and TRD
LHC12c manual update ongoing
pA MC test on LHC11f2 ongoing (merging will be submitted today)
pA data run 187338, done at CPass0, but no info on MonALISA, not appearing at CPass1 snapshot
pA pilot run: the CTP will have an Aliases file with everything defined as kCalibBarrel (to my understanding)
8/28/12 C. Zampolli
+To be followed up – II
Number of kCalibBarrel triggers published in logbook (waiting for news, promised by the end of August)
Validation codes from the detectors
  Still missing TPC and MeanVertex
  Will help for short runs failing in TRD due to statistics
Replicating the OCDB entries: new functionality implemented by Raffaele
  How should it be implemented in the calib code? Member of the AliCDBManager? Otherwise everybody will have to change their code…
Reprocessing of old runs
  Trigger classes defined, Aliases file to be created, then downscaling
Better selection of runs? Difficult in my opinion
+Some diagnostics
During the last 2 months of CPass0/CPass1 processing, (quite) some manual intervention was needed:
  Fixing steering macros/scripts
  Restarting CPass0 and/or CPass1
  Triggering CPass0 and/or CPass1 manually
Main reasons (to my memory… I might forget something):
  Wrong AddTaskTPCcalib.C committed to the release by mistake during synchronization
  Merging of syswatch trees not properly tested and consuming too much memory
  TPC: wrong OCDB update in makeOCDB.C macro for CPass1
  Wrong TPC gain threshold used for validation
+Some diagnostics – II
Reprocessing of LHC12d due to a bug in the TRD reconstruction
Re-reprocessing of LHC12d due to a problem with TRD code in Rev-23
Some LHC12e runs to be reprocessed after a fix in the aliases files due to “miscommunication” (mis = missing + wrong) between TRD, RC, Trigger, calibration
CPass1 manual triggering for runs failed in T0 at CPass0 (1 done, 20 to be done)
CPass1 manual triggering for a run for which CPass0 was merged manually (Raphaelle)
CNAF disk full ALICE::CERN::T0 issue
+Two more comments…
As already said in July, no modification in AliRoot that may affect the calibration should be requested to be ported to the Release if not properly tested in the calibration train on the grid
  I cannot know whether changes may affect the calibration; the detector experts should
Since apparently it is not enough to show updates at the usual meetings (Monday Offline, Tuesday RC, Thursday Offline Calibration Readiness and Friday Calibration), I think it would be important that:
  One person representing all the detectors taking part in CPass0/CPass1 should always be present at the calibration meetings
  If the directly responsible person(s) is not available, someone representing the corresponding detector should participate anyway, to propagate the information discussed there.
+How to decide when to process a run
Currently, we process runs marked as good (DAQ flag), with duration > 5 min, GRP ok, and with Beam
Could this be improved? Hard to say… Not on the offline side at least…
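The selection criteria listed here amount to a simple predicate on each run. The sketch below is only illustrative: the `Run` record, its field names, and the example run numbers are hypothetical and do not reflect the logbook or MonALISA schema; only the four criteria and the 5-minute threshold come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical run record; field names are assumptions, not a real schema."""
    number: int
    daq_good: bool        # run marked as good (DAQ flag)
    duration_min: float   # run duration in minutes
    grp_ok: bool          # GRP entry is ok
    with_beam: bool       # run taken with beam


MIN_DURATION_MIN = 5.0  # threshold quoted on the slide


def should_process(run: Run) -> bool:
    """Return True only if the run passes all four selection criteria."""
    return (
        run.daq_good
        and run.duration_min > MIN_DURATION_MIN
        and run.grp_ok
        and run.with_beam
    )


# Made-up runs, used only to exercise the predicate.
runs = [
    Run(1, True, 12.0, True, True),    # passes every cut
    Run(2, True, 4.0, True, True),     # too short
    Run(3, True, 30.0, True, False),   # no beam
]
selected = [r.number for r in runs if should_process(r)]
print(selected)
```

A run passing these cuts can still fail later for lack of statistics, which is why improving the selection on the offline side alone is hard.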
+LHC12f
+Summary table – on 28/08 at ~ 10:00, LHC12f
69 in logbook
  Filters used: LHC12f, PHYSICS, Good Run, GRP ok at least one of [SDD, TPC, TRD, TOF, T0], with Beam
CPass0:
  Snapshot: 69
  Reco+CalibTrain: 69
  Merging+OCDB: 69, 1 of which running
CPass1:
  Snapshot: 49
  Reco+CalibTrain: 49
  Merging+OCDB: 44
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12f
COSMICS: 0 failure expected
EMCAL/PHOS/MUON: 13 failure expected
No triggers: 0 failure expected (too short run)
EE/EV/Expired: 0 memory issue during the merging (under investigation)
Running: 1
Others (detectors): 5 (but all short runs)
Successful: 55, but 1 (187338) has no logs in MonALISA
55/(55+5) = 91.7% success rate
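The success rate is computed only over runs that were expected to succeed: expected failures (COSMICS, EMCAL/PHOS/MUON, no triggers) and runs still running are left out of the denominator. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def success_rate(successful: int, detector_failures: int, other_failures: int = 0) -> float:
    """Percentage of successful runs among those that were expected to succeed."""
    total = successful + detector_failures + other_failures
    if total == 0:
        raise ValueError("no runs to evaluate")
    return 100.0 * successful / total


# LHC12f CPass0 numbers from the table above: 55 successful, 5 detector failures.
print(f"{success_rate(55, 5):.1f}%")  # prints 91.7%
```

The optional third argument covers unexpected non-detector failures, such as a run that hit a memory issue during merging.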
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12f
Failure reason
Run Number
TRD (5)
186694
186816
186855
187147
187148
All failures due to too short runs (number of events/tracks in terms of events used by TRD calibration):
12 min, 4874 events / 43825 tracks
6 min, 11111 events / 114733 tracks
7 min, 11242 events / 107505 tracks
7 min, 11089 events / 138408 tracks
5 min, 11138 events / 154946 tracks
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12f
Failure reason Run Number
EMCAL/MUON/PHOS runs (13)
186805
186834
186926
186962
186980
186981
187046
187064
187081
187117
187133
187193
187198
+Summary table – on 28/08 at ~ 10:00, CPass1 – LHC12f
Of the 55 successful runs:
  49 at CPass1 reco+CalibTrain
  44 at CPass1 merging+OCDB
+LHC12e
+Summary table – on 28/08 at ~ 10:00, LHC12e
27 in logbook
  Filters used: LHC12e, PHYSICS, Good Run, GRP ok at least one of [SDD, TPC, TRD, TOF, T0]
CPass0, completed:
  Snapshot: 27
  Reco+CalibTrain: 27
  Merging+OCDB: 27, 21 useful, 14 ok
CPass1, completed:
  Snapshot: 15
  Reco+CalibTrain: 15
  Merging+OCDB: 15
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12e
COSMICS: 0 failure expected
EMCAL/PHOS/MUON: 6 failure expected
No triggers: 0 failure expected (too short run)
EE/EV/Expired: 0 memory issue during the merging (under investigation)
Running: 0
Others (detectors): 10: 3 recovered so far for TRD, 7 remaining
Successful: 11, became 14
11/(11+10) = 52.4% success rate, became 14/(14+7) = 66.7%
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12e
Failure reason Run Number
TRD (8)
186428 (*)
186429 (*)
186453 (*)
186456 (**)
186459 (**)
186507 (*)
186508 (**)
186598 (*)
Failure reason Run Number
TRD + T0 (1) 186600 (**)
T0 (1) 186601
TRD:
  (*) suffered from a missing class (CSPI8WU-S-NOPF-ALL) in the configuration during data taking
    Fixed manually using CINT8WU-S-NOPF-ALL
    CPass0/1 should be re-run
  (**) suffered from low statistics (186459 has CSPI8WU-S-NOPF-ALL, but with zero triggers)
T0: suffers from high background, but limits will be increased
  Re-running will be ok (but CPass1 should be triggered manually if Rev < Rev-23 will be used)
14 min, events
14 min, events
14 min, events
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12e – REPROCESSING
Failure reason Run Number
TRD (5)
186428
186429
186453
186507
186598
Failure reason Run Number
T0 (1) 186601
Failed (statistics) / Ok
CPass1 re-run! Failing again in CPass1 as expected, but T0 experts already fixed the OCDB
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12e
Failure reason Run Number
EMCAL/MUON/PHOS runs (6)
186383
186405
186425
186448
186503
186589
+Summary table – on 28/08 at ~ 10:00, CPass1 – LHC12e
Of the 14 successful runs, 15 at CPass1 (one more since 186601 was inserted manually!):
  15 at the snapshot
  15 at CPass1 reco+CalibTrain
  15 at CPass1 merging+OCDB
+Actions
COMPLETED
Since the period was too short, the manual update should be done together with LHC12d
  Waiting for this period to be completed
+LHC12d
+Summary table – on 28/08 at ~ 10:00, LHC12d
224 in logbook
  Filters used: LHC12d, PHYSICS, Good Run, GRP ok at least one of [SDD, TPC, TRD, TOF, T0]
CPass0 completed:
  Snapshot: 220
  Reco+CalibTrain: 220
  Merging+OCDB: 220, 176 needed, 147 ok
CPass1 completed:
  Snapshot: 148 (1 more than CPass0, triggered manually after CPass0)
  Reco+CalibTrain: 148
  Merging+OCDB: 148, 148 needed
+Difference between logbook and snapshot in MonALISA
In logbook, but not in MonALISA:
184370 (EMCAL), 184645 (EMCAL), 185345 (ACORDE trigger), 185347 (ACORDE trigger), 185467 still in the migration process, checking with offline
In MonALISA but not in the logbook: 185190 (short run, the quality flag was changed)
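A comparison of this kind is a pair of set differences between the two run lists. The sketch below assumes the run numbers have already been exported from each source into plain Python lists (the export step is not shown); the lists are a small illustrative subset, not the full LHC12d run lists, and the function name is hypothetical.

```python
def compare_run_lists(logbook, monalisa):
    """Return (only_in_logbook, only_in_monalisa) as sorted lists."""
    logbook_set, monalisa_set = set(logbook), set(monalisa)
    return sorted(logbook_set - monalisa_set), sorted(monalisa_set - logbook_set)


# Illustrative subset only; the real lists hold a few hundred runs.
logbook = [184370, 184645, 184673, 185460]
monalisa = [184673, 185190, 185460]

only_logbook, only_monalisa = compare_run_lists(logbook, monalisa)
print("In logbook, but not in MonALISA:", only_logbook)
print("In MonALISA, but not in logbook:", only_monalisa)
```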
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12d
COSMICS: 9 failure expected
EMCAL/PHOS/MUON: 33 failure expected
No triggers: 2 failure expected (too short run)
EE/EV/Expired: 1 memory issue during the merging, but then merged manually
Running: 0
Others (detectors): 28
Successful: 147
147/(147+28+1) = 83.5% success rate
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12d
Failure reason Run Number
TPC Gain Threshold (1) 185460 (also TRD)
  16 recovered rerunning with looser constraints for validation (run 185460 not retried, since it failed anyway in TRD)
Failure reason Run Number
COSMICS (9)
184880
184882
184885
184886
184889
184910
184914
184918
186264
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12d
Failure reason Run Number
T0 (20): hardware problem, fixed now
185687
185692
185695
185697
185698
185699
185700
185701
185734
185735
185738
185756
185757
185764
185765
185768
185775
185776
185778
185784
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12d
Failure reason Run Number
EMCAL/MUON/PHOS runs (33)
184443
184481
184663
184664
184709
184716
184719
184762
184780
185024
185148
185186
185341
185456
185559
185560
185562
185631
185647
185677
185731
185934
185994
185998
186036
186062
186063
186159
186192
186224
186225
186232
186316
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12d
Failure reason Run Number
No triggers (2)
183915
185190
TRD (8)
184190
185133
185378
185460 (also TPC)
185915
185916
186319
186320
EV (1) 184673 (merged manually)
+Summary table – on 28/08 at ~ 10:00, CPass1 – LHC12d
Of the 147 successful runs:
  148 at CPass1 reco+CalibTrain
    1 more than CPass0 since CPass0 was merged manually and the objects were uploaded manually in the OCDB (184673)
  148 at CPass1 merging+OCDB…
    …of which 147 successful (ignore the red TPC color)…
    …1 failed in TRD (184145)…
Different statistics for CPass0 and CPass1: 480/480 chunks at CPass0, 472/480 chunks at CPass1
+TRD issue
Due to a problem in the TRD reconstruction, some wrong OCDB entries were produced at CPass0; it is not possible to get the correct ones without re-running CPass0
  Some manual OCDB update is needed (after LHC12d is fully processed, ongoing for completed runs): DONE
  Then CPass0/CPass1 should be re-run with a Rev > Rev-18
    Rev-23 (the latest) was used
    Changes in the TRD code made the calibration not work properly
    More tests, new re-running with Rev-22
Will the failed runs be recovered? Waiting for the experts’ reply, still not known
+Actions
CPass0 completed
  20 runs failed at CPass0 due to T0 hardware problems
  CPass1 should be triggered manually for these runs
    To be done after reprocessing, since now it would be useless (they all contain TRD)
Re-running with Rev-22… ongoing
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12d – Failures after reprocessing
Failure reason Run Number
TRD (1) 184145
185378
185460
185916
184145: 12 min, 11490 events / 208981 tracks, had not failed before
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12d – Failures after reprocessing
Failure reason Run Number
T0 (20): hardware problem, fixed now
185687
185692
185695
185697
185698
185699
185700
185701
185734
185735
185738
185756
185757
185764
185765
185768
185775
185776
185778
185784
+LHC12c
+Summary table – on 28/08 at ~ 10:00, LHC12c
205 in logbook
  Filters used: LHC12c, PHYSICS, Good Run, GRP ok at least one of [SDD, TPC, TRD, TOF, T0]
  Do not coincide with those in MonALISA, since runs were queued manually for CPass0
CPass0 completed:
  Snapshot: 208, 1 should be ignored (179444)
  Reco+CalibTrain: 207
  Merging+OCDB: 207, 109 needed, 93 ok
CPass1 completed:
  Snapshot: 93
  Reco+CalibTrain: 93
  Merging+OCDB: 93
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12c
COSMICS: 37 failure expected
EMCAL/PHOS/MUON: 58 failure expected
No triggers: 3 failure expected (too short, or not the right trigger configuration)
EE/EV/Expired: 0
Others (detectors): 16
Successful: 93
93/(93+16) = 85.3% success rate
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12c
Failure reason Run Number
COSMICS (37)
179941
179943
179944
179946
179948
179950
179951
179960
180164
180979
180980
180981
180983
180984
180985
180986
180987
180988
180991
180992
182749
182750
179658
179712
179713
179717
179723
179725
179730
179736
179740
179742
179743
179746
179747
179758
179766
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12c
Failure reason Run Number
EMCAL/MUON/PHOS runs (58)
179595
179603
179604
179685
179687
180552
180559
180616
180643
180644
180692
180704
181026
181040
181046
181328
181339
181344
181360
181546
181558
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12c
Failure reason Run Number
EMCAL/MUON/PHOS runs (58)
181580
181625
181631
181954
181956
181984
182003
182094
182100
182103
182195
182198
182200
182226
182316
182403
182405
182410
182449
182451
182452
182470
182471
182475
182477
182316
182403
182405
182410
182449
182451
182452
182470
182471
182475
182477
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12c
Failure reason Run Number
EMCAL/MUON/PHOS runs (58)
182499
182502
182504
182609
182610
182612
182640
182641
182681
182712
182717
182721
+Summary table – on 28/08 at ~ 10:00, CPass0 – LHC12c
Failure reason Run Number
No triggers (3)
180934
181609
182639
Failure reason Run Number
TRD (7)
180716 (*)
180717 (*)
182325 (*)
182509 (*)
182508 (*)
182513 (*)
182724 (*)
Failure reason Run Number
TPC+TRD (9)
181617 (**)
181618 (**)
181619 (**)
181620 (**)
181652 (**)
181694 (**)
181698 (**)
181701 (**)
181703 (**)
(*) Low statistics, recoverable
(*) Low statistics, not recoverable
(**) No SSD/SDD: number of contributors to Vertex Track = 0, TRD calibration failing, TRD fix in place; what about TPC?
+Summary table – on 28/08 at ~ 10:00, CPass1 – LHC12c
Of the 93 successful runs:
  93 at CPass1 reco+CalibTrain
  93 at CPass1 merging+OCDB…
    …of which 84 successful in CPass1 (ignore the red TPC color)…
    …and 9 failed in T0, but are MUON runs: they should not have gone through (different AliRoot, some changes in T0)
As soon as CPass1 is completed, 1 week will be given for the manual update. If too little (QM, holidays), we’ll increase it. Then, Vpass should start
+Actions
CPass0 completed
  9 runs failed in TPC and TRD
    Not recoverable, no CPass1
  7 runs failed in TRD due to low statistics
    TRD can recover them manually, but no CPass1 would be run after those
    How will the other detectors mark these runs? TOF, T0 bad; Mean Vertex good; TRP? TRD?
CPass1 completed on the available runs
In summary, ready for the manual update window
  1 week for the manual update announced: deadline on Friday 31 Aug (so far, possibly extended to Monday)
+Further comments
+Interdependencies
Under discussion: do EMCAL runs need calibration triggers? (PHOS does not)
  Seems not!
+Further issues
Some reconstruction jobs fail with bad_alloc: under investigation
  Grid tests with gdb ongoing: not much information retrievable, the jobs ran successfully
  Valgrind test ongoing: did not show anything significant
  Trying with Rev-21 on LHC12c, LHC12e
    Many errors, but FPE, not bad_alloc: stack trace available
    I could not reproduce the problem, still investigating
+PPass
LHC12a and LHC12b Vpass validated: ready for Ppass
  A patched Rev-16 was created to fix the TRD QA issue, to be used to run Ppass
  LHC12a completed, QA feedback last week
  LHC12b completed, QA feedback last week
+Calibration of old data
GRP/CTP/Aliases entries to be created, after defining the classes to be used for the reconstruction
It might be necessary to apply some downscaling
  min(max(nevents/10,30000),nevents)/nevents, but we need to define nevents
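The downscale expression keeps a tenth of the events, but never fewer than 30000 and never more than the run actually contains. A minimal sketch of how it evaluates, with a hypothetical function name; which event count `nevents` refers to is still open, so the function simply takes it as input:

```python
def downscale_fraction(nevents: int, divisor: int = 10, min_events: int = 30000) -> float:
    """Fraction of events to keep: min(max(nevents/divisor, min_events), nevents) / nevents."""
    if nevents <= 0:
        raise ValueError("nevents must be positive")
    kept = min(max(nevents / divisor, min_events), nevents)
    return kept / nevents


# Illustrative event counts, not taken from any real run.
for n in (10_000, 100_000, 1_000_000):
    print(n, downscale_fraction(n))
```

Runs with at most 30000 events are kept in full, runs between 30000 and 300000 events are cut to 30000 events, and larger runs are downscaled by a factor of 10.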
+pA
Since MB will be the main trigger, we propose to use that and downscale.
For the pA pilot run, all data are asked to be reconstructed, keeping ESDs, friends, and ITS RecPoints
Tests on the LHC11f2 ongoing: feedback will be asked