
Arnd Meyer (RWTH Aachen) Dec 4th, 2003

Tevatron and DØ Status and Plans

Arnd Meyer, RWTH Aachen

DØ Germany Meeting

December 4th, 2003


Data Taking Status

Total data sample on tape with complete detector: > 200 pb-1

... still waiting for the first > 200 pb-1 analysis


Data Taking Status cont.

● Lab reached its goal of delivering 225 pb-1 in FY03

Also for DØ: BD delivered ∫L dt = 227.7 pb-1 in FY03

26 pb-1 per month since May

But had to run 6 weeks longer than hoped

● Cuts into our running time next year

Long shutdown – difficult to anticipate rapid startup

Six-week shutdown next summer / fall

● We do not expect to get significantly more ∫L dt in FY04 than we got this year

25% pbar tax (Recycler commissioning), studies ⇒ 233-328 pb-1 in FY04

See Dave McGinnis' presentation on Oct 3 ADM

[Figure: FY02 and FY03 luminosity]


FY04 Luminosity Profile

More (2x/week) and shorter (8 hours) accelerator study periods

Studies only if > 140 hrs of store time in the previous 14 days

Higher delivered luminosity through improved stacking rate

Improve stacking rate through shorter pbar production cycle time (2.4 s → 1.7 s): 11.3 mA/hr (FY03) → 18 mA/hr (FY04)

≃ 3 months turn-on after shutdown

(on schedule so far)

Pessimistic: ∫L dt ≃ 233 pb-1

Design: ∫L dt ≃ 328 pb-1

Will know by the end of the year if Recycler work was successful (but can benefit only in FY2005)
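The stacking-rate target and the study rule above can be checked with a few lines of arithmetic. The Python sketch below assumes (our assumption, not stated on the slide) that the pbar yield per production cycle stays fixed, so the stacking rate scales as 1/cycle time; the function names and the per-day store-hour log are hypothetical.

```python
# Sketch of two FY04 luminosity-profile numbers.
# Assumption (not from the slide): pbar yield per production cycle is unchanged,
# so the stacking rate scales inversely with the cycle time.


def scaled_stacking_rate(rate_ma_per_hr: float, old_cycle_s: float,
                         new_cycle_s: float) -> float:
    """Stacking rate from a shorter production cycle alone, in mA/hr."""
    return rate_ma_per_hr * old_cycle_s / new_cycle_s


def study_allowed(store_hours_per_day: list[float], window_days: int = 14,
                  threshold_hours: float = 140.0) -> bool:
    """Allow an accelerator study only if the last `window_days` days
    contained more than `threshold_hours` of store time.

    `store_hours_per_day` is a hypothetical log: store hours for each
    calendar day, most recent day last.
    """
    recent = store_hours_per_day[-window_days:]
    return sum(recent) > threshold_hours


cycle_only = scaled_stacking_rate(11.3, 2.4, 1.7)
print(f"cycle-time scaling alone: {cycle_only:.1f} mA/hr (FY04 goal: 18 mA/hr)")
print(f"140 h out of 14 days: {140 / (14 * 24):.0%} of the time in stores")
```

Under that assumption the shorter cycle by itself gives about 16 mA/hr, so reaching 18 mA/hr would also need more pbars per cycle; the 140-hour threshold corresponds to roughly 42% of the 14-day window spent in stores.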


... and a Wishlist for End of FY04

[Figure: luminosity outlook to 2004, with a “You are here” marker and 3⋅10^32 cm-2 s-1 indicated]

● STT fully commissioned

● Missing pieces of CTT fully commissioned

● Taking data with rates of 2.5 kHz / 1 kHz / 50 Hz after L1 / L2 / L3 and 90% average efficiency

● CPS / FPS used in the trigger and for physics

● Most data quality problems are caught online

● 1-2 fewer people on shift

● Taking shifts and improving the detector is not considered a necessary evil

● We have 0.5 fb-1 of good data on tape

● Reco takes 1 sec/event on my 2-year-old desktop
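To put the reconstruction wish in perspective: a farm keeps pace with data taking when it has (L3 output rate) × (CPU time per event) reference CPUs. A minimal sketch with a hypothetical function name; the `live_fraction` knob is an assumption, not a number from the slide.

```python
import math


def cpus_to_keep_up(l3_rate_hz: float, reco_s_per_event: float,
                    live_fraction: float = 1.0) -> int:
    """Reference CPUs needed for reconstruction to keep pace with data taking.

    live_fraction (hypothetical) is the fraction of wall-clock time spent
    recording at l3_rate_hz; 1.0 means continuous data taking.
    """
    cpu_seconds_per_second = l3_rate_hz * reco_s_per_event * live_fraction
    return math.ceil(cpu_seconds_per_second)


print(cpus_to_keep_up(50, 1.0))  # 50
```

At the wished-for 50 Hz L3 rate and 1 sec/event, this comes to 50 desktop-class CPUs running continuously, before any reprocessing.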


Data Taking Efficiency

[Figure: data taking efficiency vs. time, annotated “Winter '03 Shutdown”, “The Lucky Week”, “Pre-shutdown special runs/studies” and “Shutdown”]


Efficiency (Post-Shutdown)

● See some of the improvements in the machines already; e.g. lifetime at 150 GeV in the Tevatron now 16-28 hrs vs. a few hours pre-shutdown (removed aperture limitations)

● Biggest problem: “messy” store terminations with large losses, quenches, and CDF losing a couple of Silicon ladders

Not bad after 10 weeks of shutdown!

≃ 8 Stores so far

Initial luminosities 0.7 – 9.0 – 8.6 – 15.9 – 22.1 – 17.9 – 21.6 – 21.9 ⋅ 10^30 cm-2 s-1

Factor of 2 below the best stores before the shutdown
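The “factor of 2” can be checked against the Run II best initial DØ luminosity of 4.55⋅10^31 cm-2 s-1 quoted under Run II Bests. Comparing with that single best store is our assumption, since the slide refers to the best stores in general.

```python
# Initial luminosities of the first eight post-shutdown stores,
# in units of 1e30 cm^-2 s^-1 (values from the slide).
post_shutdown = [0.7, 9.0, 8.6, 15.9, 22.1, 17.9, 21.6, 21.9]

# Run II best initial DØ luminosity (Aug 10) in the same units: 4.55e31 = 45.5e30.
best_pre_shutdown = 45.5

best_post = max(post_shutdown)
ratio = best_pre_shutdown / best_post
print(f"best post-shutdown store: {best_post}e30 cm^-2 s^-1")
print(f"pre-shutdown best / post-shutdown best: {ratio:.2f}")  # 2.06
```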


Data Taking Efficiency (pre-SD)

● Average data taking efficiency for 2003 is 86%

● Current upper limit is ≃ 95%

3-4% global front end busy

1% begin and end store transitions

<1% run transitions

● Typically in the upper 80% range for the last six months

Since Beaune, “lost” 83.5 hours of store time (12.6% by time)

12 hours for special runs

Largest single failure (4 hrs): low airflow trips of L1CAL on July 27/28. Fan belt replaced. Symptomatic: some of the largest downtimes are one-time occurrences

Failures by component (without special runs):

– SMT: 12 hours

– Muon/L1Muon: 12 hours (trips, readout errors/crashes, trigger problems)

– CAL: 9 hours (mostly BLS power supplies and hot trigger); + 5 hours L1CAL

– CFT/CTT/PS: 5 hours

– L3/DAQ/Online: 4 hours

Tracking crates readout; collaboration's decision: L1A vs. FEB

Fairly optimized; continuous effort to keep this low
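The loss budget and the downtime list above can be cross-checked with simple sums. In this sketch, reading “<1%” for run transitions as a range of 0 to 1% is an assumption; everything else is taken from the slide.

```python
# Efficiency loss budget from the slide, percent of store time, as (low, high).
# Reading "<1%" for run transitions as (0, 1) is an assumption.
losses = {
    "global front-end busy": (3.0, 4.0),
    "begin/end store transitions": (1.0, 1.0),
    "run transitions": (0.0, 1.0),
}


def efficiency_ceiling(budget: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """(lowest, highest) efficiency in percent if only these losses remained."""
    total_low = sum(low for low, _ in budget.values())
    total_high = sum(high for _, high in budget.values())
    return 100.0 - total_high, 100.0 - total_low


# Downtime since Beaune, in hours (slide values, special runs excluded).
component_hours = {
    "SMT": 12, "Muon/L1Muon": 12, "CAL": 9, "L1CAL": 5,
    "CFT/CTT/PS": 5, "L3/DAQ/Online": 4,
}
lost_hours, lost_fraction, special_runs = 83.5, 0.126, 12

store_hours = lost_hours / lost_fraction
itemized = sum(component_hours.values())
print(efficiency_ceiling(losses))  # (94.0, 96.0)
print(f"implied store time: {store_hours:.0f} h")  # 663 h
print(f"itemized: {itemized} h, special runs: {special_runs} h, "
      f"not itemized: {lost_hours - itemized - special_runs} h")
```

The budget brackets the quoted ≃ 95% ceiling, and the itemized failures plus special runs account for 59 of the 83.5 lost hours, leaving about 24.5 hours not broken down by component.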


Data Taking Efficiency cont.

● Much of the time running close to our desired efficiency to tape – 90%

● At the same time, data quality improves

Number of conditions that cause us to stop data taking (automagically or manually) is continually increasing

● Credit to many (few) dedicated people!

● Large downtimes are discussed at the weekly operations meeting

● Several systems marginal in terms of expert coverage: one or no resident expert – no manpower for proactive improvements

http://www-d0.fnal.gov/runcoor/


Run II Bests

Regularly updated:

Best days by data taking efficiency

– 95.0% on June 22nd; so far 8 days with 93% or better efficiency

Best runs and days by recorded luminosity (Aug 10, 488 nb-1; May 4, 1.68 pb-1)

Best stores by initial DØ luminosity (Aug 10, 4.55⋅10^31 cm-2 s-1)

http://www-d0.fnal.gov/runcoor/
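As a rough scale for these records, a constant luminosity L delivers L × t of integrated luminosity, with 1 pb-1 = 10^36 cm-2. The sketch below compares the best recorded day with what the best store's initial luminosity would give if held for 24 hours; comparing recorded with a constant-peak ceiling is our simplification, and the gap reflects luminosity decay during stores, time between stores and dead time.

```python
CM2_PER_INV_PB = 1e36  # 1 pb^-1 = 1e36 cm^-2
CM2_PER_INV_NB = 1e33  # 1 nb^-1 = 1e33 cm^-2
SECONDS_PER_DAY = 24 * 3600


def integrated_inv_pb(inst_lumi_cm2_s: float, seconds: float) -> float:
    """Integrated luminosity in pb^-1 for a constant instantaneous luminosity."""
    return inst_lumi_cm2_s * seconds / CM2_PER_INV_PB


ceiling = integrated_inv_pb(4.55e31, SECONDS_PER_DAY)
print(f"constant best-store luminosity for 24 h: {ceiling:.2f} pb^-1")  # 3.93
print(f"best recorded day / that ceiling: {1.68 / ceiling:.2f}")  # 0.43
print(f"best run: {488 * CM2_PER_INV_NB / CM2_PER_INV_PB:.3f} pb^-1")  # 0.488
```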


A “Typical” Store

Wobbly L1 rates (CAL)

Initial Lum. 3.9⋅10^31 cm-2 s-1

Store lost (quench)

5-3% L1 Busy

Present max. rate guidelines (5-10% headroom to account for rate fluctuations):

– Level 1: 1.4 kHz (FEB < 5%)

– Level 2: 800 Hz (Muon r/o)

– Level 3: 50 Hz (Offline, 30% room)
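The rate guidelines translate directly into the rejection each trigger level must provide (input rate divided by output rate). This sketch compares the present guidelines with the end-of-FY04 wishlist rates of 2.5 kHz / 1 kHz / 50 Hz; the dictionary layout is just an illustration.

```python
# Maximum output rates per trigger level, in Hz.
present = {"L1": 1400.0, "L2": 800.0, "L3": 50.0}    # present guidelines
wishlist = {"L1": 2500.0, "L2": 1000.0, "L3": 50.0}  # end-of-FY04 wishlist


def rejection_factors(rates: dict[str, float]) -> dict[str, float]:
    """Input/output rate ratio at L2 and L3 (L1 input is not quoted)."""
    return {"L2": rates["L1"] / rates["L2"], "L3": rates["L2"] / rates["L3"]}


print(rejection_factors(present))   # {'L2': 1.75, 'L3': 16.0}
print(rejection_factors(wishlist))  # {'L2': 2.5, 'L3': 20.0}
```

Raising the L1 output while keeping 50 Hz at L3 pushes more of the rejection onto L2 and L3, which is where a working STT would have to contribute.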


“Typical” Store (Post-Shutdown)

7% L1 Busy at 1 kHz, 15% at 1.5 kHz

File transfers to FCC


Control Room Shifts

● It is a burden on the collaboration to fill that many shifts (and schedule them!)

The shifter duty is 7 shifts / 6 months per person on masthead

Only about 1 shift per month per person (on average!)

● Calorimeter and Muon shifts consolidated into CalMuon shifts since June

Rocky at times, but overall OK

Cost of additional training offset by savings in total number of shifts

There is more training involved – it took some time for “old” shifters to realize this

● Next natural choice for merging is SMT / CFT – will require initiative from detector groups (clear instructions, simplify, automate)

● More than a third of the collaboration have not yet taken a single shift in 2003

The fact that we are collecting data with high efficiency is in large part due to the presence of 5 – 6 well trained people in the control room


Data Quality & Global Monitoring

● Online data quality monitoring consists of three (four) parts

Significant Event System (“Slow Control”)

– Catches an increasing number of hard- and software failures

– In many cases pauses the run to ensure consistent data quality

– Working very well, could use additional experts guidance

DAQ Artificial Intelligence

– Notifies shifter of abnormal conditions (global rate fluctuations, BOT trigger rate, ...)

– Automagically fixes certain problems (SCLinit), e.g. sync. problems in L2

Sub-detector monitoring examines

– Many expert-level plots (but experts are not generally on shift)

Global Monitoring

– Trigger rates, Trigger Examine, Vertex Examine, Physics Examine

● Global Monitoring has great potential, but there are many issues – examples follow


Global Monitoring cont.

● Technical Issues

During the transition from trigger list v11 to v12, ran for weeks with wrong/bad reference plots (different triggers, then rapidly changing prescales)

PhysEx uses a random sample – should be based on certain triggers

Low statistics (slow reconstruction)

● Psychological Issues

Lack of interaction between detector shifters and GM

– GM detects a feature in the Gtrack phi distribution – SMT shifter cannot correlate it with his occupancy plots

– Need effort from all detector groups to bring their expertise into GM plots (only Muon group has done this so far)

● Organizational Issues

Shifts not being filled (e.g. 8 in August)

– If we can't fill all shifts, need to think about merging with Captain's and other shifters' duties


DQ & GM cont.

● LmTrigger urgently needs improvement

For example averaging over different time periods, uncertainties, luminosity dependence of trigger cross sections, ...

Extremely important tool to identify problems quickly

● Overall, somewhat slow progress (remember Beaune?)

● If we want to continue reducing the number of shifters in the control room, GM needs a major effort (time, people, attention)

From the collaboration – great task for groups new to DØ

Automation should be the goal

Need to catch all major problems online

● Up to one third of the data is thrown away in the analysis stage – sad!

Everybody who discards data through “bad” or “good” run lists should take that extra step and think about how to catch the problems earlier!
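One way to picture what an improved LmTrigger check could do: compute each trigger's cross section per time slice (events divided by recorded luminosity, with a Poisson uncertainty) and flag slices that deviate from a reference value. This is a hypothetical sketch, not LmTrigger's actual implementation; the `Slice` record, the median reference and the 3σ threshold are all our assumptions, and it ignores the luminosity dependence of trigger cross sections mentioned above.

```python
import math
from dataclasses import dataclass
from statistics import median


@dataclass
class Slice:
    """Hypothetical per-trigger record for one time slice."""
    events: int          # events that fired the trigger in this slice
    lumi_inv_nb: float   # recorded integrated luminosity of the slice, nb^-1


def cross_section(s: Slice) -> tuple[float, float]:
    """Trigger cross section in nb and its Poisson (sqrt N) uncertainty."""
    sigma = s.events / s.lumi_inv_nb
    return sigma, math.sqrt(s.events) / s.lumi_inv_nb


def flag_slices(slices: list[Slice], n_sigma: float = 3.0) -> list[int]:
    """Indices of slices whose event count deviates from the count expected
    from the median cross section by more than n_sigma Poisson standard deviations."""
    reference = median(cross_section(s)[0] for s in slices)
    flagged = []
    for i, s in enumerate(slices):
        expected = reference * s.lumi_inv_nb
        if expected > 0 and abs(s.events - expected) / math.sqrt(expected) > n_sigma:
            flagged.append(i)
    return flagged


# Toy numbers for illustration only (not DØ data): the last slice has a dead trigger.
toy = [Slice(1000, 10.0), Slice(1010, 10.0), Slice(990, 10.0), Slice(0, 10.0)]
print(flag_slices(toy))  # [3]
```

The median is used as the reference so that a single broken slice cannot drag the comparison value; comparing against the expected count (rather than the slice's own √N) also flags slices with zero events.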


“Summer” Shutdown

● Successful 10 week shutdown (Sep 8 – Nov 17)

7-8 weeks for experiments, 2-3 additional weeks with limited access

● Four scheduled power outages, a couple of unscheduled ones

● 24x7 DAQ shifts and day shift Captains – thankless task! (“Good God, please, someone, if you see me in the parking lot, run me over. Kill me. I am so bored to death.” - anonymous DAQ shifter)

● Major DØ goals for the shutdown

Improve reliability: reduce access time, periods with incomplete detector etc.

Improve quality of the data: reduce calorimeter noise, repair Silicon HDI's

● Some major accelerator tasks

Recycler vacuum improvements, bakeout

Tevatron alignment work, installation of Tev alignment network

Replace rotting magnet stands

Improve some aperture limitations (Tev, transfer lines), upgrade instrumentation


Post Shutdown Status

● Access went smoothly overall – no accidents, on schedule, great support

Detector groups together with mechanical and electrical support groups have developed detailed job lists including detector opening and closing, allocation of manpower resources from detector groups and support teams, survey as needed during major moves

● First store on November 22nd, as scheduled

Took data within 3 minutes of first 36x36 store

Quality data taking established with 2nd 36x36 store

● Single days with about 90% data taking efficiency, already many runs with >94% efficiency

Biggest downtimes:

– Solenoid protection electronics failure (~4 hours)

– MCH ↔ FCH switch failure (~2 hours)

● Comprehensive reviews of online and offline quality of the data taken after the shutdown

December 12th, 15th, 19th

Identify more problems much earlier than at the “Bad Run List” level


Silicon Status

[Figure: current dose]

● Cancellation of Silicon replacement means we must plan to operate current detector for the life of the experiment

Layer 0 (which likely will be a part of the rebaselined Run II upgrade) without additional hits from the outer SMT layers is of little use

Have to evaluate what steps can be taken to increase chances of the detector's reliable operation long term

● Bias scans (August): measure depletion one layer at a time

Use tracks from CFT and other SMT layers to determine cluster charge as a function of HV

Runs with HV varied between 0% and 100% in 10% steps

SMT in full readout (no sparsification)

Average over ladders (statistics)

Results confirm expectation, but with large uncertainties
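The bias-scan procedure can be sketched as: average the per-ladder cluster charge at each HV setting, then take the lowest setting at which the charge reaches its plateau. The 95%-of-full-HV plateau criterion and the function names below are our assumptions for illustration; the slide does not say how depletion was extracted from the curves.

```python
from statistics import fmean


def hv_settings(step_percent: int = 10) -> list[int]:
    """HV settings of the scan in percent of nominal: 0, 10, ..., 100."""
    return list(range(0, 101, step_percent))


def mean_charge_per_setting(charges_by_setting: dict[int, list[float]]) -> dict[int, float]:
    """Average the per-ladder cluster charge at each HV setting (statistics)."""
    return {hv: fmean(q) for hv, q in charges_by_setting.items()}


def depletion_setting(mean_charge: dict[int, float], plateau_fraction: float = 0.95) -> int:
    """Lowest HV setting whose mean cluster charge reaches plateau_fraction
    of the charge at the highest setting (assumed plateau criterion)."""
    settings = sorted(mean_charge)
    plateau = mean_charge[settings[-1]]
    if plateau <= 0:
        raise ValueError("no charge collected at the highest HV setting")
    # The highest setting always qualifies when plateau > 0, so next() finds a value.
    return next(hv for hv in settings if mean_charge[hv] >= plateau_fraction * plateau)
```

With large per-ladder uncertainties, the averaging step matters: the result of `depletion_setting` is only as good as the statistics behind each mean.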


Silicon Status

● Main task during shutdown: repairs of failed HDI's / electronics

Before shutdown: 136 disabled (up from “irreducible” 84 just after Jan shutdown)

12 are “definitely repaired” (2 weeks of 2 shifts/day with 4 people)

59 are “unstable”

● 112 HDI's are currently not powered

50 Ladders (11.6%)

30 F-wedges (10.4%)

32 H-wedges (16.7%)

● A few HDI's failed when magnet was energized

● All but 2-3 of the enabled HDI's participate in track reconstruction

● SMT is operating stably

[Figure: 1st store after shutdown]
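The HDI numbers above are internally consistent, which is easy to check. The totals used as denominators (432, 288, 192) are our assumption: they are the counts that reproduce the quoted percentages, not numbers stated on the slide.

```python
# HDIs not powered after the shutdown, by type (slide values).
disabled = {"ladders": 50, "F-wedges": 30, "H-wedges": 32}

# Assumed totals: chosen because they reproduce the quoted percentages
# (11.6%, 10.4%, 16.7%); they are not stated on the slide.
totals = {"ladders": 432, "F-wedges": 288, "H-wedges": 192}


def percent_off(n_off: int, n_total: int) -> float:
    """Fraction of HDIs not powered, in percent."""
    return 100.0 * n_off / n_total


for kind, n_off in disabled.items():
    print(f"{kind}: {n_off}/{totals[kind]} = {percent_off(n_off, totals[kind]):.1f}%")

n_off_all = sum(disabled.values())  # 112, matching the slide
print(f"all HDIs: {n_off_all}/{sum(totals.values())} = "
      f"{percent_off(n_off_all, sum(totals.values())):.1f}%")  # 12.3%
```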


Central Fiber Tracker

● Shutdown tasks

Maintenance of LVPS's and installing upgraded LVPS (better connectors), maintenance of the VLPC He cooling system

Major job: modification of AFE boards to remove unused SVX inputs from the readout

Reduce data size and DAQ deadtime

● Known issues: channels corresponding to 10 SVX chips are not operational

Swapped AFE boards – problem stays

Found that the problem appeared after the last power outage with warm-up of the cryostat – serious problem!

● Reconstruction is still using old (wrong) CFT maps – makes offline data quality checks hard


Calorimeter

● Replaced all large cooling fans for preamplifier cooling during shutdown

● Studies of calorimeter noise – access priority given to “noise task force”

Characterize noises: 10 MHz, 14.3 MHz (RF/4), “Ring of Fire”, ...

Controlled power-up after power outages to identify sources

Improve grounding

● “Ring of Fire” / “Welder Noise”

Sudden burst of triggers/events

Appears when a welder is triggered in DAB3

Other unidentified sources

● Entry in the cryostat identified

Noise disappears almost entirely when temperature monitoring cables are disconnected


Calorimeter Noise

● RF/4 noise appeared when muon chambers switched back on

Unstable

Not really observed in data

● 10 MHz noise from SMT sequencers

Went around with a radio tuned to “WSMT” (10 MHz) to identify sources

● Series of grounding tests (shutdown)

Disconnect AC, safety ground, phone, etc.

Visual inspection: found ≃10 contacts

Attach current-controlled power supply, slowly increase current up to 50 A, and look for heat sources

Found (and fixed) a few more problems

Improved grounding reduces “welder noise” with temperature cables attached by about a factor of 2

● Does all the work pay off? Robert: “Looks better than ever before!”


(New) Calorimeter Monitoring

Occupancy/energy views to catch hot/cold zones

[Figure: 2nd store after shutdown]


Muon Systems

Shutdown Tasks:

● Forward Muon

First-time access to A layer forward muon tracker (8-12 hours for opening and closing) completed: replacement of preamplifiers, gas leaks, gas monitors; C layer repairs in progress

Number of non-working channels now 0.15% (trigger counters), 0.5% (drift tubes)

● Central Muon

Installed extra trigger counters under the detector – running into a few snags (tight clearance on east side) but no show-stoppers

Installed 144 remote power cycle relays for front-end electronics

Pulled a couple of wires drawing moderate to high currents for investigation

Installed Power PC's in the remaining muon readout crates that had 68k's

One PDT problem will require more than 4 hours access to repair

● Muon systems are collecting physics quality data – no known serious issues

The infamous bottom hole


Luminosity System & FPD

● Luminosity system

Cable work in the gaps

Complete readout electronics installed (finally!). Required to reduce the embarrassing 10% uncertainty on our luminosity measurement

● Forward proton detector

Installation of electronics for full system operation, all 18 pots in 6 castles


Level 2 Upgrade

● All Level 2 Alphas have been replaced with Betas! Running smoothly so far

● Too fast for PDT readout code – had to slow down temporarily until firmware corrected

● Indications that the reason for increased front-end busy lies within the Level 2 system


Silicon Track Trigger

● Reminder

The STT is part of the Level 2 Trigger System

Based on L1CTT roads, refit tracks using SMT axial hits → improved pT, impact parameter

Reduce background, improve pT resolution, cut on impact parameter

All Run II papers CDF has published so far are based on their equivalent (SVT)

● By default 5 STT crates in the readout after the shutdown (none before)

● Not yet in trigger

And unfortunately there's actually little point presently, with 1.4kHz L1A and 1kHz L2A


Trigger, DAQ, Online, General

● Firmware upgrades on L1CTT and L1Muon, maintenance on L1CAL

● Online & Controls

Replaced 8 disks that have died over the last 2 years (4 disks for the data logger)

Major online software upgrades – Python, EPICS, VxWorks

● New Trigger Control Computer, Online IP shuffling, other upgrades/maintenance

● General detector maintenance – Air handlers, hydraulic systems, vacuum jackets, cooling water systems, ODH heads, etc.

● Old Cryo UPS replaced

● ....

No time to talk about many other shutdown jobs!


Conclusions

● Detector is running well (better than some people want to make us believe)

Data taking efficiency 86% for the year – “physics analysis efficiency”??

216 pb-1 integrated luminosity on hand with full detector in readout

Progress in online data quality monitoring not as good as hoped for

● Shutdown went well and on schedule – a lot of work completed

● Came out of the shutdown well prepared

First store on November 22nd, as scheduled

Took data within 3 minutes of first 36x36 store, quality data taking established with 2nd 36x36 store

● Major worries

L1 Bandwidth, data quality monitoring, diminishing manpower

Disconnect between data taking and analysis

Offline progress (processing/reprocessing) is slowing down physics output
