
The Next Steps - Upgrading L1Calo: Why, What, When and Who

An essentially random set of borrowed and recycled slides to provide a flavour of the planned work

Where I’ve borrowed, many thanks

Where I’ve used my own and far better exist elsewhere, many apologies

L1Calo Upgrade: Why

We worked hard for 15+ years to get here, where did we go wrong?

The obvious answer: LHC and the physics available

• LHC is now the main flagship for particle physics
  – For many years to come
  – Pressure to squeeze as many physics results out as is humanly possible
• Mass of Higgs is in a very interesting, but tricky region
  – Ambition of analyses exceeds pre-LHC expectations
• Hope of any other ‘easy’ discoveries is fading
  – Need to maintain performance at lower energies as Higgs/EW physics. It may be all we’ll get

LHC Schedule

• I got this from a Phil Allport talk in July, so it must be right

Typical quoted luminosities*:

Run 2: 1-2 x design

Run 3: 2-3 x design

Run 4+: 5 x design

design = 10^34 cm^-2 s^-1

* could be on low side

Focussing on TDAQ

• The (original) TDR aims were well known and well motivated

• But even in 2012-2013 we were pushing beyond the original rate goals

• Except at Level-1, as HARD hardware limited
  – As distinct from the rest of TDAQ

Run 1 Realities

• Moore’s Law helps HLT + Offline
• Doesn’t do much for Level-1

Level-1: Max 75 kHz, Typical 60 kHz

Level-2: Max 5.5 kHz, Typical 4 kHz

Event Filter: Max 600 Hz, Typical 400 Hz
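
As a rough cross-check of the rates quoted above, the rejection factor of each HLT stage is just its input rate divided by its output rate. The sketch below uses only the Run 1 numbers from this slide; the function and variable names are illustrative, not taken from any ATLAS software.

```python
# Run 1 output rates quoted on the slide, in Hz.
RATES_HZ = {
    "Level-1": {"max": 75_000.0, "typical": 60_000.0},
    "Level-2": {"max": 5_500.0, "typical": 4_000.0},
    "Event Filter": {"max": 600.0, "typical": 400.0},
}


def rejection_factors(rates, column):
    """Return {level: input_rate / output_rate} for every level after the first.

    The input rate of a level is taken to be the output rate of the level
    listed immediately before it.
    """
    levels = list(rates)
    factors = {}
    for upstream, downstream in zip(levels, levels[1:]):
        factors[downstream] = rates[upstream][column] / rates[downstream][column]
    return factors


typical = rejection_factors(RATES_HZ, "typical")
maximum = rejection_factors(RATES_HZ, "max")

for level in typical:
    print(f"{level}: typical 1/{typical[level]:.1f}, max 1/{maximum[level]:.1f}")
```

With the typical rates this gives about 1/15 for Level-2 and 1/10 for the Event Filter, i.e. the HLT stages reject much less than the Level-1 stage in front of them.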

Also watch out for dead-time

• Even when we did run at 75 kHz, dead-time started to creep up

• Aim for 2015 is to try to go faster

• But I have my doubts

The consequence

• The ATLAS trigger performance depends increasingly on Level-1

Rejection factors:

          Level-1   Level-2   Evt Filter
DESIGN    1/500     1/20      1/20
RUN 1     1/500     1/10      1/10

• And that’s before the planned HLT upgrades…

HLT beyond 2015

• More hardware with more nodes (of course)
• Distinction Level-2 and EF is blurred/removed
  – A more incremental approach to algorithms
  – Along with an increased event building capacity
• Baseline recording rate increased to 1 kHz
  – With potential for further data ‘parking’ and an offline Level-4 trigger
• Effectively picture could look something like below
  – And who knows how much more capacity in 2020+

Level-1 1/500

‘Level-2’ 1/7

‘Evt Filter’ 1/7

RUN 1

L1Calo response

• Increasing pressure to improve selection
• Several different strands possible
  – Address pile-up dependency
  – More flexibility
  – Introduce topology
  – Use more detailed calorimeter information
• When all else fails
  – Increase Level-1 rate (and call it Level-0!)
  – With all the detector work that implies

L1Calo Upgrade: What

You will be upgraded. You will become like us.*

*A. Cyberman

L1Calo Rates: the Good, the Bad and Missing Energy

Well behaved triggers scale with luminosity
  – Useful to study trigger ‘cross-section’ as function of pile-up factor <μ>

Triggers affected by pile-up very dependent on LHC bunch structure
  – Typically missing energy, forward jets, multi-jets
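
To make the trigger ‘cross-section’ idea concrete: it is the trigger rate divided by the instantaneous luminosity, so a trigger whose rate scales with luminosity has a flat cross-section versus <μ>. The sketch below is a minimal illustration; the sample rates, luminosities and <μ> values are hypothetical numbers chosen for the example, not measured ATLAS data, and the function names are my own.

```python
CM2_PER_NB = 1e-33  # 1 nanobarn expressed in cm^2


def trigger_cross_section_nb(rate_hz, luminosity_cm2_s):
    """Trigger 'cross-section' in nb: rate / instantaneous luminosity."""
    if luminosity_cm2_s <= 0:
        raise ValueError("luminosity must be positive")
    return rate_hz / luminosity_cm2_s / CM2_PER_NB


def is_well_behaved(cross_sections, tolerance=0.05):
    """True if the cross-section varies by less than `tolerance` (fractional)."""
    lowest, highest = min(cross_sections), max(cross_sections)
    return (highest - lowest) / lowest < tolerance


# Hypothetical (rate in Hz, luminosity in cm^-2 s^-1, <mu>) samples.
linear_trigger = [(1_000.0, 2e33, 10.0), (2_000.0, 4e33, 20.0), (3_000.0, 6e33, 30.0)]
pileup_trigger = [(1_000.0, 2e33, 10.0), (4_000.0, 4e33, 20.0), (9_000.0, 6e33, 30.0)]

linear_sigma = [trigger_cross_section_nb(r, lum) for r, lum, _mu in linear_trigger]
pileup_sigma = [trigger_cross_section_nb(r, lum) for r, lum, _mu in pileup_trigger]

print("linear trigger well behaved:", is_well_behaved(linear_sigma))
print("pile-up trigger well behaved:", is_well_behaved(pileup_sigma))
```

In the second (made-up) sample the rate grows faster than the luminosity, so the cross-section rises with <μ>, which is the signature of a pile-up-sensitive trigger such as missing energy.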

Missing Energy: the Pile-Up effect

• L1Calo experiences pedestal shifts due to unbalanced overlaying of signals at the start of the train
  – This was something we always expected
  – But perhaps we underestimated the extent
  – And the horrified reaction from others!
• On-the-fly pedestal correction reduces the problem enormously
  – Also different types of filters can help too
• This sounded the death knell for the final major ASIC in the system
  – But gave a lovely new project for Heidelberg
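
One way to picture an on-the-fly pedestal correction is a running average of the baseline kept separately for each position in the bunch train, with the measured shift from the nominal pedestal subtracted from every sample. The sketch below is only that picture under stated assumptions: the class name, the exponential running average and its `weight` parameter are hypothetical and are not the actual nMCM firmware algorithm.

```python
class PedestalCorrector:
    """Per-bunch-position running pedestal estimate (illustrative sketch)."""

    def __init__(self, n_positions, nominal_pedestal, weight=0.01):
        self.nominal = float(nominal_pedestal)
        self.weight = weight  # assumed averaging weight; small = slow tracking
        # Start every bunch position at the nominal pedestal.
        self.baseline = [self.nominal] * n_positions

    def correct(self, position, adc):
        """Return the sample with the current baseline shift removed."""
        shift = self.baseline[position] - self.nominal
        corrected = adc - shift
        # Update the running average after correcting, so a sample never
        # corrects itself. Real pulses also enter the average, which is why
        # the weight is assumed to be small in practice.
        self.baseline[position] += self.weight * (adc - self.baseline[position])
        return corrected


# A large weight is used here only to make the convergence visible quickly.
corrector = PedestalCorrector(n_positions=4, nominal_pedestal=32, weight=0.5)
shifted = [corrector.correct(position=0, adc=40.0) for _ in range(3)]
untouched = corrector.correct(position=1, adc=32.0)
print(shifted, untouched)
```

A bunch position with a persistent upward shift is pulled back towards the nominal pedestal over successive samples, while positions without a shift are left alone.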

The New Multi-Chip Module

USA 15 system already upgraded with nMCMs running in legacy mode

FPGAs: Our Flexible Friends

• Elsewhere FPGAs dominated the old system
  – Which has definitely proved useful
  – Already in Run 1
  – But also for current upgrades
• CPM and JEM will send more detailed information on backplane
  – Mostly used for topological information
  – But also allows, for example, isolation fraction cuts
• Required the replacement of the CMM
  – Which can produce more, and better, triggers

CMX: what does it stand for?

• The CMX breaks the mould of L1Calo modules by being green!

• More modules appearing all the time in USA 15

Towards Topology

• The CMX adds extra functionality and flexibility to the original counting triggers

• Crucially it also provides a new data path to a ‘Topo Box’

• The Topo module adds topological functionality that runs on the current RoIs
  – It is also built with further upgrade inputs in mind

• In the absence of tracking information at Level-1, this provides another useful lever

The Topo Box, or Boxes

• First use of modern optical links in L1Calo (6.4 Gb/s)
• Pioneering ATCA usage (replacement for VME)
  – Installed recently in USA15 (first ATCA in ATLAS?)

If you thought 6.4 Gb/s was fast…

• Longer term upgrade:
  – Use more detailed calorimeter information
  – Typically each tower becomes 10 ‘super-cells’
  – Digitisation on detector, optical data received at speeds of up to ~12 Gb/s

Two become Three: eFEX, jFEX and gFEX

The (not so) far future

• For Run 4+, many ATLAS detectors need to be replaced
• Potential Options for changing underlying trigger architecture
  – Include track trigger
  – Introduce a higher rate Level-0
• Which means we can ‘relax’, a la HLT and let the upgraded Level-1 (now Level-0) run at a higher rate
• Need to define what the new Level-1 is like
  – Details are being hammered out, but this is still a very open area
  – Eg Protocols, Rates
  – May be possible to implement more software like algorithms in GPUs, also use of CAMs

L1Calo Upgrade: When

No Gantt charts allowed!

More a history of upgrade evolution

L1Calo, the next ten years

• Long Shutdown 1 (NOW)
  – MCM becomes nMCM
  – Introduce CMX to RX/TX RoIs rather than hits
  – Add Topological Trigger
• Long Shutdown 2 (2018-2019)
  – New digital high granularity trigger towers (super-cells)
  – Digital eFEX, jFEX and gFEX run initially in parallel with legacy-L1Calo
    • Only LARg for Run 3
  – Topological trigger augmented for new signals
• Long Shutdown 3 (2024-2025)
  – Level-1 split into two stage trigger (L0 and L1)
  – FEX systems fully populated including Tile and used as L0Calo
  – New L1Calo fed from detector readout information

L1Calo Upgrade: the early days

• (Probably) the first serious attempt to think ahead happened right here in January 2008:
  – https://indico.cern.ch/event/24112/

• Here’s an interesting slide, what do we learn?

a) I haven’t changed my PPT style
b) We were still relying on Eric…
c) … and Wesley
d) Most of the points are accurate
e) Weren’t yet thinking Topology


Rest of 2008

• Not surprisingly commissioning was the major focus

• Nevertheless, thoughts in several institutes were looking at directions we have now taken:
  – High speed optical link research
  – (µ)ATCA developments
  – Clock jitter cleaning

• Also new institutes starting to take an interest

RAL and Mainz 2009: The Upgrade comes of age

• (Maybe because LHC was broken?)
• Argonne, Cambridge and MSU officially accepted as new upgrade collaborators
• First mention of nMCM
• Backplane tester
• Discussion of CMM speeds…
• …and alternatives
• By the end of Mainz meeting (June), seeds of current (LS1) upgrade were sown
• And Eric then retired

L1Calo: Generations

What about calorimeters?

• Francesco attended the meeting in January 2008

• Ideas became more focussed after a Tile Upgrade meeting in February 2008

• Idea was then to be on LS3 time-scale

• Discussions on this option continued regularly with both Tile and LARg

Then 2011 came along

• The idea of the trigger dedicated DPS appeared
  – Largely motivated by electron selectivity, though with lots of other potential too
• I clearly remember saying to many people that I expected no calorimeter upgrade on the LS2 time-scale
  – I was utterly wrong!
• Though there were rumblings before, it mostly erupted at Stockholm meeting in June

The ubiquitous diagram

The project blossomed!

• Ambitious, large scale, high speed, digital electronics
  – Exactly what the doctor ordered
• There was/is/will be one big problem
  – Timescale
  – For such an ambitious project, it seemed a little late to be starting
  – And our record on sticking to deadlines is not perfect
• Still for many reasons, the show had to go on
  – So draw up the Gantt charts, and cross the fingers
• For many years (1995-2008) I wondered how it was we always produced impossible looking schedules
  – Which we then invariably failed to meet
• Now I know how it happens

The collaboration also blossomed!

Moving on rapidly

• Thoughts on sATLAS and sLHC continued
• Nikos and others have been pushing Track Trigger for many years
• But it was never clear if it could contribute at the current Level-1 latency
• Another approach was needed: Level-0
• The idea had been around for some time, but first presented in SLAC, November 2010
  – There were no great objections
  – But then maybe no-one understood it
  – In fact I’m still not sure I do

As presented back then

L0A: 500 kHz, 2.5 µs
L1A: 100 kHz, 6.4 µs

• Largely speaking this architecture has been accepted
  – Or something like it (rates, and particularly latencies changed/changing)
• Many of us are still finding our feet with the implications
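
To get a feel for what the L0A/L1A figures as presented back then imply, the sketch below converts each latency into a number of bunch crossings that the front-end would have to buffer, and works out the rejection needed between the two levels. The 25 ns bunch spacing is an assumption (the nominal LHC value), and the names are illustrative only; the slide itself notes that rates and latencies were still changing.

```python
BUNCH_SPACING_NS = 25  # assumed nominal LHC bunch spacing

# Accept rates and latencies as quoted on the slide.
TRIGGER_LEVELS = {
    "L0A": {"rate_khz": 500, "latency_ns": 2_500},
    "L1A": {"rate_khz": 100, "latency_ns": 6_400},
}


def latency_in_bunch_crossings(latency_ns, spacing_ns=BUNCH_SPACING_NS):
    """Number of whole bunch crossings that elapse during the latency."""
    return latency_ns // spacing_ns


depths = {
    name: latency_in_bunch_crossings(level["latency_ns"])
    for name, level in TRIGGER_LEVELS.items()
}

# Rejection the new Level-1 must provide on top of Level-0.
l0_to_l1_rejection = (
    TRIGGER_LEVELS["L0A"]["rate_khz"] / TRIGGER_LEVELS["L1A"]["rate_khz"]
)

for name, depth in depths.items():
    print(f"{name}: {depth} bunch crossings of latency")
print(f"L0 -> L1 rejection: 1/{l0_to_l1_rejection:.0f}")
```

Under these assumptions the L0 decision arrives after 100 bunch crossings and the L1 decision after 256, with only a factor of 5 in rate between them.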

Developments

Another slide from 2010

L1Calo

The Lament of the Level-1 ‘Expert’

The lament of the Level-1 expert (1)

• Level-1 has to do more rejection than HLT
  – In 2012, Level-1 took ~15 MHz input down to 75 kHz
    • 75 kHz is the advertised figure, actually more like 65 kHz, constrained by detector readout
  – Whereas HLT is allowed to output up to 1 kHz
    • Data can be buffered for processing during quiet periods (eg LS1)
• And yet is also expected to achieve near 100% efficiency
  – HLT is allowed to be at least 5% inefficient for important signals
  – HLT is better understood so efficiency can be compared more easily to Monte-Carlo. (So people keep telling me.)


• Everyone compares to offline reconstruction
• There’s no MC ‘truth’ in real data
  – Normally I’m all in favour of data driven analyses
• But in the absence of ‘truth’ people tend to assume offline energy is the truth
  – Sadly this is not always the case
  – Sometimes the trigger path works better than the full detector data path
  – And both routes can make mistakes in the presence of large pile-up
• HLT uses offline-like quantities from detector readout, so is less affected by these expectations (ie HLT and offline make the same mistakes)

Detector Path       OK                       OK                 Broken              Broken
Trigger Path        OK                       Broken             OK                  Broken
Common Perception   Level-1 might be right   Level-1 is wrong   Level-1 is wrong    Level-1 is wrong
We say              Phew                     It’s a fair cop    Hang on a moment    Not our fault!


• Common complaint: Level-1 is the limiting factor for many triggers
  – There’s much truth in this
    • Hence we’re often under a lot of pressure
  – HLT gains hugely by having tracking information
• But it’s not true for everything
  – For example photon and two-photon channels
    • (I’m reliably informed that the two-photon channel has been quite interesting)
  – High-pT jets are not a problem
• And it’s partly due to a lack of adventure!
  – Level-1 is expected to be ‘stable’, can’t change too often
    • Tuning for high-pileup was delayed and could probably be improved
  – Reluctance to use features built in exactly to keep rates down
    • Eg electron isolation

L1Calo Upgrade: Who

You know those rock-group line-up charts?

Well this isn’t one of those

The original line-up (1996ish-2009)

• Birmingham on the CPM
• Heidelberg underlying everything with the PPM
• Mainz accompanying with the larger scale JEM
• QMUL: management and programming
• RAL providing rhythm and support with the TCM, CMM and ROD
• Stockholm providing the flourish and firmware

The first upgrade wave

• Joining in March 2009
  – Argonne (and occasionally leaving)
  – Cambridge
  – MSU
• Joining in March 2011
  – Berlin Humboldt

The Upgrade-TDR wave

• Completing the set in 2013
  – Brookhaven
  – Stony Brook
  – Krakow (Jagiellonian)
  – Oregon
  – (UBC/Wojtek)

L1Calo Upgrade: The Collaboration

L1Calo Collaboration

Some Last Words

• Digitisation: I still believe, that an 8 bit FADC is sufficient for threshold-settings in Processors.

• [ It is winter: there are no tomatoes to throw. ]

From Paul Hanke

From Steve

It’s been wonderful working with you.

Thanks for re-introducing me to ‘The Now Show’.
