28/03/2003 Julie PRAST, LAPP CNRS, FRANCE
The ATLAS Liquid Argon Calorimeter Read-Out Drivers (RODs)
A design based on 600 MHz TMS320C6414 DSPs
The LHC
• The LHC is an accelerator ring in which proton beams are accelerated to an energy of 7 TeV.
• Its goal is to collide the protons of one beam with the protons of the other.
• 4 experiments.
LHC: Large Hadron Collider
(27 km circumference)
The ATLAS experiment
• Goal: explore the fundamental nature of matter and the basic forces that shape our universe.
• About the size of a five-story building.
• A collaboration of 2000 physicists.
• 150 universities and laboratories in 34 countries.
The electromagnetic calorimeter
• ATLAS: several sub-detectors.
• Electromagnetic calorimeter:
– Identifies electrons and photons.
– Measures the energy carried by these particles.
– 200 000 cells to be read out at 40 MHz.
Figure: the electromagnetic calorimeter.
The calorimeter electronics chain
Figure: DETECTOR → FRONT END ELECTRONICS (FEB: amplifier, shaping, analog memory (SCA), 12-bit ADC) → 1600 optical links (Glink) → BACK END ELECTRONICS: ROD → 800 optical links (Slink) → ROB. Timing Trigger Control (TTC) is also shown.
The ROD modules
• Calculate the precise energy and timing of calorimeter signals from discrete time samples (Δt = 25 ns).
• Perform monitoring.
• Format data for the following element in the electronics chain.
Goals of the ROD modules
• 200 modules, each receiving data from 1024 calorimeter cells.
• Calculate the energy from these data using optimal filtering weights:
$E = \sum_{i=1}^{5} a_i \,(S_i - \mathrm{PED})$
• If E > threshold (< 10 % of cells), calculate the timing τ and the pulse quality factor χ²:
$E\tau = \sum_{i=1}^{5} b_i \,(S_i - \mathrm{PED})$
$\chi^2 = \sum_{i=1}^{5} (S_i - \mathrm{PED} - E\,g_i)^2$
• Fill histograms of E, τ, χ², ...
• During calibration runs, perform signal averaging to calculate calibration constants for each channel.
with i = 1, ..., 5 time samples and PED = pedestal.
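The three formulas above can be sketched per channel in C, the language of the DSP code. This is a minimal sketch under stated assumptions: it uses floating-point samples and weights (the real ROD works on 16-bit integer data), and the names `optimal_filter` and `of_result` are hypothetical. The timing is recovered by dividing Eτ by E, which is only done above a non-negative threshold.

```c
#include <assert.h>
#include <stdbool.h>

#define N_SAMPLES 5 /* i = 1..5 time samples, stored at indices 0..4 */

/* Result for one channel (hypothetical layout, for illustration). */
typedef struct {
    double energy; /* E = sum_i a_i (S_i - PED)                   */
    bool   above;  /* true if E > threshold: tau and chi2 filled  */
    double tau;    /* timing, from E*tau = sum_i b_i (S_i - PED)  */
    double chi2;   /* quality factor: sum_i (S_i - PED - E g_i)^2 */
} of_result;

/* Hypothetical helper: applies the optimal-filtering weights a and b and
 * the normalized pulse shape g to the samples s of one channel. */
static of_result optimal_filter(const double s[N_SAMPLES], double ped,
                                const double a[N_SAMPLES],
                                const double b[N_SAMPLES],
                                const double g[N_SAMPLES], double threshold)
{
    of_result r = {0.0, false, 0.0, 0.0};
    assert(threshold >= 0.0); /* guarantees E > 0 before dividing by E */

    for (int i = 0; i < N_SAMPLES; ++i)
        r.energy += a[i] * (s[i] - ped);

    if (r.energy <= threshold)
        return r; /* most cells stop after the energy */

    r.above = true;
    double e_tau = 0.0;
    for (int i = 0; i < N_SAMPLES; ++i) {
        double d = s[i] - ped;
        e_tau += b[i] * d;
        double resid = d - r.energy * g[i];
        r.chi2 += resid * resid;
    }
    r.tau = e_tau / r.energy;
    return r;
}
```

Only cells above threshold pay for the second loop, which is why the timing and χ² cost applies to a small fraction of channels.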
Requirements
• The ROD module must be able to process an event in less than 10 µs, including histograms.
• Use a commercial programmable processor. A natural choice is a Digital Signal Processor (DSP): efficient computing power for this kind of algorithm and high I/O bandwidth.
• Modular design: basic components should be easy to change or upgrade.
• Low power consumption.
The ROD: a 9U VME board
The ROD Motherboard
The Staging Mode
• At the start of LHC operation.
• ROD equipped with half of the PUs.
• Level 1 trigger rate < 50 kHz.
• Data from 4 FEBs are routed to one PU.
• 1 DSP processes 256 channels instead of 128.
The DSP Processing Unit
Figure: PU block diagram. Inputs FEB1 to FEB4 (16 bits each) feed two Input FPGAs (Apex 20k160), which are connected to two TMS320C6414 DSPs through EMIFA (64 bits) and EXT_INT. Other blocks and links shown: two 4k × 16 FIFOs, EMIF B (16 bits), an Acex 1k30 FPGA, the TTC interface (BCID, TType) on McBSP0/McBSP1, McBSP2, the VME interface on HPI, JTAG, config and the output data stream.
The DSP Processing Unit
Figure: photo of the PU showing the Input FPGA, the DSP, the Output FPGA and the FIFO.
PU Software Summary
• Input data: serial data in FEB format.
• Input FPGA: parallelizes the data and converts them to DSP format.
• DSP: for 128 channels per event, calculates E, or E, τ and χ²; fills histograms.
• Output FPGA: handles the TTC data.
• Output data, sent out of the ROD, in one of three formats:
– 16-bit integer E;
– 16-bit integer E, plus 32-bit τ, χ² and gain;
– 32-bit E, plus 32-bit τ, χ² and gain.
• VME interface.
The figure distinguishes a « programmable » part from a fixed part.
The TMS320C6414: a latest-generation DSP from TI
• C64x CPU core: instruction decoding, 8 calculation units, 64 registers.
• Cache memories: 16 kB program and 16 kB data.
• 1 MB central memory.
• DMA controller, peripherals and external memory interface.
The DSP code structure
Figure: DSP code structure, running on the RTX kernel.
• Event from FEB → EDMA ISR → Input Event circular buffer (16 events wide).
• TTC circular buffer (16 events wide), used by the Synchronization Task.
• Process task (Physics, Test or Calibration) → Output Event circular buffer (16 events wide).
• Send Task → data to the output controller.
• Other Tasks.
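Each of the three 16-event buffers decouples a producer from a consumer, for example the EDMA ISR filling the input buffer and the process task emptying it. Below is a minimal sketch of such a circular buffer, assuming a single producer and a single consumer; the event size `EVENT_WORDS` and all names are hypothetical, and a real ISR/task version would also need interrupt masking or a compiler barrier around the index updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS  16u /* "16 events wide" */
#define EVENT_WORDS 4u  /* hypothetical event size, illustration only */

typedef struct {
    uint32_t words[EVENT_WORDS];
} event_t;

/* head and tail count events forever; unsigned wrap-around keeps
 * head - tail correct, and 16 divides 2^32, so "% RING_SLOTS" stays
 * consistent across the wrap. */
typedef struct {
    event_t  slot[RING_SLOTS];
    uint32_t head; /* total events written (producer) */
    uint32_t tail; /* total events read (consumer)    */
} event_ring;

static void ring_init(event_ring *r) { r->head = 0u; r->tail = 0u; }

static uint32_t ring_count(const event_ring *r)
{
    uint32_t n = r->head - r->tail;
    assert(n <= RING_SLOTS); /* invariant: never more than 16 events */
    return n;
}

/* Producer: copy one event in; false if all 16 slots are full. */
static bool ring_push(event_ring *r, const event_t *ev)
{
    if (ring_count(r) == RING_SLOTS)
        return false;
    r->slot[r->head % RING_SLOTS] = *ev; /* write the data first ... */
    r->head++;                           /* ... then publish it      */
    return true;
}

/* Consumer: copy the oldest event out; false if the ring is empty. */
static bool ring_pop(event_ring *r, event_t *out)
{
    if (ring_count(r) == 0u)
        return false;
    *out = r->slot[r->tail % RING_SLOTS];
    r->tail++;
    return true;
}
```

A full buffer makes `ring_push` fail instead of overwriting an event that has not been processed yet.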
DSP Software
• Developed with Code Composer Studio (CCS).
• Whole code written in C, except:
– physics loops, written in linear assembly and then optimized using CCS.
• Limited code complexity: good legibility and maintainability.
Example of Linear Assembly
• Calculation of the cell energy: $E = \sum_{i=1}^{5} a_i \,(s_i - p)$
Let the compiler do all the laborious work of parallelizing, pipelining and register allocation.
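Operation tree: $a_1 s_1$, $a_2 s_2 + a_3 s_3$, $a_4 s_4 + a_5 s_5$
$\to \sum_{i=2}^{5} a_i s_i \to \sum_{i=1}^{5} a_i s_i$
$\to E = \sum_{i=1}^{5} a_i s_i - \sum_{i=1}^{5} a_i p$
    mpy   s1, a1, sa1       ; a1*s1
    dotp2 a23, s23, sa23    ; a2*s2 + a3*s3 (two 16-bit products)
    dotp2 s45, a45, sa45    ; a4*s4 + a5*s5
    add   sa23, sa45, sa25  ; sum over i = 2..5
    add   sa1, sa25, sa15   ; sum over i = 1..5
    sub   sa15, px, e       ; E = sum(a_i s_i) - px, with px = sum(a_i p)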
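To check what the linear-assembly sequence computes, here is a portable C model of it, one statement per instruction. `dotp2_model` is a hypothetical stand-in for the C64x `dotp2` instruction (two signed 16-bit products summed into 32 bits), assuming one value in each 16-bit half of a word; on the DSP itself, TI's C compiler exposes the instruction as the `_dotp2` intrinsic. As on the slide, `px` is assumed to be precomputed per channel as the sum of a_i·p.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of a 32-bit word. */
static int32_t lo16(uint32_t w)
{
    int32_t v = (int32_t)(w & 0xFFFFu);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Sign-extend the high 16 bits of a 32-bit word. */
static int32_t hi16(uint32_t w) { return lo16(w >> 16); }

/* Pack two signed 16-bit values into one word (lo in bits 0-15). */
static uint32_t pack2(int16_t lo, int16_t hi)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Portable model of dotp2: lo*lo + hi*hi on packed 16-bit halves.
 * The corner case (-32768 * -32768) * 2 overflows int32 (undefined in C). */
static int32_t dotp2_model(uint32_t x, uint32_t y)
{
    return lo16(x) * lo16(y) + hi16(x) * hi16(y);
}

/* Mirrors the linear-assembly sequence, one line per instruction. */
static int32_t cell_energy(int16_t a1, int16_t s1, uint32_t a23, uint32_t s23,
                           uint32_t a45, uint32_t s45, int32_t px)
{
    int32_t sa1  = (int32_t)a1 * s1;      /* mpy   s1, a1, sa1      */
    int32_t sa23 = dotp2_model(a23, s23); /* dotp2 a23, s23, sa23   */
    int32_t sa45 = dotp2_model(s45, a45); /* dotp2 s45, a45, sa45   */
    int32_t sa25 = sa23 + sa45;           /* add   sa23, sa45, sa25 */
    int32_t sa15 = sa1 + sa25;            /* add   sa1, sa25, sa15  */
    return sa15 - px;                     /* sub   sa15, px, e      */
}
```

For example, with a = (1, 2, 3, 4, 5), s = (10, 20, 30, 40, 50) and p = 5, px = 15 × 5 = 75 and `cell_energy` returns 550 − 75 = 475, the same as the sum of a_i·(s_i − p). Precomputing px saves five subtractions per channel.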
DSP software results
• Physics calculation of 128 channels: 3.5 µs.
– Includes all the necessary histograms, plus τ and χ² for the 10 % fraction of high-energy cells.
• 30 to 40 % of the time is due to stall cycles.
– Cycles lost because data are not in the cache.
The Cache Memory
• When data or an instruction is not in the cache memory, the CPU stalls for 6 cycles while it is copied from the central memory to the cache.
• For the E calculation, 6 data must be read: 6 × 6 = 36 wait cycles.
• The cache memory must be understood to improve these numbers.
Which improvements?
• L1D mapping:
– Take care of which data are loaded, from which address and in what order.
• L1D pipelining: use consecutive loads.
– 1 miss: 6 wait cycles.
– 2 misses: 8 wait cycles.
– 4 misses: 12 wait cycles.
• L1D access optimization:
– Samples preloading.
– Interleaved histograms.
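The pipelining figures can be compared with misses taken one at a time. This small sketch only restates the slide's numbers: 6 wait cycles per isolated miss, and the measured 6, 8 and 12 cycles for 1, 2 and 4 consecutive misses, which all fit 6 + 2(n − 1). Applying that fit to other values of n is an assumption, and the function names are hypothetical.

```c
#include <assert.h>

#define MISS_PENALTY 6 /* wait cycles for one isolated L1D miss */

/* Misses resolved one at a time: each pays the full penalty. */
static int serial_wait_cycles(int misses)
{
    assert(misses >= 0);
    return MISS_PENALTY * misses;
}

/* Consecutive (pipelined) misses. The measured points 1 -> 6, 2 -> 8
 * and 4 -> 12 all fit 6 + 2 * (n - 1); other values of n are an
 * extrapolation of that fit, not measurements. */
static int pipelined_wait_cycles(int misses)
{
    assert(misses >= 1);
    return MISS_PENALTY + 2 * (misses - 1);
}
```

For 4 misses, grouping the loads costs 12 wait cycles instead of 4 × 6 = 24 if each miss paid the isolated penalty, which is why consecutive loads pay off.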
DSP software results
• Physics calculation of 128 channels: 3.5 µs, of which 30 to 40 % is lost in stall cycles (data not in the cache).
• The complete code takes about 7 µs (600 MHz DSP).
– Includes the RTX kernel, synchronization and send tasks, …
• 30 % of margin for further improvements.
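The timing figures translate into cycle counts with simple arithmetic. This sketch only restates numbers from the slides (600 MHz clock, 10 µs budget per event, 3.5 µs for the 128-channel physics loop, about 7 µs in total); the function and macro names are hypothetical.

```c
#include <assert.h>

#define DSP_CLOCK_MHZ   600.0 /* 600 MHz TMS320C6414 */
#define EVENT_BUDGET_US 10.0  /* requirement: < 10 us per event */

/* At 600 MHz, 1 us = 600 cycles. */
static double cycles(double microseconds)
{
    return microseconds * DSP_CLOCK_MHZ;
}

/* Fraction of the per-event budget left unused. */
static double margin(double used_us)
{
    return (EVENT_BUDGET_US - used_us) / EVENT_BUDGET_US;
}

/* Average cycles spent per channel for a given processing time. */
static double cycles_per_channel(double used_us, int channels)
{
    assert(channels > 0);
    return cycles(used_us) / channels;
}
```

The 10 µs budget is 6000 cycles; the complete code at about 7 µs uses about 4200 of them, leaving the 30 % margin. The 3.5 µs physics loop over 128 channels averages about 16 cycles per channel.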
Agenda
• Mid-March 2003: motherboard and PU assembled.
• May 2003: validation in standalone mode.
• Fall 2003: system test in the experiment environment.
• Spring 2004: production launch.
• Summer 2004: board installation at LHC.
Conclusion: the ROD
• Calculates the precise energy and timing of the calorimeter signals.
• 1 motherboard and 4 Processing Units (PUs).
• 1 PU = two 600 MHz TMS320C6414 DSPs.
• 30 % of margin for future improvements.
• 200 RODs to be produced in 2004.
Thank You