HLT architecture

TPC FEE
[Diagram: TPC FEE signal chain – anode wire, pad plane, drift region (88 µs); gating grid; L1: 5 µs, 200 Hz; PASA → ADC → digital circuit → RAM; L2: < 100 µs, 200 Hz; DDL (4096 channels/DDL)]
• FEC (Front End Card): 128 channels, close to the readout plane; 8 chips × 16 channels/chip
• Custom ICs: CMOS 0.35 µm (analog) and CMOS 0.25 µm (digital)
• Detector: 570 132 pads; 1 MIP = 4.8 fC; S/N = 30:1; dynamic range = 30 MIP
• CSA + semi-Gaussian shaper: gain = 12 mV/fC, FWHM = 190 ns
• ADC: 10 bit, < 10 MHz
• Digital circuit: baseline correction, tail cancellation, zero suppression; multi-event memory
• Power consumption: < 40 mW/channel
TPC electronics: ALICE TPC Readout Chip (ALTRO)
Digital tail cancellation performance
[Figure: two panels, "filter input" and "filter output" (filtered data and fixed threshold), each plotted as ADC counts (0–200) vs. time samples (170 ns each), with the fixed threshold overlaid]
Processing chain per channel:
ADC (10-bit, 20 MSPS) → Adaptive Baseline Correction I (11-bit CA2 arithmetic) → Tail Cancellation (18-bit CA2 arithmetic) → Adaptive Baseline Correction II (11-bit arithmetic) → Data Formatting (40-bit format) → Multi-Event Memory
• Sampling clock: 20 MHz; readout clock: 40 MHz
• Per chip: 8 ADCs, memory, digital processor & control logic
• 0.25 µm (ST) technology; area: 64 mm²; power: 29 mW/channel; SEU protection
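The tail-cancellation stage is, in essence, an IIR pole-zero filter: a zero placed at the measured ion-tail decay constant cancels the long tail, and the filter pole sets a shorter replacement decay. A minimal floating-point sketch of one such stage (the real ALTRO uses fixed-point CA2 arithmetic with configurable coefficients; the values below are illustrative, not ALTRO settings):

```python
def tail_cancel(samples, pole=0.5, zero=0.95):
    """One pole-zero cancellation stage:
    y[n] = x[n] - zero * x[n-1] + pole * y[n-1].
    An input tail decaying like zero**n leaves the filter decaying
    like the much shorter pole**n."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for x in samples:
        out = x - zero * x_prev + pole * y_prev
        y.append(out)
        x_prev, y_prev = x, out
    return y

# an ion tail decaying with constant 0.95 per time sample
tail = [0.95 ** n for n in range(20)]
shortened = tail_cancel(tail, pole=0.5, zero=0.95)
# the output now decays with constant 0.5 per sample, so the pulse
# returns to baseline well before the next cluster arrives
```

With the tail restored to baseline quickly, the fixed threshold of the zero-suppression stage cuts cleanly, as the performance figure above illustrates.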
Data compression: entropy coder
• Variable Length Coding: short codes for frequent values, long codes for infrequent values
• Based on the probability distribution of 8-bit TPC data
• Results: compressed event size = 72% (NA49), 65% (ALICE)
(Arne Wiebalck, diploma thesis, Heidelberg)
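The variable-length-coding idea can be sketched with a textbook Huffman coder, which builds exactly such a code from the symbol frequencies (illustrative only; this is not the coder from the cited thesis):

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a Huffman code: frequent symbols get short codewords."""
    freq = Counter(data)
    # heap entries: (frequency, tiebreak, {symbol: code_so_far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate single-symbol input
        return {s: "0" for s in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# skewed distribution, as in zero-suppressed ADC data (values illustrative)
data = [0] * 70 + [1] * 20 + [5] * 10
codes = huffman_code(data)
# the dominant value 0 gets a 1-bit code, the rare values 2-bit codes
```

A hardware pipeline can then stream samples through a lookup table of these codewords, which is the role of the pipelined Huffman encoding unit described below.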
TPC – RCU

RCU design – control flow
• State machines: RCU resource & priority manager, command decoder
• TTCrx, FEE bus controller, SIU controller (DDL), FEE SC (slow control)
• Watchdog: health agent, debugger
• PCI core, Huffman encoder
RCU design – data flow
• Shared memory modules: TTCrx registers, event memory, event fragment pointer list, configuration memory
• TTC controller, FEE bus controller, slow control
• SIU controller (FIFO → SIU), Huffman encoder
Data compression: TPC – RCU
• TPC front-end electronics system architecture and readout controller unit
• Pipelined Huffman Encoding Unit, implemented in a Xilinx Virtex 50 chip*
* T. Jahnke, S. Schoessel and K. Sulimma, EDA group, Department of Computer Science, University of Frankfurt
RCU prototypes
• Prototype I
– Commercial OEM PCI board
– FEE-board test (ALTRO + FEE bus)
– SIU integration
– Qtr 3, 2001 – Qtr 2, 2002
• Prototype II
– Custom design
– All functional blocks
– PCB: Qtr 2, 2002
– Implementation of basic functionality (FEE-board → SIU): Qtr 2, 2002
– Implementation of essential functionality: Qtr 4, 2002
• Prototype III
– SRAM FPGA → masked version or antifuse FPGA (if needed)
• RCU production
– Qtr 2, 2003
RCU prototype I
• Commercial OEM PCI board
– ALTERA APEX EP20K400 FPGA
– SRAM 4 × 32k × 16 bits
– PMC I/O connectors (178 pins)
– Buffered I/O (72 pins)
RCU prototype I
• Implementation of basic test functionality
– FEE-board test (ALTRO + FEE bus)
– SIU integration
[Diagram: PCI bus → PCI core in the APEX 20K400 FPGA (internal SRAM, I/O); on-board SRAM (4 × 32k × 16) and FLASH EEPROM; PMC connector → FEE-bus daughter board → FEE boards, trigger; SIU card]
RCU prototype II
• Implementation of essential functionality
– Custom design
– All functional blocks
[Diagram: PCI bus → PCI core and SIU-CMC interface → SIU; FPGA with internal SRAM; memory D32 (> 2 MB); FLASH EEPROM; SC, TTC and FEE-bus interfaces]
RCU prototype II – schematics
[Schematic: APEX FPGA; connectors JN1, JN2, JN2A, JN3, JN4, JN5; 3 × Flash; 8 × SRAM; SDRAM; power (1.8 V generation); CIA; miscellaneous]

RCU prototype II – RCU mezzanine
• RCU mezzanine card: components on top side, no maximum height restriction
• Front-End Bus connectors 1 and 2
RCU prototype II – schematics
[Schematic: APEX FPGA, connectors JN1–JN5, Flash, SRAM, SDRAM, power (1.8 V generation), CIA, miscellaneous; additionally showing the SIU / DIU mezzanine card (1/2 CMC), and the RCU mezzanine card with components on top side (no maximum height restriction) and Front-End Bus connectors 1 and 2]
Programming model
• Development version – status December 2001
[Diagram: PC, Linux RH7.1 (2.4.2), with PCI-tools, RCU-API and device driver; PLDA board with PCI core, mailbox memory, SIU controller and FEE bus controller; FEE bus with ALTRO emulators; SIU → DDL]
SIU–RORC integration
[Diagram: RCU prototype I – PCI bus → PCI core and SIU interface in the FPGA (with SRAM), host running Linux/NT with PLDA/PCI-tools, RCU-API and device driver; SIU → DDL → DIU; pRORC (PCI bridge, glue logic, DIU interface), host running Linux with DDL/PCI-tools, pRORC-API and device driver]
SIU–RORC integration
• Result (data and control flow):
1. PC1: write memory block to FPGA internal SRAM (RCU internal SRAM)
2. SIU controller: wait for READY-TO-RECEIVE
3. PC2: allocate "bigphys" memory area, initialize link + pRORC
4. PC2: send DDL-FEE command READY-TO-RECEIVE
5. SIU controller: strobe data into SIU
6. pRORC: copy data into bigphys area via DMA
⇒ PC1 memory block = PC2 "bigphys" memory area
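The sequence above can be caricatured in a few lines. This is only a toy model of the handshake, with queues standing in for the DDL link and the READY-TO-RECEIVE command; it is not driver code, and the hardware names appear only as comments:

```python
from queue import Queue

def run_transfer(block):
    """Toy model of the SIU-RORC transfer sequence."""
    rcu_sram = list(block)      # 1. PC1 writes memory block to FPGA internal SRAM
    ready = Queue()             # 2. SIU controller waits for READY-TO-RECEIVE
    link = Queue()              # DDL link between SIU and DIU
    bigphys = []                # 3. PC2 allocates the "bigphys" area, inits link + pRORC

    ready.put("READY-TO-RECEIVE")            # 4. PC2 sends the DDL-FEE command
    assert ready.get() == "READY-TO-RECEIVE"
    for word in rcu_sram:       # 5. SIU controller strobes data into the SIU
        link.put(word)
    while not link.empty():     # 6. pRORC copies data into bigphys via DMA
        bigphys.append(link.get())
    return bigphys              # result: PC1 memory block == PC2 bigphys area
```

The invariant verified in the test stand, that the received bigphys area equals the transmitted block, is exactly what the toy model returns.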
RCU system for TPC test 2002
[Diagram: RCU prototype II/I – FPGA with FEE-bus controller, SIU controller, PCI core, SIU interface, manager and SRAM; external SRAM and FLASH; FEE-bus → FEE boards, trigger; host running Linux RH7.x with DATE, PLDA/PCI-tools, RCU-API and device driver; SIU → DDL → DIU; pRORC (PCI bridge, glue logic, DIU interface) in a host running Linux RH7.x with DATE, DDL/PCI-tools, pRORC-API and device driver]
Programming model
• TPC test version – summer 2002
[Diagram: PC, Linux RH7.1 (2.4.2), with DATE, FEE configurator, PCI-tools, RCU-API and device driver; prototype II (prototype I) with PCI core, mailbox memory, RCU resource & priority manager, SIU controller and FEE bus controller; FEE bus → FEE boards; SIU → DDL]

TPC PCI-RORC
[Diagram: PCI bus → PCI bridge, glue logic, DIU-CMC interface → DIU card; FPGA coprocessor with internal SRAM; two D32 memories (2 MB each); FLASH EEPROM]
HLT architecture overview
• Not a specialized computer, but a generic large-scale (> 500 node) multiprocessor cluster
• A few nodes have additional hardware (PCI RORC)
• Has to be operational in off-line mode also
• Use of commodity processors
• Use of commodity networks
• Reliability and fault tolerance are mandatory
• Use of a standard OS (Linux)
• Use of on-line disks as mass storage
[Diagram: HLT network – receiver processors / HLT processors, each a PC with NIC(s) and receiver boards (RcvBd) on the PCI bus; optical links to the front-end; monitoring server; distributed farm controller]
HLT – cluster slow control
Features:
• Battery backed: completely independent of host
• Power controller: remote powering of host
• Reset controller: remote physical RESET
• PCI bus: perform PCI bus scans, identify devices
• Floppy/flash emulator: create remotely defined boot image
• Keyboard driver: remote keyboard emulation
• Mouse driver: remote mouse emulation
• VGA: replace graphics card
• Price: very low cost
Functionality:
• Complete remote control of PC, like a terminal server but already at BIOS level
• Intercept port 80 messages (even remotely diagnose a dead computer)
• Interoperate with remote server, providing status/error information
• Watchdog functionality
• Identify host and receive boot image for host
• RESET/power maintenance
HLT networking (TPC only)
All data rates in kB/s (readout not included here).
[Diagram: 180 links, 200 Hz → cluster finder (180 + 36 nodes) → track segments (108 + 36 nodes) → track merger (72 + 36 nodes) → global L3 (12 nodes), with spare nodes at each stage; input rates 4 × 92 000; aggregate rates 17 000 000 → 2 340 000 → 252 000 → ?; further rates 2 × 65 000 and 2 × 7 000]
Assume a 40 Hz coincidence trigger plus 160 Hz TRD pretrigger with 4 sectors per trigger.
HLT interfaces
[Diagram: detectors → L3 trigger processor → event builder; DCS and EC (Experiment Control); on-line/off-line software via the L3-API; L2A and L3A trigger signals; logging, monitoring, DATA Grid]
HLT internal, input and output interface
Publish/subscribe:
[Diagram: publisher process with subscriber proxy; subscriber process with publisher proxy]
• When local, do not move data – exchange pointers only
• Separate processes, multiple subscribers for one publisher
• Network API and architecture independent
• Fault tolerant (can lose a node)
• Considers monitoring
• Standard within HLT and for input and output
• Demonstrated to work on both the shared memory paradigm and sockets
• Very lightweight
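The pointer-exchange idea can be sketched as follows: when publisher and subscriber are local, only a descriptor (offset, size) into a shared buffer is handed over, never the event data itself. The class and variable names here are illustrative, not the actual HLT framework API:

```python
class SharedBuffer:
    """Stand-in for a shared memory segment both processes have mapped."""
    def __init__(self, size):
        self.mem = bytearray(size)

class Publisher:
    def __init__(self, buf):
        self.buf = buf
        self.subscribers = []   # multiple subscribers per publisher

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, offset, size):
        # announce only a descriptor; the payload stays in shared memory
        for cb in self.subscribers:
            cb((offset, size))

buf = SharedBuffer(1024)
buf.mem[0:4] = b"\x01\x02\x03\x04"   # "event" written once into shared memory

received = []
pub = Publisher(buf)
pub.subscribe(received.append)
pub.publish(0, 4)

# the subscriber reads through the descriptor; no copy of the event was made
off, size = received[0]
event = bytes(buf.mem[off:off + size])
```

Over the network the same interface would serialize the payload instead, which is how one API can cover both the shared-memory and socket transports mentioned above.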
• HLT is an autonomous system with high reliability standards (part of the data path)
• HLT has a number of operating modes:
– on-line trigger
– off-line processor farm
– possibly a combination of both
• Very high input data rates (20 GB/s)
• High internal networking requirements
• HLT front-end is the first processing layer
• Goal: same interface for data input, internal data exchange and data output
HLT system structure
Level-1: trigger detectors – TRD trigger, dimuon trigger, PHOS trigger
Level-3: (sub)-event reconstruction
• TPC: fast cluster finder + fast tracker, Hough transform + cluster evaluator, Kalman fitter
• Pattern recognition
• Dimuon arm tracking
• Extrapolate to ITS, to TOF, to TRD, ...
Preprocessing per sector:
• Detector front-end electronics → raw data, 10-bit dynamic range, zero suppressed
• RCU: Huffman encoding (and vector quantization)
• RORC: Huffman decoding, unpacking, 10-to-8 bit conversion; fast cluster finder with simple unfolding and flagging of overlapping clusters → cluster list, raw data
• Receiver node: fast vertex finder; fast track finder initialization (e.g. Hough transform) → Hough histograms, peak finder
• Global node: vertex position
FPGA coprocessor: cluster finder
• Fast cluster finder
– up to 32 padrows per RORC
– up to 141 pads/row and up to 512 timebins/pad
– internal RAM: 2 × 512 × 8 bit
– timing (in clock cycles, e.g. 5 ns)¹: #(cluster timebins per pad) / 2 + #clusters; outer padrow: 150 ns/pad, 21 µs/row
– centroid calculation: pipelined array multiplier
1. Timing estimates by K. Sulimma, EDA group, Department of Computer Science, University of Frankfurt
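The centroid calculation reduces to a charge-weighted mean, which the pipelined array multiplier evaluates as a stream of multiply-accumulates. A plain-Python sketch of the arithmetic (the sample values are illustrative):

```python
def cluster_centroid(samples):
    """Charge-weighted centroid of a 1-D cluster:
    sum(q_i * t_i) / sum(q_i), the quantity a pipelined
    multiply-accumulate unit computes as samples stream in."""
    total_charge = sum(q for _, q in samples)
    return sum(t * q for t, q in samples) / total_charge

# cluster of ADC counts over consecutive timebins (illustrative values)
samples = [(10, 5), (11, 20), (12, 40), (13, 20), (14, 5)]
centroid = cluster_centroid(samples)  # symmetric cluster -> centroid at timebin 12
```

In the FPGA the division is the only non-streaming step; the two running sums cost one multiply-accumulate per sample, consistent with the per-pad timing quoted above.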
FPGA coprocessor: Hough transformation
• Fast track finder: Hough transformations²
– (row, pad, time)-to-(2/R, φ, η) transformation
– (n-pixel)-to-(circle-parameter) transformation
– feature extraction: local peak finding in parameter space
2. E.g. see "Pattern Recognition Algorithms on FPGAs and CPUs for the ATLAS LVL2 Trigger", C. Hinkelbein et al., IEEE Trans. Nucl. Sci. 47 (2000) 362.
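The principle can be illustrated with the simplest (straight-line) Hough variant: every cluster votes for all parameter-space bins consistent with it, and a track appears as a local maximum of the vote histogram. The circle-parameter transform on this slide works the same way in its own parameter space; the code below is a generic 2-D sketch, not the FPGA implementation:

```python
import math

def hough_peak(points, n_theta=180, n_rho=100, rho_max=50.0):
    """Vote each point into a (theta, rho) accumulator and return the
    highest bin as (votes, theta_bin, rho_bin). A line through many
    points shows up as a peak in the accumulator."""
    acc = [[0] * n_rho for _ in range(n_theta)]
    for x, y in points:
        for it in range(n_theta):
            theta = math.pi * it / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            ir = int((rho + rho_max) / (2.0 * rho_max) * n_rho)
            if 0 <= ir < n_rho:
                acc[it][ir] += 1
    return max((acc[it][ir], it, ir)
               for it in range(n_theta) for ir in range(n_rho))

# ten collinear "clusters" on the horizontal line y = 2
votes, theta_bin, rho_bin = hough_peak([(float(x), 2.0) for x in range(10)])
# all ten points vote into the same bin near theta = 90 degrees
```

The inner voting loop is embarrassingly parallel, which is why it maps well onto an FPGA coprocessor, while the final local peak finding is the "feature extraction" step listed above.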
Processing per sector (receiver node):
• Input: raw data, 8-bit dynamic range, decoded and unpacked; vertex position, cluster list
• Slicing of padrow–pad–time space into sheets of pseudo-rapidity, subdividing each sheet into overlapping patches; sub-volumes in r, φ, η
• Fast track finder A: track follower
• Fast track finder B: 1. Hough transformation, 2. Hough maxima finder, 3. tracklet verification → track segments
• Cluster deconvolution and fitting (RORC) → updated vertex position, updated cluster list, track segment list