Pulsar Functional requirements
Ted Liu
Level 2 group meeting, Feb 22nd, 2002
Part I: Pulsar as a Pulser and Recorder (test stand tools)
Part I outline: Pulser mode
• The need for an L2 teststand, and its goals
• Pulser board design concept
• What should the pulser board be able to do?
• L2 input data for each subsystem
• Firmware design considerations for pulser mode
L2 Review committee Recommendation (Dec. 14th, 2001):
“In the longer term, the test-stand system should be aggressively pursued. This will allow completion of the development effort and longer-term maintenance of the full system.”
“The committee thinks that the test station system, combined with the 2nd test crate sounds like the perfect way to provide various types of simulated event data (different luminosity, trigger types, suspected failure modes etc.). Providing that this effort won't impact any activities needed to make the baseline system work, it should be strongly supported, and prototyping and testing work should go ahead at full speed.”
“Hold a design workshop by the rest of the Level 2 groups in order to ensure that what is built is safe to use and is capable of exercising all the important parts of the system in a realistic fashion”.
As recommended by the committee, we will schedule a workshop for the L2 group to discuss the specifications for this system…
How could you use expanded teststand capabilities? (from this meeting agenda)
• If we have a teststand with a general input pulser, how would you want to use it?
 --- debugging broken/spare boards;
 --- testing firmware modifications;
 --- ???
• What type of tests would you want to run?
 --- are fixed patterns with fixed timing good enough?
 --- data from real events?
 --- test multiple boards and check for interference?
 --- randomly timed L1A patterns?
 --- random/user-controlled latency of input data?
 --- L1As etc. driven in a deterministic way? (TESTCLK)
 --- L1As etc. driven by TS?
 --- ???
• What kind of software tools will you need for your testing?
Basic requirement for a L2 trigger data source board:
[Diagram: a test pattern RAM feeds a FIFO; upon L1A for buffer n, the data block is sent to the L2 interface board input after the latency.]
(1) Upon L1A for buffer n, start a counter for buffer n;
(2) At the same time, clock data from RAM into the FIFO;
(3) Once the counter reaches the latency threshold, clock the data out of the FIFO at a speed which matches that of the subsystem… the actual latency is controlled by when the data is clocked out of the FIFO.
This is an oversimplified picture. Each subsystem is somewhat different, and designing a universal test board is not all that easy…
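The three steps above can be sketched as a small cycle-by-cycle model. This is only an illustration in Python, not firmware: the function name `run_pulser`, the one-word-per-cycle transfer rate and the cycle numbering (cycle 0 is the L1A) are assumptions made for the sketch.

```python
from collections import deque


def run_pulser(ram_words, latency_cycles, total_cycles):
    """Return (cycle, word) pairs showing when each word leaves the FIFO.

    Assumptions for this sketch: cycle 0 is the L1A, one word moves from
    RAM to the FIFO per cycle, and one word leaves the FIFO per cycle once
    the latency counter has reached its threshold.
    """
    fifo = deque()
    sent = []
    ram_index = 0
    for counter in range(total_cycles):  # step (1): counter starts at L1A
        # Step (2): copy the test pattern from RAM into the FIFO.
        if ram_index < len(ram_words):
            fifo.append(ram_words[ram_index])
            ram_index += 1
        # Step (3): after the latency threshold, clock data out of the FIFO.
        if counter >= latency_cycles and fifo:
            sent.append((counter, fifo.popleft()))
    return sent


if __name__ == "__main__":
    for cycle, word in run_pulser([0xA1, 0xB2, 0xC3], latency_cycles=5, total_cycles=12):
        print(cycle, hex(word))
```

In this model the first word appears exactly `latency_cycles` after the L1A, which is the sense in which the latency is set by when the FIFO is read rather than by when it is filled.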
L2 crate inputs
[Diagram: Reces x 4 (12 Taxi fibers), L1 (L1 cables), XTRP and SVT (one SVT cable each), CLIST (6 hotlink fibers + 1 LVDS cable), ISO (7 Taxi fibers) and MUON (16 hotlink fibers) feed the Alphas x 4 over the Magic bus.]
Can one design a universal L2 test (pulser) board? -- to enhance the testability of L2
“provide various types of simulated event data (different luminosity, trigger types, suspected failure modes etc.)”
• After L1A, data arrives at each interface board with a different latency (L1 within 132 ns, XTRP within ~1 us, then the rest; SVT takes 10 us or so, the longest);
• Most boards (L1, XTRP, CLIST, IsoList and SVT) will then request the bus and send their data to the alpha over the magicbus;
• The alpha will process the data; if it needs muon and Reces data (for some events), it will then get the data via programmed I/O over the magicbus from the muon and Reces boards;
• Once the decision is made, the alpha will handshake with the Trigger Supervisor (via the L2-TS cable).
Level 2 decision crate
TRACER
MVME
Level 2 trigger input data paths
Columns: SVT | XTRP | L1 | CLIST | ISO | Muon | Reces
Incoming data clock rate: 30 MHz | 7.6 MHz | 7.6 MHz | 20 MHz | 12 MHz | 30 MHz (cdfclk x 4) | 7.6 MHz (cdfclk)
Interface hardware: SVT cable | SVT cable | L1 cable | Hotlink+fiber | Taxi+fiber | Hotlink+fiber | Taxi+fiber
Data size range: 150 bits/trk | 21 bits/trk | 96 bits/evt | 46 bits/clu | 145 bits/clu | 11 Kbits/evt | 1.5 Kb/evt
Latency range* (column order not preserved in the transcript): ~6 us, ~132 ns, ~1 us - 10 us, ~10-100 us, ~1-20 us, ~1-5 us, ~few us
Other rows (per-column entries of fixed/variable, yes/no, BC# and "not used" not preserved in the transcript): Fixed or variable data length? Data with Buffer#? EOE with data (or from separate path)? B0 marker? Data gap within one event? Flow control?
* Latency range also depends on L1A history …
Design issues for a universal test board:
Hardware requirement is clear:
• have all hardware interfaces for all data paths;
Firmware requirements need more thinking:
• variable data size for some subsystems;
• variable latency (from event to event);
• correct buffer number in the data for a given L1A;
• gaps for certain data paths;
• record real data and reproduce it in the test stand;
• respond to HRR etc.;
• ….???
[Diagram: the L2 decision crate boards (α, SVT, XTRP, CLIST, ISO, L1, Reces, MUON) next to a Pulsar board with Hotlink IO, Taxi IO, SVT/XTRP, L1, TS, CDFctrl and VME interfaces.]
Pulsar is designed to have all the data interfaces that the Level 2 decision crate has. It is a data source for all trigger inputs to the Level 2 decision crate, and it can be used to record data from upstream as well.
PulsAR: Pulser And Recorder
Basic hardware requirement: have all hardware interfaces
[Diagram: the L2 decision crate (α, SVT, XTRP, CLIST, ISO, L1, Reces) with HP scopes and a logic analyzer attached.]
What are the test tools ? -> two types of board:
[Diagram: an L2 test crate holding five PULSAR boards, an MMB and CDFctrl, driving the L2 inputs and TS. The MMB (Magic Mystery Board) converts Magicbus data to SVT format. Pulsar interfaces: Hotlink IO, Taxi IO, SVT/XTRP, L1, TS, CDFctrl, VME, MUON.]
Level2_Pulsar design as test stand tool
[Diagram: 9U VME board (VME FPGA not shown) with 3 Altera APEX 20K400 FPGAs (two Optical IO, one Control), mezzanine card connectors, and TRK, L1, TS and CDFctrl connections.]
Pulsar: Pulser and Recorder (as Level 2 test stand tools)
PULSAR design as test tool only
[Diagram: component side and double-width front panel, showing the mezzanine card, the two IO FPGAs and the Ctrl FPGA, the Hotlink/Taxi connectors, and the TS, L1 and SVT connectors.]
Other connectors (1 L1, 1 TS) will stay inside the board. The mezzanine card connectors are used for optical I/O (hotlink and taxi).
Custom Mezzanine cards
• Hotlink: Tx and Rx (CLIST, Muon data paths)
• Taxi: Tx and Rx (Iso, Reces data paths)
[Diagram: mezzanine card with four FIFOs, an Altera EP1K30_144 FPGA, Hotlink or Taxi Tx/Rx chips, and the CMC connector (J1, J3).]
Hotlink Tx/Rx: CY7B923JC/933JC
Taxi Tx/Rx: AM7968/7969-175JC
Hotlink Optical Tx/Rx: HFBR-1119T/2119T
Taxi Optical Tx/Rx: HFBR-1414T/2416T
Usually has 4 fiber connectors.
Add one LVDS connector for the CLIST case: only two fiber connectors (left side) will be loaded on that mezzanine card, and one LVDS connector will be loaded on the right side instead of two fiber connectors.
Hotlink Rx mezzanine Hotlink Tx mezzanine
Pulsar boards communication lines on P2
Will use SVT-like P2 user-defined signals for inter-board communication:
(1) Pulsar_init* (P2 A1);
(2) Pulsar_error* (P2 A2);
(3) Pulsar_freeze* (P2 A3);
(4) Pulsar_Lostlock* (P2 A4) (for SLINK etc.);
(5) Pulsar_spare* (P2 A5).
Any Pulsar board can drive and listen to these signal lines.
Control unit
[Diagram: on L1A + Buf#, a controller reads the internal test RAM (or SRAM) and fills the FIFOs for L1 data, SVT data and XTRP data.]
The latency is controlled by when the data is clocked out of the FIFO.
If we want to load a large number of events, we will need to use external SRAM.
Optical IO Unit
[Diagram: the same structure (SRAM, internal test RAM, controller, L1A + Buf#), with the controller filling four FIFOs.]
hotlink examples:
Muon case (only one mezzanine card shown)
Another hotlink example: Optical IO Unit for CLIST case
Outputs: 6 fibers (A to F) + LVDS
8 bits each @50 ns; one cluster is encoded in 6 8-bit words in all 6 fibers.
8-bit data streams will be pushed into the FIFOs in the mezzanine card after L1A; later on they will be clocked out onto the fibers. The end-of-event marker comes out via the LVDS connector.
[Diagram: FIFOs on the mezzanine cards, one per fiber, plus the LVDS output.]
Simple way to load test patterns and send them out (optical paths as an example)
[Diagram: FPGA internal RAM, 8 bits wide per fiber and 128 words deep, divided into buffer0 to buffer3, feeds a clocked FIFO and then the fiber Tx.]
The actual latency will be controlled by when the data is clocked out of the FIFO after L1A (use a register via VME).
This means the latency will be fixed for a given test run. This is not good enough to mimic the real system, as the latency varies from event to event, but may be good enough for testing spares.
Another way to load test pattern memory: use a 36-bit data width. 32 bits will be for 4 fiber outputs (4 x 8), and the highest 4 bits will be used as control bits to mark the content of the data. For each event's worth of data, the first word will be the header, whose 32 data bits will contain the latency (and number of words etc.) for this particular event and this particular path. The last word is the trailer, which can contain other info if needed (such as what the L2 decision should be). Either use internal RAM, or use external SRAM with a 16-bit address and 36-bit data.
[Diagram: a 36-bit memory word made of 4 ctrl bits plus 4 x 8 bits of data; the memory is divided into Buffer0, Buffer1, Buffer2 and Buffer3 data.]
The highest two address bits will be controlled by the buffer number, to divide the memory automatically among the 4 buffers.
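The address split can be illustrated with a short Python sketch. It assumes the 16-bit address mentioned on the slide; the function name `ram_address` and the idea of a separate word offset within each buffer are illustrative, not taken from the firmware.

```python
ADDRESS_BITS = 16                        # from the slide: 16-bit address SRAM
BUFFER_BITS = 2                          # four L1A buffers -> two address bits
OFFSET_BITS = ADDRESS_BITS - BUFFER_BITS


def ram_address(buffer_number, word_offset):
    """Place the buffer number in the two highest address bits."""
    if not 0 <= buffer_number < (1 << BUFFER_BITS):
        raise ValueError("buffer number must be 0..3")
    if not 0 <= word_offset < (1 << OFFSET_BITS):
        raise ValueError("word offset does not fit in the remaining bits")
    return (buffer_number << OFFSET_BITS) | word_offset


if __name__ == "__main__":
    for buf in range(4):
        print(buf, hex(ram_address(buf, 0)))
```

With this split each buffer owns a quarter of the memory, so the address counter only has to count within the lower 14 bits.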
How does it work:
(1) after L1A, read the first word (header) and get the latency; at the same time start a counter;
(2) continue to read out the rest of the data words from the memory and clock them into a FIFO, until the trailer is reached (the L2 decision information can be found there);
(3) once the counter reaches the latency threshold, clock the data out of the FIFO at a speed which matches the subsystem.
This way the latency for each event and each data path can be individually controlled by the user.
Buffer 0 data memory (1st event):
header: latency for this event, and other info
data, data, data, data
trailer: other information (what the L2 decision should be, etc.)
Ctrl bit 35: header
Ctrl bit 34: trailer
Ctrl bit 33: gap
Ctrl bit 32: reserved
One could have more control by inserting gaps in between data words, etc., using the 4 control bits, to better mimic the real situation for certain data paths. This approach seems to be quite flexible.
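A minimal Python sketch of this 36-bit word layout follows. The control bit positions are the ones listed above; everything else is an assumption for illustration: the helper names, putting fiber 0 in the lowest byte, and using all 32 header bits for the latency (the slide says the header also carries the number of words etc.).

```python
HEADER_BIT = 1 << 35
TRAILER_BIT = 1 << 34
GAP_BIT = 1 << 33
RESERVED_BIT = 1 << 32
DATA_MASK = 0xFFFFFFFF


def pack_header(latency):
    """Header word; here the 32 data bits hold only the latency (assumption)."""
    return HEADER_BIT | (latency & DATA_MASK)


def pack_data_word(fiber_bytes, gap=False):
    """One byte per fiber for 4 fibers; fiber 0 in the lowest byte (assumption)."""
    if len(fiber_bytes) != 4:
        raise ValueError("expected one byte for each of 4 fibers")
    word = 0
    for index, byte in enumerate(fiber_bytes):
        word |= (byte & 0xFF) << (8 * index)
    return word | (GAP_BIT if gap else 0)


def pack_trailer(info=0):
    """Trailer word; the 32 data bits can hold e.g. the expected L2 decision."""
    return TRAILER_BIT | (info & DATA_MASK)


def read_event(memory, start=0):
    """Walk from header to trailer; return (latency, data_words, trailer_info)."""
    header = memory[start]
    if not header & HEADER_BIT:
        raise ValueError("expected a header word")
    data_words = []
    for word in memory[start + 1:]:
        if word & TRAILER_BIT:
            return header & DATA_MASK, data_words, word & DATA_MASK
        data_words.append(word)
    raise ValueError("no trailer found")
```

A gap word is simply a data word with bit 33 set, so the output stage can hold the line idle for that cycle instead of sending the byte.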
Initial thoughts on tester firmware design
[Diagram: the L1A + Buf# FIFO drives a state machine with counter0 to counter3; the 36-bit RAM (loaded via VME, addressed by the state machine) feeds four FIFOs, whose data and ctrl lines go to the mezzanine card side.]
• Latch L1A + buf#
• Read the 1st word from RAM
• Save the latency and compare it with the counter
• Continue reading data from RAM to the FIFOs until the last word
• Once the counter counts up to the latency, enable the FIFO output and the ctrl signals for the Tx chips
• Ready for the next L1A
Buffer 0 data
Buffer 1 data
Buffer 2 data
Buffer 3 data
Notes:
(1) The counter will be reset by the state machine, to rearm for the next L1A;
(2) If the counter has already counted beyond the latency for a given L1A, clock the data out right away;
(3) The RAM address is controlled by an address counter and the buffer number of the current event;
(4) HRR: reset the L1A FIFO and all output FIFOs, and wait for the B0 marker to pass by before enabling…
(5) To keep things simple, for now just use single-event data for a given buffer (to keep the RAM address simple);
(6) What is shown is for the 4-fiber case; one more internal RAM is needed for 8 fibers… if external SRAM is used, ping-ponging is needed…
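The sequence and notes above can be sketched as a small state machine model in Python. This is an illustration only: the state names, the `(kind, value)` word representation and the one-step-per-clock timing are assumptions, and HRR handling is left out.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    IDLE = auto()          # waiting for an L1A
    HEADER = auto()        # read the first word and save the latency
    RAM_TO_FIFO = auto()   # copy data words until the last word
    WAIT_LATENCY = auto()  # wait for the counter to reach the latency
    OUTPUT = auto()        # clock the FIFO out towards the Tx chips


def play_event(event_words):
    """Play one event; return (cycle, value) for every word sent.

    event_words is a list of (kind, value) pairs: one "header" whose value
    is the latency in clock cycles, then "data" words, then one "trailer".
    Cycle 0 is the cycle in which the L1A was latched.
    """
    state = State.HEADER
    counter = 0            # latency counter, started at L1A
    address = 0            # RAM address counter
    latency = 0
    fifo = deque()
    sent = []
    while state is not State.IDLE:
        if state is State.HEADER:
            kind, latency = event_words[address]
            if kind != "header":
                raise ValueError("event must start with a header word")
            address += 1
            state = State.RAM_TO_FIFO
        elif state is State.RAM_TO_FIFO:
            kind, value = event_words[address]
            address += 1
            if kind == "trailer":
                state = State.WAIT_LATENCY
            else:
                fifo.append(value)
        elif state is State.WAIT_LATENCY:
            # Note (2): if the counter is already past the latency, send at once.
            if counter >= latency:
                state = State.OUTPUT
                continue   # start sending in this same cycle
        elif state is State.OUTPUT:
            if fifo:
                sent.append((counter, fifo.popleft()))
            else:
                state = State.IDLE   # note (1): rearm for the next L1A
        counter += 1
    return sent
```

Because the machine only returns to idle after the FIFO is empty, a second L1A cannot be served until the first event has been sent completely.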
Possible implementation A:
States: null, L1A, header, RAMtoFIFO, Latency, done, output.
Comments:
Pro: simple.
Con: not so elegant, as the state machine has to finish sending all data out of the FIFO before it is able to process the next event. Maybe OK if run at a higher clock rate. There will be some intrinsic delay between events.
Good starting point; allows us to simulate the board soon.
It would be better to separate the RAM-to-FIFO part from the actual data-sending part.

Possible implementation B:
States: null, L1A, header, RAMtoFIFO, done, plus a separate output controller (FIFO with data ready to be sent, latency counter queue).
Comments:
Pro: more elegant.
Con: somewhat more involved.
Implement this later, for the real thing.

We decided to go for Implementation A first.
A few comments:
(1) On average, the maximum data size is from muon. Each fiber can send up to 30 x 4 = 120 8-bit words per event (with 16 fibers total). If we use an SRAM with 36 data bits and 16 address bits, we can load up to a few hundred different events for muon for a given test run. Many more can be loaded for other subsystems.
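The "few hundred events" estimate can be checked with a short calculation. The sketch below assumes that each 36-bit word carries one byte for each of 4 fibers, that every event costs one header and one trailer word in addition to its 120 data words, and that the memory is split evenly among the 4 buffers; these assumptions come from the memory layout described earlier, not from a measured configuration.

```python
def events_per_sram(words_per_fiber=120, address_bits=16, buffers=4, overhead_words=2):
    """Return (events per buffer, events in total) for one SRAM."""
    words_per_event = words_per_fiber + overhead_words   # header + data + trailer
    depth = 1 << address_bits                            # 65536 words
    per_buffer = (depth // buffers) // words_per_event
    return per_buffer, per_buffer * buffers


if __name__ == "__main__":
    per_buffer, total = events_per_sram()
    print(f"{per_buffer} muon events per buffer, {total} in total")
```

Under these assumptions one SRAM holds 134 full-size muon events per buffer, or 536 in total, which is consistent with "a few hundred".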
(2) The latency for each event and for each data path (arrival time into L2 decision crate after L1A) can be controlled by user to better mimic the real system;
(3) The long data gaps or long delays for some subsystems can be simulated this way;
(4) latency for individual events?
* estimate based on data size (i.e. more clusters -> longer latency etc.);
* it may be possible to record the real data with Pulsar in recorder mode and time stamp the incoming data during recording, then save them so they can later be used to reproduce the real data with real timing;
* note: the actual latency also depends on the history of L1As.
For example, it may be possible to record XTRP data with timing information: would this be useful?
[Diagram: the XTRP input is written into a data recording RAM while the counter for L1A buffer n is written into a time recording RAM.]
Both data and arrival time can be recorded with the data strobe from XTRP. The data latency AND “gap” information can be recorded this way, and can be reproduced in test stand mode. Since Pulsar has both XTRP input and output connectors, spying on the data is possible. For fiber connections, one needs to use fiber splitters to spy on data, or take short test runs.
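How latency and gap information fall out of such a recording can be shown with a small Python sketch. The two parallel lists stand in for the data RAM and the time RAM; the function names and the choice of clock cycles as the time unit are assumptions for illustration.

```python
def record(strobes):
    """Store each strobed word and the counter value at which it arrived.

    strobes is an iterable of (counter_value, data_word) pairs, where the
    counter was started at the L1A for this buffer.
    """
    data_ram, time_ram = [], []
    for counter_value, word in strobes:
        data_ram.append(word)
        time_ram.append(counter_value)
    return data_ram, time_ram


def latency_and_gaps(time_ram, word_period=1):
    """Latency is the first arrival time; gaps are idle cycles between words."""
    latency = time_ram[0]
    gaps = [later - earlier - word_period
            for earlier, later in zip(time_ram, time_ram[1:])]
    return latency, gaps


if __name__ == "__main__":
    data_ram, time_ram = record([(8, 0xA), (9, 0xB), (13, 0xC)])
    print(latency_and_gaps(time_ram))
```

The recovered latency and gap list are the quantities one would write back into the header word and gap control bits when replaying the event in pulser mode.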
Pulsar in recorder mode
[Diagram: Optical IO Unit with four FIFOs, controller, internal test RAM or SRAM, and L1A + Buf#. Hotlink example: muon case (only one mezzanine card shown).]
Configure the (S)RAM as a circular buffer for recording (for each L1A); it can be stopped and read out via VME.
Each Optical IO FPGA looks at 8 fiber channels, and the SRAM has 32+4 bits, so ping-ponging is needed for recording (recording is at twice the incoming data rate, 60 MHz).
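The circular-buffer recording idea can be sketched in a few lines of Python. The class name, the `stop` method standing in for a VME control register, and the read-out order (oldest word first) are assumptions made for this illustration.

```python
class CircularRecorder:
    """Keep the most recent `depth` words until recording is stopped."""

    def __init__(self, depth):
        self.depth = depth
        self.ram = [None] * depth
        self.write_pointer = 0
        self.count = 0
        self.running = True

    def write(self, word):
        if not self.running:
            return                      # recording frozen, ignore new data
        self.ram[self.write_pointer] = word
        self.write_pointer = (self.write_pointer + 1) % self.depth
        self.count = min(self.count + 1, self.depth)

    def stop(self):
        self.running = False            # e.g. triggered through a VME register

    def read_out(self):
        """Return the stored words, oldest first."""
        start = (self.write_pointer - self.count) % self.depth
        return [self.ram[(start + i) % self.depth] for i in range(self.count)]
```

Stopping the recorder shortly after an error condition leaves the words that led up to the error in memory, which is what makes the "catch errors and reproduce them" use case possible.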
or
Ideal test stand setup: Alpha + Pulsar + interface board(s) (this setup has already been done at VME speed by Steve, Matt and Peter, with the SVT Merger board acting as Pulsar)
[Diagram: PULSAR (VME) sends data input to the interface board (INTFACE), which sits with ALPHA, TRACER and ROC on the Magic bus.]
Data source: Level2_Pulsar
Data sink: Alpha
Possible data patterns:
(1) hand made
(2) derived from MC
(3) derived from data bank
(4) recorded from upstream, catch errors and reproduce them
• can test an individual board;
• can test the full data path;
• can also test multiple boards and check for interference;
• note that with only 2 Pulsar boards one can source data at the same time for L1, XTRP, SVT, CLIST, Iso and muon; one extra Pulsar is needed to drive one Reces;
• …. what else?
Another possible setup: Pulsar + MMB + Interface board
[Diagram: PULSAR sends data input to the interface board (INTFACE) on the Magic bus or TDC backplane, with TRACER, ROC and the MMB acting as Magicbus analyzer.]
Data source: Level2_Pulsar
Data sink: MMB + Pulsar/GB
In case Alpha is not available, it is possible to use the MMB to sink the data and convert it into SVT data format, then send the data into a Pulsar or a GhostBuster board.
or TESTCLK?
Possible to drive the full system
5 Pulsar boards needed to drive the full system
[Diagram: an L2 test crate with PULSAR boards and TRACER drives Reces x 4 (16 fibers x 3 = 48), L1, XTRP, SVT, CLIST, ISO and MUON (16 fibers) into the alphas over the Magic bus; an MMB acts as Magicbus watchdog; a ROC runs with different L1A rates and event patterns.]
CLIST cluster information from one LOCOS: 6 8-bit words per cluster on one fiber input, arriving 50 ns apart.
train no.  I        II      III         IV       V       VI
sig_0      1        em(5)   1           had(5)   1       crate_sel
sig_1      L1AB(0)  em(6)   passbit(0)  had(6)   phi(0)  ntow(0)
sig_2      L1AB(1)  em(7)   passbit(1)  had(7)   phi(1)  ntow(1)
sig_3      em(0)    em(8)   had(0)      had(8)   eta(0)  ntow(2)
sig_4      em(1)    em(9)   had(1)      had(9)   eta(1)  ntow(3)
sig_5      em(2)    em(10)  had(2)      had(10)  eta(2)  ntow(4)
sig_6      em(3)    em(11)  had(3)      had(11)  eta(3)  ntow(5)
sig_7      em(4)    em(12)  had(4)      had(12)  eta(4)  ntow(6)
Data format from Monica.
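The six-word cluster format in the table can be exercised with a pack/unpack pair. This Python sketch follows the bit assignments in the table, taking sig_0 as the least significant bit of each 8-bit word (an assumption about bit ordering); the function and field names are illustrative.

```python
def pack_cluster(em, had, eta, phi, ntow, buf, passbits, crate_sel):
    """Encode one cluster as the six 8-bit words I..VI of the table."""
    return [
        1 | ((buf & 0x3) << 1) | ((em & 0x1F) << 3),         # I: 1, L1AB(1:0), em(4:0)
        (em >> 5) & 0xFF,                                    # II: em(12:5)
        1 | ((passbits & 0x3) << 1) | ((had & 0x1F) << 3),   # III: 1, passbit(1:0), had(4:0)
        (had >> 5) & 0xFF,                                   # IV: had(12:5)
        1 | ((phi & 0x3) << 1) | ((eta & 0x1F) << 3),        # V: 1, phi(1:0), eta(4:0)
        (crate_sel & 0x1) | ((ntow & 0x7F) << 1),            # VI: crate_sel, ntow(6:0)
    ]


def unpack_cluster(words):
    """Decode six 8-bit words back into the cluster fields."""
    w1, w2, w3, w4, w5, w6 = words
    return {
        "em": (w1 >> 3) | (w2 << 5),
        "had": (w3 >> 3) | (w4 << 5),
        "eta": w5 >> 3,
        "phi": (w5 >> 1) & 0x3,
        "ntow": w6 >> 1,
        "buf": (w1 >> 1) & 0x3,
        "passbits": (w3 >> 1) & 0x3,
        "crate_sel": w6 & 0x1,
    }
```

A round trip through `pack_cluster` and `unpack_cluster` is a convenient way to generate hand-made test patterns and check that a recorded pattern decodes to the cluster that was intended.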
pin 1 BUF_DONE(0)+
pin 2 BUF_DONE(0)-
pin 3 BUF_DONE(1)+
pin 4 BUF_DONE(1)-
pin 5 CRSUM_SEND+ (not received by CLIST)
pin 6 CRSUM_SEND- (not received by CLIST)
pin 7 EVENT_DONE*+
pin 8 EVENT_DONE*-
pin 9 unused
pin 10 unused
1 CLIQUE connection (via 10 pin twisted ribbon cable)
The LVDS signals are driven by a 16.7 nsec clock which is a divided-by-8 copy of the 132 nsec CDF clock:
The time of EVENT_DONE* with respect to the last cluster found in the event is fixed.
8-bit wide cluster data (assume this is the last cluster for the event):
Em(4:0), buff(1:0), 1
Em(12:5)
Had(4:0), pass(1:0), 1
Had(12:5)
Eta(4:0), phi(1:0), 1
Ntow(6:0), crate_sel
CLIQUE control word (LVDS output): Evt_done, buff(1:0)
Transfer A (1st 33 ns): bit0 to bit6 = data(0) to data(6); bit7 = VCC
Transfer B (2nd 33 ns): bit0 to bit6 = data(7) to data(13); bit7 = VCC
Transfer C (3rd 33 ns): bit0 to bit6 = data(14) to data(20); bit7 = Bunch Zero Marker
Transfer D (4th 33 ns): bit0 to bit2 = data(21) to data(23); bit3 to bit5 = GND; bit6 = L2 Endmark; bit7 = GND
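The four-transfer encoding of one 24-bit muon word can be written out as a short Python sketch. The bit assignments follow the transfer list above (VCC as 1, GND as 0); the function names are illustrative.

```python
def encode_muon_word(data24, bunch_zero=False, l2_endmark=False):
    """Split a 24-bit word into the four 8-bit transfers A, B, C, D."""
    a = (data24 & 0x7F) | 0x80                                 # data(6:0), bit7 = VCC
    b = ((data24 >> 7) & 0x7F) | 0x80                          # data(13:7), bit7 = VCC
    c = ((data24 >> 14) & 0x7F) | (0x80 if bunch_zero else 0)  # data(20:14), bit7 = B0 marker
    d = ((data24 >> 21) & 0x07) | (0x40 if l2_endmark else 0)  # data(23:21), bit6 = L2 Endmark
    return [a, b, c, d]


def decode_muon_word(transfers):
    """Rebuild (data24, bunch_zero, l2_endmark) from transfers A..D."""
    a, b, c, d = transfers
    data24 = (a & 0x7F) | ((b & 0x7F) << 7) | ((c & 0x7F) << 14) | ((d & 0x07) << 21)
    return data24, bool(c & 0x80), bool(d & 0x40)


if __name__ == "__main__":
    print([hex(t) for t in encode_muon_word(0xABCDEF, l2_endmark=True)])
```

Since bit7 of transfers A and B is always high, a receiver model could also use those bits as a cheap sanity check on word alignment.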
Information from Eric James about muon input:
Each matchbox card sends up to 32 24-bit words for each fiber. The transfer time for each word is 132 ns. Each 24-bit word is encoded into a 32-bit transfer over hotlink and comes in as four groups of 8-bit words.
There is a register on the Matchbox card which gives one the ability to send zero, ten, twenty, or thirty words to L2. This feature was included in case we needed to complete our data transfer within a given time window to make the system work. The central trigger primitives are sent in the first ten words, the forward trigger primitives are sent in the next ten words, and L1 trigger decision data is sent in the last ten words. If one looks at the table in section 29.5.1 of CDF4152, the words which get sent to L2 begin with the High Pt CMU East bits (P0+3) and end with the IMB Diagnostic bits (P0+32). The output ordering of the words is the same as that shown in the table. The pre-match connections work in exactly the same way. There are only 16 24-bit words output to L2 from each pre-match card. From the table in section 29.5.2 of CDF4152, the first word which gets sent is the CMP primitives for stacks 00-23 (P0+2). The last word sent is CMP/CSP west matches for stacks 72-95 (P0+17). The ordering is the same as in the table. There are also register bits on the Pre-Match to control the number of words being sent. For this card one would transfer either zero, eight, or sixteen words.
Muon data as an example. Each word is 24 bits (sent as 4 8-bit words over hotlink).
CDF Muon bank data format
To source data for each individual data path may not be hard to do, but to drive the full system and test the system robustness and rate capability… needs more thinking.
To keep things simple, it will involve TS, and most likely will take system time if we need TS to use L2 decisions (can be done when there is no beam).
The calibration L1A strobe is only good for the initial tests; the real test requires L1A patterns close to the real situation. Can TS generate these patterns?
From test stand tools to a possible upgrade path
Only need a few minor modifications:
(1) Add P3 connector for SLINK IO
(2) Make L1 and SVT/XTRP inputs visible to all 3 FPGAs
Note: the input mezzanine card connector is already compatible with SLINK cards.
Since the modifications are simple at the hardware level, it doesn’t hurt to add them in, to make the board more general purpose. They provide the interface to a PC, which could be very useful for test tools as well.
Level2_Pulsar baseline design (as a tester only)
[Diagram: 9U VME board (VME FPGA not shown) with 3 Altera APEX 20K400 FPGAs (two Optical IO, one Control), mezzanine card connectors, and TRK, L1, TS and CDFctrl connections.]
Pulsar: Pulser and Recorder (as Level 2 test stand tools)
Level2_Pulsar design (enhanced version)
From test stand tool to a more general purpose board: only need a few minor changes
[Diagram: 9U VME board (VME FPGA not shown) with P1, P2 (CDFctrl) and P3 (SLINK signal lines), 3 Altera APEX 20K400 FPGAs (two Optical IO, one Control), mezzanine card connectors, and TRK, L1 and TS connections.]
Also make the L1 and SVT/TRK inputs available to all FPGAs.
PULSAR design
[Diagram: component side and double-width front panel, showing the mezzanine card, the two IO FPGAs and the control (C) FPGA, the Hotlink/Taxi/SLINK etc. connectors, the TS, L1 and SVT connectors, and an SLINK LSC link to/from a PC.]
Other connectors (L1 out, TS in) will stay inside the board and are only used in pulser mode. The mezzanine card connectors can be used either for optical I/O or for SLINK cards.
The transition module is very simple (just a few SLINK CMC connectors). We will make our own (it is not commercially available). It uses a P2-type connector for P3.
Loaded with SLINK mezzanine cards.
Can simply use the CDF CAL backplane.
CERN sent us two transition modules.
Examples of SLINK products (we ordered them and have most of them)
LSC (Link Source Card), LDC (Link Destination Card): mezzanine cards which can plug onto a motherboard via a CMC (Common Mezzanine Card) connector (just like PMC).
PCI to SLINK, SLINK to PCI
Proven technology: it has been used by a few experiments to take hundreds of TB of data without problems in the past few years.
ATLAS SLINK data format
CDF SLINK format will look very similar….
Can follow the TL2D bank format for each subsystem. Will need to define the format for each subsystem soon.
Inputs welcome.
Output from each Pulsar interface board will be in SLINK format
Design philosophy: Simplicity/Uniformity & Testability. ONE for ALL and ALL for ONE design.
Level2_PULSAR stands for: Level2_Processor Controller (3), pre-processor/merger (2), pULSer (1) And Readout (2)
(1) as pulser;
(2) as pre-processor/merger & readout;
(3) as Processor Controller (with SLINK to PCI).
Backward compatible (with existing system).
Based on proven modern technology: CERN SLINK products (use commercially available products as much as possible).
What does Level 2 really do?
• Create all the trigger objects needed, then
• Count objects above thresholds, or
• Cut on kinematic quantities
[Diagram: inputs (L1, trk, svt, clist, Iso, reces, mu, SumEt, MEt) are combined into trigger objects: tracks, jets, electrons, photons, muons, taus, tags, MEt, SumEt.]
From Henry Fish’s comments on L2 upgrade
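New System
[Diagram: Reces pre-processors (x 3, 16 x 3 fibers), a cluster pre-processor (cluster 6 fibers, Iso 7 fibers) and a muon pre-processor (16 fibers), with L1 bits and XTRP inputs, send Reces/trk, Cluster/trk and Muon/trk data over SLINK to the Global Processor Controller. The Global Processor Controller also takes L1, SVT and SumET, MET (from an L1-type cable, via another mezzanine card), connects to the CPU through SLINK to PCI and PCI to SLINK, and sends the L2 decision to TS.]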
New system configuration
[Diagram: 3 Reces pre-processors + 1 merger (16 x 3 taxi fibers); cluster pre-processor (6 hotlink + LVDS, 7 taxi fibers); muon pre-processor (12 hotlink matchbox, 4 hotlink prematchbox); XTRP and L1 inputs; Reces/Trk, Cluster/trk and Muon/trk sent over SLINK to the processor controller together with L1, SVT and SumEt, MET; CPU and TS attached. Red: FPGAs requiring the largest internal buffering capability.]
New L2 Decision crate
[Diagram: crate with ROC, TRACER, MUON, CLUSTER, RECES x 3, MERGER and PROCESSOR boards; Muon/trk, Cluster/trk, Reces/trk and SumEt, MEt feed the processor, which connects to TS and, via SLINK to PCI and PCI to SLINK, to the CPU and memory of a GHz PC or VME processor.]
All Pulsar boards take two slots (due to mezzanine cards). Total: 7 Pulsar boards = 14 slots.
Baseline design: use pre-processors to simply suppress/organize data, and use the processor controller to simply pass data to the CPU via SLINK to PCI and also handshake with TS. All trigger algorithms will be handled by the CPU.
[Diagram: the same New L2 Decision crate (ROC, TRACER, MUON, CLUSTER, RECES x 3, MERGER, PROCESSOR; Muon/trk, Cluster/trk, Reces/trk), here with a PreFred input, TS, and SLINK to PCI / PCI to SLINK links to the CPU and memory of a GHz PC or VME processor.]
We use SLINK products to:
(1) send data from the processor controller to the CPU;
(2) receive L2 decision data from the CPU to Pulsar;
(3) link between Pulsar boards.
Another possible system configuration (to save money) (3 Pulsar boards, use 1 MMB and 4 Reces)
[Diagram: the existing Reces boards (48 taxi fibers) sit on a short Mbus and are read by the MMB, which outputs Reces data in SVT or SLINK format; cluster (6 hotlink + LVDS, 7 taxi fibers) and muon (12 hotlink matchbox, 4 hotlink prematchbox) Pulsar boards with XTRP and L1 inputs; Reces/Trk, Cluster/trk and Muon/trk go over SLINK to the processor controller together with L1, SVT and SumEt, MET; CPU and TS attached. Red: FPGAs requiring the largest internal buffering capability.]
Initial estimate of FPGA RAM capacity requirements (worst case):
[Diagram: data path from upstream data to downstream data through a FIFO and a DAQ buffer, with a test RAM on each side.]
Each FPGA needs FIFO-like storage for up to 4 events at the input stage, and DAQ buffers for 4 events (not including the test RAMs at input and output).
Optical IO FPGA (worst case is muon input): (1.3 KB/evt * 4) * 2 = 10.4 KB. The EP20K400 has 26 KB maximum RAM capacity, while the EP20K200 has 13 KB.
Merger FPGA (worst case is in processor controller mode): (1.8 KB/evt * 4) * 2 = 14.4 KB (about 15 KB); the EP20K400 should be big enough.
Assuming no data suppression (for the worst case).
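The estimate is a simple product, reproduced here as a Python sketch so other event sizes can be tried. The event sizes and device capacities are the ones quoted on the slide; the function name and the "stages" parameter (input FIFO storage plus DAQ buffers) are naming assumptions.

```python
EP20K400_KB = 26   # maximum RAM capacity quoted on the slide
EP20K200_KB = 13


def ram_needed_kb(event_size_kb, events=4, stages=2):
    """Worst-case RAM: `events` events held at each of `stages` stages."""
    return event_size_kb * events * stages


if __name__ == "__main__":
    cases = (("Optical IO FPGA (muon input)", 1.3),
             ("Merger FPGA (processor controller mode)", 1.8))
    for name, size_kb in cases:
        need = ram_needed_kb(size_kb)
        print(f"{name}: {need:.1f} KB, "
              f"fits EP20K200: {need <= EP20K200_KB}, "
              f"fits EP20K400: {need <= EP20K400_KB}")
```

With these numbers the Optical IO case would fit in either device, while the Merger case exceeds the 13 KB of the EP20K200 and needs the EP20K400.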
Pulsar firmware design initial considerations
• Two different types of FPGA design (& in different modes): Optical IO FPGA (x2) and Merger FPGA;
• need an overall design to optimize the firmware structure from the start, as many parts can be reused for all three types of FPGAs;
• in general, firmware will be coded in VHDL;
• ideally we need to implement all functionalities and fully simulate everything. But since we would like to have the prototype early (in tester mode), we need to define a set of functionalities to be implemented and fully simulated before the prototype design is finalized;
• one important issue is to find out the requirements for each type of FPGA and the IO pin needs (package) through firmware simulation, to optimize the design (cost vs performance).
Overall Design for all FPGA types:
[Diagram: upstream and downstream RAMs (play & record) around a formatter/filter algorithm block, with DAQ readout and VME access.]
Need to identify what is common to each FPGA type; these parts will be in a library (such as SLINK formatter, SRAM controller, RAM play & record, DAQ, VME interface etc.). Some initial considerations for each FPGA type follow later…
Optical IO FPGA
[Diagram: blocks: mezzanine card interface (upstream), RAM controllers with play/record on the upstream and downstream sides, formatter/filter, DAQ buffers, SRAM controller, VME IO and CDF Ctrl interface.]
Blocks in green are common to all FPGA types; blocks in white are specific to this type of FPGA.
Mezzanine card identification at Power Up --- has to be robust and “professor proof”
Feature: a dedicated unit identifies the mezzanine card plugged into the motherboard at power up. At power up, all FPGAs will be configured, but the inputs will be “disconnected” by default. The mezzanine card identification unit will first identify the mezzanine card type; if it is the wrong type for this FPGA firmware, it signals an error (error register & LED). If the right type is identified, the gate will be closed….
Mezzanine cards have both input and output types.
[Diagram: the Mezzanine Card ID goes to the mezzanine card identification unit, which controls a gate in front of the mezzanine card interface.]
Merger FPGA
[Diagram: blocks: SLINK in (upstream), RAM controller with play/record, Merger, RAM controller (SLINK spy), PCI/SLINK out, DAQ buffers, SRAM controller, VME interface and CDF Ctrl interface.]
Blocks in green are common to all FPGA types; blocks in white are specific to this type of FPGA.
Optical IO FPGA firmware for muon case
[Diagram: mezzanine card with one FIFO per fiber; fibers A to D feed the motherboard Optical IO FPGA.]
Each matchbox card sends up to 32 24-bit words for each fiber. The transfer time for each word is 132 ns. Each 24-bit word is encoded into a 32-bit transfer over hotlink and comes in as four groups of 8-bit words.
8 bits each; the total encoded useful data is 24 bits @132 ns.
An 8-bit data stream will be pushed into a FIFO for each fiber, then clocked into the motherboard Optical IO FPGA (pushed or pulled). The FIFO read clock can be faster than the write clock (depending on how fast one can clock data over the CMC connectors). To be conservative, assume for now that the read and write clocks are about the same.
Motherboard Optical IO FPGA (only 4 channels shown)
[Diagram: four 8-bit @33 ns inputs (0-30, 30-60, 60-90 and 90-120 deg) are each assembled 8->32 into a 32-bit FIFO @132 ns (30 x 4 deep). An L1A (x4) queue with L1A buffer(2), a 64-bit (x4) FIFO @132 ns for L1 bits and a 24-bit FIFO @132 ns for XTRP data feed the muon filter algorithm, which pulls one event's worth of data at a time, checks data consistency, filters data based on L1 bits and XTRP information, and does muon-track matching etc. With an SRAM attached, the output goes through the SLINK formatter (32-bit @40 MHz, 32 bits data + ctrl bits) to the Merger FPGA.]
Muon data as an example. Each word is 24 bits (sent as 4 8-bit words over hotlink).
Can pack them into 32-bit words (SLINK is 32 bits): 4 24-bit words can be packed into 3 32-bit words.
Or can pack them in 24 bits exactly as in the muon bank format, and use the other 8 bits to stamp other information …
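The first option, packing four 24-bit words into three 32-bit words, can be sketched in Python as follows. The packing order (first 24-bit word in the most significant bits) and the function names are assumptions; the slide only states that the 4-to-3 packing is possible.

```python
def pack_24_to_32(words24):
    """Pack groups of four 24-bit words into three 32-bit words each."""
    if len(words24) % 4:
        raise ValueError("need a multiple of four 24-bit words")
    packed = []
    for i in range(0, len(words24), 4):
        block = 0
        for word in words24[i:i + 4]:
            block = (block << 24) | (word & 0xFFFFFF)   # build a 96-bit block
        packed.extend((block >> shift) & 0xFFFFFFFF for shift in (64, 32, 0))
    return packed


def unpack_32_to_24(words32):
    """Inverse of pack_24_to_32."""
    if len(words32) % 3:
        raise ValueError("need a multiple of three 32-bit words")
    unpacked = []
    for i in range(0, len(words32), 3):
        block = 0
        for word in words32[i:i + 3]:
            block = (block << 32) | (word & 0xFFFFFFFF)
        unpacked.extend((block >> shift) & 0xFFFFFF for shift in (72, 48, 24, 0))
    return unpacked
```

The dense packing saves a quarter of the SLINK bandwidth, at the cost of losing the spare 8 bits per word that the second option would use for stamping extra information.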
CDF Muon bank data format
Motherboard Merger FPGA (muon merger)
[Diagram: 36-bit @40 MHz inputs from the Optical IO FPGAs (0-120, 120-240 and 240-360 deg) enter 36-bit FIFOs @40 MHz. An L1A (x4) queue with L1A buffer(2), a 64-bit (x4) FIFO @132 ns for L1 bits and a 24-bit FIFO @132 ns for XTRP data feed the muon merger algorithm, which pulls one event's worth of data at a time, checks data consistency, and merges and stamps data. With an SRAM attached, the output goes through the SLINK formatter (32-bit @40 MHz, 32 bits data + 4 bits ctrl) to the Control FPGA.]
Optical IO FPGA firmware for CLIST case
Inputs: 6 fibers (A to F) + LVDS
8 bits each @50 ns; one cluster is encoded in 6 8-bit words in all 6 fibers.
An 8-bit data stream will be pushed into a FIFO for each fiber. The end-of-event marker comes from the LVDS connector.
[Diagram: FIFOs on the mezzanine cards, one per fiber, plus the LVDS input.]
Motherboard Optical IO FPGA (CLIST case)
[Diagram: 8-bit @50 ns inputs are each assembled 8->48 into 48-bit FIFOs @300 ns, and the LVDS input is assembled 4->16 into an 8-bit FIFO @300 ns. An L1A (x4) queue with L1A buffer(2), a 64-bit (x4) FIFO @300 ns for L1 bits and a 24-bit FIFO @300 ns for XTRP data feed the cluster formatter algorithm, which pulls one event's worth of data at a time, sums all info for each cluster, checks the end of event, checks data consistency, stamps data based on L1 bits and XTRP information, and possibly does cluster-track matching (electron ID). With an SRAM attached, the output goes through the SLINK formatter (32-bit @40 MHz, 32 bits data + 4 bits ctrl) to the Merger FPGA.]
Baseline design may only use one Optical IO FPGA to handle cluster data.
Motherboard Merger FPGA (cluster merger)
[Diagram: the cluster input and the Iso input from the Optical IO FPGAs (36 bits @40 MHz) enter 36-bit FIFOs @40 MHz. An L1A (x4) queue with L1A buffer(2), a 64-bit (x4) FIFO @132 ns for L1 bits and a 24-bit FIFO @132 ns for XTRP data feed the cluster merger algorithm, which pulls one event's worth of data at a time, checks data consistency, and merges and stamps data for both cluster and Iso info. With an SRAM attached, the output goes through the SLINK formatter (32-bit @40 MHz, 32 bits data + 4 bits ctrl) to the Control FPGA.]
Pulsar initial layout in progress …
Plan for board level simulation
• Implement the Tx case for muon first, with each Optical IO FPGA driving 8 fibers. This is an ongoing effort. VME write/read to the 36-bit RAM is already working. Peter is working on the state machine.
• For the Rx case, will implement it only as a data recorder: send the data to a PC and also make the data available to VME access. Will start with the simplest case: Pulsar takes 4 SLINK inputs and the L1/SVT inputs, merges them and sends them out on P3 in SLINK format. Next will modify the code so that Pulsar receives hotlink fiber data (16 fibers) and sends it onto P3 in SLINK format.
• Will decide exactly what is to be implemented first. The goal is to get to board-level simulation quickly, to check all the data inputs/outputs.
Motherboard FPGA (for all of them), block diagram:
• Inputs: 36-bit FIFOs @ 40 MHz, 36 bits @ 40 MHz
• L1A (x4) queue; L1 bits: 64-bit (x4) FIFO @ 132 ns; SVT data: 24-bit FIFO @ 132 ns
• Algorithm (pulls one event's worth of data at a time): checks data consistency, merges and stamps data
• SRAM
• SLINK formatter, 32-bit @ 40 MHz (32 bits data + 4 bits ctrl), with L1A buffer (2)
Initial Pulsar Board (Rx) Level simulation (I)
(VME FPGA not shown)
Diagram labels: VME, TRK, Merger, TS, L1, Optical IO, CDF ctrl, Mezz card connectors. Links shown: SLINK (x3).
All three FPGAs will have the same firmware. Red lines are the ones to be simulated.
Initial Pulsar Board (Rx) Level simulation (II)
(VME FPGA not shown)
Diagram labels: TRK, Merger, TS, L1, Optical IO, CDF ctrl, Mezz card connectors. Links shown: hotlink, SLINK (x2).
All three FPGAs will have the same firmware. Red lines are the ones to be simulated.
Initial Pulsar Board (Tx) Level simulation (I)
(VME FPGA not shown)
Diagram labels: TRK, Merger, TS, L1, Optical IO, CDF ctrl, Mezz card connectors. Links shown: hotlink.
All three FPGAs will have the same firmware. Red lines are the ones to be simulated.
Initial mezzanine cards board level simulation (Transmitter, Receiver)
(1) 4 fiber case first
(2) 2 fiber + LVDS case
This setup can test everything except CMC connectors
XTRP lookup table (for both muon and Cal):
XFT divides the COT into 288 segments (1.25 degree each). A wedge spanning 15 degrees has 12 segments; each segment has a unique LUT within a wedge.
15 bits address, 18 bits output
Address bits:
0-1: phase (columns as labelled on the slide: CM, IM)
   00: CMU high Pt / CAL
   01: CMU low Pt / crack
   10: CMX high Pt / IMU
   11: CMX low Pt / IMU
2-8: 96 XFT signed-Pt bins
9-11: local phi within segment
12: passed superlayer 8?
13: isolation (other tracks nearby?)
14: reserved for future use
For CMU, CMX and IMU, the 18 output bits represent the 18 trigger towers in three wedges (one + two neighbor wedges).
When even a part of a tower is within the muon footprint and satisfies the Pt threshold, the bit corresponding to the tower is set to 1.
An example of the LUT for CMU for an isolated track with Pt = -6.19 GeV/c, local phi = 1.02 degree, that passed superlayer 8 of the COT:
Bit address        output content
001100011111100    000000000000000000
001100011111101    000000000000100000
001100011111110    000000000000000000
001100011111111    000000000001110000
Address fields annotated on the slide: phase (CMU high Pt), Pt bin for -6.19 GeV/c, local phi, passed SL8, isolation.
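The address layout above can be decoded with a few shifts and masks. The Python sketch below assumes that the address strings in the example are written with bit 14 on the left and bit 0 on the right, and that output bit 0 is the rightmost character; the slide does not state either convention, and the names are illustrative only.

```python
from typing import NamedTuple

N_SEGMENTS = 288                     # XFT segments around the COT (from the slide)
SEGMENT_DEG = 360.0 / N_SEGMENTS     # 1.25 degrees per segment
SEGMENTS_PER_WEDGE = 15.0 / SEGMENT_DEG  # 12 segments in a 15-degree wedge


class XtrpAddress(NamedTuple):
    phase: int          # bits 0-1
    pt_bin: int         # bits 2-8, one of the 96 signed-Pt bins
    local_phi: int      # bits 9-11, local phi within the segment
    passed_sl8: bool    # bit 12
    isolation_bit: bool # bit 13
    reserved: int       # bit 14


def decode_address(addr):
    """Split a 15-bit LUT address into its fields (bit 0 = LSB, assumed)."""
    if not 0 <= addr < (1 << 15):
        raise ValueError("address must fit in 15 bits")
    return XtrpAddress(
        phase=addr & 0x3,
        pt_bin=(addr >> 2) & 0x7F,
        local_phi=(addr >> 9) & 0x7,
        passed_sl8=bool((addr >> 12) & 1),
        isolation_bit=bool((addr >> 13) & 1),
        reserved=(addr >> 14) & 1,
    )


def towers_hit(output):
    """Return the indices of the set bits in the 18-bit LUT output."""
    if not 0 <= output < (1 << 18):
        raise ValueError("output must fit in 18 bits")
    return [i for i in range(18) if (output >> i) & 1]
```

Under these assumptions the four example addresses differ only in the phase bits; for instance `decode_address(int("001100011111100", 2))` gives phase 0 with bit 12 (passed SL8) set and bit 13 clear.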
New version: the pre-production board is working. CERN is willing to send us one pre-production board next month.
Will use the simple SLINK-to-PCI version (proven technology, 120 MB/s) first for R&D and for the low-luminosity run. In the future we can use the new, faster version (260 MB/s) and PC.
Data drain test board
Data source test board
We ordered them and have been playing with them (ideal for initial Pulsar prototype testing).
Mezzanine card prototype/production test plan I (use the working teststand setup at UC):
Diagram labels: Pattern Generator, Transmitter, Receiver, HP LA
(1) Use PG + LA;
(2) Use FPGA internal RAM + LA;
(3) Use BIST + LA (hotlink): run for a long time and set a limit on the bit-error rate to test the robustness of the design.
This setup can test everything except CMC connectors
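For the BIST run in item (3), the limit that an error-free run sets on the bit-error rate can be estimated as below. This is a generic statistical sketch in Python, not part of the test plan: it uses the Poisson approximation for zero observed errors, and the link rate is a parameter the caller must supply (no hotlink rate is assumed here).

```python
import math


def ber_upper_limit(bits_sent, confidence=0.95):
    """Upper limit on the bit-error rate after an error-free run.

    With zero errors seen in N bits, BER < -ln(1 - CL) / N at
    confidence level CL (about 3/N at 95%).
    """
    if bits_sent <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("need bits_sent > 0 and 0 < confidence < 1")
    return -math.log(1.0 - confidence) / bits_sent


def bits_needed(target_ber, confidence=0.95):
    """Error-free bits required to claim BER < target_ber."""
    if target_ber <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("need target_ber > 0 and 0 < confidence < 1")
    return -math.log(1.0 - confidence) / target_ber


def run_time_s(bits, link_rate_bps):
    """Run time in seconds to send `bits` at `link_rate_bps` (caller-supplied)."""
    if link_rate_bps <= 0:
        raise ValueError("link rate must be positive")
    return bits / link_rate_bps
```

For example, `bits_needed(1e-12)` is about 3e12 bits, so the length of the "long time" run follows directly from the limit one wants to quote and the link rate.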
Prototype/production test plan II: use one Pulsar prototype board
Diagram labels: Tx, Rx, I/O, I/O
This would allow FULL tests (including the CMC connectors).
Mezzanine card mass production can start ONLY AFTER the prototypes are tested with the Pulsar prototype.
Note: the Pulsar prototype will be tested with SLINK test tools first.
Initial test plan for Pulsar board prototype
The Pulsar board prototype can first be tested with SLINK test tools, then with a PC, then with custom mezzanine cards.
Use a SLINK source card to send data; use a SLINK data sink to check data.
Diagram labels: PULSAR, M, IO, M, IO, Slink out, CPU
L2 crate inputs
Diagram labels: Reces x 4, L1, XTRP, SVT, CLIST, ISO, MUON, Alphas x 4, Magic bus
Cables shown: one SVT cable each; 6 fibers (hotlink) + 1 LVDS cable; 7 fibers (Taxi); 16 fibers (hotlink); 12 fibers (Taxi); 1 L1 cable
Asked 2 questions a while ago (the answer is in the Level2_Pulsar design):
(1) Can one design a universal test (pulser) board? (testability)
(2) Can one design a universal interface board? (uniformity)
The difficulty we have had with L2 is that the system was not designed for simplicity/uniformity and testability. Of course, simplicity/uniformity is something easy to say but hard to do...
each board is designed differently by different groups,