Pulsar Functional requirements


Ted Liu

Level 2 group meeting, Feb. 22nd, 2002

Part I: Pulsar as a Pulser and Recorder (test stand tools)

Part I outline: Pulser mode

• The need for L2 teststand and goals
• pulser board design concept
• what should the pulser board be able to do?
• L2 input data for each subsystem
• firmware design considerations for pulser mode

“In the longer term, the test-stand system should be aggressively pursued. This will allow completion of the development effort and longer-term maintenance of the full system.” As recommended by the committee, we will schedule a workshop for the L2 group to discuss the specifications for this system…

“The committee thinks that the test station system, combined with the 2nd test crate sounds like the perfect way to provide various types of simulated event data (different luminosity, trigger types, suspected failure modes etc.). Providing that this effort won't impact any activities needed to make the baseline system work, it should be strongly supported, and prototyping and testing work should go ahead at full speed.”

L2 Review committee Recommendation (Dec. 14th, 2001):

“Hold a design workshop by the rest of the Level 2 groups in order to ensure that what is built is safe to use and is capable of exercising all the important parts of the system in a realistic fashion”.

How could you use expanded teststand capabilities? (from this meeting agenda)
• If we have a teststand with a general input pulser, how would you want to use it?
 --- debugging broken/spare boards;
 --- testing firmware modifications;
 --- ???
• What type of tests would you want to run?
 --- are fixed patterns with fixed timing good enough?
 --- data from real events?
 --- test multiple boards and check for interference?
 --- randomly timed L1A patterns?
 --- random/user controlled latency of input data?
 --- L1As etc. driven in a deterministic way? (TESTCLK)
 --- L1As etc. driven by TS?
 --- ???
• What kind of software tools will you need for your testing?

Basic requirement for a L2 trigger data source board:

L1A for buffer n

Data block

latency

To L2 interfaceBoard input

FIFO

RAM

Test pattern

(1) upon L1A for buffer n, start a counter for buffer n;

(2) At the same time clock data from RAM into the FIFO,

(3) once the counter reaches the latency threshold, clock the data out from the FIFO at the speed which matches that of the subsystem… the actual latency is controlled by when the data is clocked out of the FIFO.
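The three steps above can be sketched as a small timing model. This is only an illustrative sketch, not the board firmware: the function name `pulse_event`, the use of abstract clock ticks, and the assumption that one RAM word is copied into the FIFO per tick are all hypothetical.

```python
from collections import deque

def pulse_event(ram_words, latency_ticks, out_period=1):
    """Model one L1A: copy RAM to a FIFO, release the data after the latency.

    Returns a list of (tick, word) pairs giving when each word leaves the FIFO.
    Assumption: one RAM word is copied per tick, starting at tick 0 (the L1A).
    """
    fifo = deque()
    to_copy = list(ram_words)
    sent = []
    counter = 0                 # step (1): counter started by the L1A
    next_out = latency_ticks    # earliest tick at which output may start
    while to_copy or fifo:
        if to_copy:             # step (2): clock data from RAM into the FIFO
            fifo.append(to_copy.pop(0))
        # step (3): after the latency threshold, clock data out at the
        # subsystem speed (one word every out_period ticks)
        if counter >= next_out and fifo:
            sent.append((counter, fifo.popleft()))
            next_out = counter + out_period
        counter += 1
    return sent
```

For example, `pulse_event([10, 20, 30], latency_ticks=5, out_period=2)` releases the first word at tick 5 and the following words every 2 ticks.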

This is an oversimplified picture. Each subsystem is somewhat different, and designing a universal test board is not all that easy…

Reces x 4L1

XTRP

SVT

CLIST

ISO

MUON

AlphasX 4

One SVT Cable each

6 fiber (hotlink)1 LVDS cable 7 fibers (Taxi) 16 fibers (hotlink)

12 fibers(Taxi)L1 cables

Magic bus

L2 crate inputs

Can one design a universal L2 test (pulser) board? -- to enhance the testability of L2

“provide various types of simulated event data (different luminosity, trigger types, suspected failure modes etc.)”


• After L1A, data arrives at each interface board with a different latency (L1 within 132 ns, XTRP within ~1 us, then the rest; SVT takes 10 us or so, the longest);
• most boards (L1, XTRP, CLIST, IsoList and SVT) will then request the bus and send their data to the alpha over magicbus;
• the alpha will process the data; if it needs muon and Reces data (for some events), it will then get the data via programmed I/O over magicbus from the muon and Reces boards;
• once the decision is made, the alpha will handshake with the Trigger Supervisor (via the L2-TS cable).


Level 2 decision crate

TRACER

MVME

Fixed or variabledata length?

SVT XTRP L1 CLIST ISO Muon

Data with Buffer#?

Incoming dataClock rate

data size range

Latency range*

EOE with data? (or from separate path?)

B0 marker?

Data gap withinone event?

Reces

30 MHz 7.6 MHz 7.6 MHz 20 MHz 12 MHz 30 MHz cdfclk x 4

Interface hardware SVT cable SVT cable L1 cable Hotlink+fiber Taxi+fiber Hotlink+fiber

7.6 MHz cdfclk

Taxi+fiber

96 bits/evt

fixed fixed

no

yes yes - no no yes -

yes

Level 2 trigger input data paths

Flow control ?

150bits/trk 21 bits/trk 46bits/clu 145bits/clu 11Kbits/evt 1.5Kb/evt

~ 6 us

yesyes

~132 ns~1us - 10us

variable

yes

BC#

yes

not used

~10-100us

variable

yes

BC#

yes

Not used

yes

no

no

no

~1-20us

variable

yes

no

no

no

variable

no

fixed

no no

no

nono

~1-5 us~few us

* Latency range also depends on L1A history …

no

yes

Design issues for a universal test board:

Hardware requirement is clear:

• have all hardware interfaces for all data paths;

Firmware requirements need more thinking:

• variable data size for some subsystems;
• variable latency (from event to event);
• correct buffer number in the data for a given L1A;
• gaps for certain data paths;
• record real data and reproduce it in the test stand;
• response to HRR etc.
• ….???

αSVT

XTRP

CLIST

ISO

L2 decision crate

L1

Reces

Pulsar

Hotlink IO

Taxi IO

SVT/XTRP

L1TS

CDFctrl

VME

MUON

Pulsar is designed to have all the data interfaces that the Level 2 decision crate has. It is a data source for all trigger inputs to the Level 2 decision crate, and it can be used to record data from upstream as well.

PulsAR: Pulser And Recorder

Basic hardware requirement: have all hardware interfaces

αSVT

XTRP

CLIST

ISO

L2 decision crate

L1

RecesHP scope

LogicalAnalyzer

HP scope

What are the test tools ? -> two types of board:

L2 inputs

TS

MMB

PULSAR

PULSAR

PULSAR

PULSAR

PULSAR

MMBMagic

MysteryBoard

SVTformat

Magicbus

CDFctrl

Pulsar

Hotlink IO

Taxi IO

SVT/XTRP

L1TS

CDFctrl

VME

MUON

L2 test crate

TRK

Control

TS

L1

Optical IO

TRK

Mezz cardconnectors

Level2_Pulsar design as test stand tool

9U VME(VME FPGA not shown)

CDFctrl

3 Altera APEX 20K400 FPGAs

OpticalIO

Pulsar: Pulser and Recorder (as Level 2 test stand tools)

MezzanineCard

PULSAR design as test tool only

IO

Ctrl

IO

Hotlink/Taxi

TS

L1

SVTSVT

L1

Front-panel(double width)

component side
Other connectors (1 L1, 1 TS) will stay inside the board. The mezzanine card connectors are used for optical I/O (hotlink and taxi).

FIFO

FIFO

FIFO

FIFO

Custom Mezzanine cards

• Hotlink: Tx and Rx (CLIST, Muon data paths)
• Taxi: Tx and Rx (Iso, Reces data paths)

Altera EP1K30_144 FPGA

CMC connector

J1

J3

Hotlink or Taxi Tx/Rx chips

Hotlink Tx/Rx: CY7B923JC/933JC

Taxi Tx/Rx: AM7968/7969-175JC

Hotlink Optical Tx/Rx: HFBR-1119T/2119T

Taxi Optical Tx/Rx: HFBR-1414T/2416T

Usually has 4 fiber connectors.

Add one LVDS connector for the CLIST case: only two fiber connectors (left side) will be loaded on one mezzanine card, and one LVDS connector will be loaded on the right side instead of two fiber connectors.

Hotlink Rx mezzanine Hotlink Tx mezzanine

Pulsar boards communication lines on P2

Will use SVT-like P2 user-defined signals for inter-board communication:

(1) Pulsar_init* (P2 A1);
(2) Pulsar_error* (P2 A2);
(3) Pulsar_freeze* (P2 A3);
(4) Pulsar_Lostlock* (P2 A4) (for SLINK etc.);
(5) Pulsar_spare* (P2 A5).

Any Pulsar board can drive and listen to these signal lines

Control unit

SRAM

InternalTest

RAM

controller

L1ABuf #

FIFOsL1 dataSVT dataXTRP data

The latency is controlled by when the data is clocked out of the FIFO

If we want to load a large number of events, we will need to use external SRAM.

FIFOFIFO

FIFOFIFO

Optical IO Unit

SRAM

InternalTest

RAM

controller

L1ABuf #

The latency is controlled by when the data is clocked out of the FIFO

hotlink examples:

Muon case (only one mezzanine card shown)

FIFOFIFO

FIFO

Optical IO Unit for CLIST case
F E D C B A

8 bits each @50ns, one cluster is encoded in 6 8-bit words in all 6 fibers

8 bits data streams will be pushed into the FIFOs in the mezzanine card after L1A, later on they will be clocked out onto fibers. The end of event marker comes out via LVDS connector.

FIFOFIFO

FIFOFIFO

LVDS

outputs: 6 fibers + LVDS
Another hotlink example

8 bits wide per fiber

128 words deep

Simple way to load test patterns and send them out (optical paths as an example)

buffer0

buffer3

buffer2

buffer1

ClockedFIFO

Fiber Tx8 1

8

The actual latency will be controlled by when the data is clocked out of the FIFO after L1A (use a register via VME).

This means the latency will be fixed for a given test run. This is not good enough to mimic the real system, as the latency varies from event to event, but it may be good enough for testing spares.

FPGA internal RAM

8 bits data

8 bits data

8 bits data

4 ctrl bits

Another way to load the test pattern memory: use a 36-bit data width. 32 bits will be for 4 fiber outputs (4 x 8), and the highest 4 bits will be used as control bits to mark the content of the data. For each event's worth of data, the first word will be the header, whose 32 data bits contain the latency (and number of words etc.) for this particular event and this particular path. The last word is the trailer, which can contain other info if needed (such as what the L2 decision should be). Either use internal RAM or use an external SRAM with 16-bit address and 36-bit data:

Buffer0 data

Buffer1 data

Buffer2 data

Buffer3 data

36bits

8 bitsdata

The highest two address bits will be controlled by buffer number to divide automatically the memory for 4 buffers
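The address split described above can be illustrated with a short sketch. It assumes the 16-bit address width quoted for the external SRAM; the function name `ram_address` and the name `offset` for the low address bits are hypothetical.

```python
ADDR_BITS = 16                   # external SRAM address width quoted in the text
OFFSET_BITS = ADDR_BITS - 2      # the two highest bits carry the buffer number

def ram_address(buffer_number, offset):
    """Combine an L1A buffer number (0-3) with an offset inside that buffer."""
    if not 0 <= buffer_number < 4:
        raise ValueError("buffer number must be 0..3")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit in the per-buffer region")
    return (buffer_number << OFFSET_BITS) | offset
```

With these assumptions each buffer owns a contiguous quarter of the memory (16384 words), so the state machine only needs a plain address counter for the offset.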

How does it work:
(1) after L1A, read the first word (header) and get the latency, at the same time start a counter;
(2) continue to read out the rest of the data words from the memory and clock them into a FIFO, until the trailer is reached (can get the L2 decision information there);
(3) once the counter reaches the latency threshold, clock the data out from the FIFO at the speed which matches that of the subsystem.
This way the latency for each event and each data path can be individually controlled by the user.
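A minimal sketch of this read-out sequence follows. It assumes the header and trailer are flagged by control bits 35 and 34 of each 36-bit word, and, purely for illustration, that the latency sits in the low 16 bits of the header; the actual header layout is not specified here. The names `read_event` and `output_schedule` are hypothetical.

```python
HEADER_BIT = 1 << 35          # ctrl bit 35 marks the header word
TRAILER_BIT = 1 << 34         # ctrl bit 34 marks the trailer word
DATA_MASK = (1 << 32) - 1     # low 32 bits: 4 fibers x 8 bits

def read_event(memory, start=0):
    """Steps (1) and (2): read header, then copy data words until the trailer."""
    header = memory[start]
    if not header & HEADER_BIT:
        raise ValueError("first word is not a header")
    latency = header & 0xFFFF          # assumed field: latency in clock ticks
    fifo = []
    addr = start + 1
    while not memory[addr] & TRAILER_BIT:
        fifo.append(memory[addr] & DATA_MASK)
        addr += 1
    trailer_info = memory[addr] & DATA_MASK   # e.g. expected L2 decision
    return latency, fifo, trailer_info

def output_schedule(latency, fifo, period=1):
    """Step (3): words leave the FIFO once the counter reaches the latency."""
    return [(latency + i * period, word) for i, word in enumerate(fifo)]
```

A pattern memory without a trailer word would run off the end of the list in this sketch (raising `IndexError`); real firmware would need a word-count limit instead.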

Buffer 0 data memory

header

trailer

Latency for this event, and other info

data data data data

Other information (what L2 decision should be etc)

One could have more control by inserting gaps in between data words etc. using the 4 control bits, to better mimic the real situation for certain data paths.

This approach seems to be quite flexible.

1st event

Ctrl bit 35: header
Ctrl bit 34: trailer
Ctrl bit 33: gap
Ctrl bit 32: reserved
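The 36-bit word layout can be sketched as a pair of pack/unpack helpers. The control-bit positions are those listed above; the ordering of the four fibers inside the 32 data bits (fiber 0 in bits 7..0) and the function names are assumptions made for illustration.

```python
CTRL_HEADER, CTRL_TRAILER, CTRL_GAP, CTRL_RESERVED = 35, 34, 33, 32

def pack_word(fiber_bytes, header=False, trailer=False, gap=False):
    """Pack four 8-bit fiber values plus control flags into one 36-bit word."""
    if len(fiber_bytes) != 4 or any(not 0 <= b <= 0xFF for b in fiber_bytes):
        raise ValueError("expected four 8-bit values")
    word = 0
    for i, b in enumerate(fiber_bytes):    # assumed order: fiber 0 in bits 7..0
        word |= b << (8 * i)
    word |= int(header) << CTRL_HEADER
    word |= int(trailer) << CTRL_TRAILER
    word |= int(gap) << CTRL_GAP
    return word

def unpack_word(word):
    """Inverse of pack_word: recover the fiber bytes and the control flags."""
    return {
        "fibers": [(word >> (8 * i)) & 0xFF for i in range(4)],
        "header": bool((word >> CTRL_HEADER) & 1),
        "trailer": bool((word >> CTRL_TRAILER) & 1),
        "gap": bool((word >> CTRL_GAP) & 1),
    }
```

A gap word, for instance, would be written as `pack_word([0, 0, 0, 0], gap=True)` so the output logic can hold the fibers idle for that clock.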

Initial thoughts on tester firmware design

FIFO

L1ABuf#

counter0

counter1

counter2

counter3

RAM

VME36bits

addr

State machine

FIFO

FIFO

FIFO

FIFO

data

ctrl ctrl

MezzanineCard side

• Latch L1A + buf#
• read 1st word from RAM
• save latency & compare with counter
• continue reading data from RAM to FIFOs
• until last word
• once counter counts up to latency, enable FIFO output and ctrl signals for Tx chips
• ready for next L1A

Buffer 0 data

Buffer 1 data

Buffer 2 data

Buffer 3 data

Notes:

(1) Counter will be reset by the state machine, to rearm for the next L1A;
(2) if the counter already counts beyond the latency for a given L1A, clock data out right away;
(3) RAM address is controlled by an address counter and the buffer number of the current event;
(4) HRR: reset L1A FIFO and all output FIFOs, wait for B0 marker to pass by before enabling…
(5) To keep things simple, for now just use single event data for a given buffer (to keep RAM address simple);
(6) what’s shown is for the 4 fiber case; need one more internal RAM for 8 fibers… if using external SRAM, need ping-ponging…

null

L1A

header

RAMtoFIFO

Latency

done

output

Comments:
Pro: simple
Con: not so elegant, as the state machine has to finish sending all data out of the FIFO before being able to process the next one. May be ok if run at a higher clock rate. There will be some intrinsic delay between events.

Good starting point, allows us to simulate the board soon.

It would be better to separate the RAM-to-FIFO part from the actual data sending part.
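The state sequence listed for this first implementation (null, L1A, header, RAMtoFIFO, Latency, output, done) can be written down as a transition function. This is a sketch only: the input names (`l1a`, `last_word`, `fifo_empty`) and the exact transition conditions are assumptions, not the actual firmware design.

```python
from enum import Enum, auto

class State(Enum):
    NULL = auto()
    L1A = auto()
    HEADER = auto()
    RAM_TO_FIFO = auto()
    LATENCY = auto()
    OUTPUT = auto()
    DONE = auto()

def next_state(state, *, l1a=False, last_word=False,
               counter=0, latency=0, fifo_empty=False):
    """Return the next state of the single (non-pipelined) state machine."""
    if state is State.NULL:
        return State.L1A if l1a else State.NULL       # wait for an L1A
    if state is State.L1A:
        return State.HEADER                           # latch L1A + buffer number
    if state is State.HEADER:
        return State.RAM_TO_FIFO                      # latency has been saved
    if state is State.RAM_TO_FIFO:
        return State.LATENCY if last_word else State.RAM_TO_FIFO
    if state is State.LATENCY:
        # if the counter is already past the latency, output right away
        return State.OUTPUT if counter >= latency else State.LATENCY
    if state is State.OUTPUT:
        return State.DONE if fifo_empty else State.OUTPUT
    return State.NULL                                 # DONE: rearm for next L1A
```

Because OUTPUT must drain the FIFO before the machine returns to NULL, a second L1A cannot be served in the meantime, which is the "Con" noted in the comments.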

Possible implementation A:

null

L1A

header

RAMtoFIFO

done

Comments:
Pro: more elegant
Con: somewhat more involved.

Implement this later for the real thing.

We decided to go for Implementation A first.

Output controller

FIFO

Data ready to be sent

latency

counterQ

Possible implementation B:

A few comments:

(1) On average, the maximum data size is from muon. Each fiber can send up to 30 x 4 = 120 8-bit words per event (with 16 fibers total). If we use an SRAM with 36 data bits and 16 address bits, we can load up to a few hundred different events for muon for a given test run. We can load many more for other subsystems.
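The "few hundred events" estimate can be checked with a short calculation. The sketch assumes that each 36-bit SRAM word serves 4 fibers, that every event costs one header and one trailer word in addition to its data words, and that the whole address space is available for event storage; the function name is hypothetical.

```python
import math

def events_per_sram(address_bits=16, words_per_fiber=30 * 4,
                    fibers=4, fibers_per_word=4, overhead_words=2):
    """Number of maximum-size muon events that fit in the pattern SRAM."""
    sram_words = 1 << address_bits                       # 65536 words
    groups = math.ceil(fibers / fibers_per_word)         # SRAM words per time step
    words_per_event = words_per_fiber * groups + overhead_words
    return sram_words // words_per_event
```

With these assumptions a 4-fiber path holds 65536 // 122 = 537 maximum-size events, and an 8-fiber path sharing one SRAM holds 65536 // 242 = 270, both consistent with "a few hundred".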

(2) The latency for each event and for each data path (arrival time into L2 decision crate after L1A) can be controlled by user to better mimic the real system;

(3) The long data gaps or long delays for some subsystems can be simulated this way;

(4) latency for individual events?

* estimate based on data size (i.e. more clusters -> longer latency etc.);
* it may be possible to record the real data with Pulsar in recorder mode and time stamp the incoming data during recording. The recorded data can then be saved and later used to reproduce the real data with real timing;
* note: actual latency also depends on the history of L1As.

for example, it may be possible to record XTRP data with timing information: would this be useful?

XTRP input

data recording

RAM

time recording

RAM

Counter for L1A buffer n

Both data and arrival time can be recorded with the data strobe from XTRP. The data latency AND “gap” information can be recorded this way, and can be reproduced in test stand mode. Since Pulsar has both XTRP input and output connectors, spying on the data is possible. For fiber connections, one needs to use fiber splitters to spy on data, or take short test runs.

FIFOFIFO

FIFOFIFO

Optical IO Unit

SRAM

InternalTest

RAM

controller

L1ABuf #

Configure the (S)RAM as a circular buffer for recording (for each L1A); it can be stopped and read out via VME.

Each Optical IO FPGA looks at 8 fiber channels, and the SRAM has 32+4 bits. So ping-ponging is needed for recording (recording is at twice the incoming data rate, 60 MHz).
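The factor of two can be made explicit with a small calculation. The sketch assumes a 30 MHz incoming byte rate per fiber (inferred from the quoted 60 MHz being twice the input rate) and an arbitrary fiber ordering inside each 32-bit word; all function names are hypothetical.

```python
import math

def sram_writes_per_sample(n_fibers=8, bits_per_fiber=8, sram_data_bits=32):
    """How many SRAM writes are needed to store one sample of all fibers."""
    return math.ceil(n_fibers * bits_per_fiber / sram_data_bits)

def sram_write_rate_mhz(input_rate_mhz=30.0, **kwargs):
    """Required SRAM write rate for a given per-fiber input rate."""
    return input_rate_mhz * sram_writes_per_sample(**kwargs)

def split_sample(fiber_bytes):
    """Split one 8-fiber sample into two 32-bit words (fibers 0-3, then 4-7)."""
    if len(fiber_bytes) != 8:
        raise ValueError("expected one byte per fiber for 8 fibers")
    words = []
    for half in (fiber_bytes[:4], fiber_bytes[4:]):
        word = 0
        for i, b in enumerate(half):
            word |= (b & 0xFF) << (8 * i)
        words.append(word)
    return words
```

Eight fibers of 8 bits give 64 bits per sample, which needs two 32-bit writes, hence 2 x 30 MHz = 60 MHz.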

hotlink examples:

Muon case (only one mezzanine card shown)

Pulsar in recorder mode

or

VME

PULSAR

Magic bus

Ideal test stand setup: Alpha + Pulsar + interface board(s) (this setup has been done already at VME speed by Steve, Matt and Peter with SVT Merger board acting as Pulsar)

ALPHA

INTFACE

data input

Data source: Level2_Pulsar
Data sink: Alpha
Possible data patterns:
(1) hand made
(2) derived from MC
(3) derived from data bank
(4) recorded from upstream, catch errors and reproduce them

TRACER

ROC

• can test an individual board;
• can test the full data path;
• can also test multiple boards and check for interference;
• note that with only 2 Pulsar boards one can source data at the same time for L1, XTRP, SVT, CLIST, Iso and muon. One extra Pulsar is needed to drive one Reces;
• …. what else?

PULSAR

Magic bus or TDC backplane

Another possible setup: Pulsar + MMB + Interface board

INTFACE

data input

Data source: Level2_Pulsar
Data sink: MMB+Pulsar/GB

TRACER

ROC

MMB

MagicbusAnalyzer

In case Alpha is not available, it is possible to use the MMB to sink the data, convert it into SVT data format, and then send the data into Pulsar or a GhostBuster board.

or TESTCLK?

Reces x 4L1

XTRP

SVT

CLIST

ISO

MUON

alphas

Magic bus

PULSAR

PULSAR

Possible to drive the full system

TRACER

L2 test crate

PULSAR

5 Pulsar boards needed to drive the full system

16 fibersX 3 =48

16 fibers

PULSAR

with different

MMB

PULSAR

Magicbus watchdog

ROC L1A rate &

Event patterns

train no.   I         II       III          IV        V        VI
-------------------------------------------------------------------------
sig_0       1         em(5)    1            had(5)    1        crate_sel
sig_1       L1AB(0)   em(6)    passbit(0)   had(6)    phi(0)   ntow(0)
sig_2       L1AB(1)   em(7)    passbit(1)   had(7)    phi(1)   ntow(1)
sig_3       em(0)     em(8)    had(0)       had(8)    eta(0)   ntow(2)
sig_4       em(1)     em(9)    had(1)       had(9)    eta(1)   ntow(3)
sig_5       em(2)     em(10)   had(2)       had(10)   eta(2)   ntow(4)
sig_6       em(3)     em(11)   had(3)       had(11)   eta(3)   ntow(5)
sig_7       em(4)     em(12)   had(4)       had(12)   eta(4)   ntow(6)

CLIST cluster information from one LOCOS: six 8-bit words per cluster on one fiber input, arriving 50 ns apart
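The six-word cluster layout in the table above can be exercised with a small encode/decode pair. The sketch assumes that sig_0 is the least significant bit of each 8-bit word and that trains I to VI arrive in that order; the function and field names are hypothetical.

```python
def encode_cluster(em, had, eta, phi, ntow, l1a_buffer, passbits, crate_sel):
    """Build the six 8-bit words (trains I..VI) for one cluster."""
    return [
        ((em & 0x1F) << 3) | ((l1a_buffer & 0x3) << 1) | 1,   # I:   em(4:0), L1AB, 1
        (em >> 5) & 0xFF,                                     # II:  em(12:5)
        ((had & 0x1F) << 3) | ((passbits & 0x3) << 1) | 1,    # III: had(4:0), pass, 1
        (had >> 5) & 0xFF,                                    # IV:  had(12:5)
        ((eta & 0x1F) << 3) | ((phi & 0x3) << 1) | 1,         # V:   eta(4:0), phi, 1
        ((ntow & 0x7F) << 1) | (crate_sel & 0x1),             # VI:  ntow(6:0), crate_sel
    ]

def decode_cluster(words):
    """Recover the cluster fields from six 8-bit words."""
    if len(words) != 6:
        raise ValueError("a cluster is six 8-bit words")
    w1, w2, w3, w4, w5, w6 = words
    return {
        "l1a_buffer": (w1 >> 1) & 0x3,
        "em": ((w1 >> 3) & 0x1F) | (w2 << 5),
        "passbits": (w3 >> 1) & 0x3,
        "had": ((w3 >> 3) & 0x1F) | (w4 << 5),
        "phi": (w5 >> 1) & 0x3,
        "eta": (w5 >> 3) & 0x1F,
        "crate_sel": w6 & 0x1,
        "ntow": (w6 >> 1) & 0x7F,
    }
```

A pulser pattern generator could use `encode_cluster` to fill the test RAM, and a recorder-side checker could use `decode_cluster` to compare what arrives.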

Data format from Monica.

pin 1 BUF_DONE(0)+
pin 2 BUF_DONE(0)-
pin 3 BUF_DONE(1)+
pin 4 BUF_DONE(1)-
pin 5 CRSUM_SEND+ (not received by CLIST)
pin 6 CRSUM_SEND- (not received by CLIST)
pin 7 EVENT_DONE*+
pin 8 EVENT_DONE*-
pin 9 unused
pin 10 unused

1 CLIQUE connection (via 10 pin twisted ribbon cable)

The LVDS signals are driven by a 16.5 nsec clock (the 132 nsec CDF clock period divided by 8).

The time of EVENT_DONE* with respect to the last cluster found in the event is fixed.

8-bit wide cluster data:
Em(4:0), buff(1:0), 1
Em(12:5)
Had(4:0), pass(1:0), 1
Had(12:5)
Eta(4:0), phi(1:0), 1
Ntow(6:0), crate_sel

Evt_done, buff(1:0)

CLIQUE control word

(assume this is the last clusterfor the event )

LVDS output

Transfer A (1st 33ns)
---------------------
bit0 - data(0)
bit1 - data(1)
bit2 - data(2)
bit3 - data(3)
bit4 - data(4)
bit5 - data(5)
bit6 - data(6)
bit7 - VCC

Transfer B (2nd 33ns)
---------------------
bit0 - data(7)
bit1 - data(8)
bit2 - data(9)
bit3 - data(10)
bit4 - data(11)
bit5 - data(12)
bit6 - data(13)
bit7 - VCC

Transfer C (3rd 33ns)
---------------------
bit0 - data(14)
bit1 - data(15)
bit2 - data(16)
bit3 - data(17)
bit4 - data(18)
bit5 - data(19)
bit6 - data(20)
bit7 - Bunch Zero Marker

Transfer D (4th 33ns)
---------------------
bit0 - data(21)
bit1 - data(22)
bit2 - data(23)
bit3 - GND
bit4 - GND
bit5 - GND
bit6 - L2 Endmark
bit7 - GND
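The four-transfer layout above maps one 24-bit word onto four 8-bit transfers. A minimal sketch of that mapping follows, assuming bit0 is the least significant bit of each transfer and data(0) is the least significant bit of the 24-bit word; the function names are hypothetical.

```python
def encode_transfers(data24, bunch_zero=False, l2_endmark=False):
    """Split a 24-bit word into transfers A, B, C, D (8 bits each)."""
    a = (data24 & 0x7F) | 0x80                           # bit7 tied to VCC
    b = ((data24 >> 7) & 0x7F) | 0x80                    # bit7 tied to VCC
    c = ((data24 >> 14) & 0x7F) | (int(bunch_zero) << 7)   # bit7: Bunch Zero Marker
    d = ((data24 >> 21) & 0x07) | (int(l2_endmark) << 6)   # bits 3-5, 7 tied to GND
    return [a, b, c, d]

def decode_transfers(transfers):
    """Rebuild the 24-bit word and the two marker bits from four transfers."""
    a, b, c, d = transfers
    data24 = ((a & 0x7F) | ((b & 0x7F) << 7)
              | ((c & 0x7F) << 14) | ((d & 0x07) << 21))
    return data24, bool(c & 0x80), bool(d & 0x40)
```

The fixed VCC and GND bits give a receiver a cheap consistency check: transfers A and B must always have bit7 set, and transfer D must have bits 3-5 and bit7 clear.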

Information from Eric James about muon input:

Each matchbox card sends up to 32 24-bit words for each fiber. The transfer time for each word is 132 ns. Each 24-bit word is encoded into a 32-bit transfer over hotlink and comes in as four groups of 8-bit words.

There is a register on the Matchbox card which gives one the ability to send zero, ten, twenty, or thirty words to L2. This feature was included in case we needed to complete our data transfer within a given time window to make the system work. The central trigger primitives are sent in the first ten words, the forward trigger primitives are sent in the next ten words, and L1 trigger decision data is sent in the last ten words. If one looks at the table in section 29.5.1 of CDF4152, the words which get sent to L2 begin with the High Pt CMU East bits (P0+3) and end with the IMB Diagnostic bits (P0+32). The output ordering of the words is the same as that shown in the table. The pre-match connections work in exactly the same way. There are only 16 24-bit words output to L2 from each pre-match card. From the table in section 29.5.2 of CDF4152, the first word which gets sent is the CMP primitives for stacks 00-23 (P0+2). The last word sent is CMP/CSP west matches for stacks 72-95 (P0+17). The ordering is the same as in the table. There are also register bits on the Pre-Match to control the number of words being sent. For this card one would transfer either zero, eight, or sixteen words.

Muon data as an example. Each word is 24 bits (sent as 4 8-bit words over hotlink).

CDF Muon bank data format

To source data for each individual data path may not be hard to do, but to drive the full system, test the system robustness and rate capability…need more thinking.

To keep things simple, it will involve TS and most likely will take system time if we need TS to use L2 decisions (can be done when there is no beam);

Calibration L1A strobe is only good for the initial tests; the real test requires L1A patterns close to the real situation. Can TS generate these patterns?

From test stand tools to a possible upgrade path

Only need a few minor modifications:

(1) Add P3 connector for SLINK IO
(2) Make L1 and SVT/XTRP inputs visible to all 3 FPGAs

Note: the input mezzanine card connector is already compatible with SLINK cards

Since the modification is simple at the hardware level, it doesn’t hurt to add them in, to make the board more general purpose. It provides the interface to a PC, which could be very useful as a test tool as well.

TRK

Control

TS

L1

Optical IO

TRK

Mezz cardconnectors

Level2_Pulsar baseline design (as a tester only)


CDFctrl


OpticalIO

Pulsar: Pulser and Recorder (as Level 2 test stand tools)

TRK

Control

TS

L1

Optical IO

TRK

Mezz cardconnectors

Level2_Pulsar design (enhanced version)


P1

P2 CDFctrl


OpticalIO

From test stand tool to a more general purpose board: only need a few minor changes

P3 SLINKsignal lines

Also make the L1 and SVT/TRK inputs available to all FPGAs

MezzanineCard

PULSAR design

IO

C

IO

Hotlink/Taxi/Slink etc

TS

L1

SVTSVT

L1

Front-panel(double width)

component side
Other connectors (L1 out, TS in) will stay inside the board, only used in pulser mode. The mezzanine card connectors can be used either for optical I/O or SLINK cards.

LSCTo/from

PC

SLINK

The transition module is very simple (just a few SLINK CMC connectors).

Will make our own (it is not commercially available). It uses a P2 type connector for P3.

Loaded with SLINK mezzanine cards

Can simply use CDF CAL backplane.

CERN sent us two transition modules

LSC (Link Source Card), LDC (Link Destination Card)

Mezzanine card which can plug onto the motherboard via a CMC (Common Mezzanine Card) connector (just like PMC).

Examples of SLINK products (we ordered them and have most of them)

PCI to SLINK, SLINK to PCI
Proven technology, has been used by a few experiments to take hundreds of TB of data without problems in the past few years

ATLAS SLINK data format

CDF SLINK format will look very similar….

Can follow the TL2D bank format for each subsystem. Will need to define the format for each subsystem soon.

Inputs welcome.

Output from each Pulsar interface board will be in SLINK format

Design philosophy: Simplicity/Uniformity & Testability. ONE for ALL and ALL for ONE design

Level2_PULSAR stands for

Level2_Processor Controller (3), pre-processor/merger (2), pULSer (1) And Readout (2):
(1) as pulser;
(2) as pre-processor/merger & readout;
(3) as Processor Controller (with SLINK to PCI)
Backward compatible (with existing system)
Based on proven modern technology: CERN SLINK products (use commercially available products as much as possible)

L1 trk svt clist Iso reces mu SumEt,MEt

Tracks

Jetselectrons

photons

muons

Taus

Tags

MetSumEt

What does Level 2 really do?
• Create all the trigger objects needed, then
• Count objects above thresholds, or,
• Cut on kinematic quantities

From Henry Fish’s comments on L2 upgrade

RecesPre-processor

X 3

16 x3fibers

Cluster Pre-processor

MuonPre-processor

L1 bitsXTRP

cluster6 fibers

Iso7 fibers

16 fibers

Global ProcessorController

L1SVT

TS

L2 decision

New System

Reces/trk

Cluster/trk

Muon/trk

Slink to PCI

PCI to Slink

CPU

SumET,METfrom a L1 type cable(another mezzanine card)

SLINK

SLINK

SLINK

RecesPre-processor

X 3

3 RecesPre-processor

+ 1 merger

16x3 fiberstaxi

6 hotlink+ LVDS

7 taxi fibers

12 hotlinkmatchbox

4 hotlinkprematchbox

XTRP

L1

SLINK

SumEt,MET

CPU

TS

L1 SVT

New system configuration

Reces/Trk

Cluster/trk

Muon/trkSLINK

Red: FPGAs require largest internal buffering capability

MUON

RECES

PROCESSOR

ROC

TRACER

CLUSTER

RECES

RECES

Muon/trk

Cluster/trk

MERGER

NewL2Decisioncrate

SumEt,MEt

TS

Slinkto PCI

PCIto Slink

mem

CPU

Reces/trk

GHz PC orVME processor

All Pulsar boards take two slots (due to mezzanine cards)
Total: 7 Pulsar boards = 14 slots

Baseline design: use pre-processors to simply suppress/organize data, and use the processor controller to simply pass data to the CPU via SLINK to PCI and also handshake with TS. All trigger algorithms will be handled by the CPU.

MUON

RECES

PROCESSOR

ROC

TRACER

CLUSTER

RECES

RECES

Muon/trk

Cluster/trk

MERGER

NewL2Decisioncrate

PreFred

TS

Slinkto PCI

PCIto Slink

mem

CPU

Reces/trk

GHz PC orVME processor

We use SLINK products to:
(1) Send data from processor controller to CPU;
(2) Receive L2 decision data from CPU to Pulsar;
(3) Link between Pulsar boards

48 fiberstaxi

6 hotlink+ LVDS

7 taxi fibers

12 hotlinkmatchbox

4 hotlinkprematchbox

XTRP

L1

SLINK

SumEt,MET

CPU

TS

L1 SVT

Another possible system configuration (to save money)
(3 Pulsar boards, use 1 MMB and 4 Reces)

Reces/Trk

Cluster/trk

Muon/trkSLINK

Red: FPGAs require largest internal buffering capability

muon

cluster

Existing Reces boards

Short Mbus

MMBReces data in SVT or SLINK format

FIFO

DAQ buffer

TestRAM

TestRAM

Upstream data

Downstreamdata

Initial estimate of FPGA RAM capacity requirements (worst case):

Each FPGA needs to have up to 4 events of FIFO-like storage at the input stage, and needs to have 4 events of DAQ buffers (not including test RAMs at input and output):
Optical IO FPGA (worst case is muon input): (1.3 KB/evt * 4) * 2 = 10.4 KB
EP20K400 has 26 KB maximum RAM capacity, while EP20K200 has 13 KB.

Merger FPGA (worst case is in processor controller mode): (1.8 KB/evt * 4) * 2 = 14.4 KB; EP20K400 should be big enough

Assuming no data suppression (for the worst case)
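The estimate above is simple enough to check in a few lines. The sketch uses only the numbers quoted here (event sizes, 4 events per stage, two storage stages, and the two device capacities); the function and dictionary names are hypothetical.

```python
DEVICE_RAM_KB = {"EP20K400": 26.0, "EP20K200": 13.0}   # capacities quoted above

def ram_needed_kb(event_kb, events=4, stages=2):
    """RAM for `events` events in each of `stages` stages (input FIFO + DAQ buffer)."""
    return event_kb * events * stages

def devices_that_fit(event_kb):
    """Names of the devices whose RAM covers the worst-case requirement."""
    need = ram_needed_kb(event_kb)
    return sorted(name for name, cap in DEVICE_RAM_KB.items() if cap >= need)
```

Under these numbers the Optical IO FPGA (1.3 KB/evt, 10.4 KB) would fit in either device, while the Merger FPGA (1.8 KB/evt, 14.4 KB) only fits in the EP20K400.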

Pulsar firmware design initial considerations

• Two different types of FPGA design (& in different modes): Optical IO FPGA (x2) and Merger FPGA
• need an overall design to optimize the firmware structure from the start, as many parts can be reused for all three FPGAs
• in general, firmware will be coded in VHDL.
• ideally we need to implement all functionalities and fully simulate everything. But since we would like to have the prototype early (in tester mode), we need to define a set of functionalities to be implemented and fully simulated before the prototype design is finalized
• one important issue is to find out the requirements for each type of FPGA and the IO pin needs (package) through firmware simulation, to optimize the design (cost vs performance)

DAQ readout

Formatter/Filter

algorithm

downstream upstream

RAM RAM

Overall Design for all FPGA types:

Play&Record

Need to identify what is common to each FPGA type; these parts will be in a library (such as SLINK formatter, SRAM controller, RAM play&record, DAQ, VME interface etc.). Some initial considerations for each FPGA type later…

VME

Optical IO FPGA

DAQ buffersSRAMController

VME IO

Formatter/filter

RAM controller

RAM controller

MezzanineCard interface

CDF Ctrlinterface

upstreamdownstream

Blocks in green are common to all FPGA types, blocks in white are specific to this type of FPGA

Play/recordPlay/record

Mezzanine card interface

Mezzanine Card ID

Mezzanine cardIdentification

Mezzanine cardinterface

Feature: has a dedicated unit which identifies the mezzanine card plugged into the motherboard at power up. At power up, all FPGAs will be configured, but the inputs will be “disconnected” by default. The mezzanine card identification unit will first identify the mezzanine card type; if it is the wrong type for this FPGA firmware, it will signal an error (error register & LED). If the right type is identified, the gate will be closed….

Mezzanine card identification at Power Up --- has to be robust and “professor proof”

Mezzanine cards have both input and output types

gate

Merger FPGA

DAQ buffersSRAMController

VME interface

Merger

RAM controller

RAM Controller(slink spy)

CDF Ctrlinterface

upstream

Blocks in green are common to all FPGA types, blocks in white are specific to this type of FPGA

Play/recordPCISLINK out

SLINK in


FIFOFIFO

FIFOFIFO

Optical IO FPGA firmware for muon case

Motherboard Optical IO FPGA
Each matchbox card sends up to 32 24-bit words for each fiber. The transfer time for each word is 132 ns. Each 24-bit word is encoded into a 32-bit transfer over hotlink and comes in as four groups of 8-bit words.

D C B A

8 bits each; total encoded useful data is 24 bits @132 ns

The 8-bit data stream will be pushed into a FIFO for each fiber, then clocked into the motherboard optical IO FPGA (pushed or pulled). The FIFO read clock can be faster than the write clock (depending on how fast one can clock data over the CMC connectors). To be conservative, assume for now that the read and write clocks are about the same.

32-bit FIFO@132ns

(30 x 4 deep)8->32

Motherboard Optical IO FPGA (only shown 4 ch)

32-bit FIFO@132ns

32-bit FIFO@132ns

32-bit FIFO@132ns

8->32

8->32

8->32

8 bits@33ns

L1A(x4) queue

64-bit (x4) FIFO@132ns

L1 bits

24-bit FIFO@132ns

XTRP data

Muon Filter Algorithm (pulls one event's worth of data at a time):
• Checks data consistency
• Filters data based on L1 bits and XTRP information
• muon-track match…. etc.

SRAM

0-30 deg

30-60 deg

60-90 deg

90-120 deg

SLINKFormatter32-bit@40MHz

32 bits data+ ctrl bits

L1ABuffer(2)

ToMergerFPGA

Muon data as an example. Each word is 24 bits (sent as 4 8-bit words over hotlink).

Can pack them into 32-bit words (SLINK is 32 bits): 4 24-bit words can be packed into 3 32-bit words.

Or can pack them in 24 bits exactly as in the muon bank format, and use the other 8 bits to stamp other information …
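The first packing option (four 24-bit words into three 32-bit words) can be sketched as follows. The bit ordering (first 24-bit word in the lowest bits of the 96-bit block) and the function names are assumptions; the actual SLINK payload layout has not been defined here.

```python
def pack_24_to_32(words24):
    """Pack groups of four 24-bit words into three 32-bit words each."""
    if len(words24) % 4:
        raise ValueError("number of 24-bit words must be a multiple of 4")
    out = []
    for i in range(0, len(words24), 4):
        block = 0
        for j, w in enumerate(words24[i:i + 4]):
            block |= (w & 0xFFFFFF) << (24 * j)      # build one 96-bit block
        out.extend((block >> (32 * k)) & 0xFFFFFFFF for k in range(3))
    return out

def unpack_32_to_24(words32):
    """Inverse operation: three 32-bit words back into four 24-bit words."""
    if len(words32) % 3:
        raise ValueError("number of 32-bit words must be a multiple of 3")
    out = []
    for i in range(0, len(words32), 3):
        block = 0
        for k, w in enumerate(words32[i:i + 3]):
            block |= (w & 0xFFFFFFFF) << (32 * k)
        out.extend((block >> (24 * j)) & 0xFFFFFF for j in range(4))
    return out
```

The dense packing saves a quarter of the link bandwidth; the second option (one 24-bit word per 32-bit word) trades that for 8 spare bits per word and a trivial unpacking step.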

CDF Muon bank data format

36-bit FIFO@40MHz

Motherboard Merger FPGA

36-bit FIFO@40MHz

36-bit FIFO@40MHz

36-bit FIFO@40MHz

36 bits@40MHz

L1A(x4) queue


L1 bits

24-bit FIFO@132ns

XTRP data

Muon merger

Algorithm(pulls one


a time)

• Checks data consistency
• merges and stamps data

SRAM

0-120 deg

120-240 deg

240-360 deg


32 bits data+ 4 bits ctrl

L1ABuffer(2)

FromOptical IO FPGAs To

ControlFPGA

FIFOFIFO

FIFO

F E D C B A
8 bits each @50ns, one cluster is encoded in 6 8-bit words in all 6 fibers

8 bits data stream will be pushed into FIFO for each fiber. The end of event marker comes from LVDS connector.

FIFOFIFO

FIFOFIFO

Optical IO FPGA firmware for CLIST case

LVDS

Inputs: 6 fibers + LVDS

48-bit FIFO@300ns8->48

Motherboard Optical IO FPGA

8->48

8->48

8->48

8 bits@50ns

L1A(x4) queue

64-bit (x4) FIFO@300nsL1 bits

24-bit FIFO@300ns

XTRP data

Cluster Formatter Algorithm (pulls one event's worth of data at a time):
• sums all info for each cluster
• checks end of event
• checks data consistency
• stamps data based on L1 bits and XTRP information
• cluster-track match? (electron ID)

SRAM



L1ABuffer(2)

ToMergerFPGA

48-bit FIFO@300ns

48-bit FIFO@300ns

48-bit FIFO@300ns

48-bit FIFO@300ns

48-bit FIFO@300ns

8->48

8->48

8-bit FIFO@300ns

4->16

LVDS

Baseline design may only use one Optical IO FPGA to handle cluster data

Motherboard Merger FPGA

36-bit FIFO@40Mhz

36-bit FIFO@40Mhz

36 bits@40Mhz

Iso input

L1A(x4) queue

64-bit (x4) FIFO@132ns

L1 bits

24-bit FIFO@132ns

XTRP data

Cluster merger Algorithm (pulls one event's worth of data at a time), for both cluster and Iso info

SRAM



L1A Buffer (2)

From Optical IO FPGAs

To Control FPGA

Cluster input

Pulsar initial layout in progress …

Plan for board level simulation

• Implement the Tx case for muon first, with each Optical IO FPGA driving 8 fibers. This is an ongoing effort. VME write/read to the 36-bit RAM is already working. Peter is working on the state machine.

• For the Rx case, Pulsar will first be implemented only as a data recorder, sending the data to a PC as well as making the data available to VME access. We will start with the simplest case: Pulsar takes 4 SLINK inputs and the L1/SVT inputs, merges them, and sends them out on P3 in SLINK format. Next we will modify the code so that Pulsar receives hotlink fiber data (16 fibers) and sends them onto P3 in SLINK format.

• We will decide exactly what is to be implemented first. The goal is to get quickly to board-level simulation to check all the data inputs/outputs.
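The simplest Rx case (merge the SLINK inputs with the L1/SVT inputs, one event at a time) can be modelled in software before the firmware exists. The sketch below is a toy model only: the event tag, the `(tag, payload)` framing and the function name are all hypothetical and are not taken from the Pulsar design.

```python
from collections import deque


def merge_one_event(input_fifos, l1_fifo):
    """Pull one event's worth of data from every input FIFO and merge it.

    Each FIFO entry is assumed to be (event_tag, payload); this framing is
    a stand-in for whatever event marker the real data streams carry.
    """
    l1_tag, l1_bits = l1_fifo.popleft()
    merged = [("L1", l1_bits)]
    for name, fifo in input_fifos.items():
        tag, payload = fifo.popleft()
        if tag != l1_tag:                    # data consistency check
            raise ValueError(f"{name}: event tag {tag} != L1 tag {l1_tag}")
        merged.extend((name, word) for word in payload)   # stamp with source
    return l1_tag, merged


# Usage: four SLINK inputs holding one event each, plus the L1 bits.
slinks = {f"SLINK{i}": deque([(7, [0x100 + i])]) for i in range(4)}
l1 = deque([(7, 0xABCD)])
tag, event = merge_one_event(slinks, l1)
print(tag, len(event))
```

A model like this can also produce the expected output patterns to compare against the board-level simulation.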

Motherboard FPGA (for all of them)

36-bit FIFO@40Mhz

36-bit FIFO@40Mhz

36 bits@40Mhz

L1A(x4) queue


L1 bits

24-bit FIFO@132ns

SVT data

(pulls one event's worth of data at a time)


SRAM



L1A Buffer (2)

VME

TRK

Merger

TS

L1

Optical IO

TRK

Mezz card connectors

Initial Pulsar Board (Rx) Level simulation (I)

(VME FPGA not shown)

CDFctrl

OpticalIO

All three FPGAs will have the same firmware. Red lines are the ones to be simulated

SLINK

SLINK

SLINK

TRK

Merger

TS

L1

Optical IO

TRK

Mezz card connectors

Initial Pulsar Board (Rx) Level simulation (II)


CDFctrl

OpticalIO


hotlink

SLINK

SLINK

TRK

Merger

TS

L1

Optical IO

TRK

Mezz card connectors

Initial Pulsar Board (Tx) Level simulation (I)


CDFctrl

OpticalIO


hotlink

Transmitter Receiver

Initial mezzanine cards board level simulation

(1) 4 fiber case first
(2) 2 fiber + LVDS case

This setup can test everything except CMC connectors

XTRP lookup table (for both muon and Cal): XFT divides the COT into 288 segments (1.25 degrees each). A wedge spanning 15 degrees has 12 segments; each segment has a unique LUT within a wedge.
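The segment arithmetic can be checked directly. The sketch below uses exact fractions; the numbering convention (segments and wedges counted from 0 in the same direction) and the function name are assumptions made for illustration.

```python
from fractions import Fraction

N_SEGMENTS = 288            # XFT segments around the COT (from the slide)
WEDGE_DEG = 15              # width of one wedge in degrees (from the slide)

segment_deg = Fraction(360, N_SEGMENTS)          # 5/4 = 1.25 degrees
segments_per_wedge = WEDGE_DEG / segment_deg     # 12 segments per wedge
n_wedges = Fraction(360, WEDGE_DEG)              # 24 wedges in total


def locate_segment(segment):
    """Return (wedge index, segment index within the wedge).

    Assumes segments are numbered 0..287 and wedges 0..23; the real
    numbering convention is not given on the slide.
    """
    if not 0 <= segment < N_SEGMENTS:
        raise ValueError("segment out of range")
    return divmod(segment, int(segments_per_wedge))


print(float(segment_deg), int(segments_per_wedge), int(n_wedges))
print(locate_segment(287))
```

The index within the wedge is what would select one of the 12 per-segment LUTs.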

15-bit address, 18-bit output

Address bits:

0-1: phase    CM             IM
     00:      CMU high Pt    CAL
     01:      CMU low Pt     crack
     10:      CMX high Pt    IMU
     11:      CMX low Pt     IMU

2-8: 96 XFT signed-Pt

9-11: Local phi within segment

12: passed superlayer 8?

13: Isolation (other tracks nearby?)

14: reserved for future use

For CMU, CMX and IMU, the 18 bits represent the 18 trigger towers in three wedges (one + two neighbor wedges).

When even a part of a tower is within the muon footprint and satisfies the Pt threshold, the bit corresponding to the tower is set to 1.

An example of LUT for CMU for an isolated track with Pt = -6.19 GeV/c, local phi = 1.02 degrees, that passed superlayer 8 of the COT:

Bit address (15 bits)    Output content (18 bits)
001100011111100          000000000000000000
001100011111101          000000000000100000
001100011111110          000000000000000000
001100011111111          000000000001110000

Phase:CMU high Pt

Pt bins for –6.19 GeV/c

local phi

passed SL8

isolation
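The address layout can be exercised with a small decoder. This is a sketch under one explicit assumption: the 15-bit address strings in the example are written with bit 14 on the left and bit 0 on the right. Under that reading the four example addresses differ only in the phase bits (0-1). The function names are hypothetical, and the phase names are those of the CM column only.

```python
# Phase codes for bits 0-1 (CM column of the slide).
PHASES = {0: "CMU high Pt", 1: "CMU low Pt", 2: "CMX high Pt", 3: "CMX low Pt"}


def decode_lut_address(bits):
    """Split a 15-bit LUT address string into its fields.

    Assumption: the string is written most significant bit first,
    so the rightmost character is address bit 0.
    """
    if len(bits) != 15 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of 15 binary digits")
    addr = int(bits, 2)
    return {
        "phase": PHASES[addr & 0b11],            # bits 0-1
        "pt_bin": (addr >> 2) & 0x7F,            # bits 2-8, signed-Pt bin index
        "local_phi": (addr >> 9) & 0b111,        # bits 9-11
        "passed_sl8": bool((addr >> 12) & 1),    # bit 12
        "isolation_bit": (addr >> 13) & 1,       # bit 13
        "reserved": (addr >> 14) & 1,            # bit 14
    }


def towers_set(output_bits):
    """Number of trigger-tower bits set in an 18-bit LUT output string."""
    if len(output_bits) != 18:
        raise ValueError("expected 18 output bits")
    return output_bits.count("1")


fields = decode_lut_address("001100011111100")
print(fields)
print(towers_set("000000000001110000"))
```

How the Pt bin index and the local-phi code map to physical values is not given on the slide, so the decoder returns the raw field values only.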

New version: the pre-production board is working. CERN is willing to send us one pre-production board next month.

Will use simple SLINK to PCI version (proven technology, 120MB/s) first for R&D, and for low luminosity run. In the future we can use the new faster version (260MB/s) and PC.

Data Drain test board

Data source test board

We ordered them and have been playing with them (ideal for initial Pulsar prototype testing)

Transmitter Receiver

Mezzanine card Prototype/production test plan I (use the working teststand setup at UC):

Pattern Generator HP LA

(1) Use PG + LA;
(2) Use FPGA internal RAM + LA;
(3) Use BIST + LA (hotlink): run for a long time and set a limit on the bit-error rate to test the robustness of the design.
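For the BIST option, the limit that an error-free run sets on the bit-error rate follows from simple counting statistics: if N bits are transferred with zero errors, then at confidence level CL the rate satisfies BER < -ln(1 - CL) / N (about 3/N at 95%). A minimal sketch, with hypothetical function names:

```python
import math


def ber_upper_limit(bits_sent, confidence=0.95):
    """Upper limit on the bit-error rate after an error-free run.

    Uses P(0 errors) = (1 - p)^N ~ exp(-p N), valid for small p.
    """
    if bits_sent <= 0:
        raise ValueError("bits_sent must be positive")
    return -math.log(1.0 - confidence) / bits_sent


def bits_needed(target_ber, confidence=0.95):
    """Error-free bits required to claim BER < target_ber."""
    return math.ceil(-math.log(1.0 - confidence) / target_ber)


print(ber_upper_limit(1e12))    # limit after 10^12 error-free bits
print(bits_needed(1e-12))       # bits needed to claim BER < 10^-12
```

Dividing `bits_needed` by the link bit rate gives the run time required for a given target.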

This setup can test everything except CMC connectors

Tx

Prototype/production test plan II: use one Pulsar prototype board

this would allow FULL tests (including the CMC connectors)

Rx

I/O

I/O

Mezzanine cards mass production can start ONLY AFTER the prototypes are tested with Pulsar prototype

Note: Pulsar prototype will be tested with SLINK test tools first

M

PULSAR

IO

M

IO

Slink out

CPU

Use SLINK source card to send data. Use SLINK data sink to check data.

Initial test plan for Pulsar board prototype

Pulsar board prototype can first be tested with SLINK test tools, then with a PC, then with custom mezzanine cards.

Reces x 4

L1

XTRP

SVT

CLIST

ISO

MUON

Alphas x 4

One SVT cable each

12 fibers (Taxi)

1 L1 cable

Magic bus

L2 crate inputs

Asked 2 questions a while ago (the answer is in the Level2_Pulsar design):

(1) Can one design a universal test (pulser) board? (testability)
(2) Can one design a universal interface board? (uniformity)

The difficulty we have had with L2 is that the system was not designed for simplicity/uniformity and testability. Of course, simplicity/uniformity is something easy to say but hard to do...

Each board is designed differently by different groups.
