AGATA Global Trigger and Synchronization Hardwareagata.pd.infn.it/GTS/AGATA Global Trigger...

48
Draft version 1.3 – 11 November 2005 AGATA Global Trigger and Synchronization Hardware 1. Introduction .................................................................................................................................... 3 2. The Front-end Model ..................................................................................................................... 4 3. GTS Functionalities ....................................................................................................................... 7 4. Front-end Simulation..................................................................................................................... 9 4.1 Front-end system dead time ................................................................................................................. 9 4.2 The Derandomizer Size....................................................................................................................... 10 4.3 The Simulation Environment ............................................................................................................. 11 4.3.1 The carrier .................................................................................................................................... 11 4.3.2 The Channel Structure ................................................................................................................ 13 4.3.3 Trigger Matching ......................................................................................................................... 14 4.3.4 Simulation Results ....................................................................................................................... 16 5. Global Timestamp Protocol ........................................................................................................ 16 6. Global Trigger Algorithms.......................................................................................................... 16 7. Implementation ............................................................................................................................ 17 7.1 The root node ....................................................................................................................................... 18 7.2 The backplane...................................................................................................................................... 19 7.3 The Fanin-fanout nodes ...................................................................................................................... 20 7.4 The fibre connections ......................................................................................................................... 20 7.5 The mezzanine interface ..................................................................................................................... 21 7.5.1 GTS Mezzanine operation........................................................................................................... 22 7.5.2 A VHDL model of the GTS Mezzanine...................................................................................... 23 8. Phase equalization of GTS clocks ............................................................................................... 25 8.1 Continuous resets method .................................................................................................................. 25 8.1.1 Test results .................................................................................................................................... 27 8.2 Direct Measurement............................................................................................................................ 30 8.2.2 Test Results................................................................................................................................... 30 9. References ..................................................................................................................................... 31 10. Revision History ......................................................................................................................... 31 Appendix A. GTS Mezzanine Interface Pin-out ........................................................................... 32 Appendix B. Costs ............................................................................................................................ 33 M. Bellato – INFN Padova 1

Transcript of AGATA Global Trigger and Synchronization Hardwareagata.pd.infn.it/GTS/AGATA Global Trigger...

Draft version 1.3 – 11 November 2005

AGATA Global Trigger and Synchronization Hardware

1. Introduction....................................................................................................................................3

2. The Front-end Model.....................................................................................................................4

3. GTS Functionalities .......................................................................................................................7

4. Front-end Simulation.....................................................................................................................9 4.1 Front-end system dead time ................................................................................................................. 9 4.2 The Derandomizer Size....................................................................................................................... 10 4.3 The Simulation Environment............................................................................................................. 11

4.3.1 The carrier.................................................................................................................................... 11 4.3.2 The Channel Structure ................................................................................................................ 13 4.3.3 Trigger Matching......................................................................................................................... 14 4.3.4 Simulation Results ....................................................................................................................... 16

5. Global Timestamp Protocol ........................................................................................................16

6. Global Trigger Algorithms..........................................................................................................16

7. Implementation ............................................................................................................................17 7.1 The root node....................................................................................................................................... 18 7.2 The backplane...................................................................................................................................... 19 7.3 The Fanin-fanout nodes...................................................................................................................... 20 7.4 The fibre connections......................................................................................................................... 20 7.5 The mezzanine interface ..................................................................................................................... 21

7.5.1 GTS Mezzanine operation........................................................................................................... 22 7.5.2 A VHDL model of the GTS Mezzanine...................................................................................... 23

8. Phase equalization of GTS clocks ...............................................................................................25 8.1 Continuous resets method .................................................................................................................. 25

8.1.1 Test results.................................................................................................................................... 27 8.2 Direct Measurement............................................................................................................................ 30

8.2.2 Test Results................................................................................................................................... 30 9. References .....................................................................................................................................31

10. Revision History .........................................................................................................................31

Appendix A. GTS Mezzanine Interface Pin-out ...........................................................................32

Appendix B. Costs ............................................................................................................................33

M. Bellato – INFN Padova 1

Draft version 1.3 – 11 November 2005

B.1 Single Crystal prototype costs. .......................................................................................................... 33 B.2 Demonstrator costs. ............................................................................................................................ 33

Appendix C. Timescale and manpower .........................................................................................35 C.1 GTS interface mezzanine................................................................................................................... 35 C.2 Root Node............................................................................................................................................ 35 C.3 Fanin-fanout Node.............................................................................................................................. 35

Appendix D. The GTS Mezzanine VHDL model ..........................................................................36 D.1 T_valreject_ctrl. ................................................................................................................................. 38

D.1.1 T_valreject_ctrl internal block diagram................................................................................... 38 D.2 RTX. .................................................................................................................................................... 40

D.2.1 RTX internal block diagram...................................................................................................... 40 D.3 FIFO_L1A........................................................................................................................................... 42 D.4 DPRAM_tstamp. ................................................................................................................................ 43 D.5 FIFO_FREELIST............................................................................................................................... 44 D.6 Trigger_match. ................................................................................................................................... 45

D.6.1 Trigger_match internal block diagram..................................................................................... 45 D.6.2 Scan_mem_l1a internal block diagram..................................................................................... 46

D.7 T_request_ctrl..................................................................................................................................... 47 D.7.1 T_request_ctrl internal block diagram..................................................................................... 47

M. Bellato – INFN Padova 2

Draft version 1.3 – 11 November 2005

1. Introduction Data synchronization is an important aspect in the operation of the trigger and readout systems of the AGATA experiment. A high increase in efficiency with respect to current spectrometers is expected in AGATA by means of online gamma ray tracking and pulse shape analysis (PSA). Tracking and PSA require the concurrent digitization of preamplifier signals of the 36 fold segmented Ge crystals composing the array. Therefore, the design of the front-end readout and level-1 (L1) trigger in AGATA follows a synchronous pipeline model: the detector data are stored in pipeline buffers at the global AGATA frequency, waiting the global L1 decision. The L1 latency must be constant and shall match the pipeline buffer length. The whole system behaves synchronously and synchronization at different levels and in different contexts has to be achieved and monitored for proper operation of the system. In order to fix definitions, we list in Table 1 the various synchronization types that we refer to.

Type Description

Sampling Synchronization Synchronization of the detector signals with the clock phase

Serial Link Synchronization Recovery of parallel data words from the serial bit stream.

Trigger Requests Alignment Alignment of trigger data at the input of the trigger pipeline processor

L1 Validations Synchronization

Synchronization of L1A signal with data in the readout pipelines

Event Synchronization Assignment of global clock and event number to data fragments in the DAQ path

Table 1: Synchronization types. In AGATA each crystal is considered as a separate entity and from the point of view of the Data Acquisition System (DAQ), the whole detector may be considered as the aggregation of synchronized data supplied by individual crystals, possibly disciplined by a global trigger primitive. Each crystal is composed of 36 segments and a central core contact, all individually readout. The data from the core contact are processed for event detection and hence, a level 1 trigger request or local trigger generation. The choice between the two behaviors is done upon configuration, the former corresponding to an effective way to reduce front-end data rates in cases where anyone of the stages of the readout chain is unable to perform at the actual data throughput.

M. Bellato – INFN Padova 3

Draft version 1.3 – 11 November 2005

2. The Front-end Model A columnar model of the data flow concerning each individual segment of a crystal can be represented as in fig. 1. This sort of digital pipe highlights the flow starting from the digitizer of a single channel down to the Pulse Shape Analysis farm where position estimation of gamma ray interactions is carried out.

Fig. 1 – The AGATA readout column

The model shows two types of trigger interactions with the flow: a local trigger signal generated by central core processing and, possibly, a global trigger L1 accept signal generated externally from a central trigger processor. To ease the solution of the problems posed by the different synchronization levels specified before, AGATA shares a global time reference supplied by a global trigger and synchronization control system (GTS) and distributed by means of a network of optical fibres to the front-end electronics of each crystal. Fig. 2 shows the aggregate model of a front-end system and its interface to the GTS.

M. Bellato – INFN Padova 4

Draft version 1.3 – 11 November 2005

Fig. 2 – Model of a detector channel

The following functionalities will be described in terms of behaviour. 1: Extraction of the physical signal from the sub-detector and transformation to digital data.

The transformation includes any needed shaping or processing. If the detector is the central core contact of the crystal, the data are made available to the Trigger Primitive Generator (TPG).

2: Delay on the data during the trigger level-1 latency.

This delay can be implemented in various ways : storage with pointers management or a simple pipeline shifting synchronously with the global clock for example. The delay value will be the sum of the decision time of the trigger logic and the propagation time. Hence, the delay can vary depending on the location of the front-end.

3: Data selection and tagging.

On the result of level-1 trigger process, data corresponding to that event must be selected from the pipeline. A data frame (data recorded during several consecutive clock cycles) is to be considered and not a single data. Corresponding global clock and event numbers must be associated to those data to tag them.

4: Derandomization.

M. Bellato – INFN Padova 5

Draft version 1.3 – 11 November 2005

Once selected by a level-1 accept, the frame is saved in a multi-event buffer. This buffer stage is needed to allow a frequency derandomization between the level-1 rate and the readout process. The size of the multi-event buffer has to be defined in order to guarantee the smallest buffer-overflow probability possible. A FIFO-like behavior is needed for this buffer.

5: First level merge.

A given number of derandomizer buffers is attached to a merge engine that will readout data associated to a particular event. The engine produces a so called "Front End Event".

6: Front End Event formatting.

The data format of the Front End Event must include the global clock counter and event numbers to allow later data misalignment search [1]. The presence of a Front End identifier could also be required.

7: Transmission to FED

The FE Event is then sent out to the Front-end driver (FED) via a data link. The media and protocol used for this link should be a DAQ standard.

8: Monitoring of the derandomizer filling level

If the readout scheme is such that every derandomizer has the same status at the same time (after readout completion), this case is simple and the solution is straightforward (see below). If the readout scheme is such that every derandomizer can have a different status (e.g. variable event size), warning or worst status must be included to the "Front End Event" for later processing at the FED level.

9: Test facilities To have maximum efficiency in the problem detection, they have to be implemented as high as possible in the readout chain.

10: Processing.

Data processing (e.g. lossless compression) can also be needed in the front end. Location and exact functionality of the processing is to be defined.

M. Bellato – INFN Padova 6

Draft version 1.3 – 11 November 2005

3. GTS Functionalities From the logical description of the front-end operation given above it turns out that a certain number of global time referenced signals are needed. Among them:

1. common clock 2. global clock counter 3. global event counter 4. trigger controls:

i. Throttling of the L1 validation signal ii. Fast commands (fast reset, initialization, etc.) iii. Fast monitoring feedback from the crystals iv. Calibration and test trigger sequence commands v. Monitor of dead time

5. Trigger requests 6. Error reports

In AGATA, the transport medium of all these signals is shared by use of serial optical bidirectional links connecting the front-end electronics of each crystal with a central global trigger and synchronization control unit in a tree-like structure, thus actually merging together the three basic functionalities of synchronization distribution, global control and trigger processing. More in detail: 1: Common clock

This is a 100 MHz digital clock supplied by a central timing unit (possibly GPS disciplined) and used to clock the high speed optical transceivers reaching the front-end electronics of every crystal. At the crystal receiving side the clock is reconstructed and filtered for jitter. The clock signals of each crystal may be equalized for delay and phase, thus accounting for different fibre lengths and different crystal locations in the array.

2: Global clock counter

A 48 bit digital pattern used to tag event fragments before Front-end buffer formatting. The pattern is the actual count of the global clock. It will be used by PSA and global event builders to merge the event fragments in one single event.

3: Global event counter

A 16 bit digital pattern used to tag event fragments before Front-end buffer formatting. The pattern is the actual count of the L1 validations.

4: Trigger controls

The Trigger Control must guarantee that sub-systems are ready to receive every L1 Accept delivered. This is essential to prevent buffers overflows and/or trigger signals missed when the crystals are not ready to receive them. In either case, the consequence would be a loss of synchronization between event fragments.

M. Bellato – INFN Padova 7

Draft version 1.3 – 11 November 2005

Warning signals sent from the crystal through the GTS network, indicating that some of its buffers are almost full, may be received centrally. However this feedback signal can take few microseconds to reach the Trigger Control, which meanwhile could have delivered a number of L1A signals that originate a buffer overflow. This problem is particularly acute in the front-end derandomizers which have a small storage capacity. According to the front-end electronic logical model, the front-end derandomizers after the L1 latency pipelines are the first devices to overflow when the L1A rate is too high. Space and power constraints in the front-ends imply small derandomizer depth and hence these queues are very sensitive to bursty L1A. In general, the derandomizers behave like a first-in-first-out queue: the input/output frequency is directly the L1A rate. The overflow probability is strongly dependent on the ratio between the service time and the buffer depth. The consequence would be of resetting the whole front-end electronics which would cause a severe loss of efficiency in the DAQ. All front-end derandomizers behave identically. Therefore, their occupancy depend only on the L1A rate and on the service time. A state machine receiving the L1A signals can emulate the de-randomizer behavior and determine its occupancy at each new L1A. If a new L1A is estimated to cause a de-randomizer overflow, this L1A is throttled. In general, it would be very difficult to guarantee that the state machine reproduces exactly the buffer status at every time. However in the present case the L1A accept signals are synchronous with the clock and the write and read latencies are measured in multiples of the clock period. It is this time quantization that makes the de-randomizer emulation really possible. A complementary solution to the same problem is to oblige the delivery of L1A signals to comply with a set of trigger rules. These rules take the general form ’no more than n L1A signals in a given time interval’. Suitable rules, inducing a negligible dead time, would minimize the buffer overflow probability.

5: Trigger request

The central core contact signal might be considered as the overlap of all the signals in the segments of a crystal; in fact a deposit of energy in any of the segment will induce a signal in the central core, thus acting as a sort of analog sum of single segment signals. Therefore, the central core can be processed for event localization in a crystal. Suitable algorithms for this task have been identified [] and tested. The outcome of the algorithm may issue a trigger request to the central trigger processor by asserting this signal which is transmitted via the high speed serial links of the GTS network upwards to the central trigger unit. All the trigger requests collected from the crystals at each global clock cycle form a pattern that can be processed centrally for multiplicity or coincidence with ancillary detectors. The result of this processing stage constitutes the L1 validation.

6: Error reports

Abnormal conditions as buffer overflows, local faults, built-in self tests, etc. can be reported centrally for proper corrective actions.

M. Bellato – INFN Padova 8

Draft version 1.3 – 11 November 2005

4. Front-end Simulation The Front End System presents intrinsic inefficiencies by design, in the sense that all level-1 triggers might not be processed. The two major causes of these inefficiencies are the Front End System dead time following a trigger, and the limited size of the Front End Channel derandomizing buffers. For the complete readout chain of AGATA, a maximum acceptable inefficiency has to be defined: a value in the range of a few percents at 1 MHz trigger rate has to be achieved. Unfortunately, this trigger rate is not known as an absolute worst case, hence, to have a good safety margin for system design, we should regard it as the variance of a Gaussian process and take the 6σ value as our absolute worst case.

4.1 Front-end system dead time A dead time is induced after level-1 trigger reception by the hardware architecture of the Local Level Processing electronics. This dead time is due mainly to the time needed to read a frame out of the front-end fifo’s and to the time needed to sink the synchronization tags from the GTS interface mezzanine. As a consequence, triggers occurring during this period cannot be processed, and corresponding event data are lost. An estimation of the inefficiency induced by the Local Level Processing hardware can be calculated with the following assumptions. Let d be the dead time, r the trigger rate. If we consider a Poisson law for the trigger distribution, the probability of one event (at least) occurring during the dead time is:

( ) rderdxeP rdrdl ≈−=⎟⎟

⎞⎜⎜⎝

⎛−= −− 1

!01

0

assuming that rd is small relative to 1. Hence, at the level of each crystal, 1 microsecond of dead time after local trigger generation will account for 5% of local efficiency loss at 50 KHz trigger rate. Global system inefficiency due to hardware dead time is computed in the following way. Let’s remember how a global trigger is usually generated in AGATA: local triggers generated at the crystal level get routed to a central trigger processor which is configured for asserting a trigger validation whenever a programmable condition is met. The simplest of these conditions is a multiplicity, e.g. pretending that more than one (usually M = [2..30]) trigger requests are asserted in the same time window. It’s easy to realize that, at multiplicity M>1, dead time of one crystal electronics induces a dead time for the whole system. For sake of simplicity, let’s suppose M=2. An event firing crystals, say, no. 1 and 2 gets validated and those crystals enter a 1 microsecond dead time. Any subsequent event firing any two crystals will be validated only if crystals no.1 and 2 are not interested by the new event. The probability of the new event firing crystals no. 1 or 2 is computed by knowing the number of times that crystals no. 1 or 2 are present in all possible combinations without repetition of any two crystals out of 180 (this is the number of HPGe detectors foreseen for AGATA). Let’s take N = the total number of crystals, M = multiplicity: the total number T of combinations without repetition of any M crystals out of N is the well know formula:

( ) ⎟⎟⎠

⎞⎜⎜⎝

⎛−×

=!!

!MNM

NT

M. Bellato – INFN Padova 9

Draft version 1.3 – 11 November 2005

The number of times K that one crystal is present in all possible combinations without repetition of any M crystals out of N is given by :

)!()!1()!1(

MNMN

MNTK

−×−−

=⎟⎟⎟⎟

⎜⎜⎜⎜

=

So, the probability of the new event firing a crystal that entered a dead time is given by:

NM

TKMPd

2

=×=

It’s nice to note that, already at multiplicity M=13, with N=180, it’s almost certain that a new event occurring less than one microsecond after the preceding will hit a crystal whose electronics will not catch it. Hence, the global system inefficiency due to crystal dead time is obtained as:

NrdMPPP dl

2

≈×=

for NM ≤ or

rdPPP dl ≈×=

for NM φ

4.2 The Derandomizer Size The size allocated to the front-end derandomizing buffers is a crucial issue. On one hand, available space on silicon and power budget lead to minimize the size. On the other hand, data loss probability due to buffer overflow has to be minimized to avoid misalignment in the DAQ, thus inducing large inefficiencies caused by long recovery times. This constraint leads to maximize the derandomizer size. The model used for this study is mainly defined by 3 parameters:

the mean time between two triggers "t", i.e. the inverse of the trigger rate the fifo’s service time "s", i.e. the minimum time between two consecutive read access to

the fifo’s. The fifo depth "d". This parameter is expressed in terms of events (a constant event size is

assumed). With this model, P(n) , the probability of having n events waiting in the fifo is computed. P(n=d), is the probability for the fifo to be full and also the derandomizer inefficiency. To analyze the contribution of these parameters to the global system inefficiency a simulation environment has been setup. We made a faithful description of the Local Level Processing hardware and the GTS interface mezzanine operations by means of the SystemC hardware description language [ ]. Being both synchronous with the GTS distributed global clock, a cycle accurate description of the local trigger generation, GTS handshake mechanism, fifo’s storage and readout and trigger matching procedure has been easy to achieve and deploy.

M. Bellato – INFN Padova 10

Draft version 1.3 – 11 November 2005

4.3 The Simulation Environment

CARRIER

SCC

GTS_IF

ARBITERFAST_MEM

OUT_BUS

CPU

fiber_in (15:0)fiber_in (15:0)

fiber_out (15:0)fiber_out (15:0) timestamp (47:0)timestamp (47:0)

cmd (7:0)cmd (7:0)

L1AL1A

backpressurebackpressure

trigger_ch (13:0)trigger_ch (13:0)

trigger_requesttrigger_request

ch0

(13:

0)ch

0 (1

3:0)

event_num (23:0)event_num (23:0)

ch1

(13:

0)ch

1 (1

3:0)

ch2

(13:

0)ch

2 (1

3:0)

ch3

(13:

0)ch

3 (1

3:0)

bus_

port

bus_

port

Trigger Primitive Generator Frontend System

GTS Interface

rst

rst

rst

rst

trigger_thresh (11:0)trigger_thresh (11:0) local_triggerlocal_triggerhold_time (9:0)hold_time (9:0)

gclk

gclk

gclk

gclk

gclk

gclk

L1A_latency (15:0)L1A_latency (15:0)

matching_window (15:0)matching_window (15:0)

slave_port_Mslave_port_M arbiter_portarbiter_port

spy (15:0)spy (15:0)

ch7

(13:

0)ch

7 (1

3:0)

ch6

(13:

0)ch

6 (1

3:0)

ch5

(13:

0)ch

5 (1

3:0)

ch4

(13:

0)ch

4 (1

3:0)

ch11

(13

:0)

ch11

(13

:0)

ch10

(13

:0)

ch10

(13

:0)

ch9

(13:

0)ch

9 (1

3:0)

ch8

(13:

0)ch

8 (1

3:0)

cpu_portcpu_port

bus_clkbus_clk

bus_clk

bus_clk

bus_clkbus_clk

bus_clk

bus_clk

Fig. 3 – Top level view of the simulation environment

Figure 3 shows the top level block diagram of the simulation environment. The front-end system is comprised of a GTS mezzanine interface, a primitive trigger generator which analyzes sampled data from central contact of each crystal for local trigger generation, a carrier box collecting data from twelve crystal segments and a readout bus to which a readout cpu and a global memory are connected. The model is sourced with real data taken from 24-fold and 36-fold segmented prototype crystals, illuminated with different radioactive sources. It is worth noting that the simulation environment is a coded implementation of the front-end model depicted in fig. 2.

4.3.1 The carrier The carrier box has the complex structure shown in fig. 4: two mezzanines (M1 and M2) take care of computing the energy and storing the rising edge samples of each event from six crystal segments each. Upon request of the readout engine box, on a per event basis these data are stored in a dual port ram after being tagged with a timestamp and event number taken from the ‘evcount_fifo’ and ‘tstamp_fifo’. A Direct Memory Access (DMA) controller box synchronizes with the Readout Engine box by means of two fifo’s (ro2dma and dma2ro fifo’s) and sinks event data from the dual port ram into its bus port.

M. Bellato – INFN Padova 11

Draft version 1.3 – 11 November 2005

ev_energy_re1

ev_energy_re2

ev_energy_re3

ev_energy_re4

ev_energy_re5

ev_energy_re6

ev_pulse_re1

ev_pulse_re2

ev_pulse_re3

ev_pulse_re4

ev_pulse_re5

ev_pulse_re6

ev_energy1

ev_energy2

ev_energy3

ev_energy4

ev_energy5

ev_energy6

ev_pulse1

ev_pulse2

ev_pulse3

ev_pulse4

ev_pulse5

ev_pulse6

ch1

ch2

ch3

ch4

ch5

ch6

gclk

L1A

L1A

_lat

ency

loca

l_tr

igge

r

mat

chin

g_w

indo

w

rst

times

tam

p

M1

ev_energy_re1

ev_energy_re2

ev_energy_re3

ev_energy_re4

ev_energy_re5

ev_energy_re6

ev_pulse_re1

ev_pulse_re2

ev_pulse_re3

ev_pulse_re4

ev_pulse_re5

ev_pulse_re6

ev_energy1

ev_energy2

ev_energy3

ev_energy4

ev_energy5

ev_energy6

ev_pulse1

ev_pulse2

ev_pulse3

ev_pulse4

ev_pulse5

ev_pulse6

ch1

ch2

ch3

ch4

ch5

ch6

gclk

L1A

L1A

_lat

ency

loca

l_tr

igge

r

mat

chin

g_w

indo

w

rst

times

tam

p

M2

tstamp_fifoevcount_fifo

dma2ro_fifo

ro2dma_fifo

DPRAM

READOUT_ENGINE

DMA_CONTROLLER

cmd (7:0)cmd (7:0)

ch1

(13:

0)ch

1 (1

3:0)

ch2

(13:

0)ch

2 (1

3:0)

ch3

(13:

0)ch

3 (1

3:0)

ch0

(13:

0)ch

0 (1

3:0)

bus_portbus_port

backpressurebackpressure

even

t_nu

m (

23:0

)ev

ent_

num

(23

:0)

spy (15:0)spy (15:0)

ch7

(13:

0)ch

7 (1

3:0)

ch6

(13:

0)ch

6 (1

3:0)

ch5

(13:

0)ch

5 (1

3:0)

ch4

(13:

0)ch

4 (1

3:0)

ch11

(13

:0)

ch11

(13

:0)

ch10

(13

:0)

ch10

(13

:0)

ch9

(13:

0)ch

9 (1

3:0)

ch8

(13:

0)ch

8 (1

3:0)

rstrst rstrst

L1A

L1A L1

A

L1A

L1A

_lat

ency

(15

:0)

L1A

_lat

ency

(15

:0)

L1A

_lat

ency

(15

:0)

L1A

_lat

ency

(15

:0)

loca

l_tr

igge

r

loca

l_tr

igge

r

loca

l_tr

igge

r

loca

l_tr

igge

r

mat

chin

g_w

indo

w (

15:0

)

mat

chin

g_w

indo

w (

15:0

)

mat

chin

g_w

indo

w (

15:0

)

mat

chin

g_w

indo

w (

15:0

)

times

tam

p (4

7:0)

times

tam

p (4

7:0)

times

tam

p (4

7:0)

times

tam

p (4

7:0)

times

tam

p (4

7:0)

times

tam

p (4

7:0)

addr_inaddr_inen

ergy

_ch0

energy_ch0

ener

gy_c

h0

energy_ch0

ener

gy_c

h1

energy_ch1

ener

gy_c

h1

energy_ch1

ener

gy_c

h2

energy_ch2

ener

gy_c

h2

energy_ch2

ener

gy_c

h3

energy_ch3

ener

gy_c

h3

energy_ch3

ener

gy_c

h4

energy_ch4

ener

gy_c

h4

energy_ch4

ener

gy_c

h5

energy_ch5

ener

gy_c

h5

energy_ch5

ener

gy_c

h6

energy_ch6

ener

gy_c

h6

energy_ch6

ener

gy_c

h7

energy_ch7

ener

gy_c

h7

energy_ch7

ener

gy_c

h8

energy_ch8

ener

gy_c

h8

energy_ch8

ener

gy_c

h9

energy_ch9

ener

gy_c

h9

energy_ch9

ener

gy_c

h10

energy_ch10

ener

gy_c

h10

energy_ch10

ener

gy_c

h11

energy_ch11

ener

gy_c

h11

energy_ch11

puls

e_ch

0

pulse_ch0

puls

e_ch

0

pulse_ch0

puls

e_ch

1

pulse_ch1

puls

e_ch

1

pulse_ch1

puls

e_ch

2

pulse_ch2

puls

e_ch

2

pulse_ch2

puls

e_ch

3

pulse_ch3

puls

e_ch

3

pulse_ch3

puls

e_ch

4

pulse_ch4

puls

e_ch

4

pulse_ch4

puls

e_ch

5

pulse_ch5

puls

e_ch

5

pulse_ch5

puls

e_ch

6

pulse_ch6

puls

e_ch

6

pulse_ch6

puls

e_ch

7

pulse_ch7

puls

e_ch

7

pulse_ch7

puls

e_ch

8

pulse_ch8

puls

e_ch

8

pulse_ch8

puls

e_ch

9

pulse_ch9

puls

e_ch

9

pulse_ch9

puls

e_ch

10

pulse_ch10

puls

e_ch

10

pulse_ch10

puls

e_ch

11

pulse_ch11

puls

e_ch

11

pulse_ch11

puls

e_re

10

pulse_re10

puls

e_re

10

pulse_re10

puls

e_re

9

pulse_re9

puls

e_re

9

pulse_re9

puls

e_re

8

pulse_re8

puls

e_re

8

pulse_re8

puls

e_re

7

pulse_re7

puls

e_re

7

pulse_re7

puls

e_re

6

pulse_re6

puls

e_re

6

pulse_re6

puls

e_re

5

pulse_re5

puls

e_re

5

pulse_re5

puls

e_re

4

pulse_re4

puls

e_re

4

pulse_re4

puls

e_re

3

pulse_re3

puls

e_re

3

pulse_re3

puls

e_re

2

pulse_re2

puls

e_re

2

pulse_re2

puls

e_re

1

pulse_re1

puls

e_re

1

pulse_re1

puls

e_re

0

pulse_re0

puls

e_re

0

pulse_re0

ener

gy_r

e11

energy_re11

ener

gy_r

e11

energy_re11

ener

gy_r

e10

energy_re10

ener

gy_r

e10

energy_re10

ener

gy_r

e9

energy_re9

ener

gy_r

e9

energy_re9

ener

gy_r

e8

energy_re8

ener

gy_r

e8

energy_re8

ener

gy_r

e7

energy_re7

ener

gy_r

e7

energy_re7

ener

gy_r

e6

energy_re6

ener

gy_r

e6

energy_re6

ener

gy_r

e5

energy_re5

ener

gy_r

e5

energy_re5

ener

gy_r

e4

energy_re4

ener

gy_r

e4

energy_re4

ener

gy_r

e3

energy_re3

ener

gy_r

e3

energy_re3

ener

gy_r

e2

energy_re2

ener

gy_r

e2

energy_re2

ener

gy_r

e1

energy_re1

ener

gy_r

e1

energy_re1

ener

gy_r

e0

energy_re0

ener

gy_r

e0

energy_re0

puls

e_re

11

pulse_re11

puls

e_re

11

pulse_re11

data_indata_in

token_out_dmatoken_out_dma

token_in_dmatoken_in_dma

addr_out

addr_out

addr_out

addr_out

data_out

data_out

data_out

data_out

token_out_rotoken_out_ro

token_in_rotoken_in_ro

bus_clk

bus_clk

bus_clk

bus_clk

bus_clk

bus_clk

bus_clk

bus_clk

gclk

gclk

gclk

gclk

gclk

gclkgc

lk

gclkgclk

gclk

gclk

gclk

gclk

gclk

gclkgc

lk

gclkgclk

rd_enable_c

rd_e

nabl

e_c

rd_enable_c

rd_e

nabl

e_c

wr_enable_c

wr_

enab

le_c

wr_enable_c

wr_

enab

le_c

rd_enable_t

rd_e

nabl

e_t

rd_enable_t

rd_e

nabl

e_t

wr_enable_t

wr_

enab

le_t

wr_enable_t

wr_

enab

le_t

tst_out

tst_

out

tst_out

tst_

out

evc_out

evc_

out

evc_out

evc_

out

Fig. 4 – Block diagram of the “Carrier” box of fig. 3.

The mezzanines M1 and M2 are a six-fold instance of the block diagram of a single channel, shown in fig. 5.

M. Bellato – INFN Padova 12

Draft version 1.3 – 11 November 2005

channel

ev_energy

ev_pulse

ch

ev_energy_re

ev_pulse_re

gclk

L1A

L1A_latency

local_trigger

matching_window

timestamp

rst

C1

ch1ch1

ev_pulse1ev_pulse1

ev_pulse_re1ev_pulse_re1ev_energy_re1ev_energy_re1

ev_energy1ev_energy1gclkgclk

matching_windowmatching_window

timestamp (15:0)timestamp (15:0)

L1AL1A

L1A_latencyL1A_latency

local_triggerlocal_trigger

rstrst

Fig. 5 – Block diagram of the “Mezzanine” box of fig. 4.

4.3.2 The Channel Structure At the heart of the simulation code is the channel structure, as shown in fig. 6. Samples from the segment contact enter both a delay line and a Moving Window Deconvolution (MWD) box that computes the energy of the pulse and stores the value in a fifo (en_fifo) for later readout. Upon arrival of a local trigger, a pulse controller box (pulse_cntr) moves a predefined number of samples from the delay line into an event fifo (ev_fifo), thus isolating the rising edge of the pulse.

M. Bellato – INFN Padova 13

Draft version 1.3 – 11 November 2005

tstamp_fifo

delay_fifo

ch_fifo

ev_fifo

en_fifo

pulse_cntr

trigger_matching

MWD

energy_cntr

tstamp_in

ch (13:0)ch (13:0)

L1AL1A

writ

e_en

able

_tw

rite_

enab

le_t

puls

e_ou

tpu

lse_

out

times

tam

p_ou

ttim

esta

mp_

out

writ

e_en

able

_pw

rite_

enab

le_p

local_triggerlocal_trigger

read

_ena

ble_

tre

ad_e

nabl

e_t

ener

gy_o

uten

ergy

_out

writ

e_en

able

_en

writ

e_en

able

_en

read_enable_dread_enable_d

read_enable_cread_enable_c

write_enable_evwrite_enable_ev timestamp (15:0)

times

tam

p (1

5:0)

timestamp (15:0)

timestamp (15:0)

times

tam

p (1

5:0)

timestamp (15:0)

ch_o

utch

_out

gclk

gclk

gclk

gclk

gclk

gclkgclk

gclk

gclk

gclk

gclk

gclk

gclk

gclk

gclk

gclk

gclk

gclkgclk

gclk

gclk

gclk

gclk

gclk

ev_pulseev_pulseev_pulse_reev_pulse_re

ev_energyev_energyev_energy_reev_energy_re

L1A_latencyL1A_latency

matching_windowmatching_window

latch_energy

latch_energy

latch_energy

latch_energy

rst

rst

rst

rst

rst

rst

rst

rst

rst

rst

Fig. 6 – The channel structure.

4.3.3 Trigger Matching The arrival of a trigger validation instead enables the storage of the current GTS time into a timestamp fifo (tstamp_fifo) to be used by the trigger matching engine. The trigger matching engine is in charge of correlating the trigger validations stored in the tstamp_fifo with candidate local triggers that have triggered the storage of samples into the ev_fifo. In other words, trigger matching is a time match between a trigger validation time tag and the local trigger time tags themselves. The idea is shown in figure 7.

M. Bellato – INFN Padova 14

Draft version 1.3 – 11 November 2005

Fig. 7 – Trigger matching technique.

Local triggers and subsequent trigger validations differ in time for a number of reasons. The most obvious is that the latters are the result of logical operations performed on the formers, so they are intrinsically time ordered. There is a constant time difference between the two due to hardware infrascture that transports local trigger signals to a central trigger processor for validation generation. There is also a variable time difference between the two due to the mechanism of local trigger generation which depends somehow on the amount of charge induced by gamma interaction in each single crystal. As a consequence, the trigger matching procedure must be done inside a window, the match window, as show in the following figure.

Fig. 8 – Window based Trigger matching.

At the time of a validation arrival (‘trigger input’ in fig. 8), from the current value of GTS clock a trigger latency value is subtracted. The result is an estimate of the time at which the candidate local triggers have happened. From that time on, for a period that equals the match window, all events whose time tags fall inside the window get validated and will be moved in the readout fifo. The events whose time tags fall before the time window can be discarded from the input fifo because they have no chance to be validated in future. Those events instead, with time tags that are younger than the match window will remain in place due to the possibility of being validated at a later time.

M. Bellato – INFN Padova 15

Draft version 1.3 – 11 November 2005

4.3.4 Simulation Results

5. Global Timestamp Protocol

To be filled …

6. Global Trigger Algorithms

To be filled …

M. Bellato – INFN Padova 16

Draft version 1.3 – 11 November 2005

7. Implementation AGATA GTS will have a tree topology, originating from the root node that will therefore act at the same time as the source of all global information (clock, timestamps, commands, L1 validations) and the sink of all trigger requests, fast monitoring signals and service requests coming from the crystals.

Fig. 9 – GTS topology

To solve the problems of building a bi-directional, high capacity and high speed tree network that drives hundreds of nodes displaced several tenths of meters apart, a certain number of technological issues must be addressed. Among them, the fan-out of a source synchronous transmission, noise immunity, low error rate and throughput. Modern serial transceivers, as used in commercial high speed telecom networks, can solve part of them. Simply stated, these devices basically transfer a digital pattern from one side of a transmission channel to the other (and viceversa) by serializing the pattern at a speed that equals the input clock frequency times the pattern width; the serialized pattern is then reconstructed identical at the receiving side of the transmission line by means of a serial to parallel conversion. AGATA GTS may greatly benefit of the technological solutions devoted to high speed serial transmissions, because, by exploiting the use of these components, the design can be kept simple yet adequate. The solution proposed is sketched in the following figure.

M. Bellato – INFN Padova 17

Draft version 1.3 – 11 November 2005

GTSSupervisor

GTS Fanin-Fanout

GTS Transceivermezzanine

WEB Interface

ATCA backplane

Fig. 10 – GTS technology and implementation

The hierarchy is composed of five different parts:

1. the root node 2. the backplane 3. the fanin-fanout nodes 4. the fibre connections 5. the mezzanine interface

7.1 The root node This is the global time reference source and trigger processor. It might actually split in two boards, one for each direction of transmission. It sits in the central slot(s) of a (dual) star backplane and sources:

a) global clock

M. Bellato – INFN Padova 18

Draft version 1.3 – 11 November 2005

b) global timestamps (global clock counter, global event counter) c) fast commands

and sinks: a) trigger requests b) fast monitoring feedback signals c) error notifications

A block diagram is shown in the next figure:

Fig. 11 – Block diagram of GTS root node

7.2 The backplane Advanced Telecommunication Architecture (ATCA) is a telecom specification (PICMG3.0) [2] aimed at standardizing the connectivity backplane for high speed, high throughput computing and switching devices. The specification is geared towards serial switched technology and is of interest for AGATA GTS. Backplanes following ATCA specification have dual star (or full mesh) topology

M. Bellato – INFN Padova 19

Draft version 1.3 – 11 November 2005

with the two central slots as the centres of the two stars and 6+6 slots on the periphery (14 peripheral slots is also an option). Each slot is connected to the centre by means of a certain number of matched impedance PCB traces (whose differential skew is less than 10ps) and routed to sustain a bit rate in excess of 3 gigabits/s per pair.

7.3 The Fanin-fanout nodes These nodes act as splitting-combining nodes. When splitting, they replicate the information originated in the root node in each of the channels that they address; when combining, they merge the information originated in the crystals and send it to the root node. Each node is actually a board sitting in one of the 12 peripheral slots of the ATCA backplane and addresses 16 crystals. A block diagram of the board is show in the following figure:

Fig. 12 – Block diagram of GTS fanin-fanout module

7.4 The fibre connections Long distance transmission, noise immunity, galvanic isolation and low bit error rate (BER) will benefit from the use of serial optical transceivers and fibre connections between each channel of

M. Bellato – INFN Padova 20

Draft version 1.3 – 11 November 2005

the fanin-fanout nodes and the mezzanine interfaces that are hosted in the front-end electronics of each crystal. 7.5 The mezzanine interface This is the block diagram of the GTS interface at the crystal level (or any ancillary detector). It decodes the information originated from the root node in one direction and encodes the information originated in the crystal in the opposite direction. The board is centered around a Xilinx Virtex2pro-XC2VP7-FF672 FPGA which hosts the custom logic for GTS operation. Figure 13 shows the main functional blocks of the card.

XC2VP7-6 FPGA

DDR SDRAMtesting

FLASH2x

SDRAM2x

Flash ROM1x

RocketIOHSSDC2

connector 4x

Temperaturesensor

RocketIOHSSDC2

connector 4x

RocketIOHSSDC2

connector 4x

SFP

4x

LED6xLED6xLED6xLED6xLED6xLED6x

transceivers

Boot ROM1x

LLP IOMictor connectors

132

FLASH2x

Delay Lines2x

PLL Filter

Time to DigitalConverter

JTAG headerEthernet

TP-10/100

Fig. 13 – Block diagram of GTS mezzanine interface

The board features:

• A Xilinx VirtexII-pro XC2VP7-FF672-6 field programmable gate array with 8 serializers/deserializers (rocketIO’s) capable of 3.125Gb/s operation and an embedded PowerPC cpu.

• Up to four pluggable optical transceivers interfacing the rocketIO channels of the FPGA to the fiber tree of the GTS system

• 128Mbytes of RAM and 16Mbytes of flash ROM for the embedded PowerPC microprocessor and operating system support

• A flash ROM for FPGA configuration • Two programmable digital delay lines for phase adjustment of the GTS global clock • A high performance PLL with very low loop bandwith for jitter removal of the GTS clock • A twisted-pair Ethernet channel for slow control operation • Two mictor-type connectors for Local Level Processing and Ancillary Detector interfacing • A time-to-digital converter for time-of-flight pulse measurements during GTS global clock

phase equalization procedure.

M. Bellato – INFN Padova 21

Draft version 1.3 – 11 November 2005

A picture of the board is show in figure 14. Actually two implementations are shown, with one and four optical transceivers.

Fig. 14 – Picture of the GTS mezzanine interface

The form factor adheres to the Common Mezzanine Card (CMC) standard.

7.5.1 GTS Mezzanine operation The basic tasks that the mezzanine fulfils are listed below:

• It must reconstruct the global GTS clock, 48bit timestamp, level 1 trigger validations and global commands coming from the GTS root node through the fiber and a rocketIO channel into the FPGA.

• It must source the reconstructed global clock, 48bit timestamp and global commands to the LLP or ancillary node through the mictor connectors.

• It must feed the root node with a 1 microsecond paced heartbeat messages (“idle” commands) to be used in the central trigger processor.

• It must provide an Ethernet based slow control connection.

M. Bellato – INFN Padova 22

Draft version 1.3 – 11 November 2005

• It must provide a mechanism for setting the reconstructed global clock to an arbitrary phase in a 10 ns interval.

• It must source the reconstructed global clock with a sufficiently low phase noise so as to be readily used for multi-gigabit transceivers operation by the LLP or ancillary node.

• It must provide a mechanism for accepting, tagging and storing local trigger requests coming from the LLP or ancillary node according to a well known, fixed latency handshake protocol.

• It must route tagged trigger requests, status informations and trigger throttle control commands originating from the LLP or ancillary node to the GTS root node via a multi-gigabit transceiver channel and optical transceiver.

• It must implement the trigger matching algorithm, that is correlating trigger validation tags coming from the root node with stored trigger requests and in case of matching or rejection, it must inform the LLP or ancillary node according to a well known handshake protocol.

• It must provide a mechanism for measuring the round trip times of pulses originating in the GTS root node and reaching the Digitizers clock distribution boards for the purpose of phase equalization at the Digitizers nodes.

• It must provide all the necessary resources for the internal CPU to run a high level operating system with a full TCP/IP network connection.

Hence, the role of the GTS mezzanine as global synchronization and control agent becomes evident.

7.5.2 A VHDL model of the GTS Mezzanine As stated before, the role of the Xilinx FPGA in the mezzanine is to host both the glue logic for interfacing the board hardware resources and the custom logic for the value added operations listed above. The following figure shows a block diagram of this logic. Actually the diagram hides a hierarchy of blocks, each taking care of a basic subtask, and deeply relating each other by means of signals. The hierarchy is a graphical representation of an underlying vhdl model that is synthezisable for the VirtexII-pro technology. The aim of the model is also to provide a tool for checking the correctness of mezzanine operation by means of simulation. In fact, the model is part of a broader simulation strategy whose target is to develop and deploy a software framework for the validation of the whole GTS tree. Appendix D reports an expanded view of the hierarchical block diagram, together with a detailed explanation of the interplay among the different subblocks. Fig.15 shows the top level interface of the mezzanine vhdl model (“GTS_TOP”) interfaced, by means of two complementary pairs (“fiber_in” and “fiber_out”), to the root node model (“ROOT”) which acts as a testbench exerciser. The task of the root node model is to provide the necessary setup informations (reset commands) and 8-bit coded timestamp pattern to the mezzanine and subsequently to validate all trigger requests coming from it, thus emulating a multiplicity 1 trigger condition. The red-coloured signals put in evidence the LLP-side interface, whose name, width, direction and electrical standard has been completed specified with a LLP document [ ]. The full model is browsable at the web site : http://agata.pd.infn.it in the GTS reserved section.

M. Bellato – INFN Padova 23

Draft version 1.3 – 11 November 2005

GTS_top

trigger_rejection

tstamp_err

trigger_validation

local_trigger

lreset

trigger_requestbcast_out

msg_inval_rej_tag

local_tag

bcast_strobemsg_strobe

backpressuretag_strobe

lt_strobe

gts_clk

reject_window

fiber_in_n

fiber_in_p

fiber_out_n

fiber_out_p

lclk

GTS_TOP

"0000000100000000"C7

ROOT

idle_tag

trg_tag

fiber_out_n

fiber_out_p

trg_request

idle

fiber_in_n

fiber_in_p

enable

event_num

lreset

val_tag

ocxo_clk

backpressure_in

send_commacmd_ack

cmd_strobe

command

ROOT

tstamp_errtstamp_err

trigger_validationtrigger_validation

trigger_rejectiontrigger_rejection

val_rej_tag (7:0)val_rej_tag (7:0)

local_tag (7:0)local_tag (7:0)

lt_strobelt_strobe

tag_strobetag_strobe

trigger_request (1:0)trigger_request (1:0)

local_trigger (1:0)local_trigger (1:0)

msg_in (7:0)msg_in (7:0)

msg_strobe (1:0)msg_strobe (1:0) bcast_strobebcast_strobe

gts_clkgts_clk

RED SIGNALS <=> MICTOR CONNECTORS

bcast_out (7:0)bcast_out (7:0)

S28 (15:0)S28 (15:0)reject_window = 256 clock cycles

fiber_in_pfiber_in_p

fiber_in_nfiber_in_n

fiber_out_nfiber_out_n

fiber_out_pfiber_out_p

idle_tag (15:0)idle_tag (15:0)

trg_tag (15:0)trg_tag (15:0)

trg_requesttrg_request

backpressure_inbackpressure_in

idleidle

enableenable

event_num (23:0)event_num (23:0)

val_tag (15:0)val_tag (15:0)

backpressurebackpressure

send_commasend_comma

ocxo_clk

ocxo_clk

ocxo_clk

ocxo_clk

lreset

lreset

lreset

lreset

cmd_ackcmd_ack

command (3:0)command (3:0)

cmd_strobecmd_strobe

Fig. 15 – Top level block diagram of the FPGA operation, together with its testbench

M. Bellato – INFN Padova 24

Draft version 1.3 – 11 November 2005

8. Phase equalization of GTS clocks The GTS clock, together with its associated timestamp, constitute the time reference for the whole AGATA system. In each crystal, a replica of the main GTS clock, sourced through a dedicated GTS interface mezzanine, feeds the analog-to-digital converters for digitization of segments signals from the charge preamplifiers. The same clock feeds also the communication devices that transfer the relevant information from the crystals up to the local level processing hardware. Moreover, as the whole data acquisition process relies totally on timestamps for the event building process, it is easy to understand that the GTS system must source replicas of its main clock and timestamp that are time aligned at each crystal level. Failing to do so would imply a number of consequences, the most obvious of which is the incorrect time tagging of event fragments and hence incorrect event building, but also potentially incorrect trigger processing. Hence, the problem of clock phase equalization is a crucial issue in AGATA. The reasons behind the misalignment of clock phases among different crystals are multiple. To cite a few there are: different PCB trace lengths, different fiber lengths, different propagation delays of active devices, different routes inside programmable logic devices, different process-voltage-temperature corners of active devices, different equalization fifo’s depths of serializer-deserializer (serdes) devices used throughout the GTS system and many others. While a detailed model of clock phase mismatches is beyond the scope of this document, from what stated above it is obvious that we need to devise a method for diagnose and take care of phase misalignement. An analysis of the problem shows that the contributions to the phase mismatch can be divided in four categories:

Type of Contribution Reason 1 Fixed PCB trace lengths, propagation delays, …

2 Static Equalization fifo’s of serdes devices

3 Slowly changing Temperature, optical phase dispersion, …

4 Rapidly changing Power supply ripple, …

Table 2: Contributions to phase mismatch. While the contributions that fall in the first category can be measured once, those falling in the last three categories must be diagnosed at run-time and fixed with a fast and automated procedure, so that the contribution to the global system inefficiency is negligible. In AGATA, two different methods of phase equalization are currently under investigation:

8.1 Continuous resets method The situation is the following: in the root node of the GTS tree the global clock and associated timestamp are generated and broadcasted to end nodes. In the end nodes the global clock is reconstructed and phase locked with the clock of the root node but with an unpredictable phase.

M. Bellato – INFN Padova 25

Draft version 1.3 – 11 November 2005

Mezzanine Root node

Tx

MGT Opt Fiber

Rx Fig. 15 – Root node – End node pair with loopback

The main reason of phase uncertainty is contributions of type 2, e.g. due to the nature of clock data recovery circuit (CDR) of serdes devices. By putting a loopback in one end node, from the root node we measure the round trip time of a pulse and try to devise a method from which we can safely assume that the downlink and uplink channels are symmetric and hence the measurement (divided by two) equals the time of flight of a pulse from the root node to that end node. By repeating the measurement for each end node we will know how to setup programmable delay lines at the end nodes in order to reach phase equalization for all crystals. From laboratory test, we can basically assume that the phase of recovered clock at one end node is almost uniformly distributed in one clock period. If we take the phase of recovered clock as a random variable, its probability density function looks like in figure 16

t1+T/2 t1

P(t) 1/T t1-T/2 t Fig. 16 – Ideal probability density function of recovered clock phase at the end node

Now if we consider the transceiver pair with loopback, as in figure 15 and we consider the phase of the recovered clock at the root node as a random variable, we might assume that it is the sum of two identical variables (as long as contributions of type 1 to phase mismatch have been minimized by design).

Root tx

Root rx

Fig. 17 – Global clock and recovered clock at the root node The channels are in fact mutually independent because the transmitter clock is independent from the recovered clock and viceversa, so the probability density function should look like:

M. Bellato – INFN Padova 26

Draft version 1.3 – 11 November 2005

P(t)

2*t1-T 2*t1+T 2*t1

1/T

Fig. 18 – Ideal probability density function of recovered clock phase at the root node

t In order to obtain the single channel latency dividing the total round trip time (RTT) by two, we have to look for a situation of maximum symmetry between the channels. The idea is to perform several reset-and-measure cycles until the RTT measure is close to the expected maximum or minimum. When this happens one can safely assume that the channels times of flight are respectively both close to their maximum or minimum and we can divide by two the RTT value to obtain the individual channel latency.

8.1.1 Test results The test setup is illustrated in the following figure.

P1

100 MHz OSC

Fig. 19 – The test stand for the continous resets method.

TX1

RX1

RX2

TX2 SHIFTER / FILTER

RECOV. CLOCK

DATA

100m FIBRE

DATA PATTERN

DATA OUT

RDOUT CLK

RDOUT CLK

TD

P3

“ROOT” “NODE”

RocketIO MGTRocketIO MGT

P2

oscilloscope

ASYMMETRY = |TD-TU| / 10 [ns]

TU

TL

M. Bellato – INFN Padova 27

Draft version 1.3 – 11 November 2005

A data pattern P1 is injected at the root node in the downlink. At the end node the same pattern (P2) is recovered from the serial stream and its time of flight is measured by means of an oscilloscope (triggered by the time of occurrence of P1). The pattern is loopbacked towards the root node, recovered (P3) and its time of flight measured in the same way. In figure 20 the latency distributions of the downlink (TD) and uplink (TU) in a clock cycle are reported. In figure 21 the latency distribution of the loop (TL) in a clock cycle is reported.

0

200

400

600

800

1000

1200

1400

1600

835

837

838

840

841

843

844

846

847

849

850

852

853

855

latency [ns]

freq

. [#]

0

100

200

300

400

500

600

700

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

latency [ns]

freq

. [#]

Fig. 20 – Latency distribution of downlink (TD – upper) and uplink (TU – lower).

0

200

400

600

800

1000

1200

1673

1675

1677

1679

1681

1683

1685

1687

1689

1691

1693

1695

1697

1699

1701

1703

latency [ns]

freq

. [#]

Fig. 21 – Latency distribution of the loop (TL).

M. Bellato – INFN Padova 28

Draft version 1.3 – 11 November 2005

By defining the degree of asymmetry as:

)(10 nsTUTD

Asymmetry−

=

one can plot the degree of asymmetry of the loop versus the global latency as in figure 22:

0,0%

50,0%

100,0%

150,0%

200,0%

250,0%

300,0%

350,0%

1670

1671

1672

1673

1674

1675

1676

1677

1678

1679

1680

1681

1682

1683

1684

1685

1686

1687

1688

1689

1690

1691

1692

1693

1694

1695

1696

1697

1698

1699

1700

1701

1702

1703

1704

1705

latency [ns]

asym

met

ry

Fig. 22 – Loop asymmetry versus global latency.

The measurements shown in figure 22 have been obtained by performing more than 11.000 reset-and-measure cycles on the test stand of figure 19 with an automated procedure. This picture deserves some useful comments:

Two minima of asymmetry are evident at the extremes of the global latency interval [1677..1696] ns. This means that, at those minima (which correspond respectively to the global minimum and maximum of the loop latency) the downlink and uplink latency are almost equal and the two channels (uplink and downlink) are symmetric. Hence, we can estimate the end node clock phase with a measurement procedure performed at the root node.

The resolution of the measurements is 500 ps, which corresponds to the clock cycle of the serial stream.

There is a certain number of points outside the [1677..1696] ns interval. These points are due to computation errors in the automation procedure (less than one hundred out of more than eleven thousands).

There is a pedestal in the plot greater than zero, e.g. the minima of asymmetry at the extremes of the interval are not zero. In other words there is a fixed contribution to asymmetry, e.g. a systematic error due to contribution of type 1 (PCB traces and cable mismatches, propagation delays impairments, impedance mismatches, etc.). This type of contribution can easily be measured from a plot like the one in fig. 22. Once subtracted, the plot in fig. 22 becomes a perfect triangle and the minima at the extremes of the interval become global minima.

Due to the nature of our test stand, the measurements shown in fig. 22 took more than two hours to be performed. While an investigation is currently under way to speed up the measurement

M. Bellato – INFN Padova 29

Draft version 1.3 – 11 November 2005

process, it is evident that it is in the nature of the method itself to perform repetitive resets of serdes devices until a preferred latency is measured. That is, the method can be time consuming because we cannot foresee the time it takes to converge. For this reason it is not obvious that it might be used online, for periodic calibrations of the whole GTS distribution tree at beam time.

8.2 Direct Measurement

8.2.2 Test Results

M. Bellato – INFN Padova 30

Draft version 1.3 – 11 November 2005

9. References [1] – I. Lazarus editor - AGATA Pre-Processing Hardware – Draft. Document – 17/12/2003 [2] – PICMG 3.0 – Advanced Telecommunication Computing Architecture – http://www.picmg.org

10. Revision History 12 Jan 2004 -- Draft version 1.0. 10 Aug 2004 -- Draft version 1.1. 24 Oct 2004 -- Draft version 1.2. 11 Nov 2005 -- Draft version 1.3.

M. Bellato – INFN Padova 31

Draft version 1.3 – 11 November 2005

Appendix A. GTS Mezzanine Interface Pin-out The GTS mezzanine interface pin-out is specified as:

Name Direction No. of bits Description

Clock100 Out 1 Global clock

Clock100des Out 1 Global clock phase adjustable

Timestamp Out 48 Global clock counter

Ev_counter Out 16 Event counter

Scl In 1 I2C interface

Sda In/Out 1 I2C interface

Trigger_valid Out 1 Level 1 trigger validation

Trigger_req In 1 Trigger Request

EvCnt_str Out 1 Event Counter strobe

BrdCst_in Out 8 Broadcast Command bus from root node

BrdCst_in_str Out 1 Broadcast bus strobe

BrdCst_out In 8 Broadcast Command bus towards root node

BrdCst_out_str In 1 Broadcast bus strobe

Error Out 1 Transmission error

Master_reset In 1 Mezzanine reset

Ethernet[0-3] In/Out 4 Ethernet connection

M. Bellato – INFN Padova 32

Draft version 1.3 – 11 November 2005

Appendix B. Costs

B.1 Single Crystal prototype costs.

For single crystal prototype operation only 1 GTS Mezzanine card (working in emulation mode) is needed. GTS Mezzanine costs are detailed in the Pre-Processing Hardware Specs. [ ]

B.2 Demonstrator costs.

For demonstrator operation (15 crystals) the following items are needed:

• 15 Mezzanine interfaces (costs included in the Pre-processing Hardware Specs [ ])

• 1 Fanin-fanout board (16 channels) • 1 Root Node (possibly splitted in two – see before) • 1 ATCA crate

Root node(s) Qty Unit price Cost Pcb Layout 2 €5000 €10000 Pcb 2 €1000 €2000 VirtexII pro 2 €2000 €4000 Memory 2 €10 €20 Flash mem 2 €100 €200 LVDS serdes 24 €20 €480 Power supply 2 €100 €200 Connectors €300 €600 PLL 2 €120 €240 Misc 2 €300 €600 Total €18400

Fanin-Fanout Qty Unit price Cost Pcb Layout 1 €5000 €5000 Pcb 1 €1000 €1000 VirtexII pro 1 €2000 €2000 VirtexII pro V70 2 €500 €1000 Memory 1 €10 €10 Flash mem 1 €100 €100 Optical Transceivers 16 €250 €4000 Power supply 1 €100 €200 Connectors €300 €300 PLL 2 €120 €240 Misc 1 €300 €300 Total €14150

M. Bellato – INFN Padova 33

Draft version 1.3 – 11 November 2005

Assuming that, at the prototyping stage, two runs are needed to obtain a working system, then the costs for hardware for the demonstrator are (€18400 + €14150) X 2 = €65100 for the boards and €20000 for one ATCA crate. In total €85100.

Instrumentation & Software

Qty Price

TDR test Jig 1 €4000 Differential Probe 1 €4000 DSO 4GHZ 1 €45000 2nd ATCA crate 1 €20000 Optical Power Meter 1 €2000 ATCA extender 1 €2000 Xilinx Dev Kit 1 €5000 Xilinx ISE 1 €500/year Synplify Pro 1 €1000/yearVisualHDL 1 €2000/yearSynopsys 1 €2000/yearTotal €87500

NRE costs for Xilinx IP Cores are detailed in the Pre-Processing Hardware Specs.

M. Bellato – INFN Padova 34

Draft version 1.3 – 11 November 2005

Appendix C. Timescale and manpower Timescale and manpower for the GTS Mezzanine are detailed in the Pre-Processing Hardware Specs [1]. It is reported here for sake of completeness:

C.1 GTS interface mezzanine

Task Time (months)

Manpower (1 man)

PCB layout & Assembly 6 1 Testing FPGA development 9 1 Software Total 15 1

For the root node and fanin-fanout boards the following figures hold:

C.2 Root Node

Task Time (months)

Manpower (1 man)

PCB layout & Assembly 4 1 Testing 1 1 FPGA development 12 1 Software 7 1 Total 24 1

C.3 Fanin-fanout Node

Task Time (months)

Manpower (1 man)

PCB layout & Assembly 4 1 Testing 1 1 FPGA development 12 1 Software 7 1 Total 24 1

M. Bellato – INFN Padova 35

Draft version 1.3 – 11 November 2005

Appendix D. The GTS Mezzanine VHDL model The GTS mezzanine is the leaf node of the Global Trigger and Synchronization tree and plays the role of an intelligent interface between the front-end electronics and the trigger complex. The following is a description of the vhdl circuit structure that resides in the field programmable gate array in the mezzanine.

GTS_top

trigger_rejection

tstamp_err

trigger_validation

local_trigger

lreset

trigger_requestbcast_out

msg_inval_rej_tag

local_tag

bcast_strobemsg_strobe

backpressuretag_strobe

lt_strobe

gts_clk

reject_window

fiber_in_n

fiber_in_p

fiber_out_n

fiber_out_p

lclk

GTS_TOP

"0000000100000000"C7

ROOT

idle_tag

trg_tag

fiber_out_n

fiber_out_p

trg_request

idle

fiber_in_n

fiber_in_p

enable

event_num

lreset

val_tag

ocxo_clk

backpressure_in

send_commacmd_ack

cmd_strobe

command

ROOT

tstamp_errtstamp_err

trigger_validationtrigger_validation

trigger_rejectiontrigger_rejection

val_rej_tag (7:0)val_rej_tag (7:0)

local_tag (7:0)local_tag (7:0)

lt_strobelt_strobe

tag_strobetag_strobe

trigger_request (1:0)trigger_request (1:0)

local_trigger (1:0)local_trigger (1:0)

msg_in (7:0)msg_in (7:0)

msg_strobe (1:0)msg_strobe (1:0) bcast_strobebcast_strobe

gts_clkgts_clk

RED SIGNALS <=> MICTOR CONNECTORS

bcast_out (7:0)bcast_out (7:0)

S28 (15:0)S28 (15:0)reject_window = 256 clock cycles

fiber_in_pfiber_in_p

fiber_in_nfiber_in_n

fiber_out_nfiber_out_n

fiber_out_pfiber_out_p

idle_tag (15:0)idle_tag (15:0)

trg_tag (15:0)trg_tag (15:0)

trg_requesttrg_request

backpressure_inbackpressure_in

idleidle

enableenable

event_num (23:0)event_num (23:0)

val_tag (15:0)val_tag (15:0)

backpressurebackpressure

send_commasend_comma

ocxo_clk

ocxo_clk

ocxo_clk

ocxo_clk

lreset

lreset

lreset

lreset

cmd_ackcmd_ack

command (3:0)command (3:0)

cmd_strobecmd_strobe

Fig. D1 – Top level block diagram of the mezzanine model.

At the top level is the interface between the root node and the mezzanine on one side and between the mezzanine and the front-end electronics on the other side. For the sake of clarity the LLP/ancillary interface signals are coloured in red. The model depicted in Fig. D1 shows two complementary pairs that connect the root node and the mezzanine in both directions. The pairs are the inputs and outputs of a couple of multi-gigabit transceiver pair connected back-to-back and simulated in the model by means of an encrypted SmartModel core distributed by Xilinx. The model is missing the description of (and hence the interaction with) the slow control part, e.g. the microprocessor, which is an integral part of the mezzanine and takes care of crucial tasks as the global clock phase equalization and online monitoring. All the relevant setup informations are given to the mezzanine through signals or messages issued from the root node.

M. Bellato – INFN Padova 36

Draft version 1.3 – 11 November 2005

trigger_match

T_valreject_ctrl

RTX

T_request_ctrl

fifo_freelist

dout

empty

full

wr_data_count

din

rd_clk

rd_en

rst

wr_clk

wr_en

FIFO_FREELIST

dpram_tstamp

doutb

addra

addrb

clka

clkb

dina

dinb

wea

web

DPRAM

fifo_l1a

dout

empty

full

rd_data_count

wr_data_count

din

rd_clk

rd_en

rst

wr_clk

wr_en

FIFO_L1A

L1A_fifo_full

L1A_fifo_full

L1A_fifo_full

L1A_fifo_full

L1A_fifo_wr_en

L1A_fifo_wr_en

L1A_fifo_wr_en

L1A_fifo_wr_en

T_request_mem_full T_request_mem_fullT_request_mem_full T_request_mem_full

T_request_mem_wr_en

T_request_mem_wr_en

T_request_mem_wr_en

T_request_mem_wr_en

L1A (15:0)L1A (15:0)

trigger_request (1:0)trigger_request (1:0)

local_trigger (1:0)local_trigger (1:0)

local_tag (7:0)local_tag (7:0)

trigger_validationtrigger_validation

trigger_rejectiontrigger_rejection

val_rej_tag (7:0)val_rej_tag (7:0)

lresetlreset

L1A_fifo_empty

L1A_fifo_empty

L1A_fifo_empty

L1A_fifo_emptyL1A_fifo_rd_en

L1A_fifo_rd_en

L1A_fifo_rd_en

L1A_fifo_rd_en

ev_num (23:0)

ev_num (23:0)

ev_num (23:0)

ev_num (23:0)

L1A_fifo_rd_data_count (5:0)

L1A_fifo_rd_data_count (5:0)

L1A_fifo_rd_data_count (5:0)

L1A_fifo_rd_data_count (5:0)

L1A_fifo_wr_data_count (5:0)

L1A_fifo_wr_data_count (5:0)

L1A_fifo_wr_data_count (5:0)

L1A_fifo_wr_data_count (5:0)

T_request_mem_dout (47:0)

T_request_mem_dout (47:0)

T_request_mem_dout (47:0)

T_request_mem_dout (47:0)

T_request_mem_empty

T_request_mem_empty

T_request_mem_empty

T_request_mem_empty

validate

validate

validate

validate

rejecta

rejecta

rejecta

rejecta

validate_ack

validate_ack

validate_ack

validate_ack

reject_ack

reject_ack

reject_ack

reject_ackvaLreject_tag (47:0)

vaLreject_tag (47:0)

vaLreject_tag (47:0)

vaLreject_tag (47:0)

tstamp_errtstamp_err

freeaddr (4:0)

freeaddr (4:0)

freeaddr (4:0)

freeaddr (4:0)

timestamp (47:0)

timestamp (47:0)

timestamp (47:0)

timestamp (47:0)

t (47:0)

t (47:0)

t (47:0)

t (47:0)

checkaddr (4:0)

checkaddr (4:0)

checkaddr (4:0)

checkaddr (4:0)

storeaddr (4:0)

storeaddr (4:0)

storeaddr (4:0)

storeaddr (4:0)

lclklclk

store_en

store_en

store_en

store_enrd_enrd_en

T_request_wr_data_count (4:0)

T_request_wr_data_count (4:0)

T_request_wr_data_count (4:0)

T_request_wr_data_count (4:0)

clear (47:0)

clear (47:0)

clear (47:0)

clear (47:0)

clear_en

clear_en

clear_en

clear_en

gts_clkgts_clk

backpressurebackpressure

lt_strobelt_strobe

tag_strobetag_strobe

msg_in (7:0)msg_in (7:0)

bcast_out (7:0)bcast_out (7:0)

bcast_strobebcast_strobe

msg_strobe (1:0)msg_strobe (1:0)event_num (23:0)event_num (23:0)

reject_window (15:0)reject_window (15:0)

gclk

gclk

gclkgclk

gclk

gclk

gclk

gclk

gclk

gclk

gclk

gclk

gclkgclk

gclk

gclk

gclk

gclk

gclk

gclk

fiber_in_nfiber_in_n

fiber_in_pfiber_in_p

fiber_out_nfiber_out_n

fiber_out_pfiber_out_p

ltrg (1:0)ltrg (1:0)

ltag (7:0)ltag (7:0)

l_trg (1:0)

l_trg (1:0)

l_trg (1:0)

l_trg (1:0)

l_tag (7:0)

l_tag (7:0)

l_tag (7:0)

l_tag (7:0)

timestamp_lsw (15:0)

timestamp_lsw (15:0)

timestamp_lsw (15:0)

timestamp_lsw (15:0)

validation (39:0)

validation (39:0)

validation (39:0)

validation (39:0)

L1A_tag (15:0)

L1A_tag (15:0)

L1A_tag (15:0)

L1A_tag (15:0)L1A_fifo_dout (39:0)L1A_fifo_dout (39:0)

communication manager

LLP bus handshake

trigger matching

trigger validation's fifo

trigger request's memory(TRM)

trigger request handshake fifo of TRM free slots

reset

reset

reset

reset

reset

reset

reset

reset

reset

reset

reset

reset

Fig. D2 – Internal block diagram of the mezzanine model.

The figure D2 shows the internal of the GTS mezzanine block. Each sub-block takes care of a given task and coordinates its activity with the others. Below is a list of the blocks and a description of the basic function they accomplish.

M. Bellato – INFN Padova 37

Draft version 1.3 – 11 November 2005

D.1 T_valreject_ctrl. (top left of the fig D2)

Fig. D3 – The T_valreject_ctrl block.

It is in charge of communicating a trigger request validation or rejection on the LLP/ancillary bus together with the relevant informations needed for this task, e.g. the 48bit validation or rejection tag that identifies the request and the event number in case of validation. It drives the val_rej_tag(7:0) bus according to a predefined handshake mechanism upon request of the two incoming signals “validate” or “rejecta” that are driven by the trigger_match block (middle right side of the picture).

klc_stg

eborts_gat

noitcejer_reggirt

noitadilav_reggirt

gat_jer_lav

0

0

0

0

00h’ 00 60 D9 00 60 00

0 sf000,000, sf000,000,087,81 sf000,000,008,81 sf000,000,028,81 sf000,000,048,81 sf000,000,068,81 sf000,000,088,81 0,009,81

sf000,558,487,81=AemiT

Fig. D4 – Trigger validation waveforms.

klc_stg

eborts_gat

noitcejer_reggirt

noitadilav_reggirt

gat_jer_lav

0

0

0

0

00h’ 00 10 E5 00

sf000,000,046,6 sf000,000,066,6 sf000,000,086,6 sf000,000,007,6 sf000,000,027,6 sf000,000,047,6 sf000,000,067,6 7,6

sf889,236,847,6=AemiT

Fig. D5 – Trigger rejection waveforms.

D.1.1 T_valreject_ctrl internal block diagram The simple state machine reported in the following figure highlights the expected behaviour of this block. Upon an external decision of either validation or rejection of an event one of the two possible branches is followed, producing on the val_rej_tag bus the sequence of event identifier + event number (validation) or only the event identifier (rejection).

M. Bellato – INFN Padova 38

Draft version 1.3 – 11 November 2005

S0

-- S0 --tag_strobe<='0';reject_ack<='0';validate_ack<='0';trigger_rejection<='0';trigger_validation<='0';val_rej_tag<=(others=>'0');

S1

-- S1 --trigger_rejection<='1';val_rej_tag<=valreject_tag(47 downto 40);tag_strobe<='1';

S2

-- S2 --trigger_rejection<='0';val_rej_tag<=valreject_tag(39 downto 32);

S3

-- S3 --val_rej_tag<=valreject_tag(31 downto 24);

S4

-- S4 --val_rej_tag<=valreject_tag(23 downto 16);

S5-- S5 --val_rej_tag<=valreject_tag(15 downto 8);

S6

-- S6 --trigger_validation<='1';val_rej_tag<=valreject_tag(47 downto 40);tag_strobe<='1';

S7 -- S7 --trigger_validation<='0';val_rej_tag<=valreject_tag(39 downto 32);

S8 -- S8 --val_rej_tag<=valreject_tag(31 downto 24);

S9 -- S9 --val_rej_tag<=valreject_tag(23 downto 16);

S10 -- S10 --val_rej_tag<=valreject_tag(15 downto 8);

S11

-- S11 --val_rej_tag<=valreject_tag(7 downto 0);reject_ack<='1';

S12 -- S12 --val_rej_tag<=valreject_tag(7 downto 0);

S13 -- S13 --val_rej_tag<=(others=>'0');tag_strobe<='0';

S14

-- S14 --val_rej_tag<=(others=>'0');tag_strobe<='0';

S15-- S15 --val_rej_tag<=ev_num(23 downto 16);

S16

-- S16 --val_rej_tag<=ev_num(15 downto 8);

S17 -- S17 --val_rej_tag<=ev_num(7 downto 0);validate_ack<='1';

1

rejecta='1'

2

validate='1'

rejecta='0'

validate='0'

Fig. D6 – T_valreject_ctrl state machine.

M. Bellato – INFN Padova 39

Draft version 1.3 – 11 November 2005

D.2 RTX. (top right of the fig. D2)

Fig. D7 – The RTX block.

It manages the high speed serial communication channel with the Root node (it sinks the fiber_in signals and sources the fiber_out signals). It decodes the 48bit timestamp pattern, the global clock (gts_clk) and all the messages coming from the root, e.g. the global reset (reset signal) and the trigger validation (L1A signal) for the time being. The RTX communicates with the root by means of bidirectional messages that are Hamming encoded for increased reliability. In the uplink direction it sources trigger request messages to the root node with a payload specifying an identifier for each individual request. It sources also periodically an “idle” message for use in the central trigger processor.

D.2.1 RTX internal block diagram The block in violet hides the simulation core of the multi-gigabit transceiver. The 16bit receiver bus rxdata is split in (top left part of the figure) the least significant byte (lsb) and most significant byte (msb). The lsb carries the encoded timestamp and gets decoded by passing through 2 lookup tables and a t_decode block that implements the decoding algorithm (due to D. Bazzacco). The msb carries Hamming encoded commands: to retrieve these commands the msb is passed through a decoder block (cmd_dec) that retrieves from the stream the validation informations (L1A and event number) the global reset and the commands to be routed to the bcast_out bus. A cmd_enc block is in charge of encoding commands in the upstream direction (txdata bus of the transceiver). For the time being the encoded commands are trigger requests, idle messages and the backpressure throttle.

n_ni_rebif

p_ni_rebif

klc_stg

wsl_pmatsemit

1

0

1

8910h’ 4910 5910 6910 7910 8910 9910 A910 B910 C910 D910 E910 F910 0A10 1A10 2A10 3A10 10

sf000,000,024,4 sf000,000,044,4 sf000,000,064,4 sf000,000,084,4 sf000,000,005,4 sf000,000,025,4 sf000,000,045,4 sf000,000,065,4 5,4

sf000,500,464,4=AemiT

Fig. D8 – Global clock, timestamp and gigabit transceivers waveforms.

M. Bellato – INFN Padova 40

Draft version 1.3 – 11 November 2005

t_decode

cmd_dec

MGT_custom

CHBONDDONE

CONFIGOUT

RXBUFSTATUS

RXCHARISCOMMA

RXCHARISK

RXCHECKINGCRC

RXCLKCORCNT

RXCOMMADET

RXCRCERR

RXDATA

RXDISPERR

RXLOSSOFSYNC

RXNOTINTABLE

RXREALIGN

RXRECCLK

RXRUNDISP

TXBUFERR

TXKERR

TXN

TXP

TXRUNDISP

CONFIGENABLE

CONFIGIN

ENMCOMMAALIGN

ENPCOMMAALIGN

ENCHANSYNC

LOOPBACK

POWERDOWN

REFCLK

REFCLK2

REFCLKSEL

BREFCLK

BREFCLK2

RXN

RXP

RXPOLARITY

RXRESET

RXUSRCLK

RXUSRCLK2

TXBYPASS8B10B

TXCHARDISPMODE

TXCHARDISPVAL

TXCHARISK

TXDATA

TXFORCECRCERR

TXINHIBIT

TXPOLARITY

TXRESET

TXUSRCLK

TXUSRCLK2MGT_custom

C12

C13

C15

C17

C18

C19

C20

C21

A

B

C4

"00111100""00111100"

C5

cmd_enc

L1A_fifo_ctrl

C6

C22

auto_cmd

lut_a

SPOA

lut_a

lut_a

lut_b

SPOA

lut_b

lut_b

ts_16 (15:0)

ts_16 (15:0)

ts_16 (15:0)

ts_16 (15:0)

timestamp (47:0)timestamp (47:0)

msk2 (47:0)msk2 (47:0)

msk1 (47:0)msk1 (47:0)

nibble1 (3:0)nibble1 (3:0)

nibble2 (3:0)nibble2 (3:0)

msb (7:0)msb (7:0)

rxdata (15:0)rxdata (15:0)

rxdata (15:0)

rxdata (15:0)rxdata (15:0)

rxdata (15:0)

tstamp_errtstamp_err

L1A (15:0)L1A (15:0)

resetreset

gts_clkgts_clk

backpressurebackpressure

msg_in (7:0)msg_in (7:0)

bcast_out (7:0)bcast_out (7:0)

bcast_strobebcast_strobe

msg_strobe (1:0)msg_strobe (1:0)

event_num (23:0)event_num (23:0)

gclkgclk

fiber_in_nfiber_in_n

fiber_in_pfiber_in_p

fiber_out_nfiber_out_n

fiber_out_pfiber_out_p

RXRECCLKRXRECCLK

S24S24

S26S26

LOOPBACK (1:0)LOOPBACK (1:0)

S35S35

S36S36

TXBYPASS8B10B (1:0)TXBYPASS8B10B (1:0)

S44S44

refclkselrefclksel

CHBONDDONECHBONDDONE

CONFIGOUTCONFIGOUT

RXBUFSTATUS (1:0)RXBUFSTATUS (1:0)

RXCHARISCOMMA (1:0)RXCHARISCOMMA (1:0)

RXCHARISK (1:0)RXCHARISK (1:0)

RXCHECKINGCRCRXCHECKINGCRC

RXCLKCORCNT (2:0)RXCLKCORCNT (2:0)

RXCOMMADETRXCOMMADET

RXCRCERRRXCRCERR

RXDISPERR (1:0)RXDISPERR (1:0)

RXLOSSOFSYNC (1:0)RXLOSSOFSYNC (1:0)

RXNOTINTABLE (1:0)RXNOTINTABLE (1:0)

RXREALIGNRXREALIGN

RXRUNDISP (1:0)RXRUNDISP (1:0)

TXBUFERRTXBUFERR

TXKERR (1:0)TXKERR (1:0)

TXRUNDISP (1:0)TXRUNDISP (1:0)

TXDATA (15:0)TXDATA (15:0)

S78 (7:0)S78 (7:0)

3C - comma

TXCHARISK (1:0)

TXCHARISK (1:0)

TXCHARISK (1:0)

TXCHARISK (1:0)

l_trg (1:0)l_trg (1:0)

l_tag (7:0)l_tag (7:0)

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

bclk

lreset

lreset

lreset

lreset

lreset

lreset

lreset

lreset

lreset

lreset

lreset

lreset

lreset

lreset

L1A_fifo_wr_enL1A_fifo_wr_enL1A_fifo_fullL1A_fifo_full

L1A_arrived

L1A_arrived

L1A_arrived

L1A_arrived

EQEQ

S90S90

lclklclk

gg

I1I1

I2I2outofsync

outofsync

outofsync

outofsync

OO

TX

DA

TA

(15:

8)T

XD

AT

A(1

5:8)

timestamp_lsw (15:0)timestamp_lsw (15:0)tstamp_lsw (15:0)tstamp_lsw (15:0)

Timestamp Decoder

Command Decoder

Command Encoder

Controller of trigger validations fifo

a_idlea_idle

a_commaa_comma

a_acka_ack

clear_idleclear_idle

Auto generated upstreamcommands

Fig. D9 – RTX block diagram.

M. Bellato – INFN Padova 41

Draft version 1.3 – 11 November 2005

D.3 FIFO_L1A. (middle left of fig. D2)

Fig. D10 – The FIFO_L1A block.

This block is a first-in-first-out memory that stores temporarily incoming trigger validations (and their identifiers) waiting to be served by the trigger_match block. It’s a de-randomizing memory with a depth of twenty locations.

M. Bellato – INFN Padova 42

Draft version 1.3 – 11 November 2005

D.4 DPRAM_tstamp. (low left of fig. D2)

Fig. D11 – The DPRAM_tstamp block.

It’a true dual-port memory holding the trigger requests and their identifiers originated by the LLP/ancillary front-end electronics. The pending trigger requests and associated tags are kept in this memory until validation or rejection or timeout expiry. As the order of arrival of trigger validations is not guaranteed (possibly not time-sorted), the possibility exists of accessing this memory not consecutively, hence the need of a random access memory type.

M. Bellato – INFN Padova 43

Draft version 1.3 – 11 November 2005

D.5 FIFO_FREELIST. (low right of fig. D2)

Fig. D12 – The FIFO_FREELIST block.

For the reason stated above, there is a need to keep a list of free location addresses in the DPRAM_tstamp memory. The FIFO_FREELIST is a first-in-first-out memory holding this list.

M. Bellato – INFN Padova 44

Draft version 1.3 – 11 November 2005

D.6 Trigger_match. (middle right of fig. D2)

Fig. D13 – The Trigger_match block.

This is a state machine in charge of checking if there is correspondence between the current (the oldest actually) trigger validation and any of the pending trigger requests. In case of match or timeout a proper action must be undertaken (e.g. validation or rejection of that event must be notified on the LLP/ancillary bus by means of the T_valreject_ctrl block). To do this, the Trigger_match block reads the L1A_fifo memory and if a validation exists scans the DPRAM_tstamp to look for the corresponding trigger request. It may happen that, during the scan, one or more trigger requests originated at a time that exceeds a fixed timeout are found. In both cases this must be notified to the LLP/ancillary bus by asserting the appropriate signal between “validate” and “rejecta”.

D.6.1 Trigger_match internal block diagram The overall state machine that governs the sequencing of operations is shown in the following figure D14. After reset the FIFO_FREELIST is filled with free location addresses of the DPRAM_tstamp memory. During normal operation the machine continuously scans the DPRAM_tstamp memory looking for expired trigger requests to discard. But when a trigger validation is arrived, the machine scans the DPRAM_tstamp memory looking for trigger requests that match the validation. The sequence of operation in this case is shown in fig. D15.

M. Bellato – INFN Padova 45

Draft version 1.3 – 11 November 2005

RST_S

-- RST_S --trigger_matched<='0';ev_discard<='0';L1A_fifo_rd_en<='0';

check_l1a_fifo

reject_start match_start

-- match_start --L1A_fifo_rd_en<='1';

M1

-- M1 --trigger_matched<='0';ev_discard<='0';L1A_fifo_rd_en<='0';

scan_mem_l1ascan_mem_dis

set_free_fifo

M2

-- M2 --L1A_time<=unsigned('0' & L1A_tag);

At least one L1A waiting to be servedCheck for trigger requests to be discarded

L1A arrived but no trigger requests waiting to be validated

Scan trg request memory in searchfor a match

L1A_fifo_empty='0'

T_request_mem_empty='0'

T_request_mem_empty='0'

Fig. D14 – The Trigger_match state machine.

D.6.2 Scan_mem_l1a internal block diagram Upon arrival of a trigger validation this sequence is entered. Each trigger request found in the DPRAM_tstamp memory is compared with the validation tag and one of three possible choices is evaluated (trigger request has matched, trigger request is expired, trigger request is more recent than validation and must be left in place for future validations).

S15

S16-- S16 --validate<='0';clear_en<='1';store_en<='1';

S17

-- S17 --vaLreject_tag<=T_request_mem_dout;validate<='1';storeaddr<=lcnt;

S18

-- S18 --vaLreject_tag<=T_request_mem_dout;rejecta<='1';storeaddr<=lcnt;

S19

-- S19 --rejecta<='0';clear_en<='1';store_en<='1';

S20

-- S20 --lcnt<="11111";clear_en<='0';clear<=(others=>'0');store_en<='0';

S21

-- S21 --lcnt<=std_logic_vector(unsigned(lcnt)+1);clear_en<='0';store_en<='0';

S22

-- S22 --checkaddr<= lcnt;

S24

m_check

preload counter with mem depth

Trg req mem has a valid request

Trigger request has matched L1ATrigger request is expired

Leave as is for future reqs

Three choicesIn case of match or discard:Clear current Trg req memory locationStore the current location in the free locations fifo

Check if trigger request matches L1A

1

trigger_matched ='1'

2ev_discard='1'

validate_ack='1'reject_ack='1'

lcnt = "10000"

T_request_mem_dout = "000000000000000000000000000000000000000000000000"

Fig. D15 – The scan_mem_l1a state machine.

M. Bellato – INFN Padova 46

Draft version 1.3 – 11 November 2005

D.7 T_request_ctrl. (low middle of fig. D2)

Fig. D16 – The T_request_ctrl block.

The block manages the handshake with LLP/ancillary electronics after an incoming trigger request. A tag is associated with any given trigger request and broadcasted on the local_tag bus. The same tag is also stored on the DPRAM_tstamp memory after having asked for a free location to the FIFO_FREELIST fifo. Note that the tag associated with any given trigger request is actually the 48bit global clock counter value latched at the time of the request. Figure D17 highlights the handshake sequence on the LLP bus that follows a trigger request.

klc_stg

gat_lacol

reggirt_lacol

eborts_tl

tseuqer_reggirt

0

00h’

0h’

0

0h’

00 A0 85 00

0 1 0

0 1 0

sf000,000,008,62 sf000,000,009,62 sf000,000,000,72 001,72

sf874,955,848,62=AemiT

Fig. D17 – Trigger request waveforms.

D.7.1 T_request_ctrl internal block diagram The simple state machine reacts upon a trigger request made on the LLP/ancillary interface. It latches the current value of the 48bit global timestamp and stores it in the DPRAM_tstamp memory. Immediately after, it sends the same timestamp, 8bit at a time, on the local_tag bus for use as an event identifier in the LLP/ancillary electronics.

M. Bellato – INFN Padova 47

Draft version 1.3 – 11 November 2005

S0

-- S0 --T_request_mem_wr_en<='0';ltag<=(others=>'0');ltrg<="00";lt_strobe<='0';t<=(others=>'0');

S1

-- S1 --ltrg<="01";T_request_mem_wr_en<='1';t<=tstamp;rd_en <= '0';

S2

-- S2 --ltrg<="00";T_request_mem_wr_en<='0';ltag<=tstamp(47 downto 40);lt_strobe<='1';

S3-- S3 --ltag<=tstamp(39 downto 32);

S4-- S4 --ltag<=tstamp(31 downto 24);

S5-- S5 --ltag<=tstamp(23 downto 16);

S6-- S6 --ltag<=tstamp(15 downto 8);

S7-- S7 --ltag<=tstamp(7 downto 0);

S9

-- S9 --tstamp<=timestamp(47 downto 0);rd_en <= '1';

trigger_request="01" AND T_request_mem_full='0'

Fig. D18 – T_request_ctrl state machine.

M. Bellato – INFN Padova 48