JFEX Uli Schäfer 1 Mainz Logos ? jFEX (inc. specific software/firmware & Tilecal input signal,...
-
Upload
abner-york -
Category
Documents
-
view
215 -
download
1
Transcript of JFEX Uli Schäfer 1 Mainz Logos ? jFEX (inc. specific software/firmware & Tilecal input signal,...
1
jFEX
Uli Schäfer
Mainz
Logos ?
jFEX (inc. specific software/firmware & Tilecal input signal, options) 30'
http://www.staff.uni-mainz.de/uschaefe/browsable/Meeting/2013/neu/
2
jFEX (inc. specific software/firmware & Tilecal input signal, options) 30'
Assume we have to cover all that’s in the document, but more details, where we feel it helps. Did I forget to cover any important section
- No physics justification• Jet processing general• L1Calo phase 1 scheme• Algorithms• Data replication • jFEX description #4• Density, fibre count etc.
• Input options #4
• Firmware (also software to be mentioned?)
• Demonstrators/current designs (GOLD, Topo, MiniPOD/V7) #4• Schedule, manpower req. jfex input HD+MZ
Uli Schäfer
3
Jet processingPhase-0 jet system consisting of • Pre-Processor
• Analogue signal conditioning• Digitization • Digital signal processing• Jet element pre-summation to 0.2 x 0.2 (η×φ)
• Jet processor• Sliding window processor for jet finding• Jet multiplicity determination• Jet feature extraction into L1Topo (pre-phase1)
At phase-1: complement with jet feature extractor jFEX• LAr signals optically from digital processor system• Tilecal signals from analogue Pre-Processor / JEP …• … eventually Tilecal optical data off detector, and possible
retirement of current L1Calo systemUli Schäfer
4
L1Calo Phase-1 System
Uli Schäfer
CPM
JEMCMX
CMX
Hub
Hub
L1Topo
ROD
ROD
JMMPPR
From Digital Processing System
CPM
JEMCMX
CMX
Hub
Hub
L1Topo
ROD
ROD
JMMPPR
jFEX
CPM
JEMCMX
CMX
Hub
eFEXHub
Opt.
Plant
L1Topo
ROD
ROD
JMM
New at Phase 1
RTM
RTM
5
• Hier 3 optionen
Uli Schäfer
6
Algorithms, now and then
Sliding window algorithm :Find and disambiguate ROIsCalculate jet energy in differently sized windows (programmable)
• Improve granularity by factor of
four, to 0.1×0.1 (η×φ)• Slightly increase environment• Allow for flexibility in jet definition
(non-square jet shape, Gaussian filter, …)
• Fat jets to be calculated from high granularity small jets
• Optionally increase jet environment (baseline 0.9 × 0.9)
Uli Schäfer
Phase 0 Phase 1
ROI 0.4 x 0.4
tower 0.2 x 0.2 0.1 x 0.1
three jet windows
up to 0.8 × 0.8 0.9 × 0.9limited by data duplication
7
Data replication
Sliding window algorithm requiring large scale replication of data• Forward duplication only (fan-out), no re-transmission• Baseline: no replication of any source into more than two
sinks• Fan-out in eta (or phi) handled at source only (DPS)
• Duplication at the parallel end (on-FPGA), using additional Multi-Gigabit Transceivers
• Allowing for differently composed streams• Minimizing latency
• Fan-out in phi (or eta) handled at destination only• Baseline “far end PMA loopback” • Looking into details and alternatives
Uli Schäfer
8
Initial baseline 8+ Modules, each covering full phi, limited eta range• Environment of 0.9 in eta (core bin +/- 4 neighbours)• Each module receives fully duplicated data in eta :
1.6 eta worth of data required for a core of 0.8• 16 eta bins including environment
8 FPGAs per module, each:• Environment 0.9×0.9• Each FPGA receives fully duplicated data in eta and phi:
1.6×1.6 worth of data required for a core of 0.8×0.8• 256 bins @ 0.1×0.1 in η×φ, e/m + had 512 numbers• With baseline 6.4 , 64 Multi-Gb/s receivers
• Hier fasercount
Uli Schäfer
10
fibre count / density• Erst durch 2 und dann mal 8 nach oben an slide 7
• 64 * 8 channels per FPGA• Due to full duplication in phi direction, exactly half of all 512 signals
are routed into the modules optically on fibres• 256 fibres• 22 × 12-channel opto receivers• 4 × 72-way fibre bundles / MTP connectors
• Option• For larger jets window we require larger FPGAs, some more fibres and
replication factor > ×2• Aim at higher line rates (currently FPGAs support 13 Gb/s, microPOD
10 Gb/s)• Allow for even finer granularity / larger jets / smaller FPGA devices
:• If digital processor baseline allows for full duplication of 6.4Gb/s
signals, the spare capacity, when run at higher rate, can be used to achieve a replication of more than 2-fold, so as to support a larger jet environment.
Uli Schäfer
11
How to fit on a module ?• ATCA
• 8 processors (~XC7VX690T)
• 4 microPODs each
• fan-out passive or “far end PMA loopback”
• Small amount of control logic / non-realtime (ROD)
• Nein! Might add 9th processor for consolidation of results
• Opto connectors in Zone 3
• Module control !!!
• Maximise module payload with help of small-footprint ATCA power brick and tiny IPMC mini-DIMM
Uli Schäfer
Z3
12
…and 3-d
Uli Schäfer
13
jFEX system• Need to handle both fine granularity and large jet environment (minimum 0.9×0.9) Require high density / high bandwidth per moduleNeed that density to keep input
replication factor at acceptable level
• ~ 8 modules (+FCAL ?) Bild neuer Teil, e,j, e ausgegraut• Single crate go for ATCA shelf / blades:• Sharing infrastructure with eFEX
• Handling / splitting of fibre bundles• ROD design• Hub design• RTM
Input signals:• Granularity .1×.1 (η×φ)• One electromagnetic, one hadronic tower – nach oben• --- nach unten• Unlike eFEX, no “BCMUX” scheme due to consecutive non-zero data• 6.4 Gb/s line rate, 8b/10b encoding, 128 bit per BC • For now, assume 16bit per tower, 8 towers per fibre
Uli Schäfer
15
Tilecal input options• List 3 options and mention DPS approach• For following slides/drawings:
Merge Sam and HCSC slides• Do we attach preferences, do we talk about
advantages/disadvantages ?• My preference: NO!!!
Uli Schäfer
Background
Phase-1 eFEX and jFEX receive digital EM layer data from LAr DPS But equivalent Tile data path not available
before Phase 2 So: need to extract digital hadronic tower sums
produced from the current analog sums sent to L1Calo
Three points where this can be done See next slide
16
17
Alternatives
EM calorimeterdigital readout
Muon detector
Analog sums from Tile/LAr nMCM
CMX
CMX
JEM
Endcap sector logic
Barrel sector logic
MuCTPi
Muon Trigger
L1Topo CTP
CORE
DPS
eFEX
jFEX
PreProcessor
Topological info
CTP outputNew/upgradedHardware
Central Trigger
L1Calo Trigger
JEP
CP
Receiverstations
EM data to FEX
Hadronic data to FEX
1
2 3
Can extract Tile tower sums from:1. Tile receiver stations2. PreProcessor modules3. JEM modules in JEP
Tile tower“DPS”
Considerations Latency Dynamic range
Current L1Calo towers have 8 bit dynamic range with 1GeV/LSB
Would like 9 or 10 bits, if possible Cost to implement Risk of disruption to existing system
18
Option 1: Tile Rx stations
• Signals extracted at arrival point in USA15, so latency cost is minimal
• Must build a new system to digitize and process analog signals• No constraints on dynamic range• Cost is high – essentially need to build new
receiver and PreProcessor systems• High risk of disruption to current L1Calo:
• Analog data path ahead of L1Calo rearranged• Where do we fit the new systems that do this?
19
Option 2: PreProcessor
New MCM (Phase 0) FPGA based tower processing Can drive higher-speed data to
the LVDS link driver card(blue arrows)
Replacement link card (Phase 1) Send tower data electrically to
CP and JEP (same as now) An FPGA and parallel-optic
transmitter (e.g. minipod) produce hadronic output to FEX
Fiber ribbon takes data from link card to an MTP/MPO output port (probably on front panel)
20
nMCM prototype
Option 2: PreProcessor
• Minimal latency:• Essentially equal to option 1;
• Can extend dynamic range:• nMCM can drive outputs at higher rates, so
more bits per tower possible• ‘Easy’ to get 9 bits, 10 bits probably possible
• Relatively low cost• nMCM will already exist• A few (small) LVDS link boards• Possibly need to replace some PreProcessor
mother boards (8 layers, low component count)
• Low disruption: Only upgrading existing boards
21
22
Option 3: JEM Upgrade
Double-ratetower data fromupgraded PPM(960 Mbit/s)
High-speed links to FEXfrom input cards to frontpanel (lowest latency)(hadronic tower sums)
Upgraded input cards
Option 3: JEM upgrade
Higher latency: Serial transmission from PPr to JEP adds
multiple BCs to latency Limited dynamic range:
BCMUX protocol consumes some bandwidth 9 bits possible (by removing parity), 10 bits
probably not possible Similar cost to Option 2
PreProcessor nMCM and link cards still get replaced (but not PPr mother boards?)
Plans to upgrade JEM daughter boards anyway
Low disruption: Again, similar to Option 2
23
SUM
ch1
ch2ch3
ch4
BCMUX
BCMUX
FPGA (Spartan-6) MCM #1
10
10
LVDS-Tx
LVDS-Tx
LVDS-Tx
MCM #16
CP1
CP2
JEP JEP
CP
Virtex-II
32x
16x
480Mb/s
480Mb/s
FPGA
FPGA
FPGA
FPGA
J2power
PPM
LCD (f/o & routing)
to CP(LVDS cables)
to JEP(LVDS cables)
to DAQr/o data
RGTM
V.Andrei, KIPL1Calo Weekly Meeting, 10/01/2013 2
Current SystemCurrent System
ch1
ch2ch3
ch4
BCMUX
BCMUX
FPGA (Spartan-6) MCM #1
MCM #16
CP1
10CP2
JEP
MUX + LVDS-Tx
LVDS-Tx
LVDS-Tx
JEP
CP
Spartan-6/Artix-7
32x
16x
480Mb/s
960Mb/s
FPGA
FPGA
FPGA
FPGA
J2power
PPM
nLCD (f/o & routing)
to CP(LVDS cables)
to JEP(LVDS cables)
to DAQr/o data
RGTM
V.Andrei, KIPL1Calo Weekly Meeting, 10/01/2013 3
Phase-I: first solutionPhase-I: first solution
ch1
ch2ch3
ch4
BCMUX
BCMUX
FPGA (Spartan-6) MCM #1
MCM #16
CP1
10CP2
JEP
MUX + LVDS-Tx
LVDS-Tx
LVDS-Tx
CP
JEP
JEP
CP
nLCD (f/o & routing)
Spartan-6/Artix-7
32x
16x
480Mb/s
960Mb/s
FPGA
FPGA
FPGA
FPGA
SNAP12
to CP(LVDS cables)
to JEP(LVDS cables)
J2
to DAQr/o data
Xilinx 7 Series
Rear Extension
power
PPM
to jFEX(optic fibers)
RGTM
V.Andrei, KIPL1Calo Weekly Meeting, 10/01/2013 4
Phase-I: second solution (A)Phase-I: second solution (A)
?FPGA
SUM
ch1
ch2ch3
ch4
BCMUX
BCMUX
FPGA (Spartan-6) MCM #1
10
10
LVDS-Tx
LVDS-Tx
LVDS-Tx
MCM #16
CP1
CP2
JEP JEP
CP
Virtex-II
32x
16x
480Mb/s
480Mb/s
FPGA
FPGA
FPGA
FPGA
J2power
PPM
LCD (f/o & routing)
to DAQr/o data
RGTM
V.Andrei, KIPL1Calo Weekly Meeting, 10/01/2013 5
to CP(LVDS cables)
to JEP(LVDS cables)
Xilinx 7 Series
Rear Extension
to jFEX(optic fibers)
FPGA
SNAP12
CP
JEP
Phase-I: second solution (B)Phase-I: second solution (B)
28
Firmware• jFEX• Sliding window algorithm• Infrastructure for high speed links• Module control• DAQ (buffers and embedded ROD functionality, common
effort)
• Example : JEM based Tilecal inputs• Serialization nMCM 960Mb/s• Serialization JMM 6.4Gb/s• Re-target existing JEM input firmware to new FPGA
Uli Schäfer
29
• Development line
Uli Schäfer
30
Demonstrator projects : “GOLD”
Try out technologies and schemes for L1Topo • Fibre input from the backplane (MTP-CPI connectors)• Up to 10Gb/s o/e data paths via industry standard converters on
mezzanine• Using mid range FPGAs (XC6VLX240T) up to 6.4Gb/s,
24 channels per device• Typical sort/dφ algorithm (successfully implemented) takes
~ 13% logic resources• Real-time output via opto links on the front panel (currently used as
data source for latency measurements etc.)• Will continue to be used as source/sink for L1Topo tests
Uli Schäfer
31
Recent GOLD results (Virtex-6)• Jitter analysis on cleaned TTC clock (σ = 2.9 ps)• Signal integrity: sampled in several positions along the chain• MGT and o/e converters settings optimization• Bit Error Rate (BER) < 10-16 at 6.4 Gbps / 12 channels• Eye widths above 60 ps (out of 156 ps)• Crosstalk among channels measured in some cases but with
negligible effect
Uli Schäfer Sampled after fan-out chip
GOLD latency (Virtex-6)
Latency measured along the real-time path in various points at 6.4 Gbs, 16 bit data width, and 8b/10b encoding
• Far End PMA loopback: 34 ns latency• Electrical (LVDS) output: 63 ns latency• Far End PCS loopback: 78 ns latency• Parallel loopback in fabric: 86 ns latencyFirmware mod for latency measurement including algorithm, and electrical out towards CTP under way (Volker almost got there…)
33
Topo processor details – post PDR• Real-time path:
• 14 fibre-optical 12-way inputs (miniPOD)
• Via four 48-way backplane connectors
• 4-fold segmented reference clock tree, 3 Xtal clocks each, plus jitter-cleaned LHC bunch clock multiple
• Two processors XC7V690T (prototype XC7V485T)
• Interlinked by 238-way LVDS path• 12-way (+) optical output to CTP• 32-way electrical (LVDS) output to
CTP via mezzanine• Full ATCA compliance / respective
circuitry on mezzanine• Module control via Kintex and Zynq
processors• Initially via VMEbus extension• Eventually via Kintex or Zynq
processor (Ethernet / base interface)Uli Schäfer
34
floor plan, so far…
Uli Schäfer
• Processor FPGA configuration via SystemACE and through module controller
• Module controller configuration via SPI and SD-card
• DAQ and ROI interface• Two SFPs, L1Calo style• up to 12 opto fibres
(miniPOD)• Hardware to support
both L1Calo style ROD interface, and embedded ROD / S-Link interface on these fibres
35
… and in 3-d
Uli Schäfer
36
Tests miniPOD on VC707• From Eduard
• Note : microPOD same o/e engine as miniPOD• (no pics on MicroPOD, probably confidential…)
Uli Schäfer
37
Schedule• Extract from Ian's Gantt chart, and that’s it?• Check with Ian whether anything to be updated before
the session
• Also manpower estimates ?• If so, anything to be said in addition to what’s in the
document ?
Uli Schäfer
38
conclusion• The 8-module jFEX seems possible with ~2013’s technology• Key technologies explored already (GOLD, L1Topo,…)• Use of microPODs challenging for thermal and mechanical reasons,
but o/e engine is the same as in popular miniPODs• Scheme allows for both fine granularity and large environment at
6.4Gb/s line rate and a limit of 100% duplication of input channels• Rather dense circuitry, but comparable to GOLD demonstrator• For even finer granularity and / or larger jets things get more
complicated Need to explore higher transmission rates • DPS needs to handle the required duplication (in eta)Details of fibre organization and content cannot be presented now Started work on detailed specifications, in parallel exploring higher
data rates…• Tilecal signals required in FEXes in fibre-optical format• Three options for generating them• All seem viable, but probably at different cost• Need to agree on a baseline before TDR
Uli Schäfer