L0 Trigger Primitives: Functionality and Performance on Commodity CPU (an initial study)

Brett Viren, Physics Department

7 Feb 2018


Overview

Outline

Overview

Simulation

Baseline and Thresholds

L0 Primitives

CPU Time

Next steps

Brett Viren (BNL) L0+CC 7 Feb 2018 2 / 25


Overview

[Diagram: data flow from WIBs to FELIXs to RAM, with blocks for the L0 pipeline, Trigger Logic (primitive, command), Event Builder (poll) and Offline Disk.]

DUNE FD DAQ Option: FELIX + Commodity Computing.

Focused here on the L0 pipeline:

1. Produced a Wire-Cell Toolkit simulation: noise + 39Ar signal.
2. Used real WIB channel addressing.
3. Produced tick-vectors of 480 collection channels (5 ms).
4. Calculated baseline and noise levels per channel.
5. Formed L0 primitives: per channel, regions of time-over-threshold.
6. Measured CPU usage.

Note: for now the focus is only on finding 39Ar signals, which serve as a proxy for any near-threshold activity. The signal sample will be enlarged later.


Simulation


Wire-Cell Toolkit Simulation Feature Overview

• Noise: Rayleigh-sampled amplitude spectrum and uniformly sampled phases, with an inverse FFT to get the noise waveforms; uses MicroBooNE’s post-filtered spectra.

• Deposition: 39Ar energy spectrum and decay rate (1 Bq/kg, adjustable)

• Drift: recombination, longitudinal and transverse diffusion.

• Induction: integrate over and interpolate between long-range (±10 wires) and fine-grained (1/10th pitch) induction drift paths.

• Shaping: detailed Cold-FE electronic response function and a simple ADC model.

• Output: write Numpy arrays (other formats available).

→ The stand-alone simulation is usable now. Integration with LArSoft is waiting on LArSoft data model developments.
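The noise recipe in the first bullet can be sketched as follows. This is a minimal, illustrative Python sketch, not the Wire-Cell Toolkit code: the function name `sample_noise_waveform`, the flat test spectrum, and the choice to treat each input value as the mean of its Rayleigh distribution are assumptions. A naive O(n²) inverse DFT stands in for the inverse FFT only to keep the example dependency-free.

```python
import cmath
import math
import random


def sample_noise_waveform(mean_amplitude, rng=None):
    """Sketch: draw one real noise waveform from a mean amplitude spectrum.

    mean_amplitude[k] is an assumed mean spectral amplitude for frequency
    bin k = 0 .. n/2, so the waveform has n = 2 * (len(mean_amplitude) - 1) ticks.
    """
    half = len(mean_amplitude)
    if half < 2:
        raise ValueError("need at least the DC and Nyquist bins")
    rng = rng or random.Random()
    n = 2 * (half - 1)
    spectrum = [0j] * n
    for k, mean in enumerate(mean_amplitude):
        # Rayleigh scale parameter whose distribution mean equals `mean`.
        sigma = mean / math.sqrt(math.pi / 2.0)
        # Inverse-CDF Rayleigh sample; 1 - random() lies in (0, 1].
        amp = sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        # Uniformly sampled phase.
        phase = rng.uniform(0.0, 2.0 * math.pi)
        if k == 0 or k == half - 1:
            # DC and Nyquist bins must be real for a real waveform.
            spectrum[k] = complex(amp * math.cos(phase), 0.0)
        else:
            spectrum[k] = cmath.rect(amp, phase)
            # Hermitian symmetry makes the inverse transform real.
            spectrum[n - k] = spectrum[k].conjugate()
    # Naive inverse DFT; the imaginary part is only round-off, so drop it.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


noise = sample_noise_waveform([0.0] + [1.0] * 8, random.Random(1))
print(len(noise))  # 16
```

A zero DC bin gives a waveform with zero mean, which is the usual convention for baseline-subtracted noise.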


Generate Wires

$ wirecell-util make-wires -d apa dune-wires.json.bz2
$ wirecell-util plot-wires dune-wires.json.bz2 dune-wires.pdf

[Figure: wire layouts, Y (m) vs. Z (m), every 10th wire drawn, for Anode 0, Face 0, Planes 0, 1 and 2. Labeled end wires: Plane 0, wip:10000 ch:11205 and wip:11140 ch:24505; Plane 1, wip:20000 ch:12311 and wip:21140 ch:25811; Plane 2, wip:30000 ch:11216 and wip:30479 ch:25806.]

• Real DUNE/protoDUNE wire lengths, angles, pitches and multiplicities.

• An APA-level channel identifier number built from WIB addresses:

[WIB connector(1-4)][WIB slot(1-5)][chip(1-8)][channel(01-16)]
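Because each field occupies fixed decimal digits (compare the `ch:11205` and `ch:24505` labels on the wire plots), packing and unpacking the identifier is plain integer arithmetic. A minimal sketch, assuming the identifier is exactly the decimal concatenation shown with a two-digit channel field; the function names are hypothetical.

```python
def encode_channel_id(connector, slot, chip, channel):
    """Pack WIB address fields into a decimal APA-level channel identifier."""
    if not (1 <= connector <= 4 and 1 <= slot <= 5 and 1 <= chip <= 8 and 1 <= channel <= 16):
        raise ValueError("WIB address field out of range")
    # Digits: [connector][slot][chip][channel as two digits]
    return connector * 10000 + slot * 1000 + chip * 100 + channel


def decode_channel_id(ident):
    """Unpack a decimal channel identifier into (connector, slot, chip, channel)."""
    connector, rest = divmod(ident, 10000)
    slot, rest = divmod(rest, 1000)
    chip, channel = divmod(rest, 100)
    return connector, slot, chip, channel


print(decode_channel_id(24505))  # (2, 4, 5, 5)
```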


FEMB-level Channel/Conductor map

ASIC:   1   2   3   4   5   6   7   8
ch00  u19 u09 w14 w02 u29 u39 w26 w38
ch01  u17 u07 w16 w04 u27 u37 w28 w40
ch02  u15 u05 w18 w06 u25 u35 w30 w42
ch03  u13 u03 w20 w08 u23 u33 w32 w44
ch04  u11 u01 w22 w10 u21 u31 w34 w46
ch05  v19 v09 w24 w12 v29 v39 w36 w48
ch06  v17 v07 v12 v02 v27 v37 v22 v32
ch07  v15 v05 v14 v04 v25 v35 v24 v34
ch08  v13 v03 v16 v06 v23 v33 v26 v36
ch09  v11 v01 v18 v08 v21 v31 v28 v38
ch10  w23 w11 v20 v10 w35 w47 v30 v40
ch11  w21 w09 u12 u02 w33 w45 u22 u32
ch12  w19 w07 u14 u04 w31 w43 u24 u34
ch13  w17 w05 u16 u06 w29 w41 u26 u36
ch14  w15 w03 u18 u08 w27 w39 u28 u38
ch15  w13 w01 u20 u10 w25 w37 u30 u40

See DUNE DocDB 4064 for details.


FEMB-level Conductor/Channel Map (inverse)

U layer, first half:
conductor:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
chip:       2  4  2  4  2  4  2  4  2  4  1  3  1  3  1  3  1  3  1  3
chan:       4 11  3 12  2 13  1 14  0 15  4 11  3 12  2 13  1 14  0 15

U layer, second half:
conductor: 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
chip:       5  7  5  7  5  7  5  7  5  7  6  8  6  8  6  8  6  8  6  8
chan:       4 11  3 12  2 13  1 14  0 15  4 11  3 12  2 13  1 14  0 15

V layer, first half:
conductor:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
chip:       2  4  2  4  2  4  2  4  2  4  1  3  1  3  1  3  1  3  1  3
chan:       9  6  8  7  7  8  6  9  5 10  9  6  8  7  7  8  6  9  5 10

V layer, second half:
conductor: 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
chip:       5  7  5  7  5  7  5  7  5  7  6  8  6  8  6  8  6  8  6  8
chan:       9  6  8  7  7  8  6  9  5 10  9  6  8  7  7  8  6  9  5 10

W layer, first half:
conductor:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
chip:       2  4  2  4  2  4  2  4  2  4  2  4  1  3  1  3  1  3  1  3  1  3  1  3
chan:      15  0 14  1 13  2 12  3 11  4 10  5 15  0 14  1 13  2 12  3 11  4 10  5

W layer, second half:
conductor: 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
chip:       5  7  5  7  5  7  5  7  5  7  5  7  6  8  6  8  6  8  6  8  6  8  6  8
chan:      15  0 14  1 13  2 12  3 11  4 10  5 15  0 14  1 13  2 12  3 11  4 10  5
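The inverse map can be derived mechanically from the forward Channel/Conductor map, which also checks that every conductor appears exactly once. A minimal sketch, assuming the forward table is transcribed verbatim as text (rows ch00-ch15, columns chips 1-8); `parse_forward` and `invert` are hypothetical helper names.

```python
# Forward FEMB map transcribed from the Channel/Conductor table:
# rows are ASIC channels ch00-ch15, columns are chips (ASICs) 1-8.
FORWARD_TABLE = """
ch00 u19 u09 w14 w02 u29 u39 w26 w38
ch01 u17 u07 w16 w04 u27 u37 w28 w40
ch02 u15 u05 w18 w06 u25 u35 w30 w42
ch03 u13 u03 w20 w08 u23 u33 w32 w44
ch04 u11 u01 w22 w10 u21 u31 w34 w46
ch05 v19 v09 w24 w12 v29 v39 w36 w48
ch06 v17 v07 v12 v02 v27 v37 v22 v32
ch07 v15 v05 v14 v04 v25 v35 v24 v34
ch08 v13 v03 v16 v06 v23 v33 v26 v36
ch09 v11 v01 v18 v08 v21 v31 v28 v38
ch10 w23 w11 v20 v10 w35 w47 v30 v40
ch11 w21 w09 u12 u02 w33 w45 u22 u32
ch12 w19 w07 u14 u04 w31 w43 u24 u34
ch13 w17 w05 u16 u06 w29 w41 u26 u36
ch14 w15 w03 u18 u08 w27 w39 u28 u38
ch15 w13 w01 u20 u10 w25 w37 u30 u40
"""


def parse_forward(table):
    """Return {(chip, chan): conductor} from the whitespace-separated table."""
    forward = {}
    for line in table.strip().splitlines():
        label, *conductors = line.split()
        chan = int(label[2:])  # "ch05" -> 5
        for chip, conductor in enumerate(conductors, start=1):
            forward[(chip, chan)] = conductor
    return forward


def invert(forward):
    """Return {conductor: (chip, chan)}, refusing duplicate conductors."""
    inverse = {}
    for key, conductor in forward.items():
        if conductor in inverse:
            raise ValueError(f"conductor {conductor} appears twice")
        inverse[conductor] = key
    return inverse


inverse = invert(parse_forward(FORWARD_TABLE))
print(inverse["u01"], inverse["v20"], inverse["w48"])  # (2, 4) (3, 10) (8, 5)
```

The 128 resulting entries (40 U, 40 V, 48 W) reproduce the inverse tables conductor by conductor.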


WCT Simulation Raw Frame Data

$ wire-cell -c wctcfg/noise-ar39-sim.jsonnet
$ blips plot-frames -o plots.pdf dune-wctsim-fullrate-adc-noise.npz

[Figure: left, "ADC Frame 0": channel array index vs. tick (0 to 10000) with an ADC color scale; right, "Channel Identity Numbers for Frame 0": channel ident (10000 to 45000) vs. channel array index, with regions labeled Front face z<0, Front face z>0, Back face z<0 and Back face z>0.]

• The simulation provides a 2D frame array of ADC samples in channel vs. tick.

• 5 ms frames are used here, but the frame size is configurable.

• A 1D channel array associates frame array rows with channel identifier numbers.

• Current limitation (bug): no “back” face collection plane.


Group Channels By Plane

[Figure: three panels, "U-Plane ADC Frame 0", "V-Plane ADC Frame 0" and "W-Plane ADC Frame 0", each array index vs. tick (0 to 10000) with its own ADC color scale.]

• Use the channel/conductor map to sort the data into U/V/W planes.

• Focus next on just the 480 collection channels.
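Sorting a frame into planes only needs the 1D channel array and a channel-to-plane lookup. A minimal sketch, assuming the frame is a list of per-channel sample rows and that `plane_of` (hypothetical) maps each channel identifier to "u", "v" or "w"; the example identifiers are end wires from the wire plots, assuming planes 0/1/2 are U/V/W.

```python
from collections import defaultdict


def group_by_plane(frame, channels, plane_of):
    """Split frame rows into per-plane sub-frames.

    frame    -- list of rows, one list of ADC samples per channel (channel vs. tick)
    channels -- channel identifier for each row of `frame`
    plane_of -- mapping from channel identifier to plane letter
    Returns {plane: (channel_ids, rows)}, preserving the original row order.
    """
    if len(frame) != len(channels):
        raise ValueError("frame and channel array must have the same length")
    grouped = defaultdict(lambda: ([], []))
    for ident, row in zip(channels, frame):
        ids, rows = grouped[plane_of[ident]]
        ids.append(ident)
        rows.append(row)
    return dict(grouped)


# Hypothetical three-channel frame with two ticks per channel.
planes = group_by_plane(
    [[1, 2], [3, 4], [5, 6]],
    [11205, 11216, 12311],
    {11205: "u", 11216: "w", 12311: "v"},
)
print(sorted(planes))  # ['u', 'v', 'w']
```

The collection frame used in the rest of this study would then be `planes["w"]`.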


[Figure: "Collection Frame": channel array index vs. tick (0 to 10000), ADC color scale 400 to 500.]


[Figure: "Collection Frame Zoom": channel array index (0 to ~80) vs. tick (0 to 1000), ADC color scale 405 to 430.]


[Figure: "Collection Frame Zoom": channel array index (0 to ~35) vs. tick (0 to 175), ADC color scale 405 to 440.]


Baseline and Thresholds


Baseline and Thresholds

[Figure: distributions of ADC − 350. Left, "One channel": ADC histogram and its cumulative histogram. Right, "480 collection channels": the same vs. channel, with the median, baseline, low and threshold levels marked.]

• Fill the ADC code histogram and its cumulative histogram.
  → Histogram bins: 4096 ADC × 480 channels.

• Find $\mathrm{ADC}_{\mathrm{low}}$ as the 0.1% percentile and $\mathrm{ADC}_{\mathrm{bl}}$ as the median.

• Set the threshold: $\mathrm{ADC}_{\mathrm{thresh}} = \mathrm{ADC}_{\mathrm{bl}} + N_\sigma\,(\mathrm{ADC}_{\mathrm{bl}} - \mathrm{ADC}_{\mathrm{low}})$, with $N_\sigma = 2$.
  → These are tuning parameters.

• Per-channel L0 trigger primitives are regions of consecutive ticks with above-threshold samples.
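The per-channel recipe above can be sketched in a few lines. This is a minimal Python illustration, not the C++ `test_primitives` code: it assumes integer ADC samples in 0..4095, defines a percentile as the smallest ADC code whose cumulative count reaches the requested fraction, treats "above threshold" as strictly greater, and reports each primitive as a half-open `[start, stop)` tick range. All function names are hypothetical.

```python
ADC_BINS = 4096  # 12-bit ADC codes


def adc_histogram(samples):
    """Histogram of ADC codes 0..4095 for one channel."""
    hist = [0] * ADC_BINS
    for adc in samples:
        hist[adc] += 1
    return hist


def percentile_from_histogram(hist, fraction):
    """Smallest ADC code whose cumulative count reaches `fraction` of all samples."""
    target = max(1.0, fraction * sum(hist))
    cumulative = 0
    for code, count in enumerate(hist):
        cumulative += count
        if cumulative >= target:
            return code
    raise ValueError("empty histogram")


def baseline_and_threshold(samples, low_fraction=0.001, nsigma=2):
    """Return (ADC_bl, ADC_low, ADC_thresh) for one channel."""
    hist = adc_histogram(samples)
    baseline = percentile_from_histogram(hist, 0.5)
    low = percentile_from_histogram(hist, low_fraction)
    return baseline, low, baseline + nsigma * (baseline - low)


def find_primitives(waveform, threshold):
    """Return [start, stop) tick ranges of consecutive above-threshold samples."""
    primitives = []
    start = None
    for tick, adc in enumerate(waveform):
        if adc > threshold and start is None:
            start = tick
        elif adc <= threshold and start is not None:
            primitives.append((start, tick))
            start = None
    if start is not None:  # region still open at the end of the frame
        primitives.append((start, len(waveform)))
    return primitives


samples = [400] * 998 + [395, 450]
print(baseline_and_threshold(samples))  # (400, 395, 410)
print(find_primitives([400, 400, 415, 420, 400, 412, 400], 410))  # [(2, 4), (5, 6)]
```

Working from the histogram rather than sorting raw samples keeps the cost per channel at one pass over the samples plus one pass over 4096 bins.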


L0 Primitives


[Figure: "Collection Frame": channel array index vs. tick (0 to 10000), ADC color scale 400 to 500.]


[Figure: "Trigger Primitives": channel array index vs. tick (0 to 10000), color scale 0.0 to 1.0.]


[Figure: "Collection Frame Zoom": channel array index (0 to ~80) vs. tick (0 to 1000), ADC color scale 405 to 430.]


[Figure: "Trigger Primitives Zoom": channel array index (0 to ~80) vs. tick (0 to 1000), color scale 0.0 to 1.0.]


[Figure: "Collection Frame Zoom": channel array index (0 to ~35) vs. tick (0 to 175), ADC color scale 405 to 440.]


[Figure: "Trigger Primitives Zoom": channel array index (0 to ~35) vs. tick (0 to 175), color scale 0.0 to 1.0.]


CPU Time


Performance on CPU

$ ./build/test_primitives channel-map.npz \
    dune-wctsim-fullrate-adc-noise.npz \
    primitives.npz > primitives.log

...
copy 480 collection channels x 10000 ticks in 0.0331903
find baseline/thresholds in 0.0203175
median: 5000, threshold: 10 (0.001)
find 10888 primitives in 0.0186963
dump intermdiates to numpy in 0.227635

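• Compiled with g++ -O3 (clang++ gives similar results).

• Xeon E5-2630 v4 @ 2.20GHz (20 threads, 640k/2.5M/25M cache)
  → this test is single-threaded.

• Baseline/threshold finding plus primitive finding take ≈ 39 ms in the log above. Rounding up, that is 50 core-ms per 5 ms × 480 channels, i.e. about 10 cores working in parallel to keep up in real time.

• If linear scaling holds, this Xeon CPU keeps up with one APA!

Still a big “if”, with big fat caveats:

• Not streaming data, single thread, scaling not yet proven.

• The 20 MB collection frame (480 × 10000 samples as 32-bit ints) fits in the 25 MB cache, so the cache may be fooling me.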
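The real-time budget is simple arithmetic on the logged stage times: CPU seconds spent per frame divided by the frame's duration gives the number of cores that must run in parallel. A minimal sketch, assuming perfect linear scaling (the very assumption flagged in the caveats) and 2 MHz sampling; `cores_for_real_time` is a hypothetical name.

```python
# Per-stage wall times in seconds, copied from primitives.log.
STAGE_SECONDS = {
    "copy": 0.0331903,
    "baseline/thresholds": 0.0203175,
    "primitives": 0.0186963,
}

TICKS = 10000
SAMPLING_HZ = 2e6                     # 2 MHz digitization
FRAME_SECONDS = TICKS / SAMPLING_HZ   # 5 ms of data per frame


def cores_for_real_time(stages):
    """Cores needed to process frames as fast as they arrive.

    Assumes the work divides perfectly across cores (linear scaling).
    """
    busy = sum(STAGE_SECONDS[name] for name in stages)
    return busy / FRAME_SECONDS


print(round(cores_for_real_time(["baseline/thresholds", "primitives"]), 1))  # 7.8
print(round(cores_for_real_time(list(STAGE_SECONDS)), 1))                    # 14.4
```

The rounded-up 50 core-ms corresponds to 10 cores; including the 33 ms copy step raises the estimate to about 14.4 cores.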


Next steps


Next steps

• Increase the simulation data set to cover more time (a few seconds) and more event variety.
  → Toss in a few cosmic-µ’s.

• Implement feeding “raw data” to a multi-threaded ring buffer.

• Parallelize the current L0 pipeline to test MT scalability.

• Develop MT’ed mechanisms to access the “live” ring buffer.
  → Feed MT L0 pipelines.
  → Demonstrate execution of trigger commands.
  → Read out final “events”.

• New ideas to try (details next):
  → Develop a streamed/chunked baseline/threshold calculation.
  → Reduce data input to the L0 pipeline.


Streamed/Chunked Baselines/Thresholds

Calculate baselines/thresholds with data from the recent past, to avoid slow recalculations and to allow for drifts.

[Figure: distributions of ADC − 350, as on the Baseline and Thresholds slide. Left, "One channel": ADC histogram and its cumulative histogram. Right, "480 collection channels": the same vs. channel, with the median, baseline, low and threshold levels marked.]

• Calculate N interval cumulative ADC distribution histograms $\{H_i\}$, each spanning a subsequent time interval $\Delta t$.

• Store the individual $H_i$ in an ordered deque.

• Sum: $C_N = \sum_{i=1}^{N} H_i$, the total cumulative ADC distribution at $t_N$.

• For each new time $t_{i+N+1} = t_{i+N} + \Delta t$, with $i = 0, 1, 2, \ldots$:
  • Form a fresh interval cumulative $H_{i+N+1}$.
  • Update the total cumulative: $C_{i+N+1} = C_{i+N} + H_{i+N+1} - H_{i+1}$.
  • deque.push($H_{i+N+1}$) and deque.pop($H_{i+1}$).
  • Recalculate channel baselines/thresholds from $C_{i+N+1}$.

→ N and $\Delta t$ are tuning parameters ($N \approx 10$ and $\Delta t \approx 1$ ms?).

→ Also, place per-channel guards on max/min ADC.
  • Elongated tracks or big showers that put signal in >50% of the samples will bias the median away from the true baseline.
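The deque update above can be sketched for a single channel as follows. This is a minimal Python illustration under stated assumptions: `RollingBaseline` is a hypothetical name, it stores plain per-interval histograms rather than cumulative ones (equivalent, since the cumulative of a sum equals the sum of the cumulatives), it reuses the median / 0.1% percentile / $N_\sigma$ recipe from the Baseline and Thresholds slide, and it omits the max/min ADC guards.

```python
from collections import deque


class RollingBaseline:
    """Streamed baseline/threshold for one channel over the last N intervals.

    Keeps one ADC histogram per interval in a deque plus their running sum,
    so each new interval costs O(bins) instead of re-histogramming the window.
    """

    def __init__(self, n_intervals=10, bins=4096, low_fraction=0.001, nsigma=2):
        self.n_intervals = n_intervals
        self.bins = bins
        self.low_fraction = low_fraction
        self.nsigma = nsigma
        self.history = deque()
        self.total = [0] * bins

    def add_interval(self, samples):
        """Histogram one new interval and slide the window forward."""
        fresh = [0] * self.bins
        for adc in samples:
            fresh[adc] += 1
        self.history.append(fresh)  # deque.push(H_{i+N+1})
        for code, count in enumerate(fresh):
            self.total[code] += count
        if len(self.history) > self.n_intervals:
            oldest = self.history.popleft()  # deque.pop(H_{i+1})
            for code, count in enumerate(oldest):
                self.total[code] -= count

    def _percentile(self, fraction):
        target = max(1.0, fraction * sum(self.total))
        cumulative = 0
        for code, count in enumerate(self.total):
            cumulative += count
            if cumulative >= target:
                return code
        raise ValueError("no samples in window")

    def baseline_and_threshold(self):
        """Return (ADC_bl, ADC_thresh) from the current window."""
        baseline = self._percentile(0.5)
        low = self._percentile(self.low_fraction)
        return baseline, baseline + self.nsigma * (baseline - low)


rb = RollingBaseline(n_intervals=2, bins=16)
for level in (5, 6, 7):
    rb.add_interval([level] * 10)
print(rb.baseline_and_threshold())  # (6, 6)
```

In the usage lines the interval of 5s has already slid out of the two-interval window, so the median reflects only the 6s and 7s.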


Reduce Input Data Rate to L0 Pipeline

39Ar “blip” in raw ADC

[Figure: "Collection Frame Zoom": channel array index (0 to ~35) vs. tick (0 to 175), ADC color scale 405 to 440.]

Idea from Giovanna:
⇒ Look at resampling or rebinning by ×2-4 ticks.

• 2 MHz is actually oversampling our waveforms:
  • 2-3 µs anti-aliasing filter in the electronics.
  • 5 mm / (1.6 mm/µs) ≈ 3 µs characteristic time for the induction rise.
  • ≈1 µs spread due to electron diffusion in LAr.

• An APA makes 76.8 Gbps, 38 of which come from collection channels.
  → resampling/rebinning: 14.4 → 7.2 → 3.6 Gbps
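Rebinning is the simpler of the two options: combine each group of 2 or 4 consecutive ticks into one sample, which divides the sample rate (and so the data rate) by the same factor. A minimal sketch, assuming summation of adjacent ticks, that a trailing partial group is dropped, and that bits per sample are unchanged (summed values actually need up to two extra bits at ×4, which this ignores). Resampling with a proper low-pass filter is not shown, and the function names are hypothetical.

```python
def rebin(waveform, factor):
    """Sum each group of `factor` consecutive ticks into one sample.

    A trailing partial group is dropped. Summing (rather than averaging)
    keeps integer ADC values but scales baselines and thresholds by `factor`.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(waveform) - len(waveform) % factor
    return [sum(waveform[i:i + factor]) for i in range(0, usable, factor)]


def rebinned_rate_gbps(rate_gbps, factor):
    """Data rate after rebinning, assuming unchanged bits per sample."""
    return rate_gbps / factor


print(rebin([1, 2, 3, 4, 5], 2))  # [3, 7]
for factor in (1, 2, 4):
    print(factor, rebinned_rate_gbps(14.4, factor))
# 1 14.4
# 2 7.2
# 4 3.6
```

At ×4 the effective tick becomes 2 µs, still no longer than the ≈3 µs induction rise time quoted above.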
