Advanced Topics on FPGA Applications, Screen B

Wu, Jinyuan

Fermilab

IEEE NSS 2007 Refresher Course

Supplemental Materials

Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

Outline

Digital Design with FPGAs (This 45 min. Course)
  Logic Element in a Nutshell
  Variations of the Registered Adders
  Tricks of Using RAM
  RAM-based histograms
  Topics on Multipliers
  Curved Track Fitter

Advanced Topics on FPGA Applications (Included as Supplemental Materials)
  Doublet Finding, Hash Sorter
  Triplet Finding, Tiny Triplet Finder (TTF)
  Options of Sequence Control, Recursive Structure, etc.

Doublet Matching

[Figure: detector planes 1, 2 and 3 along z, each with two hits a and b at (x1a, y1a), (x1b, y1b), (x2a, y2a), (x2b, y2b), (x3a, y3a), (x3b, y3b).]

2*y1 = y2
3*y1 = y3

Example of Evaluating the Key Number

Matching condition: 3*y1 = y3

Plane 1 hits: K = 3*y1/8 (y1 is multiplied by 3 first)
Plane 3 hits: K = y3/8
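As a rough illustration of the key-number idea, the sketch below bins plane-3 hits by K = y3/8 and then looks up each plane-1 hit under K = 3*y1/8, so that hits satisfying 3*y1 = y3 meet under the same key. The function names, the integer hit coordinates and the use of a Python dictionary are assumptions made for this example, not part of the original design.

```python
from collections import defaultdict

def key_plane1(y1):
    # K = 3*y1/8; integer division mimics dropping the 3 lowest bits
    return (3 * y1) // 8

def key_plane3(y3):
    # K = y3/8
    return y3 // 8

def match_doublets(plane1_hits, plane3_hits):
    """Pair plane-1 and plane-3 hits that share the same key number."""
    bins = defaultdict(list)
    for y3 in plane3_hits:
        bins[key_plane3(y3)].append(y3)
    doublets = []
    for y1 in plane1_hits:
        for y3 in bins.get(key_plane1(y1), []):
            doublets.append((y1, y3))
    return doublets

pairs = match_doublets([10, 40], [30, 121, 200])
```

Hits that share a key agree only to within the bin width of 8, so a finer check may still be needed, and a pair that straddles a bin boundary would be missed unless neighboring keys are also inspected.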

Link List Structure of Hash Sorter

[Block diagram. Labeled elements: input DIN, key K, Index RAM, Pointer RAM, DATA RAM, output DOUT.]
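One way to picture a link-list hash sorter built from the three memories named on this slide is sketched below. The class name, the method names, the use of -1 as an "empty" marker and the choice to link each new item to the previous item with the same key are all assumptions for illustration; the slide does not specify these details.

```python
class HashSorter:
    """Software model of an index RAM, a pointer RAM and a data RAM."""

    def __init__(self, n_keys):
        self.index_ram = [-1] * n_keys  # key -> address of newest item, -1 if none
        self.pointer_ram = []           # address -> address of previous item with same key
        self.data_ram = []              # address -> stored data item

    def write(self, key, item):
        addr = len(self.data_ram)       # next free address
        self.data_ram.append(item)
        self.pointer_ram.append(self.index_ram[key])  # link to the older item
        self.index_ram[key] = addr      # newest item becomes the list head

    def read(self, key):
        """Walk the linked list of one key, newest item first."""
        items = []
        addr = self.index_ram[key]
        while addr != -1:
            items.append(self.data_ram[addr])
            addr = self.pointer_ram[addr]
        return items

sorter = HashSorter(n_keys=32)
for key, item in [(3, "a"), (15, "b"), (3, "c")]:
    sorter.write(key, item)
```

Each write costs one access per memory regardless of how many items share the key, which is the appeal of the linked-list layout in this model.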

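Histogram with Fast Reset

[Block diagram. Labeled elements: inputs K and DV, D-Q registers, two RAMs (ports D, Q, WA, WE, RA), a +1 incrementer, a constant 0, an == comparator, a counter RC with CE, and RESET.]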
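The diagram is not fully recoverable, so the following is only one plausible reading of it, stated as an assumption: a second RAM stores, for each bin, the value of a reset counter (RC) at the time the bin was last written, and a bin whose stored tag differs from the current RC is treated as 0. Under that assumption a reset costs one counter increment instead of a sweep over every bin. All names below are hypothetical.

```python
class FastResetHistogram:
    """Histogram whose reset is a single counter increment (assumed scheme)."""

    def __init__(self, n_bins):
        self.count_ram = [0] * n_bins  # bin contents
        self.tag_ram = [0] * n_bins    # RC value when each bin was last written
        self.rc = 0                    # reset counter

    def reset(self):
        # No loop over bins: old contents become stale because tags no longer match.
        self.rc += 1

    def read(self, k):
        # A stale bin reads as 0 (the "== " comparator selecting the constant 0).
        return self.count_ram[k] if self.tag_ram[k] == self.rc else 0

    def fill(self, k):
        # Increment the (possibly stale) bin and stamp it with the current RC.
        self.count_ram[k] = self.read(k) + 1
        self.tag_ram[k] = self.rc

hist = FastResetHistogram(n_bins=16)
hist.fill(2)
hist.fill(2)
hist.fill(5)
before_reset = hist.read(2)
hist.reset()
hist.fill(2)
```

In hardware the reset counter has a finite width, so stale tags could match again after the counter wraps around; this sketch sidesteps that with an unbounded Python integer.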

An Example of Track Recognition: Hits

An Example of Track Recognition: Doublets

Hits are paired together as doublets.

Ghost doublets may exist.

An Example of Track Recognition: Histogram

[Equation garbled in extraction; surviving fragments: sin( ), r, R, c, 25 cm, 50 cm.]

[Figure: 2-D histogram of the two track parameters.]

Two track parameters can be calculated for each doublet.

A 2-D histogram is booked.

Doublets from the same track are entered into the same bin, since they have the same track parameters.

Sometimes they are stored in clusters.

[Figure annotation: This is a “ghost”.]
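The slide's own formula did not survive extraction, so the sketch below swaps in a standard geometric relation as an explicit assumption: a circular track of radius R that starts at the origin with initial angle phi0 passes through the points (r, phi) with sin(phi - phi0) = c*r, where c = 1/(2R). This parameter c may be scaled differently from the c on the slide. The function names and bin widths are hypothetical.

```python
import math
from collections import Counter

def doublet_params(r1, phi1, r2, phi2):
    """Solve sin(phi - phi0) = c*r for (phi0, c) from two hits on different layers."""
    num = r1 * math.sin(phi2) - r2 * math.sin(phi1)
    den = r1 * math.cos(phi2) - r2 * math.cos(phi1)
    phi0 = math.atan2(num, den)       # determined only modulo pi
    while phi0 - phi1 > math.pi / 2:  # pick the branch within pi/2 of the hit angle
        phi0 -= math.pi
    while phi0 - phi1 < -math.pi / 2:
        phi0 += math.pi
    c = math.sin(phi1 - phi0) / r1
    return phi0, c

def book(histogram, phi0, c, phi_bin=0.02, c_bin=0.001):
    """Enter one doublet into the 2-D histogram."""
    histogram[(math.floor(phi0 / phi_bin), math.floor(c / c_bin))] += 1

# Three hits generated from one assumed track, then all doublets booked.
true_phi0, true_c = 0.31, 0.0052
hits = [(r, true_phi0 + math.asin(true_c * r)) for r in (20.0, 40.0, 60.0)]
histogram = Counter()
for i in range(len(hits)):
    for j in range(i + 1, len(hits)):
        book(histogram, *doublet_params(*hits[i], *hits[j]))
```

All three doublets of this single track fall into one bin. With measurement errors the entries would spread over neighboring bins, which corresponds to the clusters mentioned above.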

An Example of Track Recognition: Tracks

All doublets from a track are contained in a cluster.

Simulation Results

An event with 200 tracks

It still works at 1000 tracks/event

Example: Finding “Soft Jets”

A simulated event with 200 tracks. Flat distributions. Min. R = 55 cm.

16 soft tracks are added. They are grouped in 2 small initial-angle regions, i.e., 2 “soft jets”.

Can you see the “soft jets”?

Can you see the “soft jets” now?

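[Figure: detector planes along z measuring u and v coordinates, with hits u1a, u1b, v1a, v1b, u2a, u2b, v2a, v2b, u3a, u3b, v3a, v3b; axes x, y, z and u, v.]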

Triplet Finding

[Figure: hits on Plane A, Plane B and Plane C.]

• Three data items must satisfy the condition: xA + xC = 2 xB.

• A total of n^3 combinations must be checked (e.g. 5x5x5 = 125).

• Three nested loops are needed if the process is implemented in software.

• Large silicon resources may be needed without careful planning: O(N^2).
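The condition xA + xC = 2 xB can be checked without three nested loops by treating planes A and C as bit arrays. The sketch below is an illustration written for this transcript, in the spirit of a shift-and-AND approach, and is not the author's TTF design: plane A is bit-reversed once, and for each hit on plane B the reversed pattern is shifted so that bit xC of the result equals A[2*xB - xC]; a bitwise AND with plane C then marks every valid xC at once. The function names and the 64-bit width are assumptions.

```python
def reverse_bits(value, width):
    """Mirror a bit pattern so that bit i moves to bit (width - 1 - i)."""
    out = 0
    for i in range(width):
        if (value >> i) & 1:
            out |= 1 << (width - 1 - i)
    return out

def find_triplets(hits_a, hits_b, hits_c, width=64):
    """Return all (xA, xB, xC) with xA + xC == 2*xB, coordinates in [0, width)."""
    bits_a = sum(1 << x for x in set(hits_a))
    bits_c = sum(1 << x for x in set(hits_c))
    rev_a = reverse_bits(bits_a, width)
    mask = (1 << width) - 1
    triplets = []
    for xb in sorted(set(hits_b)):
        shift = width - 1 - 2 * xb
        # After the shift, bit k of `aligned` holds A[2*xb - k].
        aligned = rev_a >> shift if shift >= 0 else (rev_a << -shift) & mask
        match = aligned & bits_c
        for xc in range(width):
            if (match >> xc) & 1:
                triplets.append((2 * xb - xc, xb, xc))
    return triplets

found = find_triplets([3, 10], [5, 12], [7, 14, 20])
```

In this model the work per plane-B hit is one variable-distance shift and one AND over the whole bit array, which is the kind of operation a single-cycle shifter makes cheap in hardware.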

Block Diagram, Step 1

Block Diagram, Step 2

Circular Tracks from Collision Point on Cylindrical Detectors

For a given hit on layer 3, a coincidence between a layer 2 hit and a layer 1 hit that satisfies the coincidence map signifies a valid circular track.

A track segment has 2 free parameters, i.e., it is a triplet. The coincidence map is invariant under rotation.

[Figures: coincidence maps, one with axes from 0 to 100 and one with axes from 0 to 128 labeled (φ1-φ3)+64 and (φ2-φ3)+64.]

Logarithmic Shifter

[Diagram: shift stages S1, S2, S4.]

# of bits: N
Shift distance: L
# of stages: log2(L)
Total LE usage: N*log2(L)

A shift of the bit pattern by X bits is done in one clock cycle rather than X cycles.

The logarithmic shifter is also known as a “barrel shifter”, but the term “logarithmic” better reflects the nature of the implementation, the resource usage and the propagation delay.
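The stage structure can be modeled in a few lines. In this sketch each stage shifts by a fixed power of two (1, 2, 4, ...) and is enabled by one bit of the shift distance, so a maximum distance L needs log2(L) stages. The function name and the default widths are assumptions for illustration.

```python
def logarithmic_shift_left(pattern, distance, n_bits=16, max_distance=8):
    """Shift `pattern` left by `distance` (0 <= distance < max_distance)
    using fixed stages S1, S2, S4, ... and report how many stages exist."""
    mask = (1 << n_bits) - 1
    stage_shift = 1
    stages = 0
    while stage_shift < max_distance:
        if distance & stage_shift:          # this bit of the distance enables the stage
            pattern = (pattern << stage_shift) & mask
        stage_shift <<= 1
        stages += 1
    return pattern, stages

shifted, n_stages = logarithmic_shift_left(0b1011, 5)
```

With max_distance = 8 the model uses 3 stages, matching S1, S2 and S4 on the slide; each stage is one multiplexer per bit, which gives the N*log2(L) logic element estimate.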

Logic Cell Usage

Both 64- and 128-bit TTF designs fit a $100 FPGA comfortably.

A simple 64-bit Hough transform design is shown for scale.

A $1200 FPGA is shown for scale.

[Chart: logic cell usage of TTF64, TTF128 and Hough Trans. 64, compared with the capacities of the EP1C12 ($118) and the EP2A40 ($1200).]

Complex Triplet Finding Problems

[Figure: detector planes measuring u1, v1, u2, v2, u3, v3, u4, v4 and x5, y5.]

FPGA Process Sequencing Options

                                       Program Type            Program Length   Reprogram   Resource
                                                               (CLK cycles)                 Usage
Finite State Machine (FSM)             Fixed Wired             10               Hard        Small
Enclosed Loop Micro-Sequencer (ELMS)   Memory Stored Program   10-1000          Easy        Small
Microprocessor (MP)                    Memory Stored Program   >1000            Easy        Large

The Between Counter

[Block diagram. Labeled elements: counter with SLOAD, D[], SCLR and Q[]; N; M-1; comparator (==); A[]; B[]; T; ROM; Between Counter; Control Signals.]

Count sequence:
  0,1,2,3,4,5,6,7,8,9,A
  5,6,7,8,9,A
  5,6,7,8,9,A
  5,6,7,8,9,A
  5,6,7,8,9,A
  5,6,7,8,9,A,B,C,D,E,F…

ROM contents:
  PC0: instr0
  PC1: instr1
  PC2: instr2
  PC3: instr3
  PC4: instr4
  PC5: instr5
  PC6: instr6
  PC7: instr7
  PC8: instr8
  PC9: instr9
  PCA: instrA
  PCB: instrB
  PCC: instrC
  PCD: instrD
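The count sequence shown on this slide can be reproduced with a small behavioral model. The sketch assumes the counter reloads a loop start value whenever it reaches the loop end value and passes remain, and otherwise increments; the function and parameter names are hypothetical and the actual port-level behavior (SLOAD, SCLR, etc.) is not modeled.

```python
def between_counter(loop_start, loop_end, passes, n_cycles):
    """Count up from 0, repeating the range loop_start..loop_end `passes` times."""
    q = 0
    passes_left = passes
    sequence = []
    for _ in range(n_cycles):
        sequence.append(q)
        if q == loop_end and passes_left > 1:
            q = loop_start       # synchronous load of the loop start address
            passes_left -= 1
        else:
            q += 1               # normal increment, also used after the last pass
    return sequence

seq = between_counter(0x5, 0xA, passes=3, n_cycles=26)
```

Used as a program counter, each value in the sequence would address one instruction word in the ROM, so the instructions between the two addresses are replayed without any branch instruction in the program itself.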

ELMS – Detailed Block Diagram

[Block diagram. Labeled elements: ROM (128 x 36 bits); program counter PC with +1 incrementer; Reset with start address 0x04 (RUN at 04); CondJMP, JMPIF, JMP, desA; Loop & Return Registers + Stack (128 words) holding CNT, endA, bckA (instruction fields cnt, EndA, BckA); Push, Pop; Compare; LoopBack; DEC; RTN; LastPass; User Control Signals.]

LoopBack = DEC = (PC==endA) && (CNT!=0)

LastPass = (PC==endA) && (CNT==1)

        FOR  BckA1 EndA1 #n
        LD   R2, #addr_a
        LD   R3, #addr_X
        LD   R7, #0
BckA1   LD   R4, (R2)
        INC  R2
        LD   R5, (R3)
        INC  R3
        MUL  R6, R4, R5
EndA1   ADD  R7, R7, R6
        LD   R8, R7

The Stack supports nested loops, up to 128 layers.
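To make the loop mechanism concrete, here is a toy software model of the sequencer running the listing above. It rests on several assumptions that the slide does not confirm: the loop body is meant to run exactly n times, so the pass on which CNT equals 1 falls through instead of looping back; n is at least 1; nesting and the stack are not modeled; and the opcode names LDI, LDM and MOV are invented here to distinguish the three uses of LD.

```python
def run_elms_dot_product(a, x):
    """Execute the FOR-loop listing on a toy sequencer; return (R8, instructions executed)."""
    n = len(a)                              # assumed n >= 1
    memory = list(a) + list(x)              # a[] at address 0, X[] at address n
    program = [
        ("FOR", 4, 9, n),                   # bckA = 4, endA = 9, CNT = n
        ("LDI", 2, 0),                      # LD R2, #addr_a
        ("LDI", 3, n),                      # LD R3, #addr_X
        ("LDI", 7, 0),                      # LD R7, #0
        ("LDM", 4, 2),                      # BckA1: LD R4, (R2)
        ("INC", 2),
        ("LDM", 5, 3),                      # LD R5, (R3)
        ("INC", 3),
        ("MUL", 6, 4, 5),
        ("ADD", 7, 7, 6),                   # EndA1
        ("MOV", 8, 7),                      # LD R8, R7
    ]
    reg = [0] * 9
    pc, cnt, bck_a, end_a = 0, 0, None, None
    executed = 0
    while pc < len(program):
        op, *args = program[pc]
        executed += 1
        if op == "FOR":
            bck_a, end_a, cnt = args
        elif op == "LDI":
            reg[args[0]] = args[1]
        elif op == "LDM":
            reg[args[0]] = memory[reg[args[1]]]
        elif op == "INC":
            reg[args[0]] += 1
        elif op == "MUL":
            reg[args[0]] = reg[args[1]] * reg[args[2]]
        elif op == "ADD":
            reg[args[0]] = reg[args[1]] + reg[args[2]]
        elif op == "MOV":
            reg[args[0]] = reg[args[1]]
        # Sequencer: loop back at endA while passes remain, otherwise step on.
        if pc == end_a and cnt > 1:
            cnt -= 1
            pc = bck_a
        else:
            pc += 1
    return reg[8], executed

result, count = run_elms_dot_product([1, 2, 3], [4, 5, 6])
```

The loop-back decision is made by the sequencer alongside the ADD at EndA1, so no instruction slot in the loop body is spent on counting or branching.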

What’s Good About ELMS: FOR Loops at Machine Code Level

Y = \sum_{i=0}^{n} a_i X_i

The looping sequence in this example is known before entering the loop, yet a regular microprocessor treats the sequence as unknown. The ELMS supports FOR loops with a pre-defined number of iterations at the machine code level. Execution time is saved, and the micro-complexities associated with conditional branches (branch penalty, pipeline bubbles, etc.) are avoided.

Microprocessor:

        LD   R1, #n
        LD   R2, #addr_a
        LD   R3, #addr_X
        LD   R7, #0
BckA1   LD   R4, (R2)
        INC  R2
        LD   R5, (R3)
        INC  R3
        MUL  R6, R4, R5
EndA1   ADD  R7, R7, R6
        DEC  R1
        BRNZ BckA1

The ELMS:

        FOR  BckA1 EndA1 #n
        LD   R2, #addr_a
        LD   R3, #addr_X
        LD   R7, #0
BckA1   LD   R4, (R2)
        INC  R2
        LD   R5, (R3)
        INC  R3
        MUL  R6, R4, R5
EndA1   ADD  R7, R7, R6

[Figure labels: Conditional Branch, 25%.]
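A simple instruction count over the two listings shows where the saving comes from. The sketch counts one unit per executed instruction and ignores branch penalties, so it is a lower bound on the microprocessor's cost; the variable and function names are assumptions. The 2-in-8 ratio it computes may be what the 25% label on the slide refers to, although the slide does not say so explicitly.

```python
MP_SETUP = ["LD R1, #n", "LD R2, #addr_a", "LD R3, #addr_X", "LD R7, #0"]
MP_BODY = ["LD R4, (R2)", "INC R2", "LD R5, (R3)", "INC R3",
           "MUL R6, R4, R5", "ADD R7, R7, R6", "DEC R1", "BRNZ BckA1"]

ELMS_SETUP = ["FOR BckA1 EndA1 #n", "LD R2, #addr_a", "LD R3, #addr_X", "LD R7, #0"]
ELMS_BODY = ["LD R4, (R2)", "INC R2", "LD R5, (R3)", "INC R3",
             "MUL R6, R4, R5", "ADD R7, R7, R6"]

def executed_instructions(setup, body, n):
    """Instructions executed for n loop passes, one unit per instruction."""
    return len(setup) + n * len(body)

# Share of the microprocessor loop body spent on DEC and BRNZ.
loop_overhead = (len(MP_BODY) - len(ELMS_BODY)) / len(MP_BODY)

mp_total = executed_instructions(MP_SETUP, MP_BODY, 100)
elms_total = executed_instructions(ELMS_SETUP, ELMS_BODY, 100)
```

For 100 passes the model gives 804 executed instructions for the microprocessor listing against 604 for the ELMS listing.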

The Problem: 3-Phase 60 Hz AC

Rectification noise from power supplies running on 3-phase 60 Hz AC is picked up by the input cable lying in the accelerator tunnel.

[Figures: Time Domain waveform (ADC, 21 µs/sample) and Frequency Domain spectrum, amplitude vs. frequency (Hz) from 0 to 3600 Hz.]

Filtering Results

Noise above 360 Hz, the dominant portion, is filtered out by both filter functions.

The CIC sum is a lot smoother than the sliding sum, but small signals are still buried under ripples of 60 and 180 Hz.

[Figure: traces labeled Sliding Sum, CIC Sum and Signals.]

Recursive Implementation of CIC Sum

The non-recursive implementation needs 248 memory fetches, 248 multiplications and 248 additions, and more operations for longer sum lengths.

[Diagram: non-recursive form; x[n] is weighted by coefficients *h1, *h2, ..., *h[K] to give y[n].]

The CIC sum constructed as a sliding sum of sliding sums needs 2 memory fetches, 0 multiplications and 4 add/sub operations for any sum length.

[Diagram: x[n] and -x[n-K] accumulate into s[n]; +s[n] and -s[n-K] accumulate into y[n].]

The re-formulated CIC sum uses the raw data buffer rather than a separate buffer.

[Diagram: x[n], -2x[n-K] and x[n-2K] accumulate into u[n]; +u[n] accumulates into y[n].]
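The equivalence between the weighted sum and the recursive form can be checked numerically. The sketch below reads the diagram labels as the recursions u[n] = u[n-1] + x[n] - 2x[n-K] + x[n-2K] and y[n] = y[n-1] + u[n], and assumes both sliding sums have the same length K, which makes the direct form a triangular weighting of 2K-1 samples. These readings, the function names and the zero initial conditions are assumptions.

```python
def cic_direct(x, K):
    """Non-recursive form: triangular weights 1, 2, ..., K, ..., 2, 1."""
    h = [min(j + 1, 2 * K - 1 - j) for j in range(2 * K - 1)]
    y = []
    for n in range(len(x)):
        acc = 0
        for j, w in enumerate(h):
            if n - j >= 0:
                acc += w * x[n - j]      # one fetch, one multiply, one add per tap
        y.append(acc)
    return y

def cic_recursive(x, K):
    """Recursive form: two fetches from the raw data buffer, no multiplications."""
    def tap(i):
        return x[i] if i >= 0 else 0     # samples before the start are taken as 0

    u = 0
    y = 0
    out = []
    for n in range(len(x)):
        u += tap(n) - 2 * tap(n - K) + tap(n - 2 * K)   # doubling is a 1-bit shift
        y += u
        out.append(y)
    return out

samples = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
direct = cic_direct(samples, K=3)
recursive = cic_recursive(samples, K=3)
```

The per-sample cost of the recursive version does not depend on K, whereas the direct version grows linearly with the sum length. With integer data the two outputs agree exactly.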

Exponential Sequence Generator

[Diagram: register with SET, D and Q.]

if (CO==1) {Q = Q - Q/32;}

[Plot: output value (0 to 70000) vs. step (0 to 160).]

This is also an example of a recursive structure. It is an IIR structure, but it is stable. Decaying exponential components are used to stabilize other recursive structures.
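The update rule Q = Q - Q/32 can be simulated directly. The sketch assumes a starting value of 65535 (chosen only because the plot axis tops out near 70000), treats the division as an integer right shift by 5 bits, and assumes the enable CO is active on every step; the function name is hypothetical.

```python
def exponential_sequence(start=65535, steps=160, shift=5):
    """Generate Q[k+1] = Q[k] - (Q[k] >> shift), i.e. Q - Q/32 for shift = 5."""
    q = start
    out = []
    for _ in range(steps):
        out.append(q)
        q -= q >> shift     # subtract 1/32 of the current value, no multiplier needed
    return out

seq_q = exponential_sequence()
```

Each step multiplies the value by roughly 31/32, so the sequence decays geometrically and never grows. With integer arithmetic the decay stops once Q drops below 32, because Q >> 5 is then 0.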

The End

Thanks