May. 2009, Wu Jinyuan, Fermilab [email protected]
IEEE RT09 Short Course
1
FPGA Structure, Programming Principles and Applications: Part II
Wu, Jinyuan
Fermilab
IEEE Real Time Conference Short Course
May, 2009
Outline
- Counting:
  - Example: LED brightness and DAC
  - Simple sequencing
- Bandwidth and Noise Issues:
  - General remarks on the sampling theorem and dithering
  - Example: Huffman coding
  - Example: Decimation & dynamic decimation
- After-fact Calibration:
  - Several topics on FPGA-based TDCs
- Serial Communication with Independent Crystals:
  - Minimum synchronization
Flashing LED: First Things First
[Figure: a 24-bit counter, Q[23..0], driving the LED]
Design in at least one LED for every FPGA. When a board is first powered up, test the LED flashing function first. Many things have to be right for the LED to flash:
- All power pins must be connected.
- Configuration devices must be in the correct mode.
- The design software must be correct.
LED Brightness Variation
[Figure: Counter Q[23..0] feeding a comparator (inputs A, B; output A<B); a second version adds a LUT]
The LED brightness is varied by changing the duty cycle of the output pulse.
Comparator input A is the brightness and input B is the clock cycle count.
A look-up table (LUT) can be added at input A to obtain a different brightness variation curve.
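Below is a minimal software sketch of the LUT idea. It rests on several assumptions that are ours, not the slide's: an 8-bit brightness index, a 10-bit PWM cycle count, an LED that is on while the count is below the LUT output, and a quadratic curve stored in the LUT. `LUT` and `pwm_duty` are hypothetical names.

```python
# Hypothetical model: 8-bit brightness index -> LUT -> 10-bit compare value.
# The LED is assumed "on" while the 10-bit cycle count is below the compare
# value, so the duty cycle is compare / 1024.
N_COUNT = 1024                      # assumed PWM counter length (10 bits)

# Assumed quadratic curve: low indices give very small duty cycles, which
# roughly follows the eye's non-linear brightness response.
LUT = [(i * i * (N_COUNT - 1)) // (255 * 255) for i in range(256)]


def pwm_duty(brightness_index):
    """Fraction of one PWM period during which the LED is on."""
    compare = LUT[brightness_index]
    on_cycles = sum(1 for count in range(N_COUNT) if count < compare)
    return on_cycles / N_COUNT
```

With this curve, index 128 gives a duty cycle of about 25% instead of 50%. The low end of the range therefore gets finer steps.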
LED Brightness Exponential Drop
[Figure: counter, comparator (A<B) with carry-out CO, and register Q]
if (CO==1) {Q = Q - Q/32;}
Narrow pulses are typically stretched so that an LED displays them at a fixed brightness.
The circuit here instead makes the LED dim gradually, for a better visual effect.
(Possible student lab)
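The update rule can be modelled in software to see the decay. This is a sketch, and `exponential_drop` is a hypothetical helper. Each loop iteration stands for one clock cycle in which CO==1. Dividing by 32 is a 5-bit right shift, so no multiplier or divider is involved.

```python
def exponential_drop(q0, steps):
    """Apply Q = Q - Q/32 once per step (one step = one CO==1 event)."""
    q, seq = q0, [q0]
    for _ in range(steps):
        q -= q >> 5          # Q/32 as a 5-bit right shift: no multiplier
        seq.append(q)
    return seq
```

Integer truncation stops the sequence at 31, because Q/32 becomes 0 there. A wider register with extra fraction bits would let the value decay further.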
Exponential Sequence Generator
[Figure: register Q (SET, D) implementing the accumulator below]
if (CO==1) {Q = Q - Q/32;}
[Plot: generated sequence, values 0 to 70000 over steps 0 to 160]
An exponential sequence is generated using the accumulator shown above.
Note that not a single multiplier is used. Other function sequences, such as sine, cosine, tangent and cotangent, can be generated similarly.
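One way to generate other functions without multipliers is the well-known shift-and-add "circle" recurrence, often credited to Minsky. The slide does not say which circuit the author uses for sine and cosine, so treat this only as an illustration. Two accumulators update each other by a right-shifted copy, which gives an oscillation with a period of about 2π·2^k steps. `sine_cosine` is a hypothetical name.

```python
def sine_cosine(amplitude, k, steps):
    """Shift-and-add oscillator: x ~ cosine, y ~ sine, no multipliers."""
    x, y = amplitude, 0
    xs, ys = [], []
    for _ in range(steps):
        x = x - (y >> k)     # x -= y / 2**k  (arithmetic right shift)
        y = y + (x >> k)     # y += x / 2**k  (uses the *new* x)
        xs.append(x)
        ys.append(y)
    return xs, ys
```

With k = 5 the period is close to 2π·32 ≈ 201 steps. The amplitude stays near its starting value because the update preserves x*x - x*y/2**k + y*y, apart from integer rounding.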
Duty-Cycle Based Single-Pin DAC (1)
The duty cycle (pulse width) of the comparator output is proportional to the DAC input at port A.
Use an external RC network as a low-pass filter. The output voltage of an ideal low-pass filter is proportional to the DAC input.
[Figure: counter Q and DAC input feeding a comparator (inputs A, B; output A>B); output waveform over clock cycles 896 to 1024]
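Here is a discrete-time sketch of this DAC. The comparator output is 1 while the DAC input A exceeds the counter value B. The external RC network is modelled as a first-order IIR filter, y += alpha·(x - y). The 6-bit counter and alpha = 1/256 are assumed values, and `pwm_dac` is a hypothetical name.

```python
def pwm_dac(a, n_bits=6, periods=200, alpha=1 / 256):
    """Return (mean, ripple) of the filtered output over the last PWM period."""
    n = 1 << n_bits                      # counter length: B runs 0..n-1
    y, last = 0.0, []
    for period in range(periods):
        for b in range(n):
            x = 1.0 if a > b else 0.0    # comparator output A>B
            y += alpha * (x - y)         # RC low-pass as a first-order IIR
            if period == periods - 1:
                last.append(y)
    return sum(last) / n, max(last) - min(last)
```

Once the filter has settled, the mean output equals A/2^n_bits, which is the proportionality stated above. A larger alpha (a smaller RC) settles faster but leaves more ripple.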
Duty-Cycle Based Single-Pin DAC (2)
[Figure: accumulator register (D, Q) with the DAC input; carry-out CO used as the output; waveform over clock cycles 896 to 1024]
(Possible student lab)
The carry-out of the accumulator is used as the output. The number of pulses is proportional to the DAC input. The rounding error is carried to later cycles, so the output is smoother.
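The accumulator version can be sketched the same way, assuming a 10-bit accumulator. The DAC input is added every clock, and the carry-out is the 1-bit output. The remainder stays in the accumulator, which is how the rounding error is carried to later cycles. `accumulator_dac` is a hypothetical name.

```python
def accumulator_dac(a, n_bits=10, cycles=None):
    """Carry-out stream of an n_bits accumulator that adds `a` every clock."""
    modulus = 1 << n_bits
    cycles = modulus if cycles is None else cycles
    acc, out = 0, []
    for _ in range(cycles):
        acc += a
        carry = acc >= modulus           # carry-out CO
        if carry:
            acc -= modulus               # keep the remainder (error carried)
        out.append(1 if carry else 0)
    return out
```

With A = 300, the output contains exactly 300 ones per 1024 cycles, and they are never more than 4 cycles apart. The comparator version would instead produce one block of 300 ones followed by 724 zeros.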
The Frequency Spectrum of DAC (2)
[Figure: accumulator DAC (Q, CO, DAC input), its output waveform over clock cycles 896 to 1024, and three spectra (0 to 100) over frequency bins 0 to 512]
The first harmonic may be suppressed. This DAC works better with regular low-pass filters.
The Frequency Spectrum of DAC (1)
[Figure: comparator DAC (counter Q, DAC input, A>B), its output waveform over clock cycles 896 to 1024, and three spectra (0 to 100) over frequency bins 0 to 512]
The energy is concentrated mainly in the first harmonic. This DAC works better with a notch filter.
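The difference between the two spectra can be checked numerically. This sketch uses hypothetical helper names and an assumed input of A = 300 out of 1024. It computes DFT bin 1, the first harmonic of the 1024-cycle counter period, for both bit streams.

```python
import cmath


def pwm_bits(a, n):
    """DAC (1): comparator output A > B over one counter period."""
    return [1 if a > b else 0 for b in range(n)]


def accumulator_bits(a, n):
    """DAC (2): accumulator carry-out over one period (requires a < n)."""
    acc, out = 0, []
    for _ in range(n):
        acc += a
        out.append(1 if acc >= n else 0)
        acc %= n
    return out


def harmonic(bits, k):
    """Magnitude of DFT bin k of the bit stream."""
    n = len(bits)
    return abs(sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                   for i, v in enumerate(bits)))
```

Both streams have the same DC content: bin 0 equals 300 for each. Bin 1, however, is about 259 for the comparator output and essentially zero for the accumulator output with this input. The accumulator's energy sits at higher frequencies, where a simple low-pass filter removes it.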
Start, Count: A Single-Layer Loop
[Timing diagram: ST, CLK, QA[5], and QA[4..0] counting 0, 1, ..., 30, 31, 0]
The ST signal starts the sequence; counting is enabled; counting then stops.
A Double-Layer + Single-Layer Sequencer
[Schematic: state control counter QC[1..0]; 8-bit up counters QBA[7..0] (outer loop) and QAA[7..0] (inner loop); eq254/eq255 decoders producing AAeqFF and BAeqFF; inputs CLK, ST]
A double-layer loop is followed by a single-layer loop.
Double-layer loop (BA: outer loop, AA: inner loop):
BA = 0: AA = 0, 1, 2, 3, 4, ..., 255
BA = 1: AA = 0, 1, 2, 3, 4, ..., 255
BA = 2: AA = 0, 1, 2, 3, 4, ..., 255
...
BA = 255: AA = 0, 1, 2, 3, 4, ..., 255
Single-layer loop (BA, AA):
(0, 0), (1, 0), (2, 0), (3, 1), (4, 2), ..., (255, 253), (0, 254), (0, 255), (0, 0)
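Here is a cycle-by-cycle software model of the double-layer loop, assuming 8-bit counters. The inner counter AA increments every clock. The outer counter BA is enabled only when AA reaches 255, which is the eq255 decode (AAeqFF). The function name is hypothetical, and the single-layer phase and the state control are omitted.

```python
def double_layer_loop(width=8):
    """List of (BA, AA) states of two cascaded counters."""
    top = (1 << width) - 1
    ba = aa = 0
    states = []
    while True:
        states.append((ba, aa))
        if aa == top and ba == top:     # both decodes true: loop finished
            break
        if aa == top:                   # inner loop wraps -> enable outer
            ba += 1
        aa = (aa + 1) & top             # inner counter runs every clock
    return states
```

The nested loop therefore costs no extra logic beyond a decoder that turns the inner counter's terminal count into the outer counter's enable.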
An Array Adder
[Schematic: two 256-word × 16-bit dual-port RAMs (M4K blocks), a 16-bit adder (A+B), a 9-bit counter and an 8-bit counter for addressing, and control flip-flops; inputs XD[15..0], XA[7..0], XWE, ST, CLK; output OUT[15..0]]
Care Must Be Taken Outside the FPGA (1)
[Figure: signal chain shaper/LP filter → ADC → FPGA → DAC → LP filter, with band limiting at both ends; panels: spectrum of the original signal, output of the LP filter, ADC input, sampling in the ADC, aliasing without LP filtering, spectrum of the DAC output]
The signal bandwidth must be less than one half of the sampling frequency, i.e., below the Nyquist frequency.
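A tone above half the sampling frequency is not lost after sampling; it reappears at a lower frequency. The small helper below (a hypothetical name) computes where a sampled tone lands. This is why the low-pass filter must come before the ADC: once sampled at 100 kS/s, a 30 kHz tone and a 70 kHz tone are indistinguishable.

```python
def alias_frequency(f, fs):
    """Apparent frequency of a tone at f after sampling at fs (same units)."""
    r = f % fs                 # sampling cannot tell f from f - k*fs ...
    return min(r, fs - r)      # ... nor from its mirror image about fs/2
```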
The “Trend” vs. The Sampling Theorem
“There will be no hardware analog processing. Everything is done digitally in software.”
It sounds very stylish.
However, a shaper/low-pass filter is a minimum requirement.
Care Must Be Taken Outside the FPGA (2)
[Figure: DAC, FPGA, ADC and shaper/LP filter chain, with a dither noise source n added at the ADC input]
[Plots: ADC code (51 to 54) vs. sampling index (0 to 150). Without dither: signal, ADC(signal), threshold. With dither: signal, signal+noise, ADC(signal+noise), weighted average, threshold]
Resolution finer than the ADC LSB can be achieved by adding noise at the ADC input and then filtering digitally.
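The effect of dithering can be reproduced with a toy ADC. This sketch makes three assumptions: a rounding ADC, uniform dither 1 LSB wide, and a plain average as the digital filter. The slide's plot uses a weighted average, and the names here are hypothetical.

```python
import math
import random


def adc(v):
    """Ideal ADC: round to the nearest LSB."""
    return math.floor(v + 0.5)


def average_reading(signal, n, dither_lsb=0.0, seed=1):
    """Average of n ADC readings of a constant signal, with optional dither."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        noise = rng.uniform(-dither_lsb / 2, dither_lsb / 2) if dither_lsb else 0.0
        total += adc(signal + noise)
    return total / n           # digital filtering: a plain average
```

Without dither every reading is 52, so averaging cannot recover the extra 0.3 LSB. With 1 LSB of uniform dither, the average converges to about 52.3.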
Adding Noise for Finer Resolution
Photo Credit: www.telegraph.co.uk, trinities.org
Mechanical pressure gauges usually do not track small pressure changes well.
The gauge readers may lightly tap the gauges to get a more accurate reading.
The idea of dithering at the ADC input is similar.
Some Notes on Philosophy
[Figure: “Wideband, low noise” vs. “Narrowband, noisy”, each marked good or bad]
Something good in one condition can be bad in another condition, and vice versa.
Why Are Band Limiting and Dithering Ignored?
Pre-amplifiers usually have a naturally limited bandwidth and an intrinsic noise larger than the LSB of the ADC.
So, much of the time, band limiting and dithering can be “safely” ignored, since they are satisfied automatically.
High-bandwidth, low-noise devices have now become easily accessible. A design can be too fast and too quiet.
Do not forget to review the band limiting and dithering requirements for each design.
Data Reduction on Liquid Argon TPC Data
Hit waveforms in a TPC carry useful information. Digitizing the waveforms creates a large volume of data. Data reduction without losing useful information is necessary.
[Plot: TPC event, drift time vs. wire number (axes 0 to 2000 and 0 to 700); data from the BO detector at FNAL]
Slow Variation of Raw Data
[Plot: raw ADC values, 140 to 160, over samples 1100 to 1400]
More than 99% of the points differ from the previous point by -1, 0 or +1.
Huffman coding can therefore be applied to the differences between data points.
[Plot: probability P of u(n+1)-u(n), from -8 to 7, for wire groups wire0_15, wire16_31 and wire32_95]
[Figure: a DFF delays the input so that a subtractor (A-B) forms U(n+1)-U(n)]
The Huffman Coding
The U(n+1)-U(n) value with the highest probability is assigned the shortest code, i.e., the single bit 1.
Values with lower probabilities are assigned longer codes, e.g., 01, 001, 0001, etc.
Huffman-coded words and regular words are distinguished by bit 15.
U(n+1)-U(n)     Code
-4 and others   Full 16-bit word
-3              000001
-2              0001
-1              01
0               1
+1              001
+2              00001
+3              0000001
Regular word (for the first point, or when U(n+1)-U(n) is outside ±3): 1 | 0 0 | ADC value (13-bit)
Huffman-coded word, e.g., differences -1 0 0 0 +1 +2, followed by padding or continuing into the next word.
In this example, 6 differences of the data samples are packed into one 16-bit data word.
[Plot: probability P of u(n+1)-u(n), from -8 to 7, for wire0_15, wire16_31 and wire32_95]
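The following is a hypothetical software model of the difference-plus-Huffman scheme, using the code table above. The word layout is our reading of the slide, not a confirmed specification:
- A regular word has bit 15 = 1 and the 13-bit ADC value in bits 12..0.
- A Huffman word has bit 15 = 0 and 15 bits of packed codes, with the first code in bit 14.
- A code may continue into the next Huffman word.
- The last Huffman word before a regular word, or at the end of the stream, is zero-padded.

`encode` and `decode` are hypothetical names.

```python
CODES = {0: "1", -1: "01", +1: "001", -2: "0001",
         +2: "00001", -3: "000001", +3: "0000001"}
DECODE = {code: diff for diff, code in CODES.items()}


def encode(samples):
    words, bits = [], ""            # bits: code bits not yet placed in a word

    def flush(pad):
        nonlocal bits
        # Emit full 15-bit payloads; with pad=True also emit the remainder,
        # zero-padded (the decoder drops the zeros at the next regular word).
        while len(bits) >= 15 or (pad and bits):
            words.append(int(bits[:15].ljust(15, "0"), 2))   # bit 15 stays 0
            bits = bits[15:]

    prev = None
    for u in samples:
        diff = None if prev is None else u - prev
        if diff in CODES:
            bits += CODES[diff]
            flush(pad=False)        # a code may continue into the next word
        else:
            flush(pad=True)         # close the open Huffman word first
            words.append(0x8000 | (u & 0x1FFF))
        prev = u
    flush(pad=True)
    return words


def decode(words):
    out, prev, bits = [], None, ""
    for w in words:
        if w & 0x8000:              # regular word: restart from the raw value
            prev = w & 0x1FFF
            out.append(prev)
            bits = ""               # leftover zeros were padding
            continue
        bits += format(w & 0x7FFF, "015b")
        while "1" in bits:          # every code is k zeros followed by a 1
            end = bits.index("1") + 1
            prev += DECODE[bits[:end]]
            out.append(prev)
            bits = bits[end:]
    return out
```

Run on the slide's example (a first sample followed by the differences -1 0 0 0 +1 +2), the encoder produces one regular word plus a single Huffman word holding all six differences.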
The Huffman Coding Block
The block can operate at clock frequencies up to 250 MHz in Altera Cyclone III FPGA devices.
The block uses 245 logic cells, i.e., 0.6% of an EP3C40F484C6 device ($129) containing 39600 logic cells.
[Block symbol: HuffmanCoding1; inputs D[15..0], DV, D1st, DLast, CK (250 MHz); outputs Q[15..0], QRDY, DV6Q, D1st6Q, DLast6Q; raw data in, Huffman-coded data out]
Cost of the block: (245/39600) × $129 = $0.80
The Schematics of the Huffman Coding Block
[Schematic: inputs D[15..0], DV, CK, D1st, DLast; outputs Q[15..0], QRDY, D1st6Q, DLast6Q, DV6Q. Main sections: difference of data points, Huffman code lookup table, Huffman code composer, and Huffman code or raw data selector]
Notes on the schematic:
Number of bits for Huffman codes: 0: 0(+1), 1: 2(+1), 2: 4(+1), 3: 6(+1)
Number of bits for Huffman codes: -1: 1(+1), -2: 3(+1), -3: 5(+1), -4: 7(+1)
If (NBHC+1+HCSS) >= 16, HCSS.d = (0xf & (NBHC+1+HCSS)) + 1
e.g., NBHC=2, HCSS=14 --> HCSS.d=1
+1 when rolling over, since 15 bits per word are used for data
The Compression Ratio of Huffman Coding
On typical TPC events, a compression ratio of about 10 can be achieved.
The compression ratio is sensitive to high-frequency noise.
[Figure: HuffmanCoding1 block (CK250): N words in, N/10.7 words out; TPC event display]
A “Mystery” of Huffman Coding Ratios on Down-Sampled Data
The 5 MHz data are down-sampled to 1 MHz. The Huffman coding compression ratio drops from 10.7 to 7.5 when the data are down-sampled.
[Figure: N samples at 5 MHz → HuffmanCoding1 → N/10.7; N/5 samples at 1 MHz → HuffmanCoding1 → (N/5)/7.5]
Averaging in Decimation: A Re-discovery
[Plots: two panels, amplitude 0 to 1 over 0 to 64]
Simple “down-sampling” is not good. When the decimation factor is D, averaging over D samples is not good either. Averaging over 2*D samples is necessary. There is still aliasing with averaging over 2*D samples, but it is less severe than with averaging over D samples.
The signal bandwidth must be less than one half of the sampling frequency, i.e., below the Nyquist frequency.
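The claim can be checked by computing the frequency response of the two averages. As a simplified figure of merit, this sketch takes the largest gain in the band from 1/(2D) to 1/2 cycles per sample, which is the band that folds back after keeping one sample in D. D = 16 matches the 0 to 64 axis of the plots. Both the metric and the helper names are our assumptions.

```python
import cmath


def response(taps, f):
    """Normalized magnitude response of FIR weights `taps` at f (cycles/sample)."""
    acc = sum(w * cmath.exp(-2j * cmath.pi * f * n) for n, w in enumerate(taps))
    return abs(acc) / sum(taps)


def worst_alias_gain(taps, d, grid=4096):
    """Largest gain for f = 1/(2d) .. 1/2, i.e. everything that folds back
    into the output band after keeping one sample in d."""
    first = grid // (2 * d)
    return max(response(taps, k / grid) for k in range(first, grid // 2 + 1))


D = 16
average_d = [1] * D          # plain average over D samples
average_2d = [1] * (2 * D)   # plain average over 2*D samples
```

The worst-case gain is about 0.64 for the D-sample average and about 0.22 for the 2*D-sample average. The remaining sidelobe of about 0.22 is the residual aliasing the slide mentions.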
Weighted Average: The CIC-2 Filter
[Plots: two panels, amplitude 0 to 1 over 0 to 64]
Filter performance can be further improved with a weighted average over 4*D samples. This filter is called a cascaded integrator-comb filter of order 2 (CIC-2). The CIC-1 filter is the moving average.
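The weighted average does not need multipliers. This sketch uses the standard CIC structure: two integrators at the input rate, decimation by D, then two combs with a differential delay of 2 at the output rate. It checks that the result equals a direct triangular weighted average over 4*D - 1 samples. The names are hypothetical, and the output is left unnormalized; its DC gain is (2D)^2, which is a plain shift when D is a power of two.

```python
def cic2_decimate(x, d):
    """CIC-2: integrate twice, keep one sample in d, comb twice (delay 2)."""
    s1 = s2 = 0
    y0 = []
    for n, v in enumerate(x):
        s1 += v              # integrator 1
        s2 += s1             # integrator 2
        if n % d == 0:
            y0.append(s2)    # decimation

    def comb(seq):           # c[k] = seq[k] - seq[k-2]
        return [v - (seq[k - 2] if k >= 2 else 0) for k, v in enumerate(seq)]

    return comb(comb(y0))


def weighted_average_decimate(x, d):
    """Direct form: triangular weights 1, 2, ..., 2d, ..., 2, 1 (4d - 1 taps)."""
    box = [1] * (2 * d)
    tri = [sum(box[i] * box[j - i] for i in range(len(box)) if 0 <= j - i < len(box))
           for j in range(4 * d - 1)]
    return [sum(w * x[n - j] for j, w in enumerate(tri) if n - j >= 0)
            for n in range(0, len(x), d)]
```

The integrators and combs need only adders and a few registers, which is why the CIC form suits an FPGA better than the direct form.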
Huffman Coding Ratios for 5 MHz to 1 MHz
The Huffman coding compression ratio improves as the decimation filter improves.
[Bar chart: Huffman coding compression ratio (0 to 12) for no deci, no filter, AV5, AV10 and CIC2_20; events R089_E104, R089_E175, R089_E174, R089_E178, R089_E179, R089_E110]
Dynamic Decimation (DD)
[Plot: waveform, values 400 to 540, over samples 0 to 2000]
Only small time intervals, i.e., regions of interest (ROI), must be sampled at a high rate. Most time intervals can be sampled at a lower rate without losing useful information.
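Here is a hypothetical sketch of dynamic decimation. Blocks of D samples that stay near the baseline are replaced by their average, while blocks containing a hit (a region of interest) are kept at the full rate. The block size, the threshold, the median baseline and the ROI test are all our assumptions. A plain average is used in the quiet regions, although the earlier slides show that better filters (AV, CIC-2) give better results.

```python
def dynamic_decimation(samples, d=8, baseline=None, threshold=5):
    """Return (index, value) pairs: full rate inside ROIs, 1/d elsewhere."""
    if baseline is None:
        baseline = sorted(samples)[len(samples) // 2]    # median as baseline
    out = []
    for start in range(0, len(samples), d):
        block = samples[start:start + d]
        if max(abs(v - baseline) for v in block) > threshold:
            out.extend((start + i, v) for i, v in enumerate(block))  # ROI
        else:
            out.append((start, sum(block) / len(block)))             # decimated
    return out
```

For a 400-sample trace with a single 10-sample pulse, only the two blocks touching the pulse stay at full rate, so 400 samples shrink to 64 values.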
A Mystery of Dynamic Decimation & Huffman Coding
Dynamic decimation reduces the number of samples by a factor of 10. Huffman coding reduces the number of bits from the raw data by a factor of 10. When the two are cascaded, the combination reduces the number of bits by a factor of 60.
[Figure: Dynamic Decimation and Huffman Coding blocks with data volumes N → N/10.6, N → N/10.7, and N → N/60 for the cascade]
Huffman Coding Ratios for Dynamic Decimation
The Huffman coding compression ratio improves as the filter in dynamic decimation improves.
[Bar chart: Huffman coding compression ratio (0 to 12) for no deci, no filter, AV16, AV32 and CIC2_64; events R089_E104, R089_E175, R089_E174, R089_E178, R089_E179, R089_E110]
Any Differences?
[Plots: TPC event display, raw vs. with dynamic decimation; axes 0 to 2000 and 0 to 700]
TDC Using FPGA Logic Chain Delay
This scheme uses current FPGA technology.
A low-cost chip family can be used (e.g., EP2C8T144C6, $31.68).
Fine TDC precision can be implemented in slow devices (e.g., 20 ps in a 400 MHz chip).
[Figure: input IN propagating along a logic-chain delay line latched by CLK]
Two Major Issues in a Free-Operating FPGA
[Plot: bin width (ps), 0 to 180, vs. bin, 0 to 64]
1. Bin widths differ from each other and vary with supply voltage and temperature.
2. Some bins are ultra-wide because the delay chain crosses LAB (logic array block) boundaries.
Digital Calibration Using the Twice-Recording Method
[Figure: input IN propagating along a longer delay line latched by CLK]
Use a longer delay line. Some signals may then be registered twice, at two consecutive clock edges.
N2-N1=(1/f)/t
where 1/f is the clock period and t is the average bin width.
The two measurements can be used:
- to calibrate the delay;
- to reduce digitization errors.
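The calibration arithmetic is easy to express in software. This sketch assumes the clock frequency f is known and takes N0, a hypothetical reference bin where the delay line starts, as the zero of the fine time. The function names are illustrative.

```python
def average_bin_width_ps(n1, n2, f_hz):
    """From N2 - N1 = (1/f)/t: t = (1/f)/(N2 - N1), in ps."""
    return (1e12 / f_hz) / (n2 - n1)


def corrected_time_ps(n0, n1, n2, f_hz):
    """Fine time in ps: bins travelled (N1 - N0) times the average bin width."""
    return (n1 - n0) * average_bin_width_ps(n1, n2, f_hz)
```

If the delay line slows down, for example at a lower supply voltage, N1 - N0 and N2 - N1 shrink by the same factor, so the corrected time does not change. At 400 MHz, 10 bins with N2 - N1 = 50 and 8 bins with N2 - N1 = 40 both give 500 ps.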
Digital Calibration Result
[Plots: “TDC Output at Different PS Voltage”, TDC outputs N1 and N2 (0 to 25) vs. VCCINT (1.5 V to 2.5 V); the second panel adds the corrected time Tc]
When the power supply voltage changes from 2.5 V to 1.8 V (about the same effect as a temperature change from 100 °C to 0 °C), the delay speed changes by 30%.
The difference between the two TDC numbers reflects the delay speed.
Corrected time: $T_c = T\,\frac{N_1 - N_0}{N_2 - N_1}$, with $T = 1/f$ the clock period and $N_0$ the reference bin.
Warning: the calibration is based on the average bin width, not on bin-by-bin widths.
Auto Calibration Using the Histogram Method
- It provides a bin-by-bin calibration at a given temperature.
- It is a turn-key solution (bin in, ps out).
- It is semi-continuous (the LUT is updated automatically every 16K events).
[Plots: bin time (ps), 0 to 2500, vs. bin, 0 to 64; bin width (ps), 0 to 180, vs. bin]
[Diagram: DNL histogram of 16K events → LUT; in: bin, out: ps]
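The histogram (code-density) method can be sketched as follows. It assumes that hit times are uniformly distributed over one clock period, so each bin collects hits in proportion to its width. The LUT then maps each bin to the time of its centre in ps. `build_lut` and `simulate_histogram` are hypothetical names, and the bin widths in the example are made up.

```python
import bisect
import random


def build_lut(histogram, period_ps):
    """Map each TDC bin to the time of its centre, in ps."""
    total = sum(histogram)
    lut, edge = [], 0.0
    for count in histogram:
        width = count * period_ps / total   # bin width is proportional to hits
        lut.append(edge + width / 2)        # centre of this bin
        edge += width                       # start of the next bin
    return lut


def simulate_histogram(widths_ps, n_hits, seed=0):
    """Hypothetical test input: hits uniform over one clock period."""
    edges = [sum(widths_ps[:i + 1]) for i in range(len(widths_ps))]
    rng = random.Random(seed)
    hist = [0] * len(widths_ps)
    for _ in range(n_hits):
        t = rng.uniform(0, edges[-1])
        hist[min(bisect.bisect_right(edges, t), len(hist) - 1)] += 1
    return hist
```

With 16K simulated hits on bins 40, 60, 20 and 80 ps wide, the LUT lands within a few ps of the true bin centres (20, 70, 110 and 160 ps).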
Good, However
Auto calibration solved some problems. However, it won’t eliminate the ultra-wide bins.
[Plot: bin width (ps), 0 to 180, vs. bin, 0 to 64]
Cell Delay-Based TDC + Wave Union Launcher
[Figure: a wave union launcher between the input In and the delay line latched by CLK]
The wave union launcher creates multiple logic transitions after receiving an input logic step.
Wave union launchers can be classified into two types:
- Finite Step Response (FSR)
- Infinite Step Response (ISR)
This is similar to the classification of filters and other linear systems:
- Finite Impulse Response (FIR)
- Infinite Impulse Response (IIR)
Wave Union Launcher A (FSR Type)
[Figure: Wave Union Launcher A between In and the delay line latched by CLK; control input 1: unleash, 0: hold]
Wave Union Launcher A: 2 Measurements/Hit
[Figure: launcher unleashed (1: unleash)]
Sub-dividing Ultra-wide Bins
[Figure: launcher unleashed (1: unleash), with the two transitions labeled 1 and 2]
Device: EP2C8T144C6
Plain TDC: max. bin width 160 ps, average bin width 60 ps.
Wave Union TDC A: max. bin width 65 ps, average bin width 30 ps.
[Plot: bin width (ps), 0 to 180, vs. bin, 0 to 128, for the plain TDC and Wave Union TDC A]
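Why two transitions sub-divide an ultra-wide bin can be seen with a hypothetical model. The launcher sends two edges separated by a fixed delay delta, and the TDC code is the sum of the two bin numbers. The code then changes whenever either edge crosses a bin boundary. The effective bins are therefore bounded by the original edges together with the same edges shifted by -delta. The bin widths and delta below are made-up numbers, and the names are illustrative.

```python
from itertools import accumulate


def effective_widths(widths, delta):
    """Bin widths seen by the sum of two bin numbers, for edges delta apart."""
    edges = [0] + list(accumulate(widths))
    span = edges[-1] - delta                 # hit times where both edges fit
    cuts = {0, span}
    cuts.update(e for e in edges if 0 < e < span)                 # first edge
    cuts.update(e - delta for e in edges if 0 < e - delta < span)  # second edge
    cuts = sorted(cuts)
    return [b - a for a, b in zip(cuts, cuts[1:])]


plain = [50, 50, 160, 50, 50, 60, 50, 50, 60, 50]   # made-up widths, ps
```

With delta = 95 ps, the 160 ps bin is split into pieces of 65, 50 and 45 ps, and the widest effective bin becomes 65 ps.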
Measurement Result for Wave Union TDC A
[Diagram labels: histogram, raw, TDC + LUT, 53 MHz, separate crystal, wave union histogram]
Plain TDC: Δt RMS width 40 ps; 25 ps for a single hit.
Wave Union TDC A: Δt RMS width 25 ps; 17 ps for a single hit.
[Plot: counts (0 to 3500) vs. dt (ps), 1000 to 1500, for uncalibrated, plain TDC and Wave Union TDC A]
More Measurements
Two measurements are better than one. Let’s try 16 measurements?
Wave Union Launcher B (ISR Type)
[Figure: Wave Union Launcher B between In and the delay line latched by CLK; control input 1: oscillate, 0: hold]
Wave Union Launcher B: 16 Measurements/Hit
[Figure: 1 hit → 16 measurements at 400 MHz; traces at VCCINT = 1.20 V and VCCINT = 1.18 V]
Delay Correction
[Plots: T0 (ps), 0 to 3000, vs. m, 0 to 16; TDC (bin), 16 to 64, vs. m, 0 to 16]
Delay correction process:
Raw hits TN(m), in bins, are first calibrated into TM(m), in picoseconds.
Jumps are compensated for in the FPGA so that TM(m) become T0(m), which have the same value for each hit.
The average of T0(m) is taken to get better resolution:
$T_{av} = \frac{1}{16}\sum_{m=0}^{15} T_0(m)$
The raw data contain:
U-type jumps: [48-63] to [16-31]
V-type jumps: other small jumps
W-type jumps: [16-31] to [48-63]
All of these processes are done in the FPGA.
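This toy model shows why averaging the 16 corrected measurements helps. All of its numbers are made up: uniform 60 ps bins, transitions spaced by tau = 137 ps, and offsets m*tau that are known exactly, as they would be after the jump correction. Because the 16 transitions land at different positions inside a bin, their quantization errors partly cancel. The names are hypothetical.

```python
import math


def quantize(t, w):
    """TDC with uniform bins of width w, reporting the bin centre."""
    return (math.floor(t / w) + 0.5) * w


def rms_errors(w=60.0, tau=137.0, n_meas=16, t_step=1.0, n_hits=6000):
    """RMS error of one measurement vs. the average of n_meas measurements."""
    err1 = err16 = 0.0
    for i in range(n_hits):
        t = i * t_step
        single = quantize(t, w) - t
        avg = sum(quantize(t + m * tau, w) - m * tau for m in range(n_meas)) / n_meas
        err1 += single ** 2
        err16 += (avg - t) ** 2
    return math.sqrt(err1 / n_hits), math.sqrt(err16 / n_hits)
```

With these numbers the single-measurement RMS error is about 17.3 ps (60/√12), and the 16-measurement average is several times smaller.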
The Test Module
[Photo labels: two NIM inputs; FPGA with 8-channel TDC; data output via Ethernet; BNC adapter to add delay in 150 ps steps]
Test Result
[Setup: NIM inputs → LeCroy 429A NIM fan-out → NIM/LVDS converters → eight Wave Union TDC B channels; BNC adapters add delays in 140 ps steps]
[Plot: delay steps 0, 1, 2; RMS 10 ps]
Multi-Sampling TDC
[Figure: input sampled by four clock phases c0, c90, c180, c270 (multiple sampling), clock domain changing to c0, transition detection & encoding, and a coarse time counter; outputs DV, T0, T1, TS; floorplan of 4 channels]
Ultra-low cost: 48 channels in a $18.27 EP2C5Q208C7.
Sampling rate: 360 MHz × 4 phases = 1.44 GHz.
LSB = 0.69 ns.
Logic elements with non-critical timing are freely placed by the fitter of the compiler.
This picture represents a placement in a Cyclone FPGA.
Issues of Coarse Time Counter
There are some common misunderstandings about coarse time counters in a TDC:
- Two coarse time counters are needed, driven by clocks with a 180-degree phase difference.
- The coarse time counter should be a Gray code counter.
Dual counters and/or Gray code counters are only needed in one ASIC TDC architecture.
In the architectures used by FPGA TDCs and some ASIC TDCs, only one plain binary counter is needed as the coarse time counter.
[Figure: dual coarse time counters, a single coarse time counter, and a Gray code counter with the 3-bit sequence 000, 001, 011, 010, 110, 111, 101, 100]
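For reference, the 3-bit Gray sequence in the figure is exactly what the standard mapping g = n XOR (n >> 1) produces: only one bit changes per count. The slide's point is that FPGA TDC architectures do not need this, and a plain binary counter suffices. The function names are illustrative.

```python
def binary_to_gray(n):
    """Standard reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse mapping: XOR of all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n
```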
Delay Line Based TDC Architectures
[Figure: four HIT/CLK delay-line arrangements labeled Delay Hit, Delay CLK and Delay Both; annotations: “CLK is used as clock” and “HIT is used as clock”]
Only one of these architectures needs dual coarse time counters.
Implementation of the Coarse Time Counter
[Figure: coarse time counter, fine time encoder and hit detect logic; inputs In, CLK, ENA; outputs fine time, coarse time, data ready]
Classical Picture of Serial Communications
The parallel data is converted to serial bits driven by crystal oscillator X1 in the transmitter device.
The serial data stream is used to generate a recovered clock at the receiver device with a phase lock loop (PLL).
The recovered clock is used to drive the serial-to-parallel converter and store the data into a first-in-first-out (FIFO) buffer.
The FIFO buffer is used to transfer data from the recovered clock domain to the local clock domain generated by crystal oscillator X2.
[Figure: parallel-to-serial converter (crystal X1) → serial-to-parallel converter with PLL recovered clock → FIFO → local logic (crystal X2)]
Serial Data Receiving Without PLL etc.
Generating a recovered clock with a PLL, VCO, VCXO, etc. is an analog process and is not convenient in an FPGA, especially for applications with multiple receiving channels.
There are purely digital methods to receive serial data:
- Digital Phase Follower: 1 bit/CLK
- The Two-Cycle Serial IO: 1 bit/(2 CLK)
- FM Encoder and Decoder: 1 bit/(2-16 CLK)
- Clock-Command Combined Carrier Coding (C5): 4 bits/(20 CLK)
The transmitter and receiver can be driven by two independent free-running crystal oscillators.
[Figure: parallel-to-serial converter (X1) → digital serial-to-parallel converter (X2) → local logic]
Digital Phase Follower
[Figure: input In sampled by clock phases c0, c90, c180, c270 (multiple sampling), clock domain changing, transition detection (Q0-Q3, QD-QF), SEL logic (was3is0, was0is3), tri-speed shift register (Shift0, Shift2; bits b0, b1), frame detection, data out]
The input data rate is 1 bit per clock cycle. Four clock phases, c0, c90, c180 and c270, are used to detect the input transition edge. The phase used for data sampling follows the variation of the transition edge.
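Here is a simplified model of the phase-follower idea. It assumes an ideal input with a fixed but unknown phase offset and no frequency drift. The real circuit's tri-speed shift register handles bits gained or lost when the selected phase wraps (was3is0 / was0is3); that part is not modelled here. `make_wave` and `phase_follower` are hypothetical names.

```python
def make_wave(bits, offset):
    """4x-oversampled view of a 1 bit/clock stream; `offset` (0..3) is the
    unknown phase of the bit edges relative to clock phase c0."""
    return [bits[(k + offset) // 4] for k in range(4 * len(bits) - offset)]


def phase_follower(wave):
    """Pick one of the four phase samples per clock, away from the edge."""
    sel = 2                             # sampling phase before any edge is seen
    prev = wave[0]
    out = []
    for c in range(len(wave) // 4):     # one iteration per clock cycle
        for p in range(4):              # samples from c0, c90, c180, c270
            k = 4 * c + p
            if wave[k] != prev:         # transition seen at phase p
                sel = (p + 2) % 4       # sample half a bit away from the edge
            prev = wave[k]
        out.append(wave[4 * c + sel])
    return out
```

For every offset, the recovered bits are the transmitted bits, possibly shifted by one position. Frame detection removes that shift.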
Schematics of the Digital Phase Follower
[Schematic: phase tracker phtrk1 (inputs IN1, CLK0, CLK90, CLK180, CLK270; outputs EE[3..0], QQ[11..0], BT, JMP, WTN), bit selector DS5B (BB, BX, JMP, EN → Q[4..0], C1, C0) and word decoder Word24_13z (outputs QQ[23..0], DV, S[1..0], ERR)]
CLK: 375 MHz; data rate: 375 Mbits/s
The Two-Cycle Serial IO
This scheme is slower than the digital phase follower, but the logic is simpler. CLK1 and CLK2 can be generated by two free-running crystal oscillators.
[Timing diagram: transmitter (CLK1, Data Out) sends start bit = 1, then b15, b14, ...; receiver (CLK2, Data In) receives the same sequence]
One data bit is transmitted every 2 clock cycles.
A logic transition is detected between the two marked falling edges.
The input data are stable at the marked clock edges.
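Below is a behavioural sketch of the receiver side. It rests on these assumptions:
- Time is measured in TX clock periods, and the idle line level is 0.
- A frame is a start bit of 1 followed by 16 data bits, as in the b15, b14, ... diagram.
- The RX clock period differs from the TX clock period by a small ratio, as it would with two independent crystals.

The receiver hunts for the start bit, then takes one sample every 2 RX cycles. The falling-edge details of the real circuit are ignored, and the names are hypothetical.

```python
def tx_level(t, t0, frame):
    """Line level at time t (TX clock periods); each bit lasts 2 periods."""
    if t < t0:
        return 0                                 # idle before the frame
    i = int((t - t0) // 2)
    return frame[i] if i < len(frame) else 0


def rx_two_cycle(t0, frame, ratio, nbits=16):
    """ratio = RX clock period / TX clock period (independent crystals)."""
    n = 0
    while tx_level(n * ratio, t0, frame) == 0:   # hunt for start bit = 1
        n += 1
    # start bit seen at RX cycle n: data bit j is sampled 2*j RX cycles later
    return [tx_level((n + 2 * j) * ratio, t0, frame) for j in range(1, nbits + 1)]
```

Frames are short, so even a ±100 ppm mismatch between the crystals moves the last sampling point by only a few thousandths of a clock cycle.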
Schematics of the Two-Cycle Serial IO
[Schematic: inputs CK200, DD[15..0], DRDY, SDIN, DV, CK100; outputs QQ[15..0], SDOUT, POPCMD, QQOK; modulus-36 counter, 6-bit sequence counter SEQ[5..0], 17-bit load/shift register (transmit) and 16-bit shift register (receive)]
CLK: 200 MHz; data rate: 100 Mbits/s
The FM coding
A bit is transmitted in two unit time intervals, usually two internal clock cycles at frequency f.
For bit = 1, the output toggles every cycle, i.e., at frequency f/2. For bit = 0, the output toggles every two cycles, i.e., at frequency f/4.
When no data are being transmitted, the output toggles at frequency f/4 (a stream of 0s) until the start bit is sent. The data stream is naturally DC balanced and therefore suitable for AC-coupled transmission. Because the information is carried by the spacing of the toggles rather than by the levels, the polarity of the interconnection doesn't matter.
[Waveform example: idle (0), start bit = 1, then data bits 0 0 1 1]
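The FM rules above can be modeled at the level of unit intervals. This is a minimal sketch, not the author's decoder. The encoder toggles at the start of every bit and adds a mid-bit toggle for a 1. The decoder looks only at the spacing between toggles, so inverting the line does not change the result. The `oversample` parameter (receiver clock cycles per unit interval) and all function names are assumptions for illustration. A value of 4 gives the 8 clock cycles per bit used by the FM decoder described below.

```python
def fm_encode(bits, idle_bits=3):
    """FM-encode a frame: idle 0s, start bit = 1, data, idle 0s.
    Returns one line level per unit interval (two per bit)."""
    level, out = 0, []
    for b in [0] * idle_bits + [1] + list(bits) + [0] * idle_bits:
        level ^= 1            # every bit begins with a toggle
        out.append(level)
        if b:
            level ^= 1        # a 1 adds a mid-bit toggle (f/2); a 0 does not (f/4)
        out.append(level)
    return out


def run_lengths(samples):
    """Lengths of the runs of constant level between toggles."""
    runs, n = [], 1
    for prev, cur in zip(samples, samples[1:]):
        if cur == prev:
            n += 1
        else:
            runs.append(n)
            n = 1
    runs.append(n)
    return runs


def fm_decode(samples, nbits, oversample=1):
    """Decode by toggle spacing only, so the line polarity does not matter."""
    short = [r < 1.5 * oversample for r in run_lengths(samples)]
    i = 0
    while i < len(short) and not short[i]:   # skip the idle 0s
        i += 1
    i += 2                                   # the start bit is two short runs
    bits = []
    while len(bits) < nbits and i < len(short):
        if short[i]:          # two half-bit runs -> 1
            bits.append(1)
            i += 2
        else:                 # one full-bit run -> 0
            bits.append(0)
            i += 1
    return bits
```

For the frame in the waveform example, `fm_decode(fm_encode([0, 0, 1, 1]), 4)` recovers `[0, 0, 1, 1]`. It also recovers the same bits from the inverted stream.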
Schematics of FM Decoder
[Schematic: INA is registered with CK212 and XORed to produce a toggle signal INATOG. A toggle counter TOGCNT[3..0] and two 3-to-8 decoders (lpm_decode) follow. A 9-bit counter {CNTSHFT, BitCNT[4..0], BTK[2..0]} (lpm_counter, sset 360) sequences the bits into the output register DQ[17..0], PQ. Inputs: CK212, INA. Outputs: DV, DQ[17..0], PQ.]
Logic 0 on INA: a 13.25 MHz square wave, i.e., one toggle every 8 CK212 cycles.
BitCNT runs from 13 to 31. The counter is initialized to 13x8 + 256 = 360 (CNTSHFT = 1, BitCNT = 13, BTK = 0).
CLK: 212 MHz, Data Rate: 26.5 Mbits/s. The ratio of 8 clock cycles per bit in this design is not an intrinsic limit.
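The numbers on this slide are consistent with one another, as this short arithmetic check shows. It assumes the 9-bit counter is ordered CNTSHFT (MSB), then BitCNT[4..0], then BTK[2..0] (LSB), as the bus label "CNTSHFT,BitCNT[4..0],BTK[2..0]" suggests. It also assumes the counter runs up to 511 (BitCNT = 31, BTK = 7). `split_counter` is a hypothetical helper.

```python
# Clock arithmetic of the FM decoder (values from the slide).
CK212_MHZ = 212.0
CYCLES_PER_BIT = 8

data_rate_mbps = CK212_MHZ / CYCLES_PER_BIT   # 26.5 Mbit/s
logic0_mhz = data_rate_mbps / 2               # a 0 toggles once per bit -> 13.25 MHz square wave


def split_counter(q):
    """Split the 9-bit counter q[8..0] into (CNTSHFT, BitCNT[4..0], BTK[2..0])."""
    return (q >> 8) & 0x1, (q >> 3) & 0x1F, q & 0x7


INIT = 256 + 13 * CYCLES_PER_BIT              # 360: CNTSHFT = 1, BitCNT = 13, BTK = 0
FINAL = 0x1FF                                 # 511: CNTSHFT = 1, BitCNT = 31, BTK = 7
bit_slots = (FINAL - INIT + 1) // CYCLES_PER_BIT   # BitCNT 13..31 -> 19 bit slots of 8 cycles
```

BTK counts the 8 clock cycles within a bit, so each increment of BitCNT corresponds to one received bit.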
The Clock-Command Combined Carrier Coding (C5)
A data train contains 5 pulses, and each pulse is transmitted in four unit time intervals, usually four internal clock cycles at frequency f.
Information is carried by wide, normal and narrow pulses; the first pulse of a train is always wide or narrow.
When no data are being transmitted, all pulses have normal width. The data stream is DC balanced over every 5 pulses and is therefore suitable for AC-coupled transmission. All leading edges are evenly spaced, so the pulse train can be used directly to drive receiver-side logic or a PLL.
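The C5 properties above can be checked with a small waveform model. This is a minimal sketch and makes two assumptions. First, pulses are high for 1, 2 or 3 of the 4 unit intervals (narrow, normal, wide). Second, T38 and T58 in the decoder schematic denote sampling points at 3/8 and 5/8 of the pulse period. The command-bit mapping of the real Composer block is not reproduced. Under these widths, "DC balanced over 5 pulses" means the total high time is 10 of 20 units, which holds exactly when a train has as many wide pulses as narrow ones. `c5_waveform`, `classify` and `is_valid_train` are hypothetical names.

```python
WIDTH_UNITS = {"narrow": 1, "normal": 2, "wide": 3}   # high time out of 4 units (assumed)


def c5_waveform(widths, units=4, oversample=8):
    """Samples of a pulse train: every pulse rises at the start of its slot
    (evenly spaced leading edges) and stays high for `w` units."""
    samples = []
    for w in widths:
        high = w * oversample
        samples += [1] * high + [0] * (units * oversample - high)
    return samples


def classify(samples, npulses, units=4, oversample=8):
    """Sample each slot at 3/8 (T38) and 5/8 (T58) of the pulse period."""
    slot = units * oversample
    kinds = []
    for k in range(npulses):
        t38 = samples[k * slot + (3 * slot) // 8]
        t58 = samples[k * slot + (5 * slot) // 8]
        if not t38:
            kinds.append("narrow")   # already low at 3/8
        elif not t58:
            kinds.append("normal")   # high at 3/8, low at 5/8
        else:
            kinds.append("wide")     # still high at 5/8
    return kinds


def is_valid_train(kinds):
    """Five pulses, first one wide or narrow, DC balanced (#wide == #narrow)."""
    return (len(kinds) == 5 and kinds[0] != "normal"
            and kinds.count("wide") == kinds.count("narrow"))
```

An idle train of five normal pulses decodes correctly but is rejected by `is_valid_train`, because its first pulse is normal.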
Schematics of C5 Decoder
[Schematic: a Cyclone PLL (altpll, inclk0 period 36.000 ns, c0 = 4 x inclk0) derives the internal clock C40 from the incoming carrier CC. A Delay block (inputs CC, C40) produces the sampling signals T38 and T58. A Composer block (inputs CC, T38, T58) is built from DFFs, NAND/AND gates, a modulus-5 counter and a 4-bit register, and outputs Y[0..4], CmdBit[3..0] and CmdValid.]
Data Rate: 36 ns/bit, or about 27.8 Mbits/s
Internal clock: 111 MHz (4 x 27.8 MHz)
Fixed Latency Everywhere?
In a classical trigger system, all cables must have fixed propagation delays.
Serial links intrinsically do not have a fixed latency. Do we need fixed latency at all? No.
[Diagram: front-end modules cabled directly to the trigger (classical), versus front ends linked to the trigger through serializers (SER) and deserializers (DESER), relative to a timing reference.]
Hit Time Coding and Transmitting
Hits in each channel are coded as bits, each bit representing a small time interval.
Bit patterns are merged in a front-end module.
[Diagram: on the Detector Processing Board, each 40 ns clock period is divided into 5 ns intervals, so hits are coded as set bits in 8-bit patterns such as 0 1 0 0 0 0 0 1. The board also receives CLK&CMD.]
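The hit coding and merging can be sketched in a few lines. This minimal sketch uses the 5 ns intervals and 40 ns clock from the diagram, giving 8 bits per word. Mapping bit 0 to the earliest interval is an assumption, as are the names `code_hits` and `merge`.

```python
BIN_NS = 5.0                               # time interval represented by one bit
CLOCK_NS = 40.0                            # one word per 40 ns clock
BITS_PER_WORD = int(CLOCK_NS / BIN_NS)     # 8


def code_hits(hit_times_ns, n_clocks):
    """Code one channel: bit i of word k is set if a hit falls in
    [k*40 + 5*i, k*40 + 5*(i+1)) ns. Bit 0 = earliest interval (assumed)."""
    words = [0] * n_clocks
    for t in hit_times_ns:
        clk, offset = divmod(t, CLOCK_NS)
        if 0 <= clk < n_clocks:
            words[int(clk)] |= 1 << int(offset // BIN_NS)
    return words


def merge(*channels):
    """Merge channels clock by clock with a bitwise OR, as in the front-end module."""
    merged = [0] * len(channels[0])
    for words in channels:
        for k, w in enumerate(words):
            merged[k] |= w
    return merged
```

Because merging is a plain OR, the merged stream keeps the fine time of every hit but no longer identifies the channel. This is enough for a coincidence trigger.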
Cable Delay Self Timing
At system initialization, all Detector Processing Boards send out a special word, the start mark, in the same clock cycle. At the receiving end, the absolute arrival time from each board may be unknown and different. However, each start mark is recognized and stored at address 0 of the corresponding receiving buffer, and the words after the start mark are stored in sequence. As a result, words sent in the same clock cycle occupy the same address in every buffer, regardless of cable delay.
[Diagram: three Detector Processing Boards connected by serial links to a Processing Support Board.]
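The self-timing scheme can be sketched as follows. This is a minimal sketch that assumes a hypothetical idle word before the start mark and a hypothetical start-mark value that never occurs among the idle words. In a real link the start mark would be a dedicated special word. The cable delay of each board appears only as a different number of idle words, and storing from the start mark onward removes it.

```python
IDLE = 0x00          # hypothetical idle word sent before the start mark
START_MARK = 0xA5    # hypothetical start-mark value


def fill_buffer(received, start_mark=START_MARK):
    """Store the start mark at address 0 and the following words in sequence;
    whatever arrived before it (idle words during the cable delay) is dropped."""
    return received[received.index(start_mark):]


def align(streams, start_mark=START_MARK):
    """One receiving buffer per board: address n then holds the word every board
    sent n clock cycles after the start mark, whatever the cable delays were."""
    return [fill_buffer(s, start_mark) for s in streams]
```

Usage: if three boards send `[START_MARK, 1, 2, 3]`, `[START_MARK, 4, 5, 6]` and `[START_MARK, 7, 8, 9]` through cables that delay them by 3, 0 and 7 clocks, address 2 of the aligned buffers holds 2, 5 and 8. These are the words sent in the same clock cycle.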
An Example
[Figure: received data streams from different boards, each beginning with its Initial Marker followed by Data.]
Hit Merging and Coincidence
Hits from different inputs in the Processing Support Board are merged together with an OR function and sent out as a serial data stream.
The Coincidence Module realigns the different streams in its receiving buffers. Inside the Coincidence Module, coincidences are searched for as AND functions of the hit streams from opposite detector sectors. Most likely, boundary-coverage logic is also applied, e.g.: Trigger T[N] = HA[N] && (HB[N] || HC[N]).
Boundary coverage in the time domain is also necessary, since a hit near the edge of a time interval may be coded in the neighboring bit. This is satisfied by checking adjacent bits in the buffered words, e.g.: Trigger T[N] = (HA[N+1] || HA[N] || HA[N-1]) && (HB[N] || HC[N]).
[Diagram: Processing Support Boards sending merged hit streams to the Coincidence Module.]
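The two trigger expressions above map directly onto shifts and bitwise operations. This is a minimal sketch in which each realigned hit stream (HA, HB, HC) is held as one integer whose bit N is time bin N. Concatenating the buffered words this way, so that the adjacent-bit check also works across word boundaries, is an assumption of the sketch. `coincidence` is a hypothetical name.

```python
def coincidence(ha, hb, hc, nbits, time_coverage=True):
    """T[N] = HA[N] && (HB[N] || HC[N]). With time_coverage, HA[N] is replaced
    by HA[N+1] || HA[N] || HA[N-1]. Bit N of each integer is time bin N."""
    mask = (1 << nbits) - 1
    if time_coverage:
        # bit N of (ha >> 1) is HA[N+1]; bit N of (ha << 1) is HA[N-1]
        ha = (ha | (ha >> 1) | (ha << 1)) & mask
    return ha & (hb | hc) & mask
```

A hit in bin 4 of HA and bin 5 of HB fires T[5] only when time coverage is enabled. Hits two bins apart never coincide.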
Post-Scripts
Some Extra Words for the Young & Old
About FPGA: Myths & Thinking
We commonly hear about FPGAs:
FPGA is cheap. FPGA is fast. FPGA is large. FPGA can do anything.
Not really. At least, it is not always the case:
Cheap? FPGA: $16-$1500; microprocessor: $100-$500.
Fast? FPGA: 500 MHz; microprocessor: 1-3 GHz.
Large? FPGA logic consumes more transistors.
Can do anything? Only if the information is collected in the FPGA.
Good design tricks are needed to take full advantage of FPGA devices and to avoid their drawbacks.
Moore’s Law
Number of transistors in a package: x2 every 18 months
(Figure taken from www.intel.com)
Status of Moore’s Law: an Inconvenient Truth
Number of transistors: yes, via multi-core.
Clock speed: ?
(Figure taken from www.intel.com)
Complexity in FPGA Designs
Excessive Complexity in FPGA Designs = Fevers of Moore’s Law + Myths + No Thinking
Complexity causes higher FPGA cost. Complexity creates indirect costs such as PCB layout, assembly, power consumption, cooling, etc.
Complexity confuses people, including designers.
Indirect Cost of Complexity
If something like this can do the job…
… why do these?
The Winning Line of FPGA Design
We commonly hear: FPGA devices contain millions of gates. High parallelism can be implemented in FPGAs. FPGA cost drops by half every 18 months.
We want to emphasize, especially to our young students:
1. Creativity,
2. Creativity,
3. Creativity, on Arithmetic ops, on Algorithms, on Architectures & on All Aspects.
O friends, not these tones! (Original German, from Beethoven's Ninth Symphony: "O Freunde, nicht diese Töne!")