![Page 1: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/1.jpg)
Parallel accelerator project
Final presentation, Summer 2008
Student: Vitaly Zakharenko
Supervisor: Inna Rivkin
Duration: semester
![Page 2: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/2.jpg)
System functionality: the big picture
◦ Multiple signal sources share the same medium.
◦ Each source produces a periodic pulse sequence in the medium.
◦ An observer of the medium senses the superposed pulse sequences with added noise.
◦ A preprocessor detects pulses in the signal and stores each pulse as a pulse TOA (time of arrival).
◦ The pulse TOA array produced by the preprocessor is conveyed to the system.
◦ The system separates the pulses into the original signals (i.e. into periodic pulse sequences).
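The input described above can be imitated with a short simulation. This is only a sketch: the function name `observe`, the `(first_toa, period)` source description and the `jitter` and `drop_prob` parameters are assumptions made for illustration and do not come from the project.

```python
import random

def observe(sources, duration, jitter=0.0, drop_prob=0.0, seed=0):
    """Merge periodic pulse trains into one sorted TOA list.

    sources   -- list of (first_toa, period) pairs (hypothetical description)
    jitter    -- maximum absolute timing noise added to each TOA
    drop_prob -- probability that the preprocessor misses a pulse
    """
    rng = random.Random(seed)
    toas = []
    for first_toa, period in sources:
        t = first_toa
        while t < duration:
            if rng.random() >= drop_prob:          # the pulse was detected
                toas.append(t + rng.uniform(-jitter, jitter))
            t += period
    return sorted(toas)                            # what the observer sees

# Two sources sharing the medium, no noise and no missing pulses.
print(observe([(0.0, 10.0), (3.0, 7.0)], duration=30.0))
```

The sorted list plays the role of the pulse TOA array handed to the system; setting `drop_prob` above zero imitates missing pulses.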
![Page 3: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/3.jpg)
[Figure: signal produced by source #1, signal produced by source #2, and the signal as seen by the observer (TOA1 ... TOA11). Data structure for signal representation: the TOA array, shown with the missing pulse effect (TOA1 ... TOA9). System output: pulses separated by source.]
![Page 4: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/4.jpg)
System components
Simulator: on a PC; constructs datagrams.
Datagram switch: on the FPGA; manages the flow of datagrams between the simulator and the processing units.
Data processing units: on the FPGA; each unit processes datagrams.
![Page 5: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/5.jpg)
Main system components
[Block diagram: the Simulator on the PC; the Switch and six Processing unit blocks on the FPGA.]
![Page 6: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/6.jpg)
Data processing units
Each unit contains a Nios II processor and C2H-generated H/W accelerators.
[Block diagram: Nios II embedded processor, sequence search C2H-generated accelerator and histogram builder C2H-generated accelerator, connected through the Avalon switch fabric.]
![Page 7: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/7.jpg)
Data processing algorithm
for {level} := 1 up to {maximum level} do
    1. Build histogram of differences (SDIF) of level := {level}.
    2. Add SDIF to cumulative histogram (CDIF).
    3. Find lowest periodicity column of CDIF above threshold.
    4. if {column found} = TRUE then
        4.1. Detect all pulse sequences of the periodicity.
        4.2. Mark pulses as associated.
       end if
    5. Check whether to break the loop.
end for
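The loop can be sketched in Python as follows. The slides do not give the threshold function, the matching tolerance or the break condition. A constant `threshold`, exact period matching, a minimum sequence length `min_len`, and clearing the CDIF after a hit are all assumptions of this sketch, as are the function names.

```python
from collections import Counter

def build_sdif(toas, free, level):
    """Histogram of TOA differences between unassociated pulses `level` apart."""
    idx = [i for i in range(len(toas)) if free[i]]
    return Counter(toas[idx[k + level]] - toas[idx[k]]
                   for k in range(len(idx) - level))

def find_sequence(toas, free, start, period, min_len):
    """Follow unassociated pulses spaced exactly `period` apart from `start`."""
    position = {t: i for i, t in enumerate(toas) if free[i]}
    chain, t = [], toas[start]
    while t in position:
        chain.append(position[t])
        t += period
    return chain if len(chain) >= min_len else []

def deinterleave(toas, max_level, threshold, min_len=3):
    """Return {period: [lists of pulse indices]}; TOAs must be distinct."""
    free = [True] * len(toas)            # False once a pulse is associated
    cdif = Counter()
    result = {}
    for level in range(1, max_level + 1):
        cdif.update(build_sdif(toas, free, level))            # steps 1-2
        candidates = [p for p, n in cdif.items() if n >= threshold]
        if candidates:                                        # steps 3-4
            period = min(candidates)     # lowest periodicity above threshold
            for i in range(len(toas)):
                if free[i]:
                    chain = find_sequence(toas, free, i, period, min_len)
                    for j in chain:
                        free[j] = False                       # step 4.2
                    if chain:
                        result.setdefault(period, []).append(chain)
            cdif.clear()                 # assumption: restart after a hit
        if sum(free) < min_len:                               # step 5
            break
    return result

# Three sources of period 11 whose pulses are offset by 0, 2 and 5.
toas = [0, 2, 5, 11, 13, 16, 22, 24, 27, 33, 35, 38, 44, 46, 49]
print(deinterleave(toas, max_level=5, threshold=10))
```

With these inputs no column reaches the threshold at levels 1 and 2; at level 3 the column at 11 does, and the search returns three sequences of five pulses each.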
![Page 8: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/8.jpg)
Data processing example
[Figure: source 1, source 2 and source 3 signals, and the observed signal, whose successive pulse intervals repeat as a, b, c, a, b, c, ...]
![Page 9: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/9.jpg)
Data processing example: cumulative histogram (CDIF) update
[Figure: observed signal with intervals a, b, c repeating. SDIF (level = 1) has columns at a, b and c, which are added to the CDIF.]
![Page 10: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/10.jpg)
Data processing example: threshold crossing check
[Figure: CDIF columns a, b and c compared with the threshold function.]
No periodicity candidate, so no sequence search.
![Page 11: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/11.jpg)
Data processing example: cumulative histogram (CDIF) update
[Figure: observed signal with intervals a, b, c repeating. SDIF (level = 2) has columns at a+b, b+c and c+a, which are added to the CDIF alongside the existing columns a, b and c.]
![Page 12: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/12.jpg)
Data processing example: threshold crossing check
[Figure: CDIF columns a, b, c, a+b, b+c and c+a compared with the threshold function.]
No periodicity candidate, so no sequence search.
![Page 13: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/13.jpg)
Data processing example: cumulative histogram (CDIF) update
[Figure: observed signal with intervals a, b, c repeating. SDIF (level = 3) has a single column at a+b+c, which is added to the CDIF alongside the columns a, b, c, a+b, b+c and c+a.]
![Page 14: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/14.jpg)
Data processing example: threshold crossing check
[Figure: CDIF columns a, b, c, a+b, b+c, c+a and a+b+c compared with the threshold function.]
Threshold satisfied by periodicity (a+b+c).
Search for all sequences of periodicity (a+b+c).
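The three histogram updates of this example can be reproduced numerically. The values a = 2, b = 3, c = 6 and the five repetitions are assumed for illustration; the slides give no numbers.

```python
from collections import Counter

def sdif(toas, level):
    """Histogram of differences between pulses `level` positions apart."""
    return Counter(toas[k + level] - toas[k] for k in range(len(toas) - level))

# Assumed intervals: a = 2, b = 3, c = 6, so every source has period 11.
a, b, c = 2, 3, 6
period = a + b + c
toas = sorted(n * period + offset for n in range(5) for offset in (0, a, a + b))

cdif = Counter()
for level in (1, 2, 3):
    cdif.update(sdif(toas, level))       # add this level's SDIF to the CDIF
    print(level, dict(sorted(cdif.items())))
```

After level 3 the column at a + b + c = 11 holds 12 counts while every other column holds at most 5, which is why only that periodicity can satisfy a threshold placed between the two.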
![Page 15: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/15.jpg)
Data processing example: sequence search results (final results)
[Figure: detected sequence #1, detected sequence #2 and detected sequence #3.]
![Page 16: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/16.jpg)
Input datagram format
[Figure: datagram layout with a 64-bit width marker: header fields ID, Control Bits and Len, followed by TOA 1, TOA 2, ..., TOA N.]
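A datagram of this general shape could be packed and unpacked as sketched below. The slide shows only the field names and a 64-bit width, so the header split (16-bit ID, 16-bit control bits, 32-bit Len), the 64-bit unsigned TOA words and the little-endian byte order are all assumptions, and the function names are hypothetical.

```python
import struct

# Assumed header: 16-bit ID, 16-bit control bits, 32-bit Len (one 64-bit word).
HEADER = struct.Struct("<HHI")

def pack_datagram(dgram_id, control, toas):
    """Serialize the header word followed by one 64-bit word per TOA."""
    header = HEADER.pack(dgram_id, control, len(toas))
    return header + struct.pack(f"<{len(toas)}Q", *toas)

def unpack_datagram(data):
    """Inverse of pack_datagram: return (id, control bits, list of TOAs)."""
    dgram_id, control, count = HEADER.unpack_from(data, 0)
    toas = list(struct.unpack_from(f"<{count}Q", data, HEADER.size))
    return dgram_id, control, toas

raw = pack_datagram(7, 0b0001, [100, 250, 400])
print(len(raw), unpack_datagram(raw))
```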
![Page 17: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/17.jpg)
Output datagram format
Field names, in order:
◦ Control fields set
◦ Length
◦ ID
◦ Total pulses associated
◦ Total sequences detected
◦ Association of pulse 1, Association of pulse 2, …, Association of pulse N
◦ Total pulses associated with sequence 1
◦ PRI (pulse repetition interval) of sequence 1
◦ Jitter of sequence 1
◦ Confidence level 1 of sequence 1
◦ Confidence level 3 of sequence 1
◦ PRI of sequence 2, …
Size (bytes) column, in extracted order: 2, 2, 4, 4, 4, 2, 4, 2, 1, 1, …, 1, 4, 4, 4, …
![Page 18: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/18.jpg)
Implementation for Nios II: testing and profiling
• In Visual Studio (VS), floating-point calculations were replaced by fixed-point calculations.
• The C code of the algorithm was ported from VS to the Nios IDE.
• The algorithm was profiled on Nios II.
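The floating-point to fixed-point replacement can be illustrated as follows. The slides do not state which fixed-point format was used; the Q16.16 layout and the helper names here are assumptions.

```python
FRAC_BITS = 16                 # assumed Q16.16: 16 integer and 16 fraction bits
ONE = 1 << FRAC_BITS

def to_fixed(x):
    """Convert a float to its nearest fixed-point integer representation."""
    return int(round(x * ONE))

def to_float(q):
    """Convert a fixed-point integer back to a float (for checking only)."""
    return q / ONE

def fx_mul(p, q):
    """Multiply two fixed-point values; the product carries 2 * FRAC_BITS."""
    return (p * q) >> FRAC_BITS

def fx_div(p, q):
    """Divide two fixed-point values, pre-scaling the dividend."""
    return (p << FRAC_BITS) // q

print(to_float(fx_mul(to_fixed(1.5), to_fixed(2.25))))
print(to_float(fx_div(to_fixed(7.0), to_fixed(2.0))))
```

Python integers do not overflow, so the sketch hides one detail that matters in C: the intermediate product in `fx_mul` and the shifted dividend in `fx_div` need a 64-bit type when the operands are 32 bits wide.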
![Page 19: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/19.jpg)
SoPC system generation
The H/W design was generated in the Altera SoPC Builder environment.
![Page 20: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/20.jpg)
SoPC system generation
Different SoPC system configurations were compared.
The SoPC system was optimized:
◦ multiple clock domains were provided for
◦ interconnect was minimized
◦ different processor types were compared
![Page 21: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/21.jpg)
C2H acceleration
C2H H/W accelerators were generated for two blocks of the algorithm:
◦ Sequence search function (FindSeqs)
◦ Histogram builder function (BuildHist)
![Page 22: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/22.jpg)
C2H accelerators: performance optimization
Sequence search (FindSeqs) function acceleration
◦ Accelerator results unsatisfactory
◦ Consumes a large amount of FPGA logic
◦ Low acceleration gain (×4 at most)
◦ Discarded after much effort was wasted on optimization
![Page 23: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/23.jpg)
C2H accelerators: performance optimization
Histogram builder (BuildHist) function acceleration
◦ Good acceleration results
◦ ×50 acceleration gain
◦ Moderate FPGA logic consumption
![Page 24: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/24.jpg)
Design performance: FPGA resources
6% logic consumption
5% memory consumption
![Page 25: Parallel accelerator project](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813300550346895d99bd20/html5/thumbnails/25.jpg)
Design performance: timing
1 up to 7 ms processing time
3 Nios systems significantly outperform a Pentium 4 processor