Richard Dorrance November 4, 2011
![Page 1: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/1.jpg)
High Speed 3D Tomography on CPU, GPU, and FPGA
Nicolas GAC, Stéphane Mancini, Michel Desvignes, Dominique Houzet
Reconfigurable MPSoC versus GPU: Performance, Power and Energy Evaluation
Diana Göhringer, Matthias Birk, Yves Dasse-Tiyo, Nicole Ruiter, Michael Hübner, Jürgen Becker
Richard Dorrance
November 4, 2011
Literature Review
![Page 2: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/2.jpg)
Review
Computed Tomography
![Page 3: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/3.jpg)
Tomography
Basis for CAT scan, MRI, PET, SPECT, etc.
Cross-sectional imaging technique using transmission or reflection data from multiple angles
Computed Tomography (CT): A form of tomographic reconstruction on computers
![Page 4: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/4.jpg)
Cross-Sections by X-Ray Projections
Project X-ray through biological tissue; measure total absorption of ray by tissue
Projection P_θ(t) is the Radon transform of object function f(x,y):
$$P_\theta(t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\delta(x\cos\theta + y\sin\theta - t)\,dx\,dy$$
Total set of projections called sinogram
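The projection integral above can be approximated on a pixel grid. The following is a minimal sketch, not taken from the slides: the function name `radon_sinogram`, the nearest-bin accumulation, and the detector count are all assumptions chosen for brevity. Each pixel adds its value to the detector bin nearest to t = x cos θ + y sin θ.

```python
import numpy as np

def radon_sinogram(image, angles_deg):
    """Crude discrete Radon transform of a square image (illustrative only).

    Returns an array of shape (len(angles_deg), n_detectors), one row per angle.
    """
    n = image.shape[0]  # assumes a square image
    # Enough unit-width detector bins to cover the image diagonal.
    n_detectors = int(np.ceil(np.sqrt(2) * n)) + 1
    centre = (n_detectors - 1) / 2.0
    # Pixel-centre coordinates with the origin at the image centre.
    coords = np.arange(n) - (n - 1) / 2.0
    x, y = np.meshgrid(coords, coords, indexing="xy")
    sino = np.zeros((len(angles_deg), n_detectors))
    for i, theta in enumerate(np.deg2rad(angles_deg)):
        # Signed distance of every pixel from the projection axis origin.
        t = x * np.cos(theta) + y * np.sin(theta)
        # Nearest detector bin for each pixel.
        bins = np.floor(t + centre + 0.5).astype(int)
        sino[i] = np.bincount(bins.ravel(), weights=image.ravel(),
                              minlength=n_detectors)
    return sino
```

Nearest-bin accumulation keeps the sketch short but gives a visibly jagged sinogram; a practical projector would interpolate between neighbouring bins.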
![Page 5: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/5.jpg)
Phantom and Sinogram
Shepp-Logan Phantom
![Page 6: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/6.jpg)
CT Reconstruction
Restore image from projection data
Inverse Radon transform
Most common algorithm is filtered backprojection
– “Smear” each projection over image plane
Accuracy of reconstruction depends on the number of detectors and projection angles
[Figure panels: Original, 4 Angles, 16 Angles, 64 Angles, 256 Angles]
![Page 7: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/7.jpg)
Note on Filtering
[Figure panels: No Filtering, With Filtering]
![Page 8: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/8.jpg)
FBP Algorithm
Input: sinogram sino(θ, N)
Output: image img(x,y)
for each θ
    filter sino(θ,*)
    for each x
        for each y
            n = x cos θ + y sin θ
            img(x,y) = sino(θ, n) + img(x,y)
O(N^3) algorithm
– But highly parallelizable, given sufficient memory bandwidth; not computationally intensive
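The loop nest above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the implementation from either reviewed paper: the names `ramp_filter` and `fbp` are hypothetical, the slide says only "filter" so the ramp (|frequency|) filter is an assumption, the detector lookup is nearest-neighbour, and the π/K scaling assumes K angles evenly spaced over [0°, 180°) with unit detector spacing. The two inner loops over x and y are vectorized, which reflects the slide's point that every pixel can be updated independently.

```python
import numpy as np

def ramp_filter(sino):
    """Filter each projection (row) with a ramp |frequency| response via FFT."""
    freqs = np.fft.fftfreq(sino.shape[1])  # cycles per detector sample
    spectrum = np.fft.fft(sino, axis=1) * np.abs(freqs)
    return np.real(np.fft.ifft(spectrum, axis=1))

def fbp(sino, angles_deg, size):
    """Filtered backprojection of sino (angles x detectors) onto a size x size grid."""
    n_det = sino.shape[1]
    centre = (n_det - 1) / 2.0
    # Pixel-centre coordinates with the origin at the image centre.
    coords = np.arange(size) - (size - 1) / 2.0
    x, y = np.meshgrid(coords, coords, indexing="xy")
    filtered = ramp_filter(sino)
    img = np.zeros((size, size))
    for proj, theta in zip(filtered, np.deg2rad(angles_deg)):
        # n = x cos(theta) + y sin(theta), evaluated for all pixels at once.
        t = x * np.cos(theta) + y * np.sin(theta)
        idx = np.clip(np.floor(t + centre + 0.5).astype(int), 0, n_det - 1)
        # "Smear" the filtered projection across the image plane.
        img += proj[idx]
    return img * np.pi / len(angles_deg)
```

Skipping the call to `ramp_filter` gives plain, unfiltered backprojection. Zero-padding the projections before the FFT and interpolating linearly between detector samples are common refinements left out here.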
![Page 9: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/9.jpg)
High Speed 3D Tomography on CPU, GPU, and FPGA
Nicolas GAC, Stéphane Mancini, Michel Desvignes, Dominique Houzet
![Page 10: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/10.jpg)
3PA-PET (Pipelined, Prefetch, Parallelized)
![Page 11: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/11.jpg)
Algorithms
![Page 12: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/12.jpg)
Hardware
CPU
– Desktop PC: Pentium 4 (3.2 GHz)
– Workstation: bi-Xeon Dual Core (3.0 GHz)
GPU
– Nvidia GeForce 8800 GTS (1.2 GHz, 96 Cores)
FPGA
– Virtex 4 (200 MHz)
ASIC
– Projected/Extrapolated (1.2 GHz)
![Page 13: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/13.jpg)
CPU vs. GPU vs. FPGA vs. ASIC
![Page 14: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/14.jpg)
w/ Proper Normalization
| Hardware | Algorithm | # of PE | [cycles/px] | [cycles/px*PE] |
|---|---|---|---|---|
| Pentium 4 | STIR | 1 | 34,505.21 | 34,505.21 |
| Pentium 4 | VBI-flt(v1) | 1 | 169,580.85 | 169,580.85 |
| Pentium 4 | VBI-flt(v2) | 1 | 53,943.45 | 53,943.45 |
| Pentium 4 | VBI-flt(v3) | 1 | 7,750.50 | 7,750.50 |
| Xeon (Dual Core) | STIR | 1 | 16,682.94 | 16,682.94 |
| Xeon (Dual Core) | VBI-flt(v3) | 1 | 3,400.53 | 3,400.53 |
| Xeon (Dual Core) | VBI-flt(v3) | 2 | 1,694.45 | 3,388.90 |
| Xeon (Dual Core) | VBI-flt(v3) | 4 | 854.49 | 3,417.97 |
| GPU | VBI-flt(v4) | 96 | 115.09 | 11,049.11 |
| GPU | VBI-flt(v5) | 96 | 58.13 | 5,580.36 |
| FPGA | VBI-fix | 1 | 484.41 | 484.41 |
| FPGA | VBI-fix | 4 | 149.97 | 599.89 |
| FPGA | VBI-fix | 8 | 101.92 | 815.35 |
| ASIC | VBI-fix | 1 | 580.12 | 580.12 |
| ASIC | VBI-fix | 4 | 248.79 | 995.16 |
| ASIC | VBI-fix | 8 | 156.95 | 1,255.58 |
| ASIC | VBI-fix | 40 | 31.39 | 1,255.58 |
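The last column of the table is the cycles-per-pixel figure multiplied by the number of processing elements (PE). A short sketch of that arithmetic follows, using the FPGA rows as input. The function names are hypothetical, and the parallel-efficiency ratio is an added illustration that does not appear on the slide.

```python
def pe_normalized(cycles_per_px, n_pe):
    """Cycles per pixel scaled by the PE count (the table's last column)."""
    return cycles_per_px * n_pe

def parallel_efficiency(single_pe_cycles_per_px, multi_pe_cycles_per_px, n_pe):
    """Fraction of ideal linear speedup achieved with n_pe elements (not on the slide)."""
    return single_pe_cycles_per_px / pe_normalized(multi_pe_cycles_per_px, n_pe)

# FPGA rows from the table: (number of PE, cycles per pixel).
fpga_rows = [(1, 484.41), (4, 149.97), (8, 101.92)]
for n_pe, cycles in fpga_rows:
    norm = pe_normalized(cycles, n_pe)
    eff = parallel_efficiency(fpga_rows[0][1], cycles, n_pe)
    print(f"FPGA, {n_pe} PE: {norm:.2f} cycles/px*PE, efficiency {eff:.2f}")
```

A normalized value that grows with the PE count, as in the FPGA and ASIC rows, indicates sub-linear scaling. The Xeon rows stay nearly flat at about 3,400, indicating near-linear scaling.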
![Page 15: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/15.jpg)
Reconfigurable MPSoC versus GPU: Performance, Power and Energy Evaluation
Diana Göhringer, Matthias Birk, Yves Dasse-Tiyo, Nicole Ruiter, Michael Hübner, Jürgen Becker
![Page 16: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/16.jpg)
RAMPSoC
Runtime adaptive multi-processor system-on-chip
– ROACH/iBOB-like system from a group out of Germany
![Page 17: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/17.jpg)
3D Ultrasound Computed Tomography
Mammography for early breast cancer detection
3D USCT works on the same principles as regular CT scans
![Page 18: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/18.jpg)
Hardware
CPU
– AMD Athlon 64 3200+ (2.2 GHz, 1 GB RAM)
GPU
– Nvidia Tesla C2050 (1.15 GHz, 448 Cores)
FPGA
– Xilinx Virtex-4FX100 (125 MHz)
![Page 19: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/19.jpg)
CPU vs. GPU vs. FPGA
| Hardware | # of PE | [cycles/img] | [cycles/img*PE] | Power [W] | Efficiency [1/J] |
|---|---|---|---|---|---|
| Athlon 64 | 1 | 330,000.00 | 330,000.00 | 177 | 37 |
| GPU | 448 | 3,714.50 | 1,664,096.00 | 270 | 1147 |
| FPGA | 8 | 18,000.00 | 144,000.00 | 3.61 | 1924 |
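The [1/J] column appears to be consistent with images per joule, that is, clock rate divided by cycles per image and then by power. This reading is an inference from the numbers, not something stated on the slide. The sketch below uses the clock rates from the Hardware slide, and the function name `images_per_joule` is hypothetical.

```python
def images_per_joule(cycles_per_image, clock_hz, power_w):
    """Energy efficiency as (images per second) / watts."""
    images_per_second = clock_hz / cycles_per_image
    return images_per_second / power_w

# (cycles per image, clock in Hz, power in W), taken from the slides.
platforms = {
    "Athlon 64": (330_000.0, 2.2e9, 177.0),
    "GPU": (3_714.5, 1.15e9, 270.0),
    "FPGA": (18_000.0, 125e6, 3.61),
}
for name, (cycles, clock, power) in platforms.items():
    print(f"{name}: {images_per_joule(cycles, clock, power):.1f} images/J")
```

Under this assumption each computed value lands within about one unit of the corresponding table entry (37, 1147, 1924).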
![Page 20: Richard Dorrance November 4, 2011](https://reader035.fdocuments.in/reader035/viewer/2022062301/568157ad550346895dc53a33/html5/thumbnails/20.jpg)
References
1. N. Gac, et al., “High Speed 3D Tomography on CPU, GPU, and FPGA,” EURASIP Journal on Embedded Systems, vol. 2008, Article ID 930250, 12 pages, 2008.
2. D. Göhringer, et al., “Reconfigurable MPSoC versus GPU: Performance, Power and Energy Evaluation,” INDIN '11, pp. 848-853, 26-29 July 2011.
3. A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging, IEEE Press, 1988.
4. J. Hsieh, Computerized Tomography: Principles, Design, Artifacts, and Recent Advancements, SPIE & Wiley, 2009.