Post on 02-Jan-2016
GPU Acceleration in Registration
Danny Ruijters, 26 April 2007
GPU Acceleration in Registration, Danny Ruijters 2
Outline
• The GPU
• Rigid 3D-3D Registration
• Elastic Registration
• Conclusions
The GPU
The graphics card
• Rasterization of primitives
• Texture mapping
• Colour interpolation
The GPU
• Graphics Processing Unit
• Programmable processor in the graphics rendering pipeline
• Parallel execution (SIMD-like)
Graphics rendering pipeline
[Figure: block diagram of the pipeline. Stages: CPU, vertex shading (T&L), triangle setup, rasterization, fragment shading and raster operations. Memory: system memory (geometry, commands), video memory (textures, frame buffer), on-chip cache memory (pre-TnL cache, post-TnL cache, texture cache).]
Bottlenecks
[Figure: the graphics rendering pipeline diagram (CPU, vertex shading (T&L), triangle setup, rasterization, fragment shading and raster operations, with system memory, video memory and on-chip caches), annotated with the possible bottlenecks: CPU limited, transfer limited, transform limited, setup limited, raster limited, fragment shader limited, texture limited, frame buffer limited.]
128 processing units
Local cache
Shared memory
Performance
Performance
• Parallelism & pipelining (up to 16 parallel pipelines)
• Vector processor
• Moore’s Law: CPU: 2× performance per 18 months
• GPU: 2× performance per 6 months
                         GeForce 7900 GTX        GeForce 8800 GTX
Code name                G71                     G80
Release date             3 / 2006                11 / 2006
Transistors              278 M (90 nm)           681 M (90 nm)
Clock speed              650 MHz                 1350 MHz
Processing units         24+8 (pixel + vertex)   128 (unified)
Peak pixel fill rate     10.4 Gigapixels/s       36.8 Gigapixels/s
Peak memory bandwidth    51.2 GB/s (256 bit)     86.4 GB/s (384 bit)
Memory                   512 MB                  768 MB
Peak performance         250 Gigaflops           520 Gigaflops
Textures & buffers
• 1D, 2D, 3D textures
• 2D output buffers (frame buffer, accumulation buffer, stencil buffer, p-buffer)
• 8, 10, 12, 16 bit integers, 16, 32 bit floating point
• 1 (intensity), 2 (luminance-alpha), 3 (RGB), 4 (RGBA) components per pixel
Historic overview GPU
• RenderMan (1988, pre-history)
• Intel MMX (SIMD, 1997, pre-history)
• Register combiners (nVidia, 1999, bronze age)
• Vendor-specific APIs (2001, iron age)
• Generic assembly-like language (2002, middle ages)
• Different high-level languages (2003, industrial age)
• CUDA: general purpose C-like language (2007, modern age)
Register combiners (1999, bronze age)
// Stage 0
// spare0.rgb = gradient dot ViewDir, spare1.rgb = -(gradient dot ViewDir)
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_A_NV, GL_TEXTURE0_ARB, GL_EXPAND_NORMAL_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_B_NV, GL_CONSTANT_COLOR1_NV, GL_EXPAND_NORMAL_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_C_NV, GL_TEXTURE0_ARB, GL_EXPAND_NEGATE_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_D_NV, GL_CONSTANT_COLOR1_NV, GL_EXPAND_NORMAL_NV, GL_RGB);
glCombinerOutputNV(GL_COMBINER0_NV, GL_RGB, GL_SPARE0_NV, GL_SPARE1_NV, GL_DISCARD_NV, GL_NONE, GL_NONE, GL_TRUE, GL_TRUE, GL_FALSE);
GL_ARB_fragment_program (2002)
!!ARBfp1.0
ATTRIB coord = fragment.texcoord[0];
ATTRIB color = fragment.color;
OUTPUT out = result.color;
TEMP texel;
TEMP lookup;
TEX texel, coord, texture[0], 3D;
TEX lookup, texel, texture[1], 1D;
MUL out, lookup, color;
END
GLSlang (2003)
uniform vec3 ViewDir;

void main (void)
{
    float value;
    vec3 gradient;
    gradient = texture3(0, gl_TexCoord0) * 2.0 - 1.0;
    value = 1.0 - abs(dot(gradient, ViewDir));
    value *= 1.3 * dot(gradient, gradient);
    value = clamp(value, 0.0, 1.0);
    gl_FragColor = vec4(value);
}
CUDA (2007)
• Compute Unified Device Architecture
• General purpose C-like language
• nVidia only
• Very recently released
Rigid 3D-3D Registration
3DRA – MR registration
3DRA – XperCT Registration 1
Pre-operative
3DRA – XperCT Registration 2
Post-operative: verification of the embolization
3DRA Slice
Mutual information
F. Maes et al., "Multimodality Image Registration by Maximization of Mutual Information," IEEE Transactions on Medical Imaging 16(2), pp. 187-198, April 1997
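As a rough illustration of the measure, the sketch below computes mutual information from a joint histogram of two images. It is a minimal CPU-side example, not the implementation behind these slides; the function name `mutual_information`, the list-of-rows histogram layout and the choice of base-2 logarithm are assumptions made for the example.

```python
import math


def mutual_information(joint_hist):
    """Mutual information (in bits) of a joint histogram given as a list of rows.

    joint_hist[i][j] counts voxel pairs with intensity bin i in the first
    image and intensity bin j in the second image (hypothetical layout).
    """
    total = sum(sum(row) for row in joint_hist)
    if total == 0:
        return 0.0
    # Marginal counts of the first image (rows) and second image (columns).
    row_sums = [sum(row) for row in joint_hist]
    col_sums = [sum(col) for col in zip(*joint_hist)]
    mi = 0.0
    for i, row in enumerate(joint_hist):
        for j, count in enumerate(row):
            if count > 0:  # empty bins contribute nothing
                p_ab = count / total
                p_a = row_sums[i] / total
                p_b = col_sums[j] / total
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi
```

A perfectly diagonal histogram (the intensities of one image predict the other) gives a high value, while a uniform histogram gives zero.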
Joint histogram
Resampling
Joint histogram: increment(g,g)
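The increment step can be sketched as follows: for every voxel, the intensity of the reference image and the intensity of the resampled (transformed) image select one bin, which is incremented. This is a simplified illustration under assumptions: images are flat lists of integer intensities, and the names `joint_histogram`, `bins` and `max_value` are hypothetical.

```python
def joint_histogram(reference, resampled, bins=4, max_value=255):
    """Count intensity pairs of two equally sized images into bins x bins cells.

    Both images are flat sequences of integers in [0, max_value]; the
    resampled image is assumed to be already interpolated onto the grid
    of the reference image.
    """
    hist = [[0] * bins for _ in range(bins)]
    for a, b in zip(reference, resampled):
        # Map each intensity to its bin index.
        i = min(a * bins // (max_value + 1), bins - 1)
        j = min(b * bins // (max_value + 1), bins - 1)
        hist[i][j] += 1
    return hist
```

In a rigid registration loop only the resampling changes between iterations, which is why the interpolation step is the natural candidate for the GPU.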
3DRA – MR, before, after
3DRA – MR: CPU interpolation
3DRA – MR: GPU interpolation
Elastic Registration
Elastic deformation
• Parameterized deformation:
• B-spline deformation:
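The formulas of this slide did not survive extraction. For orientation only, a commonly used form of a B-spline deformation, in the spirit of the Kybic and Unser paper cited later in the deck, is written below for one dimension. The symbols are assumptions and not necessarily the slide's own notation: g is the deformation, c_j are the control point coefficients, h is the control point spacing and β³ is the cubic B-spline.

```latex
g(x) = x + \sum_{j} c_j \, \beta^{3}\!\left(\frac{x}{h} - j\right)
```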
Cubic B-spline
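For reference, the uniform cubic B-spline basis function can be evaluated as in the sketch below. The piecewise polynomial is the standard definition; the function name `cubic_bspline` is chosen for this example.

```python
def cubic_bspline(x):
    """Uniform cubic B-spline basis function, non-zero on (-2, 2)."""
    ax = abs(x)
    if ax < 1.0:
        return 2.0 / 3.0 - ax * ax + 0.5 * ax ** 3
    if ax < 2.0:
        return (2.0 - ax) ** 3 / 6.0
    return 0.0
```

The four shifted copies that overlap any position sum to one, so the basis forms a partition of unity.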
GPU linear interpolation
• Hardwired: linear interpolation is much faster than separate lookups
GPU Cubic Interpolation
• Compose cubic interpolation from weighted sum of linear interpolations:
C. Sigg, M. Hadwiger, “Fast Third-Order Texture Filtering”, GPU Gems 2
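The idea can be checked numerically in one dimension. The sketch below evaluates cubic B-spline filtering once as a direct weighted sum of four samples and once as a weighted sum of two linearly interpolated lookups, following the grouping described by Sigg and Hadwiger. It is a CPU emulation under assumptions: samples sit at integer coordinates (no half-texel offset), borders are clamped, and all function names are hypothetical.

```python
import math


def bspline_weights(a):
    """Cubic B-spline weights of the four neighbours for fraction a in [0, 1)."""
    w0 = (1.0 - a) ** 3 / 6.0
    w1 = (3.0 * a ** 3 - 6.0 * a ** 2 + 4.0) / 6.0
    w2 = (-3.0 * a ** 3 + 3.0 * a ** 2 + 3.0 * a + 1.0) / 6.0
    w3 = a ** 3 / 6.0
    return w0, w1, w2, w3


def fetch(samples, i):
    """Direct lookup with clamping at the borders."""
    return samples[min(max(i, 0), len(samples) - 1)]


def linear_lookup(samples, x):
    """Emulates one hardware linear-interpolated lookup at position x."""
    i = math.floor(x)
    t = x - i
    return (1.0 - t) * fetch(samples, i) + t * fetch(samples, i + 1)


def cubic_direct(samples, x):
    """Reference: four direct lookups, weighted by the B-spline weights."""
    i = math.floor(x)
    w = bspline_weights(x - i)
    return sum(w[k] * fetch(samples, i - 1 + k) for k in range(4))


def cubic_via_linear(samples, x):
    """Same result from two linear lookups at shifted positions."""
    i = math.floor(x)
    w0, w1, w2, w3 = bspline_weights(x - i)
    g0, g1 = w0 + w1, w2 + w3          # weights of the two linear lookups
    h0 = (i - 1) + w1 / g0             # position between samples i-1 and i
    h1 = (i + 1) + w3 / g1             # position between samples i+1 and i+2
    return g0 * linear_lookup(samples, h0) + g1 * linear_lookup(samples, h1)
```

Because the weights are non-negative, each pair of neighbouring samples can be merged into a single lookup whose position encodes the ratio of the two weights.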
Outline of proof
GPU Cubic Interpolation
• 2D: 4 linear-interpolated lookups, instead of 16 direct lookups
• 3D: 8 linear-interpolated lookups, instead of 64 direct lookups
GPU Linear Interpolation Accuracy
nVidia QuadroFX 3500
[Figure: plot of the linear interpolation error. Vertical axis: error, scaled by 10^-8, ticks from -1 to 10; horizontal axis: ticks from 1 to 253.]
Linear deformation, linear interpolation
Linear deformation, cubic interpolation
Cubic deformation, linear interpolation
Cubic deformation, cubic interpolation
Optimization
• Many parameters: huge parameter space
• Solution: use derivatives like Jacobian, Hessian
• Examples: Gradient Descent, Quasi-Newton, Levenberg-Marquardt
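The simplest of the listed optimizers can be sketched in a few lines. This is a generic fixed-step gradient descent on a toy problem, not the optimizer used for the results in this talk; `gradient_descent`, `step` and `iterations` are names chosen for the example.

```python
def gradient_descent(gradient, start, step=0.1, iterations=100):
    """Minimise a function given a callable that returns its gradient.

    gradient(params) must return one derivative per parameter.
    """
    params = list(start)
    for _ in range(iterations):
        g = gradient(params)
        # Move every parameter a fixed fraction against its derivative.
        params = [p - step * gi for p, gi in zip(params, g)]
    return params


# Toy cost (x - 3)^2 + (y + 1)^2 with its analytic gradient.
result = gradient_descent(lambda p: [2 * (p[0] - 3), 2 * (p[1] + 1)],
                          [0.0, 0.0], step=0.1, iterations=200)
```

Quasi-Newton and Levenberg-Marquardt use the same first-order derivatives but build curvature information from them, which typically reduces the number of iterations.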
GPU Elastic Registration Iteration
1. Generate deformed image on GPU & store to texture
2. Calculate Similarity Measure & First-Order Derivative on GPU
– Texture with reference image
– Texture with deformed image
First-Order Derivative of Sim. Measure
J. Kybic, M. Unser, “Fast Parametric Elastic Image Registration”
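The formula of this slide was lost in extraction. A chain-rule decomposition in the spirit of the cited paper is sketched below for one dimension; it is a reconstruction under assumptions, not the slide's own equation. Here E is the similarity measure, c_j a control point coefficient, f_t the image being deformed, g the deformation, f_w(x) = f_t(g(x)) the deformed image, and x_i the pixel positions. The three factors correspond to the derivative of the similarity measure, the derivative of the deformed image, and the derivative of the control points.

```latex
\frac{\partial E}{\partial c_j}
  = \sum_{i}
    \frac{\partial E}{\partial f_w(x_i)}
    \cdot \left.\frac{\partial f_t(x)}{\partial x}\right|_{x = g(x_i)}
    \cdot \frac{\partial g(x_i)}{\partial c_j}
```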
Derivative of the Similarity Measure
SSD:
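For the sum of squared differences the per-pixel derivative is particularly simple. The sketch below assumes an unnormalised SSD over flat lists of intensities; the slide's formula may include a normalisation factor, and the names `ssd` and `ssd_gradient` are hypothetical.

```python
def ssd(reference, deformed):
    """Sum of squared differences between two equally sized images."""
    return sum((d - r) ** 2 for r, d in zip(reference, deformed))


def ssd_gradient(reference, deformed):
    """Derivative of the SSD with respect to each deformed-image pixel."""
    return [2.0 * (d - r) for r, d in zip(reference, deformed)]
```

Each output value depends only on the two corresponding input pixels, so the computation maps directly onto one fragment per pixel with the reference and deformed images bound as textures.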
Derivative of the Deformed Image
• Sobel operator to calculate gradients:
-1  0  1
-4  0  4
-1  0  1

 1  4  1
 0  0  0
-1 -4 -1
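Applying the two kernels above to an image can be sketched as a plain 3 × 3 correlation. The example uses the kernel values exactly as listed, without normalisation, and leaves border pixels at zero; both are simplifications for illustration, and the names `KERNEL_X`, `KERNEL_Y` and `correlate3x3` are hypothetical.

```python
KERNEL_X = [[-1, 0, 1],
            [-4, 0, 4],
            [-1, 0, 1]]
KERNEL_Y = [[1, 4, 1],
            [0, 0, 0],
            [-1, -4, -1]]


def correlate3x3(image, kernel):
    """Correlate a 2D image (list of rows) with a 3x3 kernel.

    Border pixels are left at zero to keep the sketch short.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            out[y][x] = sum(kernel[j][i] * image[y - 1 + j][x - 1 + i]
                            for j in range(3) for i in range(3))
    return out


# A horizontal ramp has a constant response in x and none in y.
ramp_x = [[x for x in range(4)] for _ in range(4)]
grad_x = correlate3x3(ramp_x, KERNEL_X)
grad_y = correlate3x3(ramp_x, KERNEL_Y)
```

With these unnormalised weights a ramp of slope 1 yields a response of 12, so dividing by 12 gives the derivative per pixel. Note that the second kernel is positive when intensity increases towards the top row.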
Derivative of the Control Points
• Constant
• B-spline: separable kernel of fixed size
Original Fluoroscopy Sequence
2 * 2 Control Points
8 * 8 Control Points
Deformation Field
GPU Elastic Registration
• 40 images: Quasi-Newton: 16 seconds
• Gradient Descent: 63 seconds
• 8 * 8 Control Points: rest motion
• Multi-resolution deformation field, with reduced parameters (discussed with Dirk Loeckx)
CUDA Libraries
CUDA Software Stack
CUDA Libraries
• CUBLAS
• CUFFT
CUBLAS
• Basic Linear Algebra Subprograms
• Vector, Matrix, Numerical Math
• Almost no initialization
• Function calls
CUBLAS performance
[Figure: execution times of scalar vector add, dual-core Woodcrest and G80 core. Horizontal axis: data size vector (kB), 0 to 4500; vertical axis: execution time (ms), 0 to 500; series: G80 (ms), Woodcrest (ms).]
CUBLAS performance
[Figure: execution times of vector inner product, dual-core Woodcrest and G80 core. Horizontal axis: data size vector (kB), 0 to 4500; vertical axis: execution time (ms), 0 to 500; series: G80 (ms), Woodcrest (ms).]
CUFFT performance
[Figure: execution times of 2D FFT (size 2^n), single-core Woodcrest and G80 core. Horizontal axis: N point 2D FFT, logarithmic, 1 to 10000; vertical axis: execution time (ms), logarithmic, 0.001 to 10000; series: G80 (CudaFFT), Woodcrest (FFTW).]
Conclusion & Future work
Conclusions
• GPU: powerful parallel processor, but has its limitations
• Rigid Registration: interpolation on the GPU
• Elastic Registration: calculation of the Similarity Measure & first order derivative on the GPU
Future work
• Multi-resolution deformation fields
• 2D-3D registration of the Coronary Arteries (not presented)
Questions?