Post on 21-Dec-2015
Hardware-Based Nonlinear Filtering and Segmentation using High-Level Shading Languages
I. Viola, A. Kanitsar, M. E. Gröller
Institute of Computer Graphics and Algorithms
Vienna University of Technology
Vienna, Austria
2 / 23Ivan Viola Vienna University of Technology
Volume Visualization Pipeline
  [Diagram: pipeline stages data acquisition, data enhancement, visualization mapping, rendering; the stages are labeled CPU or GPU]
GPU-based Algorithms
  high performance
  high flexibility
  easy implementation: HLSL
  necessary features:
    floating point precision
    long shader programs
  latest commodity graphics hardware
  [Figure: data enhancement example; liver dataset and segmented vessels]
Talk Outline
  processing pipeline
  GPU-based filtering
    per-vertex stage
    per-fragment stage
    median filter
    bilateral filter
    rotated mask filter
  GPU-based segmentation
Liver Vessel Tree Visualization
  pre-filtering
    improving thresholding segmentation
    edge-preserving filters
  interactive threshold adjustment
  mask generation
  volumetric clipping
  volume rendering
  [Diagram labels on the slide: GPU, GPU, CPU]
Processing Pipeline
Filtering in Graphics Hardware: Issues
  data representation: textures
    3D texture
    stack of 2D textures
  access to value: texture fetch
  neighborhood addressing: texture offset
    we use 5×5×5 neighborhood
  filter implementation: per-fragment stage
  results: rendered into off-screen buffer
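As a rough CPU-side illustration of the addressing described above, the sketch below gathers a 5×5×5 neighborhood from a stack of 2D slices using integer offsets. The function name `fetch_neighborhood` and the clamp-at-border behavior are assumptions made for this example; the slides do not state how borders are handled.

```python
def fetch_neighborhood(stack, x, y, z, radius=2):
    """Collect the (2*radius+1)^3 values around voxel (x, y, z).

    `stack` is a list of 2D slices (lists of rows), standing in for a
    stack of 2D textures. Coordinates are clamped at the border, which
    is an assumption and not something the slides specify.
    """
    depth, height, width = len(stack), len(stack[0]), len(stack[0][0])

    def clamp(value, size):
        return min(max(value, 0), size - 1)

    values = []
    for dz in range(-radius, radius + 1):
        slice_2d = stack[clamp(z + dz, depth)]          # pick a slice
        for dy in range(-radius, radius + 1):
            row = slice_2d[clamp(y + dy, height)]
            for dx in range(-radius, radius + 1):
                values.append(row[clamp(x + dx, width)])  # one "fetch"
    return values


# Synthetic 6x6x6 volume whose values encode their own position.
volume = [[[z * 100 + y * 10 + x for x in range(6)]
           for y in range(6)] for z in range(6)]
neighborhood = fetch_neighborhood(volume, 3, 3, 3)
```

Each appended value corresponds to one texture fetch in the fragment program; a 5×5×5 mask therefore needs 125 fetches per output voxel.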
Data Representation
  [Diagram: texture stack and off-screen buffer stack]
Neighborhood Addressing
  Two alternatives:
  directly in fragment program
    requires additional computation
  pre-compute in per-vertex stage
    store in vertex attributes
    interpolation "for-free"
    swizzle operator
Address Pre-computation
  PER-VERTEX STAGE
    OUT.TEXCOORD0.xyzw = IN.TEXCOORD0.xyxy + float4(-2, 2, -1, 1)
    components: x = X-2, y = Y+2, z = X-1, w = Y+1
  FILTER KERNEL
    neighbor addresses by swizzle: TEXCOORD0.xy, TEXCOORD0.zy, TEXCOORD0.xw, TEXCOORD0.zw
  [Diagram: filter kernel cells labeled XY, ZY, XW, ZW]
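The swizzle trick above can be mimicked on the CPU to see which addresses a single 4-component attribute yields. This is a minimal sketch; `precompute_offsets` and `swizzle` are illustrative helper names, not part of any shading language API.

```python
def precompute_offsets(x, y):
    """Mimic OUT.TEXCOORD0.xyzw = IN.TEXCOORD0.xyxy + float4(-2, 2, -1, 1)."""
    xyxy = (x, y, x, y)
    offsets = (-2, 2, -1, 1)
    return tuple(c + o for c, o in zip(xyxy, offsets))


def swizzle(vec4, pattern):
    """Select components by name, e.g. swizzle(v, "zy") gives (v.z, v.y)."""
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    return tuple(vec4[index[name]] for name in pattern)


# One interpolated attribute carries two x offsets and two y offsets.
coords = precompute_offsets(10, 20)
# Four distinct neighbor addresses come from the four swizzle patterns.
neighbors = {p: swizzle(coords, p) for p in ("xy", "zy", "xw", "zw")}
```

One attribute therefore addresses four kernel positions without any per-fragment arithmetic, which is the point of moving the computation to the per-vertex stage.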
Per-fragment Stage
  medical data: 12 bit precision
    fixed point 12-bit arithmetic
  use cache coherence
  exploit 4D instructions
  reduce conditionals
  reduce number of registers
  push computation to per-vertex stage
Median Filter
  central value of ordered set
  implementation
    CPU-based: sorting
    GPU-based: similar to quickselect()
  [Example: neighborhood values 3 1 5 5 2 7 4 6 7; ordered set 1 2 3 4 5 5 6 7 7; median 5]
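The two implementation routes named above can be compared on the slide's example values. The sketch below is a plain CPU illustration: `median_by_sorting` follows the sorting route, and `quickselect` is a textbook selection routine included only to show the idea the GPU variant is said to resemble; it is not the authors' shader code.

```python
def median_by_sorting(values):
    """Sort the neighborhood and take the central element."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def quickselect(values, k):
    """Return the k-th smallest element (0-based) without a full sort."""
    pivot = values[len(values) // 2]
    lower = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    upper = [v for v in values if v > pivot]
    if k < len(lower):
        return quickselect(lower, k)
    if k < len(lower) + len(equal):
        return pivot
    return quickselect(upper, k - len(lower) - len(equal))


neighborhood = [3, 1, 5, 5, 2, 7, 4, 6, 7]   # values from the slide
median_sorted = median_by_sorting(neighborhood)
median_selected = quickselect(neighborhood, len(neighborhood) // 2)
```

Both routes return the central value of the ordered set; selection avoids ordering elements that cannot be the median.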
GPU-based Median Filter
  input data 12 bit [0..4095]
  multi-pass approach
  not efficient on CPU
  exploiting GPU 4D arithmetic
  [Diagram: value axis 0 1 2 3 4 5 6 7]
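One way to read the multi-pass approach is as interval halving over the 12-bit value range: each pass counts how many neighbors lie at or below a pivot and keeps the half of the range that must contain the median. The sketch below works under that assumption; the slides do not spell out the exact scheme, and `median_by_counting` is a hypothetical name.

```python
def median_by_counting(values, bits=12):
    """Find the median by halving the value range [0, 2**bits - 1].

    Each loop iteration stands in for one pass. The comparison
    `v <= pivot` is the kind of test a GPU could evaluate for four
    neighbors at once with 4-component instructions.
    Returns the median and the number of passes used.
    """
    k = len(values) // 2                 # rank of the median (0-based)
    low, high = 0, (1 << bits) - 1
    passes = 0
    while low < high:
        pivot = (low + high) // 2
        count = sum(1 for v in values if v <= pivot)
        if count > k:
            high = pivot                 # median is at or below the pivot
        else:
            low = pivot + 1              # median is above the pivot
        passes += 1
    return low, passes


median, passes = median_by_counting([3, 1, 5, 5, 2, 7, 4, 6, 7])
```

The number of passes depends only on the bit depth and not on the neighborhood size, which is one reason such a scheme can suit a fragment program better than a data-dependent sort.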
Bilateral Filter
  edge preservation: anisotropic filter kernel
  product of two weights:
    geometric: depends on the spatial distance from the kernel center
    photometric: depends on the intensity difference from the center value
  [Plot: f(x) over x with annotated samples: "high geometric weight", "low geometric weight", "high geometric weight, low photometric weight"]
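To make the weight product concrete, here is a minimal 1D sketch. Gaussian falloff for both weights and the parameter names `sigma_g` and `sigma_p` are assumptions borrowed from the common formulation of the bilateral filter; the slide's own formulas did not survive extraction.

```python
import math


def bilateral_1d(signal, radius, sigma_g, sigma_p):
    """Bilateral filter on a 1D signal (Gaussian weights assumed)."""
    result = []
    for i, center in enumerate(signal):
        total = 0.0
        weight_sum = 0.0
        first = max(0, i - radius)
        last = min(len(signal), i + radius + 1)
        for j in range(first, last):
            # Geometric weight: falls off with spatial distance.
            geometric = math.exp(-((j - i) ** 2) / (2 * sigma_g ** 2))
            # Photometric weight: falls off with intensity difference.
            diff = signal[j] - center
            photometric = math.exp(-(diff ** 2) / (2 * sigma_p ** 2))
            weight = geometric * photometric
            total += weight * signal[j]
            weight_sum += weight
        result.append(total / weight_sum)   # normalize by summed weights
    return result


step_edge = [0, 0, 0, 0, 100, 100, 100, 100]
smoothed = bilateral_1d(step_edge, radius=2, sigma_g=1.0, sigma_p=10.0)
```

Samples across the edge are spatially close (high geometric weight) but differ strongly in value (low photometric weight), so they contribute almost nothing and the step is preserved.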
GPU-based Bilateral Filter
  weights are precomputed
    geometric weight stored in unused vertex attributes (COLOR0)
    photometric weight stored in 1D mirror LUT
  weight product
  sum-up contributions & weights
  normalize
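The precomputation steps above can be sketched on the CPU with two tables. The sketch assumes Gaussian weights and interprets the "mirror" LUT as a table indexed by the absolute intensity difference, so that negative differences reuse the positive entries; both are assumptions, and all function names are illustrative.

```python
import math

MAX_VALUE = 4095  # 12-bit input range


def build_photometric_lut(sigma_p):
    """One entry per absolute intensity difference (mirrored lookup)."""
    return [math.exp(-(d * d) / (2 * sigma_p ** 2))
            for d in range(MAX_VALUE + 1)]


def build_geometric_weights(radius, sigma_g):
    """One weight per kernel offset; stands in for the vertex attribute."""
    return [math.exp(-(o * o) / (2 * sigma_g ** 2))
            for o in range(-radius, radius + 1)]


def bilateral_lut(signal, radius, geometric, lut):
    """Weight product, summation and normalization using only lookups."""
    result = []
    for i, center in enumerate(signal):
        total = 0.0
        weight_sum = 0.0
        for k, offset in enumerate(range(-radius, radius + 1)):
            j = i + offset
            if 0 <= j < len(signal):
                weight = geometric[k] * lut[abs(signal[j] - center)]
                total += weight * signal[j]
                weight_sum += weight
        result.append(total / weight_sum)
    return result


lut = build_photometric_lut(sigma_p=10.0)
geometric = build_geometric_weights(radius=2, sigma_g=1.0)
filtered = bilateral_lut([0, 0, 0, 4095, 4095, 4095], 2, geometric, lut)
```

With both tables in place, the inner loop contains no exponentials, only two lookups, a product, and two accumulations per neighbor.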
Rotated Mask Filter
  anisotropic noise removal with edge preservation
  splits filter mask into sub-regions
  mean and variance value for each sub-region
  result: mean value of sub-region with minimal variance
  GPU implementation
    single pass: slow
    multiple passes: reduce temp. registers
  [Diagram: example neighborhood with values 0 and 7 and its sub-regions]
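The selection rule above can be tried on a small 2D image. The sketch assumes a 3×3 mask split into four overlapping 2×2 sub-regions that each contain the center pixel, with coordinates clamped at the border; the slides do not give the actual sub-region shapes, so treat this layout as a stand-in.

```python
def rotated_mask_filter(image):
    """Replace each pixel by the mean of its least-variant sub-region."""
    height, width = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates at the image border (an assumption).
        return image[min(max(r, 0), height - 1)][min(max(c, 0), width - 1)]

    # Top-left corners of the four 2x2 sub-regions, relative to the center.
    corners = [(-1, -1), (-1, 0), (0, -1), (0, 0)]
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            best = None
            for dr, dc in corners:
                values = [pixel(r + dr + i, c + dc + j)
                          for i in (0, 1) for j in (0, 1)]
                mean = sum(values) / len(values)
                variance = sum((v - mean) ** 2 for v in values) / len(values)
                if best is None or variance < best[0]:
                    best = (variance, mean)
            row.append(best[1])          # mean of the minimal-variance region
        result.append(row)
    return result


image = [
    [0, 0, 7, 7],
    [0, 0, 7, 7],
    [0, 1, 7, 7],   # the 1 is a noisy sample next to the 0/7 edge
    [0, 0, 7, 7],
]
filtered = rotated_mask_filter(image)
```

Pixels next to the edge pick a sub-region that lies entirely on their own side, so the 0/7 boundary stays sharp while the noisy sample is pulled toward its homogeneous neighbors.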
Segmentation
  input: pre-filtered data after noise removal
  thresholding segmentation
    0 outside interval
    1 within interval
  interactive threshold adjustment
  output: compressed form
    32 slices in one 32 bit slice
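A small sketch of the thresholding and compression step follows. It assumes each output voxel stores one bit per input slice, with slice number `n` mapped to bit `n`; the bit order and the helper names are assumptions made for illustration.

```python
def threshold(value, low, high):
    """Return 1 if the value lies within [low, high], else 0."""
    return 1 if low <= value <= high else 0


def pack_slices(slices, low, high):
    """Pack up to 32 thresholded slices into one slice of 32-bit words.

    `slices` is a list of equally sized flat lists of voxel values.
    Bit n of each packed word holds the mask of slice n (assumed order).
    """
    if len(slices) > 32:
        raise ValueError("at most 32 slices fit into one 32-bit slice")
    packed = [0] * len(slices[0])
    for bit, voxels in enumerate(slices):
        for i, value in enumerate(voxels):
            packed[i] |= threshold(value, low, high) << bit
    return packed


def unpack_bit(packed_value, bit):
    """Recover the binary mask of one slice at one voxel."""
    return (packed_value >> bit) & 1


slices = [[100, 2000], [1500, 50], [1200, 1300]]
packed = pack_slices(slices, low=1000, high=1800)
```

Changing `low` or `high` only requires re-running the thresholding and packing, which is what keeps interactive threshold adjustment cheap.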
Results
  Operation                    GPU [ms]   CPU [ms]   Speedup
  Median filter                   24678      48639      1.97
  Bilateral filter                 9668      14706      1.52
  Rotated mask filter              7989      58003      7.26
  Thresholding                       40        349      8.73
  Thresholding & compression         64          –         –
  GPU: NVIDIA GeForceFX 5900 Ultra
  CPU: AMD AthlonXP 2.4 GHz, 1GB DDR RAM
  liver dataset: 512×512×72
Results
Conclusions
  data enhancement step on GPU!
    simple tasks: better speedup
    optimization HW specific
  high-level programming friendly
    many implementation possibilities
    compiler efficiency
Thank you for your attention!