Data Compression for Hardware-accelerated Volume Rendering
computer graphics & visualization
Jens Schneider
Rüdiger Westermann
Technical University Munich
Motivation
Need to deal with data of increasing size:
• Large-scale
• Multi-dimensional
• Multi-parameter
Increasing problems:
• Compression
• Representation
• Rendering
We will address all three problems!
Talk Outline
The Approach: Vector Quantization
Contributions
Quality and speed
• Hierarchical encoding
• PCA-Split
• Progressive encoding of time-resolved data
Multi-dimensional data
• Vectors of arbitrary length
Rendering from compressed data
• GPU-based decoding and rendering
• Per-fragment evaluation
• Interactive frame rates
Talk Outline
The Application: Volume Rendering
• Large-scale volumetric data sets
• Time-varying sequences
Skull CT scan: 16 MB / 14 fps (original), 0.78 MB / 11 fps (compressed)
Shockwave sequence: 1.4 GB / 20 fps (original), 70 MB / 24 fps (compressed)
Talk Outline
The Future: Video Compression?
• Video compression techniques very exciting!
• Merge video decoding pipeline and 3D API
Promising Technologies
• MPEG-II Streams
• XvMC API
• OpenGL Superbuffers
• Commodity graphics hardware video functionality
Chip vendors just beginning to realize this!
Vector Quantization
Codebook $C$ with codewords
Input $X_n$ → Encoder (input mapping): $i_n = E(X_n)$
Index $i_n$ → Decoder (output mapping): $X'_n = C(i_n)$
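The two mappings can be sketched in a few lines of Python. This is only an illustration: the three-entry codebook, the 2D vectors and the squared Euclidean distance are assumptions, and the function names `encode` and `decode` are hypothetical.

```python
# Illustrative sketch only: codebook contents and distance measure are assumed.

def encode(x, codebook):
    """Input mapping E: return the index of the codeword nearest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))


def decode(i, codebook):
    """Output mapping C: return the codeword stored at index i."""
    return codebook[i]


codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
x = (0.9, 1.2)
i = encode(x, codebook)        # only this index needs to be stored
x_rec = decode(i, codebook)    # lossy reconstruction of x
```

Only the index per input vector and the codebook itself have to be kept, which is where the compression comes from.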
Vector Quantization
LBG-Algorithm
• Linde, Buzo and Gray 1980
• Iterative refinement of a previous Codebook
• Sensitive to quality of first Codebook
• Usually computationally expensive
Speed-Up possible (and necessary)
• Partial searches
• Fast searches
• Better initial Codebook (i.e. PCA-Splits)
LBG-Algorithm can be fast!
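A minimal sketch of the iterative refinement, assuming squared Euclidean distance and a full nearest-codeword search (none of the speed-ups listed above). The names `lbg_step` and `lbg` are hypothetical, and keeping an empty cell's codeword unchanged is a simplification.

```python
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lbg_step(vectors, codebook):
    """One refinement step: assign each vector to its nearest codeword,
    then move every codeword to the centroid of its assigned vectors."""
    clusters = [[] for _ in codebook]
    for v in vectors:
        nearest = min(range(len(codebook)), key=lambda i: dist2(v, codebook[i]))
        clusters[nearest].append(v)
    refined = []
    for old, members in zip(codebook, clusters):
        if members:
            dim = len(old)
            refined.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(dim)))
        else:
            refined.append(old)  # empty cell: keep the codeword (simplification)
    return refined


def lbg(vectors, codebook, max_steps=20):
    """Repeat refinement until the codebook stops changing."""
    for _ in range(max_steps):
        refined = lbg_step(vectors, codebook)
        if refined == codebook:
            break
        codebook = refined
    return codebook


vectors = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
result = lbg(vectors, [(1.0, 1.0), (8.0, 1.0)])
```

The result depends on the starting codebook, which is why a good initial guess matters.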
Vector Quantization
The PCA-Split
• Lensch et al. 2001: BRDF Compression
• Covariance analysis to find optimal splitting plane
• Cut a cluster of input vectors in two by this plane.
• Plane is given by the centroid of the current set and the eigenvector with the largest eigenvalue (= normal) of the auto-covariance matrix
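One split can be sketched as follows. The power iteration used to find the dominant eigenvector is an assumption made to keep the example free of dependencies; the slides do not say how the eigenvector is computed. The name `pca_split` is hypothetical.

```python
def pca_split(vectors, iterations=50):
    """Split a cluster in two by the plane through its centroid whose normal
    is the dominant eigenvector of the covariance matrix."""
    n, dim = len(vectors), len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / n for d in range(dim)]
    centred = [[v[d] - centroid[d] for d in range(dim)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centred) / n for j in range(dim)]
           for i in range(dim)]

    # Power iteration (assumed method). It can fail if the start vector is
    # orthogonal to the dominant eigenvector; good enough for a sketch.
    normal = [1.0] * dim
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * normal[j] for j in range(dim)) for i in range(dim)]
        length = sum(x * x for x in nxt) ** 0.5
        if length == 0.0:
            break
        normal = [x / length for x in nxt]

    def side(c):
        return sum(a * b for a, b in zip(c, normal))

    left = [v for v, c in zip(vectors, centred) if side(c) < 0.0]
    right = [v for v, c in zip(vectors, centred) if side(c) >= 0.0]
    return left, right


vectors = [(0.0, 0.0), (1.0, 0.1), (9.0, 0.0), (10.0, 0.1)]
left, right = pca_split(vectors)
```

Applying the split repeatedly to the resulting clusters, and taking each cluster's centroid as a codeword, would give the initial codebook.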
Vector Quantization
LBG as PCA post-processing
• Increases fidelity
• Leads to stable Voronoi regions
• Only a few steps are necessary
• Great speed-up compared to LBG only!
Figure: A series of LBG steps, codebook from last slide
Example
Full-color confocal microscopy scan, 512² x 32 x RGB
Original, 32MB    4D vectors, 2MB    32D vectors, 1MB
Hierarchical Vector Quantization
Laplace Decomposition
Hierarchical Vector Quantization
4³-dim. VQ
2³-dim. VQ
Direct Copy
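One way to read the three levels is sketched below for a single 4³ block: 64 detail values, 8 detail values and one mean. The averaging filter and the flattened index layout are assumptions, since the slides only name the levels. The function names are hypothetical.

```python
def decompose_block(block):
    """block[z][y][x] is a 4x4x4 cube of scalars.
    Returns (mean, detail8, detail64) under an assumed averaging scheme."""
    # Average each 2x2x2 sub-cube to get 8 coarse values.
    coarse = [[[sum(block[2 * z + dz][2 * y + dy][2 * x + dx]
                    for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)) / 8.0
                for x in range(2)] for y in range(2)] for z in range(2)]
    # 64 fine details: each voxel minus the coarse value that covers it.
    detail64 = [block[z][y][x] - coarse[z // 2][y // 2][x // 2]
                for z in range(4) for y in range(4) for x in range(4)]
    # Block mean (copied directly) and 8 coarse details relative to it.
    mean = sum(coarse[z][y][x]
               for z in range(2) for y in range(2) for x in range(2)) / 8.0
    detail8 = [coarse[z][y][x] - mean
               for z in range(2) for y in range(2) for x in range(2)]
    return mean, detail8, detail64


def reconstruct_block(mean, detail8, detail64):
    """Invert decompose_block by adding the levels back together."""
    return [[[mean
              + detail8[(z // 2) * 4 + (y // 2) * 2 + (x // 2)]
              + detail64[z * 16 + y * 4 + x]
              for x in range(4)] for y in range(4)] for z in range(4)]


block = [[[z * 16 + y * 4 + x for x in range(4)] for y in range(4)]
         for z in range(4)]
mean, detail8, detail64 = decompose_block(block)
restored = reconstruct_block(mean, detail8, detail64)
```

In this reading, the 64-dimensional and 8-dimensional detail vectors would each go through vector quantization, while the mean is stored as it is.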
Hierarchical Vector Quantization
Output:
• One RGB Index-Volume
• Two Codebooks
RGB Index-Volume → 3D Texture
Codebooks → 2D textures
Example
Visible Human (Male), RGB slice 2048x1216
Compression took 10.0 seconds, PSNR = 34.72dB
Original (7.1MB) Compressed (285KB)
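For reference, PSNR figures like the one above are commonly computed as below. The slides do not state which definition was used, so the peak value of 255 (8-bit samples) and the function name `psnr` are assumptions.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally long
    sequences of samples (assumed 8-bit, hence peak=255)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


identical = psnr([0, 255], [0, 255])
worst = psnr([0, 0], [255, 255])
```

Higher values mean the compressed data is closer to the original.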
Timings
Reference System: P4 2.8GHz, 1GB memory
VHP Slice, 2048x1216 RGB           10.0 sec
Engine 256² x 128 CT-Scan          19.0 sec
Skull 256³ CT-Scan                 50.6 sec
Vortex Sequence, 128³ x 100        13 (5) min
Shockwave Sequence, 256³ x 89      29 (13) min
Rendering
GPU-based decoding
• Indices stored in 3D RGB-texture (3/64th original size)
• Decode index per block → dependent fetch
• Decode address per block → 4³ address texture
Figure: Decoding process in flatland
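The 3/64 figure follows from replacing each 4³ block of voxels by one RGB texel. A small check, assuming 8-bit scalar voxels and one byte per index channel (the helper name is hypothetical):

```python
def index_volume_bytes(nx, ny, nz, block=4, bytes_per_texel=3):
    """Size of the RGB index volume: one 3-byte texel per 4x4x4 block.
    Assumes the volume dimensions are multiples of the block size."""
    blocks = (nx // block) * (ny // block) * (nz // block)
    return blocks * bytes_per_texel


original_bytes = 256 ** 3                          # 8-bit voxels, assumed
index_bytes = index_volume_bytes(256, 256, 256)
fraction = index_bytes / original_bytes
```

For a 256³ volume this gives an index volume of 768 KB, before the codebooks are added.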
Rendering
Render 3D index and address texture
• Nearest neighbor interpolation for both
• GL_REPEAT for address texture
Per-fragment decoding
• Decode detail components and dependent fetch
• Add the details to average component (Red channel)
• Lookup result in 1D RGBα transfer function
Problem:
Complex fragment shader slows down rendering
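The per-fragment steps can be emulated on the CPU as a rough sketch. The texel layout (mean, 8-dim index, 64-dim index), the flattened codebook layout and both function names are assumptions; the modulo stands in for what GL_REPEAT on the 4³ address texture provides.

```python
def decode_voxel(x, y, z, index_volume, codebook8, codebook64):
    """Reconstruct one voxel from the index volume and two codebooks.
    index_volume[bz][by][bx] = (mean, i8, i64) is an assumed layout."""
    mean, i8, i64 = index_volume[z // 4][y // 4][x // 4]   # nearest-neighbour fetch
    lx, ly, lz = x % 4, y % 4, z % 4                       # address inside the block
    d8 = codebook8[i8][(lz // 2) * 4 + (ly // 2) * 2 + (lx // 2)]   # dependent fetch
    d64 = codebook64[i64][lz * 16 + ly * 4 + lx]                    # dependent fetch
    return mean + d8 + d64


def shade(value, transfer_function):
    """Look the decoded value up in a 1D RGBA table, clamped to its range."""
    i = max(0, min(len(transfer_function) - 1, int(round(value))))
    return transfer_function[i]


index_volume = [[[(100, 0, 0)]]]            # a single block
codebook8 = [list(range(8))]
codebook64 = [list(range(64))]
grey_ramp = [(i, i, i, i) for i in range(256)]

value = decode_voxel(3, 2, 1, index_volume, codebook8, codebook64)
colour = shade(value, grey_ramp)
```

Each voxel needs one index fetch and two dependent codebook fetches before the transfer function lookup, which is the shader cost the slide refers to.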
Rendering
Solution: Deferred Fragment Processing
Avoid decoding in empty regions. "Empty" means:
a) α-Transfer function maps 0 → 0.
• Check on CPU
• Switch between two possible rendering modes
b) Average value is 0 (Red channel)
• Check in a first, simple fragment program
• Fragment's depth value is set accordingly
• Second pass: discard (early Z-Test) or render fragment
• Full decoding only performed in second pass
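The two-pass idea can be mimicked on the CPU. In this sketch a boolean mask stands in for the depth buffer and early Z-test, and fragments are hypothetical (mean, payload) pairs; it shows the control flow only, not GPU code.

```python
def render_deferred(fragments, decode, alpha_at_zero=0.0):
    """fragments: list of (mean, payload). Returns decoded values,
    with None for fragments skipped as empty."""
    # Test a), done once on the CPU: skipping is only valid if value 0 is
    # fully transparent. Otherwise fall back to decoding everything.
    if alpha_at_zero != 0.0:
        return [decode(f) for f in fragments]
    # Pass 1: cheap test on the average (red) channel; mark empty fragments.
    rejected = [f[0] == 0 for f in fragments]
    # Pass 2: the expensive decode only runs for fragments that survived.
    return [None if r else decode(f) for f, r in zip(fragments, rejected)]


calls = []


def fake_decode(fragment):
    calls.append(fragment)            # count how often full decoding runs
    return fragment[0] + fragment[1]


fragments = [(0, 5), (10, 2), (0, 0), (7, 1)]
image = render_deferred(fragments, fake_decode)
```

Here the full decode runs for two of the four fragments; in mostly empty volumes the saving would be correspondingly larger.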
256² x 128 Engine CT Scan
19.0 seconds, PSNR = 36.17dB (P4 2.8GHz)
Original (8MB) – 19 fps Compressed (402KB) – 12 fps
256³ Skull CT Scan
50.6 seconds, PSNR = 35.35dB (P4 2.8GHz)
Original (16MB) – 14 fps Compressed (780KB) – 11 fps
Time-resolved Sequences
Exploit temporal coherences during compression:
• Group of Frames (GOF)
First frame in a GOF:
• PCA-Split followed by LBG-Refinement
Other frames:
• LBG-refinement of last Index-Volume and Codebook
Result:
• Great speed-up (factor 2 to 3)
• Very large GOFs possible (64+ frames)
• Virtually same fidelity as frame-by-frame
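The GOF scheme amounts to a simple driver loop, sketched below. The callables `build_codebook` (standing for the PCA-Split) and `refine_codebook` (standing for LBG refinement) are hypothetical placeholders, and for brevity only the codebook is carried from frame to frame, not the index volume.

```python
def encode_sequence(frames, gof_size, build_codebook, refine_codebook):
    """Return one codebook per frame, restarting at every GOF boundary."""
    codebooks = []
    previous = None
    for t, frame in enumerate(frames):
        if t % gof_size == 0:
            # First frame of a GOF: fresh initial codebook, then refinement.
            current = refine_codebook(frame, build_codebook(frame))
        else:
            # Other frames: refine the previous frame's codebook only.
            current = refine_codebook(frame, previous)
        codebooks.append(current)
        previous = current
    return codebooks


# Stub callables that only record which path each frame took.
def stub_build(frame):
    return ("pca", frame)


def stub_refine(frame, codebook):
    return ("lbg", frame, codebook[0])


trace = encode_sequence(["f0", "f1", "f2"], 2, stub_build, stub_refine)
```

Skipping the initial split for all but the first frame of a GOF is where the speed-up would come from.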
128³ x 100 Vortex-Simulation
5 minutes, PSNR = 34.43dB (P4 2.8 GHz)
Original (200MB) - 28 fps Compressed (11MB) - 16 fps
256³ x 89 Shockwave-Sequence
13 minutes, PSNR = 51.36dB (P4 2.8 GHz)
Original (1.4GB) - 20 fps Compressed (70MB) - 24 fps
Conclusions
• Compression ratios of approx. 20:1
• Interactive rendering possible
• Easy random access to each frame
• Wide variety of data sets handled
• Currently only nearest neighbor interpolation
• Mainly limited by performance / instruction count.
• Tri-linear interpolation can be done on newer GPUs!
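The approximate 20:1 figure can be checked against the sizes reported on the example slides. The only assumption is the unit conversion (1 MB = 1024 KB, 1 GB = 1024 MB).

```python
# (original size, compressed size) in KB, taken from the example slides.
reported_kb = {
    "VHP slice": (7.1 * 1024, 285),
    "Engine": (8 * 1024, 402),
    "Skull": (16 * 1024, 780),
    "Vortex": (200 * 1024, 11 * 1024),
    "Shockwave": (1.4 * 1024 * 1024, 70 * 1024),
}

ratios = {name: original / compressed
          for name, (original, compressed) in reported_kb.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}:1")
```

The ratios fall between roughly 18:1 and 26:1, in line with the stated figure.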
Online Demo
Shockwave sequence
Vortex sequence
The Future?
Typical MPEG Decoding Pipeline
Figure labels: CPU, Video Chip
Stages shown: MPEG Stream, De-Quantisation, Motion Compensation, Inverse DCT, Colorspace Conversion
Predictor / Corrector method
Further compression opportunities
The Future?
Merge with OpenGL API
Stages shown: MPEG Stream, De-Quantisation, Motion Compensation, Inverse DCT, Colorspace Conversion, P- / Super-Buffer Blit, Bind as Texture, Fragment Processing
Figure label: XvMC
XvMC
Extension to X-Server
Already supported on:
• GeForce 4 MX / GeForce FX (full)
• Other GeForces (no iDCT)
Driver-Code
• No OpenSource
• Other vendors working on implementation
Specification: Mark Vojkovich, XFree86 Project
Good Performance!
Other Possibilities
Super-Buffer / "Über-Buffer"
• OpenGL extension
• Basically allows malloc() on video RAM
• Beta implementation available
Might be used to merge video and OpenGL pipes!
• More OS Independence
• More hardware Independence
• Easier to implement
• Only on newer GPUs
Some research still necessary!
Thank You!
Questions?