Data Compression for Hardware-accelerated Volume Rendering

computer graphics & visualization


Jens Schneider

Rüdiger Westermann

Technical University Munich


Motivation

Need to deal with data of increasing size:
• Large-scale
• Multi-dimensional
• Multi-parameter

Increasing problems:
• Compression
• Representation
• Rendering

We will address all three problems!


Talk Outline

The Approach: Vector Quantization

Contributions

Quality and speed
• Hierarchical encoding
• PCA-Split
• Progressive encoding of time-resolved data

Multi-dimensional data
• Vectors of arbitrary length

Rendering from compressed data
• GPU-based decoding and rendering
• Per-fragment evaluation
• Interactive frame rates


Talk Outline

The Application: Volume Rendering

• Large-scale volumetric data sets: 16 MB / 14 fps (original) vs. 0.78 MB / 11 fps (compressed)

• Time-varying sequences: 1.4 GB / 20 fps (original) vs. 70 MB / 24 fps (compressed)


Talk Outline

The Future: Video Compression?
• Video compression techniques are very exciting!
• Merge video decoding pipeline and 3D API

Promising Technologies
• MPEG-2 Streams
• XvMC API
• OpenGL Superbuffers
• Commodity graphics hardware video functionality

Chip vendors are just beginning to realize this!


Vector Quantization

Codebook C with codewords

Encoder (input mapping): i_n = E(X_n)

Decoder (output mapping): X'_n = C(i_n)

Diagram: X_n → Encoder → i_n → Decoder → X'_n
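The two mappings above can be illustrated with a minimal sketch. It assumes a nearest-neighbor encoder under squared Euclidean distance, which is the usual choice for vector quantization but is not stated on the slide; the function names and the tiny codebook are made up for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vector, codebook):
    """Input mapping i_n = E(X_n): index of the nearest codeword (assumed metric)."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def decode(index, codebook):
    """Output mapping X'_n = C(i_n): a plain table lookup."""
    return codebook[index]


# Hypothetical 2D codebook and input vectors.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
inputs = [(0.2, -0.1), (0.9, 1.2), (3.5, 0.4)]

indices = [encode(x, codebook) for x in inputs]        # [0, 1, 2]
reconstructed = [decode(i, codebook) for i in indices]
```

Decoding is a single lookup, which is what makes the scheme attractive for per-fragment evaluation on the GPU; all the cost sits in the encoder.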



LBG-Algorithm
• Linde, Buzo and Gray 1980
• Iterative refinement of a previous codebook
• Sensitive to quality of first codebook
• Usually computationally expensive

Speed-up possible (and necessary)
• Partial searches
• Fast searches
• Better initial codebook (e.g. PCA-Splits)

LBG-Algorithm can be fast!
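As a rough illustration of the iterative refinement, the sketch below implements a textbook LBG loop with an exhaustive nearest-codeword search. It does not include the partial or fast searches mentioned above, and the stopping rule (relative drop in mean distortion) and all names are assumptions rather than the authors' implementation.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lbg_step(vectors, codebook):
    """One LBG iteration: assign each vector to its nearest codeword,
    then move every codeword to the centroid of its assigned vectors."""
    dim = len(vectors[0])
    sums = [[0.0] * dim for _ in codebook]
    counts = [0] * len(codebook)
    distortion = 0.0
    for v in vectors:
        i = min(range(len(codebook)),
                key=lambda k: squared_distance(v, codebook[k]))
        distortion += squared_distance(v, codebook[i])
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += v[d]
    new_codebook = [
        tuple(s / counts[i] for s in sums[i]) if counts[i] else codebook[i]
        for i in range(len(codebook))  # empty cells keep their old codeword
    ]
    # The distortion is measured against the codebook passed in.
    return new_codebook, distortion / len(vectors)


def lbg(vectors, codebook, tolerance=1e-3, max_steps=50):
    """Repeat LBG steps until the mean distortion stops improving."""
    previous = float("inf")
    for _ in range(max_steps):
        codebook, distortion = lbg_step(vectors, codebook)
        if previous - distortion <= tolerance * distortion:
            break
        previous = distortion
    return codebook


vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
refined = lbg(vectors, [(0.0, 0.0), (1.0, 1.0)])
```

With the poor starting codebook used here, the loop needs three steps; a starting codebook that already sits near the cluster centroids would stop after two, which is the motivation for a better initial codebook.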



The PCA-Split
• Lensch et al. 2001 (BRDF compression)
• Covariance analysis to find optimal splitting plane
• Cut a cluster of input vectors in two by this plane.
• Plane is given by the centroid of the current set and the eigenvector with the largest eigenvalue (= plane normal) of the auto-covariance matrix
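A single split can be sketched as follows. The slide does not say how the eigenvector is computed; this sketch substitutes plain power iteration so that it needs no external library, and every function name is hypothetical.

```python
import math


def centroid(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def covariance(vectors, mean):
    """Auto-covariance matrix of the cluster around its centroid."""
    dim = len(mean)
    cov = [[0.0] * dim for _ in range(dim)]
    for v in vectors:
        diff = [v[d] - mean[d] for d in range(dim)]
        for r in range(dim):
            for c in range(dim):
                cov[r][c] += diff[r] * diff[c]
    return [[x / len(vectors) for x in row] for row in cov]


def principal_axis(cov, iterations=100):
    """Dominant eigenvector by power iteration (an assumed stand-in for a
    full eigen-decomposition; it fails if the start vector happens to be
    orthogonal to the dominant eigenvector)."""
    dim = len(cov)
    axis = [1.0 / (k + 1) for k in range(dim)]  # arbitrary non-zero start
    for _ in range(iterations):
        nxt = [sum(cov[r][c] * axis[c] for c in range(dim)) for r in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        axis = [x / norm for x in nxt]
    return axis


def pca_split(vectors):
    """Cut the cluster by the plane through its centroid whose normal is
    the principal axis; returns the two halves."""
    mean = centroid(vectors)
    normal = principal_axis(covariance(vectors, mean))
    left, right = [], []
    for v in vectors:
        side = sum((v[d] - mean[d]) * normal[d] for d in range(len(mean)))
        (left if side < 0.0 else right).append(v)
    return left, right


cluster = [(0.0, 0.0), (1.0, 0.1), (9.0, 0.2), (10.0, 0.0)]
half_a, half_b = pca_split(cluster)
```

Which half ends up "left" depends on the sign of the eigenvector, so callers should not rely on the order. Building a full codebook would repeat the split on the resulting clusters; the order in which clusters are chosen for splitting is not given on the slide.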



LBG as PCA post-processing
• Increases fidelity
• Leads to stable Voronoi regions
• Only a few steps are necessary
• Great speed-up compared to LBG only!

Figure: a series of LBG steps, starting from the codebook of the previous slide


Example

Full-color confocal microscopy scan, 512^2x32xRGB

Original, 32MB | 4D vectors, 2MB | 32D vectors, 1MB


Hierarchical Vector Quantization

Laplace Decomposition:
• 4^3-dim. VQ
• 2^3-dim. VQ
• Direct Copy

Output:
• One RGB Index-Volume
• Two Codebooks

RGB Index-Volume → 3D Texture

Codebooks → 2D Textures
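The slide names only the three levels (64-dimensional detail, 8-dimensional detail, and a directly copied value). One plausible reading, sketched below under that assumption, is that each 4^3 block is averaged down to a 2^3 block and then to a single mean, with the differences between levels forming the two detail vectors. The box-filter averaging and the function names are assumptions, not taken from the talk.

```python
def decompose_block(block):
    """Split a 4x4x4 block (indexed block[z][y][x]) into
    (mean, 8 coarse details, 64 fine details)."""
    # Average each 2x2x2 sub-block to get the 2^3 coarse level.
    coarse = []
    for cz in range(2):
        for cy in range(2):
            for cx in range(2):
                total = sum(block[2 * cz + dz][2 * cy + dy][2 * cx + dx]
                            for dz in range(2)
                            for dy in range(2)
                            for dx in range(2))
                coarse.append(total / 8.0)
    mean = sum(coarse) / 8.0                      # stored directly
    detail8 = [c - mean for c in coarse]          # input to the 2^3-dim. VQ
    detail64 = [block[z][y][x] - coarse[(z // 2) * 4 + (y // 2) * 2 + (x // 2)]
                for z in range(4) for y in range(4) for x in range(4)]
    return mean, detail8, detail64                # detail64: 4^3-dim. VQ input


def reconstruct_block(mean, detail8, detail64):
    """Inverse of decompose_block: add both detail levels back to the mean."""
    return [[[mean
              + detail8[(z // 2) * 4 + (y // 2) * 2 + (x // 2)]
              + detail64[z * 16 + y * 4 + x]
              for x in range(4)] for y in range(4)] for z in range(4)]


block = [[[float(x + 4 * y + 16 * z) for x in range(4)]
          for y in range(4)] for z in range(4)]
mean, detail8, detail64 = decompose_block(block)
restored = reconstruct_block(mean, detail8, detail64)
```

The decomposition itself loses nothing; the loss comes from replacing `detail8` and `detail64` with codebook indices. Under this reading, the mean and the two indices would fill the three channels of the RGB index volume.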


Example

Visible Human (Male), RGB slice 2048x1216

Compression took 10.0 seconds, PSNR = 34.72dB

Original (7.1MB) Compressed (285KB)
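The slides quote PSNR values without defining the measure. The sketch below uses the standard definition, 10·log10(peak² / MSE), and assumes a peak of 255 for 8-bit samples; whether the authors averaged over color channels in exactly this way is not stated.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally long
    sample sequences (standard definition, assumed here)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0.0:
        return float("inf")  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


identical = psnr([0, 128, 255], [0, 128, 255])
worst_case = psnr([0, 0], [255, 255])
```

Higher is better: identical signals give infinity, and the worst possible 8-bit error gives 0 dB.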


Timings

Reference System: P4 2.8GHz, 1GB memory

Data set                          Compression time
VHP Slice, 2048x1216 RGB          10.0 sec
Engine 256^2x128 CT-Scan          19.0 sec
Skull 256^3 CT-Scan               50.6 sec
Vortex Sequence, 128^3x100        13 (5) min
Shockwave Sequence, 256^3x89      29 (13) min


Rendering

GPU-based decoding
• Indices stored in 3D RGB texture (3/64 of the original size)
• Decode index per block → dependent fetch
• Decode address per block → 4^3 address texture

Figure: decoding process in flatland


Rendering

Render 3D index and address texture
• Nearest neighbor interpolation for both
• GL_REPEAT for address texture

Per-fragment decoding
• Decode detail components and dependent fetch
• Add the details to average component (Red channel)
• Lookup result in 1D RGBα transfer function

Problem:

Complex fragment shader slows down rendering
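The per-fragment steps can be emulated on the CPU to make the data flow concrete. This is a sketch, not shader code: the data layout (one `(mean, index8, index64)` triple per 4^3 block, codebooks as plain lists, a 256-entry transfer function) is assumed for illustration, and the modulo stands in for tiling the 4^3 address texture with GL_REPEAT.

```python
def decode_voxel(x, y, z, index_volume, codebook8, codebook64):
    """Reconstruct one scalar value from the compressed representation."""
    # Nearest-neighbor fetch from the index volume: one texel per 4^3 block.
    mean, index8, index64 = index_volume[z // 4][y // 4][x // 4]
    # Address inside the block (what the repeated address texture provides).
    ax, ay, az = x % 4, y % 4, z % 4
    # Dependent fetches into the two codebooks.
    detail8 = codebook8[index8][(az // 2) * 4 + (ay // 2) * 2 + (ax // 2)]
    detail64 = codebook64[index64][az * 16 + ay * 4 + ax]
    # Add the details to the average component.
    return mean + detail8 + detail64


def shade(x, y, z, index_volume, codebook8, codebook64, transfer_function):
    """Decode the voxel, then look the result up in a 1D transfer function."""
    value = decode_voxel(x, y, z, index_volume, codebook8, codebook64)
    slot = max(0, min(len(transfer_function) - 1, int(round(value))))
    return transfer_function[slot]


# Hypothetical data: a single block, tiny codebooks, grey-ramp transfer function.
index_volume = [[[(100.0, 0, 1)]]]
codebook8 = [[float(k) for k in range(8)]]
codebook64 = [[0.0] * 64, [1.0] * 64]
transfer_function = [(v, v, v, v) for v in range(256)]

color = shade(3, 0, 0, index_volume, codebook8, codebook64, transfer_function)
```

Every fragment costs three texture fetches plus the transfer-function lookup, two of them dependent on the first, which is where the shader complexity noted above comes from.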


Rendering

Solution: Deferred Fragment Processing

Avoid decoding in empty regions. "Empty" means:

a) α-transfer function maps 0 → 0.
• Check on CPU
• Switch between two possible rendering modes

b) Average value is 0 (Red channel)
• Check in a first, simple fragment program
• Fragment's depth value is set accordingly
• Second pass: discard (early Z-test) or render fragment
• Full decoding only performed in second pass
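The control flow of the two passes can be mimicked on the CPU. In this sketch a list of booleans plays the role of the depth values written in the first pass, and `None` marks fragments that the early Z-test would reject; the function names and the callback interface are hypothetical.

```python
def first_pass(mean_values):
    """Cheap pass: flag fragments whose average component is zero."""
    return [mean == 0 for mean in mean_values]


def second_pass(mean_values, skip_flags, full_decode):
    """Expensive pass: run the full decoder only for unflagged fragments."""
    results = []
    for fragment, (mean, skip) in enumerate(zip(mean_values, skip_flags)):
        if skip:
            results.append(None)  # rejected before the costly shader runs
        else:
            results.append(full_decode(fragment, mean))
    return results


def render(mean_values, alpha_at_zero, full_decode):
    """Condition a) is checked once on the CPU and selects the mode:
    skipping is only valid if the transfer function maps value 0 to alpha 0."""
    if alpha_at_zero != 0:
        return [full_decode(f, m) for f, m in enumerate(mean_values)]
    return second_pass(mean_values, first_pass(mean_values), full_decode)


decoded_fragments = []


def fake_decode(fragment, mean):
    """Stand-in for the full per-fragment decoder; records which fragments ran."""
    decoded_fragments.append(fragment)
    return mean * 2


image = render([0, 5, 0, 7], 0, fake_decode)
```

In the example only fragments 1 and 3 reach the decoder, so the saving grows with the fraction of empty space in the volume.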


256^2x128 Engine CT Scan

19.0 seconds, PSNR = 36.17dB (P4 2.8GHz)

Original (8MB) – 19 fps Compressed (402KB) – 12 fps


256^3 Skull CT Scan

50.6 seconds, PSNR = 35.35dB (P4 2.8GHz)

Original (16MB) – 14 fps Compressed (780KB) – 11 fps


Time-resolved Sequences

Exploit temporal coherence during compression:
• Group of Frames (GOF)

First frame in a GOF:
• PCA-Split followed by LBG refinement

Other frames:
• LBG refinement of last Index-Volume and Codebook

Result:
• Great speed-up (factor 2 to 3)
• Very large GOFs possible (64+ frames)
• Virtually same fidelity as frame-by-frame
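The group-of-frames scheme can be outlined as below. To keep the sketch short, the PCA-Split initialization is replaced by a trivial placeholder that picks evenly spaced input vectors, and the refinement runs a fixed number of LBG steps; both simplifications, and all names, are assumptions.

```python
def nearest(v, codebook):
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(v, codebook[i])))


def lbg_refine(vectors, codebook, steps=3):
    """A few LBG steps: reassign vectors, then recompute centroids."""
    dim = len(vectors[0])
    for _ in range(steps):
        sums = [[0.0] * dim for _ in codebook]
        counts = [0] * len(codebook)
        for v in vectors:
            i = nearest(v, codebook)
            counts[i] += 1
            for d in range(dim):
                sums[i][d] += v[d]
        codebook = [
            tuple(s / counts[i] for s in sums[i]) if counts[i] else codebook[i]
            for i in range(len(codebook))
        ]
    return codebook


def initial_codebook(vectors, size):
    """Placeholder for the PCA-Split (assumes size <= len(vectors))."""
    step = max(1, len(vectors) // size)
    return [tuple(vectors[k * step]) for k in range(size)]


def encode_sequence(frames, codebook_size, gof_length):
    """Encode each frame; only the first frame of a GOF starts from scratch."""
    encoded = []
    codebook = None
    for t, vectors in enumerate(frames):
        if t % gof_length == 0:
            codebook = initial_codebook(vectors, codebook_size)
        # Other frames start from the previous frame's codebook.
        codebook = lbg_refine(vectors, codebook)
        indices = [nearest(v, codebook) for v in vectors]
        encoded.append((indices, codebook))
    return encoded


frames = [
    [(0.0,), (1.0,), (10.0,), (11.0,)],
    [(0.5,), (1.5,), (10.5,), (11.5,)],
]
encoded = encode_sequence(frames, codebook_size=2, gof_length=8)
```

Each frame keeps its own indices and codebook in this sketch, so any frame can be decoded without touching its neighbors; temporal coherence is used only to speed up encoding.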


128^3x100 Vortex-Simulation

5 minutes, PSNR = 34.43dB (P4 2.8 GHz)

Original (200MB) - 28 fps Compressed (11MB) - 16 fps


256^3x89 Shockwave-Sequence

13 minutes, PSNR = 51.36dB (P4 2.8 GHz)

Original (1.4GB) - 20 fps Compressed (70MB) - 24 fps


Conclusions

• Compression ratios of approx. 20:1
• Interactive rendering possible
• Easy random access to each frame
• Wide variety of data sets handled
• Currently only nearest neighbor interpolation
  • Mainly limited by performance / instruction count.
  • Tri-linear interpolation can be done on newer GPUs!
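The 20:1 figure can be cross-checked against the sizes quoted on the result slides. The arithmetic below assumes 1 MB = 1024 KB and one byte per voxel. The codebook estimate additionally assumes two codebooks of 256 entries with 8-bit components, which is not stated on the slides and is used here only because it reproduces the engine figure.

```python
# Index volume: three bytes (mean plus two indices) per 4^3 block of voxels.
BLOCK_VOXELS = 4 ** 3
INDEX_BYTES_PER_BLOCK = 3
index_ratio = BLOCK_VOXELS / INDEX_BYTES_PER_BLOCK  # upper bound, about 21.3

# (original size, compressed size) in KB, as quoted on the slides.
reported_kb = {
    "Engine": (8 * 1024, 402),
    "Skull": (16 * 1024, 780),
    "Vortex": (200 * 1024, 11 * 1024),
    "Shockwave": (1.4 * 1024 * 1024, 70 * 1024),
}
ratios = {name: original / compressed
          for name, (original, compressed) in reported_kb.items()}

# Assumed codebook sizes: 256 entries of 64 bytes plus 256 entries of 8 bytes.
codebook_kb = (256 * 64 + 256 * 8) / 1024
engine_estimate_kb = 8 * 1024 * INDEX_BYTES_PER_BLOCK / BLOCK_VOXELS + codebook_kb

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}:1")
```

All four ratios fall between roughly 18:1 and 21:1. The index volume alone caps the ratio at 64/3, and the codebooks account for the remaining overhead.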


Online Demo

Shockwave sequence

Vortex sequence


The Future?

Typical MPEG Decoding Pipeline

Diagram labels:
• Hardware: CPU, Video Chip
• Stages: MPEG Stream, De-Quantisation, Motion Compensation, Inverse DCT, Colorspace Conversion
• Annotations: Predictor / Corrector method; Further compression opportunities


The Future?

Merge with OpenGL API

Diagram labels:
• Stages: MPEG Stream, De-Quantisation, Motion Compensation, Inverse DCT, Colorspace Conversion, P- / Super-Buffer Blit, Bind as Texture, Fragment Processing
• Annotation: XvMC


XvMC

Extension to X-Server

Already supported on:
• GeForce 4 MX / GeForce FX (full)
• Other GeForces (no iDCT)

Driver code
• Not open source
• Other vendors working on implementation

Specification: Mark Vojkovich, XFree86 Project

Good performance!


Other Possibilities

Super-Buffer / "Über-Buffer"
• OpenGL extension
• Basically allows malloc() on video RAM
• Beta implementation available

Might be used to merge video and OpenGL pipes!
• More OS independence
• More hardware independence
• Easier to implement
• Only on newer GPUs

Some research still necessary!


Thank You!

Questions?
