Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Transcript of Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Page 1: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Afrigraph 2003 Course on
Advanced Interactive Ray Tracing
and
Interactive Global Illumination

Ingo Wald, Carsten Benthin, Philipp Slusallek

Saarland University

Page 2: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

First: What is Ray Tracing?

Ray-Generation

Ray-Traversal

Intersection

Shading

Framebuffer

Page 3: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Feb 3rd, 2003 Afrigraph 2003 3

Agenda

• Introduction & Motivation
  – Why Interactive Ray Tracing at all?
• Part I – Interactive Ray Tracing Architectures
  – Software Ray Tracing
  – Ray Tracing on Programmable GPUs
  – Dedicated Ray Tracing Hardware
• Part II – Advanced Ray Tracing Issues
  – Handling Dynamic Scenes
  – The OpenRT Interactive Ray Tracing API
• Part III – New Applications
  – Industrial Application: Interactive Visualization of Car Headlights
  – Interactive Global Illumination
• Summary and Conclusions

Page 4: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Why Interactive Ray Tracing?

Page 5: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

We have NVidia – so what do we need Ray Tracing for?

• Because it is high quality…
  – Fully programmable and arbitrary shading operations
  – All operations performed in floating point
  – Flexibility: can shoot arbitrary rays
    • Shadows, reflections, refractions, …
    • Even suitable for global illumination
  – Simple programming model
    • No need for multiple passes or OpenGL ‘tricks’
    • For indirect effects (like shadows): just shoot a ray!
  – Automatic ‘correctness’
    • No need for approximations (like reflection maps)

Ray tracing is a much more flexible and powerful rendering algorithm than ‘classical’ triangle rasterization

Page 6: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

We have NVidia – so what do we need Ray Tracing for?

• But not only that: it’s also efficient!
  – Logarithmic scene complexity
    • Useful for increasingly complex scenes (“1 Mtri, no problem!” …)
  – No multiple rendering passes
  – ‘Automatic’ visibility culling & occlusion culling
    • Hidden geometry not even touched…
    • Depth complexity not an issue
  – No overdraw, shading performed exactly once per ray
    • Very useful for increasingly costly shading
  – Small bandwidth requirements (if you do it right…)
    • Memory access coherence + culling + single shading + …

Page 7: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

We have NVidia – so what do we need Ray Tracing for?

To summarize:
• … it’s highly flexible
• … it’s high-quality
• … it’s efficient
• And: all of that combines automatically
  – Can do some of that sometimes in HW, but usually not all together

Page 8: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

“If it’s so good, then why isn’t it real?”

• 1.) Better asymptotic complexity, but huge constants
  – 1 ray ~ 1000 CPU cycles
  – Runs on hardware that it doesn’t really fit…
    • Uses only a tiny fraction of today’s CPUs, no parallelism, …
  – Need many rays/sec for full interactivity
    • ~ 1 Mpix/frame * 4-fold antialiasing * 25 frames/sec * 10 rays/pixel
    → One billion rays per second…
• 2.) Graphics users don’t have the choice
  – Rasterization has highly sophisticated HW implementations
    → HW technology for rasterization 10 years ahead of RT HW…
  – There is no interactive ray tracing chip (yet), no matter the cost…
    → All applications are designed for OpenGL
    → There is no market for interactive ray tracing (really?)
    → Still more money/time/effort spent on improving rasterization
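The ray budget in point 1 can be checked with a few lines of arithmetic. This is only a sketch using the slide's own rough figures; the 1.5 GHz clock rate is an assumption added for illustration and does not come from the slides.

```python
# Ray budget for "full interactivity"; all four factors are the slide's estimates.
pixels_per_frame = 1_000_000   # ~1 Mpixel per frame
antialiasing = 4               # 4-fold antialiasing
frames_per_second = 25
rays_per_pixel = 10            # primary plus secondary rays

rays_per_second = (pixels_per_frame * antialiasing
                   * frames_per_second * rays_per_pixel)

# Assumption (not from the slides): a hypothetical 1.5 GHz CPU.
cpu_hz = 1_500_000_000
cycles_per_ray = 1000          # "1 ray ~ 1000 CPU cycles"
rays_per_cpu = cpu_hz // cycles_per_ray

# How many such CPUs would be needed at perfect scaling.
cpus_needed = rays_per_second // rays_per_cpu

print(rays_per_second, rays_per_cpu, cpus_needed)
```

Under these assumptions a single CPU delivers about 1.5 million rays per second, roughly three orders of magnitude short of the one-billion target.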

Page 9: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Why is there no Ray Tracing Hardware?

Because graphics hardware evolved 20 years ago!
• And: rasterization was the better choice back then…
  – Small scenes → (asymptotic) complexity doesn’t matter for small N
  – Large triangles → coherence: incremental ops & interpolation, low bandwidth
  – Simple (integer) operations, highly pipelined
    → FPU requirements of ray tracing unthinkable 10 years ago…
  – No fragment ops except interpolation
  – Programmability not an issue
    → Very deep pipelines: no dependencies, no branches, no nothing, …
    → Can be built in HW very efficiently, very fast, very cheaply
• Note: all of this is changing today!
  – E.g. today, the GeForce 3 already has more FPU power than any CPU…

Page 10: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Today’s State of the Art in Realtime Ray Tracing

Software implementations are slowly becoming available
• Michael Muuss, Army Research Labs
  – Huge cluster of SGI machines…
• Parker et al., University of Utah
  – 32-128 CPU SGI Origin
• Saarland University
  – 4 dual PIIIs in 2000, up to 24 dual Athlon 1800+ today

Hardware architectures are already being designed
• SaarCOR (Schmittler et al., HWWS 2002)
• Ray Tracing on Programmable GPUs (Purcell, SIGGRAPH 2002)
• Hybrid Software/GPU system (Hart, HWWS 2002)

→ Several alternatives for future realtime ray tracing
  – Can’t yet decide which is best, only know: “It’ll come”

Page 11: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Today’s State of the Art in Realtime Ray Tracing

• Even today, IRT solves tasks that even high-end graphics hardware still cannot handle!
  – Highly complex models (Muuss, Utah, Saarland [RW2001])
  – High-quality isosurface and volume visualization (Utah)
  – Shadows, reflections, arbitrary shading… [Saarland, Utah]
  – High-quality reflection simulation of car headlights [PGV2002]
  – Interactive Global Illumination [RW2002]

Page 12: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Today’s State of the Art - Some Snapshots

Page 13: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Video

Page 14: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Part I

Different Approaches to Realtime Ray Tracing

Page 15: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Different Approaches to Realtime Ray Tracing

Basically three choices:
• Pure software implementations
  – Today: highly parallel
    • Shared memory (Utah), or PC clusters (Saarland)
  – Future: single PC?
    • Moore’s Law also holds for CPUs!
    • Perhaps with streaming co-processors (e.g. “SSE++”)
• Mixed SW/HW: RT on programmable GPUs
  – Purcell et al., Stanford
  – Converges to the ‘coprocessor’ approach
• Pure HW
  – Dedicated RT hardware (Schmittler et al., SaarCOR)

→ Summarize all three approaches

Page 16: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Alternative I

Software Ray Tracing (exemplified by the Saarland engine)

Page 17: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

The OpenRT Interactive Ray Tracing Engine

Features of OpenRT:
• Highly efficient implementation of RT kernels
  – On a single Athlon MP 1800+ CPU: ~500,000 to 1.5 million rays per second for average models (100 ktri to 1 Mtri)
  – Up to the 10 million rps (rays/sec) range (no shading, simple scenes)
• Sophisticated parallelization on a cluster of PCs
  – Dynamic load balancing
  – Using up to 24 dual Athlon MP 1800+ or 25 dual P4 Xeon 2.4 GHz
• Dynamically loadable, fully programmable shaders
  – Arbitrary C-code shading, arbitrary rays
  – RenderMan-like shading language
• Can handle dynamic scenes (later)
• OpenGL-like API (later)

Page 18: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Where does the speed come from?

Speed depends on several factors…
• Using the fastest available hardware
  – Fast CPUs, and many CPUs
• Good algorithms: avoid operations in the first place
  – Fast intersection and traversal (kd-trees)
  – Minimize intersections and traversal steps with high-quality BSPs
• Just as important: make sure you’re using your silicon correctly!
  – Highly efficient implementation
  – Machine-dependent code, if necessary (SSE)

Page 19: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Where does the speed come from?

Keep the computational units busy!
• Make sure the CPU doesn’t stall
  – Avoiding pipeline stalls has top priority
  → Look at memory, caches and bandwidth!
  – Example: a cache miss during triangle intersection costs about 4 times as much as the computations themselves!
  → Packing, aligning, cache-friendly data layout, prefetching, …
• But: no details here
  – Already covered that at Afrigraph 2001
  – It’s not one single method, it’s more of a principle

Page 20: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Distributed Ray Tracing

• One CPU still not fast enough
  – 1 Mray/sec is fast, but not enough
  – Need more CPUs
    → Clusters are cheap ($20k-$50k)
• Many approaches:
  – Static vs dynamic load balancing
  – Object-space vs image-space vs ray-based task partitioning, …
  – Pixel-interleaved (load balancing) vs tiles (coherence)
  – …
• Problem: interactivity constraint
  – Have to finish the whole frame in 1/10th of a second
  – Little time for sophisticated reordering/scheduling

Page 21: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Distributed Ray Tracing

Our approach (mostly Carsten Benthin)
• Image-based task partitioning
  → Break image up into ‘tiles’ (usually 16x16 or 32x32)
  – Since API: can dynamically change task partitioning scheme
• Strongly varying workload
  → Need dynamic load balancing: let clients ask for work…
• Have to care about network latencies
  – (10 ms network latency = 10,000 rays!)
  – Highly efficient networking/communication code
    → Double-buffering, prefetching, packing, streaming, asynchronous sending and rendering, interleaving of different tasks, multithreading, …
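The tile-based scheme with clients asking for work can be sketched in a few lines. This is an illustrative toy, not the Saarland implementation: threads stand in for cluster nodes, a shared queue stands in for the server, and all function names (`make_tiles`, `render_tile`, `client`, `render_frame`) are hypothetical.

```python
import queue
import threading

def make_tiles(width, height, tile):
    """Split the image into tile x tile blocks (edge tiles may be smaller)."""
    return [(x, y, min(tile, width - x), min(tile, height - y))
            for y in range(0, height, tile)
            for x in range(0, width, tile)]

def render_tile(t):
    """Stand-in for real rendering: just returns the tile's pixel count."""
    x, y, w, h = t
    return w * h

def client(work, results):
    """A client keeps asking for the next tile until none are left."""
    while True:
        try:
            t = work.get_nowait()
        except queue.Empty:
            return
        results.put((t, render_tile(t)))

def render_frame(width, height, tile, n_clients):
    work, results = queue.Queue(), queue.Queue()
    tiles = make_tiles(width, height, tile)
    for t in tiles:
        work.put(t)
    threads = [threading.Thread(target=client, args=(work, results))
               for _ in range(n_clients)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return [results.get() for _ in range(len(tiles))]

frame = render_frame(640, 480, 32, n_clients=8)
pixels = sum(n for _, n in frame)

# Cost of latency: rays a 1 Mray/sec client could have traced during 10 ms.
latency_ms = 10
rays_lost = latency_ms * 1_000_000 // 1000
```

Because fast clients simply come back for more tiles sooner, the load balances itself without any cost prediction. The latency figure shows why requests should be overlapped with rendering rather than issued one at a time.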

Page 22: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Distributed Ray Tracing: Results

• Can efficiently use many CPUs
  – 32x32 tiles at 640x480 = 300 tiles → enough for many CPUs

• Usually limiting factor: pixels/second (not rays/sec)
  – Bandwidth limited at server: 640x480 at 10-15 frames/sec
  – For < 10 fps: usually achieve 90-99% client utilization
  – Client bandwidth usually not an issue… (100 Mbit)
• Rendering complexity helps!
  – More costly tiles = better compute/BW ratio, fewer pixels/sec
    • Can use more CPUs without hitting the bandwidth limit
  – Doubling rays/pixel is easier than doubling the framerate
    • Framerate scales linearly only up to the max framerate
    • But always scales linearly in rays/pixel
• Better networking hardware would definitely help
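The server-side pixel limit is easy to estimate. The sketch below assumes uncompressed 24-bit RGB pixels and ignores protocol overhead; both are assumptions, since the slides do not state the pixel format. The function name `frame_bandwidth_mbit` is hypothetical.

```python
def frame_bandwidth_mbit(width, height, fps, bytes_per_pixel=3):
    """Megabits per second needed to ship finished frames to the server."""
    return width * height * bytes_per_pixel * 8 * fps / 1_000_000

low = frame_bandwidth_mbit(640, 480, 10)    # about 73.7 Mbit/s
high = frame_bandwidth_mbit(640, 480, 15)   # about 110.6 Mbit/s
print(low, high)
```

Under these assumptions, 10 to 15 frames per second at 640x480 already sits around the capacity of a 100 Mbit link, which is consistent with the framerate ceiling described above, while adding rays per pixel costs no extra bandwidth.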

Page 23: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Realtime Ray Tracing: Approach II

Ray Tracing on Programmable GPUs

Page 24: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Ray Tracing on Programmable GPUs

Graphics hardware today
• GPUs are extremely powerful
  – Already more transistors than a P4
  – Full IEEE floating point!
  – Many, many, many parallel FPUs
  – Moore’s Law: faster growth than for CPUs
• GPUs become more and more programmable
  – First: ‘Register Combiners’
  – Then: ‘Vertex Shaders’
    • Programmable per vertex
    • Linear interpolation between the vertices
  – Today: ‘Pixel Shaders’, ‘Fragment Programs’
    • Fully programmable for each fragment

Page 25: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Ray Tracing on Programmable GPUs

GPU programmability today:
• Full IEEE floating point
• SIMD computations
• Access to ‘memory’ (textures) in every instruction
• Multiple indirections (pointer chasing) now possible
  – “dependent texture reads”
• Still: several restrictions
  – Conditionals, loops, recursion, dependent texture writes…
• Typically programmed in ‘GPU assembler’
• Most recent: high-level ‘meta’ languages
  – E.g. ‘Cg’ (‘C’ for GPUs)

Page 26: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Streaming Computations on Programmable GPUs

Idea: use the GPU as a streaming co-processor
  – Don’t use it for rasterizing at all…
• Pixels form a ‘stream’ of elements
  – Apply a small program (‘kernel’) to the whole stream
• Render a screen-aligned quad with a fragment shader
  → Fragment program executed for each screen pixel
• Each pixel operates on different data
  – Read data from textures
    • Screen-aligned textures: 1 texel for each pixel
  – Output to framebuffer: 1 ‘pixel’ for each fragment program
  – Feedback loop: copy framebuffer to textures
  – Future: directly write into textures
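The streaming model above can be imitated on the CPU to make the data flow concrete. This is a toy sketch, not GPU code: nested lists stand in for textures and the framebuffer, and `run_kernel` and `increment` are hypothetical names.

```python
def run_kernel(kernel, width, height, textures):
    """Apply `kernel` once per pixel, as drawing a screen-aligned quad would.

    Each invocation may read any texel but produces only its own pixel.
    """
    return [[kernel(x, y, textures) for x in range(width)]
            for y in range(height)]

def increment(x, y, textures):
    """Toy kernel: read this pixel's texel and add one."""
    return textures["state"][y][x] + 1

width, height = 4, 3
textures = {"state": [[0] * width for _ in range(height)]}

for _ in range(5):
    framebuffer = run_kernel(increment, width, height, textures)
    textures["state"] = framebuffer   # feedback: copy framebuffer to texture
```

Every pass reads the previous pass's output through the texture, which is the only way state survives between kernel invocations in this model.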

Page 27: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Ray Tracing on Programmable GPUs

[Animated diagram, slides 27-36. Labels: Screen-aligned Quad, Fragment, Kernel (Fragment Shader), Data (Texels), Memory (Textures), Output, Frame Buffer. The final step adds: Feedback!]

Page 37: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Ray Tracing on Programmable GPUs

Mapping Ray Tracing to the GPU
• Use textures for storing ‘variables’
  – Ray: ‘origin’ and ‘direction’ 2D textures (3 floats each)
  – Hit: 2D texture (3 floats: u, v, id)
  – Vertices: 1D texture of vertex positions (3 floats each)
  – Triangles: 1D texture of vertex ids (1 float each)
  – Acceleration structure: e.g. 3D texture for a simple grid
• Multiple indirections no problem
  – E.g. use triangle[i] as texture coordinate into the vertex[] texture
  – Up to 4 indirections (grid → triangle list → triangle → vertex)
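The chain of indirections can be mimicked with plain Python containers standing in for textures. The layout below is an assumed one for illustration (the slides do not give the exact encoding), and `fetch` and `triangles_in_cell` are hypothetical helpers. IDs are stored as floats on purpose, since that is all a texture holds.

```python
# 1D "textures" as flat lists; every stored value is a float.
vertex_tex = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
              (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
triangle_tex = [0.0, 1.0, 2.0,    # triangle 0: three vertex ids
                1.0, 3.0, 2.0]    # triangle 1
tri_list_tex = [0.0, 1.0]         # triangle ids, grouped per grid cell
# 3D grid "texture": cell -> (start, count) into tri_list_tex
cell_tex = {(0, 0, 0): (0.0, 2.0)}

def fetch(tex, coord):
    """Dependent texture read: round the float coordinate to a texel index."""
    return tex[int(round(coord))]

def triangles_in_cell(cell):
    start, count = cell_tex[cell]                       # grid lookup
    result = []
    for k in range(int(count)):
        tri_id = fetch(tri_list_tex, start + k)         # -> triangle list
        ids = [fetch(triangle_tex, 3 * tri_id + j)      # -> triangle
               for j in range(3)]
        result.append([fetch(vertex_tex, i) for i in ids])  # -> vertex
    return result

tris = triangles_in_cell((0, 0, 0))
```

Each `fetch` whose coordinate came out of an earlier `fetch` corresponds to one dependent texture read.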

Page 38: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Ray Tracing on Programmable GPUs

Write ‘kernels’ for different ray tracing ops
• Ray generation
  – Get pixel position from texture coordinates
  – Somehow get camera settings (e.g. from quad color, or texture)
  – Compute corresponding ray
  – Write to ‘origin’, ‘direction’, ‘state’ textures
• Triangle intersection
  – Read triangle ID to be intersected from state
  – Get triangle vertices from textures
  – Intersect
  – Update state texture
• Similar for traversal, triangle list intersection, shading, …
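As a CPU-side illustration of what the two kernels compute, here is a minimal sketch. The slides do not say which intersection test is used; the Moller-Trumbore test is substituted here because it yields the (u, v) hit coordinates mentioned earlier. The pinhole camera at the origin looking down -z is likewise an assumption.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def generate_ray(px, py, width, height):
    """Ray generation: pixel position -> (origin, direction)."""
    u = (px + 0.5) / width * 2.0 - 1.0
    v = (py + 0.5) / height * 2.0 - 1.0
    return (0.0, 0.0, 0.0), (u, v, -1.0)

def intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test; returns (t, u, v) or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                  # ray parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return (t, u, v) if t > eps else None

tri = ((-1.0, -1.0, -2.0), (1.0, -1.0, -2.0), (0.0, 1.0, -2.0))
hit = intersect((0.0, 0.0, 0.0), (0.0, 0.0, -1.0), *tri)
miss = intersect((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), *tri)
```

On the GPU, the returned values would be written to the hit texture rather than returned, and the early exits would have to be expressed without real branches.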

Page 39: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Ray Tracing on Programmable GPUs

• Have kernels for ray generation, traversal, intersection, etc.
• Each ray is in exactly one ‘state’
  – E.g. in ‘intersection’ state
  – Make sure only rays in the ‘correct’ state are processed
    • E.g. apply intersection kernel only to rays in intersect state
    • Usual GL masking methods, e.g. stencil bits, early pixel kill etc.
    → Can generate overhead, but usually ok…
  – Fragment program can change state of ray
    • E.g. change from ‘traversal’ to ‘intersection’ in a non-empty voxel
• Combine different kernels by just calling them in turn
  – E.g. rendering an ‘intersection’ quad will do one intersection step (but only for rays in intersect state!)
  – Secondary rays relatively easy for ‘Shader’ kernel
    • Update origin & direction textures, go back to ‘traversal’ state…
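The per-ray state machine can be sketched as follows. The kernels here use toy rules on a 1D row of voxels, purely to show the control flow of masked kernels called in turn; the field names and rules are hypothetical and not taken from the slides.

```python
TRAVERSE, INTERSECT, SHADE, DONE = range(4)

def traverse_kernel(ray):
    # Toy rule: advance one voxel; a non-empty voxel switches to INTERSECT.
    ray["voxel"] += 1
    if ray["voxel"] >= ray["exit_voxel"]:
        ray["state"] = DONE                 # left the grid without a hit
    elif ray["voxel"] in ray["occupied"]:
        ray["state"] = INTERSECT

def intersect_kernel(ray):
    # Toy rule: a real hit exists only in the voxel named by "hit_voxel".
    hit = ray["voxel"] == ray["hit_voxel"]
    ray["state"] = SHADE if hit else TRAVERSE

def shade_kernel(ray):
    ray["color"] = 1.0
    ray["state"] = DONE

KERNELS = [(TRAVERSE, traverse_kernel),
           (INTERSECT, intersect_kernel),
           (SHADE, shade_kernel)]

def run(rays, max_passes=100):
    passes = 0
    while any(r["state"] != DONE for r in rays) and passes < max_passes:
        for state, kernel in KERNELS:       # one "quad" per kernel, in turn
            for r in rays:
                if r["state"] == state:     # masking: skip other states
                    kernel(r)
        passes += 1
    return passes

rays = [
    {"state": TRAVERSE, "voxel": -1, "exit_voxel": 5,
     "occupied": {2, 3}, "hit_voxel": 3, "color": 0.0},
    {"state": TRAVERSE, "voxel": -1, "exit_voxel": 3,
     "occupied": set(), "hit_voxel": None, "color": 0.0},
]
passes = run(rays)
```

The inner `if` plays the role of the stencil or early-kill mask: every kernel is launched over all rays, but only the rays in the matching state do any work, which is where the overhead mentioned above comes from.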

Page 40: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Ray Tracing on Programmable GPUs

Results:
• Easy to exploit parallelism in the GPU
  – Many more pixels than fragment pipelines
• Comparable performance to a single CPU
  – Even though it’s only a prototype implementation
  – Limited by fragment pipeline very soon…
• Main limitations
  – Fragment processing speed
  – Texture memory
    • Need many textures for each pixel
    • Also need to store whole scene in texture
  – Bandwidth
  – Number of different states must be small!

Page 41: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Ray Tracing on Programmable GPUs

Additional limitations of current GPUs
• Bandwidth problems due to missing loops
  – Often have to write data just to save it for next iteration
• Overhead due to missing ‘write’ capability
• Accuracy problems – no ints, all floats
  – E.g. rounding modes when reading IDs from a texture …
• Problems due to missing ‘dependent writes’
  – Many textures for input, but only one framebuffer for output
    • Need multiple passes when computing more than 3 values per pixel
  – Each fragment shader writes to exactly one predetermined position
  – Hard to do recursive operations with that limitation
    • Kd-tree construction ?

Page 42: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Ray Tracing on Programmable GPUs

Ray tracing on GPUs in the future ?
• Many limitations will (probably) change
  – Loops, branches, dependent writes, int textures, texture memory, early pixel kill …
• Performance will increase faster than for CPUs

Might soon be faster than, and similarly flexible as, ray tracing on a CPU !

Page 43: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Realtime Ray Tracing: Approach III

Dedicated Ray Tracing Hardware

Page 44: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Dedicated Ray Tracing Hardware

• Relatively low efficiency when using GPU for RT
  – Many units not needed at all (rasterization, z-buffer, clipping, lighting, …)
  – Lots of overhead
  – Programmable units can never be as efficient as dedicated HW

Dedicated ray tracing HW should be more efficient

• Building RT HW is feasible today
  – FPU power not a problem any more (see GeForce3 FPU performance)
  – Die size / number of transistors not a problem any more
  – Main problem: Off-chip bandwidth !
    • Already between chip and cache

Page 45: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Dedicated Ray Tracing Hardware

Bandwidth: Same problem as in SW
• Approach in SW: Bandwidth reduction by Coherent Ray Tracing (packet traversal)
• HW: Much larger packets (64x64 vs 2x2 !)
  Much bigger bandwidth saving
• Target realtime full-screen resolutions
  Larger packet sizes not a problem
  Lots of coherence
• Avoiding overhead simple in HW
  – Much simpler than with SSE
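To see why packet size matters for bandwidth, here is a small back-of-the-envelope sketch. It assumes perfect coherence, i.e. every ray in a packet visits the same kd-tree nodes, so one node fetch serves the whole packet. The count of 40 visited nodes is a made-up figure, and real packets diverge, so the ratio is an upper bound rather than a measured saving.

```python
def fetches_per_ray(nodes_visited, packet_width, packet_height):
    """Node fetches charged to each ray when one fetch serves the whole packet.

    Assumes perfect coherence: all rays of the packet visit the same nodes.
    """
    rays_in_packet = packet_width * packet_height
    return nodes_visited / rays_in_packet


# Hypothetical traversal depth; only the ratio between the two cases matters.
NODES_VISITED = 40

sw_cost = fetches_per_ray(NODES_VISITED, 2, 2)     # software SSE packets
hw_cost = fetches_per_ray(NODES_VISITED, 64, 64)   # hardware packets
saving = sw_cost / hw_cost                         # ideal-case reduction factor
```

Under these assumptions the factor equals the ratio of packet sizes, 4096 / 4, independent of the traversal depth chosen.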

Page 46: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

SaarCOR Architecture

Features
• Based on interactive software ray tracer
  – Exactly same data structures, …
• KD-trees as acceleration structure
• Packets of rays to reduce bandwidth
• Fixed OpenGL-like shading…
• … plus shadow and reflection rays

Goals:
• Simple low bandwidth memory interface
• Half the floating point requirements of GeForce3
• Achieves frame rates comparable to today’s gfx cards

Page 47: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

SaarCOR Architecture: System overview

Page 48: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

SaarCOR Architecture: Features

• Scalable
• Fully pipelined
• Multi threading for latency hiding
• Simple communication pattern (no routing)
• Highly asynchronous

Page 49: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

SaarCOR – Current Status

Simulation on register-transfer level
• Core @ 533 MHz, Memory 64 Bit @ 133 MHz (simple SD-RAM, no DDR!)
• Each pipeline uses 36 FP-units
• Standard SaarCOR:
  – 4 pipelines
  – 16 threads per pipe
  – 1 GB/s bandwidth to memory (!)
  – 272 KB for caches (!)
• Four pipes ~ ½ FP-resources of GeForce 3
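The headline figures on this slide follow from the stated configuration. A short worked calculation, assuming single-data-rate SDRAM transfers one 64-bit word per memory clock (which is what "no DDR" suggests); the variable names are illustrative only.

```python
# Memory interface as stated on the slide: 64 bit wide, 133 MHz, single data rate.
BUS_WIDTH_BITS = 64
MEMORY_CLOCK_HZ = 133_000_000

# One transfer per clock (assumption for plain SDRAM): bytes per second at peak.
peak_bandwidth_bytes = (BUS_WIDTH_BITS // 8) * MEMORY_CLOCK_HZ

# Standard SaarCOR configuration as stated on the slide.
PIPELINES = 4
FP_UNITS_PER_PIPELINE = 36
THREADS_PER_PIPELINE = 16

total_fp_units = PIPELINES * FP_UNITS_PER_PIPELINE
total_threads = PIPELINES * THREADS_PER_PIPELINE
```

The peak comes out at 1.064 × 10^9 bytes per second, which matches the "1 GB/s" figure, and the four pipelines together use 144 FP units.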

Page 50: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Issues

On-chip memory of standard SaarCOR
• Caches: 272 KB
• RF for rays: 288 KB
• RF for stack: 535 KB

Register level simulations only

Simple shading only

Page 51: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Benchmarks: Scenes

OpenGL-Like Shading:
• No shadow rays
• No reflection rays

Full screen resolution: 1024 x 768 pixels

Page 52: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Benchmarks: Scenes (2)

Page 53: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Benchmarks: Results

Today’s CPUs: 0.5 – 0.8 mrays/s factor of 100-200!

Page 54: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Benchmarks: Results (3)

Efficiency of standard SaarCOR

Performance scales with number of pipelines, threads, cache size and bandwidth.

16 threads to 32 threads: +10%

Page 55: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

What about shading?

• Right now: Shading only coarsely approximated
  – Fixed Phong shader w/ bilinear texturing
  – Programmable Shading currently evaluated
• Shading packets of rays exploits coherence
• BQD scene with bilinear textures
  – 14 MB for shading data per frame
  – 300 – 600 MB/s bandwidth
• Shading BW ~ Ray Tracing BW
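The two shading figures for the BQD scene can be related with simple arithmetic. This sketch assumes that shading bandwidth is just the per-frame shading data multiplied by the frame rate, with no caching across frames; the slide does not state the frame rates, so the values computed here are an inference, not reported results.

```python
SHADING_DATA_PER_FRAME_MB = 14.0      # from the slide
BANDWIDTH_RANGE_MB_S = (300.0, 600.0)  # from the slide


def implied_frame_rate(bandwidth_mb_s, data_per_frame_mb):
    """Frames per second if every frame streams its shading data exactly once."""
    return bandwidth_mb_s / data_per_frame_mb


fps_low = implied_frame_rate(BANDWIDTH_RANGE_MB_S[0], SHADING_DATA_PER_FRAME_MB)
fps_high = implied_frame_rate(BANDWIDTH_RANGE_MB_S[1], SHADING_DATA_PER_FRAME_MB)
```

Under that assumption the stated range corresponds to roughly 21 to 43 frames per second.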

Page 56: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Conclusions

SaarCOR architecture
• Scales well in the number of pipelines
• Highly efficient
  – Uses half the FP power of GeForce3
  – Requires very low bandwidth
• Provides full featured ray tracing
• Same frame rates as today’s graphics cards

Page 57: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Current Work

• Programmable shading
• API: OpenRT [Wald’02]
• Virtual Memory Management
• Incorporate Features and Algorithms from SW system
  – Large Models [Wald’01]
  – Dynamic scenes [Wald’02]
  – Global Illumination [Wald’02]
• Building a prototype …

Page 58: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Realtime Ray Tracing: Approaches I-III

Summary and Conclusions

Page 60: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Realtime Ray Tracing

Summary:
• Different upcoming (and competing !) architectures.
• All these have different advantages / disadvantages
  – PC clusters: most flexible, but not useful for consumer market
  – GPUs: better performance growth, cheap, but awkward to use
  – HW: best performance, best efficiency, but costly

Cannot yet predict which one will “win”…

But:

Question is not “will realtime ray tracing ever come ?”

The question rather is “how” and “when” will it come.

Page 61: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

End of Part I - Questions ?

Page 62: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Part II

Advanced Ray Tracing Issues

Page 63: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Advanced Ray Tracing Issues

• Conclusions from Part I: Realtime Ray Tracing will come
• Problem: All these architectures mostly focus only on the core ray tracing algorithms, i.e. traversal & intersection
• Ubiquitous Realtime Ray Tracing opens new problems
  – Dynamic Scenes ?
  – Suitable API(s) ?
  – Implications for future Applications / SceneGraph libraries ?

Page 64: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Interactive Ray Tracing

So far:
• Interactive RT possible even today, can already beat SGI/NVidia
  – Complex models
  – High-Quality Applications

Can do high-quality, interactive walkthroughs

• But: “Walkthrough” is not really interactive
  – Not if scene remains static…

Page 65: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Issue I : Dynamic Scenes

• Fact: Ray Tracing needs acceleration structure
  – Building it is very costly
  – Precomputation only works for static scenes
• But: ‘Real’ scenes usually aren’t static…
  “What is ‘interactive’ if I cannot interact with it ?”
• Problem: Little research on this topic…
  – Just wasn’t interesting before interactive ray tracing…
  – Previous work: Usually on special cases
    • Utah ‘Hack’: Keep dynamic objects out of accel structure…
    • [Reinhard RW2001]: Incremental updates of Uniform Grid
      – Costly, not hierarchical
    • [Moeller, EG2001]: Only rigid-body animation

Page 67: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Handling Dynamic Scenes

• Different kinds of dynamic behavior
  – Hierarchical, rigid-body motion vs unstructured motion
  – Constrained unstructured motion (e.g. maximum displacement)
  – All triangles animated vs few triangles animated
  – Amortized over many rays/frames or over few rays
  – …
• Inherently different problems need different solutions…
• One single algorithm will hardly do the job

Page 68: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Handling Dynamic Scenes

Alternative approach:
• Offer suite of different techniques
  – Hierarchical animation of whole objects
  – Fast rebuild of objects for unstructured motion (with sacrifices in traversal speed)
  – High-quality BSPs for often-used static objects (with relatively long rebuild time)
• Let the application decide which one is best for what !
  – If anybody knows what’s best, it’s the application programmer
  – Just like OpenGL: Applications build display lists, not the drivers !
  – Allow combination of techniques
    • E.g. ‘some’ unstructured motion but otherwise hierarchically animated

App needs good API to do that !

Page 69: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Handling Dynamic Scenes

Combining techniques in a hierarchical way
• Application groups geometry into ‘objects’
  – Similar to building display lists (API)
  – Each object has separate BSP (just like PowerPlant)
• ‘Hints’ can be given to control quality/speed tradeoff
  – E.g. whether the object will be static or unstructured
• Objects can be ‘instantiated’
  – Just like ‘calling’ a display list (API)
  – Hierarchical animation: Just re-instantiate with new transform…
• Objects are kept in additional hierarchy level
  – With separate, fast and high-quality BSP
  – During traversal, just transform the rays when they hit an object
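The two-level scheme above can be sketched in a few lines. In this illustrative Python version a unit sphere stands in for an object with its own BSP, a plain loop stands in for the top-level BSP, and instances carry a rigid transform (rotation about z plus translation). All names are hypothetical; the point is only that the ray, not the geometry, is transformed, so moving an object means swapping its transform.

```python
import math


def make_transform(angle_z, translation):
    """Rigid instance transform: rotate about z by angle_z, then translate."""
    c, s = math.cos(angle_z), math.sin(angle_z)
    return {"rot": ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0)),
            "trans": translation}


def rotate_inverse(rot, v):
    """Multiply by the transpose, which is the inverse of a rotation matrix."""
    return tuple(sum(rot[r][c] * v[r] for r in range(3)) for c in range(3))


def to_object_space(xf, origin, direction):
    """Transform a world-space ray into the local frame of one instance."""
    shifted = tuple(o - t for o, t in zip(origin, xf["trans"]))
    return rotate_inverse(xf["rot"], shifted), rotate_inverse(xf["rot"], direction)


def hit_unit_sphere(origin, direction):
    """Stand-in for traversing the object's own BSP: nearest hit distance or None."""
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t > 0.0 else None


def trace(instances, origin, direction):
    """Top level: test each instance (a real system would use a top-level BSP)."""
    best = None
    for name, xf in instances.items():
        local_origin, local_dir = to_object_space(xf, origin, direction)
        t = hit_unit_sphere(local_origin, local_dir)  # rigid transform keeps t valid
        if t is not None and (best is None or t < best[1]):
            best = (name, t)
    return best
```

Animating an instance is then just `instances["a"] = make_transform(...)` before the next frame; the object's own acceleration structure is never rebuilt.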

Page 70: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Handling Dynamic Scenes - Results

• Side Effect: Instantiation is for free
  – Terrain: 1000 instances of 20ktri-tree: 20 Mtri (and dynamic !)
  – Sunflowers: 36.000 x 24ktri-sunflowers: 1 GigaTri (dynamic !)
• TopLevel BSP reconstruction tolerable
  – Some milliseconds even for a few thousand objects
  – But: scalability bottleneck (redundant computation on each client)
• Hierarchical animation is cheap
  – Transformations are cheap (compared with the rest)
• But: Unstructured motion still costly
  – Especially for big objects (have to use low(er)-quality BSPs)
  – High bandwidth requirements for sending data over network !!!
  – Tolerable for moderately complex objects (16k-64ktri)
• In practice: Total overhead usually ~10-20%
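The instancing figures can be checked with a short calculation. It multiplies the instance counts by the per-object triangle counts exactly as printed on the slide and compares that with the geometry actually stored, assuming each object is kept once and each instance adds only a transform. The function name is illustrative.

```python
def visible_triangles(instance_count, triangles_per_object):
    """Triangles seen by the renderer when one object is instantiated many times."""
    return instance_count * triangles_per_object


terrain_total = visible_triangles(1_000, 20_000)
sunflower_total = visible_triangles(36_000, 24_000)

# Geometry actually stored: one copy of each object (assumption: instances
# carry only a transform and a reference to the shared object).
stored_triangles = 20_000 + 24_000
```

With the printed numbers the sunflower field comes to about 0.86 billion triangles, which the slide rounds to "1 GigaTri", while only tens of thousands of triangles are stored.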

Page 71: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Handling Dynamic Scenes - Conclusions

• Works for many different scenes (BART Benchmark suite)
  – ‘Robots’: Game-like scene, hierarchical animation of 161 Objects
  – ‘Kitchen’: Mostly static, with many secondary effects
  – ‘Museum’: Completely unstructured motion
    • Correct (inter-)reflections, shadows, etc. also on moving triangles !
• Also works for all applications we have built so far
  – OpenRT based VRML97 viewer with VRML animations
  – Inventor-’port’ under way…
  – Dynamic scenes in Interactive Global Illumination application

Page 72: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Handling Dynamic Scenes - Results

Page 73: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Handling Dynamic Scenes - Results

Video

Page 74: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Handling Dynamic Scenes - Remaining Problems

• Lots of potential for future research !
  – Faster kd-tree generation ?
  – Kd-tree generation in HW ?
  – On-demand generation of kd-trees ?
  – More efficient solutions for special problems
    • Skinning, morphing, progressive meshes, …
  – …

Page 76: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Issue II – API Issues

So far:
• Fast, cheap, efficient, …
• Flexible, powerful shading …
• Can do big models and dynamic scenes, …

So why is nobody using it ?

Because without a proper API, you can’t !

Page 77: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Issue II – API Issues

• Why do we need an API for Interactive Ray Tracing ?
  – Side Effect: An API helps to ‘divide-n-conquer’ problems (e.g. shaders, globillum, raytracing kernels, …) …
    • E.g., can work separately on frontend and backend…
    • Can abstract from dynamic scene issues in globillum shader, and so on
  – It helps to create a ‘critical mass’ of users
    • Rasterization only really took off after OpenGL
    • Enables code portability
  – Without an API, nobody will (or can) use it - except ‘insiders’
    • Not everybody has his own realtime raytracer
    • Not everybody wants to - or should - know all implementation details

For widespread Realtime Ray Tracing, we do need an API

Page 78: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Issue II – API Issues

• Problem: There are no suitable APIs
• API has to support both “interactive” and “ray tracing”
  – OpenGL: interactive, but not suitable for ray tracing
  – RenderMan/Rayshade/POV-Ray: ray tracing capable, but inherently offline …

Need to find new API(s)…

Page 79: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Issue II – API Issues

Goals for an Interactive Ray Tracing API:
• As easy to learn and use as (standard) OpenGL
  – Leverage existing programmers’ experience with OpenGL
• As powerful in Shading as RenderMan

Our Approach (OpenRT): Combine the best of both
• Application API much like OpenGL/GLUT
  – With necessary modifications for Ray Tracing (Shaders, Objects)
• Shader API like RenderMan

Page 80: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

The OpenRT Interactive Ray Tracing API

• Application API very OpenGL-like
  – Geometry: rtVertex3f, rtNormal3f, …
  – Primitives: rtBegin/End(RT_TRIANGLES, RT_QUAD, …)
  – Transformation: rtPushMatrix(), rtMatrixMode(…), …
  – Geometry Objects
    • Just like Display Lists (except: no side effects)
    • rtNewObjects(), rtBeginObject(), rtEndObject(), rtInstantiate(),…
  – Shader Objects
    • Surface, Light, and Pixel Shaders, exchangeable ‘Renderer Object’
• Even support GLUT-like functionality …
  Porting GL/GLUT-applications relatively easy (except multi-pass, of course, …)

Page 81: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

The OpenRT Interactive Ray Tracing API

• Shader Objects
  – Similar to Stanford Programmable Shading API
  – Dynamically loaded from DLLs/.so’s:
    • rtShaderFile(), rtCreateShader(), rtBindShader()
  – Light shaders: rtCreateShader(…), rtUseLight(…)
  – Application-to-Shader communication via Shader Parameters:
    • rtDeclareParam(), rtParameterHandle(…), rtParameter3f(…), …
    • Parameters can be per vertex, per triangle, per shader, …
• Retained-Mode / Frame Semantics:
  – Rendering uses Shader Parameters active at ‘end of frame’, NOT at the time that shader/triangle was created…
  – Actual rendering triggered at ‘rtSwapBuffers’
  – Rendering always done asynchronously
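The retained-mode frame semantics differ from immediate-mode OpenGL habits, so a toy model may help. The class below is a hypothetical Python stand-in and does not reproduce the OpenRT API or its signatures: triangles remember only which parameter they read, and values are resolved when the frame is finished, mirroring the rule that the parameters active at the end of the frame are the ones used.

```python
class RetainedScene:
    """Toy retained-mode scene (illustrative only, not the OpenRT API)."""

    def __init__(self):
        self.params = {}     # shader parameter name -> current value
        self.triangles = []  # (triangle id, name of the parameter it reads)

    def set_param(self, name, value):
        """Change a shader parameter; takes effect for the whole next frame."""
        self.params[name] = value

    def add_triangle(self, tri_id, param_name):
        # Only the parameter *name* is stored, not its value at creation time.
        self.triangles.append((tri_id, param_name))

    def swap_buffers(self):
        """End of frame: resolve every triangle against the current values."""
        return {tri_id: self.params[name] for tri_id, name in self.triangles}


scene = RetainedScene()
scene.set_param("color", "red")
scene.add_triangle(0, "color")
scene.set_param("color", "blue")  # changed after the triangle was created
frame = scene.swap_buffers()      # the triangle is rendered with "blue"
```

In an immediate-mode mental model the triangle would have been drawn red; here the later value wins because nothing is rendered until the frame ends.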

Page 82: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

The OpenRT Interactive Ray Tracing API

• Shader API – Or how to write a shader
  – Declare and Export Shader Parameters
    • Store as member variables
  – Write callback-functions
    • ‘Shade()’, ‘Illuminate()’, …
  – Access Scene Data with RenderMan like API
    • Geometry: rtsShadingNormal(), …
    • Lights: rtsIlluminate(), rtsOccluded(), rtsLightTransparency(), …
  – Shoot Arbitrary Secondary Rays
    • rtsTrace(…)

Porting RenderMan shaders relatively easy, too…
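The shader structure described above (parameters as member variables, a shade callback, secondary rays shot from inside the callback) can be illustrated with a small Python sketch. It is not OpenRT shader code: the class, the `hit` dictionary and the `trace` callable are hypothetical, with `trace` playing the role the slide assigns to `rtsTrace(…)`.

```python
def reflect(direction, normal):
    """Mirror direction about a unit normal: r = d - 2 (d . n) n."""
    d_dot_n = sum(d * n for d, n in zip(direction, normal))
    return tuple(d - 2 * d_dot_n * n for d, n in zip(direction, normal))


class MirrorShader:
    """Toy surface shader returning a grey value (illustrative only)."""

    def __init__(self, reflectivity=0.5, base=0.2):
        # 'Shader parameters' kept as member variables.
        self.reflectivity = reflectivity
        self.base = base

    def shade(self, hit, trace, depth=0, max_depth=2):
        """Callback invoked per hit; `trace` shoots a secondary ray."""
        if depth >= max_depth:
            return self.base  # stop recursing, return local color only
        bounce_dir = reflect(hit["direction"], hit["normal"])
        bounced = trace(hit["position"], bounce_dir, depth + 1)
        return self.base + self.reflectivity * bounced
```

A renderer would pass its own ray-shooting function as `trace`; for a quick check, `MirrorShader().shade(hit, lambda origin, direction, depth: 1.0)` treats every reflected ray as hitting a white environment.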

Page 83: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

The OpenRT Interactive Ray Tracing API

• OpenRT: Summary
  – Fast and Interactive Rendering
  – Dynamic Scenes
  – Very Powerful Shading
  – API for using it …

OpenRT is a complete 3D Rendering Engine …

Kernel behind OpenRT: Saarland RTRT
Might be changed to e.g. SaarCOR as soon as available…

Page 84: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

OpenRT Example 1: VRML97 @OpenRT

• Example 1: VRML97 Viewer ported from OpenGL
  – Porting relatively easy, almost all functionality was there
  – Only modification: Have to gather small objects into fewer, bigger objects for performance reasons…

• Results
  – Can render all of VRML97
    • Almost no matter how big…
  – Can put any kind of shader on any triangle (e.g. GlobIllum…)
  – Can do VRML animations, move objects, edit shaders & lights

Car Headlight: 800,000 triangles

Soda Hall Floor: 400,000 triangles
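The one modification mentioned for the VRML97 port, gathering small objects into fewer, bigger ones, is not detailed on the slide. One plausible way to do it, shown here purely as an assumption, is to concatenate the vertex lists of several indexed triangle meshes and shift each mesh's indices by the number of vertices already collected. The function name `merge_meshes` and the `(vertices, triangles)` mesh layout are hypothetical.

```python
def merge_meshes(meshes):
    """Merge indexed triangle meshes into one (vertices, triangles) pair.

    Each input mesh is (vertices, triangles), where triangles hold indices
    into that mesh's own vertex list.
    """
    vertices, triangles = [], []
    for verts, tris in meshes:
        base = len(vertices)  # index offset for this mesh's vertices
        vertices.extend(verts)
        triangles.extend((a + base, b + base, c + base) for a, b, c in tris)
    return vertices, triangles


tri_a = ([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
tri_b = ([(0, 0, 1), (1, 0, 1), (0, 1, 1)], [(0, 1, 2)])
merged_vertices, merged_triangles = merge_meshes([tri_a, tri_b])
```

After merging, the second triangle refers to vertices 3 to 5 of the combined list, so the renderer sees one object instead of two.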

Page 85: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

OpenRT Example 2: The BART Benchmark

• Example 2: The BART Benchmark scenes
  – To our knowledge, the only system so far to render those at all…
  – All different kinds of dynamic behavior, including reflections, refractions, shadows, …
  – With ‘GL’ Shader: > 10 frames per second
  – With ‘Raytracing’ Shader: 2-5 frames per second

Page 86: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

OpenRT Example 3: Complex Outdoor Scene

• Example 3: Massive Instantiation for Outdoor Scenes
  – Pixel-accurate shadows!

Page 87: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

OpenRT Example 3: Complex Outdoor Scene

Page 88: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

OpenRT Example 4: Massive Model Visualization

• Example 4: The PowerPlant
  – 12.5-37.5 million triangles
  – Currently: With replication, without demand-loading/reordering
  – Just recently: Can now also move the furnace ;-)

Page 89: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

OpenRT Example 5: Complex Shading Stress Test

Page 90: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

OpenRT Example 5: Complex Shading Stress Test

• Example 5: Shading Stress Test
  – Volume Shader (CT Head)
    • Applied to a ‘box’ of geometry
  – Lightfield Shader – on simple quad
  – Procedural Wood and Marble
  – Procedural Bump-Mapping on mirror
    Procedurally bump-mapped reflections

• Result: Everything combines perfectly:
  – Transparent Shadow from Volume on Procedural Wood Shader
  – Lightfield reflected in procedurally bump-mapped mirror…
  – … attenuated by semi-transparent volume
  – Multiple interreflections
  – Of course, everything is interactive and fully dynamic

Page 91: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

OpenRT Example 5: Complex Shading Stress Test

Page 92: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

OpenRT Example 6: Interactive Global Illumination

Implementation: Not now…

Page 93: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

OpenRT Example 6: Interactive Global Illumination

• Fully implemented in OpenRT
• GlobIllum Application is a ‘Shader’ like any other
  – Automatically inherits capability for handling dynamic scenes, distribution, …
• Same frontend as e.g. BART/Office
  – Automatically inherits parser, user interface, etc…
  – Can be used from different applications (e.g. VRML viewer)
• Algorithms & Implementation: Later (Part III)

Page 94: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Questions ?

For more info, also visit

http://www.OpenRT.de

Page 95: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Part III

New Applications enabled by Realtime Ray Tracing

Page 96: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

For more information on OpenRT, see

http://www.OpenRT.de

Page 97: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

The Saarland Interactive Ray Tracing Project

• Started Jan 1st, 2000
• (Original) Goal:
  – Evaluate practicability of RT as an Interactive Rendering Engine
  – Do a fair comparison and analysis of “RT vs GL”
    • “What are the advantages and disadvantages ?”
    • Compare on common ground: OpenGL-like + shadows + reflections
      – No global illumination, no shading, no advanced features, …
  – And: Find out why it is so slow…

Therefore, needed to build a fast ray tracer

Page 98: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

The Saarland Interactive Ray Tracing Project

• Goals have constantly changed since then
  – It worked, so continue working on it …
  – One CPU not fast enough, so distribute …
  – Great for many triangles, so work on really large models …
  – People demand high quality, build full-featured ray tracer …
  – If it’s good in Software, why not build it in hardware …
  – Static scenes too limiting, make it dynamic …
  – Others want to use it, so build an API …
  – And, if we have it anyway, why not do global illumination …
  – …

Page 99: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Ray Tracing on Programmable GPUs

• Application program relatively easy:
  – Just render many screen-aligned quads with different fragment shaders

• Need some way of ‘load balancing’
  – Want to not execute ‘shade’ kernel if no ray is in shade state

• Important: Approach is not SIMD
  – 1 Quad (= 1 fragment program) for whole screen, but
  – Different rays can be in different states
    Different pixels in fact behave differently
    • No problem to already shade pixel 2 while still intersecting pixel 1…
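The multi-pass scheme above can be mimicked on the CPU as a per-ray state machine. The following is a minimal sketch under stated assumptions, not the actual GPU implementation: the state names, the `steps` counter that stands in for traversal work, and the `any(...)` check that stands in for whatever mechanism the GPU version uses to skip an unneeded kernel are all invented for the illustration. Each call to `run_pass` plays the role of one screen-sized quad drawn with one fragment program, which only acts on pixels whose ray is in the matching state.

```python
TRAVERSE, INTERSECT, SHADE, DONE = range(4)


def traverse(ray):
    ray["steps"] -= 1  # one traversal step per pass
    if ray["steps"] <= 0:
        ray["state"] = INTERSECT


def intersect(ray):
    ray["state"] = SHADE


def shade(ray):
    ray["color"] = 1.0
    ray["state"] = DONE


KERNELS = {TRAVERSE: traverse, INTERSECT: intersect, SHADE: shade}


def run_pass(rays, state):
    """One 'full-screen quad': the kernel touches only rays in its state."""
    for ray in rays:
        if ray["state"] == state:
            KERNELS[state](ray)


def render(rays):
    launches = []  # record of which kernels were actually executed
    while any(r["state"] != DONE for r in rays):
        for state in (TRAVERSE, INTERSECT, SHADE):
            # Load balancing: skip a kernel no ray is waiting for.
            if any(r["state"] == state for r in rays):
                run_pass(rays, state)
                launches.append(state)
    return launches


rays = [
    {"state": TRAVERSE, "steps": 1, "color": None},  # pixel 1: short traversal
    {"state": TRAVERSE, "steps": 3, "color": None},  # pixel 2: longer traversal
]
launches = render(rays)
```

With these two rays the first pixel is already shaded while the second is still traversing, and only seven kernel launches are recorded instead of the nine that three unconditional rounds would need.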

Page 100: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination
