GPU-Assisted Path Tracing


Matthias Boindl

Christian Machacek

Institute of Computer Graphics and Algorithms

Vienna University of Technology


Motivation: Why Path Tracing?

Physically based

Nature provides the reference image

Parallelizable

Sublinear in #objects

Conceptually simple

Can lead to a clean implementation

But: fast implementation on GPUs not trivial

Outline

Path tracing intro

Main steps of the algorithm

Mapping the algorithm to the GPU

How to organize code into kernels

When to launch kernels

How to pass data between kernels

Acceleration structures

Focus on bounding volume hierarchies

Christian Machacek

Path Tracing Intro

Like ray tracing, except it…

…supports arbitrary BRDFs

…is stochastic: at each bounce, the new direction is decided randomly

Convergence video

From Pharr, Humphreys: PBRT, 2nd ed. (2010)


Path Tracing Pseudocode

while image not converged
    r = new ray from eye through next pixel
    do
        i = closest intersection of r with scene
        if no i: break
        if i is on a light source: c = c + throughput * emission
        randomly pick new direction and create reflected ray r
        evaluate BRDF at i
        update throughput
    while path throughput high enough
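The inner loop of the pseudocode can be sketched in plain Python. This is only a minimal, greyscale, single-path illustration and not the presenters' code: `trace_path`, the `Hit` record, the `intersect` and `sample_bsdf` callbacks, and the `max_bounces` safety cap are all hypothetical names introduced here. The toy scene is deliberately deterministic (no random direction is chosen) so that the result can be checked by hand.

```python
from collections import namedtuple

# Hypothetical hit record: emitted radiance plus the data the BRDF needs.
Hit = namedtuple("Hit", ["emission", "albedo"])

def trace_path(ray, intersect, sample_bsdf, min_throughput=0.01, max_bounces=64):
    """Follow one path and return the radiance it gathers (greyscale)."""
    c = 0.0
    throughput = 1.0
    for _ in range(max_bounces):             # safety cap, not in the pseudocode
        hit = intersect(ray)                 # closest intersection of r with scene
        if hit is None:                      # "if no i: break"
            break
        c += throughput * hit.emission       # emission is 0 unless the hit is on a light
        ray, weight = sample_bsdf(hit, ray)  # new ray and the BRDF weight for it
        throughput *= weight                 # "update throughput"
        if throughput < min_throughput:      # "while path throughput high enough"
            break
    return c

# Toy scene (assumption): every ray hits a surface that emits 1 and reflects half.
def toy_intersect(ray):
    return Hit(emission=1.0, albedo=0.5)

def toy_sample_bsdf(hit, ray):
    return ray, hit.albedo                   # direction unchanged, weight = albedo

radiance = trace_path(None, toy_intersect, toy_sample_bsdf)
```

In the toy scene the path gathers 1 + 1/2 + 1/4 + ... until the throughput drops below the threshold. A real renderer would replace the two callbacks with a scene intersection routine and random BRDF sampling, and would average many such paths per pixel.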



Megakernel Execution Divergence

Execution time (from Bikker, 2013): ray cast 56%, materials 25%, logic 15%, new path 4%

Solution: Wavefront Path Tracing

Separate, specialized kernels

Keep a pool of ~1 million paths alive

Work for the next stage goes into kernel-specific, compact queues (= 4 MB index arrays)
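The queue idea can be illustrated on the CPU with a few lines of Python. This is a hedged sketch, not the actual wavefront implementation: the stage names, `logic_stage`, and the tiny five-path pool are made up for illustration, and the 4-byte index size is an assumption (it is what makes a one-million-entry queue come out at 4 MB).

```python
POOL_SIZE = 1 << 20                        # ~1 million paths kept alive
BYTES_PER_INDEX = 4                        # assumption: 32-bit path indices
queue_bytes = POOL_SIZE * BYTES_PER_INDEX  # size of one full index queue

def logic_stage(next_stage, queues):
    """Sort path indices into the queue of the stage each path needs next."""
    for q in queues.values():
        q.clear()
    for path_index, stage in enumerate(next_stage):
        queues[stage].append(path_index)   # compact: no holes inside a queue

# Tiny pool for illustration; each entry names the stage that path needs next.
next_stage = ["diffuse", "new_path", "glossy", "diffuse", "new_path"]
queues = {"new_path": [], "diffuse": [], "glossy": []}
logic_stage(next_stage, queues)
```

Each specialized kernel is then launched over its own queue, so every thread in that launch runs the same code on a densely packed list of paths.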

https://mediatech.aalto.fi/~samuli/

Results

Performance

Execution times (ms / 1M path segments)


Limitations and Possible Improvements

Higher memory requirements (+200 MB)

Kernel launch overhead

Dynamic parallelism on GK110

Use an outer scheduling kernel

No CPU round trip

Launch independent stages side-by-side

CUDA streams

So kernels with little work don’t hog the GPU


Acceleration Structures

Find nearest intersection in O(log N)

Space partitioning vs. object partitioning

Hybrid methods exist

Matthias Boindl

Performance

For interactive rendering, compromise between

Traversal performance (build quality)

Construction/Update time

Update or rebuild from scratch

Adapt to GPU environment

Memory architecture

Parallel execution


State of the Art

Tero Karras and Timo Aila. 2013. Fast parallel construction of high-quality bounding volume hierarchies. In Proceedings of the 5th High-Performance Graphics Conference (HPG '13). ACM, New York, NY, USA, 89-99.


Close the Performance Gap


Basic Idea

Fast construction of simple BVH

Generate leaf for each triangle

Reduce SAH cost by modifying tree
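To make "SAH cost" concrete, here is a small Python sketch that evaluates an unnormalized surface area heuristic for a binary BVH with one triangle per leaf. The tuple-based node encoding, the function names, and the cost constants `c_inner` and `c_tri` are assumptions chosen for illustration; they are not taken from the slides.

```python
def surface_area(box):
    """Surface area of an axis-aligned box given as (min corner, max corner)."""
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def union(a, b):
    """Smallest box enclosing boxes a and b."""
    lo = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return (lo, hi)

def sah_cost(node, c_inner=1.2, c_tri=1.0):
    """Return (bounding box, unnormalized SAH cost) of a subtree.

    A node is ("leaf", box) or ("node", left, right)."""
    if node[0] == "leaf":
        box = node[1]
        return box, c_tri * surface_area(box)
    _, left, right = node
    lbox, lcost = sah_cost(left, c_inner, c_tri)
    rbox, rcost = sah_cost(right, c_inner, c_tri)
    box = union(lbox, rbox)
    return box, c_inner * surface_area(box) + lcost + rcost

# Three unit cubes: A and B are neighbours, C is far away along x.
A = ("leaf", ((0, 0, 0), (1, 1, 1)))
B = ("leaf", ((1, 0, 0), (2, 1, 1)))
C = ("leaf", ((9, 0, 0), (10, 1, 1)))
_, good = sah_cost(("node", ("node", A, B), C))  # neighbours grouped together
_, bad = sah_cost(("node", ("node", A, C), B))   # distant boxes grouped together
```

Both trees contain the same leaves and the same root box, but grouping the two neighbouring cubes gives a much smaller internal node and therefore a lower cost. This is the quantity that modifying the tree is meant to reduce.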


Treelets

Allow local tree modification


A, B, C, F are leaves; D, E, G are internal nodes

Treelet Construction

Find root: parallel bottom-up traversal

Start with leaves

Use atomic counter at internal nodes (where two paths meet)

Ensures all children have been processed

Build treelet

Add both children

Pick children with highest surface area

Fixed size: 7 leaf nodes
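The "build treelet" step can be sketched sequentially in Python. This is an illustrative reading of the bullets above, not the published GPU code: `form_treelet`, the dictionary-based tree, and the node names in the example are hypothetical.

```python
def form_treelet(root, area, children, max_leaves=7):
    """Grow a treelet below `root`.

    `children[n]` is (left, right) for internal BVH nodes and is missing for
    BVH leaves; `area[n]` is the surface area of node n's bounding box.
    Returns (internal nodes of the treelet, leaves of the treelet)."""
    internal = [root]
    leaves = list(children[root])            # add both children of the root
    while len(leaves) < max_leaves:
        expandable = [n for n in leaves if n in children]
        if not expandable:                   # subtree too small to fill the treelet
            break
        biggest = max(expandable, key=lambda n: area[n])
        leaves.remove(biggest)               # largest treelet leaf becomes internal ...
        internal.append(biggest)
        leaves.extend(children[biggest])     # ... and its two children become leaves
    return internal, leaves

# Hypothetical tree: R -> (X, Y), X -> (a, b), Y -> (c, d).
children = {"R": ("X", "Y"), "X": ("a", "b"), "Y": ("c", "d")}
area = {"R": 20, "X": 10, "Y": 5, "a": 4, "b": 3, "c": 2, "d": 1}
small = form_treelet("R", area, children, max_leaves=3)
whole = form_treelet("R", area, children)
```

A treelet leaf may itself be the root of a large subtree in the full BVH; only the nodes inside the treelet are rearranged afterwards.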


Rearrange Treelet

Minimize the treelet's SAH cost (the root's bounding box itself does not change)

Naive implementation: test each permutation

Better: dynamic programming

Caching of best intermediate results

Start with leaves, then pairs, then triplets, …

Suboptimal subtree construction avoided

Parallelizable as well
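The dynamic programming idea can be sketched in Python by indexing subsets of treelet leaves with bitmasks. This is a sequential illustration under stated assumptions, not the parallel GPU version: the function names, the `c_inner` constant, and the way leaf costs are passed in are all hypothetical.

```python
def surface_area(box):
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def union(a, b):
    lo = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return (lo, hi)

def optimize_treelet(boxes, leaf_cost, c_inner=1.2):
    """Return (best cost, best topology) for the given treelet leaves."""
    n = len(boxes)
    full = (1 << n) - 1
    # Surface area of the merged box of every subset of leaves.
    area = [0.0] * (full + 1)
    for s in range(1, full + 1):
        members = [boxes[i] for i in range(n) if (s >> i) & 1]
        box = members[0]
        for other in members[1:]:
            box = union(box, other)
        area[s] = surface_area(box)

    cost = [0.0] * (full + 1)   # cached best cost per subset
    split = [0] * (full + 1)    # best left part per subset
    for i in range(n):
        cost[1 << i] = leaf_cost[i]
    # Leaves first, then pairs, then triplets, ...
    by_size = sorted(range(1, full + 1), key=lambda m: bin(m).count("1"))
    for s in by_size:
        if (s & (s - 1)) == 0:
            continue            # single leaf, already filled in
        best, best_p = float("inf"), 0
        p = (s - 1) & s         # enumerate proper non-empty subsets of s
        while p:
            candidate = cost[p] + cost[s ^ p]
            if candidate < best:
                best, best_p = candidate, p
            p = (p - 1) & s
        cost[s] = c_inner * area[s] + best
        split[s] = best_p

    def build(s):
        if (s & (s - 1)) == 0:
            return s.bit_length() - 1        # leaf index
        return (build(split[s]), build(s ^ split[s]))

    return cost[full], build(full)

# Three unit cubes: 0 and 1 are neighbours, 2 is far away along x.
boxes = [((0, 0, 0), (1, 1, 1)), ((1, 0, 0), (2, 1, 1)), ((9, 0, 0), (10, 1, 1))]
best_cost, topology = optimize_treelet(boxes, leaf_cost=[6.0, 6.0, 6.0])
```

Because the best cost of every smaller subset is cached, each larger subset only has to try its two-way partitions instead of re-examining whole subtrees. In the example the two neighbouring cubes end up under a common parent.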


Results

Gap closed


Results

Speed/Quality tradeoff


Conclusion

Use specialized kernels

Lower execution divergence

(Better use of instruction cache)

(Fewer registers used simultaneously)

Construct acceleration structures quickly

But not too quickly


Thanks for your attention!


Logic Kernel

Does not need a queue, operates on all paths

If shadow ray was unblocked, add light contribution

Find material or light source the ray hits

Place path into proper material queue

Russian roulette

If path terminated, accumulate to image

Place path into new path queue

Sample light sources (aka next event estimation)
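Russian roulette, as listed above, terminates paths at random while keeping the estimate unbiased. The following Python sketch shows one common formulation; the function name, the choice of survival probability, and the `max_survival` clamp are assumptions, since the slides do not specify them.

```python
def russian_roulette(throughput, u, max_survival=0.95):
    """Return the reweighted throughput, or None if the path is terminated.

    `u` is a uniform random number in [0, 1)."""
    p_survive = min(max_survival, throughput)
    if u >= p_survive:
        return None                  # path terminated
    return throughput / p_survive    # survivors are boosted to compensate

survived = russian_roulette(0.5, u=0.25)
killed = russian_roulette(0.5, u=0.75)
```

With throughput 0.5 the path survives half the time and its throughput is doubled when it does, so the expected throughput is still 0.5.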


New Path Kernel

Generate a new image-space sample

Generate camera ray

Place it into extension ray cast queue

Initialize path state

Throughput

Pixel position

etc.
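The steps of the new path kernel can be sketched in Python for a simple pinhole camera. The camera model (at the origin, looking down the negative z axis), the function names, and the fields of the path state dictionary are assumptions made for this illustration.

```python
import math

def camera_ray(px, py, width, height, fov_y_deg, jitter=(0.5, 0.5)):
    """Return (origin, unit direction) of the ray through pixel (px, py)."""
    aspect = width / height
    half_h = math.tan(math.radians(fov_y_deg) / 2.0)
    # Image-space sample mapped to [-1, 1], with y pointing up.
    sx = 2.0 * (px + jitter[0]) / width - 1.0
    sy = 1.0 - 2.0 * (py + jitter[1]) / height
    d = (sx * aspect * half_h, sy * half_h, -1.0)
    length = math.sqrt(sum(c * c for c in d))
    return (0.0, 0.0, 0.0), tuple(c / length for c in d)

def new_path(px, py, width, height, fov_y_deg):
    """Initialize the state of a fresh path for one pixel."""
    origin, direction = camera_ray(px, py, width, height, fov_y_deg)
    return {"pixel": (px, py), "throughput": 1.0,
            "origin": origin, "direction": direction}

state = new_path(1, 1, width=3, height=3, fov_y_deg=60.0)
```

In a renderer the `jitter` offset would be drawn at random per sample, and the new path's index would then be appended to the extension ray cast queue.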


Material Kernels

Generate incoming direction

Evaluate light contribution based on light sample generated in the logic kernel

We haven’t cast the shadow ray yet!

For MIS: p(light sample) from the BSDF

Discard BSDF stack

Queue

Extension ray

(Shadow ray)
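The MIS bullet above refers to weighting the light sample by how likely BSDF sampling would have been to pick the same direction. A minimal Python sketch using the balance heuristic follows; the slides do not say which heuristic is used, so that choice, the function names, and the scalar (greyscale) quantities are assumptions.

```python
def balance_heuristic(pdf_used, pdf_other):
    """MIS weight for a sample drawn with `pdf_used`."""
    total = pdf_used + pdf_other
    return pdf_used / total if total > 0.0 else 0.0

def light_sample_contribution(radiance, bsdf_value, cos_theta, pdf_light, pdf_bsdf):
    """Weighted contribution of one light sample, before the shadow ray is cast.

    `pdf_bsdf` is the density with which BSDF sampling would have produced
    the same direction, i.e. p(light sample) from the BSDF."""
    if pdf_light <= 0.0:
        return 0.0
    weight = balance_heuristic(pdf_light, pdf_bsdf)
    return weight * radiance * bsdf_value * cos_theta / pdf_light

contribution = light_sample_contribution(2.0, 0.5, 1.0, pdf_light=0.5, pdf_bsdf=0.5)
```

The value is only a candidate at this point: it is stored with the path and added to the image later, once the shadow ray has been found to be unblocked.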


Ray Cast Kernels

Extension rays

Find first intersection against scene geometry

Store hit data into path state

Shadow rays

Blocked or not?
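The difference between the two kinds of ray cast can be sketched in Python. To stay short, this illustration uses spheres and a linear scan instead of triangles and a BVH; all function names are hypothetical.

```python
import math

def hit_sphere(origin, direction, center, radius, t_min=1e-4, t_max=math.inf):
    """Nearest t in (t_min, t_max) where the ray hits the sphere, else None.

    `direction` must have unit length."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * e for d, e in zip(direction, oc))
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t_min < t < t_max:
            return t
    return None

def cast_extension_ray(origin, direction, spheres):
    """Closest hit: return (t, sphere index) or None."""
    best = None
    for i, (center, radius) in enumerate(spheres):
        t_max = best[0] if best else math.inf
        t = hit_sphere(origin, direction, center, radius, t_max=t_max)
        if t is not None:
            best = (t, i)
    return best

def cast_shadow_ray(origin, direction, distance_to_light, spheres):
    """Any hit: True if something blocks the segment towards the light."""
    return any(
        hit_sphere(origin, direction, c, r, t_max=distance_to_light) is not None
        for c, r in spheres
    )

spheres = [((0.0, 0.0, -5.0), 1.0), ((0.0, 0.0, -3.0), 1.0)]
origin, direction = (0.0, 0.0, 0.0), (0.0, 0.0, -1.0)
closest = cast_extension_ray(origin, direction, spheres)
blocked_near = cast_shadow_ray(origin, direction, 1.5, spheres)
blocked_far = cast_shadow_ray(origin, direction, 10.0, spheres)
```

An extension ray has to find the nearest hit and report it, while a shadow ray only needs a yes/no answer and can stop at the first blocker it finds.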

