Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees Tero Karras NVIDIA...

Post on 18-Jan-2016

235 views 2 download

Tags:

Transcript of Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees Tero Karras NVIDIA...

Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees

Tero Karras

NVIDIA Research

Trees

2

Trees

Path tracingPharr & Humphreys

Real-time ray tracingNVIDIA

Collision detectionAgeia

Particle simulationNVIDIA

Voxel-basedglobal illumination

Crassin et al.

Surface reconstructionAmenta et al.

Photon mappingUchida

Better Faster

3

Outline

Fastest existing methods are sequential Parallelize within each hierarchy level But not between levels

4

Outline

Fastest existing methods are sequential Parallelize within each hierarchy level But not between levels

Lack of parallelism Small workloads bottlenecked by top levels Sub-linear scaling of performance

5

Outline

Novel way to build the entire tree in parallel Two algorithmic “building blocks” Fast, scalable

6

Outline

Novel way to build the entire tree in parallel Two algorithmic “building blocks” Fast, scalable

Main focus: BVHs Point-based octrees and k-d trees also covered

in the paper

7

Bounding volume hierarchy

8

Bounding volume hierarchy

?

9

LBVH - Lauterbach et al. [2009]

1. Assign Morton codes2. Sort primitives3. Generate hierarchy4. Fit bounding boxes

p

px = 0.py = 0.pz = 0.

1 0 1 00 1 1 11 1 0 0

10

01

0

LBVH - Lauterbach et al. [2009]

1. Assign Morton codes2. Sort primitives3. Generate hierarchy4. Fit bounding boxes

p

px = 0.py = 0.pz = 0.

1 1 00 1 11 1 0

1 0 1 00 1 1 1

1 1 0 0

11

LBVH - Lauterbach et al. [2009]

1. Assign Morton codes2. Sort primitives3. Generate hierarchy4. Fit bounding boxes

p

px = 0.py = 0.pz = 0.

1 0 1 00 1 1 1

1 1 0 01 0 1 01 1 0 0code =

12

LBVH - Lauterbach et al. [2009]

1. Assign Morton codes2. Sort primitives3. Generate hierarchy4. Fit bounding boxes

13

LBVH - Lauterbach et al. [2009]

1. Assign Morton codes2. Sort primitives3. Generate hierarchy4. Fit bounding boxes

14

Binary radix tree

00010

00100

00101

10011

11000

11001

11110

00001

0 1 2 3 4 5 6 7

15

Binary radix tree

00010

00100

00101

10011

11000

11001

11110

00001

0 1 2 3 4 5 6 7

000

16

Binary radix tree

00010

00100

00101

10011

11000

11001

11110

00001

2 3 4 5 6 7

00

000

000

0 1

17

Binary radix tree

00010

00100

00101

10011

11000

11001

11110

00001

4

00

000

0 1

0010

2 3

1

11

7

1100

5 6

18

Binary radix tree

1

11

00

1100

0010

000

00010

00100

00101

10011

11000

11001

11110

00001

0 1 2 3 4 5 6 7

n

n-1

19

Longest common prefix

1

11

00

1100

0010

000

00010

00100

00101

10011

11000

11001

11110

00001

0 1 2 3 4 5 6 7

20

Longest common prefix

00

00001

000010

100100

200101

310011

411001

611110

711000

5

δ(5,6) = 4

δ(5,6) = 4

41100

11000

511001

6

4

21

Longest common prefix

200

00001

000010

100100

200101

310011

411001

611110

711000

5

δ(5,6) = 4

1100

δ(0,3) = ?

00001

000101

3

δ(0,3) = 2δ(0,3) = 2

2

4

22

Garanzha et al. [2011]

Level 3 1 node

Level 2 3 nodes

Level 0 1 node

00010

00001

00100

00101

10011

11000

11001

11110

Level 1 2 nodes1 2

0

4 5 3

0 1 2 3 4 5 6 7

6

23

Our method

00010

00001

00100

00101

10011

11000

11001

11110

0 1 2 3 4 5 6 7

?24

Our method

Define a numbering scheme for the nodes Gain some knowledge of their identity Establish a connection with the keys

25

Our method

Define a numbering scheme for the nodes Gain some knowledge of their identity Establish a connection with the keys

Find the children of a given node Only look at node index and nearby keys

26

Our method

Define a numbering scheme for the nodes Gain some knowledge of their identity Establish a connection with the keys

Find the children of a given node Only look at node index and nearby keys

Do this for all nodes in parallel27

Numbering scheme

00010

00100

00101

10011

11000

11001

11110

00001

0 1 2 3 4 5 6 7

0

3 4

28

Numbering scheme

00010

00100

00101

10011

11000

11001

11110

00001

0 1 2 3 4 5 6 7

0

3 4

5

29

Numbering scheme

00010

00100

00101

10011

11000

11001

11110

00001

0 1 2 3 4 5 6 7

1

6

1

6

3 3

0 0

44

5522

30

Numbering scheme

00010

00100

00101

10011

11000

11001

11110

00001

0 1 2 3 4 5 6 7

0

3 4

1 5

6

2

31

Algorithm

00010

00100

00101

10011

11000

11001

11110

00001

0 1 2 3 4 5 6 7

0

3 4

1 5

6

2

32

δ(2,3) = 4

Algorithm

00001

000010

100100

200101

310011

411000

511001

611110

7

?00100

200101

300101

310011

4

δ(3,4) = 03

δ(2,3) = 4

33

Algorithm

00001

000010

100100

200101

310011

411000

511001

611110

7

00101

310011

4

δ(3,4) = 03

δ(2,3) = 4

?δ(1,3) = 2 δ(0,3) = 2

34

Algorithm

00001

000010

100100

200101

310011

411000

511001

611110

7

3

δ(0,3) = 2

? δ(2,3) = 4 δ(1,3) = 2

35

Algorithm

00001

000010

100100

200101

310011

411000

511001

611110

7

3

δ(0,3) = 2

2 2

1 2

36

For each node i=0..n-2 in parallel:1. Determine direction of the range2. Expand the range as far as possible3. Find where to split the range4. Identify children

Algorithm

Binary search

37

Duplicate keys

The algorithm only works with unique keys Duplicates are common in practice

38

Duplicate keys

The algorithm only works with unique keys Duplicates are common in practice

Trick: Augment each key with its index Distinguishes between duplicates Keys are still in lexicographical order

39

Duplicate keys

The algorithm only works with unique keys Duplicates are common in practice

Trick: Augment each key with its index Distinguishes between duplicates Keys are still in lexicographical order

Tie-break when evaluating δ(i,j)40

LBVH

1. Assign Morton codes2. Sort primitives3. Generate hierarchy4. Fit bounding boxes

41

Lauterbach et al. [2009]

42

Our method

Need a different approach How many levels are there? Which nodes are located on a given level?

43

Our method

Need a different approach How many levels are there? Which nodes are located on a given level?

Traverse paths in the tree in parallel Start from leaves, advance toward the root Terminate threads using per-node atomic flags

44

Our method

45

Results

Evaluate performance on GTX 480 (Fermi) CUDA, 30-bit Morton codes

46

Results

Evaluate performance on GTX 480 (Fermi) CUDA, 30-bit Morton codes

Compare against Garanzha et al. [2011] Identical tree (top-level SAH splits disabled)

47

Results

Evaluate performance on GTX 480 (Fermi) CUDA, 30-bit Morton codes

Compare against Garanzha et al. [2011] Identical tree (top-level SAH splits disabled)

Simulate large GPUs N times as many cores N times the memory bandwidth

48

Results

49

Our method Garanzha et al.

Fairy Forest174K triangles

milliseconds

Morton Sort Build AABB Morton Sort Build AABB

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Morton Sort Build AABB Morton Sort Build AABB

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

2 ×1 × cores

4 ×

Results

50

Our method Garanzha et al.

Fairy Forest174K triangles

milliseconds

Morton Sort Build AABB Morton Sort Build AABB

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Morton Sort Build AABB Morton Sort Build AABB

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

1.7 ×1.1 ×

12.5 ×

33.6 ×

1.3 ×

2.4 ×

Results

51

Our method

Turbine Blade1.77M triangles

milliseconds

Garanzha et al.

Morton Sort Build AABB Morton Sort Build AABB

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

Morton Sort Build AABB Morton Sort Build AABB

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

3.7 ×

7.0 ×

0.8 ×

1.0 ×

11.625

2.252.875 3.5

4.1254.75

5.375 66.625

7.257.875

0

1

2

3

4

5

6

Results

11.625

2.252.875 3.5

4.1254.75

5.375 66.625

7.257.875

0

1

2

3

4

5

6

Garanzha et al.

52

Stanford Dragon871K triangles

Morton

Sort

Build

AABB

milliseconds

Our method

milliseconds

Acknowledgements

Timo Aila Samuli Laine David Luebke Jacopo Pantaleoni Jaakko Lehtinen

For helpful suggestions and proofreading.

53

Thank You

Questions

Backup slides

Pseudocode

56

ResultsCommon Build AABB

Scene Cores Eval Sort Our Prev Our Prev

Fairy Forest(174K tris)

1x 0.05 0.56 0.15 1.88 0.23 0.292x 0.03 0.33 0.09 1.75 0.13 0.224x 0.02 0.30 0.05 1.68 0.08 0.19

Conference Room(283K tris)

1x 0.08 0.78 0.24 1.93 0.35 0.382x 0.04 0.51 0.13 1.72 0.19 0.264x 0.03 0.31 0.08 1.58 0.12 0.20

Stanford Dragon(871K tris)

1x 0.22 1.67 0.65 3.14 1.10 1.032x 0.12 1.09 0.34 2.38 0.57 0.604x 0.06 0.64 0.18 1.97 0.30 0.38

Turbine Blade(1.77M tris)

1x 0.45 2.73 1.28 4.73 2.10 1.772x 0.23 1.63 0.65 3.19 1.07 0.964x 0.12 1.08 0.34 2.37 0.56 0.55

57

Algorithm

00010

00100

00101

10011

11000

11001

11110

00001

0 1 2 3 4 5 6 7

0

3 4

1 2 5

6

58

Algorithm

00001

000010

100100

200101

310011

411000

511001

611110

7

?20

0010

100100

200100

200101

3

δ(1,2) = 2 δ(2,3) = 4 δ(2,3) = 4

59

δ(2,4) = 0

Algorithm

00001

000010

100100

200101

310011

411000

511001

611110

7

?00010

100100

2 δ(2,3) = 4δ(1,2) = 2

2

60

δ(2,3) = 4

Algorithm

00001

000010

100100

200101

310011

411000

511001

611110

7

2

? δ(2,3) = 4

61

δ(2,3) = 4

Algorithm

00001

000010

100100

200101

310011

411000

511001

611110

7

2

1 1

62