Memory-Savvy Distributed Interactive Ray Tracing

David E. DeMarle, Christiaan Gribble, Steven Parker


Page 1: Memory-Savvy Distributed Interactive Ray Tracing

Memory-Savvy Distributed Interactive Ray Tracing

David E. DeMarle

Christiaan Gribble

Steven Parker

Page 2: Memory-Savvy Distributed Interactive Ray Tracing

Impetus for the Paper

• data sets are growing

• memory access time is a bottleneck

• use parallel memory resources efficiently

• three techniques for faster access to scene data

Page 3: Memory-Savvy Distributed Interactive Ray Tracing

System Overview

• base system presented at IEEE PVG’03

• cluster port of an interactive ray tracer for shared memory supercomputers (IEEE VIS’98)

• image parallel work division

• fetch scene data from peers and cache it locally

Page 4: Memory-Savvy Distributed Interactive Ray Tracing

Three Techniques for Memory Efficiency

• ODSM → PDSM

• central work queue → distributed work sharing

• polygonal mesh reorganization

Page 5: Memory-Savvy Distributed Interactive Ray Tracing

Distributed Shared Memory

• data is kept in memory blocks

• each node has 1/nth of the blocks

• fetch rest over the network from peers

• cache recently fetched blocks

abstract view of memory: blocks 1 2 3 4 5 6 7 8 9

                   resident set   cache
node 1’s memory    1 4 7          2
node 2’s memory    2 5 8          3
node 3’s memory    3 6 9          2 4
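The layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: blocks are numbered from 0, ownership is round-robin (block b lives on node b mod n), the cache uses LRU eviction, and the "network fetch" is a direct lookup in the peer's memory. The names `Node` and `build_cluster` are hypothetical and do not come from the talk.

```python
from collections import OrderedDict


class Node:
    """One cluster node: owns a fixed resident set and caches remote blocks."""

    def __init__(self, rank, num_nodes, num_blocks, cache_slots):
        self.rank = rank
        self.num_nodes = num_nodes
        # Assumed round-robin ownership: block b lives on node b % num_nodes.
        self.resident = {b: f"block-{b}" for b in range(num_blocks)
                         if b % num_nodes == rank}
        self.cache = OrderedDict()   # recently fetched remote blocks, LRU order
        self.cache_slots = cache_slots
        self.peers = {}              # rank -> Node, filled in by build_cluster
        self.misses = 0

    def owner_of(self, block):
        return block % self.num_nodes

    def get(self, block):
        if block in self.resident:            # hit in the resident set
            return self.resident[block]
        if block in self.cache:               # hit in the cache
            self.cache.move_to_end(block)
            return self.cache[block]
        # Miss: stand-in for a network fetch from the owning peer.
        self.misses += 1
        data = self.peers[self.owner_of(block)].resident[block]
        self.cache[block] = data
        if len(self.cache) > self.cache_slots:
            self.cache.popitem(last=False)    # evict the least recently used block
        return data


def build_cluster(num_nodes, num_blocks, cache_slots):
    nodes = [Node(r, num_nodes, num_blocks, cache_slots) for r in range(num_nodes)]
    for node in nodes:
        node.peers = {p.rank: p for p in nodes}
    return nodes
```

With `build_cluster(3, 9, 2)`, node 0 holds blocks 0, 3 and 6, and any other block costs one simulated fetch the first time it is touched.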

Page 6: Memory-Savvy Distributed Interactive Ray Tracing

Object Based DSM

• each block has a unique handle

• application finds handle for each datum

• acquire and release for every block access

    // locate data
    handle, offset = ODSM_location(datum);
    block_start_addr = acquire(handle);
    // use data
    datum = *(block_start_addr + offset);
    // relinquish space
    release(handle);
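A runnable version of the same access pattern might look like the sketch below. It assumes fixed-size blocks and a simple pin count per handle; `ObjectDSM`, `BLOCK_SIZE` and `read` are illustrative names, and a real system would fetch a missing block from a peer inside `acquire`.

```python
BLOCK_SIZE = 4  # elements per block (illustrative value)


class ObjectDSM:
    def __init__(self, data):
        # Split a flat array into fixed-size blocks, keyed by handle.
        self.blocks = {handle: data[start:start + BLOCK_SIZE]
                       for handle, start in enumerate(range(0, len(data), BLOCK_SIZE))}
        self.pinned = {}  # handle -> number of outstanding acquires

    def location(self, index):
        """Map a global element index to (handle, offset)."""
        return divmod(index, BLOCK_SIZE)

    def acquire(self, handle):
        """Pin the block so it is not evicted while in use, and return it."""
        self.pinned[handle] = self.pinned.get(handle, 0) + 1
        return self.blocks[handle]

    def release(self, handle):
        """Unpin the block once the caller is done with it."""
        self.pinned[handle] -= 1
        if self.pinned[handle] == 0:
            del self.pinned[handle]


def read(dsm, index):
    handle, offset = dsm.location(index)   # locate data
    block = dsm.acquire(handle)            # a real system may fetch from a peer here
    try:
        return block[offset]               # use data
    finally:
        dsm.release(handle)                # relinquish space
```

Every read pays for the address computation and the acquire/release pair, even when the block is already local.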

Page 7: Memory-Savvy Distributed Interactive Ray Tracing

ODSM Observations

• handle = level of indirection, which allows more than 4 GB of shared data

• mapping scene data to blocks is tricky

• acquire and release add overhead

• address computations add overhead

Example: 7.5 GB Richtmyer-Meshkov time step, 64 CPUs, ~3 fps with view and isovalue changes

Page 8: Memory-Savvy Distributed Interactive Ray Tracing

Page Based DSM

• like ODSM:
  • each node keeps 1/nth of scene
  • fetches from peers
  • uses caching

• difference is how memory is accessed:
  • normal virtual memory addressing
  • use addresses between heap and stack
  • PDSM installs a segmentation fault signal handler: on a miss, obtain page from peer, then return
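The miss path can be imitated in Python, with the caveat that this is a simulation only: a real PDSM relies on the operating system's page protection and a signal handler, which Python cannot express directly. Here a `PageFault` exception stands in for the segmentation fault, and `fetch_from_peer` is a hypothetical callback standing in for the network request. The 4096-byte page size is an assumption.

```python
PAGE_SIZE = 4096  # bytes; a common page size, assumed here


class PageFault(Exception):
    """Stand-in for the segmentation fault raised on a non-resident page."""

    def __init__(self, page):
        super().__init__(page)
        self.page = page


class PagedSpace:
    def __init__(self, fetch_from_peer):
        self.pages = {}                      # page number -> bytearray
        self.fetch_from_peer = fetch_from_peer
        self.faults = 0

    def _load(self, addr):
        page, offset = divmod(addr, PAGE_SIZE)
        if page not in self.pages:
            raise PageFault(page)            # the hardware would trap here
        return self.pages[page][offset]

    def read(self, addr):
        try:
            return self._load(addr)          # common case: plain access, no DSM code
        except PageFault as fault:
            # "Signal handler": obtain the page from a peer, then return
            # so that the faulting access is retried.
            self.faults += 1
            self.pages[fault.page] = self.fetch_from_peer(fault.page)
            return self._load(addr)
```

Only the first access to a page runs the handler; later accesses to the same page take the plain path.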

Page 9: Memory-Savvy Distributed Interactive Ray Tracing

PDSM Observations

• no handles, normal memory access

• no acquire/release or address computations

• easy to place any type of scene data in shared space

• limited to 2^32 bytes

• hard to make thread safe

• DSM acts only in the exceptional case of a miss

• ray tracing acceleration structures give > 90% hit rates

             ODSM      PDSM
Hit time     10.2 µs   4.97 µs
Miss time    629 µs    632 µs
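The hit and miss times can be combined into a rough expected cost per access with the usual weighted average, hit_rate × hit_time + (1 − hit_rate) × miss_time. This is a textbook model applied to the slide's numbers, not a measurement from the talk, and the 99% hit rate below is a hypothetical value chosen to show how quickly the miss term shrinks.

```python
def average_access_us(hit_rate, hit_us, miss_us):
    """Expected access time in microseconds for a given hit rate."""
    return hit_rate * hit_us + (1.0 - hit_rate) * miss_us


# (hit time, miss time) in microseconds, taken from the slide's table.
TIMES_US = {"ODSM": (10.2, 629.0), "PDSM": (4.97, 632.0)}

for hit_rate in (0.90, 0.99):
    for name, (hit_us, miss_us) in TIMES_US.items():
        avg = average_access_us(hit_rate, hit_us, miss_us)
        print(f"{name} at {hit_rate:.0%} hits: {avg:.2f} us")
```

At a 90% hit rate the miss term dominates for both systems; the cheaper PDSM hit path matters more as the hit rate climbs.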

Page 10: Memory-Savvy Distributed Interactive Ray Tracing

Head-to-Head Comparison

• compare replication, PDSM and ODSM

• use a small 512^3 volumetric data set

• PDSM and ODSM keep only 1/16th locally

• change viewpoint and isovalue throughout
  • first half, large working set
  • second half, small working set

Page 11: Memory-Savvy Distributed Interactive Ray Tracing

Head-to-Head Comparison

note - accelerated ~2x for presentation

Page 12: Memory-Savvy Distributed Interactive Ray Tracing

Head-to-Head Comparison

[Chart: frames/sec (axis 0 to 12) vs. frame # for REP (replication), PDSM, and ODSM]

Page 13: Memory-Savvy Distributed Interactive Ray Tracing

Head-to-Head Comparison

[Chart: frames/sec (axis 0 to 12) vs. frame # for REP (replication), PDSM, and ODSM]

• replicated: 3.74 frames/sec average

Page 14: Memory-Savvy Distributed Interactive Ray Tracing

Head-to-Head Comparison

[Chart: frames/sec (axis 0 to 12) vs. frame # for REP (replication), PDSM, and ODSM]

• ODSM: 32% of the speed of replication

Page 15: Memory-Savvy Distributed Interactive Ray Tracing

Head-to-Head Comparison

[Chart: frames/sec (axis 0 to 12) vs. frame # for REP (replication), PDSM, and ODSM]

• PDSM: 82% of the speed of replication

Page 16: Memory-Savvy Distributed Interactive Ray Tracing

Three Techniques for Memory Efficiency

• ODSM → PDSM

• central work queue → distributed work sharing

• polygonal mesh reorganization

Page 17: Memory-Savvy Distributed Interactive Ray Tracing

Load Balancing Options

• central work queue
  • legacy from original shared memory implementation
  • display node keeps task queue
  • render nodes get tiles from queue

• now distributed work sharing
  • start with tiles traced last frame: hit rates increase
  • workers get tiles from each other: communicate in parallel, better scalability
  • steal from random peers, slowest worker gives work
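The sharing scheme can be sketched as a small sequential simulation. The assumptions are loud ones: time is simulated rather than measured, tile costs are given up front, and an idle worker picks its victim uniformly at random among peers that still hold tiles (the "slowest worker gives work" rule is not modeled). `render_frame` and its parameters are hypothetical names.

```python
import random
from collections import deque


def render_frame(prev_assignment, tile_cost, rng):
    """Simulate one frame and return (tiles traced per worker, number of steals).

    prev_assignment: one list of tiles per worker (the tiles it traced last frame).
    tile_cost: dict mapping tile -> cost in arbitrary time units.
    rng: a random.Random instance used to pick steal victims.
    """
    queues = [deque(tiles) for tiles in prev_assignment]
    clock = [0.0] * len(queues)           # simulated time per worker
    traced = [[] for _ in queues]
    steals = 0
    remaining = sum(len(q) for q in queues)
    while remaining:
        # The worker that becomes free first acts next.
        w = min(range(len(queues)), key=lambda i: clock[i])
        if queues[w]:
            tile = queues[w].popleft()     # own tile: its data is likely cached
        else:
            # Out of work: take a tile from a random peer that still has some.
            victims = [i for i, q in enumerate(queues) if q]
            tile = queues[rng.choice(victims)].pop()
            steals += 1
        clock[w] += tile_cost[tile]
        traced[w].append(tile)
        remaining -= 1
    return traced, steals
```

Feeding the returned per-worker lists back in as `prev_assignment` for the next frame is what lets each worker start with tiles whose scene data it probably already holds.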

Page 18: Memory-Savvy Distributed Interactive Ray Tracing

[Figure: two panels.
Central Work Queue: the supervisor node holds the queue (tile 0, tile 1, tile 2, tile 3, …) and worker nodes 0 to 3 take tiles from it.
Distributed Work Sharing: worker nodes 0 to 3 each hold their own tiles (tile 0, tile 1, tile 2, tile 3, …).]

Page 19: Memory-Savvy Distributed Interactive Ray Tracing

[Figure: Central Work Queue vs. Distributed Work Sharing]


Page 23: Memory-Savvy Distributed Interactive Ray Tracing

Comparison

• bunny, dragon, and acceleration structures in PDSM

• measure misses and frame rates

• vary local memory to simulate data much larger than physical memory

Page 24: Memory-Savvy Distributed Interactive Ray Tracing

[Chart: misses (axis 0 to 1E6) and frames/sec (axis 0 to 20) vs. MB locally (167, 135, 103, 71, 39, 23) for central queue and distributed sharing]


Page 28: Memory-Savvy Distributed Interactive Ray Tracing

Three Techniques for Memory Efficiency

• ODSM → PDSM

• central work queue → distributed work sharing

• polygonal mesh reorganization

Page 29: Memory-Savvy Distributed Interactive Ray Tracing

Mesh “Bricking”

• similar to volumetric bricking

• increase hit rates by reorganizing scene data for better data locality

• place neighboring triangles on the same page

[Figure, before reorganization: the volume bricking panel labels cells &0 &1 &2 &3 in one row and &90 &91 &92 &93 in the next; the mesh “bricking” panel labels triangles with addresses scattered across &0 to &8 and &90 to &98]


Page 34: Memory-Savvy Distributed Interactive Ray Tracing

Mesh “Bricking”

• similar to volumetric bricking

• increase hit rates by reorganizing scene data for better data locality

• place neighboring triangles on the same page

[Figure, after reorganization: the volume bricking panel labels cells &0 &1 &4 &5 in one row and &2 &3 &6 &7 in the next; the mesh “bricking” panel labels triangles with addresses &0 to &17]
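For the volume case, the difference between the two layouts comes down to how a cell coordinate is turned into an address. The sketch below is a 2D illustration with an assumed brick edge of 2 cells and a grid width that is a multiple of the brick edge; `linear_index` and `bricked_index` are hypothetical names, not functions from the system described in the talk.

```python
def linear_index(x, y, width):
    """Row-major layout: vertical neighbors are `width` elements apart."""
    return y * width + x


def bricked_index(x, y, width, brick=2):
    """Bricked layout: cells are stored brick by brick.

    Assumes `width` is a multiple of `brick`.
    """
    bricks_per_row = width // brick
    bx, by = x // brick, y // brick          # which brick the cell is in
    ix, iy = x % brick, y % brick            # position inside that brick
    brick_id = by * bricks_per_row + bx
    return brick_id * brick * brick + iy * brick + ix
```

On a grid 4 cells wide, the bricked layout places the first two rows at 0 1 4 5 and 2 3 6 7, so a cell and its vertical neighbor sit two elements apart instead of a full row apart.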


Page 36: Memory-Savvy Distributed Interactive Ray Tracing

Input Mesh

Page 37: Memory-Savvy Distributed Interactive Ray Tracing

Sorted Mesh

Page 38: Memory-Savvy Distributed Interactive Ray Tracing

Reorganizing the Mesh

• based on a grid acceleration structure

• each grid cell contains pointers to triangles within

• our grid structure is bricked in memory

1. create grid acceleration structure

2. traverse the cells as stored in memory

3. append copies of the triangles to a new mesh

• new mesh has triangles sorted in space and memory
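The three steps can be sketched as follows. This is a minimal illustration, not the authors' code: triangles are tuples of three (x, y, z) vertices that lie inside the grid, a triangle is assigned to every cell its bounding box overlaps (a conservative test), and the caller supplies `cell_order`, the sequence in which cells are stored in memory. All function names are hypothetical.

```python
def cell_of(point, origin, cell_size, dims):
    """Grid cell containing a point, clamped to the last cell on each axis."""
    return tuple(min(int((p - o) / cell_size), d - 1)
                 for p, o, d in zip(point, origin, dims))


def cells_overlapped(tri, origin, cell_size, dims):
    """Grid cells touched by the triangle's bounding box (conservative)."""
    lo = cell_of([min(v[a] for v in tri) for a in range(3)], origin, cell_size, dims)
    hi = cell_of([max(v[a] for v in tri) for a in range(3)], origin, cell_size, dims)
    return [(x, y, z)
            for z in range(lo[2], hi[2] + 1)
            for y in range(lo[1], hi[1] + 1)
            for x in range(lo[0], hi[0] + 1)]


def sort_mesh(triangles, origin, cell_size, dims, cell_order):
    # 1. Build the grid: each cell lists the triangles inside it.
    grid = {}
    for t, tri in enumerate(triangles):
        for cell in cells_overlapped(tri, origin, cell_size, dims):
            grid.setdefault(cell, []).append(t)
    # 2. Visit the cells in the order they are stored in memory.
    # 3. Append a copy of each cell's triangles to the new mesh.
    new_mesh = []
    for cell in cell_order:
        new_mesh.extend(triangles[t] for t in grid.get(cell, []))
    return new_mesh
```

A triangle that spans several cells is appended once per cell, so the sorted mesh can be larger than the input.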

Page 39: Memory-Savvy Distributed Interactive Ray Tracing

Comparison

• same test as before

• compare input and sorted mesh

Page 40: Memory-Savvy Distributed Interactive Ray Tracing

[Chart: misses (axis 0 to 160000) and frames/sec (axis 0 to 12) vs. MB locally (72.8, 40.8, 32.8, 24.8, 20.8, 16.8, 14.8) for input mesh and sorted mesh]


Page 44: Memory-Savvy Distributed Interactive Ray Tracing

[Chart: frames/sec (axis 0 to 12) vs. MB locally (72.8, 40.8, 32.8, 24.8, 20.8, 16.8, 14.8) for input mesh and sorted mesh]

• grid based approach duplicates split triangles

Page 45: Memory-Savvy Distributed Interactive Ray Tracing

Summary

three techniques for more efficient memory use:

1. PDSM adds overhead only in the exceptional case of a data miss

2. reuse tile assignments with parallel load balancing heuristics

3. mesh reorganization puts related triangles onto nearby pages

Page 46: Memory-Savvy Distributed Interactive Ray Tracing

Future Work

• need 64-bit architecture for very large data

• thread safe PDSM for hybrid parallelism

• distributed pixel result gathering

• surface based mesh reorganization

Page 47: Memory-Savvy Distributed Interactive Ray Tracing

Acknowledgments

• Funding agencies
  • NSF 9977218, 9978099
  • DOE VIEWS
  • NIH

• Reviewers - for tips and seeing through the rough initial data presentation

• EGPGV Organizers

• Thank you!