

Realtime and Interactive Ray Tracing
Basics and Latest Developments

Thomas Wöllert (Dipl.-Inf. (FH))

Matriculation no. 05478901-0199
Semester Group IG2

Advanced Seminar
Semester Thesis, Summer 2006

Master of Science
Computer Graphics and Image Processing

Department of Computer Science and Mathematics
Munich University of Applied Sciences


"No pessimist ever discovered the secret of the stars or sailed an uncharted land, or opened a new doorway for the human spirit."

Helen Keller (1880 - 1968) [1]

Abstract

Type: Advanced Seminar Thesis
Author: Wöllert, Thomas (Dipl.-Inf. (FH))
Title: Realtime and Interactive Ray Tracing, Basics and Latest Developments

Date: 22nd June 2006
Number of Pages: 66

Field of Study: Master of Science - Computer Graphics and Image Processing
University: Munich University of Applied Sciences, Germany
Advisor: Prof. Dr. A. Nischwitz

Ray tracing was first described by Arthur Appel in 1968. Due to its high rendering time, it is used almost solely in pre-rendered motion pictures featuring realistic illumination, reflection, and refraction. During the past years, big strides have been made to prepare ray tracing for realtime applications like computer games and dynamic virtual environments.

This semester thesis starts with an explanation of the principles of how ray tracing works, presenting the basic terms and the different rendering styles (recursive, distribution, etc.). Afterwards, the reasons why ray tracing is not already used on today's graphics cards are discussed, along with its pros and cons compared to common rasterization.

Chapter 3 focuses on different methods to speed up the ray tracing process. This can be accomplished by reducing the number of rays, or by using acceleration structures (e.g. uniform grid, kd-tree, etc.) to improve the intersection tests, which take up most of the time in the rendering process. Additional information on the latest developments regarding acceleration structures designed especially for dynamic scenes can be found in Chapter 4.

With the described tools it is possible to realize different approaches aiming at realtime ray tracing, which are presented in Chapter 5 (CPU, hybrid CPU-GPU, GPU, and special-purpose hardware).

A special focus is laid on GPU ray tracers in Chapter 6, which explains the different approaches and their problems. A benchmark visualizing the results concludes that chapter.

Vital for the mass-market adoption of ray tracing is an easy-to-use programming interface. One approach with that potential is described in Chapter 7 of this document: the OpenRT Realtime Ray Tracing API, featuring an OpenGL-like programming interface.

Specific applications of ray tracing are described in Chapter 8, presenting graphical (e.g. massively complex models) and non-graphical (e.g. collision detection, artificial intelligence) examples. The final Chapter 9 concludes this document.

Keywords: ray tracing, interactive rendering, programmable graphics hardware, GPU, acceleration structures

Contents

Abstract v

List of Figures ix

List of Tables xi

1 Introduction 1

1.1 Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Basic Principles of Ray Tracing 5

2.1 Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 The Ray Tracing Rendering Algorithm . . . . . . . . . . . . . . . . . . . . . . 6

2.2.1 The First Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2.2 Recursive Ray Tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2.3 Distribution Ray Tracing . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2.4 Shader Ray Tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.3 Ray Tracing vs. Rasterization . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3 Acceleration Methods 11

3.1 Computing Less Samples in the Image Plane . . . . . . . . . . . . . . . . . . 11

3.2 Reducing the Number of Secondary Rays . . . . . . . . . . . . . . . . . . . . . 12

3.3 Accelerated Intersection Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.3.1 Primitive-specific Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.3.2 Bounding Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.3.3 Spatial Subdivision Schemes . . . . . . . . . . . . . . . . . . . . . . . . 13

3.3.4 Bounding Volume Hierarchies . . . . . . . . . . . . . . . . . . . . . . . 16


4 Acceleration Methods for Dynamic Scenes 19

4.1 Dynamic Bounding Volume Hierarchies . . . . . . . . . . . . . . . . . . . . . . 19

4.2 Coherent Grid Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.3 Distributed Interactive Ray Tracing . . . . . . . . . . . . . . . . . . . . . . . . 22

4.4 Benchmarking Animated Ray Tracing . . . . . . . . . . . . . . . . . . . . . . 23

5 Approaches to realize Realtime Ray Tracing 25

5.1 Software Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

5.2 Using Programmable GPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

5.3 Special purpose Hardware Architectures . . . . . . . . . . . . . . . . . . . . . 27

6 Using the GPU for Ray Tracing 29

6.1 Stream Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

6.2 Choosing the Acceleration Structure . . . . . . . . . . . . . . . . . . . . . . . 32

6.3 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

7 Realtime Ray Tracing API (RTRT/OpenRT) 39

8 Applications 41

8.1 Computer Graphic Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 41

8.1.1 Computer Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

8.1.2 Visualizing Massively Complex Models . . . . . . . . . . . . . . . . . . 43

8.1.3 Free-Form Surfaces and Volume Ray Tracing . . . . . . . . . . . . . . 45

8.1.4 Mixed Reality Rendering . . . . . . . . . . . . . . . . . . . . . . . . . 45

8.2 A.I.-Vision and Collision Detection . . . . . . . . . . . . . . . . . . . . . . . . 47

9 Conclusions 49

9.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

9.2 Final Thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

Bibliography 51

List of Figures

1.1 Rendered Image from Final Fantasy - The Spirits Within (Courtesy of Columbia Pictures). . . . . . 2

1.2 Rendered Image from Ice Age II - Meltdown (Courtesy of Twentieth Century Fox). . . . . . 2

2.1 Basic Ray Tracing: Rays being cast into a scene filled with primitives (Courtesy of Wald [8]). . . . . . 6

2.2 Recursive Ray Tracing: Generating secondary rays at t_hit of the primary ray (Courtesy of Wald [8]). . . . . . 7

3.1 Spatial Subdivision: Triangle B belongs to multiple voxels. Stopping the ray traversal early would mean missing the intersection with triangle A (Courtesy of Thrane [20]). . . . . . 14

3.2 Uniform grid: Traversal of a uniform grid (Courtesy of Havran [9]). . . . . . 15

3.3 KD-Tree: Three steps of the tree construction (Courtesy of Havran [9]). . . . 15

3.4 BVH: Tree for cow example (Courtesy of Somchaipeng [23]). . . . . . . . . . . 16

3.5 BVH: A solid cow and the levels in its bounding volume hierarchy (Courtesy of Somchaipeng [23]). . . . . . 17

4.1 Dynamic BVH: Two child nodes of a BVH tree. Bounding volumes move over time (left to right) (Courtesy of Wald [24]). . . . . . 20

4.2 Dynamic BVH: Rendered scene using 174,000 triangles, rendering at 1024x1024 pixels (Courtesy of Wald [24]). . . . . . 20

4.3 Coherent Grid Traversal: Computation steps (Courtesy of Wald [25]). . . . . . 21

4.4 Coherent Grid Traversal: Adaptation to the scene geometry of kd-tree (a) and grid (b) (Courtesy of Wald [25]). . . . . . 22

4.5 Distributed Ray Tracing: Robots (left) with color-coded objects (right). Triangles of the same object have the same color (Courtesy of Wald [26]). . . . . . 23

4.6 Distributed Ray Tracing: Two-level hierarchy with a top-level BSP tree containing references to instances. Objects consist of geometry and a local BSP tree (Courtesy of Wald [26]). . . . . . 23


4.7 BART: Images from "kitchen" (left) and "robots" (right) (Courtesy of Lext [28]). . . . . . 24

4.8 BART: Images from "museum" (Courtesy of Lext [28]). . . . . . 24

5.1 VIZARD: A rendered volume dataset of a lobster (Courtesy of Meißner [41]). 27

5.2 VolumePro: Rendered medical volume data (Courtesy of Mitsubishi Electric [42]). 28

5.3 SaarCOR: Scenes rendered at 4.5 and 7.5 fps on the 66 MHz prototype (Courtesy of Woop [45]). . . . . . 28

6.1 The stream programming model (Courtesy of Purcell [39]). . . . . . . . . . . 29

6.2 The programmable graphics pipeline (Courtesy of Purcell [39]). . . . . . . . . 30

6.3 The kernels used in the streaming ray tracer (Courtesy of Purcell [39]). . . . . 31

6.4 Uniform grid stored in several textures (Courtesy of Purcell [39]). . . . . . . . 33

6.5 Example for traversal of a BVH tree in a texture (Courtesy of Thrane [20]). . 34

6.6 Benchmark: Purcell GPU Test Scenes: Cornell box, teapotahedron, Quake 3 (Courtesy of Purcell [39]). . . . . . 36

6.7 Benchmark: Thrane & Simonsen: Cows, Robots, Kitchen and Bunny (from left to right) (Courtesy of Thrane [20]). . . . . . 37

8.1 Quake 3 Raytraced: Multiple reflections . . . . . . . . . . . . . . . . . . . . . 42

8.2 Quake 3 Raytraced: Look-through portal . . . . . . . . . . . . . . . . . . . . . 42

8.3 Quake 3 Raytraced: High number of polygons . . . . . . . . . . . . . . . . . . 42

8.4 Quake 3 Raytraced: Area crowded with characters . . . . . . . . . . . . . . . 42

8.5 UNC power plant: Highly detailed building (Courtesy of Wald [55]) . . . . . . 43

8.6 UNC power plant: Rendering with shadows (Courtesy of Wald [55]) . . . . . 43

8.7 Boeing 777: Overview (Courtesy of Boeing Corp.). . . . . . 44

8.8 Boeing 777: Engine interior (Courtesy of Boeing Corp.). . . . . . 44

8.9 Boeing 777: Dynamic loading (at startup) (Courtesy of Boeing Corp.). . . . . . 44

8.10 Boeing 777: Dynamic loading (after startup) (Courtesy of Boeing Corp.). . . . . . 44

8.11 Free-Form Surfaces: Round face (Courtesy of Benthin [57]) . . . . . . . . . . 45

8.12 Free-Form Surfaces: Chess game (Courtesy of Benthin [57]) . . . . . . . . . . 45

8.13 Volume Ray Tracing: Bonsai (Courtesy of Marmitt [58]) . . . . . . . . . . . . 46

8.14 Volume Ray Tracing: Skull (Courtesy of Marmitt [58]) . . . . . . . . . . . . . 46

8.15 Mixed Reality: Shadows and reflections (Courtesy of Pomi [59] [60]) . . . . . 46

8.16 Mixed Reality: TV with reflections (Courtesy of Pomi [59] [60]) . . . . . . . . 46

8.17 Mixed Reality: TV as red light source (Courtesy of Pomi [59] [60]) . . . . . . 46

8.18 Mixed Reality: TV as green light source (Courtesy of Pomi [59] [60]) . . . . . 46

8.19 A.I. Vision - Visible player (Courtesy of Pohl [52]) . . . . . . . . . . . . . . . 47

8.20 A.I. Vision - Hidden player (Courtesy of Pohl [52]) . . . . . . . . . . . . . . . 47

List of Tables

6.1 Benchmark: CPU-GPU Hybrid: Speedups when using the GPU to render the teapot scene. . . . . . 35

6.2 Benchmark: Purcell GPU: Scene complexity and results, with eye rays (E), shadow rays (S), reflection rays (R). . . . . . 36

6.3 Benchmark: Thrane & Simonsen: Scene complexity. . . . . . 37

6.4 Benchmark: Thrane & Simonsen: Average rendering times in milliseconds per frame, including shadow and reflection rays where applicable. . . . . . 38

Chapter 1

Introduction

This semester thesis was created in the context of the Master of Science advanced seminar held at the Munich University of Applied Sciences [2]. The main focus was laid on using the GPU¹ of modern graphics cards for acceleration purposes.

1.1 Task

Staying up to date on the latest developments in a certain area is often a difficult task. Development and science take rapid strides forward every day, making it almost impossible to stay informed without a 48-hour day. The task of this advanced seminar was to study and collect information on a specific topic, and to present it adequately in this written document and in a presentation for all participants.

Ray tracing has the potential to determine how computer graphics will evolve over the next decade. This document is designed as a starting point for getting an overview not only of the basics of ray tracing, but also of its latest developments (up to the time of this writing in June 2006).

1.2 Motivation

People have always tried to break out of their normal world. Reading books has been and still is a popular way to do that, with the drawback that the experience exists only within the reader's mind. It is impossible to see this other world apart from the paintings on the book's cover.

Since the beginning of the 20th century, movies have tried to remedy this disadvantage by presenting fictional worlds to their viewers. To tell the story, models and puppeteers were used, acting as starships or monsters. Today these first steps look crude and not very convincing (sometimes the ropes hanging from the wings of a starfighter were visible).

¹ GPU, Graphics Processing Unit


Computer games were born around the middle of the last century, but found no mass market until the first cheap 8-bit home computers (e.g. Apple, Commodore VIC-20) [3] became available around 1980. Due to the hardware's limited graphical abilities, most games of that time relied solely on text (e.g. Zork by Infocom [4]).

Evolving over the years, movies and computers merged. Today no movie is created without extensive use of computer graphics to make the story more convincing for the viewer.

As one of the first movies to be rendered completely within a computer, Final Fantasy - The Spirits Within (2001) (see Figure 1.1) [5] showed everyone that human actors could be realistically replaced by computer-generated characters. Afterwards, many motion pictures followed in these footsteps, e.g. Ice Age (2002), Ice Age II (2006) (see Figure 1.2) [6], and others.

Figure 1.1: Rendered Image from Final Fantasy - The Spirits Within (Courtesy of Columbia Pictures).

Figure 1.2: Rendered Image from Ice Age II - Meltdown (Courtesy of Twentieth Century Fox).

Among other new graphics technologies, ray tracing played a vital role in creating these virtual worlds. It enabled the director to simulate realistic illumination and other effects. The major disadvantage of ray tracing is that rendering one image takes its time, making it unfeasible for computer games. These games cannot rely on pre-rendered imagery, because the world needs to be dynamic, responding immediately to the user's actions.

Ray tracing was first described by Arthur Appel in 1968 [7]. While it was used solely for pre-rendered images for some time, ray tracing has become more and more interesting for realtime scenes during the past years. The processing power of today's computer CPUs² and graphics cards has evolved far enough to make realtime ray tracing possible.

² CPU, Central Processing Unit


1.3 Overview

The purpose of this document is to provide an overview of the field of ray tracing, covering not only the basics but also the latest developments in algorithms and hardware.

This semester thesis starts with an explanation of the principles of how ray tracing works, presenting the basic terms and the different rendering styles (recursive, distribution, etc.). Afterwards, the reasons why ray tracing is not already used on today's graphics cards are discussed, along with its pros and cons compared to common rasterization.

As explained in Chapter 2, the intersection tests in the ray tracing algorithm need to be accelerated. Chapter 3 therefore explains methods to speed up these tests by reducing the number of rays as well as by creating support structures like uniform grids, kd-trees, and others. However, these methods and support structures were developed for static scenes; using them in a dynamic environment often reveals their problems in adapting to such scenes. The latest developments mainly focus on creating new acceleration methods for dynamic scenes, which are explained in Chapter 4.

Now with all the tools ready, Chapter 5 presents different approaches to realizing realtime ray tracing. Possibilities like CPU-, GPU-, hybrid CPU-GPU-, and special-purpose hardware techniques are discussed, explaining their advantages and drawbacks.

A special focus is laid on GPU ray tracers in Chapter 6, which explains the different approaches and their problems. A benchmark visualizing the results concludes that chapter.

Vital for the mass-market adoption of ray tracing is an easy-to-use programming interface. As one such approach, the OpenRT Realtime Ray Tracing API is explained in Chapter 7, featuring an OpenGL-like programming interface.

Specific graphical applications of ray tracing are described in Chapter 8, featuring computer games, the rendering of massively complex geometry, and others. Some applications outside the graphics area are described afterwards, showing how ray tracing can be useful for other fields such as collision detection and artificial intelligence.

Chapter 9 concludes this document, offering a summary as well as a perspective on future developments and their impact on the field of computer graphics.

Chapter 2

Basic Principles of Ray Tracing

In contrast to the polygonal rendering methods used on today's graphics cards, ray tracing tries to simulate the natural way light rays traverse a scene. This chapter describes the basic elements and terms of ray tracing as well as different algorithmic approaches.

2.1 Terms

The core task of all ray tracing algorithms is to cast a ray of light into a scene containing geometry. The ray might or might not intersect objects in the scene. The problem is to determine which object, if any, has been hit by the ray, and where. In order to do that, the light ray is defined by its origin o and its direction d. Using the parameter t as a variable leads to r(t) = o + t · d. Additionally, a maximum parameter t_max is defined, telling the algorithm when to discard a ray that did not hit any geometry (the ray has then left the scene).
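The parametrization r(t) = o + t · d together with the cut-off t_max can be written as a small sketch. The names `Ray` and `point_at` are illustrative choices, not taken from the thesis:

```python
from dataclasses import dataclass

@dataclass
class Ray:
    """A ray r(t) = o + t * d, valid for parameters 0 <= t <= t_max."""
    origin: tuple      # o: (x, y, z)
    direction: tuple   # d: (x, y, z), assumed to be normalized
    t_max: float = float("inf")

    def point_at(self, t: float) -> tuple:
        """Evaluate r(t) = o + t * d within the ray's valid range."""
        if not (0.0 <= t <= self.t_max):
            raise ValueError("t outside the ray's valid range")
        o, d = self.origin, self.direction
        return (o[0] + t * d[0], o[1] + t * d[1], o[2] + t * d[2])

# A ray starting at the origin along +x, discarded beyond t_max = 10
r = Ray(origin=(0.0, 0.0, 0.0), direction=(1.0, 0.0, 0.0), t_max=10.0)
```

A ray tracer would discard `r` as soon as traversal passes t = 10 without finding a hit, i.e. the ray has left the scene.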

There are two different types of rays: primary rays and secondary rays. Primary rays are all rays whose origin o is the position of the eye point. In contrast, secondary rays are only generated when primary rays hit an object in the scene. Such secondary rays include shadow, reflection, and refraction rays. A more detailed description is given in Section 2.2.

The ray tracer itself covers three different problems, as described by Wald in his PhD thesis [8]: finding the closest intersection to the origin, finding any intersection along the ray, and finding all intersections along the ray's path.

Finding the closest intersection to the origin is the basic task of any ray tracer. It involves determining the parameter t_hit at which the ray intersects a primitive P in the scene. When P is hit, most algorithms also store additional information like the surface material or the normal vector in order to correctly shade the ray's pixel or to generate secondary rays if needed.

The second problem is to find any intersection along the ray. This visibility test is applied to the ray between its origin o and its end o + t_max · d. Most ray tracers include very sophisticated algorithms for such tests, which are especially helpful when shooting shadow rays. Normal primary rays might require more intersection tests to determine the primitive that has been hit, but for a shadow ray it is only interesting whether any geometry is hit at all.

Finding all intersections along the path of a ray is only necessary for some advanced lighting algorithms described by Wald [8], which are not very common (e.g. global Monte Carlo techniques to compute radiosity).
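The first two query types can be sketched against a toy scene of spheres. The ray-sphere test and the function names below are illustrative assumptions (the thesis does not prescribe an implementation), and ray directions are assumed to be normalized:

```python
import math

def intersect_sphere(origin, direction, center, radius):
    """Return the smallest t >= 0 at which the ray hits the sphere, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * e for d, e in zip(direction, oc))    # assumes |direction| = 1
    c0 = sum(e * e for e in oc) - radius * radius
    disc = b * b - c0
    if disc < 0:
        return None                                  # ray misses the sphere
    t = -b - math.sqrt(disc)
    if t < 0:
        t = -b + math.sqrt(disc)                     # origin inside the sphere
    return t if t >= 0 else None

def closest_hit(origin, direction, spheres, t_max):
    """Problem 1: the intersection closest to the origin, as (t_hit, index)."""
    best = (None, None)
    for i, (center, radius) in enumerate(spheres):
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and t <= t_max and (best[0] is None or t < best[0]):
            best = (t, i)
    return best

def any_hit(origin, direction, spheres, t_max):
    """Problem 2: visibility test -- is anything hit before t_max?
    Typical for shadow rays; it may stop at the first hit found."""
    for center, radius in spheres:
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and t <= t_max:
            return True
    return False
```

Note that `any_hit` can return as soon as one occluder is found, which is exactly why visibility tests for shadow rays can be cheaper than closest-hit queries.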

2.2 The Ray Tracing Rendering Algorithm

Since ray tracing was first described nearly forty years ago, several different approaches have been invented, each with the goal of including more effects (e.g. reflections) than the previous algorithm.

2.2.1 The First Approach

The first approach was created by Arthur Appel for the rendering of solid objects in 1968 [7]. A three-dimensional scene needs to be rendered into a two-dimensional image, which is then displayed on the screen. For each pixel in the image, one primary ray is generated, originating at the eye point. In case an object is hit by a ray, the object's material properties and color determine the color of the ray and therefore the color of the pixel in the image (see Figure 2.1).

Figure 2.1: Basic Ray Tracing: Rays being cast into a scene filled with primitives (Courtesy of Wald [8]).

At this stage the pixel of the image has received a certain color, based on the primitive hit by the associated light ray, but lights and shadows are not yet supported. A simple way to implement shadows is to generate a shadow ray every time a primary ray hits an object. The origin o of the shadow ray is the intersection point determined for the primary ray: r(t_hit) = o_eye + t_hit · d. The direction d of the shadow ray is defined by the position of a light source in the scene; one shadow ray is generated per light source. If there is no intersection along the shadow ray's path, the light from the specific light source reaches the hit point. If an intersection is found, another object lies in the path between the light source and the hit point. Both cases cause the pixel color to change: it becomes either lighter (no obstacle between the origin and the light source) or darker (an obstacle casting a shadow). Shadow rays do not generate any new shadow or primary rays.

This approach is unable to display any kind of reflection or refraction, because no further rays are generated at the hit points. Also, the rendered shadows are not diffuse, resulting in sharp borders between the shadowed and lit areas.
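The Appel-style loop above can be summarized in a short sketch. The `hit_fn` callback is a hypothetical stand-in for the scene's intersection test, and the shading model (base color scaled by the fraction of reachable lights) is a deliberate simplification:

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def length(v): return math.sqrt(sum(x * x for x in v))
def normalize(v):
    n = length(v)
    return tuple(x / n for x in v)

def cast_primary_ray(eye, direction, hit_fn, lights, background=(0.0, 0.0, 0.0)):
    """Appel-style ray casting: one primary ray per pixel and one shadow
    ray per light source; shadow rays spawn no further rays.

    `hit_fn(origin, direction, t_max)` is a hypothetical scene interface
    returning (t_hit, base_color) for the closest hit, or None."""
    hit = hit_fn(eye, direction, math.inf)
    if hit is None:
        return background
    t_hit, base_color = hit
    point = tuple(e + t_hit * d for e, d in zip(eye, direction))
    # NOTE: a real tracer offsets `point` slightly along the surface normal
    # so the shadow ray does not re-intersect the surface it started on.
    lit = 0.0
    for light in lights:
        to_light = sub(light, point)
        if hit_fn(point, normalize(to_light), length(to_light)) is None:
            lit += 1.0 / len(lights)     # this light source is reachable
    return tuple(c * lit for c in base_color)
```

The sharp shadow borders criticized above are visible directly in the code: each light contributes either fully or not at all, with nothing in between.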

2.2.2 Recursive Ray Tracing

Today's most common approach to ray tracing, involving recursive calls, was first described by Turner Whitted [10] in 1980.

In addition to the ray casting described by Arthur Appel, it was now possible to generate secondary rays accounting for reflections and refractions. If a primary ray intersects an object, not only shadow rays but also reflection and refraction rays are generated. An example is shown in Figure 2.2, where a primary ray hits a glass, which is partly reflective and refractive.

Figure 2.2: Recursive Ray Tracing: Generating secondary rays at t_hit of the primary ray (Courtesy of Wald [8]).

In the left image of Figure 2.2, a reflective secondary ray is generated. The direction d of this ray is determined by the normal vector at the hit point of the primary ray and the reflection material property of the glass. The origin o of the secondary ray is the position of the primary ray at t_hit. The same happens with the refractive secondary ray, for which d is based on the material's refraction property.

Both generated secondary rays act like new primary rays, which means that they can also generate shadow rays when hitting a new object, as well as new secondary rays. The only difference is that the color generated by all secondary rays also affects the color of the primary ray and therefore the pixel in the image.
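The recursion can be sketched as follows. The `scene` interface, the `reflectivity` blending, and the depth limit are illustrative assumptions rather than the thesis's own code; a refraction ray would be spawned analogously using Snell's law:

```python
def reflect(d, n):
    """Mirror direction d about the unit normal n: d - 2(d.n)n."""
    dn = sum(a * b for a, b in zip(d, n))
    return tuple(a - 2.0 * dn * b for a, b in zip(d, n))

def trace(origin, direction, scene, depth, max_depth=4):
    """Whitted-style recursion (sketch): secondary rays behave like new
    primary rays, and their color is blended into the parent ray's color.
    `scene.closest_hit` / `scene.shade_local` are hypothetical interfaces."""
    if depth > max_depth:                 # stop runaway recursion between mirrors
        return (0.0, 0.0, 0.0)
    hit = scene.closest_hit(origin, direction)
    if hit is None:
        return scene.background
    color = scene.shade_local(hit)        # local color, including shadow rays
    if hit.reflectivity > 0.0:
        r_dir = reflect(direction, hit.normal)
        r_col = trace(hit.point, r_dir, scene, depth + 1, max_depth)
        color = tuple(c + hit.reflectivity * rc for c, rc in zip(color, r_col))
    # a refraction ray would be generated the same way, with its direction
    # derived from the material's index of refraction (Snell's law)
    return color
```

The depth limit is a practical necessity: two facing mirrors would otherwise recurse forever.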

2.2.3 Distribution Ray Tracing

Recursive ray tracing already implemented the possibility to generate reflections, refractions, and shadows. A drawback was the fact that neither smooth shadows nor blur or similar effects were supported.

8 CHAPTER 2. BASIC PRINCIPLES OF RAY TRACING

These restrictions were removed by Cook et al. [11], who modeled all these effects with probability distributions, which allows computing e.g. smooth shadows via stochastic sampling. Glossy reflections, for example, can then be computed by stochastically sampling such a distribution and recursively shooting rays into the sampled directions. However, in order to achieve sufficient quality, this technique requires a relatively large number of samples and is usually quite costly.
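As an illustration of this stochastic sampling, the following sketch estimates how much of a rectangular area light is visible from a point by averaging randomly placed shadow rays, which yields soft shadow penumbrae. All names are hypothetical, and `any_hit(origin, target)` stands in for an occlusion test:

```python
import random

def soft_shadow(point, light_corner, light_u, light_v, any_hit, n_samples=64):
    """Estimate the visible fraction of a rectangular area light
    (Cook-style distribution ray tracing). The light is spanned by the
    edge vectors light_u and light_v from light_corner; `any_hit` returns
    True if something blocks the ray from `point` to the sample."""
    visible = 0
    for _ in range(n_samples):
        u, v = random.random(), random.random()
        target = tuple(c + u * du + v * dv
                       for c, du, dv in zip(light_corner, light_u, light_v))
        if not any_hit(point, target):
            visible += 1
    return visible / n_samples   # 0.0 = fully shadowed, 1.0 = fully lit
```

The cost mentioned above is visible here: a usable penumbra needs dozens of samples, i.e. dozens of shadow rays per light for a single hit point.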

2.2.4 Shader Ray Tracing

Until now, all described algorithms generate the color of a ray based on the color and material properties of the objects hit by the ray or by any generated secondary ray. Nowadays, programmable shaders are already an important tool for creating ever more realistic materials and other effects in traditional rasterization techniques. So it seemed straightforward to use the same approach in ray tracing as well, with the extended ability that a shader can also generate new secondary rays. This made it possible to separate the shading process from the actual ray tracing process. Additionally, one shader does not need information from any other shader, so combining shaders is easily possible (e.g. a piece of wood visible through a liquid environment, featuring a wood surface shader combined with a water environment shader).

Using this approach, several different shader classes can be identified: camera, surface, light, environment, volume, and pixel shaders, as described by Wald [8].

Camera Shaders

Camera shaders are responsible for generating and casting primary rays. This enables the programmer to generate different kinds of camera views, e.g. using a shader to simulate a fish-eye lens. Also, all special effects that affect the whole image can be placed within a camera shader (e.g. motion blur, depth of field, etc.).

Surface Shaders

Each object in the scene has its own shader determining what happens when a ray hits it. The surface shader also takes care of generating shadow rays and adjusting its pixel's color value depending on the results. Additional reflection and refraction rays might also be generated by this type of shader.

Light Shaders

A light shader takes care of any shadow ray that reaches the specific light source. This enables the ray tracer to support a wider range of light sources (e.g. different shapes and colors).

Environment Shaders

All rays leaving the scene without hitting any object are handled by the environment shader, which can be used, for example, to simulate a cloudy sky.


Volume Shaders, Pixel Shaders etc.

Volume shaders are used to compute the attenuation of a ray travelling between two objects. That way, different environments like water can be simulated, perhaps by changing a ray's direction when it traverses the water. Pixel shaders can take care of image post-processing (e.g. tone mapping).
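How such a separation into shader classes might look is sketched below for two of them. The class layout, the pinhole mapping, and the sky gradient are illustrative choices, not taken from the thesis:

```python
import math

class CameraShader:
    """Base class: generates and casts primary rays (sketch)."""
    def generate_ray(self, x, y, width, height):
        raise NotImplementedError

class PinholeCamera(CameraShader):
    """A standard pinhole view; a fish-eye lens would simply be
    another subclass with a different pixel-to-direction mapping."""
    def __init__(self, fov_degrees=90.0):
        self.scale = math.tan(math.radians(fov_degrees) / 2.0)

    def generate_ray(self, x, y, width, height):
        # map the pixel center to [-1, 1] x [-1, 1] on an image plane at z = 1
        px = (2.0 * (x + 0.5) / width - 1.0) * self.scale
        py = (1.0 - 2.0 * (y + 0.5) / height) * self.scale
        n = math.sqrt(px * px + py * py + 1.0)
        return (0.0, 0.0, 0.0), (px / n, py / n, 1.0 / n)  # origin, direction

class EnvironmentShader:
    """Colors any ray that leaves the scene without hitting an object."""
    def shade_miss(self, direction):
        t = 0.5 * (direction[1] + 1.0)   # simple vertical sky gradient
        return (1.0 - t, 1.0 - t, 1.0)   # white near the bottom, blue above
```

The ray tracing core only calls these small interfaces, which is exactly the separation of shading from tracing described above.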

2.3 Ray Tracing vs. Rasterization

From the previous pages it may seem that ray tracing has all the advantages on its side: simple algorithms, realistic reflections and refractions, programmable shaders, and much more.

Ray tracing has significant advantages compared to rasterization. Taking a look at the rasterization used by today's graphics cards, it is obvious that most effects (e.g. shadows, reflections, etc.) can only be computed by using programming "tricks". For instance, to compute the shadow of an object, a scene needs to be rendered at least twice, because during the first rendering pass the pixels and vertices do not know whether they are within a shadow or not. Because of that, programming new shader effects becomes more and more complicated and costly for developers. Another drawback is the fact that such shaders are often only approximations of the real effects, which means that the generated images are not physically correct (e.g. reflections). Different shaders also cannot simply be combined in rasterization.

Still, rasterization has its advantages. The technique made graphics cards so cheap that a new mass market was created. All computer games released today use the advantages of fast graphics boards, as well as the simple programming, to bring more realistic virtual worlds to life.

So the question is not whether ray tracing will replace rasterization, but when, and whether there might be some intermediate stage with rasterization and ray tracing working together, each doing what it does best. It is a fact that GPUs are getting faster every year; more importantly, the restrictions in programming them are also diminishing. So GPUs might turn out to be the perfect ray tracing processors in a few years. More information on this topic can be found in Chapters 5 and 6 of this document.

Some interesting insights into the "fight" between rasterization and ray tracing can be found in a transcript of a panel held at SIGGRAPH 2002. Several panelists from NVidia, ATI, Silicon Graphics, and Saarland University discussed the question of when and whether ray tracing is going to replace rasterization [12]. The stated positions are, however, only personal opinions.

Chapter 3

Acceleration Methods

The last chapter showed that ray tracing algorithms are fairly simple copies of what happens in a real room at the moment the lights are turned on. One of the main drawbacks is still the required rendering time. In 1980, Whitted already discovered that about 95% of his computation time was taken up by intersection tests [10]. Therefore, making ray tracing faster is mainly possible by speeding up these tests. Some of the proposed improvements are described on the following pages, starting with a way to reduce the number of rays sent into the scene. Graph theory and trees make up a big proportion of the further improvements, as shown later in this chapter.

3.1 Computing Less Samples in the Image Plane

The first approach to reducing the number of intersection tests is clearly a reduction of the number of primary rays sent into the scene. The following list is far from complete, considering the large number of people working on the acceleration problem.

One method described by Glassner [13] manages to reduce the number of primary rays by sampling the image plane adaptively. Instead of tracing a ray through each pixel of the plane, a fixed spacing is used to subsample the image. The colors of the pixels between the samples are determined via a given heuristic based on the colors of the adjacent pixels. However, this works best if the geometry in the scene is quite large. Highly detailed objects and high-frequency features (e.g. textures) suffer from this method, which might result in certain artifacts, especially in animations, as described by Wald [8].
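A rough sketch of this idea follows; the block-recursive refinement criterion and the flat interior fill are simplifying assumptions (Glassner's actual heuristics are more elaborate). Only the corner pixels of a block are traced; the interior is filled from them if they agree, and the block is subdivided otherwise:

```python
def adaptive_sample(trace_pixel, x0, y0, x1, y1, threshold=0.1, fill=None):
    """Adaptive subsampling (sketch): trace only the four corners of a
    square block; if their colors agree, fill the interior from them,
    otherwise subdivide. `trace_pixel(x, y)` returns a grayscale value."""
    if fill is None:
        fill = {}
    corners = [trace_pixel(x0, y0), trace_pixel(x1, y0),
               trace_pixel(x0, y1), trace_pixel(x1, y1)]
    if max(corners) - min(corners) <= threshold or x1 - x0 <= 1:
        avg = sum(corners) / 4.0          # "interpolation" (here: flat fill)
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                fill[(x, y)] = avg
    else:                                 # corner colors disagree: subdivide
        mx, my = (x0 + x1) // 2, (y0 + y1) // 2
        adaptive_sample(trace_pixel, x0, y0, mx, my, threshold, fill)
        adaptive_sample(trace_pixel, mx, y0, x1, my, threshold, fill)
        adaptive_sample(trace_pixel, x0, my, mx, y1, threshold, fill)
        adaptive_sample(trace_pixel, mx, my, x1, y1, threshold, fill)
    return fill
```

The failure mode described above is easy to see here: a thin, high-frequency feature that touches none of the traced corners is simply averaged away.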

Another method, named „Vertex Tracing“, was first described by Ullmann et al. [14]. They also give up tracing rays through all pixels in the image plane and instead only send rays targeting the visible vertices of the geometry. The colors between these vertices are then interpolated using standard rasterization on the graphics card. This can significantly reduce the number of rays in scenes with simple geometry, but breaks down for highly detailed objects with many triangles. Additionally, this technique shares the problems of the one described by Glassner in the previous paragraph: through interpolation, fine details or high-frequency features might be lost, resulting in poor image quality.


3.2 Reducing the Number of Secondary Rays

So far the number of primary rays has been reduced, but these only make up a small portion of the rays actually traced, namely one per pixel in the image plane. Computing secondary rays to handle reflections and refractions poses a bigger problem, possibly resulting in dozens of secondary rays for each primary ray, depending on the number of lights or reflecting and refracting objects in the scene.

One approach is to reduce the number of shadow rays. These only need to check whether there is an object between their origin and the targeted light source. A caching technique called „Shadow Cache“ was proposed by Haines and Greenberg [15] in 1986. If a shadow ray is not able to reach a certain light source due to an object in the ray's path, the targeted light source remembers this object in a cache. The next ray shot at the same light source is first tested for intersection against the cached object, exploiting the fact that nearby rays are often all occluded by the same object. However, this algorithm quickly breaks down in scenes with highly detailed geometry, because a single triangle is then unlikely to occlude more than one shadow ray.

In 2002 Fernandez et al. [16] proposed „Local Illumination Environments“ (LIE), subdividing the whole scene into a set of voxels. Each voxel stores information on how the different light sources influence this region of space. The LIE voxels and their information need to be precomputed before the actual ray tracing starts, but once this is done the number of shadow rays can be reduced significantly (e.g. by skipping light sources that have no influence in the respective LIE voxel).

Another technique, described by Wald in his PhD thesis [8], focuses on soft shadows, which can require a fairly large number of shadow rays to approximate (see Section 2.2.3). „Single Sample Soft Shadows“ regards all light sources in the scene as point lights, which reduces the number of cast shadow rays to one. This point light is then attenuated depending on how narrowly the ray misses potential occluders. Although the algorithm only approximates the problem, it creates convincing shadows.

More possibilities to reduce shadow rays are described by Wald in his PhD thesis [8].

3.3 Accelerated Intersection Tests

The preface of this chapter referred to the fact that 95% of the ray tracing computation time is spent in intersection tests. The previous sections reduced the number of rays but had no impact on the actual time needed for these tests. The following paragraphs describe common approaches to speeding up the intersection by using certain acceleration structures.

3.3.1 Primitive-specific Tests

There are many known algorithms aiming at fast ray-primitive intersection tests (e.g. ray-triangle, ray-sphere, etc.). As these algorithms are commonly available and not a ray tracing specific problem, they are not described in this document.


3.3.2 Bounding Volumes

Bounding volumes are a simple approach to speeding up intersection tests. Every object in the scene is surrounded by a simple bounding volume (e.g. a box). The first intersection test of a ray is computed against the bounding volumes of the objects in the scene. If the ray does not intersect the volume, the whole object inside can be skipped. The problem is that simple bounding volumes may fail to approximate the shape of the geometry they enclose (e.g. spheres). The ray might still miss the sphere but intersect the bounding volume, creating unneeded intersection tests against the sphere's triangles. Therefore simple bounding volumes on their own are not used in ray tracers.

3.3.3 Spatial Subdivision Schemes

Spatial subdivision techniques divide the three-dimensional space of the scene into a finite number of non-overlapping voxels. Each voxel keeps information on which primitives (e.g. triangles) it contains. The subdivision is performed by taking the scene space and dividing it into two areas. This division can then be applied recursively until a certain division depth is reached, or until the subareas contain no more than a given minimum number of triangles.

If a ray is shot into such a scene, the spatial acceleration structure sequentially iterates through all encountered voxels. All primitives within an encountered voxel need to be intersected with the ray. As soon as an intersection with a primitive is found, the traversal can be terminated, skipping all further voxels.

This works fine as long as a triangle belongs to exactly one spatial region in the grid. But because spatial subdivision divides the space and not the geometry, it is common for a triangle to be part of two adjacent grid regions. In that case early ray termination can produce errors, because intersections with geometry behind the current voxel might be missed (see Figure 3.1). In the shown example, voxel 1 holds a reference to triangle B, because at least a part of it is contained in this voxel. If the ray is intersected with the geometry in voxel 1, it is tested against the complete triangle B, resulting in a hit. If the traversal is stopped there, the intersection with triangle A in voxel 2 is lost, although it occurs before the hit with triangle B. One solution would be to avoid overlapping geometry: a triangle spanning both voxels could simply be divided into two independent triangles, each fully contained in one voxel. However, this is rarely done, because it can generate many more triangles and thus raise the memory requirements.

Solving this problem is straightforward: if a ray hits a primitive that is contained in two voxels, additional intersections have to be computed by also intersecting the ray with all primitives contained in both voxels. However, this can result in duplicate computations (e.g. intersecting triangle B twice). To avoid this, a technique called „mailboxing“ was introduced by Amanatides and Woo in 1987 [17]. A unique id number is assigned to each ray. When a triangle is tested for intersection with a ray, it remembers the ray's id number. During the next intersection test with the same triangle, it is first checked whether the stored id matches the one of the current ray. If it does, the intersection test can be skipped, because it has already been computed.
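A minimal sketch of mailboxing, with the actual ray-triangle test replaced by a placeholder that merely records how often it runs: a triangle referenced by several voxels is intersected at most once per ray.

```python
# Sketch of mailboxing: each triangle stores the id of the last ray it
# was tested against, so duplicate tests for the same ray are skipped.

class Triangle:
    def __init__(self, name):
        self.name = name
        self.last_ray_id = None   # the "mailbox"

tests_performed = []

def intersect(triangle, ray_id):
    # Placeholder for the actual ray-triangle intersection test.
    tests_performed.append((triangle.name, ray_id))
    return False

def intersect_with_mailbox(triangle, ray_id):
    if triangle.last_ray_id == ray_id:
        return None               # already tested for this ray: skip
    triangle.last_ray_id = ray_id
    return intersect(triangle, ray_id)

tri_b = Triangle("B")
# Ray 7 traverses two voxels that both reference triangle B:
for voxel_triangles in ([tri_b], [tri_b]):
    for tri in voxel_triangles:
        intersect_with_mailbox(tri, ray_id=7)
```

After the loop, the expensive test has run only once for triangle B, even though two voxels referenced it.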


Figure 3.1: Spatial Subdivision: Triangle B belongs to multiple voxels. Stopping the ray traversal early would mean missing the intersection with triangle A (Courtesy of Thrane [20]).

However, „mailboxing“ creates problems in multithreaded implementations. If rays are traversed through the spatial structure in multiple threads, remembering the last ray id in the triangle can be pointless, because another ray may be tested for intersection in between, trashing this kind of caching, as described by Wald [8]. A solution to this problem is „hashed mailboxing“, which is somewhat less efficient but preferable if many threads are used or memory is scarce [18].

Only two spatial subdivision approaches, uniform grids and kd-trees, are presented on the next pages, as these are the most commonly used.

Uniform Grid

The uniform grid was first described by Fujimoto et al. in 1986 [19] and follows the idea of spatial subdivision schemes as described before. Before the grid's construction can start, a resolution for all three axes of the grid has to be determined. The best parameters for this resolution depend on the scene geometry. More voxels mean that there are only a few triangles to intersect per voxel, but this causes longer grid traversal; fewer voxels result in more intersection tests, due to more triangles contained in each voxel. Several different heuristics exist, described by Thrane et al. [20], but it is still necessary to tweak the resolution by hand depending on the scene.

To traverse the grid, the 3D version of the 2D line drawing algorithm is used, known as the 3D digital differential analyzer (DDA) algorithm (see Figure 3.2). Examples are described in detail by Fujimoto et al. [19] and Thrane et al. [20]. Christen [21] also included several code examples in his Diploma thesis.
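The core of the 3D-DDA can be sketched compactly (in the style of Amanatides and Woo, but simplified: unit cells, no intersection testing, just the voxel walk). Starting from the voxel containing the ray origin, the ray steps along whichever axis has the closest upcoming voxel boundary.

```python
# Sketch of 3D-DDA grid traversal: yields the voxels a ray visits, in order.

def traverse_grid(origin, direction, grid_res, cell_size=1.0):
    """Yield the integer voxel coordinates visited by the ray."""
    x, y, z = (int(o // cell_size) for o in origin)
    step, t_max, t_delta = [], [], []
    for o, d, v in zip(origin, direction, (x, y, z)):
        if d > 0:
            step.append(1)
            t_max.append(((v + 1) * cell_size - o) / d)   # next boundary
            t_delta.append(cell_size / d)                 # cell-to-cell step
        elif d < 0:
            step.append(-1)
            t_max.append((v * cell_size - o) / d)
            t_delta.append(-cell_size / d)
        else:
            step.append(0)
            t_max.append(float("inf"))
            t_delta.append(float("inf"))
    pos = [x, y, z]
    while all(0 <= p < r for p, r in zip(pos, grid_res)):
        yield tuple(pos)
        axis = t_max.index(min(t_max))   # step across the closest boundary
        pos[axis] += step[axis]
        t_max[axis] += t_delta[axis]
```

For example, `list(traverse_grid((0.5, 0.5, 0.5), (1.0, 0.0, 0.0), (3, 3, 3)))` walks straight along the x axis through three voxels. In a real tracer, each yielded voxel's primitives would be intersection-tested before stepping on.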

KD-Trees

As the name suggests, kd-trees are a variant of spatial subdivision structures arranged in a tree, more precisely a binary tree. The primitives (e.g. triangles) of the scene's geometry are stored in the tree's leaves. This specialization of binary trees was first described by Bentley in 1975 [22].


Figure 3.2: Uniform grid: Traversal of a uniform grid (Courtesy of Havran [9]).

Constructing a kd-tree begins with a bounding box surrounding the whole scene and its triangles. First a splitting plane is chosen, dividing the bounding box in two, which creates two child nodes for the tree. Each primitive of the original box is assigned to the child node whose box now contains it; primitives contained in both new boxes are referenced in both child nodes. This procedure is repeated recursively until certain criteria are met: either a set maximum tree depth has been reached, or the number of primitives in a node has dropped below a certain threshold.

Choosing the position of the splitting plane is the main problem during the tree's construction. Several approaches to calculating the optimal bounding box division have been taken; some are described by Thrane et al. [20]. Extensive research into this problem has also been done by Havran in his PhD thesis [9]. An example construction is shown in Figure 3.3.
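The recursive construction can be sketched with the simplest possible splitting rule, the spatial median of the longest axis. This is only an illustration; production tracers use cost heuristics such as the surface area heuristic instead. The dict/list node layout is an assumption made here for readability.

```python
# Sketch of recursive kd-tree construction over primitive AABBs,
# using a spatial-median split of the longest axis.

MAX_DEPTH = 8
MIN_PRIMS = 2

def build_kdtree(prims, bounds, depth=0):
    """prims: list of (min_corner, max_corner) AABBs; bounds: (lo, hi)
    3-tuples. Returns a dict for inner nodes, a list for leaves."""
    if depth >= MAX_DEPTH or len(prims) <= MIN_PRIMS:
        return prims                                   # leaf node
    lo, hi = bounds
    axis = max(range(3), key=lambda a: hi[a] - lo[a])  # longest axis
    split = 0.5 * (lo[axis] + hi[axis])
    left_hi = list(hi); left_hi[axis] = split
    right_lo = list(lo); right_lo[axis] = split
    # A primitive straddling the plane is referenced in both children.
    left = [p for p in prims if p[0][axis] < split]
    right = [p for p in prims if p[1][axis] > split]
    return {
        "axis": axis, "split": split,
        "left": build_kdtree(left, (lo, tuple(left_hi)), depth + 1),
        "right": build_kdtree(right, (tuple(right_lo), hi), depth + 1),
    }
```

Note how the termination criteria from the text appear directly as `MAX_DEPTH` and `MIN_PRIMS`, and how a straddling primitive ends up referenced by both children, which is exactly what makes mailboxing useful during traversal.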

Figure 3.3: KD-Tree: Three steps of the tree construction (Courtesy of Havran [9]).


Traversing the tree starts at a given node N (e.g. the root node). If N is a leaf node, all primitives referenced in the leaf are tested for intersection with the ray. If N is an internal node, the child node first intersected by the ray is visited recursively. If an intersection is found there, it can be returned as the nearest one. If no intersection was detected in the sub-tree, the recursion backtracks and visits the other child node recursively.
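This near/far recursion can be sketched as follows. The node layout (dicts for inner nodes, lists of primitives for leaves) and the `hit_leaf` callback, which tests a leaf's primitives and returns a hit or `None`, are illustrative assumptions, not the thesis's data structures.

```python
# Sketch of recursive kd-tree traversal: visit the child on the ray-origin
# side first; descend into the far child only if the ray crosses the
# splitting plane inside the current [t_min, t_max] interval.

def traverse(node, origin, direction, t_min, t_max, hit_leaf):
    if isinstance(node, list):            # leaf: test its primitives
        return hit_leaf(node)
    axis, split = node["axis"], node["split"]
    near, far = ((node["left"], node["right"])
                 if origin[axis] < split else (node["right"], node["left"]))
    d = direction[axis]
    t_split = (split - origin[axis]) / d if d != 0 else float("inf")
    if t_split < 0 or t_split > t_max:
        return traverse(near, origin, direction, t_min, t_max, hit_leaf)
    if t_split < t_min:
        return traverse(far, origin, direction, t_min, t_max, hit_leaf)
    hit = traverse(near, origin, direction, t_min, t_split, hit_leaf)
    if hit is not None:
        return hit                        # nearest hit found: done
    return traverse(far, origin, direction, t_split, t_max, hit_leaf)
```

Because the near child is searched first and its interval ends at the splitting plane, the first hit returned really is the nearest one, which is what allows the early return.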

3.3.4 Bounding Volume Hierarchies

Bounding volume hierarchies (BVH) differ from spatial subdivision techniques by dividing the objects, not the scene space. As described in Section 3.3.2, simple bounding volumes enclose a certain piece of geometry. Intersection testing against such a bounding volume is easier and faster than testing against the enclosed primitives, and if a ray does not intersect the bounding volume, it also cannot intersect any primitive inside. The main advantage of BVHs is that the bounding boxes are faster to test for intersection than the included geometry, compared to uniform grids or kd-trees, which always need to test the triangles in the current ray area. As Thrane et al. point out [20], in practice the most widely used bounding volume for BVHs is the axis-aligned bounding box (AABB). The AABB makes up for its loose fit by allowing fast ray intersection, and it is also a good structure in terms of simplicity of implementation. Glassner gives an overview of studies on more complex shapes [13].
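The fast ray/AABB test mentioned above is the classic slab test, the operation a BVH performs at every node to decide whether a whole subtree can be skipped. A minimal sketch (assuming nonzero direction components, so the reciprocal is finite):

```python
# Sketch of the slab test for ray/AABB intersection: intersect the ray's
# parameter interval with each axis-aligned slab; a miss shows up as an
# empty interval.

def hit_aabb(origin, inv_dir, box_min, box_max):
    """inv_dir holds 1/d per axis (precomputed once per ray)."""
    t_near, t_far = 0.0, float("inf")
    for o, inv, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t0 = (lo - o) * inv
        t1 = (hi - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0       # ray points in the negative direction
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False          # slab intervals do not overlap: miss
    return True
```

Precomputing `inv_dir` once per ray replaces three divisions per box with three multiplications, which is part of why the AABB's loose fit is an acceptable trade-off.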

The construction of a BVH depends on the contents of the scene (similar to kd-trees). A good overview and pseudo code listing are given by Thrane et al. [20], using the traversal cost of a BVH as a quality criterion: if traversal is cheap, many intersection tests can be skipped very quickly. The traversal itself is done using recursive descent, similar to kd-trees, with the difference that all child nodes need to be investigated, because the created bounding volumes can overlap each other. The child nodes in the tree also follow no sorting by default; however, whether sorting really improves the intersection time is still not clear, and both sides have their supporters.

An example tree for the rendering of a solid cow is displayed in Fig. 3.4. The created bounding volumes (spheres in this case) dividing the cow are shown in Fig. 3.5. Each image corresponds to a level of the BVH tree.

Figure 3.4: BVH: Tree for cow example (Courtesy of Somchaipeng [23]).

Figure 3.5: BVH: A solid cow and the levels in its bounding volume hierarchy (Courtesy ofSomchaipeng [23]).

Chapter 4

Acceleration Methods for Dynamic Scenes

The acceleration methods mentioned in Chapter 3 were developed in the early years of ray tracing and are especially designed to render static scenes. This results in certain restrictions for interactive ray tracing, as the rendered scenes are only suitable for static walkthroughs. All described acceleration structures need to be re-created every time the scene changes (e.g. due to unpredictable user interaction), in the worst case for every frame, rendering them unfeasible due to their high construction costs. This is a major disadvantage for applications like interactive simulations or games, which need to react to user input.

Recent research has concentrated on this aspect of ray tracing acceleration. Different approaches are presented on the next pages, as well as an independently developed benchmark to stress-test available ray tracers, creating a common performance basis.

4.1 Dynamic Bounding Volume Hierarchies

The first approach was presented by Wald et al. [24] in early 2006. They used a bounding volume hierarchy (BVH) as described in Section 3.3.4 to render a „deformable“ scene (see Fig. 4.2). Deformable scenes include moving triangles, but no triangles are split, created, or destroyed over time. The entire scene is ray traced using a single BVH whose topology is constant for the whole animation. Figure 4.1 shows two child nodes of the BVH tree. When the objects move, the BVH can keep the same hierarchy and only needs to update the bounding volumes. Though the updated hierarchy may not be as good as the old one, it will always be correct. In contrast to spatial subdivision structures (e.g. uniform grids, kd-trees), the BVH subdivides the object hierarchy, which is more robust over time than a given subdivision of space. As a result, a BVH can be quickly updated between frames, avoiding a complete per-frame rebuild.
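The per-frame update, often called a refit, can be sketched as a bottom-up recomputation of bounds over a fixed tree topology. The dict-based node layout here is an illustrative stand-in, not the memory layout used by Wald et al.

```python
# Sketch of a BVH refit: keep the tree topology, recompute all bounds
# bottom-up from the (possibly moved) triangle vertices.

def aabb_of(points):
    return (tuple(min(p[i] for p in points) for i in range(3)),
            tuple(max(p[i] for p in points) for i in range(3)))

def merge(a, b):
    return (tuple(min(x, y) for x, y in zip(a[0], b[0])),
            tuple(max(x, y) for x, y in zip(a[1], b[1])))

def refit(node, triangles):
    """node: {'tris': [...]} for leaves, {'children': [...]} for inner
    nodes; triangles maps an index to its list of vertex positions."""
    if "tris" in node:
        node["bounds"] = aabb_of([v for t in node["tris"]
                                  for v in triangles[t]])
    else:
        boxes = [refit(c, triangles) for c in node["children"]]
        b = boxes[0]
        for other in boxes[1:]:
            b = merge(b, other)
        node["bounds"] = b
    return node["bounds"]
```

Because only bounds change, the refit is linear in the number of nodes, which is what turns a seconds-long rebuild into the millisecond-scale update reported below.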

Before going into more detail on how a BVH can be used for a dynamic scene, Wald et al. first described how BVHs can be made faster for static scenes, since kd-tree implementations had so far remained superior in speed to BVHs.


Figure 4.1: Dynamic BVH: Two child nodes of a BVH tree. Bounding volumes move over time (left to right) (Courtesy of Wald [24]).

In order to keep their algorithm open for user-based interactions, they did not base their approach on knowledge of all possible deformations of a model. Times of up to about 30 seconds per frame were measured for a complex scene when the BVH was rebuilt for every frame. Applying the described dynamic update, these times could be reduced to about 0.018 seconds per frame. The shown scene was ray traced at 3.7 frames per second on a dual-2.6 GHz Opteron desktop PC, including shadows and texturing.

Figure 4.2: Dynamic BVH: Rendered scene using 174,000 triangles, rendering at 1024x1024 pixels (Courtesy of Wald [24]).

4.2 Coherent Grid Traversal

Another possibility to accelerate ray tracing was also presented by Wald et al. [25] early in 2006. They used a uniform grid together with ray packets, frustum testing and SIMD¹ extensions.

¹ SIMD, „Single Instruction Multiple Data“, is a set of operations for efficiently handling large quantities of data in parallel, as in a vector processor or array processor. First popularized in large-scale supercomputers (as opposed to MIMD parallelization), smaller-scale SIMD operations have now become widespread in personal computer hardware. Today the term is associated almost entirely with these smaller units.


A key feature of this algorithm is to exploit a packet of rays with almost the same direction. These coherent rays traverse the grid as one single packet, because the grid cells they visit and intersect are most likely the same. The algorithm first computes the packet's bounding frustum (see image a in Figure 4.3), which is then traversed through the grid one slice at a time (see image b). For each slice (blue), the frustum's overlap with the slice (yellow) is computed, which determines the actual cells (red) overlapped by the frustum. Picture c shows that each frustum traversal step requires only one four-float SIMD addition to incrementally compute the minimum and maximum coordinates of the frustum-slice overlap, plus one SIMD float-to-int truncation to compute the overlapped grid cells. Viewed down the major traversal axis (see d), each ray packet (green) has corner rays, which define the frustum boundaries (dashed). At each slice, this frustum covers all of the cells covered by the rays.
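The incremental slice computation can be illustrated in scalar Python (the real algorithm does the same arithmetic with SIMD instructions): per slice along the major axis, the frustum's min/max cross-section coordinates are advanced by a constant increment, and the overlapped cell range is just their truncation. Function name and parameters are illustrative assumptions.

```python
# Sketch of per-slice frustum/grid overlap: advance the frustum's min/max
# bounds by constant increments and truncate to cell indices per slice.

def frustum_slices(lo, hi, d_lo, d_hi, n_slices, cell=1.0):
    """lo/hi: frustum bounds in the first slice (the two minor axes);
    d_lo/d_hi: per-slice increments. Yields (min_cell, max_cell) pairs."""
    for _ in range(n_slices):
        yield (tuple(int(v // cell) for v in lo),
               tuple(int(v // cell) for v in hi))
        lo = tuple(a + b for a, b in zip(lo, d_lo))    # one "SIMD add"
        hi = tuple(a + b for a, b in zip(hi, d_hi))
```

Every ray in the packet is guaranteed to stay inside these cell ranges, so only the frustum, not the individual rays, has to be stepped through the grid.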

Figure 4.3: Coherent Grid Traversal: Computation steps (Courtesy of Wald [25]).

The computed frustum is then used to improve the triangle intersection. As shown in Figure 4.4, a grid (see b) does not adapt as well to the scene geometry as a kd-tree (see a). This causes the grid to often intersect triangles (red) that a kd-tree would have successfully avoided. These triangles, however, usually lie far outside the view frustum and can be inexpensively discarded by inverse frustum culling during frustum-triangle intersection.

Apart from these basics, the paper gives more detailed information on the re-creation of the grid for every frame, due to the dynamic nature of the supported scenes.


Figure 4.4: Coherent Grid Traversal: Adaptation to the scene geometry of a kd-tree (a) and a grid (b) (Courtesy of Wald [25]).

4.3 Distributed Interactive Ray Tracing

Another approach, created by Wald et al. [26], involved a scene graph similar to the ones used with OpenGL. This method separates the scene into independent objects with common properties concerning dynamic updates. Three classes of objects were identified: static objects are treated as usual, objects undergoing affine transformations are handled by transforming rays, and objects with unstructured motion are rebuilt whenever necessary.

The approach is based on the observations made by Lext et al. [28] of how dynamic scenes behave: large parts of a scene often remain static over long periods of time; other parts undergo well-structured transformations like affine transforms; yet other parts change in a totally unstructured way. This common structure can be exploited by keeping geometry in separate objects according to its dynamic properties and handling the various kinds of motion with different, specialized algorithms that are then combined into a common architecture. Each object can consist of an arbitrary number of triangles; it has its own acceleration structure and can be updated independently of the rest of the scene. Of course an additional top-level acceleration structure must then be maintained, which accelerates ray traversal between the objects in a scene. Each ray first traverses this top-level structure. As soon as a leaf is found, the ray is intersected with the objects in the leaf by simply traversing the respective objects' local acceleration structures.
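The "transform the ray, not the geometry" trick for affinely transformed objects can be sketched as follows. For brevity only a translation is shown (a full implementation would apply an inverse 4x4 matrix); the `Instance`/`UnitSphere` classes are illustrative, not from the cited paper.

```python
# Sketch of intersecting an affinely transformed object by moving the
# ray into the object's local space, so its local acceleration structure
# and geometry can be reused unchanged.

class Instance:
    def __init__(self, obj, translation):
        self.obj = obj                 # object with a local-space intersect()
        self.translation = translation

    def intersect(self, origin, direction):
        # Inverse of a translation is the negated translation; the ray
        # direction is unaffected by a pure translation.
        local_origin = tuple(o - t for o, t in zip(origin, self.translation))
        return self.obj.intersect(local_origin, direction)

class UnitSphere:
    """Local-space object: sphere of radius 1 centered at the origin."""
    def intersect(self, o, d):
        # Solve |o + t*d|^2 = 1 with d normalized: t^2 + 2bt + c = 0.
        b = sum(oc * dc for oc, dc in zip(o, d))
        c = sum(oc * oc for oc in o) - 1.0
        disc = b * b - c
        if disc < 0:
            return None
        return -b - disc ** 0.5        # nearest t along the ray
```

A forest of identical trees then needs only one `UnitSphere`-style object plus many cheap `Instance` records, which is exactly the memory saving described for instanced objects below.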

An example can be seen in Figure 4.5, where the robots are divided into different parts, coded in different colors. Each part is represented by its own acceleration structure, integrated into a top-level structure (see Figure 4.6).

In the given example (see Figure 4.6) a close relative of the kd-tree, the binary space partitioning (BSP) tree, is used. A top-level BSP tree contains references to the instances of the objects; in a second level each sub-object is again represented by its own BSP tree. Another advantage of this structure is that equal objects only need to be loaded once, as the top-level BSP works purely with references (e.g. a forest of hundreds of identical trees is represented by a single instance), reducing memory consumption.

The paper especially focuses on the problem of ray tracing in a distributed environment,i.e. one master server sending ray tracing data to all connected client PCs for computation.


Figure 4.5: Distributed Ray Tracing: Robots (left) with color-coded objects (right). Triangles of the same object share the same color (Courtesy of Wald [26]).

Figure 4.6: Distributed Ray Tracing: Two-level hierarchy with a top-level BSP tree containing references to instances. Objects consist of geometry and a local BSP tree (Courtesy of Wald [26]).

The bottleneck was therefore mainly located in the communication between the different clients and the master server. However, the combination of different acceleration structures into some sort of scene graph showed promising improvements and possibilities.

4.4 Benchmarking Animated Ray Tracing

When reading the papers on new approaches to accelerating ray tracing, especially for dynamic scenes, it is a challenge to compare the presented results. To test their approaches, most researchers create their own, often unique, test scenes and measure the frame rate. The problem is to judge the significance of these results when the next paper offers different test scenes, revealing drawbacks that did not show up in the first ones.

To remedy this problem, a benchmark for animated ray tracing (BART) was proposed by Lext et al. [28] to measure and compare the performance and quality of animated ray traced scenes. BART is a suite of test scenes, placed in the public domain, designed to stress ray tracing algorithms, in which both the camera and the objects are animated parametrically. It also defines rules on how to measure performance and, when approximating algorithms are used, the error in the rendered images.

Previously there has only been one recognized benchmark, the „Standard Procedural Database“ (SPD) created by Haines [29] in 1987. With ray tracing focused on static scenes in those days, the benchmark also primarily targeted single static images and walkthroughs.

To construct a widely usable benchmark, Lext et al. first identified what stresses existing ray tracing algorithms and thus decreases performance. The goal was to implement each of these potential stresses in the benchmark, resulting in the scenarios described in the following list.

• Hierarchical animation using translation, rotation, and scaling

• Unorganized animation (i.e. not just combinations of translations, rotations, scalings)

• „Teapot in the stadium“ problem

• Low frame-to-frame coherency

• Large working-set sizes

• Overlap of bounding volumes or overlap of their projections

• Changing object distribution

Figure 4.7: BART: Images from „kitchen“ (left) and „robots“ (right) (Courtesy of Lext [28])

Figure 4.8: BART: Images from „museum“ (Courtesy of Lext [28])

Lext et al. also explain why they consider these scenarios best suited to stress-test many different ray tracing algorithms. Test scenes targeting these problems have been created, called „kitchen“, „robots“, and „museum“ (see Figures 4.7, 4.8). All the needed data, as well as sample parsers and additional source code, is available for download at the BART homepage [30].

Chapter 5

Approaches to Realize Realtime Ray Tracing

Now, with the basics and acceleration structures described in Chapters 2, 3 and 4, the next pages shed some light on existing approaches to realizing realtime ray tracing. These implementations differ not only in the used algorithms (e.g. acceleration structures), but most importantly in the needed hardware. Approaches relying solely on CPU computing power are described, as well as CPU-GPU hybrids and pure GPU implementations. The chapter concludes with information regarding some special purpose hardware architectures created solely for the purpose of tracing rays.

5.1 Software Approaches

Realtime ray tracing systems running on the CPU are common. In order to run them at interactive frame rates, two different problems have to be solved. First, the best ray tracing algorithm and acceleration structure need to be chosen, paying careful attention to implementing them optimally on the given hardware. Second, even the best algorithm on a single CPU cannot deliver frames at interactive rates today. To do that, the parallel nature of the algorithm must be exploited by using a shared-memory architecture, a cluster of PCs, or multi-CPU computers working together.

The ray tracing algorithm can trivially be parallelized; the problems start with communication and synchronization. As shared-memory systems support fast inter-processor communication and synchronization with little programming effort, the first parallel ray tracing approaches were implemented on such systems. The race to achieve interactive frame rates was won in 1995 by Muuss [31]. A full-featured ray tracer with shadows, reflections, refractions and shaders was developed in 1999 at the University of Utah by Parker et al. [32].

However, the used shared-memory systems are quite costly and therefore only in limited use, e.g. at universities. Small companies cannot afford them and have to rely on standard PCs and PC clusters, as these are readily available and cheap. Compared to the described supercomputers, such systems have serious drawbacks when it comes to inter-processor communication.


Additionally, they have less memory, less communication bandwidth, and higher latency. The first implementation using PC clusters was completed at Saarland University by Wald et al. in 2001 [33]. Wald used a client-server approach to overcome the small bandwidth offered by the cluster's PCs. The server does not compute any data by itself, but solely handles the distribution of image parts and geometry to its clients. The limited client memory posed its own unique problems when it came to rendering massively complex models consisting of several gigabytes of data. To avoid copying the complete geometry to all clients, a transparent software caching layer was implemented, loading data from the server on demand (see Section 8.1.2). More information can be found in Wald's PhD thesis [8].

5.2 Using Programmable GPUs

Graphics cards gained support for programmable shaders a few years ago, in an effort to increase the realism of their renderers. This led to a great amount of flexibility, transforming the fast GPU into a parallel co-processor. Since then, programmers have tried to exploit the computing power of the GPU for purposes other than those its designers originally intended (e.g. Fast Fourier Transforms). Many examples of such implementations can be found at the GPGPU homepage [34]. With dual and quad GPU solutions entering the market (NVidia SLI [35], ATI Crossfire [36]), the computing power can be doubled or quadrupled easily. However, programming the GPU is still severely limited compared to the freedom of working on the CPU, although these constraints diminish with each new shader model and graphics card generation.

In 2002 Carr et al. [37] pursued the idea of using the GPU for ray-triangle intersections. Due to the limits in programming the GPU at that time, especially regarding flow control, they used a CPU-GPU hybrid implementation. To „feed“ the GPU with the necessary intersection data, the CPU reorganizes the rays into efficient structures, because the ray tracing algorithm performs best if intersection testing is done for groups of coherent rays. In the end the frame rates rivaled single-CPU ray tracing implementations of that time. However, a lot of performance was lost because the approach required too much communication between CPU and GPU in both directions, which often does not pay off due to the high communication cost. For example, the cost of sending the data for a ray over the PCI bus is rather high compared to just performing the intersection on the CPU itself. Even worse, the traversal algorithm on the CPU depends on the results of the intersection computations, requiring a read-back from the GPU, which is both rather slow and has very high latencies [8].

In 2004 Purcell et al. [38] [39] managed to map the complete ray tracing algorithm to the GPU, without the need to run any part of the code on the CPU. Viewing the GPU as a stream processor, they subdivided the ray tracing process into streams and kernels. Streams are regarded as flows of data one can read from or write to; the processing work is done by the kernels, each of which has an input and an output stream. Several kernels are used in a row to perform the ray tracing algorithm. Still, their conclusion was that ray tracing on the GPU is not much faster than equivalent implementations on a CPU.

Ray tracing on the GPU, especially the approach used by Purcell et al. [39], is described in more detail in Chapter 6 of this thesis.


5.3 Special purpose Hardware Architectures

Similar to today's graphics cards produced by NVidia or ATI, creating an accelerator card especially designed to run the ray tracing algorithm has also been considered. Several approaches developed over the years are presented on the following pages.

One of the first accelerator cards is named VIZARD (Visualization Accelerator for Real-Time Display), developed in 1998 at the University of Tübingen in Germany. The main aim of Meißner et al. [41] was to accelerate real-time volume rendering, which is often used in medical and scientific applications; rendering such sampled data is still a challenging task for the CPU. The ray tracing algorithm was simplified to pure ray casting: only primary rays are generated, without any secondary rays and therefore without reflections and refractions. Shading was not implemented either. However, it was possible to define cut-planes to look „inside“ the volume data, which is especially useful for medical applications. An example rendering of a lobster can be seen in Figure 5.1. A rate of 10 frames per second could be reached for datasets containing 256³ (16,777,216) voxels, casting 256² (65,536) rays.

Figure 5.1: VIZARD: A rendered volume dataset of a lobster (Courtesy of Meißner [41]).

A year later, in 1999, Mitsubishi Electric developed the VolumePro Real-Time Ray-Casting System [42]. Interactive rendering of volume datasets was also a main goal of this accelerator card. In addition to visual improvements, the main advantage over VIZARD was VolumePro's higher rendering speed, reaching up to 30 frames per second (see Figure 5.2).

A more general approach was taken in 2002 with the 3DCGiRAM architecture. Created by IBM Japan in conjunction with several local universities, 3DCGiRAM featured interactive ray tracing of a 3D scene, including reflections and refractions [43]. Running at 333 MHz, simulated frame rates of about 18 frames per second were measured.

A promising approach, called SaarCOR, has also been created at Saarland University in Germany. In the first paper, presented in 2004 [44], Schmittler et al. showed that realtime ray tracing was possible using an acceleration card running at 90 MHz (later at 66 MHz).

Figure 5.2: VolumePro: Rendered medical volume data (Courtesy of Mitsubishi Electric [42]).

The advantage compared to the previously described approaches is SaarCOR's general usability. Supporting all kinds of rays (primary, secondary, shadow etc.) as well as texturing and programmable shaders, it offers all features needed in modern computer graphics. Developed in conjunction with OpenRT (see Chapter 7), SaarCOR can easily be programmed in a common shader language style. More information can be found in a paper by Woop et al. [45] and in the PhD thesis of J. Schmittler, which describes the hardware architecture in great detail [46].

Figure 5.3: SaarCOR: Scenes rendered at 4.5 and 7.5 fps on the 66 MHz prototype (Courtesy of Woop [45]).

Chapter 6

Using the GPU for Ray Tracing

A short introduction describing several possibilities to use the GPU for ray tracing has already been given in Section 5.2. On the following pages the approach developed by Purcell et al. [38] [39] will be described in more detail. So far it has been the most successful attempt to map the entire ray tracing algorithm onto the graphics card. After a description of the basics, the next step is to choose a fitting acceleration structure (for an overview of these structures see Chapter 3). The criteria are somewhat different compared to CPU implementations, because several limitations in programming the GPU make some structures more or less useful. At the end of this chapter the CPU and GPU approaches are compared using some benchmarks.

6.1 Stream Computation

Stream computation as described by Purcell et al. [38] [39] is a specific way to abstract the GPU in order to program it. The stream programming model constrains the way software is written, such that locality and parallelism are explicit within a program. This model consists of programs called kernels and separate data streams (see Figure 6.1). Computation is carried out by arranging input data in a stream and feeding this stream to a number of processors, each executing a kernel on the stream elements one by one. The results of each kernel invocation are placed in an output stream.

Figure 6.1: The stream programming model (Courtesy of Purcell [39]).
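
The model can be illustrated with a small software sketch (the kernels shown are toy examples, not Purcell's actual fragment programs):

```python
def run_kernel(kernel, *streams):
    """Apply `kernel` to the stream(s) element by element. Each
    invocation sees only its own elements, so all invocations are
    independent and could run in parallel on separate processors."""
    return [kernel(*elems) for elems in zip(*streams)]

# two toy kernels chained, as in Figure 6.1
scale = lambda x: 2.0 * x
offset = lambda x: x + 1.0

out = run_kernel(offset, run_kernel(scale, [1.0, 2.0, 3.0]))
```

Because a kernel invocation may not read or write arbitrary other stream elements, the hardware is free to schedule the invocations in any order.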

An easy-to-use stream programming language for the GPU is called BrookGPU [47]. Its main advantage is that any implementation written in Brook can be compiled either for normal CPUs or for GPUs, without the need for reprogramming. Purcell pointed out that his ray tracing approach was recently reimplemented in BrookGPU within only a couple of days.

In order to apply the stream programming model to the GPU, a closer look must be taken at the processing pipeline of a graphics card (see Figure 6.2). The grey boxes in the image show where the programmable vertex and fragment engines are located. The input and output stream data types at each stage are also shown.

Figure 6.2: The programmable graphics pipeline (Courtesy of Purcell [39]).

The vertex processor of a graphics card is a programmable unit that operates on incoming vertex attributes, such as position, color, texture coordinates and so on. The vertex program stage is generally used to transform the vertices from model coordinates to screen coordinates using matrix multiplication.

After the rasterization is complete, the fragment processor is called. In addition to the capabilities of the vertex processor, the fragment processor offers texture operations to access images. This stage is generally used to modify the color of each fragment with texture mapping or other mathematical operations.

A current state-of-the-art graphics card from NVidia, the GeForce 7800 GTX, features 24 fragment and 8 vertex pipelines. As the numbers indicate, the main processing power of the GPU lies in the fragment processors. Therefore the kernels of the stream programming model are implemented as fragment programs, with input and output streams realized as textures.

According to the stream programming model, the ray tracing algorithm has been broken down by Purcell into four different kernels (see Figure 6.3). The eye ray generator kernel produces

a stream of viewing rays. Each viewing ray is a single ray corresponding to a pixel in the image. The traversal kernel reads the stream of rays produced by the eye ray generator. It then steps a ray through the grid until the ray encounters a voxel containing triangles. The ray and voxel address are placed in the output stream and passed to the intersection kernel. This kernel is responsible for testing a ray against all the triangles contained in a voxel. The intersector has two types of output. If a ray-triangle intersection (hit) occurs in that voxel, the ray and the triangle that is hit are sent to the output for shading. If no hit occurs, the ray is passed back to the traversal kernel and the search for voxels containing triangles continues. The shading kernel computes a color. If a ray terminates at this hit, the color is written to the accumulated image. Additionally, the shading kernel may generate shadow or secondary rays. In this case, these new rays are passed back to the traversal stage.

Figure 6.3: The kernels used in the streaming ray tracer (Courtesy of Purcell [39]).
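
The dataflow between the four kernels can be sketched as follows (a toy software model with invented stub kernels; the real kernels are fragment programs operating on textures):

```python
from collections import namedtuple

# toy ray: 'pixel' is the target pixel, 'depth' stands in for the
# traversal state (how far the ray has stepped through the grid)
Ray = namedtuple("Ray", "pixel depth")

def render(gen, traverse, intersect, shade, npixels):
    """Dataflow of the streaming ray tracer: rays circulate between the
    traversal and intersection kernels until they hit something or leave
    the grid; the shading kernel writes colors and may emit new rays."""
    image = {}
    rays = gen(npixels)                   # eye ray generator
    while rays:
        nxt = []
        for ray in rays:
            voxel = traverse(ray)         # step to the next full voxel
            if voxel is None:             # ray left the grid: background
                image.setdefault(ray.pixel, 0.0)
                continue
            hit = intersect(ray, voxel)
            if hit is None:               # miss: back to the traverser
                nxt.append(Ray(ray.pixel, ray.depth + 1))
                continue
            color, secondary = shade(ray, hit)
            image[ray.pixel] = color      # accumulate into the image
            nxt.extend(secondary)         # shadow/secondary rays
        rays = nxt
    return image

# stub kernels: every ray hits a triangle in the second voxel it visits
gen = lambda n: [Ray(p, 0) for p in range(n)]
traverse = lambda r: ("voxel", r.depth) if r.depth < 3 else None
intersect = lambda r, v: "tri" if r.depth == 1 else None
shade = lambda r, h: (1.0, [])            # constant color, no new rays
```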

Eye Ray Generator

In order to start tracing rays, the primary rays pointing from the eye into the scene have to be generated. On a GPU, the interpolation capability of the rasterizer can be used to generate all primary rays in a single kernel invocation, as described by Thrane et al. [20].

Given the four corners of the viewing rectangle and the eye point, the four rays through the corners of the view frustum can be computed. If the rasterizer of the graphics card then interpolates the direction of these four rays across a certain pixel region (e.g. 512 × 512 pixels), all primary rays needed for an image of that size are generated. The information about these rays can be stored in two textures, one holding the direction of each ray, the other holding the ray's origin.
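
This interpolation can be imitated in software as follows (a sketch; the corner naming and the plain loops stand in for what the rasterizer performs in a single pass):

```python
import math

def eye_rays(eye, tl, tr, bl, br, width, height):
    """Bilinearly interpolate the four corner ray directions (top-left,
    top-right, bottom-left, bottom-right) across the pixel grid and
    return a direction 'texture'; the origin texture simply repeats the
    eye point for every pixel."""
    def lerp(a, b, t):
        return tuple(a[i] + t * (b[i] - a[i]) for i in range(3))
    dirs = []
    for y in range(height):
        v = y / (height - 1)
        left, right = lerp(tl, bl, v), lerp(tr, br, v)
        row = []
        for x in range(width):
            d = lerp(left, right, x / (width - 1))
            n = math.sqrt(sum(c * c for c in d))
            row.append(tuple(c / n for c in d))   # normalized direction
        dirs.append(row)
    origins = [[tuple(eye)] * width for _ in range(height)]
    return dirs, origins
```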

Traversal

The traversal kernel depends on the acceleration structure used. Purcell focused on a uniform grid (see Section 3.3.3), because it was the easiest structure to implement on the GPU at that time. Other approaches make the traversal kernel exchangeable depending on the acceleration structure (see Thrane et al. [20]), because they want to test different structures for their GPU applicability and performance. In any case, the traversal kernel forwards data to the intersection kernel whenever the ray has reached a region of the scene where triangles might be intersected.
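
A uniform grid traversal in the style of the 3D-DDA by Amanatides and Woo can be sketched as follows (a CPU-side illustration assuming a cell size of 1 and a ray origin inside the grid; the GPU version performs one step per kernel pass and would stop at the first non-empty cell):

```python
import math

def grid_traverse(origin, direction, grid_size):
    """Step a ray through a uniform grid (3D-DDA), returning the visited
    cell coordinates in order. A real traversal kernel would stop at the
    first non-empty cell and emit a ray-voxel pair instead."""
    cell = [math.floor(c) for c in origin]
    step, t_max, t_delta = [], [], []
    for i, d in enumerate(direction):
        if d > 0:
            step.append(1)
            t_max.append((math.floor(origin[i]) + 1 - origin[i]) / d)
            t_delta.append(1.0 / d)
        elif d < 0:
            step.append(-1)
            t_max.append((origin[i] - math.floor(origin[i])) / -d)
            t_delta.append(-1.0 / d)
        else:                              # ray parallel to this axis
            step.append(0)
            t_max.append(math.inf)
            t_delta.append(math.inf)
    cells = []
    while all(0 <= cell[i] < grid_size[i] for i in range(3)):
        cells.append(tuple(cell))
        axis = t_max.index(min(t_max))     # nearest cell boundary
        cell[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return cells
```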

Intersection

The intersection kernel takes a stream of ray-voxel pairs, which are determined by the acceleration structure in the traversal kernel. The ray-triangle intersection tests are then computed for all triangles in the voxel. If a hit occurs, a ray-triangle pair is passed to the shading stage. Because triangles can overlap multiple grid cells (see Figure 3.1 on page 14), it is possible for an intersection point to lie outside of the current voxel. The intersection kernel checks for this case and treats it as a miss. Note that rejecting intersections in this way may cause a ray to be tested against the same triangle multiple times in different voxels. Although this can be avoided by „mailboxing“, introduced by Amanatides and Woo [17] (see Section 3.3.3), a multi-threaded implementation of this technique on a GPU is very difficult.
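
The kernel's logic can be sketched with the well-known Möller-Trumbore test plus the voxel-bounds check described above (an illustration; Purcell's actual fragment program differs):

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ray_triangle(orig, dir, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore test; returns the hit distance t, or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(dir, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                        # ray parallel to the triangle
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(dir, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def intersect_in_voxel(orig, dir, tri, vmin, vmax):
    """Accept a hit only if the hit point lies inside the current voxel;
    hits outside the voxel are treated as a miss, as described above."""
    t = ray_triangle(orig, dir, *tri)
    if t is None:
        return None
    hit = tuple(orig[i] + t * dir[i] for i in range(3))
    inside = all(vmin[i] <= hit[i] <= vmax[i] for i in range(3))
    return t if inside else None
```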

Shading

The shading kernel evaluates the color of the triangle at the point where the ray hit the surface. Shading data consists of vertex normals and colors for each triangle. The hit information passed to the shader includes the triangle's ID number, which makes it possible to access the proper shading information by a simple lookup.

The shading kernel optionally generates shadow, reflection, refraction or randomly distributed rays, depending on the material the ray hit. These secondary rays are placed in the stream of rays processed by the traverser. Each invocation of the shading kernel returns both a color and a new ray for each pixel. The shading kernel also takes the color buffer output by previous shading passes as input. This makes it possible to combine the colors of successive specular surfaces as successive rays are traced.
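
A minimal sketch of such a shading step (the lookup tables NORMALS and COLORS as well as the Lambertian shading model are assumptions for illustration):

```python
import math

# hypothetical per-triangle shading data, indexed by triangle ID
NORMALS = {7: (0.0, 0.0, -1.0)}
COLORS = {7: (0.8, 0.2, 0.2)}

def shade(tri_id, hit_point, light_pos, prev_color):
    """Diffuse shading looked up via the triangle's ID; the result is
    added to the color buffer of previous passes, and a shadow ray
    towards the light is emitted for the traversal stage."""
    n = NORMALS[tri_id]
    l = tuple(light_pos[i] - hit_point[i] for i in range(3))
    norm = math.sqrt(sum(c * c for c in l))
    l = tuple(c / norm for c in l)
    diffuse = max(sum(n[i] * l[i] for i in range(3)), 0.0)
    color = tuple(prev_color[i] + diffuse * COLORS[tri_id][i]
                  for i in range(3))
    shadow_ray = (hit_point, l)            # fed back to the traversal kernel
    return color, shadow_ray
```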

6.2 Choosing the Acceleration Structure

Choosing the acceleration structure might seem trivial: simply take the fastest. So far the best results have been accomplished using tree-like structures (e.g. kd-trees, see Section 3.3.3). As Thrane et al. [20] have shown, however, tree traversal is a non-trivial problem on the GPU. They tried to traverse a binary tree under the assumption that nothing can be assumed about the structure of the tree or the traversal order. As most programmers know, such a traversal can only be accomplished by using a stack. The situation becomes problematic because in fragment programs there is no indexable writable memory available. The only places where data can be written at runtime are temporary registers, local to the fragment

program, and the output buffer, where the result of the fragment program computation is stored. Several stack implementations were tested by Thrane et al. [20] until they reached the conclusion that efficient general-purpose tree traversal is not feasible on current graphics hardware. However, they found an efficient strategy to traverse bounding volume hierarchies. Some traversal strategies for kd-trees running on the GPU are also described by Thrane, but the previous example with the binary tree shows that kd-trees on the GPU might be a lot slower than on the CPU.

Since the acceleration structures have already been described in Chapter 3, the following pages focus on the GPU implications of using a specific structure, based on research done by Thrane et al. [20].

Uniform Grid

The idea behind uniform grids has already been presented in Section 3.3.3. Purcell also selected the uniform grid as the acceleration structure of his choice [39]. He argued that no single acceleration structure is better than the others across different scenes. Additionally, the uniform grid can easily be implemented on the GPU. As Thrane points out, the computation times (e.g. for accessing a voxel) are mostly constant. Additionally, no stack is needed during traversal. Detailed code listings can be found in Thrane's Master's thesis [20].

The only data structure that can efficiently be shared with the graphics card is a texture. The conclusion is that the information stored in the acceleration structure (i.e. the uniform grid) must be packed into textures. Purcell [39] explains how a uniform grid can be stored in such a way (see Figure 6.4). Each grid cell contains a pointer to the start of a list of triangles contained in that cell, or a null pointer if the cell is empty. The triangle lists are stored in another texture. Each entry in the triangle list is a pointer to a set of vertex data for the indicated triangle. Triangle vertices are stored in a set of three separate textures.

Figure 6.4: Uniform grid stored in several textures (Courtesy of Purcell [39]).
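
The double indirection can be sketched with flat arrays standing in for the textures (all names are invented for this illustration):

```python
def triangles_in_cell(cell, grid_tex, tri_list_tex, v0_tex, v1_tex, v2_tex):
    """Follow both levels of indirection to fetch a cell's triangles:
    grid texture -> triangle-list texture -> three vertex textures."""
    ptr = grid_tex[cell]
    tris = []
    if ptr < 0:                            # null pointer: empty cell
        return tris
    while tri_list_tex[ptr] >= 0:          # -1 terminates the list
        t = tri_list_tex[ptr]
        tris.append((v0_tex[t], v1_tex[t], v2_tex[t]))
        ptr += 1
    return tris

# example data: cell 0 is empty, cell 1 holds the single triangle 0
grid_tex = [-1, 0]
tri_list_tex = [0, -1]
v0_tex = [(0.0, 0.0, 0.0)]
v1_tex = [(1.0, 0.0, 0.0)]
v2_tex = [(0.0, 1.0, 0.0)]
```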

KD-Trees

It has already been described that CPU and GPU implementations of the same acceleration structure might differ a lot. On the CPU, kd-trees are one of the best structures to accelerate ray tracing, as determined by Havran in his PhD thesis [9]. The problem of the missing stack for traversing the tree can be solved by using an older approach: before recursive descent traversal was developed, kd-trees were traversed in a sequential manner. One such approach has been described by Foley and Sugerman [48]. Implementation details are given by Thrane et al. [20].
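
The sequential idea can be sketched in the spirit of the „restart“ variant: after each visited leaf, the valid ray interval is advanced past the leaf and the descent starts again from the root, so no stack is needed (an illustrative CPU version using spheres as leaf primitives instead of triangles; on the GPU the nodes would be encoded in textures):

```python
import math

leaf = lambda spheres: {"leaf": spheres}
inner = lambda axis, split, l, r: {"leaf": None, "axis": axis,
                                   "split": split, "left": l, "right": r}

def hit_sphere(orig, dir, center, radius):
    """Nearest positive ray-sphere hit distance (|dir| == 1), or None."""
    oc = tuple(orig[i] - center[i] for i in range(3))
    b = sum(oc[i] * dir[i] for i in range(3))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 1e-9 else None

def kd_restart(root, orig, dir, t_enter, t_exit):
    """Stackless kd-tree traversal, 'restart' flavour: after each leaf,
    advance t_min past the leaf and descend again from the root."""
    t_min = t_enter
    while t_min < t_exit:
        node, t_max = root, t_exit
        while node["leaf"] is None:        # descend to the leaf at t_min
            axis, split = node["axis"], node["split"]
            d = dir[axis]
            t_split = math.inf if d == 0.0 else (split - orig[axis]) / d
            if d > 0.0 or (d == 0.0 and orig[axis] < split):
                near, far = node["left"], node["right"]
            else:
                near, far = node["right"], node["left"]
            if t_split <= t_min:
                node = far
            elif t_split >= t_max:
                node = near
            else:
                node, t_max = near, t_split
        best = None                        # intersect the leaf's spheres
        for center, radius in node["leaf"]:
            t = hit_sphere(orig, dir, center, radius)
            if t is not None and t_min - 1e-9 <= t <= t_max + 1e-9:
                best = t if best is None else min(best, t)
        if best is not None:
            return best
        t_min = t_max                      # restart behind this leaf
    return None
```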

Bounding Volume Hierarchies

Bounding volume hierarchies are also represented as a tree structure. Therefore they face the same problems on the GPU as kd-trees, most importantly the missing stack. The solution to this problem lies in the representation of the tree in the texture. Thrane et al. [20] describe a way to build a tree texture which can be traversed in a sequential manner (see Figure 6.5).

Figure 6.5: Example for traversal of a BVH tree in a texture (Courtesy of Thrane [20]).
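
One common way to make a hierarchy traversable without a stack is to flatten it in depth-first order and to store, per node, a „skip“ index pointing past the node's subtree: on a box hit the traversal simply advances to the next array element, on a miss it jumps to the skip index (an illustrative variant; Thrane and Simonsen's exact texture encoding differs in detail):

```python
def ray_aabb(orig, inv_dir, bmin, bmax):
    """Slab test: does the forward ray hit the axis-aligned box?"""
    t0, t1 = 0.0, float("inf")
    for i in range(3):
        ta = (bmin[i] - orig[i]) * inv_dir[i]
        tb = (bmax[i] - orig[i]) * inv_dir[i]
        t0, t1 = max(t0, min(ta, tb)), min(t1, max(ta, tb))
    return t0 <= t1

def bvh_traverse(nodes, orig, dir):
    """Sequential BVH traversal: advance on a hit, jump to 'skip' on a
    miss; collects the leaf primitives that must be intersected."""
    inv = tuple(1.0 / d if d != 0.0 else float("inf") for d in dir)
    hits, i = [], 0
    while i < len(nodes):
        bmin, bmax, prim, skip = nodes[i]
        if ray_aabb(orig, inv, bmin, bmax):
            if prim is not None:
                hits.append(prim)          # leaf: intersection candidate
            i += 1                         # descend into the subtree
        else:
            i = skip                       # skip the whole subtree
    return hits

# a 5-node example hierarchy, stored in depth-first order
NODES = [
    ((0.0, 0.0, 0.0), (10.0, 10.0, 10.0), None, 5),   # root
    ((0.0, 0.0, 0.0), (2.0, 2.0, 2.0), "A", 2),       # leaf A
    ((5.0, 5.0, 5.0), (10.0, 10.0, 10.0), None, 5),   # inner node
    ((6.0, 6.0, 6.0), (7.0, 7.0, 7.0), "B", 4),       # leaf B
    ((8.0, 8.0, 8.0), (9.0, 9.0, 9.0), "C", 5),       # leaf C
]
```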

6.3 Benchmarks

As mentioned in Section 4.4 on page 23, benchmarking ray tracing systems requires agreement on certain parameters. The frame rate achieved by a certain renderer depends on the image size, the number of triangles in the scene's geometry, and much more. Many authors create their own test scenes, with only a few choosing scenes from the BART benchmark. However, the results presented on the following pages should give a fairly good picture of where GPU-supported ray tracing stands compared to CPU implementations.

N. Carr et al. - The Ray Engine

The following results have been reported by Carr et al. [37], benchmarking their CPU-GPU hybrid approach. They used an ATI Radeon 8500 graphics card, but gave no information about which CPU they used. Unfortunately Carr only described a few results, some even based on assumptions regarding future graphics cards. The only comparable results were taken by rendering the famous teapot (see Table 6.1). He compared a CPU-only approach with the CPU-GPU implementation and reported a speedup of about 22%. Graphics cards at that time did not support a fast read-back of data to the CPU, which was necessary for his implementation. Therefore he also measured the possibility of using an asynchronous read-back and reached a theoretical speedup of 34% compared to the CPU-only implementation. Finally he estimated the theoretical speedup in case an infinitely fast GPU is used, resulting in a 73% raise.

System                Rays per Sec.   Speedup
CPU only              135,812         -
plus GPU              165,098         22%
Asynch. Readback      183,273         34%
Infinitely fast GPU   234,102         73%

Table 6.1: Benchmark: CPU-GPU Hybrid: Speedups when using the GPU to render the teapot scene.

He concluded that his ray tracer performed at speeds comparable to the fastest CPU ray tracers of that time. He combined the best features offered by the CPU and GPU (the CPU for traversal of the acceleration structure and for ray coherence, the GPU for ray-triangle intersection) at the expense of a slow read-back of data to the CPU. The AGP1 graphics bus supports high-bandwidth transmission from the CPU to the GPU, but offers less bandwidth for retrieving the results.

Carr et al. were very enthusiastic about their hybrid approach, but compared to the research done during the following years it seems that no one shared their enthusiasm. Mapping the full ray tracing process onto the GPU is the approach most people have focused on, because the read-back delay to the CPU could not be overcome until the newest PCI-Express graphics cards became available. No new benchmark results involving such a graphics card were available at the time of this writing.

T. Purcell - Ray Tracing on a Stream Processor

Purcell made the first step to map the whole ray tracing process onto the GPU [39], regarding the graphics card as a stream processor (see Section 6.1). He tested his results on an ATI Radeon 9700 Pro, running DirectX 9 on Windows XP with Catalyst 2.3 drivers, on a dual Pentium III 800 MHz machine with 1 GB RAM. Three different scenes, none of them part of the

1 AGP, the Accelerated Graphics Port (also called Advanced Graphics Port), is a high-speed point-to-point channel for attaching a graphics card to a computer's motherboard, primarily to assist in the acceleration of 3D computer graphics. Some motherboards have been built with multiple independent AGP slots. AGP is slowly being phased out in favour of PCI Express.

BART benchmark, were used (see Figure 6.6 and Table 6.2). All images have been rendered at 256 × 256 pixels.

Figure 6.6: Benchmark: Purcell GPU Test Scenes: Cornell box, teapotahedron, Quake 3 (Courtesy of Purcell [39]).

Scene           Triangles   Rays      Framerate
Cornell Box     32          S         10.5
Teapotahedron   840         E, S, R   1.0
Teapotahedron   840         E, R      1.5
Quake 3         35468       S         1.8

Table 6.2: Benchmark: Purcell GPU: Scene complexity and results, with eye rays (E), shadow rays (S), reflection rays (R).

He concludes that his system achieves roughly the same number of ray-triangle intersection tests per second as the CPU-GPU hybrid developed by Carr et al. [37]. Looking at the results Purcell reports, this seems a bit odd, especially when comparing Carr's ATI Radeon 8500 with Purcell's Radeon 9700 Pro. Such a large gap between these two graphics cards should result in a considerable performance boost over the CPU-GPU hybrid.

Purcell also discovered some bottlenecks, which are mainly caused by GPU limitations. The main restriction resulted from the 24-bit floating point numbers on his graphics card. These only allowed addressing a 256 × 512 texture through integer calculations and capped his scene complexity at 131,000 triangles at a time. Additionally he used several OpenGL extensions not available in the standard OpenGL API at that time, mainly for the creation of light-weight buffers which a fragment program can render into. These are an alternative to full-screen frame buffers (p-buffers) when a full rendering context is not needed.

He also concluded that the GPU hardware does not support the stream processor abstraction as well as it could. The stream processor he implemented is far less general than it should be. Additionally he points out that programming the GPU is not easy, because high-level languages, debuggers, and code profilers were missing at that time. However, Purcell put much hope in the GPU becoming a full high-performance parallel co-processor.

N. Thrane et al. - A Comparison of Acceleration Structures for GPU AssistedRay Tracing

Both Carr and Purcell focused on one acceleration structure (i.e. the uniform grid in Purcell's implementation). The only comparison of different structures found during the work on this document has been given by Thrane and Simonsen in their Master's thesis [20]. They implemented a GPU ray tracer able to support a uniform grid, a kd-tree and a bounding volume hierarchy as acceleration structure.

They used a 3.2 GHz Pentium 4 with 1 GB RAM, and an NVidia GeForce 6800 Ultra at 400 MHz with 256 MB RAM. Windows XP Professional, Service Pack 2, was used as operating system, with the graphics card running NVidia 77.62 drivers.

Unlike Carr and Purcell, they used highly complex scenes as found in the BART benchmark (see Figure 6.7 and Table 6.3). More interestingly, two of the scenes they tested were animated (Cows and Bunny), which might give an idea of how the tested acceleration structures compete in interactive ray tracing.

Figure 6.7: Benchmark: Thrane & Simonsen: Cows, Robots, Kitchen and Bunny (from left to right) (Courtesy of Thrane [20]).

Scene                        Triangles
Cows                         30226
Robots (BART)                71708
Kitchen (BART)               110561
Bunny                        69451
Cornell Box (from Purcell)   32

Table 6.3: Benchmark: Thrane & Simonsen: Scene complexity.

The results (see Table 6.4) show that Thrane & Simonsen implemented two different versions of kd-trees and bounding volume hierarchies. They note that uniform grids were the first structure to be implemented on the GPU by Purcell, but have turned out to be the slowest, except for single-object scenes. The uniform grid is not suited for scenes with high variance in geometric density, because it is incapable of adapting to such changes. Additionally, traversal on the GPU suffers from the relatively large amount of data required to represent the current

                         Cows   Robots   Kitchen   Bunny   Cornell Box
Uniform grid             770    4626     3269      667     205
Kd-tree (Restart)        728    2771     1961      1299    203
Kd-tree (Backtrack)      567    2619     1908      913     220
BVH (Kay/Kajiya)         195    1017     556       257     143
BVH (Goldsmith/Salmon)   183    825      476       275     103

Table 6.4: Benchmark: Thrane & Simonsen: Average rendering times in milliseconds per frame, including shadow and reflection rays where applicable.

state of the traversal. Both variants of the kd-tree clearly outperform the uniform grid, but lose against the bounding volume hierarchies. As Thrane et al. point out, that might be the result of the simple implementation on the GPU; kd-trees suffer from complicated traversal strategies. Both bounding volume hierarchies are easier to construct and traverse than any other structure on the GPU.

Chapter 7

Realtime Ray Tracing API (RTRT/OpenRT)

Rasterization graphics engines often make use of standardized APIs (i.e. DirectX, OpenGL etc.) to ease their usage. That way a programmer can implement his application at a higher level without needing to know the actual hardware running in the computer. The basic idea behind OpenRT [49] is to create an API similar to OpenGL, so programmers can easily use the ray tracer without needing to know how it works.

OpenRT consists of three different parts. Primarily it is the realtime ray tracing project started at Saarland University. Secondly it is a ray tracer at its core. The third part is the API that eases its usage. The lack of a common API severely hampers the wide-spread use of ray tracing: dozens of ray tracing implementations can be found on the internet, but none of these applications are compatible with each other.

Standardized APIs are crucial for new technologies. They allow building a common user base, where the user can abstract his programming work from the underlying hardware. Also, if programmers build their application on top of the API, certain changes and optimizations are possible in the underlying layers without changing the API itself. Therefore all previously written programs can make immediate use of the upgrades simply by installing the new API version.

The following list presents some key goals during the development of OpenRT:

• Create a highly optimized ray tracer

• Use acceleration structures to make interactive dynamic scenes possible

• Abstract from the underlying hardware (run it on single desktop PCs, clusters, special purpose hardware etc.)

• Offer all features of ray tracing (e.g. shaders, complex geometry etc.)

• Be as similar to OpenGL as possible

One of the first rendering engines designed for industrial use was RenderMan [50], created by Pixar. It has been used to create several motion pictures (some completely rendered), supporting shaders and other features. OpenRT also offers the flexibility to use different shaders in a plug-and-play manner, as well as complex geometry. The shader environment is based on C++, allowing easy implementation and conversion of existing shaders.

Another key feature of a renderer is the ability to run on different hardware. Exchanging lower layers, to run on either a single desktop PC, a cluster of computers, or special purpose hardware, should not affect the application. Also, due to the nature of interactive ray tracing, support and optimizations for dynamic scenes are a must-have.

To ease usage, only triangles are used as geometric primitives. This makes handling very similar to OpenGL, although a ray tracer could support free-form surfaces (see Section 8.1.3).

The similarity to OpenGL should also help programmers get accustomed to the new rendering engine by keeping the learning time and effort minimal. Additionally, porting existing OpenGL-based applications is then possible without investing too much time.

As of SIGGRAPH 2005 a non-commercial version of OpenRT is available on the internet [49]. It is limited to a lower resolution and a single desktop PC (not running on clusters), but already demonstrates the easy installation and use of such an API.

Most of the examples described in Chapter 8 are implemented on top of OpenRT (i.e. Quake 3 Raytraced [52], the visualization of massively complex models [55], and mixed reality rendering [59] [60]).

Chapter 8

Applications

Ray tracing is at its core a concept developed to produce realistic images, supporting features like reflection, refraction, shadows and more. Therefore most of the examples and applications described in this chapter focus on the graphical usage of ray tracing. However, at its heart ray tracing is still „only“ a fast intersection test, which makes byproducts feasible, e.g. for artificial intelligence in games and simulations; these are also mentioned in this chapter.

8.1 Computer Graphic Applications

With ray tracing designed to create images, the number of applications surpasses the space available for this chapter. The following pages present some very different examples, each with its own unique problems and possibilities in the field of ray tracing.

8.1.1 Computer Games

With more and cheaper computers available to everyone, the market for computer games has grown extensively over the past years. The large amount of money people are spending on games laid the base for the development of fast CPUs and graphics cards. If ray tracing could be introduced into computer games, a lot more research and money would become available for this field of technology.

First steps to incorporate ray tracing into games have already been taken by Daniel Pohl et al. at Saarland University [52]. He based his project on the popular first-person shooter Quake 3 by id Software [53]. Removing the rasterization engine of the game, he and a few other students built a new ray tracing 3D engine from scratch in about six months, using OpenRT as a base. Compared to the years of development companies invest in the creation of a traditional game engine, half a year seems negligible.

As already described, much time can be saved by using ray tracing, because the rendering algorithm automatically takes care of many effects (e.g. shadows, reflections, refractions etc.). Programmers of rasterization engines need a lot of „tricks“ to bring such effects to life in their games. As already described in Section 2.3, rasterization operates on a stream

of independent triangles, and therefore cannot efficiently render the mentioned effects in one rendering pass. Every effect has to be split into several rendering passes by the application, mostly relying on approximations, which are inaccurate and break down in many situations (e.g. multiple reflections, see Figure 8.1) [51].

The game engine developed by Daniel Pohl et al. supports player and bot movement, including shooting and jumping, collision detection and many special effects like jumppads and teleporters. More ray tracing specific effects like portals (see Figure 8.2) and surveillance cameras are automatically rendered correctly by default, even if they recursively see each other. Also, a level-of-detail mechanism is not necessary to reduce scene complexity (see Figure 8.3), which allows for highly crowded scenes with many characters (see Figure 8.4). All screenshots shown on this page are courtesy of Daniel Pohl [52].

Figure 8.1: Quake 3 Raytraced: Multiple reflections

Figure 8.2: Quake 3 Raytraced: Look-through portal

Figure 8.3: Quake 3 Raytraced: High number of polygons

Figure 8.4: Quake 3 Raytraced: Area crowded with characters

Other games and technology demos have already been developed, with screenshots and videos available at [54].

8.1.2 Visualizing Massively Complex Models

Rendering a model, either via rasterization or ray tracing, is not problematic as long as the needed data (i.e. polygons, textures etc.) fits within the memory of the graphics card or host computer. With CAD1 becoming more and more important in industry, displaying highly complex models at interactive frame rates on the engineers' desktop computers becomes a necessity.

Both projects described here have been developed at Saarland University. The first involved rendering the UNC2 power plant reference model, featuring fifty million triangles [55] (see Figure 8.5). Previous work on this power plant model achieved frame rates of 5 to 15 frames per second on an early SGI3 Onyx with four R4400 processors and an InfiniteReality graphics subsystem. A reduction of rendered polygons by 96% was achieved by using several „tricks“, like replacing distant geometry with textured meshes. Also, some pre-computation time was invested in creating several level-of-detail versions of the model. The main drawback of the UNC approach is the tremendous preprocessing time, estimated at three weeks for a single copy of the power plant model. Most of the used reduction strategies are implicit when using ray tracing, rendering the preprocessing time obsolete.

The data management of such large models was also a major field of work. The problems involved mapping the large amount of data into the memory of a single client or a cluster. In the latter case an optimal distribution of the data over the network was needed to optimize performance.

The final version consisted of two servers, one to display and one to store and distribute the model data where needed. The rendering itself was processed by seven dual Pentium III 800 MHz clients, all connected by Gigabit Ethernet switches. Using this configuration, interactive frame rates of 6-12 frames per second could be reached, together with shaders and shadows (see Figure 8.6).

Figure 8.5: UNC power plant: Highly detailed building (Courtesy of Wald [55])

Figure 8.6: UNC power plant: Rendering with shadows (Courtesy of Wald [55])

1 CAD, Computer Aided Design
2 UNC, University of North Carolina at Chapel Hill, USA
3 SGI, Silicon Graphics Inc.

The second example involved the rendering of a highly detailed Boeing 777 model featuring about 350 million triangles (see Figures 8.7, 8.8). A special feature of the Boeing 777 model was its low degree of occlusion: many rays penetrate the model deeper than visible due to the model's skeleton-like appearance.

Figure 8.7: Boeing 777: Overview (Courtesy of Boeing Corp.)

Figure 8.8: Boeing 777: Engine interior (Courtesy of Boeing Corp.)

Memory management was also a great problem, the model being even more detailed than the previously described UNC power plant. To improve loading times and interactive navigation, geometry proxies have been used, resulting in a crude model at startup (see Figure 8.9). As more data is processed over time, finer details become visible (see Figure 8.10).

Figure 8.9: Boeing 777: Dynamic loading (at startup) (Courtesy of Boeing Corp.)

Figure 8.10: Boeing 777: Dynamic loading (after startup) (Courtesy of Boeing Corp.)

With shadows and shading enabled, the final application achieved frame rates of 3-7 frames per second at 640 × 480 pixels on a single dual-CPU desktop PC running at 1.8 GHz.

8.1.3 Free-Form Surfaces and Volume Ray Tracing

Up to now, most ray tracers use triangles to represent a scene's geometry, much like standard rasterization approaches. While triangles are a must-have in a normal rasterization engine, this constraint is only kept for the sake of simplicity in a ray tracer. Certain algorithms already support perfect shapes (e.g. perfectly round spheres), described via mathematical formulas instead of triangles.

Free-form objects, described by splines instead of triangles, offer many advantages like reduced memory requirements and higher precision results (see Figures 8.11, 8.12, by Benthin et al. [57]). However, their usage poses special problems regarding the intersection tests and spatial index structures used.

Figure 8.11: Free-Form Surfaces: Round face (Courtesy of Benthin [57])

Figure 8.12: Free-Form Surfaces: Chessgame (Courtesy of Benthin [57])

While perfect shapes are normally described mathematically, volumetric data sets are much more difficult to handle: they consist of independent sample points within the scene that together form an object. Such data sets can be generated, for example, by taking a three-dimensional laser scan of real geometry. One task in volume ray tracing is to find the correct intersection of a ray with the interpolated implicit surface defined by the data values. Depending on the intersection algorithm, the results are either accurate but slow, or fast but only approximate. Newly optimized algorithms have been used to achieve frame rates of 2 frames per second on a dual Pentium IV running at 2.2 GHz (see Figures 8.13, 8.14, by Marmitt et al. [58]).
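The idea of searching for the iso-surface crossing along a ray can be sketched as follows. This is a simplified ray marcher with bisection refinement, not the exact algorithm of Marmitt et al. [58]; the `field` callback, step size and iteration counts are assumptions made for this illustration only.

```python
def ray_isosurface_t(field, origin, direction, iso, t_max, step=0.1, refine=20):
    """March along the ray sampling the scalar field; on a sign change of
    (field - iso), refine the crossing with bisection and return the hit
    distance t, or None if no crossing lies within t_max."""
    def value(t):
        # Evaluate the (interpolated) field at the point origin + t * direction
        p = [o + t * d for o, d in zip(origin, direction)]
        return field(p) - iso
    t0, f0 = 0.0, value(0.0)
    t = step
    while t <= t_max:
        f1 = value(t)
        if f0 * f1 <= 0.0:  # this interval brackets the iso-value
            a, b = t0, t
            for _ in range(refine):  # bisection: halve the bracket each step
                m = 0.5 * (a + b)
                if value(a) * value(m) <= 0.0:
                    b = m
                else:
                    a = m
            return 0.5 * (a + b)
        t0, f0 = t, f1
        t += step
    return None
```

The trade-off mentioned above is visible directly in the parameters: a smaller `step` and more `refine` iterations give a more accurate intersection at the cost of more field evaluations per ray.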

8.1.4 Mixed Reality Rendering

A fundamental problem of all science-fiction movies is how real actors can be inserted into a computer-generated scene that does not really exist. Once this problem is solved, the next step is to generate certain effects involving the actors (e.g. an actor casting a shadow on a virtual object in the scene, or the actor being reflected in a virtual mirror). Normally, all such effects can only be produced with a lot of "tricks" and processing time. Andreas Pomi et al. [59] [60] presented a way to stream videos directly into the ray tracer. These videos act as textures that are directly integrated into the scene.


Figure 8.13: Volume Ray Tracing: Bonsai (Courtesy of Marmitt [58])

Figure 8.14: Volume Ray Tracing: Skull (Courtesy of Marmitt [58])

In Figure 8.15, two billboards have been generated in the scene, acting as planes for the streamed video textures. The ray tracer treats these images like ordinary textures, which causes the actors to cast realistic shadows and reflections on the floor. Realistic television effects are also possible: in Figure 8.16 the streamed video texture is used as a television image and, like the actors, is reflected by the table. Additionally, certain color effects become possible if a shader is added to the television, making it act like a light source. Two examples with a red and a green light effect are shown in Figures 8.17 and 8.18.

Figure 8.15: Mixed Reality: Shadows and reflections (Courtesy of Pomi [59] [60])

Figure 8.16: Mixed Reality: TV with reflections (Courtesy of Pomi [59] [60])

Figure 8.17: Mixed Reality: TV as red light source (Courtesy of Pomi [59] [60])

Figure 8.18: Mixed Reality: TV as green light source (Courtesy of Pomi [59] [60])
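A billboard shader of this kind could, in principle, look like the following sketch. This is a hypothetical illustration, not the implementation of Pomi et al. [59] [60]: the current video frame is sampled as a texture at the hit point's (u, v) coordinates, and an optional emissive term lets the billboard act as a colored light source, as in the TV examples.

```python
def shade_billboard(frame, u, v, emissive_scale=0.0):
    """Hypothetical billboard shader. `frame` is the current video frame as
    a row-major list of RGB rows; (u, v) in [0, 1) are the hit coordinates
    on the billboard plane. Returns the surface color and the radiance the
    billboard emits toward the rest of the scene."""
    h = len(frame)
    w = len(frame[0])
    # Nearest-neighbour texture lookup into the streamed frame
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    r, g, b = frame[y][x]
    # With emissive_scale > 0 the video also tints nearby surfaces,
    # producing the red/green light effects of Figures 8.17 and 8.18
    emitted = (r * emissive_scale, g * emissive_scale, b * emissive_scale)
    return (r, g, b), emitted
```

Because the frame lookup happens at shading time, shadows and reflections of the video content fall out of the ordinary ray tracing process without any extra compositing pass.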


8.2 A.I.-Vision and Collision Detection

Although graphical applications clearly dominate the field, ray tracing also yields useful byproducts for other problems such as artificial intelligence.

Programming an artificial intelligence algorithm that needs to react to other characters or human players in a game or simulation is always challenging. One part involves actually "seeing" other characters or humans. Such problems become especially interesting when the computer-controlled character is working against the player.

Modern computer games give the player a wide range of possibilities to play against the computer. One genre specializes in hide-and-seek games, with the player acting as a thief who needs to get past the guards without being seen. To do so, the player hides in the shadows and sneaks around corners behind the backs of the computer-controlled characters. For a programmer this creates serious problems, because the A.I. algorithms of the guards need to check whether they actually see the player. This is easy if there is no direct line of sight between the guard and the thief, but problematic if the player hides in a dark corner the guard is looking directly at, or if the player is allowed to use camouflage (see Figures 8.19, 8.20).

Figure 8.19: A.I. Vision - Visible player (Courtesy of Pohl [52])

Figure 8.20: A.I. Vision - Hidden player (Courtesy of Pohl [52])

In order to detect the player, the computer-controlled character could use the ray tracer to render a few pixels of the player. A heuristic based on the contrast or the color of the rendered pixels could then be used to check whether the guard sees the player or whether the camouflage is successful. Apart from the heuristic itself, all the code needed already exists within the ray tracer.
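Such a heuristic could, for instance, be sketched as follows. The `trace` callback, the luminance-based contrast measure and the thresholds are all assumptions made for this example; the cited work describes the idea but not a concrete implementation.

```python
def player_visible(trace, guard_eye, sample_points, background_color,
                   contrast_threshold=0.2, hit_fraction=0.5):
    """Hypothetical visibility heuristic: trace a handful of rays from the
    guard's eye to sample points on the player. `trace(origin, target)` is
    assumed to return the shaded RGB color seen along that ray, or None if
    the line of sight is blocked by scene geometry."""
    def luminance(rgb):
        r, g, b = rgb
        return 0.299 * r + 0.587 * g + 0.114 * b
    bg_lum = luminance(background_color)
    visible_hits = 0
    for point in sample_points:
        color = trace(guard_eye, point)
        if color is None:
            continue  # occluded: no direct line of sight to this point
        # Camouflage check: does the shaded player stand out from the background?
        if abs(luminance(color) - bg_lum) > contrast_threshold:
            visible_hits += 1
    # Report the player as seen once enough samples stand out
    return visible_hits >= hit_fraction * len(sample_points)
```

Because the rays are shaded by the same renderer that draws the scene, a player standing in shadow automatically produces dark, low-contrast samples and is reported as hidden.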

A similar solution is possible if collision detection is needed. Tracing a few sample rays is enough to perform collision detection against the real geometry of the scene; no bounding boxes or other simplifications of the problem need to be used.
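A minimal sketch of this idea, assuming a hypothetical `trace_nearest` function provided by the ray tracer:

```python
import math

def collides(trace_nearest, position, velocity, dt):
    """Collision test by ray casting: shoot a ray along the movement vector
    and compare the nearest hit distance with the distance travelled this
    frame. `trace_nearest(origin, direction)` is assumed to return the
    distance to the closest surface along the ray (math.inf for a miss)."""
    speed = math.sqrt(sum(v * v for v in velocity))
    if speed == 0.0:
        return False  # not moving, so no collision this frame
    direction = tuple(v / speed for v in velocity)
    hit_t = trace_nearest(position, direction)
    # Collision if geometry lies closer than the step taken this frame
    return hit_t <= speed * dt
```

The test runs against the exact scene geometry the renderer already holds, which is precisely why no separate simplified collision representation is required.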

Chapter 9

Conclusions

This last chapter gives a short summary of the described topics. It also draws some conclusions based on the literature read while working on this document.

9.1 Summary

The first part of this thesis described why ray tracing has become so important over the past years. Afterwards, the terms and basic algorithms used to perform ray tracing were presented, covering simple, recursive, distribution and shader approaches. The advantages of ray tracing compared to traditional rasterization techniques concluded Chapter 2. Chapter 3 showed the most important structures used to accelerate ray tracing (i.e. uniform grids, kd-trees and bounding volume hierarchies). As these were designed only for static scenes, support for dynamic environments is a necessity for computer games and similar applications. In case the user wants to interact with the scene in an unpredictable way, dynamic acceleration structures need to be used, which were described in Chapter 4. Several different approaches to realizing realtime ray tracing were discussed in Chapter 5, spanning CPU, CPU-GPU hybrid, GPU and special-purpose hardware approaches. The main focus of Chapter 6 was on the GPU, especially the stream programming approach developed by Purcell et al. Different acceleration structures behave differently on the CPU and the GPU, as that chapter showed. Benchmarks based on the presented approaches were also given, showing that GPU approaches can compete with current CPU implementations. Chapter 7 shed some light on the first approach to a common ray tracing API, named OpenRT. Finally, Chapter 8 presented some examples showing the unique applications of ray tracers.

9.2 Final Thoughts

Even though ray tracing is a relatively old approach, most of the research regarding realtime applications has been completed during the past five to six years. At the time of this writing, the main goal is to develop an acceleration structure specifically designed for dynamic scenes.

All ray tracing implementations using the GPU, either alone or in conjunction with the CPU, come to the conclusion that the achievable speeds are not that different from their CPU counterparts. Looking at the approaches by Carr et al. [37] and Purcell et al. [39], these are interesting conclusions. Carr used an older ATI Radeon 8500, Purcell the newer ATI Radeon 9700 Pro graphics card; the speed difference between the two cards amounts to a few orders of magnitude. Comparing GPUs to CPUs, it is commonly known that GPUs are evolving much faster in terms of computational power than CPUs. Still, Carr and Purcell come to the same conclusion: their implementations are only as fast as the CPU implementations of their time. This seems rather odd given the sheer difference in computing power gained during those years. The only possible conclusion is that acceleration structures are easier to implement and traverse on the CPU. Additionally, the new structures proposed for dynamic scenes (see Chapter 4) are much more complicated than the traditional ones (i.e. uniform grids), so their implementation will remain much easier on the CPU than on the GPU.

However, the GPU has already evolved into a fast parallel co-processor, and since Carr and Purcell its programming constraints have diminished considerably. Perhaps the GPU will throw off these chains entirely in the years to come; but if that happens, it will have evolved into a full-featured processor that might replace the CPU.

Looking through the papers and applications of ray tracing, CPU implementations dominate. They are easier to implement and to scale than their GPU counterparts. PC clusters of normal desktop computers can easily be expanded to increase processing power, especially with new approaches like the OpenRT API. The dual- and quad-core CPUs currently appearing on the market open further possibilities. Wald and others have often mentioned that PC clusters do not reach their full potential, because the data needs to be distributed over Ethernet in an intelligent way to avoid bottlenecks. Multi-core CPUs partially solve this problem, because existing multi-CPU systems can easily be extended with more cores working in parallel (multi-core supercomputers are deliberately left out, because they are expensive and not commonly available).

Based on the research I have done, I believe that the GPU will not win in the field of ray tracing, for the reasons described in this thesis. However, ray tracing will dominate the field of computer graphics in a few years, even in interactive environments. The brute-force computing power of multi-core CPUs will make it possible to raise frame rates to realtime levels (beyond the 4-7 fps shown in the previous chapters). As soon as this happens, traditional rasterization approaches will be displaced by new ray tracing graphics engines. Physically correct effects will be possible by simply tracing rays, without the need for programming "tricks".

Bibliography

[1] The Quotations Page, Discovery Quotes, http://www.quotationspage.com/subjects/discovery/

[2] Munich University of Applied Sciences, Master of Computer Science, http://www.cs.fhm.edu/m_inf/englisch/index.html

[3] Wikipedia, Home computer, http://en.wikipedia.org/wiki/Home_computer

[4] Infocom Games, Zork I, http://www.infocom-if.org/games/zork1/zork1.html

[5] Internet Movie Database, Final Fantasy: The Spirits Within (2001), http://www.imdb.com/title/tt0173840/

[6] Ice Age, Movie Homepage, http://www.iceagemovie.com/

[7] A. Appel: Some Techniques for Shading Machine Renderings of Solids. Proceedings of the AFIPS Spring Joint Computer Conference, AFIPS Press, Reston, VA., Vol. 32, 1968, pp. 37-49.

[8] I. Wald: Realtime Ray Tracing and Interactive Global Illumination. Saarland University, PhD Thesis, 2004.

[9] V. Havran: Heuristic Ray Shooting Algorithms. Czech Technical University, Prague, PhD Thesis, November 2000.

[10] T. Whitted: An Improved Illumination Model for Shaded Display. CACM, 23(6), pp. 343-349, June 1980.

[11] R. Cook, T. Porter, L. Carpenter: Distributed Ray Tracing. Computer Graphics (Proceedings of SIGGRAPH 84), 18(3), pp. 137-144, 1984.

[12] K. Akeley, D. Kirk, L. Seiler, P. Slusallek, B. Grantham: When will Ray-Tracing replace Rasterization? Panel at SIGGRAPH 2002.

[13] A. Glassner: An Introduction to Ray Tracing. Morgan Kaufmann, 1989. ISBN 0-12-286160-4.

[14] T. Ullmann, A. Schmidt, D. Beier, B. Brüderlin: Adaptive Progressive Vertex Tracing for Interactive Reflections. Computer Graphics Forum (Proceedings of Eurographics), 20(3), September 2001.


[15] E. A. Haines, D. P. Greenberg: The Light Buffer: A Ray Tracer Shadow Testing Accelerator. IEEE Computer Graphics and Applications, 6(9):6-16, September 1986.

[16] S. Fernandez, K. Bala, D. Greenberg: Local Illumination Environments for Direct Lighting Acceleration. In Proceedings of the 13th Eurographics Workshop on Rendering, pp. 7-14, Pisa, Italy, June 2002.

[17] J. Amanatides, A. Woo: A Fast Voxel Traversal Algorithm for Ray Tracing. In G. Marechal, editor, Eurographics '87, pp. 3-10. Elsevier Science Publishers, Amsterdam, North-Holland, 1987.

[18] I. Wald, C. Benthin, M. Wagner, P. Slusallek: Interactive Rendering with Coherent Ray Tracing. Computer Graphics Forum, 20(3):153-164, 2001. (Proceedings of Eurographics 2001).

[19] A. Fujimoto, T. Tanaka, K. Iwata: ARTS: Accelerated Ray Tracing System. IEEE Computer Graphics and Applications, Volume 6, Issue 4, 1986, pp. 16-26.

[20] N. Thrane, L. Simonsen: A Comparison of Acceleration Structures for GPU Assisted Ray Tracing. Master's Thesis, August 2005.

[21] M. Christen: Implementing Ray Tracing on the GPU. Diploma Thesis, University of Applied Sciences Basel (FHBB), January 2005.

[22] J. L. Bentley: Multidimensional Binary Search Trees Used for Associative Searching. Comm. of the ACM, 18(9), 509-517, Stanford University, 1975.

[23] K. Somchaipeng, K. Erleben, J. Sporring: A Multi-Scale Singularity Bounding Volume Hierarchy. Technical Report no. 2004/08, ISSN: 0107-8283, University of Copenhagen, Denmark, 2004.

[24] I. Wald, S. Boulos, P. Shirley: Ray Tracing Deformable Scenes using Dynamic Bounding Volume Hierarchies. SCI Institute, University of Utah. Technical Report UUSCI-2006-015.

[25] I. Wald, T. Ize, A. Kesler, A. Knoll, S. Parker: Ray Tracing Animated Scenes using Coherent Grid Traversal. SCI Institute, University of Utah. Technical Report UUSCI-2006-014.

[26] I. Wald, C. Benthin, P. Slusallek: Distributed Interactive Ray Tracing of Dynamic Scenes. IEEE Symposium on Parallel and Large-Data Visualization and Graphics 2003: 77-86.

[27] M. Slater, J. Mortensen, P. Khanna, I. Yu: A Virtual Light Field Approach to Global Illumination. Proceedings of Computer Graphics International, Greece, June 12-19, 2004.

[28] J. Lext, U. Assarsson, T. Möller: A Benchmark for Animated Ray Tracing. Department of Computer Engineering, Chalmers University of Technology, Göteborg, Sweden, May 2000.

[29] E. Haines: A Proposal for Standard Graphics Environments. IEEE Computer Graphics and Applications, vol. 7, no. 11, pp. 3-5, November 1987.

[30] BART, A Benchmark for Animated Ray Tracing, http://www.ce.chalmers.se/old/BART/


[31] M. J. Muuss: Towards Real-Time Ray-Tracing of Combinatorial Solid Geometric Models. In Proceedings of BRL-CAD Symposium '95, June 1995.

[32] S. Parker, P. Shirley, Y. Livnat, C. Hansen, P. Sloan: Interactive Ray Tracing. In Proceedings of Interactive 3D Graphics (I3D), pp. 119-126, April 1999.

[33] I. Wald, C. Benthin, M. Wagner, P. Slusallek: Interactive Rendering with Coherent Ray Tracing. Computer Graphics Forum, 20(3):153-164, 2001. Proceedings of Eurographics 2001.

[34] GPGPU, General Purpose Programming on the GPU, http://www.gpgpu.org/

[35] NVidia, SLIZone, http://www.slizone.com/page/home.html

[36] ATI, CrossFire, http://www.ati.com/technology/crossfire/index.html

[37] N. Carr, J. Hall, J. Hart: The Ray Engine. Graphics Hardware (2002), pp. 1-10.

[38] T. Purcell, I. Buck, W. R. Mark, P. Hanrahan: Ray tracing on programmable graphics hardware. In SIGGRAPH '02: Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, pp. 703-712, New York, NY, USA, 2002. ACM Press.

[39] T. Purcell: Ray Tracing on a Stream Processor. PhD Thesis, March 2004.

[40] B. Bisballe, J. F. Andersen: GPGPU Library on Linux - Raytracing. Semester Thesis, 26 April 2005.

[41] M. Meißner, U. Kanus, W. Straßer: VIZARD II, A PCI-Card for Real-Time Volume Rendering. Proceedings of the Eurographics/SIGGRAPH Workshop on Graphics Hardware, 1998.

[42] H. Pfister, J. Hardenbergh, J. Knittel, H. Lauer, L. Seiler: The VolumePro Real-Time Ray-Casting System. Mitsubishi Electric. Proceedings of SIGGRAPH 1999.

[43] H. Kobayashi, K. Ichi Suzuki, K. Sano, N. Oba: Interactive Ray-Tracing on the 3DCGiRAM Architecture. In Proceedings of ACM/IEEE MICRO-35, 2002.

[44] J. Schmittler, S. Woop, D. Wagner, W. Paul, P. Slusallek: Realtime Ray Tracing of Dynamic Scenes on an FPGA Chip. Proceedings of Graphics Hardware 2004.

[45] S. Woop, J. Schmittler, P. Slusallek: RPU: A Programmable Ray Processing Unit for Realtime Ray Tracing. SIGGRAPH 2005.

[46] J. Schmittler: SaarCOR - A Hardware-Architecture for Realtime Ray Tracing. Saarland University, PhD Thesis, 2006.

[47] BrookGPU, Library for Stream Programming on the GPU, http://graphics.stanford.edu/projects/brookgpu/

[48] T. Foley, J. Sugerman: KD-Tree Acceleration Structures for a GPU Raytracer. In HWWS '05 Proceedings (2005), ACM Press, pp. 15-22.

[49] OpenRT, Real-Time Ray Tracing Project, http://www.openrt.de/

[50] Pixar, RenderMan, https://renderman.pixar.com/products/whatsrenderman/index.htm


[51] J. Schmittler, D. Pohl, T. Dahmen, C. Vogelgesang, P. Slusallek: Realtime Ray Tracing for Current and Future Games. 2004.

[52] Daniel Pohl, Quake 3 Raytraced, http://graphics.cs.uni-sb.de/~sidapohl/egoshooter/

[53] id Software, Homepage, http://www.idsoftware.com/

[54] Saarland University, Computer Games Based on Ray Tracing, http://graphics.cs.uni-sb.de/RTGames/

[55] I. Wald, P. Slusallek, C. Benthin: Interactive Distributed Ray-Tracing of Highly Complex Models. Rendering Techniques 2001 / Proceedings of the EUROGRAPHICS Workshop on Rendering 2001, pp. 274-285, London, June 2001.

[56] I. Wald, A. Dietrich, P. Slusallek: An Interactive Out-of-Core Rendering Framework for Visualizing Massively Complex Models. Rendering Techniques 2004, Eurographics Symposium on Rendering.

[57] C. Benthin, I. Wald, P. Slusallek: Interactive Ray Tracing of Free-Form Surfaces. Proceedings of Afrigraph 2004, Stellenbosch (Cape Town), South Africa, November 3-5, 2004.

[58] G. Marmitt, A. Kleer, I. Wald, G. Friedrich, P. Slusallek: Fast and Accurate Ray-Voxel Intersection Techniques for Iso-Surface Ray Tracing. Proceedings of Vision, Modeling, and Visualization (VMV) 2004, Stanford (CA), USA, November 16-18, 2004.

[59] A. Pomi, G. Marmitt, I. Wald, P. Slusallek: Streaming Video Textures for Mixed Reality Applications in Interactive Ray Tracing Environments. In Ertl, Girod, Greiner, Niemann, Seidel, Steinbach, Westermann (Eds.): Vision, Modeling and Visualization (VMV) 2003 Proceedings, Munich, Germany, November 19-21, 2003, pp. 261-269.

[60] A. Pomi, P. Slusallek: Interactive Mixed Reality Rendering in a Distributed Ray Tracing Framework. IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR) 2004, Student Colloquium.