Interactive Global Illuminationwald//Publications/2002/IGI/igi02.pdf · Interactive Global...

Technical Report TR-2002-02, Computer Graphics Group, Saarland University

Interactive Global Illumination

Ingo Wald† Thomas Kollig‡ Carsten Benthin† Alexander Keller‡ Philipp Slusallek†

†Saarland University ‡Kaiserslautern University

Figure 1: Scene with complete global illumination computed at 1 fps (640×480 on 16 dual-AthlonMP 1800+ PCs) while the book and glassball are moved by a user. Note the changing soft shadows, the caustics from the glass ball, the reflections in the window, and especially thecolor bleeding effects on the walls due to indirect illumination.

Abstract

Interactive graphics has been limited to simple direct illuminationthat commonly results in an artificial appearance. A more realisticappearance by simulating global illumination effects has been toocostly to compute at interactive rates.

In this paper we describe a new Monte Carlo-based global illu-mination algorithm. It achieves performance of up to 10 frames persecond while arbitrary changes to the scene may be applied inter-actively. The performance is obtained through the effective use ofa fast, distributed ray-tracing engine as well as a new interleavedsampling technique for parallel Monte Carlo simulation. A newfiltering step in combination with correlated sampling avoids thedisturbing noise artifacts common to Monte Carlo methods.

CR Categories: I.3.7 [Computer Graphics]: Three-DimensionalGraphics and Realism—Raytracing, Animation, Color, shad-ing, shadowing, and texture; I.6.8 [Simulation And Model-ing]: Types of Simulation—Animation, Monte Carlo, Parallel,Visual; I.3.3 [Graphics Systems]: Picture/Image Generation—Display algorithms; I.3.2 [Graphics Systems]: Graphics Systems—Distributed/network graphics;

Keywords: Animation, Distributed Graphics, Illumination, MonteCarlo Techniques, Parallel Computing, Ray Tracing, RenderingSystems.

1 Introduction

With the availability of fast and inexpensive graphics hardware, in-teractive 3D graphics has become a mainstream feature of todaysdesktop and even notebook computers. All these systems are basedon the rasterization pipeline. Recent hardware features such asmulti texturing, vertex programming, and pixel shaders [Lindholmand Moreton 2001; NVIDIA 2001] have significantly increased therealism achievable with this environment.

However, these interactive systems are still limited to simple di-rect illumination from the light sources. With the pipeline renderingmodel it is impossible to compute the interaction between objectsin the scene directly. Even simple shadows must be approximatedin separate rendering passes for each light source.

Commonly, more complex lighting effects are precomputed off-line with existing global illumination algorithms. They are expen-sive and slow, taking in the order of minutes to hours for a singleupdate. Obviously, this works only for static illumination in equallystatic scenes and is insufficient for the highly dynamic environmentof interactive applications.

The importance of realistic illumination and global illumina-tion effects becomes apparent if we consider the significant ef-forts invested in lighting effects for the production of real and vir-tual movies. They employ a large staff of specially trained light-ing artists to create the required atmosphere and mood of a scenethrough an appropriate illumination.

In real movie productions the artists have to work with the globaleffects of real light sources, while lighting artists in virtual produc-tions have to simulate any global effects with local lighting. Onlyrecently have global illumination algorithms been introduced in thisenvironment, mainly in order to reduce time and money by automat-ing some of the lighting effects. Previously global illumination wasconsidered too inflexible and time consuming to be useful.

However, realistic lighting has a much wider range of appli-cation. It is instrumental in achieving realistic images of virtualobjects, supporting the early design and prototyping phases in aproduction pipeline. Realistic lighting is particularly important forlarge, expensive projects as in the car and airplane industry but isequally applicable for projects from architecture, interior design,industrial design, and many others.

Algorithms for computing such realistic lighting often rely on


ray tracing, which has long been well known for it high render-ing times. Only but recently, research in faster and more efficientray-tracing has drastically changed the environment in which globalillumination operates. Even on commodity hardware, ray-tracinghas been accelerated by more than an order of magnitude [Waldet al. 2001a; Wald et al. 2001b]. In addition, novel techniques al-low for an efficient and scalable distribution of the computationsover a number of client machines.

Since ray-tracing is at the core of most global illumination al-gorithms, one should suspect that global illumination algorithmsshould equally benefit from these developments. However, it turnsout that fast ray-tracing implementations impose constraints thatare incompatible with most existing global illumination algorithms(see Section 2.1).

1.1 Outline of the new Algorithm

In this paper we introduce a new, Monte Carlo global illumina-tion algorithm that is specifically designed to work within the con-straints of newly available distributed interactive ray tracers in orderto achieve interactive performance on a small cluster of PCs.

Our algorithm efficiently simulates direct and mostly-diffuse in-terreflection as well as caustic effects, while allowing arbitrary andinteractive changes to the scene. It avoids noise artifacts that areparticularly visible and distracting in interactive applications.

Interactive performance is obtained through the effective use ofa fast, distributed ray-tracing engine for computing the transport oflight, as well as new interleaved sampling and filtering techniquesfor parallel Monte Carlo simulations. The idea of instant radios-ity [Keller 1997] is used to compute a small number of point lightsources by tracing particles from the light sources. Indirect illumi-nation is then obtained by computing the direct illumination withshadows from this set of point light sources.

In addition illumination via specular paths is computed by shoot-ing caustic photons [Jensen 2001] towards specular surfaces, ex-tending their paths until diffuse surfaces are hit and storing the hitsin a caustic photon map. All illumination from point light sourcesand caustic photons are recomputed for every frame.

We distribute the computation over a number of machines witha client/server approach by using interleaved sampling in the imageplane. Based on a fixed pattern, samples of an image tile are com-puted by different machines with each machine using a different setof point light sources and caustic photons.

Finally, we apply filtering on the master machine to combine theresults from neighboring pixels. This step is important for achiev-ing sufficient image quality as it implicitly increases the numberof point lights and caustic photons used at each pixel. We currentlyuse a simple heuristic based on normals and distances for restrictingthe filter to a useful neighborhood.

1.2 Structure of the Paper

We start in Section 2 with an overview of the fast ray-tracing sys-tems that have recently become available. In particular we discussthe constraints those systems impose on global illumination algo-rithms. In Section 3 we review previous work specifically with re-spect to these constraints. The details of the new global illuminationalgorithm are then discussed in Section 4 before presenting resultsin Section 5.

2 Fast Ray-Tracing

Ray-tracing is one of the oldest and most fundamental techniquesused in computer graphics [Appel 1968; Whitted 1980; Cook et al.1984]. In its most basic form it is used for computing the visibility

along a ray or between two points. Most global illumination algo-rithms use ray-tracing in their core procedures to determine visi-bility or to compute the transport of light via particle propagation.Often most of their computation time is actually spent inside theray-tracer.

Ray-tracing is also well-known for its long computation times.It requires traversing a ray through a precomputed index structurefor locating geometry possibly intersecting the ray, computing theactual intersections, and finally evaluating some shader code at theintersection point. Each step involves significant computation andmost applications require tracing up to several million rays. Con-sequently, those algorithms were limited to off-line computation,taking minutes to hours for computing a single image or global il-lumination solution.

Recently, ray-tracing has been optimized to deliver interactiveperformance on certain platforms. Muuss [Muuss and Lorenzo1995; Muuss 1995] and Parker et al. [Parker et al. 1999b; Parkeret al. 1999a; Parker et al. 1998] have shown that the inherent paral-lelism of ray-tracing allows one to efficiently scale to many proces-sors on large and expensive supercomputers with shared memory.Exploiting the scalability of ray-tracing and combining it with low-level optimizations allowed them to achieve interactive frame ratesfor a full-featured ray-tracer, including shadows, reflections, differ-ent shader models, and non-polygonal primitives such as NURBS,isosurfaces, or CSG models.

Last year, Wald et al. [Wald et al. 2001a] have shown that in-teractive ray-tracing performance can also be obtained on inexpen-sive, off-the-shelf PCs. Their implementation is designed for goodcache performance using optimized intersection and traversal algo-rithms and a careful layout and alignment of core data structures. Aredesign of the core algorithms also allowed them to exploit com-monly available processor features like prefetching, explicit cachemanagement, and SIMD-instructions.

Together these techniques increased the performance of ray-tracing by more than an order of magnitude compared to other soft-ware ray-tracers [Wald et al. 2001a]. Because ray-tracing scaleslogarithmically with scene complexity, ray-tracing on a single CPUwas able to even outperform the fastest graphics hardware for com-plex models and moderate resolutions.

In a related publication it was shown that ray-tracing scales wellalso in a distributed memory environment using commodity PCsand networks [Wald et al. 2001b]. Distributed computing is imple-mented in a client/server model and is mainly based on demand-loading and caching of scene geometry on client machines, as wellas on load-balancing and hiding of network latencies through re-ordering of computations. It achieves interactive rendering perfor-mance even for scenes with tens of millions of triangles. Usingmore processors also allows one to use more expensive renderingtechniques, such as reflections and shadows.

It is an obvious next step to use the fast ray-tracing engine to alsospeed up the previously slow global illumination algorithms thatdepend so heavily on ray-tracing. However, it turns out that thisis not as simple as it seems at first: Most of these algorithms areincompatible with the requirements imposed by a fast ray-tracingsystem.

2.1 Constraints and Requirements

There is a large number of constraints imposed on any potentialglobal illumination algorithm. In the following we briefly discusseach of these constraints.

2.1.1 Computational Constraints

Complexity. It has been shown that interactive ray-tracingcan handle huge scenes with tens of millions of triangles effi-


ciently [Wald et al. 2001b]. However, in most cases complex en-vironments translate to complex illumination patterns that are sig-nificantly more costly to compute. While pure ray-tracing scaleslogarithmically with scene geometry, this is not the case for globalillumination computations in general. This limits the complexity ofscenes that we will be able to simulate interactively.

Performance. A target resolution of 640×480 pixels containsroughly 300,000 pixels. Furthermore, we assume that a single clientprocessor can trace roughly 500,000 rays per second. Given a smallnetwork of PCs with 30 processor, e.g. 15 dual processor machines,we have a total theoretical budget of only 50 rays per pixel for esti-mating the global illumination for an image.

Parallelization. Due to price and availability considerations, weare targeting networks of inexpensive but fast PCs with standardnetwork components. Due to this highly parallel environment, theglobal illumination algorithm must also run in parallel across anumber of client machines. Furthermore, the ray-tracer schedulesbundles of ray trees to be computed on each client. For best perfor-mance, the algorithms should offer enough independent tasks andthese tasks should be organized similar to the ray scheduling pat-tern.

Other Costs. The underlying engine significantly speeds upray-tracing compared to previous implementations. However, itsspeedup is limited to this algorithm. Other costs in a global illu-mination algorithms that have previously been dominated by ray-tracing can easily become the new bottleneck.

Communication. Commodity network technology, such asFast-Ethernet or even Gigabit-Ethernet present significant hurdlesfor distributed computing. Compared to shared-memory systems,communication parameters differ by several orders of magnitude:bandwidth is low, measured in megabytes versus gigabytes per sec-ond, and latencies are high, measured in milliseconds versus frac-tions of microseconds.

Thus, a good algorithm must keep its bandwidth requirementswithin the limits of the network and must avoid introducing laten-cies. In particular, it must minimize synchronization across the net-work, which would result in costly round trip delays and in clientsrunning idle.

Global Data. Many existing global illumination algorithmsstrongly depend on access to some global data structure. However,access to global data must be minimized for distributed tasks asit causes network delays and possible synchronization overhead toprotect updates to the data.

Access to global data is less problematic before and after tasksare distributed to clients. In a client/server environment the globaldata can be maintained by the master and is then streamed to andfrom clients together with other task parameters.

Ideally, global data to be read by a client is transmitted to it withthe initial task parameters. The task then performs its computa-tions on the client without further communication. Global data tobe written is then transmitted back to the master together with theother results. One must still be careful to avoid network latenciesand bandwidth problems.

Amortization. Many existing algorithms perform lengthy pre-computations before the first results are available. This processingis then amortized over remaining computations. Unfortunately, thisstrategy is inadequate for interactive applications, where the goalis to provide immediate feedback to the user. Preprocessing mustbe limited to a few milliseconds per frame. Furthermore, it must

also be amortized over at most a few frames, as it might otherwisebecome obsolete due to interactive changes in the environment.

Accumulation. During interactive sessions many lighting pa-rameters change constantly, making it difficult to accumulate andreuse previous results. However, this technique can be used to im-prove the quality of the global illumination solution in static situa-tions.

Coherence. A fast ray-tracer depends significantly on coherentsets of rays to make good use of caches and for an efficient use ofSIMD computations [Wald et al. 2001a]. An algorithm should sendrays in coherent batches to achieve best performance.

2.1.2 Quality Constraints

Illumination Effects. Given current technology, it seems unre-alistic to expect perfect results at interactive rates, yet. Thereforewe focus on the major contributions of global illumination, suchas direct and indirect illumination by point and area light sources,reflection and refraction, and direct caustics. Less attention is paidto less important effects like glossy reflection or caustics of higherorder.

Display Quality. Operating at extremely low sampling rates, wehave to deal with the resulting sampling artifacts. While some ofthese artifacts (e.g. high-frequency noise) are hardly perceivable instill images, they may become noticeable and highly disturbing ininteractive environments. Therefore special care has to be taken oftemporal artifacts.

Interactivity. We define interactive performance to be at least 1global illumination solution per second. Since higher rates are verydesirable, we are aiming for 4-5 frames per second.

3 Previous Work

The global illumination problem has been formularized by the ra-diance equation [Kajiya 1986].

Using finite element methods, increasingly complex algorithmshave been developed to approximate the global solution of the radi-ance equation. In diffuse environments, radiosity methods [Cohenand Wallace 1993] were the first that allowed for interactive walk-throughs, but required extensive preprocessing and thus were avail-able only for static scenes. Accounting for interactive changes byincremental updates [Drettakis and Sillion 1997; Granier and Dret-takis 2001] forces expensive updates to global data structures andis difficult to parallelize.

The use of rasterization hardware allows for interactive displayof finite element solutions. However, glossy and specular effectscan only be approximated or must be added by a separate ray trac-ing pass [Stamminger et al. 2000]. Due to the underlying finiteelement solution, these approaches are not available at interactiverates.

Path tracing based algorithms [Cook et al. 1984; Kajiya 1986;Chen et al. 1991; Veach and Guibas 1994; Veach and Guibas 1997]correctly handle glossy and specular effects. The view dependencyrequires to recompute the solution for every frame. The typical dis-cretization artifacts of finite element methods are replaced by lessobjectionable noise [Ramasubramanian et al. 1999], which how-ever is difficult to handle over time. Reducing the noise to accept-able levels usually is obtained by increasing the sampling rate andresults in frame rates that are far from interactive.

Walter et al. [Walter et al. 1999] achieved interactive frame ratesby reducing the number of pixels computed in every frame. Results


from previous frames are reused through image-based reprojection.It is most effective for costly, path tracing based algorithms. How-ever, the approach results in significant rendering artifacts and isdifficult to parallelize.

Path tracing based approaches can be supplemented by photonmapping [Jensen 2001]. This simple method of direct simulationof light results in biased solutions, but allows to efficiently rendereffects like caustics that may be difficult to generate with previousalgorithms. Direct visualization of the photon map usually resultsin visible artifacts. A local smoothing or final gather pass can beused for removing these artifacts but is prohibitively expensive dueto the large number of rays that must be traced.

Exploiting the local smoothness of the irradiance, an extrapo-lation scheme [Ward and Heckbert 1992] has been developed thatconsiderably reduces the rendering time required by a local pass.For a parallel implementation global synchronization and commu-nication are necessary to provide each processor with the extrap-olation samples. Far too many of the expensive samples are con-centrated around corners as illustrated in [Jensen 2001, p. 143] andthe position of the samples is hardly predictable. This requires ei-ther dense initial samples and consequently is expensive or resultsin tremendous popping artifacts when changing geometry duringinteraction.

Instant radiosity [Keller 1997] allows for interactive radiositywithout solution discretization: The lighting in a scene is approxi-mated by point light sources generated by a quasi-random walk, andrasterization hardware is used for shadow computation. Arbitraryinteractive changes to the environment were possible. However thelarge number of rendering passes required for a single frame limitedinteractivity to relatively simple environments.

4 Algorithm

This technical section introduces the new algorithm that meets theconstraints (see Section 2.1) that were imposed by a low cost clus-ter of consumer PCs and overcomes the problems of previous work.For brevity of presentation space we assume familiarity with the ra-diance integral equation and refer to standard texts like e.g. [Cohenand Wallace 1993].

Similar to distribution ray-tracing [Cook et al. 1984] for eachpixel an eye path is generated by a random walk. The scatteredradianceL(x,ω) (for the selected symbols see Figure 2) at the end-pointx of the path in directionω is approximated by

L(x,ω)

= Le(x,ω)+∫

SV(y,x) fr (ωyx,x,ω)Lin(y,x)

cosθycosθx

|y−x|2dA(y)

≈ Le(x,ω)+M

∑j=1

V(y j ,x) fr (ωy j x,x,ω)L j

cosθy jcosθx

|y j −x|2

+1

πr2

N

∑j=1

Br (zj ,x) fr (ω j ,x,ω)Φ j ,

whereP := (y j ,L j )Mj=1 is the set of point lights iny j with radi-

anceL j [Keller 1997] andC := (zj ,ω j ,Φ j )Nj=1 is the set of caus-

tic photons that are incident from directionω j in zj with the fluxΦ j [Jensen 2001]. These sets have to be generated at least onceper frame by random walks of fixed maximum path length. Afterthis preprocessing step the scattered radiance is determined by onlyvisibility testsV(y j ,x) and photon queries, whereBr (zj ,x) is 1 if|zj −x| ≤ r and 0 else.

In comparison to bidirectional path tracing [Veach and Guibas1994] the first sum uses only one technique to generate path space

S scene surfaceA area measureL scattered radianceLe emitted radianceLin incident radiancefr bidirectional scattering distribution functionV(y,x) ∈ {0,1} mutual visibility ofy andxθ angle between incident direction and normalωyx direction fromy to x

Figure 2: Selected Symbols.

samples. For the majority of all path space samples this techniqueis best or at least sufficient to generate them. An exception are pathspace samples with a small distance|y j − x| or which are belong-ing to caustics. In order to avoid overmodulation the first group ishandled in a biased way by just clipping the distance to a minimalvalue. Since samples of the second group cannot be generated bythis technique their contribution is approximated by the second sumusing photon mapping.

For each pixel only one eye path is generated allowing for higherframe rates during interaction. The flickering of materials inher-ent with random walk simulation is reduced by splitting the eyepath once at the first point of interaction with the scene. Then foreach component of the bidirectional scattering distribution functiona separate path is continued. Anti-aliasing is performed by accumu-lating the images over time, progressively improving image qualityduring times of no interaction.

In order to obtain interactive frame rates with the above algorith-mic core on a cluster of PCs, the preprocessing must not block theclients and avoid multiple computation of identical results as far aspossible, while further variance reduction is still needed to reducenoise artifacts and increase efficiency. The necessary improvementsare discussed in the sequel.

4.1 Fast Caustics

Shooting a sufficient number of photons is affordable in an inter-active application, since the random walk simulations require onlya small fraction of the total number of rays to be shot. However,the original photon map algorithms [Jensen 2001] for storing andquerying photons are far too slow for interactive purposes: Re-building the 3d-tree for the photon map for every frame does notamortize, and the nearest neighbor queries are as costly as shootingseveral rays.

Therefore photon mapping is applied only to visualize caustics,where usually the photon density is rather high, and density estima-tion is applied with a fixed filter radiusr. Assuming the photonsto be stored in a 3-dimensional regular grid of resolution 2r, only8 voxels have to be looked up for a query. Since in practice only afew voxels will actually be occupied by caustic photons, a simplehashing scheme is used in order to avoid storing the complete grid.The photons that are potentially in the query ball are found almostinstantaneously by 8 hash table lookups. Hashing and storing thephotons in the hash table is almost negligible as compared to 3d-tree traversal and left-balancing of the 3d-tree as presented in theoriginal work.

4.2 Interleaved Sampling

Generating a different set of point lights for each pixel is too costly,while using the same set causes aliasing artifacts (see Figure 3a).The same arguments hold for the direct visualization of the causticphoton map. In spite of the previous section’s improvements, com-puting a sufficiently large photon map in parallel and merging the


a) no interleaved samplingno discontinuity buffer

b) 5×5 interleaved samplingno discontinuity buffer

c) 5×5 interleaved sampling3×3 discontinuity buffer

d) 5×5 interleaved sampling5×5 discontinuity buffer

Figure 3: Interleaved sampling and the discontinuity buffer: All close-ups have been rendered with the same number of rays apart frompreprocessing. In a) only one set of point light sources and caustic photons is generated, while for b)-d) 25 independent such sets have beeninterleaved. Choosing the filter size appropriate to the interleaving factor completely removes the structured noise artifacts.

results would block the clients before actually rendering the frameand decrease the available network bandwidth.

However, generalizing interleaved sampling [Molnar 1991;Keller and Heidrich 2001] allows one to control the ratio of pre-processing cost and aliasing: Each pixel of a smalln×m grid isassigned a different setPk of light points andCk of caustics pho-tons (1≤ k≤ n ·m). Padding this prototype over the whole imagereplaces aliasing artifacts by structured noise (see Figure 3b) whileonly a small numbern·mof sets of light points and caustic photonshas to be generated. Since interleaved sampling achieves a muchbetter visual quality, the setsPk andCk can be chosen much smallerthanP andC.

Each client computes the setsPk of point lights andCk of caus-tics photons by itself. Parallel tasks are assigned such that a clientpredominantly is processing tiles of equal identificationk in orderto allow for perfect cachingPk andCk. Due to interleaved samplingsynchronizing for global setsP andC (as opposed to [Christensen2001]) is obsolete and in fact no network transfers are required.

4.3 The Discontinuity Buffer

The constraints of interactivity allow for only a small budget ofrays to be shot, resulting in setsPk andCk of moderate size. Conse-quently the variance is rather high and has to be reduced in order toremove the noise artifacts. Taking into account that the irradiance isa piecewise smooth function the variance can be reduced efficientlyby the discontinuity buffer.

For each pixel the server buffers the reflectance function accu-mulated up to the end point of the eye path, the distance to thatpoint, the normal in that point, and the incident irradiance. The ir-radiance value consists of both the contribution by the point lightsourcesPk and the caustic photonsCk. Instead of just multiplyingirradiance and reflectance function, the irradiance of the 8 neigh-boring pixels is considered, too: Local smoothness is detected bythresholding the difference of distances and the scalar product of thenormals of the center pixel and each neighbor. If geometric conti-nuity is detected, the irradiance of the neighboring pixel is added tothe center pixel’s irradiance. The final pixel color is determined bymultiplying the accumulated irradiance with the reflectance func-tion divided by the number of total irradiances included.

In the locally smooth case this procedure implicitly increases theirradiance sampling rate by a factor of 9, while at the same time re-ducing its variance by the same factor. Note that no additional rayshave to be shot in order to obtain this huge reduction of noise, andthat a generalization to larger than 3× 3 filter kernels is straight-forward. In the discontinuous case no smoothing is possible, how-ever the remaining noise is superimposed on the discontinuities andsuch less perceivable [Ramasubramanian et al. 1999]. Since only

the irradiance is blurred, texture details on the surface are perfectlyreconstructed. Using the accumulation buffer method [Haeberli andAkeley 1990] in combination with the discontinuity buffer allowsfor oversampling in a straightforward way.

Including the direct illumination calculations into the discontinu-ity buffer averaging process, allows to drastically reduce the num-ber of shadow rays to be shot, but slightly blurs the direct shadows.Similar to the irradiance caching methods, the detection of geomet-ric discontinuities can fail. Then the same blurring artifacts becomevisible at e.g. slighty offset parallel planes.

Interleaved sampling and the discontinuity buffer perfectly com-plement each other (see Figure 3d) but require the filtering to bedone on the server. In consequence an increased amount of datahas to be sent to the server, which of course is quantized and com-pressed. Compared to irradiance caching the discontinuity buffersamples the space much more evenly and avoids the typical flick-ering artifacts encountered in dynamic scenes. In addition no com-munication is required to broadcast irradiance samples during ren-dering.

4.4 Minimal Randomization

The integrands in computer graphics are square-integrable, usuallyof high dimension and containing unknown discontinuities. Con-sequently the Monte Carlo method is appropriate for numerical in-tegration. Since the pure Monte Carlo method is rather slow, weapply the much more efficient randomized quasi-Monte Carlo in-tegration [Owen 1998]. This method of integration saves around30% of computation time as compared to stratified sampling [Kol-lig and Keller 2001] and exposes much less variance, i.e. noise.The principle consists of using the estimator

∫(0,1)s

f (x)dx≈ 1r

r

∑i=1

1n

n−1

∑j=0

f (xi, j ),

where the samplesxi, j for fixed i are of low discrepancy (for defini-tions see [Niederreiter 1992]) and for fixedj are independent setsof random realizations. Low discrepancy guarantees a much bet-ter uniform distribution of the samples than independent randomsamples can obtain. This implies good stratification properties thatguarantee for faster convergence. On the other hand the indepen-dence makes the estimator a real Monte Carlo estimate that is validfor all square-integrable functions and in addition allows one to es-timate the variance of the estimate.

The deterministic low discrepancy sequence of Sobol’ can begenerated in only a few lines of code [Press et al. 1992] in integerarithmetic. The points are randomized by justXOR-ing them witha random bit vector before floating point conversion [Friedel and


Keller 2001]. Taking the firstn pointsa j of the Sobol’ sequence,rindependent random realizations for the above scheme are obtainedby xi, j := a j ⊕bi , wherebi arer independent random bit vectors.Note that this randomization scheme preserves the good uniformityproperties of the points.

In fact choosing onlyr = 1 randomized instances is sufficient toobtain a valid Monte Carlo estimator, whiler = 2 instances alreadyenable to estimate the error (see [Sobol’ 1994]). In consequencevariance and noise inherent with Monte Carlo methods can be al-most avoided by this simple and minimal scheme of randomization.Tabulating the Sobol’ sequence provides stratified sample genera-tion at rates much faster than a pseudo-random number generatorcan achieve in combination with stratified sampling.

In the algorithm each identificationk is assigned a subsequentsubsequence of one instance of a randomized low discrepancy se-quence. These samples are used for generating the setsPk of pointlight sources andCk of caustic photons. In case of geometric con-tinuity the discontinuity buffer assembles the samples of neighbor-ing pixels such joining different subsequences. Since these subse-quences are part of the large sequence, the samples almost perfectlycomplement each other resulting in a superior convergence. In or-der to avoid the costly computation of high-dimensional low dis-crepancy sequences, padded replications sampling is used [Kolligand Keller 2001].

For the randomization each client needs an identical stream ofonly a few pseudo-random numbers per frame, which are createdby the same pseudo-random number generator on each client. Thusany parallelization problems inherent with pseudo-random numbergeneration are avoided and no communication is required duringrendering.

5 Results and Discussion

For our experiments we have used a cluster of dual processor ma-chines each equipped with two AMD AthlonMP 1800+ CPUs and512 MB of RAM. All machines are connected to a fully switched100 Mbit Ethernet. In order to handle the amount of pixel data sentto the server, a single Gigabit uplink from the switch was connectedto the master machine that otherwise was identical to the clients.

In the following some example scenes are provided to show theperformance, quality, and scalability of the proposed algorithm andits current implementation.

5.1 Example Scenes

Unfortunately it is difficult to visualize the temporal behavior ofthe approach in a printed publication. Therefore the accompany-ing video shows the examples captured in realtime from the screenof the master machine. The still images in this paper have beengrabbed from the screen during interaction. The converged imagesare accumulated results of successive frames after interaction hasstopped. This process usually takes about 1 or 2 seconds.

All scenes are rendered at video resolution of 640×480 pixelsunless stated otherwise. A maximum path length of 4 is used forgenerating the point light sources. For all of the following examples3× 3 interleaved sampling has been used in combination with a3×3 discontinuity buffer.

5.1.1 Simple Scenes with Caustics

The left image in Figure 4 shows a simple room lit by a single arealight source located underneath the ceiling above the table, wherea glass sphere casts a caustic. Global illumination is computed at3.3 fps on 8 clients, while for each pixel 22 shadow rays are castand 500 caustic photons are generated each frame. 4 of the 22 point

light sources are located directly on the light source itself, while 18are spread all over the scene.

The scene in the right image of Figure 4 contains two lightsources with 5 samples each in addition to 20 indirect light samples.Roughly 1,500 photons per light source were used to represent thetwo caustics.

Temporal flickering is minimal in both scenes due to low com-plexity of the illumination, and shadows are smooth and clearly vis-ible with all detail already in the dynamic scene. The dynamic andconverged images only can be distinguished by the slightly bettercaustics and anti-aliasing due to accumulation.

Figure 4: Two simple test scenes with a glass ball and a glass eggconsisting of 800 and 4,000 triangles. These scenes render at 3.3and 2.5 fps on 8 clients.

5.1.2 Invisible Date

The “Invisible Date” scene as shown in Figure 5 contains 9,000triangles. It is lit mostly indirectly from the two lamps pointingtowards the ceiling. No direct illumination reaches the furnitureand the reflective floor. A glass sphere is floating below the ceilingcasting caustics on the wall.

This scene nicely demonstrates the combination of specular illu-mination effects due to ray-tracing reflections and indirect illumi-nation computed with the new algorithm. It uses 4 direct samples,9 indirect samples, and 500 photons per light source. The scenerenders at 2.6 fps on 8 clients.

Smooth penumbras are produced by the shelf and other objects.Some flickering is visible due to the low number of indirect sam-ples. This could be avoided by taking more samples at the cost oflower frame rate.

Although the eye path is split once at the first point of interac-tion, subsequent path segments randomly select a component of thebidirectional scattering distribution function. This is the reason forsome small flickering artifacts visible for the teapot and the cups onthe table during interaction.

Figure 5: The Invisible Date scene containing mostly indirect illu-mination from two lights pointing towards the ceiling. It shows acombination of ray-traced reflections and smooth indirect illumina-tion. Little differences can be observed between the dynamic andthe converged image on the left and right, respectively.


5.1.3 Office

The office scene in Figure 6 contains 34,000 triangles. The light-ing conditions are quite difficult due to significant occlusion. Thecomplex indirect illumination is particularly visible in the bottomrow of Figure 6, where the strongly illuminated book acts as a sec-ondary area light source causing additional smooth shadows on thewall. In addition color bleeding is clearly visible on the wall in theupper row of images.

Global illumination is computed with 4 direct, 18 indirect, and1,000 photons per light source, which results in a frame rate of 2.2fps on 8 clients for both views. The sharp shadow boundaries fromthe point light sources visible in the non-converged images on theleft already indicate noticeable flickering. While this is disturbingduring movements it helps in locating important lighting effects.The illumination, however, quickly converges as soon as the move-ment stops and results in a highly detailed and smooth illuminationpattern.

The effects of insufficient filtering of interleaved sampling areclearly visible at curved surfaces. They are resolved in the con-verged solution on the right.

Figure 6: Two views of the office environment. The scene is illu-minated by two area lights at the ceiling and a desk lamp. The im-ages in the bottom row show a close-up of the detailed illuminationpatterns caused by the book under the desk lamp. While there issignificant flickering during movements, the solution quickly con-verges to the smooth images on the right. Both views render at 2.2frames per second on 8 clients.

5.1.4 Conference Room

The conference room test scene from Figure 7 contains illuminationfrom 104 area light sources and consists of 290,000 triangles. Dueto its material and geometrical complexity the scene renders only at1.7 fps using 12 clients.

Subsampling is used to limit the number of shadow rays castfor each pixel. In this case, a total of 5 light sources are determinedand sampled once. For the indirect illumination 20 indirect sampleswere used. Due to the well distributed location of the lights and thefiltering by the discontinuity buffer, almost no flickering is visible.However, the limitations of the discontinuity buffer again becomevisible on curved surfaces. Here an improvement of the filtering isnecessary. Finally the accumulated solution again is free from anyartifacts.

Figure 7: Conference room scene with 104 light sources and de-tailed shadows cast by the chairs. The scene renders at 1.7 fps on12 clients and hardly shows any flickering.

5.1.5 Dynamic Environments

Interactive changes to the scenes (see Figure 1) are trivially han-dled by the global illumination algorithm, because the underlyingray-tracing engine handles changing geometry transparently to theapplication.

5.2 Simulation Quality

The combination of the idea of instant radiosity, interleaved sam-pling, the discontinuity buffer, and correlated sampling by random-ized quasi-Monte Carlo integration allows our system to achieverelatively smooth image quality with very few samples.

The algorithm can be controlled by a small set of intuitive pa-rameters: the number of point light sources, the number of causticphotons, and the filter size of the discontinuity buffer that auto-matically determines an interleaved sampling pattern of same size.The user can easily trade off rendering speed for image quality byadjusting these parameters and gets immediate feedback by the in-teractive system.

Flickering due to discontinuous changes in the illumination canbecome particularly apparent when successive frames are illumi-nated by different point light sources. As shown in Section 5.1, ac-cumulating successive frames efficiently smooths out these discon-tinuities. As a consequence flickering can be reduced drastically byincluding moderate oversampling during interaction, however, at areduced frame rate.

5.3 Scalability

The client/server concept of our global illumination algorithm relieson non-blocking clients. Thus the scalability of the system dependson the ratio of the overhead and work done by the clients, on theability of the server to schedule tasks and to process their results,and on the bandwidth of the network.

The number of rays traced for a preprocess is mainly determinedby the number of caustic photons and is usually small compared tothe total number of shadow rays. If possible, tiles with the sameidentificationk are scheduled to the same client. Due to differentloads on the clients and in order to hide latencies, the ideal distri-bution cannot always be maintained. Our experiments show that onthe average the clients have to preprocess data for roughly two dif-ferent identifications. Consequently the ratio of the overhead andwork is small and hardly influences scalability.

The main workload of the server consists of handling largeamounts of pixel data and performing the filtering by the discon-tinuity buffer. Since these computations require information fromadjacent pixels computed by different clients, they have to be per-formed on the server. Consequently frame rate and resolution arerestricted by the performance of the server.


In a similar way the limited network bandwidth restricts theframe rate and resolution.

As expected, Table 1 shows that the algorithm scales almost per-fectly up to 16 clients. The maximum performance of our serveris restricted to process around 1.5 million pixels per second. Thiscorresponds to roughly 5 frames per second at 640× 480. Otherthan waiting for faster hardware this bottleneck can be avoided bydistributing the server computations. This can be done easily, butintroduces additional latency. Reducing the image resolution allowsto easily scale this maximum performance, reaching more than 10frames per second at 400×300 pixels.

Nevertheless there are no restrictions imposed on the scalabilitywith respect to image quality: Increasing the number of rays obvi-ously improves image quality. If the number of clients is increasedby the same factor, still the same number of tasks is scheduled andthe same amount of pixel data is transferred and processed. Onlylittle overhead is introduced resulting in almost identical frame ratesat unchanged resolution.

Number of clients 1 2 4 8 16Room with table 0.4 0.8 1.6 3.2 5.3Room with egg 0.3 0.7 1.4 2.7 5.4Office 0.2 0.3 0.6 1.2 2.4

Table 1: The algorithm almost perfectly scales over the range ofavailable clients. For frame rates above 5 fps the server workloadlimits performance.

6 Conclusion

We have shown that - contrary to general opinion - interactiveglobal illumination is indeed feasible, and can even be realized ona low cost cluster of consumer PC hardware. Running on 8 to 16cluster nodes, our system delivers interactive global illumination ofup to 5 frames per second at a resolution of 640×480 for scenesranging from simple environments to complex geometry contain-ing dozens of light sources. All lighting is completely recomputedevery frame, allowing one to interactively change rendering param-eters, material properties, lighting and even geometry. High qualityglobal illumination is obtained in only seconds.

All this has been achieved by analyzing the drawbacks of previ-ous approaches and by designing our system specifically to matchthe underlying framework of a fast, scalable ray-tracing engine.There is no communication or synchronization between differenttasks distributed among the clients and only little overhead calcula-tions. These are important reasons for its good scalability.

Besides including arbitrary physical surface properties and vol-umetric effects, future work will concentrate on even more exploit-ing temporal coherence. Since most of the dynamic changes arecontinuous, information can be used for predictions for unbiasedimportance sampling. Furthermore the majority of rays shot areshadow rays towards point light sources, which offer a high degreeof coherence. For this special problem, tracing the shadow rays canbe made much more efficient by tracing bundles of rays benefittingfrom spatial coherence [Lukaszewski 2001]. Such speed improve-ments directly affect the rendering speed of our system.

Acknowledgments

Thomas Kollig has been supported by the Stiftung Rheinland-Pfalzfur Innovation. This work was supported by Advanced Micro De-vices (AMD).

ReferencesAPPEL, A. 1968. Some Techniques for Shading Machine Renderings of

Solids.SJCC, 27–45.

CHEN, S., RUSHMEIER, H., MILLER , G., AND TURNER, D. 1991. Aprogressive Multi-Pass Method for Global Illumination. InComputerGraphics (SIGGRAPH 91 Conference Proceedings), 165 – 174.

CHRISTENSEN, P. 2001. Photon Mapping Tricks. InACM SIGGRAPHCourse Notes, Course 38: A Practical Guide to Global Illumintion usingPhoton Mapping. ACM SIGGRAPH, 91–115.

COHEN, M., AND WALLACE , J. 1993. Radiosity and Realistic ImageSynthesis. Academic Press Professional, Cambridge.

COOK, R., PORTER, T., AND CARPENTER, L. 1984. Distributed Ray Trac-ing. In Computer Graphics (SIGGRAPH 84 Conference Proceedings),137–145.

DRETTAKIS, G., AND SILLION , F. 1997. Interactive Update of Global Illu-mination Using A Line-Space Hierarchy. InProceedings of SIGGRAPH97, ACM SIGGRAPH / Addison Wesley.

FRIEDEL, I., AND KELLER, A. 2001. Fast generation of randomized low-discrepancy point sets. InMonte Carlo and Quasi-Monte Carlo Methods2000, H. Niederreiter, K. Fang, and F. Hickernell, Eds. Springer, 257–273.

GRANIER, X., AND DRETTAKIS, G. 2001. Incremental Updates for RapidGlossy Global Illumination. InComputer Graphics Forum, BlackwellPublishers, vol. 20, 268–277.

HAEBERLI, P., AND AKELEY, K. 1990. The Accumulation Buffer: Hard-ware Support for High-Quality Rendering. InComputer Graphics (SIG-GRAPH 90 Conference Proceedings), 309–318.

JENSEN, H. 2001.Realistic Image Synthesis Using Photon Mapping. AKPeters.

KAJIYA , J. 1986. The Rendering Equation. InComputer Graphics (SIG-GRAPH 86 Conference Proceedings), 143–150.

KELLER, A., AND HEIDRICH, W. 2001. Interleaved Sampling. InRender-ing Techniques 2001 (Proc. 12th Eurographics Workshop on Rendering),Springer, K. Myszkowski and S. Gortler, Eds., 269–276.

KELLER, A. 1997. Instant Radiosity. InSIGGRAPH 97 Conference Pro-ceedings, Annual Conference Series, 49–56.

KOLLIG , T., AND KELLER, A. 2001. Efficient bidirectional path tracing byrandomized quasi-Monte Carlo integration. InMonte Carlo and Quasi-Monte Carlo Methods 2000, H. Niederreiter, K. Fang, and F. Hickernell,Eds. Springer, 290–305.

L INDHOLM , E., AND MORETON, M. K. H. 2001. A user-programmablevertex engine. InComputer Graphics (Proc. of Siggraph), 149–159.

LUKASZEWSKI, A. 2001. Exploiting coherence of shadow rays. InPro-ceedings of AFRIGRAPH, ACM SIGGRAPH, 147–150.

MOLNAR, S. 1991. Efficient Supersampling Antialiasing for High-Performance Architectures. Tech. Rep. TR91-023, The University ofNoth Carolina at Chapel Hill, April.

MUUSS, M. J., AND LORENZO, M. 1995. High-resolution interactivemultispectral missile sensor simulation for atr and dis. InProceedings ofBRL-CAD Symposium ’95.

MUUSS, M. J. 1995. Towards real-time ray-tracing of combinatorial solidgeometric models. InProceedings of BRL-CAD Symposium ’95.

NIEDERREITER, H. 1992.Random Number Generation and Quasi-MonteCarlo Methods. SIAM, Pennsylvania.

NVIDIA. 2001. NVIDIA OpenGL Extension Specifications. NVIDIA,March.

OWEN, A. 1998. Monte Carlo Extension of Quasi-Monte Carlo. InWinterSimulation Conference, IEEE Press, 571–577.

PARKER, S., SHIRLEY, P., LIVNAT , Y., HANSEN, C., AND SLOAN , P. P.1998. Interactive ray tracing for isosurface rendering. InIEEE Visual-ization ’98, 233–238.


PARKER, S., PARKER, M., L IVNAT , Y., SLOAN , P. P., HANSEN, C., AND

SHIRLEY, P. 1999. Interactive ray tracing for volume visualization.IEEE Transactions on Computer Graphics and Visualization 5, 3 (July-September), 238–250.

PARKER, S., SHIRLEY, P., LIVNAT , Y., HANSEN, C., AND SLOAN , P. P.1999. Interactive ray tracing. InInteractive 3D Graphics (I3D), 119–126.

PRESS, H., TEUKOLSKY, S., VETTERLING, T., AND FLANNERY, B. 1992.Numerical Recipes in C. Cambridge University Press.

RAMASUBRAMANIAN , M., PATTANAIK , S.,AND GREENBERG, D. 1999.A Perceptually Based Physical Error Metric for Realistic Image Synthe-sis. InProceedings of SIGGRAPH 99, Computer Graphics Proceedings,Annual Conference Series, 73–82.

SOBOL’, I. 1994. A Primer for the Monte Carlo Method. CRC Press.

STAMMINGER , M., HABER, J., SCHIRMACHER, H., AND SEIDEL, H.-P.2000. Walkthroughs with Corrective Texturing. InRendering Techniques2000 (Proc. 11th Eurographics Workshop on Rendering), Springer, 377–390.

VEACH, E., AND GUIBAS, L. 1994. Bidirectional Estimators for LightTransport. InProc. 5th Eurographics Worshop on Rendering, 147 – 161.

VEACH, E., AND GUIBAS, L. 1997. Metropolis Light Transport. InSIG-GRAPH 97 Conference Proceedings, Addison Wesley, T. Whitted, Ed.,Annual Conference Series, ACM SIGGRAPH, 65–76.

WALD , I., BENTHIN, C., WAGNER, M., AND SLUSALLEK , P. 2001. In-teractive rendering with coherent ray tracing.Computer Graphics Forum20, 3.

WALD , I., SLUSALLEK , P., AND BENTHIN, C. 2001. Interactive dis-tributed ray tracing of highly complex models. InProceedings of the12th EUROGRPAHICS Workshop on Rendering. London.

WALTER, B., DRETTAKIS, G., AND PARKER, S. 1999. Interactive ren-dering using the render cache.Eurographics Rendering Workshop 1999.Granada, Spain.

WARD, G., AND HECKBERT, P. 1992. Irradiance gradients. In3rd Euro-graphics Workshop on Rendering.

WHITTED, T. 1980. An improved illumination model for shaded display.CACM 23, 6 (June), 343–349.

Interactive Global Illuminationwald//Publications/2002/IGI/igi02.pdf · Interactive Global...

Documents

Transcript of Interactive Global Illuminationwald//Publications/2002/IGI/igi02.pdf · Interactive Global...