LOD Case Study & Application

Robert Huebner
Nihilistic Software
[email protected]

Speaker Bio
• President and Director of Technology for Nihilistic Software
  – Currently working on “Starcraft: Ghost” for Blizzard Entertainment
  – Previous credits include Vampire: The Masquerade, Jedi Knight: Dark Forces 2, Descent
• International Game Developers Association (IGDA) Board Member
  – www.igda.org
• Game Developers Conference (GDC) Advisory Board

Purpose of Talk
• Review some of the topics and ideas presented earlier in the course
  – Try to explain what worked for us, and what didn’t
• This talk is a “case study in progress” for our current Gamecube and XBOX work
  – Still tweaking and changing some LOD schemes

Starcraft: Ghost (needs LOD too!)

Goal of LOD
• Back on pre-3D-hardware PCs, we would spend a LOT of CPU to avoid drawing a few triangles
  – The cost of rendering was much higher
  – We were willing to spend significant CPU to eliminate a single triangle
    • Systems like ROAM, view-dependent LOD
• Current hardware renders fast, so we only spend CPU if we can discard a lot of triangles
  – Or if it saves us state changes, texture fetches, memory bandwidth, or other costly processing

General Block Diagram
[Block diagram; labeled components: CPU, RAM, FIFO, GPU, Vertex Unit, Pixel Unit, Texture Mem, Frame buffer]

Data Flow Management
• Managing data flow and bandwidth is an important performance metric
• Each platform has a different architecture
  – So our choice of LOD differs for each platform
• Each main data path can utilize different LOD techniques to increase throughput
  – We try to do this without wasting CPU or memory resources, which are also scarce

Where Do We Use LOD?
[Block diagram; labeled components: CPU, RAM, FIFO, GPU, Vertex Unit, Pixel Unit, Texture Mem, Framebuffer]

Classes of Game LOD
• The design of most console systems is dominated by three data paths:
  – The RAM->GPU path and GPU throughput are managed with geometric LOD
  – The GPU->Framebuffer path is managed via shader LOD
  – The Texture->GPU path is managed with MIP-mapping and shader LOD

Games Vs. Research
• The biggest problems we run into when adapting academic LOD systems to game use are:
  – Dealing with additional properties of meshes
    • Vertex normals, texture, UV coordinates, etc.
  – Avoiding the need for general-purpose processing at the vertex level
  – Maintaining data in a format that our hardware can process directly

Runtime Selection
• In our engine, all LOD processing for a given object is driven by a single value
  – The LOD value is stored both as a float (0.0 to 1.0) and as a discrete BYTE (1..X)
  – Each sub-system that wants to do LOD can use either version of the LOD metric to control behavior
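
A minimal sketch of how the float and discrete forms of the LOD value could be kept in step. The function name, the clamping, and the even split of the range are assumptions for illustration; the slides do not say how the engine derives the BYTE from the float.

```python
def discrete_lod(lod: float, num_levels: int) -> int:
    """Map a float LOD in [0.0, 1.0] to a discrete level in 1..num_levels.

    Hypothetical helper: the range is split into num_levels equal bands.
    """
    lod = min(max(lod, 0.0), 1.0)            # clamp out-of-range input
    return min(int(lod * num_levels) + 1, num_levels)


if __name__ == "__main__":
    for value in (0.0, 0.3, 0.5, 1.0):
        print(value, discrete_lod(value, 4))
```

A sub-system that switches between a few fixed code paths would read the discrete level, while one that scales a count or a blend amount would read the float directly.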

Runtime Selection
• The LOD metric is stored for each object or “sector” (world section)
• Based on many factors (highest to lowest weight)
  – Estimated screen space (size / distance)
  – Overall performance or estimated triangle counts for scene (scene metric)
  – Current player control mode (interact or cutscene, combat or stealth)
  – “Importance” of the object (active AI vs. inactive AI)
  – Viewing angle for terrain blocks
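
One way to read the factor list above is as a weighted sum. The sketch below assumes each factor has already been normalized to [0, 1]; the weight values are invented for illustration and only preserve the highest-to-lowest ordering given on the slide.

```python
def lod_metric(screen_space, scene_metric, control_mode, importance, view_angle,
               weights=(0.40, 0.25, 0.15, 0.12, 0.08)):
    """Combine normalized factors into a single LOD value in [0.0, 1.0].

    The weights are hypothetical; they only decrease in the slide's order.
    """
    factors = (screen_space, scene_metric, control_mode, importance, view_angle)
    total = sum(w * f for w, f in zip(weights, factors))
    return min(max(total / sum(weights), 0.0), 1.0)


if __name__ == "__main__":
    # A large on-screen object in a lightly loaded scene.
    print(lod_metric(0.9, 0.8, 1.0, 0.5, 0.0))
```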

Geometric LOD
• Geometric LOD is the most interesting & complex topic for games
• There are three main goals we try to achieve with geometric LOD:
  – Send less data to the GPU to avoid exceeding its throughput
  – Utilize less bus bandwidth moving data into the graphics unit
  – Try to achieve a constant average triangle size to balance load between vertex and pixel units

Compiled Models
• Most game engines are constructed to load “compiled” models
  – Vertex data is adjusted to match native format
  – Triangles are batched to minimize state changes and fit within hardware limits
  – Optimum strips are constructed
  – DisplayLists/Pushbuffers are compiled
• Compiled models are highly platform-specific

Basic LOD Choices
• Based on platform specifics, we select a simple half-edge collapse operation as the basis of our LOD
  – Minimizes memory use, vertex data remains unchanged
  – Minimizes dynamically changing vertex data, which minimizes bandwidth & FIFO space
  – Allows us to address problems with property discontinuities

Calculating LOD
• We perform all our LOD computation off-line during model compilation
  – We offer the artists a choice of LOD metric to use when computing automatic LOD levels
• We chose an LOD scheme that is based on half-edge collapse operations only
  – Less memory, more static data set
• The LOD is constructed based on edge score
  – Each edge in the model is given a score based on its length, curvature, or other factors
  – Vertices are also given scores to control which endpoint is preserved during the edge collapse

Calculating LOD
• We begin by building an augmented “collapse vertex” structure for the model
  – Links to neighbor verts (edges)
  – Links to associated faces
  – Link and score of “least cost” edge
  – Identification of “border” or “seam” verts
  – Links to “paired” verts
  – Links to the actual “render” vertices
• This process happens after vertices are split due to texture/normal/UV changes
  – This means one collapse vertex can be linked to multiple “export” vertices

Calculating LOD
• We add game-specific restrictions to LOD
  – Either adjust the vertex score, exempt it entirely, or link its removal to that of another vertex
  – Texture or UV mapping “seams” due to composited textures
  – Vertex normal discontinuities (hard edge)
  – Unpaired edges
  – Artist influence (blind vertex data in Maya)
• We also use domain-specific knowledge to adjust the scoring algorithm
  – Terrain blocks use z (height) differential as main score factor
  – Shadow/collision LOD ignores texture/UV seams

Calculating LOD
• Once we have a full set of edge scores, we select the least cost edge and remove its least cost vertex
  – Half-edge collapse to the higher-cost endpoint
  – Record the operation in fields in our underlying data
  – Remove degenerate triangles
  – Re-compute all edge costs in neighboring triangles
  – Repeat until only non-collapsible edges remain
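
The collapse loop above can be sketched as follows. This is a simplified illustration, not the engine's exporter: edge score is assumed to be plain edge length, vertex score is assumed to be the summed length of incident edges, all scores are recomputed every step rather than only for neighboring triangles, and `locked` is a hypothetical stand-in for exempted seam or border vertices.

```python
import math


def build_collapse_data(positions, triangles, locked=()):
    """Greedy half-edge collapse over a triangle list.

    Returns (collapse_order, collapse_to), one entry per vertex; vertices
    that are never removed keep None in both lists.
    """
    tris = [tuple(t) for t in triangles]
    locked = set(locked)
    collapse_order = [None] * len(positions)
    collapse_to = [None] * len(positions)
    step = 0
    while True:
        # Unique undirected edges of the triangles that are still alive.
        edges = {tuple(sorted((t[i], t[(i + 1) % 3]))) for t in tris for i in range(3)}

        def length(e):
            return math.dist(positions[e[0]], positions[e[1]])

        # Assumed vertex score: total length of incident edges.
        vert_score = {}
        for e in edges:
            for v in e:
                vert_score[v] = vert_score.get(v, 0.0) + length(e)

        # An edge is collapsible if at least one endpoint may be removed.
        candidates = [e for e in edges if not (e[0] in locked and e[1] in locked)]
        if not candidates:
            break
        a, b = min(candidates, key=lambda e: (length(e), e))
        removable = [v for v in (a, b) if v not in locked]
        victim = min(removable, key=lambda v: (vert_score[v], v))
        keeper = b if victim == a else a

        # Record the operation, then re-index and drop degenerate triangles.
        collapse_order[victim] = step
        collapse_to[victim] = keeper
        step += 1
        tris = [tuple(keeper if v == victim else v for v in t) for t in tris]
        tris = [t for t in tris if len(set(t)) == 3]
    return collapse_order, collapse_to


if __name__ == "__main__":
    quad = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(build_collapse_data(quad, [(0, 1, 2), (0, 2, 3)]))
```

On the unit quad, the two corner vertices off the diagonal are removed first and second, both collapsing onto vertex 0.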

Note on quality
• Our reduction and scoring system is simple, but accuracy suffers
  – Because of this, we have found that the last 10% or so of the collapse operations are judged by artists as being unsatisfactory
• We allow the export process to specify some control over the quality
  – Limit on the maximum cost collapse that will be executed (default excludes about 10% of operations)
  – Object-specific tweaks to the computed LOD factor

Calculating LOD
• The results of this operation are two new data fields in our renderable vertex structure
  – The “collapseOrder” field gives the ordering of the collapse operation
  – The “collapseTo” field is the destination vertex for the edge collapse operation that removes this vertex from the mesh
• Using these fields, we can export the LOD in various ways in the final compilation
• Since the LOD metrics are all export-side, we can adopt improvements periodically without affecting run-time data
  – Just re-export to get benefits of better reduction

Discrete LOD
• Discrete LOD is still the workhorse of game mesh LOD
  – Each level can undergo heavy pre-processing for strip-ordering or displaylist creation
  – Artists can hand-tune the reduction for visual accuracy
  – Can optionally replace both vertices and index lists, or just indices to save memory
• We represent discrete LOD by loading multiple sets of face index lists, or separate “index buffers”
  – Vertex data is unchanged

Exporting Discrete LOD
• We can use our computed data to export any number of discrete LOD steps
  – Pick a desired number of vertices for the LOD level
    • Calculate how many collapse operations will reach this level
  – Build an indexed ordering for the mesh
    • For any vertex with a “collapseOrder” value lower than the # of operations, replace its index with its “collapseTo” index
    • Repeat until a vertex is reached that has a higher collapseOrder field
• Process each index ordering for strips & cache coherency, create packets, etc.
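
The re-indexing step above can be sketched as below. The layout is an assumption: `collapse_order[v]` is the step at which vertex v is removed (None if it is never removed), and `collapse_to[v]` is the vertex it collapses onto. Strip building and cache optimization of the result are left out.

```python
def export_discrete_indices(indices, collapse_order, collapse_to, num_ops):
    """Return the triangle index list after the first num_ops collapses."""

    def resolve(v):
        # Follow collapseTo links until we reach a vertex that is still
        # present at this LOD level (collapsed later, or never collapsed).
        while collapse_order[v] is not None and collapse_order[v] < num_ops:
            v = collapse_to[v]
        return v

    out = []
    for i in range(0, len(indices), 3):
        tri = [resolve(v) for v in indices[i:i + 3]]
        if len(set(tri)) == 3:          # skip triangles that became degenerate
            out.extend(tri)
    return out


if __name__ == "__main__":
    # Hypothetical quad: vertex 1 collapses first, vertex 3 second, both onto 0.
    order = [None, 0, None, 1]
    to = [None, 0, None, 0]
    for ops in range(3):
        print(ops, export_discrete_indices([0, 1, 2, 0, 2, 3], order, to, ops))
```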

Discrete Blended LOD
• To minimize “popping” that occurs during the LOD switch, we can use image-space blending
  – When an object needs to change between discrete LOD levels, it is queued for blending
  – During blending, the object is actually rendered twice, at both LOD levels, and the alpha values are cross-faded
• In practice, we find this is useful for larger objects or terrain blocks, but not useful for typical models
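
A small sketch of the cross-fade described above, assuming a linear fade over a fixed duration. The function name and the duration parameter are hypothetical; the slides do not state the fade curve or length.

```python
def crossfade_alphas(elapsed, duration):
    """Return (old_level_alpha, new_level_alpha) for a linear cross-fade."""
    t = min(max(elapsed / duration, 0.0), 1.0)
    return 1.0 - t, t


if __name__ == "__main__":
    # While 0 < t < 1 the object is drawn twice, once per LOD level.
    for elapsed in (0.0, 0.25, 0.5):
        print(elapsed, crossfade_alphas(elapsed, 0.5))
```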

Continuous LOD
• Continuous LOD can be an effective extension to discrete-LOD for games
  – Reductions with greater granularity can avoid visible “popping”
  – It can also save memory compared to storing a high number of discrete levels
• Our continuous implementation is based mainly on half-edge collapse
  – This is the best way to keep our data static

CLOD Implementation
• To implement run-time CLOD, what we’re effectively doing is moving our off-line creation of discrete LOD index lists to the run-time engine
  – To save memory, we re-order vertices in order of their “collapseOrder” field
  – We export a separate parallel array to contain the “collapseTo” index for each vertex

CLOD Runtime
• At run-time, we select a desired number of vertices and repeat the recursive collapse process
  – Each index is replaced with its collapseTo until a value less than the desired size is reached
• For efficiency, we re-order our original index list in reverse-collapse order
  – This allows us to stop when the first degenerate triangle is detected during the collapse process
• The result is a new indexing of the mesh with the precise number of vertices requested
  – Result is cached in our model instance data
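
A sketch of the run-time re-indexing, under two assumptions about the exported data: vertices are sorted so that a vertex is present at a given level exactly when its index is below the requested vertex count (so `collapse_to[v] < v` for every removable vertex, and never-removed vertices point to themselves), and triangles are sorted so the longest-surviving ones come first.

```python
def clod_indices(indices, collapse_to, num_verts):
    """Rebuild the triangle index list for a mesh of num_verts vertices."""

    def resolve(v):
        # Walk collapseTo links until the index falls inside the live range.
        # The self-link check guards against asking for fewer vertices than
        # the mesh can collapse down to.
        while v >= num_verts and collapse_to[v] != v:
            v = collapse_to[v]
        return v

    out = []
    for i in range(0, len(indices), 3):
        a, b, c = (resolve(v) for v in indices[i:i + 3])
        if a == b or b == c or a == c:
            break                       # every later triangle is degenerate too
        out += [a, b, c]
    return out


if __name__ == "__main__":
    # Hypothetical quad after re-ordering: vertices 0 and 1 are never removed,
    # vertex 2 collapses onto 0, vertex 3 collapses onto 0 before that.
    collapse_to = [0, 1, 0, 0]
    indices = [0, 1, 2, 0, 3, 1]
    for n in (4, 3, 2):
        print(n, clod_indices(indices, collapse_to, n))
```

The early `break` is what the triangle ordering buys: the loop never visits triangles that have already collapsed at the requested level.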

CLOD Advantages
• This method maps moderately well to console needs
  – The vertex data remains static and indexable
  – Re-indexing can be cached over multiple frames to amortize costs
  – Minimal storage costs above cost of storing basic model data
    • 2 bytes per vert fixed-cost
    • Can actually be more memory-efficient than discrete LOD, but not by a lot

CLOD Disadvantages
• The biggest challenge with CLOD is to optimize the index ordering
  – Normally we perform intense, off-line strip generation to achieve this
  – With an index list that could change every frame, we aren’t able to spend time generating strips
  – We can still “compile” displaylists, etc. but at some additional cost
• Skip strips and similar techniques of partial-strip buffering can help address these concerns
  – Exploit the fact that most of the model remains unchanged after each step

Non-Geometric LOD

Vertex Shader LOD
• Vertex “shader” refers to the processing path required to set up each vertex in the scene
  – Newer PC and console hardware allows for extremely complex vertex operations including transformation, blending, and lighting
  – The throughput of the GPU in verts/sec varies by orders of magnitude depending on the processing required
    • Un-textured, un-lit = 30M V/s
    • Dual-texture, 4 Lights = 9M V/s

Lighting LOD
• One of the most costly parts of vertex processing is lighting calculation
  – Generally the cost increases linearly with the number of active lights
  – All games do basic operations like selecting the X brightest nearby lights for each mesh
    • The number of lights X can be increased/decreased based on LOD metrics
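
A sketch of LOD-driven light selection. Several things here are assumptions: lights are (name, position, intensity) tuples, brightness at the mesh falls off as intensity / (1 + distance squared), an LOD value of 1.0 means full detail, and the light budget scales linearly with it.

```python
import heapq


def select_lights(lights, mesh_pos, lod, max_lights=4):
    """Pick the brightest lights for a mesh; the count X shrinks with LOD."""
    count = max(0, min(max_lights, int(lod * max_lights)))

    def brightness(light):
        _, pos, intensity = light
        dist_sq = sum((p - m) ** 2 for p, m in zip(pos, mesh_pos))
        return intensity / (1.0 + dist_sq)      # hypothetical falloff model

    return heapq.nlargest(count, lights, key=brightness)


if __name__ == "__main__":
    lights = [("a", (0, 0, 0), 1.0), ("b", (10, 0, 0), 50.0), ("c", (1, 0, 0), 1.0)]
    print([l[0] for l in select_lights(lights, (0, 0, 0), 0.5)])
```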

Pre-lighting
• Because lighting is so expensive, a common optimization is to pre-calculate lights when possible
  – A non-moving (or rarely-moving) object can have the lighting contribution from all nearby, non-moving lights calculated offline & stored in a per-vertex color channel
    • As long as certain conditions hold, the object is rendered with a 0-light path
  – If additional moving lights come into range, the hardware allows us to add dynamic and pre-calculated colors
  – If the object moves, it can revert to real-time lighting

Lighting LOD
• At lower LOD levels, we can use simpler lighting equations
  – Use a static envmap (spherical or cubic) and normal-based texture projection to approximate diffuse lighting
  – Switch to purely ambient lighting or directional lighting at low LOD
• At lower LOD levels, shadow generation is reduced or disabled
  – Remove self-shadowing, remove accurate projected shadow volumes or textures

Projected Lighting
• A common technique in current games is to use texture projection to simulate complex lighting scenarios
  – Generally this requires an additional rendering pass on affected meshes
  – At lower LOD, we attempt to replace a projected light with a similar point or spotlight
    • Match color & size to approximate the texture effect
  – We also begin to exclude smaller objects from projection
    • Light will affect walls, but not characters

Vertex Shader LOD
• After lighting, the next most costly operation is skinning or blending the vertex
  – Can be performed by fixed-function matrix-palette blending, or programmable vertex shader
  – Our goal with LOD is to use the existing model data but to simplify the vertex processing math
• We create N versions of all active game vertex processing functions
  – All accept the same input data
  – Selection is driven at run-time by the shared “LOD Factor”
  – Essentially it’s discrete vertex LOD

Model Coordinate System
• We store vertex position and normal data in “model space”
  – This enables us to select between several types of vertex processing when needed
  – If we ignore all bone associations and render with a single transform, we get the “at-rest” model pose
  – If we store bone influences in sorted order, we can blend only against the first bone to get less-accurate skinning
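
The three processing paths above can be sketched on the CPU as follows. The three-level numbering, the 3x4 row-major matrix layout, and the function names are assumptions for illustration; on the target hardware this selection would pick between vertex shader variants rather than run in Python.

```python
def transform(m, p):
    """Apply a 3x4 row-major affine matrix to a 3D point."""
    return tuple(sum(m[r][c] * p[c] for c in range(3)) + m[r][3] for r in range(3))


def skin_vertex(pos, influences, bones, lod_level):
    """Skin one model-space vertex.

    influences: (bone_index, weight) pairs sorted by descending weight.
    lod_level 0: full weighted blend; 1: first (heaviest) bone only;
    2: ignore bones entirely, which yields the at-rest pose.
    """
    if lod_level >= 2:
        return tuple(pos)
    if lod_level == 1:
        return transform(bones[influences[0][0]], pos)
    out = [0.0, 0.0, 0.0]
    for bone, weight in influences:
        q = transform(bones[bone], pos)
        for i in range(3):
            out[i] += weight * q[i]
    return tuple(out)


if __name__ == "__main__":
    identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    shift_x = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]
    for level in range(3):
        print(level, skin_vertex((1, 0, 0), [(1, 0.75), (0, 0.25)], [identity, shift_x], level))
```

All three paths take the same inputs, which is what lets the selection be driven by the shared LOD value without touching the model data.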

Skeleton LOD
• The number of bones in a model skeleton can also affect performance
  – Our vertex shader offers a fixed number of matrices that can be loaded into hardware registers simultaneously
  – This limits the number of faces we can render before re-loading these registers (batch size)
• We can replace a vertex->bone binding with that bone’s parent to eliminate “leaf” bones
  – Their geometry will behave as if the removed bones are fused in their at-rest pose
  – This needs to be done off-line because it affects how we split the model into render groups
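
A sketch of the off-line leaf-bone removal, assuming one bone per vertex binding and a parent array with -1 marking the root. The function name and the `passes` parameter (how many layers of leaves to strip) are hypothetical.

```python
def remove_leaf_bones(parents, bindings, passes=1):
    """Rebind vertices on leaf bones to the leaf's parent.

    parents[b] is the parent of bone b, or -1 for the root.
    bindings[v] is the bone that vertex v is attached to.
    Returns (new_bindings, remaining_bones).
    """
    bindings = list(bindings)
    alive = set(range(len(parents)))
    for _ in range(passes):
        has_child = {parents[b] for b in alive if parents[b] != -1}
        leaves = {b for b in alive if b not in has_child and parents[b] != -1}
        bindings = [parents[b] if b in leaves else b for b in bindings]
        alive -= leaves
    return bindings, sorted(alive)


if __name__ == "__main__":
    # Hypothetical skeleton: 0 = root, 1 = child of 0, 2 and 3 = children of 1.
    print(remove_leaf_bones([-1, 0, 1, 1], [0, 1, 2, 3, 2]))
```

Fewer remaining bones means fewer matrices per batch, so the render groups would be rebuilt from the new bindings afterwards.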

Other Vertex LOD
• At lower LOD, we replace accurate reflected-normal vectors with camera-space normal vectors
  – Requires less CPU assistance on some platforms
• We can often reduce the accuracy of skinning/blending for normal vectors before we do the same for position vectors
  – Effects of inaccurate normals are far less obvious

Pixel Shader LOD
• Pixel shader LOD simply means having multiple implementations of each raster-level visual effect
  – Alternate versions would achieve a similar visual result with fewer render passes, texture stages, or texture fetches
  – Disabling multi-pass techniques is particularly effective because it benefits geometric LOD as well
  – Reducing texture stages or fetches increases pixel fill-rate
• Generally implemented simply as multiple code paths selectable according to LOD metrics
  – Light mapped walls can revert to vertex-lit
  – Bumpmaps, Envmaps are blended out

Imposters
• The most extreme form of geometric LOD is replacing a complex object with an imposter
  – The imposter can be a flat, textured quad
  – Or it can be a simple geometric shell
  – The goal is to approximate the shape & color of the original object at great distances
• Some game objects are always rendered as imposters
  – Particles, explosions, bullets, foliage

Billboard Imposter
• The billboard imposter replaces a complex shape with a flat textured quad
  – Can be rotated to face the camera in 1, 2 or 3 axes, depending on object symmetry
  – The texture can contain multiple frames to represent different angles or animation frames
  – The engine can blend between frames to improve fidelity, or use 3D volume textures to perform hardware blending
  – Typically billboard imposters use masked (1-bit alpha) texture images so the actual quad outline is not visible
  – “Z sprites” can provide imposters that z-buffer more accurately, particularly useful in clusters of objects

Dynamic Texture Imposter
• Render-to-texture is a common & reasonably efficient console pipeline
  – Non-dynamic texture imposters use valuable texture memory
  – Gives better simulation of animation, lighting, and movement of the replaced objects
• We allocate a pool of textures for dynamic imposters at startup and re-use them when necessary
  – A large crowd scene might re-use each imposter many times

Geometric Imposter
• A geometric imposter uses a rigid 3D model in place of a complex articulated 3D model
  – The “rigid mesh” vertex shader is usually several times faster than skinned/blended
  – The imposter can use simpler shaders, fewer textures, and larger render batches
  – Geometric imposters look better when viewed from multiple angles (object rotating or camera panning)
  – Can take up less memory than multi-frame texture imposters, and can render nearly as quickly

Terrain LOD
• Terrain LOD is often handled specially
  – Mainly because the terrain is very large compared to the viewer (player)
  – Our terrain is not stored as a heightfield, so we can do more arbitrary shapes
  – We break the terrain into separate blocks according to a 2D grid overlay

Terrain LOD
– Each block has discrete LOD levels pre-computed and compiled into display lists
– At run-time, an LOD factor is computed for each block
  • Based on distance, viewing angle, viewer height
– Vertices that lie along the boundaries between blocks are not subject to removal
  • This avoids opening gaps and allows each block to LOD independently
– Image-space blending can help hide switches

Image Processing Techniques
• Z-Fade
  – Gameplay elements that are only of player interest at close range can be alpha blended out at increasing z-distance
  – Powerups, small detail models, ground cover foliage, atmosphere objects, etc.
• Depth of Field effects
  – If the game utilizes a depth-of-field effect to blur distant objects, the game can use far more aggressive distance LOD schemes
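
The Z-Fade bullet above amounts to mapping view distance to an alpha value. A minimal sketch, assuming a linear ramp between two hypothetical per-object distances `fade_start` and `fade_end`:

```python
def z_fade_alpha(z, fade_start, fade_end):
    """Alpha for an object at view distance z: opaque up close, gone far away."""
    if z <= fade_start:
        return 1.0
    if z >= fade_end:
        return 0.0                      # fully faded; the draw can be skipped
    return (fade_end - z) / (fade_end - fade_start)


if __name__ == "__main__":
    for z in (10.0, 30.0, 50.0):
        print(z, z_fade_alpha(z, 20.0, 40.0))
```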

Non-Visual LOD
• Creating a special LOD geometry for shadow projection
  – Could use more aggressive methods beyond half-edge collapse to generate silhouettes
  – Because shadows don’t have texture/lighting concerns, we can be more aggressive in choosing algorithms
• Automatic Collision geometry
  – Currently we create collision geometry using simple volume shapes, or convex hull algorithms
  – More demanding games could use some of the volume-based LOD reductions to create better-fit collision geometry

Future Directions
• Subdivision & curved surfaces
  – If future platforms increase RAM sizes and are fast enough to render 1-tri-per-pixel, it’s unclear if subdiv is needed
    • However, artists are adopting this rapidly for cutscene work, so data-sharing is an appealing benefit
  – Subdivision with hardware support that was effectively “free” would definitely find an audience
    • Otherwise, we expect that next-generation projects will continue to encode more data into textures and use programmable shaders to simulate details

Future Directions
• Vertex processing hardware is becoming more general-purpose
  – Will allow more meaningful per-vertex processing for LOD schemes
  – Possibly more emphasis on view-dependent schemes

References
• Garland, Michael and Heckbert, Paul, "Surface Simplification Using Quadric Error Metrics". SIGGRAPH 97.
• Bischoff, "Towards Hardware Implementation of Loop Subdivision". Proceedings 2000 SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, August 2000.
• Brickhill, "Practical Implementation Techniques for Multi-Resolution Subdivision Surfaces". GDC Conference Proceedings, 2001.