LOD Case Study & Application
Robert Huebner, Nihilistic Software
Speaker Bio
• President and Director of Technology for Nihilistic Software
  – Currently working on “Starcraft: Ghost” for Blizzard Entertainment
  – Previous credits include Vampire: The Masquerade, Jedi Knight: Dark Forces 2, Descent
• International Game Developers Association (IGDA) Board Member
  – www.igda.org
• Game Developers Conference (GDC) Advisory Board
Purpose of Talk
• Review some of the topics and ideas presented earlier in the course
  – Try to explain what worked for us, and what didn’t
• This talk is a “case study in progress” for our current GameCube and Xbox work
  – Still tweaking and changing some LOD schemes
Starcraft: Ghost (needs LOD too!)
Goal of LOD
• Back on pre-3D-hardware PCs, we would spend a LOT of CPU to avoid drawing a few triangles
  – The cost of rendering was much higher
  – We were willing to spend significant CPU to eliminate a single triangle
    • Systems like ROAM, view-dependent LOD
• Current hardware renders fast, so we only spend CPU if we can discard a lot of triangles
  – Or if it saves us state changes, texture fetches, memory bandwidth, or other costly processing
General Block Diagram
[Figure: block diagram with components CPU, RAM, FIFO, GPU, Vertex Unit, Pixel Unit, Texture Mem, Frame buffer]
Data Flow Management
• Managing data flow and bandwidth is critical to performance
• Each platform has a different architecture
  – So our choice of LOD differs for each platform
• Each main data path can utilize different LOD techniques to increase throughput
  – We try to do this without wasting CPU or memory resources, which are also scarce
Where Do We Use LOD?
[Figure: the same block diagram, with components CPU, RAM, FIFO, GPU, Vertex Unit, Pixel Unit, Texture Mem, Framebuffer]
Classes of Game LOD
• The design of most console systems is dominated by three data paths:
  – The RAM->GPU path and GPU throughput are managed with geometric LOD
  – The GPU->Framebuffer path is managed via shader LOD
  – The Texture->GPU path is managed with MIP-mapping and shader LOD
Games Vs. Research
• The biggest problems we run into when adapting academic LOD systems to game use are:
  – Dealing with additional properties of meshes
    • Vertex normals, texture, UV coordinates, etc.
  – Avoiding the need for general-purpose processing at the vertex level
  – Maintaining data in a format that our hardware can process directly
Runtime Selection
• In our engine, all LOD processing for a given object is driven by a single value
  – The LOD value is stored both as a float (0.0 to 1.0) and as a discrete BYTE (1..X)
  – Each sub-system that wants to do LOD can use either version of the LOD metric to control behavior
Runtime Selection
• The LOD metric is stored for each object or “sector” (world section)
• Based on many factors (highest to lowest weight)
  – Estimated screen space (size / distance)
  – Overall performance or estimated triangle counts for scene (scene metric)
  – Current player control mode (interact or cutscene, combat or stealth)
  – “Importance” of the object (active AI vs. inactive AI)
  – Viewing angle for terrain blocks
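
A minimal sketch of how the weighted factors above might be folded into the float and byte forms of the metric. The talk gives only the ordering of the factors, so the weight values, the name `lod_metric`, the assumption that each factor arrives normalised to 0..1, and the convention that 1.0 (and byte level 1) means full detail are all illustrative guesses.

```python
# Hypothetical weights, ordered highest to lowest as on the slide.
WEIGHTS = {
    "screen_space": 0.40,   # estimated size / distance
    "scene_budget": 0.25,   # overall performance / triangle-count headroom
    "control_mode": 0.15,   # interactive vs. cutscene, combat vs. stealth
    "importance":   0.12,   # active AI vs. inactive AI
    "view_angle":   0.08,   # terrain blocks only
}

def lod_metric(factors, num_levels):
    """Return (float_lod, byte_lod).

    Assumes every factor is already normalised to 0..1, where 1.0 asks
    for full detail. Missing factors count as 0.0.
    """
    total = sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)
    lod = min(1.0, max(0.0, total))
    # Map 1.0 -> level 1 (most detailed), 0.0 -> level num_levels.
    level = 1 + min(num_levels - 1, int((1.0 - lod) * num_levels))
    return lod, level
```

Each sub-system can then read whichever form suits it: the float for continuous schemes, the byte for picking among discrete variants.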
Geometric LOD
• Geometric LOD is the most interesting & complex topic for games
• There are three main goals we try to achieve with geometric LOD:
  – Send less data to the GPU to avoid exceeding its throughput
  – Utilize less bus bandwidth moving data into the graphics unit
  – Try to achieve a constant average triangle size to balance load between vertex and pixel units
Compiled Models
• Most game engines are constructed to load “compiled” models
  – Vertex data is adjusted to match native format
  – Triangles are batched to minimize state changes and fit within hardware limits
  – Optimum strips are constructed
  – DisplayLists/Pushbuffers are compiled
• Compiled models are highly platform-specific
Basic LOD Choices
• Based on platform specifics, we select a simple half-edge collapse operation as the basis of our LOD
  – Minimizes memory use, vertex data remains unchanged
  – Minimizes dynamically changing vertex data, which minimizes bandwidth & FIFO space
  – Allows us to address problems with property discontinuities
Calculating LOD
• We perform all our LOD computation off-line during model compilation
  – We offer the artists a choice of LOD metric to use when computing automatic LOD levels
• We chose an LOD scheme that is based on half-edge collapse operations only
  – Less memory, more static data set
• The LOD is constructed based on edge score
  – Each edge in the model is given a score based on its length, curvature, or other factors
  – Vertices are also given scores to control which endpoint is preserved during the edge collapse
Calculating LOD
• We begin by building an augmented “collapse vertex” structure for the model
  – Links to neighbor verts (edges)
  – Links to associated faces
  – Link and score of “least cost” edge
  – Identification of “border” or “seam” verts
  – Links to “paired” verts
  – Links to the actual “render” vertices
• This process happens after vertices are split due to texture/normal/UV changes
  – This means one collapse vertex can be linked to multiple “export” vertices
Calculating LOD
• We add game-specific restrictions to LOD
  – Either adjust the vertex score, exempt it entirely, or link its removal to that of another vertex
  – Texture or UV mapping “seams” due to composited textures
  – Vertex normal discontinuities (hard edge)
  – Unpaired edges
  – Artist influence (blind vertex data in Maya)
• We also use domain-specific knowledge to adjust the scoring algorithm
  – Terrain blocks use z (height) differential as main score factor
  – Shadow/collision LOD ignores texture/UV seams
Calculating LOD
• Once we have a full set of edge scores, we select the least cost edge and remove its least cost vertex
  – Half-edge collapse to the higher-cost endpoint
  – Record the operation in fields in our underlying data
  – Remove degenerate triangles
  – Re-compute all edge costs in neighboring triangles
  – Repeat until only non-collapsible edges remain
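
The loop above can be sketched roughly as follows. This is a simplified illustration, not the exporter described in the talk: the edge cost is plain edge length, the vertex cost is just the number of faces touching the vertex, and all costs are recomputed from scratch on every pass instead of only in neighboring triangles. The name `build_collapse_data` and the `locked` parameter (standing in for exempted seam or border vertices) are assumptions.

```python
import math

def build_collapse_data(positions, triangles, locked=()):
    """Greedy half-edge collapse.

    Returns (collapse_order, collapse_to), with -1 for vertices that are
    never removed. Vertices listed in `locked` are never removed.
    """
    tris = [tuple(t) for t in triangles]
    n = len(positions)
    collapse_order = [-1] * n
    collapse_to = [-1] * n
    step = 0
    while True:
        # Stand-in vertex cost: how many remaining faces use the vertex.
        valence = [0] * n
        for t in tris:
            for v in t:
                valence[v] += 1
        # Gather every edge still present in the mesh.
        edges = set()
        for a, b, c in tris:
            for u, v in ((a, b), (b, c), (c, a)):
                edges.add((min(u, v), max(u, v)))
        candidates = []
        for u, v in edges:
            # Remove the lower-cost endpoint, keep the higher-cost one.
            remove, keep = (u, v) if valence[u] <= valence[v] else (v, u)
            if remove in locked:
                remove, keep = keep, remove
            if remove in locked:
                continue  # both endpoints exempt: edge is non-collapsible
            cost = math.dist(positions[u], positions[v])
            candidates.append((cost, remove, keep))
        if not candidates:
            break
        _, remove, keep = min(candidates)
        # Record the operation in the two per-vertex fields.
        collapse_order[remove] = step
        collapse_to[remove] = keep
        step += 1
        # Re-point faces at the surviving vertex, drop degenerate ones.
        tris = [tuple(keep if v == remove else v for v in t) for t in tris]
        tris = [t for t in tris if len(set(t)) == 3]
    return collapse_order, collapse_to
```

With no locked vertices this sketch keeps collapsing until no triangles remain; a real exporter would stop earlier, for example at a maximum collapse cost.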
Note on quality
• Our reduction and scoring system is simple, but accuracy suffers
  – Because of this, we have found that the last 10% or so of the collapse operations are judged by artists as being unsatisfactory
• We allow the export process to specify some control over the quality
  – Limit on the maximum cost collapse that will be executed (default excludes about 10% of operations)
  – Object-specific tweaks to the computed LOD factor
Calculating LOD
• The results of this operation are two new data fields in our renderable vertex structure
  – The “collapseOrder” field gives the ordering of the collapse operation
  – The “collapseTo” field is the destination vertex for the edge collapse operation that removes this vertex from the mesh
• Using these fields, we can export the LOD in various ways in the final compilation
• Since the LOD metrics are all export-side, we can adopt improvements periodically without affecting run-time data
  – Just re-export to get benefits of better reduction
Discrete LOD
• Discrete LOD is still the workhorse of game mesh LOD
  – Each level can undergo heavy pre-processing for strip-ordering or displaylist creation
  – Artists can hand-tune the reduction for visual accuracy
  – Can optionally replace both vertices and index lists, or just indices to save memory
• We represent discrete LOD by loading multiple sets of face index lists, or separate “index buffers”
  – Vertex data is unchanged
Exporting Discrete LOD
• We can use our computed data to export any number of discrete LOD steps
  – Pick a desired number of vertices for the LOD level
    • Calculate how many collapse operations will reach this level
  – Build an indexed ordering for the mesh
    • For any vertex with a “collapseOrder” value lower than the # of operations, replace its index with its “collapseTo” index
    • Repeat until a vertex is reached that has a higher collapseOrder field
• Process each index ordering for strips & cache coherency, create packets, etc.
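
The index-rewriting step can be sketched as below, assuming the per-vertex `collapse_order` and `collapse_to` arrays hold -1 for vertices that are never removed and that the mesh is a flat triangle list. The function name and the -1 convention are illustrative assumptions; strip generation and packet creation are left out.

```python
def discrete_lod_indices(indices, collapse_order, collapse_to, num_ops):
    """Return the triangle index list after the first `num_ops` collapses."""
    def resolve(v):
        # Follow collapseTo links while the vertex has already been removed
        # at this level (its collapseOrder is below the operation count).
        while 0 <= collapse_order[v] < num_ops:
            v = collapse_to[v]
        return v

    out = []
    for i in range(0, len(indices), 3):
        tri = [resolve(v) for v in indices[i:i + 3]]
        if len(set(tri)) == 3:      # skip triangles that became degenerate
            out.extend(tri)
    return out
```

Calling this once per desired level at export time yields one index buffer per discrete LOD, all sharing the same vertex data.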
Discrete Blended LOD
• To minimize “popping” that occurs during the LOD switch, we can use image-space blending
  – When an object needs to change between discrete LOD levels, it is queued for blending
  – During blending, the object is actually rendered twice, at both LOD levels, and the alpha values are cross-faded
• In practice, we find this is useful for larger objects or terrain blocks, but not useful for typical models
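
A small sketch of the cross-fade bookkeeping, assuming a linear fade over a fixed time. The class name `LodBlend`, the default duration, and the choice to ignore a request for the level already shown are assumptions made for illustration; the talk does not describe how the blend queue is implemented.

```python
class LodBlend:
    """Tracks one object's discrete LOD level and any fade in progress."""

    def __init__(self, level, duration=0.5):
        self.level = level          # level currently shown
        self.target = level         # level being faded in, if any
        self.t = 0.0                # fade progress, 0..1
        self.duration = duration    # assumed fade time in seconds

    def request(self, new_level):
        """Queue a blend towards `new_level` if it is a new target."""
        if new_level != self.level and new_level != self.target:
            self.target, self.t = new_level, 0.0

    def update(self, dt):
        """Advance the fade; return [(level, alpha), ...] to draw this frame."""
        if self.target == self.level:
            return [(self.level, 1.0)]
        self.t = min(1.0, self.t + dt / self.duration)
        if self.t >= 1.0:
            self.level = self.target
            return [(self.level, 1.0)]
        # Mid-fade: draw both levels with complementary alpha.
        return [(self.level, 1.0 - self.t), (self.target, self.t)]
```

The doubled draw cost during the fade is one plausible reason the technique pays off mainly for large objects and terrain blocks.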
Continuous LOD
• Continuous LOD can be an effective extension to discrete LOD for games
  – Reductions with greater granularity can avoid visible “popping”
  – It can also save memory compared to storing a high number of discrete levels
• Our continuous implementation is based mainly on half-edge collapse
  – This is the best way to keep our data static
CLOD Implementation
• To implement run-time CLOD, what we’re effectively doing is moving our off-line creation of discrete LOD index lists to the run-time engine
  – To save memory, we re-order vertices in order of their “collapseOrder” field
  – We export a separate parallel array to contain the “collapseTo” index for each vertex
CLOD Runtime
• At run-time, we select a desired number of vertices and repeat the recursive collapse process
  – Each index is replaced with its collapseTo until a value less than the desired size is reached
• For efficiency, we re-order our original index list in reverse-collapse order
  – This allows us to stop when the first degenerate triangle is detected during the collapse process
• The result is a new indexing of the mesh with the precise number of vertices requested
  – Result is cached in our model instance data
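
A sketch of the run-time re-indexing, under two stated assumptions about the exported data: vertices are sorted so that the first vertex to collapse has the highest index (so every `collapse_to[v]` is smaller than `v`), and triangles are sorted so that the first one to degenerate comes last. The function name `clod_indices` is hypothetical, and `target_verts` is assumed to be no smaller than the number of vertices that never collapse.

```python
def clod_indices(indices, collapse_to, target_verts):
    """Re-index a triangle list so it uses only vertices 0..target_verts-1."""
    def resolve(v):
        # Walk the collapse chain until we land on a vertex that is
        # still active at this vertex count.
        while v >= target_verts:
            v = collapse_to[v]
        return v

    out = []
    for i in range(0, len(indices), 3):
        a, b, c = (resolve(v) for v in indices[i:i + 3])
        if a == b or b == c or a == c:
            break   # reverse-collapse order: all later triangles are degenerate too
        out += [a, b, c]
    return out
```

The returned list is what would be cached in the model instance until the requested vertex count changes.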
CLOD Advantages
• This method maps moderately well to console needs
  – The vertex data remains static and indexable
  – Re-indexing can be cached over multiple frames to amortize costs
  – Minimal storage costs above cost of storing basic model data
    • 2 bytes per vert fixed-cost
    • Can actually be more memory-efficient than discrete LOD, but not by a lot
CLOD Disadvantages
• The biggest challenge with CLOD is to optimize the index ordering
  – Normally we perform intense, off-line strip generation to achieve this
  – With an index list that could change every frame, we aren’t able to spend time generating strips
  – We can still “compile” displaylists, etc. but at some additional cost
• Skip strips and similar techniques of partial-strip buffering can help address these concerns
  – Exploit the fact that most of the model remains unchanged after each step
Non-Geometric LOD
Vertex Shader LOD
• Vertex “shader” refers to the processing path required to set up each vertex in the scene
  – Newer PC and console hardware allows for extremely complex vertex operations including transformation, blending, and lighting
  – The throughput of the GPU in verts/sec varies by orders of magnitude depending on the processing required
    • Un-textured, un-lit = 30M V/s
    • Dual-texture, 4 Lights = 9M V/s
Lighting LOD
• One of the most costly parts of vertex processing is lighting calculation
  – Generally the cost increases linearly with the number of active lights
  – All games do basic operations like selecting the X brightest nearby lights for each mesh
    • The number of lights X can be increased/decreased based on LOD metrics
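
A sketch of scaling the light count X by the LOD value. The inverse-square brightness estimate, the dictionary layout of a light, the name `select_lights`, and the convention that a LOD of 1.0 means full detail are assumptions for illustration only.

```python
import heapq

def select_lights(lights, mesh_pos, lod, max_lights=4):
    """Pick the brightest lights at `mesh_pos`.

    `lod` is a 0..1 value (1.0 = full detail) that scales how many of
    the `max_lights` slots are actually used.
    """
    budget = max(0, min(max_lights, int(lod * max_lights)))

    def brightness(light):
        d2 = sum((a - b) ** 2 for a, b in zip(light["pos"], mesh_pos))
        return light["intensity"] / (1.0 + d2)   # simple distance falloff

    return heapq.nlargest(budget, lights, key=brightness)
```

Because lighting cost grows roughly linearly with the number of active lights, halving the budget at mid LOD roughly halves that part of the vertex cost.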
Pre-lighting
• Because lighting is so expensive, a common optimization is to pre-calculate lights when possible
  – A non-moving (or rarely-moving) object can have the lighting contribution from all nearby, non-moving lights calculated offline & stored in a per-vertex color channel
• As long as certain conditions hold, the object is rendered with a 0-light path
  – If additional moving lights come into range, the hardware allows us to add the dynamic and pre-calculated colors together
  – If the object moves, it can revert to real-time lighting
Lighting LOD
• At lower LOD levels, we can use simpler lighting equations
  – Use a static envmap (spherical or cubic) and normal-based texture projection to approximate diffuse lighting
  – Switch to purely ambient lighting or directional lighting at low LOD
• At lower LOD levels, shadow generation is reduced or disabled
  – Remove self-shadowing, remove accurate projected shadow volumes or textures
Projected Lighting
• A common technique in current games is to use texture projection to simulate complex lighting scenarios
  – Generally this requires an additional rendering pass on affected meshes
  – At lower LOD, we attempt to replace a projected light with a similar point or spotlight
    • Match color & size to approximate the texture effect
  – We also begin to exclude smaller objects from projection
    • Light will affect walls, but not characters
Vertex Shader LOD
• After lighting, the next most costly operation is skinning or blending the vertex
  – Can be performed by fixed-function matrix-palette blending, or programmable vertex shader
  – Our goal with LOD is to use the existing model data but to simplify the vertex processing math
• We create N versions of all active game vertex processing functions
  – All accept the same input data
  – Selection is driven at run-time by the shared “LOD Factor”
  – Essentially it’s discrete vertex LOD
Model Coordinate System
• We store vertex position and normal data in “model space”
  – This enables us to select between several types of vertex processing when needed
  – If we ignore all bone associations and render with a single transform, we get the “at-rest” model pose
  – If we store bone influences in sorted order, we can blend only against the first bone to get less-accurate skinning
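
The three processing paths can be sketched on the CPU as follows. The level numbering (0 = full blend, 1 = strongest bone only, 2 = single transform), the 3x4 row-major matrix layout, and the assumption that each bone matrix maps a model-space rest position directly to its posed position are illustrative choices, not details from the talk.

```python
def transform(m, p):
    """Apply a 3x4 row-major matrix to a 3D point."""
    return tuple(r[0] * p[0] + r[1] * p[1] + r[2] * p[2] + r[3] for r in m)

def skin_vertex(pos, influences, bones, model_matrix, level):
    """Skin one model-space vertex.

    influences: [(bone_index, weight), ...] sorted by descending weight.
    level: 0 = full blend, 1 = first bone only, 2 = at-rest pose.
    """
    if level >= 2 or not influences:
        return transform(model_matrix, pos)             # one transform, at-rest pose
    if level == 1:
        return transform(bones[influences[0][0]], pos)  # strongest bone only
    out = [0.0, 0.0, 0.0]
    for bone, weight in influences:
        p = transform(bones[bone], pos)
        for k in range(3):
            out[k] += weight * p[k]
    return tuple(out)
```

All three paths read the same vertex data, which is what lets the selection be made per object at run time.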
Skeleton LOD
• The number of bones in a model skeleton can also affect performance
  – Our vertex shader offers a fixed number of matrices that can be loaded into hardware registers simultaneously
  – This limits the number of faces we can render before re-loading these registers (batch size)
• We can replace a vertex->bone binding with that bone’s parent to eliminate “leaf” bones
  – Their geometry will behave as if the removed bones are fused in their at-rest pose
  – This needs to be done off-line because it affects how we split the model into render groups
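
An off-line rebinding pass along these lines might look like the sketch below. The function name, the `parent` mapping, and the decision to merge weights that land on the same surviving bone are assumptions; the root bone is assumed never to be in `removed`.

```python
def remap_influences(influences, parent, removed):
    """Rebind influences from removed bones to their nearest kept ancestor.

    influences: [(bone, weight), ...] for one vertex.
    parent:     mapping bone -> parent bone.
    removed:    set of bones being eliminated at this skeleton LOD.
    Returns the merged influence list, sorted by descending weight.
    """
    merged = {}
    for bone, weight in influences:
        while bone in removed:
            bone = parent[bone]     # climb until a surviving bone is found
        merged[bone] = merged.get(bone, 0.0) + weight
    return sorted(merged.items(), key=lambda bw: -bw[1])
```

Fewer distinct bones per mesh section means fewer matrix reloads, so the render groups would be re-split after this pass.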
Other Vertex LOD
• At lower LOD, we replace accurate reflected-normal vectors with camera-space normal vectors
  – Requires less CPU assistance on some platforms
• We can often reduce the accuracy of skinning/blending for normal vectors before we do the same for position vectors
  – Effects of inaccurate normals are far less obvious
Pixel Shader LOD
• Pixel shader LOD simply means having multiple implementations of each raster-level visual effect
  – Alternate versions would achieve a similar visual result with fewer render passes, texture stages, or texture fetches
  – Disabling multi-pass techniques is particularly effective because it benefits geometric LOD as well
  – Reducing texture stages or fetches increases pixel fill-rate
• Generally implemented simply as multiple code paths selectable according to LOD metrics
  – Light mapped walls can revert to vertex-lit
  – Bumpmaps, Envmaps are blended out
Imposters
• The most extreme form of geometric LOD is replacing a complex object with an imposter
  – The imposter can be a flat, textured quad
  – Or it can be a simple geometric shell
  – The goal is to approximate the shape & color of the original object at great distances
• Some game objects are always rendered as imposters
  – Particles, explosions, bullets, foliage
Billboard Imposter
• The billboard imposter replaces a complex shape with a flat textured quad
  – Can be rotated to face the camera in 1, 2 or 3 axes, depending on object symmetry
  – The texture can contain multiple frames to represent different angles or animation frames
  – The engine can blend between frames to improve fidelity, or use 3D volume textures to perform hardware blending
  – Typically billboard imposters use masked (1-bit alpha) texture images so the actual quad outline is not visible
  – “Z sprites” can provide imposters that z-buffer more accurately, particularly useful in clusters of objects
Dynamic Texture Imposter
• Render-to-texture is a common & reasonably efficient console pipeline
  – Non-dynamic texture imposters use valuable texture memory
  – Gives better simulation of animation, lighting, and movement of the replaced objects
• We allocate a pool of textures for dynamic imposters at startup and re-use them when necessary
  – A large crowd scene might re-use each imposter many times
Geometric Imposter
• A geometric imposter uses a rigid 3D model in place of a complex articulated 3D model
  – The “rigid mesh” vertex shader is usually several times faster than skinned/blended
  – The imposter can use simpler shaders, fewer textures, and larger render batches
  – Geometric imposters look better when viewed from multiple angles (object rotating or camera panning)
  – Can take up less memory than multi-frame texture imposters, and can render nearly as quickly
Terrain LOD
• Terrain LOD is often handled specially
  – Mainly because the terrain is very large compared to the viewer (player)
  – Our terrain is not stored as a heightfield, so we can do more arbitrary shapes
  – We break the terrain into separate blocks according to a 2D grid overlay
Terrain LOD
  – Each block has discrete LOD levels pre-computed and compiled into display lists
  – At run-time, an LOD factor is computed for each block
    • Based on distance, viewing angle, viewer height
  – Vertices that lie along the boundaries between blocks are not subject to removal
    • This avoids opening gaps and allows each block to LOD independently
  – Image-space blending can help hide switches
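
One way the boundary restriction could be applied at export time is to mark the vertices sitting on a block's 2D grid edge as exempt from collapse. The sketch assumes axis-aligned blocks in the x/y plane with z as height, and the function name and tolerance `eps` are hypothetical.

```python
def boundary_vertices(positions, block_min, block_max, eps=1e-6):
    """Return indices of vertices lying on the 2D edge of a terrain block.

    positions: [(x, y, z), ...] with z as height.
    block_min, block_max: (x, y) corners of the block on the grid.
    """
    locked = set()
    for i, (x, y, _z) in enumerate(positions):
        on_x = abs(x - block_min[0]) < eps or abs(x - block_max[0]) < eps
        on_y = abs(y - block_min[1]) < eps or abs(y - block_max[1]) < eps
        if on_x or on_y:
            locked.add(i)
    return locked
```

Since neighboring blocks share exactly these vertices and never remove them, each block can switch levels without checking its neighbors.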
Image Processing Techniques
• Z-Fade
  – Gameplay elements that are only of player interest at close range can be alpha blended out at increasing z-distance
  – Powerups, small detail models, ground cover foliage, atmosphere objects, etc.
• Depth of Field effects
  – If the game utilizes a depth-of-field effect to blur distant objects, the game can use far more aggressive distance LOD schemes
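
For the Z-Fade item, a linear ramp is one simple way to turn z-distance into alpha. The parameter names `fade_start` and `fade_end` and the linear shape are assumptions; the talk does not specify the curve.

```python
def z_fade_alpha(z, fade_start, fade_end):
    """Alpha for an object at view distance z.

    Fully opaque up to fade_start, fully transparent (and skippable)
    from fade_end onward, linear in between.
    """
    if z <= fade_start:
        return 1.0
    if z >= fade_end:
        return 0.0
    return (fade_end - z) / (fade_end - fade_start)
```

An object whose alpha reaches 0.0 can be dropped from the draw list entirely, which is where the saving comes from.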
Non-Visual LOD
• Creating a special LOD geometry for shadow projection
  – Could use more aggressive methods beyond half-edge collapse to generate silhouettes
  – Because shadows don’t have texture/lighting concerns, we can be more aggressive in choosing algorithms
• Automatic collision geometry
  – Currently we create collision geometry using simple volume shapes, or convex hull algorithms
  – More demanding games could use some of the volume-based LOD reductions to create better-fit collision geometry
Future Directions
• Subdivision & curved surfaces
  – If future platforms increase RAM sizes and are fast enough to render 1-tri-per-pixel, it’s unclear if subdiv is needed
    • However, artists are adopting this rapidly for cutscene work, so data-sharing is an appealing benefit
  – Subdivision with hardware support that was effectively “free” would definitely find an audience
    • Otherwise, we expect that next-generation projects will continue to encode more data into textures and use programmable shaders to simulate details
Future Directions
• Vertex processing hardware is becoming more general-purpose
  – Will allow more meaningful per-vertex processing for LOD schemes
  – Possibly more emphasis on view-dependent schemes
References
• Michael Garland and Paul Heckbert, "Surface Simplification Using Quadric Error Metrics", SIGGRAPH 97
• Bischoff, "Towards Hardware Implementation of Loop Subdivision", Proceedings 2000 SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, August 2000
• Brickhill, "Practical Implementation Techniques for Multi-Resolution Subdivision Surfaces", GDC Conference Proceedings, 2001