VAR/Fence: Using NV_vertex_array_range and NV_fence
Cass Everitt
Overview
• What is NV_vertex_array_range?
• Fast variation of vertex arrays
• GPU pulls data asynchronously
• What is NV_fence?
• Better synchronization
• How to use VAR/fence together
• Performance hints
What is NV_vertex_array_range (a.k.a. VAR) ?
• Standard vertex arrays really only reduce function call overhead
• Other optimizations difficult because
• Coherency model too strict
• Driver must copy array data before glEnd() returns
• Same for glDrawElements() and glDrawArrays()
• Memory range unbounded
• Any client memory can be used
What is NV_vertex_array_range (a.k.a. VAR) ? (2)
• Compiled vertex arrays improve this somewhat
• Relaxes coherency requirements
• Lock/Unlock semantics
• More room to optimize
• Usually requires lots of redundant copying
• App could do better memory management
• Introduces index bounds
• But not explicit memory bounds
• For multipass rendering
• Can re-use transformed vertices (software T&L)
• Can put data in AGP/video memory (hardware T&L)
What is NV_vertex_array_range (a.k.a. VAR) ? (3)
• VAR allows the GPU to pull vertex data (via DMA)
• Coherency model completely relaxed
• No constraints on when data must be fetched
• Introduces synchronization issues
• VAR memory must be specially allocated
• Requires AGP or video memory
• Special wgl and glx entry points
• VAR memory range must be contiguous
• This version of “lock” takes a pointer and a size
• Typically a single, large arena is allocated
• Greater application burden of memory management, but with potentially big payoff
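The memory-management burden mentioned above can be illustrated with a minimal sketch of sub-allocating from one contiguous arena. Everything named here (VarArena, arena_init, arena_alloc, arena_reset, arena_destroy) is hypothetical and not part of the extension. malloc() stands in for the special allocation entry point so the sketch runs without a GL context; a real application would obtain the arena from the wgl/glx allocator and register it as the vertex array range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical arena: one contiguous block, handed out front to back. */
typedef struct {
    unsigned char *base;  /* start of the contiguous range */
    size_t size;          /* total bytes in the range */
    size_t used;          /* bytes handed out so far */
} VarArena;

/* malloc() is a stand-in (assumption): a real app would get this
   pointer from the special wgl/glx allocation entry point. */
static int arena_init(VarArena *a, size_t size) {
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL;
}

/* Returns a sub-range whose offset from the arena base is a multiple
   of `align` (a power of two), or NULL when the arena is exhausted. */
static void *arena_alloc(VarArena *a, size_t bytes, size_t align) {
    size_t start;
    assert(align != 0 && (align & (align - 1)) == 0);
    start = (a->used + align - 1) & ~(align - 1);
    if (start > a->size || bytes > a->size - start)
        return NULL;
    a->used = start + bytes;
    return a->base + start;
}

/* Makes the whole arena available again. Only safe once the GPU has
   finished reading everything that was allocated from it. */
static void arena_reset(VarArena *a) {
    a->used = 0;
}

static void arena_destroy(VarArena *a) {
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

A bump allocator like this suits static data that is written once and drawn many times. It has no way to free an individual array, which is one reason dynamic data needs a different reuse strategy.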
What is NV_vertex_array_range (a.k.a. VAR) ? (4)
• VAR gives the application total control!
• App decides how best to use limited resources
• Avoids continually copying static arrays (e.g. skin texture coords)
• The power of display lists – with mutability!
What VAR (alone) does not do well…
• VAR adds the ability to have really fast vertex arrays, but
• VAR memory is in limited supply, and
• VAR does not provide an efficient way to re-use memory
• Calling glFinish() or glFlushVertexArrayRangeNV() is too heavy-handed
What is NV_fence?
• NV_fence provides fine-grained synchronization
• A “fence” is a probe that can be placed into the OpenGL command stream
• Each fence has a condition that can be tested
• GL_ALL_COMPLETED_NV is currently the only condition
• glFinishFenceNV() allows an app to wait until a specific fence’s condition is satisfied
• Very fine application-level control
How to use VAR/fence together
• The combination of VAR and fences is very powerful
• VAR gives the best possible T&L performance
• With fences, apps can achieve very efficient pipelining
• In memory-limited situations, VAR memory must be reused
• Fences can be placed in the command stream to determine when memory can be reclaimed
• Different strategies can be used for dynamic/static data
• App chooses management mechanism
How to use VAR/fence together (2)
• The learning_VAR demo uses multiple buffers (within a single arena) to achieve high T&L throughput on completely dynamic geometry!
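One way to picture the multiple-buffer idea is the toy simulation below. It is not the learning_VAR source and makes no GL calls: SimGpu, SimBuffer, BufferRing and the sim_/ring_ functions are hypothetical stand-ins, with comments marking where a real application would presumably call glSetFenceNV(), glTestFenceNV() and glFinishFenceNV(). The sketch assumes a fence is passed once the simulated GPU has completed every command issued before it, mirroring GL_ALL_COMPLETED_NV.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUFFERS 4

/* Toy stand-in for the GL command stream (assumption, not a real API). */
typedef struct {
    unsigned submitted;  /* commands issued by the CPU */
    unsigned completed;  /* commands the simulated GPU has finished */
} SimGpu;

typedef struct {
    unsigned fence;  /* value of `submitted` when the fence was set */
    int in_flight;   /* nonzero once a draw from this buffer was issued */
} SimBuffer;

typedef struct {
    SimBuffer buf[NUM_BUFFERS];
    unsigned next;    /* buffer to fill next */
    unsigned stalls;  /* number of times the CPU had to wait */
} BufferRing;

/* Plays the role of glTestFenceNV(): has the GPU passed the fence? */
static int sim_test_fence(const SimGpu *g, const SimBuffer *b) {
    return g->completed >= b->fence;
}

/* Plays the role of glFinishFenceNV(): wait until the fence is passed. */
static void sim_finish_fence(SimGpu *g, const SimBuffer *b) {
    if (g->completed < b->fence)
        g->completed = b->fence;
}

/* Picks the next buffer to write, waiting only if the GPU may still
   be reading it. Returns the buffer index. */
static unsigned ring_acquire(BufferRing *r, SimGpu *g) {
    unsigned i = r->next;
    SimBuffer *b = &r->buf[i];
    if (b->in_flight && !sim_test_fence(g, b)) {
        r->stalls++;
        sim_finish_fence(g, b);
    }
    b->in_flight = 0;
    r->next = (i + 1) % NUM_BUFFERS;
    return i;
}

/* Issues the draw that reads buffer i, then places a fence behind it
   (where a real app would call glSetFenceNV). */
static void ring_submit(BufferRing *r, SimGpu *g, unsigned i) {
    g->submitted++;                  /* the draw call */
    r->buf[i].fence = g->submitted;  /* fence right after the draw */
    r->buf[i].in_flight = 1;
}
```

With four buffers the CPU can fill three while the GPU drains one, so it waits only when it laps the GPU. The stall counter gives a rough way to judge whether the number of buffers is large enough.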
Performance hints
• If T&L is not a bottleneck, VAR won’t help
• VAR helps get geometry to the GPU
• If your app is fill-bound, consider
• Adding more geometry – it probably won’t cost anything
• Make sure to use multi-texture and NV_register_combiners to reduce your memory bandwidth requirements
• Dynamic geometry requires CPU tuning work
• Float->int casts can be a huge bottleneck
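The slides do not say how to avoid the float-to-int cast cost. One widely known workaround, shown here only as an illustration, is to add a large constant so that the integer lands in the low mantissa bits of a double. The function name round_to_int is hypothetical. The sketch assumes IEEE-754 doubles, the default round-to-nearest mode, and inputs that fit in 32 bits. Unlike a C cast, it rounds to nearest rather than truncating. Whether it is actually faster depends on the compiler and CPU, so it should be measured.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rounds x to the nearest integer (ties to even) without a cast.
   Assumes IEEE-754 doubles and the default rounding mode. */
static int32_t round_to_int(double x) {
    const double magic = 6755399441055744.0;  /* 2^52 + 2^51 */
    double shifted;
    uint64_t bits;

    assert(x > -2147483648.0 && x < 2147483647.0);

    /* After the add, the rounded integer sits in the low mantissa bits. */
    shifted = x + magic;
    memcpy(&bits, &shifted, sizeof bits);

    /* Keep the low 32 bits; negatives come out in two's complement. */
    return (int32_t)(uint32_t)bits;
}
```

For example, round_to_int(3.7) yields 4, where the plain cast (int)3.7 yields 3, so any code that depends on truncation needs an adjustment before using this.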
Performance hints (2)
• Effective memory management and synchronization are key
• Avoid redundant copies
• Do this on a per-array basis, not per-object
• Be clever in your use of memory
• Use fences to keep both CPU and GPU working
Questions, comments, feedback?
• Cass Everitt, [email protected]
• www.nvidia.com/developer
• We try to answer questions at OpenGL.org’s “advanced” OpenGL forum