Performance OpenGL Platform Independent Techniques Dave Shreiner Brad Grantham Dave Shreiner Brad...
-
Upload
hector-sherman -
Category
Documents
-
view
234 -
download
2
Transcript of Performance OpenGL Platform Independent Techniques Dave Shreiner Brad Grantham Dave Shreiner Brad...
Performance OpenGLPerformance OpenGLPlatform Independent TechniquesPlatform Independent Techniques
Performance OpenGLPerformance OpenGLPlatform Independent TechniquesPlatform Independent Techniques
Dave ShreinerDave Shreiner
Brad GranthamBrad GranthamDave ShreinerDave Shreiner
Brad GranthamBrad Grantham
Performance OpenGL 3
What You’ll See Today What You’ll See Today ……What You’ll See Today What You’ll See Today ……
•An in-depth look at the OpenGL pipeline from a performance perspective
•Techniques for determining where OpenGL application performance bottlenecks are
•A bunch of simple, good habits for OpenGL applications
•An in-depth look at the OpenGL pipeline from a performance perspective
•Techniques for determining where OpenGL application performance bottlenecks are
•A bunch of simple, good habits for OpenGL applications
Performance OpenGL 4
Performance Tuning Performance Tuning AssumptionsAssumptionsPerformance Tuning Performance Tuning AssumptionsAssumptions
•You’re trying to tune an interactive OpenGL application
•There’s an established metric for estimating the application’s performance– Consistent frames/second
– Number of pixels or primitives to be rendered per frame
•You can change the application’s source code
•You’re trying to tune an interactive OpenGL application
•There’s an established metric for estimating the application’s performance– Consistent frames/second
– Number of pixels or primitives to be rendered per frame
•You can change the application’s source code
Performance OpenGL 5
Errors Skew Errors Skew Performance Performance MeasurementsMeasurements
Errors Skew Errors Skew Performance Performance MeasurementsMeasurementsOpenGL Reports Errors OpenGL Reports Errors
AsynchronouslyAsynchronously•OpenGL doesn’t tell you when something
goes wrong– Need to use glGetError() to determine if something
went wrong
– Calls with erroneous parameters will silently set error state and return without completing
OpenGL Reports Errors OpenGL Reports Errors AsynchronouslyAsynchronously•OpenGL doesn’t tell you when something
goes wrong– Need to use glGetError() to determine if something
went wrong
– Calls with erroneous parameters will silently set error state and return without completing
Performance OpenGL 6
Checking a single Checking a single commandcommandChecking a single Checking a single commandcommand
Simple MacroSimple Macro
• Some limitations on where the macro can be used– can’t use inside of glBegin() / glEnd() pair
Simple MacroSimple Macro
• Some limitations on where the macro can be used– can’t use inside of glBegin() / glEnd() pair
#define CHECK_OPENGL_ERROR( cmd ) \ cmd; \ { GLenum error; \ while ( (error = glGetError()) != GL_NO_ERROR) { \ printf( "[%s:%d] '%s' failed with error %s\n", \
__FILE__, __LINE__, #cmd, \ gluErrorString(error) ); \ }
Performance OpenGL 7
The OpenGL PipelineThe OpenGL Pipeline(The Macroscopic View)(The Macroscopic View)The OpenGL PipelineThe OpenGL Pipeline(The Macroscopic View)(The Macroscopic View)
Ap
plic
ati
on
Tra
nsf
orm
ati
on
Rast
eri
zati
on
Fram
eb
uff
er
Performance OpenGL 8
Performance Performance BottlenecksBottlenecksPerformance Performance BottlenecksBottlenecks
BottlenecksBottlenecks are the performance are the performance limiting part of the applicationlimiting part of the application•Application bottleneck
– Application may not pass data fast enough to the OpenGL pipeline
•Transform-limited bottleneck– OpenGL may not be able to process vertex
transformations fast enough
BottlenecksBottlenecks are the performance are the performance limiting part of the applicationlimiting part of the application•Application bottleneck
– Application may not pass data fast enough to the OpenGL pipeline
•Transform-limited bottleneck– OpenGL may not be able to process vertex
transformations fast enough
Performance OpenGL 9
Performance Performance Bottlenecks (cont.)Bottlenecks (cont.)Performance Performance Bottlenecks (cont.)Bottlenecks (cont.)
•Fill-limited bottleneck– OpenGL may not be able to rasterize primitives fast
enough
•Fill-limited bottleneck– OpenGL may not be able to rasterize primitives fast
enough
Performance OpenGL 10
There Will Always Be A There Will Always Be A BottleneckBottleneckThere Will Always Be A There Will Always Be A BottleneckBottleneck
Some portion of the application will Some portion of the application will always be the limiting factor to always be the limiting factor to performanceperformance• If the application performs to expectations, then the
bottleneck isn’t a problem
• Otherwise, need to be able to identify which part of the application is the bottleneck
• We’ll work backwards through the OpenGL pipeline in resolving bottlenecks
Some portion of the application will Some portion of the application will always be the limiting factor to always be the limiting factor to performanceperformance• If the application performs to expectations, then the
bottleneck isn’t a problem
• Otherwise, need to be able to identify which part of the application is the bottleneck
• We’ll work backwards through the OpenGL pipeline in resolving bottlenecks
Performance OpenGL 11
Fill-limited BottlenecksFill-limited BottlenecksFill-limited BottlenecksFill-limited Bottlenecks
System cannot fill all the pixels System cannot fill all the pixels required in the allotted timerequired in the allotted time•Easiest bottleneck to test
•Reduce number of pixels application must fill– Make the viewport smaller
System cannot fill all the pixels System cannot fill all the pixels required in the allotted timerequired in the allotted time•Easiest bottleneck to test
•Reduce number of pixels application must fill– Make the viewport smaller
Performance OpenGL 12
Reducing Fill-limited Reducing Fill-limited BottlenecksBottlenecksReducing Fill-limited Reducing Fill-limited BottlenecksBottlenecks
The Easy FixesThe Easy Fixes•Make the viewport smaller
– This may not be an acceptable solution, but it’s easy
•Reduce the frame-rate
The Easy FixesThe Easy Fixes•Make the viewport smaller
– This may not be an acceptable solution, but it’s easy
•Reduce the frame-rate
framepixels
secondframes
secondpixels
M3.1306
M800frame
pixels
secondframes
secondpixels
M7.1075
M800
Performance OpenGL 13
A Closer Look at A Closer Look at OpenGL’s OpenGL’s Rasterization PipelineRasterization Pipeline
A Closer Look at A Closer Look at OpenGL’s OpenGL’s Rasterization PipelineRasterization Pipeline
TextureMappingEngine
ToFragment Tests
PointRasterization
LineRasterization
TriangleRasterization
Pixel Rectangle
Rasterization
BitmapRasterization
FogEngine
Color Sum(Sep.
Specular Color)
Performance OpenGL 14
Reducing Fill-limited Reducing Fill-limited Bottlenecks (cont.)Bottlenecks (cont.)Reducing Fill-limited Reducing Fill-limited Bottlenecks (cont.)Bottlenecks (cont.)
Rasterization PipelineRasterization Pipeline• Cull back facing
polygons
– Does require all primitives have same facediness
• Use per-vertex fog, as compared to per-pixel
Rasterization PipelineRasterization Pipeline• Cull back facing
polygons
– Does require all primitives have same facediness
• Use per-vertex fog, as compared to per-pixel
TextureMappingEngine
ToFragment Tests
PointRasterization
LineRasterization
TriangleRasterization
Pixel Rectangle
Rasterization
BitmapRasterization
FogEngine
Color Sum
Performance OpenGL 15
A Closer Look at A Closer Look at OpenGL’s OpenGL’s Rasterization Pipeline Rasterization Pipeline (cont.)(cont.)
A Closer Look at A Closer Look at OpenGL’s OpenGL’s Rasterization Pipeline Rasterization Pipeline (cont.)(cont.)
Fragments(from previous
stages)
PixelOwnership
Test
ScissorTest
AlphaTest
StencilTest
DepthBufferTest
Blending DitheringLogical
Operations
Fram
eb
uff
er
Performance OpenGL 16
Reducing Fill-limited Reducing Fill-limited Bottlenecks (cont.)Bottlenecks (cont.)Reducing Fill-limited Reducing Fill-limited Bottlenecks (cont.)Bottlenecks (cont.)
Fragment PipelineFragment Pipeline•Do less work per pixel
– Disable dithering
– Depth-sort primitives to reduce depth testing
– Use alpha test to reject transparent fragments saves doing a pixel read-back from the framebuffer in the
blending phase
Fragment PipelineFragment Pipeline•Do less work per pixel
– Disable dithering
– Depth-sort primitives to reduce depth testing
– Use alpha test to reject transparent fragments saves doing a pixel read-back from the framebuffer in the
blending phase
PixelOwnership
Test
ScissorTest
AlphaTest
StencilTest
DepthBufferTest
Blending DitheringLogical
Operations
Performance OpenGL 17
A Closer Look at A Closer Look at OpenGL’s Pixel OpenGL’s Pixel PipelinePipeline
A Closer Look at A Closer Look at OpenGL’s Pixel OpenGL’s Pixel PipelinePipeline
Pixel Scale&
BiasClamp
PixelRectangle
PixelUnpacking
TypeConversion
FramebufferTextureMemory
Pixel Map
Performance OpenGL 18
Working with Pixel Working with Pixel RectanglesRectanglesWorking with Pixel Working with Pixel RectanglesRectangles
Texture downloads and BltsTexture downloads and Blts•OpenGL supports many formats for storing
pixel data– Signed and unsigned types, floating point
•Type conversions from storage type to framebuffer / texture memory format occur automatically
Texture downloads and BltsTexture downloads and Blts•OpenGL supports many formats for storing
pixel data– Signed and unsigned types, floating point
•Type conversions from storage type to framebuffer / texture memory format occur automatically
Performance OpenGL 19
Pixel Data ConversionsPixel Data ConversionsPixel Data ConversionsPixel Data Conversions
0
5
10
15
20
25
30
Machine 1 Machine 2 Machine 3
GL_BYTE GL_UNSIGNED_BYTE GL_SHORT GL_UNSIGNED_SHORT
GL_INT GL_UNSIGNED_INT GL_FLOAT
Performance OpenGL 20
Pixel Data Conversions Pixel Data Conversions (cont.)(cont.)Pixel Data Conversions Pixel Data Conversions (cont.)(cont.)
0
0.5
1
1.5
2
2.5
Machine 1 Machine 2 Machine 3
GL_UNSIGNED_SHORT_4_4_4_4 GL_UNSIGNED_SHORT_4_4_4_4_REV GL_UNSIGNED_SHORT_5_5_5_1
GL_UNSIGNED_SHORT_1_5_5_5_REV GL_UNSIGNED_INT_8_8_8_8 GL_UNSIGNED_INT_8_8_8_8_REV
GL_UNSIGNED_INT_10_10_10_2 GL_UNSIGNED_INT_2_10_10_10_REV
Performance OpenGL 21
Pixel Data Conversions Pixel Data Conversions (cont.)(cont.)Pixel Data Conversions Pixel Data Conversions (cont.)(cont.)
ObservationsObservations•Signed data types probably aren’t optimized
– OpenGL clamps colors to [0, 1]
•Match pixel format to window’s pixel format for blts– Usually involves using packed pixel formats
– No significant difference for rendering speed for texture’s internal format
ObservationsObservations•Signed data types probably aren’t optimized
– OpenGL clamps colors to [0, 1]
•Match pixel format to window’s pixel format for blts– Usually involves using packed pixel formats
– No significant difference for rendering speed for texture’s internal format
Performance OpenGL 22
Fragment Operations Fragment Operations and Fill Rate and Fill Rate Fragment Operations Fragment Operations and Fill Rate and Fill Rate
The more you do, the less you getThe more you do, the less you get•The more work per pixel, the less fill you get
The more you do, the less you getThe more you do, the less you get•The more work per pixel, the less fill you get
Performance OpenGL 23
Fragment Operations Fragment Operations and Fill Rate (cont’d)and Fill Rate (cont’d)Fragment Operations Fragment Operations and Fill Rate (cont’d)and Fill Rate (cont’d)
Percentage of Peak Fill
0.00%
20.00%
40.00%
60.00%
80.00%
100.00%
120.00%
1 4 7 10
13
16
19
22
25
28
31
34
37
40
43
46
49
52
55
58
61
64
Machine 2Machine 2 Machine 4Machine 4
Percentage of Peak Fill
0.00%
20.00%
40.00%
60.00%
80.00%
100.00%
120.00%
1 4 7 10
13
16
19
22
25
28
31
34
37
40
43
46
49
52
55
58
61
64
Machine 3Machine 3
Percentage of Peak Fill
0.00%
20.00%
40.00%
60.00%
80.00%
100.00%
120.00%
1 4 7 10
13
16
19
22
25
28
31
34
37
40
43
46
49
52
55
58
61
64
Pecentage of Peak Fill
0.00%
20.00%
40.00%
60.00%
80.00%
100.00%
120.00%
1 4 7 10
13
16
19
22
25
28
31
34
37
40
43
46
49
52
55
58
61
64
Machine 1Machine 1
Performance OpenGL 24
Texture-mapping Texture-mapping ConsiderationsConsiderationsTexture-mapping Texture-mapping ConsiderationsConsiderations
Use Texture ObjectsUse Texture Objects•Allows OpenGL to do texture memory
management– Loads texture into texture memory when appropriate
– Only convert data once
•Provides queries for checking if a texture is resident– Load all textures, and verify they all fit simultaneously
Use Texture ObjectsUse Texture Objects•Allows OpenGL to do texture memory
management– Loads texture into texture memory when appropriate
– Only convert data once
•Provides queries for checking if a texture is resident– Load all textures, and verify they all fit simultaneously
Performance OpenGL 25
Texture-mapping Texture-mapping Considerations (cont.)Considerations (cont.)Texture-mapping Texture-mapping Considerations (cont.)Considerations (cont.)
Texture Objects (cont.)Texture Objects (cont.)•Assign priorities to textures
– Provides hints to texture-memory manager on which textures are most important
•Can be shared between OpenGL contexts– Allows one thread to load textures; other thread to
render using them
•Requires OpenGL 1.1
Texture Objects (cont.)Texture Objects (cont.)•Assign priorities to textures
– Provides hints to texture-memory manager on which textures are most important
•Can be shared between OpenGL contexts– Allows one thread to load textures; other thread to
render using them
•Requires OpenGL 1.1
Performance OpenGL 26
Texture-mapping Texture-mapping Considerations (cont.)Considerations (cont.)Texture-mapping Texture-mapping Considerations (cont.)Considerations (cont.)
Sub-loading TexturesSub-loading Textures•Only update a portion of a texture
– Reduces bandwidth for downloading textures
– Usually requires modifying texture-coordinate matrix
Sub-loading TexturesSub-loading Textures•Only update a portion of a texture
– Reduces bandwidth for downloading textures
– Usually requires modifying texture-coordinate matrix
Performance OpenGL 27
Texture-mapping Texture-mapping Considerations (cont.)Considerations (cont.)Texture-mapping Texture-mapping Considerations (cont.)Considerations (cont.)
Know what sizes your textures Know what sizes your textures need to beneed to be•What sizes of mipmaps will you need?
•OpenGL 1.2 introduces texture level-of-detail– Ability to have fine grained control over mipmap stack
Only load a subset of mipmaps Control which mipmaps are used
Know what sizes your textures Know what sizes your textures need to beneed to be•What sizes of mipmaps will you need?
•OpenGL 1.2 introduces texture level-of-detail– Ability to have fine grained control over mipmap stack
Only load a subset of mipmaps Control which mipmaps are used
Performance OpenGL 28
What If Those Options What If Those Options Aren’t Viable?Aren’t Viable?What If Those Options What If Those Options Aren’t Viable?Aren’t Viable?
•Use more or faster hardware
•Utilize the “extra time” in other parts of the application– Transform pipeline
tessellate objects for smoother appearance use better lighting
– Application more accurate simulation better physics
•Use more or faster hardware
•Utilize the “extra time” in other parts of the application– Transform pipeline
tessellate objects for smoother appearance use better lighting
– Application more accurate simulation better physics
Performance OpenGL 29
Transform-limited Transform-limited BottlenecksBottlenecksTransform-limited Transform-limited BottlenecksBottlenecks
System cannot process all the System cannot process all the vertices required in the allotted vertices required in the allotted timetime• If application doesn’t speed up in fill-limited
test, it’s most likely transform-limited
•Additional tests include– Disable lighting
– Disable texture coordinate generation
System cannot process all the System cannot process all the vertices required in the allotted vertices required in the allotted timetime• If application doesn’t speed up in fill-limited
test, it’s most likely transform-limited
•Additional tests include– Disable lighting
– Disable texture coordinate generation
Performance OpenGL 30
A Closer Look at A Closer Look at OpenGL’s OpenGL’s Transformation Transformation PipelinePipeline
A Closer Look at A Closer Look at OpenGL’s OpenGL’s Transformation Transformation PipelinePipeline
TextureCoordinateGeneration
VertexCoordinates
LightingNormals
TextureCoordinates
Color
ModelViewTransform
ProjectionTransform
Lighting Clipping
TextureCoordinateTransform
Performance OpenGL 31
Reducing Transform-Reducing Transform-limited Bottleneckslimited BottlenecksReducing Transform-Reducing Transform-limited Bottleneckslimited Bottlenecks
Do less work per-vertexDo less work per-vertex•Tune lighting
•Use “typed” OpenGL matrices
•Use explicit texture coordinates
•Simulate features in texturing– lighting
Do less work per-vertexDo less work per-vertex•Tune lighting
•Use “typed” OpenGL matrices
•Use explicit texture coordinates
•Simulate features in texturing– lighting
Performance OpenGL 32
Lighting Lighting ConsiderationsConsiderationsLighting Lighting ConsiderationsConsiderations
•Use infinite (directional) lights– Less computation compared to local (point) lights
– Don’t use GL_LIGHTMODEL_LOCAL_VIEWER
•Use fewer lights– Not all lights may be hardware accelerated
•Use infinite (directional) lights– Less computation compared to local (point) lights
– Don’t use GL_LIGHTMODEL_LOCAL_VIEWER
•Use fewer lights– Not all lights may be hardware accelerated
Performance OpenGL 33
Lighting Lighting Considerations (cont.)Considerations (cont.)Lighting Lighting Considerations (cont.)Considerations (cont.)
•Use a texture-based lighting scheme– Only helps if you’re not fill-limited
•Use a texture-based lighting scheme– Only helps if you’re not fill-limited
Performance OpenGL 34
Reducing Transform-Reducing Transform-limited Bottlenecks limited Bottlenecks (cont.)(cont.)
Reducing Transform-Reducing Transform-limited Bottlenecks limited Bottlenecks (cont.)(cont.)Matrix AdjustmentsMatrix Adjustments
•Use “typed” OpenGL matrix calls
– Some implementations track matrix type to reduce matrix-vector multiplication operations
Matrix AdjustmentsMatrix Adjustments•Use “typed” OpenGL matrix calls
– Some implementations track matrix type to reduce matrix-vector multiplication operations
glLoadMatrix*()glMultMatrix*()
glRotate*()glScale*()glTranslate*()glLoadIdentity()
“Untyped”“Typed”
Performance OpenGL 35
Application-limited Application-limited BottlenecksBottlenecksApplication-limited Application-limited BottlenecksBottlenecks
When OpenGL does all you ask, and When OpenGL does all you ask, and your application still runs too slowyour application still runs too slow•System may not be able to transfer data to
OpenGL fast enough
•Test by modifying application so that no rendering is performed, but all data is still transferred to OpenGL
When OpenGL does all you ask, and When OpenGL does all you ask, and your application still runs too slowyour application still runs too slow•System may not be able to transfer data to
OpenGL fast enough
•Test by modifying application so that no rendering is performed, but all data is still transferred to OpenGL
Performance OpenGL 36
Application-limited Application-limited Bottlenecks (cont.)Bottlenecks (cont.)Application-limited Application-limited Bottlenecks (cont.)Bottlenecks (cont.)
•Rendering in OpenGL is triggered when vertices are sent to the pipe
•Send all data to pipe, just not necessarily in its original form– Replace all glVertex*() and glColor*() calls with glNormal*() calls
glNormal*() only sets the current vertex’s normal values Application transfers the same amount of data to the
pipe, but doesn’t have to wait for rendering to complete
•Rendering in OpenGL is triggered when vertices are sent to the pipe
•Send all data to pipe, just not necessarily in its original form– Replace all glVertex*() and glColor*() calls with glNormal*() calls
glNormal*() only sets the current vertex’s normal values Application transfers the same amount of data to the
pipe, but doesn’t have to wait for rendering to complete
Performance OpenGL 37
Reducing Application-Reducing Application-limited Bottleneckslimited BottlenecksReducing Application-Reducing Application-limited Bottleneckslimited Bottlenecks
•No amount of OpenGL transform or rasterization tuning will help the problem
•Revisit application design decisions– Data structures
– Traversal methods
– Storage formats
•Use an application profiling tool (e.g. pixie & prof, gprof, or other similar tools)
•No amount of OpenGL transform or rasterization tuning will help the problem
•Revisit application design decisions– Data structures
– Traversal methods
– Storage formats
•Use an application profiling tool (e.g. pixie & prof, gprof, or other similar tools)
Performance OpenGL 38
The Novice OpenGL The Novice OpenGL Programmer’s View of Programmer’s View of the Worldthe World
The Novice OpenGL The Novice OpenGL Programmer’s View of Programmer’s View of the Worldthe World
SetState
Render
Performance OpenGL 39
What Happens When What Happens When You Set OpenGL StateYou Set OpenGL StateWhat Happens When What Happens When You Set OpenGL StateYou Set OpenGL State
• The amount of work varies by operation
• But all request a validation at next rendering operation
• The amount of work varies by operation
• But all request a validation at next rendering operation
Turning on or off a Turning on or off a feature (feature (glEnable()))
Set the feature’s enable Set the feature’s enable flagflag
Set a “typed” set of Set a “typed” set of data data ((glMaterialfv()))
Set values in OpenGL’s Set values in OpenGL’s contextcontext
Transfer “untyped” Transfer “untyped” data data ((glTexImage2D()))
Transfer and convert Transfer and convert data from host format data from host format into internal into internal representationrepresentation
Performance OpenGL 40
A (Somewhat) More A (Somewhat) More Accurate Accurate RepresentationRepresentation
A (Somewhat) More A (Somewhat) More Accurate Accurate RepresentationRepresentation
SetState
Render
Validation
Performance OpenGL 41
ValidationValidationValidationValidation
OpenGL’s synchronization processOpenGL’s synchronization process• Validation occurs in the transition from state setting
to rendering
• Not all state changes trigger a validation
– Vertex data (e.g. color, normal, texture coordinates)
– Changing rendering primitive
OpenGL’s synchronization processOpenGL’s synchronization process• Validation occurs in the transition from state setting
to rendering
• Not all state changes trigger a validation
– Vertex data (e.g. color, normal, texture coordinates)
– Changing rendering primitive
glMaterial( GL_FRONT, GL_DIFFUSE, blue );glEnable( GL_LIGHT0 );glBegin( GL_TRIANGLES );
Performance OpenGL 42
What Happens in a What Happens in a ValidationValidationWhat Happens in a What Happens in a ValidationValidation
•Changing state may do more than just set values in the OpenGL context– May require reconfiguring the OpenGL pipeline
selecting a different rasterization routine enabling the lighting machine
– Internal caches may be recomputed vertex / viewpoint independent data
•Changing state may do more than just set values in the OpenGL context– May require reconfiguring the OpenGL pipeline
selecting a different rasterization routine enabling the lighting machine
– Internal caches may be recomputed vertex / viewpoint independent data
Performance OpenGL 43
The Way it Really Is The Way it Really Is (Conceptually)(Conceptually)The Way it Really Is The Way it Really Is (Conceptually)(Conceptually)
SetState
Render
Validation
DifferentRenderingPrimitive
Performance OpenGL 44
Why Be Concerned Why Be Concerned About Validations?About Validations?Why Be Concerned Why Be Concerned About Validations?About Validations?
Validations can rob performance Validations can rob performance from an applicationfrom an application• “Redundant” state and primitive changes
•Validation is a two-step process– Determine what data needs to be updated
– Select appropriate rendering routines based on enabled features
Validations can rob performance Validations can rob performance from an applicationfrom an application• “Redundant” state and primitive changes
•Validation is a two-step process– Determine what data needs to be updated
– Select appropriate rendering routines based on enabled features
Performance OpenGL 45
How Can Validations How Can Validations Be Minimized?Be Minimized?How Can Validations How Can Validations Be Minimized?Be Minimized?
Be LazyBe Lazy• Change state as little as possible
• Try to group primitives by type
• Beware of “under the covers” state changes– GL_COLOR_MATERIAL
may force an update to the lighting cache ever call to glColor*()
Be LazyBe Lazy• Change state as little as possible
• Try to group primitives by type
• Beware of “under the covers” state changes– GL_COLOR_MATERIAL
may force an update to the lighting cache ever call to glColor*()
Performance OpenGL 46
How Can Validations How Can Validations Be Minimized? (cont.)Be Minimized? (cont.)How Can Validations How Can Validations Be Minimized? (cont.)Be Minimized? (cont.)
Beware of Beware of glPushAttrib() / glPopAttrib()• Very convenient for writing libraries
• Saves lots of state when called– All elements of an attribute groups are copied for later
• Almost guaranteed to do a validation when calling glPopAttrib()
Beware of Beware of glPushAttrib() / glPopAttrib()• Very convenient for writing libraries
• Saves lots of state when called– All elements of an attribute groups are copied for later
• Almost guaranteed to do a validation when calling glPopAttrib()
Performance OpenGL 47
State SortingState SortingState SortingState Sorting
Simple technique … Big payoffSimple technique … Big payoff•Arrange rendering sequence to minimize
state changes
•Group primitives based on their state attributes
•Organize rendering based on the expense of the operation
Simple technique … Big payoffSimple technique … Big payoff•Arrange rendering sequence to minimize
state changes
•Group primitives based on their state attributes
•Organize rendering based on the expense of the operation
Performance OpenGL 48
State Sorting (cont.)State Sorting (cont.)State Sorting (cont.)State Sorting (cont.)
Texture Download
Modifying LightingParameters
Matrix Operations
Vertex Data Least Expensive
Most Expensive
Performance OpenGL 49
A Comment on A Comment on EncapsulationEncapsulationA Comment on A Comment on EncapsulationEncapsulation
An Extremely Handy Design An Extremely Handy Design Mechanism, however …Mechanism, however …•Encapsulation may affect performance
– Tendency to want to complete all operations for an object before continuing to next object
limits state sorting potential may cause unnecessary validations
An Extremely Handy Design An Extremely Handy Design Mechanism, however …Mechanism, however …•Encapsulation may affect performance
– Tendency to want to complete all operations for an object before continuing to next object
limits state sorting potential may cause unnecessary validations
Performance OpenGL 50
A Comment on A Comment on Encapsulation (cont.)Encapsulation (cont.)A Comment on A Comment on Encapsulation (cont.)Encapsulation (cont.)
• Using a “visitor” type pattern can reduce state changes and validations– Usually a two-pass operation
Traverse objects, building a list of rendering primitives by state and type
Render by processing lists
– Popular method employed by many scene-graph packages
• Using a “visitor” type pattern can reduce state changes and validations– Usually a two-pass operation
Traverse objects, building a list of rendering primitives by state and type
Render by processing lists
– Popular method employed by many scene-graph packages
Performance OpenGL 51
““Scene Graph” Scene Graph” LessonsLessons““Scene Graph” Scene Graph” LessonsLessons
Can you use pre-packaged software?Can you use pre-packaged software?•Save yourself some trouble
Lessons learned from scene graphsLessons learned from scene graphsInventor OpenRM Performer
DirectModel Cosmo3D OpenSG
DirectSceneGraph/Fahrenheit
Can you use pre-packaged software?Can you use pre-packaged software?•Save yourself some trouble
Lessons learned from scene graphsLessons learned from scene graphsInventor OpenRM Performer
DirectModel Cosmo3D OpenSG
DirectSceneGraph/Fahrenheit
Performance OpenGL 52
Scene Graph LessonsScene Graph LessonsScene Graph LessonsScene Graph Lessons
Organization is the Organization is the keykey•Organize scene data for performance
•E.g. by transformation & bounding hierarchy
All about balance (like all high-perf All about balance (like all high-perf coding)coding)•Speed versus convenience
•Portability versus speed
Organization is the Organization is the keykey•Organize scene data for performance
•E.g. by transformation & bounding hierarchy
All about balance (like all high-perf All about balance (like all high-perf coding)coding)•Speed versus convenience
•Portability versus speed
Performance OpenGL 53
Performance GoalsPerformance GoalsPerformance GoalsPerformance Goals
Identify performance techniques and Identify performance techniques and goalsgoals•Sort to eliminate costly state changes
•Evaluate state lazily– Don’t set state if won’t draw geometry
•Eliminate redundance– Don’t set “red” if material is already “red”
Identify performance techniques and Identify performance techniques and goalsgoals•Sort to eliminate costly state changes
•Evaluate state lazily– Don’t set state if won’t draw geometry
•Eliminate redundance– Don’t set “red” if material is already “red”
Performance OpenGL 54
Feature GoalsFeature GoalsFeature GoalsFeature Goals
Identify algorithms and requirementsIdentify algorithms and requirements•E.g. Sort alpha/transparent shapes back-to-
front
Identify multiple rendering passesIdentify multiple rendering passes•Shadow texture
•Environment Map
Will you need threading?Will you need threading?
Identify algorithms and requirementsIdentify algorithms and requirements•E.g. Sort alpha/transparent shapes back-to-
front
Identify multiple rendering passesIdentify multiple rendering passes•Shadow texture
•Environment Map
Will you need threading?Will you need threading?
Performance OpenGL 55
Scene Graph LessonsScene Graph LessonsScene Graph LessonsScene Graph Lessons
Put units of work in nodes/leavesPut units of work in nodes/leaves•Speed of traversal vs. convenience
– Instancing
•e.g. Vertex Arrays, Material, Primitives
Do work where you have the Do work where you have the information neededinformation needed•e.g. don't require leaf info before traversal
Put units of work in nodes/leavesPut units of work in nodes/leaves•Speed of traversal vs. convenience
– Instancing
•e.g. Vertex Arrays, Material, Primitives
Do work where you have the Do work where you have the information neededinformation needed•e.g. don't require leaf info before traversal
Performance OpenGL 56
Scene Graph LessonsScene Graph LessonsScene Graph LessonsScene Graph Lessons
Data (“Nodes”) versus operation Data (“Nodes”) versus operation (“Action”)(“Action”)VertexSet::Draw() might be too inflexible
Renderer::draw(Node*)allows more flexibility
•Write new Renderer without changing nodes
Data (“Nodes”) versus operation Data (“Nodes”) versus operation (“Action”)(“Action”)VertexSet::Draw() might be too inflexible
Renderer::draw(Node*)allows more flexibility
•Write new Renderer without changing nodes
Performance OpenGL 57
Scene Graph LessonsScene Graph LessonsScene Graph LessonsScene Graph Lessons
Data (“Nodes”) versus operation Data (“Nodes”) versus operation (“Action”)(“Action”)•Might take a little thought - but beneficial
•Gives you flexibility for future
•Need to keep careful eye on performance
Data (“Nodes”) versus operation Data (“Nodes”) versus operation (“Action”)(“Action”)•Might take a little thought - but beneficial
•Gives you flexibility for future
•Need to keep careful eye on performance
Performance OpenGL 58
Example Data Example Data StructureStructureExample Data Example Data StructureStructure
Directed Graph / TreeDirected Graph / Tree• Internal and leaf Nodes
•Subclassing
Render TraversalRender Traversal•Use bounding volumes
•OpenGL state management
Directed Graph / TreeDirected Graph / Tree• Internal and leaf Nodes
•Subclassing
Render TraversalRender Traversal•Use bounding volumes
•OpenGL state management
Performance OpenGL 59
ExampleExampleExampleExample GroupGroup
GroupGroup
ShapeShape
VertexSet
PixelState
VertexState
Performance OpenGL 60
Internal NodesInternal NodesInternal NodesInternal Nodes
Represent spatial organizationRepresent spatial organization•Bounding hierarchy, spatial grid, etc…
Might encode some functionalityMight encode some functionality•Animation
•Level of Detail
Represent spatial organizationRepresent spatial organization•Bounding hierarchy, spatial grid, etc…
Might encode some functionalityMight encode some functionality•Animation
•Level of Detail
Performance OpenGL 61
Internal NodesInternal NodesInternal NodesInternal Nodes
Opportunities from bounding Opportunities from bounding volumes volumes •View frustum cull (early if using a hierarchy)
•Screen space size estimation– e.g. for level-of-detail or subdivision
•Ray/picking optimizations
Opportunities from bounding Opportunities from bounding volumes volumes •View frustum cull (early if using a hierarchy)
•Screen space size estimation– e.g. for level-of-detail or subdivision
•Ray/picking optimizations
Performance OpenGL 62
Leaf NodesLeaf NodesLeaf NodesLeaf Nodes
Make leaf data approximate OpenGL Make leaf data approximate OpenGL state state •Rapid application from leaf to OpenGL
But encode some abstractionBut encode some abstraction• Inventor’s SoComplexity
•EnvironmentMap instead of assuming GL_SPHERE_MAP
•Allow optimizations on specific platforms
Make leaf data approximate OpenGL Make leaf data approximate OpenGL state state •Rapid application from leaf to OpenGL
But encode some abstractionBut encode some abstraction• Inventor’s SoComplexity
•EnvironmentMap instead of assuming GL_SPHERE_MAP
•Allow optimizations on specific platforms
Performance OpenGL 63
Example: Example: VertexSetVertexSetExample: Example: VertexSetVertexSet
class VertexSet {class VertexSet {
GLenum fmtGLenum fmt
float *verts;float *verts;
int vertCount;int vertCount;
};};
class VertexSet {class VertexSet {
GLenum fmtGLenum fmt
float *verts;float *verts;
int vertCount;int vertCount;
};};
Performance OpenGL 64
Example: Example: VertexStateVertexStateExample: Example: VertexStateVertexState
class VertexState {class VertexState {
MaterialMaterial *Mtl; *Mtl;
LightingLighting *Lighting; *Lighting;
ClipPlaneClipPlane *Planes; *Planes;
......
};};
class VertexState {class VertexState {
MaterialMaterial *Mtl; *Mtl;
LightingLighting *Lighting; *Lighting;
ClipPlaneClipPlane *Planes; *Planes;
......
};};
Performance OpenGL 65
Example: Example: NodeNodeExample: Example: NodeNode
class Node {class Node {
VolumeVolume Bounds; Bounds;
......
};};
class Node {class Node {
VolumeVolume Bounds; Bounds;
......
};};
Performance OpenGL 66
Example: Example: ShapeShapeExample: Example: ShapeShape
class Shape : Node {class Shape : Node {
VertexSetVertexSet *Verts; *Verts;
VVertexStateertexState *VState; *VState;
PPixelStateixelState *PState; *PState;
int PrimLengths[];int PrimLengths[];
Glenum PrimTypes[];Glenum PrimTypes[];
int *PrimIndices;int *PrimIndices;
......
};};
class Shape : Node {class Shape : Node {
VertexSetVertexSet *Verts; *Verts;
VVertexStateertexState *VState; *VState;
PPixelStateixelState *PState; *PState;
int PrimLengths[];int PrimLengths[];
Glenum PrimTypes[];Glenum PrimTypes[];
int *PrimIndices;int *PrimIndices;
......
};};
Performance OpenGL 67
Example: Example: GroupGroupExample: Example: GroupGroup
class Group : Node {class Group : Node {
int ChildCount;int ChildCount;
NodeNode Children[]; Children[];
......
};};
class Group : Node {class Group : Node {
int ChildCount;int ChildCount;
NodeNode Children[]; Children[];
......
};};
Performance OpenGL 68
OpenGL Context OpenGL Context EncapsulationEncapsulationOpenGL Context OpenGL Context EncapsulationEncapsulation
Bundles of OpenGL stateBundles of OpenGL state• Items commonly changed together
texture fragment ops
lighting material
vertex arrays shaders
Bundles of OpenGL stateBundles of OpenGL state• Items commonly changed together
texture fragment ops
lighting material
vertex arrays shaders
Performance OpenGL 69
Example: Example: OpenGLRendererOpenGLRendererExample: Example: OpenGLRendererOpenGLRenderer
This is where you evaluate lazily and skip This is where you evaluate lazily and skip redundant state changesredundant state changes
class OpenGLRenderer : Traversal {class OpenGLRenderer : Traversal { virtual void Traverse(Node *root);virtual void Traverse(Node *root);private:private: void Schedule (float mtx[16],void Schedule (float mtx[16],
ShapeShape * *shape);shape); void Finish(void); void Finish(void); ......};};
This is where you evaluate lazily and skip This is where you evaluate lazily and skip redundant state changesredundant state changes
class OpenGLRenderer : Traversal {class OpenGLRenderer : Traversal { virtual void Traverse(Node *root);virtual void Traverse(Node *root);private:private: void Schedule (float mtx[16],void Schedule (float mtx[16],
ShapeShape * *shape);shape); void Finish(void); void Finish(void); ......};};
Performance OpenGL 70
Example: Example: OpenGLRendererOpenGLRendererExample: Example: OpenGLRendererOpenGLRenderer
OpenGLRenderer::Traverse(root)OpenGLRenderer::Traverse(root)Recursive traversal - call from appRecursive traversal - call from appCheck bounding volumeCheck bounding volumeAccumulate transformation matrixAccumulate transformation matrix
private OpenGLRenderer::Schedule(mtx, shape)private OpenGLRenderer::Schedule(mtx, shape)Schedule a shape for renderingSchedule a shape for rendering
private OpenGLRenderer::Finish()private OpenGLRenderer::Finish()State sort by bundleState sort by bundleSort transparent objs back to frontSort transparent objs back to frontDraw all the objectsDraw all the objects
OpenGLRenderer::Traverse(root)OpenGLRenderer::Traverse(root)Recursive traversal - call from appRecursive traversal - call from appCheck bounding volumeCheck bounding volumeAccumulate transformation matrixAccumulate transformation matrix
private OpenGLRenderer::Schedule(mtx, shape)private OpenGLRenderer::Schedule(mtx, shape)Schedule a shape for renderingSchedule a shape for rendering
private OpenGLRenderer::Finish()private OpenGLRenderer::Finish()State sort by bundleState sort by bundleSort transparent objs back to frontSort transparent objs back to frontDraw all the objectsDraw all the objects
Performance OpenGL 71
Case Study: Rendering Case Study: Rendering A CubeA CubeCase Study: Rendering Case Study: Rendering A CubeA Cube
More than one way to render a cubeMore than one way to render a cube•Render 100000 cubes
More than one way to render a cubeMore than one way to render a cube•Render 100000 cubes
Render sixRender sixseparate quadsseparate quads
Render twoRender twoquads, and onequads, and one
quad-stripquad-strip
Performance OpenGL 72
Case Study: Method 1Case Study: Method 1Case Study: Method 1Case Study: Method 1
Once for each cube …
glColor3fv( color );
for ( i = 0; i < NUM_CUBE_FACES; ++i ) {
glBegin( GL_QUADS );
glVertex3fv( cube[cubeFace[i][0]] );
glVertex3fv( cube[cubeFace[i][1]] );
glVertex3fv( cube[cubeFace[i][2]] );
glVertex3fv( cube[cubeFace[i][3]] );
glEnd();
}
Once for each cube …
glColor3fv( color );
for ( i = 0; i < NUM_CUBE_FACES; ++i ) {
glBegin( GL_QUADS );
glVertex3fv( cube[cubeFace[i][0]] );
glVertex3fv( cube[cubeFace[i][1]] );
glVertex3fv( cube[cubeFace[i][2]] );
glVertex3fv( cube[cubeFace[i][3]] );
glEnd();
}
Performance OpenGL 73
Case Study: Method 2Case Study: Method 2Case Study: Method 2Case Study: Method 2
Once for each cube …
glColor3fv( color ); glBegin( GL_QUADS ); for ( i = 0; i < NUM_CUBE_FACES; ++i ) { glVertex3fv( cube[cubeFace[i][0]] ); glVertex3fv( cube[cubeFace[i][1]] ); glVertex3fv( cube[cubeFace[i][2]] ); glVertex3fv( cube[cubeFace[i][3]] ); } glEnd();
Once for each cube …
glColor3fv( color ); glBegin( GL_QUADS ); for ( i = 0; i < NUM_CUBE_FACES; ++i ) { glVertex3fv( cube[cubeFace[i][0]] ); glVertex3fv( cube[cubeFace[i][1]] ); glVertex3fv( cube[cubeFace[i][2]] ); glVertex3fv( cube[cubeFace[i][3]] ); } glEnd();
Performance OpenGL 74
Case Study: Method 3Case Study: Method 3Case Study: Method 3Case Study: Method 3
glBegin( GL_QUADS ); for ( i = 0; i < numCubes; ++i ) { for ( i = 0; i < NUM_CUBE_FACES; ++i ) { glVertex3fv( cube[cubeFace[i][0]] ); glVertex3fv( cube[cubeFace[i][1]] ); glVertex3fv( cube[cubeFace[i][2]] ); glVertex3fv( cube[cubeFace[i][3]] ); } } glEnd();
glBegin( GL_QUADS ); for ( i = 0; i < numCubes; ++i ) { for ( i = 0; i < NUM_CUBE_FACES; ++i ) { glVertex3fv( cube[cubeFace[i][0]] ); glVertex3fv( cube[cubeFace[i][1]] ); glVertex3fv( cube[cubeFace[i][2]] ); glVertex3fv( cube[cubeFace[i][3]] ); } } glEnd();
Performance OpenGL 75
Case Study: Method 4Case Study: Method 4Case Study: Method 4Case Study: Method 4
Once for each cube …
glColor3fv( color );
glBegin( GL_QUADS );
glVertex3fv( cube[cubeFace[0][0]] );
glVertex3fv( cube[cubeFace[0][1]] );
glVertex3fv( cube[cubeFace[0][2]] );
glVertex3fv( cube[cubeFace[0][3]] );
glVertex3fv( cube[cubeFace[1][0]] );
glVertex3fv( cube[cubeFace[1][1]] );
glVertex3fv( cube[cubeFace[1][2]] );
glVertex3fv( cube[cubeFace[1][3]] );
glEnd();
Once for each cube …
glColor3fv( color );
glBegin( GL_QUADS );
glVertex3fv( cube[cubeFace[0][0]] );
glVertex3fv( cube[cubeFace[0][1]] );
glVertex3fv( cube[cubeFace[0][2]] );
glVertex3fv( cube[cubeFace[0][3]] );
glVertex3fv( cube[cubeFace[1][0]] );
glVertex3fv( cube[cubeFace[1][1]] );
glVertex3fv( cube[cubeFace[1][2]] );
glVertex3fv( cube[cubeFace[1][3]] );
glEnd();
glBegin( GL_QUAD_STRIP );
for ( i = 2; i < NUM_CUBE_FACES; ++i ) {
glVertex3fv( cube[cubeFace[i][0]] );
glVertex3fv( cube[cubeFace[i][1]] );
}
glVertex3fv( cube[cubeFace[2][0]] );
glVertex3fv( cube[cubeFace[2][1]] );
glEnd();
glBegin( GL_QUAD_STRIP );
for ( i = 2; i < NUM_CUBE_FACES; ++i ) {
glVertex3fv( cube[cubeFace[i][0]] );
glVertex3fv( cube[cubeFace[i][1]] );
}
glVertex3fv( cube[cubeFace[2][0]] );
glVertex3fv( cube[cubeFace[2][1]] );
glEnd();
Performance OpenGL 76
Case Study: Method 5Case Study: Method 5Case Study: Method 5Case Study: Method 5
glBegin( GL_QUADS );
for ( i = 0; i < numCubes; ++i ) {
Cube& cube = cubes[i];
glColor3fv( color[i] );
glVertex3fv( cube[cubeFace[0][0]] );
glVertex3fv( cube[cubeFace[0][1]] );
glVertex3fv( cube[cubeFace[0][2]] );
glVertex3fv( cube[cubeFace[0][3]] );
glVertex3fv( cube[cubeFace[1][0]] );
glVertex3fv( cube[cubeFace[1][1]] );
glVertex3fv( cube[cubeFace[1][2]] );
glVertex3fv( cube[cubeFace[1][3]] );
}
glEnd();
glBegin( GL_QUADS );
for ( i = 0; i < numCubes; ++i ) {
Cube& cube = cubes[i];
glColor3fv( color[i] );
glVertex3fv( cube[cubeFace[0][0]] );
glVertex3fv( cube[cubeFace[0][1]] );
glVertex3fv( cube[cubeFace[0][2]] );
glVertex3fv( cube[cubeFace[0][3]] );
glVertex3fv( cube[cubeFace[1][0]] );
glVertex3fv( cube[cubeFace[1][1]] );
glVertex3fv( cube[cubeFace[1][2]] );
glVertex3fv( cube[cubeFace[1][3]] );
}
glEnd();
for ( i = 0; i < numCubes; ++i ) { Cube& cube = cubes[i];
glColor3fv( color[i] );
glBegin( GL_QUAD_STRIP );
for ( i = 2; i < NUM_CUBE_FACES; ++i ) {
glVertex3fv( cube[cubeFace[i][0]] );
glVertex3fv( cube[cubeFace[i][1]] );
}
glVertex3fv( cube[cubeFace[2][0]] );
glVertex3fv( cube[cubeFace[2][1]] );
glEnd();
}
for ( i = 0; i < numCubes; ++i ) { Cube& cube = cubes[i];
glColor3fv( color[i] );
glBegin( GL_QUAD_STRIP );
for ( i = 2; i < NUM_CUBE_FACES; ++i ) {
glVertex3fv( cube[cubeFace[i][0]] );
glVertex3fv( cube[cubeFace[i][1]] );
}
glVertex3fv( cube[cubeFace[2][0]] );
glVertex3fv( cube[cubeFace[2][1]] );
glEnd();
}
Performance OpenGL 77
Case Study: ResultsCase Study: ResultsCase Study: ResultsCase Study: Results
0.00.20.40.60.81.01.21.41.61.82.0
Machine 1 Machine 2 Machine 3
Method 1
Method 2
Method 3
Method 4
Method 5
Performance OpenGL 78
Rendering GeometryRendering GeometryRendering GeometryRendering Geometry
OpenGL has four ways to specify OpenGL has four ways to specify vertex-based geometryvertex-based geometry• Immediate mode
•Display lists
•Vertex arrays
• Interleaved vertex arrays
OpenGL has four ways to specify OpenGL has four ways to specify vertex-based geometryvertex-based geometry• Immediate mode
•Display lists
•Vertex arrays
• Interleaved vertex arrays
Performance OpenGL 79
Rendering Geometry Rendering Geometry (cont.)(cont.)Rendering Geometry Rendering Geometry (cont.)(cont.)
Not all ways are created equalNot all ways are created equal
0
0.5
1
1.5
2
2.5
3
3.5
Machine 1 Machine 2 Machine 3
Immediate
Display List
Array Element
Draw Array
Draw Elements
Interleaved Array Element
Interleaved Draw Array
Interleaved Draw Elements
Performance OpenGL 80
Rendering Geometry Rendering Geometry (cont.)(cont.)Rendering Geometry Rendering Geometry (cont.)(cont.)
Add lighting and color material to Add lighting and color material to the mixthe mix
Add lighting and color material to Add lighting and color material to the mixthe mix
0
1
2
3
4
5
6
Machine 1 Machine 2 Machine 3
Immediate
Display List
Array Element
Draw Array
Draw Elements
Interleaved Array Element
Interleaved Draw Array
Interleaved Draw Elements
Performance OpenGL 81
Case Study: Case Study: Application DescriptionApplication DescriptionCase Study: Case Study: Application DescriptionApplication Description
•1.02M Triangles
•507K Vertices
•Vertex Arrays– Colors
– Normals
– Coordinates
•Color Material
•1.02M Triangles
•507K Vertices
•Vertex Arrays– Colors
– Normals
– Coordinates
•Color Material
Performance OpenGL 82
Case Study: What’s the Case Study: What’s the Problem?Problem?Case Study: What’s the Case Study: What’s the Problem?Problem?
Low frame rateLow frame rate•On a machine capable of 13M
polygons/second application was getting less than 1 frame/second
•Application wasn’t fill limited
Low frame rateLow frame rate•On a machine capable of 13M
polygons/second application was getting less than 1 frame/second
•Application wasn’t fill limited
secondframes
frametrianglessecond
polygons
12 M1.02
M1.13
Performance OpenGL 83
Case Study: The Case Study: The Rendering LoopRendering LoopCase Study: The Case Study: The Rendering LoopRendering Loop
•Vertex Arrays
•glDrawElements() – index based rendering
•Color MaterialglColorMaterial( GL_FRONT, GL_AMBIENT_AND_DIFFUSE );
•Vertex Arrays
•glDrawElements() – index based rendering
•Color MaterialglColorMaterial( GL_FRONT, GL_AMBIENT_AND_DIFFUSE );
glVertexPointer( GL_VERTEX_POINTER );
glNormalPointer( GL_NORMAL_POINTER );
glColorPointer( GL_COLOR_POINTER );
Performance OpenGL 84
Case Study: What To Case Study: What To NoticeNoticeCase Study: What To Case Study: What To NoticeNotice
•Color Material changes two lighting material components per glColor*() call
•Not that many colors used in the model– 18 unique colors, to be exact
– (3 * 1020472 – 18) = 3061398 redundant color calls per frame
•Color Material changes two lighting material components per glColor*() call
•Not that many colors used in the model– 18 unique colors, to be exact
– (3 * 1020472 – 18) = 3061398 redundant color calls per frame
Performance OpenGL 85
Case Study: Case Study: ConclusionsConclusionsCase Study: Case Study: ConclusionsConclusions
A little state sorting goes a long wayA little state sorting goes a long way•Sort triangles based on color
•Rewriting the rendering loop slightly
•Frame rate increased to six frames/second– 500% performance increase
A little state sorting goes a long wayA little state sorting goes a long way•Sort triangles based on color
•Rewriting the rendering loop slightly
•Frame rate increased to six frames/second– 500% performance increase
for ( i = 0; i < numColors; ++i ) { glColor3fv( color[i] ); glDrawElements( …, trisForColor[i] );
}
Performance OpenGL 86
SummarySummarySummarySummary
Know the answer before you startKnow the answer before you start•Understand rendering requirements of your
applications– Have a performance goal
•Utilize applicable benchmarks– Estimate what the hardware’s capable of
•Organize rendering to minimize OpenGL validations and other work
Know the answer before you startKnow the answer before you start•Understand rendering requirements of your
applications– Have a performance goal
•Utilize applicable benchmarks– Estimate what the hardware’s capable of
•Organize rendering to minimize OpenGL validations and other work
Performance OpenGL 87
Summary (cont.)Summary (cont.)Summary (cont.)Summary (cont.)
Pre-process dataPre-process data•Convert images and textures into formats
which don’t require pixel conversions
•Pre-size textures– Simultaneously fit into texture memory
– Mipmaps
•Determine what’s the best format for sending data to the pipe
Pre-process dataPre-process data•Convert images and textures into formats
which don’t require pixel conversions
•Pre-size textures– Simultaneously fit into texture memory
– Mipmaps
•Determine what’s the best format for sending data to the pipe
Performance OpenGL 88
Questions & AnswersQuestions & AnswersQuestions & AnswersQuestions & Answers
Thanks for comingThanks for coming•Updates to notes and slides will be available
at
http://www.plunk.org/Performance.OpenGL
•Feel free to email if you have questions
Thanks for comingThanks for coming•Updates to notes and slides will be available
at
http://www.plunk.org/Performance.OpenGL
•Feel free to email if you have questionsDave ShreinerDave Shreiner
[email protected] Brad GranthamBrad [email protected]
Performance OpenGL 89
ReferencesReferencesReferencesReferences
•OpenGL Programming Guide, 3rd EditionWoo, Mason et. al., Addison Wesley
•OpenGL Reference Manual, 3rd EditionOpenGL Architecture Review Board, Addison Wesley
•OpenGL Specification, Version 1.4OpenGL Architecture Review Board
•OpenGL Programming Guide, 3rd EditionWoo, Mason et. al., Addison Wesley
•OpenGL Reference Manual, 3rd EditionOpenGL Architecture Review Board, Addison Wesley
•OpenGL Specification, Version 1.4OpenGL Architecture Review Board
Performance OpenGL 90
For More InformationFor More InformationFor More InformationFor More Information
•SIGGRAPH 2002 Course # 3 - Developing Efficient Graphics Software
•SIGGRAPH 2000 Course # 32 - Advanced Graphics Programming Techniques Using OpenGL
•SIGGRAPH 2002 Course # 3 - Developing Efficient Graphics Software
•SIGGRAPH 2000 Course # 32 - Advanced Graphics Programming Techniques Using OpenGL
Performance OpenGL 91
AcknowledgementsAcknowledgementsAcknowledgementsAcknowledgements
A Big Thank You to …A Big Thank You to …•Peter Shaheen for a number of the
benchmark programs
•David Shirley for Case Study application
A Big Thank You to …A Big Thank You to …•Peter Shaheen for a number of the
benchmark programs
•David Shirley for Case Study application