Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.
Date posted: 21-Dec-2015
![Page 1: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/1.jpg)
Z-Buffer Optimizations
Patrick Cozzi
Analytical Graphics, Inc.
![Page 2: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/2.jpg)
Overview
• Z-Buffer Review
• Hardware: Early-Z
• Software: Front-to-Back Sorting
• Hardware: Double-Speed Z-Only
• Software: Early-Z Pass
• Software: Deferred Shading
• Hardware: Buffer Compression
• Hardware: Fast Clear
• Hardware: Z-Cull
• Future: Programmable Culling Unit
![Page 3: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/3.jpg)
Z-Buffer Review
• Also called Depth Buffer
• Fragment vs Pixel
• Alternatives: Painter’s, Ray Casting, etc.
![Page 4: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/4.jpg)
Z-Buffer History
• “Brute-force approach”
• “Ridiculously expensive”
Sutherland, Sproull, and Schumacker, “A Characterization of Ten Hidden-Surface Algorithms”, 1974
![Page 5: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/5.jpg)
Z-Buffer Quiz
10 triangles cover a pixel. Rendering these in random order with a Z-buffer, what is the average number of times the pixel’s z-value is written?
See Subtle Tools Slides: erich.realtimerendering.com
![Page 6: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/6.jpg)
Z-Buffer Quiz
• 1st triangle writes depth
• 2nd triangle has 1/2 chance of writing depth
• 3rd triangle has 1/3 chance of writing depth
1 + 1/2 + 1/3 + …+ 1/10 = 2.9289…
See Subtle Tools Slides: erich.realtimerendering.com
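The sum on this slide follows from linearity of expectation. In a random order, the k-th triangle drawn writes depth only if it is the closest of the first k triangles, which happens with probability 1/k. This assumes distinct depths and a "closer wins" depth test, neither of which the slide states explicitly.

```latex
\[
\Pr[\text{triangle } k \text{ writes depth}] = \frac{1}{k},
\qquad
\mathbb{E}[\text{depth writes}] = \sum_{k=1}^{n} \frac{1}{k} = H_n,
\qquad
H_{10} = \frac{7381}{2520} \approx 2.929
\]
```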
![Page 7: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/7.jpg)
Z-Buffer Quiz
Harmonic Series
| # Triangles | # Depth Writes |
|-------------|----------------|
| 1           | 1              |
| 4           | 2.08           |
| 11          | 3.02           |
| 31          | 4.03           |
| 83          | 5              |
| 12,367      | 10             |
See Subtle Tools Slides: erich.realtimerendering.com
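The table values can be reproduced with a few lines of Python. This is only a sketch: the function names `expected_depth_writes` and `triangles_needed` are made up for illustration, and the model assumes random draw order with distinct depths.

```python
def expected_depth_writes(n):
    """Expected z-writes when n triangles cover a pixel in random order (H_n)."""
    return sum(1.0 / k for k in range(1, n + 1))


def triangles_needed(target_writes):
    """Smallest triangle count whose expected write count reaches target_writes."""
    total, n = 0.0, 0
    while total < target_writes:
        n += 1
        total += 1.0 / n
    return n


if __name__ == "__main__":
    for n in (1, 4, 11, 31, 83):
        print(n, round(expected_depth_writes(n), 2))
```

The slow growth of the harmonic series is the point: going from 5 to 10 expected writes takes roughly 150 times as many overlapping triangles.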
![Page 8: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/8.jpg)
Z-Test in the Pipeline
When is the Z-Test?
[Figure: two pipeline orderings, Fragment Shader followed by Z-Test, or Z-Test followed by Fragment Shader]
![Page 9: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/9.jpg)
Early-Z
• Avoid expensive fragment shaders
• Reduce bandwidth to frame buffer
  • Writes not reads
[Figure: Z-Test placed before the Fragment Shader]
![Page 10: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/10.jpg)
Early-Z
• Automatically enabled on GeForce (8?) unless
  • Fragment shader discards or writes depth
  • Depth writes and alpha-test are enabled
• Fine-grained, as opposed to Z-Cull
• ATI: “Top of the Pipe Z Reject”
[Figure: Z-Test placed before the Fragment Shader]
See NVIDIA GPU Programming Guide for exact details
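To make the saving concrete, here is a toy single-pixel model that counts fragment shader runs with the Z-test after the shader versus before it. It is a sketch, not a description of any GPU: the function name `shade_count` is hypothetical, smaller depth means closer, and the depth test is assumed to be "less than".

```python
def shade_count(fragment_depths, early_z):
    """Count fragment-shader runs for one pixel.

    Assumptions (not from the slides): smaller depth is closer and the
    depth test is LESS. fragment_depths is the draw order for this pixel.
    """
    depth = float("inf")  # cleared depth value
    shaded = 0
    for z in fragment_depths:
        if early_z:
            if z >= depth:
                continue  # rejected before the shader runs
            shaded += 1
            depth = z
        else:
            shaded += 1  # shader always runs, test happens afterwards
            if z < depth:
                depth = z
    return shaded


if __name__ == "__main__":
    print(shade_count([0.1, 0.5, 0.9], early_z=True))   # front-to-back order
    print(shade_count([0.9, 0.5, 0.1], early_z=True))   # back-to-front order
```

Early-Z only helps when closer fragments arrive first; with back-to-front order every fragment still passes the test and is shaded.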
![Page 11: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/11.jpg)
Front-to-Back Sorting
• Utilize Early-Z for opaque objects
• Old hardware still has fewer z-buffer writes
• CPU overhead: needs efficient sorting
  • Bucket sort
  • Octree
• Conflicts with state sorting
[Figure: depth buckets 0 - 0.25, 0.25 - 0.5, 0.5 - 0.75, 0.75 - 1, with the values 0, 1, 1, 2 shown alongside]
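A bucket sort keeps the CPU cost of front-to-back ordering linear in the number of objects. The sketch below assumes each object carries a normalized depth in [0, 1] and uses four equal buckets like the figure; the function name `bucket_front_to_back` and the object tuples are hypothetical.

```python
def bucket_front_to_back(objects, num_buckets=4):
    """Return object names in approximate front-to-back order.

    objects: iterable of (name, depth) with depth assumed normalized to [0, 1].
    Objects inside one bucket keep their submission order; an exact sort is
    not needed because Early-Z only benefits from rough ordering.
    """
    buckets = [[] for _ in range(num_buckets)]
    for name, depth in objects:
        # Clamp so that depth == 1.0 lands in the last bucket.
        index = min(int(depth * num_buckets), num_buckets - 1)
        buckets[index].append(name)
    return [name for bucket in buckets for name in bucket]


if __name__ == "__main__":
    scene = [("far", 0.9), ("near", 0.1), ("mid", 0.6), ("mid2", 0.55)]
    print(bucket_front_to_back(scene))
```

Because order inside a bucket is left untouched, a renderer could still sort by state within each bucket, which softens the conflict with state sorting noted on the slide.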
![Page 12: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/12.jpg)
Double Speed Z-Only
GeForce FX and later render at double speed when writing only depth or stencil
• Enabled when
  • Color writes are disabled
  • Fragment shader does not discard or write depth
  • Alpha-test is disabled
See NVIDIA GPU Programming Guide for exact details
![Page 13: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/13.jpg)
Early-Z Pass
Software technique to utilize Early-Z and Double Speed Z-Only
• Two passes
  • Render depth only. “Lay down depth”: Double Speed Z-Only
  • Render with full shaders: Early-Z (and Z-Cull)
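The two passes can be modeled for a single pixel as follows. This is a rough sketch: `render_with_depth_prepass` is a hypothetical name, smaller depth is assumed closer, and the second pass is assumed to use a less-or-equal depth test against the depth laid down by the first pass.

```python
def render_with_depth_prepass(fragment_depths):
    """Simulate an Early-Z pass for one pixel.

    Returns (final_depth, shader_runs). Assumes smaller depth is closer
    and that pass 2 tests with LEQUAL against the pre-filled depth.
    """
    # Pass 1: depth only, no fragment shading ("lay down depth").
    depth = float("inf")
    for z in fragment_depths:
        depth = min(depth, z)

    # Pass 2: full shaders; only fragments matching the final depth survive.
    shader_runs = sum(1 for z in fragment_depths if z <= depth)
    return depth, shader_runs


if __name__ == "__main__":
    print(render_with_depth_prepass([0.9, 0.5, 0.1]))
```

With distinct depths the expensive shader runs once per pixel regardless of draw order, at the cost of submitting the geometry twice.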
![Page 14: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/14.jpg)
Deferred Shading
• Similar to Early-Z Pass
  • 1st Pass: Visibility tests
  • 2nd Pass: Shading
• Different than Early-Z Pass
  • Geometry is only transformed once
![Page 15: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/15.jpg)
Deferred Shading
• 1st Pass
  • Render geometry into G-Buffers: Fragment Colors, Normals, Depth, Edge Weight
[Figure: the four G-Buffer images. Images from Tabula Rasa. See Resources.]
![Page 16: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/16.jpg)
Deferred Shading
• 2nd Pass
  • Shading == post processing effects
  • Render full screen quads that read from G-Buffers
  • Objects are no longer needed
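The two passes can be sketched on the CPU with a dictionary standing in for the G-Buffers. Everything here is illustrative: the function names, the fragment tuple layout, and the simple diffuse (N dot L) lighting are assumptions, and the edge weight buffer is omitted.

```python
def geometry_pass(fragments):
    """1st pass: keep the closest fragment's attributes per pixel.

    fragments: iterable of (pixel, depth, color, normal); smaller depth
    is assumed closer. Returns a dict acting as the G-Buffers.
    """
    gbuffer = {}
    for pixel, depth, color, normal in fragments:
        stored = gbuffer.get(pixel)
        if stored is None or depth < stored["depth"]:
            gbuffer[pixel] = {"depth": depth, "color": color, "normal": normal}
    return gbuffer


def shading_pass(gbuffer, light_dir):
    """2nd pass: shade each covered pixel once, reading only the G-Buffers."""
    image = {}
    for pixel, g in gbuffer.items():
        n_dot_l = max(0.0, sum(n * l for n, l in zip(g["normal"], light_dir)))
        image[pixel] = tuple(c * n_dot_l for c in g["color"])
    return image


if __name__ == "__main__":
    frags = [
        ((0, 0), 0.8, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
        ((0, 0), 0.3, (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
        ((1, 0), 0.5, (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)),
    ]
    print(shading_pass(geometry_pass(frags), (0.0, 0.0, 1.0)))
```

Note that `shading_pass` never touches the original fragments, which mirrors the slide's point that objects are no longer needed in the second pass.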
![Page 17: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/17.jpg)
Deferred Shading
Light Accumulation Result
Image from Tabula Rasa. See Resources.
![Page 18: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/18.jpg)
Deferred Shading
• Eliminates shading fragments that fail Z-Test
• Increases video memory requirement
• How does it affect bandwidth?
![Page 19: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/19.jpg)
Buffer Compression
• Reduce depth buffer bandwidth
• Generally does not reduce memory usage of actual depth buffer
• Same architecture applies to other buffers, e.g. color and stencil
![Page 20: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/20.jpg)
Buffer Compression
• Tile Table: Status for nxn tile of depths, e.g. n=8
  • [state, zmin, zmax]
  • state is either compressed, uncompressed, or cleared
[Figure: example tile of depth values ranging from 0.1 to 0.8, with Tile Table entry [uncompressed, 0.1, 0.8]]
![Page 21: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/21.jpg)
Buffer Compression
[Figure: data flow between the Rasterizer, the Tile Table, the Compress and Decompress units, and the Compressed Z-Buffer. Labels: updated z-values, updated z-max, nxn uncompressed z values, [zmin, zmax]]
![Page 22: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/22.jpg)
Buffer Compression
• Depth Buffer Write
  • Rasterizer modifies copy of uncompressed tile
  • Tile is losslessly compressed (if possible) and sent to actual depth buffer
  • Update Tile Table
    • zmin and zmax
    • status: compressed or uncompressed
![Page 23: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/23.jpg)
Buffer Compression
• Depth Buffer Read
  • Tile Status
    • Uncompressed: Send tile
    • Compressed: Decompress and send tile
    • Cleared: See Fast Clear
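The write and read paths can be sketched as below. The compression scheme is a toy stand-in chosen for brevity (it only succeeds when a tile holds at most two distinct depths) and is not how any real hardware compresses depth; all function names and the flat-list tile layout are hypothetical.

```python
def compress(tile):
    """Toy lossless scheme (assumption): succeeds only for <= 2 distinct depths."""
    values = sorted(set(tile))
    if len(values) > 2:
        return None  # not compressible, caller stores the raw tile
    return values, [values.index(z) for z in tile]


def decompress(packed):
    values, indices = packed
    return [values[i] for i in indices]


def write_tile(tile):
    """Depth buffer write: returns (tile_table_entry, data_sent_to_memory)."""
    packed = compress(tile)
    state = "compressed" if packed is not None else "uncompressed"
    entry = [state, min(tile), max(tile)]  # [state, zmin, zmax]
    return entry, (packed if packed is not None else list(tile))


def read_tile(entry, stored, clear_value=1.0, tile_size=4):
    """Depth buffer read: dispatch on the tile's state in the Tile Table."""
    state = entry[0]
    if state == "cleared":
        return [clear_value] * tile_size  # memory is not accessed
    if state == "compressed":
        return decompress(stored)
    return list(stored)


if __name__ == "__main__":
    entry, stored = write_tile([0.1, 0.5, 0.8, 0.5])
    print(entry, read_tile(entry, stored))
```

Because a tile may fail to compress, the buffer must still reserve space for the uncompressed case, which is consistent with the earlier point that compression saves bandwidth rather than memory.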
![Page 24: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/24.jpg)
Fast Clear
• Don’t touch depth buffer
• glClear sets state of each tile to cleared
• When the rasterizer reads a cleared tile
  • A tile filled with GL_DEPTH_CLEAR_VALUE is sent
  • Depth buffer is not accessed
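A small model makes the bandwidth saving visible by counting depth-memory accesses. This is a sketch under assumptions: the class `TiledDepthBuffer` and its `memory_accesses` counter are invented for illustration, and compression is left out so only the cleared state is exercised.

```python
class TiledDepthBuffer:
    """Toy tiled depth buffer that tracks accesses to depth memory."""

    def __init__(self, num_tiles, tile_size, clear_value=1.0):
        self.clear_value = clear_value
        self.tile_size = tile_size
        self.states = ["uncompressed"] * num_tiles
        self.memory = [[0.0] * tile_size for _ in range(num_tiles)]
        self.memory_accesses = 0

    def clear(self):
        # Fast clear: only the per-tile state changes, memory is untouched.
        self.states = ["cleared"] * len(self.states)

    def read_tile(self, index):
        if self.states[index] == "cleared":
            # Synthesize the tile from the clear value without a memory read.
            return [self.clear_value] * self.tile_size
        self.memory_accesses += 1
        return list(self.memory[index])

    def write_tile(self, index, values):
        self.memory_accesses += 1
        self.memory[index] = list(values)
        self.states[index] = "uncompressed"


if __name__ == "__main__":
    buf = TiledDepthBuffer(num_tiles=2, tile_size=4)
    buf.clear()
    print(buf.read_tile(0), buf.memory_accesses)
```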
![Page 25: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/25.jpg)
Fast Clear
• Use glClear
  • Not full screen quads
  • No "one frame positive, one frame negative" trick
• Clear stencil together with depth
![Page 26: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/26.jpg)
Z-Cull
• Cull blocks of fragments before shading
• Coarse-grained, as opposed to Early-Z
[Figure: Z-Cull placed before the Fragment Shader; a triangle is culled when ztrianglemin > tile’s zmax]
![Page 27: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/27.jpg)
Z-Cull
• Zmax-Culling
  • Rasterizer fetches zmax for each tile it processes
  • Compute ztrianglemin for a triangle
  • Culled if ztrianglemin > zmax
[Figure: Z-Cull placed before the Fragment Shader; a triangle is culled when ztrianglemin > tile’s zmax]
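The Zmax test reduces to one comparison per tile. In this sketch the function name `zmax_cull` is hypothetical, smaller depth is assumed closer, and the minimum vertex depth is used as a conservative ztrianglemin for the whole triangle (real hardware may compute a tighter per-tile bound).

```python
def zmax_cull(triangle_vertex_depths, tile_zmax):
    """Return True if the triangle can be culled for this tile.

    Assumes a "closer wins" depth test with smaller depth closer. The
    minimum vertex depth is a conservative lower bound on the triangle's
    depth anywhere inside the tile.
    """
    ztriangle_min = min(triangle_vertex_depths)
    return ztriangle_min > tile_zmax


if __name__ == "__main__":
    print(zmax_cull([0.7, 0.8, 0.9], tile_zmax=0.6))  # entirely behind the tile
    print(zmax_cull([0.5, 0.8, 0.9], tile_zmax=0.6))  # may be visible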
![Page 28: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/28.jpg)
Z-Cull
• Zmin-Culling
  • Support different depth tests
  • Avoid depth buffer reads
  • If triangle is in front of tile, depth tests for each pixel are unnecessary
[Figure: Z-Cull placed before the Fragment Shader; per-pixel tests are skipped when ztrianglemax < tile’s zmin]
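Combining both bounds gives a three-way classification per tile. This is a sketch with hypothetical names (`classify_triangle` and its return strings), assuming smaller depth is closer and using vertex depths as conservative bounds for the triangle.

```python
def classify_triangle(triangle_vertex_depths, tile_zmin, tile_zmax):
    """Classify a triangle against one tile's [zmin, zmax] range.

    Returns:
      "culled"   - entirely behind the tile, no fragments are shaded
      "accepted" - entirely in front, per-pixel depth reads can be skipped
      "test"     - overlaps the tile's range, per-pixel Z-test is required
    Assumes smaller depth is closer.
    """
    ztriangle_min = min(triangle_vertex_depths)
    ztriangle_max = max(triangle_vertex_depths)
    if ztriangle_min > tile_zmax:
        return "culled"
    if ztriangle_max < tile_zmin:
        return "accepted"
    return "test"


if __name__ == "__main__":
    print(classify_triangle([0.1, 0.2, 0.3], tile_zmin=0.4, tile_zmax=0.8))
```

The overlapping case explains the later remark that Z-Cull is less efficient when depth varies a lot within a few pixels: a wide [zmin, zmax] range sends most triangles to the per-pixel path.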
![Page 29: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/29.jpg)
Z-Cull
• Automatically enabled on GeForce (6?) cards unless
  • glClear isn’t used
  • Fragment shader writes depth (or discards?)
  • Direction of depth test is changed
ATI recommends avoiding = and != depth compares and stencil fail and stencil depth fail operations
Less efficient when depth varies a lot within a few pixels
See NVIDIA GPU Programming Guide for exact details
![Page 30: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/30.jpg)
Programmable Culling Unit
Cull before fragment shader even if the shader writes depth or discards
Run part of shader over an entire tile to determine lower bound z value
Hasselgren and Akenine-Möller, “PCU: The Programmable Culling Unit,” 2007
![Page 31: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/31.jpg)
Summary
What was once “ridiculously expensive” is now the primary visible surface algorithm for rasterization
![Page 32: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/32.jpg)
Resources
www.realtimerendering.com
Sections 7.9.2 and 18.3
![Page 33: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/33.jpg)
Resources
developer.nvidia.com/object/gpu_programming_guide.html
GeForce 8 Guide: sections 3.4.9, 3.6, and 4.8
GeForce 7 Guide: section 3.6
![Page 34: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/34.jpg)
Resources
http://www.graphicshardware.org/previous/www_2000/presentations/ATIHot3D.pdf
ATI Radeon HyperZ Technology
Steve Morein
![Page 35: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/35.jpg)
Resources
http://ati.amd.com/developer/dx9/ATI-DX9_Optimization.pdf
Performance Optimization Techniques for ATI Graphics Hardware with DirectX® 9.0
Guennadi Riguer
Sections 6.5 and 8
![Page 36: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/36.jpg)
Resources
developer.nvidia.com/object/gpu_gems_home.html
Chapter 28: Graphics Pipeline Performance
![Page 37: Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d605503460f94a4127f/html5/thumbnails/37.jpg)
Resources
developer.nvidia.com/object/gpu-gems-3.html
Chapter 19: Deferred Shading in Tabula Rasa