PARTIALLY RESIDENT TEXTURES ON NEXT-GENERATION GPUS
Bill Bilodeau, Graham Sellers, Karl Hillesland - AMD
2| Partially Resident Textures on Next-Generation AMD GPUs | March 8, 2012
AGENDA FOR TODAY’S TALK
Part 1 – Introduction to the HD7970 and Partially Resident Textures, Bill Bilodeau
Part 2 – Implementation in OpenGL, Graham Sellers
Part 3 – Ptex, an example PRT application, Karl Hillesland
PART 1: INTRODUCTION TO THE RADEON HD7970 AND PARTIALLY RESIDENT TEXTURES
WHAT ARE PARTIALLY RESIDENT TEXTURES?
Partially Resident Textures (PRTs) are textures that have only portions of their data stored in GPU video memory
The best-known example of virtual texturing (a software implementation) is John Carmack’s “MegaTextures”
Image from id Software’s Rage
RADEON HD7970 OVERVIEW
World’s first GPU to have dedicated hardware for Partially Resident Textures
Completely new shader architecture
Improved cache and memory bandwidth
World’s first Direct3D® 11.1 GPU
PREVIOUS SHADER ARCHITECTURE
Previous AMD GPUs used a VLIW (Very Long Instruction Word) architecture
– Combines independent instructions into a 4-wide VLIW instruction (lanes X, Y, Z, W) that gets executed on a SIMD
[Figure: the same VLIW instruction stream runs for Thread 0, Thread 1, Thread 2, ... Thread 63]
– Independent shader instructions a = b + c; b = c + d; c = d + e; d = e + f; pack into one VLIW instruction: b + c | c + d | d + e | e + f
– Dependent shader instructions a = b + c; b = a + c; c = b + a; d = c + d; need four VLIW instructions, each with one busy lane and three idle lanes: b + c | idle | idle | idle, then a + c, b + a and c + d in the same pattern
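The idle-lane problem can be made concrete with a small model. The C sketch below is not AMD's shader compiler: it greedily packs a straight-line list of dst = src0 + src1 ops into 4-wide bundles under a simplified dependency rule (an assumption for illustration) and reports slot utilization for the two instruction sequences on the slide.

```c
#include <stddef.h>

/* A scalar op dst = src0 + src1; registers are named by single letters. */
typedef struct {
    char dst, src0, src1;
} Op;

enum { VLIW_WIDTH = 4 };

/* Greedy in-order packing into 4-wide VLIW bundles. A new bundle starts when
   the current one is full, or when the op reads or writes a register that an
   earlier op in the same bundle writes (RAW or WAW). Overwriting a register
   that an earlier op in the bundle only reads is allowed, because every slot
   reads its operands before any slot writes. Returns the bundle count. */
size_t pack_vliw(const Op *ops, size_t n) {
    size_t bundles = 0, used = 0;
    char written[VLIW_WIDTH];
    for (size_t i = 0; i < n; ++i) {
        int conflict = (used == VLIW_WIDTH);
        for (size_t j = 0; j < used && !conflict; ++j)
            conflict = ops[i].src0 == written[j] || ops[i].src1 == written[j] ||
                       ops[i].dst == written[j];
        if (bundles == 0 || conflict) {   /* open a new bundle */
            ++bundles;
            used = 0;
        }
        written[used++] = ops[i].dst;
    }
    return bundles;
}

/* Fraction of VLIW slots doing useful work. */
double slot_utilization(size_t n_ops, size_t bundles) {
    return bundles ? (double)n_ops / (double)(bundles * VLIW_WIDTH) : 0.0;
}
```

With these rules the independent sequence fills one bundle (100% of slots) while the dependent one needs four bundles (25% of slots), and that cost is paid for every one of the 64 threads.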
NEW SHADER ARCHITECTURE
64-wide SIMD architecture without VLIW instructions
– No need to combine instructions, since multiple threads can run in parallel
[Figure: the dependent shader instructions a = b + c; b = a + c; c = b + a; d = c + d; issue one at a time (b + c, then a + c, b + a, c + d), and each instruction executes on every ALU lane S0, S1, S2, ..., S63, one thread per lane]
No idle ALUs!
COMPUTE UNITS ARE THE NEW BASIC BUILDING BLOCK FOR SHADERS
Each Compute Unit (CU) consists of 4 SIMDs and one scalar unit
Higher execution efficiency
Simplified logic design
Simplified assembly language
HD7970 has 32 Compute Units – 4 SIMDs per CU
[Figure: Compute Unit block diagram – SIMD0 to SIMD3; SQ instruction buffers/arbiters; scalar ALU; LDS with 32 banks of 512x32 bits (64 KB total); texture unit (address and data sections); 16 KB read/write L1 cache]
ADDITIONAL FEATURES OF THE HD7970
Improved tessellation performance
Improved geometry shader performance
Fast depth accept for fully visible triangles; depth bounds testing support
384-bit memory bus
DX11.1
And of course, Partially Resident Texture support!
INTRODUCTION TO PARTIALLY RESIDENT TEXTURES
Enables an application to manage more texture data than can physically fit in a fixed memory footprint
– A.k.a. virtual texturing or sparse texturing
The principle behind PRT is that not all of a texture’s contents are likely to be needed at any given time
– The current render view may only require selected portions of the texture to be resident in memory
– Or selected MIPMap levels
PRT textures only have a portion of their data mapped into GPU-accessible memory at a given time
PRT TILES
The PRT texture is chunked into 64 KB tiles
– Fixed memory size
– Not dependent on texture type or format
[Figure: left, the chunked texture; right, the texture tiles needing to be resident in GPU memory. Highlighted areas represent texture data that needs the highest resolution.]
Images from “Sparse Virtual Textures”, Sean Barrett, GDC 2008
TRANSLATION TABLE
The GPU virtual memory page table translates tiles into a resident texture tile pool
[Figure: the texture map (linear storage) is translated through the page table into the texture tile pool in video memory. A mapped page entry points to a 64 KB tile in the pool; an unmapped page entry has no tile behind it.]
Images from “Sparse Virtual Textures”, Sean Barrett, GDC 2008
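The translation step can be modeled in a few lines. A minimal C sketch, assuming a hypothetical one-level page table per MIP level and a row-major texel offset inside each tile (real hardware uses its own tile layout and the GPU virtual memory page table); an unmapped entry is what leads to a "failed" fetch.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRT_UNMAPPED UINT32_MAX  /* page entry with no backing tile */

/* Hypothetical page table for one 2D level: one entry per tile, holding the
   index of the 64 KB slot in the tile pool, or PRT_UNMAPPED. */
typedef struct {
    uint32_t tiles_x, tiles_y;   /* level size in tiles */
    uint32_t tile_w, tile_h;     /* tile size in texels */
    const uint32_t *entries;     /* tiles_x * tiles_y entries, row-major */
} PageTable;

/* Translates a texel coordinate to (pool slot, texel offset inside the tile).
   Returns false, like a failed fetch, when the tile is not mapped. */
bool translate(const PageTable *pt, uint32_t x, uint32_t y,
               uint32_t *slot, uint32_t *offset_in_tile) {
    uint32_t tx = x / pt->tile_w, ty = y / pt->tile_h;
    if (tx >= pt->tiles_x || ty >= pt->tiles_y) return false;  /* outside the level */
    uint32_t entry = pt->entries[ty * pt->tiles_x + tx];
    if (entry == PRT_UNMAPPED) return false;
    *slot = entry;
    *offset_in_tile = (y % pt->tile_h) * pt->tile_w + (x % pt->tile_w);
    return true;
}
```

Because every tile has the same byte size, any pool slot can back any tile, which is what lets the pool stay a simple linear array.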
TRANSLATION TABLE - MIPMAPS
MIPMaps can be included in the texture tile pool
[Figure: texture map with its MIP levels, page table, and texture tile pool (video memory); mapped page entries point to 64 KB tiles, unmapped page entries do not]
Images from “Sparse Virtual Textures”, Sean Barrett, GDC 2008
“FAILED” TEXEL FETCH CONDITION
How does the application know which texture tiles to upload?
Answer: PRT-specific texture fetch instructions in a shader
– Return a “Failed” texel fetch condition when sampling a PRT pixel whose tile is currently not in the pool
This information is then stored in a render target or UAV
– Texel fetch failed for a given (x,y) tile location
...and then copied to the CPU so that the application can upload the required tiles
The application chooses what to render until the missing data gets uploaded
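On the CPU, the feedback surface usually holds many duplicate requests, since every pixel that touched a missing tile reports it. A hedged C sketch of collapsing it into a unique request list; the encoding (one packed level/tile id per pixel, 0 meaning "no miss") and the function names are hypothetical, not something the slides specify.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical feedback encoding: 0 = no miss, otherwise
   ((level << 24) | (tile_y << 12) | tile_x) + 1. */
uint32_t encode_request(uint32_t level, uint32_t tile_x, uint32_t tile_y) {
    return ((level << 24) | (tile_y << 12) | tile_x) + 1;
}

static int cmp_u32(const void *a, const void *b) {
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Copies the non-zero feedback values into out (capacity >= n), sorts them
   and removes duplicates. Returns the number of unique tile requests. */
size_t collect_requests(const uint32_t *feedback, size_t n, uint32_t *out) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (feedback[i] != 0) out[count++] = feedback[i];
    qsort(out, count, sizeof *out, cmp_u32);
    size_t unique = 0;
    for (size_t i = 0; i < count; ++i)
        if (unique == 0 || out[i] != out[unique - 1]) out[unique++] = out[i];
    return unique;
}
```

Sorting by the packed id also groups requests by MIP level, which is convenient if coarser levels are uploaded first.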
“LOD WARNING” TEXEL FETCH CONDITION
The PRT fetch condition code can also indicate an “LOD Warning”
The minimum LOD warning is specified by the application on a per-texture basis
If a fetched pixel’s LOD is below the specified LOD warning value (i.e. a higher-resolution MIP level), the condition code is returned
This functionality is typically used to try to predict when higher-resolution MIP levels are going to be needed
– E.g. the camera getting closer to PRT-mapped geometry
EXAMPLE USAGE
1) App allocates a PRT (e.g. 16k x 16k DXT1) using the PRT API
2) App uploads MIP levels using API calls
3) Shader fetches PRT data at the specified texcoords. Two possibilities:
3a) Texel data belongs to a resident (64 KB) tile
- Valid color returned, no error code
3b) Texel data points to a non-resident tile, or the fetch reaches the specified warning LOD
- Error/LOD warning code returned
- Shader writes the tile location and error code to a render target (RT) or UAV
4) App reads the RT or UAV and uploads/releases tiles as needed
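Step 4 needs a policy for which tiles to release when physical memory runs out. A minimal C sketch of a fixed-size tile pool with least-recently-used eviction; the pool size, tile ids and "refresh on request" rule are hypothetical, and the actual upload and release would go through the PRT API.

```c
#include <stddef.h>
#include <stdint.h>

enum { POOL_SLOTS = 4 };  /* hypothetical number of physical tiles */

typedef struct {
    uint32_t tile[POOL_SLOTS];       /* tile id held by each slot */
    uint64_t last_used[POOL_SLOTS];  /* frame of the last request */
    size_t used;                     /* slots currently occupied */
} TilePool;

/* Makes tile_id resident for this frame and returns its slot. *evicted is
   set to the tile that had to be released, or UINT32_MAX if none. */
size_t pool_request(TilePool *p, uint32_t tile_id, uint64_t frame, uint32_t *evicted) {
    *evicted = UINT32_MAX;
    for (size_t i = 0; i < p->used; ++i) {
        if (p->tile[i] == tile_id) {  /* already resident: refresh only */
            p->last_used[i] = frame;
            return i;
        }
    }
    size_t slot;
    if (p->used < POOL_SLOTS) {
        slot = p->used++;             /* free slot: upload only */
    } else {
        slot = 0;                     /* pool full: evict least recently used */
        for (size_t i = 1; i < POOL_SLOTS; ++i)
            if (p->last_used[i] < p->last_used[slot]) slot = i;
        *evicted = p->tile[slot];
    }
    p->tile[slot] = tile_id;
    p->last_used[slot] = frame;
    return slot;
}
```

A linear scan is fine for a handful of slots; a real pool with thousands of tiles would index resident tiles with a hash map and keep an ordered LRU list.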
PRT ADVANTAGES VS SOFTWARE IMPLEMENTATION
Ease of implementation
– Eliminates the complexity and limitations of software solutions
Full filtering support
– Includes anisotropic filtering
Full-speed filtering
– A software solution requires “manual” filtering in the pixel shader
– This can be quite costly if anisotropic filtering is used
Don’t go overboard with PRT allocation!
– Page table entry size is 4 DWORDs
– Page table entries have to be resident in video memory
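The allocation warning follows from simple arithmetic: a 4-DWORD (16-byte) entry per 64 KB page means the resident page table costs 1/4096 of the virtual size. A C sketch, assuming one entry per 64 KB page of virtual texture storage.

```c
#include <stdint.h>

enum { PAGE_BYTES = 64 * 1024, PTE_BYTES = 4 * 4 /* 4 DWORDs */ };

/* Resident page-table bytes needed to map virtual_bytes of PRT storage. */
uint64_t page_table_bytes(uint64_t virtual_bytes) {
    uint64_t pages = (virtual_bytes + PAGE_BYTES - 1) / PAGE_BYTES;
    return pages * PTE_BYTES;
}
```

For example, a 1 GiB allocation costs 256 KB of page table, and the 128 MB base level of a 16k x 16k DXT1 texture costs 32 KB, whether or not any of its tiles are resident.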
PART 2: IMPLEMENTATION IN OpenGL – AMD_sparse_texture Extension
OPENGL EXTENSION | AMD_sparse_texture
Partially Resident Textures are exposed in OpenGL via an extension
Two design goals for the extension
– Minimally invasive to the API
Easy to retrofit into existing applications
Plays well with non-sparse textures
– Easy fallback path
Most of the same code will work in the absence of the extension
Two parts to the extension
– Update to the API – 1 function, a handful of tokens
– Update to the shading language
UPLOAD TEXTURES | Example Using Existing OpenGL API
Use of immutable texture storage
This is the existing OpenGL immutable storage API – declare storage, specify image data
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
// Declare immutable storage: 10 levels of 1024x1024 RGBA8
glTexStorage2D(GL_TEXTURE_2D, 10, GL_RGBA8, 1024, 1024);
// Specify image data for level 0
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
UPLOAD TEXTURES | Example Using New OpenGL Extension
Use of sparse texture storage
glTexStorageSparseAMD is the one new function in the extension
– Notice very little difference from the previous API
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
// Arguments: target, format, width, height, depth, layers, flags
glTexStorageSparseAMD(GL_TEXTURE_2D, GL_RGBA, 1024, 1024, 1, 1, GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
MAKE PAGES RESIDENT | Reuse Existing API
Previous example used glTexSubImage2D
– Upload sub-region of the texture
– Physical pages allocated on demand by the OpenGL driver
– Unused pages remain free
Here, physical storage is allocated only for two 256x256 regions
glTexStorageSparseAMD(GL_TEXTURE_2D, GL_RGBA, 1024, 1024, 1, 1, GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);
// Commit pages for the 256x256 region at (0, 0)...
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, data1);
// ...and for the one at (768, 768); pages in between stay unallocated
glTexSubImage2D(GL_TEXTURE_2D, 0, 768, 768, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, data2);
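How many pages each upload commits depends on the page size for the format. A C sketch that counts the pages a glTexSubImage2D rectangle overlaps, assuming a 128 x 128 texel page (what a 64 KB page holds at 4 bytes per texel); the real value should be queried with GL_VIRTUAL_PAGE_SIZE_{X,Y}_AMD, and the function name is hypothetical.

```c
#include <stdint.h>

/* Number of pages overlapped by the rectangle [x, x+w) x [y, y+h). */
uint32_t pages_touched(uint32_t x, uint32_t y, uint32_t w, uint32_t h,
                       uint32_t page_w, uint32_t page_h) {
    if (w == 0 || h == 0) return 0;
    uint32_t first_px = x / page_w, last_px = (x + w - 1) / page_w;
    uint32_t first_py = y / page_h, last_py = (y + h - 1) / page_h;
    return (last_px - first_px + 1) * (last_py - first_py + 1);
}
```

Under that assumption the two 256x256 regions commit 4 + 4 = 8 pages, versus 64 pages for a fully resident 1024x1024 base level. Rectangles that are not page-aligned still touch whole pages.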
FREE PHYSICAL PAGES | Again, Reuse Existing API
Passing NULL to glTexSubImage2D makes pages non-resident
– Driver returns physical pages to the pool
// Release the pages backing the 256x256 region at (0, 0)
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
PAGE SIZES | Determining Page Sizes
Sparse textures rely on the VM subsystem
– Pages are 64 KB in size on Southern Islands
Note that size is measured in bytes, not texels
– The size of a page in texels depends on the texture format
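A minimal C sketch of how a fixed 64 KB page translates into texels for different texel sizes. It assumes the page is a power-of-two rectangle whose width equals or is twice its height; that shape rule is an illustrative assumption (it matches common tiled-resource layouts), so real code should query GL_VIRTUAL_PAGE_SIZE_{X,Y,Z}_AMD instead.

```c
#include <assert.h>
#include <stdint.h>

/* Texel extent of a 64 KB page for a format with bytes_per_texel bytes per
   texel (a power of two). Assumed shape: width == height or 2 * height. */
void page_extent(uint32_t bytes_per_texel, uint32_t *w, uint32_t *h) {
    assert(bytes_per_texel != 0 && 65536u % bytes_per_texel == 0);
    uint32_t texels = 65536u / bytes_per_texel;
    uint32_t log2_texels = 0;
    while ((1u << log2_texels) < texels) ++log2_texels;
    *w = 1u << ((log2_texels + 1) / 2);  /* 2^ceil(log2 / 2) */
    *h = 1u << (log2_texels / 2);        /* 2^floor(log2 / 2) */
}
```

Under this assumption an RGBA8 page covers 128 x 128 texels, an RGBA16F page 128 x 64, and an RGBA32F page only 64 x 64.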
PAGE SIZE | Retrieving Page Size from OpenGL
Reuse existing API: glGetInternalformativ
– New OpenGL tokens – GL_VIRTUAL_PAGE_SIZE_{X,Y,Z}_AMD
Given a target (texture dimensionality) and format, returns the page size
– It is not necessary to create a texture to get this information
GLint page_size_x;
// bufSize is a count of GLint values, not a size in bytes
glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8, GL_VIRTUAL_PAGE_SIZE_X_AMD, 1, &page_size_x);
MIPMAPS | Dealing With Small Textures
The highest-resolution LOD requires multiple pages
Each LOD requires fewer and fewer pages
Eventually, one LOD does not fill a page
– Now what?
At some point, we must make all LODs resident
– But which LOD?
Use glGetInternalformativ to retrieve the lowest sparse level for a given target/format
– All levels below this reside in the same page and share residency
GLint min_sparse_level;
glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA16F, GL_MIN_SPARSE_LEVEL_AMD, 1, &min_sparse_level);
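To see roughly where the shared page starts, the C sketch below walks a 2D mip chain and counts pages per level until a level no longer fills a whole page in both dimensions. It is only an estimate under stated assumptions (fixed page extent, base dimensions that are multiples of it, one shared page for all remaining levels); the driver's actual answer is what GL_MIN_SPARSE_LEVEL_AMD returns.

```c
#include <assert.h>
#include <stdint.h>

/* Counts pages for a 2D mip chain. Levels at least one page wide and tall
   are treated as sparse; the first smaller level starts the tail, which is
   assumed to share a single page. Returns the estimated first tail level
   and stores the total page count in *total_pages. */
uint32_t estimate_mip_pages(uint32_t width, uint32_t height,
                            uint32_t page_w, uint32_t page_h,
                            uint32_t *total_pages) {
    assert(page_w > 1 && page_h > 1);
    uint32_t level = 0;
    *total_pages = 0;
    while (width >= page_w && height >= page_h) {
        *total_pages += (width / page_w) * (height / page_h);
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
        ++level;
    }
    *total_pages += 1;  /* all remaining levels share one page */
    return level;
}
```

For a 1024x1024 texture with 128 x 128 pages this gives 64 + 16 + 4 + 1 pages for levels 0 to 3, plus one shared page starting at level 4, i.e. 86 pages in total.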
LOD WARNING | Low Water Mark
To assist in streaming, we include a per-texture low water mark
– Set this to the highest-resolution LOD that’s fully resident
– Once you hit this, you’ll get a signal in the shader
Returned data is still valid
Signal says it’s time to start streaming the next mip
Exposed using the glTexParameter API
– Here, an LOD warning will be returned to the shader if the hardware attempts to access LOD 4 or lower
glTexParameteri(GL_TEXTURE_2D, GL_MIN_WARNING_LOD_AMD, 4);
More on residency returns later...
RENDERING TO PRT | Attach PRT to FBO
It is possible to render to a PRT using an FBO
Writes to unmapped regions are simply dropped
GLuint prt, fbo;
glGenTextures(1, &prt);
glBindTexture(GL_TEXTURE_2D, prt);
glTexStorageSparseAMD(GL_TEXTURE_2D, GL_RGBA, 1024, 1024, 1, 1, GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
// Attach level 0 of the PRT as the color target
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, prt, 0);
READING FROM PRT | Retrieving Data from PRTs
Applications can read PRTs to CPU memory using existing APIs
– Call glGetTexImage to read the entire content back
– Bind to an FBO and use glReadPixels or glBlitFramebuffer
glReadPixels reads into system memory; glBlitFramebuffer copies into another FBO
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, data);
// Attach the PRT to the bound framebuffer, then read it back
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, prt, 0);
glReadPixels(0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
// Blitting reads from fbo and draws into a second framebuffer (dst_fbo, hypothetical)
glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, dst_fbo);
glBlitFramebuffer(0, 0, 1024, 1024, 0, 0, 128, 128, GL_COLOR_BUFFER_BIT, GL_LINEAR);
RESTRICTIONS | Mostly Everything Works
There are some restrictions on the use of sparse textures
– Dimensions of the base level must be integer multiples of the page size (GL_VIRTUAL_PAGE_SIZE_{X,Y,Z}_AMD)
This means... no sparse textures smaller than one page
– No buffer textures or “TBOs” – another extension is coming for that!
– No depth or stencil textures, nor MSAA textures
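An application with a non-sparse fallback path would typically check the base-level restriction before choosing glTexStorageSparseAMD. A C sketch with the page sizes passed in as glGetInternalformativ would return them for GL_VIRTUAL_PAGE_SIZE_{X,Y,Z}_AMD; the function name is hypothetical, and the Z page size for a 2D texture is assumed to be 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if every base-level dimension is a non-zero integer multiple of the
   corresponding virtual page size. */
bool sparse_dims_valid(uint32_t w, uint32_t h, uint32_t d,
                       uint32_t page_w, uint32_t page_h, uint32_t page_d) {
    if (page_w == 0 || page_h == 0 || page_d == 0) return false;
    return w > 0 && h > 0 && d > 0 &&
           w % page_w == 0 && h % page_h == 0 && d % page_d == 0;
}
```

When the check fails, the same code can allocate with glTexStorage2D instead, which is the fallback path the extension was designed to keep simple.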
MANAGING FAILURE | Memory is not Unlimited
Virtual address space is extremely large – tens to hundreds of gigabytes
– You will run out eventually, but it’ll take a while
Physical memory is still limited
– glTexSubImage2D, etc., may fail
– Draw calls may fail
Feel free to create a 4k x 4k x 4k volume texture
– Don’t try to make it all resident at the same time!
There are no sparse read-backs
– glGetTexImage could read gigabytes of data back
– This will fail
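The arithmetic shows why the 4k x 4k x 4k volume is fine to allocate but not to make resident or read back. A C sketch assuming an RGBA8 format (4 bytes per texel; the slide does not name a format) and 64 KB pages.

```c
#include <stdint.h>

/* Bytes of a fully resident base level of a w x h x d texture. */
uint64_t level_bytes(uint64_t w, uint64_t h, uint64_t d, uint64_t bytes_per_texel) {
    return w * h * d * bytes_per_texel;
}

/* 64 KB pages needed to make those bytes resident. */
uint64_t pages_needed(uint64_t bytes) {
    return (bytes + 65535u) / 65536u;
}
```

Under that assumption the base level alone is 256 GiB, or 2^22 pages (64 MiB of page table at 4 DWORDs per entry), which fits the virtual address space but far exceeds physical video memory, and a full glGetTexImage would have to move all 256 GiB.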
SPARSE TEXTURES IN SHADERS | Extending GLSL
First and most important:
IT IS NOT NECESSARY TO MAKE SHADER CHANGES TO USE SPARSE TEXTURES
SPARSE TEXTURES IN SHADERS | Extending GLSL
The basic type for textures in GLSL is the ‘sampler’
– Several types of samplers exist... sampler2D, sampler3D, samplerCube, sampler2DArray, etc.
– We didn’t add any new sampler types
PRTs look like regular textures in the shader
Textures are read using the ‘texture’ built-in function, its overloads and variants
– We didn’t add any overloads
gvec4 texture(gsampler1D sampler, float P [, float bias]);
gvec4 texture(gsampler2D sampler, vec2 P [, float bias]);
gvec4 texture(gsampler2DArray sampler, vec3 P [, float bias]);
gvec4 textureLod(gsampler2D sampler, vec2 P, float lod);
gvec4 textureProj(gsampler2D sampler, vec4 P [, float bias]);
gvec4 textureOffset(gsampler2D sampler, vec2 P, ivec2 offset [, float bias]);
// ... etc.
EXTENDING GLSL | New Built-in Functions
Adding more overloads to existing functions was difficult
– Need to return a status code and a texel
– Need user-specified defaults with conditional-move-like functionality
– Optional parameters in existing overloads made this very difficult
Added new built-in functions
– New built-in functions return a status code
– New built-in functions return texel data via inout parameters
– Most existing texture functions have a sparseTexture equivalent
Non-PRTs work with the new functions
– They appear as a fully resident PRT
int sparseTexture(gsampler2D sampler, vec2 P, inout gvec4 texel [, float bias]);
int sparseTextureLod(gsampler2D sampler, vec2 P, float lod, inout gvec4 texel);
// ... etc.
EXTENDING GLSL | sparseTexture Functions
All sparseTexture functions return two pieces of data:
– Texel data via inout parameter
– Residency status code
Texel data returned in inout parameter
– If the texel fetch fails, the old data remains in the variable
– Think of it as a CMOV-type operation
Return code is hardware-dependent bit-field information
– More built-in functions for decoding status codes
– This allows us to extend this further in the future, or to change the implementation
int sparseTexture(gsampler2D sampler, vec2 P, inout gvec4 texel [, float bias]);
sparseTexture FUNCTIONS | Texture Data Return
Texel data is returned in an inout parameter
– No direct support for ‘default value’ behavior
– This is emulated in the shader:
vec4 texel = vec4(1.0, 0.0, 0.7, 1.0); // Default value
sparseTexture(s, texCoord, texel);
// On success, texel contains texture data. On failure, it has the shader-supplied
// default value in it (pinkish magenta here).
Note that regular texture fetch functions work on PRTs too:
vec4 texel = texture(s, texCoord);
– Value of texel is undefined if you miss ...
... but feel free to use on known-resident data (atlases, explicit LoD, etc.)
sparseTexture FUNCTIONS | Residency Data Return
Residency data is bit-packed into the return value from the fetch
vec4 texel = vec4(1.0, 0.0, 0.7, 1.0); // Default value
int code;
code = sparseTexture(s, texCoord, texel);
After this, the code can be interpreted by three additional functions:
bool sparseTexelResident(int code);
bool sparseTexelMinLodWarning(int code);
int sparseTexelLodWarningFetch(int code);
RESIDENCY DATA | sparseTexelResident
bool sparseTexelResident(int code);
sparseTexelResident simply indicates whether the data fetched is valid
Returns true if the data is valid, false otherwise
A texel miss is generated if any required sample is not resident, including:
– Texels required for bilinear or trilinear sampling
– Missing mip maps
– Anisotropic filter taps
It is up to the shader to ‘do the right thing’
– Fall back to lower mips
– Write out to an image or framebuffer attachment
– etc., etc.
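"Fall back to lower mips" usually means walking toward coarser levels until one is resident. The C sketch below models that choice on the CPU with a hypothetical per-level residency array (real residency is per page, not per level); in a shader the same idea would retry the fetch at increasing LOD with sparseTextureLod until sparseTexelResident returns true.

```c
#include <stdbool.h>

/* Returns the first resident level at or coarser than wanted_level,
   or -1 if nothing in the chain is resident. */
int fallback_level(const bool *resident, int level_count, int wanted_level) {
    for (int level = wanted_level < 0 ? 0 : wanted_level; level < level_count; ++level)
        if (resident[level]) return level;
    return -1;
}
```

Keeping the coarsest levels (the shared tail page) always resident guarantees the search terminates with valid data.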
RESIDENCY DATA | sparseTexelMinLodWarning
bool sparseTexelMinLodWarning(int code);
sparseTexelMinLodWarning returns true if a min LOD warning was generated
– This occurs when generating the returned texel required fetching from an LOD lower than the low-water mark specified by the application
– This can be a signal to the application to start streaming more mip levels
RESIDENCY DATA | sparseTexelLodWarningFetch
int sparseTexelLodWarningFetch(int code);
Returns the LOD that caused the low-water-mark warning to be generated
– This also causes sparseTexelMinLodWarning to return true
– sparseTexelLodWarningFetch returns 0 if the warning was not hit
EXAMPLE USE CASES | What Can I Use This For?
Drop-in replacement for traditional 2D Sparse Virtual Texture (SVT)
– Well, almost – maximum texture size hasn’t increased
Very large texture arrays
– Sparsely populate the array
– Can almost eliminate texture binds in some applications
Volume textures + ray marching
– Sparse or homogeneous media
– Default value is the maximum step distance for ray marching distance fields
Arrays of variable-sized textures
– Make a large array, but populate different mip levels in each slice
– Store LOD bias per array slice in an auxiliary array (UBO, for example)
Etc., etc., etc.
PART 3: PRT PTEX – Ptex Using Sparse Textures
PTEX | Introduction
Ptex: Per-face Texture Mapping for Production Rendering
[Burley and Lacewell, 2008]
No UV setup (it’s implicit)
No seams
Per-patch resolution control
Out-of-core performance advantages
PTEX | Introduction
Ptex: Per-face Texture Mapping for Production Rendering
[Burley and Lacewell, 2008]
Per-face textures + MIPs
Adjacency for filtering
BORDERS FOR FILTERING
[Figure: Face Texture A and Face Texture B]
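MANUAL TRILINEAR FILTERING
[Figure: the resolution is looked up from the screen-space derivatives (ddx, ddy); the floor and floor+1 resolutions are sampled and blended with a lerp weighted by frac]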
PRT PTEX
Packed in one texture array
– Slice per resolution
– Resolution includes MIPs
– Cannot fit in standard MIP chain
– Easy lookups
– Easy resolution management
– Still one texture
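One possible reading of "slice per resolution" with "easy lookups": every face, or MIP of a face, at resolution 2^r x 2^r lives in array slice r, packed into a grid of equal cells, so locating it is pure arithmetic. This is a hypothetical C sketch assuming square power-of-two faces and a fixed slice size; the demo's actual packing is not described beyond the bullets above.

```c
#include <stdint.h>

enum { SLICE_SIZE = 4096 };  /* hypothetical width and height of every slice */

typedef struct {
    uint32_t slice;  /* array layer = log2 of the face resolution */
    uint32_t x, y;   /* texel origin of the face's cell inside the slice */
} FaceLocation;

/* Locates entry `index` among the faces stored at resolution 2^log2_res.
   Cells are filled row by row; the caller keeps index below
   (SLICE_SIZE >> log2_res)^2. */
FaceLocation locate_face(uint32_t index, uint32_t log2_res) {
    uint32_t cells_per_row = SLICE_SIZE >> log2_res;
    uint32_t res = 1u << log2_res;
    FaceLocation loc = { log2_res, (index % cells_per_row) * res,
                         (index / cells_per_row) * res };
    return loc;
}
```

With PRTs, cells for faces that are not currently needed simply stay non-resident, which is what makes a large, mostly empty array affordable.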
PRT PTEX PRAGMATICS
Better organization possibilities
– Pack pages
– Scaled squares
Other Methods
– Packed Ptex – all in one texture slice
– Face per slice, array per resolution
MULTIRES SLICES
MIP FALLBACK
[Figure: resolution lookup from (ddx, ddy); the floor and floor+1 resolutions blended with a lerp weighted by frac]
Demo
Trademark Attribution
AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.
©2012 Advanced Micro Devices, Inc. All rights reserved.