GPU Memory Model Overview - Penn Engineering - Welcome to the
Transcript of GPU Memory Model Overview - Penn Engineering - Welcome to the
![Page 1: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/1.jpg)
Aaron LefohnAaron LefohnUniversity of California, DavisUniversity of California, DavisWith updates from slides by With updates from slides by Suresh Venkatasubramanian,Suresh Venkatasubramanian,University of PennsylvaniaUniversity of Pennsylvania
Updates performed by Updates performed by Joseph Kider,Joseph Kider,University of PennsylvaniaUniversity of Pennsylvania
GPU Memory Model OverviewGPU Memory Model Overview
Note: These slides do not include the NVIDIA 8-series memory model
![Page 2: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/2.jpg)
ReviewReview
VertexIndex
Stream
3D APICommands
AssembledPrimitives
PixelUpdates
PixelLocationStream
ProgrammableFragmentProcessor
ProgrammableFragmentProcessor
Tra
nsf
orm
edVer
tice
s
ProgrammableVertex
Processor
ProgrammableVertex
Processor
GPUFront End
GPUFront End
PrimitiveAssembly
PrimitiveAssembly
Frame Buffer
Frame Buffer
RasterOperations
Rasterizationand
Interpolation
3D API:OpenGL orDirect3D
3D API:OpenGL orDirect3D
3DApplication
Or Game
3DApplication
Or Game
Pre-tran
sform
edVertices
Pre-tran
sform
edFrag
men
ts
Tra
nsf
orm
edFr
agm
ents
GPU
Com
man
d &
Data S
tream
CPU-GPU Boundary (AGP/PCIe)
Fixed-function pipeline
![Page 3: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/3.jpg)
ReviewReview
• Color Buffers– Front-left– Front-right– Back-left– Back-right
• Depth Buffer (z-buffer)• Stencil Buffer• Accumulation Buffer
![Page 4: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/4.jpg)
OverviewOverview
• GPU Memory Model• GPU Data Structure Basics• Introduction to Framebuffer Objects• Fragment Pipeline• Vertex Pipeline
![Page 5: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/5.jpg)
5
Memory HierarchyMemory Hierarchy
• CPU and GPU Memory Hierarchy
Disk
CPU Main Memory
GPU Video MemoryCPU Caches
CPU Registers GPU Caches
GPU Temporary Registers
GPU Constant Registers
Slow
![Page 6: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/6.jpg)
6
CPU Memory ModelCPU Memory Model
• At any program point– Allocate/free local or global memory– Random memory access
• Registers– Read/write
• Local memory– Read/write to stack
• Global memory– Read/write to heap
• Disk– Read/write to disk
![Page 7: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/7.jpg)
7
GPU Memory ModelGPU Memory Model• Much more restricted memory access
– Allocate/free memory only before computation– Limited memory access during computation (kernel)
• Registers– Read/write
• Local memory– Read/write
• Shared Memory– Only available in GPGPU not Graphics pipeline
• Global memory– Read-only during computation– Write-only at end of computation (pre-computed address)– Read/write in GPGPU world only
• Virtual Memory– Does not exist
• Disk access– Does not exist
![Page 8: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/8.jpg)
GPU Memory ModelGPU Memory Model
• Where is GPU Data Stored?– Vertex buffer– Frame buffer– Texture
Vertex Buffer Vertex Processor Rasterizer Fragment
Processor
Texture
FrameBuffer(s)
![Page 9: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/9.jpg)
9
GPU Memory APIGPU Memory API
• Each GPU memory type supports subset of the following operations– CPU interface– GPU interface
![Page 10: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/10.jpg)
10
GPU Memory APIGPU Memory API
• CPU interface– Allocate– Free– Copy CPU GPU– Copy GPU CPU– Copy GPU GPU– Bind for read-only vertex stream access– Bind for read-only random access– Bind for write-only framebuffer access
![Page 11: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/11.jpg)
11
GPU Memory APIGPU Memory API
• GPU (shader/kernel) interface– Random-access read– Stream read
![Page 12: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/12.jpg)
Vertex BuffersVertex Buffers
• GPU memory for vertex data• Vertex data required to initiate render pass
Vertex Buffer Vertex Processor Rasterizer Fragment
Processor
Texture
FrameBuffer(s)
VS 3.0 GPUs
![Page 13: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/13.jpg)
13
Vertex BuffersVertex Buffers
• Supported Operations– CPU interface
• Allocate• Free• Copy CPU GPU• Copy GPU GPU (Render-to-vertex-array)• Bind for read-only vertex stream access
– GPU interface• Stream read (vertex program only)
![Page 14: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/14.jpg)
14
Vertex BuffersVertex Buffers
• Limitations– CPU
• No copy GPU CPU• No bind for read-only random access• No bind for write-only framebuffer access
– GPU• No random-access reads• No access from fragment programs
![Page 15: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/15.jpg)
TexturesTextures
• Random-access GPU memory
Vertex Buffer Vertex Processor Rasterizer Fragment
Processor
Texture
FrameBuffer(s)
![Page 16: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/16.jpg)
16
TexturesTextures
• Supported Operations– CPU interface
• Allocate• Free• Copy CPU GPU• Copy GPU CPU• Copy GPU GPU (Render-to-texture)• Bind for read-only random access (vertex or fragment)• Bind for write-only framebuffer access
– GPU interface• Random read
![Page 17: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/17.jpg)
17
TexturesTextures
• Limitations– No bind for vertex stream access
![Page 18: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/18.jpg)
FramebufferFramebuffer
• Memory written by fragment processor• Write-only GPU memory
Vertex Buffer Vertex Processor Rasterizer Fragment
Processor
Texture
FrameBuffer(s)
![Page 19: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/19.jpg)
19
OpenGL OpenGL FramebufferFramebuffer ObjectsObjects
• General idea– Framebuffer object is lightweight struct of pointers– Bind GPU memory to framebuffer as write-only– Memory cannot be read while bound to framebuffer
• Which memory? – Texture– Renderbuffer– Vertex buffer??
Texture(RGBA)
Renderbuffer(Depth)
FramebufferObject
![Page 20: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/20.jpg)
20
What is a What is a RenderbufferRenderbuffer??
• “Traditional” framebuffer memory – Write-only GPU memory
• Color buffer• Depth buffer• Stencil buffer
• New OpenGL memory object– Part of Framebuffer Object extension
![Page 21: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/21.jpg)
21
RenderbufferRenderbuffer
• Supported Operations– CPU interface
• Allocate• Free• Copy GPU CPU• Bind for write-only framebuffer access
![Page 22: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/22.jpg)
22
Pixel Buffer ObjectsPixel Buffer Objects
• Mechanism to efficiently transfer pixel data– API nearly identical to vertex buffer objects
Vertex Buffer Vertex Processor Rasterizer Fragment
Processor
Texture
FrameBuffer(s)
![Page 23: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/23.jpg)
23
Pixel Buffer ObjectsPixel Buffer Objects
• Uses– Render-to-vertex-array
• glReadPixels into GPU-based pixel buffer• Use pixel buffer as vertex buffer
– Fast streaming textures• Map PBO into CPU memory space• Write directly to PBO• Reduces one or more copies
![Page 24: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/24.jpg)
24
Pixel Buffer ObjectsPixel Buffer Objects
• Uses (continued)– Asynchronous readback
• Non-blocking GPU CPU data copy• glReadPixels into PBO does not block• Blocks when PBO is mapped into CPU memory
![Page 25: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/25.jpg)
Summary : RenderSummary : Render--toto--TextureTexture
• Basic operation in GPGPU apps
• OpenGL Support– Save up to 16, 32-bit floating values per pixel
• Multiple Render Targets (MRTs) on ATI and NVIDIA
1. Copy-to-texture• glCopyTexSubImage
2. Render-to-texture• GL_EXT_framebuffer_object
![Page 26: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/26.jpg)
26
Summary : RenderSummary : Render--ToTo--VertexVertex--ArrayArray
• Enable top-of-pipe feedback loop
• OpenGL Support– Copy-to-vertex-array
• GL_ARB_pixel_buffer_object• NVIDIA and ATI
– Render-to-vertex-array• Maybe future extension to framebuffer objects
![Page 27: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/27.jpg)
27
Multiple Render to Texture (MRT) [nv40]Multiple Render to Texture (MRT) [nv40]
Fragmentprogram
Fragmentprogram
MRT allows us to compress multiple passes into a single one.
This does not fundamentally change the model though, since read/write access is still not allowed.
![Page 28: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/28.jpg)
28
OverviewOverview
• GPU Memory Model• GPU Data Structure Basics• Introduction to Framebuffer Objects• Fragment Pipeline• Vertex Pipeline
![Page 29: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/29.jpg)
29
GPU ArraysGPU Arrays
• Large 1D Arrays– Current GPUs limit 1D array sizes to 2048 or 4096– Pack into 2D memory– 1D-to-2D address translation
![Page 30: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/30.jpg)
30
GPU ArraysGPU Arrays
• 3D Arrays– Problem
• GPUs do not have 3D frame buffers• No render-to-slice-of-3D-texture yet (coming soon?)
– Solutions1. Stack of 2D slices2. Multiple slices per 2D buffer
![Page 31: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/31.jpg)
31
GPU ArraysGPU Arrays
• Problems With 3D Arrays for GPGPU– Cannot read stack of 2D slices as 3D texture– Must know which slices are needed in advance– Visualization of 3D data difficult
• Solutions– Flat 3D textures– Need render-to-slice-of-3D-texture
– Maybe with GL_EXT_framebuffer_object
– Volume rendering of flattened 3D data– “Deferred Filtering: Rendering from Difficult Data Formats,”
GPU Gems 2, Ch. 41, p. 667
![Page 32: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/32.jpg)
32
GPU ArraysGPU Arrays
• Higher Dimensional Arrays– Pack into 2D buffers– N-D to 2D address translation– Same problems as 3D arrays if data does not fit in a
single 2D texture
![Page 33: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/33.jpg)
33
Sparse/Adaptive Data StructuresSparse/Adaptive Data Structures• Why?
– Reduce memory pressure– Reduce computational workload
• Examples– Sparse matrices
• Krueger et al., Siggraph 2003• Bolz et al., Siggraph 2003
– Deformable implicit surfaces (sparse volumes/PDEs)• Lefohn et al., IEEE Visualization 2003 / TVCG 2004
– Adaptive radiosity solution (Coombe et al.)
Premoze et al.Eurographics 2003
![Page 34: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/34.jpg)
34
Sparse/Adaptive Data StructuresSparse/Adaptive Data Structures
• Basic Idea– Pack “active” data elements into GPU memory
![Page 35: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/35.jpg)
35
OverviewOverview
• GPU Memory Model• GPU-Based Data Structures• Introduction to Framebuffer Objects• Fragment Pipeline• Vertex Pipeline
![Page 36: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/36.jpg)
36
FramebufferFramebuffer ObjectsObjects
• What is an FBO?– A struct that holds pointers to memory objects– Each bound memory object can be a
framebuffer rendering surface– Platform-independent
Texture(RGBA)
Renderbuffer(Depth)
FramebufferObject
![Page 37: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/37.jpg)
37
FramebufferFramebuffer ObjectsObjects
• Which memory can be bound to an FBO?– Textures– Renderbuffers
• Depth, stencil, color• Traditional write-only framebuffer surfaces
![Page 38: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/38.jpg)
38
FramebufferFramebuffer ObjectsObjects
• Usage models– Keep N textures bound to one FBO (up to 16)
• Change render targets with glDrawBuffers
– Keep one FBO for each size/format• Change render targets with attach/unattach textures
– Keep several FBOs with textures attached• Change render targets by binding FBO
![Page 39: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/39.jpg)
39
FramebufferFramebuffer ObjectsObjects
• Performance– Render-to-texture
• glDrawBuffers is fastest on NVIDIA/ATI– As-fast or faster than pbuffers
• Attach/unattach textures same as changing FBOs– Slightly slower than glDrawBuffers but faster than
wglMakeCurrent
• Keep format/size identical for all attached memory– Current driver limitation, not part of spec
– Readback• Same as pbuffers for NVIDIA and ATI
![Page 40: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/40.jpg)
40
FramebufferFramebuffer ObjectsObjects
• Driver support still evolving– GPUBench FBO tests coming soon…
• “fbocheck” evalulates completeness• Other tests…
![Page 41: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/41.jpg)
41
FramebufferFramebuffer ObjectObject
• Code examples– Simple C++ FBO and Renderbuffer classes
• HelloWorld example• http://gpgpu.sourceforge.net/
– OpenGL Spechttp://oss.sgi.com/projects/ogl-sample/registry/EXT/framebuffer_object.txt
![Page 42: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/42.jpg)
OverviewOverview
• GPU Memory Model• GPU Data Structure Basics• Introduction to Framebuffer Objects• Fragment Pipeline• Vertex Pipeline
![Page 43: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/43.jpg)
ReviewReview
VertexIndex
Stream
3D APICommands
AssembledPrimitives
PixelUpdates
PixelLocationStream
ProgrammableFragmentProcessor
ProgrammableFragmentProcessor
Tra
nsf
orm
edVer
tice
s
ProgrammableVertex
Processor
ProgrammableVertex
Processor
GPUFront End
GPUFront End
PrimitiveAssembly
PrimitiveAssembly
Frame Buffer
Frame Buffer
RasterOperations
Rasterizationand
Interpolation
3D API:OpenGL orDirect3D
3D API:OpenGL orDirect3D
3DApplication
Or Game
3DApplication
Or Game
Pre-tran
sform
edVertices
Pre-tran
sform
edFrag
men
ts
Tra
nsf
orm
edFr
agm
ents
GPU
Com
man
d &
Data S
tream
CPU-GPU Boundary (AGP/PCIe)
Fixed-function pipeline
![Page 44: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/44.jpg)
44
The fragment pipelineThe fragment pipeline
…
-[Z]YXTexture coordinates
-[Z]YXTexture coordinates
WZYXPosition
R ABGColor
Input: Fragment
Attributes
Interpolated from vertex information
Input: Texture Image
WZYX
• Each element of texture is 4D vector
• Textures can be “square” or rectangular (power-of-two or not)
32 bits = float
16 bits = half
![Page 45: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/45.jpg)
45
The fragment pipelineThe fragment pipeline
Input: Uniform parameters
• Can be passed to a fragment program like normal parameters
• set in advance before the fragment program executes
Example:A counter that tracks which pass the algorithm is in.
Input: Constant parameters
• Fixed inside program• E.g. float4 v = (1.0,
1.0, 1.0, 1.0)Examples:
3.14159..Size of compute window
![Page 46: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/46.jpg)
46
The fragment pipelineThe fragment pipeline
Math ops: USE THEM !• cos(x)/log2(x)/pow(x,y)• dot(a,b)• mul(v, M)• sqrt(x)• cross(u, v)
Using built-in ops is more efficient than writing your own
Swizzling/masking: an easy way to move data around.
v1 = (4,-2,5,3); // Initializev2 = v1.yx; // v2 = (-2,4)s = v1.w; // s = 3v3 = s.rrr; // v3 = (3,3,3)
Write masking:
v4 = (1,5,3,2);v4.ar = v2; // v4=(4,5,4,-2)
![Page 47: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/47.jpg)
47
The fragment pipelineThe fragment pipeline
x
yfloat4 v = tex2D(IMG, float2(x,y))
Texture access is like an array lookup.
The value in v can be used
to perform another lookup!
This is called a dependent read
Texture reads (and dependent reads) are expensive resources, and are limited in different GPUs. Use them wisely !
![Page 48: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/48.jpg)
48
The fragment pipelineThe fragment pipelineControl flow:• (<test>)?a:b operator.• if-then-else conditional
– [nv3x] Both branches are executed, and the condition code is used to decide which value is used to write the output register.
– [nv40] True conditionals• for-loops and do-while
– [nv3x] limited to what can be unrolled (i.e no variable loop limits)– [nv40] True looping.
WARNING: Even though nv40 has true flow control, performance will suffer if there is no coherence (more on this later)
![Page 49: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/49.jpg)
49
The fragment pipelineThe fragment pipelineFragment programs use call-by-result
Notes:• Only output color can be modified• Textures cannot be written• Setting different values in different channels of result can be
useful for debugging
out float4 result : COLOR
// Do computation
result = <final answer>
![Page 50: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/50.jpg)
OverviewOverview
• GPU Memory Model• GPU Data Structure Basics• Introduction to Framebuffer Objects• Fragment Pipeline• Vertex Pipeline
![Page 51: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/51.jpg)
ReviewReview
VertexIndex
Stream
3D APICommands
AssembledPrimitives
PixelUpdates
PixelLocationStream
ProgrammableFragmentProcessor
ProgrammableFragmentProcessor
Tra
nsf
orm
edVer
tice
s
ProgrammableVertex
Processor
ProgrammableVertex
Processor
GPUFront End
GPUFront End
PrimitiveAssembly
PrimitiveAssembly
Frame Buffer
Frame Buffer
RasterOperations
Rasterizationand
Interpolation
3D API:OpenGL orDirect3D
3D API:OpenGL orDirect3D
3DApplication
Or Game
3DApplication
Or Game
Pre-tran
sform
edVertices
Pre-tran
sform
edFrag
men
ts
Tra
nsf
orm
edFr
agm
ents
GPU
Com
man
d &
Data S
tream
CPU-GPU Boundary (AGP/PCIe)
Fixed-function pipeline
![Page 52: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/52.jpg)
52
The Vertex PipelineThe Vertex Pipeline
Input: vertices• position, color, texture coords.Input: uniform and constant parameters.• Matrices can be passed to a vertex program.• Lighting/material parameters can also be
passed.
![Page 53: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/53.jpg)
53
The Vertex PipelineThe Vertex Pipeline
Operations:• Math/swizzle ops• Matrix operators• Flow control (as before)[nv3x] No access to textures.Output:• Modified vertices (position, color)• Vertex data transmitted to primitive
assembly.
![Page 54: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/54.jpg)
54
Vertex programs are usefulVertex programs are useful• We can replace the entire geometry
transformation portion of the fixed-function pipeline.
• Vertex programs used to change vertex coordinates (move objects around)
• There are many fewer vertices than fragments: shifting operations to vertex programs improves overall pipeline performance.
• Much of shader processing happens at vertex level.• We have access to original scene geometry.
![Page 55: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/55.jpg)
55
Vertex programs are not useful for GPGPUVertex programs are not useful for GPGPU
• Fragment programs allow us to exploit full parallelism of GPU pipeline (“a processor at every pixel”).
• Vertex programs can’t read input ! [nv3x]• NV4X Cards can read vertex textures but
can not read FBOs
Rule of thumb:If computation requires intensive calculation,it should probably be in the fragment processor.
If it requires more geometric/graphic computing, it should be in the vertex processor.
![Page 56: GPU Memory Model Overview - Penn Engineering - Welcome to the](https://reader035.fdocuments.in/reader035/viewer/2022081622/613d4727736caf36b75b6e63/html5/thumbnails/56.jpg)
56
AcknowledgementsAcknowledgements
• Adam Moerschell, Shubho Sengupta UCDavis• Mike Houston Stanford University• John Owens, Ph.D. advisor UC Davis
• National Science Foundation Graduate Fellowship
• Extra slides were added by Joe Kider, Gary Katz from Suresh Venkatasubramanian, lecture 3 found at http://www.cis.upenn.edu/~suvenkat/700/
• Alteration to this slide package were made without the authorization by the original authors and should be used for educational purposes only.