Evolution of the Programmable Graphics Pipeline
Patrick Cozzi
University of Pennsylvania
CIS 565 - Spring 2011
Administrivia
• Tip: google “cis 565”
• Slides posted before each class
• Tentative assignment dates on website
• 1st assignment handed out today
  • Write concisely
  • Due start of class, one week from today
• Google group in progress
• FYI: GDC Early Registration - 01/24
Survey Results
• 15/23 – graphics experience
• Most students have usable video cards
• Lerk – don’t be scared
• I want to be a Toys R Us kid too
Survey Results
Class interests
• Pure architecture
• Game rendering
• Physical simulations
• Animation
• Vision algorithms
• Image/video processing
• …
Course Roadmap
• Graphics Pipeline (GLSL)
• GPGPU (GLSL)
  • Briefly
• GPU Computing (CUDA, OpenCL)
• Choose your own adventure
  • Student Presentation
  • Final Project
Goal: Prepare you for your presentation and project
Agenda
• Why program the GPU?
• Graphics Review
• Evolution of the Programmable Graphics Pipeline
  • Understand the past
Why Program the GPU?
Graph from: http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf
Compute
• Intel Core i7 – 4 cores – 100 GFLOPS
• NVIDIA GTX280 – 240 cores – 1 TFLOPS
Memory Bandwidth
• System Memory – 60 GB/s
• NVIDIA GT200 – 150 GB/s
Install Base
• Over 200 million NVIDIA G80s shipped
Numbers from Programming Massively Parallel Processors.
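As a quick sanity check, the ratios implied by these figures can be worked out directly. The sketch below uses only the numbers quoted on this slide; the variable names are illustrative, and the per-core figures are simple averages rather than measured values.

```python
# Peak figures as quoted on the slide.
cpu_gflops = 100.0          # Intel Core i7, 4 cores
gpu_gflops = 1000.0         # NVIDIA GTX280, 240 cores (1 TFLOPS)
cpu_cores = 4
gpu_cores = 240
cpu_bandwidth_gbs = 60.0    # system memory
gpu_bandwidth_gbs = 150.0   # NVIDIA GT200

# How much more peak compute and bandwidth the GPU offers.
compute_ratio = gpu_gflops / cpu_gflops
bandwidth_ratio = gpu_bandwidth_gbs / cpu_bandwidth_gbs

# Average peak throughput per core: each GPU core is individually slower,
# but there are many more of them.
cpu_gflops_per_core = cpu_gflops / cpu_cores
gpu_gflops_per_core = gpu_gflops / gpu_cores

print(f"compute ratio:   {compute_ratio:.1f}x")
print(f"bandwidth ratio: {bandwidth_ratio:.1f}x")
print(f"GFLOPS per core: CPU {cpu_gflops_per_core:.1f}, GPU {gpu_gflops_per_core:.1f}")
```

The takeaway under these numbers is that the GPU's advantage comes from the number of cores, not from faster individual cores.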
NVIDIA GPU Evolution
Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Graphics Review: Modeling
Modeling
• Polygons vs Triangles
  • How do you store a triangle mesh?
• Implicit Surfaces
• Height maps
• …
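One common answer to the triangle mesh question is an indexed triangle list: a shared array of vertex positions plus an array of indices, three per triangle. The sketch below is only an illustration of that idea, not necessarily the representation discussed in class; the names `positions`, `indices`, and `triangles` are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# Two triangles forming a unit quad in the z = 0 plane.
# The quad's four corners are stored once and shared by both triangles.
positions: List[Vec3] = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 1.0, 0.0),
    (0.0, 1.0, 0.0),
]

# Three indices per triangle, each referring to an entry in `positions`.
indices: List[int] = [0, 1, 2, 0, 2, 3]


def triangles(positions: List[Vec3], indices: List[int]) -> List[Tuple[Vec3, Vec3, Vec3]]:
    """Expand the index list into explicit triangles (triples of positions)."""
    return [
        (positions[indices[i]], positions[indices[i + 1]], positions[indices[i + 2]])
        for i in range(0, len(indices), 3)
    ]
```

Without indexing, the same quad would need six stored vertices instead of four; the saving grows for larger meshes where most vertices are shared by several triangles.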
Triangles
Image courtesy of A K Peters, Ltd. www.virtualglobebook.com. Imagery from NASA Visible Earth: visibleearth.nasa.gov.
Implicit Surfaces
Images from GPU Gems 3: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch01.html
Graphics Review: Rendering
Rendering
• Goal: Assign color to pixels
• Two Parts
  • Visible surfaces
    • What is in front of what for a given view
  • Shading
    • Simulate the interaction of material and light to produce a pixel color
Visible Surfaces
• Z-Buffer / Depth Buffer
• Fragment vs Pixel
Image courtesy of A K Peters, Ltd. www.virtualglobebook.com
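A minimal sketch of the depth buffer idea: several fragments may land on the same pixel, and the buffer keeps only the closest one. It assumes that a smaller depth value means closer to the viewer and uses a strict less-than comparison, which is a common default but configurable in real APIs. The function names are hypothetical.

```python
import math


def make_buffers(width, height):
    """Create a depth buffer cleared to 'infinitely far' and a black color buffer."""
    depth = [[math.inf] * width for _ in range(height)]
    color = [[(0, 0, 0)] * width for _ in range(height)]
    return depth, color


def write_fragment(depth, color, x, y, frag_depth, frag_color):
    """Store the fragment only if it is closer than what the pixel already holds.

    Returns True if the fragment passed the depth test.
    """
    if frag_depth < depth[y][x]:
        depth[y][x] = frag_depth
        color[y][x] = frag_color
        return True
    return False
```

Because the test is per pixel, fragments can arrive in any order and the final image still shows the nearest surface at each pixel.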
Graphics Pipeline
[Pipeline diagram: Vertex Transforms -> Primitive Assembly -> Rasterization and Interpolation -> Raster Operations -> Frame Buffer]
Raster Operations
• Scissor Test
• Stencil Test
• Depth Test
• Blending
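The raster operations listed here can be sketched as a sequence of per-fragment tests followed by blending. This is a simplified illustration under stated assumptions: the stencil test is an equality check against a reference value, the depth test is less-than, and blending is standard source-over alpha blending. Real APIs make each of these configurable, and the function and field names are hypothetical.

```python
def raster_ops(frag, pixel, scissor, stencil_ref):
    """Run one fragment through scissor, stencil, and depth tests, then blend.

    frag:    dict with keys x, y, depth, rgb (floats in 0..1), alpha
    pixel:   dict with keys depth, stencil, rgb (current frame buffer contents)
    scissor: (x0, y0, x1, y1), a half-open rectangle in pixel coordinates
    Returns the pixel's new state; the original dict is returned unchanged
    if any test rejects the fragment.
    """
    x0, y0, x1, y1 = scissor
    if not (x0 <= frag["x"] < x1 and y0 <= frag["y"] < y1):
        return pixel  # scissor test failed
    if pixel["stencil"] != stencil_ref:
        return pixel  # stencil test failed
    if not frag["depth"] < pixel["depth"]:
        return pixel  # depth test failed

    # Source-over blending: weight the fragment by its alpha.
    a = frag["alpha"]
    blended = tuple(a * s + (1.0 - a) * d for s, d in zip(frag["rgb"], pixel["rgb"]))
    return {"depth": frag["depth"], "stencil": pixel["stencil"], "rgb": blended}
```

Each test is an early-out, so a fragment rejected by a cheap test never reaches the blending step.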
Graphics Review: Animation
• Move the camera and/or agents, and re-render the scene
  • In less than 16.6 ms (60 fps)
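The 16.6 ms figure follows from dividing one second by the target frame rate. A small sketch of that arithmetic, with hypothetical helper names:

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


def meets_budget(frame_time_ms: float, fps: float = 60.0) -> bool:
    """True if a frame that takes frame_time_ms fits within the budget for fps."""
    return frame_time_ms <= frame_budget_ms(fps)


print(f"60 fps budget: {frame_budget_ms(60):.2f} ms")
print(f"30 fps budget: {frame_budget_ms(30):.2f} ms")
```

Everything in the frame (animation update, culling, rendering) has to fit inside this budget, not just the GPU work.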
Evolution of the Programmable Graphics Pipeline
• Pre GPU
• Fixed function GPU
• Programmable GPU
• Unified Shader Processors
Early 90s – Pre GPU
Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf
Why GPUs?
• Exploit Parallelism
  • Pipeline parallel
  • Data-parallel
  • CPU and GPU executing in parallel
• Hardware: texture filtering, MAD, etc.
Generation I: 3dfx Voodoo (1996)
Image from “7 years of Graphics”
• Did not do vertex transformations: these were done on the CPU
• Did do texture mapping and z-buffering
[Pipeline diagram: CPU: Vertex Transforms -> PCI -> GPU: Primitive Assembly -> Rasterization and Interpolation -> Raster Operations -> Frame Buffer]
Slide adapted from Suresh Venkatasubramanian and Joe Kider
Aside: Mario Kart 64
Image from: http://www.gamespot.com/users/my_shoe/
High fragment load / low vertex load
Aside: Mario Kart Wii
High fragment load / low vertex load?
Image from: http://wii.ign.com/dor/objects/949580/mario-kart-wii/images/
Generation II: GeForce 256/Radeon 7500 (1999)
Slide from Suresh Venkatasubramanian and Joe Kider
• Main innovation: shifting the transformation and lighting calculations to the GPU
• Allowed multi-texturing, enabling bump maps, light maps, and others
• Faster AGP bus instead of PCI
[Pipeline diagram: AGP -> GPU: Vertex Transforms -> Primitive Assembly -> Rasterization and Interpolation -> Raster Operations -> Frame Buffer]
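To make "transformation and lighting" concrete, here is a minimal sketch of the per-vertex work involved: a 4x4 matrix applied to a vertex position, and a single diffuse lighting term. It is an illustration only. Fixed-function hardware used a more elaborate lighting model, and the function names are hypothetical.

```python
def transform(m, v):
    """Multiply a 4x4 matrix (list of four rows) by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def diffuse(normal, light_dir):
    """Lambertian diffuse term: N . L clamped to [0, 1], for unit-length vectors."""
    n_dot_l = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, min(1.0, n_dot_l))


# A translation by (1, 2, 3) in homogeneous coordinates.
translate = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
```

The same small computation runs independently for every vertex, which is what makes it a good fit for dedicated parallel hardware.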
Image from “7 years of Graphics”
Generation III: GeForce3/Radeon 8500 (2001)
Slide from Suresh Venkatasubramanian and Joe Kider
• For the first time, allowed a limited amount of programmability in the vertex pipeline
• Also allowed volume texturing and multi-sampling (for antialiasing)
[Pipeline diagram: AGP -> GPU: Vertex Transforms (small vertex shaders) -> Primitive Assembly -> Rasterization and Interpolation -> Raster Operations -> Frame Buffer]
Image from “7 years of Graphics”
Generation IV: Radeon 9700/GeForce FX (2002)
• The first generation of fully programmable graphics cards
• Different versions have different resource limits on fragment/vertex programs
[Pipeline diagram labels: AGP, Vertex Transforms, Programmable Vertex Shader, Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations, Texture Memory]
Slide from Suresh Venkatasubramanian and Joe Kider
Image from “7 years of Graphics”
Generation IV.V: GeForce6/X800 (2004)
Slide adapted from Suresh Venkatasubramanian and Joe Kider
• Simultaneous rendering to multiple buffers
• True conditionals and loops
• PCIe bus
• Vertex texture fetch
[Pipeline diagram labels: PCIe, Vertex Transforms, Programmable Vertex Shader, Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations, Frame Buffer, Texture Memory]
NVIDIA NV40 Architecture
Image from GPU Gems 2: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter30.html
• 6 vertex shader units
• 16 fragment shader units
• Vertex Texture Fetch
Generation V: GeForce 8800/HD 2900 (2006)
Slide adapted from Suresh Venkatasubramanian and Joe Kider
• Ground-up GPU redesign
• Support for Direct3D 10 / OpenGL 3
• Geometry Shaders
• Stream out / transform-feedback
• Unified shader processors
• Support for General GPU programming
[Pipeline diagram labels: PCIe, Input Assembler, Programmable Vertex Shader, Programmable Geometry Shader, Raster Operations, Programmable Pixel (Fragment) Shader, Output Merger]
D3D 10 Pipeline
Image from David Blythe: http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/direct3d10_web.pdf
Geometry Shaders
Image from David Blythe: http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/direct3d10_web.pdf
NVIDIA G80 Architecture
Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Why Unify Shader Processors?
Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Unified Shader Processors
Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
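One way to see the motivation for unified shader processors is a toy load-balancing model. With separate vertex and fragment units, one pool sits idle whenever the workload is skewed (as in the "high fragment load / low vertex load" aside), while a unified pool can put every unit on whatever work exists. The sketch below borrows the NV40 unit counts quoted earlier (6 vertex, 16 fragment); the workloads are made up, every work item is assumed to cost one time step, and real scheduling is far more involved.

```python
import math


def fixed_time(vertex_work, fragment_work, vertex_units, fragment_units):
    """Time steps needed when each pool can only run its own kind of work."""
    return max(
        math.ceil(vertex_work / vertex_units),
        math.ceil(fragment_work / fragment_units),
    )


def unified_time(vertex_work, fragment_work, total_units):
    """Time steps needed when any unit can run either kind of work."""
    return math.ceil((vertex_work + fragment_work) / total_units)


VERTEX_UNITS, FRAGMENT_UNITS = 6, 16
TOTAL_UNITS = VERTEX_UNITS + FRAGMENT_UNITS

# Hypothetical workloads: one fragment-heavy frame, one vertex-heavy frame.
for vertex_work, fragment_work in [(10, 1000), (1000, 10)]:
    fixed = fixed_time(vertex_work, fragment_work, VERTEX_UNITS, FRAGMENT_UNITS)
    unified = unified_time(vertex_work, fragment_work, TOTAL_UNITS)
    print(f"vertex={vertex_work} fragment={fragment_work}: fixed={fixed} unified={unified}")
```

In this model the fixed split is only competitive when the workload happens to match the hardware ratio; the unified pool takes the same time for either skewed frame.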
Terminology
Shader Model   Direct3D   OpenGL   Video card example
2-3            9          2.x      NVIDIA GeForce 6800, ATI Radeon X800
4              10.x       3.x      NVIDIA GeForce 8800, ATI Radeon HD 2900
5              11.x       4.x      NVIDIA GeForce GTX 480, ATI Radeon HD 5870
Evolution of the Programmable Graphics Pipeline
Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf
Not covered today:
• SM 5 / D3D 11 / GL 4
• Tessellation shaders
  • *cough* student presentation *cough*
Later this semester: NVIDIA Fermi
• Dual warp scheduler
• Configurable L1 / shared memory
• Double precision
• …