OpenGL 3.2 and More

San Jose | September 30, 2009 | Mark J. Kilgard, NVIDIA Corporation

Presented September 30, 2009 in San Jose, California at the GPU Technology Conference. Describes the new features of OpenGL 3.2 and NVIDIA's extensions beyond 3.2 such as bindless graphics, direct state access, separate shader objects, copy image, texture barrier, and Cg 2.2.

Transcript of OpenGL 3.2 and More

Page 1: OpenGL 3.2 and More

San Jose | September 30, 2009 | Mark J. Kilgard, NVIDIA Corporation

OpenGL 3.2 and More

Page 2: OpenGL 3.2 and More

Mark J. Kilgard

• Principal System Software Engineer
  – OpenGL driver
  – Cg shading language
• OpenGL Utility Toolkit (GLUT) implementer
• Co-author of Cg Tutorial

Page 3: OpenGL 3.2 and More

Overview

• OpenGL 3.2
  – Available today
  – What’s in it?
• NVIDIA’s additional functionality
  – Above & beyond OpenGL 3.2

Page 4: OpenGL 3.2 and More

Before we get really started…

A brief 2-slide review of OpenGL 3.0 & 3.1

You are already familiar with and using OpenGL 3.1, aren’t you?

Page 5: OpenGL 3.2 and More

For review, OpenGL 3.0

• Texturing
  – Integer & floating-point texture formats
  – Compact floating-point formats
  – sRGB color space texture formats
  – 1- and 2-component compressed texture formats
  – 1D and 2D texture array targets
• Miscellaneous
  – Vertex array objects
  – Conditional rendering
  – Multisample-aware stretch blits
  – Fine control over mapping & flushing buffer sub-ranges
• Framebuffer functionality
  – Render-to-texture with framebuffer objects
  – sRGB blending
  – Packed depth/stencil formats for render-buffers (and texturing)
  – Per-color-attachment blend enables and color write masks
• Shader improvements
  – OpenGL Shading Language 1.30

Page 6: OpenGL 3.2 and More

For review, OpenGL 3.1

• Texturing
  – Guarantees 16 texture units
  – Texture buffer objects
  – Texture rectangle target: 2D image with [0..width, 0..height] coordinate space
  – Signed normalized texture formats
• Miscellaneous
  – Fast data copying between buffer objects
  – Primitive restart index for vertex arrays
• Shader improvements
  – OpenGL Shading Language 1.40
  – Shader can access uniform values from buffer objects
  – Instanced rendering provides instance counter to vertex shader

Page 7: OpenGL 3.2 and More

OpenGL 3.2

modern GPU functionality, platform portability, API maturity & completeness

Page 8: OpenGL 3.2 and More

From the 1994 OpenGL 1.1 Data Flow…

[Diagram: OpenGL data flow. Stages: client memory, vertex puller, vertex processing, rasterization & fragment coloring, texture, raster operations (blending, depth testing, stencil testing, accumulation), framebuffer, pixel unpack, pixel transfer, pixel pack, selection / feedback / transform feedback, storage access operations. Commands shown: glVertex*, glColor*, glTexCoord*, etc.; glDrawElements, glDrawArrays; glDrawPixels, glBitmap, glCopyPixels; glTex{Sub}Image, glCopyTex{Sub}Image; glReadPixels / glCopyPixels / glCopyTex{Sub}Image.]

Page 9: OpenGL 3.2 and More

…OpenGL 1.0 in detail

[Diagram: detailed data flow. Stages: command parser, vertex assembly, vertex processing, geometric primitive assembly & processing, image primitive processing, pixel unpacking, pixel processing, texture mapping, fragment processing, raster operations (stenciling, depth testing, blending, accumulation), framebuffer, pixel packing. Data flowing between stages: primitive batch type and vertex attributes; vertex data; buffer data; vertices; primitive topology and transformed vertex data; point, line, and polygon fragments; image and bitmap fragments; image rectangles and bitmaps; pixel image or texture image specification; unpacked pixels; texture image specification; fragment texture fetches; filtered texels; fragments; pixels; pixels to pack; copy pixels and copy texture image. Legend: programmable operations, fixed-function operations.]

Page 10: OpenGL 3.2 and More

…to the 2009 OpenGL 3.2 Data Flow

[Diagram: OpenGL 3.2 data flow. Stages: command parser, vertex assembly, vertex processing, geometric primitive assembly & processing, transform feedback, image primitive processing, pixel unpacking, pixel processing, texture mapping, fragment processing, raster operations (stenciling, depth testing, blending, accumulation), framebuffer, pixel packing, buffer store. Buffer objects: vertex buffer objects, transform feedback buffer objects, pixel pack buffer objects, pixel unpack buffer objects, texture buffer objects, uniform/parameters buffer objects. Data flowing between stages: primitive batch type, vertex indices, and vertex attributes; vertex data; buffer data and unmap buffer; map buffer and get buffer data; vertices; primitive topology and transformed vertex data; transformed vertex attributes; vertex, geometry, and fragment texture fetches; point, line, and polygon fragments; image and bitmap fragments; image rectangles and bitmaps; pixel image or texture image specification; unpacked pixels; texture image specification; filtered texels; fragments; pixels; pixels to pack; pixels in framebuffer object textures; copy pixels and copy texture image. Legend: programmable operations, fixed-function operations.]

Page 11: OpenGL 3.2 and More

Buffer Centric View of OpenGL

• Buffer object types
  – Vertex Array Buffer Object (VaBO)
  – Array Element Buffer Object (VeBO)
  – Transform Feedback Buffer (XBO)
  – Parameter Buffer (PaBO)
  – Pixel Unpack Buffer (PuBO)
  – Pixel Pack Buffer (PpBO)
  – Bindable Uniform Buffer (BUB)
  – Texture Buffer Object (TexBO)
• Pipeline stages fed by buffers
  – Vertex Puller, Vertex Shading, Geometry Shading, Fragment Shading, Texturing, Pixel Pipeline, Framebuffer
• Kinds of data
  – vertex data
  – texel data
  – pixel data
  – parameter data (not ARB functionality yet)
• Commands shown
  – glBegin, glDrawElements, etc.
  – glDrawPixels, glTexImage2D, etc.
  – glReadPixels, etc.

Page 12: OpenGL 3.2 and More

OpenGL 3.2 Functional Overview

• Direct3D-isms
  – BGRA vertex component ordering
  – Provoking vertex convention
  – Drawing commands allowing modification of the base vertex index
  – Upper-left and lower-left fragment coordinate conventions
• Geometry shaders
  – Per-primitive programmability
• Shader improvements
  – OpenGL Shading Language 1.50
• Miscellaneous
  – Depth clamping, synchronization, seamless cube map filtering, multisample improvements

Page 13: OpenGL 3.2 and More

Direct3Disms

better OpenGL & Direct3D content portability

Page 14: OpenGL 3.2 and More

Direct3Dism Motivation

• A posteriori “3D content tied to API” scheme

– Without intending it, 3D application content gets tied to API’s conventions

[Diagram: your OpenGL application and its content, authored to OpenGL conventions, reach the GPU through the OpenGL API and the OpenGL driver; your Direct3D application and its content, authored to Direct3D conventions, reach the same GPU through the Direct3D API and the Direct3D driver. Labels: 3D API interface, hardware interface.]

Page 15: OpenGL 3.2 and More

NVIDIA Recognizes 3D API Reality

• You decide the 3D API best for your application
  – Lots of reasons to pick your API choice
    • Target systems, intended market, cross-platform requirements, software legacy, content creation vs. deployment, etc.
• Fundamentally, NVIDIA believes in Visual Computing (not APIs)
  – So is essentially agnostic about your 3D API choice
  – OpenGL, Direct3D 9/10/11, or OpenGL ES
    • NVIDIA provides best implementations of all options; you pick
• NVIDIA’s belief in Visual Computing means
  – Your 3D API choice shouldn’t tie down your 3D application or 3D content

Page 16: OpenGL 3.2 and More

Direct3Dism Concept

• Allows your 3D content to be API agnostic
  – OpenGL supports both OpenGL & Direct3D conventions, so it can support either style

[Diagram: as on the previous slide, but the OpenGL path now accepts content authored to either OpenGL or Direct3D conventions (OpenGL + Direct3D conventions), while the Direct3D path accepts Direct3D conventions only. Both drivers target the same GPU. Labels: 3D API interface, hardware interface. Callout: Direct3D conventions supported by OpenGL too.]

Page 17: OpenGL 3.2 and More

OpenGL & Direct3D Conventions

Convention | OpenGL | Direct3D | Addressed by
Window origin | Lower-left, pixels at half-integers | Upper-left, pixels on integers (DX9), pixels on half-integers (DX10) | Projection matrix & front-facing re-configuration
Clip space | [-1…+1]^3 | [-1…+1]^2 x [0…1] | Projection matrix re-configuration
4-byte vertex color | RGBA | BGRA | OpenGL 3.2
Provoking vertex for flat-shading | Last vertex of primitive (mostly) | First vertex of primitive | OpenGL 3.2
Fragment coordinate origin | Lower-left | Upper-left | OpenGL 3.2
Shading language syntax | GLSL | HLSL 9, 10, and 11 | Cg
Shader bind granularity | Linked (for GLSL), per-domain (for Cg & assembly) | Per-domain | EXT_separate_shader_objects
Object manipulation | Bind-to-edit, bind-to-query | Edit-by-name, query-by-name | EXT_direct_state_access

Page 18: OpenGL 3.2 and More

Dealing with API Convention Differences

• Innocuous differences (fairly easy to address in your application)
  – API granularity
    • OpenGL fine-grain state vs. Direct3D 10 state blocks
    • OpenGL selectors versus Direct3D direct state access
  – Easily dealt with by reconfiguring existing state
    • Examples: window origin & clip space conventions
• Formidable differences (difficult to address without 3D API help)
  – Format differences
    • Unsupported formats such as 4-byte BGRA vertex colors
  – Inconsistent state management
    • Per-domain shaders vs. monolithic GLSL shaders
  – Shaders coded to a particular shading language syntax
    • GLSL vs. HLSL, achieve commonality via Cg
  – Conventions baked into shaders
    • Fragment coordinate origin as visible from a fragment shader

Page 19: OpenGL 3.2 and More

Impetus for Direct3Dism Effort

• Many software companies motivated this effort

– TransGaming

– Blizzard

– Destineer

– Aspyr

– CodeWeavers

• Direct result of feedback from 3D software engineers

– Yes, you can influence OpenGL’s direction & course

Page 20: OpenGL 3.2 and More

Supporting Direct3Disms Not New to OpenGL

• OpenGL has always supported multiple formats well
  – OpenGL’s plethora of pixel and vertex formats
  – Very first OpenGL extension: EXT_bgra
    • Provides a pixel component ordering to match the color component ordering of Windows for 2D GDI rendering
    • Made core functionality by OpenGL 1.2
• Many OpenGL extensions have embraced Direct3Disms
  – Secondary color
  – Fog coordinate
  – Point sprites
  – OpenGL 3.0’s fine-grain buffer mapping

Page 21: OpenGL 3.2 and More

BGRA Vertex Array Order

• Direct3D 9’s most common usage for per-vertex colors is the 32-bit D3DCOLOR data type:
  – Red in bits 16:23
  – Green in bits 8:15
  – Blue in bits 0:7
  – Alpha in bits 24:31
  [Diagram: bit 31 | 8-bit alpha | 8-bit red | 8-bit green | 8-bit blue | bit 0]
• Laid out in little-endian memory, this looks like BGRA order
  – OpenGL assumes RGBA order for all vertex arrays
  – However, Direct3D colors not stored in packed unsigned bytes have RGBA order
• Direct3Dism EXT_vertex_array_bgra extension allows:
    glColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer);
    glSecondaryColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer);
    glVertexAttribPointer(index, GL_BGRA, GL_UNSIGNED_BYTE, GL_TRUE, stride, pointer);
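To see why a D3DCOLOR value reads as BGRA when fetched byte by byte, the following sketch packs the four channels at the bit positions listed on this slide and stores the result in little-endian order. The helper names `pack_d3dcolor` and `store_little_endian` are made up for illustration; they are not part of Direct3D or OpenGL.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit channels at the D3DCOLOR bit positions from the slide:
   alpha in bits 24..31, red in 16..23, green in 8..15, blue in 0..7. */
static uint32_t pack_d3dcolor(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Store a 32-bit value the way a little-endian CPU lays it out in memory:
   out[0] is the lowest address and receives the least significant byte. */
static void store_little_endian(uint32_t v, uint8_t out[4])
{
    assert(out != 0);
    out[0] = (uint8_t)(v & 0xFFu);
    out[1] = (uint8_t)((v >> 8) & 0xFFu);
    out[2] = (uint8_t)((v >> 16) & 0xFFu);
    out[3] = (uint8_t)((v >> 24) & 0xFFu);
}
```

Reading the four stored bytes in address order gives blue, green, red, alpha, which is the component order that GL_BGRA tells OpenGL to expect.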

Page 22: OpenGL 3.2 and More

Provoking Vertex Order Conventions

• Direct3D uses “first” vertex of a triangle or line to determine which color is used for flat shading
• OpenGL uses “last” vertex for lines, triangles, and quads
  – Except for polygon (GL_POLYGON) mode, which uses the first vertex

Direct3D 9: pDev->SetRenderState(D3DRS_SHADEMODE, D3DSHADE_FLAT);

OpenGL: glShadeModel(GL_FLAT);

[Figure: input triangle strip with per-vertex colors, flat shaded under each convention]

Page 23: OpenGL 3.2 and More

Configurable Provoking Vertex

• Easy-to-use API
  – New command glProvokingVertex
      // “native” OpenGL convention
      glProvokingVertex(GL_LAST_VERTEX_CONVENTION);
      // Direct3D convention
      glProvokingVertex(GL_FIRST_VERTEX_CONVENTION);
  – OpenGL 3.2 promotion of EXT_provoking_vertex extension
• Affects
  – fixed-function glShadeModel
  – flat shaded attributes for fragment shaders
  – geometry shaders that emit flat shaded attributes

Page 24: OpenGL 3.2 and More

Provoking Vertex Details

• Provoking vertex sounds really obscure
  – Technically shade model is part of “deprecated” feature set of OpenGL
  – However, flat shading is a very common mode for real-time strategy games
    • Many, many objects drawn this way
• Very difficult for application to “juggle” vertex data to match API’s native provoking vertex convention
  – Particularly when using vertex buffer objects
• Quad behavior may vary
  – Direct3D doesn’t support quadrilateral primitives
  – So “first vertex” provoking vertex convention may or may not apply to quadrilateral primitives
    • GeForce 8 reports true for “quads follow the convention”
    • GeForce 7 and earlier report false for “quads follow the convention”
    • Check the GL_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION boolean if you care

Page 25: OpenGL 3.2 and More

Provoking Vertex Behavior

Provoking vertex of primitive i (vertices numbered from 1, n = number of vertices):

Primitive type | First vertex convention | Last vertex convention
GL_POINTS | i | i
GL_LINES | 2i-1 | 2i
GL_LINE_LOOP | i | i+1, if i<n; 1, if i=n
GL_LINE_STRIP | i | i+1
GL_TRIANGLES | 3i-2 | 3i
GL_TRIANGLE_STRIP | i | i+2
GL_TRIANGLE_FAN | i+1 | i+2
GL_QUADS | 4i-3, if quads follow provoking vertex; 4i, if not | 4i (same in both cases)
GL_QUAD_STRIP | 2i-1, if quads follow provoking vertex; 2i+2, if not | 2i+2 (same in both cases)
GL_POLYGON | i | i
GL_LINES_ADJACENCY | 4i-2 | 4i-1
GL_LINE_STRIP_ADJACENCY | i+1 | i+2
GL_TRIANGLES_ADJACENCY | 6i-5 | 6i-1
GL_TRIANGLE_STRIP_ADJACENCY | 2i-1 | 2i+3

The four adjacency types are geometry shader primitives.
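The table can be read as a small lookup function. The sketch below covers only the line and triangle rows, using the table's 1-based numbering for both the primitive index i and the returned vertex index. The `enum prim` values and the function name are illustrative, not OpenGL API names.

```c
#include <assert.h>

/* Illustrative subset of primitive types; not the real GLenum values. */
enum prim {
    PRIM_LINES,
    PRIM_LINE_STRIP,
    PRIM_TRIANGLES,
    PRIM_TRIANGLE_STRIP,
    PRIM_TRIANGLE_FAN
};

/* Return the 1-based index of the provoking vertex of primitive i (1-based).
   first_convention != 0 selects the first-vertex (Direct3D) convention,
   otherwise the last-vertex (native OpenGL) convention is used. */
static int provoking_vertex(enum prim type, int i, int first_convention)
{
    assert(i >= 1);
    switch (type) {
    case PRIM_LINES:          return first_convention ? 2 * i - 1 : 2 * i;
    case PRIM_LINE_STRIP:     return first_convention ? i         : i + 1;
    case PRIM_TRIANGLES:      return first_convention ? 3 * i - 2 : 3 * i;
    case PRIM_TRIANGLE_STRIP: return first_convention ? i         : i + 2;
    case PRIM_TRIANGLE_FAN:   return first_convention ? i + 1     : i + 2;
    }
    return -1; /* unknown primitive type */
}
```

For example, the second independent triangle takes its flat-shaded color from vertex 4 under the first-vertex convention and from vertex 6 under the last-vertex convention.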

Page 26: OpenGL 3.2 and More

Direct3D vs. OpenGL Coordinate System Conventions

• Window origin conventions
  – Direct3D = upper-left origin
  – OpenGL = lower-left origin
• Pixel center conventions
  – Direct3D 9 = pixel centers at integer locations
  – OpenGL and Direct3D 10 = pixel centers at half-pixel locations
    • Makes pixel centers for rasterization “match” texel centers for texturing
• Clip space conventions
  – Direct3D = [-1,+1] for XY, [0,1] for Z
  – OpenGL = [-1,+1] range for XYZ
• Affects
  – How projection matrix is loaded
  – Fragment shaders that access the window position
  – Point sprites have upper-left texture coordinate origin
    • OpenGL already lets application choose lower-left or upper-left

Page 27: OpenGL 3.2 and More

3 APIs, 3 Different Window Space Conventions

• Pixel-center grids and coordinate systems

[Figure: pixel grids for OpenGL (lower-left origin), Direct3D 9, and Direct3D 10 (upper-left origin). Dots mark the pixel sample centers.]

Page 28: OpenGL 3.2 and More

Direct3D 9 to OpenGL

• How to go from Direct3D 9’s
  – [-1,+1]x[-1,+1]x[0,1] clip space to OpenGL’s [-1,+1]^3
  – integer-centered pixel centers to OpenGL’s half-pixel centers
• Simple state adjustment
  – Projection matrix fudge
      glMatrixLoadIdentityEXT(GL_PROJECTION);
      glMatrixScalefEXT(GL_PROJECTION, 1, -1, 2);
      glMatrixTranslatefEXT(GL_PROJECTION, 0.5/windowWidth, 0.5/windowHeight, -0.5);
  – Reverse convention for what is front-facing
      glFrontFace(GL_CW); // OpenGL default is GL_CCW
    • Compensates for y-flip that reverses coordinate system’s handedness
• No need for API additions to support Direct3D 9’s system

Page 29: OpenGL 3.2 and More

Direct3D 10 to OpenGL

• How to go from Direct3D 10’s
  – [-1,+1]x[-1,+1]x[0,1] clip space to OpenGL’s [-1,+1]^3
  – where both APIs have half-pixel centers
• Simple state adjustment
  – Projection matrix fudge
      glMatrixLoadIdentityEXT(GL_PROJECTION);
      glMatrixScalefEXT(GL_PROJECTION, 1, -1, 2);
      glMatrixTranslatefEXT(GL_PROJECTION, 0, 0, -0.5); // no half-pixel shift for Direct3D 10
  – Reverse convention for what is front-facing
      glFrontFace(GL_CW); // OpenGL default is GL_CCW
    • Compensates for y-flip that reverses coordinate system’s handedness
• Again, no need for API additions to support Direct3D 10’s system
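As a rough check of what the Direct3D 10 projection fudge does, the sketch below applies the same composition (scale by (1, -1, 2) times translate by (0, 0, -0.5)) to a homogeneous clip-space point in plain C, with no OpenGL calls. The `vec4` type and the function name are assumptions made for this illustration.

```c
#include <assert.h>

typedef struct { double x, y, z, w; } vec4;

/* Apply M = Scale(1,-1,2) * Translate(0,0,-0.5) to a Direct3D-style
   clip-space point. The translation acts first (z - 0.5*w in homogeneous
   coordinates), then the scale flips y and doubles z. */
static vec4 d3d10_clip_to_gl_clip(vec4 p)
{
    vec4 q;
    assert(p.w != 0.0);
    q.x = p.x;
    q.y = -p.y;                      /* y-flip for the upper-left origin */
    q.z = 2.0 * (p.z - 0.5 * p.w);   /* z/w in [0,1] becomes [-1,+1] */
    q.w = p.w;
    return q;
}
```

After the perspective divide, Direct3D depth 0 lands on OpenGL depth -1 and Direct3D depth 1 lands on +1. The y-flip is the reason the slide also switches the front-face winding to GL_CW.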

Page 30: OpenGL 3.2 and More

Fragment Coordinate Convention Usage

• Typically used in post-processing shaders
  – Examples: motion blur, depth-of-field
• Shader assumes a particular convention for the fragment coordinate origin
  – Attempting to “re-write” Direct3D shader tends to
    • Compromise shader performance
    • Introduce a new “window height” uniform that must always be set correctly
  – Hard to do automatically and robustly
• Robust approach: allow shader author (or automatic translator) to specify convention explicitly

Page 31: OpenGL 3.2 and More

Fragment Shader Coordinate Conventions

• Required GLSL introduction
    #extension GL_ARB_fragment_coord_conventions : require
• Pick one of the following GLSL declarations:
    // “native” OpenGL convention
    in vec4 gl_FragCoord;
    // DirectX 9 convention
    layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord;
    // DirectX 10 convention
    layout(origin_upper_left) in vec4 gl_FragCoord;
• Also supported by NVIDIA assembly extensions
    OPTION ARB_fragment_coord_origin_upper_left;
    OPTION ARB_fragment_coord_pixel_center_integer;
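The effect of the two layout qualifiers on the window position can be sketched as plain arithmetic on the native (lower-left, half-integer) coordinate. This is a reading of the conventions described above, not driver code. The `frag_xy` type, the function name, and the flag parameters are hypothetical.

```c
#include <assert.h>

typedef struct { double x, y; } frag_xy;

/* Remap a native OpenGL window position (lower-left origin, pixel centers
   at half-integers) to the convention selected by the two flags. */
static frag_xy remap_frag_coord(frag_xy native, int window_height,
                                int origin_upper_left,
                                int pixel_center_integer)
{
    frag_xy out = native;
    assert(window_height > 0);
    if (origin_upper_left)
        out.y = (double)window_height - out.y;  /* flip about the window */
    if (pixel_center_integer) {
        out.x -= 0.5;                           /* centers move to integers */
        out.y -= 0.5;
    }
    return out;
}
```

For a 480-pixel-tall window, the bottom-left pixel center (0.5, 0.5) becomes (0.5, 479.5) under the DirectX 10 style declaration and (0, 479) under the DirectX 9 style declaration. A hand-translated shader would otherwise need a "window height" uniform to do this itself.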

Page 32: OpenGL 3.2 and More

Deprecation

there’s “old” & there’s “still supported”

Page 33: OpenGL 3.2 and More

Deprecation – OpenGL ARB view

• OpenGL has never removed features. However, after 15+ years, defining new features to work with old features becomes increasingly difficult

• OpenGL 3.0 marks features as deprecated– OpenGL 3.0 does not remove any features

– Redundant, legacy and obsolete features

– Parts of OpenGL unlikely to be accelerated

– Guidance to developers to prepare for future revisions

Page 34: OpenGL 3.2 and More

Deprecation – OpenGL ARB view

• OpenGL 3.1 removed these deprecated features

• Added support back with ARB_compatibility extension

• OpenGL 3.2 formalized this in two profiles
  – “Core” profile with the deprecated features removed
  – “Compatibility” profile with all features present
• Implementation of “Core” mandatory
  – “Compatibility” optional

Page 35: OpenGL 3.2 and More

Deprecation – NVIDIA view

• Set of removed functionality is in use by applications today, helping our customers’ businesses

• Using just “Core” OpenGL 3.2 is a huge effort in rewriting existing code

• OpenGL 3.2 “Core” not offering enough incentive to re-write existing code

• Deprecation is NOT in the best interest of ISVs and therefore not in NVIDIA’s business interest

Page 36: OpenGL 3.2 and More

Deprecation – NVIDIA view

• We will not remove ANY feature from our drivers

• OpenGL on NVIDIA will be fully backwards compatible

• NVIDIA has and will ship the Compatibility profile

• NVIDIA will fully support, tune and bug fix all features

• See our public statement:http://developer.nvidia.com/object/opengl_3_driver.html

Page 37: OpenGL 3.2 and More

Deprecation – Myths

• Feature removal will result in a faster driver

• Feature removal will result in a higher quality driver

• Feature removal will result in a cleaner API

• Not removing features means OpenGL will die

• Only useless features were deprecated– Far from true

Page 38: OpenGL 3.2 and More

So You Can Just Ignore Deprecation

• NVIDIA values OpenGL API backward compatibility
  – We don’t take API functionality away from you
  – We aren’t going to force you to re-write apps
• Does deprecated functionality “stay fast”?
  – Yes, of course, and it stays fully tested
  – Bottom line: old & new features run fast

Page 39: OpenGL 3.2 and More

Geometry Shaders

per-primitive programmability

Page 40: OpenGL 3.2 and More

Geometry Shaders via OpenGL

• Programmability for geometric primitives
  – One geometric primitive in, zero or more primitives out
• Supported by NVIDIA’s OpenGL driver since GeForce 8 launch
  – NV_gpu_program4 for assembly
  – Cg 2.x’s gp4gp “geometry” profile
  – NV_geometry_shader4 / EXT_geometry_shader4 for GLSL
• Standardized as an ARB extension in OpenGL 3.1 timeframe
  – ARB_geometry_shader4
• Now finally core functionality in OpenGL 3.2
  – Essentially unchanged from EXT and ARB versions

Page 41: OpenGL 3.2 and More

Geometry Shaders

• New programmable shader domain
  – Operates on assembled primitives
    • Triangles, lines, points, and new adjacency primitives
  – Outputs zero or more primitives
    • Must be points, line strips, or triangle strips
    • Primitive restarts allowed
• Warning: not well suited for unbounded tessellation

[Diagram: pipeline stages in order: application, vertex shader, primitive assembly, geometry shader, rasterizer, fragment shader, raster operations, framebuffer. Legend: application programmable.]

Page 42: OpenGL 3.2 and More

Geometry Shader Silhouette Edge Rendering

[Figure: a complete mesh passes through a silhouette edge detection geometry program, leaving only the silhouette edges]

• Useful for non-photorealistic rendering
• Looks like human sketching

Page 43: OpenGL 3.2 and More

More Geometry Shader Examples

• Shimmering point sprites
• Generate fins for lines
• Generate shells for fur rendering

Page 44: OpenGL 3.2 and More

Improved Interpolation

• Using geometry shader functionality

Quadratic normal interpolation

True quadrilateral rendering with mean value coordinate interpolation

Page 45: OpenGL 3.2 and More

“Fair” Quadrilateral Interpolation

    glBegin(GL_QUADS);
    glColor3fv(red);   glVertex3fv(lowerLeft);
    glColor3fv(green); glVertex3fv(lowerRight);
    glColor3fv(red);   glVertex3fv(upperRight);
    glColor3fv(blue);  glVertex3fv(upperLeft);
    glEnd();

• Geometry shader actually operates on 4-vertex GL_LINES_ADJACENCY primitives instead of quads

[Figure: three renderings of the quad. Wrong: slash triangle split. Wrong: backslash triangle split. Better: mean value coordinates.]

Page 46: OpenGL 3.2 and More

Geometry Shader-based Bump Map Setup

• Vertex shader does skinning
  – Problem: how does texture-space basis for bump mapping respond to arbitrary skinning?
  – Solution: geometry shader constructs per-triangle texture-basis using post-skinning vertex positions and normals
• So geometry shader:
  – Computes object-to-texture space basis for triangle
    • Can account for texture mirroring in normal map
  – Transforms object-space vectors to texture space
  – Outputs triangle
• Fragment shader uses texture-space normals for bump map shading

Page 47: OpenGL 3.2 and More

Cg Code

• Shader performs texture-basis setup
• Can compile to GLSL or HLSL 10 code
  – Cg 2.2 feature
• See working example code in Cg 2.2

    TRIANGLE void
    md2bump_geometry(AttribArray<float4> position    : POSITION,
                     AttribArray<float2> texCoord    : TEXCOORD0,
                     AttribArray<float3> objPosition : TEXCOORD1,
                     AttribArray<float3> objNormal   : TEXCOORD2,
                     AttribArray<float3> objView     : TEXCOORD3,
                     AttribArray<float3> objLight    : TEXCOORD4)
    {
      // Edge vectors of the triangle in object space and in texture space
      float3 dXYZdU = objPosition[1] - objPosition[0];
      float2 dSTdU  = texCoord[1] - texCoord[0];
      float3 dXYZdV = objPosition[2] - objPosition[0];
      float2 dSTdV  = texCoord[2] - texCoord[0];
      float3 tangent = normalize(dSTdV.s * dXYZdU - dSTdU.s * dXYZdV);
      // Signed texture-space area detects mirrored texture coordinates
      float area = determinant(float2x2(dSTdV, dSTdU));
      float3 orientedTangent = area >= 0 ? tangent : -tangent;
      for (int i=0; i<3; i++) {
        float3 normal = objNormal[i],
               binormal = cross(tangent, normal);
        float3x3 basis = float3x3(orientedTangent, binormal, normal);
        float3 surfaceLightVector : TEXCOORD1 = mul(basis, objLight[i]);
        float3 surfaceViewVector  : TEXCOORD2 = mul(basis, objView[i]);
        emitVertex(position[i], texCoord[i], surfaceLightVector, surfaceViewVector);
      }
    }

Page 48: OpenGL 3.2 and More

Geometry Shader-based Shadow Volume Generation

[Figure: four panels. Un-shadowed bump-mapped shading via geometry shader texture-space basis setup; shadow volume extrusion by geometry shader; shadow region stencil; multi-pass combination of shadowed and un-shadowed shading.]

Page 49: OpenGL 3.2 and More

Miscellaneous

some other 3.2 goodness

Page 50: OpenGL 3.2 and More

Tripped Up By Near/Far Clipping

• Conventionally 3D APIs “clip” to near & far view frustum planes
  – Results in classic artifacts
    • Geometry is “cut open” by near clip plane
    • Naïvely moving near plane closer poorly distributes depth buffer precision
    • Alternatively, geometry is “lost” beyond the far clip plane

[Figure: two views of an alien model. No clipping problem; closer to the alien, the near clip plane cuts open the alien's head.]

Page 51: OpenGL 3.2 and More

Depth Clamping to the Rescue

• Depth clamping API
  – Easy to enable/disable
      glEnable(GL_DEPTH_CLAMP);
      glDisable(GL_DEPTH_CLAMP);
• What it does
  – Disables near & far clip planes
    • But this allows depth values to interpolate beyond the [0,1] representable range of the depth buffer
  – So additionally clamps interpolated values to [0,1] range
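The clamping step itself is simple arithmetic. A minimal sketch, assuming window-space depth is held in a double, follows. The function name `clamp_depth` is illustrative and is not an OpenGL entry point.

```c
#include <assert.h>

/* Clamp an interpolated window-space depth to the [0,1] range that the
   depth buffer can represent, as depth clamping does once the near and
   far clip planes are disabled. */
static double clamp_depth(double z)
{
    assert(z == z); /* reject NaN, which would fail both comparisons */
    if (z < 0.0) return 0.0;
    if (z > 1.0) return 1.0;
    return z;
}
```

Geometry in front of the near plane therefore ends up at exactly z = 0, and geometry beyond the far plane at exactly z = 1.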

Page 52: OpenGL 3.2 and More

Depth Clamping Applications

• Avoid near plane “cut opens” via depth clamping
  – Fragment shader replaces color of z=0 fragments with black
  – In GLSL:
      if (gl_FragCoord.z == 0) gl_FragColor = vec4(0,0,0,1);
  – Alternatively, use Painter’s algorithm for objects at the near plane
    • Last (or first) fragment at z=0 “wins”
• Infinite Z-fail shadow volumes
  – See [Everett & Kilgard 2002]
  – Conserves depth buffer precision when eye-space infinity must be within depth range

Page 53: OpenGL 3.2 and More

Near Plane Depth Clamping Example

without depth clamping depth clamping enabled *

* simple situation because depth complexity at z=0 is a single layer

Page 54: OpenGL 3.2 and More

Seam-free Cube Map Edges

• Cube maps have edges along each face
  – Traditionally texture mapping hardware simply clamps to these seam edges
• Results in “seam” artifacts
  – Particularly when level-of-detail bias is large
    • Meaning very blurry levels
    • But seams appear sharply
• Use glEnable(GL_TEXTURE_CUBE_MAP_SEAMLESS) to mitigate these artifacts

[Figure: cube map rendering with a visible seam]

Page 55: OpenGL 3.2 and More

Seamless Cube Maps: Before and After

• Before: with edge seams
• After: without seams

Page 56: OpenGL 3.2 and More

Remaining OpenGL 3.2 Features

• Async objects
  – Synchronization of GPU completion
  – Supports synchronization between multiple contexts
• Draw elements base index
  – Provides a base added to all vertex indices
• Multisampled renderbuffers
  – Also can query framebuffer’s sample locations
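The base index feature amounts to adding one offset to every index before the vertex is fetched. The sketch below shows that arithmetic for 16-bit indices. The helper `fetch_index` is hypothetical; in OpenGL 3.2 the offset is passed as the basevertex argument of commands such as glDrawElementsBaseVertex.

```c
#include <assert.h>

/* Compute which vertex is actually fetched for element i when a base
   value is added to every index in the element array. */
static unsigned fetch_index(const unsigned short *indices, int i, int basevertex)
{
    long v;
    assert(indices != 0 && i >= 0);
    v = (long)indices[i] + basevertex;
    assert(v >= 0); /* a negative result would address no valid vertex */
    return (unsigned)v;
}
```

This lets several meshes that each index from zero share one large vertex buffer without rewriting their index data.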

Page 57: OpenGL 3.2 and More

Beyond OpenGL 3.2

NVIDIA’s further contributions

Page 58: OpenGL 3.2 and More

All of OpenGL’s Texture Targets

• Conventional targets
  – 1D texture
  – 2D texture
  – 3D texture
• Special addressing
  – Cube map texture
    • cube face selection
  – Rectangle texture
    • [0..w]x[0..h] range
  – Texture buffer
    • 1D unfiltered buffer objects
• Texture arrays
  – 1D texture array
  – 2D texture array
  – Cube map texture array
• Multisample
  – 2D texture multisample
  – 2D texture array multisample

Page 59: OpenGL 3.2 and More

Bindless Graphics

• NVIDIA keeps building faster and faster GPUs
  – But that x86 core feeding the GPU isn’t getting faster at anything near the same rate!
  – Makes your application more & more likely to be CPU limited, instead of GPU limited
• Bundling OpenGL state in objects helps
  – But time goes on… GPUs keep getting faster…
• Eventually even binding to objects becomes a bottleneck
  – Hence the desire for “bindless” graphics
• Extensions:
  – NV_vertex_buffer_unified_memory (VBUM) for bindless vertex pulling
  – NV_shader_buffer_load (SBL) for bindless buffer loads from shaders

Page 60: OpenGL 3.2 and More

“Classic” OpenGL 1.0 Model

[Diagram: Application, Driver, GPU command buffer, GPU, Video memory. Labels: wide stream of commands, wide interconnect.]

• OpenGL commands contain data directly
  – Examples: immediate mode vertices, pixels to draw, downloaded texels
• Inefficient
  – All data flows through the CPU
  – GPU can’t access the data directly from video memory

Page 61: OpenGL 3.2 and More

Object Bind Model of OpenGL 2.x/3.x

• OpenGL commands “name” objects to use
  – Objects allow GPU to access object data (texels, vertices, pixels, constants, etc.) via fast video memory directly
• Driver must look up and access object’s vital information
  – Tends to generate lots of cache misses
  – Cache misses are the bane of modern, fast CPUs

[Diagram: Application, Driver, GPU command buffer, GPU, Video memory, System memory. Labels: narrow stream of commands, wide interconnect, expensive stream of cache misses.]

Page 62: OpenGL 3.2 and More

Bindless Graphics Model of OpenGL

• OpenGL commands and shaders can use GPU addresses of buffers
  – So driver doesn’t have to translate to addresses
  – & doesn’t take cache misses
• GPU addresses for
  – Vertex buffer offsets
  – Constant loads from buffers within shaders

[Diagram: Application, Driver, GPU command buffer, GPU, Video memory. Labels: narrow stream of commands, wide interconnect, feedback of GPU address at creation time.]

Page 63: OpenGL 3.2 and More

Direct State Access

• Existing OpenGL model
  – Bind-to-edit, bind-to-query, bind-to-use
    • One bind operation for all three purposes
  – To change a GL object, you must first “bind” to it
    • Example
        glBindTexture(GL_TEXTURE_2D, obj);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    • Bind-to-edit leads to unnecessary re-validations
• NEW additional Direct State Access (DSA) approach
  – Edit-by-name
  – To change a GL object, name the object to change
    • Example
        glTextureParameteriEXT(obj, GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
• Extension: EXT_direct_state_access

Page 64: OpenGL 3.2 and More

What is the root of the problem?

• “Selectors”
  – OpenGL state that tells which state other OpenGL commands should update
    • Think of selectors as “sticky” phantom parameters to all your matrix, texture, program, buffers, etc. commands and queries
  – Examples of selectors
    • glMatrixMode
    • glActiveTexture
    • glBindTexture
    • glBindProgramARB
    • glUseProgram
  – glActiveTexture and glBindTexture are two distinct selectors for texture commands, which is extra confusing

Page 65: OpenGL 3.2 and More

Reasons to Avoid Selectors

• Direct3D has an “edit-by-name” model of operation
  – Means Direct3D has no selectors
  – Having to manage selectors when porting Direct3D or console code to OpenGL is awkward
    • Requires deferring updates to minimize selector and object bind changes
• Layered libraries can’t count on selector state
  – To be safe when updating state controlled by selectors, such libraries must use this idiom
    • Save selector, set selector, update state, restore selector
    • Bad for performance, particularly bad for dual-core drivers since queries are expensive
  – Cg 2.2 October 2009 makes use of DSA automatically when available

Page 66: OpenGL 3.2 and More

Direct State Access Advantages

• Less error-prone
  – Consider this code
      glRotatef(phi, x,y,z);
  – Which matrix did you change?
    • Depends on how the matrix mode selector was last left!
  – Instead consider the DSA version
      glMatrixRotatefEXT(GL_MODELVIEW, phi, x,y,z);
• Another example
  – Consider this code
      glActiveTexture(GL_TEXTURE3);
      some_function();
      glBindTexture(GL_TEXTURE_2D, 89);
    • But what if some_function calls glActiveTexture?
    • It might not now, but could in the future!
  – Instead use
      glBindMultiTextureEXT(GL_TEXTURE3, GL_TEXTURE_2D, 89);
    • Problem solved!
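The selector hazard can be reproduced with a toy state machine that has nothing to do with a real OpenGL context. In the sketch below, every type and function is invented for illustration: `toy_rotate` stands in for a selector-based command such as glRotatef, and `toy_matrix_rotate` for an edit-by-name command such as glMatrixRotatefEXT.

```c
#include <assert.h>

enum { TOY_MODELVIEW = 0, TOY_PROJECTION = 1, TOY_MATRIX_COUNT = 2 };

typedef struct {
    int matrix_mode;              /* the "sticky" selector */
    int edits[TOY_MATRIX_COUNT];  /* how many edits each matrix received */
} toy_context;

/* Selector command: changes which matrix later commands will edit. */
static void toy_matrix_mode(toy_context *c, int mode)
{
    assert(mode >= 0 && mode < TOY_MATRIX_COUNT);
    c->matrix_mode = mode;
}

/* Selector-based edit: the target is whatever the selector says. */
static void toy_rotate(toy_context *c)
{
    c->edits[c->matrix_mode]++;
}

/* Edit-by-name: the target is an explicit argument, selector untouched. */
static void toy_matrix_rotate(toy_context *c, int mode)
{
    assert(mode >= 0 && mode < TOY_MATRIX_COUNT);
    c->edits[mode]++;
}

/* A helper that quietly leaves the selector changed, like some_function. */
static void toy_helper(toy_context *c)
{
    toy_matrix_mode(c, TOY_PROJECTION);
}
```

If a caller that expects the modelview matrix to be selected calls `toy_helper` and then `toy_rotate`, the projection matrix is edited instead. `toy_matrix_rotate(&ctx, TOY_MODELVIEW)` hits the intended matrix regardless of what the helper did.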

Page 67: OpenGL 3.2 and More

Direct State Access Advantages

• More efficient layered libraries
  – Consider a library that uses OpenGL commands to create a texture object from an image file
    • Example: loadPNGtoGLtexture(GLuint texobj, …);
    • Ideally, calling loadPNGtoGLtexture shouldn’t disturb the current bound texture
  – Preserving the current bound texture requires a save-selector/change-state/restore-selector idiom
      GLint saved_current_binding;
      glGetIntegerv(GL_TEXTURE_BINDING_2D, &saved_current_binding);
      glBindTexture(GL_TEXTURE_2D, texobj);
      // now you can change texobj with bind-to-edit commands
      glBindTexture(GL_TEXTURE_2D, saved_current_binding);
  – But save/change/restore undermines dual-core OpenGL operation
    • Because GL queries of the selector sync the app and driver threads
  – DSA routines avoid disturbing selectors
    • Cg 2.2 October 2009 is an example of such a library

Page 68: OpenGL 3.2 and More

Latched State

• Direct State Access solves another problem
  – Some OpenGL state is “latched” by subsequent commands
  – Think of latched state as phantom parameters to commands that come from the OpenGL state
    • Examples: pixel store (pack/unpack) state, vertex array state
• Provides new commands
  – glPushClientAttribDefaultEXT command
    • Like glPushClientAttrib but also resets affected state to default
  – Fast and efficient

Page 69: OpenGL 3.2 and More

Copy Image

• Fast copies of pixels between image objects
  – 1D textures, 2D textures, 3D textures, cube maps, texture rectangles, 1D texture arrays, 2D texture arrays, cube map texture arrays, & render-buffers all work
• Pixel data can be 1D, 2D, or 3D
• Best part
  – Image objects can belong to distinct OpenGL rendering contexts
  – Even when contexts do not share objects!
  – Even when the contexts are on different physical GPUs in the system
• Extension: NV_copy_image

Page 70: OpenGL 3.2 and More

Basic Copy Image Command

• Basic prototype, for within a context

void glCopyImageSubDataNV(
    GLuint srcName, GLenum srcTarget, GLint srcLevel,
    GLint srcX, GLint srcY, GLint srcZ,
    GLuint dstName, GLenum dstTarget, GLint dstLevel,
    GLint dstX, GLint dstY, GLint dstZ,
    GLsizei width, GLsizei height, GLsizei depth);

• Argument groups:
– source arguments (src*)
– destination arguments (dst*)
– sub-image dimensions (width, height, depth)
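How the three argument groups fit together can be illustrated with a CPU-side reference copy. This is only a sketch in plain C and not part of NV_copy_image: `ToyImage` and `toy_copy_image_sub_data` are hypothetical, images are assumed to be tightly packed with a single mipmap level, and the source and destination are assumed to be distinct images.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* CPU-side illustration only; these names are hypothetical. */
typedef struct {
    int width, height, depth;  /* image dimensions in texels */
    size_t texel_size;         /* bytes per texel */
    unsigned char *data;       /* tightly packed: x fastest, then y, then z */
} ToyImage;

/* Byte offset of texel (x, y, z) inside a tightly packed image. */
static size_t toy_offset(const ToyImage *img, int x, int y, int z)
{
    return (((size_t)z * img->height + y) * img->width + x) * img->texel_size;
}

/* Nonzero if the box at (x, y, z) of size w x h x d lies inside img. */
static int toy_box_fits(const ToyImage *img, int x, int y, int z,
                        int w, int h, int d)
{
    return x >= 0 && y >= 0 && z >= 0 &&
           x + w <= img->width && y + h <= img->height && z + d <= img->depth;
}

/* Copies a width x height x depth box from src at (srcX, srcY, srcZ)
   to dst at (dstX, dstY, dstZ), one row at a time.
   Returns 0 on success, -1 if the box falls outside either image,
   a dimension is negative, or the texel sizes differ.
   Assumes src and dst do not alias. */
int toy_copy_image_sub_data(const ToyImage *src, int srcX, int srcY, int srcZ,
                            ToyImage *dst, int dstX, int dstY, int dstZ,
                            int width, int height, int depth)
{
    if (src->texel_size != dst->texel_size) return -1;
    if (width < 0 || height < 0 || depth < 0) return -1;
    if (!toy_box_fits(src, srcX, srcY, srcZ, width, height, depth)) return -1;
    if (!toy_box_fits(dst, dstX, dstY, dstZ, width, height, depth)) return -1;

    for (int z = 0; z < depth; z++) {
        for (int y = 0; y < height; y++) {
            memcpy(dst->data + toy_offset(dst, dstX, dstY + y, dstZ + z),
                   src->data + toy_offset(src, srcX, srcY + y, srcZ + z),
                   (size_t)width * src->texel_size);
        }
    }
    return 0;
}
```

A 2D copy is the case `depth == 1` with `srcZ == dstZ == 0`; a 1D copy additionally has `height == 1`.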

Page 71: OpenGL 3.2 and More

Texture Barrier

• Background
– Framebuffer objects allow rendering into textures
– Nothing keeps you from sampling a texture you are also rendering into, though the behavior is specified to be undefined

• Provides a mechanism to avoid read-after-write hazards when rendering into a bound texture
– In limited circumstances
• Reads (including all filtered samples) and writes are to/from disjoint pixels
• There is only a single read and write of a pixel by a fragment shader “over” that pixel without an intervening glTextureBarrierNV() command

• Extension: NV_texture_barrier

Page 72: OpenGL 3.2 and More

Improved: Parameter Buffer Object

• Parameter buffer objects give shaders access to values stored in buffers
– Also called constant or uniform buffers
– Supported by Cg 2.2’s BUFFER semantics

• Originally just 32-bit scalars or 32-bit 4-component vectors
– Now 1, 2, 4, 8, or 16 byte accesses allowed

• Extension: NV_parameter_buffer_object2

Page 73: OpenGL 3.2 and More

Separate Shader Objects

• Combining different GLSL shaders at once

– Needed linking

• Better to allow mixing and matching of shader objects

– Like Direct3D
– Like OpenGL assembly extensions

• Extension: EXT_separate_shader_objects (SSO)

[Figure: mix-and-match example. Different GLSL vertex shaders (wobbly torus, smooth torus) combined with different GLSL fragment shaders (specular brick bump mapping, red diffuse).]

Page 74: OpenGL 3.2 and More

Separate Shader Object Binding

• Per-domain binding
glUseShaderProgramEXT(GL_VERTEX_SHADER, vprog);
glUseShaderProgramEXT(GL_GEOMETRY_SHADER, gprog);
glUseShaderProgramEXT(GL_FRAGMENT_SHADER, fprog);

– Uses a linked program object, but only the portion of that linked program for the specified domain

• Introduces selector for glUniform calls
glActiveProgramEXT(program_updated_by_glUniform);

– Better to use DSA’s selector-free glProgramUniform*EXT commands

Page 75: OpenGL 3.2 and More

glUseProgram Equivalence

• Question: What does the existing glUseProgram call “mean” in the context of SSO?

glUseProgram(glsl_prog);

• Answer: It is exactly equivalent to these calls:
glUseShaderProgramEXT(GL_VERTEX_SHADER, glsl_prog);
glUseShaderProgramEXT(GL_GEOMETRY_SHADER, glsl_prog);
glUseShaderProgramEXT(GL_FRAGMENT_SHADER, glsl_prog);
glActiveProgramEXT(glsl_prog);
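The equivalence can be pictured as four independent pieces of binding state. The following is a toy model in plain C rather than real OpenGL: `ToyBindings`, `ToyDomain`, and the `toy_*` functions are hypothetical names that only mimic what glUseShaderProgramEXT, glActiveProgramEXT, and glUseProgram update.

```c
#include <assert.h>

/* Toy model of program bindings; NOT real OpenGL, all names hypothetical. */
typedef enum {
    TOY_VERTEX,
    TOY_GEOMETRY,
    TOY_FRAGMENT,
    TOY_NUM_DOMAINS
} ToyDomain;

typedef struct {
    unsigned current[TOY_NUM_DOMAINS];  /* program used for each domain */
    unsigned active_program;            /* program updated by glUniform-style calls */
} ToyBindings;

/* Mimics glUseShaderProgramEXT: rebinds one domain and nothing else. */
void toy_use_shader_program(ToyBindings *b, ToyDomain domain, unsigned program)
{
    assert(domain < TOY_NUM_DOMAINS);
    b->current[domain] = program;
}

/* Mimics glActiveProgramEXT: sets the selector for uniform updates. */
void toy_active_program(ToyBindings *b, unsigned program)
{
    b->active_program = program;
}

/* Mimics glUseProgram under SSO: every domain plus the uniform selector. */
void toy_use_program(ToyBindings *b, unsigned program)
{
    toy_use_shader_program(b, TOY_VERTEX, program);
    toy_use_shader_program(b, TOY_GEOMETRY, program);
    toy_use_shader_program(b, TOY_FRAGMENT, program);
    toy_active_program(b, program);
}
```

In this model, calling `toy_use_program(&b, 7)` and then `toy_use_shader_program(&b, TOY_FRAGMENT, 9)` swaps only the fragment domain; the vertex and geometry bindings and the uniform selector still refer to program 7.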

Page 76: OpenGL 3.2 and More

Convenient 1-Step Single-domain Shader Loading

• GLSL requires elaborate multi-step API for compiling/linking a shader
– Overkill for separate shader objects
– Desirable to have an API more like glProgramStringARB

• 1-Step command
glCreateShaderProgramEXT(GLenum domain, const char *shader_string);

• Just a convenience function
– You don’t have to use it for SSO
– You can still create separate shaders with multi-step API
– Sometimes necessary for binding attributes and fragment out locations

Page 77: OpenGL 3.2 and More

glCreateShaderProgramEXT

Equivalent to:

const GLuint shader = glCreateShader(type);
if (shader) {
    const GLint len = (GLint) strlen(string);
    glShaderSource(shader, 1, &string, &len);
    glCompileShader(shader);
    const GLuint program = glCreateProgram();
    if (program) {
        GLint compiled = GL_FALSE;
        glGetShaderiv(shader, GL_COMPILE_STATUS, &compiled);
        if (compiled) {
            glAttachShader(program, shader);
            glLinkProgram(program);
            glDetachShader(program, shader);
        }
        // Possibly...
        if (active-user-defined-varyings-in-linked-program) {
            append-error-to-info-log
            set-program-link-status-false
        }
        append-shader-info-log-to-program-info-log
    }
    glDeleteShader(shader);
    return program;
} else {
    return 0;
}

Page 78: OpenGL 3.2 and More

Passing Varyings Between Separate Shader Objects

• Programs in separate domains should pass varyings through built-in varyings (NOT user-specified varyings)
– So instead of varying vec4 my_varying;
– Use a built-in such as gl_TexCoord[0]
– Guarantees up-stream and down-stream domains rendezvous with the same value

• Use of user-declared varyings is undefined
• Compiling Cg code to GLSL profiles guarantees this is the case
– Cg has semantics to indicate how varyings correspond to API resources
– Example Cg declaration: float4 my_varying : TEXCOORD0;

Page 79: OpenGL 3.2 and More

Thoughts of OpenGL Future

what direction now?

Page 80: OpenGL 3.2 and More

Where Do OpenGL Extensions Come From?

[Pie chart: share of OpenGL extensions by originator. Legend, with extension prefixes where shown: Multi-vendor (EXT), Silicon Graphics (SGI, SGIS, SGIX), Architectural Review Board (ARB), NVIDIA (NV), ATI (ATI), Apple (APPLE), Mesa3D (MESA), Sun Microsystems, OpenGL ES, OpenML, IBM, Intense3D, Hewlett Packard, 3Dfx, Other.]

• 44% of extensions are “core” or multi-vendor
• Lots of vendors have initiated extensions
• Extending OpenGL is industry-wide collaboration

Source: http://www.opengl.org/registry (Dec 2008)

Page 81: OpenGL 3.2 and More

What’s Driving OpenGL Modernization?

• Human desire for Visual Intuition and Entertainment
– Particularly interactive video games
• Embarrassing Parallelism of Graphics
– Particularly the hardware-amenable, latency tolerant nature of rasterization
• Increasing Semiconductor Density

Page 82: OpenGL 3.2 and More

Conclusions

• NVIDIA’s OpenGL driver leads the industry
– Functional, performance, & semantic parity with Direct3D
– NVIDIA provides OpenGL 3.2 now

• If past is prologue…
– NVIDIA OpenGL extensions: where to-be-core functionality shows up first
• Get a head-start by using the functionality now
– All new GPU functionality exposed for OpenGL in first shipping NVIDIA driver

Page 83: OpenGL 3.2 and More

More Information

• NVIDIA OpenGL 3.2 driver
– Available now!
– http://developer.nvidia.com/object/opengl_3_driver.html

• OpenGL 3.2 specification– http://www.opengl.org/registry/doc/glspec32.compatibility.20090803.pdf

• NVIDIA’s OpenGL extension registry– http://developer.nvidia.com/object/nvidia_opengl_specs.html

• Cg Toolkit 2.2 October 2009
– Includes geometry shader examples shown here

– http://developer.nvidia.com/object/cg_toolkit.html

Page 84: OpenGL 3.2 and More

Links to Specific Extension Specifications

• Provoking Vertex
• Vertex Array BGRA
• Depth Clamp
• Texture Multisample
• Seamless Cube Map
• Fragment Coordinate Conventions
• Synchronization Objects
• Geometry Shaders
• Bindless graphics
– Shader Buffer Load
– Vertex Buffer Unified Memory
• Direct State Access
• Separate Shader Objects
• Copy Image
• Texture Barrier
• Draw Elements Base Vertex