Direct3D 9 Or why programmable hardware kicks ass Matthew M Trentacoste.
Date posted: 22-Dec-2015
Introduction
► The Direct3D API has changed fundamentally to meet changes in hardware
► The API has been adjusted to fit the paradigm shift that has occurred in real-time graphics
► It is adapted to the fact that someone programming real-time graphics is writing code for 2 asymmetric processors, the GPU and the CPU
Differences
► OpenGL is not so much bad as outdated
    ► OpenGL was very well designed, but that was 15 years ago
    ► Everything that has happened in real-time graphics since then has been stapled on
► Direct3D is more of a pain for beginners learning to program graphics
    ► But much more elegant once you are experienced enough to fully utilize the functionality provided
► Much less of a state machine than OpenGL
Differences (2)
► Has no immediate mode; you can’t just specify vertices, colors, etc. directly from code
► Built around a stream-based model of data
    ► All data must be put into a buffer of elements to be loaded onto the hardware
    ► Tries to gracefully give control of the flow of data between CPU and GPU while still being efficient
► API is getting there; streamlining of functionality means fewer objects to accomplish all tasks
Direct3D 9 API Object List
► IDirect3DSwapChain9           (back buffers)
► IDirect3DTexture9             (textures)
► IDirect3DVolumeTexture9       (volume textures)
► IDirect3DVertexBuffer9        (vertex lists)
► IDirect3DIndexBuffer9         (index lists)
► IDirect3DSurface9             (render targets)
► IDirect3DStateBlock9          (render state container)
► IDirect3DVertexDeclaration9   (vertex format)
► IDirect3DVertexShader9        (vertex shader)
► IDirect3DPixelShader9         (pixel shader)
Other Reason D3D Rocks
► D3DX!!!!
    ► All the math you could possibly need for graphics, already written
    ► Vectors, matrices, quaternions, textures, models, etc.
    ► Optimized code using all the special instruction sets (3DNow!, SSE2, and whatnot)
    ► Best solution for almost anything you could want to do, unless it is some crazy special case
Cool Shit
► Still with me?
► High-order primitives
    ► Adaptive tessellation
    ► Displacement maps
► And pretty pictures of them
Higher Order Primitives
► Current primitives are not ideal for representing smooth surfaces
► Direct3D 9 supports points, lines, triangles, and grid primitives
► Higher-order interpolation methods, such as cubic polynomials, allow more accurate calculations in rendering curved shapes
► The application need only provide a desired level of tessellation
► Transmit the data using standard triangle syntax that includes normal vectors
Adaptive Tessellation
► Adaptively tessellates a patch, based on the depth value of the control vertex in eye space
► Tessellation level computed per-vertex
    ► From API value scaled by 1.0 / Zeye
► Then the surface is tessellated accordingly
► API takes triangles, defines high-order surfaces from them, and then tessellates those surfaces as needed
► Meaning: more detail the closer you get
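The per-vertex rule above (API value scaled by 1.0 / Zeye) can be sketched on the CPU. This is only an illustration: `tessellationLevel` is a hypothetical helper, not a Direct3D call, and the clamp to a minimum and maximum level is an assumption added so the result stays in a usable range.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not part of Direct3D: per-vertex tessellation level
// computed as the API-supplied value scaled by 1 / Zeye.
// The [minLevel, maxLevel] clamp is an assumption of this sketch.
float tessellationLevel(float apiLevel, float zEye,
                        float minLevel, float maxLevel) {
    assert(zEye > 0.0f);            // the control vertex is in front of the eye
    assert(minLevel <= maxLevel);
    float level = apiLevel / zEye;  // closer vertices get a larger level
    return std::min(std::max(level, minLevel), maxLevel);
}
```

With an API value of 8, a vertex at eye depth 2 gets level 4, while a very distant vertex falls back to the minimum level.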
Demo Time #1
Displacement Mapping
► Adaptive tessellation enables us to use a texture to deform a surface
► A texture of a height field is spread across a high-order surface
► Tessellates the surface until the detail of the geometry is high enough to represent the height field
► Changes the shape of the surface to match the displacement, as opposed to merely modifying the surface normal vector to appear like a deformed surface
► What bump maps wish they were
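The difference from bump mapping is that the vertex itself moves. A minimal CPU sketch of that step, assuming the displacement is applied along the unit normal and that `scale` is an application-chosen amplitude (both names are illustrative, not from the slides):

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Moves a tessellated vertex along its unit normal by the sampled height.
// `height` is a height-field sample in [0, 1]; `scale` is a hypothetical
// amplitude chosen by the application.
Vec3 displace(Vec3 position, Vec3 unitNormal, float height, float scale) {
    assert(height >= 0.0f && height <= 1.0f);
    float d = height * scale;
    return { position.x + unitNormal.x * d,
             position.y + unitNormal.y * d,
             position.z + unitNormal.z * d };
}
```

A bump map would leave `position` untouched and only perturb the normal used for lighting.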
Demo Time #2
DirectX Graphics Architecture
[Diagram labels. Geometry ops: VB, vector data (Vec0 to Vec3), primitive ops, vertex shader, vertex components (Pos, Color, TC1, TC2). Pixel ops: samplers (Tex0 to Tex2), pixel shader, output pixels, image surface.]
Pipeline Overview
► Create VertexBuffer      (where model goes)
► Set up Vertex Stream     (put model there)
► Define VertexDecl        (what data means)
► Vertex Shader Object     (operate on model)
► Pixel Shader Object      (render model)
► FrameBuffer blender      (add image of model to scene)
Vertex Declaration Object
► New syntax for describing vertex formats for DMA engine and tessellator behavior
► New object IDirect3DVertexDeclaration9
► Separately creatable
    ► CreateVertexDeclaration()
► Separately settable
    ► SetVertexDeclaration()
    ► Settable independent of vertex shader
Default Semantics
► VertexDecl now supports a “usage” field
    ► Position, Normal, Tangent, Binormal, etc.
► Provided to enable default semantics
► Allows the implementation to connect shaders together without requiring a fixed register convention
► Acts as a symbol table for run-time linking of shaders to the core API and therefore the hardware
► No additional policy is imposed over DirectX 8
    ► Default semantics can be overridden
► Deals with concepts, not memory addresses
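To make the "symbol table" idea concrete, here is a small sketch of looking up vertex data by meaning instead of by register or address. The `Usage`, `Element` and `findElement` names are hypothetical stand-ins for illustration; they are not the Direct3D types, and the example layout is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the Direct3D types.
enum class Usage { Position, Normal, Diffuse, TexCoord };

struct Element {
    int         stream;      // which vertex stream holds the data
    std::size_t offset;      // byte offset inside one vertex of that stream
    Usage       usage;       // what the data means
    int         usageIndex;  // distinguishes e.g. texcoord0 from texcoord1
};

// Looks up an element by meaning rather than by register or address.
// Returns nullptr when the declaration has no such element.
const Element* findElement(const std::vector<Element>& decl,
                           Usage usage, int usageIndex) {
    assert(usageIndex >= 0);
    for (const Element& e : decl) {
        if (e.usage == usage && e.usageIndex == usageIndex) return &e;
    }
    return nullptr;
}

// Made-up layout: position and normal in stream 0, diffuse color in stream 1.
std::vector<Element> exampleDeclaration() {
    return {
        {0, 0,  Usage::Position, 0},  // 3 floats at the start of the vertex
        {0, 12, Usage::Normal,   0},  // 3 floats after the position
        {1, 0,  Usage::Diffuse,  0},  // color kept in a separate stream
    };
}
```

A shader that asks for "normal 0" is then connected to stream 0, offset 12, without either side agreeing on a fixed register number in advance.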
DirectX 8 Vertex Declaration
[Diagram labels: vertex layout in streams Strm0 and Strm1; declaration mapping the layout to registers (v0, skip, v1); shader program; shader handle]
vs 1.1
mov r0, v0
…
New Vertex Declaration
[Diagram labels: vertex layout in streams Strm0 and Strm1; declaration naming the elements (pos, norm, diff); shader program (shader handle)]
vs 1.1
dcl_position v0
dcl_diffuse v1
mov r0, v0
…
Vertex Shader Architecture
[Diagram labels: inputs Vec0 … Vec15; constants Const0 … Const95; temporaries R0 … R11; address register A0; vertex ALU; outputs Hpos, Color0, Color1, TC0 … TC3]
Vertex Shaders: Vertex Shader 2.0 Register Reference
Name   r/w?   Description    Count   Port Count
bn     r      Boolean        16      1
in     r/w    Loops-3        16      1
an     *      4-D Address    1       1
cn     r      Constant       256     1
rn     r/w    Temporary      12      3
vn     r      Vertex input   16      1
* an can only be written to by mov, and the result is used as an integer offset in relative addressing
Note: Port Count = number of times a different register of that class can be used in a single instruction
Math Instructions
► Parallel ops (componentwise): add, sub, mul, mad, frc, cmp
► Vector ops: dp3, dp4
► Scalar ops: rcp, rsq, exp2, log2
► Macros: LRP, NRM3, POW, CRS, SINCOS, SGN, ABS
Vertex Shaders: Instruction Reference
max    Maximum
min    Minimum
sge    Set on greater than or equal
slt    Set on less than
rcp    Reciprocal
rsq    Reciprocal square root
expp   Exponential, 16-bit precision
logp   Logarithm, 16-bit precision
Vertex Shader Flow Control
► DirectX 9 vertex shaders (vs 2.0) support flow control
► Result is a “Structured Assembly” language
► Control logic based on constants only
► Required by ISVs to solve
    ► Enable/disable environment mapping, etc.
    ► The “varying # of lights” problem
    ► Brings support equal to the nonprogrammable pipeline
► Ideally a better skinning approach
    ► The “varying # of bones” problem
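The "varying # of lights" problem can be pictured as a loop whose trip count comes from a constant set by the application, so one program serves any light count. The sketch below is a CPU analogue only; `Light`, `kMaxLights` and `accumulateDiffuse` are illustrative names, and the limit of 8 is an arbitrary assumption.

```cpp
#include <cassert>

struct Light {
    float intensity;
    float nDotL;  // precomputed dot product of surface normal and light direction
};

constexpr int kMaxLights = 8;  // arbitrary limit assumed for this sketch

// CPU analogue of a constant-driven shader loop: `lightCount` plays the
// role of an integer constant set by the application before drawing.
float accumulateDiffuse(const Light* lights, int lightCount) {
    assert(lightCount >= 0 && lightCount <= kMaxLights);
    assert(lights != nullptr || lightCount == 0);
    float sum = 0.0f;
    for (int i = 0; i < lightCount; ++i) {
        // Lights behind the surface contribute nothing.
        float nDotL = lights[i].nDotL > 0.0f ? lights[i].nDotL : 0.0f;
        sum += lights[i].intensity * nDotL;
    }
    return sum;
}
```

Because the count is a constant rather than a per-vertex value, every vertex in a draw call takes the same path, which matches the "control logic based on constants only" restriction.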
Instruction Counts vs. Slots
► Flow control means slots != counts
► Instruction store is 256, but more instructions can be executed than are stored
► Executed instruction count limit is higher
    ► Recommended not to exceed 1024
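A little arithmetic shows why slots and counts diverge once loops exist. The helpers below are hypothetical and ignore any overhead of the loop instructions themselves; the numbers in the usage note are made up to fit the 256 and 1024 figures above.

```cpp
#include <cassert>

// Slots: what the program occupies in the instruction store.
int usedSlots(int straightLineSlots, int loopBodySlots) {
    assert(straightLineSlots >= 0 && loopBodySlots >= 0);
    return straightLineSlots + loopBodySlots;
}

// Count: what actually executes when the loop body runs `iterations` times.
// Loop bookkeeping instructions are ignored in this sketch.
int executedInstructions(int straightLineSlots, int loopBodySlots,
                         int iterations) {
    assert(straightLineSlots >= 0 && loopBodySlots >= 0 && iterations >= 0);
    return straightLineSlots + loopBodySlots * iterations;
}
```

A program with 56 straight-line instructions and a 200-instruction loop body fills exactly 256 slots; four iterations execute 856 instructions, and a fifth pushes the count to 1056, past the recommended 1024.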
Sampler State Separation
► TextureStageState (TSS) has been split
    ► One category for texture sampler data
    ► One category for texture iterator control
► Why?
    ► Sampler state has 16 elements, as 16 textures may be sampled in one pass
    ► Other state has only 8 elements
    ► Much of this state is for legacy pipelines
► All enum indices remain the same
    ► DDI impact is minimal
Pixel Shaders
► Float data precision supported
► Enables photoreal rendering of high-dynamic-range scenes (cf. Debevec)
► Pixel shader ALU must support
    ► At least s10e5 precision for color data
    ► At least s17e6 precision for all other data
        ► Any input data of 32-bit float, such as texture iterators or reads of 32-bit float texture formats
► _pp modifier supported on any instruction
    ► Highlights operations where reduced precision is acceptable for performance
Demo Time #3
Pixel Shader 2.0 Architecture
[Diagram labels: texture coordinates t0 … t7; colors v0, v1; constants c0 … c31; temporaries r0 … r11; pixel ALU; outputs oC0 … oC3]
Pixel Shaders: Pixel Shader 2.0 Register Reference
► Port Count = # of times different registers of the same class can be used in one instruction
Name   r/w?   Description          Count   Port Count
vn     r      Color                2       1
tn     r/w    Texcoord iterators   8       1
rn     r/w    Temporary            12      3
cn     r      Constant             32      1*
sn     r      Texture samplers     16      1
Texture Load Instructions
► 3 instructions provided in ps_2_0
► Standard texture load:
    texld r0, t1, s3
► Texture load with per-pixel LOD bias:
    texldb r0, t0, s2
    ► Bias value stored in t0.w
► Projected texture load:
    texldp r1, t2, s0
    ► Does perspective divide before lookup
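The perspective divide of the projected load can be written out explicitly. This sketch assumes the divide is by the coordinate's w component; `Float4`, `Float2` and `projectedCoord` are illustrative names, not shader or API identifiers.

```cpp
#include <cassert>

struct Float4 { float x, y, z, w; };
struct Float2 { float u, v; };

// The divide a projected load performs before the lookup, assuming the
// divisor is the w component of the texture coordinate.
Float2 projectedCoord(Float4 t) {
    assert(t.w != 0.0f);
    return { t.x / t.w, t.y / t.w };
}
```

A standard load would use (x, y) directly; the projected form saves an explicit reciprocal and multiply in the shader.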
Dependent Reads
► Can be serialized, but only to a max depth of 4:
    dcl t0.xy
    dcl_2d s0
    dcl_2d s1
    texld r0, t0, s0
    texld r1, r0, s1
    texld r2, r1, s1
    texld r3, r2, s1
► Is legal
Dependent Reads Rock
► What’s so great?
► Textures become functional maps
► Any continuous function that takes up to 3 inputs and produces up to 4 outputs can be stored as a texture
    ► Pre-compute results and store them in a texture
    ► Load the texture at the coordinates of the input
    ► Returns the output as the value at that point
Dependent Reads Rock (2)
► Allows results far too complicated to be calculated in real time to be used on the GPU at minimal cost
► Stop thinking of textures as mere images; think of them as stores of data
► Lookup tables, noise generators, and most arbitrary functions can all be emulated quickly on current hardware
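The bake-then-sample idea works the same way on the CPU, which makes it easy to try out. The sketch below stores f(x) = x * x in a 256-entry table standing in for a 1-D texture and reads it back with nearest-texel lookup; the table size, the function and the names `bakeSquare` and `sample` are all assumptions chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kTexels = 256;  // hypothetical 1-D texture width

// "Bake" step: precompute f(x) = x * x for x in [0, 1] into a table.
std::array<float, kTexels> bakeSquare() {
    std::array<float, kTexels> table{};
    for (std::size_t i = 0; i < kTexels; ++i) {
        float x = static_cast<float>(i) / static_cast<float>(kTexels - 1);
        table[i] = x * x;
    }
    return table;
}

// "Sample" step: nearest-texel lookup at coordinate u in [0, 1].
float sample(const std::array<float, kTexels>& table, float u) {
    assert(u >= 0.0f && u <= 1.0f);
    std::size_t i =
        static_cast<std::size_t>(u * static_cast<float>(kTexels - 1) + 0.5f);
    return table[i];
}
```

Feeding one lookup's result into the next, as in `sample(t, sample(t, u))`, is the CPU counterpart of a dependent read. Real hardware would also filter between texels, which this sketch leaves out.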
Multi-Render Target (MRT)
► Step towards rationalizing textures and vertex buffers
► Allows writing out multiple values from a single pixel shader pass
    ► Up to 4 color elements plus Z/depth
    ► Facilitates multipass algorithms
► A pixel shader can output 4 vector-4s + depth for each pixel
► That is 17 floating-point values that can be stored per pixel
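The count of 17 follows directly from the layout: four 4-component colors plus one depth value. A hypothetical CPU-side picture of one pixel's output, with illustrative type names:

```cpp
#include <cassert>
#include <cstddef>

struct Float4 { float x, y, z, w; };

// Hypothetical picture of what one pixel shader pass can emit with four
// render targets bound: four 4-component colors plus depth.
struct PixelOutput {
    Float4 color[4];  // corresponds to oC0 .. oC3
    float  depth;
};

static_assert(sizeof(PixelOutput) == 17 * sizeof(float),
              "4 x float4 + depth = 17 floats");

constexpr std::size_t floatsPerPixel() {
    return sizeof(PixelOutput) / sizeof(float);
}

// Bounds-checked access to one of the four color targets.
Float4& target(PixelOutput& out, int index) {
    assert(index >= 0 && index < 4);
    return out.color[index];
}
```

In practice the four targets might hold, for example, normals, depth and material terms that a later pass combines.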
MRT Example: Depth of Field
[Figure panels, left to right: Original; Alpha of Original; Blurred Result]
The image on the left is the original. The center is the alpha map: black is in focus, white is out of focus. We can move the focal plane anywhere we like.
MRT Example: Edge Detection
[Figure panels: World Space Normals; Eye Space Depth; Outlines; Edge Detect]
► Edge detection. Images courtesy of ATI Technologies, Inc.
MRT Example: Edge Detection (2)
► Composite the outlines to get a cel-shaded effect. Images courtesy of ATI
High Level Shader Language
► Why?
    ► Because assembly sucks
    ► Allows all the things that make C so much better than machine code
    ► Can separate pixel and vertex shader code from data
    ► No longer have to map elements of a stream to registers; it is done semantically
DirectX® 8 Assembly
tex t0            ; base texture
tex t1            ; environment map
add r0, t0, t1    ; apply reflection
DirectX 9 HLSL Syntax
outColor =
    tex2D( baseTexture, baseTextureCoord ) +
    texCUBE( Environment, EnvironmentMapCoord );
Maybe more characters, but it makes much more sense
Datatypes
► Ints, bools, floats, etc.
    ► All the things you know and love
► Plus things that make graphics easy, like vectors and matrices
    ► 1x1 up to 4x4 first-order floating-point data
    ► matrix4x4, not matrix[4][4]
    ► All operations designed to operate on up to 4x4 data types natively
DirectX 8 Vertex Declaration (again)
[Diagram labels: vertex layout in streams Strm0 and Strm1; declaration mapping the layout to registers (v0, skip, v1); shader program; shader handle]
vs 1.1
mov r0, v0
…
New Vertex Declaration (again)
[Diagram labels: vertex layout in streams Strm0 and Strm1; declaration naming the elements (pos, norm, diff); shader program (shader handle)]
vs 1.1
dcl_position v0
dcl_diffuse v1
mov r0, v0
…
Vertex Shader Input Semantics
► position[n]       untransformed position
► blendweight[n]    skinning blending weight
► blendindices[n]   skinning blending indices
► normal[n]         normal vector
► psize[n]          point size (particle system)
► diffuse[n]        diffuse (matte) color
► specular[n]       specular (shiny) color
► texcoord[n]       texture coordinates
► tangent[n]        these two, with the normal vector,
► binormal[n]       make a 3D coordinate system
VS Output / PS Input Semantics
► Position      transformed position
► Psize         point size
► Fog           fog blending value
► color[n]      computed colors
► texcoord[n]   texture coordinates
Uses for Semantics
► A data binding protocol:
    ► Between vertex data and shaders
    ► Between pixel and vertex shaders
    ► Between pixel shaders and hardware
    ► Between shader fragments
► One smooth process of describing the flow of data in and out of the various elements of the render process
So…
► Yeah, we got all this programmable hardware
► What does it really give us?
OPTIONS!!!
► You are finally able to compute what you want
► No longer the fixed function pipeline’s bitch
► Can render Pong, even Wolfenstein, on the GPU
► Think of the GPU as a signal processor for vertex and pixel data, not merely a renderer of pictures
Finally
► All graphics that use the fixed function pipeline, i.e. the standard lighting equation, fundamentally look the same
    ► Many hacks to work around it
    ► But still stuck with: ambient + diffuse + specular
► Programmable hardware allows graphics programmers to tailor the look of their work to fit the content
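For reference, the ambient + diffuse + specular sum can be sketched for one light and one color channel. The parameter names and the use of a power function for the specular term are assumptions of this sketch (a common textbook form), not something spelled out on the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Scalar sketch of ambient + diffuse + specular for one light and one
// color channel. nDotL and nDotH are precomputed dot products of the
// normal with the light and half vectors; names are illustrative.
float standardLighting(float ambient, float diffuse, float specular,
                       float nDotL, float nDotH, float shininess) {
    assert(shininess > 0.0f);
    float d = std::max(nDotL, 0.0f);
    // No highlight on surfaces that face away from the light.
    float s = d > 0.0f ? std::pow(std::max(nDotH, 0.0f), shininess) : 0.0f;
    return ambient + diffuse * d + specular * s;
}
```

Every fixed-function scene is some parameterization of this one expression, which is why such scenes tend to share a look; a shader is free to replace the whole body.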
Choose Your Look
► Pick a unique “look” and do it
► Toon         several methods
► Cheesy       unlit or flat shaded
► Retro        standard FF pipelines
► Radiosity    soft lighting only
► Shadows      horror movie, Doom III
► Gritty       ultra realistic
► And many more
Time for Hands On
Hemisphere Model
[Diagram labels: Sky Color; Ground Color; Final Color]
Hemisphere Model: Distributed Light Model
[Diagram labels: hemisphere of possible incident light directions; surface normal; microfacet normal, which defines the axis of the hemisphere]
2-Hemisphere Model
[Diagram labels: Sky Color; Ground Color]
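A sky-and-ground model like this is often implemented as a blend driven by how much the normal faces the sky. The sketch below assumes the common 0.5 + 0.5 * cos(theta) weighting; that formula and the names `Color` and `hemisphereLight` are assumptions, since the slides only show the diagram.

```cpp
#include <cassert>

struct Color { float r, g, b; };

// Blends between ground and sky color by how much the normal faces the sky.
// `cosTheta` is the cosine of the angle between the unit normal and the sky
// axis. The 0.5 + 0.5 * cos weighting is an assumption of this sketch.
Color hemisphereLight(Color sky, Color ground, float cosTheta) {
    assert(cosTheta >= -1.0f && cosTheta <= 1.0f);
    float t = 0.5f + 0.5f * cosTheta;  // 1 = facing sky, 0 = facing ground
    return { ground.r + (sky.r - ground.r) * t,
             ground.g + (sky.g - ground.g) * t,
             ground.b + (sky.b - ground.b) * t };
}
```

A normal pointing sideways receives an even mix of the two colors, which gives the soft, directionless look of outdoor ambient light.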
Distributed Light Model
[Diagram labels: hemisphere of possible incident light directions; microfacets; other facets can shadow this one (occlusion)]
Ray Cast Occlusion Model
[Diagram labels: microfacet; some rays hit this object, others miss it]
Occlusion Representations
► Can store the result in various ways
► Compute ratio of hits / misses
    ► Occlusion factor
    ► A single scalar parameter
    ► Should be weighted with cosine
► Use to blend in shadow color
► Sufficient for hemisphere lighting
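One way to turn ray-cast results into a single cosine-weighted scalar is sketched below. The convention chosen here (1 means fully open, 0 means fully blocked) and the names `RaySample`, `occlusionFactor` and `shade` are assumptions for illustration; the slides do not fix a convention.

```cpp
#include <cassert>
#include <vector>

struct RaySample {
    float cosTheta;  // cosine between the ray and the surface normal
    bool  blocked;   // true if the ray hit another object
};

// Cosine-weighted fraction of the hemisphere that is unoccluded.
// Convention assumed here: 1 = fully open, 0 = fully blocked.
float occlusionFactor(const std::vector<RaySample>& rays) {
    float open = 0.0f;
    float total = 0.0f;
    for (const RaySample& r : rays) {
        assert(r.cosTheta >= 0.0f);  // rays lie in the upper hemisphere
        total += r.cosTheta;
        if (!r.blocked) open += r.cosTheta;
    }
    return total > 0.0f ? open / total : 1.0f;
}

// Uses the factor to blend from a shadow color toward the lit color.
float shade(float litColor, float shadowColor, float factor) {
    return shadowColor + (litColor - shadowColor) * factor;
}
```

The factor would typically be computed offline per vertex or per texel and stored, so the run-time cost is a single blend.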
Hemisphere Lighting + Occlusion
[Diagram labels: Sky Color; Ground Color; Object Color; Sphere Model; Occlusion Factor; Final Color]
Back to Work