
DIRECTX CONSTANTS OPTIMIZATIONS FOR INTEL® INTEGRATED GRAPHICS

Katen Shah

Luis Gimenez

Arzhange Safdarzadeh

December 2008

Intel® Corporation


INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL®'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL® ASSUMES NO LIABILITY WHATSOEVER, AND INTEL® DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel® products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.

Intel® may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel® reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

This white paper, as well as the software described in it, is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this document is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel® Corporation. Intel® Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document.

Intel® processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See www.Intel.com/products/processor_number for details.

The Intel® processor/chipset families may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Recipient is not obligated to provide Intel with comments or suggestions regarding this document. However, should Recipient provide Intel with comments or suggestions for the modification, correction, improvement or enhancement of: (a) this document; or (b) Intel products which may embody this document, Recipient grants to Intel a non-exclusive, irrevocable, worldwide, royalty-free license, with the right to sublicense Intel’s licensees and customers, under Recipient intellectual property rights, to use and disclose such comments and suggestions in any manner Intel chooses and to display, perform, copy, make, have made, use, sell, and otherwise dispose of Intel's and its sublicensee’s products embodying such comments and suggestions in any manner and via any media Intel chooses, without reference to the source.

Copies of documents, which have an order number and are referenced in this document, or other Intel® literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site.

Intel® and the Intel® Logo are trademarks of Intel® Corporation in the U.S. and other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2008, Intel® Corporation. All rights reserved


TABLE OF CONTENTS

Purpose
Introduction
Constants in DirectX 9 and DirectX 10
Tips for Direct3D 9 Constant in IIG
D3D10 Constants Management
Multiple cbuffers Performance Impact
    Example 1: Ocean Fog D3D10 Demo
    Example 2: Skinning10
    Note 2: These measurements were taken on an HP Pavilion with Mobile Intel® 4 Series Express Chipset Family
Optimizing DirectX 10
Summary
Web Site and Engineering Support
References
Appendix
Tables


PURPOSE

The goal of this paper is to describe the behavior of constants in DirectX 9 and DirectX 10 to help developers optimize the performance of their applications on Intel Integrated Graphics.

Intel Integrated Graphics refers to the Intel® Graphics Media Accelerator used in the Intel® 4 Series Chipsets (the Intel® 4500, X4500, and X4500HD GMAs). These chipsets are used in desktop G41, G43, and G45 systems and in mobile GM45 and GM47 systems. In general, the core of the graphics media accelerators is broken into generations known as Gen3, Gen4, Gen5, and so on; collectively, these are referred to as “GenX”. Each year, new integrated graphics cores provide more capabilities and better performance. Intel Integrated Graphics has the largest market share for new PC shipments (source: Mercury Research, Q1/09). Therefore, it makes sense to target your 3D applications at this market segment and optimize the experience for the largest number of people.

INTRODUCTION

Constants are external variables passed as parameters to shaders; their values remain “constant” during each invocation of the shader program. Despite their name, constants are among the most frequently changing values in a DirectX application. A shader program can initialize a constant statically, to a value in the shader file, or at runtime through the application. Most of the recommendations described here are not completely new and may have been described elsewhere. However, they remain very much applicable to integrated graphics, and they are presented here in a cohesive manner. Finally, care needs to be taken when porting from DirectX 9 to DirectX 10 to maintain performance.


CONSTANTS IN DIRECTX 9 AND DIRECTX 10

In DirectX 9 the constant data is specified in constant registers, while in DirectX 10 external variables residing in constant buffers are passed as parameters to the shader program. Depending on their use and declaration in the shader program, constants can be immediate, immediate indexed, or dynamic indexed. Table 1 shows code samples with examples of each case.

TABLE 1 IMMEDIATE, IMMEDIATE INDEXED, DYNAMIC INDEXED CONSTANTS

DirectX 9

// Generated by Microsoft (R) HLSL Shader Compiler 9.24.950.2656
// Parameters:
//
//   float4x4 mViewProj;
//   float4x3 mWorldMatrixArray[26];
//
// Registers:
//
//   Name              Reg   Size
//   ----------------- ----- ----
//   mWorldMatrixArray c0    78
//   mViewProj         c78   4
vs_2_0
mova a0.w, r1.x
dp3 r2.x, v3, c0[a0.w]    // dynamic indexed constants
mova a0.w, r1.x
dp3 r2.y, v3, c1[a0.w]
mova a0.w, r1.x
dp3 r2.z, v3, c2[a0.w]
…..
dp4 r5.x, v0, c0[a0.w]
mova a0.w, r1.y
dp4 oPos.x, r2, c78       // immediate constants
dp4 oPos.y, r2, c79
dp4 oPos.z, r2, c80
dp4 oPos.w, r2, c81

DirectX 10

VertexShader = asm {
// Generated by Microsoft (R) HLSL Shader Compiler 9.24.949.2307
// Buffer Definitions:
// cbuffer cbAnimMrtx
// {
//   float4x4 g_mConstBoneWrld[255];    // Offset: 0 Size: 16320
// }
//
// cbuffer cbDynamic
// {
//   float4x4 g_mWorld;                 // Offset: 32 Size: 64
//   float4x4 g_mWorldViewProjection;   // Offset: 96 Size: 64
// }
. . . . . .
vs_4_0
dcl_constantbuffer cb0[1020], dynamicIndexed      // dynamic indexed
dcl_constantbuffer cb1[8], immediateIndexed       // immediate indexed
dp4 o0.y, v0.xyzw, cb1[7].xyzw
dp4 o0.z, v0.xyzw, cb1[8].xyzw
dp4 o0.w, v0.xyzw, cb1[9].xyzw
...........
dp4 r2.x, r0.xyzw, cb0[r1.y + 0].xyzw             // dynamic indexed constants
dp4 r2.y, r0.xyzw, cb0[r1.y + 1].xyzw
dp4 r2.z, r0.xyzw, cb0[r1.y + 2].xyzw

DirectX 10 also supports immediate constants as a source operand per instruction, using a 32-bit immediate scalar or a 32-bit immediate 4-component vector. This is equivalent to the def instruction used in Direct3D 9 to define immediate constants. Immediate constant values are fixed for the life of the shader. They occur as a result of literal values used in HLSL code. See the example in Table 2.


TABLE 2 DIRECTX 10 IMMEDIATE CONSTANT

BASICHLSL VERTEX SHADER CODE:

…
// ANIMATE THE VERTEX BASED ON TIME AND THE VERTEX’S OBJECT SPACE POSITION
IF (BANIMATE)
    VANIMATEDPOS += FLOAT4(VNORMAL, 0) * (SIN(G_FTIME + 5.5) + 0.5) * 5

D3D9 CODE:

// FLOAT G_FTIME;
// G_FTIME C12 1
VS_2_0
DEF C13, 5.5, 0.159154937, 0.5, 5
DEF C14, 6.28318548, -3.14159274, 0, 1
DEF C15, -1.55009923E-006, -2.170389E-005, 0.0026041667, 0.00026041668
DEF C16, -0.020833334, -0.125, 1, 0.5
...
// R0.X = 5.5
MOV R0.X, C13.X
// G_FTIME + 5.5
ADD R0.X, R0.X, C12.X
...

D3D10 CODE:

…
// CBUFFER $GLOBALS
// {
//   …
//   FLOAT G_FTIME;    // OFFSET: 160 SIZE: 4
//   …
// }
VS_4_0
DCL_CONSTANTBUFFER CB0[19], IMMEDIATEINDEXED
…
// G_FTIME + 5.5
ADD R0.X, CB0[10].X, L(5.500000)
// SIN(G_FTIME + 5.5)
SINCOS R0.X, NULL, R0.X
// SIN(G_FTIME + 5.5) + 0.5
ADD R0.X, R0.X, L(0.500000)
...


TIPS FOR DIRECT3D 9 CONSTANT IN IIG

DirectX 9 auto-allocates shader constants, assigning each a set of float4 registers. Table 3 outlines the type, size, and use of the constant registers that DirectX 9 requires under Shader Model 3.

TABLE 3 DIRECTX 9 SHADER MODEL 3

Register  Name                       Count           R/W  # Read ports  # Reads / inst  Size  Rel Addr *       Defaults
--------  -------------------------  --------------  ---  ------------  --------------  ----  ---------------  ------------
c#        Constant Float Register    VS 256, PS 224  R    1             Unlimited       4     VS a0/aL, PS No  (0, 0, 0, 0)
a0        Address Register           1               R/W  1             Unlimited       4     No               None
b#        Constant Boolean Register  16              R    1             1               1     No               FALSE
i#        Constant Integer Register  16              R    1             1               4     No               (0, 0, 0, 0)
aL        Loop Counter Register      1               R    1             Unlimited       1     No               None

* Only the vertex shader allows relative addressing, and only floating-point constant registers can be indexed.

In DirectX 9, local constants always take precedence over global constants, and the scope of a local constant is restricted to the shader in which it is defined. As mentioned in the previous section, a shader program can define a constant as an immediate constant (stored in a constant register) or as a constant array (stored as indexed constants). Constant arrays can be indexed with either an immediate index (such as int i=0) or a dynamic index. The Intel Integrated Graphics driver treats immediate constants and immediate indexed constants the same way; for example, C78[0] is C78 and C78[1] is C79. Dynamically indexed constants are only available in vertex shaders. Dynamic indexing allows the shader to access a register based on the value stored in the address register a0 or the loop counter register aL.

The Intel Integrated Graphics driver optimizes access to the most frequently used immediate constants by loading them into a hardware constant buffer. This buffer is the most efficient way to access shader constants on Intel Integrated Graphics. In addition, local variables perform better than global variables because the driver is able to optimize them and convert them into immediate values.


TABLE 4 SAMPLE CODE FROM MICROSOFT DIRECT3D SDK SKINNED MESH

HLSL

// Skinned Mesh Effect file
// Copyright (c) 2000-2002 Microsoft Corporation. All rights reserved.
//
float4 lhtDir = {0.0f, 0.0f, -1.0f, 1.0f};    // light Direction
……….
float4 MaterialAmbient : MATERIALAMBIENT = {0.1f, 0.1f, 0.1f, 1.0f};
// Matrix Pallette
static const int MAX_MATRICES = 26;
float4x3 mWorldMatrixArray[MAX_MATRICES] : WORLDMATRIXARRAY;
float4x4 mViewProj : VIEWPROJECTION;          // immediate constants
///////////////////////////////////////////////////////
VS_OUTPUT VShade(VS_INPUT i, uniform int NumBones)
{
    VS_OUTPUT o;
    float3 Pos = 0.0f;
    float3 Normal = 0.0f;
    float LastWeight = 0.0f;
    ……………………………………….
    // calculate the pos/normal using the "normal" weights
    // and accumulate the weights to calculate the last weight
    for (int iBone = 0; iBone < NumBones-1; iBone++)
    {
        LastWeight = LastWeight + BlendWeightsArray[iBone];
        Pos += mul(i.Pos, mWorldMatrixArray[IndexArray[iBone]]) * BlendWeightsArray[iBone];
        Normal += mul(i.Normal, mWorldMatrixArray[IndexArray[iBone]]) * BlendWeightsArray[iBone];
    }
    ………………………………

ASSEMBLY

// Generated by Microsoft (R) HLSL Shader Compiler 9.24.950.2656
//
// Parameters:
// …………….
//   float4 MaterialAmbient;
//   float4 lhtDir;
//   float4x4 mViewProj;
//   float4x3 mWorldMatrixArray[26];
//
// Registers:
//   Name              Reg   Size
//   ----------------- ----- ----
//   mWorldMatrixArray c0    78
//   mViewProj         c78   4
//   lhtDir            c82   1
//   MaterialAmbient   c83   1
// ………
vs_2_0
…….
mova a0.w, r1.x
dp3 r2.x, v3, c0[a0.w]    // dynamic indexed constants
mova a0.w, r1.x
dp3 r2.y, v3, c1[a0.w]
mova a0.w, r1.x
dp3 r2.z, v3, c2[a0.w]
mul r0.xyz, r2, v1.x
………
dp4 oPos.x, r2, c78       // immediate constants
dp4 oPos.y, r2, c79
dp4 oPos.z, r2, c80
dp4 oPos.w, r2, c81
……..

In Table 4, mWorldMatrixArray is dynamic indexed (in HLSL, mWorldMatrixArray[IndexArray[iBone]]), with its values initialized by the application at runtime. The dynamic index is referenced in the assembly through the address register (a0). The IIG driver optimizes the use of the immediate constants C78-C83 by pushing them into the hardware constant buffer. Since the constants Cx[a0] are dynamic indexed, the driver does not include them in the optimization algorithm. Constants that have static values are compiled into the shader as immediate values, as shown in Table 5. Declaring such constants as static const improves shader performance.
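The register numbers quoted above can be reproduced with a few lines of arithmetic. The Python sketch below is illustrative only: the register sizes are copied from the "Registers" listing of the skinned-mesh shader, the back-to-back ordering mirrors that listing rather than any documented compiler rule, and the helper names (assign_registers, resolve_immediate_index, is_push_candidate) are hypothetical, not part of Direct3D or the Intel driver.

```python
# Register sizes (in float4 registers) as reported in the "Registers"
# listing of the skinned-mesh shader. The ordering is copied from that
# listing; it is an assumption, not a documented compiler rule.
PARAMS = [
    ("mWorldMatrixArray", 78),  # float4x3[26]: 26 matrices x 3 registers
    ("mViewProj", 4),           # float4x4: 4 registers
    ("lhtDir", 1),
    ("MaterialAmbient", 1),
]


def assign_registers(params, base=0):
    """Lay parameters out back to back; return {name: (first, last)}."""
    layout = {}
    reg = base
    for name, size in params:
        layout[name] = (reg, reg + size - 1)
        reg += size
    return layout


def resolve_immediate_index(base, index):
    """An immediate index folds to a plain register: c78[1] is c79."""
    return base + index


def is_push_candidate(name, dynamically_indexed):
    """Hypothetical predicate: constants reached through a0 are excluded
    from the hardware constant buffer optimization."""
    return name not in dynamically_indexed


if __name__ == "__main__":
    layout = assign_registers(PARAMS)
    for name, (first, last) in layout.items():
        pushed = is_push_candidate(name, {"mWorldMatrixArray"})
        print(f"{name}: c{first}-c{last} push candidate={pushed}")
```

Under these assumptions the immediate constants occupy c78 through c83, while the 78 registers of mWorldMatrixArray stay outside the optimized path because they are only reachable through a0.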


TABLE 5 SAMPLE CODE FROM MICROSOFT DIRECT3D SDK REFLECTIVE LIGHTING MODEL

HLSL, original

// Reflective Lighting Model
// Copyright (c) Microsoft Corporation. All rights reserved.
//--------------------------------------------------------------------------------------
// light direction (world space)
float3 lightDir = {0.577, -0.577, -0.577};
// Transformation Matrices
matrix matView : VIEW;
matrix matProj : PROJECTION;
matrix matWorld : WORLD;
….

HLSL, with static const

// Reflective Lighting Model
// Copyright (c) Microsoft Corporation. All rights reserved.
//--------------------------------------------------------------------------------------
// light direction (world space)
static const float3 lightDir = {0.577, -0.577, -0.577};
// lightDir will not use a constant register
// Transformation Matrices
matrix matView : VIEW;
matrix matProj : PROJECTION;
matrix matWorld : WORLD;
….

ASSEMBLY, original

vertexshader =
asm {
// Generated by Microsoft (R) HLSL Shader Compiler 9.24.950.2656
// ……
// Registers:
//
//   Name         Reg   Size
//   ------------ ----- ----
//   matWorld     c0    4
//   matView      c4    3
// …..
//
preshader
mul r0, c4.x, c0
……
mul r0, c6.w, c3
add c3, r1, r0
dot r0.xyz, (0.577, -0.577, -0.577), c4.xyz
dot r0.yzw, (0.577, -0.577, -0.577), c5.xyz
dot r0.zwx, (0.577, -0.577, -0.577), c6.xyz
dot r1.xyz, r0.xyz, r0.xyz
rsq r0.w, r1.x
mul c0.xyz, r0.w, r0.xyz
mul c5.xyz, c7.xyz, (0.358824, 0.311765, 0.059804)
mul c4.xyz, c8.xyz, (0.358824, 0.311765, 0.059804)
mul c6.xyz, (0.9, 0.9, 0.9), c9.xyz
// approximately 30 instructions used

ASSEMBLY, with static const

vertexshader =
asm {
// Generated by Microsoft (R) HLSL Shader Compiler 9.24.950.2656
// ………
// Registers:
//
//   Name         Reg   Size
//   ------------ ----- ----
//   matWorld     c0    4
//   matView      c4    3
//
preshader
mul r0, c4.x, c0
……..
mul r0, c6.w, c3
add c7, r1, r0
dot r0.xyz, (0.577, -0.577, -0.577), c4.xyz    // static values
dot r0.yzw, (0.577, -0.577, -0.577), c5.xyz
dot r0.zwx, (0.577, -0.577, -0.577), c6.xyz
dot r1.xyz, r0.xyz, r0.xyz
rsq r0.w, r1.x
mul c4.xyz, r0.w, r0.xyz
// approximately 27 instructions used

Although immediate constants and immediate indexed constants “pushed” into the hardware register perform better, constant buffers created by the driver are still required to support the number of constant registers specified for Shader Model 3.0 on the Intel 4 Series Chipset family (224+16+16=256 for the pixel shader and 256+16+16=288 for the vertex shader).


OPTIMIZING DIRECTX 9

Local constants perform better than global constants, and immediate constants perform better than dynamic indexed constants. With dynamic indexed constants the driver cannot determine the index value a priori, so it must create a full-size constant buffer in memory instead of using the hardware constant buffer. To take advantage of the optimization, limit the use of global constants and of dynamically indexed constants Cx[a0], as these bypass the IIG optimization algorithm within the Intel driver.


D3D10 CONSTANTS MANAGEMENT

Direct3D 10 places all shader constants in one or more buffer resources in memory and allows them to be managed like any other resource. This is in contrast to Direct3D 9, where each shader stage had a limited constant register file and required frequent CPU access to change or reset the values using SetXXXXXShaderConstantX.

The new method in Direct3D 10 minimizes bandwidth as well as the overhead associated with setting shader constants. However, the D3D9 driver could optimize constant delivery to the hardware, whereas in D3D10 more of this burden has shifted to the software developer.

In D3D10, constant buffers are managed like vertex or texture data buffers. They are updated via Map (D3D10_MAP_WRITE_DISCARD) or by calling UpdateSubresource, which has the CPU copy data from memory to the buffer.

Constants are organized into two constant buffer types: cbuffer and tbuffer. cbuffers are optimized for uniformly indexed data and sequential access, whereas tbuffers are optimized for arbitrarily indexed data and more random access, like a texture.

The sizes for these buffers are:

• cbuffer <= 4096*4*32-bit entries; although a large number of buffers can be created, D3D10 limits the maximum number of simultaneous cbuffers to 14, plus 1 immediate constant buffer

• tbuffer <= 128 Mbytes

As the layout of the cbuffer implies, constants are packed with float4 granularity. For example, two float2 values can be packed together in one entry, whereas two float3 values are stored in separate entries. By default, the D3D10 compiler packs as many variables as possible per entry. However, the packoffset keyword can be used to arrange constants in specific ways. This is described in detail in the Microsoft SDK.
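The float4-granularity rule can be illustrated with a small Python model. This is a simplified sketch, not the compiler's implementation: it handles only scalars and vectors of 4 to 16 bytes, it assumes a variable is placed at the next free offset unless it would straddle a 16-byte boundary, and it ignores arrays, matrices, structures and packoffset. The function name pack_offsets is hypothetical.

```python
REGISTER_BYTES = 16  # one float4 entry


def pack_offsets(variables):
    """Assign byte offsets to (name, size_in_bytes) pairs in declaration order.

    A variable is placed at the next free byte unless it would cross a
    16-byte boundary, in which case it starts at the next float4 entry.
    """
    offsets = {}
    cursor = 0
    for name, size in variables:
        if not 4 <= size <= REGISTER_BYTES or size % 4:
            raise ValueError(f"{name}: only 4, 8, 12 or 16 byte variables are modelled")
        room = REGISTER_BYTES - (cursor % REGISTER_BYTES)
        if size > room:
            cursor += room  # skip the padding left in the current entry
        offsets[name] = cursor
        cursor += size
    return offsets


if __name__ == "__main__":
    print(pack_offsets([("a", 8), ("b", 8)]))    # two float2 values
    print(pack_offsets([("a", 12), ("b", 12)]))  # two float3 values
```

With these assumptions the two float2 values share one entry (offsets 0 and 8), while the second float3 is pushed to offset 16, leaving 4 bytes of padding in the first entry.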

A constant buffer is bound to a shader stage using one of the following APIs:

[VS/GS/PS]SetConstantBuffers

Example: the Microsoft SDK Skinning10 sample demonstrates different methods of indexing bone transformation matrices for skinning on the GPU, along with Stream Out. Table 6 shows the use of cbuffers and tbuffers in skinning:


TABLE 6 SKINNING 10 EXAMPLE

// BUFFER DEFINITIONS:

// CBUFFER CBANIMMATRICES

// {

// FLOAT4X4 G_MCONSTBONEWORLD[255]; //OFFSET: 0 SIZE: 16320

// }

// BUFFER DEFINITIONS:

// TBUFFER TBANIMMATRICES

// {

// FLOAT4X4 G_MTEXBONEWORLD[255]; //OFFSET: 0 SIZE: 16320

// }

When Stream Out (SO) is disabled, cbuffer performance is much higher than tbuffer performance. As expected, with SO enabled there is minimal performance difference. The disassembly of the sample shows that the tbuffer version has more instructions, including texture loads. We recommend using cbuffers where possible, especially for smaller payloads. Usage of tbuffers is observed to be minimal in today’s games.

In addition, all constants that are not placed in constant buffers are grouped under a global cbuffer, $Globals, as shown for BasicHLSL10 in Table 7.

TABLE 7 BASICHLSL10

//

// BUFFER DEFINITIONS:

// CBUFFER $GLOBALS

// {

//

// FLOAT4 G_MATERIALAMBIENTCOLOR; // OFFSET: 0 SIZE: 16

// FLOAT4 G_MATERIALDIFFUSECOLOR; // OFFSET: 16 SIZE: 16

// INT G_NNUMLIGHTS; // OFFSET: 32 SIZE: 4 [UNUSED]

// FLOAT3 G_LIGHTDIR[3]; // OFFSET: 48 SIZE: 44

// FLOAT4 G_LIGHTDIFFUSE[3]; //OFFSET: 96 SIZE:48

// FLOAT4 G_LIGHTAMBIENT; //OFFSET: 144 SIZE:16

// FLOAT G_FTIME; //OFFSET: 160 SIZE:4

// FLOAT4X4 G_MWORLD; //OFFSET: 176 SIZE: 64

// FLOAT4X4 G_MWORLDVIEWPROJECTION; //OFFSET: 240 SIZE:64

// }

It is tempting to create an uber constant buffer that houses all of the constants, especially when porting from DX9, which can result in a large global buffer. However, constant buffers are typically characterized by frequent updates from the CPU, and changing any one constant value results in reloading the whole buffer to the GPU. This can have a significant performance impact.

For optimal constant buffer management, we recommend partitioning constants into a set of separate buffers based on their frequency of update and on the access pattern within a buffer.


Example: constants are grouped according to whether they are updated once per level, once per frame, once per batch, once per Draw() call, and so on; that is, essentially based on how often they are updated.

TABLE 8 CONSTANT UPDATE

CBUFFER GLOBAL$ { VFOGCOLOR, … }

CBUFFER CBPERLEVELDATA { VSUNPOSITION, … }

CBUFFER CBPERFRAMEDATA { VAPPTIME, … }

CBUFFER CBPERPASSDATA { MATVIEWPROJ, VRENDERTARGETSIZE, … }

CBUFFER CBPEROBJECTDYNAMIC { VBONES, … }

CBUFFER CBPEROBJECTSTATIC { MATWORLD, … }

CBUFFER CBPERMATERIALA { VSPECPOWER, VBDRFCOEFFICIENT, … }

Carsten Wenzel describes the benefits in his SIGGRAPH 2007 presentation, “Porting Game Engines to Direct3D 10: Crysis/CryEngine2”. According to Wenzel, a simple port from DirectX 9 showed ~7000 updates per frame in the in-game profiler; after optimization that figure dropped to ~5000, which was equivalent to the number of draw calls. CryEngine2 groups constants by frequency of update: per-frame, per-batch, per-instance, per-material, and per-light.

Microsoft has shown an example that illustrates the difference in the number of bytes updated when using an uber buffer versus splitting the constants into multiple buffers. The benefit is outlined in Table 9 below:

TABLE 9 BYTES UPDATED UBER BUFFER VS MULTIPLE BUFFERS

100 SKINNED MESHES (100 MATERIALS), 900 STATIC MESHES (400 MATERIALS), 2 PASSES PER FRAME

UBER BUFFER

CBUFFER UBERCB
{
    MATRIX VIEWPROJ;
    MATRIX BONES[100];
    MATRIX WORLD;
    FLOAT SPECPOWER;
    FLOAT4 BDRFCOEFFICIENTS;
    FLOAT APPTIME;              // SIZE: 4 BYTES
    UINT2 RENDERTARGETSIZE;
}

BEGIN FRAME
    SHADOW PASS
        UPDATE UBERCB    6560 X 100 = 656000 BYTES
        UPDATE UBERCB    6560 X 900 = 5904000 BYTES
    LIGHT PASS
        UPDATE UBERCB    6560 X 100 = 656000 BYTES
        UPDATE UBERCB    6560 X 900 = 5904000 BYTES
END FRAME
TOTAL = 13,120,000 BYTES = 13MB/FRAME

MULTIPLE BUFFERS

CBUFFER VSGLOBALPERFRAMECB
{
    FLOAT APPTIME;              // SIZE: 4 BYTES
};
CBUFFER VSPERSKINNEDCB
{
    MATRIX BONES[100];          // SIZE: 6400 BYTES
};
CBUFFER VSPERSTATICCB
{
    MATRIX WORLD;               // SIZE: 64 BYTES
};
CBUFFER VSPERPASSCB
{
    MATRIX VIEWPROJ;            // SIZE: 64 BYTES
    UINT2 RENDERTARGETSIZE;     // SIZE: 8 BYTES
};
CBUFFER VSPERMATERIALCB
{
    FLOAT SPECPOWER;            // SIZE: 4 BYTES
    FLOAT4 BDRFCOEFFICIENTS;    // SIZE: 16 BYTES
};

BEGIN FRAME
    UPDATE VSGLOBALPERFRAMECB    4 X 1 = 4 BYTES
    UPDATE VSPERSKINNEDCB        6400 X 100 = 640000 BYTES
    UPDATE VSPERSTATICCB         64 X 900 = 57600 BYTES
    SHADOW PASS
        UPDATE VSPERPASSCB       72 X 1 = 72 BYTES
    LIGHT PASS
        UPDATE VSPERPASSCB       72 X 1 = 72 BYTES
        UPDATE VSPERMATERIALCB   500 X 20 = 10000 BYTES
END FRAME
TOTAL = 707,748 BYTES = 708KB/FRAME

BETTER THAN 18X LESS DATA UPDATED EVERY FRAME
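The totals in Table 9 follow directly from the buffer sizes and update counts. The Python sketch below re-derives them as a check on the arithmetic; it uses the tightly packed member sizes quoted in the table (not HLSL padding rules), and the constant and function names are illustrative rather than taken from Microsoft's example.

```python
MATRIX_BYTES = 64  # 4x4 matrix of 32-bit floats
BONES = 100

# Scenario from Table 9.
SKINNED_MESHES = 100
STATIC_MESHES = 900
MATERIALS = 500  # 100 skinned + 400 static materials
PASSES = 2       # shadow pass and light pass

# Size of the single uber buffer, summing the member sizes from the table.
UBER_CB_BYTES = (
    MATRIX_BYTES            # ViewProj
    + BONES * MATRIX_BYTES  # Bones[100]
    + MATRIX_BYTES          # World
    + 4                     # SpecPower
    + 16                    # BDRFCoefficients
    + 4                     # AppTime
    + 8                     # RenderTargetSize
)


def uber_bytes_per_frame():
    """Every mesh in every pass re-uploads the whole uber buffer."""
    return PASSES * (SKINNED_MESHES + STATIC_MESHES) * UBER_CB_BYTES


def split_bytes_per_frame():
    """Each buffer is uploaded only as often as its contents change."""
    per_frame = 4                                       # AppTime, once
    per_skinned = BONES * MATRIX_BYTES * SKINNED_MESHES
    per_static = MATRIX_BYTES * STATIC_MESHES
    per_pass = (MATRIX_BYTES + 8) * PASSES
    per_material = (4 + 16) * MATERIALS
    return per_frame + per_skinned + per_static + per_pass + per_material


if __name__ == "__main__":
    uber = uber_bytes_per_frame()
    split = split_bytes_per_frame()
    print(uber, split, round(uber / split, 1))
```

The ratio of the two totals comes out slightly above 18, which matches the table's closing remark.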

It is generally preferable to have a larger number of small constant buffers. Additionally, it is better where possible to share constant buffers between different shaders. The listing below shows an example where cbuffer cbConstant is shared between the vertex shader and the pixel shader. The pixel shader uses only the float3 watercolour; the rest are unused. This pattern is commonly observed in D3D10 code. Another optimization to keep in mind is that if there are constants that are unused by most of the shaders, moving them to the bottom of the buffer allows a smaller buffer to be bound to those shaders. In the Table 10 example below, both float sun_shininess and float sun_strength could be moved to the bottom, since neither shader uses them.

TABLE 10 CONSTANT ORDER

//

//GENERATED BY MICROSOFT (R) HLSL SHADER COMPILER 9.24.949.2307

// BUFFER DEFINITIONS:

//

// CBUFFER CBCONSTANT

// {

// FLOAT3 WATERCOLOUR; // OFFSET: 0 SIZE: 12 [UNUSED]

// FLOAT SUN_SHININESS; // OFFSET: 12 SIZE: 4 [UNUSED]

// FLOAT SUN_STRENGTH; // OFFSET: 16 SIZE: 4 [UNUSED]

// FLOAT3 SUN_VEC; // OFFSET: 20 SIZE: 12

// }

//

// CBUFFER CBDYNAMIC

// {

// FLOAT4X4 MWORLD; // OFFSET: 0 SIZE: 64

// FLOAT4X4 MWORLDVIEWPROJ; // OFFSET: 64 SIZE: 64

// FLOAT4 CLIPPLANE; // OFFSET: 128 SIZE: 16

// }

// RESOURCE BINDINGS:

//

// NAME TYPE FORMAT DIM SLOT ELEMENTS

// ---------------- ---------- ------- ----------- ---- --------

// CBCONSTANT CBUFFER NA NA 0 1

// CBDYNAMIC CBUFFER NA NA 1 1

//

// INPUT SIGNATURE:

//

VS_4_0

DCL_INPUT V0.XYZ

DCL_INPUT V1.XYZ

DCL_INPUT V2.XY

DCL_OUTPUT_SIV O0.XYZW , POSITION

DCL_OUTPUT_SIV O1.X , CLIP_DISTANCE

DCL_OUTPUT O2.XY

DCL_OUTPUT O2.Z

DCL_OUTPUT O3.XYZW

DCL_CONSTANTBUFFER CB0[2], IMMEDIATEINDEXED

DCL_CONSTANTBUFFER CB1[9], IMMEDIATEINDEXED

DCL_TEMPS 2

MOV R0.XYZ, V0.XYZX

MOV R0.W, L(1.000000)

DP4 O0.X, R0.XYZW, CB1[4].XYZW

…………

MOV_SAT O3.XYZW, R0.XXXX


RET

//

//GENERATED BY MICROSOFT (R) HLSL SHADER COMPILER 9.24.949.2307

// BUFFER DEFINITIONS:

//

// CBUFFER CBCONSTANT

// {

// FLOAT3 WATERCOLOUR; // OFFSET: 0 SIZE: 12

// FLOAT SUN_SHININESS; // OFFSET: 12 SIZE: 4 [UNUSED]

// FLOAT SUN_STRENGTH; // OFFSET: 16 SIZE: 4 [UNUSED]

// FLOAT3 SUN_VEC; // OFFSET: 20 SIZE: 12 [UNUSED]

// }

// RESOURCE BINDINGS:

// NAME TYPE FORMAT DIM SLOT ELEMENTS

// ---------------- ---------- ------- ----------- ---- --------

// SDIFFUSE SAMPLER NA NA 0 1

// G_MESHTEXTURE TEXTURE FLOAT4 2D 0 1

// CBCONSTANT CBUFFER NA NA 0 1

// …………

PS_4_0

DCL_INPUT_PS LINEAR V2.XY

DCL_INPUT_PS LINEAR V2.Z

DCL_INPUT_PS LINEAR V3.XYZW

DCL_OUTPUT O0.XYZW

DCL_CONSTANTBUFFER CB0[1], IMMEDIATEINDEXED

DCL_SAMPLER S0, MODE_DEFAULT

DCL_RESOURCE_TEXTURE2D ( FLOAT , FLOAT , FLOAT , FLOAT ) T0

DCL_TEMPS 3

SAMPLE R0.XYZW, V2.XYXX, T0.XYZW, S0

MUL R1.XYZW, R0.XYZW, V3.XYZW

MAD R2.XYZ, -V3.XYZX, R0.XYZX, CB0[0].XYZX

MAD R2.W, -V3.W, R0.W, L(1.000000)

MAD O0.XYZW, V2.ZZZZ, R2.XYZW, R1.XYZW

RET
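The two dcl_constantbuffer declarations above (CB0[2] in the vertex shader, CB0[1] in the pixel shader) are consistent with the declared size covering every float4 register up to the last member the shader actually uses. The Python sketch below reproduces that arithmetic from the offsets in the listing. It is an observation drawn from this listing, not a documented compiler guarantee, and the names CB_CONSTANT and registers_needed are hypothetical.

```python
import math

REGISTER_BYTES = 16  # one float4 register

# Member layout of cbConstant as (offset, size) in bytes, from Table 10.
CB_CONSTANT = {
    "watercolour": (0, 12),
    "sun_shininess": (12, 4),
    "sun_strength": (16, 4),
    "sun_vec": (20, 12),
}


def registers_needed(layout, used_members):
    """Number of float4 registers spanning every member in used_members."""
    if not used_members:
        return 0
    end = max(layout[name][0] + layout[name][1] for name in used_members)
    return math.ceil(end / REGISTER_BYTES)


if __name__ == "__main__":
    print("vertex shader:", registers_needed(CB_CONSTANT, ["sun_vec"]))
    print("pixel shader:", registers_needed(CB_CONSTANT, ["watercolour"]))
```

Because the extent is set by the last used member, constants that few shaders read are best placed after the ones that most shaders read.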

The assembly code generated by the HLSL compiler has two main declarations for constant buffers: dcl_constantBuffer and dcl_immediateConstantBuffer.

• A shader constant buffer is declared using dcl_constantBuffer cbN[size], where N is the constant buffer register number and size is the number of elements it has. The declaration also includes the access type of the buffer. There are two types: immediateIndexed, where the index used is a literal value, and dynamicIndexed, where the index is a computed value. This is applicable to VS, GS and PS.

• A shader immediate-constant buffer can also be declared using dcl_immediateConstantBuffer {values}, where values is an array of four-component elements. The buffer must contain at least one but fewer than 4096 values. Only one immediate constant buffer can be used with a shader. It is accessed in the same way as a constant buffer with dynamic indexing. This is also applicable to VS, GS and PS. In general, this is not observed to be used much in games.

Table 11 below shows the Skinning10 example.

TABLE 11 SKINNING 10 EXAMPLE

//

// GENERATED BY MICROSOFT (R) HLSL SHADER COMPILER 9.23.949.2378

// BUFFER DEFINITIONS:

//


// CBUFFER $PARAMS

// {

// UINT IFETCHTYPE; // OFFSET: 0 SIZE: 4

//

// }

//

// CBUFFER CB0

// {

// FLOAT4X4 G_MWORLDVIEWPROJ; // OFFSET: 0 SIZE: 64

// FLOAT4X4 G_MWORLD; // OFFSET: 64 SIZE: 64

// }

//

// CBUFFER CBANIMMATRICES

// {

// FLOAT4X4 G_MCONSTBONEWORLD[255]; // OFFSET: 0 SIZE: 16320

// }

//

// TBUFFER TBANIMMATRICES

// {

// FLOAT4X4 G_MTEXBONEWORLD[255]; // OFFSET: 0 SIZE: 16320

// }

VS_4_0

DCL_INPUT V0.XYZ

DCL_INPUT V1.XYZW

DCL_INPUT V2.XYZW

DCL_INPUT V3.XYZ

DCL_INPUT V4.XY

DCL_INPUT V5.XYZ

DCL_OUTPUT_SIV O0.XYZW , POSITION

DCL_OUTPUT O1.XYZ

DCL_OUTPUT O2.XYZ

DCL_OUTPUT O3.XY

DCL_OUTPUT O4.XYZ

DCL_CONSTANTBUFFER CB0[1], IMMEDIATEINDEXED

DCL_CONSTANTBUFFER CB1[7], IMMEDIATEINDEXED

DCL_CONSTANTBUFFER CB2[1020], DYNAMICINDEXED

MOV O3.XY, V4.XYXX

RET
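The element counts in the declarations can be related to the byte sizes in the buffer definitions, since each constant buffer element is a four-component (16-byte) register. The sketch below shows that arithmetic. RegistersForBytes and CheckSkinning10Sizes are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// One constant-buffer element is a four-component, 16-byte register.
constexpr std::size_t kBytesPerRegister = 16;

// Registers needed to hold `bytes` bytes, rounded up to whole elements.
std::size_t RegistersForBytes(std::size_t bytes) {
    return (bytes + kBytesPerRegister - 1) / kBytesPerRegister;
}

// Checks the values that appear in the Skinning10 listing.
void CheckSkinning10Sizes() {
    assert(RegistersForBytes(16320) == 1020);  // float4x4[255] -> cb2[1020]
    assert(RegistersForBytes(4) == 1);         // single uint   -> cb0[1]
}
```

By the same count, the 128-byte cb0 would span 8 registers, while the listing declares cb1[7]. One plausible reading, which the source does not confirm, is that the compiler declares only up to the last register the shader actually reads.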


MULTIPLE CBUFFERS PERFORMANCE IMPACT

EXAMPLE 1: OCEAN FOG D3D10 DEMO

The Ocean Fog demo (Figure 1) is a good example of how to scale code for Intel Integrated Graphics. It combines a Perlin-noise-based blur with a Gaussian blur to give a smooth effect, and the fog is projected onto mesh surfaces on the GPU. The demo has 22 shaders in 9 effect files and uses 1.3 KB of constants. It allocates all of its constants in cbuffers; there are no tbuffers. It does not use dynamicIndexed constants, which enables the Intel Integrated Graphics driver to optimize the constant accesses using a lower latency path.

Figure 1

Figure 2 shows how different cbuffer arrangements affect the time taken to update resources while running the Ocean Fog demo for 100 seconds.


Figure 2

Note 1: These measurements were taken on a Lenovo X301 with the Mobile Intel® 4 Series Express Chipset Family.

Note 2: Shorter time implies better performance

Figure 2 shows the time taken to make the COPYREGION_D3D10 and UPDATESUBRESOURCEUP_D310 API calls. There is a small 7% improvement from the original measurement (most constants in one global buffer located in an include file) to the second set of measurements (one local cbuffer per fx file). The third set of measurements shows a significant 70% improvement when using 18 cbuffers organized by frequency of constant update (2 cbuffers per effect file). In the optimized version, one cbuffer, cbConstant, groups the constants that do not change during the shader invocation, and another, cbDynamic, groups the constants that change per frame, as shown in Table 12 for the effect file “fogmesh”. Table 13 shows the ASM including a global cbuffer and Table 14 shows the ASM for un-optimized local cbuffers (one per fx file).

Figure 2 data, OCEAN FOG (seconds, by cbuffer arrangement):

Arrangement                                  RESOURCECOPYREGION_D3D10   RESOURCEUPDATESUBRESOURCEUP_D310
1 global cbuffer                             7.177                      20.931
9 cbuffers (1 local per fx file)             6.601                      19.587
18 cbuffers (optimized per freq of update)   0.000                      8.694
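The quoted percentages can be checked against the Figure 2 values. The sketch below assumes that the 7% and 70% figures compare the combined time of both calls; the source does not state this explicitly. UpdateTime and ImprovementPercent are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cmath>

// Seconds spent in the two calls for one cbuffer arrangement.
struct UpdateTime {
    double copyRegionSeconds;
    double updateSubresourceSeconds;
    double Total() const { return copyRegionSeconds + updateSubresourceSeconds; }
};

// Percentage reduction in total time going from `before` to `after`.
double ImprovementPercent(const UpdateTime& before, const UpdateTime& after) {
    assert(before.Total() > 0.0);
    return 100.0 * (before.Total() - after.Total()) / before.Total();
}

// Values read from Figure 2.
const UpdateTime kOneGlobal{7.177, 20.931};
const UpdateTime kNineLocal{6.601, 19.587};
const UpdateTime kEighteenByFrequency{0.000, 8.694};
```

Under that assumption, ImprovementPercent(kOneGlobal, kNineLocal) comes to about 6.8% and ImprovementPercent(kOneGlobal, kEighteenByFrequency) to about 69.1%, which round to the 7% and 70% quoted in the text.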

TABLE 12 OCEAN FOG CBUFFERS PER SHADER AND PER FRAME

// Fogmesh
// FX Version: fx_4_0
// Child effect (requires effect pool): false
//
// 2 local buffer(s)
//
cbuffer cbConstant
{
    float4 g_MaterialAmbientColor; // Offset: 0, size: 16
    float4 g_LightDiffuse; // Offset: 16, size: 16
    float g_fNormalMapFactor; // Offset: 32, size: 4
    float3 g_vSpotPos; // Offset: 36, size: 12
    float g_fSpotFrustum; // Offset: 48, size: 4
    float4 g_vSpotDiffuse; // Offset: 64, size: 16
    float4 vWaterColor = { 0.0199999996, 0.0250000004, 0.0350000001, 1 }; // Offset: 80, size: 16
    float3 vFogDirection = { 0, 0, -1 }; // Offset: 96, size: 12
    float4 vFogColor = { 0.600000024, 0.600000024, 0.600000024, 1 }; // Offset: 112, size: 16
}
cbuffer cbDynamic
{
    float3 g_LightDir; // Offset: 0, size: 12
    float3 g_CameraPos; // Offset: 16, size: 12
    float3 g_CameraForward; // Offset: 32, size: 12
    float g_fTime; // Offset: 44, size: 4
    float4x4 g_mWorld; // Offset: 48, size: 64
    float4x4 g_mWorldViewProjection; // Offset: 112, size: 64
    float3 g_vSpotDir; // Offset: 176, size: 12
    float g_fSpotIntensity; // Offset: 188, size: 4
    float4x4 g_mSpotWorld; // Offset: 192, size: 64
    bool underwater; // Offset: 256, size: 4
    float g_FogDensity; // Offset: 260, size: 4
    float4 ClipPlane; // Offset: 272, size: 16
}
//

TABLE 13 OCEAN FOG GLOBAL CONSTANT BUFFER

// Fogmesh
// FX Version: fx_4_0
// Child effect (requires effect pool): false
//
// 2 local buffer(s)
//
cbuffer cbGlobal
{
    float4x4 g_mWorld; // Offset: 0, size: 64
    float4x4 g_mWorldViewProjection; // Offset: 64, size: 64
    float4 ClipPlane; // Offset: 128, size: 16
    float4 g_vCloudColor; // Offset: 144, size: 16
    float g_fHeight; // Offset: 160, size: 4
    float3 g_sunvec; // Offset: 164, size: 12
    float4 g_sundiffuse; // Offset: 176, size: 16
    float3 g_vCameraPos; // Offset: 192, size: 12
    float3 g_vSpriteCenter; // Offset: 208, size: 12
    float g_fCloudDensity; // Offset: 220, size: 4
    float4 g_MaterialAmbientColor; // Offset: 224, size: 16
    float4 g_LightDiffuse; // Offset: 240, size: 16
    float g_fNormalMapFactor; // Offset: 256, size: 4
    float3 g_vSpotPos; // Offset: 260, size: 12
    float g_fSpotFrustum; // Offset: 272, size: 4
    float4 g_vSpotDiffuse; // Offset: 288, size: 16
    float3 g_LightDir; // Offset: 304, size: 12
    float3 g_CameraPos; // Offset: 320, size: 12
    float3 g_CameraForward; // Offset: 336, size: 12
    float g_fTime; // Offset: 348, size: 4
    float3 g_vSpotDir; // Offset: 352, size: 12
    float g_fSpotIntensity; // Offset: 364, size: 4
    float4x4 g_mSpotWorld; // Offset: 368, size: 64
    float g_FogDensity; // Offset: 432, size: 4
    float g_SunAlpha; // Offset: 436, size: 4
    float g_SunTheta; // Offset: 440, size: 4
    float g_SunShininess; // Offset: 444, size: 4
    float g_SunStrength; // Offset: 448, size: 4
    float4 g_mViewProjection; // Offset: 464, size: 16
    float g_fFogDensity; // Offset: 480, size: 4
    float4 g_fSpotDiffuse; // Offset: 496, size: 16
    float3 g_vSpotCenter; // Offset: 512, size: 12
    bool underwater; // Offset: 524, size: 4
    float3 watercolour; // Offset: 528, size: 12
    float sun_shininess; // Offset: 540, size: 4
    float sun_strength; // Offset: 544, size: 4
    float3 sun_vec; // Offset: 548, size: 12
    float4x4 mWorld; // Offset: 560, size: 64
    float4x4 mWorldViewProj; // Offset: 624, size: 64
    float scale; // Offset: 688, size: 4
    float inv_mapsize_x; // Offset: 692, size: 4
    float inv_mapsize_y; // Offset: 696, size: 4
    float4 corner00; // Offset: 704, size: 16
    float4 corner01; // Offset: 720, size: 16
    float4 corner10; // Offset: 736, size: 16
    float4 corner11; // Offset: 752, size: 16
    float amplitude; // Offset: 768, size: 4
}
cbuffer cbConstant
{
    float4 vWaterColor = { 0.0199999996, 0.0250000004, 0.0350000001, 1 }; // Offset: 0, size: 16
    float3 vFogDirection = { 0, 0, -1 }; // Offset: 16, size: 12
    float4 vFogColor = { 0.600000024, 0.600000024, 0.600000024, 1 }; // Offset: 32, size: 16
}
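The offsets in these listings follow the HLSL constant packing rule: members are laid out in 16-byte registers, and a member that would straddle a register boundary starts at the next register. The sketch below reproduces that layout in a simplified form. It assumes only scalars, vectors, and matrices (no arrays or structs) and treats matrices as always starting on a register boundary. Member and PackOffsets are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kRegisterBytes = 16;

// One cbuffer member: its size in bytes and whether it must start on a
// register boundary (set to true for matrices in this sketch).
struct Member {
    std::size_t bytes;
    bool startsOnRegister;
};

// Returns the byte offset of each member under the 16-byte packing rule.
std::vector<std::size_t> PackOffsets(const std::vector<Member>& members) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (const Member& m : members) {
        assert(m.bytes > 0);
        const std::size_t used = cursor % kRegisterBytes;
        const bool straddles = used + m.bytes > kRegisterBytes;
        if (used != 0 && (m.startsOnRegister || straddles)) {
            cursor += kRegisterBytes - used;  // skip to the next register
        }
        offsets.push_back(cursor);
        cursor += m.bytes;
    }
    return offsets;
}
```

Feeding in the member sizes of cbDynamic from Table 12 (12, 12, 12, 4, 64, 64, 12, 4, 64, 4, 4, 16 bytes) yields the offsets 0, 16, 32, 44, 48, 112, 176, 188, 192, 256, 260, 272 shown in the listing. Ordering members so that small values fill the tail of a register, as g_fTime does after g_CameraForward, avoids padding.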

TABLE 14 NOT OPTIMIZED ONE CONSTANT BUFFER PER FX FILE

// fogmesh

// FX Version: fx_4_0

// Child effect (requires effect pool): false

//

// 1 local buffer(s)

//

cbuffer cbConstant

{

float4 g_MaterialAmbientColor; // Offset: 0, size: 16

float4 g_LightDiffuse; // Offset: 16, size: 16

float g_fNormalMapFactor; // Offset: 32, size: 4

float3 g_vSpotPos; // Offset: 36, size: 12

float g_fSpotFrustum; // Offset: 48, size: 4

float4 g_vSpotDiffuse; // Offset: 64, size: 16

float4 vWaterColor = { 0.0199999996, 0.0250000004, 0.0350000001, 1 };// Offset:80, size: 16

float3 vFogDirection = { 0, 0, -1 };// Offset: 96, size: 12

float4 vFogColor = { 0.600000024, 0.600000024, 0.600000024, 1 };// Offset: 112, size: 16

float4x4 g_mWorld; // Offset: 128, size: 64

float4x4 g_mWorldViewProjection; // Offset: 192, size: 64

float4 ClipPlane; // Offset: 256, size: 16


float3 g_LightDir; // Offset: 272, size: 12

float g_fSpotIntensity; // Offset: 284, size: 4

float3 g_vSpotDir; // Offset: 288, size: 12

float4x4 g_mSpotWorld; // Offset: 304, size: 64

float g_FogDensity; // Offset: 368, size: 4

float g_fTime; // Offset: 372, size: 4

float3 g_CameraPos; // Offset: 384, size: 12

float3 g_CameraForward; // Offset: 400, size: 12

bool underwater; // Offset: 412, size: 4

}

//

// 6 local object(s)

EXAMPLE 2: SKINNING10

Figure 3 shows the Skinning10 SDK sample, used here to measure the impact of a single constant buffer vs. multiple constant buffers. The sample renders the skinned model multiple times and, when run without Stream Out, skins it each time it is rendered.

FIGURE 3

The sample was run and measured on 2 different platforms. Figure 4 below shows a frame rate improvement from using multiple constant buffers that ranges from 5-8% in the minimum case up to 15-20%.


FIGURE 4

NOTE 3: These measurements were taken on an HP Pavilion with the Mobile Intel® 4 Series Express Chipset Family.

OPTIMIZING DIRECTX 10

From a hardware perspective, pushing immediate constants gives the highest performance, whereas indexed constants normally incur a high latency path. Among indexed constant buffers, those with literal indices perform better than those with computed indices. Finally, indexed constant buffers with a computed scalar index (independent of pixel/vertex position) perform better than those with a computed vector index. The higher access latency can also be amortized over the number of instructions in the shader that use the constants. Another optimization is to use immediate constant buffers (dcl_immediateConstantBuffer) where possible.

In general, smaller, packed constant buffers grouped by frequency of update and access pattern are ideal for higher performance. As an example, organize Per Frame/Per Pass/Per Instance constant buffers first, which tend to be smaller in size and have a low update rate, followed by Per Draw/Per Material constant buffers, which may also be small but have a higher update rate. Finally, define large constant buffers such as skinning constants.
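To see why grouping by update frequency helps, the following sketch estimates the bytes uploaded per frame for one combined buffer versus one buffer per group. It is a simplified model, not a measurement: it assumes that a buffer is re-uploaded in full whenever any constant in it changes, and that slower-changing groups change at the same moments as the fastest one. ConstantGroup and both functions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A group of constants that share one update rate.
struct ConstantGroup {
    std::size_t bytes;            // size of the group in bytes
    std::size_t updatesPerFrame;  // how often its contents change per frame
};

// One buffer per group: each buffer is uploaded only when its group changes.
std::size_t BytesPerFrameSplit(const std::vector<ConstantGroup>& groups) {
    std::size_t total = 0;
    for (const ConstantGroup& g : groups) {
        assert(g.bytes % 16 == 0);  // cbuffer sizes are multiples of 16 bytes
        total += g.bytes * g.updatesPerFrame;
    }
    return total;
}

// One buffer holding every group: the whole buffer is uploaded whenever any
// group changes, so the upload count equals the highest update rate.
std::size_t BytesPerFrameSingle(const std::vector<ConstantGroup>& groups) {
    std::size_t bytes = 0;
    std::size_t uploads = 0;
    for (const ConstantGroup& g : groups) {
        bytes += g.bytes;
        if (g.updatesPerFrame > uploads) uploads = g.updatesPerFrame;
    }
    return bytes * uploads;
}
```

Using the Ocean Fog sizes from Table 12 as input, with cbConstant (128 bytes) never updated after creation and cbDynamic (288 bytes) updated once per frame, the split layout uploads 288 bytes per frame while the single 416-byte buffer of Table 14 uploads 416 bytes per frame under this model.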

Figure 4 chart (recovered labels): Performance Improvement with Cb; x-axis: # of Soldiers (1, 5, 10, 20, 30); y-axis: 1.00 to 1.25; series: GM45.

Another optimization is to break up constant buffers based on features that are optional in games (e.g. shadows, post-processing effects, etc.). Because of performance constraints on integrated platforms, some of these features are going to be either disabled or run at a lower setting. Given this, it would be beneficial to break up the constants into separate buffers and then disable the updates to these constant buffers based on the settings selected by the gamer/user.
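A minimal sketch of this per-feature gating is shown below. It keeps a CPU-side copy of one feature's constants and skips the upload when the feature is disabled or nothing has changed. FeatureConstants is a hypothetical class written for this example, and the upload itself is simulated by returning a byte count; a real Direct3D 10 application would call ID3D10Device::UpdateSubresource at that point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// CPU-side shadow of one constant buffer that belongs to an optional feature.
class FeatureConstants {
public:
    explicit FeatureConstants(std::size_t bytes) : data_(bytes, 0) {
        assert(bytes % 16 == 0);  // cbuffer sizes are multiples of 16 bytes
    }

    // Reflects the user's setting for this feature (e.g. shadows on/off).
    void SetEnabled(bool enabled) { enabled_ = enabled; }

    // Stores new values and marks the buffer as needing an upload.
    void Write(const void* src, std::size_t bytes) {
        assert(bytes <= data_.size());
        std::memcpy(data_.data(), src, bytes);
        dirty_ = true;
    }

    // Returns the number of bytes uploaded by this call (0 if skipped).
    // Pending changes are kept while the feature is disabled.
    std::size_t Flush() {
        if (!enabled_ || !dirty_) return 0;
        dirty_ = false;
        return data_.size();
    }

private:
    std::vector<unsigned char> data_;
    bool enabled_ = true;
    bool dirty_ = false;
};
```

With one such object per optional feature, turning a feature off in the settings menu removes its constant traffic entirely, and re-enabling it uploads the latest values on the next Flush.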

For indexed constant buffers, it is recommended to keep the buffer size tailored to actual needs. For example, if the shader iterates over only 5 elements, declare a 5-element constant buffer for this shader rather than a general-purpose 50-element constant buffer shared among shaders. This allows the driver to optimize placement so that accesses take a low latency path.


SUMMARY

Using the above tips and tricks to optimize your application for Intel Integrated Graphics will

help to ensure that your application will run well on the largest volume graphics platforms. For

any issues in implementing these tips, please visit the links below. We welcome feedback and

ways to enhance this guide with more information. See the Legal Information section of this

document (page 2) regarding any feedback provided to Intel.

WEB SITE AND ENGINEERING SUPPORT

Software developers can go to the forum at http://software.intel.com/en-us/forums/user-community-for-intel-graphics-technology/ and post questions/comments about the complete line of Intel®'s Integrated Graphics chipset solutions. If you are a game developer, many useful documents, including topics from multithreading to audio, are available at http://www.Intel.com/software/games.

REFERENCES

Intel® Graphics Media Accelerator Developer Guide. http://software.intel.com/en-us/articles/intel-graphics-media-accelerator-developers-guide

Microsoft DirectX SDK. http://msdn.microsoft.com/en-us/directx/default.aspx

Wenzel, C.: Porting Game Engines to Direct3D 10: Crysis / CryEngine™ 2. SIGGRAPH 2007.

Oceanfog Demo. http://software.intel.com/en-us/articles/ocean-fog-using-direct3d-10/


APPENDIX

TABLES

Table 1 Immediate, immediate indexed, dynamic indexed constants

Table 2 DirectX 10 immediate constant

Table 3 DirectX 9 Shader Model 3

Table 5 Sample code from Microsoft Direct3D SDK skinned mesh

Table 6 Sample code from Microsoft Direct3D SDK reflective lighting model

Table 7 Skinning 10 example

Table 8 BasicHLSL10

Table 9 Constant update

Table 10 Bytes updated uber buffer vs multiple buffers

Table 11 Constant order

Table 12 Skinning 10 example

Table 14 Ocean Fog cbuffers per shader and per frame

Table 15 Ocean Fog global constant buffer

Table 16 Not optimized one constant buffer per fx file