Ivan Nevraev Microsoft Introduction to Direct3D 12.
![Page 1: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/1.jpg)
Ivan Nevraev
Microsoft
Introduction to Direct3D 12
![Page 2: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/2.jpg)
Goals & Assumptions
• Preview of Direct3D 12
• More API details in future talks
• Assuming familiarity with Direct3D 11
![Page 3: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/3.jpg)
Direct3D 12 API – Goals
• Console API efficiency and performance
• Reduce CPU overhead
• Increase scalability across multiple CPU cores
• Greater developer control
• Superset of D3D 11 rendering functionality
![Page 4: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/4.jpg)
ID3D11DeviceContext
Render Context: Direct3D 11
Input Assembler
Vertex Shader
Hull Shader
Tessellator
Rasterizer
Domain Shader
Geometry Shader
Pixel Shader
Output Merger
GPU Memory
Other State
![Page 5: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/5.jpg)
CPU Overhead: Changing Pipeline State
• Direct3D 10 reduced number of state objects
• Still mismatched from hardware state
• Drivers resolve state at Draw
![Page 6: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/6.jpg)
Direct3D 11 – Pipeline State Overhead
• Small state objects
• Hardware mismatch overhead
[Diagram: D3D state objects (Vertex Shader, Rasterizer, Pixel Shader, Blend State) alongside hardware state blocks (HW State 1, HW State 2, HW State 3)]
![Page 7: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/7.jpg)
Direct3D 12 – Pipeline State Optimization
• Group pipeline into single object
• Copy from PSO to Hardware State
[Diagram: one Pipeline State Object alongside hardware state blocks (HW State 1, HW State 2, HW State 3)]
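The idea on this slide can be sketched in plain C++. This is a minimal model, not the Direct3D 12 API: `PipelineDesc`, `HardwareState`, `Device`, and `Context` are hypothetical names, and the bit mixing in `createPipelineState` only stands in for whatever cross-state resolve a real driver performs. The point is that the resolve runs once at creation, and binding the PSO afterwards is a plain copy.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical fine-grained state, standing in for separate D3D11-style state objects.
struct PipelineDesc {
    std::uint32_t vertexShader;
    std::uint32_t rasterizer;
    std::uint32_t pixelShader;
    std::uint32_t blend;
};

// Hypothetical hardware state: each register mixes more than one API-level state.
struct HardwareState {
    std::array<std::uint32_t, 3> regs;
};

// The PSO holds the already-resolved hardware state.
struct PipelineState {
    HardwareState hw;
};

class Device {
public:
    // The cross-state resolve happens here, once per PSO, not at draw time.
    PipelineState createPipelineState(const PipelineDesc& d) {
        ++resolveCount_;
        HardwareState hw{};
        hw.regs[0] = d.vertexShader ^ (d.rasterizer << 8);
        hw.regs[1] = d.rasterizer ^ (d.pixelShader << 8);
        hw.regs[2] = d.pixelShader ^ (d.blend << 8);
        return PipelineState{hw};
    }
    int resolveCount() const { return resolveCount_; }

private:
    int resolveCount_ = 0;
};

class Context {
public:
    // Binding a PSO is a copy of precomputed hardware state.
    void setPipelineState(const PipelineState& pso) { current_ = pso.hw; }
    void draw() { ++drawCount_; }
    const HardwareState& current() const { return current_; }
    int drawCount() const { return drawCount_; }

private:
    HardwareState current_{};
    int drawCount_ = 0;
};
```

Creating one PSO and binding it before each of many draws leaves `resolveCount()` at 1, which is the saving the slide describes.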
![Page 8: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/8.jpg)
ID3D11DeviceContext
Render Context: Direct3D 11
Input Assembler
Vertex Shader
Hull Shader
Tessellator
Rasterizer
Domain Shader
Geometry Shader
Pixel Shader
Output Merger
GPU Memory
Non-PSO State
![Page 9: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/9.jpg)
Render Context: Pipeline State Object (PSO)
Pipeline State Object
Input Assembler
Vertex Shader
Hull Shader
Tessellator
Rasterizer
Domain Shader
Geometry Shader
Pixel Shader
Output Merger
GPU Memory
Non-PSO State
![Page 10: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/10.jpg)
CPU Overhead: Resource Binding
• System needs to do lots of binding inspection
  • Resource hazards
  • Resource lifetime
  • Resource residency management
• Mirrored copies of state used to implement Get*
  • Ease of use for middleware
![Page 11: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/11.jpg)
Resource Hazard Resolution
• Hazard tracking and resolution
  • Runtime
  • Driver
• Resource hazards
  • Render Target/Depth <> Texture
  • Tile Resource Aliasing
  • etc…
![Page 12: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/12.jpg)
Direct3D 12 – Explicit Hazard Resolution
ResourceBarrier: generalization of Direct3D 11’s TiledResourceBarrier
D3D12_RESOURCE_BARRIER_DESC Desc;
Desc.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
Desc.Transition.pResource = pRTTexture;
Desc.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
Desc.Transition.StateBefore = D3D12_RESOURCE_USAGE_RENDER_TARGET;
Desc.Transition.StateAfter = D3D12_RESOURCE_USAGE_PIXEL_SHADER_RESOURCE;
pContext->ResourceBarrier( 1, &Desc );
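To show what "explicit" means here, the following is a small stand-alone model of a transition barrier. It does not use the Direct3D headers; `Usage`, `Resource`, `TransitionBarrier`, `applyBarrier`, and `canSample` are hypothetical names chosen for illustration. The application states both the before and after usage, and the model rejects a barrier whose declared before state does not match what the resource is actually in.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical usage states, mirroring the two used on the slide.
enum class Usage { RenderTarget, PixelShaderResource };

struct Resource {
    Usage state;
};

// Hypothetical transition description: the app declares both ends of the change.
struct TransitionBarrier {
    Resource* resource;
    Usage before;
    Usage after;
};

// Applies a transition. Nothing is inferred on the app's behalf: a wrong
// "before" state is reported as an error instead of being silently fixed.
inline void applyBarrier(const TransitionBarrier& b) {
    if (b.resource == nullptr) {
        throw std::invalid_argument("barrier has no resource");
    }
    if (b.resource->state != b.before) {
        throw std::logic_error("declared before-state does not match resource state");
    }
    b.resource->state = b.after;
}

// A texture may be sampled only once it is in the shader-resource state.
inline bool canSample(const Resource& r) {
    return r.state == Usage::PixelShaderResource;
}
```

A render target starts out not sampleable, becomes sampleable after one transition, and a second identical barrier fails because the before state is now stale.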
![Page 13: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/13.jpg)
Resource Lifetime and Residency
• Explicit application control over resource lifetime
  • Resource destruction is immediate
  • Application must ensure no queued GPU work
  • Use Fence API to track GPU progress
  • One fence per-frame is well amortized
• Explicit application control over resource residency
  • Application declares resources currently in use on GPU
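One common way to honor "no queued GPU work" with a single fence per frame is a deferred-release queue. The sketch below is an assumption about how an application might structure this, not part of the API shown in the talk: `DeferredReleaseQueue`, `retire`, and `collect` are hypothetical names, and the completed fence value is passed in as a plain integer rather than read from a real fence object. It also assumes resources are retired in non-decreasing fence order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

class DeferredReleaseQueue {
public:
    // Queue a destroy callback that may run only once the GPU has passed fenceValue.
    // Assumes calls arrive with non-decreasing fence values (one per frame).
    void retire(std::uint64_t fenceValue, std::function<void()> destroy) {
        pending_.push_back({fenceValue, std::move(destroy)});
    }

    // Call once per frame with the last fence value the GPU has completed.
    // Returns how many resources were destroyed.
    std::size_t collect(std::uint64_t completedFenceValue) {
        std::size_t released = 0;
        while (!pending_.empty() && pending_.front().first <= completedFenceValue) {
            pending_.front().second();  // destruction is immediate, so it must be safe by now
            pending_.pop_front();
            ++released;
        }
        return released;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::deque<std::pair<std::uint64_t, std::function<void()>>> pending_;
};
```

Because everything retired in a frame shares that frame's fence value, one fence check per frame covers all of it.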
![Page 14: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/14.jpg)
Remove State Mirroring
• Application responsibility to communicate current state to middleware
![Page 15: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/15.jpg)
Render Context: Pipeline State Object (PSO)
Pipeline State Object
Input Assembler
Vertex Shader
Hull Shader
Tessellator
Rasterizer
Domain Shader
Geometry Shader
Pixel Shader
Output Merger
GPU Memory
Non-PSO State
![Page 16: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/16.jpg)
Render Context: Remove State Reflection
Pipeline State Object
Input Assembler
Vertex Shader
Hull Shader
Tessellator
Rasterizer
Domain Shader
Geometry Shader
Pixel Shader
Output Merger
GPU Memory
Non-PSO State
![Page 17: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/17.jpg)
CPU Overhead: Redundant Resource Binding
• Streaming identical resource bindings frame over frame
• Partial changes require copying all bindings
![Page 18: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/18.jpg)
Direct3D 12: Descriptor Heaps & Tables
• Scales across extremes of HW capability
• Unified approach serves breadth of app binding flows
  • Streaming changes to bindings
  • Reuse of static bindings
  • And everything between
• Dynamic indexing of shader resources
![Page 19: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/19.jpg)
Descriptor
• Small chunk of data defining resource parameters
• Just opaque data – no OS lifetime management
• Hardware representation of Direct3D “View”
[Diagram: Descriptor { Type, Format, Mip Count, pData } referencing a Texture]
![Page 20: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/20.jpg)
Descriptor Heaps
• Storage for descriptors
• App owns the layout
• Low overhead to manipulate
• Multiple heaps allowed
[Diagram: Descriptor Heap in GPU Memory]
![Page 21: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/21.jpg)
Descriptor Tables
• Context points to active heap
• A table is an index and a size in the heap
• Not an API object
• Single view type per table
• Multiple tables per type
[Diagram: Pipeline State Object with Vertex Shader … Pixel Shader stages; each table is a Start Index and Size into the descriptor heap]
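The heap and table relationship can be modeled in a few lines. This is a sketch under stated assumptions, not the Direct3D 12 interface: `Descriptor` follows the fields named on the Descriptor slide, while `ViewType`, `DescriptorHeap`, `DescriptorTable`, `write`, `makeTable`, and `at` are hypothetical. A table is nothing more than a start index and a size, and the "single view type per table" rule is checked when the table is formed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class ViewType { ShaderResource, ConstantBuffer };

// Small chunk of opaque data describing a resource view.
struct Descriptor {
    ViewType type;
    std::uint32_t format;
    std::uint32_t mipCount;
    const void* data;
};

// Not an object with a lifetime: just a window into the heap.
struct DescriptorTable {
    std::size_t start;
    std::size_t size;
};

class DescriptorHeap {
public:
    explicit DescriptorHeap(std::size_t capacity) : slots_(capacity) {}

    // The app owns the layout and writes descriptors wherever it likes.
    void write(std::size_t index, const Descriptor& d) { slots_.at(index) = d; }

    // Forms a table over [start, start + size), enforcing one view type per table.
    DescriptorTable makeTable(std::size_t start, std::size_t size) const {
        if (size == 0 || start > slots_.size() || size > slots_.size() - start) {
            throw std::out_of_range("table does not fit in heap");
        }
        for (std::size_t i = start + 1; i < start + size; ++i) {
            if (slots_[i].type != slots_[start].type) {
                throw std::invalid_argument("mixed view types in one table");
            }
        }
        return DescriptorTable{start, size};
    }

    // What a shader stage would see at slot i of the bound table.
    const Descriptor& at(const DescriptorTable& t, std::size_t i) const {
        if (i >= t.size) {
            throw std::out_of_range("slot outside table");
        }
        return slots_[t.start + i];
    }

private:
    std::vector<Descriptor> slots_;
};
```

Rebinding then amounts to changing two integers, which is why static bindings can be reused without copying descriptors.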
![Page 22: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/22.jpg)
Render Context: Remove State Reflection
Pipeline State Object
Input Assembler
Vertex Shader
Hull Shader
Tessellator
Rasterizer
Domain Shader
Geometry Shader
Pixel Shader
Output Merger
GPU Memory
Non-PSO State
![Page 23: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/23.jpg)
Render Context: Descriptor Tables & Heaps
Pipeline State Object
Input Assembler
Vertex Shader
Hull Shader
Tessellator
Rasterizer
Domain Shader
Geometry Shader
Pixel Shader
Output Merger
GPU Memory
Non-PSO State
![Page 24: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/24.jpg)
Render Context: Direct3D 12
Pipeline State Object
Input Assembler
Vertex Shader
Hull Shader
Tessellator
Rasterizer
Domain Shader
Geometry Shader
Pixel Shader
Output Merger
GPU Memory
Non-PSO State
![Page 25: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/25.jpg)
CPU Overhead: Redundant Render Commands
• Typical applications send identical sequences of commands frame-over-frame
• Measured 90-95% coherence on typical modern games
![Page 26: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/26.jpg)
Bundles
• Small command list
  • Recorded once
  • Reused multiple times
• Free threaded creation
• Inherits from execute site
  • Non-PSO State
  • Descriptor Table Bindings
• Restrictions to ensure efficient driver implementation
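The record-once, replay-many behavior, including inheritance from the execute site, can be sketched as follows. This is a toy model rather than the bundle API: `ContextState`, `Bundle`, `record`, `close`, and `execute` are hypothetical names, and commands are stored as `std::function` objects purely for illustration (a real driver would store something far cheaper).

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// State owned by the executing context; the bundle reads it at replay time.
struct ContextState {
    int constantBufferTable = -1;
    std::vector<std::string> log;
};

class Bundle {
public:
    // Recording is only allowed until the bundle is closed.
    void record(std::function<void(ContextState&)> cmd) {
        if (closed_) {
            throw std::logic_error("bundle already closed");
        }
        cmds_.push_back(std::move(cmd));
    }

    void close() { closed_ = true; }

    // Replays the recorded commands against whatever state the caller has set.
    void execute(ContextState& ctx) const {
        if (!closed_) {
            throw std::logic_error("bundle must be closed before execution");
        }
        for (const auto& cmd : cmds_) {
            cmd(ctx);
        }
    }

private:
    std::vector<std::function<void(ContextState&)>> cmds_;
    bool closed_ = false;
};
```

Executing the same bundle twice with a different constant buffer table set in between yields two different draws from one recording, which is the pattern the later Bundles code slide uses.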
![Page 27: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/27.jpg)
Bundles
[Diagram: Context command stream: Clear, Draw, SetTable, Execute Bundle, SetTable, Execute Bundle, SetPSO, … Bundle commands shown: SetPSO, Draw, SetPSO, SetTable, Dispatch, SetPSO, SetTable, Draw, SetPSO, Draw]
![Page 28: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/28.jpg)
Example code without Bundles
// Setup
pContext->SetPipelineState(pPSO);
pContext->SetRenderTargetViewTable(0, 1, FALSE, 0);
pContext->SetVertexBufferTable(0, 1);
pContext->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
// Draw 1
pContext->SetConstantBufferViewTable(D3D12_SHADER_STAGE_PIXEL, 0, 1);
pContext->SetShaderResourceViewTable(D3D12_SHADER_STAGE_PIXEL, 0, 1);
pContext->DrawInstanced(6, 1, 0, 0);
pContext->SetShaderResourceViewTable(D3D12_SHADER_STAGE_PIXEL, 1, 1);
pContext->DrawInstanced(6, 1, 6, 0);
// Draw 2
pContext->SetConstantBufferViewTable(D3D12_SHADER_STAGE_PIXEL, 1, 1);
pContext->SetShaderResourceViewTable(D3D12_SHADER_STAGE_PIXEL, 0, 1);
pContext->DrawInstanced(6, 1, 0, 0);
pContext->SetShaderResourceViewTable(D3D12_SHADER_STAGE_PIXEL, 1, 1);
pContext->DrawInstanced(6, 1, 6, 0);
• Setup pipeline state and common descriptor tables
• Set object #1 specific tables and draw
• Set object #2 specific tables and draw
![Page 29: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/29.jpg)
Bundles – Creating a Bundle
// Create bundle
pDevice->CreateCommandList(D3D12_COMMAND_LIST_TYPE_BUNDLE, pBundleAllocator, pPSO, pDescriptorHeap, &pBundle);
// Record commands
pBundle->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
pBundle->SetShaderResourceViewTable(D3D12_SHADER_STAGE_PIXEL, 0, 1);
pBundle->DrawInstanced(6, 1, 0, 0);
pBundle->SetShaderResourceViewTable(D3D12_SHADER_STAGE_PIXEL, 1, 1);
pBundle->DrawInstanced(6, 1, 6, 0);
pBundle->Close();
![Page 30: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/30.jpg)
No Bundles
// Setup
pContext->SetPipelineState(pPSO);
pContext->SetRenderTargetViewTable(0, 1, FALSE, 0);
pContext->SetVertexBufferTable(0, 1);
pContext->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
// Draw 1
pContext->SetConstantBufferViewTable(D3D12_SHADER_STAGE_PIXEL, 0, 1);
pContext->SetShaderResourceViewTable(D3D12_SHADER_STAGE_PIXEL, 0, 1);
pContext->DrawInstanced(6, 1, 0, 0);
pContext->SetShaderResourceViewTable(D3D12_SHADER_STAGE_PIXEL, 1, 1);
pContext->DrawInstanced(6, 1, 6, 0);
// Draw 2
pContext->SetConstantBufferViewTable(D3D12_SHADER_STAGE_PIXEL, 1, 1);
pContext->SetShaderResourceViewTable(D3D12_SHADER_STAGE_PIXEL, 0, 1);
pContext->DrawInstanced(6, 1, 0, 0);
pContext->SetShaderResourceViewTable(D3D12_SHADER_STAGE_PIXEL, 1, 1);
pContext->DrawInstanced(6, 1, 6, 0);
Bundles
// Setup
pContext->SetRenderTargetViewTable(0, 1, FALSE, 0);
pContext->SetVertexBufferTable(0, 1);
// Draw 1 and 2
pContext->SetConstantBufferViewTable(D3D12_SHADER_STAGE_PIXEL, 0, 1);
pContext->ExecuteBundle(pBundle);
pContext->SetConstantBufferViewTable(D3D12_SHADER_STAGE_PIXEL, 1, 1);
pContext->ExecuteBundle(pBundle);
![Page 31: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/31.jpg)
Bundles: CPU performance improvements
• PC – 0.7ms to 0.2ms in a simple test (GPU bound)
• Xbox
  • 1/3 CPU consumption for rendering submission in one game
  • Hundreds of thousands of DrawBundle executions are possible per 60 FPS frame
• Even one draw per draw bundle helps
  • Saves engine overhead
![Page 32: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/32.jpg)
Direct3D 12 – Command Creation Parallelism
• About that context…
• No Immediate Context
  • All rendering via Command Lists
  • Command Lists are submitted on a Command Queue
![Page 33: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/33.jpg)
Command Lists and Command Queue
• Application responsible for
  • Hazard tracking
  • Declaring maximum number of recording command lists
  • Resource renaming with GPU signaled fence
  • Lifetime of resources referenced by command lists
• Fence operations on the Command Queue
  • Not on Command List or Bundle
  • Signals occur on Command List completion
• Command List submission cost reduced by WDDM 2.0
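The ordering rules above can be made concrete with a small simulation. None of this is the real queue interface: `CommandList` is reduced to a list of command names, and `CommandQueue`, `executeCommandList`, `signal`, and `process` are hypothetical. The model only shows that fences live on the queue and that a fence value becomes visible after the lists submitted before it have been consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// A recorded command list, reduced to command names for illustration.
using CommandList = std::vector<std::string>;

class CommandQueue {
public:
    // The queue keeps a pointer only: the app must keep the list (and anything
    // it references) alive until a later fence shows it has been consumed.
    void executeCommandList(const CommandList& list) {
        entries_.push_back(Entry{&list, 0});
    }

    // Fences are queue operations, ordered after everything submitted so far.
    void signal(std::uint64_t value) {
        entries_.push_back(Entry{nullptr, value});
    }

    // Simulated GPU: consumes up to maxEntries queue entries in order and
    // returns the last fence value reached.
    std::uint64_t process(std::size_t maxEntries, std::vector<std::string>& executed) {
        for (std::size_t i = 0; i < maxEntries && next_ < entries_.size(); ++i) {
            const Entry& e = entries_[next_++];
            if (e.list != nullptr) {
                executed.insert(executed.end(), e.list->begin(), e.list->end());
            } else {
                completed_ = e.fenceValue;
            }
        }
        return completed_;
    }

private:
    struct Entry {
        const CommandList* list;   // non-null for an execute entry
        std::uint64_t fenceValue;  // meaningful only when list is null
    };
    std::vector<Entry> entries_;
    std::size_t next_ = 0;
    std::uint64_t completed_ = 0;
};
```

After two lists and a signal are queued, the fence value stays at its old value until both lists have been processed, which is what makes it usable for resource renaming and lifetime tracking.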
![Page 34: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/34.jpg)
Command Queue
• Command Queue: Execute Command List 1, Execute Command List 2, Signal Fence
• Command List 1: Clear, SetTable, Execute Bundle A, SetTable, Draw, SetPSO, Draw
• Command List 2: Clear, Dispatch, SetTable, Execute Bundle A, SetTable, Execute Bundle B
• Bundle commands shown: SetPSO, Draw, SetPSO, SetTable, Dispatch, SetPSO, SetTable, Draw, SetPSO, Draw
![Page 35: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/35.jpg)
![Page 36: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/36.jpg)
Dynamic Heaps
• Resource Renaming Overhead
  • Significant CPU overhead on ExecuteCommandList
  • Significant driver complexity
• Solution: Efficient Application Suballocation
  • Application creates large buffer resource and suballocates
  • Data type determined by application
  • Standardized alignment requirements
  • Persistently mapped memory
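Application suballocation is often implemented as a linear (bump) allocator over one large buffer. The sketch below assumes that approach; the talk does not prescribe one. `LinearSuballocator`, `allocate`, `mapped`, `reset`, and `kAllocFailed` are hypothetical names, a `std::vector` stands in for the persistently mapped GPU buffer, and the alignment values in the usage note are illustrative, not the standardized ones.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returned by allocate() when the buffer is exhausted.
constexpr std::size_t kAllocFailed = static_cast<std::size_t>(-1);

class LinearSuballocator {
public:
    explicit LinearSuballocator(std::size_t capacity) : storage_(capacity) {}

    // Returns the offset of a block of `size` bytes aligned to `alignment`
    // (a power of two), or kAllocFailed if it does not fit.
    std::size_t allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::size_t aligned = (head_ + alignment - 1) & ~(alignment - 1);
        if (aligned > storage_.size() || size > storage_.size() - aligned) {
            return kAllocFailed;
        }
        head_ = aligned + size;
        return aligned;
    }

    // Stand-in for persistently mapped memory: the pointer stays valid for
    // the allocator's lifetime, so there is no map/unmap per update.
    unsigned char* mapped() { return storage_.data(); }

    // Reuse the whole buffer. Only safe once a fence shows the GPU has
    // finished reading everything allocated since the last reset.
    void reset() { head_ = 0; }

private:
    std::vector<unsigned char> storage_;
    std::size_t head_ = 0;
};
```

For example, with a 1024-byte buffer, a 100-byte constant block at 256-byte alignment lands at offset 0, a following 64-byte block at 16-byte alignment lands at offset 112, and the application decides what each range holds.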
![Page 37: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/37.jpg)
Allocation vs. Suballocation
[Diagram: two GPU Memory layouts holding Resource 1 and Resource 2 with constant buffer (CB), index buffer (IB), and vertex buffer (VB) data; one layout places them in a single Heap]
![Page 38: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/38.jpg)
Direct3D 12 – CPU Parallelism
• Direct3D 12 has several parallel tasks
  • Command List Generation
  • Bundle Generation
  • PSO Creation
  • Resource Creation
  • Dynamic Data Generation
• Runtime and driver designed for parallelism
  • Developer chooses what to make parallel
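As a rough illustration of command list generation on several cores, the sketch below gives each worker thread its own list, so recording needs no locks. It uses only the C++ standard library; `CommandList` (a list of command names) and `recordInParallel` are hypothetical, and real recording would go through the command list interface instead of pushing strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// A recorded command list, reduced to command names for illustration.
using CommandList = std::vector<std::string>;

// Records one command list per worker thread. Each thread writes only to its
// own list, so no synchronization is needed until the final join.
inline std::vector<CommandList> recordInParallel(std::size_t workerCount,
                                                 std::size_t drawsPerWorker) {
    std::vector<CommandList> lists(workerCount);
    std::vector<std::thread> workers;
    workers.reserve(workerCount);
    for (std::size_t w = 0; w < workerCount; ++w) {
        workers.emplace_back([&lists, w, drawsPerWorker] {
            CommandList& list = lists[w];
            for (std::size_t d = 0; d < drawsPerWorker; ++d) {
                list.push_back("Draw " + std::to_string(w) + ":" + std::to_string(d));
            }
        });
    }
    for (std::thread& t : workers) {
        t.join();
    }
    // The caller submits the lists in whatever order the frame requires.
    return lists;
}
```

Submission order stays under application control: the lists come back indexed by worker, and the application decides the order in which they reach the queue.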
![Page 39: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/39.jpg)
D3D11 Profiling
[Chart: D3D11 per-thread CPU timeline, Threads 0 to 3, axis 0 ms to 7.50 ms. Thread 0: App Logic, D3D11, UMD, DXGK, KMD, Present. Threads 1 to 3: App Logic, D3D11. Legend: App Logic, D3D Runtime, User-mode Driver, DXGKernel, Kernel-mode Driver, Present]
![Page 40: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/40.jpg)
D3D12 Profiling
[Chart: D3D12 per-thread CPU timeline, Threads 0 to 3, axis 0 ms to 7.50 ms. Thread 0: App Logic, UMD, D3D12, Present, DXGK/KMD. Threads 1 to 3: App Logic, UMD, D3D12. Legend: App Logic, D3D Runtime, User-mode Driver, DXGKernel, Kernel-mode Driver, Present]
![Page 41: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/41.jpg)
D3D11 v D3D12 numbers
[Charts: the D3D12 and D3D11 per-thread CPU timelines (Threads 0 to 3, axis 0 ms to 7.50 ms) shown together for comparison]
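| Thread | App+GFX D3D11 (ms) | App+GFX D3D12 (ms) | GFX-only D3D11 (ms) | GFX-only D3D12 (ms) |
|---|---|---|---|---|
| Thread 0 | 7.88 | 3.80 | 5.73 | 1.17 |
| Thread 1 | 3.08 | 2.50 | 0.35 | 0.81 |
| Thread 2 | 2.84 | 2.46 | 0.34 | 0.69 |
| Thread 3 | 2.63 | 2.45 | 0.23 | 0.65 |
| Total | 16.42 | 11.21 | 6.65 | 3.32 |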
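The totals above work out to roughly a 32% reduction in combined App+GFX time and roughly a 50% reduction in GFX-only time, with Thread 0's GFX-only time dropping by about 80% while Threads 1 to 3 take on slightly more graphics work. A small helper (hypothetical, not from the talk) makes the arithmetic reproducible:

```cpp
#include <cassert>
#include <cmath>

// Percentage by which `after` is lower than `before`; both in the same unit (ms here).
inline double reductionPercent(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

For the Total row, `reductionPercent(16.42, 11.21)` covers App+GFX and `reductionPercent(6.65, 3.32)` covers GFX-only; `reductionPercent(5.73, 1.17)` gives the Thread 0 GFX-only figure.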
![Page 42: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/42.jpg)
Summary
• Greater CPU Efficiency
• Greater CPU Scalability
• Greater Developer Control
  • CPU Parallelism
  • Resource Lifetime
  • Memory Usage
![Page 43: Ivan Nevraev Microsoft Introduction to Direct3D 12.](https://reader038.fdocuments.in/reader038/viewer/2022102900/551c1c80550346a84f8b594f/html5/thumbnails/43.jpg)
The End