USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make...
![Page 1: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/1.jpg)
Stuart Schaefer, Microsoft
Yury Uralsky, NVIDIA
Daniel Kennett, Microsoft
USING AI TO ACCELERATE YOUR GAME (PART 2)
![Page 2: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/2.jpg)
2
WINDOWS MACHINE LEARNING
Introduction
Applications of ML to make games more real are rapidly becoming more practical
Windows Machine Learning allows developers to use trained models for inference on AI-capable silicon running Windows
Create ML models with training frameworks, convert to ONNX, and integrate Windows ML in your game workflow
[Figure: ML Superscaling vs Bilinear Upscaling]
![Page 3: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/3.jpg)
3
WINDOWS ML ARCHITECTURE
WinML API
Win32 & WinRT Layers
Converts images to Tensor Resources
Available on all Windows editions in 2018
Inference Engine
Model & Device resource management
Loads and compiles operator kernels
Execute dataflow graph
Device Layer
CPU instruction optimizations up to AVX-512
DirectML generates DX12 Compute shaders
[Architecture diagram. Labels: Application #1, Application #2, Input Surface, Output Surface, WinML API, WinML Win32 API, WinML Runtime, Model Inference Engine, DirectML, Direct3D, CPU, GPU, NPU]
![Page 4: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/4.jpg)
4
DIRECTML
ML model defines dataflow graph of mathematical operators
Evaluation is computationally expensive and highly parallelizable
DirectML operators defined using DX12 Compute and HLSL
Baseline for breadth platform support, end-to-end use of GPU
Enable the Inferencing Engine to leverage non-CPU silicon
Provide strong hardware platform coherency to unlock performance
DirectML manages interaction with D3D resource model – PSO, CommandLists, …
MetaCommands - IHV accelerated replacement for graph operators
![Page 5: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/5.jpg)
5
WINDOWS ML ACCELERATION
![Page 6: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/6.jpg)
6
MACHINE LEARNING MODELS
Can be viewed as programs
res = sqr(a * 10) + b - w;
[Dataflow graph of the expression. Operators: *, sqr, +, -. Intermediates: tmp1, tmp2, tmp3. Operand types: int, const int, float, const float]
![Page 7: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/7.jpg)
7
MACHINE LEARNING MODELS
Can be viewed as programs
res = FC(ReLU(Conv2d(a, f)) + b, w);
[Dataflow graph of the expression. Operators: conv2d, ReLU, +, FC. Intermediates: tmp1, tmp2, tmp3. Operand types: tensor, const tensor]
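To make the "model as program" reading concrete, here is a minimal pure-Python sketch of the expression on this slide. The operator implementations, the single-channel stride-1 convolution, and the tiny example shapes are all assumptions chosen for illustration; they are not taken from WinML or DirectML.

```python
# Toy versions of the operators named on the slide.
# Semantics and shapes are simplified assumptions for illustration only.

def conv2d(a, f):
    """Valid 2-D cross-correlation of a single-channel image with one filter, stride 1."""
    fh, fw = len(f), len(f[0])
    oh, ow = len(a) - fh + 1, len(a[0]) - fw + 1
    return [[sum(f[u][v] * a[i + u][j + v] for u in range(fh) for v in range(fw))
             for j in range(ow)]
            for i in range(oh)]

def relu(t):
    """Element-wise max(0, x)."""
    return [[max(0.0, x) for x in row] for row in t]

def add(t, b):
    """Element-wise sum of two tensors of the same shape."""
    return [[x + y for x, y in zip(rt, rb)] for rt, rb in zip(t, b)]

def fc(t, w):
    """Fully connected layer: flatten t, then multiply by the weight matrix w."""
    x = [v for row in t for v in row]
    return [sum(wij * xj for wij, xj in zip(wi, x)) for wi in w]

def model(a, f, b, w):
    # Each line is one "instruction" of the program; tmp1..tmp3 are the graph edges.
    tmp1 = conv2d(a, f)
    tmp2 = relu(tmp1)
    tmp3 = add(tmp2, b)
    return fc(tmp3, w)

a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
f = [[-1.0, 0.0],
     [0.0, 1.0]]
b = [[0.5, 0.5],
     [0.5, 0.5]]
w = [[1.0, 1.0, 1.0, 1.0],
     [1.0, 0.0, 0.0, -1.0]]
res = model(a, f, b, w)
print(res)
```

The constants f, b and w play the role of the "const tensor" operands; only a changes between evaluations.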
![Page 8: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/8.jpg)
8
TENSORS
Multi-dimensional arrays of scalars
[Figure: an image tensor with R, G and B channels; axis label Height (H)]
![Page 9: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/9.jpg)
9
PROGRAM-CENTRIC VIEW
Of machine learning
Machine learning operators → low-level machine instructions
Tensors → basic datatypes
WinML graph optimizer → code compiler
Want to define a tensor machine ISA
![Page 10: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/10.jpg)
10
D3D12 METACOMMANDS
![Page 11: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/11.jpg)
11
METACOMMANDS
Tensor machine ISA
[Dataflow graph of res = FC(ReLU(Conv2d(a, f)) + b, w), with intermediates tmp1, tmp2, tmp3]
![Page 12: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/12.jpg)
12
METACOMMANDS
Tensor machine ISA
[Dataflow graph of res = FC(ReLU(Conv2d(a, f)) + b, w), annotated: Fully connected metacommand]
![Page 13: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/13.jpg)
13
METACOMMANDS
Tensor machine ISA
[Dataflow graph of res = FC(ReLU(Conv2d(a, f)) + b, w), annotated: Element-wise metacommand]
![Page 14: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/14.jpg)
14
METACOMMANDS
Tensor machine ISA
[Dataflow graph of res = FC(ReLU(Conv2d(a, f)) + b, w), annotated: Convolution + activation metacommand]
![Page 15: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/15.jpg)
15
METACOMMANDS
General definition
Abstracts an optimized machine function
[Diagram: a MetaCommand box with inputs Input 0 … Input N and outputs Output 0 … Output M]
![Page 16: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/16.jpg)
16
METACOMMANDS
General definition
Scratch space
Persistent meta-data
Initialized at creation time
Read and written during meta command execution
[Diagram: a MetaCommand box with inputs Input 0 … Input N and outputs Output 0 … Output M]
![Page 17: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/17.jpg)
17
LIFE OF A METACOMMAND
Metacommand overview
Enumerate metacommands and their signatures: query implementation for supported metacommands
Create metacommand: create metacommand object instance
Get required resource sizes: query memory footprint requirements for parameters and scratch space
![Page 18: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/18.jpg)
18
LIFE OF A METACOMMAND
Metacommand overview
Initialize metacommand: initialize persistent scratch resources of the metacommand
Insert into the command list: schedule metacommand instance for execution on the command list
![Page 19: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/19.jpg)
19
METACOMMAND PARAMETERS
Specified in memory structures

| Per input, output tensor | Type |
|---|---|
| Location | GPU_VIRTUAL_ADDRESS |
| Element type | FP32 or FP16 |
| Dimensions | {batch, width, height, depth} |
| Memory layout | NCHW or hardware native |

| Per metacommand instance | Type |
|---|---|
| Scratch | GPU_VIRTUAL_ADDRESS |
| Persistent meta-data | GPU_VIRTUAL_ADDRESS |
| Metacommand options | `<varies>` |
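As a rough illustration of how a per-tensor parameter block like the one above might be modelled, the sketch below uses a hypothetical `TensorDesc` dataclass. The class, its field names, and the size calculation are assumptions made for this example; they are not the actual D3D12 metacommand structures, and a hardware-native layout may need padding that this densely packed calculation ignores.

```python
from dataclasses import dataclass
from math import prod

# Bytes per element for the two element types listed on the slide.
ELEMENT_SIZE = {"FP32": 4, "FP16": 2}

@dataclass
class TensorDesc:
    """Hypothetical per-tensor descriptor mirroring the slide's fields."""
    gpu_virtual_address: int   # Location
    element_type: str          # "FP32" or "FP16"
    dims: tuple                # (batch, width, height, depth)
    layout: str = "NCHW"       # or a hardware-native layout

    def size_in_bytes(self):
        # Densely packed size; assumes no layout-specific padding.
        return prod(self.dims) * ELEMENT_SIZE[self.element_type]

# Example: one 1920 x 1080 frame with 3 channels in FP16 (address is a placeholder).
frame = TensorDesc(gpu_virtual_address=0, element_type="FP16", dims=(1, 1920, 1080, 3))
print(frame.size_in_bytes())
```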
![Page 20: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/20.jpg)
20
METACOMMAND EXAMPLES
![Page 21: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/21.jpg)
21
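FULLY CONNECTED OPERATOR
Each element in the output depends on all elements in the input
$y_i = f\left(\sum_j w_{ij} \cdot x_j\right)$
[Figure: inputs $x_j$ (0, 1, 2) connected to outputs $y_i$ (0, 1, 2) through weights $w_{ij}$]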
![Page 25: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/25.jpg)
25
FULLY CONNECTED OPERATION AS A MATRIX MULTIPLICATION
![Page 26: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/26.jpg)
26
FULLY CONNECTED OPERATOR
As matrix multiplication
$\mathbf{W} \cdot \mathbf{X} = \mathbf{Y}$
[Figure: the three matrices, with dimensions labeled $m = 16$ and $n = 12$]
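The link between the per-element formula and the matrix form can be sketched in a few lines of pure Python. This is an illustrative sketch only: it assumes each column of X holds one independent input vector (a batch), which the slide does not state, and it uses small 3 x 3 weights rather than the slide's dimensions.

```python
def fully_connected(W, x, f=lambda v: v):
    """Per-element definition: y_i = f(sum_j W[i][j] * x[j])."""
    return [f(sum(wij * xj for wij, xj in zip(row, x))) for row in W]

def matmul(W, X):
    """Y = W . X for W (m x k) and X (k x n), as nested lists."""
    k, n = len(X), len(X[0])
    return [[sum(row[p] * X[p][c] for p in range(k)) for c in range(n)] for row in W]

W = [[1, 2, 3],
     [0, 1, 0],
     [2, 0, 1]]
x0 = [1, 0, 2]
x1 = [3, 1, 1]

# Pack the two input vectors as the columns of X.
X = [[x0[r], x1[r]] for r in range(3)]
Y = matmul(W, X)

# Column c of Y is the fully connected output for input vector c.
print([row[0] for row in Y], fully_connected(W, x0))
print([row[1] for row in Y], fully_connected(W, x1))
```

With an identity activation, evaluating several inputs at once turns many matrix-vector products into one matrix-matrix product.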
![Page 27: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/27.jpg)
27
CONVOLUTIONS ON TENSORS
Convolving input tensor with 3 filter kernels
Input tensor X [4, 8, 8]
3 filter kernels W [4, 2, 2]
![Page 28: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/28.jpg)
28
CONVOLUTIONS ON TENSORS
Applying red filter kernel across entire tensor with a stride of 3
$y_j = \sum_i w_i \cdot x_k$
![Page 29: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/29.jpg)
29
CONVOLUTIONS ON TENSORS
… Repeat for green
$y_j = \sum_i w_i \cdot x_k$
![Page 30: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/30.jpg)
30
CONVOLUTIONS ON TENSORS
… Repeat for blue
$y_j = \sum_i w_i \cdot x_k$
![Page 31: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/31.jpg)
31
CONVOLUTIONS ON TENSORS
Stack them together to form the resulting tensor
![Page 32: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/32.jpg)
32
CONVOLUTIONS ON TENSORS
Convolving input tensor with 3 filter kernels
Input tensor [4, 8, 8] * 3 filter kernels [4, 2, 2] = Output tensor [3, 3, 3]
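The shapes on this slide can be checked with a direct convolution written in pure Python. This is a sketch under assumptions: the [4, 8, 8] tensor is read as [channels, height, width], there is no padding, and the stride of 3 from the earlier slide is used, so the spatial output size is (8 - 2) / 3 + 1 = 3. The input and filter values are arbitrary test data.

```python
def conv_direct(x, filters, stride):
    """Direct convolution of a [C, H, W] tensor with M filters of shape [C, R, S]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    R, S = len(filters[0][0]), len(filters[0][0][0])
    out_h = (H - R) // stride + 1
    out_w = (W - S) // stride + 1
    # One output plane per filter; each output element sums over all channels.
    return [[[sum(flt[c][r][s] * x[c][i * stride + r][j * stride + s]
                  for c in range(C) for r in range(R) for s in range(S))
              for j in range(out_w)]
             for i in range(out_h)]
            for flt in filters]

# Arbitrary test data: x[c][h][w] = 64c + 8h + w, filter m filled with the constant m + 1.
x = [[[c * 64 + h * 8 + w for w in range(8)] for h in range(8)] for c in range(4)]
filters = [[[[m + 1] * 2 for _ in range(2)] for _ in range(4)] for m in range(3)]

y = conv_direct(x, filters, stride=3)
shape = (len(y), len(y[0]), len(y[0][0]))
print(shape)
```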
![Page 33: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/33.jpg)
33
LOWERING CONVOLUTIONS TO MATRIX MULTIPLICATION
![Page 34: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/34.jpg)
34
CONVOLUTION AS MATRIX MULTIPLICATION
Forming the filter matrix
$\mathbf{F}$: filter matrix with $m = 3$ rows and $k = 16$ columns
![Page 35: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/35.jpg)
35
CONVOLUTION AS MATRIX MULTIPLICATION
Forming the input tensor matrix
$\mathbf{X}$: matrix with $k = 16$ rows and $n = 9$ columns, formed from input tensor X [4, 8, 8]
![Page 36: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/36.jpg)
36
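CONVOLUTION AS MATRIX MULTIPLICATION
Evaluating the matrix product
$\mathbf{F} \cdot \mathbf{X}$, with $\mathbf{F}$ of size $m \times k = 3 \times 16$ and $\mathbf{X}$ of size $k \times n = 16 \times 9$
Output tensor [3, 3, 3]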
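The lowering described on these slides is commonly called im2col. The sketch below is one possible pure-Python version, not the implementation used by any driver: it assumes a [channels, height, width] layout, no padding, a stride of 3, and arbitrary test values. Each filter is flattened into a row of F (16 values), each 4 x 2 x 2 input patch into a column of X (9 patches), and the 3 x 9 product is reshaped to [3, 3, 3].

```python
def im2col(x, R, S, stride):
    """Unfold a [C, H, W] tensor into a (C*R*S) x (out_h*out_w) matrix of patches."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out_h = (H - R) // stride + 1
    out_w = (W - S) // stride + 1
    # One row per patch (n x k), then transpose so that patches become columns (k x n).
    patches = [[x[c][i * stride + r][j * stride + s]
                for c in range(C) for r in range(R) for s in range(S)]
               for i in range(out_h) for j in range(out_w)]
    return [list(row) for row in zip(*patches)], out_h, out_w

def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Arbitrary test data.
x = [[[c * 64 + h * 8 + w for w in range(8)] for h in range(8)] for c in range(4)]
filters = [[[[(m + 1) * (c + 1) + r - s for s in range(2)] for r in range(2)]
            for c in range(4)] for m in range(3)]

# F: one flattened filter per row, in the same (c, r, s) order used by im2col.
F = [[flt[c][r][s] for c in range(4) for r in range(2) for s in range(2)] for flt in filters]
X, out_h, out_w = im2col(x, 2, 2, stride=3)
Y = matmul(F, X)  # 3 x 9

# Reshape each row of Y back into a 3 x 3 output plane.
out = [[row[i * out_w:(i + 1) * out_w] for i in range(out_h)] for row in Y]
print(len(F), len(F[0]), len(X), len(X[0]))
```

The point of the lowering is that the convolution then inherits whatever optimized matrix multiplication the hardware provides.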
![Page 37: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/37.jpg)
37
EFFICIENT MATRIX MULTIPLICATION
![Page 38: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/38.jpg)
38
MATRIX MULTIPLICATION
![Page 39: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/39.jpg)
39
MATRIX MULTIPLICATION
A naïve method
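The slide images for the naïve method did not survive extraction, so the following is an assumption about what they depict: the textbook triple loop, in which every output element independently walks a full row of A and a full column of B.

```python
def matmul_naive(A, B):
    """D = A . B with the textbook triple loop (A is m x k, B is k x n)."""
    m, k, n = len(A), len(B), len(B[0])
    D = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            # Each output element re-reads one row of A and one column of B.
            for p in range(k):
                acc += A[i][p] * B[p][j]
            D[i][j] = acc
    return D

A = [[1, 2],
     [3, 4],
     [5, 6]]
B = [[7, 8, 9],
     [10, 11, 12]]
D = matmul_naive(A, B)
print(D)
```

Nothing is shared between neighbouring outputs here, so the same rows and columns are fetched from memory over and over.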
![Page 45: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/45.jpg)
45
TILED MATRIX MULTIPLICATION
Memory efficient
$\mathbf{D} = \mathbf{A} \cdot \mathbf{B}$
[Figure: matrices A, B and D]
![Page 46: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/46.jpg)
46
TILED MATRIX MULTIPLICATION
Memory efficient
$\mathbf{D} \mathrel{+}= \mathbf{A} \cdot \mathbf{B}$
[Figure: tiles of A, B and D]
![Page 49: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/49.jpg)
49
TILED MATRIX MULTIPLICATION
Memory efficient, hierarchical
$\mathbf{D} \mathrel{+}= \mathbf{A} \cdot \mathbf{B}$
[Figure: tiles of A, B and D]
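The tiled scheme can be sketched as follows. This is a single-level illustration in pure Python with an arbitrary tile size, not the hierarchical GPU implementation: each step accumulates one tile of A times one tile of B into a tile of D, so a loaded tile is reused for a whole block of outputs.

```python
def matmul_tiled(A, B, tile=2):
    """D = A . B computed as a sequence of tile updates D_tile += A_tile . B_tile."""
    m, k, n = len(A), len(B), len(B[0])
    D = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # tile row of D
        for j0 in range(0, n, tile):      # tile column of D
            for p0 in range(0, k, tile):  # walk along the shared dimension
                # Small dense update on the current tiles.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = D[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += A[i][p] * B[p][j]
                        D[i][j] = acc
    return D

# Arbitrary test data with sizes that are not multiples of the tile size.
A = [[i + p for p in range(5)] for i in range(3)]
B = [[p * j + 1 for j in range(4)] for p in range(5)]
reference = [[sum(A[i][p] * B[p][j] for p in range(5)) for j in range(4)] for i in range(3)]
D = matmul_tiled(A, B, tile=2)
print(D == reference)
```

A hierarchical version would apply the same split again inside each tile, for example per thread block, then per warp, then per thread.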
![Page 50: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/50.jpg)
50
VOLTA TENSOR CORES
![Page 51: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/51.jpg)
51
VOLTA SM TENSOR CORES
Dedicated hardware datapaths for machine learning acceleration
8 Tensor cores per SM
512 FMA operations per clock
Mixed precision operation
110 TFLOPs peak on TitanV
![Page 52: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/52.jpg)
52
TENSOR CORE OPERATION
A fixed-size matrix multiply-add
$\mathbf{D} = \mathbf{A} \cdot \mathbf{B} + \mathbf{C}$
$\mathbf{A}$, $\mathbf{B}$: FP16
$\mathbf{C}$, $\mathbf{D}$: FP16 or FP32
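A numerical sketch of this mixed-precision multiply-add is shown below. It is an emulation under assumptions, not a description of the hardware: the 4 x 4 tile size is assumed (it is consistent with 512 FMAs per clock across 8 tensor cores, i.e. 64 per core), the inputs are rounded to FP16 with the standard library's `struct` half-precision format, and Python's double-precision floats stand in for the wider accumulator.

```python
import struct

def to_fp16(v):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('e', struct.pack('e', v))[0]

def mma(A, B, C):
    """D = A . B + C with A and B rounded to FP16 and a wider accumulator."""
    n = len(A)
    A16 = [[to_fp16(v) for v in row] for row in A]
    B16 = [[to_fp16(v) for v in row] for row in B]
    return [[C[i][j] + sum(A16[i][p] * B16[p][j] for p in range(n))
             for j in range(n)]
            for i in range(n)]

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
B = [[0.5 * (i + j) for j in range(4)] for i in range(4)]  # exactly representable in FP16
C = [[1.0] * 4 for _ in range(4)]
D = mma(identity, B, C)

# 0.1 is not representable in FP16, so rounding the input changes its value slightly.
print(to_fp16(0.1), D[3][3])
```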
![Page 53: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/53.jpg)
53
TENSOR CORE OPERATION
Warp-synchronizing operation
Warp-wide Matrix Math
Composed Matrix Multiply and Accumulate
Result distributed across warp
[Figure: warp-wide matrix multiply and accumulate]
![Page 54: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/54.jpg)
54
TENSOR CORE PERFORMANCE
Almost an order of magnitude performance increase
[Chart: relative performance, P100 vs. V100 with Tensor Cores; V100 is 9.3x faster]
cuBLAS Mixed Precision (FP16 input, FP32 compute)
Matrix Multiply (M=N=K=2048)
![Page 55: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/55.jpg)
55
STYLE TRANSFER DEMO
![Page 56: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/56.jpg)
56
STYLE TRANSFER
Some facts about the demo
16 convolution layers with residual connections
Almost 640 billion floating-point operations per evaluation
80% of peak tensor core performance delivered in most convolution layers
Up to 88 TFLOPs compute density
![Page 57: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/57.jpg)
57
STYLE TRANSFER PERFORMANCE
NVIDIA TitanV @ 1080p, absolute perf (frames per second)
Baseline DirectML: 3.4
FP32 metacommands: 8.6 (2.5x over baseline)
Tensor-core accelerated metacommands: 27 (3.1x over FP32 metacommands)
![Page 58: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/58.jpg)
58
STYLE TRANSFER PERFORMANCE
NVIDIA TitanV @ 1080p, relative speedup
Baseline DirectML: 1x
FP32 metacommands: 2.5x
Tensor-core accelerated metacommands: 8x
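The speedup labels can be cross-checked against the frame rates from the absolute-performance chart (3.4, 8.6 and 27 frames per second). The assignment of those three values to the three bars is read off the flattened chart text and is therefore an assumption; the arithmetic below is only a consistency check.

```python
# Frames per second from the absolute-performance chart (bar assignment assumed).
fps = {
    "Baseline DirectML": 3.4,
    "FP32 metacommands": 8.6,
    "Tensor-core accelerated metacommands": 27.0,
}

baseline = fps["Baseline DirectML"]
fp32 = fps["FP32 metacommands"]
tensor_core = fps["Tensor-core accelerated metacommands"]

step_fp32 = fp32 / baseline            # metacommands over baseline
step_tensor_core = tensor_core / fp32  # tensor cores over FP32 metacommands
total = tensor_core / baseline         # tensor cores over baseline

# Fraction of the quoted 110 TFLOPs peak reached at 88 TFLOPs.
peak_fraction = 88 / 110

print(round(step_fp32, 2), round(step_tensor_core, 2), round(total, 2), peak_fraction)
```

The two step factors multiply to the overall factor, which is why the relative chart shows roughly 8x.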
![Page 59: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/59.jpg)
59
SUMMARY
DirectML implements core machine learning operations in DirectCompute
Metacommands allow implementations to export optimized versions of those operations
NVIDIA’s HW and SW provide accelerated support for machine learning
Machine learning is here and can be used in games today
Windows ML is available now in Windows 10
![Page 62: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/62.jpg)
62
CONVOLUTIONS ON TENSORS
![Page 63: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/63.jpg)
63
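CONVOLUTION OPERATOR
Each element in the output depends on the local neighborhood in the input
$y_i = f\left(\sum_{j=0}^{k-1} w_j \cdot x_{i+j-k/2}\right)$
[Figure: inputs $x$ (0 to 4) and outputs $y$ (0 to 4), with zeros padded at both ends of the input]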
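The one-dimensional formula can be sketched directly in Python. Two details here are assumptions rather than statements from the slide: the offset k/2 is taken as the integer half-width `k // 2` (natural for odd k), and inputs outside the valid range are treated as zero, matching the zero padding suggested by the figure.

```python
def conv1d(x, w, f=lambda v: v):
    """y_i = f(sum_{j=0}^{k-1} w_j * x_{i + j - k//2}), with zeros outside x."""
    k, n = len(w), len(x)
    half = k // 2

    def at(idx):
        # Zero padding: anything outside the input counts as 0.
        return x[idx] if 0 <= idx < n else 0

    return [f(sum(w[j] * at(i + j - half) for j in range(k))) for i in range(n)]

x = [1, 2, 3, 4, 5]
w = [1, 0, -1]

y = conv1d(x, w)                               # identity activation
y_relu = conv1d(x, w, f=lambda v: max(0, v))   # ReLU activation
print(y, y_relu)
```

Each output only touches k neighbouring inputs, in contrast to the fully connected operator, where every output reads every input.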
![Page 69: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/69.jpg)
69
CONVOLUTION AS MATRIX MULTIPLICATION
![Page 70: USING AI TO ACCELERATE YOUR GAME (PART 2)...WINDOWS MACHINE LEARNING Applications of ML to make games more real are rapidly becoming more practical Windows Machine Learning allows](https://reader035.fdocuments.in/reader035/viewer/2022070809/5f07d64f7e708231d41f0004/html5/thumbnails/70.jpg)
70
METACOMMANDS
Tensor machine ISA
[Dataflow graph of res = FC(ReLU(Conv2d(a, f)) + b, w), annotated: Activation metacommand]