Deploying Deep Neural Networks to Embedded GPUs and CPUs
![Page 1: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/1.jpg)
1
© 2015 The MathWorks, Inc.
Deploying Deep Neural Networks
to Embedded GPUs and CPUs
Steven Thomsett
![Page 2: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/2.jpg)
2
Deep Learning Workflow in MATLAB
Workflow stages: Deep Neural Network Design + Training (trained DNN), Application Design (application logic), Standalone Deployment
![Page 3: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/3.jpg)
3
Deep Neural Network Design and Training
▪ Design in MATLAB
▪ Manage large data sets
▪ Automate data labeling
▪ Easy access to models
▪ Training in MATLAB
▪ Acceleration with GPUs
▪ Scale to clusters
Routes to a trained DNN: train in MATLAB, model importer, transfer learning from a reference model
![Page 4: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/4.jpg)
4
Application Design
Pre-processing
Post-processing
![Page 5: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/5.jpg)
5
Multi-Platform Deep Learning Deployment
Application logic
![Page 6: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/6.jpg)
6
Multi-Platform Deep Learning Deployment
Application logic
Embedded
Mobile
NVIDIA Jetson
Raspberry Pi
BeagleBone
Desktop
Data Center
…
![Page 7: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/7.jpg)
7
Algorithm Design to Embedded Deployment Workflow
Conventional Approach
1. Functional test (desktop GPU): high-level language; deep learning framework; large, complex software stack
2. Deployment unit-test (desktop GPU, C++): C/C++; low-level APIs; application-specific libraries
3. Deployment integration-test (embedded GPU, C++): C/C++; target-optimized libraries; optimize for memory & speed
4. Real-time test
Challenges
• Integrating multiple libraries and packages
• Verifying and maintaining multiple implementations
• Algorithm & vendor lock-in
![Page 8: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/8.jpg)
8
Solution: Use MATLAB Coder & GPU Coder for Deep Learning Deployment
Application logic
Coders: GPU Coder, MATLAB Coder
Target Libraries: NVIDIA TensorRT & cuDNN Libraries, Intel MKL-DNN Library, ARM Compute Library
![Page 9: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/9.jpg)
9
![Page 10: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/10.jpg)
10
Deep Learning Deployment Workflows
INTEGRATED APPLICATION DEPLOYMENT: pre-processing, trained DNN, post-processing -> codegen -> portable target code
INFERENCE ENGINE DEPLOYMENT: trained DNN -> cnncodegen -> portable target code
![Page 11: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/11.jpg)
12
Workflow for Inference Engine Deployment
Steps for inference engine deployment
1. Generate the code for the trained model
   >> cnncodegen(net, 'targetlib', 'arm-compute')
2. Copy the generated code onto the target board
3. Build the code for the inference engine
   >> make -C ./codegen -f …mk
4. Use a hand-written main function to call the inference engine
5. Generate the executable and test it
   >> make -C ./ ……
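Step 1 of the list above might look like the following in a MATLAB script. This is only a sketch: `trainedNet.mat` and its variable `net` are hypothetical names, and the listing assumes a GPU Coder installation that supports the `'arm-compute'` target. The `make` commands in steps 3 and 5 are left as they appear on the slide, because the makefile names are elided there.

```matlab
% Assumption: trainedNet.mat is a hypothetical MAT-file whose variable "net"
% holds a trained SeriesNetwork or DAGNetwork.
data = load('trainedNet.mat');
net = data.net;

% Step 1 from the slide: generate inference-engine code that targets the
% ARM Compute Library.
cnncodegen(net, 'targetlib', 'arm-compute');

% Step 3 on the slide runs make inside ./codegen, so list that folder
% before copying it to the target board.
dir('codegen')
```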
![Page 12: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/12.jpg)
14
Deep Learning Inference Deployment
Pedestrian Detection
![Page 13: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/13.jpg)
15
Deep Learning Inference Deployment
Blood Smear Segmentation
![Page 14: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/14.jpg)
16
Deep Learning Inference Deployment
Defect Classification & Detection
![Page 15: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/15.jpg)
18
How is the Performance?
![Page 16: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/16.jpg)
19
Performance of Generated Code
▪ CNN inference (ResNet-50, VGG-16, Inception V3) on Titan V GPU
▪ CNN inference (ResNet-50) on Jetson TX2
▪ CNN inference (ResNet-50, VGG-16, Inception V3) on Intel Xeon CPU
![Page 17: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/17.jpg)
20
Single Image Inference on Titan V using cuDNN
Chart legend: PyTorch (1.0.0), MXNet (1.4.0), GPU Coder (R2019a), TensorFlow (1.13.0)
Intel® Xeon® CPU 3.6 GHz - NVIDIA libraries: CUDA10 - cuDNN 7 - Frameworks: TensorFlow 1.13.0, MXNet 1.4.0, PyTorch 1.0.0
![Page 18: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/18.jpg)
21
Even Stronger Performance with INT8 using TensorRT
ResNet-50 Inference (Titan V)
Horizontal axis: Batch Size
Chart legend: GPU Coder + TensorRT (FP32), TensorFlow + TensorRT (FP32), GPU Coder + TensorRT (INT8), TensorFlow + TensorRT (INT8)
Intel® Xeon® CPU 3.6 GHz - NVIDIA libraries: CUDA10 - cuDNN 7.4.1 - TensorRT 5.0.2.6 - Frameworks: TensorFlow 1.13.0_rc0
![Page 19: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/19.jpg)
22
Single Image Inference on Jetson TX2
ResNet-50
Chart legend: GPU Coder + TensorRT, TensorFlow + TensorRT
NVIDIA libraries: CUDA9 - cuDNN 7 - TensorRT 3.0.4 - Frameworks: TensorFlow 1.12.0
![Page 20: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/20.jpg)
23
CPU Performance
CPU, Single Image Inference (Linux)
Chart legend: MATLAB, TensorFlow, MXNet, MATLAB Coder, PyTorch
Intel® Xeon® CPU 3.6 GHz - Frameworks: TensorFlow 1.6.0, MXNet 1.2.1, PyTorch 0.3.1
![Page 21: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/21.jpg)
24
Brief Summary
DNN libraries are great for inference, …
MATLAB Coder and GPU Coder generate code that takes advantage of:
▪ NVIDIA® CUDA libraries, including TensorRT & cuDNN
▪ Intel® Math Kernel Library for Deep Neural Networks (MKL-DNN)
▪ ARM® Compute libraries for mobile platforms
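As a rough illustration of how the target library is chosen, the sketch below builds one code generation configuration per library listed above. It assumes the matching product (GPU Coder for the NVIDIA targets, MATLAB Coder for the CPU targets) is installed; which library names are accepted depends on the release, so treat the strings as assumptions to check against your installation.

```matlab
% Sketch: choosing the deep learning library that generated code will call.

% GPU Coder with cuDNN
cudnnCfg = coder.gpuConfig('lib');
cudnnCfg.TargetLang = 'C++';
cudnnCfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');

% GPU Coder with TensorRT
trtCfg = coder.gpuConfig('lib');
trtCfg.TargetLang = 'C++';
trtCfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');

% MATLAB Coder with Intel MKL-DNN
mklCfg = coder.config('lib');
mklCfg.TargetLang = 'C++';
mklCfg.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');

% MATLAB Coder with the ARM Compute Library. Further settings on this
% config object (library version, ARM architecture) are not shown here.
armCfg = coder.config('lib');
armCfg.TargetLang = 'C++';
armCfg.DeepLearningConfig = coder.DeepLearningConfig('arm-compute');
```

Each configuration object would then be passed to `codegen` with `-config`, together with an entry-point function.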
![Page 22: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/22.jpg)
25
Brief Summary
But, Applications Require More than just Inference
![Page 23: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/23.jpg)
26
Deep Learning Workflows: Integrated Application Deployment
Pre-processing
Post-processing
codegen
Portable target code
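A minimal sketch of the integrated workflow, under stated assumptions: the entry-point name `classifyFrame`, the file `trainedNet.mat`, the 227-by-227 network input, and the 480-by-640 frame size are all hypothetical. The idea is that pre-processing, inference, and post-processing sit in one MATLAB function, so `codegen` emits target code for the whole chain rather than for the network alone.

```matlab
function label = classifyFrame(img) %#codegen
% Hypothetical entry point: pre-processing, inference, post-processing.

persistent net;
if isempty(net)
    % Assumed MAT-file containing the trained DNN.
    net = coder.loadDeepLearningNetwork('trainedNet.mat');
end

in = single(imresize(img, [227 227]));  % pre-processing: assumed input size
scores = predict(net, in);              % inference
[~, label] = max(scores);               % post-processing: top class index
end
```

A build script along these lines (here for a MEX target on a desktop GPU with cuDNN) could then generate the code:

```matlab
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -config cfg classifyFrame -args {ones(480,640,3,'uint8')}
```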
![Page 24: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/24.jpg)
27
Lane and Object Detection using YOLO v2
Diagram blocks: AlexNet-based YOLO v2, Object Detection, Lane Detection, Strongest Bounding Box, Post-processing
Workflow:
1) Test in MATLAB on CPU
2) Generate code and test on desktop GPU
3) Generate code and test on Jetson AGX Xavier GPU
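Step 3 of the workflow changes only the build configuration, not the algorithm. The sketch below shows one way that retargeting could be set up. It assumes the GPU Coder Support Package for NVIDIA GPUs is installed; the board address, credentials, build directory, entry-point name `detectLanesAndObjects`, and input size are placeholders, not values from the talk.

```matlab
% Placeholders: replace the address, user name, and password with those of
% a real board reachable from the host.
hwobj = jetson('jetson-board-address', 'username', 'password');

% Build a standalone executable on the board instead of a desktop MEX file.
cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.Hardware.BuildDir = '~/remoteBuildDir';
cfg.GenerateExampleMain = 'GenerateCodeAndCompile';
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');

% detectLanesAndObjects is a hypothetical entry-point function.
codegen -config cfg detectLanesAndObjects -args {ones(480,640,3,'uint8')}
```

For steps 1 and 2, the same entry-point function would be called directly in MATLAB and then through a generated MEX file, which keeps a single implementation across all three tests.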
![Page 25: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/25.jpg)
28
(1) Test in MATLAB on CPU
![Page 26: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/26.jpg)
29
(2) Generate Code and Test on Desktop GPU
cuDNN/TensorRT optimized code
CUDA optimized code
![Page 27: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/27.jpg)
30
(3) Generate Code and Test on Jetson AGX Xavier GPU
cuDNN/TensorRT optimized code
CUDA optimized code
![Page 28: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/28.jpg)
31
Lane and Object Detection using YOLO v2
cuDNN/TensorRT optimized code
CUDA optimized code
1) Running on CPU
2) 7X faster running generated code on desktop GPU
3) Generate code and test on Jetson AGX Xavier GPU
![Page 29: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/29.jpg)
32
Accessing Hardware
Deploy Standalone Application
Access Peripheral from MATLAB
Processor-in-Loop Verification
![Page 30: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/30.jpg)
33
Deploy to Target Hardware via Apps and Command Line
![Page 31: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/31.jpg)
34
Chart legend: PyTorch (1.0.0), MXNet (1.4.0), GPU Coder (R2019a), TensorFlow (1.13.0)
![Page 32: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/32.jpg)
35
How do MATLAB Coder and GPU Coder achieve these results?
![Page 33: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/33.jpg)
36
Coders Apply Various Optimizations
….
….
CUDA kernel lowering
Traditional compiler optimizations
MATLAB Library function mapping
Parallel loop creation
CUDA kernel creation
cudaMemcpy minimization
Shared memory mapping
CUDA code emission
Scalarization
Loop perfectization
Loop interchange
Loop fusion
Scalar replacement
Loop optimizations
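Two of the loop optimizations named above, loop fusion and scalar replacement, can be illustrated by hand in plain MATLAB. This is a toy example with made-up arithmetic, not output from the coders: it only shows that merging two passes into one removes an intermediate array without changing the result.

```matlab
n = 8;
x = (1:n)';

% Unfused: two passes over the data with an intermediate array t.
t = zeros(n, 1);
for i = 1:n
    t(i) = 2 * x(i);
end
yTwoLoops = zeros(n, 1);
for i = 1:n
    yTwoLoops(i) = t(i) + 1;
end

% Fused, with scalar replacement: one pass, and the intermediate array
% becomes a scalar that a compiler can keep in a register.
yFused = zeros(n, 1);
for i = 1:n
    ti = 2 * x(i);
    yFused(i) = ti + 1;
end

sameResult = isequal(yTwoLoops, yFused);
```

On a GPU the fused form matters more than on a CPU, since each separate loop would otherwise become its own kernel with its own trip through memory.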
![Page 34: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/34.jpg)
37
Coders Apply Various Optimizations
Optimized Libraries
Network Optimization
Coding Patterns
![Page 35: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/35.jpg)
38
Generated Code Calls Optimized Libraries
Pre-processing
Post-processing
NVIDIA TensorRT & cuDNN Libraries
Intel MKL-DNN Library
ARM Compute Library
cuFFT, cuBLAS, cuSOLVER, Thrust Libraries
Performance
1. Optimized Libraries
2. Network Optimizations
3. Coding Patterns
![Page 36: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/36.jpg)
39
Deep Learning Network Optimization
Layer fusion: optimized computation
Before fusion: conv, Batch Norm, ReLU, Add, conv, ReLU, Max Pool, Max Pool
After fusion: FusedConv, FusedConv, BatchNormAdd, Max Pool, Max Pool
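To make the idea of layer fusion concrete, here is one widely used instance: folding a batch normalization into the convolution that precedes it. The slide does not say that the coders perform this particular fusion (its diagram shows FusedConv and BatchNormAdd layers), so read this as an illustration of the principle with made-up weights and statistics. With scale s = gamma / sqrt(sigma2 + epsilon), the fused weights are s*W and the fused bias is s*(b - mu) + beta.

```matlab
% Toy single-channel input and a 3-by-3 convolution (all values made up).
x = magic(6);
W = [1 0 -1; 2 0 -2; 1 0 -1];
b = 0.5;

% Batch normalization parameters for this channel (made up).
gamma = 1.5; beta = -0.2; mu = 3; sigma2 = 4; epsilon = 1e-5;

% Unfused: convolution, then batch normalization as a second pass.
z = conv2(x, W, 'same') + b;
yRef = gamma * (z - mu) / sqrt(sigma2 + epsilon) + beta;

% Fused: fold the normalization into the weights and bias once, offline,
% so inference runs a single convolution.
s = gamma / sqrt(sigma2 + epsilon);
Wf = s * W;
bf = s * (b - mu) + beta;
yFused = conv2(x, Wf, 'same') + bf;

maxErr = max(abs(yRef(:) - yFused(:)));
```

The two outputs agree up to floating-point rounding, while the fused version needs one layer and no intermediate tensor.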
Buffer minimization: optimized memory
Layers: FusedConv, FusedConv, BatchNormAdd, Max Pool, Max Pool
Buffers: a, b, and d are kept; buffer c is replaced by reusing buffer a, and buffer e by reusing buffer b
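Buffer minimization can be pictured as assigning each layer output to the first buffer whose previous contents are no longer needed. The sketch below is a simple greedy assignment written for illustration, not the coders' actual algorithm. The lifetimes are hypothetical, chosen so that the result mirrors the slide (five outputs, three buffers), and all buffers are assumed to be the same size.

```matlab
% Each row is one layer output: [step that writes it, last step that reads it].
% Rows correspond to buffers a, b, c, d, e on the slide (lifetimes made up).
life = [1 2; 2 4; 3 5; 4 5; 5 6];

nOutputs = size(life, 1);
bufferOf = zeros(nOutputs, 1);  % physical buffer chosen for each output
busyUntil = [];                 % busyUntil(k): last step that reads buffer k

for t = 1:nOutputs
    writeStep = life(t, 1);
    % A buffer is reusable only if its last reader ran before this step.
    k = find(busyUntil < writeStep, 1);
    if isempty(k)
        busyUntil(end + 1) = life(t, 2);  % no free buffer: allocate a new one
        k = numel(busyUntil);
    else
        busyUntil(k) = life(t, 2);        % reuse an existing buffer
    end
    bufferOf(t) = k;
end

numBuffers = numel(busyUntil);
```

With these lifetimes, output c lands in the first buffer (a) and output e in the second (b), so only three physical buffers are allocated.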
![Page 37: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/37.jpg)
40
Coding Patterns: Stencil Kernels
▪ Automatically applied for image processing functions (e.g. imfilter, imerode, imdilate, conv2, …)
▪ Manually apply using gpucoder.stencilKernel()
Diagram: input image (rows × cols), conv. kernel (kh × kw), dot product, output image
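A small sketch of the manual route named above, assuming GPU Coder is installed. The function names `meanFilter3x3` and `windowMean` are hypothetical. `gpucoder.stencilKernel` is given a function that maps one window to one output value, plus the window size and output shape.

```matlab
function B = meanFilter3x3(A) %#codegen
% Apply a 3-by-3 mean filter by describing only the per-window operation;
% the stencil pattern handles the sliding window over the image.
B = gpucoder.stencilKernel(@windowMean, A, [3 3], 'same');
end

function out = windowMean(window)
% Called once per 3-by-3 window; returns one output pixel.
out = cast(mean(window(:)), class(window));
end
```

Calling `meanFilter3x3(magic(5))` in MATLAB runs the same function in simulation, which is a convenient way to check results before generating CUDA code.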
![Page 38: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/38.jpg)
41
Coding Patterns: Matrix-Matrix Kernels
▪ Automatically applied for many MATLAB functions (e.g. matchFeatures with SAD or SSD, pdist, …)
▪ Manually apply using gpucoder.matrixMatrixKernel()
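The pattern behind these kernels has the loop structure of a matrix multiply, with the multiply swapped for another element-wise operation. The plain-MATLAB sketch below spells that structure out for the sum of absolute differences (SAD). It deliberately does not call `gpucoder.matrixMatrixKernel`; the function name `sadScores` is hypothetical and the loops are written out only to show the shape of the computation.

```matlab
function C = sadScores(A, B)
% SAD in the shape of a matrix multiply:
% C(i,j) = sum over k of abs(A(i,k) - B(k,j)).
[m, kA] = size(A);
[kB, n] = size(B);
assert(kA == kB, 'Inner dimensions must agree.');

C = zeros(m, n);
for i = 1:m
    for j = 1:n
        acc = 0;
        for k = 1:kA
            % A matrix multiply would accumulate A(i,k) * B(k,j) here.
            acc = acc + abs(A(i,k) - B(k,j));
        end
        C(i,j) = acc;
    end
end
end
```

Because every output element is an independent reduction over k, the same tiling and shared-memory strategies used for matrix multiplication on a GPU apply to this computation as well.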
![Page 39: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/39.jpg)
42
Deep Learning Workflow in MATLAB
Deep Neural Network Design + Training: train in MATLAB, model importer, or transfer learning from a reference model; output is the trained DNN
Application Design: application logic
Standalone Deployment: Coders targeting TensorRT and cuDNN Libraries, MKL-DNN Library, ARM Compute Library
![Page 40: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · MATLAB Coder and GPU Coder generates code that takes advantage of: NVIDIA® CUDA libraries, including TensorRT](https://reader030.fdocuments.in/reader030/viewer/2022040302/5e8152aa8c17217a1b7d8d26/html5/thumbnails/40.jpg)
43
Deep Learning Workflow in MATLAB