ALISON B LOWNDES · The blueprint for AI power and scale using DGX A100
ALISON B LOWNDES · AI DevRel | EMEA
@alisonblowndes
June 2020
AGENDA
INTRO TO NVIDIA: training & deployment
RAPIDS: accelerating data science
ROBOTICS & SIMULATION: the hardware, the software & the environments
WRAP-UP + Q&A
NVIDIA AI BREAKTHROUGHS IN GRAPHICS
PROJECT SOL: A Showcase for the Power of NVIDIA RTX
MINECRAFT RTX: Real-time Ray Tracing in the World’s Most Popular Game
OMNIVERSE: A Powerful Collaboration Platform for 3D Design
NASA MARS LANDER: Visualizing NASA’s Supercomputer Simulations
“AMPERE” NVIDIA A100
54B XTOR | 826 mm² | TSMC 7N | 40GB Samsung HBM2 | 600 GB/s NVLink
Peak vs Volta:
FP32 training (TF32 Tensor Core): 312 TFLOPS, 20X
INT8 inference: 1,248 TOPS, 20X
FP64 HPC: 19.5 TFLOPS, 2.5X
Multi-Instance GPU: 7x GPUs
25 YEARS OF ACCELERATED COMPUTING
X-FACTOR SPEEDUP | FULL STACK | DATA-CENTER SCALE | ONE ARCHITECTURE | SYSTEMS
GPU · CPU · DPU
Original NVIDIA Campus
NVIDIA Endeavor (2017)
New SaturnV Datacenter (2020)
NVIDIA Voyager (2020)
10.5 MW | 45,000 Sq. Ft.
NVIDIA DGX SUPERPOD WITH DGX A100
Unmatched data center scalability, deployed in under 3 weeks
Leadership-class AI infrastructure:
- The blueprint for AI power and scale using DGX A100
- Infused with the expertise of NVIDIA’s AI practitioners
- Designed to solve the previously unsolvable
- Configurations start at 20 systems
NVIDIA DGX SuperPOD deployed in SATURNV
1,120 A100 GPUs
140 DGX A100 systems
170 Mellanox 200G HDR switches
4 PB of high-performance storage
700 PFLOPS of AI compute to train the previously impossible
nvidia.com/en-us/data-center/dgx-a100/
AI Infra on DGX POD: what are customers asking for? (System Administrator; Data Scientist/Researcher)
- Provisioning: OS provisioning, BMaaS, network assignment
- Cluster Management: SW deployment, updates & upgrades
- System Monitoring & Reporting: system usage, health checks & alerting
- Dataset Management: storage, tagging & versioning of datasets
- Interactive Notebooks: notebooks with schedulable GPU resources
- Experiment Management: job & results tracking
- GUI/CLI: portal/CLI/API for requesting resources
- Model Deployment: deployment to prod, inference services, etc.
- User Management: auth, users, teams & resource restrictions
5 MIRACLES OF A100
- Ampere: world’s largest 7nm chip (54B XTORS, HBM2)
- 3rd Gen NVLink and NVSwitch: efficient scaling to enable a Super GPU (2X more bandwidth)
- 3rd Gen Tensor Cores: faster, flexible, easier to use (20x AI perf with TF32)
- New Sparsity Acceleration: harness sparsity in AI models (2x AI performance)
- New Multi-Instance GPU: optimal utilization with right-sized GPUs (7x simultaneous instances per GPU)
NEW MULTI-INSTANCE GPU (MIG): Optimize GPU Utilization, Expand Access to More Users with Guaranteed Quality of Service
nvidia.com/en-us/technologies/multi-instance-gpu/
Up To 7 GPU Instances In a Single A100: Dedicated SM, Memory, L2 cache, Bandwidth for hardware QoS & isolation
Simultaneous Workload Execution With Guaranteed Quality Of Service: All MIG instances run in parallel with predictable throughput & latency
Right Sized GPU Allocation: Different sized MIG instances based on target workloads
Flexibility to run any type of workload on a MIG instance
Diverse Deployment Environments: Supported with Bare metal, Docker, Kubernetes, Virtualized Env.
[Diagram: up to seven MIG instances on a single A100, each with its own GPU compute and GPU memory slice, running independent workloads such as Amber]
https://blogs.nvidia.com/blog/2020/05/14/multi-instance-gpus/
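As a sketch of how MIG instances are typically carved out with `nvidia-smi` (a hypothetical session on an A100 host; the `3g.20gb` profile name and GPU index are illustrative, so check `nvidia-smi mig -lgip` on your own system):

```shell
# Enable MIG mode on GPU 0 (takes effect after a GPU reset)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this GPU supports
nvidia-smi mig -lgip

# Create two 3g.20gb GPU instances plus their default compute instances
sudo nvidia-smi mig -i 0 -cgi 3g.20gb,3g.20gb -C

# Verify the resulting GPU instances
nvidia-smi mig -lgi
```

Each instance then appears as its own schedulable device to containers and frameworks.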
BECAUSE MODEL DEVELOPMENT IS JUST THE FIRST STEP
Develop and Test Locally
- Package: dependencies, parameters, run scripts, build
- Scale-out: load-balance, data partitions, model distribution, AutoML
- Tune: parallelism, GPU support, query tuning, caching
- Instrument: monitoring, logging, versioning, security
- Automate: CI/CD, workflows, rolling upgrades, A/B testing
Weeks with one data scientist or developer becomes months with a large team of developers, scientists, data engineers and DevOps before reaching Production.
AI IS NOT MAGIC
Definitions
BUILDING AN AI MODEL
DATA → DATA ANALYTICS → FEATURES → MACHINE LEARNING → AI MODEL → MODEL VALIDATION (against NEW DATA) → DEPLOYMENT
BUILDING AN AI PRODUCT
SENSORS → PERCEIVE → REASON → PLAN → ACTUATORS
DATA → DATA ANALYTICS → MACHINE LEARNING → AI MODEL VALIDATION → AI MODEL
DAY IN THE LIFE OF A DATA SCIENTIST
GPU-POWERED WORKFLOW
Train Model
Validate
Test Model
Experiment with Optimizations and Repeat
Go Home on Time
Dataset Downloads Overnight
Start GET A COFFEE
Stay Late
Restart Data Prep Workflow Again
Find Unexpected Null Values Stored as String…
Switch to Decaf
CPU-POWERED WORKFLOW
Restart Data Prep Workflow
@*#! Forgot to Add a Feature
ANOTHER…
GET A COFFEE
Start Data Prep Workflow
GET A COFFEE
Configure Data Prep Workflow
Dataset Downloads Overnight
Dataset Collection → Analysis → Data Prep → Train → Inference
NVIDIA Nsight Systems
• Balance your workload across multiple CPUs and GPUs
• Locate idle CPU and GPU time
• Locate redundant synchronizations
• Locate optimization opportunities
• Improve application’s performance
System Wide Profiling Tool
Processes and threads
CUDA and OpenGL API trace
Multi-GPU
Kernel and memory transfer activities
cuDNN and cuBLAS trace
Thread/core migration
Thread state
https://arxiv.org/pdf/1909.13371.pdf
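A typical invocation looks like the following (a hedged sketch; `train.py` stands in for your own script, and exact flag availability varies by Nsight Systems version):

```shell
# Capture a system-wide timeline of CUDA, cuDNN and cuBLAS activity
nsys profile --trace=cuda,cudnn,cublas,osrt -o training_report python train.py

# Open training_report in the Nsight Systems GUI to look for
# idle CPU/GPU time and redundant synchronizations
```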
IMAGE-BASED DL IS EASY
Object detection Semantic Segmentation
Figures copyright Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, 2015. [Faster R-CNN]
Figures copyright Preferred Networks Inc., 2016.
Numerous applications
3D DL IS EXCITING
Simulation Medical imaging Autonomous driving
Manipulation Robotics Augmented reality
KAOLIN
- A PyTorch library for 3D DL
- Supports a wide range of 3D data representations
- Convenient dataloading/preprocessing/conversions
- Large collection of 3D neural nets to choose from
- Optimized implementations
- Omniverse-Kit integration for easy rendering,
interactive visualization, and much more.
https://gitlab-master.nvidia.com/Toronto_DL_Lab/kaolin
ARTIFICIAL INTELLIGENCE IS DOMAIN SPECIFIC
World → Sense → See, Understand → Automation
An AI program running on a computer, built per domain: Self-Driving | Manufacturing | Radiology
RAPIDS: GPU-POWERED MACHINE LEARNING
Miguel Martínez, Sr. Data Scientist @ NVIDIA
WHAT IS RAPIDS?
GPU Accelerated Data Science
RAPIDS is a set of open source software libraries which
gives you the freedom to execute end-to-end data science
and analytics pipelines entirely on GPUs.
www.rapids.ai
Open Source Data Science Ecosystem: Familiar Python APIs (CPU memory)
Data Preparation → Model Training → Visualization
- Pandas: analytics
- Scikit-Learn: machine learning
- NetworkX: graph analytics
- PyTorch, MxNet…: deep learning
- Matplotlib/Plotly: visualization
- Dask
End-to-End Accelerated GPU Data Science (GPU memory)
Data Preparation → Model Training → Visualization
- cuDF: analytics
- cuML: machine learning
- cuGraph: graph analytics
- PyTorch, MxNet…: deep learning
- cuXfilter <> pyViz: visualization
- Dask
cuDF
• GPU-accelerated data preparation and feature engineering
• Python drop-in Pandas replacement
cuML
• GPU-accelerated traditional machine learning libraries
• XGBoost, PCA, Kalman, K-means, k-NN, DBSCAN, tSVD…
cuGraph
• GPU-accelerated graph analytics libraries
cuXfilter
• Web Data Visualization library
• DataFrame kept in GPU-memory throughout the session
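The drop-in idea can be sketched with the Pandas API itself; on a machine with RAPIDS installed, the same lines run on the GPU by importing cudf in place of pandas (the swap shown in the comment is the only change):

```python
import pandas as pd  # with RAPIDS installed: import cudf as pd

# A small grouped aggregation; cuDF mirrors this Pandas API
df = pd.DataFrame({"key": [0, 1, 1, 0], "val": [1.0, 2.0, 3.0, 4.0]})
totals = df.groupby("key").val.sum()
print(totals.to_dict())  # {0: 5.0, 1: 5.0}
```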
LEARNING FROM APACHE ARROW
Before Arrow, each system (Pandas, Spark, Drill, Impala, Parquet, Cassandra, Kudu, HBase) has its own internal memory format; similar functionality is implemented in multiple projects; and 70-80% of computation is wasted on copy & convert (serialization & deserialization).
With Arrow memory, all systems utilize the same memory format, projects can share functionality, and there is no overhead for cross-system communication.
APACHE ARROW
Columnar layout leverages GPU strengths
Emphasis on zero-copy and shallow-copy operations minimizes a core bottleneck
Consistency with CPU version simplifies development and conversion
Why OpenUCX? Bringing Hardware-Accelerated Communications to Dask
• TCP sockets are slow!
• UCX provides uniform access to transports:
– TCP, InfiniBand, Shared memory, NVLink
• Alpha Python bindings for UCX (ucx-py)
• Provides best communication performance to Dask, based on available hardware on nodes/cluster
BENCHMARKS: Distributed cuDF Random Merge

Environment:
- cuDF v0.11, UCX-PY 0.11
- Running on NVIDIA DGX-2: GPU NVIDIA Tesla V100 32GB, CPU Intel(R) Xeon(R) CPU 8168 @ 2.70GHz

Benchmark setup:
- DataFrames: left/right with 1x int64 key column and 1x int64 value column
- Inner merge
- 30% of matching data balanced across each partition
cuDF
GPU-ACCELERATED ETL: the average data scientist spends 90+% of their time in ETL, as opposed to training models
EXTRACTION IS THE CORNERSTONE: cuDF for Faster Data Loading
- Follow Pandas APIs and provide >10x speedup: CSV reader/writer, Parquet reader/writer, ORC reader/writer, JSON reader, Avro reader
- GPU Direct Storage integration in progress for bypassing PCIe bottlenecks!
- Key is GPU-accelerating both parsing and decompression wherever possible
ETL Technology Stack (top to bottom):
- Python: Dask cuDF, cuDF, Pandas
- Cython
- cuDF C++ (Thrust, Cub, Jitify)
- CUDA Libraries
- CUDA
ETL – THE BACKBONE OF DATA SCIENCE

libcuDF is… (CUDA C++ library)
- Low-level library containing function implementations and a C/C++ API
- Importing/exporting Apache Arrow in GPU memory using CUDA IPC
- CUDA kernels to perform element-wise math operations on GPU DataFrame columns
- CUDA sort, join, groupby, reduction, etc. operations on GPU DataFrames

cuDF is… (Python library)
- A Python library for manipulating GPU DataFrames following the Pandas API
- A Python interface to the CUDA C++ library with additional functionality
- Creates GPU DataFrames from Numpy arrays, Pandas DataFrames, and PyArrow Tables
- JIT compilation of User-Defined Functions (UDFs) using Numba
BENCHMARKS: Single-GPU Speedup vs Pandas

Environment:
- cuDF v0.13, Pandas v0.25.3
- GPU NVIDIA Tesla V100 32GB
- CPU Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz

Benchmark setup:
- DataFrames: 2x int32 key columns, 3x int32 value columns
- Inner merge
- GroupBy: count, sum, min, max calculated for each value column

GPU speedup over CPU:
| # rows | Merge | Sort | GroupBy |
| 10M    | 500x  | 240x | 220x    |
| 100M   | 970x  | 360x | 290x    |
BENCHMARKS: Continuous Improvement

GPU speedup over CPU:
| cuDF version | # rows | Merge | Sort | GroupBy |
| v0.10        | 10M    | 430x  | 140x | 28x     |
| v0.10        | 100M   | 870x  | 300x | 150x    |
| v0.13        | 10M    | 500x  | 240x | 220x    |
| v0.13        | 100M   | 970x  | 360x | 290x    |
LOADING DATA INTO A GPU DATAFRAME: cuDF code examples
- Create an empty DataFrame, and add a column
- Create a DataFrame with two columns
- Load a CSV file into a GPU DataFrame
- Use Pandas to load a CSV file, and copy its content into a GPU DataFrame
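The code images for these examples did not survive extraction; a minimal sketch of the same four operations, written against the Pandas API that cuDF mirrors (on a RAPIDS install, `import cudf` and `cudf.DataFrame.from_pandas` give the GPU versions):

```python
import io
import pandas as pd  # with RAPIDS installed: import cudf

# Create an empty DataFrame, and add a column
df = pd.DataFrame()
df["a"] = [1, 2, 3]

# Create a DataFrame with two columns
df2 = pd.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})

# Load a CSV file into a DataFrame (cudf.read_csv on GPU)
csv = io.StringIO("a,b\n1,0.1\n2,0.2\n3,0.3\n")
df3 = pd.read_csv(csv)

# With cuDF: load via Pandas, then copy into a GPU DataFrame
# gdf = cudf.DataFrame.from_pandas(df3)
print(df3.shape)  # (3, 2)
```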
WORKING WITH GPU DATAFRAMES: cuDF code examples
- Return the first three rows as a new DataFrame
- Row slicing with column selection
- Find the mean and standard deviation of a column
- Count number of occurrences per value, and number of unique values
- Transform column values with a custom function
- Change the data type of a column
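Again the original code is missing; the same six operations sketched with Pandas calls that cuDF also exposes (swap the import for `cudf` on GPU):

```python
import pandas as pd  # with RAPIDS installed: import cudf

df = pd.DataFrame({"a": [1, 2, 2, 3, 3], "b": [10.0, 20.0, 20.0, 30.0, 40.0]})

head3 = df.head(3)                         # first three rows as a new DataFrame
sliced = df.loc[1:3, ["b"]]                # row slicing with column selection
mean, std = df["b"].mean(), df["b"].std()  # mean and standard deviation
counts = df["a"].value_counts()            # occurrences per value
uniques = df["a"].nunique()                # number of unique values
doubled = df["a"].apply(lambda x: x * 2)   # transform with a custom function
as_float = df["a"].astype("float64")       # change the data type of a column
print(uniques)  # 3
```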
QUERY, SORT, GROUP, JOIN, …: cuDF code examples
Query a DataFrame with a boolean expression
Return the first ‘n’ rows ordered by ‘columns’
Sort a column by its values
One-hot encoding
Group by column with aggregate function
Join and merge DataFrames
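A minimal Pandas-API sketch of these six operations (cuDF provides the same calls, including `cudf.get_dummies`; swap the import on GPU):

```python
import pandas as pd  # with RAPIDS installed: import cudf

left = pd.DataFrame({"key": [1, 2, 3], "x": [0.1, 0.2, 0.3]})
right = pd.DataFrame({"key": [2, 3, 4], "y": [20, 30, 40]})

filtered = left.query("x > 0.1")                   # boolean expression
top2 = right.nlargest(2, "y")                      # first n rows ordered by columns
sorted_x = left.sort_values("x", ascending=False)  # sort a column by its values
onehot = pd.get_dummies(left["key"])               # one-hot encoding
grouped = right.groupby("key").y.sum()             # group by with aggregate function
merged = left.merge(right, on="key", how="inner")  # join and merge DataFrames
print(merged["key"].tolist())  # [2, 3]
```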
cuML
ML Technology Stack (top to bottom):
- Python: Dask cuML, cuML, Dask cuDF, cuDF, Numpy
- Cython
- cuML Algorithms
- cuML Prims (Thrust, Cub)
- CUDA Libraries (cuSolver, nvGraph, CUTLASS, cuSparse, cuRand, cuBlas)
- CUDA
PRINCIPAL COMPONENT ANALYSIS (PCA): CPU vs GPU

The CPU and GPU scripts share most of their code:
- Common: data loading and algo params
- Specific: import CPU algorithm vs. import GPU algorithm
- Specific (GPU only): DataFrame from Pandas to GPU
- Common: model training

Training results (system: AWS p3.8xlarge):
- CPU: 57.1 seconds (Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 32 vCPU cores, 244 GB RAM)
- GPU: 4.28 seconds (Tesla V100 SXM2 16GB)
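The slide's code did not survive extraction; a minimal sketch of the pattern using scikit-learn (cuML exposes a matching `PCA` class, so on the GPU only the import line and the Pandas-to-cuDF hand-off change):

```python
import numpy as np
from sklearn.decomposition import PCA  # GPU version: from cuml import PCA

# Common: data loading and algorithm parameters
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))

# Common: model training (the call is identical on CPU and GPU)
pca = PCA(n_components=2)
reduced = pca.fit_transform(X)
print(reduced.shape)  # (1000, 2)
```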
cuML roadmap
March 2020 – RAPIDS 0.13
cuML | Single-GPU | Multi-GPU | Multi-Node Multi-GPU
Gradient Boosted Decision Trees
Linear Regression
Logistic Regression
Random Forest
K-Means
K-NN
DBSCAN
UMAP
ARIMA & Holt-Winters
Kalman Filter
t-SNE
Principal Components
Singular Value Decomposition
SVM
cuML roadmap
2020 – RAPIDS 1.0
cuML | Single-GPU | Multi-GPU | Multi-Node Multi-GPU
Gradient Boosted Decision Trees
Linear Regression
Logistic Regression
Random Forest
K-Means
K-NN
DBSCAN
UMAP
ARIMA & Holt-Winters
Kalman Filter
t-SNE
Principal Components
Singular Value Decomposition
SVM
BENCHMARKS: Distributed cuDF Data Prep and XGBoost

Benchmark: 200GB CSV dataset; data preparation includes joins, variable transformations.
CPU cluster configuration: CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark.
DGX cluster configuration: 5x DGX-1 on an InfiniBand network.

Time in seconds (shorter is better); end-to-end includes data conversion between cuDF and XGBoost:
| Configuration | cuDF – Load and Data Prep | cuML – XGBoost | End-to-End |
| 20 CPU Nodes  | 2290 | 2741 | 8762 |
| 30 CPU Nodes  | 1956 | 1675 | 6148 |
| 50 CPU Nodes  | 1999 | 715  | 3925 |
| 100 CPU Nodes | 1948 | 379  | 3221 |
| DGX-2         | 147  | 37   | 209  |
| 5x DGX-1      | 137  | 17   | 164  |
cuGraph
Graph Technology Stack (top to bottom):
- Python: Dask cuDF, cuDF, Dask cuML, Numpy
- Cython
- cuGraph Algorithms (Gunrock*, cuGraphBLAS, cuHornet)
- Prims (Thrust, Cub)
- CUDA Libraries (cuSolver, cuSparse, cuRand)
- CUDA
* Gunrock is from UC Davis
GOALS AND BENEFITS OF CUGRAPH

Focus on features and user experience:
- Seamless integration with cuDF & cuML: Property Graph support via DataFrames
- Breakthrough performance: up to 500 million edges on a single 32GB GPU; multi-GPU support for scaling into the billions of edges
- Multiple APIs: Python (familiar NetworkX-like API); C/C++ (lower-level granular control for application developers)
- Growing functionality: extensive collection of algorithm, primitive, and utility functions
Louvain Single Run

G = cugraph.Graph()
G.add_edge_list(gdf["src_0"], gdf["dst_0"], gdf["data"])
df, mod = cugraph.nvLouvain(G)

Returns a cudf.DataFrame with two named columns:
- louvain["vertex"]: the vertex id
- louvain["partition"]: the assigned partition
BENCHMARK: Speedup vs SciPy PageRank and cyLouvain
cuGraph roadmap
March 2020 – RAPIDS 0.13
cuGraph | Single-GPU | Multi-GPU | Multi-Node Multi-GPU
PageRank
Personal Page Rank
Katz
Betweenness Centrality
Spectral Clustering
Louvain
Ensemble Clustering for Graphs
K-Core
K-Truss
Triangle Counting
Connected Components (Weak and Strong)
Jaccard
Overlap Coefficient
Single Source Shortest Path (SSSP)
Breadth First Search (BFS)
cuGraph roadmap
2020 – RAPIDS 1.0
cuGraph | Single-GPU | Multi-GPU | Multi-Node Multi-GPU
PageRank
Personal Page Rank
Katz
Betweenness Centrality
Spectral Clustering
Louvain
Ensemble Clustering for Graphs
K-Core
K-Truss
Triangle Counting
Connected Components (Weak and Strong)
Jaccard
Overlap Coefficient
Single Source Shortest Path (SSSP)
Breadth First Search (BFS)
HOW TO START
On-premises | In the cloud
https://github.com/rapidsai
Source code on GitHub
https://ngc.nvidia.com
Containers on NGC & Docker Hub
https://anaconda.org/rapidsai
Conda packages
Pascal architecture or better CUDA 9.2, 10.0 or 10.1.2
Ubuntu 16.04/18.04, CentOS 7 & RHEL 7
https://rapids.ai/start.html
LEARN MORE
www.rapids.ai
ALISON B LOWNDES · AI DevRel | EMEA
https://fortune.com/longform/ai-artificial-intelligence-big-tech-microsoft-alphabet-openai/
Brain-Computer Interfaces: focused on treatment for disease and dysfunction (e.g. epilepsy, depression, Parkinson’s), but ultimately aiming to advance human intelligence by restoring and extending cognitive vibrancy.
“We’re either going to have to merge with AI or be left behind.” (Elon Musk)
ISAAC PLATFORM FOR ROBOTICS: NVIDIA's multi-tool for robotics
DESIGN → SIMULATE → TRAIN → DEPLOY (on JETSON XAVIER)
CARTER — THE NAVIGATION ROBOT
High-end platform for logistics applications
Navigation Stack
3D Obstacle Detection & Avoidance
NVIDIA Jetson Xavier
GTC Digital talk: S21182
ISAAC 2020.1
“High-fidelity simulation lets us train and test algorithms more effectively, leading to more robust and adaptive networks.” (A. Anandkumar, Professor of CS, Caltech & NVIDIA)
THE JETSON FAMILY: for AI at the Edge and Autonomous System designs
Same software; full specs at developer.nvidia.com/jetson

- JETSON NANO (entry): 0.5 TFLOPS (FP16), 5-10W, 45mm x 70mm
- JETSON TX2 series: 1.3 TFLOPS (FP16), 7.5-15W (TX2i: 10-20W), 50mm x 87mm
- JETSON Xavier NX (mainstream): 6 TFLOPS (FP16), 21 TOPS (INT8), 10-15W, 45mm x 70mm
- JETSON AGX XAVIER series (autonomous machines): 11 TFLOPS (FP16), 32 TOPS (INT8), 10-30W, 100mm x 87mm

NVIDIA Jetson Xavier “NX”: 21 TOPS (INT8) at 15W, 8GB LPDDR4x, 16GB eMMC, supports up to 32x 1080p IP cameras; 70x45mm module and developer kit.
JETPACK SDK FOR AI @ THE EDGE
DEVELOPER.NVIDIA.COM/EMBEDDED-COMPUTING
- Deep Learning: TensorRT, cuDNN, sample code
- Computer Vision: VisionWorks, OpenCV
- Graphics: Vulkan, OpenGL
- Media: libargus, Video API, Multimedia API
- CUDA, Linux4Tegra, ROS
- Nsight Developer Tools
Jetson embedded supercomputer: advanced GPU, 64-bit CPU, video CODEC, VIC, ISP
JETSON - START NOW
- DEEP LEARNING INSTITUTE: training labs, nanodegrees (nvidia.com/DLI)
- TWO DAYS TO A DEMO: create your first demo today (developer.nvidia.com/embedded/twodaystoademo)
- JETSON DEVELOPER KITS: AGX Xavier Developer Kit $699; Xavier NX software patch (developer.nvidia.com/buy-jetson)
- GTC: the largest event for GPU developers (gputechconf.com)
JARVIS: Framework for Multimodal Conversational AI Services
- End-to-end multimodal conversational AI services
- Pre-trained SOTA models: 100,000 hours of DGX; retrain with NeMo
- Interactive response: 150ms on A100 versus 25sec on CPU
- Deploy services with one line of code

Pipeline: PRE-TRAINED MODELS (NVIDIA GPU CLOUD) → RETRAIN (NVIDIA AI TOOLKIT: Transfer Learning, NeMo, Service Maker) → TRITON INFERENCE SERVER (Speech, Vision, NLU, plus a Dialog Manager), consuming video and audio to power chatbots, multi-speaker transcription, look-to-talk and gesture recognition.

Sign up for EA: developer.nvidia.com/nvidia-jarvis
Cloud Native Deployment Approach
Enabling resilience & monitoring of advanced deployments:
- Package manager for Kubernetes: easily configure, deploy and update applications on Kubernetes
- Container orchestration: automated container deployment, including self-healing
- NVIDIA EGX Stack with the GPU Operator
PURPOSE-BUILT AI SUPERCOMPUTERS
- DGX Station (AI workstation): the personal AI supercomputer
- DGX-1 (AI data center): the essential instrument for AI research
- DGX-2: the world’s most powerful AI system for the most complex AI challenges
NGC DL software stack: universal SW for deep learning, predictable execution across platforms, pervasive reach
NEW NGC FEATURES
- NGC PRIVATE REGISTRY (now): securely share and collaborate; easily grant and manage content access; container scanning and signing; model versioning and encryption
- SDKs & CONTAINERS FOR A100 (Q2): DL (TF, PyT, MxNet, Triton…), HPC (NAMD, Chroma, LAMMPS…), industry SDKs (Jarvis, Aerial…)
- NGC-READY SYSTEMS FOR A100 (Q2): multi-arch support (x86, Arm, POWER)
Secure and Accelerate End-to-End AI Workflows: NGC AI Model and Security Enhancements
- PRE-TRAINED MODELS with AI toolkits & SDKs: Transfer Learning, Federated Learning, NeMo (conversational AI), TensorRT Optimizer, Service Maker
- TRAINING & REFINING via the NGC Catalog and Private Registry: container signing, model encryption, model versioning, security scanning, access control
- Deploy, secure and manage remote EGX systems
NVIDIA CLARA FIGHTING COVID: Testing | Treating | Tracking
- Clara Guardian: video, vision, voice; 18+ global partners
- First DGX A100 for COVID: Argonne National Lab; blueprint for pharmas
- Accelerated genomics: minutes & hours vs. weeks & months, from epidemiology to infected population
- AI models for COVID in CT: 2 pre-trained models; NVIDIA Clara Imaging in NGC
FIVE ROADS TO GPU COMPUTING
- GPU Libraries: drop-in replacements for existing libraries (cuBLAS, CUDA Math, cuSPARSE, cuRAND, cuSOLVER, nvGRAPH, cuDNN, cuFFT, Thrust)
- OpenACC: comment-based directives in C/C++/Fortran; single-source-code parallelization for multiple architectures
- CUDA: parallel programming model for GPUs in C, C++, Fortran, Python, MATLAB; specialized kernels for general-purpose GPU computing
- RAPIDS: GPU acceleration of traditional machine learning; accelerate Scikit-Learn-style ML algorithms
- DEEP LEARNING: GPU-accelerated deep learning frameworks (TensorFlow, PyTorch); build GPU-accelerated functions directly from data
developer.nvidia.com
RICH CONTENT PORTFOLIO: Fundamentals and advanced hands-on training in key technologies and application domains
- Deep Learning Fundamentals
- Accelerated Computing Fundamentals
- Accelerated Data Science Fundamentals
- Intro to AI in the Data Center
- AI for Digital Content Creation
- AI for Healthcare
- AI for Autonomous Vehicles
- AI for Intelligent Video Analytics
- AI for Robotics
- AI for Predictive Maintenance
- AI for Anomaly Detection
- AI for Industrial Inspection
DLI UNIVERSITY TRAINING
UNIVERSITY AMBASSADOR PROGRAM
• Qualified faculty and researchers can get certified to teach DLI
workshops to their students at no cost.
• Hundreds of universities certified around the world, including:
TEACHING KITS
• Qualified university educators can download courseware across
deep learning, accelerated computing, and robotics.
• Kits include lecture materials, GPU cloud resources, access to
self-paced DLI courses, and more.
Learn more at www.nvidia.com/dli
https://blogs.nvidia.com/blog/2019/11/20/nvidia-microsoft-aid-ai-startups/
THANK YOU