ALISON B LOWNDES · The blueprint for AI power and scale using DGX A100 Infused with the expertise...

ALISON B LOWNDES · AI DevRel | EMEA

@alisonblowndes

June 2020

AGENDA

INTRO TO NVIDIA: Training & deployment

RAPIDS: Accelerating the data science

ROBOTICS & SIMULATION: The hardware, the software & the environments

WRAPUP + Q&A

NVIDIA AI BREAKTHROUGHS IN GRAPHICS

PROJECT SOL: A Showcase for the Power of NVIDIA RTX

MINECRAFT RTX: Real-time Ray Tracing in the World’s Most Popular Game

OMNIVERSE: A Powerful Collaboration Platform for 3D Design

NASA MARS LANDER: Visualizing NASA’s Supercomputer Simulations

“AMPERE” NVIDIA A100: 20X Volta

54B transistors | 826 mm² | TSMC 7N | 40GB Samsung HBM2 | 600 GB/s NVLink

Peak vs Volta:
• FP32 TRAINING: 312 TFLOPS (20X)
• INT8 INFERENCE: 1,248 TOPS (20X)
• FP64 HPC: 19.5 TFLOPS (2.5X)
• MULTI-INSTANCE GPU: 7X GPUs

25 YEARS OF ACCELERATED COMPUTING

X-FACTOR SPEED UP · FULL STACK · ONE ARCHITECTURE · SYSTEMS

[Figure: CPU + GPU]

25 YEARS OF ACCELERATED COMPUTING

X-FACTOR SPEED UP · FULL STACK · ONE ARCHITECTURE · DATA-CENTER SCALE

[Figure: CPU + GPU + DPU]

Original NVIDIA Campus

NVIDIA Endeavor (2017)

New SaturnV Datacenter (2020)

NVIDIA Voyager (2020)

10.5 MW · 45,000 Sq. Ft.

NVIDIA DGX SUPERPOD WITH DGX A100

Unmatched Data Center Scalability: Deployed in Under 3 Weeks

Leadership-class AI infrastructure

The blueprint for AI power and scale using DGX A100

Infused with the expertise of NVIDIA’s AI practitioners

Designed to solve the previously unsolvable

Configurations start at 20 systems

NVIDIA DGX SuperPOD deployed in SATURNV

1,120 A100 GPUs

140 DGX A100 systems

170 Mellanox 200G HDR switches

4 PB of high-performance storage

700 PFLOPS of power to train the previously impossible
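As a rough sanity check on the list above, the GPU count and the headline compute figure can be reproduced with simple arithmetic. Each DGX A100 holds eight A100 GPUs; the 624 TFLOPS per GPU used below is NVIDIA's published peak FP16 Tensor Core rate with structured sparsity, and it is an assumption that this is the metric behind the quoted 700 PFLOPS.

```python
# Rough check of the SuperPOD headline numbers.
# Assumption: the 700 PFLOPS figure refers to peak FP16 Tensor Core throughput
# with structured sparsity (624 TFLOPS per A100).
GPUS_PER_DGX_A100 = 8
PEAK_FP16_SPARSE_TFLOPS_PER_A100 = 624

dgx_systems = 140
total_gpus = dgx_systems * GPUS_PER_DGX_A100
total_pflops = total_gpus * PEAK_FP16_SPARSE_TFLOPS_PER_A100 / 1000  # TFLOPS -> PFLOPS

print(total_gpus)              # 1120, matching the slide
print(round(total_pflops, 1))  # 698.9, i.e. roughly the quoted 700 PFLOPS
```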


https://www.nvidia.com/en-us/data-center/dgx-a100/

AI Infra on DGX POD: What are customers asking for?

• Provisioning: OS provisioning, BMaaS, network assignment
• Cluster CM: SW deployment, updates & upgrades
• Sys Monitoring & Reporting: System usage, health checks & alerting
• Dataset Management: Storage, tagging, & versioning of datasets
• Interactive Notebooks: Notebooks w/ schedulable GPU resources
• Experiment Management: Job & results tracking
• GUI/CLI: Portal/CLI/API for requesting resources
• Model Deployment: Deployment to prod, inference services, etc.
• User Management: Auth, users, teams, & resource restrictions

Roles: System Administrator, Data Scientist/Researcher

5 MIRACLES OF A100

• Ampere: World’s Largest 7nm chip; 54B transistors, HBM2
• 3rd Gen NVLINK and NVSWITCH: Efficient Scaling to Enable Super GPU; 2X More Bandwidth
• 3rd Gen Tensor Cores: Faster, Flexible, Easier to use; 20x AI Perf with TF32
• New Sparsity Acceleration: Harness Sparsity in AI Models; 2x AI Performance
• New Multi-Instance GPU: Optimal utilization with right sized GPU; 7x Simultaneous Instances per GPU

NEW MULTI-INSTANCE GPU (MIG): Optimize GPU Utilization, Expand Access to More Users with Guaranteed Quality of Service

• Up To 7 GPU Instances In a Single A100: Dedicated SM, Memory, L2 cache, Bandwidth for hardware QoS & isolation
• Simultaneous Workload Execution With Guaranteed Quality Of Service: All MIG instances run in parallel with predictable throughput & latency
• Right Sized GPU Allocation: Different sized MIG instances based on target workloads
• Flexibility to run any type of workload on a MIG instance
• Diverse Deployment Environments: Supported with Bare metal, Docker, Kubernetes, Virtualized Env.

[Figure: one A100 partitioned into seven instances, each with its own GPU compute and GPU memory]

https://blogs.nvidia.com/blog/2020/05/14/multi-instance-gpus/

https://www.nvidia.com/en-us/technologies/multi-instance-gpu/

BECAUSE MODEL DEVELOPMENT IS JUST THE FIRST STEP

Develop and Test Locally: weeks with one data scientist or developer.
Production: months with a large team of developers, scientists, data engineers and DevOps.

• Package: Dependencies, Parameters, Run scripts, Build
• Scale-out: Load-balance, Data partitions, Model distribution, AutoML
• Tune: Parallelism, GPU support, Query tuning, Caching
• Instrument: Monitoring, Logging, Versioning, Security
• Automate: CI/CD, Workflows, Rolling upgrades, A/B testing

AI IS NOT MAGIC: Definitions

BUILDING AN AI MODEL

[Figure: AI model pipeline with stages DATA, DATA ANALYTICS, FEATURES, MACHINE LEARNING, AI MODEL, MODEL VALIDATION, DEPLOYMENT and NEW DATA]

BUILDING AN AI PRODUCT

[Figure: AI product loop with SENSORS, DATA, DATA ANALYTICS, MACHINE LEARNING, AI MODEL, VALIDATION, PERCEIVE, REASON, PLAN and ACTUATORS]

DAY IN THE LIFE OF A DATA SCIENTIST

GPU-POWERED WORKFLOW
Dataset Downloads Overnight → Start → GET A COFFEE → Train Model → Validate → Test Model → Experiment with Optimizations and Repeat → Go Home on Time

CPU-POWERED WORKFLOW
Dataset Downloads Overnight → Configure Data Prep Workflow → GET A COFFEE → Start Data Prep Workflow → GET A COFFEE → Find Unexpected Null Values Stored as String… → Restart Data Prep Workflow → ANOTHER… → @*#! Forgot to Add a Feature → Restart Data Prep Workflow Again → Switch to Decaf → Stay Late

Pipeline stages: Dataset Collection, Analysis, Data Prep, Train, Inference

NVIDIA Nsight Systems: System Wide Profiling Tool

• Balance your workload across multiple CPUs and GPUs
• Locate idle CPU and GPU time
• Locate redundant synchronizations
• Locate optimization opportunities
• Improve your application’s performance

Processes and threads

CUDA and OpenGL API trace

Multi-GPU

Kernel and memory transfer activities

cuDNN and cuBLAS trace

Thread/core migration

Thread state
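To make the traced CPU and GPU phases easy to find on the Nsight Systems timeline, application code can be annotated with NVTX ranges. A minimal sketch, assuming the optional `nvtx` Python package and its `nvtx.annotate` context manager; the pipeline functions are hypothetical placeholders, and the script falls back to no-op ranges when `nvtx` is not installed.

```python
import time
from contextlib import nullcontext

try:
    import nvtx  # optional NVTX bindings for Python (pip install nvtx)
    annotate = nvtx.annotate
except ImportError:
    def annotate(message=None, color=None):
        """No-op stand-in so the sketch still runs without the nvtx package."""
        return nullcontext()


def load_data(n):
    with annotate("load_data", color="blue"):
        return list(range(n))


def preprocess(values):
    with annotate("preprocess", color="green"):
        return [v * 2 for v in values]


def train(values):
    with annotate("train", color="red"):
        time.sleep(0.01)  # placeholder for real (GPU) training work
        return sum(values)


result = train(preprocess(load_data(1000)))
print(result)  # 999000
```

One way to capture such a run, assuming Nsight Systems is installed, is `nsys profile --trace=cuda,nvtx -o pipeline_report python pipeline.py`, where `pipeline.py` is a hypothetical file holding the script; the named ranges then show up on their own NVTX row next to the CUDA API and kernel rows.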

https://arxiv.org/pdf/1909.13371.pdf


IMAGE BASED DL IS EASY

Object detection | Semantic Segmentation

Figures copyright Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, 2015. [Faster R-CNN]

Figures copyright Preferred Networks Inc., 2016.

3D DL IS EXCITING: Numerous applications

Simulation, Medical imaging, Autonomous driving, Manipulation, Robotics, Augmented reality

KAOLIN

- A PyTorch library for 3D DL
- Supports a wide range of 3D data representations
- Convenient data loading/preprocessing/conversions
- Large collection of 3D neural nets to choose from
- Optimized implementations
- Omniverse-Kit integration for easy rendering, interactive visualization, and much more.

https://gitlab-master.nvidia.com/Toronto_DL_Lab/kaolin

NOVOGRAD: https://arxiv.org/pdf/1905.11286.pdf
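NovoGrad, described in the paper linked above, keeps Adam-style momentum but replaces the per-weight second moment with one scalar per layer: a running average of that layer's squared gradient norm. For layer l the update is roughly v_t = b2*v_(t-1) + (1-b2)*||g_t||^2 (with v_1 = ||g_1||^2), m_t = b1*m_(t-1) + g_t/(sqrt(v_t) + eps) + d*w_t, and w_(t+1) = w_t - lr*m_t. Below is a minimal pure-Python sketch of that rule; it is not NVIDIA's implementation, and the default hyperparameters and the toy quadratic are illustrative assumptions.

```python
import math
from typing import List, Optional


class NovoGrad:
    """Minimal NovoGrad sketch: momentum on layer-normalized gradients, with one
    scalar second moment per layer (running average of ||g||^2 for that layer)."""

    def __init__(self, lr: float = 0.01, beta1: float = 0.95, beta2: float = 0.98,
                 weight_decay: float = 0.0, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.weight_decay, self.eps = weight_decay, eps
        self.m: Optional[List[List[float]]] = None      # per-layer momentum vectors
        self.v: Optional[List[Optional[float]]] = None  # per-layer scalar second moments

    def step(self, layers: List[List[float]], grads: List[List[float]]) -> None:
        """Update each layer's weights in place from that layer's gradient."""
        if self.m is None or self.v is None:
            self.m = [[0.0] * len(w) for w in layers]
            self.v = [None] * len(layers)
        for l, (w, g) in enumerate(zip(layers, grads)):
            sq_norm = sum(x * x for x in g)  # ||g_t||^2 over the whole layer
            prev = self.v[l]
            self.v[l] = sq_norm if prev is None else self.beta2 * prev + (1.0 - self.beta2) * sq_norm
            denom = math.sqrt(self.v[l]) + self.eps
            for i in range(len(w)):
                # normalized gradient plus weight decay, accumulated into momentum
                self.m[l][i] = self.beta1 * self.m[l][i] + g[i] / denom + self.weight_decay * w[i]
                w[i] -= self.lr * self.m[l][i]


def loss(w: List[float]) -> float:
    """Toy quadratic with its minimum at w_i = 3."""
    return sum((x - 3.0) ** 2 for x in w)


weights = [[0.0, 0.0]]  # a single two-weight "layer"
opt = NovoGrad(lr=0.01, beta1=0.9)
initial = loss(weights[0])
for _ in range(300):
    opt.step(weights, [[2.0 * (x - 3.0) for x in weights[0]]])
print(initial, loss(weights[0]))
```

Because the denominator is shared across a layer, the step size is insensitive to the overall scale of that layer's gradients, which is the property the paper relies on for stable training of large models.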


ARTIFICIAL INTELLIGENCE IS DOMAIN SPECIFIC

World · Sense · See, Understand · Automation

[Figure: a dedicated AI Program and Computer for each domain: Self-Driving, Manufacturing, Radiology]

RAPIDS: GPU-POWERED MACHINE LEARNING
Miguel Martínez, Sr. Data Scientist @ NVIDIA

WHAT IS RAPIDS

GPU Accelerated Data Science

RAPIDS is a set of open-source software libraries that gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs.

www.rapids.ai

Open Source Data Science Ecosystem: Familiar Python APIs (data in CPU Memory)

Data Preparation → Model Training → Visualization
• Analytics: Pandas
• Machine Learning: Scikit-Learn
• Graph Analytics: NetworkX
• Deep Learning: PyTorch, MXNet…
• Visualization: Matplotlib/Plotly
• Dask

End-to-End Accelerated GPU Data Science (data in GPU Memory)

Data Preparation → Model Training → Visualization
• Analytics: cuDF
• Machine Learning: cuML
• Graph Analytics: cuGraph
• Deep Learning: PyTorch, MXNet…
• Visualization: cuXFilter <> pyViz
• Dask

cuDF

• GPU-accelerated data preparation and feature engineering

• Python drop-in Pandas replacement

cuML

• GPU-accelerated traditional machine learning libraries

• XGBoost, PCA, Kalman, K-means, k-NN, DBSCAN, tSVD…

cuGraph

• GPU-accelerated graph analytics libraries

cuXfilter

• Web Data Visualization library

• DataFrame kept in GPU-memory throughout the session

LEARNING FROM APACHE ARROW

Without a shared format, Pandas, Spark, Drill, Impala, Parquet, Cassandra, Kudu and HBase must Copy & Convert data between each other:
• Each system has its own internal memory format
• Similar functionality implemented in multiple projects
• 70-80% computation wasted on serialization & deserialization

With a shared Arrow Memory layer used by all of them:
• All systems utilize the same memory format
• Projects can share functionality
• No overhead for cross-system communication

APACHE ARROW

Columnar layout leverages GPU strengths

Emphasis on zero-copy and shallow-copy operations minimizes a core bottleneck

Consistency with CPU version simplifies development and conversion

gdf['session_id']
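The columnar, zero-copy idea above can be illustrated with the Python standard library alone: store each column as one contiguous typed buffer and hand out views instead of copies. This is only an analogy under that assumption, not the Arrow or cuDF API; `table`, `session_id` and `duration_s` are hypothetical names.

```python
from array import array

# Columnar layout: each column is one contiguous, typed buffer.
session_id = array("q", [101, 102, 103, 104, 105])   # int64 column
duration_s = array("d", [3.5, 1.0, 7.25, 0.5, 2.0])  # float64 column
table = {"session_id": session_id, "duration_s": duration_s}

# Zero-copy slice: the memoryview shares the column's buffer instead of copying it.
view = memoryview(table["session_id"])[1:4]
print(view.tolist())  # [102, 103, 104]

# Because nothing was copied, an in-place write to the column is visible through the view.
session_id[2] = 999
print(view.tolist())  # [102, 999, 104]
```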

Why OpenUCX? Bringing Hardware-Accelerated Communications to Dask

• TCP sockets are slow!
• UCX provides uniform access to transports:
  – TCP, InfiniBand, Shared memory, NVLink
• Alpha Python bindings for UCX (ucx-py)
• Provides best communication performance to Dask, based on available hardware on nodes/cluster

BENCHMARKS: Distributed cuDF Random Merge

Environment
• cuDF v0.11
• UCX-PY 0.11
• Running on NVIDIA DGX-2:
  – GPU: NVIDIA Tesla V100 32GB
  – CPU: Intel(R) Xeon(R) CPU 8168 @ 2.70GHz

Benchmark Setup
• DataFrames: Left/Right, each with 1x int64 key column and 1x int64 value column
• Inner Merge
• 30% of matching data balanced across each partition

cuDF

GPU-Accelerated ETL: The average data scientist spends 90+% of their time in ETL, as opposed to training models

EXTRACTION IS THE CORNERSTONE: cuDF for Faster Data Loading

• Follow Pandas APIs and provide >10x speedup
  – CSV Reader/Writer
  – Parquet Reader/Writer
  – ORC Reader/Writer
  – JSON Reader
  – Avro Reader
• GPU Direct Storage integration in progress for bypassing PCIe bottlenecks!
• Key is GPU-accelerating both parsing and decompression wherever possible

ETL Technology Stack

[Figure: layered stack from Python (cuDF, Dask cuDF, alongside Pandas) through Cython and cuDF C++ down to CUDA Libraries (Thrust, CUB, Jitify) and CUDA]

ETL – THE BACKBONE OF DATA SCIENCE

libcuDF is… (CUDA C++ Library)
• Low-level library containing function implementations and C/C++ API
• Importing/exporting Apache Arrow in GPU memory using CUDA IPC
• CUDA kernels to perform element-wise math operations on GPU DataFrame columns
• CUDA sort, join, groupby, reduction, etc. operations on GPU DataFrames

cuDF is… (Python Library)
• A Python library for manipulating GPU DataFrames following the Pandas API
• Python interface to CUDA C++ library with additional functionality
• Create GPU DataFrames from Numpy arrays, Pandas DataFrames, and PyArrow Tables
• JIT compilation of User-Defined Functions (UDFs) using Numba

BENCHMARKS: Single-GPU Speedup vs Pandas

Environment
• cuDF v0.13
• Pandas v0.25.3
• GPU: NVIDIA Tesla V100 32GB
• CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz

Benchmark Setup
• DataFrames: 2x int32 key columns, 3x int32 value columns
• Inner Merge
• GroupBy: count, sum, min, max calculated for each value column

GPU speedup over CPU, by number of rows:
Operation   10M rows   100M rows
Merge       500x       970x
Sort        240x       360x
GroupBy     220x       290x

BENCHMARKS: Continuous Improvement

GPU speedup over CPU (Pandas), cuDF v0.10 vs cuDF v0.13, by number of rows:
Operation   v0.10 (10M)   v0.10 (100M)   v0.13 (10M)   v0.13 (100M)
Merge       430x          870x           500x          970x
Sort        140x          300x           240x          360x
GroupBy     28x           150x           220x          290x

LOADING DATA INTO A GPU DATAFRAME: cuDF code examples

• Create an empty DataFrame, and add a column
• Create a DataFrame with two columns
• Load a CSV file into a GPU DataFrame
• Use Pandas to load a CSV file, and copy its content into a GPU DataFrame
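The code for these four steps was not captured with the slide, so here is a sketch assuming cuDF mirrors the Pandas API for these calls. It imports `cudf` when RAPIDS is available and otherwise falls back to `pandas`, so it also runs on a CPU-only machine; the file contents and column names are made up for illustration.

```python
import os
import tempfile

import pandas as pd

try:
    import cudf as xdf  # GPU DataFrames (requires RAPIDS and an NVIDIA GPU)
except ImportError:
    xdf = pd  # CPU fallback: Pandas offers the same calls used below

# Create an empty DataFrame, and add a column
df = xdf.DataFrame()
df["key"] = [0, 1, 2, 3, 4]

# Create a DataFrame with two columns
df2 = xdf.DataFrame({"a": list(range(20)), "b": list(reversed(range(20)))})

# Write a small illustrative CSV file to disk
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("id,value\n1,10.5\n2,20.0\n3,30.25\n")
    path = f.name

try:
    # Load a CSV file directly into a (GPU) DataFrame
    df3 = xdf.read_csv(path)
    # Use Pandas to load the same file, then copy its content into a GPU DataFrame
    pdf = pd.read_csv(path)
    df4 = xdf.from_pandas(pdf) if hasattr(xdf, "from_pandas") else pdf.copy()
finally:
    os.remove(path)

print(len(df), len(df2), len(df3), len(df4))  # 5 20 3 3
```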

WORKING WITH GPU DATAFRAMES: cuDF code examples

• Return the first three rows as a new DataFrame
• Row slicing with column selection
• Find the mean and standard deviation of a column
• Count number of occurrences per value, and number of unique values
• Transform column values with a custom function
• Change the data type of a column
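A matching sketch for these operations, under the same assumptions (cuDF if available, else Pandas; illustrative column names). The custom-function step is written as a vectorized expression rather than a UDF, so it behaves the same in both libraries.

```python
try:
    import cudf as xdf  # GPU DataFrames (requires RAPIDS and an NVIDIA GPU)
except ImportError:
    import pandas as xdf  # CPU fallback with the same API for these calls

df = xdf.DataFrame({
    "a": list(range(20)),
    "b": list(reversed(range(20))),
    "c": [i % 3 for i in range(20)],
})

first_three = df.head(3)              # first three rows as a new DataFrame
window = df.loc[2:5, ["a", "c"]]      # label-based row slice (end inclusive) + column selection
mean_a = df["a"].mean()               # mean of a column
std_a = df["a"].std()                 # sample standard deviation (ddof=1)
counts = df["c"].value_counts()       # number of occurrences per value
n_unique = df["c"].nunique()          # number of distinct values
df["a_squared"] = df["a"] ** 2        # transform values (vectorized, no UDF)
df["b"] = df["b"].astype("float64")   # change the data type of a column

print(len(first_three), len(window), mean_a, n_unique)
```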

QUERY, SORT, GROUP, JOIN, …: cuDF code examples

• Query a DataFrame with a boolean expression
• Return the first ‘n’ rows ordered by ‘columns’
• Sort a column by its values
• One-hot encoding
• Group by column with aggregate function
• Join and merge DataFrames
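The same pattern covers querying, ordering, one-hot encoding, grouping and joining. Again this assumes cuDF's Pandas-style API with a Pandas fallback, and the data is hypothetical.

```python
try:
    import cudf as xdf  # GPU DataFrames (requires RAPIDS and an NVIDIA GPU)
except ImportError:
    import pandas as xdf  # CPU fallback with the same API for these calls

df = xdf.DataFrame({
    "key": ["x", "y", "x", "z", "y", "x"],
    "val": [5, 1, 7, 3, 9, 2],
})
names = xdf.DataFrame({"key": ["x", "y"], "label": ["ex", "why"]})

big = df.query("val > 4")                        # boolean-expression query
top2 = df.nlargest(2, "val")                     # first n rows ordered by 'val' (descending)
ordered = df.sort_values("val")                  # sort by column values (ascending)
onehot = xdf.get_dummies(df, columns=["key"])    # one-hot encode 'key'
sums = df.groupby("key").agg({"val": "sum"})     # group by column with an aggregate
joined = df.merge(names, on="key", how="inner")  # SQL-style inner join

print(len(big), len(joined))  # 3 5
```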

cuML

ML Technology Stack

[Figure: layered stack from Python (cuDF, Dask cuDF, Dask cuML, alongside Numpy) through Cython, cuML Algorithms and cuML Prims down to CUDA Libraries (Thrust, CUB, cuSolver, nvGraph, CUTLASS, cuSparse, cuRand, cuBlas) and CUDA]

CPU vs GPU: PRINCIPAL COMPONENT ANALYSIS (PCA)

Training results: CPU 57.1 seconds vs GPU 4.28 seconds (about 13x faster)

System: AWS p3.8xlarge
• CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 32 vCPU cores, 244 GB RAM
• GPU: Tesla V100 SXM2 16GB

The CPU and GPU versions share the data loading, algorithm parameters and model training code. They differ only in importing the CPU vs the GPU algorithm and, on the GPU side, moving the DataFrame from Pandas to the GPU.
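The swap described above can be sketched as follows, assuming `cuml.decomposition.PCA` and scikit-learn's `sklearn.decomposition.PCA` accept the same `n_components` argument and `fit_transform` call. The synthetic NumPy data is an illustration, not the benchmark dataset; because the input is a NumPy array, the Pandas-to-GPU DataFrame step is skipped.

```python
import numpy as np

try:
    from cuml.decomposition import PCA  # Specific: GPU algorithm (RAPIDS cuML)
except ImportError:
    from sklearn.decomposition import PCA  # Specific: CPU algorithm (scikit-learn)

# Common: data loading and algorithm parameters (synthetic data for illustration)
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10)).astype(np.float32)
X[:, 1] = 3.0 * X[:, 0] + 0.1 * X[:, 1]  # make two columns strongly correlated

# Common: model training
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (1000, 2)
```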

cuML roadmap: March 2020 – RAPIDS 0.13

cuML | Single-GPU | Multi-GPU | Multi-Node Multi-GPU

Gradient Boosted Decision Trees

Linear Regression

Logistic Regression

Random Forest

K-Means

K-NN

DBSCAN

UMAP

ARIMA & Holt-Winters

Kalman Filter

t-SNE

Principal Components

Singular Value Decomposition

SVM

cuML roadmap: 2020 – RAPIDS 1.0

cuML | Single-GPU | Multi-GPU | Multi-Node Multi-GPU

Gradient Boosted Decision Trees

Linear Regression

Logistic Regression

Random Forest

K-Means

K-NN

DBSCAN

UMAP

ARIMA & Holt-Winters

Kalman Filter

t-SNE

Principal Components

Singular Value Decomposition

SVM

BENCHMARKS

Benchmark: 200GB CSV dataset; data preparation includes joins, variable transformations.
CPU Cluster Configuration: CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark
DGX Cluster Configuration: 5x DGX-1 on InfiniBand network

Time in seconds (shorter is better). End-to-End combines cuDF (Load and Data Preparation), Data Conversion and XGBoost.

System          cuDF: Load and Data Prep   cuML: XGBoost   End-to-End
20 CPU Nodes    2,741                      2,290           8,762
30 CPU Nodes    1,675                      1,956           6,148
50 CPU Nodes    715                        1,999           3,925
100 CPU Nodes   379                        1,948           3,221
DGX-2           37                         147             209
5x DGX-1        17                         137             164
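Assuming each End-to-End bar is the sum of the three legend components (load and data prep, data conversion, XGBoost), the data-conversion share can be recovered by subtraction, and the end-to-end speedups follow directly from the chart values:

```python
# (cuDF load + data prep, cuML XGBoost, end-to-end) in seconds, as read from the chart
results = {
    "20 CPU nodes": (2741, 2290, 8762),
    "30 CPU nodes": (1675, 1956, 6148),
    "50 CPU nodes": (715, 1999, 3925),
    "100 CPU nodes": (379, 1948, 3221),
    "DGX-2": (37, 147, 209),
    "5x DGX-1": (17, 137, 164),
}

baseline = results["20 CPU nodes"][2]

# Assumption: end-to-end = load/prep + data conversion + XGBoost, so conversion is the remainder
conversion = {name: total - prep - xgb for name, (prep, xgb, total) in results.items()}
speedup = {name: baseline / total for name, (_, _, total) in results.items()}

for name in results:
    print(f"{name}: conversion ~{conversion[name]} s, {speedup[name]:.1f}x vs 20 CPU nodes")
```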

cuGraph

Graph Technology Stack

[Figure: layered stack from Python (cuDF, Dask cuDF, Dask cuML, alongside Numpy) through Cython, cuGraph Algorithms, Prims, cuGraphBLAS, cuHornet and Gunrock* down to CUDA Libraries (Thrust, CUB, cuSolver, cuSparse, cuRand) and CUDA]

* Gunrock is from UC Davis

GOALS AND BENEFITS OF CUGRAPH: Focus on Features and User Experience

Seamless Integration with cuDF & cuML
• Property Graph support via DataFrames

Breakthrough Performance
• Up to 500 million edges on a single 32GB GPU
• Multi-GPU support for scaling into the billions of edges

Multiple APIs
• Python: Familiar NetworkX-like API
• C/C++: lower-level granular control for application developers

Growing Functionality
• Extensive collection of algorithm, primitive, and utility functions

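Louvain Single Run

Returns a cudf.DataFrame with two named columns, plus the modularity score:

- louvain["vertex"]: The vertex id.

- louvain["partition"]: The assigned partition.

import cugraph
# gdf: a cudf.DataFrame edge list with columns "src_0", "dst_0" and edge weights in "data"
G = cugraph.Graph()
G.from_cudf_edgelist(gdf, source="src_0", destination="dst_0", edge_attr="data")
louvain, modularity = cugraph.louvain(G)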

BENCHMARK: Speedup vs SciPy PageRank and cyLouvain
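For reference, the PageRank score being benchmarked can be computed by power iteration: each node keeps a (1 - alpha)/n teleport share and receives alpha times each in-neighbour's rank divided by that neighbour's out-degree, with the rank of dangling nodes spread evenly. A small pure-Python sketch, assuming the common damping factor alpha = 0.85; it is not cuGraph's or SciPy's implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def pagerank(edges: Iterable[Tuple[Hashable, Hashable]], alpha: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> Dict[Hashable, float]:
    """Power-iteration PageRank on a directed, unweighted edge list."""
    out_links: Dict[Hashable, List[Hashable]] = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        out_links[src].append(dst)
        nodes.update((src, dst))
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for src, targets in out_links.items():
            share = alpha * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


ranks = pagerank([(1, 0), (2, 0), (3, 0), (0, 1)])
print(max(ranks, key=ranks.get))  # 0: every other node links to it
```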

cuGraph roadmap: March 2020 – RAPIDS 0.13

cuGraph | Single-GPU | Multi-GPU | Multi-Node Multi-GPU

PageRank

Personalized PageRank

Katz

Betweenness Centrality

Spectral Clustering

Louvain

Ensemble Clustering for Graphs

K-Core

K-Truss

Triangle Counting

Connected Components (Weak and Strong)

Jaccard

Overlap Coefficient

Single Source Shortest Path (SSSP)

Breadth First Search (BFS)

cuGraph roadmap: 2020 – RAPIDS 1.0

cuGraph | Single-GPU | Multi-GPU | Multi-Node Multi-GPU

PageRank

Personalized PageRank

Katz

Betweenness Centrality

Spectral Clustering

Louvain

Ensemble Clustering for Graphs

K-Core

K-Truss

Triangle Counting

Connected Components (Weak and Strong)

Jaccard

Overlap Coefficient

Single Source Shortest Path (SSSP)

Breadth First Search (BFS)

HOW TO START

On-premises | In the cloud

• Source code on GitHub: https://github.com/rapidsai
• Containers on NGC & Docker Hub: https://ngc.nvidia.com
• Conda packages: https://anaconda.org/rapidsai

Requirements: Pascal architecture or better; CUDA 9.2, 10.0 or 10.1.2; Ubuntu 16.04/18.04, CentOS 7 & RHEL 7

https://rapids.ai/start.html

LEARN MORE

http://www.rapids.ai/

ALISON B LOWNDES · AI DevRel | EMEA

https://fortune.com/longform/ai-artificial-intelligence-big-tech-microsoft-alphabet-openai/

Robust.ai (http://www.robust.ai/): Gary Marcus, Rodney Brooks, Steven Pinker et al.

EVOLUTIONARY META-LEARNING: https://arxiv.org/abs/2003.01239

Brain Computer Interfaces: focused on treatment for disease and dysfunction (e.g. epilepsy, depression, Parkinson’s), but ultimately on advancing human intelligence by restoring and extending cognitive vibrancy.

“We’re either going to have to merge with AI or be left behind” (Elon Musk)

ISAAC PLATFORM FOR ROBOTICS: NVIDIA's multi-tool for robotics

DESIGN · SIMULATE · TRAIN · DEPLOY (JETSON XAVIER)

CARTER: THE NAVIGATION ROBOT

High-end platform for logistics applications
• Navigation Stack
• 3D Obstacle Detection & Avoidance
• NVIDIA Jetson Xavier

GTC Digital talk: S21182

ISAAC 2020.1

“High-fidelity simulation lets us train and test algorithms more effectively, leading to more robust and adaptive networks” (A. Anandkumar, Prof. CS, Caltech & NVIDIA)

THE JETSON FAMILY: for AI at the Edge and Autonomous System designs

Same software. Full specs at developer.nvidia.com/jetson

• JETSON NANO: 0.5 TFLOPS (FP16); 5-10W; 45mm x 70mm
• JETSON TX2 series: 1.3 TFLOPS (FP16); 7.5-15W*; 50mm x 87mm
• JETSON Xavier NX: 6 TFLOPS (FP16), 21 TOPS (INT8); 10-15W; 45mm x 70mm
• JETSON AGX XAVIER series: 11 TFLOPS (FP16), 32 TOPS (INT8); 10-30W; 100mm x 87mm

* TX2i: 10-20W

Segments: Entry · Mainstream · Autonomous machines

NVIDIA Jetson Xavier NX: 21 TOPS (INT8) at 15W · 8GB LPDDR4x · 16GB eMMC · Supports up to 32x 1080p IP cameras · 70x45mm module · Developer kit
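Dividing each module's peak FP16 throughput by the upper end of its power range gives a crude efficiency comparison across the family. This is only a back-of-the-envelope proxy (peak numbers at maximum power mode), not a measured result.

```python
# (peak FP16 TFLOPS, upper end of the power range in watts), from the Jetson family list
modules = {
    "Jetson Nano": (0.5, 10),
    "Jetson TX2 series": (1.3, 15),
    "Jetson Xavier NX": (6.0, 15),
    "Jetson AGX Xavier series": (11.0, 30),
}

efficiency = {name: tflops / watts for name, (tflops, watts) in modules.items()}

for name, tflops_per_watt in sorted(efficiency.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {tflops_per_watt:.3f} TFLOPS/W")
```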

JETPACK SDK FOR AI @ THE EDGE (DEVELOPER.NVIDIA.COM/EMBEDDED-COMPUTING)

• Deep Learning: TensorRT, cuDNN
• Computer Vision: VisionWorks, OpenCV
• Graphics: Vulkan, OpenGL
• Media: libargus, Video API
• Multimedia API
• CUDA, Linux4Tegra, ROS
• Nsight Developer Tools
• Sample Code

Runs on the Jetson Embedded Supercomputer: Advanced GPU, 64-bit CPU, Video CODEC, VIC, ISP

JETSON - START NOW

• DEEP LEARNING INSTITUTE: Training Labs, Nanodegrees (nvidia.com/DLI)
• TWO DAYS TO A DEMO: Create your first demo today (developer.nvidia.com/embedded/twodaystoademo)
• JETSON DEVELOPER KIT: AGX Xavier Developer Kit $699; Xavier NX software patch (developer.nvidia.com/buy-jetson)
• GTC: Largest event for GPU developers (gputechconf.com)

JARVIS: Framework for Multimodal Conversational AI services

End-to-End Multimodal Conversational AI Services
• Pre-trained SOTA models: 100,000 Hours of DGX
• Retrain with NeMo
• Interactive Response: 150ms on A100 versus 25sec on CPU
• Deploy Services with One Line of Code

[Figure: video and audio in; PRE-TRAINED MODELS (NVIDIA GPU CLOUD) → RETRAIN (NVIDIA AI TOOLKIT: Transfer Learning, NeMo) → Service Maker → TRITON INFERENCE SERVER; skills: Speech, Vision, NLU, Dialog Manager, Chatbot, Multi-Speaker Transcription, Look to Talk, Gesture Recognition]

Sign up for EA: developer.nvidia.com/nvidia-jarvis

Enabling Resilience & Monitoring of Advanced Deployments

Cloud Native Deployment Approach: NVIDIA EGX Stack, GPU Operator
• Package manager for Kubernetes: Easily configure, deploy and update applications on Kubernetes
• Container Orchestration: Automated container deployment including self-healing

PURPOSE-BUILT AI SUPERCOMPUTERS

AI WORKSTATION | AI DATA CENTER
• DGX Station: The Personal AI Supercomputer
• DGX-1: The Essential Instrument for AI Research
• DGX-2: The World’s Most Powerful AI System for the Most Complex AI Challenges

NGC DL SOFTWARE STACK: Universal SW for Deep Learning · Predictable execution across platforms · Pervasive reach

NEW NGC FEATURES

• SDKs & CONTAINERS FOR A100 (Q2)
• NGC PRIVATE REGISTRY (Now)
• NGC-READY SYSTEMS FOR A100 (Q2)

DL - TF, PyT, MxNet, Triton…
HPC – NAMD, Chroma, LAMMPS…
Industry SDKs – Jarvis, Aerial…
Multi-arch support - x86, Arm, POWER
Easily grant and manage content access
Container scanning and signing. Model versioning and encryption
Securely share and collaborate

Secure and Accelerate End to End AI Workflows: NGC AI Model and Security Enhancements

• PRE-TRAINED MODELS
• AI Toolkits & SDKs: Transfer Learning, Federated Learning, NeMo, Conversational AI, TensorRT Optimizer, Service Maker
• TRAINING & REFINING
• NGC Catalog, Private Registry: Container Signing, Model Encryption, Model Versioning, Security Scanning, Access Control
• Deploy · Secure · Manage: Remote EGX Systems

NVIDIA CLARA FIGHTING COVID: Testing | Treating | Tracking

• Clara Guardian: Video, Vision, Voice; 18+ Global Partners
• First DGX A100 for COVID: Argonne National Lab; Blueprint for Pharmas
• Accelerated Genomics: Minutes & Hours vs. Weeks & Months; Epidemiology to Infected Population
• AI Models for COVID in CT: 2 Pre-trained Models; NVIDIA Clara Imaging in NGC

FIVE ROADS TO GPU COMPUTING

• GPU Libraries: Drop-in replacement for existing libraries (cuBLAS, CUDA Math, cuSPARSE, cuRAND, cuSOLVER, nvGRAPH, cuDNN, cuFFT, Thrust)
• OpenACC: Directives (pragmas in C / C++, comments in Fortran); single source code parallelization for multiple architectures
• CUDA: Parallel programming model for GPUs in C, C++, Fortran, Python, MATLAB; specialized kernels for general purpose GPU computing
• RAPIDS: GPU acceleration of traditional machine learning; accelerate Scikit-Learn style ML algorithms
• DEEP LEARNING: GPU accelerated deep learning frameworks (TensorFlow, PyTorch); build GPU-accelerated functions directly from data

developer.nvidia.com

RICH CONTENT PORTFOLIO: Fundamentals and advanced hands-on training in key technologies and application domains

• AI for Digital Content Creation
• Deep Learning Fundamentals
• AI for Healthcare
• AI for Autonomous Vehicles
• AI for Intelligent Video Analytics
• Accelerated Computing Fundamentals
• AI for Robotics
• AI for Predictive Maintenance
• Accelerated Data Science Fundamentals
• Intro to AI in the Data Center
• AI for Anomaly Detection
• AI for Industrial Inspection

DLI UNIVERSITY TRAINING

UNIVERSITY AMBASSADOR PROGRAM
• Qualified faculty and researchers can get certified to teach DLI workshops to their students at no cost.
• Hundreds of universities certified around the world, including:

TEACHING KITS
• Qualified university educators can download courseware across deep learning, accelerated computing, and robotics.
• Kits include lecture materials, GPU cloud resources, access to self-paced DLI courses, and more.

Learn more at www.nvidia.com/dli

https://developer.nvidia.com/teaching-kits

https://blogs.nvidia.com/blog/2019/11/20/nvidia-microsoft-aid-ai-startups/

THANK YOU

