ALISON B LOWNDES · The blueprint for AI power and scale using DGX A100 Infused with the expertise...

ALISON B LOWNDES AI DevRel | EMEA @alisonblowndes June 2020

Page 1:

ALISON B LOWNDES

AI DevRel | EMEA

@alisonblowndes

June 2020

Page 2:

AGENDA

• INTRO TO NVIDIA: Training & deployment
• RAPIDS: Accelerating the data science
• ROBOTICS & SIMULATION: The hardware, the software & the environments
• WRAPUP + Q&A

Page 3:

NVIDIA AI BREAKTHROUGHS IN GRAPHICS

PROJECT SOL: A Showcase for the Power of NVIDIA RTX

MINECRAFT RTX: Real-time Ray Tracing in the World’s Most Popular Game

OMNIVERSE: A Powerful Collaboration Platform for 3D Design

NASA MARS LANDER: Visualizing NASA’s Supercomputer Simulations

Page 4:

“AMPERE” NVIDIA A100

20X Volta

54B XTOR | 826 mm² | TSMC 7N | 40GB Samsung HBM2 | 600 GB/s NVLink

Metric | Peak | Vs Volta
FP32 TRAINING | 312 TFLOPS | 20X
INT8 INFERENCE | 1,248 TOPS | 20X
FP64 HPC | 19.5 TFLOPS | 2.5X
MULTI INSTANCE GPU | 7X GPUs |
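The "Vs Volta" multipliers can be read backwards to see what baseline each one implies. The sketch below is only arithmetic on the figures quoted on this slide; the dictionary keys and the helper name `implied_volta_baseline` are illustrative and not from the deck.

```python
# Peak A100 figures and "vs Volta" multipliers exactly as quoted on the slide.
a100_peaks = {
    "FP32 training (TFLOPS)": (312.0, 20.0),
    "INT8 inference (TOPS)": (1248.0, 20.0),
    "FP64 HPC (TFLOPS)": (19.5, 2.5),
}


def implied_volta_baseline(peak, multiplier):
    """Baseline implied by the slide: A100 peak divided by its multiplier."""
    return peak / multiplier


implied = {
    name: implied_volta_baseline(peak, mult)
    for name, (peak, mult) in a100_peaks.items()
}

for name, value in implied.items():
    print(f"{name}: implied Volta baseline = {value:.1f}")
```

The slide does not state which Volta configuration the multipliers refer to, so the implied baselines are a consistency check rather than a specification.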

Page 5:

25 YEARS OF ACCELERATED COMPUTING

X-FACTOR SPEED UP | FULL STACK | ONE ARCHITECTURE | SYSTEMS

GPU | CPU

Page 6:

25 YEARS OF ACCELERATED COMPUTING

X-FACTOR SPEED UP | FULL STACK | DATA-CENTER SCALE | ONE ARCHITECTURE

GPU | CPU | DPU

Page 7:

Original NVIDIA Campus

NVIDIA Endeavor (2017)

New SaturnV Datacenter (2020)

NVIDIA Voyager (2020)

Page 8:

10.5 MW

45,000 Sq. Ft.

Page 9:

NVIDIA DGX SUPERPOD WITH DGX A100

Unmatched Data Center Scalability, Deployed in Under 3 Weeks

Leadership-class AI infrastructure

The blueprint for AI power and scale using DGX A100

Infused with the expertise of NVIDIA’s AI practitioners

Designed to solve the previously unsolvable

Configurations start at 20 systems

NVIDIA DGX SuperPOD deployed in SATURNV

1,120 A100 GPUs

140 DGX A100 systems

170 Mellanox 200G HDR switches

4 PB of high-performance storage

700 PFLOPS of power to train the previously impossible

nvidia.com/en-us/data-center/dgx-a100/
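The SuperPOD figures above can be cross-checked with a few lines of arithmetic. This is a sketch, not an official sizing formula: the system count and the 700 PFLOPS total come from the slide, while the 8 GPUs per DGX A100 is an assumption brought in from the DGX A100 product specification.

```python
DGX_A100_SYSTEMS = 140   # from the slide
TOTAL_PFLOPS = 700       # from the slide
GPUS_PER_DGX_A100 = 8    # assumption: not stated on the slide

# 140 systems x 8 GPUs should reproduce the quoted GPU count.
total_gpus = DGX_A100_SYSTEMS * GPUS_PER_DGX_A100

# Per-GPU throughput implied by the quoted aggregate (1 PFLOPS = 1000 TFLOPS).
implied_tflops_per_gpu = TOTAL_PFLOPS * 1000 / total_gpus

print(total_gpus)              # 1120
print(implied_tflops_per_gpu)  # 625.0
```

The implied 625 TFLOPS per GPU is roughly double the 312 TFLOPS peak quoted on the A100 slide; the deck does not say which precision or sparsity setting the 700 PFLOPS figure assumes.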

Page 10:

AI Infra on DGX POD: What are customers asking for?

Audiences: System Administrator, Data Scientist/Researcher

• Provisioning: OS provisioning, BMaaS, network assignment
• Cluster CM: SW deployment, updates & upgrades
• Sys Monitoring & Reporting: System usage, health checks & alerting
• Dataset Management: Storage, tagging, & versioning of datasets
• Interactive Notebooks: Notebooks w/ schedulable GPU resources
• Experiment Management: Job & results tracking
• GUI/CLI: Portal/CLI/API for requesting resources
• Model Deployment: Deployment to prod, inference services, etc.
• User Management: Auth, users, teams, & resource restrictions

Page 12:

5 MIRACLES OF A100

• Ampere: World’s Largest 7nm chip (54B XTORS, HBM2)
• 3rd Gen NVLINK and NVSWITCH: Efficient Scaling to Enable Super GPU (2X More Bandwidth)
• 3rd Gen Tensor Cores: Faster, Flexible, Easier to use (20x AI Perf with TF32)
• New Sparsity Acceleration: Harness Sparsity in AI Models (2x AI Performance)
• New Multi-Instance GPU: Optimal utilization with right sized GPU (7x Simultaneous Instances per GPU)

Page 13:

NEW MULTI-INSTANCE GPU (MIG)

Optimize GPU Utilization, Expand Access to More Users with Guaranteed Quality of Service

nvidia.com/en-us/technologies/multi-instance-gpu/

• Up To 7 GPU Instances In a Single A100: Dedicated SM, Memory, L2 cache, Bandwidth for hardware QoS & isolation
• Simultaneous Workload Execution With Guaranteed Quality Of Service: All MIG instances run in parallel with predictable throughput & latency
• Right Sized GPU Allocation: Different sized MIG instances based on target workloads
• Flexibility to run any type of workload on a MIG instance
• Diverse Deployment Environments: Supported with Bare metal, Docker, Kubernetes, Virtualized Env.

[Diagram: a single A100 split into seven instances, each pairing a GPU slice with its own GPU memory]

https://blogs.nvidia.com/blog/2020/05/14/multi-instance-gpus/
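"Right sized" allocation amounts to packing instance requests into the seven slices of one A100. The sketch below is a deliberately simplified capacity check. The profile names and sizes are assumptions taken from NVIDIA's MIG documentation for the 40 GB A100, not from this deck, and real MIG placement has additional rules that this check ignores. `fits_on_one_a100` is a hypothetical helper.

```python
# Assumed profile table for a 40 GB A100: name -> (compute slices, memory in GB).
MIG_PROFILES = {
    "1g.5gb": (1, 5),
    "2g.10gb": (2, 10),
    "3g.20gb": (3, 20),
    "4g.20gb": (4, 20),
    "7g.40gb": (7, 40),
}
MAX_COMPUTE_SLICES = 7   # "up to 7 GPU instances in a single A100"
MAX_MEMORY_GB = 40


def fits_on_one_a100(requested_profiles):
    """Simplified check: total compute slices and memory must fit on one GPU.

    Real MIG placement is stricter than this sum-based test.
    """
    compute = sum(MIG_PROFILES[name][0] for name in requested_profiles)
    memory = sum(MIG_PROFILES[name][1] for name in requested_profiles)
    return compute <= MAX_COMPUTE_SLICES and memory <= MAX_MEMORY_GB


print(fits_on_one_a100(["1g.5gb"] * 7))          # seven small instances
print(fits_on_one_a100(["3g.20gb", "4g.20gb"]))  # two larger instances
print(fits_on_one_a100(["4g.20gb", "4g.20gb"]))  # needs 8 compute slices
```

A scheduler built on this idea would still need to query the driver for the placements it actually accepts.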

Page 14:

BECAUSE MODEL DEVELOPMENT IS JUST THE FIRST STEP

Develop and Test Locally: weeks with one data scientist or developer

Production: months with a large team of developers, scientists, data engineers and DevOps

• Package: Dependencies, Parameters, Run scripts, Build
• Scale-out: Load-balance, Data partitions, Model distribution, AutoML
• Tune: Parallelism, GPU support, Query tuning, Caching
• Instrument: Monitoring, Logging, Versioning, Security
• Automate: CI/CD, Workflows, Rolling upgrades, A/B testing

Page 20:

AI IS NOT MAGIC

Page 21:

Definitions

Page 22:

BUILDING AN AI MODEL

[Flow diagram labels: DATA, DATA ANALYTICS, FEATURES, MACHINE LEARNING, AI MODEL, MODEL VALIDATION, DEPLOYMENT, NEW DATA]

Page 23:

BUILDING AN AI PRODUCT

[Flow diagram labels: SENSORS, PERCEIVE, REASON, PLAN, ACTUATORS, DATA, DATA ANALYTICS, MACHINE LEARNING, AI MODEL, AI MODEL VALIDATION]

Page 24:

DAY IN THE LIFE OF A DATA SCIENTIST

[Two clock-face diagrams; legend: Dataset Collection, Analysis, Data Prep, Train, Inference]

GPU POWERED WORKFLOW

• Dataset Downloads Overnight
• Start
• GET A COFFEE
• Train Model
• Validate
• Test Model
• Experiment with Optimizations and Repeat
• Go Home on Time

CPU POWERED WORKFLOW

• Dataset Downloads Overnight
• Configure Data Prep Workflow
• GET A COFFEE
• Start Data Prep Workflow
• GET A COFFEE
• Find Unexpected Null Values Stored as String…
• Restart Data Prep Workflow
• ANOTHER…
• @*#! Forgot to Add a Feature
• Restart Data Prep Workflow Again
• Switch to Decaf
• Stay Late

Page 25:

NVIDIA Nsight Systems

System Wide Profiling Tool

• Balance your workload across multiple CPUs and GPUs
• Locate idle CPU and GPU time
• Locate redundant synchronizations
• Locate optimization opportunities
• Improve application’s performance

Page 26:

Processes and threads

CUDA and OpenGL API trace

Multi-GPU

Kernel and memory transfer activities

cuDNN and cuBLAS trace

Thread/core migration

Thread state

Page 27:

https://arxiv.org/pdf/1909.13371.pdf

Page 28:

IMAGE BASED DL IS EASY

Object detection | Semantic Segmentation

Figures copyright Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, 2015. [Faster R-CNN]

Figures copyright Preferred Networks Inc., 2016.

Page 29:

3D DL IS EXCITING

Numerous applications

Simulation | Medical imaging | Autonomous driving

Manipulation | Robotics | Augmented reality

Page 30:

KAOLIN

- A PyTorch library for 3D DL
- Supports a wide range of 3D data representations
- Convenient dataloading/preprocessing/conversions
- Large collection of 3D neural nets to choose from
- Optimized implementations
- Omniverse-Kit integration for easy rendering, interactive visualization, and much more.

https://gitlab-master.nvidia.com/Toronto_DL_Lab/kaolin

Page 31:

NOVOGRAD

https://arxiv.org/pdf/1905.11286.pdf

Page 33:

ARTIFICIAL INTELLIGENCE IS DOMAIN SPECIFIC

World | Sense | See, Understand | Automation

• Self-Driving: AI Program, Computer

Page 34:

ARTIFICIAL INTELLIGENCE IS DOMAIN SPECIFIC

World | Sense | See, Understand | Automation

• Self-Driving: AI Program, Computer
• Manufacturing: AI Program, Computer

Page 35:

ARTIFICIAL INTELLIGENCE IS DOMAIN SPECIFIC

World | Sense | See, Understand | Automation

• Self-Driving: AI Program, Computer
• Manufacturing: AI Program, Computer
• Radiology: AI Program, Computer

Page 37:

RAPIDS

GPU POWERED MACHINE LEARNING

Miguel Martínez – Sr. Data Scientist @ NVIDIA

Page 38:

WHAT IS RAPIDS

Page 39:

GPU Accelerated Data Science

RAPIDS is a set of open source software libraries that give you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs.

www.rapids.ai

Page 40:

Open Source Data Science Ecosystem

Familiar Python APIs

Data Preparation | Model Training | Visualization

• Pandas: Analytics
• Scikit-Learn: Machine Learning
• NetworkX: Graph Analytics
• PyTorch, MXNet…: Deep Learning
• Matplotlib/Plotly: Visualization
• Dask

CPU Memory

Page 41:

End-to-End Accelerated GPU Data Science

Data Preparation | Model Training | Visualization

• cuDF: Analytics
• cuML: Machine Learning
• cuGraph: Graph Analytics
• PyTorch, MXNet…: Deep Learning
• cuXFilter <> pyViz: Visualization
• Dask

GPU Memory

Page 42:

cuDF

• GPU-accelerated data preparation and feature engineering

• Python drop-in Pandas replacement

cuML

• GPU-accelerated traditional machine learning libraries

• XGBoost, PCA, Kalman, K-means, k-NN, DBSCAN, tSVD…

cuGraph

• GPU-accelerated graph analytics libraries

cuXfilter

• Web Data Visualization library

• DataFrame kept in GPU-memory throughout the session

Page 43:

LEARNING FROM APACHE ARROW

Without a shared format (Pandas, Spark, Drill, Impala, Parquet, Cassandra, Kudu, HBase, each linked to the others by a "Copy & Convert" step):

• Each system has its own internal memory format
• Similar functionality implemented in multiple projects
• 70-80% computation wasted on serialization & deserialization

With Arrow Memory shared by Pandas, Spark, Drill, Impala, Parquet, Cassandra, Kudu and HBase:

• All systems utilize the same memory format
• Projects can share functionality
• No overhead for cross-system communication

Page 44:

APACHE ARROW

Columnar layout leverages GPU strengths

Emphasis on zero-copy and shallow-copy operations minimizes a core bottleneck

Consistency with CPU version simplifies development and conversion

gdf['session_id']
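A column lookup such as `gdf['session_id']` is cheap because a columnar layout keeps each column in one contiguous typed buffer. The sketch below illustrates that idea with the Python standard library only. It is a stand-in, not the Arrow or cuDF API, and the `rows` data and `duration_s` column are made up for the example.

```python
from array import array

# Row-oriented: one dict per record, so a column's values are scattered.
rows = [
    {"session_id": 101, "duration_s": 12.5},
    {"session_id": 102, "duration_s": 3.0},
    {"session_id": 103, "duration_s": 47.25},
]

# Column-oriented: one contiguous, typed buffer per column.
columns = {
    "session_id": array("q", (r["session_id"] for r in rows)),  # int64
    "duration_s": array("d", (r["duration_s"] for r in rows)),  # float64
}

# Selecting a column is a single lookup; no per-row work is needed.
session_ids = columns["session_id"]

# A memoryview exposes the same buffer without copying it (zero-copy).
view = memoryview(session_ids)
view[0] = 999                      # write through the view...
shared = session_ids[0] == 999     # ...and the column sees the change

print(list(session_ids), view.nbytes, shared)
```

Because the view and the column share memory, handing the column to another consumer costs nothing, which is the bottleneck the slide says zero-copy operations minimize.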

Page 45:

Why OpenUCX?

Bringing Hardware Accelerated Communications to Dask

• TCP sockets are slow!

• UCX provides uniform access to transports:

– TCP, InfiniBand, Shared memory, NVLink

• Alpha Python bindings for UCX (ucx-py)

• Provides best communication performance to Dask, based on available hardware on nodes/cluster

Page 46:

BENCHMARKS

Distributed cuDF Random Merge

Environment

• cuDF v0.11
• UCX-PY 0.11
• Running on NVIDIA DGX-2:
  • GPU NVIDIA Tesla V100 32GB
  • CPU Intel(R) Xeon(R) CPU 8168 @ 2.70GHz

Benchmark Setup

• DataFrames: Left/Right, each with 1x int64 key column and 1x int64 value column
• Inner Merge
• 30% of matching data balanced across each partition

Page 47:

cuDF

Page 48:

GPU-Accelerated ETL

The average data scientist spends 90+% of their time in ETL, as opposed to training models

Page 49:

EXTRACTION IS THE CORNERSTONE

cuDF for Faster Data Loading

• Follow Pandas APIs and provide >10x speedup
  – CSV Reader/Writer
  – Parquet Reader/Writer
  – ORC Reader/Writer
  – JSON Reader
  – Avro Reader
• GPU Direct Storage integration in progress for bypassing PCIe bottlenecks!
• Key is GPU-accelerating both parsing and decompression wherever possible

Page 50:

ETL Technology Stack

• Python: Dask cuDF, cuDF, Pandas
• Cython
• cuDF C++
• CUDA Libraries: Thrust, Cub, Jitify
• CUDA

Page 51:

ETL – THE BACKBONE OF DATA SCIENCE

libcuDF is… (CUDA C++ Library)

• Low level library containing function implementations and C/C++ API
• Importing/exporting Apache Arrow in GPU memory using CUDA IPC
• CUDA kernels to perform element-wise math operations on GPU DataFrame columns
• CUDA sort, join, groupby, reduction, etc. operations on GPU DataFrames

cuDF is… (Python Library)

• A Python library for manipulating GPU DataFrames following the Pandas API
• Python interface to CUDA C++ library with additional functionality
• Create GPU DataFrames from Numpy arrays, Pandas DataFrames, and PyArrow Tables
• JIT compilation of User-Defined Functions (UDFs) using Numba

Page 52:

BENCHMARKS

Single-GPU Speedup vs Pandas

Environment

• cuDF v0.13
• Pandas v0.25.3
• GPU NVIDIA Tesla V100 32GB
• CPU Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz

Benchmark Setup

• DataFrames: 2x int32 key columns, 3x int32 value columns
• Inner Merge
• GroupBy: count, sum, min, max calculated for each value column

[Bar chart: GPU speedup over CPU for Merge, Sort and GroupBy at 10M and 100M rows. Data labels as extracted: 500, 240, 220, 970, 360, 290]

Page 53:

BENCHMARKS

Continuous Improvement

[Two bar charts, cuDF v0.10 and cuDF v0.13: GPU speedup over CPU for Merge, Sort and GroupBy at 10M and 100M rows. Data labels as extracted, first chart: 500, 240, 220, 970, 360, 290; second chart: 430, 140, 28, 870, 300, 150]

Page 54:

LOADING DATA INTO A GPU DATAFRAME

cuDF code examples

• Create an empty DataFrame, and add a column
• Create a DataFrame with two columns
• Load a CSV file into a GPU DataFrame
• Use Pandas to load a CSV file, and copy its content into a GPU DataFrame
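The slide's code listings are not in the transcript, so the following is a minimal sketch of the four captions, not the presenter's original code. It assumes cuDF's pandas-style calls (`cudf.DataFrame`, `cudf.read_csv`, `cudf.DataFrame.from_pandas`). The column names, the temporary `example.csv` file and the fallback to pandas when cuDF is not installed are illustrative assumptions.

```python
import os
import tempfile

import pandas as pd

try:
    import cudf as xdf   # GPU DataFrames when RAPIDS is installed
    ON_GPU = True
except ImportError:
    xdf = pd             # assumption: CPU fallback so the sketch runs anywhere
    ON_GPU = False

# Create an empty DataFrame, and add a column
df = xdf.DataFrame()
df["key"] = [0, 1, 2, 3, 4]

# Create a DataFrame with two columns
df2 = xdf.DataFrame({"key": [0, 1, 2], "value": [10.0, 11.0, 12.0]})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file, written here only to keep the example self-contained
    csv_path = os.path.join(tmp, "example.csv")
    with open(csv_path, "w") as f:
        f.write("key,value\n0,10.0\n1,11.0\n2,12.0\n")

    # Load a CSV file into a GPU DataFrame
    from_csv = xdf.read_csv(csv_path)

    # Use Pandas to load a CSV file, and copy its content into a GPU DataFrame
    pdf = pd.read_csv(csv_path)
    copied = xdf.DataFrame.from_pandas(pdf) if ON_GPU else pdf.copy()

print(len(df), from_csv.shape, copied.shape)
```

Reading directly with `read_csv` avoids the host-side pandas copy, which is why the slide lists it separately from the pandas route.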

Page 55:

WORKING WITH GPU DATAFRAMES

cuDF code examples

• Return the first three rows as a new DataFrame
• Row slicing with column selection
• Find the mean and standard deviation of a column
• Count number of occurrences per value, and number of unique values
• Transform column values with a custom function
• Change the data type of a column
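As with the previous slide, the listings themselves are missing, so this is a hedged sketch of one way to express each caption using calls that pandas and cuDF share. The data, the column names `a` and `b`, and the pandas fallback are assumptions for illustration. Older cuDF releases may spell the custom-function step differently from `Series.apply`.

```python
try:
    import cudf as xdf   # GPU DataFrames when RAPIDS is installed
except ImportError:
    import pandas as xdf  # assumption: CPU fallback with the same calls

df = xdf.DataFrame({
    "a": [1, 2, 3, 4, 4],
    "b": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Return the first three rows as a new DataFrame
first_three = df.head(3)

# Row slicing with column selection (label-based, end label included)
sliced = df.loc[1:3, ["a"]]

# Find the mean and standard deviation of a column
mean_b = df["b"].mean()
std_b = df["b"].std()

# Count number of occurrences per value, and number of unique values
counts = df["a"].value_counts()
n_unique = df["a"].nunique()

# Transform column values with a custom function
plus_ten = df["a"].apply(lambda x: x + 10)

# Change the data type of a column
as_float = df["a"].astype("float32")

print(len(first_three), len(sliced), n_unique, as_float.dtype)
```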

Page 56:

QUERY, SORT, GROUP, JOIN, …

cuDF code examples

• Query a DataFrame with a boolean expression
• Return the first ‘n’ rows ordered by ‘columns’
• Sort a column by its values
• One-hot encoding
• Group by column with aggregate function
• Join and merge DataFrames
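The six captions above map onto familiar pandas-style calls. The sketch below is an illustration under assumptions, not the slide's original code: the data and column names are invented, `nlargest` is one reading of "first n rows ordered by columns", and pandas is used as a stand-in when cuDF is not installed.

```python
try:
    import cudf as xdf   # GPU DataFrames when RAPIDS is installed
except ImportError:
    import pandas as xdf  # assumption: CPU fallback with the same calls

df = xdf.DataFrame({"key": [1, 2, 1, 3, 2], "value": [5, 3, 8, 1, 7]})

# Query a DataFrame with a boolean expression
queried = df.query("value > 4")

# Return the first 'n' rows ordered by 'columns' (largest values first)
top2 = df.nlargest(2, "value")

# Sort by a column's values
sorted_df = df.sort_values("value")

# One-hot encoding of the key column
encoded = xdf.get_dummies(df, columns=["key"])

# Group by column with aggregate function
grouped = df.groupby("key").agg({"value": "sum"})

# Join and merge DataFrames
labels = xdf.DataFrame({"key": [1, 2], "label": [100, 200]})
merged = df.merge(labels, on="key", how="inner")

print(len(queried), len(merged), list(encoded.columns))
```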

Page 57:

cuML

Page 58:

ML Technology Stack

• Python: Dask cuML, Dask cuDF, cuDF, Numpy
• Cython
• cuML Algorithms
• cuML Prims
• CUDA Libraries: Thrust, Cub, cuSolver, nvGraph, CUTLASS, cuSparse, cuRand, cuBlas
• CUDA

Page 59:

PRINCIPAL COMPONENT ANALYSIS (PCA)

CPU vs GPU

[Side-by-side code listings, annotated as follows]

• Specific: Import CPU algorithm | Specific: Import GPU algorithm
• Common: Data loading and algo params
• Specific (GPU listing): DataFrame from Pandas to GPU
• Common: Model training

Training results

• CPU: 57.1 seconds
• GPU: 4.28 seconds

System: AWS p3.8xlarge

• CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 32 vCPU cores, 244 GB RAM
• GPU: Tesla V100 SXM2 16GB
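The annotations describe two listings that differ only in the import and in one data-transfer line. The sketch below reconstructs that shape under stated assumptions. It is not the code from the slide: the synthetic data, the feature names and `n_components=2` are invented, and scikit-learn stands in for the CPU listing when cuML is not installed.

```python
import numpy as np
import pandas as pd

try:
    # Specific: import GPU algorithm
    import cudf
    from cuml import PCA
    ON_GPU = True
except ImportError:
    # Specific: import CPU algorithm
    from sklearn.decomposition import PCA
    ON_GPU = False

# Common: data loading and algo params
# (synthetic data here; the slide's dataset is not given in the transcript)
rng = np.random.default_rng(0)
pdf = pd.DataFrame(
    rng.normal(size=(1000, 8)).astype("float32"),
    columns=[f"f{i}" for i in range(8)],
)
n_components = 2

# Specific: DataFrame from Pandas to GPU (skipped on the CPU path)
data = cudf.DataFrame.from_pandas(pdf) if ON_GPU else pdf

# Common: model training
pca = PCA(n_components=n_components)
reduced = pca.fit_transform(data)

print(reduced.shape)
```

Keeping the estimator interface identical is what lets the training lines stay common to both listings.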

Page 60:

cuML roadmap: March 2020 – RAPIDS 0.13

[Table: cuML algorithm | Single-GPU | Multi-GPU | Multi-Node-Multi-GPU; support marks not shown]

Gradient Boosted Decision Trees

Linear Regression

Logistic Regression

Random Forest

K-Means

K-NN

DBSCAN

UMAP

ARIMA & Holt-Winters

Kalman Filter

t-SNE

Principal Components

Singular Value Decomposition

SVM

Page 61:

cuML roadmap: 2020 – RAPIDS 1.0

[Table: cuML algorithm | Single-GPU | Multi-GPU | Multi-Node-Multi-GPU; support marks not shown]

Gradient Boosted Decision Trees

Linear Regression

Logistic Regression

Random Forest

K-Means

K-NN

DBSCAN

UMAP

ARIMA & Holt-Winters

Kalman Filter

t-SNE

Principal Components

Singular Value Decomposition

SVM

Page 62:

BENCHMARKS

Benchmark: 200GB CSV dataset; data preparation includes joins and variable transformations.

CPU Cluster Configuration: CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark

DGX Cluster Configuration: 5x DGX-1 on InfiniBand network

Time in seconds (shorter is better)

Configuration | cuDF – Load and Data Prep | cuML – XGBoost | End-to-End
20 CPU Nodes | 2,741 | 2,290 | 8,762
30 CPU Nodes | 1,675 | 1,956 | 6,148
50 CPU Nodes | 715 | 1,999 | 3,925
100 CPU Nodes | 379 | 1,948 | 3,221
DGX-2 | 37 | 147 | 209
5x DGX-1 | 17 | 137 | 164

End-to-End combines cuDF (Load and Data Preparation), Data Conversion and XGBoost.
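The timings translate into speedup factors with a single division. The sketch below assumes that the series 8762, 6148, 3925, 3221, 209, 164 belongs to the End-to-End chart and follows the category order 20, 30, 50 and 100 CPU nodes, DGX-2, 5x DGX-1. That mapping is inferred from the chart layout, not stated in the text. `speedup` is an illustrative helper.

```python
# End-to-End times in seconds, under the mapping assumption described above.
end_to_end_seconds = {
    "20 CPU Nodes": 8762,
    "30 CPU Nodes": 6148,
    "50 CPU Nodes": 3925,
    "100 CPU Nodes": 3221,
    "DGX-2": 209,
    "5x DGX-1": 164,
}


def speedup(baseline, candidate, timings=end_to_end_seconds):
    """How many times faster `candidate` is than `baseline` (time ratio)."""
    return timings[baseline] / timings[candidate]


for cpu_config in ("20 CPU Nodes", "100 CPU Nodes"):
    for gpu_config in ("DGX-2", "5x DGX-1"):
        ratio = speedup(cpu_config, gpu_config)
        print(f"{gpu_config} vs {cpu_config}: {ratio:.1f}x")
```

Under that assumption a single DGX-2 finishes about 15 times faster than the 100-node CPU cluster.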

Page 63:

cuGraph

Page 64:

Graph Technology Stack

• Python: Dask cuML, Dask cuDF, cuDF, Numpy
• Cython
• cuGraph Algorithms
• Prims: cuGraphBLAS, cuHornet, Gunrock*
• CUDA Libraries: Thrust, Cub, cuSolver, cuSparse, cuRand
• CUDA

* Gunrock is from UC Davis

Page 65:

GOALS AND BENEFITS OF CUGRAPH

Focus on Features and User Experience

Seamless Integration with cuDF & cuML

• Property Graph support via DataFrames

Breakthrough Performance

• Up to 500 million edges on a single 32GB GPU
• Multi-GPU support for scaling into the billions of edges

Multiple APIs

• Python: Familiar NetworkX-like API
• C/C++: lower-level granular control for application developers

Growing Functionality

• Extensive collection of algorithm, primitive, and utility functions

Page 66:

Louvain Single Run

# Calls as shown on the slide; later cuGraph releases expose
# Graph.from_cudf_edgelist and cugraph.louvain instead.
import cugraph

# gdf: a cudf.DataFrame of edges with columns "src_0", "dst_0" and "data" (weights)
G = cugraph.Graph()
G.add_edge_list(gdf["src_0"], gdf["dst_0"], gdf["data"])

# df holds the per-vertex partition; mod is the modularity score
df, mod = cugraph.nvLouvain(G)

Returns:

cudf.DataFrame with two named columns:

- louvain["vertex"]: The vertex id.

- louvain["partition"]: The assigned partition.

Page 67:

BENCHMARK

Speedup vs SciPy PageRank and cyLouvain

Page 68:

cuGraph roadmap: March 2020 – RAPIDS 0.13

[Table: cuGraph algorithm | Single-GPU | Multi-GPU | Multi-Node-Multi-GPU; support marks not shown]

PageRank

Personal Page Rank

Katz

Betweenness Centrality

Spectral Clustering

Louvain

Ensemble Clustering for Graphs

K-Core

K-Truss

Triangle Counting

Connected Components (Weak and Strong)

Jaccard

Overlap Coefficient

Single Source Shortest Path (SSSP)

Breadth First Search (BFS)

Page 69:

cuGraph roadmap: 2020 – RAPIDS 1.0

[Table: cuGraph algorithm | Single-GPU | Multi-GPU | Multi-Node-Multi-GPU; support marks not shown]

PageRank

Personal Page Rank

Katz

Betweenness Centrality

Spectral Clustering

Louvain

Ensemble Clustering for Graphs

K-Core

K-Truss

Triangle Counting

Connected Components (Weak and Strong)

Jaccard

Overlap Coefficient

Single Source Shortest Path (SSSP)

Breadth First Search (BFS)

Page 70:

HOW TO START

Page 71:

On-premises | In the cloud

• Source code on GitHub: https://github.com/rapidsai
• Containers on NGC & Docker Hub: https://ngc.nvidia.com
• Conda packages: https://anaconda.org/rapidsai

Requirements: Pascal architecture or better; CUDA 9.2, 10.0 or 10.1.2; Ubuntu 16.04/18.04, CentOS 7 & RHEL 7

Page 72:

https://rapids.ai/start.html

Page 73:

https://rapids.ai/start.html

Page 74:

LEARN MORE

Page 75:

www.rapids.ai

Page 77:

ALISON B LOWNDES

AI DevRel | EMEA

Page 78:

https://fortune.com/longform/ai-artificial-intelligence-big-tech-microsoft-alphabet-openai/

www.Robust.ai

Gary Marcus, Rodney Brooks, Steven Pinker et al.

EVOLUTIONARY META-LEARNING

https://arxiv.org/abs/2003.01239

Brain Computer Interfaces

Focused on treatment for disease and dysfunction (e.g. epilepsy, depression, Parkinson's), but ultimately aiming to advance human intelligence by restoring and extending cognitive vibrancy.

“We’re either going to have to merge with AI or be left behind.” Elon Musk

ISAAC PLATFORM FOR ROBOTICS

NVIDIA's multi-tool for robotics

DESIGN | SIMULATE | TRAIN | DEPLOY

JETSON XAVIER

CARTER — THE NAVIGATION ROBOT

High-end platform for logistics applications

Navigation Stack

3D Obstacle Detection & Avoidance

NVIDIA Jetson Xavier

GTC Digital talk: S21182

ISAAC 2020.1

“High-fidelity simulation lets us train and test algorithms more effectively, leading to more robust and adaptive networks.” A. Anandkumar, Professor of Computer Science, Caltech & NVIDIA

THE JETSON FAMILY
for AI at the Edge and Autonomous System designs

Module | Peak compute | Power | Size
JETSON NANO | 0.5 TFLOPS (FP16) | 5 - 10W | 45mm x 70mm
JETSON TX2 series | 1.3 TFLOPS (FP16) | 7.5 - 15W* | 50mm x 87mm
JETSON Xavier NX | 6 TFLOPS (FP16), 21 TOPS (INT8) | 10 - 15W | 45mm x 70mm
JETSON AGX XAVIER series | 11 TFLOPS (FP16), 32 TOPS (INT8) | 10 - 30W | 100mm x 87mm

* TX2i: 10-20W

Tiers: Entry, Mainstream, Autonomous machines

Same software. Full specs at developer.nvidia.com/jetson
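
One way to compare the modules on the Jetson family slide is peak compute per watt. The sketch below takes the peak figures and the upper end of each power range from the slide and divides one by the other. Pairing peak throughput with the maximum power mode is an assumption made here for illustration; the slide does not state which power mode produces which figure, so the ratios are rough indicators rather than measured efficiency.

```python
# Peak figures and maximum power taken from the Jetson family slide.
# int8_tops is None where the slide lists no INT8 figure.
JETSON = {
    "Nano": {"fp16_tflops": 0.5, "int8_tops": None, "max_watts": 10},
    "TX2 series": {"fp16_tflops": 1.3, "int8_tops": None, "max_watts": 15},
    "Xavier NX": {"fp16_tflops": 6.0, "int8_tops": 21.0, "max_watts": 15},
    "AGX Xavier series": {"fp16_tflops": 11.0, "int8_tops": 32.0, "max_watts": 30},
}


def per_watt(value, watts):
    """Divide a peak figure by power, passing through missing values."""
    return None if value is None else value / watts


def efficiency_table(modules):
    """Return peak FP16 TFLOPS/W and INT8 TOPS/W for every module."""
    return {
        name: {
            "fp16_tflops_per_watt": per_watt(spec["fp16_tflops"], spec["max_watts"]),
            "int8_tops_per_watt": per_watt(spec["int8_tops"], spec["max_watts"]),
        }
        for name, spec in modules.items()
    }


if __name__ == "__main__":
    for name, row in efficiency_table(JETSON).items():
        fp16 = row["fp16_tflops_per_watt"]
        int8 = row["int8_tops_per_watt"]
        int8_text = "n/a" if int8 is None else f"{int8:.2f} TOPS/W"
        print(f"{name:18s} {fp16:.3f} TFLOPS/W (FP16)  {int8_text} (INT8)")
```

Under this assumption Xavier NX comes out at 1.4 INT8 TOPS per watt against roughly 1.07 for AGX Xavier, which fits its positioning as the lower-power module.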

NVIDIA Jetson Xavier “NX”

21 TOPS (INT8) at 15W
8GB LPDDR4x
16GB eMMC
Supports up to 32x 1080p IP cameras

70x45mm Module

Developer kit

JETPACK SDK FOR AI @ THE EDGE

Sample Code

Nsight Developer Tools

Deep Learning: TensorRT, cuDNN

Computer Vision: VisionWorks, OpenCV

Graphics: Vulkan, OpenGL

Media: libargus, Video API

Multimedia API

CUDA, Linux4Tegra, ROS

Jetson Embedded Supercomputer: Advanced GPU, 64-bit CPU, Video CODEC, VIC, ISP

DEVELOPER.NVIDIA.COM/EMBEDDED-COMPUTING

JETSON - START NOW

DEEP LEARNING INSTITUTE
Training Labs, Nanodegrees
nvidia.com/DLI

TWO DAYS TO A DEMO
Create your first demo today
developer.nvidia.com/embedded/twodaystoademo

JETSON DEVELOPER KIT
AGX Xavier Developer Kit $699
Xavier NX software patch
developer.nvidia.com/buy-jetson

GTC
Largest event for GPU developers
gputechconf.com

JARVIS
Framework for Multimodal Conversational AI services

End-to-End Multimodal Conversational AI Services
Pre-trained SOTA models - 100,000 Hours of DGX
Retrain with NeMo
Interactive Response – 150ms on A100 versus 25sec on CPU
Deploy Services with One Line of Code

Diagram labels:

PRE-TRAINED MODELS: NVIDIA GPU CLOUD

RETRAIN: NVIDIA AI TOOLKIT (Transfer Learning, NeMo, Service Maker)

JARVIS: TRITON INFERENCE SERVER, Dialog Manager; Speech, Vision, NLU

video, audio

Chatbot, Multi-Speaker Transcription, Look to Talk, Gesture Recognition

Sign-up for EA: developer.nvidia.com/nvidia-jarvis
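
The "Interactive Response" figures on the Jarvis slide imply a speedup that is easy to work out. The snippet below is a minimal sketch of that arithmetic, using only the two latencies quoted on the slide; the `speedup` helper is a name chosen here for illustration.

```python
# Latencies quoted on the slide for an interactive response.
A100_LATENCY_S = 0.150   # 150 ms on A100
CPU_LATENCY_S = 25.0     # 25 s on CPU


def speedup(baseline_s, accelerated_s):
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_s <= 0:
        raise ValueError("accelerated latency must be positive")
    return baseline_s / accelerated_s


if __name__ == "__main__":
    print(f"A100 vs CPU: about {speedup(CPU_LATENCY_S, A100_LATENCY_S):.0f}x faster")
```

The ratio is roughly 167x, which is the difference between a reply that feels conversational and one that does not.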

Enabling Resilience & Monitoring of Advanced Deployments

Package manager for Kubernetes: Easily configure, deploy and update applications on Kubernetes

Container Orchestration: Automated container deployment including self-healing

Cloud Native Deployment Approach

NVIDIA EGX Stack

GPU Operator
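
The "self-healing" behaviour mentioned on this slide is commonly described as a reconciliation loop: the orchestrator compares the desired number of containers with what is actually running and corrects the difference. The following is a toy model of that idea in plain Python. It does not use the Kubernetes API, and the `reconcile` function and the dictionary layout are hypothetical, chosen only to illustrate the concept.

```python
def reconcile(desired_replicas, running):
    """Return the container list after one reconciliation pass.

    Toy model only: each container is a dict with an integer "id" and a
    boolean "healthy" flag. Unhealthy containers are dropped, surplus ones
    are trimmed, and fresh ones are started until the desired count is met.
    """
    # Keep only healthy containers, and no more than are wanted.
    kept = [c for c in running if c["healthy"]][:desired_replicas]

    # Start replacements with ids that have not been used before.
    next_id = max((c["id"] for c in running), default=-1) + 1
    while len(kept) < desired_replicas:
        kept.append({"id": next_id, "healthy": True})
        next_id += 1

    return kept


if __name__ == "__main__":
    state = [
        {"id": 0, "healthy": True},
        {"id": 1, "healthy": False},  # crashed container
        {"id": 2, "healthy": True},
    ]
    print(reconcile(3, state))
```

A real orchestrator runs this kind of comparison continuously and also handles scheduling, health probes and rollout strategy, none of which is modelled here.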

PURPOSE-BUILT AI SUPERCOMPUTERS

AI WORKSTATION | AI DATA CENTER

Universal SW for Deep Learning

Predictable execution across platforms

Pervasive reach

NGC DL SOFTWARE STACK

DGX Station: The Personal AI Supercomputer

DGX-1: The Essential Instrument for AI Research

DGX-2: The World’s Most Powerful AI System for the Most Complex AI Challenges

NEW NGC FEATURES

SDKs & CONTAINERS FOR A100 (Q2)

NGC PRIVATE REGISTRY (Now)

NGC-READY SYSTEMS FOR A100 (Q2)

DL - TF, PyT, MxNet, Triton…

HPC – NAMD, Chroma, LAMMPS…

Easily grant and manage content access

Container scanning and signing. Model versioning and encryption

Multi-arch support - x86, Arm, POWER

Securely share and collaborate

Industry SDKs – Jarvis, Aerial…

Secure and Accelerate End to End AI Workflows
NGC AI Model and Security Enhancements

PRE-TRAINED MODELS

AI Toolkits & SDK’s

Transfer Learning

Federated Learning

NeMo

Conversational AI

TensorRT Optimizer

Service Maker

TRAINING & REFINING

NGC Catalog Private Registry

Container Signing

Model Encryption

Model Versioning

Security Scanning

Access Control

Deploy | Secure | Manage

Remote EGX Systems

NVIDIA CLARA FIGHTING COVID
Testing | Treating | Tracking

Clara Guardian
Video, Vision, Voice
18+ Global Partners

First DGX A100 for COVID
Argonne National Lab
Blueprint for Pharmas

Accelerated Genomics
Minutes & Hours vs. Weeks & Months
Epidemiology to Infected Population

AI Models for COVID in CT
2 Pre-trained Models
NVIDIA Clara Imaging in NGC

FIVE ROADS TO GPU COMPUTING

GPU Libraries
Drop-in replacement for existing libraries
cuBLAS, CUDA Math, cuSPARSE, cuRAND, cuSOLVER, nvGRAPH, cuDNN, cuFFT, Thrust

OpenACC
Comment-based directives in C / C++ / Fortran
Single source code parallelization for multiple architectures

CUDA
Parallel Programming Model for GPUs in C, C++, Fortran, Python, MATLAB
Specialized Kernels for general purpose GPU

RAPIDS
GPU Acceleration of Traditional Machine Learning
Accelerate Scikit-Learn style ML algorithms

DEEP LEARNING
GPU accelerated deep learning frameworks
TensorFlow, PyTorch
Build GPU-accelerated functions directly from data

developer.nvidia.com

RICH CONTENT PORTFOLIO
Fundamentals and advanced hands-on training in key technologies and application domains

Deep Learning Fundamentals
Accelerated Computing Fundamentals
Accelerated Data Science Fundamentals
Intro to AI in the Data Center
AI for Digital Content Creation
AI for Healthcare
AI for Autonomous Vehicles
AI for Intelligent Video Analytics
AI for Robotics
AI for Predictive Maintenance
AI for Anomaly Detection
AI for Industrial Inspection

DLI UNIVERSITY TRAINING

UNIVERSITY AMBASSADOR PROGRAM

• Qualified faculty and researchers can get certified to teach DLI workshops to their students at no cost.

• Hundreds of universities certified around the world, including:

TEACHING KITS

• Qualified university educators can download courseware across deep learning, accelerated computing, and robotics.

• Kits include lecture materials, GPU cloud resources, access to self-paced DLI courses, and more.

Learn more at www.nvidia.com/dli

https://blogs.nvidia.com/blog/2019/11/20/nvidia-microsoft-aid-ai-startups/

THANK YOU

[email protected]