TAIPEI | SEP. 21-22, 2016
Marc Hamilton, VP Solutions Architecture & Engineering
AI, A NEW COMPUTING MODEL
GPU COMPUTING
NVIDIA: Computing for the Most Demanding Users
Computing Human Imagination
Computing Human Intelligence
DEEP LEARNING: A NEW COMPUTING MODEL
“Software that writes software”
A learning algorithm, trained on examples, produces a network that captions a new photo: “little girl is eating piece of cake”
“millions of trillions of FLOPS”
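To make “software that writes software” concrete: a learning algorithm is handed examples of the desired behavior and adjusts parameters until the program produces it, rather than a human coding the rules. A toy sketch of that loop in plain NumPy; the captioning network above applies the same principle with millions of parameters on GPUs:

```python
# Minimal sketch of "software that writes software": instead of hand-coding
# rules, a learning algorithm fits parameters to example data.
import numpy as np

rng = np.random.default_rng(0)

# Training examples: inputs x and desired outputs y = 3x + 1, the behavior
# we want the software to discover rather than be told.
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 1.0 + rng.normal(0, 0.05, size=100)

w, b = 0.0, 0.0            # the "program" is just these parameters
lr = 0.1                   # learning rate

for step in range(500):
    pred = w * x + b
    err = pred - y
    # Gradient of mean squared error with respect to w and b
    w -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)

print(f"learned w={w:.2f}, b={b:.2f}")   # ~3.00 and ~1.00
```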
AI IS EVERYWHERE
“Find where I parked my car”
“Find the bag I just saw in this magazine”
“What movie should I watch next?”
TOUCHING OUR LIVES
Bringing grandmother closer to family by bridging the language barrier
Predicting a sick baby’s vitals like heart rate, blood pressure, survival rate
Enabling the blind to “see” their surroundings and read emotions on faces
FUELING ALL INDUSTRIES
Increasing public safety with smart video surveillance at airports & malls
Providing intelligent services in hotels, banks, and stores
Separating weeds as it harvests, reducing chemical usage by 90%
DEEP LEARNING DEMANDS A NEW CLASS OF HPC
TRAINING: Scalable Performance
Billions of TFLOPS per training run | Years of compute-days on a Xeon CPU | GPU turns years into days
INFERENCING (Data / Users): Throughput + Efficiency
Billions of FLOPS per inference | Seconds for a response on a Xeon CPU | GPU for instant response
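The two workloads differ by roughly twelve orders of magnitude. Reading “billions of TFLOPS per training run” as total floating-point operations is my interpretation of the slide’s shorthand, but it makes the contrast concrete:

```python
# Back-of-envelope reading of the slide's units (an interpretation, not an
# official NVIDIA figure): "billions of TFLOPS per training run" taken as
# total operations, i.e. 1e9 * 1e12 = 1e21 ops per run, versus
# "billions of FLOPS" = ~1e9 ops per inference query.
training_ops_per_run = 1e9 * 1e12     # ~1 zetta-op per training run
inference_ops = 1e9                   # ~1 giga-op per inference query

ratio = training_ops_per_run / inference_ops
print(f"one training run ~ {ratio:.0e} inference queries")  # ~1e+12
```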
BAIDU DEEP SPEECH 2
12K Neurons (2.5x Deep Speech 1) | 100M Parameters (4x Deep Speech 1) | 15 Exaflops (10x Deep Speech 1)
Super-human Accuracy: Word Error Rate DS2 5% | Human 6% | DS1 8%
2 Months on CPU Server | 2 Days on DGX-1
“Deep Speech 2: End-to-End Speech Recognition in English and Mandarin”, 12/2015 | Dataset: LibriSpeech test-clean
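A rough sanity check of the days-versus-months claim from the slide’s own 15-exaflop figure. The 170 TFLOPS FP16 peak comes from the DGX-1 slide below; the sustained-efficiency value is my own assumption, for illustration only:

```python
# Rough check of the DGX-1 figure from the slide's own numbers. Assumption
# (mine): DGX-1 peak of 170 TFLOPS FP16, at ~50% sustained efficiency.
total_ops = 15e18                         # "15 Exaflops" per training run
dgx1_sustained = 170e12 * 0.5             # FLOPS, assumed efficiency

dgx1_days = total_ops / dgx1_sustained / 86400
print(f"DGX-1: ~{dgx1_days:.1f} days")    # ~2 days, matching the slide

# The slide's CPU figure then implies roughly a 30x wall-clock gap:
cpu_days = 2 * 30                         # "2 months" ~ 60 days
print(f"speedup: ~{cpu_days / dgx1_days:.0f}x")
```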
MODERN AI NEEDS A NEW INFERENCE SOLUTION
User Experience: From Seconds to Instant (wait time for text after speech is complete)
“Where is the nearest Szechuan restaurant?”
[Chart: user wait time in seconds, Deep Speech 2 compute plus network: CPU 6 sec and 2.2 sec; Pascal GPU 0.1 sec]
Deep Speech 2 inference performance on a 16-user server | CPU: 170 ms of estimated compute time required for each 100 ms of speech sample | Pascal GPU: 51 ms of compute required for each 100 ms of speech sample
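The footnote’s per-sample figures explain the chart: a CPU that needs 170 ms of compute for every 100 ms of audio runs slower than real time, so unprocessed audio piles up while the user speaks; at 51 ms per 100 ms, a Pascal GPU keeps pace. A minimal sketch of that arithmetic, assuming streaming processing and an example 3-second utterance (my values, not the slide’s):

```python
# Why the CPU keeps the user waiting: compute time per 100 ms of audio
# (from the slide's footnote) versus real time. Utterance length is my
# own example value; streaming (process-as-you-speak) is assumed.
def wait_after_speech(utterance_s, compute_ms_per_100ms):
    rtf = compute_ms_per_100ms / 100.0         # real-time factor
    total_compute = utterance_s * rtf          # seconds of compute needed
    # While speaking, at most `utterance_s` seconds of compute overlap;
    # whatever is left over becomes visible wait time.
    return max(0.0, total_compute - utterance_s)

utterance = 3.0                                 # seconds of speech (example)
print(f"CPU wait: {wait_after_speech(utterance, 170):.1f} s")  # 2.1 s
print(f"GPU wait: {wait_after_speech(utterance, 51):.2f} s")   # 0.00 s
```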
NVIDIA DGX-1: AI Supercomputer-in-a-Box
170 TFLOPS | 8x Tesla P100 16GB | NVLink Hybrid Cube Mesh | 2x Xeon | 8 TB RAID 0 | Quad IB 100 Gbps, Dual 10GbE | 3U | 3200W
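The 170 TFLOPS headline is the FP16 aggregate of the eight P100s; each Tesla P100 (SXM2) peaks at 21.2 TFLOPS in FP16:

```python
# Where "170 TFLOPS" comes from: eight Tesla P100 (SXM2) GPUs at
# 21.2 TFLOPS FP16 peak each (published spec).
p100_fp16_tflops = 21.2
print(f"{8 * p100_fp16_tflops:.0f} TFLOPS")   # ~170
```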
“FIVE MIRACLES”
Pascal Architecture | 16nm FinFET | CoWoS with HBM2 | NVLink | New AI Algorithms
DGX-1: A LEAGUE OF ITS OWN
[Chart: relative training performance, 1X to 16X, on ResNet, Inception v3, AlexNet, VGG, and MSR, for GeForce GTX TITAN X, GeForce GTX 1080, Tesla P100, DIGITS DevBox (4x GeForce GTX TITAN X), Quadro VCA (8x Quadro M6000), and DGX-1 (8x Tesla P100)]
Caffe on DeepMark. GeForce TITAN X and GTX 1080 system: Intel Core i7-5930K @ 3.5 GHz, 64 GB system memory | Tesla P100 (SXM2) system: dual-CPU server, Intel E5-2698 v4 @ 2.2 GHz, 256 GB system memory
DGX STACK: Fully Integrated Deep Learning Platform
Instant productivity: plug-and-play, supports every AI framework
Performance optimized across the entire stack
Always up-to-date via the cloud
Mixed framework environments, containerized
Direct access to NVIDIA experts
DGX: THE ESSENTIAL TOOL OF DEEP LEARNING SCIENTISTS
The platform of AI pioneers
Reduce training time from weeks to days
A 250-node HPC supercomputer in a box
INTRODUCING NVIDIA TensorRT: High Performance Inference Engine
User Experience: Instant Response, 45x Faster with Pascal + TensorRT
Faster, more responsive AI-powered services such as voice recognition and speech translation
Efficient inference on images, video, and other data in hyperscale production data centers
[Chart: inference execution time, VGG-19 at batch size 4: P40 6 ms | P4 11 ms | 1x CPU (14 cores) 260 ms]
Based on VGG-19 from IntelCaffe GitHub: https://github.com/intel/caffe/tree/master/models/mkl2017_vgg_19 | CPU: IntelCaffe, batch size = 4, Intel E5-2690 v4, using Intel MKL 2017 | GPU: Caffe, batch size = 4, using TensorRT internal version
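The headline speedup and the implied per-image throughput follow directly from the chart’s numbers; assigning 6 ms to the P40 and 11 ms to the P4 is my reading of the bars:

```python
# Throughput implied by the chart (VGG-19, batch size 4). The assignment
# of 6 ms to P40 and 11 ms to P4 is my reading of the bars.
batch = 4
for name, latency_ms in [("P40 + TensorRT", 6), ("P4 + TensorRT", 11),
                         ("1x CPU (14 cores)", 260)]:
    ips = batch / (latency_ms / 1000.0)
    print(f"{name:18s} {latency_ms:4d} ms/batch  ~{ips:5.0f} images/s")

print(f"speedup: ~{260 / 6:.0f}x")   # ~43x; the slide rounds to 45x
```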
NVIDIA DEEPSTREAM SDK: Delivering Video Analytics at Scale
Pipeline: Hardware Decode → Preprocess → Inference (TensorRT) → “Boy playing soccer”
Simple, high-performance API for analyzing video
Decode H.264, HEVC, MPEG-2, MPEG-4, VP9
CUDA-optimized resize and scale
TensorRT-optimized inference
[Chart: concurrent video streams analyzed, 0 to 100: 1x Tesla P4 server + DeepStream SDK vs 13x E5-2650 v4 servers]
720p30 decode | IntelCaffe using dual-socket E5-2650 v4 CPU servers, Intel MKL 2017 | Based on GoogLeNet optimized by Intel: https://github.com/intel/caffe/tree/master/models/mkl2017_googlenet_v2
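A sketch of the three-stage pipeline the slide describes, as runnable Python stubs. Every name below is a hypothetical placeholder chosen for illustration, not the actual DeepStream SDK API (which is a C++ library):

```python
# Shape of the DeepStream pipeline on this slide: hardware decode ->
# CUDA preprocess -> TensorRT inference. All names are hypothetical
# stand-ins, NOT the real DeepStream SDK API.

def hw_decode(stream_id):
    """Stand-in for NVDEC hardware decode; yields raw frames."""
    for i in range(3):
        yield f"frame{i}@{stream_id}"

def preprocess(frame):
    """Stand-in for CUDA-optimized resize/scale to network input size."""
    return ("tensor", frame)

def infer(tensor):
    """Stand-in for a TensorRT-optimized network (e.g. GoogLeNet)."""
    return "boy playing soccer"

# One P4 server runs many such pipelines concurrently; inference can
# batch frames across streams to keep the GPU busy.
for stream in ("cam0", "cam1"):
    for frame in hw_decode(stream):
        print(stream, infer(preprocess(frame)))
```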
PIONEERS ADOPTING HPC FOR DEEP LEARNING
“Investments in computer systems — and I think the bleeding-edge of AI, and deep learning specifically, is shifting to HPC — can cut down the time to run an experiment from a week to a day and sometimes even faster.”
Dr. Andrew Ng, Chief Scientist, Baidu
END-TO-END DATA CENTER PRODUCT FAMILY
STRONG-SCALE HPC: Data centers running HPC and DL apps scaling to multiple GPUs | Tesla P100 with NVLink
MIXED-APPS HPC: HPC data centers running a mix of CPU and GPU workloads | Tesla P100 with PCI-E
HYPERSCALE HPC: Hyperscale deployment for deep learning training & inference | Training: Tesla P100; Inference: Tesla P40 & P4
NVIDIA EXPERTISE AT EVERY STEP
Solution Architects: 1:1 support | Network training setup | Network optimization
Deep Learning Institute: Certified expert instructors | Worldwide workshops | Online courses
GTC Conferences: Epicenter of industry leaders | Onsite training | Global reach
Global Network of Partners: NVIDIA Partner Network | OEMs | Startups
NVIDIA DEEP LEARNING PARTNERS
Graph Analytics | Enterprises | Data Management | DL Frameworks | Enterprise DL Services | Core Analytics Tech
MOST PERVASIVE HPC PLATFORM EVER BUILT
ACCESS ANYWHERE | BUY ANYWHERE | LEARN EVERYWHERE
240+ Resellers Worldwide | 1,000 Universities Teaching CUDA | 78 Countries | 300K CUDA Developers
TAIPEI | SEP. 21-22, 2016
THANK YOU