Building Autonomous Machines with JETSON - Inria



NVIDIA TEGRA K1 Mar 2, 2014

NVIDIA Confidential

Francois Courteille Senior Solution Architect, Accelerated Computing

Building Autonomous Machines with JETSON

Machine Learning, Vision, and GPUs


AGENDA

JETSON™ EMBEDDED PLATFORM Deep learning for next-generation intelligent, autonomous machines

Background on NVIDIA

Deep Learning & GPUs

NVIDIA Embedded Platform

Deep Learning & Autonomy


THE WORLD LEADER IN VISUAL COMPUTING

GAMING ENTERPRISE TECHNOLOGY HPC & CLOUD AUTO


CREDENTIALS BUILT OVER TIME

300K CUDA Developers, 4x Growth in 4 years

Majority of HPC Applications are GPU-Accelerated, 410 and Growing

100% of Deep Learning Frameworks are Accelerated

[Chart: number of GPU-accelerated applications per year, 2011 to 2016; values shown: 113, 206, 242, 287, 370, 410]

Segments shown: Academia, Games, Finance, Manufacturing, Internet, Oil & Gas, National Labs, Automotive, Defense, M & E

Frameworks shown: TORCH, THEANO, CAFFE, MATCONVNET, PURINE, MOCHA.JL, MINERVA, MXNET*, BIG SUR, TENSORFLOW, WATSON, CNTK


END-TO-END TESLA PRODUCT FAMILY

HYPERSCALE HPC: Tesla M4, M40. Hyperscale deployment for DL training, inference, video & image processing.

MIXED-APPS HPC: Tesla K80. HPC data centers running a mix of CPU and GPU workloads.

STRONG-SCALING HPC: Tesla P100. Hyperscale & HPC data centers running apps that scale to multiple GPUs.

FULLY INTEGRATED DL SUPERCOMPUTER: DGX-1. For customers who need to get going now with a fully integrated solution.


THE BIG BANG IN MACHINE LEARNING

“ Google’s AI engine also reflects how the world of computer hardware is changing. (It) depends on machines equipped with GPUs… And it depends on these chips more than the larger tech universe realizes.”

DNN GPU BIG DATA


BLISTERING PACE OF INNOVATION FOR AI

Deep Learning Training Performance, Caffe AlexNet

[Chart: speed-up of images/sec vs K40 in 2013, for 2013 to 2016; bars: K40, K80 + cuDNN1, M40 + cuDNN4, P100 + cuDNN5; y-axis from 0x to 70x]

AlexNet training throughput based on 20 iterations. CPU: 1x E5-2680v3 12 Core 2.5GHz, 128GB System Memory, Ubuntu 14.04. M40 bar: 8x M40 GPUs in a node. P100: 8x P100 NVLink-enabled.


HOW FAR HAS AUTONOMY COME?

ImageNet classification accuracy

[Chart: accuracy by year: 2010: 72%, 2011: 74%, 2012: 84%, 2013: 88%, 2014: 93%, 2015: 95.1%; human: 94.9%; annotation: Deep Learning on GPUs]


DEEP LEARNING AND AUTONOMY

Object Classification

Localization/Mapping

Collision Avoidance

3D Reconstruction

Segmentation


DEEP LEARNING FOR AUTONOMOUS VEHICLES


JETSON TX1

GPU: 1024 GFLOPS, 256-core Maxwell
CPU: 4x 64-bit ARM A57 CPUs, 1.6 GHz
Memory: 4 GB LPDDR4, 25.6 GB/s
Video decode: 4K 60Hz, H.264 / H.265
Video encode: 4K 30Hz, H.264 / H.265
CSI: Up to 6 cameras, 1400 Mpix/s
Display: 2x DSI, 1x eDP 1.4, 1x DP 1.2/HDMI
Wi-Fi: 802.11 2x2 ac
Networking: 1 Gigabit Ethernet
PCI-E: Gen 2 1x1 + 1x4
Storage: 16 GB eMMC, SDIO, SATA
Other: 3x UART, 3x SPI, 4x I2C, 4x I2S, GPIOs
Power: 10-15W, 6.6V-19.5VDC
Size: 50mm x 87mm
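A few ratios follow directly from the figures on this slide. The sketch below is only back-of-the-envelope arithmetic on the quoted numbers. The variable names are illustrative, and dividing peak GFLOPS by the module power range assumes the whole 10-15 W budget is drawn at peak throughput, which the slide does not state.

```python
# Figures quoted on the Jetson TX1 slide.
PEAK_GFLOPS = 1024.0        # GPU peak throughput
CUDA_CORES = 256            # Maxwell cores
MEM_BW_GB_PER_S = 25.6      # LPDDR4 bandwidth
POWER_W = (10.0, 15.0)      # module power range (low, high)
CSI_MPIX_PER_S = 1400.0     # aggregate camera throughput
MAX_CAMERAS = 6

# Peak throughput attributable to each core.
gflops_per_core = PEAK_GFLOPS / CUDA_CORES

# Bytes of memory traffic available per floating-point operation at peak.
bytes_per_flop = MEM_BW_GB_PER_S / PEAK_GFLOPS

# ASSUMPTION: the full module power budget corresponds to peak throughput.
gflops_per_watt_best = PEAK_GFLOPS / POWER_W[0]
gflops_per_watt_worst = PEAK_GFLOPS / POWER_W[1]

# Pixel budget per camera if all six CSI cameras share the bandwidth equally.
mpix_per_camera = CSI_MPIX_PER_S / MAX_CAMERAS

print(f"GFLOPS per core:      {gflops_per_core:.1f}")
print(f"Bytes per FLOP:       {bytes_per_flop:.3f}")
print(f"GFLOPS per watt:      {gflops_per_watt_worst:.1f} to {gflops_per_watt_best:.1f}")
print(f"Mpix/s per camera:    {mpix_per_camera:.1f}")
```

The low bytes-per-FLOP ratio suggests that workloads need substantial on-chip data reuse to approach the quoted peak.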


JETSON AT THE LEADING EDGE

Powering the Next Generation of Autonomous Machines


Jetson TX1 Developer Kit

Jetson TX1 Module

Developer Board

5MP Camera

€630 / £450 retail

€350 / £250 academic discount


JETSON SDK: THE DETAILS

CUDA OpenGL

Linux for Tegra

Compute

Jetson TX1

Vision Machine Learning

cuFFT

cuBLAS

cuSolver

cuSPARSE

cuRAND

NPP

Thrust

CUDA Math Library

Graphics

Tools

NVTX NVIDIA Tools eXtension

Source code editor

Debugger

Profiler

System Trace

Vertically Integrated Packages

V4L2

libjpeg


TRAIN, THEN DEPLOY YOUR NEURAL NETWORKS

Programming Approaches

Classified Object!

Trained Deep Neural Net Model

Camera Inputs

Network

Solver

Data Scientist

Training: Server, GRID, DIGITS

Inference: GIE, Embedded (Jetson)
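The train-then-deploy split can be illustrated with a deliberately tiny example. The sketch below is not DIGITS or GIE: it trains a logistic-regression "network" in pure Python (standing in for the server-side training step), serializes the learned parameters, and then runs inference from the serialized model alone (standing in for the embedded side). The names `train`, `export_model`, and `infer`, and the toy data, are all hypothetical.

```python
import json
import math
import random


def train(samples, labels, epochs=200, lr=0.5, seed=0):
    """Training phase: fit weights and bias with stochastic gradient descent."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in samples[0]]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))      # sigmoid activation
            grad = p - y                        # dLoss/dz for cross-entropy loss
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return {"weights": weights, "bias": bias}


def export_model(model):
    """Serialize the trained parameters so they can be shipped to the device."""
    return json.dumps(model)


def infer(serialized_model, x):
    """Inference phase: forward pass only, no training code required."""
    model = json.loads(serialized_model)
    z = sum(w * xi for w, xi in zip(model["weights"], x)) + model["bias"]
    return 1 if z > 0 else 0


# Toy data (hypothetical): class 1 when the first feature dominates.
samples = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.8]]
labels = [1, 1, 0, 0]

blob = export_model(train(samples, labels))      # done once, offline
print(infer(blob, [0.8, 0.1]), infer(blob, [0.2, 0.9]))
```

Only `infer` and the serialized parameters need to exist on the deployment target, which is the point of separating the two phases.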


Your Application (dog breed detector, for example)

CUDNN

CUDA-accelerated Deep Learning library

Supports industry-standard frameworks:

Out-of-the-box speedups of neural networks:

For both Inference and Training:

Jetson TX1 // Tesla/Titan/GRID

CUDA

cuDNN

Frameworks

DIGITS
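Convolution is one of the core neural-network primitives that a library such as cuDNN accelerates on the GPU. As a rough illustration of what that operation computes, the following pure-Python sketch implements a single-channel "valid" 2D convolution (cross-correlation, as deep learning frameworks usually define it). It does not call cuDNN, and `conv2d_valid` is a hypothetical name used only here.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    img_h, img_w = len(image), len(image[0])
    ker_h, ker_w = len(kernel), len(kernel[0])
    out_h, out_w = img_h - ker_h + 1, img_w - ker_w + 1
    output = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            # Multiply-accumulate over the kernel window.
            for di in range(ker_h):
                for dj in range(ker_w):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            output[i][j] = acc
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

result = conv2d_valid(image, kernel)
print(result)
```

Every output element is an independent multiply-accumulate over a small window, which is why the operation maps well onto many GPU cores working in parallel.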


NVIDIA DIGITS

Interactive Deep Learning GPU Training System

Process Data, Configure DNN, Monitor Progress, Visualization


OBJECT DETECTION New in DIGITS 4

ADVANCED DRIVER ASSISTANCE SYSTEMS (ADAS)

REMOTE SENSING

MEDICAL DIAGNOSTICS

INTELLIGENT VIDEO ANALYTICS

developer.nvidia.com/digits


GPU INFERENCE ENGINE (GIE) High-performance deep learning inference for production deployment

developer.nvidia.com/gie

40x more inference perf/watt with GIE

[Chart: GoogLenet images per second per watt, CPU-Only vs Tesla M4 + GIE, at batch sizes 1 and 8; y-axis from -5x to 45x]

CPU-Only vs Tesla M4 + GIE on dual-socket Haswell E5-2698 [email protected] HT on

EMBEDDED: Jetson TX1

DATA CENTER: Tesla M4

AUTOMOTIVE: Drive PX
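The metric on this slide, images per second per watt, is throughput divided by power draw, and the headline figure is a ratio of two such values. A minimal sketch of that calculation follows. The function names are hypothetical and the measurements are placeholder numbers chosen for illustration, not data from the slide.

```python
def images_per_sec_per_watt(images_per_sec, watts):
    """Energy efficiency of inference: throughput normalized by power draw."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return images_per_sec / watts


def efficiency_gain(baseline, accelerated):
    """How many times more efficient the accelerated setup is than the baseline."""
    return accelerated / baseline


# PLACEHOLDER measurements (hypothetical, not taken from the slide).
cpu_efficiency = images_per_sec_per_watt(images_per_sec=100.0, watts=200.0)
gpu_efficiency = images_per_sec_per_watt(images_per_sec=500.0, watts=50.0)

gain = efficiency_gain(cpu_efficiency, gpu_efficiency)
print(f"CPU: {cpu_efficiency} img/s/W, GPU: {gpu_efficiency} img/s/W, gain: {gain}x")
```

Because the metric divides by power, a platform can come out ahead either by raising throughput or by lowering power draw, and batch size typically affects the throughput term.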


END-TO-END DEEP LEARNING FOR SELF-CONTROL

[Diagram labels: Sensory Inputs, Perceptron, RNN, Recognition, Inference, Short-term nav, Long-term nav, Goal/reward function, user application, Motor PWM]
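The slide gives only diagram labels, so the following is a heavily simplified sketch of one path through such a pipeline: sensory inputs feed a single perceptron-style unit whose output becomes a motor PWM command. Everything here is an assumption made for illustration: the function names, the two-sensor layout, the hand-picked (not learned) weights, and the 1000 to 2000 microsecond pulse range, which follows the common hobby-servo convention rather than anything stated in the deck.

```python
import math


def policy(sensors, weights, bias=0.0):
    """Perceptron-style unit: weighted sum of inputs squashed into [-1, 1]."""
    z = sum(w * s for w, s in zip(weights, sensors)) + bias
    return math.tanh(z)


def to_pulse_width_us(command, center_us=1500.0, span_us=500.0):
    """Map a command in [-1, 1] to a PWM pulse width in microseconds.

    ASSUMPTION: hobby-servo style signalling, 1500 us neutral, +/- 500 us range.
    """
    command = max(-1.0, min(1.0, command))   # clamp out-of-range commands
    return center_us + span_us * command


def control_step(sensors, weights):
    """One iteration of the loop: sense, infer, actuate."""
    return to_pulse_width_us(policy(sensors, weights))


# HYPOTHETICAL setup: normalized left/right range sensors, hand-picked weights.
weights = [1.0, -1.0]
for reading in ([0.5, 0.5], [0.9, 0.1], [0.1, 0.9]):
    print(reading, round(control_step(reading, weights), 1))
```

In an end-to-end system the weights would come from training against a goal or reward function rather than being set by hand.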


PC GAMING

ONE ARCHITECTURE — END-TO-END AI

CUDA + Linux throughout the stack.


NVIDIA DGX-1 WORLD’S FIRST DEEP LEARNING SUPERCOMPUTER

170 TFLOPS FP16

8x Tesla P100 16GB

NVLink Hybrid Cube Mesh

Accelerates Major AI Frameworks

Dual Xeon

7 TB SSD Deep Learning Cache

Dual 10GbE, Quad IB 100Gb

3RU – 3200W
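The headline numbers on this slide can be broken down per GPU and per watt. The sketch below uses only the figures quoted above. Treating 3200 W as the power behind the full 170 TFLOPS is an assumption, since that budget also covers CPUs, storage, and networking.

```python
# Figures quoted on the DGX-1 slide.
TOTAL_TFLOPS_FP16 = 170.0
NUM_GPUS = 8
GPU_MEMORY_GB = 16
SYSTEM_POWER_W = 3200.0

# Share of the aggregate FP16 throughput per Tesla P100.
tflops_per_gpu = TOTAL_TFLOPS_FP16 / NUM_GPUS

# Aggregate GPU memory across the system.
total_gpu_memory_gb = NUM_GPUS * GPU_MEMORY_GB

# ASSUMPTION: the whole system power budget is attributed to the FP16 figure.
gflops_per_watt = TOTAL_TFLOPS_FP16 * 1000.0 / SYSTEM_POWER_W

print(f"FP16 TFLOPS per GPU: {tflops_per_gpu}")
print(f"Total GPU memory:    {total_gpu_memory_gb} GB")
print(f"FP16 GFLOPS per W:   {gflops_per_watt}")
```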


THANK YOU! Q&A: WHAT CAN I HELP YOU BUILD?