Urs Köster Presenting at RE-Work DL Summit in Boston

Proprietary and confidential. Do not distribute. Deep Learning at Scale May 2016 Urs Köster, PhD Nervana MAKING MACHINES SMARTER.

Transcript of Urs Köster Presenting at RE-Work DL Summit in Boston

Page 1: Urs Köster Presenting at RE-Work DL Summit in Boston

Proprietary and confidential. Do not distribute.

Deep Learning at Scale

May 2016 Urs Köster, PhD

Nervana

MAKING MACHINES SMARTER.

Page 2: Urs Köster Presenting at RE-Work DL Summit in Boston

About nervana

• A platform for machine intelligence

• Enable deep learning at scale

• Optimized from algorithms to silicon


Page 3: Urs Köster Presenting at RE-Work DL Summit in Boston

The Nervana Platform - a full-stack solution

[Diagram: neon deep learning framework, nervana cloud, Solutions]

Data types shown: Images, Text, Tabular, Speech, Time series, Video

Page 4: Urs Köster Presenting at RE-Work DL Summit in Boston

neon: nervana python deep learning library


• User-friendly, extensible, fast

• Support for many deep learning models

• Interface to nervana cloud

• Multiple backends

  • nervana engine

  • GPU (optimized assembler kernels)

  • CPU cluster

Open source (Apache 2.0) on github.com/nervanaSystems/neon

Page 5: Urs Köster Presenting at RE-Work DL Summit in Boston

Nervana Cloud

web interface

command line

Page 6: Urs Köster Presenting at RE-Work DL Summit in Boston

Deep learning as a core technology

[Diagram: ‘Google Brain’ model, with DL at the center of Photos, Maps, Voice Search, Self-driving car, Ad Targeting, Machine Translation]

[Diagram: Nervana Platform, with DL at the center of Image Classification, Object Localization, Video Indexing, Speech Recognition, Natural Language]

Page 7: Urs Köster Presenting at RE-Work DL Summit in Boston

Video recognition with 3D convolution

[Bar chart: Training Speed in epochs / hour (axis 0 to 1), neon vs. caffe]

Page 8: Urs Köster Presenting at RE-Work DL Summit in Boston

Object Localization / Segmentation

CamVid Dataset, SegNet model

KITTI Dataset, Fast R-CNN model

Model                            neon (ms)   caffe (ms)   Speedup
Fast R-CNN (batch size=4)        360         670          1.8x
SegNet (batch size=4)            267         1455         5.4x
SegNet (4 GPUs, batch size=16)   348         --           *5.9x
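
The Speedup figures can be checked against the two timing columns. The sketch below assumes that speedup means caffe time divided by neon time; the slide does not state the definition, and the function name is illustrative. The 4-GPU row is left out because no caffe time is given for it.

```python
import math

# Per-iteration times in milliseconds, taken from the timings on this slide.
timings_ms = {
    "Fast R-CNN (batch size=4)": (360, 670),
    "SegNet (batch size=4)": (267, 1455),
}

def speedup(neon_ms, caffe_ms):
    """Assumed definition: how many times faster neon is than caffe."""
    return caffe_ms / neon_ms

for name, (neon_ms, caffe_ms) in timings_ms.items():
    s = speedup(neon_ms, caffe_ms)
    # Truncate to one decimal place instead of rounding.
    print(f"{name}: {math.floor(s * 10) / 10}x")
```

Under this assumption the ratios are about 1.86 and 5.45, which agree with the slide's 1.8x and 5.4x if those figures were truncated to one decimal place.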

Page 9: Urs Köster Presenting at RE-Work DL Summit in Boston

Image Classification (Residual Network)

Page 10: Urs Köster Presenting at RE-Work DL Summit in Boston

Speech to text

Page 11: Urs Köster Presenting at RE-Work DL Summit in Boston

Imagenet ILSVRC Challenge

[Chart: Top-5 error rate (axis 0% to 30%) by year, 2010 to 2015. Annotations: Deep learning, human performance. Labeled entries: AlexNet, Clarifai, GoogleNet, ResNet]
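
The chart's metric, top-5 error rate, counts a prediction as correct when the true label is among the five highest-scoring classes. A minimal sketch of that calculation follows; the function name and the toy scores are made up for illustration and are not ILSVRC results.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the largest scores for this example.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)

# Toy data: 2 examples, 6 classes (made-up scores).
scores = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],  # class 0 has the lowest score
    [0.30, 0.25, 0.20, 0.15, 0.05, 0.05],
]
labels = [0, 1]
print(top_k_error(scores, labels, k=5))
```

With k=1 the same function gives the ordinary (top-1) error rate.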

Page 12: Urs Köster Presenting at RE-Work DL Summit in Boston


Speeding up Deep Learning

• Same model, better performance:

  • Hardware improvements

  • Algorithmic improvements

[Bar chart: Alexnet ms / iteration (axis 0 to 600; label "15,000 ..."): CPU, GTX580, TitanX, neon]

[Chart: Soumith's AlexNet Benchmark, ms (axis 0 to 500) at 4/2015, 8/2015, 3/2016: neon vs. CuDNN]

[Chart: Soumith's GoogleNet Benchmark, ms (axis 0 to 500) at 4/2015, 8/2015, 3/2016: neon vs. CuDNN]

Page 13: Urs Köster Presenting at RE-Work DL Summit in Boston

Dennard scaling has ended

[Chart: Transistors, Clock speed, Power, Perf / clock]

[Chart: axes # OF PROCESSORS and LEARNING SPEED]

INDUSTRY STANDARD: COMMUNICATION OVERHEAD = PERFORMANCE CEILING

NERVANA: BETTER COMMUNICATION FABRIC, NEAR LINEAR SCALING
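
The contrast between a performance ceiling and near-linear scaling can be illustrated with a toy model. The formula below is an assumption made for illustration, not something stated on the slide: each additional processor adds a fixed fraction `overhead` of communication cost per unit of compute.

```python
def speedup(n_processors, overhead):
    """Toy scaling model (assumed): communication cost grows with the
    number of additional processors, `overhead` per extra processor."""
    return n_processors / (1.0 + overhead * (n_processors - 1))

for n in (1, 8, 64):
    # overhead = 0.0 is ideal linear scaling; 0.1 is an arbitrary example value.
    print(n, speedup(n, 0.0), round(speedup(n, 0.1), 2))
```

In this model, zero overhead gives exactly linear scaling, while any positive overhead bounds the speedup by 1 / overhead no matter how many processors are added. That bound is the ceiling, and a smaller overhead moves the curve closer to the linear case.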

Page 14: Urs Köster Presenting at RE-Work DL Summit in Boston

Nervana Engine (coming in 2017)

• Unprecedented computing power

• 10x speedup over current GPUs

• More memory on-chip

• High-Bandwidth Memory off-chip

• Six bi-directional high-bandwidth links for 3D torus interconnect

• 8 chips in a box, seamlessly scale to multiple chassis
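
Six links per chip fit a 3D torus because every node in one has two neighbors along each of three axes, with wraparound at the edges. The sketch below enumerates them. The 4 x 4 x 4 shape is hypothetical, since the slide does not give the torus dimensions, and the function name is illustrative.

```python
def torus_neighbors(node, dims):
    """Return the six neighbors of `node` in a 3D torus of shape `dims`.

    Each axis wraps around, so a node on one edge links to the opposite edge.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % dims[axis]
            neighbors.append(tuple(coords))
    return neighbors

# Hypothetical 4 x 4 x 4 arrangement (an assumption, not from the slide).
print(torus_neighbors((0, 0, 0), (4, 4, 4)))
```

The six neighbors are distinct only when every dimension has at least three nodes; with two nodes along an axis, the +1 and -1 steps reach the same node.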

Page 15: Urs Köster Presenting at RE-Work DL Summit in Boston

Summary

• Deep learning is a new computational paradigm

• Learning and Inference on data

• neon with state-of-the-art GPU kernels

• Nervana Cloud with multi-GPU training

• Watch for Nervana Engine deep learning processor

Page 16: Urs Köster Presenting at RE-Work DL Summit in Boston