Intel Nervana Artificial Intelligence Meetup 1/31/17


Transcript of Intel Nervana Artificial Intelligence Meetup 1/31/17

Page 1: Intel Nervana Artificial Intelligence Meetup 1/31/17

Proprietary and confidential. Do not distribute.

Introduction to deep learning with neon

MAKING MACHINES SMARTER.™

Page 2: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

2

• Intel Nervana overview

• Machine learning basics

• What is deep learning?

• Basic deep learning concepts

• Example: recognition of handwritten digits

• Model ingredients in-depth

• Deep learning with neon

Page 3: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Intel Nervana's deep learning solution stack

3

Images

Video

Text

Speech

Tabular

Time series

Solutions

Page 4: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Deep Dream

Autoencoders

Deep Speech 2

Skip-thought

SegNet

Fast-RCNN Object Localization

Deep Reinforcement Learning

imdb Sentiment Analysis

Video Activity Detection

Deep Residual Net

bAbI Q&A

AllCNN

AlexNet

GoogLeNet

VGG

https://github.com/NervanaSystems/ModelZoo

Page 5: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Intel Nervana in action

5

Healthcare: Tumor detection

Automotive: Speech interfaces

Finance: Time-series search engine

Positive:

Negative:

Agricultural Robotics Oil & Gas

Positive:

Negative:

Proteomics: Sequence analysis

Query:

Results:

Page 6: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

• Optimized AVX-2 and AVX-512 instructions

• Intel® Xeon® processors and Intel® Xeon Phi™ processors

• Optimized for common deep learning operations

  • GEMM (useful in RNNs and fully connected layers)

  • Convolutions

  • Pooling

  • ReLU

  • Batch normalization

• Coming soon: LSTM, GRU, Winograd-based convolutions

6

Page 7: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Page 8: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

8

• Intel Nervana overview

• Machine learning basics

• What is deep learning?

• Basic deep learning concepts

• Example: recognition of handwritten digits

• Model ingredients in-depth

• Deep learning with neon

Page 9: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

9

• SUPERVISED LEARNING

• DATA -> LABELS

• UNSUPERVISED LEARNING

• NO LABELS; CLUSTERING

• REDUCING DIMENSIONALITY

• REINFORCEMENT LEARNING

• REWARD ACTIONS (E.G., ROBOTICS)

Page 10: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

10

• SUPERVISED LEARNING

• DATA -> LABELS

• UNSUPERVISED LEARNING

• NO LABELS; CLUSTERING

• REDUCING DIMENSIONALITY

• REINFORCEMENT LEARNING

• REWARD ACTIONS (E.G., ROBOTICS)

Page 11: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

11

$(f_1, f_2, \ldots, f_K)$

SVM, Random Forest, Naïve Bayes, Decision Trees, Logistic Regression, Ensemble methods

𝑁×𝑁

𝐾 ≪ 𝑁

Arjun

Page 12: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

12

Animals

Faces

Chairs

Fruits

Vehicles

Page 13: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Animals

Faces

Chairs

Fruits

Vehicles

13

Page 14: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Animals

Faces

Chairs

Fruits

Vehicles

14

[Figure: scatter of data points (x), labeled "Training error" and "Testing error"]

Page 15: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

15

[Figure: Error (vertical axis) versus Training Time (horizontal axis), with curves for Training Error and Testing/Validation Error; regions labeled Underfitting and Overfitting]

Bias-Variance Trade-off

Page 16: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

16

• Intel Nervana overview

• Machine learning basics

• What is deep learning?

• Basic deep learning concepts

• Example: recognition of handwritten digits

• Model ingredients in-depth

• Deep learning with neon

Page 17: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

17

~60 million parameters

Arjun

But old practices apply: Data Cleaning, Underfit/Overfit, Data exploration, right cost function, hyperparameters, etc.

𝑁×𝑁

Page 18: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

18

Bigger Data

• Image: 1000 KB / picture

• Audio: 5000 KB / song

• Video: 5,000,000 KB / movie

Better Hardware

• Transistor density doubles every 18 months

• Cost / GB in 1995: $1000.00

• Cost / GB in 2015: $0.03

Smarter Algorithms

• Advances in algorithm innovation, including neural networks, leading to better accuracy in training models

Page 19: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

19

Page 20: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

20

• Intel Nervana overview

• Machine learning basics

• What is deep learning?

• Basic deep learning concepts

• Model ingredients in-depth

• Deep learning with neon

Page 21: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

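[Figure: a single unit with inputs $x_1, x_2, x_3$, linear weights $w_1, w_2, w_3$, a bias unit, a summation $\sum$ producing $a$, and an activation function $g$ producing the output $y$]

Input from unit j: $x_j$

Linear weights: $w_1, w_2, w_3$

Bias unit

Activation function: $g$, for example $\max(a, 0)$ or $\tanh(a)$

Output of unit: $y = g(a)$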

Page 22: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Input

Hidden

Output

Affine layer: Linear + Bias + Activation
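A minimal sketch of the affine layer described above (linear + bias + activation), in plain Python. The function name `affine`, the weight layout, and the toy numbers are assumptions for illustration; this is not neon's implementation.

```python
def affine(inputs, weights, biases, activation):
    """Compute activation(W x + b) for one layer.

    weights[j][i] is the weight from input unit i to output unit j.
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        a = sum(w * x for w, x in zip(w_row, inputs)) + b  # linear + bias
        outputs.append(activation(a))                      # activation
    return outputs


def relu(a):
    """Rectified linear activation: max(a, 0)."""
    return max(a, 0.0)


# Hypothetical toy layer: 3 input units -> 2 hidden units.
x = [1.0, 2.0, 3.0]
W = [[0.5, -1.0, 0.5],
     [-0.5, 0.0, -1.0]]
b = [0.5, 1.0]

hidden = affine(x, W, b, relu)
print(hidden)  # [0.5, 0.0]
```

The second unit has a negative pre-activation (-2.5), so the rectifier clips it to zero.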

Page 23: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

MNIST dataset: 70,000 images (28x28 pixels)

Goal: classify images into a digit 0-9

Input: N = 28 x 28 pixels = 784 input units

Hidden: N = 100 hidden units (user-defined parameter)

Output: N = 10 output units (one for each digit). Each unit i encodes the probability that the input image is the digit i.

Page 24: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

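Layer i (input): N=784

Layer j (hidden): N=100

Layer k (output): N=10

Total parameters:

$W_{i \to j}$: 784 x 100 = 78,400

$b_j$: 100

$W_{j \to k}$: 100 x 10 = 1,000

$b_k$: 10

Total = 78,400 + 100 + 1,000 + 10 = 79,510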
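The parameter count for the 784-100-10 network can be checked with a few lines of Python. This is a sketch; the helper name `count_parameters` is an assumption. Each layer contributes one weight per input-output pair plus one bias per output unit, which sums to 79,510 for these sizes.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix between the two layers
        total += n_out         # one bias per unit in the receiving layer
    return total


sizes = [784, 100, 10]  # input, hidden, output units from the slide
n_params = count_parameters(sizes)
print(n_params)  # 79510
```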

Page 25: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Input

Hidden

Output

1. Randomly seed weights

2. Forward-pass

3. Cost

4. Backward-pass

5. Update weights
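The five steps can be sketched on the smallest possible model: a single linear unit with a squared-error cost. The toy data, the cost function, and the learning rate are assumptions chosen so the loop fits in a few lines; the MNIST network follows the same loop with more weights.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Toy data (an assumption for illustration): targets follow t = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# 1. Randomly seed weights.
w = random.gauss(0.0, 1.0)
b = random.gauss(0.0, 1.0)

learning_rate = 0.1
costs = []
for step in range(1000):
    grad_w = grad_b = cost = 0.0
    for x, t in data:
        y = w * x + b               # 2. Forward pass.
        cost += 0.5 * (y - t) ** 2  # 3. Cost (squared error).
        grad_w += (y - t) * x       # 4. Backward pass: dC/dw
        grad_b += (y - t)           #                   dC/db
    n = len(data)
    costs.append(cost / n)
    w -= learning_rate * grad_w / n  # 5. Update weights.
    b -= learning_rate * grad_b / n

print(round(w, 3), round(b, 3))
```

After training, `w` and `b` should be close to the slope 2 and intercept 1 used to generate the data.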

Page 26: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Input

Hidden

Output

$W_{i \to j}, b_j \sim \mathrm{Gaussian}(0, 1)$

$W_{j \to k}, b_k \sim \mathrm{Gaussian}(0, 1)$

Page 27: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

28x28

Input

Hidden

Output

Output (10x1): [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]

Page 28: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

28x28

Input

Hidden

Output

Output (10x1): [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]

Ground Truth: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

Cost function: $c(\text{output}, \text{truth})$

Page 29: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Input

Hidden

Output

Output (10x1): [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]

Ground Truth: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

Cost function: $c(\text{output}, \text{truth})$

$\Delta W_{i \to j}$, $\Delta W_{j \to k}$

Page 30: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Input

Hidden

Output

$C(y, \text{truth})$

$W^*$

Compute $\frac{\partial C}{\partial W^*}$

Page 31: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Input

Hidden

Output

$C(y, \text{truth}) = C\left(g\left(\sum (W_{j \to k} x_k + b_k)\right)\right)$

$W^*$

Page 32: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

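Input

Hidden

Output

$C(y, \text{truth}) = C\left(g\left(\sum (W_{j \to k} x_k + b_k)\right)\right) = C\left(g\left(a(W_{j \to k}, x_k)\right)\right)$

where $a(W_{j \to k}, x_k) = \sum (W_{j \to k} x_k + b_k)$

For a weight $W^*_{j \to k}$:

$\frac{\partial C}{\partial W^*} = \frac{\partial C}{\partial g} \cdot \frac{\partial g}{\partial a} \cdot \frac{\partial a}{\partial W^*}$

[Figure: plots of $g = \max(a, 0)$ and its derivative $g'(a)$ against $a$]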
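The three-factor chain rule can be checked numerically on a single unit. The sketch below assumes a squared-error cost C = 0.5 (g - t)^2 and toy values for w, x, b and t (none of these come from the slides), then compares the chain-rule product against a finite-difference estimate.

```python
def forward(w, x, b, t):
    """Return (cost, a, g) for one unit with a rectified linear activation."""
    a = w * x + b                # pre-activation a(W, x)
    g = max(a, 0.0)              # g = max(a, 0)
    cost = 0.5 * (g - t) ** 2    # assumed squared-error cost
    return cost, a, g


w, x, b, t = 0.5, 2.0, 0.5, 1.0  # hypothetical values
cost, a, g = forward(w, x, b, t)

dC_dg = g - t                    # derivative of the cost w.r.t. the output
dg_da = 1.0 if a > 0 else 0.0    # g'(a) for the rectifier
da_dw = x                        # derivative of a = w*x + b w.r.t. w
analytic = dC_dg * dg_da * da_dw

# Central finite difference as an independent check.
eps = 1e-6
numeric = (forward(w + eps, x, b, t)[0] - forward(w - eps, x, b, t)[0]) / (2 * eps)

print(analytic, numeric)
```

Here a = 1.5, so the rectifier is in its linear region and the analytic gradient is 0.5 * 1 * 2 = 1.0.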

Page 33: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

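Input

Hidden

Output

$C(y, \text{truth}) = C\left(g_k\left(a_k(W_{j \to k}, x_k = y_j)\right)\right)$

$C(y, \text{truth}) = C\left(g_k\left(a_k\left(W_{j \to k}, g_j(a_j(W_{i \to j}, x_j))\right)\right)\right)$

For a weight $W^*_{i \to j}$, with $y_j$ the output of the hidden layer:

$\frac{\partial C}{\partial W^*} = \frac{\partial C}{\partial g_k} \cdot \frac{\partial g_k}{\partial a_k} \cdot \frac{\partial a_k}{\partial g_j} \cdot \frac{\partial g_j}{\partial a_j} \cdot \frac{\partial a_j}{\partial W^*}$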

Page 34: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

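$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$

[Figure: cost curve over $\mathbf{w}$, with the point $\mathbf{w}^{(0)}$ marked]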

Page 35: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

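$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$

[Figure: cost curve over $\mathbf{w}$, with the point $\mathbf{w}^{(0)}$ marked]

$\frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$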

Page 36: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

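$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$

[Figure: cost curve over $\mathbf{w}$, with the point $\mathbf{w}^{(0)}$ marked]

$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$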

Page 37: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

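$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$

[Figure: cost curve over $\mathbf{w}$, with the point $\mathbf{w}^{(0)}$ marked]

$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$

$\alpha$: learning rate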

Page 38: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

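$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$

$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$

[Figure: cost curve over $\mathbf{w}$, with the points $\mathbf{w}^{(0)}$ and $\mathbf{w}^{(1)}$ marked; step labeled "too small"]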

Page 39: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

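$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$

$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$

[Figure: cost curve over $\mathbf{w}$, with the points $\mathbf{w}^{(0)}$ and $\mathbf{w}^{(1)}$ marked; step labeled "too large"]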

Page 40: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

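$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$

$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$

[Figure: cost curve over $\mathbf{w}$, with the points $\mathbf{w}^{(0)}$ and $\mathbf{w}^{(1)}$ marked; step labeled "good enough"]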
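The effect of the learning rate can be reproduced on a one-dimensional cost. The sketch below assumes J(w) = (w - 3)^2, which is not from the slides, and runs the update w ← w - α dJ/dw for three hypothetical values of α.

```python
def gradient_descent(alpha, w0=0.0, steps=20):
    """Run a fixed number of gradient-descent steps on J(w) = (w - 3)^2."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # dJ/dw
        w = w - alpha * grad    # update rule from the slides
    return w


too_small = gradient_descent(alpha=0.01)  # barely moves toward the minimum
too_large = gradient_descent(alpha=1.1)   # overshoots further on every step
good = gradient_descent(alpha=0.3)        # settles near the minimum at w = 3

for name, w in [("too small", too_small), ("too large", too_large), ("good enough", good)]:
    print(name, abs(w - 3.0))
```

For this cost, each step multiplies the distance to the minimum by |1 - 2α|, so any α above 1 makes the iterates diverge.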

Page 41: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

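$J(\mathbf{w}^{(1)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(1)}, \mathbf{x}_i)$

$\mathbf{w}^{(2)} = \mathbf{w}^{(1)} - \alpha \frac{dJ(\mathbf{w}^{(1)})}{d\mathbf{w}}$

[Figure: cost curve over $\mathbf{w}$, with the points $\mathbf{w}^{(1)}$ and $\mathbf{w}^{(2)}$ marked]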

Page 42: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

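$J(\mathbf{w}^{(2)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(2)}, \mathbf{x}_i)$

$\mathbf{w}^{(3)} = \mathbf{w}^{(2)} - \alpha \frac{dJ(\mathbf{w}^{(2)})}{d\mathbf{w}}$

[Figure: cost curve over $\mathbf{w}$, with the points $\mathbf{w}^{(2)}$ and $\mathbf{w}^{(3)}$ marked]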

Page 43: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

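$J(\mathbf{w}^{(3)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(3)}, \mathbf{x}_i)$

$\mathbf{w}^{(4)} = \mathbf{w}^{(3)} - \alpha \frac{dJ(\mathbf{w}^{(3)})}{d\mathbf{w}}$

[Figure: cost curve over $\mathbf{w}$, with the points $\mathbf{w}^{(3)}$ and $\mathbf{w}^{(4)}$ marked]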

Page 44: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

Page 45: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

Update weights via:

$\Delta W = \alpha \cdot \frac{1}{N} \sum \delta W$

$\alpha$: learning rate

Page 46: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

fprop cost bprop 𝛿𝑊

minibatch #1 weight update

minibatch #2 weight update

Page 47: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Epoch 0

Epoch 1

Sample numbers:

• Learning rate ~0.001

• Batch sizes of 32-128

• 50-90 epochs
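A sketch of how minibatches and epochs fit together, using six toy samples split into minibatches of three so that each epoch performs two weight updates. The data, model (one linear unit), learning rate, and epoch count are assumptions sized for a toy problem and differ from the sample numbers above.

```python
import random


def minibatches(samples, batch_size):
    """Yield consecutive slices of batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


random.seed(0)
# Toy data (assumption): six samples on the line t = 2x + 1.
data = [(i / 5.0, 2.0 * (i / 5.0) + 1.0) for i in range(6)]

w, b = 0.0, 0.0
learning_rate = 0.5
batch_size = 3
num_epochs = 1000
num_updates = 0

for epoch in range(num_epochs):          # one epoch = one pass over the data
    random.shuffle(data)
    for batch in minibatches(data, batch_size):
        n = len(batch)
        # Average the per-sample gradients within the minibatch.
        delta_w = sum((w * x + b - t) * x for x, t in batch) / n
        delta_b = sum((w * x + b - t) for x, t in batch) / n
        w -= learning_rate * delta_w     # one weight update per minibatch
        b -= learning_rate * delta_b
        num_updates += 1

print(num_updates, round(w, 3), round(b, 3))
```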

Page 48: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

SGD

Gradient Descent

Page 49: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Krizhevsky, 2012

60 million parameters

120 million parameters Taigman, 2014

Page 50: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

50

• Intel Nervana overview

• Machine learning basics

• What is deep learning?

• Basic deep learning concepts

• Model ingredients in-depth

• Deep learning with neon

Page 51: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Dataset

Model/Layers

Activation

Optimizer

Cost: $C(y, t)$

Page 52: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Filter + Non-Linearity

Pooling

Filter + Non-Linearity

Fully connected layers

“how can I help you?”

cat

Low level features

Mid level features

Object parts, phonemes

Objects, words

*Hinton et al., LeCun, Zeiler, Fergus

Filter + Non-Linearity

Pooling

Page 53: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

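Tanh

Rectified Linear Unit

Logistic

[Figure: plots of the three activation functions, with axis ticks at -1, 0 and 1]

Softmax

$g(a_j) = \frac{e^{a_j}}{\sum_k e^{a_k}}$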
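The four activation functions named on this slide can be written in a few lines of plain Python. This is a sketch; the function names are assumptions and this is not neon's code. The softmax subtracts the maximum input before exponentiating, a common trick that leaves the result unchanged but avoids overflow.

```python
import math


def tanh(a):
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(a)


def rectlin(a):
    """Rectified linear unit: max(a, 0)."""
    return max(a, 0.0)


def logistic(a):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def softmax(values):
    """Normalize a list of scores into probabilities that sum to 1."""
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
```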

Page 54: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Gaussian: Gaussian(mean, sd)

GlorotUniform: Uniform(-k, k), with $k = \sqrt{\frac{6}{d_{in} + d_{out}}}$

Xavier: Uniform(-k, k), with $k = \sqrt{\frac{3}{d_{in}}}$

Kaiming: Gaussian(0, sigma), with $\sigma = \sqrt{\frac{2}{d_{in}}}$
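A sketch of the three scaled initializers in plain Python. It assumes the extracted formulas lost their square-root signs, so that k = sqrt(6 / (d_in + d_out)) for GlorotUniform, k = sqrt(3 / d_in) for Xavier, and sigma = sqrt(2 / d_in) for Kaiming, and that both uniform ranges are symmetric, (-k, k). The function names and the matrix layout are assumptions, not neon's API.

```python
import math
import random


def glorot_uniform(d_in, d_out, rng):
    """Uniform(-k, k) with k = sqrt(6 / (d_in + d_out))."""
    k = math.sqrt(6.0 / (d_in + d_out))
    return [[rng.uniform(-k, k) for _ in range(d_in)] for _ in range(d_out)]


def xavier(d_in, d_out, rng):
    """Uniform(-k, k) with k = sqrt(3 / d_in)."""
    k = math.sqrt(3.0 / d_in)
    return [[rng.uniform(-k, k) for _ in range(d_in)] for _ in range(d_out)]


def kaiming(d_in, d_out, rng):
    """Gaussian(0, sigma) with sigma = sqrt(2 / d_in)."""
    sigma = math.sqrt(2.0 / d_in)
    return [[rng.gauss(0.0, sigma) for _ in range(d_in)] for _ in range(d_out)]


rng = random.Random(0)  # local generator with a fixed seed
W_glorot = glorot_uniform(784, 100, rng)
W_xavier = xavier(784, 100, rng)
W_kaiming = kaiming(784, 100, rng)
print(len(W_glorot), len(W_glorot[0]))  # 100 784
```

All three scale the spread of the initial weights with the layer's fan-in (and, for GlorotUniform, fan-out), so wider layers start with smaller weights.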

Page 55: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

• Cross Entropy Loss

• Misclassification Rate

• Mean Squared Error

• L1 loss

Page 56: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Output (10x1): [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]

Ground Truth: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

$-\sum_k t_k \times \log(y_k) = -\log(0.3)$

Page 57: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Outputs          Targets    Correct?
0.3  0.3  0.4    0  0  1    Y
0.3  0.4  0.3    0  1  0    Y
0.1  0.2  0.7    1  0  0    N

-(log(0.4) + log(0.4) + log(0.1))/3 = 1.38

Outputs          Targets    Correct?
0.1  0.2  0.7    0  0  1    Y
0.1  0.7  0.2    0  1  0    Y
0.3  0.4  0.3    1  0  0    N

-(log(0.7) + log(0.7) + log(0.3))/3 = 0.64
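The two cross-entropy values can be reproduced with the natural logarithm, alongside the misclassification rate for the same outputs. This is a sketch; the helper names and the labels `model_a` and `model_b` are assumptions.

```python
import math


def mean_cross_entropy(outputs, targets):
    """Average of -sum_k t_k * log(y_k) over all samples."""
    total = 0.0
    for y, t in zip(outputs, targets):
        total -= sum(t_k * math.log(y_k) for y_k, t_k in zip(y, t) if t_k > 0)
    return total / len(outputs)


def misclassification_rate(outputs, targets):
    """Fraction of samples whose largest output is not the target class."""
    wrong = sum(1 for y, t in zip(outputs, targets)
                if y.index(max(y)) != t.index(max(t)))
    return wrong / len(outputs)


targets = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
model_a = [[0.3, 0.3, 0.4], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]  # first table
model_b = [[0.1, 0.2, 0.7], [0.1, 0.7, 0.2], [0.3, 0.4, 0.3]]  # second table

ce_a = mean_cross_entropy(model_a, targets)
ce_b = mean_cross_entropy(model_b, targets)
print(round(ce_a, 2), round(ce_b, 2))  # 1.38 0.64
print(misclassification_rate(model_a, targets),
      misclassification_rate(model_b, targets))
```

Both sets of outputs get two of three samples right, so the misclassification rate is the same, while cross-entropy is lower for the second set because its correct predictions are more confident.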

Page 58: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

• SGD with Momentum

• RMS propagation

• Adagrad

• Adadelta

• Adam

Page 59: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

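[Figure: updates $\Delta W_1, \Delta W_2, \Delta W_3, \Delta W_4$ along the training-time axis]

$\alpha'_{t=T} = \frac{\alpha}{\sqrt{\sum_{t=0}^{T} \Delta W_t^2}}$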

Page 60: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

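[Figure: updates $\Delta W_1, \Delta W_2, \Delta W_3, \Delta W_4$ along the training-time axis]

$\alpha'_{t=4} = \frac{\alpha}{\sqrt{\Delta W_2^2 + \Delta W_3^2 + \Delta W_4^2}}$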
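The two learning-rate scalings can be compared on a short update history. The sketch assumes the garbled formulas divide α by the square root of the summed squared updates: over the whole history in the first case, and over only the three most recent updates in the second. The function names and the numbers are hypothetical, and practical optimizers also add a small constant to the denominator to avoid division by zero.

```python
import math


def full_history_rate(alpha, updates):
    """alpha / sqrt(sum of all squared updates so far)."""
    return alpha / math.sqrt(sum(dw ** 2 for dw in updates))


def windowed_rate(alpha, updates, window):
    """alpha / sqrt(sum of the last `window` squared updates)."""
    return alpha / math.sqrt(sum(dw ** 2 for dw in updates[-window:]))


# Hypothetical updates dW_1..dW_4: one large early step, then small ones.
history = [24.0, 2.0, 3.0, 6.0]

rate_full = full_history_rate(1.0, history)            # 1 / sqrt(625) = 0.04
rate_window = windowed_rate(1.0, history, window=3)    # 1 / sqrt(49)
print(rate_full, rate_window)
```

With the full history, the large early update keeps the effective learning rate small for the rest of training; the windowed version forgets it after three steps.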

Page 61: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

61

• Intel Nervana overview

• Machine learning basics

• What is deep learning?

• Basic deep learning concepts

• Model ingredients in-depth

• Deep learning with neon

Page 62: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Page 63: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Page 64: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

• Popular, well established, developer familiarity

• Fast to prototype

• Rich ecosystem of existing packages

• Data Science: pandas, pycuda, ipython, matplotlib, h5py, …

• Good “glue” language: scriptable plus functional and OO support, plays well with other languages

Page 65: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Backend: NervanaGPU, NervanaCPU

Datasets: MNIST, CIFAR-10, Imagenet 1K, PASCAL VOC, Mini-Places2, IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter-prize, UCF101, flickr8k, flickr30k, COCO

Initializers: Constant, Uniform, Gaussian, Glorot Uniform, Xavier, Kaiming, IdentityInit, Orthonormal

Optimizers: Gradient Descent with Momentum, RMSProp, AdaDelta, Adam, Adagrad, MultiOptimizer

Activations: Rectified Linear, Softmax, Tanh, Logistic, Identity, ExpLin

Layers: Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short-Term Memory, Gated Recurrent Unit, BatchNorm, LookupTable, Local Response Normalization, Bidirectional-RNN, Bidirectional-LSTM

Costs: Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error

Metrics: Misclassification (Top1, TopK), LogLoss, Accuracy, PrecisionRecall, ObjectDetection

Page 66: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

1. Generate backend

2. Load data

3. Specify model architecture

4. Define training parameters

5. Train model

6. Evaluate

Page 67: Intel Nervana Artificial Intelligence Meetup 1/31/17

Nervana Systems Proprietary

Page 68: Intel Nervana Artificial Intelligence Meetup 1/31/17

NERVANA

[email protected]