Intel Nervana Artificial Intelligence Meetup 1/31/17

Proprietary and confidential. Do not distribute.

Introduction to deep learning with neon

MAKING MACHINES SMARTER.™


• Intel Nervana overview

• Machine learning basics

• What is deep learning?

• Basic deep learning concepts

• Example: recognition of handwritten digits

• Model ingredients in-depth

• Deep learning with neon

Intel Nervana's deep learning solution stack

Images

Speech

Tabular

Time series

Solutions

Deep Dream

Autoencoders

Deep Speech 2

Skip-thought

SegNet

Fast-RCNN Object Localization

Deep Reinforcement Learning

IMDB Sentiment Analysis

Video Activity Detection

Deep Residual Net

bAbI Q&A

AllCNN

AlexNet

GoogLeNet

https://github.com/NervanaSystems/ModelZoo

Intel Nervana in action

Healthcare: Tumor detection

Automotive: Speech interfaces

Finance: Time-series search engine

Agricultural Robotics

Oil & Gas

Proteomics: Sequence analysis

• Optimized AVX-2 and AVX-512 instructions
• Intel® Xeon® processors and Intel® Xeon Phi™ processors
• Optimized for common deep learning operations
  • GEMM (useful in RNNs and fully connected layers)
  • Convolutions
  • Pooling
  • ReLU
  • Batch normalization

• Coming soon: LSTM, GRU, Winograd-based convolutions

• Intel Nervana overview

• Machine learning basics

• What is deep learning?

• SUPERVISED LEARNING

• DATA -> LABELS

• UNSUPERVISED LEARNING

• NO LABELS; CLUSTERING

• REDUCING DIMENSIONALITY

• REINFORCEMENT LEARNING

• REWARD ACTIONS (E.G., ROBOTICS)


$(f_1, f_2, \ldots, f_K)$

SVM, Random Forest, Naïve Bayes, Decision Trees, Logistic Regression, Ensemble methods

𝑁×𝑁

𝐾 ≪ 𝑁

Animals, Faces, Chairs, Fruits, Vehicles

[Figure: training error and testing/validation error plotted against training time, with underfitting and overfitting regions (bias-variance trade-off)]

• Machine learning basics

• What is deep learning?

• Basic deep learning concepts

~60 million parameters

But old practices apply: Data Cleaning, Underfit/Overfit, Data exploration, right cost function, hyperparameters, etc.

𝑁×𝑁

Bigger Data
• Image: 1000 KB / picture
• Audio: 5000 KB / song
• Video: 5,000,000 KB / movie

Better Hardware
• Transistor density doubles every 18 months
• Cost / GB in 1995: $1000.00
• Cost / GB in 2015: $0.03

Smarter Algorithms
• Advances in algorithm innovation, including neural networks, leading to better accuracy in training models

• Basic deep learning concepts

• Model ingredients in-depth

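[Figure: a single unit. Inputs from units j are multiplied by the linear weights $w_1, w_2, w_3$, summed ($\sum$) together with a bias unit, and passed through an activation function $g$ to give the output $y$ of the unit.]

Activation function examples: $\max(a, 0)$, $\tanh(a)$

[Figure: network with Input, Hidden, and Output layers]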

Affine layer: Linear + Bias + Activation
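As a rough illustration of the affine layer above, the sketch below computes linear weights plus bias followed by an activation, in plain Python. The function names (`affine`, `relu`) and the sample numbers are assumptions chosen for illustration; they are not taken from neon.

```python
import math

def relu(a):
    """Rectified linear activation: max(a, 0)."""
    return max(a, 0.0)

def affine(inputs, weights, bias, activation):
    """One unit: weighted sum of the inputs, plus bias, then activation."""
    a = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(a)

x = [1.0, 2.0, 3.0]      # inputs from units j (made-up values)
w = [0.5, -1.0, 0.25]    # linear weights w1, w2, w3 (made-up values)
b = 0.5                  # bias (made-up value)

y_relu = affine(x, w, b, relu)        # pre-activation is -0.25, so ReLU gives 0.0
y_tanh = affine(x, w, b, math.tanh)   # tanh(-0.25)
print(y_relu, y_tanh)
```

Swapping the `activation` argument is the only change needed to move between the two activation functions shown on the slide.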

MNIST dataset: 70,000 images (28x28 pixels)

Goal: classify images into a digit 0-9

N = 28 x 28 pixels = 784 input units

N = 100 hidden units (user-defined parameter)

N = 10 output units (one for each digit)

Each output unit i encodes the probability that the input image is the digit i.

[Figure: network with Input (N=784), Hidden (N=100), and Output layers]

Total parameters:

$W_{i \to j}$: 784 x 100
$b_j$: 100
$W_{j \to k}$: 100 x 10
$b_k$: 10

Total = 78,400 + 100 + 1,000 + 10 = 79,510
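The parameter total can be checked by summing, for each pair of consecutive layers, the weight matrix size and the bias vector size. The helper name `count_parameters` below is an assumption for illustration.

```python
layer_sizes = [784, 100, 10]  # input, hidden, output units

def count_parameters(sizes):
    """Sum weights (n_in * n_out) and biases (n_out) over consecutive layers."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out + n_out
    return total

total_params = count_parameters(layer_sizes)
print(total_params)  # 79510
```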

Layer i, Layer j, Layer k

1. Randomly seed weights
2. Forward-pass
3. Cost
4. Backward-pass
5. Update weights
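The five steps above can be sketched on a deliberately tiny problem. This is a minimal illustration, not the MNIST network: it assumes a single logistic output unit, a binary cross-entropy cost, and four made-up training samples, so that the whole loop fits in a few lines of plain Python.

```python
import math
import random

random.seed(0)

# Toy data (assumed): two inputs per sample, label 1 when x0 > x1.
data = [([0.9, 0.1], 1.0), ([0.8, 0.3], 1.0), ([0.2, 0.7], 0.0), ([0.1, 0.9], 0.0)]

def forward(w, b, x):
    """Forward-pass: weighted sum plus bias through a logistic activation."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-a))

def mean_cost(w, b):
    """Cost: mean binary cross entropy over the toy data."""
    total = 0.0
    for x, t in data:
        y = forward(w, b, x)
        total -= t * math.log(y) + (1.0 - t) * math.log(1.0 - y)
    return total / len(data)

# 1. Randomly seed weights
w = [random.gauss(0.0, 1.0) for _ in range(2)]
b = random.gauss(0.0, 1.0)
initial_cost = mean_cost(w, b)

learning_rate = 0.5
for step in range(200):
    grad_w, grad_b = [0.0, 0.0], 0.0
    for x, t in data:
        y = forward(w, b, x)       # 2. Forward-pass
        delta = y - t              # 3. Cost gradient w.r.t. the pre-activation
        for i in range(2):         # 4. Backward-pass to each weight
            grad_w[i] += delta * x[i] / len(data)
        grad_b += delta / len(data)
    # 5. Update weights
    w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
    b -= learning_rate * grad_b

final_cost = mean_cost(w, b)
print(initial_cost, final_cost)
```

For a logistic unit with cross-entropy cost, the gradient with respect to the pre-activation simplifies to `y - t`, which is why the cost and backward steps are so short here.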

$W_{i \to j}, b_j \sim \text{Gaussian}(0, 1)$

$W_{j \to k}, b_k \sim \text{Gaussian}(0, 1)$

Output (10x1): [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]
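A forward pass through a 784-100-10 network with Gaussian(0, 1) weights might look like the sketch below. The hidden activation (ReLU), the output activation (softmax), and the random stand-in image are assumptions; the slides only fix the layer sizes and the initializer.

```python
import math
import random

random.seed(0)
n_in, n_hidden, n_out = 784, 100, 10

def gaussian_matrix(rows, cols):
    """Weights drawn from Gaussian(0, 1)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def affine(x, W, b):
    """Pre-activations: one weighted sum plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def softmax(a):
    """Normalize to probabilities; subtracting the max avoids overflow."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    s = sum(exps)
    return [e / s for e in exps]

W_ij = gaussian_matrix(n_hidden, n_in)
b_j = [random.gauss(0.0, 1.0) for _ in range(n_hidden)]
W_jk = gaussian_matrix(n_out, n_hidden)
b_k = [random.gauss(0.0, 1.0) for _ in range(n_out)]

image = [random.random() for _ in range(n_in)]   # stand-in for a 28x28 image
hidden = [max(a, 0.0) for a in affine(image, W_ij, b_j)]
output = softmax(affine(hidden, W_jk, b_k))
print(len(output), sum(output))
```

With untrained weights the ten output values are arbitrary; they only become meaningful probabilities after training.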

Ground Truth: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

Cost function: $c(\text{output}, \text{truth})$


$\Delta W_{i \to j}$, $\Delta W_{j \to k}$

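Compute $\frac{\partial C}{\partial W^*}$ for the cost $C(y, \text{truth})$.

For a weight in the output layer, $W^*_{j \to k}$:

$C(y, \text{truth}) = C\big(g\big(\sum (W_{j \to k} x_k + b_k)\big)\big) = C\big(g(a(W_{j \to k}, x_k))\big)$, where $a(W_{j \to k}, x_k) = \sum (W_{j \to k} x_k + b_k)$

$\frac{\partial C}{\partial W^*} = \frac{\partial C}{\partial g} \cdot \frac{\partial g}{\partial a} \cdot \frac{\partial a}{\partial W^*}$

With $g = \max(a, 0)$, the middle factor is $g'(a)$.

For a weight in the hidden layer, $W^*_{i \to j}$:

$C(y, \text{truth}) = C\big(g_k(a_k(W_{j \to k}, x_k = y_j))\big) = C\big(g_k(a_k(W_{j \to k}, g_j(a_j(W_{i \to j}, x_j))))\big)$

$\frac{\partial C}{\partial W^*} = \frac{\partial C}{\partial g_k} \cdot \frac{\partial g_k}{\partial a_k} \cdot \frac{\partial a_k}{\partial g_j} \cdot \frac{\partial g_j}{\partial a_j} \cdot \frac{\partial a_j}{\partial W^*}$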
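The chain-rule product for a single weight can be checked numerically. The sketch below assumes one unit with a = w*x + b, g = max(a, 0), and a squared-error cost (the cost choice and all numbers are assumptions for illustration), then compares the three-factor product against a finite-difference estimate.

```python
def relu(a):
    """g = max(a, 0)."""
    return max(a, 0.0)

def cost(w, x, b, t):
    """C(g(a(w, x))) with a = w*x + b, g = ReLU, C = squared error (assumed)."""
    return 0.5 * (relu(w * x + b) - t) ** 2

def chain_rule_gradient(w, x, b, t):
    """dC/dw as the product dC/dg * dg/da * da/dw."""
    a = w * x + b
    dC_dg = relu(a) - t              # dC/dg for the squared-error cost
    dg_da = 1.0 if a > 0 else 0.0    # g'(a) for g = max(a, 0)
    da_dw = x                        # da/dw for a = w*x + b
    return dC_dg * dg_da * da_dw

w, x, b, t = 0.8, 1.5, 0.1, 2.0      # made-up values
analytic = chain_rule_gradient(w, x, b, t)

eps = 1e-6
numeric = (cost(w + eps, x, b, t) - cost(w - eps, x, b, t)) / (2 * eps)
print(analytic, numeric)
```

Agreement between the two values is a common sanity check when implementing a backward pass by hand.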

$J(w^{(0)}) = \sum_i \text{cost}(w^{(0)}, x_i)$

Gradient at $w^{(0)}$: $\frac{dJ(w^{(0)})}{dw}$

$w^{(1)} = w^{(0)} - \alpha \frac{dJ(w^{(0)})}{dw}$, where $\alpha$ is the learning rate

[Figure: the step from $w^{(0)}$ to $w^{(1)}$ on the cost curve when the learning rate is too small, too large, and good enough]

$J(w^{(1)}) = \sum_i \text{cost}(w^{(1)}, x_i)$

$w^{(2)} = w^{(1)} - \alpha \frac{dJ(w^{(1)})}{dw}$

$J(w^{(2)}) = \sum_i \text{cost}(w^{(2)}, x_i)$

$w^{(3)} = w^{(2)} - \alpha \frac{dJ(w^{(2)})}{dw}$

$J(w^{(3)}) = \sum_i \text{cost}(w^{(3)}, x_i)$

$w^{(4)} = w^{(3)} - \alpha \frac{dJ(w^{(3)})}{dw}$
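The effect of the learning rate can be seen on a one-dimensional example. The sketch below assumes a made-up cost J(w) = (w - 3)^2 and three illustrative learning rates; none of these numbers come from the slides.

```python
def gradient_descent(w0, learning_rate, steps=20):
    """Minimize the assumed cost J(w) = (w - 3)^2, whose gradient is 2*(w - 3)."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w = w - learning_rate * grad
    return w

w_small = gradient_descent(0.0, 0.001)   # too small: barely moves from w0
w_large = gradient_descent(0.0, 1.1)     # too large: overshoots and diverges
w_good = gradient_descent(0.0, 0.3)      # good enough: approaches the minimum at 3
print(w_small, w_large, w_good)
```

On this cost every step scales the distance to the minimum by (1 - 2 * learning_rate), so a rate above 1.0 makes that factor larger than 1 in magnitude and the iterates move away from the minimum.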

fprop → cost → bprop → $\delta W$

Update weights via:

$\Delta W = \alpha \cdot \frac{1}{N} \sum \delta W$, where $\alpha$ is the learning rate

minibatch #1 weight update

minibatch #2 weight update

Epoch 0

Epoch 1

Sample numbers:
• Learning rate ~0.001
• Batch sizes of 32-128
• 50-90 epochs

[Figure panels: SGD, Gradient Descent]
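Minibatches and epochs can be illustrated with a one-weight model. The sketch below assumes made-up data with targets t = 2x and a squared-error cost; the learning rate and batch size are picked for this toy problem and are not the sample numbers quoted above.

```python
# Toy data (assumed): targets follow t = 2 * x, so the ideal weight is 2.
xs = [i / 10.0 for i in range(1, 9)]
ts = [2.0 * x for x in xs]

learning_rate = 1.0   # chosen for this toy problem, not a general recommendation
batch_size = 4
num_epochs = 50

w = 0.0
num_updates = 0
for epoch in range(num_epochs):
    # One epoch visits every minibatch once.
    for start in range(0, len(xs), batch_size):
        batch = list(zip(xs[start:start + batch_size], ts[start:start + batch_size]))
        # fprop + cost + bprop: per-sample gradient of 0.5 * (w*x - t)^2 w.r.t. w
        deltas = [(w * x - t) * x for x, t in batch]
        # Update: step against the gradient averaged over the minibatch
        w -= learning_rate * sum(deltas) / len(batch)
        num_updates += 1

print(w, num_updates)
```

With 8 samples and a batch size of 4 there are two weight updates per epoch, so 50 epochs give 100 updates. Real training usually also shuffles the data between epochs, which is omitted here to keep the run deterministic.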

Krizhevsky, 2012: 60 million parameters

Taigman, 2014: 120 million parameters

• Model ingredients in-depth

• Deep learning with neon

Dataset, Model/Layers, Activation, Optimizer, Cost

𝐶(𝑦, 𝑡)

Filter + Non-Linearity

Pooling

Fully connected layers

“how can I help you?”

Low level features

Mid level features

Object parts, phonemes

Objects, words

*Hinton et al., LeCun, Zeiler, Fergus

Pooling

Tanh, Rectified Linear Unit, Logistic

Softmax: $g(a_i) = \frac{e^{a_i}}{\sum_k e^{a_k}}$
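A minimal softmax sketch in plain Python follows. Subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result; it is an addition here, not something shown on the slide.

```python
import math

def softmax(a):
    """g(a_i) = exp(a_i) / sum_k exp(a_k); subtracting max(a) avoids overflow."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])   # made-up pre-activations
print(probs)
```

The outputs are positive and sum to 1, which is what lets each output unit be read as a class probability.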

Gaussian: Gaussian(mean, sd)

GlorotUniform: Uniform(-k, k), with $k = \sqrt{\frac{6}{d_{in} + d_{out}}}$

Xavier: Uniform(-k, k), with $k = \sqrt{\frac{3}{d_{in}}}$

Kaiming: Gaussian(0, sigma), with $\sigma = \sqrt{\frac{2}{d_{in}}}$
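The initializer scales can be sketched as follows, assuming the square-root forms k = sqrt(6 / (d_in + d_out)) for GlorotUniform, k = sqrt(3 / d_in) for Xavier, and sigma = sqrt(2 / d_in) for Kaiming. The function names are illustrative and are not neon's API.

```python
import math
import random

random.seed(0)

def glorot_k(d_in, d_out):
    """Half-width of the GlorotUniform range (assumed form)."""
    return math.sqrt(6.0 / (d_in + d_out))

def xavier_k(d_in):
    """Half-width of the Xavier range (assumed form)."""
    return math.sqrt(3.0 / d_in)

def kaiming_sigma(d_in):
    """Standard deviation of the Kaiming Gaussian (assumed form)."""
    return math.sqrt(2.0 / d_in)

def uniform_weights(d_in, d_out, k):
    """d_out x d_in matrix drawn from Uniform(-k, k)."""
    return [[random.uniform(-k, k) for _ in range(d_in)] for _ in range(d_out)]

def gaussian_weights(d_in, d_out, sigma):
    """d_out x d_in matrix drawn from Gaussian(0, sigma)."""
    return [[random.gauss(0.0, sigma) for _ in range(d_in)] for _ in range(d_out)]

d_in, d_out = 784, 100
k = glorot_k(d_in, d_out)
W_glorot = uniform_weights(d_in, d_out, k)
W_kaiming = gaussian_weights(d_in, d_out, kaiming_sigma(d_in))
print(k, xavier_k(d_in), kaiming_sigma(d_in))
```

All three scales shrink as the number of inputs grows, which keeps the weighted sums at a similar magnitude regardless of layer width.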

• Cross Entropy Loss

• Misclassification Rate

• Mean Squared Error

• L1 loss

Output (10x1): [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]

Ground Truth: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

$-\sum_k t_k \times \log(y_k) = -\log(0.3)$
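The cross-entropy value for this output and ground truth can be reproduced directly. The sketch uses the natural logarithm and skips terms where the target is zero, so that log(0) is never evaluated; the function name is an assumption.

```python
import math

output = [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]
truth = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

def cross_entropy(y, t):
    """-sum_k t_k * log(y_k); terms with t_k == 0 are skipped to avoid log(0)."""
    return -sum(tk * math.log(yk) for yk, tk in zip(y, t) if tk != 0)

loss = cross_entropy(output, truth)
print(loss)   # equals -log(0.3)
```

Because the ground truth is one-hot, only the probability assigned to the correct digit contributes to the loss.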

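[Table columns: Outputs, Targets, Correct? (Y, Y)]

Outputs (first example):
0.3 0.3 0.4
0.3 0.4 0.3
0.1 0.2 0.7
Cross entropy: -(log(0.4) + log(0.4) + log(0.1))/3 = 1.38

Outputs (second example):
0.1 0.2 0.7
0.1 0.7 0.2
0.3 0.4 0.3
Cross entropy: -(log(0.7) + log(0.7) + log(0.3))/3 = 0.64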
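The two cross-entropy values above can be recomputed, and compared with the misclassification rate, under one assumption: the target classes are not legible in the source, so the sketch infers them from the log terms as classes 3, 2, and 1 for the three rows (0-indexed as 2, 1, 0). Treat those targets as hypothetical.

```python
import math

# Assumed targets (0-indexed class per row), inferred from the log terms above.
targets = [2, 1, 0]
outputs_first = [[0.3, 0.3, 0.4], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
outputs_second = [[0.1, 0.2, 0.7], [0.1, 0.7, 0.2], [0.3, 0.4, 0.3]]

def mean_cross_entropy(outputs, targets):
    """Average of -log(probability assigned to the true class)."""
    return -sum(math.log(row[t]) for row, t in zip(outputs, targets)) / len(targets)

def misclassification_rate(outputs, targets):
    """Fraction of rows whose largest output is not the true class."""
    wrong = sum(1 for row, t in zip(outputs, targets) if row.index(max(row)) != t)
    return wrong / len(targets)

ce_first = mean_cross_entropy(outputs_first, targets)
ce_second = mean_cross_entropy(outputs_second, targets)
rate_first = misclassification_rate(outputs_first, targets)
rate_second = misclassification_rate(outputs_second, targets)
print(round(ce_first, 2), round(ce_second, 2), rate_first, rate_second)
```

Under these assumed targets both examples misclassify one row in three, yet the cross entropy differs, which suggests why cross entropy is the more informative training signal of the two.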

• SGD with Momentum

• RMS propagation

• Adagrad

• Adadelta

• Adam

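[Figure: updates $\Delta W_1, \Delta W_2, \Delta W_3, \Delta W_4$ along the training-time axis]

$\alpha'_{t=T} = \frac{\alpha}{\sqrt{\sum_{t=0}^{T} \Delta W_t^2}}$

$\alpha'_{t=4} = \frac{\alpha}{\sqrt{\Delta W_2^2 + \Delta W_3^2 + \Delta W_4^2}}$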
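The two learning-rate scalings above can be sketched as follows, reading the first as dividing by the root of all squared updates so far and the second as dividing by the root of only the most recent squared updates. That reading, the function names, and the sample update values are assumptions.

```python
import math

def rate_full_history(alpha, updates):
    """alpha / sqrt(sum of all squared updates so far)."""
    return alpha / math.sqrt(sum(dw * dw for dw in updates))

def rate_recent_window(alpha, updates, window=3):
    """alpha / sqrt(sum of the last `window` squared updates)."""
    return alpha / math.sqrt(sum(dw * dw for dw in updates[-window:]))

alpha = 0.1
updates = [1.0, 2.0, 3.0, 4.0]   # Delta W_1 .. Delta W_4 (made-up values)

full = rate_full_history(alpha, updates)      # 0.1 / sqrt(1 + 4 + 9 + 16)
recent = rate_recent_window(alpha, updates)   # 0.1 / sqrt(4 + 9 + 16)
print(full, recent)
```

The full-history version can only shrink the effective rate over time, while the windowed version forgets old updates and can recover.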

• Popular, well established, developer familiarity
• Fast to prototype
• Rich ecosystem of existing packages
• Data Science: pandas, pycuda, ipython, matplotlib, h5py, …
• Good “glue” language: scriptable plus functional and OO support, plays well with other languages

Backend: NervanaGPU, NervanaCPU

Datasets: MNIST, CIFAR-10, Imagenet 1K, PASCAL VOC, Mini-Places2, IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter-prize, UCF101, flickr8k, flickr30k, COCO

Initializers: Constant, Uniform, Gaussian, Glorot Uniform, Xavier, Kaiming, IdentityInit, Orthonormal

Optimizers: Gradient Descent with Momentum, RMSProp, AdaDelta, Adam, Adagrad, MultiOptimizer

Activations: Rectified Linear, Softmax, Tanh, Logistic, Identity, ExpLin

Layers: Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short-Term Memory, Gated Recurrent Unit, BatchNorm, LookupTable, Local Response Normalization, Bidirectional-RNN, Bidirectional-LSTM

Costs: Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error

Metrics: Misclassification (Top1, TopK), LogLoss, Accuracy, PrecisionRecall, ObjectDetection

1. Generate backend
2. Load data
3. Specify model architecture
4. Define training parameters
5. Train model
6. Evaluate

NERVANA

andres.rodriguez@intel.com
