Transcript of "Deep Learning for Incipient Slip Detection"

Page 1:

Deep Learning for Incipient Slip Detection

Robert Haschke

Center of Excellence Cognitive Interaction Technology (CITEC)

Page 2:

Overview

● Success Stories of Deep Learning

● Motivation for Deep Architectures

● Ingredients of Deep Learning

Page 3:

Success Stories of Deep Learning

Pages 4–7:

Success Stories of Deep Learning

● Vision (ImageNet competition)

– 1.3 million images, 1000 classes

– top 5 error of ~5% (matches human performance)

● Natural Language Processing (Siri, ...)

● Word Embeddings

● Text Processing

– Automatic Translation

Page 8:

ImageNet Examples

Pages 9–10:

Word Embeddings for Language Processing

● represent words by vectors

● learned from word co-occurrence in large text corpora

● semantics encoded in the (linear) topology of the space
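As a toy illustration of this linear structure (the vectors below are made up for the example, not taken from a trained model), analogies such as king - man + woman ≈ queen can be answered by simple vector arithmetic and cosine similarity:

```python
import numpy as np

# Toy sketch: semantic analogies as vector arithmetic in an embedding space.
# The embeddings are illustrative, not from a trained model.
emb = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "queen": np.array([0.8, 0.1, 0.1]),
    "man":   np.array([0.1, 0.9, 0.0]),
    "woman": np.array([0.1, 0.1, 0.0]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

query = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in ("king", "man", "woman")),
           key=lambda w: cosine(emb[w], query))
print(best)  # -> "queen"
```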

Page 11:

Fusing Vision and Speech

● instead of softmax layer, feed output to RNN

● RNN trained on human descriptions of images

Vinyals et al. 2014 Show and Tell: A Neural Image Caption Generator

Page 12:

Limitations of Neural Networks

● confidence >99.6%

● generated with Genetic Algorithms

Nguyen et al. 2014 Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

Page 13:

Deep Learning History

● 1958 Perceptron (Rosenblatt)

● 1980 Neocognitron (Fukushima)

● 1982 Hopfield network, SOM (Kohonen)

● 1985 Boltzmann machines (Ackley et al.)

● 1986 MLP + backpropagation (Rumelhart)

● 1988 RBF networks (Broomhead + Lowe)

● 1989 Autoencoders (Baldi + Hornik)

● 1989 Convolutional Network (LeCun)

● 1993 Sparse Coding (Field)

● 2000s Sparse, Probabilistic, and Layer-wise models (Hinton, Bengio, Ng)

● 2012 DL clearly won ImageNet competition (Krizhevsky et al.)

[Figure: Rosenblatt’s Perceptron]

Page 14:

Why Now?

● Big Data

– ImageNet et al.: millions of labeled images (crowd-sourced)

● Computing Power – GPUs

– terabytes/s memory bandwidth

– teraflops compute

● Improved Methods

– efficient + numerically robust learning frameworks

– new optimization methods

Page 15:

How are these amazing results achieved?

Page 16:

Neural Networks

● simple units layered in a network structure

● weighted sum of inputs: a = Σᵢ wᵢ xᵢ + b

● nonlinear activation: y = σ(a)

Page 17:

Neural Network Learning

● learning by backpropagation of errors

● layered structure + chain rule = backpropagation
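A minimal NumPy sketch of this idea (a toy two-layer network on synthetic data; the sizes, the tanh nonlinearity, and the squared-error loss are assumptions for illustration, not the slides' example):

```python
import numpy as np

# Toy two-layer MLP trained by backpropagation: the layered structure plus
# the chain rule yields the weight gradients layer by layer.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 samples, 3 inputs
t = np.sin(X.sum(axis=1, keepdims=True))      # toy regression targets

W1, b1 = rng.normal(scale=0.1, size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros(1)
eta = 0.1

for epoch in range(200):
    # forward pass: weighted sums + nonlinear activation
    a1 = X @ W1 + b1
    h1 = np.tanh(a1)
    y = h1 @ W2 + b2                          # linear output layer
    err = y - t                               # dE/dy for squared error

    # backward pass: chain rule, layer by layer
    dW2 = h1.T @ err / len(X)
    db2 = err.mean(axis=0)
    dh1 = err @ W2.T
    da1 = dh1 * (1 - h1**2)                   # tanh'(a) = 1 - tanh(a)^2
    dW1 = X.T @ da1 / len(X)
    db1 = da1.mean(axis=0)

    # gradient descent step
    W2 -= eta * dW2; b2 -= eta * db2
    W1 -= eta * dW1; b1 -= eta * db1
```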

Pages 18–21:

Distributed Representation

● prototype-based representation needs many examples

● composition of features is exponentially more efficient

[Figure: prototype-based learning vs. perceptron half-spaces]

Consider a network whose hidden units represent the features:

● person is male / female

● person is young / old

● person wears glasses

● person has beard

Given n features, each requiring O(k) parameters, we need O(nk) examples. Prototype-based methods would require O(k^n) examples.

Page 22:

Distributed Representation

● prior assumption: compositionality is useful to describe the real world

● exploit underlying structure of the world

Page 23:

Backpropagation Doesn't Scale to Deep Nets

Deep nets perform worse than shallow nets when trained with randomly-initialized backpropagation.

Bengio et al., NIPS 2007

                                       training   validation   test
shallow net, random initialization      0.004%      1.8%       1.9%
deep net, random initialization         0.004%      2.1%       2.4%
deep net, unsupervised pre-training     0%          1.4%       1.4%

Pages 24–26:

Why going deep?

● one hidden layer of

– neurons

– RBF units

– logic units

is a universal approximator

● stacking multiple hidden layers is more efficient than a single one (Montufar et al., NIPS 2014)

● hierarchy allows for more complex features

Page 27:

Recognizing numbers (Google Street View)

[graph credit Goodfellow, 2014]

Page 28:

Deep models make better use of more parameters

[graph credit Goodfellow, 2014]

Page 29:

Increase of Depth in ImageNet Classification

[graph credit K. He]

[Figure legend: dog, car, horse, bike, cat, bottle, person]

Page 30:

Hierarchy of ML

● Neural Nets learn features

● Deep Learning learns a hierarchy of features

[Fig. from I. Goodfellow]

Page 31:

Issues with Backpropagation

● vanishing gradient: the gradient is diluted from layer to layer due to repeated multiplication with activation-function derivatives (and weights)

● learning gets stuck, especially if started far from good regions (random initialization)

● huge number of parameters (connection weights)

Page 32:

Ingredients for Successful Deep Learning

Pages 33–38:

Ingredients for Successful Deep Learning

● powerful priors to reduce number of parameters

– deep hierarchies

– Convolutional Networks

● layer-wise training

● boosting gradient descent

● computing power

– simple non-linearity

– highly-parallel processing (GPU)

● Big Data

Pages 39–41:

Convolutional Networks

● features in natural images are translation-invariant: features useful in one region are useful anywhere else

● motivates use of filter-bank of convolutions

– small filter-kernel

– re-use filter-kernel (weight sharing)

– dramatic reduction of weights

● pooling: aggregate (similar) results over an image region

– reduce dimensionality of representation

– operations: mean, max, median, …

– overlapping or non-overlapping (stride vs. window size)

[Figure: max pooling vs. average pooling, e.g. 2x2 pooling with stride 2 and 10x10 pooling with stride 10]
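A minimal NumPy sketch of the pooling operation listed above (illustrative only; it assumes non-overlapping windows, i.e. stride equal to the window size):

```python
import numpy as np

def pool2d(x, k, op=np.max):
    """Non-overlapping k x k pooling (stride = k) of a 2D feature map."""
    h, w = x.shape
    patches = x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k)
    return op(patches, axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(fmap, 2, np.max))    # 2x2 max pooling, stride 2
print(pool2d(fmap, 2, np.mean))   # 2x2 average pooling, stride 2
```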

Page 42:

Convolution

input: 3x32x32, filter: 3x5x5, bias: 1, output (feature map): 1x28x28

convolve (slide) the filter over all spatial locations

Convolving the filter with the input gives a feature map.

[figure adapted from A. Karpathy]

Pages 43–44:

Convolution

A convolution layer computes multiple feature maps.

input: 3x32x32, filters: 6x3x5x5, biases: 6, output: 6x28x28 (six feature maps)

Convolving each filter with the input gives one feature map.

filter parameters: 6 · 3 · 5² = 450
fully-connected parameters: 3 · 32² · 6 · 28² ≈ 14M

[figure adapted from A. Karpathy]
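A hedged NumPy sketch that reproduces the shapes and parameter counts quoted above (naive convolution with stride 1 and no padding; the channels-first layout and variable names are assumptions):

```python
import numpy as np

def conv2d_naive(x, filters, biases):
    """Slide each filter over all spatial locations of a C_in x H x W input."""
    c_in, h, w = x.shape
    c_out, _, k, _ = filters.shape
    out = np.empty((c_out, h - k + 1, w - k + 1))
    for o in range(c_out):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[o, i, j] = np.sum(filters[o] * x[:, i:i+k, j:j+k]) + biases[o]
    return out

x = np.random.randn(3, 32, 32)            # 3x32x32 input image
filters = np.random.randn(6, 3, 5, 5)     # six 3x5x5 filters (shared weights)
biases = np.zeros(6)

y = conv2d_naive(x, filters, biases)
print(y.shape)                            # (6, 28, 28) feature maps
print(filters.size)                       # 450 filter weights
print(3 * 32**2 * 6 * 28**2)              # ~14.5M weights for a fully-connected layer
```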

Page 45:

Convolution Filters provide Rich Feature Maps

● 1st layer filters learned by AlexNet (ILSVRC‘12)

– 96 filters of size 11x11x3

– filters for oriented + colored edges

– resemble Gabor filters

Page 46:

Convolution Filters provide Rich Feature Maps

● Filters learned by Zeiler+Fergus (ILSVRC‘13)

● deeper layers exhibit more complex features

Page 47:

Convolutional Networks: Ingredients

● exploit spatial structure in input

● Normalization: average removal, variance normalization

● Filter bank: projection on overcomplete feature basis

● Non-Linearity: sparsification, saturation, lateral inhibition

● Pooling: aggregation over space or feature type

● deep convolutional networks: stack convolutional layers

[Diagram: norm → filter bank → non-linearity → pooling]

Page 48:

Convnet Computation: 2012 & 2014

AlexNet (2012), per-layer parameters and FLOPs:

layer                        params    FLOPs
FC 1000                      4M        4M
FC 4096 / ReLU               16M       16M
FC 4096 / ReLU               37M       37M
Max Pool 3x3s2               -         -
Conv 3x3s1, 256 / ReLU       442K      74M
Conv 3x3s1, 384 / ReLU       1.3M      112M
Conv 3x3s1, 384 / ReLU       884K      149M
Max Pool 3x3s2               -         -
Local Response Norm          -         -
Conv 5x5s1, 256 / ReLU       307K      223M
Max Pool 3x3s2               -         -
Local Response Norm          -         -
Conv 11x11s4, 96 / ReLU      35K       105M

AlexNet (ILSVRC12)

● 3x227x227 input image

● 60M parameters

● 725 MFLOPS

● < 1ms / image on Titan X

GoogLeNet (ILSVRC14)

● 1.4 GFLOPs (200%)

● 5M parameters (10%)

● 14% more accurate

Architecture matters! Computational primitives are the same.

Page 49:

GoogLeNet (2014)

● composition of multi-scale dimension-reduced "Inception" modules

● no FC layers

● only 5 million parameters

Page 50:

1x1 Convolution

each filter has size 64x1x1 and does a 64-dim dot product

64x1x1 convolution with 32 filters

[figure credit A. Karpathy]

● compute pixel-specific combination of layer activities

● reduce channel dimension

● stack with non-linearity for deeper net

● found in many of the latest nets
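A hedged NumPy sketch (the channels-first layout and the 56x56 spatial size are assumptions) showing that a 1x1 convolution is a per-pixel 64-dim dot product that mixes and reduces the channel dimension:

```python
import numpy as np

x = np.random.randn(64, 56, 56)       # 64 input feature maps
w = np.random.randn(32, 64)           # 32 filters of size 64x1x1
b = np.zeros(32)

# reshape to (pixels, channels), apply the 64-dim dot product per pixel, reshape back
y = (x.reshape(64, -1).T @ w.T + b).T.reshape(32, 56, 56)
print(y.shape)                         # (32, 56, 56): channel dimension reduced 64 -> 32
```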

Page 51:

Ingredients for Successful Deep Learning

● powerful priors to reduce number of parameters

– deep hierarchies

– Convolutional Networks

● layer-wise training

● boosting gradient descent

● computing power

– simple non-linearity

– highly-parallel processing (GPU)

● Big Data

Pages 52–55:

Layer-wise training: AutoEncoder

● defeat vanishing gradient problem

● train network layer-wise using classical auto-encoder

● network trained to predict input: ŷ(x) ≈ x

● trivial solution unless:

– constrain #hidden units

– constrain sparsity of hidden units

● drop output layer; consider hidden layer as new, dimension-reduced representation of input

[Figure: inputs x1 x2 x3 → hidden units a1 a2 → reconstruction ŷ1 ŷ2 ŷ3]

Pages 56–61:

Layer-wise training: AutoEncoder

● repeat procedure for next layer

● predict hidden layer activity: ŷ(x) ≈ a

[Figure: the stack grows layer by layer: x1 x2 x3 → a1 a2 → b1 b2 → c1 c2 → d1 d2; each new layer is trained to reconstruct the previous hidden activity (ŷ1 ŷ2)]

Page 62:

Layer-wise training: AutoEncoder

● defeat vanishing gradient problem

● train network layer-wise using classical auto-encoder

● final supervised training to task

[Figure: full stack x1 x2 x3 → a1 a2 → b1 b2 → c1 c2 → d1 d2 → outputs ŷ1 ŷ2 ŷ3]
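A hedged toy sketch of the procedure on pages 52–62 (tied-weight autoencoders, sigmoid units, and the layer sizes are assumptions; this is an illustration, not the slides' code):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(H, n_hidden, epochs=50, eta=0.5):
    """Train W, b so that sigmoid(sigmoid(H W + b) W^T + c) reconstructs H."""
    n_in = H.shape[1]
    W = rng.normal(scale=0.1, size=(n_in, n_hidden))
    b, c = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        A = sigmoid(H @ W + b)             # encode
        R = sigmoid(A @ W.T + c)           # decode (tied weights)
        dR = (R - H) * R * (1 - R)         # squared-error gradient at the decoder
        dA = (dR @ W) * A * (1 - A)
        W -= eta * (H.T @ dA + (A.T @ dR).T) / len(H)
        b -= eta * dA.mean(axis=0)
        c -= eta * dR.mean(axis=0)
    return W, b

X = rng.random((200, 20))                  # toy input data
layers, H = [], X
for n_hidden in (10, 5, 3):                # stack of autoencoder layers
    W, b = train_autoencoder(H, n_hidden)
    layers.append((W, b))
    H = sigmoid(H @ W + b)                 # drop decoder, keep hidden code as new input

# final supervised readout on top of the frozen stack (here: linear least squares)
t = X.sum(axis=1, keepdims=True)           # toy targets
W_out = np.linalg.lstsq(np.hstack([H, np.ones((len(H), 1))]), t, rcond=None)[0]
```

Each autoencoder is trained on the codes produced by the previous one; only the final readout is trained supervised here, whereas in practice the whole stack is usually fine-tuned by backpropagation afterwards.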

Pages 63–66:

Denoising AutoEncoder

● stochastically corrupt input

● task: reconstruct original input

– random dropout with probability p

– Gaussian white noise

● learns vector field pointing towards the data distribution manifold

● better generalization
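The corruption formulas themselves are missing from the transcript; a standard reconstruction (assumed symbols p, σ) would be:

```latex
\tilde{x}_i = \begin{cases} 0 & \text{with probability } p \\ x_i & \text{otherwise,} \end{cases}
\qquad\text{or}\qquad
\tilde{x} = x + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I).
```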

Page 67:

Ingredients for Successful Deep Learning

● powerful priors to reduce number of parameters

– deep hierarchies

– Convolutional Networks

● layer-wise training

● boosting gradient descent

● computing power

– simple non-linearity

– highly-parallel processing (GPU)

● Big Data

Page 68:

Boosting Gradient Descent

● Batching

● Momentum

● Learning Rate adaptation

Pages 69–71:

Gradient Descent

● batch gradient:

– slow (full sweep over data required)

– accurate

● stochastic gradient:

– fast progress

– fluctuates near minima / saddles

– can escape from local minima
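The update equations are not in the transcript; in standard notation (an assumption: learning rate η, per-sample error E_n, N samples):

```latex
\text{batch:}\quad \Delta w = -\eta \sum_{n=1}^{N} \nabla_w E_n(w),
\qquad\qquad
\text{stochastic:}\quad \Delta w = -\eta\, \nabla_w E_n(w)\ \text{ for a single, randomly chosen } n.
```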

Page 72:

Mini-Batching

● combine the best of both worlds: average over small batch sizes

● fast convergence + reduced fluctuations

● assumes homogeneous batches (e.g. randomly drawn)

● efficient on GPUs: parallel processing of several samples simultaneously

● reshuffle batches between epochs!
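A hedged sketch of such a mini-batch loop (the gradient function grad and all hyperparameters are illustrative placeholders, not from the slides):

```python
import numpy as np

def sgd(w, X, t, grad, eta=0.01, batch_size=32, epochs=10,
        rng=np.random.default_rng(0)):
    """Mini-batch SGD: grad(w, Xb, tb) is assumed to return the batch-averaged gradient."""
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)              # reshuffle between epochs!
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w -= eta * grad(w, X[idx], t[idx])  # step on the mini-batch gradient
    return w
```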

Pages 73–78:

Classical Momentum

● gradient oscillates when navigating ravines

● add discounted average gradient

● speed-up by a constant factor (see the formulas below)
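The slide's formulas are missing here; in a standard formulation (assumed symbols: momentum coefficient μ, learning rate η), the update accumulates a discounted average of past gradients, and along a consistent gradient direction the terminal step is larger by the factor 1/(1-μ):

```latex
v_t = \mu\, v_{t-1} - \eta\, \nabla_w E(w_t), \qquad
w_{t+1} = w_t + v_t, \qquad
\|v_\infty\| = \frac{\eta\, \|\nabla_w E\|}{1-\mu}.
```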

Pages 79–80:

Nesterov-Momentum

● invert order of momentum & gradient computation

● first jump to new location (due to momentum)

● and then compute corrective gradient

● It‘s better to correct a mistake after you have made it.
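In the same assumed notation, Nesterov momentum evaluates the corrective gradient after the momentum jump:

```latex
v_t = \mu\, v_{t-1} - \eta\, \nabla_w E\big(w_t + \mu\, v_{t-1}\big), \qquad
w_{t+1} = w_t + v_t.
```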

Page 81:

Learning Rate Adaptation

● gradient defines direction

● optimal step size depends on curvature

● adapt the learning rate accordingly

Figs. from DL-tutorial @ NIPS2015

Pages 82–87:

Resilient Backpropagation (RPROP)

● use individual learning rates

● monitor direction (sign) of gradient

– same sign: increase learning rate

– sign change: decrease rate

● use the learning rate directly as the step size (only the sign of the gradient is used)

● tends to overfit

Riedmiller, Braun ICANN 1993
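A hedged sketch of the sign-based update (a simplified RPROP variant without weight backtracking; the constants are commonly used defaults, not necessarily those of the original paper or the slide):

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One RPROP step with per-weight step sizes adapted from the gradient sign."""
    same_sign = grad * prev_grad > 0
    sign_flip = grad * prev_grad < 0
    step = np.where(same_sign, np.minimum(step * eta_plus, step_max), step)   # same sign: grow step
    step = np.where(sign_flip, np.maximum(step * eta_minus, step_min), step)  # sign change: shrink step
    w = w - np.sign(grad) * step          # only the sign of the gradient is used
    return w, step, grad
```

step and prev_grad are carried across iterations, e.g. initialized to 0.1 and zeros respectively.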

Page 88:

ADAGRAD

● automatically tune down learning rate based on learning history

● denominator grows with past update steps

● effective learning rate tends to zero

➔ learning stagnates
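The slide's formula is missing; the standard ADAGRAD update (assumed symbols: gradient g_t, learning rate η, small ε) reads:

```latex
w_{t+1} = w_t - \frac{\eta}{\sqrt{\sum_{\tau=1}^{t} g_\tau^2} + \epsilon}\; g_t .
```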

Pages 89–91:

ADADELTA

● average gradient updates across a finite window using a sliding average (see the formulas below)

● correct units: the numerator is an average of past weight updates
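The formulas are missing from the transcript; the standard ADADELTA recursions (Zeiler 2012), which match the description above, are:

```latex
E[g^2]_t = \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2, \qquad
\Delta w_t = -\frac{\sqrt{E[\Delta w^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\; g_t, \qquad
E[\Delta w^2]_t = \rho\, E[\Delta w^2]_{t-1} + (1-\rho)\, \Delta w_t^2 .
```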

Pages 92–94:

Adaptive Moment Estimation (ADAM)

● integrate momentum: sliding averages of the 1st and 2nd moments of the gradient

● biased towards zero (due to initialization), hence bias correction (see the formulas below)
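The formulas are missing from the transcript; the standard ADAM updates (Kingma and Ba) with bias-corrected moment estimates are:

```latex
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,
\qquad
\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \quad
\hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad
w_{t+1} = w_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\; \hat{m}_t .
```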

Page 95:

Comparison of Optimizers

[Fig. from Sebastian Ruder]

Page 96:

Ingredients for Successful Deep Learning

● powerful priors to reduce number of parameters

– deep hierarchies

– Convolutional Networks

● layer-wise training

● boosting gradient descent

● computing power

– simple non-linearity

– highly-parallel processing (GPU)

● Big Data

Page 97:

Rectified Linear Unit

● Softplus: f(x) = log(1 + exp(x))

● ReLU: f(x) = max(0, x)

● suitable to model real numbers

● max induces sparsity in hidden units

● no vanishing gradient

Page 98:

Highly Parallel Processing with GPUs

[Chart: Peak GFlops over 2003–2008, NVIDIA GPU (US$ 250) vs. Intel CPU]

Page 99:

Ingredients for Successful Deep Learning

● powerful priors to reduce number of parameters

– deep hierarchies

– Convolutional Networks

● layer-wise training

● boosting gradient descent

● computing power

– simple non-linearity

– highly-parallel processing (GPU)

● Big Data

Page 100:

Big Data

● many model parameters (weights) require many training examples to avoid overfitting

● ImageNet: 1.3 million images

● unsupervised pre-training possible