Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau -...

74
“Demystifying” Deep Learning Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17

Transcript of Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau -...

Page 1: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

“Demystifying” Deep LearningValentin Leveau and Titouan Lorieul

Cocomeet 15/12/17

Page 2: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Summary1. Introduction2. From shallow neural networks to (very) deep ones3. Why deep CNN get popularized so late?4. Theoretical focus: learning as an optimization problem5. Practical limitations of neural networks

Page 3: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

We now are good at mimicking some part of intelligence : Learning

(Machine) Learning = Learning from examples to do a given task and

generalize to new examples.

Predict some variables given others.

Capture statistical relationships / structure

between observed variables.

Introduction: DL vs ML

AIMachine Learning

Deep Learning

Page 4: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Machine Learning : Supervised Learning It is all about learning a prediction function parametrized by some

parameters

A simple function could be a linear model:

Goal is to learn the parameters so that we predict good values of yi given xi

Page 5: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

An image can be modeled by its pixel representation:

Very very high dimensional feature space

Not relevant features

Classification in high dimensional spaces

See is why a machine sees !!!

Page 6: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Find Non Linear Invariant in the data (S. Mallat)

Page 7: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Find a mapping such that the problem becomes linearly separable

We want to produce an abstraction of the image containing highly

discriminative features→ Representation Learning

Find Non Linear Invariant in the data (S. Mallat)

Page 8: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Produce handcrafted intermediate representation of images Image abstraction that contain relevant information

Problem: Decide manually which kind of features are good for the different tasks

Solution: Let the system learn the change of variable

→ Toward Deep Representation Learning

Mid-LevelRepresentation

(BoW, Fisher Vectors,...)

Low-LevelRepresentation(SIFT, GIST,...)

Classification

Label

Query Image Intermediate Representation pipeline

Several strategies to go non linear

Page 9: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

11/09/2016Valentin Leveau - Kernelizing spatially consistent visual matches for FGC - 9

• Learning several levels of abstraction of the input signal (compositionality).• Progressive Embedding from input pixels to the label space.• Jointly learning all the trainable modules.

Low-level feature

extractionClassification

Image

Intermediate Representation pipeline Local

pooling

(sum, max, avg, etc.)

High-level feature

extraction

Local pooling

(sum, max, avg, etc.)

Deep (Convolutional) Neural Network

Page 10: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

From Shallow Neural Networks to (very) Deep Ones

Page 11: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Perceptron : Simple Elementary Neural Unit

Page 12: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Multi Layer Perceptron 2 layers architecture :

Layer 1: several units in parallel + non-linearities

Layer 2 : final linear classifier unit

W1 W2 W3 W4 W5

input data Deep non linear embedding Classification layer

Page 13: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Multi Layer Perceptron• The nonlinearities are crucial !!!

• Linear combination of linear combinations = Linear combination ”depth is useless)

• Nonlinearities allow to bend the space to get samples linearly separable ”click the curved space)

Page 14: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Optimization strategy : Supervised Learning Define an objective function to minimize with respect to the parameters

Example: least square minimization over training samples

Page 15: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Optimization strategy : Gradient Descent (GD) Update parameters of an objective function in

the direction of the gradient

Several GD strategies: a) All the training samples : Batch GD (exact gradient of J) b) Only one training sample : Stochastic GD c) A subsample of the training set: Mini-batch GD (reasonable approximation of a)

E

Page 16: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

t

Page 17: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Gradient descent through deep feedforward models Gradient can be hard to compute A lot of parameters with interdependencies from layer to layer Solution: Deep model is a composition of function

Page 18: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Backpropagation of the gradient

We move co linearly in the direction of input data

Page 19: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Convolutional Neural Network

W1 W2 W3 W4 W5

Such deep non linear mapping still suffer from lack of efficiency to learn invariance We still try to map pixel-wise vectorial representation to the target space. We need stronger assumptions about the model to learn useful invariants for vision.

Solution: Replace linear applications by convolution kernels with learnable params !

Page 20: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

A convolutionReplace linear applications by a convolution

Page 21: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Applying a convolution is equivalent to sharing local receptive fields parameters !→ Huge numbers of matrix multiplications → But the number of free parameters is low !

Translation Invariance with Conv Layers

image patch at position (x,y)

Weight sharing:

Activation at position ”x,y) for the k-th filter

Page 22: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Pooling layers Progressively reduce spatial resolution (e.g. max pooling):

Backprop gradient only spatial locations that chosen as max

Page 23: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Why ConvNet got famous so late ?

• Not many theoretical justifications ”compared to CV + Kernel Methods)

• Too much parameters for little datasets and too slow algorithms and hardwares

• High convergence problems ”gradient vanishing, bad normalizations, …)

• No good ways to initialize deep models

→ Motivations in Unsupervised Learning ”2000-2011)

Page 24: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Learn a model to: Project data into a non linear intermediate embedding ”Coding)

Reconstruct its own input from the codes ”Decoding)

PCA’s eigen directions span the same space than linear autoencoder’s

Coding function:

Decoding function:

Reconstruction objective:

Unsupervised Learning example: Autoencoder

min ||X - X_hat||²

h

X X_hat

Page 25: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Unsupervised Learning example: Autoencoder It is more efficient in practice to learn the layers separately.

→ Actually with ReLU, that’s okay.

Stacked autoencoders: Train the first autoencoder layer to reconstruct the input

Use the intermediate representation as input to the next layer and re-apply the process

Fine-tune the model to jointly learn the layers

Page 26: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Why ConvNet got famous so late ?• Large scale labelled dataset : ImageNet ”15 M)

• Hardware acceleration : GPU

Page 27: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

• The ReLU nonlinearity : • The killer detail : as good (if not better) as unsupervised pretraining !

What Changed ?

Page 28: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

What Changed ?• Batch Normalization ”x14 faster !!!)• Dropout

Page 29: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Evolution of the architectures • LeNet1-5

[Y. LeCun et al. 1989]

• AlexNet[A. Krizhevsky et al. 2012]

Page 30: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Evolution of the architectures • VGG Net

[K. Simonyan et al. 2014]

Page 31: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Evolution of the architectures • GoogLeNet

[C. Szegedy et al. 2015]

Page 32: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Evolution of the architectures • Inception modules

Page 33: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Evolution of the architectures • ResNet ”152 layers, even 1,000 ...)

[K. He et al. 2015]

Page 34: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Evolution of the architectures • Densenet:

[Liuzhuang et al. 2016]

Page 35: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Setting hyper-parameters of DL is a Nightmare !

• A lot more of good practices to make it work:

• Data-augmentation ”random flip, crop, rotation, etc …)

• Momentum and adaptive learning rate methods

• Learning rate tuning and policy

• Learn an ensemble of models (and aggregate somehow their predictions)

Page 36: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Setting hyper-parameters of DL is a Nightmare !

• Learning rate decay

Page 37: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Transfer learning (fine-tuning)Problem: CNNs require huge training data to learn the millions of parameters Solution: Learn domain specific features by transfer learning

1. Train CNN on a generalist image dataset with millions of images

Page 38: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Transfer learning (fine-tuning)Problem: CNNs require huge training data to learn the millions of parameters Solution: Learn domain specific features by transfer learning

1. Train CNN on a generalist image dataset with millions of images2. Keep the weights of the lowest layers but remove/reset the top layers

Page 39: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Transfer learning (fine-tuning)Problem: CNNs require huge training data to learn the millions of parameters Solution: Learn domain specific features by transfer learning

1. Train CNN on a generalist image dataset with millions of images2. Keep the weights of the lowest layers but remove/reset the top layers3. Feed forward and back-propagate new domain specific images (with usually

a different number of classes C)

New output layer

Layer 7

Page 40: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

The power of transfer learningTransfer learning usually works for any domain

Table 1 - accuracy measured on several fine-grained image classification datasets

Even very specific ones:

Rice seeds varieties recognition 100 classes, 1 500 texture images

GoogLeNet trained from scratch 58.1%

GoogLeNet pre-trained on ImageNet 70.4%

Herbaria species recognition 255 classes, 11K herbaria sheets

Trademark Logos Car models Paris Buildings Aircraft models Bird species Flower species

GoogLeNet trained from scratch 67.7% 59.3% 55.3% 72.7% 24.4% 59.5%

GoogLeNet pre-trained on ImageNet 87.5% 79.9% 71.3% 88.1% 72.4% 89.5%

GoogLeNet trained from scratch 8.8%

GoogLeNet pre-trained on ImageNet 52.4%

Page 41: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Frameworks

From academia…

...to industry

etc.

ToolkitHardware

GPUs…

Mobile…

FPGA… Project Brainwave (Microsoft)

And others… TPUs (Google), etc.

Page 42: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Applications

Page 43: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Plant species recognition: Pl@ntNet

Page 44: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Localization / Segmentation (Deep Mask)

Page 45: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Style Transfer to real and sketch images

Page 46: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Image Generation with GAN[S. Chopra et al. 2015]

Page 47: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Generative Adversarial Network (GAN)[I. GoodFellow et al. 2014]

Page 48: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Logic with Deep Learning

Page 49: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

NLP with Memory Networks

Page 50: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

NLP with Memory Networks

Page 51: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Image Captioning via attention based models

Page 52: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

“Reverse Image Captioning” :

Page 54: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Questions

Page 55: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Theoretical Focus:Learning as an Optimization Problem

Page 56: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Statistical Learning Theory

Data generating process: supervised learning

X × Y probability space Input space X , e.g. Rd, images, ... Output space Y, e.g. R, 0, 1, 0, 1, ...,K − 1, ...

Data drawn from p(x, y) is fixed but unknown probability models uncertainty p(x, y) = p(y|x)p(x)

We have a sample set Tn = (xi, yi), i = 1, 2, ..., n

Objective

Find the function f that best predict y given x.

y = f(x)

Page 57: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Objective

Find the function f that best predict y given x.

y = f(x)

Formalism

Reduce search to a parametrized function class f(.; θ) e.g. linear regression f(x;w, b) = wTx+ b, Θ = R

d × R

Loss function: l(y, y) = l(y, f(x; θ)) e.g. squared loss l(y, y) = (y − y)2

Risk function: J∗(θ) = Ex,y[l(y, f(x; θ))]

Optimization problem 1

minθ∈Θ

J∗(θ)

Page 58: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Statistical Learning Theory

Optimization problem 1

minθ∈Θ

Ex,y[l(y, f(x; θ))]

p(x, y) is unknown, we cannot compute the expectation.

Use empirical risk instead J(θ) = 1

n

∑i l(yi, f(xi; θ))

Optimization problem 2

minθ∈Θ

J(θ)

Page 59: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Optimization algorithms

Usually, we have convex objective functions and Θ ⊂ Rp. We

know how to deal with these optimization problems:

if derivable, first-order gradient approaches: GradientDescent, Stochastic Gradient Descent, ...

if two time derivable, second-order approaches:Newton-Raphson, Quasi-Newton, ...

...

Page 60: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Optimization algorithms

Usually, we have convex objective functions and Θ ⊂ Rp. We

know how to deal with these optimization problems:

if derivable, first-order gradient approaches: GradientDescent, Stochastic Gradient Descent, ...

if two time derivable, second-order approaches:Newton-Raphson, Quasi-Newton, ...

...

However, when measured on a test set, the error is high. Thefitted model does not generalize on new input...

Why?

Page 61: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Underfitting VS Overfitting

Page 62: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Regularization term

Optimization problem 3

minθ∈Θ

J(θ) + λΩ(θ)

where λ is the regularization parameter (hyperparameter)

Examples:

L1 regularization: Ω(θ) = ‖θ‖1 =∑

k |θk|

L2 regularization: Ω(θ) = 1

2‖θ‖2

2= 1

2

∑k θ

2

k

Backed by theory: VC theory, PAC learning

ǫ2 ≤log |H|+ log 1

δ

2n

Page 63: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

What about Neural Networks?

Optimization problem

definitely not convex

needs a lot of engineering to make it converge tosomething: good initialization, ReLU activations, lots oftricks to avoid gradient from exploding, etc.

...but somehow Stochastic Gradient Descent works well

Generalization

no regularization term (but other approaches)

huge capacity (universal approximation theorem): classictheory do not provide a good bound on generalization...

In the end, we do not even optimize what we would like(surrogate loss), we just monitor the validation error in order toknow when we stop...

Page 64: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

What about Neural Networks?

Open question: Why does it work?

Optimization

empirically: a lot of minima reached by SGD are equivalentin term of generalization error1

Generalization

new theories rising?2

1Choromanska et al. 2015; Goodfellow and Vinyals 2014.2Shwartz-Ziv and Tishby 2017.

Page 65: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Practical Limitations of Neural Networks

Page 66: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Some current limitations

Interpretability

building predictive models can be useful to performanalysis on a set of variables (e.g. economics, ecology, ...)

... given we can understand what the model has learned

but neural networks are black boxes

Can be fooled

no measure of prediction uncertainty usually outputs only some sort of scores that can not be

interpreted as uncertainty estimates Bayesian Neural Networks (BNN)3

easy to find adversarial examples need for formal verification

3Kendall and Gal 2017.

Page 67: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Some failures...

Because neural networks have allowed us to make considerableprogress in a lot of different tasks, we might considerer themsolved... but there are not.

Page 68: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Some failures...

Page 69: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Some failures...

Page 70: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Some failures...

Page 71: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Adversarial examples4

4Goodfellow, Shlens, et al. 2014; Kurakin et al. 2016.

Page 72: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Verification

For safety-critical systems, we need formal guarantees on thebehavior of the algorithms, e.g.:

check properties of DNN implementation of next-generationairborne collision avoidance system for unmanned aircraft5

verify that there is no adversarial example around a testpoint6

5Katz et al. 2017.6Huang et al. 2017.

Page 73: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Conclusion

Deep Learning has been very successful and has a very largearray of applications.

Proved that the current statistical learning theories areinsufficient.

We need to rethink of generalization in high dimension.

We need to find a new learning theory.

Lack of theoretical understanding limits their usage as blackbox modules in software because they do not provide strongcontracts (hence need for formal verification).

There is a current on-going debate in the community (e.g. seeAli Rahimi’s NIPS 2017 Test-of-Time award talk).

Page 74: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several

Questions?