Deep Learning: a gentle introduction
Jamal Atif, [email protected]
PSL, Université Paris-Dauphine, LAMSADE
February 8, 2016
Jamal Atif (Université Paris-Dauphine) Deep Learning February 8, 2016 1 / 1
Why a talk about deep learning?
Convolutional networks (Yann LeCun): the ILSVRC challenge, with 1,000 categories and 1,461,406 images.
Neuron’s basic anatomy
Figure: A neuron’s basic anatomy consists of four parts: a soma (cell body), dendrites, an axon, and nerve terminals. Information is received by the dendrites, collected in the cell body, and flows down the axon.
Artificial neuron
Inputs x1, x2, x3, …, xn (the dendrites), with weights w1, w2, w3, …, wn, plus a constant input xn+1 = 1 weighted by the bias b, feed the cell body, which computes the pre-activation

a = ∑_{i=1}^{n} wi xi + b

An activation function applied to a produces the output carried away by the axon.
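The pre-activation/activation pipeline above can be sketched in a few lines of Python (a minimal illustration; the step and sigmoid activations shown are two common choices, and the weights in the usage line are hand-picked for the example):

```python
import math

def neuron(x, w, b, activation):
    """Artificial neuron: weighted sum of inputs plus bias, then activation."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + b  # pre-activation
    return activation(a)

step = lambda a: 1 if a > 0 else 0             # threshold unit (perceptron)
sigmoid = lambda a: 1 / (1 + math.exp(-a))     # smooth alternative

# OR-like behaviour with hand-picked weights
out = neuron([0, 1], [1, 1], -0.5, step)       # a = 0.5 > 0, so out = 1
```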
Perceptron (Rosenblatt, 1957)
Figure: Mark I Perceptron machine
Perceptron learning
Input: a sample S = {(x(1), y(1)), …, (x(n), y(n))}
- Initialize a counter t to 0
- Initialize the weights wi with random values
- Repeat
  - Pick an example x(k) = [x(k)1, …, x(k)d]^T from S
  - Let y(k) be the target value and ŷ(k) the value computed by the current perceptron
  - If ŷ(k) ≠ y(k) then
    - Update the weights: wi(t + 1) = wi(t) + Δwi(t), where Δwi(t) = (y(k) − ŷ(k)) x(k)i
  - End If
  - t = t + 1
- Until all the examples in S have been visited and no change occurs in the weights

Output: a perceptron for linear discrimination of S
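The loop above translates directly into code; a minimal sketch (the fixed learning rate of 1 and the OR data follow the worked example on the next slide):

```python
import random

def train_perceptron(samples, d, max_epochs=100):
    """Rosenblatt's rule: w_i <- w_i + (y - y_hat) x_i whenever y_hat != y,
    looping over S until a full pass makes no change."""
    w = [random.uniform(-1, 1) for _ in range(d + 1)]   # last weight = bias
    for _ in range(max_epochs):
        changed = False
        for x, y in samples:
            xb = list(x) + [1]                          # constant bias input
            y_hat = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else 0
            if y_hat != y:
                w = [wi + (y - y_hat) * xi for wi, xi in zip(w, xb)]
                changed = True
        if not changed:                                 # converged: no update
            break
    return w

random.seed(0)                                          # reproducible run
OR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = train_perceptron(OR, d=2)
```

Since OR is linearly separable, the perceptron convergence theorem guarantees the loop terminates with a separating set of weights.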
Perceptron learning (example: learning the OR function)
Initialization: w1(0) = w2(0) = 1, w3(0) = −1 (w3 is the bias weight; its input is the constant 1)

| t | w1(t) | w2(t) | w3(t) | x(k) | ∑ wi x(k)i | ŷ(k) | y(k) | Δw1(t) | Δw2(t) | Δw3(t) |
|---|-------|-------|-------|------|-----------|------|------|--------|--------|--------|
| 0 | 1 | 1 | −1 | 001 | −1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | −1 | 011 | 0 | 0 | 1 | 0 | 1 | 1 |
| 2 | 1 | 2 | 0 | 101 | 1 | 1 | 1 | 0 | 0 | 0 |
| 3 | 1 | 2 | 0 | 111 | 3 | 1 | 1 | 0 | 0 | 0 |
| 4 | 1 | 2 | 0 | 001 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 1 | 2 | 0 | 011 | 2 | 1 | 1 | 0 | 0 | 0 |
Perceptron illustration
Activation: ŷ = 1 if a > 0, 0 otherwise, with a = ∑_i wi xi.

Step t = 0, input x(1) = [0, 0]^T, target y = 0, weights w1 = 1, w2 = 1, w3 = −1 (bias input 1): a = −1, so ŷ = 0 = y, and the update wi = wi + (y − ŷ)xi leaves all weights unchanged.
Perceptron illustration
Step t = 1, input x(2) = [0, 1]^T, target y = 1: a = 1·0 + 1·1 − 1 = 0, so ŷ = 0 ≠ y. Update wi = wi + (y − ŷ)xi: w1 = 1 + 1·0 = 1, w2 = 1 + 1·1 = 2, w3 = −1 + 1·1 = 0.
Perceptron illustration
Step t = 2, input x(3) = [1, 0]^T, target y = 1: a = 1·1 + 2·0 + 0·1 = 1 > 0, so ŷ = 1 = y; no update (w1 = 1, w2 = 2, w3 = 0).
Perceptron illustration
Step t = 3, input x(4) = [1, 1]^T, target y = 1: a = 1 + 2 + 0 = 3 > 0, so ŷ = 1 = y; no update.
Perceptron illustration
Step t = 4, input x(1) = [0, 0]^T, target y = 0: a = 0, so ŷ = 0 = y; no update.
Perceptron illustration
Step t = 5, input x(2) = [0, 1]^T, target y = 1: a = 2 > 0, so ŷ = 1 = y; no update. Every example is now classified correctly, so learning stops.
Perceptron capacity
Figure: OR(x1, x2) and AND(x1, x2) plotted on the unit square. In both cases a straight line separates the positive from the negative examples, so a perceptron can represent them.
Perceptron autumn
Figure: XOR(x1, x2) plotted on the unit square. No straight line separates the positive from the negative examples, so a single perceptron cannot represent it.
Link with Logistic regression
Same architecture: pre-activation a(x) = ∑_{i=1}^{d} wi xi + b, with inputs x1, x2, x3, …, xd, weights w1, w2, w3, …, wd, and a constant input xd+1 = 1 carrying the bias b; but the output is now

h(x) = 1 / (1 + e^(−a(x)))

Stochastic gradient update rule:

wj = wj + λ (y(i) − h(x(i))) x(i)j
The same as the perceptron update rule!
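That update rule runs as-is; a stochastic-gradient sketch (the learning rate λ = 0.5, epoch count, and OR data are illustrative choices, not from the slide):

```python
import math

def sgd_logistic(samples, d, lam=0.5, epochs=1000):
    """Logistic regression trained with the rule w_j += lam * (y - h(x)) * x_j."""
    w = [0.0] * (d + 1)                        # last component is the bias b
    for _ in range(epochs):
        for x, y in samples:
            xb = list(x) + [1]                 # constant bias input
            a = sum(wi * xi for wi, xi in zip(w, xb))
            h = 1 / (1 + math.exp(-a))         # sigmoid output
            w = [wi + lam * (y - h) * xi for wi, xi in zip(w, xb)]
    return w

OR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = sgd_logistic(OR, d=2)
```

Unlike the perceptron, the update fires on every example (the error y − h(x) is never exactly zero), but it pushes the weights in the same direction.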
But!
Figure: XOR(x1, x2) on the unit square, bounded by two AND-type linear separators: each AND of the inputs (with one of them negated) is linearly separable, and combining the two half-planes carves out exactly the XOR region. Stacking perceptrons therefore solves what one perceptron cannot.
Multilayer Perceptron
Paul Werbos (1984), David Rumelhart (1986)

Figure: inputs x1, x2, x3, …, xd plus a constant input 1 (bias b) form the input layer, followed by hidden layers and an output layer producing o1, …, om.
One-hidden-layer perceptron

Input layer: x1, x2, x3, …, xn, plus a constant input 1 for the biases b and b2.

Hidden layer (units h1, h2, h3, …, hd): hj(x) = g(b + ∑_{i=1}^{n} wij xi)

Output layer: f(x) = o(b2 + ∑_{i=1}^{d} w2i hi(x))
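The two formulas compose into a single forward pass. A sketch, assuming a sigmoid for g and the identity for o (the slide leaves both activation functions generic):

```python
import math

def sigmoid(a):
    return 1 / (1 + math.exp(-a))

def forward(x, W1, b1, w2, b2):
    """One-hidden-layer perceptron:
    h_j = g(b1[j] + sum_i W1[i][j] x_i),  f(x) = o(b2 + sum_j w2[j] h_j),
    with g = sigmoid and o = identity."""
    h = [sigmoid(b1[j] + sum(W1[i][j] * x[i] for i in range(len(x))))
         for j in range(len(b1))]
    return b2 + sum(w2[j] * h[j] for j in range(len(h)))

# 2 inputs, 2 hidden units, 1 output; all weights 1, all biases -1
W1 = [[1.0, 1.0], [1.0, 1.0]]
f = forward([0, 0], W1, b1=[-1.0, -1.0], w2=[1.0, 1.0], b2=-1.0)
# h = [sigmoid(-1), sigmoid(-1)] ~ [0.27, 0.27], so f ~ -0.46
```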
MLP training: backpropagation (XOR example)
Figure: a 2-2-1 network for XOR, with inputs x1, x2 and constant bias inputs 1. The hidden units compute pre-activations a1 = ∑_i w1_i1 xi + b1_1 and a2 = ∑_i w1_i2 xi + b1_2, then outputs h1 = h(a1) and h2 = h(a2), with h(x) = 1/(1 + e^(−x)). The output unit computes ao = ∑_k w2_k hk + b2 and the prediction ŷ = ho(x) = 1/(1 + e^(−ao)).
MLP training: backpropagation (XOR example: initialisation)

All weights are initialised to 1 and all biases to −1: w1_11 = w1_12 = w1_21 = w1_22 = 1, b1_1 = b1_2 = −1, w2_1 = w2_2 = 1, b2 = −1.
MLP training: backpropagation (XOR example: feed-forward)

Forward pass on x = [0, 0]^T, target y = 0 (all weights 1, all biases −1):

a1 = a2 = −1, so h1(x) = h2(x) = 1/(1 + e^1) ≈ 0.27

ao = 1·0.27 + 1·0.27 − 1 = −0.46, so ŷ = ho(x) = 1/(1 + e^0.46) ≈ 0.39
MLP training: backpropagation (XOR example: backpropagation)

After the forward pass on x = [0, 0]^T (ŷ ≈ 0.39, h1 = h2 ≈ 0.27, target y = 0), the output-layer weights are updated as

w2_k ← w2_k + η δ_k,  δ_k = (y − ŷ)(1 − ŷ) ŷ h_k = −∂Ew/∂ŷ · ∂ŷ/∂ao · ∂ao/∂w2_k
MLP training: backpropagation (XOR example: backpropagation)

Equivalently, as gradient descent: w2_k ← w2_k − η δ_k, with

δ_k = ∂Ew/∂w2_k = −(y − ŷ)(1 − ŷ) ŷ h_k,  Ew = ½ (y − ŷ)²

With η = 1, ŷ = 0.39 and h1 = h2 = 0.27:

w2_1 ← w2_1 + 1 · (0 − .39)(1 − .39) · .39 · 0.27 ≈ 0.98
w2_2 ← w2_2 + 1 · (0 − .39)(1 − .39) · .39 · 0.27 ≈ 0.98
b2 ← b2 + 1 · (0 − .39)(1 − .39) · .39 · 1 ≈ −1.07

For the hidden layer: w1_ij ← w1_ij − η δ_ij, with

δ_ij = ∂Ew/∂w1_ij = ∂Ew/∂ho · ∂ho/∂ao · ∂ao/∂hj · ∂hj/∂aj · ∂aj/∂w1_ij = −(y − ŷ)(1 − ŷ) ŷ w2_j hj(1 − hj) xi

Since x = [0, 0]^T, the input factors are zero and the hidden weights do not move:

w1_11 = 1 + 1 · (0 − .39)(1 − .39) · .39 · 1 · .27 · (1 − .27) · 0 = 1
w1_22 = 1 + 1 · (0 − .39)(1 − .39) · .39 · 1 · .27 · (1 − .27) · 0 = 1

while the hidden biases are updated: b1_1 = −1 + 1 · (0 − .39)(1 − .39) · .39 · 1 · .27 · (1 − .27) · (−1) ≈ −0.98, and similarly b1_2 ≈ −0.98.

Updated parameters: w2_1 = w2_2 = 0.98, b2 = −1.07, b1_1 = b1_2 = −0.98; the hidden weights w1 stay at 1.
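The worked example can be checked numerically. A sketch of one forward/backward pass on the 2-2-1 network with all weights 1 and biases −1 (η = 1, squared error; the slide rounds intermediate factors, so the last digit of some bias values differs slightly here):

```python
import math

def sigmoid(a):
    return 1 / (1 + math.exp(-a))

# 2-2-1 network, the slide's initialisation
W1 = [[1.0, 1.0], [1.0, 1.0]]       # W1[i][j]: input i -> hidden unit j
b1 = [-1.0, -1.0]
w2 = [1.0, 1.0]
b2 = -1.0
eta = 1.0

x, y = [0.0, 0.0], 0.0              # first XOR example

# Feed-forward
a_h = [b1[j] + sum(W1[i][j] * x[i] for i in range(2)) for j in range(2)]
h = [sigmoid(a) for a in a_h]       # h ~ [0.27, 0.27]
a_o = b2 + sum(w2[k] * h[k] for k in range(2))
y_hat = sigmoid(a_o)                # ~ 0.39

# Backpropagation, E = (y - y_hat)^2 / 2
delta_o = (y - y_hat) * y_hat * (1 - y_hat)               # -dE/da_o
w2_new = [w2[k] + eta * delta_o * h[k] for k in range(2)]  # ~ 0.98 each
b2_new = b2 + eta * delta_o
delta_h = [delta_o * w2[j] * h[j] * (1 - h[j]) for j in range(2)]
b1_new = [b1[j] + eta * delta_h[j] for j in range(2)]
# with x = [0, 0], the hidden weights W1 receive zero gradient and stay at 1
```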
MLP training: backpropagation (XOR example: feed-forward)

Forward pass with the updated parameters (w1 = 1, b1_1 = b1_2 = −0.98, w2_1 = w2_2 = 0.98, b2 = −1.07) on x = [0, 1]^T, target y = 1:

a1 = a2 = 1 − 0.98 = 0.02, so h1(x) = h2(x) = 1/(1 + e^(−0.02)) ≈ 0.5

ao = 0.98·0.5 + 0.98·0.5 − 1.07 = −0.09, so ŷ = 1/(1 + e^0.09) ≈ 0.5
Multilayer Perceptron as a deep NN
An MLP with more than two or three hidden layers is considered a deep network.

Figure: the same architecture as before (inputs x1, x2, x3, …, xd plus a constant bias input, outputs o1, …, om), now with several hidden layers.
Training issues
Setting the hyperparameters
- Initialization
- Number of iterations
- Learning rate η
- Activation function
- Early stopping criterion

Overfitting/Underfitting
- Number of hidden layers
- Number of neurons

Optimization
- Vanishing gradient problem
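The vanishing gradient problem can be seen directly: the sigmoid's derivative σ'(a) = σ(a)(1 − σ(a)) never exceeds 0.25, so the backpropagated error signal shrinks geometrically with depth. A small illustration (the 10-layer depth is an arbitrary choice):

```python
import math

def sigmoid(a):
    return 1 / (1 + math.exp(-a))

def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1 - s)   # maximal at a = 0, where it equals 0.25

# Backpropagating through L sigmoid layers multiplies the gradient by
# sigma'(a) at every layer; even in the best case a = 0:
signal = 1.0
for layer in range(10):
    signal *= sigmoid_grad(0.0)   # 0.25 per layer
# signal is now 0.25**10, i.e. under 1e-6: early layers barely learn
```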
Overfitting/Underfitting
Source: Bishop, PRML
Overfitting/Underfitting
Source: Bishop, PRML
What’s new!
Before 2006, training deep architectures was unsuccessful!

- Underfitting: new optimization techniques
- Overfitting (Bengio, Hinton, LeCun):
  - Unsupervised pre-training
  - Stochastic "dropout" training
Unsupervised pre-training
Main idea: initialize the network in an unsupervised way.

Consequences:
- Network layers encode successive invariant features (the latent structure of the data)
- Better optimization, and hence better generalization
Unsupervised pre-training
How to? Proceed greedily, layer by layer: train the first layer on the inputs x1, …, xd, then use its outputs as the input layer for the next stage, and so on.

Unsupervised NN learning techniques:
- Restricted Boltzmann Machines (RBM)
- Auto-encoders
- and many, many variants since 2006
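As a concrete instance of such layer-wise unsupervised learning, an autoencoder learns to reconstruct its input through a smaller code layer. A minimal gradient-descent sketch of a linear autoencoder (the sizes, learning rate, and toy data are illustrative assumptions, not from the slides):

```python
import random

def reconstruct(x, W1, W2):
    """Encode then decode: code = W1 x, reconstruction = W2 code (all linear)."""
    code = [sum(W1[j][i] * x[i] for i in range(len(x))) for j in range(len(W1))]
    return [sum(W2[i][j] * code[j] for j in range(len(code))) for i in range(len(W2))]

def train_autoencoder(data, n_in, n_code, lr=0.1, epochs=500):
    """Gradient descent on the squared reconstruction error ||x - W2 W1 x||^2 / 2."""
    random.seed(0)
    W1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_code)]
    W2 = [[random.uniform(-0.5, 0.5) for _ in range(n_code)] for _ in range(n_in)]
    for _ in range(epochs):
        for x in data:
            code = [sum(W1[j][i] * x[i] for i in range(n_in)) for j in range(n_code)]
            recon = [sum(W2[i][j] * code[j] for j in range(n_code)) for i in range(n_in)]
            err = [x[i] - recon[i] for i in range(n_in)]
            back = [sum(W2[i][j] * err[i] for i in range(n_in)) for j in range(n_code)]
            for i in range(n_in):          # dE/dW2 = -err (outer) code
                for j in range(n_code):
                    W2[i][j] += lr * err[i] * code[j]
            for j in range(n_code):        # dE/dW1 = -(W2^T err) (outer) x
                for i in range(n_in):
                    W1[j][i] += lr * back[j] * x[i]
    return W1, W2

# Toy data living in a 2-dimensional subspace of R^4
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
W1, W2 = train_autoencoder(data, n_in=4, n_code=2)
```

After training, the 2-unit code layer captures the structure of the 4-dimensional inputs; stacking such layers is exactly the greedy pre-training described above.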
Unsupervised pre-training: Autoencoder
Credit: Hugo Larochelle
Unsupervised pre-training: Sparse Autoencoders
Credit: Y. Bengio
Fine tuning
How to?
- Add the output layer
- Initialize its weights with random values
- Initialize the hidden layers with the values from the pre-training
- Update all the layers by backpropagation

Figure: the pre-trained hidden layers sit between the input layer (x1, …, xd) and the newly added output layer.
Drop out
Intuition: regularize the network by stochastically dropping out some hidden units.

Procedure: assign to each hidden unit the value 0 with probability p (common choice: p = 0.5).

Figure: a network between the input layer (x1, …, xd) and the output layer, with some hidden units dropped.
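The procedure is a single masking step applied to a hidden layer during training; a sketch of the plain version described above (common practice also rescales activations, e.g. "inverted" dropout, which the slide leaves implicit):

```python
import random

def dropout(h, p=0.5, rng=random):
    """Set each hidden activation to 0 with probability p (training time only)."""
    return [0.0 if rng.random() < p else hi for hi in h]

random.seed(0)
h = [0.3, 1.2, -0.7, 0.5, 0.9]     # activations of one hidden layer
h_dropped = dropout(h, p=0.5)      # a random subset of units is zeroed
```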
Some applications
Computer vision: Convolutional Neural Networks (LeCun, 1989–)
State of the art in digit recognition
Modern CNN
A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012

- 7 hidden layers, 650,000 units, 60,000,000 parameters
- Drop out
- 10^6 images
- GPU implementation
- Activation function: f(x) = max(0, x) (ReLU)
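The ReLU activation listed there is just a threshold at zero; unlike the sigmoid, its derivative is 1 on the positive side, which mitigates the vanishing gradient problem in deep stacks:

```python
def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative: 1 for x > 0, 0 for x < 0 (subgradient 0 chosen at x = 0)."""
    return 1.0 if x > 0 else 0.0
```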
Computer vision: Convolutional Neural Networks
LeCun, Hinton, Bengio, Nature 2015
What is deep in deep learning? Old vs new ML paradigms

- Old: raw data → hand-crafted features → classifier → decision
- New: raw data → learned features (representation learning) → classifier → decision
Software

- TensorFlow (Google): https://www.tensorflow.org/
- Theano (Python, CPU/GPU), a mathematical and deep learning library: http://deeplearning.net/software/theano. Can do automatic, symbolic differentiation.
- Senna (NLP), by Collobert et al.: http://ronan.collobert.com/senna/
  - State-of-the-art performance on many tasks
  - 3500 lines of C, extremely fast and using very little memory
- Torch ML Library (C + Lua): http://www.torch.ch/
- Recurrent Neural Network Language Model: http://www.fit.vutbr.cz/~imikolov/rnnlm
- Recursive Neural Net and RAE models for paraphrase detection, sentiment analysis, relation classification: www.socher.org