Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by...
Transcript of Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by...
![Page 1: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/1.jpg)
Deep LearningPart II
University of Wisconsin, Madison
Some of the slides from Yingyu Liang, Marc'Aurelio Ranzato and others
![Page 2: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/2.jpg)
Convolution (continued)
![Page 3: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/3.jpg)
Illustration 1
a b c d e f
x y z
xb+yc+zd𝑤𝑤= [z, y, x]𝑢𝑢 = [a, b, c, d, e, f]
𝑠𝑠3
𝐰𝐰𝟐𝟐 𝐰𝐰𝟏𝟏 𝐰𝐰𝟎𝟎
𝐮𝐮𝟏𝟏 𝒖𝒖𝟐𝟐 𝐮𝐮𝟑𝟑
![Page 4: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/4.jpg)
Illustration 1
a b c d e f
x y z
xc+yd+ze𝑠𝑠4
𝐰𝐰𝟐𝟐 𝐰𝐰𝟏𝟏 𝐰𝐰𝟎𝟎
𝐮𝐮𝟐𝟐 𝒖𝒖𝟑𝟑 𝐮𝐮𝟒𝟒
![Page 5: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/5.jpg)
Illustration 1
a b c d e f
x y z
xd+ye+zf
𝐰𝐰𝟐𝟐 𝐰𝐰𝟏𝟏 𝐰𝐰𝟎𝟎
𝐮𝐮𝟑𝟑 𝒖𝒖𝟒𝟒 𝐮𝐮𝟓𝟓
𝑠𝑠5
![Page 6: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/6.jpg)
Illustration 1: boundary case
a b c d e f
x y
xe+yf
𝐰𝐰𝟐𝟐 𝐰𝐰𝟏𝟏
𝒖𝒖𝟒𝟒 𝐮𝐮𝟓𝟓
𝑠𝑠6
![Page 7: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/7.jpg)
Gradient of convolution
• Convolution 𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
• We will need to compute the gradient for back-propagation• Gradient with respect to the input 𝑢𝑢• Gradient with respect to the filter / kernel 𝑤𝑤
![Page 8: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/8.jpg)
Gradient of convolution
y z
x y z
x y z
x y z
x y z
x y
a
b
c
d
e
f
𝑤𝑤= [z, y, x] 𝑢𝑢 = [a, b, c, d, e, f] 𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
𝒔𝒔𝟏𝟏𝒔𝒔𝟐𝟐𝒔𝒔𝟑𝟑𝒔𝒔𝟒𝟒𝒔𝒔𝟓𝟓𝒔𝒔𝟔𝟔
=𝜕𝜕𝑠𝑠𝜕𝜕𝑢𝑢
=?
𝑠𝑠 𝑊𝑊 𝑢𝑢
![Page 9: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/9.jpg)
Gradient of convolution
y z
x y z
x y z
x y z
x y z
x y
a
b
c
d
e
f
𝑤𝑤= [z, y, x] 𝑢𝑢 = [a, b, c, d, e, f] 𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
𝒔𝒔𝟏𝟏𝒔𝒔𝟐𝟐𝒔𝒔𝟑𝟑𝒔𝒔𝟒𝟒𝒔𝒔𝟓𝟓𝒔𝒔𝟔𝟔
=𝜕𝜕𝑠𝑠1𝜕𝜕𝑢𝑢1
= 𝑦𝑦
𝑠𝑠 𝑢𝑢𝑊𝑊
![Page 10: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/10.jpg)
Gradient of convolution
y z
x y z
x y z
x y z
x y z
x y
a
b
c
d
e
f
𝑤𝑤= [z, y, x] 𝑢𝑢 = [a, b, c, d, e, f] 𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
𝒔𝒔𝟏𝟏𝒔𝒔𝟐𝟐𝒔𝒔𝟑𝟑𝒔𝒔𝟒𝟒𝒔𝒔𝟓𝟓𝒔𝒔𝟔𝟔
=𝜕𝜕𝑠𝑠1𝜕𝜕𝑢𝑢2
= 𝑧𝑧
𝑠𝑠 𝑢𝑢𝑊𝑊
![Page 11: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/11.jpg)
Gradient of convolution
y z
𝑤𝑤= [z, y, x] 𝑢𝑢 = [a, b, c, d, e, f] 𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
𝜕𝜕𝑠𝑠𝜕𝜕𝑢𝑢
𝜕𝜕𝑠𝑠1𝜕𝜕𝑢𝑢
=𝜕𝜕𝑠𝑠1𝜕𝜕𝑢𝑢1
, …𝜕𝜕𝑠𝑠1𝜕𝜕𝑢𝑢6
![Page 12: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/12.jpg)
Gradient of convolution
y z
x y z
x y z
x y z
x y z
x y
a
b
c
d
e
f
𝑤𝑤= [z, y, x] 𝑢𝑢 = [a, b, c, d, e, f] 𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
𝒔𝒔𝟏𝟏𝒔𝒔𝟐𝟐𝒔𝒔𝟑𝟑𝒔𝒔𝟒𝟒𝒔𝒔𝟓𝟓𝒔𝒔𝟔𝟔
=
𝑠𝑠 𝑢𝑢𝑊𝑊
𝜕𝜕𝑠𝑠2𝜕𝜕𝑢𝑢1
= 𝑥𝑥
𝜕𝜕𝑠𝑠2𝜕𝜕𝑢𝑢2
= 𝑦𝑦
𝜕𝜕𝑠𝑠2𝜕𝜕𝑢𝑢3
= 𝑧𝑧
![Page 13: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/13.jpg)
Gradient of convolution
y z
x y z
𝑤𝑤= [z, y, x] 𝑢𝑢 = [a, b, c, d, e, f] 𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
𝜕𝜕𝑠𝑠𝜕𝜕𝑢𝑢
𝜕𝜕𝑠𝑠2𝜕𝜕𝑢𝑢
=𝜕𝜕𝑠𝑠2𝜕𝜕𝑢𝑢1
, …𝜕𝜕𝑠𝑠2𝜕𝜕𝑢𝑢6
![Page 14: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/14.jpg)
Gradient of convolution
y z
x y z
x y z
x y z
x y z
x y
a
b
c
d
e
f
𝑤𝑤= [z, y, x] 𝑢𝑢 = [a, b, c, d, e, f] 𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
𝒔𝒔𝟏𝟏𝒔𝒔𝟐𝟐𝒔𝒔𝟑𝟑𝒔𝒔𝟒𝟒𝒔𝒔𝟓𝟓𝒔𝒔𝟔𝟔
=
𝑠𝑠 𝑢𝑢𝑊𝑊
𝜕𝜕𝑠𝑠3𝜕𝜕𝑢𝑢2
= 𝑥𝑥
𝜕𝜕𝑠𝑠3𝜕𝜕𝑢𝑢3
= 𝑦𝑦
𝜕𝜕𝑠𝑠3𝜕𝜕𝑢𝑢4
= 𝑧𝑧
![Page 15: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/15.jpg)
Gradient of convolution
y z
x y z
x y z
𝑤𝑤= [z, y, x] 𝑢𝑢 = [a, b, c, d, e, f] 𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
𝜕𝜕𝑠𝑠𝜕𝜕𝑢𝑢
𝜕𝜕𝑠𝑠3𝜕𝜕𝑢𝑢
=𝜕𝜕𝑠𝑠3𝜕𝜕𝑢𝑢1
, …𝜕𝜕𝑠𝑠3𝜕𝜕𝑢𝑢6
![Page 16: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/16.jpg)
Gradient of convolution𝑤𝑤= [z, y, x] 𝑢𝑢 = [a, b, c, d, e, f] 𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
𝜕𝜕𝑠𝑠𝜕𝜕𝑢𝑢
= 𝑊𝑊
y z
x y z
x y z
x y z
x y z
x y
![Page 17: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/17.jpg)
Gradient of convolution
y z
x y z
x y z
x y z
x y z
x y
a
b
c
d
e
f
𝑤𝑤= [z, y, x] 𝑢𝑢 = [a, b, c, d, e, f] 𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
𝒔𝒔𝟏𝟏𝒔𝒔𝟐𝟐𝒔𝒔𝟑𝟑𝒔𝒔𝟒𝟒𝒔𝒔𝟓𝟓𝒔𝒔𝟔𝟔
=𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
= 𝑊𝑊𝑢𝑢
𝑠𝑠 𝑊𝑊 𝑢𝑢
𝜕𝜕𝑠𝑠𝜕𝜕𝑢𝑢
= 𝑊𝑊
![Page 18: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/18.jpg)
Gradient of convolution
y z
x y z
x y z
x y z
x y z
x y
a
b
c
d
e
f
𝑤𝑤= [z, y, x] 𝑢𝑢 = [a, b, c, d, e, f] 𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
𝒔𝒔𝟏𝟏𝒔𝒔𝟐𝟐𝒔𝒔𝟑𝟑𝒔𝒔𝟒𝟒𝒔𝒔𝟓𝟓𝒔𝒔𝟔𝟔
=𝜕𝜕𝑠𝑠𝜕𝜕𝑤𝑤
=?
𝑠𝑠 𝑊𝑊 𝑢𝑢
![Page 19: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/19.jpg)
Gradient of convolution
b a
c b a
d c b
e d c
f e d
f e
z
y
x
𝑤𝑤= [z, y, x] 𝑢𝑢 = [a, b, c, d, e, f] 𝑠𝑠 = 𝑤𝑤 ∗ 𝑢𝑢
𝒔𝒔𝟏𝟏𝒔𝒔𝟐𝟐𝒔𝒔𝟑𝟑𝒔𝒔𝟒𝟒𝒔𝒔𝟓𝟓𝒔𝒔𝟔𝟔
=𝜕𝜕𝑠𝑠𝜕𝜕𝑤𝑤
=?
𝑠𝑠 𝑈𝑈 𝑤𝑤
f, e, d, c
![Page 20: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/20.jpg)
Gradient of convolution
b a
c b a
d c b
e d c
f e d
f e
z
y
x
𝑤𝑤= [z, y, x] 𝑢𝑢 = [a, b, c, d, e, f] 𝑠𝑠 = 𝑢𝑢 ∗ 𝑤𝑤
𝒔𝒔𝟏𝟏𝒔𝒔𝟐𝟐𝒔𝒔𝟑𝟑𝒔𝒔𝟒𝟒𝒔𝒔𝟓𝟓𝒔𝒔𝟔𝟔
=𝜕𝜕𝑠𝑠𝜕𝜕𝑤𝑤
= 𝑈𝑈
𝑠𝑠 𝑈𝑈 𝑤𝑤
![Page 21: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/21.jpg)
Convolution with high dimensional data
• Input and kernel can be 1D / 2D
*
![Page 22: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/22.jpg)
Convolution with high dimensional data
• Input and kernel can be 3D, e.g., an RGB image have 3 channels
*
![Page 23: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/23.jpg)
Convolution with high dimensional data
• Input and kernel can be 3D, e.g., an RGB image have 3 channels
*
![Page 24: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/24.jpg)
Convolution with high dimensional data
• Input and kernel can be 3D, e.g., an RGB image have 3 channels
*
![Page 25: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/25.jpg)
Convolution with high dimensional data
• Input and kernel can be 3D, e.g., an RGB image have 3 channels
*
![Page 26: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/26.jpg)
Pooling
![Page 27: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/27.jpg)
Pooling
• Summarizing the input (i.e., output the max of the input)
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
![Page 28: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/28.jpg)
Slides Credit: Deep Learning Tutorial by Marc’Aurelio Ranzato
![Page 29: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/29.jpg)
Slides Credit: Deep Learning Tutorial by Marc’Aurelio Ranzato
![Page 30: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/30.jpg)
Motivation from neuroscience
• David Hubel and Torsten Wiesel studied early visual system in human brain (V1 or primary visual cortex), and won Nobel prize for this
• V1 properties• 2D spatial arrangement• Simple cells: inspire convolution layers• Complex cells: inspire pooling layers
![Page 31: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/31.jpg)
Variants of pooling
• Stride
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
![Page 32: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/32.jpg)
![Page 33: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/33.jpg)
Gradient of Pooling
• 1 for max values and 0 elsewhere• Bookkeeping: simply remember the index of the max values.
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
![Page 34: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/34.jpg)
Output normalization
![Page 35: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/35.jpg)
Output normalization
• Normalize the output of a neural network to keep things in range
• Often used for classification (normalize the output to a proper probability distribution)
![Page 36: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/36.jpg)
Output normalization: Sigmoid
• Normalize the output into the range of (0,1)
• As a probability distribution for a binary variable
• No parameters and differentiable
Sigmoid
𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑥𝑥 =1
1 + 𝑒𝑒−𝑥𝑥
![Page 37: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/37.jpg)
Output normalization: Softmax
• Normalize a vector such that• Each element in the range of (0, 1)• All elements sum to 1
• As a probability distribution for a categorical variable (e.g., 𝑥𝑥 = {1, …𝐾𝐾})
• No parameters and differentiable
Softmax
𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑥𝑥 𝑥𝑥𝑘𝑘 =exp(𝑥𝑥𝑘𝑘)∑𝑗𝑗 exp 𝑥𝑥𝑗𝑗
![Page 38: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/38.jpg)
Loss functions
![Page 39: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/39.jpg)
The training criterion: loss functions
• A loss function compares the output of a neural network to the label
• A loss function is similar to an error function
• Loss -> 0 if the output matches the label, otherwise a large value
![Page 40: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/40.jpg)
The training criterion: loss functions
• Classification • Cross entropy loss• C-way classification problem• Often in combination with
sigmoid (binary) or softmax (C-way)
• Regression• L2 loss
𝐻𝐻 𝑦𝑦,𝑝𝑝 = −�𝑗𝑗
𝑦𝑦𝑗𝑗log(𝑝𝑝𝑗𝑗)
𝐿𝐿2 𝑦𝑦, �𝑦𝑦 = �𝑗𝑗
𝑦𝑦𝑗𝑗 − �𝑦𝑦𝑗𝑗2
![Page 41: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/41.jpg)
How to get the deep networks work?Deep Learning: Composing a set of (nonlinear) functions g
𝑠𝑠 𝒙𝒙;𝜽𝜽 = 𝑠𝑠1 …𝑠𝑠𝑛𝑛−1(𝑠𝑠𝑛𝑛 𝒙𝒙;𝜽𝜽𝒏𝒏 ,𝜽𝜽𝒏𝒏−𝟏𝟏 … ,𝜽𝜽𝟏𝟏)
Each of the function g is represented using a layer of a neural network
• Key element: 𝜎𝜎 𝑾𝑾𝑻𝑻𝒙𝒙 + 𝒃𝒃
• Which activation function to use?
• What linear function to use?
• Other operations?
• The design of the network …
![Page 42: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/42.jpg)
Case study: LeNet-5
![Page 43: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/43.jpg)
LeNet-5
• Proposed in “Gradient-based learning applied to document recognition” , by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998
![Page 44: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/44.jpg)
LeNet-5
• Proposed in “Gradient-based learning applied to document recognition” , by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998
• Apply convolution on 2D images (MNIST) and use backpropagation
![Page 45: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/45.jpg)
LeNet-5
• Proposed in “Gradient-based learning applied to document recognition” , by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998
• Apply convolution on 2D images (MNIST) and use backpropagation
• Structure: 2 convolutional layers (with pooling) + 3 fully connected layers• Input size: 32x32x1• Convolution kernel size: 5x5• Pooling: 2x2
![Page 46: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/46.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
![Page 47: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/47.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
![Page 48: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/48.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Filter: 5x5, stride: 1x1, #filters: 6
Activation: sigmoid
![Page 49: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/49.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Pooling: 2x2, stride: 2
![Page 50: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/50.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Filter: 5x5x6, stride: 1x1, #filters: 16
Activation: sigmoid
![Page 51: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/51.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Pooling: 2x2, stride: 2
![Page 52: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/52.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Weight matrix: 400x120Activation: sigmoid
![Page 53: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/53.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Weight matrix: 120x84Activation: sigmoid
Weight matrix: 84x10
![Page 54: Deep Learning Part II - GitHub PagesGradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the](https://reader033.fdocuments.in/reader033/viewer/2022060903/609f4b0970b52851313208a9/html5/thumbnails/54.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Output normalization: softmax*
Loss: cross entropy*