Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A...
Transcript of Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A...
![Page 1: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/1.jpg)
Convolutional Neural NetworksMaria Florina Balcan
10/17/2018
![Page 2: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/2.jpg)
Convolutional neural networks
• A specialized kind of neural network for processing data that has a known grid-like topology.
• E.g., time-series data, which can be thought of as a 1-D grid taking samples at regular time intervals, and image data, which can be thought of as a 2-D grid of pixels
• The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution . Convolution is a specialized kind of linear operation.
• Convolutional networks are neural networks that use convolution in place of general matrix multiplication in at least one of their layers.
![Page 3: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/3.jpg)
Convolutional neural networks
• Strong empirical application performance
• Convolutional networks: neural networks that use convolution in place of general matrix multiplication in at least one of their layers
for a specific kind of weight matrix 𝑊
ℎ = 𝜎(𝑊𝑇𝑥 + 𝑏)
![Page 4: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/4.jpg)
Convolution
![Page 5: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/5.jpg)
Convolution: discrete version
• Given array 𝑢𝑡 and 𝑤𝑡, their convolution is a function 𝑠𝑡
• Written as
• When 𝑢𝑡 or 𝑤𝑡 is not defined, assumed to be 0
𝑠𝑡 =
𝑎=−∞
+∞
𝑢𝑎𝑤𝑡−𝑎
𝑠 = 𝑢 ∗ 𝑤 or 𝑠𝑡 = 𝑢 ∗ 𝑤 𝑡
![Page 6: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/6.jpg)
Convolution, Motivation
• Suppose we track the location of a spaceship with a laser sensor. The laser sensor provides a single output u(t), the position of the spaceship at second t.
• Suppose sensor is noisy. To obtain a less noisy estimate of the spaceship’s position, we average several measurements. More recent measurements are more relevant, so we use a weighted average that gives more weight to recent measurements.
𝑠𝑡 =
𝑎=−∞
+∞
𝑢𝑎𝑤𝑡−𝑎
• Use a weighting function w(a), where a is the age of a measurement. If we apply such a weighted average operation at every moment, we obtain a new function s providing a smoothed estimate of the position of the spaceship:
![Page 7: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/7.jpg)
Illustration 1
a b c d e f
x y z
xb+yc+zd
𝑤= [z, y, x]𝑢 = [a, b, c, d, e, f]
𝑠3
𝐰𝟐 𝐰𝟏 𝐰𝟎
𝐮𝟏 𝒖𝟐 𝐮𝟑
![Page 8: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/8.jpg)
Illustration 1
a b c d e f
x y z
xc+yd+ze
𝑠4
𝐰𝟐 𝐰𝟏 𝐰𝟎
𝐮𝟐 𝒖𝟑 𝐮𝟒
![Page 9: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/9.jpg)
Illustration 1
a b c d e f
x y z
xd+ye+zf
𝐰𝟐 𝐰𝟏 𝐰𝟎
𝐮𝟑 𝒖𝟒 𝐮𝟓
𝑠5
![Page 10: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/10.jpg)
Illustration 1: boundary case
a b c d e f
x y
xe+yf
𝐰𝟐 𝐰𝟏
𝒖𝟒 𝐮𝟓
𝑠6
![Page 11: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/11.jpg)
Illustration 1 as matrix multiplication
y z
x y z
x y z
x y z
x y z
x y
a
b
c
d
e
f
![Page 12: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/12.jpg)
Illustration 2: two dimensional case
a b c d
e f g h
i j k l
w x
y z
aw + bx + ey + fz
bw + cx + fy + gz
cw + dx + gy + hz
ew + fx + iy + jz
fw + gx + jy + kz
gw + hx + ky + lz
![Page 13: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/13.jpg)
Illustration 2: two dimensional case
a b c d
e f g h
i j k l
w x
y z
wa + bx + ey + fz
![Page 14: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/14.jpg)
Illustration 2
a b c d
e f g h
i j k l
w x
y z
bw + cx + fy + gz
wa + bx + ey + fz
![Page 15: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/15.jpg)
Illustration 2
a b c d
e f g h
i j k l
w x
y z
bw + cx + fy + gz
wa + bx + ey + fz
Kernel (or filter)
Feature map
Input
![Page 16: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/16.jpg)
Illustration 2
a b c d
e f g h
i j k l
w x
y z
bw + cx + fy + gz
wa + bx + ey + fz
Kernel (or filter)
Feature map
Input
![Page 17: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/17.jpg)
Advantage: sparse interaction
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
Fully connected layer, 𝑚 × 𝑛 edges
𝑚 output nodes
𝑛 input nodes
![Page 18: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/18.jpg)
Advantage: sparse interaction
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
Convolutional layer, ≤ 𝑚 × 𝑘 edges
𝑚 output nodes
𝑛 input nodes
𝑘 kernel size
Store fewer parameters:
• reduces memory requirements
• improves statistical efficiency.
![Page 19: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/19.jpg)
Advantage: sparse interaction
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
Multiple convolutional layers: larger receptive field
• Receptive field of units in deeper layers larger than receptive field of units in shallow layers.
• Even though direct connections are sparse, units in the deeper layers are indirectly connected most of the input image.
• At the first layer capture more local features, but as we go deeper in the network we capture more global features.
![Page 20: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/20.jpg)
Advantage: parameter sharing
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
The same kernel are used repeatedly.E.g., the black edge is the same weightin the kernel.
Reduce the storage requirements of the model.
![Page 21: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/21.jpg)
Advantage: equivariant representations
• Equivariant: transforming the input = transforming the output
• Example: input is an image, transformation is shifting
• Convolution(shift(input)) = shift(Convolution(input))
• Useful when care only about the existence of a pattern, rather than the location
![Page 22: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/22.jpg)
Pooling
![Page 23: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/23.jpg)
Terminology
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
![Page 24: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/24.jpg)
Pooling
• Summarizing the input (i.e., output the max of the input)
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs. . For example, the max pooling takes maximum output within a rectangular neighborhood.
![Page 25: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/25.jpg)
Advantage
Induce invariance
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
![Page 26: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/26.jpg)
Variants of pooling
• Max pooling 𝑦 = max{𝑥1, 𝑥2, … , 𝑥𝑘}
• Average pooling 𝑦 = mean{𝑥1, 𝑥2, … , 𝑥𝑘}
• Others like max-out
![Page 27: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/27.jpg)
Motivation from neuroscience
• David Hubel and Torsten Wiesel studied early visual system in human brain (V1 or primary visual cortex), and won Nobel prize for this
• V1 properties• 2D spatial arrangement
• Simple cells: inspire convolution layers
• Complex cells: inspire pooling layers
![Page 28: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/28.jpg)
Variants of convolution and pooling
![Page 29: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/29.jpg)
Variants of convolutional layers
• Multiple dimensional convolution
• Input and kernel can be 3D• E.g., images have (width, height, RBG channels)
• Multiple kernels lead to multiple feature maps (also called channels)
![Page 30: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/30.jpg)
Variants of convolutional layers
• Padding: valid
a b c d e f
x y z
xd+ye+zf
![Page 31: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/31.jpg)
Variants of convolutional layers
• Padding: same
a b c d e f
x y
xe+yf
![Page 32: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/32.jpg)
Variants of convolutional layers
• Stride
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
![Page 33: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/33.jpg)
Variants of pooling
• Stride and padding
Figure from Deep Learning, by Goodfellow, Bengio, and Courville
![Page 34: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/34.jpg)
Case study: LeNet-5
![Page 35: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/35.jpg)
LeNet-5
• Proposed in “Gradient-based learning applied to document recognition” , by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998
![Page 36: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/36.jpg)
LeNet-5
• Proposed in “Gradient-based learning applied to document recognition” , by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998
• Apply convolution on 2D images (MNIST) and use backpropagation
![Page 37: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/37.jpg)
LeNet-5
• Proposed in “Gradient-based learning applied to document recognition” , by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998
• Apply convolution on 2D images (MNIST) and use backpropagation
• Structure: 2 convolutional layers (with pooling) + 3 fully connected layers• Input size: 32x32x1
• Convolution kernel size: 5x5
• Pooling: 2x2
![Page 38: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/38.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
![Page 39: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/39.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
![Page 40: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/40.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Filter: 5x5, stride: 1x1, #filters: 6
![Page 41: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/41.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Pooling: 2x2, stride: 2
![Page 42: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/42.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Filter: 5x5x6, stride: 1x1, #filters: 16
![Page 43: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/43.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Pooling: 2x2, stride: 2
![Page 44: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/44.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Weight matrix: 400x120
![Page 45: Convolutional Neural Networks10715-f18/lectures/cnns_2018.pdfConvolutional neural networks •A specialized kind of neural network for processing data that has a known grid-like topology.](https://reader030.fdocuments.in/reader030/viewer/2022041015/5ec67acb18bbe6094721576c/html5/thumbnails/45.jpg)
LeNet-5
Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Weight matrix: 120x84
Weight matrix: 84x10