Presented by Mohammad Nayeem Teli€¦ · DeepFace: Closing the Gap to Human-Level Performance in...
Transcript of Presented by Mohammad Nayeem Teli€¦ · DeepFace: Closing the Gap to Human-Level Performance in...
Convolutional Neural Networks
Presented by
Mohammad Nayeem Teli
Convolutional Neural Networks■ ConvNets have been around for a long time, ■ One of the seminal works came from LeCun et al. 1989 ■ MNIST digit and OC recognition ■ Most recent version LeNet-5
7
■ Approx. 14 M labeled images, 20 K classes ■ Images gathered from internet ■ Human labels via Amazon Turk ■ For object recognition challenge 1.2 M training images and 1000
classeshttp://image-net.org/
Performance on ImageNet
Face recognition and verification system using FaceNet and DeepFace Algorithms - Implementation
FaceNet: A Unified Embedding for Face Recognition and Clustering
Florian Schroff, Dmitry Kalenichenko and James Philbin (2015) in Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition 2015.
DeepFace: Closing the Gap to Human-Level Performance in Face Verification
Yaniv Taigman, Ming Yang, Marc’ Aurelio Ranzato and Lior Wolf (2014) in Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern Recognition 2014.
Object Detection for autonomous car driving application - YOLO Algorithm Implementation
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon, Santos Divvala, Ross Girschick and Ali Farhadi (2016) in Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern Recognition 2016.
■ You OnlySee Once (YOLO) is an object detection model ■ It runs on an input image through a Convolution Neural
Network
Deep Learning & Art: Neural Style Transfer
A neural algorithm of artistic style
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge (2015) in Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition 2015.
■ Implementation of neural style transfer algorithm ■ Generate novel artistic images
Cross-Correlation and Convolution■ Cross-correlation is a similarity measure between two signals when
one has a time-lag.Continuous domain:
■ Convolution is similar, although one signal is reversed
■ They have two key features: ■ shift invariance :
Same operation is performed at every point in the image ■ linearity
Every pixel is replaced with a linear combination of its neighbors.
(g∗h)(t) = ∫ g*(𝜏)h(t+𝜏)d𝜏
fconv = g(-t)∗h(t)
Convolution
Gaussian
Cross-Correlation and Convolution
*
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
filterImagen x n
f x f
Cross-Correlation and Convolution
*
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
=
-1 -23 -27
10 0 -30
24 25 -19
5*(-1)+15*0+4*1+10*(-2)+1*0+5*2+6*(-2)+9*0+11*2 = -1
Cross-Correlation and Convolution
*
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
=
Cross-Correlation and Convolution
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
-1
Cross-Correlation and Convolution
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
-1 -23
Cross-Correlation and Convolution
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
-1 -23 -27
Cross-Correlation and Convolution
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
-1 -23 -27
10
Cross-Correlation and Convolution
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
-1 -23 -27
10 0
Cross-Correlation and Convolution
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
-1 -23 -27
10 0 -30
Cross-Correlation and Convolution
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
-1 -23 -27
10 0 -30
24
Cross-Correlation and Convolution
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
-1 -23 -27
10 0 -30
24 25
Cross-Correlation and Convolution
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
-1 -23 -27
10 0 -30
24 25 -19
Output Image
n-f+1 x n-f+1n x n
Cross-Correlation and Convolution
-1 0 1
-2 0 2
-1 0 1
* =
*
-1 -2 -1
0 0 0
1 2 1
=
horizontal edge detector
vertical edge detector
Cross-Correlation and Convolution
-1 0 1
-2 0 2
-1 0 1
1 2 1
0 0 0
-1 -2 -1
Cross-correlation
Convolution
Neural Network
g - activation function
y = g(w0 * x0 + w1 * x1 + w2 * x2)
w0=-10
w1=+20
+20
g(z) =1
1 + e-z
x0
x1
x2
Neural Network
g - activation function
y = g(w0 * x0 + w1 * x1 + w2 * x2)
w0=-10
w1=+20
+20z
g(z) =1
1 + e-z
x0
x1
x2
Neural Network
a21
a22
a23
a31
a32
a41
g(w211*x1 + w212*x2 + w213*x3)
Neural Network
x1
x2
x3
Activation Functions■ Sigmoid
■ ReLU
■ Hyperbolic tangent
■ Leaky ReLU
CNN
*
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
filterImagen x n
f x f
CNN
*
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
w1 w2 w3
w4 w5 w6
w7 w8 w9
filterImagen x n f x f
CNN
*
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
w1 w2 w3
w4 w5 w6
w7 w8 w9
filterImagen x n f x f
# of parameters to learn: 3x3 = 9
f = 3n = 5
CNN
*
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
w1 w2 w3
w4 w5 w6
w7 w8 w9
filterImage5 x 5 3 x 3
# of parameters to learn: 3x3 = 9
x12
x13
x14
x25
x1
y∧
a1
a9
# of parameters to learn: 9x25 = 225
w15 w16 w19
w4 w5 w20
w7 w8 w21
w10 w11 w12
w4 w5 w13
w7 w8 w14
CNN over multiple channels
*
filterImagen x n x 3
f x f x 3
1 5 7 1 -2
10 1 5 1 6
6 9 11 1 1
0 -1
1 0
5 15 -3
10 1 2
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
2 1 -3 8 13
10 1 5 1 15
6 9 11 1 -1
0 -1
1 0
5 15 -6
10 1 0
w1 w2 w3
w4 w5 w6
w7 w8 w9
=
o1 o2 o3
o4 o5 o6
o7 o8 o9
n - f +1 x n - f + 1
5 x 5 x 3 3 x 3 x 3 3 x 3 x 1
o10 o11 o12
o4 o5 o15
o7 o8 o18
w19 w20 w21
w4 w5 w24
w7 w8 w27
w10 w11 w12
w4 w5 w15
w7 w8 w18
CNN using multiple filters
*
horizontal edge filter
Imagen x n x 3
f x f x 3
1 5 7 1 -2
10 1 5 1 6
6 9 11 1 1
0 -1
1 0
5 15 -3
10 1 2
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
2 1 -3 8 13
10 1 5 1 15
6 9 11 1 -1
0 -1
1 0
5 15 -6
10 1 0
w1 w2 w3
w4 w5 w6
w7 w8 w9
=
(n - f +1) x (n - f + 1 ) x 2v19 v20 v21
w4 w5 v24
w7 w8 v27
v10 v11 v12
w4 w5 v15
w7 w8 v18
v1 v2 v3
v4 v5 v6
v7 v8 v9 vertical edge filter
f x f x 3
o1 o2 o3
o4 o5 o6
o7 o8 o9
*=
o10 o11 o12
o4 o5 o15
o7 o8 o18
o1 o2 o3
o4 o5 o6
o7 o8 o9
CNN layer
*
horizontal edge filterImage
n x n x 3
f x f x 3
1 5 7 1 -210
1 5 1 66 9 1
11 1
0 -11 0
5 15
-310
1 2
5 15
4 0 -11
01 5 1 0
6 9 11
1 -10 -
11 05 1
54
10
1 5
2 1 -3 8 131
01 5 1 1
56 9 11
1 -10 -11 0
5 15
-610
1 0
w19 w20 w21
w4 w5 w24
w7 w8 w27
w10 w11 w12
w4 w5 w15
w7 w8 w18
w1 w2 w3
w4 w5 w6
w7 w8 w9
(n - f +1) x (n - f + 1 ) x 1
v19 v20 v21
w4 w5 v24
w7 w8 v27
v10 v11 v12
w4 w5 v15
w7 w8 v18
v1 v2 v3
v4 v5 v6
v7 v8 v9vertical edge filter
f x f x 3
*
ReLUo1 o2 o3
o4 o5 o6
o7 o8 o9( + b1(
ReLU
o10 o11 o12
o4 o5 o15
o7 o8 o18( + b2 (
Padding
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
Output Image
( n + 2p - f + 1) x ( n + 2p - f + 1)
n x n
Padding
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
0 0 0 0 0
0
0
0
0
0
0
0
0
0
0
0
0
0 0 0 0 0 00
-1 0 1
-2 0 2
-1 0 1
Output Image( n + 2p - f + 1 ) x ( n + 2p - f + 1)
n x n
n + 2p - f + 1 = n
p = f - 1 / 2
5 + 2 *1 - 3 + 1 = 5
Strides
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
Output Image( n + 2p - f + 1 ) x ( n + 2p - f + 1)
n x n
Strides
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
Output Imagen x n
n + 2p - f
s+ 1⎩ ⎩x
n + 2p - f
s+ 1⎩ ⎩
Strides
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
Output Imagen x n
xn + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
Strides
5 15 4 0 -1
10 1 5 1 0
6 9 11 1 -1
0 -1
1 0
5 15 4
10 1 5
-1 0 1
-2 0 2
-1 0 1
Output Image
n x n
5 + 2x0 - 3
2
x 5 + 2x0 - 3
2
2 X 2
xn + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
CNN layer
*
horizontal edge filterImagen x n x 3
f x f x 3
1 5 7 1 -210
1 5 1 66 9 1
11 1
0 -11 0
5 15
-310
1 2
5 15
4 0 -11
01 5 1 0
6 9 11
1 -10 -
11 05 1
54
10
1 5
2 1 -3 8 131
01 5 1 1
56 9 11
1 -10 -11 0
5 15
-610
1 0
w19 w20 w21
w4 w5 w24
w7 w8 w27
w10 w11 w12
w4 w5 w15
w7 w8 w18
w1 w2 w3
w4 w5 w6
w7 w8 w9
v19 v20 v21
w4 w5 v24
w7 w8 v27
v10 v11 v12
w4 w5 v15
w7 w8 v18
v1 v2 v3
v4 v5 v6
v7 v8 v9 vertical edge filter
f x f x 3
*
ReLUo1 o2 o3
o4 o5 o6
o7 o8 o9( + b1 (
ReLU
o10 o11 o12
o4 o5 o15
o7 o8 o18( + b2 (x
n + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
Convolutional Neural Networkx
n + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
32 x 32 x 3
f = 3s = 1p = 0
10 filters30 x 30 x 10
f = 5s = 2p = 0
20 filters13 x 13 x 20
f = 5s = 2p = 0
40 filters5 x 5 x 40
1000
0 / 1
logisticoutput
Convolutional Neural Networkx
n + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
32 x 32 x 3
f = 3s = 1p = 0
10 filters30 x 30 x 10
f = 5s = 2p = 0
20 filters13 x 13 x 20
f = 5s = 2p = 0
40 filters5 x 5 x 40
1000
softmax output
Max Pooling
1 5
10 8
6 -1
2 3
4 1
0 9
2 -5
-1 0
Max Pooling
1 5
10 8
6 -1
2 3
4 1
0 9
2 -5
-1 0
10
Max Pooling
1 5
10 8
6 -1
2 3
4 1
0 9
2 -5
-1 0
10 6
Max Pooling
1 5
10 8
6 -1
2 3
4 1
0 9
2 -5
-1 0
10 6
9
Max Pooling
1 5
10 8
6 -1
2 3
4 1
0 9
2 -5
-1 0
10 6
9 2
Max Pooling
1 5
10 8
6 -1
2 3
4 1
1 9
2 -5
-1 6
10
10
1
6
1
5 8 -1 2 -7
xn + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
f = 3s = 1p = 0
Max Pooling
1 5
10 8
6 -1
2 3
4 1
1 9
2 -5
-1 6
10 8
10
1
6
1
5 8 -1 2 -7
xn + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
f = 3s = 1p = 0
-75 8 -1 2
10
1
6
1
Max Pooling
1 5
10 8
6 -1
2 3
4 1
1 9
2 -5
-1 6
10 8 10
xn + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
f = 3s = 1p = 0
-75 8 -1 2
10
1
6
1
Max Pooling
1 5
10 8
6 -1
2 3
4 1
1 9
2 -5
-1 6
10 8
10
10
xn + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
f = 3s = 1p = 0
-75 8 -1 2
10
1
6
1
Max Pooling
1 5
10 8
6 -1
2 3
4 1
1 9
2 -5
-1 6
10 8
10 9
10
xn + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
f = 3s = 1p = 0
-75 8 -1 2
10
1
6
1
Max Pooling
1 5
10 8
6 -1
2 3
4 1
1 9
2 -5
-1 6
10 8
10 9
10
6
xn + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
f = 3s = 1p = 0
-75 8 -1 2
10
1
6
1
Max Pooling
1 5
10 8
6 -1
2 3
4 1
1 9
2 -5
-1 6
10 8
10 9
10
6
9
xn + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
f = 3s = 1p = 0
-75 8 -1 2
10
1
6
1
Max Pooling
1 5
10 8
6 -1
2 3
4 1
1 9
2 -5
-1 6
10 8
10 9
10
6
9 9
xn + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
f = 3s = 1p = 0
-75 8 -1 2
10
1
6
1
Max Pooling
1 5
10 8
6 -1
2 3
4 1
1 9
2 -5
-1 6
10 8
10 9
10
6
9 9 6
xn + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
f = 3s = 1p = 0
Average Pooling
1 5
10 8
6 -1
2 3
4 1
0 9
2 -5
-1 0
6 2.5
3.5 -1
LeNet - 5
f = 5s = 1p = 0
6 filters
f = 2s = 2
maxpooling
xn + 2p -f
s+ 1⎩ ⎩ n + 2p -f
s+ 1⎩ ⎩
⎨layer 1
f = 5s = 1p = 0
16 filters
f = 2s = 2
maxpooling
⎨layer 2
7