Presented by Mohammad Nayeem Teli€¦ · DeepFace: Closing the Gap to Human-Level Performance in...

Convolutional Neural Networks

Presented by

Mohammad Nayeem Teli

Convolutional Neural Networks■ ConvNets have been around for a long time, ■ One of the seminal works came from LeCun et al. 1989 ■ MNIST digit and OC recognition ■ Most recent version LeNet-5

7

■ Approx. 14 M labeled images, 20 K classes ■ Images gathered from internet ■ Human labels via Amazon Turk ■ For object recognition challenge 1.2 M training images and 1000

classeshttp://image-net.org/

Performance on ImageNet

Face recognition and verification system using FaceNet and DeepFace Algorithms - Implementation

FaceNet: A Unified Embedding for Face Recognition and Clustering

Florian Schroff, Dmitry Kalenichenko and James Philbin (2015) in Proceedings of the IEEE Computer Society

Conference on Computer Vision and Pattern Recognition 2015.

DeepFace: Closing the Gap to Human-Level Performance in Face Verification

Yaniv Taigman, Ming Yang, Marc’ Aurelio Ranzato and Lior Wolf (2014) in Proceedings of the IEEE Computer

Society Conference on Computer Vision and Pattern Recognition 2014.

Object Detection for autonomous car driving application - YOLO Algorithm Implementation

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, Santos Divvala, Ross Girschick and Ali Farhadi (2016) in Proceedings of the IEEE Computer

Society Conference on Computer Vision and Pattern Recognition 2016.

■ You OnlySee Once (YOLO) is an object detection model ■ It runs on an input image through a Convolution Neural

Network

Deep Learning & Art: Neural Style Transfer

A neural algorithm of artistic style

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge (2015) in Proceedings of the IEEE Computer Society

Conference on Computer Vision and Pattern Recognition 2015.

■ Implementation of neural style transfer algorithm ■ Generate novel artistic images

Cross-Correlation and Convolution■ Cross-correlation is a similarity measure between two signals when

one has a time-lag.Continuous domain:

■ Convolution is similar, although one signal is reversed

■ They have two key features: ■ shift invariance :

Same operation is performed at every point in the image ■ linearity

Every pixel is replaced with a linear combination of its neighbors.

(g∗h)(t) = ∫ g*(𝜏)h(t+𝜏)d𝜏

fconv = g(-t)∗h(t)

Convolution

Gaussian

Cross-Correlation and Convolution

*

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

filterImagen x n

f x f


*

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

=

-1 -23 -27

10 0 -30

24 25 -19

5*(-1)+15*0+4*1+10*(-2)+1*0+5*2+6*(-2)+9*0+11*2 = -1


*

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

=


5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

-1


5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

-1 -23


5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

-1 -23 -27


5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

-1 -23 -27

10


5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

-1 -23 -27

10 0


5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

-1 -23 -27

10 0 -30


5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

-1 -23 -27

10 0 -30

24


5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

-1 -23 -27

10 0 -30

24 25


5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

-1 -23 -27

10 0 -30

24 25 -19

Output Image

n-f+1 x n-f+1n x n


-1 0 1

-2 0 2

-1 0 1

* =

*

-1 -2 -1

0 0 0

1 2 1

=

horizontal edge detector

vertical edge detector


-1 0 1

-2 0 2

-1 0 1

1 2 1

0 0 0

-1 -2 -1

Cross-correlation

Convolution

Neural Network

g - activation function

y = g(w0 * x0 + w1 * x1 + w2 * x2)

w0=-10

w1=+20

+20

g(z) =1

1 + e-z

x0

x1

x2

Neural Network

g - activation function

y = g(w0 * x0 + w1 * x1 + w2 * x2)

w0=-10

w1=+20

+20z

g(z) =1

1 + e-z

x0

x1

x2

Neural Network

a21

a22

a23

a31

a32

a41

g(w211*x1 + w212*x2 + w213*x3)

Neural Network

x1

x2

x3

Activation Functions■ Sigmoid

■ ReLU

■ Hyperbolic tangent

■ Leaky ReLU

CNN

*

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

filterImagen x n

f x f

CNN

*

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

w1 w2 w3

w4 w5 w6

w7 w8 w9

filterImagen x n f x f

CNN

*

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

w1 w2 w3

w4 w5 w6

w7 w8 w9

filterImagen x n f x f

# of parameters to learn: 3x3 = 9

f = 3n = 5

CNN

*

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

w1 w2 w3

w4 w5 w6

w7 w8 w9

filterImage5 x 5 3 x 3


x12

x13

x14

x25

x1

y∧

a1

a9


w15 w16 w19

w4 w5 w20

w7 w8 w21

w10 w11 w12

w4 w5 w13

w7 w8 w14

CNN over multiple channels

*

filterImagen x n x 3

f x f x 3

1 5 7 1 -2

10 1 5 1 6

6 9 11 1 1

0 -1

1 0

5 15 -3

10 1 2

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

2 1 -3 8 13

10 1 5 1 15

6 9 11 1 -1

0 -1

1 0

5 15 -6

10 1 0

w1 w2 w3

w4 w5 w6

w7 w8 w9

=

o1 o2 o3

o4 o5 o6

o7 o8 o9

n - f +1 x n - f + 1

5 x 5 x 3 3 x 3 x 3 3 x 3 x 1

o10 o11 o12

o4 o5 o15

o7 o8 o18

w19 w20 w21

w4 w5 w24

w7 w8 w27

w10 w11 w12

w4 w5 w15

w7 w8 w18

CNN using multiple filters

*

horizontal edge filter

Imagen x n x 3

f x f x 3

1 5 7 1 -2

10 1 5 1 6

6 9 11 1 1

0 -1

1 0

5 15 -3

10 1 2

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

2 1 -3 8 13

10 1 5 1 15

6 9 11 1 -1

0 -1

1 0

5 15 -6

10 1 0

w1 w2 w3

w4 w5 w6

w7 w8 w9

=

(n - f +1) x (n - f + 1 ) x 2v19 v20 v21

w4 w5 v24

w7 w8 v27

v10 v11 v12

w4 w5 v15

w7 w8 v18

v1 v2 v3

v4 v5 v6

v7 v8 v9 vertical edge filter

f x f x 3

o1 o2 o3

o4 o5 o6

o7 o8 o9

*=

o10 o11 o12

o4 o5 o15

o7 o8 o18

o1 o2 o3

o4 o5 o6

o7 o8 o9

CNN layer

*

horizontal edge filterImage

n x n x 3

f x f x 3

1 5 7 1 -210

1 5 1 66 9 1

11 1

0 -11 0

5 15

-310

1 2

5 15

4 0 -11

01 5 1 0

6 9 11

1 -10 -

11 05 1

54

10

1 5

2 1 -3 8 131

01 5 1 1

56 9 11

1 -10 -11 0

5 15

-610

1 0

w19 w20 w21

w4 w5 w24

w7 w8 w27

w10 w11 w12

w4 w5 w15

w7 w8 w18

w1 w2 w3

w4 w5 w6

w7 w8 w9

(n - f +1) x (n - f + 1 ) x 1

v19 v20 v21

w4 w5 v24

w7 w8 v27

v10 v11 v12

w4 w5 v15

w7 w8 v18

v1 v2 v3

v4 v5 v6

v7 v8 v9vertical edge filter

f x f x 3

*

ReLUo1 o2 o3

o4 o5 o6

o7 o8 o9( + b1(

ReLU

o10 o11 o12

o4 o5 o15

o7 o8 o18( + b2 (

Padding

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

Output Image

( n + 2p - f + 1) x ( n + 2p - f + 1)

n x n

Padding

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

0 0 0 0 0

0

0

0

0

0

0

0

0

0

0

0

0

0 0 0 0 0 00

-1 0 1

-2 0 2

-1 0 1

Output Image( n + 2p - f + 1 ) x ( n + 2p - f + 1)

n x n

n + 2p - f + 1 = n

p = f - 1 / 2

5 + 2 *1 - 3 + 1 = 5

Strides

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

Output Image( n + 2p - f + 1 ) x ( n + 2p - f + 1)

n x n

Strides

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

Output Imagen x n

n + 2p - f

s+ 1⎩ ⎩x

n + 2p - f

s+ 1⎩ ⎩

Strides

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

Output Imagen x n

xn + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

Strides

5 15 4 0 -1

10 1 5 1 0

6 9 11 1 -1

0 -1

1 0

5 15 4

10 1 5

-1 0 1

-2 0 2

-1 0 1

Output Image

n x n

5 + 2x0 - 3

2

x 5 + 2x0 - 3

2

2 X 2

xn + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

CNN layer

*

horizontal edge filterImagen x n x 3

f x f x 3

1 5 7 1 -210

1 5 1 66 9 1

11 1

0 -11 0

5 15

-310

1 2

5 15

4 0 -11

01 5 1 0

6 9 11

1 -10 -

11 05 1

54

10

1 5

2 1 -3 8 131

01 5 1 1

56 9 11

1 -10 -11 0

5 15

-610

1 0

w19 w20 w21

w4 w5 w24

w7 w8 w27

w10 w11 w12

w4 w5 w15

w7 w8 w18

w1 w2 w3

w4 w5 w6

w7 w8 w9

v19 v20 v21

w4 w5 v24

w7 w8 v27

v10 v11 v12

w4 w5 v15

w7 w8 v18

v1 v2 v3

v4 v5 v6

v7 v8 v9 vertical edge filter

f x f x 3

*

ReLUo1 o2 o3

o4 o5 o6

o7 o8 o9( + b1 (

ReLU

o10 o11 o12

o4 o5 o15

o7 o8 o18( + b2 (x

n + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

Convolutional Neural Networkx

n + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

32 x 32 x 3

f = 3s = 1p = 0

10 filters30 x 30 x 10

f = 5s = 2p = 0


f = 5s = 2p = 0


1000

0 / 1

logisticoutput

Convolutional Neural Networkx

n + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

32 x 32 x 3

f = 3s = 1p = 0


f = 5s = 2p = 0


f = 5s = 2p = 0


1000

softmax output

Max Pooling

1 5

10 8

6 -1

2 3

4 1

0 9

2 -5

-1 0

Max Pooling

1 5

10 8

6 -1

2 3

4 1

0 9

2 -5

-1 0

10

Max Pooling

1 5

10 8

6 -1

2 3

4 1

0 9

2 -5

-1 0

10 6

Max Pooling

1 5

10 8

6 -1

2 3

4 1

0 9

2 -5

-1 0

10 6

9

Max Pooling

1 5

10 8

6 -1

2 3

4 1

0 9

2 -5

-1 0

10 6

9 2

Max Pooling

1 5

10 8

6 -1

2 3

4 1

1 9

2 -5

-1 6

10

10

1

6

1

5 8 -1 2 -7

xn + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

f = 3s = 1p = 0

Max Pooling

1 5

10 8

6 -1

2 3

4 1

1 9

2 -5

-1 6

10 8

10

1

6

1

5 8 -1 2 -7

xn + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

f = 3s = 1p = 0

-75 8 -1 2

10

1

6

1

Max Pooling

1 5

10 8

6 -1

2 3

4 1

1 9

2 -5

-1 6

10 8 10

xn + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

f = 3s = 1p = 0

-75 8 -1 2

10

1

6

1

Max Pooling

1 5

10 8

6 -1

2 3

4 1

1 9

2 -5

-1 6

10 8

10

10

xn + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

f = 3s = 1p = 0

-75 8 -1 2

10

1

6

1

Max Pooling

1 5

10 8

6 -1

2 3

4 1

1 9

2 -5

-1 6

10 8

10 9

10

xn + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

f = 3s = 1p = 0

-75 8 -1 2

10

1

6

1

Max Pooling

1 5

10 8

6 -1

2 3

4 1

1 9

2 -5

-1 6

10 8

10 9

10

6

xn + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

f = 3s = 1p = 0

-75 8 -1 2

10

1

6

1

Max Pooling

1 5

10 8

6 -1

2 3

4 1

1 9

2 -5

-1 6

10 8

10 9

10

6

9

xn + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

f = 3s = 1p = 0

-75 8 -1 2

10

1

6

1

Max Pooling

1 5

10 8

6 -1

2 3

4 1

1 9

2 -5

-1 6

10 8

10 9

10

6

9 9

xn + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

f = 3s = 1p = 0

-75 8 -1 2

10

1

6

1

Max Pooling

1 5

10 8

6 -1

2 3

4 1

1 9

2 -5

-1 6

10 8

10 9

10

6

9 9 6

xn + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

f = 3s = 1p = 0

Average Pooling

1 5

10 8

6 -1

2 3

4 1

0 9

2 -5

-1 0

6 2.5

3.5 -1

LeNet - 5

f = 5s = 1p = 0

6 filters

f = 2s = 2

maxpooling

xn + 2p -f

s+ 1⎩ ⎩ n + 2p -f

s+ 1⎩ ⎩

⎨layer 1

f = 5s = 1p = 0

16 filters

f = 2s = 2

maxpooling

⎨layer 2

7

Presented by Mohammad Nayeem Teli€¦ · DeepFace: Closing the Gap to Human-Level Performance in...

Documents

Transcript of Presented by Mohammad Nayeem Teli€¦ · DeepFace: Closing the Gap to Human-Level Performance in...