Slides credited from Mark Chang & Hung-Yi Lee
miulab/s107-adl/doc/190507_CNN.pdf

Transcript of 190507_CNN.pdf

Page 1

Slides credited from Mark Chang & Hung-Yi Lee

Page 2

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• More Applications

Page 4

Image Recognition

http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

Page 5

Why CNN for Image

• Some patterns are much smaller than the whole image

A neuron does not have to see the whole image to discover the pattern.

“beak” detector

Connecting to a small region requires fewer parameters.
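To make the parameter saving concrete, here is a small sketch (the image and region sizes are illustrative assumptions, not from the slides) comparing a neuron that sees the whole image with one that connects only to a small region:

```python
# Parameter count: a neuron seeing the whole image vs. only a small region.
# Sizes are illustrative assumptions, not from the slides.
image_h, image_w = 100, 100      # whole image
region_h, region_w = 3, 3        # small region, e.g. a 3x3 "beak" detector

full_params = image_h * image_w      # one weight per input pixel
local_params = region_h * region_w   # one weight per pixel in the region

print(full_params, local_params)  # 10000 9
```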

Page 6

Why CNN for Image

• The same patterns appear in different regions.

“upper-left beak” detector

“middle beak” detector

They can use the same set of parameters.

Do almost the same thing

Page 7

Why CNN for Image

• Subsampling the pixels will not change the object

subsampling: bird → bird

We can subsample the pixels to make the image smaller.

Fewer parameters for the network to process the image

Page 8

Image Recognition

Page 9

Local Connectivity

Neurons connect to a small region

Page 10

Parameter Sharing

• The same feature in different positions

Neurons share the same weights

Page 11

Parameter Sharing

• Different features in the same position

Neurons have different weights

Page 12

Convolutional Layers

(Figure: input and output volumes with width, height, and depth; the filter weights are shared across spatial positions.)

Page 13

Convolutional Layers

(Figure: inputs a1–a3 at depth = 1 feed two shared-weight filters, producing outputs b1, b2 and c1, c2 at depth = 2.)

Page 14

Convolutional Layers

(Figure: a depth = 2 input (a1–a3, b1–b3) feeds a depth = 2 output (c1, c2 and d1, d2); each output filter spans the full input depth.)

Page 16

Convolutional Layers

(Figure: nodes A, B, C in one layer connect to nodes A, B, C, D in the next.)

Page 17

Hyper-parameters of CNN

• Stride

• Padding

(Figure: stride = 1 vs. stride = 2; padding = 0 adds no border zeros, padding = 1 pads a border of zeros.)

Page 18

Example

Input Volume (7x7x3)

Filter (3x3x3)

Stride = 2

Padding = 1

Output Volume (3x3x2)

http://cs231n.github.io/convolutional-networks/

Page 19

Convolutional Layers

http://cs231n.github.io/convolutional-networks/

Page 22

Relationship with Convolution

Page 26

Nonlinearity

• Rectified Linear Unit (ReLU)

(Figure: the ReLU activation plotted against its input n.)
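ReLU simply zeroes out negative inputs while passing positive ones through; a one-line sketch:

```python
def relu(n):
    # ReLU: pass positive values through, clamp negatives to zero.
    return max(0.0, n)

print([relu(n) for n in (-2.0, -0.5, 0.0, 1.5)])  # [0.0, 0.0, 0.0, 1.5]
```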

Page 27

Why ReLU?

• Easy to train

• Avoids the vanishing gradient problem

(Figure: the sigmoid saturates, so its gradient ≈ 0; ReLU does not saturate.)

Page 28

Why ReLU?

• Biological reason

(Figure: a neuron's output v over time t: strong stimulation produces a response, weak stimulation produces none, resembling ReLU.)

Page 29

Pooling Layer

Input (4x4, depth = 1):

1 3 2 4
5 7 6 8
0 0 3 3
5 5 0 0

Maximum pooling (2x2, no overlap):

7 8
5 3

Average pooling (2x2, no overlap):

4 5
2.5 1.5

Max(1,3,5,7) = 7; Avg(1,3,5,7) = 4; Max(0,0,5,5) = 5

Pooling has no weights.
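The 2x2, non-overlapping pooling above can be reproduced directly; a plain-Python sketch of the slide's example:

```python
grid = [[1, 3, 2, 4],
        [5, 7, 6, 8],
        [0, 0, 3, 3],
        [5, 5, 0, 0]]

def pool(grid, reduce_fn, size=2):
    # Non-overlapping size x size pooling; no weights involved.
    out = []
    for i in range(0, len(grid), size):
        row = []
        for j in range(0, len(grid[0]), size):
            block = [grid[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(reduce_fn(block))
        out.append(row)
    return out

max_pooled = pool(grid, max)
avg_pooled = pool(grid, lambda b: sum(b) / len(b))
print(max_pooled)  # [[7, 8], [5, 3]]
print(avg_pooled)  # [[4.0, 5.0], [2.5, 1.5]]
```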

Page 30

Why “Deep” Learning?

Page 31

Visual Perception of Computer

(Figure: Input Layer → Convolutional Layer → Pooling Layer → Convolutional Layer → Pooling Layer, with the receptive fields of each stage.)

Page 32

Fully-Connected Layer

• Fully-connected layers: global feature extraction

• Softmax layer: classifier

(Figure: Input Image → Convolutional Layer → Pooling Layer → Convolutional Layer → Pooling Layer → Fully-Connected Layer → Softmax Layer → Class Label, e.g. “5” or “7”.)

Page 33

Training

• Forward propagation

(Figure: forward propagation through neurons n1 and n2.)

Page 34

Training

• Update weights

(Figure: the weights between n1 and n2 are updated using the gradient of the cost function.)

Page 36

Training

• Propagate to the previous layer

(Figure: the error propagates from n2 back to n1 via the cost function.)

Page 37

Training Convolutional Layers

• Example:

(Figure: a convolutional layer maps inputs a1–a3 to outputs b1, b2.)

Page 38

Training Convolutional Layers

• Forward propagation

(Figure: a1–a3 propagate forward through the convolutional layer to b1, b2.)

Page 39

Training Convolutional Layers

• Update weights

(Figure: the cost function's gradients with respect to the shared weights, over inputs a1–a3 and outputs b1, b2.)

Page 43

Training Convolutional Layers

• Propagate to the previous layer

(Figure: the error at b1, b2 propagates back to a1–a3 through the shared weights.)

Page 45

Max-Pooling Layers during Training

• Pooling layers have no weights

• No need to update weights

(Figure: max-pooling over a1–a3 produces b1, b2.)

Page 46

Max-Pooling Layers during Training

• Propagate to the previous layer

(Figure: the gradient flows back only to the input that attained the max; the other inputs receive zero gradient.)

Page 47

Max-Pooling Layers during Training

• What if a1 = a2?
◦ Choose the node with the smaller index

(Figure: tie-breaking when two inputs to the max-pool are equal.)
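Routing the gradient to the max input, with ties broken toward the smaller index, can be sketched in a few lines (Python's `max` keeps the first maximal element, which is exactly the smaller-index rule):

```python
def maxpool_forward_backward(inputs, upstream_grad):
    # Forward: the output is the max of the inputs.
    # Backward: all of the upstream gradient goes to the max input;
    # max() over indices keeps the first (smallest-index) maximum on ties.
    winner = max(range(len(inputs)), key=lambda i: inputs[i])
    grads = [0.0] * len(inputs)
    grads[winner] = upstream_grad
    return inputs[winner], grads

# Tie case: a1 = a2 = 3.0 -> the gradient goes to index 0, the smaller index.
out, grads = maxpool_forward_backward([3.0, 3.0, 1.0], upstream_grad=1.0)
print(out, grads)  # 3.0 [1.0, 0.0, 0.0]
```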

Page 48

Avg-Pooling Layers during Training

• Pooling layers have no weights

• No need to update weights

(Figure: average pooling over a1–a3 produces b1, b2.)

Page 49

Avg-Pooling Layers during Training

• Propagate to the previous layer

(Figure: each pooled input receives an equal share of the gradient.)

Page 50

ReLU during Training

(Figure: ReLU passes the gradient through where its input n is positive and blocks it otherwise.)

Page 51

Training CNN

Page 52

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• More Applications

Page 53

LeNet (1998)

Yann LeCun http://yann.lecun.com/exdb/lenet/

http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf

Page 54

ImageNet Challenge (2010-2017)

• ImageNet Large Scale Visual Recognition Challenge
◦ 1000 categories
◦ Training: 1,200,000
◦ Validation: 50,000
◦ Testing: 100,000

http://vision.stanford.edu/Datasets/collage_s.png
http://image-net.org/challenges/LSVRC/

Page 55

ImageNet Challenge (2010-2017)

https://chtseng.wordpress.com/2017/11/20/ilsvrc-歷屆的深度學習模型

Page 56

AlexNet (2012)

• The resurgence of deep learning
◦ ReLU, dropout, image augmentation, max pooling

Alex Krizhevsky & Geoffrey Hinton

http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

Page 57

VGGNet (2014)

D: VGG16; E: VGG19. All filters are 3x3.

https://arxiv.org/abs/1409.1556

Page 58

VGGNet

• More layers with smaller (3x3) filters is better

• More non-linearity, fewer parameters

One 5x5 filter:
• Parameters: 5x5 = 25
• Non-linearities: 1

Two 3x3 filters:
• Parameters: 3x3x2 = 18
• Non-linearities: 2
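The slide's counts check out: two stacked 3x3 filters cover the same 5x5 receptive field with fewer parameters (counts per input/output channel pair, biases ignored):

```python
# One 5x5 filter vs. a stack of two 3x3 filters: same 5x5 receptive field,
# fewer parameters and more non-linearities (per-channel counts, no biases).
one_5x5 = 5 * 5       # 25 parameters, 1 non-linearity
two_3x3 = 3 * 3 * 2   # 18 parameters, 2 non-linearities
print(one_5x5, two_3x3)  # 25 18
```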

Page 59

VGG 19

depth=64, 3x3 conv: conv1_1, conv1_2
maxpool

depth=128, 3x3 conv: conv2_1, conv2_2
maxpool

depth=256, 3x3 conv: conv3_1, conv3_2, conv3_3, conv3_4
maxpool

depth=512, 3x3 conv: conv4_1, conv4_2, conv4_3, conv4_4
maxpool

depth=512, 3x3 conv: conv5_1, conv5_2, conv5_3, conv5_4
maxpool

FC1, FC2 (size=4096)

softmax (size=1000)

Page 60

GoogLeNet (2014)

• Paper: http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf

• A 22-layer deep network built from Inception modules

Page 61

Inception Module

• Which filter size is best? 3x3? 5x5?

• Use them all, and combine the outputs

Page 62

Inception Module

(Figure: the previous layer feeds four parallel branches (1x1 convolution, 3x3 convolution, 5x5 convolution, 3x3 max-pooling) whose outputs are concatenated along the filter dimension.)

Page 63

Inception Module with Dimension Reduction

• Use 1x1 filters to reduce dimension

Page 64

Inception Module with Dimension Reduction

(Figure: a 1x1 convolution with weights 1x1x256x128 maps an input of size 1x1x256 (depth 256) to a reduced output of size 1x1x128 (depth 128).)
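Counting weights shows why the 1x1 reduction helps; the 256-to-128 reduction is from the slide, while the follow-on 3x3 comparison is an added illustration under assumed sizes:

```python
# 1x1 convolution reducing depth 256 -> 128 (weights 1x1x256x128, as on the slide).
in_depth, out_depth = 256, 128
weights_1x1 = 1 * 1 * in_depth * out_depth
print(weights_1x1)  # 32768

# Added illustration (not on the slide): a 3x3 conv producing depth 256 is far
# cheaper when it runs on the reduced depth-128 input instead of depth 256.
direct_3x3 = 3 * 3 * 256 * 256
reduced_3x3 = weights_1x1 + 3 * 3 * 128 * 256
print(direct_3x3, reduced_3x3)  # 589824 327680
```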

Page 65

ResNet (2015)

• Residual Networks with 152 layers

https://arxiv.org/abs/1512.03385

Page 66

ResNet

• Residual learning: a building block

(Figure: a residual building block; the input x skips over the residual function F(x) and is added to its output.)
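Reduced to its essence, a residual block computes y = F(x) + x; a minimal numeric sketch (the toy function F below stands in for the stacked conv layers, which it is not):

```python
def residual_block(x, f):
    # y = F(x) + x: the block only has to learn the residual F(x),
    # and the identity shortcut lets gradients flow straight through.
    return [fx + xi for fx, xi in zip(f(x), x)]

# Toy residual function standing in for the stacked conv layers.
f = lambda x: [2.0 * v for v in x]
y = residual_block([1.0, 2.0], f)
print(y)  # [3.0, 6.0]
```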

Page 67

Residual Learning with Dimension Reduction

• Using 1x1 filters

Page 68

Open Images Extended - Crowdsourced

https://ai.google/tools/datasets/open-images-extended-crowdsourced/

Page 69

Pretrained Model Download

• http://www.vlfeat.org/matconvnet/pretrained/
◦ AlexNet: http://www.vlfeat.org/matconvnet/models/imagenet-matconvnet-alex.mat
◦ VGG19: http://www.vlfeat.org/matconvnet/models/imagenet-vgg-verydeep-19.mat
◦ GoogLeNet: http://www.vlfeat.org/matconvnet/models/imagenet-googlenet-dag.mat
◦ ResNet: http://www.vlfeat.org/matconvnet/models/imagenet-resnet-152-dag.mat

Page 70

Using Pretrained Model

• Lower layers: edges, blobs, textures (more general)

• Higher layers: object parts (more specific)

http://vision03.csail.mit.edu/cnn_art/data/single_layer.png

Page 71

Transfer Learning

• The pretrained model is trained on ImageNet

• If your data is similar to the ImageNet data

◦ Fix all CNN Layers

◦ Train FC layer

(Figure: the convolutional layers pretrained on labeled ImageNet data stay fixed; only the FC layer is retrained on your data.)

Page 72

Transfer Learning

• The pretrained model is trained on ImageNet

• If your data is far different from the ImageNet data

◦ Fix lower CNN Layers

◦ Train higher CNN and FC layers

(Figure: the lower convolutional layers stay fixed; the higher convolutional layers and the FC layer are retrained on your data.)

Page 73

Transfer Learning Example

daisy: 634 photos
dandelion: 899 photos
roses: 642 photos
tulips: 800 photos
sunflowers: 700 photos

http://download.tensorflow.org/example_images/flower_photos.tgz
https://www.tensorflow.org/versions/r0.11/how_tos/style_guide.html

Page 74

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• More Applications

Page 75

Visualizing CNN

http://vision03.csail.mit.edu/cnn_art/data/single_layer.png

Page 76

Visualizing CNN

(Figure: a flower image and random noise are each passed through the CNN, and their filter responses are compared.)

Page 77

Gradient Ascent

• Magnify the filter response

(Figure: starting from random noise, a score is computed from the filter response; following the gradient moves the input from a lower score to a higher score.)

Page 78

Gradient Ascent

• Magnify the filter response

(Figure: the input is repeatedly updated by adding the gradient scaled by a learning rate, raising the score.)
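The update loop on this slide, x ← x + η·∂score/∂x, can be sketched with a toy differentiable score (the quadratic score and the learning rate are assumptions for illustration, not the actual filter response):

```python
# Gradient ascent on a toy score s(x) = -(x - 3)^2, maximized at x = 3.
# ds/dx = -2 * (x - 3); start from "random noise" x = 0.
x = 0.0
learning_rate = 0.1
for _ in range(100):
    grad = -2.0 * (x - 3.0)
    x = x + learning_rate * grad   # move toward a higher score
print(round(x, 3))  # 3.0
```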

Page 79

Gradient Ascent

Page 80

Different Layers of Visualization

(Figure: images generated by maximizing responses at different CNN layers.)

Page 81

Multiscale Image Generation

(Figure: alternate visualization and resizing across scales: visualize, resize, visualize, resize, visualize.)

Page 82

Deep Dream

• Given a photo, the machine adds what it sees

http://deepdreamgenerator.com/

Page 83

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• More Applications

Page 84

The Mechanism of Painting

(Figure: an artist's brain turns a scene and a style into an artwork; a computer uses neural networks in the same role.)

https://arxiv.org/abs/1508.06576

Page 85

Content Generation

(Figure: the artist receives neural stimulation from the content, draws on the canvas, and minimizes the difference between canvas and content.)

Page 86

Content Generation

(Figure: VGG19 filter responses (width × height × depth) are computed for both the content image and the canvas; the canvas's pixel colors are updated to minimize the difference, producing the result.)

Page 87

Content Generation

Page 88

Style Generation

(Figure: VGG19 filter responses (width × height × depth) of the artwork are turned into a Gram matrix (depth × depth); filter responses are position-dependent, while the Gram matrix is position-independent.)
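The Gram matrix sums over the spatial (width × height) axis, leaving a depth × depth matrix of filter co-activations, which is why it is position-independent; a plain-Python sketch with toy response values (the values are assumptions for illustration):

```python
def gram_matrix(responses):
    # responses: a list over spatial positions, each a depth-length vector.
    # G[i][j] = sum over positions of response_i * response_j (depth x depth):
    # the spatial axis is summed out, so G is position-independent.
    depth = len(responses[0])
    return [[sum(r[i] * r[j] for r in responses) for j in range(depth)]
            for i in range(depth)]

# Toy filter responses at 3 spatial positions, depth 2.
resp = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
G = gram_matrix(resp)
print(G)  # [[2.0, 1.0], [1.0, 5.0]]
# Shuffling the spatial positions leaves G unchanged.
print(gram_matrix(list(reversed(resp))) == G)  # True
```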

Page 89

Style Generation

(Figure: Gram matrices from the VGG19 filter responses of the style image and the canvas are compared; the canvas's pixel colors are updated to minimize the difference, producing the result.)

Page 90

Style Generation

Page 91

Artwork Generation

(Figure: the generated artwork matches content via VGG19 filter responses and style via the Gram matrix.)

Page 92

Artwork Generation

Page 93

Content v.s. Style

(Figure: results for content-to-style weight ratios 0.15, 0.05, 0.02, and 0.007.)

Page 94

Neural Doodle

• Image analogy

https://arxiv.org/abs/1603.01768

Page 95

Neural Doodle

• Image analogy

Scary link, proceed with caution! https://raw.githubusercontent.com/awentzonline/image-analogies/master/examples/images/trump-image-analogy.jpg

Page 96

Real-time Texture Synthesis

https://arxiv.org/pdf/1604.04382v1.pdf

Page 97

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• More Applications

Page 98

More Application: Playing Go

(Figure: a CNN takes the record of previous plays as input and predicts the next move.)

Target: “tengen (天元)” = 1, else = 0
Target: “5-5 (五之5)” = 1, else = 0

Training: Black: 5-5 (5之五), White: tengen (天元), Black: 5-5 (五之5), …

Page 99

Why CNN for Go?

• Some patterns are much smaller than the whole image

• The same patterns appear in different regions (AlphaGo uses 5x5 filters for its first layer)

Page 100

Why CNN for Go?

• Subsampling the pixels will not change the object

AlphaGo does not use max pooling … how to explain this?

Page 101

More Applications: Sentence Encoding

https://arxiv.org/abs/1404.2188

Page 102

Ambiguity in Natural Language

http://3rd.mafengwo.cn/travels/info_weibo.php?id=2861280
http://www.appledaily.com.tw/realtimenews/article/new/20151006/705309/

Page 103

Element-wise 1D Operations on Word Vectors

• 1D convolution or 1D pooling

(Figure: each word in “This is a …” is represented by a word vector; the 1D operation slides over the sequence.)
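A 1D convolution over word vectors slides a window along the sentence, combining each group of adjacent vectors dimension by dimension; a minimal element-wise sketch (the vectors and kernel are toy assumptions):

```python
def conv1d(vectors, kernel):
    # Element-wise 1D convolution over a sequence of word vectors:
    # each output position combines a window of len(kernel) adjacent
    # word vectors, one dimension at a time.
    dim = len(vectors[0])
    out = []
    for t in range(len(vectors) - len(kernel) + 1):
        out.append([sum(kernel[k] * vectors[t + k][d] for k in range(len(kernel)))
                    for d in range(dim)])
    return out

# Toy 2-dim word vectors for "This is a" and a width-2 averaging kernel.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = conv1d(words, kernel=[0.5, 0.5])
print(result)  # [[0.5, 0.5], [0.5, 1.0]]
```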

Page 104

CNN Model

104

[Figure: the sentence "This is a dog" encoded by stacked 1D convolutions: conv1 filters over adjacent words, conv2 over their outputs, conv3 on top; different conv layers at each level]

Page 105:

CNN with Max-Pooling Layers

• Similar to a syntax tree

• But no human-labeled syntax trees are needed

[Figure: conv1 → pool1 → conv2 stacks over "This is a dog"; max pooling selects which adjacent phrases are merged at each level, like a syntax tree]

105

Page 106:

Sentiment Analysis by CNN

• Use a softmax layer to classify the sentiment

[Figure: a CNN (conv1, pool1, conv2, softmax) maps "This movie is awesome" to positive and "This movie is awful" to negative]

106

Page 107:

Sentiment Analysis by CNN

• Build the “correct syntax tree” by training

[Figure: the network wrongly outputs "negative" for "This movie is awesome"; the error is back-propagated through the softmax, conv, and pooling layers]

107

Page 108:

Sentiment Analysis by CNN

• Build the “correct syntax tree” by training

[Figure: the weights are updated so that the output for "This movie is awesome" moves from "negative" to "positive"]

108
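A minimal forward-pass sketch of such a sentiment CNN in NumPy. The weights and embeddings are random toy values, the layer sizes are illustrative rather than from the slides, and backpropagation is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

X = rng.normal(size=(5, 4))             # 4 words, 5-dim embeddings (toy values)
Wc = rng.normal(size=(8, 5, 2)) * 0.1   # conv1: 8 filters of width 2

# Convolution over adjacent word pairs, then tanh non-linearity
n = X.shape[1]
conv = np.array([[np.tanh(np.sum(Wc[f] * X[:, i:i + 2])) for i in range(n - 1)]
                 for f in range(Wc.shape[0])])            # shape (8, 3)

pooled = conv.max(axis=1)               # max pooling -> fixed-size (8,) vector
Wo = rng.normal(size=(2, 8)) * 0.1      # output layer: positive / negative
probs = softmax(Wo @ pooled)
print(probs.shape)                      # (2,), probabilities summing to 1
```

Training would compare `probs` against the correct label and update `Wc` and `Wo` by backpropagation, as the slides describe.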

Page 109:

Multiple Filters

• Richer features than an RNN

[Figure: multiple filters per layer (filter11, filter12, filter13 at the first layer; filter21, filter22, filter23 at the second) applied over "This is a"]

109

Page 110:

Resizing Sentence

• Images can be easily resized

• Sentences can't be easily resized

[Example: "resizing" 全台灣最高樓在台北 (The tallest building in all of Taiwan is in Taipei) yields 全台灣最高的高樓在台北市 or 全台灣最高樓在台北市 when stretched and 台灣最高樓在台北 when shrunk; unlike an image, the wording changes]

110

Page 111:

Various Input Size

• Convolutional layers and pooling layers ◦ can handle inputs of various sizes

[Figure: the same conv1/pool1 layers applied to "This is a dog" (4 words) and "the dog run" (3 words)]

111

Page 112:

Various Input Size

• Fully-connected layers and softmax layers ◦ need fixed-size input

[Figure: fc → softmax applied to "The dog run" and "This is a dog"; both must first be reduced to a fixed-size vector]

112
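The contrast on these two slides can be sketched as follows: the same convolution and pooling weights map sentences of different lengths to one fixed-size vector, which a fully-connected/softmax layer can then consume. A toy NumPy sketch with made-up sizes:

```python
import numpy as np

def conv_pool(X, W):
    """Width-2 convolution over word positions followed by max pooling.
    X: (dim, n_words) sentence matrix; W: (n_filters, dim, 2) filters.
    Output size depends only on n_filters, not on n_words."""
    n = X.shape[1]
    feats = np.array([[np.sum(W[f] * X[:, i:i + 2]) for i in range(n - 1)]
                      for f in range(W.shape[0])])
    return feats.max(axis=1)

rng = np.random.default_rng(1)
W = rng.normal(size=(6, 5, 2))                 # shared conv weights
v3 = conv_pool(rng.normal(size=(5, 3)), W)     # e.g. "the dog run"  (3 words)
v4 = conv_pool(rng.normal(size=(5, 4)), W)     # e.g. "This is a dog" (4 words)
print(v3.shape, v4.shape)                      # (6,) (6,) -> same size for fc
```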

Page 113:

k-max Pooling

• Choose the k largest values

• Preserve the order of the input values

• Variable-size input, fixed-size output

3-max pooling examples:
[13 4 1 7 8] → [13 7 8]
[12 5 21 15 7 4 9] → [12 21 15]

113
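A small NumPy sketch of k-max pooling, keeping the k largest values in their original order (the function name is mine):

```python
import numpy as np

def k_max_pooling(x, k):
    """Keep the k largest values of x, preserving their input order."""
    idx = np.sort(np.argsort(x)[-k:])   # top-k indices, re-sorted by position
    return [int(x[i]) for i in idx]

# The two examples from the slide
print(k_max_pooling(np.array([13, 4, 1, 7, 8]), 3))          # [13, 7, 8]
print(k_max_pooling(np.array([12, 5, 21, 15, 7, 4, 9]), 3))  # [12, 21, 15]
```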

Page 114:

Wide Convolution

• Ensures that all weights reach the entire sentence

[Figure: wide convolution (zero-padded, filter windows extend past the sentence ends) vs. narrow convolution (filter stays inside the sentence)]

114
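The difference shows up directly in the output lengths: narrow convolution keeps the filter inside the input (n − m + 1 outputs), while wide convolution zero-pads the input so every filter weight touches every position (n + m − 1 outputs). NumPy's `convolve` modes illustrate this:

```python
import numpy as np

x = np.arange(7.0)               # input of length n = 7
w = np.array([1.0, 2.0, 3.0])    # filter of width m = 3

narrow = np.convolve(x, w, mode="valid")  # n - m + 1 = 5 outputs
wide = np.convolve(x, w, mode="full")     # n + m - 1 = 9 outputs (zero-padded)
print(len(narrow), len(wide))             # 5 9
```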

Page 115:

Dynamic k-max Pooling

Wide convolution & dynamic k-max pooling

115
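In the cited DCNN paper (Kalchbrenner et al., 2014), the pooling parameter at intermediate layers scales with sentence length: k_l = max(k_top, ⌈((L − l)/L) · s⌉) for conv layer l out of L, sentence length s, and final pooling size k_top. A sketch of this schedule (the function name is mine):

```python
import math

def dynamic_k(l, L, s, k_top):
    """k for conv layer l out of L, given sentence length s and final k_top."""
    return max(k_top, math.ceil((L - l) / L * s))

# e.g. L = 3 conv layers, k_top = 3, sentence of s = 18 words
print([dynamic_k(l, 3, 18, 3) for l in (1, 2, 3)])   # [12, 6, 3]
```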

Page 116:

CNN for Sentence Classification

• Word vectors pretrained by word2vec

• Static & non-static channels ◦ Static: fix the word vectors during training

◦ Non-static: update the word vectors during training

http://www.aclweb.org/anthology/D14-1181

116

Page 117:

Concluding Remarks

117

[Figure: full CNN pipeline: Input Image → Input Layer → Convolutional Layer → Pooling Layer → Convolutional Layer → Pooling Layer → Fully-Connected Layer → Softmax Layer → Class Label (e.g. "5" or "7")]