Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view...

104
Convolutional Neural Networks CS194: Image Manipulation, Comp. Vision, and Comp. Photo Alexei Efros, UC Berkeley, Spring 2020

Transcript of Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view...

Page 1: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Neural Networks

CS194: Image Manipulation, Comp. Vision, and Comp. PhotoAlexei Efros, UC Berkeley, Spring 2020

Page 2: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

image X

Convolutional Neural

Network“Penguin”

label Y

Neural Nets: a particularly useful Black Box

Page 3: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

image X

“Penguin”

label Y

Classic Object Recognition

Edges

Texture

Colors

Segments

Parts

Slide by Philip Isola

Feature extractors

Classifier

Page 4: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

image X

“Penguin”

label Y

Classic Object Recognition

Edges

Texture

Colors

Segments

Parts

Slide by Philip Isola

Feature extractors

Classifier

Learned

Page 5: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

image X label Y

Learning Features

Edges

Texture

Colors

Feature extractorsLearned

Page 6: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

image X

“Penguin”

label Y

Neural Network:algorithm + feature + data!

Slide by Philip Isola

Learned

Page 7: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Vanilla (fully-connected) Neural Networks

Page 8: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

8

Example: 200x200 image40K hidden units

~2B parameters!!!

- Spatial correlation is local- Waste of resources + we have not enough

training samples anyway..

Fully Connected Layer

Ranzato

Page 9: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

9

Locally Connected Layer

Example: 200x200 image40K hidden unitsFilter size: 10x10

4M parameters

Ranzato

Note: This parameterization is good when input image is registered (e.g., face recognition).

Page 10: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

10

STATIONARITY? Statistics is similar at different locations

Ranzato

Locally Connected Layer

Example: 200x200 image40K hidden unitsFilter size: 10x10

4M parameters

Page 11: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

11

Convolutional Layer

Share the same parameters across different locations (assuming input is stationary):Convolutions with learned kernels

Ranzato

Page 12: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 13: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 14: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 15: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 16: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 17: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 18: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 19: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 20: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 21: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 22: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 23: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 24: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 25: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 26: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 27: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Convolutional Layer

Ranzato

Page 28: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

28

Learn multiple filters.

E.g.: 200x200 image100 FiltersFilter size: 10x10

10K parameters

Ranzato

Convolutional Layer

Page 29: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Fei-Fei Li & Andrej Karpathy Lecture 7 - 21 Jan 201529

before:

now:

input layer hidden layer

output layer

Page 30: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201630

32

32

3

Convolution Layer32x32x3 image

width

height

depth

Page 31: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201631

32

32

3

Convolution Layer

5x5x3 filter

32x32x3 image

Convolve the filter with the imagei.e. “slide over the image spatially, computing dot products”

Page 32: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201632

32

32

3

Convolution Layer

5x5x3 filter

32x32x3 image

Convolve the filter with the imagei.e. “slide over the image spatially, computing dot products”

Filters always extend the full depth of the input volume

Page 33: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201633

32

32

3

Convolution Layer32x32x3 image5x5x3 filter

1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image(i.e. 5*5*3 = 75-dimensional dot product + bias)

Page 34: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201634

32

32

3

Convolution Layer32x32x3 image5x5x3 filter

convolve (slide) over all spatial locations

activation map

1

28

28

Page 35: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201635

32

32

3

Convolution Layer32x32x3 image5x5x3 filter

convolve (slide) over all spatial locations

activation maps

1

28

28

consider a second, green filter

Page 36: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201636

32

32

3

Convolution Layer

activation maps

6

28

28

For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps:

We stack these up to get a “new image” of size 28x28x6!

Page 37: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201637

convolving the first filter in the input gives the first slice of depth in output volume

Page 38: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201638

Preview: ConvNet is a sequence of Convolution Layers, interspersed with activation functions

32

32

3

28

28

6

CONV,ReLUe.g. 6 5x5x3 filters

Page 39: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201639

Preview: ConvNet is a sequence of Convolutional Layers, interspersed with activation functions

32

32

3

CONV,ReLUe.g. 6 5x5x3 filters 28

28

6

CONV,ReLUe.g. 10 5x5x6 filters

CONV,ReLU

….

10

24

24

Page 40: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201640

A closer look at spatial dimensions:

32

32

3

32x32x3 image5x5x3 filter

convolve (slide) over all spatial locations

activation map

1

28

28

Page 41: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201641

7x7 input (spatially)assume 3x3 filter

7

7

A closer look at spatial dimensions:

Page 42: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201642

7x7 input (spatially)assume 3x3 filter

7

7

A closer look at spatial dimensions:

Page 43: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201643

7x7 input (spatially)assume 3x3 filter

7

7

A closer look at spatial dimensions:

Page 44: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201644

7x7 input (spatially)assume 3x3 filter

7

7

A closer look at spatial dimensions:

Page 45: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201645

7x7 input (spatially)assume 3x3 filter

=> 5x5 output

7

7

A closer look at spatial dimensions:

Page 46: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201646

7x7 input (spatially)assume 3x3 filterapplied with stride 2

7

7

A closer look at spatial dimensions:

Page 47: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201647

7x7 input (spatially)assume 3x3 filterapplied with stride 2

7

7

A closer look at spatial dimensions:

Page 48: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201648

7x7 input (spatially)assume 3x3 filterapplied with stride 2=> 3x3 output!

7

7

A closer look at spatial dimensions:

Page 49: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201649

7x7 input (spatially)assume 3x3 filterapplied with stride 3?

7

7

A closer look at spatial dimensions:

Page 50: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201650

7x7 input (spatially)assume 3x3 filterapplied with stride 3?

7

7

A closer look at spatial dimensions:

doesn’t fit! cannot apply 3x3 filter on 7x7 input with stride 3.

Page 51: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201651

N

NF

F

Output size:(N - F) / stride + 1

e.g. N = 7, F = 3:stride 1 => (7 - 3)/1 + 1 = 5stride 2 => (7 - 3)/2 + 1 = 3stride 3 => (7 - 3)/3 + 1 = 2.33 :\

Page 52: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201652

In practice: Common to zero pad the border0 0 0 0 0 0

0

0

0

0

e.g. input 7x73x3 filter, applied with stride 1 pad with 1 pixel border => what is the output?

(recall:)(N - F) / stride + 1

Page 53: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201653

In practice: Common to zero pad the border

e.g. input 7x73x3 filter, applied with stride 1 pad with 1 pixel border => what is the output?

7x7 output!

0 0 0 0 0 0

0

0

0

0

Page 54: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201654

In practice: Common to zero pad the border

e.g. input 7x73x3 filter, applied with stride 1 pad with 1 pixel border => what is the output?

7x7 output!in general, common to see CONV layers with stride 1, filters of size FxF, and zero-padding with (F-1)/2. (will preserve size spatially)e.g. F = 3 => zero pad with 1

F = 5 => zero pad with 2F = 7 => zero pad with 3

0 0 0 0 0 0

0

0

0

0

Page 55: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201655

Remember back to… E.g. 32x32 input convolved repeatedly with 5x5 filters shrinks volumes spatially!(32 -> 28 -> 24 ...). Shrinking too fast is not good, doesn’t work well.

32

32

3

CONV,ReLUe.g. 6 5x5x3 filters 28

28

6

CONV,ReLUe.g. 10 5x5x6 filters

CONV,ReLU

….

10

24

24

Page 56: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201656

(btw, 1x1 convolution layers make perfect sense)

6456

561x1 CONVwith 32 filters

3256

56(each filter has size 1x1x64, and performs a 64-dimensional dot product)

Page 57: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201657

The brain/neuron view of CONV Layer

32

32

3

32x32x3 image5x5x3 filter

1 number: the result of taking a dot product between the filter and this part of the image(i.e. 5*5*3 = 75-dimensional dot product)

Page 58: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201658

The brain/neuron view of CONV Layer

32

32

3

32x32x3 image5x5x3 filter

1 number: the result of taking a dot product between the filter and this part of the image(i.e. 5*5*3 = 75-dimensional dot product)

It’s just a neuron with local connectivity...

Page 59: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201659

The brain/neuron view of CONV Layer

32

32

3

An activation map is a 28x28 sheet of neuron outputs:1. Each is connected to a small region in the input2. All of them share parameters

“5x5 filter” -> “5x5 receptive field for each neuron”28

28

Page 60: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201660

The brain/neuron view of CONV Layer

32

32

3

28

28

E.g. with 5 filters,CONV layer consists of neurons arranged in a 3D grid(28x28x5)

There will be 5 different neurons all looking at the same region in the input volume5

Page 61: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

61

Let us assume filter is an “eye” detector.

Q.: how can we make the detection robust to the exact location of the eye?

Pooling Layer

Ranzato

Page 62: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

62

By “pooling” (e.g., taking max) filterresponses at different locations we gainrobustness to the exact spatial locationof features.

Ranzato

Pooling Layer

Page 63: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201663

Pooling layer- makes the representations smaller and more manageable - operates over each activation map independently:

Page 64: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201664

1 1 2 4

5 6 7 8

3 2 1 0

1 2 3 4

Single depth slice

x

y

max pool with 2x2 filters and stride 2 6 8

3 4

MAX POOLING

Page 65: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

65

ConvNets: Typical Stage

Convol. Pooling

One stage (zoom)

courtesy ofK. Kavukcuoglu Ranzato

Page 66: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Neural network

“clown fish”

Learned

Page 67: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Deep neural network

“clown fish”

Learned

Page 68: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Computation in a neural net

Input representation

Output representation

Page 69: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Computation in a neural net

Input representation

Output representation

Page 70: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Computation in a neural net

Rectified linear unit (ReLU)

Page 71: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Computation in a neural net

Last layer

dolphin

cat

grizzly bear

angel fish

chameleon

iguana

elephant

clown fish

“clown fish”argmax

Page 72: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Learning with deep nets

“clown fish”

Learned

Page 73: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Learning with deep nets

“clown fish”

“grizzly bear”

“chameleon”

Train network to associate the right

label with each image

Page 74: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Learning with deep nets

“clown fish”

Loss

Learned

Page 75: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Loss function

“clown fish”

Loss error

Network output

dolphin

cat

grizzly bear

angel fish

chameleon

iguana

elephant

clown fish

Ground truth label

Page 76: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Loss function

“clown fish”

Loss small

Network output

dolphin

cat

grizzly bear

angel fish

chameleon

iguana

elephant

clown fish

Ground truth label

Page 77: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Loss function

“grizzly bear”

Loss large

Network output

dolphin

cat

grizzly bear

angel fish

chameleon

iguana

elephant

clown fish

Ground truth label

Page 78: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Loss function for classificationNetwork output

dolphin

cat

grizzly bear

angel fish

chameleon

clown fish

Ground truth label

iguana

elephant

Probability of the observed data under

the model

Results in learning a probability model !

Page 79: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Learning with deep nets

“clown fish”

Loss

Learned

Page 80: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Learning with deep nets

“grizzly bear”

Loss

Learned

Page 81: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Learning with deep nets

“chameleon”

Loss

Learned

Page 82: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Gradient descent

One iteration of gradient descent:

learning rate

Page 83: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Gradient descent

Page 84: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Texture synthesis by non-parametric sampling

p

Synthesizing a pixel

non-parametricsampling

Input image

[Efros & Leung 1999]

Models

Page 85: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Input partial image

Predicted color of next

pixel

Texture synthesis with a deep net

“white”

PixelCNN [van der Oord et al. 2016]

Page 86: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Texture synthesis with a deep net

“white”

[van der Oord et al. 2016]

Page 87: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Sampling

Network output

turquoise

blue

green

red

orange

gray

black

white

p

Page 88: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Sampling

Network output

p

turquoise

blue

green

red

orange

gray

black

white

prob

abilit

y

Page 89: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Sampling

Network output

prob

abilit

y

turquoise

blue

green

red

orange

gray

black

white

Page 90: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Sampling

Network output

prob

abilit

y

turquoise

blue

green

red

orange

gray

black

white

Page 91: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Sampling

Network output

turquoise

blue

green

red

orange

gray

black

white

prob

abilit

y

Page 92: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Sampling

Network output

turquoise

blue

green

red

orange

gray

black

white

prob

abilit

y

Page 93: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

“yellow”

Page 94: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Discriminative Deep Networks

96

“Rockfish”

Page 95: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Discriminative Deep Networks

97

Raw, Unlabeled Pixels

Page 96: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Generative Deep Networks

98

Raw, Unlabeled Pixels

Page 97: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Ansel Adams. Yosemite Valley Bridge.99

Page 98: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Grayscale image: L channel Color information: ab channels

abL

Zhang, Isola, Efros. Colorful Image Colorization. In ECCV, 2016.100

Page 99: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Concatenate (L,ab) channels

101

Grayscale image: L channel

Zhang, Isola, Efros. Colorful Image Colorization. In ECCV, 2016.

abL

Page 100: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Input Output Ground truth

Simple L2 regression doesn’t work

Page 101: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons
Page 102: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Better Loss Function Colors in ab space(discrete)

104

• Regression with L2 loss inadequate

• Use per-pixel multinomial classification

Page 103: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

105

Page 104: Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view of CONV Layer 32 32. 3. 28. 28. E.g. with 5 filters, CONV layer consists of neurons

Color distribution cross-entropy loss with colorfulness enhancing term.

Zhang et al. 2016

[Zhang, Isola, Efros, ECCV 2016]

Designing loss functionsInput Ground truth