Convolutional Neural Networkscs194-26/sp20/Lectures/... · 2020. 3. 14. · The brain/neuron view...

Convolutional Neural Networks

CS194: Image Manipulation, Comp. Vision, and Comp. Photo
Alexei Efros, UC Berkeley, Spring 2020

image X

Convolutional Neural Network

“Penguin”

label Y

Neural Nets: a particularly useful Black Box

image X

“Penguin”

label Y

Classic Object Recognition

Texture

Colors

Segments

Slide by Philip Isola

Feature extractors

Classifier

image X

“Penguin”

label Y

Classic Object Recognition

Texture

Colors

Segments

Feature extractors

Classifier

Learned

image X label Y

Learning Features

Texture

Colors

Feature extractors

Learned

image X

“Penguin”

label Y

Neural Network: algorithm + feature + data!

Learned

Vanilla (fully-connected) Neural Networks

Example: 200x200 image, 40K hidden units

~2B parameters!!!

- Spatial correlation is local
- Waste of resources + we do not have enough training samples anyway

Fully Connected Layer

Ranzato

Locally Connected Layer

Example: 200x200 image, 40K hidden units
Filter size: 10x10

4M parameters

Ranzato

Note: This parameterization is good when input image is registered (e.g., face recognition).

STATIONARITY? Statistics are similar at different locations

Ranzato

Convolutional Layer

Share the same parameters across different locations (assuming input is stationary):
Convolutions with learned kernels

Ranzato

Learn multiple filters.

E.g.: 200x200 image, 100 filters
Filter size: 10x10

10K parameters

Ranzato
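The three parameter counts quoted on the slides above can be checked with a few lines of arithmetic. This is a minimal sketch that ignores bias terms. The variable names are illustrative, not from the slides. The fully connected count comes out at 1.6 billion, which the slide rounds to "~2B".

```python
# Parameter counts for the three layer types on the slides (biases ignored).
image_h, image_w = 200, 200      # input image size from the slides
hidden_units = 40_000            # number of hidden units from the slides
filter_h, filter_w = 10, 10      # local filter size from the slides
num_filters = 100                # number of learned filters in the conv example

inputs = image_h * image_w       # 40,000 input pixels

# Fully connected: every hidden unit sees every pixel.
fully_connected = inputs * hidden_units
# Locally connected: every hidden unit has its own private 10x10 filter.
locally_connected = hidden_units * filter_h * filter_w
# Convolutional: each filter's weights are shared across all locations.
convolutional = num_filters * filter_h * filter_w

print(fully_connected, locally_connected, convolutional)
```

Weight sharing is what takes the count from millions to thousands: the number of parameters no longer depends on the image size at all.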

Convolutional Layer

Fei-Fei Li & Andrej Karpathy, Lecture 7, 21 Jan 2015

before:

input layer hidden layer

output layer

Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 7, 27 Jan 2016

Convolution Layer

32x32x3 image (height 32, width 32, depth 3)

Convolution Layer

5x5x3 filter

32x32x3 image

Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”

Filters always extend the full depth of the input volume

Convolution Layer

32x32x3 image
5x5x3 filter

1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias)

convolve (slide) over all spatial locations

activation map

activation maps

consider a second, green filter

Convolution Layer

activation maps

For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps:

We stack these up to get a “new image” of size 28x28x6!

convolving the first filter in the input gives the first slice of depth in output volume
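The sliding dot product described above can be sketched in plain NumPy. This is a deliberately slow illustration written with explicit loops, stride 1 and no padding, not how a real library implements it. The function name `conv_layer` and the random inputs are assumptions made for the example. As on the slides, the filter is not flipped, so strictly speaking this is cross-correlation.

```python
import numpy as np

def conv_layer(image, filters, biases):
    """Slide each filter over the image (stride 1, no padding).

    image:   (H, W, C) array
    filters: (K, F, F, C) array, K filters spanning the full input depth
    biases:  (K,) array, one bias per filter
    returns: (H - F + 1, W - F + 1, K) stack of activation maps
    """
    H, W, C = image.shape
    K, F, _, _ = filters.shape
    out = np.zeros((H - F + 1, W - F + 1, K))
    for k in range(K):
        for i in range(H - F + 1):
            for j in range(W - F + 1):
                chunk = image[i:i + F, j:j + F, :]          # F x F x C chunk
                out[i, j, k] = np.sum(chunk * filters[k]) + biases[k]
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))      # 32x32x3 input
filters = rng.standard_normal((6, 5, 5, 3))   # six 5x5x3 filters
biases = np.zeros(6)

maps = conv_layer(image, filters, biases)
print(maps.shape)  # (28, 28, 6)
```

Each slice `maps[:, :, k]` is the activation map of filter `k`, and stacking the six slices gives the 28x28x6 "new image".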

Preview: ConvNet is a sequence of Convolution Layers, interspersed with activation functions

CONV, ReLU (e.g. 6 5x5x3 filters): 32x32x3 input -> 28x28x6 output

CONV, ReLU

A closer look at spatial dimensions:

32x32x3 image
5x5x3 filter

activation map

7x7 input (spatially), assume 3x3 filter
=> 5x5 output

7x7 input (spatially), assume 3x3 filter applied with stride 2
=> 3x3 output!

7x7 input (spatially), assume 3x3 filter applied with stride 3?

doesn’t fit! cannot apply 3x3 filter on 7x7 input with stride 3.

Output size: (N - F) / stride + 1

e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 :\

In practice: Common to zero pad the border

e.g. input 7x7, 3x3 filter applied with stride 1, pad with 1 pixel border => what is the output?

(recall: (N - F) / stride + 1)

7x7 output! Padding by 1 on each side makes the effective input 9x9, so (9 - 3)/1 + 1 = 7.

In general, common to see CONV layers with stride 1, filters of size FxF, and zero-padding with (F-1)/2 (will preserve size spatially).
e.g. F = 3 => zero pad with 1
F = 5 => zero pad with 2
F = 7 => zero pad with 3
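The output-size rule and the effect of zero padding can be collected into one small helper. This is a sketch: the function name `conv_output_size` and the `pad` argument are assumptions made for the example. Padding is handled by treating the input as N + 2*pad wide, which is the standard extension of the (N - F) / stride + 1 rule.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size for an n x n input and an f x f filter.

    Raises ValueError when the filter does not tile the (padded) input
    evenly, as in the stride-3 example on the slides.
    """
    span = n + 2 * pad - f
    if span < 0 or span % stride != 0:
        raise ValueError(
            f"{f}x{f} filter with stride {stride} does not fit "
            f"a {n}x{n} input padded by {pad}"
        )
    return span // stride + 1

print(conv_output_size(7, 3, stride=1))          # 5
print(conv_output_size(7, 3, stride=2))          # 3
print(conv_output_size(7, 3, stride=1, pad=1))   # 7

# Padding with (F - 1) / 2 preserves the spatial size at stride 1.
for f in (3, 5, 7):
    assert conv_output_size(32, f, stride=1, pad=(f - 1) // 2) == 32
```

Calling `conv_output_size(7, 3, stride=3)` raises `ValueError`, matching the "doesn't fit" case.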

Remember back to… E.g. 32x32 input convolved repeatedly with 5x5 filters shrinks volumes spatially! (32 -> 28 -> 24 ...). Shrinking too fast is not good, doesn’t work well.

CONV, ReLU (e.g. 6 5x5x3 filters)

CONV, ReLU

(btw, 1x1 convolution layers make perfect sense)

56x56x64 input -> 1x1 CONV with 32 filters -> 56x56x32 output

(each filter has size 1x1x64, and performs a 64-dimensional dot product)
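A 1x1 convolution is the same small matrix multiply applied independently at every pixel, which is one way to see why it makes sense. The sketch below uses random values, the variable names are illustrative, and biases are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((56, 56, 64))   # input volume: 56x56 spatial, depth 64
w = rng.standard_normal((32, 64))       # 32 filters, each of size 1x1x64

# At every spatial location, take 32 dot products over the 64 channels.
y = x @ w.T                             # (56, 56, 64) @ (64, 32) -> (56, 56, 32)

print(y.shape)  # (56, 56, 32)
```

The spatial size is unchanged, and only the depth goes from 64 to 32.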

The brain/neuron view of CONV Layer

1 number: the result of taking a dot product between the filter and this part of the image (i.e. 5*5*3 = 75-dimensional dot product)

It’s just a neuron with local connectivity...

An activation map is a 28x28 sheet of neuron outputs:
1. Each is connected to a small region in the input
2. All of them share parameters

“5x5 filter” -> “5x5 receptive field for each neuron”

E.g. with 5 filters, CONV layer consists of neurons arranged in a 3D grid (28x28x5)

There will be 5 different neurons all looking at the same region in the input volume

Let us assume filter is an “eye” detector.

Q.: how can we make the detection robust to the exact location of the eye?

Pooling Layer

Ranzato

By “pooling” (e.g., taking max) filter responses at different locations we gain robustness to the exact spatial location of features.

Ranzato

Pooling Layer

MAX POOLING

Pooling layer
- makes the representations smaller and more manageable
- operates over each activation map independently

Single depth slice:

1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4

max pool with 2x2 filters and stride 2:

6 8
3 4
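The max pooling example can be reproduced with a short NumPy sketch. The helper name `max_pool_2x2` is made up for this example. It assumes a single depth slice with even height and width, and uses a reshape to group the pixels into non-overlapping 2x2 blocks.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on one (H, W) depth slice.

    Assumes H and W are even.
    """
    h, w = x.shape
    # blocks[i, a, j, b] == x[2*i + a, 2*j + b]
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

depth_slice = np.array([
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
])

pooled = max_pool_2x2(depth_slice)
print(pooled)
# [[6 8]
#  [3 4]]
```

For a full volume, the same function would be applied to each activation map separately, since pooling does not mix depth slices.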

ConvNets: Typical Stage

Convol. Pooling

One stage (zoom)

courtesy of K. Kavukcuoglu

Ranzato

Neural network

“clown fish”

Learned

Deep neural network

“clown fish”

Learned

Computation in a neural net

Input representation

Output representation

Rectified linear unit (ReLU)
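The formulas on these slides did not survive extraction, so the following is only a sketch of the usual form of one layer's computation: a linear map of the input representation followed by a ReLU, which sets negative values to zero. The names `relu` and `layer` and the toy weights are assumptions.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(x, 0), applied elementwise."""
    return np.maximum(x, 0.0)

def layer(x, W, b):
    """One layer: input representation x -> output representation."""
    return relu(W @ x + b)

x = np.array([1.0, -2.0, 0.5])          # toy input representation
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])         # toy weights (hypothetical values)
b = np.zeros(2)

print(layer(x, W, b))  # [1. 0.]
```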

Last layer

dolphin

grizzly bear

angel fish

chameleon

iguana

elephant

clown fish

argmax -> “clown fish”
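The last step can be sketched as picking the class with the highest score. The class list is the one on the slide. The score values are made up for illustration.

```python
import numpy as np

classes = ["dolphin", "grizzly bear", "angel fish", "chameleon",
           "iguana", "elephant", "clown fish"]

# Hypothetical last-layer outputs, one score per class.
scores = np.array([0.1, 0.3, 1.2, -0.5, 0.0, -1.0, 2.5])

predicted = classes[int(np.argmax(scores))]
print(predicted)  # clown fish
```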

Learning with deep nets

“clown fish”

Learned

“clown fish”

“grizzly bear”

“chameleon”

Train network to associate the right label with each image

“clown fish”

Learned

Loss function

“clown fish”

Loss: error between the network output and the ground truth label

[Figure: network output vs. ground truth label over the classes dolphin, grizzly bear, angel fish, chameleon, iguana, elephant, clown fish]

Loss function

“clown fish”

Loss small

[Figure: network output vs. ground truth label over the seven classes]

Loss function

“grizzly bear”

Loss large

[Figure: network output vs. ground truth label over the seven classes]

Loss function for classification

[Figure: network output vs. ground truth label over the classes dolphin, grizzly bear, angel fish, chameleon, clown fish, iguana, elephant]

Probability of the observed data under the model

Results in learning a probability model!
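The loss formula itself is missing from the extracted text. A common choice that fits "probability of the observed data under the model" is the negative log-probability of the true class under a softmax over the network's scores (cross-entropy). The sketch below assumes that choice. The function names and score values are illustrative.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy(scores, true_index):
    """Negative log-probability the model assigns to the true class."""
    return -np.log(softmax(scores)[true_index])

classes = ["dolphin", "grizzly bear", "angel fish", "chameleon",
           "iguana", "elephant", "clown fish"]
truth = classes.index("clown fish")

# Hypothetical outputs: one confident in "clown fish", one in "grizzly bear".
scores_right = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0])
scores_wrong = np.array([0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0])

loss_small = cross_entropy(scores_right, truth)
loss_large = cross_entropy(scores_wrong, truth)
print(loss_small < loss_large)  # True
```

Minimizing this loss over the training set maximizes the probability the model assigns to the observed labels, which is why training yields a probability model.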

“clown fish”

Learned

“grizzly bear”

Learned

“chameleon”

Learned

Gradient descent

One iteration of gradient descent:

learning rate
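The update equation did not survive extraction. In its standard form, one iteration moves the parameters a small step against the gradient of the loss, scaled by the learning rate. The sketch below assumes that form and uses a toy quadratic loss in place of a real network. All names are illustrative.

```python
import numpy as np

def gradient_descent_step(theta, grad_fn, learning_rate):
    """One iteration: theta <- theta - learning_rate * gradient(theta)."""
    return theta - learning_rate * grad_fn(theta)

# Toy loss J(theta) = ||theta - target||^2, whose gradient is 2 * (theta - target).
target = np.array([1.0, -2.0])

def grad(theta):
    return 2.0 * (theta - target)

theta = np.zeros(2)
for _ in range(100):
    theta = gradient_descent_step(theta, grad, learning_rate=0.1)

print(theta)  # close to [ 1. -2.]
```

With a learning rate of 0.1 the distance to the minimum shrinks by a factor of 0.8 per step on this toy loss. Too large a rate (above 1.0 here) would make the iterates diverge.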

Gradient descent

Texture synthesis by non-parametric sampling

Synthesizing a pixel

non-parametric sampling

Input image

[Efros & Leung 1999]

Models

Input partial image

Predicted color of next pixel

Texture synthesis with a deep net

“white”

PixelCNN [van den Oord et al. 2016]

Texture synthesis with a deep net

“white”

[van den Oord et al. 2016]

Sampling

[Figure: network output shown as a probability over colors (labels include turquoise and orange); a color is sampled from it, e.g. “yellow”]
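Sampling the next pixel's color from the network output can be sketched as drawing from a categorical distribution. The color names come from the slides. The probabilities are made up, and a real model would predict over many more color values.

```python
import numpy as np

colors = ["turquoise", "orange", "yellow", "white"]
probs = np.array([0.1, 0.2, 0.6, 0.1])   # hypothetical network output

rng = np.random.default_rng(0)

# Draw one color for the next pixel.
next_color = rng.choice(colors, p=probs)

# Drawing many times recovers the predicted distribution.
samples = rng.choice(colors, size=10_000, p=probs)
frequencies = {c: float(np.mean(samples == c)) for c in colors}
print(next_color, frequencies)
```

Sampling, rather than always taking the most probable color, is what lets repeated runs produce different but plausible textures.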

Discriminative Deep Networks

“Rockfish”

Discriminative Deep Networks

Raw, Unlabeled Pixels

Generative Deep Networks

Raw, Unlabeled Pixels

Ansel Adams. Yosemite Valley Bridge.

Grayscale image: L channel
Color information: ab channels

Zhang, Isola, Efros. Colorful Image Colorization. In ECCV, 2016.

Concatenate (L,ab) channels

Grayscale image: L channel

Input Output Ground truth

Simple L2 regression doesn’t work

Better Loss Function

Colors in ab space (discrete)

• Regression with L2 loss inadequate

• Use per-pixel multinomial classification

Color distribution cross-entropy loss with colorfulness enhancing term.
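One way to see why L2 regression is inadequate: when a gray pixel has several plausible colors, the prediction that minimizes expected squared error is their average, which tends to be a washed-out color. The sketch below is a toy illustration, not the paper's implementation. The two ab values are hypothetical, chosen symmetric so that their mean is exactly neutral gray (a = b = 0).

```python
import numpy as np

# Two equally plausible ab colors for the same pixel (hypothetical values).
plausible_ab = np.array([[50.0, 50.0],
                         [-50.0, -50.0]])

def expected_l2(pred):
    """Average squared error of a prediction over the plausible colors."""
    return float(np.mean(np.sum((plausible_ab - pred) ** 2, axis=1)))

candidates = {
    "mode A": plausible_ab[0],
    "mode B": plausible_ab[1],
    "mean": plausible_ab.mean(axis=0),   # [0, 0], i.e. no color at all
}
losses = {name: expected_l2(pred) for name, pred in candidates.items()}
print(losses)  # the gray "mean" has the lowest L2 loss

# Classification over discrete color bins keeps both modes instead:
bin_probs = np.array([0.5, 0.5])         # one probability per plausible color
best_bin = plausible_ab[int(np.argmax(bin_probs))]   # a saturated color
```

Predicting a distribution over discrete ab bins lets the model represent "either of these colors" rather than being pushed to the gray in between.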

Zhang et al. 2016

[Zhang, Isola, Efros, ECCV 2016]

Designing loss functions

Input / Ground truth
