
Lecture 3 recap

Prof. Leal-Taixé and Prof. Niessner


Beyond linear
• Linear score function 𝑓 = 𝑊𝑥

Credit: Li/Karpathy/Johnson

[Figure: the linear score function illustrated on CIFAR-10 and on ImageNet]


Neural Network

[Figure: a neural network, annotated with its depth (number of layers) and its width (number of neurons per layer). Credit: Li/Karpathy/Johnson]


Neural Network
• Linear score function 𝑓 = 𝑊𝑥
• A neural network is a nesting of ‘functions’:
  – 2 layers: 𝑓 = 𝑊2 max(0, 𝑊1𝑥)
  – 3 layers: 𝑓 = 𝑊3 max(0, 𝑊2 max(0, 𝑊1𝑥))
  – 4 layers: 𝑓 = 𝑊4 tanh(𝑊3 max(0, 𝑊2 max(0, 𝑊1𝑥)))
  – 5 layers: 𝑓 = 𝑊5 𝜎(𝑊4 tanh(𝑊3 max(0, 𝑊2 max(0, 𝑊1𝑥))))
  – … up to hundreds of layers
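To make the nesting concrete, here is a minimal NumPy sketch of the 2- and 3-layer score functions; the layer sizes are arbitrary choices for the example, not values from the lecture:

```python
import numpy as np

np.random.seed(0)
x  = np.random.randn(3072)       # e.g., a flattened 32x32x3 CIFAR-10 image
W1 = np.random.randn(100, 3072)  # layer 1 weights (hidden size 100 is an arbitrary choice)
W2 = np.random.randn(50, 100)    # layer 2 weights
W3 = np.random.randn(10, 50)     # layer 3 weights -> 10 class scores

# 2 layers: f = W2 max(0, W1 x)
# (with these example shapes f2 has 50 entries; in a real 2-layer net W2 would map to the scores)
f2 = W2 @ np.maximum(0, W1 @ x)

# 3 layers: f = W3 max(0, W2 max(0, W1 x))
f3 = W3 @ np.maximum(0, W2 @ np.maximum(0, W1 @ x))
print(f3.shape)  # (10,)
```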


Computational Graphs
• A neural network is a computational graph:
  – it has compute nodes
  – it has edges that connect nodes
  – it is directional
  – it is organized in ‘layers’ (a minimal sketch of such nodes follows below)
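A minimal, hypothetical sketch of what such compute nodes look like in code (purely for illustration, not the course's framework): each node stores what it needs during the forward pass and returns gradients for its inputs during the backward pass.

```python
class AddNode:
    """Compute node for a sum: caches its inputs on the forward pass,
    routes the incoming gradient to both inputs on the backward pass."""
    def forward(self, a, b):
        self.a, self.b = a, b
        return a + b

    def backward(self, grad_out):
        # d(a+b)/da = 1 and d(a+b)/db = 1, so the gradient just flows through
        return grad_out, grad_out


class MulNode:
    """Compute node for a product."""
    def forward(self, a, b):
        self.a, self.b = a, b
        return a * b

    def backward(self, grad_out):
        # d(a*b)/da = b and d(a*b)/db = a
        return grad_out * self.b, grad_out * self.a
```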


Backprop (cont'd)


The importance of gradients
• All optimization schemes are based on computing gradients.
• One can compute gradients analytically, but what if our function is too complex?
• Break down the gradient computation → Backpropagation [Rumelhart 1986]


Backprop: Forward Pass
• 𝑓(𝑥, 𝑦, 𝑧) = (𝑥 + 𝑦) ⋅ 𝑧, with initialization 𝑥 = 1, 𝑦 = −3, 𝑧 = 4

[Compute graph: a sum node gives 𝑑 = 𝑥 + 𝑦 = −2, a mult node gives 𝑓 = 𝑑 ⋅ 𝑧 = −8.]
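A minimal sketch of this forward pass in Python, with the same values as on the slide:

```python
x, y, z = 1.0, -3.0, 4.0

d = x + y    # sum node:  d = -2
f = d * z    # mult node: f = -8
print(d, f)  # -2.0 -8.0
```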


Backprop: Backward Pass
with 𝑥 = 1, 𝑦 = −3, 𝑧 = 4

𝑓(𝑥, 𝑦, 𝑧) = (𝑥 + 𝑦) ⋅ 𝑧

𝑑 = 𝑥 + 𝑦  →  𝜕𝑑/𝜕𝑥 = 1,  𝜕𝑑/𝜕𝑦 = 1
𝑓 = 𝑑 ⋅ 𝑧  →  𝜕𝑓/𝜕𝑑 = 𝑧,  𝜕𝑓/𝜕𝑧 = 𝑑

What are 𝜕𝑓/𝜕𝑥, 𝜕𝑓/𝜕𝑦, 𝜕𝑓/𝜕𝑧?

Working backwards through the graph:
• 𝜕𝑓/𝜕𝑓 = 1
• 𝜕𝑓/𝜕𝑧 = 𝑑 = −2
• 𝜕𝑓/𝜕𝑑 = 𝑧 = 4
• Chain rule: 𝜕𝑓/𝜕𝑦 = 𝜕𝑓/𝜕𝑑 ⋅ 𝜕𝑑/𝜕𝑦 = 4 ⋅ 1 = 4
• Chain rule: 𝜕𝑓/𝜕𝑥 = 𝜕𝑓/𝜕𝑑 ⋅ 𝜕𝑑/𝜕𝑥 = 4 ⋅ 1 = 4
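Continuing the example, a self-contained sketch of the backward pass: apply the local derivatives in reverse order and combine them with the chain rule.

```python
x, y, z = 1.0, -3.0, 4.0
d = x + y                   # forward: d = -2
f = d * z                   # forward: f = -8

df_df = 1.0                 # start at the output
df_dz = d * df_df           # mult node: df/dz = d               -> -2
df_dd = z * df_df           # mult node: df/dd = z               ->  4
df_dy = df_dd * 1.0         # chain rule: df/dy = df/dd * dd/dy  ->  4
df_dx = df_dd * 1.0         # chain rule: df/dx = df/dd * dd/dx  ->  4
print(df_dx, df_dy, df_dz)  # 4.0 4.0 -2.0
```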


Compute Graphs → Neural Networks
• 𝑥𝑘: input variables
• 𝑤𝑙,𝑚,𝑛: network weights (note the 3 indices):
  – 𝑙: which layer
  – 𝑚: which neuron in the layer
  – 𝑛: which weight in the neuron
• 𝑦𝑖: computed output (𝑖 indexes the output dimension, up to n_out)
• 𝑡𝑖: ground-truth targets
• 𝐿: loss function


Compute Graphs → Neural Networks

[Figure: an input layer (𝑥0, 𝑥1) feeding one output 𝑦0, which is compared against a target 𝑡0 (e.g., class label / regression target). As a compute graph: the inputs are multiplied by the weights 𝑤0, 𝑤1 (the unknowns!), summed, the target 𝑡0 is subtracted, and the result is squared — the L2 loss function giving the loss/cost.]

We want to compute gradients w.r.t. all weights 𝑤.


Compute Graphs → Neural Networks

[Figure: the same compute graph, now with a ReLU activation max(0, 𝑥) added (btw, I'm not arguing this is the right choice here).]
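A small sketch of this single-neuron compute graph in Python (the input, weight, and target values are made up for illustration): forward pass through the ReLU and the L2 loss, then the gradients w.r.t. the two weights via the chain rule.

```python
x0, x1 = 2.0, -1.0       # inputs (made-up values)
w0, w1 = 0.5, 0.3        # weights -- the unknowns we want to optimize
t0 = 1.0                 # target

s  = w0 * x0 + w1 * x1   # weighted sum
y0 = max(0.0, s)         # ReLU activation
L  = (y0 - t0) ** 2      # L2 loss

# gradients w.r.t. the weights (chain rule through the graph)
dL_dy  = 2.0 * (y0 - t0)
dy_ds  = 1.0 if s > 0 else 0.0   # ReLU derivative
dL_dw0 = dL_dy * dy_ds * x0
dL_dw1 = dL_dy * dy_ds * x1
```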


Compute Graphs → Neural Networks

[Figure: two inputs (𝑥0, 𝑥1) and three outputs (𝑦0, 𝑦1, 𝑦2) with targets (𝑡0, 𝑡1, 𝑡2); each output is a weighted sum with its own weights 𝑤𝑖,0, 𝑤𝑖,1 and its own L2 loss/cost term.]

We want to compute gradients w.r.t. all weights 𝑤.


Compute Graphs → Neural Networks

[Figure: input layer 𝑥0 … 𝑥𝑘, output layer 𝑦0, 𝑦1 with targets 𝑡0, 𝑡1.]

𝑦𝑖 = 𝐴(𝑏𝑖 + Σ𝑘 𝑥𝑘 𝑤𝑖,𝑘)    (𝐴: activation function, 𝑏𝑖: bias)
𝐿𝑖 = (𝑦𝑖 − 𝑡𝑖)²    (L2 loss → simply sum up the squares)
𝐿 = Σ𝑖 𝐿𝑖    (the energy to minimize is 𝐸 = 𝐿)

We want to compute gradients w.r.t. all weights 𝑤 *and* biases 𝑏.

𝜕𝐿/𝜕𝑤𝑖,𝑘 = 𝜕𝐿/𝜕𝑦𝑖 ⋅ 𝜕𝑦𝑖/𝜕𝑤𝑖,𝑘    → use the chain rule to compute the partials
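A short NumPy sketch of these formulas for one layer, assuming the identity activation 𝐴(𝑥) = 𝑥 so that 𝜕𝑦𝑖/𝜕𝑤𝑖,𝑘 = 𝑥𝑘 (sizes and values are arbitrary):

```python
import numpy as np

np.random.seed(1)
x = np.random.randn(4)       # inputs x_k
W = np.random.randn(2, 4)    # weights w_{i,k}: 2 outputs, 4 inputs
b = np.zeros(2)              # biases b_i
t = np.array([1.0, -1.0])    # targets t_i

y = b + W @ x                # y_i = A(b_i + sum_k x_k w_{i,k}), with A = identity here
L = np.sum((y - t) ** 2)     # L = sum_i (y_i - t_i)^2

dL_dy = 2.0 * (y - t)        # dL/dy_i
dL_dW = np.outer(dL_dy, x)   # chain rule: dL/dw_{i,k} = dL/dy_i * x_k
dL_db = dL_dy                # dL/db_i = dL/dy_i
```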


Gradient Descent

[Figure: a loss landscape with the path from the initialization down to the optimum.]


Gradient Descent
• From derivative to gradient
• Gradient descent steps in the direction of the negative gradient (the gradient points in the direction of greatest increase of the function); the step size is the learning rate.


Gradient Descent for Neural Networks

For a given training pair {𝑥, 𝑡}, we want to update all weights, i.e., we need to compute the derivatives w.r.t. all weights (𝑙 layers, 𝑚 neurons per layer):

𝛻𝑤 𝑓𝑥,𝑡(𝑤) = [𝜕𝑓/𝜕𝑤0,0,0, …, 𝜕𝑓/𝜕𝑤𝑙,𝑚,𝑛]ᵀ

Gradient step: 𝑤′ = 𝑤 − 𝜖 𝛻𝑤 𝑓𝑥,𝑡(𝑤)
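Written as code, one such gradient step might look like the following sketch; `grad_f` is a hypothetical placeholder for whatever routine computes 𝛻𝑤 𝑓𝑥,𝑡(𝑤), e.g., backpropagation:

```python
def gradient_step(w, grad_f, x, t, eps=1e-2):
    """One gradient descent update: w' = w - eps * grad_w f_{x,t}(w).

    grad_f is a placeholder callable (e.g., backpropagation) that returns
    the gradient of the loss w.r.t. the flattened weight vector w.
    """
    return w - eps * grad_f(w, x, t)
```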


Gradient Descent for Neural Networks

[Figure: a network with inputs 𝑥0, 𝑥1, 𝑥2, hidden neurons ℎ0 … ℎ3, outputs 𝑦0, 𝑦1 and targets 𝑡0, 𝑡1.]

ℎ𝑗 = 𝐴(𝑏0,𝑗 + Σ𝑘 𝑥𝑘 𝑤0,𝑗,𝑘)
𝑦𝑖 = 𝐴(𝑏1,𝑖 + Σ𝑗 ℎ𝑗 𝑤1,𝑖,𝑗)
𝐿𝑖 = (𝑦𝑖 − 𝑡𝑖)²

Just a simple activation: 𝐴(𝑥) = max(0, 𝑥)

𝛻𝑤 𝑓𝑥,𝑡(𝑤) = [𝜕𝑓/𝜕𝑤0,0,0, …, 𝜕𝑓/𝜕𝑤𝑙,𝑚,𝑛, …, 𝜕𝑓/𝜕𝑏𝑙,𝑚]ᵀ
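A sketch of the forward pass of this small 3-4-2 network with 𝐴(𝑥) = max(0, 𝑥) in NumPy (random values purely for illustration):

```python
import numpy as np

np.random.seed(2)
x  = np.random.randn(3)           # inputs x_0 .. x_2
W0 = np.random.randn(4, 3)        # hidden-layer weights w_{0,j,k}
b0 = np.zeros(4)                  # hidden-layer biases  b_{0,j}
W1 = np.random.randn(2, 4)        # output-layer weights w_{1,i,j}
b1 = np.zeros(2)                  # output-layer biases  b_{1,i}
t  = np.array([1.0, 0.0])         # targets t_i

A = lambda v: np.maximum(0.0, v)  # activation A(x) = max(0, x)

h = A(b0 + W0 @ x)                # h_j = A(b_{0,j} + sum_k x_k w_{0,j,k})
y = A(b1 + W1 @ h)                # y_i = A(b_{1,i} + sum_j h_j w_{1,i,j})
L = np.sum((y - t) ** 2)          # L = sum_i (y_i - t_i)^2
```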


Gradient Descent for Neural Networks

Backpropagation: just go through the network layer by layer.

𝜕𝐿/𝜕𝑤1,𝑖,𝑗 = 𝜕𝐿/𝜕𝑦𝑖 ⋅ 𝜕𝑦𝑖/𝜕𝑤1,𝑖,𝑗
𝜕𝐿/𝜕𝑤0,𝑗,𝑘 = 𝜕𝐿/𝜕𝑦𝑖 ⋅ 𝜕𝑦𝑖/𝜕ℎ𝑗 ⋅ 𝜕ℎ𝑗/𝜕𝑤0,𝑗,𝑘

with 𝜕𝐿𝑖/𝜕𝑦𝑖 = 2(𝑦𝑖 − 𝑡𝑖) and 𝜕𝑦𝑖/𝜕𝑤1,𝑖,𝑗 = ℎ𝑗 if the ReLU input is > 0, else 0.
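A self-contained sketch of this layer-by-layer backward pass for the same 3-4-2 ReLU network; the indicator "> 0" below is the ReLU derivative the slide refers to:

```python
import numpy as np

np.random.seed(2)
x  = np.random.randn(3);    t  = np.array([1.0, 0.0])
W0 = np.random.randn(4, 3); b0 = np.zeros(4)   # hidden layer: w_{0,j,k}, b_{0,j}
W1 = np.random.randn(2, 4); b1 = np.zeros(2)   # output layer: w_{1,i,j}, b_{1,i}

# forward pass, keeping the pre-activations for the ReLU derivative
z0 = b0 + W0 @ x;  h = np.maximum(0.0, z0)
z1 = b1 + W1 @ h;  y = np.maximum(0.0, z1)

# backward pass, layer by layer
dL_dy  = 2.0 * (y - t)               # dL_i/dy_i = 2 (y_i - t_i)
delta1 = dL_dy * (z1 > 0)            # ReLU derivative: 1 where the input is > 0, else 0
dL_dW1 = np.outer(delta1, h)         # dL/dw_{1,i,j}
dL_db1 = delta1                      # dL/db_{1,i}

delta0 = (W1.T @ delta1) * (z0 > 0)  # chain rule through W1 and the hidden ReLU
dL_dW0 = np.outer(delta0, x)         # dL/dw_{0,j,k}
dL_db0 = delta0                      # dL/db_{0,j}
```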


Gradient Descent for Neural Networks

How many unknown weights does the network above have?
– Output layer: 2 ⋅ 4 + 2    (#neurons ⋅ #input channels + #biases)
– Hidden layer: 4 ⋅ 3 + 4

Note that some activation functions also have weights.
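The same count as a quick check in code (for the 3-input, 4-hidden, 2-output network above):

```python
n_in, n_hidden, n_out = 3, 4, 2

hidden_params = n_hidden * n_in + n_hidden    # 4*3 + 4 = 16 weights + biases
output_params = n_out * n_hidden + n_out      # 2*4 + 2 = 10 weights + biases
print(hidden_params + output_params)          # 26 unknowns in total
```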


Derivatives of Cross Entropy Loss

[Derivation from the notes by Sadowski]


Derivatives of Cross Entropy Loss

Gradients of the weights of the last layer: [derivation in Sadowski]


Derivatives of Cross Entropy Loss

Gradients of the weights of the first layer (through the hidden activations ℎ𝑗): [derivation in Sadowski]
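The full derivations are in the referenced [Sadowski] notes; as a reminder of the standard end result for a softmax output with cross-entropy loss, the gradient w.r.t. the logits is simply "prediction minus target". A short sketch with made-up values:

```python
import numpy as np

z = np.array([2.0, -1.0, 0.5])  # logits of the last layer (made-up values)
t = np.array([0.0, 1.0, 0.0])   # one-hot target

p = np.exp(z - z.max())
p /= p.sum()                    # softmax (shifted for numerical stability)
L = -np.sum(t * np.log(p))      # cross-entropy loss

dL_dz = p - t                   # gradient w.r.t. the logits: prediction minus target
```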


Derivatives of Neural Networks

Combining nodes: linear activation node + hinge loss + regularization

Credit: Li/Karpathy/Johnson
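As a sketch of such a combined graph, in the style of the Li/Karpathy/Johnson examples: linear scores, multiclass hinge (SVM) loss, and L2 regularization on the weights (all values made up):

```python
import numpy as np

np.random.seed(3)
x   = np.random.randn(4)                 # one input sample
W   = np.random.randn(3, 4)              # linear score node: 3 classes
y   = 1                                  # index of the correct class
lam = 0.1                                # regularization strength

s = W @ x                                # class scores
margins = np.maximum(0.0, s - s[y] + 1.0)
margins[y] = 0.0                         # do not count the correct class
data_loss = margins.sum()                # multiclass hinge (SVM) loss
reg_loss  = lam * np.sum(W * W)          # L2 regularization on the weights
L = data_loss + reg_loss
```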


Can become quite complex…

[Figure: a deep network with many layers (𝑙) and many neurons (𝑚) per layer.]

• These graphs can be huge!


The flow of the gradients

[Figure: a single compute node, with activations flowing forward through the activation function and gradients flowing backward.]

• Many, many, many, many of these nodes form a neural network.
• Each one has its own work to do: in the forward and backward pass, every neuron computes its output and its local gradients.

Gradient descent

(Recap of the gradient descent slides above: starting from the initialization, step toward the optimum in the direction of the negative gradient, scaled by the learning rate.)


Gradient Descent
• How to pick a good learning rate?
• How to compute the gradient for a single training pair?
• How to compute the gradient for a large training set?
• How to speed things up?


Next lecture
• This week:
  – Exercise 1 starts next Tuesday (Nov 20th)
  – The exercise session will introduce it!
• Next lecture on Nov 22nd:
  – Optimization of Neural Networks
  – In particular, an introduction to SGD (our main method!)


See you next week!
