
Page 1:

CS 4803 / 7643: Deep Learning

Zsolt Kira, Georgia Tech

Topics:
– Recurrent Neural Networks (RNNs)
– BackProp Through Time (BPTT)

Page 2:

Plan for Today
• Model

– Recurrent Neural Networks (RNNs)

• Learning – BackProp Through Time (BPTT)

(C) Dhruv Batra and Zsolt Kira 2

Page 3:

New Topic: RNNs

(C) Dhruv Batra and Zsolt Kira 3
Image Credit: Andrej Karpathy

Page 4:

New Words
• Recurrent Neural Networks (RNNs)
• Recursive Neural Networks
  – General family; think graphs instead of chains
• Types:
  – "Vanilla" RNNs (Elman Networks)
  – Long Short Term Memory (LSTMs)
  – Gated Recurrent Units (GRUs)
  – …
• Algorithms
  – BackProp Through Time (BPTT)
  – BackProp Through Structure (BPTS)

(C) Dhruv Batra and Zsolt Kira 4


Page 6:

What's wrong with MLPs?
• Problem 1: Can't model sequences
  – Fixed-sized Inputs & Outputs
  – No temporal structure
• Problem 2: Pure feed-forward processing
  – No "memory", no feedback

(C) Dhruv Batra and Zsolt Kira 6
Image Credit: Alex Graves, book

Page 7:

Why model sequences?

Figure Credit: Carlos Guestrin

Page 8:

Why model sequences?

(C) Dhruv Batra and Zsolt Kira 8
Image Credit: Alex Graves

Page 9:

Sequences are everywhere…

(C) Dhruv Batra and Zsolt Kira 9
Image Credit: Alex Graves and Kevin Gimpel

Page 10:

(C) Dhruv Batra and Zsolt Kira 10

Even where you might not expect a sequence…

Image Credit: Vinyals et al.

Page 11:

Ba, Mnih, and Kavukcuoglu, "Multiple Object Recognition with Visual Attention", ICLR 2015.
Gregor et al, "DRAW: A Recurrent Neural Network For Image Generation", ICML 2015.
Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission.

Classify images by taking a series of “glimpses”

Even where you might not expect a sequence…

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 12:

Even where you might not expect a sequence…

• Output ordering = sequence

(C) Dhruv Batra and Zsolt Kira 12
Image Credit: Ba et al.; Gregor et al.


Page 16:

Sequences in Input or Output?
• It's a spectrum…

(C) Dhruv Batra and Zsolt Kira 16

• Input: No sequence → Output: No sequence. Example: "standard" classification / regression problems
• Input: No sequence → Output: Sequence. Example: Im2Caption
• Input: Sequence → Output: No sequence. Example: sentence classification, multiple-choice question answering
• Input: Sequence → Output: Sequence. Example: machine translation, video classification, video captioning, open-ended question answering

Image Credit: Andrej Karpathy

Page 17:

2 Key Ideas
• Parameter Sharing
  – in computation graphs = adding gradients

(C) Dhruv Batra and Zsolt Kira 17

Page 18:

Computational Graph

Slide Credit: Marc'Aurelio Ranzato
(C) Dhruv Batra and Zsolt Kira 18

Page 19:

Gradients add at branches

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
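A tiny worked example of this rule (my addition, not from the slides): when a value is shared by two branches of the graph, the backward pass sums the gradients coming from each branch.

    w = 2.0
    x1, x2 = 3.0, 5.0

    # Forward: w is shared by two branches whose results are added.
    out = w * x1 + w * x2            # 16.0

    # Backward: each branch contributes its local gradient to w,
    # and the contributions ADD: d(out)/dw = x1 + x2.
    grad_w = x1 + x2                 # 8.0

    # Numerical check with a small finite difference.
    eps = 1e-6
    num = ((w + eps) * x1 + (w + eps) * x2 - out) / eps
    assert abs(num - grad_w) < 1e-4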

Page 20:

2 Key Ideas
• Parameter Sharing
  – in computation graphs = adding gradients
• "Unrolling"
  – in computation graphs with parameter sharing

(C) Dhruv Batra and Zsolt Kira 21

Page 21:

How do we model sequences?
• No input

(C) Dhruv Batra and Zsolt Kira 23
Image Credit: Bengio, Goodfellow, Courville


Page 23:

How do we model sequences?
• With inputs

(C) Dhruv Batra and Zsolt Kira 25
Image Credit: Bengio, Goodfellow, Courville

Page 24:

2 Key Ideas
• Parameter Sharing
  – in computation graphs = adding gradients
• "Unrolling"
  – in computation graphs with parameter sharing
• Parameter sharing + Unrolling
  – Allows modeling arbitrary sequence lengths!
  – Keeps numbers of parameters in check

(C) Dhruv Batra and Zsolt Kira 26

Page 25:

Recurrent Neural Network

x → RNN

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 26:

Recurrent Neural Network

x → RNN → y
(usually want to predict a vector at some time steps)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 27:

Recurrent Neural Network

x → RNN → y

We can process a sequence of vectors x by applying a recurrence formula at every time step:

h_t = f_W(h_{t-1}, x_t)

where h_t is the new state, h_{t-1} the old state, x_t the input vector at time step t, and f_W some function with parameters W.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 28:

Recurrent Neural Network

x → RNN → y

We can process a sequence of vectors x by applying a recurrence formula at every time step: h_t = f_W(h_{t-1}, x_t)

Notice: the same function and the same set of parameters are used at every time step.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 29:

(Vanilla) Recurrent Neural Network

x → RNN → y

The state consists of a single "hidden" vector h:

h_t = tanh(W_hh h_{t-1} + W_xh x_t)
y_t = W_hy h_t

Sometimes called a "Vanilla RNN" or an "Elman RNN" after Prof. Jeffrey Elman

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
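A minimal NumPy sketch of that update (variable names are mine; bias terms added for completeness):

    import numpy as np

    def rnn_step(h_prev, x, W_hh, W_xh, W_hy, b_h, b_y):
        """One step of a vanilla (Elman) RNN."""
        # New hidden state from old state and current input.
        h = np.tanh(W_hh @ h_prev + W_xh @ x + b_h)
        # Per-step output (e.g., unnormalized class scores).
        y = W_hy @ h + b_y
        return h, y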


Page 33:

RNN: Computational Graph

h0 → fW → h1 → fW → h2 → fW → h3 → … → hT, with inputs x1, x2, x3, … feeding the fW nodes.

Re-use the same weight matrix W at every time-step.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
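Unrolling is then just a loop that reuses the same parameters at every step; a sketch building on the rnn_step above:

    def rnn_forward(xs, h0, params):
        """Unroll the RNN over a sequence, reusing the SAME
        weights (params) at every time-step."""
        h, hs, ys = h0, [], []
        for x in xs:                      # xs = [x1, x2, ..., xT]
            h, y = rnn_step(h, x, *params)
            hs.append(h)
            ys.append(y)
        return hs, ys                     # states h1..hT and outputs y1..yT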


Page 36:

RNN: Computational Graph: Many to Many

The unrolled graph h0 → fW → h1 → … → hT (inputs x1…xT, shared W) emits an output y_t at every step; each y_t has its own loss L_t (L1, L2, L3, …, LT), and the total loss L is their sum.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
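Because the total loss is literally a sum, every time-step pushes gradient into the shared weights; a sketch with per-step cross-entropy (my formulation of the standard setup):

    import numpy as np

    def softmax(s):
        e = np.exp(s - s.max())
        return e / e.sum()

    def total_loss(ys, targets):
        """L = sum_t L_t, with L_t = -log softmax(y_t)[target_t]."""
        return sum(-np.log(softmax(y)[t]) for y, t in zip(ys, targets))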

Page 37:

RNN: Computational Graph: Many to One

Same unrolled graph (inputs x1…xT, shared W), but a single output y is computed from the final hidden state hT.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 38:

RNN: Computational Graph: One to Many

A single input x seeds the recurrence h0 → fW → h1 → … → hT, which emits an output at every step (y1, y2, y3, …, yT).

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 39:

Sequence to Sequence: Many-to-one + one-to-many

Many to one: Encode input sequence in a single vector. The encoder unrolls h0 → fW → h1 → fW → h2 → fW → h3 → … → hT over inputs x1, x2, x3, … with encoder weights W1.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 40:

Sequence to Sequence: Many-to-one + one-to-many

Many to one: Encode input sequence in a single vector (encoder weights W1, as above).

One to many: Produce output sequence from that single input vector. The decoder unrolls fW → h1 → fW → h2 → fW → … with its own weights W2, emitting y1, y2, …

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
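A sketch of the two-stage wiring (encoder weights W1, decoder weights W2, matching the slide; feeding each decoder output back in as the next input is one common choice and an assumption here, as is rnn_step from above):

    def seq2seq(xs, h0, enc_params, dec_params, y_start, T_out):
        # Many to one: encode the whole input sequence into one vector.
        h = h0
        for x in xs:
            h, _ = rnn_step(h, x, *enc_params)     # shared W1

        # One to many: decode an output sequence from that vector.
        y, ys = y_start, []
        for _ in range(T_out):
            h, y = rnn_step(h, y, *dec_params)     # shared W2
            ys.append(y)
        return ys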

Page 41:

Example: Character-level Language Model

Vocabulary: [h, e, l, o]

Example training sequence: "hello"

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n


Page 43:

Distributed Representations Toy Example
• Local vs Distributed

(C) Dhruv Batra and Zsolt Kira 45
Slide Credit: Moontae Lee

Page 44:

Distributed Representations Toy Example
• Can we interpret each dimension?

(C) Dhruv Batra and Zsolt Kira 46
Slide Credit: Moontae Lee

Page 45:

Power of distributed representations!

(C) Dhruv Batra and Zsolt Kira 47

(Figure: the same items coded with a Local vs. a Distributed representation.)

Slide Credit: Moontae Lee
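One way to make the "power" concrete (my toy numbers, not the slide's): n local (one-hot) dimensions can distinguish only n concepts, while n binary distributed dimensions can distinguish 2^n.

    import numpy as np

    n = 4
    local = np.eye(n)                          # 4 patterns: one per dimension
    distributed = [(i >> np.arange(n)) & 1     # each concept uses ALL dims
                   for i in range(2 ** n)]
    print(len(local), len(distributed))        # 4 vs 16 representable concepts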


Page 47:

Example: Character-level Language Model

Vocabulary: [h, e, l, o]

Example training sequence: "hello"

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Training Time: MLE / “Teacher Forcing”
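A sketch of one teacher-forcing pass over "hello" (one-hot characters over [h, e, l, o]; rnn_step and softmax as in the earlier sketches):

    import numpy as np

    vocab = ['h', 'e', 'l', 'o']
    onehot = {c: np.eye(len(vocab))[i] for i, c in enumerate(vocab)}

    def teacher_forcing_loss(text, h0, params):
        """Feed the GROUND-TRUTH character at each step and score the
        model's prediction of the next one (MLE training)."""
        h, loss = h0, 0.0
        for cur, nxt in zip(text[:-1], text[1:]):
            h, y = rnn_step(h, onehot[cur], *params)
            loss += -np.log(softmax(y)[vocab.index(nxt)])
        return loss        # e.g. teacher_forcing_loss("hello", h0, params)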

Page 48:

Example: Character-level Language Model
Sampling

Vocabulary: [h, e, l, o]

At test-time sample characters one at a time, feed back to model

Softmax output over the vocabulary [h, e, l, o] at each time step, with the character sampled from it:

step 1: [.03, .13, .00, .84] → sample "e"
step 2: [.25, .20, .05, .50] → sample "l"
step 3: [.11, .17, .68, .03] → sample "l"
step 4: [.11, .02, .08, .79] → sample "o"

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Test Time: Sample / Argmax
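Continuing the char-LM sketch above, the test-time loop looks like this (swap np.random.choice for an argmax to decode greedily):

    def sample(seed_char, h0, params, length):
        h, c, out = h0, seed_char, []
        for _ in range(length):
            h, y = rnn_step(h, onehot[c], *params)
            p = softmax(y)
            c = np.random.choice(vocab, p=p)   # sample, then feed back in
            out.append(c)
        return ''.join(out)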


Page 52:

Backpropagation through time

Forward through entire sequence to compute loss, then backward through entire sequence to compute gradient

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 53:

Truncated Backpropagation through time

Run forward and backward through chunks of the sequence instead of whole sequence

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 54:

Truncated Backpropagation through time

Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
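A PyTorch-flavored sketch of this idea (names are mine; the key move is detaching the hidden state between chunks, so its value carries forward while gradients stop at the chunk boundary):

    import torch
    import torch.nn.functional as F

    def truncated_bptt(model, data, h, chunk, opt):
        # data: (T,) tensor of token ids; model(x, h) -> (scores, h)
        for i in range(0, data.size(0) - 1, chunk):
            x = data[i : i + chunk]
            t = data[i + 1 : i + chunk + 1]
            h = h.detach()                    # carry state, cut the graph
            scores, h = model(x, h)
            loss = F.cross_entropy(scores.view(-1, scores.size(-1)), t.view(-1))
            opt.zero_grad()
            loss.backward()                   # backprop only within the chunk
            opt.step()
        return h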

Page 55:

Truncated Backpropagation through time

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 56:

Deep (Stacked) RNNs
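A sketch of the stacking pattern (my construction of the standard idea, reusing rnn_step): at time t, layer l takes layer l-1's new state as its input and its own previous state as the recurrence.

    def stacked_rnn_step(hs_prev, x, layers):
        """hs_prev: previous hidden states, one per layer;
        layers: per-layer parameter tuples for rnn_step."""
        inp, hs = x, []
        for h_prev, params in zip(hs_prev, layers):
            h, _ = rnn_step(h_prev, inp, *params)
            hs.append(h)
            inp = h               # this layer's state feeds the next layer
        return hs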

Page 58:

x → RNN → y

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 59:

(Figure: generated text "at first", then after each successive "train more" stage.)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 60:

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 61:

The Stacks Project: open source algebraic geometry textbook

LaTeX source: http://stacks.math.columbia.edu/
The Stacks Project is licensed under the GNU Free Documentation License.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 62:

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 63:

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 64:

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 65:

Generated C code

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 66:

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 67:

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 68:

Searching for interpretable cells

Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 69:

Searching for interpretable cells

Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016. Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 70:

Searching for interpretable cells

Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016. Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission.

quote detection cell

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 71:

Searching for interpretable cells

Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016. Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission.

line length tracking cell

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 72:

Searching for interpretable cells

Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016. Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission.

if statement cell

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 73:

Searching for interpretable cells

Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016. Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission.

quote/comment cell

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 74:

Searching for interpretable cells

Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016. Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission.

code depth cell

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 75:

Neural Image Captioning

(C) Dhruv Batra 77

Figure: Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected MLP → 4096-dim Image Embedding (VGGNet)


Page 77:

Neural Image Captioning

(C) Dhruv Batra 79

Figure: the same CNN pipeline (Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected MLP → 4096-dim Image Embedding (VGGNet)) feeds through a Linear layer into an RNN, unrolled over "<start> Two people and two horses.", with a P(next word) output at every step.


Page 94:

Beam Search
• Proceed from left to right
• Maintain N partial captions
• Expand each caption with possible next words
• Discard all but the top N new partial translations
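A sketch of that procedure, assuming a step(state, token) function (hypothetical here) that returns (new_state, log-probabilities over the vocabulary):

    import numpy as np

    def beam_search(step, h0, start_tok, end_tok, N, max_len):
        beams = [(0.0, [start_tok], h0)]          # (log-prob, tokens, state)
        for _ in range(max_len):
            candidates = []
            for lp, toks, h in beams:
                if toks[-1] == end_tok:           # finished captions persist
                    candidates.append((lp, toks, h))
                    continue
                h2, logp = step(h, toks[-1])      # expand with next words
                for w in np.argsort(logp)[-N:]:   # top N extensions suffice
                    candidates.append((lp + logp[w], toks + [int(w)], h2))
            # Discard all but the top N partials.
            beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:N]
        return max(beams, key=lambda b: b[0])[1]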

Page 95:

A cat sitting on a suitcase on the floor

A cat is sitting on a tree branch

A dog is running in the grass with a frisbee

A white teddy bear sitting in the grass

Two people walking on the beach with surfboards

Two giraffes standing in a grassy field

A man riding a dirt bike on a dirt track

Image Captioning: Example Results

A tennis player in action on the court

Captions generated using neuraltalk2. All images are CC0 Public domain: cat suitcase, cat tree, dog, bear, surfers, tennis, giraffe, motorcycle.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 96:

Image Captioning: Failure Cases

A woman is holding a cat in her hand

A woman standing on a beach holding a surfboard

A person holding a computer mouse on a desk

A bird is perched on a tree branch

A man in a baseball uniform throwing a ball

Captions generated using neuraltalk2. All images are CC0 Public domain: fur coat, handstand, spider web, baseball.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n


Page 103:

Image Captioning with Attention

Figure: a CNN maps the image (H x W x 3) to a grid of features (L x D), which initializes the hidden state h0. At each step, the hidden state produces a_t, a distribution over the L locations; z_t, the weighted combination of features (dimension D), is fed together with the previous word (y1 is the first word) into the RNN, which outputs d_t, a distribution over the vocabulary, plus the next attention distribution.

Xu et al, "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML 2015

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
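A sketch of the soft-attention step itself (the dot-product scoring here is my simplification; Xu et al. use a small learned network to score locations):

    import numpy as np

    def soft_attention(features, h, W_att):
        """features: (L, D) CNN feature grid; h: RNN hidden state."""
        scores = features @ (W_att @ h)   # one score per location, shape (L,)
        a = np.exp(scores - scores.max())
        a /= a.sum()                      # distribution over L locations
        z = a @ features                  # weighted features, shape (D,)
        return a, z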

Page 104:

Image Captioning with Attention

Soft attention vs. Hard attention

Xu et al, "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML 2015. Figure copyright Kelvin Xu, Jimmy Lei Ba, Jamie Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio, 2015. Reproduced with permission.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
