Recurrent Neural networks - Simon Fraser University, furukawa/cmpt762/slides/22-recurrent-neur…
Recurrent Neural networks
Slides borrowed from Abhishek Narwekar and Anusri Pampari at UIUC
Lecture Outline
1. Introduction
2. Learning Long Term Dependencies
3. Visualization for RNNs
Applications of RNNs
Image Captioning [reference]
… and Trump [reference]
Write like Shakespeare [reference]
… and more!
Applications of RNNs
Technically, an RNN models sequences:
• Time series
• Natural language, speech
We can even convert non-sequences to sequences, e.g. feed an image as a sequence of pixels!
Applications of RNNs
• RNN Generated TED Talks
• YouTube Link
• RNN Generated Eminem rap
• RNN Shady
• RNN Generated Music
• Music Link
Why RNNs?
• Can model sequences having variable length
• Efficient: Weights shared across time-steps
• They work!
• State of the art (SOTA) in several speech and NLP tasks
The Recurrent Neuron
next time step
Source: Slides by Arun
● x_t: input at time t
● h_{t-1}: state at time t-1
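The recurrent step above can be sketched in NumPy. This is a minimal sketch: the weight names Wx, Wh, b and the tanh non-linearity are assumptions, since the slide does not fix them.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """One recurrent step: combine the current input with the previous state."""
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

# Toy dimensions: 3-dim input, 4-dim hidden state
rng = np.random.default_rng(0)
Wx = rng.normal(size=(4, 3))
Wh = rng.normal(size=(4, 4))
b = np.zeros(4)

h = np.zeros(4)                    # initial state h_{-1}
for x in rng.normal(size=(5, 3)):  # a length-5 input sequence
    h = rnn_step(x, h, Wx, Wh, b)  # the same weights are reused at every step
print(h.shape)  # (4,)
```

Note that the loop reuses the same Wx, Wh, b at every time step, which is exactly the weight sharing the next slide refers to.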
Unfolding an RNN
Weights shared over time!
Source: Slides by Arun
Making Feedforward Neural Networks Deep
Source: http://www.opennn.net/images/deep_neural_network.png
Option 1: Feedforward Depth (df)
• Feedforward depth: longest path between an input and output at the same timestep
Feedforward depth = 4
High level feature!
Notation: h_{0,1} ⇒ time step 0, neuron #1
Option 2: Recurrent Depth (dr)
● Recurrent depth: longest path between the same hidden state in successive timesteps
Recurrent depth = 3
Backpropagation Through Time (BPTT)
• Objective is to update the weight matrix:
• Issue: W occurs at each timestep
• Every path from W to L is one dependency
• Find all paths from W to L!
(note: dropping subscript h from Wh for brevity)
Systematically Finding All Paths
How many paths exist from W to L through L1?
Just 1, originating at h_0.
Systematically Finding All Paths
How many paths from W to L through L2?
2, originating at h_0 and h_1.
Systematically Finding All Paths
And 3 in this case.
The gradient has two summations:
1: over L_j
2: over h_k
Origin of path = basis for Σ
Backpropagation as two summations
• First summation over L: L = L_1 + L_2 + ⋯ + L_{T-1}, giving terms ∂L/∂L_j
Backpropagation as two summations
● Second summation over h: each L_j depends on the weight matrices before it
L_j depends on all h_k before it.
Backpropagation as two summations
● No explicit dependence of L_j on h_k
● Use chain rule to fill in the missing steps
The Jacobian
Indirect dependency. One final use of the chain rule gives:
∂h_j/∂h_k = ∏_{m=k+1}^{j} ∂h_m/∂h_{m-1}
"The Jacobian"
The Final Backpropagation Equation
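The equation itself is in the slide image; reconstructed from the two summations above (the outer one over the losses L_j, the inner one over the states h_k, linked by the Jacobian), it reads, as a sketch:

```latex
\frac{\partial L}{\partial W}
  = \sum_{j=1}^{T-1} \frac{\partial L}{\partial L_j}
    \sum_{k=1}^{j}
    \frac{\partial L_j}{\partial h_j}\,
    \frac{\partial h_j}{\partial h_k}\,
    \frac{\partial h_k}{\partial W}
```

Each term of the inner sum is one path from W to L, entering the network at state h_k and propagated forward by the Jacobian ∂h_j/∂h_k.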
Backpropagation as two summations
● Often, to reduce the memory requirement, we truncate the network
● The inner summation runs from j-p to j for some p ⇒ truncated BPTT
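Truncation is easiest to see on a scalar linear RNN h_t = w·h_{t-1} + x_t, a simplification assumed here for illustration: the path entering at step k contributes w^(T-k)·h_{k-1} to ∂h_T/∂w, and truncated BPTT simply drops all terms with k < T-p+1.

```python
def dhT_dw(xs, w, p=None):
    """Gradient of the final state h_T w.r.t. w for h_t = w*h_{t-1} + x_t.
    If p is given, keep only the p most recent terms of the inner sum
    (truncated BPTT)."""
    hs = [0.0]                       # h_0 = 0
    for x in xs:
        hs.append(w * hs[-1] + x)
    T = len(xs)
    start = 1 if p is None else max(1, T - p + 1)
    # term k: the explicit derivative h_{k-1} at step k,
    # scaled by the Jacobian product w^(T-k) back to step T
    return sum(w ** (T - k) * hs[k - 1] for k in range(start, T + 1))

xs = [1.0, 2.0, -1.0, 0.5, 3.0]
full = dhT_dw(xs, w=0.9)         # all paths from w to h_T
trunc = dhT_dw(xs, w=0.9, p=2)   # only the 2 most recent paths
print(full, trunc)               # ~6.476 ~3.398
```

The truncated gradient is biased toward recent timesteps, but it needs only the last p states in memory.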
Expanding the Jacobian
The Issue with the Jacobian
Repeated matrix multiplication leads to vanishing and exploding gradients. How? Let's take a slight detour.
(Terms in the Jacobian: the weight matrix and the derivative of the activation function)
Eigenvalues and Stability
• Consider the identity activation function
• If the recurrent matrix Wh is diagonalizable: Wh = Q Λ Q⁻¹
Computing powers of Wh is simple: Wh^n = Q Λ^n Q⁻¹
Bengio et al, "On the difficulty of training recurrent neural networks." (2012)
Q: matrix composed of the eigenvectors of Wh
Λ: diagonal matrix with the eigenvalues on the diagonal
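The claim can be checked numerically. This sketch assumes a symmetric Wh so that it is guaranteed diagonalizable; the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
Wh = (A + A.T) / 2                  # symmetric => diagonalizable

eigvals, Q = np.linalg.eigh(Wh)     # Wh = Q @ diag(eigvals) @ Q.T
n = 10

# Powers via the eigendecomposition: Wh^n = Q Lambda^n Q^{-1}
Wh_n = Q @ np.diag(eigvals ** n) @ Q.T
assert np.allclose(Wh_n, np.linalg.matrix_power(Wh, n))

# Eigenvalue magnitudes below 1 shrink geometrically, above 1 blow up:
print(np.abs(eigvals) ** n)
```

So only the eigenvalues are raised to the n-th power, which is exactly why their magnitudes relative to 1 decide vanishing versus exploding behavior.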
Eigenvalues and Stability
• Eigenvalue magnitudes below 1 ⇒ vanishing gradients
• Eigenvalue magnitudes above 1 ⇒ exploding gradients
2. Learning Long Term Dependencies
![Page 29: Recurrent Neural networks - Simon Fraser Universityfurukawa/cmpt762/slides/22-recurrent-neur… · • Pascanu, Razvan, et al. How to construct deep recurrent neural networks. arXiv](https://reader034.fdocuments.in/reader034/viewer/2022043021/5f3cabff9d77ad0bdb53d38e/html5/thumbnails/29.jpg)
Outline
Vanishing/Exploding Gradients in RNNs
Weight Initialization Methods
● Identity-RNN
● np-RNN
Constant Error Carousel
● LSTM
● GRU
Hessian-Free Optimization
Echo State Networks
Weight Initialization Methods
• Activation function: ReLU
Bengio et al., "On the difficulty of training recurrent neural networks" (2012)
Weight Initialization Methods
• Random Wh initialization of an RNN places no constraint on the eigenvalues
⇒ vanishing or exploding gradients in the initial epoch
Weight Initialization Methods
• Careful initialization of Wh with suitable eigenvalues
⇒ allows the RNN to learn in the initial epochs
⇒ hence it can generalize well in later iterations
Weight Initialization Trick #1: IRNN
● Wh initialized to Identity
● Activation function: ReLU
Le et al., "A Simple Way to Initialize Recurrent Networks of Rectified Linear Units"
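The IRNN recipe is short enough to write out directly. A minimal sketch; the hidden size and the input used below are arbitrary choices for illustration.

```python
import numpy as np

hidden = 64
Wh = np.eye(hidden)    # recurrent weights initialized to the identity
b = np.zeros(hidden)   # biases initialized to zero

def relu(z):
    return np.maximum(0.0, z)

# With Wh = I and ReLU, a non-negative state passes through unchanged when
# the input contribution is zero, so gradients neither vanish nor explode
# at initialization.
rng = np.random.default_rng(0)
h = relu(Wh @ np.abs(rng.normal(size=hidden)) + b)
```

All eigenvalues of the identity are exactly 1, which is the "suitable eigenvalues" condition from the previous slide.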
Weight Initialization Trick #2: np-RNN
● Wh positive definite (+ve real eigenvalues)
● At least one eigenvalue is 1, all others less than or equal to one
● Activation function: ReLU
Talathi et al., "Improving Performance of Recurrent Neural Network with ReLU Nonlinearity"
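Following the slide's description (positive semi-definite Wh, largest eigenvalue exactly 1, all others at most 1), one plausible sketch of the initialization; the construction via R Rᵀ/n and the rescaling step are assumptions consistent with that description, not a verbatim transcription of the paper.

```python
import numpy as np

def np_rnn_init(n, seed=0):
    """Sketch: build a positive semi-definite recurrent matrix and rescale
    it so that its largest eigenvalue is exactly 1."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(n, n))
    A = R @ R.T / n                           # PSD: real eigenvalues >= 0
    return A / np.linalg.eigvalsh(A).max()    # top eigenvalue becomes 1

Wh = np_rnn_init(32)
lams = np.linalg.eigvalsh(Wh)
# Smallest eigenvalue is non-negative (up to round-off), largest is 1:
print(lams.min() >= -1e-9, np.isclose(lams.max(), 1.0))  # True True
```

With ReLU, an eigenvalue of exactly 1 gives one direction in which the state can persist indefinitely, while the remaining eigenvalues (≤ 1) cannot explode.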
np-RNN vs IRNN
• Sequence Classification Task
Talathi et al., "Improving Performance of Recurrent Neural Network with ReLU Nonlinearity"

| RNN Type | Test Accuracy | Parameter Complexity (vs. RNN) | Sensitivity to Parameters |
|----------|---------------|--------------------------------|---------------------------|
| IRNN     | 67%           | ×1                             | high                      |
| np-RNN   | 75.2%         | ×1                             | low                       |
| LSTM     | 78.5%         | ×4                             | low                       |
The LSTM Network (Long Short-Term Memory)
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
The LSTM Cell
● σ(): sigmoid non-linearity
● ×: element-wise multiplication
● h_{t-1} → h_t (output)
● c_{t-1} → c_t (cell state)
The LSTM Cell
● σ(): sigmoid non-linearity
● ×: element-wise multiplication
● Gates: forget gate (f), input gate (i), candidate state (g), output gate (o)
● h_{t-1} → h_t (output); c_{t-1} → c_t (cell state)
The LSTM Cell: forget old state, remember new state
● Gates: forget gate (f), input gate (i), candidate state (g), output gate (o)
● σ(): sigmoid non-linearity
● ×: element-wise multiplication
● h_{t-1} → h_t (output); c_{t-1} → c_t (cell state)
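The gate equations behind the diagram can be written out in full. This is a standard-formulation sketch: stacking the four gates into one weight matrix and concatenating h_{t-1} with x_t are conventional assumptions, not something the slides specify.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x_t] to the stacked gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f = sigmoid(z[0:H])        # forget gate: how much old cell state to keep
    i = sigmoid(z[H:2*H])      # input gate: how much new candidate to write
    g = np.tanh(z[2*H:3*H])    # candidate state
    o = sigmoid(z[3*H:4*H])    # output gate
    c = f * c_prev + i * g     # forget old state + remember new state
    h = o * np.tanh(c)         # output
    return h, c

rng = np.random.default_rng(0)
H, D = 4, 3
W = rng.normal(size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The key line is c = f * c_prev + i * g: the cell state is updated additively rather than by repeated matrix multiplication, which is what lets gradients survive over long spans.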
![Page 42: Recurrent Neural networks - Simon Fraser Universityfurukawa/cmpt762/slides/22-recurrent-neur… · • Pascanu, Razvan, et al. How to construct deep recurrent neural networks. arXiv](https://reader034.fdocuments.in/reader034/viewer/2022043021/5f3cabff9d77ad0bdb53d38e/html5/thumbnails/42.jpg)
Gated Recurrent Unit
Source: https://en.wikipedia.org/wiki/Gated_recurrent_unit
h[t]: memory
ĥ[t]: new information
z[t]: update gate (where to add)
r[t]: reset gate
y[t]: output
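Using the slide's names (z = update gate, r = reset gate, ĥ = new information), a GRU step can be sketched as follows; the weight shapes and the concatenation layout are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, Wr, Wc):
    """One GRU step: z decides where to add, r decides what old memory
    the candidate gets to see."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx)                                       # update gate
    r = sigmoid(Wr @ hx)                                       # reset gate
    h_hat = np.tanh(Wc @ np.concatenate([r * h_prev, x_t]))    # new information
    return (1 - z) * h_prev + z * h_hat                        # blend old and new

rng = np.random.default_rng(0)
H, D = 4, 3
Wz, Wr, Wc = (rng.normal(size=(H, H + D)) for _ in range(3))
h = np.zeros(H)
for x in rng.normal(size=(5, D)):
    h = gru_step(x, h, Wz, Wr, Wc)
print(h.shape)  # (4,)
```

Unlike the LSTM, there is no separate cell state: a single hidden vector is interpolated between its old value and the candidate, so the GRU has fewer parameters.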
![Page 44: Recurrent Neural networks - Simon Fraser Universityfurukawa/cmpt762/slides/22-recurrent-neur… · • Pascanu, Razvan, et al. How to construct deep recurrent neural networks. arXiv](https://reader034.fdocuments.in/reader034/viewer/2022043021/5f3cabff9d77ad0bdb53d38e/html5/thumbnails/44.jpg)
Comparing GRU and LSTM
• Both GRU and LSTM perform better than a tanh RNN on music and speech modeling
• GRU performs comparably to LSTM
• No clear consensus between GRU and LSTM
Source: Chung et al., "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling" (2014)
Section 3: Visualizing and Understanding Recurrent Networks
Character Level Language Modelling
Task: Predicting the next character given the current character
Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”
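The training data for this task is just (current character → next character) pairs. A minimal data-preparation sketch; the toy string and one-hot encoding are illustrative choices.

```python
text = "hello world"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}  # character -> integer id

# Each training example: predict text[t+1] from text[t]
pairs = [(stoi[a], stoi[b]) for a, b in zip(text, text[1:])]

def one_hot(i, n):
    v = [0.0] * n
    v[i] = 1.0
    return v

inputs = [one_hot(i, len(vocab)) for i, _ in pairs]  # fed to the RNN one per step
targets = [j for _, j in pairs]                      # next-character indices
print(len(pairs), len(vocab))  # 10 8
```

At generation time the model's own sampled output is fed back in as the next input, which is how the hallucinated text on the following slides is produced.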
Generated text:
● Remembers to close a bracket
● Capitalizes nouns
● "404 Page Not Found!" :P The LSTM hallucinates it.
Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurren t Neura l Ne tworks”
100th iteration
300th iteration
700th iteration
2000th iteration
Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”
Visualizing Predictions and Neuron “firings”
Excited neuron inside a URL; not excited outside the URL
Likely prediction; not a likely prediction
Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”
Features an RNN Captures in Common Language?
Cell Sensitive to Position in Line
● Can be interpreted as tracking the line length
Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”
Cell That Turns On Inside Quotes
Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neura l Ne tworks”
Features an RNN Captures in the C Language?
Cell That Activates Inside IF Statements
Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”
Cell That Is Sensitive to Indentation
● Can be interpreted as tracking the indentation of code
● Increasing strength as indentation increases
Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”
Non-Interpretable Cells
● Only 5% of the cells show such interesting properties
● A large portion of the cells are not interpretable by themselves
Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks ”
Visualizing Hidden State Dynamics
• Observe changes in the hidden state representation over time
• Tool: LSTMVis
Strobelt et al., "Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks"
Key Takeaways
• Recurrent neural network for sequential data processing• Feedforward depth
• Recurrent depth
• Long term dependencies are a major problem in RNNs. Solutions:• Intelligent weight initialization
• LSTMs / GRUs
• Visualization helps• Analyze finer details of features produced by RNNs
![Page 61: Recurrent Neural networks - Simon Fraser Universityfurukawa/cmpt762/slides/22-recurrent-neur… · • Pascanu, Razvan, et al. How to construct deep recurrent neural networks. arXiv](https://reader034.fdocuments.in/reader034/viewer/2022043021/5f3cabff9d77ad0bdb53d38e/html5/thumbnails/61.jpg)
References
• Survey Papers
  • Lipton, Zachary C., John Berkowitz, and Charles Elkan. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019 (2015).
  • Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Chapter 10: Sequence Modeling: Recurrent and Recursive Nets. MIT Press, 2016.
• Training
  • Semeniuta, Stanislau, Aliaksei Severyn, and Erhardt Barth. Recurrent dropout without memory loss. arXiv preprint arXiv:1603.05118 (2016).
  • Arjovsky, Martin, Amar Shah, and Yoshua Bengio. Unitary evolution recurrent neural networks. arXiv preprint arXiv:1511.06464 (2015).
  • Le, Quoc V., Navdeep Jaitly, and Geoffrey E. Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941 (2015).
  • Cooijmans, Tim, et al. Recurrent batch normalization. arXiv preprint arXiv:1603.09025 (2016).
References (contd)
• Architectural Complexity Measures
  • Zhang, Saizheng, et al. Architectural complexity measures of recurrent neural networks. Advances in Neural Information Processing Systems. 2016.
  • Pascanu, Razvan, et al. How to construct deep recurrent neural networks. arXiv preprint arXiv:1312.6026 (2013).
• RNN Variants
  • Zilly, Julian Georg, et al. Recurrent highway networks. arXiv preprint arXiv:1607.03474 (2016).
  • Chung, Junyoung, Sungjin Ahn, and Yoshua Bengio. Hierarchical multiscale recurrent neural networks. arXiv preprint arXiv:1609.01704 (2016).
• Visualization
  • Karpathy, Andrej, Justin Johnson, and Li Fei-Fei. Visualizing and understanding recurrent networks. arXiv preprint arXiv:1506.02078 (2015).
  • Strobelt, Hendrik, Sebastian Gehrmann, Bernd Huber, Hanspeter Pfister, and Alexander M. Rush. LSTMVis: Visual Analysis for RNN. arXiv preprint arXiv:1606.07461 (2016).