Lecture 9: Recurrent Neural Networks
“I’m glad that I’m Turing Complete now”
Xinyu Zhou, Megvii (Face++) Researcher
[email protected] 2017
Raise your hand and ask whenever you have questions...
We have a lot to cover and
DON’T BLINK
Outline
● RNN Basics
● Classic RNN Architectures
  ○ LSTM
  ○ RNN with Attention
  ○ RNN with External Memory
    ■ Neural Turing Machine
    ■ CAVEAT: don’t fall asleep
● Applications
  ○ A market of RNNs
RNN Basics
Feedforward Neural Networks
● Feedforward neural networks can approximate any bounded continuous function on a compact domain
● This is called the universal approximation theorem
https://en.wikipedia.org/wiki/Universal_approximation_theorem
Cybenko, George. "Approximation by superpositions of a sigmoidal function." Mathematics of Control, Signals, and Systems (MCSS) 2.4 (1989): 303-314.
Bounded Continuous Function is NOT ENOUGH!
How to solve the Travelling Salesman Problem?
We Need to be Turing Complete
RNN is Turing Complete
Siegelmann, Hava T., and Eduardo D. Sontag. "On the computational power of neural nets." Journal of computer and system sciences 50.1 (1995): 132-150.
Sequence Modeling
Sequence Modeling
● How to take a variable-length sequence as input?
● How to predict a variable-length sequence as output?
RNN
RNN Diagram
A lonely feedforward cell
RNN Diagram
Grows … with more inputs and outputs
RNN Diagram
… here comes a brother
(x_1, x_2) comprises a length-2 sequence
RNN Diagram
… with shared (tied) weights
x_i: inputs
y_i: outputs
W: all the same (shared)
h_i: internal states that are passed along
F: a “pure” function
RNN Diagram
… with shared (tied) weights
A simple implementation of F
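The slide’s implementation of F is not reproduced in this transcript; below is a minimal numpy sketch of one common choice (tanh recurrence with a linear readout). All names are illustrative, not the lecture’s code.

```python
import numpy as np

def rnn_step_F(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One application of F: consumes (x_t, h_prev) and returns (h_t, y_t)."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # internal state passed along
    y_t = W_hy @ h_t + b_y                           # output at this step
    return h_t, y_t
```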
Categorize RNNs by input/output types
Categorize RNNs by input/output types
Many-to-many
Categorize RNNs by input/output types
Many-to-one
Categorize RNNs by input/output types
One-to-Many
Categorize RNNs by input/output types
Many-to-Many: Many-to-One + One-to-Many
Many-to-Many Example: Language Model
● Predict the next word given previous words (a toy sampling loop is sketched below)
● “h” → “he” → “hel” → “hell” → “hello”
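A toy character-level sketch of that loop, assuming a tiny "helo" vocabulary and random, untrained weights (so the real "hello" would only emerge after training). All names and sizes are illustrative.

```python
import numpy as np

vocab = list("helo")
np.random.seed(0)
H, V = 16, len(vocab)
W_xh = np.random.randn(H, V) * 0.1
W_hh = np.random.randn(H, H) * 0.1
W_hy = np.random.randn(V, H) * 0.1

def step(ch_idx, h):
    x = np.zeros(V); x[ch_idx] = 1.0            # one-hot input character
    h = np.tanh(W_xh @ x + W_hh @ h)            # recurrent update
    p = np.exp(W_hy @ h); p /= p.sum()          # softmax over the next character
    return h, p

h, idx, out = np.zeros(H), vocab.index("h"), "h"
for _ in range(4):                              # "h" -> "he" -> ... (once trained)
    h, p = step(idx, h)
    idx = int(np.argmax(p))                     # greedy pick of the next character
    out += vocab[idx]
print(out)
```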
Language Modeling
● Tell a story
● “Heeeeeel”
● ⇒ “Heeeloolllell”
● ⇒ “Hellooo”
● ⇒ “Hello”
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
Language Modeling
● Write a (nonsense) book in LaTeX
\begin{proof}We may assume that $\mathcal{I}$ is an abelian sheaf on $\mathcal{C}$.\item Given a morphism $\Delta : \mathcal{F} \to \mathcal{I}$is an injective and let $\mathfrak q$ be an abelian sheaf on $X$.Let $\mathcal{F}$ be a fibered complex. Let $\mathcal{F}$ be a category.\begin{enumerate}\item \hyperref[setain-construction-phantom]{Lemma}\label{lemma-characterize-quasi-finite}Let $\mathcal{F}$ be an abelian quasi-coherent sheaf on $\mathcal{C}$.Let $\mathcal{F}$ be a coherent $\mathcal{O}_X$-module. Then$\mathcal{F}$ is an abelian catenary over $\mathcal{C}$.\item The following are equivalent\begin{enumerate}\item $\mathcal{F}$ is an $\mathcal{O}_X$-module.\end{lemma}
Many-to-One Example: Sentiment Analysis
● “RNNs are awesome!” ⇒ (positive)
● “The course project is too hard for me.” ⇒ (negative)
Many-to-One + One-to-Many: Neural Machine Translation
Encoder (many-to-one) ⇒ Decoder (one-to-many)
Vanishing/Exploding Gradient Problem
Training RNN
● “Backpropagation Through Time” (BPTT)
  ○ Truncated BPTT (a minimal sketch follows below)
● The chain rule of differentiation
  ○ Just backpropagation
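A hedged PyTorch sketch of truncated BPTT, not from the slides: the sequence is processed in fixed-size chunks and the hidden state is detached between chunks, so gradients only flow within a chunk. Sizes, data, and names are illustrative.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=32, batch_first=True)
readout = nn.Linear(32, 8)
opt = torch.optim.SGD(list(rnn.parameters()) + list(readout.parameters()), lr=0.01)

seq = torch.randn(1, 1000, 8)        # (batch, time, features), toy data
target = torch.randn(1, 1000, 8)

h = None
for t in range(0, 1000, 50):         # 50-step truncation window
    chunk, tgt = seq[:, t:t + 50], target[:, t:t + 50]
    out, h = rnn(chunk, h)
    loss = ((readout(out) - tgt) ** 2).mean()
    opt.zero_grad()
    loss.backward()                  # gradients flow only within this chunk
    opt.step()
    h = h.detach()                   # cut the graph here: truncated BPTT
```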
Vanishing/Exploding Gradient Problem
● Consider a linear recurrent net with zero inputs: h_t = W h_{t-1}, so ∂h_t/∂h_0 = W^t
● Singular value > 1 ⇒ Explodes
● Singular value < 1 ⇒ Vanishes
Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. "Learning long-term dependencies with gradient descent is difficult." IEEE Transactions on Neural Networks 5.2 (1994): 157-166.
https://en.wikipedia.org/wiki/Power_iteration
http://www.cs.cornell.edu/~bindel/class/cs6210-f09/lec26.pdf
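A small numpy illustration of the point above (my addition, not from the slides): backpropagating a gradient through t steps of h_t = W h_{t-1} multiplies it by W^T t times, so the largest singular value of W decides whether it vanishes or explodes.

```python
import numpy as np

np.random.seed(0)
for scale in (0.9, 1.1):
    # Orthogonal matrix times a scale: every singular value equals `scale`.
    W = scale * np.linalg.qr(np.random.randn(50, 50))[0]
    g = np.random.randn(50)
    for _ in range(100):
        g = W.T @ g                  # backprop through 100 time steps
    # ~0.9**100 -> vanishes, ~1.1**100 -> explodes
    print(scale, np.linalg.norm(g))
```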
Vanishing/Exploding Gradient Problem
● Consider a linear recurrent net with zero inputs
● “It is sufficient for the largest eigenvalue λ_1 of the recurrent weight matrix to be smaller than 1 for long term components to vanish (as t → ∞) and necessary for it to be larger than 1 for gradients to explode.”
Long short-term memory (LSTM) comes to the rescue
Vanilla RNN vs. LSTM
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
Why LSTM works
● i: input gate
● f: forget gate
● o: output gate
● g: temp variable (candidate cell update)
● c: memory cell
● Key observation:
  ○ If f == 1, then C_t = C_{t-1} + i ⊙ g, so the old cell state flows through additively
  ○ Looks like a ResNet!
http://people.idsia.ch/~juergen/lstm/sld017.htm
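A minimal numpy sketch of one LSTM step matching the gates above; the stacked-parameter layout and names are my choice, not the lecture’s code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the parameters of the i, f, o, g blocks
    (so W has 4*H rows for a hidden size H)."""
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g          # if f == 1, the old cell state passes through unchanged
    h = o * np.tanh(c)
    return h, c
```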
LSTM vs. Weight-Sharing ResNet
● Differences
  ○ Never forgets
  ○ No intermediate inputs
GRU
● Similar to LSTM
● Lets information flow without a separate memory cell
● Consider the update equations (a minimal sketch follows below)
Chung, Junyoung, et al. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014).
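A minimal numpy sketch of one GRU step (names illustrative), using the convention of Cho et al. where the update gate z mixes the old state with the candidate state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: update gate z and reset gate r control information flow."""
    z = sigmoid(Wz @ x + Uz @ h_prev)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate state
    return (1 - z) * h_prev + z * h_tilde           # no separate memory cell
```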
Search for a Better RNN Architecture
1. Initialize a pool with {LSTM, GRU}
2. Evaluate each new architecture with 20 hyperparameter settings
3. Select one at random from the pool
4. Mutate the selected architecture (the key step)
5. Evaluate the new architecture with 20 hyperparameter settings
6. Maintain a list of the 100 best architectures
7. Go to 3
Jozefowicz, Rafal, Wojciech Zaremba, and Ilya Sutskever. "An empirical exploration of recurrent network architectures." Proceedings of the 32nd International Conference on Machine Learning (ICML-15). 2015.
Simple RNN Extensions
Bidirectional RNN (BDRNN)
● RNN can go either way (a minimal sketch follows below)
● “Peek into the future”
● Truncated version used in speech recognition
https://github.com/huseinzol05/Generate-Music-Bidirectional-RNN
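A hedged sketch of the bidirectional idea, assuming per-step functions `step_fwd` and `step_bwd` such as the `rnn_step_F` above: one pass runs left-to-right, the other right-to-left, and the two hidden states are concatenated at every time step.

```python
import numpy as np

def bidirectional_rnn(xs, step_fwd, step_bwd, h0_f, h0_b):
    """Return one concatenated hidden state per input position."""
    hs_f, h = [], h0_f
    for x in xs:                     # forward pass
        h = step_fwd(x, h)
        hs_f.append(h)
    hs_b, h = [], h0_b
    for x in reversed(xs):           # backward pass ("peek into the future")
        h = step_bwd(x, h)
        hs_b.append(h)
    hs_b.reverse()
    return [np.concatenate([f, b]) for f, b in zip(hs_f, hs_b)]
```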
2D-RNN: Pixel-RNN
● Each pixel depends on its top and left neighbors
Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. "Pixel recurrent neural networks." arXiv preprint arXiv:1601.06759 (2016).
Pixel-RNN Application
● Segmentation
Visin, Francesco, et al. "Reseg: A recurrent neural network-based model for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2016.
Deep RNN
● Stack more of them
  ○ Pros
    ■ More representational power
  ○ Cons
    ■ Harder to train
● ⇒ Need residual connections along depth (a minimal sketch follows below)
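A hedged sketch of one time step of a stacked RNN with residual connections along depth, assuming each layer exposes a per-step function like the ones above; names and the skip-connection placement are illustrative.

```python
import numpy as np

def deep_rnn_step(x, hs_prev, layer_steps):
    """One time step through a stack of RNN layers.
    hs_prev: previous hidden state per layer; layer_steps: per-layer step functions."""
    inp, hs_new = x, []
    for h_prev, step in zip(hs_prev, layer_steps):
        h = step(inp, h_prev)
        # Residual connection along depth (only when the shapes line up).
        inp = inp + h if inp.shape == h.shape else h
        hs_new.append(h)
    return inp, hs_new
```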
RNN Basics Summary
● The evolution of RNN from feedforward NN
● Recurrence as an unrolled computation graph
● Vanishing/exploding gradient problem
  ○ LSTM and variants
  ○ and the relation to ResNet
● Extensions
  ○ BDRNN
  ○ 2D-RNN
  ○ Deep RNN
RNN with Attention
What is Attention?
● Differentiate entities by their importance
  ○ Spatial attention is related to location
  ○ Temporal attention is related to causality
https://distill.pub/2016/augmented-rnns
Attention over Input Sequence
● Neural Machine Translation (NMT)
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
Neural Machine Translation (NMT)
● Attention over the input sequence
● There are words in the two languages that share the same meaning
● Attention ⇒ Alignment
  ○ Differentiable, allowing end-to-end training (a minimal sketch follows below)
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
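A hedged numpy sketch of Bahdanau-style additive attention: score every encoder state against the current decoder state, normalize with a softmax (the soft alignment), and take the weighted sum as the context vector. Parameter names are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_context(dec_state, enc_states, W_a, U_a, v_a):
    """Return (context vector, alignment weights) for one decoder step."""
    scores = np.array([v_a @ np.tanh(W_a @ dec_state + U_a @ h) for h in enc_states])
    alpha = softmax(scores)                       # soft alignment over source positions
    context = sum(a * h for a, h in zip(alpha, enc_states))
    return context, alpha
```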
https://distill.pub/2016/augmented-rnns
Image Attention: Image Captioning
Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." International Conference on Machine Learning. 2015.
Text Recognition
● Implicit language model
Soft Attention RNN for OCR
[Pipeline diagram: image → CNN → column features (FC) → attention → RNN decoder emitting “金口香牛肉面”; trained with two losses (Loss1, Loss2)]
RNN with External Memory
Copy a sequence
● Input: a sequence of symbols
● Output: the same sequence
● Solution in Python (a trivial stand-in is sketched below)
● Can a neural network learn this program purely from data?
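The slide’s Python snippet is not reproduced in this transcript; a minimal stand-in for the obvious one-liner, for contrast with making a network learn the same behaviour from input/output pairs:

```python
def copy_sequence(seq):
    """The task a copy network has to learn end-to-end."""
    return list(seq)

print(copy_sequence([1, 2, 3]))  # [1, 2, 3]
```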
Traditional Machine Learning
● √ Elementary operations
● √* Logic flow control
  ○ Decision tree
● × External memory
  ○ As opposed to internal memory (hidden states)
Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
Neural Turing Machines (NTM)
● NTM is a neural network with a working memory
● It reads and writes multiple times at each step
● Fully differentiable and can be trained end-to-end
Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
An NTM “Cell”
Neural Turing Machines (NTM)
● Memory
  ○ An n × m matrix: n memory locations, each holding an m-dimensional vector
http://llcao.net/cu-deeplearning15/presentation/NeuralTuringMachines.pdf
Neural Turing Machines (NTM)
● Read (a minimal sketch follows below)
● Hard indexing ⇒ Soft indexing
  ○ A distribution over indices
  ○ “Attention”
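A minimal sketch of the soft read: instead of picking one hard index, a weight vector w (a distribution over the n memory locations) gives a blended read-out.

```python
import numpy as np

def ntm_read(M, w):
    """M: (n, m) memory matrix; w: (n,) attention weights summing to 1.
    Returns r = sum_i w[i] * M[i]."""
    return w @ M
```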
Neural Turing Machines (NTM)
● Write
  ○ Write = erase + add (a minimal sketch follows below)
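A minimal sketch of the write operation as described in the NTM paper: first erase, then add, both modulated by the same attention weights.

```python
import numpy as np

def ntm_write(M, w, e, a):
    """M: (n, m) memory; w: (n,) weights; e: (m,) erase vector in [0, 1]; a: (m,) add vector."""
    M = M * (1 - np.outer(w, e))   # erase
    M = M + np.outer(w, a)         # add
    return M
```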
Neural Turing Machines (NTM)
● Addressing
● 1. Focusing by Content
  ○ Cosine similarity
Neural Turing Machines (NTM)
● 1. Focusing by Content
● 2. Interpolate with previous step
● 3. Convolutional Shift
● 4. Sharpening (a sketch of the full addressing pipeline follows below)
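A hedged numpy sketch of the four addressing steps for one head, following the NTM paper; parameter names are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ntm_addressing(M, k, beta, g, s, gamma, w_prev):
    """M: (n, m) memory, k: (m,) key, beta: key strength, g: interpolation gate,
    s: (n,) shift distribution, gamma: sharpening exponent, w_prev: (n,) previous weights."""
    # 1. Focusing by content: cosine similarity between the key and every memory row.
    sim = M @ k / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + 1e-8)
    w_c = softmax(beta * sim)
    # 2. Interpolate with the weights from the previous step.
    w_g = g * w_c + (1 - g) * w_prev
    # 3. Convolutional shift: circular convolution with the shift distribution s.
    n = len(w_g)
    w_s = np.array([sum(w_g[j] * s[(i - j) % n] for j in range(n)) for i in range(n)])
    # 4. Sharpening.
    w = w_s ** gamma
    return w / w.sum()
```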
Neural Turing Machines (NTM)
● Addressing (one head)
Neural Turing Machines (NTM)
● Controller
  ○ Feedforward
  ○ LSTM
● Takes the input
● Predicts all red-circled variables
● Even if a feedforward controller is used, NTM is an RNN
NTM: Copy Task Comparison
● NTM vs. LSTM
Neural Turing Machines (NTM)
● Copy Task
● Memory heads: loc_write, loc_read
Neural Turing Machines (NTM)
● Repeated Copy Task
● Memory heads
● White cells are the positions of the memory heads
Neural Turing Machines (NTM)
● Priority Sort
Misc
● More networks with memories
  ○ Memory Networks
  ○ Differentiable Neural Computer (DNC)
● Adaptive Computation Time
● Using different weights for each step
  ○ HyperNetworks
● Neural GPU learns algorithms
More Applications
RNN without a sequence input
● Left
  ○ Learns to read out house numbers from left to right
● Right
  ○ A recurrent network generates images of digits by learning to sequentially add color to a canvas
Ba, Jimmy, Volodymyr Mnih, and Koray Kavukcuoglu. "Multiple object recognition with visual attention." arXiv preprint arXiv:1412.7755 (2014).
Gregor, Karol, et al. "DRAW: A recurrent neural network for image generation." arXiv preprint arXiv:1502.04623 (2015).
Generalizing Recurrence
● What is recurrence?
  ○ A computation unit with shared parameters occurs at multiple places in the computation graph
    ■ Convolution will do too
  ○ … with additional states passed among them
    ■ That’s recurrence
● “Recursive”
Recursive Neural Network
● Apply when there is tree structure in the data
  ○ For natural language, use the Stanford Parser to build the syntax tree for a sentence
http://cs224d.stanford.edu/lectures/CS224d-Lecture10.pdf
https://nlp.stanford.edu/software/lex-parser.shtml
Recursive Neural Network
● Bottom-up aggregation of information
  ○ Sentiment analysis
Socher, Richard, et al. "Recursive deep models for semantic compositionality over a sentiment treebank." Proceedings of the 2013 conference on empirical methods in natural language processing. 2013.
Recursive Neural Network
● As a lookup table
Andrychowicz, Marcin, and Karol Kurach. "Learning efficient algorithms with hierarchical attentive memory." arXiv preprint arXiv:1602.03218 (2016).
Speech Recognition
● Deep Speech 2
  ○ Baidu
Amodei, Dario, et al. "Deep speech 2: End-to-end speech recognition in english and mandarin." International Conference on Machine Learning. 2016.
Generating Sequences
● Language modeling
  ○ Input: “A”
  ○ Output: “A quick brown fox jumps over the lazy dog.”
● Handwriting stroke generation
https://www.cs.toronto.edu/~graves/handwriting.html
Question Answering
1. Mary moved to the bathroom
2. John went to the hallway
3. Where is Mary?
4. Answer: bathroom
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).
Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. "End-to-end memory networks." Advances in Neural Information Processing Systems. 2015.
Andreas, Jacob, et al. "Learning to compose neural networks for question answering." arXiv preprint arXiv:1601.01705 (2016).
http://cs.umd.edu/~miyyer/data/deepqa.pdf
https://research.fb.com/downloads/babi/
Visual Question Answering
Antol, Stanislaw, et al. "Vqa: Visual question answering." Proceedings of the IEEE International Conference on Computer Vision. 2015.
Visual Question Answering
● Reason about the relations among objects in an image
● “What size is the cylinder that is left of the brown metal thing that is left of the big sphere?”
● Dataset
  ○ CLEVR
https://distill.pub/2016/augmented-rnns/
http://cs.stanford.edu/people/jcjohns/clevr/
Combinatorial Problems
● Pointer Networks
  ○ Convex hull
  ○ TSP
  ○ Delaunay triangulation
● Cross-entropy loss on soft attention
● Applications in vision
  ○ Object tracking
Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks." Advances in Neural Information Processing Systems. 2015.
Learning to Execute
● Executing programs
Zaremba, Wojciech, and Ilya Sutskever. "Learning to execute." arXiv preprint arXiv:1410.4615 (2014).
Compress Image
● Compete with JPEG
Toderici, George, et al. "Full resolution image compression with recurrent neural networks." arXiv preprint arXiv:1608.05148 (2016).
Model Architecture Search
● Use an RNN to produce model architectures
  ○ Learned using reinforcement learning
Zoph, Barret, et al. "Learning transferable architectures for scalable image recognition." arXiv preprint arXiv:1707.07012 (2017).
Meta-Learning
Santoro, Adam, et al. "Meta-learning with memory-augmented neural networks." International conference on machine learning. 2016.
RNN: The Good, Bad and Ugly
● Good
  ○ Turing complete, strong modeling ability
● Bad
  ○ Dependencies across time steps make computation slow
    ■ CNNs are resurging now for sequence prediction
    ■ WaveNet
    ■ Attention Is All You Need
      ● Actually IS a kind of RNN
● Ugly
  ○ Generally hard to train
  ○ REALLY long-term memory??
  ○ The above two conflict with each other
RNN’s Rival: WaveNet
● Causal dilated convolution (a minimal sketch follows below)
Oord, Aaron van den, et al. "Wavenet: A generative model for raw audio." arXiv preprint arXiv:1609.03499 (2016).
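A hedged numpy sketch of a single causal dilated convolution (my addition, not WaveNet’s actual implementation): the output at time t only depends on inputs at t, t-d, t-2d, ..., so nothing from the future leaks in.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation):
    """x: (T,) signal; w: (k,) filter taps; dilation: gap between taps."""
    k = len(w)
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        for i in range(k):
            j = t - i * dilation       # only look backwards in time
            if j >= 0:
                y[t] += w[i] * x[j]
    return y
```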
RNN’s Rival: Attention is All You Need (Transformer)
Vaswani, Ashish, et al. "Attention Is All You Need." arXiv preprint arXiv:1706.03762 (2017).
https://research.googleblog.com/2017/08/transformer-novel-neural-network.html
https://courses.cs.ut.ee/MTAT.03.292/2017_fall/uploads/Main/Attention%20is%20All%20you%20need.pdf
Get rid of sequential computation
Attention Is All You Need
● The encoder self-attention distribution for the word “it” from the 5th to the 6th layer of a Transformer trained on English-to-French translation (one of eight attention heads)
Attention Is All You Need
● But … the decoder part is actually an RNN??
  ○ Kind of like Neural GPU
Make RNN Great Again!
Summary
● RNNs are great!
● RNNs are omnipotent!
Summary
● Turing complete
● … so you still cannot solve the halting problem
● But besides that, the only limit is your imagination.