
Recurrent Neural Networks
11-785 / 2020 Spring / Recitation 7

Vedant Sanil, David Park

“Drop your RNN and LSTM, they are no good!” (The fall of RNN / LSTM, Eugenio Culurciello)

Wise words to live by indeed

Contents

• 1 Language Model

• 2 RNNs in PyTorch

• 3 Training RNNs

• 4 Generation with an RNN

• 5 Variable length inputs

A recurrent neural network and the unfolding in time of the computation involved in its forward computation.

RNNs Are Hard to Train
What isn't? I had to spend a week training an MLP :(

Different Tasks

Each rectangle is a vector and arrows represent functions (e.g. matrix multiply). Input vectors are in red, output vectors are in blue, and green vectors hold the RNN's state (more on this later).

Different Tasks

One-to-One
A simple MLP, no recurrence

One-to-Many
Example, Image Captioning: have a single image, generate a sequence of words

Different Tasks

Many-to-One
Example, Sentiment Analysis: given a sentence, classify its sentiment as positive or negative

Many-to-Many
Example, Machine Translation: given an input sentence in one language, convert it to another language

A history of machine translation from the Cold War to deep learning: https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/

Language Models

• Goal: predict the “probability of a sentence” P(E)

• How likely it is to be an actual sentence
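In symbols (a standard decomposition, not shown on the slide), the sentence probability factors by the chain rule into per-word conditionals, which is exactly what an RNN predicts one step at a time:

P(E) = P(w_1, \ldots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \ldots, w_{t-1})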

Credits to cs224n

A lot of jargon, which basically boils down to:

An RNN Language Model
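A minimal sketch of what such a model can look like in PyTorch (the class name and sizes are illustrative, and an LSTM is chosen here as the recurrent module):

import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)  # token ids -> dense vectors
        self.rnn = nn.LSTM(embed_size, hidden_size)            # expects (seq_len, batch, embed_size)
        self.proj = nn.Linear(hidden_size, vocab_size)         # hidden state -> logits over the vocabulary

    def forward(self, x, hidden=None):
        # x: (seq_len, batch) of token ids
        emb = self.embedding(x)              # (seq_len, batch, embed_size)
        out, hidden = self.rnn(emb, hidden)  # out: (seq_len, batch, hidden_size)
        return self.proj(out), hidden        # logits: (seq_len, batch, vocab_size)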

RNN Modules in PyTorch

• Important: the output y holds the hidden states of the final layer at every time step. Hence, if the model returns y, h:

• y: (seq_len, batch, num_directions * hidden_size)

• h: (num_layers * num_directions, batch, hidden_size)

• If num_directions = 1, then y[-1] == h[-1] (see the shape check below)

• LSTM: returns (h_n, c_n), the final hidden state and the final memory cell state

• GRU and vanilla RNN: return only h_n
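A quick shape check of the above (all sizes are arbitrary):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=2, hidden_size=5, num_layers=3)  # num_directions = 1
x = torch.randn(7, 3, 2)                                 # (seq_len, batch, input_size)
y, h = rnn(x)
print(y.shape)                    # torch.Size([7, 3, 5]) -> (seq_len, batch, num_directions * hidden_size)
print(h.shape)                    # torch.Size([3, 3, 5]) -> (num_layers * num_directions, batch, hidden_size)
print(torch.equal(y[-1], h[-1]))  # True: last time step of y == final hidden state of the last layer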

Can be better visualized as,

RNN cell
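The cell-level view corresponds to unrolling nn.RNNCell manually over time; a sketch with arbitrary sizes:

import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=2, hidden_size=5)
x = torch.randn(7, 3, 2)     # (seq_len, batch, input_size)
h = torch.zeros(3, 5)        # initial hidden state: (batch, hidden_size)
outputs = []
for t in range(x.size(0)):   # one cell application per time step
    h = cell(x[t], h)        # h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
    outputs.append(h)
y = torch.stack(outputs)     # (seq_len, batch, hidden_size), matching nn.RNN's output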

Embedding Layers
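An embedding layer is a learned lookup table from token ids to dense vectors; a minimal example (sizes are arbitrary):

import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)  # vocab of 10 tokens, 4-dim vectors
tokens = torch.tensor([[1, 2], [5, 7], [0, 3]])               # (seq_len, batch) of token ids
vectors = embedding(tokens)
print(vectors.shape)  # torch.Size([3, 2, 4]) -> (seq_len, batch, embedding_dim)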

Training a Language Model
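One training step, sketched under the usual next-token setup: inputs and targets are the same sequence shifted by one token (the RNNLanguageModel sketch and the sizes below are assumptions, not the recitation's exact code):

import torch
import torch.nn as nn

model = RNNLanguageModel(vocab_size=1000, embed_size=64, hidden_size=128)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

seq = torch.randint(0, 1000, (21, 8))  # (seq_len + 1, batch) of token ids
inputs, targets = seq[:-1], seq[1:]    # predict token t+1 from tokens up to t

logits, _ = model(inputs)                                        # (seq_len, batch, vocab_size)
loss = criterion(logits.reshape(-1, 1000), targets.reshape(-1))  # flatten time and batch
loss.backward()
optimizer.step()
optimizer.zero_grad()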

Evaluate your model
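A common metric is perplexity, the exponential of the average per-token cross-entropy (continuing the training sketch above):

with torch.no_grad():
    logits, _ = model(inputs)
    nll = criterion(logits.reshape(-1, 1000), targets.reshape(-1))  # mean negative log-likelihood per token
    print(torch.exp(nll).item())                                    # perplexity; lower is better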

Generation

Greedy Search, Random Search and Beam Search

1. Greedy search: select the most likely word at each step

2. Random search: sample a word from the predicted distribution

3. Beam search: keep the n best partial sequences at each step, where n is the beam size (see the sketch after this list)
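A sketch of the first two strategies with the model above (start_token and max_len are illustrative; beam search additionally tracks the n best partial sequences and is omitted here):

import torch

def generate(model, start_token, max_len=20, greedy=True):
    tokens = [start_token]
    hidden = None
    inp = torch.tensor([[start_token]])      # (seq_len=1, batch=1)
    for _ in range(max_len):
        logits, hidden = model(inp, hidden)  # logits: (1, 1, vocab_size)
        probs = logits[-1, 0].softmax(dim=-1)
        if greedy:
            next_token = probs.argmax().item()               # greedy: most likely word
        else:
            next_token = torch.multinomial(probs, 1).item()  # random: sample from the distribution
        tokens.append(next_token)
        inp = torch.tensor([[next_token]])
    return tokens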

Are we done?

• No.

How to train an LM: fixed length

Limits of fixed-length inputs

How to train an LM: variable length

• Your dataset is now a list of N sequences of different lengths

• But the RNN expects an input tensor with a single fixed shape, (seq_len, batch, input_size)

• How could we deal with this situation?

1. pad_sequence

2. pack_sequence

pad_sequence

Notice how seq_len differs across the sequences, but input_size is the same

torch.Size([4, 3, 2])

(seq_len, batch, input_size)
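A sketch that reproduces this shape, assuming three sequences of lengths 4, 3, and 1, each with two features:

import torch
from torch.nn.utils.rnn import pad_sequence

x1 = torch.randn(4, 2)               # length 4
x2 = torch.randn(3, 2)               # length 3
x3 = torch.randn(1, 2)               # length 1
padded = pad_sequence([x1, x2, x3])  # zero-pads every sequence to the longest length
print(padded.shape)                  # torch.Size([4, 3, 2]) -> (seq_len, batch, input_size)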


pack_sequence

packed = pack_sequence([x1, x2, x3], enforce_sorted=False)   # sorts by length internally
packed_2 = pack_sequence([x1, x2, x3], enforce_sorted=True)  # assumes the sequences are already sorted by decreasing length

pack_padded_sequence and pad_packed_sequence

Why is batch_sizes = tensor([3, 2, 2, 1]) here? Because batch_sizes[t] counts how many sequences are still active at time step t, which is consistent with sequences of lengths 4, 3, and 1: three are active at step 0, two at steps 1 and 2, and one at step 3.

batch_sizes (Tensor): Tensor of integers holding information about the batch size at each sequence step. For instance, given data ``abc`` and ``x``, the :class:`PackedSequence` would contain data ``axbc`` with ``batch_sizes=[2,1,1]``.
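Continuing with the same assumed sequences (lengths 4, 3, and 1), a sketch of packing and of the pack/pad round trip:

import torch
from torch.nn.utils.rnn import (pad_sequence, pack_sequence,
                                pack_padded_sequence, pad_packed_sequence)

x1, x2, x3 = torch.randn(4, 2), torch.randn(3, 2), torch.randn(1, 2)

packed = pack_sequence([x1, x2, x3], enforce_sorted=False)
print(packed.batch_sizes)  # tensor([3, 2, 2, 1]): active sequences per time step

padded = pad_sequence([x1, x2, x3])                         # (4, 3, 2)
packed_2 = pack_padded_sequence(padded, lengths=[4, 3, 1])  # same PackedSequence, built from the padded tensor
unpacked, lengths = pad_packed_sequence(packed_2)           # back to (4, 3, 2), plus the true lengths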

Packed Sequences and RNNs

• A packed sequence lives on the same device as the tensors it was built from

• Packed sequences let your RNN know the true length of each instance, so it never computes on padding (see the sketch below)
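A sketch of feeding a packed batch through an LSTM (same assumed sequences as above); note that h_n is taken at each sequence's true last step, not at the padded length:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence

lstm = nn.LSTM(input_size=2, hidden_size=5)
x1, x2, x3 = torch.randn(4, 2), torch.randn(3, 2), torch.randn(1, 2)
packed = pack_sequence([x1, x2, x3], enforce_sorted=False)

packed_out, (h_n, c_n) = lstm(packed)           # the LSTM never computes on the padding
out, lengths = pad_packed_sequence(packed_out)  # out: (4, 3, 5); lengths: tensor([4, 3, 1])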

MLP, RNN, CNN, Transformer

• All these layers are just feature extractors

• Temporal convolutional networks (TCNs) “outperform canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory”

(An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling)

Is that enough?

• RNN, LSTM, GRU

• Transformer (will be covered in future lectures and recitations)

• CNN

Credits: Attention is all you need

Papers

• Regularizing and Optimizing LSTM Language Models (for more LSTM LM tricks): https://arxiv.org/pdf/1708.02182.pdf

• Attention is all you need!

• The fall of RNN / LSTM: https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0

• LSTM song!!!