
Recurrent Neural Networks

Yoav Goldberg

Dealing with Sequences

• For an input sequence x1,...,xn, we can:

• If n is fixed: concatenate the vectors and feed into an MLP.

• Sum the vectors (CBOW) and feed into an MLP.

• Break the sequence into fixed-size windows (e.g., for tagging) and concatenate each window into an MLP.

• Break the sequence into windows, find n-gram embeddings, and sum them into an MLP.

• Find good n-grams using a ConvNet, using pooling (either sum/avg or max) to combine into a single vector.

Some of these approaches consider local word order (which ones?).

How can we consider global word order?
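Before turning to RNNs, here is a rough sketch of the first two fixed-size options above (illustrative Python with assumed dimensions, not code from the slides):

    import numpy as np

    # Illustrative sketch: two of the options above, assuming each word
    # is a d-dimensional embedding vector (d and the MLP are made up).
    d, hidden = 100, 50
    xs = [np.random.randn(d) for _ in range(4)]      # x1..x4

    def mlp(v, out_dim=hidden):
        W = np.random.randn(out_dim, v.shape[0])
        b = np.zeros(out_dim)
        return np.tanh(W @ v + b)

    h_concat = mlp(np.concatenate(xs))   # (a) fixed n: concatenate, then MLP
    h_cbow = mlp(np.sum(xs, axis=0))     # (b) CBOW: sum the vectors, then MLP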

Recurrent Neural Networks

• Very strong models of sequential data.

• Trainable function from n vectors to a single vector.

[Diagram: v(what), v(is), v(your), v(name) → enc(what is your name)]

Recurrent Neural Networks

• There are different variants (implementations).

• So far, we focused on the interface level.

Recurrent Neural Networks

• Very strong models of sequential data.

• Trainable function from n vectors to a single* vector.

*this vector (the state) is internal; we only care about the output y.

Recurrent Neural Networks

• Recursively defined.

• There's a vector for every prefix.

• For every finite input sequence, we can unroll the recursion.

• An unrolled RNN is just a very deep feed-forward network with shared parameters across the layers, and a new input at each layer.
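A minimal sketch of the recursion and its unrolling, assuming the standard interface s_i = R(s_{i-1}, x_i), y_i = O(s_i) (concrete choices of R and O come in the following slides):

    # Abstract RNN interface (an assumed notation, following the usual
    # presentation): s_i = R(s_{i-1}, x_i), y_i = O(s_i).
    def run_rnn(R, O, s0, xs):
        # Unroll the recursion over a finite input sequence.
        s, ys = s0, []
        for x in xs:             # one "layer" per input: shared R, new input x
            s = R(s, x)          # update the state
            ys.append(O(s))      # a vector for every prefix
        return ys, s             # all prefix vectors and the final state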

Recurrent Neural Networks

• The recursion is defined in terms of trained parameters.

• But we can train them: define the function form, define the loss.
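As a concrete picture of "define the function form, define the loss", here is a tiny PyTorch sketch; the particular form, dimensions, and library are assumptions, not the slides':

    import torch
    import torch.nn.functional as F

    d, k = 100, 2                                  # input dim, #classes (assumed)
    W = torch.randn(d, d, requires_grad=True)      # trained parameters
    U = torch.randn(d, d, requires_grad=True)
    V = torch.randn(k, d, requires_grad=True)

    def R(s, x):                                   # chosen function form
        return torch.tanh(W @ s + U @ x)

    xs = [torch.randn(d) for _ in range(5)]        # toy input sequence
    s = torch.zeros(d)
    for x in xs:                                   # unrolled recursion
        s = R(s, x)

    loss = F.cross_entropy((V @ s).unsqueeze(0), torch.tensor([1]))
    loss.backward()                                # gradients for W, U, V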

Recurrent Neural Networks for Text Classification

(what are the parameters?)

CBOW as an RNN

(what are the parameters?)

Is this a good parameterization? How about this modification?
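A minimal sketch of CBOW expressed through the RNN interface, assuming the additive update R(s, x) = s + x with O(s) = s, so the state is just the running sum of the input vectors:

    import numpy as np

    # CBOW as an RNN (assumed form): the state is the running sum of inputs.
    def cbow_R(s, x):
        return s + x          # R(s, x) = s + x: no trained parameters here

    def cbow_O(s):
        return s              # O(s) = s

    xs = [np.random.randn(100) for _ in range(4)]
    s = np.zeros(100)
    for x in xs:
        s = cbow_R(s, x)      # after the loop, s equals the sum of the xs

One natural modification (an assumption about what the slide changes) is to add a learned transformation of the input, e.g. R(s, x) = s + W @ x, which gives the update trainable parameters.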

Simple RNN (Elman RNN)

• Looks very simple.

• Theoretically very powerful.

• In practice not so much (hard to train).

• Why? Vanishing gradients.
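A minimal sketch of the Elman update, assuming the common form s_i = tanh(W_s s_{i-1} + W_x x_i + b) with y_i = s_i (dimensions are illustrative):

    import numpy as np

    d_s, d_x = 50, 100                     # state and input dimensions (assumed)
    W_s = np.random.randn(d_s, d_s)
    W_x = np.random.randn(d_s, d_x)
    b = np.zeros(d_s)

    def elman_R(s, x):
        return np.tanh(W_s @ s + W_x @ x + b)   # the entire state is rewritten

    def elman_O(s):
        return s                                # entire memory = output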

Simple RNN (Elman RNN)

• RNN as a "computer": input xi arrives, memory s is updated.

• In the Elman RNN, the entire memory is written at each time-step.

• Entire memory = output!

Another view on behavior:

LSTM RNN

• Better controlled memory access.

• Continuous gates.

Differentiable "Gates"

• The main idea behind the LSTM is that you want to somehow control the "memory access".

• In a SimpleRNN, all the memory gets overwritten at each step: the previous state is read, and the new input is written over the entire memory.

Vector "Gates"

• We'd like to:

* Selectively read from some memory "cells".

* Selectively write to some memory "cells".

• A gate function: a vector of values, where a gate controls access via element-wise multiplication.

Vector "Gates"

• Using the gate function to control access: one gate selects which cells to read, another which cells to write.

• (Can also tie them.)
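A sketch of such fixed 0/1 vector gates, where element-wise multiplication decides which memory cells are read and which are written (illustrative values, not the slides' notation):

    import numpy as np

    s = np.array([1.0, 2.0, 3.0, 4.0])        # memory cells
    x = np.array([10., 10., 10., 10.])        # new values to write
    g_read  = np.array([1., 0., 1., 0.])      # which cells to read
    g_write = np.array([0., 1., 0., 0.])      # which cells to write

    read_out = g_read * s                         # selective read
    s_new = g_write * x + (1 - g_write) * s       # selective write; untouched
                                                  # cells keep their old values
    # Tying the gates would mean using one vector for both reading and writing.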

Vector "Gates"

• Problem with the gates:

* They are fixed.

* They don't depend on the input or the output.

• Solution: make them smooth, input dependent, and trainable.

Differentiable "Gates"

• Each gate value is "almost 0" or "almost 1", computed as a function of the input and the state.
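A sketch of a smooth, input-dependent, trainable gate: a sigmoid of the input and the state pushes each component toward "almost 0" or "almost 1" (the exact parameterization is an assumption):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    d = 4
    W_g, U_g = np.random.randn(d, d), np.random.randn(d, d)   # trainable
    b_g = np.zeros(d)

    def gate(x, s):
        # differentiable gate: a function of the input x and the state s,
        # with values in (0, 1), i.e. "almost 0" or "almost 1"
        return sigmoid(W_g @ x + U_g @ s + b_g)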

LSTM (Long short-term Memory)

• The LSTM is a specific combination of gates.


GRU (Gated Recurrent Unit)

• The GRU is a different combination of gates.

• The GRU and the LSTM are very similar ideas.

• Invented independently of the LSTM, almost two decades later.

GRU vs LSTM

GRU (Gated Recurrent Unit)

• The GRU formulation:
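As a reference, a sketch of the standard GRU equations in one common convention (biases omitted; this is the usual formulation rather than a transcription of the slide):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    d = 50                                         # assume input dim == state dim
    Wz, Uz = np.random.randn(d, d), np.random.randn(d, d)
    Wr, Ur = np.random.randn(d, d), np.random.randn(d, d)
    Wh, Uh = np.random.randn(d, d), np.random.randn(d, d)

    def gru_R(s, x):
        z = sigmoid(Wz @ x + Uz @ s)               # update gate
        r = sigmoid(Wr @ x + Ur @ s)               # reset gate
        h = np.tanh(Wh @ x + Uh @ (r * s))         # candidate state
        return (1 - z) * s + z * h                 # interpolate old and new state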

Other Variants

• Many other variants exist.

• Mostly they perform similarly to each other.

• Different tasks may work better with different variants.

• The important idea is the differentiable gates.

LSTM (Long short-term Memory)

• The LSTM formulation:
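Again as a reference, a sketch of the standard LSTM equations (the usual formulation, with biases omitted; not a transcription of the slide):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    d = 50                                         # assume input dim == state dim
    Wi, Ui = np.random.randn(d, d), np.random.randn(d, d)
    Wf, Uf = np.random.randn(d, d), np.random.randn(d, d)
    Wo, Uo = np.random.randn(d, d), np.random.randn(d, d)
    Wg, Ug = np.random.randn(d, d), np.random.randn(d, d)

    def lstm_R(state, x):
        c, h = state                               # memory cell and hidden/output
        i = sigmoid(Wi @ x + Ui @ h)               # input gate
        f = sigmoid(Wf @ x + Uf @ h)               # forget gate
        o = sigmoid(Wo @ x + Uo @ h)               # output gate
        g = np.tanh(Wg @ x + Ug @ h)               # candidate update
        c_new = f * c + i * g                      # gated memory update
        h_new = o * np.tanh(c_new)                 # bounded, gated output
        return (c_new, h_new)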

Recurrent Additive Networks

• Systematic search over LSTM choices.

• Finds that (1) the forget gate is the most important.

• (2) The non-linearity in the output is important, since the cell state can be unbounded.

• The GRU is effective since it doesn't let the cell state be unbounded.


Bidirectional LSTMs

• An infinite window around the word.

Deep LSTMs

Deep Bi-LSTMs
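A minimal sketch of a deep bidirectional LSTM using PyTorch's nn.LSTM (the dimensions and the choice of library are assumptions, not the course's):

    import torch
    import torch.nn as nn

    emb_dim, hid_dim, n_layers = 100, 50, 2
    bilstm = nn.LSTM(emb_dim, hid_dim, num_layers=n_layers,
                     bidirectional=True, batch_first=True)

    sentence = torch.randn(1, 6, emb_dim)          # a batch of one 6-word sentence
    outputs, _ = bilstm(sentence)                  # shape (1, 6, 2 * hid_dim)
    # outputs[:, i] concatenates a forward state (words 1..i+1) and a backward
    # state (words i+1..n): an "infinite window" around word i.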

• The gated architecture also helps with the vanishing gradients problem.

Read More

• For a good explanation, see Kyunghyun Cho's notes: http://arxiv.org/abs/1511.07916, sections 4.2 and 4.3.

• Chris Olah's blog post on LSTMs.