
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Authors: Junyoung Chung, Caglar Gulcehre, KyungHyun Cho and Yoshua Bengio

Presenter: Yu-Wei Lin

Background: Recurrent Neural Network

• Traditional RNNs encounter many difficulties when learning long-term dependencies.

o The vanishing gradient / exploding gradient problem (illustrated in the sketch below).

• There are two approaches to this problem:

o Design new methods to improve or replace stochastic gradient descent (SGD).

o Design more sophisticated recurrent units, such as the LSTM and the GRU.

• The paper focuses on the performance of the LSTM and the GRU.
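To make the vanishing/exploding gradient problem concrete, here is a small NumPy sketch of my own (not from the paper or the slides): it multiplies the per-step Jacobians of a tanh RNN, h_t = tanh(W h_{t-1}), and shows how quickly that product shrinks for modest recurrent weights; with very large recurrent weights the product can instead grow without bound.

    import numpy as np

    # Illustrative sketch (not from the paper): the gradient reaching a step k
    # time steps in the past is a product of k Jacobians of the form
    # diag(1 - h_t^2) @ W.  Since tanh'(x) <= 1, modest recurrent weights make
    # this product shrink geometrically -- the vanishing-gradient problem.
    rng = np.random.default_rng(0)
    n = 50
    W = 0.8 * rng.standard_normal((n, n)) / np.sqrt(n)   # recurrent weight matrix
    h = np.tanh(rng.standard_normal(n))                  # some hidden state

    jac_product = np.eye(n)
    for k in range(1, 101):
        h = np.tanh(W @ h)
        J = (1.0 - h ** 2)[:, None] * W                  # d h_t / d h_{t-1}
        jac_product = J @ jac_product                    # chain rule across k steps
        if k in (1, 10, 50, 100):
            print(f"after {k:3d} steps: ||Jacobian product|| = {np.linalg.norm(jac_product):.2e}")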

Research Question

• Do RNNs using recurrent units with gates outperform traditional RNNs?

• Does the LSTM or the GRU perform better as a recurrent unit for tasks such as music and speech prediction?

Approach

• Empirically evaluated recurrent neural networks (RNNs) with three widely used recurrent units:

o Traditional tanh unit

o Long short-term memory (LSTM) unit

o Gated recurrent unit (GRU)

• The evaluation focused on the task of sequence modeling

o Datasets: (1) polyphonic music data, (2) raw speech signal data.

• Their performance is compared using a log-likelihood loss function (see the sketch below).
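As a rough illustration of that metric (my own sketch, not the paper's evaluation code), the average negative log-likelihood for binary outputs such as the active notes of a polyphonic piano roll could be computed like this; the helper name and array shapes are assumptions for the example.

    import numpy as np

    def average_negative_log_likelihood(probs, targets, eps=1e-8):
        """Average NLL for independent binary outputs (e.g., active notes in a
        polyphonic piano roll). probs and targets have shape (time, notes).
        Illustrative helper only, not the paper's evaluation code."""
        probs = np.clip(probs, eps, 1.0 - eps)
        nll = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
        return nll.sum(axis=1).mean()   # sum over notes, average over time steps

    # Tiny usage example with random data
    rng = np.random.default_rng(0)
    probs = rng.uniform(size=(16, 88))                    # predicted note probabilities
    targets = (rng.uniform(size=(16, 88)) < 0.1).astype(float)
    print(average_negative_log_likelihood(probs, targets))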

Recurrent Neural Networks

• x_t is the input at time step t.

• h_t is the hidden state at time step t, calculated from the previous hidden state and the input at the current step:

o h_t = f(U x_t + W h_{t-1}), where f is a nonlinearity such as tanh (see the sketch below).

• o_t is the output at step t.

o E.g., if we wanted to predict the next word in a sentence, it would be a vector of probabilities over our vocabulary.
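A minimal NumPy sketch of the recurrence above; the weight names U, W, V and the added bias term are my own choices, not from the slides.

    import numpy as np

    def rnn_step(x_t, h_prev, U, W, b):
        """One step of a vanilla tanh RNN: h_t = tanh(U x_t + W h_{t-1} + b)."""
        return np.tanh(U @ x_t + W @ h_prev + b)

    # Toy dimensions: 10-dimensional input, 20-dimensional hidden state.
    rng = np.random.default_rng(0)
    U = rng.standard_normal((20, 10)) * 0.1
    W = rng.standard_normal((20, 20)) * 0.1
    b = np.zeros(20)

    h = np.zeros(20)
    for x_t in rng.standard_normal((5, 10)):              # a length-5 input sequence
        h = rnn_step(x_t, h, U, W, b)

    # Output o_t as a probability vector over a toy vocabulary of size 30
    V = rng.standard_normal((30, 20)) * 0.1
    logits = V @ h
    o = np.exp(logits - logits.max())
    o /= o.sum()                                          # softmax over the vocabulary
    print(h.shape, o.shape, o.sum())                      # (20,) (30,) 1.0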

Main concept of LSTM

• Closer to how humans process information:

o Control how much of the previous hidden state to forget.

o Control how much of the new input to take in.

• The idea was proposed by Hochreiter and Schmidhuber (1997).

Long Short-Term Memory (LSTM)

• Forget gate (a gate value of 0 forgets the past)

• Input Gate (current cell matters)

• New memory cell

• Final memory cell

• Output Gate (how much cell is exposed)

• Final hidden state (the corresponding equations are written out in the sketch below)
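The bullets above name the pieces of the standard LSTM update; the following is a generic NumPy sketch of that textbook formulation (the parameter names and their stacking are my own choices, not the authors' code).

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h_prev, c_prev, W, U, b):
        """One LSTM step. W, U, b hold the parameters of the four transforms
        (forget, input, output, new memory) stacked along the first axis."""
        z = W @ x + U @ h_prev + b              # shape (4 * hidden,)
        f, i, o, g = np.split(z, 4)
        f = sigmoid(f)                          # forget gate: keep past memory?
        i = sigmoid(i)                          # input gate: current cell matters?
        o = sigmoid(o)                          # output gate: how much cell is exposed
        g = np.tanh(g)                          # new memory cell candidate
        c = f * c_prev + i * g                  # final memory cell
        h = o * np.tanh(c)                      # final hidden state
        return h, c

    # Toy usage: 8-dim input, 16-dim hidden state.
    rng = np.random.default_rng(0)
    d, n = 8, 16
    W = rng.standard_normal((4 * n, d)) * 0.1
    U = rng.standard_normal((4 * n, n)) * 0.1
    b = np.zeros(4 * n)
    h, c = np.zeros(n), np.zeros(n)
    for x in rng.standard_normal((5, d)):
        h, c = lstm_step(x, h, c, W, U, b)
    print(h.shape, c.shape)   # (16,) (16,)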

Main concept of Gated Recurrent Unit (GRU)

• LSTMs work well but are unnecessarily complicated.

• The GRU is a variant of the LSTM. Approach:

o Combine the LSTM's forget gate and input gate into a single "update gate".

o Combine the cell state and the hidden state.

• Computationally less expensive:

o Fewer parameters, less complex structure.

• Performance is as good as the LSTM's.

Gated Recurrent Unit (GRU)

• Reset gate: determines how to combine the new input with the previous memory.

• Update gate: decides how much of the previous memory to keep around.

• Candidate hidden state.

• Final memory at time step t combines the current and previous time steps.

• If we set the reset gate to all 1's and the update gate to all 0's, the model reduces to the plain RNN model (see the sketch below).
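A matching NumPy sketch of the GRU update (again a generic textbook formulation with my own parameter names, not the authors' code). The comment on the final line spells out the reduction to the plain tanh RNN mentioned in the last bullet.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_step(x, h_prev, Wr, Ur, Wz, Uz, Wh, Uh):
        """One GRU step (biases omitted for brevity)."""
        r = sigmoid(Wr @ x + Ur @ h_prev)                  # reset gate
        z = sigmoid(Wz @ x + Uz @ h_prev)                  # update gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))      # candidate hidden state
        h_new = z * h_prev + (1.0 - z) * h_tilde           # final memory / hidden state
        # With r = 1 and z = 0 this reduces to h_new = tanh(Wh x + Uh h_prev),
        # i.e., the plain tanh RNN.  (Gate convention follows the slide; some
        # papers swap the roles of z and 1 - z.)
        return h_new

    # Toy usage: 8-dim input, 16-dim hidden state.
    rng = np.random.default_rng(0)
    d, n = 8, 16
    params = [rng.standard_normal(s) * 0.1
              for s in [(n, d), (n, n), (n, d), (n, n), (n, d), (n, n)]]
    h = np.zeros(n)
    for x in rng.standard_normal((5, d)):
        h = gru_step(x, h, *params)
    print(h.shape)   # (16,)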

Advantage of LSTM/GRU

• It is easy for each unit to remember the existence of a specific feature in the input stream for a long series of steps.

• The additive shortcut paths allow the error to be back-propagated easily without vanishing too quickly.

o The error does not have to pass through multiple bounded nonlinearities, which reduces the likelihood of the vanishing gradient.

LSTM vs. GRU

• Gates: the LSTM has three gates; the GRU has two.

• Memory exposure: the LSTM controls the exposure of the memory content (cell state); the GRU exposes the entire state to other units in the network.

• Input vs. forget: the LSTM has separate input and forget gates; the GRU performs both of these operations together via its update gate.

• Parameters: the LSTM has more parameters; the GRU has fewer (see the parameter-count sketch below).
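To make the parameter-count comparison concrete, here is a quick sketch of my own (not from the paper) counting the parameters of a single tanh, GRU, and LSTM cell for the same input and hidden sizes.

    def recurrent_cell_params(input_size, hidden_size):
        """Parameter counts for one recurrent layer, including biases.
        Each gate/candidate needs an input matrix, a recurrent matrix, and a bias."""
        per_transform = (input_size * hidden_size      # input weights
                         + hidden_size * hidden_size   # recurrent weights
                         + hidden_size)                # bias
        return {
            "tanh": 1 * per_transform,   # single candidate transform
            "GRU":  3 * per_transform,   # update gate, reset gate, candidate
            "LSTM": 4 * per_transform,   # forget, input, output gates + candidate
        }

    print(recurrent_cell_params(input_size=88, hidden_size=100))
    # For a fixed hidden size: LSTM > GRU > tanh, which is why the paper matches
    # total parameter counts across models rather than hidden sizes.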

Model

• The authors built models for each of the three test units (LSTM, GRU, tanh) along the following criteria:

o Similar numbers of parameters in each network, for a fair comparison.

o RMSProp optimization (see the sketch below).

o Learning rate chosen to maximize validation performance, out of 10 different points sampled on a log scale (exponents from -12 to -6).

• The models are tested across four music datasets and two speech datasets.
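For reference, the RMSProp rule mentioned above scales each gradient by a running average of its squared magnitude. This is a generic sketch of that standard update, not the authors' training code, and the hyperparameter values are illustrative.

    import numpy as np

    def rmsprop_update(param, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
        """One RMSProp step: divide the gradient by the root of a running
        average of its squared magnitude. `cache` holds that running average."""
        cache = decay * cache + (1.0 - decay) * grad ** 2
        param = param - lr * grad / (np.sqrt(cache) + eps)
        return param, cache

    # Toy usage on a single weight matrix
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 4))
    cache = np.zeros_like(W)
    for _ in range(3):
        grad = rng.standard_normal((4, 4))     # stand-in for a real gradient
        W, cache = rmsprop_update(W, grad, cache)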

Task

• Music datasets

o Input: the sequence of vectors.

o Output: predict the next time step of the sequence.

• Speech signal datasets

o Look at 20 consecutive samples to predict the following 10 consecutive samples (see the windowing sketch below).

o Input: the one-dimensional raw audio signal at each time step.

o Output: the next 10 consecutive time steps of the sequence.
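A small sketch of my own (not the paper's preprocessing code) showing how the 20-in / 10-out windows described above could be cut from a raw one-dimensional signal; the function name and toy signal are assumptions.

    import numpy as np

    def make_speech_windows(signal, context=20, horizon=10):
        """Slice a 1-D raw audio signal into (input, target) pairs:
        `context` consecutive samples in, the next `horizon` samples out."""
        inputs, targets = [], []
        for start in range(len(signal) - context - horizon + 1):
            inputs.append(signal[start:start + context])
            targets.append(signal[start + context:start + context + horizon])
        return np.array(inputs), np.array(targets)

    # Toy usage with a random "waveform"
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(100)
    X, Y = make_speech_windows(signal)
    print(X.shape, Y.shape)   # (71, 20) (71, 10)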

Result - average negative log-likelihood

• Music datasets

o The GRU-RNN outperformed all the others (LSTM-RNN and tanh-RNN).

o All three models performed closely to each other.

• Ubisoft (speech) datasets

o The RNNs with gating units clearly outperformed the more traditional tanh-RNN.

Result - Learning curves

• Learning curves for the training and validation sets of the different unit types:

o Top: x-axis is the number of iterations.

o Bottom: x-axis is the wall-clock time.

• y-axis: the negative log-likelihood of the model, shown in log scale.

• The GRU-RNN makes faster progress in terms of both the number of updates and actual CPU time.

Result - Learning curves Cont’d

• The gated units (LSTM and GRU) clearly outperformed the tanh unit.

• The GRU-RNN once again produced the best results.

Takeaways

• Music datasets

o The GRU-RNN reached slightly better performance.

o All of the models performed relatively closely.

• Speech datasets

o The gated units clearly outperformed the tanh unit.

o The GRU-RNN produced the best results, both in terms of accuracy and training time.

• Gated units are superior to traditional (tanh) recurrent neural networks.

• The performance of the two gated units (LSTM and GRU) cannot be clearly distinguished.

Thank you !