
ME 780

Lecture 5: Recurrent Neural Networks

Nima Mohajerin

University of Waterloo

WAVE Lab
nima.mohajerin@uwaterloo.ca

July 4, 2017

Overview

1 Recap

2 RNN Architectures for Learning Long Term Dependencies

3 Other RNN Architectures

4 System Identification with RNNs

Recap

RNNs

RNNs deal with sequential information.
RNNs are dynamic systems; frequently their dynamics are represented via state-space equations.


A simple RNN

A simple RNN in the discrete-time domain:

x(k) = f(A x(k−1) + B u(k) + b_x)
y(k) = g(C x(k) + b_y)

x(k) ∈ R^s : RNN state vector; no. of states = no. of hidden neurons = s
y(k) ∈ R^n : RNN output vector; no. of output neurons = n
u(k) ∈ R^m : input vector to the RNN (independent input)
A ∈ R^{s×s} : state feedback weight matrix
B ∈ R^{s×m} : input weight matrix
b_x ∈ R^s : state bias term
C ∈ R^{n×s} : state-to-output weight matrix
b_y ∈ R^n : output bias
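To make the state-space equations concrete, here is a minimal NumPy sketch (not from the lecture; the sizes, the tanh choice for f, and the identity choice for g are my assumptions):

```python
# Minimal sketch of the simple RNN above, assuming f = tanh and g = identity.
import numpy as np

rng = np.random.default_rng(0)
s, m, n = 4, 2, 1                          # toy sizes: states, inputs, outputs
A  = rng.normal(scale=0.5, size=(s, s))    # state feedback weight matrix
B  = rng.normal(scale=0.5, size=(s, m))    # input weight matrix
bx = np.zeros(s)                           # state bias
C  = rng.normal(scale=0.5, size=(n, s))    # state-to-output weight matrix
by = np.zeros(n)                           # output bias

def rnn_step(x_prev, u, f=np.tanh, g=lambda z: z):
    """One step of x(k) = f(A x(k-1) + B u(k) + bx), y(k) = g(C x(k) + by)."""
    x = f(A @ x_prev + B @ u + bx)
    y = g(C @ x + by)
    return x, y

x = np.zeros(s)
for k in range(10):                        # run 10 steps on random inputs
    x, y = rnn_step(x, rng.normal(size=m))
```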


Backpropagation Through Time

One data sample:

Input:          U  = [u(k0+1)  u(k0+2)  ...  u(k0+T)].
Target output:  Yt = [yt(k0+1)  yt(k0+2)  ...  yt(k0+T)].

SSE cost (per sample):

L = 0.5 Σ_{k=1}^{T} e(k0+k)ᵀ e(k0+k) = 0.5 Σ_{k=1}^{T} Σ_{i=1}^{n} ( y_i(k0+k) − y_{t,i}(k0+k) )²

Batch cost (batch size = D):

L = 0.5 Σ_{d=1}^{D} Σ_{k=1}^{T} e_d(k0+k)ᵀ e_d(k0+k)
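A small sketch of these two costs (my own; it assumes outputs and targets for one sequence are stored as n-by-T arrays):

```python
# SSE cost per sample and for a batch, following the formulas above.
import numpy as np

def sse_cost(Y, Yt):
    """L = 0.5 * sum_k e(k)^T e(k), with e(k) = y(k) - yt(k); Y, Yt are n-by-T."""
    E = Y - Yt
    return 0.5 * np.sum(E * E)

def batch_cost(Ys, Yts):
    """Batch cost: sum the per-sample costs over D sequences."""
    return sum(sse_cost(Y, Yt) for Y, Yt in zip(Ys, Yts))
```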


Gradients

To do a derivative-based optimization, we need the gradient of L:

∂L/∂a_ij = Σ_{k=1}^{T} e(k0+k)ᵀ ∂e(k0+k)/∂a_ij = Σ_{k=1}^{T} e(k0+k)ᵀ ∂y(k0+k)/∂a_ij

∂y(k)/∂a_ij = ∂(Cx(k) + b_y)/∂a_ij ⊙ g′(Cx(k) + b_y) = (C ∂x(k)/∂a_ij) ⊙ g′(Cx(k) + b_y)

∂x(k)/∂a_ij = ∂v(k)/∂a_ij ⊙ f′(v(k)),   where v(k) = A x(k−1) + B u(k) + b_x

∂v(k)/∂a_ij = (∂A/∂a_ij) x(k−1) + A ∂x(k−1)/∂a_ij = [0 ... 0  x_j(k−1)  0 ... 0]ᵀ + A ∂x(k−1)/∂a_ij

(the s×1 vector on the right has x_j(k−1) in its i-th entry and zeros elsewhere)
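As written, this recursion propagates the sensitivity P(k) = ∂x(k)/∂a_ij forward in time. Here is a minimal sketch (mine, not the lecture's code) for a single weight a_ij, assuming f = tanh and g = identity; it reuses A, B, bx, C, by from the earlier sketch:

```python
# Forward propagation of P(k) = dx(k)/da_ij, accumulating dL/da_ij.
import numpy as np

def grad_a_ij(i, j, U, Yt, A, B, bx, C, by):
    """dL/da_ij over one sequence; U is m-by-T inputs, Yt is n-by-T targets."""
    s = A.shape[0]
    x = np.zeros(s)
    P = np.zeros(s)                  # P(0) = dx(0)/da_ij = 0
    grad = 0.0
    for k in range(U.shape[1]):
        v = A @ x + B @ U[:, k] + bx
        dv = A @ P                   # A * dx(k-1)/da_ij
        dv[i] += x[j]                # (dA/da_ij) x(k-1): only entry i is nonzero
        x = np.tanh(v)
        P = (1.0 - x**2) * dv        # f'(v) ⊙ dv, elementwise for tanh
        e = (C @ x + by) - Yt[:, k]  # error e(k0+k)
        grad += e @ (C @ P)          # e^T * dy(k)/da_ij  (g = identity)
    return grad
```

Looping over every (i, j) this way is the forward-mode view of the same recursion; BPTT obtains all weight gradients at once with a single backward pass, but the quantities involved are identical.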


Section 2

RNN Architectures for Learning Long Term Dependencies


Gated Architectures

g(x) = x ⊙ σ(Ax + b)

⊙ : element-wise multiplication
x ∈ R^n
g(x) ∈ R^n
A ∈ R^{n×n}
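A tiny numerical sketch of this gating operation (my own; the specific x, A, b values are arbitrary):

```python
# g(x) = x ⊙ σ(Ax + b): each component of x is scaled by a gate value in (0, 1).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(x, A, b):
    return x * sigmoid(A @ x + b)    # elementwise product with the gate

x = np.array([1.0, -2.0, 0.5])
print(gate(x, np.eye(3), np.zeros(3)))
```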


Preserve Information with Gated Architectures

The idea: if a neuron has a self-feedback connection with weight equal to one, its information is retained for an arbitrarily long time when the network is unfolded.
Some information should decay; some should not be stored at all.
With a gate, the intention is to control this self-feedback weight.


Gated Recurrent Unit

g_f(k) = σ(W_f^i u(k) + W_f^o y_h(k−1))
g_i(k) = σ(W_i^i u(k) + W_i^o y_h(k−1))
m(k)   = h(W_y^i u(k) + W_y^y (g_f(k) ⊙ y_h(k−1)))
y_h(k) = g_i(k) ⊙ y_h(k−1) + (1 − g_i(k)) ⊙ m(k)
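A minimal sketch of one step of this unit (mine; the matrix names Wif, Wof, etc. mirror the equations and are assumptions, as is h = tanh):

```python
# One step of the GRU-style cell above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(u, yh_prev, Wif, Wof, Wii, Woi, Wiy, Wyy):
    gf = sigmoid(Wif @ u + Wof @ yh_prev)         # gate on the fed-back state
    gi = sigmoid(Wii @ u + Woi @ yh_prev)         # gate blending old vs. new
    m  = np.tanh(Wiy @ u + Wyy @ (gf * yh_prev))  # candidate state m(k)
    return gi * yh_prev + (1.0 - gi) * m          # leaky blend of old and new
```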


Long Short-Term Memory Cell

g_i(k) = σ(W_i^i u(k) + W_i^o y_h(k−1) + W_i^c c(k−1) + b_i)
g_f(k) = σ(W_f^i u(k) + W_f^o y_h(k−1) + W_f^c c(k−1) + b_f)
g_o(k) = σ(W_o^i u(k) + W_o^o y_h(k−1) + W_o^c c(k−1) + b_o)
c(k)   = g_i(k) ⊙ f(W_c^i u(k) + W_c^o y_h(k−1)) + g_f(k) ⊙ c(k−1)
m(k)   = c(k) ⊙ g_o(k)
y_h(k) = h(W_y m(k) + b_y)
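A sketch of one step of this cell, including the cell-to-gate (peephole) terms W^c (my own code; the dict keys mirror the superscripts above and are assumptions, as are f = h = tanh):

```python
# One step of the LSTM cell above; W is a dict of weight matrices, b of biases.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(u, yh_prev, c_prev, W, b):
    gi = sigmoid(W['ii'] @ u + W['oi'] @ yh_prev + W['ci'] @ c_prev + b['i'])
    gf = sigmoid(W['if'] @ u + W['of'] @ yh_prev + W['cf'] @ c_prev + b['f'])
    go = sigmoid(W['io'] @ u + W['oo'] @ yh_prev + W['co'] @ c_prev + b['o'])
    c  = gi * np.tanh(W['ic'] @ u + W['oc'] @ yh_prev) + gf * c_prev  # cell state
    m  = c * go                                                        # gated output
    yh = np.tanh(W['y'] @ m + b['y'])
    return yh, c
```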


Learning Long-Term Dependencies

One way to avoid exploding gradients is to clip the gradient:

if ‖g‖ > v,   g ← g v / ‖g‖

One way to address vanishing gradients is to use a regularizer that maintains the magnitude of the gradient vector (Pascanu et al., 2013):

Ω = Σ_k ( ‖∇_{x(k)}L ∂x(k)/∂x(k−1)‖ / ‖∇_{x(k)}L‖ − 1 )²
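The clipping rule is a one-liner; here is a sketch (mine) of it in NumPy:

```python
# Norm clipping: rescale g to norm v whenever ||g|| > v, otherwise leave it.
import numpy as np

def clip_gradient(g, v):
    norm = np.linalg.norm(g)
    return g * (v / norm) if norm > v else g
```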


Section 3

Other RNN Architectures


RNNs as Associative Memories

An RNN is a nonlinear dynamic system that can behave chaotically.
It can have many attractors in its phase space.
The Hopfield (1982) model is the most popular one. It is a fully connected recurrent model whose feedback weight matrix is symmetric with zero diagonal elements.
The Hopfield model is stable in the Lyapunov sense if the output neurons are updated one at a time. (Refer to Du, K.-L. and Swamy, M.N.S. (2006), Neural Networks in a Softcomputing Framework, for further discussion.)
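A toy sketch (mine) of a binary Hopfield network: Hebbian storage of ±1 patterns and asynchronous one-neuron-at-a-time updates, the stable regime noted above:

```python
# Hebbian storage and asynchronous recall for a binary Hopfield network.
import numpy as np

def store(patterns):
    """patterns: p-by-n array of ±1. Weights are symmetric, zero diagonal."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, x, steps=100, rng=np.random.default_rng(0)):
    """Start from a corrupted pattern x and settle toward a stored attractor."""
    x = x.copy()
    for _ in range(steps):
        i = rng.integers(len(x))                 # update one neuron at a time
        x[i] = 1.0 if W[i] @ x >= 0 else -1.0
    return x
```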


Deep RNNs

[Figure: a deep RNN built by connecting several recurrent layers]

We can generalize this idea and create a connection matrix:


Deep RNNs
Example:

              (to L1)  (to L2)  (to L3)  (to output)
C =  [           1        0        1         0       ]  (from input)
     [           1        0        1         1       ]  (from L1)
     [           1        0        1         0       ]  (from L2)
     [           0        1        1         1       ]  (from L3)
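The same example as a NumPy array (my rendering), with a lookup showing how to read it; the orientation (rows = sources, columns = destinations) follows the labels above:

```python
# The example connection matrix: rows are sources (input, L1, L2, L3),
# columns are destinations (L1, L2, L3, output).
import numpy as np

C = np.array([[1, 0, 1, 0],    # from input
              [1, 0, 1, 1],    # from L1
              [1, 0, 1, 0],    # from L2
              [0, 1, 1, 1]])   # from L3

# e.g. which sources feed layer L3 (column index 2):
print(np.nonzero(C[:, 2])[0])  # -> [0 1 2 3]: the input and all three layers
```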


Reservoir Computing

One approach to cope with the difficulty of training RNNs.
The idea is to use a very large RNN as a reservoir and use it to transform the input.
The reservoir-transformed input is then linearly combined to form the output.
The linear weights are trained while the reservoir (RNN) is kept fixed.
Echo State Networks (continuous output neurons); Liquid State Machines (spiking binary neurons).


Reservoir Computing

How to set the reservoir weights?
Set the weights in such a way that the RNN is at the edge of stability: set the eigenvalues of the state Jacobian close to one.

J(k) = ∂x(k)/∂x(k−1)
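A sketch in this spirit (mine, not the lecture's code): a fixed random tanh reservoir scaled to a spectral radius just under one, with only a linear readout trained by least squares. All sizes and scales are assumptions:

```python
# Echo-state-network sketch: fixed random reservoir, trained linear readout.
import numpy as np

rng = np.random.default_rng(0)
N, m = 200, 1                                      # reservoir and input sizes
A = rng.normal(size=(N, N))
A *= 0.95 / np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius = 0.95
B = rng.uniform(-0.5, 0.5, size=(N, m))            # fixed input weights

def run_reservoir(U):
    """Collect reservoir states for an input sequence U (T-by-m)."""
    x, X = np.zeros(N), []
    for u in U:
        x = np.tanh(A @ x + B @ u)
        X.append(x)
    return np.array(X)                             # T-by-N state matrix

# Readout: fit W_out by least squares on targets Y (T-by-n), e.g.
# W_out, *_ = np.linalg.lstsq(run_reservoir(U), Y, rcond=None)
```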


Section 4

System Identification with RNNs


Reconstructing the System States

We have a set of observations, i.e., measurements of a dynamic system's input and output (states).
We want to learn the system dynamics.
RNNs are universal approximators for dynamic systems (K. Funahashi and Y. Nakamura, 1993).
The delay embedding theorem (Takens' theorem) states that a chaotic dynamical system can be reconstructed from a sequence of observations of the system.
It leads to Auto-Regressive with eXogenous input (ARX) models.


Nonlinear Auto-Regressive with eXogenous Inputs

y(k) = F(u(k), u(k−1), ..., u(k−dx), y(k−1), ..., y(k−dy)).

F can be constructed using a neural network; typically an MLP is used.


Teacher Forcing (Series-Parallel Training)

Teacher forcing is mainly used in NARX architectures.
Substitute the past network predictions with the targets:

y(k) = F(u(k), u(k−1), ..., u(k−dx), yt(k−1), ..., yt(k−dy)).

This converts the RNN into a feedforward network (single-step prediction).
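A schematic sketch (mine) contrasting the two modes; F, u, yt, and dy are placeholders for a trained NARX map and its data, not names from the lecture:

```python
# Teacher forcing feeds targets back; free-running feeds the model's own outputs.
import numpy as np

def predict(F, u, yt, dy, teacher_forcing=True):
    y = np.zeros_like(yt)
    for k in range(dy, len(u)):
        past = yt[k - dy:k] if teacher_forcing else y[k - dy:k]
        y[k] = F(u[k], past)   # targets during training, own outputs at test time
    return y
```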
