Neural networks - Zagazig University

Transcript of lecture slides (32 pages)
Page 1

Neural networks

Page 2

Dynamic networks: Recurrent neural networks

They learn a nonstationary I/O mapping, Y(t) = f(X(t)), where X(t) and Y(t) are time-varying patterns

They model dynamic systems: control systems, optimization problems, artificial vision and speech recognition tasks, time series prediction


Introduction

Page 3

Dynamic networks

Equipped with temporal dynamics, these networks are able to capture the temporal structure of the input and to “produce” an output over time

Temporal dynamics: unit activations can change even in the presence of the same input pattern

Architectures composed of units having feedback connections, either between neurons belonging to the same layer or between neurons in different layers

Partially recurrent networks

Recurrent networks


Page 4

Partially recurrent networks

Feedforward networks equipped with an additional set of input units, called state or context units. The context layer output corresponds to the output, at the previous time step, of the units that emit feedback signals, and it is sent to the units receiving feedback signals

Elman network (1990)

Jordan network (1986)


Page 5

Elman networks 1

The output of each context unit is equal to that of the corresponding hidden unit at the previous (discrete) instant:

x_{c,i}(t) = x_{h,i}(t − 1)

To train the network, the Backpropagation algorithm is used, in order to learn the hidden-output, the input-hidden, and the context-hidden weights

Feedback connections on the hidden layer, with fixed weights all equal to one. Context units are equal in number to the hidden units, and are considered just as input units
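To make the context mechanism concrete, here is a minimal MATLAB sketch of a single Elman forward step; the layer sizes, tanh sigmoids, and random initial weights are illustrative assumptions, not taken from the slides:

% One Elman step: the context buffer holds the previous hidden output
nIn = 3; nHid = 5; nOut = 2;      % illustrative sizes
Wxh = randn(nHid, nIn);           % input -> hidden (trainable)
Wch = randn(nHid, nHid);          % context -> hidden (trainable)
Who = randn(nOut, nHid);          % hidden -> output (trainable)
xc = zeros(nHid, 1);              % context units, initially zero
u  = randn(nIn, 1);               % current input pattern
xh = tanh(Wxh*u + Wch*xc);        % hidden layer sees input + context
y  = tanh(Who*xh);                % sigmoidal output layer
xc = xh;                          % copy-back along fixed weights of one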

Page 6

All the output functions operate on the weighted sum of the inputs, except for the input and the context layers (which act just as “buffers”)

Actually, sigmoidal functions are used in both the hidden and the output layer

The context layer inserts a single-step delay in the feedback loop: the output of the context layer is presented to the hidden layer, in addition to the current pattern

The context layer adds, to the current input, a value that reproduces the output achieved at the hidden layer based on all the patterns presented up to the previous step


Elman networks 2

Page 7

Learning: all the trainable weights are attached to forward connections

1) The activation of the context units is initially set to zero, i.e. x_{c,i}(1) = 0 ∀i, at t = 1

2) Input pattern x_t: evaluation of the activations/outputs of all the neurons, based on the feedforward transmission of the signal along the network

3) Weight updating using Backpropagation

4) Let t = t + 1 and go to 2)
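A minimal MATLAB sketch of steps 1)–4), assuming tanh units, squared error, a learning rate eta, and hypothetical input/target matrices U and D; the context is treated as a plain extra input, so standard Backpropagation applies:

nIn = 3; nHid = 5; nOut = 2; T = 20; eta = 0.05;    % illustrative values
Wxh = randn(nHid, nIn); Wch = randn(nHid, nHid); Who = randn(nOut, nHid);
U = randn(nIn, T); D = randn(nOut, T);              % hypothetical data
xc = zeros(nHid, 1);                                % 1) context set to zero
for t = 1:T
    u = U(:, t); d = D(:, t);
    xh = tanh(Wxh*u + Wch*xc);                      % 2) feedforward pass
    y  = tanh(Who*xh);
    deltaO = (y - d) .* (1 - y.^2);                 % 3) Backpropagation
    deltaH = (Who'*deltaO) .* (1 - xh.^2);
    Who = Who - eta*deltaO*xh';
    Wxh = Wxh - eta*deltaH*u';
    Wch = Wch - eta*deltaH*xc';
    xc = xh;                                        % context <- hidden output
end                                                 % 4) t <- t+1, go to 2)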

The Elman network produces a finite sequence of outputs, one for each input

The Elman network is normally used for object trajectory prediction, and for the generation/recognition of linguistic patterns


Elman networks 3

Page 8

Elman MATLAB code

elmannet(layerDelays, hiddenSizes, trainFcn)

Ex: Here an Elman neural network is used to solve a simple time series problem.

% Load a simple time-series problem
[X,T] = simpleseries_dataset;
% Elman network with delays 1:2 and 10 hidden neurons
net = elmannet(1:2,10);
% Shift and format the time-series data for training
[Xs,Xi,Ai,Ts] = preparets(net,X,T);
% Train, inspect, simulate, and measure performance
net = train(net,Xs,Ts,Xi,Ai);
view(net)
Y = net(Xs,Xi,Ai);
perf = perform(net,Ts,Y)

Page 9

Jordan networks 1

Feedback connections on the output layer, with fixed weights all equal to one. Self-feedback connections for the state neurons, with constant weights equal to a; a < 1 is the recency constant


Page 10

The network output is sent to the hidden layer by using a context layer. The activation of the context units is determined based on the activation of the same neurons and of the output neurons, both calculated at the previous time step

x_{c,i}(t) = x_{o,i}(t − 1) + a·x_{c,i}(t − 1)

Self-connections allow the context units to develop a local or “individual” memory
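As a tiny illustration (with an assumed recency constant a = 0.5 and a hypothetical scalar output history), the self-connection turns the context unit into an exponentially weighted sum of past outputs:

a = 0.5; xc = 0;                 % recency constant and initial context
yHist = [1 0 0 1];               % hypothetical past network outputs
for t = 1:numel(yHist)
    xc = yHist(t) + a*xc;        % x_c(t) = x_o(t-1) + a*x_c(t-1)
end
% xc = 1 + a^3 = 1.125: recent outputs weigh more than older ones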

To train the network, the Backpropagation algorithm is used, in order to learn the hidden-output, the input-hidden, and the context-hidden weights


Jordan networks 2

Page 11

The context layer inserts a delay step in the feedback loop: the context layer output is presented to the hidden layer, in addition to the current pattern

The context layer adds, to the input, a value that reproduces the output achieved by the network based on all the patterns presented up to the previous step, coupled with a fraction of the value calculated, also at the previous step, by the context layer itself (via self-connections)


Jordan networks 3

Page 12

Recurrent networks 1

A neural network is said to be recurrent if it contains some neurons whose activations depend directly or indirectly on their outputs. In other words, following the signal transmission through the network, cyclic paths exist that connect one or more neurons with itself/themselves:

without crossing other neurons: direct feedback (x_i(t) explicitly appears in the evaluation of a_i(t+1), where a_i(·) and x_i(·) respectively represent the activation and the output of neuron i)

and/or crossing other neurons: indirect feedback

A fully connected neural network is always a recurrent network


Page 13

Recurrent networks 2


[Figure: three architectures, left to right: RNN with lateral feedbacks, fully connected RNN, RNN with self feedbacks]

Page 14

A recurrent network processes a temporal sequence by using an internal state representation, which appropriately encodes all the past information injected into its inputs

Memory arises from the presence of feedback loops between the output of some neurons and the input of other neurons belonging to the same/previous layers. Assuming a synchronous update mechanism, the feedback connections have a memory element (a one-step delay)

The inputs are sequences of arrays:

u_p = (u_p(1), u_p(2), …, u_p(T_p))

where T_p represents the length of the p-th sequence

(in general, sequences of finite length are considered, even if this is not a necessary requirement)


Recurrent networks 3

Page 15

Using an MLP as the basic block, multiple types of recurrent networks may be defined, depending on which neurons are involved in the feedback

The feedback may be established from the output to the input neurons

The feedback may involve the output of the hidden layer neurons

In the case of multiple hidden layers, feedbacks can also be present on several layers

Therefore, many different configurations are possible for a recurrent network. Most common architectures exploit the ability of MLPs to implement nonlinear functions, in order to realize networks with nonlinear dynamics
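For instance, MATLAB's layrecnet builds one such configuration, a layer-recurrent network with a feedback loop (here with delays 1:2) around its hidden layer; the dataset is the same one used in the elmannet example above:

[X,T] = simpleseries_dataset;           % simple time-series problem
net = layrecnet(1:2, 10);               % hidden layer with feedback, 10 neurons
[Xs,Xi,Ai,Ts] = preparets(net,X,T);     % arrange data and initial delays
net = train(net,Xs,Ts,Xi,Ai);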


Recurrent networks 4

Page 16

The behaviour of a recurrent network (during a time sequence) can be reproduced by unfolding it in time, obtaining the corresponding feedforward network


x(t) = f(x(t−1), u(t))

y(t) = g(x(t), u(t))

[Figure: recurrent network with input u and output y, unfolded in time into successive copies with inputs u1, u2, … and outputs y1, y2, …]
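A sketch of how these state equations unfold over T = 4 steps: iterating the loop once per step is equivalent to one layer of the corresponding feedforward network. Here f and g are taken to be tanh maps with hypothetical weight matrices A, B, C, Du:

nState = 5; nIn = 3; nOut = 2; T = 4;   % illustrative sizes
A = randn(nState); B = randn(nState, nIn);
C = randn(nOut, nState); Du = randn(nOut, nIn);
U = randn(nIn, T); Y = zeros(nOut, T);
x = zeros(nState, 1);                   % initial state x(0)
for t = 1:T                             % one "layer" per time step
    x = tanh(A*x + B*U(:, t));          % x(t) = f(x(t-1), u(t))
    Y(:, t) = tanh(C*x + Du*U(:, t));   % y(t) = g(x(t), u(t))
end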

Recurrent networks 5

Page 17

Recurrent processing

Before starting to process the p-th sequence, the state of the network must be initialized to an assigned value (the initial state) x_p(0)

Every time the network begins to process a new sequence, there occurs a preliminary “reset” to the initial state, losing the memory of the past processing phases; that is, we assume that each sequence is processed independently of the others

At each time step, the network calculates the current output of all the neurons, starting from the input u_p(t) and from the state x_p(t−1)
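A sketch of this per-sequence processing, under the same assumed state-space model as before, with a cell array Useq of input sequences and a common initial state x0 (all names illustrative):

x0 = zeros(5, 1);                        % assigned initial state
A = randn(5); B = randn(5, 3); C = randn(2, 5);   % hypothetical weights
Useq = {randn(3, 7), randn(3, 4)};       % two sequences, T1 = 7, T2 = 4
for p = 1:numel(Useq)
    x = x0;                              % preliminary "reset" to x_p(0)
    for t = 1:size(Useq{p}, 2)
        x = tanh(A*x + B*Useq{p}(:, t)); % state from u_p(t) and x_p(t-1)
        y = tanh(C*x);                   % current outputs of the neurons
    end
end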


Page 18

Processing modes

Let us suppose that the L-th layer represents the output layer

The neural network can be trained to transform the input sequence into an output sequence of the same length (realizing an Input/Output transduction)

A different case is when we are interested only in the network response at the end of the sequence, so as to transform the sequence into a vector

This approach can be used to associate each sequence with a class in a set of predefined classes
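The two modes differ only in which outputs are kept; a sketch under the same assumed model (names illustrative):

A = randn(5); B = randn(5, 3); C = randn(4, 5);   % hypothetical weights
Tp = 6; U = randn(3, Tp);                         % one input sequence
x = zeros(5, 1); Yseq = zeros(4, Tp);
for t = 1:Tp
    x = tanh(A*x + B*U(:, t));
    Yseq(:, t) = tanh(C*x);       % transduction: one output per step
end
yVec = Yseq(:, Tp);               % sequence -> vector: keep only y(Tp)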


Page 19

Learning in recurrent networks

Backpropagation Through Time (BPTT; Rumelhart, Hinton, Williams, 1986)

The temporal dynamics of the recurrent network is “converted” into that of the corresponding unfolded feedforward network

Advantage: very simple to calculate

Disadvantage: heavy memory requirements

Real-Time Recurrent Learning (Williams, Zipser, 1989)

Recursive calculation of the gradient of the cost function associated with the network

Disadvantage: computationally expensive


Page 20

Learning Set

Let us consider a supervised learning scheme in which:

input patterns are represented by sequences

target values are represented by subsequences

Therefore, the supervised framework is supposed to provide a desired output only with respect to a subset of the processing time steps

In the case of sequence classification (or sequence coding into vectors) there will be a single target value, at time T_p


Page 21

Cost function

The learning set is composed of sequences, each associated with a target subsequence

where ϵ stands for empty positions, possibly contained in the target sequence

The cost function, measuring the difference between the network output and the target sequence, for all the examples belonging to the learning set, is defined by

where the instantaneous error e_p^W(t_i) is expressed as the Euclidean distance between the output vector and the target vector (but other distances may also be used)
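The formula itself did not survive extraction; a plausible reconstruction, consistent with the description above (with y_p and d_p denoting the output and target of the p-th sequence), is:

E_W = \sum_p \; \sum_{t_i \,:\, d_p(t_i) \neq \epsilon} e_p^W(t_i),
\qquad
e_p^W(t_i) = \lVert y_p(t_i) - d_p(t_i) \rVert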


Page 22

BackPropagation Through Time 1

Given the targets to be produced, the network can be trained using BPTT. Using BPTT means…

…considering the corresponding feedforward network unfolded in time; the length T_p of the sequence to be learnt must be known

…updating all the weights w_i(t), t = 1, …, T_p, in the feedforward network, which are copies of the same w_i in the recurrent network, by the same amount, corresponding to the sum of the various updates reported in the different layers; all the copies of w_i(t) must be kept equal
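In symbols, with \Delta w_i(t) denoting the update computed for the copy of w_i at time step t, this shared-update rule reads:

\Delta w_i = \sum_{t=1}^{T_p} \Delta w_i(t)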


Page 23

Let N be a recurrent network that must be trained, starting from t = 0, on a sequence of length T_p

On the other hand, let N* be the feedforward network obtained by unfolding N in time. With respect to N* and N, the following statements hold:

N* has a “layer” that contains a copy of N, corresponding to each time step

Each layer in N* collects a copy of all the neurons contained in N

For each time step t ∈ [0, T_p], the synapse from neuron i in layer l to neuron j in layer l+1 in N* is just a copy of the same synapse in N


BackPropagation Through Time 2

Page 24


BackPropagation Through Time 3

Feedforward network corresponding to a sequence of length T = 4

Recurrent network

Page 25

The gradient calculation may be carried out in a feedforward-network-like style

The algorithm can be derived from the observation that recurrent processing in time is equivalent to constructing the corresponding unfolded feedforward network. The unfolded network is a multilayer network, on which the gradient calculation can be realized via standard Backpropagation. The constraint that each replica of the recurrent network within the unfolded network must share the same set of weights has to be taken into account (this constraint simply imposes to accumulate the gradient related to each weight over every replica during the network unfolding process)
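A compact MATLAB sketch of this gradient accumulation, for an assumed one-layer recurrent model x(t) = tanh(W x(t−1) + B u(t)), y(t) = C x(t), trained with squared error; all names and sizes are illustrative:

nState = 5; nIn = 3; nOut = 2; eta = 0.01;   % illustrative values
W = randn(nState); B = randn(nState, nIn); C = randn(nOut, nState);
U = randn(nIn, 10); D = randn(nOut, 10);     % hypothetical sequence data
T = size(U, 2);
X = zeros(nState, T+1);                      % store every state (memory cost)
for t = 1:T                                  % forward pass, states kept
    X(:, t+1) = tanh(W*X(:, t) + B*U(:, t));
end
gW = zeros(size(W)); gB = zeros(size(B)); gC = zeros(size(C));
dx = zeros(nState, 1);                       % gradient arriving from the future
for t = T:-1:1                               % backward pass through time
    e  = C*X(:, t+1) - D(:, t);              % output error at step t
    gC = gC + e*X(:, t+1)';                  % accumulate over all replicas
    dx = dx + C'*e;                          % full gradient at state x(t)
    dz = dx .* (1 - X(:, t+1).^2);           % back through the tanh
    gW = gW + dz*X(:, t)';                   % same shared weight, summed
    gB = gB + dz*U(:, t)';
    dx = W'*dz;                              % propagate to x(t-1)
end
W = W - eta*gW; B = B - eta*gB; C = C - eta*gC;   % one update per weight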


BackPropagation Through Time 4

Page 26

The meaning of Backpropagation Through Time is highlighted by the idea of network unfolding. The algorithm is nonlocal in time (the whole sequence must be processed, storing all the neuron outputs at each time step) but it is local in space, since it uses only variables local to each neuron. It can be implemented in a modular fashion, based on simple modifications to the Backpropagation procedure normally applied to static MLP networks


BackPropagation Through Time 5

Page 27

The simplest dynamic data type is the sequence, which is a natural way to model temporal domains

In speech recognition, the words, which are the object of the recognition problem, naturally flow to constitute a temporal sequence of acoustic features

In molecular biology, proteins are organized in amino acid strings

The simplest dynamic architectures are recurrent networks, able to model temporal/sequential phenomena



Page 28

Structured domains

In many real-world problems, the information is naturally collected in structured data, which have a hybrid nature, both symbolic and subsymbolic, and cannot be represented regardless of the links between some basic entities:

Classification of chemical compounds

Analysis of DNA regulatory networks

Theorem proving

Pattern recognition

World Wide Web


Page 29

Example 1: Inference of chemical properties

Chemical compounds are naturally represented as graphs (undirected and cyclic)


Page 30

Example 2: Analysis of DNA regulatory networks

A gene regulatory network is a collection of regulators that interact with each other to govern the gene expression levels of mRNA and proteins


Page 31

Example 4: Pattern recognition

Each node of the tree contains local features, such as area, perimeter, shape, color, etc., of the related object, while branches denote inclusion relations


Page 32

Feedforward- vs. recurrent NN

[Figure: a feedforward network and a recurrent network, each with input and output layers]

Feedforward NN:

• connections only "from left to right", no connection cycle

• activation is fed forward from input to output through "hidden layers"

• no memory

Recurrent NN:

• at least one connection cycle

• activation can "reverberate", persist even with no input

• system with memory