Recurrent Network Inputs/Outputs. Motivation, Associative Memory Concept, Time Series Processing – Forecasting of Time Series – Classification of Time Series.



Page 1:

Recurrent Network

Inputs Outputs

Page 2:

Motivation
• Associative Memory Concept
• Time Series Processing
  – Forecasting of time series
  – Classification of time series
  – Modeling of time series
  – Mapping one time series onto another

• Signal Processing (a field very close to TS processing)

• Optimization problems (like Traveling Salesman)

Page 3:

Address-addressable vs. Content-addressable Memory

Associative memory (AM) provides an approach for storing and retrieving data based on content rather than on a storage address. Storage in a NN is distributed throughout the system in the net's weights; hence a pattern does not have a single storage location.

Page 4:

Auto-associative vs. Hetero-associative Memory

Auto-associative memory: the recalled pattern f equals the input pattern s (s = f). Hetero-associative memory: the recalled pattern differs from the input (s ≠ f).

Page 5:

So What’s the difference?

• The net not only learns the specific pattern pairs that were used for training, but is also able to recall the desired response pattern when given an input stimulus that is similar, but not identical, to the training input.

Associative Recall
• evoke associated patterns
• recall a pattern by part of it
• evoke/recall with incomplete/noisy patterns

Page 6:

Training an AM NN

• The original patterns must be converted to an appropriate representation for computation.

“on” → +1, “off” → 0 (binary representation) OR “on” → +1, “off” → -1 (bipolar representation).

• Two common training methods for single-layer nets are:

– Hebbian learning rule and its variations
– Gradient descent

Page 7:

Hebbian Learning Rule

• “When one cell repeatedly assists in firing another, the axon of the first cell develops synaptic knobs (or enlarges them if they already exist) in contact with the soma of the second cell.” (Hebb, 1949)

• In an associative neural net, if we compare two pattern components (e.g. pixels) within many patterns and find that they are frequently in:

a) the same state, then the arc weight between their NN nodes should be +ve
b) different states, then the arc weight between their NN nodes should be -ve

The weights must store the average correlations between all pattern components across all patterns. A net presented with a partial pattern can then use the correlations to recreate the entire pattern.

Weights = Average Correlations

Page 8:

Quantitative Definition of the Hebbian Learning Rule

• Auto-association: $\Delta w_{jk} = i_{pj}\, i_{pk}$

  * When the two components are the same (different), increase (decrease) the weight.

• Hetero-association: $\Delta w_{jk} = i_{pj}\, o_{pk}$, where i = input component and o = output component.

Ideally, the weights will record the average correlations across all patterns:

  Auto: $w_{jk} = \sum_{p=1}^{P} i_{pj}\, i_{pk}$        Hetero: $w_{jk} = \sum_{p=1}^{P} i_{pj}\, o_{pk}$

Hebbian Principle: if all the input patterns are known prior to retrieval time, then initialize the weights as:

  Auto: $w_{jk} = \frac{1}{P}\sum_{p=1}^{P} i_{pj}\, i_{pk}$        Hetero: $w_{jk} = \frac{1}{P}\sum_{p=1}^{P} i_{pj}\, o_{pk}$
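A minimal NumPy sketch of these batch Hebbian rules (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def hebbian_auto(inputs):
    """w_jk = (1/P) * sum_p i_pj * i_pk  (auto-association)."""
    P = inputs.shape[0]
    return inputs.T @ inputs / P

def hebbian_hetero(inputs, outputs):
    """w_jk = (1/P) * sum_p i_pj * o_pk  (hetero-association)."""
    P = inputs.shape[0]
    return inputs.T @ outputs / P

# Example: three 4-dimensional bipolar input patterns, one per row
X = np.array([[ 1,  1,  1, -1],
              [ 1,  1, -1,  1],
              [-1,  1,  1, -1]])
W = hebbian_auto(X)   # 4 x 4 matrix of average correlations
```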

Page 9:

Architectures of AM NN

Associative memory NNs fall into two families:

• Static / feed-forward systems
  – Auto-associative
  – Hetero-associative

• Dynamic / recurrent / iterative systems (i.e. with feedback)
  – Auto-associative
  – Hetero-associative

Page 10:

Mapping of Inputs to Outputs

Information Recording

• M is expressed in terms of the stored prototype vectors.

• M is a matrix-type operator.

Information Retrieval

• Mapping: x → v, linear or nonlinear: v = M[x]

• Input a key vector x and retrieve a desired vector v previously stored in the network.

Page 11:

Static vs. Dynamic

Static memory vs. recurrent auto-associative memory:

The operator M2 operates at the present instant k on the present input $x_k$ and output $v_k$ to produce the output at the next instant k+1. Δ is a unit delay needed for the cyclic operation. The pattern is associated with itself (auto-association).

Page 12:

Hetero-associative Memory Net

• Memory: association of pairs (x, v). This AM operates with a cycle of 2Δ.

• It associates pairs of vectors (x(i),v(i)).

Page 13:

Hopfield Model (a recurrent auto-associative network)

The input $x_0$ is used to initialize $v_0$, i.e. $x_0 = v_0$, and the input is then removed for the following evolution.

Operator M2 consists of multiplication by a weight matrix followed by the ensemble of nonlinear mapping operations $v_i = f(\mathrm{net}_i)$ performed by the layer of neurons.
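A minimal sketch of one such evolution step (names are illustrative; f defaults to the sign nonlinearity):

```python
import numpy as np

def m2_step(W, v, f=np.sign):
    """One recurrent step: weight-matrix multiplication, then nonlinearity."""
    net = W @ v        # net_i = sum_j w_ij * v_j
    return f(net)      # the layer of neurons applies f element-wise
```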

Page 14:

Hopfield Model

Page 15:

Hopfield's Auto-associative Memory (1982, 1984)

• Distributed representation: info is stored as a pattern of activations/weights; multiple items are imprinted on the same network.

• Content-addressable memory: store patterns in a network by adjusting weights; to retrieve a pattern, specify a portion of it.

• Distributed, asynchronous control: individual processing elements behave independently.

• Fault tolerance: a few processors can fail and the network will still work.

• Active or inactive: processing units are in one of two states.

• Units are connected with weighted, symmetric connections.

Page 16:

Multiple-loop feedback system with no self-feedback

[Figure: example of a Hopfield NN for 3-dimensional input data; three neurons X1, X2, X3 are fully connected by weighted, symmetric connections (weights of +1 and -1) with no self-feedback, and each neuron holds one attribute of the input (x1, x2, x3).]

Execution: the input pattern attributes are the initial states of the neurons; repeatedly update the neuron states asynchronously until the states do not change.

Page 17:

Hopfield's Auto-associative Memory

• Input vector values are in {-1, 1} (or {0, 1}).

• The number of neurons is equal to the input dimension.

• Every neuron has a link from every other neuron (recurrent architecture) except itself (no self-feedback).

• The activation function used to update a neuron state is the sign function, but if the input to the activation function is 0 then the new output (state) of the neuron is kept equal to the old one.

• The weights are symmetric: $w_{ji} = w_{ij}$.

Page 18:

NN Training

1. Storage: let $f_1, f_2, \ldots, f_M$ denote a known set of N-dimensional fundamental memories. The weights of the network are:

$$w_{ji} = \begin{cases} \dfrac{1}{N}\sum_{\mu=1}^{M} f_{\mu,j}\, f_{\mu,i}, & j \neq i \\ 0, & j = i \end{cases}$$

• $f_{\mu,i}$: i-th component of the fundamental memory $f_\mu$; the elements of $f_\mu$ are in {-1, +1}.
• $x_i(n)$: state of neuron i at time n.
• $w_{ji}$: the weight from neuron i to neuron j. Once the weights are computed, they are kept fixed.
• N: input dimension.
• M: number of patterns (called fundamental memories) used to compute the weights.
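A minimal NumPy sketch of this storage step (function and variable names are illustrative):

```python
import numpy as np

def store(memories):
    """memories: M x N array of bipolar (+1/-1) fundamental memories f_1..f_M."""
    M, N = memories.shape
    W = memories.T @ memories / N   # w_ji = (1/N) * sum_mu f_mu_j * f_mu_i
    np.fill_diagonal(W, 0.0)        # no self-feedback: w_ii = 0
    return W                        # symmetric: W == W.T
```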

Page 19:

NN Training

• Each stored pattern (fundamental memory) establishes a correlation between pairs of neurons: neurons tend to be of the same sign or of opposite sign according to their values in the pattern.

• If $w_{ji}$ is large, this expresses an expectation that neurons i and j are positively correlated. If it is small (negative), this indicates a negative correlation.

• The sum $\sum_{i,j} w_{ij}\, x_i x_j$ will thus be large for a state x equal to a fundamental memory (since $w_{ij}$ will be positive if the product $x_i x_j > 0$ and negative if $x_i x_j < 0$).

• The negative of this sum will thus be small.

Page 20:

NN Execution

2. Initialization: let $x_{\text{probe}}$ denote an input vector (probe) presented to the network. The algorithm is initialized by setting:

$$x_j(0) = x_{\text{probe},j}, \qquad j = 1, \ldots, N$$

where $x_{\text{probe},j}$ is the j-th element of the probe vector $x_{\text{probe}}$, and $x_j(0)$ is the state of neuron j at time t = 0.

Page 21:

NN Execution

3. Iteration until convergence: update the elements of the network state vector x(t) asynchronously (i.e. randomly and one at a time) according to the rule:

$$x_j(t+1) = \operatorname{sgn}\!\left(\sum_{i=1}^{N} w_{ji}\, x_i(t)\right)$$

Repeat the iteration until the state vector x remains unchanged.
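A minimal sketch of this asynchronous retrieval loop (assuming W comes from the storage step and x_probe is a bipolar probe; the tie rule of keeping the old state when the net input is 0 follows the earlier slide):

```python
import numpy as np

def retrieve(W, x_probe, seed=0):
    rng = np.random.default_rng(seed)
    x = x_probe.astype(float)                 # initialization: x(0) = x_probe
    while True:
        changed = False
        for j in rng.permutation(len(x)):     # update neurons randomly, one at a time
            net = W[j] @ x                    # sum_i w_ji * x_i(t)
            new = x[j] if net == 0 else np.sign(net)   # keep old state on a tie
            if new != x[j]:
                x[j] = new
                changed = True
        if not changed:                       # fixed point: x(t+1) == x(t)
            return x
```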

Page 22:

NN Execution

4. Outputting: let $x_{\text{fixed}}$ denote the fixed point (or stable state, i.e. such that x(t+1) = x(t)) computed at the end of step 3. The resulting output y of the network is:

$$y = x_{\text{fixed}}$$

Page 23:

Pictorial Execution of a Hopfield Net

• Number of neurons = dimension of the pattern
• Fully connected
• Weights = average correlations across all patterns of the corresponding units

[Figure: a fully connected 4-node Hopfield net shown in three stages: 1. Auto-associative patterns to remember; 2. Distributed storage of all patterns; 3. Retrieval. Node value legend: dark (blue) with x => +1; dark (red) without x => -1; light (green) => 0.]

Page 24:

Hopfield Network Example

1. Patterns to remember: p1, p2, p3 (each shown on the 4-node net).

2. Weights, one per node pair, averaged across the patterns:

         p1   p2   p3   Avg
   W12    1    1   -1   1/3
   W13    1   -1   -1  -1/3
   W14   -1    1    1   1/3
   W23    1   -1    1   1/3
   W24   -1    1   -1  -1/3
   W34   -1   -1   -1   -1

3. Build Network: [figure: the 4-node net with the six edge weights above, positive (+) and negative (-) links marked.]

4. Enter Test Pattern: [figure: the net initialized with node values +1, 0, 0, -1.]

Page 25:

Hopfield Network Example (continued)

5. Synchronous iteration (update all nodes at once): [figure: the net settles into a stable state equal to p1.]

From the discrete output rule sign(sum), with values from the input layer included:

   Node | from 1 | from 2 | from 3 | from 4 | Output
     1  |    1   |    0   |    0   |  -1/3  |    1
     2  |   1/3  |    0   |    0   |   1/3  |    1
     3  |  -1/3  |    0   |    0   |    1   |    1
     4  |   1/3  |    0   |    0   |   -1   |   -1
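A small NumPy sketch of this synchronous step; reading the "values from the input layer" column as the test pattern applied as an external input I is my interpretation of the table above:

```python
import numpy as np

# net_k = sum_j w_kj * x_j + I_k, then apply the sign rule
W = np.array([[   0, 1/3, -1/3,  1/3],
              [ 1/3,   0,  1/3, -1/3],
              [-1/3, 1/3,    0,   -1],
              [ 1/3, -1/3,  -1,    0]])
I = np.array([1, 0, 0, -1])          # test pattern as external input
x = I.copy()                          # initial node states
print(np.sign(W @ x + I))             # -> [ 1.  1.  1. -1.]  (stable state p1)
```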

Page 26:

Matrix Computation

Goal: set the weights such that an input vector Vi yields itself when multiplied by the weights W.

X = [V1; V2; …; Vp], one pattern per row, where p = # input vectors (i.e., patterns).

So Y = X, and the Hebbian weight calculation is W = XᵀY = XᵀX:

   X  = [  1   1   1  -1 ]        Xᵀ = [  1   1  -1 ]
        [  1   1  -1   1 ]             [  1   1   1 ]
        [ -1   1   1  -1 ]             [  1  -1   1 ]
                                       [ -1   1  -1 ]

   XᵀX = [  3   1  -1   1 ]
         [  1   3   1  -1 ]
         [ -1   1   3  -3 ]
         [  1  -1  -3   3 ]

The common index is the pattern #, so this is a correlation sum, e.g.
$w_{2,4} = w_{4,2} = X^T_{2,1} X_{1,4} + X^T_{2,2} X_{2,4} + X^T_{2,3} X_{3,4}$

Page 27:

Matrix Computation (continued)

• The upper and lower triangles of the product matrix contain the 6 weights w_ij = w_ji.
• Scale the weights by dividing by p (i.e., averaging). This produces the same weights as in the non-matrix description.
• Testing with input (1 0 0 -1):

   (1 0 0 -1)  [  3   1  -1   1 ]   =  (2 2 2 -2)
               [  1   3   1  -1 ]
               [ -1   1   3  -3 ]
               [  1  -1  -3   3 ]

   Scaling by p = 3 and using 0 as a threshold gives:
   (2/3 2/3 2/3 -2/3)  =>  (1 1 1 -1)
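The same computation in a few lines of NumPy (a sketch; the variable names are mine):

```python
import numpy as np

X = np.array([[ 1,  1,  1, -1],      # V1
              [ 1,  1, -1,  1],      # V2
              [-1,  1,  1, -1]])     # V3
W = X.T @ X                           # correlation sums; diagonal = p = 3
probe = np.array([1, 0, 0, -1])
print(probe @ W)                      # -> [ 2  2  2 -2]
print(np.sign(probe @ W / 3))         # -> [ 1.  1.  1. -1.]
```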

Page 28:

Associative Retrieval = Search

[Figure: stored patterns p1, p2, p3 as minima of the search landscape.]

Back-propagation:
• Search in the space of weight vectors to minimize output error.

Associative memory retrieval:
• Search in the space of node values to minimize conflicts between (a) node-value pairs and average correlations (weights), and (b) node values and their initial values.
• Input patterns are local (sometimes global) minima, but many spurious patterns are also minima.
• High dependence upon the initial pattern and the update sequence (if asynchronous).

Page 29:

Energy Function

The energy of the associative memory should be low when pairs of node values mirror the average correlations (i.e. weights) on the arcs that connect the node pairs, and when current node values equal their initial values (from the test pattern).

$$E = -a \sum_{k}\sum_{j} w_{kj}\, x_j x_k \;-\; b \sum_{k} I_k x_k$$

When pairs match correlations, $w_{kj}\, x_j x_k > 0$. When current values match input values, $I_k x_k > 0$.

Gradient Descent

A little math shows that asynchronous updates using the discrete rule:

$$x_{pk}(t+1) = \operatorname{sgn}\!\left(\sum_{j=1}^{n} w_{kj}\, x_{pj}(t) + I_{pk}\right)$$

yield a gradient descent search along the energy landscape for the E defined above.
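A minimal sketch of this energy in NumPy, taking a = b = 1 (the slide leaves the coefficients unspecified, so that choice is an assumption):

```python
import numpy as np

def energy(W, x, I, a=1.0, b=1.0):
    """E = -a * sum_kj w_kj x_j x_k - b * sum_k I_k x_k."""
    return -a * float(x @ W @ x) - b * float(I @ x)
```

Per the slide, each asynchronous update under the discrete rule above can only lower this E or leave it unchanged.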

Page 30:

Storage Capacity of Hopfield Networks

Capacity = the relationship between the # of patterns that can be stored and retrieved without error and the size of the network.

Capacity = # patterns / # nodes or # patterns / # weights

• If we use the following definition of 100% correct retrieval: when any of the stored patterns is entered completely (no noise), then that same pattern is returned by the network, i.e. the pattern is a stable attractor.

• A detailed proof shows that a Hopfield network of N nodes can achieve 100% correct retrieval on P patterns if: P < N/(4*ln(N))

   N         Max P
   10            1
   100           5
   1,000        36
   10,000      271
   10^11     ~10^9

In general, as more patterns are added to a network, the average correlations will be less likely to match the correlations in any particular pattern. Hence, the likelihood of retrieval error will increase. => The key to perfect recall is selective ignorance!!
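A one-line check of the table above against the P < N/(4·ln N) bound (a sketch):

```python
import math

for N in (10, 100, 1_000, 10_000, 10**11):
    print(N, math.floor(N / (4 * math.log(N))))
# prints Max P of 1, 5, 36, 271, and roughly 1e9 for N = 10^11
```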

Page 31:

Things to Remember

• Auto-associative vs. hetero-associative: a wide variety of net topologies; all use Hebbian learning => weights ~ average correlations.

• One-shot vs. iterative retrieval: iterative retrieval gives much better error correction.

• Asynchronous vs. synchronous state updates: synchronous updates can easily lead to oscillation; asynchronous updates can quickly find a local optimum (attractor); the update order can determine which attractor is reached.

• Pattern retrieval = search in node-state space: spurious patterns are hard to avoid, since many are also attractors; stochasticity helps jiggle out of local minima; as memory load increases, recall error increases.

• Associative vs. feed-forward nets: associative = many-to-1 mapping, feed-forward = many-to-many mapping; backprop is resource-intensive, while a Hopfield iterative update is O(n); both perform gradient descent, backprop on an error landscape (arc-weight space), Hopfield on an energy landscape (node-state space).