
Internet Engineering
Jacek Mazurkiewicz, PhD

Softcomputing

Part 3: Recurrent Artificial Neural Networks

Self-Organising Artificial Neural Networks

Recurrent Artificial Neural Networks

Feedback signals between neurons

Dynamic relations

A single neuron’s change is transmitted to the whole net

A stable state is reached after a sequence of temporary states

A stable state is available only if strict assumptions are imposed on the weights

Recurrent artificial neural networks are equipped with symmetric inter-neuron connections

Associative Memory

computer „memory” – as close as possible to human memory:

associative memory – to store „patterns”

auto-associative: Smoth – Smith

learning procedure – to imprint the set of patterns

retrieving phase – output the stored pattern closest to the actual input signal

hetero-associative: Smith – Smith’s face (Smith’s feature)

Hopfield Network (1)

Hamming distance for a binary input:

$d_H = \sum_{i=1}^{n} \left[ x_i (1 - y_i) + (1 - x_i)\, y_i \right]$

Hamming distance equals zero if:

$y = x$

Hamming distance is the number of unequal bits
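For binary {0,1} vectors the formula above is a one-liner; a minimal NumPy sketch (the function name is illustrative, not from the slides):

```python
import numpy as np

def hamming_distance(x, y):
    """d_H = sum_i [ x_i (1 - y_i) + (1 - x_i) y_i ] for binary {0,1} vectors."""
    x, y = np.asarray(x), np.asarray(y)
    return int(np.sum(x * (1 - y) + (1 - x) * y))

print(hamming_distance([1, 0, 1, 1], [1, 1, 1, 0]))  # -> 2 (two bits differ)
```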

[Figure: Hopfield network – N fully interconnected neurons with inputs x_1 … x_N, feedback signals v_1 … v_N, outputs y_1 … y_N, and weights w_ij]

Retrieving Phase (1)

each neuron performs the following two steps:

– computes the coproduct:

$u_p(k+1) = \sum_{j=1}^{N} w_{pj}\, v_j(k) - \theta_p$

– updates the state:

$v_p(k+1) = \begin{cases} +1 & \text{for } u_p(k+1) > 0 \\ v_p(k) & \text{for } u_p(k+1) = 0 \\ -1 & \text{for } u_p(k+1) < 0 \end{cases}$

where:

w_pj – weight related to feedback signal

v_j(k) – feedback signal

θ_p – bias

initial condition:

$v_p(0) = x_p$

process is repeated until convergence, which occurs when none of the elements changes state during any iteration:

$v_p(k+1) = v_p(k) = y_p$
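These two steps and the stopping test translate directly into code; a minimal NumPy sketch, assuming bipolar ±1 states and a synchronous update (the slides’ per-neuron formulation can equally be applied asynchronously), with all names illustrative:

```python
import numpy as np

def hopfield_retrieve(W, x, theta=None, max_iter=100):
    """Iterate the two retrieval steps until no neuron changes its state."""
    v = np.asarray(x).copy()               # initial condition: v(0) = x
    theta = np.zeros(len(v)) if theta is None else theta
    for _ in range(max_iter):
        u = W @ v - theta                  # coproduct for every neuron
        v_new = np.where(u > 0, 1, np.where(u < 0, -1, v))  # state update
        if np.array_equal(v_new, v):       # convergence: y = v(k+1) = v(k)
            break
        v = v_new
    return v_new
```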

Retrieving Phase (2)

converged state of the Hopfield net means that the net has already reached one of the attractors

attractor – point of a local minimum of the energy function (Lyapunov function):

$E(x) = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij}\, x_i\, x_j + \sum_{i=1}^{N} \theta_i\, x_i$

$E(x) = -\frac{1}{2}\, x^T W x + \theta^T x$
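The matrix form of the energy is convenient for checking that each accepted state change keeps E constant or decreases it, which is why the retrieval loop above must settle in an attractor. A minimal sketch with illustrative names:

```python
import numpy as np

def hopfield_energy(W, x, theta=None):
    """Lyapunov function E(x) = -1/2 x^T W x + theta^T x."""
    theta = np.zeros(len(x)) if theta is None else theta
    return -0.5 * x @ W @ x + theta @ x
```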

Hebbian Learning

training patterns are presented one by one in fixed time intervals

convergence condition:

$w_{pp} = 0, \qquad w_{pj} = w_{jp}$


during each interval the input data is communicated to the neuron’s neighbours N times

$w_{ij} = \begin{cases} \frac{1}{N} \sum_{m=1}^{M} x_i^{(m)} x_j^{(m)} & \text{for } i \ne j \\ 0 & \text{for } i = j \end{cases}$

algorithm: easy, fast, low memory capacity:

$M_{max} = 0.138\, N$
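The whole learning procedure reduces to an outer-product sum; a minimal NumPy sketch that also enforces the convergence condition above (names illustrative):

```python
import numpy as np

def hebbian_weights(patterns):
    """w_ij = (1/N) * sum_m x_i^(m) x_j^(m) for i != j, w_ii = 0."""
    X = np.asarray(patterns, dtype=float)   # shape (M, N): M bipolar patterns
    N = X.shape[1]
    W = X.T @ X / N                         # outer-product (Hebbian) sum
    np.fill_diagonal(W, 0.0)                # convergence condition: w_pp = 0
    return W                                # symmetric by construction: w_pj = w_jp
```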

Pseudoinverse Learning

correct weight values mean:
– the input signal generates itself as output
– the converged state is available at once:

$W X = X$

one of the possible solutions is:

$W = X \left( X^T X \right)^{-1} X^T$

algorithm: sophisticated, high memory capacity:

$M_{max} = N$
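A minimal sketch of this solution, assuming the M stored patterns are linearly independent so that X^T X is invertible (otherwise np.linalg.pinv would be the safer choice); names are illustrative:

```python
import numpy as np

def pseudoinverse_weights(patterns):
    """Solve W X = X via W = X (X^T X)^(-1) X^T (one pattern per column of X)."""
    X = np.asarray(patterns, dtype=float).T   # shape (N, M)
    return X @ np.linalg.inv(X.T @ X) @ X.T   # projector onto the pattern subspace
```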

Delta-Rule Learning

weights are tuned step by step using all learning signals, presented in a sequence:

$W \leftarrow W + \frac{\eta}{N} \left( x^{(i)} - W x^{(i)} \right) \left( x^{(i)} \right)^T$

$\eta \in [0.7,\ 0.9]$ – learning rate

algorithm is quite similar to gradient methods used for Multilayer Perceptron learning

algorithm: sophisticated, high memory capacity:

$M_{max} = N$
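A minimal sketch of the sequential tuning loop, with η = 0.8 picked from the range above and an illustrative epoch count:

```python
import numpy as np

def delta_rule_weights(patterns, eta=0.8, epochs=100):
    """W <- W + (eta/N) * (x - W x) x^T, patterns presented in a sequence."""
    X = np.asarray(patterns, dtype=float)   # shape (M, N): M patterns of length N
    M, N = X.shape
    W = np.zeros((N, N))
    for _ in range(epochs):
        for x in X:                         # all learning signals, in a sequence
            W += (eta / N) * np.outer(x - W @ x, x)
    return W
```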

Retrieving Phase - Problems

Input signals heavily corrupted by noise can lead to a false answer

– net output is far from learned/stored patterns

Energy function value for symmetric states is identical: (+1,+1,-1) == (-1,-1,+1)
– both solutions offer the same „acceptance factor”

Learning algorithms can produce additional local minima
– as linear combinations of the learning patterns

Additional minima are not fixed to any learning pattern
– especially important when the number of learning patterns is large

Example of Answers

10 digits, 7x7 pixels

Hebbian learning:
– 1 correct answer

Pseudoinverse & Delta-rule learning:
– 7 correct answers
– 9 answers with 1 wrong pixel
– 4 answers with 2 wrong pixels

Hamming Network (1)

Hamming Network (2)

Hamming net – maximum likelihood classifier for binary inputs corrupted by noise

Lower Sub Net calculates N minus the Hamming distance to M exemplar patterns

Upper Sub Net selects that node with the maximum output

All nodes use threshold logic nonlinearities
– the outputs of these nonlinearities never saturate

Thresholds and weights in the Maxnet are fixed

All thresholds are set to zero, weights from each node to itself are 1

Weights between nodes are inhibitory

Hamming Network (3)

weights and offsets of the Lower Sub Net:

$w_{ji} = \frac{x_i^{(j)}}{2}, \qquad \theta_j = \frac{N}{2} \qquad \text{for } 0 \le i \le N-1 \text{ and } 0 \le j \le M-1$

weights in the Maxnet are fixed as:

$w_{lk} = \begin{cases} 1 & \text{if } k = l \\ -\varepsilon & \text{if } k \ne l \end{cases} \qquad \varepsilon < \frac{1}{M}, \quad 0 \le l, k \le M-1$

all thresholds in the Maxnet are kept zero

Hamming Network (4)

outputs of the Lower Sub Net are obtained as:

$\mu_j = \sum_{i=0}^{N-1} w_{ji}\, x_i + \theta_j, \qquad y_j(0) = f_t(\mu_j) \qquad \text{for } 0 \le j \le M-1$

Maxnet does the maximisation by evaluating:

$y_j(t+1) = f_t\!\left( y_j(t) - \varepsilon \sum_{k \ne j} y_k(t) \right) \qquad \text{for } 0 \le j, k \le M-1$

this process is repeated until convergence
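Both sub-nets fit in a few lines; a minimal sketch assuming bipolar ±1 exemplars and f_t = max(0, ·) as the threshold-logic nonlinearity (the names and the default ε are illustrative):

```python
import numpy as np

def hamming_net(exemplars, x, eps=None, max_iter=100):
    """Lower Sub Net scores N - d_H; the Maxnet suppresses all but the best."""
    Xm = np.asarray(exemplars, dtype=float)      # shape (M, N), bipolar patterns
    M, N = Xm.shape
    eps = 1.0 / (2 * M) if eps is None else eps  # Maxnet inhibition, eps < 1/M
    y = Xm @ x / 2 + N / 2                       # mu_j = sum_i w_ji x_i + theta_j
    for _ in range(max_iter):
        y_new = np.maximum(0.0, y - eps * (y.sum() - y))  # threshold logic f_t
        if np.array_equal(y_new, y):             # converged: one survivor left
            break
        y = y_new
    return int(np.argmax(y))                     # index of the winning exemplar
```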

Introduction

learning without a teacher – data overload

unsupervised learning:
– similarity
– PCA algorithms
– classification
– archetype finding
– feature maps

Pavlov Experiment

FOOD (UCS) → SALIVATION (UCR)

BELL (CS) → SALIVATION (CR)

FOOD + BELL (UCS + CS) → SALIVATION (CR)

CS – conditioned stimulus, CR – conditioned reflex
UCS – unconditioned stimulus, UCR – unconditioned reflex

Fields of Use

similarity
– single-output net
– how close the input signal is to the „mean-learned-pattern”

PCA
– multi-output net, each output = single principal component
– principal components responsible for similarity
– actual output vector – correlation level

classification
– binary multi-output with 1-of-n code – class of the closest data

stored patterns finding
– associative memory

coding
– data compression

Hebbian Rule (1949)

if neuron A is activated in a cyclic way by neuron B
– neuron A becomes more and more sensitive to activation from neuron B

f(a) is any function – linear, for example

[Figure: single neuron – inputs x_1 … x_m, weights w_i1 … w_im, activation u_i, output y_i = f(a_i)]

$w_{ij}(k+1) = w_{ij}(k) + \Delta w_{ij}(k)$

$\Delta w_{ij}(k) = \eta\, x_j(k)\, y_i(k)$

General Hebbian Rule

Problem:
– unlimited weight growth

Solution:
– set limitations (Linsker)
– Oja’s rule

general form of the rule, neuron output, and the basic Hebbian update:

$\Delta w_{ij} = F(x_j, y_i), \qquad y_i(k) = \sum_{j=0}^{m} w_{ij}\, x_j(k), \qquad \Delta w_{ij} = \eta\, x_j(k)\, y_i(k), \quad \eta > 0$

Limitations:

$w_{ij} \in [w^-, w^+]$

Oja’s rule – Hebbian rule + normalisation – additional requirements:

$\Delta w_{ij}(k) = \eta\, y_i(k)\, [\, x_j(k) - y_i(k)\, w_{ij}(k) \,]$
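A minimal sketch of one Oja update for a single linear neuron (names illustrative); repeated over zero-mean samples, the weight vector converges to the first principal direction with ‖w‖ → 1, which is the normalisation mentioned above:

```python
import numpy as np

def oja_step(w, x, eta=0.01):
    """One Oja update: w <- w + eta * y * (x - y * w), with y = w . x."""
    y = w @ x                         # linear neuron output
    return w + eta * y * (x - y * w)  # Hebbian term minus weight-decay term
```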

Principal Component Analysis - PCA

Statistical lossy compression in telecommunication
– Karhunen-Loève approach

Linear conversion into output space with reduced dimensions
– preserves the most important features of the stochastic process x:

$y = Wx, \qquad x \in \mathbb{R}^N, \quad W \in \mathbb{R}^{K \times N}, \quad y \in \mathbb{R}^K, \quad K < N$

First component estimation – weights vector tuned using Oja’s rule:

$y_1(k) = \sum_{j=0}^{N} W_{1j}\, x_j(k) = W_1^T x(k)$

Other principal components – by Sanger’s rule:

$y_i(k) = \sum_{j=0}^{N} W_{ij}\, x_j(k)$

Neural Networks for PCA

Oja’s rule – 1989:

$\Delta w_{ij} = \eta\, y_i \left( x_j - \sum_{l=1}^{k} y_l\, w_{lj} \right), \qquad i = 1, \ldots, k; \quad j = 1, \ldots, n$

Sanger’s rule – 1989:

$\Delta w_{ij} = \eta\, y_i \left( x_j - \sum_{l=1}^{i} y_l\, w_{lj} \right), \qquad i = 1, \ldots, k; \quad j = 1, \ldots, n$
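The only difference between the two rules is the upper limit of the inner sum (k vs. i), which in matrix form is a lower-triangular mask; a minimal sketch of one Sanger (GHA) step, with illustrative names:

```python
import numpy as np

def sanger_step(W, x, eta=0.01):
    """One Sanger update: dw_ij = eta * y_i * (x_j - sum_{l<=i} y_l w_lj)."""
    y = W @ x                            # k outputs of the linear network
    back = np.tril(np.outer(y, y)) @ W   # row i holds sum_{l<=i} y_i y_l w_lj
    return W + eta * (np.outer(y, x) - back)
```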

Rubner & Tavan Network – 1989 (1)

Single-layer

One-way connections

Weights:
– input layer → calculation layer according to the Hebbian rule:

$\Delta w_{ij} = \eta\, x_j\, y_i$

– internal connections within the calculation layer according to the anti-Hebbian rule:

$\Delta v_{ij} = -\eta\, y_i\, y_j$
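A minimal sketch of one adaptation step, assuming a strictly lower-triangular lateral matrix V as suggested by the figure on the next slide (names illustrative):

```python
import numpy as np

def rubner_tavan_step(W, V, x, eta=0.01):
    """Hebbian update for feedforward W, anti-Hebbian update for lateral V."""
    k = W.shape[0]
    y = np.zeros(k)
    for i in range(k):                      # y_i = W_i . x + sum_{j<i} v_ij y_j
        y[i] = W[i] @ x + V[i, :i] @ y[:i]
    W += eta * np.outer(y, x)               # Hebbian:      dw_ij =  eta * x_j * y_i
    V -= eta * np.tril(np.outer(y, y), -1)  # anti-Hebbian: dv_ij = -eta * y_i * y_j
    return W, V, y
```

The anti-Hebbian lateral weights decorrelate the outputs, so successive units are pushed toward successive principal components.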

Rubner & Tavan Network – 1989 (2)

[Figure: Rubner & Tavan network – inputs x1 … x5, outputs y1 … y4, feedforward weights w11 … w45, anti-Hebbian lateral connections v21, v31, v32, v41, v42, v43]

Picture Compression with PCA

A large amount of input data is substituted by a smaller amount, combined in the vector y and the weights W_i

Level of compression – number of PCA components
– the main factor of the restored picture quality

More principal components:
– better quality
– lower compression level

Picture restored based on:
– 2 principal components
– compression level: 28

Self-Organising Artificial Neural Networks

Inter-neuron interactions

Goal: input signals mapped into output signals

Similar input data are grouped

Groups are separated

Kohonen neural network – leader!

T. Kohonen from Finland!

Competitive Learning

WTA – Winner Takes All
WTM – Winner Takes Most


WTA (1)

Single layer of working neurons

The same input signals xj are loaded to all competitive neurons

Starting weight values are random

Each neuron calculates the product:

The winner is … the neuron with a maximum output!

The winner neuron – its final output equals 1

Other neurons set output values to 0

$u_i = \sum_j w_{ij}\, x_j$

WTA (2)

The first presentation of the learning vectors is the basis for pointing out the winner neuron

Weights are modified by the Grossberg rule

If similar learning vectors activate the same winner neuron, the winner’s weights become the mean values of the input signals

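A minimal sketch of one WTA step (the names and η are illustrative); only the winner’s weights move toward the input, which is why they drift to the mean of the vectors that neuron keeps winning:

```python
import numpy as np

def wta_step(W, x, eta=0.1):
    """One WTA step: the neuron with maximal u_i wins and moves toward x."""
    u = W @ x                            # u_i = sum_j w_ij x_j
    winner = int(np.argmax(u))           # winner's output is 1, the rest are 0
    W[winner] += eta * (x - W[winner])   # Grossberg rule applied to the winner
    return winner
```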

WTM (1)

Winner selection like in WTA

Winner’s output is maximum

Winner activates the neighbourhood neurons

Distance from the winner drives the level of activation

Level of activation is a part of weight tuning algorithm

All weights are modified during learning algorithm

Neurons Neighbourhood (1)

Neurons as nodes of regular network

Central neuron – in the middle of the region

Neighbourhood neurons in the closest columns and rows

[Figure: simple neighbourhood vs. sophisticated neighbourhood]

Neurons Neighbourhood (2)

[Figure: 2-D neighbourhood and 1-D neighbourhood grids]

Neighbourhood function h(r)

– distance function between each neuron and the winner
– defines the necessary parameters for weights tuning

$h(r) = \frac{1}{r} \qquad \text{or} \qquad h(r) = e^{-r^2}$

r – distance between the winner and the neurons in the neighbourhood

Grossberg Rule

neighbourhood around the winning neuron,

size of neighbourhood decreases with iteration,

modulation of learning rate by frequency sensitivity.

Neighbourhood function = Mexican Hat:

$h(i_w, j_w, i, j) = \begin{cases} 1 & \text{for } r = 0 \\ a\, \sin(a\, r) & \text{for } r \in (0, 2a) \\ 0 & \text{for other values of } r \end{cases}$

a – neighbourhood parameter, r – distance from the winner neuron to each single neuron

The Grossberg rule:

$w_{lij}(k+1) = w_{lij}(k) + \eta(k)\, h(i_w, j_w, i, j)\, [\, x_l - w_{lij}(k) \,]$

k – iteration index, η(k) – learning rate function, x_l – component of the input learning vector,
w_lij – weight associated with the proper connection, h – neighbourhood function,
(i_w, j_w) – indexes related to the winner neuron, (i, j) – indexes related to a single neuron