
Internet Engineering
Jacek Mazurkiewicz, PhD

Softcomputing

Part 3: Recurrent Artificial Neural Networks

Self-Organising Artificial Neural Networks

Recurrent Artificial Neural Networks

Feedback signals between neurons

Dynamic relations

A single neuron’s change is transmitted to the whole net

A stable state is reached after a sequence of temporary states

A stable state is available only if strict assumptions are imposed on the weights

Recurrent artificial neural networks are equipped with symmetric inter-neuron connections

Associative Memory

computer „memory” – as close as possible to human memory:

associative memory – to store „patterns”

auto-associative: Smoth – Smith

learning procedure – to imprint the set of patterns

retrieving phase – output the stored pattern closest to the actual input signal

hetero-associative: Smith – Smith’s face (Smith’s feature)

Hopfield Network (1)

Hamming distance for a binary input:

$d_H = \sum_{i=1}^{n} \left[ x_i (1 - y_i) + (1 - x_i)\, y_i \right]$

Hamming distance equals zero if:

$y = x$

Hamming distance is the number of unequal bits
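For binary {0,1} vectors the formula above is a one-liner; a minimal NumPy sketch (the function name is illustrative, not from the slides):

```python
import numpy as np

def hamming_distance(x, y):
    """d_H = sum_i [ x_i (1 - y_i) + (1 - x_i) y_i ] for binary {0,1} vectors."""
    x, y = np.asarray(x), np.asarray(y)
    return int(np.sum(x * (1 - y) + (1 - x) * y))

print(hamming_distance([1, 0, 1, 1], [1, 1, 1, 0]))  # -> 2 (two bits differ)
```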

[Figure: Hopfield network – N fully interconnected neurons with inputs x_1 … x_N, feedback signals v_1 … v_N, outputs y_1 … y_N, and weights w_ij]

Retrieving Phase (1)

each neuron performs the following two steps:

– computes the coproduct:

$u_p(k+1) = \sum_{j=1}^{N} w_{pj}\, v_j(k) - \theta_p$

– updates the state:

$v_p(k+1) = \begin{cases} +1 & \text{for } u_p(k+1) > 0 \\ v_p(k) & \text{for } u_p(k+1) = 0 \\ -1 & \text{for } u_p(k+1) < 0 \end{cases}$

where:

w_pj – weight related to feedback signal

v_j(k) – feedback signal

θ_p – bias

initial condition:

$v_p(0) = x_p$

process is repeated until convergence, which occurs when none of the elements changes state during any iteration:

$v_p(k+1) = v_p(k) = y_p$
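These two steps and the stopping test translate directly into code; a minimal NumPy sketch, assuming bipolar ±1 states and a synchronous update (the slides’ per-neuron formulation can equally be applied asynchronously), with all names illustrative:

```python
import numpy as np

def hopfield_retrieve(W, x, theta=None, max_iter=100):
    """Iterate the two retrieval steps until no neuron changes its state."""
    v = np.asarray(x).copy()               # initial condition: v(0) = x
    theta = np.zeros(len(v)) if theta is None else theta
    for _ in range(max_iter):
        u = W @ v - theta                  # coproduct for every neuron
        v_new = np.where(u > 0, 1, np.where(u < 0, -1, v))  # state update
        if np.array_equal(v_new, v):       # convergence: y = v(k+1) = v(k)
            break
        v = v_new
    return v_new
```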

Retrieving Phase (2)

converged state of the Hopfield net means that the net has already reached one of the attractors

attractor – point of a local minimum of the energy function (Lyapunov function):

$E(x) = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij}\, x_i\, x_j + \sum_{i=1}^{N} \theta_i\, x_i$

$E(x) = -\frac{1}{2}\, x^T W x + \theta^T x$
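The matrix form of the energy is convenient for checking that each accepted state change keeps E constant or decreases it, which is why the retrieval loop above must settle in an attractor. A minimal sketch with illustrative names:

```python
import numpy as np

def hopfield_energy(W, x, theta=None):
    """Lyapunov function E(x) = -1/2 x^T W x + theta^T x."""
    theta = np.zeros(len(x)) if theta is None else theta
    return -0.5 * x @ W @ x + theta @ x
```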

Hebbian Learning

training patterns are presented one by one in fixed time intervals

convergence condition:

$w_{pp} = 0, \qquad w_{pj} = w_{jp}$


during each interval the input data is communicated to the neuron’s neighbours N times

$w_{ij} = \begin{cases} \frac{1}{N} \sum_{m=1}^{M} x_i^{(m)} x_j^{(m)} & \text{for } i \ne j \\ 0 & \text{for } i = j \end{cases}$

algorithm: easy, fast, low memory capacity:

$M_{max} = 0.138\, N$
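The whole learning procedure reduces to an outer-product sum; a minimal NumPy sketch that also enforces the convergence condition above (names illustrative):

```python
import numpy as np

def hebbian_weights(patterns):
    """w_ij = (1/N) * sum_m x_i^(m) x_j^(m) for i != j, w_ii = 0."""
    X = np.asarray(patterns, dtype=float)   # shape (M, N): M bipolar patterns
    N = X.shape[1]
    W = X.T @ X / N                         # outer-product (Hebbian) sum
    np.fill_diagonal(W, 0.0)                # convergence condition: w_pp = 0
    return W                                # symmetric by construction: w_pj = w_jp
```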

Pseudoinverse Learning

correct weight values mean:
– the input signal generates itself as output
– the converged state is available at once:

$W X = X$

one of the possible solutions is:

$W = X \left( X^T X \right)^{-1} X^T$

algorithm: sophisticated, high memory capacity:

$M_{max} = N$
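A minimal sketch of this solution, assuming the M stored patterns are linearly independent so that X^T X is invertible (otherwise np.linalg.pinv would be the safer choice); names are illustrative:

```python
import numpy as np

def pseudoinverse_weights(patterns):
    """Solve W X = X via W = X (X^T X)^(-1) X^T (one pattern per column of X)."""
    X = np.asarray(patterns, dtype=float).T   # shape (N, M)
    return X @ np.linalg.inv(X.T @ X) @ X.T   # projector onto the pattern subspace
```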

Delta-Rule Learning

weights are tuned step by step using all learning signals, presented in a sequence:

$W \leftarrow W + \frac{\eta}{N} \left( x^{(i)} - W x^{(i)} \right) \left( x^{(i)} \right)^T$

$\eta \in [0.7,\ 0.9]$ – learning rate

algorithm is quite similar to gradient methods used for Multilayer Perceptron learning

algorithm: sophisticated, high memory capacity:

$M_{max} = N$
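A minimal sketch of the sequential tuning loop, with η = 0.8 picked from the range above and an illustrative epoch count:

```python
import numpy as np

def delta_rule_weights(patterns, eta=0.8, epochs=100):
    """W <- W + (eta/N) * (x - W x) x^T, patterns presented in a sequence."""
    X = np.asarray(patterns, dtype=float)   # shape (M, N): M patterns of length N
    M, N = X.shape
    W = np.zeros((N, N))
    for _ in range(epochs):
        for x in X:                         # all learning signals, in a sequence
            W += (eta / N) * np.outer(x - W @ x, x)
    return W
```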

Retrieving Phase - Problems

Input signals heavily corrupted by noise can lead to a false answer

– net output is far from learned/stored patterns

Energy function value for symmetric states is identical: (+1,+1,-1) == (-1,-1,+1)
– both solutions offer the same „acceptance factor”

Learning algorithms can produce additional local minima
– as linear combinations of the learning patterns

Additional minima are not fixed to any learning pattern
– especially important when the number of learning patterns is large

Example of Answers

10 digits, 7x7 pixels

Hebbian learning:
– 1 correct answer

Pseudoinverse & Delta-rule learning:
– 7 correct answers
– 9 answers with 1 wrong pixel
– 4 answers with 2 wrong pixels

Hamming Network (1)

Hamming Network (2)

Hamming net – maximum likelihood classifier for binary inputs corrupted by noise

Lower Sub Net calculates N minus the Hamming distance to M exemplar patterns

Upper Sub Net selects that node with the maximum output

All nodes use threshold logic nonlinearities
– the outputs of these nonlinearities never saturate

Thresholds and weights in the Maxnet are fixed

All thresholds are set to zero, weights from each node to itself are 1

Weights between nodes are inhibitory

Hamming Network (3)

weights and offsets of the Lower Sub Net:

$w_{ji} = \frac{x_i^{(j)}}{2}, \qquad \theta_j = \frac{N}{2} \qquad \text{for } 0 \le i \le N-1 \text{ and } 0 \le j \le M-1$

weights in the Maxnet are fixed as:

$w_{lk} = \begin{cases} 1 & \text{if } k = l \\ -\varepsilon & \text{if } k \ne l \end{cases} \qquad \varepsilon < \frac{1}{M}, \quad 0 \le l, k \le M-1$

all thresholds in the Maxnet are kept zero

Hamming Network (4)

outputs of the Lower Sub Net are obtained as:

$\mu_j = \sum_{i=0}^{N-1} w_{ji}\, x_i + \theta_j, \qquad y_j(0) = f_t(\mu_j) \qquad \text{for } 0 \le j \le M-1$

Maxnet does the maximisation by evaluating:

$y_j(t+1) = f_t\!\left( y_j(t) - \varepsilon \sum_{k \ne j} y_k(t) \right) \qquad \text{for } 0 \le j, k \le M-1$

this process is repeated until convergence
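Both sub-nets fit in a few lines; a minimal sketch assuming bipolar ±1 exemplars and f_t = max(0, ·) as the threshold-logic nonlinearity (the names and the default ε are illustrative):

```python
import numpy as np

def hamming_net(exemplars, x, eps=None, max_iter=100):
    """Lower Sub Net scores N - d_H; the Maxnet suppresses all but the best."""
    Xm = np.asarray(exemplars, dtype=float)      # shape (M, N), bipolar patterns
    M, N = Xm.shape
    eps = 1.0 / (2 * M) if eps is None else eps  # Maxnet inhibition, eps < 1/M
    y = Xm @ x / 2 + N / 2                       # mu_j = sum_i w_ji x_i + theta_j
    for _ in range(max_iter):
        y_new = np.maximum(0.0, y - eps * (y.sum() - y))  # threshold logic f_t
        if np.array_equal(y_new, y):             # converged: one survivor left
            break
        y = y_new
    return int(np.argmax(y))                     # index of the winning exemplar
```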

Introduction

learning without a teacher – data overload

unsupervised learning:
– similarity
– PCA algorithms
– classification
– archetype finding
– feature maps

Pavlov Experiment

FOOD (UCS) → SALIVATION (UCR)

BELL (CS) → SALIVATION (CR)

FOOD + BELL (UCS + CS) → SALIVATION (CR)

CS – conditioned stimulus, CR – conditioned reflex
UCS – unconditioned stimulus, UCR – unconditioned reflex

Fields of Use

similarity
– single-output net
– how close the input signal is to the „mean-learned-pattern”

PCA
– multi-output net, each output = single principal component
– principal components responsible for similarity
– actual output vector – correlation level

classification
– binary multi-output with 1-of-n code – class of the closest data

stored patterns finding
– associative memory

coding
– data compression

Hebbian Rule (1949)

if neuron A is activated in a cyclic way by neuron B
– neuron A becomes more and more sensitive to activation from neuron B

f(a) is any function – linear, for example

[Figure: single neuron – inputs x_1 … x_m, weights w_i1 … w_im, activation u_i, output y_i = f(a_i)]

$w_{ij}(k+1) = w_{ij}(k) + \Delta w_{ij}(k)$

$\Delta w_{ij}(k) = \eta\, x_j(k)\, y_i(k)$

General Hebbian Rule

Problem:
– unlimited weight growth

Solution:
– set limitations (Linsker)
– Oja’s rule

general form of the rule, neuron output, and the basic Hebbian update:

$\Delta w_{ij} = F(x_j, y_i), \qquad y_i(k) = \sum_{j=0}^{m} w_{ij}\, x_j(k), \qquad \Delta w_{ij} = \eta\, x_j(k)\, y_i(k), \quad \eta > 0$

Limitations:

$w_{ij} \in [w^-, w^+]$

Oja’s rule – Hebbian rule + normalisation – additional requirements:

$\Delta w_{ij}(k) = \eta\, y_i(k)\, [\, x_j(k) - y_i(k)\, w_{ij}(k) \,]$
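A minimal sketch of one Oja update for a single linear neuron (names illustrative); repeated over zero-mean samples, the weight vector converges to the first principal direction with ‖w‖ → 1, which is the normalisation mentioned above:

```python
import numpy as np

def oja_step(w, x, eta=0.01):
    """One Oja update: w <- w + eta * y * (x - y * w), with y = w . x."""
    y = w @ x                         # linear neuron output
    return w + eta * y * (x - y * w)  # Hebbian term minus weight-decay term
```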

Principal Component Analysis - PCA

Statistical lossy compression in telecommunication
– Karhunen-Loève approach

Linear conversion into output space with reduced dimensions
– preserves the most important features of the stochastic process x:

$y = Wx, \qquad x \in \mathbb{R}^N, \quad W \in \mathbb{R}^{K \times N}, \quad y \in \mathbb{R}^K, \quad K < N$

First component estimation – weights vector tuned using Oja’s rule:

$y_1(k) = \sum_{j=0}^{N} W_{1j}\, x_j(k) = W_1^T x(k)$

Other principal components – by Sanger’s rule:

$y_i(k) = \sum_{j=0}^{N} W_{ij}\, x_j(k)$

Neural Networks for PCA

Oja’s rule – 1989:

$\Delta w_{ij} = \eta\, y_i \left( x_j - \sum_{l=1}^{k} y_l\, w_{lj} \right), \qquad i = 1, \ldots, k; \quad j = 1, \ldots, n$

Sanger’s rule – 1989:

$\Delta w_{ij} = \eta\, y_i \left( x_j - \sum_{l=1}^{i} y_l\, w_{lj} \right), \qquad i = 1, \ldots, k; \quad j = 1, \ldots, n$
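The only difference between the two rules is the upper limit of the inner sum (k vs. i), which in matrix form is a lower-triangular mask; a minimal sketch of one Sanger (GHA) step, with illustrative names:

```python
import numpy as np

def sanger_step(W, x, eta=0.01):
    """One Sanger update: dw_ij = eta * y_i * (x_j - sum_{l<=i} y_l w_lj)."""
    y = W @ x                            # k outputs of the linear network
    back = np.tril(np.outer(y, y)) @ W   # row i holds sum_{l<=i} y_i y_l w_lj
    return W + eta * (np.outer(y, x) - back)
```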

Rubner & Tavan Network – 1989 (1)

Single-layer

One-way connections

Weights:
– input layer → calculation layer according to the Hebbian rule:

$\Delta w_{ij} = \eta\, x_j\, y_i$

– internal connections within the calculation layer according to the anti-Hebbian rule:

$\Delta v_{ij} = -\eta\, y_i\, y_j$
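A minimal sketch of one adaptation step, assuming a strictly lower-triangular lateral matrix V as suggested by the figure on the next slide (names illustrative):

```python
import numpy as np

def rubner_tavan_step(W, V, x, eta=0.01):
    """Hebbian update for feedforward W, anti-Hebbian update for lateral V."""
    k = W.shape[0]
    y = np.zeros(k)
    for i in range(k):                      # y_i = W_i . x + sum_{j<i} v_ij y_j
        y[i] = W[i] @ x + V[i, :i] @ y[:i]
    W += eta * np.outer(y, x)               # Hebbian:      dw_ij =  eta * x_j * y_i
    V -= eta * np.tril(np.outer(y, y), -1)  # anti-Hebbian: dv_ij = -eta * y_i * y_j
    return W, V, y
```

The anti-Hebbian lateral weights decorrelate the outputs, so successive units are pushed toward successive principal components.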

Rubner & Tavan Network – 1989 (2)

[Figure: Rubner & Tavan network – inputs x1 … x5, outputs y1 … y4, feedforward weights w11 … w45, anti-Hebbian lateral connections v21, v31, v32, v41, v42, v43]

Picture Compression with PCA

A large amount of input data is substituted by a smaller amount, combined in the vector y and the weights W_i

Level of compression – number of PCA components
– the main factor of the restored picture quality

More principal components:
– better quality
– lower compression level

Picture restored based on:
– 2 principal components
– compression level: 28

Self-Organising Artificial Neural Networks

Inter-neuron interactions

Goal: input signals mapped into output signals

Similar input data are grouped

Groups are separated

Kohonen neural network – leader!

T. Kohonen from Finland!

Competitive Learning

WTA – Winner Takes All
WTM – Winner Takes Most


WTA (1)

Single layer of working neurons

The same input signals xj are loaded to all competitive neurons

Starting weight values are random

Each neuron calculates the product:

The winner is … the neuron with a maximum output!

The winner neuron – its final output equals 1

Other neurons set output values to 0

$u_i = \sum_j w_{ij}\, x_j$

WTA (2)

The first presentation of the learning vectors is the basis for pointing out the winner neuron

Weights are modified by the Grossberg rule

If similar learning vectors activate the same winner neuron, the winner’s weights become the mean values of the input signals

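A minimal sketch of one WTA step (the names and η are illustrative); only the winner’s weights move toward the input, which is why they drift to the mean of the vectors that neuron keeps winning:

```python
import numpy as np

def wta_step(W, x, eta=0.1):
    """One WTA step: the neuron with maximal u_i wins and moves toward x."""
    u = W @ x                            # u_i = sum_j w_ij x_j
    winner = int(np.argmax(u))           # winner's output is 1, the rest are 0
    W[winner] += eta * (x - W[winner])   # Grossberg rule applied to the winner
    return winner
```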

WTM (1)

Winner selection like in WTA

Winner’s output is maximum

Winner activates the neighbourhood neurons

Distance from the winner drives the level of activation

Level of activation is a part of weight tuning algorithm

All weights are modified during learning algorithm

Neurons Neighbourhood (1)

Neurons as nodes of regular network

Central neuron – in the middle of the region

Neighbourhood neurons in the closest columns and rows

[Figure: simple neighbourhood vs. sophisticated neighbourhood]

Neurons Neighbourhood (2)

[Figure: 2-D neighbourhood and 1-D neighbourhood grids]

Neighbourhood function h(r)

– distance function between each neuron and the winner
– defines the necessary parameters for weights tuning

$h(r) = \frac{1}{r} \qquad \text{or} \qquad h(r) = e^{-r^2}$

r – distance between the winner and the neurons in the neighbourhood

Grossberg Rule

neighbourhood around the winning neuron,

size of neighbourhood decreases with iteration,

modulation of learning rate by frequency sensitivity.

Neighbourhood function = Mexican Hat:

$h(i_w, j_w, i, j) = \begin{cases} 1 & \text{for } r = 0 \\ a\, \sin(a\, r) & \text{for } r \in (0, 2a) \\ 0 & \text{for other values of } r \end{cases}$

a – neighbourhood parameter, r – distance from the winner neuron to each single neuron

The Grossberg rule:

$w_{lij}(k+1) = w_{lij}(k) + \eta(k)\, h(i_w, j_w, i, j)\, [\, x_l - w_{lij}(k) \,]$

k – iteration index, η(k) – learning rate function, x_l – component of the input learning vector,
w_lij – weight associated with the proper connection, h – neighbourhood function,
(i_w, j_w) – indexes related to the winner neuron, (i, j) – indexes related to a single neuron