74.419 Artificial Intelligence 2004 - Neural Networks


Transcript of 74.419 Artificial Intelligence 2004 - Neural Networks

Page 1: 74.419 Artificial Intelligence 2004 - Neural Networks


Neural Networks (NN)

• basic processing units

• general network architectures

• learning

• qualities and problems of NNs

Page 2: 74.419 Artificial Intelligence 2004 - Neural Networks

Neural Networks – Central Concepts

biologically inspired
– McCulloch-Pitts Neuron (automata theory), Perceptron

basic architecture
– units with activation state
– directed weighted connections between units
– "activation spreading": output used as input to connected units

basic processing in a unit
– integrated input: sum of the weighted outputs of the connected “pre-units”
– activation of unit = function of the integrated input
– output depends on input / activation state
– activation function or output function often threshold-dependent, also sigmoid (differentiable for backprop!) or linear

Page 3: 74.419 Artificial Intelligence 2004 - Neural Networks

Anatomy of a Neuron

Page 4: 74.419 Artificial Intelligence 2004 - Neural Networks

Diagram of an Action Potential

From: Ana Adelstein, Introduction to the Nervous System, Part I
http://www.ualberta.ca/~anaa/PSYCHO377/PSYCH377Lectures/L02Psych377/

Page 5: 74.419 Artificial Intelligence 2004 - Neural Networks

General Neural Network Model

• Network of simple processing units (neurons)
• Units connected by weighted links wij (labelled digraph; connection matrix)

Notation:
xj – input to uj
aj – activation state of uj
yj – output of uj
wij – weight of the connection from ui to uj (often written wji)

Page 6: 74.419 Artificial Intelligence 2004 - Neural Networks

Neuron Model as FSA

Calculate the new activation state and output based on the current activation and input.
Formalization as a Finite State Machine (observe the delay in the unit):

Input to uj:          xj(t) = Σi=1,…,n wij ⋅ yi(t)
Activation function:  aj(t+1) = δ(aj(t), xj(t))
Output function:      yj(t) = λ(aj(t), xj(t))

The output function is often a linear, threshold, or sigmoid function, depending only on the activation state. The activation function is often the identity function of the input, i.e. aj(t+1) = xj(t).
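As a concrete illustration, here is a minimal Python sketch of one update step of such a unit, assuming the identity activation function and a threshold output function; the function name, the threshold value, and the example numbers are illustrative, not from the slides.

import numpy as np

def unit_step(weights, pre_outputs, threshold=0.0):
    # Integrated input: weighted sum of the outputs y_i of the "pre-units"
    x = float(np.dot(weights, pre_outputs))
    # Activation function: identity of the input, a_j(t+1) = x_j(t)
    a = x
    # Output function: threshold (step) function of the activation state
    y = 1.0 if a >= threshold else 0.0
    return a, y

# Example: one unit with three pre-units
a, y = unit_step(np.array([0.5, -0.2, 0.8]), np.array([1.0, 1.0, 0.0]))
print(a, y)   # ≈ 0.3 1.0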

Page 7: 74.419 Artificial Intelligence 2004 - Neural Networks

NN - Activation Functions

Sigmoid Activation Function
Threshold Activation Function (Step Function)

adapted from Thomas Riga, University of Genoa, Italy
http://www.hku.nl/~pieter/EDU/neuro/doc/ThomasRiga/ann.htm#pdp
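As a sketch, these two activation functions can be written in Python as follows; the function names and the threshold parameter are my own labels for the curves shown in the figure.

import math

def sigmoid(x):
    # Smooth, differentiable S-curve – the property needed for backpropagation
    return 1.0 / (1.0 + math.exp(-x))

def step(x, threshold=0.0):
    # Threshold (step) function: output 1 iff the input reaches the threshold
    return 1.0 if x >= threshold else 0.0

print(sigmoid(0.0), step(0.0))   # 0.5 1.0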

Page 8: 74.419 Artificial Intelligence 2004 - Neural Networks
Page 9: 74.419 Artificial Intelligence 2004 - Neural Networks
Page 10: 74.419 Artificial Intelligence 2004 - Neural Networks

Parallelism – Competing Rules

Diagram: a VP node linked to hidden units h1, h2, h3 and to the categories V, NP, PP; connection weights 1.0, 0.5, 0.33, -0.2.

Activation values over time:

       h1    h2    h3
V      1.0   0.5   0.33   (t=1)
NP     0.8   1.0   0.66   (t=2)
PP     0.6   0.8   0.99   (t=3)

Page 11: 74.419 Artificial Intelligence 2004 - Neural Networks

NN Architectures + Function

Feedforward, layered networks
– simple pattern classification, function estimation
Recurrent networks
– for space/time-variant input (e.g. natural language)
Completely connected networks
– Boltzmann Machine, Hopfield Network
– optimization; constraint satisfaction
Self-Organizing Networks
– SOMs, Kohonen networks, winner-take-all (WTA) networks
– unsupervised development of classification
– best-fitting weight vector slowly adapted to the input vector

Page 12: 74.419 Artificial Intelligence 2004 - Neural Networks

NN Architectures + Function

Feedforward networks
→ layers of uni-directionally connected units
→ strict forward processing from input to output units
→ simple pattern classification, function estimation, decoders, control systems

Recurrent networks
→ feedforward network with internal feedback (context memory)
→ processing of space/time-variant input, e.g. natural language
→ e.g. Elman networks
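To make the feedforward case concrete, here is a minimal Python sketch of a strict forward pass through fully connected layers with a sigmoid activation; the layer sizes and random weights are illustrative assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(layer_weights, x):
    # Activation flows strictly forward, layer by layer, from input to output
    for W in layer_weights:
        x = sigmoid(W @ x)
    return x

# Example: 3 input units -> 4 hidden units -> 2 output units
rng = np.random.default_rng(0)
layer_weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
print(forward(layer_weights, np.array([1.0, 0.0, 1.0])))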

Page 13: 74.419 Artificial Intelligence 2004 - Neural Networks

Feed-forward Network

Haykin, Simon: Neural Networks - A Comprehensive Foundation, Prentice-Hall, 1999, p. 22.

Page 14: 74.419 Artificial Intelligence 2004 - Neural Networks

NN Architectures + Function

Completely connected networks
→ all units bi-directionally connected
→ positive weight ≡ positive association between units; the units support each other, are compatible
→ optimization; constraint satisfaction
→ Boltzmann Machine, Hopfield Network

Self-Organizing Networks
→ SOMs, Kohonen networks, also winner-take-all (WTA) networks
→ best-fitting weight vector slowly adapts to the input vector
→ unsupervised learning of classification

Page 15: 74.419 Artificial Intelligence 2004 - Neural Networks

Neural Networks - Learning

Learning = change connection weights
Adjusting the connection weights in the network changes its input-output behaviour, making it react “properly” to input patterns.

– supervised = the network is told the “correct” answer (teaching input); e.g. backpropagation, reinforcement learning

– unsupervised = the network has to find the correct output (usually a classification of the input patterns) on its own; e.g. competitive learning, winner-take-all networks, self-organizing or Kohonen maps

Page 16: 74.419 Artificial Intelligence 2004 - Neural Networks

Backpropagation - Schema

Backpropagation - Schematic Representation

The input is processed in a forward pass. Then the error is determined at the output units and propagated back through the network towards the input units.

adapted from Thomas Riga, University of Genoa, Italy
http://www.hku.nl/~pieter/EDU/neuro/doc/ThomasRiga/ann.htm#pdp

Page 17: 74.419 Artificial Intelligence 2004 - Neural Networks

Backpropagation Learning

Backpropagation learning is supervised: the correct input-output relation is known for some pattern samples. Take some of these patterns for training: calculate the error between the produced output and the correct output, propagate the error back from the output to the input units, and adjust the weights. After training, perform tests with known I/O patterns; then use the network with unknown input patterns.

Idea behind the backpropagation rule (next slides): Determine the error for the output units (compare the produced output with the 'teaching input' = the correct or wanted output). Adjust the weights based on the error, the activation state, and the current weights. Determine the error for the internal units based on the derivative of the activation function. Adjust the weights for the internal units using the error function, with an adapted delta-rule.

Page 18: 74.419 Artificial Intelligence 2004 - Neural Networks

NN-Learning as Optimization

Learning: adjust the network in order to adapt its input-output behaviour so that it reacts “properly” to input patterns.
Learning as an optimization process: find the parameter setting for the network (in particular the weights) that produces the best-fitting behaviour (input-output relation)
→ minimize the error in the I/O behaviour
→ optimize the weight setting w.r.t. the error function
→ find the minimum in the error surface over different weight settings

Backpropagation implements a gradient descent search for the correct weight setting (the method is not optimal).
Statistical models (which include a stochastic parameter) allow for “jumps” out of local minima (cf. the Hopfield neuron with a probabilistic activation function, thermodynamic models with a temperature parameter, simulated annealing).

Genetic Algorithms can also be used to determine the parameter setting of a Neural Network.
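As a sketch of how a stochastic parameter allows such "jumps", here is a minimal simulated-annealing-style search in Python over a one-dimensional error surface; the error function, step size, and cooling schedule are illustrative assumptions, not the Boltzmann machine or Hopfield update itself.

import math, random

def anneal(error, w, temperature=2.0, cooling=0.999, steps=5000):
    # Propose random weight changes; worse settings are sometimes accepted
    # (with a probability controlled by the temperature), which allows
    # jumps out of local minima of the error surface
    for _ in range(steps):
        candidate = w + random.uniform(-0.5, 0.5)
        delta = error(candidate) - error(w)
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            w = candidate
        temperature *= cooling
    return w

# Example: a double-well error surface; starting in the shallow minimum near
# w ≈ 1.4, the stochastic jumps let the search reach the deeper one near w ≈ -1.5
print(anneal(lambda w: w**4 - 4 * w**2 + w, w=1.4))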

Page 19: 74.419 Artificial Intelligence 2004 - Neural Networks
Page 20: 74.419 Artificial Intelligence 2004 - Neural Networks

Backpropagation - Delta Rule

The error is calculated as erri = (ti - yi) where
  ti is the teaching input (the correct or wanted output)
  yi is the produced output
Note: in the textbook it is called (Ti - Oi).

Backpropagation- or delta-rule:
  wj,i ← wj,i + α • aj • ∆i
where
  α is a constant, the learning rate,
  aj is the activation of uj, and
  ∆i is the backpropagated error.

  ∆i = erri • g'(xi)           for units in the output layer
  ∆j = g'(xj) • Σi wj,i • ∆i   for internal (hidden) units

where g' is the derivative of the activation function g.
Then wk,j ← wk,j + α • xk • ∆j
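A minimal Python sketch of one training step with this delta rule, for a network with one hidden layer of sigmoid units; the array shapes, variable names, and learning rate are illustrative assumptions (for the sigmoid, g'(x) = g(x)(1 - g(x))).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(W_in_hid, W_hid_out, x, t, alpha=0.1):
    # Forward pass
    a_hid = sigmoid(W_in_hid @ x)          # hidden activations
    y = sigmoid(W_hid_out @ a_hid)         # produced output

    # Backpropagated errors ("deltas")
    err = t - y                                                     # err_i = (t_i - y_i)
    delta_out = err * y * (1.0 - y)                                 # ∆_i = err_i • g'(x_i)
    delta_hid = a_hid * (1.0 - a_hid) * (W_hid_out.T @ delta_out)   # ∆_j = g'(x_j) • Σ_i w_j,i • ∆_i

    # Delta-rule weight updates: w <- w + α • (output of pre-unit) • ∆
    W_hid_out += alpha * np.outer(delta_out, a_hid)    # w_j,i <- w_j,i + α • a_j • ∆_i
    W_in_hid  += alpha * np.outer(delta_hid, x)        # w_k,j <- w_k,j + α • x_k • ∆_j
    return y

# Example with hypothetical sizes: 3 inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(1)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
backprop_step(W1, W2, np.array([1.0, 0.0, 1.0]), np.array([1.0, 0.0]))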

Page 21: 74.419 Artificial Intelligence 2004 - Neural Networks

Backpropagation as Error Minimization

Find the minimum of the error function

  E = 1/2 • Σi (ti - yi)²

Transform the above formula by integrating the weights: substitute the output term yi with g(Σj wj,i • aj), i.e. the activation function applied to the sum of the weighted outputs of the pre-neurons:

  E(W) = 1/2 • Σi (ti - g(Σj wj,i • aj))²

where W is the complete weight matrix of the net.

Determine the derivative of the error function (the gradient) w.r.t. a single weight wk,j:

  dE / dwk,j = -xk • ∆j

To minimize the error, move against the gradient, i.e. add +xk • ∆j.

This yields the backpropagation- or delta-rule: wk,j ← wk,j + α • xk • ∆j
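To spell out the step from E(W) to the gradient, here is a sketch of the chain-rule computation for a weight wj,i leading into output unit i (my addition, written in LaTeX and using the slide's notation with xi = Σj wj,i • aj):

\frac{\partial E}{\partial w_{j,i}}
  = \frac{\partial}{\partial w_{j,i}}\,\frac{1}{2}\sum_m \bigl(t_m - g(x_m)\bigr)^2
  = -(t_i - y_i)\, g'(x_i)\, \frac{\partial x_i}{\partial w_{j,i}}
  = -\,\mathrm{err}_i \, g'(x_i)\, a_j
  = -\,a_j \, \Delta_i

Gradient descent adds the negative of this gradient, which recovers the output-layer rule wj,i ← wj,i + α • aj • ∆i from the previous slide; carrying the same argument back one layer gives wk,j ← wk,j + α • xk • ∆j.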

Page 22: 74.419 Artificial Intelligence 2004 - Neural Networks

Implementation of Backprop-Learning

• Choose a description of the input and output patterns which is suitable for the task.
• Determine a test set and a training set (disjoint sets).
• Do – in general thousands of – training runs (with various patterns) until the parameters of the NN converge.
• The training goes several times through the different pattern classes (outputs), either one class at a time or one pattern from each class at a time.
• Measure the performance of the network on the test data (determine the error – wrong vs. right reactions of the NN).
• Re-train if necessary.
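A minimal Python sketch of such a training procedure, reusing the backprop_step function from the delta-rule sketch above; the patterns, epoch count, and learning rate are illustrative assumptions.

import numpy as np

# Hypothetical training patterns (inputs) with their teaching inputs (targets)
train_x = [np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])]
train_t = [np.array([1.0, 0.0]),      np.array([0.0, 1.0])]

rng = np.random.default_rng(2)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))

for epoch in range(5000):                  # "in general thousands of" training runs
    for x, t in zip(train_x, train_t):     # one pattern from each class per pass
        backprop_step(W1, W2, x, t, alpha=0.5)

# Measure performance; a real run would use a disjoint test set here
for x, t in zip(train_x, train_t):
    print(t, backprop_step(W1, W2, x, t, alpha=0.0))   # alpha=0: evaluate only, no weight change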

Page 23: 74.419 Artificial Intelligence 2004 - Neural Networks

Competitive Learning 1

Competitive learning is unsupervised. It discovers classes in the set of input patterns. The classes are determined by the similarity of the inputs. Learning determines the (output) unit which responds to all sample inputs of the same class. That unit reacts to patterns which are similar and thus represents this class.

Different classes are represented by different units. The system can thus - after learning - be used for classification.

Page 24: 74.419 Artificial Intelligence 2004 - Neural Networks

Competitive Learning 2

Units specialize to recognize pattern classes.

The unit which responds most strongly (among all units) to the current input moves its weight vector towards the input vector (using e.g. the Euclidean distance):
  reduce the weights on inactive lines, raise the weights on active lines;
  all other units keep or reduce their weights (often a Gaussian curve is used to determine which units change their weights and by how much).

The winning units (their weight vectors) represent a prototype of the class they recognize.
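A minimal Python sketch of one winner-take-all update: the Euclidean distance picks the winner and its weight vector is moved towards the input vector; in this simple sketch all other units keep their weights, and the learning rate and array sizes are illustrative assumptions.

import numpy as np

def wta_update(weights, x, eta=0.05):
    # weights: one weight vector per unit (one row each); x: current input vector
    distances = np.linalg.norm(weights - x, axis=1)   # Euclidean distance per unit
    winner = int(np.argmin(distances))                # unit responding strongest
    weights[winner] += eta * (x - weights[winner])    # move winner towards the input
    return winner

# Example: 3 units with 4-dimensional weight vectors; repeated presentations
# slowly turn the winning weight vector into a prototype of its input class
rng = np.random.default_rng(3)
W = rng.random((3, 4))
for _ in range(100):
    wta_update(W, np.array([1.0, 0.0, 0.0, 1.0]))
print(W)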

Page 25: 74.419 Artificial Intelligence 2004 - Neural Networks

Competitive Learning - Figure

from Haykin, Simon: Neural Networks, Prentice-Hall, 1999, p. 60

Page 26: 74.419 Artificial Intelligence 2004 - Neural Networks

Example: NetTalk (from 319)

• Terry Sejnowski of Johns Hopkins developed a system that can pronounce words of text.
• The system consists of a backpropagation network with 203 input units (29 text characters, 7 characters at a time), 80 hidden units, and 26 output units.
  – The system was developed over a year.
• The DEC-talk system consists of hand-coded linguistic rules for speech pronunciation.
  – developed over approximately 10 years
• DEC-talk outperforms NETtalk, but DEC-talk required significantly more development time.

Page 27: 74.419 Artificial Intelligence 2004 - Neural Networks

NetTalk (from 319)

• "This exemplifies the utility of neural networks; they are easy to construct and can be used even when a problem is not fully understood. However, rule-based algorithms usually out-perform neural networks when enough understanding is available”

» Hertz, Introduction to the Theory of Neural Networks, p. 133

Page 28: 74.419 Artificial Intelligence 2004 - Neural Networks

NETtalk - General

• Feedforward network architecture
• NETtalk used text as input
• Text was moved over the input units (“window”) → split the text into fixed-length inputs with some overlap between adjacent text windows
• Output represents controls for the Speech Generator
• Training through backpropagation
• Training patterns from human-made phonetic transcripts
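A minimal Python sketch of such a sliding-window input encoding: the 7-character window and the 29-symbol alphabet (giving 7 × 29 = 203 input values) follow the NETtalk description above, while the one-hot encoding, the choice of punctuation symbols, and the example string are assumptions.

import numpy as np

ALPHABET = list("abcdefghijklmnopqrstuvwxyz") + [" ", ",", "."]   # 29 symbols (assumed)
WINDOW = 7                                                        # characters per window

def encode_window(text, center):
    # One-hot encode the 7 characters centred on `center` -> 7 * 29 = 203 input values
    vec = np.zeros(WINDOW * len(ALPHABET))
    for k, pos in enumerate(range(center - 3, center + 4)):
        ch = text[pos] if 0 <= pos < len(text) else " "            # pad beyond the text
        vec[k * len(ALPHABET) + ALPHABET.index(ch)] = 1.0
    return vec

# Move the window over the text one character at a time (overlapping windows);
# the network pronounces the character in the middle of each window
text = "neural networks"
windows = [encode_window(text, i) for i in range(len(text))]
print(len(windows), windows[0].shape)   # 15 (203,)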

Page 29: 74.419 Artificial Intelligence 2004 - Neural Networks

NETtalk - Processing Unit

Page 30: 74.419 Artificial Intelligence 2004 - Neural Networks

NETtalk - Network Architecture

Page 31: 74.419 Artificial Intelligence 2004 - Neural Networks

NETtalk - Some Articulatory Features (Output)

Page 32: 74.419 Artificial Intelligence 2004 - Neural Networks

NETtalk - Some Articulatory Features (Output)

Page 33: 74.419 Artificial Intelligence 2004 - Neural Networks

NN - Caveats 1

often 3 layers necessary
  Perceptron, Minsky & Papert’s analysis: a single-layer perceptron can only learn linearly separable pattern classes

position dependence
  visual pattern recognition can depend on the position of the pattern in the input layer / matrix
  introduce feature vectors (a pre-analysis yields features of the patterns; the features are input to the NN)

time- and space-invariance
  patterns may be stretched / squeezed in the space / time dimension (visual objects, speech)

Page 34: 74.419 Artificial Intelligence 2004 - Neural Networks

NN - Caveats 2

Recursive structures and functions
  not directly representable due to the fixed architecture (fixed size)
  move a window of input units over the input (which is larger than the input window)
  store information in hidden units (“context memory”) and feed it back into the input layer
  use a hybrid model

Variable binding and value assignment
  simulation possible through simultaneously active, synchronized units (cf. Lokendra Shastri)

Page 35: 74.419 Artificial Intelligence 2004 - Neural Networks

Additional References

Haykin, Simon: Neural Networks – A Comprehensive Foundation, Prentice-Hall, 1999.

Rumelhart, McClelland & The PDP Research Group: Parallel Distributed Processing. Explorations in the Microstructure of Cognition, The MIT Press, 1986.

Page 36: 74.419 Artificial Intelligence 2004 - Neural Networks

Neural Networks Web Pages

The neuroinformatics Site (incl. Software etc.)
http://www.neuroinf.org/

Neural Networks incl. Software Repository at CBIIS (Connectionist-Based Intelligent Information Systems), University of Otago, New Zealand
http://divcom.otago.ac.nz/infosci/kel/CBIIS.html

Kohonen Feature Map - Demo
http://rfhs8012.fh-regensburg.de/~saj39122/begrolu/kohonen.html

Page 37: 74.419 Artificial Intelligence 2004 - Neural Networks

Neurophysiology / Neurobiology Web Pages

Animated diagram of an Action Potential (Neuroscience for Kids - featuring the giant axon of the squid)
http://faculty.washington.edu/chudler/ap.html

Adult explanation of the processes involved in information transmission at the cell level (with diagrams but no animation)
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/E/ExcitableCells.html

Similar to the above, but with animation and partly in Spanish
http://www.epub.org.br/cm/n10/fundamentos/pot2_i.htm

Page 38: 74.419 Artificial Intelligence 2004 - Neural Networks

Neurophysiology / Neurobiology Web Pages

Kandel's Nobel Lecture "Molecular Biology of Memory Storage: A Dialogue Between Genes and Synapses," December 8, 2000 http://www.nobel.se/medicine/laureates/2000/kandel-lecture.html

The Molecular Sciences Institute, Berkeley http://www.molsci.org/Dispatch

The Salk Institute for Biological Studies, San Diego
http://www.salk.edu/