74.419 Artificial Intelligence 2004 - Neural Networks


Transcript of 74.419 Artificial Intelligence 2004 - Neural Networks

Page 1: 74.419 Artificial Intelligence 2004 - Neural Networks


Neural Networks (NN)

• basic processing units

• general network architectures

• learning

• qualities and problems of NNs

Page 2: 74.419 Artificial Intelligence 2004 - Neural Networks

Neural Networks – Central Concepts

biologically inspired
– McCulloch-Pitts Neuron (automata theory), Perceptron

basic architecture
– units with activation state
– directed weighted connections between units
– "activation spreading": output used as input to connected units

basic processing in a unit
– integrated input: sum of the weighted outputs of the connected “pre-units”
– activation of unit = function of the integrated input
– output depends on input / activation state
– activation function or output function often threshold-dependent, also sigmoid (differentiable for backprop!) or linear

Page 3: 74.419 Artificial Intelligence 2004 - Neural Networks

Anatomy of a Neuron

Page 4: 74.419 Artificial Intelligence 2004 - Neural Networks

Diagram of an Action Potential

From: Ana Adelstein, Introduction to the Nervous System, Part I
http://www.ualberta.ca/~anaa/PSYCHO377/PSYCH377Lectures/L02Psych377/

Page 5: 74.419 Artificial Intelligence 2004 - Neural Networks

General Neural Network Model

• Network of simple processing units (neurons)
• Units connected by weighted links wij (labelled digraph; connection matrix)

Notation:
xj – input to uj
aj – activation state of uj
yj – output of uj
wij – weight of the connection from ui to uj (often written wji)

Page 6: 74.419 Artificial Intelligence 2004 - Neural Networks

Neuron Model as FSA

Calculate the new activation state and output based on the current activation and input.
Formalization as a Finite State Machine (observe the delay in the unit):

Input to uj:          xj(t) = Σi=1,…,n wij ⋅ yi(t)
Activation function:  aj(t+1) = δ(aj(t), xj(t))
Output function:      yj(t) = λ(aj(t), xj(t))

The output function is often a linear, threshold, or sigmoid function, depending only on the activation state. The activation function is often the identity function of the input, i.e. aj(t+1) = xj(t).
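As a concrete illustration, here is a minimal Python sketch of one update step of such a unit, assuming the identity activation function and a threshold output function; the function name, the threshold value, and the example numbers are illustrative, not from the slides.

import numpy as np

def unit_step(weights, pre_outputs, threshold=0.0):
    # Integrated input: weighted sum of the outputs y_i of the "pre-units"
    x = float(np.dot(weights, pre_outputs))
    # Activation function: identity of the input, a_j(t+1) = x_j(t)
    a = x
    # Output function: threshold (step) function of the activation state
    y = 1.0 if a >= threshold else 0.0
    return a, y

# Example: one unit with three pre-units
a, y = unit_step(np.array([0.5, -0.2, 0.8]), np.array([1.0, 1.0, 0.0]))
print(a, y)   # ≈ 0.3 1.0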

Page 7: 74.419 Artificial Intelligence 2004 - Neural Networks

NN - Activation Functions

Sigmoid Activation Function
Threshold Activation Function (Step Function)

adapted from Thomas Riga, University of Genoa, Italy
http://www.hku.nl/~pieter/EDU/neuro/doc/ThomasRiga/ann.htm#pdp
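As a sketch, these two activation functions can be written in Python as follows; the function names and the threshold parameter are my own labels for the curves shown in the figure.

import math

def sigmoid(x):
    # Smooth, differentiable S-curve – the property needed for backpropagation
    return 1.0 / (1.0 + math.exp(-x))

def step(x, threshold=0.0):
    # Threshold (step) function: output 1 iff the input reaches the threshold
    return 1.0 if x >= threshold else 0.0

print(sigmoid(0.0), step(0.0))   # 0.5 1.0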

Page 8: 74.419 Artificial Intelligence 2004 - Neural Networks
Page 9: 74.419 Artificial Intelligence 2004 - Neural Networks
Page 10: 74.419 Artificial Intelligence 2004 - Neural Networks

Parallelism – Competing Rules

Diagram: a VP node linked to hidden units h1, h2, h3 and to the categories V, NP, PP; connection weights 1.0, 0.5, 0.33, -0.2.

Activation values over time:

       h1    h2    h3
V      1.0   0.5   0.33   (t=1)
NP     0.8   1.0   0.66   (t=2)
PP     0.6   0.8   0.99   (t=3)

Page 11: 74.419 Artificial Intelligence 2004 - Neural Networks

NN Architectures + Function

Feedforward, layered networks
– simple pattern classification, function estimation
Recurrent networks
– for space/time-variant input (e.g. natural language)
Completely connected networks
– Boltzmann Machine, Hopfield Network
– optimization; constraint satisfaction
Self-Organizing Networks
– SOMs, Kohonen networks, winner-take-all (WTA) networks
– unsupervised development of classification
– best-fitting weight vector slowly adapted to the input vector

Page 12: 74.419 Artificial Intelligence 2004 - Neural Networks

NN Architectures + Function

Feedforward networks
→ layers of uni-directionally connected units
→ strict forward processing from input to output units
→ simple pattern classification, function estimation, decoders, control systems

Recurrent networks
→ feedforward network with internal feedback (context memory)
→ processing of space/time-variant input, e.g. natural language
→ e.g. Elman networks
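To make the feedforward case concrete, here is a minimal Python sketch of a strict forward pass through fully connected layers with a sigmoid activation; the layer sizes and random weights are illustrative assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(layer_weights, x):
    # Activation flows strictly forward, layer by layer, from input to output
    for W in layer_weights:
        x = sigmoid(W @ x)
    return x

# Example: 3 input units -> 4 hidden units -> 2 output units
rng = np.random.default_rng(0)
layer_weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
print(forward(layer_weights, np.array([1.0, 0.0, 1.0])))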

Page 13: 74.419 Artificial Intelligence 2004 - Neural Networks

Feed-forward Network

Haykin, Simon: Neural Networks - A Comprehensive Foundation, Prentice-Hall, 1999, p. 22.

Page 14: 74.419 Artificial Intelligence 2004 - Neural Networks

NN Architectures + Function

Completely connected networks
→ all units bi-directionally connected
→ positive weight ≡ positive association between units; the units support each other, are compatible
→ optimization; constraint satisfaction
→ Boltzmann Machine, Hopfield Network

Self-Organizing Networks
→ SOMs, Kohonen networks, also winner-take-all (WTA) networks
→ best-fitting weight vector slowly adapts to the input vector
→ unsupervised learning of classification

Page 15: 74.419 Artificial Intelligence 2004 - Neural Networks

Neural Networks - Learning

Learning = change connection weights
Adjusting the connection weights in the network changes its input-output behaviour, making it react “properly” to input patterns.

– supervised = the network is told the “correct” answer (teaching input); e.g. backpropagation, reinforcement learning

– unsupervised = the network has to find the correct output (usually a classification of the input patterns) on its own; e.g. competitive learning, winner-take-all networks, self-organizing or Kohonen maps

Page 16: 74.419 Artificial Intelligence 2004 - Neural Networks

Backpropagation - Schema

Backpropagation - Schematic Representation

The input is processed in a forward pass. Then the error is determined at the output units and propagated back through the network towards the input units.

adapted from Thomas Riga, University of Genoa, Italy
http://www.hku.nl/~pieter/EDU/neuro/doc/ThomasRiga/ann.htm#pdp

Page 17: 74.419 Artificial Intelligence 2004 - Neural Networks

Backpropagation Learning

Backpropagation learning is supervised: the correct input-output relation is known for some pattern samples. Take some of these patterns for training: calculate the error between the produced output and the correct output, propagate the error back from the output to the input units, and adjust the weights. After training, perform tests with known I/O patterns; then use the network with unknown input patterns.

Idea behind the backpropagation rule (next slides): Determine the error for the output units (compare the produced output with the 'teaching input' = the correct or wanted output). Adjust the weights based on the error, the activation state, and the current weights. Determine the error for the internal units based on the derivative of the activation function. Adjust the weights for the internal units using the error function, with an adapted delta-rule.

Page 18: 74.419 Artificial Intelligence 2004 - Neural Networks

NN-Learning as Optimization

Learning: adjust the network in order to adapt its input-output behaviour so that it reacts “properly” to input patterns.
Learning as an optimization process: find the parameter setting for the network (in particular the weights) that produces the best-fitting behaviour (input-output relation)
→ minimize the error in the I/O behaviour
→ optimize the weight setting w.r.t. the error function
→ find the minimum in the error surface over different weight settings

Backpropagation implements a gradient descent search for the correct weight setting (the method is not optimal).
Statistical models (which include a stochastic parameter) allow for “jumps” out of local minima (cf. the Hopfield neuron with a probabilistic activation function, thermodynamic models with a temperature parameter, simulated annealing).

Genetic Algorithms can also be used to determine the parameter setting of a Neural Network.
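As a sketch of how a stochastic parameter allows such "jumps", here is a minimal simulated-annealing-style search in Python over a one-dimensional error surface; the error function, step size, and cooling schedule are illustrative assumptions, not the Boltzmann machine or Hopfield update itself.

import math, random

def anneal(error, w, temperature=2.0, cooling=0.999, steps=5000):
    # Propose random weight changes; worse settings are sometimes accepted
    # (with a probability controlled by the temperature), which allows
    # jumps out of local minima of the error surface
    for _ in range(steps):
        candidate = w + random.uniform(-0.5, 0.5)
        delta = error(candidate) - error(w)
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            w = candidate
        temperature *= cooling
    return w

# Example: a double-well error surface; starting in the shallow minimum near
# w ≈ 1.4, the stochastic jumps let the search reach the deeper one near w ≈ -1.5
print(anneal(lambda w: w**4 - 4 * w**2 + w, w=1.4))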

Page 19: 74.419 Artificial Intelligence 2004 - Neural Networks
Page 20: 74.419 Artificial Intelligence 2004 - Neural Networks

Backpropagation - Delta Rule

The error is calculated as erri = (ti - yi) where
  ti is the teaching input (the correct or wanted output)
  yi is the produced output
Note: in the textbook it is called (Ti - Oi).

Backpropagation- or delta-rule:
  wj,i ← wj,i + α • aj • ∆i
where
  α is a constant, the learning rate,
  aj is the activation of uj, and
  ∆i is the backpropagated error.

  ∆i = erri • g'(xi)           for units in the output layer
  ∆j = g'(xj) • Σi wj,i • ∆i   for internal (hidden) units

where g' is the derivative of the activation function g.
Then wk,j ← wk,j + α • xk • ∆j
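A minimal Python sketch of one training step with this delta rule, for a network with one hidden layer of sigmoid units; the array shapes, variable names, and learning rate are illustrative assumptions (for the sigmoid, g'(x) = g(x)(1 - g(x))).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(W_in_hid, W_hid_out, x, t, alpha=0.1):
    # Forward pass
    a_hid = sigmoid(W_in_hid @ x)          # hidden activations
    y = sigmoid(W_hid_out @ a_hid)         # produced output

    # Backpropagated errors ("deltas")
    err = t - y                                                     # err_i = (t_i - y_i)
    delta_out = err * y * (1.0 - y)                                 # ∆_i = err_i • g'(x_i)
    delta_hid = a_hid * (1.0 - a_hid) * (W_hid_out.T @ delta_out)   # ∆_j = g'(x_j) • Σ_i w_j,i • ∆_i

    # Delta-rule weight updates: w <- w + α • (output of pre-unit) • ∆
    W_hid_out += alpha * np.outer(delta_out, a_hid)    # w_j,i <- w_j,i + α • a_j • ∆_i
    W_in_hid  += alpha * np.outer(delta_hid, x)        # w_k,j <- w_k,j + α • x_k • ∆_j
    return y

# Example with hypothetical sizes: 3 inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(1)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
backprop_step(W1, W2, np.array([1.0, 0.0, 1.0]), np.array([1.0, 0.0]))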

Page 21: 74.419 Artificial Intelligence 2004 - Neural Networks

Backpropagation as Error Minimization

Find the minimum of the error function

  E = 1/2 • Σi (ti - yi)²

Transform the above formula by integrating the weights: substitute the output term yi with g(Σj wj,i • aj), i.e. the activation function applied to the sum of the weighted outputs of the pre-neurons:

  E(W) = 1/2 • Σi (ti - g(Σj wj,i • aj))²

where W is the complete weight matrix of the net.

Determine the derivative of the error function (the gradient) w.r.t. a single weight wk,j:

  dE / dwk,j = -xk • ∆j

To minimize the error, move against the gradient, i.e. add +xk • ∆j.

This yields the backpropagation- or delta-rule: wk,j ← wk,j + α • xk • ∆j
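To spell out the step from E(W) to the gradient, here is a sketch of the chain-rule computation for a weight wj,i leading into output unit i (my addition, written in LaTeX and using the slide's notation with xi = Σj wj,i • aj):

\frac{\partial E}{\partial w_{j,i}}
  = \frac{\partial}{\partial w_{j,i}}\,\frac{1}{2}\sum_m \bigl(t_m - g(x_m)\bigr)^2
  = -(t_i - y_i)\, g'(x_i)\, \frac{\partial x_i}{\partial w_{j,i}}
  = -\,\mathrm{err}_i \, g'(x_i)\, a_j
  = -\,a_j \, \Delta_i

Gradient descent adds the negative of this gradient, which recovers the output-layer rule wj,i ← wj,i + α • aj • ∆i from the previous slide; carrying the same argument back one layer gives wk,j ← wk,j + α • xk • ∆j.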

Page 22: 74.419 Artificial Intelligence 2004 - Neural Networks

Implementation of Backprop-Learning

• Choose a description of the input and output patterns which is suitable for the task.
• Determine a test set and a training set (disjoint sets).
• Do – in general thousands of – training runs (with various patterns) until the parameters of the NN converge.
• The training goes several times through the different pattern classes (outputs), either one class at a time or one pattern from each class at a time.
• Measure the performance of the network on the test data (determine the error – wrong vs. right reactions of the NN).
• Re-train if necessary.
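A minimal Python sketch of such a training procedure, reusing the backprop_step function from the delta-rule sketch above; the patterns, epoch count, and learning rate are illustrative assumptions.

import numpy as np

# Hypothetical training patterns (inputs) with their teaching inputs (targets)
train_x = [np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])]
train_t = [np.array([1.0, 0.0]),      np.array([0.0, 1.0])]

rng = np.random.default_rng(2)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))

for epoch in range(5000):                  # "in general thousands of" training runs
    for x, t in zip(train_x, train_t):     # one pattern from each class per pass
        backprop_step(W1, W2, x, t, alpha=0.5)

# Measure performance; a real run would use a disjoint test set here
for x, t in zip(train_x, train_t):
    print(t, backprop_step(W1, W2, x, t, alpha=0.0))   # alpha=0: evaluate only, no weight change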

Page 23: 74.419 Artificial Intelligence 2004 - Neural Networks

Competitive Learning 1

Competitive learning is unsupervised. It discovers classes in the set of input patterns. The classes are determined by the similarity of the inputs. Learning determines the (output) unit which responds to all sample inputs of the same class. That unit reacts to patterns which are similar and thus represents this class.

Different classes are represented by different units. The system can thus - after learning - be used for classification.

Page 24: 74.419 Artificial Intelligence 2004 - Neural Networks

Competitive Learning 2

Units specialize to recognize pattern classes.

The unit which responds most strongly (among all units) to the current input moves its weight vector towards the input vector (using e.g. the Euclidean distance):
  reduce the weights on inactive lines, raise the weights on active lines;
  all other units keep or reduce their weights (often a Gaussian curve is used to determine which units change their weights and by how much).

The winning units (their weight vectors) represent a prototype of the class they recognize.
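A minimal Python sketch of one winner-take-all update: the Euclidean distance picks the winner and its weight vector is moved towards the input vector; in this simple sketch all other units keep their weights, and the learning rate and array sizes are illustrative assumptions.

import numpy as np

def wta_update(weights, x, eta=0.05):
    # weights: one weight vector per unit (one row each); x: current input vector
    distances = np.linalg.norm(weights - x, axis=1)   # Euclidean distance per unit
    winner = int(np.argmin(distances))                # unit responding strongest
    weights[winner] += eta * (x - weights[winner])    # move winner towards the input
    return winner

# Example: 3 units with 4-dimensional weight vectors; repeated presentations
# slowly turn the winning weight vector into a prototype of its input class
rng = np.random.default_rng(3)
W = rng.random((3, 4))
for _ in range(100):
    wta_update(W, np.array([1.0, 0.0, 0.0, 1.0]))
print(W)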

Page 25: 74.419 Artificial Intelligence 2004 - Neural Networks

Competitive Learning - Figure

from Haykin, Simon: Neural Networks, Prentice-Hall, 1999, p. 60

Page 26: 74.419 Artificial Intelligence 2004 - Neural Networks

Example: NetTalk (from 319)

• Terry Sejnowski of Johns Hopkins developed a system that can pronounce words of text.
• The system consists of a backpropagation network with 203 input units (29 text characters, 7 characters at a time), 80 hidden units, and 26 output units.
  – The system was developed over a year.
• The DEC-talk system consists of hand-coded linguistic rules for speech pronunciation.
  – developed over approximately 10 years
• DEC-talk outperforms NETtalk, but DEC-talk required significantly more development time.

Page 27: 74.419 Artificial Intelligence 2004 - Neural Networks

NetTalk (from 319)

• "This exemplifies the utility of neural networks; they are easy to construct and can be used even when a problem is not fully understood. However, rule-based algorithms usually out-perform neural networks when enough understanding is available”

» Hertz, Introduction to the Theory of Neural Networks, p. 133

Page 28: 74.419 Artificial Intelligence 2004 - Neural Networks

NETtalk - General

• Feedforward network architecture
• NETtalk used text as input
• Text was moved over the input units (“window”) → split the text into fixed-length inputs with some overlap between adjacent text windows
• Output represents controls for the Speech Generator
• Training through backpropagation
• Training patterns from human-made phonetic transcripts
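A minimal Python sketch of such a sliding-window input encoding: the 7-character window and the 29-symbol alphabet (giving 7 × 29 = 203 input values) follow the NETtalk description above, while the one-hot encoding, the choice of punctuation symbols, and the example string are assumptions.

import numpy as np

ALPHABET = list("abcdefghijklmnopqrstuvwxyz") + [" ", ",", "."]   # 29 symbols (assumed)
WINDOW = 7                                                        # characters per window

def encode_window(text, center):
    # One-hot encode the 7 characters centred on `center` -> 7 * 29 = 203 input values
    vec = np.zeros(WINDOW * len(ALPHABET))
    for k, pos in enumerate(range(center - 3, center + 4)):
        ch = text[pos] if 0 <= pos < len(text) else " "            # pad beyond the text
        vec[k * len(ALPHABET) + ALPHABET.index(ch)] = 1.0
    return vec

# Move the window over the text one character at a time (overlapping windows);
# the network pronounces the character in the middle of each window
text = "neural networks"
windows = [encode_window(text, i) for i in range(len(text))]
print(len(windows), windows[0].shape)   # 15 (203,)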

Page 29: 74.419 Artificial Intelligence 2004 - Neural Networks

NETtalk - Processing Unit

Page 30: 74.419 Artificial Intelligence 2004 - Neural Networks

NETtalk - Network Architecture

Page 31: 74.419 Artificial Intelligence 2004 - Neural Networks

NETtalk - Some Articulatory Features (Output)

Page 32: 74.419 Artificial Intelligence 2004 - Neural Networks

NETtalk - Some Articulatory Features (Output)

Page 33: 74.419 Artificial Intelligence 2004 - Neural Networks

NN - Caveats 1

often 3 layers necessary
  Perceptron, Minsky & Papert’s analysis: a single-layer perceptron can only learn linearly separable pattern classes

position dependence
  visual pattern recognition can depend on the position of the pattern in the input layer / matrix
  introduce feature vectors (a pre-analysis yields features of the patterns; the features are input to the NN)

time- and space-invariance
  patterns may be stretched / squeezed in the space / time dimension (visual objects, speech)

Page 34: 74.419 Artificial Intelligence 2004 - Neural Networks

NN - Caveats 2

Recursive structures and functions
  not directly representable due to the fixed architecture (fixed size)
  move a window of input units over the input (which is larger than the input window)
  store information in hidden units (“context memory”) and feed it back into the input layer
  use a hybrid model

Variable binding and value assignment
  simulation possible through simultaneously active, synchronized units (cf. Lokendra Shastri)

Page 35: 74.419 Artificial Intelligence 2004 - Neural Networks

Additional References

Haykin, Simon: Neural Networks – A Comprehensive Foundation, Prentice-Hall, 1999.

Rumelhart, McClelland & The PDP Research Group: Parallel Distributed Processing. Explorations in the Microstructure of Cognition, The MIT Press, 1986.

Page 36: 74.419 Artificial Intelligence 2004 - Neural Networks

Neural Networks Web Pages

The neuroinformatics Site (incl. Software etc.)
http://www.neuroinf.org/

Neural Networks incl. Software Repository at CBIIS (Connectionist-Based Intelligent Information Systems), University of Otago, New Zealand
http://divcom.otago.ac.nz/infosci/kel/CBIIS.html

Kohonen Feature Map - Demo
http://rfhs8012.fh-regensburg.de/~saj39122/begrolu/kohonen.html

Page 37: 74.419 Artificial Intelligence 2004 - Neural Networks

Neurophysiology / Neurobiology Web Pages

Animated diagram of an Action Potential (Neuroscience for Kids - featuring the giant axon of the squid)
http://faculty.washington.edu/chudler/ap.html

Adult explanation of the processes involved in information transmission at the cell level (with diagrams but no animation)
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/E/ExcitableCells.html

Similar to the above, but with animation and partly in Spanish
http://www.epub.org.br/cm/n10/fundamentos/pot2_i.htm

Page 38: 74.419 Artificial Intelligence 2004 - Neural Networks

Neurophysiology / Neurobiology Web Pages

Kandel's Nobel Lecture "Molecular Biology of Memory Storage: A Dialogue Between Genes and Synapses," December 8, 2000 http://www.nobel.se/medicine/laureates/2000/kandel-lecture.html

The Molecular Sciences Institute, Berkeley http://www.molsci.org/Dispatch

The Salk Institute for Biological Studies, San Diego
http://www.salk.edu/