Page 1: CSM6120 Introduction to Intelligent Systems – Neural Networks

Rkj@aber.ac.uk

Page 2: Sub-symbolic learning

When we use some sort of rule-based system, we generally understand the rules.
We understand the conclusions it draws, because it can tell us.
When a system learns from such rules, processes them in an understood way, and comes up with an understandable result, it is known as symbolic learning.

Page 3: Sub-symbolic learning

Some systems of learning – which we are coming to – operate in quite a different way.
Sub-symbolic learning is learning where we don't really understand, or have control over, the process by which the system comes to its conclusion (though we could, with considerable effort and vast amounts of time, find out).
This includes neural networks, genetic algorithms, genetic programming and, to some extent, statistical methods.

Page 4: Artificial neural networks

ANNs for short.
We use the word "Artificial" to distinguish them from biological neural networks.
They come in various shapes and sizes.
Inputs (variables), outputs (results).
There may be one or more outputs.

Page 5: Neural networks (applications)

Face recognition
Event recognition, tracking
Time series prediction
Process control
Optical character recognition
Handwriting recognition
Robotics, movement
Games: car control, etc.

Page 6: ANNs statistical?

We all think of ANNs as pure AI, right?
There is much published literature claiming that ANNs are really no more than non-linear regression models, and that they can be implemented using standard statistical software.
But there is a difference: randomness.

Page 7: Neurons

Original ANN research aimed at modelling networks of real neurons in the brain.
There are roughly 10^10 neurons in the brain – an unrealistic target!
The brain is massively parallel: one job is shared between many neurons, so if one goes wrong, it's no big deal.
This is known as distributed processing, and it makes the brain fault-tolerant.

Page 8: Patterns within data

ANNs learn to recognise patterns within sets of data.
Fault tolerance helps – the network can still cope with unexpected input examples, placing emphasis only on what it considers important.

Page 9: The neuron metaphor

Neurons:
Accept information from multiple inputs
Transmit information to other neurons
Multiply inputs by weights along edges
Apply some function to the set of inputs at each node

Page 10: ANN topology

Page 11: Nodes (neurons, or perceptrons)

Each input node brings into the network the value of one independent variable.
The hidden layer nodes do most of the work.
The output node(s) give us our result.

Page 12: Types of neuron

Linear neuron
Logistic neuron
Perceptron

Page 13: Activation functions

An activation function transforms a neuron's input into its output.
Features of activation functions:
A squashing effect is required – this prevents accelerating growth of activation levels through the network.
Simple and easy to calculate, differentiable...?
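
To make the neuron types from the last two slides concrete, here is a minimal Python sketch (not from the original slides): a node forms the weighted sum of its inputs plus a bias, then applies a linear, logistic or step activation. The example input values are arbitrary.

```python
import math

def linear(total_input):
    # Linear neuron: output is just the weighted sum (no squashing)
    return total_input

def logistic(total_input):
    # Logistic (sigmoid) neuron: squashes any input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-total_input))

def step(total_input, threshold=0.0):
    # Perceptron: hard threshold, output 1 if the weighted sum reaches the threshold
    return 1 if total_input >= threshold else 0

def neuron(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus a bias, passed through the activation function
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(total)

print(neuron([0.5, 0.2], [1.0, -2.0], 0.1, logistic))
```

The logistic function illustrates the squashing effect: whatever the weighted sum, the output stays between 0 and 1.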

Page 14: Perceptrons

Formal neuron (McCulloch & Pitts, 1943).
Summation of inputs and threshold functions.
Showed how any logical function could be computed with simple model neurons.

Page 15: Perceptrons

Learning by weight adjustment.
Output is 1 if the weighted sum of inputs is greater than a given threshold, otherwise output is 0.
An extra weight (the bias) is used as a way of adjusting the threshold.

Page 16: Alternative notation

The threshold (for these examples) is always 0.
If the sum of the weighted inputs >= 0 then P = 1, else P = 0.

(figure: a perceptron with inputs A and B, a bias input X = 1, weights w1, w2 and w0, and output P)

Page 17: Example

If the inputs are binary (0 or 1) then the neuron units act like a conventional logic module.
In this example, P = A OR B.
This equates to: if 2A + B – 0.5 >= 0 then output 1, else output 0.

(figure: a perceptron with inputs A (weight 2) and B (weight 1), bias input X = 1 (weight -0.5), and output P)
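
As a quick check of the OR example above, a small Python sketch (not part of the slides) can evaluate the perceptron with the slide's weights over all binary inputs:

```python
def perceptron(a, b, w_a, w_b, w_bias):
    # Bias input X is fixed at 1; output 1 if the weighted sum is >= 0
    total = w_a * a + w_b * b + w_bias * 1
    return 1 if total >= 0 else 0

# Weights from the slide: 2*A + 1*B - 0.5
for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron(a, b, 2, 1, -0.5))   # prints the OR truth table
```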

Page 18: Example 2

What is being computed here?

(figure: the same perceptron, with inputs A (weight 2) and B (weight 1), but with bias weight -2.5)

By only changing the bias, we can model another logical function: AND.

Page 19: Example 3

What is being computed here?

(figure: a two-layer network – a first perceptron P with inputs A and B, both weighted 2, and bias weight -1.5, feeding a second perceptron Q together with a further input C and its own bias)

Page 20: Standard backpropagation

The commonest learning rule for ANNs.
Made popular by Rumelhart and McClelland in 1986.
Connections between nodes are given random initial weights.
We therefore get a value at the output node(s), which is what happens when these random weights are applied to the data at the input.
We use the difference between the output and the correct values to adjust the weights.

Page 21: Training perceptrons

For AND:
A B P
0 0 0
0 1 0
1 0 0
1 1 1

• What are the weight values?
• Initialize with random weights

(figure: a perceptron with inputs A and B, bias input X = -1, weights marked "?", and output P)

Page 22: Training perceptrons

For AND:
A B P
0 0 0
0 1 0
1 0 0
1 1 1

(figure: a perceptron with inputs A (weight 0.5) and B (weight -0.4), bias input -1 (weight 0.3), and output P)

Bias  A  B  Summation                                Output
-1    0  0  (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3     0
-1    0  1  (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7     0
-1    1  0  (-1*0.3) + (1*0.5) + (0*-0.4) =  0.2     1
-1    1  1  (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2     0
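
The table above can be reproduced directly. Here is a minimal Python sketch (not from the slides) of the forward pass with those initial weights:

```python
w_bias, w_a, w_b = 0.3, 0.5, -0.4   # the random initial weights from the slide
bias_input = -1

for a in (0, 1):
    for b in (0, 1):
        summation = bias_input * w_bias + a * w_a + b * w_b
        output = 1 if summation >= 0 else 0     # threshold at 0
        print(a, b, round(summation, 2), output)
```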

Page 23: Learning algorithm

Epoch: presentation of the entire training set to the neural network.
In the case of the AND function, an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1]).

Error: the amount by which the value output by the network differs from the target value.
For example, if we required the network to output 0 and it output a 1, then Error = -1.

Page 24: Learning algorithm

Error:
The weightings on the connections are then adjusted to try to reduce this error.
Further iterations of weight adjustment proceed until we decide we're finished – and that is a study in itself.

Epochs:
There may be many thousands of epochs in one training run before a satisfactory model is achieved.
But how many do we need? The BIG question (or one of them!)

Page 25: Perceptron learning rule

Weights are updated:
w_i = w_i + Δw_i
Δw_i = η (t − o) x_i

where t is the target value, o is the perceptron output, and η is a small constant (e.g. 0.12) called the learning rate.

For each training example:
• If the output is correct (t = o) the weights w_i are not changed
• If the output is incorrect (t ≠ o) the weights w_i are changed such that the output of the perceptron for the new weights is closer to t

• The algorithm converges to the correct classification
• If the training data is linearly separable
• And if η is sufficiently small
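
Below is a minimal Python sketch of this rule applied to the AND example from the earlier slides, with the bias handled as an extra weight on a fixed input of -1 (as on Page 21). It is an illustration of the rule, not code supplied with the module.

```python
import random

def train_perceptron_and(eta=0.12, max_epochs=100):
    # Training data for AND: ((A, B), target)
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    # One weight per input plus a bias weight; bias input is fixed at -1
    w_bias, w_a, w_b = (random.uniform(-0.5, 0.5) for _ in range(3))

    for epoch in range(max_epochs):
        errors = 0
        for (a, b), t in data:
            summation = -1 * w_bias + a * w_a + b * w_b
            o = 1 if summation >= 0 else 0
            if o != t:
                errors += 1
                # Perceptron learning rule: w_i = w_i + eta * (t - o) * x_i
                w_bias += eta * (t - o) * -1
                w_a    += eta * (t - o) * a
                w_b    += eta * (t - o) * b
        if errors == 0:        # one clean epoch: every example classified correctly
            break
    return w_bias, w_a, w_b

print(train_perceptron_and())
```

Because AND is linearly separable, the loop reaches an error-free epoch and stops.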

Page 26: Search space

It's often useful to think of ANN training as a marble rolling across a surface which has valleys in it.
Some valleys may be deeper than others – we want to find a deep one.
When we find it, we stop training.

(figure: an error surface with a local minimum and a global minimum)

Page 27: Gradient descent learning rule

Consider a linear unit without a threshold and with continuous output o (not just 0/1 as with perceptrons):
o = w0 + w1·x1 + … + wn·xn

We train the w_i such that they minimise the squared error
E[w1,…,wn] = ½ Σ_{d∈D} (t_d − o_d)²
where D is the set of training examples.

Page 28: Gradient descent

Gradient: ∇E[w] = [∂E/∂w0, …, ∂E/∂wn]

Training rule:
Δw = −η ∇E[w]
i.e. Δw_i = −η ∂E/∂w_i

(figure: the error surface over (w1, w2), with a step from (w1, w2) to (w1+Δw1, w2+Δw2))

Page 29: Gradient descent (linear units)

Gradient-descent(training_examples, η)

Each training example is a pair of the form <(x1,…,xn), t>, where (x1,…,xn) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1).

Initialize each w_i to some small random value
Until the termination condition is met:
    Initialize each Δw_i to zero
    For each <(x1,…,xn), t> in training_examples:
        Input the instance (x1,…,xn) to the linear unit and compute the output o
        For each linear unit weight w_i:
            Δw_i = Δw_i + η (t − o) x_i    (update deltas)
    For each linear unit weight w_i:
        w_i = w_i + Δw_i    (update weights)
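
A direct Python/NumPy transcription of this pseudocode might look as follows (a sketch with made-up toy data, not code supplied with the module):

```python
import numpy as np

def gradient_descent(training_examples, eta=0.1, epochs=500):
    """Batch gradient descent for a linear unit, following the slide's pseudocode.

    training_examples: list of (x_vector, t) pairs; each x_vector already includes
    a leading 1 so that w[0] acts as w0 (the bias weight).
    """
    n = len(training_examples[0][0])
    w = np.random.uniform(-0.05, 0.05, n)     # small random initial weights

    for _ in range(epochs):
        delta_w = np.zeros(n)                 # initialise each delta_w_i to zero
        for x, t in training_examples:
            x = np.asarray(x, dtype=float)
            o = np.dot(w, x)                  # linear unit: o = w0 + w1*x1 + ... + wn*xn
            delta_w += eta * (t - o) * x      # accumulate the deltas
        w += delta_w                          # update the weights once per epoch
    return w

# Toy data generated from t = 1 + 2*x1 - 3*x2 (the leading 1 is the bias input);
# the learned weights should approach (1, 2, -3)
examples = [((1, x1, x2), 1 + 2 * x1 - 3 * x2)
            for x1 in (0.0, 0.5, 1.0) for x2 in (0.0, 0.5, 1.0)]
print(gradient_descent(examples))
```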

Page 30: Incremental (stochastic) gradient descent

Two approaches to this:
Batch mode = gradient descent over the entire data D
Incremental mode = gradient descent over individual training examples d

Incremental gradient descent can approximate batch gradient descent arbitrarily closely if η is small enough.

Page 31: Perceptron vs gradient descent rule

Perceptron learning rule guaranteed to succeed if:
The training examples are linearly separable
The learning rate is sufficiently small

Gradient descent learning for linear units:
Guaranteed to converge to the hypothesis with minimum squared error
Given a sufficiently small learning rate
Even when the training data contains noise

Page 32: Decision boundaries

In simple cases, we can divide the feature space by drawing a hyperplane across it.
This is known as a decision boundary.
Discriminant function: returns different values on opposite sides (a straight line in the two-dimensional case).
Problems which can be classified in this way are linearly separable.

Page 33: Linear separability

(figure: points of class A and class B plotted against X1 and X2, separated by a straight-line decision boundary)

Page 34: Decision surface of a perceptron

(figure: two plots over x1 and x2 – a linearly separable arrangement of + and − points, and a non-linearly separable one)

• A perceptron is able to represent some useful functions
• AND(x1, x2): choose weights w0 = -1.5, w1 = 1, w2 = 1
• But functions that are not linearly separable (e.g. XOR) are not representable – we need multilayer perceptrons here

Page 35: Multilayer networks

Cascade neurons together.
The output from one layer is the input to the next.
Each layer has its own set of weights.
Hidden layer(s)…

Page 36: Linear regression neural networks

What happens when we arrange linear neurons in a multilayer network?

Page 37: Linear regression neural networks

…nothing special happens.
The product of two linear transformations is itself a linear transformation.

Page 38: Neural networks

We want to introduce non-linearities into the network.
Non-linearities allow a network to identify complex regions in space.

Page 39: Linear separability

A 1-layer network cannot handle XOR.
More layers can handle more complicated spaces – but require more parameters.
Each node splits the feature space with a hyperplane.
If the second layer is AND, a 2-layer network can represent any convex hull.
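
To see how a second layer gets around the XOR problem, here is a hand-built two-layer network of threshold units (an illustrative construction, not one given in the slides): the hidden units compute OR and NAND, and the output unit ANDs them together.

```python
def step(total):
    # Threshold (perceptron) unit: fires if the weighted sum reaches 0
    return 1 if total >= 0 else 0

def xor_net(a, b):
    # Hidden layer: one unit computes OR, the other NAND
    h_or   = step(1 * a + 1 * b - 0.5)
    h_nand = step(-1 * a - 1 * b + 1.5)
    # Output layer: AND of the two hidden units
    return step(1 * h_or + 1 * h_nand - 1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # prints the XOR truth table
```

Each hidden unit draws one hyperplane; the AND output unit keeps only the region between them, which no single hyperplane could isolate.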

Page 40: Separability

(figure: a table showing, for single-layer, two-layer and three-layer networks, the decision regions each structure can form for the exclusive-OR problem, for classes with meshed regions, and for the most general region shapes)

Page 41: Feed-forward networks

Predictions are fed forward through the network to classify.


Page 47: Error backpropagation

Error backpropagation solves the gradient for each partial component separately.
The target values for each layer come from the next layer.
This feeds the errors back along the network.

Page 48: Backpropagation (BP) algorithm

BP employs gradient descent to attempt to minimize the squared error between the network output values and the target values for these outputs.

Two-stage learning:
Forward stage: calculate outputs given the input pattern
Backward stage: update weights by calculating deltas
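
The following is a compact NumPy sketch of the two stages for a network with one hidden layer of sigmoid units, trained here on XOR. It is a minimal illustration (batch updates, squared error), not the module's code; with an unlucky random start it can settle in a local minimum, in which case re-run with a different seed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, (2, 3))   # input -> hidden weights (2 inputs, 3 hidden units)
b1 = np.zeros(3)
W2 = rng.uniform(-0.5, 0.5, (3, 1))   # hidden -> output weights
b2 = np.zeros(1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets
eta = 0.5

for epoch in range(20000):
    # Forward stage: calculate outputs given the input pattern
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Backward stage: calculate deltas, feeding the error back along the network
    delta_out = (T - output) * output * (1 - output)
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)

    # Gradient descent weight updates
    W2 += eta * hidden.T @ delta_out
    b2 += eta * delta_out.sum(axis=0)
    W1 += eta * X.T @ delta_hidden
    b1 += eta * delta_hidden.sum(axis=0)

print(np.round(output, 2))   # should be close to [0, 1, 1, 0]
```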

Page 49: Termination conditions for BP

The weight update loop may be iterated thousands of times in a typical application.

The choice of termination condition is important because:
Too few iterations can fail to reduce error sufficiently
Too many iterations can lead to overfitting the training data (see later!)

Termination criteria:
After a fixed number of iterations (epochs)
Once the training error falls below some threshold
Once the validation error meets some criterion
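
The third criterion is often implemented as early stopping. This is a schematic Python sketch only: train_one_epoch, error_on, get_weights and set_weights are hypothetical placeholders for whatever training code is in use, not real library calls.

```python
def train_with_early_stopping(network, train_set, validation_set,
                              train_one_epoch, error_on,
                              max_epochs=10000, patience=20):
    """Keep training while the validation error keeps improving.

    train_one_epoch(network, data) and error_on(network, data) are hypothetical
    callables; the network is assumed to expose get_weights()/set_weights()
    so that the best model seen so far can be snapshotted and restored.
    """
    best_error = float("inf")
    best_weights = network.get_weights()
    epochs_without_improvement = 0

    for _ in range(max_epochs):
        train_one_epoch(network, train_set)            # one pass over the training set
        val_error = error_on(network, validation_set)  # error on held-out data

        if val_error < best_error:                     # validation error still improving
            best_error = val_error
            best_weights = network.get_weights()
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience: # stop: no improvement for a while
                break

    network.set_weights(best_weights)                  # roll back to the best model seen
    return network
```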

Page 50: Why BP works in practice

A possible scenario:

Weights are initialized to values near zero.
Early gradient descent steps will represent a very smooth function (approximately linear). Why? The sigmoid function is almost linear when the total input (the weighted sum of inputs to a sigmoid unit) is near 0.
The weights gradually move close to the global minimum.
As weights grow in the later stages of learning, they represent highly non-linear network functions.
Gradient steps in this later stage move toward local minima in this region, which is acceptable.

Page 51: Representational power of MLPs

Every boolean function can be represented exactly by some network with two layers of units.
Note: the number of hidden units required may grow exponentially with the number of network inputs.

Continuous functions:
Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer (Cybenko 1989, Hornik 1989).
Any function can be approximated to arbitrary accuracy by a network with two hidden layers (Cybenko 1988).

Page 52: The hidden layer

Without this, the network probably won't perform well: no hidden layer means a linear model.

But it's another big question!
How many nodes do we need in the hidden layer?
http://vision.eng.shu.ac.uk/neural/FAQ/FAQ3.html#A_hu
And might it perform better with more than one hidden layer?
http://vision.eng.shu.ac.uk/neural/FAQ/FAQ3.html#A_hl
(answers to both: in general, we don't know – experiment and see how it works for your data)

See also http://www.heatonresearch.com/node/707

Page 53: Interpretation of hidden layers

What are the hidden layers actually doing?!
Essentially feature extraction.
The non-linearities in the feature extraction can make interpretation of the hidden layers very difficult.
This leads to neural networks being treated as black boxes.

Page 54: Overfitting

We're trying to form a model using some data, which we can then use to predict the result for other data.
If we train too much, we end up with a model which can near-perfectly describe the data it has seen, but can't generalise.
Such a model is said to have overtrained/overfitted.
This is a very important factor in machine learning!

Page 55: Overfitting example

We wanted the model to be able to predict the x points (the model would have been simpler).
But it overtrained on the o points (this is a more complicated model).
Clearly this is a very exaggerated example…

(figure: a plot of y against x; key: o = training set, x = test set; a desired trained curve and an overfitted curve are shown)

Page 56: Stopping the training

So we have to stop the ANN training after it has found a good model, but before it goes too far.
Since all datasets have different characteristics, we have to experiment to find out what is optimal.
This overfitting problem occurs in other AI methods too, and you must recognise it.

Page 57: Avoiding overfitting

In order to avoid overfitting, we need to make sure we validate our model: an overfitted model cannot generalise to unseen data.
Use your query (test) set to check: does the model still work?
If so, you can assume your model is OK and will work on further unseen data.
This assumes there is some more data, of course!
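
One simple way to keep such a query set is to hold back part of the data before training. A minimal sketch follows; the train() and error_on() calls mentioned in the comments are hypothetical placeholders for whatever model code is in use.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    # Shuffle and hold back a portion of the data as an unseen query (test) set
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_test = int(len(examples) * test_fraction)
    return examples[n_test:], examples[:n_test]   # (training set, test set)

# Usage sketch:
#   train_set, test_set = train_test_split(dataset)
#   model = train(train_set)
#   print(error_on(model, train_set), error_on(model, test_set))
# A much larger test error than training error is the signature of overfitting.
```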

Page 58: Advantages of ANNs

No programming (training instead)
Very fast in operation (parallel)
Can generalize over examples
Can tolerate noise
Degrades gracefully

Page 59: Disadvantages of ANNs

Need large training sets
Can take a long time to train
Both positive and negative examples needed
Supervised learning (desired results must be known)
Can be unpredictable on untrained data
Can't explain their results

Page 60: When to consider ANNs

Input is high-dimensional, discrete or real-valued (e.g. raw sensor data)
Output is discrete or real-valued
Possibly noisy data
Form of target function is unknown
Human readability of the result is unimportant

Page 62: More to try

Pythia
Evaluation copy, limited to 13 neurons
But good examples supplied
http://download.cnet.com/Pythia/3000-2651_4-192584.html

Sharky
Nice graphical displays, but more difficult to see what's happening with the data
http://sharktime.com/

Page 63: Other types of neural network

Multiple outputs
Skip layer networks
Recurrent neural networks
…

Page 64: Multiple outputs

• Used for N-way classification
• Each node in the output layer corresponds to a different class
• No guarantee that the sum of the output vector will equal 1

Page 65: Skip layer network

Input nodes are also connected directly to the output layer.

Page 66: Recurrent neural networks

Output or hidden layer information is stored in a context or memory layer.

(figure: input layer → hidden layer → output layer, with a context layer storing and feeding back layer activations)

Page 67: Time Delayed Recurrent Neural Networks

Outputs from time t are used as inputs to the hidden layer at time t+1, with an optional decay.

(figure: input layer → hidden layer → output layer, with the output fed back into the hidden layer at the next time step)

Page 68: Radial Basis Functions (RBFs)

Features:
One hidden layer
Several outputs
Radial units

(figure: inputs feeding a single layer of radial units, which feed the outputs)

Page 69: Radial Basis Functions (RBFs)

RBF hidden layer units have a receptive field which has a centre.
Generally, the hidden unit function is Gaussian.
The output layer is linear.

Realized function:
s(x) = Σ_{j=1…K} W_j Φ_j(x), with Gaussian units Φ_j(x) = exp( −‖x − c_j‖² / (2σ_j²) )

Page 70: Learning

The training is performed by deciding on:
How many hidden nodes there should be
The centres and the sharpness of the Gaussians

Two steps:
In the 1st stage, the input dataset is used to determine the parameters of the basis functions
In the 2nd stage, the basis functions are kept fixed while the second-layer weights are estimated (a simple backpropagation-style algorithm, as for MLPs)
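
A minimal NumPy sketch of the two-stage idea follows. It is an illustration, not the module's code: the centres are taken as a random sample of the inputs (a stand-in for whatever the 1st stage actually uses, such as k-means), and the output weights are fitted directly by least squares rather than by the iterative approach mentioned above.

```python
import numpy as np

def rbf_features(X, centres, sigma):
    # Gaussian radial units: phi_j(x) = exp(-||x - c_j||^2 / (2 * sigma^2))
    dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return np.exp(-(dists ** 2) / (2 * sigma ** 2))

def train_rbf(X, T, n_hidden=10, sigma=1.0, seed=0):
    # Stage 1: fix the basis functions (centres sampled from the training inputs)
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=n_hidden, replace=False)]
    # Stage 2: with the basis functions fixed, fit the linear output layer
    Phi = rbf_features(X, centres, sigma)
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return centres, W

def predict_rbf(X, centres, W, sigma=1.0):
    return rbf_features(X, centres, sigma) @ W

# Toy usage: fit a sine curve
X = np.linspace(0, 2 * np.pi, 50).reshape(-1, 1)
T = np.sin(X).ravel()
centres, W = train_rbf(X, T)
print(np.max(np.abs(predict_rbf(X, centres, W) - T)))   # should be small
```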

Page 71: MLPs versus RBFs

Classification:
MLPs separate classes via hyperplanes
RBFs separate classes via hyperspheres

Learning:
MLPs use distributed learning
RBFs use localized learning
RBFs train faster

Structure:
MLPs have one or more hidden layers
RBFs have only one hidden layer
RBFs require more hidden neurons

(figure: decision boundaries in (X1, X2) space – hyperplanes for an MLP, hyperspheres for an RBF)

Page 72: Kohonen ANNs = Self-organising maps (SOMs)

What is it? It's a neural network without any training outputs (unsupervised learning).
It doesn't need to be told how to form its model; it organises itself.
A bit like PCA in that respect.
http://www.cis.hut.fi/projects/somtoolbox/theory/somalgorithm.shtml

Page 73: Self organizing maps

The purpose of SOMs is to map a multidimensional input space onto a topology-preserving map of neurons.
Topology is preserved so that neighbouring neurons respond to "similar" input patterns.
The topological structure is often a 2- or 3-dimensional space.
Each neuron is assigned a weight vector with the same dimensionality as the input space.
Input patterns are compared to each weight vector and the closest wins (Euclidean distance).

Page 74:

The activation of the neuron is spread to its direct neighbourhood => neighbours become sensitive to the same input patterns.
The size of the neighbourhood is initially large but reduces over time => specialisation of the network.

(figure: block distance on the map grid, showing the first and second neighbourhoods around a winning neuron)

Page 75: Adaptation

During training, the "winner" neuron and its neighbourhood adapt to make their weight vectors more similar to the input pattern that caused the activation.
The neurons are moved closer to the input pattern.
The magnitude of the adaptation is controlled via a learning parameter which decays over time.

Page 76: The algorithm

1. Randomise each output node's weights
2. Choose a training input at random
3. See which output node matches it most closely (this is the BMU: Best Matching Unit)
4. Adjust the weightings of all neighbouring nodes (nodes within the radius of the BMU) to make them more like the training input – the closer to the BMU, the greater the change
5. Repeat from step 2
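
Put together as code, the loop might look like the following NumPy sketch (an illustration, not from the slides): the exponential neighbourhood weighting and the linear decay schedules are assumptions, and the radius is clamped just above zero rather than allowed to reach it.

```python
import numpy as np

def train_som(data, grid_shape=(10, 10), epochs=2000, lr0=0.5, radius0=5.0, seed=0):
    """Minimal SOM training loop following the numbered steps above."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    dim = data.shape[1]
    # 1. Randomise each output node's weights
    weights = rng.random((rows, cols, dim))
    # Grid coordinates of every node, used for block (city-block) distances on the map
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

    for t in range(epochs):
        # Learning rate and neighbourhood radius both decay over time
        frac = t / epochs
        lr = lr0 * (1 - frac)
        radius = max(radius0 * (1 - frac), 0.5)

        # 2. Choose a training input at random
        x = data[rng.integers(len(data))]

        # 3. Find the Best Matching Unit (smallest Euclidean distance to x)
        dists = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)

        # 4. Adjust all nodes within the radius of the BMU; closer nodes change more
        grid_dist = np.abs(coords - np.array(bmu)).sum(axis=2)
        influence = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
        influence[grid_dist > radius] = 0.0
        weights += lr * influence[..., None] * (x - weights)
        # 5. Repeat from step 2
    return weights

# Usage sketch: map random RGB colours onto a 2-D grid
colours = np.random.default_rng(1).random((500, 3))
som = train_som(colours)
print(som.shape)   # (10, 10, 3): each node has learned a colour
```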

Page 77: The radius

This defines which nodes are to be thought of as near to the winning one.
Initially (usually) it will encompass all the nodes in the map, and it decreases towards 0 as training proceeds.

Page 78: Testing

We can then present to the network a new, unseen input.
The node which it falls closest to will be deemed to be its classification.

Page 79: In action…

http://fbim.fh-regensburg.de/~saj39122/begrolu/kohonen.html
A good applet demonstration of a Kohonen net training.

http://www.generation5.org/jdk/demos.asp
(scroll down): an applet running a Kohonen net on colours. Note how it's clear that the radius decreases over time.

Page 80: Further reading

Follow up all the links in the slides.
Read Russell & Norvig, chapter 20 (especially the neural network section), but don't worry too much about the maths at this stage.

Useful resources:
http://www.ai-junkie.com/ann/som/som1.html
A good tutorial – enough information to write your own Kohonen net!