
Neural Networks

26 April 2001

Renzo Davoli

Sistemi Complessi Adattivi

What is a neural network (NN)?

• According to the DARPA Neural Network Study (1988, AFCEA International Press, p. 60):
– ... a neural network is a system composed of many simple processing elements operating in parallel whose function is determined by network structure, connection strengths, and the processing performed at computing elements or nodes.

What is a neural network (NN)?

• According to Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, NY: Macmillan, p. 2:
– A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
• 1. Knowledge is acquired by the network through a learning process.
• 2. Inter-neuron connection strengths known as synaptic weights are used to store the knowledge.

What is a neural network (NN)?

• According to Nigrin, A. (1993), Neural Networks for Pattern Recognition, Cambridge, MA: The MIT Press, p. 11:
– A neural network is a circuit composed of a very large number of simple processing elements that are neurally based. Each element operates only on local information. Furthermore each element operates asynchronously; thus there is no overall system clock.

What is a neural network (NN)?

• According to Zurada, J.M. (1992), Introduction To Artificial Neural Systems, Boston: PWS Publishing Company, p. xv:
– Artificial neural systems, or neural networks, are physical cellular systems which can acquire, store, and utilize experiential knowledge.

The von Neumann machine and the symbolic paradigm

• The machine must be told in advance, and in great detail, the exact series of steps required to perform the algorithm. This series of steps is the computer program.

• The type of data it deals with has to be in a precise format - noisy data confuses the machine.

• The hardware is easily degraded - destroy a few key memory locations and the machine will stop functioning or `crash'.

• There is a clear correspondence between the semantic objects being dealt with (numbers, words, database entries etc) and the machine hardware. Each object can be `pointed to' in a block of computer memory.


Real Neurons

• Signals are transmitted between neurons by electrical pulses (action potentials or `spike' trains) travelling along the axon. These pulses impinge on the synapses.

• These are found principally on a set of branching processes emerging from the cell body (soma) known as dendrites.

• Each pulse occurring at a synapse initiates the release of a small amount of chemical substance or neurotransmitter which travels across the synaptic cleft and which is then received at post-synaptic receptor sites on the dendritic side of the synapse. The neurotransmitter becomes bound to molecular sites here which, in turn, initiates a change in the dendritic membrane potential. This post-synaptic-potential (PSP) change may serve to increase (hyperpolarise) or decrease (depolarise) the polarisation of the post-synaptic membrane.

• In the former case, the PSP tends to inhibit generation of pulses in the afferent neuron, while in the latter, it tends to excite the generation of pulses. The size and type of PSP produced will depend on factors such as the geometry of the synapse and the type of neurotransmitter. Each PSP will travel along its dendrite and spread over the soma, eventually reaching the base of the axon (axon-hillock). The afferent neuron sums or integrates the effects of thousands of such PSPs over its dendritic tree and over time. If the integrated potential at the axon-hillock exceeds a threshold, the cell `fires' and generates an action potential or spike which starts to travel along its axon.

• This then initiates the whole sequence of events again in neurons contained in the efferent pathway.

Artificial neurons: the Threshold Logic Unit (TLU)
[McCulloch and Pitts, 1943]

We suppose there are n inputs with signals x_1, ..., x_n and weights w_1, ..., w_n.

The signals take on the values `1' or `0' only. That is, the signals are Boolean valued.

The activation a is given by

a = Σ_{i=1..n} w_i x_i

the output y is then given by thresholding the activation:

y = 1 if a ≥ θ, y = 0 if a < θ
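As an illustration, a minimal sketch of a TLU in Python (the AND weights and threshold below are our choice, not the slide's):

    # A Threshold Logic Unit: fire (output 1) iff the weighted sum of the
    # Boolean inputs reaches the threshold theta.
    def tlu(x, w, theta):
        a = sum(wi * xi for wi, xi in zip(w, x))   # activation a = sum_i w_i x_i
        return 1 if a >= theta else 0

    # A TLU with weights (1, 1) and threshold 1.5 computes logical AND:
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, tlu(x, (1, 1), 1.5))              # -> 0, 0, 0, 1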

Theorem of TLU

• A single TLU can separate its input space only with a hyperplane: it can classify only linearly separable sets of data (see below).

Non-binary signal communication

It is generally accepted that, in real neurons, information is encoded in terms of the frequency of firing rather than merely the presence or absence of a pulse. There are two ways we can represent this in our artificial neurons.

– First, we may extend the signal range to be positive real numbers.

– We may emulate the real neuron and encode a signal as the frequency of the occurrence of a `1' in a pulse stream.


Sigmoid output function

• Encoding frequencies (so managing real numbers instead of binary data) works fine at the input straight away, but the use of a step function limits the output signals to be binary. This may be overcome by `softening' the step function to a continuous `squashing' function like the sigmoid:

y = 1 / (1 + e^{-(a - θ)/ρ})

(with threshold θ)

ρ determines the shape of the sigmoid
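For example, a small sketch of this squashing function (the parameter values are chosen purely for illustration):

    import math

    # Sigmoid with threshold theta and shape parameter rho.
    def sigmoid(a, theta=0.0, rho=1.0):
        return 1.0 / (1.0 + math.exp(-(a - theta) / rho))

    # Small rho -> nearly a hard step; large rho -> a gentle slope.
    for rho in (0.1, 1.0, 5.0):
        print(rho, [round(sigmoid(a, rho=rho), 3) for a in (-2, 0, 2)])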

What can you do with an NN and what not? (1)

• In principle, NNs can compute any computable function, i.e., they can do everything a normal digital computer can do. (Valiant, 1988; Siegelmann and Sontag, 1999; Orponen, 2000; Sima and Orponen, 2001)

What can you do with an NN and what not? (2)

• Clearly the style of processing is completely different from von Neumann machines - it is more akin to signal processing than symbol processing. The combining of signals and producing new ones is to be contrasted with the execution of instructions stored in a memory.

• Information is stored in a set of weights rather than a program. The weights are supposed to adapt when the net is shown examples from a training set.

• Nets are robust in the presence of noise: small changes in an input signal will not drastically affect a node's output.

• Nets are robust in the presence of hardware failure: a change in a weight may only affect the output for a few of the possible input patterns.

• High level concepts will be represented as a pattern of activity across many nodes rather than as the contents of a small portion of computer memory.

• The net can deal with `unseen' patterns and generalise from the training set.

• Nets are good at `perceptual' tasks and associative recall. These are just the tasks that the symbolic approach has difficulties with.

What can you do with an NN and what not? (3)

• There are important problems that are so difficult that a neural network will be unable to learn them without memorizing the entire training set, such as:
– Predicting random or pseudo-random numbers.
– Factoring large integers.
– Determining whether a large integer is prime or composite.
– Decrypting anything encrypted by a good algorithm.

• And it is important to understand that there are no methods for training NNs that can magically create information that is not contained in the training data.

Categories of NN: Learning

• The two main kinds of learning algorithms are supervised and unsupervised.

– In supervised learning, the correct results (target values, desired outputs) are known and are given to the NN during training so that the NN can adjust its weights to try to match its outputs to the target values. After training, the NN is tested by giving it only input values, not target values, and seeing how close it comes to outputting the correct target values.

– In unsupervised learning, the NN is not provided with the correct results during training. Unsupervised NNs usually perform some kind of data compression, such as dimensionality reduction or clustering. See "What does unsupervised learning learn?"

Categories of NN: Topology

• Two major kinds of network topology are feedforward and feedback.

– In a feedforward NN, the connections between units do not form cycles. Feedforward NNs usually produce a response to an input quickly. Most feedforward NNs can be trained using a wide variety of efficient conventional numerical methods (e.g. conjugate gradients) in addition to algorithms invented by NN researchers.

– In a feedback or recurrent NN, there are cycles in the connections. In some feedback NNs, each time an input is presented, the NN must iterate for a potentially long time before it produces a response. Feedback NNs are usually more difficult to train than feedforward NNs.


Categories of NN: Accepted Data

• Two major kinds of data are categorical and quantitative.

– Categorical variables take only a finite (technically, countable) number of possible values, and there are usually several or more cases falling into each category. Categorical variables may have symbolic values (e.g., "male" and "female", or "red", "green" and "blue") that must be encoded into numbers before being given to the network. Both supervised learning with categorical target values and unsupervised learning with categorical outputs are called "classification."

– Quantitative variables are numerical measurements of some attribute, such as length in meters. The measurements must be made in such a way that at least some arithmetic relations among the measurements reflect analogous relations among the attributes of the objects that are measured. Supervised learning with quantitative target values is called "regression."

Types of NN: 1 supervised learning

• Feedforward
– Linear
• Hebbian - Hebb (1949), Fausett (1994)
• Perceptron - Rosenblatt (1958), Minsky and Papert (1969/1988), Fausett (1994)
• Adaline - Widrow and Hoff (1960), Fausett (1994)
• Higher Order - Bishop (1995)
• Functional Link - Pao (1989)
– MLP: Multilayer perceptron - Bishop (1995), Reed and Marks (1999), Fausett (1994)
• Backprop - Rumelhart, Hinton, and Williams (1986)
• Cascade Correlation - Fahlman and Lebiere (1990), Fausett (1994)
• Quickprop - Fahlman (1989)
• RPROP - Riedmiller and Braun (1993)
– RBF networks - Bishop (1995), Moody and Darken (1989), Orr (1996)
• OLS: Orthogonal Least Squares - Chen, Cowan and Grant (1991)
• CMAC: Cerebellar Model Articulation Controller - Albus (1975), Brown and Harris (1994)
– Classification only
• LVQ: Learning Vector Quantization - Kohonen (1988), Fausett (1994)
• PNN: Probabilistic Neural Network - Specht (1990), Masters (1993), Hand (1982), Fausett (1994)
– Regression only
• GRNN: General Regression Neural Network - Specht (1991), Nadaraya (1964), Watson (1964)

• Feedback - Hertz, Krogh, and Palmer (1991), Medsker and Jain (2000)
– BAM: Bidirectional Associative Memory - Kosko (1992), Fausett (1994)
– Boltzmann Machine - Ackley et al. (1985), Fausett (1994)
– Recurrent time series
• Backpropagation through time - Werbos (1990)
• Elman - Elman (1990)
• FIR: Finite Impulse Response - Wan (1990)
• Jordan - Jordan (1986)
• Real-time recurrent network - Williams and Zipser (1989)
• Recurrent backpropagation - Pineda (1989), Fausett (1994)
• TDNN: Time Delay NN - Lang, Waibel and Hinton (1990)

• Competitive
– ARTMAP - Carpenter, Grossberg and Reynolds (1991)
– Fuzzy ARTMAP - Carpenter, Grossberg, Markuzon, Reynolds and Rosen (1992), Kasuba (1993)
– Gaussian ARTMAP - Williamson (1995)
– Counterpropagation - Hecht-Nielsen (1987; 1988; 1990), Fausett (1994)
– Neocognitron - Fukushima, Miyake, and Ito (1983), Fukushima (1988), Fausett (1994)

Types of NN: 2 unsupervised learning

• Competitive
– Vector Quantization
• Grossberg - Grossberg (1976)
• Kohonen - Kohonen (1984)
• Conscience - Desieno (1988)
– Self-Organizing Map
• Kohonen - Kohonen (1995), Fausett (1994)
• GTM - Bishop, Svensen and Williams (1997)
• Local Linear - Mulier and Cherkassky (1995)
– Adaptive resonance theory
• ART 1 - Carpenter and Grossberg (1987a), Moore (1988), Fausett (1994)
• ART 2 - Carpenter and Grossberg (1987b), Fausett (1994)
• ART 2-A - Carpenter, Grossberg and Rosen (1991a)
• ART 3 - Carpenter and Grossberg (1990)
• Fuzzy ART - Carpenter, Grossberg and Rosen (1991b)
• DCL: Differential Competitive Learning - Kosko (1992)

• Dimension Reduction - Diamantaras and Kung (1996)
– Hebbian - Hebb (1949), Fausett (1994)
– Oja - Oja (1989)
– Sanger - Sanger (1989)
– Differential Hebbian - Kosko (1992)

• Autoassociation
– Linear autoassociator - Anderson et al. (1977), Fausett (1994)
– BSB: Brain State in a Box - Anderson et al. (1977), Fausett (1994)
– Hopfield - Hopfield (1982), Fausett (1994)

Training TLUs
(first simple example of supervised learning)

• The training set for the TLU will consist of a set of pairs {v, t}, where v is an input vector and t is the target class or output (`1' or `0') that v belongs to (i.e. the expected output).

• The learning rule (or training rule) is:

w_i ← w_i + α (t − y) v_i

• The parameter α is called the learning rate.

• This is named "the Perceptron learning rule"

Training TLUs
(first simple example: training algorithm)

Training TLUs: numerical example
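As a sketch of how the rule might be applied in practice, here is the training loop on a small hypothetical data set (the AND function; α = 0.25 and the data are our choices):

    # Perceptron training of a TLU; the threshold learns like a bias term.
    def train_tlu(samples, n, alpha=0.25, epochs=20):
        w, theta = [0.0] * n, 0.0
        for _ in range(epochs):
            for v, t in samples:
                y = 1 if sum(wi * vi for wi, vi in zip(w, v)) >= theta else 0
                # Perceptron rule: w_i <- w_i + alpha (t - y) v_i
                w = [wi + alpha * (t - y) * vi for wi, vi in zip(w, v)]
                theta -= alpha * (t - y)
        return w, theta

    samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    print(train_tlu(samples, n=2))   # weights and threshold realising AND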


Training TLUs: convergence theorem

• If the training set is linearly separable, the perceptron learning rule converges: after a finite number of weight updates the TLU classifies every training pair correctly.

note: read d(x) as the target

Perceptron (Rosenblatt, 1959)

Adaline

• Adaline is a perceptron-like network.

• In a simple physical implementation this device consists of a set of controllable resistors connected to a circuit which can sum up currents caused by the input voltage signals.

• An adaline is an array of such computing elements.

Adaline: the delta rule

• The Adaline learning rule is a refinement of the Perceptron rule.

• The Least Mean Square (LMS) procedure finds the values of all the weights that minimize the error function by a method called gradient descent.

Adaline: the delta rule

• the total error E is defined to be

E = Σ_p E^p,  with  E^p = ½ (d^p − y^p)²

• The idea is to make a change in the weight proportional to the negative of the derivative of the error as measured on the current pattern with respect to each weight:

Δ_p w_j = −γ ∂E^p/∂w_j

• γ is the learning rate


Adaline: the delta rule (a bit of calculus...)

• Since the output is linear, y^p = Σ_j w_j v_j^p, we have ∂E^p/∂w_j = −(d^p − y^p) v_j^p, so the update becomes

Δ_p w_j = γ (d^p − y^p) v_j^p
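A minimal sketch of the LMS procedure for a single linear unit, on toy data of our choosing (the unit learns y = v1 + v2):

    # Delta (LMS) rule for one linear unit; gamma is the learning rate.
    def train_adaline(samples, n, gamma=0.05, epochs=100):
        w = [0.0] * n
        for _ in range(epochs):
            for v, d in samples:
                y = sum(wi * vi for wi, vi in zip(w, v))   # linear output
                # Delta rule: w_j <- w_j + gamma (d - y) v_j
                w = [wj + gamma * (d - y) * vj for wj, vj in zip(w, v)]
        return w

    samples = [((1, 0), 1), ((0, 1), 1), ((1, 1), 2), ((2, 1), 3)]
    print(train_adaline(samples, n=2))   # approaches [1.0, 1.0]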

Using TLUs and perceptrons as classifiers

• All perceptron-like networks are subject to the TLU theorem: they are able to classify only linearly separable sets of data.

The XOR problem

• XOR is not linearly separable!

Solution of the XOR problem

• A multilayer perceptron is able to solve the XOR problem.
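For instance, one hand-wired two-layer solution (the weights below are a standard hand-picked choice for illustration):

    # XOR with threshold units: XOR(x1, x2) = OR(x1, x2) AND NOT AND(x1, x2).
    step = lambda a: 1 if a >= 0 else 0

    def xor_mlp(x1, x2):
        h_or  = step(x1 + x2 - 0.5)       # hidden unit 1 computes OR
        h_and = step(x1 + x2 - 1.5)       # hidden unit 2 computes AND
        return step(h_or - h_and - 0.5)   # output: OR and not AND

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, xor_mlp(*x))             # -> 0, 1, 1, 0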

Theorem: Multi-layer perceptrons can do everything

• Each function f: {−1,1}^n -> {−1,1}^m can be computed by a multi-layer perceptron

• with a single hidden layer

• but the number of hidden nodes can be up to 2^n


The learning rule for multi-layer perceptrons: the generalized delta rule

generalized delta rule (calculus...)

• The weight change is Δ_p w_jk = γ δ_k^p y_j^p, where the error signal δ depends on the layer:

• output layer: δ_o^p = (d_o^p − y_o^p) F'(s_o^p)

• other layers: δ_h^p = F'(s_h^p) Σ_o δ_o^p w_ho

Generalized delta rule: the core of the backprop model

• The equations derived in the previous section may be mathematically correct, but what do they actually mean? Is there a way of understanding back-propagation other than reciting the necessary equations?

• The answer is, of course, yes. In fact, the whole back-propagation process is intuitively very clear. What happens in the above equations is the following. When a learning pattern is clamped, the activation values are propagated to the output units, and the actual network output is compared with the desired output values; we usually end up with an error in each of the output units. We know from the delta rule that, in order to reduce an error, we have to adapt its incoming weights according to Δw_ho = γ δ_o y_h.

• That's step one. But it alone is not enough: when we only apply this rule, the weights from input to hidden units are never changed, and we do not have the full representational power of the feed-forward network as promised by the universal approximation theorem. In order to adapt the weights from input to hidden units, we again want to apply the delta rule. In this case, however, we do not have a value of δ for the hidden units. This is solved by the chain rule, which does the following: distribute the error δ_o of an output unit o to all the hidden units that it is connected to, weighted by this connection.

Weight adjustments with sigmoid activation function

• With the sigmoid F(s) = 1/(1 + e^{−s}) the derivative is simply F'(s) = y (1 − y), so the output-layer error signal becomes δ_o = (d_o − y_o) y_o (1 − y_o).
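Putting the previous slides together, a compact sketch of back-propagation training on XOR (the network size, learning rate and epoch count are our choices, not the slides'):

    import math, random

    random.seed(0)
    f = lambda s: 1.0 / (1.0 + math.exp(-s))   # sigmoid, so F'(s) = y(1 - y)

    n_in, n_hid = 2, 3
    W1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
    W2 = [random.uniform(-1, 1) for _ in range(n_hid + 1)]   # last entry = bias

    def forward(x):
        xb = list(x) + [1.0]                   # inputs plus bias input
        h = [f(sum(w * v for w, v in zip(row, xb))) for row in W1]
        y = f(sum(w * v for w, v in zip(W2, h + [1.0])))
        return xb, h, y

    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    gamma = 0.5
    for _ in range(10000):
        for x, d in data:
            xb, h, y = forward(x)
            delta_o = (d - y) * y * (1 - y)    # output-layer delta
            # hidden deltas: F'(s_h) * delta_o * w_ho
            delta_h = [h[j] * (1 - h[j]) * delta_o * W2[j] for j in range(n_hid)]
            W2 = [w + gamma * delta_o * v for w, v in zip(W2, h + [1.0])]
            W1 = [[w + gamma * delta_h[j] * v for w, v in zip(W1[j], xb)]
                  for j in range(n_hid)]

    for x, d in data:
        print(x, round(forward(x)[2], 2))      # usually close to 0, 1, 1, 0

(As the next slide notes, gradient descent may occasionally get stuck in a local minimum, so a given random initialisation is not guaranteed to reach the solution.)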

Learning Rate and Momentum

Deficiencies of back-propagation

• Network paralysis. As the network trains, the weights can be adjusted to very large values. The total input of a hidden unit or output unit can therefore reach very high (either positive or negative) values, and because of the sigmoid activation function the unit will have an activation very close to zero or very close to one; the sigmoid derivative is then almost zero, so learning virtually stops.

• Local minima. The error surface of a complex network is full of hills and valleys. Because of the gradient descent, the network can get trapped in a local minimum when there is a much deeper minimum nearby.


Tuning BackProp: # of samples of the learning set

Tuning BackProp: # of hidden units

This slide has been intentionally left blank!

Associative memories

• `Remembering' something in common parlance usually consists of associating something with a sensory cue. For example, someone may say something, like the name of a celebrity, and we immediately recall a chain of events or some experience related to the celebrity - we may have seen them on TV recently for example. Or, we may see a picture of a place visited in our childhood and the image recalls memories of the time. The sense of smell (olfaction) is known to be especially evocative in this way.

• On a more mundane level, but still in the same category, we may be presented with a partially obliterated letter, or one seen through a window when it is raining (letter + noise) and go on to recognize the letter.


The nature of associative memory

A physical analogy with memory

The Hopfield Network

Hopfield Nets: convergence theorem (and correlation net-energy)

Teaching Hopfield nets to be associative memories
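A minimal sketch of such a memory, assuming the standard Hebbian outer-product storage rule and asynchronous threshold updates (the particular pattern and noise are our choices):

    import random
    random.seed(1)

    # Store patterns of +/-1 components: w_ij = sum_p x_i x_j, zero diagonal.
    def store(patterns, n):
        return [[0 if i == j else sum(p[i] * p[j] for p in patterns)
                 for j in range(n)] for i in range(n)]

    # Recall by repeatedly updating one randomly chosen unit at a time.
    def recall(W, x, steps=100):
        x, n = list(x), len(x)
        for _ in range(steps):
            i = random.randrange(n)
            s = sum(W[i][j] * x[j] for j in range(n))
            x[i] = 1 if s >= 0 else -1
        return x

    pattern = [1, 1, 1, -1, -1, -1]
    W = store([pattern], n=6)
    noisy = [-1, 1, 1, -1, -1, -1]      # cue: the pattern with one bit flipped
    print(recall(W, noisy))             # -> the stored pattern is recovered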


Hopfield nets: learning rule

Analogue Hopfield nets: a NN solution to TSP (travelling salesman problem)

Boltzmann Machines

This slide has been intentionally left blank!

Self-Organizing Networks


Competitive Learning

• Competitive Learning is an unsupervised learning procedure that divides input patterns into clusters.

Winner Selection 1: dot product

Competitive learning: geometrical meaning

Winner Selection 2: Euclidean distance

Example: Linear Vector Quantization (LVQ)
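A sketch of both winner-selection criteria together with the usual update that moves the winning weight vector toward the input (η = 0.1 and the prototypes are our choices):

    import math

    # Winner 1: the unit whose weight vector has the largest dot product with x.
    def winner_dot(ws, x):
        return max(range(len(ws)),
                   key=lambda k: sum(w * xi for w, xi in zip(ws[k], x)))

    # Winner 2: the unit whose weight vector is closest in Euclidean distance.
    def winner_euclid(ws, x):
        return min(range(len(ws)), key=lambda k: math.dist(ws[k], x))

    # Drag the winner's weight vector a fraction eta of the way toward x.
    def update(ws, x, k, eta=0.1):
        ws[k] = [w + eta * (xi - w) for w, xi in zip(ws[k], x)]

    ws = [[0.9, 0.1], [0.1, 0.9]]       # two cluster prototypes
    x = [0.8, 0.3]
    k = winner_euclid(ws, x)
    update(ws, x, k)
    print(k, ws[k])                     # prototype 0 wins and moves toward x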


LVQ2 strategy

Kohonen Networks

Kohonen network: learning rule

Kohonen network: example
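A minimal sketch of one Kohonen learning step on a one-dimensional map, assuming a Gaussian neighbourhood function (η, σ and the data are illustrative):

    import math

    # One SOM step: the winner and its map neighbours move toward the input,
    # with strength falling off with distance on the map grid.
    def som_step(ws, x, eta=0.5, sigma=1.0):
        win = min(range(len(ws)), key=lambda k: math.dist(ws[k], x))
        for k in range(len(ws)):
            h = math.exp(-((k - win) ** 2) / (2 * sigma ** 2))  # neighbourhood
            ws[k] = [w + eta * h * (xi - w) for w, xi in zip(ws[k], x)]
        return win

    ws = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]   # three units on a line
    print(som_step(ws, [1.0, 0.0]), ws)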

Credits

• Neural Network FAQ: ftp://ftp.sas.com/pub/neural/FAQ.html

• Dr. Leslie Smith's brief on-line introduction to NNs: http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html

• Kevin Gurney, An Introduction to Neural Networks, http://www.shef.ac.uk/psychology/gurney/notes/index.html

• Ben Kröse and Patrick van der Smagt, An Introduction to Neural Networks, ftp://ftp.wins.uva.nl/pub/computer-systems/aut-sys/reports/neuro-intro/neuro-intro.ps.gz