Neural Networks
Biological Neuron
[Figure: a biological neuron, showing the dendrites, cell body, axon, collaterals, and synapses, with the direction of signal flow from dendrites toward the synapses.]
Biological Neural Networks
[Figure: signal transmission at a synapse. An electrical signal travels along the axon to the synaptic gap, where vesicles at the presynaptic membrane release neurotransmitters; these cross the gap to the postsynaptic membrane of a dendrite, where the signal continues as an electrical signal.]
http://pharmacyebooks.com/2010/10/artifitial-neural-networks-hot-topic-pharmaceutical-research.html
[Figure: a biological network viewed as a black box, mapping input data in $(-\infty, +\infty)$ to output in $(-1, +1)$.]
The Perceptron
The perceptron was developed by Frank Rosenblatt in 1957. It is a simple feed-forward network that can solve (create a decision function for) linearly separable problems.
Inside the Perceptron
[Figure: inside the perceptron. Inputs $X_0, X_1, \ldots, X_{N-1}$ are multiplied by weights $w_0, w_1, \ldots, w_{N-1}$ and summed by the sigma-pi unit; a step function applied to the sum $S_j$ gives the perceptron output $O_j$.]

$$S_j = \sum_{i=0}^{N-1} w_i X_i \qquad O_j = \operatorname{step}(S_j)$$
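As a concrete illustration, here is a minimal C# sketch of this computation (the class and member names are illustrative, not part of the network code later in this deck):

// A minimal perceptron: weighted sum followed by a step function.
// Illustrative sketch only; names are hypothetical.
public class Perceptron
{
    public double[] w;                    // weights w_0 .. w_{N-1}

    public Perceptron(double[] weights) { w = weights; }

    // O_j = step(S_j), where S_j = sum_i w_i * X_i
    public int Output(double[] x)
    {
        double s = 0.0;
        for (int i = 0; i < w.Length; i++)
            s += w[i] * x[i];
        return s >= 0.0 ? 1 : 0;          // step function
    }
}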
When is a Problem Linearly Separable? RED vs BLUE

[Figure: two scatter plots of red and blue points. In the left panel a single straight line divides the two classes (linearly separable); in the right panel no straight line can divide them (not linearly separable).]

http://dynamicnotions.blogspot.com/2008/09/single-layer-perceptron.html
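A classic example with binary inputs: AND is linearly separable, since the line $x_1 + x_2 = 1.5$ puts the single true point $(1,1)$ on one side and the other three points on the other, so a perceptron with weights $(1,1)$ and threshold $1.5$ computes it. XOR is not: its true points $(0,1)$ and $(1,0)$ and its false points $(0,0)$ and $(1,1)$ cannot be divided by any single line, so no single perceptron can compute XOR.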
A Practical Application: Classification
Data from: UCI Machine Learning Repository - http://archive.ics.uci.edu/ml/
The Iris Data - This is one of the most famous datasets used to illustrate the classification problem. From four measurements of each flower (sepal length, sepal width, petal length, and petal width), the objective is to classify a sample of 150 irises into three species: versicolor, virginica, and setosa.
Source: R.A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7(2), 179-188 (1936)
Iris Data - 3 classes, 50 samples each

Iris characteristics: sepal length, sepal width, petal length, petal width
Class encodings: Iris-setosa = 0.0, Iris-versicolor = 0.5, Iris-virginica = 1.0

5.1 3.5 1.4 0.2 Iris-setosa
4.9 3.0 1.4 0.2 Iris-setosa
4.7 3.2 1.3 0.2 Iris-setosa
4.6 3.1 1.5 0.2 Iris-setosa
5.0 3.6 1.4 0.2 Iris-setosa
 :
7.0 3.2 4.7 1.4 Iris-versicolor
6.4 3.2 4.5 1.5 Iris-versicolor
6.9 3.1 4.9 1.5 Iris-versicolor
5.5 2.3 4.0 1.3 Iris-versicolor
6.5 2.8 4.6 1.5 Iris-versicolor
 :
6.3 3.3 6.0 2.5 Iris-virginica
5.8 2.7 5.1 1.9 Iris-virginica
7.1 3.0 5.9 2.1 Iris-virginica
6.3 2.9 5.6 1.8 Iris-virginica
6.5 3.0 5.8 2.2 Iris-virginica
 :

Trained 4-2-1 network specification (input, hidden, and output layer sizes; learning rate, error limit, max runs; # training sets; input-to-hidden weights; hidden-to-output weights):

4 2 1
0.28 0.01 10000
30
0.1835137273718 -1.52185484488147 1.06085392071769 -10.1057086709985 -1.53328697751333 4.0131689222145 -1.63759087701708 10.741961194748
-6.01331593454728 6.66056158141261
Training a 4-2-1 Network for the Iris Data
One fifth of the Iris data was selected uniformly, 10 samples per class, for a total of 30 training set pairs. The 4-2-1 network comprises a total of 10 weights: 8 between the input and hidden layers, and 2 between the hidden layer and the output.
The outputs for the three classes were set to 0, 0.5 and 1.0
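Since the network has a single output, a test sample can be assigned to whichever class target is nearest. A hedged C# sketch of this decoding (not part of the deck's code):

// Map the network's single output back to a class by nearest target.
// Targets follow the encoding above: 0.0, 0.5, 1.0. Illustrative sketch.
public static string DecodeIris(double netOutput)
{
    double[] targets = { 0.0, 0.5, 1.0 };
    string[] names = { "Iris-setosa", "Iris-versicolor", "Iris-virginica" };
    int best = 0;
    for (int c = 1; c < targets.Length; c++)
        if (Math.Abs(netOutput - targets[c]) < Math.Abs(netOutput - targets[best]))
            best = c;
    return names[best];
}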
Classifier Performance (columns are the true class, rows the assigned class; classes 1-3 correspond to setosa, versicolor, virginica)

Sample Count:
      1    2    3
1    50    0    0
2     0   46    1
3     0    4   49

Performance Fraction:
      1     2     3
1   1.00  0.00  0.00
2   0.00  0.92  0.02
3   0.00  0.08  0.98
A Demonstration
Typical Feed-Forward Neural Network

[Figure: input data in $(-\infty, +\infty)$ is presented to the input layer, passes through the hidden layer, and emerges from the output layer as output data in $(-1, +1)$.]
Inside an Artificial Neuron
[Figure: inside an artificial neuron. Outputs from the previous layer $O_0, O_1, \ldots, O_{N-1}$ are multiplied by weights $w_0, w_1, \ldots, w_{N-1}$ and summed by the sigma-pi unit; a sigmoid function applied to the sum $S_j$ gives the neuron output $O_j$, which is distributed to the next layer.]

$$S_j = \sum_{i=0}^{N-1} w_i O_i \qquad O_j = f(S_j)$$
1. Initialize the network with small random weights.
2. Present an input pattern to the input layer of the network.
3. Feed the input pattern forward through the network to calculate its activation value.
4. Take the difference between desired output and the activation value to calculate the network's activation error.
5. Adjust the weights feeding the output neurons to reduce the activation error for this input pattern.
6. Propagate an error value back to each hidden neuron that is proportional to its contribution to the network activation error.
7. Adjust the weights feeding each hidden neuron to reduce its contribution to the error for this input pattern.
8. Repeat steps 2 to 7 for each input pattern in the training set ensemble.
9. Repeat step 8 until the network is suitably trained (a code sketch of this loop follows below).
Backward Error Propagation
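Mapped onto the method names used in the implementation later in this deck, one training epoch might look like the following sketch (the loop structure and the stopping test, including the TotalEnsembleError helper, are assumptions):

// One pass over the training set (steps 2-7), repeated until trained (steps 8-9).
// Sketch only; the calc* methods are defined later in this deck.
public void Train()
{
    for (int run = 0; run < maxnumruns; run++)           // step 9, bounded
    {
        for (int p = 0; p < npairs; p++)                 // step 8
        {
            calcInputLayer(p);        // step 2: present an input pattern
            calcHiddenLayer();        // step 3: feed forward...
            calcOutputLayer();        //         ...to the output layer
            calcOutputError(p, run);  // steps 4-5: output error, adjust h-o weights
            calcHiddenError(p, run);  // steps 6-7: hidden error, adjust i-h weights
        }
        if (TotalEnsembleError() < error) break;         // suitably trained?
    }
}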
Implementing a Neural Network
[Figure: neural network dimensions - m input layer nodes, n hidden layer nodes, p output layer nodes; t input training sets, each with m values; t output training sets, each with p values; m x n weights from the input to the hidden layer; n x p weights from the hidden to the output layer.]
public static double learn = 0.28;       // learning rate
public static double error = 0.01;       // error limit (stopping criterion)
public static int npairs = 0;            // # training pairs (t)
public static int maxnumruns = 10000;    // max training runs
public static int numinput = 1;          // input layer nodes (m)
public static int numhidden = 1;         // hidden layer nodes (n)
public static int numoutput = 1;         // output layer nodes (p)
public static double[,] inTrain;         // input training sets, indexed [node, pair]
public static double[,] outTrain;        // output training sets, indexed [node, pair]
public static neuron[] iLayer;           // input layer
public static neuron[] hLayer;           // hidden layer
public static neuron[] oLayer;           // output layer
public static weight[,] ihWeight;        // m x n input-to-hidden weights
public static weight[,] hoWeight;        // n x p hidden-to-output weights
public static int pxerr;
public static double Scalerr;
public static bool showtoterr = true;
public class neuron
{
    public double input;    // weighted sum of incoming signals (activation)
    public double output;   // f(input)
    public double error;    // back-propagated error term (delta)

    public neuron()
    {
        input = 0.0;
        output = 0.0;
        error = 0.0;
    }
}

public class weight
{
    public double wt;       // current weight value
    public double delta;    // weight change; not used in the code shown

    public weight(double wght)
    {
        wt = wght;
        delta = 0.0;
    }
}
Neural Network Data Structure & Components
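Step 1 of the training algorithm calls for small random initial weights. A hedged initialization sketch consistent with these data structures (the (-0.5, 0.5) range is an assumption):

// Build the layers and set every weight to a small random value.
// Sketch only; sizes come from the static fields above.
public void InitNetwork()
{
    Random rnd = new Random();
    iLayer = new neuron[numinput];
    hLayer = new neuron[numhidden];
    oLayer = new neuron[numoutput];
    for (int i = 0; i < numinput; i++) iLayer[i] = new neuron();
    for (int h = 0; h < numhidden; h++) hLayer[h] = new neuron();
    for (int o = 0; o < numoutput; o++) oLayer[o] = new neuron();

    ihWeight = new weight[numinput, numhidden];
    for (int i = 0; i < numinput; i++)
        for (int h = 0; h < numhidden; h++)
            ihWeight[i, h] = new weight(rnd.NextDouble() - 0.5);

    hoWeight = new weight[numhidden, numoutput];
    for (int h = 0; h < numhidden; h++)
        for (int o = 0; o < numoutput; o++)
            hoWeight[h, o] = new weight(rnd.NextDouble() - 0.5);
}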
Generalized Delta Rule

$$\Delta_p w_{ji} = \eta\, \delta_{pj}\, i_{pi}$$

$\Delta_p w_{ji}$ - correction to weight value
$\eta$ - learning rate
$\delta_{pj}$ - error in $j$th unit
$i_{pi}$ - $p$th training set input
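A quick numerical check, using the deck's learning rate (the other values are made up for illustration): with $\eta = 0.28$, an error term $\delta_{pj} = 0.1$, and an input $i_{pi} = 0.5$,

$$\Delta_p w_{ji} = 0.28 \times 0.1 \times 0.5 = 0.014,$$

so that weight is nudged by 0.014 in the direction that reduces the error.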
Quantifying Error for Back Propagation
Error for the $j$th unit in the output layer:

$$\delta_{pj} = f'(S_{pj})\,(t_{pj} - o_{pj})$$

where $t_{pj}$ is the $p$th training set output and $o_{pj} = f(S_{pj})$ is the neuron output function for the $p$th presentation.

Error for the $j$th unit in the hidden layer:

$$\delta_{pj} = f'(S_{pj}) \sum_k \delta_{pk}\, w_{kj}$$

[Figure: the output-layer error terms $\delta_{pk}$ flow back through the weights $w_{kj}$ to hidden unit $j$.]
The Sigmoid Function

$$f(x) = \frac{2}{1 + e^{-x}} - 1 \qquad f'(x) = \frac{1 - f(x)^2}{2}$$

[Figure: plots of the sigmoid and the derivative of the sigmoid.]
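The implementation later in this deck uses only the second (logistic) sigmoid shown next; if the $(-1, +1)$ variant above were wanted instead, it might look like this sketch (the names f2/df2 are hypothetical):

// Sigmoid with range (-1, +1) and its derivative.
// Hypothetical variant, not part of the deck's implementation.
public double f2(double x)
{
    return 2.0 / (1.0 + Math.Exp(-x)) - 1.0;
}

public double df2(double x)
{
    double y = f2(x);
    return (1.0 - y * y) / 2.0;
}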
Another Sigmoid Function

$$f(x) = \frac{1}{1 + e^{-x}} \qquad f'(x) = f(x)\,\bigl(1 - f(x)\bigr)$$

[Figure: plots of this sigmoid and the derivative of the sigmoid.]
// Copy the pth training input into the input layer.
public void calcInputLayer(int p)
{
    for (int i = 0; i < iLayer.Length; i++)
    {
        iLayer[i].output = inTrain[i, p];
    }
}

// Weighted sum of input-layer outputs, squashed by the sigmoid.
public void calcHiddenLayer()
{
    for (int h = 0; h < hLayer.Length; h++)
    {
        hLayer[h].input = 0.0;
        for (int i = 0; i < iLayer.Length; i++)
            hLayer[h].input += ihWeight[i, h].wt * iLayer[i].output;
        hLayer[h].output = f(hLayer[h].input);
    }
}

// Weighted sum of hidden-layer outputs, squashed by the sigmoid.
public void calcOutputLayer()
{
    for (int o = 0; o < oLayer.Length; o++)
    {
        oLayer[o].input = 0.0;
        for (int h = 0; h < hLayer.Length; h++)
            oLayer[o].input += hoWeight[h, o].wt * hLayer[h].output;
        oLayer[o].output = f(oLayer[o].input);
    }
}
Running the Neural Network
// The logistic sigmoid and its derivative.
public double f(double x)
{
    return 1.0 / (1.0 + Math.Exp(-x));
}

public double df(double x)
{
    return f(x) * (1.0 - f(x));
}
Running the network is a feed-forward process. Input data is presented to the input layer. The activation (input) is computed for each node of the hidden layer and then used to compute the output of the hidden layer nodes. Finally, the activation of each output-layer node is computed and used to compute the output of the network.
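To classify a fresh input (rather than a training pattern), the input values can be loaded directly into the input layer before the same two feed-forward calls; a hedged sketch:

// Feed one input vector forward and return the network outputs.
// Sketch only; bypasses inTrain by writing the input layer directly.
public double[] Run(double[] x)
{
    for (int i = 0; i < iLayer.Length; i++)
        iLayer[i].output = x[i];
    calcHiddenLayer();
    calcOutputLayer();

    double[] y = new double[oLayer.Length];
    for (int o = 0; o < oLayer.Length; o++)
        y[o] = oLayer[o].output;
    return y;
}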
// Compute output-layer errors and adjust the hidden-to-output weights.
public void calcOutputError(int p, int r)
{
    for (int o = 0; o < oLayer.Length; o++)
        oLayer[o].error = df(oLayer[o].input) * (outTrain[o, p] - oLayer[o].output);

    for (int h = 0; h < hLayer.Length; h++)
        for (int o = 0; o < oLayer.Length; o++)
            hoWeight[h, o].wt += learn * oLayer[o].error * hLayer[h].output;
}

// Distribute the output error back to the hidden layer and adjust
// the input-to-hidden weights.
public void calcHiddenError(int p, int r)
{
    for (int h = 0; h < hLayer.Length; h++)
    {
        double err = 0.0;     // accumulate over all output nodes
        for (int o = 0; o < oLayer.Length; o++)
            err += oLayer[o].error * hoWeight[h, o].wt;
        hLayer[h].error = df(hLayer[h].input) * err;
    }

    for (int i = 0; i < iLayer.Length; i++)
        for (int h = 0; h < hLayer.Length; h++)
            ihWeight[i, h].wt += learn * hLayer[h].error * iLayer[i].output;
}
Training the Network
In backward error propagation, the difference between the actual output and the goal (or target) output provided in the training set is used to compute the error in the network. This error is then used to compute the delta (change) in weight values for the weights between the hidden layer and the output layer.
These new weight values are then used to distribute the output error to the hidden layer nodes. These node errors are, in turn, used to compute the changes in value for the weights between the input layer and the hidden layer of the network.
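The stopping test in step 9 of the algorithm needs a scalar measure of how well trained the network is; one common choice is the total squared error over the whole training set ensemble (this helper is an assumption, not shown in the deck, though the deck's application displays this quantity during training):

// Total squared error summed over every training pair and output node.
// Assumed helper; sketch only.
public double TotalEnsembleError()
{
    double total = 0.0;
    for (int p = 0; p < npairs; p++)
    {
        calcInputLayer(p);
        calcHiddenLayer();
        calcOutputLayer();
        for (int o = 0; o < oLayer.Length; o++)
        {
            double e = outTrain[o, p] - oLayer[o].output;
            total += e * e;
        }
    }
    return total;
}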
1. Set the number of neurons in each layer.
2. Select the learning rate, error limit and max training runs.
3. Give the number of training pairs and include them in the left-hand text window, with input/output pairs listed sequentially:

input 1
output 1
input 2
output 2
 :
input n
output n
The total training set ensemble error is displayed during the training process. The training rate depends on the initial values of the random weights.

The user can monitor the rate of error correction in each weight during training via the weight's color (large delta vs. small delta). Small or zero changes in each weight do not necessarily mean that the network is trained; training could be hung in a local minimum.

When running the network, place input values in the text window and click run; the answer(s) appear on the next line(s).
How Many Nodes?
The number of input layer nodes matches the number of input values. The number of output layer nodes matches the number of output values.
But what about the hidden Layer?
Too few hidden layer nodes and the NN can't learn the patterns.
Too many hidden layer nodes and the NN doesn't generalize.
When Should We Use Neural Networks?
Neural Networks need lots of data (example solutions) for training.
The functional relationships of the problem/solution are not well understood.
The problem/solution is not applicable to a rule-based solution.
"Similar input data sets generate "similar" outputs.
Neural Networks perform general Pattern Recognition.
Neural Networks are particularly good as Decision Support tools.
Also good for modeling behavior of living systems.
Can a Neural Network do More than a Digital Computer?
Clearly a simulation of a Neural Network running on a digital computer cannot be more powerful than the computer on which it is being executed.
The question is, "Can a computational system such as a Neural Network be built that can do something that a digital computer cannot?"
A digital computer is the physical embodiment of a Turing Machine, which is defined as a universal computer of all computable functions.
An artificial Neural Network is loosely modeled on the human brain.
Rather than using a software simulation of neurons, we can build electronic circuits that closely mimic the activities of human brain cells.
Can we build a physical system of any kind (based on electronics, chemistry, etc...) that does everything a human brain can do?
Can you think of something human brains do that, so far, has not been accomplished or, at least, approximated by a computer or any other physical (man-made) system?
Consciousness
What is the Computational Power of Consciousness?
Since we can't quantify consciousness, it is not likely that we can determine the level of computational power necessary to manifest it.
However, we can establish a relative measure of computational power for systems that do and (so far) do not exhibit consciousness.
Human Mind/Brain
Turing Machine
Digital Computer
Neural Network
Physical System/Model
Relative Computational Power

[Figure: a diagram relating the Mind/Brain, Turing Machine, Physical Model, Digital Computer, and Neural Network, with the relationships between them labeled: the Revised Turing Test, Dualism vs. Materialism, Finite Storage and Finite Precision, Engineering and Technology, and Symbolism vs. Connectionism.]
Due to limitations of finite storage and the related issue of finite precision arithmetic, a Turing Machine can exhibit greater computational power than a digital computer.
Relative Computational Power

[Figure: the same diagram, now with a ">" marking the Turing Machine as computationally more powerful than the digital computer, per the argument above.]