Artificial Neural Networks

Transcript of Ann

Page 1: Ann

Artificial Neural Networks


Page 2: Ann

What is a Neural Network?

Definition: An artificial neural network is a computer program that can recognize patterns in a given collection of data and produce a model for that data. It resembles the brain in two respects:

o Knowledge is acquired by the network through a learning process (trial and error).

o Inter-neuron connection strengths, known as synaptic weights, are used to store the knowledge.


Page 3: Ann

Demonstration

Demonstration of a neural network used within an optical character recognition (OCR) application.


Page 4: Ann

Neural Network Structure

Artificial neuron

“Network” refers to the interconnections between the neurons in the different layers of each system.

The most basic system has three layers. The first layer has input neurons, which send data via synapses to the second layer of neurons, and then via more synapses to the third layer of output neurons.

The synapses store parameters called "weights", which are used to manipulate the data in the calculations.

[Figure: artificial neuron. Inputs weighted by w1, w2, …, wn are summed (Σ) and passed through an activation function F(net) to produce the output.]
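As a minimal sketch of this computation (the function name, example values, and the tanh choice of F are illustrative, not from the slide):

```python
import numpy as np

def neuron(x, w, F=np.tanh):
    # net = w1*x1 + w2*x2 + ... + wn*xn (the weighted sum in the figure)
    net = np.dot(w, x)
    # the activation function F maps net to the neuron's output
    return F(net)

x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.1, 0.4, -0.3])   # synaptic weights w1..wn
print(neuron(x, w))              # single output value
```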


Page 5: Ann

The network function f(x) is a composition of other functions gi(x), which can themselves be further defined as compositions of other functions.

This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables.
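The slides do not give a concrete form for this composition; one widely used form (an assumption here, not stated above) is the nonlinear weighted sum, written in LaTeX as:

```latex
% f is a nonlinearity K applied to a weighted sum of the g_i,
% each of which may itself be such a composition.
f(x) = K\left( \sum_{i} w_i \, g_i(x) \right)
```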


Page 6: Ann

This figure depicts a decomposition of “f”, with dependencies between variables indicated by arrows. These can be interpreted in two ways.

Functional view: the input is transformed into a 3-dimensional vector “h”, which is then transformed into a 2-dimensional vector “g”, which is finally transformed into “f”.

Probabilistic view: the random variable F=f(G) depends upon the random variable G=g(H) , which depends upon H=h(X), which depends upon the random variable X. This view is most commonly encountered in the context of graphical models.


Page 7: Ann

Why do we use Neural Networks?

Ability to represent both linear and non-linear relationships

Ability to learn these relationships directly from the data being modeled.

Although modern computing is truly advanced, there are certain tasks that a program written for a common microprocessor is unable to perform.

There are different architectures, which consequently require different types of algorithms, but despite being an apparently complex system, a neural network is relatively simple.


Page 8: Ann

Advantages of ANN

A neural network can perform tasks that a linear program can not.

When an element of the neural network fails, the network can continue without any problem because of its parallel nature.

A neural network learns and does not need to be reprogrammed.

It can be implemented in a wide range of applications.


Page 9: Ann

Disadvantages of ANN

The neural network needs training to operate.

The architecture of a neural network is different from the architecture of microprocessors and therefore needs to be emulated.

Requires high processing time for large neural networks.


Page 10: Ann

How do Neural Networks Work?


Train the Network
o Present the data to the network.
o Network computes an output.
o Network output compared to desired output.
o Network weights are modified to reduce error.

Use the Network
o Present new data to the network.
o Network computes an output based on its training.
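A hedged sketch of this train-then-use cycle, using a single sigmoid neuron and made-up data (all names, the logical-OR task, and the learning rate are assumptions for illustration):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Illustrative data: inputs X and desired outputs d (logical OR)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 1, 1, 1], dtype=float)

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # weights
theta = 0.0              # bias
lr = 0.5                 # learning rate

# Train the network
for epoch in range(1000):
    for x, target in zip(X, d):
        y = sigmoid(np.dot(w, x) + theta)   # network computes an output
        error = target - y                  # compare to desired output
        grad = error * y * (1 - y)          # gradient of squared error
        w += lr * grad * x                  # modify weights to reduce error
        theta += lr * grad

# Use the network: present new data and compute outputs
print(sigmoid(X @ w + theta).round(2))      # approaches [0, 1, 1, 1]
```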


Page 11: Ann

Mathematical Model

A set of major aspects of a parallel distributed model can be distinguished:

o a set of processing units ('neurons', 'cells');
o a state of activation yk for every unit, equivalent to the output;
o connections between the units;
o a propagation rule;
o an activation function Φk;
o an external input (aka bias, offset) θk for each unit;
o a method for information gathering (the learning rule);
o an environment within which the system must operate.


Page 12: Ann

Contd…

Connections between units

o Assume that each unit j provides an additive contribution to the input of the unit k to which it is connected.

o The total input to unit k is simply the weighted sum of the separate outputs from each of the connected units plus a bias or offset term θk.

o A positive wjk is considered excitation and a negative wjk inhibition.

o Units with this propagation rule are called sigma units:

sk(t) = Σj wjk(t) yj(t) + θk
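A minimal sketch of this propagation rule (the array values are illustrative):

```python
import numpy as np

def total_input(y, w_k, theta_k):
    # sk = sum over j of wjk * yj, plus the bias term theta_k
    return np.dot(w_k, y) + theta_k

y = np.array([1.0, 0.5, -0.2])    # outputs yj of the connected units
w_k = np.array([0.8, -0.4, 0.1])  # weights wjk into unit k
print(total_input(y, w_k, theta_k=0.3))   # -> 0.88
```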


Page 13: Ann

Contd…

This summing up of the inputs is referred to as a linear combination.

Finally, an activation function controls the amplitude of the output of the neuron.

An acceptable range of output is usually between 0 and 1, or -1 and 1.

The output of the neuron, yk, would therefore be the outcome of some activation function applied to the value of vk, the linear combination of the inputs.


Page 14: Ann

Contd…

Mathematically, this process is described in the figure on this slide.


Page 15: Ann

Activation Functions

There are three types of activation functions, denoted by Φ(·):

Threshold function: takes on a value of 0 if the summed input is less than a certain threshold value v, and the value 1 if the summed input is greater than or equal to the threshold value.


Page 16: Ann

Contd…

Piecewise-linear function: this function again can take on the values of 0 or 1, but it can also take on values in between, depending on the amplification factor in a certain region of linear operation.


Page 17: Ann

Contd…

Sigmoid function: This function can range between 0 and 1, but it is also sometimes useful to use the -1 to 1 range. An example of the sigmoid function is the hyperbolic tangent function.
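The three activation functions might be sketched as follows (the threshold value, amplification factor, and sample points are assumptions for illustration):

```python
import numpy as np

def threshold(v, theta=0.0):
    # 0 below the threshold value, 1 at or above it
    return np.where(v >= theta, 1.0, 0.0)

def piecewise_linear(v, a=1.0):
    # linear (slope a) in the middle region, clipped to 0 and 1 outside
    return np.clip(a * v + 0.5, 0.0, 1.0)

def sigmoid(v):
    # smooth curve ranging between 0 and 1
    return 1.0 / (1.0 + np.exp(-v))

v = np.linspace(-2.0, 2.0, 5)
print(threshold(v))
print(piecewise_linear(v))
print(sigmoid(v))
print(np.tanh(v))   # hyperbolic tangent: the -1 to 1 variant
```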


Page 18: Ann

Training of Artificial Neural Networks

o A neural network has to be configured such that the application of a set of inputs produces (either ‘directly’ or via a relaxation process) the desired set of outputs.

o One way is to set the weights explicitly, using a priori knowledge.

o The other way is to ‘train’ the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule.


Page 19: Ann

Paradigms of Learning

Supervised learning or Associative learning

In supervised learning, we are given a set of example pairs (x, y), with x ∈ X and y ∈ Y, and the aim is to find a function f : X → Y in the allowed class of functions that matches the examples.

We wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data and it implicitly contains prior knowledge about the problem domain.


Page 20: Ann

A commonly used cost is the mean-squared error which tries to minimize the average squared error between the network's output, f(x), and the target value “y” over all the example pairs.

When one tries to minimize this cost using gradient descent for the class of neural networks called Multi-Layer Perceptrons, one obtains the common and well-known backpropagation algorithm for training neural networks.
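As a hedged illustration of both ideas, here is a tiny multi-layer perceptron trained by gradient descent on the mean-squared error (the network size, XOR data, learning rate, and iteration count are all assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP: 2 inputs -> 3 hidden units (tanh) -> 1 linear output
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

# XOR data as an illustrative training set
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

lr = 0.5
for step in range(5000):
    # Forward pass
    H = np.tanh(X @ W1.T + b1)     # hidden activations
    F = H @ W2.T + b2              # network output f(x)
    E = F - Y
    cost = np.mean(E ** 2)         # mean-squared error

    # Backpropagation: gradients of the cost w.r.t. each parameter
    dF = 2 * E / E.size
    dW2, db2 = dF.T @ H, dF.sum(axis=0)
    dH = (dF @ W2) * (1 - H ** 2)  # tanh'(v) = 1 - tanh(v)**2
    dW1, db1 = dH.T @ X, dH.sum(axis=0)

    # Gradient-descent step
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(round(cost, 4), F.round(2).ravel())   # cost typically near 0
```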


Page 21: Ann

Contd…

Unsupervised learning or Self-organization

o An (output) unit is trained to respond to clusters of patterns within the input.

o In this paradigm the system is supposed to discover statistically salient features of the input population.

o Unlike the supervised learning paradigm, there is no a priori set of categories into which the patterns are to be classified; rather, the system must develop its own representation of the input stimuli.


Page 22: Ann

In unsupervised learning we are given some data “x” and a cost function to be minimized, which can be any function of the data “x” and the network's output “f”.

The cost function is dependent on the task (what we are trying to model) and our a priori assumptions (the implicit properties of our model, its parameters and the observed variables).


Page 23: Ann

Modifying patterns of connectivity

Hebbian learning rule

o Suggested by Hebb in his classic book Organization of Behaviour (Hebb, 1949).

o The basic idea is that if two units j and k are active simultaneously, their interconnection must be strengthened. If unit k receives input from unit j, the simplest version of Hebbian learning prescribes modifying the weight wjk with:

Δwjk = γ yj yk

where γ is a positive constant of proportionality representing the learning rate.
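A one-line sketch of this update rule (the example numbers are illustrative):

```python
def hebbian_update(w_jk, y_j, y_k, gamma=0.1):
    # delta wjk = gamma * yj * yk: strengthen the connection
    # when units j and k are active simultaneously
    return w_jk + gamma * y_j * y_k

print(hebbian_update(0.5, y_j=1.0, y_k=1.0))   # 0.5 -> 0.6
```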


Page 24: Ann

Contd…

Widrow-Hoff rule or the delta rule

o Another common rule uses not the actual activation of unit k but the difference between the actual and desired activation for adjusting the weights:

Δwjk = γ yj (dk − yk)

o dk is the desired activation provided by a teacher.
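And the corresponding sketch for the delta rule, which replaces yk with the error dk − yk (example numbers again illustrative):

```python
def delta_update(w_jk, y_j, y_k, d_k, gamma=0.1):
    # delta wjk = gamma * yj * (dk - yk): adjust the weight in
    # proportion to the mismatch between desired and actual activation
    return w_jk + gamma * y_j * (d_k - y_k)

print(delta_update(0.5, y_j=1.0, y_k=0.2, d_k=1.0))   # 0.5 -> 0.58
```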


Page 25: Ann

THANK YOU
