Artificial Neural Networks
What is a Neural Network?
Definition: An artificial neural network is a computer program that can recognize patterns in a given collection of data and produce a model for that data. It resembles the brain in two respects:
o Knowledge is acquired by the network through a learning process (trial and error).
o inter-neuron connection strengths known as synaptic weights are used to store the knowledge.
Demonstration
Demonstration of a neural network used within an optical character recognition (OCR) application.
Neural Network Structure
Artificial neuron
Network refers to the interconnections between the neurons in the different layers of each system.
The most basic system has three layers. The first layer has input neurons which send data via
synapses to the second layer of neurons and then via more synapses to the third layer of output neurons.
The synapses store parameters called "weights" which are used to manipulate the data in the calculations.
[Figure: an artificial neuron — inputs weighted by w1, w2, …, wn are summed (Σ) and passed through an activation function F(net) to produce the output.]
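The behaviour described above, a weighted sum passed through an activation function, can be sketched in a few lines of Python (the function names and example values here are illustrative assumptions, not part of the slides):

```python
import math

def neuron(inputs, weights, activation):
    # net input: weighted sum of the inputs, using the synaptic weights
    net = sum(w * x for w, x in zip(weights, inputs))
    # the activation function F(net) produces the neuron's output
    return activation(net)

# example: a sigmoid neuron with two inputs
def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

out = neuron([1.0, 0.5], [0.4, -0.2], sigmoid)
print(out)  # a value between 0 and 1
```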
The network function f(x) is a composition of other functions gi(x), which can themselves be further defined as compositions of other functions.
This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables.
This figure depicts a decomposition of “f”, with dependencies between variables indicated by arrows. These can be interpreted in two ways.
Functional view: the input is transformed into a 3-dimensional vector “h” , which is then transformed into a 2-dimensional vector “g” , which is finally transformed into “f”
Probabilistic view: the random variable F=f(G) depends upon the random variable G=g(H) , which depends upon H=h(X), which depends upon the random variable X. This view is most commonly encountered in the context of graphical models.
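As a minimal sketch of the functional view, with made-up component functions matching the dimensions mentioned above (h maps the input to a 3-dimensional vector, g maps that to a 2-dimensional vector, and the final step produces a scalar):

```python
def h(x):
    # the input is transformed into a 3-dimensional vector
    return [x, x * x, 1.0]

def g(hv):
    # ... which is transformed into a 2-dimensional vector
    return [hv[0] + hv[1], hv[2]]

def f(x):
    # ... which is finally combined into the scalar output f(x)
    gv = g(h(x))
    return gv[0] - gv[1]

print(f(2.0))  # (2 + 4) - (1) = 5.0
```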
Why do we use Neural Networks?
Ability to represent both linear and non-linear relationships
Their ability to learn these relationships directly from the data being modeled.
Although computing these days is truly advanced, there are certain tasks that a program made for a common microprocessor is unable to perform.
There are different architectures, which consequently require different types of algorithms, but despite being an apparently complex system, a neural network is relatively simple.
Advantages of ANN
A neural network can perform tasks that a linear program can not.
When an element of the neural network fails, the network can continue without any problem thanks to its parallel nature.
A neural network learns and does not need to be reprogrammed.
It can be implemented in any application without any problem.
Disadvantages of ANN
The neural network needs training to operate.
The architecture of a neural network is different from the architecture of microprocessors, and therefore needs to be emulated.
Requires high processing time for large neural networks.
How do Neural Networks Work?
[Figure: data flows from the input through the network to the output.]
Train the network:
o Present the data to the network.
o The network computes an output.
o The network output is compared to the desired output.
o The network weights are modified to reduce the error.
Use the network:
o Present new data to the network.
o The network computes an output based on its training.
Mathematical Model
A set of major aspects of a parallel distributed model can be distinguished:
o a set of processing units ('neurons', 'cells');
o a state of activation yk for every unit, equivalent to the output of the unit;
o connections between the units;
o a propagation rule;
o an activation function Φk;
o an external input (aka bias, offset) θk for each unit;
o a method for information gathering (the learning rule);
o an environment within which the system must operate.
Contd…
Connections between units
o Assume that each unit provides an additive contribution to the input of the unit to which it is connected.
o The total input to unit k is simply the weighted sum of the separate outputs from each of the connected units plus a bias or offset term θk:
sk(t) = Σj wjk(t) yj(t) + θk
o A positive wjk is considered excitation and a negative wjk inhibition.
o Units with this propagation rule are called sigma units.
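The propagation rule above can be sketched directly (a minimal Python illustration; the example weights and bias are assumptions):

```python
def total_input(weights, outputs, theta):
    # s_k(t) = sum_j w_jk(t) * y_j(t) + theta_k:
    # weighted sum of the connected units' outputs plus the bias term
    return sum(w * y for w, y in zip(weights, outputs)) + theta

# a positive weight acts as excitation, a negative one as inhibition
s_k = total_input([0.5, -0.3], [1.0, 2.0], 0.1)
print(s_k)  # 0.5 - 0.6 + 0.1, approximately 0
```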
Contd…
The activity of summing the inputs is referred to as a linear combination.
Finally, an activation function controls the amplitude of the output of the neuron.
An acceptable range of output is usually between 0 and 1, or -1 and 1.
The output of the neuron, yk, would therefore be the outcome of some activation function applied to the net input vk.
Contd…
Mathematically, this process is described in this figure
Activation Functions
There are three types of activation functions, denoted by Φ(.):
Threshold function: takes on a value of 0 if the summed input is less than a certain threshold value v, and the value 1 if the summed input is greater than or equal to the threshold value.
Contd…
Piecewise-linear function: This function again can take on the values of 0 or 1, but can also take on values in between, depending on the amplification factor in a certain region of linear operation.
Contd…
Sigmoid function: This function can range between 0 and 1, but it is also sometimes useful to use the -1 to 1 range. An example of a sigmoid-shaped function with the latter range is the hyperbolic tangent function.
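The three activation-function types above can be sketched as follows (the threshold value and the linear region of the piecewise-linear function are illustrative assumptions):

```python
import math

def threshold(v, theta=0.0):
    # 0 below the threshold, 1 at or above it
    return 1.0 if v >= theta else 0.0

def piecewise_linear(v):
    # saturates at 0 and 1, linear (unit amplification) in between
    if v <= -0.5:
        return 0.0
    if v >= 0.5:
        return 1.0
    return v + 0.5

def sigmoid(v):
    # smooth, ranges over (0, 1)
    return 1.0 / (1.0 + math.exp(-v))

def tanh_activation(v):
    # sigmoid-shaped, but ranges over (-1, 1)
    return math.tanh(v)
```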
Training of Artificial Neural Networks
o A neural network has to be configured such that the application of a set of inputs produces (either directly or via a relaxation process) the desired set of outputs.
o One way is to set the weights explicitly, using a priori knowledge.
o The other way is to 'train' the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule.
Paradigms of Learning
Supervised learning or Associative learning
In supervised learning, we are given a set of example pairs (x, y), x ∈ X, y ∈ Y, and the aim is to find a function f: X → Y in the allowed class of functions that matches the examples.
We wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data and it implicitly contains prior knowledge about the problem domain.
A commonly used cost is the mean-squared error, which measures the average squared error between the network's output f(x) and the target value y over all the example pairs.
When one tries to minimize this cost using gradient descent for the class of neural networks called Multi-Layer Perceptrons, one obtains the common and well-known backpropagation algorithm for training neural networks.
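The idea can be sketched for the simplest case, a single linear unit f(x) = w·x + b, trained by gradient descent on the mean-squared error; backpropagation applies the same principle layer by layer in a multi-layer perceptron. The data and learning rate below are made up for illustration:

```python
# example pairs (x, y); the targets happen to follow y = 2x + 1
pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b, lr = 0.0, 0.0, 0.1

for _ in range(2000):
    # gradients of the mean-squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / len(pairs)
    grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / len(pairs)
    # move the parameters against the gradient to reduce the error
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # approaches w = 2, b = 1
```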
Contd…
Unsupervised learning or Self-organization
o An (output) unit is trained to respond to clusters of patterns within the input.
o In this paradigm the system is supposed to discover statistically salient features of the input population.
o Unlike the supervised learning paradigm, there is no a priori set of categories into which the patterns are to be classified; rather the system must develop its own representation of the input stimuli.
Here we are given some data x and a cost function to be minimized, which can be any function of the data x and the network's output f.
The cost function is dependent on the task (what we are trying to model) and our a priori assumptions (the implicit properties of our model, its parameters and the observed variables).
Modifying patterns of connectivity
Hebbian learning rule
o Suggested by Hebb in his classic book Organization of Behaviour (Hebb, 1949).
o The basic idea is that if two units j and k are active simultaneously, their interconnection must be strengthened. If j receives input from k, the simplest version of Hebbian learning prescribes modifying the weight wjk with:
Δwjk = γ yj yk
where γ is a positive constant of proportionality representing the learning rate.
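As a minimal sketch (assuming the standard simplest form of the rule, Δwjk = γ·yj·yk):

```python
def hebbian_update(w_jk, y_j, y_k, gamma=0.1):
    # strengthen the connection when units j and k are active simultaneously;
    # gamma is the learning rate (a positive constant of proportionality)
    return w_jk + gamma * y_j * y_k

w = hebbian_update(0.0, y_j=1.0, y_k=1.0)  # both units active: weight grows
print(w)
```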
Contd…
Widrow-Hoff rule or the delta rule
o Another common rule uses not the actual activation of unit k but the difference between the actual and desired activation for adjusting the weights:
Δwjk = γ yj (dk − yk)
o dk is the desired activation provided by a teacher.
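A minimal sketch of the delta rule (assuming its standard form, Δwjk = γ·yj·(dk − yk)):

```python
def delta_update(w_jk, y_j, y_k, d_k, gamma=0.5):
    # adjust the weight by the difference between the desired activation d_k
    # (provided by the teacher) and the actual activation y_k
    return w_jk + gamma * y_j * (d_k - y_k)

# when the actual activation already matches the teacher, nothing changes
print(delta_update(0.3, y_j=1.0, y_k=1.0, d_k=1.0))  # 0.3
```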
THANK YOU