(c) P. Gómez-Gil, INAOE 2015
Tutorial
An Introduction to the Use of
Artificial Neural Networks.
Part 2: Basic architectures
Dra. Ma. del Pilar Gómez Gil INAOE
[email protected] [email protected]
This version: October 13, 2015
Outline
Duration  Topic                                 Sub-topics
1 hour    1. Artificial Neural Networks.        1.1 Some definitions
          What is that?                         1.2 Advantages and drawbacks of ANN
                                                1.3 Characteristics of solutions using ANN
                                                1.4 The fundamental neuron
                                                1.5 The concept of "learning" by examples
1 hour    2. Basic architectures                2.1 Types of ANN
                                                2.2 Single layer perceptron network
                                                2.3 Multi-layer Perceptrons
1 hour    3. Solutions using ANN                3.1 ANN as classifiers
                                                3.2 ANN as function approximators
                                                3.3 ANN as predictors
1 hour    4. Examples using Matlab ANN toolbox  4.1 A very simple classifier
                                                4.2 A very simple function approximator
2.1 Types of ANN
Types of ANN
Depending on the way in which neurons are connected, ANN may be classified as:
Single layer neural nets
Multi-layer neural nets
Recurrent neural nets
Combinations of previous types
Single layer ANN
In this kind of network, neurons get their inputs directly from the external world, and their outputs are the outputs of the ANN.
Neurons are organized in a single layer.
A single layer neural net
[Figure: inputs x0, x1, …, xn-1, each connected through weights wij to every neuron of the ANN; the neurons produce the outputs y0, y1, …, ym-1]
Calculation of a single-layer ANN

Let:
n be the number of external inputs
m be the number of outputs
W be a matrix with n×m elements that represents the knowledge of the network
X be a vector with the n inputs to the net
Y be a vector with the m outputs generated by the net

Then:
N = X * W
Y = F( N )
Calculation of a single-layer ANN (cont.)

For example, if an ANN has 3 inputs and 3 outputs:

N = [x0 x1 x2] | w00 w10 w20 |
               | w01 w11 w21 |
               | w02 w12 w22 |

  = [ x0w00 + x1w01 + x2w02   x0w10 + x1w11 + x2w12   x0w20 + x1w21 + x2w22 ]

y0 = f( x0w00 + x1w01 + x2w02 )
y1 = f( x0w10 + x1w11 + x2w12 )
y2 = f( x0w20 + x1w21 + x2w22 )
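This calculation can be sketched in a few lines of NumPy (a sketch, not the Matlab toolbox used in Part 4 of the tutorial; the input values, weight values and the threshold activation below are illustrative assumptions):

```python
import numpy as np

def single_layer(X, W, F):
    """Output of a single-layer ANN: Y = F(X * W)."""
    N = X @ W      # net input: n inputs combined through the n x m weight matrix
    return F(N)    # element-wise activation

# 3 inputs, 3 outputs, as in the example above
X = np.array([1.0, 0.5, -1.0])          # input vector (n = 3)
W = np.array([[0.2, -0.1, 0.4],         # column j holds the weights of output neuron j
              [0.7,  0.3, -0.2],
              [-0.5, 0.8,  0.1]])
step = lambda n: (n >= 0).astype(float) # a threshold activation, chosen for illustration
Y = single_layer(X, W, step)
print(Y)                                # prints [1. 0. 1.]
```

Any activation F can be plugged in; only the shape of W (n rows, m columns) is fixed by the network.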
A feed-forward ANN (Multi-layer perceptron)
[Figure: a feed-forward network with an input layer, a hidden layer and an output layer; weights are labeled w^k_{i,j}, e.g. w^1_{0,0}, w^1_{1,1}, w^2_{1,0}]
A MLP with 3 hidden layers
[Figure: a feed-forward network with an input layer, 3 hidden layers and an output layer; weights are labeled w^k_{i,j}, e.g. w^1_{0,0}, w^3_{0,1}]
Characteristics of a feed-forward, multi-layer perceptron (MLP)
Made of several layers, where the outputs of neurons at layer i are connected as inputs to neurons at layer i+1.
The first layer is called the "input layer."
The last layer is called the "output layer."
The rest of the layers are "hidden layers."
Characteristics of a MLP (cont.)
The output of the ANN is the output of the neurons located at the last layer.
To calculate the output of the ANN, the output of each neuron is calculated, starting with the first layer and moving up to the last layer.
In this kind of ANN, the activation function is non-linear.
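The layer-by-layer calculation described above can be sketched as follows (a minimal NumPy sketch; the sigmoid activation, the layer sizes and the random weights are illustrative assumptions):

```python
import numpy as np

def sigmoid(n):
    """A common non-linear activation function."""
    return 1.0 / (1.0 + np.exp(-n))

def mlp_forward(x, weights):
    """Compute the MLP output layer by layer, from the first layer to the last."""
    out = x
    for W in weights:            # outputs of layer i become the inputs of layer i+1
        out = sigmoid(out @ W)   # non-linear activation at every layer
    return out

rng = np.random.default_rng(0)
# 3 inputs -> 4 hidden neurons -> 2 outputs
weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
y = mlp_forward(np.array([0.5, -1.0, 2.0]), weights)
print(y)   # the output of the neurons at the last layer
```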
Recurrent Neural Networks
Recurrent neural networks present feed-back. That is, the output of a neuron is used as input of another neuron that may be located in a previous layer.
The output of a neuron is calculated using the input values and the output values obtained at previous times.
Recurrent neural networks present characteristics similar to human memory.
The learning algorithms used for this type of network are based on differential equations or difference equations.
Example of the dynamics of a Recurrent Neural network

It is defined by coupled non-linear differential systems of the form:

C_j dv_j(t)/dt = Σ_{i=1, i≠j}^{N} w_ij v_i(t) − v_j(t)/R_j + I_j ,   j = 1, 2, …, N

where:
I_j, C_j, R_j are values representing bias weights, capacitive effects and resistance effects
v_j(t) is the output of neuron j at time t
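A minimal sketch of how such dynamics can be simulated numerically (explicit Euler integration of the equation above; every parameter value here is an illustrative assumption):

```python
import numpy as np

def simulate_rnn(W, I, C, R, v0, dt=0.01, steps=5000):
    """Explicit Euler integration of C_j dv_j/dt = sum_i w_ij v_i - v_j/R_j + I_j."""
    v = v0.copy()
    for _ in range(steps):
        dv = (W @ v - v / R + I) / C   # right-hand side of the coupled system
        v = v + dt * dv                # one Euler step
    return v

# a small symmetric network with zero self-connections (Hopfield-like)
W = np.array([[ 0.0, 0.5, -0.3],
              [ 0.5, 0.0,  0.2],
              [-0.3, 0.2,  0.0]])
I = np.array([0.1, -0.2, 0.05])   # bias terms
C = np.ones(3)                    # capacitive effects
R = np.ones(3)                    # resistance effects
v = simulate_rnn(W, I, C, R, v0=np.zeros(3))
# after enough steps, v approaches an equilibrium where dv_j/dt = 0
```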
An example of RNN: Hopfield
network
A Hopfield net in hardware
[Zurada 1992]
An example of RNN: A fully
connected RNN
[Figure: a fully connected RNN of three neurons with external inputs I1, I2, I3; the output of every neuron feeds back as input to every neuron through weights w00 … w22]
2.2 Single layer
perceptron network
The Perceptron
Proposed by McCulloch and Pitts, it consists of only one neuron, with inputs x0 … xn-1 and weights w0 … wn-1:

o = F( Σ_{i=0}^{n-1} xi wi )

F(x) = 1  if x ≥ 0
       0  if x < 0
Single layer perceptron network
A set of perceptrons sharing the same inputs.
[Figure: inputs x0, x1, …, xn connected through weights wij to a row of perceptrons producing the outputs y0, y1, …, ym-1]
A single layer perceptron may work as a linear classifier
Suppose a neuron with 2 inputs. It is able to divide the 2-D plane into two regions, one region on each side of a line.
In general, a neuron with n inputs generates a hyper-plane that separates the n-dimensional space into two regions.
[Mora-Salinas, 2010]
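This behavior can be sketched with the classic perceptron learning rule (a sketch, not the code from [Mora-Salinas, 2010]; the data, the learning coefficient and the bias-as-extra-input trick are illustrative assumptions):

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=50):
    """Perceptron learning: adjust w until the line separates the two classes."""
    X = np.hstack([X, np.ones((len(X), 1))])   # an extra input fixed at 1 acts as the bias
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, ti in zip(X, t):
            o = 1.0 if xi @ w >= 0 else 0.0    # F(x) = 1 if x >= 0, else 0
            w += eta * (ti - o) * xi           # update only when the output is wrong
    return w

# two linearly separable classes in the 2-D plane
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
t = np.array([1.0, 1.0, 0.0, 0.0])
w = train_perceptron(X, t)
# the learned line w0*x + w1*y + w2 = 0 divides the plane into the two regions
```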
2.3 Multi-Layer Perceptrons (MLP)
[Figure: an MLP with an input layer, hidden layers and an output layer]
Back Propagation Learning Algorithm
This is the most common algorithm used to train MLPs.
It was originally proposed by P. Werbos in 1974 [Werbos 90] as part of his PhD dissertation in economics.
Independently, it was proposed by Rumelhart, Hinton and Williams [Rumelhart 86] under the name "backpropagation."
[Photo: Dr. Werbos and Dr. Edgar Sánchez at WCCI 2006]
Characteristics of BP
It is a supervised training algorithm.
The objective is to find the values of the weights that minimize the mean squared error between the actual and the desired outputs of the MLP. It is a gradient-based algorithm:

E_T = Σ_p E_p

where:

E_p = (1/2) Σ_{j=1}^{n} ( T_pj − O_pj )²

T_pj = desired output of the j-th neuron in the output layer for pattern p
O_pj = actual output of the j-th neuron in the output layer for pattern p
The original BP Algorithm
Let:
Wji = weight that connects to neuron j in layer k from neuron i in layer k−1.
F = activation function (continuous and differentiable).
Opj = F(NETpj), the output of the j-th neuron.
NETpj = Σi Wji Opi
where Opi corresponds to the input of the network (Xi) if neuron i is in the first layer of the network.
ΔpWji = increment of the weight Wji produced by the p-th pattern.
Back-propagation Algorithm (cont.)
1. Initialize all the weights and threshold values of the network with small random real numbers.
2. Repeat the following until the error ET over the training set is acceptably small, or some predetermined "end of training" condition is true:
   2.1 For each pattern p in the training set:
       2.1.1 Read the input vector Xp and the desired output vector Tp.
       2.1.2 Compute the output of the network.
       2.1.3 Compute the error Ep produced by pattern p.
Back-propagation Algorithm (cont.)
       2.1.4 Adjust all the weights of the network by applying the following rule (the generalized delta rule) to each of the weights Wji:

             Wji(t+1) = Wji(t) + ΔpWji
             where
             ΔpWji = η δpj Opi
             η = learning coefficient (0 < η < 1)
             δpj is computed as follows:
             a) If j corresponds to a neuron in the output layer of the network:
                δpj = (Tpj − Opj) F′(NETpj)
Back-propagation Algorithm (cont.)
             If the activation function is the sigmoid, then:
             F′(x) = F(x)(1 − F(x)), and
             δpj = (Tpj − Opj) Opj (1 − Opj)
             b) If j does not correspond to an output neuron, that is, it is in one of the hidden layers, then:
                δpj = F′(NETpj) Σk δpk Wkj
             where the summation accumulates the error propagated backwards.
       2.1.5 Return to step 2.1.
3. Return to step 2.
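The algorithm can be sketched for a 2-layer network trained on XOR (a sketch; the hidden-layer size, learning coefficient, number of epochs and random seed are illustrative assumptions, and a given run may still get stuck in a local minimum):

```python
import numpy as np

def F(x):
    return 1.0 / (1.0 + np.exp(-x))   # sigmoid: F'(x) = F(x)(1 - F(x))

def train_bp(X, T, hidden=4, eta=0.5, epochs=3000, seed=1):
    rng = np.random.default_rng(seed)
    n_in = X.shape[1] + 1                                  # +1: constant input acting as threshold
    W1 = rng.uniform(-0.5, 0.5, size=(n_in, hidden))       # step 1: small random weights
    W2 = rng.uniform(-0.5, 0.5, size=(hidden + 1, T.shape[1]))
    for _ in range(epochs):                                # step 2
        for x, t in zip(X, T):                             # 2.1: for each pattern p
            xb = np.append(x, 1.0)
            h = np.append(F(xb @ W1), 1.0)                 # 2.1.2: compute the network output
            o = F(h @ W2)
            d_out = (t - o) * o * (1 - o)                  # a) delta at the output layer
            d_hid = h[:-1] * (1 - h[:-1]) * (W2[:-1] @ d_out)  # b) delta back-propagated to hidden
            W2 += eta * np.outer(h, d_out)                 # 2.1.4: generalized delta rule
            W1 += eta * np.outer(xb, d_hid)
    return W1, W2

X = np.array([[0.0, 0], [0, 1], [1, 0], [1, 1]])
T = np.array([[0.0], [1], [1], [0]])                       # the XOR problem
W1, W2 = train_bp(X, T)

Xb = np.hstack([X, np.ones((4, 1))])
out = F(np.hstack([F(Xb @ W1), np.ones((4, 1))]) @ W2)     # trained network output
```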
Example: Learning a circle (Lippmann 87)
This net has 2 input nodes (the coordinates (x, y) of a point in the plane), 2 output nodes (the two possible classes: inside the circle or outside of it) and 8 hidden nodes.
The network was trained using a learning coefficient of η = 0.3, with 100 samples of each class.
Example: learning a circle
[Figure: the decision regions learned by the network for class A and class B]
Learning two classes (Lippmann 87)
[Figure: the decision region learned for class A]
Example: Learning XOR
Sweeps = 558, η = 0.5, positive bias [Rumelhart 86]
[Figure: a network solving XOR, with two input units, hidden units and one output unit; the learned weights and biases shown include 2.2, 6.3, -4.2, -4.2, -6.4, -6.4 and -9.4]
Advantages and Drawbacks of Back Propagation
It generates non-linear solutions.
It is easy to implement and to use for solving a wide range of problems.
It is noise-resistant.
Learning may be very slow in some cases.
We do not know in advance whether the problem can be solved using an MLP.
BP may get stuck in a local minimum.
2.4 Self-organizing
maps
Self Organizing Maps
The purpose of self-organization is to discover significant patterns or characteristics in the input data, without the aid of a teacher.
The process of forming groups (clustering)
[Tao & Gonzalez 1974]
Creating groups
SOM Network
[Hilera y Martínez 00]
Teuvo Kohonen
Architecture
Each of the N input neurons connects to all M output neurons in a feed-forward way.
There are implicit inhibitory lateral connections among the neurons in the output layer.
Each neuron in the output layer has some effect on its neighboring neurons.
The values of the weights (wji) are calculated using such interactions.
Learning algorithm
The objective of the SOM learning algorithm is to store a set of input patterns x ∈ X by finding a set of prototypes {wj | j = 1, 2, …, N} that represent a feature map of those patterns, following some topological structure.
This map is formed by the connection weights wj of a set of neurons arranged in one or two dimensions, where the neurons are related in a competitive way.
The SOM learning process is stochastic, off-line and unsupervised.
Learning algorithm (cont.)
[Martín & Sanz 01, in De los Santos 02]
1. Initialize the synaptic weights wijk. They may be null, small random values, or some predetermined value. The initial neighborhood zone among the output neurons is also fixed.
2. At each iteration, present a pattern x(t) drawn according to the distribution function p(x) of the sensory input space (in the very common situation of having only a finite set of training patterns, it is enough to take one of them at random and present it to the network).
(continues…)
Learning algorithm (cont.)
3. For each neuron i ≡ (i, j) of the map, compute (possibly in parallel) the similarity between its synaptic weight vector wij and the current input vector x. A widely used similarity measure is the Euclidean distance:

d²( wij, x ) = Σ_{k=1}^{n} ( wijk − xk )²
Learning algorithm (cont.)
4. Update the synaptic weights of the winning neuron g = (g1, g2) and those of its neighboring neurons. The neighborhood zone shrinks in the number of elements it contains as time goes on:

wijk(t+1) = wijk(t) + α(t) h( |i − g|, t ) [ xk(t) − wijk(t) ]
Learning algorithm (cont.)
where h represents the neighborhood function. Many times it is not used, especially in small networks. The learning rate α(t) decays with time, for example:

α(t) = 1/t    or    α(t) = α1 ( 1 − t/α2 )

If the established maximum number of iterations has been reached, then the learning process ends. Otherwise, return to step 2.
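Steps 1-4 can be sketched for a one-dimensional map (a sketch; the Gaussian neighborhood function, the decay schedules and the toy data are illustrative assumptions):

```python
import numpy as np

def train_som(X, n_units=10, epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. initialize the weights with random values in the range of the data
    W = rng.uniform(X.min(), X.max(), size=(n_units, X.shape[1]))
    pos = np.arange(n_units)                   # positions of the units on the 1-D map
    for t in range(epochs):
        alpha = 1.0 / (t + 1)                  # decaying learning rate, e.g. alpha(t) = 1/t
        sigma = max(n_units / 2.0 * (1 - t / epochs), 0.5)  # shrinking neighborhood zone
        for x in rng.permutation(X):           # 2. present the patterns in random order
            d = np.sum((W - x) ** 2, axis=1)   # 3. squared Euclidean distance to every unit
            g = np.argmin(d)                   # the winning neuron
            h = np.exp(-((pos - g) ** 2) / (2 * sigma ** 2))  # neighborhood function
            W += alpha * h[:, None] * (x - W)  # 4. update the winner and its neighbors
    return W

# toy data: two 2-D clusters; the prototypes should spread over both of them
X = np.vstack([np.full((20, 2), 0.0), np.full((20, 2), 5.0)])
X = X + np.random.default_rng(1).normal(scale=0.3, size=X.shape)
W = train_som(X)
```

Since every update moves a prototype toward a data point, the trained prototypes stay inside the range of the data; the topology (neighbors on the 1-D map end up near each other in input space) comes from the shared neighborhood update.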
Time for
more coffee?
References (1)
Gómez-Gil, P., J. M. Ramírez and W. Oldham. "On Handwritten Character Recognition through Locally Connected Structural Neural Networks." Proceedings of the Second Joint Mexico-US International Workshop on Neural Networks and Neurocontrol Sian Ka'an '97, Quintana Roo, México, August 1997, pp. 251-255.
Gómez-Gil, P., De-Los-Santos Torres, G., Navarrete-García, J., Ramírez-Cortés, M. "The Role of Neural Networks in the Interpretation of Antique Handwritten Documents." Hybrid Intelligent Systems: Analysis and Design. Series: Studies in Fuzziness and Soft Computing, Vol. 208. Editors: Castillo, O., Melin, P., Kacprzyk, W. Springer, 2007. ISBN-10: 3-540-37419-1. pp. 269-281.
Gómez-Gil, P., Ramírez-Cortés, M. "Experiments with a Hybrid-Complex Neural Networks for Long Term Prediction of Electrocardiograms." Proceedings of the IEEE 2006 International World Congress of Computational Intelligence, IJCNN 2006, Vancouver, Canada, July 2006. DOI: 10.1109/IJCNN.2006.246952
Gómez-Gil, P. "Long Term Prediction, Chaos and Artificial Neural Networks. Where is the Meeting Point?" Engineering Letters, Vol. 15, No. 1, August 2007. ISSN: 1816-0948 (online version), 1816-093X (printed version).
cont.
References (2)
Lippmann, R.P. "An Introduction to Computing with Neural Nets." IEEE ASSP Magazine, Vol. 4, No. 2, Apr. 1987, pp. 4-22.
Mora Salinas, R. "Entrenando un perceptrón." Course project, Redes Neuronales Artificiales, Coordinación de Computación, Instituto Nacional de Astrofísica, Óptica y Electrónica. Spring 2010.
Rumelhart, D.E., G.E. Hinton and R.J. Williams. "Learning Internal Representations by Error Propagation." In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D.E. Rumelhart and J.L. McClelland, eds., Vol. 1, Chapter 8. Cambridge, MA: MIT Press, 1986.
Tao, J.T. and Gonzalez, R.C. Pattern Recognition Principles. Addison-Wesley, 1974.
Werbos, P.J. "Backpropagation Through Time: What It Does and How to Do It." Proceedings of the IEEE, Vol. 78, 1990, pp. 1550-1560.
Zurada, Jacek M. Introduction to Artificial Neural Systems. West Publishing Company, 1992.