(c) P. Gómez-Gil, INAOE 2015
Tutorial
An Introduction to the Use of
Artificial Neural Networks.
Part 2: Basic architectures
Dra. Ma. del Pilar Gómez Gil INAOE
[email protected] [email protected]
This version: October 13, 2015
Outline
Duration  Topic                                 Sub-topics
1 hour    1. Artificial Neural Networks.        1.1 Some definitions
          What is that?                         1.2 Advantages and drawbacks of ANN
                                                1.3 Characteristics of solutions using ANN
                                                1.4 The fundamental neuron
                                                1.5 The concept of "learning" by examples
1 hour    2. Basic architectures                2.1 Types of ANN
                                                2.2 Single layer perceptron network
                                                2.3 Multi-layer Perceptrons
1 hour    3. Solutions using ANN                3.1 ANN as classifiers
                                                3.2 ANN as function approximators
                                                3.3 ANN as predictors
1 hour    4. Examples using Matlab ANN toolbox  4.1 A very simple classifier
                                                4.2 A very simple function approximator
2.1 Types of ANN
Types of ANN
Depending on the way in which neurons are connected, ANN may be classified as:
Single layer neural nets
Multi-layer neural nets
Recurrent neural nets
Combinations of previous types
Single layer ANN
In this kind of network, neurons get their inputs directly from the external world, and their outputs are the outputs of the ANN.
Neurons are organized in a single layer.
A single layer neural net
[Figure: inputs x0, x1, …, xn-1, each connected through weights wij to every neuron of the ANN; the neurons produce the outputs y0, y1, …, ym-1]
Calculation of a single-layer ANN

Let:
n be the number of external inputs
m be the number of outputs
W be a matrix with n×m elements that represents the knowledge of the network
X be a vector with the n inputs to the net
Y be a vector with the m outputs generated by the net

Then:
N = X * W
Y = F( N )
Calculation of a single-layer ANN (cont.)

For example, if an ANN has 3 inputs and 3 outputs:

N = [x0 x1 x2] | w00 w10 w20 |
               | w01 w11 w21 |
               | w02 w12 w22 |

  = [ x0w00 + x1w01 + x2w02   x0w10 + x1w11 + x2w12   x0w20 + x1w21 + x2w22 ]

y0 = f( x0w00 + x1w01 + x2w02 )
y1 = f( x0w10 + x1w11 + x2w12 )
y2 = f( x0w20 + x1w21 + x2w22 )
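This calculation can be sketched in a few lines of NumPy (a sketch, not the Matlab toolbox used in Part 4 of the tutorial; the input values, weight values and the threshold activation below are illustrative assumptions):

```python
import numpy as np

def single_layer(X, W, F):
    """Output of a single-layer ANN: Y = F(X * W)."""
    N = X @ W      # net input: n inputs combined through the n x m weight matrix
    return F(N)    # element-wise activation

# 3 inputs, 3 outputs, as in the example above
X = np.array([1.0, 0.5, -1.0])          # input vector (n = 3)
W = np.array([[0.2, -0.1, 0.4],         # column j holds the weights of output neuron j
              [0.7,  0.3, -0.2],
              [-0.5, 0.8,  0.1]])
step = lambda n: (n >= 0).astype(float) # a threshold activation, chosen for illustration
Y = single_layer(X, W, step)
print(Y)                                # prints [1. 0. 1.]
```

Any activation F can be plugged in; only the shape of W (n rows, m columns) is fixed by the network.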
A feed-forward ANN (Multi-layer perceptron)
[Figure: a feed-forward network with an input layer, a hidden layer and an output layer; weights are labeled w^k_{i,j}, e.g. w^1_{0,0}, w^1_{1,1}, w^2_{1,0}]
A MLP with 3 hidden layers
[Figure: a feed-forward network with an input layer, 3 hidden layers and an output layer; weights are labeled w^k_{i,j}, e.g. w^1_{0,0}, w^3_{0,1}]
Characteristics of a feed-forward, multi-layer perceptron (MLP)
Made of several layers, where the outputs of neurons at layer i are connected as inputs to neurons at layer i+1.
The first layer is called the "input layer."
The last layer is called the "output layer."
The rest of the layers are "hidden layers."
Characteristics of a MLP (cont.)
The output of the ANN is the output of the neurons located at the last layer.
To calculate the output of the ANN, the output of each neuron is calculated, starting with the first layer and moving up to the last layer.
In this kind of ANN, the activation function is non-linear.
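The layer-by-layer calculation described above can be sketched as follows (a minimal NumPy sketch; the sigmoid activation, the layer sizes and the random weights are illustrative assumptions):

```python
import numpy as np

def sigmoid(n):
    """A common non-linear activation function."""
    return 1.0 / (1.0 + np.exp(-n))

def mlp_forward(x, weights):
    """Compute the MLP output layer by layer, from the first layer to the last."""
    out = x
    for W in weights:            # outputs of layer i become the inputs of layer i+1
        out = sigmoid(out @ W)   # non-linear activation at every layer
    return out

rng = np.random.default_rng(0)
# 3 inputs -> 4 hidden neurons -> 2 outputs
weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
y = mlp_forward(np.array([0.5, -1.0, 2.0]), weights)
print(y)   # the output of the neurons at the last layer
```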
Recurrent Neural Networks
Recurrent neural networks present feed-back. That is, the output of a neuron is used as input of another neuron that may be located in a previous layer.
The output of a neuron is calculated using the input values and the output values obtained at previous times.
Recurrent neural networks present characteristics similar to human memory.
The learning algorithms used for this type of network are based on differential equations or difference equations.
Example of the dynamics of a Recurrent Neural network

It is defined by coupled non-linear differential systems of the form:

C_j dv_j(t)/dt = Σ_{i=1, i≠j}^{N} w_ij v_i(t) − v_j(t)/R_j + I_j ,   j = 1, 2, …, N

where:
I_j, C_j, R_j are values representing bias weights, capacitive effects and resistance effects
v_j(t) is the output of neuron j at time t
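A minimal sketch of how such dynamics can be simulated numerically (explicit Euler integration of the equation above; every parameter value here is an illustrative assumption):

```python
import numpy as np

def simulate_rnn(W, I, C, R, v0, dt=0.01, steps=5000):
    """Explicit Euler integration of C_j dv_j/dt = sum_i w_ij v_i - v_j/R_j + I_j."""
    v = v0.copy()
    for _ in range(steps):
        dv = (W @ v - v / R + I) / C   # right-hand side of the coupled system
        v = v + dt * dv                # one Euler step
    return v

# a small symmetric network with zero self-connections (Hopfield-like)
W = np.array([[ 0.0, 0.5, -0.3],
              [ 0.5, 0.0,  0.2],
              [-0.3, 0.2,  0.0]])
I = np.array([0.1, -0.2, 0.05])   # bias terms
C = np.ones(3)                    # capacitive effects
R = np.ones(3)                    # resistance effects
v = simulate_rnn(W, I, C, R, v0=np.zeros(3))
# after enough steps, v approaches an equilibrium where dv_j/dt = 0
```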
An example of RNN: Hopfield
network
A Hopfield net in hardware
[Zurada 1992]
An example of RNN: A fully
connected RNN
[Figure: a fully connected RNN of three neurons with external inputs I1, I2, I3; the output of every neuron feeds back as input to every neuron through weights w00 … w22]
2.2 Single layer
perceptron network
The Perceptron
Proposed by McCulloch and Pitts, it consists of only one neuron, with inputs x0 … xn-1 and weights w0 … wn-1:

o = F( Σ_{i=0}^{n-1} xi wi )

F(x) = 1  if x ≥ 0
       0  if x < 0
Single layer perceptron network
A set of perceptrons sharing the same inputs.
[Figure: inputs x0, x1, …, xn connected through weights wij to a row of perceptrons producing the outputs y0, y1, …, ym-1]
A single layer perceptron may work as a linear classifier
Suppose a neuron with 2 inputs. It is able to divide the 2-D plane into two regions, one region on each side of a line.
In general, a neuron with n inputs generates a hyper-plane that separates the n-dimensional space into two regions.
[Mora-Salinas, 2010]
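This behavior can be sketched with the classic perceptron learning rule (a sketch, not the code from [Mora-Salinas, 2010]; the data, the learning coefficient and the bias-as-extra-input trick are illustrative assumptions):

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=50):
    """Perceptron learning: adjust w until the line separates the two classes."""
    X = np.hstack([X, np.ones((len(X), 1))])   # an extra input fixed at 1 acts as the bias
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, ti in zip(X, t):
            o = 1.0 if xi @ w >= 0 else 0.0    # F(x) = 1 if x >= 0, else 0
            w += eta * (ti - o) * xi           # update only when the output is wrong
    return w

# two linearly separable classes in the 2-D plane
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
t = np.array([1.0, 1.0, 0.0, 0.0])
w = train_perceptron(X, t)
# the learned line w0*x + w1*y + w2 = 0 divides the plane into the two regions
```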
2.3 Multi-Layer Perceptrons (MLP)
[Figure: an MLP with an input layer, hidden layers and an output layer]
Back Propagation Learning Algorithm
This is the most common algorithm used to train MLPs.
It was originally proposed by P. Werbos in 1974 [Werbos 90] as part of his PhD dissertation in economics.
Independently, it was proposed by Rumelhart, Hinton and Williams [Rumelhart 86] under the name "backpropagation."
[Photo: Dr. Werbos and Dr. Edgar Sánchez at WCCI 2006]
Characteristics of BP
It is a supervised training algorithm.
The objective is to find the values of the weights that minimize the mean squared error between the actual and the desired outputs of the MLP. It is a gradient-based algorithm:

E_T = Σ_p E_p

where:

E_p = (1/2) Σ_{j=1}^{n} ( T_pj − O_pj )²

T_pj = desired output of the j-th neuron in the output layer for pattern p
O_pj = actual output of the j-th neuron in the output layer for pattern p
The original BP Algorithm
Let:
Wji = weight that connects to neuron j in layer k from neuron i in layer k−1.
F = activation function (continuous and differentiable).
Opj = F(NETpj), the output of the j-th neuron.
NETpj = Σi Wji Opi
where Opi corresponds to the input of the network (Xi) if neuron i is in the first layer of the network.
ΔpWji = increment of the weight Wji produced by the p-th pattern.
Back-propagation Algorithm (cont.)
1. Initialize all the weights and threshold values of the network with small random real numbers.
2. Repeat the following until the error ET over the training set is acceptably small, or some predetermined "end of training" condition is true:
   2.1 For each pattern p in the training set:
       2.1.1 Read the input vector Xp and the desired output vector Tp.
       2.1.2 Compute the output of the network.
       2.1.3 Compute the error Ep produced by pattern p.
Back-propagation Algorithm (cont.)
       2.1.4 Adjust all the weights of the network by applying the following rule (the generalized delta rule) to each of the weights Wji:

             Wji(t+1) = Wji(t) + ΔpWji
             where
             ΔpWji = η δpj Opi
             η = learning coefficient (0 < η < 1)
             δpj is computed as follows:
             a) If j corresponds to a neuron in the output layer of the network:
                δpj = (Tpj − Opj) F′(NETpj)
Back-propagation Algorithm (cont.)
             If the activation function is the sigmoid, then:
             F′(x) = F(x)(1 − F(x)), and
             δpj = (Tpj − Opj) Opj (1 − Opj)
             b) If j does not correspond to an output neuron, that is, it is in one of the hidden layers, then:
                δpj = F′(NETpj) Σk δpk Wkj
             where the summation accumulates the error propagated backwards.
       2.1.5 Return to step 2.1.
3. Return to step 2.
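The algorithm can be sketched for a 2-layer network trained on XOR (a sketch; the hidden-layer size, learning coefficient, number of epochs and random seed are illustrative assumptions, and a given run may still get stuck in a local minimum):

```python
import numpy as np

def F(x):
    return 1.0 / (1.0 + np.exp(-x))   # sigmoid: F'(x) = F(x)(1 - F(x))

def train_bp(X, T, hidden=4, eta=0.5, epochs=3000, seed=1):
    rng = np.random.default_rng(seed)
    n_in = X.shape[1] + 1                                  # +1: constant input acting as threshold
    W1 = rng.uniform(-0.5, 0.5, size=(n_in, hidden))       # step 1: small random weights
    W2 = rng.uniform(-0.5, 0.5, size=(hidden + 1, T.shape[1]))
    for _ in range(epochs):                                # step 2
        for x, t in zip(X, T):                             # 2.1: for each pattern p
            xb = np.append(x, 1.0)
            h = np.append(F(xb @ W1), 1.0)                 # 2.1.2: compute the network output
            o = F(h @ W2)
            d_out = (t - o) * o * (1 - o)                  # a) delta at the output layer
            d_hid = h[:-1] * (1 - h[:-1]) * (W2[:-1] @ d_out)  # b) delta back-propagated to hidden
            W2 += eta * np.outer(h, d_out)                 # 2.1.4: generalized delta rule
            W1 += eta * np.outer(xb, d_hid)
    return W1, W2

X = np.array([[0.0, 0], [0, 1], [1, 0], [1, 1]])
T = np.array([[0.0], [1], [1], [0]])                       # the XOR problem
W1, W2 = train_bp(X, T)

Xb = np.hstack([X, np.ones((4, 1))])
out = F(np.hstack([F(Xb @ W1), np.ones((4, 1))]) @ W2)     # trained network output
```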
Example: Learning a circle (Lippmann 87)
This net has 2 input nodes (the coordinates (x, y) of a point in the plane), 2 output nodes (the two possible classes: inside the circle or outside of it) and 8 hidden nodes.
The network was trained using a learning coefficient of η = 0.3, with 100 samples of each class.
Example: learning a circle
[Figure: the decision regions learned by the network for class A and class B]
Learning two classes (Lippmann 87)
[Figure: the decision region learned for class A]
Example: Learning XOR
Sweeps = 558, η = 0.5, positive bias [Rumelhart 86]
[Figure: a network solving XOR, with two input units, hidden units and one output unit; the learned weights and biases shown include 2.2, 6.3, -4.2, -4.2, -6.4, -6.4 and -9.4]
Advantages and Drawbacks of Back Propagation
It generates non-linear solutions.
It is easy to implement and to use for solving a wide range of problems.
It is noise-resistant.
Learning may be very slow in some cases.
We do not know in advance whether the problem can be solved using an MLP.
BP may get stuck in a local minimum.
2.4 Self-organizing
maps
Self Organizing Maps
The purpose of self-organization is to discover significant patterns or characteristics in the input data, without the aid of a teacher.
The process of forming groups (clustering)
[Tao & Gonzalez 1974]
Creating groups
SOM Network
[Hilera y Martínez 00]
Teuvo Kohonen
Architecture
Each of the N input neurons connects to all M output neurons in a feed-forward way.
There are implicit inhibitory lateral connections among the neurons in the output layer.
Each neuron in the output layer has some effect on its neighboring neurons.
The values of the weights (wji) are calculated using such interactions.
Learning algorithm
The objective of the SOM learning algorithm is to store a set of input patterns x ∈ X by finding a set of prototypes {wj | j = 1, 2, …, N} that represent a feature map of those patterns, following some topological structure.
This map is formed by the connection weights wj of a set of neurons arranged in one or two dimensions, where the neurons are related in a competitive way.
The SOM learning process is stochastic, off-line and unsupervised.
Learning algorithm (cont.)
[Martín & Sanz 01, in De los Santos 02]
1. Initialize the synaptic weights wijk. They may be null, small random values, or some predetermined value. The initial neighborhood zone among the output neurons is also fixed.
2. At each iteration, present a pattern x(t) drawn according to the distribution function p(x) of the sensory input space (in the very common situation of having only a finite set of training patterns, it is enough to take one of them at random and present it to the network).
(continues…)
Learning algorithm (cont.)
3. For each neuron i ≡ (i, j) of the map, compute (possibly in parallel) the similarity between its synaptic weight vector wij and the current input vector x. A widely used similarity measure is the Euclidean distance:

d²( wij, x ) = Σ_{k=1}^{n} ( wijk − xk )²
Learning algorithm (cont.)
4. Update the synaptic weights of the winning neuron g = (g1, g2) and those of its neighboring neurons. The neighborhood zone shrinks in the number of elements it contains as time goes on:

wijk(t+1) = wijk(t) + α(t) h( |i − g|, t ) [ xk(t) − wijk(t) ]
Learning algorithm (cont.)
where h represents the neighborhood function. Many times it is not used, especially in small networks. The learning rate α(t) decays with time, for example:

α(t) = 1/t    or    α(t) = α1 ( 1 − t/α2 )

If the established maximum number of iterations has been reached, then the learning process ends. Otherwise, return to step 2.
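Steps 1-4 can be sketched for a one-dimensional map (a sketch; the Gaussian neighborhood function, the decay schedules and the toy data are illustrative assumptions):

```python
import numpy as np

def train_som(X, n_units=10, epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. initialize the weights with random values in the range of the data
    W = rng.uniform(X.min(), X.max(), size=(n_units, X.shape[1]))
    pos = np.arange(n_units)                   # positions of the units on the 1-D map
    for t in range(epochs):
        alpha = 1.0 / (t + 1)                  # decaying learning rate, e.g. alpha(t) = 1/t
        sigma = max(n_units / 2.0 * (1 - t / epochs), 0.5)  # shrinking neighborhood zone
        for x in rng.permutation(X):           # 2. present the patterns in random order
            d = np.sum((W - x) ** 2, axis=1)   # 3. squared Euclidean distance to every unit
            g = np.argmin(d)                   # the winning neuron
            h = np.exp(-((pos - g) ** 2) / (2 * sigma ** 2))  # neighborhood function
            W += alpha * h[:, None] * (x - W)  # 4. update the winner and its neighbors
    return W

# toy data: two 2-D clusters; the prototypes should spread over both of them
X = np.vstack([np.full((20, 2), 0.0), np.full((20, 2), 5.0)])
X = X + np.random.default_rng(1).normal(scale=0.3, size=X.shape)
W = train_som(X)
```

Since every update moves a prototype toward a data point, the trained prototypes stay inside the range of the data; the topology (neighbors on the 1-D map end up near each other in input space) comes from the shared neighborhood update.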
Time for
more coffee?
References (1)
Gómez-Gil, P., J. M. Ramírez and W. Oldham. "On Handwritten Character Recognition through Locally Connected Structural Neural Networks." Proceedings of the Second Joint Mexico-US International Workshop on Neural Networks and Neurocontrol Sian Ka'an '97, Quintana Roo, México, August 1997, pp. 251-255.
Gómez-Gil, P., De-Los-Santos Torres, G., Navarrete-García, J., Ramírez-Cortés, M. "The Role of Neural Networks in the Interpretation of Antique Handwritten Documents." Hybrid Intelligent Systems: Analysis and Design. Series: Studies in Fuzziness and Soft Computing, Vol. 208. Editors: Castillo, O., Melin, P., Kacprzyk, W. Springer, 2007. ISBN-10: 3-540-37419-1. pp. 269-281.
Gómez-Gil, P., Ramírez-Cortés, M. "Experiments with a Hybrid-Complex Neural Networks for Long Term Prediction of Electrocardiograms." Proceedings of the IEEE 2006 International World Congress of Computational Intelligence, IJCNN 2006, Vancouver, Canada, July 2006. DOI: 10.1109/IJCNN.2006.246952
Gómez-Gil, P. "Long Term Prediction, Chaos and Artificial Neural Networks. Where is the Meeting Point?" Engineering Letters, Vol. 15, No. 1, August 2007. ISSN: 1816-0948 (online version), 1816-093X (printed version).
cont.
References (2)
Lippmann, R.P. "An Introduction to Computing with Neural Nets." IEEE ASSP Magazine, Vol. 4, No. 2, Apr. 1987, pp. 4-22.
Mora Salinas, R. "Entrenando un perceptrón." Course project, Redes Neuronales Artificiales, Coordinación de Computación, Instituto Nacional de Astrofísica, Óptica y Electrónica. Spring 2010.
Rumelhart, D.E., G.E. Hinton and R.J. Williams. "Learning Internal Representations by Error Propagation." In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D.E. Rumelhart and J.L. McClelland, eds., Vol. 1, Chapter 8. Cambridge, MA: MIT Press, 1986.
Tao, J.T. and Gonzalez, R.C. Pattern Recognition Principles. Addison-Wesley, 1974.
Werbos, P.J. "Backpropagation Through Time: What It Does and How to Do It." Proceedings of the IEEE, Vol. 78, 1990, pp. 1550-1560.
Zurada, Jacek M. Introduction to Artificial Neural Systems. West Publishing Company, 1992.