
Slide 1

EE3J2 Data Mining

Lecture 15: Introduction to Artificial Neural Networks

Martin Russell

Slide 2

Objectives

Unsupervised and supervised learning

Modelling and discrimination

Introduction to Artificial Neural Networks (ANNs)

Slide 3

Unsupervised learning

So far we have looked at techniques which try to discover structure in ‘raw’ data – data with no information about classes:
– Gaussian Mixture Modelling
– Clustering

We treat the whole data set as a single entity, and try to discover underlying structure.

The analysis is unsupervised, and automatic learning of the structure of the data is unsupervised learning.

Slide 4

Supervised learning

In some cases additional information is available.

For example, for speech data we might know who was speaking, or what he or she said.

This is information about the class of each piece of data.

When the analysis is driven by class labels, it is called supervised learning.

Slide 5

Modelling and Discrimination

In supervised learning we can:
– Analyse the data for each class separately
– Try to discover how to distinguish between classes

We could apply GMM or clustering separately to model each class.

Alternatively, we could try to find a method to discriminate between the classes.

Slide 6

Modelling and Discrimination

[Figure: class models versus a decision boundary]

Slide 7

Discrimination

In the simplest cases we can discriminate between two classes using a class boundary.

Allocation of a point to a class depends on which side of the boundary it lies.

[Figure: a linear decision boundary and a non-linear decision boundary]

Slide 8

Artificial Neural Networks

There are many approaches to discrimination.

A common class of approaches is based on the idea of Artificial Neural Networks (ANNs).

Inspiration for the basic element of an ANN (the artificial neuron) comes from biology…

…but the analogy really stops there. ANNs are just a computational device for processing patterns – not “artificial brains”.

Slide 9

A model of a neuron

Slide 10

An Artificial Neuron

Simple artificial neuron. Basic idea:
– If the input to unit u4 is big enough, then the neuron ‘fires’
– Otherwise nothing happens

How do we calculate the input to u4?

[Figure: unit u4 with inputs i1, i2, i3 arriving on connections with weights w1,4, w2,4, w3,4]

Slide 11

Artificial Neuron (2)

Suppose that the inputs to units 1, 2 and 3 are i1, i2 and i3.

Then the input to u4 is:

$i_4 = i_1 w_{1,4} + i_2 w_{2,4} + i_3 w_{3,4}$

In general, for an artificial neuron with N input units, the input to unit k is:

$i_k = \sum_{n=1}^{N} i_n w_{n,k}$

[Figure: unit u4 with inputs i1, i2, i3 and weights w1,4, w2,4, w3,4]
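As a small illustrative sketch (mine, not from the slides – the function name is my own), this weighted sum is one line of Python:

```python
def unit_input(inputs, weights):
    """i_k = sum over n of i_n * w_(n,k): the weighted sum feeding unit k."""
    return sum(i * w for i, w in zip(inputs, weights))

print(unit_input([1.0, 2.0, 3.0], [0.5, -1.0, 2.0]))  # 0.5 - 2.0 + 6.0 = 4.5
```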

Slide 12

The ‘threshold’ activation function

The activation function decides whether the neuron should “fire”. A suitable activation function is the threshold function g:

$g(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$

The output of u4 is then:

$o_4 = g(i_4)$

[Figure: unit u4 with inputs i1, i2, i3 and weights w1,4, w2,4, w3,4]
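Continuing the earlier Python sketch (again my own illustration), a complete threshold unit is then:

```python
def g(x):
    """Threshold activation: 1 if x >= 0, otherwise 0."""
    return 1.0 if x >= 0 else 0.0

def neuron_output(inputs, weights):
    """o = g(sum of i_n * w_n) for a threshold unit."""
    return g(sum(i * w for i, w in zip(inputs, weights)))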

Slide 13

Other activation functions

Linear:

$g(x) = x$

Sigmoid:

$g(x) = \frac{1}{1 + e^{-kx}}$

[Figure: plot of the sigmoid activation function]
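Both functions are easy to sketch in Python; the gain parameter k defaults to 1 here, which is an assumption on my part, but it matches the numbers used in the worked examples later:

```python
import math

def linear(x):
    """Linear activation: g(x) = x."""
    return x

def sigmoid(x, k=1.0):
    """Sigmoid activation: g(x) = 1 / (1 + e^(-k*x))."""
    return 1.0 / (1.0 + math.exp(-k * x))

print(sigmoid(0.0))    # 0.5  (the sigmoid is centred on 0)
print(sigmoid(10.0))   # ~1.0 (saturates towards 1 for large inputs)
print(sigmoid(-10.0))  # ~0.0 (and towards 0 for large negative inputs)
```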

Slide 14

The ‘bias’

As described, the neuron will ‘fire’ only if its input is greater than or equal to 0.

We can change the point at which the neuron fires by introducing a bias.

This is an additional input unit whose input is fixed at 1.

[Figure: unit u4 with inputs i1, i2, i3, weights w1,4, w2,4, w3,4, plus a bias input fixed at 1 with weight wb,4]

Slide 15

How the bias works…

The artificial neuron ‘fires’ if the input to u4 is greater than or equal to 0, i.e. if:

$i_4 = i_1 w_{1,4} + i_2 w_{2,4} + i_3 w_{3,4} + w_{b,4} \ge 0$

Or, equivalently, if:

$i_1 w_{1,4} + i_2 w_{2,4} + i_3 w_{3,4} \ge -w_{b,4}$

So the bias weight moves the firing point from 0 to $-w_{b,4}$.
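Extending the earlier sketch with a bias term (again my own illustrative names):

```python
def neuron_fires(inputs, weights, bias_weight):
    """Fires iff i1*w1 + ... + iN*wN + bias_weight >= 0,
    equivalently iff the weighted sum >= -bias_weight."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias_weight
    return total >= 0
```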

Slide 16

Example (2D)

Suppose u has a threshold or sigmoid activation function, with inputs x and y, connection weights 3 and 1, and bias weight -2.

u will ‘fire’ if:

$3x + y - 2 \ge 0, \quad\text{i.e.}\quad y \ge -3x + 2$

[Figure: unit u with inputs x and y, weights 3 and 1, and a bias input fixed at 1 with weight -2]
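A quick numerical check of this firing condition, using a threshold activation (an illustrative sketch):

```python
def u_fires(x, y):
    """Threshold unit: weights 3 and 1, bias weight -2."""
    return 3 * x + 1 * y - 2 >= 0

print(u_fires(1, 1))   # True:  3 + 1 - 2 = 2 >= 0 (above the line y = -3x + 2)
print(u_fires(0, 0))   # False: -2 < 0 (below the line)
```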

Slide 17

Example (continued)

[Figure: the same unit drawn as a network – inputs x and y feed units u1 and u2, a bias unit u3 has input fixed at 1, and u4 receives them with weights 3, 1 and -2 – next to a plot of the boundary line y = -3x + 2, which crosses the x-axis at 2/3 and the y-axis at 2]

Slide 18

Example (continued)

Assume:
– Linear activation functions for units u1, u2 and u3
– Sigmoid activation function for u4

If the input to u1 is 2 and the input to u2 is 2, then:
– Input to u4 is 2 × 3 + 2 × 1 + 1 × (-2) = 6
– Hence output from u4 is g(6) = 0.998

If the input to u1 is -2 and the input to u2 is -2, then:
– Input to u4 is (-2) × 3 + (-2) × 1 + 1 × (-2) = -10
– Hence output from u4 is g(-10) = 4.54 × 10⁻⁵
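These values can be reproduced in a few lines (assuming a sigmoid with k = 1, which matches the printed outputs):

```python
import math

def g(x):
    # Sigmoid with k = 1 (an assumption; it matches the slide's values)
    return 1.0 / (1.0 + math.exp(-x))

def u4_input(i1, i2):
    # Weights 3 and 1, bias input fixed at 1 with bias weight -2
    return i1 * 3 + i2 * 1 + 1 * (-2)

print(g(u4_input(2, 2)))    # 0.99752... ~= 0.998
print(g(u4_input(-2, -2)))  # 4.5397...e-05 ~= 4.54e-05
```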

Slide 19

Example 2

A second unit, u4, has inputs x and y, connection weights 2 and -1, and bias weight -1.

It will ‘fire’ if:

$2x - y - 1 \ge 0, \quad\text{i.e.}\quad y \le 2x - 1$

[Figure: the unit with inputs x and y, weights 2 and -1, and a bias input fixed at 1 with weight -1, next to the boundary line y = 2x - 1, which crosses the x-axis at 1/2 and the y-axis at -1]
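For later reuse when the two units are combined, both firing conditions can be written as small predicates (my own framing, anticipating the u4/u5 naming of slide 21):

```python
def u4_fires(x, y):
    """Example 1: weights 3 and 1, bias weight -2 (fires above y = -3x + 2)."""
    return 3 * x + 1 * y - 2 >= 0

def u5_fires(x, y):
    """Example 2: weights 2 and -1, bias weight -1 (fires below y = 2x - 1)."""
    return 2 * x - 1 * y - 1 >= 0
```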

Slide 20

Combining 2 Artificial Neurons

[Figure: the two example units side by side – the first with inputs x and y, weights 3 and 1, and bias weight -2 (boundary intercepts 2/3 and 2), the second with weights 2 and -1 and bias weight -1 (boundary intercepts 1/2 and -1)]

Slide 21

Combining neurons – artificial neural networks

[Figure: a network in which inputs x and y enter units u1 and u2; hidden unit u4 receives them with weights 3 and 1 and bias weight -2; hidden unit u5 receives them with weights 2 and -1 and bias weight -1; output unit u6 receives the hidden outputs with weights 20 and -20 and bias weight -2]

Slide 22

Combining neurons

Input to u4 is 3 × x + 1 × y - 2

Input to u5 is 2 × x + (-1) × y - 1

When x = 3, y = 0:
– Input to u4 is 7, input to u5 is 5
– Output from u4 is g(7) ≈ 0.999, output from u5 is g(5) ≈ 0.993
– Input to u6 is 0.999 × 20 + 0.993 × (-20) - 2 ≈ -1.88
– Output from u6 is g(-1.88) ≈ 0.13

Slide 23

Outputs

  i1     i2     o6
   3      0    0.13
   0.5    2    1.00
   0.5   -2    0.00
  -1      0    0.06
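A short forward-pass sketch reproduces this table and the numbers on the previous slide (assuming sigmoid activations with k = 1 throughout, and linear input units):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def network_output(x, y):
    """Forward pass through the network on slide 21."""
    o4 = sigmoid(3 * x + 1 * y - 2)        # hidden unit u4
    o5 = sigmoid(2 * x - 1 * y - 1)        # hidden unit u5
    return sigmoid(20 * o4 - 20 * o5 - 2)  # output unit u6

for x, y in [(3, 0), (0.5, 2), (0.5, -2), (-1, 0)]:
    print(x, y, round(network_output(x, y), 2))
# 3 0 0.13 | 0.5 2 1.0 | 0.5 -2 0.0 | -1 0 0.06
```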

Slide 24

Combining neurons

[Figure: plot of the ‘firing region’ of the combined network – the region lying above both boundary lines, y = -3x + 2 and y = 2x - 1 (marked intercepts 2, 2/3 and -1)]

Slide 25

Single-layer Multi-Layer Perceptron (MLP)

[Figure: an MLP with an input layer, a single hidden layer and an output layer]

Slide 26

Single-Layer MLP

Can characterise arbitrary convex regions

Defines the region using linear decision boundaries

Slide 27

Two-layer MLP

[Figure: an MLP with two hidden layers]

Slide 28

Two-Layer MLP

An MLP with two hidden layers can characterise arbitrary shapes:
– The first hidden layer characterises convex regions
– The second hidden layer combines these convex regions

There is no advantage in having more than two hidden layers.

Slide 29

MLP training

To define an MLP we must decide:
– The number of layers
– The number of input units
– The number of hidden units
– The number of output units

Once these are defined, the properties of the MLP are completely determined by the values of the weights.

How do we choose the weight values?

Slide 30

MLP training (continued)

MLP weights are learnt automatically from training data.

We have already seen computational techniques for estimating:
– The parameters of GMMs
– Centroid positions in clustering

Similarly, there is an iterative computational technique for estimating the MLP weights – ‘Error Back-Propagation’.

Slide 31

Error Back-Propagation (EBP)

EBP is a ‘gradient descent’ method, like others we have seen.

The first stage is to choose initial values for the weights. The EBP algorithm then changes the weights incrementally to identify the class boundaries.

It is only guaranteed to find a local optimum.
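The slides do not give the EBP update equations, but the gradient-descent flavour can be sketched for a single sigmoid neuron trained on labelled 2D points. This is entirely my own illustration, with an arbitrary (hypothetical) learning rate lr:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_neuron(data, epochs=1000, lr=0.5):
    """Gradient descent on squared error for one sigmoid neuron.

    data: list of ((x, y), target) pairs with target 0 or 1.
    Returns the learnt weights [w1, w2, w_bias].
    """
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (x, y), t in data:
            o = sigmoid(w[0] * x + w[1] * y + w[2])
            # dE/dw for E = (o - t)^2 / 2, via the chain rule:
            delta = (o - t) * o * (1 - o)
            w[0] -= lr * delta * x
            w[1] -= lr * delta * y
            w[2] -= lr * delta  # bias input is fixed at 1
    return w
```

Starting from initial weights and repeatedly applying small corrections is exactly the incremental behaviour described above; like the slide says, such a procedure can settle in a local optimum.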

Slide 32

Other types of ANN

Multi-Layer Perceptrons (MLP) are not the only types of ANNs

There are lots of others:– Radial Basis Function (RBF) networks

– Support Vector Machines (SVMs)

– …

There are also ANN interpretations of other methods

Slide 33

Summary

– Discrimination versus modelling
– Brief introduction to neural networks
– Definition of an ‘artificial neuron’
– Activation functions – linear and sigmoid
– Linear boundary defined by a single neuron
– Convex region defined by a single-layer MLP
– Two-layer MLPs