Artificial Neural Networks, Part 8

Page 1:

Artificial Neural Networks
Part 8

Page 2:

Neural Learning: Application of MLPs

Modelling of Dynamic Systems

Time-variant or time-dependent, e.g. financial markets, weather forecasting, power consumption.

The outputs of dynamic systems, at any instant of time, depend on their previous input and output values.

[Figure panels: MLPs in Functional Approximation; MLPs in Dynamic Systems.]

Page 3:

Neural Learning: Application of MLPs

Modelling of Dynamic Systems

Time delay neural network (TDNN)

Feedforward architectures with multiple layers, without any feedback.

x(t) = f(x(t-1), x(t-2), ..., x(t-n_p))

Autoregressive (AR) model
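As a small illustration of how such a model is set up in practice, here is a minimal NumPy sketch (the series, the lag order n_p = 3, and the helper name are illustrative assumptions, not from the slides). It unrolls a time series into the lagged input vectors that a feedforward MLP would be trained on:

```python
import numpy as np

def make_lagged_dataset(x, n_p):
    """Build (inputs, targets) pairs for an AR model:
    X[i] = [x(t-1), ..., x(t-n_p)],  y[i] = x(t)."""
    X = np.array([x[t - n_p:t][::-1] for t in range(n_p, len(x))])
    y = x[n_p:]
    return X, y

# Hypothetical series: a noisy sine wave
x = np.sin(np.linspace(0, 20, 200)) + 0.05 * np.random.randn(200)
X, y = make_lagged_dataset(x, n_p=3)
print(X.shape, y.shape)   # (197, 3) (197,)
```

Each row of X is one training pattern for the network, so no feedback connections are needed; the "memory" lives entirely in the tapped delay line of inputs.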


Page 6:

Neural Learning: Application of MLPs

Modelling of Dynamic Systems

MLP with feedback

It is able to “remember” previous outputs to produce the current or future response.

x(t) = f(x(t-1), x(t-2), ..., x(t-n_p), y(t-1), y(t-2), ..., y(t-n_q))
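To make the feedback idea concrete, here is a minimal sketch of closed-loop (recursive) prediction, where each output is fed back as an input for the next step. The one-step model f below is just a placeholder standing in for a trained MLP:

```python
import numpy as np

def recursive_forecast(f, history, n_p, steps):
    """Roll a one-step model forward by feeding its own
    outputs back in as inputs (closed-loop prediction)."""
    buf = list(history[-n_p:])                   # last n_p known values
    preds = []
    for _ in range(steps):
        x_next = f(np.array(buf[-n_p:][::-1]))  # model sees [x(t-1), ..., x(t-n_p)]
        preds.append(x_next)
        buf.append(x_next)                       # feedback: output becomes next input
    return np.array(preds)

# Placeholder one-step "model": a damped mean of the lagged inputs
f = lambda lags: 0.9 * lags.mean()
print(recursive_forecast(f, history=[1.0, 0.8, 0.9], n_p=3, steps=5))
```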


Page 8:

Neural Learning

You might want to try demo nnd12sd1. It allows you to see the error surface in two-dimensional space and inspect the learning.

Page 9:

Neural Learning

You might want to try demo nnd11gn.

Page 10:

Neural Learning

How do we decide the most suitable MLP topology?

How large should the network be?

• Computation time: the number of hidden nodes directly impacts the computation time required to train the network.

• Generalization vs. memorization

Page 11:

Generalization: Introduction

Recall the idea of getting a neural network to learn a classification decision boundary: our goal is for the network to generalize, i.e. to classify new inputs correctly.

If the training data contains noise, we don't want the training data to be classified totally accurately, as that is likely to reduce the generalization ability.

Page 12:

Generalization: Introduction

Similarly, if our network is required to recover an underlying function (curve fitting) from noisy data:

[Figure: two curve-fitting examples, f(x) vs. x, from the same noisy data.]

The network can give a more accurate generalization to new inputs if its output curve does not pass through all the data points. Again, allowing a larger error on the training data is likely to lead to better generalization.

Page 13:

Generalization: Introduction

Given a large network, it is possible that repeated training iterations successively improve the network's performance on the training data, e.g. by "memorizing" training samples, yet the resulting network may perform poorly on test data. This phenomenon is called over-training.

[Figure: under-training vs. over-training fits, f(x) vs. x.]
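The effect is easy to reproduce with any sufficiently flexible model. Here is a small NumPy sketch using polynomial fits of increasing degree as stand-ins for networks of increasing size (the data, noise level, and degrees are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(30)  # noisy training data
x_test = np.linspace(0, 1, 300)
y_test = np.sin(2 * np.pi * x_test)                        # the underlying function

for degree in (1, 3, 15):   # under-trained, reasonable, over-trained
    coeffs = np.polyfit(x, y, degree)
    train_rmse = np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))
    test_rmse = np.sqrt(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    print(f"degree {degree:2d}: train RMSE {train_rmse:.3f}, test RMSE {test_rmse:.3f}")
```

The degree-15 fit drives the training error towards zero yet does worse on the test points, which is exactly the over-training pattern sketched above.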

Page 14:

Generalization: Improving generalization

We need to avoid both under-fitting and over-fitting of the training data. There are a number of approaches for improving generalization:

1. Arrange to have the optimum number of free parameters (independent connection weights) in our model.

• Number of hidden layers

• Number of nodes in each layer

2. Stop the gradient descent training at the appropriate point.

Page 15:

Generalization: Improving generalization

We have talked about training data sets: the data used for training our networks. The testing data set is the unseen data that is used to test the network's generalization.

What we can do is assume that the training data and the testing data are drawn (randomly) from the same data set.

[Diagram: Dataset (total inputs & targets) split into training data (inputs & corresponding targets) and test data (inputs & corresponding targets).]

Page 16:

Generalization: Improving generalization

The portion of the data we have available for training that is withheld from the network training is called the validation data set, and the remainder of the data is called the training data set. This approach is called the hold-out method.

[Diagram: Dataset split into training data and test data; the training data is further split into a training data set and a validation data set (each with inputs & corresponding targets).]
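A minimal NumPy sketch of the hold-out split (the 25% validation fraction and the function name are illustrative choices):

```python
import numpy as np

def hold_out_split(X, y, val_fraction=0.25, seed=0):
    """Randomly withhold a validation set from the available training data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

X, y = np.arange(40).reshape(20, 2), np.arange(20)  # toy inputs & targets
X_tr, y_tr, X_val, y_val = hold_out_split(X, y)
print(len(X_tr), len(X_val))   # 15 5
```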

Cross-Validation

A general practical problem concerns how best to split the available training data into distinct training and validation data sets. For example:

• What fraction of the patterns should be in the validation set?

• Should the data be split randomly, or by some systematic algorithm?

Page 17:

Generalization: Improving generalization

Random subsampling cross-validation method

• Around 60-90% of the samples are chosen at random for training.

• This partitioning must be repeated several times during the learning process to provide (for each trial) different samples in both subsets.
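A minimal sketch of random subsampling (the 75% training fraction and three trials are illustrative choices); each trial draws a fresh random partition:

```python
import numpy as np

def random_subsampling_splits(n_samples, n_trials, train_fraction=0.75, seed=0):
    """Yield a new random train/validation index partition for each trial."""
    rng = np.random.default_rng(seed)
    n_train = int(n_samples * train_fraction)
    for _ in range(n_trials):
        idx = rng.permutation(n_samples)
        yield idx[:n_train], idx[n_train:]

for train_idx, val_idx in random_subsampling_splits(20, n_trials=3):
    print(sorted(val_idx))   # different validation samples in every trial
```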

Page 18:

Generalization: Improving generalization - Cross-Validation

K-fold cross-validation

We divide all the training data at random into K distinct subsets, train the network using K-1 subsets, and test the network on the remaining subset.

Page 19:

Generalization: Improving generalization - Cross-Validation

K-fold cross-validation

[Diagram: Dataset (inputs & targets) split into training data and test data; the training data is divided at random into Subset 1, Subset 2, ..., Subset K.]

Page 20:

Generalization: Improving generalization - Cross-Validation

K-fold cross-validation

The process of training and validation is then repeated for each of the K possible choices of the subset omitted from the training.

[Diagram: of the K subsets, one left-out subset is used for validation while the remaining subsets are used for training; the test data is kept aside.]

The average performance on the K omitted subsets is then our estimate of the generalization performance.

If K is made equal to the full sample size, this is called leave-one-out cross-validation (LOOCV).
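A minimal NumPy sketch of the whole procedure (the "model" below is a placeholder that just predicts the training mean; in practice it would be a freshly trained network per fold). Setting k equal to the number of samples gives LOOCV:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Randomly divide the sample indices into K distinct subsets (folds)."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(model_error, X, y, k):
    """Average the validation error over the K left-out folds."""
    folds = k_fold_indices(len(X), k)
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        errors.append(model_error(X[train_idx], y[train_idx], X[val_idx], y[val_idx]))
    return np.mean(errors)

def mean_model_error(X_tr, y_tr, X_val, y_val):
    """Placeholder model: predict the training mean, report RMSE on the fold."""
    return np.sqrt(np.mean((y_val - y_tr.mean()) ** 2))

X, y = np.random.randn(20, 3), np.random.randn(20)
print(cross_validate(mean_model_error, X, y, k=5))       # 5-fold estimate
print(cross_validate(mean_model_error, X, y, k=len(X)))  # LOOCV: K = sample size
```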

Page 21:

Generalization: Improving generalization - Cross-Validation

Venetian blinds (K-fold cross-validation)

• Simple and easy to implement.

• Generally safe to use if there are relatively many objects that are not in random order.

We divide all the training data into groups of K members, and the corresponding members from each group form a subset.

[Diagram: 20 samples in a row; with K = 4, every 4th sample falls into the same subset, interleaving subsets 1-4 like venetian blinds.]
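A minimal sketch of the venetian-blind assignment, matching the slide's example of 20 samples and K = 4:

```python
import numpy as np

def venetian_blind_folds(n_samples, k):
    """Every K-th sample lands in the same subset: with K = 4,
    samples 0, 4, 8, ... form subset 1, samples 1, 5, 9, ... subset 2, etc."""
    idx = np.arange(n_samples)
    return [idx[i::k] for i in range(k)]

for i, fold in enumerate(venetian_blind_folds(20, k=4), start=1):
    print(f"subset {i}: {fold}")
```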

Page 22:

Generalization: Improving generalization - Cross-Validation

Contiguous blocks (K-fold cross-validation)

• Simple and easy to implement.

• Safe to use when there are relatively many objects in random order.

We divide all the training data into groups of N/K members (N: number of samples), and the members of each subset are contiguous.

[Diagram: 20 samples in a row; with K = 4, each subset is a contiguous block of N/K = 5 samples.]
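The corresponding sketch for contiguous blocks, again with 20 samples and K = 4, so each subset holds N/K = 5 consecutive samples:

```python
import numpy as np

def contiguous_block_folds(n_samples, k):
    """Split the samples, kept in their original order, into K contiguous blocks."""
    return np.array_split(np.arange(n_samples), k)

for i, fold in enumerate(contiguous_block_folds(20, k=4), start=1):
    print(f"subset {i}: {fold}")   # subset 1: [0 1 2 3 4], subset 2: [5 6 7 8 9], ...
```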

Page 23:

Generalization: Improving generalization

Weight Restriction

The most obvious and simplest way to prevent over-fitting in our neural network models is to restrict the number of free parameters they have.

• Some form of validation scheme can be used to find the best number for each given problem.

[Plot: RMSECV vs. number of hidden neurons; the minimum of the curve indicates the best network size.]
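A sketch of such a validation scheme using scikit-learn (the toy data and the candidate layer sizes are illustrative assumptions): computing RMSECV over a range of hidden-layer sizes traces out a curve like the one above, and its minimum suggests the size to use:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

# Toy regression problem standing in for real data
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(200)

for n_hidden in (1, 2, 4, 8, 16, 32):
    mlp = MLPRegressor(hidden_layer_sizes=(n_hidden,), max_iter=3000, random_state=0)
    scores = cross_val_score(mlp, X, y, cv=5, scoring="neg_root_mean_squared_error")
    print(f"{n_hidden:2d} hidden neurons: RMSECV = {-scores.mean():.3f}")
```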

Page 24:

Generalization: Improving generalization - Early Stopping

For the iterative gradient-descent-based network training procedures we have considered (e.g. batch back-propagation), the training set error will naturally decrease with an increasing number of epochs of training.

[Figure: error vs. epoch; the training-set error falls steadily, while the validation-set error falls in the under-fitting region and rises again once over-fitting begins.]

The error on the unseen validation and testing data sets, however, will start off decreasing as the under-fitting is reduced, but then it will eventually begin to increase again as over-fitting occurs.


Page 25:

Generalization: Improving generalization - Early Stopping

[Figure: error vs. epoch; the validation-set error rises and falls several times during training while the training-set error keeps decreasing.]

One potential problem with the idea of stopping early is that the validation error may go up and down numerous times during training. The safest approach is generally to train to convergence (or at least until it is clear that the validation error is unlikely to fall again), and then use the weights recorded at the point of minimum validation error.

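A minimal sketch of this "train long, keep the best weights" strategy (a linear model trained by gradient descent stands in for the MLP; the data, learning rate, and epoch count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5) + 0.5 * rng.standard_normal(100)  # noisy targets
X_tr, y_tr, X_val, y_val = X[:75], y[:75], X[75:], y[75:]        # hold-out split

w = np.zeros(5)
best_w, best_err = w.copy(), np.inf
for epoch in range(500):                       # train to (near) convergence ...
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= 0.01 * grad
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_err:                     # ... while remembering the weights
        best_err, best_w = val_err, w.copy()   # at the lowest validation error seen
print(f"best validation MSE: {best_err:.4f}")
w = best_w   # the early-stopped weights are the ones we actually use
```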
