Artificial Neural Networks, Part 8

Page 1:

Artificial Neural Networks
Part 8

Page 2:

Neural Learning: Application of MLPs

Modelling of Dynamic Systems

Time-variant or time-dependent, e.g. financial markets, weather forecasting, power consumption.

The outputs of dynamic systems, at any instant of time, depend on their previous input and output values.

[Figure panels: MLPs in Functional Approximation; MLPs in Dynamic Systems.]

Page 3:

Neural Learning: Application of MLPs

Modelling of Dynamic Systems

Time delay neural network (TDNN)

Feedforward architectures with multiple layers, without any feedback.

x(t) = f(x(t-1), x(t-2), ..., x(t-n_p))

Autoregressive (AR) model
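As a small illustration of how such a model is set up in practice, here is a minimal NumPy sketch (the series, the lag order n_p = 3, and the helper name are illustrative assumptions, not from the slides). It unrolls a time series into the lagged input vectors that a feedforward MLP would be trained on:

```python
import numpy as np

def make_lagged_dataset(x, n_p):
    """Build (inputs, targets) pairs for an AR model:
    X[i] = [x(t-1), ..., x(t-n_p)],  y[i] = x(t)."""
    X = np.array([x[t - n_p:t][::-1] for t in range(n_p, len(x))])
    y = x[n_p:]
    return X, y

# Hypothetical series: a noisy sine wave
x = np.sin(np.linspace(0, 20, 200)) + 0.05 * np.random.randn(200)
X, y = make_lagged_dataset(x, n_p=3)
print(X.shape, y.shape)   # (197, 3) (197,)
```

Each row of X is one training pattern for the network, so no feedback connections are needed; the "memory" lives entirely in the tapped delay line of inputs.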


Page 6:

Neural Learning: Application of MLPs

Modelling of Dynamic Systems

MLP with feedback

It is able to “remember” previous outputs to produce the current or future response.

x(t) = f(x(t-1), x(t-2), ..., x(t-n_p), y(t-1), y(t-2), ..., y(t-n_q))
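To make the feedback idea concrete, here is a minimal sketch of closed-loop (recursive) prediction, where each output is fed back as an input for the next step. The one-step model f below is just a placeholder standing in for a trained MLP:

```python
import numpy as np

def recursive_forecast(f, history, n_p, steps):
    """Roll a one-step model forward by feeding its own
    outputs back in as inputs (closed-loop prediction)."""
    buf = list(history[-n_p:])                   # last n_p known values
    preds = []
    for _ in range(steps):
        x_next = f(np.array(buf[-n_p:][::-1]))  # model sees [x(t-1), ..., x(t-n_p)]
        preds.append(x_next)
        buf.append(x_next)                       # feedback: output becomes next input
    return np.array(preds)

# Placeholder one-step "model": a damped mean of the lagged inputs
f = lambda lags: 0.9 * lags.mean()
print(recursive_forecast(f, history=[1.0, 0.8, 0.9], n_p=3, steps=5))
```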


Page 8:

Neural Learning

You might want to try demo nnd12sd1. It allows you to see the error surface in two-dimensional space and inspect the learning.

Page 9:

Neural Learning

You might want to try demo nnd11gn.

Page 10:

Neural Learning

How do we decide the most suitable MLP topology?

How large should the network be?

• Computation time: the number of hidden nodes directly impacts the computation time required to train the network.

• Generalization vs. memorization

Page 11:

Generalization: Introduction

Recall the idea of getting a neural network to learn a classification decision boundary: our goal is for the network to generalize, i.e. to classify new inputs correctly.

If the training data contains noise, we don't want the training data to be classified totally accurately, as that is likely to reduce the generalization ability.

Page 12:

Generalization: Introduction

Similarly, if our network is required to recover an underlying function (curve fitting) from noisy data:

[Figure: two curve-fitting examples, f(x) vs. x, from the same noisy data.]

The network can give a more accurate generalization to new inputs if its output curve does not pass through all the data points. Again, allowing a larger error on the training data is likely to lead to better generalization.

Page 13:

Generalization: Introduction

Given a large network, it is possible that repeated training iterations successively improve the network's performance on the training data, e.g. by "memorizing" training samples, yet the resulting network may perform poorly on test data. This phenomenon is called over-training.

[Figure: under-training vs. over-training fits, f(x) vs. x.]
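The effect is easy to reproduce with any sufficiently flexible model. Here is a small NumPy sketch using polynomial fits of increasing degree as stand-ins for networks of increasing size (the data, noise level, and degrees are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(30)  # noisy training data
x_test = np.linspace(0, 1, 300)
y_test = np.sin(2 * np.pi * x_test)                        # the underlying function

for degree in (1, 3, 15):   # under-trained, reasonable, over-trained
    coeffs = np.polyfit(x, y, degree)
    train_rmse = np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))
    test_rmse = np.sqrt(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    print(f"degree {degree:2d}: train RMSE {train_rmse:.3f}, test RMSE {test_rmse:.3f}")
```

The degree-15 fit drives the training error towards zero yet does worse on the test points, which is exactly the over-training pattern sketched above.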

Page 14:

Generalization: Improving generalization

We need to avoid both under-fitting and over-fitting of the training data. There are a number of approaches for improving generalization:

1. Arrange to have the optimum number of free parameters (independent connection weights) in our model.

• Number of hidden layers

• Number of nodes in each layer

2. Stop the gradient descent training at the appropriate point.

Page 15:

Generalization: Improving generalization

We have talked about training data sets: the data used for training our networks. The testing data set is the unseen data that is used to test the network's generalization.

What we can do is assume that the training data and the testing data are drawn (randomly) from the same data set.

[Diagram: Dataset (total inputs & targets) split into training data (inputs & corresponding targets) and test data (inputs & corresponding targets).]

Page 16:

Generalization: Improving generalization

The portion of the data we have available for training that is withheld from the network training is called the validation data set, and the remainder of the data is called the training data set. This approach is called the hold-out method.

[Diagram: Dataset split into training data and test data; the training data is further split into a training data set and a validation data set (each with inputs & corresponding targets).]
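A minimal NumPy sketch of the hold-out split (the 25% validation fraction and the function name are illustrative choices):

```python
import numpy as np

def hold_out_split(X, y, val_fraction=0.25, seed=0):
    """Randomly withhold a validation set from the available training data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

X, y = np.arange(40).reshape(20, 2), np.arange(20)  # toy inputs & targets
X_tr, y_tr, X_val, y_val = hold_out_split(X, y)
print(len(X_tr), len(X_val))   # 15 5
```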

Cross-Validation

A general practical problem concerns how best to split the available training data into distinct training and validation data sets. For example:

• What fraction of the patterns should be in the validation set?

• Should the data be split randomly, or by some systematic algorithm?

Page 17:

Generalization: Improving generalization

Random subsampling cross-validation method

• Around 60-90% of the samples are chosen at random for training.

• This partitioning must be repeated several times during the learning process to provide (for each trial) different samples in both subsets.
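A minimal sketch of random subsampling (the 75% training fraction and three trials are illustrative choices); each trial draws a fresh random partition:

```python
import numpy as np

def random_subsampling_splits(n_samples, n_trials, train_fraction=0.75, seed=0):
    """Yield a new random train/validation index partition for each trial."""
    rng = np.random.default_rng(seed)
    n_train = int(n_samples * train_fraction)
    for _ in range(n_trials):
        idx = rng.permutation(n_samples)
        yield idx[:n_train], idx[n_train:]

for train_idx, val_idx in random_subsampling_splits(20, n_trials=3):
    print(sorted(val_idx))   # different validation samples in every trial
```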

Page 18:

Generalization: Improving generalization - Cross-Validation

K-fold cross-validation

We divide all the training data at random into K distinct subsets, train the network using K-1 subsets, and test the network on the remaining subset.

Page 19:

Generalization: Improving generalization - Cross-Validation

K-fold cross-validation

[Diagram: Dataset (inputs & targets) split into training data and test data; the training data is divided at random into Subset 1, Subset 2, ..., Subset K.]

Page 20:

Generalization: Improving generalization - Cross-Validation

K-fold cross-validation

The process of training and validation is then repeated for each of the K possible choices of the subset omitted from the training.

[Diagram: of the K subsets, one left-out subset is used for validation while the remaining subsets are used for training; the test data is kept aside.]

The average performance on the K omitted subsets is then our estimate of the generalization performance.

If K is made equal to the full sample size, this is called leave-one-out cross-validation (LOOCV).
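A minimal NumPy sketch of the whole procedure (the "model" below is a placeholder that just predicts the training mean; in practice it would be a freshly trained network per fold). Setting k equal to the number of samples gives LOOCV:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Randomly divide the sample indices into K distinct subsets (folds)."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(model_error, X, y, k):
    """Average the validation error over the K left-out folds."""
    folds = k_fold_indices(len(X), k)
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        errors.append(model_error(X[train_idx], y[train_idx], X[val_idx], y[val_idx]))
    return np.mean(errors)

def mean_model_error(X_tr, y_tr, X_val, y_val):
    """Placeholder model: predict the training mean, report RMSE on the fold."""
    return np.sqrt(np.mean((y_val - y_tr.mean()) ** 2))

X, y = np.random.randn(20, 3), np.random.randn(20)
print(cross_validate(mean_model_error, X, y, k=5))       # 5-fold estimate
print(cross_validate(mean_model_error, X, y, k=len(X)))  # LOOCV: K = sample size
```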

Page 21:

Generalization: Improving generalization - Cross-Validation

Venetian blinds (K-fold cross-validation)

• Simple and easy to implement.

• Generally safe to use if there are relatively many objects that are not in random order.

We divide all the training data into groups of K members, and the corresponding members from each group form a subset.

[Diagram: 20 samples in a row; with K = 4, every 4th sample falls into the same subset, interleaving subsets 1-4 like venetian blinds.]
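A minimal sketch of the venetian-blind assignment, matching the slide's example of 20 samples and K = 4:

```python
import numpy as np

def venetian_blind_folds(n_samples, k):
    """Every K-th sample lands in the same subset: with K = 4,
    samples 0, 4, 8, ... form subset 1, samples 1, 5, 9, ... subset 2, etc."""
    idx = np.arange(n_samples)
    return [idx[i::k] for i in range(k)]

for i, fold in enumerate(venetian_blind_folds(20, k=4), start=1):
    print(f"subset {i}: {fold}")
```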

Page 22:

Generalization: Improving generalization - Cross-Validation

Contiguous blocks (K-fold cross-validation)

• Simple and easy to implement.

• Safe to use when there are relatively many objects in random order.

We divide all the training data into groups of N/K members (N: number of samples), and the members of each subset are contiguous.

[Diagram: 20 samples in a row; with K = 4, each subset is a contiguous block of N/K = 5 samples.]
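The corresponding sketch for contiguous blocks, again with 20 samples and K = 4, so each subset holds N/K = 5 consecutive samples:

```python
import numpy as np

def contiguous_block_folds(n_samples, k):
    """Split the samples, kept in their original order, into K contiguous blocks."""
    return np.array_split(np.arange(n_samples), k)

for i, fold in enumerate(contiguous_block_folds(20, k=4), start=1):
    print(f"subset {i}: {fold}")   # subset 1: [0 1 2 3 4], subset 2: [5 6 7 8 9], ...
```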

Page 23:

Generalization: Improving generalization

Weight Restriction

The most obvious and simplest way to prevent over-fitting in our neural network models is to restrict the number of free parameters they have.

• Some form of validation scheme can be used to find the best number for each given problem.

[Plot: RMSECV vs. number of hidden neurons; the minimum of the curve indicates the best network size.]
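A sketch of such a validation scheme using scikit-learn (the toy data and the candidate layer sizes are illustrative assumptions): computing RMSECV over a range of hidden-layer sizes traces out a curve like the one above, and its minimum suggests the size to use:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

# Toy regression problem standing in for real data
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(200)

for n_hidden in (1, 2, 4, 8, 16, 32):
    mlp = MLPRegressor(hidden_layer_sizes=(n_hidden,), max_iter=3000, random_state=0)
    scores = cross_val_score(mlp, X, y, cv=5, scoring="neg_root_mean_squared_error")
    print(f"{n_hidden:2d} hidden neurons: RMSECV = {-scores.mean():.3f}")
```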

Page 24:

Generalization: Improving generalization - Early Stopping

For the iterative gradient-descent-based network training procedures we have considered (e.g. batch back-propagation), the training set error will naturally decrease with an increasing number of epochs of training.

[Figure: error vs. epoch; the training-set error falls steadily, while the validation-set error falls in the under-fitting region and rises again once over-fitting begins.]

The error on the unseen validation and testing data sets, however, will start off decreasing as the under-fitting is reduced, but then it will eventually begin to increase again as over-fitting occurs.


Page 25:

Generalization: Improving generalization - Early Stopping

[Figure: error vs. epoch; the validation-set error rises and falls several times during training while the training-set error keeps decreasing.]

One potential problem with the idea of stopping early is that the validation error may go up and down numerous times during training. The safest approach is generally to train to convergence (or at least until it is clear that the validation error is unlikely to fall again), and then use the weights recorded at the point of minimum validation error.

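A minimal sketch of this "train long, keep the best weights" strategy (a linear model trained by gradient descent stands in for the MLP; the data, learning rate, and epoch count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5) + 0.5 * rng.standard_normal(100)  # noisy targets
X_tr, y_tr, X_val, y_val = X[:75], y[:75], X[75:], y[75:]        # hold-out split

w = np.zeros(5)
best_w, best_err = w.copy(), np.inf
for epoch in range(500):                       # train to (near) convergence ...
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= 0.01 * grad
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_err:                     # ... while remembering the weights
        best_err, best_w = val_err, w.copy()   # at the lowest validation error seen
print(f"best validation MSE: {best_err:.4f}")
w = best_w   # the early-stopped weights are the ones we actually use
```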
