7/29/2019 Adaptive neural network model for time-series forecasting.pdf
Stochastics and Statistics
Adaptive neural network model for time-series forecasting
W.K. Wong a,*, Min Xia a,b, W.C. Chu a
a Institute of Textiles and Clothing, The Hong Kong Polytechnic University, Hong Kong
b College of Information Science and Technology, Donghua University, Shanghai, China
ARTICLE INFO
Article history:
Received 27 January 2010
Accepted 14 May 2010
Available online 1 June 2010
Keywords:
Time-series
Forecasting
Adaptive metrics
Neural networks
ABSTRACT
In this study, a novel adaptive neural network (ADNN) with adaptive metrics of inputs and a new mechanism for admixture of outputs is proposed for time-series prediction. The adaptive metrics of inputs can solve the problems of amplitude change and trend determination, and avoid the over-fitting of networks. The new mechanism for admixture of outputs can adjust forecasting results by the relative error and make them more accurate. The proposed ADNN method can predict periodical time-series with a complicated structure. The experimental results show that the proposed model outperforms the auto-regression (AR), artificial neural network (ANN), and adaptive k-nearest neighbors (AKN) models. The ADNN model benefits from the merits of the ANN and the AKN through its novel structure, with high robustness particularly for both chaotic and real time-series predictions.

© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Many planning activities require prediction of the behavior of
variables (e.g. economic, financial, traffic and physical). The predictions support the strategic decisions of organizations (Makridakis,
1996), which in turn sustain a practical interest in forecasting
methods. Time-series methods are generally used to model fore-
casting systems when there is not much information about the
generation process of the underlying variable and when other vari-
ables provide no clear explanation about the studied variable
(Zhang, 2003).
Time-series forecasting is used to forecast the future based on
historical observations (Makridakis et al., 1998). There have been
many approaches to modeling time-series, depending on the theory or assumptions about the relationships in the data (Huarng
and Yu, 2006; Chen and Hwang, 2000; Taylor and Buizza, 2002;
Kim and Kim, 1997; Zhang et al., 1998; Wang and Chien, 2006;
Singh and Deo, 2007). Traditional methods, such as time-series
regression, exponential smoothing and autoregressive integrated
moving average (ARIMA) (Brooks, 2002), are based on linear models. All these methods assume linear relationships among the past
values of the forecast variable and therefore non-linear patterns
cannot be captured by these models. One problem that makes
developing and implementing this type of time-series model difficult is that the model must be specified and a probability distribution for the data must be assumed (Hansen et al., 2002).
Approximation of linear models to complex real-world problems
is not always satisfactory.
Recently, artificial neural networks (ANN) have been proposed
as a promising alternative for time-series forecasting. A large number of successful applications have shown that neural networks
can be a very useful tool for time-series modeling and forecasting (Adya and Collopy, 1998; Zhang et al., 1998; Celik and Karatepe,
2007; Wang and Chien, 2006; Sahoo and Ray, 2006; Singh and
Deo, 2007; Barbounis and Teocharis, 2007; Bodyanskiy and Popov,
2006; Freitas and Rodrigues, 2006). The reason is that the ANN is a universal function approximator which is capable of mapping any linear or non-linear function (Cybenko, 1989; Funahashi, 1989). Neural networks are basically a data-driven method with few a priori assumptions about underlying models. Instead they let the data speak for themselves and have the capability to identify the underlying functional relationship among the data. In addition, the ANN
is capable of tolerating the presence of chaotic components and
thus is better than most methods (Masters, 1995). This capacity
is particularly important, as many relevant time-series possess significant chaotic components.
However, since the neural network lacks a systematic procedure for model-building, the forecasting result is not always accurate when the input data are very different from the training data.
Like other flexible non-linear estimation methods such as kernel
regression and smoothing splines, the ANN may suffer either
under-fitting or over-fitting (Moody, 1992; Geman et al., 1992;
Bartlett, 1997). A network that is not sufficiently complex can fail
to fully detect the signal in a complicated data set and lead to
under-fitting. A network that is too complex may fit not only the
signal but also the noise and lead to over-fitting. Over-fitting is
especially misleading because it can easily lead to wild prediction
far beyond the range of the training data even with the noise-free
data. In order to solve this problem, a novel ANN model is proposed
0377-2217/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.ejor.2010.05.022
* Corresponding author. Tel.: +00852 64300917.
E-mail address: [email protected] (W.K. Wong).
European Journal of Operational Research 207 (2010) 807–816
in this study with the adaptive metrics of inputs, and the output
data is evolved by a mechanism for admixture. The adaptive met-
rics of inputs of the model can adapt to local variations of trends
and amplitudes. Most inputs of the network are close to the historical data in order to avoid a dramatic increase in the forecasting error due to the big difference between training data and input data.
In using the proposed mechanism for admixture of outputs, the
forecasting result can be adjusted by the relative error, making
the forecasting result more accurate.
The forecasting results generated by the proposed model are
compared with those obtained by the traditional statistical AR
model, traditional ANN architectures (BP network), and adaptive
k-nearest neighbors (AKN) method (Kulesh et al., 2008) in the related literature. The experimental results indicate that the proposed model outperforms the other models, especially in chaotic and real-data time-series predictions.
This paper is organized as follows. In the next section, the fun-
damental principle of the proposed method is introduced. The
experimental results are presented in Section 3. The last section
concludes this study.
2. Methodology
We focus on one-step-ahead point forecasting in this work. Let $y_1, y_2, y_3, \ldots, y_t$ be a time-series. At time t, for t >= 1, the next value $y_{t+1}$ is predicted based on the observed realizations of $y_t, y_{t-1}, y_{t-2}, \ldots, y_1$.
2.1. The ANN approach to time-series modeling
The ANN is a flexible computing framework for a broad range of non-linear problems (Wong et al., 2000). The network model is largely determined by data characteristics. A single hidden-layer feed-forward network is the most widely used model for time-series modeling and forecasting (Zhang and Qi, 2005). The model is characterized by a network of three layers of simple processing units connected by acyclic links. The hidden layer can capture the non-linear relationship among variables. Each layer consists of multiple neurons that are connected to neurons in adjacent layers. The relationship between the output $y_{t+1}$ and the inputs $y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-p+1}$ has the following mathematical representation:
y_{t+1} = a_0 + \sum_{j=1}^{q} a_j\, g\!\left( b_{0j} + \sum_{i=1}^{p} b_{ij}\, y_{t-i+1} \right) + e_t,   (1)
where a_j (j = 0, 1, 2, ..., q) and b_{ij} (i = 0, 1, 2, ..., p; j = 1, 2, 3, ..., q) are the model parameters, called connection weights; p is the number of input nodes and q is the number of hidden nodes. The logistic function is often used as the hidden-layer transfer function, which is

g(x) = \frac{1}{1 + e^{-x}}.   (2)
A neural network can be trained on the historical data of a time-series in order to capture its characteristics. The model parameters (connection weights and node biases) can be adjusted iteratively by minimizing the forecasting errors (Liu et al., 1995).
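As an illustration, the forward pass of Eqs. (1) and (2) can be sketched in a few lines of NumPy; the weights below are random placeholders rather than trained values, so this only shows the shape of the computation:

```python
import numpy as np

def logistic(x):
    # Eq. (2): hidden-layer transfer function
    return 1.0 / (1.0 + np.exp(-x))

def ann_forecast(window, a0, a, b0, b):
    """One-step-ahead forecast following Eq. (1).

    window : last p observations (y_t, y_{t-1}, ..., y_{t-p+1})
    a0     : output bias; a : (q,) hidden-to-output weights
    b0     : (q,) hidden biases; b : (q, p) input-to-hidden weights
    """
    hidden = logistic(b0 + b @ window)   # q hidden activations
    return a0 + a @ hidden               # scalar forecast of y_{t+1}

rng = np.random.default_rng(0)
p, q = 4, 8                              # input and hidden node counts
window = rng.normal(size=p)
y_next = ann_forecast(window, 0.1, rng.normal(size=q),
                      rng.normal(size=q), rng.normal(size=(q, p)))
print(float(y_next))
```

In practice the weights would be fitted by back-propagation on (window, next-value) pairs, as described below.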
2.2. Adaptive neural network model for forecasting (ADNN)
It is well known that the ANN may suffer either under-fitting or over-fitting (Moody, 1992; Geman et al., 1992; Bartlett, 1997). A network that is not sufficiently complex can fail to fully detect the signal, which leads to under-fitting. Over-fitting generally occurs when a model is excessively complex. A model that under-fits or over-fits will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data. Of these two problems, over-fitting is the more important when the signal data are sufficient and the network is sufficiently complex. Thus, in this paper we focus on the problem of over-fitting for the ANN. Generally, an ANN algorithm is said to over-fit relative to a simpler one if it is more accurate in fitting known data (hindsight) but less accurate in predicting new data (foresight).
In order to avoid over-fitting, the adaptive neural network model is proposed. In this model, the hindsight data are used to modify the inputs of the ANN in the prediction process, making the inputs approach the learning data. Thus, this algorithm can reduce the chance of over-fitting. Based on the current ANN, an extension is made to develop the adaptive neural network (ADNN) model for time-series forecasting. Firstly, a strategy is used to initialize the input data $y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-m+1}$, where m is the number of input nodes. The strategy adopts adaptive metrics similar to those of the adaptive k-nearest neighbor method. The data set $y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-m+1}$ is compared with the other parts of this time-series that have the same length. The determination of the closeness measure is the major factor in prediction accuracy. Closeness is usually defined in terms of a metric distance on the Euclidean space. The most common choices are the Minkowski metrics:
L_M(Y_t, Y_r) = \left( |y_t - y_r|^d + |y_{t-1} - y_{r-1}|^d + \cdots + |y_{t-m+1} - y_{r-m+1}|^d \right)^{1/d}.   (3)
This equation gives the value difference between Y_t and Y_r, but the differences of trends and amplitudes are not represented. In time-series forecasting, the information on trends and amplitudes is a crucial factor. In this study, adaptive metrics are introduced to solve this problem, formulated as:
L_A(Y_t, Y_r) = \min_{k_r, u_r} f_r(k_r, u_r),   (4)

f_r(k_r, u_r) = \left( |y_t - k_r y_r - u_r|^d + |y_{t-1} - k_r y_{r-1} - u_r|^d + \cdots + |y_{t-m+1} - k_r y_{r-m+1} - u_r|^d \right)^{1/d},   (5)

where h_r and l_r are the largest and smallest elements of the vector Y_r respectively, k_r ∈ [1, h_r/l_r] and u_r ∈ [0, h_r − l_r]. The minimization parameter k_r equilibrates the amplitude difference between Y_t and Y_r. The parameter u_r is responsible for the trend of the time-series.
The optimization problem (4) can be solved by the Levenberg–Marquardt algorithm (Press et al., 1992) or other gradient methods for d >= 1. In this study, d is set to 2, which gives the widely used Euclidean metric:
f_r(k_r, u_r) = \left( |y_t - k_r y_r - u_r|^2 + |y_{t-1} - k_r y_{r-1} - u_r|^2 + \cdots + |y_{t-m+1} - k_r y_{r-m+1} - u_r|^2 \right)^{1/2}.   (6)
For d = 2, two equations are considered:

\frac{\partial f_r(k_r, u_r)}{\partial k_r} = 0, \qquad \frac{\partial f_r(k_r, u_r)}{\partial u_r} = 0.   (7)
When the corresponding linear system is solved, the solution of the minimization problem can be obtained analytically:

u_r = \frac{z_1 z_2 - z_3 z_4}{m z_2 - z_3^2}, \qquad k_r = \frac{m z_4 - z_1 z_3}{m z_2 - z_3^2},
where
z_1 = \sum_{i=1}^{m} y_{t-i+1}, \quad z_2 = \sum_{i=1}^{m} y_{r-i+1}^2, \quad z_3 = \sum_{i=1}^{m} y_{r-i+1}, \quad z_4 = \sum_{i=1}^{m} y_{r-i+1}\, y_{t-i+1}.
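This closed-form fit can be sketched directly; the code below follows the z_1–z_4 notation of the text and cross-checks itself on an exact affine relationship (an illustration, not the authors' implementation):

```python
import numpy as np

def fit_amplitude_trend(y_t, y_r):
    """Solve Eq. (7) analytically: the (k_r, u_r) minimizing
    sum_i (y_t[i] - k_r * y_r[i] - u_r)**2 over the m-point windows."""
    y_t, y_r = np.asarray(y_t, float), np.asarray(y_r, float)
    m = len(y_t)
    z1 = y_t.sum()           # z_1: sum of the target window
    z2 = (y_r ** 2).sum()    # z_2: sum of squares of the neighbor window
    z3 = y_r.sum()           # z_3: sum of the neighbor window
    z4 = (y_r * y_t).sum()   # z_4: cross term
    denom = m * z2 - z3 ** 2
    u_r = (z1 * z2 - z3 * z4) / denom
    k_r = (m * z4 - z1 * z3) / denom
    return k_r, u_r

# If y_t is an exact affine image of y_r, the fit recovers it.
y_r = np.array([1.0, 2.0, 3.0, 4.0])
y_t = 2.0 * y_r + 0.5
k_r, u_r = fit_amplitude_trend(y_t, y_r)
print(k_r, u_r)  # 2.0 0.5
```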
Based on this strategy, the adaptive k-nearest neighbors are chosen, and the input vector of the first network (known as the main network) can be defined as:

input_v = \left( q^v_t, q^v_{t-1}, \ldots, q^v_{t-p+1} \right) = \left( \frac{y_t - u_{r_v}}{k_{r_v}}, \frac{y_{t-1} - u_{r_v}}{k_{r_v}}, \ldots, \frac{y_{t-p+1} - u_{r_v}}{k_{r_v}} \right).   (8)
Most input values are thus close to the historical data; without this transformation, the forecasting error would increase dramatically because of the big difference between training data and input data. In order to get more accurate results for the time-series $y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-p+1}$, k sets of inputs are used, and the output vector is output_v = b_v, v = 1, 2, ..., k. The forecasting result is affected by the different values of L_A(Y_t, Y_{r_v}), k_{r_v}, u_{r_v} and |t − r_v| for v = 1, 2, ..., k. The relative error is used to measure the impact of L_A(Y_t, Y_{r_v}), k_{r_v}, u_{r_v} and |t − r_v|. The relative error is defined as RE_v = (y_{r_v} − \tilde{y}_{r_v}) / y_{r_v}, where y_{r_v} is the source point and \tilde{y}_{r_v} is the predicted point. In this study, the second neural network (known as the modified network) is used to find the relationship between the four factors and RE_v.
The estimated result of RE_v is \widetilde{RE}_v, which is presented as follows:

\widetilde{RE}_v = f\!\left( L_A(Y_t, Y_{r_v}),\, k_{r_v},\, u_{r_v},\, |t - r_v| \right).   (9)

The mechanism for admixture of outputs is presented as follows:

y_{t+1} = \frac{1}{U} \sum_{v=1}^{k} \left( k_{r_v} b_v + u_{r_v} \right) e^{-\widetilde{RE}_v},   (10)

U = \sum_{v=1}^{k} e^{-\widetilde{RE}_v}.   (11)
From Eq. (10), the forecasting result of y_{t+1} is calculated from the b_v, v = 1, 2, ..., k, with different weighting coefficients. Based on the methodology proposed above, the forecasting scheme can be formulated as shown in Fig. 1.
The steps of the algorithm of the proposed method are given as follows.
Step 1: Train the two neural networks using the historical data. In the first neural network, $y_i, y_{i-1}, y_{i-2}, \ldots, y_{i-m+1}$ are the input training data, and $y_{i+1}$ is the output training data. In the second neural network, L_A(Y_t, Y_{r_v}), k_{r_v}, u_{r_v} and |t − r_v| are the training inputs, and the relative error RE_v is the training output. The BP algorithm is used to train these two neural networks.
Step 2: Compare the data set $y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-m+1}$ with the other parts of the time-series using the adaptive metric distance on the Euclidean space based on Eq. (6).
Step 3: Choose the k-nearest neighbors and obtain k_{r_v}, u_{r_v} based on Eq. (7). Initialize the input data of the first neural network according to Eq. (8). The input data of the first neural network are $q^v_i, q^v_{i-1}, q^v_{i-2}, \ldots, q^v_{i-m+1}$, v = 1, 2, ..., k.
Step 4: Apply the first neural network and obtain the results of the outputs output_v = b_v, v = 1, 2, ..., k.
Step 5: Use L_A(Y_t, Y_{r_v}), k_{r_v}, u_{r_v} and |t − r_v| to predict the relative error \widetilde{RE}_v; the number of hidden neurons is 5 for all simulations in the second neural network.
Step 6: Apply the mechanism for admixture of Eq. (10) and obtain the forecasting result of y_{t+1}.
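The admixture step (Steps 4–6) can be sketched as follows. The exponent sign in the $e^{-\widetilde{RE}_v}$ weights is reconstructed from context (neighbors with larger predicted relative error should contribute less), so treat that detail as an assumption:

```python
import numpy as np

def admix_outputs(b, k_r, u_r, re_hat):
    """Eqs. (10)-(11): combine the k candidate network outputs b_v,
    mapped back to the original scale via k_{r_v} and u_{r_v},
    weighted by exp(-RE_v) and normalized by U."""
    b, k_r, u_r = map(np.asarray, (b, k_r, u_r))
    w = np.exp(-np.asarray(re_hat))                 # summands of Eq. (11)
    U = w.sum()                                     # normalizing constant U
    return float(((k_r * b + u_r) * w).sum() / U)   # Eq. (10)

# Three neighbors: network outputs b_v, fitted (k, u), predicted REs.
# The second neighbor has a large predicted error, so it is down-weighted.
y_next = admix_outputs(b=[1.0, 1.1, 0.9],
                       k_r=[1.0, 1.0, 1.0],
                       u_r=[0.0, 0.0, 0.0],
                       re_hat=[0.01, 0.5, 0.02])
print(y_next)
```

With equal predicted relative errors the mechanism reduces to a plain average of the rescaled outputs, which is a useful sanity check.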
3. Numerical simulations
Since the auto-regression (AR) model, the traditional back-propagation (BP) ANN architecture and the adaptive k-nearest neighbors (AKN) method are popular forecasting methods, the performance of the proposed model is benchmarked against Zhang's AR model (Zhang, 2003), Adya's ANN model (Adya and Collopy, 1998) and Kulesh's AKN model (Kulesh et al., 2008) reported in the literature. To illustrate the accuracy of the method, several deterministic and chaotic time-series are generated and predicted, and three real time-series are considered in this section. The Mean Absolute Percentage Error (MAPE) statistic is used to evaluate the forecasting performance of the model. The MAPE is regarded as one of the standard statistical performance measures and takes the following form:
MAPE = \frac{1}{M} \sum_{i=1}^{M} \left| \frac{y_i - \tilde{y}_i}{y_i} \right| \times 100\%,

where y_i is the source point, \tilde{y}_i is the predicted point and M is the number of predicted points.
The normalized mean squared error (NMSE) is used as a further error criterion; it is the ratio of the mean squared error to the variance of the time-series. For a time-series y_i it is defined by

NMSE = \frac{\sum_{i=1}^{M} (y_i - \tilde{y}_i)^2}{\sum_{i=1}^{M} (y_i - \bar{y})^2} = \frac{\sum_{i=1}^{M} (y_i - \tilde{y}_i)^2}{M \sigma^2}, \qquad \bar{y} = \frac{1}{M} \sum_{i=1}^{M} y_i,
Fig. 1. The forecasting scheme for adaptive neural network modeling.
where \bar{y} is the mean value of the source data and \sigma^2 is the variance of the source data.
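Both error measures are straightforward to compute; a quick sketch with illustrative data (not taken from the paper):

```python
import numpy as np

def mape(y, y_hat):
    """Mean Absolute Percentage Error, in percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

def nmse(y, y_hat):
    """Normalized MSE: squared error relative to the series variance."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))

y     = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.3, 3.6])
print(mape(y, y_hat), nmse(y, y_hat))  # 8.75 0.054
```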
In the simulations, for both the AKN method and the ADNN method, the number of nearest neighbors is set as k, and the data length for comparison is set as m. For each simulation, we use AR(m) for forecasting. The number of input nodes and the number of hidden nodes for the ANN method are set the same as those of the first neural network of the ADNN, and the parameter settings for all simulations are shown in Table 1.
3.1. Deterministic synthetic examples
In this section, the proposed method is tested on three deterministic time-series with obvious seasonal dependence, trend and amplitude change. The corresponding MAPE values and NMSE values of the predicted time-series are listed in Tables 2–4 respectively. In order to investigate the generalization ability of the proposed model, different noise terms are added to the time-series. For the time-series $y_t, y_{t-1}, y_{t-2}, \ldots, y_1$, the noise term is $r_t, r_{t-1}, r_{t-2}, \ldots, r_1$. Thus, the training inputs are $y_i + r_i, y_{i-1} + r_{i-1}, y_{i-2} + r_{i-2}, \ldots, y_{i-m+1} + r_{i-m+1}$, and the training output is $y_{i+1} + r_{i+1}$. In this study, the noise terms are generated from a normal distribution.
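A sketch of this noise-injection scheme for building training pairs (the window length m and the noise variance are free parameters here; the specific series is illustrative):

```python
import numpy as np

def noisy_training_pairs(series, m, noise_var, seed=0):
    """Build (input window, target) training pairs with additive Gaussian
    noise: inputs y_i + r_i, ..., y_{i-m+1} + r_{i-m+1}; target y_{i+1} + r_{i+1},
    with r ~ N(0, noise_var)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(series, float)
    r = rng.normal(0.0, np.sqrt(noise_var), size=y.shape)
    noisy = y + r
    X = np.array([noisy[i - m + 1: i + 1] for i in range(m - 1, len(y) - 1)])
    t = noisy[m:]
    return X, t

X, t = noisy_training_pairs(np.sin(np.arange(100) / 5.0), m=10, noise_var=0.01)
print(X.shape, t.shape)  # (90, 10) (90,)
```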
The first synthetic time-series is a seasonally dependent series with a linear increase, as shown in Fig. 2a. The equation of the seasonal dependence series is:

S(t) = \cos\frac{t}{25}\,\sin\frac{t}{100} + \frac{t}{1000} + 1, \qquad t \in [0, 2200],

where t denotes time. For this time-series data set, the first 2200 lengths are used for training, and the last 200 source time-series lengths for prediction. The parameters k and m are set as 2 and 100 respectively. The number of hidden neurons is 16. For this
Table 1
Parameter settings for simulations.

Type of time-series                    | k | First network's hidden nodes of ADNN | m
Seasonal dependence time-series        | 2 | 16 | 100
Multiplicative seasonality time-series | 2 |  8 |  15
High-frequency time-series             | 3 | 10 |  70
Duffing chaotic time-series            | 2 |  8 |  50
Mackey–Glass chaotic time-series       | 3 |  6 |  14
Ikeda map chaotic time-series          | 3 |  5 |  28
Sunspot time-series                    | 3 |  6 |  11
Traffic time-series                    | 3 | 12 | 180
Payments time-series                   | 3 |  8 |  30
Table 2
Prediction summary for seasonal dependence time-series.

Noise | AKN MAPE | AKN NMSE | AR MAPE | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE | ADNN NMSE
0     | 4.83×10^-7 | 6.35×10^-11 | 3.43×10^-17 | 2.53×10^-30 | 0.02 | 1.16×10^-10 | 3×10^-3 | 1.3×10^-10
0.001 | 0.83%      | 7.34×10^-5  | 0.84%       | 1.23×10^-4  | 2.95 | 1.8×10^-3   | 1.05%   | 1.31×10^-4
0.005 | 2.4%       | 6.08×10^-4  | 2.1083%     | 8.53×10^-4  | 4.32 | 4.1×10^-3   | 2.86%   | 1.34×10^-3
0.01  | 2.65%      | 7.45×10^-4  | 2.55%       | 2.1×10^-4   | 4.85 | 3.1×10^-4   | 3.49%   | 1.21×10^-3
0.02  | 3.31%      | 9.81×10^-4  | 3.19%       | 2.8×10^-4   | 5.21 | 5.2×10^-3   | 3.87%   | 2.6×10^-3
0.03  | 4.74%      | 2.1×10^-3   | 4.18%       | 3.2×10^-3   | 5.55 | 7.1×10^-3   | 4.31%   | 3.3×10^-3
0.04  | 4.92%      | 2.4×10^-3   | 5.12%       | 3.9×10^-3   | 6.44 | 9.2×10^-3   | 5.49%   | 4.4×10^-3
0.05  | 5.68%      | 2.9×10^-3   | 5.32%       | 4.8×10^-3   | 8.06 | 1.21×10^-2  | 6.31%   | 5.2×10^-3
Table 3
Prediction summary for multiplicative seasonality series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0     | 1.15  | 1.42×10^-4 | 1.18  | 1.88×10^-6 | 13.81 | 0.09 | 1.17  | 6.72×10^-4
0.001 | 3.61  | 1.51×10^-3 | 3.05  | 2.7×10^-3  | 15.69 | 0.12 | 4.27  | 3.9×10^-3
0.005 | 4.76  | 1.89×10^-3 | 7.29  | 9.7×10^-3  | 17.01 | 0.15 | 6.11  | 1.1×10^-2
0.01  | 5.6   | 6.3×10^-3  | 8.43  | 1.89×10^-2 | 19.31 | 0.18 | 9.13  | 1.72×10^-2
0.02  | 7.47  | 8.3×10^-3  | 12.46 | 3.17×10^-2 | 22.44 | 0.22 | 12.95 | 5.58×10^-2
0.03  | 8.92  | 1.88×10^-2 | 13.43 | 5.31×10^-2 | 28.75 | 0.41 | 13.67 | 7.3×10^-2
0.04  | 10.56 | 2.12×10^-2 | 14.49 | 7.2×10^-2  | 34.78 | 0.52 | 14.76 | 8.8×10^-2
0.05  | 11.85 | 3.89×10^-2 | 15.71 | 8.3×10^-2  | 35.34 | 0.64 | 15.92 | 9.8×10^-2
Table 4
Prediction summary for high-frequency series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0     | 5.52  | 9.14×10^-4 | 1.18  | 3.54×10^-5 | 1.09  | 7.01×10^-5 | 1.17  | 1.78×10^-4
0.001 | 7.55  | 1.6×10^-3  | 5.36  | 1.7×10^-3  | 5.88  | 1.79×10^-3 | 5.57  | 2.18×10^-3
0.005 | 8.91  | 5.7×10^-3  | 6.78  | 3.1×10^-3  | 6.22  | 3.2×10^-3  | 6.32  | 3.2×10^-3
0.01  | 10.98 | 4.3×10^-3  | 9.41  | 4.2×10^-3  | 8.20  | 3.9×10^-3  | 8.24  | 4.1×10^-3
0.02  | 14.72 | 7.3×10^-3  | 9.96  | 4.9×10^-3  | 8.96  | 4.2×10^-3  | 9.17  | 4.5×10^-3
0.03  | 17.31 | 8.2×10^-3  | 13.15 | 9.6×10^-3  | 9.47  | 4.9×10^-3  | 9.52  | 4.9×10^-3
0.04  | 24.43 | 1.3×10^-2  | 14.09 | 1.1×10^-2  | 9.67  | 5.3×10^-3  | 10.16 | 5.4×10^-3
0.05  | 32.27 | 3.12×10^-2 | 15.38 | 1.6×10^-2  | 11.24 | 6.5×10^-3  | 12.01 | 8.3×10^-3
time series, the amplitude of the seasonal components does not change, which means that the optimal value of the parameter k_{r_v} is equal to 1. In the prediction process, only the parameter u_{r_v}, which is responsible for a changing trend, needs to be determined. Fig. 2b illustrates the prediction results with different noise terms using the ADNN. In Fig. 2b, the prediction data marked by *, □ and + correspond to source data with noise variation 0, 0.01 and 0.05 respectively. The simulation result shows that the model can predict this time-series very accurately when the source data contain no noise term, and that the forecasting accuracy decreases as the noise variation increases. The performance of the different models is described in Table 2, which shows that the ADNN has better performance than the ANN model, and almost the same performance as the AKN. As the synthetic time-series has a feature of strong orderliness, the result of the ADNN is not better than that of the AR model. Table 2 also indicates that the ADNN model has almost the same noise endurance ability as the AKN and ANN for this time-series.

The second time-series, which is non-linear and multiplicative
seasonal, is simulated and the results are shown in Fig. 3a. This time-series has a non-linear trend and the amplitude of seasonal oscillations increases with time. The model of the time-series is described as:
S(t) = R(t), \quad t \in [0, 79]; \qquad S(t) = \frac{S(t-s)^2}{S(t-2s)}, \quad t \in [80, 590],

where R(t) = \frac{1}{70000}\,\sin\!\left(\frac{t}{3} + 50\right)\cos\!\left(9t + \frac{7}{10}\right) and s = 14. Prediction is done for
15% of the source time-series length; the first 85% of observations are used for training. The parameters k and m are set as 2 and 15 respectively. The number of hidden neurons is 8. Fig. 3b illustrates the prediction results with different noise terms using the ADNN, and shows that the predicted data are almost the same as the source data when no noise term is added. This time-series has a peculiarity in which the amplitude of every following periodic segment is a fixed multiple of that of the previous segment. The performance of the different models is described in Table 3, which shows that the performance of the ADNN is better than that of the ANN and close to that of the AKN. Like the first time-series simulation above, this time-series also has a feature of strong orderliness, and the AR model generates the same results as those of the ADNN. Table 3 also indicates that the ADNN model has better noise endurance ability than the ANN, but almost the same as the AKN for this time-series.
The third synthetic time-series is high-frequency with multiplicative seasonality and smoothly increasing amplitude, as shown in Fig. 4a. To formulate this time-series, the following explicit expression is used:

S(t) = \frac{t}{100}\,\sin\frac{t}{2}\,\cos\frac{t}{20}, \qquad t \in [0, 550].
It models a high-frequency series with seasonal periodicity. The prediction is done for 1/11 of the time-series length; the first 10/11 of the source data is used for training. The parameters k and m are set as 3 and 70 respectively. The number of hidden neurons is 10. Fig. 4b shows the prediction results with different noise terms using the ADNN. The performance of the different models is described in Table 4, which indicates that the ADNN outperforms the AKN but has the same performance as the ANN and AR models. Table 4 also indicates that the ADNN model has better noise endurance ability than the AKN, but almost the same as the ANN for this time-series.
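For reference, the third synthetic series can be generated directly. The closed form below, S(t) = (t/100)·sin(t/2)·cos(t/20), is reconstructed from the garbled transcript, so treat the exact expression as an assumption:

```python
import numpy as np

def high_frequency_series(n=551):
    # Reconstructed closed form: the factor t/100 gives the smoothly
    # increasing amplitude, sin(t/2) the high-frequency component, and
    # cos(t/20) the slower seasonal modulation.
    t = np.arange(n, dtype=float)
    return (t / 100.0) * np.sin(t / 2.0) * np.cos(t / 20.0)

s = high_frequency_series()
print(len(s))  # 551
```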
In the seasonally dependent series with linear increase and in the non-linear multiplicative seasonality series, the performance of the ADNN is better than that of the ANN, but the same as that of the AKN. In the high-frequency time-series with multiplicative seasonality and smoothly increasing amplitude, the performance of the ADNN is
Fig. 2. (a) Time-series with seasonal dependence (vertical line showing the prediction start) and (b) zoom of predicted values with different noise terms using the ADNN model.
Fig. 3. (a) Time-series with multiplicative seasonality (vertical line showing the prediction start) and (b) zoom of predicted values with different noise terms using the ADNN model.
the same as that of the ANN and AR models, and better than that of the AKN. The above simulation results show that the ADNN model benefits from the merits of the ANN and AKN.
3.2. Chaotic time-series
In this section, the proposed method is tested on three chaotic time-series. The corresponding MAPE values and NMSE values of the predicted time-series are listed in Tables 5–7 respectively.
The Duffing-equation chaotic time-series consists of 2050 observations generated by the equations:

\frac{dx}{dt} = y, \qquad \frac{dy}{dt} = -y + x - x^3 + b\cos(at).
The results based on this equation are shown in Fig. 5a. For predic-
tion, only the horizontal component of this chaotic two-component
series is used. We assumed that the time-series consists of the
Fig. 4. (a) High-frequency time-series with multiplicative seasonality (vertical line showing the prediction start) and (b) zoom of predicted values with different noise terms using the ADNN model.
Table 5
Prediction summary for Duffing chaotic time-series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0     | 7.2999  | 0.0034 | 0.6483  | 2.4143×10^-5 | 0.2806  | 2.5246×10^-6 | 0.3361  | 7.4141×10^-6
0.001 | 13.4408 | 0.011  | 7.8903  | 0.0042       | 3.6449  | 1.1432×10^-3 | 3.4630  | 1.1653×10^-3
0.005 | 15.8850 | 0.0134 | 13.4571 | 0.0211       | 9.2552  | 0.0082       | 8.8457  | 0.0068
0.01  | 19.1673 | 0.0165 | 13.9801 | 0.0223       | 11.9432 | 0.0098       | 11.2729 | 0.0094
0.02  | 19.6696 | 0.0143 | 19.2554 | 0.0282       | 15.2521 | 0.0122       | 16.7104 | 0.0163
0.03  | 23.4675 | 0.0631 | 20.6164 | 0.0397       | 16.5124 | 0.0231       | 17.3692 | 0.0267
0.04  | 25.9714 | 0.0365 | 21.9570 | 0.0376       | 19.2540 | 0.0215       | 20.6404 | 0.0214
0.05  | 27.2354 | 0.0461 | 24.3629 | 0.0433       | 21.5172 | 0.0273       | 21.7155 | 0.0273
Table 6
Prediction summary for Mackey–Glass chaotic time-series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0     | 1.0484  | 2.1431×10^-3 | 0.4941  | 7.1211×10^-5 | 0.0373  | 2.8314×10^-7 | 0.0655  | 8.1435×10^-7
0.001 | 1.5015  | 3.8461×10^-4 | 4.2862  | 0.0031       | 2.0275  | 7.7453×10^-4 | 2.1203  | 9.6342×10^-4
0.005 | 3.3865  | 0.0016       | 5.6697  | 0.0061       | 5.6017  | 0.0061       | 6.5236  | 0.0065
0.01  | 5.4567  | 0.0039       | 6.4944  | 0.0078       | 6.9095  | 0.0082       | 7.3885  | 0.0086
0.02  | 7.1349  | 0.0067       | 7.3276  | 0.0098       | 8.8958  | 0.0132       | 9.0145  | 0.0172
0.03  | 9.0888  | 0.0121       | 7.8270  | 0.0110       | 9.0929  | 0.0126       | 10.2763 | 0.0183
0.04  | 10.9259 | 0.0190       | 9.4304  | 0.0169       | 12.1027 | 0.0324       | 11.0820 | 0.0194
0.05  | 12.7687 | 0.0287       | 10.3548 | 0.0182       | 13.6413 | 0.0116       | 12.3625 | 0.0213
Table 7
Prediction summary for Ikeda map chaotic time-series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0     | 20.5213 | 0.1231 | 11.2426 | 0.0164 | 9.3819  | 0.0142 | 9.1635  | 0.0114
0.001 | 21.5745 | 0.1348 | 13.3011 | 0.0193 | 10.3539 | 0.0194 | 9.7751  | 0.0178
0.005 | 24.7196 | 0.1333 | 17.4031 | 0.0272 | 12.8506 | 0.0183 | 12.9018 | 0.0204
0.01  | 27.3472 | 0.1408 | 19.5150 | 0.0383 | 14.5355 | 0.0253 | 14.9533 | 0.0281
0.02  | 26.6362 | 0.1512 | 22.9523 | 0.0577 | 16.7840 | 0.0316 | 16.5926 | 0.0334
0.03  | 27.0252 | 0.1331 | 24.1169 | 0.0643 | 18.6916 | 0.0298 | 18.7898 | 0.0385
0.04  | 31.5612 | 0.1612 | 29.8482 | 0.0786 | 21.4366 | 0.0399 | 21.2370 | 0.0465
0.05  | 34.2396 | 0.1827 | 32.5005 | 0.116  | 26.3222 | 0.0796 | 26.8273 | 0.0845
positive values x_i >= 0, and therefore the value x_0 is added to the source data (Fig. 5b). The first 1950 observations are used for training and the remaining observations for testing. The parameters k and m are set as 2 and 50 respectively. The number of hidden neurons is 8. The prediction results with different noise terms using the ADNN are shown in Fig. 5c. The simulation results in Fig. 5c show that the model can predict this time-series perfectly if the source data contain no noise term, and the prediction gets worse as the noise variation increases. The performance of the different models is described in Table 5, which indicates that the ADNN has better performance than the AKN and AR models, and almost the same performance as the ANN. Table 5 also indicates that the ADNN model has almost the same noise endurance ability as the ANN for this time-series.
The Mackey–Glass benchmarks (Casdagli, 1989) are well known for their evaluation of prediction methods. The time-series is generated by the following non-linear differential equation:

\frac{dx(t)}{dt} = -b\,x(t) + \frac{a\,x(t-s)}{1 + x^{c}(t-s)}.
Different values of s generate various degrees of chaos; the behavior is chaotic for s > 16.8, and s = 17 is commonly seen in the literature. In this study, the parameters are set as a = 0.2, b = 0.1, c = 10 and s = 17, as shown in Fig. 6a. Following common practice (Kulesh et al., 2008), the first 1950 values of this series are used for the learning set and the next 100 values for the testing set. The parameters k and m are set as 3 and 14 respectively. The number of hidden neurons is 6. Fig. 6c depicts the prediction results with different noise terms. The performance of the different models is described in Table 6, which indicates that the ADNN has almost the same performance as the ANN and the AKN. Table 6 also indicates that the ADNN model has almost the same noise endurance ability as the ANN and AKN for this time-series.
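For reproducibility, the Mackey–Glass series with a = 0.2, b = 0.1, c = 10 and s = 17 can be generated with a simple Euler scheme; the step size, initial history and integration method here are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def mackey_glass(n, a=0.2, b=0.1, c=10, s=17.0, dt=0.1, x0=1.2):
    """Euler integration of dx/dt = -b*x(t) + a*x(t-s) / (1 + x(t-s)**c)."""
    delay = int(round(s / dt))          # delay expressed in time steps
    x = np.empty(n + delay)
    x[:delay + 1] = x0                  # constant initial history
    for i in range(delay, n + delay - 1):
        x_del = x[i - delay]            # delayed state x(t - s)
        x[i + 1] = x[i] + dt * (-b * x[i] + a * x_del / (1.0 + x_del ** c))
    return x[delay:]                    # drop the synthetic history

series = mackey_glass(2050)
print(series.shape)  # (2050,)
```

A smaller dt gives a more accurate trajectory at the cost of more steps per unit time; higher-order schemes (e.g. RK4) are common for this benchmark.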
The Ikeda map is another chaotic time-series experiment, which may be given in terms of a mapping of the complex plane to itself. The coordinates of the phase space are related to the complex degree of freedom z = x + iy. The mapping is (Makridakis, 1996; Murray, 1993):

z_{n+1} = p + B\,z_n \exp\!\left[ i\left( a - \frac{b}{1 + |z_n|^2} \right) \right],
where p = 1, B = 1, a = 0.4 and b = 6. Fig. 7a is generated by a time-series of the Ikeda map, and the series length is 2048. The prediction is done for the translated vertical component y(t) + y_0 for 5% of the time-series length (Fig. 7b). The parameters k and m are set as 3 and 28 respectively. The number of hidden neurons is 5. The prediction results with different noise terms are shown in Fig. 7c. The performance of the different models is presented in Table 7, which indicates that the ADNN model has better prediction results than the AKN and AR, but almost the same prediction performance as the ANN model for this time-series.
The simulations above show that the proposed ADNN model can predict complicated chaotic time-series as well as the ANN algorithm does. The ADNN model reduces the problems of amplitude change and trend determination. When detrended signals have no amplitude change, the prediction results are similar to those of the ANN, as expected.
3.3. Real time-series
In this section, the proposed method is tested on three real-data time-series. Real data inevitably contain errors from many sources, such as observation error, so in this part of the simulation we do not add noise terms to the series.
The sunspots dataset (Fig. 8a) is natural and contains the yearly number of dark spots on the sun from 1700 to 2007. The time-series has a pseudo-period of 10-11 years. It is common practice (McDonnell and Waagen, 1994) to use the data from 1700 to 1920 as a training set and to assess the performance of the model on another set covering 1921-1955 (Test 1). The parameters k and m are set as 3 and 11 respectively, and the number of hidden neurons is 6. The prediction results are shown in Fig. 8b.
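A pair of parameters k and m is set for every series, and in each case m tracks the dominant period (11 years for sunspots, 180 hours for traffic, 30 days for payments). A hedged illustration of how such parameters could shape the model inputs, assuming k is the number of lagged values per input vector and m the lag spacing (the paper's exact usage may differ):

```python
import numpy as np

def lagged_inputs(series, k=3, m=11):
    """Build input vectors [x(t-m), x(t-2m), ..., x(t-k*m)] with target x(t).

    Illustration of delay-style inputs only; the precise role of k and m
    in the ADNN model is an assumption here.
    """
    s = np.asarray(series, dtype=float)
    start = k * m
    X = np.stack([s[start - j * m: len(s) - j * m] for j in range(1, k + 1)],
                 axis=1)
    y = s[start:]
    return X, y

X, y = lagged_inputs(np.arange(100.0), k=3, m=11)
```

Spacing the lags by the pseudo-period lets each input vector sample the same phase of several past cycles, which is a natural choice for the periodic series studied here.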
Fig. 5. (a) Chaotic time-series based on the solution of the Duffing equation, (b) horizontal component of the time-series (vertical line showing the prediction start) and (c) zoom of predicted values with different noise terms using the ADNN model.
W.K. Wong et al. / European Journal of Operational Research 207 (2010) 807-816
The fitting traffic flow data consist of 1689 observations (11 weeks) from the hourly vehicle count for the Monash Freeway, outside Melbourne in Victoria, Australia, beginning in August 1995. A graph of the data is shown in Fig. 9a. The parameters k and m are set as 3 and 180 respectively, and the number of hidden neurons is 12. The prediction results are shown in Fig. 9b.
The data in Fig. 10a are the payments based on paper documents (filled in and sent to the bank; Kulesh et al., 2008). These data have appreciable seasonal components with sinusoidal trends. The parameters k and m are set as 3 and 30 respectively, and the number of hidden neurons is 8. The prediction results are shown in Fig. 10b.
Fig. 7. (a) Chaotic time-series based on the Ikeda map, (b) vertical component of the time-series (vertical line showing the prediction start) and (c) zoom of predicted values with different noise terms using the ADNN model.
Fig. 6. (a) Chaotic time-series based on the solution of the Mackey-Glass delay differential equation, (b) horizontal component of the time-series (vertical line showing the prediction start) and (c) zoom of predicted values with different noise terms using the ADNN model.
Table 8 summarizes the prediction performance of the different models on the three sets of real time-series data mentioned above. The proposed ADNN method clearly outperforms the three other models, as it adapts to local variations of trends and amplitudes, alleviates the over-fitting problem, and retains the flexible non-linear modeling capability of neural networks. Because of these reasons, the real time-series prediction made by the
Fig. 8. (a) Real sunspots data (vertical line showing the prediction start) and (b) zoom of predicted values using the ADNN model.
Fig. 9. (a) Real traffic flow data (vertical line showing the prediction start) and (b) zoom of predicted values using the ADNN model.
Fig. 10. (a) Real payment data based on paper documents (vertical line showing the prediction start) and (b) zoom of predicted values using the ADNN model.
Table 8
Prediction summary for real data time-series.

Model   Sunspot time-series     Traffic time-series     Payments time-series
        MAPE (%)  NMSE          MAPE (%)  NMSE          MAPE (%)  NMSE
ADNN    28.45     0.068         14.31     0.0193        8.08      0.0109
ANN     30.8      0.078         17.97     0.0267        15.24     0.0274
AR      31.2      0.0852        26.98     0.0818        9.06      0.0113
AKN     50.3      0.1833        17.39     0.0206        12.5      0.0178
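The two error measures reported in Table 8 (and in the earlier tables) can be computed as below. The paper does not restate its exact normalization, so the conventional definitions are assumed: MAPE is the mean absolute percentage error, and NMSE normalizes the total squared error by the variance of the actual series:

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes actual != 0)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

def nmse(actual, predicted):
    """Total squared error normalized by the variance of the actual series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sum((actual - predicted) ** 2) / np.sum((actual - actual.mean()) ** 2)
```

Lower is better for both; an NMSE of 1 corresponds to always predicting the mean of the series.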
proposed method gives a better fit to the real data than the three other methods.
4. Conclusions
This study presents a novel adaptive approach to extending the artificial neural network, in which adaptive metrics of inputs and a new mechanism for the admixture of outputs are proposed for time-series prediction. The experimental results, evaluated with a consistent set of performance measures (MAPE, NMSE), show that the new method improves the accuracy of time-series prediction. The performance of the proposed method is validated on three sets of complex time-series, namely deterministic synthetic time-series, chaotic time-series and real time-series. In addition, the results predicted by the ADNN are compared with those of the ANN, AKN, and AR methods, and indicate that the proposed model outperforms these conventional techniques, particularly in forecasting chaotic and real time-series.
References
Adya, M., Collopy, F., 1998. How effective are neural networks at forecasting and prediction? A review and evaluation. Journal of Forecasting 17, 481-495.
Barbounis, T.G., Teocharis, J.B., 2007. Locally recurrent neural networks for wind speed prediction using spatial correlation. Information Sciences 177, 5775-5797.
Bartlett, P.L., 1997. For valid generalization, the size of the weights is more important than the size of the network. In: Mozer, M.C., Jordan, M.I., Petsche, T. (Eds.), Advances in Neural Information Processing Systems, vol. 9. The MIT Press, Cambridge, MA, pp. 134-140.
Bodyanskiy, Y., Popov, S., 2006. Neural network approach to forecasting of quasiperiodic financial time series. European Journal of Operational Research 175, 1357-1366.
Brooks, C., 2002. Introductory Econometrics for Finance. Cambridge University Press, Cambridge, UK, p. 289.
Casdagli, M., 1989. Nonlinear prediction of chaotic time series. Physica D 35, 335-356.
Celik, A.E., Karatepe, Y., 2007. Evaluating and forecasting banking crises through neural network models: An application for Turkish banking sector. Expert Systems with Applications 33, 809-815.
Chen, S.M., Hwang, J.R., 2000. Temperature prediction using fuzzy time series. IEEE Transactions on Systems, Man and Cybernetics, Part B 30, 263-275.
Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems 2, 303-314.
Freitas, P.S.A., Rodrigues, A.J.L., 2006. Model combination in neural-based forecasting. European Journal of Operational Research 173, 801-814.
Funahashi, K., 1989. On the approximate realization of continuous mappings by neural networks. Neural Networks 2, 183-192.
Geman, S., Bienenstock, E., Doursat, R., 1992. Neural networks and the bias/variance dilemma. Neural Computation 4, 1-58.
Hansen, J.V., McDonald, J.B., Nelson, R.D., 2002. Time series prediction with genetic-algorithm designed neural networks: An empirical comparison with modern statistical models. Computational Intelligence 15, 171-184.
Huarng, K., Yu, T.H., 2006. Ratio-based lengths of intervals to improve fuzzy time series forecasting. IEEE Transactions on Systems, Man and Cybernetics, Part B 36, 328-340.
Kim, D., Kim, C., 1997. Forecasting time series with genetic fuzzy predictor ensemble. IEEE Transactions on Fuzzy Systems 5, 523-535.
Kulesh, M., Holschneider, M., Kurennaya, K., 2008. Adaptive metrics in the nearest neighbours method. Physica D 237, 283-291.
Liu, M.C., Kuo, W., Sastri, T., 1995. An exploratory study of a neural network approach for reliability data analysis. Quality and Reliability Engineering International 11, 107-112.
Makridakis, S., 1996. Forecasting: Its role and value for planning and strategy. International Journal of Forecasting 12, 513-537.
Makridakis, S., Wheelwright, S.C., Hyndman, R.J., 1998. Forecasting: Methods and Applications, third ed. Wiley, New York, pp. 42-50.
Masters, T., 1995. Advanced Algorithms for Neural Networks: A C++ Sourcebook. Wiley, New York.
McDonnell, J.R., Waagen, D., 1994. Evolving recurrent perceptrons for time series modeling. IEEE Transactions on Neural Networks 5, 24-38.
Moody, J.E., 1992. The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. Neural Information Processing Systems 4, 847-854.
Murray, D.B., 1993. Forecasting a chaotic time series using an improved metric for embedding space. Physica D 68, 318-325.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press.
Sahoo, G.B., Ray, C., 2006. Flow forecasting for a Hawaii stream using rating curves and neural networks. Journal of Hydrology 317, 63-80.
Singh, P., Deo, M.C., 2007. Suitability of different neural networks in daily flow forecasting. Applied Soft Computing 7, 968-978.
Taylor, J.W., Buizza, R., 2002. Neural network load forecasting with weather ensemble predictions. IEEE Transactions on Power Systems 17, 59.
Wang, T., Chien, S., 2006. Forecasting innovation performance via neural networks: A case of Taiwanese manufacturing industry. Technovation 26, 635-643.
Wong, B.K., Vincent, S., Jolie, L., 2000. A bibliography of neural network business applications research: 1994-1998. Operations Research and Computers 27, 1045-1076.
Zhang, G.P., 2003. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50, 159-175.
Zhang, G.P., Qi, M., 2005. Neural network forecasting for seasonal and trend time series. European Journal of Operational Research 160, 501-514.
Zhang, G., Patuwo, B.E., Hu, M.Y., 1998. Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting 14, 35-62.