7/29/2019 Adaptive neural network model for time-series forecasting.pdf
Stochastics and Statistics
Adaptive neural network model for time-series forecasting
W.K. Wong a,*, Min Xia a,b, W.C. Chu a
a Institute of Textiles and Clothing, The Hong Kong Polytechnic University, Hong Kong
b College of Information Science and Technology, Donghua University, Shanghai, China
ARTICLE INFO
Article history:
Received 27 January 2010
Accepted 14 May 2010
Available online 1 June 2010
Keywords:
Time-series
Forecasting
Adaptive metrics
Neural networks
ABSTRACT
In this study, a novel adaptive neural network (ADNN) with adaptive metrics of inputs and a new mechanism for admixture of outputs is proposed for time-series prediction. The adaptive metrics of inputs can solve the problems of amplitude change and trend determination, and avoid the over-fitting of networks. The new mechanism for admixture of outputs can adjust forecasting results by the relative error and make them more accurate. The proposed ADNN method can predict periodical time-series with a complicated structure. The experimental results show that the proposed model outperforms the auto-regression (AR), artificial neural network (ANN), and adaptive k-nearest neighbors (AKN) models. The ADNN model benefits from the merits of the ANN and the AKN through its novel structure, with high robustness particularly for both chaotic and real time-series predictions.

© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Many planning activities require prediction of the behavior of
variables (e.g. economic, financial, traffic and physical). The predictions support the strategic decisions of organizations (Makridakis,
1996), which in turn sustain a practical interest in forecasting
methods. Time-series methods are generally used to model fore-
casting systems when there is not much information about the
generation process of the underlying variable and when other vari-
ables provide no clear explanation about the studied variable
(Zhang, 2003).
Time-series forecasting is used to forecast the future based on
historical observations (Makridakis et al., 1998). There have been
many approaches to modeling time-series, depending on the theory or assumptions about the relationships in the data (Huarng
and Yu, 2006; Chen and Hwang, 2000; Taylor and Buizza, 2002;
Kim and Kim, 1997; Zhang et al., 1998; Wang and Chien, 2006;
Singh and Deo, 2007). Traditional methods, such as time-series
regression, exponential smoothing and autoregressive integrated
moving average (ARIMA) (Brooks, 2002), are based on linear models. All these methods assume linear relationships among the past
values of the forecast variable and therefore non-linear patterns
cannot be captured by these models. One problem that makes
developing and implementing this type of time-series model difficult is that the model must be specified and a probability distribution for the data must be assumed (Hansen et al., 2002).
Approximation of linear models to complex real-world problems
is not always satisfactory.
Recently, artificial neural networks (ANN) have been proposed
as a promising alternative for time-series forecasting. A large number of successful applications have shown that neural networks
can be a very useful tool for time-series modeling and forecasting (Adya and Collopy, 1998; Zhang et al., 1998; Celik and Karatepe,
2007; Wang and Chien, 2006; Sahoo and Ray, 2006; Singh and
Deo, 2007; Barbounis and Teocharis, 2007; Bodyanskiy and Popov,
2006; Freitas and Rodrigues, 2006). The reason is that the ANN is a universal function approximator which is capable of mapping any linear or non-linear function (Cybenko, 1989; Funahashi, 1989). Neural networks are basically a data-driven method with few a priori assumptions about underlying models. Instead they let the data speak for themselves and have the capability to identify the underlying functional relationship among the data. In addition, the ANN
is capable of tolerating the presence of chaotic components and
thus is better than most methods (Masters, 1995). This capacity
is particularly important, as many relevant time-series possess significant chaotic components.
However, since the neural network lacks a systematic procedure for model-building, the forecasting result is not always accurate when the input data are very different from the training data.
Like other flexible non-linear estimation methods such as kernel
regression and smoothing splines, the ANN may suffer either
under-fitting or over-fitting (Moody, 1992; Geman et al., 1992;
Bartlett, 1997). A network that is not sufficiently complex can fail
to fully detect the signal in a complicated data set and lead to
under-fitting. A network that is too complex may fit not only the
signal but also the noise and lead to over-fitting. Over-fitting is
especially misleading because it can easily lead to wild prediction
far beyond the range of the training data even with the noise-free
data. In order to solve this problem, a novel ANN model is proposed
0377-2217/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.ejor.2010.05.022
* Corresponding author. Tel.: +00852 64300917.
E-mail address: [email protected] (W.K. Wong).
European Journal of Operational Research 207 (2010) 807–816
in this study with the adaptive metrics of inputs, and the output
data is evolved by a mechanism for admixture. The adaptive met-
rics of inputs of the model can adapt to local variations of trends
and amplitudes. Most inputs of the network are close to the historical data in order to avoid a dramatic increase in the forecasting error due to the big difference between training data and input data.
In using the proposed mechanism for admixture of outputs, the
forecasting result can be adjusted by the relative error, making
the forecasting result more accurate.
The forecasting results generated by the proposed model are
compared with those obtained by the traditional statistical AR
model, traditional ANN architectures (BP network), and adaptive
k-nearest neighbors (AKN) method (Kulesh et al., 2008) in the related literature. The experimental results indicate that the proposed model outperforms the other models, especially in chaotic and real-data time-series predictions.
This paper is organized as follows. In the next section, the fun-
damental principle of the proposed method is introduced. The
experimental results are presented in Section 3. The last section
concludes this study.
2. Methodology
We focus on one-step-ahead point forecasting in this work. Let $y_1, y_2, y_3, \ldots, y_t$ be a time-series. At time t, for t >= 1, the next value $y_{t+1}$ is predicted based on the observed realizations of $y_t, y_{t-1}, y_{t-2}, \ldots, y_1$.
2.1. The ANN approach to time-series modeling
The ANN is a flexible computing framework for a broad range of non-linear problems (Wong et al., 2000). The network model is largely determined by data characteristics. A single hidden-layer feed-forward network is the most widely used model for time-series modeling and forecasting (Zhang and Qi, 2005). The model is characterized by a network of three layers of simple processing units connected by acyclic links. The hidden layer can capture the non-linear relationship among variables. Each layer consists of multiple neurons that are connected to neurons in adjacent layers. The relationship between the output $y_{t+1}$ and the inputs $y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-p+1}$ has the following mathematical representation:
y_{t+1} = a_0 + \sum_{j=1}^{q} a_j\, g\!\left( b_{0j} + \sum_{i=1}^{p} b_{ij}\, y_{t-i+1} \right) + e_t,   (1)
where a_j (j = 0, 1, 2, ..., q) and b_{ij} (i = 0, 1, 2, ..., p; j = 1, 2, 3, ..., q) are the model parameters, called connection weights; p is the number of input nodes and q is the number of hidden nodes. The logistic function is often used as the hidden-layer transfer function, which is

g(x) = \frac{1}{1 + e^{-x}}.   (2)
A neural network can be trained on the historical data of a time-series in order to capture its characteristics. The model parameters (connection weights and node biases) can be adjusted iteratively by minimizing the forecasting errors (Liu et al., 1995).
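As an illustration, the forward pass of Eqs. (1) and (2) can be sketched in a few lines of NumPy; the weights below are random placeholders rather than trained values, so this only shows the shape of the computation:

```python
import numpy as np

def logistic(x):
    # Eq. (2): hidden-layer transfer function
    return 1.0 / (1.0 + np.exp(-x))

def ann_forecast(window, a0, a, b0, b):
    """One-step-ahead forecast following Eq. (1).

    window : last p observations (y_t, y_{t-1}, ..., y_{t-p+1})
    a0     : output bias; a : (q,) hidden-to-output weights
    b0     : (q,) hidden biases; b : (q, p) input-to-hidden weights
    """
    hidden = logistic(b0 + b @ window)   # q hidden activations
    return a0 + a @ hidden               # scalar forecast of y_{t+1}

rng = np.random.default_rng(0)
p, q = 4, 8                              # input and hidden node counts
window = rng.normal(size=p)
y_next = ann_forecast(window, 0.1, rng.normal(size=q),
                      rng.normal(size=q), rng.normal(size=(q, p)))
print(float(y_next))
```

In practice the weights would be fitted by back-propagation on (window, next-value) pairs, as described below.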
2.2. Adaptive neural network model for forecasting (ADNN)
It is well known that the ANN may suffer either under-fitting or over-fitting (Moody, 1992; Geman et al., 1992; Bartlett, 1997). A network that is not sufficiently complex can fail to fully detect the signal, which leads to under-fitting. Over-fitting generally occurs when a model is excessively complex. A model that under-fits or over-fits will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data. Of these two problems, over-fitting is the more important when the signal data are sufficient and the network is sufficiently complex. Thus, in this paper we focus on the problem of over-fitting for the ANN. Generally, an ANN algorithm is said to over-fit relative to a simpler one if it is more accurate in fitting known data (hindsight) but less accurate in predicting new data (foresight).
In order to avoid over-fitting, the adaptive neural network model is proposed. In this model, the hindsight data are used to modify the inputs of the ANN in the prediction process, making the inputs approach the learning data. Thus, this algorithm can reduce the chance of over-fitting. Based on the current ANN, an extension is made to develop the adaptive neural network (ADNN) model for time-series forecasting. Firstly, a strategy is used to initialize the input data $y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-m+1}$, where m is the number of input nodes. The strategy adopts adaptive metrics similar to those of the adaptive k-nearest neighbor method. The data set $y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-m+1}$ is compared with the other parts of this time-series that have the same length. The determination of the closeness measure is the major factor in prediction accuracy. Closeness is usually defined in terms of a metric distance on the Euclidean space. The most common choices are the Minkowski metrics:
L_M(Y_t, Y_r) = \left( |y_t - y_r|^d + |y_{t-1} - y_{r-1}|^d + \cdots + |y_{t-m+1} - y_{r-m+1}|^d \right)^{1/d}.   (3)
This equation gives the value difference between Y_t and Y_r, but the differences of trends and amplitudes are not represented. In time-series forecasting, the information on trends and amplitudes is a crucial factor. In this study, adaptive metrics are introduced to solve this problem, formulated as:
L_A(Y_t, Y_r) = \min_{k_r, u_r} f_r(k_r, u_r),   (4)

f_r(k_r, u_r) = \left( |y_t - k_r y_r - u_r|^d + |y_{t-1} - k_r y_{r-1} - u_r|^d + \cdots + |y_{t-m+1} - k_r y_{r-m+1} - u_r|^d \right)^{1/d},   (5)

where h_r and l_r are the largest and smallest elements of the vector Y_r respectively, k_r ∈ [1, h_r/l_r] and u_r ∈ [0, h_r − l_r]. The minimization parameter k_r equilibrates the amplitude difference between Y_t and Y_r. The parameter u_r is responsible for the trend of the time-series.
The optimization problem (4) can be solved by the Levenberg–Marquardt algorithm (Press et al., 1992) or other gradient methods for d >= 1. In this study, d is set to 2, which gives the widely used Euclidean metric:
f_r(k_r, u_r) = \left( |y_t - k_r y_r - u_r|^2 + |y_{t-1} - k_r y_{r-1} - u_r|^2 + \cdots + |y_{t-m+1} - k_r y_{r-m+1} - u_r|^2 \right)^{1/2}.   (6)
For d = 2, two equations are considered:

\frac{\partial f_r(k_r, u_r)}{\partial k_r} = 0, \qquad \frac{\partial f_r(k_r, u_r)}{\partial u_r} = 0.   (7)
When the corresponding linear system is solved, the solution of the minimization problem can be obtained analytically:

u_r = \frac{z_1 z_2 - z_3 z_4}{m z_2 - z_3^2}, \qquad k_r = \frac{m z_4 - z_1 z_3}{m z_2 - z_3^2},
where
z_1 = \sum_{i=1}^{m} y_{t-i+1}, \quad z_2 = \sum_{i=1}^{m} y_{r-i+1}^2, \quad z_3 = \sum_{i=1}^{m} y_{r-i+1}, \quad z_4 = \sum_{i=1}^{m} y_{r-i+1}\, y_{t-i+1}.
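This closed-form fit can be sketched directly; the code below follows the z_1–z_4 notation of the text and cross-checks itself on an exact affine relationship (an illustration, not the authors' implementation):

```python
import numpy as np

def fit_amplitude_trend(y_t, y_r):
    """Solve Eq. (7) analytically: the (k_r, u_r) minimizing
    sum_i (y_t[i] - k_r * y_r[i] - u_r)**2 over the m-point windows."""
    y_t, y_r = np.asarray(y_t, float), np.asarray(y_r, float)
    m = len(y_t)
    z1 = y_t.sum()           # z_1: sum of the target window
    z2 = (y_r ** 2).sum()    # z_2: sum of squares of the neighbor window
    z3 = y_r.sum()           # z_3: sum of the neighbor window
    z4 = (y_r * y_t).sum()   # z_4: cross term
    denom = m * z2 - z3 ** 2
    u_r = (z1 * z2 - z3 * z4) / denom
    k_r = (m * z4 - z1 * z3) / denom
    return k_r, u_r

# If y_t is an exact affine image of y_r, the fit recovers it.
y_r = np.array([1.0, 2.0, 3.0, 4.0])
y_t = 2.0 * y_r + 0.5
k_r, u_r = fit_amplitude_trend(y_t, y_r)
print(k_r, u_r)  # 2.0 0.5
```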
Based on this strategy, the adaptive k-nearest neighbors are chosen, and the input vector of the first network (known as the main network) can be defined as:

input_v = \left( q^v_t, q^v_{t-1}, \ldots, q^v_{t-p+1} \right) = \left( \frac{y_t - u_{r_v}}{k_{r_v}}, \frac{y_{t-1} - u_{r_v}}{k_{r_v}}, \ldots, \frac{y_{t-p+1} - u_{r_v}}{k_{r_v}} \right).   (8)
Most input values are thus close to the historical data; without this transformation, the forecasting error would increase dramatically because of the big difference between training data and input data. In order to get more accurate results for the time-series $y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-p+1}$, k sets of inputs are used, and the output vector is output_v = b_v, v = 1, 2, ..., k. The forecasting result is affected by the different values of L_A(Y_t, Y_{r_v}), k_{r_v}, u_{r_v} and |t − r_v| for v = 1, 2, ..., k. The relative error is used to measure the impact of L_A(Y_t, Y_{r_v}), k_{r_v}, u_{r_v} and |t − r_v|. The relative error is defined as RE_v = (y_{r_v} − \tilde{y}_{r_v}) / y_{r_v}, where y_{r_v} is the source point and \tilde{y}_{r_v} is the predicted point. In this study, the second neural network (known as the modified network) is used to find the relationship between the four factors and RE_v.
The estimated result of RE_v is \widetilde{RE}_v, which is presented as follows:

\widetilde{RE}_v = f\!\left( L_A(Y_t, Y_{r_v}),\, k_{r_v},\, u_{r_v},\, |t - r_v| \right).   (9)

The mechanism for admixture of outputs is presented as follows:

y_{t+1} = \frac{1}{U} \sum_{v=1}^{k} \left( k_{r_v} b_v + u_{r_v} \right) e^{-\widetilde{RE}_v},   (10)

U = \sum_{v=1}^{k} e^{-\widetilde{RE}_v}.   (11)
From Eq. (10), the forecasting result of y_{t+1} is calculated from the b_v, v = 1, 2, ..., k, with different weighting coefficients. Based on the methodology proposed above, the forecasting scheme can be formulated as shown in Fig. 1.
The steps of the algorithm of the proposed method are given as follows.
Step 1: Train the two neural networks using the historical data. In the first neural network, $y_i, y_{i-1}, y_{i-2}, \ldots, y_{i-m+1}$ are the input training data, and $y_{i+1}$ is the output training data. In the second neural network, L_A(Y_t, Y_{r_v}), k_{r_v}, u_{r_v} and |t − r_v| are the training inputs, and the relative error RE_v is the training output. The BP algorithm is used to train these two neural networks.
Step 2: Compare the data set $y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-m+1}$ with the other parts of the time-series using the adaptive metric distance on the Euclidean space based on Eq. (6).
Step 3: Choose the k-nearest neighbors and obtain k_{r_v}, u_{r_v} based on Eq. (7). Initialize the input data of the first neural network according to Eq. (8). The input data of the first neural network are $q^v_i, q^v_{i-1}, q^v_{i-2}, \ldots, q^v_{i-m+1}$, v = 1, 2, ..., k.
Step 4: Apply the first neural network and obtain the results of the outputs output_v = b_v, v = 1, 2, ..., k.
Step 5: Use L_A(Y_t, Y_{r_v}), k_{r_v}, u_{r_v} and |t − r_v| to predict the relative error \widetilde{RE}_v; the number of hidden neurons is 5 for all simulations in the second neural network.
Step 6: Apply the mechanism for admixture of Eq. (10) and obtain the forecasting result of y_{t+1}.
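The admixture step (Steps 4–6) can be sketched as follows. The exponent sign in the $e^{-\widetilde{RE}_v}$ weights is reconstructed from context (neighbors with larger predicted relative error should contribute less), so treat that detail as an assumption:

```python
import numpy as np

def admix_outputs(b, k_r, u_r, re_hat):
    """Eqs. (10)-(11): combine the k candidate network outputs b_v,
    mapped back to the original scale via k_{r_v} and u_{r_v},
    weighted by exp(-RE_v) and normalized by U."""
    b, k_r, u_r = map(np.asarray, (b, k_r, u_r))
    w = np.exp(-np.asarray(re_hat))                 # summands of Eq. (11)
    U = w.sum()                                     # normalizing constant U
    return float(((k_r * b + u_r) * w).sum() / U)   # Eq. (10)

# Three neighbors: network outputs b_v, fitted (k, u), predicted REs.
# The second neighbor has a large predicted error, so it is down-weighted.
y_next = admix_outputs(b=[1.0, 1.1, 0.9],
                       k_r=[1.0, 1.0, 1.0],
                       u_r=[0.0, 0.0, 0.0],
                       re_hat=[0.01, 0.5, 0.02])
print(y_next)
```

With equal predicted relative errors the mechanism reduces to a plain average of the rescaled outputs, which is a useful sanity check.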
3. Numerical simulations
Since the auto-regression (AR) model, the traditional back-propagation (BP) ANN architecture and the adaptive k-nearest neighbors (AKN) method are popular forecasting methods, the performance of the proposed model is benchmarked against Zhang's AR model (Zhang, 2003), Adya's ANN model (Adya and Collopy, 1998) and Kulesh's AKN model (Kulesh et al., 2008) reported in the literature. To illustrate the accuracy of the method, several deterministic and chaotic time-series are generated and predicted, and three real time-series are considered in this section. The Mean Absolute Percentage Error (MAPE) statistic is used to evaluate the forecasting performance of the model. The MAPE is regarded as one of the standard statistical performance measures and takes the following form:
MAPE = \frac{1}{M} \sum_{i=1}^{M} \left| \frac{y_i - \tilde{y}_i}{y_i} \right| \times 100\%,

where y_i is the source point, \tilde{y}_i is the predicted point and M is the number of predicted points.
The normalized mean squared error (NMSE) is used as a further error criterion; it is the ratio of the mean squared error to the variance of the time-series. For a time-series y_i it is defined by

NMSE = \frac{\sum_{i=1}^{M} (y_i - \tilde{y}_i)^2}{\sum_{i=1}^{M} (y_i - \bar{y})^2} = \frac{\sum_{i=1}^{M} (y_i - \tilde{y}_i)^2}{M \sigma^2}, \qquad \bar{y} = \frac{1}{M} \sum_{i=1}^{M} y_i,
Fig. 1. The forecasting scheme for adaptive neural network modeling.
where \bar{y} is the mean value of the source data and \sigma^2 is the variance of the source data.
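Both error measures are straightforward to compute; a quick sketch with illustrative data (not taken from the paper):

```python
import numpy as np

def mape(y, y_hat):
    """Mean Absolute Percentage Error, in percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

def nmse(y, y_hat):
    """Normalized MSE: squared error relative to the series variance."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))

y     = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.3, 3.6])
print(mape(y, y_hat), nmse(y, y_hat))  # 8.75 0.054
```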
In the simulations, for both the AKN method and the ADNN method, the number of nearest neighbors is set as k, and the data length for comparison is set as m. For each simulation, we use AR(m) for forecasting. The number of input nodes and the number of hidden nodes for the ANN method are set the same as those of the first neural network of the ADNN, and the parameter settings for all simulations are shown in Table 1.
3.1. Deterministic synthetic examples
In this section, the proposed method is tested on three deterministic time-series with obvious seasonal dependence, trend and amplitude change. The corresponding MAPE values and NMSE values of the predicted time-series are listed in Tables 2–4 respectively. In order to investigate the generalization ability of the proposed model, different noise terms are added to the time-series. For the time-series $y_t, y_{t-1}, y_{t-2}, \ldots, y_1$, the noise term is $r_t, r_{t-1}, r_{t-2}, \ldots, r_1$. Thus, the training inputs are $y_i + r_i, y_{i-1} + r_{i-1}, y_{i-2} + r_{i-2}, \ldots, y_{i-m+1} + r_{i-m+1}$, and the training output is $y_{i+1} + r_{i+1}$. In this study, the noise terms are generated from a normal distribution.
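A sketch of this noise-injection scheme for building training pairs (the window length m and the noise variance are free parameters here; the specific series is illustrative):

```python
import numpy as np

def noisy_training_pairs(series, m, noise_var, seed=0):
    """Build (input window, target) training pairs with additive Gaussian
    noise: inputs y_i + r_i, ..., y_{i-m+1} + r_{i-m+1}; target y_{i+1} + r_{i+1},
    with r ~ N(0, noise_var)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(series, float)
    r = rng.normal(0.0, np.sqrt(noise_var), size=y.shape)
    noisy = y + r
    X = np.array([noisy[i - m + 1: i + 1] for i in range(m - 1, len(y) - 1)])
    t = noisy[m:]
    return X, t

X, t = noisy_training_pairs(np.sin(np.arange(100) / 5.0), m=10, noise_var=0.01)
print(X.shape, t.shape)  # (90, 10) (90,)
```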
The first synthetic time-series is a seasonally dependent series with a linear increase, as shown in Fig. 2a. The equation of the seasonal dependence series is:

S(t) = \cos\frac{t}{25}\,\sin\frac{t}{100} + \frac{t}{1000} + 1, \qquad t \in [0, 2200],

where t denotes time. For this time-series data set, the first 2200 lengths are used for training, and the last 200 source time-series lengths for prediction. The parameters k and m are set as 2 and 100 respectively. The number of hidden neurons is 16. For this
Table 1
Parameter settings for simulations.

Type of time-series                    | k | First network's hidden nodes of ADNN | m
Seasonal dependence time-series        | 2 | 16 | 100
Multiplicative seasonality time-series | 2 |  8 |  15
High-frequency time-series             | 3 | 10 |  70
Duffing chaotic time-series            | 2 |  8 |  50
Mackey–Glass chaotic time-series       | 3 |  6 |  14
Ikeda map chaotic time-series          | 3 |  5 |  28
Sunspot time-series                    | 3 |  6 |  11
Traffic time-series                    | 3 | 12 | 180
Payments time-series                   | 3 |  8 |  30
Table 2
Prediction summary for seasonal dependence time-series.

Noise | AKN MAPE | AKN NMSE | AR MAPE | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE | ADNN NMSE
0     | 4.83×10^-7 | 6.35×10^-11 | 3.43×10^-17 | 2.53×10^-30 | 0.02 | 1.16×10^-10 | 3×10^-3 | 1.3×10^-10
0.001 | 0.83%      | 7.34×10^-5  | 0.84%       | 1.23×10^-4  | 2.95 | 1.8×10^-3   | 1.05%   | 1.31×10^-4
0.005 | 2.4%       | 6.08×10^-4  | 2.1083%     | 8.53×10^-4  | 4.32 | 4.1×10^-3   | 2.86%   | 1.34×10^-3
0.01  | 2.65%      | 7.45×10^-4  | 2.55%       | 2.1×10^-4   | 4.85 | 3.1×10^-4   | 3.49%   | 1.21×10^-3
0.02  | 3.31%      | 9.81×10^-4  | 3.19%       | 2.8×10^-4   | 5.21 | 5.2×10^-3   | 3.87%   | 2.6×10^-3
0.03  | 4.74%      | 2.1×10^-3   | 4.18%       | 3.2×10^-3   | 5.55 | 7.1×10^-3   | 4.31%   | 3.3×10^-3
0.04  | 4.92%      | 2.4×10^-3   | 5.12%       | 3.9×10^-3   | 6.44 | 9.2×10^-3   | 5.49%   | 4.4×10^-3
0.05  | 5.68%      | 2.9×10^-3   | 5.32%       | 4.8×10^-3   | 8.06 | 1.21×10^-2  | 6.31%   | 5.2×10^-3
Table 3
Prediction summary for multiplicative seasonality series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0     | 1.15  | 1.42×10^-4 | 1.18  | 1.88×10^-6 | 13.81 | 0.09 | 1.17  | 6.72×10^-4
0.001 | 3.61  | 1.51×10^-3 | 3.05  | 2.7×10^-3  | 15.69 | 0.12 | 4.27  | 3.9×10^-3
0.005 | 4.76  | 1.89×10^-3 | 7.29  | 9.7×10^-3  | 17.01 | 0.15 | 6.11  | 1.1×10^-2
0.01  | 5.6   | 6.3×10^-3  | 8.43  | 1.89×10^-2 | 19.31 | 0.18 | 9.13  | 1.72×10^-2
0.02  | 7.47  | 8.3×10^-3  | 12.46 | 3.17×10^-2 | 22.44 | 0.22 | 12.95 | 5.58×10^-2
0.03  | 8.92  | 1.88×10^-2 | 13.43 | 5.31×10^-2 | 28.75 | 0.41 | 13.67 | 7.3×10^-2
0.04  | 10.56 | 2.12×10^-2 | 14.49 | 7.2×10^-2  | 34.78 | 0.52 | 14.76 | 8.8×10^-2
0.05  | 11.85 | 3.89×10^-2 | 15.71 | 8.3×10^-2  | 35.34 | 0.64 | 15.92 | 9.8×10^-2
Table 4
Prediction summary for high-frequency series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0     | 5.52  | 9.14×10^-4 | 1.18  | 3.54×10^-5 | 1.09  | 7.01×10^-5 | 1.17  | 1.78×10^-4
0.001 | 7.55  | 1.6×10^-3  | 5.36  | 1.7×10^-3  | 5.88  | 1.79×10^-3 | 5.57  | 2.18×10^-3
0.005 | 8.91  | 5.7×10^-3  | 6.78  | 3.1×10^-3  | 6.22  | 3.2×10^-3  | 6.32  | 3.2×10^-3
0.01  | 10.98 | 4.3×10^-3  | 9.41  | 4.2×10^-3  | 8.20  | 3.9×10^-3  | 8.24  | 4.1×10^-3
0.02  | 14.72 | 7.3×10^-3  | 9.96  | 4.9×10^-3  | 8.96  | 4.2×10^-3  | 9.17  | 4.5×10^-3
0.03  | 17.31 | 8.2×10^-3  | 13.15 | 9.6×10^-3  | 9.47  | 4.9×10^-3  | 9.52  | 4.9×10^-3
0.04  | 24.43 | 1.3×10^-2  | 14.09 | 1.1×10^-2  | 9.67  | 5.3×10^-3  | 10.16 | 5.4×10^-3
0.05  | 32.27 | 3.12×10^-2 | 15.38 | 1.6×10^-2  | 11.24 | 6.5×10^-3  | 12.01 | 8.3×10^-3
time series, the amplitude of the seasonal components does not change, which means that the optimal value of the parameter k_{r_v} is equal to 1. In the prediction process, only the parameter u_{r_v}, which is responsible for a changing trend, needs to be determined. Fig. 2b illustrates the prediction results with different noise terms using the ADNN. In Fig. 2b, the prediction data marked by *, □ and + correspond to source data with noise variation 0, 0.01 and 0.05 respectively. The simulation result shows that the model can predict this time-series very accurately when the source data contain no noise term, and that the forecasting accuracy decreases as the noise variation increases. The performance of the different models is described in Table 2, which shows that the ADNN has better performance than the ANN model, and almost the same performance as the AKN. As the synthetic time-series has a feature of strong orderliness, the result of the ADNN is not better than that of the AR model. Table 2 also indicates that the ADNN model has almost the same noise endurance ability as the AKN and ANN for this time-series.

The second time-series, which is non-linear and multiplicative
seasonal, is simulated and the results are shown in Fig. 3a. This time-series has a non-linear trend and the amplitude of seasonal oscillations increases with time. The model of the time-series is described as:
S(t) = R(t), \quad t \in [0, 79]; \qquad S(t) = \frac{S(t-s)^2}{S(t-2s)}, \quad t \in [80, 590],

where R(t) = \frac{1}{70000}\,\sin\!\left(\frac{t}{3} + 50\right)\cos\!\left(9t + \frac{7}{10}\right) and s = 14. Prediction is done for
15% of the source time-series length; the first 85% of observations are used for training. The parameters k and m are set as 2 and 15 respectively. The number of hidden neurons is 8. Fig. 3b illustrates the prediction results with different noise terms using the ADNN, and shows that the predicted data are almost the same as the source data when no noise term is added. This time-series has a peculiarity in which the amplitude of every following periodic segment is a fixed multiple of that of the previous segment. The performance of the different models is described in Table 3, which shows that the performance of the ADNN is better than that of the ANN and close to that of the AKN. Like the first time-series simulation above, this time-series also has a feature of strong orderliness, and the AR model generates the same results as those of the ADNN. Table 3 also indicates that the ADNN model has better noise endurance ability than the ANN, but almost the same as the AKN for this time-series.
The third synthetic time-series is high-frequency with multiplicative seasonality and smoothly increasing amplitude, as shown in Fig. 4a. To formulate this time-series, the following explicit expression is used:

S(t) = \frac{t}{100}\,\sin\frac{t}{2}\,\cos\frac{t}{20}, \qquad t \in [0, 550].
It models a high-frequency series with seasonal periodicity. The prediction is done for 1/11 of the time-series length; the first 10/11 of the source data is used for training. The parameters k and m are set as 3 and 70 respectively. The number of hidden neurons is 10. Fig. 4b shows the prediction results with different noise terms using the ADNN. The performance of the different models is described in Table 4, which indicates that the ADNN outperforms the AKN but has the same performance as the ANN and AR models. Table 4 also indicates that the ADNN model has better noise endurance ability than the AKN, but almost the same as the ANN for this time-series.
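For reference, the third synthetic series can be generated directly. The closed form below, S(t) = (t/100)·sin(t/2)·cos(t/20), is reconstructed from the garbled transcript, so treat the exact expression as an assumption:

```python
import numpy as np

def high_frequency_series(n=551):
    # Reconstructed closed form: the factor t/100 gives the smoothly
    # increasing amplitude, sin(t/2) the high-frequency component, and
    # cos(t/20) the slower seasonal modulation.
    t = np.arange(n, dtype=float)
    return (t / 100.0) * np.sin(t / 2.0) * np.cos(t / 20.0)

s = high_frequency_series()
print(len(s))  # 551
```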
In the seasonally dependent series with linear increase and in the non-linear multiplicative seasonality series, the performance of the ADNN is better than that of the ANN, but the same as that of the AKN. In the high-frequency time-series with multiplicative seasonality and smoothly increasing amplitude, the performance of the ADNN is
Fig. 2. (a) Time-series with seasonal dependence (vertical line showing the prediction start) and (b) zoom of predicted values with different noise terms using the ADNN model.
Fig. 3. (a) Time-series with multiplicative seasonality (vertical line showing the prediction start) and (b) zoom of predicted values with different noise terms using the ADNN model.
the same as that of the ANN and AR models, and better than that of the AKN. The above simulation results show that the ADNN model benefits from the merits of the ANN and AKN.
3.2. Chaotic time-series
In this section, the proposed method is tested on three chaotic time-series. The corresponding MAPE values and NMSE values of the predicted time-series are listed in Tables 5–7 respectively.
The Duffing-equation chaotic time-series consists of 2050 observations generated by the equations:

\frac{dx}{dt} = y, \qquad \frac{dy}{dt} = -y + x - x^3 + b\cos(at).
The results based on this equation are shown in Fig. 5a. For predic-
tion, only the horizontal component of this chaotic two-component
series is used. We assumed that the time-series consists of the
Fig. 4. (a) High-frequency time-series with multiplicative seasonality (vertical line showing the prediction start) and (b) zoom of predicted values with different noise terms using the ADNN model.
Table 5
Prediction summary for Duffing chaotic time-series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0     | 7.2999  | 0.0034 | 0.6483  | 2.4143×10^-5 | 0.2806  | 2.5246×10^-6 | 0.3361  | 7.4141×10^-6
0.001 | 13.4408 | 0.011  | 7.8903  | 0.0042       | 3.6449  | 1.1432×10^-3 | 3.4630  | 1.1653×10^-3
0.005 | 15.8850 | 0.0134 | 13.4571 | 0.0211       | 9.2552  | 0.0082       | 8.8457  | 0.0068
0.01  | 19.1673 | 0.0165 | 13.9801 | 0.0223       | 11.9432 | 0.0098       | 11.2729 | 0.0094
0.02  | 19.6696 | 0.0143 | 19.2554 | 0.0282       | 15.2521 | 0.0122       | 16.7104 | 0.0163
0.03  | 23.4675 | 0.0631 | 20.6164 | 0.0397       | 16.5124 | 0.0231       | 17.3692 | 0.0267
0.04  | 25.9714 | 0.0365 | 21.9570 | 0.0376       | 19.2540 | 0.0215       | 20.6404 | 0.0214
0.05  | 27.2354 | 0.0461 | 24.3629 | 0.0433       | 21.5172 | 0.0273       | 21.7155 | 0.0273
Table 6
Prediction summary for Mackey–Glass chaotic time-series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0     | 1.0484  | 2.1431×10^-3 | 0.4941  | 7.1211×10^-5 | 0.0373  | 2.8314×10^-7 | 0.0655  | 8.1435×10^-7
0.001 | 1.5015  | 3.8461×10^-4 | 4.2862  | 0.0031       | 2.0275  | 7.7453×10^-4 | 2.1203  | 9.6342×10^-4
0.005 | 3.3865  | 0.0016       | 5.6697  | 0.0061       | 5.6017  | 0.0061       | 6.5236  | 0.0065
0.01  | 5.4567  | 0.0039       | 6.4944  | 0.0078       | 6.9095  | 0.0082       | 7.3885  | 0.0086
0.02  | 7.1349  | 0.0067       | 7.3276  | 0.0098       | 8.8958  | 0.0132       | 9.0145  | 0.0172
0.03  | 9.0888  | 0.0121       | 7.8270  | 0.0110       | 9.0929  | 0.0126       | 10.2763 | 0.0183
0.04  | 10.9259 | 0.0190       | 9.4304  | 0.0169       | 12.1027 | 0.0324       | 11.0820 | 0.0194
0.05  | 12.7687 | 0.0287       | 10.3548 | 0.0182       | 13.6413 | 0.0116       | 12.3625 | 0.0213
Table 7
Prediction summary for Ikeda map chaotic time-series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0     | 20.5213 | 0.1231 | 11.2426 | 0.0164 | 9.3819  | 0.0142 | 9.1635  | 0.0114
0.001 | 21.5745 | 0.1348 | 13.3011 | 0.0193 | 10.3539 | 0.0194 | 9.7751  | 0.0178
0.005 | 24.7196 | 0.1333 | 17.4031 | 0.0272 | 12.8506 | 0.0183 | 12.9018 | 0.0204
0.01  | 27.3472 | 0.1408 | 19.5150 | 0.0383 | 14.5355 | 0.0253 | 14.9533 | 0.0281
0.02  | 26.6362 | 0.1512 | 22.9523 | 0.0577 | 16.7840 | 0.0316 | 16.5926 | 0.0334
0.03  | 27.0252 | 0.1331 | 24.1169 | 0.0643 | 18.6916 | 0.0298 | 18.7898 | 0.0385
0.04  | 31.5612 | 0.1612 | 29.8482 | 0.0786 | 21.4366 | 0.0399 | 21.2370 | 0.0465
0.05  | 34.2396 | 0.1827 | 32.5005 | 0.116  | 26.3222 | 0.0796 | 26.8273 | 0.0845
positive values x_i >= 0, and therefore the value x_0 is added to the source data (Fig. 5b). The first 1950 observations are used for training and the remaining observations for testing. The parameters k and m are set as 2 and 50 respectively. The number of hidden neurons is 8. The prediction results with different noise terms using the ADNN are shown in Fig. 5c. The simulation results in Fig. 5c show that the model can predict this time-series perfectly if the source data contain no noise term, and the prediction gets worse as the noise variation increases. The performance of the different models is described in Table 5, which indicates that the ADNN has better performance than the AKN and AR models, and almost the same performance as the ANN. Table 5 also indicates that the ADNN model has almost the same noise endurance ability as the ANN for this time-series.
The Mackey–Glass benchmarks (Casdagli, 1989) are well known for their evaluation of prediction methods. The time-series is generated by the following non-linear differential equation:

\frac{dx(t)}{dt} = -b\,x(t) + \frac{a\,x(t-s)}{1 + x^{c}(t-s)}.
Different values of s generate various degrees of chaos; the behavior is chaotic for s > 16.8, and s = 17 is commonly seen in the literature. In this study, the parameters are set as a = 0.2, b = 0.1, c = 10 and s = 17, as shown in Fig. 6a. Following common practice (Kulesh et al., 2008), the first 1950 values of this series are used for the learning set and the next 100 values for the testing set. The parameters k and m are set as 3 and 14 respectively. The number of hidden neurons is 6. Fig. 6c depicts the prediction results with different noise terms. The performance of the different models is described in Table 6, which indicates that the ADNN has almost the same performance as the ANN and the AKN. Table 6 also indicates that the ADNN model has almost the same noise endurance ability as the ANN and AKN for this time-series.
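For reproducibility, the Mackey–Glass series with a = 0.2, b = 0.1, c = 10 and s = 17 can be generated with a simple Euler scheme; the step size, initial history and integration method here are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def mackey_glass(n, a=0.2, b=0.1, c=10, s=17.0, dt=0.1, x0=1.2):
    """Euler integration of dx/dt = -b*x(t) + a*x(t-s) / (1 + x(t-s)**c)."""
    delay = int(round(s / dt))          # delay expressed in time steps
    x = np.empty(n + delay)
    x[:delay + 1] = x0                  # constant initial history
    for i in range(delay, n + delay - 1):
        x_del = x[i - delay]            # delayed state x(t - s)
        x[i + 1] = x[i] + dt * (-b * x[i] + a * x_del / (1.0 + x_del ** c))
    return x[delay:]                    # drop the synthetic history

series = mackey_glass(2050)
print(series.shape)  # (2050,)
```

A smaller dt gives a more accurate trajectory at the cost of more steps per unit time; higher-order schemes (e.g. RK4) are common for this benchmark.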
The Ikeda map is another chaotic time-series experiment, which may be given in terms of a mapping of the complex plane to itself. The coordinates of the phase space are related to the complex degree of freedom z = x + iy. The mapping is (Makridakis, 1996; Murray, 1993):

z_{n+1} = p + B\,z_n \exp\!\left[ i\left( a - \frac{b}{1 + |z_n|^2} \right) \right],
where p = 1, B = 1, a = 0.4 and b = 6. Fig. 7a is generated by a time-series of the Ikeda map, and the series length is 2048. The prediction is done for the translated vertical component y(t) + y_0 for 5% of the time-series length (Fig. 7b). The parameters k and m are set as 3 and 28 respectively. The number of hidden neurons is 5. The prediction results with different noise terms are shown in Fig. 7c. The performance of the different models is presented in Table 7, which indicates that the ADNN model has better prediction results than the AKN and AR, but almost the same prediction performance as the ANN model for this time-series.
The simulations above show that the proposed ADNN model can predict complicated chaotic time-series as well as the ANN algorithm does. The ADNN model reduces the problems of amplitude change and trend determination. When detrended signals have no amplitude change, the prediction results are similar to those of the ANN, as expected.
3.3. Real time-series
In this section, the proposed method is tested on three real-data time-series. Real data inevitably contain errors from many sources, such as observation error, so in this part of the simulation we do not add noise terms to the series.
The sunspots dataset (Fig. 8a) is natural and contains the yearly number of dark spots on the sun from 1700 to 2007. The time-series has a pseudo-period of 10-11 years. It is common practice (McDonnell and Waagen, 1994) to use the data from 1700 to 1920 as a training set and to assess the performance of the model on another set covering 1921-1955 (Test 1). The parameters k and m are set as 3 and 11 respectively, and the number of hidden neurons is 6. The prediction results are shown in Fig. 8b.
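A pair of parameters k and m is set for every series, and in each case m tracks the dominant period (11 years for sunspots, 180 hours for traffic, 30 days for payments). A hedged illustration of how such parameters could shape the model inputs, assuming k is the number of lagged values per input vector and m the lag spacing (the paper's exact usage may differ):

```python
import numpy as np

def lagged_inputs(series, k=3, m=11):
    """Build input vectors [x(t-m), x(t-2m), ..., x(t-k*m)] with target x(t).

    Illustration of delay-style inputs only; the precise role of k and m
    in the ADNN model is an assumption here.
    """
    s = np.asarray(series, dtype=float)
    start = k * m
    X = np.stack([s[start - j * m: len(s) - j * m] for j in range(1, k + 1)],
                 axis=1)
    y = s[start:]
    return X, y

X, y = lagged_inputs(np.arange(100.0), k=3, m=11)
```

Spacing the lags by the pseudo-period lets each input vector sample the same phase of several past cycles, which is a natural choice for the periodic series studied here.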
Fig. 5. (a) Chaotic time-series based on the solution of the Duffing equation, (b) horizontal component of the time-series (vertical line showing the prediction start) and (c) zoom of predicted values with different noise terms using the ADNN model.
W.K. Wong et al. / European Journal of Operational Research 207 (2010) 807-816
The fitting traffic flow data consist of 1689 observations (11 weeks) from the hourly vehicle count for the Monash Freeway, outside Melbourne in Victoria, Australia, beginning in August 1995. A graph of the data is shown in Fig. 9a. The parameters k and m are set as 3 and 180 respectively, and the number of hidden neurons is 12. The prediction results are shown in Fig. 9b.
The data in Fig. 10a are the payments based on paper documents (filled in and sent to the bank; Kulesh et al., 2008). These data have appreciable seasonal components with sinusoidal trends. The parameters k and m are set as 3 and 30 respectively, and the number of hidden neurons is 8. The prediction results are shown in Fig. 10b.
Fig. 7. (a) Chaotic time-series based on the Ikeda map, (b) vertical component of the time-series (vertical line showing the prediction start) and (c) zoom of predicted values with different noise terms using the ADNN model.
Fig. 6. (a) Chaotic time-series based on the solution of the Mackey-Glass delay differential equation, (b) horizontal component of the time-series (vertical line showing the prediction start) and (c) zoom of predicted values with different noise terms using the ADNN model.
Table 8 summarizes the prediction performance of the different models on the three sets of real time-series data mentioned above. The proposed ADNN method clearly outperforms the three other models, as it adapts to local variations of trends and amplitudes, alleviates the over-fitting problem, and retains the flexible non-linear modeling capability of neural networks. Because of these reasons, the real time-series prediction made by the
Fig. 8. (a) Real sunspots data (vertical line showing the prediction start) and (b) zoom of predicted values using the ADNN model.
Fig. 9. (a) Real traffic flow data (vertical line showing the prediction start) and (b) zoom of predicted values using the ADNN model.
Fig. 10. (a) Real payment data based on paper documents (vertical line showing the prediction start) and (b) zoom of predicted values using the ADNN model.
Table 8
Prediction summary for real data time-series.

Model   Sunspot time-series     Traffic time-series     Payments time-series
        MAPE (%)  NMSE          MAPE (%)  NMSE          MAPE (%)  NMSE
ADNN    28.45     0.068         14.31     0.0193        8.08      0.0109
ANN     30.8      0.078         17.97     0.0267        15.24     0.0274
AR      31.2      0.0852        26.98     0.0818        9.06      0.0113
AKN     50.3      0.1833        17.39     0.0206        12.5      0.0178
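The two error measures reported in Table 8 (and in the earlier tables) can be computed as below. The paper does not restate its exact normalization, so the conventional definitions are assumed: MAPE is the mean absolute percentage error, and NMSE normalizes the total squared error by the variance of the actual series:

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes actual != 0)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

def nmse(actual, predicted):
    """Total squared error normalized by the variance of the actual series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sum((actual - predicted) ** 2) / np.sum((actual - actual.mean()) ** 2)
```

Lower is better for both; an NMSE of 1 corresponds to always predicting the mean of the series.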
proposed method gives a better fit to the real data than the three other methods.
4. Conclusions
This study presents a novel adaptive approach to extending the artificial neural network, in which adaptive metrics of inputs and a new mechanism for the admixture of outputs are proposed for time-series prediction. The experimental results, evaluated with a consistent set of performance measures (MAPE, NMSE), show that the new method improves the accuracy of time-series prediction. The performance of the proposed method is validated on three sets of complex time-series, namely deterministic synthetic time-series, chaotic time-series and real time-series. In addition, the results predicted by the ADNN are compared with those of the ANN, AKN, and AR methods, and indicate that the proposed model outperforms these conventional techniques, particularly in forecasting chaotic and real time-series.
References
Adya, M., Collopy, F., 1998. How effective are neural networks at forecasting and prediction? A review and evaluation. Journal of Forecasting 17, 481-495.
Barbounis, T.G., Teocharis, J.B., 2007. Locally recurrent neural networks for wind speed prediction using spatial correlation. Information Sciences 177, 5775-5797.
Bartlett, P.L., 1997. For valid generalization, the size of the weights is more important than the size of the network. In: Mozer, M.C., Jordan, M.I., Petsche, T. (Eds.), Advances in Neural Information Processing Systems, vol. 9. The MIT Press, Cambridge, MA, pp. 134-140.
Bodyanskiy, Y., Popov, S., 2006. Neural network approach to forecasting of quasiperiodic financial time series. European Journal of Operational Research 175, 1357-1366.
Brooks, C., 2002. Introductory Econometrics for Finance. Cambridge University Press, Cambridge, UK, p. 289.
Casdagli, M., 1989. Nonlinear prediction of chaotic time series. Physica D 35, 335-356.
Celik, A.E., Karatepe, Y., 2007. Evaluating and forecasting banking crises through neural network models: An application for Turkish banking sector. Expert Systems with Applications 33, 809-815.
Chen, S.M., Hwang, J.R., 2000. Temperature prediction using fuzzy time series. IEEE Transactions on Systems, Man and Cybernetics, Part B 30, 263-275.
Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems 2, 303-314.
Freitas, P.S.A., Rodrigues, A.J.L., 2006. Model combination in neural-based forecasting. European Journal of Operational Research 173, 801-814.
Funahashi, K., 1989. On the approximate realization of continuous mappings by neural networks. Neural Networks 2, 183-192.
Geman, S., Bienenstock, E., Doursat, R., 1992. Neural networks and the bias/variance dilemma. Neural Computation 4, 1-58.
Hansen, J.V., McDonald, J.B., Nelson, R.D., 2002. Time series prediction with genetic-algorithm designed neural networks: An empirical comparison with modern statistical models. Computational Intelligence 15, 171-184.
Huarng, K., Yu, T.H., 2006. Ratio-based lengths of intervals to improve fuzzy time series forecasting. IEEE Transactions on Systems, Man and Cybernetics, Part B 36, 328-340.
Kim, D., Kim, C., 1997. Forecasting time series with genetic fuzzy predictor ensemble. IEEE Transactions on Fuzzy Systems 5, 523-535.
Kulesh, M., Holschneider, M., Kurennaya, K., 2008. Adaptive metrics in the nearest neighbours method. Physica D 237, 283-291.
Liu, M.C., Kuo, W., Sastri, T., 1995. An exploratory study of a neural network approach for reliability data analysis. Quality and Reliability Engineering International 11, 107-112.
Makridakis, S., 1996. Forecasting: Its role and value for planning and strategy. International Journal of Forecasting 12, 513-537.
Makridakis, S., Wheelwright, S.C., Hyndman, R.J., 1998. Forecasting: Methods and Applications, third ed. Wiley, New York, pp. 42-50.
Masters, T., 1995. Advanced Algorithms for Neural Networks: A C++ Sourcebook. Wiley, New York.
McDonnell, J.R., Waagen, D., 1994. Evolving recurrent perceptrons for time series modeling. IEEE Transactions on Neural Networks 5, 24-38.
Moody, J.E., 1992. The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. Neural Information Processing Systems 4, 847-854.
Murray, D.B., 1993. Forecasting a chaotic time series using an improved metric for embedding space. Physica D 68, 318-325.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press.
Sahoo, G.B., Ray, C., 2006. Flow forecasting for a Hawaii stream using rating curves and neural networks. Journal of Hydrology 317, 63-80.
Singh, P., Deo, M.C., 2007. Suitability of different neural networks in daily flow forecasting. Applied Soft Computing 7, 968-978.
Taylor, J.W., Buizza, R., 2002. Neural network load forecasting with weather ensemble predictions. IEEE Transactions on Power Systems 17, 59.
Wang, T., Chien, S., 2006. Forecasting innovation performance via neural networks: A case of Taiwanese manufacturing industry. Technovation 26, 635-643.
Wong, B.K., Vincent, S., Jolie, L., 2000. A bibliography of neural network business applications research: 1994-1998. Operations Research and Computers 27, 1045-1076.
Zhang, G.P., 2003. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50, 159-175.
Zhang, G.P., Qi, M., 2005. Neural network forecasting for seasonal and trend time series. European Journal of Operational Research 160, 501-514.
Zhang, G., Patuwo, B.E., Hu, M.Y., 1998. Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting 14, 35-62.