
European Journal of Operational Research 207 (2010) 807–816

Stochastics and Statistics

    Adaptive neural network model for time-series forecasting

    W.K. Wong a,*, Min Xia a,b, W.C. Chu a

a Institute of Textiles and Clothing, The Hong Kong Polytechnic University, Hong Kong
b College of Information Science and Technology, Donghua University, Shanghai, China

* Corresponding author. Tel.: +00852 64300917. E-mail address: [email protected] (W.K. Wong).

ARTICLE INFO

    Article history:

Received 27 January 2010; accepted 14 May 2010

    Available online 1 June 2010

    Keywords:

    Time-series

    Forecasting

    Adaptive metrics

    Neural networks

ABSTRACT

In this study, a novel adaptive neural network (ADNN) with adaptive metrics of inputs and a new mechanism for admixture of outputs is proposed for time-series prediction. The adaptive metrics of inputs can solve the problems of amplitude changing and trend determination, and avoid the over-fitting of networks. The new mechanism for admixture of outputs can adjust forecasting results by the relative error and make them more accurate. The proposed ADNN method can predict periodical time-series with a complicated structure. The experimental results show that the proposed model outperforms the auto-regression (AR), artificial neural network (ANN), and adaptive k-nearest neighbors (AKN) models. The ADNN model is shown to benefit from the merits of the ANN and the AKN through its novel structure, with high robustness particularly for both chaotic and real time-series predictions.

© 2010 Elsevier B.V. All rights reserved.

    1. Introduction

Many planning activities require prediction of the behavior of variables (e.g. economic, financial, traffic and physical). The predictions support the strategic decisions of organizations (Makridakis, 1996), which in turn sustain a practical interest in forecasting methods. Time-series methods are generally used to model forecasting systems when there is not much information about the generation process of the underlying variable and when other variables provide no clear explanation of the studied variable (Zhang, 2003).

Time-series forecasting is used to forecast the future based on historical observations (Makridakis et al., 1998). There have been many approaches to modeling time-series, depending on the theory or assumptions about the relationships in the data (Huarng and Yu, 2006; Chen and Hwang, 2000; Taylor and Buizza, 2002; Kim and Kim, 1997; Zhang et al., 1998; Wang and Chien, 2006; Singh and Deo, 2007). Traditional methods, such as time-series regression, exponential smoothing and the autoregressive integrated moving average (ARIMA) (Brooks, 2002), are based on linear models. All these methods assume linear relationships among the past values of the forecast variable, and therefore non-linear patterns cannot be captured by these models. One problem that makes this type of time-series model difficult to develop and implement is that the model must be specified and a probability distribution for the data must be assumed (Hansen et al., 2002). The approximation of linear models to complex real-world problems is not always satisfactory.

Recently, artificial neural networks (ANN) have been proposed as a promising alternative for time-series forecasting. A large number of successful applications have shown that neural networks can be a very useful tool for time-series modeling and forecasting (Adya and Collopy, 1998; Zhang et al., 1998; Celik and Karatepe, 2007; Wang and Chien, 2006; Sahoo and Ray, 2006; Singh and Deo, 2007; Barbounis and Teocharis, 2007; Bodyanskiy and Popov, 2006; Freitas and Rodrigues, 2006). The reason is that the ANN is a universal function approximator capable of mapping any linear or non-linear function (Cybenko, 1989; Funahashi, 1989). Neural networks are basically a data-driven method with few a priori assumptions about underlying models. Instead, they let the data speak for themselves and have the capability to identify the underlying functional relationships among the data. In addition, the ANN is capable of tolerating the presence of chaotic components and thus is better than most methods in this respect (Masters, 1995). This capacity is particularly important, as many relevant time-series possess significant chaotic components.

However, since the neural network lacks a systematic procedure for model-building, the forecasting result is not always accurate when the input data are very different from the training data. Like other flexible non-linear estimation methods such as kernel regression and smoothing splines, the ANN may suffer either under-fitting or over-fitting (Moody, 1992; Geman et al., 1992; Bartlett, 1997). A network that is not sufficiently complex can fail to fully detect the signal in a complicated data set, leading to under-fitting. A network that is too complex may fit not only the signal but also the noise, leading to over-fitting. Over-fitting is especially misleading because it can easily lead to wild predictions far beyond the range of the training data, even with noise-free data. In order to solve this problem, a novel ANN model is proposed


in this study with adaptive metrics of inputs, in which the output data are evolved by a mechanism for admixture. The adaptive metrics of inputs allow the model to adapt to local variations of trends and amplitudes. Most inputs of the network are kept close to the historical data in order to avoid a dramatic increase in the forecasting error due to a big difference between training data and input data. Using the proposed mechanism for admixture of outputs, the forecasting result can be adjusted by the relative error, making the forecasting result more accurate.

The forecasting results generated by the proposed model are compared with those obtained by the traditional statistical AR model, traditional ANN architectures (BP network), and the adaptive k-nearest neighbors (AKN) method (Kulesh et al., 2008) in the related literature. The experimental results indicate that the proposed model outperforms the other models, especially in chaotic and real-data time-series predictions.

This paper is organized as follows. In the next section, the fundamental principle of the proposed method is introduced. The experimental results are presented in Section 3. The last section concludes this study.

    2. Methodology

We focus on one-step-ahead point forecasting in this work. Let y_1, y_2, y_3, …, y_t be a time-series. At time t, for t ≥ 1, the next value y_{t+1} will be predicted based on the observed realizations of y_t, y_{t−1}, y_{t−2}, …, y_1.

    2.1. The ANN approach to time-series modeling

The ANN is a flexible computing framework for a broad range of non-linear problems (Wong et al., 2000). The network model is greatly determined by the data characteristics. A single hidden-layer feed-forward network is the most widely used model for time-series modeling and forecasting (Zhang and Qi, 2005). The model is characterized by a network of three layers of simple processing units connected by acyclic links. The hidden layer can capture the non-linear relationships among variables. Each layer consists of multiple neurons that are connected to neurons in adjacent layers. The relationship between the output y_{t+1} and the inputs y_t, y_{t−1}, y_{t−2}, …, y_{t−p+1} has the following mathematical representation:

$$y_{t+1} = \alpha_0 + \sum_{j=1}^{q} \alpha_j\, g\!\left(\beta_{0j} + \sum_{i=1}^{p} \beta_{ij}\, y_{t-i+1}\right) + \varepsilon, \qquad (1)$$

where α_j (j = 0, 1, 2, …, q) and β_ij (i = 0, 1, 2, …, p; j = 1, 2, 3, …, q) are the model parameters, called connection weights; p is the number of input nodes and q is the number of hidden nodes. The logistic function is often used as the hidden-layer transfer function:

$$g(x) = \frac{1}{1 + e^{-x}}. \qquad (2)$$

A neural network can be trained on the historical data of a time-series in order to capture the characteristics of this time-series. The model parameters (connection weights and node biases) can be adjusted iteratively by the process of minimizing the forecasting errors (Liu et al., 1995).
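As a concrete illustration (not the authors' implementation), the following minimal NumPy sketch implements the model of Eqs. (1) and (2): a single hidden-layer feed-forward network trained by plain stochastic gradient descent on the squared one-step forecasting error. The class name, initialization scale, learning rate and epoch count are illustrative choices.

```python
import numpy as np

def logistic(x):
    # Hidden-layer transfer function, Eq. (2)
    return 1.0 / (1.0 + np.exp(-x))

class OneStepANN:
    """Single hidden-layer feed-forward net mapping p lagged values to y_{t+1}, Eq. (1)."""

    def __init__(self, p, q, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(q, p))  # beta_ij
        self.b1 = np.zeros(q)                         # beta_0j
        self.W2 = rng.normal(scale=0.1, size=q)       # alpha_j
        self.b2 = 0.0                                 # alpha_0

    def predict(self, x):
        h = logistic(self.W1 @ np.asarray(x, dtype=float) + self.b1)
        return float(self.W2 @ h + self.b2)

    def train(self, series, epochs=200, lr=0.05):
        # Build (lag window, next value) pairs and descend the squared error.
        series = np.asarray(series, dtype=float)
        p = self.W1.shape[1]
        for _ in range(epochs):
            for i in range(len(series) - p):
                x, y = series[i:i + p], series[i + p]
                h = logistic(self.W1 @ x + self.b1)
                err = (self.W2 @ h + self.b2) - y     # forecasting error
                grad_h = err * self.W2 * h * (1.0 - h)
                self.W2 -= lr * err * h
                self.b2 -= lr * err
                self.W1 -= lr * np.outer(grad_h, x)
                self.b1 -= lr * grad_h
```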

    2.2. Adaptive neural network model for forecasting (ADNN)

It is well known that the ANN may suffer either under-fitting or over-fitting (Moody, 1992; Geman et al., 1992; Bartlett, 1997). A network that is not sufficiently complex can fail to fully detect the signal, which leads to under-fitting. Over-fitting generally occurs when a model is excessively complex. A model which has been under-fitted or over-fitted will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data. Of these two problems, over-fitting is the more important when the signal data are sufficient and the network is sufficiently complex; thus, in this paper, we emphasize the problem of over-fitting for the ANN. Generally, an ANN algorithm is said to over-fit relative to a simpler one if it is more accurate in fitting known data (hindsight) but less accurate in predicting new data (foresight). In order to avoid over-fitting, the adaptive neural network model is proposed. In this model, the hindsight data are used to modify the inputs of the ANN in the prediction process, making the inputs approach the learning data; this reduces the chance of over-fitting. Based on the current ANN, an extension is made to develop the adaptive neural network (ADNN) model for time-series forecasting. Firstly, a strategy is used to initialize the input data y_t, y_{t−1}, y_{t−2}, …, y_{t−m+1}, where m is the number of input nodes. The strategy adopts adaptive metrics similar to those of the adaptive k-nearest neighbor method. The data set y_t, y_{t−1}, y_{t−2}, …, y_{t−m+1} is compared with the other parts of this time-series that have the same length. The determination of the closeness measure is the major factor in prediction accuracy. Closeness is usually defined in terms of a metric distance on the Euclidean space. The most common choices are the Minkowski metrics:

$$L_M(Y_t, Y_r) = \left(|y_t - y_r|^d + |y_{t-1} - y_{r-1}|^d + \cdots + |y_{t-m+1} - y_{r-m+1}|^d\right)^{1/d}. \qquad (3)$$

This equation gives the value difference between Y_t and Y_r, but the differences of trends and amplitudes are not captured. In time-series forecasting, the information on trends and amplitudes is a crucial factor. In this study, adaptive metrics are introduced to solve this problem, expressed as:

$$L_A(Y_t, Y_r) = \min_{k_r, u_r} f_r(k_r, u_r), \qquad (4)$$

$$f_r(k_r, u_r) = \left(|y_t - k_r y_r - u_r|^d + |y_{t-1} - k_r y_{r-1} - u_r|^d + \cdots + |y_{t-m+1} - k_r y_{r-m+1} - u_r|^d\right)^{1/d}, \qquad (5)$$

where h_r and l_r are the largest and smallest elements of the corresponding vector, k_r ∈ [1, h_r/l_r] and u_r ∈ [0, h_r − l_r]. The minimization parameter k_r equilibrates the amplitude difference between Y_t and Y_r, while the parameter u_r is responsible for the trend of the time-series.

The optimization problem (4) can be solved by the Levenberg–Marquardt algorithm (Press et al., 1992) or other gradient methods for d ≥ 1. In this study, d is set to 2, which gives the widely used Euclidean metric:

$$f_r(k_r, u_r) = \left(|y_t - k_r y_r - u_r|^2 + |y_{t-1} - k_r y_{r-1} - u_r|^2 + \cdots + |y_{t-m+1} - k_r y_{r-m+1} - u_r|^2\right)^{1/2}. \qquad (6)$$

For d = 2, two equations are considered:

$$\frac{\partial f_r(k_r, u_r)}{\partial k_r} = 0, \qquad \frac{\partial f_r(k_r, u_r)}{\partial u_r} = 0. \qquad (7)$$

When the corresponding linear system is solved, the solution of the minimization problem can be obtained analytically:

$$u_r = \frac{z_1 z_2 - z_3 z_4}{m z_2 - z_3^2}, \qquad k_r = \frac{m z_4 - z_1 z_3}{m z_2 - z_3^2},$$

where


$$z_1 = \sum_{i=1}^{m} y_{t-i+1}, \quad z_2 = \sum_{i=1}^{m} y_{r-i+1}^2, \quad z_3 = \sum_{i=1}^{m} y_{r-i+1}, \quad z_4 = \sum_{i=1}^{m} y_{r-i+1}\, y_{t-i+1}.$$
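The closed-form solution is exactly an ordinary least-squares fit of Y_t against Y_r. A minimal Python sketch of Eqs. (6) and (7) follows; for simplicity it returns the unconstrained minimizer and does not enforce the box constraints on k_r and u_r quoted above.

```python
import numpy as np

def adaptive_fit(Y_t, Y_r):
    """Closed-form minimizer (k_r, u_r) of Eq. (6), via the normal equations (7)."""
    Y_t, Y_r = np.asarray(Y_t, dtype=float), np.asarray(Y_r, dtype=float)
    m = len(Y_t)
    z1, z2 = Y_t.sum(), (Y_r ** 2).sum()
    z3, z4 = Y_r.sum(), (Y_r * Y_t).sum()
    denom = m * z2 - z3 ** 2
    u_r = (z1 * z2 - z3 * z4) / denom
    k_r = (m * z4 - z1 * z3) / denom
    return k_r, u_r

def adaptive_distance(Y_t, Y_r):
    """Adaptive Euclidean metric L_A of Eqs. (4)-(6) at the fitted (k_r, u_r)."""
    k_r, u_r = adaptive_fit(Y_t, Y_r)
    resid = np.asarray(Y_t, dtype=float) - k_r * np.asarray(Y_r, dtype=float) - u_r
    return float(np.sqrt((resid ** 2).sum())), k_r, u_r
```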

Based on this strategy, the adaptive k-nearest neighbors are chosen, and the input vector of the first network (known as the main network) can be defined as:

$$\mathrm{input}_v = \left(q_t^v,\, q_{t-1}^v,\, \ldots,\, q_{t-p+1}^v\right) = \left(\frac{y_t - u_{r_v}}{k_{r_v}},\ \frac{y_{t-1} - u_{r_v}}{k_{r_v}},\ \ldots,\ \frac{y_{t-p+1} - u_{r_v}}{k_{r_v}}\right). \qquad (8)$$
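Building on adaptive_distance above, a hypothetical helper for the neighbor search and the Eq. (8) rescaling might look as follows; the return structure is an illustrative choice.

```python
import numpy as np

def nearest_neighbours(series, t, m, k):
    """Rank historical windows of length m by the adaptive metric and return
    the k closest, each with its (k_r, u_r) and the Eq. (8) scaled input."""
    series = np.asarray(series, dtype=float)
    Y_t = series[t - m + 1 : t + 1]
    candidates = []
    for r in range(m - 1, t):                      # window ending at index r < t
        d, k_r, u_r = adaptive_distance(Y_t, series[r - m + 1 : r + 1])
        candidates.append((d, r, k_r, u_r))
    candidates.sort()                              # closest first
    return [{"r": r, "dist": d, "k_r": k_r, "u_r": u_r,
             "input": (Y_t - u_r) / k_r}           # Eq. (8)
            for d, r, k_r, u_r in candidates[:k]]
```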

Most input values are thus kept close to the historical data; without this transformation, the forecasting error would increase dramatically due to the big difference between training data and input data. In order to obtain more accurate results for the time-series y_t, y_{t−1}, y_{t−2}, …, y_{t−p+1}, k sets of inputs are used, with output values output_v = b_v, v = 1, 2, …, k. The forecasting result is affected by the different values of L_A(Y_t, Y_{r_v}; k_{r_v}, u_{r_v}) and |t − r_v| for v = 1, 2, …, k. The relative error is used to measure the impact of L_A(Y_t, Y_{r_v}; k_{r_v}, u_{r_v}) and |t − r_v|; it is defined as RE_v = (y_{r_v} − ỹ_{r_v})/y_{r_v}, where y_{r_v} is the source point and ỹ_{r_v} is the predicted point. In this study, the second neural network (known as the modified network) is used to find the relationship between the four factors and RE_v. The estimate of RE_v is presented as follows:

$$\widetilde{RE}_v = f\!\left(L_A(Y_t, Y_{r_v};\, k_{r_v}, u_{r_v}),\ |t - r_v|\right). \qquad (9)$$

The mechanism for admixture of outputs is presented as follows:

$$y_{t+1} = \frac{1}{U} \sum_{v=1}^{k} \left(k_{r_v} b_v + u_{r_v}\right) e^{-\widetilde{RE}_v}, \qquad (10)$$

$$U = \sum_{v=1}^{k} e^{-\widetilde{RE}_v}. \qquad (11)$$

From Eq. (10), the forecasting result of y_{t+1} is calculated from b_v, v = 1, 2, …, k, with different weighting coefficients. Based on the methodology proposed above, the forecasting scheme can be formulated as shown in Fig. 1.
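A minimal sketch of the admixture step of Eqs. (10) and (11), assuming exponential weights e^{−R̃E_v} so that outputs with larger estimated relative error contribute less; note that k_{r_v} b_v + u_{r_v} maps each network output back from the Eq. (8) scale to the original scale.

```python
import numpy as np

def admix(b, k_r, u_r, re_hat):
    """Combine the k main-network outputs per Eqs. (10)-(11).
    b       : outputs b_v of the main network (one per neighbour)
    k_r, u_r: adaptive-metric parameters of each neighbour
    re_hat  : relative errors predicted by the modified network"""
    b, k_r, u_r, re_hat = (np.asarray(a, dtype=float) for a in (b, k_r, u_r, re_hat))
    w = np.exp(-re_hat)                       # weight per output, Eq. (10)
    U = w.sum()                               # normalizer, Eq. (11)
    return float(((k_r * b + u_r) * w).sum() / U)
```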

The steps of the algorithm of the proposed method are given as follows.

Step 1: Train the two neural networks using the historical data. In the first neural network, y_i, y_{i−1}, y_{i−2}, …, y_{i−m+1} are the input training data, and y_{i+1} is the output training data. In the second neural network, L_A(Y_t, Y_{r_v}; k_{r_v}, u_{r_v}) and |t − r_v| are the training inputs, and the relative error estimate is the training output. The BP algorithm is used to train these two neural networks.

Step 2: Compare the data set y_t, y_{t−1}, y_{t−2}, …, y_{t−m+1} with the other parts of the time-series using the adaptive metric distance on the Euclidean space, based on Eq. (6).

Step 3: Choose the k-nearest neighbors and obtain k_{r_v} and u_{r_v} based on Eq. (7). Initialize the input data of the first neural network according to Eq. (8); the input data of the first neural network are q_i^v, q_{i−1}^v, q_{i−2}^v, …, q_{i−m+1}^v, v = 1, 2, …, k.

Step 4: Apply the first neural network and obtain the outputs output_v = b_v, v = 1, 2, …, k.

Step 5: Use L_A(Y_t, Y_{r_v}; k_{r_v}, u_{r_v}) and |t − r_v| to predict the relative error; the number of hidden neurons in the second neural network is 5 for all simulations.

Step 6: Apply the mechanism for admixture of Eq. (10) and obtain the forecasting result of y_{t+1}.
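Assuming both networks are already trained (Step 1) and reusing the helpers sketched earlier, the prediction phase (Steps 2–6) can be outlined as below; the predict interfaces of main_net and error_net are hypothetical.

```python
def adnn_forecast(series, t, m, k, main_net, error_net):
    """One-step ADNN forecast of y_{t+1}, following Steps 2-6."""
    neighbours = nearest_neighbours(series, t, m, k)        # Steps 2-3
    b, k_rs, u_rs, re_hats = [], [], [], []
    for nb in neighbours:
        b.append(main_net.predict(nb["input"]))             # Step 4
        re_hats.append(error_net.predict(                   # Step 5
            [nb["dist"], nb["k_r"], nb["u_r"], abs(t - nb["r"])]))
        k_rs.append(nb["k_r"])
        u_rs.append(nb["u_r"])
    return admix(b, k_rs, u_rs, re_hats)                    # Step 6
```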

    3. Numerical simulations

Since the auto-regression (AR) model, traditional back-propagation (BP) ANN architectures and the adaptive k-nearest neighbors (AKN) method are popular methods for forecasting problems, the performance of the proposed model is benchmarked against Zhang's AR model (Zhang, 2003), Adya's ANN model (Adya and Collopy, 1998), and Kulesh's AKN model (Kulesh et al., 2008) reported in the literature. To illustrate the accuracy of the method, several deterministic and chaotic time-series are generated and predicted, and three real time-series are considered in this section. The Mean Absolute Percentage Error (MAPE) statistic is used to evaluate the forecasting performance of the model. The MAPE is regarded as one of the standard statistical performance measures and takes the following form:

$$\mathrm{MAPE} = \frac{1}{M}\sum_{i=1}^{M}\left|\frac{y_i - \tilde{y}_i}{y_i}\right| \times 100\%,$$

where y_i is the source point, ỹ_i is the predicted point and M is the number of predicted points.

The normalized mean squared error (NMSE) is also used as an error criterion; it is the ratio of the mean squared error to the variance of the time-series. For a time-series y_i, it is defined by

$$\mathrm{NMSE} = \frac{\sum_{i=1}^{M}\left(y_i - \tilde{y}_i\right)^2}{\sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2} = \frac{\sum_{i=1}^{M}\left(y_i - \tilde{y}_i\right)^2}{M\sigma^2}, \qquad \hat{y}_i = \frac{1}{M}\sum_{i=1}^{M} y_i,$$

where ŷ_i is the mean value of the source data and σ² is the variance of the source data.
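Both performance measures are simple to compute; a minimal sketch:

```python
import numpy as np

def mape(y, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y - y_pred) / y)) * 100.0)

def nmse(y, y_pred):
    """Normalized MSE: squared error relative to the series variance."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2))
```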

    Fig. 1. The forecasting scheme for adaptive neural network modeling.



In the simulations, for both the AKN method and the ADNN method, the number of nearest neighbors is set as k, and the data length for comparison is set as m. For each simulation, we use AR(m) for forecasting. The number of input nodes and the number of hidden nodes for the ANN method are set the same as those of the first neural network of the ADNN, and the parameter settings for all simulations are shown in Table 1.

    3.1. Deterministic synthetic examples

In this section, the proposed method is tested on three deterministic time-series with obvious seasonal dependence, trend and amplitude change. The corresponding MAPE and NMSE values of the predicted time-series are listed in Tables 2–4 respectively. In order to investigate the generalization ability of the proposed model, different noise terms are added to the time-series. For the time-series y_t, y_{t−1}, y_{t−2}, …, y_1, the noise term is r_t, r_{t−1}, r_{t−2}, …, r_1. Thus, the training inputs are y_i + r_i, y_{i−1} + r_{i−1}, y_{i−2} + r_{i−2}, …, y_{i−m+1} + r_{i−m+1}, and the training output is y_{i+1} + r_{i+1}. In this study, the noise terms are generated from a normal distribution.
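A minimal sketch of this noise injection, assuming zero-mean Gaussian noise whose variance matches the values quoted in the tables below:

```python
import numpy as np

def add_noise(series, variance, seed=0):
    # Zero-mean Gaussian noise of the given variance, added pointwise.
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, np.sqrt(variance), size=len(series))
    return np.asarray(series, dtype=float) + noise
```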

The first synthetic time-series is a seasonal dependence series with a linearly increasing trend, as shown in Fig. 2a. The equation of the seasonal dependence series is:

$$S(t) = \cos\frac{t}{25}\,\sin\frac{t}{100} + \frac{t}{1000} + 1, \qquad t \in [0, 2200],$$

where t denotes time. For this time-series data set, the first 2000 observations are used for training, and the last 200 observations of the source time-series are predicted. The parameters k and m are set as 2 and 100 respectively. The number of hidden neurons is 16.

Table 1
Parameter settings for simulations.

Types of time-series | k | First network's hidden nodes of ADNN | m
Seasonal dependence time-series | 2 | 16 | 100
Multiplicative seasonality time-series | 2 | 8 | 15
High-frequency time-series | 3 | 10 | 70
Duffing chaotic time-series | 2 | 8 | 50
Mackey–Glass chaotic time-series | 3 | 6 | 14
Ikeda map chaotic time-series | 3 | 5 | 28
Sunspot time-series | 3 | 6 | 11
Traffic time-series | 3 | 12 | 180
Payments time-series | 3 | 8 | 30

Table 2
Prediction summary for seasonal dependence time-series.

Noise | AKN MAPE | AKN NMSE | AR MAPE | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE | ADNN NMSE
0 | 4.83×10^−7 | 6.35×10^−11 | 3.43×10^−17 | 2.53×10^−30 | 0.02 | 1.16×10^−10 | 3×10^−3 | 1.3×10^−10
0.001 | 0.83% | 7.34×10^−5 | 0.84% | 1.23×10^−4 | 2.95 | 1.8×10^−3 | 1.05% | 1.31×10^−4
0.005 | 2.4% | 6.08×10^−4 | 2.1083% | 8.53×10^−4 | 4.32 | 4.1×10^−3 | 2.86% | 1.34×10^−3
0.01 | 2.65% | 7.45×10^−4 | 2.55% | 2.1×10^−4 | 4.85 | 3.1×10^−4 | 3.49% | 1.21×10^−3
0.02 | 3.31% | 9.81×10^−4 | 3.19% | 2.8×10^−4 | 5.21 | 5.2×10^−3 | 3.87% | 2.6×10^−3
0.03 | 4.74% | 2.1×10^−3 | 4.18% | 3.2×10^−3 | 5.55 | 7.1×10^−3 | 4.31% | 3.3×10^−3
0.04 | 4.92% | 2.4×10^−3 | 5.12% | 3.9×10^−3 | 6.44 | 9.2×10^−3 | 5.49% | 4.4×10^−3
0.05 | 5.68% | 2.9×10^−3 | 5.32% | 4.8×10^−3 | 8.06 | 1.21×10^−2 | 6.31% | 5.2×10^−3

Table 3
Prediction summary for multiplicative seasonality series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0 | 1.15 | 1.42×10^−4 | 1.18 | 1.88×10^−6 | 13.81 | 0.09 | 1.17 | 6.72×10^−4
0.001 | 3.61 | 1.51×10^−3 | 3.05 | 2.7×10^−3 | 15.69 | 0.12 | 4.27 | 3.9×10^−3
0.005 | 4.76 | 1.89×10^−3 | 7.29 | 9.7×10^−3 | 17.01 | 0.15 | 6.11 | 1.1×10^−2
0.01 | 5.6 | 6.3×10^−3 | 8.43 | 1.89×10^−2 | 19.31 | 0.18 | 9.13 | 1.72×10^−2
0.02 | 7.47 | 8.3×10^−3 | 12.46 | 3.17×10^−2 | 22.44 | 0.22 | 12.95 | 5.58×10^−2
0.03 | 8.92 | 1.88×10^−2 | 13.43 | 5.31×10^−2 | 28.75 | 0.41 | 13.67 | 7.3×10^−2
0.04 | 10.56 | 2.12×10^−2 | 14.49 | 7.2×10^−2 | 34.78 | 0.52 | 14.76 | 8.8×10^−2
0.05 | 11.85 | 3.89×10^−2 | 15.71 | 8.3×10^−2 | 35.34 | 0.64 | 15.92 | 9.8×10^−2

Table 4
Prediction summary for high-frequency series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0 | 5.52 | 9.14×10^−4 | 1.18 | 3.54×10^−5 | 1.09 | 7.01×10^−5 | 1.17 | 1.78×10^−4
0.001 | 7.55 | 1.6×10^−3 | 5.36 | 1.7×10^−3 | 5.88 | 1.79×10^−3 | 5.57 | 2.18×10^−3
0.005 | 8.91 | 5.7×10^−3 | 6.78 | 3.1×10^−3 | 6.22 | 3.2×10^−3 | 6.32 | 3.2×10^−3
0.01 | 10.98 | 4.3×10^−3 | 9.41 | 4.2×10^−3 | 8.20 | 3.9×10^−3 | 8.24 | 4.1×10^−3
0.02 | 14.72 | 7.3×10^−3 | 9.96 | 4.9×10^−3 | 8.96 | 4.2×10^−3 | 9.17 | 4.5×10^−3
0.03 | 17.31 | 8.2×10^−3 | 13.15 | 9.6×10^−3 | 9.47 | 4.9×10^−3 | 9.52 | 4.9×10^−3
0.04 | 24.43 | 1.3×10^−2 | 14.09 | 1.1×10^−2 | 9.67 | 5.3×10^−3 | 10.16 | 5.4×10^−3
0.05 | 32.27 | 3.12×10^−2 | 15.38 | 1.6×10^−2 | 11.24 | 6.5×10^−3 | 12.01 | 8.3×10^−3


For this time-series, the amplitude of the seasonal components does not change, which means that the optimal value of the parameter k_{r_v} is equal to 1. In the prediction process, only the parameter u_{r_v}, which is responsible for a changing trend, has to be determined. Fig. 2b illustrates the prediction results with different noise terms using the ADNN. In Fig. 2b, the prediction data marked by *, □ and + correspond to source data with added noise of variance 0, 0.01 and 0.05 respectively. The simulation results show that the model can predict this time-series very accurately when the source data contain no noise term, and that the forecasting accuracy decreases as the noise variance increases. The performance of the different models is described in Table 2, which shows that the ADNN performs better than the ANN model and almost the same as the AKN. As the synthetic time-series has a feature of strong orderliness, the result of the ADNN is not better than that of the AR model. Table 2 also indicates that the ADNN model has almost the same noise endurance as the AKN and ANN for this time-series.

The second time-series, which is non-linear with multiplicative seasonality, is simulated and the results are shown in Fig. 3a. This time-series has a non-linear trend and the amplitude of the seasonal oscillations increases with time. The model of the time-series is described as:

$$S(t) = \begin{cases} R(t), & t \in [0, 79],\\[2pt] \dfrac{S^2(t-s)}{S(t-2s)}, & t \in [80, 590], \end{cases}$$

where R(t) = (1/70000)(sin(t/3) + 50)(cos(9t/7) + 10) and s = 14. Prediction is done for 15% of the source time-series length; the first 85% of the observations are used for training. The parameters k and m are set as 2 and 15 respectively. The number of hidden neurons is 8.

Fig. 3b illustrates the prediction results with different noise terms using the ADNN; the predicted data are almost the same as the source data when there is no noise term. This time-series has the peculiarity that the amplitude of every following periodic segment is a fixed multiple of that of the previous segment. The performance of the different models is described in Table 3, which shows that the performance of the ADNN is better than that of the ANN but close to that of the AKN. Like the first time-series above, this series also has a feature of strong orderliness, and the AR model generates results similar to those of the ADNN. Table 3 also indicates that the ADNN model has better noise endurance than the ANN, but almost the same as the AKN, for this time-series.

The third synthetic time-series is a high-frequency series with multiplicative seasonality and smoothly increasing amplitude, as shown in Fig. 4a. To formulate this time-series, the following explicit expression is used:

$$S(t) = \frac{t}{100}\,\sin\frac{t}{2}\,\cos\frac{t}{20}, \qquad t \in [0, 550].$$

It models a high-frequency series with seasonal periodicity. The prediction is done for 1/11 of the time-series length; the first 10/11 of the source data are used for training. The parameters k and m are set as 3 and 70 respectively. The number of hidden neurons is 10. Fig. 4b shows the prediction results with different noise terms using the ADNN. The performance of the different models is described in Table 4, which indicates that the ADNN outperforms the AKN but has about the same performance as the ANN and AR models. Table 4 also indicates that the ADNN model has better noise endurance than the AKN, but almost the same as the ANN, for this time-series.

In the seasonal dependence series with linear increase and in the non-linear multiplicative seasonality series, the performance of the ADNN is better than that of the ANN but the same as that of the AKN. In the high-frequency time-series with multiplicative seasonality and smoothly increasing amplitude, the performance of the ADNN is the same as that of the ANN and AR models, and better than that of the AKN. The above simulation results show that the ADNN model benefits from the merits of the ANN and the AKN.

Fig. 2. (a) Time-series with seasonal dependence (vertical line showing the prediction start) and (b) zoom of predicted values with different noise terms using the ADNN model.

Fig. 3. (a) Time-series with multiplicative seasonality (vertical line showing the prediction start) and (b) zoom of predicted values with different noise terms using the ADNN model.



    3.2. Chaotic time-series

In this section, the proposed method is tested on three chaotic time-series. The corresponding MAPE and NMSE values of the predicted time-series are listed in Tables 5–7 respectively.

The Duffing-equation chaotic time-series consists of 2050 observations generated by the equations:

$$\frac{dy}{dt} = -y + x - x^3 + b\cos(at), \qquad \frac{dx}{dt} = y.$$

The results based on this equation are shown in Fig. 5a. For prediction, only the horizontal component of this chaotic two-component series is used. We assume that the time-series consists of the positive values x_i ≥ 0, and therefore the value x_0 is added to the source data (Fig. 5b).

Fig. 4. (a) High-frequency time-series with multiplicative seasonality (vertical line showing the prediction start) and (b) zoom of predicted values with different noise terms using the ADNN model.

Table 5
Prediction summary for Duffing chaotic time-series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0 | 7.2999 | 0.0034 | 0.6483 | 2.4143×10^−5 | 0.2806 | 2.5246×10^−6 | 0.3361 | 7.4141×10^−6
0.001 | 13.4408 | 0.011 | 7.8903 | 0.0042 | 3.6449 | 1.1432×10^−3 | 3.4630 | 1.1653×10^−3
0.005 | 15.8850 | 0.0134 | 13.4571 | 0.0211 | 9.2552 | 0.0082 | 8.8457 | 0.0068
0.01 | 19.1673 | 0.0165 | 13.9801 | 0.0223 | 11.9432 | 0.0098 | 11.2729 | 0.0094
0.02 | 19.6696 | 0.0143 | 19.2554 | 0.0282 | 15.2521 | 0.0122 | 16.7104 | 0.0163
0.03 | 23.4675 | 0.0631 | 20.6164 | 0.0397 | 16.5124 | 0.0231 | 17.3692 | 0.0267
0.04 | 25.9714 | 0.0365 | 21.9570 | 0.0376 | 19.2540 | 0.0215 | 20.6404 | 0.0214
0.05 | 27.2354 | 0.0461 | 24.3629 | 0.0433 | 21.5172 | 0.0273 | 21.7155 | 0.0273

Table 6
Prediction summary for Mackey–Glass chaotic time-series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0 | 1.0484 | 2.1431×10^−3 | 0.4941 | 7.1211×10^−5 | 0.0373 | 2.8314×10^−7 | 0.0655 | 8.1435×10^−7
0.001 | 1.5015 | 3.8461×10^−4 | 4.2862 | 0.0031 | 2.0275 | 7.7453×10^−4 | 2.1203 | 9.6342×10^−4
0.005 | 3.3865 | 0.0016 | 5.6697 | 0.0061 | 5.6017 | 0.0061 | 6.5236 | 0.0065
0.01 | 5.4567 | 0.0039 | 6.4944 | 0.0078 | 6.9095 | 0.0082 | 7.3885 | 0.0086
0.02 | 7.1349 | 0.0067 | 7.3276 | 0.0098 | 8.8958 | 0.0132 | 9.0145 | 0.0172
0.03 | 9.0888 | 0.0121 | 7.8270 | 0.0110 | 9.0929 | 0.0126 | 10.2763 | 0.0183
0.04 | 10.9259 | 0.0190 | 9.4304 | 0.0169 | 12.1027 | 0.0324 | 11.0820 | 0.0194
0.05 | 12.7687 | 0.0287 | 10.3548 | 0.0182 | 13.6413 | 0.0116 | 12.3625 | 0.0213

Table 7
Prediction summary for Ikeda map chaotic time-series.

Noise | AKN MAPE (%) | AKN NMSE | AR MAPE (%) | AR NMSE | ANN MAPE (%) | ANN NMSE | ADNN MAPE (%) | ADNN NMSE
0 | 20.5213 | 0.1231 | 11.2426 | 0.0164 | 9.3819 | 0.0142 | 9.1635 | 0.0114
0.001 | 21.5745 | 0.1348 | 13.3011 | 0.0193 | 10.3539 | 0.0194 | 9.7751 | 0.0178
0.005 | 24.7196 | 0.1333 | 17.4031 | 0.0272 | 12.8506 | 0.0183 | 12.9018 | 0.0204
0.01 | 27.3472 | 0.1408 | 19.5150 | 0.0383 | 14.5355 | 0.0253 | 14.9533 | 0.0281
0.02 | 26.6362 | 0.1512 | 22.9523 | 0.0577 | 16.7840 | 0.0316 | 16.5926 | 0.0334
0.03 | 27.0252 | 0.1331 | 24.1169 | 0.0643 | 18.6916 | 0.0298 | 18.7898 | 0.0385
0.04 | 31.5612 | 0.1612 | 29.8482 | 0.0786 | 21.4366 | 0.0399 | 21.2370 | 0.0465
0.05 | 34.2396 | 0.1827 | 32.5005 | 0.116 | 26.3222 | 0.0796 | 26.8273 | 0.0845


The first 1950 observations are used for training and the remaining observations for testing. The parameters k and m are set as 2 and 50 respectively. The number of hidden neurons is 8. The prediction results with different noise terms using the ADNN are shown in Fig. 5c. The simulation results show that the model can predict this time-series almost perfectly when the source data contain no noise term, and that the prediction gets worse as the noise variance increases. The performance of the different models is described in Table 5, which indicates that the ADNN has better performance than the AKN and AR models, and almost the same performance as the ANN. Table 5 also indicates that the ADNN model has almost the same noise endurance as the ANN for this time-series.

The Mackey–Glass benchmarks (Casdagli, 1989) are well known for evaluating prediction methods. The time-series is generated by the following non-linear delay differential equation:

$$\frac{dx(t)}{dt} = -b\,x(t) + \frac{a\,x(t-s)}{1 + x^{c}(t-s)}.$$

Different values of s generate various degrees of chaos. The behavior is chaotic for s > 16.8, and s = 17 is commonly seen in the literature. In this study, the parameters are set as a = 0.2, b = 0.1, c = 10, s = 17, as shown in Fig. 6a. According to common practice (Kulesh et al., 2008), the first 1950 values of this series are used for the learning set and the next 100 values for the testing set. The parameters k and m are set as 3 and 14 respectively. The number of hidden neurons is 6. Fig. 6c depicts the prediction results with different noise terms. The performance of the different models is described in Table 6, which indicates that the ADNN has almost the same performance as the ANN and the AKN. Table 6 also indicates that the ADNN model has almost the same noise endurance as the ANN and AKN for this time-series.
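For reference, the Mackey–Glass series can be generated with a simple Euler scheme; the step size and constant initial history below are illustrative choices, not settings taken from the paper.

```python
import numpy as np

def mackey_glass(n, a=0.2, b=0.1, c=10, s=17, dt=1.0, x0=1.2):
    """Euler integration of dx/dt = -b*x(t) + a*x(t-s) / (1 + x(t-s)**c)."""
    delay = int(s / dt)                  # delay expressed in steps
    x = [x0] * (delay + 1)               # constant history on [-s, 0]
    for _ in range(n):
        x_s = x[-delay - 1]              # x(t - s)
        x.append(x[-1] + dt * (-b * x[-1] + a * x_s / (1.0 + x_s ** c)))
    return np.array(x[delay + 1:])       # the n generated values

# e.g. series = mackey_glass(2050)
```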

The Ikeda map is another chaotic time-series experiment, which may be given in terms of a mapping of the complex plane to itself. The coordinates of the phase space are related to the complex degree of freedom z = x + iy. The mapping is (Makridakis, 1996; Murray, 1993):

$$z_{n+1} = p + B z_n \exp\!\left[i\left(a - \frac{b}{1 + |z_n|^2}\right)\right],$$

where p = 1, B = 1, a = 0.4 and b = 6. Fig. 7a is generated by a time-series of the Ikeda map, and the series length is 2048. The prediction is done for the translated vertical component y(t) + y_0 for 5% of the time-series length (Fig. 7b). The parameters k and m are set as 3 and 28 respectively. The number of hidden neurons is 5. The prediction results with different noise terms are shown in Fig. 7c. The performance of the different models is presented in Table 7, which indicates that the ADNN model gives better prediction results than the AKN and AR, but almost the same prediction performance as the ANN model, for this time-series.
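The map itself is straightforward to iterate; a sketch with the parameters quoted in the text (the seed z_0 is an arbitrary choice):

```python
import numpy as np

def ikeda(n, p=1.0, B=1.0, a=0.4, b=6.0, z0=0.1 + 0.1j):
    """Iterate z_{n+1} = p + B*z_n*exp[i(a - b/(1 + |z_n|^2))]."""
    z = np.empty(n, dtype=complex)
    z[0] = z0
    for i in range(1, n):
        theta = a - b / (1.0 + abs(z[i - 1]) ** 2)
        z[i] = p + B * z[i - 1] * np.exp(1j * theta)
    # x(t) and y(t); the experiment predicts the shifted vertical component
    return z.real, z.imag

# e.g. x, y = ikeda(2048)
```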

The simulations above show that the proposed ADNN model can make predictions on complicated chaotic time-series as well as the ANN algorithm for predicting chaotic time-series. The ADNN model reduces the problems of amplitude changing and trend determination. When detrended signals have no amplitude changes, the prediction results are similar to those of the ANN, as expected.

    3.3. Real time-series

In this section, the proposed method is tested on three real data time-series. Real data time-series inevitably contain many errors, for reasons such as observation error; therefore, in this part of the simulation, we do not add noise terms to the series.

The sunspots dataset (Fig. 8a) is natural and contains the yearly number of dark spots on the sun from 1701 to 2007. The time-series has a pseudo-period of 10–11 years. It is common practice (McDonnell and Waagen, 1994) to use the data from 1700 to 1920 as a training set and to assess the performance of the model on the set 1921–1955 (Test 1). The parameters k and m are set as 3 and 11 respectively. The number of hidden neurons is 6. The prediction results are shown in Fig. 8b.

Fig. 5. (a) Chaotic time-series based on the solution of the Duffing equation, (b) horizontal component of the time-series (vertical line showing the prediction start) and (c) zoom of predicted values with different noise terms using the ADNN model.


The traffic flow data consist of 1689 observations (11 weeks) of the hourly vehicle count for the Monash Freeway, outside Melbourne in Victoria, Australia, beginning in August 1995. A graph of the data is shown in Fig. 9a. The parameters k and m are set as 3 and 180 respectively. The number of hidden neurons is 12. The prediction results are shown in Fig. 9b.

The data in Fig. 10a are payments based on paper documents (filled in and sent to the bank; Kulesh et al., 2008). The concerned data have appreciable seasonal components with sinusoidal trends. The parameters k and m are set as 3 and 30 respectively. The number of hidden neurons is 8. The prediction results are shown in Fig. 10b.

Fig. 7. (a) Chaotic time-series based on the Ikeda map, (b) vertical component of the time-series (vertical line showing the prediction start) and (c) zoom of predicted values with different noise terms using the ADNN model.

Fig. 6. (a) Chaotic time-series based on the solution of the Mackey–Glass delay differential equation, (b) horizontal component of the time-series (vertical line showing the prediction start) and (c) zoom of predicted values with different noise terms using the ADNN model.


Table 8 summarizes the prediction performance of the different models on the three sets of real time-series data mentioned above. It is obvious that the proposed ADNN method outperforms the three other models, as it has the advantage of adapting to local variations of trends and amplitudes and of alleviating the problem of over-fitting. Moreover, neural networks have a flexible non-linear modeling capability. For these reasons, the real time-series predictions made by the proposed method fit the real data better than those of the three other methods.

Fig. 8. (a) Real sunspots data (vertical line showing the prediction start) and (b) zoom of predicted values using the ADNN model.

Fig. 9. (a) Real traffic flow data (vertical line showing the prediction start) and (b) zoom of predicted values using the ADNN model.

Fig. 10. (a) Real payment data based on paper documents (vertical line showing the prediction start) and (b) zoom of predicted values using the ADNN model.

Table 8
Prediction summary for real data time-series.

Model | Sunspot MAPE (%) | Sunspot NMSE | Traffic MAPE (%) | Traffic NMSE | Payments MAPE (%) | Payments NMSE
ADNN | 28.45 | 0.068 | 14.31 | 0.0193 | 8.08 | 0.0109
ANN | 30.8 | 0.078 | 17.97 | 0.0267 | 15.24 | 0.0274
AR | 31.2 | 0.0852 | 26.98 | 0.0818 | 9.06 | 0.0113
AKN | 50.3 | 0.1833 | 17.39 | 0.0206 | 12.5 | 0.0178



    4. Conclusions

This study presents a novel adaptive approach to extending the artificial neural network, in which adaptive metrics of inputs and a new mechanism for admixture of outputs are proposed for time-series prediction. The experimental results, generated by a set of consistent performance measures with different metrics (MAPE, NMSE), show that this new method can improve the accuracy of time-series prediction. The performance of the proposed method is validated on three sets of complex time-series, namely deterministic synthetic time-series, chaotic time-series and real time-series. In addition, the predicted results generated by the ADNN are compared with those of the ANN, AKN and AR methods and indicate that the proposed model outperforms these conventional techniques, particularly in forecasting chaotic and real time-series.

    References

Adya, M., Collopy, F., 1998. How effective are neural networks at forecasting and prediction? A review and evaluation. Journal of Forecasting 17, 481–495.
Barbounis, T.G., Teocharis, J.B., 2007. Locally recurrent neural networks for wind speed prediction using spatial correlation. Information Science 177, 5775–5797.
Bartlett, P.L., 1997. For valid generalization, the size of the weights is more important than the size of the network. In: Mozer, M.C., Jordan, M.I., Petsche, T. (Eds.), Advances in Neural Information Processing Systems, vol. 9. The MIT Press, Cambridge, MA, pp. 134–140.
Bodyanskiy, Y., Popov, S., 2006. Neural network approach to forecasting of quasiperiodic financial time series. European Journal of Operational Research 175, 1357–1366.
Brooks, C., 2002. Introductory Econometrics for Finance. Cambridge University Press, Cambridge, UK, p. 289.
Casdagli, M., 1989. Nonlinear prediction of chaotic time series. Physica D 35, 335–356.
Celik, A.E., Karatepe, Y., 2007. Evaluating and forecasting banking crises through neural network models: An application for Turkish banking sector. Expert Systems with Applications 33, 809–815.
Chen, S.M., Hwang, J.R., 2000. Temperature prediction using fuzzy time series. IEEE Transactions on Systems, Man and Cybernetics, Part B 30, 263–275.
Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems 2, 303–314.
Freitas, P.S.A., Rodrigues, A.J.L., 2006. Model combination in neural-based forecasting. European Journal of Operational Research 173, 801–814.
Funahashi, K., 1989. On the approximate realization of continuous mappings by neural networks. Neural Networks 2, 183–192.
Geman, S., Bienenstock, E., Doursat, R., 1992. Neural networks and the bias/variance dilemma. Neural Computation 4, 1–58.
Hansen, J.V., McDonald, J.B., Nelson, R.D., 2002. Time series prediction with genetic-algorithm designed neural networks: An empirical comparison with modern statistical models. Computational Intelligence 15, 171–184.
Huarng, K., Yu, T.H., 2006. Ratio-based lengths of intervals to improve fuzzy time series forecasting. IEEE Transactions on Systems, Man and Cybernetics, Part B 36, 328–340.
Kim, D., Kim, C., 1997. Forecasting time series with genetic fuzzy predictor ensemble. IEEE Transactions on Fuzzy Systems 5, 523–535.
Kulesh, M., Holschneider, M., Kurennaya, K., 2008. Adaptive metrics in the nearest neighbours method. Physica D 237, 283–291.
Liu, M.C., Kuo, W., Sastri, T., 1995. An exploratory study of a neural network approach for reliability data analysis. Quality and Reliability Engineering International 11, 107–112.
Makridakis, S., 1996. Forecasting: Its role and value for planning and strategy. International Journal of Forecasting 12, 513–537.
Makridakis, S., Wheelwright, S.C., Hyndman, R.J., 1998. Forecasting: Methods and Applications, third ed. Wiley, New York, pp. 42–50.
Masters, T., 1995. Advanced Algorithms for Neural Networks: A C++ Sourcebook. Wiley, New York.
McDonnell, J.R., Waagen, D., 1994. Evolving recurrent perceptrons for time-series modeling. IEEE Transactions on Neural Networks 5, 24–38.
Moody, J.E., 1992. The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. Neural Information Processing Systems 4, 847–854.
Murray, D.B., 1993. Forecasting a chaotic time series using an improved metric for embedding space. Physica D 68, 318–325.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press.
Sahoo, G.B., Ray, C., 2006. Flow forecasting for a Hawaii stream using rating curves and neural networks. Journal of Hydrology 317, 63–80.
Singh, P., Deo, M.C., 2007. Suitability of different neural networks in daily flow forecasting. Applied Soft Computing 7, 968–978.
Taylor, J.W., Buizza, R., 2002. Neural network load forecasting with weather ensemble predictions. IEEE Transactions on Power Systems 17, 59.
Wang, T., Chien, S., 2006. Forecasting innovation performance via neural networks: A case of Taiwanese manufacturing industry. Technovation 26, 635–643.
Wong, B.K., Vincent, S., Jolie, L., 2000. A bibliography of neural network business applications research: 1994–1998. Operations Research and Computers 27, 1045–1076.
Zhang, G.P., 2003. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50, 159–175.
Zhang, P., Qi, G.M., 2005. Neural network forecasting for seasonal and trend time series. European Journal of Operational Research 160, 501–514.
Zhang, G., Eddy, P.B., Hu, M.Y., 1998. Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting 14, 35–62.
