Time Series forecasting using ARIMA and Neural Network based Hybrid Model


  • 7/27/2019 Time Series forecasting using ARIMA and Neural Network based Hybrid Model

    1/14

    Time Series Analysis and forecasting using ARIMA

    modeling, Neural Network and Hybrid Model using ELM

    Puneet Singh, Ved Prakash Gupta

    Department of Mathematics and Scientific Computing, Indian Institute of Technology, Kanpur

    Abstract

    For the past three decades, ARIMA modeling of time series data has been the most widely used method for forecasting. Recent research has shown that using artificial neural networks (ANNs) can improve model fit significantly, giving better predictions. On some data, ARIMA models and ANNs give mixed results in terms of model superiority. In this paper, we apply a hybrid model consisting of an ARIMA model for the linear part of the time series and an ANN for the nonlinear part. We show that the results obtained by this hybrid model are better than those of either model used alone. Further, we have implemented the Extreme Learning Machine (ELM) technique as a substitute for the ANN and identified some interesting results. Mean squared error is used as the measure of model strength.


    Introduction

    Time series forecasting is one of the most important areas of forecasting, in which the future value of a variable is predicted from past values of the same variable. A model is estimated from the properties of the actual data, and the resulting model is used to extrapolate the time series into the future. This method works even when there is no significant information about the origin of the data or its nature. A great deal of research is aimed at improving time series forecasting models. One of the most important and widely used time series models is the autoregressive integrated moving average (ARIMA) model. The popularity of the ARIMA model is due to its statistical properties as well as to the well-known Box-Jenkins methodology for the model building process. In addition, various exponential smoothing models can be implemented as ARIMA models. Although ARIMA models are quite flexible in that they can represent several different types of time series, i.e., pure autoregressive (AR), pure moving average (MA), and combined AR and MA (ARMA) series, their major limitation is the pre-assumed linear form of the model. That is, a linear correlation structure is assumed among the time series values, and therefore no nonlinear patterns can be captured by an ARIMA model. The approximation of linear models to complex real-world problems is not always satisfactory.

    Recently, artificial neural networks (ANNs) have been extensively studied and used in time series forecasting.

    Zhang et al. presented a recent review in this area. Artificial neural networks (ANNs) are one of the most

    accurate and widely used forecasting models that have enjoyed fruitful applications in forecasting social,

    economic, engineering, foreign exchange, and stock market problems. Several distinguishing features of artificial neural

    networks make them valuable and attractive for a forecasting task. First, as opposed to the traditional model-

    based methods, artificial neural networks are data-driven self-adaptive methods in that there are few a priori

    assumptions about the models for problems under study. Second, artificial neural networks can generalize. After

    learning the data presented to them (a sample), ANNs can often correctly infer the unseen part of a population

    even if the sample data contain noisy information. Third, ANNs are universal functional approximators. It has

    been shown that a network can approximate any continuous function to any desired accuracy.

    Finally, artificial neural networks are nonlinear. The traditional approaches to time series prediction, such as the

    Box-Jenkins or ARIMA approach, assume that the time series under study are generated from linear processes. However,

    they may be inappropriate if the underlying mechanism is nonlinear. In fact, real world systems are often

    nonlinear [Zhang et al.].

    In this paper, autoregressive integrated moving average models are applied to construct a new hybrid model in order to yield a more accurate model than artificial neural networks alone. In our proposed model, the future value of a time series is considered a nonlinear function of several past observations and random errors. Therefore, in the first phase, an autoregressive integrated moving average model is used to generate the necessary data from the time series under study. Then, in the second phase, a neural network is used to model the data generated by the ARIMA model and to predict the future value of the time series. Two well-known data sets, the Wolf's sunspot data and the Canadian lynx data, are used in this paper to show the appropriateness and effectiveness of the proposed model for time series forecasting.


    Time Series Forecasting Methods

    For more than half a century, auto-regressive integrated moving average (ARIMA) models have dominated many

    areas of time series forecasting. In an ARIMA(p, d, q) model, the future value of a variable is assumed to be a linear function of several past observations and random errors. That is, the underlying process that generates the time series has the form

        φ(B)(1 − B)^d y_t = θ(B)ε_t,    (1)

    where y_t and ε_t are the actual value and the random error at time period t, respectively, and

        φ(B) = 1 − φ_1 B − φ_2 B^2 − … − φ_p B^p,
        θ(B) = 1 + θ_1 B + θ_2 B^2 + … + θ_q B^q    (2)

    are polynomials in B of degree p and q. Here φ_i (i = 1, 2, …, p) and θ_j (j = 1, 2, …, q) are model parameters, B is the backward shift operator (B y_t = y_{t−1}), p and q are integers often referred to as the orders of the model, and d is an integer often referred to as the order of differencing. The random errors ε_t are assumed to be independently and identically distributed with mean zero and constant variance σ². The Box and Jenkins (1976) methodology includes three iterative steps: model identification, parameter estimation, and diagnostic checking. The basic idea of model identification is that if a time series is generated

    from an ARIMA process, it should have some theoretical autocorrelation properties. By matching the empirical

    autocorrelation patterns with the theoretical ones, it is often possible to identify one or several potential models

    for the given time series. Box and Jenkins (1976) proposed to use the autocorrelation function (ACF) and the

    partial autocorrelation function (PACF) of the sample data as the basic tools to identify the order of the ARIMA

    model. Some other order selection methods have been proposed based on validity criteria and information-theoretic approaches, such as Akaike's information criterion (AIC) (Shibata, 1976) and the minimum description length (MDL) (Hurvich & Tsai, 1989; Jones, 1975; Ljung, 1987). In addition, in recent years different approaches based on intelligent paradigms, such as neural networks (Hwang, 2001), genetic algorithms (Minerva & Poli, 2001; Ong, Huang, & Tzeng, 2005), or fuzzy systems (Haseyama & Kitajima, 2001), have been proposed to improve the accuracy of order selection for ARIMA models.
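To make the identification step concrete, the sample autocorrelation function can be computed directly from the data. Below is a minimal numpy sketch; the `sample_acf` helper and the simulated MA(1) series are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelation r_k for lags k = 0..nlags."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    c0 = np.dot(y, y) / len(y)  # lag-0 autocovariance (sample variance)
    return np.array([np.dot(y[:len(y) - k], y[k:]) / len(y) / c0
                     for k in range(nlags + 1)])

# An MA(1) series should show one significant ACF spike at lag 1,
# which is exactly the pattern the identification step looks for.
rng = np.random.default_rng(0)
e = rng.normal(size=2000)
y = e[1:] + 0.8 * e[:-1]        # y_t = e_t + 0.8 e_{t-1}
r = sample_acf(y, 5)
```

For this series the estimated r_1 is close to the theoretical value 0.8 / (1 + 0.8²) ≈ 0.49, while lags 2 and beyond stay near zero.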

    In the identification step, data transformation is often required to make the time series stationary. Stationarity is

    a necessary condition in building an ARIMA model used for forecasting. A stationary time series is characterized

    by statistical characteristics such as the mean and the autocorrelation structure being constant over time. When

    the observed time series presents trend and heteroscedasticity, differencing and power transformation are

    applied to the data to remove the trend and to stabilize the variance before an ARIMA model can be fitted. Once

    a tentative model is identified, estimation of the model parameters is straightforward. The parameters are

    estimated such that an overall measure of errors is minimized. This can be accomplished using a nonlinear

    optimization procedure. The last step in model building is the diagnostic checking of model adequacy. This is

    basically to check whether the model assumptions about the errors are satisfied. Several diagnostic statistics and plots of the residuals can be used to examine the goodness of fit of the

    tentatively entertained model to the historical data. If the model is not adequate, a new tentative model should

    be identified, which will again be followed by the steps of parameter estimation and model verification.


    Diagnostic information may help suggest alternative model(s). This three-step model building process is typically

    repeated several times until a satisfactory model is finally selected. The final selected model can then be used

    for prediction purposes.
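As a toy illustration of the estimation and forecasting steps, the pure AR(p) special case of the model can be fitted by ordinary least squares. This is a minimal numpy sketch; `fit_ar` and the simulated AR(1) series are illustrative assumptions, not the subset models used later in the paper:

```python
import numpy as np

def fit_ar(y, p):
    """Fit y_t = c + phi_1 y_{t-1} + ... + phi_p y_{t-p} by least squares."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - k:n - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef                       # [c, phi_1, ..., phi_p]

def forecast_one_step(y, coef):
    """One-step-ahead forecast from the fitted AR coefficients."""
    p = len(coef) - 1
    return coef[0] + np.dot(coef[1:], y[-1:-p - 1:-1])

# Simulate an AR(1) series with phi = 0.6 and recover the coefficient.
rng = np.random.default_rng(1)
y = np.zeros(1000)
for t in range(1, 1000):
    y[t] = 0.6 * y[t - 1] + rng.normal()
coef = fit_ar(y, 1)
fc = forecast_one_step(y, coef)
```

With 1000 observations, the estimated phi_1 lands close to the true value 0.6, and `forecast_one_step` then extrapolates the series one period ahead.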

    The ANN approach to time series modeling

    Recently, computational intelligence systems, and among them artificial neural networks (ANNs), which are in fact model-free dynamic systems, have been used widely for function approximation and forecasting. One of the most

    significant advantages of the ANN models over other classes of nonlinear models is that ANNs are universal

    approximators that can approximate a large class of functions with a high degree of accuracy (Chen, Leung, &

    Hazem, 2003; Zhang & Min Qi, 2005). Their power comes from the parallel processing of the information from

    the data. No prior assumption of the model form is required in the model building process. Instead, the network

    model is largely determined by the characteristics of the data. The single-hidden-layer feedforward network is the most widely used model form for time series modeling and forecasting (Zhang et al., 1998). The model is

    characterized by a network of three layers of simple processing units connected by acyclic links (Fig. 1). The

    relationship between the output y_t and the inputs (y_{t−1}, y_{t−2}, …, y_{t−p}) has the following mathematical representation:

        y_t = w_0 + Σ_{j=1}^{q} w_j · g(w_{0j} + Σ_{i=1}^{p} w_{ij} y_{t−i}) + ε_t,    (3)

    where w_{ij} (i = 0, 1, 2, …, p; j = 1, …, q) and w_j (j = 0, 1, 2, …, q) are model parameters, often called connection weights; p is the number of input nodes; and q is the number of hidden nodes.

    Activation functions can take several forms. The type of activation function is determined by the position of the neuron within the network. In most cases, input layer neurons do not have an activation function, as their role is simply to transfer the inputs to the hidden layer. The most widely used activation function for the output layer is the linear function, since a nonlinear activation function may introduce distortion into the predicted output. The logistic and hyperbolic tangent functions, shown in Eqs. (4) and (5) respectively, are often used as hidden layer transfer functions. Other activation functions, such as linear and quadratic, can also be used, each with a variety of modeling applications.

        Sig(x) = 1 / (1 + e^{−x})    (4)

        Tanh(x) = (e^{x} − e^{−x}) / (e^{x} + e^{−x})    (5)

    Hence, the ANN model of Eq. (3), in fact, performs a nonlinear functional mapping from the past observations to the
    future value y_t, i.e.,


        y_t = f(y_{t−1}, y_{t−2}, …, y_{t−p}, w) + ε_t,    (6)

    where w is a vector of all parameters and f(·) is a function determined by the network structure and connection weights. Thus, the neural network is equivalent to a nonlinear autoregressive model. The simple network given by Eq. (3) is surprisingly powerful in that it is able to approximate an arbitrary function, provided the number of hidden nodes q is sufficiently large. In practice, a simple network structure with a small number of hidden nodes often works well in out-of-sample forecasting. This may be due to the overfitting effect typically found in the neural network modeling process. An overfitted model fits the sample used for model building well but generalizes poorly to data outside the sample (Demuth & Beale, 2004).
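The forward pass of this three-layer network can be sketched in a few lines. The weights below are random placeholders purely for illustration; in practice they would be learned from the data:

```python
import numpy as np

def logistic(x):
    """Logistic activation: Sig(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def ann_one_step(lags, W, b, v, v0):
    """Single-hidden-layer forward pass:
    y_hat = v0 + sum_j v_j * g(b_j + sum_i W[j, i] * y_{t-i}).

    lags  -- array [y_{t-1}, ..., y_{t-p}] of the p most recent values
    W, b  -- input-to-hidden weights (q x p) and hidden biases (q,)
    v, v0 -- hidden-to-output weights (q,) and output bias
    """
    hidden = logistic(W @ lags + b)   # q hidden-node activations
    return v0 + v @ hidden            # linear output node

rng = np.random.default_rng(2)
p, q = 3, 5                           # 3 input lags, 5 hidden nodes
W, b = rng.normal(size=(q, p)), rng.normal(size=q)
v, v0 = rng.normal(size=q), 0.1
y_hat = ann_one_step(np.array([1.2, 0.7, -0.3]), W, b, v, v0)
```

The hidden layer squashes each weighted lag combination into (0, 1), and the linear output node combines those activations into the one-step forecast.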

    There are some similarities between ARIMA and ANN models. Both of them include a rich class of different

    models with different model orders. Data transformation is often necessary to obtain the best results. A relatively large

    sample is required in order to build a successful model. The iterative experimental nature is common to their

    modeling processes and the subjective judgment is sometimes needed in implementing the model. Because of

    the potential overfitting effect with both models, parsimony is often a guiding principle in choosing an

    appropriate model for forecasting.

    Overview of Extreme Learning Machine

    Gradient-descent-based algorithms require all the weights to be updated after every iteration, so they are generally slow and may easily converge to local minima. In contrast, ELM, proposed by Huang et al. in 2004, randomly assigns the weights connecting the input and hidden layers, together with the hidden biases, and then analytically determines the output weights using the Moore-Penrose generalized inverse. It has been proved that, given randomly assigned input weights and hidden biases with almost any non-zero activation function, such a network can approximate any continuous function on compact sets. Unlike the traditional algorithms, ELM not only achieves the minimum error but also assigns the smallest norm to the output weights. The reason for using the Moore-Penrose inverse is that, according to Bartlett's theory, a smaller norm of the weights results in better generalization of the feedforward neural network. The advantages of ELM over traditional algorithms are as follows:

    - ELM can be up to 1000 times faster than traditional algorithms.
    - ELM has better generalization performance, as it not only reaches the smallest error but also assigns the smallest norm of weights.
    - Non-differentiable functions can also be used to train SLFNs with the ELM learning algorithm.

    Given a training set N = {(x_j, t_j) | x_j ∈ R^n, t_j ∈ R^m, j = 1, …, N}, an activation function g(x), and a number of hidden nodes Ñ, the mathematical model of the network can be written as:

        Σ_{i=1}^{Ñ} β_i g(w_i · x_j + b_i) = o_j,    j = 1, …, N,

    where w_i = [w_{i1}, w_{i2}, …, w_{in}]^T is the weight vector connecting the ith hidden neuron and the input neurons, and β_i = [β_{i1}, β_{i2}, …, β_{im}]^T is the weight vector connecting the ith hidden neuron and the output neurons. t_j denotes the target vector of the input x_j, whereas o_j denotes the output vector obtained from the neural network; w_i · x_j denotes the inner product of w_i and x_j. The output neurons are chosen to be linear.


    Standard SLFNs with Ñ hidden neurons and activation function g(x) can approximate these N samples with zero error, meaning that Σ_{j=1}^{N} ||o_j − t_j|| = 0, i.e., there exist β_i, w_i, and b_i such that

        Σ_{i=1}^{Ñ} β_i g(w_i · x_j + b_i) = t_j,    j = 1, …, N.

    The above N equations can be written compactly as:

        Hβ = T,

    where H is the N × Ñ hidden layer output matrix whose (j, i)th entry is g(w_i · x_j + b_i), β = [β_1^T, …, β_Ñ^T]^T, and T = [t_1^T, …, t_N^T]^T. The smallest-norm least-squares solution of this linear system is:

        β̂ = H† T,

    where H† is the Moore-Penrose generalized inverse of H. The ELM algorithm, with the architecture shown in Fig. 1, can be summarized as follows. Given the training sample N, activation function g(x), and number of hidden neurons Ñ:

    1. Assign random input weights w_i and biases b_i, i = 1, …, Ñ.
    2. Calculate the hidden layer output matrix H.
    3. Calculate the output weights β̂ = H† T.
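The three steps above translate almost directly into code. This is a minimal numpy sketch; the sine-fitting task and the network size are illustrative choices, not from the paper:

```python
import numpy as np

def elm_fit(X, T, n_hidden, rng):
    """ELM training: random hidden layer, output weights by pseudoinverse."""
    W = rng.normal(size=(n_hidden, X.shape[1]))  # 1. random input weights w_i
    b = rng.normal(size=n_hidden)                #    and hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))     # 2. hidden output matrix H
    beta = np.linalg.pinv(H) @ T                 # 3. beta = H^+ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return H @ beta

# Toy 1-D regression task: 200 points of sin(3x) on [-1, 1].
rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
T = np.sin(3.0 * X[:, 0])
W, b, beta = elm_fit(X, T, 30, rng)
train_mse = np.mean((elm_predict(X, W, b, beta) - T) ** 2)
```

Only `beta` is learned; the hidden layer is never touched after its random initialization, which is what makes the scheme fast compared with gradient-based training.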

    The Hybrid Methodology

    Both ARIMA and ANN models have achieved successes in their own linear or nonlinear domains. However, none

    of them is a universal model that is suitable for all circumstances. The approximation of ARIMA models to

    complex nonlinear problems may not be adequate. On the other hand, using ANNs to model linear problems

    has yielded mixed results. For example, using simulated data, Denton showed that when there are outliers or

    multicollinearity in the data, neural networks can significantly outperform linear regression models. Markham

    and Rakes also found that the performance of ANNs for linear regression problems depends on the sample size

    and noise level. Hence, it is not wise to apply ANNs blindly to any type of data. Since it is difficult to completely know the characteristics of the data in a real problem, a hybrid methodology that has both linear and nonlinear modeling capabilities can be a good strategy for practical use. By combining different models, different aspects of the underlying patterns may be captured.

    It may be reasonable to consider a time series to be composed of a linear autocorrelation structure and a

    nonlinear component. That is,

        y_t = L_t + N_t,


    where L_t denotes the linear component and N_t denotes the nonlinear component. These two components have to be estimated from the data. First, we let an ARIMA model capture the linear component; the residuals from the linear model will then contain only the nonlinear relationship. Let e_t denote the residual at time t from the linear model; then

        e_t = y_t − L̂_t,

    where L̂_t is the forecast value for time t from the estimated relationship (1). Residuals are important in

    diagnosis of the sufficiency of linear models. A linear model is not sufficient if there are still linear correlation

    structures left in the residuals. However, residual analysis is not able to detect any nonlinear patterns in the

    data. In fact, there are currently no general diagnostic statistics for nonlinear autocorrelation relationships.

    Therefore, even if a model has passed diagnostic checking, the model may still not be adequate in that nonlinear

    relationships have not been appropriately modeled. Any significant nonlinear pattern in the residuals will

    indicate the limitation of the ARIMA. By modeling residuals using ANNs, nonlinear relationships can be

    discovered. With n input nodes, the ANN model for the residuals will be

        e_t = f(e_{t−1}, e_{t−2}, …, e_{t−n}) + ε_t,    (7)

    where f is a nonlinear function determined by the neural network and ε_t is the random error. Note that if the model f is not an appropriate one, the error term is not necessarily random; therefore, correct model identification is critical. Denoting the forecast from Eq. (7) as N̂_t, the combined forecast will be

        ŷ_t = L̂_t + N̂_t.

    In summary, the proposed methodology of the hybrid system consists of two steps. In the first step, an ARIMA

    model is used to analyze the linear part of the problem. In the second step, a neural network model is

    developed to model the residuals from the ARIMA model. Since the ARIMA model cannot capture the nonlinear

    structure of the data, the residuals of the linear model will contain information about the nonlinearity. The results from the neural network can be used as predictions of the error terms for the ARIMA model. The hybrid model

    exploits the unique features and strengths of the ARIMA model as well as the ANN model in determining different

    patterns. Thus, it could be advantageous to model linear and nonlinear patterns separately by using different

    models and then combine the forecasts to improve the overall modeling and forecasting performance.
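An end-to-end sketch of the two-step hybrid on a simulated series follows. The AR(1) linear stage, the ELM residual stage, and all sizes here are illustrative assumptions; the paper itself applies subset ARIMA models to real data:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(4)

# Simulated series: linear AR(1) dynamics plus a nonlinear term
# that a purely linear model cannot capture.
n = 600
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.4 * np.sin(y[t - 1]) ** 2 + 0.1 * rng.normal()

# Step 1: linear part L_t -- an AR(1) model fitted by least squares.
X = np.column_stack([np.ones(n - 1), y[:-1]])
c, phi = np.linalg.lstsq(X, y[1:], rcond=None)[0]
linear_fit = c + phi * y[:-1]          # L_hat_t for t = 1..n-1
resid = y[1:] - linear_fit             # e_t = y_t - L_hat_t

# Step 2: nonlinear part N_t -- an ELM mapping lagged residuals to e_t.
p = 2                                  # residual lags used as inputs
E = np.column_stack([resid[p - k:len(resid) - k] for k in range(1, p + 1)])
target = resid[p:]
W = rng.normal(size=(20, p))           # random input weights
b = rng.normal(size=20)                # random hidden biases
H = sigmoid(E @ W.T + b)
beta = np.linalg.pinv(H) @ target      # output weights via pseudoinverse
resid_fit = H @ beta                   # N_hat_t

# Combined in-sample forecast: y_hat_t = L_hat_t + N_hat_t.
combined = linear_fit[p:] + resid_fit
mse_linear = np.mean((y[1 + p:] - linear_fit[p:]) ** 2)
mse_hybrid = np.mean((y[1 + p:] - combined) ** 2)
```

On this toy series the hybrid's in-sample MSE is below that of the linear model alone, since the residual stage recovers part of the structure the AR(1) fit leaves behind.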

    As previously mentioned, in building ARIMA as well as ANN models, subjective judgment of the model order as

    well as the model adequacy is often needed. It is possible that suboptimal models may be used in the hybrid

    method. For example, the current practice of the Box-Jenkins methodology focuses on the low-order

    autocorrelation. A model is considered adequate if low order autocorrelations are not significant even though

    significant autocorrelations of higher order still exist. This suboptimality may not affect the usefulness of the

    hybrid model. Granger has pointed out that for a hybrid model to produce superior forecasts, the component

    model should be suboptimal. In general, it has been observed that it is more effective to combine individual

    forecasts that are based on different information sets.


    Empirical Results

    1) Data Sets

    Two well-known data sets, the Wolf's sunspot data and the Canadian lynx data, are used in this study to

    demonstrate the effectiveness of the hybrid method. These time series come from different areas and have

    different statistical characteristics. They have been widely studied in the statistical as well as the neural network

    literature. Both linear and nonlinear models have been applied to these data sets, and varying degrees of nonlinearity have been found in these series.


    The sunspot data we consider contain the annual number of sunspots from 1700 to 2012, giving a total of 313 observations. The study of sunspot activity has practical importance to geophysicists, environmental scientists, and climatologists. The series is regarded as nonlinear and non-Gaussian and is often used to evaluate the effectiveness of nonlinear models. The plot of this time series (see Fig. 1) also suggests a cyclical pattern with a mean cycle of about 11 years. The sunspot data have been extensively studied with a vast variety of linear and nonlinear time series models, including ARIMA and ANNs.

    The lynx series contains the number of lynx trapped per year in the Mackenzie River district of Northern Canada. The data show a periodicity of approximately 10 years. The data set has 114 observations,


    corresponding to the period 1821-1934. It has also been extensively analyzed in the time series literature, with a focus on nonlinear modeling. Following other studies, the logarithms (to the base 10) of the data are used in the analysis.

    To assess the forecasting performance of different models, each data set is divided into two samples of training

    and testing. The training data set is used exclusively for model development and then the test sample is used to

    evaluate the established model. The data compositions for the two data sets are given in Table 1.

    2) Results

    Only one-step-ahead forecasting is considered. The mean squared error (MSE) is selected as the forecasting accuracy measure.
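Computed over the held-out window, the MSE is simply the average of the squared one-step forecast errors. A minimal sketch with made-up illustrative numbers:

```python
import numpy as np

def mse(actual, forecast):
    """Mean squared error over the out-of-sample window."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean((actual - forecast) ** 2))

actual   = [101.0, 98.5, 103.2, 99.9]   # illustrative held-out values
forecast = [100.0, 99.0, 102.0, 101.0]  # illustrative one-step forecasts
err = mse(actual, forecast)             # (1.0^2 + 0.5^2 + 1.2^2 + 1.1^2) / 4
```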

    Table 1 gives the forecasting results for the sunspot data. A subset autoregressive model of order 9 was found to be the most parsimonious among all ARIMA models that were also judged adequate by residual analysis. The neural network and the Extreme Learning Machine used have a 2-3-1 architecture, chosen on the basis of experimental results obtained by comparing mean squared errors. A forecast horizon of 67 periods is used. The results show that applying neural networks alone could not improve the forecasting accuracy over the ARIMA model in the 67-period horizon, whereas the hybrid model shows that by combining the two models, the overall forecasting errors can be significantly reduced. The comparison between the actual values and the forecast values for the 67 out-of-sample points is given in the figure below. Although at some data points the hybrid model gives worse predictions than either the ARIMA or the ELM forecasts, its overall forecasting capability is improved.


    Table 1: Sunspot data, 67 points ahead

    Model                 Mean Squared Error
    ARIMA                 393.73
    ANN                   482.02
    ELM                   487.64
    Hybrid (ARIMA+ELM)    363.57
    Hybrid (ARIMA+ANN)    361.69

    In a similar fashion, we have fitted the Canadian lynx data with a subset AR model of order 12, a parsimonious model also used by Subba Rao and Gabr and others. The overall forecasting results for the last 14 years are summarized in Table 2. A neural network / Extreme Learning Machine structure of 2-2-1 is not able to improve on the results of the ARIMA model. Applying the hybrid method, we find a significant decrease in the MSE on these data. The figure below gives the actual vs. forecast values for the individual Extreme Learning Machine and ARIMA models as well as for the combined model.


    Table 2: Lynx data, 14 points ahead

    Model                 Mean Squared Error
    ARIMA                 0.128
    ANN                   0.148
    ELM                   0.154
    Hybrid (ARIMA+ELM)    0.118
    Hybrid (ARIMA+ANN)    0.123

    Conclusion

    Applying quantitative methods for forecasting and for assisting investment decision making has become more indispensable than ever in business practice. Time series forecasting is one of the most important quantitative modeling problems and has received a considerable amount of attention in the literature. Artificial neural networks (ANNs) have been shown to be an effective, general-purpose approach for pattern recognition, classification, clustering, and especially time series prediction with a high degree of accuracy. Nevertheless, their performance is not always satisfactory. Theoretical as well as empirical evidence in the literature suggests that by using dissimilar models, or models that disagree with each other strongly, the hybrid model will have lower generalization variance or error. Additionally, because of possibly unstable or changing patterns in the data, the hybrid method can reduce model uncertainty, which typically occurs in statistical inference and time series forecasting.

    In this paper, autoregressive integrated moving average models and ELM models are combined to propose a new hybrid method for improving the time series forecasting performance of artificial neural networks. In our proposed model, based on the Box-Jenkins methodology for linear modeling, a time series is considered a nonlinear function of several past observations and random errors. Therefore, in the first stage, an autoregressive integrated moving average model is used to generate the necessary data, and then an Extreme Learning Machine is used to build a model that captures the underlying data-generating process and predicts the future using the preprocessed data. Empirical results on two well-known real data sets indicate that the proposed model can be an effective way to obtain a more accurate model than traditional artificial neural networks. Thus, it can be used as an appropriate alternative to artificial neural networks.


    References

    [1] J.M. Bates, C.W.J. Granger, The combination of forecasts, Oper. Res. Q., 20 (1969), pp. 451-468.
    [2] G.E.P. Box, G. Jenkins, Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco, CA (1970).
    [3] M.J. Campbell, A.M. Walker, A survey of statistical work on the MacKenzie River series of annual Canadian lynx trappings for the years 1821-1934, and a new analysis, J. R. Statist. Soc. Ser. A, 140 (1977), pp. 411-431.
    [4] F.C. Palm, A. Zellner, To combine or not to combine? Issues of combining forecasts, J. Forecasting, 11 (1992), pp. 687-701.
    [5] Z. Tang, C. Almeida, P.A. Fishwick, Time series forecasting using neural networks vs Box-Jenkins methodology, Simulation, 57 (1991), pp. 303-310.
    [6] G.B. Huang, Q. Zhu, C. Siew, Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks, International Joint Conference on Neural Networks (2004), Vol. 2, pp. 985-990.
    [7] G.B. Huang, Q. Zhu, C. Siew, Extreme Learning Machine: Theory and Applications, Neurocomputing (2006), Vol. 70, pp. 489-501.
    [8] G.B. Huang, H. Zhou, X. Ding, R. Zhang, Extreme Learning Machine for Regression and Multiclass Classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics (2012), Vol. 42, No. 2.