Time Series forecasting using ARIMA and Neural Network based Hybrid Model


  • 7/27/2019 Time Series forecasting using ARIMA and Neural Network based Hybrid Model

    1/14

    Time Series Analysis and forecasting using ARIMA

    modeling, Neural Network and Hybrid Model using ELM

    Puneet Singh, Ved Prakash Gupta

    Department of Mathematics and Scientific Computing, Indian Institute of Technology, Kanpur

    Abstract

    For the past three decades, ARIMA modeling of time series data has been the most widely used method for forecasting. Recent research has shown that using artificial neural networks (ANNs) can improve model fit significantly, giving better predictions. On some data, ARIMA models and ANNs give mixed results in terms of model superiority. In this paper, we apply a hybrid model consisting of an ARIMA model for the linear part of the time series and an ANN for the nonlinear part. We show that the results obtained by this hybrid model are better than those of either model used alone. Further, we have implemented the Extreme Learning Machine (ELM) technique as a substitute for the ANN and identified some interesting results. Mean squared error is used as the measure of model strength.


    Introduction

    Time series forecasting is one of the most important areas of forecasting, in which the future value of a variable is predicted from past values of the same variable. A model is estimated from the properties of the actual data, and the resulting model is used to extrapolate the time series into the future. This method works even when there is no significant information about the origin of the data or its nature. A great deal of research is aimed at improving time series forecasting models. One of the most important and widely used time series models is the autoregressive integrated moving average (ARIMA) model. The popularity of the ARIMA model is due to its statistical properties as well as to the well-known Box-Jenkins methodology for the model building process. In addition, various exponential smoothing models can be implemented as ARIMA models. Although ARIMA models are quite flexible in that they can represent several different types of time series, i.e., pure autoregressive (AR), pure moving average (MA), and combined AR and MA (ARMA) series, their major limitation is the pre-assumed linear form of the model. That is, a linear correlation structure is assumed among the time series values, and therefore no nonlinear patterns can be captured by an ARIMA model. The approximation of linear models to complex real-world problems is not always satisfactory.

    Recently, artificial neural networks (ANNs) have been extensively studied and used in time series forecasting.

    Zhang et al. presented a recent review in this area. Artificial neural networks (ANNs) are one of the most

    accurate and widely used forecasting models that have enjoyed fruitful applications in forecasting social,

    economic, engineering, foreign exchange, and stock market problems. Several distinguishing features of artificial neural

    networks make them valuable and attractive for a forecasting task. First, as opposed to the traditional model-

    based methods, artificial neural networks are data-driven self-adaptive methods in that there are few a priori

    assumptions about the models for problems under study. Second, artificial neural networks can generalize. After

    learning the data presented to them (a sample), ANNs can often correctly infer the unseen part of a population

    even if the sample data contain noisy information. Third, ANNs are universal functional approximators. It has

    been shown that a network can approximate any continuous function to any desired accuracy.

    Finally, artificial neural networks are nonlinear. The traditional approaches to time series prediction, such as the

    Box-Jenkins or ARIMA approach, assume that the time series under study are generated from linear processes. However,

    they may be inappropriate if the underlying mechanism is nonlinear. In fact, real world systems are often

    nonlinear [Zhang et al.].

    In this paper, autoregressive integrated moving average models are applied to construct a new hybrid model in order to yield a more accurate model than artificial neural networks alone. In our proposed model, the future value of a time series is considered a nonlinear function of several past observations and random errors. Therefore, in the first phase, an autoregressive integrated moving average model is used to generate the necessary data from the time series under study. Then, in the second phase, a neural network is used to model the data generated by the ARIMA model and to predict the future value of the time series. Two well-known data sets, the Wolf's sunspot data and the Canadian lynx data, are used in this paper to show the appropriateness and effectiveness of the proposed model for time series forecasting.


    Time Series Forecasting Methods

    For more than half a century, auto-regressive integrated moving average (ARIMA) models have dominated many

    areas of time series forecasting. In an ARIMA(p, d, q) model, the future value of a variable is assumed to be a linear function of several past observations and random errors. That is, the underlying process that generates the time series has the form

        φ(B)(1 − B)^d y_t = θ(B)ε_t,    (1)

    where y_t and ε_t are the actual value and the random error at time period t, respectively, and

        φ(B) = 1 − φ_1 B − φ_2 B^2 − … − φ_p B^p,
        θ(B) = 1 + θ_1 B + θ_2 B^2 + … + θ_q B^q    (2)

    are polynomials in B of degree p and q. Here φ_i (i = 1, 2, …, p) and θ_j (j = 1, 2, …, q) are model parameters, B is the backward shift operator (B y_t = y_{t−1}), p and q are integers often referred to as the orders of the model, and d is an integer often referred to as the order of differencing. The random errors ε_t are assumed to be independently and identically distributed with mean zero and constant variance σ². The Box and Jenkins (1976) methodology includes three iterative steps: model identification, parameter estimation, and diagnostic checking. The basic idea of model identification is that if a time series is generated

    from an ARIMA process, it should have some theoretical autocorrelation properties. By matching the empirical

    autocorrelation patterns with the theoretical ones, it is often possible to identify one or several potential models

    for the given time series. Box and Jenkins (1976) proposed to use the autocorrelation function (ACF) and the

    partial autocorrelation function (PACF) of the sample data as the basic tools to identify the order of the ARIMA

    model. Some other order selection methods have been proposed based on validity criteria and information-theoretic approaches, such as Akaike's information criterion (AIC) (Shibata, 1976) and the minimum description length (MDL) (Hurvich & Tsai, 1989; Jones, 1975; Ljung, 1987). In addition, in recent years different approaches based on intelligent paradigms, such as neural networks (Hwang, 2001), genetic algorithms (Minerva & Poli, 2001; Ong, Huang, & Tzeng, 2005), or fuzzy systems (Haseyama & Kitajima, 2001), have been proposed to improve the accuracy of order selection for ARIMA models.
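To make the identification step concrete, the sample autocorrelation function can be computed directly from the data. Below is a minimal numpy sketch; the `sample_acf` helper and the simulated MA(1) series are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelation r_k for lags k = 0..nlags."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    c0 = np.dot(y, y) / len(y)  # lag-0 autocovariance (sample variance)
    return np.array([np.dot(y[:len(y) - k], y[k:]) / len(y) / c0
                     for k in range(nlags + 1)])

# An MA(1) series should show one significant ACF spike at lag 1,
# which is exactly the pattern the identification step looks for.
rng = np.random.default_rng(0)
e = rng.normal(size=2000)
y = e[1:] + 0.8 * e[:-1]        # y_t = e_t + 0.8 e_{t-1}
r = sample_acf(y, 5)
```

For this series the estimated r_1 is close to the theoretical value 0.8 / (1 + 0.8²) ≈ 0.49, while lags 2 and beyond stay near zero.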

    In the identification step, data transformation is often required to make the time series stationary. Stationarity is

    a necessary condition in building an ARIMA model used for forecasting. A stationary time series is characterized

    by statistical characteristics such as the mean and the autocorrelation structure being constant over time. When

    the observed time series presents trend and heteroscedasticity, differencing and power transformation are

    applied to the data to remove the trend and to stabilize the variance before an ARIMA model can be fitted. Once

    a tentative model is identified, estimation of the model parameters is straightforward. The parameters are

    estimated such that an overall measure of errors is minimized. This can be accomplished using a nonlinear

    optimization procedure. The last step in model building is the diagnostic checking of model adequacy. This is

    basically to check whether the model assumptions about the errors are satisfied. Several diagnostic statistics and plots of the residuals can be used to examine the goodness of fit of the

    tentatively entertained model to the historical data. If the model is not adequate, a new tentative model should

    be identified, which will again be followed by the steps of parameter estimation and model verification.


    Diagnostic information may help suggest alternative model(s). This three-step model building process is typically

    repeated several times until a satisfactory model is finally selected. The final selected model can then be used

    for prediction purposes.
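As a toy illustration of the estimation and forecasting steps, the pure AR(p) special case of the model can be fitted by ordinary least squares. This is a minimal numpy sketch; `fit_ar` and the simulated AR(1) series are illustrative assumptions, not the subset models used later in the paper:

```python
import numpy as np

def fit_ar(y, p):
    """Fit y_t = c + phi_1 y_{t-1} + ... + phi_p y_{t-p} by least squares."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - k:n - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef                       # [c, phi_1, ..., phi_p]

def forecast_one_step(y, coef):
    """One-step-ahead forecast from the fitted AR coefficients."""
    p = len(coef) - 1
    return coef[0] + np.dot(coef[1:], y[-1:-p - 1:-1])

# Simulate an AR(1) series with phi = 0.6 and recover the coefficient.
rng = np.random.default_rng(1)
y = np.zeros(1000)
for t in range(1, 1000):
    y[t] = 0.6 * y[t - 1] + rng.normal()
coef = fit_ar(y, 1)
fc = forecast_one_step(y, coef)
```

With 1000 observations, the estimated phi_1 lands close to the true value 0.6, and `forecast_one_step` then extrapolates the series one period ahead.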

    The ANN approach to time series modeling

    Recently, computational intelligence systems, and among them artificial neural networks (ANNs), which are in fact model-free dynamic systems, have been used widely for function approximation and forecasting. One of the most

    significant advantages of the ANN models over other classes of nonlinear models is that ANNs are universal

    approximators that can approximate a large class of functions with a high degree of accuracy (Chen, Leung, &

    Hazem, 2003; Zhang & Min Qi, 2005). Their power comes from the parallel processing of the information from

    the data. No prior assumption of the model form is required in the model building process. Instead, the network

    model is largely determined by the characteristics of the data. The single-hidden-layer feedforward network is the most widely used model form for time series modeling and forecasting (Zhang et al., 1998). The model is

    characterized by a network of three layers of simple processing units connected by acyclic links (Fig. 1). The

    relationship between the output y_t and the inputs (y_{t−1}, y_{t−2}, …, y_{t−p}) has the following mathematical representation:

        y_t = w_0 + Σ_{j=1}^{q} w_j · g(w_{0j} + Σ_{i=1}^{p} w_{ij} y_{t−i}) + ε_t,    (3)

    where w_{ij} (i = 0, 1, 2, …, p; j = 1, …, q) and w_j (j = 0, 1, 2, …, q) are model parameters, often called connection weights; p is the number of input nodes; and q is the number of hidden nodes.

    Activation functions can take several forms. The type of activation function is determined by the position of the neuron within the network. In most cases, input layer neurons do not have an activation function, as their role is simply to transfer the inputs to the hidden layer. The most widely used activation function for the output layer is the linear function, since a nonlinear activation function may introduce distortion into the predicted output. The logistic and hyperbolic tangent functions, shown in Eqs. (4) and (5) respectively, are often used as hidden layer transfer functions. Other activation functions, such as linear and quadratic, can also be used, each with a variety of modeling applications.

        Sig(x) = 1 / (1 + e^{−x})    (4)

        Tanh(x) = (e^{x} − e^{−x}) / (e^{x} + e^{−x})    (5)

    Hence, the ANN model of Eq. (3), in fact, performs a nonlinear functional mapping from the past observations to the
    future value y_t, i.e.,


        y_t = f(y_{t−1}, y_{t−2}, …, y_{t−p}, w) + ε_t,    (6)

    where w is a vector of all parameters and f(·) is a function determined by the network structure and connection weights. Thus, the neural network is equivalent to a nonlinear autoregressive model. The simple network given by Eq. (3) is surprisingly powerful in that it is able to approximate an arbitrary function, provided the number of hidden nodes q is sufficiently large. In practice, a simple network structure with a small number of hidden nodes often works well in out-of-sample forecasting. This may be due to the overfitting effect typically found in the neural network modeling process. An overfitted model fits the sample used for model building well but generalizes poorly to data outside the sample (Demuth & Beale, 2004).
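The forward pass of this three-layer network can be sketched in a few lines. The weights below are random placeholders purely for illustration; in practice they would be learned from the data:

```python
import numpy as np

def logistic(x):
    """Logistic activation: Sig(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def ann_one_step(lags, W, b, v, v0):
    """Single-hidden-layer forward pass:
    y_hat = v0 + sum_j v_j * g(b_j + sum_i W[j, i] * y_{t-i}).

    lags  -- array [y_{t-1}, ..., y_{t-p}] of the p most recent values
    W, b  -- input-to-hidden weights (q x p) and hidden biases (q,)
    v, v0 -- hidden-to-output weights (q,) and output bias
    """
    hidden = logistic(W @ lags + b)   # q hidden-node activations
    return v0 + v @ hidden            # linear output node

rng = np.random.default_rng(2)
p, q = 3, 5                           # 3 input lags, 5 hidden nodes
W, b = rng.normal(size=(q, p)), rng.normal(size=q)
v, v0 = rng.normal(size=q), 0.1
y_hat = ann_one_step(np.array([1.2, 0.7, -0.3]), W, b, v, v0)
```

The hidden layer squashes each weighted lag combination into (0, 1), and the linear output node combines those activations into the one-step forecast.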

    There are some similarities between ARIMA and ANN models. Both of them include a rich class of different

    models with different model orders. Data transformation is often necessary to obtain the best results. A relatively large

    sample is required in order to build a successful model. The iterative experimental nature is common to their

    modeling processes and the subjective judgment is sometimes needed in implementing the model. Because of

    the potential overfitting effect with both models, parsimony is often a guiding principle in choosing an

    appropriate model for forecasting.

    Overview of Extreme Learning Machine

    Gradient-descent-based algorithms require all the weights to be updated after every iteration, so they are generally slow and may easily converge to local minima. In contrast, ELM, proposed by Huang et al. in 2004, randomly assigns the weights connecting the input and hidden layers, together with the hidden biases, and then analytically determines the output weights using the Moore-Penrose generalized inverse. It has been proved that, given randomly assigned input weights and hidden biases with almost any non-zero activation function, such a network can approximate any continuous function on compact sets. Unlike the traditional algorithms, ELM not only achieves the minimum error but also assigns the smallest norm to the output weights. The reason for using the Moore-Penrose inverse is that, according to Bartlett's theory, a smaller norm of the weights results in better generalization of the feedforward neural network. The advantages of ELM over traditional algorithms are as follows:

    - ELM can be up to 1000 times faster than traditional algorithms.
    - ELM has better generalization performance, as it not only reaches the smallest error but also assigns the smallest norm of weights.
    - Non-differentiable functions can also be used to train SLFNs with the ELM learning algorithm.

    Given a training set N = {(x_j, t_j) | x_j ∈ R^n, t_j ∈ R^m, j = 1, …, N}, an activation function g(x), and a number of hidden nodes Ñ, the mathematical model of the network can be written as:

        Σ_{i=1}^{Ñ} β_i g(w_i · x_j + b_i) = o_j,    j = 1, …, N,

    where w_i = [w_{i1}, w_{i2}, …, w_{in}]^T is the weight vector connecting the ith hidden neuron and the input neurons, and β_i = [β_{i1}, β_{i2}, …, β_{im}]^T is the weight vector connecting the ith hidden neuron and the output neurons. t_j denotes the target vector of the input x_j, whereas o_j denotes the output vector obtained from the neural network; w_i · x_j denotes the inner product of w_i and x_j. The output neurons are chosen to be linear.


    Standard SLFNs with Ñ hidden neurons and activation function g(x) can approximate these N samples with zero error, meaning that Σ_{j=1}^{N} ||o_j − t_j|| = 0, i.e., there exist β_i, w_i, and b_i such that

        Σ_{i=1}^{Ñ} β_i g(w_i · x_j + b_i) = t_j,    j = 1, …, N.

    The above N equations can be written compactly as:

        Hβ = T,

    where H is the N × Ñ hidden layer output matrix whose (j, i)th entry is g(w_i · x_j + b_i), β = [β_1^T, …, β_Ñ^T]^T, and T = [t_1^T, …, t_N^T]^T. The smallest-norm least-squares solution of this linear system is:

        β̂ = H† T,

    where H† is the Moore-Penrose generalized inverse of H. The ELM algorithm, with the architecture shown in Fig. 1, can be summarized as follows. Given the training sample N, activation function g(x), and number of hidden neurons Ñ:

    1. Assign random input weights w_i and biases b_i, i = 1, …, Ñ.
    2. Calculate the hidden layer output matrix H.
    3. Calculate the output weights β̂ = H† T.
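The three steps above translate almost directly into code. This is a minimal numpy sketch; the sine-fitting task and the network size are illustrative choices, not from the paper:

```python
import numpy as np

def elm_fit(X, T, n_hidden, rng):
    """ELM training: random hidden layer, output weights by pseudoinverse."""
    W = rng.normal(size=(n_hidden, X.shape[1]))  # 1. random input weights w_i
    b = rng.normal(size=n_hidden)                #    and hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))     # 2. hidden output matrix H
    beta = np.linalg.pinv(H) @ T                 # 3. beta = H^+ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return H @ beta

# Toy 1-D regression task: 200 points of sin(3x) on [-1, 1].
rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
T = np.sin(3.0 * X[:, 0])
W, b, beta = elm_fit(X, T, 30, rng)
train_mse = np.mean((elm_predict(X, W, b, beta) - T) ** 2)
```

Only `beta` is learned; the hidden layer is never touched after its random initialization, which is what makes the scheme fast compared with gradient-based training.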

    The Hybrid Methodology

    Both ARIMA and ANN models have achieved successes in their own linear or nonlinear domains. However, none

    of them is a universal model that is suitable for all circumstances. The approximation of ARIMA models to

    complex nonlinear problems may not be adequate. On the other hand, using ANNs to model linear problems

    has yielded mixed results. For example, using simulated data, Denton showed that when there are outliers or

    multicollinearity in the data, neural networks can significantly outperform linear regression models. Markham

    and Rakes also found that the performance of ANNs for linear regression problems depends on the sample size

    and noise level. Hence, it is not wise to apply ANNs blindly to any type of data. Since it is difficult to completely know the characteristics of the data in a real problem, a hybrid methodology that has both linear and nonlinear modeling capabilities can be a good strategy for practical use. By combining different models, different aspects of the underlying patterns may be captured.

    It may be reasonable to consider a time series to be composed of a linear autocorrelation structure and a

    nonlinear component. That is,

        y_t = L_t + N_t,


    where L_t denotes the linear component and N_t denotes the nonlinear component. These two components have to be estimated from the data. First, we let an ARIMA model capture the linear component; the residuals from the linear model will then contain only the nonlinear relationship. Let e_t denote the residual at time t from the linear model; then

        e_t = y_t − L̂_t,

    where L̂_t is the forecast value for time t from the estimated relationship (1). Residuals are important in

    diagnosis of the sufficiency of linear models. A linear model is not sufficient if there are still linear correlation

    structures left in the residuals. However, residual analysis is not able to detect any nonlinear patterns in the

    data. In fact, there are currently no general diagnostic statistics for nonlinear autocorrelation relationships.

    Therefore, even if a model has passed diagnostic checking, the model may still not be adequate in that nonlinear

    relationships have not been appropriately modeled. Any significant nonlinear pattern in the residuals will

    indicate the limitation of the ARIMA. By modeling residuals using ANNs, nonlinear relationships can be

    discovered. With n input nodes, the ANN model for the residuals will be

        e_t = f(e_{t−1}, e_{t−2}, …, e_{t−n}) + ε_t,    (7)

    where f is a nonlinear function determined by the neural network and ε_t is the random error. Note that if the model f is not an appropriate one, the error term is not necessarily random; therefore, correct model identification is critical. Denoting the forecast from Eq. (7) as N̂_t, the combined forecast will be

        ŷ_t = L̂_t + N̂_t.

    In summary, the proposed methodology of the hybrid system consists of two steps. In the first step, an ARIMA

    model is used to analyze the linear part of the problem. In the second step, a neural network model is

    developed to model the residuals from the ARIMA model. Since the ARIMA model cannot capture the nonlinear

    structure of the data, the residuals of the linear model will contain information about the nonlinearity. The results from the neural network can be used as predictions of the error terms for the ARIMA model. The hybrid model

    exploits the unique features and strengths of the ARIMA model as well as the ANN model in determining different

    patterns. Thus, it could be advantageous to model linear and nonlinear patterns separately by using different

    models and then combine the forecasts to improve the overall modeling and forecasting performance.
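An end-to-end sketch of the two-step hybrid on a simulated series follows. The AR(1) linear stage, the ELM residual stage, and all sizes here are illustrative assumptions; the paper itself applies subset ARIMA models to real data:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(4)

# Simulated series: linear AR(1) dynamics plus a nonlinear term
# that a purely linear model cannot capture.
n = 600
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.4 * np.sin(y[t - 1]) ** 2 + 0.1 * rng.normal()

# Step 1: linear part L_t -- an AR(1) model fitted by least squares.
X = np.column_stack([np.ones(n - 1), y[:-1]])
c, phi = np.linalg.lstsq(X, y[1:], rcond=None)[0]
linear_fit = c + phi * y[:-1]          # L_hat_t for t = 1..n-1
resid = y[1:] - linear_fit             # e_t = y_t - L_hat_t

# Step 2: nonlinear part N_t -- an ELM mapping lagged residuals to e_t.
p = 2                                  # residual lags used as inputs
E = np.column_stack([resid[p - k:len(resid) - k] for k in range(1, p + 1)])
target = resid[p:]
W = rng.normal(size=(20, p))           # random input weights
b = rng.normal(size=20)                # random hidden biases
H = sigmoid(E @ W.T + b)
beta = np.linalg.pinv(H) @ target      # output weights via pseudoinverse
resid_fit = H @ beta                   # N_hat_t

# Combined in-sample forecast: y_hat_t = L_hat_t + N_hat_t.
combined = linear_fit[p:] + resid_fit
mse_linear = np.mean((y[1 + p:] - linear_fit[p:]) ** 2)
mse_hybrid = np.mean((y[1 + p:] - combined) ** 2)
```

On this toy series the hybrid's in-sample MSE is below that of the linear model alone, since the residual stage recovers part of the structure the AR(1) fit leaves behind.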

    As previously mentioned, in building ARIMA as well as ANN models, subjective judgment of the model order as

    well as the model adequacy is often needed. It is possible that suboptimal models may be used in the hybrid

    method. For example, the current practice of the Box-Jenkins methodology focuses on the low-order

    autocorrelation. A model is considered adequate if low order autocorrelations are not significant even though

    significant autocorrelations of higher order still exist. This suboptimality may not affect the usefulness of the

    hybrid model. Granger has pointed out that for a hybrid model to produce superior forecasts, the component

    model should be suboptimal. In general, it has been observed that it is more effective to combine individual

    forecasts that are based on different information sets.


    Empirical Results

    1) Data Sets

    Two well-known data sets, the Wolf's sunspot data and the Canadian lynx data, are used in this study to

    demonstrate the effectiveness of the hybrid method. These time series come from different areas and have

    different statistical characteristics. They have been widely studied in the statistical as well as the neural network

    literature. Both linear and nonlinear models have been applied to these data sets, and varying degrees of nonlinearity have been found in these series.


    The sunspot data we consider contain the annual number of sunspots from 1700 to 2012, giving a total of 313 observations. The study of sunspot activity has practical importance to geophysicists, environmental scientists, and climatologists. The series is regarded as nonlinear and non-Gaussian and is often used to evaluate the effectiveness of nonlinear models. The plot of this time series (see Fig. 1) also suggests a cyclical pattern with a mean cycle of about 11 years. The sunspot data have been extensively studied with a vast variety of linear and nonlinear time series models, including ARIMA and ANNs.

    The lynx series contains the number of lynx trapped per year in the Mackenzie River district of Northern Canada. The data show a periodicity of approximately 10 years. The data set has 114 observations,


    corresponding to the period 1821-1934. It has also been extensively analyzed in the time series literature, with a focus on nonlinear modeling. Following other studies, the logarithms (to the base 10) of the data are used in the analysis.

    To assess the forecasting performance of different models, each data set is divided into two samples of training

    and testing. The training data set is used exclusively for model development and then the test sample is used to

    evaluate the established model. The data compositions for the two data sets are given in Table 1.

    2) Results

    Only one-step-ahead forecasting is considered. The mean squared error (MSE) is selected as the forecasting accuracy measure.
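Computed over the held-out window, the MSE is simply the average of the squared one-step forecast errors. A minimal sketch with made-up illustrative numbers:

```python
import numpy as np

def mse(actual, forecast):
    """Mean squared error over the out-of-sample window."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean((actual - forecast) ** 2))

actual   = [101.0, 98.5, 103.2, 99.9]   # illustrative held-out values
forecast = [100.0, 99.0, 102.0, 101.0]  # illustrative one-step forecasts
err = mse(actual, forecast)             # (1.0^2 + 0.5^2 + 1.2^2 + 1.1^2) / 4
```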

    Table 1 gives the forecasting results for the sunspot data. A subset autoregressive model of order 9 was found to be the most parsimonious among all ARIMA models that were also judged adequate by residual analysis. The neural network and the Extreme Learning Machine used have a 2-3-1 architecture, chosen on the basis of experimental results obtained by comparing mean squared errors. A forecast horizon of 67 periods is used. The results show that applying neural networks alone could not improve the forecasting accuracy over the ARIMA model in the 67-period horizon, whereas the hybrid model shows that by combining the two models, the overall forecasting errors can be significantly reduced. The comparison between the actual values and the forecast values for the 67 out-of-sample points is given in the figure below. Although at some data points the hybrid model gives worse predictions than either the ARIMA or the ELM forecasts, its overall forecasting capability is improved.


    Table 1: Sunspot data, 67 points ahead

    Model                 Mean Squared Error
    ARIMA                 393.73
    ANN                   482.02
    ELM                   487.64
    Hybrid (ARIMA+ELM)    363.57
    Hybrid (ARIMA+ANN)    361.69

    In a similar fashion, we have fitted the Canadian lynx data with a subset AR model of order 12, a parsimonious model also used by Subba Rao and Gabr and others. The overall forecasting results for the last 14 years are summarized in Table 2. A neural network / Extreme Learning Machine structure of 2-2-1 is not able to improve on the results of the ARIMA model. Applying the hybrid method, we find a significant decrease in the MSE on these data. The figure below gives the actual vs. forecast values for the individual Extreme Learning Machine and ARIMA models as well as for the combined model.


    Table 2: Lynx data, 14 points ahead

    Model                 Mean Squared Error
    ARIMA                 0.128
    ANN                   0.148
    ELM                   0.154
    Hybrid (ARIMA+ELM)    0.118
    Hybrid (ARIMA+ANN)    0.123

    Conclusion

    Applying quantitative methods for forecasting and for assisting investment decision making has become more indispensable than ever in business practice. Time series forecasting is one of the most important quantitative modeling problems and has received a considerable amount of attention in the literature. Artificial neural networks (ANNs) have been shown to be an effective, general-purpose approach for pattern recognition, classification, clustering, and especially time series prediction with a high degree of accuracy. Nevertheless, their performance is not always satisfactory. Theoretical as well as empirical evidence in the literature suggests that by using dissimilar models, or models that disagree with each other strongly, the hybrid model will have lower generalization variance or error. Additionally, because of possibly unstable or changing patterns in the data, the hybrid method can reduce model uncertainty, which typically occurs in statistical inference and time series forecasting.

    In this paper, autoregressive integrated moving average models and ELM models are combined to propose a new hybrid method for improving the time series forecasting performance of artificial neural networks. In our proposed model, based on the Box-Jenkins methodology for linear modeling, a time series is considered a nonlinear function of several past observations and random errors. Therefore, in the first stage, an autoregressive integrated moving average model is used to generate the necessary data, and then an Extreme Learning Machine is used to build a model that captures the underlying data-generating process and predicts the future using the preprocessed data. Empirical results on two well-known real data sets indicate that the proposed model can be an effective way to obtain a more accurate model than traditional artificial neural networks. Thus, it can be used as an appropriate alternative to artificial neural networks.


    References

    [1] J.M. Bates, C.W.J. Granger, The combination of forecasts, Oper. Res. Q., 20 (1969), pp. 451-468.
    [2] G.E.P. Box, G. Jenkins, Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco, CA (1970).
    [3] M.J. Campbell, A.M. Walker, A survey of statistical work on the MacKenzie River series of annual Canadian lynx trappings for the years 1821-1934, and a new analysis, J. R. Statist. Soc. Ser. A, 140 (1977), pp. 411-431.
    [4] F.C. Palm, A. Zellner, To combine or not to combine? Issues of combining forecasts, J. Forecasting, 11 (1992), pp. 687-701.
    [5] Z. Tang, C. Almeida, P.A. Fishwick, Time series forecasting using neural networks vs Box-Jenkins methodology, Simulation, 57 (1991), pp. 303-310.
    [6] G.B. Huang, Q. Zhu, C. Siew, Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks, International Joint Conference on Neural Networks (2004), Vol. 2, pp. 985-990.
    [7] G.B. Huang, Q. Zhu, C. Siew, Extreme Learning Machine: Theory and Applications, Neurocomputing (2006), Vol. 70, pp. 489-501.
    [8] G.B. Huang, H. Zhou, X. Ding, R. Zhang, Extreme Learning Machine for Regression and Multiclass Classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics (2012), Vol. 42, No. 2.