Guided Tour on ARIMA Estimation and Forecasting

  • 7/29/2019 Guided Tour on ARIMA Estimation and Forecasting

    This guided tour contains mathematical formulas and/or Greek symbols and is therefore best viewed with Internet Explorer, because other web browsers may not display the "Symbol" fonts involved. For example, "β" should be displayed as the Greek letter "beta" rather than the Roman "b". If not, reload this guided tour in Internet Explorer, or make the latter your default web browser.

    ARIMA stands for AutoRegressive Integrated Moving Average. The ARIMA modeling and forecasting approach is also known as the Box-Jenkins approach.

    ARMA(p,q) processes

    I will discuss the "I" in ARIMA later. For the time being it suffices to note that an ARIMA(p,0,q) process is the same as an ARMA(p,q) process.

    As is well known (if not to you, stop here and don't use module ARIMA!), the general form of an ARMA(p,q) process y(t) is:

    $$y(t) = \alpha_1 y(t-1) + \dots + \alpha_p y(t-p) + \mu + e(t) - \beta_1 e(t-1) - \dots - \beta_q e(t-q)$$

    where the e(t)'s are independently distributed with zero expectation and variance $\sigma^2$, and $\mu$ is a constant. Thus, the p in "ARMA(p,q)" is the maximum lag of the AR part, and q is the maximum lag of the MA part.

    This model can be written more compactly in terms of lag polynomials and lag operators. Define the lag operator L as:

    $$L\,y(t) = y(t-1), \quad L^2 y(t) = y(t-2), \quad \dots, \quad L^p y(t) = y(t-p),$$

    et cetera. Then we can write:

    $$y(t) - \alpha_1 y(t-1) - \dots - \alpha_p y(t-p) = y(t) - \alpha_1 L\,y(t) - \dots - \alpha_p L^p y(t) = a_p(L)\,y(t),$$

    say, where

    $$a_p(L) = 1 - \alpha_1 L - \dots - \alpha_p L^p,$$

    and similarly

    $$e(t) - \beta_1 e(t-1) - \dots - \beta_q e(t-q) = e(t) - \beta_1 L\,e(t) - \dots - \beta_q L^q e(t) = b_q(L)\,e(t),$$

    say, where

    $$b_q(L) = 1 - \beta_1 L - \dots - \beta_q L^q.$$

    Thus, the ARMA(p,q) model involved can now be written as:

    $$a_p(L)\,y(t) = \mu + b_q(L)\,e(t).$$

    If the roots of $a_p(L)$ are all outside the (complex) unit circle, i.e., if

    $$a_p(z) = 0 \;\Rightarrow\; |z| > 1,$$

    then $[a_p(L)]^{-1}$ exists and is an infinite order lag polynomial with exponentially vanishing coefficients:

    $$[a_p(L)]^{-1} = 1 + \sum_{j \ge 1} \psi_j L^j,$$

    where $|\psi_j| < c\,\rho^j$ for some constant c and a $\rho \in (0,1)$. If so, we can write the ARMA model as a stationary MA($\infty$) process:

    $$y(t) = \mu / a_p(1) + [a_p(L)]^{-1} b_q(L)\,e(t).$$
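As a rough illustration of the exponentially vanishing coefficients, the following sketch expands the inverse of an AR lag polynomial with the usual power-series recursion. The function name `inverse_ar_weights`, the label `psi` for the coefficients, and the example value 0.7 (the AR coefficient used later in this tour) are choices made for this sketch; none of this is EasyReg code.

```python
def inverse_ar_weights(alpha, n_terms):
    """Return the first n_terms coefficients of 1 / (1 - alpha_1 L - ... - alpha_p L^p)."""
    psi = [1.0]  # coefficient of L^0
    for j in range(1, n_terms):
        # psi_j = alpha_1 psi_{j-1} + ... + alpha_p psi_{j-p}, with psi_k = 0 for k < 0
        psi.append(sum(a * psi[j - i] for i, a in enumerate(alpha, start=1) if i <= j))
    return psi


# AR(1) example with coefficient 0.7: the weights should decay like 0.7**j
weights = inverse_ar_weights([0.7], 8)
for j, w in enumerate(weights):
    print(j, round(w, 6))
```

For an AR(1) polynomial the weights are simply powers of the AR coefficient, which makes the geometric bound in the text easy to see.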

    Similarly, if the roots of $b_q(L)$ are all outside the (complex) unit circle then we can write the ARMA model as an AR($\infty$) process:

    $$[b_q(L)]^{-1} a_p(L)\,y(t) = \mu / b_q(1) + e(t),$$

    where $[b_q(L)]^{-1} a_p(L)$ can be written as:

    $$[b_q(L)]^{-1} a_p(L) = 1 - \sum_{j \ge 1} \theta_j L^j,$$

    so that

    $$y(t) = \sum_{j \ge 1} \theta_j y(t-j) + \delta + e(t),$$

    with $\delta = \mu / b_q(1)$. Thus

    $$E_{t-1}[y(t)] = \sum_{j \ge 1} \theta_j y(t-j) + \delta,$$

    which is the best one-step-ahead forecast of y(t).
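The AR(∞) coefficients can be obtained numerically by dividing the AR lag polynomial by the MA lag polynomial as power series. The sketch below does this in plain Python for the AR and MA polynomials of the example model used later in this tour. The helper `series_divide` and the names `theta` and `delta` are introduced here for illustration only and are assumptions, not EasyReg internals.

```python
def series_divide(num, den, n_terms):
    """First n_terms power-series coefficients of num(L) / den(L); requires den[0] == 1."""
    out = []
    for j in range(n_terms):
        nj = num[j] if j < len(num) else 0.0
        acc = sum(den[i] * out[j - i] for i in range(1, min(j, len(den) - 1) + 1))
        out.append(nj - acc)
    return out


a = [1.0, -0.7]                            # AR polynomial 1 - 0.7 L
b = [1.0, 0.5, 0.0, 0.0, -0.25, -0.125]    # MA polynomial (1 + 0.5 L)(1 - 0.25 L^4)
ratio = series_divide(a, b, 9)             # coefficients of a(L) / b(L)
theta = [-c for c in ratio[1:]]            # a(L)/b(L) = 1 - theta_1 L - theta_2 L^2 - ...
mu = 0.0                                   # the example process has no constant
delta = mu / sum(b)                        # b(1) is the sum of the MA polynomial's coefficients
print([round(v, 4) for v in theta])
```

The list `theta` then holds the weights on y(t-1), y(t-2), ... in the one-step-ahead forecast formula.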

    I(d) processes

    A time series process is called I(d) if we need to apply the first difference operator

    $$\Delta = 1 - L$$

    at least d times to make the process stationary. Now a time series process x(t) is an ARIMA(p,d,q) process if

    $$y(t) = \Delta^d x(t) = (1 - L)^d x(t),$$

    where y(t) is a stationary ARMA(p,q) process.
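To make the differencing step concrete, here is a minimal sketch of applying the first-difference operator d times and of undoing one round of differencing. The function names `difference` and `integrate` and the toy series are hypothetical and only serve this illustration.

```python
def difference(x, d=1):
    """Apply the first-difference operator (1 - L) to the list x, d times."""
    for _ in range(d):
        x = [x[i] - x[i - 1] for i in range(1, len(x))]
    return x


def integrate(dx, x0):
    """Undo one round of differencing, given the starting level x0."""
    levels = [x0]
    for step in dx:
        levels.append(levels[-1] + step)
    return levels


x = [1.0, 3.0, 6.0, 10.0, 15.0]
print(difference(x))        # [2.0, 3.0, 4.0, 5.0]
print(difference(x, d=2))   # [1.0, 1.0, 1.0]
print(integrate(difference(x), x[0]) == x)  # True
```

Each round of differencing shortens the series by one observation, which is why the starting level has to be supplied when integrating back.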

    ARIMA estimation and forecasting in practice

    Model and data

    The time series y(t) I shall use has been artificially generated, as follows:

    $$(1 - 0.7L)(1 - L)\,y(t) = (1 - 0.7L)\,\Delta y(t) = (1 + 0.5L)(1 - 0.25L^4)\,e(t),$$

    where e(t) is i.i.d. standard normally distributed. This is an ARIMA(1,1,5) process:

    $$\Delta y(t) = 0.7\,\Delta y(t-1) + e(t) + 0.5\,e(t-1) - 0.25\,e(t-4) - 0.125\,e(t-5).$$
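A series of this kind can be generated along the following lines. This is only a sketch under stated assumptions: the seed, the burn-in length, the starting level of zero and the function name `simulate_tour_process` are all hypothetical, and the output will not reproduce the numbers in ARIMADATA.CSV.

```python
import random


def simulate_tour_process(n, seed=0, burn_in=200):
    """Simulate n observations of (1 - 0.7L)(1 - L) y(t) = (1 + 0.5L)(1 - 0.25L^4) e(t)."""
    rng = random.Random(seed)
    total = n + burn_in
    e = [rng.gauss(0.0, 1.0) for _ in range(total)]   # i.i.d. standard normal shocks
    dy = []
    for t in range(total):
        value = e[t]
        if t >= 1:
            value += 0.7 * dy[t - 1] + 0.5 * e[t - 1]
        if t >= 4:
            value -= 0.25 * e[t - 4]
        if t >= 5:
            value -= 0.125 * e[t - 5]
        dy.append(value)
    dy, e = dy[burn_in:], e[burn_in:]   # drop the start-up period (assumed burn-in)
    y, level = [], 0.0                  # the level is started at 0, an arbitrary choice
    for step in dy:
        level += step
        y.append(level)
    return y, dy, e


y, dy, e = simulate_tour_process(200)
print(len(y), round(y[-1], 3))
```

The function returns the levels, the first differences and the shocks, so the defining recursion can be checked directly on the output.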

    The data involved is available as CSV file ARIMADATA.CSV (http://econ.la.psu.edu/~hbierens/EasyRegTours/ARIMA_Tourfiles/ARIMADATA.CSV), with y(t) = "ARIMA test data". The data should be interpreted as quarterly time series, starting from quarter 1950.1. Note that this data file has been created under US number setting. Hence, if your Windows uses a comma as decimal delimiter you have to convert it to your local number setting. See the guided tour on importing Excel files in CSV format (http://econ.la.psu.edu/~hbierens/EasyRegTours/DATACSV.HTM).

    Note that the MA lag polynomial $b_5(L) = (1 + 0.5L)(1 - 0.25L^4)$ is specified as the product of a non-seasonal lag polynomial $b_{ns,1}(L) = 1 + 0.5L$ and a seasonal lag polynomial $c_{s,1}(L^4)$, where $c_{s,1}(L) = 1 - 0.25L$. In general a seasonal lag polynomial is a polynomial in $L^s$, where s is the number of observations per year (the number of seasons). For example, for monthly data s = 12, and for weekly data s = 52.
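The product of the non-seasonal and seasonal MA polynomials can be checked with a few lines of code. In this sketch a lag polynomial is stored as a list of coefficients, with the coefficient of L^k at index k. The helper names `poly_mul` and `seasonal_poly` are made up for this example.

```python
def poly_mul(p, q):
    """Multiply two lag polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def seasonal_poly(coeffs, s):
    """Expand c(L^s): the coefficient of L^k in c(L) moves to L^(k*s)."""
    out = [0.0] * ((len(coeffs) - 1) * s + 1)
    for k, c in enumerate(coeffs):
        out[k * s] = c
    return out


b_ns = [1.0, 0.5]     # non-seasonal part: 1 + 0.5 L
c_s = [1.0, -0.25]    # seasonal part:     1 - 0.25 L, to be evaluated at L^4
b5 = poly_mul(b_ns, seasonal_poly(c_s, 4))
print(b5)  # [1.0, 0.5, 0.0, 0.0, -0.25, -0.125]
```

The result has non-zero coefficients at lags 1, 4 and 5, which is why the example process has MA order 5 although only two MA parameters are free.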

    ARIMA estimation and forecasting in EasyReg

    Import the data file ARIMADATA.CSV in EasyReg, and declare it quarterly time series, with first year 1950 and first quarter 1.

    Next, open "Menu > Single equation models > ARIMA estimation and forecasting". Then the following window appears.

    Click "Continue":

    In order to conduct out-of-sample forecasting, either append the data with missing values (via "Menu > Input > Prepare time series for forecasting"), or select a subset of observations. I will choose the latter. Thus, click "Yes":

    Choose the subsample 1950.1 through 1997.1. Then the ARIMA model will be fitted to this subsample, and the observations after 1997.1 will be used to compare forecasts and realizations.
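The split into an estimation sample and a hold-out sample can be sketched as follows. The total of 200 observations is an assumption inferred from the forecast range 1997.2 to 1999.4 used later in this tour; it is not stated explicitly. The helper `quarter_labels` is hypothetical.

```python
def quarter_labels(first_year, first_quarter, n):
    """Return n labels of the form 'year.quarter', starting at the given quarter."""
    labels = []
    year, quarter = first_year, first_quarter
    for _ in range(n):
        labels.append(f"{year}.{quarter}")
        quarter += 1
        if quarter > 4:
            year, quarter = year + 1, 1
    return labels


labels = quarter_labels(1950, 1, 200)   # 200 observations is an assumption
cut = labels.index("1997.1") + 1        # estimation sample ends at 1997.1 (inclusive)
estimation, holdout = labels[:cut], labels[cut:]
print(len(estimation), holdout[0], holdout[-1], len(holdout))
```

Under that assumption the model is fitted to 189 quarters and the remaining 11 quarters are kept for comparing forecasts with realizations.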

    Click "Bounds OK", and then "Confirm" and "Continue" (in the next window). Then the following window appears:

    You now have to tell EasyReg what the order of integration ("d") is. Usually you do not know this in advance. If so, test the unit root hypothesis, via "Menu > Data analysis > Unit root tests (root 1)". If you don't know what a unit root is, please read my lecture notes on unit roots (http://econ.la.psu.edu/~hbierens/LECNOTES.HTM). If after reading these lecture notes you still don't understand what a unit root is and how to test for it, click "Don't know".

    In our case d = 1, as indicated. Thus click "1 times OK":

    Although in our case the process $\Delta y(t)$ has zero expectation so that there is no need for an intercept, in practice this is rare. Therefore, in the first instance include an intercept in your model, and test afterwards whether the parameter involved is zero. Thus, click "Continue":

    Now you have to specify the ARMA process for $u(t) = \Delta y(t) - E[\Delta y(t)]$. The coefficients a(1,i) are the non-zero coefficients of the non-seasonal AR lag polynomial, and the coefficients c(1,i) are the non-zero coefficients of the seasonal AR lag polynomial. Similarly, the coefficients a(2,i) are the non-zero coefficients of the non-seasonal MA lag polynomial, and the coefficients c(2,i) are the non-zero coefficients of the seasonal MA lag polynomial. If your data consist of annual time series the option of specifying seasonal lag polynomials is not available.

    In our case $(1 - 0.7L)\,u(t) = (1 + 0.5L)(1 - 0.25L^4)\,e(t)$, hence

    a(1,1) = 0.7
    a(2,1) = -0.5
    c(2,1) = 0.25

    Thus you have to click a(1,1), a(2,1) and c(2,1). Of course, in practice you don't know this in advance. To determine the lags involved, first read my lecture notes on forecasting (http://econ.la.psu.edu/~hbierens/LECNOTES.HTM), and then use the option "ARIMA model selection via information criteria". This module also comes with a guided tour (http://econ.la.psu.edu/~hbierens/EasyRegTours/ARIMAMODSEL.HTM).

    For more advanced time series analysis, see for example:

    James D. Hamilton: Time Series Analysis, Princeton University Press, 1994.

    Now click "Specification OK":

    The only action required is to click "Continue":

    The model parameters will be estimated by minimizing $\sum_t e(t)^2$, using the simplex method of Nelder and Mead. Click first "Simplex method: How it works, and stopping rules".
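The least-squares criterion mentioned above can be sketched as follows for the model specified in this tour, with parameters b(1), a(1,1), a(2,1) and c(2,1). The residuals are computed recursively with all pre-sample values set to zero. That start-up rule is an assumption of this sketch; the tour does not say how EasyReg initializes the recursion. The function names are hypothetical.

```python
def residuals(dy, b1, a11, a21, c21, s=4):
    """Residuals e(t) of (1 - a11 L) u(t) = (1 - a21 L)(1 - c21 L^s) e(t), u(t) = dy(t) - b1."""
    u = [v - b1 for v in dy]
    e = []
    for t in range(len(u)):
        u_lag = u[t - 1] if t >= 1 else 0.0          # pre-sample values are set to zero
        e1 = e[t - 1] if t >= 1 else 0.0
        es = e[t - s] if t >= s else 0.0
        es1 = e[t - s - 1] if t >= s + 1 else 0.0
        e.append(u[t] - a11 * u_lag + a21 * e1 + c21 * es - a21 * c21 * es1)
    return e


def sum_of_squares(params, dy):
    """Objective to be minimized over params = (b1, a11, a21, c21)."""
    return sum(r * r for r in residuals(dy, *params))


# Toy check: build differences from known shocks with the true parameters, then recover them
shocks = [0.3, -1.1, 0.8, 0.2, -0.4, 1.5, -0.7, 0.1]
dy = []
for t, e_t in enumerate(shocks):
    value = e_t
    if t >= 1:
        value += 0.7 * dy[t - 1] + 0.5 * shocks[t - 1]
    if t >= 4:
        value -= 0.25 * shocks[t - 4]
    if t >= 5:
        value -= 0.125 * shocks[t - 5]
    dy.append(value)
recovered = residuals(dy, 0.0, 0.7, -0.5, 0.25)
print(round(sum_of_squares((0.0, 0.7, -0.5, 0.25), dy), 6))
```

If SciPy is installed, `sum_of_squares` could be passed to `scipy.optimize.minimize` with `method="Nelder-Mead"`. Whether that matches EasyReg's simplex settings and stopping rules is not something this tour states.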

    I recommend using the default stopping rules in the first instance. After completing the first iteration round you may wish to decrease the value of "r". Thus click "Stopping rules OK". Then the previous window reappears. Click "Start SIMPLEX iteration":

    In the current version of module ARIMA the simplex method is restarted until the parameters do not change anymore. As a double check, check "Auto restart" and restart the simplex iteration. Then click "Simplex method: How it works, and stopping rules" again, and decrease the value of "r":

    Click "Stopping rules OK":

    Check "Auto restart" and click "Restart SIMPLEX iteration", and then click "Done with SIMPLEX iteration":

    Click "Continue":

    This window is similar to the "What to do next" window when you run an OLS regression. See the guided tour on OLS estimation (http://econ.la.psu.edu/~hbierens/EasyRegTours/OLS.HTM). It contains the estimation results, and an options menu.

    Click the "Options" button:

    Test parameter restrictions

    Recall that the true values of the parameters are:

    b(1) = 0
    a(1,1) = 0.7
    a(2,1) = -0.5
    c(2,1) = 0.25

    In order to see whether the estimates are significantly different from the true values, test the null hypothesis involved using the "Test parameter restrictions" option. The procedure is the same as for OLS (see the guided tour on OLS estimation), and therefore I will not demonstrate again how to conduct this test, but only show the results:

    The test result is as expected: the parameter estimates are not significantly different from the true values, at any reasonable significance level.

    Plot the fit

    This window does not need explanation.

    One-step ahead forecasts

    Recall that the best one-step ahead forecast of $\Delta y(t)$ takes the form

    $$E_{t-1}[\Delta y(t)] = \sum_{j \ge 1} \theta_j \Delta y(t-j) + \delta,$$

    where the parameters $\theta_j$ and $\delta$ can be derived from the parameters of the ARMA model for $\Delta y(t)$. See my lecture notes on forecasting. Therefore, the best one-step ahead forecast of y(t) itself takes the form

    $$E_{t-1}[y(t)] = y(t-1) + E_{t-1}[\Delta y(t)].$$

    Thus both forecast schemes use all the data up to time t-1. The option "One-step ahead forecasts" generates these forecasts:

    Click "Continue":

    This picture displays $E_{t-1}[\Delta y(t)]$ on the vertical axis and its realisation $\Delta y(t)$ on the horizontal axis, for t = 1997.2 to 1999.4. The closer the points $(\Delta y(t), E_{t-1}[\Delta y(t)])$ are to the (45 degrees) line, the better the forecasts.

    Click "Continue":

    The top panel displays the plots of $\Delta y(t)$ (solid line) and its forecasts $E_{t-1}[\Delta y(t)]$ (dotted red line) for t = 1997.2 to 1999.4. The bottom panel plots the forecast errors $\Delta y(t) - E_{t-1}[\Delta y(t)]$, together with the one- and two-times standard error bands.

    Click "Continue":

    This picture displays $E_{t-1}[y(t)]$ on the vertical axis and its realisation y(t) on the horizontal axis, for t = 1997.2 to 1999.4. The closer the points $(y(t), E_{t-1}[y(t)])$ are to the (45 degrees) line, the better the forecasts.

    Click "Continue":

    The top panel displays the plots of y(t) (solid line) and its forecasts $E_{t-1}[y(t)]$ (dotted red line) for t = 1997.2 to 1999.4. The bottom panel plots the forecast errors $y(t) - E_{t-1}[y(t)]$.

    Click "Continue". Then you will return to the "What to do next?" window.
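The one-step-ahead scheme described in this section can be sketched as follows. Instead of truncating the infinite AR representation, the sketch runs the equivalent ARMA recursion with zero pre-sample values. That is an assumption of this example, as are the function name `one_step_forecasts` and the toy data; EasyReg's own computation may differ.

```python
def one_step_forecasts(y, b1, a11, a21, c21, s=4):
    """Return (difference forecasts, level forecasts) for y[1], ..., y[-1]."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    e, diff_fc, level_fc = [], [], []
    for t in range(len(dy)):
        u_lag = dy[t - 1] - b1 if t >= 1 else 0.0    # pre-sample values are set to zero
        e1 = e[t - 1] if t >= 1 else 0.0
        es = e[t - s] if t >= s else 0.0
        es1 = e[t - s - 1] if t >= s + 1 else 0.0
        fc = b1 + a11 * u_lag - a21 * e1 - c21 * es + a21 * c21 * es1
        diff_fc.append(fc)
        level_fc.append(y[t] + fc)   # y[t] is the last observed level before dy[t]
        e.append(dy[t] - fc)         # forecast error, reused in later forecasts
    return diff_fc, level_fc


y = [0.0, 1.0, 3.0, 2.5, 4.0, 4.2, 5.1, 5.0]   # made-up levels
diff_fc, level_fc = one_step_forecasts(y, b1=0.0, a11=0.7, a21=-0.5, c21=0.25)
for t in range(len(diff_fc)):
    print(t + 1, round(diff_fc[t], 4), round(level_fc[t], 4), y[t + 1])
```

Because the level forecast is the previous level plus the difference forecast, the forecast errors in levels and in differences coincide at horizon one.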

    Recursive forecasts

    In recursive forecasting, the unknown $\Delta y(t+h-j)$'s in the one-step ahead forecast scheme

    $$E_{t+h-1}[\Delta y(t+h)] = \sum_{j \ge 1} \theta_j \Delta y(t+h-j) + \delta$$

    are replaced recursively by forecasts, which then yields the best h-step ahead forecast:

    $$E_t[\Delta y(t+h)] = \sum_{j \ge 0} \theta_{h,j} \Delta y(t-j) + \delta_h.$$

    See my lecture notes on forecasting. Thus, these forecasts only use the information up to time t = 1997.1. The corresponding recursive level forecast of y(t+h) is then

    $$E_t[y(t+h)] = y(t) + E_t[\Delta y(t+1)] + \dots + E_t[\Delta y(t+h)].$$

    The results in the above windows indicate that recursive h-step ahead forecasting only yields reasonable forecasts for modest values of h. This corresponds to the fact that

    $$E_t[\Delta y(t+h)] \to E[\Delta y(t)] \quad \text{as } h \to \infty.$$

    This is what you see happening in the last window.
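The convergence of the recursive forecasts can be reproduced with a small sketch. It iterates the ARMA recursion forward from the last observation, sets all future shocks to zero, and cumulates the difference forecasts into level forecasts. The function name `recursive_forecasts`, the starting values and the past shocks are made up for illustration; only the parameter values are taken from this tour.

```python
def recursive_forecasts(y_last, u_last, e_hist, b1, a11, a21, c21, horizon, s=4):
    """Forecast differences and levels 1..horizon steps ahead from the last observation T.

    u_last is the last demeaned difference; e_hist holds past shocks, most recent last.
    """
    # Weights on e(t-k) from (1 - a21 L)(1 - c21 L^s) = 1 - a21 L - c21 L^s + a21 c21 L^(s+1)
    ma_terms = [(1, -a21), (s, -c21), (s + 1, a21 * c21)]
    u_prev, level = u_last, y_last
    diff_fc, level_fc = [], []
    for h in range(1, horizon + 1):
        u_hat = a11 * u_prev
        for lag, weight in ma_terms:
            k = lag - h                      # the shock needed is e(T - k)
            if 0 <= k < len(e_hist):         # future shocks (k < 0) are forecast as zero
                u_hat += weight * e_hist[-1 - k]
        diff_fc.append(b1 + u_hat)
        level += b1 + u_hat                  # cumulate the difference forecasts
        level_fc.append(level)
        u_prev = u_hat
    return diff_fc, level_fc


diff_fc, level_fc = recursive_forecasts(
    y_last=10.0, u_last=1.0, e_hist=[0.2, -0.1, 0.4, 0.0, 0.3],
    b1=0.0, a11=0.7, a21=-0.5, c21=0.25, horizon=40,
)
print(round(diff_fc[0], 4), round(diff_fc[-1], 6), round(level_fc[-1], 4))
```

Once the horizon exceeds the MA order, each difference forecast is just 0.7 times the previous one, so the forecasts of the differences shrink geometrically towards their mean and the level forecasts flatten out.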

    Plot one-step ahead forecast coefficients

    Recall that the best one-step ahead forecast of $\Delta y(t+1)$ takes the form

    $$E_t[\Delta y(t+1)] = \sum_{j \ge 0} \theta_{j+1} \Delta y(t-j) + \delta.$$

    In the next window the coefficients $a(j) = \theta_{j+1}$ are plotted, and their values displayed.

    Kernel estimate of the error density

    This picture compares the nonparametric kernel estimator of the density of e(t) with the corresponding normal density. Note that nonparametric kernel density estimation lacks a "parametric backbone" and therefore needs much more data than parametric density estimation. The effective sample size in this case is too small to do reliable nonparametric estimation.
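A comparison of this kind can be sketched with a Gaussian kernel and a rule-of-thumb bandwidth. Both choices are assumptions made here; the tour does not state which kernel or bandwidth EasyReg uses. The simulated `errors` merely stand in for the estimated residuals.

```python
import math
import random
import statistics


def gaussian_kde(sample, x, bandwidth):
    """Gaussian-kernel density estimate at the point x."""
    scale = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in sample) / scale


def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


rng = random.Random(1)
errors = [rng.gauss(0.0, 1.0) for _ in range(189)]   # stand-in for the residuals
mean, sd = statistics.mean(errors), statistics.stdev(errors)
h = 1.06 * sd * len(errors) ** (-0.2)                # rule-of-thumb bandwidth (assumed)
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(gaussian_kde(errors, x, h), 3), round(normal_pdf(x, mean, sd), 3))
```

With fewer than 200 observations the kernel estimate is typically visibly bumpier than the fitted normal density, which is in line with the remark about sample size above.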

    Comparison with the OLS module

    An ARMA model can also be estimated via the linear regression (OLS) module, using the option "Re-estimate the model with ARMA errors". See the guided tour on OLS estimation. This option produces the same parameter estimates as the ARIMA module.

    However, if you estimate an AR model directly via the linear regression (OLS) module, without using the option "Re-estimate the model with ARMA errors", the estimation results for the intercept, and possibly the time trend and seasonal dummy parameters, will differ from the corresponding results obtained via the ARIMA module under review. The reason is the following. The ARIMA module estimates an AR(p) model with intercept in the form

    $$y_t = \mu + u_t,$$

    $$u_t = \alpha_1 u_{t-1} + \dots + \alpha_p u_{t-p} + e_t,$$

    where $\mu = E[y_t]$ and $e_t$ is white noise, whereas the linear regression module estimates this model in the form

    $$y_t = \alpha_0 + \alpha_1 y_{t-1} + \dots + \alpha_p y_{t-p} + e_t.$$

    The intercept $\alpha_0$ will be different from $\mu$, because taking expectations in the latter case yields $\mu = \alpha_0 + \alpha_1 \mu + \dots + \alpha_p \mu$, hence:

    $$\mu = (1 - \alpha_1 - \dots - \alpha_p)^{-1} \alpha_0.$$
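A small numerical example may help to see the difference between the two intercepts. The values below (a mean of 2 and AR coefficients 0.5 and 0.2) are made up for illustration, and the function names are hypothetical.

```python
def regression_intercept(mu, alphas):
    """Intercept of the regression form: alpha_0 = (1 - alpha_1 - ... - alpha_p) * mu."""
    return (1.0 - sum(alphas)) * mu


def process_mean(alpha0, alphas):
    """Mean of the process: mu = alpha_0 / (1 - alpha_1 - ... - alpha_p)."""
    return alpha0 / (1.0 - sum(alphas))


mu, alphas = 2.0, [0.5, 0.2]
alpha0 = regression_intercept(mu, alphas)
print(round(alpha0, 6), round(process_mean(alpha0, alphas), 6))
```

Here the regression intercept is 0.6 while the mean reported in the ARIMA form is 2, even though both describe the same process.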

    This is the end of the guided tour on ARIMA estimation and forecasting