Arima Model Bu Budiasih


Autoregressive Integrated Moving Average (ARIMA)

Popularly known as the Box-Jenkins methodology


The ARIMA methodology places its emphasis not on constructing single-equation or simultaneous-equation models but on analyzing the probabilistic, or stochastic, properties of economic time series on their own.

Unlike regression models, in which $Y_t$ is explained by k regressors $X_1, X_2, X_3, \ldots, X_k$, BJ-type time series models allow $Y_t$ to be explained by past, or lagged, values of Y itself and by stochastic error terms.

For this reason, ARIMA models are sometimes called atheoretic models: they are not derived from any economic theory, whereas economic theories are often the basis of simultaneous-equation models.

Note that the emphasis in this topic is on univariate ARIMA models, that is, models pertaining to a single time series; they can, however, be extended to multivariate ARIMA models.


Let us work with the GDP time series data for the United States given in the table. A plot of this time series is given in Figure 1 (undifferenced GDP) and Figure 2 (first-differenced GDP).

GDP in level form is nonstationary, but in (first) differenced form it is stationary. If a time series is stationary, we can model it as an ARIMA process in a variety of ways.

An Autoregressive (AR) Process

Let $Y_t$ represent GDP at time t. If we model $Y_t$ as

$(Y_t - \delta) = \alpha_1 (Y_{t-1} - \delta) + u_t$

where $\delta$ is the mean of Y and $u_t$ is an uncorrelated random error term with zero mean and constant variance $\sigma^2$ (i.e., it is white noise), then we say that $Y_t$ follows a first-order autoregressive, or AR(1), stochastic process.


Here the value of Y at time t depends on its value in the previous time period and on a random term; the Y values are expressed as deviations from their mean value. In other words, this model says that the forecast value of Y at time t is simply some proportion $(= \alpha_1)$ of its value at time $(t-1)$ plus a random shock or disturbance at time t, the Y values again being expressed around their mean value.

But in the model

$(Y_t - \delta) = \alpha_1 (Y_{t-1} - \delta) + \alpha_2 (Y_{t-2} - \delta) + u_t$

$Y_t$ follows a second-order autoregressive, or AR(2), process: the value of Y at time t depends on its values in the previous two time periods, the Y values being expressed around their mean value $\delta$.

In general,

$(Y_t - \delta) = \alpha_1 (Y_{t-1} - \delta) + \alpha_2 (Y_{t-2} - \delta) + \cdots + \alpha_p (Y_{t-p} - \delta) + u_t$

in which case $Y_t$ is a pth-order autoregressive, or AR(p), process.
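As a rough illustration (not part of the original slides), the following minimal Python sketch simulates an AR(1) process of the form above with hypothetical values $\delta = 5$, $\alpha_1 = 0.7$ and $\sigma = 1$:

```python
# Simulate (Y_t - delta) = alpha_1 (Y_{t-1} - delta) + u_t with hypothetical parameters.
import numpy as np

rng = np.random.default_rng(0)
delta, alpha1, sigma, n = 5.0, 0.7, 1.0, 200   # hypothetical values, not estimated from data

y = np.empty(n)
y[0] = delta                                   # start the series at its mean
for t in range(1, n):
    u_t = rng.normal(0.0, sigma)               # white-noise shock
    y[t] = delta + alpha1 * (y[t - 1] - delta) + u_t

print(y.mean())   # close to delta, since |alpha_1| < 1 keeps the process stationary
```

An AR(p) simulation works the same way, with p lagged deviations inside the loop.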


A Moving Average (MA) Process

Suppose we model Y as

$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1}$

where $\mu$ is a constant and $u_t$, as before, is the white-noise stochastic error term. Here Y at time t is equal to a constant plus a moving average of the current and past error terms. Thus, in the present case, Y follows a first-order moving average, or MA(1), process.

But if Y follows the expression

$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2}$

then it is an MA(2) process. More generally,

$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2} + \cdots + \beta_q u_{t-q}$

is an MA(q) process. In short, a moving average process is simply a linear combination of white-noise error terms.
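Again as a rough illustration (not from the slides), here is a minimal sketch of an MA(1) process with hypothetical $\mu = 2$, $\beta_0 = 1$ and $\beta_1 = 0.5$:

```python
# Simulate Y_t = mu + beta_0 u_t + beta_1 u_{t-1} with hypothetical parameters.
import numpy as np

rng = np.random.default_rng(1)
mu, beta0, beta1, n = 2.0, 1.0, 0.5, 200   # hypothetical values

u = rng.normal(0.0, 1.0, size=n + 1)       # white-noise error terms u_0 .. u_n
y = mu + beta0 * u[1:] + beta1 * u[:-1]    # linear combination of current and lagged errors

print(y.mean(), y.var())   # mean near mu; variance near beta_0^2 + beta_1^2
```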


An Autoregressive and Moving Average (ARMA) Process

It is quite likely that Y has characteristics of both AR and MA and is therefore ARMA. Thus, $Y_t$ follows an ARMA(1, 1) process if it can be written as

$Y_t = \theta + \alpha_1 Y_{t-1} + \beta_0 u_t + \beta_1 u_{t-1}$

because there is one autoregressive term and one moving average term, $\theta$ representing a constant. In general, an ARMA(p, q) process contains p autoregressive and q moving average terms.

An Autoregressive Integrated Moving Average (ARIMA) Process

Many economic time series are nonstationary, that is, they are integrated.


If a time series is integrated of order 1 [i.e., it is I(1)], its first differences are I(0), that is, stationary. Similarly, if a time series is I(2), its second differences are I(0). In general, if a time series is I(d), after differencing it d times we obtain an I(0) series.

Therefore, if a time series has to be differenced d times to make it stationary and we then apply an ARMA(p, q) model to it, we say the original series is an ARIMA(p, d, q) model, an autoregressive integrated moving average time series model, where p denotes the number of autoregressive terms, d the number of times the series has to be differenced before it becomes stationary, and q the number of moving average terms.

For example, an ARIMA(2, 1, 2) time series has to be differenced once (d = 1) before it becomes stationary, and the (first-differenced) stationary series can be modeled with two AR and two MA terms.
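A quick illustration of the idea of integration, using a simulated series rather than the GDP data from the table: a random walk is I(1), and differencing it once yields an I(0), stationary, series.

```python
# A random walk y_t = y_{t-1} + u_t is I(1); its first difference recovers the white noise u_t.
import numpy as np

rng = np.random.default_rng(2)
u = rng.normal(size=300)
y = np.cumsum(u)           # nonstationary level (I(1))
dy = np.diff(y)            # stationary first difference (I(0)), so d = 1

print(y.var(), dy.var())   # the level's variance grows with the sample; the difference's does not
```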


The important point to note is that to use the Box-Jenkins methodology, we must have either a stationary time series or a time series that becomes stationary after one or more differencings.

The reason for assuming stationarity can be explained as follows. The objective of B-J [Box-Jenkins] is to identify and estimate a statistical model which can be interpreted as having generated the sample data. If this estimated model is then to be used for forecasting, we must assume that the features of this model are constant through time, and particularly over future time periods. Thus the reason for requiring stationary data is that any model inferred from these data can itself be interpreted as stationary or stable, thereby providing a valid basis for forecasting.


THE BOX-JENKINS (BJ) METHODOLOGY

Looking at a time series such as the US GDP series in the figure, how does one know whether it follows a purely AR process (and if so, what is the value of p), a purely MA process (and if so, what is the value of q), an ARMA process (and if so, what are the values of p and q), or an ARIMA process, in which case we must know the values of p, d, and q?

The BJ methodology answers these questions. The method consists of four steps.

Step 1. Identification: Find the appropriate values of p, d, and q using the correlogram, the partial correlogram, and the augmented Dickey-Fuller test.
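A minimal sketch of Step 1 using statsmodels, with a synthetic stand-in series in place of the GDP figures from the table (the functions used are standard statsmodels tools, not something specified in the slides):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# synthetic stand-in for the 88 quarterly GDP levels in the table
rng = np.random.default_rng(3)
gdp = pd.Series(2000 + np.cumsum(rng.normal(8, 10, size=88)))

# augmented Dickey-Fuller test on the levels: a large p-value is consistent with a unit root
adf_stat, pvalue, *_ = adfuller(gdp)
print("ADF p-value (levels):", pvalue)

# correlogram and partial correlogram of the first differences, used to read off candidate p and q
d_gdp = gdp.diff().dropna()
fig, axes = plt.subplots(2, 1)
plot_acf(d_gdp, lags=25, ax=axes[0])
plot_pacf(d_gdp, lags=25, ax=axes[1])
plt.show()
```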


Step 2. Estimation: Having identified the appropriate p and q values, the next stage is to estimate the parameters of the autoregressive and moving average terms included in the model. Sometimes this calculation can be done by simple least squares, but sometimes we will have to resort to nonlinear (in the parameters) estimation methods. Since this task is now routinely handled by several statistical packages, we do not have to worry about the actual mathematics of estimation.
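A minimal sketch of Step 2 with statsmodels, again on a synthetic stand-in series; the orders shown are illustrative only, not the ones chosen for GDP later in these slides:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# synthetic stand-in for the GDP levels (the real data live in the table)
rng = np.random.default_rng(3)
gdp = pd.Series(2000 + np.cumsum(rng.normal(8, 10, size=88)))

# the package estimates the AR and MA coefficients; we only supply the orders (p, d, q)
fitted = ARIMA(gdp, order=(1, 1, 1)).fit()
print(fitted.summary())   # estimated coefficients, standard errors, AIC, etc.
```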

Step 3. Diagnostic checking: Having chosen a particular ARIMA model and estimated its parameters, we next see whether the chosen model fits the data reasonably well, for it is possible that another ARIMA model might do the job as well.


This is why Box-Jenkins ARIMA modeling is more an art than a science; considerable skill is required to choose the right ARIMA model. One simple test of the chosen model is to see whether the residuals estimated from the model are white noise; if they are, we can accept the particular fit; if not, we must start over. Thus, the BJ methodology is an iterative process.

Step 4. Forecasting: One of the reasons for the popularity of ARIMA modeling is its success in forecasting. In many cases, the forecasts obtained by this method are more reliable than those obtained from traditional econometric modeling, particularly for short-term forecasts.

Let us look at these four steps in some detail. Throughout, we will use the GDP data given in the table.


IDENTIFICATION

The chief tools in identification are the autocorrelation function (ACF), the partial autocorrelation function (PACF), and the resulting correlograms, which are simply the plots of the ACF and the PACF against the lag length.

The concept of partial autocorrelation is analogous to the concept of a partial regression coefficient. In the k-variable multiple regression model, the kth regression coefficient $\beta_k$ measures the rate of change in the mean value of the regressand for a unit change in the kth regressor $X_k$, holding the influence of all other regressors constant.


In similar fashion, the partial autocorrelation $\rho_{kk}$ measures the correlation between (time series) observations that are k time periods apart after controlling for correlations at intermediate lags (i.e., lags less than k). In other words, the partial autocorrelation is the correlation between $Y_t$ and $Y_{t-k}$ after removing the effect of the intermediate Y's.

In the figure, we show the correlogram and partial correlogram of the GDP series. From this figure, two facts stand out. First, the ACF declines very slowly, and the ACFs up to 23 lags are individually statistically significantly different from zero, for they all lie outside the 95% confidence bounds. Second, after the first lag, the PACF drops dramatically, and all PACFs after lag 1 are statistically insignificant.
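The sample ACF and PACF can also be computed directly. The sketch below (simulated data, standard statsmodels functions) produces a pattern similar to the one just described for GDP: a slowly declining ACF and a PACF that drops off after lag 1.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(4)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.8 * y[t - 1] + rng.normal()   # a simple AR(1)-type series

print(acf(y, nlags=5))    # declines roughly geometrically
print(pacf(y, nlags=5))   # large spike at lag 1, close to zero afterwards
```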


Since the US GDP time series is not stationary, we have to make it stationary before we can apply the Box-Jenkins methodology. In the next figure we plot the first differences of GDP. Unlike the previous figure, we do not observe any trend in this series, suggesting that the first-differenced GDP time series is stationary. A formal application of the Dickey-Fuller unit root test shows that this is indeed the case.

Now we have a different pattern of ACF and PACF. The ACFs at lags 1, 8, and 12 seem statistically different from zero; the approximate 95% confidence limits for $\rho_k$ are $-0.2089$ and $+0.2089$. At all other lags the ACFs are not statistically different from zero. The same is true of the partial autocorrelations $\rho_{kk}$.
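For reference, the $\pm 0.2089$ band quoted above is just the usual $\pm 1.96/\sqrt{n}$ white-noise bound, assuming n = 88 quarterly observations (1970-I to 1991-IV):

```python
import numpy as np
print(1.96 / np.sqrt(88))   # ~0.2089, the approximate 95% limit for a sample autocorrelation
```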


Now, how does the correlogram given in the figure enable us to find the ARMA pattern of the GDP time series? We consider only the first-differenced GDP series, because it is stationary.

One way of accomplishing this is to consider the ACF and PACF, and the associated correlograms, of a selected number of ARMA processes, such as AR(1), AR(2), MA(1), MA(2), ARMA(1, 1), ARMA(2, 2), and so on. Since each of these stochastic processes exhibits typical patterns of ACF and PACF, if the time series under study fits one of these patterns we can identify the time series with that process. Of course, we will have to apply diagnostic tests to find out whether the chosen ARMA model is reasonably accurate.


What we plan to do is to give general guidelines (see the table below); the references give the details of the various stochastic processes. The ACFs and PACFs of AR(p) and MA(q) processes have opposite patterns: in the AR(p) case the ACF declines geometrically or exponentially but the PACF cuts off after a certain number of lags, whereas the opposite happens to an MA(q) process.

Table: Theoretical Patterns of ACF and PACF

Type of model | Typical pattern of ACF | Typical pattern of PACF
AR(p) | Decays exponentially or with a damped sine-wave pattern, or both | Significant spikes through lag p
MA(q) | Significant spikes through lag q | Declines exponentially
ARMA(p, q) | Exponential decay | Exponential decay
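The theoretical patterns in the table can be generated with statsmodels' ArmaProcess, using hypothetical coefficients (note the sign convention: the AR polynomial is written as $1 - 0.7L$, so the lag coefficient enters with a minus sign):

```python
from statsmodels.tsa.arima_process import ArmaProcess

ar1 = ArmaProcess(ar=[1, -0.7], ma=[1])   # AR(1) with alpha_1 = 0.7 (hypothetical)
ma1 = ArmaProcess(ar=[1], ma=[1, 0.5])    # MA(1) with beta_1 = 0.5 (hypothetical)

print(ar1.acf(lags=6))    # decays exponentially
print(ar1.pacf(lags=6))   # spike at lag 1, then cuts off
print(ma1.acf(lags=6))    # spike at lag 1, then cuts off
print(ma1.pacf(lags=6))   # declines in magnitude rather than cutting off
```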


ARIMA Identification of US GDP:

The correlogram and partial correlogram of the stationary (after first-differencing) US GDP series for 1970-I to 1991-IV are given in the figure.

The autocorrelations decline up to lag 4; then, except at lags 8 and 12, the rest are not statistically different from zero (the solid lines in the figure give the approximate 95% confidence limits). The partial autocorrelations with spikes at lags 1, 8, and 12 seem statistically significant, but the rest are not; if the partial correlation coefficient were significant only at lag 1, we could have identified this as an AR(1) model.

Let us therefore assume that the process that generated the (first-differenced) GDP is at most an AR(12) process. We do not have to include all the AR terms up to lag 12; only the AR terms at lags 1, 8, and 12 are significant.


ESTIMATION OF THE ARIMA MODEL

Let $Y_t^*$ denote the first differences of US GDP. Then our tentatively identified AR model is

$Y_t^* = \delta + \alpha_1 Y_{t-1}^* + \alpha_8 Y_{t-8}^* + \alpha_{12} Y_{t-12}^* + u_t$

Using Eviews, we obtained the following estimates:

$\hat{Y}_t^* = 23.0894 + 0.3428\, Y_{t-1}^* - 0.2994\, Y_{t-8}^* - 0.2644\, Y_{t-12}^*$

t = (7.7547) (3.4695) (-2.9475) (-2.6817)

$R^2 = 0.2931$,  d = 1.7663
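Here is a sketch of how such a subset AR model can be estimated in statsmodels, whose AutoReg accepts an explicit list of lags (here 1, 8, and 12). The series below is a synthetic stand-in; run on the actual GDP differences it would produce estimates comparable to those reported above.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

# synthetic stand-in for the GDP levels from the table
rng = np.random.default_rng(3)
gdp = pd.Series(2000 + np.cumsum(rng.normal(8, 10, size=88)))
d_gdp = gdp.diff().dropna().reset_index(drop=True)   # Y*_t: first differences of GDP

res = AutoReg(d_gdp, lags=[1, 8, 12], trend="c").fit()   # constant plus AR terms at lags 1, 8, 12
print(res.params)    # estimates of delta, alpha_1, alpha_8, alpha_12
print(res.tvalues)   # compare with the t statistics reported above
```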


DIAGNOSTIC CHECKING

How do we know that the above model is a reasonable fit to the data? One simple diagnostic is to obtain the residuals from the model and compute the ACF and PACF of these residuals, say, up to lag 25. The estimated ACF and PACF are shown in the figure.

As the figure shows, none of the autocorrelations or partial autocorrelations is individually statistically significant. Nor is the sum of the 25 squared autocorrelations statistically significant, as shown by the Box-Pierce Q and Ljung-Box LB statistics.

The correlograms of the autocorrelations and partial autocorrelations therefore suggest that the residuals estimated from this model are purely random. Hence there may not be any need to look for another ARIMA model.
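A sketch of this diagnostic step in statsmodels, reusing the same synthetic stand-in and subset AR fit as in the estimation sketch; the Ljung-Box statistic tests whether the first 25 residual autocorrelations are jointly zero.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(3)
gdp = pd.Series(2000 + np.cumsum(rng.normal(8, 10, size=88)))   # synthetic stand-in
d_gdp = gdp.diff().dropna().reset_index(drop=True)
res = AutoReg(d_gdp, lags=[1, 8, 12], trend="c").fit()

resid = res.resid
print(acorr_ljungbox(resid, lags=[25]))   # large p-value: no evidence against white-noise residuals

fig, axes = plt.subplots(2, 1)
plot_acf(resid, lags=25, ax=axes[0])      # residual correlogram
plot_pacf(resid, lags=25, ax=axes[1])     # residual partial correlogram
plt.show()
```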


FORECASTING

Suppose, on the basis of the above model, we want to forecast GDP for the first four quarters of 1992. But in the above model the dependent variable is the change in GDP over the previous quarter. Therefore, if we use the above model, what we obtain are forecasts of the GDP changes: between the first quarter of 1992 and the fourth quarter of 1991, between the second quarter of 1992 and the first quarter of 1992, and so on.

To obtain forecasts of the level of GDP rather than of its changes, we can "undo" the first-difference transformation that we used to obtain the changes. (More technically, we integrate the first-differenced series.)


To obtain the forecast value of the level of GDP (not the change in GDP) for 1992-I, we rewrite the model

$Y_t^* = \delta + \alpha_1 Y_{t-1}^* + \alpha_8 Y_{t-8}^* + \alpha_{12} Y_{t-12}^* + u_t$

as

$Y_{1992,I} - Y_{1991,IV} = \delta + \alpha_1 [Y_{1991,IV} - Y_{1991,III}] + \alpha_8 [Y_{1989,IV} - Y_{1989,III}] + \alpha_{12} [Y_{1988,IV} - Y_{1988,III}] + u_{1992,I}$

That is,

$Y_{1992,I} = \delta + (1 + \alpha_1) Y_{1991,IV} - \alpha_1 Y_{1991,III} + \alpha_8 Y_{1989,IV} - \alpha_8 Y_{1989,III} + \alpha_{12} Y_{1988,IV} - \alpha_{12} Y_{1988,III} + u_{1992,I}$

The values of $\delta$, $\alpha_1$, $\alpha_8$, and $\alpha_{12}$ are already known from the estimated regression, and the value of $u_{1992,I}$ is assumed to be zero. Therefore, we can easily obtain the forecast value of $Y_{1992,I}$.
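A sketch of this forecasting step, again with the synthetic stand-in and the subset AR fit from the earlier sketches: the model forecasts changes in GDP, so level forecasts are obtained by adding the cumulated forecast changes to the last observed level (i.e., by integrating the differenced series).

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(3)
gdp = pd.Series(2000 + np.cumsum(rng.normal(8, 10, size=88)))   # synthetic stand-in for GDP levels
d_gdp = gdp.diff().dropna().reset_index(drop=True)
res = AutoReg(d_gdp, lags=[1, 8, 12], trend="c").fit()

# forecast the next four *changes* in GDP (out-of-sample predictions of Y*_t)
d_forecast = res.predict(start=len(d_gdp), end=len(d_gdp) + 3)

# "undo" the first difference: cumulate the forecast changes from the last observed level
level_forecast = gdp.iloc[-1] + np.cumsum(np.asarray(d_forecast))
print(level_forecast)   # forecast GDP levels for the next four quarters
```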
