INTERVENTION BASED ARIMA TIME-SERIES MODELS


INTERVENTION BASED ARIMA TIME-SERIES MODELS

Ramasubramanian V.

Indian Agricultural Statistics Research Institute

Library Avenue, New Delhi – 110012

[email protected]

1. Time series models

Time series (TS) data refers to observations on a variable that occur in a time sequence.

Mostly these observations are collected at equally spaced, discrete time intervals. The TS

movements of such chronological data can be resolved or decomposed into discernible

components as trend, periodic (say, seasonal), cyclical and irregular variations. One or

two of these components may overshadow the others in some series. A basic assumption

in any TS analysis/modeling is that some aspects of the past pattern will continue to

remain in the future. Here it is tacitly assumed that information about the past is

available in the form of numerical data. Ideally, at least 50 observations are necessary for

performing TS analysis/modeling, as propounded by Box and Jenkins, who were pioneers

in TS modeling.

TS models have advantages over other statistical models in certain situations. They can

be used more easily for forecasting purposes because historical sequences of observations

upon study variables are readily available from published secondary sources. These

successive observations are statistically dependent and TS modeling is concerned with

techniques for the analysis of such dependencies. Thus in TS modeling, the prediction of

values for the future periods is based on the pattern of past values of the variable under

study, but not generally on explanatory variables which may affect the system. There are

two main reasons for resorting to such TS models. First, the system may not be

understood, and even if it is understood it may be extremely difficult to measure the

cause and effect relationship; second, the main concern may be only to predict what will

happen and not to know why it happens. Many a time, collection of information on causal

factors (explanatory variables) affecting the study variable(s) may be cumbersome or impossible, and hence availability of long series data on explanatory variables is a problem. In such situations, TS models are a boon for forecasters. When TS models are put to use, say for forecasting purposes, they are especially applicable only in the 'short term'.

Decomposition models are among the oldest approaches to TS analysis, albeit with a number of theoretical weaknesses from a statistical point of view. These were followed by the

crudest form of forecasting methods called the moving averages method. As an

improvement over this method which had equal weights, exponential smoothing methods

came into being, giving more weight to recent data. Exponential smoothing methods were initially proposed as just recursive methods without any distributional

assumptions about the error structure in them, and later, they were found to be particular

cases of the statistically sound AutoRegressive Integrated Moving Average (ARIMA)

models.


A detailed discussion regarding various TS components has been done by Croxton et al.

(1979). A good account on exponential smoothing methods is given in Makridakis et al.

(1998). A practical treatment on ARIMA modeling along with several case studies can

be found in Pankratz (1983). A reference book on ARIMA and related topics such as

intervention models with a more rigorous theoretical flavour is by Box et al. (1994).

2. Time series components

An important step in analysing TS data is to consider the types of data patterns, so that

the models most appropriate to those patterns can be utilized. Four types of TS

components can be distinguished.

They are

(i) Horizontal when data values fluctuate around a constant value

(ii) Trend when there is long term increase or decrease in the data

(iii) Seasonal when a series is influenced by seasonal factors and recurs on a regular

periodic basis

(iv) Cyclical when the data exhibit rises and falls that are not of a fixed period

Note that many data series include combinations of the preceding patterns. After

separating out the existing patterns in any TS data, the pattern that remains unidentifiable forms the 'random' or 'error' component. Time plot (data plotted over time) and seasonal

plot (data plotted against individual seasons in which the data were observed) help in

visualizing these patterns while exploring the data.

Trend analysis of TS data is usually done to analyse a variable over time to detect or

investigate long-term changes. Trend is the 'long-term' behaviour of a TS process, usually in relation to the mean level. The trend of a TS may be studied because the interest lies in the trend itself, or in order to eliminate the trend statistically so as to have insight into

other components such as periodic variations in the series. A periodic movement is one

which recurs with some degree of regularity, within a definite period. The most

frequently studied periodic movement is that which occurs within a year and which is

known as seasonal variation. Sometimes the TS data are de-seasonalized for the purpose

of making the other movements (particularly trend) more readily discernible. Climatic conditions directly affect the agricultural production system and, in turn, the patterns of prices, and thus are primarily responsible for most of the seasonal variations exhibited in such series.

Over a period of time, a TS is very likely to show a tendency to increase or to decrease

otherwise termed an upward or downward trend respectively. One should not lose sight of the underlying factors that sometimes may cause such trends, like growth in population, price changes etc. Technological developments and adoption patterns have


been affecting agriculture so as to increase output enormously. Not always keeping pace

with them, but induced by them, have been changes in the main variable (read here price

of agricultural commodities) under study. Not all historical series show upward trends.

Some, say, plant disease incidence, exhibit a generally downward trend. This particular

declining trend is attributable to better and more widely available advisory and extension services, or to good government policies. An economic series may have a downward

trend because a better or cheaper substitute may be available.

Many techniques such as time plots, auto-correlation functions, box plots and scatter

plots abound for suggesting relationships with possibly influential factors. For long and

erratic series, time plots may not be helpful. Alternatives could be to go for smoothing or

averaging methods like moving averages, exponential smoothing methods etc. In fact, if

the data contains considerable error, then the first step in the process of trend

identification is smoothing.

3. Moving average methods

Before discussing any decomposition method, the method of moving averages is presented, as it will be used in the decomposition itself.

(i) Simple moving averages

A moving average (MA) is simply a numerical average of the last N data points. There

are prior MA, centered MA etc. in the TS literature. In general, the moving average at

time t, taken over N periods, is given by

Mt[1] = (Yt + Yt-1 + ... + Yt-N+1) / N

where Yt is the observed response at time t. Another way of stating the above equation is

Mt[1] = Mt-1[1] + (Yt - Yt-N) / N

At each successive time period the most recent observation is included and the farthest observation is excluded for computing the average. Hence the name 'moving' averages.

(ii) Double moving averages

The simple moving average is intended for data with a constant level and no trend. If the data have a linear or quadratic trend, the simple moving average will be misleading.

In order to correct for the bias and develop an improved forecasting equation, the double

moving average can be calculated. To calculate this, simply treat the moving averages

Mt[1]

over time as individual data points and obtain a moving average of these averages.
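As a minimal sketch, both averages can be computed with the pandas Python library; the values below are the first three years of the quarterly sales series tabulated in (iv) further on.

```python
import pandas as pd

# First three years of the quarterly sales series from section (iv)
y = pd.Series([9, 50, 1, 7, 38, 68, 11, 43, 7, 41, 13, 12])

N = 3  # span of the moving average

# Simple moving average M_t[1]: average of the last N data points
m1 = y.rolling(window=N).mean()

# Double moving average M_t[2]: a moving average of the moving averages,
# used to correct the bias of M_t[1] when the data have a trend
m2 = m1.rolling(window=N).mean()

print(pd.DataFrame({"Y": y, "MA": m1, "double MA": m2}))
```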

(iii) Centered moving averages



In order to ensure that the average is centered at the middle of the data values being averaged, the order (also called span or term) k of the moving average (MA) is in principle required to be an odd number. This issue of not being able to use a general span (read: an even number) can

be overcome by taking an additional 2-period moving average of the k-period moving

average denoted by 2 x k MA. The centered MA can be thought of as a weighted MA,

where the weights of the first and last terms will be different from those of the other

terms. Thus, a 2 x k MA smoother is equivalent to a weighted MA of order k+1 with

weights 1/k for all observations except for the first and last observations in the average,

which have weights 1/2k. For instance, if a 2 x 4 MA i.e. a centered 4 MA was used with

quarterly data each quarter would be given equal weight. The ends of the moving

average will apply to the same quarter in consecutive years. So each quarter receives the

same weight with the weight for the quarter at the ends of the moving average split

between the two years. It is this property that makes 2 x 4 MA very useful for estimating

a trend-cycle in the presence of quarterly seasonality. A slightly longer or a slightly shorter MA will still retain some seasonal variation. An alternative to a 2 x 4 MA for quarterly data is a 2 x 8 MA or a 2 x 12 MA, which will also give equal weights to all quarters and produce a smoother fit than the 2 x 4 MA. Other MAs tend to be contaminated by the seasonal variation, although many times this does not seem to be particularly serious.

The following is useful in discussing weighted and centered MAs. Let Y1, Y2, Y3, Y4, Y5, ..., YN be the given TS data for N time periods. To calculate a 4-term MA for the data, the trend-cycle at time period 3 could be calculated as either (Y1 + Y2 + Y3 + Y4)/4 = T2.5 (say) or (Y2 + Y3 + Y4 + Y5)/4 = T3.5 (say). That is, for period 3, whether to include two terms on the left and two terms on the right, or the other way around, is a problem. The centre of the first MA is at 2.5 (half a period early) while the centre of the second MA is at 3.5 (half a period late). However, the average of the two MAs is centered at 3, just where it should be. This can be done by using a centered MA, i.e. T3 = (T2.5 + T3.5)/2 = (Y1 + 2Y2 + 2Y3 + 2Y4 + Y5)/8. Here note that the first and last terms have weights of 1/8 and all other terms have weights of double that value, i.e. 1/4. Therefore, a 2 x 4 MA smoother is equivalent to a weighted MA of order 5.

(iv) Caveat

To illustrate a cautionary note, we will make use of a simple dataset: a time series of sales figures for some product over five years. The data are quarterly, so the series contains 20 data points overall.


Year Quarter Sales (1000's)

1989 Q1 9

1989 Q2 50

1989 Q3 1

1989 Q4 7

1990 Q1 38

1990 Q2 68

1990 Q3 11

1990 Q4 43

1991 Q1 7

1991 Q2 41

1991 Q3 13

1991 Q4 12

1992 Q1 50

1992 Q2 36

1992 Q3 16

1992 Q4 35

1993 Q1 41

1993 Q2 64

1993 Q3 1

1993 Q4 53

Smoothed data takes out random error from so-called "noisy" data, that is, data with lots of

noise. The analogy is the signal to noise ratio in radio broadcasting, for instance. A

noisy channel is one with so much static (the noise) that it is hard to hear the program

(the signal). Likewise, a time series with lots of noise makes it hard to see a pattern,

which might be a trend.

A simple example would be a linear series with little or no noise (Figure 6). You see that

the Moving Averages always lag behind the actual data. So substituting the Moving Average at Data Point #7 for the actual data at #7 would give a number that is too low. If the trend were decreasing, the Moving Averages would usually be larger than the

actual data. The reason for the difference is that Moving Averages and Exponential

Smoothing only use data from the past, dragging down (or up) the real numbers with past

values which, in the case of a strong trend, are always higher or lower than the actual data

point.


We can solve this problem by "centering" the Moving Average on the data point that we

are substituting for. Rather than averaging the first three values to get the value for Data

Point #3, we use that same number for Data Point #2, the center of the first three

numbers. That basically shifts the Moving Average graph up, right on top of the actual

data (Figure 7). If there were more noise in that data, it would smooth it without

destroying the trend.

Notice, however, that there is a cost. Moving the data up loses the last value of the

Moving Average (#10) since there is no number after it to include in the Average. The

problem would be worse for MA5 and MA7 which would give up the last two or three

values respectively. That could be a problem because some forecasting methods use that

last data point as the jumping off point of the forecast.

The other way of handling the problem is simply not to smooth curves that have little or

no random error in them. And that makes sense. We do not remove seasonality for a

series that has little or none. So likewise, do not try to remove noise when trend is more

than the random error. It just messes up what's left.


4. Classical time series decomposition methods

Decomposition methods are among the oldest approaches to TS analysis. A crude yet

practical way of decomposing the original data (including the cyclical pattern in the trend

itself) is to go for a seasonal decomposition either by assuming an additive or

multiplicative model such as

Yt = Tt + St + Et or Yt = Tt . St . Et

where Yt - Original TS data

Tt - Trend component

St – Seasonal component

Et – Error/ Irregular component

If the seasonal variation of a TS increases as the level of the series increases, then one has to go for a multiplicative model, else an additive model. Alternatively, an additive

decomposition of the logged values of the TS can also be done. The decomposition

methods may enable one to study the TS components separately or will allow workers to

de-trend or to do seasonal adjustments if needed for further analysis. The steps involved

in above two classical decomposition methods will be discussed in detail and other

methods and developments will be stated in brief subsequently.

Earlier workers observed that when an underlying pattern exists in a data series, that

pattern can be distinguished from randomness by smoothing (averaging) past values.

This concept was exploited in the methods such as moving averages, exponential

smoothing methods to eliminate randomness so the pattern can be projected into the

future and used as forecast models. Decomposition methods usually try to identify two

separate components i.e. trend-cycle and seasonality of the basic underlying pattern that

tend to characterize economic and business series. The seasonal factor relates to periodic fluctuations of constant length that are caused by such things as weather factors, month of

the year, timing of holidays and corporate policies. Thus the given TS data is divided as

pattern plus error.

The several approaches that exist for decomposing TS data, albeit with a theoretical weakness from a statistical point of view, are based on one common premise: the separation is empirical and consists of first removing the trend-cycle, then isolating the

seasonal component. Another key step in all decomposition methods involves smoothing

the original data. Any residual is taken as the 'irregular' or 'remainder' component,

identified as the difference between the combined effect of the two subpatterns of the

series and the actual data.

(i) Additive decomposition

Consider a TS which is additive in nature with seasonal period 12 such that, preserving

the earlier mentioned notations, Yt = Tt + St + Et .


The steps involved are

1. The trend cycle Tt is computed using a centered MA i.e. a 2 x 12 MA.

2. The de-trended series is computed by subtracting the trend-cycle component from the

data, leaving the seasonal and irregular terms. That is, Yt - Tt = St + Et.

3. For calculating the seasonal component, assuming that the same is constant from year

to year, gather all the de-trended values for a given month and take the average. The

seasonal component St is obtained by stringing together this set of 12 values, called seasonal indices (one for each month), repeating the same for each year of data.

4. Finally, the irregular series Et is computed by simply subtracting the estimated seasonality and trend-cycle from the original data series.

(ii) Multiplicative decomposition

Consider a TS with periodicity 12 (i.e. monthly data) which is multiplicative in nature, in the sense that the seasonal variation increases with the level of the series, i.e. Yt = Tt * St * Et.

The steps involved are

1. The trend cycle Tt is computed using a centered MA i.e. a 2 x 12 MA. There will be

some values missing both at the beginning and the end because of the averaging

procedure used.

2. The de-trended series is computed as the ratio of actual to moving averages. Thus the resultant values Rt = (Yt / Tt) = St * Et isolate the two additional components of the TS.

3. For calculating the seasonal component, assuming that the same is constant from year

to year, gather all the de-trended values for a given month and take the average. The

seasonal component St is obtained by stringing together this set of 12 values, called seasonal indices (one for each month, given in bold in col. 5 of the illustration below), repeating the same for each year of data.

4. Finally, the irregular series Et is computed simply as the ratio of the original data to the estimated seasonality and trend-cycle components, i.e. Et = Yt / (St * Tt). Thus, apart from de-trending, the data has also been de-seasonalised.
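These steps are implemented ready-made in the statsmodels Python library; the following minimal sketch applies them to the first two years of the airline passenger data shown in the illustration below. The seasonal_decompose helper follows essentially the steps above (its seasonal indices are additionally normalized, so small differences from a hand calculation are possible).

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# First two years of the international airline passenger series (see (a) below)
y = pd.Series(
    [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
     115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140],
    index=pd.period_range("1949-01", periods=24, freq="M").to_timestamp(),
)

# period=12 makes the trend-cycle a centered 2 x 12 MA, as in step 1
result = seasonal_decompose(y, model="multiplicative", period=12)

print(result.trend)     # Tt (missing at both ends, step 1)
print(result.seasonal)  # St, the 12 seasonal indices repeated (step 3)
print(result.resid)     # Et = Yt / (St * Tt) (step 4)
```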


Illustration for Multiplicative decomposition using MSExcel

(a) Data set (International Airline passenger totals in thousands; Box et al., 1994)

1 2 3 4 5 6 7 8 9 10 11 12

1949 112 118 132 129 121 135 148 148 136 119 104 118

1950 115 126 141 135 125 149 170 170 158 133 114 140

1951 145 150 178 163 172 178 199 199 184 162 146 166

1952 171 180 193 181 183 218 230 242 209 191 172 194

1953 196 196 236 235 229 243 264 272 237 211 180 201

1954 204 188 235 227 234 264 302 293 259 229 203 229

1955 242 233 267 269 270 315 364 347 312 274 237 278

1956 284 277 317 313 318 374 413 405 355 306 271 306

(b) Graph: time plot of the monthly series (figure not reproduced).

(c) Multiplicative decomposition calculations

year month | col.1: Yt | col.2: 12 MA | col.3: Tt = 2x12 MA | col.4: detrended | col.5: St | col.6: Et | col.7: Tt*St*Et

2000 1 112 -
2 118 -
3 132 -
4 129 -
5 121 -
6 135 -
7 148 126.67 126.79 116.73 119.97 0.010 148.00


8 148 126.92 127.25 116.31 118.80 0.010 148.00

9 136 127.58 127.96 106.28 105.64 0.010 136.00

10 119 128.33 128.58 92.55 92.01 0.010 119.00

11 104 128.83 129.00 80.62 79.80 0.010 104.00

12 118 129.17 129.75 90.94 90.53 0.010 118.00

2001 1 115 130.33 131.25 87.62 91.09 0.010 115.00

2 126 132.17 133.08 94.68 90.30 0.010 126.00

3 141 134.00 134.92 104.51 103.22 0.010 141.00

4 135 135.83 136.42 98.96 98.63 0.010 135.00

5 125 137.00 137.42 90.96 97.88 0.009 125.00

6 149 137.83 138.75 107.39 109.89 0.010 149.00

7 170 139.67 140.92 120.64 119.97 0.010 170.00

--- --- --- --- --- --- --- ---

5. Simple Exponential smoothing

Depending upon whether the data are horizontal, have a trend, or also have seasonality, the following methods are employed for forecasting purposes.

For non-seasonal time series data in which there is no trend, simple exponential

smoothing (SES) method can be employed. Let the observed time series up to time

period t upon a variable Y be denoted by y1, y2, …, yt. Suppose the forecast, say, yt (1), of

the next value yt+1 of the series (which is yet to be observed) is to be found out. Given

data up to time period t-1, the forecast for the next time t is denoted by yt-1 (1). When the

observation yt becomes available, the forecast error is found to be yt− yt-1(1). Thereafter

method of simple or single exponential smoothing takes the forecast for the previous

period and adjusts it using the forecast error to find the forecast for the next period, i.e.

Level: yt (1)= yt -1 (1) + α [yt− yt-1 (1)]

Forecast: yt (h) = yt (1); h=2,3,…

where α is the smoothing constant (weight) which lies between 0 and 1. Here yt(h) is the forecast for h periods ahead, i.e. the forecast of yt+h for some subsequent time period t+h based on all the data up to time period t. It is emphasized here that this model is a recursive one and is mathematical rather than statistical in nature. Note that the choice of α should be based on the criterion of yielding the smallest Mean Squared Error (MSE) value. Also, to start the iterative method, a value of the forecast for the initial period has to be supplied by the user. However, as the weight attached to this value is small, its effect on the ultimate forecast is negligible. Sometimes, though, the choice of α has considerable impact on the forecast. A large value of α (say 0.9) gives very little smoothing in the forecast, whereas a small value of α (say 0.1) gives considerable smoothing. Alternatively, one can choose α from a grid of values (say α = 0.1, 0.2, ..., 0.9) and choose the value that yields the smallest MSE. If you expand the above model recursively, then yt(1) will come out to be a function of α, past yt values and the initial


forecast. So, having known values of α and past values of yt, the point of concern relates to finding the value of this initial forecast. One method of initialization is to use the first observed value Y1 as the first forecast and then proceed. Another possibility would be to average the first four or five values in the data set and use this as the initial forecast. However, because the weight attached to this user-defined value is minimal, its effect on the ultimate forecast is negligible.
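The grid search over α is easy to carry out directly; the following is a minimal Python sketch of SES, initialized with the first observed value and applied to the sugar price data of Table 1 below.

```python
import numpy as np

def ses_forecasts(y, alpha):
    """One-step-ahead SES forecasts; the first forecast is set to y[0]."""
    f = np.empty(len(y))
    f[0] = y[0]  # initialization: first observed value as first forecast
    for t in range(1, len(y)):
        # level update: previous forecast adjusted by alpha times its error
        f[t] = f[t - 1] + alpha * (y[t - 1] - f[t - 1])
    return f

# Sugar price series from Table 1 (t = 1..11)
y = np.array([1350, 1350, 1320, 1350, 1300, 1250, 1300, 1400, 1325, 1325, 1360])

# choose alpha from a grid by the smallest in-sample MSE
for alpha in np.arange(0.1, 1.0, 0.1):
    f = ses_forecasts(y, alpha)
    mse = np.mean((y - f) ** 2)
    print(f"alpha={alpha:.1f}  MSE={mse:.2f}")
```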

The illustration for SES is given below using MSExcel:

Table 1: Illustration for Single Exponential Smoothing model (SES) - MSExcel

Monthly wholesale prices (in rupees per quintal) - U.P. - Hapur - Sugar variety C-29 (12-month period, Jan 1996 - Dec 1996)

Forecasts with varying values of alpha

T yt 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

1 1350 1350.00 1350.00 1350.00 1350.00 1350.00 1350.00 1350.00 1350.00 1350.00

2 1350 1350.00 1350.00 1350.00 1350.00 1350.00 1350.00 1350.00 1350.00 1350.00

3 1320 1350.00 1350.00 1350.00 1350.00 1350.00 1350.00 1350.00 1350.00 1350.00

4 1350 1347.00 1344.00 1341.00 1338.00 1335.00 1332.00 1329.00 1326.00 1323.00

5 1300 1347.30 1345.20 1343.70 1342.80 1342.50 1342.80 1343.70 1345.20 1347.30

6 1250 1342.57 1336.16 1330.59 1325.68 1321.25 1317.12 1313.11 1309.04 1304.73

7 1300 1333.31 1318.93 1306.41 1295.41 1285.63 1276.85 1268.93 1261.81 1255.47

8 1400 1329.98 1315.14 1304.49 1297.24 1292.81 1290.74 1290.68 1292.36 1295.55

9 1325 1336.98 1332.11 1333.14 1338.35 1346.41 1356.30 1367.20 1378.47 1389.55

10 1325 1335.79 1330.69 1330.70 1333.01 1335.70 1337.52 1337.66 1335.69 1331.46

11 1360 1334.71 1329.55 1328.99 1329.80 1330.35 1330.01 1328.80 1327.14 1325.65

12 ? 1337.24 1335.64 1338.29 1341.88 1345.18 1348.00 1350.64 1353.43 1356.56

Y12= 1325

MSE calc.

0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0

900 900 900 900 900 900 900 900 900

9 36 81 144 225 324 441 576 729

2237.29 2043.04 1909.69 1831.84 1806.25 1831.84 1909.69 2043.04 2237.29

8569.20 7423.55 6494.75 5727.46 5076.56 4505.09 3982.87 3485.72 2995.37

1109.76 358.27 41.13 21.09 206.64 536.02 965.16 1458.63 1982.65

4902.56 7200.81 9122.33 10558.63 11489.16 11937.92 11950.88 11586.03 10910.37

143.60 50.61 66.30 178.14 458.23 979.42 1781.18 2859.29 4167.31

116.32 32.39 32.49 64.13 114.56 156.71 160.31 114.37 41.67

639.75 927.03 961.63 911.75 879.03 899.56 973.54 1079.85 1180.23

MSE: 1693 1725 1783 1849 1923 2006 2097 2191 2286


6. Forecasting using simple decomposition methods

In the class, we will discuss how time series decomposition can be used for forecasting values for future time periods by means of MSExcel, as given below:

Year Quarter | t | Data | Seasonal average | Deseasonalized data | Smoothed data (weight 0.8) | Trend (Y = 39.3 + 4.6X) | Deviation (smoothed data - trend)

1989 Q1 0 9 29 31 31 39 -8.6

1989 Q2 1 50 52 96 44 44 -0.1

1989 Q3 2 1 8 10 37 49 -11.6

1989 Q4 3 7 30 22 34 53 -19.1

1990 Q1 4 38 29 131 53 58 -4.4

1990 Q2 5 68 52 132 69 62 6.7

1990 Q3 6 11 8 134 82 67 15.1

1990 Q4 7 43 30 142 94 72 22.5

1991 Q1 8 7 29 26 80 76 4.2

1991 Q2 9 41 52 79 80 81 -0.6

1991 Q3 10 13 8 153 95 85 9.4

1991 Q4 11 12 30 41 84 90 -6.1

1992 Q1 12 50 29 173 102 95 7.0

1992 Q2 13 36 52 70 95 99 -4.0

1992 Q3 14 16 8 188 114 104 9.8

1992 Q4 15 35 30 118 115 109 6.0

1993 Q1 16 41 29 140 120 113 6.5

1993 Q2 17 64 52 123 120 118 2.6

1993 Q3 18 1 8 15 99 122 -23.1

1993 Q4 19 53 30 177 115 127 -12.1

1994 Q1 20 36 29 124 124 132 -8.0

1994 Q2 21 69 52 132 132 136 -4.0

1994 Q3 22 11 8 141 141 141 0.0

1994 Q4 23 44 30 145 145 145 0.0

Notes for the 1994 forecast rows: Data = smoothed data x seasonal average/100; deseasonalized data = sum of trend and deviation; smoothed data = no change; trend = trend equation extended to the new values; deviation = average of the last 6 points.


7. Other Exponential smoothing methods

If the data have a trend, or a trend together with seasonality, the following methods are employed for forecasting purposes.

(i) Double exponential smoothing (Holt) (For illustration see Table 2)

The above SES method was extended to linear exponential smoothing to allow for forecasting non-seasonal time series data with trends. Holt's linear exponential smoothing has two smoothing equations to deal with, one for the level and one for the trend. The forecast is found using two smoothing constants, α and β (with values between 0 and 1), and three equations:

Level: lt = α yt +(1− α)(lt−1 + bt−1),

Trend: bt = β(lt − lt−1)+(1− β)bt−1,

Forecast: yt(h) = lt + bt h.

Here lt denotes the level of the series at time t and bt denotes the trend (additive) of the

series at time t. The optimal combination of smoothing parameters α and β should be chosen by minimizing the MSE over the observations of the model data set.
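Holt's method is available ready-made in the statsmodels Python library; the minimal sketch below applies it to the first few values of the sugar price series of Table 2, with the smoothing constants fixed at the values used there rather than optimized. The initialization differs from the table's regression-based choice, so the numbers need not match exactly.

```python
import numpy as np
from statsmodels.tsa.holtwinters import Holt

# First ten values of the sugar price series from Table 2
y = np.array([1275, 1315, 1325, 1400, 1420, 1400, 1400, 1430, 1450, 1450],
             dtype=float)

# Fix alpha = 0.1 and beta = 0.2 instead of letting fit() optimize them
model = Holt(y).fit(smoothing_level=0.1, smoothing_trend=0.2, optimized=False)

print(model.forecast(6))  # yt(h) = lt + bt*h for h = 1..6
```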

(ii) Triple exponential smoothing (Winters) (For illustration see Tables 3 (a) & (b))

If the data have no trend or seasonal patterns, then SES is appropriate. If the data exhibit a linear trend, Holt's method is appropriate. But if the data are seasonal, these methods, on their own, cannot handle the problem well. Holt's method was later extended to capture seasonality directly. The Winters' method is based on three smoothing equations: one for the level, one for the trend, and one for seasonality. It is similar to Holt's method, with one additional equation to deal with seasonality. In fact there are two different Winters' methods, depending on whether seasonality is modelled in an additive or multiplicative way.

The basic equations for Winters' multiplicative method are as follows.

Level: lt = α (yt / st−m) + (1 − α)(lt−1 + bt−1)

Trend: bt = β (lt − lt−1) + (1 − β) bt−1

Seasonal: st = γ yt / (lt−1 + bt−1) + (1 − γ) st−m

Forecast: yt(h) = (lt + bt h) st−m+h

where m is the length of seasonality (e.g., the number of months or 'seasons' in a year),

lt represents the level of the series, bt denotes the trend of the series at time t, st is the

seasonal component, and yt(h) is the forecast for h periods ahead. As with all exponential

smoothing methods, we need initial values of the components and parameter values.


The basic equations for Winters' additive method are as follows.

Level: lt = α (yt − st−m) + (1 − α)(lt−1 + bt−1)

Trend: bt = β (lt − lt−1) + (1 − β) bt−1

Seasonal: st = γ (yt − lt−1 − bt−1) + (1 − γ) st−m

Forecast: yt(h) = (lt + bt h) + st−m+h
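A minimal sketch of the multiplicative version in Python, again via the statsmodels library, using the quarterly exports data of Table 3(a) and its smoothing constants α = 0.1, β = 0.1, γ = 0.2; since the initial level, trend and seasonal values are chosen differently here, the output will not reproduce the table exactly.

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Quarterly exports data from Table 3(a): six years of four quarters
y = np.array([362, 385, 432, 341, 382, 409, 498, 387,
              473, 513, 582, 474, 544, 582, 681, 557,
              628, 707, 773, 592, 627, 725, 854, 661], dtype=float)

model = ExponentialSmoothing(
    y, trend="add", seasonal="mul", seasonal_periods=4
).fit(smoothing_level=0.1, smoothing_trend=0.1, smoothing_seasonal=0.2)

print(model.forecast(4))  # forecasts for the next four quarters
```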

8. ARIMA models

8.1 Stationarity of a TS process

A TS is said to be stationary if its underlying generating process is based on a constant

mean and constant variance with its autocorrelation function (ACF) essentially constant

through time. Thus, if we consider different subsets of a realization (a TS 'sample'), the

different subsets will typically have means, variances and autocorrelation functions that

do not differ significantly.

The most widely used statistical test for stationarity is the Dickey-Fuller test. To carry out the test, estimate by OLS the regression model

y't = φ yt−1 + b1 y't−1 + ... + bp y't−p

where y't denotes the differenced series (yt − yt−1). The number of lagged terms in the regression, p, is usually set to about 3. Then if φ is nearly zero the original series yt needs differencing, and if φ < 0 then yt is already stationary.
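In practice the test is available off the shelf; a minimal sketch with the statsmodels Python library, applied to a hypothetical random-walk series, is:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))  # hypothetical random-walk series

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y)
print(f"ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}")
# A large p-value means non-stationarity cannot be rejected,
# so the series should be differenced before ARIMA fitting.
```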

8.2 Autocorrelation functions

(i) Autocorrelation

Autocorrelation refers to the way the observations in a TS are related to each other and is measured by the simple correlation between the current observation (Yt) and the observation p periods before the current one (Yt−p). That is, for a given series Yt, the autocorrelation at lag p is the correlation between the pair (Yt, Yt−p) and is given by

rp = Σt (Yt − Ȳ)(Yt+p − Ȳ) / Σt (Yt − Ȳ)²

where the numerator sum runs over t = 1 to n−p and the denominator sum over t = 1 to n. It ranges from −1 to +1. Box and Jenkins have suggested that the maximum number of useful lags p for rp is roughly N/4, where N is the number of periods upon which information on yt is available.

(ii) Partial autocorrelation



Partial autocorrelations are used to measure the degree of association between yt and yt−p when the y-effects at the intervening time lags 1, 2, 3, ..., p−1 are removed.

(iii) Autocorrelation function (ACF) and partial autocorrelation function (PACF)

Theoretical ACFs and PACFs (Autocorrelations versus lags) are available for the various

models chosen (say, see Pankratz, 1983) for various values of orders of autoregressive

and moving average components i.e. p and q. Thus compare the correlograms (plot of

sample ACFs versus lags) obtained from the given TS data with these theoretical

ACF/PACFs, to find a reasonably good match and tentatively select one or more ARIMA

models. The general characteristics of theoretical ACFs and PACFs are as follows:-

(here a 'spike' represents the line at a given lag in the plot, with length equal to the magnitude of the autocorrelation)

Model | ACF | PACF
AR | Spikes decay towards zero | Spikes cut off to zero
MA | Spikes cut off to zero | Spikes decay to zero
ARMA | Spikes decay to zero | Spikes decay to zero
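Sample ACFs and PACFs for such comparisons can be computed, for instance, with the statsmodels Python library; the AR(1) series below is hypothetical, and its sample ACF should decay while its sample PACF cuts off after lag 1.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(1)
# Hypothetical AR(1) series: ACF should decay, PACF should cut off after lag 1
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + rng.normal()

print("sample ACF :", np.round(acf(y, nlags=8), 2))
print("sample PACF:", np.round(pacf(y, nlags=8), 2))
```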

8.3 Description of ARIMA representation

(i) ARIMA modeling (For illustration see Table 4)

In general, an ARIMA model is characterized by the notation ARIMA (p,d,q), where p, d and q denote the orders of auto-regression, integration (differencing) and moving average respectively. In ARIMA parlance, the TS is a linear function of past actual values and random shocks. For instance, given a TS process {yt}, a first order auto-regressive process is denoted by ARIMA (1,0,0) or simply AR(1) and is given by

yt = μ + φ1 yt−1 + εt

and a first order moving average process is denoted by ARIMA (0,0,1) or simply MA(1) and is given by

yt = μ − θ1 εt−1 + εt

Alternatively, the model ultimately derived may be a mixture of these processes and of higher orders as well. Thus a stationary ARMA (p, q) process is defined by the equation

yt = φ1 yt−1 + φ2 yt−2 + ... + φp yt−p − θ1 εt−1 − θ2 εt−2 − ... − θq εt−q + εt

where the εt's are independently and normally distributed with zero mean and constant variance σ² for t = 1, 2, ..., n. Note here that the values of p and q, in practice, lie between 0 and 3. (The degree of differencing of the main variable yt will be discussed in section 8.4 (i).)
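Fitting such a model can be sketched in Python with the statsmodels library; the sketch below fits an ARIMA(2,1,0) to the cotton yield data of Table 4. The exact specification behind the SPSS output in Table 4 is an assumption here; only the two non-seasonal AR lags are taken from it, and the fitted numbers need not match.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Cotton yield series (kg/ha) from Table 4, 1970-71 to 2003-04
y = np.array([31, 69, 75, 77, 112, 57, 67, 93, 89, 111, 81, 92, 103, 52,
              93, 123, 56, 100, 89, 143, 117, 72, 124, 180, 154, 155,
              173, 95, 139, 162, 100, 147, 158, 189], dtype=float)

# ARIMA(2,1,0): two AR terms on the first-differenced series
res = ARIMA(y, order=(2, 1, 0)).fit()
print(res.summary())          # parameter estimates, standard errors, AIC
print(res.forecast(steps=3))  # forecasts for the next three years
```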

(ii) Seasonal ARIMA modeling

Identification of relevant models and inclusion of suitable seasonal variables are

necessary for seasonal models. The Seasonal ARIMA i.e. ARIMA (p,d,q) (P,D,Q)s

model is defined by

φp(B) ΦP(B^s) ∇^d ∇s^D yt = ΘQ(B^s) θq(B) εt

where

φp(B) = 1 − φ1B − ... − φpB^p ,  θq(B) = 1 − θ1B − ... − θqB^q

ΦP(B^s) = 1 − Φ1B^s − ... − ΦP B^(sP) ,  ΘQ(B^s) = 1 − Θ1B^s − ... − ΘQ B^(sQ)

B is the backshift operator (i.e. B yt = yt−1, B² yt = yt−2 and so on), s is the seasonal lag, and εt a sequence of independent normal error variables with mean 0 and variance σ². The Φ's and φ's are respectively the seasonal and non-seasonal autoregressive parameters, and the Θ's and θ's are respectively the seasonal and non-seasonal moving average parameters. p and q are the orders of the non-seasonal autoregressive and moving average parameters respectively, whereas P and Q are those of the seasonal autoregressive and moving average parameters respectively. Also, d and D denote non-seasonal and seasonal differences respectively.

8.4 Model building

(i) Identification

The foremost step in the process of modeling is to check for the stationarity of the

series, as the estimation procedures are available only for stationary series. There are two

kinds of stationarity, viz., stationarity in 'mean' and stationarity in 'variance'. A cursory

look at the graph of the data and structure of autocorrelation and partial correlation

coefficients may provide clues for the presence of stationarity. Another way of checking

for stationarity is to fit a first order autoregressive model for the raw data and test

whether the coefficient 'φ1' is less than one. If the model is found to be non-stationary,

stationarity could be achieved mostly by differencing the series. Or go for a Dickey

Fuller test (see section 8.1). This is applicable for both seasonal and non-seasonal

stationarity.

Thus, if Xt denotes the original series, the non-seasonal difference of first order is

Yt = Xt − Xt−1

followed by the seasonal differencing (if needed)

Zt = Yt − Yt−s = (Xt − Xt−1) − (Xt−s − Xt−s−1)

The next step in the identification process is to find the initial values for the orders of

seasonal and non-seasonal parameters, p, q, and P, Q. They could be obtained by looking for significant autocorrelation and partial autocorrelation coefficients (see section 8.2 (iii)). Say, if the second order autocorrelation coefficient is significant, then an AR(2), MA(2) or ARMA(2) model could be tried to start with.

sample autocorrelation coefficients are poor estimates of population autocorrelation

coefficients. Still they can be used as initial values while the final models are achieved

after going through the stages repeatedly. Note that usually up to order 2 for p, d, or q are

sufficient for developing a good model in practice.

(ii) Estimation


At the identification stage one or more models are tentatively chosen that seem to provide statistically adequate representations of the available data. Then we attempt to obtain precise estimates of the parameters of the model by least squares, as advocated by Box and Jenkins. Standard computer packages like SAS, SPSS etc. are available for finding the estimates of the relevant parameters using iterative procedures. The methods of estimation are not discussed here for brevity.

(iii) Diagnostics

Different models can be obtained for various combinations of AR and MA individually

and collectively. The best model is selected using the following diagnostics.

(a) Low Akaike Information Criteria (AIC)/ Bayesian Information Criteria (BIC)/

Schwarz-Bayesian Information Criteria (SBC)

AIC is given by

AIC = (-2 log L + 2 m)

where m = p + q + P + Q and L is the likelihood function. Since −2 log L is approximately equal to {n(1 + log 2π) + n log σ²}, where σ² is the model MSE, AIC can be written as

AIC = {n(1 + log 2π) + n log σ² + 2m}

and because the first term in this equation is a constant, it is usually omitted while comparing between models. As an alternative to AIC, sometimes SBC is also used, which is given by

SBC = log σ² + (m log n)/n.

(b) Plot of residual ACF

Once the appropriate ARIMA model has been fitted, one can examine the goodness of fit by plotting the ACF of the residuals of the fitted model. If most of the sample autocorrelation coefficients of the residuals are within the limits ±1.96/√N, where N is the number of observations upon which the model is based, then the residuals are white noise, indicating that the model is a good fit.

(c) Non-significance of autocorrelations of residuals via Portmanteau tests (Q-tests based on chi-square statistics): Box-Pierce or Ljung-Box tests

After a tentative model has been fitted to the data, it is important to perform diagnostic checks to test the adequacy of the model and, if need be, to suggest potential improvements. One way to accomplish this is through the analysis of residuals. It has been found effective to measure the overall adequacy of the chosen model by examining a quantity Q, known as the Box-Pierce statistic (a function of the autocorrelations of the residuals), whose approximate distribution is chi-square and which is computed as follows:

Q = n Σ r²(j)


where the summation extends from j = 1 to k, with k the maximum lag considered, n the number of observations in the series, and r(j) the estimated autocorrelation at lag j; k can be any positive integer and is usually around 20. Q follows a chi-square distribution with (k − m1) degrees of freedom, where m1 is the number of parameters estimated in the model. A modified Q statistic is the Ljung-Box statistic, which is given by

Q = n(n+2) Σ r²(j) / (n − j)

The Q statistic is compared to critical values from the chi-square distribution. If the model is correctly specified, the residuals should be uncorrelated and Q should be small (the probability value should be large). A significant value indicates that the chosen model does not fit well.
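Both the residual ACF bounds and the Ljung-Box statistic are available off the shelf; a minimal self-contained sketch with the statsmodels Python library, on a hypothetical series, is:

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(size=120))  # hypothetical series

resid = ARIMA(y, order=(1, 1, 0)).fit().resid

# White-noise bounds for the residual ACF: +/- 1.96 / sqrt(N)
print("residual ACF bounds: +/-", round(1.96 / np.sqrt(len(resid)), 3))

# Ljung-Box Q at maximum lags 10 and 20; large p-values indicate a good fit
print(acorr_ljungbox(resid, lags=[10, 20], return_df=True))
```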

All these stages require considerable care and work and they themselves are not

exhaustive.

Table 2: Illustration for Double Exponential Smoothing model (Holt) using

MSExcel calculations

Monthly wholesale prices (in rupees per quintal) - U.P. - Hapur - Sugar variety C-29 (24-month period, Apr 1997 - Mar 1999)

Forecasts with values of alpha and beta as 0.1 and 0.2 respectively

Year Month t yt h Ft bt

1997 4 1 1275 1275 5.094*

* for t = 0, bt is obtained by fitting yt = a0 + b0 t

5 2 1315 1283.58 5.79

6 3 1325 1292.94 6.50

7 4 1400 1309.50 8.52

8 5 1420 1328.21 10.56

9 6 1400 1344.89 11.78

10 7 1400 1361.00 12.65

11 8 1430 1379.29 13.77

12 9 1450 1398.75 14.91

1998 1 10 1450 1417.30 15.64

2 11 1450 1434.64 15.98

3 12 1420 1447.56 15.37

4 13 1450 1461.64 15.11

5 14 1450 1474.07 14.57

6 15 1460 1485.78 14.00

7 16 1400 1489.80 12.01

8 17 1400 1491.63 9.97

9 18 1465 1497.94 9.24

10 19 1475 1503.96 8.59

11 20 1500 1511.30 8.34

12 21 1455 1513.18 7.05


1999 1 22 1515 1519.70 6.95

2 23 1465 1520.48 5.71

3 24 1400 1513.58 3.19

4 25 1460 1 1516.77

5 26 1440 2 1519.95

6 27 1515 3 1523.14

7 28 1525 4 1526.33

8 29 1530 5 1529.52

9 30 1525 6 1532.71

Table 3 (a): Illustration for Triple Exponential Smoothing model (Winters) using

MSExcel calculations

Quarterly exports (in thousands of francs) of a French company over a six year

period (Makridakis et al. 1998; page 162)

alpha beta gamma

0.1 0.1 0.2

year quarter yt lead lt bt st

1 1 1 362 362.00 17.83** 0.96*

** the initial bt value is obtained as b0 by fitting a simple linear regression of yt

on t i.e. yt = a0 + b0 t

2 2 385 385.00 18.35 1.02*

3 3 432 432.00 21.21 1.14*

4 4 341 341.00 9.99 0.87*

*Refer Table 3 (b)

2 1 5 382 355.77 10.47 0.98

2 6 409 369.74 10.82 1.04

3 7 498 386.30 11.39 1.17

4 8 387 402.16 11.84 0.89

3 1 9 473 420.67 12.51 1.02

2 10 513 439.24 13.11 1.07

3 11 582 456.80 13.56 1.19

4 12 474 476.31 14.15 0.92

4 1 13 544 494.98 14.60 1.03

2 14 582 513.12 14.96 1.08

3 15 681 532.29 15.38 1.21

4 16 557 553.63 15.98 0.94

5 1 17 628 573.36 16.35 1.05

2 18 707 596.03 16.98 1.11

3 19 773 615.42 17.22 1.22

4 20 592 632.55 17.21 0.94

6 1 21 627 644.61 16.70 1.03

2 22 725 660.73 16.64 1.10


3 23 854 679.47 16.85 1.23

4 24 661 697.24 16.94 0.94

25 ? 1 736.62

26 ? 2 813.27

27 ? 3 1000.71

28 ? 4 940.03

Table 3 (b): Illustration for 'seasonals' used in Triple Exponential Smoothing model

(Winters) using MSExcel calculations

yt | 4 MA | 2x4 MA | yt/(2x4 MA) | Seasonal index

year quarter

1 1 1 362 0.96

2 2 385 1.02

3 3 432 382.50 1.13 1.14

4 4 341 380.00 388.00 0.88 0.87

2 1 5 382 385.00 399.25 0.96

2 6 409 391.00 413.25 0.99

3 7 498 407.50 430.38 1.16

4 8 387 419.00 454.75 0.85

3 1 9 473 441.75 478.25 0.99

2 10 513 467.75 499.63 1.03

3 11 582 488.75 519.38 1.12

4 12 474 510.50 536.88 0.88

4 1 13 544 528.25 557.88 0.98

2 14 582 545.50 580.63 1.00

3 15 681 570.25 601.50 1.13

4 16 557 591.00 627.63 0.89

5 1 17 628 612.00 654.75 0.96

2 18 707 643.25 670.63 1.05

3 19 773 666.25 674.88 1.15

4 20 592 675.00 677.00 0.87

6 1 21 627 674.75 689.38 0.91

2 22 725 679.25 708.13 1.02

3 23 854 699.50

4 24 661 716.75


Table 4: ARIMA model upon cotton crop yield (in kg/ha) for Maharashtra state

using SPSS package

1970-71 31 1987-88 100

1971-72 69 1988-89 89

1972-73 75 1989-90 143

1973-74 77 1990-91 117

1974-75 112 1991-92 72

1975-76 57 1992-93 124

1976-77 67 1993-94 180

1977-78 93 1994-95 154

1978-79 89 1995-96 155

1979-80 111 1996-97 173

1980-81 81 1997-98 95

1981-82 92 1998-99 139

1982-83 103 1999-00 162

1983-84 52 2000-01 100

1984-85 93 2001-02 147

1985-86 123 2002-03 158

1986-87 56 2003-04 189

SPSS Output

[ACF and PACF plots of the series against lag number, with coefficients and upper/lower confidence limits; figures not reproduced.]

Parameter Estimates

                        Estimate  Std Error       T  Approx Sig
Non-Seasonal  AR1         -.613       .172  -3.558        .001
Lags          AR2         -.499       .173  -2.880        .008
              Constant    3.169      2.819   1.124        .271


9. Intervention based ARIMA models

Every time series modelling approach has its own advantages and limitations. Time

series intervention models have advantages in certain situations. Sometimes, it may be

known that certain exceptional external events, called 'interventions', could affect the time series under study. Under such circumstances, 'transfer function' models may be used to

account for the effects of the intervention event on the series but wherein the input series

(apart from the main variable) will be in the form of a simple indicator variable to

indicate the presence or absence of the event. Here transfer function modeling refers to

accounting for the dynamic relationship between two time series Yt and Xt (the latter is

the input series) wherein past values of both series may be used in forecasting Yt, leading

to a considerable reduction in the errors of the forecast.

Intervention analysis or event study is used to assess the impact of a special event on the

time series of interest. Alternatively, intervention analysis may be undertaken to adjust

for any unusual values in the series Yt that might have resulted as a consequence of the

intervention event. This will ensure that the results of the time series analysis of the

series, such as the structure of the fitted model, estimates of model parameters, and

forecasts of future values, are not seriously distorted by the influence of these unusual

values. In agriculture, intervention may occur due to introduction of new variety, new

environmental regulations, economic policy changes, strikes, special promotion

campaigns, natural disaster etc. There are broadly three kinds of intervention viz., step,

pulse and ramp.

In its simplest form, intervention analysis itself may be regarded as a generalization of

the two-sample problem (corresponding to pre and post intervention periods) to the case

where the error or noise term is autocorrelated rather than independent. It is well known

that the usual two-sample procedures are not robust against alternatives involving

autocorrelation. Moreover, in many intervention analysis applications, time series data

may be expensive or otherwise difficult to collect. In such cases, 'power functions' are

helpful, because they can be used to determine the probability that a proposed

intervention analysis application will detect a meaningful change. Power is the statistical

term used for the probability that a test will reject the null hypothesis of no change at a

given level of significance for a prescribed change. McLeod and Vingilis (2005) have suggested power computation methods for use with time series analyses in certain cases of intervention analysis. They have also shown that the power function helps to compute the sample size required for intervention analysis.

9.1 Time Series Intervention Model:

An intervention model is a time series model used to explore the impact of external factors on the series. Suppose that Yt is a time series; the ARIMA (p, d, q) model is written as:

∇^d Yt = φ1 ∇^d Yt−1 + ... + φp ∇^d Yt−p + εt − θ1 εt−1 − ... − θq εt−q

where


φ = autoregressive parameter

θ = moving average parameter

d = degree of differencing

B = backshift operator

εt = white noise

Let φ(B) = (1 − φ1B − φ2B² − ... − φpB^p) and θ(B) = (1 − θ1B − θ2B² − ... − θqB^q).

Now the ARIMA model can be written as

∇^d Yt = [θ(B) / φ(B)] εt

Sometimes the time series depends on the season, i.e. Yt depends on Yt−s, Yt−2s etc., where s is the length of the periodicity. Linearly this relation may be represented as

∇s^D Yt = Φ1 ∇s^D Yt−s + ... + ΦP ∇s^D Yt−Ps + εt − Θ1 εt−s − ... − ΘQ εt−Qs

where

Φ = seasonal autoregressive parameter

Θ = seasonal moving average parameter

D = degree of seasonal differencing

Let Φ(B^s) = (1 − Φ1B^s − Φ2B^2s − ... − ΦP B^Ps) and Θ(B^s) = (1 − Θ1B^s − Θ2B^2s − ... − ΘQ B^Qs).

Now the model can be written as

∇s^D Yt = [Θ(B^s) / Φ(B^s)] εt

The 'seasonal non-seasonal multiplicative' model with the above notation can be written as

∇^d ∇s^D Yt = [θ(B) Θ(B^s)] / [φ(B) Φ(B^s)] εt

The time series input-output model is of the form

Yt = δ1 Yt−1 + ... + δr Yt−r + ω0 Xt−b − ω1 Xt−b−1 − ... − ωs Xt−b−s + Nt

where

Xt = exogenous variable

b = delay parameter

Nt = error term

Let δ(B) = (1 − δ1B − ... − δrB^r) and ω(B) = (ω0 − ω1B − ... − ωsB^s).

Then the input-output model can be written as

Yt = [ω(B) / δ(B)] B^b Xt + Nt

When Nt is generated as an ARIMA process, the input-output model is known as a transfer function model. The ARIMA model can be extended to a Seasonal ARIMA model in a straightforward manner, as shown above.

Intervention models are special cases of transfer function modeling in which the exogenous variable is a deterministic categorical variable. Accordingly, an intervention model with a Seasonal ARIMA noise process can be written as

Yt = [ω(B) / δ(B)] B^b It + [θ(B) Θ(B^s)] / [φ(B) Φ(B^s) ∇^d ∇s^D] εt

where

It = dummy (indicator) variable

The term [ω(B) / δ(B)] B^b is called the intervention component, and the model may be extended to include several intervention components, thereby accounting for several types of interventions that influence the process.

9.2. Input variables:

In general, the intervention variable takes one of three types of functions. The step function type of intervention lasts from a given time till the last time period. Mathematically, the step function is written as:

It = 0 if t < T;  It = 1 if t ≥ T

where T is the time at which the intervention first occurred.

The pulse function type of intervention occurs only at a particular period of time. Mathematically, the pulse function is usually written as:

It = 1 if t = T;  It = 0 if t ≠ T

Apart from pulse and step functions, there is another kind of function known as the ramp function. Mathematically, the ramp function is usually written as:

It = 0 if t < T;  It = t − T + 1 if t ≥ T

To fix ideas, the illustration is given with the help of the pulse function.


Indicator coding for the pulse, step and ramp functions is given in Table 3.1.

Table 3.1: Values of the intervention variable under different functions (when the intervention occurred at the 6th time point)

Time t Pulse function It Step function It Ramp function It

1 0 0 0

2 0 0 0

3 0 0 0

4 0 0 0

5 0 0 0

6 1 1 1

7 0 1 2

8 0 1 3

9 0 1 4

10 0 1 5
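These codings are straightforward to generate and to use as the input of an intervention model. The following minimal Python sketch (with the statsmodels library) builds all three indicators and fits an ARIMA model with a step intervention as an exogenous regressor; the series, intervention time and model order are hypothetical, and the simple null-order form ω0 It is assumed for the intervention component.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

n, T = 40, 20  # series length and intervention time (hypothetical)
t = np.arange(1, n + 1)

# Indicator codings of Table 3.1
pulse = (t == T).astype(float)
step = (t >= T).astype(float)
ramp = np.where(t >= T, t - T + 1, 0.0)

# Hypothetical series with a level shift of +10 from time T onwards
rng = np.random.default_rng(0)
y = 50 + rng.normal(scale=2, size=n) + 10 * step

# Intervention model: ARIMA(1,0,0) noise plus omega0 * step_t
res = SARIMAX(y, exog=step, order=(1, 0, 0), trend="c").fit(disp=False)
print(res.params)  # the exog coefficient estimates the intervention effect
```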

9.3 Graphical representation of input and various types of outputs

When the input variable takes the form of step and pulse functions, their forms can be

graphically represented as shown below in Fig-9.1 and Fig-9.2 respectively.

Fig-9.1 Fig-9.2

Output (figures not reproduced):

Fig-9.3: ω B St(T)        Fig-9.4: [ω1 B / (1 − δB)] Pt(T)

In Fig-9.3, when an input step intervention has occurred, the output pattern can be described by a single parameter ω and the response is abrupt-permanent. In Fig-9.4, when a pulse intervention has occurred, the output pattern can be described by two parameters, ω1 and δ, and the response due to the intervention is abrupt-temporary.


Fig-9.5: [ω B / (1 − δB)] St(T)        Fig-9.6: [ω1 B / (1 − δB) + ω2 B / (1 − B)] Pt(T)

In Fig-9.5 a step intervention has occurred and the response type is gradual-permanent, while in Fig-9.6 a pulse intervention has occurred and the response can be described by two intervention components, one explaining the abrupt temporary effect and the other explaining the permanent effect.

Fig-9.7: [ω B / (1 − B)] St(T)        Fig-9.8: [ω0 + ω1 B / (1 − δB) + ω2 B / (1 − B)] Pt(T)

In Fig-9.7 a step intervention has occurred and the response type is gradually increasing, while in Fig-9.8 a pulse intervention has occurred and the response can be explained by three different intervention components.

9.4 Intervention model of null order pulse function:

The intervention model with a null order pulse function can be written as:

Yt = ω It + Nt

with

Yt = Response variable at t

ω = Intervention effect on Yt

It = Intervention variable

Nt = ARIMA noise process of the pre-intervention data period


In the above equation, the effect of I on Y is assumed to have an intervention element.

The estimated value of ω can be used to estimate the difference between the two periods

before and after the occurrence of the intervention. In general, the effect of I on Y is categorized as temporary, gradual, permanent, or delayed by a certain time.

9.5 Intervention model of first order pulse function:

The null order pulse function occurs when the δ parameter is zero. An alternative approach, which accommodates a gradual sort of effect, is denoted the first order pulse function. If the intervention component is denoted as

Yt* = Yt − Nt

then an additional parameter δ is needed to define f(It) as follows:

Yt* = f(It) = [ω / (1 − δB)] It

such that −1 < δ < 1. We have

Yt* = δ Yt−1* + ω It

and since Yt−1* = δ Yt−2* + ω It−1, and so on recursively, the equation becomes:

Yt* = ω Σj=0,1,2,... δ^j It−j

If this equation is applied for k (k = 0, 1, 2, ...) periods after the intervention, with a single pulse at t = T so that the indicator sequence is (0, 0, ..., 0, 1, 0, 0, ...), the equation below can be obtained:

Y*T+k = ω (IT+k + δ IT+k−1 + δ² IT+k−2 + ... + δ^k IT + ...) = ω δ^k

Fig-9.9 (a), (b): Intervention response with a single pulse occurring at t = T (figure not reproduced)

The above equation means that the effect from the pulse will vanish gradually, following a geometric sequence determined by the δ value. Fig. 9.9(a) shows the Yt* values for a model with ω = 1, and Fig. 9.9(b) for ω = −1, with a single pulse occurring at t = T for some δ value with 0 < δ < 1. From Fig. 9.9 it can also be seen that the Yt* value approaches zero asymptotically. In this case, as δ approaches 0, the effect of I on Y lasts only a minimum period. On the other hand, as δ approaches 1, the effect of I lasts longer. For the extreme case of δ = 1, we get a permanent effect of I on Y, as shown in Fig. 9.9.

9.6 Procedure for intervention model building

The intervention response, written as Yt*, is in essence the set of differences (errors) between the original data after the intervention period and the corresponding forecasts of an ARIMA model fitted on the pre-intervention data period.

The time series data Yt are divided into Data I, the time series period before the intervention occurred (Y1t), and Data II, the time series period after the intervention occurred (Y2t):

Step I: Form the ARIMA model for the time series period before the intervention, Y1t. The model building, with respect to identification of the autoregressive and moving

average parameters will be done in the usual way based on the Autocorrelation Function

(ACF) and the partial autocorrelation function (PACF).

Step II: In order to identify the orders of b, r and s, the cross-correlation function is used in the case of a transfer function model. In the case of an intervention model, however, the cross-correlation function cannot be used: because the intervention variable is not continuous, cross-correlations between the intervention variable and the dependent variable are meaningless. The only way to identify the intervention model is the impulse response function, which is matched against candidate forms of ω(B)/δ(B) such as

ω0 ;  ω0 − ω1B ;  ω0 / (1 − δB) ;  (ω0 − ω1B) / (1 − δB)

Step III: The parameter estimates of the intervention model will be tested by using statistical tests (such as t or z tests). Model suitability will be diagnosed by testing the residuals for adherence to white noise.
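Steps I and II can be sketched directly in Python with statsmodels: fit an ARIMA model on the pre-intervention data, forecast the post-intervention period, and examine the intervention response Yt* as the difference between the actual data and those forecasts. The series below is hypothetical, with an abrupt-temporary pulse response built in.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
n, T = 50, 30  # series length and intervention time (hypothetical)

# Hypothetical series: AR(1) noise plus an abrupt-temporary pulse response
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + rng.normal()
k = np.arange(n - T)
y[T:] += 8.0 * 0.6 ** k  # omega = 8, delta = 0.6: effect decays geometrically

# Step I: fit an ARIMA model on the pre-intervention data only (Data I)
pre = ARIMA(y[:T], order=(1, 0, 0)).fit()

# Step II: intervention response Y*_t = actual minus pre-intervention forecast
y_star = y[T:] - pre.forecast(steps=n - T)
print(np.round(y_star, 2))  # compare the decay pattern with Figs 9.3-9.8
```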

Bibliography


Box, G.E.P., Jenkins, G.M. and Reinsel, G.C. (1994). Time Series Analysis: Forecasting and Control, Pearson Education, Delhi.

Croxton, F.E., Cowden, D.J. and Klein, S. (1979). Applied General Statistics, Prentice Hall of India Pvt. Ltd., New Delhi.

Makridakis, S., Wheelwright, S.C. and Hyndman, R.J. (1998). Forecasting: Methods and Applications, 3rd Edition, John Wiley, New York.

McLeod, A.I. and Vingilis, E.R. (2005). Power computations for intervention analysis. Technometrics, 47(2), 174-181.

Pankratz, A. (1983). Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, John Wiley, New York.