
CHAPTER 2

Univariate Time Series Models

2.1 Least Squares Regression

We begin our discussion of univariate and multivariate time series methods by considering the idea of a simple regression model, which we have met before in other contexts. All of the multivariate methods follow, in some sense, from the ideas involved in simple univariate linear regression. In this case, we assume that there is some collection of fixed known functions of time, say z_{t1}, z_{t2}, . . . , z_{tq}, that are influencing our output y_t, which we know to be random. We express this relation between the inputs and outputs as

y_t = β_1 z_{t1} + β_2 z_{t2} + · · · + β_q z_{tq} + e_t        (2.1)

at the time points t = 1, 2, . . . , n, where β_1, . . . , β_q are unknown fixed regression coefficients and e_t is a random error or noise, assumed to be white noise; this means that the observations have zero means, equal variances σ² and are independent. We traditionally assume also that the white noise series, e_t, is Gaussian or normally distributed.

Example 2.1: We have assumed implicitly that the model

y_t = β_1 + β_2 t + e_t

is reasonable in our discussion of detrending in Chapter 1. This is in the form of the regression model (2.1) when one makes the identification z_{t1} = 1, z_{t2} = t. The problem in detrending is to estimate the coefficients β_1 and β_2 in the above equation and detrend by constructing the estimated residual series ê_t. We discuss the precise way in which this is accomplished below.

The linear regression model described by Equation (2.1) can be conveniently written in slightly more general matrix notation by defining the column vectors z_t = (z_{t1}, . . . , z_{tq})′ and β = (β_1, . . . , β_q)′, so that we write (2.1) in the alternate form

y_t = β′z_t + e_t.        (2.2)

To find estimators for β and σ², it is natural to determine the coefficient vector β minimizing ∑ e_t² with respect to β. This yields the least squares or maximum likelihood estimator β̂, and the maximum likelihood estimator for σ², which is proportional to the unbiased estimator

σ̂² = (1/(n − q)) ∑_{t=1}^{n} (y_t − β̂′z_t)².        (2.3)

An alternate way of writing the model (2.2) is as

y = Zβ + e,        (2.4)

where Z′ = (z_1, z_2, . . . , z_n) is a q × n matrix composed of the values of the input variables at the observed time points, y′ = (y_1, y_2, . . . , y_n) is the vector of observed outputs, and the errors are stacked in the vector e′ = (e_1, e_2, . . . , e_n). The ordinary least squares estimators β̂ are the solutions to the normal equations

Z′Z β̂ = Z′y.

You need not be concerned as to how the above equation is solved in practice, as all computer packages have efficient software for inverting the q × q matrix Z′Z to obtain

β̂ = (Z′Z)^{−1} Z′y.        (2.5)

An important quantity that all software produces is a measure of uncertainty for the estimated regression coefficients, say

ĉov{β̂} = σ̂² (Z′Z)^{−1}.        (2.6)

If c_{ij} denotes an element of C = (Z′Z)^{−1}, then cov(β̂_i, β̂_j) = σ² c_{ij} and a 100(1 − α)% confidence interval for β_i is

β̂_i ± t_{n−q}(α/2) σ̂ √c_{ii},        (2.7)

where t_{df}(α/2) denotes the upper 100(α/2)% point on a t distribution with df degrees of freedom.

Example 2.2:

Consider estimating the possible global warming trend alluded to in Section 1.1.2. The global temperature series, shown previously in Figure 1.3, suggests the possibility of a gradually increasing average temperature over the 123 year period covered by the land-based series. If we fit the model in Example 2.1, replacing t by t/100 to convert to a 100 year base so that the increase will be in degrees per 100 years, we obtain β̂_1 = 38.72, β̂_2 = .9501 using (2.5). The error variance, from (2.3), is .0752, with q = 2 and n = 123. Then (2.6) yields

ĉov{β̂} = (  1.8272   −.0941
            −.0941    .0048 ),

leading to an estimated standard error for the slope of √.0048 = .0696. The value of t with n − q = 123 − 2 = 121 degrees of freedom for α/2 = .025 is about 1.98, giving a narrow confidence interval of .95 ± .138 for the slope, and hence a confidence interval on the one hundred year increase of about .81 to 1.09 degrees. We would conclude from this analysis that there is a substantial increase in global temperature amounting to roughly one degree F per 100 years.

Figure 2.1 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended (top panel) and differenced (bottom panel) global temperature series.

If the model is reasonable, the residuals ê_t = y_t − β̂_1 − β̂_2 t should be essentially independent and identically distributed, with no correlation evident. The plot that we have made in Figure 1.3 of the detrended global temperature series shows that this is probably not the case, because of the long low frequency pattern in the observed residuals. However, the differenced series, also shown in Figure 1.3 (second panel), appears to be more independent, suggesting that perhaps the apparent global warming is more consistent with a long term swing in an underlying random walk than with a fixed 100 year trend. If we check the autocorrelation function of the regression residuals, shown here in Figure 2.1, it is clear that the significant values at higher lags imply that there is significant correlation in the residuals. Such correlation can be important, since the estimated standard errors of the coefficients, computed under the assumption that the least squares residuals are uncorrelated, are often too small. We can partially repair the damage caused by the correlated residuals by looking at a model with correlated errors. The procedure and techniques for dealing with correlated errors are based on the Autoregressive Moving Average (ARMA) models to be considered in the next sections. Another method of reducing correlation is to apply a first difference ∆x_t = x_t − x_{t−1} to the global trend data. The ACF of the differenced series, also shown in Figure 2.1, seems to have lower correlations at the higher lags. Figure 1.3 shows qualitatively that this transformation also eliminates the trend in the original series.

Since we have again made some rather arbitrary looking specifications for the configuration of independent variables in the above regression examples, the reader may wonder how to select among various plausible models. We mention two criteria which reward reducing the squared error and penalize for additional parameters: the Akaike Information Criterion

AIC(K) = log σ̂² + 2K/n        (2.8)

and the Schwarz Information Criterion

SIC(K) = log σ̂² + (K log n)/n,        (2.9)

(Schwarz, 1978), where K is the number of parameters fitted (exclusive of variance parameters) and σ̂² is the maximum likelihood estimator for the variance. SIC is sometimes termed the Bayesian Information Criterion, BIC, and will often yield models with fewer parameters than the other selection methods. A modification to AIC(K) that is particularly well suited for small samples was suggested by Hurvich and Tsai (1989). This is the corrected AIC, given by

AICC(K) = log σ̂² + (n + K)/(n − K − 2).        (2.10)

The rule for all three measures above is to choose the value of K leading to the smallest value of AIC(K), SIC(K) or AICC(K). We will give an example later comparing the above simple least squares model with a model where the errors have a time series correlation structure.
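All three criteria are simple functions of the fitted variance, the number of parameters and the sample size, so a small helper suffices. The sketch below (Python assumed, function name hypothetical) evaluates (2.8)-(2.10) for one fitted model; one would compute these for each candidate model and choose the minimizer.

    import numpy as np

    def info_criteria(sigma2_ml, K, n):
        # sigma2_ml: maximum likelihood variance estimate; K: number of
        # fitted parameters (excluding variances); n: sample size
        aic = np.log(sigma2_ml) + 2 * K / n                  # (2.8)
        sic = np.log(sigma2_ml) + K * np.log(n) / n          # (2.9)
        aicc = np.log(sigma2_ml) + (n + K) / (n - K - 2)     # (2.10)
        return aic, sic, aicc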

The organization of this chapter is patterned after the landmark approach to developing models for time series data pioneered by Box and Jenkins (see Box et al, 1994). This assumes that there will be a representation of time series data in terms of a difference equation that relates the current value to its past. Such models should be flexible enough to include non-stationary realizations like the random walk given above, as well as seasonal behavior, where the current value is related to past values at multiples of an underlying season; a common one might be multiples of 12 months (1 year) for monthly data. The models are constructed from difference equations driven by random input shocks and are labeled in the most general formulation as ARIMA, i.e., AutoRegressive Integrated Moving Average processes. The analogies with differential equations, which model many physical processes, are obvious.

For clarity, we develop the separate components of the model sequentially, considering the integrated, autoregressive and moving average components in order, followed by the seasonal modification. The Box-Jenkins approach suggests three steps in a procedure that they summarize as identification, estimation and forecasting. Identification uses model selection techniques, combining the ACF and PACF as diagnostics with the versions of AIC given above, to find a parsimonious (simple) model for the data. Estimation of parameters in the model will be the next step. Statistical techniques based on maximum likelihood and least squares are paramount for this stage and will only be sketched in this course. Finally, forecasting of the time series based on the estimated parameters, with sensible estimates of uncertainty, is the bottom line for any assumed model.

2.2 Integrated (I) Models

We begin our study of time correlation by mentioning a simple model that will introduce strong correlations over time. This is the random walk model, which defines the current value of the time series as just the immediately preceding value with additive noise. The model forms the basis, for example, of the random walk theory of stock price behavior. In this model we define

x_t = x_{t−1} + w_t,        (2.11)

where w_t is a white noise series with mean zero and variance σ_w². Figure 2.2 shows a typical realization of such a series, and we observe that it bears a passing resemblance to the global temperature series. Appealing to (2.11), the best prediction of the current value would be expected to be given by its immediately preceding value. The model is, in a sense, unsatisfactory, because one would think that better results would be possible by a more efficient use of the past.
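A realization like the one in Figure 2.2 is easily generated. The following minimal sketch (Python is an assumption here) simulates (2.11), takes the first difference, and computes sample autocorrelations, whose contrasting behavior corresponds to Figure 2.3.

    import numpy as np

    def sample_acf(x, max_lag):
        # Sample autocorrelations at lags 0, 1, ..., max_lag
        x = x - x.mean()
        denom = x @ x
        return np.array([x[: len(x) - h] @ x[h:] / denom
                         for h in range(max_lag + 1)])

    rng = np.random.default_rng(1)
    w = rng.normal(size=200)      # white noise shocks w_t
    x = np.cumsum(w)              # random walk (2.11): x_t = x_{t-1} + w_t
    dx = np.diff(x)               # first difference, (1 - B)x_t = w_t

    print(sample_acf(x, 5))       # decays slowly, near 1 at small lags
    print(sample_acf(dx, 5))      # near zero at all lags h != 0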

The ACF of the original series, shown in Figure 2.3, exhibits a slow decay as lags increase. In order to model such a series without knowing that it is necessarily generated by (2.11), one might try looking at a first difference and comparing the result to a white noise, or completely independent, process.


Figure 2.2 A typical realization of the random walk series (top panel) and the first difference of the series (bottom panel).

It is clear from (2.11) that the first difference would be ∆x_t = x_t − x_{t−1} = w_t, which is just white noise. The ACF of the differenced process, in this case, would be expected to be zero at all lags h ≠ 0, and the sample ACF should reflect this behavior. The first difference of the random walk, shown in the bottom panel of Figure 2.2, does appear to be much more random. Its ACF, shown in Figure 2.3, reflects this predicted behavior, with no significant values at lags other than zero. It is clear that (2.11) is a reasonable model for this data. The original series is nonstationary, with an autocorrelation function that depends on time, of the form

ρ(x_{t+h}, x_t) = { √(t/(t+h)),  h ≥ 0
                    √((t+h)/t),  h < 0. }


Figure 2.3 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the random walk (top panel) and the first difference (bottom panel) series.

The above example, using a difference transformation to make a random walk stationary, shows a very particular case of the model identification procedure advocated by Box et al (1994). Namely, we seek a linearly filtered transformation of the original series, based strictly on the past values, that will reduce it to completely random white noise. This gives a model that enables prediction to be done with a residual noise that satisfies the usual statistical assumptions about model error.

We will introduce, in the following discussion, more general versions of this simple model that are useful for modeling and forecasting series with observations that are correlated in time. The notation and terminology were introduced in the landmark work by Box and Jenkins (1970) (see Box et al, 1994). A requirement for the ARMA model of Box and Jenkins is that the underlying process be stationary. Clearly the first difference of the random walk is stationary, but the ACF of the first difference shows relatively little dependence on the past, meaning that the differenced process is not predictable in terms of its past behavior.

To introduce a notation that has advantages for treating more general models, define the backshift operator B as the result of shifting the series back by one time unit, i.e.

B x_t = x_{t−1},        (2.12)


and applying successively higher powers, B^k x_t = x_{t−k}. The operator has many of the usual algebraic properties and allows, for example, writing the random walk model (2.11) as

(1 − B) x_t = w_t.

Note that the difference operator discussed previously in Section 1.2.2 is just ∇ = 1 − B. Identifying nonstationarity is an important first step in the Box-Jenkins procedure. From the above discussion, we note that the ACF of a nonstationary process will tend to decay rather slowly as a function of lag h. For example, a straight line would be perfectly correlated, regardless of lag. Based on this observation, we mention the following properties that aid in identifying non-stationarity.

Property P2.1: ACF and PACF of a non-stationary time series

The ACF of a non-stationary time series decays very slowly as a function of lag h. The PACF of a non-stationary time series tends to have a peak very near unity at lag 1, with other values less than the significance level.

2.3 Autoregressive (AR) Models

Now, extending the notions above to more general linear combinations of past values might suggest writing

x_t = φ_1 x_{t−1} + φ_2 x_{t−2} + · · · + φ_p x_{t−p} + w_t        (2.13)

as a function of p past values and an additive noise component w_t. The model given by (2.13) is called an autoregressive model of order p, since it is assumed that one needs p past values to predict x_t. The coefficients φ_1, φ_2, . . . , φ_p are autoregressive coefficients, chosen to produce a good fit between the observed x_t and its prediction based on x_{t−1}, x_{t−2}, . . . , x_{t−p}. It is convenient to rewrite (2.13), using the backshift operator, as

φ(B) x_t = w_t,        (2.14)

where

φ(B) = 1 − φ_1 B − φ_2 B² − · · · − φ_p B^p        (2.15)

is a polynomial with roots (solutions of φ(B) = 0) outside the unit circle (|B_k| > 1). The restrictions are necessary for expressing the solution x_t of (2.14) in terms of present and past values of w_t. That solution has the form

x_t = ψ(B) w_t,        (2.16)


where

ψ(B) = ∑_{k=0}^{∞} ψ_k B^k        (2.17)

is an infinite polynomial (ψ_0 = 1), with coefficients determined by equating coefficients of B in

ψ(B) φ(B) = 1.        (2.18)

Equation (2.16) can be obtained formally by choosing ψ(B) satisfying (2.18) and multiplying both sides of (2.14) by ψ(B). It is clear that the random walk, with φ(B) = 1 − B, has root B_1 = 1, which does not satisfy the restriction, and the process is nonstationary.

Example 2.3:
Suppose that we have an autoregressive model (2.13) with p = 1, i.e., x_t − φ_1 x_{t−1} = (1 − φ_1 B) x_t = w_t. Then (2.18) becomes

(1 + ψ_1 B + ψ_2 B² + · · ·)(1 − φ_1 B) = 1.

Equating coefficients of B implies that ψ_1 − φ_1 = 0, or ψ_1 = φ_1. For B², we would get ψ_2 − φ_1 ψ_1 = 0, or ψ_2 = φ_1². Continuing, we obtain ψ_k = φ_1^k and the representation is

ψ(B) = 1 + ∑_{k=1}^{∞} φ_1^k B^k

and we have

x_t = ∑_{k=0}^{∞} φ_1^k w_{t−k}.        (2.19)

The representation (2.16) is fundamental for developing approximate forecasts and also exhibits the series as a linear process of the form considered in Problem 1.4.
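The matching of coefficients in (2.18) also gives a simple numerical recursion for the ψ-weights of any AR(p); a minimal sketch (Python assumed, function name hypothetical):

    import numpy as np

    def psi_weights(phi, m):
        # Solve psi(B)phi(B) = 1, equation (2.18), by equating coefficients:
        # psi_0 = 1 and psi_k = phi_1 psi_{k-1} + ... + phi_p psi_{k-p}
        p = len(phi)
        psi = np.zeros(m + 1)
        psi[0] = 1.0
        for k in range(1, m + 1):
            psi[k] = sum(phi[j - 1] * psi[k - j]
                         for j in range(1, min(k, p) + 1))
        return psi

    print(psi_weights([0.5], 5))   # AR(1): psi_k = .5**k, as in (2.19)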

For data involving such autoregressive (AR) models as defined above, the main selection problems are deciding that the autoregressive structure is appropriate and then determining the value of p for the model. The ACF of the process is a potential aid for determining the order of the process, as are the model selection measures (2.8)-(2.10). To determine the ACF of the pth order AR in (2.13), write the equation as

x_t − ∑_{k=1}^{p} φ_k x_{t−k} = w_t


and multiply both sides by x_{t−h}, h = 0, 1, 2, . . .. Assuming that the mean E(x_t) = 0, and using the definition of the autocovariance function (1.2), leads to the equation

E[(x_t − ∑_{k=1}^{p} φ_k x_{t−k}) x_{t−h}] = E[w_t x_{t−h}].

The left-hand side immediately becomes

γ_x(h) − ∑_{k=1}^{p} φ_k γ_x(h − k).

The representation (2.16) implies that

E[w_t x_{t−h}] = E[w_t(w_{t−h} + ψ_1 w_{t−h−1} + ψ_2 w_{t−h−2} + · · ·)].

For h = 0, we get σ_w². For h > 0, the fact that the w_t are independent implies that the right-hand side will be zero. Hence, we may write the equations for determining γ_x(h) as

γ_x(0) − ∑_{k=1}^{p} φ_k γ_x(−k) = σ_w²        (2.20)

and

γ_x(h) − ∑_{k=1}^{p} φ_k γ_x(h − k) = 0        (2.21)

for h = 1, 2, 3, . . .. Note that one will need the property γ_x(−h) = γ_x(h) in solving these equations. Equations (2.20) and (2.21) are called the Yule-Walker equations (see Yule, 1927, Walker, 1931).
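Given the autocovariances γ_x(0), . . . , γ_x(p), the Yule-Walker equations (2.21) for h = 1, . . . , p are p linear equations in φ_1, . . . , φ_p, and (2.20) then gives σ_w². A minimal sketch (Python assumed, function name hypothetical):

    import numpy as np

    def yule_walker(gamma, p):
        # gamma: autocovariances gamma_x(0), ..., gamma_x(p)
        # Build the Toeplitz system from (2.21), using gamma(-h) = gamma(h)
        G = np.array([[gamma[abs(i - j)] for j in range(p)]
                      for i in range(p)])
        phi = np.linalg.solve(G, gamma[1 : p + 1])
        sigma2_w = gamma[0] - phi @ gamma[1 : p + 1]   # from (2.20)
        return phi, sigma2_w

    # AR(1) check: phi_1 = .5, sigma_w^2 = 1 gives gamma(h) = .5**h / .75
    gamma = np.array([1 / 0.75, 0.5 / 0.75])
    print(yule_walker(gamma, 1))       # recovers phi = .5, sigma2_w = 1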

Example 2.4

Consider finding the ACF of the first-order autoregressive model. First, (2.20) implies that γ_x(0) − φ_1 γ_x(1) = σ_w². For h = 1, 2, . . ., (2.21) gives γ_x(h) − φ_1 γ_x(h − 1) = 0. Solving these successively yields

γ_x(h) = γ_x(0) φ_1^h.

Combining the two equations gives

γ_x(0) = σ_w² / (1 − φ_1²).

It follows that the autocovariance function is

γ_x(h) = (σ_w² / (1 − φ_1²)) φ_1^h.


Taking into account that γ_x(−h) = γ_x(h) and using (1.3), we obtain

ρ_x(h) = φ_1^{|h|}

for h = 0, ±1, ±2, . . ..

The exponential decay is typical of autoregressive behavior, and there may also be some periodic structure. However, the most effective diagnostic of AR structure is in the PACF and is summarized by the following identification property:

Property P2.2: PACF for AR Process

The partial autocorrelation function φ_hh as a function of lag h is zero for h > p, the order of the autoregressive process. This enables one to make a preliminary identification of the order p of the process using the partial autocorrelation function (PACF). Simply choose the order beyond which most of the sample values of the PACF are approximately zero.

To verify the above, note that the PACF (see Section 1.3.3) is basically the last coefficient obtained when minimizing the squared error

MSE = E[(x_{t+h} − ∑_{k=1}^{h} a_k x_{t+h−k})²].

Setting the derivatives with respect to a_j equal to zero leads to the equations

E[(x_{t+h} − ∑_{k=1}^{h} a_k x_{t+h−k}) x_{t+h−j}] = 0.

This can be written as

γ_x(j) − ∑_{k=1}^{h} a_k γ_x(j − k) = 0

for j = 1, 2, . . . , h. Now, from Equation (2.21), it is clear that, for an AR(p), we may take a_k = φ_k for k ≤ p and a_k = 0 for k > p to get a solution to the above equations. This implies Property P2.2 above.

Having decided on the order p of the model, it is clear that, for the estimation step, one may write the model (2.13) in the regression form

x_t = φ′z_t + w_t,        (2.22)


where φ = (φ_1, φ_2, . . . , φ_p)′ corresponds to β, and z_t = (x_{t−1}, x_{t−2}, . . . , x_{t−p})′ is the vector of independent variables in (2.2). Taking into account the fact that x_t is not observed for t ≤ 0, we may run the regression approach of Section 2.1 for t = p, p + 1, . . . , n − 1 to get estimators for φ and for σ², the variance of the white noise process. These so-called conditional maximum likelihood estimators are commonly used because the exact maximum likelihood estimators involve solving nonlinear equations.

Example 2.5

We consider the simple problem of modeling the recruit series shown in Figure 1.1 using an autoregressive model. The bottom panel of Figure 1.9 shows the autocorrelation (ACF) and partial autocorrelation (PACF) functions of the recruit series. The PACF has large values for h = 1, 2 and then is essentially zero for higher order lags. This implies, by Property P2.2 above, that a second order (p = 2) AR model might provide a good fit. Running the regression program for the model

x_t = β_0 + φ_1 x_{t−1} + φ_2 x_{t−2} + w_t

leads to the estimators

β̂_0 = 6.74 (1.11),  φ̂_1 = 1.35 (.04),  φ̂_2 = −.46 (.04),  σ̂² = 90.31,

where the estimated standard deviations are in parentheses. To determine whether the above order is the best choice, we fitted models for p = 1, . . . , 10, obtaining corrected AICC values of 5.75, 5.52, 5.53, 5.54, 5.54, 5.55, 5.55, 5.56, 5.57, and 5.58, respectively, using (2.10) with K = p. The minimum AICC obtains for p = 2, and we choose the second-order model.
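In code, the conditional least squares fit amounts to regressing x_t on its lags, exactly the regression form (2.22). The sketch below (Python assumed) recovers coefficients close to the recruit estimates from a series simulated with those values, since the recruit data themselves are not reproduced here.

    import numpy as np

    def fit_ar_cls(x, p):
        # Regress x_t on (1, x_{t-1}, ..., x_{t-p}), the form (2.22)
        n = len(x)
        y = x[p:]
        Z = np.column_stack([np.ones(n - p)] +
                            [x[p - 1 - k : n - 1 - k] for k in range(p)])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        sigma2 = resid @ resid / (len(y) - (p + 1))   # one common convention
        return coef, sigma2

    # Simulate an AR(2) using the recruit-series estimates (an assumption)
    rng = np.random.default_rng(2)
    n = 500
    x = np.full(n, 61.0)          # near the implied mean 6.74/(1-1.35+.46)
    for t in range(2, n):
        x[t] = 6.74 + 1.35 * x[t-1] - 0.46 * x[t-2] + rng.normal(scale=9.5)
    print(fit_ar_cls(x, 2))       # roughly (6.74, 1.35, -.46) and 90.3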

Example 2.6

The previous example used various autoregressive models for the recruits series, fitting a second-order regression model. We may also use this regression idea to fit the model to other series, such as a detrended version of the Southern Oscillation Index (SOI) given in previous discussions. We have noted in our discussion of Figure 1.9, from the partial autocorrelation function (PACF), that a plausible model for this series might be a first order autoregression of the form given above with p = 1. Again, putting the model above into the regression framework (2.2) for a single coefficient leads to the estimators φ̂_1 = .59 with standard error .04, σ̂² = .09218 and AICC(1) = −1.375. The ACF of these residuals (not shown), however, will still show cyclical variation, and it is clear that they still have a number of values exceeding the ±1.96/√n threshold (see Equation 1.14). A suggested procedure is to try higher order autoregressive models; successive models for p = 1, 2, . . . , 30 were fitted, and the AICC(K) values are plotted in Figure 3.10 of Chapter 3, so we do not repeat them here. There is a clear minimum for a p = 16th order model. The coefficient vector is φ̂ with components .40, .07, .15, .08, −.04, −.08, −.09, −.08, .00, .11, .16, .15, .03, −.20, −.14 and −.06, and σ̂² = .07354.

Finally, we give a general approach to forecasting for any process that can be written in the form (2.16). This includes the AR, MA and ARMA processes. We begin by defining an h-step forecast of the process x_t as

x^t_{t+h} = E[x_{t+h} | x_t, x_{t−1}, . . .].        (2.23)

Note that this is not exactly right because we only have x_1, x_2, . . . , x_t available, so that conditioning on the infinite past is only an approximation. From this definition it is reasonable to intuit that x^t_s = x_s for s ≤ t, and

E[w_s | x_t, x_{t−1}, . . .] = E[w_s | w_t, w_{t−1}, . . .] = w_s        (2.24)

for s ≤ t. For s > t, we use x^t_s and

E[w_s | x_t, x_{t−1}, . . .] = E[w_s | w_t, w_{t−1}, . . .] = E[w_s] = 0,        (2.25)

since w_s will be independent of past values of w_t. We define the h-step forecast variance as

P^t_{t+h} = E[(x_{t+h} − x^t_{t+h})² | x_t, x_{t−1}, . . .].        (2.26)

To develop an expression for this mean square error, note that, with ψ_0 = 1, we can write

x_{t+h} = ∑_{k=0}^{∞} ψ_k w_{t+h−k}.

Then, since w^t_{t+h−k} = 0 for t + h − k > t, i.e. for k < h, we have

x^t_{t+h} = ∑_{k=h}^{∞} ψ_k w_{t+h−k},

so that the residual is

x_{t+h} − x^t_{t+h} = ∑_{k=0}^{h−1} ψ_k w_{t+h−k}.

Hence, the mean square error (2.26) is just the variance of a linear combination of independent zero mean errors, with common variance σ_w²:

P^t_{t+h} = σ_w² ∑_{k=0}^{h−1} ψ_k².        (2.27)


As an example, we consider forecasting the second order model developed for the recruit series in Example 2.5.

Example 2.7

Consider the one-step forecast x^t_{t+1} first. Writing the defining equation for t + 1 gives

x_{t+1} = φ_1 x_t + φ_2 x_{t−1} + w_{t+1},

so that

x^t_{t+1} = φ_1 x^t_t + φ_2 x^t_{t−1} + w^t_{t+1} = φ_1 x_t + φ_2 x_{t−1} + 0.

Continuing in this vein, we obtain

x^t_{t+2} = φ_1 x^t_{t+1} + φ_2 x^t_t + w^t_{t+2} = φ_1 x^t_{t+1} + φ_2 x_t + 0.

Then,

x^t_{t+h} = φ_1 x^t_{t+h−1} + φ_2 x^t_{t+h−2} + w^t_{t+h} = φ_1 x^t_{t+h−1} + φ_2 x^t_{t+h−2} + 0

for h > 2. Forecasts out to lag h = 4 and beyond, if necessary, can be found by solving (2.18) for ψ_1, ψ_2 and ψ_3, and substituting into (2.27). By equating coefficients of B, B² and B³ in

(1 − φ_1 B − φ_2 B²)(1 + ψ_1 B + ψ_2 B² + ψ_3 B³ + · · ·) = 1,

we obtain ψ_1 − φ_1 = 0, ψ_2 − φ_1 ψ_1 − φ_2 = 0 and ψ_3 − φ_1 ψ_2 − φ_2 ψ_1 = 0. This gives the coefficients ψ_1 = φ_1, ψ_2 = φ_1² + φ_2 and ψ_3 = φ_1³ + 2φ_1 φ_2. From Example 2.5, we have φ̂_1 = 1.35, φ̂_2 = −.46, σ̂_w² = 90.31 and β̂_0 = 6.74. The forecasts are of the form

x^t_{t+h} = 6.74 + 1.35 x^t_{t+h−1} − .46 x^t_{t+h−2}.

For the forecast variance, we evaluate ψ_1 = 1.35, ψ_2 = 1.36, ψ_3 = 1.22, leading to 90.31, 90.31(2.822), 90.31(4.679) and 90.31(6.162) for the forecast variances at h = 1, 2, 3, 4, i.e. standard errors of 9.50, 15.97, 20.56 and 23.59 for the forecasts. The recruit series values range from 20 to 100, so the forecast uncertainty will be rather large.


2.4 Moving Average (MA) Models

We may also consider processes that contain linear combinations of underlying unobserved shocks, say, represented by a white noise series w_t. These moving average components generate a series of the form

x_t = w_t − θ_1 w_{t−1} − θ_2 w_{t−2} − · · · − θ_q w_{t−q},        (2.28)

where q denotes the order of the moving average component and θ_1, θ_2, . . . , θ_q are parameters to be estimated. Using the shift notation, the above equation can be written in the form

x_t = θ(B) w_t,        (2.29)

where

θ(B) = 1 − θ_1 B − θ_2 B² − · · · − θ_q B^q        (2.30)

is another polynomial in the shift operator B. It should be noted that the MA process of order q is a linear process of the form considered earlier in Problem 1.4, with ψ_0 = 1, ψ_1 = −θ_1, . . . , ψ_q = −θ_q. This implies that the ACF will be zero for lags larger than q, because terms in the form of the covariance function given in Problem 1.4 of Chapter 1 will all be zero. Specifically, the exact forms are

γ_x(0) = σ_w² (1 + ∑_{k=1}^{q} θ_k²)        (2.31)

for h = 0, and

γ_x(h) = σ_w² (−θ_h + ∑_{k=1}^{q−h} θ_{k+h} θ_k)        (2.32)

for h = 1, . . . , q − 1, with γ_x(q) = −σ_w² θ_q, and γ_x(h) = 0 for h > q.

Hence, we will have

Property P2.3: ACF for MA Series

For a moving average series of order q, note that the autocorrelation function (ACF) is zero for lags h > q, i.e. ρ_x(h) = 0 for h > q. Such a result enables us to diagnose the order of a moving average component by examining ρ_x(h) and choosing q as the value beyond which the coefficients are essentially zero.

Example 2.8

Consider the varve thicknesses in Figure 1.10, which are described in Problem 1.7 of Chapter 1. Figure 2.4 shows the ACF and PACF of the original log-transformed varve series and the first differences. The ACF of the original series indicates a possible non-stationary behavior, and suggests taking a first difference, interpreted here as the percentage yearly change in deposition. The ACF of the first difference shows a clear peak at h = 1 and no other significant peaks, suggesting a first-order moving average. Fitting the first order moving average model x_t = w_t − θ_1 w_{t−1} to this data using the Gauss-Newton procedure described next leads to θ̂_1 = .77 and σ̂_w² = .2358.

Figure 2.4 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log varve series (top panel) and the first difference (bottom panel), showing a peak in the ACF at lag h = 1.

Fitting the pure moving average term turns into a nonlinear problem, as we can see by noting that either maximum likelihood or regression involves solving (2.28) or (2.29) for w_t and minimizing the sum of the squared errors. Suppose that the roots of θ(B) = 0 are all outside the unit circle; then inversion is possible by solving π(B)θ(B) = 1 for an operator π(B), so that, for the vector parameter θ = (θ_1, . . . , θ_q)′, we may write

w_t(θ) = π(B) x_t        (2.33)

and minimize

SSE(θ) = ∑_{t=q+1}^{n} w_t²(θ)

as a function of the vector parameter θ. We don't really need to find the operator π(B), but can simply solve (2.28) recursively for w_t, with w_1, w_2, . . . , w_q = 0,


and

w_t(θ) = x_t + ∑_{k=1}^{q} θ_k w_{t−k}

for t = q + 1, . . . , n. It is easy to verify that SSE(θ) will be a nonlinear function of θ_1, θ_2, . . . , θ_q. However, note that

w_t(θ) ≈ w_t(θ_0) + (∂w_t/∂θ)′(θ − θ_0),

where the derivative is evaluated at the previous guess θ_0. Rearranging the above equation leads to

w_t(θ_0) ≈ (−∂w_t/∂θ)′(θ − θ_0) + w_t(θ),        (2.34)

which is just the regression model (2.2). Hence, we can begin with an initial guess θ_0 = (.1, .1, . . . , .1)′, say, and successively minimize SSE(θ) until convergence.
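For the MA(1), both the residual w_t(θ) and its derivative obey one-step recursions, so the entire Gauss-Newton iteration is short. A minimal sketch (Python assumed; the simulated series stands in for the varve data):

    import numpy as np

    def fit_ma1(x, theta0=0.1, n_iter=25):
        # Gauss-Newton for x_t = w_t - theta w_{t-1}, minimizing SSE(theta).
        # Residuals: w_t = x_t + theta w_{t-1}, started at zero as in the
        # text, with derivative recursion d_t = w_{t-1} + theta d_{t-1}.
        theta = theta0
        n = len(x)
        w = np.zeros(n)
        for _ in range(n_iter):
            d = np.zeros(n)
            for t in range(1, n):
                w[t] = x[t] + theta * w[t - 1]
                d[t] = w[t - 1] + theta * d[t - 1]
            theta -= (d @ w) / (d @ d)     # one regression step, cf. (2.34)
        for t in range(1, n):              # residuals at the final theta
            w[t] = x[t] + theta * w[t - 1]
        return theta, (w @ w) / (n - 1)

    rng = np.random.default_rng(3)
    noise = rng.normal(size=501)
    x = noise[1:] - 0.77 * noise[:-1]      # MA(1), theta_1 = .77 (Example 2.8)
    print(fit_ma1(x))                      # roughly (.77, 1.0)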

In order to forecast a moving average series, note that

x_{t+h} = w_{t+h} − ∑_{k=1}^{q} θ_k w_{t+h−k}.

The results (2.24) and (2.25) imply that

x^t_{t+h} = − ∑_{k=h}^{q} θ_k w_{t+h−k},

where the w_t values needed for the above are computed recursively as before. Because of (2.17), it is clear that ψ_0 = 1 and ψ_k = −θ_k, k = 1, 2, . . . , q, and these values can be substituted directly into the variance formula (2.27).

2.5 Autoregressive Integrated Moving Average (ARIMA) Models

Now combining the autoregressive and moving average components leads to the autoregressive moving average ARMA(p, q) model, written as

φ(B) x_t = θ(B) w_t,        (2.35)

where the polynomials in B are as defined earlier in (2.15) and (2.30), with p autoregressive coefficients and q moving average coefficients. In the difference equation form, this becomes

x_t − ∑_{k=1}^{p} φ_k x_{t−k} = w_t − ∑_{k=1}^{q} θ_k w_{t−k}.        (2.36)


The mixed processes no longer satisfy Properties P2.1-P2.3 exactly, but they tend to behave in approximately the same way.

Estimation and forecasting for such problems are treated in essentially the same manner as for the AR and MA processes. We note that we can formally divide both sides of (2.35) by φ(B) and note that the usual representation (2.16) holds when

ψ(B)φ(B) = θ(B). (2.37)

For forecasting, we determine the ψ_1, ψ_2, . . . by equating coefficients of B, B², B³, . . . in (2.37), as before, assuming that all the roots of φ(B) = 0 are greater than one in absolute value. Similarly, we can always solve for the residuals, say

w_t = x_t − ∑_{k=1}^{p} φ_k x_{t−k} + ∑_{k=1}^{q} θ_k w_{t−k},        (2.38)

to get the terms needed for forecasting and estimation.

Example 2.9:
Consider the above mixed process with p = q = 1, i.e. ARMA(1, 1). By (2.36), we may write

x_t = φ_1 x_{t−1} + w_t − θ_1 w_{t−1}.

Now,

x_{t+1} = φ_1 x_t + w_{t+1} − θ_1 w_t,

so that

x^t_{t+1} = φ_1 x_t + 0 − θ_1 w_t

and x^t_{t+h} = φ_1 x^t_{t+h−1} for h > 1, leading to very simple forecasts in this case. Equating coefficients of B^k in

(1 − φ_1 B)(1 + ψ_1 B + ψ_2 B² + · · ·) = (1 − θ_1 B)

leads to

ψ_k = (φ_1 − θ_1) φ_1^{k−1}

for k = 1, 2, . . .. Using (2.27) leads to the expression

P^t_{t+h} = σ_w² [1 + (φ_1 − θ_1)² ∑_{k=1}^{h−1} φ_1^{2(k−1)}]
          = σ_w² [1 + (φ_1 − θ_1)² (1 − φ_1^{2(h−1)}) / (1 − φ_1²)]

for the forecast variance.


In the first example of this chapter, it was noted that nonstationary processes are characterized by a slow decay in the ACF, as in Figure 2.3. In many of the cases where slow decay is present, the use of a first order difference

∆x_t = x_t − x_{t−1} = (1 − B) x_t

will reduce the nonstationary process x_t to a stationary series ∆x_t. One can check to see whether the slow decay has been eliminated in the ACF of the transformed series. Higher order differences, ∆^d x_t = ∆ ∆^{d−1} x_t, are possible, and we call the process an ARIMA(p, d, q) series when the dth difference is an ARMA(p, q) series; here p is the order of the autoregressive component, d is the order of differencing needed, and q is the order of the moving average component. Symbolically, the form is

φ(B) ∆^d x_t = θ(B) w_t.        (2.39)

The principles of model selection for ARIMA(p, d, q) series are obtained using the extensions of (2.8)-(2.10) which replace K by K = p + q, the total number of ARMA parameters.
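In practice this selection is automated by fitting a grid of candidate orders and comparing criteria. The sketch below uses the ARIMA interface of the statsmodels package (an assumption here; the text itself used ASTSA) on a simulated integrated series:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(4)
    x = np.cumsum(rng.normal(size=300))        # an integrated, I(1), series

    best = None
    for p in range(3):
        for q in range(3):
            res = ARIMA(x, order=(p, 1, q)).fit()
            if best is None or res.aic < best[0]:
                best = (res.aic, p, q)
    print("smallest AIC at (p, q) =", best[1:])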

2.6 Seasonal ARIMA Models

When autoregressive, differencing, or moving average behavior seems to occur at multiples of some underlying period s, a seasonal ARIMA series may result. The seasonal nonstationarity is characterized by slow decay at multiples of s and can often be eliminated by a seasonal differencing operator of the form

∇_s^D x_t = (1 − B^s)^D x_t.

For example, when we have monthly data, it is reasonable that a yearly phenomenon will induce s = 12, and the ACF will be characterized by slowly decaying spikes at 12, 24, 36, 48, . . .; we can obtain a stationary series by transforming with the operator (1 − B^{12}) x_t = x_t − x_{t−12}, which is the difference between the current month and the value one year, or 12 months, ago.
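Seasonal differencing is a one-line operation; a minimal sketch (Python assumed, function name hypothetical):

    import numpy as np

    def seasonal_diff(x, s=12, D=1):
        # Apply (1 - B^s)^D: subtract the value s periods back, D times
        x = np.asarray(x, dtype=float)
        for _ in range(D):
            x = x[s:] - x[:-s]
        return x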

If the autoregressive or moving average behavior is seasonal at period s, we define formally the operators

Φ(B^s) = 1 − Φ_1 B^s − Φ_2 B^{2s} − · · · − Φ_P B^{Ps}        (2.40)

and

Θ(B^s) = 1 − Θ_1 B^s − Θ_2 B^{2s} − · · · − Θ_Q B^{Qs}.        (2.41)

The final form of the ARIMA(p, d, q) × (P, D, Q)_s model is

Φ(B^s) φ(B) ∇_s^D ∆^d x_t = Θ(B^s) θ(B) w_t.        (2.42)


We may also note the properties below, corresponding to P2.1-P2.3.

Property P2.1′: ACF and PACF of a seasonally non-stationary time series

The ACF of a seasonally non-stationary time series decays very slowly at lag multiples s, 2s, 3s, . . ., with zeros in between, where s denotes a seasonal period, usually 12. The PACF of a non-stationary time series tends to have a peak very near unity at lag s.

Property P2.2’: PACF for Seasonal AR Series

The partial autocorrelation function φ_hh as a function of lag h has nonzero values at s, 2s, 3s, . . . , Ps, with zeros in between, and is zero for h > Ps, the order of the seasonal autoregressive process. There should be some exponential decay.

Property P2.3′: ACF for a Seasonal MA Series

For a seasonal moving average series of order Q, note that the autocorrelation function (ACF) has nonzero values at s, 2s, 3s, . . . , Qs and is zero for h > Qs.

Example 2.10:

We illustrate by fitting the monthly birth series from 1948-1979 shown in Figure 2.5. The period encompasses the boom that followed the Second World War, and there is the expected rise, which persists for about 13 years, followed by a decline to around 1974. The series appears to have long-term swings, with seasonal effects superimposed. The long-term swings indicate possible non-stationarity, and we verify that this is the case by checking the ACF and PACF shown in the top panel of Figure 2.6. Note that, by Property P2.1, slow decay of the ACF indicates non-stationarity, and we respond by taking a first difference. The results shown in the second panel of Figure 2.5 indicate that the first difference has eliminated the strong low frequency swing. The ACF, shown in the second panel from the top in Figure 2.6, shows peaks at 12, 24, 36, 48, . . ., with no decay. This behavior implies seasonal non-stationarity, by Property P2.1′ above, with s = 12. A seasonal difference of the first difference generates an ACF and PACF in Figure 2.6 that we expect for stationary series.

Taking the seasonal difference of the first difference gives a series that looks stationary, with an ACF that has peaks at 1 and 12 and a PACF that has a substantial peak at 12, with lesser peaks at 24, 36, . . .. This suggests trying either a first order moving average term, by Property P2.3, or a first order seasonal moving average term with s = 12, by Property P2.3′ above.


Figure 2.5 Number of live births 1948(1)-1979(1) and residuals from models with a first difference, a first difference and a seasonal difference of order 12, and a fitted ARIMA(0, 1, 1) × (0, 1, 1)_{12} model.


Figure 2.6 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the birth series (top two panels), the first difference (second two panels), an ARIMA(0, 1, 0) × (0, 1, 1)_{12} model (third two panels) and an ARIMA(0, 1, 1) × (0, 1, 1)_{12} model (last two panels).

We choose to eliminate the largest peak first, by applying a first-order seasonal moving average model with s = 12. The ACF and PACF of the residual series from this model, i.e. from ARIMA(0, 1, 0) × (0, 1, 1)_{12}, written as

(1 − B)(1 − B^{12}) x_t = (1 − Θ_1 B^{12}) w_t,


Figure 2.7 A 36 month forecast for the birth series with 95% uncertainty limits.

is shown in the fourth panel from the top in Figure 2.6. We note that the peak at lag one is still there, with attending exponential decay in the PACF. This can be eliminated by fitting a first-order moving average term, and we consider the model ARIMA(0, 1, 1) × (0, 1, 1)_{12}, written as

(1 − B)(1 − B^{12}) x_t = (1 − θ_1 B)(1 − Θ_1 B^{12}) w_t.

The ACF of the residuals from this model is relatively well behaved, with a number of peaks either near or exceeding the 95% test of no correlation. Fitting this final ARIMA(0, 1, 1) × (0, 1, 1)_{12} model leads to the fitted equation

(1 − B)(1 − B^{12}) x_t = (1 − .4896 B)(1 − .6844 B^{12}) w_t,

with AICc = 4.95, R² = .9804² = .961, and P-values of .000 and .000.

R² is computed from saving the predicted values and then plotting them against the observed values using the 2-D plot option. The format in which ASTSA puts out these results is shown below.

ARIMA(0,1,1)x(0,1,1)x12 from U.S. Births
AICc = 4.94684   variance = 51.1906   d.f. = 358   Start values = .1


predictor   coef     st. error   t-ratio    p-value
MA(1)       .4896    .04620      10.5966    .000
SMA(1)      .6844    .04013      17.0541    .000

(D1) (D(12)1) x(t) = (1 -.49B1) (1 -.68B12) w(t)

The ARIMA search in ASTSA leads to the model

(1 − .0578 B^{12})(1 − B)(1 − B^{12}) x_t = (1 − .4119 B − .1515 B²)(1 − .8136 B^{12}) w_t,

with AICc = 4.8526, somewhat lower than the previous model. The seasonal autoregressive coefficient is not statistically significant and should probably be omitted from the model. The new model becomes

(1 − B)(1 − B^{12}) x_t = (1 − .4088 B − .1645 B²)(1 − .6990 B^{12}) w_t,

yielding AICc = 4.92 and R² = .981² = .962, slightly better than the ARIMA(0, 1, 1) × (0, 1, 1)_{12} model. Evaluating these latter models leads to the conclusion that the extra parameters do not add a practically substantial amount to the predictability.

The model is expanded as

(1 − B)(1 − B^{12}) x_t = (1 − θ_1 B)(1 − Θ_1 B^{12}) w_t,
(1 − B − B^{12} + B^{13}) x_t = (1 − θ_1 B − Θ_1 B^{12} + θ_1 Θ_1 B^{13}) w_t,

so that

x_t − x_{t−1} − x_{t−12} + x_{t−13} = w_t − θ_1 w_{t−1} − Θ_1 w_{t−12} + θ_1 Θ_1 w_{t−13},

or

x_t = x_{t−1} + x_{t−12} − x_{t−13} + w_t − θ_1 w_{t−1} − Θ_1 w_{t−12} + θ_1 Θ_1 w_{t−13}.

The forecasts are

x^t_{t+1} = x_t + x_{t−11} − x_{t−12} − θ_1 w_t − Θ_1 w_{t−11} + θ_1 Θ_1 w_{t−12}

x^t_{t+2} = x^t_{t+1} + x_{t−10} − x_{t−11} − Θ_1 w_{t−10} + θ_1 Θ_1 w_{t−11}.

Continuing in the same manner, we obtain

x^t_{t+12} = x^t_{t+11} + x_t − x_{t−1} − Θ_1 w_t + θ_1 Θ_1 w_{t−1}

for the 12 month forecast.


The forecast limits are quite variable, with a standard error that rises to 20% of the mean by the end of the forecast period. The plot shows that the general trend is upward, rising from about 250,000 to about 290,000 births per year. One could check the actual records from the years 1979-1982. The direction is not certain because of the large uncertainty. One could compute the probability

P(B_{t+47} ≤ 250,000) = Φ((250 − 290)/60) = .25,

so there is a 75% chance of increase.

A website where the forecasts can be compared on a yearly basis is http://www.cdc.gov/nccdphp/drh/pdf/nvs/nvs48 tb1.pdf

Example 2.11:

Figure 2.8 shows the autocorrelation function of the log-transformed J&J earnings series that is plotted in Figure 1.4, and we note the slow decay indicating the nonstationarity which has already been obvious in the Chapter 1 discussion. We may also compare the ACF with that of a random walk, shown in Figure 2.3, and note the close similarity. The partial autocorrelation function is very high at lag one which, under ordinary circumstances, would indicate a first order autoregressive AR(1) model, except that, in this case, the value is close to unity, indicating a root close to 1 on the unit circle. The only question would be whether differencing or detrending is the better transformation to stationarity. Following in the Box-Jenkins tradition, differencing leads to the ACF and PACF shown in the second panel, and no simple structure is apparent. To force a next step, we interpret the peaks at 4, 8, 12, 16, . . . as contributing to a possible seasonal autoregressive term, leading to a possible ARIMA(0, 1, 0) × (1, 0, 0)_4, and we simply fit this model and look at the ACF and PACF of the residuals, shown in the third two panels. The fit improves somewhat, with significant peaks still remaining at lag 1 in both the ACF and PACF. The peak in the ACF seems more isolated, and there remains some exponentially decaying behavior in the PACF, so we try a model with a first-order moving average. The bottom two panels show the ACF and PACF of the resulting ARIMA(0, 1, 1) × (1, 0, 0)_4, and we note only relatively minor excursions above and below the 95% intervals under the assumption that the theoretical ACF is white noise. The final model suggested is (y_t = log x_t)

(1 − Φ_1 B^4)(1 − B) y_t = (1 − θ_1 B) w_t,

where Φ̂_1 = .820 (.058), θ̂_1 = .508 (.098) and σ̂_w² = .0086. The model can be written in forecast form as

y_t = y_{t−1} + Φ_1(y_{t−4} − y_{t−5}) + w_t − θ_1 w_{t−1}.


To forecast the original series for, say, 4 quarters, we compute the forecast limits for y_t = log x_t and then exponentiate, i.e.

x^t_{t+h} = exp{y^t_{t+h}}.

We note the large limits on the forecast values in Figure 2.9 and mention that the situation can be improved by the regression approach in the next section.

2.7 Regression Models With Correlated Errors

The standard method for dealing with correlated errors e_t in the regression model

y_t = β′z_t + e_t        (2.2)′

is to try to transform the errors e_t into uncorrelated ones and then apply the standard least squares approach to the transformed observations. For example, let P be an n × n matrix that transforms the vector e = (e_1, . . . , e_n)′ into a set of independent identically distributed variables with variance σ². Then, transform the matrix version (2.4) to

P y = P Z β + P e

and proceed as before. Of course, the major problem is deciding on what to choose for P; but in the time series case, happily, there is a reasonable solution, based again on time series ARMA models. Suppose that we can find a reasonable ARMA model for the residuals, say the ARMA(p, 0, 0) model

e_t = ∑_{k=1}^{p} φ_k e_{t−k} + w_t,

which defines a linear transformation of the correlated e_t to a sequence of uncorrelated w_t. We can ignore the problems near the beginning of the series by starting at t = p. In the ARMA notation, using the backshift operator B, we may write

φ(B) e_t = w_t,        (2.43)

where

φ(B) = 1 − ∑_{k=1}^{p} φ_k B^k,        (2.44)

and applying the operator to both sides of (2.2) leads to the model

φ(B) y_t = β′ φ(B) z_t + w_t,        (2.45)


Figure 2.8 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log J&J earnings series (top two panels), the first difference (second two panels) and two sets of ARIMA residuals (lower panels).


Figure 2.9 Observed and predicted values for the Johnson and Johnson Earnings Series with forecast values for the next four quarters, using the ARIMA(0, 1, 1) × (1, 0, 0)_4 model for the log-transformed data.

where the w_t now satisfy the independence assumption. Doing ordinary least squares on the transformed model is the same as doing weighted least squares on the untransformed model. The only problem is that we do not know the values of the coefficients φ_k, k = 1, . . . , p, in the transformation (2.43). However, if we knew the residuals e_t, it would be easy to estimate the coefficients, since (2.43) can be written in the form

e_t = φ′e_{t−1} + w_t,        (2.46)

which is exactly the usual regression model (2.2), with φ′ = (φ_1, . . . , φ_p) replacing β and e′_{t−1} = (e_{t−1}, e_{t−2}, . . . , e_{t−p}) replacing z_t.

The above comments suggest a general approach, known as the Cochrane-Orcutt procedure (Cochrane and Orcutt, 1949), for dealing with the problem of correlated errors in the time series context.

1. Begin by fitting the original regression model (2.2) by least squares, obtaining β̂ and the residuals ê_t = y_t − β̂′z_t.

2. Fit an ARMA model to the estimated residuals, say φ(B) ê_t = θ(B) w_t.


3. Apply the ARMA transformation found to both sides of the regression equation (2.2)′ to obtain

   (φ(B)/θ(B)) y_t = β′ (φ(B)/θ(B)) z_t + w_t.

4. Run ordinary least squares on the transformed values to obtain the new β̂.

5. Return to step 2 if desired. Often, one iteration is enough to develop the estimators under a reasonable correlation structure. In general, the Cochrane-Orcutt procedure converges to the maximum likelihood or weighted least squares estimators.
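For the special case of AR(1) errors, the whole procedure collapses to a few lines; a minimal sketch (Python assumed, function name hypothetical):

    import numpy as np

    def cochrane_orcutt_ar1(y, Z, n_iter=5):
        # Steps 1-5 above with AR(1) errors: e_t = phi e_{t-1} + w_t
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)       # step 1: OLS
        for _ in range(n_iter):
            e = y - Z @ beta                               # residuals
            phi = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])     # step 2: fit AR(1)
            y_star = y[1:] - phi * y[:-1]                  # step 3: transform
            Z_star = Z[1:] - phi * Z[:-1]
            beta, *_ = np.linalg.lstsq(Z_star, y_star, rcond=None)  # step 4
        return beta, phi

    # Hypothetical usage for a trend model y_t = b1 + b2 t + e_t:
    # beta, phi = cochrane_orcutt_ar1(y, np.column_stack([np.ones(n), t]))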

Figure 2.10 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended log J&J earnings series (top two panels) and the fitted ARIMA(0, 0, 0) × (1, 0, 0)_4 residuals (bottom two panels).


Figure 2.11 Observed and predicted values for the Johnson and Johnson Earnings Series with forecast values for the next four quarters, using the correlated regression model for the log-transformed data.

Example 2.12:

We might consider an alternative approach to treating the Johnson and Johnson Earnings Series, assuming that

y_t = log x_t = β_1 + β_2 t + e_t.

In order to analyze the data with this approach, we first fit the model above, obtaining β̂_1 = −.6678 (.0349) and β̂_2 = .0417 (.0071). The residuals ê_t = y_t − β̂_1 − β̂_2 t are easily computed, and their ACF and PACF are shown in the top two panels of Figure 2.10. The ACF and PACF suggest that a seasonal AR series will fit well, and we show the ACF and PACF of the residuals from that fit in the bottom panels of Figure 2.10. The seasonal AR model is of the form

e_t = Φ_1 e_{t−4} + w_t,

and we obtain Φ̂_1 = .7614 (.0639), with σ̂_w² = .00779. Using these values, we transform y_t to

y_t − Φ_1 y_{t−4} = β_1(1 − Φ_1) + β_2[t − Φ_1(t − 4)] + w_t,


using the estimated value Φ̂_1 = .7614. With this transformed regression, we obtain the new estimators β̂_1 = −.7488 (.1105) and β̂_2 = .0424 (.0018). The new estimators have the advantage of being unbiased and having a smaller generalized variance.

To forecast, we consider the original model, with the newly estimated β̂_1 and β̂_2. We obtain the approximate forecast

y^t_{t+h} = β̂_1 + β̂_2(t + h) + e^t_{t+h}

for the log transformed series, along with upper and lower limits depending on the estimated variance, which only incorporates the prediction variance of e^t_{t+h}, considering the trend and seasonal autoregressive parameters as fixed. The narrower upper and lower limits shown in Figure 2.11 are mainly a reflection of a slightly better fit to the residuals and the ability of the trend model to take care of the nonstationarity.

2.8 Chapter 2 Problems

2.1 Consider the regression model

y_t = β_1 y_{t−1} + e_t,

where e_t is white noise with zero mean and variance σ_e². Assume that we observe y_1, y_2, . . . , y_n, and consider the model above for t = 2, 3, . . . , n. Show that the least squares estimator of β_1 is

β̂_1 = ∑_{t=2}^{n} y_t y_{t−1} / ∑_{t=2}^{n} y_{t−1}².

If we pretend that the y_{t−1} are fixed, show that

var{β̂_1} = σ_e² / ∑_{t=2}^{n} y_{t−1}².

Relate your answer to a method for fitting a first-order AR model to the data y_t.

2.2 Consider the autoregressive model (2.13) for p = 1, i.e.

x_t − φ_1 x_{t−1} = w_t.

(a) Show that the necessary condition below (2.15) implies that |φ_1| < 1.

Page 32: CHAPTER 2 Univariate Time Series Models 2.1 Least Squares Regressionshumway/sta137/ch2s.pdf · 2004. 4. 19. · 30 1 Univariate Time Series Models Box et al, 1994). This assumes that

Chapter 2 Problems 57

(b) Show that

x_t = ∑_{k=0}^{∞} φ_1^k w_{t−k}

is the form of (2.16) in this case.

(c) Show that E[w_t x_t] = σ_w² and E[w_t x_{t−1}] = 0, so that future errors are uncorrelated with past data.

2.3 The autocovariance and autocorrelation functions for AR processes are often derived from the Yule-Walker equations, obtained by multiplying both sides of the defining equation successively by x_t, x_{t−1}, x_{t−2}, . . ., using the result (2.16).

(a) Derive the Yule-Walker equations

$$\gamma_x(h) - \phi_1\,\gamma_x(h-1) = \begin{cases} \sigma_w^2, & h = 0 \\ 0, & h > 0. \end{cases}$$

(b) Use the Yule-Walker equations to show that

$$\rho_x(h) = \phi_1^{|h|}$$

for the first-order AR.
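For a numerical sanity check (not a proof), the sample ACF of a long simulated AR(1) should approximate φ1^|h|; a sketch:

```python
# Compare the sample ACF of a simulated AR(1) with phi1**h.
import numpy as np

rng = np.random.default_rng(2)
n, phi1 = 2000, 0.7
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi1 * x[t - 1] + rng.standard_normal()

xc = x - x.mean()
gamma0 = xc @ xc / n                      # sample variance gamma_x(0)
for h in range(5):
    rho_hat = (xc[h:] @ xc[:n - h] / n) / gamma0
    print(h, round(rho_hat, 3), round(phi1**h, 3))
```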

2.4 For an ARMA series we define the optimal forecast based on xt, xt−1, . . . as the conditional expectation

$$x_{t+h}^t = E[x_{t+h} \mid x_t, x_{t-1}, \ldots]$$

for h = 1, 2, 3, . . ..

(a) Show, for the general ARMA model, that

$$E[w_{t+h} \mid x_t, x_{t-1}, \ldots] = \begin{cases} 0, & h > 0 \\ w_{t+h}, & h \le 0. \end{cases}$$

(b) For the first-order AR model, show that the optimal forecast is

$$x_{t+h}^t = \begin{cases} \phi_1 x_t, & h = 1 \\ \phi_1 x_{t+h-1}^t, & h > 1. \end{cases}$$

(c) Show that E[(x^t_{t+1} − x_{t+1})²] = σ²_w is the prediction error variance of the one-step forecast.
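The recursion in part (b) collapses to x^t_{t+h} = φ1^h xt; a tiny sketch of the recursion as code:

```python
# h-step AR(1) forecast via the Problem 2.4(b) recursion.
def ar1_forecast(x_t: float, phi1: float, h: int) -> float:
    """Return x^t_{t+h} = phi1 * x^t_{t+h-1}, starting from x^t_t = x_t."""
    f = x_t
    for _ in range(h):
        f = phi1 * f
    return f

print(ar1_forecast(2.0, 0.9, 3))   # equals 0.9**3 * 2.0 = 1.458
```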


2.5 Suppose we have the simple linear trend model

yt = β1t + xt,

t = 1, 2, . . . , n, where

xt = φ1xt−1 + wt.

Give the exact form of the equations that you would use for estimating β1, φ1 and σ²_w using the Cochrane-Orcutt procedure of Section 2.7.


Figure 2.9 Los Angeles Mortality, Temperature and Particulates (6-day increment).

2.6 Consider the file la regr.dat, in the syllabus, which contains cardiovascular mortality, temperature values and particulate levels over 6-day periods from Los Angeles County (1970-1979). The file also contains two dummy variables for regression purposes, a column of ones for the constant term and a time index. The order is as follows: Column 1: 508 cardiovascular mortality values (6-day averages), Column 2: 508 ones, Column 3: the integers 1, 2, . . . , 508, Column 4: temperature in degrees F, and Column 5: particulate levels. A reference is Shumway et al (1988). The point here is to examine possible relations between temperature and mortality in the presence of a time trend in cardiovascular mortality.

(a) Use scatter diagrams to argue that particulate level may be linearly related to mortality and that temperature has either a linear or quadratic relation. Check for lagged relations using the cross correlation function.


(b) Adjust temperature for its mean value, using the Scale option, and fit the model

$$M_t = \beta_0 + \beta_1(T_t - \bar T) + \beta_2(T_t - \bar T)^2 + \beta_3 P_t + e_t,$$

where Mt, Tt and Pt denote the mortality, temperature and particulate pollution series. You can use as inputs Columns 2 and 3 for the trend terms and run the regression analysis without the constant option. Note that you need to transform temperature first. Retain the residuals for the next part of the problem; a code sketch of this fit follows the problem.

(c) Plot the residuals and compute the autocorrelation (ACF) and partial autocorrelation (PACF) functions. Do the residuals appear to be white? Suggest an ARIMA model for the residuals and fit it to them. The simple ARIMA(2, 0, 0) model is a good compromise.

(d) Apply the ARIMA model obtained in part (c) to all of the input variables and to cardiovascular mortality using the ARIMA transformation option. Retain the forecast values for the transformed mortality, say mt = Mt − φ̂1Mt−1 − φ̂2Mt−2.
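A hedged sketch of the part (b) regression in Python with statsmodels; the filename and the whitespace-delimited layout are assumptions based on the column description above.

```python
# Sketch of the Problem 2.6(b) fit (statsmodels OLS).
import numpy as np
import statsmodels.api as sm

data = np.loadtxt("la_regr.dat")      # hypothetical local copy of the file
M, T, P = data[:, 0], data[:, 3], data[:, 4]
Tc = T - T.mean()                     # temperature adjusted for its mean

Z = np.column_stack([np.ones_like(M), Tc, Tc**2, P])
fit = sm.OLS(M, Z).fit()
print(fit.params, fit.bse)
resid = fit.resid                     # retain for part (c)
```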

2.7 Generate 10 realizations (n = 200 points each) of a series from an ARIMA(1, 0, 1) model with φ1 = .90, θ1 = .20 and σ² = .25. Fit the ARIMA model to each of the series and compare the estimators to the true values by computing the average of the estimators and their standard deviations.
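A sketch of this simulation study using statsmodels (an ARIMA(1, 0, 1) with no differencing is an ARMA(1, 1)):

```python
# Simulate and refit an ARMA(1,1) ten times; summarize the estimators.
import numpy as np
from statsmodels.tsa.arima_process import arma_generate_sample
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(3)
phi1, theta1, sigma = 0.90, 0.20, 0.5     # sigma**2 = .25
ests = []
for _ in range(10):
    x = arma_generate_sample(ar=[1, -phi1], ma=[1, theta1],
                             nsample=200, scale=sigma)
    fit = ARIMA(x, order=(1, 0, 1), trend="n").fit()
    ests.append(fit.params[:2])           # [ar.L1, ma.L1]
ests = np.array(ests)
print("means:", ests.mean(axis=0), "std devs:", ests.std(axis=0))
```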

2.8 Consider the bivariate time series record containing monthly U.S. production, as measured by the Federal Reserve Board Production Index, and unemployment, as given in the file frb.asd. The file contains n = 372 monthly values for each series. Before you begin, be sure to plot the series (see Figure 2.10). Fit a seasonal ARIMA model of your choice to the Federal Reserve Production Index. Develop a 12-month forecast using the model.
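A hedged sketch with statsmodels; the seasonal order shown is only an illustrative starting point, and the file layout (production in the first column) is an assumption.

```python
# Fit a seasonal ARIMA to the production index and forecast 12 months.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

prod = np.loadtxt("frb.asd")[:, 0]        # assumed column layout
fit = SARIMAX(prod, order=(1, 1, 1),
              seasonal_order=(0, 1, 1, 12)).fit(disp=False)
print(fit.forecast(steps=12))             # 12-month-ahead forecasts
```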

2.9 The file labeled clim-hyd.asd has 454 months of measured values for the climatic variables Air Temperature, Dew Point, Cloud Cover, Wind Speed, Precipitation, and Inflow at Shasta Lake. We would like to look at possible relations between the weather factors and between the weather factors and the inflow to Shasta Lake.

(a) Fit the ARIMA(0, 0, 0) × (0, 1, 1)_12 model to the transformed precipitation Pt = √pt and the transformed flow It = log it. Save the residuals for transformed precipitation for use in part (b).

(b) Apply the ARIMA model fitted in part (a) for transformed precipitation to the flow series. Compute the cross correlation between the flow residuals using the precipitation ARIMA model and the precipitation residuals using the precipitation model, and interpret.



Figure 2.10 Federal Reserve Board Production and Unemployment for Problem 2.8.

Use the coefficients from the ARIMA model in the transform option in the main menu to construct the transformed flow residuals. Suggest two possible models for relating the two series; a sketch of the residual cross-correlation step follows. More analysis can be done using the transfer function models of Chapter 4.
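A sketch of the prewhitening step in part (b) with statsmodels; the column order assumed here (precipitation fifth, inflow sixth) follows the variable list above and is not guaranteed to match the file.

```python
# Apply the precipitation ARIMA filter to the flow series and
# cross-correlate the two residual series.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import ccf

clim = np.loadtxt("clim-hyd.asd")         # assumed column layout
P, flow = np.sqrt(clim[:, 4]), np.log(clim[:, 5])

spec = dict(order=(0, 0, 0), seasonal_order=(0, 1, 1, 12))
fitP = SARIMAX(P, **spec).fit(disp=False)
resP = fitP.resid
# filter the flow series with the precipitation model's coefficients
resF = SARIMAX(flow, **spec).smooth(fitP.params).resid
print(ccf(resF, resP)[:12])               # leading cross correlations
```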


2.9 Chapter 2 ASTSA Notes

8. Regression Analysis

Time Domain → Multiple Regression

Model (without constant):

yt = β1zt1 + β2zt2 + . . . + βqztq + et

Model (with constant):

yt = β0 + β1zt1 + β2zt2 + . . . + βqztq + et

Series (dependent): yt

No. of independent series: q

series 1: z_{t−h1,1}

lag: h1 (often zero)

· · ·

series q: z_{t−hq,q}

lag: hq (often zero)

forecasts: 0

constant (y/n):

selector (AIC, AICc, BIC, FPEL, AICL): AICc

Save → Residuals
Save → Predicted
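For readers working outside ASTSA, a roughly equivalent regression step in Python with statsmodels (a sketch; the AICc-based selector is not reproduced):

```python
# OLS counterpart of the Multiple Regression menu item.
import statsmodels.api as sm

def regress(y, Z, constant=True):
    """Fit y on the input matrix Z; mirrors Save -> Residuals/Predicted."""
    X = sm.add_constant(Z) if constant else Z
    fit = sm.OLS(y, X).fit()
    return fit.params, fit.resid, fit.fittedvalues
```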

9. Fit ARIMA(p, d, q) × (P, D, Q)_s

Time Domain → ARIMA

Series:

p: AR order

d: Difference

q: MA order

P: SAR order

D: Seasonal Difference


Q: SMA order

season: s

forecasts: h

use .1 guess (y/n): y

selector (AIC, AICc, BIC, FPEL, AICL): AICc

Save → Residuals
Save → Predicted
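A roughly equivalent seasonal ARIMA fit in Python with statsmodels (a sketch; again without the AICc-based model selector):

```python
# SARIMA counterpart of the ARIMA menu item.
from statsmodels.tsa.statespace.sarimax import SARIMAX

def fit_sarima(series, p, d, q, P, D, Q, s, h):
    """Fit ARIMA(p,d,q)x(P,D,Q)_s; return residuals and h forecasts."""
    fit = SARIMAX(series, order=(p, d, q),
                  seasonal_order=(P, D, Q, s)).fit(disp=False)
    return fit.resid, fit.forecast(steps=h)
```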

10. ARIMA Transformation

Transform → Transform → ARIMA Residual

Series:

p: AR order

d: Difference

q: MA order

P: SAR order

D: Seasonal Difference

Q: SMA order

season: s
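The ARIMA transformation filters a series with fixed, previously estimated ARIMA coefficients. A hedged statsmodels sketch, reusing a fitted model's parameter vector via smooth():

```python
# Counterpart of the ARIMA Residual transformation: residuals of a
# series under fixed ARIMA coefficients estimated elsewhere.
from statsmodels.tsa.statespace.sarimax import SARIMAX

def arima_residuals(series, params, order, seasonal_order):
    model = SARIMAX(series, order=order, seasonal_order=seasonal_order)
    return model.smooth(params).resid
```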