9. Time series regression and forecasting - uni … · Time series regression and forecasting ......

9. Time series regression and forecasting

Key feature of this section:

• Analysis of data on a single entity observed at multiple points in time (time series data)

Typical research questions:

• What is the causal effect on a variable of interest, Y, of a change in another variable, X, over time?

• What is the best forecast of the value of an economic variable (e.g. a stock price) at some future date?

251

Remarks:

• The analysis of time series data requires knowledge of several specific concepts

• Some of these key concepts (to be explained) are

forecasting

estimation of dynamic causal effects

stationarity (non-stationarity)

Aim of this section:

• Description of these basic concepts (for detailed information, refer to the many specialized lectures on this topic)

252

9.1. Time series data and serial correlation

Point of departure:

• We consider an economic / financial variable over time (for example, the inflation rate or the unemployment rate)

• We denote the observation on that variable at date $t$ by $Y_t$

• We denote the number of all observations by T

• The period of time between observations (i.e. between observation $t$ and $t+1$) is some unit of time (a day, week, month, quarter, ...)

• As a first step, we always plot the observations on the variable over time

253

Inflation and unemployment in the US, 1960–2004 (quarterly data)

254

Definition 9.1: (Lags, first differences)

We consider a time series variable $Y_t$ observed through time ($t = 1, \dots, T$). We use the following notation:

• The first lag of a time series $Y_t$ is $Y_{t-1}$.

• The $j$th lag is $Y_{t-j}$.

• The first difference of a series, $\Delta Y_t$, is its change between periods $t-1$ and $t$:

$$\Delta Y_t = Y_t - Y_{t-1}.$$

• The first difference of the logarithm of $Y_t$ is

$$\Delta \ln(Y_t) = \ln(Y_t) - \ln(Y_{t-1}).$$

255

Remarks:

• The percentage change of a time series $Y_t$ between periods $t-1$ and $t$ is approximately $100 \times \Delta\ln(Y_t)$

• This approximation is most accurate when the percentage change is small (see Slide 92)
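The quality of this approximation can be checked numerically. The Python sketch below compares the exact percentage change with $100 \times \Delta\ln(Y_t)$ for a 1% change and a 50% change; the input values are purely illustrative and not taken from the data set.

```python
import math

def pct_change(y_prev: float, y_curr: float) -> float:
    """Exact percentage change between two consecutive observations."""
    return 100.0 * (y_curr - y_prev) / y_prev

def log_diff_approx(y_prev: float, y_curr: float) -> float:
    """Approximation 100 * (ln(Y_t) - ln(Y_{t-1}))."""
    return 100.0 * (math.log(y_curr) - math.log(y_prev))

# Illustrative values only: a small (1%) and a large (50%) change
for y_prev, y_curr in [(100.0, 101.0), (100.0, 150.0)]:
    exact = pct_change(y_prev, y_curr)
    approx = log_diff_approx(y_prev, y_curr)
    print(f"exact = {exact:.4f}%, 100*dln = {approx:.4f}%")
```

For the 1% change the two numbers differ by less than 0.01 percentage points, while for the 50% change the log approximation (about 40.5%) falls well short of the exact 50%.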

256

Autocorrelation:

• An important issue in the analysis of time series data is whether and how a variable $Y_t$ is related to its own past values

• As a measure of this relation, we use the covariance and the correlation of $Y_t$ with its own past values

−→ Autocovariance and autocorrelation (serial correlation)

Definition 9.2: (Autocovariance, autocorrelation)

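The $j$th autocovariance (autocorrelation) of a series $Y_t$ is the covariance (correlation coefficient) between $Y_t$ and its $j$th lag:

$$j\text{th autocovariance} \equiv \gamma_j = \operatorname{Cov}(Y_t, Y_{t-j}),$$

$$j\text{th autocorrelation} \equiv \rho_j = \frac{\operatorname{Cov}(Y_t, Y_{t-j})}{\sqrt{\operatorname{Var}(Y_t) \cdot \operatorname{Var}(Y_{t-j})}}.$$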

257

Remarks:

• The formulas in Definition 9.2

represent the probabilistic autocovariances and autocorrelations when we think of the series $Y_t$ as a sequence of random variables ($j$th population autocovariances and autocorrelations)

implicitly assume that the population autocovariances $\gamma_j$ and autocorrelations $\rho_j$ remain constant over time, for example in the case of $j = 1$

$$\gamma_1 = \operatorname{Cov}(Y_2, Y_1) = \operatorname{Cov}(Y_3, Y_2) = \dots$$

$$\rho_1 = \operatorname{Corr}(Y_2, Y_1) = \operatorname{Corr}(Y_3, Y_2) = \dots$$

−→ Assumption of stationarity (to be discussed later)

258

Question:

• How can we estimate the theoretical population autocovariances $\gamma_j$ and autocorrelations $\rho_j$ on the basis of the time series observations $Y_1, Y_2, \dots, Y_T$?

Definition 9.3: (Estimation of autocovariances, autocorrelations)

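The conventional estimators of the $j$th autocovariance $\gamma_j$ and the $j$th autocorrelation $\rho_j$ based on the observations $Y_1, Y_2, \dots, Y_T$ are defined as follows:

$$\hat{\gamma}_j = \frac{1}{T} \sum_{t=j+1}^{T} \left(Y_t - \bar{Y}\right)\left(Y_{t-j} - \bar{Y}\right), \qquad (9.1)$$

$$\hat{\rho}_j = \frac{\hat{\gamma}_j}{\hat{\gamma}_0}, \qquad (9.2)$$

where $\bar{Y}$ denotes the sample mean of the $Y_t$.

259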

Remarks:

• The estimators (9.1) and (9.2) are consistent

• Both estimators implicitly assume stationarity
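A minimal NumPy sketch of the estimators (9.1) and (9.2) follows. Note that (9.1) divides by $T$ rather than by the number of terms $T - j$ in the sum. The function names are illustrative and not part of EViews or any library.

```python
import numpy as np

def sample_autocov(y, j):
    """j-th sample autocovariance, Eq. (9.1): sum over t = j+1..T, divided by T."""
    y = np.asarray(y, dtype=float)
    T = y.size
    dev = y - y.mean()                  # deviations from the sample mean Y-bar
    # products (Y_t - Ybar)(Y_{t-j} - Ybar) for t = j+1, ..., T
    return np.sum(dev[j:] * dev[:T - j]) / T

def sample_autocorr(y, max_lag):
    """Sample autocorrelations rho_hat_1, ..., rho_hat_max_lag, Eq. (9.2)."""
    gamma0 = sample_autocov(y, 0)
    return np.array([sample_autocov(y, j) / gamma0 for j in range(1, max_lag + 1)])

# Usage on a short toy series (not the inflation data)
print(sample_autocorr([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2))
```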

Example:

• Sample autocorrelations of the U.S. CPI inflation rate and its changes up to lag 10 (see next slides)

• We denote the U.S. CPI inflation rate in EViews by $INF_t$

• $INF_t$ itself exhibits a strongly positive autocorrelation (first autocorrelation 0.837)

• Its first difference, $\Delta INF_t$, exhibits a significantly negative first autocorrelation (−0.265)

260

U.S. CPI inflation rate and its changes

261

[Figure, two panels: U.S. CPI inflation rate; First differences in the U.S. CPI inflation rate (horizontal axis 1960–2000)]

Sample autocorrelations of the U.S. CPI inflation rate and its changes

262

Variable: INF   Date: 25/06/12   Time: 11:21   Sample: 1957Q1 2005Q1   Included observations: 192

Lag   AC       PAC      Q-Stat   Prob
1      0.837    0.837   136.72   0.000
2      0.761    0.202   250.38   0.000
3      0.765    0.302   365.58   0.000
4      0.676   -0.158   456.17   0.000
5      0.611   -0.009   530.62   0.000
6      0.578   -0.018   597.60   0.000
7      0.511   -0.047   650.22   0.000
8      0.414   -0.180   684.99   0.000
9      0.395    0.118   716.76   0.000
10     0.383    0.098   746.73   0.000

Variable: D(INF)   Date: 25/06/12   Time: 11:22   Sample: 1957Q1 2005Q1   Included observations: 191

Lag   AC       PAC      Q-Stat   Prob
1     -0.265   -0.265   13.594   0.000
2     -0.252   -0.346   25.956   0.000
3      0.294    0.138   42.918   0.000
4     -0.077   -0.030   44.079   0.000
5     -0.110   -0.024   46.492   0.000
6      0.110    0.002   48.905   0.000
7      0.086    0.126   50.402   0.000
8     -0.228   -0.151   60.877   0.000
9     -0.026   -0.133   61.014   0.000
10     0.112   -0.075   63.570   0.000

Four economic time series

263

9.2. Autoregressions


• Idea: forecasts are made using a regression model that relates a time series variable to its own past values (an autoregression)

Definition 9.4: (First order autoregressive model)

We consider an economic time series variable $Y_t$ and define the first order autoregressive population model (the AR(1) model) for $Y_t$ as

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t, \qquad (9.3)$$

where $u_t$ is an error term.

264

Example:

• Fit of an AR(1) model to the change in the U.S. CPI inflation rate

EViews output: Fit of an AR(1) model to ∆INFt

265

Dependent Variable: D(INF)
Method: Least Squares
Date: 28/06/12   Time: 10:29
Sample (adjusted): 1957Q4 2004Q4
Included observations: 189 after adjustments
Convergence achieved after 3 iterations

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            -0.002156     0.095362     -0.022612     0.9820
AR(1)        -0.263130     0.070795     -3.716805     0.0003

R-squared             0.068793   Mean dependent var      -0.000258
Adjusted R-squared    0.063813   S.D. dependent var       1.711465
S.E. of regression    1.655958   Akaike info criterion    3.857162
Sum squared resid     512.7908   Schwarz criterion        3.891466
Log likelihood       -362.5018   Hannan-Quinn criter.     3.871060
F-statistic           13.81464   Durbin-Watson stat       2.174737
Prob(F-statistic)     0.000266

Inverted AR Roots    -.26

Forecasting:

• Consider the AR(1) model in Eq. (9.3)

• We estimate the unknown parameters $\beta_0$ and $\beta_1$ by OLS using the data $Y_1, \dots, Y_T$

−→ We obtain the OLS estimates $\hat{\beta}_0$ and $\hat{\beta}_1$

• We aim to forecast the future value $Y_{T+1}$ based on the observed value $Y_T$ and the estimated coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$

• We denote this forecast by $\hat{Y}_{T+1|T}$; it is given by

$$\hat{Y}_{T+1|T} = \hat{\beta}_0 + \hat{\beta}_1 Y_T \qquad (9.4)$$

• The forecast error is the difference between the value of $Y_{T+1}$ that actually occurs and its forecast based on $Y_T$:

$$\text{Forecast error} = Y_{T+1} - \hat{Y}_{T+1|T} \qquad (9.5)$$
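The two steps above (OLS estimation, then the forecast (9.4)) can be sketched in Python with NumPy. The simulated series with $\beta_0 = 0.5$ and $\beta_1 = 0.6$ is an assumption made purely for illustration; on the slides the estimates come from EViews.

```python
import numpy as np

def fit_ar1(y):
    """OLS estimates (beta0_hat, beta1_hat) of Y_t = beta0 + beta1 * Y_{t-1} + u_t."""
    y = np.asarray(y, dtype=float)
    y_t, y_lag = y[1:], y[:-1]          # regress Y_2..Y_T on Y_1..Y_{T-1}
    X = np.column_stack([np.ones_like(y_lag), y_lag])
    coef, *_ = np.linalg.lstsq(X, y_t, rcond=None)
    return coef[0], coef[1]

def forecast_ar1(beta0_hat, beta1_hat, y_T):
    """One-step-ahead forecast Y_{T+1|T}, Eq. (9.4)."""
    return beta0_hat + beta1_hat * y_T

# Illustration on simulated data (assumed parameters beta0 = 0.5, beta1 = 0.6)
rng = np.random.default_rng(0)
y = [0.0]
for _ in range(301):
    y.append(0.5 + 0.6 * y[-1] + rng.standard_normal())
y_obs, y_next = y[:-1], y[-1]           # hold back the last value as the out-of-sample Y_{T+1}

b0, b1 = fit_ar1(y_obs)
forecast = forecast_ar1(b0, b1, y_obs[-1])
print(f"beta0_hat={b0:.3f}, beta1_hat={b1:.3f}, forecast error={y_next - forecast:.3f}")
```

Holding back the last simulated value mirrors the distinction made below: the forecast concerns an observation that was not used for estimation.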

266

Forecasting: [continued]

• Forecasts versus predicted values

Forecasts pertain to out-of-sample observations

Predicted values pertain to in-sample observations

• The root mean squared forecast error (RMSFE) is a measure of the magnitude of a typical mistake made with a forecasting model and is defined by

$$\text{RMSFE} = \sqrt{E\left[\left(Y_{T+1} - \hat{Y}_{T+1|T}\right)^2\right]} \qquad (9.6)$$

• The RMSFE has two sources:

The unknown future values of $u_t$

The errors in the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$

267

Forecasting: [continued]

• The RMSFE can be estimated using the standard error of the regression (to be discussed in Section 9.3)

Example:

• We consider the estimation output on Slide 265

• The estimated AR(1) model for the changes in $INF_t$ is

$$\widehat{\Delta INF}_t = -0.0022 - 0.2631 \, \Delta INF_{t-1} \qquad (9.7)$$

• The estimation period is 1957:Q1 – 2004:Q4

• We aim to forecast the inflation rate for 2005:Q1

268

Example: [continued]

• In general, we have

$$\widehat{INF}_{T+1|T} = INF_T + \left(\widehat{INF}_{T+1|T} - INF_T\right) = INF_T + \widehat{\Delta INF}_{T+1|T} \qquad (9.8)$$

• Setting T = 2004:Q4, we find from the data set

$$INF_{2004:Q4} = 3.5051\% \qquad (9.9)$$

$$\Delta INF_{2004:Q4} = 1.8824\% \qquad (9.10)$$

• From Eq. (9.7), we have

$$\widehat{\Delta INF}_{2005:Q1|2004:Q4} = -0.0022 - 0.2631 \cdot \Delta INF_{2004:Q4} = -0.4975\% \qquad (9.11)$$

269


Example: [continued]

• It follows from Eqs. (9.8)–(9.11) that

$$\widehat{INF}_{2005:Q1|2004:Q4} = 3.5051 - 0.4975 = 3.0076\%$$

• Accuracy of the forecast:

From the data set we find that

$$INF_{2005:Q1} = 2.3660\%$$

−→ The forecast error is

$$INF_{2005:Q1} - \widehat{INF}_{2005:Q1|2004:Q4} = 2.3660 - 3.0076 = -0.6416\%$$
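The arithmetic of Eqs. (9.8)–(9.11) can be retraced in a few lines of Python, using only the rounded coefficients and data values reported on the slides:

```python
# Coefficients of Eq. (9.7) and data values (9.9)-(9.10) as reported on the slides
beta0_hat, beta1_hat = -0.0022, -0.2631
inf_2004q4 = 3.5051     # INF in 2004:Q4 (percent)
dinf_2004q4 = 1.8824    # change in INF in 2004:Q4
inf_2005q1 = 2.3660     # realized INF in 2005:Q1

dinf_forecast = beta0_hat + beta1_hat * dinf_2004q4   # Eq. (9.11)
inf_forecast = inf_2004q4 + dinf_forecast             # Eq. (9.8)
forecast_error = inf_2005q1 - inf_forecast

print(f"{dinf_forecast:.4f} {inf_forecast:.4f} {forecast_error:.4f}")
# prints: -0.4975 3.0076 -0.6416
```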

270

Next:

• Extension of the AR(1) model by including potentially useful information in more distant past values of the time series

Definition 9.5: (pth-order autoregressive model)

The pth-order autoregressive model (the AR(p) model) represents $Y_t$ as a linear function of $p$ of its lagged values:

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + u_t, \qquad (9.12)$$

where $E(u_t \mid Y_{t-1}, Y_{t-2}, \dots) = 0$. The number of lags $p$ is called the order, or the lag length, of the autoregression.

271

Implications of the assumption $E(u_t \mid Y_{t-1}, Y_{t-2}, \dots) = 0$:

1. The best forecast of $Y_{T+1}$ based on its entire history depends only on the most recent $p$ past values

It can be shown that if $Y_t$ follows an AR(p) model, then the best forecast (in the sense of having the smallest RMSFE) of $Y_{T+1}$ based on $Y_T, Y_{T-1}, \dots$ is

$$Y_{T+1|T} = \beta_0 + \beta_1 Y_T + \beta_2 Y_{T-1} + \dots + \beta_p Y_{T-p+1} \qquad (9.13)$$

Since the coefficients $\beta_0, \dots, \beta_p$ are unknown, we use the forecast from Eq. (9.13) with estimated coefficients

2. The errors $u_t$ are serially uncorrelated
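Estimating an AR(p) model by OLS only requires stacking the $p$ lags into a regressor matrix; the forecast then applies Eq. (9.13) with the estimated coefficients. A NumPy sketch follows; the function names and the simulated AR(2) coefficients are illustrative assumptions.

```python
import numpy as np

def lag_matrix(y, p):
    """Regressors [1, Y_{t-1}, ..., Y_{t-p}] and target Y_t for t = p+1, ..., T."""
    y = np.asarray(y, dtype=float)
    T = y.size
    cols = [np.ones(T - p)]
    for j in range(1, p + 1):
        cols.append(y[p - j:T - j])      # j-th lag, aligned with Y_{p+1}, ..., Y_T
    return np.column_stack(cols), y[p:]

def fit_ar(y, p):
    """OLS estimates [beta0_hat, beta1_hat, ..., betap_hat] of the AR(p) model (9.12)."""
    X, target = lag_matrix(y, p)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def forecast_ar(coef, y):
    """Eq. (9.13) with estimated coefficients: beta0 + beta1*Y_T + ... + betap*Y_{T-p+1}."""
    p = len(coef) - 1
    recent = np.asarray(y, dtype=float)[::-1][:p]   # Y_T, Y_{T-1}, ..., Y_{T-p+1}
    return coef[0] + float(np.dot(coef[1:], recent))

# Illustration on a simulated AR(2) series (assumed coefficients 0.2, 0.5, -0.3)
rng = np.random.default_rng(1)
y = [0.0, 0.0]
for _ in range(500):
    y.append(0.2 + 0.5 * y[-1] - 0.3 * y[-2] + rng.standard_normal())
coef = fit_ar(y, p=2)
print("estimates:", np.round(coef, 3), "forecast:", round(forecast_ar(coef, y), 3))
```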

272

Fit of an AR(4) model to ∆INFt

273



Variable     Coefficient   Std. Error   t-Statistic   Prob.
C             0.002142     0.075830      0.028241     0.9775
AR(1)        -0.280006     0.074085     -3.779518     0.0002
AR(2)        -0.314222     0.076197     -4.123837     0.0001
AR(3)         0.139577     0.076431      1.826198     0.0695
AR(4)        -0.033850     0.074372     -0.455150     0.6495

R-squared             0.200155   Mean dependent var       0.004425
Adjusted R-squared    0.182478   S.D. dependent var       1.702383
S.E. of regression    1.539242   Akaike info criterion    3.726971
Sum squared resid     428.8372   Schwarz criterion        3.813685
Log likelihood       -341.6083   Hannan-Quinn criter.     3.762111
F-statistic           11.32343   Durbin-Watson stat       2.006627
Prob(F-statistic)     0.000000

Inverted AR Roots    .19+.18i   .19-.18i   -.33+.62i   -.33-.62i

Estimation results:

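• Estimated equation (standard errors in parentheses):

$$\widehat{\Delta INF}_t = \underset{(0.0758)}{0.0021} - \underset{(0.0741)}{0.2800}\,\Delta INF_{t-1} - \underset{(0.0762)}{0.3142}\,\Delta INF_{t-2} + \underset{(0.0764)}{0.1396}\,\Delta INF_{t-3} - \underset{(0.0744)}{0.0339}\,\Delta INF_{t-4} \qquad (9.14)$$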

• The coefficients on $\Delta INF_{t-2}, \Delta INF_{t-3}, \Delta INF_{t-4}$ are jointly statistically different from zero (F-statistic = 10.4249, p-value < 0.0001)

• The adjusted $\bar{R}^2$ improves from 0.0638 for the AR(1) model on Slide 265 to 0.1825

• The SER improves (falls) from 1.6560 for the AR(1) model on Slide 265 to 1.5392

274

Inflation forecast for 2005:Q1:

• Recall Eq. (9.8) on Slide 269 with T = 2004:Q4

$$\widehat{INF}_{T+1|T} = INF_T + \widehat{\Delta INF}_{T+1|T}$$

• From the data set we have

$$INF_{2004:Q4} = 3.5051, \quad \Delta INF_{2004:Q4} = 1.8824, \quad \Delta INF_{2004:Q3} = -2.7132,$$

$$\Delta INF_{2004:Q2} = 0.5301, \quad \Delta INF_{2004:Q1} = 2.9390$$

• Using the estimates in Eq. (9.14) on Slide 274, we obtain

$$\widehat{\Delta INF}_{2005:Q1|2004:Q4} = 0.0021 - 0.2800 \times 1.8824 - 0.3142 \times (-2.7132) + 0.1396 \times 0.5301 - 0.0339 \times 2.9390 = 0.3019$$

275

Inflation forecast for 2005:Q1: [continued]

• From Eq. (9.8) we thus have

$$\widehat{INF}_{2005:Q1|2004:Q4} = 3.5051 + 0.3019 = 3.8070\%$$

• Accuracy of the forecast:

From the data set we find that $INF_{2005:Q1} = 2.3660\%$

−→ The forecast error is

$$INF_{2005:Q1} - \widehat{INF}_{2005:Q1|2004:Q4} = 2.3660 - 3.8070 = -1.441\%$$
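As in the AR(1) example, the AR(4) forecast arithmetic can be retraced in Python from the rounded values reported on the slides:

```python
# Coefficients of Eq. (9.14) and the data values reported on the slides
coef = [0.0021, -0.2800, -0.3142, 0.1396, -0.0339]   # beta0, beta1, ..., beta4
recent_dinf = [1.8824, -2.7132, 0.5301, 2.9390]      # dINF for 2004:Q4, Q3, Q2, Q1
inf_2004q4, inf_2005q1 = 3.5051, 2.3660

# Eq. (9.13) applied to dINF: beta0 + beta1*dINF_T + ... + beta4*dINF_{T-3}
dinf_forecast = coef[0] + sum(b * x for b, x in zip(coef[1:], recent_dinf))
inf_forecast = inf_2004q4 + dinf_forecast             # Eq. (9.8)
forecast_error = inf_2005q1 - inf_forecast

print(f"{dinf_forecast:.4f} {inf_forecast:.4f} {forecast_error:.4f}")
# prints: 0.3019 3.8070 -1.4410
```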

276

Remark:

• Surprisingly, the AR(4) forecast error (−1.441) is larger in absolute value than the AR(1) forecast error (−0.6416), even though the AR(4) model fits better in sample (to be explained in Section 9.3)

277

9.3. Time series regression with additional predictors and the autoregressive distributed lag model

Next:

• Variables other than past values of $Y$ may help to forecast the variable of interest

• These variables (called predictors) should be included on the right-hand side of the autoregression Eq. (9.12) on Slide 271

−→ Autoregressive distributed lag models

Example:

• Forecasting changes in the inflation rate using past unemployment rates (short-run Phillips curve)

• We denote the unemployment rate in EViews by UNEMP

278

Change in the U.S. CPI inflation rate between year t and year t + 1 versus the unemployment rate in year t

279

Fit of an AR(4) model plus UNEMPt−1 to ∆INFt

280



Variable     Coefficient   Std. Error   t-Statistic   Prob.
C             0.634881     0.307273      2.066183     0.0402
UNEMP(-1)    -0.106746     0.050459     -2.115503     0.0358
AR(1)        -0.303283     0.074401     -4.076311     0.0001
AR(2)        -0.344885     0.077887     -4.428021     0.0000
AR(3)         0.108898     0.078209      1.392391     0.1655
AR(4)        -0.051430     0.075145     -0.684419     0.4946



Estimation results:

• The lagged predictor $UNEMP_{t-1}$ is significant at the 5% level

• Improvement of the adjusted $\bar{R}^2$ from 0.1825 for the pure AR(4) model to 0.1974

• Using the estimates from Slide 280 and the data set including the observations for UNEMP, we compute the inflation forecast for 2005:Q1 as

$$\widehat{INF}_{2005:Q1|2004:Q4} = 3.8318\%$$

−→ The forecast error is

$$INF_{2005:Q1} - \widehat{INF}_{2005:Q1|2004:Q4} = 2.3660 - 3.8318 = -1.4658\%$$

281

Fit of an AR(4) model plus (UNEMPt−1, . . . , UNEMPt−4) to ∆INFt

282



Variable     Coefficient   Std. Error   t-Statistic   Prob.
C             0.598842     0.249966      2.395698     0.0176
UNEMP(-1)    -2.272813     0.407136     -5.582439     0.0000
UNEMP(-2)     3.940038     0.945639      4.166535     0.0000
UNEMP(-3)    -2.501335     0.928075     -2.695186     0.0077
UNEMP(-4)     0.733724     0.388328      1.889446     0.0605
AR(1)        -0.468472     0.075819     -6.178804     0.0000
AR(2)        -0.348231     0.083839     -4.153555     0.0001
AR(3)        -0.043275     0.083751     -0.516707     0.6060
AR(4)        -0.052008     0.074859     -0.694752     0.4881



Estimation results:

• The predictor lags $UNEMP_{t-1}$, $UNEMP_{t-2}$, $UNEMP_{t-3}$ are individually significant at the 1% level, $UNEMP_{t-4}$ at the 10% level

• Substantial improvement of the adjusted $\bar{R}^2$ from 0.1974 to 0.3079

• Using the estimates from Slide 282 and the data set including the observations for UNEMP, we compute the inflation forecast for 2005:Q1 as

$$\widehat{INF}_{2005:Q1|2004:Q4} = 3.4408\%$$

−→ The forecast error is

$$INF_{2005:Q1} - \widehat{INF}_{2005:Q1|2004:Q4} = 2.3660 - 3.4408 = -1.0748\%$$

283

Now:

• Formal definition of this autoregressive model including one additional predictor

Definition 9.6: (Autoregressive distributed lag model)

The autoregressive distributed lag model with $p$ lags of $Y_t$ and $q$ lags of the predictor $X_t$, denoted by ADL(p, q), is

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \delta_1 X_{t-1} + \delta_2 X_{t-2} + \dots + \delta_q X_{t-q} + u_t, \qquad (9.15)$$

where $\beta_0, \beta_1, \dots, \beta_p, \delta_1, \dots, \delta_q$ are unknown coefficients and $u_t$ is the error term with $E(u_t \mid Y_{t-1}, Y_{t-2}, \dots, X_{t-1}, X_{t-2}, \dots) = 0$.

284

Remarks:

• The assumption $E(u_t \mid Y_{t-1}, Y_{t-2}, \dots, X_{t-1}, X_{t-2}, \dots) = 0$ implies that no additional lags of either $Y$ or $X$ belong in the ADL model (the lag lengths $p$ and $q$ are the true lag lengths)

• The ADL model contains

lags of the dependent variable (autoregressive component)

a distributed lag of a single additional predictor $X$

• In general, forecasts can be improved by using multiple predictors

285

Stationarity:

• Forecasting future values of a time series $Y_t$ based on past relationships implicitly requires that these relationships remain stable over time

−→ Concept of stationarity

Definition 9.7: (Stationarity)

A time series $Y_t$ is stationary if its probability distribution does not change over time, that is, if the joint distribution of $(Y_{s+1}, Y_{s+2}, \dots, Y_{s+T})$ does not depend on $s$ regardless of the value of $T$; otherwise, $Y_t$ is said to be nonstationary. A pair of time series, $X_t$ and $Y_t$, are said to be jointly stationary if the joint distribution of $(X_{s+1}, Y_{s+1}, X_{s+2}, Y_{s+2}, \dots, X_{s+T}, Y_{s+T})$ does not depend on $s$ regardless of the value of $T$. Stationarity requires the future to be like the past, at least in a probabilistic sense.
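Stationarity can be illustrated by simulation. The NumPy sketch below is an illustration under stated assumptions, not part of the slides: it compares a stationary AR(1) process with $\beta_1 = 0.5$ and a random walk ($\beta_1 = 1$), a standard example of a nonstationary series, both with $N(0,1)$ errors. Across many simulated paths, the variance of $Y_t$ stays at $1/(1 - 0.5^2) \approx 1.33$ for the AR(1) process but grows with $t$ for the random walk, so the random walk's distribution changes over time.

```python
import numpy as np

def simulate_paths(beta1, n_paths, T, rng):
    """Simulate n_paths independent paths of Y_t = beta1 * Y_{t-1} + u_t, u_t ~ N(0, 1)."""
    y = np.zeros((n_paths, T))
    if abs(beta1) < 1:
        # stationary AR(1): draw Y_1 from its stationary distribution N(0, 1 / (1 - beta1^2))
        y[:, 0] = rng.standard_normal(n_paths) / np.sqrt(1.0 - beta1**2)
    else:
        # nonstationary case (used here only with beta1 = 1, a random walk): Var(Y_t) = t
        y[:, 0] = rng.standard_normal(n_paths)
    for t in range(1, T):
        y[:, t] = beta1 * y[:, t - 1] + rng.standard_normal(n_paths)
    return y

rng = np.random.default_rng(42)
for beta1 in (0.5, 1.0):
    paths = simulate_paths(beta1, n_paths=4000, T=200, rng=rng)
    # cross-sectional variance across paths at date t = 50 versus date t = 200
    v50, v200 = paths[:, 49].var(), paths[:, 199].var()
    print(f"beta1 = {beta1}: Var(Y_50) = {v50:.2f}, Var(Y_200) = {v200:.2f}")
```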

286

Definition 9.8: (Time series regression with multiple predictors)

The general time series regression model allows for $k$ additional predictors $X_1, \dots, X_k$, with $q_1$ included lags of $X_1$, $q_2$ included lags of $X_2$, and so forth:

$$\begin{aligned} Y_t = \beta_0 &+ \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} \\ &+ \delta_{11} X_{1,t-1} + \delta_{12} X_{1,t-2} + \dots + \delta_{1q_1} X_{1,t-q_1} \\ &+ \dots + \delta_{k1} X_{k,t-1} + \delta_{k2} X_{k,t-2} + \dots + \delta_{kq_k} X_{k,t-q_k} + u_t, \end{aligned} \qquad (9.16)$$

where

1. $E(u_t \mid Y_{t-1}, Y_{t-2}, \dots, X_{1,t-1}, X_{1,t-2}, \dots, X_{k,t-1}, X_{k,t-2}, \dots) = 0$.

2. The random variables

$(Y_t, X_{1t}, \dots, X_{kt})$ have a stationary distribution.

$(Y_t, X_{1t}, \dots, X_{kt})$ and $(Y_{t-j}, X_{1,t-j}, \dots, X_{k,t-j})$ become independent as $j$ gets large.

287

Definition 9.8: [continued]

3. Large outliers are unlikely: $X_{1t}, \dots, X_{kt}$ and $Y_t$ have nonzero, finite fourth moments.

4. There is no perfect multicollinearity.

Remarks:

• The first part of Assumption #2 requires that the distribution of the data today is the same as its distribution in the past

• The second part of Assumption #2 requires that the random variables become independently distributed when the amount of time separating them becomes large

−→ Both parts replace the cross-sectional OLS Assumption #2 on Slide 18

288

Statistical inference:

• Given the assumptions in Definition 9.8, we can apply OLS in the usual way to make inferences about the regression coefficients

• We can use the F-statistic to test whether the lags of one of the included regressors have useful predictive content

−→ Granger causality tests

Definition 9.9: (Granger causality test)

The Granger causality statistic is the F-statistic testing the hypothesis that the coefficients on all the lagged values of one of the variables in Eq. (9.16) are simultaneously equal to zero (for example, the coefficients on $X_{1,t-1}, X_{1,t-2}, \dots, X_{1,t-q_1}$). This null hypothesis implies that these regressors have no predictive content for $Y_t$ beyond that contained in the other regressors.

289

Remarks:

• Granger causality means that if $X$ Granger-causes $Y$, then $X$ is a useful predictor of $Y$, given the other variables in the regression

• A more accurate term than Granger causality would be Granger predictability

Example:

• Consider the relationship between $\Delta INF_t$, its past values, and past values of UNEMP on Slide 282

• Test the null hypothesis on the UNEMP coefficients

$$H_0: \delta_{11} = 0, \; \delta_{12} = 0, \; \delta_{13} = 0, \; \delta_{14} = 0$$

290


Example: [continued]

• F-statistic: 11.4662, p-value < 0.0001

• UNEMP appears to contain information that is useful for forecasting the change in the inflation rate
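The Granger causality F-statistic compares the sum of squared residuals (SSR) of the regression with and without the lags of the candidate predictor. A minimal NumPy sketch of the homoskedasticity-only version follows; the function names and the simulated data are assumptions for illustration, and the value reported on the slides may have been computed with a different variant of the test.

```python
import numpy as np

def ols_ssr(X, y):
    """Sum of squared OLS residuals of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def granger_f(y, x, p, q):
    """Homoskedasticity-only F-statistic for H0: delta_1 = ... = delta_q = 0 in
    Y_t = b0 + b1*Y_{t-1} + ... + bp*Y_{t-p} + d1*X_{t-1} + ... + dq*X_{t-q} + u_t."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T = y.size
    m = max(p, q)                                   # common sample: t = m+1, ..., T
    target = y[m:]
    const = [np.ones(T - m)]
    y_lags = [y[m - j:T - j] for j in range(1, p + 1)]
    x_lags = [x[m - j:T - j] for j in range(1, q + 1)]
    X_u = np.column_stack(const + y_lags + x_lags)  # unrestricted model
    X_r = np.column_stack(const + y_lags)           # restricted model: no lags of X
    ssr_u, ssr_r = ols_ssr(X_u, target), ols_ssr(X_r, target)
    n, k_u = X_u.shape
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_u))

# Illustration on simulated data in which X does help to predict Y (assumed design)
rng = np.random.default_rng(0)
T = 400
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()
F = granger_f(y, x, p=2, q=2)
print(f"Granger F-statistic (q = 2 restrictions): {F:.2f}")
```

Under the null hypothesis the statistic is compared with critical values of the F distribution with $q$ and $n - k_u$ degrees of freedom.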

Forecast uncertainty:

• We consider the RMSFE defined in Eq. (9.6) on Slide 267 as a measure of the uncertainty of a forecast

• In general, the RMSFE consists of two components:

uncertainty arising from the estimation of the regression coefficients

uncertainty about the future unknown value of ut

291

Example:

• Consider an ADL(1,1) model with a single predictor:

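$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \delta_1 X_{t-1} + u_t$$

• Assume further that $u_t$ is homoskedastic

• The forecast of $Y_{T+1}$ is

$$\hat{Y}_{T+1|T} = \hat{\beta}_0 + \hat{\beta}_1 Y_T + \hat{\delta}_1 X_T$$

−→ The forecast error is

$$Y_{T+1} - \hat{Y}_{T+1|T} = u_{T+1} - \left[(\hat{\beta}_0 - \beta_0) + (\hat{\beta}_1 - \beta_1) Y_T + (\hat{\delta}_1 - \delta_1) X_T\right] \qquad (9.17)$$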

292

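Example: [continued]

• Since $u_t$ is homoskedastic, $u_{T+1}$ has variance $\sigma_u^2$, and it can be shown that

$$\text{MSFE} = E\left[\left(Y_{T+1} - \hat{Y}_{T+1|T}\right)^2\right] = \sigma_u^2 + \operatorname{Var}\left[(\hat{\beta}_0 - \beta_0) + (\hat{\beta}_1 - \beta_1) Y_T + (\hat{\delta}_1 - \delta_1) X_T\right], \qquad (9.18)$$

so that $\text{RMSFE} = \sqrt{\text{MSFE}}$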

Forecast uncertainty: [continued]

• The term $\sigma_u^2$ appearing in Eq. (9.18) can be estimated by the square of the standard error of the regression defined on Slide 21:

$$\hat{\sigma}_u^2 = \text{SER}^2 = \frac{1}{T - 3} \sum_{t=1}^{T} \hat{u}_t^2$$

293

Forecast uncertainty: [continued]

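• The second term on the right-hand side of Eq. (9.18) can be estimated using specific statistical techniques (not discussed here)

• Adding the two estimates yields an estimate of the MSFE; its square root is the estimated RMSFE, $\widehat{\text{RMSFE}}$

• In practice, if the errors are approximately normally distributed, a 95% forecast interval for $Y_{T+1}$ is approximately given by

$$\hat{Y}_{T+1|T} \pm 1.96 \cdot \widehat{\text{RMSFE}}$$

• Eq. (9.18) shows that highly parameterized models may produce larger forecast errors than parsimonious models, because estimating more coefficients tends to increase the estimation-error term (see our previous examples)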
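A small Python helper for the 95% forecast interval above. For illustration only, it plugs in the AR(1) forecast of 3.0076% for 2005:Q1 and uses the AR(1) SER of 1.6560 as a stand-in for $\widehat{\text{RMSFE}}$; this ignores the estimation-error term in Eq. (9.18) and therefore understates the RMSFE somewhat. Because $INF_T$ is known at time $T$, the error of the inflation forecast equals the error of the $\Delta INF$ forecast, so the SER of the $\Delta INF$ regression is the relevant scale.

```python
def forecast_interval(point_forecast, rmsfe, z=1.96):
    """Approximate 95% forecast interval: forecast +/- z * RMSFE (assumes roughly normal errors)."""
    half_width = z * rmsfe
    return point_forecast - half_width, point_forecast + half_width

# Illustration (assumption): RMSFE approximated by the AR(1) SER of 1.6560,
# i.e. ignoring the estimation-error term of Eq. (9.18)
low, high = forecast_interval(3.0076, 1.6560)
print(f"[{low:.2f}, {high:.2f}]")
# prints: [-0.24, 6.25]
```

The realized value of 2.3660% lies well inside this wide interval, which illustrates how uncertain one-quarter-ahead inflation forecasts are.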

294

9.4. Lag length selection using information criteria

Important practical issue:

• How many lags should be included in a time series regression?


• Presentation of statistical methods for choosing the number of lags in

an autoregression

a time series regression with multiple predictors

295

9.4.1. Determining the order of an autoregression

Potential trade-off:

• If the order p of an estimated autoregression is

too low, we omit potentially valuable information contained in the more distant lagged values

too high, we estimate more coefficients than necessary, thus introducing additional estimation error into our forecasts

Two approaches:

• t-statistic approach

• Use of information criteria

296

t-statistic approach:

• Consider a model with many lags (that is, with a high value of $p$) and perform hypothesis tests on the final lag

• Example:

Start by estimating an AR(6) model and test whether the coefficient on the sixth lag is significant at the 5% level

if not, drop it and estimate an AR(5) model, test the significance of the fifth lag, and so forth

Remarks:

• The t-statistic approach tends to produce too large a model

297

Remarks: [continued]

• Reasoning:

Even if the true AR order is 5 (that is, $\beta_6 = 0$), a t-test at the 5% level will incorrectly reject the null hypothesis

$$H_0: \beta_6 = 0$$

5% of the time by chance

−→ When the true value of $p$ is five, this approach will estimate $p$ to be six 5% of the time

• Some textbooks suggest the t-statistic approach starting with the order $p = 0$ and then successively including AR terms whenever the t-statistic indicates significance at the 5% level (modeling from small to large)

• This is not a recommended procedure because of potential omitted-variable bias

298

Information criteria:

• Information criteria are measures reflecting the trade-off described on Slide 296 in the selection of the order $p$ of an autoregression

• We can estimate the order $p$ of an autoregression by minimizing such an information criterion

• The most popular criteria are the Schwarz (SIC) and the Akaike (AIC) information criteria

• Both the SIC and the AIC are based on the sum of squared residuals (SSR) of the AR(p) model estimated by OLS:

$$\text{SSR}(p) = \sum_{t=1}^{T} \hat{u}_t^2 = \sum_{t=1}^{T} \left(Y_t - \hat{\beta}_0 - \hat{\beta}_1 Y_{t-1} - \dots - \hat{\beta}_p Y_{t-p}\right)^2$$

(see Slide 22)

299

Definition 9.10: (Schwarz, Akaike information criteria)

The Schwarz (SIC) and the Akaike (AIC) information criteria of an AR(p) model estimated by OLS are defined, respectively, as

$$\text{SIC}(p) = \ln\left[\frac{\text{SSR}(p)}{T}\right] + (p + 1) \, \frac{\ln(T)}{T}, \qquad (9.19)$$

$$\text{AIC}(p) = \ln\left[\frac{\text{SSR}(p)}{T}\right] + (p + 1) \, \frac{2}{T}. \qquad (9.20)$$

The SIC and the AIC estimators of $p$ are the values that minimize SIC(p) and AIC(p), respectively, among the possible choices $p = 0, 1, \dots, p_{\max}$, where $p_{\max}$ is the largest value of $p$ considered and $p = 0$ corresponds to the model that contains only an intercept.
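Definition 9.10 can be sketched in Python with NumPy. The function below is hypothetical (not an EViews command); following the remark on using a common sampling period, it estimates every AR(p) candidate over the same observations $t = p_{\max}+1, \dots, T$. The simulated AR(2) series is an assumption for illustration.

```python
import numpy as np

def ic_ar(y, p_max):
    """SIC(p) and AIC(p), Eqs. (9.19)-(9.20), for p = 0, ..., p_max.
    All AR(p) models are estimated by OLS over the same sample t = p_max+1, ..., T."""
    y = np.asarray(y, dtype=float)
    n_total = y.size
    target = y[p_max:]
    T = target.size                          # common number of observations
    sic, aic, ssr = [], [], []
    for p in range(p_max + 1):
        cols = [np.ones(T)] + [y[p_max - j:n_total - j] for j in range(1, p + 1)]
        X = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        s = float(np.sum((target - X @ coef) ** 2))
        ssr.append(s)
        sic.append(np.log(s / T) + (p + 1) * np.log(T) / T)
        aic.append(np.log(s / T) + (p + 1) * 2.0 / T)
    return np.array(sic), np.array(aic), np.array(ssr)

# Illustration on a simulated AR(2) series (assumed coefficients 0.5 and -0.3)
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.standard_normal()
sic, aic, ssr = ic_ar(y, p_max=6)
print("SIC selects p =", int(np.argmin(sic)), "| AIC selects p =", int(np.argmin(aic)))
```

Since $\ln(T)/T > 2/T$ whenever $T > e^2 \approx 7.4$, the SIC penalty is the larger one, so on a common sample the SIC never selects a longer lag than the AIC.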

300

Information criteria: [continued]

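• The first term on the right-hand side of Eqs. (9.19) and (9.20) never increases (and typically decreases) when a lag is added to the autoregression, since the SSR cannot rise when a regressor is added

• The respective second terms, $(p+1) \cdot \ln(T)/T$ and $(p+1) \cdot 2/T$, necessarily increase when a lag is added to the autoregression (terms penalizing the inclusion of further lags)

−→ The two terms in Eqs. (9.19) and (9.20) reflect the trade-off

• Both information criteria are routinely computed by EViews (see the outputs on Slides 265, 273); EViews computes them from the Gaussian log likelihood, which shifts them by the constant $1 + \ln(2\pi)$ relative to Eqs. (9.19) and (9.20) and therefore leaves the selected $p$ unchanged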

301


• It can be proved that

−→ the SIC estimator of $p$ is consistent

−→ the AIC estimator of $p$ is not consistent (AIC overestimates $p$ with nonzero probability, even in large samples)

The Schwarz (SIC) and Akaike (AIC) information criteria of different AR(p) models for $\Delta INF_t$, 1958:Q4–2005:Q1 (T = 186)

302

AR order p   SIC(p)   AIC(p)
0            3.9111   3.8937
1            3.8659   3.8312
2            3.7700   3.7180
3            3.7778   3.7085
4            3.8046   3.7179
5            3.8319   3.7278

The SIC is minimized at p = 2 and the AIC at p = 3.

Remark:

• The SIC and AIC estimates of $p$ should be determined by running all autoregressions involved over the same sampling period (thus using the same number of observations)

Example:

• Our U.S. inflation data set originally covers the sampling period 1957:Q1–2005:Q1 (T = 193)

• Since we estimate, inter alia, an AR(5) model involving the highest lag $\Delta INF_{t-5}$, the feasible sampling period shrinks to 1958:Q4–2005:Q1 (T = 186)

• It is this period, 1958:Q4–2005:Q1 with T = 186, that should also be used to compute the SIC and AIC values in all other autoregressions with $p \le 4$

303

9.4.2. Lag length selection in time series regression with multiple predictors

Potential trade-off here:

• The choice of the number of predictors and of the corresponding lag lengths must balance

the benefit of using additional information

the cost of estimating additional coefficients

Two approaches:

• F -statistic approach

• Information criteria

304

F -statistic approach:

• Use the F-statistic to test joint hypotheses that sets of coefficients are simultaneously equal to zero

• Example:

Consider Eq. (9.16) on Slide 287

Use the F -statistic to test the null hypothesis

$$H_0: \delta_{11} = 0, \; \delta_{12} = 0, \; \dots, \; \delta_{1q_1} = 0$$

(the predictor $X_1$ has no predictive content)

• Similar to the t-statistic approach, the F-statistic approach tends to produce models that are too large

305

Information criteria:

• Consider the general time series regression model with multiple predictors defined in Eq. (9.16) on Slide 287

• Denote the number of all coefficients (including the intercept) by $K$

• The SIC and AIC information criteria are then modified as

$$\text{SIC}(K) = \ln\left[\frac{\text{SSR}(K)}{T}\right] + K \, \frac{\ln(T)}{T}$$

$$\text{AIC}(K) = \ln\left[\frac{\text{SSR}(K)}{T}\right] + K \, \frac{2}{T}$$

• Evaluate the SIC (or AIC) for each candidate model

• The model with the minimal value of SIC (or AIC) is the preferred model

306


• Two important practical considerations:

Again, all candidate models must be estimated over the same sampling period (see Slide 303)

When there are multiple predictors, the approach becomes computationally demanding, since it requires estimating many different models (many combinations of the lag parameters)

−→ Convenient shortcut: require all the regressors to have the same number of lags, that is, require that $p = q_1 = \dots = q_k$, so that only $p_{\max} + 1$ models need to be compared (corresponding to $p = 0, 1, \dots, p_{\max}$)
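The common-lag shortcut can be sketched in Python with NumPy. The function below is hypothetical (not an EViews command); it applies the SIC(K) formula of the previous slide with $K = 1 + (k+1)p$ coefficients, estimates every candidate over the same sample, and uses simulated data with one predictor purely as an assumed illustration.

```python
import numpy as np

def select_common_lag(y, xs, p_max):
    """Shortcut with a common lag length p = q_1 = ... = q_k for Y and all predictors in xs.
    Returns the SIC-minimizing p and SIC(K) for p = 0, ..., p_max; every candidate model
    is estimated by OLS over the same sample t = p_max+1, ..., T."""
    y = np.asarray(y, dtype=float)
    xs = [np.asarray(x, dtype=float) for x in xs]
    n_total = y.size
    target = y[p_max:]
    T = target.size
    sic = []
    for p in range(p_max + 1):
        cols = [np.ones(T)]
        for series in [y] + xs:              # p lags of Y and p lags of each predictor
            cols += [series[p_max - j:n_total - j] for j in range(1, p + 1)]
        X = np.column_stack(cols)
        K = X.shape[1]                       # number of coefficients incl. the intercept
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        ssr = float(np.sum((target - X @ coef) ** 2))
        sic.append(np.log(ssr / T) + K * np.log(T) / T)
    sic = np.array(sic)
    return int(np.argmin(sic)), sic

# Illustration: simulated Y driven by its own first lag and the first lag of one predictor X
rng = np.random.default_rng(3)
n = 400
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.4 * y[t - 1] + 0.7 * x[t - 1] + rng.standard_normal()
p_hat, sic_values = select_common_lag(y, [x], p_max=4)
print("selected common lag length:", p_hat)
```

With $k$ predictors, only $p_{\max} + 1$ regressions are run instead of one for every combination of $(p, q_1, \dots, q_k)$.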

307

Thank you for your attention!

308
