Analysis of Time Series Data Using R 1

ZONGWU CAI a,b,c

E-mail address: [email protected]

a Department of Mathematics & Statistics and Department of Economics, University of North Carolina, Charlotte, NC 28223, U.S.A.
b Wang Yanan Institute for Studies in Economics, Xiamen University, China
c College of Economics and Management, Shanghai Jiaotong University, China

July 30, 2006

© 2006, ALL RIGHTS RESERVED by ZONGWU CAI

1 This manuscript may be printed and reproduced for individual or instructional use, but may not be printed for commercial purposes.


Preface

These lecture notes are designed to provide an overview of methods that are useful for analyzing univariate and multivariate phenomena measured over time. Since this course emphasizes applications alongside theory, the reader is guided through examples involving real time series in the lectures. A collection of simple theoretical and applied exercises, assuming a background that includes a beginning level course in mathematical statistics and some computing skills, follows each chapter. More importantly, the computer code in R and the datasets are provided for most of the examples analyzed in these lecture notes.

Some materials are based on the lecture notes given by Professor Robert H. Shumway, Department of Statistics, University of California at Davis and my colleague, Professor Stanislav Radchenko, Department of Economics, University of North Carolina at Charlotte. Some datasets are provided by Professor Robert H. Shumway, Department of Statistics, University of California at Davis and Professor Philip Hans Franses at University of Rotterdam, the Netherlands. I am very grateful to them for providing their lecture notes and datasets.


Contents

1 Package R and Simple Applications
1.1 Computational Toolkits
1.2 How to Install R?
1.3 Data Analysis and Graphics Using R – An Introduction (109 pages)
1.4 CRAN Task View: Empirical Finance
1.5 CRAN Task View: Computational Econometrics

2 Characteristics of Time Series
2.1 Introduction
2.2 Stationary Time Series
2.2.1 Detrending
2.2.2 Differencing
2.2.3 Transformations
2.2.4 Linear Filters
2.3 Other Key Features of Time Series
2.3.1 Seasonality
2.3.2 Aberrant Observations
2.3.3 Conditional Heteroskedasticity
2.3.4 Nonlinearity
2.4 Time Series Relationships
2.4.1 Autocorrelation Function
2.4.2 Cross Correlation Function
2.4.3 Partial Autocorrelation Function
2.5 Problems
2.6 Computer Code
2.7 References

3 Univariate Time Series Models
3.1 Introduction
3.2 Least Squares Regression
3.3 Model Selection Methods
3.3.1 Subset Approaches
3.3.2 Sequential Methods
3.3.3 Likelihood Based-Criteria
3.3.4 Cross-Validation and Generalized Cross-Validation
3.3.5 Penalized Methods
3.4 Integrated Models - I(1)
3.5 Autoregressive Models - AR(p)
3.5.1 Model
3.5.2 Forecasting
3.6 Moving Average Models – MA(q)
3.7 Autoregressive Integrated Moving Average Model - ARIMA(p, d, q)
3.8 Seasonal ARIMA Models
3.9 Regression Models With Correlated Errors
3.10 Estimation of Covariance Matrix
3.11 Long Memory Models
3.12 Periodicity and Business Cycles
3.13 Impulse Response Function
3.13.1 First Order Difference Equations
3.13.2 Higher Order Difference Equations
3.14 Problems
3.15 Computer Code
3.16 References

4 Non-stationary Processes and Structural Breaks
4.1 Introduction
4.2 Random Walks
4.2.1 Inappropriate Detrending
4.2.2 Spurious (nonsense) Regressions
4.3 Unit Root and Stationary Processes
4.3.1 Comparison of Forecasts of TS and DS Processes
4.3.2 Random Walk Components and Stochastic Trends
4.4 Trend Estimation and Forecasting
4.4.1 Forecasting a Deterministic Trend
4.4.2 Forecasting a Stochastic Trend
4.4.3 Forecasting ARMA models with Deterministic Trends
4.4.4 Forecasting of ARIMA Models
4.5 Unit Root Tests
4.5.1 The Dickey-Fuller and Augmented Dickey-Fuller Tests
4.5.2 Cautions
4.6 Structural Breaks
4.6.1 Testing for Breaks
4.6.2 Zivot and Andrews’s Testing Procedure
4.6.3 Cautions
4.7 Problems
4.8 Computer Code
4.9 References

5 Vector Autoregressive Models
5.1 Introduction
5.1.1 Properties of VAR Models
5.1.2 Statistical Inferences
5.2 Impulse-Response Function
5.3 Variance Decompositions
5.4 Granger Causality
5.5 Forecasting
5.6 Problems
5.7 References

6 Cointegration
6.1 Introduction
6.2 Cointegrating Regression
6.3 Testing for Cointegration
6.4 Cointegrated VAR Models
6.5 Problems
6.6 References

7 Nonparametric Density, Distribution & Quantile Estimation
7.1 Mixing Conditions
7.2 Density Estimate
7.2.1 Asymptotic Properties
7.2.2 Optimality
7.2.3 Boundary Correction
7.3 Distribution Estimation
7.3.1 Smoothed Distribution Estimation
7.3.2 Relative Efficiency and Deficiency
7.4 Quantile Estimation
7.4.1 Value at Risk
7.4.2 Nonparametric Quantile Estimation
7.5 Computer Code
7.6 References

8 Nonparametric Regression Estimation
8.1 Bandwidth Selection
8.1.1 Simple Bandwidth Selectors
8.1.2 Cross-Validation Method
8.2 Multivariate Density Estimation
8.3 Regression Function
8.4 Kernel Estimation
8.4.1 Asymptotic Properties
8.4.2 Boundary Behavior
8.5 Local Polynomial Estimate
8.5.1 Formulation
8.5.2 Implementation in R
8.5.3 Complexity of Local Polynomial Estimator
8.5.4 Properties of Local Polynomial Estimator
8.5.5 Bandwidth Selection
8.6 Functional Coefficient Model
8.6.1 Model
8.6.2 Local Linear Estimation
8.6.3 Bandwidth Selection
8.6.4 Smoothing Variable Selection
8.6.5 Goodness-of-Fit Test
8.6.6 Asymptotic Results
8.6.7 Conditions and Proofs
8.6.8 Monte Carlo Simulations and Applications
8.7 Additive Model
8.7.1 Model
8.7.2 Backfitting Algorithm
8.7.3 Projection Method
8.7.4 Two-Stage Procedure
8.7.5 Monte Carlo Simulations and Applications
8.8 Computer Code
8.9 References


List of Tables

3.1 AICC values for ten models for the recruits series
4.1 Large-sample critical values for the ADF statistic
4.2 Summary of DF test for unit roots in the absence of serial correlation
4.3 Critical values of the QLR statistic with 15% trimming
5.1 Sims variance decomposition in three variable VAR model
5.2 Sims variance decomposition including interest rates
6.1 Critical values for the Engle-Granger ADF statistic
8.1 Sample sizes required for p-dimensional nonparametric regression to have comparable performance with that of 1-dimensional nonparametric regression using size 100


List of Figures

2.1 Monthly SOI (left) and simulated recruitment (right) from a model (n = 453 months, 1950-1987).
2.2 Simulated MA(1) with θ1 = 0.9.
2.3 Log of annual indices of real national output in China, 1952-1988.
2.4 Monthly average temperature in degrees centigrade, January, 1856 - February 2005, n = 1790 months. The straight line (wide and green) is the linear trend y = −9.037 + 0.0046 t and the curve (wide and red) is the nonparametric estimated trend.
2.5 Detrended monthly global temperatures: left panel (linear) and right panel (nonlinear).
2.6 Differenced monthly global temperatures.
2.7 Annual stock of motor cycles in the Netherlands, 1946-1993.
2.8 Quarterly earnings for Johnson & Johnson (4th quarter, 1970 to 1st quarter, 1980, left panel) with log transformed earnings (right panel).
2.9 The SOI series (black solid line) compared with a 12 point moving average (red thicker solid line). The top panel: original data and the bottom panel: filtered series.
2.10 US Retail Sales Data from 1967-2000.
2.11 Four-weekly advertising expenditures on radio and television in The Netherlands, 1978.01-1994.13.
2.12 First difference in log prices versus the inflation rate: the case of Argentina, 1970.1-1989.4.
2.13 Japanese - U.S. dollar exchange rate return series {yt}, from January 1, 1974 to December 31, 2003.
2.14 Quarterly unemployment rate in Germany, 1962.1-1991.4 (seasonally adjusted and not seasonally adjusted) in the left panel. The scatterplot of unemployment rate (seasonally adjusted) versus unemployment rate (seasonally adjusted) one period lagged in the right panel.
2.15 Multiple lagged scatterplots showing the relationship between SOI and the present (xt) versus the lagged values (xt+h) at lags 1 ≤ h ≤ 16.
2.16 Autocorrelation functions of SOI and recruitment and cross correlation function between SOI and recruitment.
2.17 Multiple lagged scatterplots showing the relationship between the SOI at time t + h, say xt+h (x-axis) versus recruits at time t, say yt (y-axis), 0 ≤ h ≤ 15.
2.18 Multiple lagged scatterplots showing the relationship between the SOI at time t, say xt (x-axis) versus recruits at time t + h, say yt+h (y-axis), 0 ≤ h ≤ 15.
2.19 Partial autocorrelation functions for the SOI (left panel) and the recruits (right panel) series.
2.20 Varve data for Problem 5.
2.21 Gas and oil series for Problem 6.
2.22 Handgun sales (per 10,000,000) in California and monthly gun death rate (per 100,000) in California (February 2, 1980 - December 31, 1998).
3.1 Autocorrelation functions (ACF) for simple (left) and log (right) returns for IBM (top panels) and for the value-weighted index of US market (bottom panels), January 1926 to December 1997.
3.2 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended (top panel) and differenced (bottom panel) global temperature series.
3.3 A typical realization of the random walk series (left panel) and the first difference of the series (right panel).
3.4 Autocorrelation functions (ACF) (left) and partial autocorrelation functions (PACF) (right) for the random walk (top panel) and the first difference (bottom panel) series.
3.5 Autocorrelation (ACF) of residuals of AR(1) for SOI (left panel) and the plot of AIC and AICC values (right panel).
3.6 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log varve series (top panel) and the first difference (bottom panel), showing a peak in the ACF at lag h = 1.
3.7 Number of live births 1948(1)-1979(1) and residuals from models with a first difference, a first difference and a seasonal difference of order 12 and a fitted ARIMA(0, 1, 1) × (0, 1, 1)12 model.
3.8 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the birth series (top two panels), the first difference (second two panels), an ARIMA(0, 1, 0) × (0, 1, 1)12 model (third two panels) and an ARIMA(0, 1, 1) × (0, 1, 1)12 model (last two panels).
3.9 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log J&J earnings series (top two panels), the first difference (second two panels), ARIMA(0, 1, 0) × (1, 0, 0)4 model (third two panels), and ARIMA(0, 1, 1) × (1, 0, 0)4 model (last two panels).
3.10 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for ARIMA(0, 1, 1) × (0, 1, 1)4 model (top two panels) and the residual plots of ARIMA(0, 1, 1) × (1, 0, 0)4 (left bottom panel) and ARIMA(0, 1, 1) × (0, 1, 1)4 model (right bottom panel).
3.11 Monthly simple return of CRSP Decile 1 index from January 1960 to December 2003: Time series plot of the simple return (left top panel), time series plot of the simple return after adjusting for January effect (right top panel), the ACF of the simple return (left bottom panel), and the ACF of the adjusted simple return.
3.12 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended log J&J earnings series (top two panels) and the fitted ARIMA(0, 0, 0) × (1, 0, 0)4 residuals.
3.13 Time plots of U.S. weekly interest rates (in percentages) from January 5, 1962 to September 10, 1999. The solid line (black) is the Treasury 1-year constant maturity rate and the dashed line the Treasury 3-year constant maturity rate (red).
3.14 Scatterplots of U.S. weekly interest rates from January 5, 1962 to September 10, 1999: the left panel is 3-year rate versus 1-year rate, and the right panel is changes in 3-year rate versus changes in 1-year rate.
3.15 Residual series of linear regression Model I for two U.S. weekly interest rates: the left panel is time plot and the right panel is ACF.
3.16 Time plots of the change series of U.S. weekly interest rates from January 12, 1962 to September 10, 1999: changes in the Treasury 1-year constant maturity rate are denoted by black solid line, and changes in the Treasury 3-year constant maturity rate are indicated by red dashed line.
3.17 Residual series of the linear regression models: Model II (top) and Model III (bottom) for two change series of U.S. weekly interest rates: time plot (left) and ACF (right).
3.18 Sample autocorrelation function of the absolute series of daily simple returns for the CRSP value-weighted (left top panel) and equal-weighted (right top panel) indexes. The log spectral density of the absolute series of daily simple returns for the CRSP value-weighted (left bottom panel) and equal-weighted (right bottom panel) indexes.
3.19 The autocorrelation function of an AR(2) model: (a) φ1 = 1.2 and φ2 = −0.35, (b) φ1 = 1.0 and φ2 = −0.7, (c) φ1 = 0.2 and φ2 = 0.35, (d) φ1 = −0.2 and φ2 = 0.35.
3.20 The growth rate of US quarterly real GNP from 1947.II to 1991.I (seasonally adjusted and in percentage): the left panel is the time series plot and the right panel is the ACF.
3.21 The time series yt is generated with wt ∼ N(0, 1), y0 = 5. At period t = 50, there is an additional impulse to the error term, i.e. w̃50 = w50 + 1. The impulse response function is computed as the difference between the series yt without impulse and the series ỹt with the impulse.
3.22 The time series yt is generated with wt ∼ N(0, 1), y0 = 3. At period t = 50, there is an additional impulse to the error term, i.e. w̃50 = w50 + 1. The impulse response function is computed as the difference between the series yt without impulse and the series ỹt with the impulse.
3.23 Example of impulse response functions for first order difference equations.
3.24 The time series yt is generated with wt ∼ N(0, 1), y0 = 3. For the transitory impulse, there is an additional impulse to the error term at period t = 50, i.e. w̃50 = w50 + 1. For the permanent impulse, there is an additional impulse for period t = 50, · · ·, 100, i.e. w̃t = wt + 1, t = 50, 51, · · ·, 100. The impulse response function (IRF) is computed as the difference between the series yt without impulse and the series ỹt with the impulse.
3.25 Example of impulse response functions for second order difference equation.


Chapter 1

Package R and Simple Applications

1.1 Computational Toolkits

When you work with large datasets, messy data handling, models, etc., you need to choose computational tools that are useful for dealing with these kinds of problems. There are “menu driven systems” where you click some buttons and get some work done, but these are useless for anything nontrivial. To do serious economics and finance in the modern days, you have to write computer programs. And this is true of any field, for example, empirical macroeconomics, and not just of “computational finance”, which is a hot buzzword recently.

The question is how to choose the computational tools. According to Ajay Shah (December 2005), you should pay attention to four elements: price, freedom, elegant and powerful computer science, and network effects. Low price is better than high price. Price = 0 is obviously best of all. Freedom here has many aspects. A good software system is one that does not tie you down in terms of hardware/OS, so that you are able to keep moving. Another aspect of freedom is in working with colleagues, collaborators and students. With commercial software, this becomes a problem, because your colleagues may not have the same software that you are using. Here free software really wins spectacularly. Good practice in research involves a great accent on reproducibility. Reproducibility is important both so as to avoid mistakes, and because the next person working in your field should be standing on your shoulders. This requires an ability to release code. This is only possible with free software. Systems like SAS and Gauss use archaic computer science. The code is inelegant. The language is not powerful. In this day and age, writing C or Fortran by hand is “too low level”. Hell, with Gauss, even a minimal thing like online help is tawdry. One prefers a system to be built by people who know their computer science: it should be an elegant, powerful language. All standard CS knowledge should be nicely in play to give you a gorgeous system. Good computer science gives you more productive humans. Lots of economists use Gauss, and give out Gauss source code, so there is a network effect in favor of Gauss. A similar thing is right now happening with statisticians and R.

Here I cite comparisons among the most commonly used packages (see Ajay Shah (December 2005)); see the web site at http://www.mayin.org/ajayshah/COMPUTING/mytools.html.

R is a very convenient programming language for doing statistical analysis and Monte Carlo simulations as well as various applications in quantitative economics and finance. Indeed, we prefer to think of it as an environment within which statistical techniques are implemented. I will teach it at the introductory level, but NOTICE that you will have to learn R on your own. Note that about 97% of commands in S-PLUS and R are the same. In particular, for analyzing time series data, R has a lot of bundles and packages, which can be downloaded for free, for example, at http://www.r-project.org/.


R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

1.2 How to Install R?

(1) Go to the web site http://www.r-project.org/;
(2) click CRAN;
(3) choose a site for downloading, say http://cran.cnr.Berkeley.edu;
(4) click Windows (95 and later);
(5) click base;
(6) click R-2.3.1-win32.exe (version of 06-01-2006) to save this file first and then run it to install.
The basic R is now installed on your computer. If you need to install other packages, you need to do the following:
(7) After R is installed, there is an icon on the screen. Click the icon to get into R;
(8) go to the top, find Packages and click it;
(9) go down to Install package(s)... and click it;
(10) there is a new window. Choose a location to download the packages, say USA(CA1), move the mouse there and click OK;
(11) there is a new window listing all packages. You can select any one of the packages and click OK, or you can select all of them and then click OK.
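Alternatively, packages can be installed directly from the R console. A minimal sketch (using the tseries package purely as an example; any CRAN package name would do) is:

# install a package from CRAN at the R prompt (any mirror can be chosen)
install.packages("tseries", repos = "http://cran.r-project.org")

library(tseries)              # load the installed package
help(package = "tseries")     # list the functions it provides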


1.3 Data Analysis and Graphics Using R – An Introduction (109 pages)

See the file r-notes.pdf (109 pages), which can be downloaded from http://www.math.uncc.edu/~zcai/r-notes.pdf. I encourage you to download this file and learn it by yourself.

1.4 CRAN Task View: Empirical Finance

This CRAN Task View contains a list of packages useful for empirical work in Finance, grouped by topic. Besides these packages, a very wide variety of functions suitable for empirical work in Finance is provided by both the basic R system (and its set of recommended core packages), and a number of other packages on the Comprehensive R Archive Network (CRAN). Consequently, several of the other CRAN Task Views may contain suitable packages, in particular the Econometrics Task View. The web site is http://cran.r-project.org/src/contrib/Views/Finance.html

1. Standard regression models: Linear models such as ordinary least squares (OLS) can be estimated by lm() (from the stats package contained in the basic R distribution). Maximum Likelihood (ML) estimation can be undertaken with the optim() function. Non-linear least squares can be estimated with the nls() function, as well as with nlme() from the nlme package. For the linear model, a variety of regression diagnostic tests are provided by the car, lmtest, strucchange, urca, uroot, and sandwich packages. The Rcmdr and Zelig packages provide user interfaces that may be of interest as well.
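As an illustration of the basic calls listed above (a sketch on simulated data, not code from the task view itself), one might write:

# simulated data: y depends linearly on x plus noise
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

fit.ols <- lm(y ~ x)                        # ordinary least squares
summary(fit.ols)

# the same relationship written as an explicit nonlinear model for nls()
fit.nls <- nls(y ~ a + b * x, start = list(a = 0, b = 1))

library(lmtest)                             # diagnostic tests, if installed
dwtest(fit.ols)                             # Durbin-Watson test for serial correlation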


2. Time series: Classical time series functionality is provided by the arima() and KalmanLike() commands in the basic R distribution. The dse package provides a variety of more advanced estimation methods; fracdiff can estimate fractionally integrated series; longmemo covers related material. For volatility modeling, the standard GARCH(1,1) model can be estimated with the garch() function in the tseries package. Unit root and cointegration tests are provided by tseries, urca and uroot. The Rmetrics packages fSeries and fMultivar contain a number of estimation functions for ARMA, GARCH, long memory models, unit roots and more. The ArDec package implements autoregressive time series decomposition in a Bayesian framework. The dyn and dynlm packages are suitable for dynamic (linear) regression models. Several packages provide wavelet analysis functionality: rwt, wavelets, waveslim, wavethresh. Some methods from chaos theory are provided by the package tseriesChaos.
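For instance, a minimal sketch of the two workhorse calls mentioned above, applied to simulated data (the numbers are illustrative only), could look like this:

set.seed(1)
# simulate and fit an ARMA(1,1) model with arima() from the stats package
x <- arima.sim(n = 500, list(ar = 0.5, ma = 0.3))
fit.arma <- arima(x, order = c(1, 0, 1))
fit.arma

# fit a GARCH(1,1) model with garch() from the tseries package
library(tseries)
r <- rnorm(500) * 0.01         # a toy "return" series; real returns would be used in practice
fit.garch <- garch(r, order = c(1, 1))
summary(fit.garch)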

3. Finance: The Rmetrics bundle comprised of the fBasics, fCalendar, fSeries, fMultivar, fPortfolio, fOptions and fExtremes packages contains a very large number of relevant functions for different aspects of empirical and computational finance. The RQuantLib package provides several option-pricing functions as well as some fixed-income functionality from the QuantLib project to R. The portfolio package contains classes for equity portfolio management.

4. Risk Management: The VaR package estimates Value-at-Risk, and several packages provide functionality for Extreme Value Theory models: evd, evdbayes, evir, extRemes, ismev, POT. The mvtnorm package provides code for multivariate Normal and t-distributions. The Rmetrics packages fPortfolio and fExtremes also contain a number of relevant functions. The copula and fgac packages cover multivariate dependency structures using copula methods.

5. Data and Date Management: The its, zoo and fCalendar (part of Rmetrics) packages provide support for irregularly-spaced time series. fCalendar also addresses calendar issues such as recurring holidays for a large number of financial centers, and provides code for high-frequency data sets.

CRAN packages:

* ArDec

* car

* copula

* dse

* dyn

* dynlm

* evd

* evdbayes

* evir

* extRemes

* fBasics (core)

* fCalendar (core)

* fExtremes (core)

* fgac


* fMultivar (core)

* fOptions (core)

* fPortfolio (core)

* fracdiff

* fSeries (core)

* ismev

* its (core)

* lmtest

* longmemo

* mvtnorm

* portfolio

* POT

* Rcmdr

* RQuantLib (core)

* rwt

* sandwich

* strucchange

* tseries (core)

* tseriesChaos

* urca (core)

* uroot

* VaR

* wavelets

* waveslim

* wavethresh


* Zelig

* zoo (core)

Related links:

* CRAN Task View: Econometrics. The web site is http://cran.cnr.berkeley.edu/src/contrib/Views/Econometrics.html or see the next section.

* Rmetrics by Diethelm Wuertz contains a wealth of R code for Finance. The web site is http://www.itp.phys.ethz.ch/econophysics/R/

* Quantlib is a C++ library for quantitative finance. The web site is http://quantlib.org/

* Mailing list: R Special Interest Group Finance

1.5 CRAN Task View: Computational Econometrics

Base R ships with a lot of functionality useful for computational econometrics, in particular in the stats package. This functionality is complemented by many packages on CRAN; a brief overview is given below. There is also a considerable overlap between the tools for econometrics in this view and finance in the Finance view. Furthermore, the finance SIG is a suitable mailing list for obtaining help and discussing questions about both computational finance and econometrics. The packages in this view can be roughly structured into the following topics. The web site is


http://cran.r-project.org/src/contrib/Views/Econometrics.html

1. Linear regression models: Linear models can be fitted (via OLS) with lm() (from stats) and standard tests for model comparisons are available in various methods such as summary() and anova(). Analogous functions that also support asymptotic tests (z instead of t tests, and Chi-squared instead of F tests) and plug-in of other covariance matrices are coeftest() and waldtest() in lmtest. Tests of more general linear hypotheses are implemented in linear.hypothesis() in car. HC and HAC covariance matrices that can be plugged into these functions are available in sandwich. The packages car and lmtest also provide a large collection of further methods for diagnostic checking in linear regression models.
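A minimal sketch of the robust-inference idea described above, on simulated data (the variable names are illustrative only), is:

library(lmtest)        # coeftest(), waldtest()
library(sandwich)      # HC and HAC covariance estimators

set.seed(1)
x <- rnorm(200)
y <- 1 + 0.5 * x + rnorm(200)
fit <- lm(y ~ x)

coeftest(fit)                          # usual t tests
coeftest(fit, vcov = vcovHC(fit))      # heteroskedasticity-consistent (HC) standard errors
coeftest(fit, vcov = vcovHAC(fit))     # heteroskedasticity- and autocorrelation-consistent (HAC)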

2. Microeconometrics: Many standard micro-econometric models belong to the family of generalized linear models (GLM) and can be fitted by glm() from package stats. This includes in particular logit and probit models for modelling choice data and Poisson models for count data. Negative binomial GLMs are available via glm.nb() in package MASS from the VR bundle. Zero-inflated count models are provided in zicounts. Further over-dispersed and inflated models, including hurdle models, are available in package pscl. Bivariate Poisson regression models are implemented in bivpois. Basic censored regression models (e.g., tobit models) can be fitted by survreg() in survival. Further more refined tools for microeconometrics are provided in micEcon. The package bayesm implements a Bayesian approach to microeconometrics and marketing. Inference for relative distributions is contained in package reldist.


3. Further regression models: Various extensions of the linear regression model and other model fitting techniques are available in base R and several CRAN packages. Nonlinear least squares modelling is available in nls() in package stats. Relevant packages include quantreg (quantile regression), sem (linear structural equation models, including two-stage least squares), systemfit (simultaneous equation estimation), betareg (beta regression), nlme (nonlinear mixed-effect models), VR (multinomial logit models in package nnet) and MNP (Bayesian multinomial probit models). The packages Design and Hmisc provide several tools for extended handling of (generalized) linear regression models.

4. Basic time series infrastructure: The class ts in package stats is R’s standard class for regularly spaced time series, which can be coerced back and forth without loss of information to zooreg from package zoo. zoo provides infrastructure for both regularly and irregularly spaced time series (the latter via the class “zoo”) where the time information can be of arbitrary class. Several other implementations of irregular time series building on the “POSIXt” time-date classes are available in its, tseries and fCalendar, which are all aimed particularly at finance applications (see the Finance view).
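To make the class relationships concrete, a small sketch (assuming the zoo package is installed) is:

# a quarterly series stored in R's standard "ts" class
x <- ts(rnorm(20), start = c(2000, 1), frequency = 4)

library(zoo)
z  <- as.zoo(x)              # coerce to a regular zoo series (class "zooreg")
x2 <- as.ts(z)               # ... and back, without loss of information
identical(tsp(x), tsp(x2))   # TRUE: same start, end and frequency

# an irregularly spaced series indexed by dates
zi <- zoo(rnorm(3), as.Date(c("2006-01-03", "2006-01-05", "2006-01-10")))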

5. Time series modelling: Classical time series modelling tools are contained in the stats package and include arima() for ARIMA modelling and Box-Jenkins-type analysis. Furthermore, stats provides StructTS() for fitting structural time series and decompose() and HoltWinters() for time series filtering and decomposition. For estimating VAR models, several methods are available: simple models can be fitted by ar() in stats, more elaborate models are provided by estVARXls() in dse and a Bayesian approach is available in MSBVAR. A convenient interface for fitting dynamic regression models via OLS is available in dynlm; a different approach that also works with other regression functions is implemented in dyn. More advanced dynamic system equations can be fitted using dse. Unit root and cointegration techniques are available in urca, uroot and tseries. Time series factor analysis is available in tsfa.
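For example, a minimal sketch using only stats functions on one of R's built-in data sets (AirPassengers) is:

# Box-Jenkins-type modelling of the built-in AirPassengers series
y <- log(AirPassengers)

plot(decompose(y))                        # trend / seasonal / irregular decomposition
fit.hw <- HoltWinters(y)                  # exponential smoothing with trend and season
fit.arima <- arima(y, order = c(0, 1, 1),
                   seasonal = list(order = c(0, 1, 1), period = 12))
fit.ar <- ar(diff(y))                     # automatic AR order selection by AIC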

6. Matrix manipulations: As a vector- and matrix-based language, base R ships with many powerful tools for doing matrix manipulations, which are complemented by the packages Matrix and SparseM.

7. Inequality: For measuring inequality, concentration and poverty, the package ineq provides some basic tools such as Lorenz curves, Pen’s parade, the Gini coefficient and many more.

8. Structural change: R is particularly strong when dealing with structural changes and changepoints in parametric models; see strucchange and segmented.

9. Data sets: Many of the packages in this view contain collections of data sets from the econometric literature and the package Ecdat contains a complete collection of data sets from various standard econometric textbooks. micEcdat provides several data sets from the Journal of Applied Econometrics and the Journal of Business & Economic Statistics data archives. Package CDNmoney provides Canadian monetary aggregates and pwt provides the Penn world table.

CRAN packages:


* bayesm

* betareg

* bivpois

* car (core)

* CDNmoney

* Design

* dse

* dyn

* dynlm

* Ecdat

* fCalendar

* Hmisc

* ineq

* its

* lmtest (core)

* Matrix

* micEcdat

* micEcon

* MNP

* MSBVAR

* nlme

* pscl

* pwt

* quantreg


* reldist

* sandwich (core)

* segmented

* sem

* SparseM

* strucchange

* systemfit

* tseries (core)

* tsfa

* urca (core)

* uroot

* VR

* zicounts

* zoo (core)

Related links:

* CRAN Task View: Finance. The web site is http://cran.cnr.berkeley.edu/src/contrib/Views/Finance.html or see the above section.

* Mailing list: R Special Interest Group Finance

* A Brief Guide to R for Beginners in Econometrics. The web site is http://people.su.se/~ma/R_intro/.


* R for Economists. The web site is http://www.mayin.org/ajayshah/KB/R/R_for_economists.html.


Chapter 2

Characteristics of Time Series

2.1 Introduction

The very nature of data collected in different fields, as diverse as economics, finance, biology, medicine, and engineering, leads one naturally to a consideration of time series models. Samples taken from all of these disciplines are typically observed over a sequence of time periods. Often, for example, one observes hourly or daily or monthly or yearly data, even tick-by-tick trade data, and it is clear from examining the histories of such series over a number of time periods that the adjacent observations are by no means independent. Hence, the usual techniques from classical statistics, developed primarily for independent identically distributed (iid) observations, are not applicable.

Clearly, we cannot hope to give a complete accounting of the theory and applications of time series in the limited time to be devoted to this course. Therefore, what we will try to accomplish in this presentation is a considerably more modest set of objectives, with more detailed references quoted for discussions in depth. First, we will attempt to illustrate the kinds of time series analyses that can arise in scientific contexts, particularly in economics and finance, and give examples of applications using real data. This necessarily will include exploratory data analysis using graphical displays and numerical summaries such as the autocorrelation and cross correlation functions. The use of scatter diagrams and various linear and nonlinear transformations also will be illustrated. We will define classical time series statistics for measuring the patterns described by time series data. For example, the characterization of consistent trend profiles by dynamic linear or quadratic regression models as well as the representation of periodic patterns using spectral analysis will be illustrated. We will show how one might go about examining plausible patterns of cause and effect, both within and among time series. Finally, some time series models that are particularly useful such as regression with correlated errors as well as multivariate autoregressive and state-space models will be developed, together with unit root, co-integration, and nonlinear time series models, and some other models. Forms of these models that appear to offer hope for applications will be emphasized. It is recognized that a discussion of the models and techniques involved is not enough if one does not have available the requisite resources for carrying out time series computations; these can be formidable. Hence, we include a computing package, called R.

In this chapter, we will try to minimize the use of mathematical notation throughout the discussions and will not spend time developing the theoretical properties of any of the models or procedures. What is important for this presentation is that you, the reader, can gain a modest understanding as well as having access to some of the principal techniques of time series analysis. Of course, we will refer to Hamilton (1994) for additional references or more complete discussions relating to an application or principle and will discuss them in detail.


2.2 Stationary Time Series

We begin by introducing several environmental and economic as well as financial time series to serve as illustrative data for time series methodology. Figure 2.1 shows monthly values of an environmental series called the Southern Oscillation Index (SOI) and associated recruitment (number of new fish) computed from a model by Pierre Kleiber, Southwest Fisheries Center, La Jolla, California. Both series are for a period of 453 months ranging over the years 1950-1987. The SOI measures changes in air pressure that are related to sea surface temperatures in the central Pacific. The central Pacific Ocean warms up every three to seven years due to the El Nino effect, which has been blamed, in particular, for floods in the midwestern portions of the U.S.

Both series in Figure 2.1 tend to exhibit repetitive behavior, with regularly repeating (stochastic) cycles that are easily visible. This periodic behavior is of interest because underlying processes of interest may be regular and the rate or frequency of oscillation characterizing the behavior of the underlying series would help to identify them. One can also remark that the cycles of the SOI are repeating at a faster rate than those of the recruitment series. The recruit series also shows several kinds of oscillations, a faster frequency that seems to repeat about every 12 months and a slower frequency that seems to repeat about every 50 months. The study of the kinds of cycles and their strengths will be discussed later. The two series also tend to be somewhat related; it is easy to imagine that somehow the fish population is dependent on the SOI. Perhaps there is even a lagged relation, with the SOI signalling changes in the fish population.

The study of the variation in the different kinds of cyclical behavior in a time series can be aided by computing the power spectrum, which shows the variance as a function of the frequency of oscillation. Comparing the power spectra of the two series would then give valuable information relating to the relative cycles driving each one. One might also want to know whether or not the cyclical variations of a particular frequency in one of the series, say the SOI, are associated with the frequencies in the recruitment series. This would be measured by computing the correlation as a function of frequency, called the coherence. The study of systematic periodic variations in time series is called spectral analysis. See Shumway (1988) and Shumway and Stoffer (2001) for details.

[Figure 2.1: left panel "Southern Oscillation Index", right panel "Recruit".]

Figure 2.1: Monthly SOI (left) and simulated recruitment (right) from a model (n = 453 months, 1950-1987).

We will need a characterization for the kind of stability that is exhibited by the environmental and fish series. One can note that the two series seem to oscillate fairly regularly around central values (0 for SOI and 64 for recruitment). Also, the lengths of the cycles and their orientations relative to each other do not seem to be changing drastically over the time histories.

In order to describe this in a simple mathematical way, it is convenient to introduce the concept of a stationary time series. Suppose that we let the value of the time series at some time point t be denoted by {xt}. Then, the observed values can be represented as x1, the initial time point, x2, the second time point and so forth out to xn, the last observed point. A stationary time series is one for which the statistical behavior of xt1, xt2, . . . , xtk is identical to that of the shifted set xt1+h, xt2+h, . . . , xtk+h for any collection of time points t1, t2, . . . , tk and for any shift h. This means that all of the multivariate probability density functions for subsets of variables must agree with their counterparts in the shifted set for all values of the shift parameter h. This is called strict (or strong) stationarity, which can be regarded as a mathematical assumption. The above version of stationarity is too strong for most applications and is difficult or impossible to verify statistically in applications. Therefore, to relax this mathematical assumption, we will use a weaker version, called weak stationarity or covariance stationarity, which requires only that the first and second moments satisfy the constraints. This implies that

E(xt) = µ and E[(xt+h − µ)(xt − µ)] = γx(h), (2.1)

where E denotes expectation or averaging over the population densities and h is the shift or lag. This implies, first, that the mean value function does not change over time and that γx(h), the population covariance function, is the same as long as the points are separated by a constant shift h. Estimators for the population covariance are important diagnostic tools for time correlation as we shall see later. When we use the term stationary time series in the sequel, we mean weakly stationary as defined by (2.1). The autocorrelation function (ACF) is defined as a scaled version of (2.1) and is written as

ρx(h) = γx(h)/γx(0), (2.2)


which is always between −1 and 1. The denominator of (2.2) is the mean square error or variance of the series since γx(0) = E[(xt − µ)²].
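In R, sample versions of (2.1) and (2.2) are computed by the acf() function. A minimal sketch on a simulated stationary series (the AR(1) coefficient 0.7 is chosen arbitrarily) is:

set.seed(1)
x <- arima.sim(n = 500, list(ar = 0.7))     # a stationary AR(1) series

acf(x, lag.max = 20)                        # sample autocorrelation function, cf. (2.2)
acf(x, lag.max = 20, type = "covariance")   # sample autocovariance function, cf. (2.1)
mean(x)                                     # estimate of the constant mean µ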

Exercise: For a given time series {xt}, t = 1, . . . , n, how do you check whether the time series {xt} is weakly or strictly stationary? Think about this problem.

Example 1.1: We introduce in this example a simple time domain model to be considered in detail later. A simple moving average model assumes that the series xt is generated from linear combinations of independent or uncorrelated “shocks” wt, sometimes called white noise1 (WN), to the system. For example, the simple first order moving average series

xt = wt − 0.9 wt−1

is stationary when the inputs {wt} are assumed independent with E(wt) = 0 and E(wt²) = 1. It can be easily verified that E(xt) = 0 and γx(h) = 1 + 0.9² if h = 0, −0.9 if h = ±1, and 0 if h > 1 (please verify this). We can see what such a series might look like by drawing random numbers wt from a standard normal distribution and then computing the values of xt. One such simulated series is shown in Figure 2.2 for n = 200 values; the series resembles vaguely the real data in the bottom panel of Figure 2.1.
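A simulation along these lines can be written in a few lines of R (this is only a sketch, not the code from the Computer Code section of this chapter):

set.seed(1)
n <- 200
w <- rnorm(n + 1)                  # white noise w0, w1, ..., wn
x <- w[-1] - 0.9 * w[-(n + 1)]     # xt = wt - 0.9 w(t-1), t = 1, ..., n
plot.ts(x, main = "Simulated MA(1)")
acf(x, lag.max = 10)               # close to -0.9/(1 + 0.9^2) at lag 1 and near 0 beyond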

Many of our techniques are based on the idea that a suitably modified time series can be regarded as (weakly) stationary. This requires first that the mean value function be constant as in (2.1). Several simple commonly occurring nonstationary time series can be illustrated by letting this assumption be violated.

1 White noise is defined as a sequence of uncorrelated random variables with mean zero and the same variance.



Figure 2.2: Simulated MA(1) with θ1 = 0.9.

For example, the series yt = t + xt, where xt is the moving average series of Example 1.1, will be nonstationary because E(yt) = t, so the constant mean assumption of (2.1) is clearly violated.

Four techniques for modifying the given series to improve the approximation to stationarity are detrending, differencing, transformations, and linear filtering, as discussed below. A simple example of a nonstationary series is also given later.

2.2.1 Detrending

One of the dominant features of many economic and business time series is the trend. Such a trend can be upward or downward, it can be steep or not, and it can be exponential or approximately linear. Since a trend should definitely be incorporated somehow in a time series model, simply because it can be exploited for out-of-sample forecasting, an analysis of trend behavior typically requires quite some research input. The discussion later will show that the type of trend has an important impact on forecasting.

The general version of the nonstationary time series given above is to assume a general trend of the form yt = Tt + xt, particularly, the linear trend Tt = β1 + β2 t. If one looks for a method of modifying the above series to achieve stationarity, it is natural to consider the residual

"xt = yt " #Tt = yt " #%1 " #%2 t

as a plausible stationary series, where β̂1 and β̂2 are the estimated intercept and slope of the least squares line for yt as a function of t. The use of the residual or detrended series is common and the process of constructing the residual is known as detrending.
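In R, the detrended series is simply the residual from a least squares fit of yt on t. A minimal sketch on a simulated series with a linear trend (the trend coefficients are illustrative only) is:

set.seed(1)
t <- 1:200
y <- 2 + 0.05 * t + arima.sim(n = 200, list(ar = 0.5))   # linear trend plus stationary noise

fit <- lm(y ~ t)            # least squares estimates of beta1 (intercept) and beta2 (slope)
x.detrended <- resid(fit)   # detrended series: y - beta1.hat - beta2.hat * t
plot.ts(x.detrended)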

Example 1.2: To illustrate the presence of trends in economic data, consider the five graphs in Figure 2.3, which are the annual indices of real national output (in logs) in China in five different sectors for the sample period 1952-1988.


Figure 2.3: Log of annual indices of real national output in China, 1952-1988.

These sectors are agriculture, industry, construction, transportation, and commerce.

From this figure, it can be observed that the five sectors have grown over the years at different rates, and also that the five sectors seem to have been affected by the, likely exogenous, shocks to the Chinese economy around 1958 and 1968. These shocks roughly correspond to the two major political movements in China: the Great-Leap-Forward around 1958 until 1962 and the Cultural Revolution from 1966 to 1976. It also appears from the graphs that these political movements may not have affected each of the five sectors in a similar fashion. For example, the decline of the output in the construction sector in 1961 seems much larger than that in the industry sector in the same year. It also seems that the Great-Leap-Forward shock already had an impact on the output in the agriculture sector as early as 1959. To quantify the trends in the five Chinese output series, one might consider a simple regression model with a linear trend as mentioned earlier or some more complex models.

Example 1.3: As another more interesting example, consider the global temperature series given in Figure 2.4. There appears to be an increasing trend in global temperature, which may signal global warming or may be just a normal fluctuation.

[Figure 2.4: Original Data with Linear and Nonlinear Trend]

Figure 2.4: Monthly average temperature in degrees centigrade, January 1856 - February 2005, n = 1790 months. The straight line (wide and green) is the linear trend $y = -9.037 + 0.0046\,t$ and the curve (wide and red) is the nonparametrically estimated trend.

warming, or it may be just a normal fluctuation. Fitting a straight line relating time $t$ to temperature in degrees centigrade by simple least squares leads to $\hat\beta_1 = -9.037$, $\hat\beta_2 = 0.0046$, and a detrended series shown in the left panel of Figure 2.5. Note that the detrended series


Figure 2.5: Detrended monthly global temperatures: left panel (linear) and right panel (nonlinear).

still contains a trend-like bulge that is highest at about t = 60 years. In this case the slope of the line is often used to argue that there is a global warming trend and that the average increase is approximately 0.83 degrees F per 100 years. It is clear that the residuals in Figure 2.5 still contain substantial correlation, and the ordinary least squares model may not be appropriate.

There may also be other functional forms that do a better job of detrending; for example, quadratic or logarithmic representations are common, or a nonparametric approach can be used (we will discuss this approach in detail later); see the detrended series shown in the right panel of Figure 2.5. Detrending is particularly essential when one is estimating the covariance function and the power spectrum.
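A minimal R sketch of this least-squares detrending follows; the series name gtemp and the way the data are loaded are assumptions, since the data set itself is not reproduced here.

t <- as.numeric(time(gtemp))       # time index used as the regressor
fit <- lm(gtemp ~ t)               # least squares line: y_t = beta_1 + beta_2 t + x_t
coef(fit)                          # estimated intercept and slope
detrended <- resid(fit)            # detrended series: y_t minus the fitted line
plot.ts(detrended, main = "Detrended: Linear")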

2.2.2 Differencing

A common method for achieving stationarity in nonstationary cases is with the first difference

$\Delta y_t = y_t - y_{t-1},$


where $\Delta$ is called the differencing operator. The use of differencing as a method for transforming to stationarity is also common for series with trend. For example, for the trend in Example 1.3, the differenced series would be $\Delta y_t = b + x_t - x_{t-1}$, which is stationary because the difference $x_t - x_{t-1}$ can be shown to be stationary.

Example 1.4: The first difference of the global temperature series is shown in Figure 2.6, and we see that the upward linear trend has disappeared, as has the trend-like bulge that remained in the

Figure 2.6: Differenced monthly global temperatures.

detrended series. Higher order differences are defined as successive applications of the operator $\Delta$. For example, the second difference is $\Delta^2 y_t = \Delta\Delta y_t$, so that $\Delta^2 y_t = y_t - 2\,y_{t-1} + y_{t-2}$. If the model also contains a quadratic trend term $c\,t^2$, it is easy to show that taking the second difference reduces the model to a stationary form.
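In R, differencing is done with diff(); the sketch below, again assuming the gtemp object from the detrending sketch above, shows the first and second differences.

d1 <- diff(gtemp)                     # first difference: y_t - y_{t-1}
d2 <- diff(gtemp, differences = 2)    # second difference: y_t - 2 y_{t-1} + y_{t-2}
par(mfrow = c(2, 1))
plot.ts(d1, main = "Differenced Time Series")
plot.ts(d2, main = "Second difference")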

The trends in Figures 2.3 and 2.4 are all of the familiar type, that is, many economic time series display an upward moving trend. It is, however, not necessary for a trend to move upwards to be called a trend. It may also be that a trend is less smooth and displays slowly changing tendencies which once in a while change direction.


Example 1.5: An example of such a trending pattern is given in the top left panel of Figure 2.7 and the first difference in the top right

Figure 2.7: Annual stock of motor cycles in the Netherlands, 1946-1993.

panel of Figure 2.7 and the second order difference in the bottom left panel of Figure 2.7, where the annual stock of motor cycles in the Netherlands is displayed for 1946-1993, together with the first order and second order differenced series. From the figures in the top right and bottom panels, we can see that differencing might not work well for this example. One way to describe these changing trends is to allow the parameters to change over time, driven by some exogenous shocks (macroeconomic variables)², for example, the oil shock in 1974.

2.2.3 Transformations

A transformation that cuts down the values of larger peaks of a time series and emphasizes the lower values may be effective in reducing

² See the paper by Cai (2006).


nonstationary behavior due to changing variance. An example is the logarithmic transformation $y_t = \log(x_t)$, where log denotes the natural (base $e$) logarithm.

Example 1.6: For example, the data shown in Figure 2.8 represent quarterly earnings per share for the American company Johnson &

Figure 2.8: Quarterly earnings for Johnson & Johnson (4th quarter, 1970 to 1st quarter, 1980, left panel) with log transformed earnings (right panel).

Johnson from the fourth quarter of 1970 to the first quarter of 1980. It is easy to note some very nonstationary behavior in this series that cannot be eliminated completely by differencing or detrending because of the larger fluctuations that occur near the end of the record when the earnings are higher. The right panel of Figure 2.8 shows the log-transformed series, and we note that the latter peaks have been attenuated so that the variance of the transformed series seems more stable. One would still have to eliminate the trend remaining in the above series to obtain stationarity. For more details on the current analyses of this series, see the later analyses and the papers by Burman and Shumway (1998) and Cai and Chen (2006).

A general transformation is the well-known Box-Cox transformation; see Hamilton (1994, p.126), Shumway (1988), and Shumway


and Stoffer (2000), defined in terms of the arbitrary power $x_t^\lambda$ for some $\lambda$ in a certain range, which can be chosen based on some optimal criterion such as the smallest mean squared error.
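As a sketch of both transformations in R, consider the following lines; the object name jj for the Johnson & Johnson earnings and the use of MASS::boxcox to profile $\lambda$ are assumptions, not part of the original notes.

ljj <- log(jj)                         # log transform attenuates the larger peaks
par(mfrow = c(1, 2))
plot.ts(jj, main = "J&J Earnings")
plot.ts(ljj, main = "transformed log(earnings)")
library(MASS)                          # boxcox() profiles the Box-Cox parameter
bc <- boxcox(lm(jj ~ time(jj)), lambda = seq(-1, 1, by = 0.05))
bc$x[which.max(bc$y)]                  # lambda maximizing the profile likelihood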

2.2.4 Linear Filters

The first difference is a linear combination of the values of the series at two lags, say 0 and 1, and has the effect of retaining the faster oscillations and attenuating or reducing the slower oscillations. We may define more general linear filters to do other kinds of smoothing or roughening of a time series to enhance signals and attenuate noise. Consider the general linear combination of past and future values of a time series given as

$y_t = \sum_{j=-\infty}^{\infty} a_j\, x_{t-j},$

where $a_j$, $j = 0, \pm 1, \pm 2, \ldots$, define a set of fixed filter coefficients to be applied to the series of interest. An example is the first difference, where $a_0 = 1$, $a_1 = -1$, and $a_j = 0$ otherwise. Note that the above $\{y_t\}$ is also called a linear process in the probability literature.

Example 1.7: To give a simple illustration, consider the twelve-month moving average with $a_j = 1/12$ for $j = 0, \pm 1, \pm 2, \pm 3, \pm 4, \pm 5, \pm 6$ and zero otherwise. The result of applying this filter to the SOI index is shown in Figure 2.9. It is clear that this filter removes some higher oscillations and produces a smoother series. In fact, the yearly oscillations have been filtered out (see the bottom panel in Figure 2.9) and a lower frequency oscillation appears with a cycling rate of about 42 months. This is the so-called El Nino effect that accounts for all kinds of phenomena. This filtering effect will be examined further later, in the discussion of spectral analysis, since it is extremely important to know exactly how one is influencing the periodic oscillations by filtering.
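Such moving averages are applied in R with the filter() function; below is a minimal sketch for the 12-point average shown in Figure 2.9, assuming the SOI index is stored in a ts object named soi (the object name is an assumption).

wgts <- rep(1/12, 12)                            # equal weights, 12-point version as in the figure caption
soif <- filter(soi, filter = wgts, sides = 2)    # two-sided (centered) linear filter
par(mfrow = c(2, 1))
plot.ts(soi, main = "SOI")
plot.ts(soif, main = "12 point moving average")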


Figure 2.9: The SOI series (black solid line) compared with a 12 point moving average (red thicker solid line). The top panel: original data and the bottom panel: filtered series.

To summarize, the graphical examination of time histories can point the way to further analyses by noting periodicities and trends that may be present. Furthermore, looking at time histories of transformed or filtered series often gives an intuitive idea as to whether one series could also be associated with another. Figure 2.1 indicates that the SOI series tends to precede or lead the recruit series. Naturally, one can ask for a more detailed specification of the leading/lagging relation. In the following sections, we will try to show how classical time series methods can be used to provide partial answers to these kinds of questions. Before doing so, we spend some space introducing some other features, such as seasonality, outliers, nonlinearity, and conditional heteroskedasticity, commonly seen in economic and financial as well as environmental data.


2.3 Other Key Features of Time Series

2.3.1 Seasonality

When time series (particularly economic and financial time series) are observed each day or month or quarter, it is often the case that such a series displays a seasonal pattern (deterministic cyclical behavior). Similar to the feature of trend, there is no precise definition of seasonality. Usually we refer to seasonality when observations in certain seasons display strikingly different features from other seasons. For example, retail sales are always large in the fourth quarter (because of the Christmas spending) and small in the first quarter, as can be observed from Figure 2.10. It may also be possible that seasonality is reflected in the variance of a time series. For example, for daily observed stock market returns the volatility seems often highest on Mondays, basically because investors have to digest three days of news instead of only one day. For more details, see the books by Taylor (2005, §4.5) and Tsay (2005).

Example 1.8: In this example we consider the monthly US retail sales series (not seasonally adjusted) from January of 1967 to December of 2000 (in billions of US dollars). The data can be downloaded from the web site at http://marketvector.com. The U.S. retail sales index is one of the most important indicators of the US economy. There are vast studies of seasonal series (like this series) in the literature; see, e.g., Franses (1996, 1998), Ghysels and Osborn (2001), and Cai and Chen (2006). From Figure 2.10, we can observe that the peaks occur in December, and we can say that retail sales display seasonality. Also, it can be observed that the trend is basically increasing but nonlinearly. The same phenomenon can be observed from Figure 2.8 for the quarterly earnings for Johnson & Johnson.


Figure 2.10: US Retail Sales Data from 1967-2000.

If simple graphs are not informative enough to highlight possible seasonal variation, a formal regression model can be used; for example, one might try to consider the following regression model with seasonal dummy variables

$\Delta y_t = y_t - y_{t-1} = \sum_{j=1}^{s} \beta_j\, D_{j,t} + \varepsilon_t,$

where $D_{j,t}$ is a seasonal dummy variable and $s$ is the number of seasons. Of course, one can use a seasonal ARIMA model, denoted by ARIMA$(p, d, q) \times (P, D, Q)_s$, which will be discussed later.
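A minimal R sketch of such a seasonal-dummy regression follows; the object name sales for the monthly retail sales series is an assumption made only for illustration.

dy <- diff(sales)                   # Delta y_t = y_t - y_{t-1}
season <- factor(cycle(dy))         # month indicator 1..12 used as the dummies D_{j,t}
fit <- lm(dy ~ season - 1)          # no intercept: one coefficient beta_j per season
summary(fit)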

Example 1.9: In this example, we consider a time series with pronounced seasonality displayed in Figure 2.11, namely the logs of four-weekly advertising expenditures on radio and television in The Netherlands for 1978.01-1994.13. For these two marketing time series one can observe clearly that the television advertising displays quite some seasonal fluctuation throughout the entire sample and the radio advertising has seasonality only for the last five years. Also, there seems to be a structural break in the radio series around observation 53. This break is related to an increase in radio broadcasting minutes in January 1982. Furthermore, there is visual evidence that the trend


Figure 2.11: Four-weekly advertising expenditures on radio and television in The Netherlands, 1978.01-1994.13.

changes over time.

Generally, it appears that many seasonally observed time series from business and economics as well as other applied fields display seasonality in the sense that the observations in certain seasons have properties that differ from those of data points in other seasons. A second feature of many seasonal time series is that the seasonality changes over time, as studied by Cai and Chen (2006). Sometimes these changes appear abrupt, as is the case for advertising on the radio in Figure 2.11, and sometimes such changes occur only slowly. To capture these phenomena, Cai and Chen (2006) proposed a more general flexible seasonal effect model having the following form:

$y_{ij} = \alpha(t_i) + \beta_j(t_i) + e_{ij}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, s,$

where $y_{ij} = y_{(i-1)s+j}$, $t_i = i/n$, $\alpha(\cdot)$ is a (smooth) common trend function on $[0, 1]$, $\{\beta_j(\cdot)\}$ are (smooth) seasonal effect functions on $[0, 1]$, either fixed or random, subject to a set of constraints, and the error term $e_{ij}$ is assumed to be stationary. For more details, see Cai and Chen (2006).


2.3.2 Aberrant Observations

Possibly distorting observations do not necessarily come in a sequence as in the radio advertising example (which might have so-called regime shifts). It may also be that only a few observations have a major impact on time series modeling and forecasting. Such data points are called aberrant observations (outliers in statistics).

Example 1.10: As an illustrative example, we consider the differenced $y_t$, that is, $\Delta y_t = y_t - y_{t-1}$, where $y_t = \log(w_t)$, with $w_t$ the price level, and the inflation rate $f_t = (w_t - w_{t-1})/w_{t-1}$ in Argentina, for the sample 1970.1-1989.4, in Figure 2.12. From the figure, it

Figure 2.12: First difference in log prices versus the inflation rate: the case of Argentina, 1970.1-1989.4.

is obvious that in the case where the quarterly inflation rate is high (as is the case in 1989.3, where it is about 500 percent), the $\Delta y_t$ series is not a good approximation to the inflation rate (since the 1989.3 observation would now correspond to only about 200 percent). Also, we can observe that the data in 1989 seem to be quite different from the observations of the year before. In fact, if there is any correlation between $\Delta y_t$ and $\Delta y_{t-1}$, such a correlation may be affected by these


observations. In other words, if a simple regression is used to model the correlation between $\Delta y_t$ and $\Delta y_{t-1}$, we would expect that an estimate of the coefficient $\rho$ is influenced by the data points in the last year. The ordinary least squares estimates of $\rho$ are $\hat\rho = 0.561\,(0.094)$ for the entire sample and $\hat\rho = 0.704\,(0.082)$ for the sample without the data points in the last year, where the estimated standard errors are given in parentheses.
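A rough R sketch of this with/without comparison is given below; the object name dy for the differenced log price series, the cutoff at 1989, and the exact regression specification are assumptions made for illustration only.

y1 <- dy[-1]; y0 <- dy[-length(dy)]      # Delta y_t and Delta y_{t-1}
full <- lm(y1 ~ y0)                      # entire sample
keep <- time(dy)[-1] < 1989              # drop the observations of the last year
excl <- lm(y1[keep] ~ y0[keep])          # sample without the last year
summary(full)$coef
summary(excl)$coef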

We now turn to the question of how to handle these aberrant observations. See Chapter 6 of Franses (1998) for a detailed discussion of methods to delete several types of aberrant observations, and also of methods to take account of such data for forecasting.

2.3.3 Conditional Heteroskedasticity

An important feature of economic time series, and in particular of financial time series, is that aberrant observations tend to emerge in clusters (persistence). The intuitive interpretation is that if on a certain day news arrives on a stock market, the reaction to this news is to buy or sell many stocks, while the day after, when the news has been digested and valued properly, the stock market returns back to the level of before the arrival of the news. This pattern would be reflected by a (possibly large) increase or decrease (usually, the negative impact is larger than the positive impact, which is called asymmetric) in the returns on one day followed by an opposite change on the next day. As a result, we can regard them as aberrant observations in a row, and the two sudden changes in returns are correlated, since the second sharp change is caused by the first. This is called conditional heteroskedasticity. To characterize this phenomenon, one might use the so-called autoregressive conditional heteroscedasticity (ARCH) model of Engle (1982) and the generalized autoregressive


conditional heteroscedasticity (GARCH) model of Bollerslev (1986) or other GARCH type models; see Taylor (2005) and Tsay (2005).

Example 1.11: This example concerns the closing bid prices of the Japanese Yen (JPY) in terms of the U.S. dollar. There is a vast amount of literature devoted to the study of exchange rate time series; see Sercu and Uppal (2000) and the references therein for details. Here we explore the possible nonlinearity feature (see the next section), heteroscedasticity, and predictability of the exchange rate series (we will discuss this later). The data is a weekly series from January 1, 1974 to December 31, 2003. The daily noon buying rates in New York City certified by the Federal Reserve Bank of New York for customs and cable transfers purposes were obtained from the Chicago Federal Reserve Board (www.frbchi.org). The weekly series is generated by selecting the Wednesday series (if a Wednesday is a holiday then the following Thursday is used), which has 1566 observations. The use of weekly data avoids the so-called weekend effect as well as other biases associated with nontrading, bid-ask spread, asynchronous rates and so on, which are often present in higher frequency data. We consider the log return series $y_t = 100\,\log(P_t/P_{t-1})$, plotted in Figure 2.13, where $P_t$ is the exchange rate level in the $t$-th week. Around the 44th week of 1998 (the period of the Asian financial crisis), the returns on the Japanese yen and U.S. dollar exchange rate decreased by 9.7%. Immediately after that observation, we can find several data points that are large in absolute value. Additionally, in other parts of the sample, we can observe “bubbles”, i.e., clusters of observations with large variances. This phenomenon is called volatility clustering (persistence) or conditional heteroskedasticity. In other words, the variance changes over time.

To allow for the possibility that high volatility is followed by high

Figure 2.13: Japanese - U.S. dollar exchange rate return series $\{y_t\}$, from January 1, 1974 to December 31, 2003.

volatility, and that low volatility will be followed by low volatility, where volatility is defined in terms of the returns themselves, one can consider the presence of so-called conditional heteroskedasticity or its variants such as ARCH or GARCH type models by using $(\Delta y_t)^2$

as the variance of the returns. Also, one can replace $(\Delta y_t)^2$ by $|\Delta y_t|$, the absolute value of the return.
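A minimal R sketch of these volatility proxies follows; the object name jpy for the weekly exchange-rate level series is an assumption.

y <- 100 * diff(log(jpy))         # log returns y_t = 100 log(P_t / P_{t-1})
par(mfrow = c(2, 2))
plot.ts(y, main = "returns")
plot.ts(y^2, main = "squared returns")
plot.ts(abs(y), main = "absolute returns")
acf(y^2, main = "ACF of squared returns")   # persistence here suggests ARCH-type effects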

The main purpose of exploiting volatility clustering is to forecast future volatility. Since this variable is a measure of risk, such forecasts can be useful to evaluate investment strategies or portfolio selection or risk management. Furthermore, it can be useful for decisions on buying or selling options or derivatives. See Hamilton (1994, Chapter 21), Taylor (2005), and Tsay (2005) for details.

2.3.4 Nonlinearity

The nonlinear feature of time series can often be seen in economic and financial data as well as in other applied fields; see the popular books by Tong (1990), Granger and Terasvirta (1993), Franses and van Dijk (2000), and Fan and Yao (2003). Beyond the linear domain, there are


infinitely many nonlinear forms to be explored. Early development of nonlinear time series analysis focused on various nonlinear (sometimes non-Gaussian) parametric forms. The successful examples include, among others, the ARCH modeling of fluctuating structure for financial time series, and the threshold modeling for biological and economic data, as well as regime switching or structural change modeling for economic and financial time series.

Example 1.12: Consider the example of the unemployment rate in Germany for 1962.1 to 1991.4 in Figure 2.14. From the graph in the

Figure 2.14: Quarterly unemployment rate in Germany, 1962.1-1991.4 (seasonally adjusted and not seasonally adjusted) in the left panel. The scatterplot of the unemployment rate (seasonally adjusted) versus the unemployment rate (seasonally adjusted) one period lagged in the right panel.

left panel, it is clear that the unemployment rate sometimes rises quite rapidly, usually in the recession years 1967, 1974-1975, and 1980-1982, while it decreases very slowly, usually in times of expansions. This asymmetry can be formalized by estimating the parameters in the following simple regression

$\Delta y_t = y_t - y_{t-1} = \beta_1\, I_t(E) + \beta_2\, I_t(R) + \varepsilon_t,$

where $I_t(\cdot)$ is an indicator variable, which allows the absolute value of the rate of change to vary across the two states, say, “decreasing


$y_t$” and “increasing $y_t$”, from $\beta_1$ to $\beta_2$, where $\beta_1$ may be different from $-\beta_2$. For the German seasonally adjusted unemployment rate, we find that $\hat\beta_1 = -0.040$ and $\hat\beta_2 = 0.388$, indicating that when the unemployment rate increases (in recessions), it rises faster than when it goes down (in expansions).
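A rough R sketch of this asymmetry regression follows; the series name u and the definition of the two states by the sign of the change in a given quarter are assumptions made for illustration.

du <- diff(u)                     # Delta y_t for the seasonally adjusted rate
down <- as.numeric(du < 0)        # indicator of "decreasing y_t" (expansion state)
up <- as.numeric(du >= 0)         # indicator of "increasing y_t" (recession state)
fit <- lm(du ~ down + up - 1)     # no intercept: one coefficient per state
coef(fit)                         # compare the two estimated coefficients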

Furthermore, from the graph in the right panel, the scatterplot of $y_t$ versus $y_{t-1}$, where $y_t$ is the seasonally adjusted unemployment rate, we can observe that this series displays cyclical behavior around points that shift over time. When these shifts are endogenous, i.e., caused by past observations on $y_t$ themselves, this can be viewed as a typical feature of nonlinear time series. For a detailed analysis of this dataset using nonlinear methods, the reader is referred to the book by Franses (1998) for nonlinear parametric models and the paper by Cai (2002) for nonparametric models.

2.4 Time Series Relationships

One can identify two basic kinds of association or correlation that are important in time series considerations. The first is the notion of self correlation or autocorrelation introduced in (2.2). The second is that series are somehow related to each other, so that one might hypothesize some causal relationship existing between the phenomena generating the series. For example, one might hypothesize that the simultaneous oscillations of the SOI and recruitment series suggest that they are related. We introduce below three statistics for identifying the sources of time correlation. Assume that we have two series $x_t$ and $y_t$ that are observed over some set of time points, say $t = 1, \ldots, n$.


2.4.1 Autocorrelation Function

Correlation at adjacent points of the same series is measured by the autocorrelation function defined in (2.2). For example, the SOI series in Figure 2.1 contains regular fluctuations at intervals of approximately 12 months. An indication of possible linear as well as nonlinear relations can be inferred by examining the lagged scatterplots of Figure 2.15, defined as plots that put $x_t$ on the horizontal axis and $x_{t+h}$ on the vertical axis for various values of the lag $h = 1, 2, 3, \ldots, 12$.

Example 1.13: In Figure 2.15, we have made a lagged scatterplot of the SOI series at time $t + h$ against the SOI series at time $t$ and obtained a high correlation, 0.412, between the series $x_{t+12}$ and the

Figure 2.15: Multiple lagged scatterplots showing the relationship between SOI and the present ($x_t$) versus the lagged values ($x_{t+h}$) at lags $1 \le h \le 16$.

series $x_t$ shifted by 12 months. Lower order lags at $t - 1$, $t - 2$ also


show correlation. The scatterplots show the direction of the relation, which tends to be positive for lags 1, 2, 11, 12, 13, and tends to be negative for lags 6, 7, 8. The scatterplots also suggest that no significant nonlinearities are present. In order to develop a measure for this self correlation or autocorrelation, we utilize a sample version of the scaled autocovariance function (2.2), say

$\hat\rho_x(h) = \hat\gamma_x(h) / \hat\gamma_x(0),$

where $\hat\gamma_x(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h} - \bar{x})(x_t - \bar{x})$, which is the sample counterpart of (2.2) with $\bar{x} = \sum_{t=1}^{n} x_t / n$. Under the assumption that the underlying process $x_t$ is white noise, the approximate standard error of the sample ACF is

$\sigma_{\hat\rho} = \frac{1}{\sqrt{n}}. \qquad (2.3)$

That is, $\hat\rho_x(h)$ is approximately normal with mean 0 and variance $1/n$.
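A minimal R sketch of the sample ACF, comparing the hand-rolled estimator above with the built-in acf() function, is given below; the object name soi for the SOI series is an assumption.

n <- length(soi); xbar <- mean(soi)
gam <- function(h) sum((soi[(1 + h):n] - xbar) * (soi[1:(n - h)] - xbar)) / n
rho <- sapply(1:48, function(h) gam(h) / gam(0))   # sample ACF rho-hat(h) for h = 1..48
acf(soi, lag.max = 48)    # built-in estimate; dashed bands at roughly +/- 2/sqrt(n)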

Example 1.14: As an illustration, consider the autocorrelation functions computed for the environmental and recruitment series shown in the top two panels of Figure 2.16. Both of the autocorrelation functions show some evidence of periodic repetition. The ACF of SOI seems to repeat at periods of 12, while the recruitment series has a dominant period that repeats at about 12 to 16 time points. Again, the maximum values are well above two standard errors, shown as dotted lines above and below the horizontal axis.

2.4.2 Cross Correlation Function

The fact that correlations may occur at some time delay when trying to relate two series to one another at some lag h for purposes of


Figure 2.16: Autocorrelation functions of SOI and recruitment and cross correlation function between SOI and recruitment.

prediction suggests that it would also be useful to plot $x_{t+h}$ against $y_t$.
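In R, the sample ACFs and the cross correlation function of Figure 2.16 can be produced along the following lines; the names soi and rec for the two aligned monthly series are assumptions.

par(mfrow = c(3, 1))
acf(soi, lag.max = 50, main = "ACF of SOI Index")
acf(rec, lag.max = 50, main = "ACF of Recruits")
ccf(soi, rec, lag.max = 50, main = "CCF of SOI and Recruits")   # see ?ccf for the lag convention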

Example 1.15: In order to examine this possibility, consider the lagged scatterplot matrices shown in Figures 2.17 and 2.18. Figure 2.17 plots the SOI at time $t + h$, $x_{t+h}$, versus the recruitment series $y_t$ at lags $0 \le h \le 15$. There are no particularly strong linear relations apparent in these plots, i.e., future values of SOI are not related to current recruitment. This means that the temperatures are not responding to past recruitment. In Figure 2.18, the current SOI values $x_t$ are plotted against the future recruitment values $y_{t+h}$ for $0 \le h \le 15$. It is clear from Figure 2.18 that the series are correlated negatively for lags $h = 5, \ldots, 9$. The correlation at lag 6, for example, is $-0.60$, implying that increases in the SOI lead decreases in the number of recruits by about 6 months. On the other hand, the series are hardly correlated (0.025) at all in the conventional sense,

Figure 2.17: Multiple lagged scatterplots showing the relationship between the SOI at time $t + h$, say $x_{t+h}$ (x-axis), versus recruits at time $t$, say $y_t$ (y-axis), $0 \le h \le 15$.


Figure 2.18: Multiple lagged scatterplots showing the relationship between the SOI at time t, say x_t (x-axis), versus recruits at time t + h, say y_{t+h} (y-axis), 0 ≤ h ≤ 15.

measured at lag h = 0. The general pattern suggests that predicting recruits might be possible using the El Nino series at lags of 5, 6, 7, . . . months.

A measure of the correlation between two series, x_t and y_t, is the cross covariance function (CCF), defined in terms of the counterpart covariance to (2.2),

$$E\{(x_{t+h} - \mu_x)(y_t - \mu_y)\} = \gamma_{xy}(h).$$

The cross correlation is the scaled (to lie between −1 and 1) version of the above, say

$$\rho_{xy}(h) = \gamma_{xy}(h)\big/\sqrt{\gamma_x(0)\,\gamma_y(0)}.$$

The above quantities are expressed in terms of population averages and must generally be estimated from sample data. The estimated sample cross correlation functions can be used to investigate the possibility of the series being related at different lags. In order to investigate cross correlations between two series x_t, y_t, t = 1, . . . , n, we note that a reasonable sample version of ρ_xy(h) might be computed as

"#xy(h) =1

n

n"h$

t=1(xt+h " x)(yt " y),

where ȳ is the sample mean of {y_t}_{t=1}^n. The estimated sample cross correlation function then becomes

$$\hat\rho_{xy}(h) = \hat\gamma_{xy}(h)\big/\sqrt{\hat\gamma_x(0)\,\hat\gamma_y(0)},$$

where h denotes the amount that one series is lagged relative to the other. Generally, one computes the function ρ̂_xy(h) for a number of positive and negative values, h = 0, ±1, ±2, . . ., up to about 0.3n, say, and displays the results as a function of the lag h. The function takes values between −1 and 1, which makes it easy to compare values of the cross correlation with each other. Furthermore, under the hypothesis that there is no relation at lag h and that at least one of the two series is independent and identically distributed, the distribution of ρ̂_xy(h) is approximately normal with mean 0 and standard deviation given again by (2.3). Hence, one can compare values of the sample cross correlation with some appropriate number of sample standard deviations based on normal theory. Generally, values within ±1.96 σ_ρ̂ might be reasonable if one is willing to live with each test at a significance level of 0.05. Otherwise, broader limits would be appropriate. In general, if m tests are made, each at level α, the overall level of significance is bounded by mα.

Example 1.16: To give an example, consider the cross correlations between the environmental series and recruitment shown in the bottom panel of Figure 2.16. The cross correlation between the SOI, x_{t+h}, and recruitment, y_t, shows peaks at h = −6, −7, which implies that lagged products involving x_{t−6} and y_t, as well as those involving x_{t−7} and y_t, match up closely. The value shown on the graph is ρ̂_xy(−6) = −0.6. This means, in practice, that the values of the SOI series tend to lead the recruitment series by 6 or 7 units. Also, one may note that, since the value is negative, lower values of SOI are associated with higher recruitment. The standard error in this case is approximately σ_ρ̂ = 0.047, and the −0.6 easily exceeds two standard deviations, shown as lines above and below the axis in Figure 2.16. Hence, we can reject the hypothesis that the correlation is zero at that lag.
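
In R, the sample cross correlation function and its approximate significance limits can be obtained with ccf(); the sketch below assumes soi and rec are numeric vectors holding the SOI and recruits series (the variable names are illustrative, not the ones used in the code of Section 2.6):

soi.rec <- ccf(soi, rec, lag.max = 50, main = "CCF of SOI and Recruits")
# approximate 95% limits under the null of no cross correlation
band <- 1.96 / sqrt(length(soi))
# sample cross correlation at lag h = -6, where the SOI leads recruits
soi.rec$acf[soi.rec$lag == -6]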

It is clear also that there are some periodic fluctuations apparent in the cross correlations. For example, in the SOI and recruitment example, there seems to be a systematic fluctuation with a period (one full cycle) of about 12 months. This produces a number of secondary peaks in the cross correlation function. The analysis of this periodic behavior and the accounting for periodicities in the series is considered in later sections or in the books by Franses (1996) and Ghysels and Osborn (2001).


2.4.3 Partial Autocorrelation Function

A third kind of time series relationship expresses what is essentially the self-predictability of a series through the partial autocorrelation function (PACF). There are several ways of thinking about this measure.

One may regard the PACF as the simple correlation between two points separated by a lag h, say x_t and x_{t−h}, with the effect of the intervening points x_{t−1}, x_{t−2}, . . ., x_{t−h+1} conditioned out, i.e. the pure correlation between the two points. This interpretation is often given in more practical statistical settings. For example, one may get silly causal inferences by quoting correlations between two variables, e.g. teachers' income and wine consumption, that may occur simply because both are correlated with some common driving factor, in this case the gross domestic product, GDP, or some other force influencing disposable income.

In time series analysis we are really more interested in the prediction or forecasting problem. In this case we might consider the problem of predicting x_t based on observations h units back in the past, say x_{t−1}, x_{t−2}, . . ., x_{t−h}. Suppose we want to predict x_t from x_{t−1}, . . ., x_{t−h} using some linear function of these past values. Consider minimizing the mean square prediction error

$$\mathrm{MSE} = E[(x_t - \hat{x}_t)^2]$$

using the predictor x̂_t = a_1 x_{t−1} + a_2 x_{t−2} + · · · + a_h x_{t−h} over the possible values of the weighting coefficients a_1, . . ., a_h, where we assume, for convenience, that x_t has been adjusted to have zero mean.

Consider the result of minimizing the above mean square prediction error for a particular lag h. Then, the partial autocorrelation function is defined as the value of the last coefficient a_h, i.e. φ_hh = a_h. As a practical matter, we minimize the sample error sum of squares

$$\mathrm{SSE} = \sum_{t=h+1}^{n}\left[(x_t - \bar{x}) - \sum_{k=1}^{h} a_k (x_{t-k} - \bar{x})\right]^2$$

with the estimated partial correlation defined as φ̂_hh = â_h.

The coefficients, as defined above, are also between −1 and 1 and have the usual properties of correlation coefficients. In particular, the standard error under the hypothesis of no partial autocorrelation is still given by (2.3). The intuition of the above argument is that the last coefficient will get very small once the forecast horizon or lag h is large enough to give good prediction. In particular, the order of the autoregressive model in the next chapter will be exactly the lag h beyond which φ_hh = 0.
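
To make the last-coefficient interpretation concrete, the following R sketch compares the built-in pacf() at lag h = 3 with the last coefficient from regressing x_t on its three previous values; the series x is simulated here, and the two numbers agree only approximately because pacf() works from the sample ACF rather than from a direct least squares fit:

set.seed(1)
x <- arima.sim(n = 500, list(ar = c(0.6, 0.2)))   # a zero-mean AR(2) series
h <- 3
n <- length(x)
# columns are the lagged values x_{t-1}, x_{t-2}, x_{t-3} for t = h+1, ..., n
X <- sapply(1:h, function(k) x[(h + 1 - k):(n - k)])
fit <- lm(x[(h + 1):n] ~ X - 1)   # no intercept: the series has mean zero
coef(fit)[h]                      # last coefficient a_h
pacf(x, plot = FALSE)$acf[h]      # phi_hh from the sample PACF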

Example 1.17: As an example, we show in the two panels of Figure 2.19 the partial autocorrelation functions of the SOI series


Figure 2.19: Partial autocorrelation functions for the SOI (left panel) and the recruits (right panel) series.

(left panel) and the recruits series (right panel). Note that the PACF


of the SOI has a single peak at lag h = 1 and then relatively small values. This means, in effect, that fairly good prediction can be achieved by using the immediately preceding point and that adding further values does not really improve the situation. Hence we might try an autoregressive model with p = 1. The recruits series has two peaks and then small values, implying that the pure correlation between points is summarized by the first two lags.

A major application of the PACF is to diagnosing the appropriate order of an autoregressive model for the series under consideration. Autoregressive models will be studied extensively later, but we note here that they are linear models expressing the present value of a series as a linear combination of a number of previous values, with an additive error. Hence, an autoregressive model using two previous values for prediction might be written in the form x_t = φ_1 x_{t−1} + φ_2 x_{t−2} + w_t, where φ_1 and φ_2 are fixed unknown coefficients and w_t are values from an independent series with zero mean and common variance σ²_w. For example, if the PACF at lag h is roughly zero beyond a fixed value, say h = 2 as observed for the recruits series in the right panel of Figure 2.19, then one might assume a model of the form above for that recruits series.
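
The same idea can be checked by simulation; in the sketch below the AR(2) coefficients 0.9 and −0.5 are arbitrary choices (they are not estimates for the recruits data), and the sample PACF should be sizable at lags 1 and 2 and close to zero beyond:

set.seed(2)
z <- arima.sim(n = 400, list(ar = c(0.9, -0.5)))  # simulated AR(2)
pacf(z, lag.max = 20, main = "PACF of a simulated AR(2)")
# ar() selects the autoregressive order by AIC; it should usually return 2 here
ar(z, order.max = 10)$order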

To finish the introductory discussion, we note that the extension of what has been said above to multivariate time series is fairly straightforward if one restricts the discussion to relations between two series at a time. The two series x_t and y_t can be laid out in the vector (x_t, y_t), which becomes a multivariate series of dimension two (bivariate). To generalize, consider the p series (x_{t1}, x_{t2}, . . . , x_{tp}) = x_t as the row vector defining the multivariate series x_t. The autocovariance matrix of the vector x_t can then be defined as the p × p matrix


containing as elements

$$\gamma_{ij}(h) = E\{(x_{t+h,i} - \mu_i)(x_{t,j} - \mu_j)\}$$

and the analysis of the cross covariance structure can proceed on two elements at a time. The discussion of possible multiple relations is deferred until later.

To recapitulate, the primary objective of this chapter has been to define and illustrate with real examples three statistics of interest in describing relationships within and among time series. The autocorrelation function measures the correlation over time in a single series; this correlation can be exploited for prediction purposes or for suggesting periodicities. The cross correlation function measures the correlation between series over time; it may be, for example, that a series is better related to the past of another series than it is to its own past. The partial autocorrelation function gives a direct measure of the lag length necessary to predict a series from itself, i.e. to forecast future values. It is also critical in determining the order of an autoregressive model satisfied by some real data set.

It should be noted that all three of the measures given in this section can be distorted if there are significant trends in the data. It is obvious that lagged products of the form (x_{t+h} − x̄)(x_t − x̄) will be artificially large if there is a trend present. Since the correlations of interest are usually associated with the stationary part of the series, i.e., the part that can be thought of as being superimposed on the trend, it is usual to evaluate the correlations of the detrended series. This means, in effect, that we replace x̄ in the equations by â + b̂ t if the trend can be considered to be linear. If the trend is quadratic or logarithmic, the appropriate alternate nonlinear predicted value is subtracted before computing the lagged products. Note also that


differencing the series, as discussed in Section 2.2.2, can accomplish the same result.
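
A small R sketch of this point, assuming xt is a numeric vector containing a series with a roughly linear trend (the name is illustrative):

time.index <- 1:length(xt)
trend.fit <- lm(xt ~ time.index)        # estimate the linear trend a + b t
xt.detrended <- residuals(trend.fit)    # subtract the fitted trend
par(mfrow = c(1, 2))
acf(xt, main = "ACF of raw series")               # inflated by the trend
acf(xt.detrended, main = "ACF of detrended series")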

2.5 Problems

1. Consider a generalization of the model given in Example 1.1, namely, x_t = µ + w_t − θ w_{t−1}, where {w_t} are independent zero-mean random variables with variance σ²_w. Prove that E(x_t) = µ, γ_x(0) = (1 + θ²) σ²_w, γ_x(1) = −θ σ²_w, and γ_x(h) = 0 if |h| > 1, and finally show that x_t is weakly stationary.

2. Consider the time series generated by x_1 = µ + w_1 and x_t = µ + x_{t−1} + w_t for t ≥ 2. Show that x_t is not stationary no matter whether µ = 0 or not, and find γ_x(h).

3. Suppose that x_t is stationary with mean µ_x and covariance function given by γ_x(h). Find the mean and covariance function of

(a) y_t = a + b x_t, where a and b are constants,

(b) z_t = x_t − x_{t−1}.

4. Consider the linear process

$$y_t = \sum_{j=-\infty}^{\infty} a_j w_{t-j},$$

where w_t is a white noise process with variance σ²_w and a_j, j = 0, ±1, ±2, . . ., are constants. The process y_t will exist (as a limit in mean square) if \sum_j |a_j| < ∞; you do not need to prove this. Show that the series y_t is stationary, with autocovariance function

$$\gamma_y(h) = \sigma_w^2 \sum_{j=-\infty}^{\infty} a_{j+h}\, a_j.$$


Apply the result to calculating the autocovariance function of the 3-point moving average (x_{t−1} + x_t + x_{t+1})/3.

For the following problems, you need to use a computer package.

5. Melting glaciers deposit yearly layers of sand and silt during the spring melting seasons, which can be reconstructed yearly over a period ranging from the time de-glaciation began in New England (about 12,600 years ago) to the time it ended (about 6,000 years ago). Such sedimentary deposits, called varves, can be used as a proxy for paleoclimatic parameters such as temperature. The file mass2.dat contains yearly records for 634 years beginning 11,834 years ago, collected from one location in Massachusetts. For further information, see Shumway and Verosub (1992).

Figure 2.20: Varve data for Problem 5.

(a) Plot the varve records and examine the autocorrelation and partial autocorrelation functions for evidence of nonstationarity.

(b) Argue that the transformation y_t = log x_t might be useful for stabilizing the variance. Compute γ_x(0) and γ_y(0) over two time intervals for each series to determine whether this is reasonable. Plot the histograms of the raw and transformed series.

(c) Plot the autocorrelation of the series y_t and argue that a first difference produces a reasonably stationary series. Can you think of a practical interpretation for u_t = y_t − y_{t−1} = log x_t − log x_{t−1}?

(d) Compute the autocorrelation function of the differenced transformed series and argue that a generalization of the model given by Example 1.1 might be reasonable. Assume that u_t = w_t − θ w_{t−1} is stationary when the inputs w_t are assumed independent with E(w_t) = 0 and E(w_t²) = σ²_w. Using the sample ACF and the printed autocovariance γ̂_u(0), derive estimators for θ and σ²_w.

6. Two time series representing average wholesale U.S. gasoline and oil prices over 180 months, beginning in July 1973 and ending in December 1987, are given in the file oil-gas.dat. Analyze the data using some of the techniques in this chapter with the idea that one should be looking at how changes in oil prices influence changes in gas prices. For further reading, see Liu (1991).

Figure 2.21: Gas and oil series for Problem 6.

In particular, consider the following options:

(a) Plot the raw data and look at the autocorrelation functions to argue that the untransformed data series are nonstationary.

(b) It is often argued in economics that price changes are important, in particular, the percentage change in prices from one month to the next. On this basis, argue that a transformation of the form y_t = log x_t − log x_{t−1} might be applied to the data, where x_t is the oil or gas price series.

(c) Use lagged multiple scatterplots and the auto and cross correlation functions of the transformed oil and gas price series to investigate the properties of these series. Is it possible to guess whether gas prices are raised more quickly in response to increasing oil prices than they are decreased when oil prices are decreased? Do you think that it might be possible to predict log percentage changes in gas prices from log percentage changes in oil prices? Plot the two series on the same scale.

7. Monthly Handgun Sales and Firearms Related Deaths in California. Legal handgun purchase information for 227 months spanning the time period February 1, 1980 through December 31, 1998 was obtained from the Department of Justice's automated Firearms System database. California resident firearms death data was obtained from the California Department of Health Services. The data are plotted in the figure, with both rates given in numbers per 100,000 residents. Suppose that the main question of interest for this data pertains to the possible relations between handgun sales and death rates over this time period. Include the possibility of lagging relations in your analysis. In particular, answer the questions below:


Figure 2.22: Handgun sales (per 10,000,000) in California and monthly gun death rate (per 100,000) in California (February 1, 1980 – December 31, 1998).

(a) Use scatterplots to argue that there is a potential nonlinear relation between death rates and handgun sales, and indicate whether you think that there might be a lag.

(b) Bolster your argument for a lagging relationship by examining the cross correlation function. What do the autocorrelation functions indicate for this data?

(c) Examine the first difference for the two processes and indicate what the ACFs and CCFs show for the differenced data.

(d) Smooth the two series with a 12-point moving average and plot the two series on the same graph. Subtract the moving average from the original unsmoothed series. What do the residual series show in the ACF and CCF for this case?

2.6 Computer Code

The following R commands are used for making the graphs in this chapter.


# 3-28-2006

graphics.off() # clean the previous graph on the screen

################################################################

# This is Southern Oscillation Index data and Recruits data

##############################################################

y<-read.table("c:\\teaching\\time series\\data\\soi.dat",header=T)

# read data file

x<-read.table("c:\\teaching\\time series\\data\\recruit.dat",header=T)

y=y[,1]

x=x[,1]

postscript(file="c:\\teaching\\time series\\figs\\fig-1.1.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(1,2),mex=0.4)

# save the graph as a postscript file

ts.plot(y,type="l",lty=1,ylab="",xlab="")

# make a time series plot

title(main="Southern Oscillation Index",cex=0.5)

# set up the title of the plot

abline(0,0)

# make a straight line

ts.plot(x,type="l",lty=1,ylab="",xlab="")

abline(mean(x),0)

title(main="Recruit",cex=0.5)

dev.off()

z=arima.sim(n=200,list(ma=c(0.9))) # simulate a MA(1) model


postscript(file="c:\\teaching\\time series\\figs\\fig-1.2.eps",

horizontal=F,width=6,height=6)

ts.plot(z,type="l",lty=1,ylab="",xlab="")

title(main="Simulated MA(1)",cex=0.5)

abline(0,0)

dev.off()

n=length(y)

n2=n-12

yma=rep(0,n2)

for(i in 1:n2){yma[i]=mean(y[i:(i+12)])} # compute the 13-point moving average

yy=y[7:(n2+6)]

yy0=yy-yma

postscript(file="c:\\teaching\\time series\\figs\\fig-1.9.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(1,2),mex=0.4)

ts.plot(yy,type="l",lty=1,ylab="",xlab="")

points(1:n2,yma,type="l",lty=1,lwd=3,col=2)

ts.plot(yy0,type="l",lty=1,ylab="",xlab="")

points(1:n2,yma,lty=1,lwd=3,col=2) # make a point plot

abline(0,0)

dev.off()

m=17

n1=n-m

y.soi=rep(0,n1*m)

dim(y.soi)=c(n1,m)

y.rec=y.soi

for(i in 1:m){


y.soi[,i]=y[i:(n1+i-1)]

y.rec[,i]=x[i:(n1+i-1)]}

text_soi=c("1","2","3","4","5","6","7","8","9","10","11","12","13",
"14","15","16")

postscript(file="c:\\teaching\\time series\\figs\\fig-1.15.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(4,4),mex=0.4)

for(i in 2:17){

plot(y.soi[,1],y.soi[,i],type="p",pch="o",ylab="",xlab="",

ylim=c(-1,1),xlim=c(-1,1))

text(0.8,-0.8,text_soi[i-1],cex=2)}

dev.off()

text1=c("ACF of SOI Index")

text2=c("ACF of Recruits")

text3=c("CCF of SOI and Recruits")

SOI=y

Recruits=x

postscript(file="c:\\teaching\\time series\\figs\\fig-1.16.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4)

acf(y,ylab="",xlab="",ylim=c(-0.5,1),lag.max=50,main="")

# make an ACF plot

legend(10,0.8, text1) # set up the legend

acf(x,ylab="",xlab="",ylim=c(-0.5,1),lag.max=50,main="")

legend(10,0.8,text2)

ccf(y,x, ylab="",xlab="",ylim=c(-0.5,1),lag.max=50,main="")

legend(-40,0.8,text3)

dev.off()


postscript(file="c:\\teaching\\time series\\figs\\fig-1.17.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(4,4),mex=0.4)

for(i in 1:16){

plot(y.soi[,i],y.rec[,1],type="p",pch="o",ylab="",xlab="",

ylim=c(0,100),xlim=c(-1,1))

text(-0.8,10,text_soi[i],cex=2)}

dev.off()

postscript(file="c:\\teaching\\time series\\figs\\fig-1.18.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(4,4),mex=0.4)

for(i in 1:16){

plot(y.soi[,1],y.rec[,i],type="p",pch="o",ylab="",xlab="",

ylim=c(0,100),xlim=c(-1,1))

text(-0.8,10,text_soi[i],cex=2)}

dev.off()

postscript(file="c:\\teaching\\time series\\figs\\fig-1.19.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(1,2),mex=0.4)

pacf(y,ylab="",xlab="",lag=30,ylim=c(-0.5,1),main="")

text(10,0.9,"PACF of SOI")

pacf(x,ylab="",xlab="",lag=30,ylim=c(-0.5,1),main="")

text(10,0.9,"PACF of Recruits")

dev.off()

################################################################

###################################################################


# This is global temperature data

#################################

y1<-matrix(scan("c:\\teaching\\time series\\data\\ngtemp.dat"),

byrow=T,ncol=1)

a<-1:12

a=a/12

y=y1[,1]

n=length(y)

x<-rep(0,n)

for(i in 1:149){

x[((i-1)*12+1):(12*i)]<-1856+i-1+a

}

x[n-1]<-2005+1/12

x[n]=2005+2/12

#########################

# Nonparametric Fitting #

#########################

#########################################################

# Define the Epanechnikov kernel function local estimator

kernel<-function(x){0.75*(1-x^2)*(abs(x)<=1)}

###############################################################

# Define the function for computing the local linear estimation

local<-function(y,x,z,h){

# parameters: y=response; x=covariate; h=bandwidth; z=grid points where the fit is evaluated


nz<-length(z)

ny<-length(y)

beta<-rep(0,nz*2)

dim(beta)<-c(nz,2)

for(k in 1:nz){

x0=x-z[k]

w0<-sqrt(kernel(x0/h))

beta[k,]<-glm(y~x0,weight=w0)$coeff

}

return(beta)

}

###################################################################

z=x

h=12 # take a bandwidth

fit=local(y,x,z,h) # fit model y=m(x) + e

mhat=fit[,1] # obtain the nonparametric estimate

resid1=y-(-9.037+0.0046*x)

resid2=y-mhat

postscript(file="c:\\teaching\\time series\\figs\\fig-1.4.eps",

horizontal=F,width=6,height=6)

matplot(x,y,type="p",pch="o",ylab="",xlab="",cex=0.5)

# make multiple plots

points(z,mhat,type="l",lty=1,lwd=3,col=2)

abline(-9.037,0.0046,lty=1,lwd=5,col=3)

# make a straight line with an intercept and slope

title(main="Original Data with Linear and Nonlinear Trend",cex=0.5)

dev.off()


postscript(file="c:\\teaching\\time series\\figs\\fig-1.5.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(1,2),mex=0.4)

matplot(x,resid1,type="l",lty=1,ylab="",xlab="",cex=0.5)

abline(0,0)

title(main="Detrended: Linear",cex=0.5)

matplot(x,resid2,type="l",lty=1,ylab="",xlab="",cex=0.5)

abline(0,0)

title(main="Detrended: Nonlinear",cex=0.5)

dev.off()

y_diff=diff(y)

postscript(file="c:\\teaching\\time series\\figs\\fig-1.6.eps",

horizontal=F,width=6,height=6)

plot(x[-1],y_diff,type="l",lty=1,ylab="",xlab="",cex=0.5)

abline(0,0)

title(main="Differenced Time Series",cex=0.5)

dev.off()

###################################################################

# This is China data

###################################

data<-read.table("c:/teaching/stat3150/data/data1.txt",header=T)

# read data from a file containing 6 columns of data

y<-data[,1:5]

# put the first 5 columns of data into y

x<-data[,6]

text1<-c("agriculture","commerce","consumption","industry","transportation")

# set the text for legend in a graph


postscript(file="c:\\teaching\\time series\\figs\\fig-1.3.eps",

horizontal=F,width=6,height=6)

matplot(x,log(y),type="l",lty=1:5,ylab="",xlab="")

legend(1960,8,text1,lty=1:5,col=1:5)

dev.off()

###################################################################

# This is motor cycles data

###################################

data<-read.table("c:/teaching/stat3150/data/data7.txt",header=T)

# read data from a file containing 6 columns of data

y<-data[,1]

x<-data[,2]-1900

y_diff1=diff(y)

y_diff2=diff(y_diff1)

postscript(file="c:\\teaching\\time series\\figs\\fig-1.7.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4)

matplot(x,y,type="l",lty=1,ylab="",xlab="")

text(60,250,"Data")

ts.plot(y_diff1,type="l",lty=1,ylab="",xlab="")

text(20,40,"First difference")

abline(0,0)

ts.plot(y_diff2,type="l",lty=1,ylab="",xlab="")

text(20,25,"Second order difference")

abline(0,0)

dev.off()


###################################################################

# This is Johnson and Johnson data

###################################

y<-matrix(scan("c:\\teaching\\time series\\data\\jj.dat"),byrow=T,ncol=1)

n=length(y)

y_log=log(y) # log of data

postscript(file="c:\\teaching\\time series\\figs\\fig-1.8.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(1,2))

ts.plot(y,type="l",lty=1,ylab="",xlab="")

title(main="J&J Earnings",cex=0.5)

ts.plot(y_log,type="l",lty=1,ylab="",xlab="")

title(main="transformed log(earnings)",cex=0.5)

dev.off()

###################################################################

# This is retail sales data

###################################

y=matrix(scan("c:\\res\\0published\\cai-chen\\retail\\retail-sales.dat"),

byrow=T,ncol=1)

postscript(file="c:\\teaching\\time series\\figs\\fig-1.10.eps",

horizontal=F,width=6,height=6)

ts.plot(y,type="l",lty=1,ylab="",xlab="")

dev.off()

###################################################################

# This is marketing data

###################################

text_tv=c("television")


text_radio=c("radio")

data<-read.table("c:/teaching/stat3150/data/data4.txt",header=T)

TV=log(data[,1])

RADIO=log(data[,2])

postscript(file="c:\\teaching\\time series\\figs\\fig-1.11.eps",

horizontal=F,width=6,height=6)

ts.plot(cbind(TV,RADIO),type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="")

text(20,10.5,text_tv)

text(165,8,text_radio)

dev.off()

###################################################################

# This is Argentina data

###################################

text_ar=c("difference", "inflation")

y<-read.table("c:/teaching/stat3150/data/data8.txt",header=T)

y=y[,1]

n=length(y)

y_t=diff(log(y))

f_t=diff(y)/y[1:(n-1)]

x=seq(70.25,by=0.25,89.75)

postscript(file="c:\\teaching\\time series\\figs\\fig-1.12.eps",

horizontal=F,width=6,height=6)

matplot(x,cbind(y_t,f_t),type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="")

legend(72,5,text_ar,lty=c(1,2),col=c(1,2))

dev.off()

###################################################################

# This is exchange rate data

###################################


x<-matrix(scan(file="c:\\res\\cai-xu\\jpy\\jpy.dat"),byrow=T,ncol=1)

n<-length(x)

nweek<-(n-7)/5

week1<-rep(0,n) # Dates for week

week1[1:4]<-2:5

for(j in 1:nweek){

i1<-4+(j-1)*5+1

i2<-4+j*5

week1[i1:i2]<-c(1,2,3,4,5)

}

i2<-(nweek+1)*5

week1[i2:n]<-1:3

y<-x[week1==3] # Wednesday

x1<-x[week1==4] # Thursday

x1<-append(x1,0)

x1<-(1-(y>0))*x1 # Take value from Thursday if ND on Wednesday

x1<-y+x1 # Wednesday + Thursday

n<-length(x1)

x<-100*(log(x1[2:n])-log(x1[1:(n-1)])) # log return

postscript(file="c:\\teaching\\time series\\figs\\fig-1.13.eps",

horizontal=F,width=6,height=6)

ts.plot(x,type="l",ylab="",xlab="")

abline(0,0)

dev.off()

###################################################################

# This is unemployment data

###################################

text_unemploy=c("unadjusted", "seasonally adjusted")


data<-read.table("c:/teaching/stat3150/data/data10.txt",header=T)

y1=data[,1]

y2=data[,2]

n=length(y1)

x=seq(62.25,by=0.25,92)

postscript(file="c:\\teaching\\time series\\figs\\fig-1.14.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(1,2),mex=0.4)

matplot(x,cbind(y1,y2),type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="")

legend(66,10,text_unemploy,lty=c(1,2),col=c(1,2))

plot(y2[1:(n-1)],y2[2:n],type="l",lty=c(1),col=c(1),ylab="",xlab="")

dev.off()

###################################################################

# This is varve data

#####################

x<-matrix(scan("c:\\teaching\\time series\\data\\mass2.dat"),byrow=T,ncol=1)

postscript(file="c:\\teaching\\time series\\figs\\fig-1.20.eps",

horizontal=F,width=6,height=6)

ts.plot(x,type="l",lty=1,ylab="varve thickness",xlab="year")

title(main="Varve thickness from Massachusetts (n=634)",cex=0.5)

dev.off()

###################################################################

# This is oil-gas data

#######################

data<-matrix(scan("c:\\teaching\\time series\\data\\gas-oil.dat"),

byrow=T,ncol=2)

text4=c("GAS","OIL")

postscript(file="c:\\teaching\\time series\\figs\\fig-1.21.eps",


horizontal=F,width=7,height=7)

ts.plot(data,type="l",lty=c(1,2),col=c(1,2),ylab="price",xlab="month")

title(main="Gas and oil prices (n=180 months)",cex=0.5)

legend(20,700,text4,lty=c(1,2),col=c(1,2))

dev.off()

###################################################################

# This is handgun data

#####################

y<-matrix(scan("c:\\teaching\\time series\\data\\guns.dat"),byrow=T,ncol=2)

sales=y[,1]

y=cbind(y[,1]/100,y[,2])

text5=c("Hangun sales/100 per 100,000")

text6=c("Gun death rate per 100,000")

postscript(file="c:\\teaching\\time series\\figs\\fig-1.22.eps",

horizontal=F,width=7,height=7)

par(mex=0.4)

ts.plot(y,type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="months")

title(main="Gun sales and gun death rate",cex=0.5)

legend(20,2,lty=1,col=1,text5)

legend(20,0.8,lty=2,col=2,text6)

dev.off()

###################################################################

2.7 References

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 307-327.

Burman, P. and R.H. Shumway (1998). Semiparametric modeling of seasonal time series. Journal of Time Series Analysis, 19, 127-145.

Cai, Z. (2002). A two-stage approach to additive time series models. Statistica Neerlandica, 56, 415-433.

Cai, Z. (2006). Trending time-varying coefficient time series models with serially correlated errors. Forthcoming in Journal of Econometrics.

Cai, Z. and R. Chen (2006). Flexible seasonal time series models. Advances in Econometrics, 20B, 63-87.

Engle, R.F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50, 987-1007.

Fan, J. and Q. Yao (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag, New York.

Franses, P.H. (1996). Periodicity and Stochastic Trends in Economic Time Series. New York: Cambridge University Press.

Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. New York: Cambridge University Press.

Franses, P.H. and D. van Dijk (2000). Nonlinear Time Series Models for Empirical Finance. New York: Cambridge University Press.

Ghysels, E. and D.R. Osborn (2001). The Econometric Analysis of Seasonal Time Series. New York: Cambridge University Press.

Granger, C.W.J. and T. Terasvirta (1993). Modeling Nonlinear Economic Relationships. Oxford, U.K.: Oxford University Press.

Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press, NJ.

Liu, L.M. (1991). Dynamic relationship analysis of U.S. gasoline and crude oil prices. Journal of Forecasting, 10, 521-547.

Sercu, P. and R. Uppal (2000). Exchange Rate Volatility, Trade, and Capital Flows under Alternative Rate Regimes. Cambridge: Cambridge University Press.

Shumway, R.H. (1988). Applied Statistical Time Series Analysis. Englewood Cliffs, NJ: Prentice-Hall.

Shumway, R.H. and D.S. Stoffer (2000). Time Series Analysis & Its Applications. New York: Springer-Verlag.

Shumway, R.H. and K.L. Verosub (1992). State space modeling of paleoclimatic time series. Proceedings of the 5th International Meeting on Statistical Climatology, Toronto, 22-26 June, 1992.

Taylor, S. (2005). Asset Price Dynamics, Volatility, and Prediction. Princeton University Press, Princeton, NJ.

Tong, H. (1990). Nonlinear Time Series: A Dynamical System Approach. Oxford University Press, Oxford.

Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. John Wiley & Sons, New York.


Chapter 3

Univariate Time Series Models

3.1 Introduction

The organization of this chapter is patterned after the landmark approach to developing models for time series data pioneered by Box and Jenkins (see Box, et al., 1994). This assumes that there will be a representation of time series data in terms of a difference equation that relates the current value to its past. Such models should be flexible enough to include non-stationary realizations like the random walk given above and seasonal behavior, where the current value is related to past values at multiples of an underlying season; a common one might be multiples of 12 months (1 year) for monthly data. The models are constructed from difference equations driven by random input shocks and are labelled in the most general formulation as ARMA (Autoregressive Moving Average) models or the more general ARIMA, i.e., Autoregressive Integrated Moving Average, processes. The analogies with differential equations, which model many physical processes, are obvious.

For clarity, we develop the separate components of the model sequentially, considering the integrated, autoregressive, and moving average parts in order, followed by the seasonal modification. The Box-Jenkins approach suggests three steps in a procedure that they summarize as identification, estimation, and forecasting. Identification uses model selection techniques, combining the ACF and PACF as diagnostics with versions of the Akaike Information Criterion (AIC) type model selection criteria given below to find a parsimonious (simple) model for the data. Estimation of the parameters in the model is the next step. Statistical techniques based on maximum likelihood and least squares are paramount for this stage and will only be sketched in this course. Hopefully, we can discuss them at greater length if time permits. Finally, forecasting of the time series based on the estimated parameters, with sensible estimates of uncertainty, is the bottom line for any assumed model.

Correlation and Autocorrelation

The correlation coefficient between two random variables x_t and y_t is defined as ρ_xy(0), which is the special case of the cross correlation coefficient ρ_xy(h) defined in Chapter 2. The correlation coefficient between x_t and x_{t+h} is called the lag h autocorrelation of x_t and is commonly denoted by ρ_x(h), under the weak stationarity assumption. The definition of ρ_x(h) is given in Chapter 2. The sample version of ρ_x(h) is given by ρ̂_x(h) = γ̂_x(h)/γ̂_x(0), where, for given data {x_t}_{t=1}^n,

"#x(h) =1

n " h

n"h$

t=1(xt+h " x)(xt " x) with x =

1

n

n$

t=1xt.

Under some general conditions, ρ̂_x(1) is a consistent estimate of ρ_x(1). For example, if {x_t} is an independent and identically distributed (iid) sequence and E(x_t²) < ∞, then ρ̂_x(1) is asymptotically normal with mean zero and variance 1/n; see Brockwell and Davis (1991, Theorem 7.2.2). This result can be used in practice to test the null hypothesis H_0: ρ_x(1) = 0 versus the alternative hypothesis H_a: ρ_x(1) ≠ 0. The test statistic is the usual t-ratio, which is √n ρ̂_x(1) and follows asymptotically the standard normal distribution. In general, for the lag h sample autocorrelation of x_t, if {x_t} is an iid sequence satisfying E(x_t²) < ∞, then ρ̂_x(h) is asymptotically normal with mean zero and variance 1/n for any fixed positive integer h. For more information about the asymptotic distribution of sample autocorrelations, see Brockwell and Davis (1991, Chapter 7). In finite samples, ρ̂_x(h) is a biased estimator of ρ_x(h). The bias is of the order 1/n, which can be substantial when the sample size n is small. In most economic and financial applications, n is relatively large so that the bias is not serious.

Portmanteau Test

Economic and financial applications often require testing jointly that several autocorrelations of x_t are zero. Box and Pierce (1970) proposed the Portmanteau statistic

$$Q^*(m) = n\sum_{h=1}^{m}\hat\rho_x^2(h)$$

as a test statistic for the null hypothesis H_0: ρ_x(1) = · · · = ρ_x(m) = 0 against the alternative hypothesis H_a: ρ_x(i) ≠ 0 for some i ∈ {1, . . . , m}. Under the assumption that {x_t} is an iid sequence with certain moment conditions, Q^*(m) is asymptotically a chi-squared random variable with m degrees of freedom. Ljung and Box (1978) modified the Q^*(m) statistic as below to increase the power of the test in finite samples,

$$Q(m) = n(n+2)\sum_{h=1}^{m}\hat\rho_x^2(h)/(n-h).$$

In practice, the selection of m may affect the performance of the Q(m) statistic. Several values of m are often used. Simulation studies suggest that the choice of m ≈ log(n) provides better power performance.
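
Both statistics are available in R through Box.test(); a sketch assuming x holds the series to be tested:

m <- round(log(length(x)))                  # a common rule of thumb for m
Box.test(x, lag = m, type = "Box-Pierce")   # Q*(m)
Box.test(x, lag = m, type = "Ljung-Box")    # Q(m), the finite-sample modification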

The function ρ̂_x(h) is called the sample autocorrelation function (ACF) of x_t. It plays an important role in linear time series analysis. As a matter of fact, a linear time series model can be characterized


Figure 3.1: Autocorrelation functions (ACF) for simple (left) and log (right) returns for IBM (top panels) and for the value-weighted index of the US market (bottom panels), January 1926 to December 1997.

by its ACF, and linear time series modeling makes use of the sample ACF to capture the linear dynamics of the data. The top panels of Figure 3.1 show the sample autocorrelation functions of monthly simple (left top panel) and log (right top panel) returns of IBM stock from January 1926 to December 1997. The two sample ACFs are very close to each other, and they suggest that the serial correlations of monthly IBM stock returns are very small, if any. The sample ACFs are all within their two standard-error limits, indicating that they are not significant at the 5% level. In addition, for the simple returns,


the Ljung–Box statistics give Q(5) = 5.4 and Q(10) = 14.1, which correspond to p-values of 0.37 and 0.17, respectively, based on chi-squared distributions with 5 and 10 degrees of freedom. For the log returns, we have Q(5) = 5.8 and Q(10) = 13.7 with p-values 0.33 and 0.19, respectively. The joint tests confirm that monthly IBM stock returns have no significant serial correlations. The bottom panels of Figure 3.1 show the same for the monthly returns (simple in the left panel and log in the right panel) of the value-weighted index from the Center for Research in Security Prices (CRSP), University of Chicago. There are some significant serial correlations at the 5% level for both return series. The Ljung–Box statistics give Q(5) = 27.8 and Q(10) = 36.0 for the simple returns and Q(5) = 26.9 and Q(10) = 32.7 for the log returns. The p-values of these four test statistics are all less than 0.0003, suggesting that monthly returns of the value-weighted index are serially correlated. Thus, the monthly market index return seems to have stronger serial dependence than individual stock returns.

In the finance literature, a version of the Capital Asset Pricing Model (CAPM) theory is that the return {x_t} of an asset is not predictable and should have no autocorrelations. Testing for zero autocorrelations has been used as a tool to check the efficient market assumption. However, the way by which stock prices are determined and index returns are calculated might introduce autocorrelations in the observed return series. This is particularly so in the analysis of high-frequency financial data.

Before we discuss univariate and multivariate time series methods, we first review multiple regression models and model selection methods for both iid and time series data.


3.2 Least Squares Regression

We begin our discussion of univariate and multivariate time series methods by considering the idea of a simple regression model, which we have met before in other contexts such as a statistics or econometrics course. All of the multivariate methods follow, in some sense, from the ideas involved in simple univariate linear regression. In this case, we assume that there is some collection of fixed known functions of time, say z_{t1}, z_{t2}, . . . , z_{tq}, that are influencing our output y_t

which we know to be random. We express this relation between the inputs and outputs as

$$y_t = \beta_1 z_{t1} + \beta_2 z_{t2} + \cdots + \beta_q z_{tq} + e_t \qquad (3.1)$$

at the time points t = 1, 2, . . . , n, where β_1, . . . , β_q are unknown fixed regression coefficients and e_t is a random error or noise, assumed to be white noise; this means that the errors have zero mean, equal variance σ², and are independent. We traditionally assume also that the white noise series, e_t, is Gaussian or normally distributed.

Example 2.1: We have assumed implicitly that the model

$$y_t = \beta_1 + \beta_2 t + e_t$$

is reasonable in our discussion of detrending in Example 1.2 of Chapter 2. Figure 2.4 shows the monthly average global temperature series, and it is plausible that a straight line is a reasonable model. This is in the form of the regression model (3.1) when one makes the identification z_{t1} = 1 and z_{t2} = t. The problem in detrending is to estimate the coefficients β_1 and β_2 in the above equation and detrend by constructing the estimated residual series ê_t, which is shown in the top panel of Figure 2.4. As indicated in the example, estimates for β_1 and β_2 can be taken as β̂_1 = −9.037 and β̂_2 = 0.0046, respectively.


The linear regression model described by Equation (3.1) can be conveniently written in slightly more general matrix notation by defining the column vectors z_t = (z_{t1}, . . . , z_{tq})′ and β = (β_1, . . . , β_q)′, so that we write (3.1) in the alternate form

$$y_t = \beta' z_t + e_t. \qquad (3.2)$$

To find estimators for β and σ², it is natural to determine the coefficient vector β minimizing Σ e_t² with respect to β. This yields the least squares or maximum likelihood estimator β̂ and the maximum likelihood estimator for σ², which is proportional to the unbiased estimator

$$\hat\sigma^2 = \frac{1}{n-q}\sum_{t=1}^{n}\left(y_t - \hat\beta' z_t\right)^2. \qquad (3.3)$$

An alternate way of writing model (3.2) is as

$$y = Z\beta + e, \qquad (3.4)$$

where Z′ = (z_1, z_2, . . . , z_n) is a q × n matrix composed of the values of the input variables at the observed time points, y = (y_1, y_2, . . . , y_n)′ is the vector of observed outputs, and the errors are stacked in the vector e = (e_1, e_2, . . . , e_n)′. The ordinary least squares estimators β̂ are the solutions to the normal equations

$$Z'Z\beta = Z'y.$$

You need not be concerned as to how the above equation is solved in practice, as all computer packages have efficient software for inverting the q × q matrix Z′Z to obtain

$$\hat\beta = (Z'Z)^{-1} Z'y. \qquad (3.5)$$
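
The following R sketch verifies (3.5) directly on simulated data; the design, coefficients, and noise level are all made up for illustration:

set.seed(3)
n <- 100
Z <- cbind(1, (1:n) / 100)                  # rows are z_t' = (1, t/100)
beta <- c(-9, 0.5)
y <- drop(Z %*% beta + rnorm(n, sd = 0.2))
beta.hat <- solve(t(Z) %*% Z, t(Z) %*% y)   # (Z'Z)^{-1} Z'y from the normal equations
cbind(beta.hat, coef(lm(y ~ Z - 1)))        # agrees with the lm() fit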

An important quantity that all software produces is a measure of uncertainty for the estimated regression coefficients, say

$$\mathrm{Cov}(\hat\beta) = \sigma^2 (Z'Z)^{-1} \equiv \sigma^2 C \equiv \sigma^2 (c_{ij}). \qquad (3.6)$$

Page 87: Time Series Analysis R

CHAPTER 3. UNIVARIATE TIME SERIES MODELS 76

Then, Cov(β̂_i, β̂_j) = σ² c_{ij} and a 100(1 − α)% confidence interval for β_i is

$$\hat\beta_i \pm t_{n-q}(\alpha/2)\,\hat\sigma\sqrt{c_{ii}}, \qquad (3.7)$$

where t_{df}(α/2) denotes the upper 100(1 − α)% point on a t distribution with df degrees of freedom.

Example 2.1: Consider estimating the possible global warming trend alluded to in Section 2.2.1. The global temperature series, shown previously in Figure 2.4, suggests the possibility of a gradually increasing average temperature over the 149-year period covered by the land-based series. If we fit the model in Example 2.1, replacing t by t/100 to convert to a 100-year base so that the increase will be in degrees per 100 years, we obtain β̂_1 = −9.037 and β̂_2 = 0.4607 using (3.5). The error variance, from (3.3), is 0.0337, with q = 2 and n = 1790. Then, (3.6) yields

$$\mathrm{Cov}(\hat\beta_1, \hat\beta_2) = \begin{pmatrix} 0.0379 & -0.0020 \\ -0.0020 & 0.0001 \end{pmatrix},$$

leading to an estimated standard error for the slope of about 0.01008. The value of t with n − q = 1790 − 2 = 1788 degrees of freedom for α = 0.025 is about 1.96, leading to a narrow confidence interval of 0.4607 ± 0.0198 for the slope, and hence to a confidence interval on the one-hundred-year increase of about 0.4409 to 0.4805 degrees. We would conclude from this analysis that there is a substantial increase in global temperature amounting to an increase of roughly one degree F per 100 years.
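
A sketch of this calculation in R, assuming temp holds the monthly temperature series and time100 the corresponding time index in units of 100 years (both names are illustrative; the numbers quoted above come from the actual data set, not from this sketch):

fit <- lm(temp ~ time100)              # beta1 + beta2 * (t/100) + e_t
summary(fit)$coefficients              # estimates and their standard errors
confint(fit, "time100", level = 0.95)  # interval for the 100-year increase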

If the model is reasonable, the residuals ê_t = y_t − β̂_1 − β̂_2 t should be essentially independent and identically distributed with no correlation evident. The plot that we have made in Figure 2.5 (the top panel) of the detrended global temperature series shows that this is



Figure 3.2: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended (top panel) and differenced (bottom panel) global temperature series.

probably not the case because of the long low frequency in the observed residuals. However, the differenced series, also shown in Figure 2.6, appears to be more independent, suggesting that perhaps the apparent global warming is more consistent with a long-term swing in an underlying random walk than it is with a fixed 100-year trend. If we check the autocorrelation function of the regression residuals, shown here in Figure 3.2, it is clear that the significant values at higher lags imply that there is significant correlation in the residuals. Such correlation can be important since the estimated standard errors of the coefficients under the assumption that the least squares residuals are uncorrelated are often too small. We can partially repair the damage caused by the correlated residuals by looking at a model with correlated errors. The procedure and techniques for dealing with correlated errors are based on the autoregressive moving average models to be considered in the next sections. Another method of reducing correlation is to apply a first difference Δx_t = x_t − x_{t−1} to the global


trend data. The ACF of the differenced series, also shown in Figure 3.2, seems to have lower correlations at the higher lags. Figure 2.6 shows qualitatively that this transformation also eliminates the trend in the original series.

Since we have again made some rather arbitrary looking specifications for the configuration of dependent variables in the above regression examples, the reader may wonder how to select among various plausible models. We mention that two criteria which reward reducing the squared error and penalize for additional parameters are the Akaike Information Criterion (AIC) and the Schwarz Information Criterion (SIC) (Schwarz, 1978), with a common form

\[
\log(\hat{\sigma}^2) + C(K)/n, \qquad (3.8)
\]

where K is the number of parameters fitted (exclusive of variance parameters), $\hat{\sigma}^2$ is the maximum likelihood estimator of the variance, and $C(K) = 2K$ for AIC and $C(K) = K\log(n)$ for SIC. SIC is sometimes termed the Bayesian Information Criterion (BIC) and will often yield models with fewer parameters than the other selection methods. A modification of AIC that is particularly well suited for small samples was suggested by Hurvich and Tsai (1989). This is the corrected AIC, called AICC, given by $\log(\hat{\sigma}^2) + (n+K)/(n-K-2)$. The rule for all three measures is to choose the value of K leading to the smallest value of AIC, SIC, or AICC. We will give an example later comparing the above simple least squares model with a model where the errors have a time series correlation structure. A summary of model selection methods is given in the next section. Note that all of these methods are general purpose.
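As a quick illustration, the criteria in (3.8) are easy to compute by hand in R for a fitted regression. The sketch below is not part of the original notes; it assumes a numeric vector `gtemp` holding the global temperature series of Example 2.1.

# Hedged sketch: AIC, SIC and AICC in the form of (3.8) for a straight-line
# trend fit; `gtemp` is an assumed data vector, not defined in these notes.
n    <- length(gtemp)
t100 <- (1:n)/100                         # time in units of 100 years
fit  <- lm(gtemp ~ t100)
K      <- length(coef(fit))               # fitted parameters, excluding the variance
sigma2 <- sum(resid(fit)^2)/n             # ML estimator of the error variance
c(AIC  = log(sigma2) + 2*K/n,
  SIC  = log(sigma2) + K*log(n)/n,
  AICC = log(sigma2) + (n + K)/(n - K - 2))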


3.3 Model Selection Methods

Given a possibly large set of potential predictors, which ones do we include in our model? Suppose $[X_1, X_2, \cdots]$ is a pool of potential predictors. The model with all predictors,

\[
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \varepsilon,
\]

is the most general model. It holds even if some of the individual $\beta_j$'s are zero. But if some $\beta_j$'s are zero or close to zero, it is better to omit those $X_j$'s from the model. There are two reasons why you should omit variables whose coefficients are close to zero:

(a) Parsimony principle: Given two models that perform equally well in terms of prediction, one should choose the model that is more parsimonious (simpler).

(b) Prediction principle: The model should give predictions that are as accurate as possible, not just for the current observations, but for future observations as well. Including unnecessary predictors can apparently improve prediction for the current data, but can harm prediction for future data. Note that the sum of squared errors (SSE) never increases as we add more predictors.

Therefore, when we build a statistical model, we should follow these principles.

3.3.1 Subset Approaches

The all-possible-regressions procedure calls for considering all possible subsets of the pool of potential predictors and identifying, for detailed examination, a few good subsets according to some criterion. The purpose of the all-possible-regressions approach is to identify


a small group of regression models that are good according to a specified criterion (summary statistic), so that a detailed examination can be made of these models, leading to the selection of the final regression model to be employed. The main problem with this approach is that it is computationally expensive. For example, with k = 10 predictors, we need to investigate $2^{10} = 1024$ potential regression models. With the aid of modern computing power, this computation is possible. But examining 1024 possible models carefully would still be an overwhelming task for a data analyst.
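One common way to run such an all-subsets search in R is through the leaps package, whose regsubsets() function enumerates the best subsets of each size. The sketch below is only illustrative and not from the notes: the data frame `dat`, its response `y`, and the ten predictors are assumptions.

# Hedged sketch of an all-subsets search, assuming the leaps package is
# installed and `dat` has a response y and candidate predictors x1,...,x10.
library(leaps)
search <- regsubsets(y ~ ., data = dat, nvmax = 10)   # best model of each size
s <- summary(search)
# Compare the retained models by the summary statistics discussed below.
data.frame(size = 1:10, R2 = s$rsq, adjR2 = s$adjr2, Cp = s$cp, BIC = s$bic)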

Different criteria for comparing the regression models may be used with the all-possible-regressions selection procedure. We discuss several summary statistics:

(i) $R^2_p$ (or $SSE_p$), (ii) $R^2_{adj;p}$ (or $MSE_p$), (iii) $C_p$, (iv) $PRESS_p$, (v) sequential methods, and (vi) AIC-type criteria.

We shall denote the number of all potential predictors in the pool by $P - 1$. Hence, including an intercept parameter $\beta_0$, we have P potential parameters. The number of predictors in a subset will be denoted by $p - 1$, as always, so that there are p parameters in the regression function for this subset of predictors. Thus we have $1 \le p \le P$:

1. $R^2_p$ (or $SSE_p$): $R^2_p$ indicates that there are p parameters (or, $p-1$ predictors) in the regression model. The coefficient of multiple determination $R^2_p$ is defined as
\[
R^2_p = 1 - SSE_p/SSTO,
\]
where $SSE_p$ is the sum of squared errors of the model including all $p-1$ predictors and SSTO is the total sum of squared variations.


It is well known that $R^2_p$ measures the proportion of the variance of Y explained by the $p-1$ predictors; it always goes up as we add a predictor, and it varies inversely with $SSE_p$ because SSTO is constant for all possible regression models. That is, choosing the model with the largest $R^2_p$ is equivalent to choosing the model with the smallest $SSE_p$.

2. $R^2_{adj;p}$ (or $MSE_p$): One often considers models with a large $R^2_p$ value. However, $R^2_p$ always increases with the number of predictors, so it cannot be used to compare models of different sizes. The adjusted coefficient of multiple determination $R^2_{adj;p}$ has been suggested as an alternative criterion:
\[
R^2_{adj;p} = 1 - \frac{SSE_p/(n-p)}{SSTO/(n-1)} = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE_p}{SSTO} = 1 - \frac{MSE_p}{SSTO/(n-1)}.
\]
It is like $R^2_p$ but with a penalty for adding unnecessary variables. $R^2_{adj;p}$ can go down when a useless predictor is added, and it can even be negative. $R^2_{adj;p}$ varies inversely with $MSE_p$ because $SSTO/(n-1)$ is constant for all possible regression models. That is, choosing the model with the largest $R^2_{adj;p}$ is equivalent to choosing the model with the smallest $MSE_p$. Note that $R^2_p$ is useful when comparing models of the same size, while $R^2_{adj;p}$ (or $C_p$) is used to compare models of different sizes.

3. Mallows $C_p$: The Mallows $C_p$ is concerned with the total mean squared error of the n fitted values for each subset regression model. The mean squared error concept involves the total error in each fitted value:
\[
\hat{Y}_i - \mu_i = \underbrace{\hat{Y}_i - E(\hat{Y}_i)}_{\text{random error}} + \underbrace{E(\hat{Y}_i) - \mu_i}_{\text{bias}},
\]


where $\mu_i$ is the true mean response at the ith observation. The mean squared error of $\hat{Y}_i$ is defined as the expected value of the square of the total error above. It can be shown that
\[
mse(\hat{Y}_i) = E\big[(\hat{Y}_i - \mu_i)^2\big] = \mathrm{Var}(\hat{Y}_i) + \big[\mathrm{Bias}(\hat{Y}_i)\big]^2,
\]
where $\mathrm{Bias}(\hat{Y}_i) = E(\hat{Y}_i) - \mu_i$. The total mean squared error for all n fitted values $\hat{Y}_i$ is the sum over the observations i:
\[
\sum_{i=1}^{n} mse(\hat{Y}_i) = \sum_{i=1}^{n} \mathrm{Var}(\hat{Y}_i) + \sum_{i=1}^{n} \big[\mathrm{Bias}(\hat{Y}_i)\big]^2.
\]
It can be shown that
\[
\sum_{i=1}^{n} \mathrm{Var}(\hat{Y}_i) = p\,\sigma^2 \quad\text{and}\quad \sum_{i=1}^{n} \big[\mathrm{Bias}(\hat{Y}_i)\big]^2 = (n-p)\big[E(S^2_p) - \sigma^2\big],
\]
where $S^2_p$ is the MSE from the current model. Using this, we have
\[
\sum_{i=1}^{n} mse(\hat{Y}_i) = p\,\sigma^2 + (n-p)\big[E(S^2_p) - \sigma^2\big]. \qquad (3.9)
\]
Dividing (3.9) by $\sigma^2$ makes it scale-free:
\[
\sum_{i=1}^{n} \frac{mse(\hat{Y}_i)}{\sigma^2} = p + (n-p)\,\frac{E(S^2_p) - \sigma^2}{\sigma^2}.
\]
If the model does not fit well, then $S^2_p$ is a biased estimate of $\sigma^2$. We can estimate $E(S^2_p)$ by $MSE_p$ and estimate $\sigma^2$ by the MSE from the maximal model (the largest model we can consider), i.e., $\hat{\sigma}^2 = MSE_{P-1} = MSE(X_1,\ldots,X_{P-1})$. Using these estimators for $E(S^2_p)$ and $\sigma^2$ gives
\[
C_p = p + (n-p)\,\frac{MSE_p - MSE(X_1,\ldots,X_{P-1})}{MSE(X_1,\ldots,X_{P-1})} = \frac{SSE_p}{MSE(X_1,\ldots,X_{P-1})} - (n - 2p).
\]
A small $C_p$ is a good thing: it indicates that the model is relatively precise (has small variance) in estimating


the true regression coefficients and predicting future responses. This precision will not improve much by adding more predictors, so look for models with small $C_p$. If we have enough predictors in the regression model that all the significant predictors are included, then $MSE_p \approx MSE(X_1,\ldots,X_{P-1})$ and it follows that $C_p \approx p$. Thus, $C_p$ close to p is evidence that the predictors in the pool of potential predictors $(X_1,\ldots,X_{P-1})$ that are not in the current model are not important. Models with considerable lack of fit have values of $C_p$ larger than p. The $C_p$ can be used to compare models of different sizes. If we use all the potential predictors, then $C_p = P$.

4. $PRESS_p$: The PRESS (prediction sum of squares) criterion is defined as
\[
PRESS = \sum_{i=1}^{n} \tilde{\varepsilon}_{(i)}^2,
\]
where $\tilde{\varepsilon}_{(i)}$ is called the PRESS residual for the ith observation. The PRESS residual is defined as $\tilde{\varepsilon}_{(i)} = Y_i - \hat{Y}_{(i)}$, where $\hat{Y}_{(i)}$ is the fitted value obtained by leaving out the ith observation. Models with small $PRESS_p$ fit well in the sense of having small prediction errors. $PRESS_p$ can be calculated without fitting the model n times, each time deleting one of the n cases. One can show that
\[
\tilde{\varepsilon}_{(i)} = \hat{\varepsilon}_i/(1 - h_{ii}),
\]
where $h_{ii}$ is the ith diagonal element of $H = X(X'X)^{-1}X'$.

3.3.2 Sequential Methods

1. Forward selection

(a) Start with the null model.


(b) Add the most significant variable if its p-value is less than $p_{enter}$ (equivalently, if F is larger than $F_{enter}$).

(c) Continue until no more variables enter the model.

2. Backward elimination

(a) Start with the full model.

(b) Eliminate the least significant variable whose p-value is larger than $p_{remove}$ (equivalently, whose F is smaller than $F_{remove}$).

(c) Continue until no more variables can be discarded from the model.

3. Stepwise selection

(a) Start with any model.

(b) Check each predictor that is currently in the model. Suppose the current model contains $X_1,\ldots,X_k$. Then the F statistic for $X_i$ is
\[
F = \frac{SSE(X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_k) - SSE(X_1,\ldots,X_k)}{MSE(X_1,\ldots,X_k)} \sim F(1,\, n-k-1).
\]
Eliminate the least significant variable whose p-value is larger than $p_{remove}$ (equivalently, whose F is smaller than $F_{remove}$).

(c) Continue until no more variables can be discarded from the model.

(d) Add the most significant variable if its p-value is less than $p_{enter}$ (equivalently, if F is larger than $F_{enter}$).

(e) Go back to step (b).

(f) Repeat until no more predictors can be entered and no more can be discarded.
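In R, stepwise searches of this flavor are most often run with the built-in step() function, which adds and drops terms using AIC rather than the F tests described above. A minimal sketch, assuming a data frame `dat` with response y and candidate predictors (neither is defined in the notes):

# Hedged sketch: AIC-based stepwise selection with the built-in step() function.
null.fit <- lm(y ~ 1, data = dat)                 # start from the null model
full.fit <- lm(y ~ ., data = dat)                 # the most general model
# direction = "both" mimics stepwise selection; "forward" and "backward" give
# the pure sequential procedures.  Using k = log(n) would switch the penalty to SIC/BIC.
best.fit <- step(null.fit, scope = formula(full.fit), direction = "both")
summary(best.fit)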


3.3.3 Likelihood-Based Criteria

The basic idea of Akaike's approach and its relatives can be found in Akaike (1973) and subsequent papers; see the recent book by Burnham and Anderson (2003).

Suppose that $f(y)$ is the true model (unknown) giving rise to the data (y is a vector of data) and $g(y,\theta)$ is a candidate model with parameter vector $\theta$. We want to find a model $g(y,\theta)$ "close to" $f(y)$. The Kullback-Leibler discrepancy is
\[
K(f,g) = E_f\left[\log\left(\frac{f(Y)}{g(Y,\theta)}\right)\right].
\]
This is a measure of how "far" model g is from model f (with reference to model f). It has the properties
\[
K(f,g) \ge 0 \quad\text{and}\quad K(f,g) = 0 \iff f = g.
\]

Of course, we can never know how far our model g is from f. But Akaike (1973) showed that we might be able to estimate something almost as good.

Suppose we have two models under consideration: $g(y,\theta)$ and $h(y,\varphi)$. Akaike (1973) showed that we can estimate
\[
K(f,g) - K(f,h).
\]
It turns out that the difference of maximized log-likelihoods, corrected for a bias, estimates the difference of K-L distances. The maximized likelihoods are $\hat{L}_g = g(y,\hat{\theta})$ and $\hat{L}_h = h(y,\hat{\varphi})$, where $\hat{\theta}$ and $\hat{\varphi}$ are the ML estimates of the parameters. Akaike's result is that $[\log(\hat{L}_g) - q] - [\log(\hat{L}_h) - r]$ is an asymptotically unbiased estimate (i.e., the bias approaches zero as the sample size increases) of $K(f,g) - K(f,h)$. Here q is the number of parameters estimated in $\theta$ (model g) and r is


the number of parameters estimated in $\varphi$ (model h). This is the price of parameters: the likelihoods in the above expression are penalized by the number of parameters.

The AIC for model g is given by

\[
AIC = -2\log(\hat{L}_g) + 2q.
\]

The AIC might not perform well in small samples. To overcome this shortcoming, a bias-corrected version of AIC was proposed by Hurvich and Tsai (1989), defined by

\[
AICC = -2\log(\hat{L}_g) + \frac{2(q+1)\,n}{n-q-2} = AIC + \frac{2(q+1)(q+2)}{n-q-2}.
\]

The penalty of the AICC lies between that of the AIC (lighter penalty) and the BIC (heavier penalty).

Another approach comes from the much older tradition of Bayesian statistics. In the Bayesian approach, we assume that a priori uncertainty about the value of the model parameters is represented by a prior distribution. Upon observing the data, this prior is updated, yielding a posterior distribution. In order to make inferences about the model (rather than its parameters), we integrate across the posterior distribution. Under the assumption that all models are a priori equally likely (the Bayesian approach requires model priors as well as parameter priors), Bayesian model selection chooses the model with the highest marginal likelihood. The ratio of two marginal likelihoods is called a Bayes factor (BF), which is a widely used tool for model selection in Bayesian inference. The two integrals in the Bayes factor are nontrivial to compute unless they form a conjugate family. Monte Carlo methods are usually required to compute the BF, especially for highly parameterized models. A large-sample approximation of the BF yields the easily computable BIC,

\[
BIC = -2\log(\hat{L}_g) + q\log n.
\]


In sum, both AIC and BIC, as well as their generalizations, have the similar form
\[
LC = -2\log(\hat{L}_g) + \lambda\, q,
\]
where $\lambda$ is a fixed constant. Recent developments suggest the use of a data-adaptive penalty to replace the fixed penalties; see Bai, Rao and Wu (1999) and Shen and Ye (2002). That is, $\lambda$ is estimated from the data in a complexity form based on the concept of generalized degrees of freedom.

3.3.4 Cross-Validation and Generalized Cross-Validation

Cross-validation (CV) is the most commonly used method for model assessment and selection. The main idea is a direct estimate of extra-sample prediction error. The general version of CV splits the data into K roughly equal-sized parts, fits the model to the other $K-1$ parts, and calculates the prediction error on the remaining part:

\[
CV = \sum_{i=1}^{n} (Y_i - \hat{Y}_{-i})^2,
\]
where $\hat{Y}_{-i}$ is the fitted value computed with the part of the data containing observation i removed.

A convenient approximation to CV for linear fitting with squared error loss is generalized cross-validation (GCV). A linear fitting method has the property $\hat{Y} = S\, Y$, where $\hat{Y}_i$ is the fitted value based on the whole data set. For many linear fitting methods with leave-one-out CV, it can be shown easily that

\[
CV = \sum_{i=1}^{n} (Y_i - \hat{Y}_{-i})^2 = \sum_{i=1}^{n} \left(\frac{Y_i - \hat{Y}_i}{1 - S_{ii}}\right)^2.
\]


Due to the intensive computation, the CV can be approximated by the GCV, defined by

\[
GCV = \sum_{i=1}^{n} \left(\frac{Y_i - \hat{Y}_i}{1 - \mathrm{trace}(S)/n}\right)^2 = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{(1 - \mathrm{trace}(S)/n)^2}.
\]

It has been shown that both the CV and the GCV methods are very appealing for nonparametric modeling.
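For a linear fit, both quantities can be computed without refitting, using the diagonal of the hat (smoother) matrix. A minimal sketch, assuming a fitted lm object `fit`:

# Hedged sketch: leave-one-out CV and GCV for a linear model, using the fact
# that the fitted values are S y with S the hat matrix.
h   <- hatvalues(fit)                      # S_ii, the diagonal of S
e   <- resid(fit)                          # Y_i - Yhat_i
n   <- length(e)
CV  <- sum((e/(1 - h))^2)                  # leave-one-out cross-validation
GCV <- sum(e^2)/(1 - sum(h)/n)^2           # replaces S_ii by trace(S)/n
c(CV = CV, GCV = GCV)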

Recently, the leave-one-out cross-validation method was challenged by Shao (1993). Shao (1993) claimed that the popular leave-one-out cross-validation method, which is asymptotically equivalent to many other model selection methods such as the AIC, the $C_p$, and the bootstrap, is asymptotically inconsistent in the sense that the probability of selecting the model with the best predictive ability does not converge to 1 as the total number of observations $n \to \infty$, and he showed that the inconsistency of leave-one-out cross-validation can be rectified by using leave-$n_v$-out cross-validation, with $n_v$, the number of observations reserved for validation, satisfying $n_v/n \to 1$ as $n \to \infty$.

3.3.5 Penalized Methods

1. Bridge and Ridge: Frank and Friedman (1993) proposed the $L_q$ ($q > 0$) penalized least squares criterion
\[
\sum_{i=1}^{n} \Big(Y_i - \sum_j \beta_j X_{ij}\Big)^2 + \lambda \sum_j |\beta_j|^q,
\]
which results in the so-called bridge estimator. If q = 2, the resulting estimator is the ridge estimator, given by $\hat{\beta} = (X^T X + \lambda I)^{-1} X^T Y$.


2. LASSO: Tibshirani (1996) proposed the so-called LASSO, which is the minimizer of the penalized least squares criterion
\[
\sum_{i=1}^{n} \Big(Y_i - \sum_j \beta_j X_{ij}\Big)^2 + \lambda \sum_j |\beta_j|,
\]
which results in the soft thresholding rule $\hat{\beta}_j = \mathrm{sign}(\hat{\beta}^0_j)\,(|\hat{\beta}^0_j| - \lambda)_+$.

3. Non-concave Penalized LS: Fan and Li (2001) proposed the non-concave penalized least squares criterion
\[
\sum_{i=1}^{n} \Big(Y_i - \sum_j \beta_j X_{ij}\Big)^2 + \sum_j p_\lambda(|\beta_j|),
\]
where the hard thresholding penalty function $p_\lambda(|\theta|) = \lambda^2 - (|\theta| - \lambda)^2\, I(|\theta| < \lambda)$ results in the hard thresholding rule $\hat{\beta}_j = \hat{\beta}^0_j\, I(|\hat{\beta}^0_j| > \lambda)$. Finally, Fan and Li (2001) proposed the smoothly clipped absolute deviation (SCAD) model selection criterion, with the derivative of the penalty function defined as
\[
p'_\lambda(\theta) = \lambda\left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_+}{(a-1)\lambda}\, I(\theta > \lambda) \right\} \quad\text{for some } a > 2,
\]
which results in the estimator
\[
\hat{\beta}_j = \begin{cases}
\mathrm{sign}(\hat{\beta}^0_j)\,(|\hat{\beta}^0_j| - \lambda)_+, & \text{when } |\hat{\beta}^0_j| \le 2\lambda,\\[2pt]
\big[(a-1)\,\hat{\beta}^0_j - \mathrm{sign}(\hat{\beta}^0_j)\, a\lambda\big]/(a-2), & \text{when } 2\lambda \le |\hat{\beta}^0_j| \le a\lambda,\\[2pt]
\hat{\beta}^0_j, & \text{when } |\hat{\beta}^0_j| > a\lambda.
\end{cases}
\]
Also, Fan and Li (2001) showed that the SCAD estimator satisfies three properties: (1) unbiasedness, (2) sparsity, and (3) continuity, and Fan and Peng (2004) considered the case where the number of regressors can depend on the sample size and goes to infinity at a certain rate.

Remark: Note that the theory for the penalized methods is still open for time series data, and it would be a very interesting research topic.
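For completeness, ridge and LASSO fits of this kind can be computed in R with, for example, the glmnet package (alpha = 0 gives ridge, alpha = 1 gives the LASSO); SCAD penalties are implemented in other packages such as ncvreg. The sketch below is only an illustration under assumptions: the predictor matrix `X` and response `Y` are not objects from the notes.

# Hedged sketch: ridge and LASSO estimates with the penalty chosen by
# cross-validation; X is an n x p predictor matrix and Y the response vector.
library(glmnet)
ridge <- cv.glmnet(X, Y, alpha = 0)    # L2 penalty (ridge)
lasso <- cv.glmnet(X, Y, alpha = 1)    # L1 penalty (LASSO, soft thresholding)
coef(ridge, s = "lambda.min")          # coefficients at the CV-chosen lambda
coef(lasso, s = "lambda.min")          # many coefficients are set exactly to zero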


3.4 Integrated Models - I(1)

We begin our study of time correlation by mentioning a simple model that will introduce strong correlations over time. This is the random walk or unit root model, which defines the current value of the time series as just the immediately preceding value with additive noise. The model forms the basis, for example, of the random walk theory of stock price behavior. In this model we define

\[
x_t = x_{t-1} + w_t, \qquad (3.10)
\]

where $w_t$ is a white noise series with mean zero and variance $\sigma_w^2$. The left panel of Figure 3.3 shows a typical realization of such a series (with $w_t \sim N(0,1)$), and we observe that it bears a passing resemblance to the global temperature series. Appealing to (3.10), the best prediction of the current value would be expected to be given by its immediately preceding value. The model is, in a sense, unsatisfactory, because one would think that better results would be possible by a more efficient use of the past. The ACF of the original series, shown in Figure 3.4, exhibits a slow decay as lags increase. In order to model such a series without knowing that it is necessarily generated by (3.10), one might try looking at a first difference, shown in the right panel of Figure 3.3, and comparing the result to a white noise or completely independent process. It is clear from (3.10) that the first difference would be $\Delta x_t = x_t - x_{t-1} = w_t$, which is just white noise. The ACF of the differenced process, in this case, would be expected to be zero at all lags $h \ne 0$, and the sample ACF should reflect this behavior. The first difference of the random walk in the right panel of Figure 3.3 is also shown in the bottom panels of Figure 3.4, and we note that it appears to be much more random. The ACF, shown in the left bottom panel of Figure 3.4, reflects this predicted


Figure 3.3: A typical realization of the random walk series (left panel) and the first difference of the series (right panel).

behavior, with no significant values for lags other than zero. It is clear that (3.10) is a reasonable model for this data. The original series is nonstationary, with an autocorrelation function that depends on time, of the form
\[
\rho(x_{t+h}, x_t) =
\begin{cases}
\sqrt{t/(t+h)}, & \text{if } h \ge 0,\\
\sqrt{(t+h)/t}, & \text{if } h < 0.
\end{cases}
\]
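The behavior described here is easy to reproduce by simulation; the realization below will of course differ from the one plotted in Figure 3.3. A minimal sketch:

# Hedged sketch: simulate a random walk, difference it, and compare ACFs,
# reproducing qualitatively the patterns in Figures 3.3 and 3.4.
set.seed(1)
w <- rnorm(200)            # white noise, N(0,1)
x <- cumsum(w)             # random walk x_t = x_{t-1} + w_t
op <- par(mfrow = c(2, 2))
plot.ts(x,       main = "Random Walk");      acf(x,       lag.max = 20)
plot.ts(diff(x), main = "First Difference"); acf(diff(x), lag.max = 20)
par(op)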

The above example, using a difference transformation to make a random walk stationary, shows a very particular case of the model identification procedure advocated by Box et al. (1994). Namely, we seek a linearly filtered transformation of the original series, based strictly on past values, that will reduce it to completely random white noise. This gives a model that enables prediction to be done with a residual noise that satisfies the usual statistical assumptions about model error.

We will introduce, in the following discussion, more general versions of this simple model that are useful for modeling and forecasting series with observations that are correlated in time. The notation


Figure 3.4: Autocorrelation functions (ACF) (left) and partial autocorrelation functions (PACF) (right) for the random walk (top panel) and the first difference (bottom panel) series.

and terminology were introduced in the landmark work by Box and Jenkins (1970). A requirement for the ARMA model of Box and Jenkins is that the underlying process be stationary. Clearly the first difference of the random walk is stationary, but the ACF of the first difference shows relatively little dependence on the past, meaning that the differenced process is not predictable in terms of its past behavior.

To introduce a notation that has advantages for treating more general models, define the back-shift operator L as the result of shifting the series back by one time unit, i.e.,

\[
L\, x_t = x_{t-1}, \qquad (3.11)
\]

and applying successively higher powers, $L^k x_t = x_{t-k}$. The operator has many of the usual algebraic properties and allows, for example, writing the random walk model (3.10) as $(1 - L)\, x_t = w_t$. Note that


the difference operator discussed previously in Section 2.2.2 is just $\Delta = 1 - L$.

Identifying nonstationarity is an important first step in the Box-Jenkins procedure. From the above discussion, we note that the ACF of a nonstationary process will tend to decay rather slowly as a function of lag h. For example, a straight line would be perfectly correlated, regardless of lag. Based on this observation, we mention the following property that aids in identifying non-stationarity.

Property 2.1: The ACF of a non-stationary time series decays very slowly as a function of lag h. The PACF of a non-stationary time series tends to have a peak very near unity at lag 1, with other values less than the significance level.

Note that since the I(1) model is very important in modeling economic and financial data, we will discuss the model and its statistical inference further in a later chapter.

3.5 Autoregressive Models - AR(p)

3.5.1 Model

Now, extending the notions above to more general linear combinations of past values might suggest writing

\[
x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t \qquad (3.12)
\]

as a function of p past values and an additive noise component $w_t$. The model given by (3.12) is called an autoregressive model of order p, since it is assumed that one needs p past values to predict $x_t$. The coefficients $\phi_1, \phi_2, \cdots, \phi_p$ are the autoregressive coefficients, chosen to produce a good fit between the observed $x_t$ and its prediction based


on $x_{t-1}, x_{t-2}, \cdots, x_{t-p}$. It is convenient to rewrite (3.12), using the back-shift operator, as

\[
\phi(L)\, x_t = w_t, \quad\text{where } \phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p \qquad (3.13)
\]

is a polynomial with roots (solutions of $\phi(L) = 0$) outside the unit circle ($|L_j| > 1$).¹ These restrictions are necessary for expressing the solution $x_t$ of (3.13) in terms of present and past values of $w_t$, which is called invertibility of an ARMA series. That solution has the form

\[
x_t = \psi(L)\, w_t, \quad\text{where } \psi(L) = \sum_{k=0}^{\infty} \psi_k L^k, \qquad (3.14)
\]
an infinite polynomial ($\psi_0 = 1$), with coefficients determined by equating coefficients of L in
\[
\psi(L)\,\phi(L) = 1. \qquad (3.15)
\]

Equation (3.14) can be obtained formally by noting that choosing $\psi(L)$ to satisfy (3.15) and multiplying both sides of (3.13) by $\psi(L)$ gives the representation (3.14). It is clear that the random walk has $\phi_1 = 1$ and $\phi_k = 0$ for all $k \ge 2$, which does not satisfy the restriction, and the process is nonstationary. $x_t$ is stationary if $\sum_k |\psi_k| < \infty$; see Proposition 3.1.2 in Brockwell and Davis (1991, p. 84). This condition can be weakened to $\sum_k \psi_k^2 < \infty$; see Hamilton (1994, p. 52).

Example 2.2: Suppose that we have an autoregressive model (3.12) with p = 1, i.e., $x_t - \phi_1 x_{t-1} = (1 - \phi_1 L)\,x_t = w_t$. Then (3.15) becomes $(1 + \psi_1 L + \psi_2 L^2 + \cdots)(1 - \phi_1 L) = 1$. Equating coefficients of L implies that $\psi_1 - \phi_1 = 0$, or $\psi_1 = \phi_1$. For $L^2$, we get $\psi_2 - \psi_1\phi_1 = 0$, or $\psi_2 = \phi_1^2$. Continuing, we obtain $\psi_k = \phi_1^k$ and the

¹This restriction is a sufficient and necessary condition for an ARMA time series to be invertible; see Section 3.7 in Hamilton (1994) or Theorem 3.1.2 in Brockwell and Davis (1991, p. 86) and the related discussions.


representation is
\[
\psi(L) = 1 + \sum_{k=1}^{\infty} \phi_1^k L^k,
\]
and we have $x_t = \sum_{k=0}^{\infty} \phi_1^k\, w_{t-k}$. The representation (3.14) is fundamental for developing approximate forecasts and also exhibits the series as a linear process of the form considered in Chapter 2.
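In R, the $\psi$-weights of a causal ARMA model can be obtained directly with ARMAtoMA(). A small sketch for the AR(1) case of Example 2.2, using an assumed coefficient of 0.9:

# Hedged sketch: psi-weights of an AR(1) with phi_1 = 0.9; these should equal
# phi_1^k, as derived in Example 2.2.
ARMAtoMA(ar = 0.9, ma = 0, lag.max = 10)
0.9^(1:10)                     # for comparison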

For data involving such autoregressive (AR) models as defined above, the main selection problems are deciding that the autoregressive structure is appropriate and then determining the value of p for the model. The ACF of the process is a potential aid for determining the order of the process, as are the model selection measures described in Section 3.3. To determine the ACF of the pth order AR in (3.12), write the equation as
\[
x_t - \sum_{k=1}^{p} \phi_k\, x_{t-k} = w_t
\]
and multiply both sides by $x_{t-h}$ for any $h \ge 1$. Assuming that the mean $E(x_t) = 0$, and using the definition of the autocovariance function, this leads to the equation
\[
E\Big[ x_t x_{t-h} - \sum_{k=1}^{p} \phi_k\, x_{t-k} x_{t-h} \Big] = E[w_t x_{t-h}].
\]
The left-hand side immediately becomes $\gamma_x(h) - \sum_{k=1}^{p} \phi_k\, \gamma_x(h-k)$. The representation (3.14) implies that
\[
E[w_t x_{t-h}] = E[w_t(w_{t-h} + \psi_1 w_{t-h-1} + \psi_2 w_{t-h-2} + \cdots)] =
\begin{cases}
\sigma_w^2, & \text{if } h = 0,\\
0, & \text{otherwise.}
\end{cases}
\]
Hence, we may write the equations for determining $\gamma_x(h)$ as
\[
\gamma_x(0) - \sum_{k=1}^{p} \phi_k\, \gamma_x(-k) = \sigma_w^2 \qquad (3.16)
\]


and
\[
\gamma_x(h) - \sum_{k=1}^{p} \phi_k\, \gamma_x(h-k) = 0 \quad\text{for } h \ge 1. \qquad (3.17)
\]

Note that one will need the property $\gamma_x(h) = \gamma_x(-h)$ in solving these equations. Equations (3.16) and (3.17) are called the Yule-Walker equations (see Yule, 1927; Walker, 1931).

Example 2.3: Consider finding the ACF of the first-order autoregressive model. First, (3.16) implies that $\gamma_x(0) - \phi_1 \gamma_x(1) = \sigma_w^2$. For $h \ge 1$, (3.17) gives $\gamma_x(h) - \phi_1 \gamma_x(h-1) = 0$. Solving these successively gives $\gamma_x(h) = \gamma_x(0)\,\phi_1^h$. Combining with (3.16) yields $\gamma_x(0) = \sigma_w^2/(1 - \phi_1^2)$. It follows that the autocovariance function is $\gamma_x(h) = \sigma_w^2\,\phi_1^h/(1 - \phi_1^2)$. Taking into account that $\gamma_x(h) = \gamma_x(-h)$, we obtain $\rho_x(h) = \phi_1^{|h|}$ for all h.
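The theoretical ACF (and PACF) of an AR model can be computed in R with ARMAacf(); a small sketch checking the $\rho_x(h) = \phi_1^{|h|}$ result for an assumed coefficient of 0.6:

# Hedged sketch: theoretical ACF and PACF of an AR(1) with phi_1 = 0.6.
ARMAacf(ar = 0.6, lag.max = 10)                 # should equal 0.6^h
ARMAacf(ar = 0.6, lag.max = 10, pacf = TRUE)    # zero beyond lag 1 (Property 2.2)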

The exponential decay is typical of autoregressive behavior, and there may also be some periodic structure. However, the most effective diagnostic of AR structure is in the PACF and is summarized by the following identification property:

Property 2.2: The partial autocorrelation function, as a function of lag h, is zero for h > p, the order of the autoregressive process. This enables one to make a preliminary identification of the order p of the process using the partial autocorrelation function (PACF). Simply choose the order beyond which most of the sample values of the PACF are approximately zero.

To verify the above, note that the PACF (see Section 2.4.3) is basically the last coefficient obtained when minimizing the squared error

\[
MSE = E\left[\Big(x_{t+h} - \sum_{k=1}^{h} a_k\, x_{t+h-k}\Big)^2\right].
\]


Setting the derivatives with respect to $a_j$ equal to zero leads to the equations
\[
E\left[\Big(x_{t+h} - \sum_{k=1}^{h} a_k\, x_{t+h-k}\Big)\, x_{t+h-j}\right] = 0.
\]

This can be written as

\[
\gamma_x(j) - \sum_{k=1}^{h} a_k\, \gamma_x(j-k) = 0
\]

for $1 \le j \le h$. Now, from Equation (3.17), it is clear that, for an AR(p), we may take $a_k = \phi_k$ for $k \le p$ and $a_k = 0$ for $k > p$ to get a solution to the above equations. This implies Property 2.2 above.

Having decided on the order p of the model, it is clear that, for the estimation step, one may write the model (3.12) in the regression form

\[
x_t = \phi' z_t + w_t, \qquad (3.18)
\]

where $\phi = (\phi_1, \phi_2, \cdots, \phi_p)'$ corresponds to $\beta$ and $z_t = (x_{t-1}, x_{t-2}, \cdots, x_{t-p})'$ is the vector of explanatory variables in (3.2). Taking into account the fact that $x_t$ is not observed for $t \le 0$, we may run the regression approach of Section 3.2 for $t = p+1, \cdots, n$ to get estimators for $\phi$ and for $\sigma^2$, the variance of the white noise process. These so-called conditional maximum likelihood estimators are commonly used because the exact maximum likelihood estimators involve solving nonlinear equations; see Chapter 5 in Hamilton (1994) for details. We will discuss this issue later.

Example 2.4: We consider the simple problem of modeling the recruit series shown in the right panel of Figure 2.1 using an autoregressive model. The top right panel of Figure 2.16 and the top right panel of Figure 2.19 show the autocorrelation and partial autocorrelation functions of the recruit series. The PACF has large values for


Table 3.1: AICC values for ten models for the recruits series

p      1     2     3     4     5     6     7     8     9     10
AICC   5.75  5.52  5.53  5.54  5.54  5.55  5.55  5.56  5.57  5.58

h = 1 and 2 and is then essentially zero for higher-order lags. This implies, by Property 2.2 above, that a second order (p = 2) AR model might provide a good fit. Running the regression program for an AR(2) model with intercept,
\[
x_t = \phi_0 + \phi_1 x_{t-1} + \phi_2 x_{t-2} + w_t,
\]
leads to the estimators $\hat{\phi}_0 = 61.8439\,(4.0121)$, $\hat{\phi}_1 = 1.3512\,(0.0417)$, $\hat{\phi}_2 = -0.4612\,(0.0416)$ and $\hat{\sigma}^2 = 89.53$, where the estimated standard deviations are in parentheses. To determine whether the above order is the best choice, we fitted models for $1 \le p \le 10$, obtaining the corrected AICC values summarized in Table 3.1 using (3.8) with K = 2. This shows that the minimum AICC obtains at p = 2, and we choose the second order model.
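A hedged sketch of how such a fit and the order scan could be reproduced in R; the object `rec` is assumed to hold the recruits series and is not defined in the notes.

# Hedged sketch: conditional-least-squares AR(2) fit to the recruits series and
# an AICC scan over p = 1,...,10, assuming `rec` is the recruits series.
fit2 <- arima(rec, order = c(2, 0, 0), method = "CSS")    # AR(2) with mean
fit2                                                       # coefficients and s.e.'s
n <- length(rec)
aicc <- sapply(1:10, function(p) {
  f <- arima(rec, order = c(p, 0, 0), method = "CSS")
  log(f$sigma2) + (n + p)/(n - p - 2)                      # AICC in the form of (3.8)
})
which.min(aicc)                                            # should point to p = 2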

Example 2.5: The previous example used various autoregressive models for the recruits series, fitting a second-order regression model. We may also use this regression idea to fit the model to other series, such as a detrended version of the SOI given in previous discussions. We have noted in our discussion of Figure 2.19, from the partial autocorrelation function, that a plausible model for this series might be a first order autoregression of the form given above with p = 1. Again, putting the model above into the regression framework (3.2) for a single coefficient leads to the estimator $\hat{\phi}_1 = 0.59$ with standard error 0.04, $\hat{\sigma}^2 = 0.09218$ and AICC(1) = $-1.375$. The ACF of these residuals, shown in the left panel of Figure 3.5, however, still shows cyclical variation, and it is clear that they still have a number


of values exceeding the $1.96/\sqrt{n}$ threshold. A suggested procedure

Figure 3.5: Autocorrelation (ACF) of residuals of AR(1) for SOI (left panel) and the plot of AIC and AICC values (right panel).

is to try higher-order autoregressive models, and successive models for $1 \le p \le 30$ were fitted; the AICC(K) values are plotted in the right panel of Figure 3.5. There is a clear minimum for a p = 16 order model. The coefficient vector is $\phi$ with components (standard errors in parentheses) 0.4050(0.0469), 0.0740(0.0505), 0.1527(0.0499), 0.0915(0.0505), $-0.0377$(0.0500), $-0.0803$(0.0493), $-0.0743$(0.0493), $-0.0679$(0.0492), 0.0096(0.0492), 0.1108(0.0491), 0.1707(0.0492), 0.1606(0.0499), 0.0281(0.0504), $-0.1902$(0.0501), $-0.1283$(0.0510), $-0.0413$(0.0476), and $\hat{\sigma}^2 = 0.07166$.

3.5.2 Forecasting

Time series analysis has proved to be a fairly good way of producing forecasts. Its drawback is that it is typically not conducive to structural or economic analysis of the forecast. The model has forecasting power only if the future variable being forecast is related to current values of the variables that we include in the model.


The goal is to forecast the variable $y_s$ based on a set of variables $X_t$ ($X_t$ may consist of lags of the variable $y_t$). Let $y^t_s$ denote a forecast of $y_s$ based on $X_t$. A quadratic loss function is used, as in OLS regression; i.e., we choose $y^t_s$ to minimize $E(y^t_s - y_s)^2$, and the mean squared error (MSE) is defined as $MSE(y^t_s) = E\big[(y^t_s - y_s)^2 \mid X_t\big]$. It can be shown that the forecast with the smallest MSE is the expectation of $y_s$ conditional on $X_t$, that is, $y^t_s = E(y_s \mid X_t)$. Then, the MSE of the optimal forecast is the conditional variance of $y_s$ given $X_t$, that is, $\mathrm{Var}(y_s \mid X_t)$.

We now consider the class of forecasts that are linear projections. These forecasts are used very often in empirical analyses of time series data. There are two conditions for the forecast $y^t_s$ to be a linear projection: (1) the forecast $y^t_s$ needs to be a linear function of $X_t$, that is, $y^t_s = \beta' X_t$, and (2) the coefficients $\beta$ should be chosen in such a way that $E[(y_s - \beta' X_t)\, X_t'] = 0$. The forecast $\beta' X_t$ satisfying (1) and (2) is called the linear projection of $y_s$ on $X_t$. One of the reasons linear projections are popular is that the linear projection produces the smallest MSE among the class of linear forecasting rules.

Finally, we give a general approach to forecasting for any process that can be written in the form (3.14), a linear process. This includes the AR, MA and ARMA processes. We begin by defining an h-step forecast of the process $x_t$ as
\[
x^t_{t+h} = E[x_{t+h} \mid x_t, x_{t-1}, \cdots].
\]
Note that this is not exactly right because we only have $x_1, x_2, \cdots, x_t$ available, so that conditioning on the infinite past is only an approximation. From this definition, it is reasonable to intuit that


$x^t_s = x_s$ for $s \le t$ and
\[
E[w_s \mid x_t, x_{t-1}, \cdots] = E[w_s \mid w_t, w_{t-1}, \cdots] = w^t_s = w_s \qquad (3.19)
\]
for $s \le t$. For $s > t$, use $x^t_s$ and
\[
E[w_s \mid x_t, x_{t-1}, \cdots] = E[w_s \mid w_t, w_{t-1}, \cdots] = w^t_s = E(w_s) = 0, \qquad (3.20)
\]
since $w_s$ will be independent of past values of $w_t$. We define the h-step forecast variance as
\[
P^t_{t+h} = E[(x_{t+h} - x^t_{t+h})^2 \mid x_t, x_{t-1}, \cdots]. \qquad (3.21)
\]

To develop an expression for this mean square error, note that, with $\psi_0 = 1$, we can write
\[
x_{t+h} = \sum_{k=0}^{\infty} \psi_k\, w_{t+h-k}.
\]
Then, since $w^t_{t+h-k} = 0$ for $t+h-k > t$, i.e., $k < h$, we have
\[
x^t_{t+h} = \sum_{k=0}^{\infty} \psi_k\, w^t_{t+h-k} = \sum_{k=h}^{\infty} \psi_k\, w_{t+h-k},
\]
so that the residual is
\[
x_{t+h} - x^t_{t+h} = \sum_{k=0}^{h-1} \psi_k\, w_{t+h-k}.
\]
Hence, the mean square error (3.21) is just the variance of a linear combination of independent zero mean errors, with common variance $\sigma_w^2$:
\[
P^t_{t+h} = \sigma_w^2 \sum_{k=0}^{h-1} \psi_k^2. \qquad (3.22)
\]

For more discussion, see Hamilton (1994, Chapter 4). As an example, we consider forecasting the second order model developed for the recruits series in Example 2.4.


Example 2.6: Consider the one-step forecast $x^t_{t+1}$ first. Writing the defining equation for $t+1$ gives $x_{t+1} = \phi_1 x_t + \phi_2 x_{t-1} + w_{t+1}$, so that $x^t_{t+1} = \phi_1 x^t_t + \phi_2 x^t_{t-1} + w^t_{t+1} = \phi_1 x_t + \phi_2 x_{t-1} + 0$. Continuing in this vein, we obtain $x^t_{t+2} = \phi_1 x^t_{t+1} + \phi_2 x^t_t + w^t_{t+2} = \phi_1 x^t_{t+1} + \phi_2 x_t + 0$. Then, $x^t_{t+h} = \phi_1 x^t_{t+h-1} + \phi_2 x^t_{t+h-2} + w^t_{t+h} = \phi_1 x^t_{t+h-1} + \phi_2 x^t_{t+h-2} + 0$ for $h > 2$. Forecasts out to lag h = 4 and beyond, if necessary, can be found by solving (3.15) for $\psi_1$, $\psi_2$ and $\psi_3$, and substituting into (3.22). By equating coefficients of $L$, $L^2$ and $L^3$ in $(1 - \phi_1 L - \phi_2 L^2)(1 + \psi_1 L + \psi_2 L^2 + \psi_3 L^3 + \cdots) = 1$, we obtain $\psi_1 - \phi_1 = 0$, $\psi_2 - \phi_1\psi_1 - \phi_2 = 0$ and $\psi_3 - \phi_1\psi_2 - \phi_2\psi_1 = 0$. This gives the coefficients $\psi_1 = \phi_1$, $\psi_2 = \phi_1^2 + \phi_2$, $\psi_3 = \phi_1^3 + 2\phi_1\phi_2$. From Example 2.4, we have $\hat{\phi}_1 = 1.35$, $\hat{\phi}_2 = -0.46$, $\hat{\sigma}^2_w = 90.31$ and $\hat{\beta}_0 = 6.74$. The forecasts are of the form
\[
x^t_{t+h} = 6.74 + 1.35\, x^t_{t+h-1} - 0.46\, x^t_{t+h-2}.
\]
For the forecast variance, we evaluate $\hat{\psi}_1 = 1.35$, $\hat{\psi}_2 = 2.282$, $\hat{\psi}_3 = -3.065$, leading to 90.31, 90.31(2.288), 90.31(7.495) and 90.31(16.890) for the forecast variances at h = 1, 2, 3, 4. The standard errors of the forecasts are therefore 9.50, 14.37, 26.02 and 39.06. The recruits series values range from 20 to 100, so the forecast uncertainty will be rather large.
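In R, forecasts and their standard errors for a fitted AR model are produced by predict(). A hedged sketch for the recruits AR(2) fit, again assuming the series is stored in `rec`:

# Hedged sketch: 4-step-ahead forecasts and standard errors for the AR(2) model.
fit2 <- arima(rec, order = c(2, 0, 0))
fc   <- predict(fit2, n.ahead = 4)
fc$pred    # point forecasts x^t_{t+1},...,x^t_{t+4}
fc$se      # forecast standard errors, the square roots of P^t_{t+h}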

3.6 Moving Average Models – MA(q)

We may also consider processes that contain linear combinations of underlying unobserved shocks, say, represented by a white noise series $w_t$. These moving average components generate a series of the form

\[
x_t = w_t - \sum_{k=1}^{q} \theta_k\, w_{t-k}, \qquad (3.23)
\]

where q denotes the order of the moving average component and $\theta_k$ ($1 \le k \le q$) are parameters to be estimated. Using the back-shift


notation, the above equation can be written in the form
\[
x_t = \theta(L)\, w_t \quad\text{with}\quad \theta(L) = 1 - \sum_{k=1}^{q} \theta_k L^k, \qquad (3.24)
\]

where $\theta(L)$ is another polynomial in the shift operator L. It should be noted that the MA process of order q is a linear process of the form considered earlier in Problem 4 of Chapter 2, with $\psi_0 = 1$, $\psi_1 = -\theta_1$, $\cdots$, $\psi_q = -\theta_q$. This implies that the ACF will be zero for lags larger than q, because terms in the form of the covariance function given in Problem 4 of Chapter 2 will all be zero. Specifically, the exact forms are

\[
\gamma_x(0) = \sigma_w^2 \Big(1 + \sum_{k=1}^{q} \theta_k^2\Big) \quad\text{and}\quad
\gamma_x(h) = \sigma_w^2 \Big(-\theta_h + \sum_{k=1}^{q-h} \theta_{k+h}\theta_k\Big) \qquad (3.25)
\]
for $1 \le h \le q-1$, with $\gamma_x(q) = -\sigma_w^2\,\theta_q$ and $\gamma_x(h) = 0$ for $h > q$. Hence, we have the following property of the ACF for MA series.

Property 2.3: For a moving average series of order q, note that the autocorrelation function (ACF) is zero for lags h > q, i.e., $\rho_x(h) = 0$ for h > q. Such a result enables us to diagnose the order of a moving average component by examining $\rho_x(h)$ and choosing q as the value beyond which the coefficients are essentially zero.

Example 2.7: Consider the varve thicknesses in Figure 2.19, described in Problem 7 of Chapter 2. Figure 3.6 shows the ACF and PACF of the original log-transformed varve series $\{x_t\}$ and of the first differences. The ACF of the original series $\{x_t\}$ indicates possible non-stationary behavior and suggests taking a first difference $\Delta x_t$, interpreted here as the percentage yearly change in deposition. The ACF of the first difference $\Delta x_t$ shows a clear peak at h = 1 and


Figure 3.6: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log varve series (top panel) and the first difference (bottom panel), showing a peak in the ACF at lag h = 1.

no other significant peaks, suggesting a first-order moving average. Fitting the first-order moving average model $\Delta x_t = w_t - \theta_1 w_{t-1}$ to this data using the Gauss-Newton procedure described next leads to $\hat{\theta}_1 = 0.77$ and $\hat{\sigma}^2_w = 0.2358$.
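A hedged sketch of this fit in R, assuming the varve thicknesses are stored in a vector `varve` (R's arima() performs the nonlinear optimization internally):

# Hedged sketch: MA(1) fit to the first difference of the log varve series.
dlv <- diff(log(varve))
fit <- arima(dlv, order = c(0, 0, 1), include.mean = FALSE)
fit   # note: R parameterizes the MA part as w_t + theta*w_{t-1}, so its
      # reported coefficient is the negative of theta_1 in the notation of (3.23)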

Fitting the pure moving average term turns into a nonlinear problem, as we can see by noting that either maximum likelihood or regression involves solving (3.23) or (3.24) for $w_t$ and minimizing the sum of the squared errors. Suppose that the roots of $\theta(L) = 0$ are all outside the unit circle; then this is possible by solving $\pi(L)\,\theta(L) = 1$, so that, for the vector parameter $\boldsymbol{\theta} = (\theta_1, \cdots, \theta_q)'$, we may write

\[
w_t(\boldsymbol{\theta}) = \pi(L)\, x_t \qquad (3.26)
\]

and minimize $SSE(\boldsymbol{\theta}) = \sum_{t=q+1}^{n} w_t^2(\boldsymbol{\theta})$ as a function of the vector parameter $\boldsymbol{\theta}$. We do not really need to find the operator $\pi(L)$ but can simply solve (3.26) recursively for $w_t$, with $w_1, w_2, \cdots, w_q = 0$,


and $w_t(\boldsymbol{\theta}) = x_t + \sum_{k=1}^{q} \theta_k\, w_{t-k}$ for $q+1 \le t \le n$. It is easy to verify that $SSE(\boldsymbol{\theta})$ will be a nonlinear function of $\theta_1, \theta_2, \cdots, \theta_q$. However, note that by the Taylor expansion

\[
w_t(\boldsymbol{\theta}) \approx w_t(\boldsymbol{\theta}_0) + \left(\frac{\partial w_t(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\Big|_{\boldsymbol{\theta}_0}\right)' (\boldsymbol{\theta} - \boldsymbol{\theta}_0),
\]

where the derivative is evaluated at the previous guess $\boldsymbol{\theta}_0$. Rearranging the above equation leads to

\[
w_t(\boldsymbol{\theta}_0) \approx \left(-\frac{\partial w_t(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\Big|_{\boldsymbol{\theta}_0}\right)' (\boldsymbol{\theta} - \boldsymbol{\theta}_0) + w_t(\boldsymbol{\theta}),
\]

which is just in the form of the regression model (3.2). Hence, we can begin with an initial guess $\boldsymbol{\theta}_0 = (0.1, 0.1, \cdots, 0.1)'$, say, and successively minimize $SSE(\boldsymbol{\theta})$ until convergence. See Chapter 5 in Hamilton (1994) for details; we will discuss this issue later.

Forecasting: In order to forecast a moving average series, note that $x_{t+h} = w_{t+h} - \sum_{k=1}^{q} \theta_k\, w_{t+h-k}$. The results below (3.19) imply that $x^t_{t+h} = 0$ if $h > q$ and, if $h \le q$,
\[
x^t_{t+h} = -\sum_{k=h}^{q} \theta_k\, w_{t+h-k},
\]
where the $w_t$ values needed for the above are computed recursively as before. Because of (3.14), it is clear that $\psi_0 = 1$ and $\psi_k = -\theta_k$ for $1 \le k \le q$, and these values can be substituted directly into the variance formula (3.22). That is,
\[
P^t_{t+h} = \sigma_w^2 \Big(1 + \sum_{k=1}^{h-1} \theta_k^2\Big).
\]


3.7 Autoregressive Integrated Moving Average Models – ARIMA(p, d, q)

Now, combining the autoregressive and moving average components leads to the autoregressive moving average ARMA(p, q) model, written as $\phi(L)\, x_t = \theta(L)\, w_t$, where the polynomials in L are as defined earlier in (3.13) and (3.24), with p autoregressive coefficients and q moving average coefficients. In difference equation form, this becomes

\[
x_t - \sum_{k=1}^{p} \phi_k\, x_{t-k} = w_t - \sum_{k=1}^{q} \theta_k\, w_{t-k}.
\]

The mixed processes do not satisfy Properties 2.1–2.3 any more, but they tend to behave in approximately the same way, even in the mixed cases. Estimation and forecasting for such problems are treated in essentially the same manner as for the AR and MA processes. We note that we can formally divide both sides of the ARMA equation above by $\phi(L)$ and note that the usual representation (3.14) holds when

\[
\psi(L)\,\phi(L) = \theta(L). \qquad (3.27)
\]

For forecasting, we determine the $\{\psi_k\}$ by equating coefficients of $\{L^k\}$ in (3.27), as before, assuming that all the roots of $\phi(L) = 0$ are greater than one in absolute value. Similarly, we can always solve for the residuals, say

\[
w_t = x_t - \sum_{k=1}^{p} \phi_k\, x_{t-k} + \sum_{k=1}^{q} \theta_k\, w_{t-k},
\]

to get the terms needed for forecasting and estimation.

Example 2.8: Consider the above mixed process with p = q = 1, i.e., ARMA(1, 1). In difference equation form, we may write
\[
x_t = \phi_1 x_{t-1} + w_t - \theta_1 w_{t-1}.
\]


Now, $x_{t+1} = \phi_1 x_t + w_{t+1} - \theta_1 w_t$, so that $x^t_{t+1} = \phi_1 x_t + 0 - \theta_1 w_t = \phi_1 x_t - \theta_1 w_t$, and $x^t_{t+h} = \phi_1 x^t_{t+h-1}$ for $h > 1$, leading to very simple forecasts in this case. Equating coefficients of $L^k$ in $(1 - \phi_1 L)(1 + \psi_1 L + \psi_2 L^2 + \cdots) = (1 - \theta_1 L)$ leads to $\psi_k = (\phi_1 - \theta_1)\,\phi_1^{k-1}$ for $k \ge 1$. Using (3.22) leads to the expression
\[
P^t_{t+h} = \sigma_w^2 \Big[ 1 + (\phi_1 - \theta_1)^2 \sum_{k=1}^{h-1} \phi_1^{2(k-1)} \Big]
          = \sigma_w^2 \Big[ 1 + (\phi_1 - \theta_1)^2 \big(1 - \phi_1^{2(h-1)}\big)/\big(1 - \phi_1^2\big) \Big]
\]
for the forecast variance.

In the first example of this chapter, it was noted that nonstationary processes are characterized by a slow decay in the ACF, as in Figure 3.4. In many of the cases where slow decay is present, the use of a first difference $\Delta x_t = x_t - x_{t-1} = (1 - L)\, x_t$ will reduce the nonstationary process $x_t$ to a stationary series $\Delta x_t$. One can check to see whether the slow decay has been eliminated in the ACF of the transformed series. Higher-order differences, $\Delta^d x_t = \Delta\, \Delta^{d-1} x_t$, are possible, and we call the process obtained when the dth difference is an ARMA series an ARIMA(p, d, q) series, where p is the order of the autoregressive component, d is the order of differencing needed, and q is the order of the moving average component. Symbolically, the form is

\[
\phi(L)\, \Delta^d x_t = \theta(L)\, w_t.
\]

The principles of model selection for ARIMA(p, d, q) series are obtained using the likelihood-based methods such as AIC, BIC or AICC, which replace K by K = p + q, the total number of ARMA parameters, or other methods such as the penalized methods described in Section 3.3.


3.8 Seasonal ARIMA Models

Some economic, financial and environmental time series, such as the quarterly earnings per share of a company, exhibit certain cyclical or periodic behavior; see the later chapters for more discussion of cycles and periodicity. Such a time series is called a seasonal (deterministic cycle) time series. Figure 2.8 shows the time plot of the quarterly earnings per share of Johnson and Johnson from the first quarter of 1960 to the last quarter of 1980. The data possess some special characteristics. In particular, the earnings grew exponentially during the sample period and had a strong seasonality. Furthermore, the variability of the earnings increased over time. The cyclical pattern repeats itself every year, so that the periodicity of the series is 4. If monthly data are considered (e.g., monthly sales of Wal-Mart Stores), then the periodicity is 12. Seasonal time series models are also useful in pricing weather-related derivatives and energy futures. See Example 1.8 and Example 1.9 in Chapter 2 for more examples with seasonality.

Analysis of seasonal time series has a long history. In some applications, seasonality is of secondary importance and is removed from the data, resulting in a seasonally adjusted time series that is then used to make inference. The procedure for removing seasonality from a time series is referred to as seasonal adjustment. Most economic data published by the U.S. government are seasonally adjusted (e.g., the growth rate of gross domestic product and the unemployment rate). In other applications, such as forecasting, seasonality is as important as other characteristics of the data and must be handled accordingly. Because forecasting is a major objective of economic and financial time series analysis, we focus on the latter approach


and discuss some econometric models that are useful in modeling seasonal time series.

When the autoregressive, differencing, or seasonal moving average behavior seems to occur at multiples of some underlying period s, a seasonal ARIMA series may result. The seasonal nonstationarity is characterized by slow decay at multiples of s and can often be eliminated by a seasonal differencing operator of the form $\Delta^D_s x_t = (1 - L^s)^D x_t$. For example, when we have monthly data, it is reasonable that a yearly phenomenon will induce s = 12, and the ACF will be characterized by slowly decaying spikes at 12, 24, 36, 48, $\cdots$; we can obtain a stationary series by transforming with the operator $(1 - L^{12})\, x_t = x_t - x_{t-12}$, which is the difference between the current month and the value one year, or 12 months, ago. If the autoregressive or moving average behavior is seasonal at period s, we define formally the operators

\[
\Phi(L^s) = 1 - \Phi_1 L^s - \Phi_2 L^{2s} - \cdots - \Phi_P L^{Ps} \qquad (3.28)
\]

and

\[
\Theta(L^s) = 1 - \Theta_1 L^s - \Theta_2 L^{2s} - \cdots - \Theta_Q L^{Qs}. \qquad (3.29)
\]

The final form of the seasonal ARIMA(p, d, q) × (P, D, Q)$_s$ model is

\[
\Phi(L^s)\,\phi(L)\,\Delta^D_s \Delta^d\, x_t = \Theta(L^s)\,\theta(L)\, w_t. \qquad (3.30)
\]

Note that one special model of (3.30) is ARIMA(0, 1, 1) × (0, 1, 1)$_s$, that is,

\[
(1 - L^s)(1 - L)\, x_t = (1 - \theta_1 L)(1 - \Theta_1 L^s)\, w_t.
\]

This model is referred to as the airline model or multiplicative seasonal model in the literature; see Box, Jenkins, and Reinsel


(1994, Chapter 9). It has been found to be widely applicable in modeling seasonal time series. The AR part of the model simply consists of the regular and seasonal differences, whereas the MA part involves two parameters.

We may also note the properties below, corresponding to Properties 2.1–2.3.

Property 2.1': The ACF of a seasonally non-stationary time series decays very slowly at lag multiples s, 2s, 3s, $\cdots$, with zeros in between, where s denotes a seasonal period, usually 4 for quarterly data or 12 for monthly data. The PACF of a non-stationary time series tends to have a peak very near unity at lag s.

Property 2.2': For a seasonal autoregressive series of order P, the partial autocorrelation function $\phi_{hh}$ as a function of lag h has nonzero values at s, 2s, 3s, $\cdots$, Ps, with zeros in between, and is zero for h > Ps, the order of the seasonal autoregressive process. There should be some exponential decay.

Property 2.3': For a seasonal moving average series of order Q, note that the autocorrelation function (ACF) has nonzero values at s, 2s, 3s, $\cdots$, Qs and is zero for h > Qs.

Remark: Note that there is a built-in command in R called arima() which is a powerful tool for estimating and making inference for an ARIMA model. The command is

arima(x, order = c(0,0,0), seasonal = list(order = c(0,0,0), period = NA),
      xreg = NULL, include.mean = TRUE, transform.pars = TRUE, fixed = NULL,
      init = NULL, method = c("CSS-ML", "ML", "CSS"), n.cond,
      optim.control = list(), kappa = 1e6)


See the R manuals for details about this command.
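For instance, a hedged illustration of the call for the airline model discussed above, applied to a hypothetical monthly series `x` (not a data set from the notes):

# Hedged sketch: fitting the multiplicative seasonal ARIMA(0,1,1)x(0,1,1)_12
# ("airline") model by exact maximum likelihood.
fit <- arima(x, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12),
             method = "ML")
fit
tsdiag(fit)   # residual diagnostics: ACF of residuals and Ljung-Box p-values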

Example 2.9: We illustrate by fitting the monthly birth series from 1948–1979 shown in Figure 3.7. The period encompasses the

Figure 3.7: Number of live births 1948(1)–1979(1) and residuals from models with a first difference, a first difference and a seasonal difference of order 12, and a fitted ARIMA(0, 1, 1) × (0, 1, 1)$_{12}$ model.

boom that followed the Second World War, and there is the expected rise, which persists for about 13 years, followed by a decline to around 1974. The series appears to have long-term swings, with seasonal effects superimposed. The long-term swings indicate possible non-stationarity, and we verify that this is the case by checking the ACF and PACF shown in the top panel of Figure 3.8. Note that, by Property 2.1, the slow decay of the ACF indicates non-stationarity, and we respond by taking a first difference. The results shown in the second panel of Figure 3.7 indicate that the first difference has eliminated the strong low-frequency swing. The ACF, shown in the second


Figure 3.8: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the birth series (top two panels), the first difference (second two panels), an ARIMA(0, 1, 0) × (0, 1, 1)$_{12}$ model (third two panels) and an ARIMA(0, 1, 1) × (0, 1, 1)$_{12}$ model (last two panels).


panel from the top in Figure 3.8, shows peaks at 12, 24, 36, 48, $\cdots$, with no decay. This behavior implies seasonal non-stationarity, by Property 2.1' above, with s = 12. A seasonal difference of the first difference generates an ACF and PACF in Figure 3.8 that we expect for stationary series.

Taking the seasonal difference of the first difference gives a series that looks stationary and has an ACF with peaks at 1 and 12, and a PACF with a substantial peak at 12 and lesser peaks at 24, 36, $\cdots$. This suggests trying either a first order moving average term, by Property 2.3, or a first order seasonal moving average term with s = 12, by Property 2.3' above. We choose to eliminate the largest peak first by applying a first-order seasonal moving average model with s = 12. The ACF and PACF of the residual series from this model, i.e., from ARIMA(0, 1, 0) × (0, 1, 1)$_{12}$, written as $(1 - L)(1 - L^{12})\, x_t = (1 - \Theta_1 L^{12})\, w_t$, are shown in the fourth panel from the top in Figure 3.8. We note that the peak at lag one is still there, with attending exponential decay in the PACF. This can be eliminated by fitting a first-order moving average term, and we consider the model ARIMA(0, 1, 1) × (0, 1, 1)$_{12}$, written as

\[
(1 - L)(1 - L^{12})\, x_t = (1 - \theta_1 L)(1 - \Theta_1 L^{12})\, w_t.
\]

The ACF of the residuals from this model is relatively well behaved, with a number of peaks either near or exceeding the 95% test of no correlation. Fitting this final ARIMA(0, 1, 1) × (0, 1, 1)$_{12}$ model leads to the model

\[
(1 - L)(1 - L^{12})\, x_t = (1 - 0.4896\, L)(1 - 0.6844\, L^{12})\, w_t
\]

with AICC = 4.95, $R^2 = 0.9804^2 = 0.961$, and p-values (0.000, 0.000). The ARIMA search leads to the model

\[
(1 - L)(1 - L^{12})\, x_t = (1 - 0.4088\, L - 0.1645\, L^2)(1 - 0.6990\, L^{12})\, w_t,
\]


yielding AICC = 4.92 and $R^2 = 0.981^2 = 0.962$, slightly better than the ARIMA(0, 1, 1) × (0, 1, 1)$_{12}$ model. Evaluating these latter models leads to the conclusion that the extra parameters do not add a practically substantial amount to the predictability. The model can be expanded as

\[
x_t = x_{t-1} + x_{t-12} - x_{t-13} + w_t - \theta_1 w_{t-1} - \Theta_1 w_{t-12} + \theta_1\Theta_1 w_{t-13}.
\]

The forecast is

\[
x^t_{t+1} = x_t + x_{t-11} - x_{t-12} - \theta_1 w_t - \Theta_1 w_{t-11} + \theta_1\Theta_1 w_{t-12},
\]
\[
x^t_{t+2} = x^t_{t+1} + x_{t-10} - x_{t-11} - \Theta_1 w_{t-10} + \theta_1\Theta_1 w_{t-11}.
\]

Continuing in the same manner, we obtain

\[
x^t_{t+12} = x^t_{t+11} + x_t - x_{t-1} - \Theta_1 w_t + \theta_1\Theta_1 w_{t-1}
\]

for the 12 month forecast.
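A hedged sketch of how the fit and the 12-month forecasts could be reproduced in R, assuming the monthly birth series is stored in a ts object `birth` (not defined in the notes):

# Hedged sketch: ARIMA(0,1,1)x(0,1,1)_12 fit to the birth series with 12 forecasts.
fit <- arima(birth, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fit
fc <- predict(fit, n.ahead = 12)
fc$pred                    # x^t_{t+1}, ..., x^t_{t+12}
fc$pred + 1.96 * fc$se     # approximate upper 95% forecast limits
fc$pred - 1.96 * fc$se     # approximate lower 95% forecast limits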

Example 2.10: Figure 3.9 shows the autocorrelation function of the log-transformed J&J earnings series that is plotted in Figure 2.8, and we note the slow decay indicating the nonstationarity which was already obvious in the Chapter 2 discussion. We may also compare the ACF with that of a random walk, shown in Figure 3.2, and note the close similarity. The partial autocorrelation function is very high at lag one which, under ordinary circumstances, would indicate a first order autoregressive AR(1) model, except that, in this case, the value is close to unity, indicating a root close to 1 on the unit circle. The only question is whether differencing or detrending is the better transformation to stationarity. Following the Box-Jenkins tradition, differencing leads to the ACF and PACF shown in the second panel, and no simple structure is apparent. To force a next step, we interpret the peaks at 4, 8, 12, 16, $\cdots$, as


Figure 3.9: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log J&J earnings series (top two panels), the first difference (second two panels), ARIMA(0, 1, 0) × (1, 0, 0)$_4$ model (third two panels), and ARIMA(0, 1, 1) × (1, 0, 0)$_4$ model (last two panels).

contributing to a possible seasonal autoregressive term, leading to a possible ARIMA(0, 1, 0) × (1, 0, 0)$_4$; we simply fit this model and look at the ACF and PACF of the residuals, shown in the third two panels. The fit improves somewhat, with significant peaks still remaining at lag 1 in both the ACF and PACF. The peak in the ACF seems more isolated, and there remains some exponentially decaying


behavior in the PACF, so we try a model with a first-order moving average. The bottom two panels show the ACF and PACF of the resulting ARIMA(0, 1, 1) × (1, 0, 0)$_4$, and we note only relatively minor excursions above and below the 95% intervals under the assumption that the theoretical ACF is white noise. The final model suggested is ($y_t = \log x_t$)

\[
(1 - \Phi_1 L^4)(1 - L)\, y_t = (1 - \theta_1 L)\, w_t, \qquad (3.31)
\]

where $\hat{\Phi}_1 = 0.820\,(0.058)$, $\hat{\theta}_1 = 0.508\,(0.098)$, and $\hat{\sigma}^2_w = 0.0086$. The model can be written in forecast form as

\[
y_t = y_{t-1} + \Phi_1(y_{t-4} - y_{t-5}) + w_t - \theta_1 w_{t-1}.
\]

The residual plot of the above model is shown in the left bottom panel of Figure 3.10. To forecast the original series for, say, 4 quarters, we

Figure 3.10: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the ARIMA(0, 1, 1) × (0, 1, 1)$_4$ model (top two panels) and the residual plots of the ARIMA(0, 1, 1) × (1, 0, 0)$_4$ (left bottom panel) and ARIMA(0, 1, 1) × (0, 1, 1)$_4$ (right bottom panel) models.


compute the forecast limits for $y_t = \log x_t$ and then exponentiate, i.e., $x^t_{t+h} = \exp(y^t_{t+h})$.

Based on the exact likelihood method, Tsay (2005) considered the following seasonal ARIMA(0, 1, 1) × (0, 1, 1)$_4$ model,

\[
(1 - L)(1 - L^4)\, y_t = (1 - 0.678\, L)(1 - 0.314\, L^4)\, w_t, \qquad (3.32)
\]

with \hat{\sigma}^2_w = 0.089, where the standard errors of the two MA parameters

are 0.080 and 0.101, respectively. The Ljung-Box statistics of the residuals show Q(12) = 10.0 with p-value 0.44. The model appears to be adequate. The ACF and PACF of the ARIMA(0,1,1) x (0,1,1)_4 model are given in the top two panels of Figure 3.10 and the residual plot is displayed in the right bottom panel of Figure 3.10. Based on the comparison of the ACF and PACF of the two models (3.31) and (3.32) [the last two panels of Figure 3.9 and the top two panels of Figure 3.10], it seems that the ARIMA(0,1,1) x (0,1,1)_4 model in (3.32) might perform better than the ARIMA(0,1,1) x (1,0,0)_4 model in (3.31).

To illustrate the forecasting performance of the seasonal model in (3.32), we re-estimate the model using the first 76 observations and reserve the last eight data points for forecasting evaluation. We compute 1-step to 8-step ahead forecasts and their standard errors of the fitted model at the forecast origin t = 76. An anti-log transformation is taken to obtain forecasts of earnings per share using the relationship between the normal and log-normal distributions. Figure 2.15 in Tsay (2005, p.77) shows the forecast performance of the model, where the observed data are in solid line, point forecasts are shown by dots, and the dashed lines show 95% interval forecasts. The forecasts show a strong seasonal pattern and are close to the observed data. For more comparisons of forecasts using different models, including semiparametric and nonparametric models, the reader is referred to the books


by Shumway (1988) and Shumway and Stoffer (2000) and the papers by Burman and Shumway (1998) and Cai and Chen (2006).
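This forecasting exercise can be sketched in R as follows (a rough sketch only, assuming the 84 quarterly log earnings are stored in a vector y; the exact estimates depend on the likelihood method used):

# re-estimate the seasonal model on the first 76 observations
fit  <- arima(y[1:76], order = c(0, 1, 1),
              seasonal = list(order = c(0, 1, 1), period = 4))
fore <- predict(fit, n.ahead = 8)            # 1- to 8-step ahead forecasts of log earnings
point <- exp(fore$pred + fore$se^2 / 2)      # anti-log point forecasts (log-normal mean)
lower <- exp(fore$pred - 1.96 * fore$se)     # approximate 95% interval forecasts
upper <- exp(fore$pred + 1.96 * fore$se)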

When the seasonal pattern of a time series is stable over time (e.g., close to a deterministic function), dummy variables may be used to handle the seasonality. This approach is taken by some analysts. However, deterministic seasonality is a special case of the multiplicative seasonal model discussed before. Specifically, if \Theta_1 = 1, then the model contains a deterministic seasonal component. Consequently, the same forecasts are obtained by using either dummy variables or a multiplicative seasonal model when the seasonal pattern is deterministic. Yet the use of dummy variables can lead to inferior forecasts if the seasonal pattern is not deterministic. In practice, we recommend that the exact likelihood method be used to estimate a multiplicative seasonal model, especially when the sample size is small or when there is the possibility of having a deterministic seasonal component.

Example 2.11: To determine deterministic behavior, consider the monthly simple return of the CRSP Decile 1 index from January 1960 to December 2003, for 528 observations. The series is shown in the left top panel of Figure 3.11, and the time series does not show any clear pattern of seasonality. However, the sample ACF of the return series shown in the left bottom panel of Figure 3.11 contains significant lags at 12, 24, and 36 as well as lag 1. If seasonal ARIMA models are entertained, a model of the form

(1 - \phi_1 L)(1 - \Phi_1 L^{12}) x_t = c + (1 - \Theta_1 L^{12}) w_t

is identified, where x_t is the monthly simple return. Using the conditional likelihood, the fitted model is

(1 - 0.25 L)(1 - 0.99 L^{12}) x_t = 0.0004 + (1 - 0.92 L^{12}) w_t



Figure 3.11: Monthly simple return of the CRSP Decile 1 index from January 1960 to December 2003: time series plot of the simple return (left top panel), time series plot of the simple return after adjusting for the January effect (right top panel), the ACF of the simple return (left bottom panel), and the ACF of the adjusted simple return (right bottom panel).

with \hat{\sigma}_w = 0.071. The MA coefficient is close to unity, indicating that the fitted model is close to being non-invertible. If the exact likelihood method is used, we have

(1 - 0.264 L)(1 - 0.996 L^{12}) x_t = 0.0002 + (1 - 0.999 L^{12}) w_t

with \hat{\sigma}_w = 0.067. The cancellation between the seasonal AR and MA factors is clear. This highlights the usefulness of the exact likelihood method, and the estimation result suggests that the seasonal behavior might be deterministic. To further confirm this assertion, we define the dummy variable for January, that is,

J_t = 1 if t is January, and J_t = 0 otherwise,

and employ the simple linear regression

x_t = \beta_0 + \beta_1 J_t + e_t.


The right panels of Figure 3.11 show the time series plot and the ACF of the residual series of the prior simple linear regression. From the ACF, there is no significant serial correlation at any multiple of 12, suggesting that the seasonal pattern has been successfully removed by the January dummy variable. Consequently, the seasonal behavior in the monthly simple return of Decile 1 is due to the January effect.
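In R, this check can be carried out along the following lines (a sketch, assuming the 528 monthly simple returns are stored in a vector ret beginning in January 1960):

# construct the January dummy for monthly observations starting in January 1960
Jt  <- rep(c(1, rep(0, 11)), length.out = length(ret))
fit <- lm(ret ~ Jt)                 # simple linear regression x_t = b0 + b1*J_t + e_t
acf(residuals(fit), lag.max = 40)   # lags 12, 24, 36 should now be insignificant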

3.9 Regression Models With Correlated Errors

In many applications, the relationship between two time series is of major interest. The market model in finance is an example that relates the return of an individual stock to the return of a market index. The term structure of interest rates is another example in which the time evolution of the relationship between interest rates with different maturities is investigated. These examples lead to the consideration of a linear regression of the form y_t = \beta_1 + \beta_2 x_t + e_t, where y_t and x_t are two time series and e_t denotes the error term. The least squares (LS) method is often used to estimate the above model. If {e_t} is a white noise series, then the LS method produces consistent estimates. In practice, however, it is common to see that the error term e_t is serially correlated. In this case, we have a regression model with time series errors, and the LS estimates of \beta_1 and \beta_2 may not be consistent and efficient.

Regression models with time series errors are widely applicable in economics and finance, but they are among the most commonly misused econometric models because the serial dependence in e_t is often overlooked. It pays to study the model carefully. The standard method


for dealing with correlated errors e_t in the regression model

y_t = \beta' z_t + e_t

is to try to transform the errors e_t into uncorrelated ones and then apply the standard least squares approach to the transformed observations. For example, let P be an n x n matrix that transforms the vector e = (e_1, ..., e_n)' into a set of independent, identically distributed variables with variance \sigma^2. Then, transform the matrix version (3.4) to

P y = P Z \beta + P e

and proceed as before. Of course, the major problem is deciding what to choose for P, but in the time series case, happily, there is a reasonable solution, based again on time series ARMA models. Suppose that we can find, for example, a reasonable ARMA model for the residuals, say, the ARMA(p, 0, 0) model

e_t = \sum_{k=1}^{p} \phi_k e_{t-k} + w_t,

which defines a linear transformation of the correlated e_t to a sequence of uncorrelated w_t. We can ignore the problems near the beginning of the series by starting at t = p. In the ARMA notation, using the back-shift operator L, we may write

\phi(L) e_t = w_t,   (3.33)

where

\phi(L) = 1 - \sum_{k=1}^{p} \phi_k L^k   (3.34)

and applying the operator to both sides of (3.2) leads to the model

\phi(L) y_t = \beta' \phi(L) z_t + w_t,   (3.35)


where the {w_t}'s now satisfy the independence assumption. Doing ordinary least squares on the transformed model is the same as doing weighted least squares on the untransformed model. The only problem is that we do not know the values of the coefficients \phi_k (1 \le k \le p) in the transformation (3.34). However, if we knew the residuals e_t, it would be easy to estimate the coefficients, since (3.34) can be written in the form

e_t = \phi' e_{t-1} + w_t,   (3.36)

which is exactly the usual regression model (3.2) with \phi = (\phi_1, ..., \phi_p)' replacing \beta and e_{t-1} = (e_{t-1}, e_{t-2}, ..., e_{t-p})' replacing z_t. The above comments suggest a general approach known as the Cochrane-Orcutt procedure (Cochrane and Orcutt, 1949) for dealing with the problem of correlated errors in the time series context.

1. Begin by fitting the original regression model (3.2) by least squares, obtaining \hat{\beta} and the residuals \hat{e}_t = y_t - \hat{\beta}' z_t.

2. Fit an ARMA model to the estimated residuals, say \phi(L) \hat{e}_t = \theta(L) w_t.

3. Apply the ARMA transformation found to both sides of the regression equation (3.2) to obtain

\frac{\phi(L)}{\theta(L)} y_t = \beta' \frac{\phi(L)}{\theta(L)} z_t + w_t.

4. Run an ordinary least squares regression on the transformed values to obtain the new \hat{\beta}.

5. Return to 2. if desired.

Often, one iteration is enough to develop the estimators under a reasonable correlation structure. In general, the Cochrane-Orcutt procedure converges to the maximum likelihood or weighted least squares estimators.
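A minimal sketch of one Cochrane-Orcutt iteration in R, for an AR(1) error structure (assuming y and z are the response and regressor vectors; only base R functions are used):

# step 1: ordinary least squares and residuals
fit0 <- lm(y ~ z)
e    <- residuals(fit0)
# step 2: fit an AR(1) to the residuals, e_t = phi*e_{t-1} + w_t
phi  <- arima(e, order = c(1, 0, 0), include.mean = FALSE)$coef["ar1"]
# step 3: quasi-difference both sides, y_t - phi*y_{t-1} on z_t - phi*z_{t-1}
n    <- length(y)
ys   <- y[2:n] - phi * y[1:(n - 1)]
zs   <- z[2:n] - phi * z[1:(n - 1)]
# step 4: OLS on the transformed data; the intercept now estimates beta1*(1 - phi)
fit1 <- lm(ys ~ zs)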


Example 2.12: We might consider an alternative approach to treating the Johnson and Johnson earnings series, assuming that y_t = log(x_t) = \beta_1 + \beta_2 t + e_t. In order to analyze the data with this approach, we first fit the model above, obtaining \hat{\beta}_1 = -0.6678 (0.0349) and \hat{\beta}_2 = 0.0417 (0.0071). The residuals \hat{e}_t = y_t - \hat{\beta}_1 - \hat{\beta}_2 t can be computed easily, and their ACF and PACF are shown in the top two panels of Figure 3.12. Note that the ACF and PACF suggest


Figure 3.12: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended log J&J earnings series (top two panels) and for the residuals of the fitted ARIMA(0,0,0) x (1,0,0)_4 model (bottom two panels).

that a seasonal AR series will fit well, and we show the ACF and PACF of these residuals in the bottom panels of Figure 3.12. The seasonal AR model is of the form e_t = \Phi_1 e_{t-4} + w_t and we obtain \hat{\Phi}_1 = 0.7614 (0.0639), with \hat{\sigma}^2_w = 0.00779. Using these values, we transform y_t to

y_t - \hat{\Phi}_1 y_{t-4} = \beta_1 (1 - \hat{\Phi}_1) + \beta_2 [t - \hat{\Phi}_1 (t - 4)] + w_t

using the estimated value \hat{\Phi}_1 = 0.7614. With this transformed


regression, we obtain the new estimators \hat{\beta}_1 = -0.7488 (0.1105) and \hat{\beta}_2 = 0.0424 (0.0018). The new estimator has the advantage of being unbiased and having a smaller generalized variance.

To forecast, we consider the original model with the newly estimated \hat{\beta}_1 and \hat{\beta}_2. We obtain the approximate forecast y^t_{t+h} = \hat{\beta}_1 + \hat{\beta}_2 (t + h) + e^t_{t+h} for the log-transformed series, along with upper and lower limits depending on the estimated variance that only incorporates the prediction variance of e^t_{t+h}, considering the trend and seasonal autoregressive parameters as fixed. The narrower upper and lower limits (the figure is not presented here) are mainly a reflection of a slightly better fit to the residuals and the ability of the trend model to take care of the nonstationarity.
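A related joint fit can be sketched in R with arima() using a regression term; this estimates the trend coefficients and the seasonal AR(1) error together by maximum likelihood rather than by the two-step transformation above (assuming y holds the quarterly log J&J earnings):

tt  <- seq_along(y)                          # time index for the linear trend
fit <- arima(y, order = c(0, 0, 0),
             seasonal = list(order = c(1, 0, 0), period = 4),
             xreg = tt)                      # y_t = b1 + b2*t + e_t with seasonal AR(1) errors
fit                                          # intercept ~ b1, xreg coefficient ~ b2, sar1 ~ Phi1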

Example 2.13: We consider the relationship between two U.S. weekly interest rate series: x_t, the 1-year Treasury constant maturity rate, and y_t, the 3-year Treasury constant maturity rate. Both series have 1967 observations from January 5, 1962 to September 10, 1999 and are measured in percentages. The series are obtained from the Federal Reserve Bank of St. Louis.

Figure 3.13 shows the time plots of the two interest rates, with the solid line denoting the 1-year rate and the dashed line the 3-year rate. The left panel of Figure 3.14 plots y_t versus x_t, indicating that, as expected, the two interest rates are highly correlated. A naive way to describe the relationship between the two interest rates is to use the simple model, Model I: y_t = \beta_1 + \beta_2 x_t + e_t. This results in a fitted model y_t = 0.911 + 0.924 x_t + e_t, with \hat{\sigma}^2_e = 0.538 and R^2 = 95.8%, where the standard errors of the two coefficients are 0.032 and 0.004, respectively. This simple model (Model I) confirms the high correlation between the two interest rates. However, the



Figure 3.13: Time plots of U.S. weekly interest rates (in percentages) from January 5, 1962 to September 10, 1999. The solid line (black) is the Treasury 1-year constant maturity rate and the dashed line (red) is the Treasury 3-year constant maturity rate.


Figure 3.14: Scatterplots of U.S. weekly interest rates from January 5, 1962 to September 10, 1999: the left panel is the 3-year rate versus the 1-year rate, and the right panel is changes in the 3-year rate versus changes in the 1-year rate.

model is seriously inadequate as shown by Figure 3.15, which gives the time plot and ACF of its residuals. In particular, the sample ACF of the residuals is highly significant and decays slowly, showing the pattern of a unit root nonstationary time series.[2] The behavior of the residuals suggests that marked differences exist between the two interest rates. Using modern econometric terminology, if one

[2] We will discuss unit root tests in detail later.



Figure 3.15: Residual series of the linear regression Model I for the two U.S. weekly interest rates: the left panel is the time plot and the right panel is the ACF.

assumes that the two interest rate series are unit root nonstationary, then the behavior of the residuals indicates that the two interest rates are not co-integrated; see later chapters for discussion of unit roots and co-integration. In other words, the data fail to support the hypothesis that there exists a long-term equilibrium between the two interest rates. In some sense, this is not surprising because the pattern of an "inverted yield curve" did occur during the data span. By an inverted yield curve, we mean the situation under which interest rates are inversely related to their times to maturity.

The unit root behavior of both interest rates and the residuals leads to the consideration of the change series of the interest rates. Let \Delta x_t = x_t - x_{t-1} = (1 - L) x_t be the changes in the 1-year interest rate and \Delta y_t = y_t - y_{t-1} = (1 - L) y_t denote the changes in the 3-year interest rate. Consider the linear regression, Model II: \Delta y_t = \beta_1 + \beta_2 \Delta x_t + e_t. Figure 3.16 shows time plots of the two change series, whereas the right panel of Figure 3.14 provides a scatterplot between them. The change series remain highly correlated, with a fitted linear regression model given by \Delta y_t = 0.0002 + 0.7811 \Delta x_t + e_t with \hat{\sigma}^2_e = 0.0682 and R^2 = 84.8%. The standard errors of the two



Figure 3.16: Time plots of the change series of U.S. weekly interest rates from January 12, 1962 to September 10, 1999: changes in the Treasury 1-year constant maturity rate are denoted by the black solid line, and changes in the Treasury 3-year constant maturity rate are indicated by the red dashed line.

coefficients are 0.0015 and 0.0075, respectively. This model further confirms the strong linear dependence between the interest rates. The two top panels of Figure 3.17 show the time plot (left) and sample ACF (right) of the residuals (Model II). Once again, the ACF shows


Figure 3.17: Residual series of the linear regression models, Model II (top) and Model III (bottom), for the two change series of U.S. weekly interest rates: time plot (left) and ACF (right).


some significant serial correlation in the residuals, but the magnitude of the correlation is much smaller. This weak serial dependence in the residuals can be modeled by using the simple time series models discussed in the previous sections, and we have a linear regression with time series errors.

The main objective of this section is to discuss a simple approach for building a linear regression model with time series errors. The approach is straightforward. We employ a simple time series model discussed in this chapter for the residual series and estimate the whole model jointly. For illustration, consider the simple linear regression in Model II. Because the residuals of the model are serially correlated, we identify a simple ARMA model for the residuals. From the sample ACF of the residuals shown in the right top panel of Figure 3.17, we specify an MA(1) model for the residuals and modify the linear regression model to (Model III): \Delta y_t = \beta_1 + \beta_2 \Delta x_t + e_t and e_t = w_t - \theta_1 w_{t-1}, where {w_t} is assumed to be a white noise series. In other words, we simply use an MA(1) model, without the constant term, to capture the serial dependence in the error term of Model II. The two bottom panels of Figure 3.17 show the time plot (left) and sample ACF (right) of the residuals (Model III). The resulting model is a simple example of linear regression with time series errors. In practice, more elaborate time series models can be added to a linear regression equation to form a general regression model with time series errors.

Estimating a regression model with time series errors was not easy before the advent of modern computers. Special methods, such as the Cochrane-Orcutt estimator, have been proposed to handle the serial dependence in the residuals. By now, the estimation is as easy as that of other time series models. If the time series model used is


stationary and invertible, then one can estimate the model jointly via the maximum likelihood method or the conditional maximum likelihood method. This is the approach we take by using the package R with the command arima(). For the U.S. weekly interest rate data, the fitted version of Model III is \Delta y_t = 0.0002 + 0.7824 \Delta x_t + e_t and e_t = w_t + 0.2115 w_{t-1} with \hat{\sigma}^2_w = 0.0668 and R^2 = 85.4%. The standard errors of the parameters are 0.0018, 0.0077, and 0.0221, respectively. The model no longer has a significant lag-1 residual ACF, even though some minor residual serial correlations remain at lags 4 and 6. The incremental improvement of adding additional MA parameters at lags 4 and 6 to the residual equation is small and the result is not reported here.
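The joint estimation can be sketched in R as follows (assuming x1 and y3 are the 1-year and 3-year rate series):

dx  <- diff(x1)                     # changes in the 1-year rate
dy  <- diff(y3)                     # changes in the 3-year rate
fit <- arima(dy, order = c(0, 0, 1), xreg = dx)          # regression with MA(1) errors
fit                                 # xreg coefficient ~ beta2, ma1 ~ -theta1
Box.test(residuals(fit), lag = 12, type = "Ljung-Box")   # check remaining serial correlation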

Comparing the above three models, we make the following observations. First, the high R^2 and the coefficient 0.924 of Model I are misleading because the residuals of the model show strong serial correlations. Second, for the change series, the R^2 and the coefficient of \Delta x_t of Model II and Model III are close. In this particular instance, adding the MA(1) model to the change series only provides a marginal improvement. This is not surprising because the estimated MA coefficient is small numerically, even though it is statistically highly significant. Third, the analysis demonstrates that it is important to check residual serial dependence in linear regression analysis. Because the constant term of Model III is insignificant, the model shows that the two weekly interest rate series are related as y_t = y_{t-1} + 0.782 (x_t - x_{t-1}) + w_t + 0.212 w_{t-1}. The interest rates are concurrently and serially correlated.

Finally, we outline a general procedure for analyzing linear regression models with time series errors. First, fit the linear regression model and check the serial correlations of the residuals. Second,


if the residual series is unit-root nonstationary, take the first difference of both the dependent and explanatory variables and go back to step 1. If the residual series appears to be stationary, identify an ARMA model for the residuals and modify the linear regression model accordingly. Third, perform a joint estimation via the maximum likelihood method and check the fitted model for further improvement.

To check the serial correlations of the residuals, we recommend that the Ljung-Box statistics be used instead of the Durbin-Watson (DW) statistic because the latter only considers the lag-1 serial correlation. There are cases in which residual serial dependence appears at higher order lags. This is particularly so when the time series involved exhibits some seasonal behavior.

Remark: For a residual series e_t with T observations, the Durbin-Watson statistic is

DW = \sum_{t=2}^{T} (e_t - e_{t-1})^2 / \sum_{t=1}^{T} e_t^2.

Straightforward calculation shows that DW \approx 2(1 - \hat{\rho}_e(1)), where \hat{\rho}_e(1) is the lag-1 ACF of {e_t}.
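Both diagnostics are easy to compute directly in R (a small sketch for a residual vector e from some fitted regression):

DW <- sum(diff(e)^2) / sum(e^2)               # Durbin-Watson statistic
r1 <- acf(e, lag.max = 1, plot = FALSE)$acf[2]
c(DW, 2 * (1 - r1))                           # DW is approximately 2*(1 - rho_hat(1))
Box.test(e, lag = 12, type = "Ljung-Box")     # Ljung-Box test over the first 12 lags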

3.10 Estimation of Covariance Matrix

Consider again the regression model in (3.2). There may exist situations in which the error e_t has serial correlations and/or conditional heteroscedasticity, but the main objective of the analysis is to make inference concerning the regression coefficients \beta. When e_t has serial correlations, we discussed methods in Example 2.12 and Example 2.13 above to overcome this difficulty. However, we assume


that e_t follows an ARIMA-type model, and this assumption might not always be satisfied in some applications. Here, we consider a general situation without making this assumption. In situations under which the ordinary least squares estimates of the coefficients remain consistent, methods are available to provide consistent estimates of the covariance matrix of the coefficients. Two such methods are widely used in economics and finance. The first method is called the heteroscedasticity consistent (HC) estimator; see Eicker (1967) and White (1980). The second method is called the heteroscedasticity and autocorrelation consistent (HAC) estimator; see Newey and West (1987).

To ease the discussion, we shall re-write the regression model as

y_t = \beta' x_t + e_t,

where y_t is the dependent variable, x_t = (x_{1t}, ..., x_{pt})' is a p-dimensional vector of explanatory variables including the constant and lagged variables, and \beta = (\beta_1, ..., \beta_p)' is the parameter vector. The LS estimate of \beta is given by

\hat{\beta} = \left( \sum_{t=1}^{n} x_t x_t' \right)^{-1} \sum_{t=1}^{n} x_t y_t,

and the associated covariance matrix has the so-called “sandwich”form as

\Sigma_\beta = Cov(\hat{\beta}) = \left( \sum_{t=1}^{n} x_t x_t' \right)^{-1} C \left( \sum_{t=1}^{n} x_t x_t' \right)^{-1},

which reduces to \sigma^2_e \left( \sum_{t=1}^{n} x_t x_t' \right)^{-1} if e_t is iid,

where C is called the “meat” given by

C = Var\left( \sum_{t=1}^{n} e_t x_t \right),

and \sigma^2_e is the variance of e_t, which is estimated by the variance of the residuals of the regression. In the presence of serial correlations or conditional


heteroscedasticity, the prior covariance matrix estimator is inconsistent, often resulting in inflated t-ratios for \hat{\beta}.

The estimator of White (1980) is based on the following:

\hat{\Sigma}_{\beta,hc} = \left( \sum_{t=1}^{n} x_t x_t' \right)^{-1} \hat{C}_{hc} \left( \sum_{t=1}^{n} x_t x_t' \right)^{-1},

where, with \hat{e}_t = y_t - \hat{\beta}' x_t being the residual at time t,

\hat{C}_{hc} = \frac{n}{n - p} \sum_{t=1}^{n} \hat{e}_t^2 x_t x_t'.

The estimator of Newey and West (1987) is

\hat{\Sigma}_{\beta,hac} = \left( \sum_{t=1}^{n} x_t x_t' \right)^{-1} \hat{C}_{hac} \left( \sum_{t=1}^{n} x_t x_t' \right)^{-1},

where \hat{C}_{hac} is given by

\hat{C}_{hac} = \sum_{t=1}^{n} \hat{e}_t^2 x_t x_t' + \sum_{j=1}^{l} w_j \sum_{t=j+1}^{n} \left( x_t \hat{e}_t \hat{e}_{t-j} x_{t-j}' + x_{t-j} \hat{e}_{t-j} \hat{e}_t x_t' \right),

where l is a truncation parameter and w_j is a weight function, such as the Bartlett weight function defined by w_j = 1 - j/(l + 1). Other weight functions can also be used. Newey and West (1987) suggested choosing l to be the integer part of 4(n/100)^{2/9}. This estimator essentially uses a nonparametric method to estimate the covariance matrix of \sum_{t=1}^{n} e_t x_t; a class of kernel-based heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimators was introduced by Andrews (1991).

Example 2.14: (Continuation of Example 2.13) For illustration, we consider the first differenced interest rate series in Model II of Example 2.13. The t-ratio of the coefficient of \Delta x_t is 104.63 if both serial correlation and conditional heteroscedasticity in the residuals


are ignored; it becomes 46.73 when the HC estimator is used, and it reduces to 40.08 when the HAC estimator is employed. To use the HC or HAC estimator, we can use the package sandwich in R; the commands are vcovHC() and vcovHAC().
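A short sketch of how these corrected standard errors are obtained in R (assuming dx and dy are the change series as before; coeftest() is from the lmtest package):

library(sandwich)
library(lmtest)
fit <- lm(dy ~ dx)                    # Model II by ordinary least squares
coeftest(fit)                         # naive t-ratios (iid errors assumed)
coeftest(fit, vcov = vcovHC(fit))     # heteroscedasticity consistent (HC) t-ratios
coeftest(fit, vcov = vcovHAC(fit))    # heteroscedasticity and autocorrelation consistent (HAC)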

3.11 Long Memory Models

We have discussed that for a stationary time series the ACF decays exponentially to zero as the lag increases. Yet for a unit root nonstationary time series, it can be shown that the sample ACF converges to 1 for all fixed lags as the sample size increases; see Chan and Wei (1988) and Tiao and Tsay (1983). There exist some time series whose ACF decays slowly to zero at a polynomial rate as the lag increases. These processes are referred to as long memory or long range dependent time series. One such example is the fractionally differenced process defined by

(1 - L)^d x_t = w_t,   |d| < 0.5,   (3.37)

where {w_t} is a white noise series and d is called the long memory parameter. Properties of model (3.37) have been widely studied in the literature (e.g., Beran, 1994). We summarize some of these properties below.

1. If d < 0.5, then x_t is a weakly stationary process and has the infinite MA representation

x_t = w_t + \sum_{k=1}^{\infty} \psi_k w_{t-k} with \psi_k = d(d+1) \cdots (d+k-1)/k! = \binom{k+d-1}{k}.

2. If d > -0.5, then x_t is invertible and has the infinite AR representation

x_t = \sum_{k=1}^{\infty} \pi_k x_{t-k} + w_t with \pi_k = (0-d)(1-d) \cdots (k-1-d)/k! = \binom{k-d-1}{k}.


3. For |d| < 0.5, the ACF of x_t is

\rho_x(h) = \frac{d(1+d) \cdots (h-1+d)}{(1-d)(2-d) \cdots (h-d)},   h \ge 1.

In particular, \rho_x(1) = d/(1-d) and, as h \to \infty,

\rho_x(h) \approx \frac{(-d)!}{(d-1)!} h^{2d-1}.

4. For |d| < 0.5, the PACF of x_t is \phi_{h,h} = d/(h - d) for h \ge 1.

5. For |d| < 0.5, the spectral density function f_x(\cdot) of x_t, which is the Fourier transform of the autocovariance function \gamma_x(h) of x_t, that is,

f_x(\omega) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \gamma_x(h) \exp(-i h \omega)

for \omega \in [-\pi, \pi], where i = \sqrt{-1}, satisfies

f_x(\omega) \sim \omega^{-2d} as \omega \to 0,   (3.38)

where \omega \in [0, \pi] denotes the frequency. See Chapter 6 of Hamilton (1994) for details about spectral analysis.

Of particular interest here is the behavior of the ACF of x_t when d < 0.5. The property says that \rho_x(h) \sim h^{2d-1}, which decays at a polynomial, instead of an exponential, rate. For this reason, such an x_t process is called a long-memory time series. A special characteristic of the spectral density function in (3.38) is that the spectrum diverges to infinity as \omega \to 0. By contrast, the spectral density function of a stationary ARMA process is bounded for all \omega \in [-\pi, \pi].

Earlier we used the binomial theorem for non-integer powers

(1 - L)^d = \sum_{k=0}^{\infty} (-1)^k \binom{d}{k} L^k.


If the fractionally differenced series (1 - L)^d x_t follows an ARMA(p, q) model, then x_t is called a fractionally differenced autoregressive moving average (ARFIMA(p, d, q)) process, which generalizes the ARIMA model by allowing for non-integer d. In practice, if the sample ACF of a time series is not large in magnitude but decays slowly, then the series may have long memory. For more discussion, we refer to the book by Beran (1994). For the pure fractionally differenced model in (3.37), one can estimate d using either a maximum likelihood method in the time domain, or the Whittle likelihood or a regression method with the logged periodogram at the lower frequencies in the frequency domain. Finally, long-memory models have attracted some attention in the finance literature, in part because of the work on fractional Brownian motion in continuous time models.


Figure 3.18: Sample autocorrelation functions of the absolute series of daily simple returns for the CRSP value-weighted (left top panel) and equal-weighted (right top panel) indexes. The log spectral density of the absolute series of daily simple returns for the CRSP value-weighted (left bottom panel) and equal-weighted (right bottom panel) indexes.


Example 2.15: As an illustration, Figure 3.18 shows the sample ACFs of the absolute series of daily simple returns for the CRSP value-weighted (left top panel) and equal-weighted (right top panel) indexes from July 3, 1962 to December 31, 1997. The ACFs are relatively small in magnitude, but decay very slowly; they appear to be significant at the 5% level even after 300 lags. For more information about the behavior of the sample ACF of absolute return series, see Ding, Granger, and Engle (1993). To estimate the long memory parameter d, we can use the package fracdiff in R; the results are \hat{d} = 0.1867 for the absolute returns of the value-weighted index and \hat{d} = 0.2732 for the absolute returns of the equal-weighted index. To support our conclusion above, we plot the log spectral density of the absolute series of daily simple returns for the CRSP value-weighted (left bottom panel) and equal-weighted (right bottom panel) indexes. They show clearly that both log spectral densities increase without bound as the frequency approaches zero, supporting the spectral density behavior in (3.38).
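A minimal sketch of the fracdiff fit (assuming vw is the vector of daily simple returns of the value-weighted index):

library(fracdiff)
fit <- fracdiff(abs(vw), nar = 0, nma = 0)     # pure ARFIMA(0, d, 0) for the absolute returns
fit$d                                          # estimate of the long memory parameter d
spec <- spectrum(abs(vw), plot = FALSE)
plot(log(spec$freq), log(spec$spec), type = "l")  # log spectral density versus log frequency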

3.12 Periodicity and Business Cycles

Let us first recall what we observed from Figure 2.1 in Chapter 2. From Figure 2.1, we can conclude that both series tend to exhibit repetitive behavior, with regularly repeating (stochastic) cycles or periodicity that are easily visible. This periodic behavior is of interest because underlying processes of interest may be regular and the rate or frequency of oscillation characterizing the behavior of the underlying series would help to identify them. One can also remark that the cycles of the SOI are repeating at a faster rate than those of the recruitment series. The recruits series also shows several kinds of oscillations, a faster frequency that seems to repeat about every 12 months and a slower frequency that seems to repeat


about every 50 months. The study of the kinds of cycles and their strengths is also very important, particularly in macroeconomics for determining the business cycles. For more discussion, we refer to the books by Franses (1996, 1998) and Ghysels and Osborn (2001). As we mentioned in Chapter 2, one way to identify the cycles is to compute the power spectrum, which shows the variance as a function of the frequency of oscillation. For other modeling methods, such as periodic autoregressive, PAR(p), modeling techniques, we refer to the books by Franses (1996, 1998) and Ghysels and Osborn (2001) for details. Next, we introduce one method to describe the cyclical behavior using the ACF.

As indicated above, there exists cyclical behavior in the recruits series. From Example 2.4, an AR(2) model fits this series quite well. Therefore, we consider the ACF \rho_x(h) of a stationary AR(2) series, which satisfies the second order difference equation

(1 - \phi_1 L - \phi_2 L^2) \rho_x(h) = \phi(L) \rho_x(h) = 0

for h \ge 2, with \rho_x(0) = 1 and \rho_x(1) = \phi_1/(1 - \phi_2). This difference equation determines the properties of the ACF of a stationary AR(2) time series. It also determines the behavior of the forecasts of x_t. Corresponding to the prior difference equation, there is a second order polynomial equation

1 - \phi_1 x - \phi_2 x^2 = 0.

Solutions of this equation are

x = \frac{\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{-2\phi_2}.

In the time series literature, the inverses of the two solutions are referred to as the characteristic roots of the AR(2) model. Denote the


two solutions by \omega_1 and \omega_2. If both \omega_i are real valued, then the second order difference equation of the model can be factored as (1 - \omega_1 L)(1 - \omega_2 L) and the AR(2) model can be regarded as an AR(1) model operating on top of another AR(1) model. The ACF of x_t is then a mixture of two exponential decays. Yet if \phi_1^2 + 4\phi_2 < 0, then \omega_1 and \omega_2 are complex numbers (a complex conjugate pair), and the plot of the ACF of x_t shows a picture of damping sine and cosine waves; see the top two panels of Figure 2.14 for the ACF of the SOI and recruits. In business and economic applications, complex characteristic roots are important. They give rise to the behavior of business cycles. It is then common for economic time series models to have complex valued characteristic roots. For an AR(2) model with a pair of complex characteristic roots, the average length of the stochastic cycles is

T_0 = \frac{2\pi}{\cos^{-1}[\phi_1/(2\sqrt{-\phi_2})]},

where the cosine inverse is stated in radians. If one writes the complex solutions as a \pm b i, then we have \phi_1 = 2a, \phi_2 = -(a^2 + b^2), and

T_0 = \frac{2\pi}{\cos^{-1}(a/\sqrt{a^2 + b^2})},

where \sqrt{a^2 + b^2} is the absolute value of a \pm b i.

To illustrate the above idea, Figure 3.19 shows the ACF of four stationary AR(2) models. The right top panel is the ACF of the AR(2) model (1 - 1.0 L + 0.7 L^2) x_t = w_t. Because \phi_1^2 + 4\phi_2 = 1.0 + 4 \times (-0.7) = -1.8 < 0, this particular AR(2) model contains two complex characteristic roots, and hence its ACF exhibits damping sine and cosine waves. The other three AR(2) models have real-valued characteristic roots. Their ACFs decay exponentially.



Figure 3.19: The autocorrelation function of an AR(2) model: (a) \phi_1 = 1.2 and \phi_2 = -0.35, (b) \phi_1 = 1.0 and \phi_2 = -0.7, (c) \phi_1 = 0.2 and \phi_2 = 0.35, (d) \phi_1 = -0.2 and \phi_2 = 0.35.

Example 2.16: As an illustration, consider the quarterly growth rate of U.S. real gross national product (GNP), seasonally adjusted, from the second quarter of 1947 to the first quarter of 1991, which is shown in the left panel of Figure 3.20. The right panel of Figure


Figure 3.20: The growth rate of U.S. quarterly real GNP from 1947.II to 1991.I (seasonally adjusted and in percentage): the left panel is the time series plot and the right panel is the ACF.


3.20 displays the ACF of this series, which shows a picture of damping sine and cosine waves. We can conclude that cycles exist. This series can be used as an example of a nonlinear economic time series; see Tsay (2005, Chapter 4) for a detailed analysis using the Markov switching model. Here we simply employ an AR(3) model for the data. Denoting the growth rate by x_t, we can use the model building procedure to estimate the model. The fitted model is

x_t = 0.0047 + 0.35 x_{t-1} + 0.18 x_{t-2} - 0.14 x_{t-3} + w_t, with \hat{\sigma}^2_w = 0.0098.

Rewriting the model as x_t - 0.35 x_{t-1} - 0.18 x_{t-2} + 0.14 x_{t-3} = 0.0047 + w_t, we obtain the corresponding third-order difference equation 1 - 0.35 L - 0.18 L^2 + 0.14 L^3 = 0, which can be factored as (1 + 0.52 L)(1 - 0.87 L + 0.27 L^2) = 0. The first factor (1 + 0.52 L) shows an exponentially decaying feature of the GNP growth rate. Focusing on the second order factor 1 - 0.87 L - (-0.27) L^2 = 0, we have \phi_1^2 + 4\phi_2 = 0.87^2 + 4(-0.27) = -0.3231 < 0. Therefore, the second factor of the AR(3) model confirms the existence of stochastic business cycles in the quarterly growth rate of U.S. real GNP. This is reasonable as the U.S. economy went through expansion and contraction periods. The average length of the stochastic cycles is approximately

T_0 = \frac{2\pi}{\cos^{-1}[\phi_1/(2\sqrt{-\phi_2})]} = 10.83 quarters,

which is about 3 years. If one uses a nonlinear model to separate the U.S. economy into "expansion" and "contraction" periods, the data show that the average duration of contraction periods is about three quarters and that of expansion periods is about 3 years. The average duration of 10.83 quarters is a compromise between the two separate durations. The periodic feature obtained here is common among


growth rates of national economies. For example, similar features can be found for other countries.
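The cycle length calculation is easy to reproduce in R from the fitted AR coefficients (a sketch using the AR(3) estimates quoted above):

phi   <- c(0.35, 0.18, -0.14)        # AR(3) coefficients for the GNP growth rate
roots <- polyroot(c(1, -phi))        # solutions of 1 - phi1*z - phi2*z^2 - phi3*z^3 = 0
theta <- Arg(1 / roots)              # argument of each characteristic root (inverse solution)
2 * pi / abs(theta)                  # cycle lengths; the complex pair gives about 10.8 quarters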

For a stationary AR(p) series, the ACF satisfies the difference equation (1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p) \rho_x(h) = 0 for h > 0. The plot of the ACF of a stationary AR(p) model would then show a mixture of damping sine and cosine patterns and exponential decays, depending on the nature of its characteristic roots.

Finally, we continue our analysis of the recruits series as entertained in Example 2.4, from which an AR(2) model was fitted for this series as

x_t - 1.3512 x_{t-1} + 0.4612 x_{t-2} = 61.8439 + w_t.

Clearly, \phi_1^2 + 4\phi_2 = 1.3512^2 - 4 \times 0.4612 = -0.0191 < 0, which implies the existence of stochastic business cycles in the recruits series. The average length of the stochastic cycles based on the above fitted AR(2) model is approximately

T_0 = \frac{2\pi}{\cos^{-1}[\phi_1/(2\sqrt{-\phi_2})]} = 61.71 months,

which is about 5 years. Note that this average length of cycles is not close to what we have observed (about 50 months). Please figure out the reason why there is such a big difference.

3.13 Impulse Response Function

The task facing the modern time-series econometrician is to develop reasonably simple and intuitive models capable of forecasting, interpreting and hypothesis testing regarding economic and financial data. In this respect, the time series econometrician is often concerned with


the estimation of difference equations containing stochastic components.

3.13.1 First Order Difference Equations

Suppose we are given the dynamic equation y_t = \phi_1 y_{t-1} + w_t, 1 \le t \le n, where y_t is the value of the variable of interest at period t and w_t is the value of the input variable at period t. Indeed, this is an AR(1) model. The equation relates a variable y_t to its previous (lagged) values, with only the first lag appearing on the right hand side (RHS) of the equation. For now, the input variable, {w_1, w_2, w_3, ...}, will simply be regarded as a sequence of deterministic numbers which can be generated from a distribution. Later on, we will assume that they are stochastic. We solve the difference equation by recursive substitution, assuming that we know the starting value y_{-1}, called the initial condition. Then, we have

y_0 = \phi_1 y_{-1} + w_0,
y_1 = \phi_1 y_0 + w_1 = \phi_1(\phi_1 y_{-1} + w_0) + w_1 = \phi_1^2 y_{-1} + \phi_1 w_0 + w_1,
y_2 = \phi_1 y_1 + w_2 = \phi_1(\phi_1^2 y_{-1} + \phi_1 w_0 + w_1) + w_2 = \phi_1^3 y_{-1} + \phi_1^2 w_0 + \phi_1 w_1 + w_2,
...
y_t = \phi_1 y_{t-1} + w_t = \phi_1^{t+1} y_{-1} + \phi_1^t w_0 + \phi_1^{t-1} w_1 + \cdots + \phi_1 w_{t-1} + w_t
    = \phi_1^{t+1} y_{-1} + \sum_{k=0}^{t} \phi_1^k w_{t-k}.   (3.39)

The procedure of expressing y_t in terms of the past values of w_t and the starting value y_{-1} is known as recursive substitution. The last term on the RHS of (3.39) is called a linear process (with finitely many terms) generated by {w_t} if {w_t} is random.

Next we consider the computation of dynamic multipliers. To do so, we consider one simple experiment by assuming that y_{-1} and


{w_0, w_1, ..., w_n} are given and fixed at this moment, and the value of \phi_1 is known. Then we can compute the time series y_t by y_t = \phi_1 y_{t-1} + w_t for 1 \le t \le n. What happens to the series y_t if, at period t = 50, we change the value of the deterministic component w_{50} and set the new value to \tilde{w}_{50} = w_{50} + 1? Of course, the change in w_{50} leads to changes in y_t:

\tilde{y}_{50} = \phi_1 y_{49} + \tilde{w}_{50} = \phi_1 y_{49} + w_{50} + 1 = y_{50} + 1,
\tilde{y}_{51} = \phi_1 \tilde{y}_{50} + w_{51} = \phi_1 (y_{50} + 1) + w_{51} = y_{51} + \phi_1,
\tilde{y}_{52} = \phi_1 \tilde{y}_{51} + w_{52} = \phi_1 (y_{51} + \phi_1) + w_{52} = y_{52} + \phi_1^2,

and so on. If we look at the difference between \tilde{y}_t and y_t, we obtain \tilde{y}_{50} - y_{50} = 1, \tilde{y}_{51} - y_{51} = \phi_1, \tilde{y}_{52} - y_{52} = \phi_1^2, and so on. This experiment is illustrated in Figures 3.21 and 3.22. Therefore,


Figure 3.21: The time series y_t is generated with w_t ~ N(0, 1) and y_0 = 5. At period t = 50, there is an additional impulse to the error term, i.e., \tilde{w}_{50} = w_{50} + 1. The impulse response function is computed as the difference between the series y_t without the impulse and the series \tilde{y}_t with the impulse.

one can say that a one unit increase in w_{50} leads to y_{50} increasing by 1,



Figure 3.22: The time series y_t is generated with w_t ~ N(0, 1) and y_0 = 3. At period t = 50, there is an additional impulse to the error term, i.e., \tilde{w}_{50} = w_{50} + 1. The impulse response function is computed as the difference between the series y_t without the impulse and the series \tilde{y}_t with the impulse.

y_{51} increasing by \phi_1, y_{52} increasing by \phi_1^2, and so on. The question is how y_{t+j} changes if we change w_t by one unit. This is exactly the question that dynamic multipliers answer.

Assume that we start with y_{t-1} instead of y_{-1}, i.e., we observe the value of y_{t-1} at period t - 1. Can we say something about y_{t+j}? Well, let us answer this question.

y_t = \phi_1 y_{t-1} + w_t,
y_{t+1} = \phi_1 y_t + w_{t+1} = \phi_1(\phi_1 y_{t-1} + w_t) + w_{t+1} = \phi_1^2 y_{t-1} + \phi_1 w_t + w_{t+1},
...
y_{t+j} = \phi_1^{j+1} y_{t-1} + \phi_1^j w_t + \phi_1^{j-1} w_{t+1} + \cdots + \phi_1 w_{t+j-1} + w_{t+j}.

The dynamic multiplier is defined by

\frac{\partial y_{t+j}}{\partial w_t} = \phi_1^j for an AR(1) model.


Note that the multiplier does not depend on t. The dynamic multiplier \partial y_{t+j}/\partial w_t is sometimes called the impact multiplier.

Impulse Response Function

We can plot the dynamic multipliers as a function of the lag j, i.e., plot {\partial y_{t+j}/\partial w_t}_{j=1}^{J}. Because dynamic multipliers calculate the response of y_{t+j} to a single impulse in w_t, this plot is also referred to as the impulse response function (IRF). This function has many important applications in time series analysis because it shows how the entire path of a variable is affected by a stochastic shock. Obviously, the dynamics of the impulse response function depend on the value of \phi_1 for an AR(1) model. Let us look at the relationship between the IRF and the system.

(a) 0 < \phi_1 < 1 implies that the impulse response converges to zero and the system is stable.

(b) -1 < \phi_1 < 0 gives that the impulse response oscillates but converges to zero and the system is stable.

(c) \phi_1 > 1 implies that the impulse response is explosive and the system is unstable.

(d) \phi_1 < -1 implies that the impulse response is explosive (with oscillation) and the system is unstable.

Impulse response functions for all possible cases are presented in Figure 3.23.
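The experiment behind Figures 3.21-3.23 is straightforward to reproduce in R (a sketch for the AR(1) case):

set.seed(1)
phi <- 0.8; n <- 100
w   <- rnorm(n)                                   # the input series
y   <- filter(w, phi, method = "recursive")       # y_t = phi*y_{t-1} + w_t
wi  <- w; wi[50] <- wi[50] + 1                    # one-unit impulse at t = 50
yi  <- filter(wi, phi, method = "recursive")      # the series with the impulse
irf <- yi - y                                     # impulse response: phi^(t-50) for t >= 50
plot(irf, type = "h")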

Permanent Change in w_t

In calculating the dynamic multipliers in Figure 3.23, we were asking what would happen if w_t were to increase by one unit with w_{t+1}, w_{t+2}, ..., w_{t+j}



Figure 3.23: Examples of impulse response functions for first order difference equations.

unaffected. We were finding the effect of a purely transitory change in w_t. A permanent change in w_t means that w_t, w_{t+1}, ..., w_{t+j} would all increase by one unit. The effect on y_{t+j} of a permanent change in w_t beginning in period t is then given by

\frac{\partial y_{t+j}}{\partial w_t} + \frac{\partial y_{t+j}}{\partial w_{t+1}} + \cdots + \frac{\partial y_{t+j}}{\partial w_{t+j}} = \phi_1^j + \phi_1^{j-1} + \cdots + \phi_1 + 1 = \frac{1 - \phi_1^{j+1}}{1 - \phi_1} for the AR(1) model.

The difference between a transitory and a permanent change in w_t is illustrated in Figure 3.24.
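In R, the permanent-change multipliers are just the cumulative sums of the transitory ones, which matches the closed form above (a small sketch):

phi <- 0.8; j <- 0:20
transitory <- phi^j                  # response of y_{t+j} to a one-off unit impulse in w_t
permanent  <- cumsum(transitory)     # response when w_t, ..., w_{t+j} all rise by one unit
all.equal(permanent, (1 - phi^(j + 1)) / (1 - phi))   # agrees with the geometric-sum formula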

3.13.2 Higher Order Difference Equations

Consider a linear pth order difference equation, which has the form y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + w_t; this is an AR(p) model if w_t is white noise. It is easy to derive the properties of the pth order difference equation. To illustrate them, we consider the second



Figure 3.24: The time series y_t is generated with w_t ~ N(0, 1) and y_0 = 3. For the transitory impulse, there is an additional impulse to the error term at period t = 50, i.e., \tilde{w}_{50} = w_{50} + 1. For the permanent impulse, there is an additional impulse for periods t = 50, ..., 100, i.e., \tilde{w}_t = w_t + 1 for t = 50, 51, ..., 100. The impulse response function (IRF) is computed as the difference between the series y_t without the impulse and the series \tilde{y}_t with the impulse.

order difference equation y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + w_t, so that p = 2. We write the equation as a first order vector difference equation,

\xi_t = F \xi_{t-1} + v_t,

where

\xi_t = \begin{pmatrix} y_t \\ y_{t-1} \end{pmatrix}, \quad F = \begin{pmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{pmatrix}, \quad v_t = \begin{pmatrix} w_t \\ 0 \end{pmatrix}.

If the starting value \xi_{-1} is known, we use recursive substitution to obtain

\xi_t = F^{t+1} \xi_{-1} + F^t v_0 + F^{t-1} v_1 + \cdots + F v_{t-1} + v_t.

Or, if the starting value \xi_{t-1} is known, we use recursive substitution to obtain

\xi_{t+j} = F^{j+1} \xi_{t-1} + F^j v_t + F^{j-1} v_{t+1} + \cdots + F v_{t+j-1} + v_{t+j}


for any j \ge 0.

To compute the dynamic multipliers, we recall the rules of vector and matrix differentiation. If x(\beta) is an m x 1 vector that depends on the n x 1 vector \beta, then

\frac{\partial x}{\partial \beta'} (m \times n) = \begin{pmatrix} \partial x_1(\beta)/\partial \beta_1 & \cdots & \partial x_1(\beta)/\partial \beta_n \\ \vdots & \ddots & \vdots \\ \partial x_m(\beta)/\partial \beta_1 & \cdots & \partial x_m(\beta)/\partial \beta_n \end{pmatrix}.

It is easy to verify that the dynamic multipliers are given by

\frac{\partial \xi_{t+j}}{\partial v_t'} = \frac{\partial F^j v_t}{\partial v_t'} = F^j.

Since the first element of \xi_{t+j} is y_{t+j} and the first element of v_t is w_t, the (1, 1) element of the matrix F^j is \partial y_{t+j}/\partial w_t, i.e., the dynamic multiplier. For larger values of j, an easy way to obtain numerical values of the dynamic multiplier \partial y_{t+j}/\partial w_t is to simulate the system. This is done as follows. Set y_{-1} = y_{-2} = \cdots = y_{-p} = 0 and w_0 = 1, set the values of w at all other dates to 0, and then use the AR(p) recursion to calculate y_t for 0 \le t \le n.
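This simulation recipe takes only a few lines in R (a sketch for an AR(2); the same idea works for any order p):

phi <- c(0.6, 0.2)                   # phi1 = 0.6, phi2 = 0.2 (setting (a) of Figure 3.25)
n   <- 20
w   <- c(1, rep(0, n))               # w_0 = 1 and all other w's equal to 0
irf <- filter(w, phi, method = "recursive")    # y_t = phi1*y_{t-1} + phi2*y_{t-2} + w_t
plot(0:n, irf, type = "h")           # irf[j+1] is the dynamic multiplier dy_{t+j}/dw_t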

To illustrate the dynamic multipliers for the pth order difference equation, we consider the following second order difference equation:

y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + w_t, so that F = \begin{pmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{pmatrix}. The impulse response functions for this example with four different settings are presented in Figure 3.25:

(a) \phi_1 = 0.6 and \phi_2 = 0.2,

(b) \phi_1 = 0.8 and \phi_2 = 0.4,

(c) \phi_1 = -0.9 and \phi_2 = -0.5, and



Figure 3.25: Examples of impulse response functions for second order difference equations.

(d) \phi_1 = -0.5 and \phi_2 = -1.5.

Similar to the first order difference equation, the impulse response function for the pth order difference equation can be explosive or converge to zero. What determines the dynamics of the impulse response function? The eigenvalues of the matrix F determine whether the impulse response oscillates, converges, or is explosive. The eigenvalues of a p x p matrix A are those numbers \lambda for which

|A - \lambda I_p| = 0.

The eigenvalues of the general matrix F defined above are the values of \lambda that satisfy

\lambda^p - \phi_1 \lambda^{p-1} - \phi_2 \lambda^{p-2} - \cdots - \phi_{p-1} \lambda - \phi_p = 0.

Note that the eigenvalues are also called the characteristic roots of the AR(p) model. If the eigenvalues are real but at least one eigenvalue is greater than unity in absolute value, the system is explosive.


Why do eigenvalues determine the dynamics of the dynamic multipliers? Recall that if the eigenvalues of a p x p matrix A are distinct, there exists a nonsingular p x p matrix T such that A = T \Lambda T^{-1}, where \Lambda is a p x p matrix with the eigenvalues of A on the principal diagonal and zeros elsewhere. Then, F^j = T \Lambda^j T^{-1} and the values of the eigenvalues in \Lambda determine whether the elements of F^j explode or not. Recall that the dynamic multiplier is equal to \partial \xi_{t+j}/\partial v_t' = F^j. Therefore, the size of the eigenvalues determines whether the system is stable or not.

Now we compute the eigenvalues for each case in the above example:

(a) *1 = 0.838 and *2 = "0.238 so that |*k| < 1 and thesystem is stable.

(b) *1 = 1.148 and *2 = "0.348 so that |*1| > 1 and thesystem is unstable.

(c) * = "0.45 ± 0.545 i so that |*| =&("0.45)2 + 0.5452 =

0.706 < 1 and the system is stable. Since eigenvalues arecomplex, the impulse response function oscillates.

(d) * = "0.25 ± 1.198 i so that |*| =&("0.25)2 + 1.1982 =

1.223 > 1 and the system is unstable. Since eigenvalues arecomplex, the impulse response function oscillates.
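These eigenvalues are easy to verify numerically with the built-in eigen() function; the sketch below does so for case (a).

# Companion matrix of the second order difference equation
# y_t = phi1*y_{t-1} + phi2*y_{t-2} + w_t, case (a) above
phi1 <- 0.6; phi2 <- 0.2
Fmat <- matrix(c(phi1, phi2,
                 1,    0), nrow = 2, byrow = TRUE)
lam <- eigen(Fmat)$values
print(lam)        # approximately 0.838 and -0.238
print(abs(lam))   # both moduli are below one, so the system is stable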

3.14 Problems

1. Consider the regression model y_t = β_1 y_{t−1} + w_t, where w_t is white noise with zero mean and variance σ_w^2. Assume that we observe {y_t}_{t=2}^n. Show that the least squares estimator of β_1 is β̂_1 = Σ_{t=2}^n y_t y_{t−1} / Σ_{t=2}^n y_{t−1}^2. If we pretend that y_{t−1} were fixed, show that Var(β̂_1) = σ_w^2 / Σ_{t=2}^n y_{t−1}^2. Relate your answer to a method for fitting a first-order AR model to the data y_t.

2. Consider the autoregressive model AR(1), i.e. x_t − φ_1 x_{t−1} = w_t.

(a) Find the necessary condition for {xt} to be invertible.

(b) Show that xt can be expressed as a linear process.

(c) Show that E[w_t x_t] = σ_w^2 and E[w_t x_{t−1}] = 0, so that future

errors are uncorrelated with past data.

3. The auto-covariance and autocorrelation functions for AR processes are often derived from the Yule-Walker equations, obtained by multiplying both sides of the defining equation successively by x_t, x_{t−1}, · · ·. Use the Yule-Walker equations to derive ρ_x(h) for the first-order AR model.

4. For an ARMA series we define the optimal forecast based on x_t, x_{t−1}, · · · as the conditional expectation x^t_{t+h} = E[x_{t+h} | x_t, x_{t−1}, · · ·] for h ≥ 1.

(a) Show, for the general ARMA model, that E[w_{t+h} | x_t, x_{t−1}, · · ·] = 0 if h > 0 and equals w_{t+h} if h ≤ 0.

(b) For the AR(1) and AR(2) models, derive the optimal forecast x^t_{t+h} and the prediction error variance of the one-step forecast.

5. Suppose we have the simple linear trend model y_t = β_1 t + x_t, 1 ≤ t ≤ n, where x_t = φ_1 x_{t−1} + w_t. Give the exact form of the equations that you would use for estimating β_1, φ_1 and σ_w^2 using the Cochrane-Orcutt procedure.

6. Suppose that the simple return of a monthly bond index followsthe MA(1) model

x_t = w_t + 0.2 w_{t−1},   σ_w = 0.025.


Assume that w_100 = 0.01. Compute the 1-step (x^t_{t+1}) and 2-step (x^t_{t+2}) ahead forecasts of the return at the forecast origin t = 100. What are the standard deviations of the associated forecast errors? Also compute the lag-1 (ρ(1)) and lag-2 (ρ(2)) autocorrelations of the return series.

7. Suppose that the daily log return of a security follows the model

x_t = 0.01 + 0.2 x_{t−2} + w_t,

where {w_t} is a Gaussian white noise series with mean zero and variance 0.02. What are the mean and variance of the return series x_t? Compute the lag-1 (ρ(1)) and lag-2 (ρ(2)) autocorrelations of x_t. Assume that x_100 = −0.01 and x_99 = 0.02. Compute the 1-step (x^t_{t+1}) and 2-step (x^t_{t+2}) ahead forecasts of the return series at the forecast origin t = 100. What are the associated standard deviations of the forecast errors?

8. Consider the file "la-regr.dat", in the syllabus, which contains cardiovascular mortality, temperature values and particulate levels over 6-day periods from Los Angeles County (1970-1979). The file also contains two dummy variables for regression purposes, a column of ones for the constant term and a time index. The order is as follows: Column 1: 508 cardiovascular mortality values (6-day averages); Column 2: 508 ones; Column 3: the integers 1, 2, · · ·, 508; Column 4: temperature in degrees F; and Column 5: particulate levels. A reference is Shumway et al. (1988). The point here is to examine possible relations between the temperature and mortality in the presence of a time trend in cardiovascular mortality.

(a) Use scatter diagrams to argue that particulate level may be linearly related to mortality and that temperature has either a linear or quadratic relation. Check for lagged relations using the cross-correlation function.

(b) Adjust temperature for its mean value and fit the model

M_t = β_0 + β_1 (T_t − T̄) + β_2 (T_t − T̄)^2 + β_3 P_t + e_t,

where M_t, T_t and P_t denote the mortality, temperature and particulate pollution series. You can use as inputs Columns 2 and 3 for the trend terms and run the regression analysis without the constant option.

(c) Plot the residuals and compute the autocorrelation (ACF) and partial autocorrelation (PACF) functions. Do the residuals appear to be white? Suggest an ARIMA model and fit it to the residuals. The simple ARIMA(2, 0, 0) model is a good compromise.

(d) Apply the ARIMA model obtained in part (c) to all of the input variables and to cardiovascular mortality using a transformation. Retain the forecast values for the transformed mortality, say m_t = M_t − φ_1 M_{t−1} − φ_2 M_{t−2}.

9. Generate 10 realizations (n = 200 points each) of a series from an ARIMA(1, 0, 1) model with φ_1 = 0.90, θ_1 = 0.20 and σ_w^2 = 0.25. Fit the ARIMA model to each of the series and compare the estimators to the true values by computing the average of the estimators and their standard deviations.

10. Consider the bivariate time series record containing monthly U.S. production, as measured by the Federal Reserve Board Production Index, and unemployment, as given in the file "frb.asd". The file contains n = 372 monthly values for each series. Before you begin, be sure to plot the series. Fit a seasonal ARIMA model of your choice to the Federal Reserve Production Index. Develop a 12-month forecast using the model.

11. The file labelled "clim-hyd.asd" has 454 months of measured values for the climatic variables Air Temperature, Dew Point, Cloud Cover, Wind Speed, Precipitation, and Inflow at Shasta Lake. We would like to look at possible relations between the weather factors and between the weather factors and the inflow to Shasta Lake.

(a) Fit the ARIMA(0, 0, 0) × (0, 1, 1)_{12} model to the transformed precipitation P_t = √p_t and the transformed flow i_t = log(i_t). Save the residuals for transformed precipitation for use in part (b).

(b) Apply the ARIMA model fitted in part (a) for transformed precipitation to the flow series. Compute the cross-correlation between the flow residuals using the precipitation ARIMA model and the precipitation residuals using the precipitation model, and interpret.

12. Consider the daily simple return of the CRSP equal-weighted index, including distributions, from January 1980 to December 1999 in the file "d-ew8099.txt" (day, ew). The indicator variables for Mondays, Tuesdays, Wednesdays, and Thursdays are in the first four columns of the file "wkdays8099.dat". Use a regression model, possibly with time series errors, to study the effects of trading days on the index return. What is the fitted model? Are the weekday effects significant in the returns at the 5% level? Use the HAC estimator of the covariance matrix to obtain the t-ratios of the regression estimates. Does it change the conclusion about the weekday effect? Are there serial correlations in the residuals? Use the Ljung-Box test to perform the test. Draw your conclusion. If yes, build a regression model with time series errors to study weekday effects.

13. This problem is concerned with the dynamic relationship between the spot and futures prices of the S&P500 index. The data file "sp5may.dat" has three columns: log(futures price), log(spot price), and cost-of-carry (×100). The data were obtained from the Chicago Mercantile Exchange for the S&P 500 stock index in May 1993 and its June futures contract. The time interval is 1 minute (intraday). Several authors used the data to study index futures arbitrage. Here we focus on the first two columns. Let f_t and s_t be the log prices of futures and spot, respectively. Build a regression model with time series errors between {f_t} and {s_t}, with f_t being the dependent variable. You need to provide all details (reasons and analysis results) at each step.

14. The quarterly gross domestic product implicit price deflator is often used as a measure of inflation. The file "q-gdpdef.dat" contains the data for the U.S. from the first quarter of 1947 to the first quarter of 2004. The data are seasonally adjusted and equal to 100 for the year 2000. The data are obtained from the Federal Reserve Bank of St Louis. Build a (seasonal) ARIMA model for the series and check the validity of the fitted model. Use the model to forecast the gross domestic product implicit price deflator for the rest of 2004.

15. Consider the monthly simple returns of the Decile 1, Decile 5, and Decile 10 portfolios of NYSE/AMEX/NASDAQ based on market capitalization. The data span is from January 1960 to December 2003, and the data are obtained from CRSP with the file name "m-decile1510.txt". For each series, test the null hypothesis that the first 12 lags of autocorrelations are zero at the 5% level. Draw your conclusion. Build an AR and an MA model for the series Decile 5. Use the AR and MA models built to produce 1-step and 3-step ahead forecasts of the series. Compare the fitted AR and MA models.

16. The data file "q-unemrate.txt" contains the U.S. quarterly unemployment rate, seasonally adjusted, from 1948 to the second quarter of 1991. Consider the change series Δx_t = x_t − x_{t−1}, where x_t is the quarterly unemployment rate. Build an AR model for the Δx_t series. Does the fitted model suggest the existence of business cycles?

17. In this exercise, construct the impulse response function for a third order difference equation: y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + φ_3 y_{t−3} + w_t for 1 ≤ t ≤ n, where it is assumed that {w_t}_{t=1}^n is a sequence of deterministic numbers, say generated from N(0, 1).

(a) Set φ_1 = 1.1, φ_2 = −0.8, φ_3 = 0.1, y_0 = y_{−1} = y_{−2} = 0, and n = 150. Generate y_t using the third order difference equation for 1 ≤ t ≤ n.

(b) Check the eigenvalues of this model to determine whether the IRF is converging or explosive.

(c) Construct the impulse response function for the generated y_t. Set the number of periods in the impulse response function to J = 25. Comment on your results.

(d) Set φ_1 = 1.71 and repeat steps (a)–(c). Comment on your results.


3.15 Computer Code

The following R commands are used for making the graphs in this chapter.

# 3-28-2006

graphics.off()

###################################################################

ibm<-matrix(scan("c:\\teaching\\time series\\data\\m-ibm2697.txt"),

byrow=T,ncol=1)

vw<-matrix(scan("c:\\teaching\\time series\\data\\m-vw2697.txt"),

byrow=T,ncol=1)

n=length(ibm)

ibm1=ibm

ibm2=log(ibm1+1)

vw1=vw

vw2=log(1+vw1)

postscript(file="c:\\teaching\\time series\\figs\\fig-2.1.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4)

acf(ibm1, ylab="", xlab="",ylim=c(-0.2,0.2),lag=100,

main="Simple Returns",cex=0.5)

text(50,0.2,"IBM")

acf(ibm2, ylab="", xlab="",ylim=c(-0.2,0.2),lag=100,

main="Log Returns",cex=0.5)

text(50,0.2,"IBM")

acf(vw1, ylab="", xlab="",ylim=c(-0.2,0.2),lag=100,

main="Simple Returns",cex=0.5)


text(50,0.2,"value-weighted index")

acf(vw2, ylab="", xlab="",ylim=c(-0.2,0.2),lag=100,

main="Log Returns",cex=0.5)

text(50,0.2,"value-weighted index")

dev.off()

###################################################################

###################################################################

y1<-matrix(scan("c:\\teaching\\time series\\data\\ngtemp.dat"),byrow=T,ncol=1)

y=y1[,1]

n=length(y)

a<-1:12

a=a/12

y=y1[,1]

n=length(y)

x<-rep(0,n)

for(i in 1:149){

x[((i-1)*12+1):(12*i)]<-1856+i-1+a

}

x[n-1]<-2005+1/12

x[n]=2005+2/13

x=x/100

x1=cbind(rep(1,n),x)

z=t(x1)%*%x1

fit1=lm(y~x) # fit a regression model

resid1=fit1$resid # obtain residuals

sigma2=mean(resid1^2)

y.diff=diff(y) # compute difference


var_beta=sigma2*solve(z)

postscript(file="c:\\teaching\\time series\\figs\\fig-2.2.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4)

acf(resid1,lag.max=20,ylab="",ylim=c(-0.5,1),

main="Detrended Temperature",cex=0.5)

text(5,0.8,"ACF")

pacf(resid1,lag.max=20,ylab="",ylim=c(-0.5,1),main="")

text(5,0.8,"PACF")

acf(y.diff,lag.max=20,ylab="",ylim=c(-0.5,1),

main="Differenced Temperature",cex=0.5)

text(5,0.8,"ACF")

pacf(y.diff,lag.max=20,ylab="",ylim=c(-0.5,1),main="")

text(5,0.8,"PACF")

dev.off()

###################################################################

# simulate an I(1) series

n=200

#y=arima.sim(list(order=c(0,1,0)),n=200) # simulate the integrated series directly

x2=rnorm(n) # a white noise series

y=diffinv(x2) # simulate I(1) by cumulating the white noise x2

postscript(file="c:\\teaching\\time series\\figs\\fig-2.3.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(1,2),mex=0.4)

ts.plot(y,type="l",lty=1,ylab="",xlab="")


text(100,0.8*max(y),"Random Walk")

ts.plot(x2,type="l",ylab="",xlab="")

text(100,0.8*max(x2),"First Difference")

abline(0,0)

dev.off()

postscript(file="c:\\teaching\\time series\\figs\\fig-2.4.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4)

acf(y,ylab="",xlab="",main="Random Walk",cex=0.5,ylim=c(-0.5,1.0))

text(15,0.8,"ACF")

pacf(y,ylab="",xlab="lag",main="",ylim=c(-0.5,1.0))

text(15,0.8,"PACF")

acf(x2,ylab="",xlab="",main="First Difference",cex=0.5,ylim=c(-0.5,1.0))

text(15,0.8,"ACF")

pacf(x2,ylab="",xlab="lag",main="",ylim=c(-0.5,1.0))

text(15,0.8,"PACF")

dev.off()

################################################################

# This is Example 2.5 in Chapter 2

###################################

x<-read.table("c:\\teaching\\time series\\data\\soi.dat",header=T)

x.soi=x[,1]

n=length(x.soi)

aicc=0

if(aicc==1){

aic.value=rep(0,30) # max.lag=10


aicc.value=aic.value

sigma.value=rep(0,30)

for(i in 1:30){

fit3=arima(x.soi,order=c(i,0,0)) # fit an AR(i)

aic.value[i]=fit3$aic/n-2 # compute AIC

sigma.value[i]=fit3$sigma2

# obtain the estimated sigma^2

aicc.value[i]=log(sigma.value[i])+(n+i)/(n-i-2) # compute AICC

print(c(i,aic.value[i],aicc.value[i]))}

data=cbind(aic.value,aicc.value)

write(t(data),"c:\\teaching\\time series\\soi_aic.dat",ncol=2)

}else{

data<-matrix(scan("c:\\teaching\\time series\\soi_aic.dat"),byrow=T,ncol=2)

}

text4=c("AIC", "AICC")

postscript(file="c:\\teaching\\time series\\figs\\fig-2.5.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(1,2),mex=0.4)

acf(resid1,ylab="",xlab="",lag.max=20,ylim=c(-0.5,1),main="")

text(10,0.8,"ACF of residuals of AR(1) for SOI")

matplot(1:30,data,type="b",pch="o",col=c(1,2),ylab="",xlab="Lag")

legend(16,-1.40,text4,lty=1,col=c(1,2))

dev.off()

#fit2=arima(x.soi,order=c(16,0,0))

#print(fit2)

###################################################################

# This is Example 2.7 in Chapter 2


####################################

varve<-read.table("c:\\teaching\\time series\\data\\mass2.dat",header=T)

varve=varve[,1]

n_varve=length(varve)

varve_log=log(varve)

varve_log_diff=diff(varve_log)

postscript(file="c:\\teaching\\time series\\figs\\fig-2.6.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4)

acf(varve_log,ylab="",xlab="",lag.max=30,ylim=c(-0.5,1),main="ACF")

text(10,0.7,"log varves",cex=0.7)

pacf(varve_log,ylab="",xlab="",lag.max=30,ylim=c(-0.5,1),main="PACF")

acf(varve_log_diff,ylab="",xlab="",lag.max=30,ylim=c(-0.5,1),main="")

text(10,0.7,"First difference",cex=0.7)

pacf(varve_log_diff,ylab="",xlab="",lag.max=30,ylim=c(-0.5,1),main="")

dev.off()

###################################################################

# This is Example 2.9 in Chapter 2

####################################

x<-matrix(scan("c:\\teaching\\time series\\data\\birth.dat"),byrow=T,ncol=1)

n=length(x)

x_diff=diff(x)

x_diff_12=diff(x_diff,lag=12)

fit1=arima(x,order=c(0,0,0),seasonal=list(order=c(0,0,0)),include.mean=F)

resid_1=fit1$resid

fit2=arima(x,order=c(0,1,0),seasonal=list(order=c(0,0,0)),include.mean=F)

resid_2=fit2$resid


fit3=arima(x,order=c(0,1,0),seasonal=list(order=c(0,1,0),period=12),

include.mean=F)

resid_3=fit3$resid

postscript(file="c:\\teaching\\time series\\figs\\fig-2.8.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(5,2),mex=0.4)

acf(resid_1, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="ACF",cex=0.7)

pacf(resid_1,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="PACF",cex=0.7)

text(20,0.7,"data",cex=1.2)

acf(resid_2, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")

# differenced data

pacf(resid_2,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")

text(30,0.7,"ARIMA(0,1,0)")

acf(resid_3, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")

# seasonal difference of differenced data

pacf(resid_3,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")

text(30,0.7,"ARIMA(0,1,0)X(0,1,0)_{12}",cex=0.8)

fit4=arima(x,order=c(0,1,0),seasonal=list(order=c(0,1,1),

period=12),include.mean=F)

resid_4=fit4$resid

fit5=arima(x,order=c(0,1,1),seasonal=list(order=c(0,1,1),

period=12),include.mean=F)

resid_5=fit5$resid

acf(resid_4, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")

# ARIMA(0,1,0)*(0,1,1)_12

pacf(resid_4,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")

text(30,0.7,"ARIMA(0,1,0)X(0,1,1)_{12}",cex=0.8)


acf(resid_5, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")

# ARIMA(0,1,1)*(0,1,1)_12

pacf(resid_5,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")

text(30,0.7,"ARIMA(0,1,1)X(0,1,1)_{12}",cex=0.8)

dev.off()

postscript(file="c:\\teaching\\time series\\figs\\fig-2.7.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4)

ts.plot(x,type="l",lty=1,ylab="",xlab="")

text(250,375, "Births")

ts.plot(x_diff,type="l",lty=1,ylab="",xlab="",ylim=c(-50,50))

text(255,45, "First difference")

abline(0,0)

ts.plot(x_diff_12,type="l",lty=1,ylab="",xlab="",ylim=c(-50,50))

# time series plot of the seasonal difference (s=12) of differenced

text(225,40,"ARIMA(0,1,0)X(0,1,0)_{12}")

abline(0,0)

ts.plot(resid_5,type="l",lty=1,ylab="",xlab="",ylim=c(-50,50))

text(225,40, "ARIMA(0,1,1)X(0,1,1)_{12}")

abline(0,0)

dev.off()

###############################################################

###################################################################

# This is Example 2.10 in Chapter 2

#####################################

y<-matrix(scan("c:\\teaching\\time series\\data\\jj.dat"),byrow=T,ncol=1)


n=length(y)

y_log=log(y) # log of data

y_diff=diff(y_log) # first-order difference

y_diff_4=diff(y_diff,lag=4) # first-order seasonal difference

fit1=ar(y_log,order=1) # fit AR(1) model

#print(fit1)

library(tseries) # call library(tseries)

library(zoo)

fit1_test=adf.test(y_log)

# do Augmented Dickey-Fuller test for testing unit root

#print(fit1_test)

fit1=arima(y_log,order=c(0,0,0),seasonal=list(order=c(0,0,0)),

include.mean=F)

resid_21=fit1$resid

fit2=arima(y_log,order=c(0,1,0),seasonal=list(order=c(0,0,0)),

include.mean=F)

resid_22=fit2$resid # residual for ARIMA(0,1,0)*(0,0,0)

fit3=arima(y_log,order=c(0,1,0),seasonal=list(order=c(1,0,0),period=4),

include.mean=F,method=c("CSS"))

resid_23=fit3$resid # residual for ARIMA(0,1,0)*(1,0,0)_4

# note that this model is non-stationary so that "CSS" is used

postscript(file="c:\\teaching\\time series\\figs\\fig-2.9.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(4,2),mex=0.4)

acf(resid_21, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="ACF",cex=0.7)

text(16,0.8,"log(J&J)")

pacf(resid_21,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="PACF",cex=0.7)


acf(resid_22, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")

text(16,0.8,"First Difference")

pacf(resid_22,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")

acf(resid_23, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")

text(16,0.8,"ARIMA(0,1,0)X(1,0,0,)_4",cex=0.8)

pacf(resid_23,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")

fit4=arima(y_log,order=c(0,1,1),seasonal=list(order=c(1,0,0),

period=4),include.mean=F,method=c("CSS"))

resid_24=fit4$resid # residual for ARIMA(0,1,1)*(1,0,0)_4

# note that this model is non-stationary

#print(fit4)

fit4_test=Box.test(resid_24,lag=12, type=c("Ljung-Box"))

#print(fit4_test)

acf(resid_24, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")

text(16,0.8,"ARIMA(0,1,1)X(1,0,0,)_4",cex=0.8)

# ARIMA(0,1,1)*(1,0,0)_4

pacf(resid_24,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")

dev.off()

fit5=arima(y_log,order=c(0,1,1),seasonal=list(order=c(0,1,1),period=4),

include.mean=F,method=c("ML"))

resid_25=fit5$resid # residual for ARIMA(0,1,1)*(0,1,1)_4

#print(fit5)

fit5_test=Box.test(resid_25,lag=12, type=c("Ljung-Box"))

#print(fit5_test)

postscript(file="c:\\teaching\\time series\\figs\\fig-2.10.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4)

acf(resid_25, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="ACF")


text(16,0.8,"ARIMA(0,1,1)X(0,1,1,)_4",cex=0.8)

# ARIMA(0,1,1)*(0,1,1)_4

pacf(resid_25,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="PACF")

ts.plot(resid_24,type="l",lty=1,ylab="",xlab="")

title(main="Residual Plot",cex=0.5)

text(40,0.2,"ARIMA(0,1,1)X(1,0,0,)_4",cex=0.8)

abline(0,0)

ts.plot(resid_25,type="l",lty=1,ylab="",xlab="")

title(main="Residual Plot",cex=0.5)

text(40,0.18,"ARIMA(0,1,1)X(0,1,1,)_4",cex=0.8)

abline(0,0)

dev.off()

###################################################################

###################################################################

# This is Example 2.11 in Chapter 2

z<-matrix(scan("c:\\teaching\\time series\\data\\m-decile1510.txt"),byrow=T,ncol=4)

decile1=z[,2]

# Model 1: an ARIMA(1,0,0)*(1,0,1)_12

fit1=arima(decile1,order=c(1,0,0),seasonal=list(order=c(1,0,1),

period=12),include.mean=T)

#print(fit1)

e1=fit1$resid

n=length(decile1)

m=n/12

jan=rep(c(1,0,0,0,0,0,0,0,0,0,0,0),m)

feb=rep(c(0,1,0,0,0,0,0,0,0,0,0,0),m)


mar=rep(c(0,0,1,0,0,0,0,0,0,0,0,0),m)

apr=rep(c(0,0,0,1,0,0,0,0,0,0,0,0),m)

may=rep(c(0,0,0,0,1,0,0,0,0,0,0,0),m)

jun=rep(c(0,0,0,0,0,1,0,0,0,0,0,0),m)

jul=rep(c(0,0,0,0,0,0,1,0,0,0,0,0),m)

aug=rep(c(0,0,0,0,0,0,0,1,0,0,0,0),m)

sep=rep(c(0,0,0,0,0,0,0,0,1,0,0,0),m)

oct=rep(c(0,0,0,0,0,0,0,0,0,1,0,0),m)

nov=rep(c(0,0,0,0,0,0,0,0,0,0,1,0),m)

dec=rep(c(0,0,0,0,0,0,0,0,0,0,0,1),m)

de=cbind(decile1[jan==1],decile1[feb==1],decile1[mar==1],decile1[apr==1],

decile1[may==1],decile1[jun==1],decile1[jul==1],decile1[aug==1],

decile1[sep==1],decile1[oct==1],decile1[nov==1],decile1[dec==1])

# Model 2: a simple regression model without correlated errors

# to see the effect from January

fit2=lm(decile1~jan)

e2=fit2$resid

#print(summary(fit2))

# Model 3: a regression model with correlated errors

fit3=arima(decile1,xreg=jan,order=c(0,0,1),include.mean=T)

e3=fit3$resid

#print(fit3)

postscript(file="c:\\teaching\\time series\\figs\\fig-2.11.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4)

ts.plot(decile1,type="l",lty=1,col=1,ylab="",xlab="")

title(main="Simple Returns",cex=0.5)

abline(0,0)

ts.plot(e3,type="l",lty=1,col=1,ylab="",xlab="")


title(main="January-adjusted returns",cex=0.5)

abline(0,0)

acf(decile1, ylab="", xlab="",ylim=c(-0.5,1),lag=40,main="ACF")

acf(e3,ylab="",xlab="",ylim=c(-0.5,1),lag=40,main="ACF")

dev.off()

###################################################################

# This is Example 2.12 in Chapter 2

#####################################

z<-matrix(scan("c:\\teaching\\time series\\data\\jj.dat"),byrow=T,ncol=1)

n=length(z)

z_log=log(z) # log of data

# MODEL 1: y_t=beta_0+beta_1 t+ e_t

z1=1:n

fit1=lm(z_log~z1) # fit log(z) versus time

e1=fit1$resid

# Now, we need to re-fit the model using the transformed data

x1=5:n

y_1=z_log[5:n]

y_2=z_log[1:(n-4)]

y_fit=y_1-0.7614*y_2

x2=x1-0.7614*(x1-4)

x1=(1-0.7614)*rep(1,n-4)

fit2=lm(y_fit~-1+x1+x2)

e2=fit2$resid

postscript(file="c:\\teaching\\time series\\figs\\fig-2.12.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4)

acf(e1, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="ACF")

text(10,0.8,"detrended")


pacf(e1,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="PACF")

acf(e2, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")

text(15,0.8,"ARIMA(1,0,0,)_4")

pacf(e2,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")

dev.off()

#################################################################

##############################################################

# This is Example 2.13 in Chapter 2

#####################################

z<-matrix(scan("c:\\teaching\\time series\\data\\w-gs1n36299.txt"),

byrow=T,ncol=3)

# first column=one year Treasury constant maturity rate;

# second column=three year Treasury constant maturity rate;

# third column=date

x=z[,1]

y=z[,2]

n=length(x)

u=seq(1962+1/52,by=1/52,length=n)

x_diff=diff(x)

y_diff=diff(y)

# Fit a simple regression model and examine the residuals

fit1=lm(y~x) # Model 1

e1=fit1$resid

postscript(file="c:\\teaching\\time series\\figs\\fig-2.13.eps",

horizontal=F,width=6,height=6)

matplot(u,cbind(x,y),type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="")


dev.off()

postscript(file="c:\\teaching\\time series\\figs\\fig-2.14.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(1,2),mex=0.4)

plot(x,y,type="p",pch="o",ylab="",xlab="",cex=0.5)

plot(x_diff,y_diff,type="p",pch="o",ylab="",xlab="",cex=0.5)

dev.off()

postscript(file="c:\\teaching\\time series\\figs\\fig-2.15.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(1,2),mex=0.4)

plot(u,e1,type="l",lty=1,ylab="",xlab="")

abline(0,0)

acf(e1,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")

dev.off()

# Take differences and fit a simple regression again

fit2=lm(y_diff~x_diff) # Model 2

e2=fit2$resid

postscript(file="c:\\teaching\\time series\\figs\\fig-2.16.eps",

horizontal=F,width=6,height=6)

matplot(u[-1],cbind(x_diff,y_diff),type="l",lty=c(1,2),col=c(1,2),

ylab="",xlab="")

abline(0,0)

dev.off()

postscript(file="c:\\teaching\\time series\\figs\\fig-2.17.eps",

horizontal=F,width=6,height=6)


par(mfrow=c(2,2),mex=0.4)

ts.plot(e2,type="l",lty=1,ylab="",xlab="")

abline(0,0)

acf(e2, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")

# fit a model to the differenced data with an MA(1) error

fit3=arima(y_diff,xreg=x_diff, order=c(0,0,1)) # Model 3

e3=fit3$resid

ts.plot(e3,type="l",lty=1,ylab="",xlab="")

abline(0,0)

acf(e3, ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")

dev.off()

#################################################################

###############################################################

# This is Example 14 of using HC and HAC

##########################################

library(sandwich) # HC and HAC are in the package "sandwich"

library(zoo)

z<-matrix(scan("c:\\teaching\\time series\\data\\w-gs1n36299.txt"),

byrow=T,ncol=3)

x=z[,1]

y=z[,2]

x_diff=diff(x)

y_diff=diff(y)

# Fit a simple regression model and examine the residuals

fit1=lm(y_diff~x_diff)

print(summary(fit1))

e1=fit1$resid

# Heteroskedasticity-Consistent Covariance Matrix Estimation


#hc0=vcovHC(fit1,type="const")

#print(sqrt(diag(hc0)))

# type=c("const","HC","HC0","HC1","HC2","HC3","HC4")

# HC0 is the White estimator

hc1=vcovHC(fit1,type="HC0")

print(sqrt(diag(hc1)))

#Heteroskedasticity and autocorrelation consistent (HAC) estimation

#of the covariance matrix of the coefficient estimates in a

#(generalized) linear regression model.

hac1=vcovHAC(fit1,sandwich=T)

print(sqrt(diag(hac1)))

###################################################################

# This is the Example 2.15 in Chapter 2

#######################################

z1<-matrix(scan("c:\\teaching\\time series\\data\\d-ibmvwewsp6203.txt"),

byrow=T,ncol=5)

vw=abs(z1[,3])

n_vw=length(vw)

ew=abs(z1[,4])

postscript(file="c:\\teaching\\time series\\figs\\fig-2.18.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4,bg="light green")

acf(vw, ylab="",xlab="",ylim=c(-0.1,0.4),lag=400,main="")

text(200,0.38,"ACF for value-weighted index")

acf(ew, ylab="",xlab="",ylim=c(-0.1,0.4),lag=400,main="")

text(200,0.38,"ACF for equal-weighted index")

library(fracdiff)

d1=fracdiff(vw,ar=0,ma=0)


d2=fracdiff(ew,ar=0,ma=0)

print(c(d1$d,d2$d))

m1=round(log(n_vw)/log(2)+0.5)

pad1=1-n_vw/2^m1

vw_spec=spec.pgram(vw,spans=c(3,3,3),demean=T,detrend=T,pad=pad1)

ew_spec=spec.pgram(ew,spans=c(3,3,3),demean=T,detrend=T,pad=pad1)

vw_x=vw_spec$freq[1:1000]

vw_y=vw_spec$spec[1:1000]

ew_x=ew_spec$freq[1:1000]

ew_y=ew_spec$spec[1:1000]

scatter.smooth(vw_x,log(vw_y),span=1/15,ylab="",xlab="",col=6,cex=0.5)

text(0.04,-7,"Log Spectral Density of VW",cex=0.8)

scatter.smooth(ew_x,log(ew_y),span=1/15,ylab="",xlab="",col=7,cex=0.5)

text(0.04,-7,"Log Spectral Density of EW",cex=0.8)

dev.off()

###################################################################

# This is the Example 2.16 in Chapter 2

#######################################

phi=c(1.2,-0.35,1.0,-0.70,0.2, 0.35,-0.2,0.35)

dim(phi)=c(2,4)

phi=t(phi)

postscript(file="c:\\teaching\\time series\\figs\\fig-2.19.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4,bg="dark grey")

for(j in 1:4){

rho=rep(0,20)

rho[1]=1

rho[2]=phi[j,1]/(1-phi[j,2])


for(i in 3:20){rho[i]=phi[j,1]*rho[i-1]+phi[j,2]*rho[i-2]}

plot(1:20,rho,type="h",ylab="",ylim=c(-1,1),xlab="")

if(j==1){title(main="(a)",cex=0.8)}

if(j==2){title(main="(b)",cex=0.8)}

if(j==3){title(main="(c)",cex=0.8)}

if(j==4){title(main="(d)",cex=0.8)}

abline(0,0)

}

dev.off()

z1<-matrix(scan("c:\\teaching\\time series\\data\\q-gnp4791.txt"),

byrow=T,ncol=1)

n=length(z1)

x=1:n

x=x/4+1946.25

postscript(file="c:\\teaching\\time series\\figs\\fig-2.20.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(1,2),mex=0.4,bg="light pink")

plot(x,z1,type="o",ylab="",xlab="")

abline(0,0)

acf(z1,main="",ylab="",xlab="",lag=30)

dev.off()

################################################################

# This is for making graphs for IRF

################################################

n=100

w_t1=rnorm(n,0,1)

w_t2=w_t1

w_t2[50]=w_t1[50]+1


y1=rep(0,2*(n+1))

dim(y1)=c(n+1,2)

y1[1,]=c(5,5)

y2=y1

phi1=c(0.8,-0.8)

for(i in 2:(n+1)){

y1[i,1]=phi1[1]*y1[(i-1),1]+w_t1[i-1]

y1[i,2]=phi1[1]*y1[(i-1),2]+w_t2[i-1]

y2[i,1]=phi1[2]*y2[(i-1),1]+w_t1[i-1]

y2[i,2]=phi1[2]*y2[(i-1),2]+w_t2[i-1]

}

y1=y1[2:101,]

y2=y2[2:101,]

irf1=y1[,2]-y1[,1]

irf2=y2[,2]-y2[,1]

text1=c("No Impulse","With Impulse")

postscript(file="c:\\teaching\\time series\\figs\\fig-2.21.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4,bg="dark grey")

ts.plot(y1,type="l",lty=1,col=c(1,2),ylab="",xlab="")

abline(0,0)

legend(40,0.9*max(y1[,1]),text1,lty=c(1,2),col=c(1,2),cex=0.8)

text(40,0.8*min(y1[,1]),"phi=0.8")

ts.plot(y2,type="l",lty=1,col=c(1,2),ylab="",xlab="")

abline(0,0)

legend(40,0.9*max(y2[,1]),text1,lty=c(1,2),col=c(1,2),cex=0.8)

text(60,0.8*min(y2[,1]),"phi= - 0.8")

plot(1:n,irf1,type="l",ylab="",ylim=c(-1,1),xlab="")

abline(0,0)

text(40,-0.6,"Impulse Response Function",cex=0.8)


text(20,0.8,"phi=0.8")

plot(1:n,irf2,type="l",ylab="",ylim=c(-1,1),xlab="")

abline(0,0)

text(40,-0.6,"Impulse Response Function",cex=0.8)

text(20,0.8,"phi= - 0.8")

dev.off()

n=100

w_t1=rnorm(n,0,1)

w_t2=w_t1

w_t2[50]=w_t1[50]+1

y1=rep(0,2*(n+1))

dim(y1)=c(n+1,2)

y1[1,]=c(3,3)

y2=y1

phi1=c(1.01,-1.01)

for(i in 2:(n+1)){

y1[i,1]=phi1[1]*y1[(i-1),1]+w_t1[i-1]

y1[i,2]=phi1[1]*y1[(i-1),2]+w_t2[i-1]

y2[i,1]=phi1[2]*y2[(i-1),1]+w_t1[i-1]

y2[i,2]=phi1[2]*y2[(i-1),2]+w_t2[i-1]

}

y1=y1[2:101,]

y2=y2[2:101,]

irf1=y1[,2]-y1[,1]

irf2=y2[,2]-y2[,1]

text1=c("No Impulse","With Impulse")

postscript(file="c:\\teaching\\time series\\figs\\fig-2.22.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4,bg="light pink")


ts.plot(y1,type="l",lty=1,col=c(1,2),ylab="",xlab="")

abline(0,0)

legend(40,0.9*max(y1[,1]),text1,lty=c(1,2),col=c(1,2),cex=0.8)

text(40,0.8*min(y1[,1]),"phi=1.01")

ts.plot(y2,type="l",lty=1,col=c(1,2),ylab="",xlab="")

abline(0,0)

legend(40,0.9*max(y2[,1]),text1,lty=c(1,2),col=c(1,2),cex=0.8)

text(60,0.8*min(y2[,1]),"phi= - 1.01")

plot(1:n,irf1,type="l",ylab="",ylim=c(-2,2),xlab="")

abline(0,0)

text(40,-0.6,"Impulse Response Function",cex=0.8)

text(20,0.8,"phi=1.01")

plot(1:n,irf2,type="l",ylab="",ylim=c(-2,2),xlab="")

abline(0,0)

text(40,-0.6,"Impulse Response Function",cex=0.8)

text(20,0.8,"phi= - 1.01")

dev.off()

x=1:20

phi1=cbind(0.5^x,0.9^x,0.99^x)

phi2=cbind((-0.5)^x,(-0.9)^x,(-0.99)^x)

postscript(file="c:\\teaching\\time series\\figs\\fig-2.23.eps",

horizontal=F,width=6,height=6)

#win.graph()

par(mfrow=c(2,2),mex=0.4,bg="light green")

matplot(x,phi1,type="o",lty=1,pch="o",ylab="",xlab="")

text(4,0.3,"phi1=0.5",cex=0.8)

text(15,0.3,"phi1=0.9",cex=0.8)

text(15,0.9,"phi1=0.99",cex=0.8)

matplot(x,phi2,type="o",lty=1,pch="o",ylab="",xlab="")


abline(0,0)

text(4,0.3,"phi1= - 0.5",cex=0.8)

text(15,0.3,"phi1= - 0.9",cex=0.8)

text(15,0.9,"phi1= - 0.99",cex=0.8)

matplot(x,1.2^x,type="o",lty=1,pch="o",ylab="",xlab="")

text(14,22,"phi1=1.2",cex=0.8)

matplot(x,(-1.2)^x,type="o",lty=1,pch="o",ylab="",xlab="")

abline(0,0)

text(13,18,"phi1= - 1.2",cex=0.8)

dev.off()

n=100

w_t1=rnorm(n,0,1)

w_t2=w_t1

w_t3=w_t1

w_t2[50]=w_t1[50]+1

w_t3[50:n]=w_t1[50:n]+1

y=rep(0,3*(n+1))

dim(y)=c(n+1,3)

y[1,]=c(3,3,3)

phi1=0.8

for(i in 2:(n+1)){

y[i,1]=phi1*y[(i-1),1]+w_t1[i-1]

y[i,2]=phi1*y[(i-1),2]+w_t2[i-1]

y[i,3]=phi1*y[(i-1),3]+w_t3[i-1]}

y=y[2:101,1:3]

irf1=y[,2]-y[,1]

irf2=y[,3]-y[,1]

text1=c("No Impulse","With Impulse")


postscript(file="c:\\teaching\\time series\\figs\\fig-2.24.eps",

horizontal=F,width=6,height=6)

#win.graph()

par(mfrow=c(2,2),mex=0.4,bg="light blue")

ts.plot(y[,1:2],type="l",lty=1,col=c(1,2),ylab="",xlab="")

abline(0,0)

legend(40,0.9*max(y[,1]),text1,lty=c(1,2),col=c(1,2),cex=0.8)

text(40,0.8*min(y[,1]),"phi=0.8")

ts.plot(cbind(y[,1],y[,3]),type="l",lty=1,col=c(1,2),ylab="",xlab="")

abline(0,0)

legend(40,0.9*max(y[,3]),text1,lty=c(1,2),col=c(1,2),cex=0.8)

text(10,0.8*min(y[,3]),"phi=0.8")

plot(1:n,irf1,type="l",ylab="",xlab="")

abline(0,0)

text(40,0.6,"Impulse Response Function",cex=0.8)

text(20,0.8,"phi=0.8")

plot(1:n,irf2,type="l",ylab="",xlab="")

abline(0,0)

text(40,3,"Impulse Response Function",cex=0.8)

text(20,0.8,"phi=0.8")

dev.off()

ff=c(0.6,0.2,1,0,0.8,0.4,1,0,-0.9,-0.5,1,0,-0.5,-1.5,1,0)

dim(ff)=c(4,4)

mj=20

x=0:mj

irf=rep(0,(mj+1)*4)

dim(irf)=c(mj+1,4)

irf[1,]=1


for(j in 1:4){

aa=c(1,0,0,1)

dim(aa)=c(2,2)

ff1=ff[,j]

dim(ff1)=c(2,2)

ff1=t(ff1)

for(i in 1:mj){

aa=aa%*%ff1

irf[i+1,j]=aa[1,1]}}

postscript(file="c:\\teaching\\time series\\figs\\fig-2.25.eps",

horizontal=F,width=6,height=6)

#win.graph()

par(mfrow=c(2,2),mex=0.4,bg="light yellow")

plot(x,irf[,1],type="o",pch="o",ylab="",ylim=c(0,1),xlab="",

main="(a)",cex=0.8)

text(10,0.8,"Impulse Response Function",cex=0.8)

text(12,0.3,"phi1=0.6 and phi2=0.2",cex=0.8)

plot(x,irf[,2],type="o",pch="o",ylab="",ylim=c(0,12),xlab="",

main="(b)",cex=0.8)

text(10,10,"Impulse Response Function",cex=0.8)

text(9,6,"phi1=0.8 and phi2=0.4",cex=0.8)

plot(x,irf[,3],type="o",pch="o",ylab="",ylim=c(-1,1),xlab="",

main="(c)",cex=0.8)

abline(0,0)

text(10,0.8,"Impulse Response Function",cex=0.8)

text(11,-0.3,"phi1= - 0.9 and phi2= - 0.5",cex=0.8)

plot(x,irf[,4],type="o",pch="o",ylab="",ylim=c(-40,30),xlab="",

main="(d)",cex=0.8)

abline(0,0)


text(10,25,"Impulse Response Function",cex=0.8)

text(9,-16,"phi1= - 0.5 and phi2= - 1.5",cex=0.8)

dev.off()

###################################################################

3.16 References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proceedings of the 2nd International Symposium on Information Theory (V. Petrov and F. Csaki, eds.), 267-281. Akademiai Kiado, Budapest.

Andrews, D.W.K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59, 817-858.

Bai, Z., C.R. Rao and Y. Wu (1999). Model selection with data-oriented penalty. Journal of Statistical Planning and Inference, 77, 103-117.

Beran, J. (1994). Statistics for Long-Memory Processes. Chapman and Hall, London.

Box, G.E.P. and G.M. Jenkins (1970). Time Series Analysis, Forecasting, and Control. Holden Day, San Francisco.

Box, G.E.P., G.M. Jenkins and G.C. Reinsel (1994). Time Series Analysis, Forecasting and Control, 3rd Edn. Englewood Cliffs, NJ: Prentice-Hall.

Brockwell, P.J. and R.A. Davis (1991). Time Series: Theory and Methods. New York: Springer.

Burman, P. and R.H. Shumway (1998). Semiparametric modeling of seasonal time series. Journal of Time Series Analysis, 19, 127-145.

Burnham, K.P. and D. Anderson (2003). Model Selection and Multi-Model Inference: A Practical Information Theoretic Approach, 2nd edition. New York: Springer-Verlag.

Cai, Z. and R. Chen (2006). Flexible seasonal time series models. Advances in Econometrics, 20B, 63-87.

Chan, N.H. and C.Z. Wei (1988). Limiting distributions of least squares estimates of unstable autoregressive processes. Annals of Statistics, 16, 367-401.

Cochrane, D. and G.H. Orcutt (1949). Applications of least squares regression to relationships containing autocorrelated errors. Journal of the American Statistical Association, 44, 32-61.


Ding, Z., C.W.J. Granger and R.F. Engle (1993). A long memory property of stock returns and a new model. Journal of Empirical Finance, 1, 83-106.

Eicker, F. (1967). Limit theorems for regression with unequal and dependent errors. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (L. LeCam and J. Neyman, eds.), University of California Press, Berkeley.

Fan, J. and R. Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.

Fan, J. and H. Peng (2004). Nonconcave penalized likelihood with a diverging number of parameters. Annals of Statistics, 32, 928-961.

Frank, I.E. and J.H. Friedman (1993). A statistical view of some chemometric regression tools (with discussion). Technometrics, 35, 109-148.

Franses, P.H. (1996). Periodicity and Stochastic Trends in Economic Time Series. New York: Cambridge University Press.

Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. New York: Cambridge University Press.

Franses, P.H. and D. van Dijk (2000). Nonlinear Time Series Models for Empirical Finance. New York: Cambridge University Press.

Ghysels, E. and D.R. Osborn (2001). The Econometric Analysis of Seasonal Time Series. New York: Cambridge University Press.

Hurvich, C.M. and C.-L. Tsai (1989). Regression and time series model selection in small samples. Biometrika, 76, 297-307.

Newey, W.K. and K.D. West (1987). A simple, positive-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55, 703-708.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464.

Shao, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association, 88, 486-494.

Shen, X.T. and J.M. Ye (2002). Adaptive model selection. Journal of the American Statistical Association, 97, 210-221.

Shumway, R.H. (1988). Applied Statistical Time Series Analysis. Englewood Cliffs, NJ: Prentice-Hall.

Shumway, R.H., A.S. Azari and Y. Pawitan (1988). Modeling mortality fluctuations in Los Angeles as functions of pollution and weather effects. Environmental Research, 45, 224-241.


Shumway, R.H. and D.S. Stoffer (2000). Time Series Analysis and Its Applications. New York: Springer-Verlag.

Tiao, G.C. and R.S. Tsay (1983). Consistency properties of least squares estimates of autoregressive parameters in ARMA models. Annals of Statistics, 11, 856-871.

Tibshirani, R.J. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58, 267-288.

Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. John Wiley & Sons, New York.

Walker, G. (1931). On the periodicity in series of related terms. Proceedings of the Royal Society of London, Series A, 131, 518-532.

Yule, G.U. (1927). On a method of investigating periodicities in disturbed series with special reference to Wolfer's sunspot numbers. Philosophical Transactions of the Royal Society of London, Series A, 226, 267-298.


Chapter 4

Non-stationary Processes and Structural Breaks

4.1 Introduction

In our analysis so far, we have assumed that the variables in the models that we have analyzed, univariate ARMA models, are stationary:

y_t = µ + ψ(B) e_t,

where y_t may be a vector of n variables at time period t. A glance at graphs of most economic time series suffices to reveal the invalidity of that assumption: economies evolve, grow, and change over time in both real and nominal terms, and economic forecasts are often very wrong, although such large errors should occur relatively infrequently in a stationary process.

The practical problem that an econometrician faces is to find relationships that survive for a relatively long period of time so that they can be used for forecasting and policy analysis. Hendry and Juselius (2000) pointed out that four issues immediately arise with nonstationarity:

1. How important is the assumption of stationarity for modeling and inference? It is very important. When data means and variances are non-constant, observations come from different distributions over time, creating difficult problems for empirical modeling.

2. What is the effect of incorrectly assuming it? It is potentially hazardous. Assuming constant means and variances when that is false can induce serious statistical mistakes. If the variables in y_t are not stationary, then conventional hypothesis tests, confidence intervals and forecasts can be unreliable. Standard asymptotic distribution theory often does not apply to regressions involving variables with unit roots, and inference may be misleading if this is ignored.

3. What are the sources of non-stationarity? There are many, and they are varied. Non-stationarity may be due to the evolution of the economy, legislative changes, technological changes, political events, etc.

4. Can empirical analysis be transformed so that stationarity becomes a valid assumption? It is sometimes possible, depending on the source of non-stationarity. Some non-stationarity can be eliminated by transformations such as de-trending and differencing, or some other type of transformation; see Park and Phillips (1999) and Chang, Park and Phillips (2001) for more discussion.

Trending Time Series

A non-stationary process is one which violates the stationarity requirement, so its means and variances are non-constant over time. A trend is a persistent long-term movement of a variable over time, and a time series fluctuates around its trend. The common features of a trending time series can be summarized in the following situations:

1. Stochastic trends. A stochastic trend is random and varies over time:

(1 − L) y_t = δ + ψ(L) e_t.

That is, after taking the difference, the time series is modelled as a linear process.

2. Deterministic trends. A deterministic trend is a non-random function of time. For example, a deterministic trend might be linear, quadratic, or of higher order in time:

y_t = a + δ t + ψ(L) e_t.

The difference between a linear stochastic trend and a deterministic trend is that the changes of a stochastic trend are random, whereas those of a deterministic trend are constant over time.

3. Permanence of shocks. Macroeconomists used to de-trend data and regard business cycles as the stationary deviations from that trend. Economists investigated whether GNP is better described as a random walk or a trend stationary process.

4. Statistical issues. We could mistake a time series with unit roots for a trend stationary time series, since a time series with unit roots might display a trending phenomenon.

The trending time series models have gained a lot of attention during the last two decades due to many applications in economics and finance. See Cai (2006) for more references. Following are some examples. The market model in finance is an example that relates the return of an individual stock to the return of a market index or another individual stock, and the coefficient is usually called a beta-coefficient in the capital asset pricing model (CAPM); see the books by Cochrane (2001) and Tsay (2005) for more details. However, some recent studies show that the beta-coefficients might vary over time. The term structure of interest rates is another example, in which the time evolution of the relationship between interest rates with different maturities is investigated; see Tsay (2005). The last example is the relationship between the prototype electricity demand and other variables such as the income or production, the real price of electricity, and the temperature; Chang and Martinez-Chombo (2003) found that this relationship may change over time. Although the literature is already vast and continues to grow swiftly, as pointed out by Phillips (2001), the research in this area is just beginning.

4.2 Random Walks

The basic random walk is

y_t = y_{t−1} + e_t

with E_{t−1}(e_t) = 0, where E_{t−1}(·) is the conditional expectation given the past information up to time t − 1, which implies that E_t(y_{t+1}) = y_t. Random walks have a number of interesting properties:

1. The impulse-response function (IRF) of a random walk is one at all horizons. The IRF of a stationary process dies out eventually.


2. The forecast variance of the random walk grows linearly with the forecast horizon: Var(y_{t+k} | y_t) = Var(y_{t+k} − y_t) = k σ_e^2 (see the small simulation after this list).

3. The autocovariances of a random walk are defined in Section 3.4.
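A tiny simulation, given only as a sketch with arbitrary settings, illustrates the second property.

# Check that Var(y_{t+k} - y_t) grows roughly like k*sigma_e^2 for a random walk
set.seed(123)                                       # arbitrary seed for reproducibility
k <- 10; sigma_e <- 1
incr <- replicate(2000, sum(rnorm(k, 0, sigma_e)))  # y_{t+k} - y_t = e_{t+1} + ... + e_{t+k}
print(var(incr))                                    # close to k * sigma_e^2 = 10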

Statistical Issues

Suppose a series is generated by a random walk:

y_t = y_{t−1} + e_t.

You might test for a random walk by running

y_t = µ + φ y_{t−1} + e_t

by ordinary least squares (OLS) and testing whether φ = 1. However, this is not correct, since the assumptions underlying the usual asymptotic theory for OLS estimates and test statistics are violated.

Exercise: Why does the usual asymptotic theory for OLS estimates not apply here?

4.2.1 Inappropriate Detrending

Things get even more complicated with a trend in the model. Suppose the true model is

y_t = µ + y_{t−1} + e_t.

Suppose you detrend the model and then fit an AR(1) model, i.e., the fitted model is:

(1 − φL)(y_t − b t) = e_t.


But the above model can be written as follows:

y_t = α + δ t + φ y_{t−1} + e_t,

so you could also regress y directly on a time trend and lagged y. In this case, φ̂ is biased downward (it under-estimates φ) and the standard OLS errors are misleading.
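The downward bias can be seen in a small simulation, sketched below with an arbitrarily chosen drift and sample size.

# Regress a random walk with drift on a time trend and its own lag:
# the estimated coefficient on the lagged level tends to fall below 1.
set.seed(1)
n <- 200; mu <- 0.1
y <- cumsum(mu + rnorm(n))                 # y_t = mu + y_{t-1} + e_t, y_0 = 0
fit <- lm(y[2:n] ~ I(2:n) + y[1:(n - 1)])  # regression on trend and lagged y
print(coef(fit))                           # coefficient on y[1:(n-1)] typically below 1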

4.2.2 Spurious (nonsense) Regressions

Suppose two series are generated by random walks:

y_t = y_{t−1} + e_t,   x_t = x_{t−1} + v_t,   E(e_t v_s) = 0 for all t, s.

Now, suppose you run y_t on x_t by OLS:

y_t = α + β x_t + u_t.

The assumptions for classical regression are violated, and we tend to see a "significant" β far more often than the OLS formulas say we should.

Exercise: Please conduct a Monte Carlo simulation to verify the above conclusion; a sketch of one such experiment follows.
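The sketch below (sample size and number of replications chosen arbitrarily) records how often the conventional t-test on β rejects at the nominal 5% level.

# Spurious regression: two independent random walks, yet the t-statistic on
# beta rejects the null far more often than the nominal 5% level.
set.seed(42)
nrep <- 500; n <- 200
reject <- logical(nrep)
for (r in 1:nrep) {
  y <- cumsum(rnorm(n))
  x <- cumsum(rnorm(n))
  tstat <- summary(lm(y ~ x))$coefficients[2, 3]   # t-ratio of beta-hat
  reject[r] <- abs(tstat) > 1.96
}
print(mean(reject))    # rejection frequency, typically far above 0.05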

4.3 Unit Root and Stationary Processes

A more general process than the pure random walk may have the following form:

(1 − L) y_t = µ + ψ_1(L) e_t.

These are called unit root or difference stationary (DS) processes. In the simplest case ψ_1(L) = 1, the DS process becomes a random walk with drift:

y_t = µ + y_{t−1} + e_t.


Alternatively, we may consider a process to be stationary around a linear trend:

y_t = µ t + ψ_2(L) e_t.

This process is called a trend stationary (TS) process. The TS process can be considered as a special case of the DS model. Indeed, we can write the TS model as:

(1 − L) y_t = (1 − L) µ t + (1 − L) ψ_2(L) e_t = µ + ψ_1(L) e_t.

Therefore, if the TS model is correct, the DS model is still valid and stationary. For more studies on TS models, we refer the reader to the papers by Phillips (2001) and Cai (2006).

One can think about unit roots as the study of the implications for the levels of a process that is stationary in differences. Therefore, it is very important to keep track of whether you are thinking about the level of the process y_t or its first difference.

Let us examine the impulse response function (IRF) for the above two models. For the TS model, the IRF is determined by the MA polynomial ψ_2(L), i.e., b_j is the j-th period ahead response. For the DS model, a_j gives the response of the difference (1 − L) y_{t+j} to a shock at time t. The response of the level of the series y_{t+j} is the sum of the responses of the differences. The response of y_{t+j} to a shock at period t is:

IRF_j = (y_t − y_{t−1}) + (y_{t+1} − y_t) + · · · + (y_{t+j} − y_{t+j−1}) = a_0 + a_1 + · · · + a_j.

4.3.1 Comparison of Forecasts of TS and DS Processes

The forecast of a trend-stationary process is as follows:

ŷ^t_{t+s} = µ(t + s) + b_s e_t + b_{s+1} e_{t−1} + b_{s+2} e_{t−2} + · · · .


The forecast of a difference stationary process can be written as follows:

ŷ^t_{t+s} = Δŷ^t_{t+s} + Δŷ^t_{t+s−1} + · · · + Δŷ^t_{t+1} + y_t
          = (µ + b_s e_t + b_{s+1} e_{t−1} + b_{s+2} e_{t−2} + · · ·)
            + (µ + b_{s−1} e_t + b_s e_{t−1} + b_{s+1} e_{t−2} + · · ·) + · · ·
            + (µ + b_1 e_t + b_2 e_{t−1} + b_3 e_{t−2} + · · ·) + y_t,

or

ŷ^t_{t+s} = µ s + y_t + (b_s + b_{s−1} + · · · + b_1) e_t + (b_{s+1} + b_s + · · · + b_2) e_{t−1} + · · · .

To see the difference between forecasts for TS and DS processes, we consider a case in which b_1 = b_2 = · · · = 0. Then,

TS :  ŷ^t_{t+s} = µ(t + s),
DS :  ŷ^t_{t+s} = µ s + y_t.

Next, compare the forecast errors for the TS and DS processes. For the TS process:

y_{t+s} − ŷ^t_{t+s} = (µ(t + s) + e_{t+s} + b_1 e_{t+s−1} + · · · + b_{s−1} e_{t+1} + b_s e_t + · · ·)
                      − (µ(t + s) + b_s e_t + b_{s+1} e_{t−1} + · · ·)
                    = e_{t+s} + b_1 e_{t+s−1} + · · · + b_{s−1} e_{t+1}.

The MSE of this forecast is

E[y_{t+s} − ŷ^t_{t+s}]^2 = (1 + b_1^2 + b_2^2 + · · · + b_{s−1}^2) σ^2.

For the DS process:

y_{t+s} − ŷ^t_{t+s} = (Δy_{t+s} + · · · + Δy_{t+1} + y_t) − (Δŷ^t_{t+s} + · · · + Δŷ^t_{t+1} + y_t)
                    = e_{t+s} + (1 + b_1) e_{t+s−1} + (1 + b_1 + b_2) e_{t+s−2} + · · · + (1 + b_1 + b_2 + · · · + b_{s−1}) e_{t+1}.

The MSE of this forecast is

E[y_{t+s} − ŷ^t_{t+s}]^2 = [1 + (1 + b_1)^2 + (1 + b_1 + b_2)^2 + · · · + (1 + b_1 + b_2 + · · · + b_{s−1})^2] σ^2.


Note that the MSE for a TS process increases with the forecast horizon, but as s becomes large the added uncertainty from forecasting further into the future becomes negligible:

lim_{s→∞} E[y_{t+s} − ŷ^t_{t+s}]^2 = (1 + b_1^2 + b_2^2 + · · ·) σ^2,

and the limiting MSE is just the unconditional variance of the stationary component ψ_2(L) e_t. This is not true for the DS process: the MSE for a DS process does not converge to any fixed value as s goes to infinity. To summarize, for a TS process the MSE reaches a finite bound as the forecast horizon becomes large, whereas for a unit root process the MSE eventually grows linearly with the forecast horizon.

4.3.2 Random Walk Components and Stochastic Trends

It is well known that any DS process can be written as a sum of a random walk and a stationary component. A decomposition with a nice property is the Beveridge-Nelson (1981, BN) decomposition. If (1 − L) y_t = µ + ψ_1(L) e_t, then we can write y_t = c_t + z_t, where z_t = µ + z_{t−1} + ψ_1(1) e_t and c_t = ψ_1^*(L) e_t with a_j^* = −Σ_{k>j} a_k. To see why this decomposition is true, we need to notice that any lag polynomial ψ_1(L) can be written as ψ_1(L) = ψ_1(1) + (1 − L) ψ_1^*(L), where a_j^* = −Σ_{k>j} a_k. To see this, just write it out:

ψ_1(1) :            a_0 + a_1 + a_2 + a_3 + · · ·
(1 − L) ψ_1^*(L) :  −(a_1 + a_2 + a_3 + · · ·)
                    + (a_1 + a_2 + a_3 + · · ·) L − (a_2 + a_3 + · · ·) L
                    + (a_2 + a_3 + · · ·) L^2 − (a_3 + a_4 + · · ·) L^2 + · · ·

and the terms of ψ_1(L) remain when you cancel all the terms. There are many ways to decompose a unit root process into stationary and random walk components. The BN decomposition is a popular choice because


it has a special property: the random walk component is a sensible definition of the "trend" in y_t. The component z_t is the limiting forecast of future y, i.e., today's y plus all future expected changes in y. In the BN decomposition the innovations to the stationary and random walk components are perfectly correlated. Consider an arbitrary combination of stationary and random walk components: y_t = z_t + c_t, where z_t = µ + z_{t−1} + v_t and c_t = ψ_2(L) e_t. It can be shown that in every decomposition of y_t into stationary and random walk components, the variance of changes to the random walk component is the same, ψ_1(1)^2 σ_e^2. Since the unit root process is composed of a stationary plus a random walk component, the unit root process has the same forecast variance behavior as the random walk when the horizon is long enough.

4.4 Trend Estimation and Forecasting

4.4.1 Forecasting a Deterministic Trend

Consider the linear deterministic trend model:

y_t = α + β t + e_t,   t = 1, 2, . . . , T.

The h-step-ahead forecast is given by ŷ^t_{t+h} = α̂ + β̂ (t + h), where α̂ and β̂ are the OLS estimates of the parameters α and β. The forecast variance may be computed using the following formula

E[(y_{t+h} − ŷ^t_{t+h})^2] = σ^2 [ 1 + 1/t + (t + h − (t + 1)/2)^2 / Σ_{m=1}^t (m − (t + 1)/2)^2 ] ≈ σ^2,

where the last approximation is valid if t (the period at which the forecast is constructed) is large relative to the forecast horizon h.
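A sketch of this forecast in R: the series y below is simulated only so that the code is self-contained, and predict() returns prediction intervals corresponding to the variance formula above.

# Fit y_t = alpha + beta*t + e_t by OLS and forecast h = 1,...,8 steps ahead
set.seed(7)
y <- 0.05 * (1:100) + rnorm(100)    # artificial series with a linear trend
t_idx <- seq_along(y)
fit <- lm(y ~ t_idx)
newd <- data.frame(t_idx = length(y) + 1:8)
print(predict(fit, newdata = newd, interval = "prediction"))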


4.4.2 Forecasting a Stochastic Trend

Consider the random walk with drift

y_t = α + y_{t−1} + e_t,   t = 2, 3, . . . , T.

Let α̂ be an estimate of α obtained from the following regression model:

Δy_t = α + e_t.

The h-step-ahead forecast is given by ŷ^t_{t+h} = y_t + α̂ h. The forecast variance may be computed using the following formula

E[(y_{t+h} − ŷ^t_{t+h})^2] = σ^2 [ h + h^2/(t − 1) ] ≈ h σ^2,

where the last approximation is valid if t is large relative to h.
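A corresponding sketch for the stochastic trend: the drift is estimated as the sample mean of the first differences, matching the regression Δy_t = α + e_t above.

# Forecast of a random walk with drift: y-hat_{t+h} = y_t + alpha_hat*h,
# with approximate forecast standard error sqrt(h)*sigma.
set.seed(8)
y <- cumsum(0.2 + rnorm(150))       # artificial random walk with drift 0.2
alpha_hat <- mean(diff(y))
sigma2 <- var(diff(y))
h <- 1:8
print(cbind(h, forecast = y[length(y)] + alpha_hat * h, se = sqrt(h * sigma2)))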

4.4.3 Forecasting ARMA models with Deterministic Trends

The basic models for deterministic and stochastic trends ignore possible short-run fluctuations in the series. Consider the following ARMA model with a deterministic trend:

φ(L) y_t = α + β t + θ(L) e_t,

where the polynomial φ(L) satisfies the stationarity condition and θ(L) satisfies the invertibility condition.

The forecast is constructed as follows:

1. Linear detrending. Estimate the regression model

y_t = δ_1 + δ_2 t + z_t,

and compute ẑ_t = y_t − δ̂_1 − δ̂_2 t.


2. Estimate an appropriate ARMA(p, q) model for the covariance stationary variable z_t: φ(L) z_t = θ(L) e_t. The estimated ARMA(p, q) model may be used to construct h-period-ahead forecasts ẑ^t_{t+h} of z_t.

3. Construct the h-period-ahead forecast of y_t as ŷ^t_{t+h} = ẑ^t_{t+h} + δ̂_1 + δ̂_2 (t + h).

The MSE of ytt+h may be approximated by the MSE of zt

t+h.
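The three steps might be carried out in R as in the following sketch; the series y, the ARMA orders and the horizon are hypothetical choices:

t.index=1:length(y)
detrend=lm(y~t.index)                               # step 1: linear detrending
z=residuals(detrend)
fit.z=arima(z,order=c(2,0,1),include.mean=FALSE)    # step 2: ARMA(2,1) for z_t, say
h=8
fc.z=predict(fit.z,n.ahead=h)$pred                  # h-period-ahead forecasts of z_t
trend=coef(detrend)[1]+coef(detrend)[2]*(length(y)+1:h)
fc.y=fc.z+trend                                     # step 3: add the estimated trend back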

4.4.4 Forecasting of ARIMA Models

Consider forecasting a time series that is integrated of order 1 and is described by the ARIMA(p, 1, q) model:

φ(L)(1 − L)y_t = α + θ(L)e_t,

where the polynomial φ(L) satisfies the stationarity condition and θ(L) satisfies the invertibility condition. The forecast is constructed as follows:

1. Compute the first difference of y_t, i.e. z_t = Δy_t.

2. Estimate an appropriate ARMA(p, q) model for the covariance-stationary variable z_t: φ(L)z_t = θ(L)e_t. The estimated ARMA(p, q) model may be used to construct the h-period-ahead forecasts of z_t, namely z^t_{t+1}, . . . , z^t_{t+h}.

3. Construct the h-period-ahead forecast of y_t by accumulating the forecast differences:

y^t_{t+h} = y_t + z^t_{t+1} + z^t_{t+2} + · · · + z^t_{t+h}.
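In R the differencing and the accumulation of the forecast differences are handled automatically by arima() with d = 1; a sketch, with y and the orders as hypothetical inputs:

fit=arima(y,order=c(1,1,1))    # ARIMA(1,1,1); the AR and MA orders are hypothetical choices
fc=predict(fit,n.ahead=8)      # 8-period-ahead forecasts of the level of y
fc$pred                        # point forecasts
fc$se                          # forecast standard errors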


4.5 Unit Root Tests

Although it might be interesting to know whether a time series has a unit root, several papers have argued that the question cannot be answered on the basis of a finite sample of observations. Nevertheless, you will have to conduct unit root tests in doing empirical projects. This can be done using informal or formal methods. The informal methods involve inspecting a time series plot of the data and computing the autocorrelation coefficients, as we did in Chapters 1 and 2. If a series has a stochastic trend, the first autocorrelation coefficient will be near one. A small first autocorrelation coefficient combined with a time series plot that has no apparent trend suggests that the series does not have a trend. The Dickey-Fuller (1979, DF) test is the most popular formal statistical procedure for unit root testing.

4.5.1 The Dickey-Fuller and Augmented Dickey-Fuller Tests

The starting point for the DF test is the autoregressive model of order one, AR(1):

y_t = α + ρ y_{t−1} + e_t.    (4.1)

If ρ = 1, y_t is nonstationary and contains a stochastic trend. Therefore, within the AR(1) model, the hypothesis that y_t has a trend can be tested by testing:

H_0 : ρ = 1  vs.  H_1 : ρ < 1.

This test is most easily implemented by estimating a modified version of (4.1). Subtract y_{t−1} from both sides and let δ = ρ − 1. Then, model (4.1) becomes:

Δy_t = α + δ y_{t−1} + e_t.    (4.2)


Table 4.1: Large-sample critical values for the ADF statistic

Deterministic regressors       10%     5%      1%
Intercept only                -2.57   -2.86   -3.43
Intercept and time trend      -3.12   -3.41   -3.96

and the hypotheses to be tested are:

H_0 : δ = 0  vs.  H_1 : δ < 0.

The OLS t-statistic in (4.2) testing δ = 0 is known as the Dickey-Fuller statistic.

The extension of the DF test to the AR(p) model is a test of the null hypothesis H_0 : δ = 0 against the one-sided alternative H_1 : δ < 0 in the following regression:

Δy_t = α + δ y_{t−1} + γ_1 Δy_{t−1} + · · · + γ_p Δy_{t−p} + e_t.    (4.3)

Under the null hypothesis, y_t has a stochastic trend and under the alternative hypothesis, y_t is stationary. If instead the alternative hypothesis is that y_t is stationary around a deterministic linear time trend, then this trend must be added as an additional regressor in model (4.3) and the DF regression becomes

Δy_t = α + β t + δ y_{t−1} + γ_1 Δy_{t−1} + · · · + γ_p Δy_{t−p} + e_t.    (4.4)

This is called the augmented Dickey-Fuller (ADF) test and the test statistic is the OLS t-statistic testing that δ = 0 in equation (4.4).
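Both versions of the ADF regression can be run with ur.df() from the urca package used later in this chapter; a sketch, with the series y and the lag order as hypothetical inputs:

library(urca)
summary(ur.df(y,type="drift",lags=4))   # regression (4.3): intercept only
summary(ur.df(y,type="trend",lags=4))   # regression (4.4): intercept and time trend
# compare the reported tau statistic with the critical values in Table 4.1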

The ADF statistic does not have a normal distribution, even in large samples. Critical values for the one-sided ADF test depend on whether the test is based on equation (4.3) or (4.4) and are given in Table 4.1. Table 17.1 of Hamilton (1994, p. 502) presents a summary of DF tests for unit roots in the absence of serial correlation for testing the null hypothesis of a unit root against different alternative hypotheses; it is reproduced as Table 4.2.


Table 4.2: Summary of DF tests for unit roots in the absence of serial correlation

Case 1:
True process: y_t = y_{t−1} + u_t, u_t ~ N(0, σ²) iid.
Estimated regression: y_t = ρ y_{t−1} + u_t.
T(ρ̂ − 1) has the distribution described under Case 1 in Table B.5.
(ρ̂ − 1)/σ̂_ρ̂ has the distribution described under Case 1 in Table B.6.

Case 2:
True process: y_t = y_{t−1} + u_t, u_t ~ N(0, σ²) iid.
Estimated regression: y_t = α + ρ y_{t−1} + u_t.
T(ρ̂ − 1) has the distribution described under Case 2 in Table B.5.
(ρ̂ − 1)/σ̂_ρ̂ has the distribution described under Case 2 in Table B.6.
OLS F-test of the joint hypothesis that α = 0 and ρ = 1 has the distribution described under Case 2 in Table B.7.

Case 3:
True process: y_t = α + y_{t−1} + u_t, α ≠ 0, u_t ~ N(0, σ²) iid.
Estimated regression: y_t = α + ρ y_{t−1} + u_t.
(ρ̂ − 1)/σ̂_ρ̂ → N(0, 1).

Case 4:
True process: y_t = α + y_{t−1} + u_t, α ≠ 0, u_t ~ N(0, σ²) iid.
Estimated regression: y_t = α + ρ y_{t−1} + β t + u_t.
T(ρ̂ − 1) has the distribution described under Case 4 in Table B.5.
(ρ̂ − 1)/σ̂_ρ̂ has the distribution described under Case 4 in Table B.6.
OLS F-test of the joint hypothesis that ρ = 1 and β = 0 has the distribution described under Case 4 in Table B.7.

It is very important for you to understand what your alternative hypothesis is in conducting unit root tests. I reproduce this table here, but you need to check Hamilton's (1994) book for the critical values of the DF statistic in the different cases. The critical values are presented in the Appendix of that book.

In the above models (the four cases), the basic assumption is that u_t is iid. This assumption is violated if u_t is serially correlated and potentially heteroskedastic. To take account of serial correlation and potential heteroskedasticity, one way is to use the PP test proposed by Phillips and Perron (1988). For other tests for unit roots, please read the book by Hamilton (1994, p. 532). Some recent testing methods have also been proposed. For example, Juhl (2005) used the functional coefficient type model of Cai, Fan and Yao (2000) to test for a unit root and Phillips and Park (2005) employed nonparametric regression. Finally, notice that in R there are at least three packages that provide unit root tests: tseries, urca and uroot.

4.5.2 Cautions

The most reliable way to handle a trend in a series is to transform the series so that it does not have the trend. If the series has a stochastic trend (unit root), then the first difference of the series does not have a trend. In practice, you can rarely be sure whether a series has a stochastic trend or not. Recall that a failure to reject the null hypothesis does not necessarily mean that the null hypothesis is true; it simply means that there is not enough evidence to conclude that it is false. Therefore, failure to reject the null hypothesis of a unit root using the ADF test does not mean that the series actually has a unit root. Having said that, even though failure to reject the null hypothesis of a unit root does not mean the series has a unit root, it can still be reasonable to approximate the true autoregressive root as equaling one and to use the first difference of the series rather than its levels.

4.6 Structural Breaks

Another type of nonstationarity arises when the population regression function changes over the sample period. This may occur because of changes in economic policy, changes in the structure of the economy or industry, events that change the dynamics of specific industries or firm-related quantities (inventories, sales, production), etc. If such changes, called breaks, occur, then regression models that neglect those changes lead to misleading inference or forecasts.

Breaks may result from a discrete change (or changes) in the population regression coefficients at distinct dates or from a gradual evolution of the coefficients over a longer period of time. Discrete breaks may be a result of some major changes in economic policy or in the economy (oil shocks), while "gradual" breaks, in which the population parameters evolve slowly over time, may be a result of a slow evolution of economic policy.

If a break occurs in the population parameters during the sample, then the OLS regression estimates over the full sample will estimate a relationship that holds on "average".

4.6.1 Testing for Breaks

Tests for breaks in the regression parameters depend on whether the break date is known or not. If the date of the hypothesized break in the coefficients is known, then the null hypothesis of no break can be tested using a dummy variable.

Consider the following model:

$$
y_t = \beta_0 + \beta_1 y_{t-1} + \delta_1 x_{t-1} + \gamma_0 D_t(\tau) + \gamma_1 D_t(\tau)\,y_{t-1} + \gamma_2 D_t(\tau)\,x_{t-1} + u_t
= \begin{cases}
\beta_0 + \beta_1 y_{t-1} + \delta_1 x_{t-1} + u_t, & t \le \tau,\\
(\beta_0+\gamma_0) + (\beta_1+\gamma_1)y_{t-1} + (\delta_1+\gamma_2)x_{t-1} + u_t, & t > \tau,
\end{cases}
$$

where τ denotes the hypothesized break date, D_t(τ) is a binary variable that equals zero before the break date and one after, i.e. D_t(τ) = 0 if t ≤ τ and D_t(τ) = 1 if t > τ. Under the null hypothesis of no break, γ_0 = γ_1 = γ_2 = 0, and the hypothesis of a break


can be tested using the F-statistic. This is called a Chow test for a break at a known break date. Indeed, the above structural break model can be regarded as a special case of the following trending time series model

y_t = β_0(t) + β_1(t) y_{t−1} + δ_1(t) x_{t−1} + u_t.

For more discussion, see Cai (2006).

If there are more variables or more lags, this test can be extended by constructing binary interaction variables for all the regressors. This approach can be modified to check for a break in a subset of the coefficients. The break date is unknown in most applications, but you may suspect that a break occurred sometime between two dates, τ_0 and τ_1. The Chow test can be modified to handle this by testing for a break at all possible dates τ between τ_0 and τ_1, and then using the largest of the resulting F-statistics to test for a break at an unknown date. This modified test is often called the Quandt likelihood ratio (QLR) statistic or the sup-Wald statistic:

QLR = max{F(τ_0), F(τ_0 + 1), · · · , F(τ_1)}.

Since the QLR statistic is the largest of many F-statistics, its distribution is not the same as that of an individual F-statistic. The critical values for the QLR statistic must be obtained from a special distribution. This distribution depends on the number of restrictions being tested, m, on τ_0 and τ_1, and on the subsample over which the F-statistics are computed, expressed as a fraction of the total sample size.

For the large-sample approximation to the distribution of the QLR statistic to be a good one, the subsample endpoints, τ_0 and τ_1, cannot be too close to the ends of the sample. That is why the QLR statistic is computed over a "trimmed" subset of the sample.


Table 4.3: Critical Values of the QLR statistic with 15% Trimming

Number of restrictions (m)     10%      5%      1%
 1                             7.12     8.68    12.16
 2                             5.00     5.86     7.78
 3                             4.09     4.71     6.02
 4                             3.59     4.09     5.12
 5                             3.26     3.66     4.53
 6                             3.02     3.37     4.12
 7                             2.84     3.15     3.82
 8                             2.69     2.98     3.57
 9                             2.58     2.84     3.38
10                             2.48     2.71     3.23

A popular choice is to use 15% trimming, that is, to set τ_0 = 0.15T and τ_1 = 0.85T. With 15% trimming, the F-statistic is computed for break dates in the central 70% of the sample. Table 4.3 presents the critical values for the QLR statistic computed with 15% trimming. This table is from Stock and Watson (2003), and you should check that book for the complete table. The QLR test can detect a single break, multiple discrete breaks, and a slow evolution of the regression parameters.

If there is a distinct break in the regression function, the date at which the largest Chow statistic occurs is an estimator of the break date. In R, the packages strucchange and segmented provide several methods for testing for breaks, or you can use the function StructTS.
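A sketch of the QLR (sup-F) test with 15% trimming in strucchange, assuming y is the series and a single lag of y is the only regressor (a simplified version of the model above, without x):

library(strucchange)
n=length(y)
dat=data.frame(y=y[2:n],ylag=y[1:(n-1)])
fs=Fstats(y~ylag,data=dat,from=0.15,to=0.85)  # Chow F-statistics over the central 70% of the sample
sctest(fs,type="supF")                        # QLR / sup-F test of the no-break null
breakpoints(fs)                               # date of the largest Chow statistic = break date estimate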

4.6.2 Zivot and Andrews’s Testing Procedure

Sometimes, you may suspect that a series either has a unit root or is a trend stationary process that has a structural break at some unknown period of time, and you would want to test the null


hypothesis of a unit root against the alternative of a trend stationary process with a structural break. This is exactly the hypothesis tested by Zivot and Andrews's (1992) test. In this testing procedure, the null hypothesis is a unit root process without any structural breaks and the alternative hypothesis is a trend stationary process with a possible structural change occurring at an unknown point in time. Zivot and Andrews (1992) suggested estimating the following regression:

$$
x_t = \mu + \theta\,DU_t(\lambda) + \beta\,t + \gamma\,DT_t(\lambda) + \alpha\,x_{t-1} + \sum_{i=1}^{k} c_i\,\Delta x_{t-i} + e_t, \qquad (4.5)
$$

where λ = T_B/T is the break fraction; DU_t(λ) = 1 if t > λT and 0 otherwise; DT_t(λ) = t − λT if t > λT and 0 otherwise; and x_t is the time series of interest. This regression allows both the slope and the intercept to change at date T_B. Note that for t ≤ λT (t ≤ T_B) model (4.5) becomes

$$
x_t = \mu + \beta\,t + \alpha\,x_{t-1} + \sum_{i=1}^{k} c_i\,\Delta x_{t-i} + e_t,
$$

while for t > λT (t > T_B) model (4.5) becomes

$$
x_t = [\mu + \theta] + [\beta\,t + \gamma(t - T_B)] + \alpha\,x_{t-1} + \sum_{i=1}^{k} c_i\,\Delta x_{t-i} + e_t.
$$

Model (4.5) is estimated by OLS with the break point ranging over the sample, and the t-statistic for testing α = 1 is computed. The minimum t-statistic is reported. The 1%, 5% and 10% critical values are −5.34, −4.80 and −4.58, respectively. The appropriate number of lags in differences is estimated for each value of λ. Please read the paper by Sadorsky (1999) for more details about this method and empirical applications.
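In R the test is available as ur.za() in the urca package; a sketch, with the series x and the lag order as hypothetical inputs:

library(urca)
za=ur.za(x,model="both",lag=4)   # "both" allows a break in the intercept and in the trend, as in (4.5)
summary(za)                      # reports the minimum t-statistic and the estimated break point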


4.6.3 Cautions

The appropriate way to adjust for a break in the population parameters depends on the source of the break. If a distinct break occurs at a specific date, this break will be detected with high probability by the QLR statistic, and the break date can be estimated. The regression model can then be estimated using a dummy variable. If there is a distinct break, then inference on the regression coefficients can proceed as usual using t-statistics for hypothesis testing. Forecasts can be produced using the estimated regression model that applies to the end of the sample. The problem is more difficult if the break is not distinct and the parameters evolve slowly over time. In this case state-space modeling is required.

4.7 Problems

1. You will build a time-series model for the real oil price listed in the tenth column of the file "MacroData.xls".

(a) Construct graphs of the time series data: time plots and scatterplots of the levels of real oil prices (OP) and the log-difference of oil prices. The log-difference of the oil price is defined as Δlog(OP_t) = log(OP_t) − log(OP_{t−1}). Comment on your results.

(b) Try to identify the lag structure for the levels of real oil prices and the log-difference of oil prices by using the ACF and PACF. Comment on your results.

(c) We will not estimate ARMA models yet. So, estimate AR(p) models, 1 ≤ p ≤ 8. Compute the Akaike Information Criterion (AIC) or AICC and the Schwarz Information Criterion (SIC). Present your results. Choose the AR lag length based on the AIC or AICC. Which lag length would you choose based on the AIC, AICC or SIC? Comment.

(d) Estimate the AR(p) model with the optimal lag length. Present your results nicely.

(e) Conduct diagnostic checking of the estimated AR model. You need to do the following and comment on your results: (i) Construct graphs of the residuals (time series plots, scatterplots, squared residuals). (ii) Check for serial correlation using the sample ACF. (iii) Check for serial correlation using the Ljung-Box test statistic. (iv) Conduct the Jarque-Bera test of normality and make a Q-Q plot. (v) Estimate an AR(1) model for the estimated squared residuals and test the null hypothesis that the slope is insignificant. How would you interpret your results? What do they say about the constancy of the variance? (vi) Based on the diagnostic checking above, can you use the model or should you go back to the identification of the lag structure in (b)?

(f) Is there any structural change in the oil price?

Note: The Jarque-Bera (1980, 1987) test evaluates the hypothesis that X has a normal distribution with unspecified mean and variance, against the alternative that X does not have a normal distribution. The test is based on the sample skewness and kurtosis of X. For a true normal distribution, the sample skewness should be near 0 and the sample kurtosis should be near 3. The test has the following general form:

$$
JB = \frac{T}{6}\left[Sk^2 + \frac{(K-3)^2}{4}\right] \;\rightarrow\; \chi^2_2,
$$

where Sk and K are the measures of skewness and kurtosis, respectively. To use the built-in function for the Jarque-Bera test


in the package tseries in R, the command for the Jarque-Bera test is

library(tseries) # call the package "tseries"

jb=jarque.bera.test(x) # x is the series for the test

print(jb) # print the testing result

2. Economic theory suggests two fundamental causes of inflation: excess monetary growth (faster than real output) and the dissipation of external shocks. The precise mechanisms at work and the appropriate lag structures are not perfectly defined. In this exercise, you will estimate the following simple model:

Δlog(P_t) = β_1 + β_2 [Δlog(M1_{t−1}) − Δlog(Q_{t−1})] + β_3 Δlog(P_{t−1}) + u_t,

where P_t is the quarterly price level (CPI) listed in the eighth column of "MacroData.xls", Q_t is the quarterly real output (listed in the third column), and M1_t is the quarterly money stock (listed in the thirteenth column).

(a) Nicely present the results of the OLS estimation. Comment on your results.

(b) Explain what may be an advantage of the above model compared to a simple autoregressive model of prices. To answer this question, you might need to do some statistical analysis.

(c) Is there any structural change in the CPI?

(d) Any suggestions to build a better model?

3. In this exercise, you will build a time series model for the real GDP process listed in the second column of "MacroData.xls".

(a) Build, informally and formally, an AR(p) model with the optimal lag length based on some criterion and conduct diagnostic checking of the estimated AR model.

(b) Based on the data for 1959.1-1999.3, construct forecasts for the quarters 1999.4-2002.3. Plot the constructed forecasts and the realized values. Comment on your results.

4. You need to replicate some of the steps in the analysis of oil price shocks and stock market activity conducted by Sadorsky (1999). You should write your report in such a way that an outside reader may understand what the report is about and what you are doing. Write a referee report for the paper of Sadorsky (1999). One possible structure of the referee report is:

(a) Summary of the paper (this assures that you really read the paper carefully): (i) Is the economic/financial question of relevance? (ii) What have you learned from reading this paper? (iii) What contribution does this paper make to the literature?

(b) Can you think of interesting extensions for the paper?

(c) Expository quality of the paper: (i) Is the paper well structured? If not, suggest an alternative structure. (ii) Is the paper easy to read?

5. Analyze the stochastic properties of the following interest rates: (i) federal funds rate, (ii) 90-day T-bill rate, (iii) 1-year T-bond interest rate, (iv) 5-year T-bond interest rate, (v) 10-year T-bond interest rate. The interest rates may be found in the Excel file "IntRates.xls".

(a) Use the ADF or PP approach to test the null hypothesis that the five interest rates are difference stationary against the alternative that they are stationary. Explain carefully how you conduct the test.

(b) Use the ADF or PP approach to test the null hypothesis that the five interest rates are difference stationary against the alternative that they are stationary around a deterministic trend. Explain carefully how you conduct the test.

(c) Use the QLR testing procedure to test whether there was at least one structural break in the interest rate series.

4.8 Computer Code

The following R commands are used for making the graphs in this chapter.

# 5-20-2006

graphics.off()

###################################################################

y=read.csv("c:\\teaching\\time series\\data\\MacroData.csv",header=T,skip=1)

cpi=y[,8]

qt=y[,3]

m0=y[,12]

m1=y[,13]

m2=y[,14]

m3=y[,15]

op=y[,10]

v0=cpi*qt/m0

v1=cpi*qt/m1

v2=cpi*qt/m2


v3=cpi*qt/m3

vt=cbind(v0,v1,v2,v3)

win.graph()

par(mfrow=c(2,2),mex=0.4,bg="light blue")

ts.plot(cpi,type="l",lty=1,ylab="",xlab="")

title(main="CPI",col.main="red")

ts.plot(qt,type="l",lty=1,ylab="",xlab="")

title(main="Industry Output",col.main="red")

ts.plot(op,type="l",lty=1,ylab="",xlab="")  # plot the oil price series (op), not qt

title(main="Oil Price",col.main="red")

win.graph()

par(mfrow=c(2,2),mex=0.4,bg="light grey")

ts.plot(m0,type="l",lty=1,ylab="",xlab="")

title(main="Money Aggregate",col.main="red")

ts.plot(m1,type="l",lty=1,ylab="",xlab="")

title(main="Money Aggregate",col.main="red")

ts.plot(m2,type="l",lty=1,ylab="",xlab="")

title(main="Money Aggregate",col.main="red")

ts.plot(m3,type="l",lty=1,ylab="",xlab="")

title(main="Money Aggregate",col.main="red")

win.graph()

par(mfrow=c(2,2),mex=0.4,bg="yellow")

ts.plot(v0,type="l",lty=1,ylab="",xlab="")

title(main="Velocity",col.main="red")

ts.plot(v1,type="l",lty=1,ylab="",xlab="")

title(main="Velocity",col.main="red")

ts.plot(v2,type="l",lty=1,ylab="",xlab="")


title(main="Velocity",col.main="red")

ts.plot(v3,type="l",lty=1,ylab="",xlab="")

title(main="Velocity",col.main="red")

library(tseries) # call library(tseries)

library(urca) # call library(urca)

library(quadprog)

library(zoo)

adf_test=adf.test(cpi) # Augmented Dickey-Fuller test

print(adf_test)

adf_test=pp.test(cpi) # do Phillips-Perron test

print(adf_test)

#adf_test2=ur.df(y=cpi,lag=5,type=c("drift"))

#print(adf_test2)

adf_test=adf.test(op) # Augmented Dickey-Fuller test

print(adf_test)

adf_test=pp.test(op) # do Phillips-Perron test

print(adf_test)

for(i in 1:4){

adf_test=pp.test(vt[,i])

print(adf_test)

adf_test=adf.test(vt[,i])

print(adf_test)

}

###################################################################


y=read.csv("c:\\teaching\\time series\\data\\MacroData.csv",header=T,skip=1)

op=y[,10]

library(strucchange)

win.graph()

par(mfrow=c(2,2),mex=0.4,bg="green")

op=ts(op)

fs.op <- Fstats(op ~ 1) # no lags and covariate

plot(op,type="l")

plot(fs.op)

sctest(fs.op)

## visualize the breakpoint implied by the argmax of the F statistics

plot(op,type="l")

lines(breakpoints(fs.op))

#####################################

# The following is the example from R

######################################

win.graph()

par(mfrow=c(2,2),mex=0.4,bg="red")

if(! "package:stats" %in% search()) library(ts)

## Nile data with one breakpoint: the annual flows drop

## because the first Ashwan dam was built

data(Nile)

plot(Nile)

## test whether the annual flow remains constant over the years

fs.nile <- Fstats(Nile ~ 1)

plot(fs.nile)

sctest(fs.nile)


plot(Nile)

lines(breakpoints(fs.nile))

###################################################################

4.9 References

Beveridge, S. and C.R. Nelson (1981). A new approach to decomposition of economic time series into permanent and transitory components with particular attention to measurement of the business cycle. Journal of Monetary Economics, 7, 151-174.

Cai, Z. (2006). Trending time varying coefficient time series models with serially correlated errors. Forthcoming in Journal of Econometrics.

Cai, Z., J. Fan, and Q. Yao (2000). Functional-coefficient regression models for nonlinear time series. Journal of the American Statistical Association, 95, 941-956.

Chang, Y., J.Y. Park and P.C.B. Phillips (2001). Nonlinear econometric models with cointegrated and deterministically trending regressors. Econometrics Journal, 4, 1-36.

Chang, Y. and E. Martinez-Chombo (2003). Electricity demand analysis using cointegration and error-correction models with time varying parameters: The Mexican case. Working paper, Department of Economics, Rice University.

Cochrane, J.H. (1997). Time series for macroeconomics and finance. Lecture Notes. http://gsb.uchicago.edu/fac/john.cochrane/research/Papers/timeser1.pdf

Cochrane, J.H. (2001). Asset Pricing. New Jersey: Princeton University Press.

Dickey, D.A. and W.A. Fuller (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74, 427-431.

Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press.

Heij, C., P. de Boer, P.H. Franses and H.K. van Dijk (2004). Econometric Methods with Applications in Business and Economics. Oxford University Press.

Hendry, F.D. and K. Juselius (2000). Explaining cointegration analysis: Part I. Journal of Energy, 21, 1-42.

Jarque, C.M. and A.K. Bera (1980). Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters, 6, 255-259.

Jarque, C.M. and A.K. Bera (1987). A test for normality of observations and regression residuals. International Statistical Review, 55, 163-172.

Juhl, T. (2005). Functional coefficient models under unit root behavior. Econometrics Journal, 8, 197-213.

Park, J.Y. and P.C.B. Phillips (1999). Asymptotics for nonlinear transformations of integrated time series. Econometric Theory, 15, 269-298.

Park, J.Y. and P.C.B. Phillips (2002). Nonlinear regressions with integrated time series. Econometrica, 69, 117-161.

Phillips, P.C.B. (2001). Trending time series and macroeconomic activity: Some present and future challenges. Journal of Econometrics, 100, 21-27.

Phillips, P.C.B. and J. Park (2005). Non-stationary density and kernel autoregression. Under revision for Econometric Theory.

Phillips, P.C.B. and P. Perron (1988). Testing for a unit root in time series regression. Biometrika, 75, 335-346.

Sadorsky, P. (1999). Oil price shocks and stock market activity. Energy Economics, 21, 449-469.

Stock, J.H. and M.W. Watson (2003). Introduction to Econometrics. Addison-Wesley.

Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. John Wiley & Sons, New York.

Zivot, E. and D.W.K. Andrews (1992). Further evidence on the great crash, the oil price shock and the unit root hypothesis. Journal of Business and Economic Statistics, 10, 251-270.


Chapter 5

Vector Autoregressive Models

5.1 Introduction

A univariate autoregression is a single-equation, single-variable linear model in which the current value of a variable is explained by its own lagged values. Multivariate models look like the univariate models with the letters re-interpreted as vectors and matrices. Consider a multivariate time series:

$$x_t = \begin{pmatrix} y_t \\ z_t \end{pmatrix}.$$

Recall that by multivariate white noise e_t ~ N(0, Σ), we mean that

$$
e_t = \begin{pmatrix} v_t \\ u_t \end{pmatrix}, \qquad
E(e_t) = 0, \qquad
E(e_t e_t') = \Sigma = \begin{pmatrix} \sigma_v^2 & \sigma_{vu} \\ \sigma_{uv} & \sigma_u^2 \end{pmatrix}, \qquad
E(e_t e_{t-j}') = 0 \;\; (j \ne 0).
$$

The AR(1) model for the random vector x_t is x_t = Φ x_{t−1} + e_t, which in the multivariate framework means that

$$
\begin{pmatrix} y_t \\ z_t \end{pmatrix}
= \begin{pmatrix} \phi_{yy} & \phi_{yz} \\ \phi_{zy} & \phi_{zz} \end{pmatrix}
\begin{pmatrix} y_{t-1} \\ z_{t-1} \end{pmatrix}
+ \begin{pmatrix} v_t \\ u_t \end{pmatrix}.
$$

Notice that both lagged y and lagged z appear in each equation, which means that the multivariate AR(1) process (VAR) captures cross-variable dynamics (co-movements).


A general VAR is an n-equation, n-variable linear model in which each variable is explained by its own lagged values, plus current and past values of the remaining n − 1 variables:

$$y_t = \Phi_1 y_{t-1} + \cdots + \Phi_p y_{t-p} + e_t, \qquad (5.1)$$

where

$$
y_t = \begin{pmatrix} y_{1t} \\ y_{2t} \\ \vdots \\ y_{nt} \end{pmatrix}, \qquad
e_t = \begin{pmatrix} e_{1t} \\ e_{2t} \\ \vdots \\ e_{nt} \end{pmatrix}, \qquad
\Phi_i = \begin{pmatrix}
\phi^{(i)}_{11} & \phi^{(i)}_{12} & \cdots & \phi^{(i)}_{1n} \\
\phi^{(i)}_{21} & \phi^{(i)}_{22} & \cdots & \phi^{(i)}_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\phi^{(i)}_{n1} & \phi^{(i)}_{n2} & \cdots & \phi^{(i)}_{nn}
\end{pmatrix},
$$

and the error terms e_t have a variance-covariance matrix Ω. The name "vector autoregression" is usually used in place of "vector ARMA" because it is very uncommon to estimate the moving average terms. Autoregressions are easy to estimate because the OLS assumptions still apply and each equation may be estimated by ordinary least squares regression. MA terms have to be estimated by maximum likelihood. However, since every MA process has an AR(∞) representation, a pure autoregression can approximate an MA process if enough lags are included in the AR representation.

The usefulness of VAR models is that macroeconomists can do four things with them: (1) describe and summarize macroeconomic data; (2) make macroeconomic forecasts; (3) quantify what we do or do not know about the true structure of the macro economy; (4) advise macroeconomic policymakers. In data description and forecasting, VARs have proved to be powerful and reliable tools that are now in everyday use. Policy analysis is more difficult in the VAR framework because it requires differentiating between correlation and causation, the so-called "identification problem". Economic theory is required to solve the identification problem. Standard practice in VAR analysis is to report results from Granger-causality tests, impulse responses and forecast error variance decompositions, which will be discussed respectively in the next sections. For more about the history and recent developments as well as applications, see the paper by Stock and Watson (2001).

VAR models come in three varieties: reduced form, recursive, and structural forms. Here, we only focus on the first one; the details for the last two can be found in Hamilton (1994, Chapter 11). A reduced form VAR process of order p has the same form as equation (5.1):

$$y_t = c + \Phi_1 y_{t-1} + \cdots + \Phi_p y_{t-p} + e_t, \qquad (5.2)$$

where y_t is an n × 1 vector of the variables, Φ_i is an n × n matrix of parameters, c is an n × 1 vector of constants, and e_t ~ N(0, Ω). The error terms in these regressions are the "surprise" movements in the variables after taking their past values into account. The model (5.2) can be presented in many different ways: (1) using the lag operator notation as

Φ(L) y_t = c + e_t,

where Φ(L) = I_n − Φ_1 L − · · · − Φ_p L^p; (2) in matrix notation as

Y = c + X Π + E,

where E is a T × n matrix of the disturbances and Y is a T × n matrix of the observations; (3) in terms of deviations from the mean (centered).

For estimating VAR models, several methods are available: simple models can be fitted by the function ar() in the package stats (built-in), more elaborate models are provided by estVARXls() in the


package dse1, and a Bayesian approach is available in the package MSBVAR.
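As a quick sketch with the built-in ar() function (the bivariate series below is hypothetical):

x=cbind(y,z)                                   # multivariate time series, e.g. two variables y and z
fit=ar(x,order.max=4,aic=FALSE,method="ols")   # VAR(4) estimated equation by equation
fit$ar                                         # array of estimated coefficient matrices Phi_1, ..., Phi_4
fit$var.pred                                   # estimated innovation covariance matrix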

5.1.1 Properties of VAR Models

A vector process is said to be covariance-stationary (weakly stationary) if its first and second moments do not depend on t. The VAR(p) model in equation (5.1) can be written in the form of a VAR(1) process, called the companion form, as:

$$\xi_t = F \xi_{t-1} + v_t, \qquad (5.3)$$

where

$$
\xi_t = \begin{pmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{pmatrix}, \qquad
v_t = \begin{pmatrix} e_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad
F = \begin{pmatrix}
\Phi_1 & \Phi_2 & \Phi_3 & \cdots & \Phi_{p-1} & \Phi_p \\
I_n & 0 & 0 & \cdots & 0 & 0 \\
0 & I_n & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & I_n & 0
\end{pmatrix}.
$$

To understand the conditions for stationarity of a vector process, note from the above equation that

$$\xi_{t+s} = v_{t+s} + F v_{t+s-1} + F^2 v_{t+s-2} + \cdots + F^{s-1} v_{t+1} + F^s \xi_t.$$

Proposition 4.1 The VAR process is covariance-stationary if the eigenvalues of the matrix F are less than unity in absolute value.
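A small sketch of this check, assuming Phi is a list holding hypothetical estimates of Φ_1, . . . , Φ_p:

companion=function(Phi){                              # build the companion matrix F
  n=nrow(Phi[[1]]); p=length(Phi)
  F=matrix(0,n*p,n*p)
  F[1:n,]=do.call(cbind,Phi)                          # first block row: Phi_1, ..., Phi_p
  if(p>1) F[(n+1):(n*p),1:(n*(p-1))]=diag(n*(p-1))    # identity blocks below
  F
}
Phi=list(matrix(c(0.5,0.1,0.2,0.3),2,2),              # hypothetical Phi_1
         matrix(c(0.2,0.0,0.0,0.1),2,2))              # hypothetical Phi_2
all(Mod(eigen(companion(Phi))$values)<1)              # TRUE => covariance-stationary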

For a covariance-stationary n-dimensional vector process, the j-th autocovariance is defined to be the following n × n matrix

Γ_j = E[(y_t − μ)(y_{t−j} − μ)′].

Note that Γ_j ≠ Γ_{−j} but Γ_j = Γ′_{−j}.


A vector moving average process of order q takes the following form

MA(q):  y_t = e_t + Θ_1 e_{t−1} + · · · + Θ_q e_{t−q},

where e_t is a vector white noise, e_t ~ N(0, Ω). The VAR(p) model may be presented as an MA(∞) model

$$y_t = \sum_{j=1}^{p} \Phi_j y_{t-j} + e_t = \sum_{k=0}^{\infty} \Psi_k e_{t-k},$$

where the sequence {Ψ_k} is assumed to be absolutely summable.

To compute the variance of a VAR process, let us rewrite the VAR(p) process in the form of the VAR(1) process as in (5.3). Assume that the vectors ξ and y are covariance-stationary, and let Σ denote the variance of ξ. Then Σ is defined as follows:

$$
\Sigma = \begin{pmatrix}
\Gamma_0 & \Gamma_1 & \cdots & \Gamma_{p-1} \\
\Gamma_1' & \Gamma_0 & \cdots & \Gamma_{p-2} \\
\vdots & \vdots & \ddots & \vdots \\
\Gamma_{p-1}' & \Gamma_{p-2}' & \cdots & \Gamma_0
\end{pmatrix}
= F \Sigma F' + Q,
$$

where Q = Var(v_t). We can apply the Vec operator (if A is a d × d symmetric matrix, Vec(A) denotes the d(d + 1)/2 column vector representing the stacked-up columns of A which are on and below the diagonal of A) to both sides of the above equation,

Vec(Σ) = Vec(FΣF′) + Vec(Q) = (F ⊗ F) Vec(Σ) + Vec(Q),

and with A = F ⊗ F,

Vec(Σ) = (I_{r²} − A)⁻¹ Vec(Q),

provided that the matrix I_{r²} − A is nonsingular, where r = np. If the process ξ_t is covariance-stationary, then I_{r²} − A is nonsingular.


5.1.2 Statistical Inferences

Suppose we have a sample of size T, {y_t}_{t=1}^T, drawn from an n-dimensional covariance-stationary process with E(y_t) = μ and E[(y_t − μ)(y_{t−j} − μ)′] = Γ_j. As usual, the sample mean is defined as

$$\bar{y}_T = \frac{1}{T}\sum_{t=1}^{T} y_t.$$

It is easy to show that for a covariance-stationary process:

$$
E(\bar{y}_T) = \mu, \qquad
\mathrm{Var}(\bar{y}_T) = \frac{1}{T^2}\left[T\,\Gamma_0 + \sum_{j=1}^{T-1}(T-j)\{\Gamma_j + \Gamma_{-j}\}\right].
$$

Proposition 4.2 Let y_t be a covariance-stationary process with mean μ and autocovariances Γ_j, with absolutely summable autocovariances. Then the sample mean ȳ_T satisfies: ȳ_T converges to μ in probability and T Var(ȳ_T) converges to ∑_{j=−∞}^{∞} Γ_j ≡ S.

A consistent estimate of S can be constructed based on Newey and West's (1987) HAC estimator as follows (see Section 3.10):

$$
\hat{S} = \hat{\Gamma}_0 + \sum_{j=1}^{q}\left(1 - \frac{j}{q+1}\right)\left(\hat{\Gamma}_{-j} + \hat{\Gamma}_j\right),
\qquad \text{where} \qquad
\hat{\Gamma}_j = \frac{1}{T-j}\sum_{t=j+1}^{T}(y_t - \bar{y}_T)(y_{t-j} - \bar{y}_T)',
$$

and one can set the value of q as q = 0.75 T^{1/3}.

To estimate the parameters in a VAR(p) model, we consider the VAR(p) model given in (5.2),

$$y_t = c + \Phi_1 y_{t-1} + \cdots + \Phi_p y_{t-p} + e_t,$$


where y_t is an n × 1 vector containing the values that the n variables assume at date t and e_t ~ N(0, Ω). We assume that we have observed each of these n variables for (T + p) time periods, i.e. we observe the sample {y_{−p+1}, . . . , y_0, y_1, y_2, . . . , y_T}. The simplest approach to estimation is to condition on the first p observations {y_{−p+1}, y_{−p+2}, . . . , y_0} and to do estimation using the last T observations {y_1, y_2, . . . , y_T}. We use the following notation:

$$
x_t = \begin{pmatrix} 1 \\ y_{t-1} \\ \vdots \\ y_{t-p} \end{pmatrix}, \qquad
\Pi = \begin{pmatrix} c' \\ \Phi_1' \\ \vdots \\ \Phi_p' \end{pmatrix},
$$

where x_t is an (np + 1) × 1 vector and Π is an (np + 1) × n matrix. Then equation (5.2) can be written as:

y_t = Π′x_t + e_t,

where y_t is an n × 1 vector of the variables at period t, e_t is an n × 1 vector of the disturbances and e_t ~ N(0, Ω). The likelihood function for the model (5.2) can be calculated in the same way as for a univariate autoregression. The log likelihood for the entire sample is as follows:

$$
L(\Pi, \Omega) = C - \frac{T}{2}\ln|\Omega| - \frac{1}{2}\sum_{t=1}^{T}(y_t - \Pi'x_t)'\,\Omega^{-1}(y_t - \Pi'x_t),
$$

where C is a constant. To find the MLE of the parameters Π and Ω, we take the derivatives of the above log-likelihood with respect to Π and Ω and set them equal to zero. It can be shown that the MLEs of Π and Ω are

$$\hat{\Pi} = (X'X)^{-1}X'Y, \qquad \hat{\Omega} = \hat{E}'\hat{E}/T, \quad \text{with} \quad \hat{E} = Y - X\hat{\Pi}.$$

The OLS estimator of Π is the same as the unrestricted MLE estimator.
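A sketch of these formulas, assuming Y is the T × n matrix of observations and p is the chosen lag order (hypothetical names):

p=2; Tn=nrow(Y); n=ncol(Y)
# regressor matrix X: a constant and p lags of all variables, one row for each t = p+1, ..., T
X=cbind(1,do.call(cbind,lapply(1:p,function(j) Y[(p+1-j):(Tn-j),])))
Yt=Y[(p+1):Tn,]
Pi.hat=solve(t(X)%*%X)%*%t(X)%*%Yt       # (X'X)^{-1} X'Y: OLS = unrestricted MLE
E.hat=Yt-X%*%Pi.hat                      # residual matrix
Omega.hat=t(E.hat)%*%E.hat/nrow(Yt)      # MLE of the error covariance matrix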


To test the hypothesis H_0 : Ω = Ω_0, we can use the popular likelihood ratio approach. To do so, we need to calculate the maximized log-likelihood values L(Ω̂) and L(Ω̂_0) under the alternative and under H_0, respectively. The likelihood ratio test statistic is

λ_T = 2 [L(Ω̂) − L(Ω̂_0)].

Under the null hypothesis, λ_T asymptotically has a χ² distribution with degrees of freedom equal to the number of restrictions imposed under H_0.

To derive the asymptotic properties of the MLE, let us define π̂ = Vec(Π̂), where Π̂ is the MLE of Π. Note that since Π̂ is an np × n matrix, π̂ is an n²p × 1 vector. It can be shown, as in Proposition 11.1 of Hamilton (1994), that: (1) (1/T) ∑_{t=1}^T x_t x_t′ → Q in probability, where Q = E(x_t x_t′) is an np × np matrix; (2) π̂ → π in probability; (3) Ω̂ → Ω in probability; (4) √T (π̂ − π) → N(0, Ω ⊗ Q⁻¹). Therefore, π̂ can be treated as approximately π̂ ≈ N(π, Ω ⊗ (X′X)⁻¹). To test a hypothesis of the form H_0 : Rπ = r, we can use the following form of the Wald test:

$$
\chi^2(m) = (R\hat{\pi} - r)'\left[R\left(\hat{\Omega} \otimes (X'X)^{-1}\right)R'\right]^{-1}(R\hat{\pi} - r).
$$

5.2 Impulse-Response Function

Impulse responses trace out the responses of current and future values of each of the variables to a one-unit increase in the current value of one of the VAR errors, assuming that this error returns to zero in subsequent periods and that all other errors are equal to zero. This function is of interest for several reasons: it is another characterization of the behavior of the models and it allows one to think about "causes" and "effects".


Recall that for an AR(1) process, the model is x_t = φ x_{t−1} + e_t or x_t = ∑_{j=0}^{∞} φ^j e_{t−j}. Based on the MA(∞) representation, we see from Section 3.13 that the impulse-response function is

$$\frac{\partial x_{t+j}}{\partial e_t} = \phi^j.$$

A vector process works the same way. The covariance-stationary VAR model can be written in MA(∞) form as

$$y_t = \mu + \sum_{j=0}^{\infty} \Psi_j e_{t-j}.$$

Then,

$$\frac{\partial y_{t+j}}{\partial e_t'} = \Psi_j.$$

The element ψ^{(s)}_{ij} of Ψ_s identifies the consequences of a one-unit increase in the j-th variable's innovation at date t (e_{jt}) for the value of the i-th variable at time t + s (y_{i,t+s}), holding all other innovations at all dates constant. One may also find the response of a specific variable to shocks in all other variables, or the response of all variables to a specific shock:

$$
\frac{\partial y_{i,t+s}}{\partial e_t'} = \Psi^{(s)}_{i\,\cdot}, \qquad
\frac{\partial y_{t+s}}{\partial e_{jt}} = \Psi^{(s)}_{\cdot\,j}.
$$

If one is interested in how the variables of the vector y_{t+s} are affected if the first element of e_t changes by δ_1 at the same time that the second element changes by δ_2, ..., and the n-th element by δ_n, then the combined effect of these changes on the value of y_{t+s} is given by

$$
\Delta y_{t+s} = \sum_{j=1}^{n} \frac{\partial y_{t+s}}{\partial e_{jt}}\,\delta_j = \Psi_s\,\delta, \qquad \delta = (\delta_1, \ldots, \delta_n)'.
$$

A plot of the row i, column j element of Ψ_s,

$$
\left\{\frac{\partial y_{i,t+s}}{\partial e_{jt}}\right\}_{s=0}^{S},
$$

as a function of s is called the orthogonal impulse-response function. It describes the response of y_{i,t+s} to a one-time impulse in y_{jt} with all other variables dated t or earlier held constant.

Suppose that the date t value of the first variable in the autoregression, y_{1t}, was higher than expected. How does this cause us to revise the forecast of y_{i,t+s}? To answer this question, we define x′_{t−1} = (y′_{t−1}, y′_{t−2}, . . . , y′_{t−p}), where y_{t−i} is an n × 1 vector and x_{t−1} is an np × 1 vector. The question becomes, what is

$$\frac{\partial E(y_{i,t+s}\mid y_{1t}, x_{t-1})}{\partial y_{1t}}\,?$$

Note that

$$
\frac{\partial E(y_{i,t+s}\mid y_{1t}, x_{t-1})}{\partial y_{1t}}
= \frac{\partial E(y_{i,t+s}\mid y_{1t}, x_{t-1})}{\partial E(e_t'\mid y_{1t}, x_{t-1})}
\times \frac{\partial E(e_t\mid y_{1t}, x_{t-1})}{\partial y_{1t}}
= \psi^{(s)}_{\cdot 1}.
$$

Let us examine the forecast revision resulting from new information about the second variable, y_{2t}, beyond that contained in the first variable, y_{1t}:

$$
\frac{\partial E(y_{i,t+s}\mid y_{1t}, y_{2t}, x_{t-1})}{\partial y_{2t}}
= \frac{\partial E(y_{i,t+s}\mid y_{1t}, y_{2t}, x_{t-1})}{\partial E(e_t'\mid y_{1t}, y_{2t}, x_{t-1})}
\times \frac{\partial E(e_t\mid y_{1t}, y_{2t}, x_{t-1})}{\partial y_{2t}}
= \psi^{(s)}_{\cdot 2}.
$$

Similarly, we might find the forecast revision for the third variable and so on. For the variable y_{nt},

$$
\frac{\partial E(y_{i,t+s}\mid y_{1t}, \cdots, y_{nt}, x_{t-1})}{\partial y_{nt}}
= \frac{\partial E(y_{i,t+s}\mid y_{1t}, \cdots, y_{nt}, x_{t-1})}{\partial E(e_t'\mid y_{1t}, \cdots, y_{nt}, x_{t-1})}
\times \frac{\partial E(e_t\mid y_{1t}, \cdots, y_{nt}, x_{t-1})}{\partial y_{nt}}
= \psi^{(s)}_{\cdot n}.
$$

The following are three important properties of impulse responses: First, the MA(∞) representation is the same thing as the impulse-response function; second, the easiest way to calculate the MA(∞) representation is to simulate the impulse-response function; finally, the impulse-response function is the same as E_t(y_{t+j}) − E_{t−1}(y_{t+j}).
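Because the MA(∞) coefficients are the impulse responses, they can be computed recursively from the estimated Φ's using Ψ_0 = I and Ψ_s = ∑_{i=1}^{min(s,p)} Φ_i Ψ_{s−i}. A sketch, with the list Phi of coefficient matrices as a hypothetical input (orthogonalized responses would additionally multiply each Ψ_s by a Cholesky factor of Ω):

irf.Psi=function(Phi,S){
  n=nrow(Phi[[1]]); p=length(Phi)
  Psi=vector("list",S+1)
  Psi[[1]]=diag(n)                              # Psi_0 = I_n
  for(s in 1:S){
    Psi[[s+1]]=matrix(0,n,n)
    for(i in 1:min(s,p))
      Psi[[s+1]]=Psi[[s+1]]+Phi[[i]]%*%Psi[[s+1-i]]
  }
  Psi                                           # Psi[[s+1]][i,j]: response of y_i to e_j after s periods
}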


5.3 Variance Decompositions

In the organized system, we can compute an accounting of forecast error variance: what percent of the k-step-ahead forecast error variance is due to which variable. To do this, we start with the MA representation:

y_t = Ψ(L) e_t,

where y_t = (x_t, z_t)′, e_t = (e_{xt}, e_{zt})′, E(e_t e_t′) = I, and Ψ(L) = ∑_{j=0}^{∞} Ψ_j L^j. The one-step forecast error is

y_{t+1} − E_t(y_{t+1}) = Ψ_0 e_{t+1},

and its variance is

Var_t(x_{t+1}) = ψ²_{xx,0} + ψ²_{xz,0}.

ψ²_{xx,0} gives the amount of the one-step-ahead forecast error variance of x due to the e_x shock and ψ²_{xz,0} gives the amount due to the e_z shock. In practice, one usually reports the fractions ψ²_{xx,0}/(ψ²_{xx,0} + ψ²_{xz,0}). More formally, we can write

Var_t(y_{t+1}) = Ψ_0 Ψ_0′.

Define

$$
I_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \qquad \text{and} \qquad
I_2 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.
$$

Then, the part of the one-step-ahead forecast error variance due to the first shock, x, is Ψ_0 I_1 Ψ_0′ and the part due to the second shock, z, is Ψ_0 I_2 Ψ_0′. Generalizing to k steps can be done as follows:

$$\mathrm{Var}_t(y_{t+k}) = \sum_{j=0}^{k-1} \Psi_j \Psi_j'.$$

Then,

$$w_{k,\tau} = \sum_{j=0}^{k-1} \Psi_j I_\tau \Psi_j'$$


is the variance of the k-step-ahead forecast errors due to the τ-th shock, and the total variance is the sum of these components, i.e. Var_t(y_{t+k}) = ∑_τ w_{k,τ}.
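Continuing the sketch above (the list Psi returned by irf.Psi is a hypothetical input), the k-step forecast error variance shares can be computed as follows, under the normalization E(e_t e_t′) = I used in this section:

fevd=function(Psi,k){
  n=nrow(Psi[[1]])
  total=matrix(0,n,n); parts=matrix(0,n,n)
  for(j in 0:(k-1)){
    total=total+Psi[[j+1]]%*%t(Psi[[j+1]])   # Var_t(y_{t+k})
    parts=parts+Psi[[j+1]]^2                 # element (i,tau): variance of y_i due to shock tau
  }
  sweep(parts,1,diag(total),"/")             # each row gives fractions that sum to one
}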

5.4 Granger Causality

The first thing that you learn in econometrics is a caution that putting x on the right hand side of y = x′β + e does not mean that x "causes" y. Then you learn that causality is not something you can test for statistically, but must be known a priori. It turns out that there is a limited sense in which we can test whether one variable "causes" another and vice versa.

Granger-causality statistics examine whether lagged values of one variable help to predict another variable. The variable y fails to Granger-cause the variable x if for all s > 0 the MSE of a forecast of x_{t+s} based on (x_t, x_{t−1}, . . .) is the same as the MSE of a forecast of x_{t+s} that uses both (x_t, x_{t−1}, . . .) and (y_t, y_{t−1}, . . .). If one considers only linear functions, y fails to Granger-cause x if

MSE[Ê(x_{t+s} | x_t, x_{t−1}, . . .)] = MSE[Ê(x_{t+s} | x_t, x_{t−1}, . . . , y_t, y_{t−1}, . . .)].

Equivalently, we say that x is exogenous in the time series sense with respect to y if the above holds. Or, y is not linearly informative about future x.

In a bivariate VAR model describing x and y, y does not Granger-cause x if the coefficient matrices Φ_j are lower triangular for all j, i.e.

$$
\begin{pmatrix} x_t \\ y_t \end{pmatrix}
= \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}
+ \begin{pmatrix} \phi^{(1)}_{11} & 0 \\ \phi^{(1)}_{21} & \phi^{(1)}_{22} \end{pmatrix}
\begin{pmatrix} x_{t-1} \\ y_{t-1} \end{pmatrix}
+ \cdots
+ \begin{pmatrix} \phi^{(p)}_{11} & 0 \\ \phi^{(p)}_{21} & \phi^{(p)}_{22} \end{pmatrix}
\begin{pmatrix} x_{t-p} \\ y_{t-p} \end{pmatrix}
+ \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}.
$$


Granger causality can be tested by conducting an F-test of the corresponding zero restrictions. For example, the null hypothesis that x does not Granger-cause y is

H_0 : φ^{(1)}_{21} = φ^{(2)}_{21} = · · · = φ^{(p)}_{21} = 0,

and the null hypothesis that y does not Granger-cause x imposes the same restrictions on the φ^{(j)}_{12} instead.
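A sketch of the test with lm() and anova(), assuming x and y are numeric vectors and using two lags (hypothetical names and lag order); it tests whether the lags of x can be excluded from the y equation:

n=length(y)
lagk=function(v,k) v[(3-k):(n-k)]                 # lag k, aligned with t = 3, ..., n for p = 2
dat=data.frame(y=y[3:n],y1=lagk(y,1),y2=lagk(y,2),x1=lagk(x,1),x2=lagk(x,2))
unrestricted=lm(y~y1+y2+x1+x2,data=dat)
restricted=lm(y~y1+y2,data=dat)
anova(restricted,unrestricted)                    # F-test of H0: x does not Granger-cause y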

The first and most famous application of Granger causality was the question of whether "money growth causes changes in GNP". Friedman and Schwartz (1963) documented a correlation between money growth and GNP. But Tobin (1970) argued that a phase lead and a correlation may not indicate causality. Sims (1980) answered this criticism and showed that money Granger-causes GNP and not vice versa (he found different results later). Sims (1980) analyzed the following regression to study the effect of money on GNP:

$$y_t = \sum_{j=0}^{\infty} b_j\, m_{t-j} + u_t.$$

This regression is known as a "St. Louis Fed" equation. The coefficients were interpreted as the response of y to changes in m, i.e. if the Fed sets m, {b_j} gives the response of y. Since the coefficients were relatively big, this implied that constant money growth rules were desirable.

The obvious objection to this statement is that the coefficients may reflect reverse causality: the Fed sets money in anticipation of subsequent economic growth, or the Fed sets money in response to past y. This means that the error term u is correlated with current and lagged m, so OLS estimates of the parameters b are inconsistent due to the endogeneity problem. Why is "Granger causality" not "causality"? Granger causality is not causality because of the possible effect of other variables. If x leads to y with one lag but to z with two lags, then y will Granger-cause z in a bivariate system. The reason is that y helps forecast z because it reveals information about the "true cause", x.


Table 5.1: Sims variance decomposition in three variable VAR model

                 Explained by shocks to
Var. of        M1     IP    WPI
M1             97      2      1
IP             37     44     18
WPI            14      7     80

Table 5.2: Sims variance decomposition including interest rates

                 Explained by shocks to
Var. of         R     M1     IP    WPI
R              50     19      4     28
M1             56     42      1      1
IP              2     32     60      6
WPI            30      4     14     52

But it does not follow that if you change y then a change in z will follow.

This would not be a problem if the estimated pattern of causality in macroeconomic time series were stable with respect to the inclusion of additional variables. An example by Sims (1980) illustrates that this is often not the case. Sims (1980) estimated a three-variable VAR with money, industrial production and the wholesale price index, and a four-variable VAR with the interest rate, money, industrial production and the wholesale price index. The results are in Table 5.1. The first row in Table 5.1 verifies that M1 is exogenous because it does not respond to the other variables' shocks. The second row shows that M1 "causes" changes in IP, since 37% of the 48-month-ahead variance of IP is due to M1 shocks. The third row is puzzling because it shows that WPI is exogenous. Table 5.2 shows what happens when one more variable, the interest rate, is added to the model. The second row shows a substantial response of M1 to interest rate shocks. In this model M1


is not exogenous. In the third row one can see that M1 does influence IP; the fourth row shows that M1 does not influence WPI, but the interest rate does.

5.5 Forecasting

To do forecasting, we need to do the following things. First, choose the lag length of the VAR using either one of the information criteria (AIC, SIC, AICC) or one of the forecasting criteria; second, estimate the VAR model by OLS and obtain the parameter estimates Φ̂_j; and finally, construct the h-period-ahead forecasts recursively:

$$
\begin{aligned}
y^t_{t+1} &= \hat{\Phi}_1 y_t + \hat{\Phi}_2 y_{t-1} + \cdots + \hat{\Phi}_p y_{t-p+1},\\
y^t_{t+2} &= \hat{\Phi}_1 y^t_{t+1} + \hat{\Phi}_2 y_t + \cdots + \hat{\Phi}_p y_{t-p+2},\\
y^t_{t+3} &= \hat{\Phi}_1 y^t_{t+2} + \hat{\Phi}_2 y^t_{t+1} + \cdots + \hat{\Phi}_p y_{t-p+3},
\end{aligned}
$$

and so on.
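A sketch of the recursion, reusing a hypothetical list Phi of estimated coefficient matrices and a T × n data matrix Y (intercept omitted for simplicity):

var.forecast=function(Phi,Y,h){
  p=length(Phi); n=ncol(Y)
  hist=Y                                             # rows will be extended with the forecasts
  for(s in 1:h){
    f=rep(0,n)
    for(i in 1:p) f=f+Phi[[i]]%*%hist[nrow(hist)+1-i,]
    hist=rbind(hist,as.numeric(f))
  }
  tail(hist,h)                                       # the h-period-ahead forecasts
}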

How well do VARs perform these tasks? Because VARs involve current and lagged values of multiple time series, they capture co-movements that cannot be detected in univariate models. VAR summary statistics like Granger-causality tests, impulse response functions and variance decompositions are well-accepted methods for portraying co-movements. Small VARs have become a benchmark against which new forecasting systems are judged. The problem is that small VARs are often unstable and thus poor predictors of the future.

5.6 Problems

1. Write a referee report for the paper by Sims (1992).


2. Use quarterly data for real GDP, the GDP deflator, CPI, the Federal Funds rate, the money base measure, M1, and the index of commodity prices for the period 1959.I-2002.III (file "TEps8data.xls"). Also, use the monthly data for Total Reserves (file "TotResAdjRR.xls") and Non-Borrowed Reserves (file "BOGNONBR.xls") for the period 1959.1-2002.9. Transform the monthly data into quarterly data. We examine the VAR model investigated by Christiano, Eichenbaum and Evans (2000). The model has seven variables: the log of real GDP (Y), the log of the consumer price index (CPI), the change in the index of sensitive commodity prices (Pcom), the Federal funds rate (FF), the log of total reserves (TR), the log of non-borrowed reserves (NBR), and the log of the M1 monetary aggregate (M1). A monetary policy shock in the model is represented by a shock to the Federal funds rate: the information set consists of current and lagged values of Y, CPI and Pcom, and only lagged values of FF, TR, NBR and M1. It implies the following ordering of the variables in the model:

x′_t = (Y, CPI, Pcom, FF, TR, NBR, M1).

The reference for this paper is Christiano, Eichenbaum and Evans (2000).

(a) Construct graphs of all time series data. Comment on your results.

(b) Estimate a VAR model for x_t. (i) Nicely present the impulse response functions representing the response of all variables in the model to a monetary policy shock (FF rate). Carefully explain your results. (ii) Nicely present the variance-decomposition results for all variables for the forecast horizons k = 2, 4, 12, 36. Carefully explain your results. (iii) Conduct Granger causality tests for all variables in the model. Carefully explain your results.

(c) Most macroeconomists agree that there was a shift in monetary policy toward inflation during the late 1970s, from accommodating to aggressive. Estimate the model for the two periods 1959.I-1979.II and 1979.III-2002.III. Nicely present the impulse response functions representing the response of all variables in the model to a monetary policy shock (FF rate) for both periods. Carefully explain your results. How do the impulse response functions reveal the change in monetary policy?

(d) Use the VAR model to construct 8-period-ahead forecasts for all the variables in the model.

3. Some macroeconomists have looked at the NBR and NBR/TR specifications of monetary policy shocks. In the case of an NBR monetary policy shock, the information set is identical to that for an FF shock, while in the case of an NBR/TR shock the information set also includes the current value of total reserves. Use a recursive scheme for identification and examine the effect of a monetary policy shock for an NBR specification (think about how you should reorder the variables in the model).

(a) Nicely present the impulse response functions representing the response of all variables in the model to a monetary policy shock (the level of NBR). Explain your results.

(b) Nicely present the variance-decomposition results for all variables showing the contribution of the NBR shock only, for the forecast horizons k = 2, 4, 12, 36. Explain your results.

4. Write a program for implementing the FAVAR approach of Bernanke et al. (2005).


(a) Run the program and explain the main steps in the estimation.

(b) Using the estimation results, carefully explain the effect of monetary policy on the 90-day T-bill rate, 1-year T-bond interest rate, 5-year T-bond interest rate and 10-year T-bond interest rate.

(c) Carefully explain your findings on the effect of monetary policy on employment. You need to use the impulse response functions for the employment series, unemployment series, average hours worked, and new claims for unemployment.

(d) Explain the effect of a shock to the Federal Funds rate on different aggregate measures of money supply.

(e) Explain the effect of a monetary policy shock on the exchange rate and real stock prices.

(f) Explain the effect of a monetary policy shock on different measures of GDP (real GDP, different measures of Industrial Production, etc.).

(g) What happens to the results if the number of Diffusion Indexes (see Stock and Watson (2002)) is increased from three to five?

5. Assume that you are faced with the task of forecasting different interest rates. Explain how you may apply the Diffusion Indexes approach of Stock and Watson (2002) to use as many variables as possible in the forecasting.

6. Write a referee report for the paper by Bachmeier, Leelahanon and Li (2005). Please think about any possible future projects.


5.7 References

Bachmeier, L., S. Leelahanon and Q. Li (2005). Money growth and inflation in the United States. Working Paper, Department of Economics, Texas A&M University.

Bernanke, B.S., J. Boivin and P. Eliasz (2005). Measuring the effects of monetary policy: a factor-augmented vector autoregressive (FAVAR) approach. The Quarterly Journal of Economics, 120, 387-422.

Christiano, L.J., M. Eichenbaum and C.L. Evans (2000). Monetary policy shocks: what have we learned and to what end? Handbook of Macroeconomics, Vol. 1A.

Cochrane, J.H. (1994). Shocks. NBER working paper #46984.

Cochrane, J.H. (1997). Time series for macroeconomics and finance. Lecture Notes. http://www-gsb.uchicago.edu/fac/john.cochrane/research/Papers/timeser1.pdf

Friedman, M. and A.J. Schwartz (1963). A Monetary History of the United States, 1867-1960. Princeton University Press.

Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press.

Hendry, F.D. and K. Juselius (2000). Explaining cointegration analysis: Part I. Journal of Energy, 21, 1-42.

Newey, W.K. and K.D. West (1987). A simple, positive-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55, 703-708.

Sims, C. (1980). Macroeconomics and reality. Econometrica, 48, 1-48.

Sims, C.A. (1992). Interpreting the macroeconomic time series facts: the effects of monetary policy. European Economic Review, 36, 975-1000.

Stock, J.H. and M.W. Watson (2001). Vector autoregressions. Journal of Economic Perspectives, 15, 101-115.

Stock, J.H. and M.W. Watson (2002). Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics, 20, 147-162.

Stock, J.H. and M.W. Watson (2003). Introduction to Econometrics. Addison-Wesley.

Tobin, J. (1970). Money and income. Quarterly Journal of Economics.


Chapter 6

Cointegration

6.1 Introduction

Cointegration is a generalization of unit roots to vector processes, as a single series cannot be cointegrated. Cointegration analysis is designed to find linear combinations of variables that remove unit roots. Suppose that two series are each integrated, with the following MA representations:

(1 − L) y_t = a(L) u_t,  and  (1 − L) x_t = b(L) v_t.

In general, linear combinations of y_t and x_t will also have unit roots. But if there is some linear combination, y_t − θ x_t, that is stationary, y_t and x_t are said to be cointegrated and α = (1, −θ)′ is their cointegrating vector. Cointegrating vectors are of considerable interest when they exist, since they determine I(0) relations that hold between variables that are individually non-stationary.

As an example, we may look at real GNP and consumption. Each of these series probably has a unit root, but the ratio of consumption to real GNP is stable over long periods of time. Therefore, log consumption minus log GNP is stationary, and log GNP and consumption are cointegrated. Other possible examples include the dividend/price ratio or money and prices. However, cointegration


does not say anything about the direction of causality.

6.2 Cointegrating Regression

Estimates of cointegrating vectors are "super-consistent", which means that you can estimate them by OLS even when the right hand side variables are correlated with the error term, and the estimates converge at a faster rate than usual OLS estimates. Suppose y_t and x_t are cointegrated so that y_t − θ x_t is stationary. Estimate the following model using OLS regression:

y_t = β x_t + e_t.    (6.1)

The OLS estimate of β converges to θ, even if the errors are correlated with x_t. Note that if y_t and x_t are each individually I(1) but are not cointegrated, then equation (6.1) is a spurious regression. Therefore, you have to check whether the estimated residuals ê_t are I(1) or I(0). We will discuss this later in the notes.
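A sketch of the cointegrating regression and an informal residual check, assuming y and x are the two I(1) series (hypothetical names):

coint=lm(y~x)            # cointegrating regression (6.1), with an intercept added in practice
e.hat=residuals(coint)
library(tseries)
adf.test(e.hat)          # informal check that the residuals are I(0); the proper critical
                         # values are the Engle-Granger ones in Table 6.1, not the usual ADF ones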

Representation of Cointegrating System

Let y_t be a first-difference stationary vector time series. The elements of y_t are cointegrated if there is at least one vector α, the cointegrating vector, such that α′y_t is stationary in levels. Since the difference of y_t is stationary, it has a moving average representation

(1 − L) y_t = A(L) e_t.

Since the stationarity of α′y_t is an extra restriction, it must imply a restriction on A(L).

Similar to the univariate Beveridge-Nelson (1981) decomposition,the multivariate Beveridge-Nelson decomposition can be done in the

Page 247: Time Series Analysis R

CHAPTER 6. COINTEGRATION 236

same way as:yt = zt + ct

where (1 " L) zt = A(1) et and ct = A*(L) et with A*j =

%&k=j+1 Ak.

The restriction on A(1) implied by cointegration: the elements ofyt are cointegrated with cointegrating vectors & if and only if (i!)&-A(1) = 0. This implies that the rank of A(1) is the number ofelements of yt minutes number of cointegrating vectors &. There arethree cases for A(1): First, A(1) = 0 i! yt is stationary in levelsand all linear combinations of yt are stationary in levels. Second,A(1) is not full rank i! (1 " L) yt is stationary and some linearcombinations &- yt are stationary. Finally, A(1) has full rank i! (1"L) yt is stationary and no linear combinations of yt are stationary.

Impulse response function

A(1) is the limiting impulse response of the levels of the vector y_t = (x_t, z_t)′. To see how cointegration affects A(1), consider a simple case, α = (1, −1)′. The reduced rank of A(1) means

α′ A(1) = 0,  or  (1  −1) [ A(1)_xx  A(1)_xz ;  A(1)_zx  A(1)_zz ] = 0.

Therefore, A(1)_xx = A(1)_zx and A(1)_xz = A(1)_zz, so that each variable's long-run response to a shock must be the same.

6.3 Testing for Cointegration

There are several ways to decide whether variables can be modeled as cointegrated. First, use expert knowledge and economic theory. Second, graph the series and see whether they appear to have a common stochastic trend. Finally, perform statistical tests for cointegration. All three methods should be used in practice. We will consider residual-based statistical tests for cointegration.

Testing for cointegration when the cointegrating vector is known

Sometimes a researcher may know the cointegrating vector from economic theory. For example, the hypothesis of purchasing power parity implies that

P_t = S_t × P*_t,

where P_t is an index of the price level in the U.S., S_t is the exchange rate ($/Chinese Yuan), and P*_t is a price index for China. Taking logs, this equation can be written as

p_t = s_t + p*_t.

A weaker version of the hypothesis is that the variable v_t defined by

v_t = p_t − s_t − p*_t = (1, −1, −1) (p_t, s_t, p*_t)′

is stationary, even though the individual elements (p_t, s_t, p*_t) are all I(1). In this case the cointegrating vector α is known to be (1, −1, −1)′. Testing for cointegration in this case consists of several steps:

1. Verify that p_t, s_t, and p*_t are each individually I(1). This will be true if (a) you test for a unit root in the levels of these series and cannot reject the null hypothesis of a unit root (ADF or other unit root tests), and (b) you test for a unit root in the first differences of these series and reject the null hypothesis of a unit root (ADF or other unit root tests).

2. Test whether the series v_t is stationary.


Table 6.1: Critical values for the Engle-Granger ADF statistic

Number of X's in equation (6.1)    10%      5%      1%
1                                 -3.12    -3.41   -3.96
2                                 -3.52    -3.80   -4.36
3                                 -3.84    -4.16   -4.73
4                                 -4.20    -4.49   -5.07

Testing for cointegration when the cointegrating vector is unknown

Consider an example in which two series y_t and x_t are cointegrated with cointegrating vector α = (1, −θ)′, so that v_t = y_t − θ x_t is stationary. However, the cointegrating coefficient θ is not known. To estimate θ, we can use the Engle-Granger Augmented Dickey-Fuller (EG-ADF) test for cointegration, which consists of the following steps:

1. Verify that y_t and x_t are each individually I(1).

2. Estimate the cointegrating coefficient θ by OLS estimation of the regression y_t = µ + θ x_t + v_t.

3. Use a Dickey-Fuller t-test (with intercept µ but no time trend) to test for a unit root in the residuals from this regression, v̂_t. Since the residuals are estimated in step 2, we need to use different critical values for the unit root test. Critical values for the EG-ADF statistic are given in Table 6.1, which is taken from Stock and Watson (2002). A minimal R sketch of the whole procedure follows this list.
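The following sketch, using the tseries package and simulated data, illustrates the three steps; the data-generating process and the value θ = 0.5 are purely illustrative, and the t-statistic from the residual regression should be compared with the EG-ADF critical values in Table 6.1 rather than with the p-value printed by adf.test().

# EG-ADF test, step by step, on a simulated cointegrated pair
library(tseries)

set.seed(1)
x <- cumsum(rnorm(200))            # I(1) regressor
y <- 1 + 0.5 * x + rnorm(200)      # cointegrated with x (theta = 0.5)

# Step 1: each series should look I(1)
adf.test(y)                        # should not reject a unit root in levels
adf.test(x)
adf.test(diff(y))                  # should reject a unit root in differences
adf.test(diff(x))

# Step 2: estimate theta by OLS
step2 <- lm(y ~ x)

# Step 3: ADF test on the residuals; compare the statistic with Table 6.1
adf.test(residuals(step2))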

If x_t and y_t are cointegrated, then the OLS estimator of the coefficient in this regression is super-consistent. However, the OLS estimator has a non-normal distribution, and inferences based on its t-statistic can be misleading. To avoid this problem, Stock and Watson (1993) developed the dynamic OLS (DOLS) estimator of θ from the following regression:

y_t = µ + θ x_t + Σ_{j=−p}^{p} δ_j Δx_{t−j} + u_t.

If x_t and y_t are cointegrated, statistical inferences about θ and the δ_j's based on HAC standard errors are valid.

If x_t were strictly exogenous, then the coefficient on x_t, θ, would be the long-run cumulative multiplier, that is, the long-run effect on y of a change in x. See the long-run multiplier between oil and gasoline prices in the paper by Borenstein et al. (1997). A small DOLS sketch in R is given below.
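The sketch below runs the DOLS regression with p = 2 leads and lags on the simulated pair from the previous sketch, using HAC standard errors; the packages sandwich and lmtest and the choice p = 2 are assumptions made for illustration only.

library(sandwich)   # NeweyWest() HAC covariance
library(lmtest)     # coeftest()

p  <- 2
N  <- length(y)
tt <- (p + 2):(N - p)                                # usable time indices
dx <- sapply(-p:p, function(j) diff(x)[tt + j - 1])  # Delta x_{t+j}, j = -p,...,p
colnames(dx) <- paste0("dx", -p:p)

dols <- lm(y[tt] ~ x[tt] + dx)                       # DOLS regression
coeftest(dols, vcov = NeweyWest(dols))               # HAC inference on theta and the delta's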

6.4 Cointegrated VAR Models

We start with the autoregressive representation of the levels of y_t, B(L) y_t = e_t:

y_t = −B_1 y_{t−1} − B_2 y_{t−2} − ··· + e_t.

Applying the BN decomposition B(L) = B(1) + (1 − L) B*(L), we obtain

[B(1) + (1 − L) B*(L)] y_t = B(1) y_t + B*(L) Δy_t = e_t,

so that

y_t = −[B_1 + B_2 + ···] y_{t−1} − Σ_{j=1}^{∞} B*_j Δy_{t−j} + e_t.

Subtracting y_{t−1} from both sides, we get

Δy_t = −B(1) y_{t−1} − Σ_{j=1}^{∞} B*_j Δy_{t−j} + e_t.

The matrix B(1) controls the cointegration properties:

1. If B(1) has full rank, any linear combination of y_t is stationary and y_t is stationary in levels. In this case, we run a normal VAR in levels.

2. If B(1) has rank between 0 and full rank, there are some linear combinations of y_t that are stationary, so y_t is cointegrated. In this case the VAR in levels is consistent but inefficient (if you know the cointegrating vector), and the VAR in differences is misspecified.

3. If B(1) has rank zero, then no linear combination of y_t is stationary, Δy_t is stationary, and there is no cointegration. In this case we run a normal VAR in differences.

Error Correction Representation

If B(1) has less than full rank, we can express it as

B(1) = γ α′.

If there are K cointegrating vectors, then the rank of B(1) is K and γ and α each have K columns. Then the system can be rewritten as

Δy_t = −γ α′ y_{t−1} − Σ_{j=1}^{∞} B*_j Δy_{t−j} + e_t,

where α′ is a K × N matrix of cointegrating vectors. The above expression is the well-known error-correction model (ECM) representation of the integrated system. It is not easy to estimate this model when the cointegrating vectors in α are unknown.

Consider a multivariate model consisting of two variables x_t and z_t which are individually I(1). One may model these two variables using one of the following:

1. A VAR model in levels

2. A VAR in first differences


3. An ECM representation.

With cointegration, a pure VAR in differences,

Δx_t = a(L) Δx_{t−1} + b(L) Δz_{t−1} + e_t
Δz_t = c(L) Δx_{t−1} + d(L) Δz_{t−1} + v_t,

is misspecified: looking at the error-correction form, there is a missing regressor, α_x x_{t−1} + α_z z_{t−1}. This is a problem. A pure VAR in levels is a little unconventional since the variables in the model are nonstationary. The VAR in levels is not misspecified and the estimates are consistent, but the coefficients may have non-standard distributions and they are not efficient: if there is cointegration, it imposes restrictions on B(1) that are not imposed in a pure VAR in levels. Cochrane (1994) suggested that one way to impose cointegration is to run an error-correction VAR:

Δx_t = γ_x (α_x x_{t−1} + α_z z_{t−1}) + a(L) Δx_{t−1} + b(L) Δz_{t−1} + e_t
Δz_t = γ_z (α_x x_{t−1} + α_z z_{t−1}) + c(L) Δx_{t−1} + d(L) Δz_{t−1} + v_t.

This specification imposes that x and z are cointegrated with cointegrating vector α. This is very useful if you know that the variables are cointegrated and you know the cointegrating vector. Otherwise, you have to pre-test for cointegration and estimate the cointegrating vector in a separate step.

Another difficulty with the error-correction form is that it does not fit nicely into standard VAR packages. A way to use standard packages is to estimate the companion form:

Δx_t = a(L) Δx_{t−1} + b(L) (α_x x_{t−1} + α_z z_{t−1}) + e_t
α_x x_t + α_z z_t = c(L) Δx_{t−1} + d(L) (α_x x_{t−1} + α_z z_{t−1}) + v_t.

We need to know the cointegrating vector to use this procedure. There is much debate as to which approach is best. When you do not really know whether there is cointegration or what the cointegrating vector is, the VAR in levels seems to be better. When you know that there is cointegration and what the cointegrating vector is, the error-correction model or the VAR in companion form is better.

Some unit root and cointegration tests are provided by the packages tseries, urca, and uroot in R. For example, in tseries, the function po.test() implements the Phillips-Ouliaris (1990) test of the null hypothesis that the series in x are not cointegrated. There are many other tests available in the packages urca and uroot; for details, see their manuals. A short usage sketch follows.
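As an illustration, the sketch below applies po.test() to the simulated cointegrated pair from the EG-ADF sketch; the Johansen procedure from urca, shown in the comment, is one of the alternatives mentioned above (the lag choice K = 2 is arbitrary).

library(tseries)

po.test(cbind(y, x))    # null hypothesis: y and x are NOT cointegrated

# Alternative, using the Johansen trace test from the urca package:
# library(urca)
# summary(ca.jo(cbind(y, x), type = "trace", K = 2))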

6.5 Problems

1. The interest rates may be found in the Excel file "IntRates.xls". Check whether the 90-day T-bill rate and the 10-year T-bond interest rate are cointegrated. Carefully explain how you conduct the test and how you interpret your findings.

2. Download data (by yourself) from the web to obtain the consumer price index (CPI), the producer price index (PPI), the three-month T-bill rate, the index of industrial production, and the S&P 500 common stock price index. The data for industrial production are seasonally adjusted while all other variables are not seasonally adjusted. Conduct a test of cointegration between the variables. Explain your results.

3. Collect the following macroeconomic annual time series of China from the China Statistical Yearbook: GNP, GDP, GDP1 (GDP of primary industry), GDP2 (GDP of secondary industry), GDP3 (GDP of tertiary industry), and per capita GDP. From both the nominal and real terms of the definitions of national products, derive the corresponding price deflators.

(a) Define and test the unit roots for each of the 18 economic variables (nominal, real, and deflator of GDPs) assuming no structural break in the series.

(b) Define and test the unit roots for each of the 18 economic variables (nominal, real, and deflator of GDPs) assuming a one-time structural break in the series.

(c) Conduct cointegration tests for some of the 18 economic variables. Explain your results.

6.6 References

Bernanke, B.S., J. Boivin and P. Eliasz (2005). Measuring the effects of monetary policy: a factor-augmented vector autoregressive (FAVAR) approach. The Quarterly Journal of Economics, 120, 387-422.

Borenstein, S., A.C. Cameron and R. Gilbert (1997). Do gasoline prices respond asymmetrically to crude oil price changes? The Quarterly Journal of Economics, 112, 305-339.

Christiano, L.J., M. Eichenbaum and C.L. Evans (2000). Monetary policy shocks: what have we learned and to what end? Handbook of Macroeconomics, Vol. 1A.

Cochrane, J.H. (1994). Shocks. NBER working paper #46984.

Cochrane, J.H. (1997). Time series for macroeconomics and finance. Lecture Notes.

Engle, R.F. and C.W.J. Granger (1987). Cointegration and error correction: Representation, estimation and testing. Econometrica, 55, 251-276.

Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press.

Hendry, D.F. and K. Juselius (2000). Explaining cointegration analysis: Part I. Journal of Energy, 21, 1-42.

Phillips, P.C.B. and S. Ouliaris (1990). Asymptotic properties of residual based tests for cointegration. Econometrica, 58, 165-193.

Stock, J.H. and M.W. Watson (1993). A simple estimator of cointegrating vectors in higher order integrated systems. Econometrica, 61, 1097-1107.

Stock, J.H. and M.W. Watson (2003). Introduction to Econometrics. Addison-Wesley.


Chapter 7

Nonparametric Density, Distribution & Quantile Estimation

7.1 Mixing Conditions

It is well known that α-mixing includes many time series models as special cases. In fact, under very mild assumptions, linear autoregressive and, more generally, bilinear time series models are α-mixing with mixing coefficients decaying exponentially. Many nonlinear time series models, such as functional coefficient autoregressive processes with/without exogenous variables, ARCH and GARCH type processes, stochastic volatility models, and nonlinear additive autoregressive models with/without exogenous variables, are strong mixing under some mild conditions. See Cai (2002) and Chen and Tang (2005) for more details.

To simplify the notation, we only introduce mixing conditions for strictly stationary processes (in spite of the fact that a mixing process is not necessarily stationary). The idea is to define mixing coefficients measuring the strength (in different ways) of dependence between two segments of a time series that are apart from each other in time.


Let {X_t} be a strictly stationary time series. For n ≥ 1, define

α(n) = sup_{A ∈ F_{−∞}^{0}, B ∈ F_{n}^{∞}} |P(A) P(B) − P(AB)|,

where F_i^j denotes the σ-algebra generated by {X_t; i ≤ t ≤ j}. If α(n) → 0 as n → ∞, {X_t} is called α-mixing or strong mixing. There are several other mixing conditions, such as β-mixing, ρ-mixing, φ-mixing, and ψ-mixing; see the books by Hall and Heyde (1980) and Fan and Yao (2003, page 68). It is well known that the relationships among the mixing conditions are ψ-mixing ⇒ φ-mixing ⇒ β-mixing and ρ-mixing ⇒ α-mixing.

Lemma 1 (Davydov's inequality): (i) If E|X_i|^p + E|X_j|^q < ∞ for some p ≥ 1 and q ≥ 1 with 1/p + 1/q < 1, it holds that

|Cov(X_i, X_j)| ≤ 8 α^{1/r}(|j − i|) {E|X_i|^p}^{1/p} {E|X_j|^q}^{1/q},

where r = (1 − 1/p − 1/q)^{−1}.

(ii) If P(|X_i| ≤ C_1) = 1 and P(|X_j| ≤ C_2) = 1 for some constants C_1 and C_2, it holds that

|Cov(X_i, X_j)| ≤ 4 α(|j − i|) C_1 C_2.

Note that if we allow X_i and X_j to be complex-valued random variables, (ii) still holds with the coefficient "4" on the right-hand side of the inequality replaced by "16".

7.2 Density Estimate

Let {X_i} be a random sample with an (unknown) marginal distribution F(·) (CDF) and probability density function (PDF) f(·). The question is how to estimate f(·) and F(·). Since

F(x) = P(X_i ≤ x) = E[I(X_i ≤ x)] = ∫_{−∞}^{x} f(u) du

and

f(x) = lim_{h→0} [F(x + h) − F(x − h)] / (2h) ≈ [F(x + h) − F(x − h)] / (2h)

if h is very small, by the method of moment estimation (MME), F(x) can be estimated by

F_n(x) = (1/n) Σ_{i=1}^{n} I(X_i ≤ x),

which is called the empirical cumulative distribution function (ecdf), so that f(x) can be estimated by

f_n(x) = [F_n(x + h) − F_n(x − h)] / (2h) = (1/n) Σ_{i=1}^{n} K_h(X_i − x),

where K(u) = I(|u| ≤ 1)/2 and K_h(u) = K(u/h)/h. Indeed, the kernel function K(u) can be taken to be any symmetric density function.

Exercise: Please show that F_n(x) is an unbiased estimate of F(x) but f_n(x) is a biased estimate of f(x). Think intuitively about (1) why f_n(x) is biased, (2) where the bias comes from, and (3) why K(·) should be symmetric.

7.2.1 Asymptotic Properties

Let us look at the variance of these estimators. If {X_i} is stationary, then

n Var(F_n(x)) = Var(I(X_i ≤ x)) + 2 Σ_{i=2}^{n} (1 − (i − 1)/n) Cov(I(X_1 ≤ x), I(X_i ≤ x))
             = F(x)[1 − F(x)] + 2 Σ_{i=2}^{n} Cov(I(X_1 ≤ x), I(X_i ≤ x))
               − 2 Σ_{i=2}^{n} ((i − 1)/n) Cov(I(X_1 ≤ x), I(X_i ≤ x)).

The last term goes to 0 by the Kronecker lemma, and the first two terms converge (assuming the limit is finite) to

σ²_F(x) ≡ F(x)[1 − F(x)] + 2 Σ_{i=2}^{∞} Cov(I(X_1 ≤ x), I(X_i ≤ x)),

where the second term, 2 Σ_{i=2}^{∞} Cov(I(X_1 ≤ x), I(X_i ≤ x)), is called A_d. Therefore,

n Var(F_n(x)) → σ²_F(x).   (7.1)

It is clear that A_d = 0 if {X_i} are independent. If A_d ≠ 0, the question is how to estimate it. We can use the HC estimator of White (1980) or the HAC estimator of Newey and West (1987); see Section 3.10.

Next, we derive the asymptotic variance of f_n(x). First, define Z_i = K_h(X_i − x). Then,

E[Z_1 Z_i] = ∫∫ K_h(u − x) K_h(v − x) f_{1,i}(u, v) du dv = ∫∫ K(u) K(v) f_{1,i}(x + uh, x + vh) du dv → f_{1,i}(x, x),

where f_{1,i}(u, v) is the joint density of (X_1, X_i), so that

Cov(Z_1, Z_i) → f_{1,i}(x, x) − f²(x).

It is easy to show that

h Var(Z_1) → ν_0(K) f(x),

where ν_j(K) = ∫ u^j K²(u) du. Therefore,

n h Var(f_n(x)) = h Var(Z_1) + 2h Σ_{i=2}^{n} (1 − (i − 1)/n) Cov(Z_1, Z_i) ≡ h Var(Z_1) + A_f → ν_0(K) f(x),

since A_f → 0 under some assumptions.


To show that A_f → 0, let d_n → ∞ and d_n h → 0. Then,

|A_f| ≤ h Σ_{i=2}^{d_n} |Cov(Z_1, Z_i)| + h Σ_{i=d_n+1}^{n} |Cov(Z_1, Z_i)|.

For the first term, if f_{1,i}(u, v) ≤ M_1, then it is bounded by h d_n = o(1). For the second term, we apply Davydov's inequality to obtain

h Σ_{i=d_n+1}^{n} |Cov(Z_1, Z_i)| ≤ M_2 Σ_{i=d_n+1}^{n} α(i)/h = O(d_n^{−δ+1} h^{−1})

if α(n) = O(n^{−δ}) for some δ > 2. If d_n = O(h^{−2/δ}), then the second term is dominated by O(h^{1−2/δ}), which goes to 0 as n → ∞. Hence,

n h Var(f_n(x)) → ν_0(K) f(x).   (7.2)

We can establish the following asymptotic normality for f_n(x); the proof will be discussed later.

Theorem 1: Under regularity conditions, we have

√(n h) { f_n(x) − f(x) − (h²/2) µ_2(K) f″(x) + o_p(h²) } → N(0, ν_0(K) f(x)).

Exercise: By comparing (7.1) and (7.2), what can you observe?

Example 1: Let us examine how important the choice of bandwidth is. The data {X_i}_{i=1}^{n} are generated from N(0, 1) (iid) with n = 300. The grid points are taken on [−4, 4] with an increment of 0.1. The bandwidth is taken to be 0.25, 0.5, and 1.0, respectively, and the kernel can be the Epanechnikov kernel or the Gaussian kernel.

Example 2: Next, we apply kernel density estimation to the weekly 3-month Treasury bill rate from January 2, 1970 to December 26, 1997.


Note that the computer code in R for the above two examples can be found in Section 7.5. R has a built-in function density() for computing the nonparametric density estimate, and you can use the command plot(density()) to plot the estimated density. Further, R has a built-in function ecdf() for computing the empirical cumulative distribution function and plot(ecdf()) for plotting the step function. A quick illustration follows.
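The snippet below, on a simulated N(0, 1) sample (illustrative only), shows both built-in estimators side by side.

set.seed(2)
x <- rnorm(300)

plot(density(x, kernel = "epanechnikov", bw = 0.5),
     main = "Kernel density estimate")        # smoothed estimate of f(x)
curve(dnorm(x), add = TRUE, lty = 2)          # true N(0,1) density for comparison

plot(ecdf(x), main = "Empirical CDF")         # step-function estimate of F(x)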

7.2.2 Optimality

As we have already shown,

E(f_n(x)) = f(x) + (h²/2) µ_2(K) f″(x) + o(h²)

and

Var(f_n(x)) = ν_0(K) f(x) / (n h) + o((n h)^{−1}),

so that the asymptotic mean integrated squared error (AMISE) is

AMISE = (h⁴/4) µ_2²(K) ∫ [f″(x)]² dx + ν_0(K)/(n h).

Minimizing the AMISE gives

h_opt = C_1(K) ||f″||_2^{−2/5} n^{−1/5},   (7.3)

where C_1(K) = {ν_0(K)/µ_2²(K)}^{1/5}. With this asymptotically optimal bandwidth, the optimal AMISE is given by

AMISE_opt = (5/4) C_2(K) ||f″||_2^{2/5} n^{−4/5},

where C_2(K) = {ν_0²(K) µ_2(K)}^{2/5}.

To choose the best kernel, it suffices to choose the one that minimizes C_2(K).

Proposition 1: The nonnegative probability density function K minimizing C_2(K) is a re-scaling of the Epanechnikov kernel:

K_opt(u) = (3/(4a)) (1 − u²/a²)_+

for any a > 0.

Proof: First of all, we note that C_2(K_h) = C_2(K) for any h > 0. Let K_0 be the Epanechnikov kernel. For any other nonnegative K, by re-scaling if necessary, we may assume that µ_2(K) = µ_2(K_0). Thus, we need only show that ν_0(K_0) ≤ ν_0(K). Let G = K − K_0. Then

∫ G(u) du = 0  and  ∫ u² G(u) du = 0,

which implies that

∫ (1 − u²) G(u) du = 0.

Using this and the fact that K_0 has support [−1, 1], we have

∫ G(u) K_0(u) du = (3/4) ∫_{|u|≤1} G(u)(1 − u²) du = −(3/4) ∫_{|u|>1} G(u)(1 − u²) du = (3/4) ∫_{|u|>1} K(u)(u² − 1) du.

Since K is nonnegative, so is the last term. Therefore,

∫ K²(u) du = ∫ K_0²(u) du + 2 ∫ K_0(u) G(u) du + ∫ G²(u) du ≥ ∫ K_0²(u) du,

which proves that K_0 is the optimal kernel.

Remark: This proposition implies that the Epanechnikov kernel should be used in practice.


7.2.3 Boundary Correction

In many applications, the density f(·) has bounded support. For example, the interest rate cannot be less than zero and income is always nonnegative. It is reasonable to assume that the interest rate has support [0, 1). However, because a kernel density estimator spreads point masses smoothly around the observed data points, some of the mass placed near the boundary of the support is distributed outside the support of the density. Therefore, the kernel density estimator underestimates the density in the boundary regions. The problem is more severe for large bandwidths and for the left boundary, where the density is high. Therefore, some adjustments are needed. To gain some further insight, assume without loss of generality that the density function f(·) has bounded support [0, 1] and consider the density estimate at the left boundary. For simplicity, suppose that K(·) has support [−1, 1]. For the left boundary point x = c h (0 ≤ c < 1), it can easily be seen that as h → 0,

E(f_n(ch)) = ∫_{−c}^{1/h−c} f(ch + hu) K(u) du = f(0+) µ_{0,c}(K) + h f′(0+)[c µ_{0,c}(K) + µ_{1,c}(K)] + o(h),   (7.4)

where f(0+) = lim_{x↓0} f(x),

µ_{j,c}(K) = ∫_{−c}^{∞} u^j K(u) du,  and  ν_{j,c}(K) = ∫_{−c}^{∞} u^j K²(u) du.

Also, we can show that Var(f_n(ch)) = O(1/nh). Therefore,

f_n(ch) = f(0+) µ_{0,c}(K) + h f′(0+)[c µ_{0,c}(K) + µ_{1,c}(K)] + o_p(h).

In particular, if c = 0 and K(·) is symmetric, then E(f_n(0)) = f(0)/2 + o(1).

There are several methods to deal with density estimation at boundary points. Possible approaches include the boundary kernel (see Gasser and Muller (1979) and Muller (1993)), reflection (see Schuster (1985) and Hall and Wehrly (1991)), transformation (see Wand, Marron and Ruppert (1991) and Marron and Ruppert (1994)), local polynomial fitting (see Hjort and Jones (1996) and Loader (1996)), and others.

Boundary Kernel

One way of choosing a boundary kernel is

K_{(c)}(u) = [12 / (1 + c)⁴] (1 + u) { (1 − 2c) u + (3c² − 2c + 1)/2 } I_{[−1, c]}(u).

Note that K_{(1)}(t) = K(t), the Epanechnikov kernel defined above. Moreover, Zhang and Karunamuni (1998) have shown that this kernel is optimal in the sense of minimizing the MSE in the class of all kernels of order (0, 2) with exactly one change of sign in their support. The downside of the boundary kernel is that it is not necessarily nonnegative, as will be seen on densities where f(0) = 0.

Reflection

The reflection method constructs the kernel density estimate based on the synthetic data {±X_t; 1 ≤ t ≤ n}, where the "reflected" data are {−X_t; 1 ≤ t ≤ n} and the original data are {X_t; 1 ≤ t ≤ n}. This results in the estimate

f_n(x) = (1/n) { Σ_{t=1}^{n} K_h(X_t − x) + Σ_{t=1}^{n} K_h(−X_t − x) },  for x ≥ 0.

Note that when x is away from the boundary, the second term above is practically negligible. Hence, it only corrects the estimate in the boundary region. This estimator is twice the kernel density estimate based on the synthetic data {±X_t; 1 ≤ t ≤ n}; a small sketch follows.
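The sketch below implements the reflection idea with density() on an exponential sample; the sample, bandwidth, and grid are illustrative only.

set.seed(3)
x <- rexp(500)
h <- 0.3
grid <- seq(0, 5, by = 0.01)

kde_naive   <- density(x, bw = h, from = 0, to = 5, n = length(grid))
kde_reflect <- density(c(x, -x), bw = h, from = 0, to = 5, n = length(grid))
kde_reflect$y <- 2 * kde_reflect$y      # twice the KDE of the reflected sample

plot(grid, dexp(grid), type = "l", lty = 3, ylab = "density")   # true density
lines(kde_naive, col = 2)                                       # underestimates near 0
lines(kde_reflect$x, kde_reflect$y, col = 4)                    # boundary-corrected
legend("topright", c("true", "naive", "reflection"),
       lty = c(3, 1, 1), col = c(1, 2, 4))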


Transformation

The transformation method first transforms the data by Y_i = g(X_i), where g(·) is a given monotone increasing function ranging from −∞ to ∞. One then applies the kernel density estimator to the transformed data to obtain the estimate f_n(y) for Y, and applies the inverse transform to obtain the density of X. Therefore,

f_n(x) = g′(x) (1/n) Σ_{t=1}^{n} K_h(g(X_t) − g(x)).

The density at x = 0 corresponds to the tail density of the transformed data since log(0) = −∞, which usually cannot be estimated well due to the lack of data in the tails. Except at this point, the transformation method does a fairly good job. Since g(·) is unknown in many situations, Karunamuni and Alberts (2003) suggested a parametric form and then estimated the parameter; they also considered other types of transformations. A small sketch with g(x) = log(x) follows.
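The following sketch applies the transformation method with g(x) = log(x) to a log-normal sample; the data, grid, and bandwidth rule are illustrative assumptions.

set.seed(4)
x <- rlnorm(500)
y <- log(x)                                   # transformed data

grid  <- seq(0.05, 5, by = 0.01)
kde_y <- density(y, bw = "SJ", n = 1024)      # KDE on the transformed scale
f_y   <- approx(kde_y$x, kde_y$y, xout = log(grid))$y
f_x   <- f_y / grid                           # back-transform: f_X(x) = f_Y(log x) / x

plot(grid, dlnorm(grid), type = "l", lty = 3, ylab = "density")  # true density
lines(grid, f_x, col = 4)                                        # transformation estimate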

Local Likelihood Fitting

The main idea is to consider the approximation log(f(X_t)) ≈ P(X_t − x), where P(u − x) = Σ_{j=0}^{p} a_j (u − x)^j, together with the localized version of the log-likelihood

Σ_{t=1}^{n} log(f(X_t)) K_h(X_t − x) − n ∫ K_h(u − x) f(u) du.

With this approximation, the local likelihood becomes

L(a_0, ..., a_p) = Σ_{t=1}^{n} P(X_t − x) K_h(X_t − x) − n ∫ K_h(u − x) exp(P(u − x)) du.

Let {â_j} be the maximizer of the above local likelihood L(a_0, ..., a_p). Then the local likelihood density estimate is

f_n(x) = exp(â_0).

If the maximizer does not exist, then f_n(x) = 0. See Loader (1996) and Hjort and Jones (1996) for more details. If R is used for the local likelihood density fit, use the function density.lf() in the package locfit.

Exercise: Please conduct a Monte Carlo simulation to see what the boundary effects are and how the correction methods work. For example, you can consider some densities with finite support, such as the beta distribution.

7.3 Distribution Estimation

7.3.1 Smoothed Distribution Estimation

The question is how to obtain a smoothed estimate of the CDF F(x). One way of doing so is to integrate the estimated PDF f_n(x), which gives

F̃_n(x) = ∫_{−∞}^{x} f_n(u) du = (1/n) Σ_{i=1}^{n} 𝒦((x − X_i)/h),

where 𝒦(x) = ∫_{−∞}^{x} K(u) du is the CDF of K(·). Why do we need this smoothed estimate of the CDF? To answer this question, we need to consider the mean squared error (MSE).

First, we derive the asymptotic bias. By integration by parts, we have

E[F̃_n(x)] = E[𝒦((x − X_i)/h)] = ∫ F(x − hu) K(u) du = F(x) + (h²/2) µ_2(K) f′(x) + o(h²).

Next, we derive the asymptotic variance. Since

E[𝒦²((x − X_i)/h)] = ∫ F(x − hu) b(u) du = F(x) − h f(x) λ + o(h),

where b(u) = 2 K(u) 𝒦(u) and λ = ∫ u b(u) du, we obtain

Var[𝒦((x − X_i)/h)] = F(x)[1 − F(x)] − h f(x) λ + o(h).

Define I_j(x) = Cov(I(X_1 ≤ x), I(X_{j+1} ≤ x)) = F_j(x, x) − F²(x) and

I_{nj}(x) = Cov( 𝒦((x − X_1)/h), 𝒦((x − X_{j+1})/h) ).

By means of Lemma 2 in Lehmann (1966), the covariance I_{nj}(x) may be written as

I_{nj}(x) = ∫∫ { P( 𝒦((x − X_1)/h) > u, 𝒦((x − X_{j+1})/h) > v ) − P( 𝒦((x − X_1)/h) > u ) P( 𝒦((x − X_{j+1})/h) > v ) } du dv.

Inverting the CDF 𝒦(·) and making two changes of variables, the above relation becomes

I_{nj}(x) = ∫∫ [ F_j(x − hu, x − hv) − F(x − hu) F(x − hv) ] K(u) K(v) du dv.

Expanding the right-hand side of the above equation according to Taylor's formula, we obtain

|I_{nj}(x) − I_j(x)| ≤ C h².

By Davydov's inequality (see Lemma 1), we also have

|I_{nj}(x) − I_j(x)| ≤ C α(j),

so that for any 1/2 < τ < 1,

|I_{nj}(x) − I_j(x)| ≤ C h^{2τ} α^{1−τ}(j).

Therefore,

(1/n) Σ_{j=1}^{n−1} (n − j) |I_{nj}(x) − I_j(x)| ≤ Σ_{j=1}^{n−1} |I_{nj}(x) − I_j(x)| ≤ C h^{2τ} Σ_{j=1}^{∞} α^{1−τ}(j) = O(h^{2τ}),

provided that Σ_{j=1}^{∞} α^{1−τ}(j) < ∞ for some 1/2 < τ < 1. Indeed, this assumption is satisfied if α(n) = O(n^{−δ}) for some δ > 2. By stationarity, it is clear that

n Var(F̃_n(x)) = Var( 𝒦((x − X_1)/h) ) + (2/n) Σ_{j=1}^{n−1} (n − j) I_{nj}(x).

Therefore,

n Var(F̃_n(x)) = F(x)[1 − F(x)] − h f(x) λ + o(h) + 2 Σ_{j=1}^{∞} I_j(x) + O(h^{2τ})
             = σ²_F(x) − h f(x) λ + o(h).

We can establish the following asymptotic normality for F̃_n(x); the proof will be discussed later.

Theorem 2: Under regularity conditions, we have

√n { F̃_n(x) − F(x) − (h²/2) µ_2(K) f′(x) + o_p(h²) } → N(0, σ²_F(x)).

Similarly, we have

n AMSE(F̃_n(x)) = (n h⁴/4) µ_2²(K) [f′(x)]² + σ²_F(x) − h f(x) λ.

If λ > 0, minimizing the AMSE gives

h_opt = { λ f(x) / (µ_2²(K) [f′(x)]²) }^{1/3} n^{−1/3},

and with this asymptotically optimal bandwidth, the optimal AMSE is given by

n AMSE_opt(F̃_n(x)) = σ²_F(x) − (3/4) { λ² f²(x) / (µ_2(K) f′(x)) }^{2/3} n^{−1/3}.

Remark: From the above equation, we can see that if λ > 0, the AMSE of F̃_n(x) can be smaller than that of F_n(x) in the second order. Also, it is easy to show that if K(·) is the Epanechnikov kernel, then λ > 0.

7.3.2 Relative Efficiency and Deficiency

To measure the relative efficiency and deficiency of F̃_n(x) over F_n(x), we define

i(n) = min{ k ∈ {1, 2, ...}; MSE(F_k(x)) ≤ MSE(F̃_n(x)) }.

We have the following results; the detailed proof can be found in Cai and Roussas (1998).

Proposition 2: (i) Under regularity conditions,

i(n)/n → 1  if and only if  n h_n⁴ → 0.

(ii) Under regularity conditions,

[i(n) − n]/(n h_n) → λ(x)  if and only if  n h_n³ → 0,

where λ(x) = f(x) λ / σ²_F(x).

Remark: It is clear that the quantity λ(x) may be looked upon as a way of measuring the performance of the estimate F̃_n(x). Suppose that the kernel K(·) is chosen so that λ > 0, which is equivalent to λ(x) > 0. Then, for sufficiently large n, i(n) > n + n h_n (λ(x) − ε). Thus, i(n) is substantially larger than n and, indeed, i(n) − n tends to ∞. Actually, Reiss (1981) and Falk (1983) posed the question of determining the exact value of the superiority of λ over a certain class of kernels. More specifically, let K_m be the class of kernels K : [−1, 1] → ℝ which are absolutely continuous and satisfy the requirements K(−1) = 0, K(1) = 1, and

∫_{−1}^{1} u^µ K(u) du = 0,  µ = 1, ..., m,  for some m = 0, 1, ...

(where the moment condition is vacuous for m = 0). Set λ_m = sup{λ; K ∈ K_m}. Then, Mammitzsch (1984) answered the question posed in an elegant manner. See Cai and Roussas (1998) for more details and simulation results.

7.4 Quantile Estimation

Let X_(1) ≤ X_(2) ≤ ··· ≤ X_(n) denote the order statistics of {X_t}_{t=1}^{n}. Define the inverse of F(x) as F^{−1}(p) = inf{x ∈ ℝ; F(x) ≥ p}, where ℝ is the real line. The traditional estimate of F(x) has been the empirical distribution function F_n(x) based on X_1, ..., X_n, while the estimate of the p-th quantile ξ_p = F^{−1}(p), 0 < p < 1, is the sample quantile ξ_{pn} = F_n^{−1}(p) = X_([np]), where [x] denotes the integer part of x. It is a consistent estimator of ξ_p for α-mixing data (Yoshihara, 1995). However, as stated in Falk (1983), F_n(x) does not take into account the smoothness of F(x), i.e., the existence of a probability density function f(x). In order to incorporate this characteristic, investigators have proposed several smoothed quantile estimates, one of which is based on F̃_n(x), obtained as a convolution between F_n(x) and a properly scaled kernel function; see the previous section. Finally, note that R has a command quantile() which can be used to compute ξ_{pn}, the nonparametric sample quantile. A short illustration follows.
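The snippet below compares the raw sample quantile with a smoothed quantile obtained by numerically inverting a kernel-smoothed CDF; the simulated data, the Gaussian integrated kernel (pnorm), and the bandwidth are illustrative assumptions.

set.seed(5)
x <- rnorm(500)
p <- 0.05
h <- 0.3

quantile(x, probs = p, type = 1)       # raw sample quantile X_([np])

# Smoothed CDF: F~_n(t) = (1/n) sum_i pnorm((t - X_i)/h)
F_tilde <- function(t) mean(pnorm((t - x) / h))
uniroot(function(t) F_tilde(t) - p, range(x))$root   # smoothed quantile estimate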

7.4.1 Value at Risk

Value at Risk (VaR) is a popular measure of market risk associated with an asset or a portfolio of assets. It has been chosen by the Basel Committee on Banking Supervision as a benchmark risk measure and has been used by financial institutions for asset management and the minimization of risk. Let {X_t}_{t=1}^{n} be the market value of an asset over n periods of one time unit each, and let Y_t = log(X_t / X_{t−1}) be the log-returns. Suppose {Y_t}_{t=1}^{n} is a strictly stationary dependent process with marginal distribution function F(y). Given a positive value p close to zero, the (1 − p)-level VaR is

ν_p = inf{u : F(u) ≥ p},

which specifies the smallest amount of loss such that the probability of the loss in market value being larger than ν_p is less than p. Comprehensive discussions of VaR are available in Duffie and Pan (1997) and Jorion (2001), and the references therein. Therefore, VaR can be regarded as a special case of a quantile. R has a package called VaR with a set of methods for the calculation of VaR, particularly for parametric models.

Another popular risk measure is the expected shortfall (ES), which is the expected loss given that the loss is at least as large as some given quantile of the loss distribution (e.g., the VaR). It is well known from Artzner, Delbaen, Eber and Heath (1999) that ES is a coherent risk measure in that it satisfies the four axioms: homogeneity (increasing the size of a portfolio by a factor should scale its risk measure by the same factor), monotonicity (a portfolio must have greater risk if it has systematically lower values than another), the risk-free condition or translation invariance (adding some amount of cash to a portfolio should reduce its risk by the same amount), and subadditivity (the risk of a portfolio must be no more than the sum of the separate risks, so that merging portfolios cannot increase risk). VaR satisfies homogeneity, monotonicity, and the risk-free condition, but it is not subadditive. See Artzner et al. (1999) for details.


7.4.2 Nonparametric Quantile Estimation

The smoothed sample quantile estimate of ξ_p, denoted ξ̃_p and based on F̃_n(x), is defined by

ξ̃_p = F̃_n^{−1}(p) = inf{ x ∈ ℝ; F̃_n(x) ≥ p }.

ξ̃_p is referred to in the literature as the perturbed (smoothed) sample quantile. Asymptotic properties of ξ̃_p, both under independence and under certain modes of dependence, have been investigated extensively in the literature; see Cai and Roussas (1997) and Chen and Tang (2005).

By the differentiability of F̃_n(x), we use a Taylor expansion and ignore the higher-order terms to obtain

F̃_n(ξ̃_p) = p ≈ F̃_n(ξ_p) − f_n(ξ_p) (ξ̃_p − ξ_p),   (7.5)

so that

ξ̃_p − ξ_p ≈ [F̃_n(ξ_p) − p]/f_n(ξ_p) ≈ [F̃_n(ξ_p) − p]/f(ξ_p),

since f_n(x) is a consistent estimator of f(x). As an application of Theorem 2, we can establish the following theorem for the asymptotic normality of ξ̃_p; the proof is omitted since it is similar to that of Theorem 2.

Theorem 3: Under regularity conditions, we have

√n { ξ̃_p − ξ_p − (h²/2) µ_2(K) f′(ξ_p)/f(ξ_p) + o_p(h²) } → N(0, σ²_F(ξ_p)/f²(ξ_p)).

Next, let us examine the AMSE. To this end, we can derive the asymptotic bias and variance. From the previous section, we have

E[ξ̃_p] = ξ_p + (h²/2) µ_2(K) f′(ξ_p)/f(ξ_p) + o(h²)

and

n Var[ξ̃_p] = σ²_F(ξ_p)/f²(ξ_p) − h λ/f(ξ_p) + o(h).

Therefore, the AMSE is

n AMSE(ξ̃_p) = (n h⁴/4) µ_2²(K) [f′(ξ_p)/f(ξ_p)]² + σ²_F(ξ_p)/f²(ξ_p) − h λ/f(ξ_p).

If λ > 0, minimizing the AMSE gives

h_opt = { λ f(ξ_p) / (µ_2²(K) [f′(ξ_p)]²) }^{1/3} n^{−1/3},

and with this asymptotically optimal bandwidth, the optimal AMSE is given by

n AMSE_opt(ξ̃_p) = σ²_F(ξ_p)/f²(ξ_p) − (3/4) { λ² / (µ_2(K) f′(ξ_p) f(ξ_p)) }^{2/3} n^{−1/3},

which indicates a second-order reduction in the AMSE. Chen and Tang (2005) conducted an intensive simulation study to demonstrate the advantages of the nonparametric estimate ξ̃_p over the sample quantile ξ_{pn} in the VaR setting. We refer to Chen and Tang (2005) for simulation results and empirical examples.

Exercise: Please use the above procedures to estimate the ES nonparametrically and discuss its properties, as well as conduct simulation studies and empirical applications.

7.5 Computer Code

# July 20, 2006

graphics.off() # close any previous graphs on the screen


#########################################################

# Define the Epanechnikov kernel function

kernel<-function(x){0.75*(1-x^2)*(abs(x)<=1)}

###############################################################

# Define the kernel density estimator

kernden=function(x,z,h,ker){

# parameters: x=variable; h=bandwidth; z=grid point; ker=kernel

nz<-length(z)

nx<-length(x)

x0=rep(1,nx*nz)

dim(x0)=c(nx,nz)

x1=t(x0)

x0=x*x0

x1=z*x1

x0=x0-t(x1)

if(ker==1){x1=kernel(x0/h)} # Epanechnikov kernel

if(ker==0){x1=dnorm(x0/h)} # normal kernel

f1=apply(x1,2,mean)/h

return(f1)

}

###################################################################

###################################################################

# Simulation for different bandwidths and different kernels

n=300 # n=300

ker=1 # ker=1 => Epan; ker=0 => Gaussian

h0=c(0.25,0.5,1) # set initial bandwidths

z=seq(-4,4,by=0.1) # grid points

nz=length(z) # number of grid points

x=rnorm(n) # simulate x ~ N(0, 1)

if(ker==1){h_o=2.34*n^{-0.2}} # optimal bandwidth for Epanechnikov


if(ker==0){h_o=1.06*n^{-0.2}} # optimal bandwidth for normal

f1=kernden(x,z,h0[1],ker)

f2=kernden(x,z,h0[2],ker)

f3=kernden(x,z,h0[3],ker)

f4=kernden(x,z,h_o,ker)

text1=c("True","h=0.25","h=0.5","h=1","h=h_o")

data=cbind(dnorm(z),f1,f2,f3,f4) # combine them

win.graph()

matplot(z,data,type="l",lty=1:5,col=1:5,xlab="",ylab="")

legend(-1,0.2,text1,lty=1:5,col=1:5)

###################################################################

###################################################################

# A Real Example

##################

z1=matrix(scan(file="c:\\teaching\\time series\\data\\w-3mtbs7097.txt"),

byrow=T,ncol=4)

# data: weekly 3-month Treasury bill from 1970 to 1997

x=z1[,4]/100 # decimal

n=length(x)

y=diff(x) # Delta x_t=x_t-x_{t-1}=change

x=x[1:(n-1)]

n=n-1

x_star=(x-mean(x))/sqrt(var(x)) # standardized

den_3mtb=density(x_star,bw=0.30,kernel=c("epanechnikov"),from=-3,to=3)

den_est=den_3mtb$y # estimated density values

z_star=seq(-3,3,by=0.1)

text1=c("Estimated Density","Standard Norm")

win.graph()


par(bg="light green")

plot(den_3mtb,main="Density of 3mtb (Built-in)",ylab="",xlab="",col.main="red")

points(z_star,dnorm(z_star),type="l",lty=2,col=2,ylab="",xlab="")

legend(0,0.45,text1,lty=c(1,2),col=c(1,2),cex=0.7)

h_den=0.5

f_hat=kernden(x_star,z_star,h_den,1)

ff=cbind(f_hat,dnorm(z_star))

win.graph()

par(bg="light blue")

matplot(z_star,ff,type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="")

title(main="Density of 3mtb",col.main="red")

legend(0,0.55,text1,lty=c(1,2),col=c(1,2),cex=0.7)

###################################################################

7.6 References

Artzner, P., F. Delbaen, J.M. Eber, and D. Heath (1999). Coherent measures of risk. Mathematical Finance, 9, 203-228.

Cai, Z. (2002). Regression quantile for time series. Econometric Theory, 18, 169-192.

Cai, Z. and G.G. Roussas (1997). Smooth estimate of quantiles under association. Statistics and Probability Letters, 36, 275-287.

Cai, Z. and G.G. Roussas (1998). Efficient estimation of a distribution function under quadrant dependence. Scandinavian Journal of Statistics, 25, 211-224.

Chen, S.X. and C.Y. Tang (2005). Nonparametric inference of value at risk for dependent financial returns. Journal of Financial Econometrics, 3, 227-255.

Duffie, D. and J. Pan (1997). An overview of value at risk. Journal of Derivatives, 4, 7-49.

Fan, J. and Q. Yao (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag, New York.

Gasser, T. and H.-G. Muller (1979). Kernel estimation of regression functions. In Smoothing Techniques for Curve Estimation, Lecture Notes in Mathematics, 757, 23-68. Springer-Verlag, New York.

Falk, M. (1983). Relative efficiency and deficiency of kernel type estimators of smooth distribution functions. Statistica Neerlandica, 37, 73-83.

Hall, P. and C.C. Heyde (1980). Martingale Limit Theory and its Applications. Academic Press, New York.

Hall, P. and T.E. Wehrly (1991). A geometrical method for removing edge effects from kernel-type nonparametric regression estimators. Journal of the American Statistical Association, 86, 665-672.

Hjort, N.L. and M.C. Jones (1996). Locally parametric nonparametric density estimation. The Annals of Statistics, 24, 1619-1647.

Jorion, P. (2001). Value at Risk, 2nd Edition. New York: McGraw-Hill.

Karunamuni, R.J. and T. Alberts (2003). On boundary correction in kernel density estimation. Working paper, Department of Mathematical and Statistical Sciences, University of Alberta, Canada.

Lehmann, E. (1966). Some concepts of dependence. Annals of Mathematical Statistics, 37, 1137-1153.

Loader, C.R. (1996). Local likelihood density estimation. The Annals of Statistics, 24, 1602-1618.

Mammitzsch, V. (1984). On the asymptotically optimal solution within a certain class of kernel type estimators. Statistics and Decisions, 2, 247-255.

Marron, J.S. and D. Ruppert (1994). Transformations to reduce boundary bias in kernel density estimation. Journal of the Royal Statistical Society Series B, 56, 653-671.

Muller, H.-G. (1993). On the boundary kernel method for nonparametric curve estimation near endpoints. Scandinavian Journal of Statistics, 20, 313-328.

Reiss, R.D. (1981). Nonparametric estimation of smooth distribution functions. Scandinavian Journal of Statistics, 8, 116-119.

Schuster, E.F. (1985). Incorporating support constraints into nonparametric estimates of densities. Communications in Statistics Theory and Methods, 14, 1123-1126.

Wand, M.P., J.S. Marron and D. Ruppert (1991). Transformations in density estimation (with discussion). Journal of the American Statistical Association, 86, 343-361.

Yoshihara, K. (1995). The Bahadur representation of sample quantiles for sequences of strongly mixing random variables. Statistics and Probability Letters, 24, 299-304.

Zhang, S. and R.J. Karunamuni (1998). On kernel density estimation near endpoints. Journal of Statistical Planning and Inference, 70, 301-316.


Chapter 8

Nonparametric Regression Estimation

8.1 Bandwidth Selection

8.1.1 Simple Bandwidth Selectors

The optimal bandwidth (7.3) is not directly usable since it depends on the unknown quantity ||f″||_2. When f(x) is a Gaussian density with standard deviation σ, it is easy to see from (7.3) that

h_opt = (8 √π / 3)^{1/5} C_1(K) σ n^{−1/5},

which is called the normal reference bandwidth selector in the literature; it is made operational by replacing the unknown parameter σ by the sample standard deviation s. In particular, after calculating the constant C_1(K) numerically, we have the following normal reference bandwidth selector:

ĥ_opt = 1.06 s n^{−1/5}  for the Gaussian kernel,
ĥ_opt = 2.34 s n^{−1/5}  for the Epanechnikov kernel.

Hjort and Jones (1996) proposed an improved rule obtained by using an Edgeworth expansion for f(x) around the Gaussian density. Such a rule is given by

ĥ*_opt = ĥ_opt { 1 + (35/48) γ̂_4 + (35/32) γ̂_3² + (385/1024) γ̂_4² }^{−1/5},

where γ̂_3 and γ̂_4 are respectively the sample skewness and kurtosis.

Note that the normal reference bandwidth selector is only a simple rule of thumb. It is a good selector when the data are nearly Gaussian distributed, and it is often reasonable in many applications. However, it can lead to over-smoothing when the underlying distribution is asymmetric or multi-modal. In that case, one can either subjectively tune the bandwidth or select the bandwidth by more sophisticated bandwidth selectors. One can also transform the data first to make their distribution closer to normal, estimate the density using the normal reference bandwidth selector, and then apply the inverse transform to obtain an estimated density for the original data. Such a method is called the transformation method. There are quite a few important techniques for selecting the bandwidth, such as cross-validation (CV) and plug-in bandwidth selectors. A conceptually simple technique, with theoretical justification and good empirical performance, is the plug-in technique. This technique relies on finding an estimate of the functional ||f″||_2, which can be obtained by using a pilot bandwidth. An implementation of this approach is proposed by Sheather and Jones (1991), and an overview of the progress on bandwidth selection can be found in Jones, Marron and Sheather (1996).

Function dpik() in the R package KernSmooth selects a bandwidth for kernel density estimation using the plug-in method. A short comparison of several selectors follows.
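The snippet below computes several of the selectors just discussed on the same simulated sample (illustrative only): the normal reference rule, its built-in variant bw.nrd(), the Sheather-Jones plug-in selector from base R, and dpik() from KernSmooth.

library(KernSmooth)

set.seed(7)
x <- rnorm(300)

1.06 * sd(x) * length(x)^(-1/5)   # normal reference rule (Gaussian kernel)
bw.nrd(x)                         # built-in normal reference rule
bw.SJ(x)                          # Sheather-Jones plug-in selector
dpik(x)                           # plug-in selector from KernSmooth

plot(density(x, bw = bw.SJ(x)))   # density estimate with the chosen bandwidth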

8.1.2 Cross-Validation Method

The integrated squared error (ISE) of f_n(x) is defined by

ISE(h) = ∫ [f_n(x) − f(x)]² dx.

A commonly used measure of discrepancy between f_n(x) and f(x) is the mean integrated squared error (MISE), MISE(h) = E[ISE(h)]. It can be shown easily (or see Chiu, 1991) that MISE(h) ≈ AMISE(h). The optimal bandwidth minimizing the AMISE is given in (7.3). The least squares cross-validation (LSCV) method, proposed by Rudemo (1982) and Bowman (1984), is a popular method for estimating the optimal bandwidth h_opt. Cross-validation is very useful for assessing the performance of an estimator via an estimate of its prediction error: the basic idea is to set one of the data points aside for validation of the model and use the remaining data to build the model. The main idea here is to choose h to minimize ISE(h). Since

ISE(h) = ∫ f_n²(x) dx − 2 ∫ f(x) f_n(x) dx + ∫ f²(x) dx,

the question is how to estimate the second term on the right-hand side. Let us consider the simplest case, when {X_t} are iid. Re-express f_n(x) as

f_n(x) = ((n − 1)/n) f_n^{(−s)}(x) + (1/n) K_h(X_s − x)

for any 1 ≤ s ≤ n, where

f_n^{(−s)}(x) = (1/(n − 1)) Σ_{t≠s} K_h(X_t − x),

which is the kernel density estimate without the s-th observation, commonly called the jackknife or leave-one-out estimate. It is easy to see that for any 1 ≤ s ≤ n,

f_n(x) ≈ f_n^{(−s)}(x).

Let D_s = {X_1, ..., X_{s−1}, X_{s+1}, ..., X_n}. Then

E[ f_n^{(−s)}(X_s) | D_s ] = ∫ f_n^{(−s)}(x) f(x) dx ≈ ∫ f_n(x) f(x) dx,


which, by the method of moments, can be estimated by (1/n) Σ_{s=1}^{n} f_n^{(−s)}(X_s). Therefore, the cross-validation criterion is

CV(h) = ∫ f_n²(x) dx − (2/n) Σ_{s=1}^{n} f_n^{(−s)}(X_s)
      = (1/n²) Σ_{s,t} K*_h(X_s − X_t) − (2/(n(n − 1))) Σ_{t≠s} K_h(X_s − X_t),

where K*_h(·) is the convolution of K_h(·) with itself,

K*_h(u) = ∫ K_h(v) K_h(u − v) dv.

Let ĥ_cv be the minimizer of CV(h); it is called the cross-validation bandwidth. Stone (1984) showed that ĥ_cv is a consistent estimate of the optimal bandwidth h_opt.

Function lscv() in the R package locfit selects a bandwidth for kernel density estimation using the least squares cross-validation method. A quick comparison with the normal reference rule follows.
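The snippet below uses the least squares (unbiased) cross-validation selector that ships with base R on a bimodal sample, where the normal reference rule tends to over-smooth; the mixture used is illustrative only.

set.seed(8)
x <- c(rnorm(150, -2, 0.5), rnorm(150, 2, 1))  # bimodal data

h_cv <- bw.ucv(x)        # least squares cross-validation bandwidth
h_nr <- bw.nrd(x)        # normal reference rule, for comparison

plot(density(x, bw = h_cv), main = "CV vs normal reference bandwidth")
lines(density(x, bw = h_nr), lty = 2)          # over-smooths the two modes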

8.2 Multivariate Density Estimation

As we discussed in Chapter 7, the kernel density or distribution estimation above is basically one-dimensional. For the multivariate case, the kernel density estimate is given by

f_n(x) = (1/n) Σ_{t=1}^{n} K_H(X_t − x),   (8.1)

where K_H(u) = K(H^{−1} u)/det(H), K(u) is a multivariate kernel function, and H is the bandwidth matrix, such that for all 1 ≤ i, j ≤ p, n h_{ij} → ∞ and h_{ij} → 0, where h_{ij} is the (i, j)-th element of H. The bandwidth matrix is introduced to capture the dependence structure in the independent variables. In particular, if H is a diagonal matrix and K(u) = Π_{j=1}^{p} K_j(u_j), where K_j(·) is a univariate kernel function, then f_n(x) becomes

f_n(x) = (1/n) Σ_{t=1}^{n} Π_{j=1}^{p} K_{h_j}(X_{jt} − x_j),

which is called the product kernel density estimate. This case is commonly used in practice. Similar to the univariate case, it is easy to derive the theoretical results for the multivariate case, which is left as an exercise. See Wand and Jones (1995) for details.

Exercise: Please derive the asymptotic results for the estimator in (8.1) in the general multivariate case.

In R, the built-in function density() is only for the univariate case. For multivariate situations, there are two packages, ks and KernSmooth. Function kde() in ks can compute the multivariate density estimate for 2- to 6-dimensional data, and function bkde2D() in KernSmooth computes the 2D kernel density estimate. Also, ks provides functions for bandwidth matrix selection, such as Hbcv() and Hscv() for the 2D case, and Hlscv() and Hpi(). A short sketch follows.
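The sketch below fits a bivariate kernel density estimate with a plug-in bandwidth matrix from ks; the simulated data are illustrative only.

library(ks)

set.seed(9)
X <- cbind(rnorm(500), rnorm(500, sd = 2))

H   <- Hpi(X)          # plug-in bandwidth matrix
fit <- kde(X, H = H)   # bivariate kernel density estimate
plot(fit)              # contour plot of the estimated density

# Alternative: KernSmooth::bkde2D(X, bandwidth = c(0.5, 1)) for a diagonal H.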

8.3 Regression Function

Suppose that we have the information set I_t at time t and we want to forecast a future value, say Y_{t+1} (the one-step-ahead forecast, or Y_{t+s} for the s-step-ahead forecast). There are several forecasting criteria. The general form is

m(I_t) = argmin_a E[ρ(Y_{t+1} − a) | I_t],

where ρ(·) is an objective (loss) function. There are three major criteria:

(1) If ρ(·) is the quadratic function, then m(I_t) = E(Y_{t+1} | I_t), called the mean regression function.

(2) If ρ_τ(y) = y (τ − I_{y<0}), called the "check" function, where τ ∈ (0, 1) and I_A is the indicator function of a set A, then m(I_t) satisfies

∫_{−∞}^{m(I_t)} f(y | I_t) dy = F(m(I_t) | I_t) = τ,

where f(y | I_t) and F(y | I_t) are the conditional PDF and CDF of Y_{t+1}, respectively, given I_t. This m(I_t) becomes the conditional quantile, or quantile regression, denoted by q_τ(I_t). In particular, if τ = 1/2, then m(I_t) is the well-known least absolute deviation (LAD) regression, which is robust.

(3) If ρ(x) = (1/2) x² I_{|x|≤M} + M(|x| − M/2) I_{|x|>M}, the so-called Huber function in the literature, then it is the Huber robust regression. We will not discuss this topic; if you are interested, please read the book by Rousseeuw and Leroy (1987). In R, the library MASS has the function rlm() for robust linear models, and the library lqs contains functions for bounded-influence regression.

Since the information set I_t contains too many variables (it is high-dimensional), it is common to approximate I_t by a finite number of variables, say X_t = (X_{t1}, ..., X_{tp})^T (p ≥ 1), including lagged variables and exogenous variables. First, our focus is on the mean regression m(X_t). Of course, by the same token, we can consider nonparametric estimation of the conditional variance σ²(x) = Var(Y_t | X_t = x). Why do we need to consider nonlinear (nonparametric) models in economic practice? You can find the answer in the book by Granger and Terasvirta (1993).


8.4 Kernel Estimation

Let us look at the Nadaraya-Watson estimate of the mean regression m(X_t). The main idea is as follows:

m(x) = ∫ y f(y | x) dy = ∫ y f(x, y) dy / ∫ f(x, y) dy,

where f(x, y) is the joint PDF of X_t and Y_t. To estimate m(x), we can apply the plug-in method; that is, plug the nonparametric kernel density estimate f_n(x, y) (the double kernel method) into the right-hand side of the above equation to obtain

m̂_nw(x) = ∫ y f_n(x, y) dy / ∫ f_n(x, y) dy = (1/n) Σ_{t=1}^{n} Y_t K_h(X_t − x) / f_n(x) = Σ_{t=1}^{n} W_t Y_t,

where f_n(x) is the kernel density estimate of f(x) defined in Chapter 7, and

W_t = K_h(X_t − x) / Σ_{s=1}^{n} K_h(X_s − x).

m̂_nw(x) is the well-known Nadaraya-Watson (NW) estimator. Note that the weights {W_t} do not depend on {Y_t}. Therefore, m̂_nw(x) is called a linear estimator, similar to the least squares estimate (LSE).

Let us look at the NW estimator from a different angle. m̂_nw(x) can be re-expressed as the minimizer of a locally weighted least squares criterion; that is,

m̂_nw(x) = argmin_a Σ_{t=1}^{n} (Y_t − a)² K_h(X_t − x).


This means that when X_t is in a neighborhood of x, m(X_t) is approximated by a constant a (a local approximation). Indeed, we consider the following working model

Y_t = m(X_t) + ε_t ≈ a + ε_t

with the weights {K_h(X_t − x)}, where ε_t = Y_t − E(Y_t | X_t).

In the implementation, for each x, we can fit the following transformed linear model

Y*_t = β_1 X*_t + ε_t,

where Y*_t = √(K_h(X_t − x)) Y_t and X*_t = √(K_h(X_t − x)). Therefore, the Nadaraya-Watson estimator is also called the local constant estimator. In R, we can use the functions lm() or glm() with weights {K_h(X_t − x)} to fit a weighted least squares or generalized linear model, or we can carry out the weighted least squares computation directly (matrix multiplication). A minimal sketch is given below.
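The sketch below computes the NW (local constant) estimator by running a weighted least squares fit at each grid point; the data, the Epanechnikov weights, and the bandwidth h = 0.3 are illustrative assumptions only.

set.seed(10)
n <- 400
X <- runif(n, -2, 2)
Y <- sin(pi * X) + rnorm(n, sd = 0.3)

epan <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)
h    <- 0.3
grid <- seq(-2, 2, by = 0.05)

m_hat <- sapply(grid, function(x0) {
  w <- epan((X - x0) / h) / h          # kernel weights K_h(X_t - x0)
  coef(lm(Y ~ 1, weights = w))[1]      # local constant fit = locally weighted mean
})

plot(X, Y, col = "grey")
lines(grid, m_hat, col = 2, lwd = 2)   # NW estimate
lines(grid, sin(pi * grid), lty = 2)   # true regression function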

8.4.1 Asymptotic Properties

We now derive the asymptotic properties of the nonparametric estimator in the time series situation, considering for simplicity the case p = 1. Write

m̂_nw(x) = (1/n) Σ_{t=1}^{n} m(X_t) K_h(X_t − x) / f_n(x) + Σ_{t=1}^{n} W_t ε_t ≡ I_1 + I_2.

We will show that I_1 contributes only the bias and I_2 gives the asymptotic normality. First, we derive the asymptotic bias at interior points. By Taylor's expansion, when X_t is in (x − h, x + h), we have

m(X_t) = m(x) + m′(x)(X_t − x) + (1/2) m″(x_t)(X_t − x)²,


where x_t = x + θ(X_t − x) with |θ| < 1. Then

I_{11} ≡ (1/n) Σ_{t=1}^{n} m(X_t) K_h(X_t − x) = m(x) f_n(x) + m′(x) J_1(x) + (1/2) J_2(x),

where J_1(x) = (1/n) Σ_{t=1}^{n} (X_t − x) K_h(X_t − x) and J_2(x) = (1/n) Σ_{t=1}^{n} m″(x_t)(X_t − x)² K_h(X_t − x). Then

E[J_1(x)] = E[(X_t − x) K_h(X_t − x)] = ∫ (u − x) K_h(u − x) f(u) du = h ∫ u K(u) f(x + hu) du = h² f′(x) µ_2(K) + o(h²).

Similar to the derivation of the variance of f_n(x) in (7.2), we can show that

n h Var(J_1(x)) = O(1).

Therefore, J_1(x) = h² f′(x) µ_2(K) + o_p(h²). By the same token, we have

E[J_2(x)] = E[m″(x_t)(X_t − x)² K_h(X_t − x)] = h² ∫ m″(x + θ h u) u² K(u) f(x + hu) du = h² m″(x) µ_2(K) f(x) + o(h²)

and Var(J_2(x)) = O(1/nh), so that J_2(x) = h² m″(x) µ_2(K) f(x) + o_p(h²). Hence,

I_1 = m(x) + m′(x) J_1(x)/f_n(x) + (1/2) J_2(x)/f_n(x)
    = m(x) + (h²/2) µ_2(K) [m″(x) + 2 m′(x) f′(x)/f(x)] + o_p(h²)


by the fact that f_n(x) = f(x) + o_p(1). The term

B_nw(x) = (h²/2) µ_2(K) [m″(x) + 2 m′(x) f′(x)/f(x)]

is regarded as the asymptotic bias. If p > 1 (the multivariate case), B_nw(x) becomes

B_nw(x) = (h²/2) tr{ µ_2(K) [ m″(x) + 2 f′(x) m′(x)^T / f(x) ] },   (8.2)

where µ_2(K) = ∫ u u^T K(u) du. The bias term involves not only the curvature of m(x) but also the unknown density function f(x) and its derivative f′(x), so that the design cannot be adaptive.

Under some regularity conditions, similar to (7.2), we can show that for x an interior point,

n h^p Var(I_2) → ν_0(K) σ²_ε(x)/f(x) ≡ σ²_m(x),

where σ²_ε(x) = Var(ε_t | X_t = x). Further, we can establish the asymptotic normality (the proof is provided later)

√(n h^p) { m̂_nw(x) − m(x) − B_nw(x) + o_p(h²) } → N(0, σ²_m(x)),

where B_nw(x) is given in (8.2).

When p is large, there is the so-called "curse of dimensionality". To understand this problem quantitatively, we can look at the rate of convergence. The bias is of order O(h²) and the variance is of order O(1/(n h^p)). Trading off the bias and variance rates leads to the optimal rate of convergence O(n^{−2/(4+p)}) for the estimator. To have performance comparable to one-dimensional nonparametric regression with n_1 data points, p-dimensional nonparametric regression needs

n^{−2/(4+p)} = O(n_1^{−2/5}),


or n = O(n_1^{(p+4)/5}). Table 8.1 shows the result with n_1 = 100. The required sample size increases exponentially fast with the dimension.

Table 8.1: Sample sizes required for p-dimensional nonparametric regression to have comparable performance with that of 1-dimensional nonparametric regression using size 100

dimension      2     3     4      5      6       7       8       9        10
sample size    252   631   1,585  3,982  10,000  25,119  63,096  158,490  398,108

8.4.2 Boundary Behavior

As for the boundary behavior of the NW estimator, we can follow Fan and Gijbels (1996). Without loss of generality, we consider the left boundary point x = c h, 0 < c < 1. Following Fan and Gijbels (1996), we take K(·) to have support [−1, 1] and m(·) to have support [0, 1]. Similar to (7.4), it is easy to see that if x = c h,

E[J_1(ch)] = E[(X_t − ch) K_h(X_t − ch)] = ∫_0^1 (u − ch) K_h(u − ch) f(u) du = h ∫_{−c}^{1/h−c} u K(u) f(h(u + c)) du
           = h f(0+) µ_{1,c}(K) + h² f′(0+)[µ_{2,c}(K) + c µ_{1,c}(K)] + o(h²),

and

E[J_2(ch)] = E[m″(x_t)(X_t − ch)² K_h(X_t − ch)] = h² ∫_{−c}^{1/h−c} m″(h(c + θ u)) u² K(u) f(h(u + c)) du = h² m″(0+) µ_{2,c}(K) f(0+) + o(h²).

Also, we can see that

Var(J_1(ch)) = O(1/nh)  and  Var(J_2(ch)) = O(1/nh),


which imply that

J_1(ch) = h f(0+) µ_{1,c}(K) + o_p(h)

and

J_2(ch) = h² m″(0+) µ_{2,c}(K) f(0+) + o_p(h²).

This, in conjunction with (7.4), gives

I_1 − m(ch) = m′(ch) J_1(ch)/f_n(ch) + (1/2) J_2(ch)/f_n(ch) = a(c, K) h + b(c, K) h² + o_p(h²),

where

a(c, K) = m′(0+) µ_{1,c}(K) / µ_{0,c}(K)

and

b(c, K) = µ_{2,c}(K) m″(0+) / (2 µ_{0,c}(K)) + f′(0+) m′(0+) [µ_{2,c}(K) µ_{0,c}(K) − µ_{1,c}²(K)] / (f(0+) µ_{0,c}²(K)).

Here, a(c, K) h + b(c, K) h² serves as the asymptotic bias term, which is of order O(h).

We can show that at the boundary point, the asymptotic variance has the form

n h^p Var(m̂_nw(x)) → ν_{0,c}(K) σ²_m(0+) / [µ_{0,c}(K) f(0+)],

which is of the same order as that for an interior point, although the scaling constant is different.


8.5 Local Polynomial Estimate

To overcome the above shortcomings of the local constant estimate, we can use the local polynomial fitting scheme; see Fan and Gijbels (1996). The main idea is described as follows.

8.5.1 Formulation

Assume that the regression function $m(x)$ has a continuous $(q+1)$th order derivative. For ease of notation, assume that $p = 1$. When $X_t \in (x - h, x + h)$, then
$$m(X_t) \approx \sum_{j=0}^{q} \frac{m^{(j)}(x)}{j!}\,(X_t - x)^j = \sum_{j=0}^{q} \beta_j\, (X_t - x)^j,$$
where $\beta_j = m^{(j)}(x)/j!$. Therefore, when $X_t \in (x - h, x + h)$, the model becomes
$$Y_t \approx \sum_{j=0}^{q} \beta_j\, (X_t - x)^j + \epsilon_t.$$
Hence, we can apply the weighted least squares method. The weighted locally least squares criterion becomes
$$\sum_{t=1}^{n}\left\{Y_t - \sum_{j=0}^{q} \beta_j\, (X_t - x)^j\right\}^2 K_h(X_t - x). \qquad (8.3)$$
Minimizing the above with respect to $\beta = (\beta_0, \ldots, \beta_q)^T$ gives the local polynomial estimate $\widehat{\beta}$:
$$\widehat{\beta} = \left(X^T W X\right)^{-1} X^T W Y, \qquad (8.4)$$
where $W = \mathrm{diag}\{K_h(X_1 - x), \cdots, K_h(X_n - x)\}$,
$$X = \begin{pmatrix} 1 & (X_1 - x) & \cdots & (X_1 - x)^q \\ 1 & (X_2 - x) & \cdots & (X_2 - x)^q \\ \vdots & \vdots & \ddots & \vdots \\ 1 & (X_n - x) & \cdots & (X_n - x)^q \end{pmatrix}, \quad \text{and} \quad Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}.$$


Therefore, for $1 \le j \le q$,
$$\widehat{m}^{(j)}(x) = j!\, \widehat{\beta}_j.$$
This means that the local polynomial method estimates not only the regression function itself but also its derivatives.

8.5.2 Implementation in R

There are several ways of implementing the local polynomial estimator. One way is to write your own code using matrix multiplication as in (8.4), or to employ the function lm() or glm() with weights $\{K_h(X_t - x)\}$. Recently, several built-in R packages for the local polynomial estimate have become available. For example, the package KernSmooth contains several functions: bkde() computes the kernel density estimate, bkde2D() computes the 2D kernel density estimate, and bkfe() computes the kernel functional (derivative) density estimate. The function dpik() selects a bandwidth for kernel density estimation using the plug-in method, and dpill() chooses a bandwidth for local linear ($q = 1$) regression estimation using the plug-in approach. Finally, locpoly() performs the local polynomial fitting, including a local polynomial estimate of the density of $X$ (or its derivative).
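As a minimal illustration of these KernSmooth functions on simulated data (the data-generating process and variable names below are ours, not from the text), one might fit a local linear regression as follows:

library(KernSmooth)

# simulated example: m(x) = sin(2*pi*x) observed with noise
set.seed(1)
n <- 400
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

h   <- dpill(x, y)                                  # plug-in bandwidth for local linear fit
fit <- locpoly(x, y, degree = 1, bandwidth = h)     # local linear (q = 1) estimate of m(x)

plot(x, y, col = "grey", pch = 16, cex = 0.5)
lines(fit$x, fit$y, lwd = 2)

# first derivative m'(x) via a local quadratic fit (q = j + 1 with j = 1)
dfit <- locpoly(x, y, drv = 1, degree = 2, bandwidth = h)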

Example: We apply the kernel regression estimation method and the local polynomial fitting method to the drift and diffusion of the weekly 3-month Treasury bill rate from January 2, 1970 to December 26, 1997. Let $x_t$ denote the weekly 3-month Treasury bill rate. It is common to model $x_t$ by assuming that it satisfies the continuous-time stochastic differential equation (Black-Scholes type model)
$$d x_t = \mu(x_t)\, dt + \sigma(x_t)\, dW_t,$$


where $W_t$ is a Wiener process, $\mu(x_t)$ is called the drift function and $\sigma(x_t)$ is called the diffusion function. Our interest is to identify $\mu(x_t)$ and $\sigma(x_t)$. Assume a time series sequence $\{x_{t\Delta},\ 1 \le t \le n\}$ is observed at equally spaced time points. Using the infinitesimal generator (Øksendal, 1985), the first-order approximations of the moments of $x_t$, a discretized version of the Ito process, are given by Stanton (1997) as
$$\Delta x_t = \mu(x_t)\,\Delta + \sigma(x_t)\,\epsilon\,\sqrt{\Delta},$$
where $\Delta x_t = x_{t+\Delta} - x_t$ and $\epsilon \sim N(0, 1)$. Therefore,
$$\mu(x_t) = \lim_{\Delta \to 0} E[\Delta x_t \mid x_t]/\Delta \quad \text{and} \quad \sigma^2(x_t) = \lim_{\Delta \to 0} E\left[(\Delta x_t)^2 \mid x_t\right]/\Delta.$$

See Fan and Zhang (2003) for the higher orders.
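A hedged sketch of this estimation in R: the object tbill holding the weekly rates and the weekly time step delta = 1/52 are illustrative assumptions; only the idea of regressing the increments and their squares on the lagged level comes from the approximations above.

library(KernSmooth)

# tbill: hypothetical numeric vector of weekly 3-month Treasury bill rates
delta <- 1 / 52                      # weekly sampling interval, in years (assumption)
x  <- tbill[-length(tbill)]          # lagged level x_t
dx <- diff(tbill)                    # increments x_{t+Delta} - x_t

# drift:     mu(x)     ~ E[dx | x] / delta      (local linear fit)
h.mu  <- dpill(x, dx)
drift <- locpoly(x, dx / delta, degree = 1, bandwidth = h.mu)

# diffusion: sigma2(x) ~ E[dx^2 | x] / delta    (local linear fit of squared increments)
h.s2   <- dpill(x, dx^2)
diffu2 <- locpoly(x, dx^2 / delta, degree = 1, bandwidth = h.s2)

plot(drift$x, drift$y, type = "l", xlab = "x", ylab = "drift")
plot(diffu2$x, sqrt(pmax(diffu2$y, 0)), type = "l", xlab = "x", ylab = "diffusion")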

8.5.3 Complexity of Local Polynomial Estimator

To implement the local polynomial estimator, we have to choose the order of the polynomial $q$, the bandwidth $h$, and the kernel function $K(\cdot)$. These parameters are, of course, confounded with each other. Clearly, when $h = \infty$, the local polynomial fitting becomes a global polynomial fitting, and the order $q$ determines the model complexity. Unlike in parametric models, the complexity of local polynomial fitting is primarily controlled by the bandwidth, as shown in Fan and Gijbels (1996) and Fan and Yao (2003). Hence $q$ is usually small and the issue of choosing $q$ becomes less critical. We discuss these issues in detail as follows.
(1) If the objective is to estimate $m^{(j)}(\cdot)$ ($j \ge 0$), the local polynomial fitting automatically corrects the boundary bias when $q - j$ is odd.


Further, when $q - j$ is odd, compared with the order $q - 1$ fit (so that $q - 1 - j$ is even), the order $q$ fit contains one extra parameter without increasing the variance for estimating $m^{(j)}(\cdot)$. But this extra parameter creates opportunities for bias reduction, particularly in the boundary regions; see the next section and the books by Fan and Gijbels (1996) and Ruppert and Wand (1994). For these reasons, the odd order fits (the order $q$ chosen so that $q - j$ is odd) outperform the even order fits (the order $q - 1$ fit, so that $q - 1 - j$ is even). Based on theoretical and practical considerations, the order $q = j + 1$ is recommended in Fan and Gijbels (1996). If the primary objective is to estimate the regression function, one uses the local linear fit; if the target function is the first order derivative, one uses the local quadratic fit, and so on.
(2) It is well known that the choice of the bandwidth $h$ plays an important role in the local polynomial fitting. Too large a bandwidth causes over-smoothing, creating excessive modeling bias, while too small a bandwidth results in under-smoothing, yielding wiggly estimates. The bandwidth can be chosen subjectively by the user via visual inspection of the resulting estimates, or automatically from the data by minimizing an estimated theoretical risk (discussed later).
(3) Since the estimate is based on the local regression (8.3), it is reasonable to require a non-negative weight function $K(\cdot)$. It can be shown (see Fan and Gijbels (1996)) that for all choices of $q$ and $j$, the optimal weight function is $K(z) = \frac{3}{4}(1 - z^2)_+$, the Epanechnikov kernel, based on minimizing the asymptotic variance of the local polynomial estimator. Thus, it is a universal weighting scheme and provides a useful benchmark for other kernels to compare with. As shown in Fan and Gijbels (1996) and Fan and Yao (2003), other kernels have nearly the same efficiency for practical choices of $q$ and $j$. Hence the choice of the kernel function is not critical.


The local polynomial estimator compares favorably with other estimators, including the Nadaraya-Watson (local constant) estimator and other linear estimators such as the Gasser and Muller estimator of Gasser and Muller (1979) and the Priestley and Chao estimator of Priestley and Chao (1972). Indeed, it was shown by Fan (1993) that the local linear fit is asymptotically minimax, based on the quadratic loss function, among all linear estimators and is nearly minimax among all possible estimators. This minimax property is extended by Fan, Gasser, Gijbels, Brockmann and Engel (1995) to more general local polynomial fitting. For detailed comparisons of the above four estimators, see Fan and Gijbels (1996).

Note that the Gasser and Muller estimator and the Priestley and Chao estimator are designed particularly for the fixed design, that is, $X_t = t$. Let $s_t = (2t + 1)/2$ ($t = 1, \cdots, n - 1$) with $s_0 = -\infty$ and $s_n = \infty$. The Gasser and Muller estimator is
$$\widehat{f}_{gm}(t_0) = \sum_{t=1}^{n}\left\{\int_{s_{t-1}}^{s_t} K_h(u - t_0)\, du\right\} Y_t.$$
Unlike the local constant estimator, no denominator is needed since the total weight
$$\sum_{t=1}^{n} \int_{s_{t-1}}^{s_t} K_h(u - t_0)\, du = 1.$$
Indeed, the Gasser and Muller estimator is an improved version of the Priestley and Chao estimator, which is defined as
$$\widehat{f}_{pc}(t_0) = \sum_{t=1}^{n} K_h(t - t_0)\, Y_t.$$
Note that the Priestley and Chao estimator is applicable only in the equally spaced setting.


8.5.4 Properties of Local Polynomial Estimator

Define, for $0 \le j \le q$,
$$s_{n,j}(x) = \sum_{t=1}^{n} (X_t - x)^j\, K_h(X_t - x)$$
and $S_n(x) = X^T W X$. Then, the $(i + 1, j + 1)$th element of $S_n(x)$ is $s_{n,i+j}(x)$. Similar to the evaluation of $I_{11}$, we can show easily that
$$s_{n,j}(x) = n\, h^j\, \mu_j(K)\, f(x)\{1 + o_p(1)\}.$$
Define $H = \mathrm{diag}\{1, h, \cdots, h^q\}$ and $S = (\mu_{i+j}(K))_{0 \le i, j \le q}$. Then, it is not difficult to show that $S_n(x) = n\, f(x)\, H\, S\, H\,\{1 + o_p(1)\}$.

First of all, for $0 \le j \le q$, let $e_j$ be the $(q + 1) \times 1$ vector with $(j + 1)$th element equal to one and all others zero. Then, $\widehat{\beta}_j$ can be re-expressed as
$$\widehat{\beta}_j = e_j^T\, \widehat{\beta} = \sum_{t=1}^{n} W_{j,n,h}(X_t - x)\, Y_t,$$
where $W_{j,n,h}(X_t - x)$ is called the effective kernel in Fan and Gijbels (1996) and Fan and Yao (2003), given by
$$W_{j,n,h}(X_t - x) = e_j^T\, S_n(x)^{-1}\left(1, (X_t - x), \cdots, (X_t - x)^q\right)^T K_h(X_t - x).$$
It is not difficult to show that $W_{j,n,h}(X_t - x)$ satisfies the following so-called discrete moment conditions
$$\sum_{t=1}^{n} (X_t - x)^l\, W_{j,n,h}(X_t - x) = \begin{cases} 1 & \text{if } l = j, \\ 0 & \text{otherwise.} \end{cases} \qquad (8.5)$$

Note that the local constant estimator does not have this property; see $J_1(x)$ in Section 8.4.1. This property implies that the local polynomial estimator is unbiased for estimating $\beta_j$ when the true regression function $m(x)$ is a polynomial of order $q$.


To gain more insight about the local polynomial estimator, define the equivalent kernel (see Fan and Gijbels (1996))
$$W_j(u) = e_j^T\, S^{-1}\, (1, u, \cdots, u^q)^T K(u).$$
Then, it can be shown (see Fan and Gijbels (1996)) that
$$W_{j,n,h}(X_t - x) = \frac{1}{n\, h^{j+1}\, f(x)}\, W_j\big((X_t - x)/h\big)\{1 + o_p(1)\}$$
and
$$\int u^l\, W_j(u)\, du = \begin{cases} 1 & \text{if } l = j, \\ 0 & \text{otherwise.} \end{cases}$$

The implications of these results are as follows.

As pointed out by Fan and Yao (2003), the local polynomial estimator works like a kernel regression estimator with a known design density $f(x)$. This explains why the local polynomial fit adapts to various design densities. In contrast, the kernel regression estimator has large bias in regions where the derivative of $f(x)$ is large; namely, it cannot adapt to highly skewed designs. To see this, imagine that the true regression function has a large slope in such a region. Since the derivative of the design density is large, for a given $x$ there are more points on one side of $x$ than on the other. When the local average is taken, the Nadaraya-Watson estimate is biased towards the side with more local data points because the local data are asymmetrically distributed. This issue is more pronounced in the boundary regions, since the local data are even more asymmetric there. On the other hand, the local polynomial fit creates asymmetric weights, if needed, to compensate for this kind of design bias. Hence, it is adaptive to various design densities and to the boundary regions.

We next derive the asymptotic bias and variance expressions for local polynomial estimators. For independent data, we can obtain the bias and variance expressions via conditioning on the design matrix $X$. However, for time series data, conditioning on $X$ would mean conditioning on nearly the entire series. Hence, we derive the asymptotic bias and variance using the asymptotic normality rather than the conditional expectation. As explained in Chapter 7, localizing in the state domain weakens the dependence structure for the local data. Hence, one would expect that the results for independent data continue to hold for stationary processes satisfying certain mixing conditions. The mixing condition and the bandwidth should be related, as can be seen later.

Set $B_n(x) = (b_1(x), \cdots, b_{q+1}(x))^T$, where, for $0 \le j \le q$,
$$b_{j+1}(x) = \sum_{t=1}^{n}\left\{m(X_t) - \sum_{i=0}^{q} \frac{m^{(i)}(x)}{i!}\,(X_t - x)^i\right\}(X_t - x)^j\, K_h(X_t - x).$$
Then,
$$\widehat{\beta} - \beta = \left(X^T W X\right)^{-1} B_n(x) + \left(X^T W X\right)^{-1} X^T W\, \epsilon,$$
where $\epsilon = (\epsilon_1, \cdots, \epsilon_n)^T$. It is easy to show that if $q$ is odd,
$$B_n(x) = n\, h^{q+1}\, H\, f(x)\,\frac{m^{(q+1)}(x)}{(q+1)!}\, c_{1,q}\,\{1 + o_p(1)\},$$
where, for $1 \le k \le 3$, $c_{k,q} = (\mu_{q+k}(K), \cdots, \mu_{2q+k}(K))^T$. If $q$ is even,
$$B_n(x) = n\, h^{q+2}\, H\, f(x)\left\{c_{2,q}\,\frac{m^{(q+1)}(x)\, f'(x)}{f(x)\,(q+1)!} + c_{3,q}\,\frac{m^{(q+2)}(x)}{(q+2)!}\right\}\{1 + o_p(1)\}.$$
Note that $f'(x)/f(x)$ does not appear on the right hand side of $B_n(x)$ when $q$ is odd. In either case, we can show that
$$n h\,\mathrm{Var}\left[H(\widehat{\beta} - \beta)\right] \to \sigma^2(x)\, S^{-1}\, S^*\, S^{-1}/f(x) = \Sigma(x),$$
where $S^*$ is a $(q+1)\times(q+1)$ matrix with $(i, j)$th element $\nu_{i+j-2}(K)$.


This shows that the leading conditional bias term depends on whether $q$ is odd or even. By a Taylor series expansion argument, we know that when considering $|X_t - x| < h$, the remainder term from a $q$th order polynomial expansion should be of order $O(h^{q+1})$, so the result for odd $q$ is quite easy to understand. When $q$ is even, $q + 1$ is odd, and hence the $h^{q+1}$ term is associated with
$$\int u^l\, K(u)\, du \quad \text{for } l \text{ odd},$$
which is zero because $K(u)$ is an even function. Therefore, the $h^{q+1}$ term disappears, while the remainder term becomes $O(h^{q+2})$. In either case, the bias term is an even power of $h$. This is similar to the case where one uses higher order kernel functions based upon a symmetric kernel function (an even function), where the bias is always an even power of $h$.

Finally, we can show that when $q$ is odd,
$$\sqrt{n h}\left[H(\widehat{\beta} - \beta) - B(x)\right] \to N(0, \Sigma(x)),$$
where the asymptotic bias term for the local polynomial estimator is
$$B(x) = \frac{h^{q+1}}{(q+1)!}\, m^{(q+1)}(x)\, S^{-1}\, c_{1,q}\,\{1 + o_p(1)\}.$$
Or,
$$\sqrt{n h^{2j+1}}\left[\widehat{m}^{(j)}(x) - m^{(j)}(x) - B_j(x)\right] \to N(0, \sigma_{jj}(x)),$$
where the asymptotic bias and variance for the local polynomial estimator of $m^{(j)}(x)$ are
$$B_j(x) = \frac{j!\, h^{q+1-j}}{(q+1)!}\, m^{(q+1)}(x) \int u^{q+1}\, W_j(u)\, du\,\{1 + o_p(1)\}$$
and
$$\sigma_{jj}(x) = \frac{(j!)^2\,\sigma^2(x)}{f(x)} \int W_j^2(u)\, du.$$


Similarly, we can derive the asymptotic bias and variance at boundary points if the regression function has a finite support. For details, see Fan and Gijbels (1996), Fan and Yao (2003), and Ruppert and Wand (1994). Indeed, define $S_c$, $S_c^*$, and $c_{k,q,c}$ similarly to $S$, $S^*$ and $c_{k,q}$, with $\mu_j(K)$ and $\nu_j(K)$ replaced by $\mu_{j,c}(K)$ and $\nu_{j,c}(K)$, respectively. We can show that
$$\sqrt{n h}\left[H\big(\widehat{\beta}(ch) - \beta(ch)\big) - B_c(0)\right] \to N(0, \Sigma_c(0)), \qquad (8.6)$$
where the asymptotic bias term for the local polynomial estimator at the left boundary point is
$$B_c(0) = \frac{h^{q+1}}{(q+1)!}\, m^{(q+1)}(0)\, S_c^{-1}\, c_{1,q,c}\,\{1 + o_p(1)\},$$
and the asymptotic variance is $\Sigma_c(0) = \sigma^2(0)\, S_c^{-1}\, S_c^*\, S_c^{-1}/f(0)$. Or,
$$\sqrt{n h^{2j+1}}\left[\widehat{m}^{(j)}(ch) - m^{(j)}(ch) - B_{j,c}(0)\right] \to N(0, \sigma_{jj,c}(0)),$$
where, with $W_{j,c}(u) = e_j^T\, S_c^{-1}\, (1, u, \cdots, u^q)^T K(u)$,
$$B_{j,c}(0) = \frac{j!\, h^{q+1-j}}{(q+1)!}\, m^{(q+1)}(0) \int_{-c}^{\infty} u^{q+1}\, W_{j,c}(u)\, du\,\{1 + o_p(1)\}$$
and
$$\sigma_{jj,c}(0) = \frac{(j!)^2\,\sigma^2(0)}{f(0)} \int_{-c}^{\infty} W_{j,c}^2(u)\, du.$$

Exercise: Derive the asymptotic properties of the local polynomial estimator; that is, prove (8.6).

The above conclusions show that when $q - j$ is odd, the bias at the boundary is of the same order as that at interior points. Hence, the local polynomial fit does not create excessive boundary bias when $q - j$ is odd. Thus, the appealing boundary behavior of local polynomial mean estimation extends to derivative estimation. However, when $q - j$ is even, the bias at the boundary is larger than in the interior, and the bias can also be large at points where $f(x)$ is discontinuous. This is referred to as the boundary effect. For these reasons (and the minimax efficiency arguments), it is recommended that one always set $q - j$ to be odd when estimating $m^{(j)}(x)$. It is indeed an odd world!

8.5.5 Bandwidth Selection

As seen in the previous sections, for stationary sequences of data under certain mixing conditions, the local polynomial estimator behaves very much like it does for independent data, because windowing reduces the dependence among the local data. Partially because of this, there are not many studies on bandwidth selection for these problems. However, it is reasonable to expect the bandwidth selectors for independent data to continue to work for dependent data under certain mixing conditions. Below, we summarize a few useful approaches. When the data do not mix strongly enough, the general strategy is to increase the bandwidth in order to reduce the variance.

As we have already seen for nonparametric density estimation, the cross-validation method is very useful for assessing the performance of an estimator via estimating its prediction error. The basic idea is to set one data point aside for validation of the model and use the remaining data to build the model. The criterion is defined as

$$\mathrm{CV}(h) = \sum_{s=1}^{n}\left[Y_s - \widehat{m}_{-s}(X_s)\right]^2,$$
where $\widehat{m}_{-s}(X_s)$ is the local polynomial estimator with $j = 0$ and bandwidth $h$, computed without using the $s$th observation. Each summand is a squared prediction error for the $s$th data point based on the training set $\{(X_t, Y_t) : t \neq s\}$. The idea of the cross-validation method is simple, but it is computationally intensive. An improved version, in terms of computation, is the generalized cross-validation (GCV), proposed by Wahba (1977) and Craven and Wahba (1979). This criterion can be described as follows. The fitted values $\widehat{Y} = (\widehat{m}(X_1), \cdots, \widehat{m}(X_n))^T$ can be expressed as $\widehat{Y} = H(h)\, Y$, where $H(h)$ is an $n \times n$ hat matrix, depending on the $X$-variate and the bandwidth $h$; it is also called a smoothing matrix. Then the GCV approach selects the bandwidth $h$ that minimizes
$$\mathrm{GCV}(h) = \left[n^{-1}\,\mathrm{tr}(I - H(h))\right]^{-2}\,\mathrm{MASE}(h),$$
where $\mathrm{MASE}(h) = \sum_{t=1}^{n}\left(Y_t - \widehat{m}(X_t)\right)^2/n$ is the average of the squared residuals.

A drawback of cross-validation type methods is their inherent variability (see Hall and Johnstone, 1992). Further, they cannot be applied directly to select bandwidths for estimating derivative curves. As pointed out by Fan, Heckman, and Wand (1995), cross-validation type methods perform poorly due to their large sample variation, and even worse so for dependent data. Plug-in methods avoid these problems. The basic idea is to find a bandwidth $h$ minimizing an estimate of the mean integrated squared error (MISE). See Ruppert, Sheather and Wand (1995) and Fan and Gijbels (1995) for details.

Nonparametric AIC Selector

Inspired by the nonparametric version of the Akaike final prediction error criterion proposed by Tjøstheim and Auestad (1994b) for lag selection in a nonparametric setting, Cai (2002) proposed a simple and quick method to select the bandwidth for the foregoing estimation procedures, which can be regarded as a nonparametric version of the Akaike information criterion (AIC), attentive to the structure of time series data and to the tendency to over-fit or under-fit. Note that the idea is also motivated by its analogue in Cai and Tiwari (2000). The basic idea is described as follows.

By recalling the classical AIC for linear models under the likelihood setting,
$$-2\,(\text{maximized log likelihood}) + 2\,(\text{number of estimated parameters}),$$
Cai (2002) proposed the following nonparametric AIC: select the $h$ minimizing
$$\mathrm{AIC}(h) = \log\{\mathrm{MASE}\} + \psi(\mathrm{tr}(H(h)), n), \qquad (8.7)$$
where $\psi(\mathrm{tr}(H(h)), n)$ is chosen particularly to be of the form of the bias-corrected version of the AIC, due to Hurvich and Tsai (1989),
$$\psi(\mathrm{tr}(H(h)), n) = 2\,\{\mathrm{tr}(H(h)) + 1\}/\left[n - \{\mathrm{tr}(H(h)) + 2\}\right], \qquad (8.8)$$

and $\mathrm{tr}(H(h))$ is the trace of the smoothing matrix $H(h)$, regarded as the nonparametric version of the degrees of freedom, called the effective number of parameters. See the book by Hastie and Tibshirani (1990, Section 3.5) for a detailed discussion of this aspect for nonparametric models. Note that (8.7) is in fact a generalization of the AIC for the parametric regression and autoregressive time series contexts, in which $\mathrm{tr}(H(h))$ is the number of regression (autoregressive) parameters in the fitted model. In view of (8.8), when $\psi(\mathrm{tr}(H(h)), n) = -2\log(1 - \mathrm{tr}(H(h))/n)$, (8.7) becomes the generalized cross-validation (GCV) criterion, commonly used to select the bandwidth in the time series literature even in the iid setting; when $\psi(\mathrm{tr}(H(h)), n) = 2\,\mathrm{tr}(H(h))/n$, (8.7) is the classical AIC discussed in Engle, Granger, Rice, and Weiss (1986) for time series data; and when $\psi(\mathrm{tr}(H(h)), n) = -\log(1 - 2\,\mathrm{tr}(H(h))/n)$, (8.7) is the T-criterion, proposed and studied by Rice (1984) for iid samples. It is clear that when $\mathrm{tr}(H(h))/n \to 0$, the nonparametric AIC, the GCV and the T-criterion are asymptotically equivalent. However, the T-criterion requires $\mathrm{tr}(H(h))/n < 1/2$, and, when $\mathrm{tr}(H(h))/n$ is large, the GCV imposes a relatively weak penalty. This is especially true in the nonparametric setting. Therefore, the criterion proposed here counteracts the over-fitting tendency of the GCV. Note that Hurvich, Simonoff, and Tsai (1998) gave a detailed derivation of the nonparametric AIC for nonparametric regression problems under the iid Gaussian error setting and argued that the nonparametric AIC performs reasonably well and better than some existing methods in the literature.
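A hedged sketch of the nonparametric AIC (8.7)-(8.8) for a local linear smoother, building the hat matrix H(h) explicitly; the helper functions, the Gaussian kernel and the simulated series are our own illustrative choices.

# hat matrix H(h) of a local linear smoother with Gaussian kernel
hat.ll <- function(x, h) {
  n <- length(x)
  H <- matrix(0, n, n)
  for (i in 1:n) {
    w <- dnorm((x - x[i]) / h)
    X <- cbind(1, x - x[i])
    A <- solve(t(X) %*% (w * X), t(X) %*% diag(w))
    H[i, ] <- A[1, ]                 # row such that hat m(x_i) = H[i, ] %*% y
  }
  H
}

# nonparametric AIC of (8.7) with the bias-corrected penalty (8.8)
aic.np <- function(h, x, y) {
  nn   <- length(y)
  H    <- hat.ll(x, h)
  mase <- mean((y - H %*% y)^2)
  trH  <- sum(diag(H))
  log(mase) + 2 * (trH + 1) / (nn - trH - 2)
}

set.seed(3)
n <- 200; x <- sort(runif(n)); y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
h.grid <- seq(0.02, 0.3, by = 0.01)
h.aic  <- h.grid[which.min(sapply(h.grid, aic.np, x = x, y = y))]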

8.6 Functional Coefficient Model

8.6.1 Model

As mentioned earlier, when $p$ is large, there exists the so-called curse of dimensionality. One way to overcome this shortcoming is to consider the functional coefficient model studied in Cai, Fan and Yao (2000) or the additive model discussed in Section 8.7. First, we study the functional coefficient model. To use the notation of Cai, Fan and Yao (2000), we change the notation from the previous sections.

Let $\{U_i, X_i, Y_i\}_{i=-\infty}^{\infty}$ be jointly strictly stationary processes, with $U_i$ taking values in $\Re^k$ and $X_i$ taking values in $\Re^p$. Typically, $k$ is small. Let $E(Y_1^2) < \infty$. We define the multivariate regression


function
$$m(u, x) = E(Y \mid U = u, X = x), \qquad (8.9)$$
where $(U, X, Y)$ has the same distribution as $(U_i, X_i, Y_i)$. In a pure time series context, both $U_i$ and $X_i$ consist of some lagged values of $Y_i$. The functional-coefficient regression model has the form
$$m(u, x) = \sum_{j=1}^{p} a_j(u)\, x_j, \qquad (8.10)$$
where the functions $\{a_j(\cdot)\}$ are measurable from $\Re^k$ to $\Re^1$ and $x = (x_1, \ldots, x_p)^T$. This model has been studied extensively in the literature; see Cai, Fan and Yao (2000) for detailed discussions.

For simplicity, in what follows, we consider only the case $k = 1$ in (8.10). Extension to the case $k > 1$ involves no fundamentally new ideas. Note that models with large $k$ are often not practically useful due to the "curse of dimensionality". If $k$ is large, one way to overcome the problem is to consider the index functional coefficient model proposed by Fan, Yao and Cai (2003),
$$m(u, x) = \sum_{j=1}^{p} a_j(\beta^T u)\, x_j, \qquad (8.11)$$
where $\beta_1 = 1$. Fan, Yao and Cai (2003) studied the estimation procedures, bandwidth selection and applications. Hong and Lee (2003) considered applications of model (8.11) to exchange rates.

8.6.2 Local Linear Estimation

As recommended by Fan and Gijbels (1996), we estimate the coefficient functions $\{a_j(\cdot)\}$ using the local linear regression method from observations $\{U_i, X_i, Y_i\}_{i=1}^{n}$, where $X_i = (X_{i1}, \ldots, X_{ip})^T$. We assume throughout that $a_j(\cdot)$ has a continuous second derivative. Note that we may approximate $a_j(\cdot)$ locally at $u_0$ by a linear function $a_j(u) \approx a_j + b_j\,(u - u_0)$. The local linear estimator is defined as $\widehat{a}_j(u_0) = \widehat{a}_j$, where $\{(\widehat{a}_j, \widehat{b}_j)\}$ minimize the sum of weighted squares
$$\sum_{i=1}^{n}\left[Y_i - \sum_{j=1}^{p}\{a_j + b_j\,(U_i - u_0)\}\, X_{ij}\right]^2 K_h(U_i - u_0), \qquad (8.12)$$
where $K_h(\cdot) = h^{-1} K(\cdot/h)$, $K(\cdot)$ is a kernel function on $\Re^1$ and $h > 0$ is a bandwidth. It follows from least squares theory that
$$\widehat{a}_j(u_0) = \sum_{k=1}^{n} K_{n,j}(U_k - u_0, X_k)\, Y_k, \qquad (8.13)$$
where
$$K_{n,j}(u, x) = e_{j,2p}^T\left(\widetilde{X}^T W \widetilde{X}\right)^{-1}\begin{pmatrix} x \\ u\, x \end{pmatrix} K_h(u), \qquad (8.14)$$
$e_{j,2p}$ is the $2p \times 1$ unit vector with 1 at the $j$th position, $\widetilde{X}$ denotes the $n \times 2p$ matrix with $\left(X_i^T, X_i^T (U_i - u_0)\right)$ as its $i$th row, and $W = \mathrm{diag}\{K_h(U_1 - u_0), \ldots, K_h(U_n - u_0)\}$.
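A hedged sketch of the local linear fit (8.12) in R, solved pointwise with a weighted least squares call; the helper name fc.ll, the Gaussian kernel, the bandwidth value and the simulated data-generating process (a plain linear AR, so the true coefficient functions are constants) are all illustrative assumptions, not from the text.

# local linear estimate of the coefficient functions a_j(u0), j = 1, ..., p
# Y: response vector; X: n x p design matrix; U: smoothing variable; h: bandwidth
fc.ll <- function(u0, Y, X, U, h) {
  w   <- dnorm((U - u0) / h) / h             # kernel weights K_h(U_i - u0)
  XX  <- cbind(X, X * (U - u0))              # regressors for (a_j, b_j), j = 1, ..., p
  fit <- lm.wfit(XX, Y, w)
  fit$coefficients[1:ncol(X)]                # first p coefficients are a_hat_j(u0)
}

# toy example: Y_t = a1(U_t) Y_{t-1} + a2(U_t) Y_{t-2} + e_t with U_t = Y_{t-1}
set.seed(4)
n <- 500
y <- arima.sim(list(ar = c(0.5, -0.2)), n + 2)
Y <- y[3:(n + 2)]; X <- cbind(y[2:(n + 1)], y[1:n]); U <- y[2:(n + 1)]
u.grid <- seq(quantile(U, 0.05), quantile(U, 0.95), length = 50)
a.hat  <- t(sapply(u.grid, fc.ll, Y = Y, X = X, U = U, h = 0.5))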

8.6.3 Bandwidth Selection

Various existing bandwidth selection techniques for nonparametric regression can be adapted for the foregoing estimation; see, e.g., Fan, Yao, and Cai (2003) and the nonparametric AIC discussed in Section 8.5.5. Also, Fan and Gijbels (1996) and Ruppert, Sheather, and Wand (1995) developed data-driven bandwidth selection schemes based on asymptotic formulas for the optimal bandwidths, which are less variable and more effective than conventional data-driven bandwidth selectors such as the cross-validation bandwidth rule. Similar algorithms can be developed for the estimation of functional-coefficient models based on (8.24); however, this remains a topic for future research.

Cai, Fan and Yao (2000) proposed a simple and quick method for selecting the bandwidth $h$. It can be regarded as a modified multi-fold cross-validation criterion that is attentive to the structure of stationary time series data. Let $m$ and $Q$ be two given positive integers with $n > mQ$. The basic idea is first to use $Q$ subseries of lengths $n - qm$ ($q = 1, \cdots, Q$) to estimate the unknown coefficient functions and then to compute the one-step forecasting errors of the next section of the time series of length $m$ based on the estimated models. More precisely, we choose $h$ to minimize the average mean squared (AMS) error
$$\mathrm{AMS}(h) = \sum_{q=1}^{Q} \mathrm{AMS}_q(h), \qquad (8.15)$$
where, for $q = 1, \cdots, Q$,
$$\mathrm{AMS}_q(h) = \frac{1}{m} \sum_{i=n-qm+1}^{n-qm+m}\left\{Y_i - \sum_{j=1}^{p} \widehat{a}_{j,q}(U_i)\, X_{i,j}\right\}^2,$$
and $\{\widehat{a}_{j,q}(\cdot)\}$ are computed from the sample $\{(U_i, X_i, Y_i),\ 1 \le i \le n - qm\}$ with bandwidth equal to $h\,[n/(n - qm)]^{1/5}$. Note that we rescale the bandwidth $h$ for different sample sizes according to its optimal rate, i.e., $h \propto n^{-1/5}$. In practical implementations, we may use $m = [0.1\,n]$ and $Q = 4$. The selected bandwidth does not depend critically on the choice of $m$ and $Q$, as long as $mQ$ is reasonably large so that the evaluation of the prediction errors is stable. A weighted version of $\mathrm{AMS}(h)$ can be used if one wishes to down-weight the prediction errors at earlier times. We believe that this bandwidth should be good for modeling and forecasting time series.
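A hedged sketch of the AMS criterion (8.15), reusing the pointwise local linear helper fc.ll() and the objects Y, X, U from the previous sketch; the defaults Q = 4 and m = [0.1 n] follow the text, while the bandwidth grid is an illustrative choice.

# AMS(h) of (8.15): Q estimation subsamples, one-step prediction errors of length m
ams <- function(h, Y, X, U, Q = 4, m = floor(0.1 * length(Y))) {
  n   <- length(Y)
  err <- 0
  for (q in 1:Q) {
    n.q   <- n - q * m                            # estimation sample size
    h.q   <- h * (n / n.q)^(1/5)                  # rescaled bandwidth
    idx   <- (n.q + 1):(n.q + m)                  # next m points to predict
    a.hat <- t(sapply(U[idx], fc.ll,
                      Y = Y[1:n.q], X = X[1:n.q, , drop = FALSE],
                      U = U[1:n.q], h = h.q))
    pred  <- rowSums(a.hat * X[idx, , drop = FALSE])
    err   <- err + mean((Y[idx] - pred)^2)
  }
  err
}

h.grid <- seq(0.2, 1.5, by = 0.1)
h.ams  <- h.grid[which.min(sapply(h.grid, ams, Y = Y, X = X, U = U))]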


8.6.4 Smoothing Variable Selection

It is important to choose an appropriate smoothing variable $U$ when applying functional-coefficient regression models if $U$ is a lagged variable. Knowledge of the physical background of the data may be very helpful, as Cai, Fan and Yao (2000) discussed in modeling the lynx data. Without any prior information, it is pertinent to choose $U$ by some data-driven method such as the Akaike information criterion (AIC) and its variants, cross-validation, or other criteria. Ideally, we would choose $U$ as a linear function of the given explanatory variables according to some optimal criterion, which is fully explored in the work of Fan, Yao and Cai (2003). Nevertheless, we propose here a simple and practical approach: let $U$ be the one of the given explanatory variables for which the AMS defined in (8.15) attains its minimum value. Obviously, this idea can also be extended to select $p$ (the number of lags) as well.

8.6.5 Goodness-of-Fit Test

To test whether model (8.10) holds with a specified parametric form, which is popular in economic and financial applications, such as the threshold autoregressive (TAR) models
$$a_j(u) = \begin{cases} a_{j1}, & \text{if } u \le \eta, \\ a_{j2}, & \text{if } u > \eta, \end{cases}$$
or the generalized exponential autoregressive (EXPAR) models
$$a_j(u) = \alpha_j + (\beta_j + \gamma_j\, u)\exp(-\theta_j\, u^2),$$
or the smooth transition autoregressive (STAR) models
$$a_j(u) = \left[1 - \exp(-\theta_j\, u)\right]^{-1} \quad \text{(logistic)},$$
or
$$a_j(u) = 1 - \exp(-\theta_j\, u^2) \quad \text{(exponential)},$$
or
$$a_j(u) = \left[1 - \exp(-\theta_j\, |u|)\right]^{-1} \quad \text{(absolute)}$$
[for more discussion of these models, see the survey paper by van Dijk, Terasvirta and Franses (2002)], we propose a goodness-of-fit test based on a comparison of the residual sum of squares (RSS) from the parametric and nonparametric fits. This method is closely related to the sieve likelihood method proposed by Fan, Zhang and Zhang (2001). Those authors demonstrated the optimality of this kind of procedure for independent samples.

Consider the null hypothesis
$$H_0: a_j(u) = \alpha_j(u, \theta), \qquad 1 \le j \le p, \qquad (8.16)$$
where $\alpha_j(\cdot, \theta)$ is a given family of functions indexed by an unknown parameter vector $\theta$. Let $\widehat{\theta}$ be an estimator of $\theta$. The RSS under the null hypothesis is
$$\mathrm{RSS}_0 = n^{-1} \sum_{i=1}^{n}\left\{Y_i - \alpha_1(U_i, \widehat{\theta})\, X_{i1} - \cdots - \alpha_p(U_i, \widehat{\theta})\, X_{ip}\right\}^2.$$
Analogously, the RSS corresponding to model (8.10) is
$$\mathrm{RSS}_1 = n^{-1} \sum_{i=1}^{n}\left\{Y_i - \widehat{a}_1(U_i)\, X_{i1} - \cdots - \widehat{a}_p(U_i)\, X_{ip}\right\}^2.$$
The test statistic is defined as
$$T_n = (\mathrm{RSS}_0 - \mathrm{RSS}_1)/\mathrm{RSS}_1 = \mathrm{RSS}_0/\mathrm{RSS}_1 - 1,$$
and we reject the null hypothesis (8.16) for large values of $T_n$. We use the following nonparametric bootstrap approach to evaluate the p-value of the test:


1. Generate the bootstrap residuals $\{\epsilon_i^*\}_{i=1}^{n}$ from the empirical distribution of the centered residuals $\{\widehat{\epsilon}_i - \bar{\widehat{\epsilon}}\}_{i=1}^{n}$, where
$$\widehat{\epsilon}_i = Y_i - \widehat{a}_1(U_i)\, X_{i1} - \cdots - \widehat{a}_p(U_i)\, X_{ip}, \qquad \bar{\widehat{\epsilon}} = \frac{1}{n}\sum_{i=1}^{n} \widehat{\epsilon}_i,$$
and define
$$Y_i^* = \alpha_1(U_i, \widehat{\theta})\, X_{i1} + \cdots + \alpha_p(U_i, \widehat{\theta})\, X_{ip} + \epsilon_i^*.$$

2. Calculate the bootstrap test statistic $T_n^*$ based on the sample $\{U_i, X_i, Y_i^*\}_{i=1}^{n}$.

3. Reject the null hypothesis $H_0$ when $T_n$ is greater than the upper-$\alpha$ point of the conditional distribution of $T_n^*$ given $\{U_i, X_i, Y_i\}_{i=1}^{n}$.

The p-value of the test is simply the relative frequency of the event $\{T_n^* \ge T_n\}$ in the replications of the bootstrap sampling. For the sake of simplicity, we use the same bandwidth in calculating $T_n^*$ as that used in $T_n$. Note that we bootstrap the centered residuals from the nonparametric fit instead of the parametric fit, because the nonparametric estimate of the residuals is always consistent, no matter whether the null or the alternative hypothesis holds. The method should therefore provide a consistent estimator of the null distribution even when the null hypothesis does not hold. Kreiss, Neumann, and Yao (1998) considered nonparametric bootstrap tests in a general nonparametric regression setting. They proved that, asymptotically, the conditional distribution of the bootstrap test statistic is indeed the distribution of the test statistic under the null hypothesis. It may be proven that a similar result holds here as long as $\widehat{\theta}$ converges to $\theta$ at the rate $n^{-1/2}$.
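A hedged sketch of this bootstrap test in R, taking a constant-coefficient (linear) null against the functional-coefficient alternative; it reuses fc.ll() and the objects Y, X, U from the earlier sketches, the bandwidth and number of bootstrap replications are illustrative, and the code is written for clarity rather than speed.

# goodness-of-fit test of H0: a_j(u) = constant (linear model) vs. model (8.10)
fc.fit <- function(Y, X, U, h) {
  a.hat <- t(sapply(U, fc.ll, Y = Y, X = X, U = U, h = h))
  rowSums(a.hat * X)                            # fitted values under the alternative
}

Tn.stat <- function(Y, X, U, h) {
  rss0 <- mean(residuals(lm(Y ~ X - 1))^2)      # parametric (constant-coefficient) fit
  rss1 <- mean((Y - fc.fit(Y, X, U, h))^2)      # functional-coefficient fit
  rss0 / rss1 - 1
}

h  <- 0.5
Tn <- Tn.stat(Y, X, U, h)
e  <- Y - fc.fit(Y, X, U, h); e <- e - mean(e)  # centered nonparametric residuals
b0 <- fitted(lm(Y ~ X - 1))                     # fitted values under H0

B  <- 200
Tn.star <- replicate(B, {
  Y.star <- b0 + sample(e, replace = TRUE)
  Tn.stat(Y.star, X, U, h)
})
p.value <- mean(Tn.star >= Tn)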

It is a great challenge to derive the asymptotic properties of the test statistic $T_n$ in the time series context under general assumptions.


That is, to show that
$$b_n\,[T_n - \lambda_n] \to N(0, \sigma^2)$$
for some $b_n$ and $\lambda_n$, which is a great project for future research. Note that Fan, Zhang and Zhang (2001) derived the above result for iid samples.

8.6.6 Asymptotic Results

We first present a result on mean squared convergence that serves as a building block for our main result and is also of independent interest. We now introduce some notation. Let

$$S_n = S_n(u_0) = \begin{pmatrix} S_{n,0} & S_{n,1} \\ S_{n,1} & S_{n,2} \end{pmatrix} \quad \text{and} \quad T_n = T_n(u_0) = \begin{pmatrix} T_{n,0}(u_0) \\ T_{n,1}(u_0) \end{pmatrix}$$
with
$$S_{n,j} = S_{n,j}(u_0) = \frac{1}{n} \sum_{i=1}^{n} X_i\, X_i^T \left(\frac{U_i - u_0}{h}\right)^j K_h(U_i - u_0)$$
and
$$T_{n,j}(u_0) = \frac{1}{n} \sum_{i=1}^{n} X_i \left(\frac{U_i - u_0}{h}\right)^j K_h(U_i - u_0)\, Y_i. \qquad (8.17)$$
Then, the solution to (8.12) can be expressed as
$$\widehat{\xi} = H^{-1}\, S_n^{-1}\, T_n, \qquad (8.18)$$
where $H = \mathrm{diag}(1, \ldots, 1, h, \ldots, h)$ with $p$ diagonal elements equal to $1$ and $p$ diagonal elements equal to $h$. To facilitate the notation, we denote
$$\Omega = (\omega_{l,m})_{p \times p} = E\left(X X^T \mid U = u_0\right). \qquad (8.19)$$


Also, let $f(u, x)$ denote the joint density of $(U, X)$ and $f_u(u)$ be the marginal density of $U$. We use the following convention: if $U = X_{j_0}$ for some $1 \le j_0 \le p$, then $f(u, x)$ becomes $f(x)$, the joint density of $X$.

Theorem 1. Let condition A.1 hold, and let $f(u, x)$ be continuous at the point $u_0$. Let $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$. Then,
$$E(S_{n,j}(u_0)) \to f_u(u_0)\,\Omega(u_0)\,\mu_j$$
and
$$n h_n\,\mathrm{Var}\big(S_{n,j}(u_0)_{l,m}\big) \to f_u(u_0)\,\nu_{2j}\,\omega_{l,m}$$
for each $0 \le j \le 3$ and $1 \le l, m \le p$.

As a consequence of Theorem 1, we have
$$S_n \stackrel{P}{\to} f_u(u_0)\, S \quad \text{and} \quad S_{n,3} \stackrel{P}{\to} \mu_3\, f_u(u_0)\,\Omega$$
in the sense that each element converges in probability, where
$$S = \begin{pmatrix} \Omega & \mu_1\,\Omega \\ \mu_1\,\Omega & \mu_2\,\Omega \end{pmatrix}.$$
Put
$$\sigma^2(u, x) = \mathrm{Var}(Y \mid U = u, X = x) \qquad (8.20)$$
and
$$\Omega^*(u_0) = E\left[X X^T\,\sigma^2(U, X) \mid U = u_0\right]. \qquad (8.21)$$
Let $c_0 = \mu_2/\left(\mu_2 - \mu_1^2\right)$ and $c_1 = -\mu_1/\left(\mu_2 - \mu_1^2\right)$.

Theorem 2. Let $\sigma^2(u, x)$ and $f(u, x)$ be continuous at the point $u_0$. Then under conditions A.1 and A.2,
$$\sqrt{n h_n}\left[\widehat{a}(u_0) - a(u_0) - \frac{h^2}{2}\,\frac{\mu_2^2 - \mu_1\,\mu_3}{\mu_2 - \mu_1^2}\, a''(u_0)\right] \stackrel{D}{\to} N\left(0, \Theta^2(u_0)\right), \qquad (8.22)$$
provided that $f_u(u_0) \neq 0$, where
$$\Theta^2(u_0) = \frac{c_0^2\,\nu_0 + 2\, c_0\, c_1\,\nu_1 + c_1^2\,\nu_2}{f_u(u_0)}\,\Omega^{-1}(u_0)\,\Omega^*(u_0)\,\Omega^{-1}(u_0). \qquad (8.23)$$

Theorem 2 indicates that the asymptotic bias of $\widehat{a}_j(u_0)$ is
$$\frac{h^2}{2}\,\frac{\mu_2^2 - \mu_1\,\mu_3}{\mu_2 - \mu_1^2}\, a_j''(u_0)$$
and the asymptotic variance is $(n h_n)^{-1}\,\theta_j^2(u_0)$, where
$$\theta_j^2(u_0) = \frac{c_0^2\,\nu_0 + 2\, c_0\, c_1\,\nu_1 + c_1^2\,\nu_2}{f_u(u_0)}\; e_{j,p}^T\,\Omega^{-1}(u_0)\,\Omega^*(u_0)\,\Omega^{-1}(u_0)\, e_{j,p}.$$
When $\mu_1 = 0$, the bias and variance expressions simplify to $h^2\,\mu_2\, a_j''(u_0)/2$ and
$$\theta_j^2(u_0) = \frac{\nu_0}{f_u(u_0)}\; e_{j,p}^T\,\Omega^{-1}(u_0)\,\Omega^*(u_0)\,\Omega^{-1}(u_0)\, e_{j,p}.$$
The optimal bandwidth for estimating $a_j(\cdot)$ can be defined as the one that minimizes the squared bias plus the variance. This optimal bandwidth is given by
$$h_{j,\mathrm{opt}} = \left[\frac{\mu_2^2\,\nu_0 - 2\,\mu_1\,\mu_2\,\nu_1 + \mu_1^2\,\nu_2}{f_u(u_0)\,(\mu_2^2 - \mu_1\,\mu_3)^2}\;\frac{e_{j,p}^T\,\Omega^{-1}(u_0)\,\Omega^*(u_0)\,\Omega^{-1}(u_0)\, e_{j,p}}{\left[a_j''(u_0)\right]^2}\right]^{1/5} n^{-1/5}. \qquad (8.24)$$

8.6.7 Conditions and Proofs

We first impose some conditions on the regression model; they might not be the weakest possible.


Condition A.1

a. The kernel function $K(\cdot)$ is a bounded density with a bounded support $[-1, 1]$.

b. $|f(u, v \mid x_0, x_1; l)| \le M < \infty$ for all $l \ge 1$, where $f(u, v \mid x_0, x_1; l)$ is the conditional density of $(U_0, U_l)$ given $(X_0, X_l)$, and $f(u \mid x) \le M < \infty$, where $f(u \mid x)$ is the conditional density of $U$ given $X = x$.

c. The process $\{U_i, X_i, Y_i\}$ is $\alpha$-mixing with $\sum_k k^c\,[\alpha(k)]^{1 - 2/\delta} < \infty$ for some $\delta > 2$ and $c > 1 - 2/\delta$.

d. $E|X|^{2\delta} < \infty$, where $\delta$ is given in condition A.1c.

Condition A.2

a. Assume that
$$E\left[Y_0^2 + Y_l^2 \mid U_0 = u, X_0 = x_0;\; U_l = v, X_l = x_1\right] \le M < \infty \qquad (8.25)$$
for all $l \ge 1$, $x_0, x_1 \in \Re^p$, and $u$, $v$ in a neighborhood of $u_0$.

b. Assume that $h_n \to 0$ and $n h_n \to \infty$. Further, assume that there exists a sequence of positive integers $s_n$ such that $s_n \to \infty$, $s_n = o\left((n h_n)^{1/2}\right)$, and $(n/h_n)^{1/2}\,\alpha(s_n) \to 0$ as $n \to \infty$.

c. There exists $\delta^* > \delta$, where $\delta$ is given in condition A.1c, such that
$$E\left[|Y|^{\delta^*} \mid U = u, X = x\right] \le M_4 < \infty \qquad (8.26)$$
for all $x \in \Re^p$ and $u$ in a neighborhood of $u_0$, and
$$\alpha(n) = O\left(n^{-\theta^*}\right), \qquad (8.27)$$
where $\theta^* \ge \delta\,\delta^*/\{2(\delta^* - \delta)\}$.

d. $E|X|^{2\delta^*} < \infty$, and $n^{1/2 - \delta/4}\, h^{\delta/\delta^* - 1/2 - \delta/4} = O(1)$.

Remark A.1. We provide a sufficient condition for the mixing coefficient $\alpha(n)$ to satisfy conditions A.1c and A.2b. Suppose that $h_n = A\, n^{-\rho}$ ($0 < \rho < 1$, $A > 0$), $s_n = (n h_n/\log n)^{1/2}$ and $\alpha(n) = O\left(n^{-d}\right)$ for some $d > 0$. Then condition A.1c is satisfied for $d > 2(1 - 1/\delta)/(1 - 2/\delta)$ and condition A.2b is satisfied if $d > (1 + \rho)/(1 - \rho)$. Hence both conditions are satisfied if
$$\alpha(n) = O\left(n^{-d}\right), \qquad d > \max\left\{\frac{1 + \rho}{1 - \rho},\; \frac{2(1 - 1/\delta)}{1 - 2/\delta}\right\}.$$
Note that this is a trade-off between the order $\delta$ of the moment of $Y$ and the rate of decay of the mixing coefficient: the larger the order $\delta$, the weaker the required decay rate of $\alpha(n)$.

To study the joint asymptotic normality of $\widehat{a}(u_0)$, we need to center the vector $T_n(u_0)$ by replacing $Y_i$ with $Y_i - m(U_i, X_i)$ in the expression (8.17) for $T_{n,j}(u_0)$. Let
$$T^*_{n,j}(u_0) = \frac{1}{n} \sum_{i=1}^{n} X_i \left(\frac{U_i - u_0}{h}\right)^j K_h(U_i - u_0)\left[Y_i - m(U_i, X_i)\right]$$
and
$$T^*_n = \begin{pmatrix} T^*_{n,0} \\ T^*_{n,1} \end{pmatrix}.$$
Because the coefficient functions $a_j(u)$ are needed only in the neighborhood $|U_i - u_0| < h$, by Taylor's expansion,
$$m(U_i, X_i) = X_i^T a(u_0) + (U_i - u_0)\, X_i^T a'(u_0) + \frac{h^2}{2}\left(\frac{U_i - u_0}{h}\right)^2 X_i^T a''(u_0) + o_p(h^2),$$
where $a'(u_0)$ and $a''(u_0)$ are the vectors consisting of the first and second derivatives of the functions $a_j(\cdot)$. Then,
$$T_{n,0} - T^*_{n,0} = S_{n,0}\, a(u_0) + h\, S_{n,1}\, a'(u_0) + \frac{h^2}{2}\, S_{n,2}\, a''(u_0) + o_p(h^2)$$
and
$$T_{n,1} - T^*_{n,1} = S_{n,1}\, a(u_0) + h\, S_{n,2}\, a'(u_0) + \frac{h^2}{2}\, S_{n,3}\, a''(u_0) + o_p(h^2),$$
so that
$$T_n - T^*_n = S_n\, H\,\xi + \frac{h^2}{2}\begin{pmatrix} S_{n,2} \\ S_{n,3} \end{pmatrix} a''(u_0) + o_p(h^2), \qquad (8.28)$$

where $\xi = \left(a(u_0)^T, a'(u_0)^T\right)^T$. Thus it follows from (8.18), (8.28), and Theorem 1 that
$$H\left(\widehat{\xi} - \xi\right) = f_u^{-1}(u_0)\, S^{-1}\, T^*_n + \frac{h^2}{2}\, S^{-1}\begin{pmatrix} \mu_2\,\Omega \\ \mu_3\,\Omega \end{pmatrix} a''(u_0) + o_p(h^2), \qquad (8.29)$$
from which the bias term of $\widehat{\xi}(u_0)$ is evident. Clearly,
$$\widehat{a}(u_0) - a(u_0) = \frac{\Omega^{-1}}{f_u(u_0)\,(\mu_2 - \mu_1^2)}\left[\mu_2\, T^*_{n,0} - \mu_1\, T^*_{n,1}\right] + \frac{h^2}{2}\,\frac{\mu_2^2 - \mu_1\,\mu_3}{\mu_2 - \mu_1^2}\, a''(u_0) + o_p(h^2). \qquad (8.30)$$
Thus (8.30) indicates that the asymptotic bias of $\widehat{a}(u_0)$ is
$$\frac{h^2}{2}\,\frac{\mu_2^2 - \mu_1\,\mu_3}{\mu_2 - \mu_1^2}\, a''(u_0).$$
Let
$$Q_n = \frac{1}{n} \sum_{i=1}^{n} Z_i, \qquad (8.31)$$
where
$$Z_i = X_i\left[c_0 + c_1\left(\frac{U_i - u_0}{h}\right)\right] K_h(U_i - u_0)\left[Y_i - m(U_i, X_i)\right] \qquad (8.32)$$
with $c_0 = \mu_2/\left(\mu_2 - \mu_1^2\right)$ and $c_1 = -\mu_1/\left(\mu_2 - \mu_1^2\right)$. It follows from (8.30) and (8.31) that
$$\sqrt{n h_n}\left[\widehat{a}(u_0) - a(u_0) - \frac{h^2}{2}\,\frac{\mu_2^2 - \mu_1\,\mu_3}{\mu_2 - \mu_1^2}\, a''(u_0)\right] = \frac{\Omega^{-1}}{f_u(u_0)}\,\sqrt{n h_n}\, Q_n + o_p(1). \qquad (8.33)$$


We need the following lemma, whose proof is more involved than that for Theorem 1; therefore, we prove only this lemma. Throughout this Appendix, we let $C$ denote a generic constant, which may take different values at different places.

Lemma A.1. Under conditions A.1 and A.2 and the assumption that $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$, if $\sigma^2(u, x)$ and $f(u, x)$ are continuous at the point $u_0$, then we have

(a) $h_n\,\mathrm{Var}(Z_1) \to f_u(u_0)\,\Omega^*(u_0)\left[c_0^2\,\nu_0 + 2\, c_0\, c_1\,\nu_1 + c_1^2\,\nu_2\right]$;

(b) $h_n \sum_{l=1}^{n-1} |\mathrm{Cov}(Z_1, Z_{l+1})| = o(1)$; and

(c) $n h_n\,\mathrm{Var}(Q_n) \to f_u(u_0)\,\Omega^*(u_0)\left[c_0^2\,\nu_0 + 2\, c_0\, c_1\,\nu_1 + c_1^2\,\nu_2\right]$.

Proof: First, by conditioning on $(U_1, X_1)$ and using Theorem 1 of Sun (1984), we have
$$\mathrm{Var}(Z_1) = E\left\{X_1 X_1^T\,\sigma^2(U_1, X_1)\left[c_0 + c_1\left(\frac{U_1 - u_0}{h}\right)\right]^2 K_h^2(U_1 - u_0)\right\}$$
$$= \frac{1}{h}\left\{f_u(u_0)\,\Omega^*(u_0)\left[c_0^2\,\nu_0 + 2\, c_0\, c_1\,\nu_1 + c_1^2\,\nu_2\right] + o(1)\right\}. \qquad (8.34)$$

The result (c) follows in an obvious manner from (a) and (b) along with
$$\mathrm{Var}(Q_n) = \frac{1}{n}\,\mathrm{Var}(Z_1) + \frac{2}{n} \sum_{l=1}^{n-1}\left(1 - \frac{l}{n}\right)\mathrm{Cov}(Z_1, Z_{l+1}). \qquad (8.35)$$

It thus remains to prove part (b). To this end, let $d_n \to \infty$ be a sequence of positive integers such that $d_n h_n \to 0$. Define
$$J_1 = \sum_{l=1}^{d_n - 1} |\mathrm{Cov}(Z_1, Z_{l+1})| \quad \text{and} \quad J_2 = \sum_{l=d_n}^{n-1} |\mathrm{Cov}(Z_1, Z_{l+1})|.$$
It remains to show that $J_1 = o\left(h^{-1}\right)$ and $J_2 = o\left(h^{-1}\right)$.

We remark that because $K(\cdot)$ has bounded support $[-1, 1]$, $a_j(u)$ is bounded in the neighborhood $u \in [u_0 - h, u_0 + h]$. Let $B = \max_{1 \le j \le p} \sup_{|u - u_0| < h} |a_j(u)|$ and $g(x) = \sum_{j=1}^{p} |x_j|$. Then $\sup_{|u - u_0| < h} |m(u, x)| \le B\, g(x)$. By conditioning on $(U_1, X_1)$ and $(U_{l+1}, X_{l+1})$, and using (8.25) and condition A.1b, we have, for all $l \ge 1$,
$$|\mathrm{Cov}(Z_1, Z_{l+1})| \le C\, E\left[|X_1 X_{l+1}^T|\,\{|Y_1| + B\, g(X_1)\}\{|Y_{l+1}| + B\, g(X_{l+1})\}\, K_h(U_1 - u_0)\, K_h(U_{l+1} - u_0)\right]$$
$$\le C\, E\left[|X_1 X_{l+1}^T|\left\{M^2 + B^2 g^2(X_1)\right\}^{1/2}\left\{M^2 + B^2 g^2(X_{l+1})\right\}^{1/2} K_h(U_1 - u_0)\, K_h(U_{l+1} - u_0)\right]$$
$$\le C\, E\left[|X_1 X_{l+1}^T|\,\{1 + g(X_1)\}\,\{1 + g(X_{l+1})\}\right] \le C.$$
It follows that
$$J_1 \le C\, d_n = o\left(h^{-1}\right)$$

by the choice of $d_n$. We next consider the upper bound of $J_2$. To this end, using Davydov's inequality (see Hall and Heyde 1980, Corollary A.2), we obtain, for all $1 \le j, m \le p$ and $l \ge 1$,
$$|\mathrm{Cov}(Z_{1j}, Z_{l+1,m})| \le C\,[\alpha(l)]^{1 - 2/\delta}\left[E|Z_j|^{\delta}\right]^{1/\delta}\left[E|Z_m|^{\delta}\right]^{1/\delta}. \qquad (8.37)$$
By conditioning on $(U, X)$ and using conditions A.1b and A.2c, one has
$$E\left[|Z_j|^{\delta}\right] \le C\, E\left[|X_j|^{\delta}\, K_h^{\delta}(U - u_0)\left\{|Y|^{\delta} + B^{\delta} g^{\delta}(X)\right\}\right] \le C\, E\left[|X_j|^{\delta}\, K_h^{\delta}(U - u_0)\left\{M_3 + B^{\delta} g^{\delta}(X)\right\}\right]$$
$$\le C\, h^{1-\delta}\, E\left[|X_j|^{\delta}\left\{M_3 + B^{\delta} g^{\delta}(X)\right\}\right] \le C\, h^{1-\delta}. \qquad (8.38)$$
A combination of (8.37) and (8.38) leads to
$$J_2 \le C\, h^{2/\delta - 2} \sum_{l=d_n}^{\infty} [\alpha(l)]^{1 - 2/\delta} \le C\, h^{2/\delta - 2}\, d_n^{-c} \sum_{l=d_n}^{\infty} l^c\,[\alpha(l)]^{1 - 2/\delta} = o\left(h^{-1}\right) \qquad (8.39)$$
by choosing $d_n$ such that $h^{1 - 2/\delta}\, d_n^{c} = C$, so that the requirement $d_n h_n \to 0$ is satisfied.

Proof of Theorem 2

We use the small-block and large-block technique - namely, partition $\{1, \ldots, n\}$ into $2 q_n + 1$ subsets with large blocks of size $r = r_n$ and small blocks of size $s = s_n$. Set
$$q = q_n = \left\lfloor \frac{n}{r_n + s_n} \right\rfloor. \qquad (8.40)$$
We now use the Cramer-Wold device to derive the asymptotic normality of $Q_n$. For any unit vector $d \in \Re^p$, let $Z_{n,i} = \sqrt{h}\, d^T Z_{i+1}$, $i = 0, \ldots, n - 1$. Then
$$\sqrt{n h}\; d^T Q_n = \frac{1}{\sqrt{n}} \sum_{i=0}^{n-1} Z_{n,i},$$
and, by Lemma A.1,
$$\mathrm{Var}(Z_{n,0}) \approx f_u(u_0)\, d^T \Omega^*(u_0)\, d\left[c_0^2\,\nu_0 + 2\, c_0\, c_1\,\nu_1 + c_1^2\,\nu_2\right] \equiv \theta^2(u_0) \qquad (8.41)$$
and
$$\sum_{l=1}^{n-1} |\mathrm{Cov}(Z_{n,0}, Z_{n,l})| = o(1). \qquad (8.42)$$


Define the random variables, for $0 \le j \le q - 1$,
$$\eta_j = \sum_{i=j(r+s)}^{j(r+s)+r-1} Z_{n,i}, \qquad \xi_j = \sum_{i=j(r+s)+r}^{(j+1)(r+s)-1} Z_{n,i}, \qquad \text{and} \qquad \zeta_q = \sum_{i=q(r+s)}^{n-1} Z_{n,i}.$$
Then,
$$\sqrt{n h}\; d^T Q_n = \frac{1}{\sqrt{n}}\left\{\sum_{j=0}^{q-1} \eta_j + \sum_{j=0}^{q-1} \xi_j + \zeta_q\right\} \equiv \frac{1}{\sqrt{n}}\left\{Q_{n,1} + Q_{n,2} + Q_{n,3}\right\}. \qquad (8.43)$$
We show that, as $n \to \infty$,
$$\frac{1}{n}\, E[Q_{n,2}]^2 \to 0, \qquad \frac{1}{n}\, E[Q_{n,3}]^2 \to 0, \qquad (8.44)$$
$$\left| E\left[\exp(i t\, Q_{n,1})\right] - \prod_{j=0}^{q-1} E\left[\exp(i t\,\eta_j)\right] \right| \to 0, \qquad (8.45)$$
$$\frac{1}{n} \sum_{j=0}^{q-1} E\left[\eta_j^2\right] \to \theta^2(u_0), \qquad (8.46)$$
and
$$\frac{1}{n} \sum_{j=0}^{q-1} E\left[\eta_j^2\, I\left\{|\eta_j| \ge \varepsilon\,\theta(u_0)\,\sqrt{n}\right\}\right] \to 0 \qquad (8.47)$$
for every $\varepsilon > 0$. (8.44) implies that $Q_{n,2}$ and $Q_{n,3}$ are asymptotically negligible in probability; (8.45) shows that the summands $\eta_j$ in $Q_{n,1}$ are asymptotically independent; and (8.46) and (8.47) are the standard Lindeberg-Feller conditions for the asymptotic normality of $Q_{n,1}$ in the independent setup.

are asymptotically independent and (8.46) and (8.47) are the stan-dard Lindeberg-Feller conditions for asymptotic normality of Qn,1 forthe independent setup.


We first establish (8.44). For this purpose, we first choose the large block size. Condition A.2b implies that there is a sequence of positive constants $\gamma_n \to \infty$ such that
$$\gamma_n\, s_n = o\left(\sqrt{n h_n}\right) \qquad \text{and} \qquad \gamma_n\,(n/h_n)^{1/2}\,\alpha(s_n) \to 0. \qquad (8.48)$$
Define the large block size $r_n$ by $r_n = \lfloor (n h_n)^{1/2}/\gamma_n \rfloor$ and the small block size $s_n$. Then it can easily be shown from (8.48) that, as $n \to \infty$,
$$s_n/r_n \to 0, \qquad r_n/n \to 0, \qquad r_n\,(n h_n)^{-1/2} \to 0, \qquad (8.49)$$
and
$$(n/r_n)\,\alpha(s_n) \to 0. \qquad (8.50)$$
Observe that
$$E[Q_{n,2}]^2 = \sum_{j=0}^{q-1} \mathrm{Var}(\xi_j) + 2 \sum_{0 \le i < j \le q-1} \mathrm{Cov}(\xi_i, \xi_j) \equiv I_1 + I_2. \qquad (8.51)$$
It follows from stationarity and Lemma A.1 that
$$I_1 = q_n\,\mathrm{Var}(\xi_1) = q_n\,\mathrm{Var}\left(\sum_{j=1}^{s_n} Z_{n,j}\right) = q_n\, s_n\left[\theta^2(u_0) + o(1)\right]. \qquad (8.52)$$
Next consider the second term $I_2$ on the right side of (8.51). Let $r_j^* = j(r_n + s_n)$; then $r_j^* - r_i^* \ge r_n$ for all $j > i$, and we thus have
$$|I_2| \le 2 \sum_{0 \le i < j \le q-1} \sum_{j_1=1}^{s_n} \sum_{j_2=1}^{s_n} \left|\mathrm{Cov}\left(Z_{n, r_i^* + r_n + j_1},\, Z_{n, r_j^* + r_n + j_2}\right)\right| \le 2 \sum_{j_1=1}^{n - r_n} \sum_{j_2 = j_1 + r_n}^{n} \left|\mathrm{Cov}(Z_{n, j_1}, Z_{n, j_2})\right|.$$
By stationarity and Lemma A.1, one obtains
$$|I_2| \le 2 n \sum_{j=r_n + 1}^{n} |\mathrm{Cov}(Z_{n,1}, Z_{n,j})| = o(n). \qquad (8.53)$$


Hence, by (8.49)-(8.53), we have
$$\frac{1}{n}\, E[Q_{n,2}]^2 = O\left(q_n\, s_n\, n^{-1}\right) + o(1) = o(1). \qquad (8.54)$$
It follows from stationarity, (8.49), and Lemma A.1 that
$$\mathrm{Var}[Q_{n,3}] = \mathrm{Var}\left(\sum_{j=1}^{n - q_n(r_n + s_n)} Z_{n,j}\right) = O\left(n - q_n(r_n + s_n)\right) = o(n). \qquad (8.55)$$
Combining (8.49), (8.54), and (8.55), we establish (8.44). As for (8.46), by stationarity, (8.49), (8.50), and Lemma A.1, it is easily seen that
$$\frac{1}{n} \sum_{j=0}^{q_n - 1} E\left[\eta_j^2\right] = \frac{q_n}{n}\, E\left[\eta_1^2\right] = \frac{q_n\, r_n}{n}\cdot\frac{1}{r_n}\,\mathrm{Var}\left(\sum_{j=1}^{r_n} Z_{n,j}\right) \to \theta^2(u_0).$$
To establish (8.45), we use Lemma 1.1 of Volkonskii and Rozanov (1959) (see also Ibragimov and Linnik 1971, p. 338) to obtain
$$\left| E\left[\exp(i t\, Q_{n,1})\right] - \prod_{j=0}^{q_n - 1} E\left[\exp(i t\,\eta_j)\right] \right| \le 16\,(n/r_n)\,\alpha(s_n),$$
which tends to $0$ by (8.50).

tending to 0 by (8.50).

It remains to establish (8.47). For this purpose, we use Theorem 4.1 of Shao and Yu (1996) and condition A.2 to obtain
$$E\left[\eta_1^2\, I\left\{|\eta_1| \ge \varepsilon\,\theta(u_0)\,\sqrt{n}\right\}\right] \le C\, n^{1 - \delta/2}\, E\left[|\eta_1|^{\delta}\right] \le C\, n^{1 - \delta/2}\, r_n^{\delta/2}\left[E\left(|Z_{n,0}|^{\delta^*}\right)\right]^{\delta/\delta^*}. \qquad (8.56)$$
As in (8.38),
$$E\left(|Z_{n,0}|^{\delta^*}\right) \le C\, h^{1 - \delta^*/2}. \qquad (8.57)$$


Therefore, by (8.56) and (8.57),
$$E\left[\eta_1^2\, I\left\{|\eta_1| \ge \varepsilon\,\theta(u_0)\,\sqrt{n}\right\}\right] \le C\, n^{1 - \delta/2}\, r_n^{\delta/2}\, h^{(2 - \delta^*)\delta/(2\delta^*)}. \qquad (8.58)$$
Thus, by (8.40) and the definition of $r_n$, and using conditions A.2c and A.2d, we obtain
$$\frac{1}{n} \sum_{j=0}^{q-1} E\left[\eta_j^2\, I\left\{|\eta_j| \ge \varepsilon\,\theta(u_0)\,\sqrt{n}\right\}\right] \le C\,\gamma_n^{1 - \delta/2}\, n^{1/2 - \delta/4}\, h_n^{\delta/\delta^* - 1/2 - \delta/4} \to 0 \qquad (8.59)$$
because $\gamma_n \to \infty$. This completes the proof of the theorem.

8.6.8 Monte Carlo Simulations and Applications

See Cai, Fan and Yao (2000) for detailed Monte Carlo simulation results and applications.

8.7 Additive Model

8.7.1 Model

In this section, we use the notation from Cai (2002). Let $\{X_t, Y_t, Z_t\}_{t=-\infty}^{\infty}$ be jointly stationary processes, where $X_t$ and $Y_t$ take values in $\Re^p$ and $\Re^q$ with $p, q \ge 0$, respectively. The regression surface is defined by
$$m(x, y) = E\left\{Z_t \mid X_t = x, Y_t = y\right\}. \qquad (8.60)$$
Here, it is assumed that $E|Z_t| < \infty$. Note that the regression function $m(\cdot, \cdot)$ defined in (8.60) can identify only the sum
$$m(x, y) = \mu + g_1(x) + g_2(y). \qquad (8.61)$$


Such a decomposition holds, for example, for the following nonlinear additive autoregressive model with exogenous variables (ARX):
$$Y_t = \mu + g_1(X_{t-j_1}, \ldots, X_{t-j_p}) + g_2(Y_{t-i_1}, \ldots, Y_{t-i_q}) + \epsilon_t, \qquad X_{t-j_1} = g_3(X_{t-j_2}, \ldots, X_{t-j_p}) + \eta_t.$$
For detailed discussions of the ARX model, the reader is referred to the papers by Masry and Tjøstheim (1997) and Cai and Masry (2000). For identifiability, it is assumed that $E\{g_1(X_t)\} = 0$ and $E\{g_2(Y_t)\} = 0$. Then, the projection of $m(x, y)$ onto the $g_1(x)$-direction is defined by
$$E\{m(x, Y_t)\} = \mu + g_1(x) + E\{g_2(Y_t)\} = \mu + g_1(x). \qquad (8.62)$$
Clearly, $g_1(\cdot)$ can be identified up to an additive constant, and $g_2(\cdot)$ can be retrieved likewise.

A thorough discussion of additive time series models defined in (8.61) can be found in Chen and Tsay (1993). Additive components can be estimated at a one-dimensional nonparametric rate. Several methods have been proposed in the literature to estimate the additive components. For example, Chen and Tsay (1993) used iterative backfitting procedures, such as the ACE algorithm and the BRUTO approach; see Hastie and Tibshirani (1990) for details. But their asymptotic properties are not well understood, due to the implicit definition of the resulting estimators. To attenuate the drawbacks of iterative procedures, Auestad and Tjøstheim (1991) and Tjøstheim and Auestad (1994a) proposed a direct method based on an average regression surface idea, referred to as the projection method in Tjøstheim and Auestad (1994a) for time series data. As pointed out by Cai and Fan (2000), a direct method has some advantages: it does not rely on iterations, it is computationally fast, and,


more importantly, it allows an asymptotic analysis. The projection method was extended to nonlinear ARX models by Masry and Tjøstheim (1997) using the kernel method, and by Cai and Masry (2000) coupled with the local polynomial approach. It should be remarked that the projection method, under the name of marginal integration, was proposed independently by Newey (1994) and Linton and Nielsen (1995) for iid samples, and since then, important progress has been made by several authors. For example, by combining marginal integration with one-step backfitting, Linton (1997, 2000) presents an efficient estimator; Mammen, Linton, and Nielsen (1999) rigorously establish the asymptotic theory of the backfitting; Cai and Fan (2000) consider estimating each component efficiently using the weighted projection method coupled with local linear fitting; and Sperlich, Tjøstheim, and Yang (2000) extend the efficient method to models with simple interactions.

The projection method has some disadvantages despite the aforementioned merits. It may not be efficient if the covariates (endogenous or exogenous variables) are strongly correlated, which is particularly relevant for autoregressive models. The intuitive interpretation is that the additive components are not orthogonal. To overcome this shortcoming, two efficient estimation methods have been proposed in the literature. The first one is the weight function procedure, proposed by Fan, Hardle, and Mammen (1998) for iid samples and extended to time series situations by Cai and Fan (2000). With an appropriate choice of the weight function, the additive components can be estimated efficiently in the sense that each component can be estimated with the same asymptotic bias and variance as if the remaining components were known. The second one combines marginal integration with one-step backfitting,


introduced by Linton (1997, 2000) for iid samples and extended by Sperlich, Tjøstheim, and Yang (2000) to additive models with simple interactions, but this method has not been advocated for time series situations. Moreover, there has not been any attempt in the literature to discuss bandwidth selection for the projection method and its variations, due to their complexity. In practice, one bandwidth is usually used for all components, although Cai and Fan (2000) argue that, theoretically, different bandwidths might be used to deal with additive components possessing different degrees of smoothness. Therefore, the projection method may not be optimal in practice when a single bandwidth is used.

To estimate the unknown additive components in (8.61) efficiently, following the spirit of the marginal integration with one-step backfitting proposed by Linton (1997) for iid samples, we use a two-stage method, due to Linton (2000), coupled with the local linear (polynomial) method, which has attractive properties such as mathematical efficiency, bias reduction and adaptation to edge effects (see Fan and Gijbels, 1996). The basic idea of the two-stage approach is as follows. At the first stage, one obtains initial estimated values for all components. More precisely, to estimate any additive component, one first estimates the high-dimensional regression surface directly by the local linear method and then averages the regression surface over the remaining variables to stabilize the variance. Such an initial estimate is, in general, under-smoothed so that its bias is asymptotically negligible. At the second stage, the local linear (polynomial) technique is used again to estimate each additive component using the initial estimated values of the remaining components. In this way, it is shown that the estimate at the second stage is not only efficient in the sense of being


equivalent to a procedure based on knowing the other components, but also makes the bandwidth selection much easier. Note that the two-stage technique itself is not novel, since it was first used by Linton (1997, 2000) for iid samples, but many of the details and insights here are.

The rest of this section is organized as follows. We first give a brief review of the projection method and discuss its advantages and shortcomings. We then present the two-stage approach coupled with a new bandwidth selector, based on the nonparametric version of the Akaike information criterion, and state the asymptotic normality of the resulting estimator. A small simulation study illustrating the methodology, an application to a real example, and the regularity conditions and technical proofs can be found in Cai (2002).

8.7.2 Backfitting Algorithm

The building block of the generalized additive model algorithm is the scatterplot smoother. We first describe scatterplot smoothing in a simple setting, and then indicate how it is used in generalized additive modelling. Here $y$ is a response or outcome variable, and $x$ is a prognostic factor. We wish to fit a smooth curve $f(x)$ that summarizes the dependence of $y$ on $x$. If we were to find the curve that simply minimizes $\sum_{i=1}^{n}[y_i - f(x_i)]^2$, the result would be an interpolating curve that would not be smooth at all. The cubic spline smoother instead imposes smoothness on $f(x)$. We seek the function $f(x)$ that minimizes
$$\sum_{i=1}^{n}[y_i - f(x_i)]^2 + \lambda \int [f''(x)]^2\, dx. \qquad (8.63)$$


Notice that $\int [f''(x)]^2\, dx$ measures the "wiggliness" of the function $f(x)$: linear functions $f(x)$ have $\int [f''(x)]^2\, dx = 0$, while non-linear functions produce values bigger than zero. $\lambda$ is a non-negative smoothing parameter that must be chosen by the data analyst. It governs the tradeoff between the goodness of fit to the data (as measured by the residual sum of squares) and the wiggliness of the function. Larger values of $\lambda$ force $f(x)$ to be smoother.

For any value of $\lambda$, the solution to (8.63) is a cubic spline, i.e., a piecewise cubic polynomial with pieces joined at the unique observed values of $x$ in the dataset. Fast and stable numerical procedures are available for computing the fitted curve. What value of $\lambda$ should we use in practice? In fact, it is not convenient to express the desired smoothness of $f(x)$ in terms of $\lambda$, as the meaning of $\lambda$ depends on the units of the prognostic factor $x$. Instead, it is possible to define an "effective number of parameters" or "degrees of freedom" of the cubic spline smoother, and then use a numerical search to determine the value of $\lambda$ that yields this number. In practice, if we choose the effective number of parameters to be 5, roughly speaking, this means that the complexity of the curve is about the same as that of a polynomial regression of degree 4. However, the cubic spline smoother "spreads out" its parameters in a more even manner, and hence is much more flexible than polynomial regression. Note that the degrees of freedom of a smoother need not be an integer.
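In R, this is exactly what smooth.spline() does; a brief illustration on simulated data (the choice df = 5 simply mirrors the discussion above):

set.seed(5)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# cubic smoothing spline with the effective number of parameters fixed at 5;
# lambda is found internally by a numerical search to match df = 5
fit <- smooth.spline(x, y, df = 5)
fit$lambda                           # the implied smoothing parameter
plot(x, y, col = "grey"); lines(fit, lwd = 2)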

The above discussion tells how to fit a curve to a single prognostic factor. With multiple prognostic factors, if $x_{ij}$ denotes the value of the $j$th prognostic factor for the $i$th observation, we fit the additive model
$$y_i = \sum_{j=1}^{d} f_j(x_{ij}) + \epsilon_i.$$
A criterion like (8.63) can be specified for this problem, and a simple iterative procedure exists for estimating the $f_j$'s. We apply a cubic spline smoother to the outcome $y_i - \sum_{j \neq k} \widehat{f}_j(x_{ij})$ as a function of $x_{ik}$, for each prognostic factor in turn. The process continues until the estimates $\widehat{f}_j(x)$ stabilize. This procedure is known as "backfitting", and the resulting fit is analogous to a multiple regression for linear models.
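A minimal sketch of this backfitting loop with cubic spline smoothers, written out explicitly; in practice one would simply call gam() from the mgcv package, shown at the end. The simulated data, the fixed df = 5, and the fixed number of iterations are illustrative assumptions.

set.seed(6)
n <- 300
x1 <- runif(n); x2 <- runif(n)
y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.2)

X <- cbind(x1, x2)
d <- ncol(X)
f <- matrix(0, n, d)                       # current estimates f_j(x_ij)
alpha <- mean(y)

for (iter in 1:20) {                       # backfitting iterations
  for (k in 1:d) {
    partial <- y - alpha - rowSums(f[, -k, drop = FALSE])
    fit     <- smooth.spline(X[, k], partial, df = 5)
    f[, k]  <- predict(fit, X[, k])$y
    f[, k]  <- f[, k] - mean(f[, k])       # center each component for identifiability
  }
}

# the same kind of fit via the mgcv package
library(mgcv)
fit.gam <- gam(y ~ s(x1) + s(x2))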

8.7.3 Projection Method

This section gives a brief review of the projection method and discusses its merits and disadvantages.

It is assumed that all additive components have continuous second partial derivatives, so that $m(u, v)$ can be locally approximated by a linear term in a neighborhood of $(x, y)$; namely, $m(u, v) \approx \beta_0 + \beta_1^T(u - x) + \beta_2^T(v - y)$, with $\{\beta_j\}$ depending on $x$ and $y$, where $\beta_1^T$ denotes the transpose of $\beta_1$.

Let $K(\cdot)$ and $L(\cdot)$ be symmetric kernel functions on $\Re^p$ and $\Re^q$, respectively, and let $h_{11} = h_{11}(n) > 0$ and $h_{12} = h_{12}(n) > 0$ be the bandwidths used in the step of estimating the regression surface. Here, to handle various degrees of smoothness, Cai and Fan (2000) propose using $h_{11}$ and $h_{12}$ differently, although the implementation may not be easy in practice. The reader is referred to the paper by Cai and Fan (2000) for details. Given observations $\{X_t, Y_t, Z_t\}_{t=1}^{n}$, let $\widehat{\beta}_j$ be the minimizer of the following locally weighted least squares criterion
$$\sum_{t=1}^{n}\left[Z_t - \beta_0 - \beta_1^T(X_t - x) - \beta_2^T(Y_t - y)\right]^2 K_{h_{11}}(X_t - x)\, L_{h_{12}}(Y_t - y),$$
where $K_h(\cdot) = K(\cdot/h)/h^p$ and $L_h(\cdot) = L(\cdot/h)/h^q$. Then, the local linear estimator of the regression surface $m(x, y)$ is $\widehat{m}(x, y) = \widehat{\beta}_0$.


By computing the sample average of $\widehat{m}(\cdot, \cdot)$ based on (8.62), the projection estimators of $g_1(\cdot)$ and $g_2(\cdot)$ are defined, respectively, as
$$\widehat{g}_1(x) = \frac{1}{n} \sum_{t=1}^{n} \widehat{m}(x, Y_t) - \widehat{\mu} \qquad \text{and} \qquad \widehat{g}_2(y) = \frac{1}{n} \sum_{t=1}^{n} \widehat{m}(X_t, y) - \widehat{\mu},$$
where $\widehat{\mu} = n^{-1} \sum_{t=1}^{n} Z_t$. Under some regularity conditions, by using the same arguments as those employed in the proof of Theorem 3 in Cai and Masry (2000), it can be shown (although the proof is lengthy and tedious) that the asymptotic bias and asymptotic variance of $\widehat{g}_1(x)$ are, respectively, $h_{11}^2\,\mathrm{tr}\{\mu_2(K)\, g_1''(x)\}/2$ and $v_1(x) = \nu_0(K)\, A(x)$, where
$$A(x) = \int p_2^2(y)\,\sigma^2(x, y)\, p^{-1}(x, y)\, dy$$
and
$$\sigma^2(x, y) = \mathrm{Var}(Z_t \mid X_t = x, Y_t = y).$$
Here, $p(x, y)$ stands for the joint density of $X_t$ and $Y_t$, $p_1(x)$ denotes the marginal density of $X_t$, $p_2(y)$ is the marginal density of $Y_t$, $\nu_0(K) = \int K^2(u)\, du$, and $\mu_2(K) = \int u\, u^T K(u)\, du$.

The foregoing method has some advantages: it is easy to understand, it is computationally fast, and it allows an asymptotic analysis. However, it can be quite inefficient in an asymptotic sense. To demonstrate this, let us consider the ideal situation in which $g_2(\cdot)$ and $\mu$ are known. In such a case, one can estimate $g_1(\cdot)$ by directly regressing the partial error $\widetilde{Z}_t = Z_t - \mu - g_2(Y_t)$ on $X_t$, and such an ideal estimator is optimal in an asymptotic minimax sense (see, e.g., Fan and Gijbels, 1996). The asymptotic bias for the ideal

Page 330: Time Series Analysis R

CHAPTER 8. NONPARAMETRIC REGRESSION ESTIMATION 319

estimator is h211 tr{µ2(K) g--1 (x)}/2 and the asymptotic variance is

v0(x) = +0(K) B(x) with B(x) = p"11 (x) E

5)2(Xt, Yt) |Xt = x

6

(8.64)(see, e.g., Masry and Fan, 1997). It is clear that v1(x) = v0(x) ifXt and Yt are independent. If Xt and Yt are correlated and when)2(x, y) is a constant, it follows from the Cauchy-Schwarz inequalitythat

$$B(x) = \frac{\sigma^2}{p_1(x)} \left\{ \int p^{1/2}(y \mid x)\, \frac{p_2(y)}{p^{1/2}(y \mid x)}\, dy \right\}^2 \le \frac{\sigma^2}{p_1(x)} \int \frac{p_2^2(y)}{p(y \mid x)}\, dy = A(x),$$

which implies that the ideal estimator always has a smaller asymptotic variance than the projection method, although both have the same bias. This suggests that the projection method can lead to inefficient estimation of $g_1(\cdot)$ and $g_2(\cdot)$ when $X_t$ and $Y_t$ are serially correlated, which is particularly relevant for autoregressive models. To alleviate this shortcoming, I propose the two-stage approach described next.

8.7.4 Two-Stage Procedure

The two-stage method due to Linton (1997, 2000) is now introduced. The basic idea is to obtain an initial estimate of $g_2(\cdot)$ using a small bandwidth $h_{12}$. The initial estimate can be obtained by the projection method, and $h_{12}$ can be chosen so small that the bias of estimating $g_2(\cdot)$ is asymptotically negligible. Then, using the partial residuals $Z_t^* = Z_t - \hat\mu - \hat g_2(Y_t)$, we apply the local linear regression technique to the pseudo regression model

$$Z_t^* = g_1(X_t) + \epsilon_t^*$$


to estimate $g_1(\cdot)$. This leads naturally to the weighted least squares problem

$$\sum_{t=1}^{n} \left\{ Z_t^* - \beta_1 - \beta_2^T (X_t - x) \right\}^2 J_{h_2}(X_t - x), \qquad (8.65)$$

where $J(\cdot)$ is a kernel function in $\mathbb{R}^p$ and $h_2 = h_2(n) > 0$ is the bandwidth at the second stage. The advantage of this is twofold: the bandwidth $h_2$ can now be selected purposely for estimating $g_1(\cdot)$ only, and any bandwidth selection technique for nonparametric regression can be applied here. Minimizing (8.65) with respect to $\beta_1$ and $\beta_2$ gives the two-stage estimate of $g_1(x)$, denoted by $\tilde g_1(x) = \hat\beta_1$, where $\hat\beta_1$ and $\hat\beta_2$ are the minimizers of (8.65).

It is shown in Theorem 1, which follows, that under some regularity conditions, the asymptotic bias and variance of the two-stage estimate $\tilde g_1(x)$ are the same as those for the ideal estimator, provided that the initial bandwidth $h_{12}$ satisfies $h_{12} = o(h_2)$.
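As an illustration, a minimal R sketch of the second stage is given below; it assumes g2.init contains a first-stage (projection) estimate of $g_2$ evaluated at the observed $Y_t$'s, the names two.stage.g1 and h2 are used only for this sketch, and every grid point is assumed to have data within the bandwidth.

# Minimal sketch of the second stage: local linear regression of the
# partial residuals Z* on X; g2.init holds the initial estimate g2.hat(Y_t)
two.stage.g1<-function(z,X,g2.init,grid.x,h2){
  epan<-function(u){0.75*(1-u^2)*(abs(u)<=1)}      # kernel J
  z.star<-z-mean(z)-g2.init                        # partial residuals Z*_t
  sapply(grid.x,function(x0){
    w<-epan((X-x0)/h2)
    coef(lm(z.star~I(X-x0),weights=w))[1]          # local linear intercept estimates g1(x0)
  })
}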

Sampling Properties

To establish the asymptotic normality of the two-stage estimator, it is assumed that the initial estimator satisfies a linear approximation; namely,

$$\hat g_2(Y_t) - g_2(Y_t) \approx \frac{1}{n} \sum_{i=1}^{n} L_{h_{12}}(Y_i - Y_t)\, \delta(X_i, Y_t)\, \xi_i + \frac{1}{2}\, h_{12}^2\, \mathrm{tr}\{\mu_2(L)\, g_2''(Y_t)\}, \qquad (8.66)$$

where $\xi_t = Z_t - m(X_t, Y_t)$ and $\delta(x, y) = p_1(x)/p(x, y)$. Note that under some regularity conditions, by following the same arguments as in Masry (1996), one can show (although the proof is not easy, quite lengthy, and tedious) that (8.66) holds. Note that this assumption is also imposed in Linton (2000) for iid samples to simplify the


proof of the asymptotic results of the two-stage estimator. Now, the asymptotic normality for the two-stage estimator is stated here and its proof is relegated to the Appendix.

THEOREM 1. Under (8.66) and Assumptions A1 – A9 stated in the Appendix, if the bandwidths $h_{12}$ and $h_2$ are chosen such that $h_{12} \to 0$, $n\, h_{12}^q \to \infty$, $h_2 \to 0$, and $n\, h_2^p \to \infty$ as $n \to \infty$, then

$$\sqrt{n\, h_2^p}\, \left[ \tilde g_1(x) - g_1(x) - \mathrm{bias}(x) + o_p\!\left(h_{12}^2 + h_2^2\right) \right] \stackrel{D}{\longrightarrow} N\{0,\, v_0(x)\},$$

where the asymptotic bias is

$$\mathrm{bias}(x) = \frac{h_2^2}{2}\, \mathrm{tr}\{\mu_2(J)\, g_1''(x)\} - \frac{h_{12}^2}{2}\, \mathrm{tr}\left\{\mu_2(L)\, E\left(g_2''(Y_t) \mid X_t = x\right)\right\}$$

and the asymptotic variance is $v_0(x) = \nu_0(J)\, B(x)$.

We remark that by Theorem 1, the asymptotic variance of the two-stage estimator is independent of the initial bandwidths. Thus, the initial bandwidths should be chosen as small as possible. This is another benefit of using the two-stage procedure: the bandwidth selection problem becomes relatively easy. In particular, when $h_{12} = o(h_2)$, the bias from the initial estimation is asymptotically negligible. For the ideal situation in which $g_2(\cdot)$ is known, Masry and Fan (1997) show that under some regularity conditions the optimal estimate of $g_1(x)$, denoted by $\hat g_1^*(x)$ and obtained from (8.65) with the partial residual $Z_t^*$ replaced by the partial error $\widetilde Z_t = Z_t - \mu - g_2(Y_t)$, is asymptotically normally distributed:

$$\sqrt{n\, h_2^p}\, \left[ \hat g_1^*(x) - g_1(x) - \frac{h_2^2}{2}\, \mathrm{tr}\{\mu_2(J)\, g_1''(x)\} + o_p(h_2^2) \right] \stackrel{D}{\longrightarrow} N\{0,\, v_0(x)\}.$$

This, in conjunction with Theorem 1, shows that the two-stage estimator and the ideal estimator share the same asymptotic bias and variance if $h_{12} = o(h_2)$.


Finally, it is worth pointing out that under some regularity conditions, nonlinear additive ARX processes are stationary and $\alpha$-mixing with a geometrically decaying mixing coefficient (see Masry and Tjøstheim, 1997), so that Assumptions A6, A7, and A8 in the Appendix imposed on the mixing coefficient are automatically satisfied. Therefore, assuming that the other technical assumptions of this paper are satisfied, the result in Theorem 1 can be applied to nonlinear additive ARX models.

8.7.5 Monte Carlo Simulations and Applications

See Cai (2002) for the detailed Monte Carlo simulation results and applications.

8.8 Computer Code

# 07-31-2006

graphics.off() # clean the previous graphs on the screen

###################################################################

z1=matrix(scan(file="c:\\teaching\\time series\\data\\w-3mtbs7097.txt"),

byrow=T,ncol=4)

# data: weekly 3-month Treasury bill from 1970 to 1997

x=z1[,4]/100

n=length(x)

y=diff(x) # Delta x_t=x_t-x_{t-1}

x=x[1:(n-1)]

n=n-1

x_star=(x-mean(x))/sqrt(var(x))

z=seq(min(x),max(x),length=50)

#win.graph()


postscript(file="c:\\teaching\\time series\\figs\\fig-8.1.eps",

horizontal=F,width=6,height=6)

par(mfrow=c(2,2),mex=0.4,bg="light blue")

scatter.smooth(x,y,span=1/10,ylab="",xlab="x(t-1)",evaluation=60)

title(main="(a) y(t) vs x(t-1)",col.main="red")

scatter.smooth(x,abs(y),span=1/10,ylab="",xlab="x(t-1)",evaluation=60)

title(main="(b) |y(t)| vs x(t-1)",col.main="red")

scatter.smooth(x,y^2,span=1/10,ylab="",xlab="x(t-1)",evaluation=60)

title(main="(c) y(t)^2 vs x(t-1)",col.main="red")

dev.off()

###################################################################

#########################

# Nonparametric Fitting #

#########################

#########################################################

# Define the Epanechnikov kernel function

kernel<-function(x){0.75*(1-x^2)*(abs(x)<=1)}

###############################################################

# Define the kernel density estimator

kernden=function(x,z,h,ker){

# parameters: x=variable; h=bandwidth; z=grid point; ker=kernel

nz<-length(z)

nx<-length(x)

x0=rep(1,nx*nz)

dim(x0)=c(nx,nz)

x1=t(x0)

x0=x*x0

x1=z*x1

x0=x0-t(x1)

if(ker==1){x1=kernel(x0/h)} # Epanechnikov kernel


if(ker==0){x1=dnorm(x0/h)} # normal kernel

f1=apply(x1,2,mean)/h

return(f1)

}

###############################################################

# Define the local constant estimator

local.constant=function(y,x,z,h,ker){

# parameters: x=variable; h=bandwidth; z=grid point; ker=kernel

nz<-length(z)

nx<-length(x)

x0=rep(1,nx*nz)

dim(x0)=c(nx,nz)

x1=t(x0)

x0=x*x0

x1=z*x1

x0=x0-t(x1)

if(ker==1){x1=kernel(x0/h)} # Epanechnikov kernel

if(ker==0){x1=dnorm(x0/h)} # normal kernel

x2=y*x1

f1=apply(x1,2,mean)

f2=apply(x2,2,mean)

f3=f2/f1

return(f3)

}

###################################################################

# Define the local linear estimator

local.linear<-function(y,x,z,h){

# parameters: y=response, x=design matrix; h=bandwidth; z=grid

nz<-length(z)

ny<-length(y)


beta<-rep(0,nz*2)

dim(beta)<-c(nz,2)

for(k in 1:nz){

x0=x-z[k]

w0<-kernel(x0/h)

beta[k,]<-glm(y~x0,weights=w0)$coeff

}

return(beta)

}

###################################################################

h=0.02

# Local constant estimate

mu_hat=local.constant(y,x,z,h,1)

sigma_hat=local.constant(abs(y),x,z,h,1)

sigma2_hat=local.constant(y^2,x,z,h,1)

win.graph()

par(mfrow=c(2,2),mex=0.4,bg="light yellow")

scatter.smooth(x,y,span=1/10,ylab="",xlab="x(t-1)")

points(z,mu_hat,type="l",lty=1,lwd=3,col=2)

title(main="(a) y(t) vs x(t-1)",col.main="red")

legend(0.04,0.0175,"Local Constant Estimate")

scatter.smooth(x,abs(y),span=1/10,ylab="",xlab="x(t-1)")

points(z,sigma_hat,type="l",lty=1,lwd=3,col=2)

title(main="(b) |y(t)| vs x(t-1)",col.main="red")

scatter.smooth(x,y^2,span=1/10,ylab="",xlab="x(t-1)")

title(main="(c) y(t)^2 vs x(t-1)",col.main="red")

points(z,sigma2_hat,type="l",lty=1,lwd=3,col=2)

# Local Linear Estimate

fit2=local.linear(y,x,z,h)


mu_hat=fit2[,1]

fit2=local.linear(abs(y),x,z,h)

sigma_hat=fit2[,1]

fit2=local.linear(y^2,x,z,h)

sigma2_hat=fit2[,1]

win.graph()

par(mfrow=c(2,2),mex=0.4,bg="light green")

scatter.smooth(x,y,span=1/10,ylab="",xlab="x(t-1)")

points(z,mu_hat,type="l",lty=1,lwd=3,col=2)

title(main="(a) y(t) vs x(t-1)",col.main="red")

legend(0.04,0.0175,"Local Linear Estimate")

scatter.smooth(x,abs(y),span=1/10,ylab="",xlab="x(t-1)")

points(z,sigma_hat,type="l",lty=1,lwd=3,col=2)

title(main="(b) |y(t)| vs x(t-1)",col.main="red")

scatter.smooth(x,y^2,span=1/10,ylab="",xlab="x(t-1)")

title(main="(c) y(t)^2 vs x(t-1)",col.main="red")

points(z,sigma2_hat,type="l",lty=1,lwd=3,col=2)

###################################################################

8.9 References

Bowman, A. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71, 353-360.

Cai, Z. (2002). A two-stage approach to additive time series models. Statistica Neerlandica, 56, 415-433.

Cai, Z. and J. Fan (2000). Average regression surface for dependent data. Journal of Multivariate Analysis, 75, 112-142.

Cai, Z., J. Fan and Q. Yao (2000). Functional-coefficient regression models for nonlinear time series. Journal of the American Statistical Association, 95, 941-956.

Cai, Z. and E. Masry (2000). Nonparametric estimation of additive nonlinear ARX time series: Local linear fitting and projection. Econometric Theory, 16, 465-501.


Cai, Z. and R.C. Tiwari (2000). Application of a local linear autoregressive model to BOD time series. Environmetrics, 11, 341-350.

Chen, R. and R. Tsay (1993). Nonlinear additive ARX models. Journal of the American Statistical Association, 88, 310-320.

Chiu, S.T. (1991). Bandwidth selection for kernel density estimation. The Annals of Statistics, 19, 1883-1905.

Engle, R.F., C.W.J. Granger, J. Rice, and A. Weiss (1986). Semiparametric estimates of the relation between weather and electricity sales. Journal of the American Statistical Association, 81, 310-320.

Fan, J. (1993). Local linear regression smoothers and their minimax efficiency. The Annals of Statistics, 21, 196-216.

Fan, J., N.E. Heckman, and M.P. Wand (1995). Local polynomial kernel regression for generalized linear models and quasi-likelihood functions. Journal of the American Statistical Association, 90, 141-150.

Fan, J., T. Gasser, I. Gijbels, M. Brockmann and J. Engel (1996). Local polynomial fitting: optimal kernel and asymptotic minimax efficiency. Annals of the Institute of Statistical Mathematics, 49, 79-99.

Fan, J. and I. Gijbels (1996). Local Polynomial Modeling and Its Applications. London: Chapman and Hall.

Fan, J. and Q. Yao (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. New York: Springer-Verlag.

Fan, J., Q. Yao and Z. Cai (2003). Adaptive varying-coefficient linear models. Journal of the Royal Statistical Society, Series B, 65, 57-80.

Fan, J. and C. Zhang (2003). A re-examination of diffusion estimators with applications to financial model validation. Journal of the American Statistical Association, 98, 118-134.

Fan, J., C. Zhang and J. Zhang (2001). Generalized likelihood ratio statistics and Wilks phenomenon. The Annals of Statistics, 29, 153-193.

Gasser, T. and H.-G. Müller (1979). Kernel estimation of regression functions. In Smoothing Techniques for Curve Estimation, Lecture Notes in Mathematics, 757, 23-68. New York: Springer-Verlag.

Granger, C.W.J. and T. Terasvirta (1993). Modeling Nonlinear Economic Relationships. Oxford, U.K.: Oxford University Press.

Hall, P. and C.C. Heyde (1980). Martingale Limit Theory and Its Applications. New York: Academic Press.


Hall, P. and I. Johnstone (1992). Empirical functionals and efficient smoothing parameter selection (with discussion). Journal of the Royal Statistical Society, Series B, 54, 475-530.

Hastie, T.J. and R.J. Tibshirani (1990). Generalized Additive Models. London: Chapman and Hall.

Hjort, N.L. and M.C. Jones (1996). Better rules of thumb for choosing bandwidth in density estimation. Working paper, Department of Mathematics, University of Oslo, Norway.

Hong, Y. and T.-H. Lee (2003). Inference on predictability of foreign exchange rates via generalized spectrum and nonlinear time series models. The Review of Economics and Statistics, 85, 1048-1062.

Hurvich, C.M., J.S. Simonoff and C.-L. Tsai (1998). Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society, Series B, 60, 271-293.

Jones, M.C., J.S. Marron and S.J. Sheather (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91, 401-407.

Kreiss, J.P., M. Neumann and Q. Yao (1998). Bootstrap tests for simple structures in nonparametric time series regression. Unpublished manuscript.

Linton, O.B. (1997). Efficient estimation of additive nonparametric regression models. Biometrika, 84, 469-473.

Linton, O.B. (2000). Efficient estimation of generalized additive nonparametric regression models. Econometric Theory, 16, 502-523.

Linton, O.B. and J.P. Nielsen (1995). A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika, 82, 93-100.

Mammen, E., O.B. Linton, and J.P. Nielsen (1999). The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. The Annals of Statistics, 27, 1443-1490.

Masry, E. and J. Fan (1997). Local polynomial estimation of regression functions for mixing processes. Scandinavian Journal of Statistics, 24, 165-179.

Masry, E. and D. Tjøstheim (1997). Additive nonlinear ARX time series and projection estimates. Econometric Theory, 13, 214-252.

Øksendal, B. (1985). Stochastic Differential Equations: An Introduction with Applications, 3rd edition. New York: Springer-Verlag.

Priestley, M.B. and M.T. Chao (1972). Nonparametric function fitting. Journal of the Royal Statistical Society, Series B, 34, 384-392.

Rice, J. (1984). Bandwidth selection for nonparametric regression. The Annals of Statistics, 12, 1215-1230.


Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9, 65-78.

Ruppert, D., S.J. Sheather and M.P. Wand (1995). An effective bandwidth selector for local least squares regression. Journal of the American Statistical Association, 90, 1257-1270.

Ruppert, D. and M.P. Wand (1994). Multivariate weighted least squares regression. The Annals of Statistics, 22, 1346-1370.

Rousseeuw, P.J. and A.M. Leroy (1987). Robust Regression and Outlier Detection. New York: Wiley.

Shao, Q. and H. Yu (1996). Weak convergence for weighted empirical processes of dependent sequences. The Annals of Probability, 24, 2098-2127.

Sheather, S.J. and M.C. Jones (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B, 53, 683-690.

Sperlich, S., D. Tjøstheim, and L. Yang (2000). Nonparametric estimation and testing of interaction in additive models. Econometric Theory,

Stanton, R. (1997). A nonparametric model of term structure dynamics and the market price of interest rate risk. Journal of Finance, 52, 1973-2002.

Sun, Z. (1984). Asymptotic unbiased and strong consistency for density function estimator. Acta Mathematica Sinica, 27, 769-782.

Tjøstheim, D. and B. Auestad (1994a). Nonparametric identification of nonlinear time series: Projections. Journal of the American Statistical Association, 89, 1398-1409.

Tjøstheim, D. and B. Auestad (1994b). Nonparametric identification of nonlinear time series: Selecting significant lags. Journal of the American Statistical Association, 89, 1410-1419.

van Dijk, D., T. Terasvirta, and P.H. Franses (2002). Smooth transition autoregressive models - a survey of recent developments. Econometric Reviews, 21, 1-47.

Wand, M.P. and M.C. Jones (1995). Kernel Smoothing. London: Chapman and Hall.