
Instability of Predictability of Asset Returns

Zongwu Cai

University of North Carolina at Charlotte, USA

Xiamen University, China

E-mail:[email protected]

July 3, 2011

Co-author: Yunfei Wang (ING)

E-mail: [email protected]

Contents

Introduction

Proposed Model and Estimation Procedure

Asymptotic Theories

Optimal Estimation Procedures

Empirical Examples

Discussions and Future Research


Introduction

Predictability of stock returns

Classical predictive regression models and their modeling approaches

Motivations of our new model

Difficulties


♠ The predictability of stock returns has been studied for decades as a cornerstone research topic in economics and finance. It is widely examined in many financial applications, such as mutual fund performance, conditional capital asset pricing, and optimal asset allocation.

♠ Predictability of stock returns has two major aspects:

— To check whether the return series is autocorrelated or a

martingale difference sequence (MDS) or random walk.

— To use financial (state) variables as predictors to see if the

financial (state) variables can predict stock returns.


♠ A vast literature is devoted to testing whether stock returns are autocorrelated, follow a random walk, form an MDS, or have other types of dependence structure; see the book by Campbell, Lo and MacKinlay (1997) and the references therein.

♠ Recently, a large number of empirical studies have documented predictability of stock returns using various lagged financial variables, such as the log dividend-price ratio, the log earnings-price ratio, the log book-to-market ratio, the dividend yield, the term spread, the default premium and interest rates, as well as other state variables.


♠ The classical predictive regression is a structural predictive linear model:

$$y_t = \beta_0 + \beta_1 x_{t-1} + \varepsilon_t, \qquad x_t = \rho x_{t-1} + u_t, \qquad 1 \le t \le n, \tag{1}$$

where $y_t$ is the dependent variable, say, the excess stock return in period $t$, $x_{t-1}$ is a financial variable such as the log dividend-price ratio at time $t-1$, which is commonly modelled by an AR(1) process as in (1), and the innovations $(\varepsilon_t, u_t)$ are assumed to be iid bivariate normal $N(0, \Sigma)$ with

$$\Sigma = \begin{pmatrix} \sigma_\varepsilon^2 & \sigma_{\varepsilon u} \\ \sigma_{\varepsilon u} & \sigma_u^2 \end{pmatrix}.$$


♠ The common approach to estimating β0 and β1 is ordinary least squares (OLS). However, OLS estimation has two major problems:

I. The OLS estimates of the slope coefficient and its standard error are substantially biased in finite samples because εt and ut are correlated and the OLS estimate of ρ is itself biased; see Nelson and Kim (1993) and Stambaugh (1999).

II. Conventional t-tests based on the OLS estimates tend to over-reject the null hypothesis of no predictability (H0 : β1 = 0) in Monte Carlo simulations; see Campbell and Yogo (2006).


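♠ How can these problems be overcome? There are three major existing estimation methods for predictive regression models:

1. The first-order bias-corrected estimator in Stambaugh (1999). It is based on the relation $E(\hat\beta_1 - \beta_1) = \gamma\, E(\hat\rho - \rho)$ with $\gamma = \sigma_{\varepsilon u}/\sigma_u^2$ and the analytical result in Kendall (1954), $E(\hat\rho - \rho) = -(1 + 3\rho)/n + O(n^{-2})$, and is given by

$$\hat\beta_{1,c} = \hat\beta_1 + \hat\gamma\,(1 + 3\hat\rho)/n,$$

where $\hat\gamma = \hat\sigma_{\varepsilon u}/\hat\sigma_u^2$, with $\hat\beta_1$, $\hat\rho$, $\hat\sigma_{\varepsilon u}$ and $\hat\sigma_u^2$ obtained from OLS estimation.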
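The correction above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `ols` and `stambaugh_corrected_slope` are hypothetical, and the AR(1) regression for the predictor is fitted with an intercept (the setting in which Kendall's approximation is usually applied), which the slide does not spell out.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients and residuals for y = X b + e."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

def stambaugh_corrected_slope(y, x):
    """First-order bias-corrected slope for y_t = b0 + b1 x_{t-1} + eps_t.

    y, x: 1-D arrays of equal length; y[t] is paired with x[t-1].
    Returns (b1_ols, b1_corrected, rho_hat, gamma_hat).
    """
    y_t, x_lag, x_t = y[1:], x[:-1], x[1:]
    n = len(y_t)
    X = np.column_stack([np.ones(n), x_lag])
    (b0, b1), eps_hat = ols(y_t, X)          # predictive regression
    (theta, rho), u_hat = ols(x_t, X)        # AR(1) with intercept (assumption)
    gamma = (eps_hat @ u_hat) / (u_hat @ u_hat)   # sigma_eu / sigma_u^2
    b1_c = b1 + gamma * (1.0 + 3.0 * rho) / n     # correction from the slide
    return b1, b1_c, rho, gamma
```

When the estimated $\hat\gamma$ is negative, as is typical for valuation ratios, the correction lowers the OLS slope.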


2. The two-stage least squares estimator in Amihud and Hurvich (2004).

- Assuming $|\rho| < 1$ and using the linear projection of $\varepsilon_t$ onto $u_t$, $\varepsilon_t = \gamma u_t + v_t$, model (1) can be rewritten as

$$y_t = \beta_0 + \beta_1 x_{t-1} + \gamma u_t + v_t. \tag{2}$$

- The two-stage estimation procedure:

♦ Regress $x_t$ on $x_{t-1}$ to obtain the fitted residuals $\hat u_t$.

♦ Regress $y_t$ on $x_{t-1}$ and the fitted residuals $\hat u_t$ to obtain the bias-corrected estimate $\hat\beta_1^*$.
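The two-stage idea can be sketched as below. One caveat motivates the design: plain OLS residuals are orthogonal to $x_{t-1}$, so adding them as a regressor would leave the slope on $x_{t-1}$ unchanged. The sketch therefore builds the residuals from a bias-adjusted autoregressive coefficient. The specific adjustment used here (Kendall's first-order term, $\hat\rho + (1 + 3\hat\rho)/n$) and the function name `amihud_hurvich_sketch` are assumptions made for illustration, not necessarily the exact adjustment of the original paper.

```python
import numpy as np

def amihud_hurvich_sketch(y, x):
    """Two-stage estimate of the slope in y_t = b0 + b1 x_{t-1} + gamma u_t + v_t.

    Returns (b1, gamma, rho_hat, rho_c).
    """
    y_t, x_lag, x_t = y[1:], x[:-1], x[1:]
    n = len(y_t)
    ones = np.ones(n)

    # Stage 1: AR(1) fit for the predictor, then adjust rho (assumed first-order form).
    Z = np.column_stack([ones, x_lag])
    (theta_hat, rho_hat), *_ = np.linalg.lstsq(Z, x_t, rcond=None)
    rho_c = rho_hat + (1.0 + 3.0 * rho_hat) / n
    theta_c = x_t.mean() - rho_c * x_lag.mean()     # keeps residuals mean-zero
    u_c = x_t - theta_c - rho_c * x_lag

    # Stage 2: regress y_t on (1, x_{t-1}, u_c).
    W = np.column_stack([ones, x_lag, u_c])
    (b0, b1, gamma), *_ = np.linalg.lstsq(W, y_t, rcond=None)
    return b1, gamma, rho_hat, rho_c
```

With this choice the stage-two slope equals the OLS slope plus $\hat\gamma(\hat\rho_c - \hat\rho)$, so in this sketch it coincides with the first-order correction of method 1.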


3. The conservative bias-adjusted estimator in Lewellen (2004). Assuming $\rho$ is close to 1, the estimator is given by

$$\hat\beta_1^{**} = \hat\beta_1 + \hat\gamma\,(0.9999 - \hat\rho).$$

It can be shown that $\hat\beta_1^{**}$ is the least biased estimator of $\beta_1$ when the true $\rho$ is indeed very close to 1.


♠ Several limitations are associated with the classical predictive regression model.

A. The correlation between the innovations, δ = σεu/(σεσu), is non-zero in many empirical applications, which makes the model "endogenous"; see Table 4 in Campbell and Yogo (2006) and Table 1 in Torous, Valkanov, and Yan (2004), both reproduced on the next two slides, for some real applications. Projecting εt onto ut, as in (2), removes the endogeneity.


Table 4: Estimates of the model parameters

Series      Obs.  Variable  p  δ      DF-GLS  95% CI: ρ        95% CI: c

Panel A: S&P 1880–2002, CRSP 1926–2002
S&P 500     123   d–p       3  0.845  0.855   [0.949, 1.033]   [−6.107, 4.020]
                  e–p       1  0.962  2.888   [0.768, 0.965]   [−28.262, −4.232]
Annual      77    d–p       1  0.721  1.033   [0.903, 1.050]   [−7.343, 3.781]
                  e–p       1  0.957  2.229   [0.748, 1.000]   [−19.132, −0.027]
Quarterly   305   d–p       1  0.942  1.696   [0.957, 1.007]   [−13.081, 2.218]
                  e–p       1  0.986  2.191   [0.939, 1.000]   [−18.670, 0.145]
Monthly     913   d–p       2  0.950  1.657   [0.986, 1.003]   [−12.683, 2.377]
                  e–p       1  0.987  1.859   [0.984, 1.002]   [−14.797, 1.711]

Panel B: S&P 1880–1994, CRSP 1926–1994
S&P 500     115   d–p       3  0.835  2.002   [0.854, 1.010]   [−16.391, 1.079]
                  e–p       1  0.958  3.519   [0.663, 0.914]   [−38.471, −9.789]
Annual      69    d–p       1  0.693  2.081   [0.745, 1.010]   [−17.341, 0.690]
                  e–p       1  0.959  2.859   [0.591, 0.940]   [−27.808, −4.074]
Quarterly   273   d–p       1  0.941  2.635   [0.910, 0.991]   [−24.579, −2.470]
                  e–p       1  0.988  2.827   [0.900, 0.986]   [−27.322, −3.844]
Monthly     817   d–p       2  0.948  2.551   [0.971, 0.998]   [−23.419, −1.914]
                  e–p       2  0.983  2.600   [0.970, 0.997]   [−24.105, −2.240]

Panel C: CRSP 1952–2002
Annual      51    d–p       1  0.749  0.462   [0.917, 1.087]   [−4.131, 4.339]
                  e–p       1  0.955  1.522   [0.773, 1.056]   [−11.354, 2.811]
                  r3        1  0.006  1.762   [0.725, 1.040]   [−13.756, 1.984]
                  y–r1      1  0.243  3.121   [0.363, 0.878]   [−31.870, −6.100]
Quarterly   204   d–p       1  0.977  0.392   [0.981, 1.022]   [−3.844, 4.381]
                  e–p       1  0.980  1.195   [0.958, 1.017]   [−8.478, 3.539]
                  r3        4  0.095  1.572   [0.941, 1.013]   [−11.825, 2.669]
                  y–r1      2  0.100  2.765   [0.869, 0.983]   [−26.375, −3.347]
Monthly     612   d–p       1  0.967  0.275   [0.994, 1.007]   [−3.365, 4.451]
                  e–p       1  0.982  0.978   [0.989, 1.006]   [−6.950, 3.857]
                  r3        2  0.071  1.569   [0.981, 1.004]   [−11.801, 2.676]
                  y–r1      1  0.066  4.368   [0.911, 0.968]   [−54.471, −19.335]

This table reports estimates of the parameters for the predictive regression model. Returns are for the annual S&P 500 index and the annual, quarterly, and monthly CRSP value-weighted index. The predictor variables are the log dividend–price ratio (d–p), the log earnings–price ratio (e–p), the three-month T-bill rate (r3), and the long-short

Source: J.Y. Campbell, M. Yogo, Journal of Financial Economics 81 (2006) 27–60, p. 47.


TABLE 1: 95% Confidence Intervals for the Largest Autoregressive Root of the Stochastic Explanatory Variables

Series            Sample Period      k   ADF    95% Interval
Dividend yield    1926:12–1994:12    5   3.30   (.960, .996)
                  1926:12–1951:12    1   2.84   (.915, 1.004)
                  1952:1–1994:12     1   2.65   (.956, 1.004)
Default spread    1926:12–1994:12    2   2.49   (.976, 1.003)
                  1926:12–1951:12    3   0.90   (.984, 1.015)
                  1952:1–1994:12     2   2.50   (.963, 1.004)
Book-to-market    1926:12–1994:08    6   2.35   (.977, 1.003)
                  1926:12–1951:12    6   1.60   (.967, 1.013)
                  1952:1–1994:08     6   1.24   (.986, 1.008)
Term spread       1926:12–1994:12    6   3.57   (.955, .992)
                  1926:12–1951:12    6   3.11   (.943, .999)
                  1952:1–1994:12     2   1.83   (.957, 1.012)
Short-term rate   1926:12–1994:12    8   1.85   (.984, 1.004)
                  1926:12–1951:12    1   1.90   (0.955, 1.012)
                  1952:1–1994:12     7   1.90   (.974, 1.007)

Note: This table provides 95% confidence intervals for the largest autoregressive root of stochastic explanatory variables typically used in predictive regressions. The explanatory variables used

Source: Torous, Valkanov, and Yan (2004), Journal of Business, p. 944.


B. The degree of persistence of the predictor xt is unknown. Three cases are considered in the literature:

- xt is stationary (|ρ| < 1); see Amihud and Hurvich (2004), Paye and Timmermann (2006) and Dangl and Halling (2007).

- xt is integrated (ρ = 1); see Park and Hahn (1999), Chang and Martinez-Chombo (2003) and Cai, Li and Park (2009, JoE).

- xt is nearly integrated (ρ = 1 + c/n, where c < 0); see Elliott and Stock (1994), Cavanagh, Elliott, and Stock (1995), Torous, Valkanov, and Yan (2004), Campbell and Yogo (2006), Polk, Thompson, and Vuolteenaho (2006) and Rossi (2007), among others.


C. Empirical evidence suggests that the coefficient β1 changes over time, notably in the second half of the 1990s, i.e., predictability is unstable; see Viceira (1997), Lettau and Ludvigson (2001), Goyal and Welch (2003a), Paye and Timmermann (2006), Ang and Bekaert (2007) and Dangl and Halling (2007). In particular, Viceira (1997) and Paye and Timmermann (2006) examined whether there are structural changes in the predictive model for equity returns and found strong evidence that there are.


♠ To check whether the three aforementioned problems arise in practice, we revisit the monthly return data (from 1926:12 to 2002:12) discussed in Campbell and Yogo (2006) and regress the monthly CRSP return on the log dividend-price ratio and on the log earnings-price ratio, based on model (2), using a rolling window of length 300.
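A rolling-window fit of model (2) might look like the following sketch. It assumes that each window re-estimates the AR(1) regression for the predictor to obtain residuals and then regresses the return on a constant, the lagged predictor and those residuals; the function name `rolling_model2` and the returned layout are hypothetical.

```python
import numpy as np

def rolling_model2(y, x, window=300):
    """Rolling OLS of y_t on (1, x_{t-1}, u_hat_t).

    Returns an array with one row (beta0, beta1, gamma, rho) per window.
    """
    y_t, x_lag, x_t = y[1:], x[:-1], x[1:]
    rows = []
    for start in range(len(y_t) - window + 1):
        sl = slice(start, start + window)
        Z = np.column_stack([np.ones(window), x_lag[sl]])
        # AR(1) fit inside the window gives rho and the residuals u_hat.
        coef_x, *_ = np.linalg.lstsq(Z, x_t[sl], rcond=None)
        u_hat = x_t[sl] - Z @ coef_x
        # Model (2): regress the return on (1, x_{t-1}, u_hat).
        W = np.column_stack([Z, u_hat])
        coef_y, *_ = np.linalg.lstsq(W, y_t[sl], rcond=None)
        rows.append([coef_y[0], coef_y[1], coef_y[2], coef_x[1]])
    return np.asarray(rows)
```

Plotting the second, third and fourth columns against the window index gives paths of the slope, of $\gamma$ and of $\rho$ across windows.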

♠ Here are the results and conclusions.

Figure 1: Empirical results: monthly return of CRSP versus log dividend-price ratio. Panels (a), (b) and (c).

Figure 2: Empirical results: monthly return of CRSP versus log earnings-price ratio. Panels (a), (b) and (c).

♥ Panel (a) of each figure implies that the coefficient β1 is unstable over the whole period.

♥ Panel (b) shows that γ, which measures the correlation between the two innovations, is time-varying.

♥ Panel (c) illustrates that the state variable (log d-p ratio or log e-p ratio) is a nearly integrated or integrated process.

♥ From the above observations, we conclude that predictability is, at the least, unstable.



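Proposed Model and Estimation Procedure

How to solve the above problems?

Based on the above empirical results, we assume a nonparametric, time-varying relationship between the two innovations in (1): $\varepsilon_t = \gamma_t u_t + v_t$. Then, a nonparametric time-varying coefficient predictive regression model can be formulated as

$$y_t = \beta_{0t} + \beta_{2t} x_{t-1} + \varepsilon_t = \beta_{0t} + \beta_{1t} u_t + \beta_{2t} x_{t-1} + v_t \equiv \beta_t^\top X_t + v_t, \tag{3}$$

$$x_t = \theta + \rho x_{t-1} + u_t, \qquad \rho = 1 + c/n, \qquad 1 \le t \le n, \tag{4}$$

where $X_t = (1, u_t, x_{t-1})^\top$, $\beta_t = (\beta_{0t}, \beta_{1t}, \beta_{2t})^\top$ with $\beta_{1t} = \gamma_t$, and $\theta = 0$ or $\theta \ne 0$.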



♥ Model (3) covers several known models in the literature.

- If βt is piecewise constant (structural changes), Xt is stationary and γt = 0, model (3) was studied by Chow (1960). Testing whether βt is constant is the well-known Chow test.

- If βt is piecewise constant (structural changes), Xt is stationary and γt ≠ 0, model (3) was studied by Viceira (1997) and Paye and Timmermann (2006).

- If Xt is stationary, βt is generated from a unit root process and γt = 0, model (3) was studied by Dangl and Halling (2007).

- If Xt is stationary, βt is an unspecified smooth function of t and γt = 0, model (3) was investigated by Robinson (1989, 1991), Cai (2007) and Chen and Hong (2007).

- If all components in Xt are I(1) (c = 0), βt is an unspecified smooth function of t and γt = 0, model (3) was explored by Park and Hahn (1999) and Chang and Martinez-Chombo (2003).



♥ Following Robinson (1989), we let the components of βt depend on the sample size n in order to provide an asymptotic justification for the nonparametric smoothing estimators; that is, for each i (0 ≤ i ≤ 2),

βit = βi(st) with st = t/n.

For prediction purposes, we might set st = (t − 1)/n.

♥ Our object of interest is the vector of time-varying coefficient functions β(s) for any s ∈ [0, 1].



To estimate the unknown functions in $\beta(s)$ based on the observed values $\{(x_t, y_t)\}_{t=1}^n$, a local linear estimation procedure is adopted here.

♥ Assume that all components of $\beta(\cdot)$ have continuous second derivatives. Then, for $s_t$ in a neighborhood of any fixed time point $s \in [0, 1]$, $\beta_i(s_t) \approx a_i + b_i (s_t - s)$, $i = 0, 1, 2$, where $a_i = \beta_i(s)$ and $b_i = \beta_i^{(1)}(s)$ (the first derivative).

♥ Then, model (3) can be approximated by

$$y_t \approx X_t^\top a + (s_t - s)\, X_t^\top b + v_t,$$

where $a = (a_0, a_1, a_2)^\top$ and $b = (b_0, b_1, b_2)^\top$.



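The estimates can be obtained by minimizing the locally weighted sum of squares

$$\sum_{t=1}^n \left[ y_t - X_t^\top a - (s_t - s)\, X_t^\top b \right]^2 K_h(s_t - s), \tag{5}$$

where $K_h(\cdot) = K(\cdot/h)/h$, $K(\cdot)$ is a kernel function and $h$ is the bandwidth. The minimizers $\hat a$ and $\hat b$ of (5) give the local linear estimates of $\beta(s)$ and $\beta^{(1)}(s)$, respectively.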
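Because (5) is a weighted least squares problem in $(a, b)$, it can be solved point by point with a standard least squares routine. The sketch below is a minimal illustration: the Epanechnikov kernel, the function names and the use of $s_t = t/n$ are assumptions, and in practice the unobserved $u_t$ in $X_t$ would have to be replaced by estimated residuals.

```python
import numpy as np

def epanechnikov(z):
    """Epanechnikov kernel, supported on [-1, 1] (assumed kernel choice)."""
    return 0.75 * np.clip(1.0 - z**2, 0.0, None)

def local_linear(y, X, s_grid, h, kernel=epanechnikov):
    """Local linear estimates of beta(s) and its first derivative.

    y: (n,) responses; X: (n, p) regressors with rows X_t;
    s_grid: evaluation points in [0, 1]; h: bandwidth.
    Returns (a_hat, b_hat), each of shape (len(s_grid), p).
    """
    n, p = X.shape
    s_t = np.arange(1, n + 1) / n
    a_hat = np.empty((len(s_grid), p))
    b_hat = np.empty((len(s_grid), p))
    for k, s in enumerate(s_grid):
        d = s_t - s
        w = kernel(d / h) / h                   # K_h(s_t - s)
        Z = np.hstack([X, d[:, None] * X])      # regressors for (a, b)
        sw = np.sqrt(w)
        # Weighted least squares via rescaling rows by sqrt(weights).
        coef, *_ = np.linalg.lstsq(sw[:, None] * Z, sw * y, rcond=None)
        a_hat[k], b_hat[k] = coef[:p], coef[p:]
    return a_hat, b_hat
```

If the coefficient functions are exactly linear in $s$ and there is no noise, the local linear fit reproduces them exactly, which is a convenient sanity check.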


Asymptotic Theories

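♣ Theorem 1: Under the regularity conditions, when $\theta = 0$, we have

$$\sqrt{nh}\, D_n \left[ \hat\beta(s) - \beta(s) - \frac{h^2}{2}\, \beta^{(2)}(s)\, \mu_2(K) \right] \xrightarrow{d} MN(0, \Sigma),$$

where $D_n = \mathrm{diag}\{1, 1, \sqrt{n}\}$ and $MN(0, \Sigma)$ is a mixed normal distribution with mean zero and conditional covariance matrix

$$\Sigma = \nu_0(K)\, \sigma_v^2\, \Omega \Big/ \left[ \int K_c^2\,dr - \left( \int K_c\,dr \right)^2 \right]^2$$

and

$$\Omega = \begin{pmatrix}
\left( \int K_c^2\,dr \right)^2 + \int K_c^2\,dr \left( \int K_c\,dr \right)^2 & 0 & -2 \int K_c\,dr \int K_c^2\,dr \\
0 & \left[ \int K_c^2\,dr - \left( \int K_c\,dr \right)^2 \right]^2 \Big/ \sigma_u^2 & 0 \\
-2 \int K_c\,dr \int K_c^2\,dr & 0 & \int K_c^2\,dr + \left( \int K_c\,dr \right)^2
\end{pmatrix}.$$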


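♣ Here, according to the functional central limit theorem,

$$x_{[nr]}/\sqrt{n} \Rightarrow K_c(r),$$

where $K_c(r) = \int_0^r e^{(r-s)c}\, dW(s)$ is a diffusion process and $W(s)$ is a one-dimensional Brownian motion with variance $\sigma_u^2 = \mathrm{var}(u_t) + 2 \sum_{k=2}^{\infty} \mathrm{Cov}(u_1, u_k)$.

♣ Clearly, $K_c(r) \sim N(0, \sigma_c^2(r))$ and

$$\int_0^1 K_c(r)\,dr \sim N(0, \varsigma(c)^2),$$

where $\sigma_c^2(r) = \sigma_u^2\, [\exp(2cr) - 1]/(2c)$ and $\varsigma(c)^2 = \sigma_u^2/c^2 + \sigma_u^2\, (e^{2c} - 4e^{c} + 3)/(2c^3)$. Also, we can show that $\lim_{c \to 0} \sigma_c^2(r) = \sigma_u^2\, r$ and $\lim_{c \to 0} \varsigma(c)^2 = \sigma_u^2/3$.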
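The two variance formulas and their limits can be checked numerically. The sketch below is only a sanity check: the function names are hypothetical, and the numerical comparison relies on the representation $\int_0^1 K_c(r)\,dr = \int_0^1 [(e^{c(1-s)} - 1)/c]\, dW(s)$, which follows from interchanging the order of integration and is not stated on the slide.

```python
import numpy as np

def var_Kc(r, c, sigma_u2=1.0):
    """Variance of K_c(r): sigma_u^2 [exp(2cr) - 1] / (2c); equals sigma_u^2 r at c = 0."""
    if c == 0.0:
        return sigma_u2 * r
    return sigma_u2 * np.expm1(2.0 * c * r) / (2.0 * c)

def var_int_Kc(c, sigma_u2=1.0):
    """Variance of the integral of K_c over [0, 1]; equals sigma_u^2 / 3 at c = 0."""
    if c == 0.0:
        return sigma_u2 / 3.0
    return sigma_u2 / c**2 + sigma_u2 * (np.exp(2 * c) - 4 * np.exp(c) + 3) / (2 * c**3)

def var_int_Kc_numeric(c, sigma_u2=1.0, m=200_000):
    """Midpoint-rule value of sigma_u^2 * int_0^1 [(e^{c(1-s)} - 1)/c]^2 ds (c != 0)."""
    s = (np.arange(m) + 0.5) / m
    g = (np.exp(c * (1.0 - s)) - 1.0) / c
    return sigma_u2 * np.mean(g**2)
```

For example, `var_int_Kc(-2.0)` and `var_int_Kc_numeric(-2.0)` should agree closely, and `var_int_Kc(c)` should approach 1/3 as `c` tends to zero when `sigma_u2 = 1`.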


♣ When $c = 0$, $K_c(r)$ becomes $W(r)$. Also, $K_c(\cdot)$ is an Ornstein-Uhlenbeck process and satisfies the stochastic differential equation $dK_c(r) = c\, K_c(r)\,dr + dW(r)$.

♣ $\zeta_t \sim MN(\mu_t, \Sigma_t)$ means that $\zeta_t \sim N(\mu_t, \Sigma_t)$ conditional on $\mu_t$ and $\Sigma_t$, so that the marginal density of $\zeta_t$ is given by

$$f(\zeta_t) = \int \varphi\left[ \Sigma^{-1/2} (\zeta_t - u) \right] g(u, v)\,du\,dv,$$

where $\varphi(x)$ is the standard normal density and $g(u, v)$ is the joint density of $\mu_t$ and $\Sigma_t$; see Phillips (1988) and Phillips and Park (1998) for details.


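♥ Consider the asymptotic mean square error (AMSE) of the estimates:

$$\mathrm{AMSE}(\hat\beta_i(s)) = \frac{h^4}{4}\, \mu_2^2(K)\, |\beta_i^{(2)}(s)|^2 + \frac{\sigma_{\beta_i}^2}{nh}$$

for $0 \le i \le 1$, and

$$\mathrm{AMSE}(\hat\beta_2(s)) = \frac{h^4}{4}\, \mu_2^2(K)\, |\beta_2^{(2)}(s)|^2 + \frac{\sigma_{\beta_2}^2}{n^2 h},$$

where $\sigma_{\beta_i}^2$ is the $(i+1)$th diagonal element of the variance matrix $\Sigma_\beta$.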


♥ Minimizing the above AMSE for each coefficient gives the optimal bandwidths

$$h_{0,\mathrm{opt}} = C_0\, n^{-1/5}, \qquad h_{1,\mathrm{opt}} = C_1\, n^{-1/5}, \qquad h_{2,\mathrm{opt}} = C_2\, n^{-2/5},$$

where $C_0$ and $C_2$ are random and $C_1$ is a constant.

♥ These results show that the optimal bandwidths for estimating $\beta_0(s)$ and $\beta_1(s)$ are of order $O(n^{-1/5})$, whereas the optimal bandwidth for estimating $\beta_2(s)$ is of order $O(n^{-2/5})$.
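The two rates follow from the first-order condition of an AMSE of the form $(h^4/4)\,\mu_2^2\,|\beta^{(2)}|^2 + \sigma^2/(n^k h)$: setting the derivative in $h$ to zero gives $h^5 = \sigma^2 / (\mu_2^2\, |\beta^{(2)}|^2\, n^k)$, so $h \propto n^{-k/5}$ with $k = 1$ for $\beta_0, \beta_1$ and $k = 2$ for $\beta_2$. The sketch below checks this numerically; the function names, the example inputs and the default $\mu_2 = 0.2$ (the value for the Epanechnikov kernel) are assumptions for illustration.

```python
import numpy as np

def amse(h, n, k, curv2, var, mu2=0.2):
    """AMSE(h) = (h^4 / 4) mu2^2 curv2 + var / (n^k h), with curv2 = |beta''(s)|^2."""
    return 0.25 * h**4 * mu2**2 * curv2 + var / (n**k * h)

def h_opt(n, k, curv2, var, mu2=0.2):
    """Closed-form minimizer of amse: h^5 = var / (mu2^2 curv2 n^k)."""
    return (var / (mu2**2 * curv2 * n**k)) ** 0.2

# Compare the closed form with a brute-force grid search for one configuration.
grid = np.linspace(0.01, 2.0, 200_001)
h_grid = grid[np.argmin(amse(grid, n=500, k=1, curv2=4.0, var=1.0))]
h_closed = h_opt(n=500, k=1, curv2=4.0, var=1.0)
```

Multiplying $n$ by 32 halves the optimal bandwidth when $k = 1$ and quarters it when $k = 2$, which is the $n^{-1/5}$ versus $n^{-2/5}$ contrast stated above.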


Optimal estimation procedures

Therefore, the optimal estimates of all coefficients cannot be obtained using a single bandwidth. Thus, we use a two-stage estimation procedure, similar to the so-called profile least squares estimation in Speckman (1988) and Cai, Li and Park (2009), described as follows.

♥ Stage 1: Find the local linear estimate of $\beta(s)$ using the bandwidth $h_1 = c_1\, n^{-2/5}$ for some positive constant $c_1$, or a smaller bandwidth (under-smoothing if necessary). The estimate $\hat\beta_{j,s1}(s)$ is obtained by minimizing (5) with $h = h_1$. $\hat\beta_{2,s1}(s)$ attains the optimal convergence rate at this stage, but $\hat\beta_{j,s1}(s)$ for $j = 0, 1$ does not.



♥ Stage 2: Let $y_t^* = y_t - \hat\beta_{2,s1}(s_t)\, x_{t-1}$ and estimate $\beta_0(s)$ and $\beta_1(s)$ again using their optimal bandwidth $h_2 = c_2\, n^{-1/5}$ with some $c_2 > 0$; i.e., we minimize the locally weighted sum of squares

$$\sum_{t=1}^n \left[ y_t^* - a_0 - b_0 (s_t - s) - a_1 u_t - b_1 (s_t - s)\, u_t \right]^2 K_{h_2}(s_t - s)$$

and obtain the local linear estimates $\hat\beta_{0,s2}(s)$ and $\hat\beta_{1,s2}(s)$.
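The two stages can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the Epanechnikov kernel, the function names, the default constants `c1 = c2 = 1` and the evaluation at the sample points $s_t = t/n$ are assumptions, and `u` is taken as given even though in practice it would be replaced by estimated residuals.

```python
import numpy as np

def kernel(z):
    """Epanechnikov kernel (assumed choice)."""
    return 0.75 * np.clip(1.0 - z**2, 0.0, None)

def local_linear(y, X, s_eval, h):
    """Local linear estimate of beta(.) at each point of s_eval; shape (len(s_eval), p)."""
    n, p = X.shape
    s_t = np.arange(1, n + 1) / n
    est = np.empty((len(s_eval), p))
    for k, s in enumerate(s_eval):
        d = s_t - s
        sw = np.sqrt(kernel(d / h) / h)
        Z = np.hstack([X, d[:, None] * X])
        coef, *_ = np.linalg.lstsq(sw[:, None] * Z, sw * y, rcond=None)
        est[k] = coef[:p]                      # keep the level, drop the derivative
    return est

def two_stage(y, x_lag, u, c1=1.0, c2=1.0):
    """Stage 1 uses h1 = c1 n^(-2/5); stage 2 uses h2 = c2 n^(-1/5).

    Returns (beta0_s2, beta1_s2, beta2_s1) evaluated at s_t = t/n.
    """
    n = len(y)
    s_t = np.arange(1, n + 1) / n
    h1, h2 = c1 * n ** (-0.4), c2 * n ** (-0.2)
    X = np.column_stack([np.ones(n), u, x_lag])
    stage1 = local_linear(y, X, s_t, h1)       # columns: beta0, beta1, beta2
    beta2_s1 = stage1[:, 2]
    y_star = y - beta2_s1 * x_lag              # remove the x_{t-1} component
    stage2 = local_linear(y_star, X[:, :2], s_t, h2)
    return stage2[:, 0], stage2[:, 1], beta2_s1
```

The design mirrors the slides: the nonstationary regressor is handled with the smaller bandwidth first, and only the stationary part is re-smoothed with the larger one.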



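At the first stage, the estimate $\hat\beta_{2,s1}(s)$ is the optimal estimate of $\beta_2(s)$ according to Theorem 1.

At the second stage, the estimate $\hat\beta_{j,s2}(s)$ of $\beta_j(s)$ is also optimal and, as long as $h_1$ is small enough, has the asymptotic distribution

$$\sqrt{n h_2} \left[ \hat\beta_{j,s2}(s) - \beta_j(s) - \frac{1}{2}\, h_2^2\, \mu_2(K)\, \beta_j^{(2)}(s) \right] \xrightarrow{d} N(0, \sigma_{\beta_j}^2)$$

for $j = 0$ and $1$, where $\sigma_{\beta_0}^2 = \nu_0(K)\, \sigma_v^2$ and $\sigma_{\beta_1}^2 = \nu_0(K)\, \sigma_v^2/\sigma_u^2$.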


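Theorem 1∗: Under the regularity conditions, when $\theta \ne 0$, we have

$$\sqrt{nh}\, D_n^* \left[ \hat\beta(s) - \beta(s) - \frac{1}{2}\, h^2\, \mu_2(K)\, \beta^{(2)}(s) \right] \xrightarrow{d} N(0, \Sigma_\beta^*),$$

where $D_n^* = \mathrm{diag}\{1, 1, n\}$ and $N(0, \Sigma_\beta^*)$ is a normal distribution with mean zero and covariance matrix $\Sigma_\beta^* = \nu_0(K)\, \sigma_v^2\, \Omega^*$ for some $\Omega^*$; in particular, when $c = 0$,

$$\Omega^* = \begin{pmatrix}
28 & 0 & -48/\theta \\
0 & \sigma_{uv}^2/(\sigma_v^2 \sigma_u^2) & 0 \\
-48/\theta & 0 & 84/\theta^2
\end{pmatrix}.$$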


Empirical Example

♠ Example 1: We conduct a Monte Carlo simulation to examine the finite-sample performance. We consider the following data generating process:

$$y_t = \beta_0(s_t) + \beta_2(s_t)\, x_{t-1} + \varepsilon_t, \qquad x_t = \rho x_{t-1} + u_t, \qquad \rho = 1 + c/n,$$

where $\beta_0(s_t) = \exp(-0.7 + 3.5 s_t)$ and $\beta_2(s_t) = 0.4 s_t - 0.2 \exp(-16 (s_t - 0.5)^2) - 0.2$ for $s_t \in [0, 1]$. Here, $\varepsilon_t = \beta_1(s_t)\, u_t + v_t$, where $\beta_1(s_t) = 7 \sin(4 s_t + 18.2)$. Both $u_t$ and $v_t$ follow AR(1) models: $u_t = 0.3 u_{t-1} + e_{1t}$ and $v_t = 0.3 v_{t-1} + e_{2t}$, where $e_{1t}$ and $e_{2t}$ are independently generated from $N(0, 0.119)$ and $N(0, 0.095)$, respectively. We choose $c = -20$, $-2$ and $0$.
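One draw from this data generating process might be simulated as in the sketch below. Several details are assumptions not fixed by the slide: the second argument of $N(0, \cdot)$ is read as a variance, the AR(1) errors and the predictor start from zero, $s_t = t/n$, and the function names are hypothetical.

```python
import numpy as np

def beta0(s):
    return np.exp(-0.7 + 3.5 * s)

def beta1(s):
    return 7.0 * np.sin(4.0 * s + 18.2)

def beta2(s):
    return 0.4 * s - 0.2 * np.exp(-16.0 * (s - 0.5) ** 2) - 0.2

def ar1(innov, phi=0.3):
    """AR(1) path z_t = phi z_{t-1} + e_t started from z_0 = 0 (assumption)."""
    out = np.empty_like(innov)
    prev = 0.0
    for t, e in enumerate(innov):
        prev = phi * prev + e
        out[t] = prev
    return out

def simulate(n, c, rng, x0=0.0):
    """Return (y, x_lag, u, s, rho) for one sample of size n."""
    s = np.arange(1, n + 1) / n
    rho = 1.0 + c / n
    # 0.119 and 0.095 are treated as variances (assumption).
    u = ar1(rng.normal(0.0, np.sqrt(0.119), size=n))
    v = ar1(rng.normal(0.0, np.sqrt(0.095), size=n))
    x = np.empty(n + 1)
    x[0] = x0
    for t in range(1, n + 1):
        x[t] = rho * x[t - 1] + u[t - 1]      # u[t-1] holds u_t
    x_lag = x[:-1]                            # x_{t-1} for t = 1..n
    y = beta0(s) + beta2(s) * x_lag + beta1(s) * u + v
    return y, x_lag, u, s, rho
```

Calling `simulate(50, -20, np.random.default_rng(0))` gives a sample with $\rho = 0.6$, matching the first row of the $c = -20$ block in the results table.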


♠ The proposed estimation procedure is evaluated by the mean absolute deviation error (MADE):

$$\mathrm{MADE}_{\beta_i, j} = \frac{1}{n_d} \sum_{k=1}^{n_d} \left| \hat\beta_{i,sj}(s_k) - \beta_i(s_k) \right|, \qquad i = 0, 2, \quad j = 1, 2,$$

where $\{s_k : 1 \le k \le n_d\}$ are the grid points on $[0, 1]$.

♠ We consider different sample sizes as n = 50, 100 and 250, and

repeat the simulation 1000 times for each sample size.
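The MADE criterion and the summary reported across replications are simple to compute. The helper names `made` and `summarize` below are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np

def made(beta_hat, beta_true):
    """Mean absolute deviation error over the grid points."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta_true = np.asarray(beta_true, dtype=float)
    return float(np.mean(np.abs(beta_hat - beta_true)))

def summarize(made_values):
    """Median and sample standard deviation of MADE values across replications."""
    vals = np.asarray(made_values, dtype=float)
    return float(np.median(vals)), float(np.std(vals, ddof=1))

# Example: an estimate that is off by 0.1 everywhere has MADE 0.1.
grid = np.linspace(0.0, 1.0, 101)
truth = np.exp(-0.7 + 3.5 * grid)
example = made(truth + 0.1, truth)
```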


♠ The median and standard deviation (in parentheses) of the 1000 MADE values for both the one-stage and two-stage estimation procedures are reported in the following table.

♠ At the first stage, h1 = d1 n^(−2/5) with d1 = 0.3, 1 and 2. At the second stage, the optimal bandwidth is selected by cross-validation.

c         d1    n     ρ       MADE β0,s1        MADE β0,s2        MADE β2,s1

c = 0     0.3   50    1.000   0.6836 (0.3661)   0.4570 (0.2985)   1.2063 (0.2865)
                100   1.000   0.5164 (0.2988)   0.3281 (0.1794)   0.5428 (0.1244)
                250   1.000   0.2960 (0.2021)   0.1929 (0.1343)   0.2560 (0.0400)
          1.0   50    1.000   0.4426 (0.3144)   0.3844 (0.2724)   0.7167 (0.2534)
                100   1.000   0.3598 (0.2672)   0.3097 (0.2189)   0.3486 (0.0980)
                250   1.000   0.1841 (0.1270)   0.1582 (0.0986)   0.1286 (0.0319)
          2.0   50    1.000   0.8472 (0.6167)   0.7208 (0.6445)   1.3669 (0.5992)
                100   1.000   0.7374 (0.4647)   0.6726 (0.4821)   0.6141 (0.2224)
                250   1.000   0.3179 (0.2408)   0.3031 (0.2344)   0.2269 (0.0683)

c = −2    0.3   50    0.960   0.4236 (0.2082)   0.3266 (0.1714)   1.2051 (0.2571)
                100   0.980   0.3208 (0.1671)   0.2233 (0.1113)   0.5598 (0.1195)
                250   0.992   0.2018 (0.1149)   0.1297 (0.0814)   0.2611 (0.0389)
          1.0   50    0.960   0.2744 (0.1753)   0.2574 (0.1471)   0.6947 (0.2324)
                100   0.980   0.2098 (0.1241)   0.1985 (0.1058)   0.3588 (0.0868)
                250   0.992   0.1174 (0.0707)   0.1079 (0.0606)   0.1262 (0.0317)
          2.0   50    0.960   0.6225 (0.3796)   0.5287 (0.4167)   1.3927 (0.6179)
                100   0.980   0.4508 (0.2488)   0.3950 (0.2735)   0.6655 (0.1927)
                250   0.992   0.2318 (0.1517)   0.1912 (0.1597)   0.2285 (0.0644)

c = −20   0.3   50    0.600   0.1558 (0.0495)   0.1424 (0.0595)   0.9181 (0.2189)
                100   0.800   0.1162 (0.0360)   0.1094 (0.0368)   0.5096 (0.1147)
                250   0.920   0.0853 (0.0207)   0.0659 (0.0177)   0.2477 (0.0397)
          1.0   50    0.600   0.1953 (0.0332)   0.1423 (0.0392)   0.6749 (0.1918)
                100   0.800   0.1240 (0.0233)   0.1056 (0.0280)   0.3560 (0.0906)
                250   0.920   0.0671 (0.0167)   0.0653 (0.0172)   0.1364 (0.0326)
          2.0   50    0.600   0.5012 (0.0690)   0.1902 (0.0821)   1.2325 (0.4567)
                100   0.800   0.3453 (0.0372)   0.1545 (0.0589)   0.6731 (0.2285)
                250   0.920   0.1930 (0.0194)   0.1025 (0.0312)   0.2504 (0.0725)


♠ Conclusions:

1. d1 = 1 is the best of the three choices.

2. For all three bandwidths h1, both the median and the standard deviation of the MADE for each coefficient function decline as the sample size increases, but the rates of decrease for the two estimates of β0 (stage one and stage two) are lower than that for the stage-one estimate of β2.

3. Comparing the first two MADE columns shows that the second-stage estimate of β0 is more accurate than the first-stage estimate.

The above findings are expected and consistent with the asymptotic theory.


♠ Example 2: We study the real example mentioned previously.

♥ We apply the proposed model and estimation procedure to the CRSP return using monthly data from 1926:12 to 2002:12. This data set was used in Campbell and Yogo (2006).

♥ We choose the log dividend-price ratio and the log earnings-price ratio as the state variables.

♥ We consider two cases: the whole sample from 1926:12 to 2002:12 and the subsample from 1926:12 to 1994:12.


♥ The reason we consider two samples is that the persistence of the financial variables differs between them: the 95% confidence intervals are consistent with an I(1) process in the whole sample and with an NI(1) process in the subsample; see the following table.

Table 1: 95% confidence intervals for ρ and c for the subsample and the whole sample.

           Subsample                              Whole sample
           95% CI for ρ     95% CI for c          95% CI for ρ     95% CI for c
log d-p    [0.971, 0.998]   [−23.419, −1.914]     [0.986, 1.003]   [−12.683, 2.377]
log e-p    [0.970, 0.997]   [−24.105, −2.240]     [0.984, 1.002]   [−14.797, 1.711]


♥ These results illustrate that the log d-p or e-p ratio behaves as a nearly integrated process in the subsample and as a unit root process over the whole data period.

♥ In sum, the log e-p ratio or log d-p ratio follows either a nearly integrated or an integrated process, so xt = ρ xt−1 + ut with ρ = 1 + c/n for some c ≤ 0.

♥ Note that our theory holds for both NI(1) and I(1) processes.


♥ We first consider the models used in the literature. For example, similar to Amihud and Hurvich (2004), we consider the following simple regression model

rt = α0 + α1ut + α2xt−1 + vt

using two different periods of data, where rt is the CRSP monthly return and xt−1 is the first lag of the log d-p ratio (or log e-p ratio).

♥ Results are summarized in the following table.


Table 2: OLS estimates (standard errors) for the univariate models

                CRSP 1926:12-1994:12    CRSP 1926:12-2002:12
log d-p   α0    -0.0159 (0.0067)        -0.0005 (0.0048)
          α1    -0.9220 (0.0102)        -0.9205 (0.0096)
          α2    -0.0063 (0.0021)        -0.0015 (0.0015)
log e-p   α0     0.0136 (0.0028)         0.0102 (0.0023)
          α1    -0.9533 (0.0050)        -0.9466 (0.0051)
          α2     0.0034 (0.0010)         0.0021 (0.0008)


♥ Now, we consider the following multiple predictive regression

model:

rt = α0 + α1u1,t + α2u2,t + α3x1,t−1 + α4x2,t−1 + vt

using two different periods of data, where x1,t−1 is the first lag

of log d-p ratio and x2,t−1 is the first lag of log e-p ratio. Here,

uj,t = xj,t − ρj xj,t−1 for j = 1, 2.

♥ Results are reported in the following table.


Table 3: OLS estimates (standard errors) for bivariate model

CRSP 1926:12-1994:12 CRSP 1926:12-2002:12

α0 0.0712 (0.0032) 0.0452 (0.0024)

α1 -0.1062 (0.0159) -0.1577 (0.0154)

α2 -0.8549 (0.0158) -0.7994 (0.0153)

α3 0.0114 (0.0017) 0.0088 (0.0016)

α4 0.0108 (0.0016) 0.0040 (0.0016)


♥ Next, we consider the following time-varying coefficient model

rt = β0t + β1tut + β2txt−1 + vt

using two different periods of data, where rt is the CRSP

monthly return and xt−1 is the first lag of log d-p ratio (or log

e-p ratio).


Figure 3. Results for CRSP return versus log dividend-price (d-p) ratio during 1926:12-1994:12 (left panel) and 1926:12-2002:12 (right panel). (a) and (b): the local linear estimate of β0(·) at stage two (solid line) with 95% confidence intervals (dotted line) and the OLS estimate of α0 (dashed line) compared to zero (dash-dotted line). (c) and (d): the local linear estimate of β2(·) at stage one (solid line) with 95% confidence intervals (dotted line) and the OLS estimate of α2 (dashed line) compared to zero (dash-dotted line).

Figure 3: Results for CRSP return versus log earnings-price (e-p) ratio, panels (a)-(d).

♥ Finally, we consider the following time-varying coefficient model

with two financial predictors

yt = β0t + β1t u1,t + β2t u2,t + β3t x1,t−1 + β4t x2,t−1 + vt,

x1,t = ρ1 x1,t−1 + u1,t, x2,t = ρ2 x2,t−1 + u2,t, 1 ≤ t ≤ n

using two different periods of data, where x1,t−1 is the first lag

of log d-p ratio and x2,t−1 is the first lag of log e-p ratio.


Figure 5. Results for CRSP return using the 2-dimensional predictive regression. The left panel is for the time period 1926:12-1994:12 and the right panel is for 1926:12-2002:12. (a) and (b): the local linear estimate of β0(·) at stage two (solid line) with 95% confidence intervals (dotted line) and the OLS estimate of α0 (dashed line). (c) and (d): the local linear estimate of β3(·) at stage one (solid line) with 95% confidence intervals (dotted line) and the OLS estimate of α3 (dashed line). (e) and (f): the local linear estimate of β4(·) at stage one (solid line) with 95% confidence intervals (dotted line) and the OLS estimate of α4 (dashed line).


Summary

1. We studied a time-varying coefficient predictive regression model that allows the state variable to be an NI(1) or I(1) process and allows for endogeneity and time-dependent intercept and slope functions.

2. We developed a nonparametric method for estimating the coefficients and studied their asymptotic distributions.

3. To deal with the problem that the estimated intercept and slope functions have different rates of convergence, we proposed a two-stage estimation procedure that achieves the optimal estimates of the slope and intercept functions at stages one and two, respectively.



Discussions and Future Research

1. A data-driven method to select the bandwidths (cross-validation?).

2. From the residual plots, one might conclude that volatility is present and might be a function of nonstationary financial variables. Therefore, ARCH- or GARCH-type models with nonstationary variables would be an interesting research topic.

3. Testing hypotheses, such as whether βt really changes over time (H0 : βt = β0), whether there is a structural change (H0 : βt = β10 I(t ≤ T0) + β11 I(t > T0) for some T0), or whether there is no relationship between the dependent variable and the financial variable (H0 : βt = 0). The paper is available upon request.



4. Through a nonlinear projection of εt onto xt−1, εt = ϕ(xt−1) + vt, model (1) can be generalized to the nonparametric predictive regression model

rt = g(xt−1) + vt, (6)

where g(xt−1) is a nonlinear function of xt−1 and xt is either integrated or nearly integrated.

Indeed, when xt−1 is integrated, model (6) was investigated by Cai (2010), and Cai and Li (2011) consider the testing issue.



But, when xt−1 is nearly integrated, the asymptotic theory for

nonparametric estimate for (6) is still open. It is conjectured

that the asymptotic properties for a kernel type nonparametric

estimator of g(·) should be the same as those for the case when

xt−1 is integrated, with the Brownian motion W (r) replaced by

Kc(r). Of course, it is warranted to investigate this model and

its estimation properties.



5. Under model (6), testing predictability (H0 : β1 = 0 in model (1)) becomes testing the more general hypothesis H0 : g(x) = g0(x, θ), where g0(x, θ) is a known function with unknown parameter θ.

To construct a test statistic, an L2-type testing approach might be applicable to this setting. Alternatively, one might use the generalized likelihood ratio type testing procedure as in Cai, Fan and Yao (2000) and Fan, Zhang and Zhang (2001).



6. One might consider a more general model as

rt = g(t, xt−1) + vt,

where g(·) is a function of both time and nonstationary financial

variables.


Figure 4: Return versus the first lag of the log d-p ratio in (a) and the first lag of the log e-p ratio in (b) for 1926:12-2002:12.


Indeed, Campbell and Cochrane (1999) and Paye and

Timmermann (2006) did consider some parametric nonlinear

models for the relationship between excess returns and

forecasting variables implied by economic models. See the

aforementioned papers for details. Unfortunately, both papers

considered only the case when xt is stationary.


End

THANK YOU for COMING!
