Instability of Predictability of Asset Returns
Zongwu Cai
University of North Carolina at Charlotte, USA
Xiamen University, China
E-mail:[email protected]
July 3, 2011
Co-author: Yunfei Wang (ING)
Contents
Introduction
Proposed Model and Estimation Procedure
Asymptotic Theories
Optimal Estimation Procedures
Empirical Examples
Discussions and Future Research
1
Introduction
Predictability of stock returns
Classical predictive regression models and their modeling
approaches
Motivations of our new model
Difficulties
2
Introduction
♠ The predictability of stock returns has been studied for decades
as a cornerstone research topic in economics and finance. It
is widely examined in financial applications such as mutual fund
performance, conditional capital asset pricing, and optimal asset
allocation.
♠ Predictability of stock returns has two major aspects:
— To check whether the return series is autocorrelated or a
martingale difference sequence (MDS) or random walk.
— To use financial (state) variables as predictors to see if the
financial (state) variables can predict stock returns.
3
Introduction
♠ There is a vast amount of literature devoted to testing if stock
returns are autocorrelated or random walk or MDS or other
types of dependent structures; see the book by Campbell, Lo
and MacKinlay (1997), and the references therein.
♠ Recently, a large number of empirical studies document predictability
of stock returns using various lagged financial variables, such
as the log dividend-price ratio, the log earnings-price ratio, the log
book-to-market ratio, the dividend yield, the term spread and
default premium, and the interest rates, as well as other state
variables.
4
Introduction
♠ The classical predictive regression is a structural predictive linear
model:
$$y_t = \beta_0 + \beta_1 x_{t-1} + \varepsilon_t, \qquad x_t = \rho x_{t-1} + u_t, \qquad 1 \le t \le n, \tag{1}$$
where $y_t$ is the dependent variable, say, excess stock returns at
period $t$, $x_{t-1}$ is a financial variable such as the log dividend-price
ratio at time $t-1$, which is commonly modelled by an AR(1) model as in (1),
and the innovations $(\varepsilon_t, u_t)$ are assumed to be iid bivariate
normal $N(0, \Sigma)$ with
$$\Sigma = \begin{pmatrix} \sigma_\varepsilon^2 & \sigma_{\varepsilon u} \\ \sigma_{\varepsilon u} & \sigma_u^2 \end{pmatrix}.$$
5
Introduction
♠ The common approach to estimating β0 and β1 is ordinary least
squares (OLS). But OLS estimation has two major problems:
I. The OLS estimates of the slope coefficient and its standard error
are substantially biased in finite samples due to the correlation
between εt and ut and the bias of the estimate of ρ; see Nelson and Kim
(1993) and Stambaugh (1999).
II. Conventional t-tests based on the OLS estimates tend to over-reject
the null hypothesis of no predictability (H0 : β1 = 0) in Monte-Carlo
simulations; see Campbell and Yogo (2006).
6
Introduction
♠ How to overcome these problems? There are three major existing
estimation methods for predictive regression models:
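1. The first order bias-correction estimator in Stambaugh
(1999), which is based on the relation
$E(\hat\beta_1 - \beta_1) = \gamma\, E(\hat\rho - \rho)$
with $\gamma = \sigma_{\varepsilon u}/\sigma_u^2$ and the analytical result in Kendall (1954),
$E(\hat\rho - \rho) = -(1 + 3\rho)/n + O(n^{-2})$, is given by
$$\hat\beta_{1,c} = \hat\beta_1 + \hat\gamma\,(1 + 3\hat\rho)/n,$$
where $\hat\gamma = \hat\sigma_{\varepsilon u}/\hat\sigma_u^2$, with $\hat\beta_1$, $\hat\rho$, $\hat\sigma_{\varepsilon u}$ and $\hat\sigma_u^2$ obtained from OLS
estimation.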
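A minimal sketch of this first-order correction in Python, for illustration only. The function names `ols` and `stambaugh_correction` are hypothetical, and the AR(1) regression is fitted with an intercept, which is an assumption (Kendall's bias formula is usually stated for that case, while model (1) is written without one).

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients and residuals for y = X b + e."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def stambaugh_correction(y, x):
    """First-order bias-corrected slope for the predictive regression.

    y[t] is the return and x[t] the predictor at time t (same length);
    the regression uses the pairs (y[t], x[t-1]).
    Returns (b1_ols, b1_corrected, rho_hat, gamma_hat).
    """
    y_t, x_lag, x_t = y[1:], x[:-1], x[1:]
    n = len(y_t)
    X = np.column_stack([np.ones(n), x_lag])
    (_, b1), eps_hat = ols(y_t, X)      # predictive regression
    (_, rho), u_hat = ols(x_t, X)       # AR(1) regression (with intercept: assumption)
    gamma = (eps_hat @ u_hat) / (u_hat @ u_hat)   # sigma_eu / sigma_u^2
    b1_c = b1 + gamma * (1 + 3 * rho) / n         # correction from the slide
    return b1, b1_c, rho, gamma
```

When the innovations are negatively correlated (gamma < 0) and rho is positive, the corrected slope is smaller than the OLS slope.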
7
Introduction
2. The two-stage least squares estimator in Amihud and Hurvich
(2004).
— Assuming |ρ| < 1 and using the linear projection of εt onto ut,
$\varepsilon_t = \gamma u_t + v_t$, model (1) can be rewritten as:
$$y_t = \beta_0 + \beta_1 x_{t-1} + \gamma u_t + v_t. \tag{2}$$
— The two-stage estimation procedure:
♦ Regress $x_t$ on $x_{t-1}$ to obtain the fitted residuals $\hat u_t$.
♦ Regress $y_t$ on $x_{t-1}$ and the fitted residuals $\hat u_t$ to obtain the
bias-corrected estimate $\beta_1^*$.
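The two regressions can be sketched as below. One caveat worth making explicit: plain OLS residuals are orthogonal to the first-stage regressors, so using them unchanged leaves the slope on x_{t-1} equal to the OLS slope. The sketch therefore optionally builds the residuals from a corrected autoregressive coefficient; the particular correction used here (Kendall's first-order term) and the function name `two_stage` are assumptions for illustration, not necessarily the exact construction in Amihud and Hurvich (2004).

```python
import numpy as np

def two_stage(y, x, correct_rho=True):
    """Two-stage estimate of (beta0, beta1, gamma) in model (2).

    y[t], x[t] are aligned in time; the regression uses (y[t], x[t-1]).
    Returns (beta0, beta1, gamma, rho_used).
    """
    y_t, x_lag, x_t = y[1:], x[:-1], x[1:]
    n = len(y_t)
    # Stage 1: AR(1) regression of x_t on x_{t-1} (intercept included: assumption)
    Z = np.column_stack([np.ones(n), x_lag])
    (theta, rho), *_ = np.linalg.lstsq(Z, x_t, rcond=None)
    if correct_rho:
        # Assumed first-order correction based on Kendall's bias formula
        rho = rho + (1 + 3 * rho) / n
        theta = x_t.mean() - rho * x_lag.mean()   # keep residuals mean-zero
    u_hat = x_t - theta - rho * x_lag
    # Stage 2: regress y_t on a constant, x_{t-1} and the residuals
    X = np.column_stack([np.ones(n), x_lag, u_hat])
    (b0, b1, gamma), *_ = np.linalg.lstsq(X, y_t, rcond=None)
    return b0, b1, gamma, rho
```

With `correct_rho=False` the returned slope coincides with the OLS slope; with the correction switched on it shifts by gamma times the change in rho.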
8
Introduction
3. The conservative bias-adjusted estimator in Lewellen (2004).
Assuming ρ is close to 1, the estimator is given by:
$$\hat\beta_1^{**} = \hat\beta_1 + \hat\gamma\,(0.9999 - \hat\rho).$$
It can be shown that $\hat\beta_1^{**}$ is the least biased estimator of β1
when the true ρ is indeed very close to 1.
9
Introduction
♠ Several limitations are associated with the classical predictive
regression model.
A. The correlation between the innovations, δ = σεu/(σεσu), is
unfortunately non-zero in many empirical applications, which
creates “endogeneity” in the model; see Table 4 in
Campbell and Yogo (2006) and Table 1 in Torous, Valkanov,
and Yan (2004) for some real applications, which are reproduced
in the next two slides. Projecting εt onto ut removes
the endogeneity.
10
Table 4
Estimates of the model parameters
Series      Obs.  Variable  p  δ      DF-GLS  95% CI: ρ       95% CI: c
Panel A: S&P 1880–2002, CRSP 1926–2002
S&P 500     123   d–p       3  0.845  0.855   [0.949, 1.033]  [−6.107, 4.020]
                  e–p       1  0.962  2.888   [0.768, 0.965]  [−28.262, −4.232]
Annual      77    d–p       1  0.721  1.033   [0.903, 1.050]  [−7.343, 3.781]
                  e–p       1  0.957  2.229   [0.748, 1.000]  [−19.132, −0.027]
Quarterly   305   d–p       1  0.942  1.696   [0.957, 1.007]  [−13.081, 2.218]
                  e–p       1  0.986  2.191   [0.939, 1.000]  [−18.670, 0.145]
Monthly     913   d–p       2  0.950  1.657   [0.986, 1.003]  [−12.683, 2.377]
                  e–p       1  0.987  1.859   [0.984, 1.002]  [−14.797, 1.711]
Panel B: S&P 1880–1994, CRSP 1926–1994
S&P 500     115   d–p       3  0.835  2.002   [0.854, 1.010]  [−16.391, 1.079]
                  e–p       1  0.958  3.519   [0.663, 0.914]  [−38.471, −9.789]
Annual      69    d–p       1  0.693  2.081   [0.745, 1.010]  [−17.341, 0.690]
                  e–p       1  0.959  2.859   [0.591, 0.940]  [−27.808, −4.074]
Quarterly   273   d–p       1  0.941  2.635   [0.910, 0.991]  [−24.579, −2.470]
                  e–p       1  0.988  2.827   [0.900, 0.986]  [−27.322, −3.844]
Monthly     817   d–p       2  0.948  2.551   [0.971, 0.998]  [−23.419, −1.914]
                  e–p       2  0.983  2.600   [0.970, 0.997]  [−24.105, −2.240]
Panel C: CRSP 1952–2002
Annual      51    d–p       1  0.749  0.462   [0.917, 1.087]  [−4.131, 4.339]
                  e–p       1  0.955  1.522   [0.773, 1.056]  [−11.354, 2.811]
                  r3        1  0.006  1.762   [0.725, 1.040]  [−13.756, 1.984]
                  y–r1      1  0.243  3.121   [0.363, 0.878]  [−31.870, −6.100]
Quarterly   204   d–p       1  0.977  0.392   [0.981, 1.022]  [−3.844, 4.381]
                  e–p       1  0.980  1.195   [0.958, 1.017]  [−8.478, 3.539]
                  r3        4  0.095  1.572   [0.941, 1.013]  [−11.825, 2.669]
                  y–r1      2  0.100  2.765   [0.869, 0.983]  [−26.375, −3.347]
Monthly     612   d–p       1  0.967  0.275   [0.994, 1.007]  [−3.365, 4.451]
                  e–p       1  0.982  0.978   [0.989, 1.006]  [−6.950, 3.857]
                  r3        2  0.071  1.569   [0.981, 1.004]  [−11.801, 2.676]
                  y–r1      1  0.066  4.368   [0.911, 0.968]  [−54.471, −19.335]
This table reports estimates of the parameters for the predictive regression model. Returns are for the annual S&P
500 index and the annual, quarterly, and monthly CRSP value-weighted index. The predictor variables are the log
dividend–price ratio (d–p), the log earnings–price ratio (e–p), the three-month T-bill rate (r3), and the long-short
Source: J.Y. Campbell, M. Yogo / Journal of Financial Economics 81 (2006) 27–60, p. 47
11
TABLE 1: 95% Confidence Intervals for the Largest Autoregressive Root of the Stochastic Explanatory Variables
Series            Sample Period      k  ADF   95% Interval
Dividend yield    1926:12–1994:12    5  3.30  (.960, .996)
                  1926:12–1951:12    1  2.84  (.915, 1.004)
                  1952:1–1994:12     1  2.65  (.956, 1.004)
Default spread    1926:12–1994:12    2  2.49  (.976, 1.003)
                  1926:12–1951:12    3  0.90  (.984, 1.015)
                  1952:1–1994:12     2  2.50  (.963, 1.004)
Book-to-market    1926:12–1994:08    6  2.35  (.977, 1.003)
                  1926:12–1951:12    6  1.60  (.967, 1.013)
                  1952:1–1994:08     6  1.24  (.986, 1.008)
Term spread       1926:12–1994:12    6  3.57  (.955, .992)
                  1926:12–1951:12    6  3.11  (.943, .999)
                  1952:1–1994:12     2  1.83  (.957, 1.012)
Short-term rate   1926:12–1994:12    8  1.85  (.984, 1.004)
                  1926:12–1951:12    1  1.90  (0.955, 1.012)
                  1952:1–1994:12     7  1.90  (.974, 1.007)
Note: This table provides 95% confidence intervals for the largest autoregressive root U of stochastic explanatory variables typically used in predictive regressions. The explanatory variables used
944 Journal of Business
12
Introduction
B. The degree of persistence of the predictor xt is unknown.
— xt is stationary (|ρ| < 1); see Amihud and Hurvich (2004),
Paye and Timmermann (2006) and Dangl and Halling (2007).
— it is integrated (ρ = 1); see Park and Hahn (1999), Chang
and Martinez-Chombo (2003) and Cai, Li and Park (2009, JoE).
— it is nearly integrated (ρ = 1 + c/n, where c < 0); see Elliott
and Stock (1994), Cavanagh, Elliott, and Stock (1995), Torous,
Valkanov, and Yan (2004), Campbell and Yogo (2006), Polk,
Thompson, and Vuolteenaho (2006), and Rossi (2007), among
others.
13
Introduction
C. Empirical evidence shows that the coefficient β1 changes
over time, notably in the second half of the 1990s, i.e. the predictability
is unstable; see Viceira (1997), Lettau and Ludvigson (2001),
Goyal and Welch (2003a), Paye and Timmermann (2006), Ang
and Bekaert (2007) and Dangl and Halling (2007). In particular,
Viceira (1997) and Paye and Timmermann (2006) examined
whether there are any structural changes in the predictive model
for equity returns, and they found strong evidence of such
structural changes.
14
Introduction
♠ To check whether the aforementioned three problems really exist,
we revisit the monthly return data (from 1926:12 to 2002:12)
discussed in Campbell and Yogo (2006), and regress the monthly
return of CRSP on the log dividend-price ratio and the log earnings-price
ratio based on model (2), using a rolling method with
a rolling window of length 300.
♠ Here are the results and conclusions.
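The rolling exercise can be sketched as follows. This is an illustrative reconstruction, not the code behind the figures: the function name `rolling_two_stage`, the intercept in the AR(1) step, and the choice to advance the window one observation at a time are assumptions.

```python
import numpy as np

def rolling_two_stage(y, x, window=300):
    """Rolling-window estimates of (beta0, beta1, gamma) in model (2).

    y[t], x[t] are aligned in time; each window uses `window` pairs
    (y[t], x[t-1]). Returns an array of shape (n_windows, 3).
    """
    n = len(y) - 1
    out = []
    for start in range(n - window + 1):
        ys = y[start:start + window + 1]
        xs = x[start:start + window + 1]
        y_t, x_lag, x_t = ys[1:], xs[:-1], xs[1:]
        # Fitted AR(1) residuals within the window
        Z = np.column_stack([np.ones(window), x_lag])
        ar, *_ = np.linalg.lstsq(Z, x_t, rcond=None)
        u_hat = x_t - Z @ ar
        # Regression (2) within the window
        X = np.column_stack([Z, u_hat])
        coef, *_ = np.linalg.lstsq(X, y_t, rcond=None)
        out.append(coef)
    return np.array(out)
```

Plotting the second and third columns against the window index gives time paths of the slope and of γ of the kind described in panels (a) and (b).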
15
[Figure 1: Empirical results: monthly return of CRSP versus log dividend-price ratio. Panels (a), (b), (c).]
[Figure 2: Empirical results: monthly return of CRSP versus log earnings-price ratio. Panels (a), (b), (c).]
Introduction
♥ Panel (a) implies that the coefficient β1 is unstable over the
whole period.
♥ Panel (b) shows that γ, which captures the correlation between the
two innovations, is time-varying.
♥ Panel (c) illustrates that the state variable (log d-p ratio or log
e-p ratio) is a nearly integrated or integrated process.
♥ From the above observations, we can conclude that the predictability
is, at the least, unstable.
18
Proposed Model and Estimation Procedure
How to solve the above problems?
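Based on the above empirical results, we assume a nonparametric
relationship between the two innovations in (1): $\varepsilon_t = \gamma_t u_t + v_t$.
Then, a nonparametric time-varying coefficient predictive
regression model can be formulated as
$$y_t = \beta_{0t} + \beta_{1t} x_{t-1} + \varepsilon_t = \beta_{0t} + \gamma_t u_t + \beta_{1t} x_{t-1} + v_t \equiv \beta_t^\top X_t + v_t, \tag{3}$$
$$x_t = \theta + \rho x_{t-1} + u_t, \qquad \rho = 1 + c/n, \qquad 1 \le t \le n, \tag{4}$$
where $X_t = (1, u_t, x_{t-1})^\top$, $\theta = 0$ or $\theta \ne 0$, and, after relabelling
to match the order of $X_t$, $\beta_t = (\beta_{0t}, \beta_{1t}, \beta_{2t})^\top$ collects the intercept,
the coefficient $\gamma_t$ on $u_t$, and the slope on $x_{t-1}$.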
19
Proposed Model and Estimation Procedure
♥ Model (3) covers several known models in the literature.
— If βt is piecewise constant (structural changes), Xt is
stationary and γt = 0, model (3) was studied by Chow (1960).
Testing if βt is constant is the well-known Chow test.
— If βt is piecewise constant (structural changes), Xt is
stationary and γt ≠ 0, model (3) was studied by Viceira (1997)
and Paye and Timmermann (2006).
— If Xt is stationary, βt is generated from a unit root process
and γt = 0, model (3) was studied by Dangl and Halling (2007).
— If Xt is stationary, βt is an unspecified smooth function of t
and γt = 0, it was investigated by Robinson (1989, 1991), Cai
(2007) and Chen and Hong (2007).
— If all components in Xt are I(1) (c = 0), βt is an unspecified
smooth function of t and γt = 0, model (3) was explored by Park
and Hahn (1999) and Chang and Martinez-Chombo (2003).
21
Proposed Model and Estimation Procedure
♥ Following Robinson (1989), we make the components of βt
depend on the sample size n in order to provide the asymptotic
justification for the nonparametric smoothing estimators, say,
for each i (0 ≤ i ≤ 2):
βit = βi(st) with st = t/n.
For the prediction purpose, we might set st = (t− 1)/n.
♥ Our object of interest is to estimate the time-varying coefficients
β(s) for any s ∈ [0, 1].
22
Proposed Model and Estimation Procedure
To estimate the unknown functions in β(s) based on the
observed values $\{(x_t, y_t)\}_{t=1}^n$, a local linear estimation procedure
is adopted here.
♥ Assume all of the components of β(·) have continuous second
derivatives. Then, $\beta(s_t)$ can be approximated around any fixed time
point $s \in [0, 1]$: $\beta_i(s_t) \approx a_i + b_i (s_t - s)$, $i = 0, 1, 2$, where
$a_i = \beta_i(s)$ and $b_i = \beta_i^{(1)}(s)$ (the first derivative).
♥ Then, model (3) can be approximated by:
$$y_t \approx X_t^\top a + (s_t - s)\, X_t^\top b + v_t.$$
23
Proposed Model and Estimation Procedure
The estimates can be obtained by minimizing the locally
weighted sum of squares:
$$\sum_{t=1}^{n} \left[ y_t - X_t^\top a - (s_t - s)\, X_t^\top b \right]^2 K_h(s_t - s), \tag{5}$$
where $K_h(\cdot) = K(\cdot/h)/h$, and $K(\cdot)$ is a kernel function.
Minimizing (5) gives the local linear estimate of $\beta(s)$ and $\beta^{(1)}(s)$.
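The minimisation in (5) is a weighted least squares problem at each point s. A minimal sketch, assuming an Epanechnikov kernel and s_t = t/n (both choices, and the function names, are illustrative rather than taken from the slides):

```python
import numpy as np

def epanechnikov(z):
    """Epanechnikov kernel K(z) = 0.75 (1 - z^2) on [-1, 1] (assumed choice)."""
    return 0.75 * np.maximum(1.0 - z ** 2, 0.0)

def local_linear(y, X, s_grid, h, kernel=epanechnikov):
    """Minimise the weighted sum of squares (5) at each s in s_grid.

    y: (n,) responses; X: (n, p) regressors X_t; h: bandwidth.
    Returns (beta_hat, dbeta_hat), each of shape (len(s_grid), p),
    estimating beta(s) and its first derivative.
    """
    n, p = X.shape
    s_t = np.arange(1, n + 1) / n
    beta = np.empty((len(s_grid), p))
    dbeta = np.empty_like(beta)
    for k, s in enumerate(s_grid):
        d = s_t - s
        w = kernel(d / h) / h                    # K_h(s_t - s)
        Z = np.hstack([X, X * d[:, None]])       # [X_t, (s_t - s) X_t]
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
        beta[k], dbeta[k] = coef[:p], coef[p:]
    return beta, dbeta
```

When the coefficient functions are exactly linear in s and there is no noise, the local linear fit reproduces them exactly, which is a convenient sanity check.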
24
Asymptotic Theories
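♣ Theorem 1: Under the regularity conditions, when θ = 0, we
have:
$$\sqrt{nh}\, D_n \left[ \hat\beta(s) - \beta(s) - \frac{h^2}{2}\, \beta^{(2)}(s)\, \mu_2(K) \right] \xrightarrow{d} MN(0, \Sigma),$$
where $D_n = \mathrm{diag}\{1, 1, \sqrt{n}\}$ and $MN(0, \Sigma)$ is a mixed normal with mean zero and conditional covariance matrix
$$\Sigma = \nu_0(K)\, \sigma_v^2\, \Omega \Big/ \left[ \int K_c^2 - \Big( \int K_c \Big)^2 \right]^2$$
and
$$\Omega = \begin{pmatrix}
\left( \int K_c^2\, dr \right)^2 + \int K_c^2\, dr \left( \int K_c\, dr \right)^2 & 0 & -2 \int K_c\, dr \int K_c^2\, dr \\
0 & \left[ \int K_c^2\, dr - \left( \int K_c\, dr \right)^2 \right]^2 / \sigma_u^2 & 0 \\
-2 \int K_c\, dr \int K_c^2\, dr & 0 & \int K_c^2\, dr + \left( \int K_c\, dr \right)^2
\end{pmatrix}.$$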
25
Asymptotic Theories
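♣ Here, according to the functional central limit theorem,
$$x_{[nr]}/\sqrt{n} \Rightarrow K_c(r),$$
where $K_c(r) = \int_0^r e^{(r-s)c}\, dW(s)$ is a diffusion process, and $W(s)$ is a
one-dimensional Brownian motion with variance
$\sigma_u^2 = \mathrm{var}(u_t) + 2 \sum_{k=2}^{\infty} \mathrm{Cov}(u_1, u_k)$.
♣ Clearly, $K_c(r) \sim N(0, \sigma_c^2(r))$ and $\int_0^1 K_c(r)\, dr \sim N(0, \varsigma(c)^2)$,
where $\sigma_c^2(r) = \sigma_u^2 \left[ \exp(2cr) - 1 \right] / (2c)$ and
$\varsigma(c)^2 = \sigma_u^2 / c^2 + \sigma_u^2 \left( e^{2c} - 4 e^{c} + 3 \right) / (2 c^3)$.
Also, we can show that $\lim_{c \to 0} \sigma_c^2(r) = \sigma_u^2\, r$ and
$\lim_{c \to 0} \varsigma(c)^2 = \sigma_u^2 / 3$.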
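The two variance formulas and their limits as c → 0 can be checked numerically. The sketch below simply evaluates the closed forms stated above; the function names are hypothetical, and σ_u² defaults to 1 for convenience.

```python
import math

def ou_variance(c, r, sigma_u2=1.0):
    """Variance of K_c(r): sigma_u^2 [exp(2cr) - 1] / (2c), with limit sigma_u^2 r at c = 0."""
    if c == 0:
        return sigma_u2 * r
    return sigma_u2 * math.expm1(2 * c * r) / (2 * c)

def integral_variance(c, sigma_u2=1.0):
    """Variance of the integral of K_c over [0, 1], with limit sigma_u^2 / 3 at c = 0."""
    if c == 0:
        return sigma_u2 / 3
    return sigma_u2 / c ** 2 + sigma_u2 * (math.exp(2 * c) - 4 * math.exp(c) + 3) / (2 * c ** 3)

# Values for the persistence parameters used later in the simulation
for c in (-20, -2, -0.01):
    print(c, ou_variance(c, 1.0), integral_variance(c))
```

For c very close to zero the second formula subtracts two large terms, so the explicit limit branch is the safer way to evaluate it there.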
26
Asymptotic Theories
♣ When c = 0, $K_c(r)$ becomes $W(r)$. Also, $K_c(\cdot)$ is a special
case of the Ornstein-Uhlenbeck process and satisfies the stochastic
differential equation $dK_c(r) = c\, K_c(r)\, dr + dW(r)$.
♣ $\zeta_t \sim MN(\mu_t, \Sigma_t)$ means that $\zeta_t \sim N(\mu_t, \Sigma_t)$ conditional on $\mu_t$ and $\Sigma_t$,
so that the marginal density of $\zeta_t$ is given by
$$f(\zeta_t) = \int \phi\left[ \Sigma^{-1/2} (\zeta_t - u) \right] g(u, v)\, du\, dv,$$
where $\phi(x)$ is the density for the standard normal and $g(u, v)$ is
the joint density of $\mu_t$ and $\Sigma_t$; see Phillips (1988) and Phillips
and Park (1998) for details.
27
Asymptotic Theories
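♥ Consider the asymptotic mean square error (AMSE) of the
estimates:
$$\mathrm{AMSE}(\hat\beta_i(s)) = \frac{h^4}{4}\, \mu_2^2(K)\, |\beta_i^{(2)}(s)|^2 + \frac{\sigma_{\beta_i}^2}{nh}$$
for $0 \le i \le 1$, and
$$\mathrm{AMSE}(\hat\beta_2(s)) = \frac{h^4}{4}\, \mu_2^2(K)\, |\beta_2^{(2)}(s)|^2 + \frac{\sigma_{\beta_2}^2}{n^2 h},$$
where $\sigma_{\beta_i}^2$ is the $(i+1)$th diagonal element of the variance matrix $\Sigma_\beta$.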
28
Asymptotic Theories
♥ By minimizing the above AMSEs, the optimal
bandwidths are given by:
$$h_{0,opt} = C_0\, n^{-1/5}; \qquad h_{1,opt} = C_1\, n^{-1/5}; \qquad h_{2,opt} = C_2\, n^{-2/5},$$
where $C_0$ and $C_2$ are random and $C_1$ is a constant.
♥ These results show that the optimal bandwidths for estimating
$\beta_0(s)$ and $\beta_1(s)$ have the order $O(n^{-1/5})$, but the optimal
bandwidth for estimating $\beta_2(s)$ has the order $O(n^{-2/5})$.
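The two orders follow from the first-order condition of each AMSE with respect to h. A short derivation, treating the second derivative and the variance term as fixed at the point s:

```latex
% For i = 0, 1: differentiate the AMSE with respect to h and set to zero
\frac{\partial}{\partial h}\left[\frac{h^4}{4}\,\mu_2^2(K)\,|\beta_i^{(2)}(s)|^2
  + \frac{\sigma_{\beta_i}^2}{nh}\right]
  = h^3\,\mu_2^2(K)\,|\beta_i^{(2)}(s)|^2 - \frac{\sigma_{\beta_i}^2}{n h^2} = 0
\;\Longrightarrow\;
h_{i,opt} = \left[\frac{\sigma_{\beta_i}^2}{\mu_2^2(K)\,|\beta_i^{(2)}(s)|^2}\right]^{1/5} n^{-1/5}.

% For i = 2 the variance term is of order 1/(n^2 h), so n is replaced by n^2
h_{2,opt} = \left[\frac{\sigma_{\beta_2}^2}{\mu_2^2(K)\,|\beta_2^{(2)}(s)|^2}\right]^{1/5} n^{-2/5}.
```

The bracketed factors play the role of the constants C_i; they are random whenever the corresponding variance term depends on the limiting process K_c.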
29
Optimal estimation procedures
Therefore, optimal estimates of all the coefficients cannot be
obtained using a single bandwidth. Thus, we use a two-stage
estimation procedure, similar to the so-called profile least
squares estimation in Speckman (1988) and Cai, Li and Park
(2009), described as follows.
♥ Stage 1: Find the local linear estimate of β(s) using the
bandwidth $h_1 = c_1 n^{-2/5}$ for some positive constant $c_1$, or
smaller (under-smoothed if necessary). The estimate $\hat\beta_{j,s1}(s_t)$
is the minimizer of (5) with $h = h_1$. $\hat\beta_{2,s1}(s)$ reaches the
optimal convergence rate at this stage, but $\hat\beta_{j,s1}$ for $j = 0, 1$
does not.
30
Optimal estimation procedures
♥ Stage 2: Let $y_t^* = y_t - \hat\beta_{2,s1}(s_t)\, x_{t-1}$, and estimate $\beta_0(s)$ and $\beta_1(s)$
again using the optimal bandwidth $h_2 = c_2 n^{-1/5}$ with some
$c_2 > 0$, i.e., we minimize the locally weighted sum of squares
$$\sum_{t=1}^{n} \left[ y_t^* - a_0 - b_0 (s_t - s) - a_1 u_t - b_1 (s_t - s) u_t \right]^2 K_{h_2}(s_t - s)$$
and find the local linear estimates $\hat\beta_{0,s2}(s)$ and $\hat\beta_{1,s2}(s)$.
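The two stages can be sketched together as below. This is an illustration under stated assumptions, not the authors' implementation: the Epanechnikov kernel, s_t = t/n, the default constants c1 = c2 = 1, and all function names are assumptions, and in practice the unobserved u_t would be replaced by residuals from the autoregression (4).

```python
import numpy as np

def _kernel(z):
    """Epanechnikov kernel (assumed choice)."""
    return 0.75 * np.maximum(1.0 - z ** 2, 0.0)

def _local_linear(y, X, s_t, s_grid, h):
    """Local linear estimate of the coefficient vector at each s in s_grid."""
    p = X.shape[1]
    out = np.empty((len(s_grid), p))
    for k, s in enumerate(s_grid):
        d = s_t - s
        sw = np.sqrt(_kernel(d / h) / h)
        Z = np.hstack([X, X * d[:, None]])
        coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
        out[k] = coef[:p]
    return out

def two_stage_tv(y, u, x_lag, s_grid, c1=1.0, c2=1.0):
    """Two-stage estimates (beta0, beta1, beta2) on s_grid."""
    n = len(y)
    s_t = np.arange(1, n + 1) / n
    h1, h2 = c1 * n ** (-2 / 5), c2 * n ** (-1 / 5)
    X = np.column_stack([np.ones(n), u, x_lag])
    # Stage 1: small bandwidth h1; keep only the slope on x_{t-1}
    beta2_t = _local_linear(y, X, s_t, s_t, h1)[:, 2]
    beta2_grid = _local_linear(y, X, s_t, s_grid, h1)[:, 2]
    # Stage 2: remove the nonstationary part, re-estimate with h2
    y_star = y - beta2_t * x_lag
    b01 = _local_linear(y_star, X[:, :2], s_t, s_grid, h2)
    return b01[:, 0], b01[:, 1], beta2_grid
```

Stage 1 is evaluated at every sample point s_t because the partial residual y*_t needs the slope estimate at each observation, not only on the output grid.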
31
Optimal estimation procedures
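At the first stage, the estimate $\hat\beta_{2,s1}(s)$ is the optimal estimate
for $\beta_2(s)$ according to Theorem 1.
At the second stage, the estimate $\hat\beta_{j,s2}(s)$ for $\beta_j(s)$ is also
optimal and, as long as $h_1$ is small enough, has the asymptotic
distribution
$$\sqrt{n h_2} \left[ \hat\beta_{j,s2}(s) - \beta_j(s) - \frac{1}{2}\, h_2^2\, \mu_2(K)\, \beta_j^{(2)}(s) \right] \xrightarrow{d} N(0, \sigma_{\beta_j}^2(s))$$
for $j = 0$ and $1$, where $\sigma_{\beta_0}^2 = \nu_0(K)\, \sigma_v^2$ and $\sigma_{\beta_1}^2 = \nu_0(K)\, \sigma_v^2 / \sigma_u^2$.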
32
Asymptotic Theories
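Theorem 1∗: Under the regularity conditions, when θ ≠ 0, we
have:
$$\sqrt{nh}\, D_n^* \left[ \hat\beta(s) - \beta(s) - \frac{1}{2}\, h^2\, \mu_2(K)\, \beta^{(2)}(s) \right] \xrightarrow{d} N(0, \Sigma_\beta^*),$$
where $D_n^* = \mathrm{diag}\{1, 1, n\}$ and $N(0, \Sigma_\beta^*)$ is a normal distribution
with mean zero and covariance matrix $\Sigma_\beta^* = \nu_0(K)\, \sigma_v^2\, \Omega^*$ for
some $\Omega^*$; in particular, when c = 0,
$$\Omega^* = \begin{pmatrix}
28 & 0 & -48/\theta \\
0 & \sigma_{uv}^2 / (\sigma_v^2 \sigma_u^2) & 0 \\
-48/\theta & 0 & 84/\theta^2
\end{pmatrix}.$$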
33
Empirical Example
♠ Example 1: We conduct a Monte Carlo simulation to examine
the finite sample performance. We consider the following data generating
process:
$$y_t = \beta_0(s_t) + \beta_2(s_t)\, x_{t-1} + \varepsilon_t, \qquad x_t = \rho x_{t-1} + u_t, \qquad \rho = 1 + c/n,$$
where $\beta_0(s_t) = \exp(-0.7 + 3.5 s_t)$ and
$\beta_2(s_t) = 0.4 s_t - 0.2 \exp(-16 (s_t - 0.5)^2) - 0.2$ for $s_t \in [0, 1]$. Here,
$\varepsilon_t = \beta_1(s_t)\, u_t + v_t$, where $\beta_1(s_t) = 7 \sin(4 s_t + 18.2)$. Both $u_t$ and $v_t$
follow AR(1) models: $u_t = 0.3 u_{t-1} + e_{1t}$ and $v_t = 0.3 v_{t-1} + e_{2t}$,
where $e_{1t}$ and $e_{2t}$ are independently generated from N(0, 0.119)
and N(0, 0.095), respectively. We choose c = −20, −2, 0.
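One way to generate a sample from this design is sketched below. Several details are not stated on the slide and are assumptions here: the second argument of N(0, ·) is read as a variance, the recursions start from x_0 = u_0 = v_0 = 0, and s_t = t/n. The function name `simulate` is hypothetical.

```python
import numpy as np

def beta0(s):
    return np.exp(-0.7 + 3.5 * s)

def beta1(s):
    return 7 * np.sin(4 * s + 18.2)

def beta2(s):
    return 0.4 * s - 0.2 * np.exp(-16 * (s - 0.5) ** 2) - 0.2

def simulate(n, c, rng):
    """Draw one sample; returns y (n,), x (n + 1,) with x[0] = 0, and u (n,)."""
    rho = 1 + c / n
    s = np.arange(1, n + 1) / n
    e1 = rng.normal(0.0, np.sqrt(0.119), n)   # 0.119 treated as a variance (assumption)
    e2 = rng.normal(0.0, np.sqrt(0.095), n)   # 0.095 treated as a variance (assumption)
    u, v, x = np.empty(n), np.empty(n), np.empty(n + 1)
    x[0] = 0.0
    u_prev = v_prev = 0.0
    for t in range(n):
        u[t] = 0.3 * u_prev + e1[t]           # AR(1) innovation of the predictor
        v[t] = 0.3 * v_prev + e2[t]           # AR(1) regression error
        x[t + 1] = rho * x[t] + u[t]          # nearly integrated predictor
        u_prev, v_prev = u[t], v[t]
    eps = beta1(s) * u + v                    # endogenous error
    y = beta0(s) + beta2(s) * x[:-1] + eps
    return y, x, u
```

For example, `simulate(50, -20, np.random.default_rng(0))` uses ρ = 1 − 20/50 = 0.6.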
34
Empirical Example
♠ The proposed estimation procedure is evaluated by the mean
absolute deviation error (MADE):
$$\mathrm{MADE}_{\beta_{i,j}} = \frac{1}{n_d} \sum_{k=1}^{n_d} \left| \hat\beta_{i,sj}(s_k) - \beta_i(s_k) \right|, \qquad i = 0, 2, \quad j = 1, 2,$$
where $\{s_k : 1 \le k \le n_d\}$ are the grid points on [0, 1].
♠ We consider sample sizes n = 50, 100 and 250, and
repeat the simulation 1000 times for each sample size.
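The criterion itself is a plain average of absolute errors over the grid; a minimal sketch (the function name `made` is hypothetical):

```python
import numpy as np

def made(beta_hat, beta_true):
    """Mean absolute deviation error between estimated and true values on a grid."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta_true = np.asarray(beta_true, dtype=float)
    return float(np.mean(np.abs(beta_hat - beta_true)))
```

In a Monte Carlo study this would be computed once per replication, and the median and standard deviation taken across replications.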
35
Empirical Example
♠ The median and standard deviation (in parentheses) of the 1000
MADE values for both the proposed one-stage and two-stage
estimation procedures are reported in the following table.
♠ At the first stage, $h_1 = d_1 n^{-2/5}$ with $d_1 = 0.3$, $d_1 = 1$ and
$d_1 = 2$. At the second stage, the optimal bandwidth is selected
by the cross validation method.
36
c        d1   n    ρ      MADE β̂0,s1       MADE β̂0,s2       MADE β̂2,s1
c = 0    0.3  50   1.000  0.6836 (0.3661)  0.4570 (0.2985)  1.2063 (0.2865)
c = 0    0.3  100  1.000  0.5164 (0.2988)  0.3281 (0.1794)  0.5428 (0.1244)
c = 0    0.3  250  1.000  0.2960 (0.2021)  0.1929 (0.1343)  0.2560 (0.0400)
c = 0    1.0  50   1.000  0.4426 (0.3144)  0.3844 (0.2724)  0.7167 (0.2534)
c = 0    1.0  100  1.000  0.3598 (0.2672)  0.3097 (0.2189)  0.3486 (0.0980)
c = 0    1.0  250  1.000  0.1841 (0.1270)  0.1582 (0.0986)  0.1286 (0.0319)
c = 0    2.0  50   1.000  0.8472 (0.6167)  0.7208 (0.6445)  1.3669 (0.5992)
c = 0    2.0  100  1.000  0.7374 (0.4647)  0.6726 (0.4821)  0.6141 (0.2224)
c = 0    2.0  250  1.000  0.3179 (0.2408)  0.3031 (0.2344)  0.2269 (0.0683)
c = −2   0.3  50   0.960  0.4236 (0.2082)  0.3266 (0.1714)  1.2051 (0.2571)
c = −2   0.3  100  0.980  0.3208 (0.1671)  0.2233 (0.1113)  0.5598 (0.1195)
c = −2   0.3  250  0.992  0.2018 (0.1149)  0.1297 (0.0814)  0.2611 (0.0389)
c = −2   1.0  50   0.960  0.2744 (0.1753)  0.2574 (0.1471)  0.6947 (0.2324)
c = −2   1.0  100  0.980  0.2098 (0.1241)  0.1985 (0.1058)  0.3588 (0.0868)
c = −2   1.0  250  0.992  0.1174 (0.0707)  0.1079 (0.0606)  0.1262 (0.0317)
c = −2   2.0  50   0.960  0.6225 (0.3796)  0.5287 (0.4167)  1.3927 (0.6179)
c = −2   2.0  100  0.980  0.4508 (0.2488)  0.3950 (0.2735)  0.6655 (0.1927)
c = −2   2.0  250  0.992  0.2318 (0.1517)  0.1912 (0.1597)  0.2285 (0.0644)
c = −20  0.3  50   0.600  0.1558 (0.0495)  0.1424 (0.0595)  0.9181 (0.2189)
c = −20  0.3  100  0.800  0.1162 (0.0360)  0.1094 (0.0368)  0.5096 (0.1147)
c = −20  0.3  250  0.920  0.0853 (0.0207)  0.0659 (0.0177)  0.2477 (0.0397)
c = −20  1.0  50   0.600  0.1953 (0.0332)  0.1423 (0.0392)  0.6749 (0.1918)
c = −20  1.0  100  0.800  0.1240 (0.0233)  0.1056 (0.0280)  0.3560 (0.0906)
c = −20  1.0  250  0.920  0.0671 (0.0167)  0.0653 (0.0172)  0.1364 (0.0326)
c = −20  2.0  50   0.600  0.5012 (0.0690)  0.1902 (0.0821)  1.2325 (0.4567)
c = −20  2.0  100  0.800  0.3453 (0.0372)  0.1545 (0.0589)  0.6731 (0.2285)
c = −20  2.0  250  0.920  0.1930 (0.0194)  0.1025 (0.0312)  0.2504 (0.0725)
Empirical Example
♠ Conclusions:
1. d1 = 1 is the best of the three choices.
2. For all three bandwidths h1, both the median and the
standard deviation of the MADE for each coefficient function decline as
the sample size increases, but the rates of decrease for β̂0,s1 and
β̂0,s2 are lower than that for β̂2,s1.
3. Comparing the first two MADE columns, the second-stage estimate
of β0 has a smaller error than the first-stage estimate.
The above findings are expected and consistent with the
asymptotic theory.
38
Empirical Example
♠ Example 2: We study the real example mentioned previously.
♥ We apply the proposed model and the estimation procedure to
analyze the CRSP return using monthly data from 1926:12 to
2002:12. Indeed, this data set was used in Campbell and Yogo
(2006).
♥ We choose the log dividend-price ratio and the log earnings-price
ratio as the state variables.
♥ We consider two cases: the whole sample from 1926:12 to
2002:12 and the subsample from 1926:12 to 1994:12.
39
Empirical Examples
♥ The reason we consider two samples is that the financial
variable behaves as an I(1) process in the whole sample and as an
NI(1) process in the subsample; see the following table.
Table 1: 95% confidence intervals for ρ and c for subsample and whole sample.
Subsample Subsample Whole sample Whole sample
95% CI for ρ 95% CI for c 95% CI for ρ 95% CI for c
log d-p [0.971, 0.998] [−23.419, −1.914] [0.986, 1.003] [−12.683, 2.377]
log e-p [0.970, 0.997] [−24.105, −2.240] [0.984, 1.002] [−14.797, 1.711]
40
Empirical Examples
♥ These results illustrate that the log d-p or e-p ratio is a nearly
integrated process in the subsample and a unit root process in
the whole data period.
♥ In sum, the log e-p ratio or log d-p ratio follows either a nearly
integrated or an integrated process. So, xt = ρ xt−1 + ut with
ρ = 1 + c/n for some c ≤ 0.
♥ Note that our theory holds for both NI(1) and I(1).
41
Empirical Examples
♥ We first show the models used in the literature. For example,
similar to Amihud and Hurvich (2004), we consider the following
simple regression model
$$r_t = \alpha_0 + \alpha_1 u_t + \alpha_2 x_{t-1} + v_t$$
using two different periods of data, where $r_t$ is the CRSP
monthly return and $x_{t-1}$ is the first lag of the log d-p ratio (or log
e-p ratio).
♥ Results are summarized in the following table.
42
Empirical Examples
Table 2: OLS estimates (standard errors) for the univariate models
              CRSP 1926:12-1994:12   CRSP 1926:12-2002:12
log d-p  α0   -0.0159 (0.0067)       -0.0005 (0.0048)
         α1   -0.9220 (0.0102)       -0.9205 (0.0096)
         α2   -0.0063 (0.0021)       -0.0015 (0.0015)
log e-p  α0    0.0136 (0.0028)        0.0102 (0.0023)
         α1   -0.9533 (0.0050)       -0.9466 (0.0051)
         α2    0.0034 (0.0010)        0.0021 (0.0008)
43
Empirical Examples
♥ Now, we consider the following multiple predictive regression
model:
rt = α0 + α1u1,t + α2u2,t + α3x1,t−1 + α4x2,t−1 + vt
using two different periods of data, where x1,t−1 is the first lag
of log d-p ratio and x2,t−1 is the first lag of log e-p ratio. Here,
uj,t = xj,t − ρj xj,t−1 for j = 1, 2.
♥ Results are reported in the following table.
44
Empirical Examples
Table 3: OLS estimates (standard errors) for bivariate model
CRSP 1926:12-1994:12 CRSP 1926:12-2002:12
α0 0.0712 (0.0032) 0.0452 (0.0024)
α1 -0.1062 (0.0159) -0.1577 (0.0154)
α2 -0.8549 (0.0158) -0.7994 (0.0153)
α3 0.0114 (0.0017) 0.0088 (0.0016)
α4 0.0108 (0.0016) 0.0040 (0.0016)
45
Empirical Examples
♥ Next, we consider the following time-varying coefficient model
rt = β0t + β1tut + β2txt−1 + vt
using two different periods of data, where rt is the CRSP
monthly return and xt−1 is the first lag of log d-p ratio (or log
e-p ratio).
46
Figure 3. Results for CRSP return versus log dividend-price (d-p) ratio during 1926:12-
1994:12 (left panel) and 1926:12-2002:12 (right panel). (a) The local linear estimate of
β0(·) at stage two (solid line) with 95% confidence intervals (dotted line) and the OLS
estimate of α0 (dashed line) compared to zero (dash-dotted line); (b) The local linear
estimate of β0(·) at stage two (solid line) with 95% confidence intervals (dotted line) and
the OLS estimate of α0 (dashed line) compared to zero (dash-dotted line); (c) The local
linear estimate of β2(·) at stage one (solid line) with 95% confidence intervals (dotted line)
and the OLS estimate of α2 (dashed line) compared to zero (dash-dotted line); (d) The
local linear estimate of β2(·) at stage one (solid line) with 95% confidence intervals (dotted
line) and the OLS estimate of α2 (dashed line) compared to zero (dash-dotted line).
[Figure 3, panels (a)-(d): Results for CRSP return versus log earning-price (e-p) ratio during ...]
Empirical Examples
♥ Finally, we consider the following time-varying coefficient model
with two financial predictors
yt = β0t + β1t u1,t + β2t u2,t + β3t x1,t−1 + β4t x2,t−1 + vt,
x1,t = ρ1 x1,t−1 + u1,t, x2,t = ρ2 x2,t−1 + u2,t, 1 ≤ t ≤ n
using two different periods of data, where x1,t−1 is the first lag
of log d-p ratio and x2,t−1 is the first lag of log e-p ratio.
50
Figure 5. Results for CRSP return using 2-dimensional predictive regression. The left panel
is for the time period 1926:12-1994:12 and the right panel is for 1926:12-2002:12. (a) The
local linear estimate of β0(·) at stage two (solid line) with 95% confidence intervals (dotted
line) and the OLS estimate of α0 (dashed line); (b) The local linear estimate of β0(·) at
stage two (solid line) with 95% confidence intervals (dotted line) and the OLS estimate of
α0 (dashed line); (c) The local linear estimate of β3(·) at stage one (solid line) with
95% confidence intervals (dotted line) and the OLS estimate of α3 (dashed line); (d) The
local linear estimate of β3(·) at stage one (solid line) with 95% confidence intervals (dotted
line) and the OLS estimate of α3 (dashed line); (e) The local linear estimate of β4(·) at
stage one (solid line) with 95% confidence intervals (dotted line) and the OLS estimate of
α4 (dashed line); (f) The local linear estimate of β4(·) at stage one (solid line) with 95%
confidence intervals (dotted line) and the OLS estimate of α4 (dashed line).
Summary
1. We studied a time-varying coefficient predictive regression model
that allows the state variable to be an NI(1) or I(1) process and
allows endogeneity and time-dependent intercept and slope
functions.
2. We developed a nonparametric method for estimating the
coefficients and studied their asymptotic distribution.
3. To deal with the problem that the estimated intercept and slope
functions have different rates of convergence, we proposed a two
stage estimation procedure to achieve the optimal estimates of
slope and intercept functions respectively at each stage.
53
Discussions and Future Research
1. Data-driven method to select the bandwidths (CV?).
2. From the residual plots, one might conclude that volatility exists
and might be a function of nonstationary financial variables.
Therefore, the research topic on ARCH or GARCH type model
with nonstationary variable would be interesting.
3. Testing hypotheses such as testing if βt does really change
over time (H0 : βt = β0) or there is a structural change (H0 :
βt = β10I(t ≤ T0) + β11I(t > T0) for some T0) or there is no
relationship between the dependent variable and the financial
variable (H0 : βt = 0). The paper is available upon request.
54
Discussions and Future Research
4. Through a nonlinear projection of εt onto xt−1, εt = ϕ(xt−1) +
vt, model (1) can be generalized to a nonparametric predictive
regression model
rt = g(xt−1) + vt, (6)
where g(xt−1) is a nonlinear function of xt−1 and xt is either
integrated or nearly integrated.
Indeed, when xt−1 is integrated, model (6) was investigated
by Cai (2010), and Cai and Li (2011) consider the testing
issue.
55
Discussions and Future Research
But, when xt−1 is nearly integrated, the asymptotic theory for
nonparametric estimate for (6) is still open. It is conjectured
that the asymptotic properties for a kernel type nonparametric
estimator of g(·) should be the same as those for the case when
xt−1 is integrated, with the Brownian motion W (r) replaced by
Kc(r). Of course, it is warranted to investigate this model and
its estimation properties.
56
Discussions and Future Research
5. Testing predictability (H0 : β1 = 0) in model (1)
becomes testing the hypothesis H0 : g(x) = g0(x, θ) in model (6),
which is more general. Here, g0(x, θ) is a known function with
unknown parameter θ.
To construct a test statistic, an L2-type testing approach might
be applicable to this setting. Alternatively, one might use the
generalized likelihood ratio type testing procedure as in Cai, Fan
and Yao (2000) and Fan, Zhang and Zhang (2001).
57
Discussions and Future Research
6. One might consider a more general model as
rt = g(t, xt−1) + vt,
where g(·) is a function of both time and nonstationary financial
variables.
58
[Figure 4: Return vs the first lag of log d-p ratio in (a) and the first lag of e-p ratio in (b) for
1926:12-2002:12. Horizontal axes: log d−p in (a), log e−p in (b).]
Discussions and Future Research
Indeed, Campbell and Cochrane (1999) and Paye and
Timmermann (2006) did consider some parametric nonlinear
models for the relationship between excess returns and
forecasting variables implied by economic models. See the
aforementioned papers for details. Unfortunately, both papers
considered only the case when xt is stationary.
60
End
THANK YOU for COMING!
61