recursive arch EXTENDED - Dept. of Statistics, Texas …suhasini/papers/recursive_arch_EXTENDED.pdfA...
Transcript of recursive arch EXTENDED - Dept. of Statistics, Texas …suhasini/papers/recursive_arch_EXTENDED.pdfA...
A recursive online algorithm for the estimation of
time-varying ARCH parameters∗
extended version†
Rainer Dahlhaus and Suhasini Subba Rao
Institut fur Angewandte Mathematik
Universitat Heidelberg
Im Neuenheimer Feld 294
69120 Heidelberg
Germany
November 19, 2006
Abstract
In this paper we propose an online recursive algorithm for estimating the parameters
of a time-varying ARCH process. The estimation is done by updating the estimator at
time point (t− 1) with observations about the time point t to yield an estimator of the
parameter at time point t. The sampling properties of this estimator are studied in a
nonstationary context, in particular asymptotic normality and an expression for the bias
due to nonstationarity are established. The minimax risk for the tvARCH process is
evaluated and compared with the mean squared error of the online recursive estimator.
It is shown, if the time-varying parameters belong to a Holder class of order less than
or equal to one, then, with a suitable choice of step-size, the recursive online estimator
attains the local minimax bound. On the other hand, if the order of the Holder class is
greater than one, the minimax bound for the recursive algorithm cannot be attained.
However, by running two recursive online algorithms in parallel with different step-sizes
and taking a linear combination of the estimators the minimax bound can be attained
for Holder classes of order between one and two.
∗Supported by the Deutsche Forschungsgemeinschaft (DA 187/12-2).†This technical report is the extended version of the paper “A recursive online algorithm for the estimation
of time-varying ARCH parameters”, to appear in ... In addition to the published paper this version contains
in particular a minimax result in Theorem 2.4, Remark 2.2, Theorem A.1 (being part of the proof of Theorem
3.1) and the proof of Lemma A.6.
1
1 Introduction
The class of ARCH processes can be generalised to include nonstationary processes, by
including models with parameters which are time dependent. More precisely Xt,N is
called a time-varying ARCH (tvARCH) process of order p if it satisfies the representation
Xt,N = Ztσt,N , σ2t,N = a0(
t
N) +
p∑
j=1
aj(t
N)X2
t−j,N , (1)
where Zt are independent, identically (iid) distributed random variables with E(Z0) = 0
and E(Z20 ) = 1. This general class of time-varying ARCH (tvARCH) processes was investi-
gated in detail in Dahlhaus and Subba Rao (2006). It was shown that the tvARCH process
can locally be approximated by a stationary process, the details we summarise below. Fur-
thermore, a local quasi-likelihood method was proposed to estimate the parameters of the
tvARCH(p) model.
A potential application of the tvARCH process is to model long financial time series. The
modelling of financial data using nonstationary time series models has recently attracted
considerable attention. A justification for using such models can be found, for example,
in Mikosch and Starica (2000, 2004). However given that financial time series are often
sampled at high frequency, evaluating the likelihood as each observation comes online can
be computationally expensive. Thus an “online” method, which uses the previous estimate
of the parameters at time point (t− 1) and the observation at time point t to estimate the
parameter at time point t, would be ideal and cost effective. There exists a huge literature
on recursive algorithms - mainly in the context of linear systems (cf. Ljung and Soderstrom
(1983), Solo (1981, 1989) or in the context of neural networks (cf. White (1996), Chen and
White (1998)). For a general overview see also Kushner and Yin (2003). Motivated by
the least mean squares algorithm considered in Moulines et al. (2005) for time-varying AR
processes, we consider in this paper the following online recursive algorithm for tvARCH
models:
at,N = at−1,N + λX2t,N − aT
t−1,NXt−1,N Xt−1,N
|Xt−1,N |21, t = (p + 1), . . . , N, (2)
where X Tt−1,N = (1, X2
t−1,N , . . . , X2t−p,N ) and where |Xt−1,N |1 = 1 +
∑pj=1 X2
t−j,N and we
start with the initial conditions ap,N = (0, . . . , 0). This algorithm is linear in the es-
timators, despite the nonlinearity of the tvARCH process. We call the stochastic algo-
rithm defined in (2) the ARCH normalised recursive estimation (ANRE) algorithm. Let
a(u)T = (a0(u), . . . , ap(u)), then at,N is regarded as an estimator of a( tN ) or of a(u) if
|t/N − u| < 1/N .
In this paper we will prove consistency and asymptotic normality of this recursive estima-
tor. Furthermore, we will discuss the improvements of the estimator obtained by combining
two estimates from (2) with different λ. Unlike most other work in the area of recursive
1
estimation the properties of the estimator are proved under the assumptions that the true
process is a process with time-varying coefficients, i.e. a nonstationary process. The rescal-
ing of the coefficients in (1) to the unit interval corresponds to the ’infill asymptotics’ in
nonparametric regression: As N → ∞ the system does not describe the asymptotic behavior
of the system in a physical sense but is meant as a meaningful asymptotics to approximate
e.g. the distribution of estimates based on a finite sample size. A similar approach has been
used in Moulines et al. (2005) for tvAR-models. A more detailed discussion of the relevance
of this approach and the relation to non-rescaled processes can be found in Section 3.
In fact the ANRE algorithm resembles the NLMS-algorithm investigated in Moulines
et al. (2005). Rewriting (2) we have
at,N = (I − λXt−1,NX T
t−1,N
|Xt−1,N |21) at−1,N + λ
X2t,NXt−1,N
|Xt−1,N |21. (3)
We can see from (3) that the convergence of the ANRE algorithm relies on showing some
type of exponential decay of the past. In this paper we will show that for any p > 0
E
∥
∥
∥
∥
∥
k∏
i=1
(I − λ1
|Xt−i−1,N |21Xt−i−1,NX T
t−i−1,N )
∥
∥
∥
∥
∥
p
≤ K(1 − λδ)k for some δ > 0, (4)
where ‖·‖ denotes the spectral norm of a matrix and∏n
i=0 Ai = A0 · · ·An. Roughly speaking
this means we have, on average, exponential decay of the past. Similar properties are often
established in the control literature and it is often referred to as persistence of excitation
of the stochastic matrix (see, for example, Guo (1994) and Aguech et al. (2000)), which
in our case is the matrix (1 − λ|Xt−1,N |21
Xt−1,NX Tt−1,N ). Persistence of excitation guarentees
convergence of the algorithm, which we will use to prove the asymptotic properties of at,N .
In Section 2 we state all results on the asymptotic behaviour of at,N including con-
sistency, asymptotic normality and rate efficiency. Furthermore, we suggest a modified
algorithm based on two parallel algorithms. In Section 3 we discuss practical implications.
Sections 4 and 5 contain the proofs which in large parts are based on the pertubation tech-
nique. We also derive a lower bound for the optimal rate of such estimators and prove that
it is achieved. Some technical methods are put in the appendix. We note that some of re-
sults in the appendix are of independent interest, as they deal with many of the probablistic
properties of both ARCH and tvARCH processes and their vector representations.
2 The ANRE algorithm
We first review some properties of the tvARCH process. Dahlhaus and Subba Rao (2006)
and Subba Rao (2004) have shown that the time varying ARCH process can locally be
approximated by a stationary ARCH process. Let u be fixed and
Xt(u) = Ztσt(u) σt(u)2 = a0(u) +
p∑
j=1
aj(u)Xt−j(u)2, (5)
2
where Zt are independent, identically distributed random variables with E(Z0) = 0 and
E(Z20 ) = 1. We also set Xt(u)T = (1, Xt(u)2, . . . , Xt−p+1(u)2). In Lemma 4.1 we show that
Xt(u)2 can be regarded as the stationary approximation of X2t,N around the time points
t/N ≈ u.
Let λmin(A) and λmax(A) denote the smallest and largest eigenvalues of the matrix A.
We use the following assumptions.
Assumption 2.1 Let Xt,N and Xt(u) be sequences of stochastic processes which satisfy
(1) and (5) respectively. Furthermore
(i) For some r ∈ [1,∞), there exists η > 0 with
E(Z2r0 )1/r sup
u
p∑
j=1
aj(u) < 1 − η.
(ii) There exists 0 < ρ1 ≤ ρ2 < ∞ such that for all u ∈ (0, 1] ρ1 ≤ a0(u) ≤ ρ2.
(iii) There exists β ∈ (0, 1] and a constant K such that for u, v ∈ (0, 1]
|aj(u) − aj(v)| ≤ K|u − v|β for j = 0, . . . , p.
(iv) Let Y0(u) = a0(u)Z20 and
and Yt(u) =
a0(u) +t∑
j=1
aj(u)Yt−j(u)
Z2t for t = 1, . . . , p.
Define Y p(u) = (1, Y1(u), . . . , Yp(u))T and Σ(u) = E(
Y p(u)Y p(u)T)
. Then there
exists a constant C such that
infu
λmin
Σ(u)
≥ C.
Remark 2.1 It is clear that Σ(u) is a positive semi-definite matrix, hence its smallest
eigenvalue is greater than or equal to zero. It can be shown that if p/E(Z4t )1/2 < 1 and
supu a0(u) > 0, then λmin(Σ(u)) > (1 − p/E(Z4t )1/2)/a0(u)(2p+1)/(p+1). However this con-
dition is only sufficient and lower bounds can be obtained under much weaker conditions.
We now investigate some of the asymptotic properties of the ANRE algorithm. Through-
out this paper we assume that λ → 0 and λN → ∞ as N → ∞. We mention explicitly that
λ does not depend on t, i.e. we are considering the fixed-stepsize case. The assumption
λ → 0 is possible in the triangle array framework of our model and the resulting asser-
tions (for example Theorem 2.2) are meant as an approximation of the corresponding finite
sample size distributions and not as the limit in any physical sense.
3
The following results are based on a representation proved at the end of Section 4.3.
The difference at0,N − a(u0) is dominated by two terms, that is
at0,N − a(u0) = Lt0(u0) + Rt0,N (u0) + Op(δN ), (6)
where
δN = (1
(Nλ)2β+
√λ
(Nλ)β+ λ +
1
Nβ), (7)
Lt0(u0) =
t0−p−1∑
k=0
λI − λF (u0)kMt0−k(u0) (8)
Rt0,N (u0) =
t0−p−1∑
k=0
λI − λF (u0)k
(
Mt0−k(t0 − k
N) −Mt0−k(u0) +
F (u0)a(t0 − k
N) − a(u0)
)
,
with
Mt(u) = (Z2t − 1)σt(u)2
Xt−1(u)
|Xt−1(u)|21(9)
and
F (u) = E
(X0(u)X0(u)T
|X0(u)|21
)
. (10)
We note that Lt0(u0) and Rt0,N (u0) play two different roles. Lt0(u0) is the weighted sum of
the stationary random variables Xt(u0)t, which locally approximate the tvARCH process
Xt,Nt, whereas Rt0,N (u0) is the (stochastic) bias due to nonstationarity; if the tvARCH
process were stationary this term would be zero. It is clear from the above that the magni-
tude of Rt0,N (u0) depends on the regularity of the time-varying parameters a(u) e.g., the
Holder class that a(u) belongs to. By using (6) we are able to obtain a bound for the mean
squared error of at0,N . Let | · | denote the Euclidean norm of a vector.
Theorem 2.1 Suppose Assumption 2.1 holds with r > 4 and u0 > 0. Then if |u0−t0/N | <
1/N we have
E
|at0,N − a(u0)|2
= O(λ +1
(Nλ)2β), (11)
where λ → 0, as N → ∞ and Nλ >> (log N)1+ε with some ε > 0.
The proof can be found at the end of Section 4.2. We note that an immediate consequence
of Theorem 2.1 is that the estimator at0,NP→ a(u0).
4
We note that the condition r > 4, is quite strong. We use the condition r = 4, to
prove persistence of excitation (see Section A.2), which is an integral part in the proof of
the theorem. However, we believe that it is possible to prove the same result under lower
moment conditions. In order to obtain the mean squared error we use the condition r > 4.
The stochastic term Lt0(u0) is the sum of martingale differences, which allows us to
obtain the following central limit theorem, whose proof is at the end of Section 4.3.
Theorem 2.2 Suppose Assumption 2.1 holds with r > 4 and u0 > 0. Let Rt0,N (u0) be
defined as in (8). If |t0/N − u0| < 1/N ,
(i) λ >> N−4β/(4β+1) and λ >> N−2β then
λ−1/2at0,N − a(u0) − λ−1/2Rt0,N (u0)D→ N (0, Σ(u0)), (12)
(ii) and λ >> N− 2β
2β+1 then
λ−1/2at0,N − a(u0) D→ N (0, Σ(u0)), (13)
where λ → 0 as N → ∞ and Nλ >> (log N)1+ε, for some ε > 0, with
Σ(u) =µ4
2F (u)−1
E(σ1(u)4X0(u)X0(u)T
|X0(u)|41)
(14)
and µ4 = E(Z40 ) − 1.
Remark 2.2 (Stationary ARCH-models) We note that an analogous algorithm to the
ANRE algorithm exists for stationary ARCH processes. Suppose the stationary ARCH(p)
process Xtt satisfies
Xt = Ztσt σ2t = a0 +
p∑
j=1
ajX2t−j ,
where Ztt are iid random variables with E(Zt) = 0, E(Z2t ) = 1, a0 > 0, aj ≥ 0 if 1 ≤ j ≤ p,
and∑p
j=1 aj < 1. Let aT = (a0, . . . , ap) and
at = at−1 +K
tX2
t − aTt−1Xt−1
Xt−1
|Xt−1|21, t ≥ (p + 1),
where X Tt−1 = (1, X2
t−1, . . . , X2t−p) and K is a finite positive constant. Then given the
observations Xk (k = 1, . . . , t), we use at as an estimator of a. Since the stepsize λ = Kt
does not fulfill our assumptions we cannot apply the above theorems directly. However, we
believe that by using the same methodology it can be shown that
atP→ a,
5
and
t1/2(at − a)D→ N (0, Σ) with Σ =
µ4
2E(X0X T
0
|X0|21)−1
E(σ4
1X0X T0
|X0|41)
.
This algorithm has the advantage that it is simple to evaluate and gives “stable” estimators,
in the sense that the difference |at − at+1|1 is small.
Until now we have assumed that a(u) ∈ Lip(β), where β ≤ 1. Let f(u) denote the
derivative of the vector or matrix f(·) with respect to u, Suppose 0 < β′ ≤ 1 and a(u) ∈Lip(β′), then we say a(u) ∈ Lip(1+β′). We now show that an exact expression for the bias
can be obtained if a(u) ∈ Lip(1 + β′) and β′ > 0.
We make the following assumptions.
Assumption 2.2 Let Xt,N be a sequence of stochastic processes which satisfy (1). Sup-
pose
(i) Assumption 2.1 holds.
(ii) For some β′ > 0
|ai(u) − ai(v)| ≤ K|u − v|β′
, for i = 0, . . . , p.
Under the above assumptions we show in Lemma 5.3 that
Eat0,N − a(u0) = − 1
NλF (u0)
−1a(u0) + O(1
(Nλ)1+β′ ). (15)
We note that at0,N is not a typical parameter estimator of an ARCH process, where usually
it is not possible to obtain an exact expression for the bias.
By using the bias above we obtain the following theorem, whose proof is at the end of
Section 5. Let tr(·) denote the trace of the matrix.
Theorem 2.3 Suppose Assumption 2.2 holds with r > 4 and u0 > 0. Then if |t0/N −u0| <
1/N , we have
E|at0,N − a(u0)|2 = λ trΣ(u0) +1
(Nλ)2∣
∣F (u0)−1a(u0)|2 + (16)
O
1
(Nλ)2+β′ +λ1/2
(Nλ)1+β′ +1
(Nλ)2
,
and if λ is such that λ−1/2/(Nλ)1+β′ → 0, then
λ−1/2(at0,N − a(u0)) + λ−1/2 1
NλF (u0)
−1a(u0)D→ N (0, Σ(u0)), (17)
where λ → 0 as N → ∞ and λN >> (log N)1+ε, for some ε > 0.
6
We now consider the minimax risk for estimators of tvARCH processes. Let ⌊β⌋ denote
the largest integer strictly less than β and define the norm ‖ · ‖β on f : [0, 1] → Rp+1 as
‖f‖β = supu,v
|f ⌊β⌋(u) − f ⌊β⌋(v)|1|u − v|β−⌊β⌋
.
Let Cβ(L) define the class
Cβ(L) =
f(·) =(
f0(·), . . . , fp+1(·))
∣
∣
∣ f : [0, 1] → (R+)p+1, ‖f‖β ≤ L, (18)
supu
p+1∑
i=1
fi(u) < 1 − δ, 0 < ρ1 < infu
f0(u) < ρ2 < ∞
.
It is clear if a(·) ∈ Cβ(L) are the parameters of the tvARCH process Xt,N, then Assump-
tion 2.1(i,ii,iii) is satisfied. In the following theorem we evaluate a lower bound for the
minimax risk.
Theorem 2.4 Let fZ denote the density function of Zk. Suppose E(Z4t ) < ∞ and
Z = E
1 + Zkd log fZ(x)
dx⌋x=Zk
2
< ∞.
Let Cβ(L) be defined as in (18) and If |t0/N − u0| < 1/N , then for any ‖x‖ ≤ 1, we have
infat0,N∈σ(Xt,N ;t=1,...,N)
supa(u)∈Cβ(L)
E[xT (at0,N − a(u0))(at0,N − a(u0))T x]
≥ C(Z, E(Z40 ), ρ1)) N−2β/(2β+1), (19)
where Xt,N is a tvARCH process defined by the parameters a(·) and C(Z, E(Z40 , ρ1)) is a
constant which only depends on Z, E(Z40 ) and ρ1.
Comparing this bound with (11) it is straightforward to show that the ANRE algorithm
attains the optimal rate if a(·) ∈ Lip(ν) with ν ≤ 1 (with λ = N− 2ν1+2ν ). The story is different
when 1 < ν < 2. If a(·) ∈ Lip(1+β′), β′ > 0, the mean squared error of the ANRE estimator
in (16) gets minimal for λ ≈ N−2/3 with minimum rate E|at0,N − a(u0)|2 = O(N−2/3).
However, in (19) the minimax rate in this case is N−
2(1+β′)
1+2(1+β′) which is smaller than N−2/3.
We now present a recursive method which attains the optimal rate even in this case.
Remark 2.3 (Bias correction, rate optimality) The idea here is to achieve a bias cor-
rection and the optimal rate by running two ANRE algorithms with different stepsizes λ1
and λ2 in parallel. Let at,N (λ1) and at,N (λ2) be the ANRE algorithms with stepsize λ1 and
λ2 respectively, and assume that λ1 > λ2. By using (15) for i = 1, 2 we have
Eat0,N (λi) = a(u0) −1
NλiF (u0)
−1a(u0) + O( 1
(Nλi)1+β′
)
. (20)
7
Since a(u0) − 1Nλi
F (u0)−1a(u0) ≈ a
(
u0 − 1Nλi
F (u0)−1)
we heuristically estimate a(u0 −1
NλiF (u0)
−1) instead of a(u0) by the algorithm. By using two different λi we can find a
linear combination of the corresponding estimates such that we “extrapolate” the two values
a(u0) − 1Nλi
F (u0)−1a(u0) (i = 1, 2) to a(u0). Formally let 0 < w < 1, λ2 = wλ1 and
at0,N (w) =1
1 − wat0,N (λ1) −
w
1 − wat0,N (λ2).
If |t0/N − u0| < 1/N , then by using (20) we have
Eat0,N (w) = a(u0) + O( 1
(Nλ)1+β′
)
.
By using Propostion 4.3 we have
E|at0,N − a(u0))|2 = O(λ +1
(Nλ)2(1+β′)),
and choosing λ = const × N−(2+2β′)/(3+2β′) gives the optimal rate. It remains the problem
of choosing λ (and w). It is obvious that λ should be chosen adaptively to the degree of
nonstationarity. That is λ should be large if the characteristics of the process are changing
more rapidly. However, a more specific suggestion would require more investigations - both
theoretically and by simulations.
The above method can not be extended to higher order derivatives since then the other
remainders of order (Nλ)−2 (see (16) and the proof of Theorem 2.3).
Finally we mention that choosıng λ2 < wλ1 will lead to an estimator of a(u0 + ∆) with
some ∆ > 0 (with rate as above). This could be the basis for the prediction of volatility of
time varying ARCH-processes.
3 Practical implications
Suppose that we observe data from a (non-rescaled) time-varying ARCH-process in discrete
time
Xt = Ztσt, σ2t = a0(t) +
p∑
j=1
aj(t)X2t−j , t ∈ Z. (21)
In order to estimate a(t) we use the estimator at as given in (2) (with all subscripts
N dropped). An approximation for the distribution of the estimator is given by Theo-
rem 2.2. Theorem 2.2(ii) can be used directly since it is completely formulated without
N . The matrices F (u0) and Σ(u0) depend on the unknown stationary approximation
Xt(u0) of the process at u0 = t0/N , i.e. at time t0 in non-rescaled time. Since this ap-
proximation is unknown we may use instead the process itself in a small neighborhood
of t0, i.e. we may estimate for example F (u0) by 1m
∑m−1j=0
Xt0−jXTt0−j
|Xt0−j |21with m small and
8
X Tt−1 = (1, X2
t−1, . . . , X2t−p). An estimator which fits better to the recursive algorithm is
[1 − (1 − λ)t0−p+1]−1∑t0−p
j=0 λ(1 − λ)j Xt0−jXTt0−j
|Xt0−j |21. In the same way we can estimate Σ(u0)
which altogether leads e.g. to an approximate confidence interval for at. In a similar way
Theorem 2.2(i) can be used.
The situation is more difficult with Theorem 2.1 and Theorem 2.3, since here the results
depend (at first sight) on N . Suppose that we have parameter functions aj(·) and some
N > t0 with aj(t0N ) = aj(t0) (i.e. the original function has been rescaled to the unit interval).
Consider Theorem 2.3 with the functions aj(·). The bias in (15) and (16) contains the term
1
N˙aj(u0) ≈
1
N
aj(t0N ) − aj(
t0−1N )
1N
= aj(t0) − aj(t0 − 1)
which again is independent of N . To avoid confusion we mention that 1N
˙aj(u0) of course
depends on N once the function aj(·) has been fixed (as in the asymptotic approach of
this paper) but it does not depend on N when it is used to approximate the function aj(t)
since then the function aj(·) is a different one for each N . In the spirit of the remarks
above we would e.g. use as an estimator of 1N
˙aj(u0) in (15) and (16) the expression
[1 − (1 − λ)t0−p+1]−1∑t0−p
j=0 λ(1 − λ)j[
aj(t0) − aj(t0 − j)]
.
These considerations also demonstrate the need for the asymptotic approach of this
paper: While it is not possible to set down a meaningful asymptotic theory for the model
(21) and to derive e.g. a central limit theorem for the estimator at, the approach of the
present paper for the rescaled model (1) leads to such results. This is achieved by the ’infill
asymptotics’ where more and more data become available of each local structure (e.g. about
time u0) as N → ∞. The results can then be used also for approximations in the model
(21) - e.g. for confidence intervals.
4 Proofs
4.1 Some preliminary results
In the next lemma we give a bound for the approximation error between X2t,N and Xt(u)2.
The proofs of these results and further details can be found in Dahlhaus and Subba Rao
(2006) (see also Subba Rao (2004)).
Lemma 4.1 Suppose Assumption 2.1 holds with some r ≥ 1. Let Xt,N and Xt(u) be
defined as in (1) and (5). Then we have
(i) Xt(u)2t is a stationary ergodic sequence. Furthermore there exists a stochastic pro-
cess Vt,Nt and a stationary ergodic process Wtt with supt,N E(|Vt,N |r) < ∞ and
E(|Wt|r) < ∞, such that
|X2t,N − Xt(u)2| ≤ 1
NβVt,N + | t
N− u|βWt (22)
9
|Xt(u)2 − Xt(v)2| ≤ |u − v|βWt. (23)
(ii) supt,N E(X2rt,N ) < ∞ and supu E(Xt(u)2r) < ∞.
We now define the derivative process by X2tt(u)t.
Lemma 4.2 Suppose Assumption 2.2 holds with some r ≥ 1. Then the derivative X2t(u)t
is a well defined process and satisfies the representation
X2t(u) =
a0(u) +
p∑
j=1
aj(u)X2t−j(u) + aj(u)Xt−j(u)2
Z2t .
Xt(u)2t is a stationary, ergodic process with
|X2t(u) − X2
t(v)| ≤ |u − v|β′
Wt, (24)
where Wt is the same Wt used in Lemma 4.1. Almost surely all paths of Xt(u)2u belong
to Lip(1) and we have the Taylor series expansion
X2t,N = Xt(u)2 + (
t
N− u)X2
t(u) + (| t
N− u|1+β′
+1
N)Rt,N ,
where |Rt,N |1 ≤ (Vt,N + Wt). Furthermore, supN X2t,N ≤ Wt, supu Xt(u)2 ≤ Wt and
supu |X2t(u)| < Wt. And the norms are bounded: supu E(|X2
t(u)|r) < ∞ and supt,N E(|Rt,N |r) <
∞.
Define Ft = σ(Z2t , Z2
t−1, . . .), (is clear that Ft = σ(X2t,N , X2
t−1,N , . . .) = σ(Xt(u)2, Xt−1(u)2, . . .),
since a Volterra expansion gives Xt,N in terms of Z2t t and the ARCH equations give Zt
in terms of Xtt). We now consider the mixing properties of functions of the processes
Xt,Nt and Xt(u)t. The proof of the proposition below can be found in Section A.1.
Proposition 4.1 Suppose Assumption 2.1 holds with r = 1. Let Xt,N and Xt(u) be
defined as in (1) and (5) respectively. Then there exists a (1− η) ≤ ρ < 1 such that for any
φ ∈ Lip(1)∣
∣E[
φ(Xt,N )|Ft−k
]
− E[
φ(Xt,N )]∣
∣
1≤ Kρk(1 + |Xt−k,N |1), (25)
∣
∣E[
φ(Xt(u))|Ft−k
]
− E[
φ(Xt(u))]∣
∣
1≤ Kρk(1 + |Xt−k(u)|1), (26)
∣
∣E[
φ(Xt,N )|Ft−k
]∣
∣
1≤ K(1 + ρk|Xt−k,N |1), (27)
if j, k > 0 then∣
∣E[
φ(Xt,N )|Ft−k
]
− E[
φ(Xt,N )|Ft−k−j
]∣
∣
1≤ Kρk (|Xt−k,N |1 + |Xt−k−j,N )|1) , (28)
and if Assumption 2.1 holds with some r > 1 then for 1 ≤ q ≤ r we have
E(
|Xt,N |q1∣
∣Ft−k) ≤ K|Xt−k,N |q1 (29)
where the constant K is independent of t, k, j and N .
10
The corollary below follows by using (25) and (26).
Corollary 4.1 Suppose Assumption 2.1 holds. Let Xt,N and Xt(u) be defined as in
(1) and (5) respectively and φ ∈ Lip(1). Then φ(Xt,N ) and φ(Xt(u)) are Lq-mixingales
of size −∞.
4.2 The pertubation technique
In this section we use the pertubation technique, introduced in Aguech et al. (2000), to
study the ANRE estimator, which we use to show consistency. To analyse the algorithm we
compare it with a similar algorithm, which is driven with the true parameters and where
Xt,N has been replaced by the stationary process Xt(u). Let δt,N (u) = at,N − a(u),
Mt,N = (Z2t − 1)σ2
t,N
Xt−1,N
|Xt−1,N |21, Mt(u) = (Z2
t − 1)σt(u)2Xt−1(u)
|Xt−1(u)|21, (30)
Ft,N =Xt,NX T
t,N
|Xt,N |21, Ft(u) =
Xt(u)Xt(u)T
|Xt(u)|21, (31)
and F (u) be defined as in (10). The ‘true algorithm’ is
a(u) = a(u) + λXt(u)2 − a(u)TXt−1(u) Xt−1(u)
|Xt−1(u)|21− λMt(u). (32)
An advantage about the specific form of the random matrices Ft,N is that |Ft,N |1 ≤ (p+1)2.
This upper bound will make some of the analysis easier to handle.
By subtracting (32) from (2) we get
δt,N (u) =(
I − λFt−1,N
)
δt−1,N (u) + λBt,N (u) + λMt(u), (33)
where
Bt,N (u) = Mt,N −Mt(u) + Ft−1,Na(t
N) − a(u). (34)
We note that (33) can also be considered as a recursive algorithm, thus we call (33) the
error algorithm. There are two terms driving the error algorithm: the bias Bt,N (u) and the
stochastic term, Mt(u). Because the error algorithm is linear with respect to the estimators
we can separate δt,N (u) in terms of the bias and the stochastic terms:
δt,N (u) = δBt,N (u) + δM
t,N (u) + δRt,N (u),
where
δBt,N (u) = (I − λFt−1,N )δB
t−1,N (u) + λBt,N (u), δBp,N (u) = 0 (35)
δMt,N (u) = (I − λFt−1,N )δM
t−1,N (u) + λMt(u), δMp,N (u) = 0 (36)
and δRt,N (u) = (I − λFt−1,N )δR
t−1,N (u), δRp,N (u) = −a(u). (37)
11
We have for y ∈ B, M, R
δyt,N (u) =
t−p−1∑
k=0
k−1∏
i=0
(
I − λFt−i−1,N
)
Dyt−k,N (u), (38)
where DBt,N (u) = λBt,N (u), DM
t,N (u) = λMt(u), DRt,N (u) = 0 if t > p and DR
p,N (u) = −a(u).
It is clear for δyt,N (u) to converge, that the random product
∏t−1i=0(I − λFt−i−1,N ) must
decay. Technically this is one of the main results. It is stated in the following theorem.
Theorem 4.1 Suppose the Assumption 2.1 holds with r = 4 and N is sufficiently large.
Then for all q ≥ 1 and (p + 1) ≤ t ≤ N there exists M > 0, δ > 0 such that
E
∥
∥
∥
∥
∥
k−1∏
i=0
(
I − λXt−i−1,NX T
t−i−1,N
‖Xt−i−1,N‖21
)
∥
∥
∥
∥
∥
q
≤ M exp−δλk (39)
Under Assumption 2.1 and by using the proposition above, there exists a δ > 0 such
that
(
E|δRt,N |q
)1/q ≤ M exp−δλt, (40)
for 1 ≤ q ≤ r and t = p + 1, . . . , N . Therefore this term decays exponentially and, as we
shall see below, is of lower order than δBt,N (u) and δM
t,N (u).
We now study the stochastic bias at t0 with |t0/N −u0| < 1/N . It is straightforward to
see that δBt,N (u) = δ1,B
t,N (u) + δ2,Bt,N (u) + δ3,B
t,N (u), where
δ1,Bt,N (u) = (I − λFt−1,N )δ1,B
t−1,N (u) + λMt,N −Mt(u),
δ2,Bt,N (u) = (I − λFt−1,N )δ2,B
t−1,N (u) + λF (u)a(t
N) − a(u),
δ3,Bt,N (u) = (I − λFt−1,N )δ3,B
t−1,N (u) + λ(Ft−1,N − F (u))a(t
N) − a(u), (41)
δ1,Bp,N = 0, δ2,B
p,N = 0 and δ3,Bp,N = 0.
In order to obtain asymptotic expressions for the expectation of each component δ1,Bt0,N ,
δ2,Bt0,N , δ2,B
t0,N and δMt0,N we use the pertubation technique proposed by Aguech et al. (2000).
Roughly speaking this means substituting the product of random matrices by a deterministic
product and showing that the difference is of a lower order. We can decompose for x =
M, (1, B), (2, B) the stochastic and bias terms as follows
δxt0,N (u0) = Jx,1
t0,N (u0) + Jx,2t0,N (u0) + Hx
t0,N (u0), (42)
where
Jx,1t,N (u0) = (I − λF (u0))J
x,1t−1,N (u0) + Gx
t,N , (43)
Jx,2t,N (u0) = (I − λF (u0))J
x,2t−1,N (u0) − λFt−1,N − F (u0)Jx,1
t−1,N (u0),
and Hxt,N (u0) = (I − λFt−1,N )Hx
t−1,N (u0) − λFt−1,N − F (u0)Jx,2t−1,N (u0),
12
with
GMt,N = λMt(u0), G1,B
t,N = λ[Mt,N −Mt(u0)],
G2,Bt,N = F (u0)[a( t
N ) − a(u)], G3,Bt,N = (Ft−1,N − F (u0))[a(
t
N) − a(u)] for t < t0.(44)
Furthermore, Jx,1p,N (u0) = Jx,2
p,N (u0) = Hxp,N (u0) = 0. (42) can easily be derived by taking the
sum on both sides of the three equations in (43). In the proposition below we will show that
Jx,1t0,N (u0) is the principle term in the expansion of δx
t0,N (u0), i.e. with δxt0,N (u0) ≈ Jx,1
t0,N (u0).
Substituting (44) into (43) we have
J(1,B),1t0,N (u0) =
t0−p−1∑
k=0
λI − λF (u0)kMt0−k,N −Mt0−k(u0)
J(2,B),1t0,N (u0) =
t0−p−1∑
k=1
I − λF (u0)kF (u0)a(t0 − k
N) − a(u0)
J(3,B),1t0,N (u0) =
t0−p−1∑
k=1
I − λF (u0)k(Ft−k−1,N − F (u0))a(t0 − k
N) − a(u0) (45)
In the proof below we require the following definition
Dt,N =
p−1∑
i=0
[Vt−i,N + Wt−i], for t ≥ p (46)
where Vt,N and Wt are defined as in Lemma 4.1.
Proposition 4.2 Suppose Assumption 2.1 holds with r > 4 and let δBt0,N (u0) and J
(1,B),1t0,N (u0),
J(2,B),1t0,N (u0) and J
(3,B),1t0,N (u0), be defined as in (35) and (45). Then for |t0/N − u0| < 1/N
and λN ≥ (log N)1+ε, for some ε > 0, we have
(i)
(
E|J (1,B),1t0,N (u0) + J
(2,B),1t0,N (u0)|r
)1/r= O(
1
(Nλ)β) (47)
(ii) (E|J (3,B),1t0,N (u0)|r)1/r = O( 1
(Nλ)2β + λ1/2(Nλ)−β)
(iii) and
(
E|δBt0,N (u0) − J
(1,B),1t0,N (u0) − J
(2,B),1t0,N (u0)|2
)1/2= O(
1
(Nλ)2β+
√λ
(Nλ)β+ λ). (48)
PROOF. We first prove (i). Let us consider J(1,B),1t0,N (u0). By using (117) we have
|Mt,N −Mt(u0)|1 ≤ |Z2t + 1|
1
Nβ+ (
p
N)β + | t
N− u0|β
Dt,N .
13
Therefore by substituting the above bound into J(1,B),1t0,N (u0) and by using (110) we have
(
E|J (1,B),1t0,N (u0)|r
)1/r≤
t0−p−1∑
k=0
λ‖(I − λF (u0))k‖ (E|Mt0−k,N −Mt0−k(u0)|r)1/r
≤ K
Nβ
t0−p−1∑
k=0
λ(1 − δλ)k1 + pβ + kβ(E(Z20 + 1)r)1/r(E|Dt0−k−1,N |r)1/r.
Now by using Lemma 4.1(i) we have that supt,N ‖Dt,N‖Er < ∞. Furthermore, from Moulines
et al. (2005), Lemma C.3 we have
N∑
k=1
(1 − λ)kkβ ≤ λ−1−β . (49)
By using the above we obtain
(E|J (1,B),1t0,N (u0)|r)1/r ≤ K sup
t,N(E|Dt,N |r)1/r(E|Z2
0 + 1|r)1/r 1
(Nλ)β= O(
1
(Nλ)β). (50)
We now bound (E|J (2,B),1t0,N (u0)|r)1/r. Since a(u) ∈ Lip(β), by using (110) and (49) we have
(E|J (2,B),1t0,N (u0)|r)1/r ≤ C
t0−p−1∑
k=1
λ(1 − δλ)k( k
N
)β= O(
1
(λN)β). (51)
Thus (50) and (51) give the bound (47), which completes the proof of (i).
To prove (ii), we now bound (E|J (3,B),1t0,N (u0)|r)1/2. We first observe that J
(3,B),1t0,N (u0) can
be written as
J(3,B),1t0,N (u0) = It0,N + IIt0,N
where It0,N =
t0−p−1∑
k=1
I − λF (u0)k(Ft−k−1,N − Ft−k−1(u0))a(t0 − k
N) − a(u0)
IIt0,N =
t0−p−1∑
k=1
I − λF (u0)k(Ft−k−1(u0) − F (u0))a(t0 − k
N) − a(u0).
Using the same proof to prove J(1,B),1t0,N (u0), it is straightforward to show that (E|It0,N |r)1/r =
O((Nλ)−2β). In order to bound IIt0,N , let Fk(u0) = Fk(u0) − F (u0) and φ(x) = xxT /|x|21where x = (1, x1, . . . , xp), then φ(x) ∈ Lip(1) and Ft,N = φ(Xt,N ). Since φ ∈ Lip(1), by
using Corollary 4.1 we have that Ft(u0) − F (u0) can be written as the sum of martingale
differences:
Ft(u0) − F (u0) =∞∑
ℓ=0
mt(ℓ), (52)
14
where mt(ℓ) is a (p+1)×(p+1) matrix, defined by mt(ℓ) = E(Ft,N |Ft−ℓ)−E(Ft,N |Ft−ℓ−1).By using (28) we have (E|mt(ℓ)|r/2)2/r ≤ Kρℓ. Substituting the above into B2,B
t0,N we have
IIt0,N =∞∑
ℓ=0
t0−p−1∑
k=0
λI − λF (u0)kmt0−k−1(ℓ)a(t0 − k
N) − a(u0)
IIt0,Ni =
p+1∑
j=1
∞∑
ℓ=0
t0−p−1∑
k=0
λI − λF (u0)kmt0−k−1(ℓ)ijaj(t0 − k
N) − aj(u0).
We note that ‖I − λF (u0)k‖ ≤ K(1 − λδ)k (see (110)). Furthermore, if t1 < t2 then
Emt1(ℓ)mt2(ℓ) = Emt1(ℓ)E(mt2(ℓ)|Ft1−ℓ) = 0, therefore mt(ℓ)t is a sequence of mar-
tingale differences. Therefore, since J(2,B),1t0−k−1,N is deterministic we can use Burkholder’s
inequality (cf. (Hall & Heyde, 1980), Theorem 12.2) to obtain
(E|IIt0,N |r)1/r ≤ Kλ
∞∑
ℓ=0
p+1∑
i,j=1
(
E∣
∣
t0−p−1∑
k=0
I − λF (u0)kmt0−k−1(ℓ)ija(t0 − k
N)j − a(u0)j
∣
∣
r
)1/r
≤ Kλ∞∑
ℓ=0
p+1∑
i,j=1
2r
t0−p−1∑
k=0
(1 − λδ)2k(
E|mk(ℓ)ij |r)2/r
(k
N)β
≤ K
(Nλ)β
∞∑
ℓ=0
ρℓλ1/2 = O(λ1/2
(Nλ)β). (53)
Thus we have proved (ii).
We now prove (iii). By using (42) we have
(E|δBt0,N (u0) − J
(1,B),1t0,N (u0) − J
(2,B),1t0,N (u0)|r/2)2/r
= (E|J (3,B),1t0,N (u0)|r/2)2/r + (E|
3∑
i=1
J(i,B),2t0,N (u0)|r/2)2/r + (E|
3∑
i=1
H(i,B)t0,N (u0)|r/2)2/r + O(
1
Nβ+ exp−λδt0).
We partition∑3
i=1 J(i,B),2t0,N into four terms:
3∑
i=1
J(i,B),2t0,N = AB
t0,N + B1,Bt0,N + B2,B
t0,N + B3,Bt0,N (54)
where,
ABt0,N = −
t0−p−1∑
k=0
λI − λF (u0)k[Ft0−k−1,N − Ft0−k−1(u0)] ×(
3∑
i=1
J(i,B),1t0−k−1,N
)
and, for y ∈ 1, 2, 3,
By,Bt0,N = −
t0−p−1∑
k=0
λI − λF (u0)k[Ft0−k−1(u0) − F (u0)]J(y,B),1t0−k−1,N (u0).
15
We first bound ABt0,N . By using (50), (51), (53) and (119) we have
(E|ABt0,N |r/2)2/r ≤ K
(Nλ)2β.
By using (134) and (135) we have
(E|B1,Bt0,N |r/2)r/2 = E
∣
∣−t0−p−1∑
k=0
t0−p−k−1∑
i=0
λ2I − λF (u0)k+iFt0−k−1(u0) ×
Mt0−k−i−1,N −Mt0−k−i−1(u0)∣
∣
r/2)2/r ≤ Kλ.
Using a similar proof to the above to bound (E|J (3,B),1t0−k−1,N (u0)|r)1/r we can show that
‖B2,Bt0,N‖E
r/2 = O(λ1/2(Nλ)−β). By using (53) it is straightforward to show that ‖B3,Bt0,N‖E
r/2 =
O((Nλ)−2β+λ1/2(Nλ)−β). Substituting the above bounds for (E|ABt0,N |r/2)2/r, (E|B1,B
t0,N |r/2)2/r,
(E|B2,Bt0,N |r/2)2/r and (E|B3,B
t0,N |r/2)2/r into (54) we obtain
(E|J (1,B),2t0,N (u0) + J
(2,B),2t0,N (u0) + J
(3,B),2t0,N (u0)|r/2)2/r = O(δN ) (55)
We now prove (iii) by bounding for y = 1, 2, 3, ‖H(y,B)t0,N (u0)‖E
2 . By using Holder’s
inequality, (55), Theorem 4.1 and that Ft,N are bounded random matrices we have
(E|H(1,B)t0,N (u0) + H
(2,B)t0,N (u0) + H
(3,B)t0,N (u0)|2)1/2 ≤
t0−p−1∑
k=0
λ
(
E‖(
k−1∏
i=0
(I − λFt0−i,N ))
‖2r/(r−4)
)(r−4)/2r
(E|[Ft0−k−1,N − F (u0)]J (1,B),2t0−k−1,N (u0) + J
(2,B),2t0−k−1,N (u0) + J
(3,B),2t0−k−1,N|r/2)2/r
≤ K( 1
(Nλ)2β+
λ1/2
(Nλ)β+ λ
)
. (56)
Since (E|J (i,B),2t0,N (u0)|2)1/2 ≤ E|J (i,B),2
t0,N (u0)|r/2)2/r, by substituting (55) and (56) into (54)
we have
(E|δBt0,N (u0) − J
(1,B),1t0,N (u0) − J
(2,B),1t0,N (u0|2)1/2
= O(1
(Nλ)2β+
√λ
(Nλ)β+ λ).
and we obtain (48).
In the following lemma we show that δMt0,N (u0) is dominated by JM,1
t0,N (u0), where
JM,1t0,N (u0) =
t0−p−1∑
k=0
λ(I − λF (u0))kMt0−k(u0). (57)
We observe that JM,1t0,N (u0) and Lt0(u0) (defined (8)) are the same.
16
Proposition 4.3 Suppose Assumption 2.1 holds with r > 4. Let δMt0,N (u0) and JM,1
t0,N (u0) be
defined as in (36) and (57). Then for |t0/N − u0| < 1/N and λN ≥ (log N)1+ε, for some
ε > 0, we have
(E|JM,1t0,N (u0)|r)1/r = O(
√λ) (58)
(E|δMt0,N (u0) − JM,1
t0,N (u0)|2)1/2 = O(λ +
√λ
(Nλ)β). (59)
PROOF. Since each component of the vector sequence Mt(u0) is a martingale difference
we can use the Burkholder inequality componentwise and (110) to obtain
(E|JM,1t0,N (u0)|r)1/r ≤ λ
2r
t0−p−1∑
k=0
(1 − δλ)2k(E|Z2t0−k − 1|r)2/r(E|σt0−k(u0)
2Xt0−k−1(u0)
|Xt0−k−1(u0)|21|r)2/r
1/2
≤ K√
λ. (60)
Since δMt0,N (u0) − JM,1
t0,N (u0) = JM,2t0,N (u0) + HM
t0,N (u0), to prove (59) we bound JM,2t0,N (u0) and
HMt0,N (u0). By using the same arguments given in Proposition 4.2 and (60) we have
(E|JM,2t0,N (u0)|r/2)2/r ≤ K
(
λ +
√λ
(Nλ)β
)
. (61)
Finally, we obtain a bound for HMt0,N (u0). By using Holder’s inequality, Theorem 4.1 and
(E‖Ft0−k−1,N − F (u0)‖r/2)2/r ≤ 2(p + 1)2 we have
(E|HMt0,N (u0)|2)1/2 ≤ 2
t0−p−1∑
k=0
λ‖(
k−1∏
i=0
(I − λFt−i−1,N ))
‖2r/(r−4))(r−4)/2r ×
(E|Ft0−k−1,N − F (u0)JM,2t0−k−1,N (u0)|r/2)2/r
≤ K(
λ +
√λ
(Nλ)β
)
. (62)
Thus we have shown (59).
By using Propositions 4.2 and 4.3 we have established that J(1,B),1t0,N (u0), J
(2,B),1t0,N (u0) and
JM,1t0,N (u0) are the principle terms in at0,N − a(u0). More precisely we have shown
at0,N − a(u0) = J(1,B),1t0,N (u0) + J
(2,B),1t0,N (u0) + JM,1
t0,N (u0) + R(1)N (63)
where ‖R(1)N ‖E
2 = O(δN ) (δN is defined in (7)). Furthermore, if λ → 0, λN → ∞ as N → ∞,
then this leads to consistency of at0,N . By using (63), we show below that an upper bound
for the mean squared error can be immediately obtained.
PROOF of Theorem 2.1 By substituting the bounds for (E|J (1,B),1t0,N (u0)+J
(2,B),1t0,N (u0)|2)1/2
and (E|JM,1t0,N (u0)|2)1/2 (in Propositions 4.2 and 4.3) into (63) we have E(|at0,N − a(u0)|2) =
O( 1(λN)β + λ1/2 + δN )2. Thus we have (11).
17
4.3 Proof of Theorem 2.2
We can see from Propositions 4.2 and 4.3 that the J(1,B),1t0,N (u0) + J
(1,B),2t0,N (u0) and JM,1
t0,N (u0)
are leading terms in the expansion of δt0,N . For this reason to prove Theorem 2.2 we need
only to consider these terms. We observe that
J(1,B),1t0,N (u0) + J
(2,B),1t0,N (u0) = Rt0,N (u0) + R
(2)N , (64)
where have replaced Mt,N by Mt(tN ) leading to the remainder
R(2)N =
t0−p−1∑
k=0
λI − λF (u0)k
Mt0−k,N −Mt0−k(t0 − k
N)
.
By using (22) we have (E|R(2)N |r)1/r ≤ K
Nβ , for all t, N . Therefore under Assumption 2.1
with r > 2, if λN ≥ (log N)1+ε for some ε > 0, then by substituting (64) into (63) we have
at0,N − a(u0) = Lt0(u0) + Rt0,N (u0) + R(3)N , (65)
where (E|R(3)N |2)1/2 = O(δN ). In the proposition below we show asymptotic normality of
Lt0(u0), which we use together with (65) to obtain asymptotic normality of at0,N
Proposition 4.4 Suppose Assumption 2.1 holds with r = 1 and for some δ > 0, E(Z4+δ0 ) <
∞. Let Lt0(u0) and Σ(u) be defined as in (8) and (14) respectively. If |t0/N − u0| < 1/N ,
then we have
λ−1/2Lt0(u0)D→ N (0, Σ(u0)), (66)
where λ → 0 as N → ∞ and λN ≥ (log N)1+ε, for some ε > 0.
PROOF. Since Lt0(u0) is the weighted sum of martingale differences the result follows from
the martingale central limit theorem and the Cramer-Wold device. It is straightforward to
check the conditional Lindeberg condition, hence we omit this verification. We simply note
that by using (111) we obtain the limit of the conditional variance of Lt0(u0):
µ4
t0−p−1∑
k=0
λ2I − λF (u0)2kσt0−k(u0)4Xt0−k−1(u0)Xt0−k−1(u0)
T
|Xt0−k−1(u0)|41P→ Σ(u0). (67)
We use the proposition above to prove asymptotic normality of at0,N (after a bias correc-
tion).
PROOF of Theorem 2.2. It follows from (65) that
λ−1/2at0,N − a(u0) = λ−1/2Rt0,N (u0) + λ−1/2Lt0(u0) + Op(λ1/2δN ).
18
Therefore, if λ >> N−4β/(4β+1) and λ >> N−2β then λ−1/2
(Nλ)2β → 0 and λ−1/2
Nβ → 0 respec-
tively, and we have
λ−1/2at0,N − a(u0) = λ−1/2Rt0,N (u0) + λ−1/2Lt0(u0) + op(1).
By using the above and Proposition 4.4 we obtain (12). Finally, since |Rt0,N (u0)|1 =
Op(1
(Nλ)β ), if λ >> N− 2β
2β+1 then λ−1/2Rt0,N (u0)P→ 0 and we have (13).
5 An asymptotic expression for the bias under additional
regularity
In this section, under additional assumptions on the smoothness of the parameters a(u), we
obtain an exact expression for the bias, which we use to prove Theorem 2.3.
In the section above we have shown that δBt0,N ≈ Rt0,N (u0). Since Mt(u) is a function
on Xt(u)2 whose derivative exists, the derivative of Mt(u) also exists and is given by
Mt(u) = (Z2t − 1)
Ft−1(u)a(u) + Ft−1(u)a(u)
, (68)
where
Ft−1(u) = Y t−1(u)Y t−1(u)T + Y t−1(u)Y t−1(u)T , (69)
with Y t−1(u) = 1|Xt−1(u)|1
Xt−1(u) and
Y t−1(u) =−∑p
j=1 X2t−j(u)
[1 +∑p
j=1 Xt−j(u)2]2
1
Xt−1(u)2
. . .
Xt−p(u)2
+1
1 +∑p
j=1 Xt−j(u)2
0
X2t−1(u)
. . .
X2t−p(u)
It is interesting to note that, like Mt(u)t, Mt(u)t is a sequence of vector martingale
differences. We will use Mt(u) to refine the approximation Rt0,N (u0) and show Rt0,N (u0) ≈Rt0,N (u0), where
Rt0,N (u0) =
t0−1∑
k=0
λ (1 − λF (u0)) (t0 − k
N− u0)
Mt0−k(u0) + F (u0)a(u0)
. (70)
We use this to obtain Theorem 2.3.
Lemma 5.1 Suppose Assumption 2.2 holds with some r ≥ 2. Then we have
∣
∣Mt(u) − Mt(v)∣
∣
1≤ K|u − v|β′ |Z2
t + 1|Lt, (71)
where Lt =
1 +∑p
k=1 Wt−k
2, with (E(L
r/2t )2/r < ∞. and almost surely
Mt(v) = Mt(u) + (v − u)Mt(u) + (u − v)1+β′
Rt(u, v) (72)
where |Rt(u, v)| ≤ Lt.
19
PROOF. By using (68), a(u) ∈ Lip(1 + β′) and that |Ft−1(u)|1 ≤ (p + 1)2 we have
|Mt(u) − Mt(v)|1≤ K|Z2
t + 1|
|Ft−1(u) − Ft−1(v)|1 + |Ft−1(u) − Ft−1(v)|1 + supu
|Ft−1(u)|1|u − v|
.
(73)
In order to bound (73) we consider Ft(u) and its derivatives. We see that |Ft−1(u) −Ft−1(v)|1 ≤ K|u − v|Lt, thus bounding the first bound on the right hand side of (73). To
obtain the other bounds we use (69). Now by using (23) and Lemma 4.2 we have that
supu |Ft−1(u)|1 ≤ KLt and |Ft−1(u) − Ft−1(v)|1 ≤ KL2t |u − v|β. Altogether this verifies
(71). By using the Cauchy-Schwartz inequality we have that (E(Lr/2t )2/r < ∞.
Finally we prove (73). Since Lt is a well defined random variable almost surely all the
paths of Mt(u) ∈ Lip(1 + β′). Therefore, there exists a set N , such that P (N ) = 0 and for
every ω ∈ N c we can make a Taylor expansion of Mt(u, ω) about w and obtain (72).
We now use (72) to show that Rt0,N (u0) ≈ Rt0,N (u0).
Lemma 5.2 Suppose Assumption 2.2 holds with r ≥ 2. Then if |t0/N −u0| < 1/N we have
Rt0,N (u0) = Rt0,N (u0) + R(4)N , (74)
where E(|R(4)N |r)1/r = O( 1
(Nλ)1+β′ ).
PROOF. To obtain the result we make a Taylor expansion of a( t−kN ) and Mt−k(
t−kN ) about
u0, and substitute this into Rt0,N . By using Lemma 5.1 we obtain the desired result.
In the lemma below we consider the mean and variance of the bias and stochastic terms
Rt0,N (u0) and Lt0(u0), which we use to obtain an asymptotic expression for the bias. We
will use the following results. Since infu λmin(F (u)) > C > 0 (see (108)), we have
t−1∑
k=0
λI − λF (u)kF (u) → I, (75)
∥
∥
t−1∑
k=0
λ2I − λF (u)2k(k
N)2∥
∥ = O(1
N2λ), (76)
if λ → 0, tλ → ∞ as t → ∞.
Lemma 5.3 Suppose Assumption 2.2 holds with r > 4. Let Rt0,N (u0), Lt0(u0) and Σ(u)
be defined as in (8) and (14) respectively. Then if |t0/N − u0| < 1/N we have
E (Rt0,N (u0)) = − 1
NλF (u0)
−1a(u) + O(1
(Nλ)1+β′ ), (77)
20
var (Rt0,N (u0)) = O(1
N2λ+
1
N), (78)
var (Lt0(u0)) = λΣ(u0) + o(λ) (79)
and E (Lt0(u0)) = 0.
PROOF. Since ∂Mk(u)∂u are martingale differences, by using appling (74) to (75) we have
(77). We now show (78). By using (76), supu |∂a(u)∂u | < ∞ and supu |Ft(u)| < (p + 1)2 we
have
var
Rt0,N (u0) = µ4
t0−p−1∑
k=0
λ2I − λF (u0)2k(k
N)2var
Ft0−k−1(u0)a(u0) + Ft0−k−1(u0)a(u0)
+ O(1
N2)
= O(1
N2λ+
1
N2).
It is straightforward to show (79) using (111). Finally, since Lt0(u0) is the sum of martingale
differences, E(Lt0(u0)) = 0.
From the lemma above it immediately follows that
Rt0,N (u0) = − 1
NλF (u0)
−1a(u0) + Op
1
(Nλ)1+β′ +1
Nλ1/2
(80)
and (Nλ)Rt0,N (u0)ℓ2→ −F (u0)
−1a(u0). We now use the above prove Theorem 2.3.
PROOF of Theorem 2.3 By substituting (80) into (65) (with β = 1) we have
at0,N − a(u0) =−1
NλF (u0)
−1a(u0) + Lt0(u0) + δ′NR(5)N , (81)
where (E|R(5)N |2)1/2 < ∞ and δ′N = 1
(Nλ)1+β′ + 1(Nλ)2
+ 1λ1/2N
+λ+ 1N1/2 . By using the above
and var(Lt0(u0)) = λΣ(u0) we have
E|at0,N − a(u0)|2 = λ trΣ(u0) +1
(Nλ)2∣
∣F (u0)−1a(u0)
∣
∣
2+ O
([
(Nλ)−1 +√
λ + δ′N
]
δ′N
)
,
which gives (16). In order to prove (17) we use (81) and Proposition 4.4. We first note that
if λ−1/2/(Nλ)1+β′ → 0, then by using (81) we have
λ−1/2at0,N − a(u0) + λ−1/2 1
λNF (u0)
−1a(u0) = λ−1/2Lt0(u0) + op(1).
Therefore by using Proposition 4.4 we have (17).
21
6 A lower bound for the minimax rate
PROOF of Theorem 2.4 We derive a lower bound for the minimax rate using the van
Trees inequality (cf. Gill and Levit (1995) and Moulines et al. (2005)). In order to do this
we define a prior distribution on a subset of Cβ(L), which we then go on to bound using
the van Trees inequality.
Let us suppose the function φ : [0, 1] → R+ is such that ‖φ‖β ≤ L, supu |φ(u)| = 1 and
∫ 10 φ(u)2du = 1. Let IT = (1, . . . , 1) ∈ R
p+1, JT = (ρ1, 0, . . . , 0) ∈ Rp+1 and define the set
CwN =
J + ηφ(wn(· − u0))I; η ∈ [0, min(w−βN , p−1(1 − δ))]
.
It is straightforward to show for a large enough wN that CwN ⊆ Cβ(L). Hence if a(·) ∈ CwN ,
and are the parameters of the tvARCH process Xt,N, then Assumption 2.1(i,ii,iii) is
satisfied and E(X2t,N ) < (ρ1 + 1)/(1 − δ).
Let λ : [0, 1] → R+ be a density which is continuous with respect to the Lebesgue
measure and define λN (·) = wβNλ(wβ
N ·). It is clear that we can use λN (·) as a density on
the set CwN . For each a(·) = J + ηφ(wn(· − u0))I ∈ CwN we define the tvARCH process
Xt,N = σt,NZt σ2t,N = ρ1 + ηφ(wn(
t
N− u0))Xt−1,N ,
where X Tt−1,N = (1, X2
t−1,N , . . . , X2t−p,N ). Let Eη be the expectation under the measure
generated by the parameters a(·) = (J + ηφ(wn(· − u0))I) and EλNthe expectation under
the measure λN (·). Since CwN ⊆ Cβ(L) it can be shown that
infat0,N∈σ(Xt,N ;t=1,...,N)
supa(u)∈Cβ(L)
E[xT (at0,N − a(u0))(at0,N − a(u0))T x]
≥ EλNinf
at0,N∈σ(Xt,N ;t=1,...,N)Eη[x
T (at0,N − a(u0))(at0,N − a(u0))T x]. (82)
We will assume that φ(·) and ρ1 are known. Now by using the van Trees inequality we have
for all |x| ≤ 1 and at0,N ∈ σ(Xt,N ; t = 1, . . . , N), that
EλNEη[x
T (at0,N − a(u0))(at0,N − a(u0))T x] ≥ 1
I(λN ) + EλN(I(η))
, (83)
where
I(λN ) =
∫
λN (x)
(
d log λN (x)
dx
)2
dx
and
IN (η) =
∫ (
d log fη(x1,N , . . . , xN,N )
dη⌋η=η
)2
fη(x1,N , . . . , xN,N )dy1,N . . . , dy1,N
with fη(x1,N , . . . , xN,N ) the density of Xt,NNt=1. We now derive asymptotic expressions
for I(λN ) and I(η).
22
Using that P (Xt,N ≤ y|Xt−1,N ) = PZ(y/σ2t,N ), where PZ is the distribution function of
Zt. The log density of Xt,NNt=1 is
log fηxN,N , . . . , x1,N =N∑
t=p+1
log1
σt,NfZ
xt,N
σt,N
+ log fXp,N(x1,N , . . . , xp,N ),
where fZ is the density function of Zt, σt,N = [1 + ηφ(wN ( tN − u0))|xt−1,N |1]1/2, with
xt−1,N = (1, x2t−1,N , . . . , x2
t−p,N ) and fXp,Nis the density of Xp,N . Define
ℓt,N (ζ; xt,N , xt−1,N ) = −1
2log[ρ1 + ζφ(wN (
t
N− u0))|xt−1,N |1] −
xt,N
2[ρ1 + ζφ(wN ( tN − u0))|xt−1,N |1]1/2
.
Differentiating ℓt,N (ζ; xt,N , xt−1,N ) wrt. ζ and evaluating it at η, it is straightforward to see
that
2ℓt,N (η ; xt,N , xt−1,N ) = −φ(wN ( t
N − u0))|xt−1,N |1σ2
t,N
(
ρ1 + ztd log fZ(zt)
dzt
)
,
where zt = xt,N/σt,N . Now let us suppose
Ap,N = Eη
(∂ log fXp,NX1,N , . . . , Xp,N
∂η
)2< ∞.
It is well known that the derivative of the log conditional densities are a sequence of
martingale differences; that is ℓt,N (η; Xt,N ,Xt−1,N ) : t = p + 1, . . . , N, are martingale
differences. From this it follows that for t > p, we have
Eη
(
∂ log fXp,N(X1,N , . . . , Xp,N )
∂ηℓt,N (η; Xt,N ,Xt−1,N )
)
= 0.
Therefore, the corresponding Cramer-Rao bound is
IN (η) = Eη
∂ log fXp,N(X1,N , . . . , Xp,N )
∂η+
N∑
t=p+1
ℓt,N (η; Xt,N ,Xt−1,N )
2
=1
4
N∑
t=p+1
φwN (t
N− u0)2
E
[ |Xt−1,N |1σ2
t,N
(
ρ1 + Ztd log fZ(Zt)
dZt
)]2
+ Ap,N .
Since η → 0, as N → ∞, for large enough wN we have that w−βN pE(Z4
t )1/2 < 1, which implies
supt Eη(X4t,N ) < [(ρ1+w−β
N )E(Z4t )1/2/(1−pw−β
N E(Z4t )1/2)]2. Therefore, since |xt−1,N |/σ2
t,N <
|xt−1,N |/ρ1 we obtain
IN (η) ≤ 1
4sup
tEη(|Xt−1,N |21)
N∑
t=p+1
φwN (u0 −t
N)2Z + Ap,N
≤ Z(p + 1)(ρ1 + w−β
N )2E(Z4t )
4ρ1[(1 − pw−βN E(Z4
t )1/2)]2
N∑
t=p+1
φwN (u0 −t
N)2 + Ap,N
≤ K(Z, E(Z40 ), ρ1)
N
wN
∫ 1
0φ(x)2dx + o(
N
wN), (84)
23
where Z = E(ρ1 + Ztd log fZ(x)
dx ⌋x=Zt)2 and for a large enough wN , K(Z, E(Z4
0 ), ρ1) = Z(p +
1)(2ρ1)2E(Z4
t )
4ρ1[(1−E(Z4t )−1/2)]2
. We now bound I(λN ). It is clear that
I(λN ) = w2βN I(λ) where I(λ) =
∫ 1
0
∂ log λ(u)
∂u
2λ(u)du. (85)
Substituting (85) and (84) into (82) and (83) we obtain
infat0,N∈σ(Xt,N ;t=1,...,N)
supa(u)∈Cβ(L)
E[xT (at0,N − a(u0))(at0,N − a(u0))T x]
≥ 1
K(Z, E(Z40 ), ρ1)
NwN
∫ 10 φ(x)2dx + w2β
N I(λ)
By letting wN = O(N1/(1+2β)), the above is minimised and we obtain
infat0,N∈σ(Xt,N ;t=1,...,N)
supa(u)∈Cβ(L)
E[xT (at0,N − a(u0))(at0,N − a(u0))T x] ≥ K(Z, E(Z4
0 ), ρ1)N−2β/(2β+1),
where C(Z, E(Z40 ), ρ1) is a constant which depends only on Z, E(Z4
0 ), ρ1, thus giving the
required result.
A Appendix
A.1 Mixingale properties of φ(Xt,N)
Our object in this section is to prove Proposition 4.1. We do this by using the random
vector representation of the tvARCH process. Let
At(u) =
a1(u)Z2t a2(u)Z2
t . . . ap−1(u)Z2t ap(u)Z2
t
1 0 . . . 0 0
0 1 . . . 0 0...
.... . .
......
0 0 . . . 1 0
At(u) =
(
0 0Tp
0p At(u)
)
, (86)
bt(u)T = (1, a0(u)Z2t , 0T
p−1). By using the definition of the tvARCH process given in (1) we
have that the tvARCH vectors Xt,Nt satisfy the representation
Xt,N = At(t
N)X t−1,N + bt(
t
N). (87)
We note that (87) looks like a vector AR process, the difference is that At(tN ) is a ran-
dom matrix. However, similar to the vector AR case, it can be shown that the product∏t
k=0 Ak(kN ) decays exponentially. It is this property which we use to prove Proposition
4.1.
24
Let AN (t, j) =∏j−1
i=0 At−i(t−iN )
, A(u, t, j) =∏j−1
i=0 At−i(u)
, AN (t, 0) = Ip+1 and
A(u, t, 0) = Ip+1. By expanding (87) we have
X t,N = AN (t, k)X t−k,N +k−1∑
j=0
AN (t, j) b t−j(t − j
N). (88)
Suppose A is a (p + 1) × (p + 1) dimensional random matrix, with (i, j)th element Aij .
Let [A]q be a (p + 1) × (p + 1) dimensional matrix where
[A]q =(
E|Aqij |)1/q
; i, j = 1, . . . , p + 1
.
We will make frequent use of the following inequalities, which can easily be derived by
repeatedly applying Holder’s and Minkowski’s inequalities. Suppose A and X are indepen-
dent random matrices and vectors, respectively, then(
E∣
∣AX∣
∣
q)1/q ≤∣
∣[A]q[X]q∣
∣
1, and if X
has only positive elements then E(
|AX|1∣
∣X)
≤ E(
|[A]1X |1∣
∣X)
. Now by using Subba Rao
(2004), Example 2.3 and Corollary A.2, we have
‖[AN (t, k)]q‖ ≤ K ρk and ‖[A(u, t, k)]q‖ ≤ Kρk, (89)
where K is a finite constant.
PROOF of Proposition 4.1 We first prove (25). Let CN (t, k) :=∑k−1
j=0 AN (t, j) b t−j(t−jN ),
that is
X t,N = AN (t, k)X t−k,N + CN (t, k) (90)
with AN (t, k), CN (t, k) ∈ σ(Zt, . . . , Zt−k+1) and X t−k,N ∈ Ft−k. In particular we have
Eφ(
CN (t, k))
|Ft−k) = Eφ(
CN (t, k))
. (91)
Furthermore, by using Minkowski’s inequality it can be shown that
E(|AN (t, k)Xt−k,N |q|Ft−k)1/q ≤ K|[AN (t, k)]qXt−k,N |1.
The Lipschitz continuity of φ and (88) now imply |φ(
X t,N
)
−φ(
CN (t, k))
| ≤ K|AN (t, k)X t−k,N |.Therefore, by using the above we obtain
|Eφ(Xt,N )|Ft−k) − Eφ(Xt,N )|1 (92)
= |Et−kφ(Xt,N ) − φ(
CN (t, k))
|Ft−k) − Eφ(Xt,N ) − φ(
CN (t, k))
|1≤ K
(
|[AN (t, k)]1X t−k,N |1 + E|[AN (t, k)]1X t−k,N |1)
≤ K ‖[AN (t, k)]1‖|X t−k,N |1 + E(
‖[AN (t, k)]1‖|X t−k,N |1)
≤ K ‖[AN (t, k)]1‖(
|X t−k,N |1 + E|X t−k,N |1)
≤ Kρk(
1 + |X t−k,N |1)
25
since E|X t−k,N |1 is uniformly bounded thus giving (25). The proof of (26) is the same as
the proof above, so we omit details. (27) follows from (25) with the triangular inequality.
To prove (28) we use (90). Since
Eφ(
CN (t, k))
|Ft−k = Eφ(
CN (t, k))
= Eφ(
CN (t, k))
|Ft−k−j
we obtain as in (92)
|Eφ(Xt,N )|Ft−k − Eφ(Xt,N )|Ft−k−j|1≤ K E|[AN (t, k)]1X t−k,N |1|Ft−k + E|[AN (t, k)]1X t−k,N |1
∣
∣Ft−k−j≤ Kρk
(
|X t−k,N |1 + |X t−k,N |1)
which gives (28). To prove (29) we use (92). We first note that
[
E|CN (t, k)|q∣
∣Ft−k
]1/q ≤ K
k−1∑
j=0
∣
∣
∣
∣
[AN (t, j)]q[bt−j(t − j
N)]q
∣
∣
∣
∣
1
≤ K
1 − ρ
and by using Minkowski’s inequality and the equivalence of norms, there exists a constant
K independent of Xt,N such that
E(
|Xt,N |q∣
∣Ft−k
)1/q ≤
E(
|AN (t, k)Xt−k,N |q∣
∣Ft−k
)1/q+
E(
|CN (t, k)|q∣
∣Ft−k
)1/q.
Now by using
E(
|AN (t, k)Xt−k,N |q∣
∣Ft−k
)1/q ≤ Kρk|Xt−k,N |1 we have
E
|Xt,N |q∣
∣Ft−k
≤
Kρk|Xt−k,N |1 +K
1 − ρ
q ≤ K|Xt−k,N |q1,
hence we obtain (29)
We use the corollary below to prove Lemma A.7.
Corollary A.1 Suppose Assumption 2.1 holds with r = q0 and E(Z4q00 ) < ∞. Let f, g :
Rp+1 → R
(p+1)×(p+1) be Lipschitz continuous functions, such that for all positive x ∈ Rp+1,
|f(x)|abs ≤ Ip+1 and |g(x)|abs ≤ Ip+1, where Ip+1 is a p + 1× p + 1-dimensional matrix with
one in all the elements, and |A|abs = (|Ai,j | : i, j = 1, . . . , p + 1). Then for q ≤ q0 we have
[
E(
E
(Z2t−k+1 − 1)f(Xt−k,N )g(Xt(u))
∣
∣Ft−k−j
− E
(Z2t−k+1 − 1)f(Xt−k,N )g(Xt(u))
)q]1/q
≤ KE(Z40 ) + 1ρj
(
(E|Xt−k−j,N |q)1/q + (E|Xt−k−j(u)|q)1/q)
, (93)
and[
E(
E
(Z2t−k+1 − 1)f(Xt−k(u))g(Xt(u))
∣
∣Ft−k−j
− E
(Z2t−k+1 − 1)f(Xt−k(u))g(Xt(u))
)q]1/q
≤ KE(Z40 ) + 1ρj
(
(E|Xt−k−j(u)|q)1/q + (E|Xt−k−j(u)|q)1/q)
, (94)
for j, k ≥ 0, where ρ is such that (1 − η) < ρ < 1.
26
PROOF. We give the proof of (93) only, the proof of (94) is the same. We will use the same
notation introduced in the proof of Proposition 4.1 and let C(u, t, k) =∑k−1
j=0 A(u, t, j) b t−j(u),
that is Xt(u) = A(u, t, k + j)Xt−k−j(u) + C(u, t, k + j).
We will use (91). Since |f(x)|abs ≤ Ip+1 and |g(x)|abs ≤ Ip+1 we have
|(Z2t−k+1 − 1)
(
f(Xt−k,N )g(Xt(u)) − f(CN (t − k, j))g(C(u, t, k + j)))
|≤ |Z2
t−k+1 + 1|(
|AN (t − k, j)Xt−k−j,N |1 + |A(u, t, k + j)Xt−k−j(u)|1)
. (95)
Since A(u, t, k − 1), (Z2t−k + 1)At−k−1(u) and A(u, t − k, j) are independent matrices and
by using (89) we have
E∥
∥[(Z2t−k+1 + 1)A(u, t, k + j)]1
∥
∥ ≤ E∥
∥[A(u, t, k − 1)]1∥
∥××∥
∥[(Z2t−k+1 + 1)At−k+1(u)]1
∥
∥
1
∥
∥[A(u, t − k, j)]1∥
∥ ≤ KE(Z4t−k+1 + 1)ρk+j−1.
(96)
Considering the conditional expectation of (95), by using (96) and
|Et−k−j(Z2t−k+1 − 1)AN (t − k, j)Xt−k−j,N| ≤ Kρj |Xt−k−j,N |1 we have
|E[
(Z2t−k+1 − 1)
f(Xt−k,N )g(Xt(u)) − f(CN (t − k, j))g(C(u, t, k + j)
|Ft−k−1
]
|≤ Kρj |Xt−k−j,N |1E(Z2
t−k+1 + 1) + KE(Z4t−k+1 + 1)ρk+j−1|Xt−k−j(u)|1
≤ KE(Z40 + 1)ρj
(
|Xt−k−j,N |1 + |Xt−k−j(u)|1)
.
Using similar arguments we obtain the bound:
(
E∣
∣E[
(Z2t−k+1 − 1)
f(Xt−k,N )g(Xt(u)) − f(CN (t − k, j))g(C(u, t, k + j)]∣
∣
r)1/r
≤ KE(Z40 + 1)ρj
(
E(|Xt−k−j,N |r)1/r + (E|Xt−k−j(u)|r)1/r)
leading to the result.
A.2 Persistence of excitation
As we mentioned in Section 1, a crucial component in the analysis of many stochastic,
recursive algorithms is to establish persistence of excitation of the transition random matrix
in the algorithm: in our case this implies showing Theorem 4.1. Intuitively, it is clear
that the verification of this result depends on showing that the smallest eigenvalue of the
conditional expectation of of the semi-positive definite matrix Xt,NX Tt,N/‖Xt,N‖2
1 is bounded
away from zero. In particular this is one of the major conditions given in Moulines et al.
(2005), Theorem 16, where conditions were given under which persistence of excitation can
be established. Their result is for a quite general class of algorithms which satisfy
Y t = I − λAt(λ)Y t−1 + Bt.
In our case the random matrix is not a function of λ, so to reduce notation we state the
theorem below for the case where At(λ) = At. Let M+d be the space of all positive semi-
definite symmetric matrices.
27
Theorem A.1 (Moulines, Priouret and Roueff) Let (Ω,F , P, F : n ∈ N) be a filtered
space. Let φt, t ≥ 0 and Vt, t ≥ 0 be non-negative adaptive processes such that Vt ≥ 1
for all t ≥ 1. Let A := At : t ≥ 0 be an adapted M+d valued process. Let s be a positive
integer, and let R1 and µ1 be positive constants. Assume that
(a) there exist µ < 1 and B ≥ 1 such that for all t ∈ N,
E(Vt+s|Ft) ≤ µVtI(φt > R1) + BVtI(φt ≤ R1); (97)
(b) there exists an α1 > 0 such that for all t ∈ N;
E
λmin
(
t+s∑
l=t+1
Al
)
|Ft
≥ α1I(φt ≤ R1); (98)
(c) there exists a λ1 such that for all λ ∈ [0, λ1] and t ∈ N; ‖λAt‖lub,2 ≤ 1;
(d) there exist a q > 1 and C1 > 0 such that for all t ∈ N and λ ∈ [0, λ1]
I(φt ≤ R1)t+s∑
l=t+1
E(
‖Al‖q|Ft
)
≤ C1.
Then for any p ≥ 1 there exist C0, δ0 and µ0 > 0 such that, for all λ ∈ [0, λ1] and n ≥ 1,
we have
E(
‖n∏
i=1
(I − λAi)‖p|F0
)
≤ C0e−δ0λnV0.
We use this theorem to prove Theorem 4.1, where At := Ft,N , φt := |Xt,N |1 and Vt :=
|Xt,N |1. (97) is often referred to as the drift condition, and concerns the actual process
Xt,N. Roughly speaking, under this condition if |Xt,N |1 is large, then after some time
it should ‘drift’ back to the interval [0, R1]. On the other hand (98), places conditions on
the algorithm, in particular the random matrices Ft,Nt. This is when excitation occurs;
loosely this means if |Xt,N |1 ∈ [0, R1] then ‖∏s−1i=0 (I − λFt+i)‖ ≤ (1 − δλ). The analysis
can then be conducted for |Xt,N |1 in the interval [0, R1], the drift condition then is used to
extend the results over the whole real line.
Suppose X is a random variable and define Et(X) = E(X|Ft).
Lemma A.1 Suppose that Assumption 2.1 holds with r = 2. Then, we have
λmin
(
E
Xt(u)Xt(u)T |Ft−k
)
> C, (99)
and for N large enough
λmin
(
E
Xt,NX Tt,N |Ft−k
)
>C
2, (100)
for k ≥ (p + 1), where C is a finite constant independent of t, N and u.
28
PROOF. We will prove (100). The proof of (99) is similar.
Our object is to partition Xt,NX Tt,N as follows
Xt,NX Tt,N = ∆1 + ∆2, (101)
where ∆1 and ∆2 are positive-definite matrices, ∆1 ∈ σ(Zt, . . . , Zt−p) and ∆2 ∈ Ft,N . This
implies λminEt−k(Xt,NX Tt,N ) ≥ λminE(∆1), if k ≥ p + 1, which allows us to obtain
a uniform lower bound for λminEt−k(Xt,NX Tt,N ). To facilitate this we represent Xt,N in
terms of martingale vectors.
By using (1) we have
X2t,N = a0(
t
N) +
p∑
i=1
ai(t
N)X2
t−i,N + (Z2t − 1)σ2
t,N
and
Xt,N = Θ(t
N)Xt−1,N + (Z2
t − 1)σ2t,ND (102)
where D is a (p + 1)-dimensional vector with Di = 0 if i 6= 2 and D2 = 1 and
Θ(u) =
1 0 0 0 0 0
a0(u) a1(u) a2(u) . . . ap−1(u) ap(u)
0 1 0 . . . 0 0
0 0 1 . . . 0 0...
.... . .
......
0 0 0 . . . 1 0
.
Iterating (102) (p + 1)-times we have
Xt,N =
p∑
i=0
(Z2t−i − 1)σ2
t−i,Ni−1∏
j=0
Θ(t − j
N)D +
p∏
j=0
Θ(t − j
N)Xt−p−1,N .
Let µ4 = E(Z2t − 1)2, Θt,N (i) =
∏i−1j=0 Θ( t−j
N ) and Bt,N (i) = Θt,N (i)DDT Θt,N (i)T . Since
(Z2t − 1)σ2
t,Nt are martingale differences we have
E(Xt,NX Tt,N |Ft−k,N ) = µ4
p∑
i=0
Et−k(σ4t−i,N )Θt,N (i)DDT Θt,N (i)T + (103)
+Θt,N (p + 1)Et−k(Xt−p−1,NX Tt−p−1,N )Θt,N (p + 1)T , for k ≥ p + 1.
Since the matrices above are non-negative definite, we have that
λmin
E(Xt,NX Tt,N |Ft−k)
≥ λmin
µ4
p∑
i=0
Et−k(σ4t−i,N )Bt,N (i)
. (104)
29
We now refine µ4∑p
i=0 Et−k(σ4t−i,N )Bt,N (i) to obtain ∆1. By using (88) we have
Xt,N =
p∑
j=0
AN (t, j)bt−j(t − j
N) + AN (t, p + 1)Xt−p−1,N .
Therefore by using (1) and X2t−i,N = (Xt,N )i+1 we have
σ2t−i,N =
1
Z2t−i
p∑
j=0
AN (t, j)bt−j(t − j
N)
i+1+
1
Z2t−i
AN (t, p + 1)Xt−p−1,Ni+1
= Ht−i,N (t) + Gt−i,N (t), (105)
where Ht−i,N (t) ∈ σ(Zt−i, . . . , Zt−p) and Gt−i,N (t) ∈ Ft−i. Since Ht−i,N (t) and Gt−i,N (t)
are positive this implies with (104)
λminE(Xt,NX Tt,N |Ft−k) ≥ λmin
Pt,N
,
with
Pt,N = µ4
p∑
i=0
E(Ht−i,N (t)2)Bt,N (i).
To bound this we define the corresponding terms for the stationary approximation Xt(u).
We set
P (u) = µ4
p∑
i=0
E(Ht−i(u, t)2)B(u, i),
where
Ht−i(u, t) =1
Z2t−i
p∑
j=0
A(u, t, j)bt−j(u)
i+1
and B(u, i) = Θ(u)iDDT (Θ(u)i)T . A close inspection of the above calculation steps reveals
that
P (u) = E
Y p(u)Y p(u)T
with Y p(u) from Assumption 2.1(iv). Therefore,
λmin
P (t
N)
≥ infu
λminP (u) ≥ C.
Since aj(·)j is β-Lipschitz we have
|E(Ht−i(t
N, t)2) − E(Ht−i,N (t)2)|1 ≤ K
Nβ
|Bt,N (i) − B(t
N, i)|1 ≤ K
Nβ, for i = 0, . . . , p − 1.
30
Therefore ‖Pt,N − P ( tN )‖ ≤ |Pt,N − P ( t
N )|1 ≤ KNβ which leads to
λminPt,N = λminP (t
N) + [Pt,N − P (
t
N)] ≥ λminP (
t
N) − ‖Pt,N − P (
t
N)‖ ≥ C − K
Nβ
and therefore to
λmin
(
E
Xt,NX Tt,N |Ft−k,N
)
> C − K
Nβ. (106)
Thus for N large enough we have (100). (99) follows in the same way.
Lemma A.2 Suppose that Assumption 2.1 holds with $r = 4$. Then, for $k \geq p+1$ and $N$ large enough, we have
\[
\lambda_{\min}\Big(\mathrm{E}\Big\{\frac{\underline{X}_{t,N}\underline{X}_{t,N}^T}{|\underline{X}_{t,N}|_1^2}\Big|\mathcal{F}_{t-k}\Big\}\Big) > \frac{C}{|\underline{X}_{t-k,N}|_1^4} \qquad (107)
\]
and
\[
\lambda_{\min}\Big(\mathrm{E}\Big\{\frac{\underline{X}_t(u)\underline{X}_t(u)^T}{|\underline{X}_t(u)|_1^2}\Big\}\Big) > C, \qquad (108)
\]
where $C$ is a constant independent of $t$, $N$ and $u$.
PROOF. We first prove (107). By definition
\[
\lambda_{\min}\Big(\mathrm{E}_{t-k}\Big\{\frac{\underline{X}_{t,N}\underline{X}_{t,N}^T}{|\underline{X}_{t,N}|_1^2}\Big\}\Big) = \inf_{|x|=1}\mathrm{E}_{t-k}\Big\{\Big(\frac{x^T\underline{X}_{t,N}}{|\underline{X}_{t,N}|_1}\Big)^2\Big\}.
\]
Since
\[
(x^T\underline{X}_{t,N})^2 = \frac{x^T\underline{X}_{t,N}}{|\underline{X}_{t,N}|_1}\cdot x^T\underline{X}_{t,N}\,|\underline{X}_{t,N}|_1
\]
and $\sup_{|x|=1}|x^T\underline{X}_{t,N}| \leq |\underline{X}_{t,N}|_1$, we obtain, by using Cauchy's inequality and (29),
\begin{eqnarray*}
\mathrm{E}_{t-k}(x^T\underline{X}_{t,N})^2 &\leq& \Big[\mathrm{E}_{t-k}\Big\{\Big(\frac{x^T\underline{X}_{t,N}}{|\underline{X}_{t,N}|_1}\Big)^2\Big\}\Big]^{1/2}\big[\mathrm{E}_{t-k}\{(x^T\underline{X}_{t,N})^2|\underline{X}_{t,N}|_1^2\}\big]^{1/2} \\
&\leq& \Big[\mathrm{E}_{t-k}\Big\{\Big(\frac{x^T\underline{X}_{t,N}}{|\underline{X}_{t,N}|_1}\Big)^2\Big\}\Big]^{1/2}\big\{\mathrm{E}_{t-k}(|\underline{X}_{t,N}|_1^4)\big\}^{1/2} \\
&\leq& \Big[\mathrm{E}_{t-k}\Big\{\Big(\frac{x^T\underline{X}_{t,N}}{|\underline{X}_{t,N}|_1}\Big)^2\Big\}\Big]^{1/2}\big\{K|\underline{X}_{t-k,N}|_1^4\big\}^{1/2}.
\end{eqnarray*}
Therefore, by using the above and Lemma A.1, for large $N$ we obtain
\[
\inf_{|x|=1}\mathrm{E}_{t-k}\Big\{\Big(\frac{x^T\underline{X}_{t,N}}{|\underline{X}_{t,N}|_1}\Big)^2\Big\} \geq \inf_{|x|=1}\frac{\big[\mathrm{E}_{t-k}(x^T\underline{X}_{t,N})^2\big]^2}{\mathrm{E}_{t-k}(|\underline{X}_{t,N}|_1^4)} \geq \frac{C}{|\underline{X}_{t-k,N}|_1^4},
\]
where $C$ is a positive constant, thus giving (107). To prove (108) we use (99) to get
\[
\mathrm{E}\{\underline{X}_t(u)\underline{X}_t(u)^T\} = \mathrm{E}_{-\infty}\{\underline{X}_t(u)\underline{X}_t(u)^T\} > C,
\]
and using the arguments above we have
\[
\lambda_{\min}\Big(\mathrm{E}\Big\{\frac{\underline{X}_t(u)\underline{X}_t(u)^T}{|\underline{X}_t(u)|_1^2}\Big\}\Big) > \frac{C}{\mathrm{E}(|\underline{X}_{t-k}(u)|_1^4)}.
\]
By Lemma 4.1, $\sup_u\mathrm{E}(|\underline{X}_t(u)|_1^4) < \infty$, which leads to (108).
Corollary A.2 Suppose Assumption 2.1 holds with $r = 4$. Let $F(u)$ be defined as in (10). Then there exist $C$ and $\lambda_1$ such that for all $\lambda \in [0,\lambda_1]$ and $u \in (0,1]$ we have
\[
\lambda_{\max}\{I - \lambda F(u)\} \leq (1-\lambda C). \qquad (109)
\]
There exist $0 < \delta \leq C$ and $K$ such that for all $k$
\[
\|\{I - \lambda F(u)\}^k\| \leq K(1-\lambda\delta)^k. \qquad (110)
\]
Furthermore,
\[
\sum_{k=0}^{t-1}\lambda\{I - \lambda F(u)\}^{2k} \to \frac{1}{2}F(u)^{-1} \qquad (111)
\]
as $\lambda \to 0$ and $\lambda t \to \infty$ (with $t \to \infty$).
PROOF. (109) follows directly from (108). Furthermore, since $(I - \lambda F(u))$ is a symmetric matrix, we have $\|\{I - \lambda F(u)\}^k\| \leq \|I - \lambda F(u)\|^k \leq (1-\lambda\delta)^k$, that is (110).
We now prove (111). We have
\[
\lambda\sum_{k=0}^{t-1}\{I - \lambda F(u)\}^{2k} = \lambda\big[I - \{I - \lambda F(u)\}^2\big]^{-1}\big[I - \{I - \lambda F(u)\}^{2t}\big].
\]
Since $\lambda_{\min}\{F(u)\} > C$ for some $C > 0$, we have $|\{I - \lambda F(u)\}^{2t}|_1 \leq \|\{I - \lambda F(u)\}^{2t}\|\,|I_{p+1}|_1 \to 0$ as $\lambda \to 0$, $\lambda t \to \infty$ and $t \to \infty$. Furthermore, $\lambda[I - \{I - \lambda F(u)\}^2]^{-1} \to \frac{1}{2}F(u)^{-1}$. Together they give (111).
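The limit (111) is easy to check numerically. In the following Python sketch the matrix $F$ is an arbitrary symmetric positive definite matrix (an illustrative stand-in, not one computed from a tvARCH model), and the partial sums are evaluated for decreasing $\lambda$ with $\lambda t$ large.

\begin{verbatim}
import numpy as np

# Numerical illustration of (111): for symmetric positive definite F,
# lambda * sum_{k=0}^{t-1} (I - lambda F)^{2k} approaches F^{-1}/2 as
# lambda -> 0 with lambda*t -> infinity.  F is an illustrative matrix.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
F = Q @ np.diag([0.5, 1.0, 2.0]) @ Q.T      # eigenvalues in [0.5, 2]

def partial_sum(lam, t):
    M = np.eye(3) - lam * F
    S, P = np.zeros((3, 3)), np.eye(3)
    for _ in range(t):                      # accumulate (I - lam F)^{2k}
        S += P
        P = P @ M @ M
    return lam * S

target = 0.5 * np.linalg.inv(F)
for lam, t in [(0.1, 10**3), (0.01, 10**4), (0.001, 10**5)]:
    print(lam, np.linalg.norm(partial_sum(lam, t) - target))  # error shrinks
\end{verbatim}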
Lemma A.3 Suppose that Assumption 2.1 holds with $r = 4$. For sufficiently large $N$ and for every $R > 0$ there exist $s_0 > p+1$ and $C_1$ such that, for all $s \geq s_0$ and $t = 1,\ldots,N-s$, we have
\[
\mathrm{E}\Big\{\lambda_{\min}\Big(\sum_{k=t}^{t+s-1}\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big)\Big|\mathcal{F}_t\Big\} \geq C_1\,I(|\underline{X}_{t,N}|_1 \leq R),
\]
where $I$ denotes the indicator function.
PROOF. The result can be proved using the methods given in Moulines et al. (2005), Lemma 19; we outline the proof. By using $\lambda_{\min}(A) \geq \lambda_{\min}(B) - \|A - B\|$ (see Moulines et al. (2005), Lemma 19) we have
\begin{eqnarray*}
\mathrm{E}_t\Big\{\lambda_{\min}\Big(\sum_{k=t}^{t+s-1}\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big)\Big\} &\geq& \sum_{k=t}^{t+s-1}\lambda_{\min}\Big(\mathrm{E}_t\Big\{\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big\}\Big) \\
&& -\;\mathrm{E}_t\Big\|\sum_{k=t}^{t+s-1}\Big[\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2} - \mathrm{E}_t\Big\{\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big\}\Big]\Big\|.
\end{eqnarray*}
We now evaluate an upper bound for the second term above. Let
\[
\Delta_{k,N} = \frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2} - \mathrm{E}_t\Big\{\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big\};
\]
then
\[
\mathrm{E}_t\Big\|\sum_{k=t}^{t+s-1}\Delta_{k,N}\Big\|^2 \leq \mathrm{E}_t\Big|\sum_{k=t}^{t+s-1}\Delta_{k,N}\Big|_2^2 \leq \sum_{i,j=1}^{p+1}\sum_{k_1,k_2=t}^{t+s-1}\mathrm{E}_t\{(\Delta_{k_1,N})_{i,j}(\Delta_{k_2,N})_{i,j}\}. \qquad (112)
\]
We now bound $\mathrm{E}_t\{(\Delta_{k_1,N})_{i,j}(\Delta_{k_2,N})_{i,j}\}$. For $k_1 = k_2 = k$ we obtain with (29)
\[
\mathrm{E}_t\{(\Delta_{k,N})_{i,j}^2\} \leq 2\mathrm{E}_t(|\underline{X}_{k,N}|_1^4) \leq K|\underline{X}_{t,N}|_1^4. \qquad (113)
\]
Now let $k_1 \neq k_2$. Let $\phi(\underline{x}) = \underline{x}\,\underline{x}^T/|\underline{x}|_1^2$, where $\underline{x} = (1,x_1,\ldots,x_p)$. Since $\phi \in \mathrm{Lip}(1)$, by using (28) we have for $k_1 < k_2$:
\[
|\mathrm{E}_{k_1}\{(\Delta_{k_2,N})_{i,j}\}| = \Big|\mathrm{E}_{k_1}\Big\{\frac{(\underline{X}_{k_2,N}\underline{X}_{k_2,N}^T)_{i,j}}{|\underline{X}_{k_2,N}|_1^2}\Big\} - \mathrm{E}_t\Big\{\frac{(\underline{X}_{k_2,N}\underline{X}_{k_2,N}^T)_{i,j}}{|\underline{X}_{k_2,N}|_1^2}\Big\}\Big| \leq K\rho^{k_2-k_1}\big(\mathrm{E}_t(|\underline{X}_{k_1,N}|_1) + |\underline{X}_{k_1,N}|_1\big). \qquad (114)
\]
By using (29), (113) and (114) we have
\begin{eqnarray*}
|\mathrm{E}_t\{(\Delta_{k_1,N})_{i,j}(\Delta_{k_2,N})_{i,j}\}| &\leq& |\mathrm{E}_t\{\mathrm{E}_{k_1}[(\Delta_{k_1,N})_{i,j}(\Delta_{k_2,N})_{i,j}]\}| \leq |\mathrm{E}_t\{(\Delta_{k_1,N})_{i,j}\mathrm{E}_{k_1}[(\Delta_{k_2,N})_{i,j}]\}| \\
&\leq& \big[\mathrm{E}_t\{(\Delta_{k_1,N})_{i,j}^2\}\big]^{1/2}\big[\mathrm{E}_t\{(\mathrm{E}_{k_1}[(\Delta_{k_2,N})_{i,j}])^2\}\big]^{1/2} \\
&\leq& K\rho^{k_2-k_1}|\underline{X}_{t,N}|_1^3 \leq K\rho^{k_2-k_1}|\underline{X}_{t,N}|_1^4. \qquad (115)
\end{eqnarray*}
Substituting (115) into (112) we obtain
\[
\mathrm{E}_t\Big\|\sum_{k=t}^{t+s-1}\Delta_{k,N}\Big\|^2 \leq Ks|\underline{X}_{t,N}|_1^4, \qquad (116)
\]
where $K$ is a constant independent of $s$. Therefore, by using (107) and (116), we have for $N$ sufficiently large
\[
\mathrm{E}_t\Big\{\lambda_{\min}\Big(\sum_{k=t}^{t+s-1}\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big)\Big\} \geq \frac{Cs}{2|\underline{X}_{t,N}|_1^4} - K\big\{s|\underline{X}_{t,N}|_1^4\big\}^{1/2}.
\]
Therefore for any $R > 0$,
\begin{eqnarray*}
\mathrm{E}_t\Big\{\lambda_{\min}\Big(\sum_{k=t}^{t+s-1}\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big)\Big\} &\geq& \Big\{\frac{Cs}{2R^4} - Ks^{1/2}R^2\Big\}I(|\underline{X}_{t,N}|_1 \leq R) \\
&=& \frac{s^{1/2}}{R^4}\big(Cs^{1/2} - KR^2\big)I(|\underline{X}_{t,N}|_1 \leq R)
\end{eqnarray*}
(the values of the generic constants $C$ and $K$ may change from line to line). Now choose $s_0$ and the corresponding $C_1$ such that
\[
C_1 = \frac{s_0^{1/2}}{R^4}\big(Cs_0^{1/2} - KR^2\big) > 0.
\]
Then it is clear that for $s > s_0$ we have
\[
\mathrm{E}_t\Big\{\lambda_{\min}\Big(\sum_{k=t}^{t+s-1}\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big)\Big\} \geq \frac{s^{1/2}}{R^4}\big(Cs^{1/2} - KR^2\big)I(|\underline{X}_{t,N}|_1 \leq R) \geq C_1\,I(|\underline{X}_{t,N}|_1 \leq R),
\]
thus we have the result.
PROOF of Theorem 4.1. We prove the result by applying Theorem A.1 to $\{A_l = F_{l,N},\ \mathcal{F}_l = \sigma(Z_l, Z_{l-1}, \ldots) : p+1 \leq l \leq N\}$ and verifying the conditions in Theorem A.1; Theorem 4.1 then follows immediately. Let $\phi_l := |\underline{X}_{l,N}|_1$, $V_l := |\underline{X}_{l,N}|_1$ and $A_l := F_{l,N} = \underline{X}_{l,N}\underline{X}_{l,N}^T/|\underline{X}_{l,N}|_1^2$. Let $k \geq p+1$ and $1 \leq s \leq N-k$. Then by using (27) we have $\mathrm{E}(V_{t+s}|\mathcal{F}_t) \leq K(1 + \rho^s|\underline{X}_{t,N}|_1)$. Therefore for any $R_1$ and $s > 0$ we have
\[
\mathrm{E}(V_{t+s}|\mathcal{F}_t) \leq K\rho^s|\underline{X}_{t,N}|_1 + \frac{K}{R_1}I(|\underline{X}_{t,N}|_1 > R_1)|\underline{X}_{t,N}|_1 + K\,I(|\underline{X}_{t,N}|_1 \leq R_1).
\]
Thus we have
\[
\mathrm{E}(V_{t+s}|\mathcal{F}_t) \leq \Big(K\rho^s + \frac{K}{R_1}\Big)I(|\underline{X}_{t,N}|_1 > R_1)|\underline{X}_{t,N}|_1 + K(\rho^s+1)I(|\underline{X}_{t,N}|_1 \leq R_1)|\underline{X}_{t,N}|_1.
\]
By choosing an appropriate $R_1$ we can find an $s_1$ such that for all $s \geq s_1$ we have $K\rho^s + K/R_1 < 1$, and thus condition (a) is satisfied. Condition (b) follows directly from Lemma A.3. Let $I_{p+1}$ be the $(p+1)\times(p+1)$ matrix with $(I_{p+1})_{ij} = 1$ for $1 \leq i,j \leq p+1$. Since $\|F_{l,N}\| = \underline{X}_{l,N}^T\underline{X}_{l,N}/|\underline{X}_{l,N}|_1^2 \leq 1$, for $\lambda < 1$ we have $\lambda\|F_{l,N}\| < 1$; hence condition (c) is satisfied. Finally, by using the above, for any $q \geq 1$ and $t \in \{p+1,\ldots,N-s_1\}$ we have
\[
\sum_{l=t}^{t+s_1}\mathrm{E}(\|F_{l,N}\|^q|\mathcal{F}_t) \leq k_1(p+1)^{2q}.
\]
Therefore condition (d) is satisfied and Theorem 4.1 follows from Theorem A.1.
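For orientation, $F_{l,N}$ and $M_{l,N}$ are the design matrix and noise term of a normalised stochastic-gradient (LMS-type) recursion. The following Python sketch illustrates such an update on simulated tvARCH(1) data; the parameter curves, the step size $\lambda$ and the exact form of the update are illustrative assumptions rather than a verbatim transcription of the algorithm in Section 2.

\begin{verbatim}
import numpy as np

# A minimal sketch of a normalised stochastic-gradient (LMS-type) update
# built from the regressor X_{t-1} = (1, X_{t-1}^2)' and the normalisation
# |X_{t-1}|_1^2 appearing in F_{l,N}.  Parameter curves, step size and the
# exact update rule are illustrative assumptions.
rng = np.random.default_rng(3)
N, lam = 4000, 0.05
a0 = lambda u: 0.3 + 0.2 * u            # assumed tvARCH(1) curves
a1 = lambda u: 0.2 + 0.3 * u

X2 = np.zeros(N)
for t in range(1, N):                   # simulate X_{t,N}^2
    sig2 = a0(t / N) + a1(t / N) * X2[t - 1]
    X2[t] = rng.standard_normal()**2 * sig2

a_hat = np.zeros(2)                     # running estimate of (a0, a1)
for t in range(1, N):
    X = np.array([1.0, X2[t - 1]])      # regressor (1, X_{t-1}^2)
    err = X2[t] - a_hat @ X             # prediction error for X_t^2
    a_hat += lam * err * X / np.sum(X)**2   # normalised gradient step

print("estimate:", a_hat, "  truth at u=1:", (a0(1.0), a1(1.0)))
\end{verbatim}

A smaller step size averages over more observations, reducing the variance of the estimate, but reacts more slowly to changes in the parameter curves.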
A.3 The lower order terms in the perturbation expansion
In this section we prove the auxiliary results required in Section 4.2, where we showed that the second order terms in the perturbation expansion are of lower order than the principal terms. The analysis of $J_{t_0,N}^{x,2}$ is based on partitioning it into two terms,
\[
J_{t_0,N}^{x,2} = A_{t_0,N}^x + B_{t_0,N}^x,
\]
similar to the partition in (54). $A_{t_0,N}^x$ is the weighted sum of the differences $F_{k,N} - F_k(u)$, whereas $B_{t_0,N}^x$ is the weighted sum of the differences between the stationary approximation $F_k(u_0)$ and $\mathrm{E}(F_0(u_0))$, that is, of $\bar F_k(u_0) = F_k(u_0) - F(u_0)$. In this section we evaluate bounds for these two terms. We use the following lemma.
Lemma A.4 Suppose Assumption 2.1 holds with some $r \geq 1$. Let $M_{t,N}$ and $M_t(u)$ be defined as in (30) and let $D_{t,N}$ be defined as in (46). Then we have
\[
\frac{\underline{X}_{t,N}\underline{X}_{t,N}^T}{|\underline{X}_{t,N}|_1^2} = \frac{\underline{X}_t(u)\underline{X}_t(u)^T}{|\underline{X}_t(u)|_1^2} + R_{t,N}(u) \qquad (117)
\]
and $M_{t,N} = M_t(u) + (1+Z_t^2)R_{t,N}(u)$, where
\[
|R_{t,N}(u)| \leq \Big\{\Big|\frac{t}{N}-u\Big|^\beta + \Big(\frac{p}{N}\Big)^\beta + \frac{1}{N^\beta}\Big\}D_{t,N}.
\]
Further, for $q \leq q_0$ there exists a constant $K$, independent of $t$, $N$ and $u$, such that
\[
\big(\mathrm{E}|R_{t,N}(u)|^q\big)^{1/q} \leq K\Big(\Big|\frac{t}{N}-u\Big|^\beta + \Big(\frac{p+1}{N}\Big)^\beta\Big). \qquad (118)
\]
PROOF. The proof uses (22) and the method given in Dahlhaus and Subba Rao (2006), Lemma A.4.
We now give a bound for a general $A_{t,N}^x$.
Lemma A.5 Suppose Assumption 2.1 holds with $r = q_0$ and let $\{G_{t,N}\}$ be a random process which satisfies $\sup_{t,N}\|G_{t,N}\|_{\mathrm{E}^{q_0}} < \infty$. Let $F_{t,N}$, $F_t(u)$ and $F(u)$ be defined as in (31) and (10), respectively. Then, if $|\frac{t_0}{N} - u_0| < 1/N$ and $q \leq q_0$, we have
\[
\Big(\mathrm{E}\Big|\sum_{k=0}^{t_0-p-1}\{I-\lambda F(u_0)\}^k\big\{F_{t_0-k-1,N} - F_{t_0-k-1}(u_0)\big\}G_{t_0-k,N}\Big|^{q/2}\Big)^{2/q} \leq \frac{K}{(\delta N\lambda)^\beta}\sup_{t,N}\big(\mathrm{E}|G_{t,N}|^q\big)^{1/q}, \qquad (119)
\]
where $K$ is a finite constant.
PROOF. By using (110) and (117) we get the result.
We now consider the second term $B_{t_0,N}^x$. A crude way to bound $\mathrm{E}|B_{t_0,N}^x|^q$ is to use Minkowski's inequality and take the norm directly inside the sum; however, with this method we do not obtain the desired lower order. To overcome this problem we use Burkholder-type inequalities. We now state a generalisation of Proposition B.3 in Moulines et al. (2005), which can be proved by adapting the proof of that proposition.
Lemma A.6 Suppose $\{M_t\}$ and $\{F_t\}$ are random matrices and $F$ is a positive definite, deterministic matrix with $\lambda_{\min}(F) > \delta$ for some $\delta > 0$. Let $\mathcal{F}_t = \sigma(F_t, M_t, F_{t-1}, M_{t-1}, \ldots)$. We assume for some $q \geq 2$:
(i) $\{F_t\}$ are identically distributed with mean zero.
(ii) $\mathrm{E}(M_t|\mathcal{F}_{t-1}) = 0$.
(iii) $\big(\mathrm{E}|\mathrm{E}(F_t|\mathcal{F}_k)|^{2q}\big)^{1/(2q)} \leq K\rho^{t-k}$.
(iv) $\big(\mathrm{E}|\mathrm{E}(M_tF_s|\mathcal{F}_k) - \mathrm{E}(M_tF_s)|^q\big)^{1/q} \leq K\rho^{s-k}$ if $k \leq s \leq t$.
(v) $\sup_t\big(\mathrm{E}|M_t|^{2q}\big)^{1/(2q)} < \infty$ and $\big(\mathrm{E}|F_t|^{2q}\big)^{1/(2q)} < \infty$.
Then we have
\[
\Big(\mathrm{E}\Big|\sum_{k=0}^{t-p-1}\sum_{i=0}^{t-p-k-2}(I-\lambda F)^{k+i}F_{t-k-1}M_{t-k-i-1}\Big|^q\Big)^{1/q} \leq \frac{K}{\delta\lambda}, \qquad (120)
\]
where $K$ is a finite constant.
PROOF. We note that since $\lambda_{\min}(F) \geq \delta > 0$,
\[
\|(I-\lambda F)^k\| \leq K(1-\lambda\delta)^k. \qquad (121)
\]
A change of variables in (120) gives
\[
\sum_{k=0}^{t-p-1}\sum_{i=0}^{t-p-k-2}(I-\lambda F)^{k+i}F_{t-k-1}M_{t-k-i-1} = \sum_{k=p}^{t-1}\sum_{i=p+1}^{k}(I-\lambda F)^{t-i-1}F_kM_i = \sum_{k=p}^{t-1}(I-\lambda F)^{t-k-1}F_k\sum_{i=p+1}^{k}(I-\lambda F)^{k-i}M_i
\]
(using the convention that the empty sum $\sum_{i=p+1}^{p}$ is zero). Let $\phi_k = (I-\lambda F)^{t-k-1}F_k\sum_{i=p+1}^{k}(I-\lambda F)^{k-i}M_i$. Then we have
\[
\Big(\mathrm{E}\Big|\sum_{k=p}^{t-1}\sum_{i=p+1}^{k}(I-\lambda F)^{t-i-1}F_kM_i\Big|^q\Big)^{1/q} \leq K\Big\{\Big|\sum_{k=p}^{t-1}\mathrm{E}(\phi_k)\Big| + \Big(\mathrm{E}\Big|\sum_{k=p}^{t-1}[\phi_k - \mathrm{E}(\phi_k)]\Big|^q\Big)^{1/q}\Big\}. \qquad (122)
\]
We first consider $|\sum_{k=p}^{t-1}\mathrm{E}(\phi_k)|$. Since $\mathrm{E}(F_kM_i) = \mathrm{E}(M_i\mathrm{E}(F_k|\mathcal{F}_i))$ under the stated assumptions, and using the norm inequality $|Ax| \leq \|A\|\,|x|$, we obtain
\[
|\mathrm{E}\{(I-\lambda F)^{t-i-1}F_kM_i\}| \leq K(1-\lambda\delta)^{t-i-1}\mathrm{E}\big(|M_i|\,|\mathrm{E}(F_k|\mathcal{F}_i)|\big) \leq K\rho^{k-i}\mathrm{E}|M_i| \leq K\rho^{k-i}.
\]
Substituting the above into $\mathrm{E}(\phi_k)$ gives the bound
\[
|\mathrm{E}(\phi_k)| \leq K(1-\lambda\delta)^{t-k-1}\sum_{i=p+1}^{k}\rho^{k-i} \leq KS(\rho)(1-\lambda\delta)^{t-k-1},
\]
where $S(\rho) = 1/(1-\rho)$. Thus
\[
\Big|\sum_{k=p}^{t-1}\mathrm{E}(\phi_k)\Big| \leq K\sum_{k=p}^{t-1}(1-\lambda\delta)^{t-k-1}S(\rho) = O(\lambda^{-1}). \qquad (123)
\]
We now consider the second term on the right-hand side of (122), $\big(\mathrm{E}|\sum_{k=p}^{t-1}\{\phi_k - \mathrm{E}(\phi_k)\}|^q\big)^{1/q}$. Let $\bar\phi_k = \phi_k - \mathrm{E}(\phi_k)$. By using the generalised Burkholder inequality (see Dedecker and Doukhan (2003)) we have
\[
\Big(\mathrm{E}\Big|\sum_{k=p}^{t-1}\bar\phi_k\Big|^q\Big)^{1/q} \leq \Big\{2q\sum_{k=p}^{t-1}\big(\mathrm{E}|\bar\phi_k|^q\big)^{1/q}\sum_{j=k}^{t-1}\big(\mathrm{E}|\mathrm{E}(\bar\phi_j|\mathcal{F}_k)|^q\big)^{1/q}\Big\}^{1/2}. \qquad (124)
\]
We treat the two summands on the right-hand side of (124) separately. By using Hölder's inequality, the norm inequality and (121) we have
\begin{eqnarray*}
\big(\mathrm{E}|\bar\phi_k|^q\big)^{1/q} &=& \Big(\mathrm{E}\Big|(I-\lambda F)^{t-k-1}\sum_{i=p+1}^{k}(I-\lambda F)^{k-i}\{M_iF_k - \mathrm{E}(M_iF_k)\}\Big|^q\Big)^{1/q} \\
&\leq& 2K(1-\lambda\delta)^{t-k-1}\Big(\mathrm{E}\Big|\sum_{i=p+1}^{k}(1-\lambda\delta)^{k-i}M_iF_k\Big|^q\Big)^{1/q} \\
&\leq& 2K(1-\lambda\delta)^{t-k-1}\big(\mathrm{E}|F_k|^{2q}\big)^{1/(2q)}\Big(\mathrm{E}\Big|\sum_{i=p+1}^{k}(1-\lambda\delta)^{k-i}M_i\Big|^{2q}\Big)^{1/(2q)}. \qquad (125)
\end{eqnarray*}
By applying Burkholder's inequality to $(\mathrm{E}|\sum_{i=p+1}^{k}(1-\lambda\delta)^{k-i}M_i|^{2q})^{1/(2q)}$, we have
\[
\Big(\mathrm{E}\Big|\sum_{i=p+1}^{k}(1-\lambda\delta)^{k-i}M_i\Big|^{2q}\Big)^{1/(2q)} \leq \Big\{2q\sum_{i=p+1}^{k}(1-\lambda\delta)^{2(k-i)}\big(\mathrm{E}|M_i|^{2q}\big)^{1/q}\Big\}^{1/2}.
\]
Substituting the above into (125) gives
\[
\big(\mathrm{E}|\bar\phi_k|^q\big)^{1/q} \leq 2K(1-\lambda\delta)^{t-k-1}\sup_k\big(\mathrm{E}|F_k|^{2q}\big)^{1/(2q)}\sup_i\big(\mathrm{E}|M_i|^{2q}\big)^{1/(2q)}\Big\{2q\sum_{i=0}^{k-p-1}(1-\lambda\delta)^{2i}\Big\}^{1/2} \leq \frac{K}{\lambda^{1/2}}(1-\lambda\delta)^{t-k-1}. \qquad (126)
\]
We now consider $\sum_{j=k}^{t-1}\big(\mathrm{E}|\mathrm{E}(\bar\phi_j|\mathcal{F}_k)|^q\big)^{1/q}$. We treat the cases $M_i \in \mathcal{F}_k$ and $M_i \notin \mathcal{F}_k$ separately, and obtain the following partition:
\begin{eqnarray*}
\sum_{j=k}^{t-1}\|\mathrm{E}(\bar\phi_j|\mathcal{F}_k)\|_{\mathrm{E}^q} &\leq& \sum_{j=k}^{t-1}\Big(\mathrm{E}\Big|(I-\lambda F)^{t-j-1}\sum_{i=p+1}^{j}(I-\lambda F)^{j-i}\mathrm{E}\big[\{M_iF_j - \mathrm{E}(M_iF_j)\}\big|\mathcal{F}_k\big]\Big|^q\Big)^{1/q} \\
&\leq& \sum_{j=k}^{t-1}\Big(\mathrm{E}\Big|(I-\lambda F)^{t-j-1}\sum_{i=p+1}^{k}(I-\lambda F)^{j-i}\mathrm{E}\big[\{M_iF_j - \mathrm{E}(M_iF_j)\}\big|\mathcal{F}_k\big]\Big|^q\Big)^{1/q} \\
&& +\;\sum_{j=k}^{t-1}\Big(\mathrm{E}\Big|(I-\lambda F)^{t-j-1}\sum_{i=k+1}^{j}(I-\lambda F)^{j-i}\mathrm{E}\big[\{M_iF_j - \mathrm{E}(M_iF_j)\}\big|\mathcal{F}_k\big]\Big|^q\Big)^{1/q} \\
&=& A_k + B_k. \qquad (127)
\end{eqnarray*}
Since for all random variables $A$ we have $|\mathrm{E}(A)| \leq \big(\mathrm{E}|\mathrm{E}(A|\mathcal{F}_k)|^q\big)^{1/q}$ (if $q \geq 1$), and $\mathrm{E}(M_iF_j|\mathcal{F}_k) = M_i\mathrm{E}(F_j|\mathcal{F}_k)$ (for $i \leq k$), we have
\begin{eqnarray*}
A_k &\leq& 2\sum_{j=k}^{t-1}\Big\|(I-\lambda F)^{t-j-1}\mathrm{E}\Big(\sum_{i=p+1}^{k}(I-\lambda F)^{j-i}M_iF_j\Big|\mathcal{F}_k\Big)\Big\|_{\mathrm{E}^q} \\
&\leq& 2K\sum_{j=k}^{t-1}(1-\lambda\delta)^{t-j-1}\Big(\mathrm{E}\Big|\mathrm{E}(F_j|\mathcal{F}_k)\sum_{i=p+1}^{k}(I-\lambda F)^{j-i}M_i\Big|^q\Big)^{1/q}.
\end{eqnarray*}
Now, by using the Hölder inequality and $(\mathrm{E}|\mathrm{E}(F_j|\mathcal{F}_k)|^{2q})^{1/(2q)} \leq K\rho^{j-k}$, we have
\begin{eqnarray*}
A_k &\leq& 2\sum_{j=k}^{t-1}(1-\lambda\delta)^{t-j-1}\big(\mathrm{E}|\mathrm{E}(F_j|\mathcal{F}_k)|^{2q}\big)^{1/(2q)}\Big(\mathrm{E}\Big|\sum_{i=p+1}^{k}(I-\lambda F)^{j-i}M_i\Big|^{2q}\Big)^{1/(2q)} \\
&\leq& 2\sum_{j=k}^{t-1}K\rho^{j-k}\Big\{2q\sum_{i=p+1}^{k}(1-\lambda\delta)^{j-i}\big(\mathrm{E}|M_i|^{2q}\big)^{1/q}\Big\}^{1/2} \leq \frac{K}{\lambda^{1/2}}. \qquad (128)
\end{eqnarray*}
We now bound $B_k$:
\[
B_k \leq K\sum_{j=k}^{t-1}(1-\lambda\delta)^{t-j-1}\sum_{i=k+1}^{j}(1-\lambda\delta)^{j-i}\big(\mathrm{E}\big|\mathrm{E}\big[\{M_iF_j - \mathrm{E}(M_iF_j)\}\big|\mathcal{F}_k\big]\big|^q\big)^{1/q}. \qquad (129)
\]
We will evaluate two different bounds for $\|\mathrm{E}(\{M_iF_j - \mathrm{E}(M_iF_j)\}|\mathcal{F}_k)\|_{\mathrm{E}^q}$ and use the minimum of these bounds to evaluate $B_k$. Since $i > k$, $\big(\mathrm{E}|\mathrm{E}(A|\mathcal{F}_k)|^q\big)^{1/q} \leq \big(\mathrm{E}|\mathrm{E}(A|\mathcal{F}_i)|^q\big)^{1/q}$, and under the assumption $(\mathrm{E}|\mathrm{E}(F_j|\mathcal{F}_i)|^{2q})^{1/(2q)} \leq K\rho^{j-i}$ we have
\begin{eqnarray*}
\big(\mathrm{E}\big|\mathrm{E}[\{M_iF_j - \mathrm{E}(M_iF_j)\}|\mathcal{F}_k]\big|^q\big)^{1/q} &\leq& \big(\mathrm{E}\big|\mathrm{E}[\{M_iF_j - \mathrm{E}(M_iF_j)\}|\mathcal{F}_i]\big|^q\big)^{1/q} \leq 2\big(\mathrm{E}|\mathrm{E}(M_iF_j|\mathcal{F}_i)|^q\big)^{1/q} \\
&\leq& 2\big(\mathrm{E}|M_i\mathrm{E}(F_j|\mathcal{F}_i)|^q\big)^{1/q} \leq 2\big(\mathrm{E}|M_i|^{2q}\big)^{1/(2q)}\big(\mathrm{E}|\mathrm{E}(F_j|\mathcal{F}_i)|^{2q}\big)^{1/(2q)} \\
&\leq& K\sup_i\big(\mathrm{E}|M_i|^{2q}\big)^{1/(2q)}\rho^{j-i}.
\end{eqnarray*}
Using again the stated assumption (iv), we get the alternative bound
\[
\big(\mathrm{E}\big|\mathrm{E}[\{M_iF_j - \mathrm{E}(M_iF_j)\}|\mathcal{F}_k]\big|^q\big)^{1/q} \leq K\rho^{i-k}.
\]
Therefore we have
\[
\big(\mathrm{E}\big|\mathrm{E}[\{M_iF_j - \mathrm{E}(M_iF_j)\}|\mathcal{F}_k]\big|^q\big)^{1/q} \leq K\min(\rho^{j-i},\rho^{i-k}),
\]
leading to
\begin{eqnarray*}
B_k &\leq& K\sum_{j=k}^{t-1}(1-\lambda\delta)^{t-j-1}\sum_{i=k+1}^{j}(1-\lambda\delta)^{j-i}\min(\rho^{j-i},\rho^{i-k}) \qquad (130) \\
&\leq& \sum_{j=k}^{t-1}\sum_{i=k+1}^{j}\min(\rho^{j-i},\rho^{i-k}) \leq K. \qquad (131)
\end{eqnarray*}
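The uniform boundedness of this double sum in $t-k$ can also be checked numerically; in the following Python sketch $\rho = 0.9$ and $k = 0$ are illustrative choices.

\begin{verbatim}
# Numerical check that sum_{j=k}^{t-1} sum_{i=k+1}^{j} min(rho^{j-i}, rho^{i-k})
# in (130)-(131) stays bounded as t - k grows (here k = 0, rho = 0.9).
rho = 0.9
for n in [10, 100, 500, 2000]:             # n plays the role of t - k
    total = sum(min(rho**(j - i), rho**i)  # rho^{i-k} with k = 0
                for j in range(n) for i in range(1, j + 1))
    print(n, round(total, 4))              # stabilises at a finite value
\end{verbatim}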
Substituting (128) and (130) into (127) gives
\[
\sum_{j=k}^{t-p-1}\big(\mathrm{E}|\mathrm{E}(\bar\phi_j|\mathcal{F}_k)|^q\big)^{1/q} \leq A_k + B_k \leq K + \frac{K}{\lambda^{1/2}}, \qquad (132)
\]
which, by substituting (126) and (132) into (124), leads to
\[
\Big(\mathrm{E}\Big|\sum_{k=p}^{t-1}\bar\phi_k\Big|^q\Big)^{1/q} \leq \Big\{2q\sum_{k=0}^{t-1}(1-\lambda\delta)^{t-k-1}\frac{K}{\lambda^{1/2}}\Big(K + \frac{K}{\lambda^{1/2}}\Big)\Big\}^{1/2} \leq \frac{K}{\lambda}. \qquad (133)
\]
Finally, by using (122), (123) and (133) we get the assertion.
We now apply the lemma above to the tvARCH online recursive algorithm.
Lemma A.7 Suppose Assumption 2.1 holds with $r > 4$. Let $F_t(u)$, $M_{t,N}$ and $M_t(u)$ be defined as in (30), and let $\bar F_t(u) = F_t(u) - \mathrm{E}(F_t(u))$. Then we have
\[
\Big(\mathrm{E}\Big|\sum_{k=0}^{t_0-p-1}\sum_{i=0}^{t_0-p-k-2}\{I-\lambda F(u)\}^{k+i}\bar F_{t_0-k-1}(u)M_{t_0-k-i-1,N}\Big|^{r/2}\Big)^{2/r} \leq \frac{K}{\lambda} \qquad (134)
\]
and
\[
\Big(\mathrm{E}\Big|\sum_{k=0}^{t_0-p-1}\sum_{i=0}^{t_0-p-k-2}\{I-\lambda F(u)\}^{k+i}\bar F_{t_0-k-1}(u)M_{t_0-k-i-1}(u)\Big|^{r/2}\Big)^{2/r} \leq \frac{K}{\lambda}. \qquad (135)
\]
PROOF. We prove (134); the proof of (135) is the same. We prove the result by verifying the conditions in Lemma A.6, from which (134) immediately follows. By using (108) we have $\lambda_{\min}\{F(u)\} \geq \delta$ for some $\delta > 0$. Let $M_t := M_{t,N}$, $F_t := \bar F_t(u)$, $F := F(u)$ and $\mathcal{F}_t = \sigma(Z_t, Z_{t-1}, \ldots)$. It is clear from the definition that the sequence $\{\bar F_t(u)\}_t$ has zero mean and is identically distributed; also $\mathrm{E}(M_t(u)|\mathcal{F}_{t-1}) = 0$. By using (25) we have
\[
\big(\mathrm{E}|\mathrm{E}(\bar F_t(u)|\mathcal{F}_k)|^r\big)^{1/r} = \big(\mathrm{E}|\mathrm{E}(F_t(u)|\mathcal{F}_k) - \mathrm{E}(F_t(u))|^r\big)^{1/r} \leq K\rho^{t-k},
\]
thus condition (iii) is satisfied. Since $M_{t,N} = (Z_t^2-1)\sigma_{t,N}^2\underline{X}_{t-1,N}/|\underline{X}_{t-1,N}|_1^2$, where $F_t$ and $\sigma_{t,N}^2\underline{X}_{t-1,N}/|\underline{X}_{t-1,N}|_1^2$ are bounded, by using Corollary A.1 and $\sup_{t,N}(\mathrm{E}|\underline{X}_{t,N}|^{r/2})^{2/r} < \infty$, we can show that condition (iv) is satisfied.
Moreover, $F_{t,N}$ is a bounded random matrix, hence all its moments exist. Finally, since $\sup_{t,N}|M_{t,N}|^r \leq K(Z_t^2+1)^r$ and $\mathrm{E}(Z_0^{2r}) < \infty$, we have for all $k \leq s \leq t \leq N$ that $\sup_{t,N}(\mathrm{E}|M_{t,N}|^r) < \infty$, leading to condition (v). Thus all the conditions of Lemma A.6 are satisfied and we obtain (134).
References
Aguech, R., Moulines, E., & Priouret, P. (2000). On a perturbation approach for the analysis of stochastic tracking algorithms. SIAM Journal on Control and Optimization, 39, 872-899.
Chen, X., & White, H. (1998). Nonparametric learning with feedback. Journal of Economic
Theory, 82, 190-222.
Dahlhaus, R., & Subba Rao, S. (2006). Statistical inference for time-varying ARCH processes. Annals of Statistics, 34, 1075-1114.
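Dedecker, J., & Doukhan, P. (2003). A new covariance inequality and applications. Stochastic Processes and their Applications, 106, 63-80.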
Gill, R., & Levit, B. (1995). Applications of the van Trees inequality: a Bayesian Cramer-
Rao bound. Bernoulli, 1/2, 59-79.
Guo, L. (1994). Stability of recursive stochastic tracking algorithms. SIAM J. on Control
and Optimisation, 32, 1195-1225.
Hall, P., & Heyde, C. (1980). Martingale Limit Theory and its Application. New York:
Academic Press.
Kushner, H., & Yin, G. (2003). Stochastic approximation and recursive algorithms and
applications. Heidelberg: Springer.
Ljung, L., & Soderstrom, T. (1983). Theory and practice of recursive identification. Cam-
bridge, MA: MIT Press.
Mikosch, T., & Starica, C. (2000). Is it really long memory we see in financial returns? In P. Embrechts (Ed.), Extremes and integrated risk management (p. 439-459). London: Risk Books.
Mikosch, T., & Starica, C. (2004). Non-stationarities in financial time series, the long-range dependence, and the IGARCH effects. The Review of Economics and Statistics, 86, 378-390.
Moulines, E., Priouret, P., & Roueff, F. (2005). On recursive estimation for locally station-
ary time varying autoregressive processes. Ann. Statist., 33, 2610-2654.
Solo, V. (1981). The second order properties of a time series recursion. Ann. Statist., 9,
307-317.
Solo, V. (1989). The limiting behavior of the LMS. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37, 1909-1922.
Subba Rao, S. (2004). On some nonstationary, nonlinear random processes and their
stationary approximations. Preprint.
White, H. (1996). Parametric statistical estimation using artificial neural networks. In P. Smolensky, M. Mozer, & D. Rumelhart (Eds.), Mathematical perspectives on neural networks (p. 719-775). Hillsdale, NJ: L. Erlbaum Associates.