Robust Estimation in Vector Autoregressive Models Based on a Robust Scale


By Marta García Ben, Universidad de Buenos Aires
Ana J. Villar, Universidad de Buenos Aires
and Víctor J. Yohai, Universidad de Buenos Aires and CONICET

Abstract. A new class of robust estimates for vector autoregressive processes is proposed. The autoregressive coefficients and the covariance matrix of the innovations are estimated simultaneously by minimizing the determinant of the covariance matrix estimate, subject to a constraint on a robust scale of the Mahalanobis norms of the innovation residuals. By choosing as robust scale a τ-estimate, the resulting estimates combine good robustness properties and asymptotic efficiency under Gaussian innovations. These estimates are asymptotically normal and, in the case that the innovations have an elliptical distribution, their asymptotic covariance matrix differs only by a scalar factor from the one corresponding to the maximum likelihood estimate.

Keywords. Vector autoregressive models; multivariate time series; robust estimation; τ-estimates.

1 Introduction

Let $x_t = (x_{1t}, \ldots, x_{mt})'$, $1 \le t \le T$, be observations corresponding to a stationary vector autoregressive process (henceforth VAR(p)). This means that there exist $m \times m$ matrices
$$\phi_r = (\phi_{r,ij})_{1 \le i,j \le m}, \quad r = 1, \ldots, p,$$
with $\phi_p \ne 0$, and $\nu \in \mathbb{R}^m$ such that
$$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \nu + a_t, \qquad (1.1)$$


where $a_t$ is a multivariate white noise process, i.e., the $a_t$ are independent and identically distributed random vectors with
$$E(a_t) = 0, \qquad E(a_t a_t') = \Sigma,$$
where $\Sigma$ is a nonsingular $m \times m$ matrix. Put
$$\Phi(z) = I_m - \sum_{r=1}^{p} \phi_r z^r,$$
where $I_m$ is the $m \times m$ identity matrix. The stationarity of $x_t$ requires that the roots of
$$\det \Phi(z) = 0,$$
where $\det(A)$ denotes the determinant of $A$, lie outside the unit circle $|z| \le 1$. The unknown parameters of model (1.1) are $\phi_1, \ldots, \phi_p$, $\nu$ and $\Sigma$.
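An equivalent way of checking this stationarity condition in practice is through the companion matrix of the VAR(p): the roots of $\det \Phi(z) = 0$ lie outside the unit circle exactly when all eigenvalues of the companion matrix have modulus smaller than one. The following sketch (NumPy; the function name is ours, not from the paper) illustrates the check.

```python
import numpy as np

def is_stationary(phis):
    """phis: list [phi_1, ..., phi_p] of (m x m) coefficient matrices."""
    # Companion matrix: top block row (phi_1 ... phi_p), identity blocks below.
    m, p = phis[0].shape[0], len(phis)
    companion = np.zeros((m * p, m * p))
    companion[:m, :] = np.hstack(phis)
    if p > 1:
        companion[m:, :-m] = np.eye(m * (p - 1))
    # Roots of det Phi(z) = 0 outside |z| <= 1  <=>  eigenvalues inside the unit circle.
    return np.max(np.abs(np.linalg.eigvals(companion))) < 1

# Example: the first VAR(1) model of Section 4, phi_1 = diag(0.9, 0.9), is stationary.
print(is_stationary([np.diag([0.9, 0.9])]))  # True
```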

Wilson (1973) proposed to estimate the parameters of this model by maximizing a normal likelihood conditional on $(x_1, x_2, \ldots, x_p)$. We will call this estimate the conditional maximum likelihood estimate (CMLE); it is known to have the same asymptotic normal distribution as the exact maximum likelihood estimate. Therefore the CMLE is asymptotically efficient under normal innovations. However, the CMLE is very sensitive to violations of the normality assumption and to the presence of a few atypical observations (outliers).

Let $A$ be an $m \times m$ nonsingular matrix and $b \in \mathbb{R}^m$; then the process $x_t^* = A x_t + b$ is also a VAR(p) process with parameters $A\phi_i A^{-1}$ $(1 \le i \le p)$, $A\nu + \big(I_m - A\big(\sum_{r=1}^{p}\phi_r\big)A^{-1}\big)b$ and covariance matrix $A\Sigma A'$. Then it is natural to require that estimates of the VAR(p) model parameters satisfy the following affine equivariance properties:
$$\hat\phi_i\big((x_t^*)_{1\le t\le T}\big) = A\,\hat\phi_i\big((x_t)_{1\le t\le T}\big)A^{-1} \quad (1 \le i \le p),$$
$$\hat\nu\big((x_t^*)_{1\le t\le T}\big) = A\,\hat\nu\big((x_t)_{1\le t\le T}\big) + \Big(I_m - A\Big(\sum_{r=1}^{p}\hat\phi_r\big((x_t)_{1\le t\le T}\big)\Big)A^{-1}\Big)b,$$
$$\hat\Sigma\big((x_t^*)_{1\le t\le T}\big) = A\,\hat\Sigma\big((x_t)_{1\le t\le T}\big)A'.$$

Put $z_t = (x_{t-1}', \ldots, x_{t-p}', 1)'$ and $\Omega = (\phi_1, \ldots, \phi_p, \nu)$. The dimension of $z_t$ is $r \times 1$, where $r = mp + 1$, and the dimension of $\Omega$ is $m \times r$. Then we have
$$x_t = \Omega z_t + a_t.$$

Define the innovation residuals by
$$\hat a_t(\Omega) = x_t - \Omega z_t$$


and the Mahalanobis norms of these residuals by
$$d_t(\Omega,\Sigma) = \big(\hat a_t'(\Omega)\,\Sigma^{-1}\hat a_t(\Omega)\big)^{1/2}.$$
It is well known that the CMLE of $\Omega$ and $\Sigma$ is given by
$$(\hat\Omega, \hat\Sigma) = \arg\min_{\Omega,\Sigma}\det(\Sigma) \qquad (1.2)$$
subject to
$$\frac{1}{T-p}\sum_{t=p+1}^{T} d_t^2(\Omega,\Sigma) = m. \qquad (1.3)$$

Given a sample $u_1, \ldots, u_n$, let $s(u_1, \ldots, u_n)$ be the square root of the mean squared error (SRMSE) scale estimate given by
$$s(u_1, \ldots, u_n) = \left(\frac{1}{n}\sum_{i=1}^{n} u_i^2\right)^{1/2}. \qquad (1.4)$$
Then the constraint (1.3) can also be expressed as
$$s^2\big(d_{p+1}(\Omega,\Sigma), \ldots, d_T(\Omega,\Sigma)\big) = m. \qquad (1.5)$$

Let $\omega_i$ be the $i$-th row of $\Omega$ and $\hat\omega_i$ its CMLE. Then $\hat\omega_i$ is the least squares estimate in the regression model
$$x_{ti} = \omega_i' z_t + a_{ti}, \qquad p+1 \le t \le T.$$
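A minimal NumPy sketch of this conditional fit may help fix ideas: each row of $\Omega$ is obtained by least squares regression of $x_{ti}$ on $z_t$, and $\hat\Sigma$ is the sample covariance matrix of the residuals (the function name and interface are ours, not from the paper).

```python
import numpy as np

def var_cmle(x, p):
    """Conditional ML fit of a VAR(p); x is a (T x m) array."""
    T, m = x.shape
    # z_t = (x_{t-1}', ..., x_{t-p}', 1)' for t = p+1, ..., T, stacked as rows of Z.
    Z = np.hstack([x[p - r: T - r] for r in range(1, p + 1)] + [np.ones((T - p, 1))])
    Y = x[p:]
    # Row-by-row least squares is equivalent to one multivariate LS fit.
    Omega = np.linalg.lstsq(Z, Y, rcond=None)[0].T   # (m x (mp+1)); rows are omega_i'
    resid = Y - Z @ Omega.T                          # innovation residuals a_hat_t
    Sigma = resid.T @ resid / (T - p)                # sample covariance of the residuals
    return Omega, Sigma
```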

The covariance matrix estimate $\hat\Sigma$ is the sample covariance matrix of the $\hat a_t(\hat\Omega)$'s. The CMLE is asymptotically efficient under Gaussian innovations. However, it is known that it is very sensitive to outliers and can be very inefficient even for small departures from normality. The only two classes of robust estimates proposed so far for VAR models are extensions of the RA-estimates of Bustos and Yohai (1986). The first class is due to Li and Hui (1989), but these estimates are not affine equivariant. The estimates of the second class, introduced by García Ben, Martinez and Yohai (1999), are affine equivariant.

In this paper we define robust estimates of $(\Omega, \Sigma)$ that are still of the form (1.2) and (1.5), but with the SRMSE scale $s$ given in (1.4) replaced by a robust scale. More precisely, we propose to use a scale in the class of τ-estimates introduced by Yohai and Zamar (1988). This approach is similar to the one used by Lopuhaä (1991) to estimate location and scatter matrix from multivariate data.


The proposed estimates, which will be called τ-estimates for VAR models, are qualitatively robust for stationary VAR models, i.e., a small fraction of outliers does not change the asymptotic value of these estimates. See Hampel (1971) and Boente, Fraiman and Yohai (1987) for rigorous formalizations of the concept of qualitative robustness for stochastic processes. A Monte Carlo study also shows that they have good robustness behavior in small samples.

In Section 2 we define τ-estimates for VAR processes, derive their asymptotic normal distribution and compute their relative efficiency with respect to the CMLE under normal innovations. In Section 3 we present a computing algorithm based on iteratively weighted CMLE. In Section 4 we present the results of the Monte Carlo study. In the Appendix we derive some mathematical results.

2 Estimates based on a robust scale

A first class of robust scale estimates are the M-estimates introduced by Huber (1981). For a sample $u = (u_1, \ldots, u_n)$, an M-estimate of scale $s(u_1, \ldots, u_n)$ is defined as the value $s$ satisfying
$$\frac{1}{n}\sum_{i=1}^{n}\rho\!\left(\frac{|u_i|}{s}\right) = \kappa, \qquad (2.1)$$

where $\kappa$ is chosen so that $\kappa = E_{H_0}\big(\rho(|u|)\big)$, where $H_0$ is the nominal model for the $u_i$'s. This choice guarantees that $s(u_1, \ldots, u_n)$ converges to 1 when $u_1, \ldots, u_n, \ldots$ is a stationary and ergodic stochastic process with marginal distribution $H_0$. We will require that the function $\rho$ satisfies the following properties:

A1 $\rho(0) = 0$.

A2 $0 \le u \le u^*$ implies $\rho(u) \le \rho(u^*)$.

A3 $0 < A = \sup_u \rho(u) < \infty$.

A4 $\rho$ is twice differentiable.

One measure of the degree of robustness of an estimate is the breakdownpoint introduced by Hampel (1971). Roughly speaking, the breakdown pointof an estimate is the smallest fraction of outliers that is required to take theestimate to an extreme value. In the case of scale estimates, the two extreme


values are zero or infinity (“implosion” and “explosion”, respectively). Huber (1981) proves that the breakdown point to infinity of a scale M-estimate is $\epsilon^*_\infty = \kappa/A$ and the breakdown point to zero is $\epsilon^*_0 = 1 - \kappa/A$. Then the breakdown point of this scale estimate is given by
$$\epsilon^* = \min\!\left(\frac{\kappa}{A},\, 1 - \frac{\kappa}{A}\right). \qquad (2.2)$$
Therefore, choosing $\rho$ such that $\kappa/A = 0.5$ we obtain $\epsilon^* = 0.5$, which is the highest possible breakdown point for a scale estimate (see Huber (1981)). However (see Hossjer (1992)), M-scale estimates cannot combine a high breakdown point with high efficiency under normality. To combine both properties, Yohai and Zamar (1988) proposed τ-estimates of scale.

Consider two functions $\rho_1$ and $\rho_2$ satisfying A1-A4 and put
$$\kappa_i = E_{H_0}\big(\rho_i(|u|)\big), \quad i = 1, 2. \qquad (2.3)$$
Let $s(u)$ be the M-scale estimate defined in (2.1) with $\rho = \rho_1$ and $\kappa = \kappa_1$. The τ-estimate of scale of $u = (u_1, \ldots, u_n)$ is defined by
$$\tau^2(u) = \frac{\tau_0^2}{\kappa_2}\, s^2(u)\,\frac{1}{n}\sum_{i=1}^{n}\rho_2\!\left(\frac{|u_i|}{s(u)}\right). \qquad (2.4)$$
The estimate $\tau(u)$ converges to $\tau_0$ when $u_1, \ldots, u_n, \ldots$ is a stationary and ergodic process with marginal distribution $H_0$. Then it is convenient to choose $\tau_0^2 = E_{H_0}(u^2)$, so that $\tau(u)$ converges to the same value as the SRMSE scale estimate.
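As an illustration, here is a minimal numerical sketch of the M-scale (2.1) and the τ-scale (2.4) with bisquare ρ functions. The fixed-point update used for the M-scale and all function names are our own choices, not prescriptions of the paper.

```python
import numpy as np

def rho_bisq(u, c):
    """Tukey bisquare rho, as defined later in this section."""
    v = np.minimum(np.abs(u), c)
    return (v**2 / 2) * (1 - v**2 / c**2 + v**4 / (3 * c**4))

def m_scale(u, c1, kappa1, n_iter=100, tol=1e-10):
    """M-scale s solving mean(rho1(|u_i| / s)) = kappa1, by fixed-point iteration."""
    s = np.median(np.abs(u)) / 0.6745          # robust starting value
    for _ in range(n_iter):
        s_new = s * np.sqrt(np.mean(rho_bisq(u / s, c1)) / kappa1)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s

def tau_scale(u, c1, kappa1, c2, kappa2, tau0_sq=1.0):
    """tau-scale of equation (2.4)."""
    s = m_scale(u, c1, kappa1)
    return np.sqrt((tau0_sq / kappa2) * s**2 * np.mean(rho_bisq(u / s, c2)))
```

For instance, with the univariate constants of Tables 1 and 2 (c1 = 1.55, κ1 = 0.20; c2 = 4.97, κ2 = 0.44 for 90% efficiency) and τ0² = 1, replacing a sizable minority of a standard normal sample by very large values inflates the SRMSE scale dramatically while the τ-scale stays moderate, which is exactly the behavior described above.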

Put $\psi_i(u) = \rho_i'(u)$, $i = 1, 2$. To guarantee consistency, we will also require that $\rho_2$ satisfies the following condition:

B $2\rho_2(u) - \psi_2(u)\,u > 0$ for $u > 0$.

Yohai and Zamar (1988) proved that, by properly choosing $\rho_1$ and $\rho_2$, the τ-estimates can combine breakdown point 0.5 with high Gaussian asymptotic efficiency. This is a consequence of the following facts.

• The scale τ differs from $s$ by a bounded factor. Based on this fact, Yohai and Zamar (1988) proved that a τ-scale has the same breakdown point as the corresponding $s$-scale.

• For any $\rho_1$, the choice $\rho_2(u) = u^2$ and $\tau_0^2 = E_{H_0}(u^2)$ yields $\tau(u) = \big(\frac{1}{n}\sum_{i=1}^{n} u_i^2\big)^{1/2}$, and hence taking $\rho_2(u)$ close to $u^2$ but bounded makes τ both robust and efficient for a normal distribution.


Possible choices of $\rho_1$ and $\rho_2$ are given later in this section. Let now $x_t = (x_{1t}, \ldots, x_{mt})'$, $1 \le t \le T$, be observations corresponding to a VAR(p) model given by (1.1). Then the τ-estimates of $\Omega$ and $\Sigma$ are defined by
$$(\tilde\Omega, \tilde\Sigma) = \arg\min_{\Omega,\Sigma}\det(\Sigma) \qquad (2.5)$$
subject to
$$\tau^2\big(d_{p+1}(\Omega,\Sigma), \ldots, d_T(\Omega,\Sigma)\big) = m, \qquad (2.6)$$
where the scale τ is defined in (2.4) with $\tau_0^2 = m$.

Comparing (2.5) and (2.6) with (1.2) and (1.3), we observe that the CMLE is a particular case of τ-estimate corresponding to $\rho_2(u) = u^2$. It is easy to check that τ-estimates are affine equivariant.

The following theorem, proved in the Appendix, gives the estimating equations for the τ-estimates. Define
$$d_t^*(\Omega,\Sigma) = \frac{d_t(\Omega,\Sigma)}{s\big(d_{p+1}(\Omega,\Sigma), \ldots, d_T(\Omega,\Sigma)\big)}$$

and let $A_T = A_T(\Omega,\Sigma)$, $B_T = B_T(\Omega,\Sigma)$ and $\psi_T^* = \psi_{T,\Omega,\Sigma}^*$ be defined by
$$A_T(\Omega,\Sigma) = \frac{1}{T-p}\sum_{t=p+1}^{T}\big(2\rho_2(d_t^*(\Omega,\Sigma)) - \psi_2(d_t^*(\Omega,\Sigma))\,d_t^*(\Omega,\Sigma)\big),$$
$$B_T(\Omega,\Sigma) = \frac{1}{T-p}\sum_{t=p+1}^{T}\psi_1(d_t^*(\Omega,\Sigma))\,d_t^*(\Omega,\Sigma),$$
$$\psi_T^*(u) = A_T\,\psi_1(u) + B_T\,\psi_2(u)$$
and
$$w_T^*(u) = \frac{\psi_T^*(u)}{u}.$$

Theorem 1. Suppose that $\rho_1$ and $\rho_2$ are differentiable and put $\psi_i = \rho_i'$. Then the τ-estimates satisfy the following equations
$$\sum_{t=p+1}^{T} w_T^*\big(d_t^*(\tilde\Omega,\tilde\Sigma)\big)\,\hat a_t(\tilde\Omega)\,z_t' = 0 \qquad (2.7)$$
and
$$\tilde\Sigma = \frac{m\sum_{t=p+1}^{T} w_T^*\big(d_t^*(\tilde\Omega,\tilde\Sigma)\big)\,\hat a_t(\tilde\Omega)\,\hat a_t'(\tilde\Omega)}{\tilde s^2\sum_{t=p+1}^{T}\psi_T^*\big(d_t^*(\tilde\Omega,\tilde\Sigma)\big)\,d_t^*(\tilde\Omega,\tilde\Sigma)}, \qquad (2.8)$$


where $\tilde s = s\big(d_{p+1}(\tilde\Omega,\tilde\Sigma), \ldots, d_T(\tilde\Omega,\tilde\Sigma)\big)$.

To determine the asymptotic distribution of τ-estimates for VAR models, we will require that the innovations have an elliptical distribution with unimodal density. Then we consider the following assumption.

C. The distribution $F$ of $a_t$ has a density of the form
$$f(a) = \frac{f^*(a'\Sigma^{-1}a)}{\det(\Sigma)^{1/2}},$$
where $f^*$ is a strictly decreasing function.

An important family of unimodal elliptical distributions is the multivariate normal. In this case
$$f^*(u) = \varphi^*(u) = \frac{\exp(-u/2)}{(2\pi)^{m/2}}.$$

We will denote by $(\Omega_0, \Sigma_0)$ the true values of $(\Omega, \Sigma)$. We will calibrate the estimates so that $\tilde\Sigma$ is consistent when $f^* = f_0^*$, where $f_0^*$ is a particular given function. For this purpose we take as $H_0$ in (2.3) the distribution of $(a_t'\Sigma_0^{-1}a_t)^{1/2}$ under $f_0^*$. Observe that this distribution does not depend on $\Sigma$ in the elliptical case. We will take $f_0^* = \varphi^*$, which implies that $H_0$ is the distribution of $\sqrt{u}$, where $u$ is a chi-square variable with $m$ degrees of freedom. If $f^* \ne f_0^*$, then $\tilde\Omega$ is still a consistent estimate of $\Omega$, but $\tilde\Sigma$ will converge to $\Sigma_0^* = \lambda_0(H)\Sigma_0$, where
$$\lambda_0(H) = k_0^2(H)/\sigma_0^2(H), \qquad (2.9)$$
and $k_0(H)$ and $\sigma_0(H)$ are defined by
$$E_H\!\left(\rho_1\!\left(\frac{u}{k_0(H)}\right)\right) = \kappa_1 \qquad (2.10)$$
and
$$\frac{\sigma_0^2(H)}{\kappa_2}\,E_H\!\left(\rho_2\!\left(\frac{u}{k_0(H)}\right)\right) = 1, \qquad (2.11)$$
where $H$ is the distribution of $(a_t'\Sigma_0^{-1}a_t)^{1/2}$. Observe that $\sigma_0 = k_0 = 1$ if the $a_t$ are elliptical and $f^* = f_0^*$. In the Appendix we give a heuristic argument showing that in this case $(\tilde\Omega, \tilde\Sigma)$ converges to $(\Omega_0, \Sigma_0^*)$. More precisely, we show that $(\Omega_0, \Sigma_0^*)$ satisfies the asymptotic form of equations (2.7), (2.8) and (2.6).


Theorem 2 establishes the asymptotic normality of $\tilde\Omega$. A sketch of its proof is given in the Appendix. We need to define the limit of the sequence of functions $\psi_T^*$. For that purpose put
$$A = A(H,\psi_1,\psi_2) = E_H\!\left(2\rho_2\!\left(\frac{u}{k_0}\right) - \psi_2\!\left(\frac{u}{k_0}\right)\frac{u}{k_0}\right),$$
$$B = B(H,\psi_1,\psi_2) = E_H\!\left(\psi_1\!\left(\frac{u}{k_0}\right)\frac{u}{k_0}\right)$$
and
$$\psi_H^*(u) = A\,\psi_1\!\left(\frac{u}{k_0}\right) + B\,\psi_2\!\left(\frac{u}{k_0}\right), \qquad w_H^*(u) = \frac{\psi_H^*(u)}{u}.$$

Theorem 2. Let $x_t$, $1 \le t \le T$, be a stationary VAR(p) process with parameters $\Omega_0$ and $\Sigma_0$. Suppose also that $\rho_i$, $i = 1, 2$, satisfy A1-A4 and B, and that the $a_t$ have a distribution satisfying C. Let $(\tilde\Omega_T, \tilde\Sigma_T)$ be the τ-estimate corresponding to the first $T$ observations. Then $\tilde\Omega_T^* = T^{1/2}(\tilde\Omega_T - \Omega_0)$ converges in law to a normal distribution. The asymptotic covariance matrix of $\tilde\Omega_T^*$ is the same as that of $\hat\Omega_T^* = T^{1/2}(\hat\Omega_T - \Omega_0)$, where $\hat\Omega_T$ is the CMLE, multiplied by a constant $J(\psi_1,\psi_2,H)$ given by
$$J(\psi_1,\psi_2,H) = \frac{m^2\,E_H\big(\psi_H^{*2}(u)\big)}{E_H(u^2)\,\big(E_H(\psi_H^{*\prime}(u)) + (m-1)E_H(w_H^*(u))\big)^2},$$
where $H$ is the distribution of $(a_t'\Sigma_0^{-1}a_t)^{1/2}$. Then the asymptotic relative efficiency of $\tilde\Omega_T$ with respect to $\hat\Omega_T$ is $1/J(\psi_1,\psi_2,H)$.

Observe that the efficiency $1/J(\psi_1,\psi_2,H)$ depends on the dimension $m$ but not on the VAR model.
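To make the formula concrete, the sketch below evaluates $1/J(\psi_1,\psi_2,H)$ numerically for bisquare $\rho_1$, $\rho_2$ under Gaussian innovations, in which case $H$ is the distribution of the square root of a $\chi^2_m$ variable and $k_0 = 1$. The numerical differentiation of $\psi_H^*$ and all names are our own shortcuts, intended only to illustrate how the constants of Table 2 can be calibrated.

```python
import numpy as np
from scipy import integrate, stats

def rho_bisq(u, c):
    v = min(abs(u), c)
    return (v**2 / 2) * (1 - v**2 / c**2 + v**4 / (3 * c**4))

def psi_bisq(u, c):
    return u * (1 - (u / c)**2)**2 if abs(u) <= c else 0.0

def chi_expect(g, m):
    """E[g(u)] for u distributed as sqrt(chi^2_m)."""
    pdf = stats.chi(df=m).pdf
    return integrate.quad(lambda u: g(u) * pdf(u), 0, np.inf)[0]

def gauss_efficiency(c1, c2, m, h=1e-5):
    """Asymptotic relative efficiency 1/J of the tau-estimate under normal innovations (k0 = 1)."""
    A = chi_expect(lambda u: 2 * rho_bisq(u, c2) - psi_bisq(u, c2) * u, m)
    B = chi_expect(lambda u: psi_bisq(u, c1) * u, m)
    psi_star = lambda u: A * psi_bisq(u, c1) + B * psi_bisq(u, c2)
    dpsi_star = lambda u: (psi_star(u + h) - psi_star(u - h)) / (2 * h)  # numerical derivative
    w_star = lambda u: psi_star(u) / u if u > 0 else 0.0
    J = m**2 * chi_expect(lambda u: psi_star(u)**2, m) / (
        chi_expect(lambda u: u**2, m)
        * (chi_expect(dpsi_star, m) + (m - 1) * chi_expect(w_star, m))**2)
    return 1.0 / J

# With the m = 2 constants of Tables 1 and 2, this should be close to 0.90.
print(gauss_efficiency(c1=2.66, c2=4.97, m=2))
```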

This suggests the following strategy to obtain a τ-estimate which is simultaneously highly robust and highly efficient under normal innovations.

1. Choose $\rho_1$ so that
$$\kappa_1 / \max\rho_1 = 0.5. \qquad (2.12)$$
This guarantees that the initial M-scale estimate $s$ has breakdown point 0.5.

2. Choose as $\rho_2(u)$ a bounded function close enough to $u^2$ to get the desired efficiency.


This can be achieved, for example, by taking $\rho_1$ and $\rho_2$ in Tukey's bisquare family defined by
$$\rho_{B,c}(u) = \begin{cases} \dfrac{u^2}{2}\left(1 - \dfrac{u^2}{c^2} + \dfrac{u^4}{3c^4}\right) & \text{if } |u| \le c, \\[1ex] \dfrac{c^2}{6} & \text{if } |u| > c, \end{cases}$$
where $c$ is any positive number. Observe that when $c$ increases, $\rho_{B,c}(u)$ approaches $u^2/2$, which is proportional to $u^2$.

Table 1 gives the values of $c_1$ such that $\rho_1 = \rho_{B,c_1}$ satisfies (2.12) for different values of $m$. Table 2 gives the values of $c_2$ needed to achieve different levels of asymptotic efficiency (taking $\rho_1 = \rho_{B,c_1}$ and $\rho_2 = \rho_{B,c_2}$).
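The constants in Table 1 can be reproduced numerically: with $H_0$ the distribution of $\sqrt{\chi^2_m}$ and $\max\rho_{B,c} = c^2/6$, condition (2.12) says that $c_1$ solves $E_{H_0}\big(\rho_{B,c}(u)\big) = c^2/12$. A short SciPy sketch (function names are ours):

```python
import numpy as np
from scipy import integrate, optimize, stats

def rho_bisq(u, c):
    v = min(abs(u), c)
    return (v**2 / 2) * (1 - v**2 / c**2 + v**4 / (3 * c**4))

def c1_half_breakdown(m):
    """Solve E[rho_{B,c}(u)] = c^2/12 for u ~ sqrt(chi^2_m), i.e. kappa1 / max(rho) = 0.5."""
    pdf = stats.chi(df=m).pdf
    def gap(c):
        kappa1 = integrate.quad(lambda u: rho_bisq(u, c) * pdf(u), 0, np.inf)[0]
        return kappa1 - c**2 / 12.0
    return optimize.brentq(gap, 0.5, 20.0)

for m in (1, 2, 3, 4, 5, 10):
    c1 = c1_half_breakdown(m)
    print(m, round(c1, 2), round(c1**2 / 12, 2))   # compare with the (c1, kappa1) pairs of Table 1
```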

3 Computing Algorithm

Consider the τ-estimating equations (2.7) and (2.8) and suppose that we knew $\tilde\Omega$, $\tilde\Sigma$ and $\tilde s = s\big(d_{p+1}(\tilde\Omega,\tilde\Sigma), \ldots, d_T(\tilde\Omega,\tilde\Sigma)\big)$. Then, according to equation (2.7) of Theorem 1, each row of $\tilde\Omega$ can be obtained separately by weighted least squares (WLS), where the $t$-th observation has weight $w_T^*\big(d_t(\tilde\Omega,\tilde\Sigma)/\tilde s\big)$. This and (2.8) suggest the following iterative algorithm:

1. Using initial values $\tilde\Omega_0$ and $\tilde\Sigma_0$ satisfying (2.6), compute $\tilde s_0 = s\big(d_{p+1}(\tilde\Omega_0,\tilde\Sigma_0), \ldots, d_T(\tilde\Omega_0,\tilde\Sigma_0)\big)$ and the weights $w_T^*\big(d_t(\tilde\Omega_0,\tilde\Sigma_0)/\tilde s_0\big)$ for $p+1 \le t \le T$. These weights are used to compute each row of $\tilde\Omega_1$ separately by WLS. We then compute $\tilde s_1 = s\big(d_{p+1}(\tilde\Omega_1,\tilde\Sigma_0), \ldots, d_T(\tilde\Omega_1,\tilde\Sigma_0)\big)$.

2. We compute the matrix
$$\tilde\Sigma_1^* = \frac{m\sum_{t=p+1}^{T} w_T^*\!\left(\dfrac{d_t(\tilde\Omega_1,\tilde\Sigma_0)}{\tilde s_1}\right)\hat a_t(\tilde\Omega_1)\,\hat a_t'(\tilde\Omega_1)}{\tilde s_1^2\sum_{t=p+1}^{T}\psi_T^*\!\left(\dfrac{d_t(\tilde\Omega_1,\tilde\Sigma_0)}{\tilde s_1}\right)\dfrac{d_t(\tilde\Omega_1,\tilde\Sigma_0)}{\tilde s_1}}. \qquad (3.1)$$

3. We compute $\tilde\tau_1 = \tau\big(d_{p+1}(\tilde\Omega_1,\tilde\Sigma_1^*), \ldots, d_T(\tilde\Omega_1,\tilde\Sigma_1^*)\big)$ and $\tilde\Sigma_1 = (\tilde\tau_1^2/m)\,\tilde\Sigma_1^*$. Then $(\tilde\Omega_1, \tilde\Sigma_1)$ satisfies constraint (2.6).


4. Suppose now that we have already computed $(\tilde\Omega_h, \tilde\Sigma_h)$ satisfying constraint (2.6). Then $(\tilde\Omega_{h+1}, \tilde\Sigma_{h+1})$ is computed using steps 1-3, but starting from $(\tilde\Omega_h, \tilde\Sigma_h)$ instead of $(\tilde\Omega_0, \tilde\Sigma_0)$.

5. The procedure is stopped at the first step $h$ such that the relative absolute differences between all elements of the matrices $\tilde\Omega_h$ and $\tilde\Omega_{h-1}$ are smaller than a given value $\delta$.

The convergence of this algorithm to the τ-estimate has not been proved. In our simulations in Section 4, $(\tilde\Omega_h, \tilde\Sigma_h)$ always approached values satisfying (2.7) and (2.8).
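The sketch below is a minimal NumPy implementation of the iteration above, written only for illustration: the starting value is plain least squares rather than the componentwise MM-regressions used in the paper, the constants c1, κ1, c2, κ2 should be taken from Tables 1 and 2 for the chosen dimension and efficiency, and all function names and defaults are our own.

```python
import numpy as np

def rho_bisq(u, c):
    v = np.minimum(np.abs(u), c)
    return (v**2 / 2) * (1 - v**2 / c**2 + v**4 / (3 * c**4))

def psi_bisq(u, c):
    return np.where(np.abs(u) <= c, u * (1 - (u / c)**2)**2, 0.0)

def m_scale(u, c1, kappa1, n_iter=100):
    s = np.median(np.abs(u)) / 0.6745
    for _ in range(n_iter):
        s = s * np.sqrt(np.mean(rho_bisq(u / s, c1)) / kappa1)
    return s

def tau_var(x, p, c1, kappa1, c2, kappa2, n_iter=50, tol=1e-6):
    """Iterative weighted-CMLE sketch of the tau-estimate for a VAR(p)."""
    T, m = x.shape
    Z = np.hstack([x[p - r: T - r] for r in range(1, p + 1)] + [np.ones((T - p, 1))])
    Y = x[p:]

    def mahal(res, Sigma):                       # d_t(Omega, Sigma)
        Sinv = np.linalg.inv(Sigma)
        return np.sqrt(np.einsum('ti,ij,tj->t', res, Sinv, res))

    def weights(dstar):                          # w*_T and psi*_T evaluated at d*_t
        dstar = np.maximum(dstar, 1e-10)
        A_T = np.mean(2 * rho_bisq(dstar, c2) - psi_bisq(dstar, c2) * dstar)
        B_T = np.mean(psi_bisq(dstar, c1) * dstar)
        psi_star = A_T * psi_bisq(dstar, c1) + B_T * psi_bisq(dstar, c2)
        return psi_star / dstar, psi_star

    Omega = np.linalg.lstsq(Z, Y, rcond=None)[0].T   # crude initial value (LS, not MM)
    res = Y - Z @ Omega.T
    Sigma = res.T @ res / (T - p)
    for _ in range(n_iter):
        # Step 1: WLS for each row of Omega with weights w*_T(d_t / s)
        d = mahal(Y - Z @ Omega.T, Sigma)
        s = m_scale(d, c1, kappa1)
        w, _ = weights(d / s)
        sw = np.sqrt(w)[:, None]
        Omega_new = np.linalg.lstsq(Z * sw, Y * sw, rcond=None)[0].T
        # Step 2: covariance update (3.1), using the new residuals and the old Sigma
        res = Y - Z @ Omega_new.T
        d = mahal(res, Sigma)
        s = m_scale(d, c1, kappa1)
        w, psi_star = weights(d / s)
        Sigma_star = m * (res * w[:, None]).T @ res / (s**2 * np.sum(psi_star * d / s))
        # Step 3: rescale Sigma_star so that constraint (2.6) holds
        d_star = mahal(res, Sigma_star)
        s_star = m_scale(d_star, c1, kappa1)
        tau2 = (m / kappa2) * s_star**2 * np.mean(rho_bisq(d_star / s_star, c2))
        Sigma_new = (tau2 / m) * Sigma_star
        done = np.max(np.abs(Omega_new - Omega)) < tol * (1 + np.max(np.abs(Omega)))
        Omega, Sigma = Omega_new, Sigma_new
        if done:
            break
    return Omega, Sigma
```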

4 Monte Carlo Results

To assess the efficiency and robustness of the proposed estimates we performed a Monte Carlo study. We considered two VAR(1) models of dimension 2. For the first model $\phi_{11} = 0.9$, $\phi_{12} = 0$, $\phi_{21} = 0$ and $\phi_{22} = 0.9$. For the second model $\phi_{11} = 0.9$, $\phi_{12} = 0$, $\phi_{21} = -0.4$ and $\phi_{22} = 0.5$. The innovations $a_t$ were generated from a multivariate $N(0, I_2)$ distribution.

For each model, three estimates were computed: the CMLE, an RA-estimate (see García Ben et al., 1999) and a τ-estimate. Both the RA- and τ-estimates were based on the bisquare function, and the tuning constants were chosen so that the relative asymptotic efficiencies with respect to the CMLE would be 0.90.

We considered the cases of no outliers and three types of contamination with 5% of replacement outliers. In the contaminated cases, instead of observing the VAR process $x_t$, we observe the contaminated process $\tilde x_t$, where $\tilde x_t = x_t$ for 95% of the observations and $\tilde x_t = d$ for the remaining 5%. The contaminated observations are fixed and equally spaced. Three values of $d$ were considered: (1) $d = (5, x_{t,2})$, (2) $d = (5, 5)$ and (3) $d = (5, -5)$.
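For reference, a short sketch of how one such contaminated sample can be generated (contamination of type 2, $d = (5, 5)$); the seed, the burn-in length and the function names are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_var1(phi, T, burn=200):
    """VAR(1) with N(0, I) innovations, after a burn-in period."""
    m = phi.shape[0]
    x = np.zeros((T + burn, m))
    for t in range(1, T + burn):
        x[t] = phi @ x[t - 1] + rng.standard_normal(m)
    return x[burn:]

def contaminate(x, frac=0.05, d=(5.0, 5.0)):
    """Replacement outliers at fixed, equally spaced positions (type 2).
    For type 1, only the first coordinate would be replaced by 5."""
    x = x.copy()
    x[::int(1 / frac)] = d
    return x

phi1 = np.array([[0.9, 0.0], [0.0, 0.9]])   # Model 1 of this section
x = simulate_var1(phi1, T=100)
x_cont = contaminate(x)                     # 5% of the x_t replaced by d = (5, 5)
```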

The simulation was performed with 500 samples of size T = 100. The τ-estimates were computed using the algorithm described in Section 3, and the RA-estimates using the one described in Section 4 of García Ben et al. (1999). For both the RA- and τ-estimates, the initial values were separate MM-estimates of regression (see Yohai (1987)) for each component, with 90% efficiency. In Tables 3 and 4 we report the mean squared errors (MSE) of the three estimates and the relative efficiencies (RE) of the RA- and τ-estimates with respect to the CMLE. The mean squared errors are displayed multiplied by 100. These results show that, when there are no outliers, the relative efficiencies of the RA- and τ-estimates are always similar to the asymptotic


value 0.90. In the case of outlier contamination, the efficiency of the RA- and τ-estimates is in general large. However, there are some isolated cases where the efficiency is less than one. The explanation for this unexpected improvement of the efficiency of the CMLE is that in these cases the contaminated observations act as “good” leverage points. Consider for example Model 1 with contamination of type (1). In this case, the only contaminated variable is $x_{t,1}$. Since in the equation $x_{t,2} = \phi_{2,1}x_{t-1,1} + \phi_{2,2}x_{t-1,2} + \nu_2 + a_{t,2}$ the coefficient $\phi_{2,1} = 0$, the contaminated observations are good leverage points, and this explains the improvement of the performance of the CMLE of $\nu_2$, $\phi_{2,1}$ and $\phi_{2,2}$.

TABLE 3 ABOUT HERE

TABLE 4 ABOUT HERE

For Model 1, the τ-estimates behave better than the RA-estimates under the three types of contamination. For Model 2, the τ-estimates behave better than the RA-estimates under contaminations of types 1 and 3; instead, for contamination of type 2, the RA-estimates behave better.

5 Conclusions

We have presented a new class of equivariant estimates for VAR models based on a robust scale: the τ-estimates. These estimates depend on two functions $\rho_1$ and $\rho_2$, and by a proper choice of these functions they combine high efficiency under normal errors with good robustness properties under outlier contamination. The results of a Monte Carlo study confirm these properties under normal errors and under isolated additive outliers. Further study is required to determine their robustness under other types of outliers, such as innovation outliers and patchy outliers.

Appendix

Proof of Theorem 1. Put
$$T(\Omega,\Sigma) = \det(\Sigma)\,\tau^{2m}\big(d_{p+1}(\Omega,\Sigma), \ldots, d_T(\Omega,\Sigma)\big);$$
then it is easy to show that for any real $\lambda$
$$T(\Omega,\lambda\Sigma) = T(\Omega,\Sigma).$$


This shows that the τ-estimate $(\tilde\Omega, \tilde\Sigma)$ also minimizes $T(\Omega,\Sigma)$ without restrictions (observe that $(\tilde\Omega, \lambda\tilde\Sigma)$ minimizes $T(\Omega,\Sigma)$ too), and therefore it should satisfy the following equations
$$\frac{\partial T(\Omega,\Sigma)}{\partial\Omega} = 0 \qquad (A.1)$$
and
$$\frac{\partial T(\Omega,\Sigma)}{\partial\Sigma} = 0. \qquad (A.2)$$
We will show that (A.1) is equivalent to (2.7) and (A.2) to (2.8).

Denote $s^*(\Omega,\Sigma) = s\big(d_{p+1}(\Omega,\Sigma), \ldots, d_T(\Omega,\Sigma)\big)$ and $R(\Omega,\Sigma) = \frac{1}{T-p}\sum_{t=p+1}^{T}\rho_2\big(d_t(\Omega,\Sigma)/s^*(\Omega,\Sigma)\big)$. In the rest of the proof we will write $d_t$, $d_t^*$, $R$ and $s^*$ instead of $d_t(\Omega,\Sigma)$, $d_t^*(\Omega,\Sigma)$, $R(\Omega,\Sigma)$ and $s^*(\Omega,\Sigma)$, respectively. Finally, put $\Omega = (\omega_{ij})$; then (A.1) is equivalent to
$$2 s^*\frac{\partial s^*}{\partial\omega_{ij}}\,R + s^{*2}\frac{\partial R}{\partial\omega_{ij}} = 0, \qquad 1 \le i \le m,\ 1 \le j \le r. \qquad (A.3)$$

It is easy to show that
$$\frac{\partial s(d_{p+1}, \ldots, d_T)}{\partial d_t} = \frac{\psi_1\big(d_t/s(d_{p+1}, \ldots, d_T)\big)}{\sum_{h=p+1}^{T}\psi_1\big(d_h/s(d_{p+1}, \ldots, d_T)\big)\big(d_h/s(d_{p+1}, \ldots, d_T)\big)}$$
and that
$$\frac{\partial d_t}{\partial\omega_{ij}} = -\frac{\sum_{v=1}^{m}\sigma^{iv}\hat a_{tv}(\Omega)}{d_t}\,z_{tj},$$

where $\Sigma^{-1} = (\sigma^{ij})$. Therefore
$$\frac{\partial s^*}{\partial\omega_{ij}} = -\frac{\sum_{t=p+1}^{T}\big(\psi_1(d_t^*)/d_t^*\big)\sum_{v=1}^{m}\sigma^{iv} z_{tj}\hat a_{tv}(\Omega)}{s^*\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*}. \qquad (A.4)$$

We also have
$$\frac{\partial R}{\partial\omega_{ij}} = \frac{1}{T-p}\sum_{t=p+1}^{T}\psi_2(d_t^*)\left(\frac{1}{s^*}\frac{\partial d_t}{\partial\omega_{ij}} - \frac{d_t}{s^{*2}}\frac{\partial s^*}{\partial\omega_{ij}}\right). \qquad (A.5)$$

Substituting (A.4) in (A.5) we get
$$\frac{\partial R}{\partial\omega_{ij}} = -\frac{\Big(\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*\Big)\Big(\sum_{t=p+1}^{T}\big(\psi_2(d_t^*)/d_t^*\big)\sum_{v=1}^{m}\sigma^{iv} z_{tj}\hat a_{tv}(\Omega)\Big)}{(T-p)\,s^{*2}\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*} + \frac{\Big(\sum_{t=p+1}^{T}\psi_2(d_t^*)\,d_t^*\Big)\Big(\sum_{t=p+1}^{T}\big(\psi_1(d_t^*)/d_t^*\big)\sum_{v=1}^{m}\sigma^{iv} z_{tj}\hat a_{tv}(\Omega)\Big)}{(T-p)\,s^{*2}\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*}. \qquad (A.6)$$

Substituting (A.4) and (A.6) in (A.3), we get
$$-\frac{\sum_{v=1}^{m}\sigma^{iv}\sum_{t=p+1}^{T} w_T^*(d_t^*)\,\hat a_{tv}(\Omega)\,z_{tj}}{\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*} = 0, \qquad 1 \le i \le m,\ 1 \le j \le r,$$
which is equivalent to
$$\frac{\Sigma^{-1}\sum_{t=p+1}^{T} w_T^*(d_t^*)\,\hat a_t(\Omega)\,z_t'}{\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*} = 0,$$
and this is equivalent to (2.7).

(A.2) is equivalent to

$$\frac{\partial L(\Omega,\Sigma)}{\partial\Sigma} = 0, \qquad (A.7)$$
where
$$L(\Omega,\Sigma) = \log\big(T(\Omega,\Sigma)\big) = \log\big(\det(\Sigma)\big) + 2m\log(s^*) + m\log\left(\frac{1}{T-p}\sum_{t=p+1}^{T}\rho_2(d_t^*)\right). \qquad (A.8)$$

It is easy to show that
$$\frac{\partial s^*}{\partial\Sigma} = \frac{\sum_{t=p+1}^{T}\psi_1(d_t^*)\,\dfrac{\partial d_t}{\partial\Sigma}}{\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*},$$
and since
$$\frac{\partial d_t}{\partial\Sigma} = -\frac{1}{2 d_t}\,\Sigma^{-1} a_t a_t'\,\Sigma^{-1}, \qquad (A.9)$$
then
$$\frac{\partial s^*}{\partial\Sigma} = -\frac{1}{2}\,\frac{\Sigma^{-1}\Big(\sum_{t=p+1}^{T}\dfrac{\psi_1(d_t^*)}{d_t^*}\,a_t a_t'\Big)\Sigma^{-1}}{\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t}. \qquad (A.10)$$

It is well known that, for a symmetric matrix $\Sigma$,
$$\frac{\partial\log\big(\det(\Sigma)\big)}{\partial\Sigma} = \Sigma^{-1}.$$

Differentiating (A.8) we get
$$\frac{\partial L(\Omega,\Sigma)}{\partial\Sigma} = \Sigma^{-1} + \frac{2m}{s^*}\frac{\partial s^*}{\partial\Sigma} + m\,\frac{\frac{1}{T-p}\sum_{t=p+1}^{T}\psi_2(d_t^*)\left(\dfrac{1}{s^*}\dfrac{\partial d_t}{\partial\Sigma} - \dfrac{d_t^*}{s^*}\dfrac{\partial s^*}{\partial\Sigma}\right)}{\frac{1}{T-p}\sum_{t=p+1}^{T}\rho_2(d_t^*)}. \qquad (A.11)$$

Substituting (A.9) and (A.10) in (A.11), and writing $w_i(u) = \psi_i(u)/u$, $i = 1, 2$, we get
$$\frac{\partial L(\Omega,\Sigma)}{\partial\Sigma} = \Sigma^{-1} - \frac{m}{s^{*2}}\,\frac{\Sigma^{-1}\Big(\sum_{t=p+1}^{T} w_1(d_t^*)\,a_t a_t'\Big)\Sigma^{-1}}{\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*} - \frac{\frac{m}{T-p}\sum_{t=p+1}^{T} w_2(d_t^*)\,\Sigma^{-1} a_t a_t'\,\Sigma^{-1}}{2 s^{*2}\Big(\frac{1}{T-p}\sum_{t=p+1}^{T}\rho_2(d_t^*)\Big)}$$
$$+\left[\frac{\frac{m}{T-p}\sum_{t=p+1}^{T}\psi_2(d_t^*)\,d_t^*}{2 s^{*2}\Big(\frac{1}{T-p}\sum_{t=p+1}^{T}\rho_2(d_t^*)\Big)\Big(\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*\Big)}\right]\left[\Sigma^{-1}\Big(\sum_{t=p+1}^{T} w_1(d_t^*)\,a_t a_t'\Big)\Sigma^{-1}\right]. \qquad (A.12)$$

Then, by (A.7),
$$2\left(\frac{1}{T-p}\sum_{t=p+1}^{T}\rho_2(d_t^*)\right)\left(\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*\right)\Sigma = \frac{2m}{s^{*2}}\left(\frac{1}{T-p}\sum_{t=p+1}^{T}\rho_2(d_t^*)\right)\left(\sum_{t=p+1}^{T} w_1(d_t^*)\,a_t a_t'\right)$$
$$+\ \frac{m}{s^{*2}}\left(\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*\right)\left(\frac{1}{T-p}\sum_{t=p+1}^{T} w_2(d_t^*)\,a_t a_t'\right) - \frac{m}{s^{*2}}\left(\frac{1}{T-p}\sum_{t=p+1}^{T}\psi_2(d_t^*)\,d_t^*\right)\left(\sum_{t=p+1}^{T} w_1(d_t^*)\,a_t a_t'\right). \qquad (A.13)$$

Solving for $\Sigma$ from (A.13) we get
$$\Sigma = \frac{m}{s^{*2}}\,\frac{\sum_{t=p+1}^{T} w_T^*(d_t^*)\,a_t a_t'}{2\Big(\frac{1}{T-p}\sum_{t=p+1}^{T}\rho_2(d_t^*)\Big)\Big(\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*\Big)}. \qquad (A.14)$$

We also have
$$2\left(\frac{1}{T-p}\sum_{t=p+1}^{T}\rho_2(d_t^*)\right)\left(\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*\right) = 2 B_T\sum_{t=p+1}^{T}\rho_2(d_t^*). \qquad (A.15)$$

From the definition of $A_T$ we get
$$2\sum_{t=p+1}^{T}\rho_2(d_t^*) = (T-p)A_T + \sum_{t=p+1}^{T}\psi_2(d_t^*)\,d_t^*, \qquad (A.16)$$

and then
$$2 B_T\sum_{t=p+1}^{T}\rho_2(d_t^*) = \left(\sum_{t=p+1}^{T}\psi_1(d_t^*)\,d_t^*\right)A_T + \left(\sum_{t=p+1}^{T}\psi_2(d_t^*)\,d_t^*\right)B_T = \sum_{t=p+1}^{T}\psi_T^*(d_t^*)\,d_t^*.$$

Then by (A.14) and (A.15) we obtain
$$\Sigma = \frac{m}{s^{*2}}\,\frac{\sum_{t=p+1}^{T} w_T^*(d_t^*)\,a_t a_t'}{\sum_{t=p+1}^{T}\psi_T^*(d_t^*)\,d_t^*},$$
and this proves that (A.2) is equivalent to (2.8).

Heuristic Proof of Consistency. Let $s_0(F,\Omega,\Sigma)$ and $\tau_0(F,\Omega,\Sigma)$ be the limit values under $F$ of $s\big(d_{p+1}(\Omega,\Sigma), \ldots, d_T(\Omega,\Sigma)\big)$ and $\tau\big(d_{p+1}(\Omega,\Sigma), \ldots, d_T(\Omega,\Sigma)\big)$, respectively. Therefore
$$E_F\!\left(\rho_1\!\left(\frac{d_t(\Omega,\Sigma)}{s_0(F,\Omega,\Sigma)}\right)\right) = \kappa_1 \qquad (A.17)$$
and
$$\tau_0^2(F,\Omega,\Sigma) = \frac{m\,s_0^2(F,\Omega,\Sigma)}{\kappa_2}\,E_F\!\left(\rho_2\!\left(\frac{d_t(\Omega,\Sigma)}{s_0(F,\Omega,\Sigma)}\right)\right). \qquad (A.18)$$


Taking limits in (2.7) and (2.8), we derive that the limit value of $(\tilde\Omega, \tilde\Sigma)$ should satisfy
$$E_F\big(w_H^*\big(d_t^*(\Omega,\Sigma)\big)\,a_t(\Omega)\,z_t'\big) = 0 \qquad (A.19)$$
and
$$\Sigma = \frac{m\,E_F\big(w_H^*\big(d_t(\Omega,\Sigma)/s_0(F,\Omega,\Sigma)\big)\,\hat a_t(\Omega)\,\hat a_t'(\Omega)\big)}{s_0(F,\Omega,\Sigma)\,E_F\big(\psi_H^*\big(d_t(\Omega,\Sigma)/s_0(F,\Omega,\Sigma)\big)\,d_t(\Omega,\Sigma)\big)}. \qquad (A.20)$$
Since $\tau^2\big(d_{p+1}(\tilde\Omega,\tilde\Sigma), \ldots, d_T(\tilde\Omega,\tilde\Sigma)\big) = m$, the limit value of $(\tilde\Omega, \tilde\Sigma)$ should also satisfy
$$\tau_0^2(F,\Omega,\Sigma) = m. \qquad (A.21)$$

Let $(\Omega_0, \Sigma_0)$ be the true values of $(\Omega, \Sigma)$ and $\Sigma_0^* = \lambda_0(H)\Sigma_0$, where $\lambda_0(H)$ is defined in (2.9). We will show that $(\Omega_0, \Sigma_0^*)$ satisfies (A.19), (A.20) and (A.21). We start by proving that $s_0(F,\Omega_0,\Sigma_0^*) = \sigma_0$ and that $\tau_0^2(F,\Omega_0,\Sigma_0^*) = m$. By (2.10),
$$E_F\!\left(\rho_1\!\left(\frac{d_t(\Omega_0,\Sigma_0^*)}{\sigma_0}\right)\right) = E_F\!\left(\rho_1\!\left(\frac{(a_t'\Sigma_0^{*-1}a_t)^{1/2}}{\sigma_0}\right)\right) = E_F\!\left(\rho_1\!\left(\frac{(a_t'\Sigma_0^{-1}a_t)^{1/2}}{k_0}\right)\right) = E_H\!\left(\rho_1\!\left(\frac{u}{k_0}\right)\right) = \kappa_1,$$

and therefore $s_0(F,\Omega_0,\Sigma_0^*) = \sigma_0$. Then, using (2.11), we obtain
$$\frac{m\,\sigma_0^2}{\kappa_2}\,E_F\!\left(\rho_2\!\left(\frac{(a_t'\Sigma_0^{*-1}a_t)^{1/2}}{\sigma_0}\right)\right) = \frac{m\,\sigma_0^2}{\kappa_2}\,E_F\!\left(\rho_2\!\left(\frac{(a_t'\Sigma_0^{-1}a_t)^{1/2}}{k_0}\right)\right) = \frac{m\,\sigma_0^2}{\kappa_2}\,E_H\!\left(\rho_2\!\left(\frac{u}{k_0}\right)\right) = m,$$
and then
$$\tau_0^2(F,\Omega_0,\Sigma_0^*) = m. \qquad (A.22)$$

Since at and zt are independent, we have


$$E_F\big(w_H^*\big(d_t^*(\Omega_0,\Sigma_0^*)\big)\,a_t(\Omega_0)\,z_t'\big) = E_F\big(w_H^*\big((a_t'\Sigma_0^{-1}a_t)^{1/2}/k_0\big)\,a_t z_t'\big) = E_F\big(w_H^*\big((a_t'\Sigma_0^{-1}a_t)^{1/2}/k_0\big)\,a_t\big)\,E_F(z_t') = 0. \qquad (A.23)$$

Put $a_t^* = \Sigma_0^{*-1/2} a_t$, and let $F^*$ be its distribution. Since $F^*$ is spherical, we have that for any function $q$
$$E_{F^*}\!\left(\frac{q(\|a_t^*\|)}{\|a_t^*\|}\,a_t^* a_t^{*\prime}\right) = \frac{1}{m}\,E_{F^*}\big(q(\|a_t^*\|)\,\|a_t^*\|\big)\,I_m,$$
where $\|\cdot\|$ denotes the Euclidean norm. Then, taking $q(u) = \psi_H^*\big(u/\sigma_0(H)\big)$, we get
$$\frac{m\,E_F\big(w_H^*\big(d_t(\Omega_0,\Sigma_0^*)/s_0(F,\Omega_0,\Sigma_0^*)\big)\,\hat a_t(\Omega_0)\,\hat a_t'(\Omega_0)\big)}{s_0(F,\Omega_0,\Sigma_0^*)\,E_F\big(\psi_H^*\big(d_t(\Omega_0,\Sigma_0^*)/s_0(F,\Omega_0,\Sigma_0^*)\big)\,d_t(\Omega_0,\Sigma_0^*)\big)} = \frac{m\,E_F\big(\psi_H^*\big((a_t'\Sigma_0^{*-1}a_t)^{1/2}/\sigma_0\big)\,a_t a_t'/(a_t'\Sigma_0^{*-1}a_t)^{1/2}\big)}{E_F\big(\psi_H^*\big((a_t'\Sigma_0^{*-1}a_t)^{1/2}/\sigma_0\big)\,(a_t'\Sigma_0^{*-1}a_t)^{1/2}\big)}$$
$$= \Sigma_0^{*1/2}\,\frac{m\,E_{F^*}\big(\psi_H^*(\|a_t^*\|/\sigma_0)\,a_t^* a_t^{*\prime}/\|a_t^*\|\big)}{E_{F^*}\big(\psi_H^*(\|a_t^*\|/\sigma_0)\,\|a_t^*\|\big)}\,\Sigma_0^{*1/2} = \Sigma_0^{*1/2}\,\Sigma_0^{*1/2} = \Sigma_0^*. \qquad (A.24)$$

It follows from (A.23), (A.24) and (A.22) that $(\Omega_0, \Sigma_0^*)$ satisfies (A.19), (A.20) and (A.21).

Proof of Theorem 2. We will start assuming that $\Sigma_0 = I_m$. Since $\lim_{T\to\infty}\tilde\Omega = \Omega_0$ and $\lim_{T\to\infty}\tilde\Sigma = (k_0^2/\sigma_0^2)\Sigma_0$, it can be proved that asymptotically equation (2.7) is equivalent to
$$Q(\tilde\Omega) = 0,$$
where
$$Q(\Omega) = \sum_{t=p+1}^{T} w_H^*\big(\|\hat a_t(\Omega)\|\big)\,\hat a_t(\Omega)\,z_t'.$$
Put $Q(\Omega) = (q_{ij}(\Omega))$; then, by the mean value theorem, we have
$$q_{ij}(\tilde\Omega) \simeq q_{ij}(\Omega_0) + \sum_{u=1}^{m}\sum_{v=1}^{r}\frac{\partial q_{ij}(\Omega)}{\partial\omega_{uv}}\Big|_{\Omega=\Omega_0}(\tilde\omega_{uv} - \omega_{0uv}), \qquad (A.25)$$

where $\simeq$ means asymptotically equivalent. On the other hand,
$$\frac{\partial q_{ij}(\Omega)}{\partial\omega_{uv}}\Big|_{\Omega=\Omega_0} = \sum_{t=p+1}^{T}\left(w_H^{*\prime}(\|a_t\|)\,\frac{\partial\|\hat a_t(\Omega)\|}{\partial\omega_{uv}}\Big|_{\Omega=\Omega_0} a_{ti} z_{tj} + w_H^*(\|a_t\|)\,\frac{\partial\hat a_{ti}(\Omega)}{\partial\omega_{uv}}\Big|_{\Omega=\Omega_0} z_{tj}\right), \qquad (A.26)$$
$$\frac{\partial\|\hat a_t(\Omega)\|}{\partial\omega_{uv}}\Big|_{\Omega=\Omega_0} = -\frac{a_{tu} z_{tv}}{\|a_t\|} \qquad (A.27)$$
and
$$\frac{\partial\hat a_{ti}(\Omega)}{\partial\omega_{uv}}\Big|_{\Omega=\Omega_0} = -z_{tv}\,\delta_{iu}. \qquad (A.28)$$

Using (A.27), (A.28) and (A.26) we obtain
$$\frac{\partial q_{ij}(\Omega)}{\partial\omega_{uv}}\Big|_{\Omega=\Omega_0} = -\sum_{t=p+1}^{T}\big[w_H^{*\prime}(\|a_t\|)\,(a_{ti}a_{tu}/\|a_t\|)\,z_{tj}z_{tv} + \delta_{iu}\,w_H^*(\|a_t\|)\,z_{tj}z_{tv}\big]. \qquad (A.29)$$
Put $\tilde\omega_{uv}^* = (T-p)^{1/2}(\tilde\omega_{uv} - \omega_{0uv})$,
$$f_{uv}^{ij} = \frac{1}{T-p}\sum_{t=p+1}^{T}\big[w_H^{*\prime}(\|a_t\|)\,(a_{ti}a_{tu}/\|a_t\|)\,z_{tj}z_{tv} + \delta_{iu}\,w_H^*(\|a_t\|)\,z_{tj}z_{tv}\big]$$
and
$$r_{T,ij} = \frac{1}{(T-p)^{1/2}}\sum_{t=p+1}^{T} w_H^*(\|a_t\|)\,a_{ti}\,z_{tj}.$$

Replacing (A.29) in (A.25) we get the following approximating equation
$$r_{T,ij} \simeq \sum_{u=1}^{m}\sum_{v=1}^{r} f_{uv}^{ij}\,\tilde\omega_{uv}^*. \qquad (A.30)$$
By the ergodic theorem we have
$$\lim_{T\to\infty} f_{uv}^{ij} = \begin{cases} G\,m_{jv} & \text{if } u = i \\ 0 & \text{if } u \ne i \end{cases} \quad \text{a.s.},$$
where
$$G = E\big(w_H^{*\prime}(\|a_t\|)\,\|a_t\|\big)/m + E\big(w_H^*(\|a_t\|)\big)$$
and
$$m_{jv} = E(z_{tj} z_{tv}).$$

Then by (A.30) a new asymptotic equation is
$$r_{T,ij} \simeq G\sum_{v=1}^{r} m_{jv}\,\tilde\omega_{iv}^*,$$
and putting $\tilde\omega_i^* = (\tilde\omega_{i1}^*, \ldots, \tilde\omega_{ir}^*)'$, $r_{T,i} = (r_{T,i1}, \ldots, r_{T,ir})'$ and $M = (m_{ij})$ we get
$$\tilde\omega_i^* \simeq \frac{1}{G}\,M^{-1} r_{T,i}. \qquad (A.31)$$

The central limit theorem implies that the $r_{T,ij}$ ($i = 1, \ldots, m$, $j = 1, \ldots, r$) converge jointly to Gaussian variables $r_{ij}$ with mean 0 and covariances
$$\mathrm{cov}(r_{ij}, r_{i'j'}) = \begin{cases} d\,m_{jj'} & \text{if } i = i' \\ 0 & \text{if } i \ne i', \end{cases}$$
where
$$d = E\big(w_H^{*2}(\|a_t\|)\,\|a_t\|^2\big)/m.$$

Then from (A.31), $\tilde\omega_i^*$ is asymptotically $N\big(0,\,J(\psi_1,\psi_2,H)\,M^{-1}\big)$, where
$$J(\psi_1,\psi_2,H) = \frac{m\,E\big(w_H^{*2}(\|a_t\|)\,\|a_t\|^2\big)}{\big(E(w_H^{*\prime}(\|a_t\|)\,\|a_t\|) + m\,E(w_H^*(\|a_t\|))\big)^2} = \frac{m\,E\big(\psi_H^{*2}(\|a_t\|)\big)}{\big(E(\psi_H^{*\prime}(\|a_t\|)) + (m-1)E(w_H^*(\|a_t\|))\big)^2} = \frac{m\,E_H\big(\psi_H^{*2}(u)\big)}{\big(E_H(\psi_H^{*\prime}(u)) + (m-1)E_H(w_H^*(u))\big)^2}.$$

The CMLE corresponds to $\rho_2(u) = \rho_0(u) = u^2$ and $\rho_1$ arbitrary; then $2\rho_2(u) - \psi_2(u)u = 0$ and therefore $A = 0$. Then $\psi_H^*(u) = 2Bu/k_0$, $E_H(\psi_H^{*2}(u)) = 4B^2 E_H(u^2)/k_0^2$ and $E_H(\psi_H^{*\prime}(u)) = E_H(w_H^*(u)) = 2B/k_0$. Then $J(\psi_1,\rho_0,H) = E_H(u^2)/m$, and Theorem 2 follows for the case $\Sigma_0 = I_m$.

Consider now the case of an arbitrary $\Sigma_0$. Let $x_t^* = \Sigma_0^{-1/2} x_t$ be the transformed process, which is also a VAR(p) process satisfying
$$x_t^* = \phi_{01}^* x_{t-1}^* + \cdots + \phi_{0p}^* x_{t-p}^* + \nu_0^* + a_t^*, \qquad (A.32)$$
where $\phi_{0i}^* = \Sigma_0^{-1/2}\phi_{0i}\Sigma_0^{1/2}$, $\nu_0^* = \Sigma_0^{-1/2}\nu_0$, and the density of $a_t^*$ is $f^*(\|a\|^2)$. Then Theorem 2 also holds because of the equivariance of the τ-estimates.

REFERENCES

Boente, G., Fraiman, R. and Yohai, V. J. (1987). Qualitative robustness for stochastic processes. Annals of Statistics 15, 1293-1312.

Bustos, O. and Yohai, V. (1986). Robust estimates for ARMA models. J. Am. Statist. Assoc. 81, 155-168.

Davies, P. L. (1987). Asymptotic behavior of S-estimates of multivariate location parameters and dispersion matrices. Annals of Statistics 15, 1269-1292.

García Ben, M., Martinez, E. J. and Yohai, V. J. (1999). Robust estimation in vector autoregressive moving-average models. Journal of Time Series Analysis 20, 381-399.

Hampel, F. R. (1971). A general qualitative definition of robustness. Annals of Mathematical Statistics 42, 1887-1896.

Hossjer, O. (1992). On the optimality of S-estimators. Statistics and Probability Letters 14, 413-419.

Huber, P. J. (1981). Robust Statistics. Wiley.

Li, W. K. and Hui, Y. V. (1989). Robust multiple time series modelling. Biometrika 76, 309-315.

Lopuhaä, H. P. (1991). Multivariate τ-estimators for location and scatter. Canadian Journal of Statistics 19, 307-321.

Wilson, G. T. (1973). The estimation of parameters in multivariate time series models. J. R. Statist. Soc. B 40, 76-85.

Yohai, V. J. (1987). High breakdown-point and high efficiency M-estimates for regression. Annals of Statistics 15, 642-656.

Yohai, V. J. and Zamar, R. (1988). High breakdown-point estimates of regression by means of the minimization of an efficient scale. Journal of the American Statistical Association 83, 406-413.


Table 1. Values of c1 and κ1 for the bisquare function

Dimension    1      2      3      4      5      10
c1          1.55   2.66   3.45   4.10   4.65   6.77
κ1          0.20   0.59   0.99   1.40   1.80   3.82

Table 2. Values of the constants c2 and κ2 for the bisquare function and different values of the relative asymptotic efficiency (RAE)

        Dimension 1      Dimension 2      Dimension 3      Dimension 4      Dimension 5      Dimension 10
RAE     c2     κ2        c2     κ2        c2     κ2        c2     κ2        c2     κ2        c2     κ2
0.80    3.98   0.42      3.94   0.77      4.02   1.10      4.10   1.40      4.17   1.67      4.28   2.56
0.90    4.97   0.44      4.97   0.85      5.10   1.24      5.25   1.61      5.39   1.96      5.98   3.54
0.95    6.04   0.46      6.06   0.90      6.24   1.32      6.42   1.73      6.60   2.13      7.50   4.03


Table 3. Mean squared error (MSE, multiplied by 100) and relative efficiency (RE) of the estimates - Model 1

                        Contamination type
                        none            1               2               3
Parameter  Estimate     MSE     RE      MSE     RE      MSE     RE      MSE     RE
ν1         CMLE         2.68    1.00    15.8    1.00    15.3    1.00    15.1    1.00
           RA           2.86    0.94    5.85    2.70    4.81    3.18    4.65    3.25
           τ            2.91    0.92    5.48    2.89    3.74    4.09    3.73    4.05
ν2         CMLE         2.77    1.00    2.53    1.00    14.9    1.00    15.9    1.00
           RA           3.08    0.90    2.81    0.90    4.37    3.41    4.68    3.40
           τ            3.04    0.91    3.14    0.81    3.97    3.77    3.97    4.00
φ11        CMLE         0.66    1.00    12.4    1.00    9.32    1.00    9.26    1.00
           RA           0.73    0.91    2.28    5.45    1.67    5.57    1.57    5.90
           τ            0.70    0.94    2.22    5.60    1.13    8.26    1.09    8.47
φ12        CMLE         0.38    1.00    1.52    1.00    3.91    1.00    4.00    1.00
           RA           0.42    0.91    0.70    2.16    0.75    5.23    0.73    5.47
           τ            0.42    0.91    0.65    2.35    0.60    6.48    0.60    6.62
φ21        CMLE         0.42    1.00    0.27    1.00    4.35    1.00    4.18    1.00
           RA           0.48    0.87    0.40    0.68    0.81    5.35    0.72    5.80
           τ            0.45    0.94    0.39    0.69    0.62    6.98    0.62    6.71
φ22        CMLE         0.58    1.00    0.53    1.00    8.38    1.00    8.39    1.00
           RA           0.65    0.90    0.64    0.83    1.44    5.83    1.43    5.88
           τ            0.63    0.93    0.66    0.80    1.05    8.01    1.02    8.21


Table 4. Mean squared error (MSE, multiplied by 100) and relative efficiency (RE) of the estimates - Model 2

                        Contamination type
                        none            1               2               3
Parameter  Estimate     MSE     RE      MSE     RE      MSE     RE      MSE     RE
ν1         CMLE         2.30    1.00    8.98    1.00    7.38    1.00    14.8    1.00
           RA           2.46    0.94    3.94    2.28    3.00    2.46    4.27    3.47
           τ            2.50    0.92    3.91    2.30    4.00    1.84    3.14    4.72
ν2         CMLE         1.55    1.00    2.19    1.00    8.49    1.00    9.31    1.00
           RA           1.69    0.91    2.06    1.07    2.18    3.89    2.22    4.20
           τ            1.80    0.86    1.95    1.13    2.38    3.57    2.26    4.13
φ11        CMLE         0.67    1.00    18.9    1.00    14.8    1.00    5.64    1.00
           RA           0.74    0.90    2.76    6.86    2.65    5.59    1.34    4.22
           τ            0.75    0.88    2.26    8.38    4.74    3.11    0.94    6.00
φ12        CMLE         0.57    1.00    7.31    1.00    9.28    1.00    4.37    1.00
           RA           0.64    0.89    1.27    5.76    1.31    7.07    1.08    4.04
           τ            0.65    0.87    1.10    6.64    3.40    2.73    0.79    5.52
φ21        CMLE         0.51    1.00    2.54    1.00    0.62    1.00    2.94    1.00
           RA           0.56    0.91    0.68    3.73    0.50    1.25    0.71    4.15
           τ            0.58    0.88    0.67    3.81    0.91    0.68    0.66    4.44
φ22        CMLE         0.60    1.00    1.54    1.00    1.80    1.00    6.17    1.00
           RA           0.66    0.90    0.82    1.88    0.83    2.17    1.18    5.23
           τ            0.65    0.92    0.66    2.31    1.19    1.52    0.85    7.24
