
Econometric Theory, 31, 2015, 337–361. doi:10.1017/S0266466614000322

THE INTEGRATED MEAN SQUARED ERROR OF SERIES REGRESSION AND A ROSENTHAL HILBERT-SPACE INEQUALITY

BRUCE E. HANSEN
University of Wisconsin

This paper develops uniform approximations for the integrated mean squared error (IMSE) of nonparametric series regression estimators, including both least-squares and averaging least-squares estimators. To develop these approximations, we also generalize an important probability inequality of Rosenthal (1970, Israel Journal of Mathematics 8, 273–303; 1972, Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 2, pp. 149–167. University of California Press) to the case of Hilbert-space valued random variables.

1. INTRODUCTION

This paper introduces uniform approximations for the integrated mean squared error (IMSE) of nonparametric series regression estimators. Bounds for the IMSE of series regression estimators have been obtained by Newey (1997), but ours are the first uniform approximations. Related papers include Andrews (1991), who studied the asymptotic normality of series estimators, and Newey (1995), de Jong (2002), Chen (2007, Chap. 76), and Song (2008), who studied uniform convergence. The difference is that we are interested in directly characterizing the IMSE and not just a bound on the rate of convergence. The theory rests on developing a bound on the expectation of the norm of the inverse of the sample design matrix, and for this we use an argument due to Ing and Wei (2003). Our results apply to a wide variety of series regressions, including polynomial and spline expansions.

We also extend the results to averaging estimators, which are weighted averages of least-squares estimators of individual series estimators. Averaging estimators are strict generalizations of standard estimators and thereby can achieve lower IMSE. See Hansen (2007) and Hansen and Racine (2012). We obtain uniform approximations to the IMSE of averaging estimators. This result is derived in the slightly more restricted setting of nested series estimators.

Research supported by the National Science Foundation. I thank the Co-Editor Yoon-Jae Whang and two referees for helpful comments, catching a mistake, and suggesting an alternative proof of Theorem 1. I also would like to thank Rustam Ibragimov for very helpful suggestions and for providing a translation of Utev (1985). Address correspondence to Bruce Hansen, Department of Economics, University of Wisconsin-Madison, 1180 Observatory Drive, Madison, WI 53706, USA; e-mail: [email protected]. www.ssc.wisc.edu/~bhansen.

© Cambridge University Press 2014


To develop these approximations, we also introduce a generalization of the classic Rosenthal inequality. The related inequalities of Marcinkiewicz and Zygmund (1937), Rosenthal (1970, 1972), and Burkholder (1973) are foundational for many problems in applied probability and statistics. The classical form of these inequalities is for real-valued random variables. In this paper we generalize the Rosenthal inequality to allow for random variables taking values in a Hilbert space, which includes the case of random matrices. The result is a simple extension of a Banach space inequality obtained by De Acosta (1981) and a Hilbert space inequality due to Utev (1985).

2. A HILBERT-SPACE ROSENTHAL INEQUALITY

The following is a generalization of the one-sided (upper) inequalities of Marcinkiewicz and Zygmund (1937) and Rosenthal (1970, 1972) to the case of Hilbert spaces. For $2 \le p < 3$ it is based on a Banach space inequality obtained by De Acosta (1981), and for $p \ge 3$ it is derived from a Hilbert space inequality of Utev (1985) following a suggestion of Ibragimov (1997).

THEOREM 1. For any $p \ge 2$ there is a finite constant $A_p$ such that for any independent centered array of random variables $\xi_{ni}$ taking values in a Hilbert space with norm $\|\cdot\|$ such that $E\|\xi_{ni}\|^p < \infty$,

$$E\left\|\sum_{i=1}^{n}\xi_{ni}\right\|^{p} \le A_p\left\{\left(\sum_{i=1}^{n}E\|\xi_{ni}\|^{2}\right)^{p/2}+\sum_{i=1}^{n}E\|\xi_{ni}\|^{p}\right\}. \tag{1}$$

For $p \ge 2$ we have $A_p = \left(2B_p^{1/p}+1\right)^{p}$ where $B_p$ is the constant from the martingale generalization of Rosenthal's inequality (Burkholder, 1973, p. 40, or Hall and Heyde, 1980, Thm. 2.12), and for $p \ge 3$ we have $A_p = 2^p C_p$ where $C_p$ is the exact Rosenthal constant in the independent symmetric case from Ibragimov and Sharakhmetov (1997). If $p = 2$, the expression simplifies to

$$E\left\|\sum_{i=1}^{n}\xi_{ni}\right\|^{2} = \sum_{i=1}^{n}E\|\xi_{ni}\|^{2}. \tag{2}$$

Inequalities of the form (1) are widely used in probability and statistical theory. They are commonly applied to random variables, and this is sufficient for many purposes (as the bound can be applied separately to each element in the matrix $\xi_{ni}$). However, for some purposes it is essential for the bound to involve the same norm as used on the left side.
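As a concrete illustration of the $p = 2$ case, the identity (2) can be checked exactly in a small example. The sketch below (Python, with hypothetical names; not part of the paper) takes the Hilbert space to be $\mathbb{R}^2$ with the Euclidean norm and evaluates both sides of (2) exactly, by enumerating all outcomes of a small discrete distribution rather than by Monte Carlo.

```python
# Illustrative check of the p = 2 identity (2): for independent centered
# Hilbert-space valued variables, E||sum xi_i||^2 = sum_i E||xi_i||^2.
# Here the Hilbert space is R^2 with the Euclidean norm, and expectations
# are computed exactly by enumerating a small discrete distribution.
import itertools
import numpy as np

# Each xi_i takes one of two values with equal probability; the values are
# chosen so that E(xi_i) = 0 (centered), as Theorem 1 requires.
supports = [
    np.array([[1.0, 2.0], [-1.0, -2.0]]),
    np.array([[0.5, -1.0], [-0.5, 1.0]]),
    np.array([[3.0, 0.0], [-3.0, 0.0]]),
]

# Left side: E||xi_1 + xi_2 + xi_3||^2, averaging over all 2^3 equally
# likely outcomes of the three independent variables.
lhs = np.mean([
    np.sum((s1 + s2 + s3) ** 2)
    for s1, s2, s3 in itertools.product(*supports)
])

# Right side: sum_i E||xi_i||^2.
rhs = sum(np.mean(np.sum(s ** 2, axis=1)) for s in supports)

print(lhs, rhs)  # the two sides agree exactly
```

The cross terms $E\langle\xi_{ni},\xi_{nj}\rangle$ for $i \ne j$ vanish by independence and centering, which is exactly the mechanism in the proof of (2) below.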

Remark 1. The constant $A_p$ in (1) depends only on $p$, not on the underlying probability structure nor the specific norm.


Remark 2. Theorem 1 applies to random vectors and matrices for any Hilbert space norm, for example, the Euclidean norm $\|a\| = (\operatorname{tr} a'a)^{1/2}$ for vectors and the Frobenius norm $\|A\|_F = (\operatorname{tr} A'A)^{1/2}$ for matrices.

Remark 3. When $\xi_{ni}$ are identically distributed across $i$ for given $n$, then we can write (1) as

$$E\left\|\sum_{i=1}^{n}\xi_{ni}\right\|^{p} \le A_p\left\{\left(nE\|\xi_{ni}\|^{2}\right)^{p/2}+nE\|\xi_{ni}\|^{p}\right\}. \tag{3}$$

If the distribution of $\xi_{ni}$ does not depend on $n$, then the first term on the right-hand side of (3) is of order $O(n^{p/2})$, which dominates the second term, which is of order $O(n)$ (unless $p = 2$, in which case they are the same). However, this relative ranking can change when the distribution of $\xi_{ni}$ changes with $n$.

Remark 4. Using Loève's $C_r$ inequality (e.g., Davidson, 1994, p. 140), we can bound (1) by $2A_p n^{p/2-1}\sum_{i=1}^{n}E\|\xi_{ni}\|^{p}$, or $2A_p n^{p/2}E\|\xi_{ni}\|^{p}$ in the case where $\xi_{ni}$ are identically distributed across $i$ for given $n$. However, these bounds are often significantly less tight, so are not typically preferred.

Remark 5. Some other moment bounds for matrices, Hilbert-space variables, and Banach-valued variables can be found in de la Pena and Gine (1999) and Nze and Doukhan (2004). A review of moment bounds in econometrics can be found in Lai and Wei (1984).

Remark 6. Theorem 1 is restricted to independent variables. Rosenthal-type inequalities for dependent random variables can be found in Utev (1991), de la Pena, Ibragimov, and Sharakhmetov (2003), and Nze and Doukhan (2004). It would be greatly desirable to extend Theorem 1 to allow for dependence. However, our proof builds on results (De Acosta, 1981; Utev, 1985) which are restricted to independent sequences.

Remark 7. The constant $A_p$ may not be the best possible. See Bestsennaya and Utev (1991) and Ibragimov and Sharakhmetov (2002) for sharp bounds on moments of sums and the best constant in Rosenthal's inequality for the case of random variables and $p$ an even integer, and Ibragimov and Ibragimov (2008) for the case of random variables with zero odd moments.

3. MOMENT BOUNDS FOR SERIES REGRESSION

Consider a sample of independently and identically distributed (iid) observations $(y_i, z_i)$, $i = 1, \ldots, n$, where $z_i \in \mathcal{Z}$, a compact subset of $\mathbb{R}^q$. Define the conditional mean $g(z) = E(y_i \mid z_i = z)$. We examine the estimation of $g(z)$ by series regression. For $m = 1, \ldots, M_n$, let $x_m(z)$ denote a $K_m \times 1$ vector of functions from a series expansion. For example, a power series sets $x_m(z) = (1, z, z^2, \ldots, z^m)$ and a quadratic spline sets $x_m(z) = (1, z, z^2, (z-\tau_1)^2 1(z > \tau_1), \ldots, (z-\tau_m)^2 1(z > \tau_m))$. Set $x_{mi} = x_m(z_i)$, a $K_m \times 1$ regressor vector. The $m$'th series estimator of $g(z)$ is

$$\hat{g}_m(z) = x_m(z)'\hat{\beta}_m,$$

where

$$\hat{\beta}_m = \left(\sum_{i=1}^{n}x_{mi}x_{mi}'\right)^{-1}\sum_{i=1}^{n}x_{mi}y_i$$

is the least-squares coefficient from a regression of $y_i$ on $x_{mi}$.

Many of the challenges arising in the theory of series regression stem from the inversion of the sample design matrix

$$\hat{Q}_m = \frac{1}{n}\sum_{i=1}^{n}x_{mi}x_{mi}'$$

as an estimate of

$$Q_m = E\left(x_{mi}x_{mi}'\right).$$

In this section we describe some properties of the moments of $\hat{Q}_m$ and $\hat{Q}_m^{-1}$.
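The series estimator just described can be sketched in a few lines of code. The following is a minimal illustration (function names are hypothetical, not from the paper) using a power-series basis and ordinary least squares; in the noiseless quadratic example the $m = 2$ model recovers $g$ exactly.

```python
# Minimal sketch of the m'th series least-squares estimator
# g_m(z) = x_m(z)' beta_m with a power-series basis x_m(z) = (1, z, ..., z^m).
import numpy as np

def power_basis(z, m):
    """Stack the K_m x 1 regressor vectors x_m(z), here K_m = m + 1."""
    z = np.asarray(z, dtype=float)
    return np.column_stack([z ** j for j in range(m + 1)])

def series_fit(y, z, m):
    """Least-squares coefficient beta_m from regressing y on x_m(z)."""
    X = power_basis(z, m)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def series_predict(beta, z):
    """Evaluate the fitted series regression g_m(z) = x_m(z)' beta_m."""
    m = len(beta) - 1
    return power_basis(z, m) @ beta

# Noiseless example: g(z) = 1 + 2z + z^2 lies in the m = 2 sieve space,
# so the m = 2 fit recovers the coefficients (1, 2, 1) exactly.
z = np.linspace(-1.0, 1.0, 50)
y = 1.0 + 2.0 * z + z ** 2
beta2 = series_fit(y, z, m=2)
print(np.round(beta2, 6))
```

In practice $y$ also contains noise and the choice of $m$ trades off approximation error against estimation variance, which is exactly what the IMSE analysis below quantifies.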

Define

$$\zeta_m = \sup_{z\in\mathcal{Z}}\left(x_m(z)'Q_m^{-1}x_m(z)\right)^{1/2}, \tag{4}$$

the largest normalized Euclidean length of the regressor vector. Under standard conditions for series regression (including compact support for the regressors), $\zeta_m$ will be a bounded function of the dimension $K_m$. For example, when $x_{mi}$ is a power series, then $\zeta_m^2 = O(K_m^2)$ (see Andrews, 1991), and when $x_{mi}$ is a regression spline, then $\zeta_m^2 = O(K_m)$ (see Newey, 1995). For further discussion see Newey (1997) and Li and Racine (2006).
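The growth of $\zeta_m$ with $K_m$ is easy to illustrate numerically. The sketch below is a hypothetical setup (not from the paper), assuming $z$ uniform on $[-1, 1]$: it approximates $Q_m$ by simple quadrature and takes the supremum in (4) over a fine grid. For the power-series basis the computed $\zeta_m$ grows roughly like $K_m = m + 1$, consistent with $\zeta_m^2 = O(K_m^2)$.

```python
# Numerical illustration of zeta_m in (4) for a power-series basis,
# assuming z ~ Uniform[-1, 1]. Q_m = E(x_m(z) x_m(z)') is approximated by
# quadrature and the supremum is taken over the same fine grid.
import numpy as np

def power_basis(z, m):
    z = np.asarray(z, dtype=float)
    return np.column_stack([z ** j for j in range(m + 1)])

def zeta(m, grid_size=2001):
    z = np.linspace(-1.0, 1.0, grid_size)
    X = power_basis(z, m)
    # Equal-weight quadrature approximation of Q_m under the uniform density.
    w = np.full(grid_size, 1.0 / grid_size)
    Q = (X * w[:, None]).T @ X
    Qinv = np.linalg.inv(Q)
    # Row-wise quadratic forms x_m(z)' Q^{-1} x_m(z), maximized over the grid.
    return np.sqrt(np.max(np.einsum("ij,jk,ik->i", X, Qinv, X)))

# zeta_m grows with the dimension K_m = m + 1; the supremum is attained
# at the boundary of [-1, 1].
for m in (1, 2, 4, 8):
    print(m, round(zeta(m), 2))
```

For the orthonormalized polynomial basis on $[-1,1]$ the exact value is $\zeta_m = K_m$, so the printed values track $m + 1$ up to quadrature error.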

We will also define the array of constants

$$\lambda_{nm} = \left(\frac{\zeta_m^2 K_m}{n}\right)^{1/2} \tag{5}$$

which will appear frequently in our bounds.

For convenience, we will state our moment bounds under the assumption that $Q_m = I_{K_m}$. Since the estimator $\hat{g}_m(z)$ is invariant to rotations of the regressor vector $x_{mi}$, this is without loss of generality for most results of interest.

For any matrix $A$ let $\|A\|_F = (\operatorname{tr} A'A)^{1/2}$ denote the Frobenius norm. The space of $\ell \times m$ matrices with the Frobenius norm is a Hilbert space, allowing the application of Theorem 1.


LEMMA 1. If $Q_m = I_{K_m}$, for any $0 < p \le 2$,

$$E\left\|\hat{Q}_m - I_{K_m}\right\|_F^{p} \le \lambda_{nm}^{p} \tag{6}$$

and for any $p > 2$,

$$E\left\|\hat{Q}_m - I_{K_m}\right\|_F^{p} \le A_p\lambda_{nm}^{p}\left(1 + 2^{p}\lambda_{nm}^{p-2}\right). \tag{7}$$

As shown by Ing and Wei (2003), the moment bound of Lemma 1 plus the following regularity conditions can be used together to establish moment bounds on the inverse moment matrix.

Assumption 1.

1. For some $\delta > 0$, $\sup_{1\le m\le M_n}\zeta_m^2 K_m^{1+\delta}/n \longrightarrow 0$.

2. For some $\alpha > 0$, $\eta > 0$, and $\psi < \infty$, for all $\ell$ such that $\ell'Q_m\ell = 1$ and all $0 \le u \le \eta$, $\sup_m P\left(\left|\ell'x_{mi}\right| \le u\right) \le \psi u^{\alpha}$.

Assumption 1.1 puts a bound on the number of series terms $K_m$ relative to the sample size. For a polynomial series expansion this requirement is satisfied when $K_m^{3+\delta}/n = o(1)$, and for a spline expansion it is satisfied when $K_m^{2+\delta}/n = o(1)$. It indirectly bounds the number of models $M_n$.

Assumption 1.2 is an unusual requirement. It specifies that all linear combinations $\ell'x_{mi}$ have a Lipschitz continuous distribution near the origin. This is used to ensure existence of the expectation of the inverse of the sample design matrix.

Our next set of results gives bounds for the spectral norm $\|A\|_S = \left(\lambda_{\max}(A'A)\right)^{1/2}$ of the inverse design matrix.

LEMMA 2. Under Assumption 1 and $Q_m = I_{K_m}$, for any $p > 0$ and $\eta > 0$ there is an $n$ sufficiently large such that

$$E\left\|\hat{Q}_m^{-1}\right\|_S^{p} \le 1 + \eta. \tag{8}$$

LEMMA 3. Under Assumption 1 and $Q_m = I_{K_m}$, for any $0 < p < 2$ and $\eta > 0$ there is an $n$ sufficiently large such that

$$E\left\|\hat{Q}_m^{-1} - I_{K_m}\right\|_S^{p} \le (1+\eta)\lambda_{nm}^{p}, \tag{9}$$

and for any $p \ge 2$ and $\eta > 0$, there is an $n$ sufficiently large such that

$$E\left\|\hat{Q}_m^{-1} - I_{K_m}\right\|_S^{p} \le (1+\eta)A_{2p}^{1/2}\lambda_{nm}^{p}. \tag{10}$$


4. INTEGRATED MEAN SQUARED ERROR

The integrated mean squared error (IMSE) of the estimator $\hat{g}_m(z)$ is

$$IMSE_n(m) = \int_{\mathcal{Z}}E\left(\hat{g}_m(z) - g(z)\right)^2 f(z)\,dz, \tag{11}$$

where $f(z)$ is the marginal density of $z_i$. We are interested in an approximation for $IMSE_n(m)$ which is uniform across the expansions $m$.

It will be useful to set up some notation. Let

$$\beta_m = \left(E\left(x_{mi}x_{mi}'\right)\right)^{-1}E\left(x_{mi}y_i\right)$$

denote the linear projection coefficient, $e_{mi} = y_i - x_{mi}'\beta_m$ be the projection error, and $e_i = y_i - g(z_i)$ be the regression error. Define the approximation error $r_m(z) = g(z) - x_m(z)'\beta_m$ and $r_{mi} = r_m(z_i)$, and observe that $e_{mi} = r_{mi} + e_i$. Since $e_{mi}$ is a projection error, $E(x_{mi}e_{mi}) = 0$, and since $e_i$ is a regression error, $E(x_{mi}e_i) = 0$. It follows that $E(x_{mi}r_{mi}) = 0$. Set $\phi_m^2 = E\left(r_{mi}^2\right)$ and $Q_m = E\left(x_{mi}x_{mi}'\right)$. Let $\sigma_i^2 = E\left(e_i^2 \mid z_i\right)$ and $\sigma_{mi}^2 = E\left(e_{mi}^2 \mid z_i\right)$, and set $\Omega_m = E\left(x_{mi}x_{mi}'\sigma_i^2\right)$ and $\Omega_m^* = E\left(x_{mi}x_{mi}'\sigma_{mi}^2\right)$.

By definition, $g(z) = x_m(z)'\beta_m + r_m(z)$, so

$$\hat{g}_m(z) - g(z) = x_m(z)'\left(\hat{\beta}_m - \beta_m\right) - r_m(z). \tag{12}$$

Thus

$$\begin{aligned}
\int_{\mathcal{Z}}\left(\hat{g}_m(z) - g(z)\right)^2 f(z)\,dz &= \int_{\mathcal{Z}}r_m(z)^2 f(z)\,dz \\
&\quad - 2\left(\hat{\beta}_m - \beta_m\right)'\int_{\mathcal{Z}}x_m(z)r_m(z)f(z)\,dz \\
&\quad + \left(\hat{\beta}_m - \beta_m\right)'\int_{\mathcal{Z}}x_m(z)x_m(z)'f(z)\,dz\left(\hat{\beta}_m - \beta_m\right) \\
&= E\left(r_{mi}^2\right) - 2\left(\hat{\beta}_m - \beta_m\right)'E\left(x_{mi}r_{mi}\right) \\
&\quad + \left(\hat{\beta}_m - \beta_m\right)'E\left(x_{mi}x_{mi}'\right)\left(\hat{\beta}_m - \beta_m\right) \\
&= \phi_m^2 + \left(\hat{\beta}_m - \beta_m\right)'Q_m\left(\hat{\beta}_m - \beta_m\right),
\end{aligned}$$

the second equality since integration over the density $f(z)$ is the same as taking expectations, and the third using the fact that $E(x_{mi}r_{mi}) = 0$. Taking expectations, we have found that

$$IMSE_n(m) = \phi_m^2 + E\left(\left(\hat{\beta}_m - \beta_m\right)'Q_m\left(\hat{\beta}_m - \beta_m\right)\right). \tag{13}$$

The standard asymptotic covariance matrix for $\hat{\beta}_m$ is $n^{-1}Q_m^{-1}\Omega_m^*Q_m^{-1}$. Substituting this covariance matrix for the expectation in (13), we might expect $IMSE_n(m)$ to be close to

$$I_n^*(m) = \phi_m^2 + \frac{1}{n}\operatorname{tr}\left(Q_m^{-1}\Omega_m^*\right). \tag{14}$$


Furthermore, we might expect $\Omega_m^* \simeq \Omega_m$, so that perhaps $IMSE_n(m)$ might be close to

$$I_n(m) = \phi_m^2 + \frac{1}{n}\operatorname{tr}\left(Q_m^{-1}\Omega_m\right). \tag{15}$$

To show that $IMSE_n(m)$ is uniformly close to both $I_n^*(m)$ and $I_n(m)$, we add the following regularity conditions.

Assumption 2.

1. $\left\|Q_m^{-1}\right\|_S \le B < \infty$.

2. $0 < \underline{\sigma}^2 \le \sigma_i^2 \le \overline{\sigma}^2 < \infty$.

3. $g(z)$ is continuously differentiable on $z \in \mathcal{Z}$.

Assumption 2.1 states that the smallest eigenvalue of $Q_m$ is bounded above zero, and thus $Q_m$ is uniformly invertible. This is a standard condition which is satisfied by typical series expansions. For example, Newey (1997) demonstrates that Assumption 2.1 holds when the support $\mathcal{Z}$ of $z_i$ is a Cartesian product of compact connected intervals on which the density $f(z)$ is bounded away from zero.

Assumption 2.2 controls the degree of conditional heteroskedasticity, bounding the conditional variance away from zero and infinity. Assumption 2.3 is a mild smoothness condition.

We can use the uniform approximations of Lemmas 1–3 to establish the following technical bound.

LEMMA 4. Under Assumptions 1 and 2, for $n$ sufficiently large and all $m \le M_n$,

$$\left|E\left(\left(\hat{\beta}_m - \beta_m\right)'Q_m\left(\hat{\beta}_m - \beta_m\right)\right) - \frac{\operatorname{tr}\left(Q_m^{-1}\Omega_m^*\right)}{n}\right| \le A_4\lambda_{nm}\frac{\operatorname{tr}\left(Q_m^{-1}\Omega_m^*\right)}{n}. \tag{16}$$

Using (13) and Lemma 4, we can show that $I_n^*(m)$ and $I_n(m)$ are uniform approximations to $IMSE_n(m)$:

THEOREM 2. Under Assumptions 1 and 2, as $n \to \infty$,

$$\sup_{1\le m\le M_n}\left|\frac{IMSE_n(m) - I_n^*(m)}{I_n^*(m)}\right| \longrightarrow 0 \tag{17}$$

and

$$\sup_{1\le m\le M_n}\left|\frac{IMSE_n(m) - I_n(m)}{I_n(m)}\right| \longrightarrow 0. \tag{18}$$


Another way of stating these results is that $IMSE_n(m) = I_n^*(m)(1+o(1)) = I_n(m)(1+o(1))$ uniformly across the series expansions $m \le M_n$.

Interestingly, Theorem 2 does not put a direct bound on the number of models $M_n$ relative to the sample size $n$. The number of models is only indirectly bounded through Assumption 1, which primarily bounds the dimensionality of the models.

The uniform approximation provided in Theorem 2 is an important step toward showing that data-dependent choices of the model $m$ can be optimal in the sense of minimizing the IMSE. Hansen (2014) shows that under regularity conditions cross-validation selection is asymptotically equivalent to selecting the model which minimizes $I_n(m)$. Combined with Theorem 2, we conclude that cross-validation is asymptotically optimal with respect to minimizing IMSE.
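The cross-validation selection rule discussed above can be sketched as follows. This is a generic illustration, not the paper's implementation: for least squares, the leave-one-out residuals have the closed form $\hat e_i/(1 - h_{ii})$, where $h_{ii}$ is the $i$-th leverage value, so the CV criterion is computable from a single fit per model. The data-generating process below is hypothetical.

```python
# Sketch of choosing the series order m by leave-one-out cross-validation.
import numpy as np

def power_basis(z, m):
    z = np.asarray(z, dtype=float)
    return np.column_stack([z ** j for j in range(m + 1)])

def cv_score(y, z, m):
    """Leave-one-out CV criterion for the order-m power-series regression."""
    X = power_basis(z, m)
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat (projection) matrix
    resid = y - H @ y                       # in-sample residuals
    loo = resid / (1.0 - np.diag(H))        # closed-form leave-one-out residuals
    return np.mean(loo ** 2)

# Simulated example: g(z) = sin(3z) is clearly nonlinear, so CV should
# reject the smallest models in favor of a higher-order expansion.
rng = np.random.default_rng(0)
n = 200
z = rng.uniform(-1.0, 1.0, n)
y = np.sin(3.0 * z) + 0.1 * rng.standard_normal(n)

scores = {m: cv_score(y, z, m) for m in range(1, 9)}
m_cv = min(scores, key=scores.get)
print(m_cv)
```

The selected order balances the squared bias $\phi_m^2$ against the variance term $\operatorname{tr}(Q_m^{-1}\Omega_m)/n$, which is the tradeoff that $I_n(m)$ makes explicit.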

5. AVERAGING REGRESSIONS

Reductions in IMSE can be achieved by averaging across the individual series estimators $\hat{g}_m(z)$. Let $w = (w_1, \ldots, w_{M_n})$ be a set of nonnegative weights which sum to one. Define the averaging estimator

$$\hat{g}(z) = \sum_{m=1}^{M_n}w_m\hat{g}_m(z).$$
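A minimal sketch of the averaging estimator follows (hypothetical function names; the weights are chosen arbitrarily for illustration, not optimized):

```python
# Sketch of the averaging estimator: a weighted combination of the
# individual series fits g_m(z), with nonnegative weights summing to one.
import numpy as np

def power_basis(z, m):
    z = np.asarray(z, dtype=float)
    return np.column_stack([z ** j for j in range(m + 1)])

def series_fit_predict(y, z, m, z_eval):
    """Fit the order-m series regression and evaluate it at z_eval."""
    X = power_basis(z, m)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return power_basis(z_eval, m) @ beta

def averaging_predict(y, z, weights, z_eval):
    """g_hat(z) = sum_m w_m g_m_hat(z), weights on the unit simplex."""
    fits = [series_fit_predict(y, z, m, z_eval)
            for m in range(1, len(weights) + 1)]
    return sum(w * g for w, g in zip(weights, fits))

rng = np.random.default_rng(1)
z = rng.uniform(-1.0, 1.0, 100)
y = np.sin(2.0 * z) + 0.1 * rng.standard_normal(100)
z_eval = np.linspace(-1.0, 1.0, 9)

w = np.array([0.2, 0.3, 0.5])       # weights over models m = 1, 2, 3
g_avg = averaging_predict(y, z, w, z_eval)
print(np.round(g_avg, 3))
```

Placing all weight on a single model recovers the corresponding least-squares fit, which is the sense in which averaging strictly generalizes selection.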

The IMSE of the averaging estimator is

$$\begin{aligned}
IMSE_n(w) &= \int_{\mathcal{Z}}E\left(\hat{g}(z) - g(z)\right)^2 f(z)\,dz \\
&= \sum_{m=1}^{M_n}w_m^2\int_{\mathcal{Z}}E\left(\hat{g}_m(z) - g(z)\right)^2 f(z)\,dz \\
&\quad + 2\sum_{\ell=1}^{M_n}\sum_{m=1}^{\ell-1}w_\ell w_m\int_{\mathcal{Z}}E\left(\left(\hat{g}_m(z) - g(z)\right)\left(\hat{g}_\ell(z) - g(z)\right)\right)f(z)\,dz.
\end{aligned}$$

In general, the series regressions need not be nested, but for simplicity we add that assumption as it greatly simplifies our calculations. Recall that $q = \dim(z_i)$.

Assumption 3. For a power series or spline basis sequence $\tau_j(z)$, $j = 1, 2, \ldots$,

$$x_m(z) = \left(\tau_1(z), \tau_2(z), \tau_3(z), \ldots, \tau_m(z)\right).$$

Assumption 3 is satisfied by nested power series. It is also satisfied by sequences of splines when knots are added but not deleted. With this additional assumption, we can show that $IMSE_n(w)$ is uniformly close to


$$\begin{aligned}
I_n^*(w) &= \sum_{m=1}^{M_n}w_m^2\left(\phi_m^2 + \frac{1}{n}\operatorname{tr}\left(Q_m^{-1}\Omega_m^*\right)\right) \\
&\quad + 2\sum_{\ell=1}^{M_n}\sum_{m=1}^{\ell-1}w_\ell w_m\left(\phi_\ell^2 + \frac{1}{n}\operatorname{tr}\left(Q_m^{-1}\Omega_m^*\right)\right).
\end{aligned}$$

Let $\mathcal{W}_n$ be the $M_n$-dimensional unit simplex.

THEOREM 3. Under Assumptions 1–3, as $n \to \infty$,

$$\sup_{w\in\mathcal{W}_n}\left|\frac{IMSE_n(w) - I_n^*(w)}{I_n^*(w)}\right| \longrightarrow 0.$$

Theorem 3 shows that $I_n^*(w)$ is a uniformly good approximation to the IMSE of the averaging estimator, where the uniformity is over all weight vectors. In analogy to Theorem 2, we might expect that $IMSE_n(w)$ is also uniformly close to

$$\begin{aligned}
I_n(w) &= \sum_{m=1}^{M_n}w_m^2\left(\phi_m^2 + \frac{1}{n}\operatorname{tr}\left(Q_m^{-1}\Omega_m\right)\right) \\
&\quad + 2\sum_{\ell=1}^{M_n}\sum_{m=1}^{\ell-1}w_\ell w_m\left(\phi_\ell^2 + \frac{1}{n}\operatorname{tr}\left(Q_m^{-1}\Omega_m\right)\right).
\end{aligned} \tag{19}$$

This, however, turns out to be harder to establish. To do so, we need a stronger condition.

Assumption 4.

1. $g(z)$ has $s$ continuous derivatives on $z \in \mathcal{Z}$ with $s \ge q/2$ for a spline, and $s \ge q$ for a power series.

2. $\phi_m^2 > 0$ for all $m$.

Assumption 4.1 is a strengthening of the smoothness condition Assumption 2.3 when $q \ge 3$ for a spline basis or $q \ge 2$ for a power series. Assumption 4.2 specifies that no finite-dimensional expansion equals the true regression function, and thus all expansions are approximations. This is standard in the nonparametrics literature. It excludes the possibility that the true regression function is a finite-dimensional element of the sieve space (e.g., a linear function).

THEOREM 4. Under Assumptions 1–4, as $n \to \infty$,

$$\sup_{w\in\mathcal{W}_n}\left|\frac{IMSE_n(w) - I_n(w)}{I_n(w)}\right| \longrightarrow 0.$$


Theorem 4 is relevant for establishing the optimality of cross-validation selection of the averaging weights. Under similar conditions, Hansen (2014) shows that the cross-validation weight selection is asymptotically equivalent to selecting the model which minimizes $I_n(w)$. Theorem 4 thus establishes that cross-validation is asymptotically optimal with respect to IMSE for selection of the averaging weights.

6. PROOFS

Proof of Theorem 1. Since $\|\cdot\|$ is a Hilbert space norm, we can write $\|x\|^2 = \langle x, x\rangle$ where $\langle x, y\rangle$ is an inner product. Thus

$$\begin{aligned}
E\left\|\sum_{i=1}^{n}\xi_{ni}\right\|^{2} &= E\left\langle\sum_{i=1}^{n}\xi_{ni}, \sum_{j=1}^{n}\xi_{nj}\right\rangle \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n}E\left\langle\xi_{ni}, \xi_{nj}\right\rangle \\
&= \sum_{i=1}^{n}E\left\langle\xi_{ni}, \xi_{ni}\right\rangle \\
&= \sum_{i=1}^{n}E\|\xi_{ni}\|^{2}. \tag{20}
\end{aligned}$$

The second and third equalities hold by the linearity of the inner product and $E\xi_{ni} = 0$, and the fourth is the definition $\|x\|^2 = \langle x, x\rangle$. This is (2).

Define $S_n = \sum_{i=1}^{n}\xi_{ni}$. For any $p \ge 2$, De Acosta (1981, Thm. 2.1, part (2)) established the following inequality, valid for Banach-valued random variables $\xi_{ni}$ (which includes Hilbert spaces):

$$E\left|\|S_n\| - E\|S_n\|\right|^{p} \le 2^{p}B_p\left\{\left(\sum_{i=1}^{n}E\|\xi_{ni}\|^{2}\right)^{p/2} + \sum_{i=1}^{n}E\|\xi_{ni}\|^{p}\right\}. \tag{21}$$

By Minkowski's inequality, Liapunov's inequality, (20), and (21),

$$\begin{aligned}
\left(E\|S_n\|^{p}\right)^{1/p} &= \left(E\left|\|S_n\| - E\|S_n\| + E\|S_n\|\right|^{p}\right)^{1/p} \\
&\le \left(E\left|\|S_n\| - E\|S_n\|\right|^{p}\right)^{1/p} + E\|S_n\| \\
&\le \left(E\left|\|S_n\| - E\|S_n\|\right|^{p}\right)^{1/p} + \left(E\|S_n\|^{2}\right)^{1/2} \\
&\le 2B_p^{1/p}\left\{\left(\sum_{i=1}^{n}E\|\xi_{ni}\|^{2}\right)^{p/2} + \sum_{i=1}^{n}E\|\xi_{ni}\|^{p}\right\}^{1/p} + \left(\sum_{i=1}^{n}E\|\xi_{ni}\|^{2}\right)^{1/2} \\
&\le \left(2B_p^{1/p} + 1\right)\left\{\left(\sum_{i=1}^{n}E\|\xi_{ni}\|^{2}\right)^{p/2} + \sum_{i=1}^{n}E\|\xi_{ni}\|^{p}\right\}^{1/p},
\end{aligned}$$

which establishes (1) with $A_p = \left(2B_p^{1/p} + 1\right)^{p}$.

For $p \ge 3$ we follow a suggestion from Ibragimov (1997, Rem. 1.10) to obtain a tighter bound. First, to induce symmetry, let $\varepsilon_1, \ldots, \varepsilon_n$ be independent Rademacher random variables independent of $\xi_{ni}$. (A Rademacher random variable is a symmetric random variable on the two points $\{-1, 1\}$.) Then by the symmetrization inequality (e.g., de la Pena and Gine, 1999, Lem. 1.2.6),

$$E\left\|\sum_{i=1}^{n}\xi_{ni}\right\|^{p} \le 2^{p}E\left\|\sum_{i=1}^{n}\varepsilon_i\xi_{ni}\right\|^{p}. \tag{22}$$

Since the Hilbert-space variables $\varepsilon_i\xi_{ni}$ are independent and symmetric, we can apply Corollary 4 of Utev (1985) (given below) so that the right-hand side of (22) is bounded by

$$2^{p}E\left|\sum_{i=1}^{n}\|\varepsilon_i\xi_{ni}\|u_i\right|^{p} = 2^{p}E\left|\sum_{i=1}^{n}\|\xi_{ni}\|u_i\right|^{p}, \tag{23}$$

where $u_1, \ldots, u_n$ are independent Rademacher random variables independent of $\xi_{ni}$ and $\varepsilon_i$, and the equality holds because $\|\varepsilon_i\xi_{ni}\| = \|\xi_{ni}\|$. The right-hand moment only involves the sum of the symmetric real-valued random variables $\|\xi_{ni}\|u_i$, and thus we can apply the classic Rosenthal inequality to bound (23) by

$$2^{p}C_p\left\{\left(\sum_{i=1}^{n}E\|\xi_{ni}\|^{2}u_i^{2}\right)^{p/2} + \sum_{i=1}^{n}E\|\xi_{ni}\|^{p}|u_i|^{p}\right\} = 2^{p}C_p\left\{\left(\sum_{i=1}^{n}E\|\xi_{ni}\|^{2}\right)^{p/2} + \sum_{i=1}^{n}E\|\xi_{ni}\|^{p}\right\}.$$

See Ibragimov and Sharakhmetov (1997) for the constant $C_p$. This is (1) for $p \ge 3$ with $A_p = 2^{p}C_p$. ∎

For the convenience of English readers I state here Corollary 4 from Utev (1985), kindly translated for me by Rustam Ibragimov.

LEMMA 5. (Utev, 1985) Let $\psi_1, \ldots, \psi_n$ be independent symmetrically distributed random variables taking values in a separable Hilbert space with norm $\|\cdot\|$. Let $u_1, \ldots, u_n$ be independent real-valued Rademacher random variables independent of $\psi_i$. Then for all $p \ge 3$,

$$E\left\|\sum_{i=1}^{n}\psi_i\right\|^{p} \le E\left|\sum_{i=1}^{n}\|\psi_i\|u_i\right|^{p}.$$

Proof of Lemma 1. Let $\|a\| = (a'a)^{1/2}$ denote the usual Euclidean norm for vectors $a$. It is useful to observe that

$$E\|x_{mi}\|^{2} = E\operatorname{tr}x_{mi}x_{mi}' = \operatorname{tr}I_{K_m} = K_m. \tag{24}$$

Also, since $\|x_{mi}\| \le \sup_{z\in\mathcal{Z}}\|x_m(z)\| = \zeta_m$, then for any $q \ge 2$,

$$E\|x_{mi}\|^{q} = E\|x_{mi}\|^{q-2}\|x_{mi}\|^{2} \le \zeta_m^{q-2}E\|x_{mi}\|^{2} = \zeta_m^{q-2}K_m. \tag{25}$$

Thus by the MSE minimizing property of the mean and (25) with $q = 4$,

$$E\left\|x_{mi}x_{mi}' - I_{K_m}\right\|_F^{2} \le E\left\|x_{mi}x_{mi}'\right\|_F^{2} = E\|x_{mi}\|^{4} \le \zeta_m^{2}K_m. \tag{26}$$

Suppose $p \le 2$. By Liapunov's inequality, Theorem 1 (2), and (26),

$$\begin{aligned}
E\left\|\hat{Q}_m - I_{K_m}\right\|_F^{p} &\le \left(E\left\|\hat{Q}_m - I_{K_m}\right\|_F^{2}\right)^{p/2} \\
&= \left(n^{-2}E\left\|\sum_{i=1}^{n}\left(x_{mi}x_{mi}' - I_{K_m}\right)\right\|_F^{2}\right)^{p/2} \\
&\le \left(\frac{E\left\|x_{mi}x_{mi}'\right\|_F^{2}}{n}\right)^{p/2} \le \left(\frac{\zeta_m^{2}K_m}{n}\right)^{p/2} = \lambda_{nm}^{p},
\end{aligned}$$

which is (6).

Next, suppose $p > 2$. Using Loève's $C_r$ inequality, Liapunov's inequality, and (25) with $q = 2p$,

$$E\left\|x_{mi}x_{mi}' - I_{K_m}\right\|_F^{p} \le 2^{p-1}\left(E\left\|x_{mi}x_{mi}'\right\|_F^{p} + \left\|I_{K_m}\right\|_F^{p}\right) \le 2^{p}E\|x_{mi}\|^{2p} \le 2^{p}\zeta_m^{2p-2}K_m. \tag{27}$$


Then by Theorem 1, (26), and (27),

$$\begin{aligned}
E\left\|\hat{Q}_m - I_{K_m}\right\|_F^{p} &= n^{-p}E\left\|\sum_{i=1}^{n}\left(x_{mi}x_{mi}' - I_{K_m}\right)\right\|_F^{p} \\
&\le A_p\left\{\left(\frac{E\left\|x_{mi}x_{mi}' - I_{K_m}\right\|_F^{2}}{n}\right)^{p/2} + \frac{E\left\|x_{mi}x_{mi}' - I_{K_m}\right\|_F^{p}}{n^{p-1}}\right\} \\
&\le A_p\left\{\left(\frac{\zeta_m^{2}K_m}{n}\right)^{p/2} + \frac{2^{p}\zeta_m^{2p-2}K_m}{n^{p-1}}\right\} \\
&\le A_p\left\{\left(\frac{\zeta_m^{2}K_m}{n}\right)^{p/2} + 2^{p}\left(\frac{\zeta_m^{2}K_m}{n}\right)^{p-1}\right\} \\
&= A_p\lambda_{nm}^{p}\left(1 + 2^{p}\lambda_{nm}^{p-2}\right), \tag{28}
\end{aligned}$$

where the third inequality holds since $K_m \le K_m^{p-1}$ as $p \ge 2$, and the final equality uses definition (5). This is (7). ∎

Proof of Lemma 2. The argument follows the proof of Theorem 2 of Ing and Wei (2003). While that result was developed for autoregressive regression, the method carries over to the nonparametric setting under Assumption 1.

If $p < 1$, then

$$\left(E\left\|\hat{Q}_m^{-1}\right\|_S^{p}\right)^{1/p} \le E\left\|\hat{Q}_m^{-1}\right\|_S,$$

so without loss of generality we may assume $p \ge 1$. We may also assume that $K_m \ge 1$, as for $K_m = 0$ the result is trivial.

Let $s > 6/\delta$ be an integer and set $J = 2^{s}p$. The first step is to establish that for some $B < \infty$ which depends only on $J$, $\alpha$, $\eta$, and $\psi$,

$$\left(E\left\|\hat{Q}_m^{-1}\right\|_S^{J}\right)^{1/J} \le BK_m^{3}. \tag{29}$$

The proof of (29) is nearly identical to that of Lemma 1 of Ing and Wei (2003). The only difference is the method to bound the left side of their equation (2.13),

$$P\left(\bigcap_{i=1}^{tK_m}\left\{\left|\ell_j'x_{mi}\right| \le 3u^{-1/(2q)}\right\}\right), \tag{30}$$

by the displayed expression above their (2.14). Ing and Wei develop a detailed argument using the autoregressive structure of their problem. Instead, we use the independence of the observations, and then Assumption 1.2, to observe that (30) equals

$$P\left(\left|\ell_j'x_{mi}\right| \le 3u^{-1/(2q)}\right)^{tK_m} \le \psi^{tK_m}\left(3u^{-1/(2q)}\right)^{\alpha tK_m},$$

which is smaller than the bound above Ing–Wei's (2.14). Their argument otherwise goes through with these substitutions and we find (29).

By Minkowski's inequality, for any $1 \le r \le J/2$,

$$\left(E\left\|\hat{Q}_m^{-1}\right\|_S^{r}\right)^{1/r} \le \left\|I_{K_m}\right\|_S + \left(E\left\|\hat{Q}_m^{-1} - I_{K_m}\right\|_S^{r}\right)^{1/r}. \tag{31}$$

By the norm inequality and the fact that $\|A\|_S \le \|A\|_F$,

$$\begin{aligned}
\left\|\hat{Q}_m^{-1} - I_{K_m}\right\|_S &= \left\|\left(\hat{Q}_m - I_{K_m}\right)\hat{Q}_m^{-1}\right\|_S \\
&\le \left\|\hat{Q}_m - I_{K_m}\right\|_S\left\|\hat{Q}_m^{-1}\right\|_S \\
&\le \left\|\hat{Q}_m - I_{K_m}\right\|_F\left\|\hat{Q}_m^{-1}\right\|_S. \tag{32}
\end{aligned}$$

Applying the Cauchy–Schwarz inequality,

$$\left(E\left\|\hat{Q}_m^{-1} - I_{K_m}\right\|_S^{r}\right)^{1/r} \le \left(E\left\|\hat{Q}_m - I_{K_m}\right\|_F^{2r}\right)^{1/2r}\left(E\left\|\hat{Q}_m^{-1}\right\|_S^{2r}\right)^{1/2r}. \tag{33}$$

Set $\psi = (1+\eta)^{1/p} - 1$ and

$$\varepsilon = \min\left[\frac{\psi}{2+\psi}, \left(\frac{\psi}{2B}\right)^{1/s}\right]. \tag{34}$$

Lemma 1 and Assumption 1.1 imply that there is an $n$ sufficiently large such that

$$\left(E\left\|\hat{Q}_m - I_{K_m}\right\|_F^{2r}\right)^{1/2r} \le A_{2r}^{1/2r}\lambda_{nm}\left(1 + 2^{2r}\lambda_{nm}^{2r-2}\right)^{1/2r} \le \varepsilon K_m^{-\delta/2}. \tag{35}$$

Equations (31)–(35) plus $\left\|I_{K_m}\right\|_S = 1$ establish that

$$\left(E\left\|\hat{Q}_m^{-1}\right\|_S^{r}\right)^{1/r} \le 1 + \varepsilon K_m^{-\delta/2}\left(E\left\|\hat{Q}_m^{-1}\right\|_S^{2r}\right)^{1/2r}. \tag{36}$$


Iterating (36) $s$ times, starting with $r = p$,

$$\begin{aligned}
\left(E\left\|\hat{Q}_m^{-1}\right\|_S^{p}\right)^{1/p} &\le 1 + \varepsilon K_m^{-\delta/2}\left(E\left\|\hat{Q}_m^{-1}\right\|_S^{2p}\right)^{1/2p} \\
&\le 1 + \varepsilon K_m^{-\delta/2} + \left(\varepsilon K_m^{-\delta/2}\right)^{2} + \cdots + \left(\varepsilon K_m^{-\delta/2}\right)^{s}\left(E\left\|\hat{Q}_m^{-1}\right\|_S^{2^{s}p}\right)^{1/2^{s}p} \\
&\le 1 + \varepsilon + \varepsilon^{2} + \cdots + \varepsilon^{s-1} + \left(\varepsilon K_m^{-\delta/2}\right)^{s}BK_m^{3} \\
&\le \frac{1}{1-\varepsilon} + \varepsilon^{s}B \\
&\le 1 + \psi = (1+\eta)^{1/p},
\end{aligned}$$

where the third inequality is (29), the fourth inequality uses $s > 6/\delta$, and the final uses (34). This is (8). ∎

Proof of Lemma 3. For $0 < p < 2$, applying Hölder's inequality to (32), and then (6) and (8), for some $\varepsilon > 0$,

$$E\left\|\hat{Q}_m^{-1} - I_{K_m}\right\|_S^{p} \le \left(E\left\|\hat{Q}_m - I_{K_m}\right\|_F^{2}\right)^{p/2}\left(E\left\|\hat{Q}_m^{-1}\right\|_S^{2p/(2-p)}\right)^{(2-p)/2} \le (1+\varepsilon)^{(2-p)/2}\lambda_{nm}^{p},$$

which is (9) with $\eta = (1+\varepsilon)^{(2-p)/2} - 1$.

For $p \ge 2$, Lemma 1 and Assumption 1.1 imply that for any $\eta > 0$ there is an $n$ sufficiently large such that

$$\left(E\left\|\hat{Q}_m - I_{K_m}\right\|_F^{2p}\right)^{1/2} \le A_{2p}^{1/2}\lambda_{nm}^{p}\left(1 + 2^{2p}\lambda_{nm}^{2p-2}\right)^{1/2} \le (1+\eta)^{1/2}A_{2p}^{1/2}\lambda_{nm}^{p}. \tag{37}$$

Then by (33), (37), and (8), we obtain for $n$ sufficiently large

$$E\left\|\hat{Q}_m^{-1} - I_{K_m}\right\|_S^{p} \le \left(E\left\|\hat{Q}_m - I_{K_m}\right\|_F^{2p}\right)^{1/2}\left(E\left\|\hat{Q}_m^{-1}\right\|_S^{2p}\right)^{1/2} \le (1+\eta)A_{2p}^{1/2}\lambda_{nm}^{p}.$$

This is (10). ∎

Proof of Lemma 4. Observe that under Assumption 2.1, the IMSE (16) is unaffected if we replace the regressors $x_{mi}$ with $x_{mi}^* = Q_m^{-1/2}x_{mi}$, which has the implication that $E\left(x_{mi}^*x_{mi}^{*\prime}\right) = I_{K_m}$. For convenience and without loss of generality we shall simply assume that this transformation has been made, or equivalently that $Q_m = I_{K_m}$, and thus we can apply Lemmas 1–3 without modification.


Define $S_m = \frac{1}{n}\sum_{i=1}^{n}x_{mi}e_{mi}$ so that $\hat{\beta}_m - \beta_m = \hat{Q}_m^{-1}S_m$. Since $S_m$ is the average of iid mean-zero random vectors, we calculate that

$$E\left(S_m'S_m\right) = \frac{1}{n}E\left(x_{mi}'x_{mi}e_{mi}^{2}\right) = \frac{\operatorname{tr}\Omega_m^*}{n}.$$

Thus

$$\begin{aligned}
E\left(\left(\hat{\beta}_m - \beta_m\right)'\left(\hat{\beta}_m - \beta_m\right)\right) - \left(\frac{\operatorname{tr}\Omega_m^*}{n}\right) &= E\left(S_m'\hat{Q}_m^{-1}\hat{Q}_m^{-1}S_m\right) - \left(\frac{\operatorname{tr}\Omega_m^*}{n}\right) \\
&= E\left(S_m'\left(\hat{Q}_m^{-1}\hat{Q}_m^{-1} - I_{K_m}\right)S_m\right). \tag{38}
\end{aligned}$$

Now define $S_m^0 = \frac{1}{n}\sum_{i=1}^{n}x_{mi}e_i$ and $\gamma_m = \frac{1}{n}\sum_{i=1}^{n}x_{mi}r_{mi}$ so that $S_m = S_m^0 + \gamma_m$. Define $Z = (z_1, \ldots, z_n)$ and note that $\gamma_m$ is measurable with respect to $Z$, $E\left(S_m^0 \mid Z\right) = 0$, and $E\left(S_m^0S_m^{0\prime} \mid Z\right) = \frac{1}{n}\tilde{\Omega}_m$ where $\tilde{\Omega}_m = \frac{1}{n}\sum_{i=1}^{n}x_{mi}x_{mi}'\sigma_i^{2}$. Using the law of iterated expectations, we find that (38) equals

$$\begin{aligned}
E\left(E\left(S_m'\left(\hat{Q}_m^{-1}\hat{Q}_m^{-1} - I_{K_m}\right)S_m \mid Z\right)\right) &= E\left(\gamma_m'\left(\hat{Q}_m^{-1}\hat{Q}_m^{-1} - I_{K_m}\right)\gamma_m\right) + \frac{1}{n}E\operatorname{tr}\left(\left(\hat{Q}_m^{-1}\hat{Q}_m^{-1} - I_{K_m}\right)\tilde{\Omega}_m\right) \\
&= 2E\left(\gamma_m'\left(\hat{Q}_m^{-1} - I_{K_m}\right)\gamma_m\right) && (39) \\
&\quad + E\left(\gamma_m'\left(\hat{Q}_m^{-1} - I_{K_m}\right)\left(\hat{Q}_m^{-1} - I_{K_m}\right)\gamma_m\right) && (40) \\
&\quad + \frac{2}{n}E\operatorname{tr}\left(\left(\hat{Q}_m^{-1} - I_{K_m}\right)\tilde{\Omega}_m\right) && (41) \\
&\quad + \frac{1}{n}E\operatorname{tr}\left(\left(\hat{Q}_m^{-1} - I_{K_m}\right)\left(\hat{Q}_m^{-1} - I_{K_m}\right)\tilde{\Omega}_m\right). && (42)
\end{aligned}$$

We now bound the four terms (39)–(42). From Assumption 2.2 and (24) we deduce

\[
\operatorname{tr}\left(\Sigma_m^*\right) \ge E\left(x_{mi}'x_{mi}\sigma_i^2\right) \ge E\left(x_{mi}'x_{mi}\right)\underline{\sigma}^2 = K_m\underline{\sigma}^2,
\]

so that

\[
K_m \le \frac{\operatorname{tr}\left(\Sigma_m^*\right)}{\underline{\sigma}^2}. \tag{43}
\]

Furthermore, since $K_m$ is a nonnegative integer and $\zeta_m^2=0$ if $K_m=0$, (43) implies

\[
\frac{\zeta_m^2}{n} \le \frac{\zeta_m^2K_m^2}{n} \le \xi_{nm}^2\,\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{\underline{\sigma}^2}. \tag{44}
\]


We will also make use of the trace inequality, which states that for symmetric $\ell\times\ell$ $A$ and positive semidefinite $B$,

\[
\left|\operatorname{tr}(AB)\right| \le \|A\|_S\operatorname{tr}(B). \tag{45}
\]

For vectors $a$, let $\|a\|=(a'a)^{1/2}$ denote the Euclidean norm. Notice that

\[
E\|x_{mi}r_{mi}\|^2 = E\left(x_{mi}'x_{mi}r_{mi}^2\right) \le \operatorname{tr}\left(\Sigma_m^*\right). \tag{46}
\]
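The trace inequality (45) can be checked numerically; a quick sketch with random symmetric $A$ and positive semidefinite $B$ (here $\|A\|_S$ is the spectral norm, computed as the largest singular value):

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(1000):
    k = int(rng.integers(2, 6))
    A = rng.normal(size=(k, k))
    A = (A + A.T) / 2                         # symmetric
    C = rng.normal(size=(k, k))
    B = C @ C.T                               # positive semidefinite
    lhs = abs(np.trace(A @ B))
    rhs = np.linalg.norm(A, 2) * np.trace(B)  # ||A||_S tr(B)
    assert lhs <= rhs + 1e-9
print("trace inequality holds on 1000 random draws")
```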

Define

\[
\bar{r}_m = \sup_{z\in\mathcal{Z}}\left|r_m(z)\right|, \tag{47}
\]

which is bounded under Assumption 2.3. Using (44), (46), and (47),

\[
n^{-3}E\|x_{mi}r_{mi}\|^4
\le \frac{\bar{r}_m^2\zeta_m^2\,E\|x_{mi}r_{mi}\|^2}{n^3}
\le \frac{\bar{r}_m^2}{\underline{\sigma}^2}\,\xi_{nm}^2\left(\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}\right)^2. \tag{48}
\]

Since $\|a\|=\|a\|_F$, we can apply Theorem 1. Using (1) with $p = 4$, (46), and (48),

\begin{align*}
\left(E\|\gamma_m\|^4\right)^{1/2}
&\le A_4^{1/2}\left\{\left(\frac{1}{n}E\|x_{mi}r_{mi}\|^2\right)^2+n^{-3}E\|x_{mi}r_{mi}\|^4\right\}^{1/2}\\
&\le A_4^{1/2}\left\{\left(\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}\right)^2+\frac{\bar{r}_m^2}{\underline{\sigma}^2}\xi_{nm}^2\left(\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}\right)^2\right\}^{1/2}\\
&= A_4^{1/2}\left(1+\frac{\bar{r}_m^2}{\underline{\sigma}^2}\xi_{nm}^2\right)^{1/2}\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}. \tag{49}
\end{align*}

Now, consider (39). Using the trace inequality (45), the Cauchy–Schwarz inequality, Lemma 3, and (49),

\begin{align*}
E\left|\gamma_m'\left(\hat{Q}_m^{-1}-I_{K_m}\right)\gamma_m\right|
&\le E\left(\left\|\hat{Q}_m^{-1}-I_{K_m}\right\|_S\gamma_m'\gamma_m\right)
\le \left(E\left\|\hat{Q}_m^{-1}-I_{K_m}\right\|_S^2\right)^{1/2}\left(E\|\gamma_m\|^4\right)^{1/2}\\
&\le (1+\eta)^{1/2}A_4^{3/4}\left(1+\frac{\bar{r}_m^2}{\underline{\sigma}^2}\xi_{nm}^2\right)^{1/2}\xi_{nm}\,\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}. \tag{50}
\end{align*}


Similarly, to bound (40),

\begin{align*}
E\left|\gamma_m'\left(\hat{Q}_m^{-1}-I_{K_m}\right)\left(\hat{Q}_m^{-1}-I_{K_m}\right)\gamma_m\right|
&\le E\left(\left\|\hat{Q}_m^{-1}-I_{K_m}\right\|_S^2\gamma_m'\gamma_m\right)
\le \left(E\left\|\hat{Q}_m^{-1}-I_{K_m}\right\|_S^4\right)^{1/2}\left(E\|\gamma_m\|^4\right)^{1/2}\\
&\le (1+\eta)^{1/2}A_8^{1/4}A_4^{1/2}\left(1+\frac{\bar{r}_m^2}{\underline{\sigma}^2}\xi_{nm}^2\right)^{1/2}\xi_{nm}^2\,\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}. \tag{51}
\end{align*}

For (41), first observe that $\operatorname{tr}(\hat\Sigma_m)=\frac{1}{n}\sum_{i=1}^n x_{mi}'x_{mi}\sigma_i^2$, so

\[
E\left(\operatorname{tr}(\hat\Sigma_m)\right) = E\left(x_{mi}'x_{mi}\sigma_i^2\right) \le \operatorname{tr}\left(\Sigma_m^*\right).
\]

Using (44),

\[
n^{-2}\operatorname{var}\left(\operatorname{tr}(\hat\Sigma_m)\right)
\le n^{-3}E\left(\|x_{mi}\|^4\sigma_i^4\right)
\le \frac{\zeta_m^2\bar\sigma^2}{n^3}E\left(\|x_{mi}\|^2\sigma_i^2\right)
\le \frac{\bar\sigma^2}{\underline{\sigma}^2}\,\xi_{nm}^2\left(\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}\right)^2,
\]

and thus

\[
\left(n^{-2}E\left(\operatorname{tr}(\hat\Sigma_m)\right)^2\right)^{1/2}
= \left(n^{-2}\operatorname{var}\left(\operatorname{tr}(\hat\Sigma_m)\right)+\left(\frac{1}{n}E\left(\operatorname{tr}(\hat\Sigma_m)\right)\right)^2\right)^{1/2}
\le \left(1+\frac{\bar\sigma^2}{\underline{\sigma}^2}\xi_{nm}^2\right)^{1/2}\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}. \tag{52}
\]

By the trace inequality, Lemma 3, and (52),

\begin{align*}
\frac{1}{n}E\left|\operatorname{tr}\left(\left(\hat{Q}_m^{-1}-I_{K_m}\right)\hat\Sigma_m\right)\right|
&\le \frac{1}{n}E\left(\left\|\hat{Q}_m^{-1}-I_{K_m}\right\|_S\operatorname{tr}(\hat\Sigma_m)\right)\\
&\le \left(E\left\|\hat{Q}_m^{-1}-I_{K_m}\right\|_S^2\right)^{1/2}\left(n^{-2}E\left(\operatorname{tr}(\hat\Sigma_m)\right)^2\right)^{1/2}\\
&\le (1+\eta)^{1/2}A_4^{1/4}\,\xi_{nm}\left(1+\frac{\bar\sigma^2}{\underline{\sigma}^2}\xi_{nm}^2\right)^{1/2}\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}, \tag{53}
\end{align*}


bounding (41).

Similarly, to bound (42),

\begin{align*}
\frac{1}{n}E\left|\operatorname{tr}\left(\left(\hat{Q}_m^{-1}-I_{K_m}\right)\left(\hat{Q}_m^{-1}-I_{K_m}\right)\hat\Sigma_m\right)\right|
&\le \frac{1}{n}E\left(\left\|\hat{Q}_m^{-1}-I_{K_m}\right\|_S^2\operatorname{tr}(\hat\Sigma_m)\right)\\
&\le \left(E\left\|\hat{Q}_m^{-1}-I_{K_m}\right\|_S^4\right)^{1/2}\left(n^{-2}E\left(\operatorname{tr}(\hat\Sigma_m)\right)^2\right)^{1/2}\\
&\le (1+\eta)^{1/2}A_8^{1/4}\,\xi_{nm}^2\left(1+\frac{\bar\sigma^2}{\underline{\sigma}^2}\xi_{nm}^2\right)^{1/2}\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}. \tag{54}
\end{align*}

Combining (50), (51), (53), and (54) with $\xi_{nm}\to 0$ uniformly in $m\le M_n$ under Assumption 1.1, we find that the absolute value of (39)–(42) is bounded by

\[
\left[2(1+\eta)^{1/2}\left(A_4^{3/4}+A_4^{1/4}\right)(1+o(1))+o(1)\right]\xi_{nm}\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}
\le A_4\,\xi_{nm}\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n},
\]

the inequality using $3\le A_4^{1/4}$, sufficiently small $\eta$, and sufficiently large $n$. This is (16). ■

Proof of Theorem 2. Using (13) and (14) we see that

\[
IMSE_n(m)-I_n^*(m)
= E\left(\left(\hat\beta_m-\beta_m\right)'Q_m\left(\hat\beta_m-\beta_m\right)\right)-\frac{\operatorname{tr}\left(Q_m^{-1}\Sigma_m^*\right)}{n}.
\]

Using Lemma 4 and $\sup_{1\le m\le M_n}\xi_{nm}=o(1)$,

\[
\left|\frac{IMSE_n(m)-I_n^*(m)}{I_n^*(m)}\right|
\le \frac{A_4\,\xi_{nm}\operatorname{tr}\left(Q_m^{-1}\Sigma_m^*\right)}{n\,I_n^*(m)}
\le o(1),
\]

uniformly in $m\le M_n$, which is (17).

Next, since $x_{mi}'x_{mi}\le\zeta_m^2=o(n)$ and $E\left(r_{mi}^2\right)=\phi_m^2\le I_n(m)$,

\[
\left|\frac{I_n^*(m)-I_n(m)}{I_n(m)}\right|
= \frac{\operatorname{tr}\left(\Sigma_m^*\right)-\operatorname{tr}\left(\Sigma_m\right)}{n\,I_n(m)}
= \frac{E\left(x_{mi}'x_{mi}r_{mi}^2\right)}{n\,I_n(m)}
\le \frac{\zeta_m^2}{n}=o(1)
\]

uniformly in $m\le M_n$. Combined with (17) this shows (18). ■


Proof of Theorem 3. Define $Q_{m\ell}=E\left(x_{mi}x_{\ell i}'\right)$ and $\gamma_{m\ell}=E\left(x_{mi}r_{\ell i}\right)$. Using (12),

\begin{align*}
\int_{\mathcal{Z}}\left(\hat{g}_m(z)-g(z)\right)\left(\hat{g}_\ell(z)-g(z)\right)f(z)\,dz
&= \int_{\mathcal{Z}}r_m(z)r_\ell(z)f(z)\,dz-\left(\hat\beta_m-\beta_m\right)'\int_{\mathcal{Z}}x_m(z)r_\ell(z)f(z)\,dz\\
&\quad-\int_{\mathcal{Z}}r_m(z)x_\ell(z)'f(z)\,dz\left(\hat\beta_\ell-\beta_\ell\right)
+\left(\hat\beta_m-\beta_m\right)'\int_{\mathcal{Z}}x_m(z)x_\ell(z)'f(z)\,dz\left(\hat\beta_\ell-\beta_\ell\right)\\
&= E\left(r_{mi}r_{\ell i}\right)-\gamma_{m\ell}'\left(\hat\beta_m-\beta_m\right)-\gamma_{\ell m}'\left(\hat\beta_\ell-\beta_\ell\right)
+\left(\hat\beta_m-\beta_m\right)'Q_{m\ell}\left(\hat\beta_\ell-\beta_\ell\right). \tag{55}
\end{align*}

Assumption 3 (nested regressors) implies that $\gamma_{m\ell}=0$ for $m\le\ell$. Notice also that $r_{mi}=g(z_i)-x_{mi}'\beta_m=r_{\ell i}+x_{\ell i}'\beta_\ell-x_{mi}'\beta_m$, and thus

\begin{align*}
E\left(r_{mi}r_{\ell i}\right)
&= E\left(\left(r_{\ell i}+\beta_\ell'x_{\ell i}-\beta_m'x_{mi}\right)r_{\ell i}\right)\\
&= E\left(r_{\ell i}^2\right)+\beta_\ell'E\left(x_{\ell i}r_{\ell i}\right)-\beta_m'E\left(x_{mi}r_{\ell i}\right)\\
&= \phi_\ell^2. \tag{56}
\end{align*}

As in the proof of Lemma 4, without loss of generality we assume that $Q_m=I_{K_m}$ for all $m$. Combined with Assumption 3 this implies $Q_{m\ell}=\left[\,I_{K_m}\ \ 0\,\right]$. Thus for $m\le\ell$, (55) equals

\[
\phi_\ell^2-\gamma_{\ell m}'\left(\hat\beta_\ell-\beta_\ell\right)+\left(\hat\beta_m-\beta_m\right)'\left(\hat\beta_m-\beta_m\right).
\]

Thus

\begin{align*}
IMSE_n(w)-I_n^*(w)
&= \sum_{m=1}^{M_n}w_m^2\left(E\left(\hat\beta_m-\beta_m\right)'\left(\hat\beta_m-\beta_m\right)-\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}\right) \tag{57}\\
&\quad+2\sum_{\ell=1}^{M_n}\sum_{m=1}^{\ell-1}w_\ell w_m\left(E\left(\hat\beta_m-\beta_m\right)'\left(\hat\beta_m-\beta_m\right)-\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}\right) \tag{58}\\
&\quad-2\sum_{\ell=1}^{M_n}w_\ell\,\bar\gamma_\ell'E\left(\hat\beta_\ell-\beta_\ell\right), \tag{59}
\end{align*}

where

\[
\bar\gamma_\ell=\sum_{m=1}^{\ell-1}w_m\gamma_{\ell m}=E\left(x_{\ell i}\sum_{m=1}^{\ell-1}w_mr_{mi}\right). \tag{60}
\]


From Lemma 4, the absolute value of the sum of (57) and (58) is smaller than

\begin{align*}
&\sum_{m=1}^{M_n}w_m^2A_4\,\xi_{nm}\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}
+2\sum_{\ell=1}^{M_n}\sum_{m=1}^{\ell-1}w_\ell w_mA_4\,\xi_{nm}\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}\\
&\quad\le A_4\max_{m\le M_n}\xi_{nm}\left(\sum_{m=1}^{M_n}w_m^2\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}
+2\sum_{\ell=1}^{M_n}\sum_{m=1}^{\ell-1}w_\ell w_m\frac{\operatorname{tr}\left(\Sigma_m^*\right)}{n}\right)
\le o(1)\,I_n^*(w) \tag{61}
\end{align*}

uniformly in $w\in\mathcal{W}_n$, the final inequality since $\max_{m\le M_n}\xi_{nm}=o(1)$.

Now consider (59). Notice that

\begin{align*}
E\left|\sum_{m=1}^{\ell-1}w_mr_{mi}\right|^2
&= \sum_{m=1}^{\ell-1}w_m^2Er_{mi}^2+2\sum_{j=1}^{\ell-1}\sum_{m=1}^{j-1}w_jw_mEr_{ji}^2\\
&\le \sum_{m=1}^{M_n}w_m^2\phi_m^2+2\sum_{j=1}^{M_n}\sum_{m=1}^{j-1}w_jw_m\phi_j^2
\le I_n^*(w). \tag{62}
\end{align*}

Applying the Cauchy–Schwarz inequality to (60), and using (24) and (62),

\[
\left\|\bar\gamma_\ell\right\|
\le \left(E\|x_{\ell i}\|^2\right)^{1/2}\left(E\left|\sum_{m=1}^{\ell-1}w_mr_{mi}\right|^2\right)^{1/2}
\le K_\ell^{1/2}\,I_n^*(w)^{1/2}. \tag{63}
\]

As in the proof of Lemma 2, define $\gamma_\ell=n^{-1}\sum_{i=1}^n x_{\ell i}r_{\ell i}$, which is an average of iid mean-zero random vectors, so

\[
E\|\gamma_\ell\|^2=\frac{1}{n}E\left(x_{\ell i}'x_{\ell i}r_{\ell i}^2\right)\le\frac{\zeta_\ell^2\phi_\ell^2}{n}. \tag{64}
\]

By the law of iterated expectations we can deduce that $E\left(\hat\beta_\ell-\beta_\ell\right)=E\hat{Q}_\ell^{-1}\gamma_\ell=E\left(\hat{Q}_\ell^{-1}-I_{K_\ell}\right)\gamma_\ell$. Thus using the Cauchy–Schwarz inequality, Lemma 3, and (64),

\[
\left\|E\left(\hat\beta_\ell-\beta_\ell\right)\right\|
\le \left(E\left\|\hat{Q}_\ell^{-1}-I_{K_\ell}\right\|_S^2\right)^{1/2}\left(E\|\gamma_\ell\|^2\right)^{1/2}
\le 2^{1/2}A_4^{1/4}\,\xi_{n\ell}\left(\frac{\zeta_\ell^2\phi_\ell^2}{n}\right)^{1/2}. \tag{65}
\]


Applying the triangle and Cauchy–Schwarz inequalities, (63), and (65),

\begin{align*}
\left|\sum_{\ell=1}^{M_n}w_\ell\,\bar\gamma_\ell'E\left(\hat\beta_\ell-\beta_\ell\right)\right|
&\le \sum_{\ell=1}^{M_n}w_\ell\left\|\bar\gamma_\ell\right\|\left\|E\left(\hat\beta_\ell-\beta_\ell\right)\right\|\\
&\le 2^{1/2}A_4^{1/4}\,I_n^*(w)^{1/2}\sum_{\ell=1}^{M_n}w_\ell\left(\xi_{n\ell}^4\phi_\ell^2\right)^{1/2}\\
&\le 2^{1/2}A_4^{1/4}\,I_n^*(w)^{1/2}\left(\sum_{\ell=1}^{M_n}w_\ell\,\xi_{n\ell}^4\phi_\ell^2\right)^{1/2}
\le o(1)\,I_n^*(w), \tag{66}
\end{align*}

where the third inequality is Liapunov's, noting that the weights $w_\ell$ define a probability measure, and the final inequality uses $\max_{m\le M_n}\xi_{nm}=o(1)$ and $\sum_{\ell=1}^{M_n}w_\ell\phi_\ell^2\le I_n^*(w)$.

Together, (61) and (66) show that

\[
\sup_{w\in\mathcal{W}_n}\left|\frac{IMSE_n(w)-I_n^*(w)}{I_n^*(w)}\right|\le o(1),
\]

as stated. ■

Proof of Theorem 4. As discussed in Newey (1997, p. 150), since $g(z)$ has $s$ continuous derivatives under Assumption 4.1,

\[
\phi_m^2=\inf_\beta E\left(g(z_i)-\beta'x_m(z_i)\right)^2
\le \inf_\beta\sup_z\left|g(z)-\beta'x_m(z)\right|^2
\le O\left(K_m^{-2s/q}\right).
\]

Furthermore, as discussed earlier, $\zeta_m^2=O(K_m)$ for splines and $\zeta_m^2=O(K_m^2)$ for power series (see Andrews, 1991; Newey, 1995). It follows that $\zeta_m^2\phi_m^2=O\left(K_m^{1-2s/q}\right)$ for splines and $\zeta_m^2\phi_m^2=O\left(K_m^{2-2s/q}\right)$ for power series. In either case, Assumption 4.1 implies there is some $D<\infty$ such that for all $m$, $E\left(x_{mi}'x_{mi}r_{mi}^2\right)\le\zeta_m^2\phi_m^2\le D$.

It follows that

\[
n\left|I_n^*(w)-I_n(w)\right|
= \sum_{m=1}^{M_n}w_m^2E\left(x_{mi}'x_{mi}r_{mi}^2\right)
+2\sum_{\ell=1}^{M_n}\sum_{m=1}^{\ell-1}w_\ell w_mE\left(x_{mi}'x_{mi}r_{mi}^2\right)
\le D. \tag{67}
\]

Below we show that

\[
\inf_w nI_n(w)\to\infty. \tag{68}
\]


Together, (67) and (68) imply that as $n\to\infty$,

\[
\sup_{w\in\mathcal{W}_n}\left|\frac{I_n^*(w)-I_n(w)}{I_n(w)}\right|
\le \frac{D}{\inf_w nI_n(w)}\longrightarrow 0,
\]

as stated.

To complete the proof, we now show (68). Let $w^n$ denote the solution to the infimum problem (68), so that $\inf_w I_n(w)=I_n(w^n)$, and rewrite (19) as

\[
nI_n(w)=\sum_{m=1}^{M_n}\sum_{\ell=1}^{M_n}w_mw_\ell\left(n\phi_{m\vee\ell}^2+g_{m\wedge\ell}\right), \tag{69}
\]

where $g_m=\operatorname{tr}\left(Q_m^{-1}\Sigma_m\right)$.

Notice that $nI_n(w)\ge n\phi_{M_n}^2$, which will diverge if $M_n$ is bounded, since $\phi_m^2>0$ for all $m$ by Assumption 4.2. Thus without loss of generality we can assume that $M_n\to\infty$ as $n\to\infty$.

We prove (68) by contradiction. Suppose (68) is false. Then there is a subsequence $\{n\}$ and a constant $B<\infty$ such that for all $n$ along this subsequence,

\[
nI_n(w^n)\le B. \tag{70}
\]

For some $0<\varepsilon<1$ set $M^*$ so that

\[
K_{M^*}\ge\frac{B}{\underline{\sigma}^2\varepsilon^2}. \tag{71}
\]

Assume that $n$ is sufficiently large so that $M_n\ge M^*$. Note that (70), (69), $\phi_m^2\ge0$, and $g_m\ge0$ imply that

\begin{align*}
B \ge nI_n(w^n)
&\ge \sum_{m=M^*}^{M_n}\sum_{\ell=M^*}^{M_n}w_m^nw_\ell^n\,g_{m\wedge\ell}\\
&\ge \underline{\sigma}^2\sum_{m=M^*}^{M_n}\sum_{\ell=M^*}^{M_n}w_m^nw_\ell^nK_{m\wedge\ell}\\
&\ge \underline{\sigma}^2\left(\sum_{m=M^*}^{M_n}w_m^n\right)^2K_{M^*}\\
&\ge \left(\sum_{m=M^*}^{M_n}w_m^n\right)^2\frac{B}{\varepsilon^2}, \tag{72}
\end{align*}

where the third inequality uses $g_m\ge\underline{\sigma}^2K_m$, which is implied by Assumption 2.2, the fourth inequality uses $K_{m\wedge\ell}\ge K_{M^*}$ for $m,\ell\ge M^*$, and the fifth inequality is (71). Then (72) implies that

\[
\sum_{m=M^*}^{M_n}w_m^n\le\varepsilon. \tag{73}
\]

By a similar argument,

\begin{align*}
nI_n(w^n)
&\ge \sum_{m=1}^{M^*-1}\sum_{\ell=1}^{M^*-1}w_m^nw_\ell^n\,n\phi_{m\vee\ell}^2\\
&\ge \left(\sum_{m=1}^{M^*-1}w_m^n\right)^2n\phi_{M^*}^2\\
&\ge (1-\varepsilon)^2n\phi_{M^*}^2\longrightarrow\infty,
\end{align*}

the third inequality by (73) and the final divergence since $\varepsilon<1$ and $\phi_{M^*}^2>0$ by Assumption 4.2. This contradicts (70), establishing (68) by contradiction and completing the proof. ■

REFERENCES

Andrews, D.W.K. (1991) Asymptotic normality of series estimators for nonparametric and semiparametric models. Econometrica 59, 307–345.

Bestsennaya, E.V. & S.A. Utev (1991) An exact upper bound for the even moment of sums of independent random variables. Siberian Mathematical Journal 32, 139–141.

Burkholder, D.L. (1973) Distribution function inequalities for martingales. The Annals of Probability 1, 19–42.

Chen, X. (2007) Large sample sieve estimation of semi-nonparametric models. In J.J. Heckman & E.E. Leamer (eds.), Handbook of Econometrics, vol. 6B. North-Holland.

Davidson, J. (1994) Stochastic Limit Theory: An Introduction for Econometricians. Oxford University Press.

De Acosta, A. (1981) Inequalities for B-valued random vectors with applications to the strong law of large numbers. The Annals of Probability 9, 157–161.

De Jong, R.M. (2002) A note on 'Convergence rates and asymptotic normality for series estimators': Uniform convergence rates. Journal of Econometrics 111, 1–9.

de la Peña, V.H. & E. Giné (1999) Decoupling: From Dependence to Independence. Springer.

de la Peña, V.H., R. Ibragimov, & S. Sharakhmetov (2003) On extremal distributions and sharp Lp-bounds for sums of multilinear forms. The Annals of Probability 31, 630–675.

Hall, P. & C.C. Heyde (1980) Martingale Limit Theory and Its Application. Academic Press.

Hansen, B.E. (2007) Least squares model averaging. Econometrica 75, 1175–1189.

Hansen, B.E. (2014) Nonparametric sieve regression: Least squares, averaging least squares, and cross-validation. In J.S. Racine, L. Su, & A. Ullah (eds.), Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics. Oxford University Press.

Hansen, B.E. & J.S. Racine (2012) Jackknife model averaging. Journal of Econometrics 167, 38–46.

Ibragimov, R. (1997) Estimates for moments of symmetric statistics. Ph.D. dissertation, Institute of Mathematics of Uzbek Academy of Sciences.

Ibragimov, M. & R. Ibragimov (2008) Optimal constants in the Rosenthal inequality for random variables with zero odd moments. Statistics and Probability Letters 78, 186–189.

Ibragimov, R. & S. Sharakhmetov (1997) On an exact constant for the Rosenthal inequality. Theory of Probability and Its Applications 42, 294–302.

Ibragimov, R. & S. Sharakhmetov (2002) The exact constant in the Rosenthal inequality for random variables with mean zero. Theory of Probability and Its Applications 46, 127–131.

Ing, C.-K. & C.-Z. Wei (2003) On same-realization prediction in an infinite-order autoregressive process. Journal of Multivariate Analysis 85, 130–155.

Lai, T.L. & C.-Z. Wei (1984) Moment inequalities with applications to regression and time series models. In Inequalities in Statistics and Probability, IMS Lecture Notes–Monograph Series, vol. 5, pp. 165–172. Institute of Mathematical Statistics.

Li, Q. & J.S. Racine (2006) Nonparametric Econometrics: Theory and Practice. Princeton University Press.

Marcinkiewicz, J. & A. Zygmund (1937) Sur les fonctions indépendantes. Fundamenta Mathematicae 28, 60–90.

Newey, W.K. (1995) Convergence rates for series estimators. In G.S. Maddala, P.C.B. Phillips, & T.N. Srinivasan (eds.), Statistical Methods of Economics and Quantitative Economics: Essays in Honor of C.R. Rao, pp. 254–275. Blackwell.

Newey, W.K. (1997) Convergence rates and asymptotic normality for series estimators. Journal of Econometrics 79, 147–168.

Nze, P.A. & P. Doukhan (2004) Weak dependence: Models and applications to econometrics. Econometric Theory 20, 995–1045.

Rosenthal, H.P. (1970) On the subspaces of Lp (p > 2) spanned by sequences of independent random variables. Israel Journal of Mathematics 8, 273–303.

Rosenthal, H.P. (1972) On the span in Lp of sequences of independent random variables. In Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 2, pp. 149–167. University of California Press.

Song, K. (2008) A uniform convergence of series estimators over function spaces. Econometric Theory 24, 1463–1499.

Utev, S.A. (1985) Extremal problems in moment inequalities. Proceedings of the Mathematical Institute of the Siberian Branch of the USSR Academy of Sciences 5, 56–75.

Utev, S.A. (1991) A method for investigating the sums of weakly dependent random variables. Siberian Mathematical Journal 32, 675–690.