
working paper
department of economics
massachusetts institute of technology
50 memorial drive
Cambridge, Mass. 02139


CONVERGENCE RATES FOR SERIES ESTIMATORS

Whitney K. Newey

No. 93-10 July 1993


CONVERGENCE RATES FOR SERIES ESTIMATORS

Whitney K. Newey

MIT Department of Economics

July, 1993

This paper consists of part of one originally titled "Consistency and Asymptotic Normality of Nonparametric Projection Estimators." Helpful comments were provided by Andreas Buja; financial support was provided by the NSF and the Sloan Foundation.


Abstract

Least squares projections are a useful way of describing the relationship between

random variables. These include conditional expectations and projections on additive

functions. Series estimators, i.e. regressions on a finite dimensional vector of approximating functions whose dimension grows with the sample size, provide a convenient way of estimating such projections. This paper gives convergence rates for these estimators. General results are derived, and primitive regularity conditions are given for power series and splines.

Keywords: Nonparametric regression, additive interactive models, random coefficients,

polynomials, splines, convergence rates.


1. Introduction

Least squares projections of a random variable y on functions of a random vector

x provide a useful way of describing the relationship between y and x. The simplest

example is linear regression, the least squares projection on the set of linear

combinations of x, as exemplified in Rao (1973, Chapter 4). An interesting

nonparametric example is the conditional expectation, the projection on the set of all

functions of x with finite mean square. There are also a variety of projections that

fall in between these two polar cases, where the set of functions is larger than all

linear combinations but smaller than all functions. One example is an additive

regression, the projection on functions that are additive in the different elements of

x. This case is motivated partly by the difficulty of estimating conditional

expectations when x has many components: see Breiman and Stone (1978), Breiman and

Friedman (1985), Friedman and Stuetzle (1981), Stone (1985), and Zeldin and Thomas

(1975). A generalization that includes some interaction terms is the projection on

functions that are additive in some subvectors of x. Another example is random linear

combinations of functions of x, as suggested by Riedel (1992) for growth curve

estimation.

One simple way to estimate nonparametric projections is by regression on a finite

dimensional subset, with dimension allowed to grow with the sample size, e.g. as in

Agarwal and Studden (1980), Gallant (1981), Stone (1985), Cox (1988), and Andrews (1991),

which will be referred to here as series estimation. This type of estimator may not be

good at recovering the "fine structure" of the projection relative to other smoothers,

e.g. see Buja, Hastie, and Tibshirani (1989), but is computationally simple. Also,

projections often show up as nuisance functions in semiparametric estimation, where the

fine structure is less important.

This paper derives convergence rates for series estimators of projections.

Convergence rates are important because they show how dimension affects the asymptotic


accuracy of the estimators (e.g. Stone 1982, 1985). Also, they are useful for the

theory of semiparametric estimators that depend on projection estimates (e.g. Newey

1993a). The paper gives mean-square rates for estimation of the projection and uniform

convergence rates for estimation of functions and derivatives. Fully primitive

regularity conditions are given for power series and regression splines, as well as more

general conditions that may apply to other types of series.

Previous work on convergence rates for series estimates includes Agarwal and Studden

(1980), Stone (1985, 1990), Cox (1988), and Andrews and Whang (1990). This paper improves on many previous results in the convergence rate or the generality of the regularity conditions. Uniform convergence rates for functions and their derivatives are given, and some of the results allow for a data-based number of approximating terms, unlike all but Cox (1988). Also, the projection does not have to equal the conditional expectation, a feature shared with Stone (1985, 1990) but not the others.

2. Series Estimators

The results of this paper concern estimators of least squares projections that can

be described as follows. Let z denote a data observation, y and x (measurable)

functions of z, with x having dimension r. Let 𝒢 denote a mean-square closed, linear subspace of the set of all functions of x with finite mean-square. The projection of y on 𝒢 is

(2.1)  g_0(x) = argmin_{g∈𝒢} E[{y - g(x)}²].

An example is the conditional expectation, g_0(x) = E[y|x], where 𝒢 is the set of all measurable functions of x with finite mean-square. Two further examples will be used as illustrations, and are of interest in their own right.


Additive-Interactive Projections: When x has more than a few distinct components it is

difficult to estimate E[y|x], a feature often referred to as the "curse of

dimensionality." This problem motivates projections that are additive in functions of

subvectors of x, so that the individual components have smaller dimension than x. One

general way to describe these is to let x_ℓ, (ℓ = 1, ..., L), be distinct subvectors of x, and to specify the space of functions as

(2.2)  𝒢 = {∑_{ℓ=1}^L g_ℓ(x_ℓ) : E[g_ℓ(x_ℓ)²] < ∞}.

For example, if L = r and each x_ℓ is just a component of x, the set 𝒢 consists of additive functions. The projection on 𝒢 generalizes linear regression to allow for nonparametric nonlinearities in individual regressors. The set of equation (2.2) is a further generalization that allows for nonlinear interactive terms. For example, if each x_ℓ is just one or two dimensional, then this set would allow for just pairwise interactions.

Covariate Interactive Projections: As discussed in Riedel (1992), problems in growth curve estimation motivate considering projections that are random linear combinations of functions. To describe these, suppose x = (w,u), where w = (w_1, ..., w_L)' is a vector of covariates and E[ww'|u] is nonsingular with probability one, and let ℋ_ℓ, (ℓ = 1, ..., L), be sets of functions of u. Consider the set of functions

(2.3)  𝒢 = {∑_{ℓ=1}^L w_ℓ h_ℓ(u) : h_ℓ ∈ ℋ_ℓ}.

In a growth curve application u represents time, so that each h_ℓ(u) represents a covariate coefficient that is allowed to vary over time in a general way.

The estimators of g_0(x) considered here are sample projections on a finite dimensional subspace of 𝒢, which can be described as follows. Let p^K(x) = (p_{1K}(x), ..., p_{KK}(x))' be a vector of functions, each of which is an element of 𝒢. Denote the data observations by y_i and x_i, (i = 1, 2, ...), and let y = (y_1, ..., y_n)' and p^K = [p^K(x_1), ..., p^K(x_n)]', for sample size n. An estimator of g_0(x) is

(2.4)  ĝ(x) = p^K(x)'π̂,  π̂ = (p^{K'}p^K)⁻p^{K'}y,

where (·)⁻ denotes a generalized inverse, and K subscripts for ĝ(x) and π̂ have been suppressed for notational convenience. The matrix p^{K'}p^K will be asymptotically nonsingular under conditions given below, making the choice of generalized inverse asymptotically irrelevant.

The idea of sample projection estimators is that they should approximate g_0(x) if K is allowed to grow with the sample size. The two key features of this approximation are that 1) each component of p^K(x) is an element of 𝒢, and 2) p^K(x) "spans" 𝒢 as K grows (i.e. for any function in 𝒢, K can be chosen big enough that there is a linear combination of p^K(x) that approximates it arbitrarily closely in mean square). Under 1), π̂ estimates π ≡ (E[p^K(x)p^K(x)'])⁻E[p^K(x)y] = (E[p^K(x)p^K(x)'])⁻E[p^K(x)g_0(x)], the coefficients of the projection of g_0(x) on p^K(x). Thus, under 1) and 2), p^K(x)'π will approximate g_0(x). Consequently, when the estimation error in π̂ is small, ĝ(x) should approximate g_0(x).
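To fix ideas, the following is a minimal numerical sketch of the estimator in equation (2.4); the function names and the example basis are illustrative assumptions, not constructs from the paper.

    import numpy as np

    def series_estimator(y, X, basis, K):
        # Sample projection on K series terms, as in equation (2.4):
        # pi_hat = (p'p)^- p'y and g_hat(x) = p^K(x)' pi_hat.
        p = basis(X, K)                                  # n x K matrix with rows p^K(x_i)'
        pi_hat, *_ = np.linalg.lstsq(p, y, rcond=None)   # lstsq uses a pseudoinverse, so the
                                                         # choice of generalized inverse is moot
        return lambda x_new: basis(x_new, K) @ pi_hat

    # Illustration with the univariate power series p_k(x) = x^(k-1):
    poly_basis = lambda X, K: np.vander(np.ravel(X), N=K, increasing=True)
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=200)
    y = np.exp(x) + rng.normal(scale=0.1, size=200)
    g_hat = series_estimator(y, x, poly_basis, K=5)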

Two types of approximating functions will be considered in detail. They are power

series and splines.

Power Series: Let λ = (λ_1, ..., λ_r)' denote an r-dimensional vector of nonnegative integers, i.e. a multi-index, with norm |λ| = ∑_{j=1}^r λ_j, and let x^λ ≡ ∏_{j=1}^r x_j^{λ_j}. For a sequence (λ(k))_{k=1}^∞ of distinct such vectors, a power series approximation corresponds to

(2.5)  p_{kK}(x) = x^{λ(k)},  (k = 1, 2, ...).

Throughout the paper it will be assumed that the λ(k) are ordered so that |λ(k)| is monotonically increasing. For estimating the conditional expectation E[y|x], it will also be required that (λ(k))_{k=1}^∞ include all distinct multi-indices. This requirement is imposed so that E[y|x] can be approximated by a power series. Additive-interactive projections can be estimated by restricting the multi-indices so that each term p_{kK}(x) is an element of 𝒢. This can be accomplished by requiring that the only λ(k) that are included are those for which the indices of the nonzero elements are the indices of a subvector x_ℓ for some ℓ. In addition, covariate interactive terms can be estimated by taking the multi-indices to have the same dimension as u and specifying the approximating functions to be p_{kK}(x) = w_{ℓ(k)}u^{λ(k)}, where ℓ(k) is an integer that selects a component of w.
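The multi-index bookkeeping just described can be sketched as follows; the names are illustrative, and the degree-ordered listing enforces the requirement that |λ(k)| be monotonically increasing.

    import itertools
    import numpy as np

    def multi_indices(r, max_degree):
        # All multi-indices lambda with |lambda| <= max_degree, listed so that
        # |lambda(k)| is monotonically increasing, as assumed in the paper.
        out = []
        for deg in range(max_degree + 1):
            for lam in itertools.product(range(deg + 1), repeat=r):
                if sum(lam) == deg:
                    out.append(lam)
        return out

    def power_basis(X, lambdas):
        # p_k(x) = x^lambda(k) = prod_j x_j^lambda_j(k), equation (2.5); X is n x r.
        X = np.asarray(X, dtype=float)
        return np.column_stack([np.prod(X ** np.asarray(lam), axis=1) for lam in lambdas])

    # Additive projections: keep only multi-indices with one nonzero component;
    # allowing at most two nonzero components gives pairwise interactions, etc.
    additive = [lam for lam in multi_indices(r=3, max_degree=4)
                if sum(1 for a in lam if a > 0) <= 1]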

Power series have a potential drawback of being sensitive to outliers. It may be

possible to make them less sensitive by using power series in a bounded, one-to-one

transformation of the original data. An example would be to replace each component x_j of x by the logit transformation 1/(1 + e^{-x_j}).

The theory to follow uses orthogonal polynomials, which may help alleviate the well known multicollinearity problem for power series. If each x^{λ(k)} is replaced with the product of orthogonal polynomials of orders corresponding to the components of λ(k), with respect to some weight function on the range of x, and the distribution of x is similar to this weight, then there should be little collinearity among the different x^{λ(k)}. The estimator will be numerically invariant to such a replacement (because |λ(k)| is monotonically increasing), but in computation the replacement may alleviate the well known multicollinearity problem for power series.

Regression Splines: A regression spline is a series estimator where the approximating

function is a smooth piecewise polynomial with fixed knots (join points). They have


some attractive features relative to power series, including being less sensitive to

singularities in the function being approximated and less oscillatory. A disadvantage

is that the theory requires that knots be placed in the support and be nonrandom (as in

Stone, 1985), so that the support must be known. The power series theory does not

require a known support.

To describe regression splines it is convenient to begin with the one-dimensional x

case. For convenience, suppose that the support of x is [-1,1] (it can always be

normalized to take this form) and that the knots are evenly spaced. Let (v)_+ = 1(v > 0)·v. An m-th degree spline with L+1 evenly spaced knots on [-1,1] is a linear combination of

(2.6)  p_{kL}(v) = v^k for 0 ≤ k ≤ m, and p_{kL}(v) = {[v + 1 - 2(k-m)/(L+1)]_+}^m for m+1 ≤ k ≤ m+L.
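A direct transcription of the basis in equation (2.6), for degree m ≥ 1 (names illustrative):

    import numpy as np

    def spline_basis(v, m, L):
        # Degree-m spline with evenly spaced knots on [-1, 1], equation (2.6):
        # v^k for 0 <= k <= m, then [v + 1 - 2(k - m)/(L + 1)]_+^m for m+1 <= k <= m+L.
        v = np.asarray(v, dtype=float)
        cols = [v ** k for k in range(m + 1)]
        for k in range(m + 1, m + L + 1):
            shifted = v + 1.0 - 2.0 * (k - m) / (L + 1)   # knot at -1 + 2(k-m)/(L+1)
            cols.append(np.maximum(shifted, 0.0) ** m)
        return np.column_stack(cols)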

Multivariate spline terms can be formed by interacting univariate ones for different

components of x. For a set of multi-indices {λ(k)}, with λ_j(k) ≤ m+L for each j and k, the approximating functions will be products of univariate splines, i.e.

(2.7)  p_{kK}(x) = ∏_{j=1}^r p_{λ_j(k),L}(x_j),  (k = 1, ..., K).

Note that corresponding to each K there is a number of knots for each component of x and a choice of which multiplicative components to include. Throughout the paper it will be assumed that each ratio of numbers of knots for a pair of elements of x is bounded above and below. For estimating the conditional expectation E[y|x], it will also be required that (λ(k))_{k=1}^∞ include all distinct multi-indices. This requirement is imposed so that E[y|x] can be approximated by interactive splines. Additive-interactive projections can be estimated by restricting the multi-indices in the same way as for power series. Also, covariate interactive terms can be estimated by forming the approximating functions as products of elements of w with splines in u, analogously to the power series case.

The theory to follow uses B-splines, which are a linear transformation of the above basis that is nonsingular on [-1,1] and has low multicollinearity. The low multicollinearity of B-splines and the recursive formula for their calculation also lead to computational advantages; e.g. see Powell (1981).
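A sketch of one standard B-spline construction with evenly spaced knots on [-1,1]; it assumes a reasonably recent scipy (which provides BSpline.design_matrix), and the clamped knot vector is a common choice rather than anything specified in the paper.

    import numpy as np
    from scipy.interpolate import BSpline

    def bspline_design(x, m, L):
        # Degree-m B-spline design matrix with L interior knots on [-1, 1];
        # the clamped knot vector repeats each endpoint m + 1 times, giving
        # L + m + 1 basis functions, the same dimension as equation (2.6).
        breaks = np.linspace(-1.0, 1.0, L + 2)      # endpoints plus L interior knots
        t = np.concatenate([np.full(m, -1.0), breaks, np.full(m, 1.0)])
        return BSpline.design_matrix(x, t, m).toarray()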

Series estimates depend on the choice of the number of terms K, so that it is

desirable to choose K based on the data. With a data-based choice of K, these

estimates have the flexibility to adjust to conditions in the data. For example, one

might choose K by delete-one cross validation, i.e. by minimizing the sum of squared residuals ∑_{i=1}^n [y_i - ĝ_{-i,K}(x_i)]², where ĝ_{-i,K}(x_i) is the estimate of the regression function computed from all the observations but the i-th. Some of the results to follow will allow for data based K.
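For least squares, the delete-one residuals can be computed without n refits via the standard leverage identity y_i - ĝ_{-i,K}(x_i) = [y_i - ĝ_K(x_i)]/(1 - h_ii), where h_ii is the i-th diagonal of the hat matrix. A sketch, assuming a full-rank design with leverages strictly below one (names illustrative):

    import numpy as np

    def cv_score(y, p):
        # Delete-one cross-validation sum of squares for least squares on design p.
        Q, _ = np.linalg.qr(p)
        h = np.sum(Q ** 2, axis=1)        # leverages h_ii of the hat matrix QQ'
        resid = y - Q @ (Q.T @ y)         # in-sample residuals
        return np.sum((resid / (1.0 - h)) ** 2)

    def choose_K(y, X, basis, K_grid):
        # Data-based K: minimize the cross-validation criterion over a grid.
        scores = [cv_score(y, basis(X, K)) for K in K_grid]
        return K_grid[int(np.argmin(scores))]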

3. General Convergence Rates

This section derives some convergence rates for general series estimators. To do

this it is useful to introduce some conditions. Let u = y - g_0(x) and u_i = y_i - g_0(x_i). Also, for a matrix D let ||D|| = [trace(D'D)]^{1/2}; for a random matrix Y, let ||Y||_v = {E[||Y||^v]}^{1/v} for v < ∞, and let ||Y||_∞ be the infimum of constants C such that Prob(||Y|| < C) = 1.

Assumption 3.1: {(y_i, x_i)} is i.i.d. and E[u²|x] is bounded on the support of x.

The bounded second conditional moment assumption is quite common in the literature (e.g.

Stone, 1985). Apparently it can be relaxed only at the expense of affecting the

convergence rates, so to avoid further complication this assumption is retained.


The next Assumption is useful for controlling the second moment matrix of the series

terms.

Assumption 3.2: For each K there is a constant, nonsingular matrix A such that for P^K(x) = Ap^K(x), the smallest eigenvalue of E[P^K(x)P^K(x)'] is bounded away from zero uniformly in K.

Since the estimator ĝ(x) is invariant to nonsingular linear transformations of p^K(x), there is really no need to distinguish between p^K(x) and P^K(x) at this point. An explicit

transformation A is allowed for in order to emphasize that Assumption 3.2 is only

needed for some transformation. For example, Assumption 3.2 will not apply to power

series, but will apply to orthonormal polynomials.

Assumption 3.2 is a normalization that leads to the series terms having specific

magnitudes. The regularity conditions will also require that the magnitude of P^K(x) not grow too fast with the sample size. The size of P^K(x) will be quantified by

(3.1)  ζ_d(K) = sup_{|λ|≤d, x∈𝒳} ||∂^λ P^K(x)||,

where 𝒳 is the support of x, ||D|| = [trace(D'D)]^{1/2} for a matrix D, λ denotes a vector of nonnegative integers, |λ| = ∑_{j=1}^r λ_j, and ∂^λ P^K(x) ≡ ∂^{|λ|}P^K(x)/∂x_1^{λ_1}···∂x_r^{λ_r}. That is, ζ_d(K) is the supremum of the norms of the derivatives of P^K(x) of order up to d.
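As a concrete illustration of equation (3.1): for Legendre polynomials orthonormalized against the uniform density on [-1,1] (the v = 0 case of the ultraspherical construction in Lemma A.15 of the appendix), ζ_0(K) in the univariate case can be computed numerically and equals K exactly, consistent with the ζ_0(K) ≤ CK bound used for power series in Section 4. A small sketch:

    import numpy as np

    def zeta_0(K, grid_size=2001):
        # Numerical zeta_0(K) = sup_x ||P^K(x)|| for Legendre polynomials
        # orthonormalized with respect to the uniform density on [-1, 1].
        x = np.linspace(-1.0, 1.0, grid_size)
        V = np.polynomial.legendre.legvander(x, K - 1)   # P_0(x), ..., P_{K-1}(x)
        V = V * np.sqrt(2.0 * np.arange(K) + 1.0)        # normalize so E[p_k(x)^2] = 1
        return np.max(np.linalg.norm(V, axis=1))

    print([zeta_0(K) for K in (1, 2, 4, 8)])   # 1.0, 2.0, 4.0, 8.0: zeta_0(K) = K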

The following condition places some limits on the growth of the series magnitude.

Also, it allows for data based choice of K, at the expense of imposing that series

terms are nested.

Assumption 3.3: There are K̲(n) and K̄(n) such that K̲(n) ≤ K ≤ K̄(n) with probability approaching one and either a) p^K(x) is a subvector of p^{K+1}(x) for all K with K̲(n) ≤ K < K+1 ≤ K̄(n), and ∑_{K̲(n)≤K≤K̄(n)} ζ_0(K)⁴/n → 0; or b) the P^K(x) of Assumption 3.2 is a subvector of P^{K+1}(x) for all K with K̲(n) ≤ K < K+1 ≤ K̄(n), and ζ_0(K̄(n))⁴/n → 0.

As previously noted, a series estimate is invariant to nonsingular linear transformations of p^K(x), so that in part a) it suffices that any such transformation form a nested sequence of vectors. Part b) is more restrictive, in requiring that the {P^K(x)} from Assumption 3.2 be nested, but imposes a less stringent requirement on the growth rate of K. Also, if K is nonrandom, so that K̲(n) = K = K̄(n), the nested sequence requirement of both parts a) and b) will be satisfied, because that requirement is vacuous when K̲ = K̄.

In order to specify primitive hypotheses for Assumptions 3.2 and 3.3 it must be possible to find P^K(x) satisfying the eigenvalue condition, and having known values for, or bounds on, ζ_0(K). That is, one needs explicit bounds on series terms where the eigenvalues are bounded away from zero. It is possible to derive such bounds for both power series and regression splines, when x is continuously distributed with a density that is bounded away from zero. These bounds lead to the requirements that K⁴/n → 0 for power series and K²/n → 0 for regression splines with nonrandom K. These results are described in Sections 4 and 5. It is also possible to derive such results for Fourier series, but this is not done here because they are most suitable for approximation of periodic functions, which have fewer applications. It may also be possible to derive results for Gallant's (1981) Fourier flexible form, although this is more difficult, as described in Gallant and Souza (1991). In terms of this paper, the problem with the Fourier flexible form is that the linear and quadratic terms can be approximated extremely quickly by the Fourier terms, leading to a multicollinearity problem so severe that simultaneous satisfaction of Assumptions 3.2 and 3.3 would impose very slow growth rates on K.

Assumptions 3.1 - 3.3 are useful for controlling the variance of a series estimator.

The bias is the error from the finite dimensional approximation. A supremum Sobolev

norm will be used to quantify this approximation. For a measurable function f(x)

defined on 𝒳 and a nonnegative integer d, let

|f|_d = max_{|λ|≤d} sup_{x∈𝒳} |∂^λ f(x)|,

with |f|_d equal to infinity if ∂^λ f(x) does not exist for some |λ| ≤ d and x ∈ 𝒳.

Many of the results will be based on the following polynomial approximation rate

condition.

Assumption 3.4: There are a nonnegative integer d and constants C, α > 0 such that for all K there is π_K with |g_0 - p^{K'}π_K|_d ≤ CK^{-α}.

This condition is not primitive, but is known to be satisfied in many cases. Typically,

the higher the degree of derivative of g_0(x) that exists, the bigger α and/or d can be chosen. This type of primitive condition will be explicitly discussed for power series and splines in Sections 4 and 5. It is also possible to obtain results when the approximation rate is for an L_2 norm, rather than the sup norm. However, this generalization leads to much more complicated results, and so is not given here.

These assumptions will imply both mean square and uniform convergence rates for the

series estimate. The first result gives mean-square rates. Let F(x) denote the CDF of

x.

Theorem 3.1: If Assumptions 3.1 - 3.4 are satisfied for d = 0, then

∑_{i=1}^n [ĝ(x_i) - g_0(x_i)]²/n = O_p(K/n + K^{-2α}),
∫[ĝ(x) - g_0(x)]² dF(x) = O_p(K/n + K^{-2α}).


The two terms in the convergence rate essentially correspond to variance and bias. The

first conclusion, on sample mean square error, is similar to those of Andrews and Whang

(1990) and Newey (1993b), but the hypotheses are different. Here the number of terms K

is allowed to depend on the data, and the projection residual u need not satisfy

E[u|x] = 0, at the expense of requiring Assumptions 3.2 and 3.3, which were not imposed

in these other papers. Also, the second conclusion, on integrated mean square error, has

not been previously given at this level of generality, although Stone (1985) gave

specific results for spline estimation of an additive projection.
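The two rate terms can be balanced explicitly; the following display, implied by (though not stated in) the theorem, records the arithmetic:

    \min_K \{ K/n + K^{-2\alpha} \}: \quad K \propto n^{1/(2\alpha+1)}
    \;\Rightarrow\; K/n \asymp K^{-2\alpha} \asymp n^{-2\alpha/(2\alpha+1)}.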

The next result gives uniform convergence rates.

Theorem 3.2: If Assumptions 3.1, 3.2, 3.3 b), and 3.4 are satisfied for a nonnegative

integer d, then

|ĝ - g_0|_d = O_p(ζ_d(K)[(K/n)^{1/2} + K^{-α}]).

There do not seem to be in the literature any previous uniform convergence results that

cover derivatives and general series in the way this one does. Furthermore, for the

univariate power series case, the convergence rate that is implied by this result

improves on that of Cox (1988), as further discussed in Section 4. These uniform rates

do not attain Stone's (1982) bounds, although they do appear to improve on previously

known rates.

For specific classes of functions 𝒢 and series approximations, more primitive

conditions for Assumptions 3.2 - 3.4 can be specified in order to derive convergence

rates for the estimators. These results are illustrated in the next two Sections, where

convergence rates are derived for power series and regression spline estimators of

additive interactive and covariate interactive functions.


4. Additive Interactive Projections

This Section gives convergence rates for power series and regression spline

estimators of additive interactive functions. The first regularity condition

restricts x to be continuously distributed.

Assumption 4.1: x is continuously distributed with a support that is a cartesian

product of compact intervals, and a density that is bounded and bounded away from zero.

This assumption is useful for showing that the set of additive-interactive functions is

closed. Also, this condition leads to Assumptions 3.2 and 3.3 being satisfied with

explicit formulae for ζ_0(K). For power series it is possible to generalize this

condition, so that the density goes to zero on the boundary of the support. For

simplicity this generalization is not given here, although the Lemmas given in the

appendix can be used to verify the Section 3 conditions in this case.

It is also possible to allow for a discrete regressor with finite support, by

including all dummy variables for all points of support of the regressor, and all

interactions. Because such a regressor is essentially parametric, and allowing for it

does not change any of the convergence rate results, this generalization will not be

considered here.

Under Assumption 4.1 the following condition will suffice for Assumptions 3.2 and

3.3.

Assumption 4.2: Either a) p_{kK}(x) is a power series with K⁴/n → 0, or b) the p_{kK}(x) are splines, the support of x is [-1,1]^r, K̲(n) = K̄(n) = K, and K²/n → 0.

It is possible to allow for data based K for splines and obtain similar mean-square

convergence rates to those given below. This generalization is not given here because it

would further complicate the statement of results.


A primitive condition for Assumption 3.4 is the following one.

Assumption 4.3: Each of the components g_ℓ(x_ℓ), (ℓ = 1, ..., L), is continuously differentiable of order s on the support of x_ℓ.

Let a denote the maximum dimension of the components of the additive interactive function. This condition can be combined with known results on approximation rates for power series and splines to show that Assumption 3.4 is satisfied for d = 0 and α = s/a, and with α = s - d when a = 1. The details are given in the appendix.

These conditions lead to the following result on mean-square convergence.

Theorem 4.1: If Assumptions 3.1 and 4.1 - 4.3 are satisfied, then

∑_{i=1}^n [ĝ(x_i) - g_0(x_i)]²/n = O_p(K/n + K^{-2s/a}),
∫[ĝ(x) - g_0(x)]² dF(x) = O_p(K/n + K^{-2s/a}).

The integrated mean square error result for splines that is given here has previously

been derived by Stone (1990). The rest of this result is new, although Andrews and Whang

(1990) give the same conclusion for the sample mean square error of power series under

different hypotheses. An implication of Theorem 4.1 is that power series will have an

optimal integrated mean-square convergence rate if the number of terms is chosen randomly

between certain bounds. If there are C ≥ c > 0 such that K̲ = cn^γ and K̄ = Cn^γ, where γ = a/(2s+a), and s > 3a/2, then the mean-square convergence rate is n^{-2s/(2s+a)}, which attains Stone's (1982) bound. The side condition s > 3a/2 is needed to ensure that K̄ = Cn^γ satisfies Assumption 4.2. A similar side condition is present for the spline version of Stone (1990), but it has the less stringent form s > a/2.
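The claimed rate and side conditions can be checked directly from the Theorem 4.1 rates as reconstructed above:

    K \asymp n^{a/(2s+a)} \;\Rightarrow\; K/n \asymp n^{-2s/(2s+a)}, \qquad
    K^{-2s/a} \asymp n^{-(a/(2s+a))(2s/a)} = n^{-2s/(2s+a)},

while K⁴/n → 0 holds if and only if 4a/(2s+a) < 1, i.e. s > 3a/2, and K²/n → 0 holds if and only if s > a/2.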

Theorem 3.2 can be specialized to obtain uniform convergence rates for power series

and spline estimators.


Theorem 4.2: If Assumptions 3.1 and 4.1 - 4.3 are satisfied, then for power series

|ĝ - g_0|_0 = O_p(K[(K/n)^{1/2} + K^{-s/a}]),

and for regression splines,

|ĝ - g_0|_0 = O_p(K^{1/2}[(K/n)^{1/2} + K^{-s/a}]).

Obtaining uniform convergence rates for derivatives is more difficult, because

approximation rates are difficult to find in the literature. When the argument of each function is only one dimensional, an approximation rate follows by a simple integration argument (e.g. see Lemma A.12 in the Appendix). This approach leads to the following

convergence rate for the one-dimensional (i.e. additive model) case.

Theorem 4.3: If Assumptions 3.1 and 4.1 - 4.3 are satisfied, a = 1, d < s, and p^K(x) is a power series or a regression spline with m ≥ d, then for power series,

|ĝ - g_0|_d = O_p(K^{1+2d}{[K/n]^{1/2} + K^{-s+d}}),

and for splines,

|ĝ - g_0|_d = O_p(K^{(1/2)+d}{[K/n]^{1/2} + K^{-s+d}}).

In the case of power series, it is possible to obtain an approximation rate by a

Taylor expansion argument when the derivatives do not grow too fast with their order.

The rate is faster than any power of K, leading to the following result.


Theorem 4.4: If Assumptions 3.1 and 4.1 - 4.3 are satisfied, p^K(x) is a power series, and there is a constant C such that for each multi-index λ the λ-th partial derivative of each additive component of g_0(x) exists and is bounded by C^{|λ|}, then for any positive integers α and d,

|ĝ - g_0|_d = O_p(K^{1+2d}{[K/n]^{1/2} + K^{-α}}).

The uniform convergence rates are not optimal in the sense of Stone (1982), but they

improve on existing results. For the one regressor, power series case Theorem 4.2

improves on the corresponding rate of Cox (1988). For the other cases there do

>). For the other cases there do

not seem to be any existing results in the literature, so that Theorems 4.2 - 4.4 give

the only uniform convergence rates available. It would be interesting to obtain further

improvements on these results, and investigate the possibility of attaining optimal

uniform convergence rates for series estimators of additive interactive models.

5. Covariate Interactive Projections.

Estimation of random coefficient projections provides a second example of how the

general results of Section 3 can be applied to specific estimators. This Section gives

convergence rates for power series and regression spline estimators of projections on the

set 𝒢 described in equation (2.3). For simplicity, results will be restricted to

mean-square and uniform convergence rates for the function, but not for its derivatives.

Also, the ℋ_ℓ in equation (2.3) will each be taken equal to the set of all functions of

u with finite mean-square.

Convergence rates can be derived under the following analog to the conditions of

Section 4.


Assumption 5.1: i) u is continuously distributed with a support that is a cartesian product of compact intervals, and a density that is bounded and bounded away from zero; ii) K is restricted to be a multiple of L, and p^K(x) = w ⊗ p^{K/L}(u), where either a) p_{kK}(u) is a power series with K⁴/n → 0, or b) the p_{kK}(u) are splines, the support of u is [-1,1]^r for r the dimension of u, K̲(n) = K̄(n) = K, and K²/n → 0; iii) each of the components h_ℓ(u), (ℓ = 1, ..., L), is continuously differentiable of order s on the support of u; iv) w is bounded, and E[ww'|u] has smallest eigenvalue that is bounded away from zero on the support of u.

These conditions lead to the following result on mean-square convergence.

Theorem 5.1: If Assumptions 3.1 and 5.1 are satisfied, then

∑_{i=1}^n [ĝ(x_i) - g_0(x_i)]²/n = O_p(K/n + K^{-2s/r}),
∫[ĝ(x) - g_0(x)]² dF(x) = O_p(K/n + K^{-2s/r}).

Also, for power series and splines respectively,

|ĝ - g_0|_0 = O_p(K[(K/n)^{1/2} + K^{-s/r}]),
|ĝ - g_0|_0 = O_p(K^{1/2}[(K/n)^{1/2} + K^{-s/r}]).

An important feature of this result is that the convergence rate does not depend on L,

but is controlled by the dimension of the coefficient functions and their degree of

smoothness. This feature is to be expected, since the nonparametric part of the

projection is the coefficient functions.


Appendix: Proofs of Theorems

Throughout, let C be a generic positive constant and λ_min(B) and λ_max(B) the minimum and maximum eigenvalues of a symmetric matrix B. A number of lemmas will be

useful in proving the results. First some Lemmas on mean-square closure of certain

spaces of functions are given.

Lemma A.1: If ℋ is linear and closed and E[||w||²] < ∞ then {w'α + h(x) : h ∈ ℋ} is closed.

Proof: Let u = w - P(w|ℋ), so that w'α + h(x) = u'α + [h(x) + P(w|ℋ)'α]. Therefore, it suffices to assume that w is orthogonal to ℋ. It is well known that finite dimensional spaces are closed, and that direct sums of closed orthogonal subspaces are closed, giving the conclusion. QED.

Lemma A.2: Consider sets ℋ_j, (j = 1, ..., J), of functions of a random vector x. If each ℋ_j is closed and w is a J × 1 random vector such that Ω(x) = E[ww'|x] is bounded and has smallest eigenvalue bounded away from zero, then {∑_{j=1}^J w_j h_j(x) : h_j ∈ ℋ_j} is closed.

Proof: By iterated expectations, E[{w'h(x)}²] = E[h(x)'Ω(x)h(x)] ≥ CE[h(x)'h(x)], so that mean-square convergence of w'h_n(x) implies mean-square convergence of each element of h_n(x), and closedness of each ℋ_j then gives the conclusion. QED.

Lemma A.3: Suppose that i) for each x_ℓ, (ℓ = 1, ..., L), if x̃ is a subvector of x_ℓ, then x̃ = x_{ℓ'} for some ℓ'; and ii) there exists a constant c > 1 such that for each ℓ, with the partitioning x = (x_ℓ', x_ℓ^c')', for any a(x) ≥ 0,

    c·∫a(x)d[F(x_ℓ)F(x_ℓ^c)] ≥ E[a(x)] ≥ c^{-1}·∫a(x)d[F(x_ℓ)F(x_ℓ^c)].

Then {∑_{ℓ=1}^L h_ℓ(x_ℓ) : E[h_ℓ(x_ℓ)²] < ∞, ℓ = 1, ..., L} is closed in mean-square.

Proof: Let ℋ = {∑_{ℓ=1}^L h_ℓ(x_ℓ)} and ||a||_F = [∫a(x)²dF(x)]^{1/2}. By Proposition 2 of Section 4 of the Appendix of Bickel, Klaassen, Ritov, and Wellner (1993), ℋ is closed if and only if there is a constant C such that for each h ∈ ℋ, ||h||_F ≤ C·max_ℓ{||h̃_ℓ||_F} for some decomposition h = ∑_ℓ h̃_ℓ(x_ℓ) (note the h̃_ℓ need not be unique). Following Stone (1990, Lemma 1), suppose that the maximal dimension of the x_ℓ is r̄, and suppose that this property holds whenever the maximal dimension of the x_ℓ is r̄ - 1 or less. Then there is a unique decomposition h = ∑_ℓ h̃_ℓ(x_ℓ) such that for every x_{ℓ'} that is a strict subvector of x_ℓ, E[h̃_ℓ(x_ℓ)δ(x_{ℓ'})] = 0 for all measurable functions δ of x_{ℓ'} with finite mean-square. Consequently, it suffices to show, for any "maximal" x_k that is not a proper subvector of any other x_ℓ, that there is a constant c > 1 such that E[h(x)²] ≥ c^{-1}E[h̃_k(x_k)²]. To show this property, note that, holding fixed the vector x_k^c of components of x that are not components of x_k, each h̃_ℓ(x_ℓ), ℓ ≠ k, is a function of a strict subvector of x_k. Then,

    E[h(x)²] ≥ c^{-1}∫{h̃_k(x_k) + ∑_{ℓ≠k}h̃_ℓ(x_ℓ)}²dF(x_k)dF(x_k^c)
             = c^{-1}∫[∫{h̃_k(x_k) + ∑_{ℓ≠k}h̃_ℓ(x_ℓ)}²dF(x_k)]dF(x_k^c)
             = c^{-1}∫[∫h̃_k(x_k)²dF(x_k) + ∫{∑_{ℓ≠k}h̃_ℓ(x_ℓ)}²dF(x_k)]dF(x_k^c)
             ≥ c^{-1}∫[∫h̃_k(x_k)²dF(x_k)]dF(x_k^c) = c^{-1}E[h̃_k(x_k)²]. QED.

The next few Lemmas consist of useful convergence results for random matrices with

dimension that can depend on the sample size. Let Σ̂ and Σ denote such symmetric matrices, and λ_max(·) and λ_min(·) the largest and smallest eigenvalues respectively.

Lemma A.4: If λ_min(Σ) ≥ C with probability approaching one (w.p.a.1) and ||Σ̂ - Σ|| = o_p(1), then λ_min(Σ̂) ≥ C/2 w.p.a.1.

Proof: For a conformable vector μ, it follows by ||·|| being a matrix norm that

    λ_min(Σ̂) = min_{||μ||=1}{μ'Σμ + μ'(Σ̂-Σ)μ} ≥ λ_min(Σ) - λ_max(Σ̂-Σ) ≥ λ_min(Σ) - ||Σ̂-Σ|| ≥ C - o_p(1).

Therefore, λ_min(Σ̂) ≥ C/2 w.p.a.1. QED.

Lemma A.5: If λ_min(Σ) ≥ C w.p.a.1, ||Σ̂ - Σ|| = o_p(1), and D_n is a conformable matrix such that ||Σ^{-1/2}D_n|| = O_p(ε_n) for some ε_n, then ||Σ̂^{-1/2}D_n|| = O_p(ε_n).

Proof: It is easy to show that for any conformable matrices A and B, ||AB|| ≤ ||A||·||B||, ||A'BA|| ≤ ||B||·||A'A||, and that if B is positive semi-definite, tr(A'BA) ≤ ||A||²λ_max(B), ||AB|| ≤ ||A||λ_max(B), and ||BA|| ≤ ||A||λ_max(B). Let Σ̂^{-1/2} be the symmetric square root of Σ̂^{-1}, which is equal to UΛU' for U an orthogonal matrix and Λ a diagonal matrix consisting of the square roots of the eigenvalues of Σ̂^{-1}. Note that Σ̂^{-1/2} is positive definite and λ_max(Σ̂^{-1/2}) = [λ_max(Σ̂^{-1})]^{1/2}. Also, by Lemma A.4, λ_max(Σ̂^{-1}) = O_p(1). Then

(A.1)  ||Σ̂^{-1/2}D_n||² = tr(D_n'[Σ̂^{-1} - Σ^{-1}]D_n) + tr(D_n'Σ^{-1}D_n)
       ≤ ||Σ^{-1/2}D_n||²(1 + ||Σ^{-1/2}[Σ-Σ̂]Σ^{-1/2}||) + ||(Σ-Σ̂)Σ^{-1}D_n||²λ_max(Σ̂^{-1})
       ≤ O_p(ε_n²)[1 + o_p(1)O_p(1) + ||Σ-Σ̂||²λ_max(Σ^{-1/2})²O_p(1)] = O_p(ε_n²). QED.

Let tr(A) denote the trace of a square matrix A and u a random matrix with n rows.

Lemma A.6: Suppose λ_min(Σ) ≥ C, P is an n × K random matrix such that ||P'P/n - Σ|| = o_p(1) and ||Σ^{-1/2}P'u/n|| = O_p(ε_n), and p = PA where A is a random matrix. Then tr(u'p(p'p)⁻p'u/n) = O_p(ε_n²).

Proof: Let W̃ = P(P'P)⁻P' and W = p(p'p)⁻p' be the orthogonal projection operators for the linear spaces spanned by the columns of P and p respectively. Since the space spanned by p is a subset of the space spanned by P, W̃ - W is positive semi-definite. Let Σ̂ = P'P/n. Then by Lemma A.5, tr(u'Wu/n) ≤ tr(u'W̃u/n) = ||Σ̂^{-1/2}P'u/n||² = O_p(ε_n²). QED.

Let Y and G denote random matrices with the same number of columns and n rows, and let u = Y - G. For a matrix p let π̂ = (p'p)⁻p'Y and Ĝ = pπ̂.

Lemma A.7: If tr(u'p(p'p)⁻p'u/n) = O_p(ε_n²), then for any conformable matrix π,

    ||Ĝ - G||²/n ≤ O_p(ε_n²) + ||G - pπ||²/n.

Proof: For W as in the proof of Lemma A.6, by Wp = p and I - W idempotent,

    ||Ĝ - G||²/n = tr[Y'WY - Y'WG - G'WY + G'G]/n = tr[u'Wu + G'(I-W)G]/n
                 = tr[u'Wu + (G-pπ)'(I-W)(G-pπ)]/n ≤ O_p(ε_n²) + ||G-pπ||²/n. QED.

Lemma A.8: If λ_min(Σ) ≥ C, ||p'p/n - Σ|| = o_p(1), and tr[u'p(p'p)⁻p'u/n] = O_p(ε_n²), then for any conformable matrix π,

    ||π̂ - π||² ≤ O_p(ε_n²) + O_p(1)||G - pπ||²/n,
    tr[(π̂-π)'Σ(π̂-π)] ≤ O_p(ε_n²) + O_p(1)||G - pπ||²/n.

Proof: By Lemma A.4, λ_min(p'p/n) ≥ C/2 w.p.a.1, so λ_min(p'p/n)^{-1} = O_p(1). Therefore, for Ĝ = pπ̂,

    ||π̂-π||² ≤ λ_min(p'p/n)^{-1}tr[(π̂-π)'(p'p/n)(π̂-π)] = O_p(1)||Ĝ - pπ||²/n
             ≤ O_p(1)[||Ĝ - G||²/n + ||G - pπ||²/n] ≤ O_p(ε_n²) + O_p(1)||G-pπ||²/n,

where the last inequality follows by Lemma A.7. To prove the second conclusion, note that by the triangle inequality and the same arguments as for the previous equation,

    tr[(π̂-π)'Σ(π̂-π)] = tr[(π̂-π)'[Σ - p'p/n](π̂-π)] + tr[(π̂-π)'(p'p/n)(π̂-π)]
    ≤ ||π̂-π||²·||Σ - p'p/n|| + O_p(ε_n²) + O_p(1)||G-pπ||²/n = O_p(ε_n²) + O_p(1)||G-pπ||²/n. QED.

Lemma A.9: If z_1, ..., z_n are i.i.d. then for any vector of functions a^K(z) = (a_{1K}(z), ..., a_{KK}(z))' and K = K(n),

    ||∑_{i=1}^n a^K(z_i)/n - E[a^K(z)]|| = O_p({E[a^K(z)'a^K(z)]/n}^{1/2}).

Proof: Let K = K(n). By the Cauchy-Schwarz inequality,

    E[||∑_{i=1}^n a^K(z_i)/n - E[a^K(z)]||] ≤ {E[||∑_{i=1}^n a^K(z_i)/n - E[a^K(z)]||²]}^{1/2} ≤ {E[||a^K(z)||²]/n}^{1/2},

so the conclusion follows by the Markov inequality. QED.

Now let Σ̂ = ∑_{i=1}^n P^K(x_i)P^K(x_i)'/n and Σ = ∫P^K(x)P^K(x)'dF(x).

Lemma A.10: Suppose that Assumptions 3.1 - 3.3 are satisfied. If Assumption 3.3 a) is also satisfied then

    ||Σ̂ - Σ|| = O_p({∑_{K̲(n)≤K≤K̄(n)} ζ_0(K)⁴/n}^{1/2}) = o_p(1).

If Assumption 3.3 b) is also satisfied then

    ||Σ̂ - Σ|| = O_p([ζ_0(K̄(n))⁴/n]^{1/2}) = o_p(1).

Proof: Let Σ̂_K = ∑_{i=1}^n P^K(x_i)P^K(x_i)'/n and Σ_K = ∫P^K(x)P^K(x)'dF(x). To show the first conclusion, note that by the Cauchy-Schwarz inequality, for a^K(z) = P^K(x)⊗P^K(x),

    E[max_{K̲≤K≤K̄}||Σ̂_K - Σ_K||] ≤ E[{∑_{K̲≤K≤K̄}||Σ̂_K - Σ_K||²}^{1/2}] ≤ {∑_{K̲≤K≤K̄}E[||Σ̂_K - Σ_K||²]}^{1/2}
    ≤ {∑_{K̲≤K≤K̄}E[||a^K(z)||²]/n}^{1/2} = {∑_{K̲≤K≤K̄}E[||P^K(x)||⁴]/n}^{1/2} ≤ {∑_{K̲≤K≤K̄}ζ_0(K)⁴/n}^{1/2}.

Then by the Markov inequality, max_{K̲≤K≤K̄}||Σ̂_K - Σ_K|| = O_p({∑_{K̲≤K≤K̄}ζ_0(K)⁴/n}^{1/2}). The first conclusion then follows by ||Σ̂ - Σ|| ≤ max_{K̲≤K≤K̄}||Σ̂_K - Σ_K|| w.p.a.1. To show the second conclusion, note that w.p.a.1, Σ̂ and Σ are submatrices of Σ̂_{K̄} and Σ_{K̄} respectively, whence ||Σ̂ - Σ|| ≤ ||Σ̂_{K̄} - Σ_{K̄}||. The conclusion then follows from Lemma A.9. QED.

Let y = (y_1, ..., y_n)', g = (g_0(x_1), ..., g_0(x_n))', and p = [p^K(x_1), ..., p^K(x_n)]'.

Lemma A.11: If Assumptions 3.1 - 3.3 are satisfied, then

    (y-g)'p(p'p)⁻p'(y-g)/n = O_p(K̄/n).

Proof: Let u ≡ y - g, P_i = P^{K̄}(x_i), P = [P_1, ..., P_n]', and Σ = E[P'P]/n. By Assumption 3.3, there is a random matrix A such that p = PA. Also, by Lemma A.9 and an argument like that of the proof of Lemma A.10, ||P'P/n - Σ|| → 0 in probability. Also, by Assumption 3.2, λ_min(Σ) ≥ C. Also, E[P_i u_i] = 0 by each element of P_i being in 𝒢, and E[P_iP_i'u_i²] = E[P_iP_i'E[u_i²|x_i]] ≤ CΣ, so by the data i.i.d.,

    E[||Σ^{-1/2}P'u/n||²] = E[u'PΣ^{-1}P'u]/n² = tr(Σ^{-1/2}(∑_{i=1}^n∑_{j=1}^n E[P_iu_iP_j'u_j])Σ^{-1/2})/n²
    = tr(Σ^{-1/2}E[P_iP_i'u_i²]Σ^{-1/2})/n ≤ tr(C·I_{K̄})/n ≤ CK̄/n.

Therefore, by the Markov inequality, ||Σ^{-1/2}P'u/n|| = O_p((K̄/n)^{1/2}). The conclusion then follows by Lemma A.6. QED.

The next few lemmas give approximation rate results for power series and splines.

Lemma A.12: For power series, if the support 𝒳 of x is a compact box in ℝ^r and f(x) is continuously differentiable of order s, then there are α, C > 0 such that for each K there is π with |f - p^{K'}π|_d ≤ CK^{-α}, where α = s/r when d = 0, and α = s - d when r = 1 and d ≤ s.

Proof: For the first conclusion, note that by |λ(k)| monotonic increasing, the set of all linear combinations of p^K(x) will include the set of all polynomials of degree CK^{1/r} for some C small enough, so Theorem 8 of Lorentz (1986) applies. For r = 1, note that ∂^d p^K(x)/∂x^d is a spanning vector for power series up to order K. By the first conclusion, there exists C such that for each K there is π with, for f_K(x) = p^{K+d}(x)'π,

    sup_x |∂^d f(x)/∂x^d - ∂^d f_K(x)/∂x^d| ≤ CK^{-s+d}.

The second conclusion then follows by integration and boundedness of the support. For example, for d = 1 and x̲ the minimum of the support, with the constant coefficient chosen so that f(x̲) = f_K(x̲),

    |f(x) - f_K(x)| ≤ ∫_{x̲}^{x} |∂f(v)/∂v - ∂f_K(v)/∂v| dv ≤ CK^{-s+1}. QED.

Lemma A.13: For power series, if 𝒳 is star-shaped and there is C such that f(x) is continuously differentiable of all orders and for all multi-indices λ, max_𝒳 |∂^λ f(x)| ≤ C^{|λ|}, then for all α, d > 0 there is C such that for all K there is π with |f - p^{K'}π|_d ≤ CK^{-α}.

Proof: By 𝒳 star-shaped, there exists x̄ ∈ 𝒳 such that for all x ∈ 𝒳, θx + (1-θ)x̄ ∈ 𝒳 for all 0 ≤ θ ≤ 1. For a function f(x), let P(f,m,x) denote the Taylor series up to order m for an expansion around x̄. Note that ∂P(f,m,x)/∂x_j = P(∂f/∂x_j, m-1, x), so that by induction ∂^λ P(f,m,x) = P(∂^λ f, m-|λ|, x). Also, ∂^λ f(x) also satisfies the hypotheses, so that by the intermediate value form of the remainder,

    max_{x∈𝒳} |∂^λ f(x) - P(∂^λ f, m-|λ|, x)| ≤ C^m/[(m-d)!].

Next, let m(K) be the largest integer such that P(f, m(K), x) is a linear combination of p^K(x), and let f_K(x) = P(f, m(K), x). By the "natural ordering" hypothesis, there are constants C_1 and C_2 such that C_1·m(K)^r ≤ K ≤ C_2·m(K)^r, so that for any α > 0, C^{m(K)}/[(m(K)-d)!] ≤ CK^{-α}, and

    sup_{|λ|≤d} |∂^λ f(x) - ∂^λ f_K(x)| = sup_{|λ|≤d} |∂^λ f(x) - P(∂^λ f, m(K)-|λ|, x)| ≤ CK^{-α}. QED.

Lemma A.14: For splines, if 𝒳 is a compact box and f(x) is continuously differentiable of order s, then there are α, C > 0 such that for all K there is π with |f - p^{K'}π|_d ≤ CK^{-α}, where α = s - d for r = 1 and d ≤ m - 1, and α = s/r for d = 0.

Proof: The result for d = 0 follows by Theorem 12.8 of Schumaker (1981). For the other case, note that ∂^d p^K(x)/∂x^d is a spanning vector for splines of degree m - d, with knot spacing bounded by CK^{-1} for K large enough and some C. Therefore, by Powell (1981), there exists π_K such that for f_K(x) = p^K(x)'π_K,

    sup_x |∂^d f(x)/∂x^d - ∂^d f_K(x)/∂x^d| ≤ CK^{-s+d}.

The conclusion then follows by integration, similarly to the proof of Lemma A.12. QED.

The next two Lemmas show that for power series and splines, there exists P^K(x)

such that Assumption 3.2 is satisfied, and give explicit bounds on the series and their

derivatives.

Lemma A.15: For power series, if the support of x is a cartesian product of compact intervals, say of unit intervals, with density bounded below by C∏_{j=1}^r x_j^v(1-x_j)^v for some v ≥ 0, then Assumption 3.2 and equation (3.1) are satisfied, with ζ_d(K) ≤ CK^{1+v+2d}, and P^K(x) is a subvector of P^{K+1}(x) for all K ≥ 1.

Proof: Following the definitions in Abramowitz and Stegun (1972, Ch. 22), let C_k^{(α)}(·) denote the ultraspherical polynomial of order k for exponent α, h_k^{(α)} ≡ π2^{1-2α}Γ(k+2α)/{k!(k+α)[Γ(α)]²}, and p_k^{(α)}(·) = [h_k^{(α)}]^{-1/2}C_k^{(α)}(·). Also, let κ_j(x_j) = (2x_j - x̄_j - x̲_j)/(x̄_j - x̲_j) and define

    P_k(x) = ∏_{j=1}^r p_{λ_j(k)}^{(v+.5)}(κ_j(x_j)).

P^K(x) is a nonsingular combination of p^K(x) by the "natural ordering" assumption (i.e. by |λ(k)| monotonic increasing). Also, for P̄(x) absolutely continuous on 𝒳 = ∏_{j=1}^r [x̲_j, x̄_j] with pdf proportional to ∏_{j=1}^r [(x_j - x̲_j)(x̄_j - x_j)]^v, and by the change of variables formula, there is a constant C with

    λ_min(∫P^K(x)P^K(x)'dP̄(x)) ≥ λ_min(∫⊗_{j=1}^r [p^{(v+.5)}(κ_j(x_j))p^{(v+.5)}(κ_j(x_j))']dP̄(x)) = C,

where p^{(v+.5)}(·) = (p_0^{(v+.5)}(·), ..., p_M^{(v+.5)}(·))' for M = max_{k≤K}|λ(k)|, and the inequality follows by P^K(x) being a subvector of ⊗_{j=1}^r p^{(v+.5)}(κ_j(x_j)).

Next, by differentiating 22.5.37 of Abramowitz and Stegun (for m there equal to v here) and solving, it follows that for s ≤ k, d^s C_k^{(v+.5)}(x)/dx^s = C·C_{k-s}^{(v+s+.5)}(x), so that by 22.14.2 of Abramowitz and Stegun, for λ(k) as in equation (2.5),

    |∂^λ P_k(x)| ≤ C∏_{j=1}^r |λ(k)|^{.5+v+2λ_j} ≤ C|λ(k)|^{r(.5+v)+2d} ≤ CK^{.5+v+2d},

so that ||∂^λ P^K(x)|| ≤ K^{1/2}·max_{k≤K}|∂^λ P_k(x)| ≤ CK^{1+v+2d}, where the last inequality in the display follows by |λ(k)| ≤ CK^{1/r}. QED.

Lemma A.16: For splines, if Assumption 4.1 is satisfied then Assumption 3.2 and equation (3.1) are satisfied, with ζ_d(K) ≤ CK^{(1/2)+d}.

Proof: First, consider the case where x = x_1 and let 𝒳 = ∏_{j=1}^r [-1,1]. Let B_{j,L}(·) be the B-spline of order m for the knot sequence -1 + 2j/(L+1), j = ..., -1, 0, 1, ..., with left end-knot j, and let

    P_k(v) ≡ [(L+1)/2]^{1/2}B_{k-m-1,L}(v), (k = 1, ..., L+m+1),
    P_{kK}(x) = ∏_{j=1}^r 1(λ_j(k) > 0)P_{λ_j(k)}(x_j).

Then existence of a nonsingular matrix A such that P^K(x) = Ap^K(x) for x ∈ 𝒳 follows by inclusion in p^K(x) of all multiplicative interactions of splines for components of x corresponding to components of h(x), and the usual basis result for B-splines (e.g. Theorem 19.2 of Powell, 1981).

Next, a well known property of B-splines is that for all x, the number of elements of P^K(x) = (P_{1K}(x), ..., P_{KK}(x))' that are nonzero is bounded, uniformly in K. Also, when the elements of x are i.i.d. uniform random variables, and noting that [2/(L+1)]^{1/2}P_k(x_j) are the so-called normalized B-splines with evenly spaced knots, it follows by the argument of Burman and Chen (1989, p. 1587) that for P_{c,L}(x) = (P_1(x), ..., P_{L+m+1}(x))', there is C with λ_min(∫_{-1}^{1}P_{c,L}(x)P_{c,L}(x)'dx) ≥ C for all positive integers L. Therefore, the boundedness away from zero of the smallest eigenvalue follows by P^K(x) being a subvector of ⊗_{j=1}^r P_{c,L}(x_j), analogously to the proof of Lemma A.15. Also, since changing even knot spacing is equivalent to rescaling the argument of B-splines, sup_x |d^d B_{j,L}(x)/dx^d| ≤ CL^d for d ≤ m, implying the bounds on derivatives given in the conclusion. The proof for the general additive-interactive case follows analogously to the proof of Lemma A.15. QED.

Proof of Theorem 3.1: For each K let π_K be that from Assumption 3.4 with d = 0, so that there is C such that

(A.2)  ∑_{i=1}^n [g_0(x_i) - p^K(x_i)'π_K]²/n ≤ sup_{x∈𝒳}|g_0(x) - p^K(x)'π_K|² ≤ CK^{-2α} = O_p(K^{-2α}).

Also, by Lemma A.11, the hypothesis of Lemma A.7 is satisfied with ε_n = (K̄/n)^{1/2}. The first conclusion then follows by the conclusion of Lemma A.7.

The second conclusion is proven using Lemma A.8. In the hypotheses of Lemma A.8, let Σ = ∫P^K(x)P^K(x)'dF(x) and p = [P^K(x_1), ..., P^K(x_n)]'. By Assumption 3.2 and Lemmas A.10 and A.11, the hypotheses of Lemma A.8 are satisfied with ε_n = (K̄/n)^{1/2}. For each K let π_K be as above, except with P^K(x) replacing p^K(x). Then eq. (A.2) is satisfied (with P^K(x) replacing p^K(x)) and ∫[g_0(x) - P^K(x)'π_K]²dF(x) ≤ sup_{x∈𝒳}|g_0(x) - P^K(x)'π_K|² = O_p(K^{-2α}). Then by the second conclusion of Lemma A.8,

(A.3)  ∫[g_0(x) - ĝ(x)]²dF(x) ≤ 2∫[g_0(x) - P^K(x)'π_K]²dF(x) + 2(π̂ - π_K)'Σ(π̂ - π_K)
       ≤ O_p(K^{-2α}) + O_p(K̄/n) + O_p(1)∑_{i=1}^n [g_0(x_i) - P^K(x_i)'π_K]²/n = O_p(K̄/n + K^{-2α}). QED.

Proof of Theorem 3.2: Because P^K(x) is a constant nonsingular linear transformation of p^K(x), Assumption 3.4 will be satisfied with P^K(x) replacing p^K(x). Also, by Assumption 3.3 b), when K ≥ K̲, π can be chosen so that |g_0(x) - P^K(x)'π|_d ≤ |g_0(x) - P^{K̲}(x)'π_{K̲}|_d, so that |g_0(x) - P^K(x)'π|_d = O_p(K̲^{-α}). Also, it follows as in the proof of Theorem 3.1 that eq. (A.2) and the hypotheses of Lemma A.8 are satisfied. Then by the first conclusion of Lemma A.8 and the triangle inequality,

(A.4)  |g_0 - ĝ|_d ≤ |g_0 - P^{K'}π|_d + |P^{K'}(π̂ - π)|_d ≤ O_p(K̲^{-α}) + ζ_d(K̄)||π̂ - π||
       = O_p(K̲^{-α}) + ζ_d(K̄)O_p((K̄/n)^{1/2} + K̲^{-α}) = O_p(ζ_d(K̄)[(K̄/n)^{1/2} + K̲^{-α}]). QED.

Proof of Theorem 4.1: By Assumption 4.1 it follows that the hypotheses of Lemma A.3 are satisfied. Therefore, by the conclusion of Lemma A.3 there exists a representation g_0(x) = ∑_ℓ g_{0ℓ}(x_ℓ), where for each ℓ the dimension of x_ℓ is less than or equal to a. Then by Lemmas A.12 and A.14 it follows that for each K there is π_ℓ with |g_{0ℓ} - p^{K'}π_ℓ|_0 ≤ CK^{-s/a}. Then by the triangle inequality, for π = ∑_ℓ π_ℓ, Assumption 3.4 is satisfied with d = 0 and α = s/a. Also, by Lemmas A.15 and A.16, Assumption 3.2 and equation (3.1) are satisfied, and Assumption 4.2 implies that Assumption 3.3 holds. Then the conclusion follows by Theorem 3.1 with d = 0 and ζ_0(K) = CK for power series and ζ_0(K) = CK^{1/2} for splines. QED.

Proof of Theorem 4.2: It follows as in the proof of Theorem 4.1 that Assumptions 3.1 - 3.4 are satisfied, with ζ_0(K) = CK for power series and ζ_0(K) = CK^{1/2} for splines. The conclusion then follows by Theorem 3.2. QED.

Proof of Theorem 4.3: Follows as in the proof of Theorem 4.2, except that Assumption 3.4 is now satisfied with α = s - d by Lemmas A.12 and A.14, and Assumption 3.2 and equation (3.1) are now satisfied with ζ_d(K) = CK^{1+2d} for power series, by Lemma A.15, and with ζ_d(K) = CK^{(1/2)+d} for splines, by Lemma A.16. QED.

Proof of Theorem 4.4: Follows as in the proof of Theorem 4.3, except that Lemma A.13 is applied to show that Assumption 3.4 holds for any α > 0. QED.

Proof of Theorem 5.1: The proof is similar to that of Theorems 4.1 and 4.2. By w bounded and Lemmas A.12 and A.14, Assumption 3.4 is satisfied with α = s/r. Also, note that Assumption 4.1 is satisfied with u replacing x. Let P^K(x) = w ⊗ P^{K/L}(u) for P^{K/L}(u) equal to the vector from the conclusion of Lemma A.15 or A.16, for power series and splines respectively. Then by the smallest eigenvalue of E[ww'|u] bounded away from zero, E[P^K(x)P^K(x)'] ≥ C·(I_L ⊗ E[P^{K/L}(u)P^{K/L}(u)']) in the positive semi-definite sense, so the smallest eigenvalue of E[P^K(x)P^K(x)'] is bounded away from zero. Also, bounds on the elements of P^K(x) are the same, up to a constant multiple, as bounds on the elements of P^{K/L}(u), so that Assumption 3.3 will hold. The conclusion then follows by the conclusions to Theorems 3.1 and 3.2. QED.


References

Abramowitz, M. and Stegun, I.A., eds. (1972). Handbook of Mathematical Functions. Washington, D.C.: Commerce Department.

Agarwal, G. and Studden, W. (1980). Asymptotic integrated mean square error using least squares and bias minimizing splines. Annals of Statistics 8, 1307-1325.

Andrews, D.W.K. (1991). Asymptotic normality of series estimators for various nonparametric and semiparametric models. Econometrica 59, 307-345.

Andrews, D.W.K. and Whang, Y.J. (1990). Additive interactive regression models: circumvention of the curse of dimensionality. Econometric Theory 6, 466-479.

Bickel, P., Klaassen, C.A.J., Ritov, Y., and Wellner, J.A. (1993). Efficient and Adaptive Inference in Semiparametric Models. Monograph, forthcoming.

Breiman, L. and Friedman, J.H. (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association 80, 580-598.

Breiman, L. and Stone, C.J. (1978). Nonlinear additive regression. Note.

Buja, A., Hastie, T., and Tibshirani, R. (1989). Linear smoothers and additive models. Annals of Statistics 17, 453-510.

Burman, P. and Chen, K.W. (1989). Nonparametric estimation of a regression function. Annals of Statistics 17, 1567-1596.

Cox, D.D. (1988). Approximation of least squares regression on nested subspaces. Annals of Statistics 16, 713-732.

Friedman, J. and Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association 76, 817-823.

Gallant, A.R. (1981). On the bias in flexible functional forms and an essentially unbiased form: the Fourier flexible form. Journal of Econometrics 15, 211-245.

Gallant, A.R. and Souza, G. (1991). On the asymptotic normality of Fourier flexible form estimates. Journal of Econometrics 50, 329-353.

Lorentz, G.G. (1986). Approximation of Functions. New York: Chelsea Publishing Company.

Newey, W.K. (1988). Adaptive estimation of regression models via moment restrictions. Journal of Econometrics 38, 301-339.

Newey, W.K. (1993a). The asymptotic variance of semiparametric estimators. Preprint, MIT Department of Economics.

Newey, W.K. (1993b). Series estimation of regression functionals. Econometric Theory, forthcoming.

Powell, M.J.D. (1981). Approximation Theory and Methods. Cambridge, England: Cambridge University Press.

Rao, C.R. (1973). Linear Statistical Inference and Its Applications. New York: Wiley.

Riedel, K.S. (1992). Smoothing spline growth curves with covariates. Preprint, Courant Institute, New York University.

Schumaker, L.L. (1981). Spline Functions: Basic Theory. New York: Wiley.

Stone, C.J. (1982). Optimal global rates of convergence for nonparametric regression. Annals of Statistics 10, 1040-1053.

Stone, C.J. (1985). Additive regression and other nonparametric models. Annals of Statistics 13, 689-705.

Stone, C.J. (1990). L_2 rate of convergence for interaction spline regression. Tech. Rep. No. 268, University of California, Berkeley.

Wahba, G. (1984). Cross-validated spline methods for the estimation of multivariate functions from data on functionals. In Statistics: An Appraisal, Proceedings 50th Anniversary Conference Iowa State Statistical Laboratory (H.A. David and H.T. David, eds.), 205-235. Ames: Iowa State University Press.

Zeldin, M.D. and Thomas, D.M. (1975). Ozone trends in the Eastern Los Angeles basin corrected for meteorological variations. Proceedings, International Conference on Environmental Sensing and Assessment, Vol. 2, September 14-19, 1975, Las Vegas, Nevada.

7579 O I

30

Page 49: Convergence rates for series estimators
Page 50: Convergence rates for series estimators
Page 51: Convergence rates for series estimators
Page 52: Convergence rates for series estimators

Date Due

Lib-26-67

Page 53: Convergence rates for series estimators

MIT LIBRARIES DUPL

3 TQflO 0063210^ 3

Page 54: Convergence rates for series estimators