
A Design-Adaptive Local Polynomial Estimator for the Errors-in-Variables Problem

Aurore DELAIGLE, Jianqing FAN, and Raymond J. CARROLL

Local polynomial estimators are popular techniques for nonparametric regression estimation and have received great attention in the literature. Their simplest version, the local constant estimator, can be easily extended to the errors-in-variables context by exploiting its similarity with the deconvolution kernel density estimator. The generalization of the higher order versions of the estimator, however, is not straightforward and has remained an open problem for the last 15 years. We propose an innovative local polynomial estimator of any order in the errors-in-variables context, derive its design-adaptive asymptotic properties, and study its finite sample performance on simulated examples. We not only provide a solution to a long-standing open problem, but also make methodological contributions to errors-in-variables regression, including local polynomial estimation of derivative functions.

KEY WORDS: Bandwidth selector; Deconvolution; Inverse problems; Local polynomial; Measurement errors; Nonparametric regression; Replicated measurements.

1. INTRODUCTION

Local polynomial estimators are popular techniques for nonparametric regression estimation. Their simplest version, the local constant estimator, can be easily extended to the errors-in-variables context by exploiting its similarity with the deconvolution kernel density estimator. The generalization of the higher order versions of the estimator, however, is not straightforward and has remained an open problem for the last 15 years, since the publication of Fan and Truong (1993). The purpose of this article is to describe a solution to this long-standing open problem; we also make methodological contributions to errors-in-variables regression, including local polynomial estimation of derivative functions.

Suppose we have an iid sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ distributed like $(X, Y)$, and we want to estimate the regression curve $m(x) = E(Y \mid X = x)$ or its $\nu$th derivative $m^{(\nu)}(x)$. Let $K$ be a kernel function and $h > 0$ a smoothing parameter called the bandwidth. When $X$ is observable, at each point $x$, the local polynomial estimator of order $p$ approximates the function $m$ by a $p$th order polynomial $m_p(z) \equiv \sum_{k=0}^{p} \beta_{x,k} (z - x)^k$, where the local parameters $\beta_x = (\beta_{x,0}, \ldots, \beta_{x,p})$ are fitted locally by a weighted least squares regression problem, via minimization of
$$\sum_{j=1}^{n} \{Y_j - m_p(X_j)\}^2 K_h(X_j - x), \qquad (1)$$
where $K_h(x) = h^{-1} K(x/h)$. Then $m(x)$ is estimated by $\widehat{m}_p(x) = \widehat{\beta}_{x,0}$ and $m^{(\nu)}(x)$ is estimated by $\widehat{m}_p^{(\nu)}(x) = \nu!\, \widehat{\beta}_{x,\nu}$ (see Fan and Gijbels 1996). Local polynomial estimators of order $p > 0$ have many advantages over other nonparametric estimators, such as the Nadaraya-Watson estimator ($p = 0$). One of their attractive features is their capacity to adapt automatically to the boundary of the design points, thereby offering the potential of bias reduction with little or no variance increase.

In this article, we consider the more difficult errors-in-variables problem, where the goal is still to estimate the curve $m(x)$ or its derivative $m^{(\nu)}(x)$, but the only observations available are an iid sample $(W_1, Y_1), \ldots, (W_n, Y_n)$ distributed like $(W, Y)$, where $W = X + U$ with $U$ independent of $X$ and $Y$. Here, $X$ is not observable and instead we observe $W$, a version of $X$ contaminated by a measurement error $U$ with density $f_U$. In this context, when $p = 0$, $m_p(X_j) = \beta_{x,0}$, and a consistent estimator of $m$ can simply be obtained after replacing the weights $K_h(X_j - x)$ in (1) by appropriate weights depending on $W_j$ (see Fan and Truong 1993). For $p > 0$, however,
$$m_p(X_j) = \sum_{k=0}^{p} \beta_{x,k} (X_j - x)^k$$
depends on the unobserved $X_j$. As a result, despite the popularity of the measurement error problem, no one has yet been able to extend the minimization problem (1) and the corresponding local $p$th order polynomial estimators for $p > 0$ to the case of contaminated data. An exception is the recent article by Zwanzig (2007), who constructed a local linear estimator of $m$ in the context where the $U_i$'s are normally distributed, the density of the $X_i$'s is known to be uniform $U[0, 1]$, and the curve $m$ is supported on $[0, 1]$.

We propose a solution to the general problem and thus generalize local polynomial estimators to the errors-in-variables case. The methodology consists of constructing simple unbiased estimators of the terms depending on $X_j$ that are involved in the calculation of the usual local polynomial estimators. Our approach also provides an elegant estimator of the derivative functions in the errors-in-variables setting.

The errors-in-variables regression problem has been considered by many authors in both the parametric and the nonparametric context. See, for example, Fan and Masry (1992), Cook and Stefanski (1994), Stefanski and Cook (1995), Ioannides and Alevizos (1997), Koo and Lee (1998), Carroll, Maca,

Aurore Delaigle is Reader, Department of Mathematics, University of Bristol, Bristol BS8 1TW, UK, and Department of Mathematics and Statistics, University of Melbourne, VIC 3010, Australia (E-mail: [email protected]). Jianqing Fan is Frederick L. Moore '18 Professor of Finance, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, and Honored Professor, Department of Statistics, Shanghai University of Finance and Economics, Shanghai, China (E-mail: [email protected]). Raymond J. Carroll is Distinguished Professor, Department of Statistics, Texas A&M University, College Station, TX 77843 (E-mail: [email protected]). Carroll's research was supported by grants from the National Cancer Institute (CA57030, CA90301) and by award number KUS-CI-016-04 made by the King Abdullah University of Science and Technology (KAUST). Delaigle's research was supported by a Maurice Belz Fellowship from the University of Melbourne, Australia, and by a grant from the Australian Research Council. Fan's research was supported by grants from the National Institute of General Medicine (R01-GM072611) and the National Science Foundation (DMS-0714554 and DMS-0751568). The authors thank the editor, the associate editor, and referees for their valuable comments.

© 2009 American Statistical Association
Journal of the American Statistical Association
March 2009, Vol. 104, No. 485, Theory and Methods
DOI 10.1198/jasa.2009.0114

and Ruppert (1999), Stefanski (2000), Taupin (2001), Berry, Carroll, and Ruppert (2002), Carroll and Hall (2004), Staudenmayer and Ruppert (2004), Liang and Wang (2005), Comte and Taupin (2007), Delaigle and Meister (2007), Hall and Meister (2007), and Delaigle, Hall, and Meister (2008); see also Carroll, Ruppert, Stefanski, and Crainiceanu (2006) for an exhaustive review of this problem.

2. METHODOLOGY

In this section, we will first review local polynomial estimators in the error-free case to show exactly what has to be solved in the measurement error problem. After that, we give our solution.

2.1 Local Polynomial Estimator in the Error-free Case

In the usual error-free case (i.e., when the $X_i$'s are observable), the local polynomial estimator of $m^{(\nu)}(x)$ of order $p$ can be written in matrix notation as
$$\widehat{m}_p^{(\nu)}(x) = \nu!\, e_{\nu+1}^{\top} (\mathbf{X}^{\top} \mathbf{K} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{K} y,$$
where $e_{\nu+1}^{\top} = (0, \ldots, 0, 1, 0, \ldots, 0)$ with 1 in the $(\nu + 1)$th position, $y^{\top} = (Y_1, \ldots, Y_n)$, $\mathbf{X} = \{(X_i - x)^j\}_{1 \le i \le n,\, 0 \le j \le p}$, and $\mathbf{K} = \mathrm{diag}\{K_h(X_j - x)\}$ (see, e.g., Fan and Gijbels 1996, p. 59).

Using standard calculations, this estimator can be written in various equivalent ways. An expression that will be particularly useful in the context of contaminated errors, where we observe neither $\mathbf{X}$ nor $\mathbf{K}$, is the one used in Fan and Masry (1997), which follows from equivalent kernel calculations of Fan and Gijbels (1996, p. 63). Let
$$S_n = \begin{pmatrix} S_{n,0}(x) & \cdots & S_{n,p}(x) \\ \vdots & \ddots & \vdots \\ S_{n,p}(x) & \cdots & S_{n,2p}(x) \end{pmatrix}, \qquad T_n = \begin{pmatrix} T_{n,0}(x) \\ \vdots \\ T_{n,p}(x) \end{pmatrix},$$
where
$$S_{n,k}(x) = \frac{1}{n} \sum_{j=1}^{n} \left( \frac{X_j - x}{h} \right)^{k} K_h(X_j - x), \qquad T_{n,k}(x) = \frac{1}{n} \sum_{j=1}^{n} Y_j \left( \frac{X_j - x}{h} \right)^{k} K_h(X_j - x).$$
Then the local polynomial estimator of $m^{(\nu)}(x)$ of order $p$ can be written as
$$\widehat{m}_p^{(\nu)}(x) = \nu!\, h^{-\nu} e_{\nu+1}^{\top} S_n^{-1} T_n.$$
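To fix ideas, the error-free estimator above can be sketched in a few lines of code. This is an illustrative sketch only, not from the paper; the Gaussian kernel and all function names are our own choices.

```python
import numpy as np
from math import factorial, pi, sqrt

def local_poly(x, X, Y, h, p=1, nu=0):
    """Error-free local polynomial estimate of m^(nu)(x), built from the
    moment sums S_{n,k} and T_{n,k} and the formula nu! h^{-nu} e' S_n^{-1} T_n."""
    K = lambda u: np.exp(-u ** 2 / 2) / sqrt(2 * pi)   # Gaussian kernel (our choice)
    z = (X - x) / h
    w = K(z) / h                                       # K_h(X_j - x)
    # S_n is the (p+1)x(p+1) matrix of the S_{n,k+l}; T_n is the vector of T_{n,k}
    S = np.array([[np.mean(z ** (k + l) * w) for l in range(p + 1)]
                  for k in range(p + 1)])
    T = np.array([np.mean(Y * z ** k * w) for k in range(p + 1)])
    return factorial(nu) * h ** (-nu) * np.linalg.solve(S, T)[nu]
```

For example, with $X$ uniform on $[-1, 1]$, $m(x) = \sin(2x)$, and small residual noise, `local_poly(0.3, X, Y, h=0.15, p=1)` should be close to $\sin(0.6)$.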

2.2 Extension to the Errors-in-Variables Case

Our goal is to extend $\widehat{m}_p^{(\nu)}(x)$ to the errors-in-variables setting, where the data are a sample $(W_1, Y_1), \ldots, (W_n, Y_n)$ of contaminated iid observations coming from the model
$$Y_j = m(X_j) + \eta_j, \qquad W_j = X_j + U_j, \qquad E(\eta_j \mid X_j) = 0, \quad \text{with } X_j \sim f_X \text{ and } U_j \sim f_U, \qquad (2)$$
where the $U_j$ are the measurement errors, independent of $(X_j, Y_j, \eta_j)$, and $f_U$ is known.

For $p = 0$, a rate-optimal estimator has been developed by Fan and Truong (1993). Their technique is similar to the one employed in the density deconvolution problems studied in Stefanski and Carroll (1990) (see also Carroll and Hall 1988). It consists of replacing the unobserved $K_h(X_j - x)$ by an observable quantity $L_h(W_j - x)$ satisfying
$$E[L_h(W_j - x) \mid X_j] = K_h(X_j - x).$$
In the usual nomenclature of measurement error models, this means that $L_h(W_j - x)$ is an unbiased score for the kernel function $K_h(X_j - x)$.

Following this idea, we would like to replace $(X_j - x)^k K_h(X_j - x)$ in $S_{n,k}$ and $T_{n,k}$ by $(W_j - x)^k L_{k,h}(W_j - x)$, where $L_{k,h}(x) = h^{-1} L_k(x/h)$, and each $L_k$ potentially depends on $h$ and satisfies
$$E\{(W_j - x)^k L_{k,h}(W_j - x) \mid X_j\} = (X_j - x)^k K_h(X_j - x). \qquad (3)$$
That is, we propose to find unbiased scores for all components of the kernel functions. Thus, using the substitution principle, we propose to estimate $m^{(\nu)}(x)$ by
$$\widehat{m}_p^{(\nu)}(x) = \nu!\, h^{-\nu} e_{\nu+1}^{\top} \widehat{S}_n^{-1} \widehat{T}_n, \qquad (4)$$
where $\widehat{S}_n = \{\widehat{S}_{n,j+\ell}(x)\}_{0 \le j, \ell \le p}$ and $\widehat{T}_n = \{\widehat{T}_{n,0}(x), \ldots, \widehat{T}_{n,p}(x)\}^{\top}$, with
$$\widehat{S}_{n,k}(x) = n^{-1} \sum_{j=1}^{n} \left( \frac{W_j - x}{h} \right)^{k} L_{k,h}(W_j - x), \qquad \widehat{T}_{n,k}(x) = n^{-1} \sum_{j=1}^{n} Y_j \left( \frac{W_j - x}{h} \right)^{k} L_{k,h}(W_j - x).$$

The method explained earlier seems relatively straightforward, but its actual implementation is difficult, and this is the reason that the problem has remained unsolved. The main difficulty has been that it is very hard to find an explicit solution $L_{k,h}(\cdot)$ to the integral Equation (3). In addition, a priori it is not clear that the solution will be independent of other quantities such as $X_j$, $x$, and other population parameters. Therefore, this problem has remained unsolved for more than 15 years.

The key to finding the solution is the Fourier transform. Instead of solving (3) directly, we solve its Fourier version
$$E\big[\phi_{\{(W_j - x)^k L_{k,h}(W_j - x)\}}(t) \mid X_j\big] = \phi_{\{(X_j - x)^k K_h(X_j - x)\}}(t), \qquad (5)$$
where, for a function $g$, we let $\phi_g$ denote its Fourier transform, whereas for a random variable $T$, we let $\phi_T$ denote the characteristic function of its distribution.

We make the following basic assumptions:

Condition A: $\int |\phi_X| < \infty$; $\phi_U(t) \neq 0$ for all $t$; $\phi_K^{(\ell)}$ is not identically zero and $\int |\phi_K^{(\ell)}(t)/\phi_U(t/h)|\, dt < \infty$ for all $h > 0$ and $0 \le \ell \le 2p$.

Condition A generalizes standard conditions of the deconvolution literature, where it is assumed to hold for $p = 0$. It is easy to find kernels that satisfy this condition. For example, kernels defined by $\phi_K(t) = (1 - t^2)^q\, 1_{[-1,1]}(t)$, with $q \ge 2p$, satisfy this condition.

Under these conditions, we show in the Appendix that the solution to (5) is found by taking $L_k$ (in the definition of $L_{k,h}$) equal to
$$L_k(u) = u^{-k} K_{U,k}(u),$$
with
$$K_{U,k}(x) = i^{-k} \frac{1}{2\pi} \int e^{-itx}\, \frac{\phi_K^{(k)}(t)}{\phi_U(-t/h)}\, dt.$$

In other words, our estimator is defined by (4), where
$$\widehat{S}_{n,k}(x) = n^{-1} \sum_{j=1}^{n} K_{U,k,h}(W_j - x) \quad \text{and} \quad \widehat{T}_{n,k}(x) = n^{-1} \sum_{j=1}^{n} Y_j K_{U,k,h}(W_j - x), \qquad (6)$$
with $K_{U,k,h}(x) = h^{-1} K_{U,k}(x/h)$. Note that the functions $K_{U,k}$ depend on $h$, even though, to simplify the presentation, we do not indicate this dependence explicitly in the notation.

In what follows, for simplicity, we drop the $p$ index from $\widehat{m}_p^{(\nu)}(x)$. It is also convenient to rewrite (4) as
$$\widehat{m}^{(\nu)}(x) = h^{-\nu} \nu! \sum_{k=0}^{p} \widehat{S}_n^{\,\nu,k}(x)\, \widehat{T}_{n,k}(x),$$
where $\widehat{S}_n^{\,\nu,k}(x)$ denotes the $(\nu + 1, k + 1)$th element of the inverse of the matrix $\widehat{S}_n$.
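The estimator (4) with the scores (6) can be sketched numerically. The sketch below assumes Laplace measurement error and the compactly supported kernel $\phi_K(t) = (1 - t^2)^3$; both are our own illustrative choices (any kernel satisfying Condition A would do), and $K_{U,k}$ is computed by brute-force numerical inversion of its Fourier formula rather than in closed form.

```python
import numpy as np
from math import factorial, pi

# phi_K(t) = (1 - t^2)^3 on [-1, 1], as ascending polynomial coefficients (our choice)
PHI_K = np.array([1.0, 0.0, -3.0, 0.0, 3.0, 0.0, -1.0])
T_GRID = np.linspace(-1.0, 1.0, 2001)
DT = T_GRID[1] - T_GRID[0]

def K_U(k, z, h, sigma2):
    """K_{U,k}(z) for Laplace errors with variance sigma2, by numerical inversion of
    K_{U,k}(x) = i^{-k} (2 pi)^{-1} int e^{-itx} phi_K^(k)(t) / phi_U(-t/h) dt."""
    phiK_k = np.polynomial.polynomial.polyval(
        T_GRID, np.polynomial.polynomial.polyder(PHI_K, k))
    phiU = 1.0 / (1.0 + sigma2 * (T_GRID / h) ** 2 / 2.0)   # Laplace cf (symmetric)
    integrand = np.exp(-1j * np.outer(z, T_GRID)) * (phiK_k / phiU)
    return ((1j) ** (-k) / (2.0 * pi) * integrand.sum(axis=1) * DT).real

def m_hat(x, W, Y, h, p=1, nu=0, sigma2=0.1):
    """Estimator (4) with the scores (6): nu! h^{-nu} e' S_n^{-1} T_n."""
    z = (W - x) / h
    S = np.array([[np.mean(K_U(j + l, z, h, sigma2)) / h
                   for l in range(p + 1)] for j in range(p + 1)])
    T = np.array([np.mean(Y * K_U(k, z, h, sigma2)) / h for k in range(p + 1)])
    return factorial(nu) * h ** (-nu) * np.linalg.solve(S, T)[nu]
```

The key design point, mirroring the derivation above, is that $\phi_U(-t/h)$ cancels the error's characteristic function in expectation, so each $\widehat{S}_{n,k}$ and $\widehat{T}_{n,k}$ is an unbiased estimator of its error-free counterpart.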

3. ASYMPTOTIC NORMALITY

3.1 Conditions

To establish asymptotic normality of our estimator, we need to impose some regularity conditions. Note that these conditions are stronger than those needed to define the estimator, and to simplify the presentation, we allow overlap of some of the conditions [e.g., compare Condition A and condition (B1), which follows].

As expected because of the unbiasedness of the score, and as we will show precisely, the asymptotic bias of the estimator, defined as the expectation of the limiting distribution of $\widehat{m}^{(\nu)}(x) - m^{(\nu)}(x)$, is exactly the same as in the error-free case. Therefore, exactly as in the error-free case, the bias depends on the smoothness of $m$ and $f_X$, and on the number of finite moments of $Y$ and $K$. Define $\tau^2(u) = E[\{Y - m(x)\}^2 \mid X = u]$. Note that, to simplify the notation, we do not put an index $x$ on the function $\tau$, but it should be obvious that the function depends on the point $x$ where we wish to estimate the curve $m$. We make the following assumptions:

Condition B:

(B1) $K$ is a real and symmetric kernel such that $\int K(x)\, dx = 1$, and $K$ has finite moments of order $2p + 3$;
(B2) $h \to 0$ and $nh \to \infty$ as $n \to \infty$;
(B3) $f_X(x) > 0$, and $f_X$ is twice differentiable such that $\| f_X^{(j)} \|_\infty < \infty$ for $j = 0, 1, 2$;
(B4) $m$ is $p + 3$ times differentiable, $\tau^2(\cdot)$ is bounded, $\| m^{(j)} \|_\infty < \infty$ for $j = 0, \ldots, p + 3$, and for some $\eta > 0$, $E\{|Y_i - m(x)|^{2+\eta} \mid X = u\}$ is bounded for all $u$.

These conditions are rather mild and, apart from the assumptions on the conditional moments of $Y$, they are fairly standard in the error-free context. Boundedness of moments of $Y$ is standard in the measurement error context (see Fan and Masry 1992; Fan and Truong 1993).

The asymptotic variance of the estimator, defined as the variance of the limiting distribution of $\widehat{m}^{(\nu)}(x) - m^{(\nu)}(x)$, differs from the error-free case because, as usual in deconvolution problems, it depends strongly on the type of measurement errors that contaminate the $X$-data. Following Fan (1991a,b,c), we consider two categories of errors. An ordinary smooth error of order $\beta$ is such that
$$\lim_{t \to +\infty} t^{\beta} \phi_U(t) = c \quad \text{and} \quad \lim_{t \to +\infty} t^{\beta+1} \phi_U'(t) = -c\beta \qquad (7)$$
for some constants $c > 0$ and $\beta > 1$. A supersmooth error of order $\beta > 0$ is such that
$$d_0 |t|^{\beta_0} \exp(-|t|^{\beta}/\gamma) \le |\phi_U(t)| \le d_1 |t|^{\beta_1} \exp(-|t|^{\beta}/\gamma) \quad \text{as } |t| \to \infty, \qquad (8)$$
with $d_0$, $d_1$, $\gamma$, $\beta_0$, and $\beta_1$ some positive constants. For example, Laplace errors, Gamma errors, and their convolutions are ordinary smooth, whereas Cauchy errors, Gaussian errors, and their convolutions are supersmooth. Depending on the type of the measurement error, we need different conditions on $K$ and $U$ to establish the asymptotic behavior of the variance of the estimator. These conditions mostly concern the kernel function, which we can choose; they are fairly standard in deconvolution problems and are easy to satisfy. For example, see Fan (1991a,b,c) and Fan and Masry (1992). We assume:
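To make (7) concrete: a Laplace error with variance $\sigma^2$ has characteristic function $\phi_U(t) = (1 + \sigma^2 t^2/2)^{-1}$, which satisfies (7) with $\beta = 2$ and $c = 2/\sigma^2$. A quick numerical check (illustrative only; the function names are ours):

```python
def laplace_cf(t, sigma2):
    """Characteristic function of a Laplace error with variance sigma2."""
    return 1.0 / (1.0 + sigma2 * t ** 2 / 2.0)

def laplace_cf_deriv(t, sigma2):
    """First derivative of the Laplace characteristic function."""
    return -sigma2 * t / (1.0 + sigma2 * t ** 2 / 2.0) ** 2

# Ordinary smooth of order beta = 2 with c = 2/sigma2:
# t^beta * phi_U(t) -> c  and  t^(beta+1) * phi_U'(t) -> -c*beta  as t -> infinity.
sigma2 = 0.5
beta, c = 2, 2.0 / sigma2
```

Evaluating both limits at a large $t$ confirms the two conditions in (7) simultaneously, which is what distinguishes ordinary smooth errors from the exponentially decaying (supersmooth) case in (8).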

Condition O (ordinary smooth case): $\|\phi_U'\|_\infty < \infty$; for $k = 0, \ldots, 2p + 1$, $\|\phi_K^{(k)}\|_\infty < \infty$ and $\int \big[|t|^{\beta} + |t|^{\beta-1}\big]\, |\phi_K^{(k)}(t)|\, dt < \infty$; and, for $0 \le k, k' \le 2p$, $\int |t|^{2\beta}\, |\phi_K^{(k)}(t)| \cdot |\phi_K^{(k')}(t)|\, dt < \infty$.

Condition S (supersmooth case): $\phi_K$ is supported on $[-1, 1]$ and, for $k = 0, \ldots, 2p$, $\|\phi_K^{(k)}\|_\infty < \infty$.

In the sequel we let $\widehat{S} = \widehat{S}_n$, $\mu_j = \int u^j K(u)\, du$, $S = (\mu_{k+\ell})_{0 \le k, \ell \le p}$, $\widetilde{S} = (\mu_{k+\ell+1})_{0 \le k, \ell \le p}$, $\mu = (\mu_{p+1}, \ldots, \mu_{2p+1})^{\top}$, $\widetilde{\mu} = (\mu_{p+2}, \ldots, \mu_{2p+2})^{\top}$, and, for any square integrable function $g$, we define $R(g) = \int g^2$. Finally, we let $S^{i,j}$ denote the $(i + 1, j + 1)$th element of the matrix $S^{-1} f_X^{-1}(x)$ and, for $c$ as in (7),
$$S^{*} = \left( (-1)^{k'} i^{k'-k} \frac{1}{2\pi c^2} \int |t|^{2\beta}\, \phi_K^{(k)}(t)\, \phi_K^{(k')}(t)\, dt \right)_{0 \le k, k' \le p}.$$
Note that this matrix is always real because its $(k, k')$th element is zero when $k + k'$ is odd, and $i^{k'-k} = (-1)^{(k'-k)/2}$ otherwise.

3.2 Asymptotic Results

Asymptotic properties of the estimator depend on the type of error that contaminates the data. The following theorem establishes asymptotic normality in the ordinary smooth error case.

Theorem 1: Assume (7). Under Conditions A, B, and O, if $nh^{2\beta+2\nu+1} \to \infty$ and $nh^{2\beta+4} \to \infty$, we have
$$\frac{\widehat{m}^{(\nu)}(x) - m^{(\nu)}(x) - \mathrm{Bias}\{\widehat{m}^{(\nu)}(x)\}}{\sqrt{\mathrm{var}\{\widehat{m}^{(\nu)}(x)\}}} \xrightarrow{\;L\;} N(0, 1),$$
where
$$\mathrm{var}\{\widehat{m}^{(\nu)}(x)\} = \frac{e_{\nu+1}^{\top} S^{-1} S^{*} S^{-1} e_{\nu+1}\, (\nu!)^2\, \{(\tau^2 f_X) * f_U\}(x)}{f_X^2(x)\, n h^{2\beta+2\nu+1}} + o\!\left(\frac{1}{n h^{2\beta+2\nu+1}}\right),$$
and

(1) if $p - \nu$ is odd,
$$\mathrm{Bias}\{\widehat{m}^{(\nu)}(x)\} = e_{\nu+1}^{\top} S^{-1} \mu\, \frac{\nu!}{(p+1)!}\, m^{(p+1)}(x)\, h^{p+1-\nu} + o(h^{p+1-\nu});$$

(2) if $p - \nu$ is even,
$$\mathrm{Bias}\{\widehat{m}^{(\nu)}(x)\} = e_{\nu+1}^{\top} S^{-1} \widetilde{\mu}\, \frac{\nu!}{(p+2)!} \left[ m^{(p+2)}(x) + (p+2)\, m^{(p+1)}(x)\, \frac{f_X'(x)}{f_X(x)} \right] h^{p+2-\nu} - e_{\nu+1}^{\top} S^{-1} \widetilde{S} S^{-1} \mu\, \frac{\nu!}{(p+1)!}\, m^{(p+1)}(x)\, \frac{f_X'(x)}{f_X(x)}\, h^{p+2-\nu} + o(h^{p+2-\nu}).$$

From this theorem we see that, as usual in nonparametric kernel deconvolution estimators, the bias of our estimator is exactly the same as the bias of the local polynomial estimator in the error-free case, and the errors-in-variables only affect the variance of the estimator. Compare the previous bias formulas with Theorem 3.1 of Fan and Gijbels (1996), for example. In particular, our estimator has the design-adaptive property discussed in Section 3.2.4 of that book.

The optimal bandwidth is found by the usual trade-off between the squared bias and the variance, which gives $h \sim n^{-1/(2\beta+2p+3)}$ if $p - \nu$ is odd and $h \sim n^{-1/(2\beta+2p+5)}$ if $p - \nu$ is even. The resulting convergence rates of the estimator are, respectively, $n^{-(p+1-\nu)/(2\beta+2p+3)}$ if $p - \nu$ is odd and $n^{-(p+2-\nu)/(2\beta+2p+5)}$ if $p - \nu$ is even. For $p = \nu = 0$, our estimator of $m$ is exactly the estimator of Fan and Truong (1993) and has the same rate, that is, $n^{-2/(2\beta+5)}$ (remember that we only assume that $f_X$ is twice differentiable). For $p = 1$ and $\nu = 0$, our estimator is different from that of Fan and Truong (1993), but it converges at the same rate. For $p > 1$ and $\nu = 0$, our estimator converges at faster rates.
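The rate exponents just derived are easy to tabulate. The following helper (ours, for illustration) returns the optimal-bandwidth exponent and the convergence-rate exponent in the ordinary smooth case:

```python
def rates(p, nu, beta):
    """Exponents for ordinary smooth errors of order beta:
    the optimal bandwidth is h ~ n^{-a_h}, and the estimator of m^(nu)
    then converges at rate n^{-a_m}."""
    if (p - nu) % 2 == 1:                      # p - nu odd
        a_h = 1.0 / (2 * beta + 2 * p + 3)
        a_m = (p + 1 - nu) * a_h
    else:                                      # p - nu even
        a_h = 1.0 / (2 * beta + 2 * p + 5)
        a_m = (p + 2 - nu) * a_h
    return a_h, a_m
```

For Laplace errors ($\beta = 2$), `rates(0, 0, 2.0)` and `rates(1, 0, 2.0)` share the rate exponent $2/9$, while `rates(2, 0, 2.0)` gives the faster $4/13$, matching the discussion above.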

Remark 1: Which $p$ should one use? This problem is essentially the same as in the error-free case (see Fan and Gijbels 1996, Sec. 3.3). In particular, although in theory using higher values of $p$ reduces the asymptotic bias of the estimator without increasing the order of its variance, the theoretical improvement for $p - \nu > 1$ is not generally noticeable in finite samples. In particular, the constant term of the dominating part of the variance can increase rapidly with $p$. However, the improvement from $p - \nu = 0$ to $p - \nu = 1$ can be quite significant, especially in cases where $f_X$ or $m$ are discontinuous at the boundary of their domain, in which case the bias for $p - \nu = 1$ is of smaller order than the bias for $p - \nu = 0$. In other cases, the biases of the estimators of orders $p$ and $p + 1$, where $p - \nu$ is even, are of the same order. See Sections 3.3 and 5.

Remark 2: Higher order deconvolution kernel estimators. As usual, it is possible to reduce the order of the bias by using higher order kernels, that is, kernels that have all their moments up to some order $k$ vanishing, and imposing the existence of higher derivatives of $f_X$ and $m$, as was done in Fan and Truong (1993). However, such kernels are not very popular, because it is well known that, in practice, they increase the variability of estimators and can make them quite unattractive (e.g., see Marron and Wand 1992). Similarly, one can also use the infinite order sinc kernel, which has the appealing theoretical property that it adapts automatically to the smoothness of the curves (see Diggle and Hall 1993; Comte and Taupin 2007). However, the trick of the sinc kernel does not apply to the boundary setting. In addition, this kernel can only be used when $p = \nu = 0$, because $\phi_K^{(\ell)} \equiv 0$ for all $\ell > 0$; and it can sometimes work poorly in practice, especially in cases where $f_X$ or $m$ have boundary points (see Section 5 for illustration).

The next theorem establishes asymptotic normality in the supersmooth error case. In this case, the variance term is quite complicated and asymptotic normality can only be established under a technical condition, which generalizes Condition 3.1 (and Lemma 3.2) of Fan and Masry (1992). This condition is nothing but a refined version of the Lyapounov condition, and it essentially says that the bandwidth cannot converge to zero too fast. It should be possible to derive more specific lower bounds on the bandwidth, but this would require considerable technical detail and therefore will be omitted here. In the next theorem we first give an expression for the asymptotic bias and variance of $\widehat{m}^{(\nu)}(x)$, defined as, respectively, the expectation and the variance of $h^{-\nu} \nu!\, Z_n$, the asymptotically dominating part of the estimator, where $Z_n$ is defined in (A.3). Then, under the additional assumption, we derive asymptotic normality.

Theorem 2: Assume (8). Under Conditions A, B, and S, if $h = d(2/\gamma)^{1/\beta} (\ln n)^{-1/\beta}$ with $d > 1$, then

(1) $\mathrm{Bias}\{\widehat{m}^{(\nu)}(x)\}$ is as in Theorem 1 and $\mathrm{var}\{\widehat{m}^{(\nu)}(x)\} = o\big(\mathrm{Bias}^2\{\widehat{m}^{(\nu)}(x)\}\big)$;

(2) if, in addition, for $U_{n,1}$ defined in the Appendix at Equation (A.5), there exists $r > 0$ such that, for $b_n = h^{\beta/(2r+10)}$,
$$E[U_{n,1}^2] \ge C_1^2\, f_X^{-2}(x)\, h^{2\beta_0 1\{\beta_0 < 1/2\} - 1} \exp\{(2 - 4 b_n)\gamma^{-1}/h^{\beta}\},$$
with $0 < C_1 < \infty$ independent of $n$, then we also have
$$\frac{\widehat{m}^{(\nu)}(x) - m^{(\nu)}(x) - \mathrm{Bias}\{\widehat{m}^{(\nu)}(x)\}}{\sqrt{\mathrm{var}\{\widehat{m}^{(\nu)}(x)\}}} \xrightarrow{\;L\;} N(0, 1).$$

When $h = d(2/\gamma)^{1/\beta}(\ln n)^{-1/\beta}$ with $d > 1$, as in the theorem, it is not hard to see that, as usual in the supersmooth error case, the variance is negligible compared with the squared bias, and the estimator converges at the logarithmic rate $\{\log(n)\}^{-(p+1-\nu)/\beta}$ if $p - \nu$ is odd, and $\{\log(n)\}^{-(p+2-\nu)/\beta}$ if $p - \nu$ is even. Again, for $p = \nu = 0$, our estimator is equal to the estimator of Fan and Truong (1993) and thus has the same rate.

3.3 Behavior Near the Boundary

Because the bias of our estimator is the same as in the error-free case, it suffers from the same boundary effects when the design density $f_X$ is compactly supported. Without loss of generality, suppose that $f_X$ is supported on $[0, 1]$ and, for any integer $k \ge 0$ and any function $g$ defined on $[0, 1]$ that is $k$ times differentiable on $]0, 1[$, let
$$\widetilde{g}^{(k)}(x) = g^{(k)}(0+) \cdot 1\{x = 0\} + g^{(k)}(1-) \cdot 1\{x = 1\} + g^{(k)}(x) \cdot 1\{0 < x < 1\}.$$
We derive asymptotic normality of the estimator under the following conditions, which are the same as those usually imposed in the error-free case:


Condition C:

(C1)–(C2) Same as (B1)–(B2);
(C3) $f_X(x) > 0$ for $x \in\, ]0, 1[$, and $f_X$ is twice differentiable such that $\| f_X^{(j)} \|_\infty < \infty$ for $j = 0, 1, 2$;
(C4) $m$ is $p + 3$ times differentiable on $]0, 1[$, $\tau^2$ is bounded on $[0, 1]$ and continuous on $]0, 1[$, $\| m^{(j)} \|_\infty < \infty$ on $[0, 1]$ for $j = 0, \ldots, p + 3$, and there exists $\eta > 0$ such that $E\{|Y_i - m(x)|^{2+\eta} \mid X = u\}$ is bounded for all $u \in [0, 1]$.

We also define $\mu_k(x) = \int_{-x/h}^{(1-x)/h} u^k K(u)\, du$, $S_B(x) = \{\mu_{k+k'}(x)\}_{0 \le k, k' \le p}$, $\mu(x) = \{\mu_{p+1}(x), \ldots, \mu_{2p+1}(x)\}^{\top}$, and
$$S_B^{*}(x) = \left\{ \iint \widetilde{\tau}^2(x - u + hz)\, \widetilde{f}_X(x - u + hz)\, K_{U,k}(z)\, K_{U,k'}(z)\, f_U(u)\, du\, dz \right\}_{0 \le k, k' \le p}.$$

For brevity, we only show asymptotic normality in the ordinary smooth error case. Our results can be extended to the supersmooth error case: all our calculations for the bias are valid for supersmooth errors, and the only difference is the variance, which is negligible in that case.

The proof of the next theorem is similar to the proof of Theorem 1 and hence is omitted. It can be obtained from the sequence of Lemmas B10 to B13 of Delaigle, Fan, and Carroll (2008). As for Theorem 2, a technical condition, which is nothing but a refined version of the Lyapounov condition, is required to deal with the variance of the estimator.

Theorem 3: Assume (7). Under Conditions A, C, and O, if $nh^{2\beta+2\nu+1} \to \infty$ and $nh^{2\beta+4} \to \infty$ as $n \to \infty$, and $e_{\nu+1}^{\top} S_B^{-1}(x) S_B^{*}(x) S_B^{-1}(x) e_{\nu+1} \ge C_1 h^{-2\beta}$ for some finite constant $C_1 > 0$, we have
$$\frac{\widehat{m}^{(\nu)}(x) - m^{(\nu)}(x) - \mathrm{Bias}\{\widehat{m}^{(\nu)}(x)\}}{\sqrt{\mathrm{var}\{\widehat{m}^{(\nu)}(x)\}}} \xrightarrow{\;L\;} N(0, 1),$$
where
$$\mathrm{Bias}\{\widehat{m}^{(\nu)}(x)\} = e_{\nu+1}^{\top} S_B^{-1}(x) \mu(x)\, \frac{\nu!}{(p+1)!}\, \widetilde{m}^{(p+1)}(x)\, h^{p+1-\nu} + o(h^{p+1-\nu})$$
and
$$\mathrm{var}\{\widehat{m}^{(\nu)}(x)\} = \frac{e_{\nu+1}^{\top} S_B^{-1}(x) S_B^{*}(x) S_B^{-1}(x) e_{\nu+1}\, (\nu!)^2}{\widetilde{f}_X^2(x)\, n h^{2\nu+1}} + o\!\left(\frac{1}{n h^{2\beta+2\nu+1}}\right).$$

As before, the bias is the same as in the error-free case, and thus all well-known results of the boundary problem extend to our context. In particular, the bias of the estimator for $p - \nu$ even is of order $h^{p+1-\nu}$, instead of $h^{p+2-\nu}$ in the case without a boundary, whereas the bias of the estimator for $p - \nu$ odd remains of order $h^{p+1-\nu}$, as in the no-boundary case. In particular, the bias of the estimator for $p - \nu = 0$ is of order $h$, whereas it is of order $h^2$ when $p - \nu = 1$. For this reason, local polynomial estimators with $p - \nu$ odd, and in particular with $p - \nu = 1$, are often considered to be more natural.

Note that because in deconvolution problems kernels are usually supported on the whole real line (see Delaigle and Hall 2006), the presence of the boundary can affect every point of the type $x = ch$ or $x = 1 - ch$, with $c$ a finite constant satisfying $0 \le c \le 1/h$. For $x = ch$, it can be shown that
$$\mathrm{Bias}\{\widehat{m}^{(\nu)}(x)\} = e_{\nu+1}^{\top} S_B^{-1}(x) \mu(x)\, \frac{\nu!}{(p+1)!}\, m^{(p+1)}(0+)\, h^{p+1-\nu} + o(h^{p+1-\nu}),$$
whereas if $x = 1 - ch$, we have
$$\mathrm{Bias}\{\widehat{m}^{(\nu)}(x)\} = e_{\nu+1}^{\top} S_B^{-1}(x) \mu(x)\, \frac{\nu!}{(p+1)!}\, m^{(p+1)}(1-)\, h^{p+1-\nu} + o(h^{p+1-\nu}).$$

4. GENERALIZATIONS

In this section, we show how our methodology can be extended to provide estimators in two important cases: (1) when the measurement error distribution is unknown; and (2) when the measurement errors are heteroscedastic. In the interest of space we focus on methodology and do not give detailed asymptotic theory.

4.1 Unknown Measurement Error Distribution

In empirical applications, it can be unrealistic to assume that the error density is known. However, it is only possible to construct a consistent estimator of $m$ if we are able to consistently estimate the error density itself. Several approaches for estimating this density $f_U$ have been considered in the nonparametric literature. Diggle and Hall (1993) and Neumann (1997) assumed that a sample of observations from the error density is available and estimated $f_U$ nonparametrically from those data. A second approach, applicable when the contaminated observations are replicated, consists in estimating $f_U$ from the replicates. Finally, in some cases, if we have a parametric model for the error density $f_U$ and additional constraints on the density $f_X$, it is possible to estimate an unknown parameter of $f_U$ without any additional observation (see Butucea and Matias 2005; Meister 2006).

We give details for the replicated data approach, which is by far the most commonly used. In the simplest version of this model, the observations are a sample of iid data $(W_{j1}, W_{j2}, Y_j)$, $j = 1, \ldots, n$, generated by the model
\[
Y_j = m(X_j) + \eta_j, \quad W_{jk} = X_j + U_{jk}, \ k = 1, 2, \quad E(\eta_j \mid X_j) = 0,
\quad\text{with } X_j \sim f_X \text{ and } U_{jk} \sim f_U, \tag{9}
\]
where the $U_{jk}$'s are independent, and independent of the $(X_j, Y_j, \eta_j)$'s.

In the measurement error literature, it is often assumed that the error density $f_U(\cdot\,;\theta)$ is known up to a parameter $\theta$, which has to be estimated from the data. For example, if $\theta = \mathrm{var}(U)$, the unknown variance of $U$, a $\sqrt{n}$-consistent estimator is given by
\[
\hat\theta = \{2(n-1)\}^{-1}\sum_{j=1}^{n}\big(W_{j1} - W_{j2} - \bar W_1 + \bar W_2\big)^{2}, \tag{10}
\]
where $\bar W_k = n^{-1}\sum_{j=1}^{n} W_{jk}$ (see, for example, Carroll et al. 2006, equation 4.3). Taking $\varphi_U(\cdot\,;\hat\theta)$ to be the characteristic function corresponding to $f_U(\cdot\,;\hat\theta)$, we can extend our estimator of $m^{(\nu)}$ to the unknown error case by replacing $\varphi_U$ by $\varphi_U(\cdot\,;\hat\theta)$ everywhere.

352 Journal of the American Statistical Association, March 2009

In the case where no parametric model for $f_U$ is available, some authors suggest using a nonparametric estimator of $f_U$; for general settings, see Li and Vuong (1998), Schennach (2004a, b), and Hu and Schennach (2008). In the common case where the error density $f_U$ is symmetric, Delaigle et al. (2008) proposed to estimate $\varphi_U(t)$ by
\[
\hat\varphi_U(t) = \bigg| n^{-1}\sum_{j=1}^{n}\cos\{t(W_{j1} - W_{j2})\}\bigg|^{1/2}.
\]
Following the approach they used for the case $p = \nu = 0$, we can extend our estimator of $m^{(\nu)}$ to the unknown error case by replacing $\varphi_U$ by $\hat\varphi_U$ everywhere, adding a small positive number to $\hat\varphi_U$ when it gets too small. Detailed convergence rates of this approach have been studied by Delaigle et al. (2008) in the local constant case ($p = 0$), where they showed that the convergence rate of this version of the estimator is the same as that of the estimator with known $\varphi_U$, as long as $f_X$ is sufficiently smooth relative to $f_U$. Their conclusion can be extended to our setting.
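A direct implementation of this characteristic-function estimator might look as follows (the function name and the ridge floor, standing in for the "small positive number" above, are ours):

```python
import numpy as np

def phi_u_hat(t, w1, w2, ridge=1e-3):
    """Estimate the characteristic function of a symmetric error density
    from replicates: |n^{-1} sum_j cos{t(W_j1 - W_j2)}|^{1/2}, floored at
    `ridge` so that it can safely appear in a denominator."""
    t = np.atleast_1d(np.asarray(t, float))
    d = np.asarray(w1, float) - np.asarray(w2, float)   # distributed as U - U'
    est = np.sqrt(np.abs(np.mean(np.cos(np.outer(t, d)), axis=1)))
    return np.maximum(est, ridge)

# Check against the true Laplace(0, b) characteristic fn, 1/(1 + b^2 t^2):
# E cos{t(U - U')} = |phi_U(t)|^2, so the square root recovers phi_U.
rng = np.random.default_rng(1)
n, b = 50_000, 0.5
x = rng.normal(size=n)
w1 = x + rng.laplace(scale=b, size=n)
w2 = x + rng.laplace(scale=b, size=n)
t = np.array([0.0, 0.5, 1.0, 2.0])
true_phi = 1.0 / (1.0 + b**2 * t**2)
err = np.max(np.abs(phi_u_hat(t, w1, w2) - true_phi))
```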

4.2 Heteroscedastic Measurement Errors

Our local polynomial methodology can be generalized to the more complicated setting where the errors $U_j$ are not identically distributed. In practice, this could happen when observations have been obtained under different conditions, for example if they were collected from different laboratories. Recent references on this problem include Delaigle and Meister (2007, 2008) and Staudenmayer, Ruppert, and Buonaccorsi (2008). In this context, the observations are a sample $(W_1, Y_1), \ldots, (W_n, Y_n)$ of iid observations coming from the model
\[
Y_j = m(X_j) + \eta_j, \quad W_j = X_j + U_j, \quad E(\eta_j \mid X_j) = 0,
\quad\text{with } X_j \sim f_X \text{ and } U_j \sim f_{U_j}, \tag{11}
\]

where the Uj’s are independent of the (Xj, Yj)’s. Our estimatorcannot be applied directly to such data because there is nocommon error density fU, and therefore KU,k is not defined.Rather, we need to construct appropriate individual functionsKUj;k and then replace bSn;kðxÞ and bTn;kðxÞ in the definition of theestimator, by

bS Hn;kðxÞ ¼ n�1

Xn

j¼1

KUj;k;hðWj � xÞ and

bTHn;kðxÞ ¼ n�1

Xn

j¼1

YjKUj;k;hðWj � xÞ; ð12Þ
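Once the individual kernels are available, the sums in (12) are straightforward to form. The sketch below (our own function names) assumes the usual scaling $K_{U_j,k,h}(v) = h^{-1}K_{U_j,k}(v/h)$ and takes the per-observation kernels as callables; the demonstration uses the error-free reduction $K_{U,k}(u) = u^k K(u)$, which follows from the inversion formula in the Appendix when $\varphi_U \equiv 1$, in which case the local linear system built from $\hat S^H$ and $\hat T^H$ reproduces an exactly linear regression function.

```python
import numpy as np

def S_T_hetero(x, w, y, h, kernels, p=1):
    """Building blocks (12) for the heteroscedastic estimator:
    S^H_{n,k}(x) = n^{-1} sum_j K_{U_j,k,h}(W_j - x), k = 0..2p,
    T^H_{n,k}(x) = n^{-1} sum_j Y_j K_{U_j,k,h}(W_j - x), k = 0..p,
    where kernels[j](u, k) evaluates K_{U_j,k}(u) and we assume the
    scaling K_{U_j,k,h}(v) = K_{U_j,k}(v / h) / h."""
    w, y = np.asarray(w, float), np.asarray(y, float)
    n = w.size
    S = np.zeros(2 * p + 1)
    T = np.zeros(p + 1)
    for k in range(2 * p + 1):
        vals = np.array([kernels[j]((w[j] - x) / h, k) for j in range(n)]) / h
        S[k] = vals.mean()
        if k <= p:
            T[k] = np.mean(y * vals)
    return S, T

# Error-free special case: K_{U,k}(u) = u^k K(u) with Gaussian K; a local
# linear fit (p = 1) then recovers an exactly linear m without bias.
gauss = lambda u, k: u**k * np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
w = np.linspace(-1, 1, 21)
y = 2 + 3 * w                                   # exactly linear, no noise
x0 = 0.3
S, T = S_T_hetero(x0, w, y, h=0.5, kernels=[gauss] * 21, p=1)
beta = np.linalg.solve(np.array([[S[0], S[1]], [S[1], S[2]]]), T)
m_hat = beta[0]                                  # estimate of m(x0)
```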

where we use the superscript $H$ to indicate that we are treating the heteroscedastic case. As before, we require that $E\{\hat S^{H}_{n,k}(x) \mid X_1, \ldots, X_n\} = S_{n,k}(x)$ and $E\{\hat T^{H}_{n,k}(x) \mid X_1, \ldots, X_n\} = T_{n,k}(x)$. A straightforward solution would be to define $K_{U_j,k}$ by
\[
K_{U_j,k}(x) = i^{-k}\,\frac{1}{2\pi}\int e^{-itx}\,\varphi_K^{(k)}(t)\big/\varphi_{U_j}(-t/h)\, dt,
\]
where $\varphi_{U_j}$ is the characteristic function of the distribution of $U_j$. However, the theoretical properties of the corresponding estimator are generally not good, because the order of its variance is dictated by the least favorable error densities (see Delaigle and Meister 2007, 2008). This problem can be avoided by extending the approach of those authors to our context, taking
\[
K_{U_j,k}(x) = i^{-k}\,\frac{1}{2\pi}\int e^{-itx}\,\frac{\varphi_{U_j}(-t/h)\,\varphi_K^{(k)}(t)}{n^{-1}\sum_{l=1}^{n} |\varphi_{U_l}(t/h)|^{2}}\, dt.
\]

Alternatively, in the case where the error densities are unknown but replicates are available as at (9), we can use instead
\[
\hat S^{H}_{n,k}(x) = n^{-1}\sum_{j=1}^{n} \tilde K_{U_j,k,h}(\bar W_j - x) \quad\text{and}\quad
\hat T^{H}_{n,k}(x) = n^{-1}\sum_{j=1}^{n} Y_j\,\tilde K_{U_j,k,h}(\bar W_j - x), \tag{13}
\]
where $\bar W_j = (W_{j1} + W_{j2})/2$ and
\[
\tilde K_{U_j,k}(x) = i^{-k}\,\frac{1}{2\pi}\int e^{-itx}\,\frac{\varphi_K^{(k)}(t)}{n^{-1}\sum_{l=1}^{n} e^{it(W_{l1} - W_{l2})/(2h)}}\, dt,
\]
adding a small positive number to the denominator when it gets too small. This is a generalization of the estimator of Delaigle and Meister (2008).

5. FINITE SAMPLE PROPERTIES

5.1 Simulation Settings

Comparisons between kernel estimators and other methods have been carried out by many authors in various contexts, with or without measurement errors. One of the major advantages of kernel estimators is that they are simple and can easily be applied to problems such as heteroscedasticity (see Section 4.2), nonparametric variance or mode estimation, and detection of boundaries; see also the discussion in Delaigle and Hall (2008). As for any method, in some cases kernel methods outperform other methods, and in other cases they are outperformed by them. Our goal is not to rederive these well-known facts, but rather to illustrate the new results of our article.

We applied our technique for estimating $m$ and $m^{(1)}$ to several examples, chosen to include curves with several local extrema and/or an inflection point, as well as monotonic, convex, and/or unbounded functions. To summarize the work we are about to present, our simulations illustrate, in finite samples:

- the gain that can be obtained by using a local linear estimator (LLE) in the presence of boundaries, in comparison with a local constant estimator (LCE);
- properties of our estimator of $m^{(1)}$ for $p = 1$ (LPE1) and $p = 2$ (LPE2);
- properties of our estimator when the error variance is estimated from replicates;
- the robustness of our estimator against misspecification of the error density;
- the gain obtained by using our estimators compared with their naive versions (denoted, respectively, NLCE, NLLE, NLPE1, and NLPE2), which pretend there is no error in the data;
- the properties of the LCE using the sinc kernel (LCES) in the presence of boundary points.

Delaigle, Fan, and Carroll: Local Polynomial Estimators 353

We considered the following examples: (1) $m(x) = x^3\exp(x^4/1{,}000)\cos(x)$ and $\eta \sim N(0, 0.6^2)$; (2) $m(x) = 2x\exp(-10x^4/81)$, $\eta \sim N(0, 0.2^2)$; (3) $m(x) = x^3$, $\eta \sim N(0, 1.2^2)$; (4) $m(x) = x^4$, $\eta \sim N(0, 4^2)$. In cases (1) and (2) we took $X \sim 0.8X_1 + 0.2X_2$, where $X_1 \sim f_{X_1}(x) = 0.1875x^2\,1_{[-2,2]}(x)$ and $X_2 \sim U[-1, 1]$. In cases (3) and (4) we took $X \sim N(0, 1)$.
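For concreteness, here is one way to simulate case (2); the sampler and function names are ours. $X_1$ is drawn by inverting its CDF, $F(x) = 0.0625(x^3 + 8)$, and the Laplace error is scaled to a chosen noise-to-signal ratio.

```python
import numpy as np

def sample_case2(n, nsr=0.2, rng=None):
    """Draw (W_j, Y_j, X_j) for simulation case (2):
    X ~ 0.8 f_X1 + 0.2 f_X2 with f_X1(x) = 0.1875 x^2 on [-2, 2]
    and X2 ~ U[-1, 1]; Y = m(X) + eta with eta ~ N(0, 0.2^2);
    W = X + U with Laplace U of variance nsr * var(X)."""
    rng = np.random.default_rng() if rng is None else rng
    pick = rng.uniform(size=n) < 0.8
    x1 = np.cbrt(16.0 * rng.uniform(size=n) - 8.0)  # invert F(x) = 0.0625(x^3 + 8)
    x2 = rng.uniform(-1.0, 1.0, size=n)
    x = np.where(pick, x1, x2)
    m = lambda t: 2.0 * t * np.exp(-10.0 * t**4 / 81.0)
    y = m(x) + rng.normal(scale=0.2, size=n)
    var_u = nsr * np.var(x)
    u = rng.laplace(scale=np.sqrt(var_u / 2.0), size=n)
    return x + u, y, x

w, y, x = sample_case2(200_000, rng=np.random.default_rng(3))
```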

In each case considered, we generated 500 samples of various sizes from the distribution of $(W, Y)$, where $W = X + U$ with $U$ Laplace or normal with zero mean, for several values of the noise-to-signal ratio $\mathrm{var}(U)/\mathrm{var}(X)$. Except where otherwise stated, we used the kernel whose Fourier transform is given by
\[
\varphi_K(t) = (1 - t^2)^8\, 1_{[-1,1]}(t). \tag{14}
\]
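For Laplace measurement error, the deconvolution kernel built from this $\varphi_K$ can be evaluated by direct numerical Fourier inversion; the sketch below (quadrature and function names are ours) uses the $k = 0$ case of the inversion formula given in the Appendix.

```python
import numpy as np

# Numerical Fourier inversion of the k = 0 deconvolution kernel,
#   K_U(u) = (1/2pi) * integral of e^{-itu} phi_K(t) / phi_U(-t/h) dt,
# for phi_K(t) = (1 - t^2)^8 on [-1, 1] and Laplace(0, b) error, whose
# characteristic function is phi_U(t) = 1 / (1 + b^2 t^2).  Both functions
# are real and even, so the integral reduces to a cosine transform.

def trap(y, x):
    # simple trapezoid rule, to stay independent of the NumPy version
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def deconv_kernel(u, h, b, m=2001):
    t = np.linspace(-1.0, 1.0, m)
    integrand = (1.0 - t**2) ** 8 * (1.0 + (b * t / h) ** 2)  # phi_K / phi_U(t/h)
    u = np.atleast_1d(np.asarray(u, float))
    return np.array([trap(np.cos(t * ui) * integrand, t) for ui in u]) / (2 * np.pi)

# Since phi_K(0) = phi_U(0) = 1, K_U must integrate to 1 over the real line.
u_grid = np.linspace(-40.0, 40.0, 4001)
total = trap(deconv_kernel(u_grid, h=0.3, b=0.2), u_grid)
```

The compact support and smoothness of $\varphi_K$ make the kernel decay quickly, which is why truncating the $u$-range is harmless here.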

To illustrate the potential gain of using local polynomial estimators without confounding the effect of an estimator with that of the smoothing-parameter selection, we used, for each method, the theoretically optimal value of $h$; that is, for each sample, we selected the value of $h$ minimizing the integrated squared error $\mathrm{ISE} = \int \{m^{(\nu)}(x) - \hat m^{(\nu)}(x)\}^{2}\, dx$, where $\hat m^{(\nu)}$ is the estimator considered.

In the case where $\nu = 0$, a data-driven bandwidth procedure has been developed by Delaigle and Hall (2008). For example, for the LLE of $m$, the quantities $N_W$ and $D_W$ in Section 3.1 of that article are equal to $N_W = \hat T_{n,0}\hat S_{n,2} - \hat T_{n,1}\hat S_{n,1}$ and $D_W = \hat S_{n,0}\hat S_{n,2} - \hat S_{n,1}^{2}$, respectively (see also Figure 2 in that article). Note that the fully automatic procedure of Delaigle and Hall (2008) also includes the possibility of using a ridge parameter in cases where the denominator $D_W(x)$ gets too small. It would be possible to extend their method to cases where $\nu > 0$ by combining their SIMulation EXtrapolation (SIMEX) idea with the data-driven bandwidths used in the error-free case, in much the same way as they combined their SIMEX idea with cross-validation for the case $\nu = 0$. Although not straightforward, this is an interesting topic for future research.
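The oracle bandwidth choice described above amounts to a grid search over $h$ for each sample. A minimal sketch, with an ordinary error-free Nadaraya-Watson fit (our own toy example) standing in for whichever estimator is being tuned:

```python
import numpy as np

def trap(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def oracle_bandwidth(h_grid, fit, m_true, x_grid):
    """For one sample, return the h on h_grid minimising the integrated
    squared error ISE(h) = integral of {m(x) - m_hat_h(x)}^2 dx (trapezoid
    rule on x_grid).  `fit(h)` must return the fitted curve on x_grid."""
    target = m_true(x_grid)
    ises = np.array([trap((fit(h) - target) ** 2, x_grid) for h in h_grid])
    return h_grid[int(np.argmin(ises))], ises

# Toy illustration: error-free Nadaraya-Watson fit with a Gaussian kernel.
rng = np.random.default_rng(2)
xs = rng.uniform(-1, 1, 200)
ys = np.sin(3 * xs) + rng.normal(scale=0.2, size=200)
x_grid = np.linspace(-1, 1, 101)

def nw_fit(h):
    wgt = np.exp(-0.5 * ((x_grid[:, None] - xs[None, :]) / h) ** 2)
    return (wgt @ ys) / wgt.sum(axis=1)

h_grid = np.linspace(0.05, 0.5, 10)
h_star, ises = oracle_bandwidth(h_grid, nw_fit, lambda x: np.sin(3 * x), x_grid)
```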

5.2 Simulation Results

In the following figures, we show boxplots of the 500 calculated ISEs corresponding to the 500 generated samples. We also show graphs with the target curve (solid line) and three estimated curves ($q_1$, $q_2$, and $q_3$) corresponding, respectively, to the first, second, and third quartiles of these 500 calculated ISEs for a given method.

Figure 1 shows the quantile estimated curves of $m$ at (2) and boxplots of the ISEs, for samples of size $n = 500$, when $U$ is Laplace and $\mathrm{var}(U) = 0.2\,\mathrm{var}(X)$. As expected from the theory, the LCE is more biased than the LLE near the boundary. Similarly, the LCES is more biased and more variable than the LLE. As usual in measurement error problems, the naive estimators that ignore the error are oversmoothed, especially near the modes and the boundary. The boxplots show that the LLE tends to work better than the LCE, but also tends to be more variable. Except in a few cases, both outperform the LCES and the naive estimators.

Figure 1 also shows boxplots of ISEs for curve (1) in the case where $U$ is normal, $\mathrm{var}(U) = 0.2\,\mathrm{var}(X)$, and $n = 250$. Here we pretended $\mathrm{var}(U)$ was unknown, but generated replicated data as at (9) and estimated $\mathrm{var}(U)$ via (10). Because the error variance of the averaged data $\bar W_i = (W_{i1} + W_{i2})/2$ is half the original one, we applied each estimator to these averaged data, either assuming $U$ was normal or wrongly assuming it was Laplace, with the unknown variance estimated. We found that the estimator was quite robust against error misspecification, as already noted by Delaigle (2008) in closely connected deconvolution problems. As there, assuming a Laplace distribution often worked reasonably well (see also Meister 2004). Except in a few cases, the LLE worked better than the LCE, and both outperformed the LCES and the NLLE (which itself outperformed the NLCE, not shown here).

Figure 1. Estimated curves for case (2) when $U$ is Laplace, $\mathrm{var}(U) = 0.2\,\mathrm{var}(X)$, and $n = 500$. Estimators: local linear estimator (LLE, top left), local constant estimator (LCE, top center), the naive local linear estimator that ignores measurement error (NLLE, top right), the local constant estimator using the sinc kernel (LCES, bottom left), and boxplots of the ISEs (bottom center). Bottom right: boxplots of the ISEs for case (1) when $U$ is normal, $\mathrm{var}(U) = 0.2\,\mathrm{var}(X)$, and $n = 250$. Data are averaged replicates and $\mathrm{var}(U)$ is estimated by (10).


In Figure 2 we show results for estimating the derivative of curve (4) in the case where $U$ is Laplace, $\mathrm{var}(U) = 0.4\,\mathrm{var}(X)$, and $n = 250$. We assumed the error variance was unknown, generated replicated data, and estimated $\mathrm{var}(U)$ by (10). We applied the LPE1 to both the averaged data and the original sample of non-averaged replicated data. For the averaged data, the error distribution is that of a Laplace convolved with itself, and we took either that distribution or wrongly assumed that the errors were Laplace. We compared our results with the naive NLPE1. Again, taking the error into account worked better than ignoring it, even when a wrong error distribution was used, and whether we used the original data or the averaged data.

Finally, Figure 3 concerns estimation of the derivative function $m^{(1)}$ in case (3) when $U$ is normal with $\mathrm{var}(U) = 0.4\,\mathrm{var}(X)$ and $n = 500$. We generated replicates and calculated the LPE1 and LPE2 estimators assuming normal errors or wrongly assuming Laplace errors, and we pretended the error variance was unknown and estimated it via (10). In this case as well, taking the measurement error into account gave better results than ignoring the errors, even when the wrong error distribution was assumed. The LPE2 worked better at the boundary than the LPE1, but in the interior it was the LPE1 that worked better.

6. CONCLUDING REMARKS

In the 20 years since the invention of the deconvoluting kernel density estimator, and the 15 years of its use for local constant, Nadaraya-Watson kernel regression, the discovery of a kernel regression estimator for a function and its derivatives that has the same bias properties as in the no-measurement-error case has remained an open problem. By working with estimating equations and using the Fourier domain, we have shown how to solve this problem. The resulting kernel estimators are readily computed and, with the right degrees of the local polynomials, have the design-adaptation properties that are so valued in the no-error case.

Figure 2. Estimates of the derivative function $m^{(1)}$ for curve (4) when $U$ is Laplace, $\mathrm{var}(U) = 0.4\,\mathrm{var}(X)$, and $n = 250$, using our local linear method (LPE1, left) and the naive local linear method that ignores measurement error (NLPE1, center), when data are averaged and $\mathrm{var}(U)$ is estimated by (10). Right: boxplots of ISEs, with $\mathrm{var}(U)$ estimated by (10). Data are averaged except for boxes 3 and 5. Box 2 wrongly assumes Laplace error.

Figure 3. Estimates of the derivative function $m^{(1)}$ for case (3) when $U$ is normal with $\mathrm{var}(U) = 0.4\,\mathrm{var}(X)$ and $n = 500$, using our local linear method (LPE1, top left) and local quadratic method (LPE2, top center) assuming that $U$ is normal, using the LPE1 (top right) and LPE2 (bottom left) wrongly assuming that the error distribution is Laplace, or using the NLPE2 (bottom center). Boxplots of the ISEs (bottom right). Data are averaged replicates and $\mathrm{var}(U)$ is estimated by (10).


APPENDIX: TECHNICAL ARGUMENTS

A.1 Derivation of the Estimator

We have that

\[
\varphi_{\{(X_j - x)^{k} K_h(X_j - x)\}}(t) = \int e^{itx}\,(X_j - x)^{k} K_h(X_j - x)\, dx
= h^{k} e^{itX_j}\int e^{-ithu}\, u^{k} K(u)\, du
= i^{-k} h^{k} e^{itX_j}\,\varphi_K^{(k)}(-ht),
\]
where we used the fact that $\varphi_K^{(k)}(t) = i^{k}\int e^{itu}\, u^{k} K(u)\, du$. Similarly, we find that
\[
\varphi_{\{(W_j - x)^{k} L_{k,h}(W_j - x)\}}(t) = i^{-k} h^{k} e^{itW_j}\,\varphi_{L_k}^{(k)}(-ht).
\]
Therefore, from (5) and using $E[e^{itW_j} \mid X_j] = e^{itX_j}\varphi_U(t)$, $L_k$ satisfies
\[
\varphi_{L_k}^{(k)}(-ht) = \varphi_K^{(k)}(-ht)\big/\varphi_U(t).
\]
From $\varphi_{L_k}^{(k)}(t) = i^{k}\int e^{itu}\, u^{k} L_k(u)\, du$ and the Fourier inversion theorem, we deduce that
\[
i^{k} u^{k} L_k(u) = \frac{1}{2\pi}\int e^{-itu}\,\varphi_{L_k}^{(k)}(t)\, dt
= \frac{1}{2\pi}\int e^{-itu}\,\varphi_K^{(k)}(t)\big/\varphi_U(-t/h)\, dt.
\]
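As a quick numerical sanity check of the identity $\varphi_K^{(k)}(t) = i^k\int e^{itu}u^k K(u)\,du$ used above (our own check, for the Gaussian kernel, where $\varphi_K(t) = e^{-t^2/2}$ and hence $\varphi_K''(t) = (t^2 - 1)e^{-t^2/2}$):

```python
import numpy as np

def trap(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Identity with k = 2: phi_K''(t) = i^2 * integral of e^{itu} u^2 K(u) du.
# For the Gaussian kernel the left side is (t^2 - 1) exp(-t^2/2); by the
# symmetry of K, the right side reduces to -integral of cos(tu) u^2 K(u) du.
u = np.linspace(-10.0, 10.0, 20001)
K = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
t = 0.7
lhs = (t**2 - 1.0) * np.exp(-t**2 / 2)
rhs = -trap(np.cos(t * u) * u**2 * K, u)
```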

A.2 Proofs of the Results of Section 3

We show only the main results and refer to the longer version of this article (Delaigle et al. 2008) for technical results that are straightforward extensions of results of Fan (1991a), Fan and Masry (1992), and Fan and Truong (1993). In what follows, we first give the proofs, referring to the detailed lemmas and propositions that follow them.

Before we provide detailed proofs of the theorems, note that because $\hat S_{\nu,k}$, $k = 0, \ldots, p$, represents the $(\nu+1)$th row of $\hat S^{-1}$, we have
\[
\sum_{k=0}^{p} \hat S_{\nu,k}\,\hat S_{n,k+j} = 0 \ \text{if } j \ne \nu, \quad\text{and } 1 \ \text{if } j = \nu.
\]
Consequently, it is not hard to show that
\[
h^{\nu}(\nu!)^{-1}\{\hat m^{(\nu)}(x) - m^{(\nu)}(x)\} = \sum_{k=0}^{p} \hat S_{\nu,k}(x)\,\hat T^{*}_{n,k}(x), \tag{A.1}
\]
where
\[
\hat T^{*}_{n,k}(x) = \hat T_{n,k}(x) - \sum_{j=0}^{p} \frac{h^{j}\, m^{(j)}(x)}{j!}\,\hat S_{n,k+j}(x). \tag{A.2}
\]

Proof of Theorem 1. We give the proof in the case where $p-\nu$ is odd. The case $p-\nu$ even is treated similarly by replacing everywhere $S_{\nu,k}(x)$ by $S_{\nu,k}(x) - h\breve S_{\nu,k}(x)$. From (A.1) and Lemma A.1, we have
\[
h^{\nu}(\nu!)^{-1}\{\hat m^{(\nu)}(x) - m^{(\nu)}(x)\} = Z_{\nu}(x) + O_P(h^{p+3}), \tag{A.3}
\]
where
\[
Z_{\nu}(x) = \sum_{k=0}^{p} S_{\nu,k}(x)\,\hat T^{*}_{n,k}(x). \tag{A.4}
\]
We deduce from Propositions A.1 and A.2 that the $O_P(h^{p+3})$ term in (A.3) is negligible. Hence $h^{\nu}(\nu!)^{-1}\{\hat m^{(\nu)}(x) - m^{(\nu)}(x)\}$ is dominated by $Z_{\nu}$, and to prove asymptotic normality of $\hat m^{(\nu)}(x)$ it suffices to show asymptotic normality of $Z_{\nu}$. To do this, write $Z_{\nu} = n^{-1}\sum_{i=1}^{n} U_{n,i}$, where
\[
U_{n,i} \equiv P_{n,i} + Q_{n,i}, \qquad
P_{n,i} = \sum_{k=0}^{p} S_{\nu,k}(x)\{Y_i - m(x)\}\, K_{U,k,h}(W_i - x),
\]
\[
Q_{n,i} = -\sum_{k=0}^{p}\sum_{j=1}^{p} S_{\nu,k}(x)\,\frac{h^{j}\, m^{(j)}(x)}{j!}\, K_{U,k+j,h}(W_i - x). \tag{A.5}
\]
As in Fan (1991a), to prove that
\[
\frac{\sum_{j=1}^{n} U_{n,j} - nEU_{n,j}}{\sqrt{n\,\mathrm{var}\,U_{n,j}}} \to_{L} N(0, 1), \tag{A.6}
\]
it suffices to show that, for some $\eta > 0$,
\[
\lim_{n\to\infty} \frac{E|U_{n,1}|^{2+\eta}}{n^{\eta/2}\,[EU_{n,1}^{2}]^{(2+\eta)/2}} = 0. \tag{A.7}
\]

Take $\eta$ as in (B4) and let $\kappa(u) = E\{|Y - m(x)|^{2+\eta} \mid X = u\}$. We have
\[
E|P_{n,1}|^{2+\eta} = \int\!\!\int \kappa(y)\,\bigg|\sum_{k=0}^{p} h^{-1} S_{\nu,k}(x)\, K_{U,k}\Big(\frac{x - u - y}{h}\Big)\bigg|^{2+\eta} f_X(y)\, f_U(u)\, du\, dy
\le C h^{-1-\eta} \max_{0\le k\le p} \|K_{U,k}\|_{\infty}^{\eta} \int |K_{U,k}(u)|^{2}\, du
\le C h^{-\beta(2+\eta)-1-\eta},
\]
from Lemma B.5 of Delaigle et al. (2008), where, here and below, $C$ denotes a generic positive and finite constant. Similarly, we have
\[
E|Q_{n,1}|^{2+\eta} = E\bigg|\sum_{k=0}^{p}\sum_{j=1}^{p} S_{\nu,k}(x)\,\frac{h^{j}\, m^{(j)}(x)}{j!}\, K_{U,k+j,h}(W_1 - x)\bigg|^{2+\eta} = O(h^{1-\beta(2+\eta)}),
\]
and thus $E|U_{n,1}|^{2+\eta} \le C h^{-\beta(2+\eta)-1-\eta}$.

For the denominator, it follows from Proposition A.2 and Lemma B.7 of Delaigle et al. (2008) that
\[
E[U_{n,1}^{2}] = h^{-2\beta-1} f_X^{-2}(x)\,(\tau^{2} f_X) * f_U(x)\, e_{\nu+1}^{\top} S^{-1} S^{*} S^{-1} e_{\nu+1} = C h^{-2\beta-1}\{1 + o(1)\}.
\]
We deduce that (A.7) holds, and the proof follows from the expressions of $E(U_{n,j})$ and $\mathrm{var}(U_{n,j})$ given in Propositions A.1 and A.2. $\square$

Proof of Theorem 2. Using techniques similar to those of the ordinary smooth error case, together with Lemmas B.8 and B.9 of the more detailed version of this article (Delaigle et al. 2008), it can be proven that (A.3) holds in the supersmooth case as well and that, under the conditions of the theorem, the $O_P(h^{p+3})$ term is negligible. Thus, as in the ordinary smooth error case, it suffices to show that, for some $\eta > 0$, (A.7) holds.

With $P_{n,i}$ and $Q_{n,i}$ as in the ordinary smooth error case, we have
\[
E|P_{n,i}|^{2+\eta} \le C h^{-1-\eta} \max_{0\le k\le p} \|K_{U,k}\|_{\infty}^{\eta} \int |K_{U,k}(u)|^{2}\, du
\le C h^{\beta_2(2+\eta)-1-\eta}\exp\{(2+\eta)h^{-\beta}/\gamma\},
\]
\[
E|Q_{n,i}|^{2+\eta} \le C h \max_{1\le k\le 2p} \int |K_{U,k}(y)|^{2+\eta}\, dy
\le C h^{\beta_2(2+\eta)+1}\exp\{(2+\eta)h^{-\beta}/\gamma\},
\]
with $\beta_2 = \beta_0\, 1\{\beta_0 < 1/2\}$, where, here and later, $C$ denotes a generic finite constant, and where we used Lemma B.9 of Delaigle et al. (2008). It follows that $E|U_{n,i}|^{2+\eta} \le C h^{\beta_2(2+\eta)-1-\eta}\exp\{(2+\eta)h^{-\beta}/\gamma\}$. Under the conditions of the theorem, we conclude that (A.7) holds for any $\eta > 0$, and (A.6) follows. $\square$

Lemma A.1. Under the conditions of Theorem 1, suppose that, for all $0 \le k \le 2p$, we have $(nh)^{-1/2}\{R(K_{U,k})\}^{1/2} = O(h^{p+1})$ when $p-\nu$ is odd, and $(nh)^{-1/2}\{R(K_{U,k})\}^{1/2} = O(h^{p+2})$ when $p-\nu$ is even. Then
\[
\sum_{k=0}^{p} \hat S_{\nu,k}(x)\,\hat T^{*}_{n,k}(x) = \sum_{k=0}^{p} R_{\nu,k}(x)\,\hat T^{*}_{n,k}(x) + O_P(h^{p+3}), \tag{A.8}
\]
where $R_{\nu,k}(x) = S_{\nu,k}(x)$ if $p-\nu$ is odd and $R_{\nu,k}(x) = S_{\nu,k}(x) - h\breve S_{\nu,k}(x)$ if $p-\nu$ is even, and where $\breve S_{i,j}$ denotes $f'_X(x)\, f_X^{-2}(x)\,\big(S^{-1}\tilde S S^{-1}\big)_{i,j}$.

Proof. The arguments are an extension of the calculations of Fan and Gijbels (1996, pp. 62 and 101-103). We have
\[
\hat T^{*}_{n,k}(x) = E\{\hat T^{*}_{n,k}(x)\} + O_P\Big(\sqrt{\mathrm{var}\{\hat T^{*}_{n,k}(x)\}}\Big).
\]
By construction of the estimator, $E[\hat T^{*}_{n,k}(x)]$ is equal to the expected value of its error-free counterpart $T^{*}_{n,k}(x) = T_{n,k}(x) - \sum_{j=0}^{p} h^{j}(j!)^{-1} m^{(j)}(x)\, S_{n,k+j}(x)$, which, by Taylor expansion, is easily found to be
\[
E[T^{*}_{n,k}(x)] = \frac{m^{(p+1)}(x)}{(p+1)!}\, f_X(x)\,\mu_{k+p+1}\, h^{p+1}
+ \bigg\{\frac{m^{(p+1)}(x)}{(p+1)!}\, f'_X(x) + \frac{m^{(p+2)}(x)}{(p+2)!}\, f_X(x)\bigg\}\,\mu_{k+p+2}\, h^{p+2} + o(h^{p+2}). \tag{A.9}
\]
Because $K$ is symmetric, it has zero odd moments, and thus $E[T^{*}_{n,k}(x)] \asymp h^{p+1}$ if $k+p$ is odd and $E[T^{*}_{n,k}(x)] \asymp h^{p+2}$ if $k+p$ is even. Moreover,
\[
\mathrm{var}\{\hat T^{*}_{n,k}(x)\} = \mathrm{var}\bigg\{\hat T_{n,k}(x) - m(x)\hat S_{n,k}(x) - \sum_{j=1}^{p} h^{j}(j!)^{-1} m^{(j)}(x)\,\hat S_{n,k+j}(x)\bigg\}
\]
\[
= O\Big[\mathrm{var}\{\hat T_{n,k}(x) - m(x)\hat S_{n,k}(x)\}\Big] + O\bigg[\sum_{j=1}^{p}\mathrm{var}\{\hat S_{n,k+j}(x)\}\bigg]
= O\{R(K_{U,k})/(nh)\} + O\bigg\{\sum_{j=1}^{p} R(K_{U,j+k})/(nh)\bigg\},
\]
where we used
\[
\mathrm{var}\{\hat T_{n,k}(x) - m(x)\hat S_{n,k}(x)\} \le (nh^{2})^{-1} E\big[\{Y - m(x)\}^{2} K^{2}_{U,k}\{(W - x)/h\}\big]
= (nh^{2})^{-1} E\Big(E\big[\{Y - m(x)\}^{2} K^{2}_{U,k}\{(W - x)/h\}\,\big|\, X\big]\Big)
\]
\[
= (nh^{2})^{-1} E\Big[\tau^{2}(X)\, E\big(K^{2}_{U,k}\{(W - x)/h\}\,\big|\, X\big)\Big]
= (nh^{2})^{-1}\int\!\!\int \tau^{2}(y)\, K^{2}_{U,k}\{(y + u - x)/h\}\, f_X(y)\, f_U(u)\, dy\, du
\le (nh)^{-1}\,\|\tau^{2} f_X\|_{\infty}\, R(K_{U,k}),
\]
and results derived in the proof of Lemma B.1 of Delaigle et al. (2008).

Using our previous calculations, we see that when $k+p$ is odd,
\[
\hat T^{*}_{n,k}(x) = c_1 h^{p+1} + o_P(h^{p+1}),
\]
whereas, for $k+p$ even,
\[
\hat T^{*}_{n,k}(x) = c_2 h^{p+2} + o_P(h^{p+2}),
\]
where $c_1$ and $c_2$ denote finite nonzero constants (depending on $x$ but not on $n$). Now, it follows from Lemmas B.1 and B.5 of Delaigle et al. (2008) that, under the conditions of the lemma, $\hat S = f_X(x)\, S + h f'_X(x)\,\tilde S + O_P(h^{2})$. Let $I$ denote the identity matrix. By Taylor expansion, we deduce that
\[
\hat S^{-1} = \big\{f_X(x)\, S + h f'_X(x)\,\tilde S\big\}^{-1} + O_P(h^{2})
= \big\{I + h f'_X(x) f_X^{-1}(x)\, S^{-1}\tilde S\big\}^{-1} S^{-1} f_X^{-1}(x) + O_P(h^{2})
= S^{-1} f_X^{-1}(x) - h\, S^{-1}\tilde S S^{-1} f'_X(x) f_X^{-2}(x) + O_P(h^{2}).
\]
Thus, we have $\hat S_{i,j} = S_{i,j} - h\breve S_{i,j} + O_P(h^{2})$, where, due to the symmetry properties of the kernel, $S_{i,j} = 0$ when $i+j$ is odd, whereas $\breve S_{i,j} = 0$ when $i+j$ is even. This concludes the proof. $\square$

Proposition A.1. Under Conditions A, B, and O, we have, for $p-\nu$ odd,
\[
E[\nu!\, h^{-\nu} Z_{\nu}] = e_{\nu+1}^{\top} S^{-1}\mu\,\frac{\nu!}{(p+1)!}\, m^{(p+1)}(x)\, h^{p+1-\nu} + o(h^{p+2-\nu})
\]
and, for $p-\nu$ even,
\[
E[\nu!\, h^{-\nu} Z_{\nu}] = e_{\nu+1}^{\top} S^{-1}\tilde\mu\,\frac{\nu!}{(p+2)!}\bigg[(p+2)\, m^{(p+1)}(x)\,\frac{f'_X(x)}{f_X(x)} + m^{(p+2)}(x)\bigg] h^{p+2-\nu}
- e_{\nu+1}^{\top} S^{-1}\tilde S S^{-1}\mu\,\frac{\nu!}{(p+1)!}\, m^{(p+1)}(x)\,\frac{f'_X(x)}{f_X(x)}\, h^{p+2-\nu} + o(h^{p+2-\nu}).
\]

Proof. From (A.4), we have $E[Z_{\nu}] = \sum_{k=0}^{p} R_{\nu,k}(x)\, E[T^{*}_{n,k}(x)]$, where $E[T^{*}_{n,k}(x)]$ is given at (A.9). It follows that
\[
E[Z_{\nu}] = \sum_{k=0}^{p} S_{\nu,k}(x)\,\frac{m^{(p+1)}(x)}{(p+1)!}\, f_X(x)\,\mu_{k+p+1}\, h^{p+1}
+ \sum_{k=0}^{p} S_{\nu,k}(x)\bigg\{\frac{m^{(p+1)}(x)}{(p+1)!}\, f'_X(x) + \frac{m^{(p+2)}(x)}{(p+2)!}\, f_X(x)\bigg\}\mu_{k+p+2}\, h^{p+2}
\]
\[
- \sum_{k=0}^{p} \breve S_{\nu,k}(x)\,\frac{m^{(p+1)}(x)}{(p+1)!}\, f_X(x)\,\mu_{k+p+1}\, h^{p+2} + o(h^{p+2}).
\]
Recall that $S_{\nu,k}(x) = 0$ unless $k+\nu$ is even and $\breve S_{\nu,k}(x) = 0$ unless $k+\nu$ is odd, and write $k+p = (k+\nu) + (p-\nu)$. If $k+\nu$ is even and $p-\nu$ is odd, or $k+\nu$ is odd and $p-\nu$ is even, then $k+p$ is odd and thus $\mu_{k+p+2} = 0$. If $k+\nu$ is odd and $p-\nu$ is odd, or if $k+\nu$ is even and $p-\nu$ is even, then using similar arguments we find $\mu_{k+p+1} = 0$. $\square$

Proposition A.2. Under Conditions A, B, and O, we have
\[
\mathrm{var}(\nu!\, h^{-\nu} Z_{\nu}) = e_{\nu+1}^{\top} S^{-1} S^{*} S^{-1} e_{\nu+1}\,(\nu!)^{2}\,\frac{(\tau^{2} f_X) * f_U(x)}{f_X^{2}(x)\, n h^{2\beta+2\nu+1}} + o\bigg(\frac{1}{n h^{2\beta+2\nu+1}}\bigg).
\]

Proof. Let $U_{n,i}$ be as in the proof of Theorem 1. We have
\[
\mathrm{var}(U_{n,i}) = \mathrm{var}(P_{n,i}) + \mathrm{var}(Q_{n,i}) + 2\,\mathrm{cov}(P_{n,i}, Q_{n,i}).
\]
We split the proof into three parts.

(1) To calculate $\mathrm{var}(P_{n,i})$, note that
\[
E[\{Y_i - m(x)\}\, K_{U,k,h}(W_i - x)] = \int \{m(x + hy) - m(x)\}\, y^{k} K(y)\, f_X(x + hy)\, dy = O(h),
\]
and, noting that each $K_{U,k}$ is real, we have
\[
E\big[\{Y_i - m(x)\}^{2} K_{U,k,h}(W_i - x)\, K_{U,k',h}(W_i - x)\big]
= \int E\big[\{Y_i - m(x)\}^{2}\,\big|\, X = y\big]\, E\big[K_{U,k,h}(W_i - x)\, K_{U,k',h}(W_i - x)\,\big|\, X = y\big]\, f_X(y)\, dy
\]
\[
= \int\!\!\int \tau^{2}(y)\, K_{U,k,h}(y + u - x)\, K_{U,k',h}(y + u - x)\, f_U(u)\, f_X(y)\, du\, dy
= h^{-1}\int\!\!\int \tau^{2}(x + hz - u)\, K_{U,k}(z)\, K_{U,k'}(z)\, f_U(u)\, f_X(x + hz - u)\, du\, dz
\]
\[
= h^{-2\beta-1}\,(\tau^{2} f_X) * f_U(x)\,(-1)^{k'} i^{-k-k'}\,\frac{1}{2\pi c^{2}}\int |t|^{2\beta}\,\varphi_K^{(k)}(t)\,\varphi_K^{(k')}(t)\, dt + o(h^{-2\beta-1}),
\]
where we used (B.1) of Delaigle et al. (2008), which states that
\[
\lim_{n\to\infty} h^{2\beta}\int K_{U,k}(y)\, K_{U,k'}(y)\, g(x - hy)\, dy
= i^{-k-k'}(-1)^{-k'}\,\frac{g(x)}{c^{2}}\,\frac{1}{2\pi}\int |t|^{2\beta}\,\varphi_K^{(k)}(t)\,\varphi_K^{(k')}(t)\, dt,
\]
with $c$ as in (7). Finally,
\[
\mathrm{cov}\big[\{Y_i - m(x)\}\, K_{U,k,h}(W_i - x),\ \{Y_i - m(x)\}\, K_{U,k',h}(W_i - x)\big]
= h^{-2\beta-1}\,(\tau^{2} f_X) * f_U(x)\,(-1)^{k'} i^{-k-k'}\,\frac{1}{2\pi c^{2}}\int |t|^{2\beta}\,\varphi_K^{(k)}(t)\,\varphi_K^{(k')}(t)\, dt + o(h^{-2\beta-1}),
\]
and thus
\[
\mathrm{var}(P_{n,i}) = h^{-2\beta-1}\,(\tau^{2} f_X) * f_U(x)\sum_{k,k'=0}^{p} S_{\nu,k}(x)\, S_{\nu,k'}(x)\, S^{*}_{k,k'} + o(h^{-2\beta-1}).
\]
Now $(S_{\nu,0}, \ldots, S_{\nu,p}) = e_{\nu+1}^{\top} S^{-1} f_X^{-1}(x)$, which implies that
\[
\mathrm{var}(P_{n,i}) = h^{-2\beta-1} f_X^{-2}(x)\,(\tau^{2} f_X) * f_U(x)\, e_{\nu+1}^{\top} S^{-1} S^{*} S^{-1} e_{\nu+1} + o(h^{-2\beta-1}).
\]

(2) To calculate $\mathrm{var}(Q_{n,i})$, note that $E\{K_{U,k+j,h}(W_i - x)\} = \int y^{k+j} K(y)\, f_X(x + hy)\, dy = O(1)$ and $|E\{K_{U,k+j,h}(W_i - x)\, K_{U,k'+j',h}(W_i - x)\}| = O(h^{-2\beta-1})$, by Lemma B.6 of Delaigle et al. (2008), which implies that
\[
\mathrm{cov}\{K_{U,k+j,h}(W_i - x),\ K_{U,k'+j',h}(W_i - x)\} = O(h^{-2\beta-1})
\]
and $\mathrm{var}(Q_{n,i}) = O(h^{-2\beta+1})$, which is negligible compared with $\mathrm{var}(P_{n,i})$.

(3) We conclude from (1) and (2) that $\mathrm{var}(U_{n,i}) = \mathrm{var}(P_{n,i})\{1 + o(1)\}$, which proves the result. $\square$

[Received April 2008. Revised October 2008.]

REFERENCES

Berry, S., Carroll, R. J., and Ruppert, D. (2002), "Bayesian Smoothing and Regression Splines for Measurement Error Problems," Journal of the American Statistical Association, 97, 160-169.

Butucea, C., and Matias, C. (2005), "Minimax Estimation of the Noise Level and of the Deconvolution Density in a Semiparametric Convolution Model," Bernoulli, 11, 309-340.

Carroll, R. J., and Hall, P. (1988), "Optimal Rates of Convergence for Deconvolving a Density," Journal of the American Statistical Association, 83, 1184-1186.

——— (2004), "Low-Order Approximations in Deconvolution and Regression with Errors in Variables," Journal of the Royal Statistical Society: Series B, 66, 31-46.

Carroll, R. J., Maca, J. D., and Ruppert, D. (1999), "Nonparametric Regression in the Presence of Measurement Error," Biometrika, 86, 541-554.

Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006), Measurement Error in Nonlinear Models (2nd ed.), Boca Raton: Chapman and Hall/CRC Press.

Comte, F., and Taupin, M.-L. (2007), "Nonparametric Estimation of the Regression Function in an Errors-in-Variables Model," Statistica Sinica, 17, 1065-1090.

Cook, J. R., and Stefanski, L. A. (1994), "Simulation-Extrapolation Estimation in Parametric Measurement Error Models," Journal of the American Statistical Association, 89, 1314-1328.

Delaigle, A. (2008), "An Alternative View of the Deconvolution Problem," Statistica Sinica, 18, 1025-1045.

Delaigle, A., Fan, J., and Carroll, R. J. (2008), "Design-Adaptive Local Polynomial Estimator for the Errors-in-Variables Problem," long version available from the authors.

Delaigle, A., and Hall, P. (2006), "On the Optimal Kernel Choice for Deconvolution," Statistics & Probability Letters, 76, 1594-1602.

——— (2008), "Using SIMEX for Smoothing-Parameter Choice in Errors-in-Variables Problems," Journal of the American Statistical Association, 103, 280-287.

Delaigle, A., Hall, P., and Meister, A. (2008), "On Deconvolution With Repeated Measurements," Annals of Statistics, 36, 665-685.

Delaigle, A., and Meister, A. (2007), "Nonparametric Regression Estimation in the Heteroscedastic Errors-in-Variables Problem," Journal of the American Statistical Association, 102, 1416-1426.

——— (2008), "Density Estimation with Heteroscedastic Error," Bernoulli, 14, 562-579.

Diggle, P., and Hall, P. (1993), "A Fourier Approach to Nonparametric Deconvolution of a Density Estimate," Journal of the Royal Statistical Society: Series B, 55, 523-531.

Fan, J. (1991a), "Asymptotic Normality for Deconvolution Kernel Density Estimators," Sankhya A, 53, 97-110.

——— (1991b), "Global Behavior of Deconvolution Kernel Estimates," Statistica Sinica, 1, 541-551.

——— (1991c), "On the Optimal Rates of Convergence for Nonparametric Deconvolution Problems," Annals of Statistics, 19, 1257-1272.

Fan, J., and Gijbels, I. (1996), Local Polynomial Modeling and Its Applications, London: Chapman & Hall.

Fan, J., and Masry, E. (1992), "Multivariate Regression Estimation with Errors-in-Variables: Asymptotic Normality for Mixing Processes," Journal of Multivariate Analysis, 43, 237-271.

——— (1997), "Local Polynomial Estimation of Regression Functions for Mixing Processes," Scandinavian Journal of Statistics, 24, 165-179.

Fan, J., and Truong, Y. K. (1993), "Nonparametric Regression With Errors in Variables," Annals of Statistics, 21, 1900-1925.

Hall, P., and Meister, A. (2007), "A Ridge-Parameter Approach to Deconvolution," Annals of Statistics, 35, 1535-1558.

Hu, Y., and Schennach, S. M. (2008), "Identification and Estimation of Nonclassical Nonlinear Errors-in-Variables Models with Continuous Distributions," Econometrica, 76, 195-216.

Ioannides, D. A., and Alevizos, P. D. (1997), "Nonparametric Regression with Errors in Variables and Applications," Statistics & Probability Letters, 32, 35-43.

Koo, J.-Y., and Lee, K.-W. (1998), "B-Spline Estimation of Regression Functions with Errors in Variable," Statistics & Probability Letters, 40, 57-66.

Li, T., and Vuong, Q. (1998), "Nonparametric Estimation of the Measurement Error Model Using Multiple Indicators," Journal of Multivariate Analysis, 65, 139-165.

Liang, H., and Wang, N. (2005), "Large Sample Theory in a Semiparametric Partially Linear Errors-in-Variables Model," Statistica Sinica, 15, 99-117.

Marron, J. S., and Wand, M. P. (1992), "Exact Mean Integrated Squared Error," Annals of Statistics, 20, 712-736.

Meister, A. (2004), "On the Effect of Misspecifying the Error Density in a Deconvolution Problem," The Canadian Journal of Statistics, 32, 439-449.

——— (2006), "Density Estimation with Normal Measurement Error with Unknown Variance," Statistica Sinica, 16, 195-211.

Neumann, M. H. (1997), "On the Effect of Estimating the Error Density in Nonparametric Deconvolution," Journal of Nonparametric Statistics, 7, 307-330.

Schennach, S. M. (2004a), "Estimation of Nonlinear Models with Measurement Error," Econometrica, 72, 33-75.

——— (2004b), "Nonparametric Regression in the Presence of Measurement Error," Econometric Theory, 20, 1046-1093.

Staudenmayer, J., and Ruppert, D. (2004), "Local Polynomial Regression and Simulation-Extrapolation," Journal of the Royal Statistical Society: Series B (General), 66, 17-30.

Staudenmayer, J., Ruppert, D., and Buonaccorsi, J. (2008), "Density Estimation in the Presence of Heteroscedastic Measurement Error," Journal of the American Statistical Association, 103, 726-736.

Stefanski, L. A. (2000), "Measurement Error Models," Journal of the American Statistical Association, 95, 1353-1358.

Stefanski, L., and Carroll, R. J. (1990), "Deconvoluting Kernel Density Estimators," Statistics, 21, 169-184.

Stefanski, L. A., and Cook, J. R. (1995), "Simulation-Extrapolation: The Measurement Error Jackknife," Journal of the American Statistical Association, 90, 1247-1256.

Taupin, M. L. (2001), "Semi-Parametric Estimation in the Nonlinear Structural Errors-in-Variables Model," Annals of Statistics, 29, 66-93.

Zwanzig, S. (2007), "On Local Linear Estimation in Nonparametric Errors-in-Variables Models," Theory of Stochastic Processes, 13, 316-327.
