Source: people.bu.edu/qu/quantile-process/process-0611.pdf

Nonparametric Estimation and Inference on Conditional Quantile Processes*

Zhongjun Qu†

Boston University

Jungmo Yoon‡

Claremont McKenna College

March 24, 2011; this version: June 17, 2011

Abstract

We consider the estimation and inference about a nonparametrically specified conditional quantile process. For estimation, a two-step procedure is proposed. The first step utilizes local linear regressions and maintains quantile monotonicity through simple inequality constraints. The second step involves linear interpolation between adjacent quantiles. When computing the estimator, the bandwidth parameter is allowed to vary across quantiles to adapt to the data sparsity. The procedure is computationally simple to implement and is feasible even for relatively large data sets. For inference, we first obtain a uniform Bahadur-type representation for the conditional quantile process. Then, we show that the estimator converges weakly to a continuous Gaussian process, whose critical values can be estimated via simulations by exploiting the fact that it is conditionally pivotal. Next, we demonstrate how to compute the optimal bandwidth, construct uniform confidence bands and test hypotheses about the quantile process. Finally, we examine the performance of the bandwidth selection rule and the uniform confidence bands through simulations.

Keywords: nonparametric quantile regression, monotonicity constraint, uniform Bahadur representation, uniform inference.

*We thank Adam McCloskey, Denis Tkachenko, and seminar participants at the University of Iowa, Boston University and the 2011 Econometric Society summer meeting for useful comments and suggestions.

†Department of Economics, Boston University, 270 Bay State Rd., Boston, MA 02215 ([email protected]).

‡Robert Day School of Economics and Finance, Claremont McKenna College, 500 East Ninth Street, Claremont, California 91711 ([email protected]).


1 Introduction

Models for conditional quantiles play an important role in econometrics and statistics. They complement the conventional conditional mean models, offering a broader view of relationships between variables than the mean effects. The parametric approach to conditional quantile modeling was pioneered by Koenker and Bassett (1978), while the nonparametric approach was developed in a series of works including Bhattacharya (1963), Stone (1977), Chaudhuri (1991), Koenker, Portnoy and Ng (1994), Fan, Hu and Truong (1994) and Yu and Jones (1998). Reviews of the related literature can be found in Koenker (2005).

One is often interested in examining a full family of quantiles to obtain a complete analysis of the stochastic relationships between variables. This motivation underlies the consideration of conditional quantile processes. A seminal contribution to this analysis is Koenker and Portnoy (1987), which established a uniform Bahadur representation and serves as the foundation for further developments in this area. More recently, Koenker and Xiao (2002) considered the issue of testing composite hypotheses about quantile regression processes using Khmaladzation (Khmaladze, 1981). Chernozhukov and Fernández-Val (2005) considered the same issue and suggested resampling as an alternative approach. Angrist, Chernozhukov and Fernández-Val (2006) established inferential theory in misspecified models. Their results can be used to study a wide range of issues, including but not restricted to (i) testing alternative model specifications, (ii) testing stochastic dominance, and (iii) detecting treatment effect significance and heterogeneity.

A main focus of the above literature has been on quantile processes in parametric quantile models. However, there are frequent occasions when parametric specifications fail, making more flexible nonparametric methods desirable. Relaxing parametric assumptions is also often practically feasible in view of the increasing availability of rich data sets in both econometrics and statistics. Such considerations motivate us to consider estimation and inference about conditional quantile processes in a nonparametric setting. Our first goal is to propose a simple-to-implement estimator, and the second is to provide a tractable inferential theory that is useful for constructing uniform confidence bands as well as conducting hypothesis tests.

We propose a two-step procedure to estimate the conditional quantile process. The first step applies local linear regressions (Fan, Hu and Truong, 1994 and Yu and Jones, 1998) to a grid of quantiles and maintains quantile monotonicity through a set of simple inequality constraints. The constraints depend neither on the data (such as their support) nor on the model (such as the number of covariates), but only on the number of quantiles entering the estimation. The second step involves linear interpolation between adjacent quantiles, returning an estimate for the quantile process. This procedure has two useful features. First, the bandwidth parameter is allowed to vary across quantiles to adapt to the data sparsity. This is important because data are typically sparse near the tails of the conditional distribution. Second, the optimization can be written as a linear programming problem with linear inequality constraints and can be solved using the algorithm in Koenker and Ng (2005) without essential modification. Our experimentation suggests that the procedure is feasible even for relatively large sample sizes.

For inference, we obtain three sets of results. First, we derive a uniform Bahadur-type representation, generalizing Theorem 2.1 in Koenker and Portnoy (1987) to the local linear setting. While being of independent interest, this forms a key step in proving the subsequent results. Second, we prove that, under a certain rate condition on the quantile grid, the proposed estimator has the same first-order asymptotic distribution as when the inequality constraints are absent. Therefore, the constraints serve as a finite-sample correction, having no first-order effect asymptotically. The limiting distribution is a continuous Gaussian process, whose critical values can be estimated via simulations by exploiting the fact that it is conditionally pivotal, drawing on the insight of Parzen, Wei and Ying (1994) and Chernozhukov, Hansen and Jansson (2009). We discuss in detail how to compute the optimal bandwidth, construct uniform confidence bands and conduct hypothesis tests. Finally, we establish a connection with results in Chernozhukov, Fernández-Val and Galichon (2010). Specifically, we show that an alternative estimator, obtained from the same two-step procedure but without inequality constraints, satisfies a functional central limit theorem, and can therefore be used as the basis for rearrangement. The limiting distribution we obtain is applicable to this rearranged estimator. Therefore, our results broaden the application of rearrangement to the nonparametric local linear regression context.

We evaluate the proposed methods using simulations and briefly summarize the results below. First, the bandwidth selection rule performs well: it is able to deliver a small (large) bandwidth when the curvature of the conditional quantile function is high (low). Second, the proposed estimator, the conventional quantile-by-quantile local linear estimator (i.e., Fan, Hu and Truong, 1994) and its rearranged version all perform similarly in terms of the integrated mean squared error criterion. This result is in line with our theoretical finding that they all share the same first-order asymptotic distribution. Third, the confidence band can have undercoverage because the bias term in the estimator can be difficult to estimate, a well-known fact in the literature. To address this issue, we suggest a simple modification that allows for a more flexible bias correction. The resulting confidence band is asymptotically conservative. Simulation evidence shows that it has adequate coverage, even with small sample sizes, and that it is only mildly wider than the confidence band under the conventional bias correction.

This paper is related to Belloni, Chernozhukov and Fernández-Val (2011). A key difference between the two works lies in the approach undertaken. Belloni, Chernozhukov and Fernández-Val (2011) consider a series-based framework, where the conditional quantile function is modeled globally with a large number of parameters. (Consequently, their framework is also useful for linear quantile regressions with many covariates.) The current paper is based on local linear regressions, where the quantile function is modeled locally by a few parameters and the modeling complexity is governed by the bandwidth. The second difference is how quantile monotonicity is achieved. Belloni, Chernozhukov and Fernández-Val (2011) apply monotonization procedures to a preliminary series-based estimator, while in our framework monotonicity enters directly into estimation via inequality constraints.

When viewed from a methodological perspective, the current paper is related to the following two strands of literature. First, the estimation procedure is related to the literature on estimating nonparametric regression relationships subject to monotonicity constraints, a majority of which has focused on monotonicity as a function of the covariate. For example, Mammen (1991) considered an estimator consisting of a kernel smoothing step and an isotonisation step. He and Shi (1998) and Koenker and Ng (2005) considered smoothing splines subject to linear inequality constraints as a way to enforce monotonicity. In the current paper, the monotonicity constraint is with regard to quantiles, giving rise to a different type of estimator than those discussed above, and requiring different techniques for studying its statistical properties. Note that He (1997), Dette and Volgushev (2008), Bondell, Reich and Wang (2010) and Chernozhukov, Fernández-Val and Galichon (2010) considered monotonicity with respect to quantiles. The connection with their works is discussed later in the paper. Second, there is an active literature that studies uniform confidence bands for nonparametric conditional quantile functions. The closely related works are Härdle and Song (2010) and Koenker (2010). The former considered kernel-based estimators and obtained confidence bands using strong approximations. The latter paper considered additive quantile models analyzed with total-variation penalties and obtained confidence bands using Hotelling's (1939) tube formula. These results are uniform in covariates but pointwise in quantiles. Therefore, their results and ours complement each other and, when jointly applied in practice, allow one to probe a broad spectrum of topics.

The paper is organized as follows. Section 2 introduces the issue of interest. Section 3 presents the estimator, while Section 4 discusses computational implementation. Section 5 establishes the asymptotic properties of the estimator. Section 6 discusses bandwidth selection. Section 7 shows how to construct uniform confidence bands and conduct hypothesis tests on the conditional quantile process. Section 8 establishes the connection to rearrangement. Section 9 includes some Monte Carlo experiments and Section 10 concludes. All proofs are included in the two appendices, with Appendix A containing proofs of the main results and Appendix B some auxiliary lemmas.

The following notation is used. The superscript $0$ indicates the true value. For a real-valued vector $z$, $\|z\|$ denotes its Euclidean norm, $[z]$ is the integer part of $z$, and $1(\cdot)$ is the indicator function. $\operatorname{supp}(f)$ stands for the support of the function $f$. $D[0,1]$ stands for the set of functions on $[0,1]$ that are right continuous and have left limits, equipped with the Skorohod metric. The symbols $\Rightarrow$ and $\to_p$ denote weak convergence under the Skorohod topology and convergence in probability, and $O_p(\cdot)$ and $o_p(\cdot)$ are the usual notation for the orders of stochastic magnitude.

2 The issue of interest

Let $(X, Y)$ be an $\mathbb{R}^{d+1}$-valued random vector, where $Y$ is a scalar response variable and $X$ is an $\mathbb{R}^d$-valued explanatory variable. Denote their densities by $f_{Y|X}(\cdot)$ and $f_X(\cdot)$, respectively. Also, denote the conditional cumulative distribution function of $Y$ given $X = x$ by $F_{Y|X}(\cdot|x)$ and the conditional quantile function at $\tau \in (0,1)$ by $Q(\tau|x)$, i.e.,
$$Q(\tau|x) = F_{Y|X}^{-1}(\tau|x) = \inf\{s : F_{Y|X}(s|x) \geq \tau\}.$$

In this paper, $Q(\tau|x)$ is allowed to be a general nonlinear function of $x$ and $\tau$. We fix $x$ and treat $Q(\tau|x)$ as a process in $\tau$. Our goal is to estimate and conduct inference on $Q(\tau|x)$ for all $\tau \in \mathcal{T}$, where $\mathcal{T} = [\tau_1, \tau_2]$ with $0 < \tau_1 \leq \tau_2 < 1$. Here, $\mathcal{T}$ lies strictly within the unit interval in order to allow the conditional distribution of $Y$ to have unbounded support. Throughout the paper, we assume the availability of $\{(x_i, y_i)\}_{i=1}^n$, a sample of $n$ observations that are i.i.d. as $(X, Y)$.

We present two examples to motivate the above issue of interest and to illustrate the applicability of the methods we propose. More discussion follows in Section 7.

Example 1 (Quantile treatment effect, QTE). The QTE measures the effect of a treatment (a policy) on the distribution of the potential outcomes. Specifically, let $X = (D, Z)$, where $D$ is a binary policy variable and $Z$ are some covariates. Let $Q(\tau|d,z)$ denote the $\tau$-th conditional quantile of the potential outcome conditioned on the treatment $D = d$ and controls $Z = z$. Then the QTE is defined as¹
$$Q(\tau|1,z) - Q(\tau|0,z).$$
One may be interested in examining (i) whether the treatment has a significant effect at some quantile, i.e., testing $H_0\colon Q(\tau|1,z) = Q(\tau|0,z)$ for all $\tau \in \mathcal{T}$ against $H_1\colon Q(\tau|1,z) \neq Q(\tau|0,z)$ for some $\tau \in \mathcal{T}$, and (ii) whether the policy effect is homogeneous, i.e., testing $H_0\colon Q(\tau|1,z) - Q(\tau|0,z) = \delta(z)$ for some $\delta(z)$ and for all $\tau \in \mathcal{T}$ against the hypothesis that the preceding difference is quantile dependent. Koenker and Xiao (2002) and Chernozhukov and Fernández-Val (2005) developed inferential procedures for the above hypotheses in parametric conditional quantile models. Results in the current paper allow us to analyze these issues in a nonparametric setting. Note that, in practice, the set $\mathcal{T}$ can be flexibly chosen to reflect the issues of interest. For example, if the policy target is the lower part of the distribution, then we can choose $\mathcal{T} = [\varepsilon, 0.5]$ with $\varepsilon$ a small positive number.

Example 2 (Conditional stochastic dominance). Stochastic dominance is an important concept for the study of poverty and income inequality. Although a large part of the literature has focused on unconditional dominance (see McFadden, 1989 and Barrett and Donald, 2003), conditional dominance has also received interest recently; see Koenker and Xiao (2002), Chernozhukov and Fernández-Val (2005) and Linton, Maasoumi and Whang (2005). Specifically, let $Q_1(\tau|x)$ and $Q_2(\tau|x)$ be two conditional quantile functions associated with two conditional distributions. Then, Distribution One (weakly) first-order stochastically dominates Distribution Two at $x$ if and only if
$$Q_1(\tau|x) \geq Q_2(\tau|x) \quad \text{for all } \tau \in (0,1).$$
Distribution One (weakly) second-order stochastically dominates Distribution Two at $x$ if and only if
$$\int_0^\tau Q_1(s|x)\,ds \geq \int_0^\tau Q_2(s|x)\,ds \quad \text{for all } \tau \in (0,1).$$

¹This concept traces back to Lehmann (1975) and Doksum (1974), and recent works include Heckman, Smith, and Clements (1997), Abadie, Angrist, and Imbens (2002), Chernozhukov and Hansen (2005) and Firpo (2007), among others.


One may be interested in testing the above null against non-dominance alternatives. This can be achieved by letting $\delta(\tau|x) = Q_1(\tau|x) - Q_2(\tau|x)$ and considering one-sided Kolmogorov-Smirnov tests based on $\delta(\tau|x)$ and its indefinite integral process. Note that the above three papers considered parametric conditional quantile/mean models, while results in this paper allow us to examine these hypotheses in a nonparametric setting.
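The one-sided Kolmogorov-Smirnov idea above is easy to sketch on a quantile grid once estimates of the two quantile curves are in hand. The following is a stylized illustration only: the quantile curves are taken as given, and the paper's actual tests rely on the process limit theory developed in Section 7.

```python
def ks_nondominance_stat(q1, q2):
    """One-sided KS-type statistic on a quantile grid:
    max over tau of max(Q2(tau|x) - Q1(tau|x), 0).
    Zero when Distribution One first-order dominates Two at x on the grid."""
    return max(max(b - a, 0.0) for a, b in zip(q1, q2))

def integrated_quantile_curves(q1, q2, d_tau):
    """Riemann-sum approximations to the integrated quantile curves,
    the objects compared for second-order dominance."""
    s1 = s2 = 0.0
    c1, c2 = [], []
    for a, b in zip(q1, q2):
        s1 += a * d_tau
        s2 += b * d_tau
        c1.append(s1)
        c2.append(s2)
    return c1, c2
```

A positive value of the statistic is evidence against the dominance null; turning it into a test requires the critical values supplied by the limit theory.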

3 The estimator

The estimation procedure is based on a family of local linear regressions. We therefore start with a brief review of the local linear fit; more detailed discussions can be found in Chaudhuri (1991), Fan, Hu and Truong (1994) and Yu and Jones (1998). The method assumes that $Q(\tau|x)$ is a smooth function of $x$ for a given $\tau \in (0,1)$ and exploits the following first-order Taylor approximation:
$$Q(\tau|x_i) \approx Q(\tau|x) + (x_i - x)' \frac{\partial Q(\tau|x)}{\partial x}. \tag{1}$$

The local linear estimator of $Q(\tau|x)$, denoted by $\hat{\alpha}(\tau)$, is determined via
$$(\hat{\alpha}(\tau), \hat{\beta}(\tau)) = \arg\min_{\alpha(\tau), \beta(\tau)} \sum_{i=1}^n \rho_\tau\big(y_i - \alpha(\tau) - (x_i - x)'\beta(\tau)\big)\, K\!\left(\frac{x_i - x}{h_{n,\tau}}\right), \tag{2}$$
where $\rho_\tau(u) = u(\tau - 1(u < 0))$ is the check function, $K$ is a kernel function and $h_{n,\tau}$ is a bandwidth parameter that depends on $\tau$. As demonstrated in Fan, Hu and Truong (1994), the local linear fit is simple to implement, yet has several advantages over the local constant fit. In particular, the bias of $\hat{\alpha}(\tau)$ is not affected by the values of $f_X'(x)$ and $\partial Q(\tau|x)/\partial x$, and is of the same order at the boundary as in the interior, thus requiring no edge modification. In addition, plug-in data-driven bandwidth selection does not require estimating the derivatives of the marginal density and is therefore relatively simple to implement. As will be seen later, these three features continue to hold for our estimator. Note that the results in Fan, Hu and Truong (1994) and Yu and Jones (1998) are pointwise in $\tau$.
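For concreteness, the check function and the kernel-weighted objective in (2) can be written down directly. A minimal one-dimensional sketch follows; the Epanechnikov kernel is only an illustrative choice of a compactly supported kernel, not prescribed by the paper.

```python
def check_function(u, tau):
    """Check (pinball) function: rho_tau(u) = u * (tau - 1(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def epanechnikov(u):
    """A compactly supported kernel consistent with Assumption 4 (d = 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def local_linear_objective(alpha, beta, data, x, tau, h):
    """Objective of equation (2) for scalar x; data is a list of (x_i, y_i)."""
    return sum(check_function(y_i - alpha - (x_i - x) * beta, tau)
               * epanechnikov((x_i - x) / h) for x_i, y_i in data)
```

The scale property $w\,\rho_\tau(z) = \rho_\tau(wz)$ for $w \geq 0$, used in Section 4 to absorb the kernel weights into the data, is immediate from this definition.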

We propose the following two-step procedure to estimate the process $Q(\tau|x)$, $\tau \in \mathcal{T}$.


• STEP 1. Partition $\mathcal{T}$ into a grid of equally spaced quantiles $\tau_1, \ldots, \tau_m$. The precise condition on $m$ will be stated later. Consider the following constrained optimization problem:
$$\min_{\{\alpha(\tau_j), \beta(\tau_j)\}_{j=1}^m} \sum_{j=1}^m \sum_{i=1}^n \rho_{\tau_j}\big(y_i - \alpha(\tau_j) - (x_i - x)'\beta(\tau_j)\big)\, K\!\left(\frac{x_i - x}{h_{n,\tau_j}}\right) \tag{3}$$
subject to
$$\alpha(\tau_j) \leq \alpha(\tau_{j+1}) \quad \text{for all } j = 1, \ldots, m-1.$$
Denote the estimates by $\tilde{\alpha}(\tau_1), \ldots, \tilde{\alpha}(\tau_m)$.

• STEP 2. Linearly interpolate between the estimates to obtain an estimate for the entire process, i.e., for any $\tau \in \mathcal{T}$, compute
$$\tilde{\alpha}^*(\tau) = \lambda_n(\tau)\, \tilde{\alpha}(\tau_j) + (1 - \lambda_n(\tau))\, \tilde{\alpha}(\tau_{j+1}) \quad \text{if } \tau \in [\tau_j, \tau_{j+1}], \tag{4}$$
where $\lambda_n(\tau) = (\tau_{j+1} - \tau)/(\tau_{j+1} - \tau_j)$.
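Step 2 is purely mechanical and follows (4) directly. A minimal sketch, with the first-step grid estimates taken as given (in the procedure they come from the constrained problem (3)):

```python
def interpolate_quantile_process(taus, alphas, tau):
    """Equation (4): linear interpolation of the grid estimates.
    taus is the increasing grid tau_1 < ... < tau_m and alphas are the
    constrained first-step estimates at those quantiles."""
    for j in range(len(taus) - 1):
        if taus[j] <= tau <= taus[j + 1]:
            lam = (taus[j + 1] - tau) / (taus[j + 1] - taus[j])
            return lam * alphas[j] + (1.0 - lam) * alphas[j + 1]
    raise ValueError("tau lies outside the grid [tau_1, tau_m]")
```

Because the first-step estimates are monotone by construction, the interpolated process is monotone in $\tau$ as well.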

Remark 1 The set of constraints $\alpha(\tau_j) \leq \alpha(\tau_{j+1})$ ($j = 1, \ldots, m-1$) forms a necessary and sufficient condition for the monotonicity of the conditional quantiles. The constraints depend neither on the data (such as their support) nor on the model (such as the number of covariates), but only on the number of quantiles entering the estimation. This is due to the local nature of the problem. To our knowledge, this is the first time such a local feature is explicitly exploited to maintain quantile monotonicity. Clearly, the above procedure can be applied to a different $x$ without essential modification.

Remark 2 In practice, solving (3) is essentially a special case of Koenker and Ng (2005), who provided algorithms for computing quantile regression estimates subject to general linear inequality constraints, and a computationally efficient algorithm can be constructed at little cost. The details are provided in the next section.

Remark 3 The linear interpolation step is motivated by Neocleous and Portnoy (2008). It permits obtaining a tractable inferential theory, as presented later.


Remark 4 There exist other methods for ensuring quantile monotonicity. He (1997) exploited the structure of a location-scale model. Bondell, Reich and Wang (2010) considered quantile smoothing splines and showed that non-crossing can be imposed via inequality constraints on the knots. Dette and Volgushev (2008) obtained monotonic quantile curves by first estimating the conditional distribution function and then inverting it to obtain the quantiles. However, their results are pointwise in quantiles, and it remains an interesting open question whether they can lead to a tractable inferential theory for the quantile process. Finally, Chernozhukov, Fernández-Val and Galichon (2010) proposed a simple and elegant framework to address the non-crossing issue using rearrangement. The connection with their work is discussed later in the paper.

4 Computation

The joint estimation problem in (3) can be solved by the Frisch-Newton method developed in Portnoy and Koenker (1997) and Koenker and Ng (2005) without essential modifications. Below, we summarize the key formulations for ease of implementation.

Let $w_{ij} = K\big((x_i - x)/h_{n,\tau_j}\big)$. Because $w\,\rho_\tau(z) = \rho_\tau(wz)$ for any $w \geq 0$, (3) can be rewritten as
$$\min_{\{\alpha(\tau_j), \beta(\tau_j)\}_{j=1}^m} \sum_{j=1}^m \sum_{i=1}^n \rho_{\tau_j}\big(w_{ij} y_i - w_{ij}\alpha(\tau_j) - w_{ij}(x_i - x)'\beta(\tau_j)\big) \tag{5}$$
subject to
$$\alpha(\tau_j) \leq \alpha(\tau_{j+1}) \quad \text{for all } j = 1, \ldots, m-1. \tag{6}$$

To obtain a linear programming formulation of the above minimization problem, we define a few notations. Let $y$ denote the $n$-vector of the dependent variable and $X$ denote the $n \times (d+1)$ matrix of explanatory variables whose first column is a vector of ones. Let $b_j = (\alpha(\tau_j), \beta(\tau_j)')'$ denote a $(d+1)$-vector of parameters and $e$ denote an $n$-vector of ones. We define a diagonal matrix $W_j$ with $(w_{1j}, \ldots, w_{nj})$ along the diagonal and let $y_j^* = W_j y$ and $X_j^* = W_j X$. Then, (5) can be rewritten as
$$\min_{\{u_j, v_j, b_j\}_{j=1}^m} \sum_{j=1}^m \big(\tau_j e' u_j + (1 - \tau_j) e' v_j\big)$$
subject to
$$\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix} - \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_m \end{pmatrix} + \begin{pmatrix} X_1^* & 0 & \cdots & 0 \\ 0 & X_2^* & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_m^* \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} - \begin{pmatrix} y_1^* \\ y_2^* \\ \vdots \\ y_m^* \end{pmatrix} \in \{O_{mn}\},$$
where $b_j \in \mathbb{R}^{d+1}$, $(u_j', v_j')' \in \mathbb{R}^{2n}_+$ for $j = 1, \ldots, m$, and $O_{mn}$ denotes an $mn$-vector of zeros.

The inequality constraints (6) can be written as
$$\begin{pmatrix} -s' & s' & 0 & \cdots & 0 & 0 \\ 0 & -s' & s' & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -s' & s' \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} \in \mathbb{R}^{m-1}_+,$$
where the $(d+1)$-vector $s$ is defined by $s = (1, 0, \ldots, 0)'$. This is the conventional $Rb \geq r$ form of constraints, where $b = (b_1', \ldots, b_m')'$, $R$ is the matrix in front of $b$, and $r = O_{m-1}$, an $(m-1)$-vector of zeros. To further simplify the notation, let $\tau = (\tau_1, \ldots, \tau_m)'$, $u = (u_1', \ldots, u_m')'$, $v = (v_1', \ldots, v_m')'$, $y^* = (y_1^{*\prime}, \ldots, y_m^{*\prime})'$, and $X^* = \operatorname{diag}(X_j^*)_{j=1}^m$, a matrix whose diagonal blocks are given by $X_j^*$. Then, in matrix notation, the above primal problem can be written as
$$\min_{u,v,b} \left\{ (\tau \otimes e)' u + ((1 - \tau) \otimes e)' v \;\middle|\; u - v + X^* b = y^*,\ Rb \in \mathbb{R}^{m-1}_+,\ (u', v') \in \mathbb{R}^{2mn}_+ \right\}.$$

As highlighted in Koenker and Ng (2005), the dual formulation can greatly reduce the effective dimensionality of the problem. Note that the dual constraint is
$$\begin{bmatrix} \tau \otimes e \\ (1 - \tau) \otimes e \\ O_{m(d+1)} \end{bmatrix} - \begin{bmatrix} I_{mn} & 0 \\ -I_{mn} & 0 \\ X^{*\prime} & R' \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} \in \mathbb{R}^{2mn}_+ \times \{O_{m(d+1)}\}.$$
Therefore, the dual problem is
$$\max_{d} \left\{ y^{*\prime} d_1 + r' d_2 \;\middle|\; X^{*\prime} d_1 + R' d_2 = 0,\ (\tau - 1) \otimes e \leq d_1 \leq \tau \otimes e,\ d_2 \in \mathbb{R}^{m-1}_+ \right\}.$$
By the change of variables $a_1 = (1 - \tau) \otimes e + d_1$ and $a_2 = d_2$, we have
$$\max_{a} \left\{ y^{*\prime} a_1 + r' a_2 \;\middle|\; X^{*\prime} a_1 + R' a_2 = X^{*\prime}\big((1 - \tau) \otimes e\big),\ a_1 \in [0,1]^{mn},\ a_2 \in \mathbb{R}^{m-1}_+ \right\}. \tag{7}$$
Koenker and Ng (2005) showed how to solve (7) using interior point methods based on Frisch's log-barrier approach; see their Section 3 for details. In practice, the R package quantreg, with the aid of SparseM, can be used to compute the estimator efficiently.
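The structure of the constraint matrix $R$ above is simple enough to build explicitly. A sketch of its construction (illustrative only, not the quantreg implementation):

```python
def monotonicity_constraint_matrix(m, d):
    """Build the (m-1) x m(d+1) matrix R encoding alpha(tau_j) <= alpha(tau_{j+1})
    as Rb >= 0, where b = (b_1', ..., b_m')' and b_j = (alpha(tau_j), beta(tau_j)')'.
    Row j places -s' in block j and s' in block j+1, with s = (1, 0, ..., 0)'."""
    k = d + 1
    R = [[0.0] * (m * k) for _ in range(m - 1)]
    for j in range(m - 1):
        R[j][j * k] = -1.0        # -s' picks out -alpha(tau_j)
        R[j][(j + 1) * k] = 1.0   #  s' picks out +alpha(tau_{j+1})
    return R
```

$Rb$ is then the vector of successive intercept differences, so $Rb \geq 0$ is exactly quantile monotonicity; the slope coefficients are left unconstrained.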

5 Asymptotic properties

We impose the following assumptions, where $x$ stands for the evaluation point and $\mathcal{N}(x)$ for some small open neighborhood of $x$.

Assumption 1 The evaluation point $x \in \operatorname{supp}(f_X)$, with $f_X$ continuously differentiable at $x$ and $0 < f_X(x) < \infty$.

Assumption 2 (i) $f_{Y|X}(Q(\tau|x)|x)$ is Lipschitz continuous in $\tau$ over $\mathcal{T}$. (ii) There exist some finite constants $f_L > 0$, $f_U > 0$ and $\epsilon > 0$ such that
$$f_L \leq f_{Y|X}(Q(\tau|s) + \eta \,|\, s) \leq f_U$$
for all $|\eta| \leq \epsilon$, $s \in \mathcal{N}(x) \cap \operatorname{supp}(f_X)$ and $\tau \in \mathcal{T}$.

Assumption 3 (i) $Q(\tau|x)$ and $\partial Q(\tau|x)/\partial \tau$ are finite and Lipschitz continuous in $\tau$ over $\mathcal{T}$. (ii) The elements of $\partial^2 Q(\tau|s)/\partial s \partial s'$ are finite and Lipschitz continuous over the set $\{(s, \tau)\colon s \in \mathcal{N}(x) \cap \operatorname{supp}(f_X) \text{ and } \tau \in \mathcal{T}\}$. (iii) $\operatorname{tr}\big(\partial^2 Q(\tau|x)/\partial x \partial x'\big) > 0$ for all $\tau \in \mathcal{T}$.

Assumption 4 The kernel $K$ is compactly supported, bounded, has finite first-order derivatives, and satisfies
$$K(\cdot) \geq 0, \quad \int K(u)\,du = 1, \quad \int u K(u)\,du = 0, \quad \int u u' K(u)\,du = \mu_2(K) I_d,$$
where $\mu_2(K)$ is a positive scalar and $I_d$ is the $d \times d$ identity matrix.

Assumption 5 The bandwidth $h_{n,\tau}$ satisfies $h_{n,\tau} = c(\tau) h_n$ with $h_n = O(n^{-1/(4+d)})$ and $n h_n^d \to \infty$, where $c(\tau)$ is Lipschitz continuous and satisfies $0 < \underline{c} \leq c(\tau) \leq \bar{c} < \infty$ for all $\tau \in \mathcal{T}$.

Assumption 1 is fairly standard. Assumption 2 imposes restrictions on the conditional density. Part (i) requires the conditional density at $x$ to have a finite first-order derivative with respect to $\tau$ over $\mathcal{T}$. Part (ii) concerns a neighborhood surrounding $x$ and $\mathcal{T}$ and is used to ensure that local expansions of (3) are well behaved, so that the estimator has the usual rate of convergence. Assumption 3 imposes restrictions on the conditional quantile process. Part (i) requires it and its derivative in $\tau$ to vary smoothly with respect to $\tau$. Part (ii) ensures that the error in the Taylor expansion (1) is uniformly small. Part (iii) implies that the curvature of $Q(\tau|x)$ in $x$ is not zero, so that a linear model is inadequate. This part is needed only for the purpose of determining the bandwidth parameter.

Note that a common theme of Assumptions 2 and 3 is that they are local: they impose restrictions only on neighborhoods surrounding $x$ and $\mathcal{T}$. For example, if $\mathcal{T} = [0.5, 0.8]$, then the lower part of the conditional distribution and the extreme quantiles are left essentially unrestricted. The conditional densities can be unbounded or zero in such regions.

Assumption 4 permits univariate as well as multivariate kernels, although it rules out kernels with unbounded support, such as the Gaussian kernel, and kernels with unbounded derivatives. Assumption 5 imposes restrictions on the set of bandwidths, in particular requiring them to vary smoothly across quantiles. This is needed to ensure stochastic equicontinuity. It is not restrictive, and it is satisfied by the optimal bandwidth that minimizes the asymptotic MSE; see the section on bandwidth selection. Note that for a fixed $\tau$, we use the same bandwidth for every coordinate of $x$. This greatly simplifies the expressions for the bias and variance, especially when $x$ is a boundary point. More general formulas allowing for different bandwidths can be obtained by applying similar arguments.
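Assumption 5 ties the whole bandwidth family to a single rate. The following sketch makes the construction concrete; the factor c(.) used here is a hypothetical choice (in practice it comes out of the bandwidth selection rule of Section 6):

```python
def bandwidth_family(n, d, c, taus):
    """Quantile-dependent bandwidths h_{n,tau} = c(tau) * h_n with the
    rate h_n = n**(-1/(4+d)) of Assumption 5; c must be Lipschitz and
    bounded away from 0 and infinity on T."""
    h_n = n ** (-1.0 / (4 + d))
    return [c(tau) * h_n for tau in taus]

# Illustrative c(.): larger bandwidths toward the tails, where data are sparse.
hs = bandwidth_family(1000, 1, lambda t: 1.0 + abs(t - 0.5), [0.1, 0.5, 0.9])
```

Because every $h_{n,\tau}$ shares the rate $h_n$, varying the bandwidth across quantiles does not alter the convergence rate of the estimator.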

The main challenge in establishing the asymptotic properties is that, due to the constraints, the estimates $\tilde{\alpha}(\tau_j)$ for different $j$ can no longer be studied separately. Because we allow $m$ to grow with the sample size, the problem asymptotically involves an infinite number of inequality constraints. Our analysis takes an indirect approach, by first studying the unconstrained estimates from (2) and then linking them to the constrained estimates in (3). Specifically, we sequentially examine the properties of the following three quantities:

$\hat{\alpha}(\tau)$: the solution of (2) at a given $\tau \in \mathcal{T}$;

$\hat{\alpha}^*(\tau)$: the solution from the two-step procedure but without imposing the constraints;

$\tilde{\alpha}^*(\tau)$: the solution from the two-step procedure.

The results obtained in the intermediate steps are of independent interest, as discussed below.

supp(fX) and for x lying near the boundary. Motivated by Ruppert and Wand (1994), let

Ex;� = fz : h�1n;� (x � z) 2 supp(K)g and call x an interior point if Ex;� � supp(fX) for all

� 2 T . Otherwise, x will be called a boundary point. If x is a �xed point in the interior of

supp(fX), then x is an interior point for all large n. Therefore, to study the boundary point

case, we consider a sequence of points x = xn converging to a point x@ on the boundary of

supp(fX) su¢ ciently rapidly so that x is a boundary point for all n. Formally, we suppose

x satis�es

x = x@ + hn;�z for some � 2 T and some �xed z 2 supp (K):

We also de�ne the following set which serves as the domain of integration:

Dx;� = fz 2 Rd : (x+ hn;�z) 2 supp (fX)g [ supp (K)g:

The next Assumption is made to avoid degeneracies in the boundary point case and is the

same as in Ruppert and Wand (1994).

Assumption 6 There is a convex set C with nonnull interior and containing x@ such that

infx2C

f(x) > 0:

Theorem 1 (Uniform Bahadur representation) Let Assumptions 1-6 hold. Then the following results hold uniformly in $\tau \in \mathcal{T}$.

1. If $x$ is an interior point, then
\[
\sqrt{nh^d_{n,\tau}}\left(\hat\beta(\tau) - Q(\tau|x) - d_\tau h^2_{n,\tau}\right)
= \frac{(nh^d_{n,\tau})^{-1/2} \sum_{i=1}^n \left(\tau - 1(u_i^0(\tau) \le 0)\right) K_{i,\tau}}{f_X(x)\, f_{Y|X}(Q(\tau|x)|x)} + o_p(1),
\]
where
\[
d_\tau = \frac{1}{2}\mu_2(K)\,\mathrm{tr}\left(\frac{\partial^2 Q(\tau|x)}{\partial x \partial x'}\right), \quad
K_{i,\tau} = K\left(\frac{x_i - x}{h_{n,\tau}}\right), \quad
u_i^0(\tau) = y_i - Q(\tau|x_i).
\]

2. If $x$ is a boundary point, then
\[
\sqrt{nh^d_{n,\tau}}\left(\hat\beta(\tau) - Q(\tau|x) - d_{b,\tau} h^2_{n,\tau}\right)
= \frac{e_1' N_x(\tau)^{-1} (nh^d_{n,\tau})^{-1/2} \sum_{i=1}^n \left(\tau - 1(u_i^0(\tau) \le 0)\right) z_{i,\tau} K_{i,\tau}}{f_X(x)\, f_{Y|X}(Q(\tau|x)|x)} + o_p(1),
\]
where $e_1 = (1, 0_d')'$ and, for $u \in \mathbb{R}^d$, $\bar{u} = (1, u')'$,
\[
N_x(\tau) = \int_{D_{x,\tau}} \bar{u}\bar{u}' K(u)\, du, \quad
d_{b,\tau} = \frac{1}{2}\, e_1' N_x(\tau)^{-1} \int_{D_{x,\tau}} u' \frac{\partial^2 Q(\tau|x)}{\partial x \partial x'} u\, \bar{u} K(u)\, du.
\]

The result generalizes Theorem 2.1 in Koenker and Portnoy (1987) to the nonparametric setting. The proof faces two difficulties that do not arise in the parametric context. First, because the quantile function is approximated using a Taylor expansion, the effect of the remainder term needs to be explicitly accounted for. Second, we allow the bandwidth to vary across quantiles, which poses challenges for establishing stochastic equicontinuity. The main ingredients used in the proof are the decomposition of Knight (1998) and a chaining argument as in Bickel (1975). The maximal inequalities developed in Bai (1996) and Oka and Qu (2011) are also useful for our analysis. We note that this result is of independent interest because of its direct connection to inference on conditional quantile processes.

The above theorem implies that the limiting behavior of $\hat\beta(\tau)$ is driven by the leading term in the Bahadur representation. This paves the way for establishing the weak convergence of $\bar\beta(\tau)$, stated below.

Theorem 2 (Weak convergence of the unconstrained estimator) Let Assumptions 1-6 hold and let
\[
\bar\beta(\tau) = \lambda_n(\tau)\,\hat\beta(\tau_j) + (1 - \lambda_n(\tau))\,\hat\beta(\tau_{j+1}) \quad \text{if } \tau \in [\tau_j, \tau_{j+1}],
\]
where $\lambda_n(\tau) = (\tau_{j+1} - \tau)/(\tau_{j+1} - \tau_j)$. Suppose $m/(nh_n^d)^{1/4} \to \infty$ as $n \to \infty$. Then:

1. If $x$ is an interior point, then
\[
\sqrt{nh^d_{n,\tau} f_X(x)}\, f_{Y|X}(Q(\tau|x)|x)\left(\bar\beta(\tau) - Q(\tau|x) - d_\tau h^2_{n,\tau}\right) \Rightarrow G(\tau),
\]
where $G(\tau)$ is a zero-mean Gaussian process defined over $\mathcal{T}$ with continuous sample paths and covariance given by
\[
E(G(r)G(s)) = (\kappa(r)\kappa(s))^{-d/2}\, (r \wedge s - rs) \int K\left(\frac{u}{\kappa(r)}\right) K\left(\frac{u}{\kappa(s)}\right) du, \tag{8}
\]
where
\[
\kappa(\tau) = \frac{h_{n,\tau}}{h_{n,1/2}} = \frac{c(\tau)}{c(1/2)}.
\]

2. If $x$ is a boundary point, then
\[
\sqrt{nh^d_{n,\tau} f_X(x)}\, f_{Y|X}(Q(\tau|x)|x)\left(\bar\beta(\tau) - Q(\tau|x) - d_{b,\tau} h^2_{n,\tau}\right) \Rightarrow G_b(\tau),
\]
where $G_b(\tau)$ is a zero-mean Gaussian process defined over $\mathcal{T}$ with continuous sample paths and covariance given by
\[
E(G_b(r)G_b(s)) = (\kappa(r)\kappa(s))^{-d/2}\, (r \wedge s - rs)\; e_1' N_x(r)^{-1} T_x(r,s) N_x(s)^{-1} e_1, \tag{9}
\]
where
\[
T_x(r,s) = \int_{D_{x,1/2}} \begin{bmatrix} 1 \\ u/\kappa(r) \end{bmatrix} K\left(\frac{u}{\kappa(r)}\right) \begin{bmatrix} 1 & u'/\kappa(s) \end{bmatrix} K\left(\frac{u}{\kappa(s)}\right) du
\]
and $\kappa(r)$ and $N_x(r)$ have the same definitions as before.

Remark 5 The scalar Gaussian process $G(\tau)$ depends only on the kernel and the bandwidths. In the extreme case where the bandwidths are equal across quantiles, it is simply the Brownian bridge (restricted to the interval $\mathcal{T}$) multiplied by $(\int K^2(u)\, du)^{1/2}$. Stute (1986, Theorem 1) obtained such a re-scaled Brownian bridge as the weak limit of a nearest-neighbor-type conditional empirical process. In the boundary case, the structure of the process is similar, except that it also depends on the location of $x$ relative to the boundary of the data support.


Remark 6 The rate condition $m/(nh_n^d)^{1/4} \to \infty$ ensures that the linear interpolation induces no loss of efficiency asymptotically; i.e., the limiting process is the same as if all quantiles were estimated directly.

Remark 7 When formulating the result, we have purposely related the bandwidth at $\tau$ to the one at the median. This facilitates the discussion and computation of the bandwidth selection rule in the next section.
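The second-step interpolation in Theorem 2 can be sketched in a few lines of Python; the function name and the error handling are our own choices:

```python
def interpolate_quantiles(taus, betas, tau):
    """Second-step linear interpolation: for tau in [tau_j, tau_{j+1}],
    beta_bar(tau) = lam * beta(tau_j) + (1 - lam) * beta(tau_{j+1}),
    with lam = (tau_{j+1} - tau) / (tau_{j+1} - tau_j)."""
    for j in range(len(taus) - 1):
        if taus[j] <= tau <= taus[j + 1]:
            lam = (taus[j + 1] - tau) / (taus[j + 1] - taus[j])
            return lam * betas[j] + (1.0 - lam) * betas[j + 1]
    raise ValueError("tau lies outside the quantile grid")
```

For instance, with grid estimates 1.0 and 2.0 at quantiles 0.2 and 0.4, the interpolated value at 0.3 is 1.5.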

The next result presents the asymptotic properties of $\tilde\beta(\tau)$.

Theorem 3 (Weak convergence of the constrained estimator) Let Assumptions 1-6 hold. Then:

1. $\sup_{1 \le j \le m} \sqrt{nh^d_{n,\tau_j}}\; |\tilde\beta(\tau_j) - Q(\tau_j|x)| = O_p(1)$.

2. Assume $m/(nh_n^d)^{1/4} \to \infty$ but $m/(nh_n^d)^{1/2} \to 0$ as $n \to \infty$. Then
\[
\sqrt{nh^d_{n,\tau} f_X(x)}\, f_{Y|X}(Q(\tau|x)|x)\left(\tilde\beta(\tau) - Q(\tau|x) - d_\tau h^2_{n,\tau}\right) \Rightarrow G(\tau)
\]
if $x$ is an interior point, and
\[
\sqrt{nh^d_{n,\tau} f_X(x)}\, f_{Y|X}(Q(\tau|x)|x)\left(\tilde\beta(\tau) - Q(\tau|x) - d_{b,\tau} h^2_{n,\tau}\right) \Rightarrow G_b(\tau)
\]
if $x$ is a boundary point, where the relevant quantities have the same definitions as in the previous theorem.

Remark 8 The first result allows for arbitrary relationships between $m$ and $n$. The second result involves a new rate condition, $m/(nh_n^d)^{1/2} \to 0$. This is used to ensure that the constraints (6) act as a finite sample correction, having no effect on the first-order limiting distribution. The adequacy of the limiting distributions under different $m$ will be evaluated using simulations. It turns out that the property of $\tilde\beta(\tau)$ is rather insensitive to such choices, as long as $m$ is not too small. Motivated by this finding, we suggest the following simple rule:
\[
m = \max\left\{10,\ \sqrt{nh_n^d}/\log(nh_n^d)\right\},
\]
where the first argument safeguards against using too few quantiles when the sample size is relatively small and the second permits choosing a large number if the sample size is large.
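The rule above is straightforward to compute. A minimal sketch in Python; the function name and the rounding down to an integer are our own choices, as the text does not specify how to round:

```python
import math

def grid_size(n, h, d):
    """Grid-size rule from Remark 8: m = max{10, sqrt(n*h^d) / log(n*h^d)}."""
    nhd = n * h ** d  # effective local sample size n * h^d
    return max(10, int(math.sqrt(nhd) / math.log(nhd)))
```

For example, with n = 500, h = 0.3 and d = 2 the effective sample size is 45 and the safeguard of 10 quantiles binds; for much larger samples the second argument takes over.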


6 Bandwidth selection

Theorems 1-3 imply that, for any fixed $\tau \in \mathcal{T}$, the estimator $\tilde\beta(\tau)$ has the same limiting behavior as the estimator from the conventional quantile-by-quantile approach. Therefore, the same bandwidth selection rule can be applied. We state the result as a corollary.

Corollary 1 (Optimal bandwidth for an interior point) Let Assumptions 1-5 and those stated in Theorem 3 hold. Then the bandwidth that minimizes the (interior) asymptotic MSE of $\tilde\beta(\tau)$ for any $\tau \in \mathcal{T}$ is given by
\[
h_{n,\tau} = \left( \frac{\tau(1-\tau)\, d \int K(v)^2\, dv}{\mu_2^2(K)\, f_X(x)\, f_{Y|X}(Q(\tau|x)|x)^2 \left\{ \mathrm{tr}\left( \frac{\partial^2 Q(\tau|x)}{\partial x \partial x'} \right) \right\}^2 } \right)^{1/(4+d)} n^{-1/(4+d)}. \tag{10}
\]

In Appendix A, we also verify that the above bandwidth satisfies Assumption 5. The result implies that the bandwidth selection rule for estimating the conditional quantile process is conceptually no more difficult than in the conventional situation. The result also illustrates how the bandwidth changes as $\tau$ shifts away from the center of the distribution: it typically widens, because changes in $f_{Y|X}(Q(\tau|x)|x)$ often dominate the other terms.

In practice, the main challenge in computing the optimal bandwidth in the above corollary is estimating $\mathrm{tr}(\partial^2 Q(\tau|x)/\partial x \partial x')$. In the simulations and the empirical application, we have experimented with an approximation due to Yu and Jones (1998), which treats $\mathrm{tr}(\partial^2 Q(\tau|x)/\partial x \partial x')$ as constant across quantiles. Specifically, under such an approximation, the optimal bandwidth at $\tau$ is related to the one at the median via
\[
\left( \frac{h_{n,\tau}}{h_{n,1/2}} \right)^{4+d} = 4\tau(1-\tau) \left( \frac{f_{Y|X}(Q(\tfrac{1}{2}|x)|x)}{f_{Y|X}(Q(\tau|x)|x)} \right)^2.
\]
Next, applying a normal reference method (treating $f_{Y|X}$ as a Gaussian density) as in Yu and Jones (1998), the above relationship simplifies to
\[
\left( \frac{h_{n,\tau}}{h_{n,1/2}} \right)^{4+d} = \frac{2\tau(1-\tau)}{\pi\, \phi(\Phi^{-1}(\tau))^2}, \tag{11}
\]
where $\phi$ and $\Phi$ are the density and the cdf of a standard normal random variable. Finally, the bandwidth $h_{n,1/2}$ can be determined using the above corollary.
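The normal-reference relation (11) is easy to evaluate with the standard normal density and quantile function from the Python standard library. A minimal sketch; the function name is our own:

```python
import math
from statistics import NormalDist

def bandwidth_at_tau(h_median, tau, d):
    """Relate the median bandwidth to the bandwidth at quantile tau via
    rule (11): (h_tau / h_med)^(4+d) = 2*tau*(1-tau) / (pi * phi(Phi^{-1}(tau))^2)."""
    z = NormalDist().inv_cdf(tau)      # Phi^{-1}(tau)
    phi = NormalDist().pdf(z)          # standard normal density at z
    ratio = 2.0 * tau * (1.0 - tau) / (math.pi * phi ** 2)
    return h_median * ratio ** (1.0 / (4 + d))
```

At tau = 0.5 the ratio equals one, so the median bandwidth is returned unchanged, and the bandwidth widens as tau moves into the tails.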


7 Uniform confidence bands and hypothesis tests

7.1 Confidence bands

Corollary 2 (Uniform confidence band for the conditional quantile process) Let Assumptions 1-6 and those stated in Theorem 3 hold. Then an asymptotic $(1-p)$ confidence band for $Q(\tau|x)$ is given as follows.

1. If $x$ is an interior point,
\[
C_p = \left[ \left( \tilde\beta(\tau) - d_\tau h^2_{n,\tau} \right) - \sigma_{n,\tau}(x) Z_p,\ \left( \tilde\beta(\tau) - d_\tau h^2_{n,\tau} \right) + \sigma_{n,\tau}(x) Z_p \right], \tag{12}
\]
where
\[
\sigma_{n,\tau}(x) = \frac{1}{\sqrt{nh^d_{n,\tau} f_X(x)}\; f_{Y|X}(Q(\tau|x)|x)}
\]
and $Z_p$ is the $(1-p)$-th percentile of $\sup_{\tau \in \mathcal{T}} |G(\tau)|$.

2. If $x$ is a boundary point,
\[
C_p = \left[ \left( \tilde\beta(\tau) - d_{b,\tau} h^2_{n,\tau} \right) - \sigma_{n,\tau}(x) Z_p,\ \left( \tilde\beta(\tau) - d_{b,\tau} h^2_{n,\tau} \right) + \sigma_{n,\tau}(x) Z_p \right],
\]
where $\sigma_{n,\tau}(x)$ is the same as above and $Z_p$ is now the $(1-p)$-th percentile of $\sup_{\tau \in \mathcal{T}} |G_b(\tau)|$.

The critical values of $\sup_{\tau \in \mathcal{T}} |G(\tau)|$ and $\sup_{\tau \in \mathcal{T}} |G_b(\tau)|$ can be consistently estimated via simulations. Consider the interior point case first. From Theorems 1 and 2,
\[
\frac{(nh^d_{n,\tau})^{-1/2} \sum_{i=1}^n \left(\tau - 1(u_i^0(\tau) \le 0)\right) K_{i,\tau}}{f_X(x)^{1/2}} \Rightarrow G(\tau).
\]
Therefore, it suffices to simulate the left-hand-side process. The key observation is that, conditional on $\{x_i\}_{i=1}^n$, the numerator has the same finite sample distribution as
\[
(nh^d_{n,\tau})^{-1/2} \sum_{i=1}^n \left(\tau - 1(u_i - \tau \le 0)\right) K_{i,\tau},
\]
with $u_i$ being i.i.d. Uniform(0,1) and independent of $\{x_i\}_{i=1}^n$. This result follows from the Skorohod representation (letting $u_i = F_{Y|X}(y_i|x_i)$):
\[
1(u_i^0(\tau) \le 0) \Leftrightarrow 1(y_i - F^{-1}_{Y|X}(\tau|x_i) \le 0) \Leftrightarrow 1(F_{Y|X}(y_i|x_i) - \tau \le 0) \Leftrightarrow 1(u_i - \tau \le 0)
\]
and the independence of $\{(x_i, y_i)\}_{i=1}^n$. Let $\hat{f}_X(x)$ be a consistent estimate of $f_X(x)$; then
\[
\frac{(nh^d_{n,\tau})^{-1/2} \sum_{i=1}^n \left(\tau - 1(u_i - \tau \le 0)\right) K_{i,\tau}}{\hat{f}_X(x)^{1/2}} \Rightarrow G(\tau). \tag{13}
\]
This suggests the following simple procedure: (1) simulate the left-hand side of (13) by drawing i.i.d. Uniform(0,1) random variables while keeping $K_{i,\tau}$ and $\hat{f}_X(x)$ fixed; (2) repeat the first step for a large number of replications; (3) compute the desired functionals and sort them to obtain the critical values.
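The three steps above can be sketched for the simplest configuration: an interior point, a scalar regressor (d = 1), a common bandwidth across quantiles, and an Epanechnikov kernel. All names, and the use of a plain kernel density estimate for $f_X(x)$, are our own choices:

```python
import math, random

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def sup_G_critical_value(x_obs, x0, h, taus, p=0.05, reps=2000, seed=0):
    """Simulate the (1-p) quantile of sup_tau |G(tau)| by repeatedly drawing
    i.i.d. Uniform(0,1) variates in the left-hand side of (13), while the
    kernel weights K_i and the density estimate fhat_X(x0) stay fixed."""
    rng = random.Random(seed)
    n = len(x_obs)
    K = [epanechnikov((xi - x0) / h) for xi in x_obs]
    f_hat = sum(K) / (n * h)                      # kernel estimate of f_X(x0)
    scale = math.sqrt(n * h) * math.sqrt(f_hat)   # (n h)^{1/2} * fhat^{1/2}
    sups = []
    for _ in range(reps):
        u = [rng.random() for _ in range(n)]
        stat = max(abs(sum((t - (ui <= t)) * k for ui, k in zip(u, K))) / scale
                   for t in taus)
        sups.append(stat)
    sups.sort()
    return sups[int((1.0 - p) * reps) - 1]        # empirical (1-p) quantile
```

The returned value plays the role of $Z_p$ in Corollary 2; a multivariate version would replace the scalar kernel weights accordingly.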

The same idea can be applied in the boundary point case to consistently estimate the critical values of $\sup_{\tau \in \mathcal{T}} |G_b(\tau)|$. Specifically, from the second result in Theorems 1 and 2,
\[
\frac{e_1' N_x(\tau)^{-1} (nh^d_{n,\tau})^{-1/2} \sum_{i=1}^n \left(\tau - 1(u_i^0(\tau) \le 0)\right) z_{i,\tau} K_{i,\tau}}{f_X(x)^{1/2}} \Rightarrow G_b(\tau).
\]
Because (see Equation (A.11) in Appendix A)
\[
(nh^d_{n,\tau})^{-1} \sum_{i=1}^n z_{i,\tau} z_{i,\tau}' K_{i,\tau} \to_p f_X(x) N_x(\tau) \tag{14}
\]
and using the Skorohod representation, we have
\[
\hat{f}_X(x)^{1/2}\; e_1' \left( (nh^d_{n,\tau})^{-1} \sum_{i=1}^n z_{i,\tau} z_{i,\tau}' K_{i,\tau} \right)^{-1} (nh^d_{n,\tau})^{-1/2} \sum_{i=1}^n \left(\tau - 1(u_i - \tau \le 0)\right) z_{i,\tau} K_{i,\tau} \Rightarrow G_b(\tau), \tag{15}
\]
where $\hat{f}_X(x)$ is a consistent estimate of $f_X(x)$. Therefore, we can simply simulate from the left-hand-side distribution. The resulting critical values are asymptotically valid even if $x$ is an interior point. Because (15) automatically accounts for the relative location of the evaluation point from the boundary, it allows us to easily handle situations where the support of $f_X$ may have a complicated shape.

The above procedure is inspired by Parzen, Wei and Ying (1994) and Chernozhukov, Hansen and Jansson (2009). The former paper exploited the conditionally pivotal property to obtain a resampling method, while the latter used such a property to conduct finite sample inference for quantile regressions. In Belloni, Chernozhukov and Fernández-Val (2011), the above idea is generalized and applied to quantile models of increasing dimensions. The application of the conditionally pivotal property to the local regression setting and, more importantly, to inference at both interior and boundary points appears to be new.

Now consider estimating the bias. We examine the boundary case directly, as it again includes the interior point case as a special case. Note that from Theorem 1,
\[
d_{b,\tau} = e_1' \left\{ f_X(x) N_x(\tau) \right\}^{-1} \left\{ \frac{1}{2} f_X(x) \int_{D_{x,\tau}} u' \frac{\partial^2 Q(\tau|x)}{\partial x \partial x'} u\, \bar{u} K(u)\, du \right\}.
\]
The term inside the first curly bracket can be consistently estimated by (14). The second curly bracket is the limit of (see Equation (A.14) in Appendix A)
\[
- h^{-2}_{n,\tau} \left(nh^d_{n,\tau}\right)^{-1} \sum_{i=1}^n e_i(\tau)\, z_{i,\tau} K_{i,\tau},
\]
where
\[
e_i(\tau) = -\frac{1}{2} \left( \frac{x_i - x}{h_{n,\tau}} \right)' \frac{\partial^2 Q(\tau|x)}{\partial x \partial x'} \left( \frac{x_i - x}{h_{n,\tau}} \right) h^2_{n,\tau}.
\]
Therefore, $d_{b,\tau}$ can be estimated using
\[
e_1' \left( (nh^d_{n,\tau})^{-1} \sum_{i=1}^n z_{i,\tau} z_{i,\tau}' K_{i,\tau} \right)^{-1} \left( - h^{-2}_{n,\tau} \left(nh^d_{n,\tau}\right)^{-1} \sum_{i=1}^n \hat{e}_i(\tau)\, z_{i,\tau} K_{i,\tau} \right). \tag{16}
\]
In (16), the only unknown is $\partial^2 Q(\tau|x)/\partial x \partial x'$. As suggested in the literature, this can be estimated using a global or local cubic regression, although caution needs to be exercised because the estimate can be imprecise. As before, (16) automatically accounts for the relative location of the evaluation point from the boundary, allowing us to easily handle situations where the support of $f_X$ may have a complicated shape.

Remark 9 We use the formulas (15) and (16) throughout the simulation section when constructing the confidence band. Note that, as a by-product, it is no longer necessary to estimate $f_X(x)$: it appears in both $\sigma_{n,\tau}(x)$ and (15) and cancels out. This feature is desirable because directly estimating $f_X(x)$ can be challenging when the dimension of $x$ is high.


Remark 10 It is well known that the bias term in the estimator is difficult to estimate. Consequently, the confidence bands can undercover. Therefore, some modification that reflects the estimation uncertainty is desirable. We suggest the following simple modified confidence band, where the idea is to allow for, but not force, a bias adjustment. (Consider the interior point case; the boundary point case can be handled in the same way.)
\[
\left[ \left( \tilde\beta(\tau) - d^+_\tau h^2_{n,\tau} \right) - \sigma_{n,\tau}(x) Z_p,\ \left( \tilde\beta(\tau) - d^-_\tau h^2_{n,\tau} \right) + \sigma_{n,\tau}(x) Z_p \right], \tag{17}
\]
where
\[
d^+_\tau = \max(d_\tau, 0) \quad \text{and} \quad d^-_\tau = \min(d_\tau, 0).
\]
This modified band has the same or higher coverage than the band that makes no bias adjustment (i.e., that sets $d_\tau$ to zero, as is often done in the literature), which is preferable when the bias is small. It also has the same or higher coverage than the conventional band (12), which is preferable when the bias is large. Therefore, it has the flavor of a hybrid procedure. An important concern here is whether the band becomes too wide to be informative. Our simulation evidence in Section 9 suggests otherwise: when the curvature of the conditional quantile function ($d_\tau$) is high, the proposed bandwidth selection rule delivers a small bandwidth, so the value of $d_\tau h^2_{n,\tau}$ remains modest. Consequently, (17) is typically only mildly wider than (12).
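A minimal sketch of the modified band (17), taking as inputs the point estimate, the estimated bias coefficient, the bandwidth, the scale $\sigma_{n,\tau}(x)$ and the simulated critical value from the earlier steps; the function name is our own:

```python
def modified_band(q_tilde, d_tau, h, sigma, z_p):
    """Modified uniform band (17): allow for, but do not force, a bias
    adjustment by splitting d_tau into its positive and negative parts."""
    d_plus, d_minus = max(d_tau, 0.0), min(d_tau, 0.0)
    lower = q_tilde - d_plus * h ** 2 - sigma * z_p
    upper = q_tilde - d_minus * h ** 2 + sigma * z_p
    return lower, upper
```

When d_tau = 0 the band reduces to the unadjusted one; otherwise it only extends the band on the side the bias points to.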

7.2 Hypothesis tests

We illustrate testing hypotheses on quantile processes using the two examples in Section 2. Without loss of generality, assume the evaluation points are interior points. To be consistent with the notation used there, we let $\hat{Q}(\tau|x)$ denote $\tilde\beta(\tau)$ when the evaluation point is $x$.

Example 1 (continued). Treatment significance can be tested using a Kolmogorov-Smirnov (KS) type test:
\[
\sup_{\tau \in \mathcal{T}} \sqrt{nh^d_{n,\tau}}\; \left| \hat{Q}(\tau|1,z) - \hat{Q}(\tau|0,z) \right|.
\]
Let $x_1 = (1, z)$, $x_2 = (0, z)$, and define
\[
G^{(j)}_x(\tau) = \frac{G^{(j)}(\tau)}{\sqrt{f_X(x)}\; f_{Y|X}(Q(\tau|x)|x)},
\]

with $G^{(j)}(\tau)$ being independent copies of $G(\tau)$. Assume Assumptions 1-5 hold at $x_1$ and $x_2$ and $\partial^2 Q(\tau|x_1)/\partial x \partial x' = \partial^2 Q(\tau|x_2)/\partial x \partial x'$. Then the test has the following null limiting distribution:
\[
\sup_{\tau \in \mathcal{T}} \left| G^{(1)}_{x_1}(\tau) - G^{(2)}_{x_2}(\tau) \right|,
\]
whose critical values can be consistently estimated by applying the simulation algorithm described earlier to the first and second components. The treatment homogeneity hypothesis can be tested using
\[
\sup_{\tau \in \mathcal{T}} \sqrt{nh^d_{n,\tau}} \left| \hat{Q}(\tau|1,z) - \hat{Q}(\tau|0,z) - \int_{\mathcal{T}} \left\{ \hat{Q}(s|1,z) - \hat{Q}(s|0,z) \right\} ds \right|,
\]
whose null limiting distribution, under the same assumptions as stated above, is given by
\[
\sup_{\tau \in \mathcal{T}} \left| G^{(1)}_{x_1}(\tau) - G^{(2)}_{x_2}(\tau) - \int_{\mathcal{T}} \left( G^{(1)}_{x_1}(s) - G^{(2)}_{x_2}(s) \right) ds \right|,
\]
whose critical values can be obtained in a similar manner.
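On a quantile grid, both statistics reduce to simple maxima. A hedged sketch, assuming a common bandwidth across quantiles and using the grid average of the difference as a stand-in for the integral over $\mathcal{T}$ (exact up to the length of $\mathcal{T}$); names are our own:

```python
import math

def ks_stats(q1, q0, n, h, d):
    """KS-type statistics on a quantile grid:
    'significance' = sup_tau sqrt(n*h^d) |Q1(tau) - Q0(tau)|;
    'homogeneity' recenters the difference by its grid average first."""
    scale = math.sqrt(n * h ** d)
    diff = [a - b for a, b in zip(q1, q0)]
    mean = sum(diff) / len(diff)
    significance = max(scale * abs(v) for v in diff)
    homogeneity = max(scale * abs(v - mean) for v in diff)
    return significance, homogeneity
```

Under the null of no treatment effect both statistics are small; a constant quantile treatment effect drives the first statistic but not the second.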

Example 2 (continued). The null of first-order conditional stochastic dominance can be tested against a non-dominance alternative using a signed version of the KS test:
\[
\sup_{\tau \in \mathcal{T}} \sqrt{nh^d_{n,\tau}}\; \left| 1\left( \hat{Q}_1(\tau|x) - \hat{Q}_2(\tau|x) \le 0 \right) \left( \hat{Q}_1(\tau|x) - \hat{Q}_2(\tau|x) \right) \right|.
\]
Assume Assumptions 1-5 hold for both samples, and $\partial^2 Q_1(\tau|x)/\partial x \partial x' = \partial^2 Q_2(\tau|x)/\partial x \partial x'$. Then, under the least favorable null hypothesis, the test has the following limiting distribution:
\[
\sup_{\tau \in \mathcal{T}} \left| 1\left( G^{(1)}_x(\tau) - G^{(2)}_x(\tau) \le 0 \right) \left( G^{(1)}_x(\tau) - G^{(2)}_x(\tau) \right) \right|.
\]
Second-order dominance can be tested via
\[
\sup_{\tau \in \mathcal{T}} \sqrt{nh^d_{n,\tau}}\; \left| 1\left( \int_{\varepsilon}^{\tau} \left( \hat{Q}_1(s|x) - \hat{Q}_2(s|x) \right) ds \le 0 \right) \int_{\varepsilon}^{\tau} \left( \hat{Q}_1(s|x) - \hat{Q}_2(s|x) \right) ds \right|;
\]
then, under the least favorable null hypothesis, the test converges to
\[
\sup_{\tau \in \mathcal{T}} \left| 1\left( \int_{\varepsilon}^{\tau} \left( G^{(1)}_x(s) - G^{(2)}_x(s) \right) ds \le 0 \right) \int_{\varepsilon}^{\tau} \left( G^{(1)}_x(s) - G^{(2)}_x(s) \right) ds \right|.
\]
In the above construction, the lower limit of integration is some positive constant $\varepsilon$ instead of zero. This allows the conditional distributions to have unbounded support, at the cost of possibly sacrificing some power if the main differences between the two distributions lie in the lower tails. Again, the critical values can be consistently estimated via simulations.
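The first-order statistic can be sketched on a quantile grid, again assuming a common bandwidth for simplicity (names are our own). Only negative parts of the estimated difference count against the dominance null:

```python
import math

def signed_ks_stat(q1, q2, n, h, d):
    """Signed KS statistic for first-order dominance: under the null
    Q1(tau|x) >= Q2(tau|x), so only min(Q1 - Q2, 0) contributes."""
    scale = math.sqrt(n * h ** d)
    return max(scale * abs(min(a - b, 0.0)) for a, b in zip(q1, q2))
```

If the first estimated quantile curve lies everywhere above the second, the statistic is exactly zero.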

8 Connection to rearrangement

Chernozhukov, Fernández-Val and Galichon (2010) proposed a generic framework for estimating monotone probability and quantile curves. Their method starts with a preliminary estimator that is not necessarily monotonic and applies rearrangement to ensure monotonicity. Importantly, they showed that if the preliminary estimator satisfies a functional central limit theorem, say
\[
a_n \left( \hat{Q}(\tau|x) - Q(\tau|x) \right) \Rightarrow \bar{G}_x(\tau) \quad \text{over } \tau \in \mathcal{T},
\]
with $a_n \to \infty$ as $n \to \infty$ and $\bar{G}_x(\tau)$ being a continuous stochastic process, then the rearranged estimator $\hat{Q}^R(\tau|x)$ satisfies
\[
a_n \left( \hat{Q}^R(\tau|x) - Q(\tau|x) \right) \Rightarrow \bar{G}_x(\tau) \quad \text{over } \tau \in \mathcal{T},
\]
provided $Q(\tau|x)$ is strictly monotonic in $\tau$ and continuously differentiable in both arguments. The next result shows that $\hat{Q}(\tau|x) = \bar\beta(\tau)$ suffices as such a preliminary estimator.

Corollary 3 Suppose Assumptions 1-6 hold and $h_{n,\tau} = c(\tau) n^{-1/(4+d)}$. Let $\hat{Q}^R(\tau|x)$ be the rearranged version of $\bar\beta(\tau)$, with the latter defined in Theorem 2. Then the results in Theorem 2 hold with $\bar\beta(\tau)$ replaced by $\hat{Q}^R(\tau|x)$.
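On a finite quantile grid, the rearrangement operator amounts to sorting the estimated curve into increasing order; a one-line sketch:

```python
def rearrange(q_values):
    """Rearrangement on a quantile grid: sort the (possibly non-monotone)
    estimated quantile curve in increasing order, yielding a monotone curve
    with the same first-order asymptotics."""
    return sorted(q_values)
```

Any crossing segment is thereby swapped into the order a genuine quantile function must have.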

9 Monte Carlo experiments

We focus on three issues: (1) the performance of the bandwidth selection rule, (2) the performance of the proposed estimator relative to other estimators, including the standard local linear estimator and its rearranged version, and (3) the properties of the confidence band.

We consider the following three models, each expressed in terms of its conditional quantile function:
\[
\text{Model 1}: \quad Q(\tau|x) = x_1 - x_2 + (0.5 x_1 + 0.3 x_2)\, Q_{e_1}(\tau).
\]
\[
\text{Model 2}: \quad Q(\tau|x) = \left(0.5 + 2 x_1 + \sin(2\pi x_1 - 0.5)\right) + x_2\, Q_{e_1}(\tau).
\]
\[
\text{Model 3}: \quad Q(\tau|x) = \log(x_1 x_2) + 1/\left(1 + \exp\left(-x_1 Q_{e_1}(\tau) - x_2 Q_{e_2}(\tau)\right)\right) + x_2\, Q_{e_1}(\tau).
\]
The regressors $x_1$ and $x_2$ are i.i.d. U(0,1). The error terms $e_1$ and $e_2$ are i.i.d. N(0,1) and U(0,1) respectively, with quantile functions given by $Q_{e_1}(\tau) = \Phi^{-1}(\tau)$ and $Q_{e_2}(\tau) = \tau$. Model 1 is a linear location-scale model. Model 2 is similar but with nonlinearity in the location. Model 3 is a fairly complicated nonlinear model. The data are simulated by first obtaining $n$ independent draws $u_i$ ($1 \le i \le n$) from the U(0,1) distribution and then setting $Y_i = Q(u_i|x_i)$. For instance, in Model 3, the data are generated via $Y_i = Q(u_i|x_i) = \log(x_{1i} x_{2i}) + 1/(1 + \exp(-x_{1i} Q_{e_1}(u_i) - x_{2i} Q_{e_2}(u_i))) + x_{2i} Q_{e_1}(u_i)$ with $u_i$ being i.i.d. U(0,1). Figure 1 depicts $Q(0.5|x)$ and $Q(0.8|x)$ for Models 2 and 3. It shows that these two models exhibit significant curvature in $x$, which poses substantial challenges for estimation.
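The data-generating step can be sketched for Model 1, the simplest of the three; the function name and the seeding are our own choices:

```python
import random
from statistics import NormalDist

def draw_model1(n, seed=0):
    """Simulate Model 1: draw x1, x2, u ~ U(0,1) independently and set
    Y = Q(u|x) = x1 - x2 + (0.5*x1 + 0.3*x2) * Phi^{-1}(u)."""
    rng = random.Random(seed)
    inv = NormalDist().inv_cdf
    data = []
    for _ in range(n):
        x1, x2, u = rng.random(), rng.random(), rng.random()
        y = x1 - x2 + (0.5 * x1 + 0.3 * x2) * inv(u)
        data.append((x1, x2, y))
    return data
```

Because u is uniform and Q(.|x) is a quantile function, Y has conditional quantile function exactly Q(tau|x); Models 2 and 3 only change the formula for y.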

Other aspects of the simulation design are as follows. We examine three evaluation points: x = (0.5, 0.5), (0.75, 0.75), and (0.9, 0.9). The last can be regarded as a boundary point, since in a typical simulation run the bandwidth at that point is greater than 0.1. The sample sizes are n = 250 and 500. Given that the experiment involves estimating the quantile process nonparametrically at x = (0.9, 0.9), n = 250 should be viewed as a very small sample size. We set $\mathcal{T} = [0.2, 0.8]$ unless stated otherwise. The number of quantiles (m) in the quantile grid is set to 10, 20 and 30 to examine the sensitivity of the results. The kernel function is the product of univariate Epanechnikov kernels. We also experimented with the product Gaussian kernel and obtained qualitatively and quantitatively similar results; they are omitted to save space. All subsequent results are based on 500 simulation runs.


9.1 Performance of the bandwidth selection rule

We first provide some details on estimation. The optimal bandwidths given in Corollary 1 are estimated in three steps.

STEP 1. We obtain a set of pilot bandwidths in order to calculate the relevant quantities in (10). This is done by first obtaining a pilot bandwidth for the local median regression, determined via leave-one-out cross validation, and then relating it to the other quantiles using (11), following Yu and Jones (1998).

STEP 2. We estimate $\mathrm{tr}(\partial^2 Q(\tau|x)/\partial x \partial x')$ and $f_{Y|X}(\cdot|x)$. For the former, we apply a local cubic polynomial regression with the bandwidth determined via leave-one-out cross validation. For the latter, we use
\[
\hat{f}_{Y|X}(z|x) = \int \frac{1}{h_{yx}} K_1\left( \frac{z - y}{h_{yx}} \right) d\hat{F}(y|x), \tag{18}
\]
where $h_{yx}$ is a bandwidth parameter, $K_1$ is a univariate Epanechnikov kernel, and $\hat{F}(y|x) = \sup\{\tau \in (0,1) : \hat{Q}(\tau|x) \le y\}$ with $\hat{Q}(\tau|x)$ equal to our proposed two-step estimator $\tilde\beta(\tau)$ computed using the bandwidths from Step 1. To compute (18), we take $\hat{F}(y|x)$, draw random samples from it, and apply kernel smoothing to the generated random variates. The bandwidth $h_{yx}$ is determined by Silverman's rule-of-thumb method. In doing so, we find that some degree of over-smoothing is useful; we therefore use the bandwidth $c h_{yx}$ with c = 2 for all the reported results. Under suitable regularity conditions, it can be shown that $\hat{f}_{Y|X}(\cdot|x)$ converges uniformly to $f_{Y|X}(\cdot|x)$, due to the uniform convergence of $\hat{Q}(\cdot|x)$, using the arguments in Portnoy and Koenker (1989, Lemma 3.2).
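The density step (18) can be sketched for a scalar evaluation point. The interpolation of the estimated quantile function over the grid and the Monte Carlo integration are our own simplifications:

```python
import random

def epa(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def fhat_y_given_x(z, taus, qhat, h, reps=4000, seed=0):
    """Sketch of (18): draw tau uniformly over the grid range, map it through
    the estimated quantile function (linear interpolation over (taus, qhat)),
    then kernel-smooth the resulting draws at z with bandwidth h."""
    rng = random.Random(seed)
    total = 0.0
    width = taus[-1] - taus[0]
    for _ in range(reps):
        t = taus[0] + rng.random() * width
        j = max(i for i in range(len(taus) - 1) if taus[i] <= t)
        lam = (t - taus[j]) / (taus[j + 1] - taus[j])
        y = (1.0 - lam) * qhat[j] + lam * qhat[j + 1]
        total += epa((z - y) / h)
    return total / (reps * h)
```

With qhat equal to the identity on [0, 1] (a conditional U(0,1) law), the estimate at the midpoint is close to one, as it should be.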

STEP 3. We use the estimated quantities from Step 2 to compute the optimal bandwidth for the median using (10) and relate it to the bandwidths for the other quantiles using (11).

The selected bandwidths are summarized in Tables 1 (for n = 250) and 2 (for n = 500). The results for both cases are qualitatively similar. The procedure is able to adapt to different degrees of curvature in $Q(\tau|x)$. It delivers larger bandwidths for Model 1 relative to Models 2 and 3, which is desirable since Model 1 is linear in $x$. Within a given model, it can deliver different bandwidths at different design points. For example, in Model 2, the selected bandwidths at (0.5, 0.5) are greater than those at (0.75, 0.75). This is consistent with $\mathrm{tr}(\partial^2 Q(\tau|x)/\partial x \partial x')$ being of greater magnitude at the latter point. Similar results hold for Model 3.

9.2 The relative performance of different estimators

We contrast the finite sample performance of the proposed estimator (labelled Proposed) with three alternatives. The first alternative is the classical estimator of Koenker and Bassett (1978), labelled QR. This comparison is used to illustrate the gain or loss from considering a nonparametric estimator. The second alternative is the standard quantile-by-quantile local linear estimator (labelled Local linear), which is included to illustrate the effect of the monotonicity constraint. The third alternative is the rearranged version of the local linear estimator (labelled Rearrangement), as discussed in Section 8. This is used to gauge whether one approach is superior to the other. For all the estimators except QR, we use the same bandwidth selection rule described previously.

We report the RMISE averaged over 500 simulation runs, with
\[
\mathrm{RMISE} = \sqrt{ \frac{1}{m} \sum_{j=1}^m \left| \hat{Q}(\tau_j|x) - Q(\tau_j|x) \right|^2 },
\]
where $m$ is the number of quantiles to be estimated. Table 3 reports the results for m = 30. We obtained similar findings with m = 10 and 20; they are omitted to save space.
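The criterion is computed directly from the estimated and true quantile curves on the grid; a minimal sketch (the function name is ours):

```python
import math

def rmise(q_hat, q_true):
    """Root mean integrated squared error over the quantile grid:
    sqrt((1/m) * sum_j |Qhat(tau_j|x) - Q(tau_j|x)|^2)."""
    m = len(q_hat)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q_hat, q_true)) / m)
```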

First, consider the proposed estimator and QR. When the underlying model is actually linear (Model 1), the proposed estimator pays only a modest price in terms of efficiency, because the bandwidth selection rule proposed earlier delivers a wide bandwidth in this case. When nonlinearity is present (Models 2 and 3), the nonparametric estimator shows substantially superior performance overall. The results are therefore quite encouraging.

Second, consider the proposed estimator and the standard quantile-by-quantile local linear estimator. Note that the fraction of quantile crossings is listed below the local linear estimator. Quantile crossing is relatively infrequent at interior points; however, it occurs frequently when the design point is close to the boundary. The proposed estimator often has a smaller RMISE, but the difference is always small. This is consistent with Theorem 3, which implies that these two estimators are first-order asymptotically equivalent. In our simulations, the monotonicity constraint therefore serves as a way to ensure a coherent estimate, not as a way to substantially improve precision in finite samples.

Third, consider the proposed estimator and the rearranged estimator. They have very similar RMISE, confirming the result in Corollary 3 that the two estimators are first-order asymptotically equivalent. Interestingly, the RMISE of the rearranged estimator tends to be slightly smaller. However, the difference is too small to prefer one estimator over the other.

One might wonder whether the above conclusions depend on the criterion function (RMISE). To address this, we repeated the analysis replacing RMISE with the integrated absolute deviation; the findings are quantitatively very similar and are omitted to save space. Another issue is whether the differences between the estimators become substantially greater when the quantile range is increased. To this end, we repeated the analysis with $\mathcal{T} = [0.05, 0.95]$; the results remain qualitatively the same and are likewise omitted to save space.

9.3 Properties of the uniform confidence band

We examine two issues: whether the modified confidence band (17) shows a significant improvement over the conventional one (12), and whether the improvement comes at the cost of a substantially wider band.

To obtain the confidence band, we need critical values of $\sup_{\tau \in \mathcal{T}} |G(\tau)|$ and $\sup_{\tau \in \mathcal{T}} |G_b(\tau)|$ and the biases $d_\tau$ and $d_{b,\tau}$. These are estimated using the method outlined in Section 7, with the conditional density recomputed using (18) but with the new $\tilde\beta(\tau)$ estimated using the optimal bandwidth.

Tables 4 and 5 contrast the coverage ratios under the two scenarios. The modification delivers adequate coverage for all models, sample sizes, quantile grids and design points. This includes the challenging situation of estimating the quantile process at x = (0.9, 0.9) with n = 250. The most significant improvement comes under Model 1. For this model, the selected bandwidths are large, so the bias adjustment plays a critical role. Since the true bias is zero, the conventional bias correction can only have detrimental effects on the coverage ratio. For the other two models, the improvement is more important when the sample size is small or when $x$ is close to the boundary of the data support.

Tables 6 and 7 summarize the relative widths of the two confidence bands at two representative quantiles, $\tau = 0.5$ and $0.8$. For each simulation replication, we compute the ratios and then report their means and standard deviations over the 500 replications. Table 6 corresponds to n = 250. For Model 1, the means of the ratios are between 1.095 and 1.195; for Model 2, between 1.061 and 1.157; for Model 3, between 1.081 and 1.153. Table 7 corresponds to n = 500. The corresponding ratios are between 1.085 and 1.179 for Model 1, 1.055 and 1.152 for Model 2, and 1.051 and 1.152 for Model 3. Therefore, the modification only mildly increases the widths of the bands. Overall, the simulations suggest that the modification is useful.

10 Conclusion

We have considered the estimation of, and inference about, a nonparametrically specified conditional quantile process. The estimation method is computationally simple to implement and is practically feasible even for relatively large data sets. We obtained a uniform Bahadur representation and a functional central limit theorem, and provided practical procedures for constructing uniform confidence bands and testing hypotheses about the quantile process. We also proposed a modified procedure for bias adjustment, which performed well in the simulations.


References

Abadie, A., J. Angrist, and G. Imbens (2002). Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica 70(1), 91–117.

Angrist, J., V. Chernozhukov, and I. Fernández-Val (2006). Quantile regression under misspecification, with an application to the U.S. wage structure. Econometrica 74(2), 539–563.

Bai, J. (1996). Testing for parameter constancy in linear regressions: An empirical distribution function approach. Econometrica 64(3), 597–622.

Barrett, G. F. and S. G. Donald (2003). Consistent tests for stochastic dominance. Econometrica 71(1), 71–104.

Belloni, A., V. Chernozhukov, and I. Fernández-Val (2011). Conditional quantile processes based on series or many regressors. Working Paper, Boston University.

Bhattacharya, P. K. (1963). On an analog of regression analysis. The Annals of Mathematical Statistics 34(4), 1459–1473.

Bickel, P. J. (1975). One-step Huber estimates in the linear model. Journal of the American Statistical Association 70(350), 428–434.

Billingsley, P. (1968). Convergence of Probability Measures. Wiley.

Bondell, H. D., B. J. Reich, and H. Wang (2010). Noncrossing quantile regression curve estimation. Biometrika 97(4), 825–838.

Chaudhuri, P. (1991). Nonparametric estimates of regression quantiles and their local Bahadur representation. The Annals of Statistics 19(2), 760–777.

Chernozhukov, V. and I. Fernández-Val (2005). Subsampling inference on quantile regression processes. Sankhyā: The Indian Journal of Statistics 67(2), 253–276.

Chernozhukov, V., I. Fernández-Val, and A. Galichon (2010). Quantile and probability curves without crossing. Econometrica 78(3), 1093–1125.

Chernozhukov, V. and C. Hansen (2005). An IV model of quantile treatment effects. Econometrica 73(1), 245–261.

Chernozhukov, V., C. Hansen, and M. Jansson (2009). Finite sample inference for quantile regression models. Journal of Econometrics 152(2), 93–103.

Dette, H. and S. Volgushev (2008). Non-crossing non-parametric estimates of quantile curves. Journal of the Royal Statistical Society, Series B 70(3), 609–627.

Doksum, K. (1974). Empirical probability plots and statistical inference for nonlinear models in the two-sample case. The Annals of Statistics 2(2), 267–277.

Fan, J., T. C. Hu, and Y. K. Truong (1994). Robust non-parametric function estimation. Scandinavian Journal of Statistics 21(4), 433–446.

Firpo, S. (2007). Efficient semiparametric estimation of quantile treatment effects. Econometrica 75(1), 259–276.

Hall, P. and C. C. Heyde (1980). Martingale Limit Theory and Its Application. Academic Press.

Härdle, W. K. and S. Song (2010). Confidence bands in quantile regression. Econometric Theory 26, 1180–1200.

He, X. (1997). Quantile curves without crossing. The American Statistician 51, 186–192.

He, X., P. Ng, and S. Portnoy (1998). Bivariate quantile smoothing splines. Journal of the Royal Statistical Society, Series B 60(3), 537–550.

He, X. and P. Shi (1998). Monotone B-spline smoothing. Journal of the American Statistical Association 93, 643–650.

Heckman, J. J., J. Smith, and N. Clements (1997). Making the most out of programme evaluations and social experiments: Accounting for heterogeneity in programme impacts. The Review of Economic Studies 64(4), 487–535.

Hotelling, H. (1939). Tubes and spheres in n-spaces, and a class of statistical problems. American Journal of Mathematics 61(2), 440–460.

Khmaladze, È. V. (1981). Martingale approach in the theory of goodness-of-fit tests. Theory of Probability and Its Applications 26(2), 240–257.

Knight, K. (1998). Limiting distributions for L1 regression estimators under general conditions. The Annals of Statistics 26(2), 755–770.

Koenker, R. (2005). Quantile Regression. Cambridge University Press.

Koenker, R. (2010). Additive models for quantile regression: Model selection and confidence bandaids. Working Paper, University of Illinois at Urbana-Champaign.

Koenker, R. and G. Bassett, Jr. (1978). Regression quantiles. Econometrica 46(1), 33–50.

Koenker, R. and P. Ng (2005). Inequality constrained quantile regression. Sankhyā: The Indian Journal of Statistics 67(2), 418–440.

Koenker, R., P. Ng, and S. Portnoy (1994). Quantile smoothing splines. Biometrika 81(4), 673–680.

Koenker, R. and S. Portnoy (1987). L-estimation for linear models. Journal of the American Statistical Association 82(399), 851–857.

Koenker, R. and Z. Xiao (2002). Inference on the quantile regression process. Econometrica 70(4), 1583–1612.

Lehmann, E. L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, San Francisco.

Linton, O., E. Maasoumi, and Y.-J. Whang (2005). Consistent testing for stochastic dominance under general sampling schemes. The Review of Economic Studies 72(3), 735–765.

Mammen, E. (1991). Estimating a smooth monotone regression function. The Annals of Statistics 19(2), 724–740.

McFadden, D. (1989). Testing for stochastic dominance. In T. Fomby and T. K. Seo (eds.), Studies in the Economics of Uncertainty, pp. 1–6. Springer-Verlag.

Neocleous, T. and S. Portnoy (2008). On monotonicity of regression quantile functions. Statistics & Probability Letters 78(10), 1226–1229.

Oka, T. and Z. Qu (2011). Estimating structural changes in regression quantiles. Journal of Econometrics 162, 248–267.

Parzen, M. I., L. J. Wei, and Z. Ying (1994). A resampling method based on pivotal estimating functions. Biometrika 81(2), 341–350.

Portnoy, S. and R. Koenker (1989). Adaptive L-estimation for linear models. The Annals of Statistics 17(1), 362–381.

Portnoy, S. and R. Koenker (1997). The Gaussian hare and the Laplacian tortoise: Computability of squared-error versus absolute-error estimators. Statistical Science 12(4), 279–296.

Ruppert, D. and M. P. Wand (1994). Multivariate locally weighted least squares regression. The Annals of Statistics 22(3), 1346–1370.

Stone, C. J. (1977). Consistent nonparametric regression. The Annals of Statistics 5(4), 595–620.

Stute, W. (1986). Conditional empirical processes. The Annals of Statistics 14(2), 638–647.

Yu, K. and M. C. Jones (1998). Local linear quantile regression. Journal of the American Statistical Association 93(441), 228–237.

Appendix A. Proof of Main Results

We first define some notation to be used in the two appendices. Let

    u_i^0(τ) = y_i − Q(τ|x_i),    u_i(τ) = y_i − α(τ) − (x_i − x)'β(τ),

where α(τ) ∈ R and β(τ) ∈ R^d are some arbitrary parameter values. Let

    e_i(τ) = Q(τ|x) + (x_i − x)' ∂Q(τ|x)/∂x − Q(τ|x_i),

    δ(τ) = √(n h_{n,τ}^d) ( α(τ) − Q(τ|x),  h_{n,τ}( β(τ) − ∂Q(τ|x)/∂x )' )'.

Then, e_i(τ) is the error from approximating Q(τ|x_i) with a first-order Taylor expansion, and δ(τ) is the normalized difference between (α(τ), β(τ)')' and their values in the expansion. Using the above notation, u_i(τ) can be represented as

    u_i(τ) = u_i^0(τ) − e_i(τ) − (n h_{n,τ}^d)^{−1/2} z_{i,τ}' δ(τ),

where we have defined

    z_{i,τ} = ( 1, (x_i − x)'/h_{n,τ} )'.

This decomposition is useful because it breaks u_i(τ) into three components: the true residual u_i^0(τ), the error due to the Taylor approximation, and the error due to replacing the unknown values in the approximation by estimates.

In addition, let

    V_{n,τ}(δ(τ)) = Σ_{i=1}^n { ρ_τ( u_i^0(τ) − e_i(τ) − (n h_{n,τ}^d)^{−1/2} z_{i,τ}' δ(τ) ) − ρ_τ( u_i^0(τ) − e_i(τ) ) } K_{i,τ}    (A.1)

with

    K_{i,τ} = K( (x_i − x)/h_{n,τ} ),

where ρ_τ is the check function. The term ρ_τ( u_i^0(τ) − e_i(τ) ) is introduced to recenter the first term and has no effect on the estimation. Consequently, V_{n,τ}(δ(τ)) is the recentered version of (2), whose minimizer delivers the local linear estimator. Also, note that (A.1) is convex in δ(τ).
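To make these objects concrete, here is a minimal numerical sketch (Python, scalar regressor) of the check function ρ_τ and of the kernel-weighted local linear objective whose minimizer delivers the first-step estimates. The function names (`check`, `local_linear_objective`), the Epanechnikov kernel, and the inputs are illustrative assumptions, not part of the paper.

```python
import numpy as np

def check(u, tau):
    # Check function: rho_tau(u) = u * (tau - 1{u < 0})
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def epanechnikov(u):
    # Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1]
    return 0.75 * np.maximum(1.0 - np.asarray(u, dtype=float) ** 2, 0.0)

def local_linear_objective(a, b, y, x, x0, tau, h):
    # Kernel-weighted check loss at evaluation point x0 and quantile tau:
    # sum_i rho_tau( y_i - a - (x_i - x0) * b ) * K( (x_i - x0) / h )
    u = y - a - (x - x0) * b
    return np.sum(check(u, tau) * epanechnikov((x - x0) / h))
```

Minimizing this objective over (a, b) for each τ on a grid (subject to the monotonicity constraints discussed in the paper) gives the first-step estimates, with the intercept a playing the role of the conditional quantile estimate at x0.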

Finally, let

    S_n(τ, δ(τ), e_i(τ)) = (n h_n^d)^{−1/2} Σ_{i=1}^n { P( u_i^0(τ) ≤ (n h_{n,τ}^d)^{−1/2} z_{i,τ}' δ(τ) + e_i(τ) | x_i ) − 1( u_i^0(τ) ≤ (n h_{n,τ}^d)^{−1/2} z_{i,τ}' δ(τ) + e_i(τ) ) } z_{i,τ} K_{i,τ},

whose summands have mean zero and are independent across i for given τ and δ(τ). A key element of the proofs lies in analyzing the properties of S_n(τ, δ(τ), e_i(τ)); the details can be found in Appendix B.

Proof of Theorem 1. The strategy of the proof is similar to that of Theorem 2.1 in Koenker and Portnoy (1987). It consists of three steps. Step 1 proves sup_{τ∈T} ‖δ̂(τ)‖ ≤ log^{1/2}(n h_n^d) with probability tending to 1. Step 2 provides an approximation to the subgradient process that holds uniformly over the set

    Λ = { (τ, δ(τ)) : τ ∈ T, ‖δ(τ)‖ ≤ log^{1/2}(n h_n^d) }.    (A.2)

Step 3 verifies the Bahadur representation.

Step 1. Let K_n = log^{1/2}(n h_n^d). First, note that

    V_{n,τ}(δ̂(τ)) ≤ 0

always holds for each τ and every n. This is because V_{n,τ}(0) = 0, which cannot be smaller than the minimized value V_{n,τ}(δ̂(τ)). The subsequent proof is by contradiction, showing that if sup_{τ∈T} ‖δ̂(τ)‖ ≥ K_n, then V_{n,τ} will be strictly positive with probability tending to 1 for some τ ∈ T. To this end, it suffices to show that for any η > 0, there exist finite constants N_0 and λ > 0 independent of τ, such that if ‖δ(τ)‖ ≥ K_n holds for some τ, then

    P( V_{n,τ}(δ(τ)) > λ K_n^2 ) > 1 − η holds for all n ≥ N_0.    (A.3)

A sufficient condition for (A.3) is

    P( inf_{‖δ‖≥K_n} inf_{τ∈T} V_{n,τ}(δ) > λ K_n^2 ) > 1 − η for all n ≥ N_0.    (A.4)

This formulation is useful because δ is no longer quantile dependent. Because V_{n,τ}(δ) is convex in δ, we always have

    V_{n,τ}(γδ) − V_{n,τ}(0) ≥ γ ( V_{n,τ}(δ) − V_{n,τ}(0) ) for any γ ≥ 1.    (A.5)

Therefore, a further sufficient condition for (A.4) is

    P( inf_{‖δ‖=K_n} inf_{τ∈T} V_{n,τ}(δ) > λ K_n^2 ) > 1 − η for all n ≥ N_0,    (A.6)

which we will now establish.

Consider the following decomposition, due to Knight (1998):

    V_{n,τ}(δ) = W_{n,τ}(δ) + Z_{n,τ}(δ),    (A.7)

where

    W_{n,τ}(δ) = −(n h_{n,τ}^d)^{−1/2} Σ_{i=1}^n K_{i,τ} ψ_τ( u_i^0(τ) − e_i(τ) ) z_{i,τ}' δ, with ψ_τ(u) = τ − 1(u < 0),

    Z_{n,τ}(δ) = Σ_{i=1}^n K_{i,τ} ∫_0^{(n h_{n,τ}^d)^{−1/2} z_{i,τ}' δ} [ 1( u_i^0(τ) − e_i(τ) ≤ s ) − 1( u_i^0(τ) − e_i(τ) ≤ 0 ) ] ds.
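The decomposition (A.7) rests on Knight's (1998) identity ρ_τ(u − v) − ρ_τ(u) = −v ψ_τ(u) + ∫_0^v [1(u ≤ s) − 1(u ≤ 0)] ds, whose two pieces W_{n,τ} and Z_{n,τ} collect. As a quick numerical verification of the identity (a sketch: the Riemann-sum integration and the tolerance are my choices, not part of the paper):

```python
import numpy as np

def check(u, tau):
    return u * (tau - (u < 0))          # rho_tau(u)

def psi(u, tau):
    return tau - (u < 0)                # psi_tau(u)

def knight_rhs(u, v, tau, ngrid=200001):
    # -v * psi_tau(u) + integral_0^v [1(u <= s) - 1(u <= 0)] ds,
    # with the (signed) integral computed by a Riemann/trapezoid sum.
    s = np.linspace(0.0, v, ngrid)
    integrand = (u <= s).astype(float) - float(u <= 0)
    return -v * psi(u, tau) + np.trapz(integrand, s)

rng = np.random.default_rng(0)
for _ in range(100):
    u, v, tau = rng.normal(), rng.normal(), rng.uniform(0.05, 0.95)
    lhs = check(u - v, tau) - check(u, tau)
    assert abs(lhs - knight_rhs(u, v, tau)) < 1e-4
```

The identity holds for positive and negative v alike; the numerical error comes only from discretizing the step-function integrand.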


Applying this decomposition, we have

    inf_{‖δ‖=K_n} inf_{τ∈T} K_n^{−2} V_{n,τ}(δ) ≥ inf_{‖δ‖=K_n} inf_{τ∈T} K_n^{−2} Z_{n,τ}(δ) − sup_{‖δ‖=K_n} sup_{τ∈T} | K_n^{−2} W_{n,τ}(δ) |.    (A.8)

We bound the two terms on the right-hand side of (A.8) separately. First consider the second term:

    sup_{‖δ‖=K_n} sup_{τ∈T} | K_n^{−2} W_{n,τ}(δ) |
    ≤ K_n^{−1} sup_{τ∈T} ‖ (n h_{n,τ}^d)^{−1/2} Σ_{i=1}^n ψ_τ( u_i^0(τ) − e_i(τ) ) z_{i,τ} K_{i,τ} ‖
    ≤ K_n^{−1} sup_{τ∈T} ‖ (n h_{n,τ}^d)^{−1/2} Σ_{i=1}^n [ ψ_τ( u_i^0(τ) − e_i(τ) ) − ψ_τ( u_i^0(τ) ) ] z_{i,τ} K_{i,τ} ‖    (L1)
    + K_n^{−1} sup_{τ∈T} ‖ (n h_{n,τ}^d)^{−1/2} Σ_{i=1}^n ψ_τ( u_i^0(τ) ) z_{i,τ} K_{i,τ} ‖.    (L2)

Applying Lemma B.6, we have (L1) = O_p(K_n^{−1}) = o_p(1). For Term (L2), the quantity inside the norm is O_p(1) for any fixed τ ∈ T by a central limit theorem, and it is stochastically equicontinuous in τ by Lemma B.3. Therefore, (L2) = O_p(K_n^{−1}) = o_p(1). Thus,

    sup_{‖δ‖=K_n} sup_{τ∈T} | K_n^{−2} W_{n,τ}(δ) | = o_p(1).    (A.9)

Next, consider the first term in (A.8). We will show that it is strictly positive with probability tending to 1. To this end, note that the integral appearing in Z_{n,τ}(δ) is always nonnegative and satisfies (c.f. Lemma A.1 in Oka and Qu, 2011)

    ∫_0^{(n h_{n,τ}^d)^{−1/2} z_{i,τ}' δ} [ 1( u_i^0(τ) − e_i(τ) ≤ s ) − 1( u_i^0(τ) − e_i(τ) ≤ 0 ) ] ds
    ≥ (n h_{n,τ}^d)^{−1/2} (z_{i,τ}' δ / 2) [ 1( u_i^0(τ) − e_i(τ) ≤ (n h_{n,τ}^d)^{−1/2} z_{i,τ}' δ / 2 ) − 1( u_i^0(τ) − e_i(τ) ≤ 0 ) ].

Thus,

    K_n^{−2} Z_{n,τ}(δ)
    ≥ K_n^{−2} (n h_{n,τ}^d)^{−1/2} (δ/2)' Σ_{i=1}^n [ 1( u_i^0(τ) − e_i(τ) ≤ (n h_{n,τ}^d)^{−1/2} z_{i,τ}' δ / 2 ) − 1( u_i^0(τ) − e_i(τ) ≤ 0 ) ] z_{i,τ} K_{i,τ}
    = K_n^{−2} ( h_n^d / h_{n,τ}^d )^{1/2} (δ/2)' [ S_n( τ, 0, e_i(τ) ) − S_n( τ, δ/2, e_i(τ) ) ]    (L3)
    + K_n^{−2} (n h_{n,τ}^d)^{−1/2} (δ/2)' Σ_{i=1}^n [ P( u_i^0(τ) − e_i(τ) ≤ (n h_{n,τ}^d)^{−1/2} z_{i,τ}' δ / 2 | x_i ) − P( u_i^0(τ) − e_i(τ) ≤ 0 | x_i ) ] z_{i,τ} K_{i,τ}.    (L4)

We now analyze (L3) and (L4) separately. First,

    (L3) = O_p(K_n^{−1}) = o_p(1)    (A.10)

because of Lemma B.5, ‖δ‖ = K_n and h_{n,τ}^d / h_n^d = O(1) by Assumption 5. Applying a mean-value theorem to (L4):

    (L4) = (1/4) K_n^{−2} (n h_{n,τ}^d)^{−1} Σ_{i=1}^n f_{Y|X}(ỹ_i | x_i) K_{i,τ} δ' z_{i,τ} z_{i,τ}' δ,

where ỹ_i lies between Q(τ|x_i) + e_i(τ) and Q(τ|x_i) + e_i(τ) + (n h_{n,τ}^d)^{−1/2} z_{i,τ}' δ / 2. Note that x_i is in a vanishing neighborhood of x, and ỹ_i approaches Q(τ|x_i) as n → ∞. Therefore, Assumption 2 implies that f_{Y|X}(ỹ_i | x_i) ≥ f_L for large n. Thus,

    (L4) ≥ (1/4) K_n^{−2} f_L δ' ( (n h_{n,τ}^d)^{−1} Σ_{i=1}^n K_{i,τ} z_{i,τ} z_{i,τ}' ) δ uniformly in τ for large n.

The term in the parentheses satisfies

    (n h_{n,τ}^d)^{−1} Σ_{i=1}^n z_{i,τ} z_{i,τ}' K_{i,τ} →_p f_X(x) N_x(τ) uniformly in τ,    (A.11)

where

    N_x(τ) = ∫ [ 1, 0; 0, uu' ] K(u) du (if x is an interior point),
    N_x(τ) = ∫_{D_{x,τ}} (1, u')' (1, u') K(u) du (if x is a boundary point).

In both cases, N_x(τ) is positive definite for all τ by Assumption 6. Let λ_min(τ) > 0 denote the minimum eigenvalue of f_X(x) N_x(τ); then

    (L4) ≥ (1/4) f_L λ_min(τ) in probability uniformly in τ.    (A.12)

Combining the results (A.9), (A.10) and (A.12), we see that (A.12) is strictly positive and dominates (A.9) and (A.10) with probability tending to 1. We have therefore proved (A.6). In fact, using the same arguments as above, we can establish a stronger result: for any η > 0, there exist K_0 > 0 and λ > 0, such that if

    ‖δ(τ)‖ ≥ K_0

holds for some τ, then

    P( V_{n,τ}(δ(τ)) > λ ) > 1 − η holds for all n ≥ N_0.    (A.13)

The weaker result suffices for proving the current theorem, while the stronger result is needed for the proof of Theorem 3.

Step 2. Consider the subgradient, normalized by (n h_n^d)^{−1/2}:

    (n h_n^d)^{−1/2} Σ_{i=1}^n [ τ − 1( u_i^0(τ) ≤ e_i(τ) + (n h_{n,τ}^d)^{−1/2} z_{i,τ}' δ̂(τ) ) ] z_{i,τ} K_{i,τ},

which, by Theorem 2.1 in Koenker (2005), is O_p( (n h_n^d)^{−1/2} ) = o_p(1) uniformly in τ. Adding and subtracting terms, it can be rewritten as

    { S_n( τ, δ̂(τ), e_i(τ) ) − S_n( τ, 0, e_i(τ) ) } + { S_n( τ, 0, e_i(τ) ) − S_n( τ, 0, 0 ) } + S_n( τ, 0, 0 )
    + (n h_n^d)^{−1/2} Σ_{i=1}^n { τ − P( u_i^0(τ) ≤ e_i(τ) + (n h_{n,τ}^d)^{−1/2} z_{i,τ}' δ̂(τ) | x_i ) } z_{i,τ} K_{i,τ}.

Because of Step 1, we can restrict our attention to the set Λ defined in (A.2). The term in the first curly bracket is o_p(1) uniformly over this set by Lemma B.5. The term in the second curly bracket is uniformly o_p(1), as implied by the proof of Lemma B.5. Consider the last term and apply a first-order Taylor expansion; it equals

    −(n h_n^d)^{−1/2} Σ_{i=1}^n f_{Y|X}(ỹ_i | x_i) e_i(τ) z_{i,τ} K_{i,τ} − (n h_n^d)^{−1/2} (n h_{n,τ}^d)^{−1/2} ( Σ_{i=1}^n f_{Y|X}(ỹ_i | x_i) K_{i,τ} z_{i,τ} z_{i,τ}' ) δ̂(τ),

where ỹ_i lies between Q(τ|x_i) and Q(τ|x_i) + e_i(τ) + (n h_{n,τ}^d)^{−1/2} z_{i,τ}' δ̂(τ). Combining the above results, we have

    δ̂(τ) = ( (n h_{n,τ}^d)^{−1} Σ_{i=1}^n f_{Y|X}(ỹ_i | x_i) K_{i,τ} z_{i,τ} z_{i,τ}' )^{−1} { ( h_n^d / h_{n,τ}^d )^{1/2} S_n(τ, 0, 0) − (n h_{n,τ}^d)^{−1/2} Σ_{i=1}^n f_{Y|X}(ỹ_i | x_i) e_i(τ) z_{i,τ} K_{i,τ} } + o_p(1)
    = ( f_{Y|X}(Q(τ|x)|x) f_X(x) N_x(τ) )^{−1} { ( h_n^d / h_{n,τ}^d )^{1/2} S_n(τ, 0, 0) − (n h_{n,τ}^d)^{−1/2} f_{Y|X}(Q(τ|x)|x) Σ_{i=1}^n e_i(τ) z_{i,τ} K_{i,τ} } + o_p(1),

where the second equality follows from (A.11) and Assumption 2.

Step 3. We now derive the limit of the preceding expression. Note that for any x_i satisfying (x_i − x)/h_{n,τ} ∈ supp(K(·)), we have

    e_i(τ) = −(1/2) ( (x_i − x)/h_{n,τ} )' ( ∂²Q(τ|x)/∂x∂x' ) ( (x_i − x)/h_{n,τ} ) h_{n,τ}² + o(h_{n,τ}²).

Below, we consider two cases separately.

Case 1: x is a boundary point. Then,

    −h_{n,τ}^{−2} (n h_{n,τ}^d)^{−1} Σ_{i=1}^n e_i(τ) z_{i,τ} K_{i,τ}    (A.14)
    = (1/2) ∫ ( (x_i − x)/h_{n,τ} )' ( ∂²Q(τ|x)/∂x∂x' ) ( (x_i − x)/h_{n,τ} ) ( 1, (x_i − x)'/h_{n,τ} )' h_{n,τ}^{−d} K( (x_i − x)/h_{n,τ} ) f_X(x_i) dx_i + o_p(1)
    →_p (1/2) f_X(x) ∫_{D_{x,τ}} u' ( ∂²Q(τ|x)/∂x∂x' ) u (1, u')' K(u) du.

Let ι_1' be a row vector with the first element equal to 1 and the others zero, and let

    (1/2) ι_1' N_x(τ)^{−1} ∫_{D_{x,τ}} u' ( ∂²Q(τ|x)/∂x∂x' ) u (1, u')' K(u) du = d_{b,τ}

as in Theorem 1. Then we have

    √(n h_{n,τ}^d) ( α̂(τ) − Q(τ|x) − d_{b,τ} h_{n,τ}² ) = ι_1' N_x(τ)^{−1} (n h_{n,τ}^d)^{−1/2} Σ_{i=1}^n ( τ − 1(u_i^0(τ) ≤ 0) ) z_{i,τ} K_{i,τ} / ( f_X(x) f_{Y|X}(Q(τ|x)|x) ) + o_p(1).

Case 2: x is an interior point. Then the above formula simplifies because

    N_x(τ) = [ 1, 0; 0, μ₂(K) I_d ]

and

    −h_{n,τ}^{−2} (n h_{n,τ}^d)^{−1} ι_1' Σ_{i=1}^n e_i(τ) z_{i,τ} K_{i,τ} = (1/2) μ₂(K) tr( ∂²Q(τ|x)/∂x∂x' ) + o_p(1).

Let

    d_τ = (1/2) μ₂(K) tr( ∂²Q(τ|x)/∂x∂x' )

as in Theorem 1. Then

    √(n h_{n,τ}^d) ( α̂(τ) − Q(τ|x) − d_τ h_{n,τ}² ) = (n h_{n,τ}^d)^{−1/2} Σ_{i=1}^n ( τ − 1(u_i^0(τ) ≤ 0) ) K_{i,τ} / ( f_X(x) f_{Y|X}(Q(τ|x)|x) ) + o_p(1). □
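The leading term in this representation is conditionally pivotal: given the regressors, 1(u_i^0(τ) ≤ 0) = 1(U_i ≤ τ) with U_1, ..., U_n i.i.d. Uniform(0,1), so its conditional distribution is free of unknown nuisance quantities and can be simulated. A minimal sketch (the function name, the omission of the deterministic bandwidth normalization, and the toy inputs are my simplifications, not the paper's algorithm):

```python
import numpy as np

def simulate_sup_stat(weights, taus, nrep=500, seed=0):
    # weights: (n, ntau) array of kernel weights K_{i,tau} at the design points.
    # Simulates sup_tau | n^{-1/2} sum_i (tau - 1{U_i <= tau}) K_{i,tau} |
    # with U_i iid Uniform(0,1); bandwidth normalization omitted (scale only).
    rng = np.random.default_rng(seed)
    n, _ = weights.shape
    sups = np.empty(nrep)
    for r in range(nrep):
        U = rng.uniform(size=(n, 1))
        terms = (taus[None, :] - (U <= taus[None, :])) * weights
        sups[r] = np.abs(terms.sum(axis=0) / np.sqrt(n)).max()
    return sups

# e.g. a critical value for a uniform band:
# np.quantile(simulate_sup_stat(w, taus), 0.95)
```

An empirical quantile of the simulated suprema then serves as the critical value for uniform inference over τ ∈ T.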

Proof of Theorem 2. The proof consists of two steps. The first step shows that α̂(τ), obtained by solving (2) for all τ ∈ T, converges weakly to the desired limit. The second step shows that the interpolated estimator ᾱ(τ) has the same weak limit as α̂(τ) over τ ∈ T. We focus on the interior point case, as the boundary case can be proved similarly.

Step 1. It suffices to show that the leading term in the uniform Bahadur representation converges weakly to the desired limit. Its finite-dimensional convergence is an immediate consequence of results in Fan, Hu and Truong (1994); therefore the details are omitted. To verify tightness, note that the denominator is finite and bounded away from 0 by Assumptions 1 and 2. The numerator is tight by Lemma B.3.

Step 2. We use similar arguments as in Neocleous and Portnoy (2008). Let τ_j and τ_{j+1} denote the grid quantiles adjacent to τ and let γ_n(τ) ∈ [0, 1] denote the linear interpolation weight. Consider

    ᾱ(τ) − α̂(τ) = γ_n(τ) α̂(τ_j) − γ_n(τ) α̂(τ) + (1 − γ_n(τ)) α̂(τ_{j+1}) − (1 − γ_n(τ)) α̂(τ)
    = γ_n(τ) ( α̂(τ_j) − Q(τ_j|x) − d_{τ_j} h_{n,τ_j}² ) − γ_n(τ) ( α̂(τ) − Q(τ|x) − d_τ h_{n,τ}² )    (L5)
    + (1 − γ_n(τ)) ( α̂(τ_{j+1}) − Q(τ_{j+1}|x) − d_{τ_{j+1}} h_{n,τ_{j+1}}² ) − (1 − γ_n(τ)) ( α̂(τ) − Q(τ|x) − d_τ h_{n,τ}² )    (L6)
    + γ_n(τ) ( Q(τ_j|x) + d_{τ_j} h_{n,τ_j}² ) + (1 − γ_n(τ)) ( Q(τ_{j+1}|x) + d_{τ_{j+1}} h_{n,τ_{j+1}}² ) − ( Q(τ|x) + d_τ h_{n,τ}² ).    (L7)

For Term (L5), applying Theorem 1, we have

    √(n h_n^d) ( α̂(τ_j) − Q(τ_j|x) − d_{τ_j} h_{n,τ_j}² ) − √(n h_n^d) ( α̂(τ) − Q(τ|x) − d_τ h_{n,τ}² )
    = (n h_n^d)^{−1/2} Σ_{i=1}^n ( τ_j − 1(u_i^0(τ_j) ≤ 0) ) K_{i,τ_j} / ( f_X(x) f_{Y|X}(Q(τ_j|x)|x) ) − (n h_n^d)^{−1/2} Σ_{i=1}^n ( τ − 1(u_i^0(τ) ≤ 0) ) K_{i,τ} / ( f_X(x) f_{Y|X}(Q(τ|x)|x) ) + o_p(1)
    = (n h_n^d)^{−1/2} [ Σ_{i=1}^n ( τ_j − 1(u_i^0(τ_j) ≤ 0) ) K_{i,τ_j} − Σ_{i=1}^n ( τ − 1(u_i^0(τ) ≤ 0) ) K_{i,τ} ] / ( f_X(x) f_{Y|X}(Q(τ_j|x)|x) )
    + (n h_n^d)^{−1/2} Σ_{i=1}^n ( τ − 1(u_i^0(τ) ≤ 0) ) K_{i,τ} / f_X(x) · ( 1/f_{Y|X}(Q(τ_j|x)|x) − 1/f_{Y|X}(Q(τ|x)|x) ) + o_p(1).

The first term on the right-hand side is o_p(1) by the stochastic equicontinuity of the subgradient process; see Lemma B.3. The second term is also o_p(1) by the tightness of the subgradient process and Assumption 2. Term (L6) can be analyzed in the same way. For Term (L7), apply a second-order Taylor expansion as in Neocleous and Portnoy (2008, p. 1228):

    Q(τ_j|x) = Q(τ|x) + ( ∂Q(τ|x)/∂τ )( τ_j − τ ) + O( (τ_{j+1} − τ_j)² ),

where the order of the remainder term is due to the Lipschitz continuity of ∂Q(τ|x)/∂τ. This implies

    γ_n(τ) Q(τ_j|x) + (1 − γ_n(τ)) Q(τ_{j+1}|x) = Q(τ|x) + O( (τ_{j+1} − τ_j)² ).

Similarly,

    d_{τ_j} h_{n,τ_j}² = d_τ h_{n,τ}² + ( ∂(d_τ h_{n,τ}²)/∂τ )( τ_j − τ ) + O( (τ_{j+1} − τ_j)² ),

implying

    γ_n(τ) d_{τ_j} h_{n,τ_j}² + (1 − γ_n(τ)) d_{τ_{j+1}} h_{n,τ_{j+1}}² = d_τ h_{n,τ}² + O( (τ_{j+1} − τ_j)² ).

Therefore,

    (L7) = O( (τ_{j+1} − τ_j)² ).

Because τ_{j+1} − τ_j = O( (n h_n^d)^{−1/4} ), it follows that √(n h_n^d) (τ_{j+1} − τ_j)² → 0. This holds uniformly in τ ∈ T. □

Proof of Theorem 3. Consider the first result. Without loss of generality, assume that the minimizers of the constrained and the unconstrained problems are both unique. The subsequent proof is by contradiction. Suppose the stochastic order condition is violated. Then,

    √(n h_{n,τ_j}^d) | α̃(τ_j) − Q(τ_j|x) | > K_0 for some τ_j ∈ T and some n ≥ N_0

with positive probability. We will argue along such a sequence. Without loss of generality, assume

    α̃(τ_j) > Q(τ_j|x) + K_0.    (A.15)

Then, we consider the following two possibilities separately: (1) the constraint is not binding at τ_j (i.e., α̃(τ_j) > α̃(τ_{j−1})), and (2) the constraint is in fact binding (i.e., α̃(τ_j) = α̃(τ_{j−1})).

In the first case, α̃(τ_j) is identical to the unconstrained estimate α̂(τ_j); therefore

    V_{n,τ_j}( δ̂(τ_j) ) = V_{n,τ_j}( δ̃(τ_j) ) ≤ 0.

However, this contradicts (A.13). Consider the second case. Let τ_l denote the lowest quantile in the grid for which α̃(τ_l) = α̃(τ_j). By the monotonicity of the quantile function and (A.15), we must have

    α̃(τ_l) > Q(τ_l|x) + K_0.

Therefore,

    P( V_{n,τ_l}( δ̃(τ_l) ) > λ ) > 1 − η holds for all n ≥ N_0.

Now, consider decreasing the value of α̃(τ_l) by some small amount. This will decrease the value of the objective function V_{n,τ_l}( δ̃(τ_l) ) due to its convexity; see (A.5). This action does not violate any constraint, because of the definition of τ_l (i.e., it is the lowest quantile for which α̃(τ_l) = α̃(τ_j)). Therefore, the new value remains admissible and yet returns a smaller objective function value. This contradicts the fact that α̃(τ_l) is the minimizer.

The second result holds because, under the given condition on m, the quantiles will not cross with probability arbitrarily close to 1 in large samples; therefore the large-sample distribution is the same as in the unconstrained case, which is given in Theorem 2. The constraints serve as a finite-sample correction, having no effect asymptotically. □
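In the paper, monotonicity is enforced through inequality constraints inside the quantile optimization itself. As a simple stand-in that conveys why non-crossing plus interpolation is harmless asymptotically, one can project unconstrained grid estimates onto the monotone cone (pool-adjacent-violators) and then interpolate linearly between adjacent grid quantiles, as in Theorem 2's second step. This is an illustrative sketch, not the paper's estimator:

```python
import numpy as np

def pava_increasing(values):
    # Pool-adjacent-violators: L2 projection of a sequence onto
    # nondecreasing sequences (averages adjacent violating blocks).
    blocks = []  # each block: [mean, weight]
    for v in values:
        blocks.append([float(v), 1.0])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2 = blocks.pop()
            v1, w1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2])
    out = []
    for v, w in blocks:
        out.extend([v] * int(w))
    return np.array(out)

def interpolate_quantile(tau_grid, alpha_grid, tau):
    # Linear interpolation between adjacent grid quantiles:
    # gamma_n(tau) * alpha(tau_j) + (1 - gamma_n(tau)) * alpha(tau_{j+1}).
    return np.interp(tau, tau_grid, alpha_grid)

monotone = pava_increasing([1.0, 3.0, 2.0])  # -> [1. , 2.5, 2.5]
```

Monotonization only pools quantiles that cross; when the true quantile curve is strictly increasing and the grid is fine, crossings vanish in large samples, which is the sense in which the constraints are a finite-sample correction.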

Proof of Corollary 1. The mean squared error (MSE) for an interior point x is

    (1/4) μ₂²(K) tr( ∂²Q(τ|x)/∂x∂x' )² h_{n,τ}⁴ + τ(1−τ) ∫K(v)² dv / ( n h_{n,τ}^d f_X(x) f_{Y|X}(Q(τ|x)|x)² ) + o( h_{n,τ}⁴ + (n h_{n,τ}^d)^{−1} ).

Therefore, the optimal bandwidth that minimizes the (interior) asymptotic MSE is

    h_{n,τ} = ( τ(1−τ) d ∫K(v)² dv / ( μ₂²(K) f_X(x) f_{Y|X}(Q(τ|x)|x)² tr( ∂²Q(τ|x)/∂x∂x' )² ) )^{1/(4+d)} n^{−1/(4+d)}.

We now briefly verify that this bandwidth rule satisfies Assumption 5. The factor in front of n^{−1/(4+d)} corresponds to c(τ). It is clearly bounded under Assumptions 1-4. We now verify that it is Lipschitz continuous. It suffices to verify that the three quantities (τ(1−τ))^{1/(4+d)}, f_{Y|X}(Q(τ|x)|x)^{−2/(4+d)} and tr( ∂²Q(τ|x)/∂x∂x' )^{−2/(4+d)} are individually Lipschitz continuous or, equivalently, to show that they have bounded first derivatives. This is true for (τ(1−τ))^{1/(4+d)} because T ⊂ (0,1). For the second term, apply a mean-value theorem:

    | f_{Y|X}(Q(τ₁|x)|x)^{−2/(4+d)} − f_{Y|X}(Q(τ₂|x)|x)^{−2/(4+d)} |
    = | (2/(4+d)) f_{Y|X}(Q(τ̃|x)|x)^{−(6+d)/(4+d)} f'_{Y|X}(Q(τ̃|x)|x) Q'(τ̃|x) | |τ₂ − τ₁|,

where τ̃ ∈ T. The terms inside the absolute value on the right side are all bounded over T by Assumptions 1-3. Finally, tr( ∂²Q(τ|x)/∂x∂x' )^{−2/(4+d)} is Lipschitz continuous by the same argument. □

Proof of Corollary 2. We have, for the interior point case,

    P( Q(τ|x) ∉ C_p for some τ ∈ T )
    = P( (1/σ_{n,τ}(x)) | Q(τ|x) − α̃(τ) + d_τ h_{n,τ}² | > Z_p for some τ ∈ T )
    = P( sup_{τ∈T} (1/σ_{n,τ}(x)) | Q(τ|x) − α̃(τ) + d_τ h_{n,τ}² | > Z_p ) → p

by the continuous mapping theorem. The boundary point case can be proved using the same argument. □

Proof of Corollary 3. We focus on the interior point case and verify that the conditions in Chernozhukov, Fernández-Val and Galichon (2010, Corollary 3) are satisfied. First, Assumptions 1-3 in this paper imply their Assumption 1 with the domain (0,1) replaced by T. They also imply the strict monotonicity of Q(τ|x) in τ as well as its continuous differentiability in both arguments. To verify their Assumption 2, simply let

    a_n = √(n h_{n,τ}^d) and Ḡ_x(τ) = G(τ) / ( √(f_X(x)) f_{Y|X}(Q(τ|x)|x) ) + h(τ) d_τ,

and apply the first result in Theorem 2. □
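The optimal-bandwidth formula in the proof of Corollary 1 is easy to evaluate once its ingredients are supplied. A sketch (all numerical inputs below are illustrative; the defaults μ₂(K) = 1/5 and ∫K² = 3/5 are the Epanechnikov kernel constants):

```python
def optimal_bandwidth(tau, n, d, f_x, f_ygx, tr_hessian, mu2_K=0.2, R_K=0.6):
    # h_{n,tau} = [ tau(1-tau) d RK / ( mu2^2 fX fYgX^2 tr(.)^2 ) ]^{1/(4+d)} n^{-1/(4+d)}
    # f_x: density of X at x; f_ygx: conditional density at Q(tau|x);
    # tr_hessian: trace of the Hessian of Q(tau|x) in x.
    num = tau * (1.0 - tau) * d * R_K
    den = mu2_K ** 2 * f_x * f_ygx ** 2 * tr_hessian ** 2
    return (num / den) ** (1.0 / (4 + d)) * n ** (-1.0 / (4 + d))
```

The factor f_{Y|X}(Q(τ|x)|x)^{−2/(4+d)} widens the bandwidth where the conditional density is thin (e.g., in the tails), which is precisely the adaptation to data sparsity that motivates letting the bandwidth vary across quantiles.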


Appendix B. Auxiliary Lemmas

The same notation as in Appendix A is used throughout. The next lemma is needed for analyzing the effect of the quantile-dependent bandwidth on the asymptotic properties of the estimators.

Lemma B.1 Let Assumptions 1, 4 and 5 hold.

1. For any γ ≥ 1 there exists a B > 0, such that for any τ₁, τ₂ ∈ T with τ₁ ≤ τ₂,

    E ‖ z_{i,τ₂} K_{i,τ₂} − z_{i,τ₁} K_{i,τ₁} ‖^{2γ} ≤ B h_n^d (τ₂ − τ₁)^{2γ}.

2. Let b_n = (n h_n^d)^{−1/2−Δ} with 0 < Δ < 1/2 being some arbitrary constant and h_n satisfying h_n → 0 and n h_n^d → ∞ as n → ∞. Let Γ_n = { (τ₁, τ₂) : τ₁ ∈ T, τ₂ ∈ T, τ₁ ≤ τ₂ ≤ τ₁ + b_n }; then

    sup_{(τ₁,τ₂)∈Γ_n} (n h_n^d)^{−1/2} Σ_{i=1}^n ‖ z_{i,τ₂} K_{i,τ₂} − z_{i,τ₁} K_{i,τ₁} ‖ = o_p(1).

Proof. By Assumption 4, we can, without loss of generality, assume that the support of K(·) is contained in a compact set D = [−1, 1]^d.

To show the first result, note that

    ‖ z_{i,τ₂} K_{i,τ₂} − z_{i,τ₁} K_{i,τ₁} ‖ ≤ ‖ z_{i,τ₂} ( K_{i,τ₂} − K_{i,τ₁} ) ‖ + ‖ K_{i,τ₁} ( z_{i,τ₂} − z_{i,τ₁} ) ‖.    (B.1)

We now analyze the two terms on the right side separately. Apply the mean-value theorem to the first term:

    ‖ z_{i,τ₂} ( K_{i,τ₂} − K_{i,τ₁} ) ‖ = ‖ z_{i,τ₂} K'_{i,τ̃} ( (x_i − x)/h_{n,τ₂} − (x_i − x)/h_{n,τ₁} ) ‖,

where τ̃ ∈ [τ₁, τ₂] and K'_{i,τ̃} is the derivative of K_{i,τ} with respect to (x_i − x)/h_{n,τ}, evaluated at τ = τ̃. Because K'_{i,τ̃} is bounded by Assumption 4, say ‖K'_{i,τ̃}‖ ≤ M < ∞ within D, and is zero elsewhere, the preceding display is bounded by

    M ‖ z_{i,τ₂} 1( (x_i − x)/h_{n,τ̃} ∈ D ) ( (x_i − x)/h_{n,τ₂} − (x_i − x)/h_{n,τ₁} ) ‖
    ≤ M ‖ z_{i,τ₂} 1( (x_i − x)/h_{n,τ̃} ∈ D ) ‖ ‖ 1( (x_i − x)/h_{n,τ̃} ∈ D ) ( (x_i − x)/h_{n,τ₂} − (x_i − x)/h_{n,τ₁} ) ‖ = M ‖T₁‖ ‖T₂‖.

We have

    ‖T₁‖ = ‖ z_{i,τ₂} 1( (x_i − x)/h_{n,τ₂} ∈ (h_{n,τ̃}/h_{n,τ₂}) D ) ‖ ≤ ‖ z_{i,τ₂} 1( (x_i − x)/h_{n,τ₂} ∈ (c̄/c̲) D ) ‖ ≤ C,

where the first inequality follows from Assumption 5, the second uses the definition of z_{i,τ₂}, and the constant C depends only on (c̄/c̲)D. Also,

    ‖T₂‖ ≤ ‖ 1( (x_i − x)/h_{n,τ₂} ∈ (h_{n,τ̃}/h_{n,τ₂}) D ) ( (x_i − x)/h_{n,τ₂} ) ‖ | (h_{n,τ₁} − h_{n,τ₂})/h_{n,τ₁} |    (B.2)

due to the Cauchy-Schwarz inequality. The first norm on the right-hand side of (B.2) is bounded by

    C · 1( x_i − x ∈ h_{n,τ̃} D ) ≤ C · 1( x_i − x ∈ c̄ D h_n ),

while the second norm equals

    | ( c(τ₁) − c(τ₂) ) / c(τ₁) | ≤ | ( c(τ₁) − c(τ₂) ) / c̲ | ≤ G (τ₂ − τ₁),

where the last inequality uses the Lipschitz property of c(·), and C and G are some finite constants (c.f. Assumption 5). Therefore, the first term in (B.1) is bounded by

    C M G · 1( x_i − x ∈ c̄ D h_n ) (τ₂ − τ₁).

Now, consider the second term in (B.1) and apply similar arguments as above:

    ‖ K_{i,τ₁} ( z_{i,τ₂} − z_{i,τ₁} ) ‖ = ‖ K_{i,τ₁} ( (x_i − x)/h_{n,τ₂} − (x_i − x)/h_{n,τ₁} ) ‖
    = ‖ K_{i,τ₁} ( (x_i − x)/h_{n,τ₁} ) 1( (x_i − x)/h_{n,τ₁} ∈ D ) ‖ | (h_{n,τ₂} − h_{n,τ₁})/h_{n,τ₂} |
    ≤ K̄ C G · 1( x_i − x ∈ c̄ D h_n ) (τ₂ − τ₁),

where K̄ is an upper bound for K(·) and C and G have the same definitions as before. Let C̄ = CMG + K̄CG; then (B.1) satisfies

    ‖ z_{i,τ₂} K_{i,τ₂} − z_{i,τ₁} K_{i,τ₁} ‖ ≤ C̄ · 1( x_i − x ∈ c̄ D h_n ) (τ₂ − τ₁).    (B.3)

Consequently,

    E ‖ z_{i,τ₂} K_{i,τ₂} − z_{i,τ₁} K_{i,τ₁} ‖^{2γ} ≤ C̄^{2γ} (τ₂ − τ₁)^{2γ} P( x_i − x ∈ c̄ D h_n ).

Because the density of x_i is bounded by Assumption 1, there exists a finite constant A independent of τ such that

    P( x_i − x ∈ c̄ D h_n ) ≤ A h_n^d.

Letting B = C̄^{2γ} A completes the proof of the first result.

To prove the second result, we apply (B.3):

    sup_{(τ₁,τ₂)∈Γ_n} (n h_n^d)^{−1/2} Σ_{i=1}^n ‖ z_{i,τ₂} K_{i,τ₂} − z_{i,τ₁} K_{i,τ₁} ‖
    ≤ sup_{(τ₁,τ₂)∈Γ_n} C̄ (n h_n^d)^{1/2} (τ₂ − τ₁) { (n h_n^d)^{−1} Σ_{i=1}^n 1( x_i − x ∈ c̄ D h_n ) }.

The quantity inside the curly brackets is independent of τ and satisfies a weak law of large numbers; therefore, it is of order O_p(1). Finally,

    sup_{(τ₁,τ₂)∈Γ_n} C̄ (n h_n^d)^{1/2} (τ₂ − τ₁) ≤ C̄ (n h_n^d)^{1/2} b_n = C̄ (n h_n^d)^{−Δ} → 0

because Δ > 0 and n h_n^d → ∞. □

The next lemma is needed to establish the stochastic equicontinuity of the process S_n(τ, 0, 0).

Lemma B.2 Let b_n = (n h_n^d)^{−1/2−Δ} with Δ ∈ (0, 1/2) being some arbitrary constant, and let h_n → 0 and n h_n^d → ∞ as n → ∞. Then, there exist γ > 1 and C̄ < ∞, such that for any τ₁, τ₂ ∈ T satisfying |τ₂ − τ₁| ≥ b_n, we have

    E ‖ S_n(τ₂, 0, 0) − S_n(τ₁, 0, 0) ‖^{2γ} ≤ C̄ |τ₂ − τ₁|^γ.

Proof. Without loss of generality, assume τ₂ ≥ τ₁. It is equivalent to show

    ( E ‖ S_n(τ₂, 0, 0) − S_n(τ₁, 0, 0) ‖^{2γ} )^{1/γ} ≤ C̄^{1/γ} (τ₂ − τ₁).

Let

    A_{1i} = [ ( τ₂ − 1(u_i^0(τ₂) ≤ 0) ) − ( τ₁ − 1(u_i^0(τ₁) ≤ 0) ) ] z_{i,τ₂} K_{i,τ₂},
    A_{2i} = ( τ₁ − 1(u_i^0(τ₁) ≤ 0) ) ( z_{i,τ₂} K_{i,τ₂} − z_{i,τ₁} K_{i,τ₁} ).

Then, we can write

    S_n(τ₂, 0, 0) − S_n(τ₁, 0, 0) = (n h_n^d)^{−1/2} Σ_{i=1}^n ( A_{1i} + A_{2i} ),

therefore

    ( E ‖ S_n(τ₂, 0, 0) − S_n(τ₁, 0, 0) ‖^{2γ} )^{1/γ} = { (n h_n^d)^{−γ} E ‖ Σ_{i=1}^n ( A_{1i} + A_{2i} ) ‖^{2γ} }^{1/γ}
    = { (n h_n^d)^{−γ} E ( Σ_{j=1}^{d+1} | Σ_{i=1}^n ( A_{1i,j} + A_{2i,j} ) |² )^γ }^{1/γ}
    ≤ Σ_{j=1}^{d+1} { (n h_n^d)^{−γ} E | Σ_{i=1}^n ( A_{1i,j} + A_{2i,j} ) |^{2γ} }^{1/γ},    (B.4)

where A_{1i,j} and A_{2i,j} denote the j-th elements of A_{1i} and A_{2i}, respectively, and the last line follows from the Minkowski inequality. We now bound the term inside the curly brackets using arguments similar to Bai (1996, Lemma A1):

    (n h_n^d)^{−γ} E | Σ_{i=1}^n ( A_{1i,j} + A_{2i,j} ) |^{2γ}
    ≤ C_γ (n h_n^d)^{−γ} ( Σ_{i=1}^n E | A_{1i,j} + A_{2i,j} |² )^γ + C_γ (n h_n^d)^{−γ} Σ_{i=1}^n E | A_{1i,j} + A_{2i,j} |^{2γ}
    ≤ 2^γ C_γ (n h_n^d)^{−γ} ( Σ_{i=1}^n ( E A²_{1i,j} + E A²_{2i,j} ) )^γ + 2^{2γ} C_γ (n h_n^d)^{−γ} Σ_{i=1}^n E ( A²_{1i,j} + A²_{2i,j} )^γ
    ≤ 2^γ C_γ (n h_n^d)^{−γ} ( Σ_{i=1}^n ( E ‖A_{1i}‖² + E ‖A_{2i}‖² ) )^γ    (T3)
    + 2^{2γ} C_γ (n h_n^d)^{−γ} Σ_{i=1}^n ( ( E ‖A_{1i}‖^{2γ} )^{1/γ} + ( E ‖A_{2i}‖^{2γ} )^{1/γ} )^γ,    (T4)

where the first inequality is because of the Rosenthal inequality for independent random variables (Hall and Heyde, 1980, p. 23), with the constant C_γ depending only on γ; the second is because of the triangle inequality; the third is due to the simple facts that A²_{1i,j} ≤ ‖A_{1i}‖² and A²_{2i,j} ≤ ‖A_{2i}‖²; and the last inequality is due to the Minkowski inequality. We now derive bounds for the summands in (T3) and (T4):

    E ‖A_{1i}‖^{2γ} = E { E ( ‖A_{1i}‖^{2γ} | x_i ) }
    = E { E [ ( τ₂ − 1(u_i^0(τ₂) ≤ 0) − τ₁ + 1(u_i^0(τ₁) ≤ 0) )^{2γ} | x_i ] ‖ z_{i,τ₂} K_{i,τ₂} ‖^{2γ} }
    ≤ E { E [ | τ₂ − 1(u_i^0(τ₂) ≤ 0) − τ₁ + 1(u_i^0(τ₁) ≤ 0) | | x_i ] ‖ z_{i,τ₂} K_{i,τ₂} ‖^{2γ} }
    ≤ 2 (τ₂ − τ₁) E ‖ z_{i,τ₂} K_{i,τ₂} ‖^{2γ},

where the first inequality is because | τ₂ − 1(u_i^0(τ₂) ≤ 0) − τ₁ + 1(u_i^0(τ₁) ≤ 0) | ≤ 1. Because K_{i,τ} has a bounded support, using similar arguments as in Lemma B.1, there exists a constant C independent of τ such that E ‖ z_{i,τ₂} K_{i,τ₂} ‖^{2γ} ≤ C h_n^d, which implies

    E ‖A_{1i}‖^{2γ} ≤ C h_n^d (τ₂ − τ₁).

Meanwhile,

    E ‖A_{2i}‖^{2γ} = E ‖ ( τ₁ − 1(u_i^0(τ₁) ≤ 0) ) ( z_{i,τ₂} K_{i,τ₂} − z_{i,τ₁} K_{i,τ₁} ) ‖^{2γ} ≤ E ‖ z_{i,τ₂} K_{i,τ₂} − z_{i,τ₁} K_{i,τ₁} ‖^{2γ} ≤ B h_n^d (τ₂ − τ₁)^{2γ},

where the last inequality uses the first result in Lemma B.1. The terms E ‖A_{1i}‖² and E ‖A_{2i}‖² in (T3) can be bounded similarly, giving

    E ‖A_{1i}‖² ≤ C h_n^d (τ₂ − τ₁) and E ‖A_{2i}‖² ≤ B h_n^d (τ₂ − τ₁)².

These bounds imply

    (T3) ≤ 2^γ C_γ ( (n h_n^d)^{−1} Σ_{i=1}^n ( C (τ₂ − τ₁) h_n^d + B (τ₂ − τ₁)² h_n^d ) )^γ ≤ M (τ₂ − τ₁)^γ for some constant M    (B.5)

and

    (T4) = 2^{2γ} C_γ (n h_n^d)^{−γ} Σ_{i=1}^n ( ( E ‖A_{1i}‖^{2γ} )^{1/γ} + ( E ‖A_{2i}‖^{2γ} )^{1/γ} )^γ
    ≤ 2^{2γ} C_γ (n h_n^d)^{−γ} Σ_{i=1}^n ( ( C h_n^d (τ₂ − τ₁) )^{1/γ} + ( B h_n^d (τ₂ − τ₁)^{2γ} )^{1/γ} )^γ
    ≤ M (n h_n^d)^{1−γ} (τ₂ − τ₁) = M ( n h_n^d (τ₂ − τ₁) )^{1−γ} (τ₂ − τ₁)^γ for some constant M.

By the definition of b_n in the Lemma, we have τ₂ − τ₁ ≥ b_n > (n h_n^d)^{−1}, which implies n h_n^d (τ₂ − τ₁) > 1. Consequently, M ( n h_n^d (τ₂ − τ₁) )^{1−γ} < M because γ > 1. Therefore,

    (T4) ≤ M (τ₂ − τ₁)^γ.    (B.6)

Combining (B.5) and (B.6), it follows that each term inside the curly brackets in (B.4) is bounded by 2M (τ₂ − τ₁)^γ. Thus, (B.4) is bounded by (d+1)(2M)^{1/γ} (τ₂ − τ₁). Letting C̄ = (d+1)^γ (2M) completes the proof. □

The next lemma establishes the stochastic equicontinuity of the process S_n(τ, 0, 0).

Lemma B.3 For any $\varepsilon>0$ and $\eta>0$, there exists a $\delta>0$ such that, for large $n$,
$$P\left(\sup_{\tau'',\tau'\in\mathcal T,\;|\tau''-\tau'|\le\delta}\left\|S_n(\tau'',0,0)-S_n(\tau',0,0)\right\|>\varepsilon\right)<\eta.$$

Proof. Let $z_{i,\tau,j}$ denote the $j$-th component of $z_{i,\tau}$ ($j=1,\ldots,d+1$). Note that $z_{i,\tau',j}\ge 0$ if and only if $z_{i,\tau'',j}\ge 0$ ($j=1,\ldots,d+1$) for any $\tau'',\tau'\in\mathcal T$. Without loss of generality, assume that the elements of $z_{i,\tau}$ are all nonnegative. Otherwise, let
$$z_{i,\tau,j}=z_{i,\tau,j}^{+}-z_{i,\tau,j}^{-}\equiv z_{i,\tau,j}1(z_{i,\tau,j}\ge 0)-(-z_{i,\tau,j})1(z_{i,\tau,j}\le 0)\tag{B.7}$$
and let $z_{i,\tau}^{+}=(z_{i,\tau,1}^{+},\ldots,z_{i,\tau,d+1}^{+})$ and $z_{i,\tau}^{-}=(z_{i,\tau,1}^{-},\ldots,z_{i,\tau,d+1}^{-})$. Then,
$$S_n(\tau,0,0)=(nh_n^d)^{-1/2}\sum_{i=1}^{n}\left(\tau-1(u_i^0(\tau)\le 0)\right)z_{i,\tau}^{+}K_{i,\tau}-(nh_n^d)^{-1/2}\sum_{i=1}^{n}\left(\tau-1(u_i^0(\tau)\le 0)\right)z_{i,\tau}^{-}K_{i,\tau},$$
where the elements of $z_{i,\tau}^{+}$ and $z_{i,\tau}^{-}$ are nonnegative and the two summations can be analyzed separately. Note that Lemmas B.1 and B.2, which are the only results needed here, still hold when $z_{i,\tau}$ is replaced by $z_{i,\tau}^{+}$ or $z_{i,\tau}^{-}$.

For a given $\delta$, $\mathcal T$ contains at most $1/\delta$ intervals of length $\delta$. Therefore, it suffices to show that for any $\varepsilon>0$ and $\eta>0$ there exists a $\delta>0$ such that (see Billingsley, 1968, p. 58, equation 8.12)
$$P\left(\sup_{s\le\tau\le\delta+s}\left\|S_n(\tau,0,0)-S_n(s,0,0)\right\|>\varepsilon\right)<\eta\delta\quad\text{holds for all }s\in\mathcal T\text{ when }n\text{ is large}\tag{B.8}$$
(with $\tau$ in the supremum restricted to $\tau\in\mathcal T$). To show this, we use a chaining argument. Partition the interval $[s,\delta+s]$ into
$$b_n=(nh_n^d)^{1/2+\epsilon}$$
small intervals of equal size, with $0<\epsilon<1/2$. Let $\tau_j$ denote the lower limit of the $j$-th interval; note that $\tau_1=s$. Then, by the triangle inequality,

$$\sup_{s\le\tau\le\delta+s}\left\|S_n(\tau,0,0)-S_n(s,0,0)\right\|\tag{B.9}$$
$$\le\sup_{1\le j\le b_n}\sup_{\tau\in[\tau_j,\tau_{j+1}]}\left\|S_n(\tau,0,0)-S_n(\tau_j,0,0)\right\|+\sup_{1\le j\le b_n}\left\|S_n(\tau_j,0,0)-S_n(s,0,0)\right\|.$$

For the first term, we have
$$S_n(\tau,0,0)-S_n(\tau_j,0,0)\tag{B.10}$$
$$\ge(nh_n^d)^{-1/2}\sum_{i=1}^{n}\left(\tau_j-1(u_i^0(\tau_{j+1})<0)\right)z_{i,\tau}K_{i,\tau}-S_n(\tau_j,0,0)$$
$$=S_n(\tau_{j+1},0,0)-S_n(\tau_j,0,0)$$
$$\quad+(nh_n^d)^{-1/2}\sum_{i=1}^{n}\left(\tau_{j+1}-1(u_i^0(\tau_{j+1})<0)\right)\left(z_{i,\tau}K_{i,\tau}-z_{i,\tau_{j+1}}K_{i,\tau_{j+1}}\right)\quad\text{(T5)}$$
$$\quad-(nh_n^d)^{-1/2}\sum_{i=1}^{n}(\tau_{j+1}-\tau_j)\,z_{i,\tau}K_{i,\tau}\quad\text{(T6)},$$
where the inequality uses the nonnegativity of $z_{i,\tau}K_{i,\tau}$ and the fact that $\tau_j$ and $\tau_{j+1}$ are the lower and upper limits of the interval containing $\tau$, and the equality uses the definition of $S_n(\tau,0,0)$. The terms (T5) and (T6) are negligible because
$$\|\text{(T5)}\|\le(nh_n^d)^{-1/2}\sum_{i=1}^{n}\left\|z_{i,\tau}K_{i,\tau}-z_{i,\tau_{j+1}}K_{i,\tau_{j+1}}\right\|=o_p(1)$$
by the second result in Lemma B.1, and
$$\|\text{(T6)}\|\le(nh_n^d)^{-\epsilon}\left\{(nh_n^d)^{-1}\sum_{i=1}^{n}\left\|z_{i,\tau}K_{i,\tau}\right\|\right\}=o_p(1)$$
because $|\tau_{j+1}-\tau_j|\le(nh_n^d)^{-1/2-\epsilon}$. Therefore,
$$S_n(\tau,0,0)-S_n(\tau_j,0,0)\ge S_n(\tau_{j+1},0,0)-S_n(\tau_j,0,0)-\frac{\varepsilon}{5}\tag{B.11}$$
with probability no less than $1-\eta\delta$ uniformly in $\mathcal T$ for large $n$. Meanwhile, a reversed inequality for (B.10) is given by

$$S_n(\tau,0,0)-S_n(\tau_j,0,0)$$
$$\le(nh_n^d)^{-1/2}\sum_{i=1}^{n}\left(\tau_{j+1}-1(u_i^0(\tau_j)\le 0)\right)z_{i,\tau}K_{i,\tau}-(nh_n^d)^{-1/2}\sum_{i=1}^{n}\left(\tau_j-1(u_i^0(\tau_j)<0)\right)z_{i,\tau_j}K_{i,\tau_j}$$
$$=(nh_n^d)^{-1/2}\sum_{i=1}^{n}1(u_i^0(\tau_j)<0)\left(z_{i,\tau_j}K_{i,\tau_j}-z_{i,\tau}K_{i,\tau}\right)\quad\text{(T7)}$$
$$\quad+(nh_n^d)^{-1/2}\sum_{i=1}^{n}(\tau_{j+1}-\tau_j)\,z_{i,\tau}K_{i,\tau}\quad\text{(T8)}$$
$$\quad-(nh_n^d)^{-1/2}\sum_{i=1}^{n}\tau_j\left(z_{i,\tau_j}K_{i,\tau_j}-z_{i,\tau}K_{i,\tau}\right)\quad\text{(T9)}.$$
The terms (T7) and (T9) can be analyzed in the same way as (T5) and are $o_p(1)$ uniformly in $\mathcal T$. The term (T8) can be analyzed in the same way as (T6) and is also $o_p(1)$ uniformly in $\mathcal T$. Therefore,
$$S_n(\tau,0,0)-S_n(\tau_j,0,0)\le\frac{\varepsilon}{5}\tag{B.12}$$

holds with probability no less than $1-\eta\delta$ uniformly in $\mathcal T$ for large $n$. Combining (B.11) and (B.12), along with (B.9), we have
$$\sup_{s\le\tau\le\delta+s}\left\|S_n(\tau,0,0)-S_n(s,0,0)\right\|$$
$$\le\sup_{1\le j\le b_n}\left\|S_n(\tau_{j+1},0,0)-S_n(\tau_j,0,0)\right\|+\frac{2\varepsilon}{5}+\sup_{1\le j\le b_n}\left\|S_n(\tau_j,0,0)-S_n(s,0,0)\right\|$$
$$\le 3\sup_{1\le j\le b_n}\left\|S_n(\tau_j,0,0)-S_n(s,0,0)\right\|+\frac{2\varepsilon}{5},$$
which holds with probability no less than $1-\eta\delta$ uniformly in $\mathcal T$ for large $n$, where the second inequality is due to the triangle inequality:
$$\sup_{1\le j\le b_n}\left\|S_n(\tau_{j+1},0,0)-S_n(\tau_j,0,0)\right\|\le 2\sup_{1\le j\le b_n}\left\|S_n(\tau_j,0,0)-S_n(s,0,0)\right\|.$$

Therefore, proving (B.8) boils down to showing
$$P\left(\sup_{1\le j\le b_n}\left\|S_n(\tau_j,0,0)-S_n(\tau_1,0,0)\right\|>\frac{\varepsilon}{5}\right)<\eta\delta\quad\text{for large }n.$$

To show this, we follow Bai (1996, p. 613) and apply Theorem 12.2 in Billingsley (1968). This result allows us to bound the above probability using the bounds on the moments of $S_n(\tau_{j+1},0,0)-S_n(\tau_j,0,0)$ established in the previous lemma. Specifically, it states that if there exist $\lambda\ge 0$, $\beta>1$ and $u_l\ge 0$ ($l=1,\ldots,b_n$) such that
$$E\left\|S_n(\tau_j,0,0)-S_n(\tau_i,0,0)\right\|^{\lambda}\le\left(\sum_{i<l\le j}u_l\right)^{\beta},\qquad 0\le i\le j\le b_n,$$

then
$$P\left(\sup_{1\le j\le b_n}\left\|S_n(\tau_j,0,0)-S_n(\tau_1,0,0)\right\|>\varepsilon\right)\le\frac{C_{\lambda,\beta}}{\varepsilon^{\lambda}}\left(u_1+\cdots+u_{b_n}\right)^{\beta}.$$

In our case, let $\lambda=2\gamma$ and $\beta=\gamma$; then Lemma B.2 implies
$$E\left\|S_n(\tau_j,0,0)-S_n(\tau_i,0,0)\right\|^{\lambda}\le\bar C(\tau_j-\tau_i)^{\beta},\qquad 0\le i\le j\le b_n;$$
therefore,
$$P\left(\sup_{1\le j\le b_n}\left\|S_n(\tau_j,0,0)-S_n(\tau_1,0,0)\right\|>\frac{\varepsilon}{5}\right)\le\frac{C_{\lambda,\beta}}{(\varepsilon/5)^{\lambda}}\bar C\left(\tau_{b_n}-\tau_1\right)^{\beta}\le\delta\left(\frac{C_{\lambda,\beta}}{(\varepsilon/5)^{\lambda}}\bar C\delta^{\beta-1}\right).$$
We can choose $\delta$ such that
$$\frac{C_{\lambda,\beta}}{(\varepsilon/5)^{\lambda}}\bar C\delta^{\beta-1}\le\eta.$$

This completes the proof. ∎

The next two lemmas are used to establish the relationship between $S_n(\tau,\delta,e_i(\tau))$ and $S_n(\tau,0,e_i(\tau))$. They are needed to quantify the effect of the parameter estimation on the subgradient.

Lemma B.4 Let $b_n=(nh_n^d)^{1/2+\epsilon}$ and consider a partition of $\mathcal T$ into $b_n$ intervals of equal size. Let $\tau_j$ denote the lower limit of the $j$-th interval. Then, under Assumptions 1-6, we have
$$\sup_{1\le j\le b_n}\sup_{\tau_{j-1}\le\tau\le\tau_j}\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau}K_{i,\tau}\left[1(u_i^0(\tau_j)\le e_i(\tau))-1(u_i^0(\tau_j)\le e_i(\tau_j))\right]\right\|=o_p(1)$$
and
$$\sup_{1\le j\le b_n}\sup_{\tau_{j-1}\le\tau\le\tau_j}\sup_{\|\delta\|\le\log^{1/2}(nh_{n,\tau}^d)}\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau}K_{i,\tau}\,1\left(u_i^0(\tau_j)\le e_i(\tau)+(nh_{n,\tau}^d)^{-1/2}z_{i,\tau}'\delta\right)\right.$$
$$\left.\quad-(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}\,1\left(u_i^0(\tau_j)\le e_i(\tau_j)+(nh_{n,\tau_j}^d)^{-1/2}z_{i,\tau_j}'\delta\right)\right\|=o_p(1).$$

Proof. Without loss of generality, assume $z_{i,\tau}$ is a scalar and nonnegative; otherwise, we can repeat the argument in (B.7) and analyze each element separately.

By the second result in Lemma B.1, the left-hand side in the first result has the same order as
$$\sup_{1\le j\le b_n}\sup_{\tau_{j-1}\le\tau\le\tau_j}\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}\left[1(u_i^0(\tau_j)\le e_i(\tau))-1(u_i^0(\tau_j)\le e_i(\tau_j))\right]\right\|.\tag{B.13}$$



Meanwhile, for any $x_i$ satisfying $(x_i-x)/h_{n,\tau}\in\mathrm{supp}(K(\cdot))$,
$$e_i(\tau)=\frac{1}{2}\left(\frac{x_i-x}{h_{n,\tau}}\right)'\frac{\partial^2 Q(\tau|x)}{\partial x\partial x'}\left(\frac{x_i-x}{h_{n,\tau}}\right)h_{n,\tau}^{2}+o(h_{n,\tau}^{2}),$$
where the $o(h_{n,\tau}^{2})$ term is uniform in $\tau$ because of the boundedness and Lipschitz continuity of $\partial^2 Q(\tau|x)/\partial x\partial x'$ stated in Assumption 3. Because $|\tau_j-\tau_{j-1}|\le(nh_n^d)^{-1/2-\epsilon}$ and $h_{n,\tau}=O\left(n^{-1/(4+d)}\right)$ for all $\tau$, we have
$$e_i(\tau)-e_i(\tau_j)=o\left((nh_n^d)^{-1/2}\right)\le\varepsilon(nh_n^d)^{-1/2},$$

which holds in probability uniformly in $\tau\in\mathcal T$, where $\varepsilon$ is a constant that can be made arbitrarily small. Thus, (B.13) is bounded from above by
$$\text{(T10)}=\sup_{1\le j\le b_n}\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}\,1\left(e_i(\tau_j)-\varepsilon(nh_n^d)^{-1/2}\le u_i^0(\tau_j)\le e_i(\tau_j)+\varepsilon(nh_n^d)^{-1/2}\right)\right\|.$$
To further bound (T10), let
$$\chi_{i,\tau_j}=1\left(e_i(\tau_j)-\varepsilon(nh_n^d)^{-1/2}\le u_i^0(\tau_j)\le e_i(\tau_j)+\varepsilon(nh_n^d)^{-1/2}\right)-E\left\{1\left(e_i(\tau_j)-\varepsilon(nh_n^d)^{-1/2}\le u_i^0(\tau_j)\le e_i(\tau_j)+\varepsilon(nh_n^d)^{-1/2}\right)\Big|\,x_i\right\},$$
which has mean zero and, for any $\gamma>1$, satisfies
$$E\left(\|\chi_{i,\tau_j}\|^{2\gamma}\mid x_i\right)\le E\left(\|\chi_{i,\tau_j}\|^{2}\mid x_i\right)\le B(nh_n^d)^{-1/2}$$

for some constant $B$. Applying the triangle inequality to (T10),
$$\text{(T10)}\le\sup_{1\le j\le b_n}\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}\chi_{i,\tau_j}\right\|+\sup_{1\le j\le b_n}\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}E\left\{1\left(e_i(\tau_j)-\varepsilon(nh_n^d)^{-1/2}\le u_i^0(\tau_j)\le e_i(\tau_j)+\varepsilon(nh_n^d)^{-1/2}\right)\Big|\,x_i\right\}\right\|.$$
The second term is $o_p(1)$ by choosing a small $\varepsilon$. The first term satisfies

$$P\left(\sup_{1\le j\le b_n}\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}\chi_{i,\tau_j}\right\|>\eta\right)\le\sum_{j=1}^{b_n}P\left(\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}\chi_{i,\tau_j}\right\|>\eta\right).$$

Applying the Rosenthal inequality with $\gamma>1$ to the $j$-th term on the right side (Hall and Heyde, 1980, p. 23),
$$P\left(\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}\chi_{i,\tau_j}\right\|>\eta\right)$$
$$\le C\eta^{-2\gamma}(nh_n^d)^{-\gamma/2}\left\{\left(E\left[(nh_n^d)^{-1/2}\sum_{i=1}^{n}E\left(\|\chi_{i,\tau_j}\|^{2}\mid x_i\right)\left\|z_{i,\tau_j}K_{i,\tau_j}\right\|^{2}\right]\right)^{\gamma}+(nh_n^d)^{-\gamma/2}\sum_{i=1}^{n}E\left[\left\|z_{i,\tau_j}K_{i,\tau_j}\right\|^{2\gamma}E\left(\|\chi_{i,\tau_j}\|^{2\gamma}\mid x_i\right)\right]\right\}.$$



The term inside the curly brackets is bounded by a constant $M$ independent of $\tau$. Therefore, the preceding display is bounded by $CM\eta^{-2\gamma}(nh_n^d)^{-\gamma/2}$, whose summation over $j=1,\ldots,b_n$ is bounded by
$$CM\eta^{-2\gamma}(nh_n^d)^{-\gamma/2}(nh_n^d)^{1/2+\epsilon},$$
which converges to zero by choosing $\gamma>1+2\epsilon$. This establishes the first result.

For the second result, notice that the difference between $(nh_{n,\tau}^d)^{-1/2}z_{i,\tau}'\delta$ and $(nh_{n,\tau_j}^d)^{-1/2}z_{i,\tau_j}'\delta$ is of order lower than $(nh_n^d)^{-1/2}$ because of Assumption 5. Therefore, the same arguments as above can be used. The uniformity in $\delta$ can be shown in the same way as in the next lemma; the details are omitted. ∎

Lemma B.5 Under Assumptions 1-6, with $K_n=\log^{1/2}(nh_n^d)$, we have
$$\sup_{\tau\in\mathcal T}\sup_{\|\delta\|\le K_n}\left\|S_n(\tau,\delta,e_i(\tau))-S_n(\tau,0,e_i(\tau))\right\|=o_p(1).$$

Proof. The proof is similar to that of Lemma B.3. As in that lemma, assume that the elements of $z_{i,\tau}$ are all nonnegative. We apply a chaining argument. First, partition $\mathcal T$ into $b_n=(nh_n^d)^{1/2+\epsilon}$ intervals as before and let $\tau_j$ denote the lower limit of the $j$-th interval. Then, as in (B.10), for any $\tau\in[\tau_j,\tau_{j+1}]$,

$$S_n(\tau,\delta,e_i(\tau))-S_n(\tau,0,e_i(\tau))\tag{B.14}$$
$$\le(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau}K_{i,\tau}\,1(u_i^0(\tau_{j+1})\le e_i(\tau))-(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau}K_{i,\tau}\,1\left(u_i^0(\tau_j)\le e_i(\tau)+(nh_{n,\tau}^d)^{-1/2}z_{i,\tau}'\delta\right)$$
$$\quad-(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau}K_{i,\tau}E\left\{1(u_i^0(\tau_j)\le e_i(\tau))\mid x_i\right\}+(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau}K_{i,\tau}E\left\{1\left(u_i^0(\tau_{j+1})\le e_i(\tau)+(nh_{n,\tau}^d)^{-1/2}z_{i,\tau}'\delta\right)\Big|\,x_i\right\}.$$
Applying the results in Lemma B.4, we have, for any $\varepsilon>0$,
$$S_n(\tau,\delta,e_i(\tau))-S_n(\tau,0,e_i(\tau))$$
$$\le(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_{j+1}}K_{i,\tau_{j+1}}\,1(u_i^0(\tau_{j+1})\le e_i(\tau_{j+1}))-(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}\,1\left(u_i^0(\tau_j)\le e_i(\tau_j)+(nh_{n,\tau_j}^d)^{-1/2}z_{i,\tau_j}'\delta\right)$$
$$\quad-(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_{j+1}}K_{i,\tau_{j+1}}E\left\{1(u_i^0(\tau_j)\le e_i(\tau_{j+1}))\mid x_i\right\}+(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}E\left\{1\left(u_i^0(\tau_{j+1})\le e_i(\tau_j)+(nh_{n,\tau_j}^d)^{-1/2}z_{i,\tau_j}'\delta\right)\Big|\,x_i\right\}$$
$$\quad+\varepsilon\quad\text{uniformly in }\mathcal T\text{ and }\|\delta\|\le K_n.$$



By adding and subtracting terms, the right-hand side can be rewritten as
$$S_n(\tau_j,\delta,e_i(\tau_j))-S_n(\tau_{j+1},0,e_i(\tau_{j+1}))$$
$$\quad+(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_{j+1}}K_{i,\tau_{j+1}}E\left\{1\left(u_i^0(\tau_{j+1})\le e_i(\tau_{j+1})\right)\mid x_i\right\}\quad\text{(T11)}$$
$$\quad-(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_{j+1}}K_{i,\tau_{j+1}}E\left\{1(u_i^0(\tau_j)\le e_i(\tau_{j+1}))\mid x_i\right\}\quad\text{(T12)}$$
$$\quad+(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}E\left\{1\left(u_i^0(\tau_{j+1})\le e_i(\tau_j)+(nh_{n,\tau_j}^d)^{-1/2}z_{i,\tau_j}'\delta\right)\Big|\,x_i\right\}\quad\text{(T13)}$$
$$\quad-(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}E\left\{1\left(u_i^0(\tau_j)\le e_i(\tau_j)+(nh_{n,\tau_j}^d)^{-1/2}z_{i,\tau_j}'\delta\right)\Big|\,x_i\right\}\quad\text{(T14)}$$
$$\quad+\varepsilon.$$
Applying a first-order Taylor expansion, we have (T11)−(T12) $=O_p\left((nh_n^d)^{-\epsilon}\right)=o_p(1)$ and (T13)−(T14) $=O_p\left((nh_n^d)^{-\epsilon}\right)=o_p(1)$ uniformly in $\tau$. Therefore, the preceding display is bounded by
$$S_n(\tau,\delta,e_i(\tau))-S_n(\tau,0,e_i(\tau))\le S_n(\tau_j,\delta,e_i(\tau_j))-S_n(\tau_{j+1},0,e_i(\tau_{j+1}))+2\varepsilon$$
uniformly in $\tau$ and $\|\delta\|\le K_n$ for large $n$.



Meanwhile, a reversed inequality is given by
$$S_n(\tau,\delta,e_i(\tau))-S_n(\tau,0,e_i(\tau))$$
$$\ge(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}\,1(u_i^0(\tau_j)\le e_i(\tau_j))-(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_{j+1}}K_{i,\tau_{j+1}}\,1\left(u_i^0(\tau_{j+1})\le e_i(\tau_{j+1})+(nh_{n,\tau_{j+1}}^d)^{-1/2}z_{i,\tau_{j+1}}'\delta\right)$$
$$\quad-(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}E\left\{1(u_i^0(\tau_{j+1})\le e_i(\tau_j))\mid x_i\right\}+(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_{j+1}}K_{i,\tau_{j+1}}E\left\{1\left(u_i^0(\tau_j)\le e_i(\tau_{j+1})+(nh_{n,\tau_{j+1}}^d)^{-1/2}z_{i,\tau_{j+1}}'\delta\right)\Big|\,x_i\right\}$$
$$\quad-\varepsilon$$
$$=S_n(\tau_{j+1},\delta,e_i(\tau_{j+1}))-S_n(\tau_j,0,e_i(\tau_j))$$
$$\quad+(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}E\left\{1(u_i^0(\tau_j)\le e_i(\tau_j))\mid x_i\right\}-(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}E\left\{1(u_i^0(\tau_{j+1})\le e_i(\tau_j))\mid x_i\right\}$$
$$\quad+(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_{j+1}}K_{i,\tau_{j+1}}E\left\{1\left(u_i^0(\tau_j)\le e_i(\tau_{j+1})+(nh_{n,\tau_{j+1}}^d)^{-1/2}z_{i,\tau_{j+1}}'\delta\right)\Big|\,x_i\right\}-(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_{j+1}}K_{i,\tau_{j+1}}E\left\{1\left(u_i^0(\tau_{j+1})\le e_i(\tau_{j+1})+(nh_{n,\tau_{j+1}}^d)^{-1/2}z_{i,\tau_{j+1}}'\delta\right)\Big|\,x_i\right\}$$
$$\quad-\varepsilon$$
$$\ge S_n(\tau_{j+1},\delta,e_i(\tau_{j+1}))-S_n(\tau_j,0,e_i(\tau_j))-2\varepsilon.$$

Combining the above results,
$$\sup_{\tau\in\mathcal T}\sup_{\|\delta\|\le K_n}\left\|S_n(\tau,\delta,e_i(\tau))-S_n(\tau,0,e_i(\tau))\right\|$$
$$\le\sup_{1\le j\le b_n}\sup_{\|\delta\|\le K_n}\left\|S_n(\tau_{j+1},\delta,e_i(\tau_{j+1}))-S_n(\tau_j,0,e_i(\tau_j))\right\|+\sup_{1\le j\le b_n}\sup_{\|\delta\|\le K_n}\left\|S_n(\tau_j,\delta,e_i(\tau_j))-S_n(\tau_{j+1},0,e_i(\tau_{j+1}))\right\|+4\varepsilon.$$
Adding and subtracting terms, the first term on the right-hand side equals the supremum of
$$\left\{S_n(\tau_{j+1},\delta,e_i(\tau_{j+1}))-S_n(\tau_{j+1},0,e_i(\tau_{j+1}))\right\}+\left\{S_n(\tau_{j+1},0,e_i(\tau_{j+1}))-S_n(\tau_{j+1},0,0)\right\}+\left\{S_n(\tau_{j+1},0,0)-S_n(\tau_j,0,0)\right\}+\left\{S_n(\tau_j,0,0)-S_n(\tau_j,0,e_i(\tau_j))\right\},$$
while the second term equals the supremum of
$$\left\{S_n(\tau_j,\delta,e_i(\tau_j))-S_n(\tau_j,0,e_i(\tau_j))\right\}+\left\{S_n(\tau_j,0,e_i(\tau_j))-S_n(\tau_j,0,0)\right\}+\left\{S_n(\tau_j,0,0)-S_n(\tau_{j+1},0,0)\right\}+\left\{S_n(\tau_{j+1},0,0)-S_n(\tau_{j+1},0,e_i(\tau_{j+1}))\right\}.$$



Thus,
$$\sup_{\tau\in\mathcal T}\sup_{\|\delta\|\le K_n}\left\|S_n(\tau,\delta,e_i(\tau))-S_n(\tau,0,e_i(\tau))\right\|$$
$$\le 2\sup_{1\le j\le b_n+1}\sup_{\|\delta\|\le K_n}\left\|S_n(\tau_j,\delta,e_i(\tau_j))-S_n(\tau_j,0,e_i(\tau_j))\right\|\quad\text{(T15)}$$
$$\quad+4\sup_{1\le j\le b_n+1}\left\|S_n(\tau_j,0,e_i(\tau_j))-S_n(\tau_j,0,0)\right\|\quad\text{(T16)}$$
$$\quad+2\sup_{1\le j\le b_n}\left\|S_n(\tau_{j+1},0,0)-S_n(\tau_j,0,0)\right\|\quad\text{(T17)}$$
$$\quad+4\varepsilon.$$

First consider term (T15). For any $\varrho>0$, the set $\{\delta:\|\delta\|\le K_n\}$ can be partitioned into $N(\varrho)$ spheres such that the diameter of each sphere is less than or equal to $\varrho$. Note that $N(\varrho)=O(K_n^{d+1})$. Denote the spheres by $D_h$, with centers $\delta_h$ ($h\in\{1,2,\ldots,N(\varrho)\}$). For any $\delta\in D_h$, we have
$$z_{i,\tau_j}'\delta_h-\|z_{i,\tau_j}\|\varrho\le z_{i,\tau_j}'\delta\le z_{i,\tau_j}'\delta_h+\|z_{i,\tau_j}\|\varrho.$$
Therefore,
$$\sup_{1\le j\le b_n+1}\sup_{\|\delta\|\le K_n}\left\|S_n(\tau_j,\delta,e_i(\tau_j))-S_n(\tau_j,0,e_i(\tau_j))\right\|\tag{B.15}$$
$$\le\sup_{1\le j\le b_n+1}\sup_{\substack{1\le h\le N(\varrho)\\ k=1,2;\;\delta\in D_h}}\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}\left[E\left\{1\left(u_i^0(\tau_j)\le e_i(\tau_j)+(nh_{n,\tau_j}^d)^{-1/2}\left(z_{i,\tau_j}'\delta_h+(-1)^{k}\|z_{i,\tau_j}\|\varrho\right)\right)\Big|\,x_i\right\}-E\left\{1\left(u_i^0(\tau_j)\le e_i(\tau_j)+(nh_{n,\tau_j}^d)^{-1/2}z_{i,\tau_j}'\delta\right)\Big|\,x_i\right\}\right]\right\|$$
$$\quad+\sup_{1\le j\le b_n+1}\max_{1\le h\le N(\varrho)}\max_{k=1,2}\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}\chi_{i,j,h,k}\right\|,$$
where
$$\chi_{i,j,h,k}=E\left\{1\left(u_i^0(\tau_j)\le e_i(\tau_j)+(nh_{n,\tau_j}^d)^{-1/2}\left(z_{i,\tau_j}'\delta_h+(-1)^{k}\|z_{i,\tau_j}\|\varrho\right)\right)\Big|\,x_i\right\}$$
$$\quad-1\left(u_i^0(\tau_j)\le e_i(\tau_j)+(nh_{n,\tau_j}^d)^{-1/2}\left(z_{i,\tau_j}'\delta_h+(-1)^{k}\|z_{i,\tau_j}\|\varrho\right)\right)$$
$$\quad-E\left\{1\left(u_i^0(\tau_j)\le e_i(\tau_j)\right)\mid x_i\right\}+1\left(u_i^0(\tau_j)\le e_i(\tau_j)\right).$$

Applying a Taylor expansion to the first term after the inequality in (B.15), that term is of the same order as $\varrho$, which can be made arbitrarily small by choosing a small $\varrho$. To bound the second term, it suffices to show that for any $\varepsilon>0$, $1\le j\le b_n+1$, $1\le h\le N(\varrho)$ and $k\in\{1,2\}$,
$$b_nN(\varrho)\,P\left(\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}\chi_{i,j,h,k}\right\|>\varepsilon\right)\to 0.$$
To verify this, first notice that for any $\gamma>1$,
$$E\left(\|\chi_{i,j,h,k}\|^{2\gamma}\mid x_i\right)\le E\left(\|\chi_{i,j,h,k}\|^{2}\mid x_i\right)\le B(nh_n^d)^{-1/2}K_n.$$



Then, applying the Rosenthal inequality,
$$b_nN(\varrho)\,P\left(\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau_j}K_{i,\tau_j}\chi_{i,j,h,k}\right\|>\varepsilon\right)$$
$$\le b_nN(\varrho)\,C\varepsilon^{-2\gamma}(nh_n^d)^{-\gamma/2}\left\{\left(E\left[(nh_n^d)^{-1/2}\sum_{i=1}^{n}E\left(\|\chi_{i,j,h,k}\|^{2}\mid x_i\right)\left\|z_{i,\tau_j}K_{i,\tau_j}\right\|^{2}\right]\right)^{\gamma}+(nh_n^d)^{-\gamma/2}\sum_{i=1}^{n}E\left[\left\|z_{i,\tau_j}K_{i,\tau_j}\right\|^{2\gamma}E\left(\|\chi_{i,j,h,k}\|^{2\gamma}\mid x_i\right)\right]\right\}$$
$$\le M\varepsilon^{-2\gamma}b_nN(\varrho)(nh_n^d)^{-\gamma/2}K_n^{\gamma}\quad\text{for large }n.$$
Now, using the definitions of $b_n$, $N(\varrho)$ and $K_n$, the preceding line is less than or equal to
$$\bar M\varepsilon^{-2\gamma}(nh_n^d)^{(1+2\epsilon-\gamma)/2}\log^{(d+1+\gamma)/2}(nh_n^d)$$
for some constant $\bar M$. Choosing $\gamma=1+2\epsilon+c$ for some $c>0$, the preceding display converges to 0.

(T16) can be analyzed similarly and is also $o_p(1)$. (T17) is $o_p(1)$ because of Lemma B.3. ∎

Lemma B.6 Under Assumptions 1-6,
$$\sup_{\tau\in\mathcal T}\left\|(nh_n^d)^{-1/2}\sum_{i=1}^{n}\left[\psi_\tau\left(u_i^0(\tau)-e_i(\tau)\right)-\psi_\tau\left(u_i^0(\tau)\right)\right]z_{i,\tau}K_{i,\tau}\right\|=O_p(1).$$

Proof. Write
$$(nh_n^d)^{-1/2}\sum_{i=1}^{n}\left[\psi_\tau\left(u_i^0(\tau)-e_i(\tau)\right)-\psi_\tau\left(u_i^0(\tau)\right)\right]z_{i,\tau}K_{i,\tau}$$
$$=S_n(\tau,0,0)-S_n(\tau,0,e_i(\tau))+(nh_n^d)^{-1/2}\sum_{i=1}^{n}z_{i,\tau}K_{i,\tau}\left[P\left(u_i^0(\tau)\le 0\mid x_i\right)-E\left\{1\left(u_i^0(\tau)\le e_i(\tau)\right)\mid x_i\right\}\right]+o_p(1).$$
The difference $S_n(\tau,0,0)-S_n(\tau,0,e_i(\tau))$ is $o_p(1)$, as shown in the previous lemma. The remaining term is $O_p(1)$ upon a Taylor expansion, using the fact that $e_i(\tau)=O(h_n^2)$. ∎



Table 1. The Quantile-Specific Bandwidth: n = 250

                      m = 10              m = 20              m = 30
                  τ = 0.5  τ = 0.8    τ = 0.5  τ = 0.8    τ = 0.5  τ = 0.8
Model 1
(0.50, 0.50)       0.855    0.893      0.861    0.900      0.897    0.937
                  (0.622)  (0.650)    (0.514)  (0.537)    (0.573)  (0.599)
(0.75, 0.75)       0.967    1.010      0.997    1.042      1.000    1.045
                  (0.633)  (0.661)    (0.768)  (0.802)    (0.677)  (0.707)
(0.90, 0.90)       0.991    1.035      0.956    0.998      0.972    1.016
                  (1.006)  (1.051)    (0.617)  (0.645)    (0.603)  (0.630)
Model 2
(0.50, 0.50)       0.285    0.297      0.288    0.301      0.287    0.300
                  (0.040)  (0.042)    (0.042)  (0.044)    (0.054)  (0.056)
(0.75, 0.75)       0.239    0.249      0.241    0.251      0.239    0.250
                  (0.045)  (0.047)    (0.046)  (0.048)    (0.040)  (0.041)
(0.90, 0.90)       0.290    0.302      0.283    0.296      0.299    0.312
                  (0.193)  (0.201)    (0.099)  (0.104)    (0.149)  (0.155)
Model 3
(0.50, 0.50)       0.254    0.265      0.255    0.267      0.252    0.263
                  (0.028)  (0.029)    (0.030)  (0.031)    (0.029)  (0.030)
(0.75, 0.75)       0.421    0.440      0.404    0.422      0.410    0.428
                  (0.224)  (0.234)    (0.196)  (0.204)    (0.212)  (0.221)
(0.90, 0.90)       0.460    0.480      0.441    0.461      0.471    0.492
                  (0.224)  (0.234)    (0.250)  (0.262)    (0.375)  (0.391)

We report the means of the selected bandwidths at two representative quantiles and their standard deviations (in parentheses). n is the sample size and m is the number of quantiles in the grid. The results are based on 500 replications.
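As a point of comparison (this is not the selection rule studied in the paper), a common way to let the bandwidth vary across quantiles is a Yu and Jones (1998)-style rescaling, which inflates a baseline bandwidth where data relevant for the τ-th conditional quantile are sparse. A minimal sketch, assuming a baseline bandwidth `h_base` is already available:

```python
from statistics import NormalDist

def quantile_bandwidth(h_base, tau, d=2):
    # Yu-Jones-style adjustment (illustrative only): rescale a baseline
    # bandwidth by a factor that is smallest near tau = 0.5 and grows
    # toward the tails, where the conditional distribution is thinner.
    # The exponent 1/(d+4) matches the local linear rate with d covariates.
    nd = NormalDist()
    q = nd.inv_cdf(tau)  # standard normal tau-quantile
    factor = (tau * (1 - tau) / nd.pdf(q) ** 2) ** (1.0 / (d + 4))
    return h_base * factor
```

Consistent with the pattern in Tables 1 and 2, such a rule yields a somewhat larger bandwidth at τ = 0.8 than at τ = 0.5 for the same baseline.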


Table 2. The Quantile-Specific Bandwidth: n = 500

                      m = 10              m = 20              m = 30
                  τ = 0.5  τ = 0.8    τ = 0.5  τ = 0.8    τ = 0.5  τ = 0.8
Model 1
(0.50, 0.50)       0.870    0.909      0.871    0.910      0.917    0.958
                  (0.475)  (0.496)    (0.734)  (0.767)    (0.572)  (0.597)
(0.75, 0.75)       0.979    1.022      1.027    1.072      0.958    1.001
                  (0.579)  (0.604)    (0.773)  (0.807)    (0.555)  (0.580)
(0.90, 0.90)       0.963    1.006      0.950    0.992      0.964    1.007
                  (0.661)  (0.691)    (0.541)  (0.566)    (0.594)  (0.620)
Model 2
(0.50, 0.50)       0.251    0.263      0.252    0.263      0.252    0.263
                  (0.025)  (0.026)    (0.025)  (0.026)    (0.024)  (0.025)
(0.75, 0.75)       0.207    0.217      0.211    0.221      0.210    0.219
                  (0.022)  (0.023)    (0.023)  (0.024)    (0.026)  (0.027)
(0.90, 0.90)       0.223    0.233      0.223    0.233      0.223    0.233
                  (0.053)  (0.055)    (0.040)  (0.041)    (0.039)  (0.041)
Model 3
(0.50, 0.50)       0.229    0.239      0.230    0.240      0.229    0.239
                  (0.020)  (0.020)    (0.020)  (0.021)    (0.019)  (0.020)
(0.75, 0.75)       0.408    0.427      0.387    0.404      0.410    0.428
                  (0.244)  (0.254)    (0.163)  (0.170)    (0.209)  (0.218)
(0.90, 0.90)       0.405    0.423      0.372    0.389      0.395    0.412
                  (0.370)  (0.386)    (0.155)  (0.162)    (0.253)  (0.265)

We report the means of the selected bandwidths at two representative quantiles and their standard deviations (in parentheses). n is the sample size and m is the number of quantiles in the grid. The results are based on 500 replications.


Table 3. Root Mean Integrated Squared Error (RMISE)

                           n = 250                               n = 500
                (0.5, 0.5) (0.75, 0.75) (0.9, 0.9)    (0.5, 0.5) (0.75, 0.75) (0.9, 0.9)
Model 1
Proposed         0.03760    0.07229     0.12406       0.02548    0.05329     0.09121
Local linear     0.03760    0.07230     0.12418       0.02548    0.05329     0.09122
(Crossing)      (0.00000)  (0.03800)   (0.72200)     (0.00000)  (0.00000)   (0.33800)
Rearrangement    0.03760    0.07229     0.12397       0.02548    0.05329     0.09119
QR               0.03412    0.05786     0.07812       0.02361    0.04186     0.05211
Model 2
Proposed         0.15552    0.22889     0.29916       0.12569    0.18157     0.22604
Local linear     0.15552    0.22890     0.29976       0.12569    0.18157     0.22617
(Crossing)      (0.05000)  (0.11400)   (0.74200)     (0.00600)  (0.05000)   (0.63800)
Rearrangement    0.15552    0.22889     0.29862       0.12569    0.18157     0.22590
QR               0.48764    0.42703     0.21227       0.49118    0.41753     0.18839
Model 3
Proposed         0.13344    0.15490     0.33980       0.10339    0.11527     0.25429
Local linear     0.13344    0.15490     0.34043       0.10339    0.11527     0.25445
(Crossing)      (0.08000)  (0.08200)   (0.91200)     (0.01400)  (0.00600)   (0.82600)
Rearrangement    0.13344    0.15490     0.33921       0.10339    0.11527     0.25409
QR               0.52311    0.15542     0.62234       0.52754    0.12742     0.60894

The domain of the conditional quantile process is T = [0.2, 0.8]; the number of quantiles in the grid is m = 30. We report the mean of the RMISE over 500 simulation replications. "Proposed" denotes the proposed estimator. "Local linear" is the quantile-by-quantile application of the local linear estimator. "Rearrangement" is the rearranged version of "Local linear". "QR" is obtained from the conventional linear quantile regression. "Crossing" denotes the fraction of simulation runs in which "Local linear" exhibits quantile crossing.
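The distinction between "Local linear" and "Rearrangement" is purely a monotonization step: whenever the quantile-by-quantile fits cross, sorting them across τ restores monotonicity while leaving already-monotone fits unchanged. An illustrative sketch of that operation and of the crossing check, for fitted quantiles at a single covariate value (not the code used in the simulations; the `fits` vector is hypothetical):

```python
def has_crossing(fitted):
    # True if the fitted quantile curve is not nondecreasing in tau,
    # i.e. the quantile-by-quantile estimates cross somewhere.
    return any(a > b for a, b in zip(fitted, fitted[1:]))

def rearrange(fitted):
    # Monotone rearrangement: sort the fitted values across the tau
    # grid so the resulting quantile curve is nondecreasing.
    return sorted(fitted)

# Hypothetical fits Q-hat(tau_k | x) on an increasing tau grid:
fits = [0.90, 1.10, 1.05, 1.30]
print(has_crossing(fits))   # True (1.10 > 1.05)
print(rearrange(fits))      # [0.9, 1.05, 1.1, 1.3]
```

When no crossing occurs, rearrangement returns the input unchanged, which is why the "Rearrangement" and "Local linear" rows in the table differ only at the quantile pairs where crossing is frequent.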


Table 4. Coverage Ratio of the Modified Confidence Bands

                      m = 10              m = 20              m = 30
                 n = 250  n = 500    n = 250  n = 500    n = 250  n = 500
Model 1
(0.50, 0.50)      0.968    0.980      0.982    0.988      0.974    0.986
(0.75, 0.75)      0.986    0.986      0.994    0.994      0.986    0.988
(0.90, 0.90)      0.962    0.982      0.962    0.988      0.990    0.988
Model 2
(0.50, 0.50)      0.842    0.888      0.832    0.910      0.876    0.926
(0.75, 0.75)      0.904    0.938      0.918    0.966      0.928    0.962
(0.90, 0.90)      0.872    0.970      0.900    0.972      0.912    0.970
Model 3
(0.50, 0.50)      0.956    0.968      0.970    0.982      0.968    0.990
(0.75, 0.75)      0.910    0.938      0.930    0.946      0.942    0.968
(0.90, 0.90)      0.866    0.932      0.880    0.948      0.892    0.948

Confidence bands with the modified bias adjustment (90% nominal level). Results are based on 500 replications. m is the number of quantiles in the grid and n is the sample size.

Table 5. Coverage Ratio of the Conventional Confidence Bands

                      m = 10              m = 20              m = 30
                 n = 250  n = 500    n = 250  n = 500    n = 250  n = 500
Model 1
(0.50, 0.50)      0.640    0.680      0.706    0.694      0.670    0.704
(0.75, 0.75)      0.876    0.918      0.912    0.938      0.940    0.938
(0.90, 0.90)      0.612    0.640      0.612    0.668      0.666    0.696
Model 2
(0.50, 0.50)      0.834    0.884      0.816    0.904      0.860    0.926
(0.75, 0.75)      0.882    0.916      0.898    0.960      0.916    0.956
(0.90, 0.90)      0.840    0.940      0.866    0.966      0.856    0.960
Model 3
(0.50, 0.50)      0.902    0.924      0.894    0.938      0.934    0.958
(0.75, 0.75)      0.750    0.746      0.770    0.778      0.848    0.800
(0.90, 0.90)      0.742    0.870      0.778    0.892      0.778    0.874

Confidence bands with the conventional bias adjustment (90% nominal level). Results are based on 500 replications. m is the number of quantiles in the grid and n is the sample size.
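Both tables concern uniform bands over the quantile grid, so the critical value applies to the maximum of the studentized process over the m grid points rather than to a single quantile. As a stylized illustration of why such a critical value exceeds the pointwise one, the sketch below simulates the maximum of m independent standard normals; in the paper the critical values are instead obtained by simulating the conditionally pivotal limit process, which is correlated across quantiles, so this is only a simplified stand-in:

```python
import random

def sup_t_critical_value(m, level=0.90, reps=20000, seed=1):
    # Simulated critical value c with P(max_k |Z_k| <= c) ~ level, where
    # Z_1, ..., Z_m are treated as independent N(0, 1) draws (a
    # simplifying assumption; see the text above).
    rng = random.Random(seed)
    sups = sorted(max(abs(rng.gauss(0.0, 1.0)) for _ in range(m))
                  for _ in range(reps))
    return sups[int(level * reps) - 1]

# A 90% uniform band over m = 30 quantiles needs a critical value near
# 2.9, well above the pointwise two-sided value of about 1.645.
c = sup_t_critical_value(30)
```

The uniform critical value grows slowly with m, which is consistent with the coverage ratios in the tables being fairly stable across m = 10, 20, 30.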


Table 6. The Relative Length of Two Confidence Bands: n = 250

                      m = 10              m = 20              m = 30
                  τ = 0.5  τ = 0.8    τ = 0.5  τ = 0.8    τ = 0.5  τ = 0.8
Model 1
(0.50, 0.50)       1.155    1.158      1.136    1.148      1.127    1.144
                  (0.103)  (0.119)    (0.088)  (0.116)    (0.090)  (0.112)
(0.75, 0.75)       1.105    1.100      1.096    1.095      1.096    1.086
                  (0.062)  (0.073)    (0.061)  (0.076)    (0.056)  (0.066)
(0.90, 0.90)       1.149    1.195      1.137    1.189      1.144    1.185
                  (0.137)  (0.215)    (0.117)  (0.199)    (0.149)  (0.215)
Model 2
(0.50, 0.50)       1.157    1.134      1.147    1.130      1.141    1.122
                  (0.052)  (0.064)    (0.044)  (0.068)    (0.079)  (0.070)
(0.75, 0.75)       1.150    1.125      1.141    1.118      1.136    1.115
                  (0.032)  (0.062)    (0.034)  (0.053)    (0.034)  (0.056)
(0.90, 0.90)       1.092    1.074      1.075    1.061      1.083    1.067
                  (0.182)  (0.159)    (0.075)  (0.088)    (0.158)  (0.149)
Model 3
(0.50, 0.50)       1.153    1.134      1.141    1.128      1.135    1.115
                  (0.032)  (0.048)    (0.029)  (0.050)    (0.035)  (0.041)
(0.75, 0.75)       1.111    1.101      1.101    1.095      1.094    1.085
                  (0.052)  (0.089)    (0.045)  (0.078)    (0.044)  (0.066)
(0.90, 0.90)       1.115    1.105      1.097    1.090      1.084    1.081
                  (0.302)  (0.255)    (0.285)  (0.234)    (0.187)  (0.177)

We report means of the ratios, with standard deviations in parentheses. m is the number of quantiles in the grid and n is the sample size. The results are based on 500 replications.


Table 7. The Relative Length of Two Confidence Bands: n = 500

                      m = 10              m = 20              m = 30
                  τ = 0.5  τ = 0.8    τ = 0.5  τ = 0.8    τ = 0.5  τ = 0.8
Model 1
(0.50, 0.50)       1.147    1.153      1.141    1.147      1.133    1.150
                  (0.095)  (0.119)    (0.090)  (0.116)    (0.091)  (0.114)
(0.75, 0.75)       1.103    1.099      1.098    1.086      1.096    1.085
                  (0.062)  (0.081)    (0.057)  (0.064)    (0.057)  (0.064)
(0.90, 0.90)       1.144    1.179      1.133    1.177      1.132    1.177
                  (0.115)  (0.182)    (0.123)  (0.179)    (0.104)  (0.182)
Model 2
(0.50, 0.50)       1.150    1.131      1.142    1.121      1.133    1.115
                  (0.027)  (0.043)    (0.030)  (0.041)    (0.024)  (0.038)
(0.75, 0.75)       1.152    1.127      1.141    1.118      1.136    1.116
                  (0.029)  (0.054)    (0.023)  (0.046)    (0.025)  (0.046)
(0.90, 0.90)       1.087    1.069      1.073    1.059      1.070    1.055
                  (0.116)  (0.090)    (0.027)  (0.026)    (0.026)  (0.023)
Model 3
(0.50, 0.50)       1.152    1.133      1.140    1.123      1.132    1.116
                  (0.024)  (0.037)    (0.023)  (0.036)    (0.019)  (0.032)
(0.75, 0.75)       1.106    1.100      1.098    1.091      1.096    1.089
                  (0.048)  (0.083)    (0.048)  (0.078)    (0.044)  (0.073)
(0.90, 0.90)       1.063    1.070      1.057    1.051      1.062    1.051
                  (0.142)  (0.210)    (0.145)  (0.135)    (0.154)  (0.124)

We report means of the ratios, with standard deviations in parentheses. m is the number of quantiles in the grid and n is the sample size. The results are based on 500 replications.


Figure 1. Surface Plots of Conditional Quantile Functions.

[Four surface plots of Q(τ|x) over (x1, x2) ∈ [0, 1] × [0, 1]: Model 2 at τ = 0.5 and τ = 0.8, and Model 3 at τ = 0.5 and τ = 0.8.]