Single Index Quantile Regression for Heteroscedastic Data

E. Christou, M. G. Akritas

Department of Statistics, The Pennsylvania State University

SMAC, November 6, 2015


Outline

1 Introduction
    Motivation
    Possible Models
2 Single Index Quantile Regression
    The Proposed Estimator
    Main Results
3 Numerical Studies
4 Variable Selection
    Motivation
    The Proposed Estimator
    Main Results
5 Numerical Studies
    Example 1
    Example 2
    Real Data Example
6 Conclusions
7 Future Work


Introduction: Motivation

Least Squares Regression: Models the relationship between a $d$-dimensional vector of covariates $X$ and the conditional mean of the response $Y$ given $X = x$.

Least Squares Estimator:

1. provides only a single summary measure for the conditional distribution of the response given the covariates.

2. is sensitive to outliers and can provide a very poor estimator in many non-Gaussian and especially long-tailed distributions.

Introduction: Motivation

Quantile Regression (QR): Models the relationship between a $d$-dimensional vector of covariates $X$ and the $\tau$th conditional quantile of the response $Y$ given $X = x$, $Q_\tau(Y \mid x)$.

Quantile Regression:

1. provides a more complete picture of the conditional distribution of the response given the covariates.

2. is useful for modeling data with heterogeneous conditional distributions, especially data where extremes are important.


Introduction: Possible Models

Linear QR: $Q_\tau(Y \mid x) = \beta' x$, for a $d$-dimensional vector of unknown parameters $\beta$.
Disadvantage: the linearity assumption is not always valid.

Nonparametric QR: $Q_\tau(Y \mid x) = g(x)$, for an unspecified function $g : \mathbb{R}^d \to \mathbb{R}$.
Disadvantage: runs into the "data sparseness" problem for high-dimensional data.

Single-Index QR: $Q_\tau(Y \mid x) = g(\beta' x \mid \beta)$, which depends on $x$ only through the single linear combination $\beta' x$.
Advantage: reduces the dimension while maintaining some nonparametric flexibility.


The Proposed Estimator: The Model

Let $\{Y_i, X_i\}_{i=1}^n$ be independent and identically distributed (iid) observations that satisfy

$$Y_i = Q_\tau(Y \mid X_i) + \varepsilon_i,$$

where $Q_\tau(Y \mid x) = Q_\tau(Y \mid \beta_1' x)$ and the error term satisfies $Q_\tau(\varepsilon_i \mid x) = 0$.

For identifiability, we impose certain conditions on $\beta_1$:

the most common one: $\|\beta_1\| = 1$ with its first coordinate positive.

the one adopted here: $\beta_1 = (1, \beta')'$, for $\beta \in \mathbb{R}^{d-1}$.


The Proposed Estimator: The Model

The true parametric vector $\beta$ satisfies

$$\beta = \arg\min_{b} E\big(\rho_\tau(Y - Q_\tau(Y \mid b_1' X))\big),$$

where $b_1 = (1, b')'$ and $\rho_\tau(u) = (\tau - I(u < 0))u$.
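To make the check function concrete, here is a minimal Python sketch of $\rho_\tau$ together with a brute-force illustration that minimizing the average check loss over a sample recovers a sample quantile. The function names `check_loss` and `sample_quantile_by_check_loss` are hypothetical and are not part of the talk.

```python
import numpy as np

def check_loss(u, tau):
    """Check function rho_tau(u) = (tau - I(u < 0)) * u, applied elementwise."""
    u = np.asarray(u, dtype=float)
    return np.where(u < 0, tau - 1.0, tau) * u

def sample_quantile_by_check_loss(y, tau):
    """Return the sample point that minimizes the average check loss.

    Brute force over the observed values; meant only to illustrate that
    the tau-th quantile is a minimizer of the expected check loss.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau).mean() for c in y]
    return y[int(np.argmin(losses))]
```

For example, applying `sample_quantile_by_check_loss` to the integers 1 through 11 with `tau = 0.5` picks out the median.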

The Proposed Estimator: Iterative algorithms

As in the single index mean regression (SIMR) problem, the unknown $Q_\tau(Y \mid b_1' X)$ must be replaced with an estimator. Unlike the SIMR problem, however, there is no closed-form expression for the estimator of $Q_\tau(Y \mid b_1' X_i)$, and this has led to iterative algorithms for estimating $\beta$ (Wu, Yu, and Yu, 2010; Kong and Xia, 2012).

The Proposed Estimator: The New Approach

For any given $b \in \mathbb{R}^{d-1}$, define the function $g(u \mid b) : \mathbb{R} \to \mathbb{R}$ as

$$g(u \mid b) = E(Q_\tau(Y \mid X) \mid b_1' X = u),$$

where $b_1 = (1, b')'$.

Noting that

$$g(\beta_1' X \mid \beta) = Q_\tau(Y \mid X),$$

$\beta$ also satisfies

$$\beta = \arg\min_{b} E\big(\rho_\tau(Y - g(b_1' X \mid b))\big), \qquad (1)$$

where $b_1 = (1, b')'$.


The Proposed Estimator: The New Approach

The sample-level version of (1) consists of minimizing

$$S_n(\tau, b) = \sum_{i=1}^n \rho_\tau(Y_i - g(b_1' X_i \mid b)).$$

Again, $g(\cdot \mid b)$ is unknown, but it can be estimated in a non-iterative fashion by first obtaining estimators $\hat{Q}_\tau(Y \mid X_i)$, for $i = 1, \ldots, n$, and forming the Nadaraya-Watson-type estimator

$$\hat{g}^{NW}_{\hat{Q}}(t \mid b) = \sum_{i=1}^n \hat{Q}_\tau(Y \mid X_i) \frac{K\left(\frac{t - b_1' X_i}{h}\right)}{\sum_{k=1}^n K\left(\frac{t - b_1' X_k}{h}\right)}, \qquad (2)$$

where $K(\cdot)$ is a univariate kernel function and $h$ is a bandwidth.
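The smoother in (2) is a kernel-weighted average of the pilot quantile estimates over the index values. The sketch below assumes a Gaussian kernel (the slides do not fix $K$) and takes the pilot estimates as an input array `q_hat`; the names `gaussian_kernel` and `nw_index_smoother` are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the univariate kernel K (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nw_index_smoother(t, index, q_hat, h):
    """Evaluate sum_i q_hat[i] * K((t - index[i]) / h) / sum_k K((t - index[k]) / h).

    t      : scalar or 1-d array of evaluation points
    index  : 1-d array of index values b_1' X_i
    q_hat  : 1-d array of pilot quantile estimates at the sample points
    h      : bandwidth
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    index = np.asarray(index, dtype=float)
    q_hat = np.asarray(q_hat, dtype=float)
    weights = gaussian_kernel((t[:, None] - index[None, :]) / h)
    return weights @ q_hat / weights.sum(axis=1)
```

Because the weights are normalized to sum to one at each evaluation point, a constant `q_hat` is reproduced exactly.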


The Proposed Estimator: The New Approach

For the estimator $\hat{Q}_\tau(Y \mid X_i)$, we use the local linear conditional quantile estimator (Guerre and Sabbah, 2012). Specifically, for a multivariate kernel function $K^*(x) = K^*(x_1, \ldots, x_d)$ and a univariate bandwidth $h^*$, let

$$L_n((\alpha_0, \alpha_1); \tau, x) = \frac{1}{n h^{*d}} \sum_{i=1}^n \rho_\tau(Y_i - \alpha_0 - \alpha_1'(X_i - x)) K^*\left(\frac{X_i - x}{h^*}\right)$$

and define $\hat{Q}_\tau(Y \mid x)$ as $\hat{\alpha}_0(\tau; x)$, where $\hat{\alpha}_0(\tau; x)$ is defined through

$$(\hat{\alpha}_0(\tau; x), \hat{\alpha}_1(\tau; x)) = \arg\min_{(\alpha_0, \alpha_1)} L_n((\alpha_0, \alpha_1); \tau, x).$$
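Minimizing $L_n$ at a fixed point is a kernel-weighted linear quantile regression, which can be written as a linear program. The following is a minimal sketch, not the authors' implementation: it assumes a product Gaussian kernel for $K^*$, drops the constant factor $1/(n h^{*d})$ (which does not change the minimizer), and uses `scipy.optimize.linprog`. The function name `local_linear_quantile` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def local_linear_quantile(x0, X, y, tau, h_star):
    """Local linear estimate of Q_tau(Y | x0), i.e. the fitted intercept alpha_0.

    Residuals are split as u_plus - u_minus with both parts nonnegative, so the
    weighted check loss becomes the linear cost
        sum_i w_i * (tau * u_plus_i + (1 - tau) * u_minus_i).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    diff = X - np.asarray(x0, dtype=float)           # rows are X_i - x0
    w = np.exp(-0.5 * np.sum((diff / h_star) ** 2, axis=1))  # unnormalized kernel weights
    Z = np.hstack([np.ones((n, 1)), diff])           # local design: intercept and slopes
    p = d + 1
    cost = np.concatenate([np.zeros(p), tau * w, (1.0 - tau) * w])
    A_eq = np.hstack([Z, np.eye(n), -np.eye(n)])     # Z theta + u_plus - u_minus = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[0]
```

The dense constraint matrix grows as $n \times (2n + d + 1)$, so this formulation is only meant for small illustrative samples.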

The Proposed Estimator

The proposed estimator is obtained by

$$\hat{\beta} = \arg\min_{b \in \Theta} \sum_{i=1}^n \rho_\tau\big(Y_i - \hat{g}^{NW}_{\hat{Q}}(b_1' X_i \mid b)\big), \qquad (3)$$

where $\Theta \subset \mathbb{R}^{d-1}$ is a compact set and $\beta$ is in the interior of $\Theta$.
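For $d = 2$ the parameter $b$ is a scalar, so the minimization in (3) can be illustrated with a plain grid search. This is only a sketch under stated assumptions: a Gaussian kernel, a finite grid standing in for a numerical optimizer over $\Theta$, and pilot estimates `q_hat` supplied by the caller. The names `nw_fit_at_sample` and `nwqr_grid` are hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    """Check function rho_tau(u) = (tau - I(u < 0)) * u."""
    return np.where(u < 0, tau - 1.0, tau) * u

def nw_fit_at_sample(index, q_hat, h):
    """Nadaraya-Watson-type smoother (2), evaluated at each sample index value."""
    w = np.exp(-0.5 * ((index[:, None] - index[None, :]) / h) ** 2)
    return w @ q_hat / w.sum(axis=1)

def nwqr_grid(X, y, q_hat, tau, h, grid):
    """Minimize the objective in (3) over a finite grid of candidate b (d = 2 only)."""
    best_b, best_obj = None, np.inf
    for b in grid:
        index = X[:, 0] + b * X[:, 1]      # b_1' X_i with b_1 = (1, b)'
        obj = check_loss(y - nw_fit_at_sample(index, q_hat, h), tau).sum()
        if obj < best_obj:
            best_b, best_obj = float(b), obj
    return best_b, best_obj
```

In higher dimensions the grid would be replaced by a derivative-free optimizer; the objective is not smooth in $b$ because of the check function.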


Main Results

Proposition 1

Let $\hat{g}^{NW}_{\hat{Q}}(t \mid b)$ be as defined in (2). Under some regularity conditions, we have that

$$\sup_{b \in \Theta,\, t \in T_b} \left|\hat{g}^{NW}_{\hat{Q}}(t \mid b) - g(t \mid b)\right| = O_p\left(a_n^* + a_n + h_2^2\right),$$

where $T_b = \{t : t = b_1' x,\ x \in \mathcal{X}_0\}$, $a_n^* = (\log n / n)^{s/(2s+d)}$ and $a_n = (\log n / (n h_2))^{1/2}$.

Proposition 2

Let $\hat{\beta}$ be as defined in (3). Under some regularity conditions, $\hat{\beta}$ is a $\sqrt{n}$-consistent estimator of $\beta$.


Main Results

Theorem

Let $\hat{\beta}$ be as defined in (3). Under the assumptions of Proposition 2,

$$\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N\left(0, \tau(1 - \tau) V^{-1} \Sigma V^{-1}\right),$$

where

$$V = E\left((g'(\beta_1' X \mid \beta))^2 (X_{-1} - E(X_{-1} \mid \beta_1' X))(X_{-1} - E(X_{-1} \mid \beta_1' X))' f_{\varepsilon \mid X}(0 \mid X)\right),$$

and

$$\Sigma = E\left((g'(\beta_1' X \mid \beta))^2 (X_{-1} - E(X_{-1} \mid \beta_1' X))(X_{-1} - E(X_{-1} \mid \beta_1' X))'\right),$$

for $g'(t \mid b) = (\partial/\partial t) g(t \mid b)$ and $X_{-1}$ the $(d - 1)$-dimensional vector obtained from $X$ by removing its first coordinate.
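As a worked special case (an assumption made here for illustration, not a condition stated on the slide), suppose the conditional error density at zero does not depend on the covariates, say $f_{\varepsilon \mid X}(0 \mid X) = f_\varepsilon(0)$. Then $V = f_\varepsilon(0)\,\Sigma$ and the sandwich form collapses:

$$\tau(1 - \tau) V^{-1} \Sigma V^{-1} = \frac{\tau(1 - \tau)}{f_\varepsilon(0)^2}\, \Sigma^{-1}.$$

For instance, with a standard normal error and $\tau = 0.5$, $f_\varepsilon(0) = 1/\sqrt{2\pi}$, so the scalar factor is $0.25 \cdot 2\pi = \pi/2 \approx 1.571$. Under heteroscedasticity the density term stays inside the expectation and $V$ is no longer proportional to $\Sigma$.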

Numerical Studies: Notation

NWQR - the proposed estimator.

WYY - the estimator resulting from the iterative algorithm proposed by Wu et al. (2010).

WYY-1 - the estimator resulting from dividing the estimator of Wu et al. (2010) by its first component.

WYY-2 - the estimator resulting from using the proposed constraint in conjunction with the iterative algorithm of Wu et al. (2010).

Numerical Studies: Example 1

Consider the model

$$Y = \exp(\beta_1' X) + \varepsilon, \qquad (4)$$

where $X = (X_1, X_2)'$, the $X_i \sim N(0, 1)$ are iid, $\beta_1 = (1, 2)'$ and the residual $\varepsilon$ follows a standard normal distribution.

We fit the single index quantile regression model for five different quantile levels, $\tau = 0.1, 0.25, 0.5, 0.75, 0.9$.

We use a sample size of $n = 400$ and perform $N = 100$ replications.
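The data-generating process in (4) is easy to simulate. The sketch below draws one replication and also returns the true conditional quantile, which for this model is $\exp(\beta_1' x)$ shifted by the standard normal $\tau$-quantile. The function names and the seeding scheme are hypothetical; the slides do not describe the authors' simulation code.

```python
import numpy as np
from statistics import NormalDist

BETA1 = np.array([1.0, 2.0])   # beta_1 = (1, 2)' from Model (4)

def simulate_example1(n=400, seed=0):
    """Draw one sample (X, y) from Y = exp(beta_1' X) + eps, eps ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 2))
    y = np.exp(X @ BETA1) + rng.standard_normal(n)
    return X, y

def true_conditional_quantile(X, tau):
    """Q_tau(Y | x) = exp(beta_1' x) + z_tau, with z_tau the N(0, 1) tau-quantile."""
    return np.exp(X @ BETA1) + NormalDist().inv_cdf(tau)
```

Running `simulate_example1` with $N = 100$ different seeds would give the replications described above.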


Numerical Studies: Example 1

To compare the above estimators, consider the mean squared error, $R(\hat{\beta})$, and the mean check-based absolute residuals, $R_\tau(\hat{Q}^{LL})$, which are defined as

$$R(\hat{\beta}) = \frac{1}{N} \sum_{i=1}^N (\hat{\beta}_i - 2)^2, \qquad R_\tau(\hat{Q}^{LL}) = \frac{1}{n} \sum_{i=1}^n \rho_\tau\big(Y_i - \hat{Q}^{LL}_\tau(Y \mid \hat{\beta}_1' X_i)\big),$$

where $\hat{\beta}_i$ is the estimate from the $i$th replication and $\hat{Q}^{LL}_\tau(Y \mid \hat{\beta}_1' x)$ is the local linear conditional quantile estimator on the univariate variable $\hat{\beta}_1' X_i$.
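Both criteria are simple averages and can be sketched as follows. The fitted quantiles are passed in as an array `q_fit`, so the sketch does not depend on how the univariate local linear fit is computed; the function names are hypothetical.

```python
import numpy as np

def mse_about_truth(beta_hats, beta_true=2.0):
    """R(beta_hat): mean squared error of the replication estimates about the true value."""
    beta_hats = np.asarray(beta_hats, dtype=float)
    return float(np.mean((beta_hats - beta_true) ** 2))

def mean_check_residual(y, q_fit, tau):
    """R_tau: average check loss of the residuals y_i - q_fit_i."""
    u = np.asarray(y, dtype=float) - np.asarray(q_fit, dtype=float)
    return float(np.mean(np.where(u < 0, tau - 1.0, tau) * u))
```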

Numerical Studies: Example 1

Quantity                 Estimator   τ = 0.1     τ = 0.25    τ = 0.5     τ = 0.75    τ = 0.9
$\hat{\beta}$            NWQR        2.0085      2.0132      2.0149      2.0290      2.0768
                                     (0.0603)    (0.0477)    (0.0481)    (0.0683)    (0.3905)
                         WYY-1       1.9521      1.9890      3.4450      1.9363      1.9592
                                     (0.2521)    (0.2252)    (3.8756)    (0.3246)    (0.5906)
                         WYY-2       1.9699      1.9899      2.0158      2.0525      2.0287
                                     (0.1593)    (0.1307)    (0.3937)    (0.6735)    (0.7240)
$R(\hat{\beta})$         NWQR        0.0037      0.0024      0.0025      0.0055      0.1569
                         WYY-1       0.0652      0.0503      192.6956    0.1084      0.3469
                         WYY-2       0.0260      0.0170      0.1537      0.4518      0.5197
$R_\tau(\hat{\beta}_1)$  NWQR        0.1682      0.3096      0.3909      0.3113      0.1761
                         WYY-1       0.1783      0.3273      0.4154      0.3479      0.2265
                         WYY-2       0.1720      0.3160      0.4031      0.3310      0.2050
Coverage prob.           NWQR        0.99        0.97        0.97        0.97        0.95
                         WYY-2       0.93        0.90        0.92        0.80        0.70
                         WYY         0.55        0.76        0.86        0.66        0.28

Table 1: Mean values and standard errors (in parentheses), $R(\hat{\beta})$ and $R_\tau(\hat{Q}^{LL})$ for Model (4). 95% coverage probability for NWQR, WYY-2 and the second estimated coefficient of Wu et al. (2010).
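The $R(\hat{\beta})$ rows can be cross-checked against the mean and standard-error rows through the usual bias-variance decomposition of the mean squared error. The sketch below assumes that the parenthesized values are sample standard deviations (with divisor $N - 1$) over the $N = 100$ replications; the slide does not say how they were computed, so this is an assumption. The function name is hypothetical.

```python
def mse_from_mean_and_sd(mean_est, sd_est, truth, n_rep):
    """MSE = bias^2 + (N - 1) / N * sd^2, for a sample SD with divisor N - 1."""
    bias_sq = (mean_est - truth) ** 2
    return bias_sq + (n_rep - 1) / n_rep * sd_est ** 2

# tau = 0.1 column of Table 1, true value 2, N = 100 replications
nwqr = mse_from_mean_and_sd(2.0085, 0.0603, 2.0, 100)
wyy1 = mse_from_mean_and_sd(1.9521, 0.2521, 2.0, 100)
wyy2 = mse_from_mean_and_sd(1.9699, 0.1593, 2.0, 100)
```

Rounded to four decimals, these three values agree with the tabulated $R(\hat{\beta})$ entries for $\tau = 0.1$.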


Variable Selection: Motivation

Ubiquity of high-dimensional data.

Including unnecessary variables vs. variable selection.

Variable selection is also important in quantile regression.

Large body of work. For example, Wang, Li, and Jiang (2007) and Wu and Liu (2009) for linear QR, Koenker (2011) for nonparametric/semiparametric QR, and Alkenani and Yu (2013) for the SIQR model.


The Proposed Estimator: The Model

Let $\{Y_i, X_i\}_{i=1}^n$ be independent and identically distributed (iid) observations that satisfy

$$Y_i = Q_\tau(Y \mid X_i) + \varepsilon_i,$$

where $Q_\tau(Y \mid x) = Q_\tau(Y \mid \beta_1' x)$ and the error term satisfies $Q_\tau(\varepsilon_i \mid x) = 0$.

Sparsity Assumption:

$X_i = (X_{i1}', X_{i2}')'$, $X_{i1} \in \mathbb{R}^{d^*}$, $X_{i2} \in \mathbb{R}^{d - d^*}$, where $d^* \le d$ is an integer that specifies the $d^*$ relevant variables.

$\beta_1 = (1, \beta')' = (1, \beta_{11}', \beta_{12}')' = (1, \beta_{11}', 0')'$, where $\beta_{11} \in \mathbb{R}^{d^* - 1}$ is the non-zero parametric vector and $\beta_{12} \in \mathbb{R}^{d - d^*}$ is the zero vector corresponding to the irrelevant variables.


The Proposed Estimator: Iterations

The true parametric vector $\beta$ satisfies

$$\beta = \arg\min_{b} E\big(\rho_\tau(Y - Q_\tau(Y \mid b_1' X))\big),$$

where $b_1 = (1, b')'$ and $\rho_\tau(u) = (\tau - I(u < 0))u$.

Problem

The same problem arises! Existing algorithms are iterative, hence computationally expensive and subject to convergence issues.


The Proposed Estimator: New Approach

The proposed estimator is obtained by

$$\hat{\beta} = \arg\min_{b \in \Theta} \sum_{i=1}^n \rho_\tau\big(Y_i - \hat{g}^{NW}_{\hat{Q}}(b_1' X_i \mid b)\big),$$

where $\hat{g}^{NW}_{\hat{Q}}$ is the Nadaraya-Watson-type estimator, $\Theta \subset \mathbb{R}^{d-1}$ is a compact set, and $\beta$ is in the interior of $\Theta$.

Recall,

$$\hat{g}^{NW}_{\hat{Q}}(t \mid b) = \sum_{i=1}^n \hat{Q}_\tau(Y \mid X_i) \frac{K\left(\frac{t - b_1' X_i}{h}\right)}{\sum_{k=1}^n K\left(\frac{t - b_1' X_k}{h}\right)},$$

where $\hat{Q}_\tau(Y \mid X_i)$ is the local linear conditional quantile estimator.


The Proposed Estimator: New Approach

Goals:

For high-dimensional data, it is possible to improve the estimation of the local linear conditional quantile estimators $\hat{Q}_\tau(Y \mid X_i)$. Consider the use of a penalty term for variable selection to estimate $Q_\tau(Y \mid x)$.

Consider the penalty term for producing the estimator of the parametric component $\beta$ of the SIQR model.


The Proposed Estimator: New Approach

Steps:

Step 1 can be thought of as an initial variable selection method used to achieve better estimation of the conditional quantiles $Q_\tau(Y \mid X_i)$.

Step 2 uses these improved estimators to perform simultaneous variable selection and estimation of the parametric component of the SIQR model.


The Proposed Estimator: New Approach - Step 1

Specifically, for Step 1 we minimize with respect to $b$ the objective function

$$L_n(\tau, b) = \sum_{i=1}^n \rho_\tau\big(Y_i - \hat{g}^{NW}_{\hat{Q}}(b_1' X_i \mid b)\big) + n \sum_{j=2}^d p_{\lambda_1}(|b_j|), \qquad (5)$$

where $b_1 = (1, b')' = (1, b_2, \ldots, b_d)$ and $\lambda_1$ is the tuning parameter for the SCAD penalty $p_{\lambda_1}(\cdot)$.
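The slides do not spell out the SCAD penalty. The sketch below uses the standard piecewise form of Fan and Li (2001): linear up to $\lambda$, quadratic between $\lambda$ and $a\lambda$, and constant beyond $a\lambda$, with the commonly used default $a = 3.7$. Whether the authors used this value of $a$ is an assumption, and the function name `scad_penalty` is hypothetical.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|), applied elementwise.

    p(t) = lam * t                                      for t <= lam
         = (2*a*lam*t - t**2 - lam**2) / (2*(a - 1))    for lam < t <= a*lam
         = lam**2 * (a + 1) / 2                         for t > a*lam
    """
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t
    quadratic = (2.0 * a * lam * t - t**2 - lam**2) / (2.0 * (a - 1.0))
    constant = lam**2 * (a + 1.0) / 2.0
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))
```

The three pieces join continuously at $t = \lambda$ and $t = a\lambda$, and the flat tail is what keeps large coefficients from being shrunk.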

The Proposed Estimator: New Approach - Step 1

Denote by $\hat{\beta}^{VS}$ the minimizer of the objective function (5).

This estimator can be used for variable selection.

Let $A_n = \{j \in \{2, \ldots, d\} : \hat{\beta}^{VS}_j \ne 0\}$, of cardinality $d^* - 1$.

Construct improved estimators, $\hat{Q}^{VS}_\tau(Y \mid x)$, of the conditional quantiles.

In particular, for $x^*$ the $d^*$-dimensional vector, define $\hat{Q}^{VS}_\tau(Y \mid x^*)$ as the local linear conditional quantile estimator on the relevant variables.

$\hat{Q}^{VS}_\tau(Y \mid x)$ has indeed a faster rate of convergence than $\hat{Q}_\tau(Y \mid x)$.
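The selection step amounts to reading off the non-zero coordinates of $\hat{\beta}^{VS}$ and keeping the matching covariate columns, together with the first covariate, whose coefficient is fixed at 1. A minimal sketch follows; the tolerance `tol` for treating a coefficient as zero is an assumption added for numerical work, and the function name is hypothetical.

```python
import numpy as np

def relevant_columns(beta_vs, tol=1e-8):
    """Return (A_n, cols) for beta_vs = (beta_2, ..., beta_d).

    A_n  : indices j in {2, ..., d} with a non-zero estimated coefficient
    cols : zero-based column indices of the covariates to keep, always
           including column 0 (the covariate with coefficient fixed at 1)
    """
    beta_vs = np.asarray(beta_vs, dtype=float)
    a_n = [j + 2 for j in range(beta_vs.size) if abs(beta_vs[j]) > tol]
    cols = [0] + [j - 1 for j in a_n]
    return a_n, cols
```

The improved pilot estimator would then be the local linear quantile fit computed on `X[:, cols]` only.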

The Proposed Estimator - Step 2

For Step 2, we set

$$\hat{g}^{NW}_{VS}(t \mid b) = \sum_{i=1}^n \hat{Q}^{VS}_\tau(Y \mid X_i) \frac{K\left(\frac{t - b_1' X_i}{h}\right)}{\sum_{k=1}^n K\left(\frac{t - b_1' X_k}{h}\right)}$$

and define the minimization problem

$$S_n(\tau, b) = \sum_{i=1}^n \rho_\tau\big(Y_i - \hat{g}^{NW}_{VS}(b_1' X_i \mid b)\big) + n \sum_{j=2}^d p_{\lambda_2}(|b_j|),$$

where $b_1 = (1, b')' = (1, b_2, \ldots, b_d)$ and $\lambda_2 > 0$ is the tuning parameter for the SCAD penalty $p_{\lambda_2}(\cdot)$. The proposed estimator is obtained by

$$\hat{\beta}^{SCAD} = \arg\min_{b \in \Theta} S_n(\tau, b), \qquad (6)$$

where $\Theta \subset \mathbb{R}^{d-1}$ is a compact set assumed to contain the true value of $\beta$.
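The Step 2 objective adds $n$ times the summed SCAD penalty to the check-loss fit. The sketch below evaluates $S_n(\tau, b)$ for a given $b$, assuming a Gaussian kernel, the Fan and Li (2001) form of SCAD with $a = 3.7$, and improved pilot estimates `q_vs` supplied by the caller. All function names are hypothetical, and the choice of optimizer over $\Theta$ is left open.

```python
import numpy as np

def check_loss(u, tau):
    """Check function rho_tau(u) = (tau - I(u < 0)) * u."""
    return np.where(u < 0, tau - 1.0, tau) * u

def scad(theta, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001), applied elementwise to |theta|."""
    t = np.abs(np.asarray(theta, dtype=float))
    quadratic = (2.0 * a * lam * t - t**2 - lam**2) / (2.0 * (a - 1.0))
    return np.where(t <= lam, lam * t,
                    np.where(t <= a * lam, quadratic, lam**2 * (a + 1.0) / 2.0))

def penalized_objective(b, X, y, q_vs, tau, h, lam2):
    """S_n(tau, b): check loss of Y_i minus the NW-type fit, plus n * sum_j SCAD(|b_j|)."""
    b = np.asarray(b, dtype=float)
    index = X[:, 0] + X[:, 1:] @ b             # b_1' X_i with b_1 = (1, b')'
    w = np.exp(-0.5 * ((index[:, None] - index[None, :]) / h) ** 2)
    g_fit = w @ q_vs / w.sum(axis=1)           # smoother evaluated at the sample points
    return float(check_loss(y - g_fit, tau).sum() + len(y) * scad(b, lam2).sum())
```

Setting `lam2 = 0` switches the penalty off and returns the unpenalized objective.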

Page 50: Single Index Quantile Regression for Heteroscedastic Data€¦ · Quantile Regression: 1 provides a more complete picture for the conditional distribution of the response given the

The Proposed Estimator: Comments

Some Comments:

The SCAD penalty is motivated by the fact that directly applying an L1 penalty tends to include inactive variables and to introduce bias. Also, the SCAD penalty possesses the oracle property (to be defined shortly).

An alternative algorithm, in which Step 1 is replaced by local variable selection for the computation of $\hat Q^{VS}_\tau(Y|x)$, was also used in some simulations and performed equally well in the settings tried. While such an algorithm may be more appropriate for very high-dimensional data, it is time-consuming to run.
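For reference, the SCAD penalty is usually specified through its derivative (Fan and Li, 2001). The slides do not restate it, so the form below and the conventional choice a = 3.7 come from that standard definition rather than from this talk.

```latex
p_\lambda'(\theta) = \lambda \left\{ I(\theta \le \lambda)
  + \frac{(a\lambda - \theta)_+}{(a - 1)\lambda}\, I(\theta > \lambda) \right\},
\qquad \theta > 0,\ a > 2.
```

For $\theta > a\lambda$ the derivative is zero, so large coefficients are left unshrunk, whereas the L1 penalty has constant derivative $\lambda$ and shrinks every nonzero coefficient by the same amount. This is the source of the bias mentioned in the first comment.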


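Main Results

Theorem

Model Selection Consistency: Let $\hat\beta^{VS} = (\hat\beta^{VS}_2, \dots, \hat\beta^{VS}_d)$ be the minimizer of the objective function $L_n(\tau, b)$ defined in (5). Define $\hat A_n$ as the set $\{j \in \{2, \dots, d\} : \hat\beta^{VS}_j \neq 0\}$ and $A$ as the set $\{j \in \{2, \dots, d\} : \beta_j \neq 0\}$. Under some regularity conditions and the conditions $\lambda_1 \to 0$ and $\sqrt{n}\lambda_1 \to \infty$ as $n \to \infty$, we have

$$\lim_{n \to \infty} P(\hat A_n = A) = 1.$$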


Main Results

Proposition 1

Let $\hat Q^{VS}_\tau(Y|x)$ denote the local linear conditional quantile estimator on the relevant variables. Then, under some regularity assumptions,

$$\sup_{x \in \mathcal{X}_0} |\hat Q^{VS}_\tau(Y|x) - Q_\tau(Y|x)| = O_p\left(\left(\frac{\log n}{n}\right)^{s/(2s+d^*)}\right) = O_p(a_n^{**})$$

if the bandwidth $h^*$ is asymptotically proportional to $(\log n/n)^{1/(2s+d^*)}$.

Proposition 2

Let $\hat\beta^{SCAD}$ be as defined in (6). Then, under the assumptions of Proposition 1 and the assumption that $\lambda_2 \to 0$ as $n \to \infty$, $\hat\beta^{SCAD}$ is a $\sqrt{n}$-consistent estimator of $\beta$.
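To get a feel for the rate in Proposition 1, the short sketch below evaluates $(\log n/n)^{s/(2s+\dim)}$ for two dimensions. It is an added illustration: the smoothness order s = 2 is an assumed value, and using the same expression with d in place of d* for the estimator on all covariates is an assumption, since this section states the rate only for the reduced estimator.

```python
import math


def sup_rate(n, s, dim):
    """Uniform rate (log n / n) ** (s / (2s + dim)) appearing in Proposition 1."""
    return (math.log(n) / n) ** (s / (2 * s + dim))


n, s = 400, 2                      # s = 2 is an assumed smoothness order
full = sup_rate(n, s, dim=5)       # all d = 5 covariates (assumed analogue)
reduced = sup_rate(n, s, dim=2)    # d* = 2 relevant covariates
print(f"d = 5: {full:.4f}   d* = 2: {reduced:.4f}")
```

The rate is smaller for the lower dimension at any fixed n, which is the sense in which discarding irrelevant covariates in Step 1 speeds up the first-step quantile estimator.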


Main Results

Theorem

Oracle Property: Let $\hat\beta^{SCAD}$ be as defined in (6) and set $\hat\beta^{SCAD} = (\hat\beta^{SCAD\prime}_{11}, \hat\beta^{SCAD\prime}_{12})'$, where $\hat\beta^{SCAD}_{11}$ is of cardinality $d^* - 1$. If the assumptions of Proposition 1 hold and, moreover, $\lambda_2 \to 0$ and $\sqrt{n}\lambda_2 \to \infty$ as $n \to \infty$, then with probability tending to one,

1. Sparsity: $\hat\beta^{SCAD}_{12} = 0$.
2. Asymptotic Normality: $\sqrt{n}(\hat\beta^{SCAD}_{11} - \beta_{11}) \xrightarrow{d} N(0, \tau(1-\tau) V_{11}^{-1} \Sigma_{11} V_{11}^{-1})$, where

$$V_{11} = E\left( (g'(\beta_1' X_1|\beta))^2 (X_{1,-1} - E(X_{1,-1}|\beta_1' X))(X_{1,-1} - E(X_{1,-1}|\beta_1' X))' f_{\varepsilon|X}(0|X) \right)$$

and

$$\Sigma_{11} = E\left( (g'(\beta_1' X_1|\beta))^2 (X_{1,-1} - E(X_{1,-1}|\beta_1' X))(X_{1,-1} - E(X_{1,-1}|\beta_1' X))' \right),$$

for $g'(t|b) = (\partial/\partial t) g(t|b)$ and $X_{1,-1}$ the $(d^* - 1)$-dimensional vector.
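As a worked special case (an added illustration, not taken from the slides), suppose the conditional error density at zero does not depend on $X$, that is $f_{\varepsilon|X}(0|X) = f_\varepsilon(0)$, as happens when the error is independent of the covariates. The density then factors out of $V_{11}$ and the sandwich form collapses:

```latex
V_{11} = f_\varepsilon(0)\, \Sigma_{11}
\quad\Longrightarrow\quad
\tau(1-\tau)\, V_{11}^{-1} \Sigma_{11} V_{11}^{-1}
  = \frac{\tau(1-\tau)}{f_\varepsilon(0)^2}\, \Sigma_{11}^{-1}.
```

Here $\varepsilon$ is assumed to be centered so that its $\tau$-th quantile is zero. The asymptotic variance therefore grows as the error density at the quantile of interest shrinks.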


Numerical Studies

NWQR denotes the unpenalized estimator.

SCAD-NWQR denotes the proposed estimator.

LASSO-AY denotes the LASSO estimator of Alkenani and Yu (2013).

ALASSO-AY denotes the adaptive-LASSO estimator of Alkenani and Yu (2013).


Numerical Studies

How to evaluate the performance?

size of the estimated parametric component (number of nonzero coefficients),

number of its correct and incorrect zeros,

absolute estimation error of the estimated parametric component,

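and mean squared error (MSE) of $\hat\beta^{SCAD\prime}_1 X$, defined as

$$MSE(\hat\beta^{SCAD\prime}_1 X) = \frac{1}{n} \sum_{i=1}^{n} (\hat\beta^{SCAD\prime}_1 X_i - \beta_1' X_i)^2.$$

For the final conditional quantile estimator $\hat Q^{LL}_\tau(Y|\hat\beta^{SCAD\prime}_1 X_i)$, we consider the mean check-based absolute residuals

$$R_\tau(\hat Q^{LL}_\tau) = \frac{1}{n} \sum_{i=1}^{n} \rho_\tau(Y_i - \hat Q^{LL}_\tau(Y|\hat\beta^{SCAD\prime}_1 X_i)).$$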
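The evaluation criteria listed above are simple to compute once an estimate is available. The following is a minimal sketch with hypothetical function names; the numbers at the bottom are made up for illustration and are not results from the talk.

```python
def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def selection_summary(b_hat, b_true):
    """Return (size, correct zeros, incorrect zeros) for the free coefficients."""
    size = sum(1 for b in b_hat if b != 0)
    correct = sum(1 for b, t in zip(b_hat, b_true) if b == 0 and t == 0)
    incorrect = sum(1 for b, t in zip(b_hat, b_true) if b == 0 and t != 0)
    return size, correct, incorrect


def index_mse(b1_hat, b1_true, X):
    """Mean squared difference between the estimated and the true index."""
    def dot(b, x):
        return sum(bi * xi for bi, xi in zip(b, x))
    return sum((dot(b1_hat, x) - dot(b1_true, x)) ** 2 for x in X) / len(X)


def mean_check_residual(y, q_fit, tau):
    """Average check loss of the responses around the fitted quantiles."""
    return sum(check_loss(yi - qi, tau) for yi, qi in zip(y, q_fit)) / len(y)


# Illustrative values only: free coefficients (b_2, ..., b_5).
b_true = [2.0, 0.0, 0.0, 0.0]
b_hat = [1.9, 0.0, 0.1, 0.0]
print(selection_summary(b_hat, b_true))
```

By construction, size, correct zeros and incorrect zeros add up to d - 1, the number of free coefficients.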


Numerical Studies: Example 1 - Asymmetric Homoscedastic Errors

Consider the model

$$Y = 5\cos(\beta_1' X) + \exp(-(\beta_1' X)^2) + \varepsilon, \qquad (7)$$

where $X = (X_1, \dots, X_5)'$, the $X_i \sim U(0, 1)$ are iid, $\beta_1 = (1, 2, 0, 0, 0)'$, the residual $\varepsilon$ follows an exponential distribution with mean 2, and the $X_i$'s and $\varepsilon$ are mutually independent.

We fit the SIQR model for five different quantile levels, $\tau = 0.1, 0.25, 0.5, 0.75, 0.9$. We use a sample size of $n = 400$ and perform $N = 100$ replications.
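A data-generating sketch for Model (7) is given below, together with the true conditional quantile it implies. Because the error is independent of the covariates, the quantile is the regression function shifted by the exponential quantile $-2\log(1-\tau)$. The function names and the seed are hypothetical; only the model itself comes from the slide.

```python
import math
import random

BETA1 = (1.0, 2.0, 0.0, 0.0, 0.0)


def index(x):
    """Single index beta_1' x."""
    return sum(b * v for b, v in zip(BETA1, x))


def simulate_example1(n, rng):
    """Draw (X, y) from Model (7): U(0, 1) covariates, exponential errors with mean 2."""
    X = [[rng.random() for _ in range(5)] for _ in range(n)]
    y = []
    for x in X:
        t = index(x)
        eps = rng.expovariate(1.0 / 2.0)   # rate 1/2 gives mean 2
        y.append(5.0 * math.cos(t) + math.exp(-t ** 2) + eps)
    return X, y


def true_quantile(x, tau):
    """Q_tau(Y|x) = 5 cos(t) + exp(-t^2) - 2 log(1 - tau), with t = beta_1' x."""
    t = index(x)
    return 5.0 * math.cos(t) + math.exp(-t ** 2) - 2.0 * math.log(1.0 - tau)


X, y = simulate_example1(400, random.Random(1))
coverage = sum(yi <= true_quantile(x, 0.5) for x, yi in zip(X, y)) / len(y)
print(f"share of responses below the true median curve: {coverage:.3f}")
```

The printed share should be close to 0.5, which is a quick sanity check that the quantile formula matches the simulated data.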


Numerical Studies: Example 1

τ                             0.1       0.25      0.5       0.75      0.9
size                          1.27      1.28      1.32      1.24      1.35
                              (0.4462)  (0.4513)  (0.4688)  (0.5707)  (0.6256)
# corr. zeros                 2.73      2.72      2.68      2.76      2.65
                              (0.4462)  (0.4513)  (0.4688)  (0.5707)  (0.6256)
# incor. zeros                0         0         0         0         0
                              (0)       (0)       (0)       (0)       (0)
$R(\hat\beta^{SCAD})$         0.3702    0.5190    0.6535    0.7779    1.0298
$R_\tau(\hat Q^{LL}_\tau)$    0.1955    0.4414    0.6991    0.6865    0.4425

Table 2: Mean values and standard deviations (in parentheses) for the size and the number of correct and incorrect zeros of the estimated parametric component $\hat\beta^{SCAD}$. Also, mean values for $R(\hat\beta^{SCAD})$ and $R_\tau(\hat Q^{LL}_\tau)$ for Model (7).


Numerical Studies: Example 1

Comments: The proposed methodology

correctly selects X2 as the significant covariate 100% of the time for all quantile levels (the average number of incorrect zeros is 0),

gives an average number of correct zeros close to the true value of 3.

The average $R(\hat\beta^{SCAD})$ of the estimated parametric component seems to increase with increasing quantile level; the asymptotic variance formula of the sample quantiles involves the inverse density of the error term evaluated at the specific quantile of interest, and for exponential errors this density decreases as τ increases.

Overall performance is good.


Numerical Studies: Example 1

τ             0.1       0.25      0.5       0.75      0.9
SCAD-NWQR     0.0015    0.0014    0.0023    0.0035    0.0174
              (0.0041)  (0.0034)  (0.0047)  (0.0059)  (0.0997)
LASSO-AY      0.0006    0.0022    0.0046    0.0335    0.0581
              (0.0005)  (0.0026)  (0.0064)  (0.0454)  (0.0734)
ALASSO-AY     0.0005    0.0020    0.0046    0.0311    0.0509
              (0.0004)  (0.0022)  (0.0065)  (0.0443)  (0.0702)

Table 3: Mean values and standard deviations (in parentheses) for the $MSE(\hat\beta_1' X)$ for the SCAD-NWQR, LASSO-AY, and ALASSO-AY estimated parametric components for Model (7).


Numerical Studies: Example 1

Comments:

the proposed methodology outperforms the LASSO and adaptive-LASSO estimators of Alkenani and Yu (2013) for all quantile levels, except for the lowest one.


Numerical Studies: Example 2 - Heteroscedastic Errors

Consider the model

$$Y = \sin(2\pi\beta_1' X) + \frac{(1 + \beta_2' X)^2}{4}\,\varepsilon, \qquad (8)$$

where $X = (X_1, \dots, X_5)'$, the $X_i \sim U(0, 1)$ are iid, $\beta_1 = (1, 2, 0, 0, 0)'$, $\beta_2 = (1, 1, 1, 0, 0)'$, and the residual $\varepsilon$ follows an exponential distribution with mean 1.

We fit the SIQR model for five different quantile levels, $\tau = 0.1, 0.25, 0.5, 0.75, 0.9$. We use a sample size of $n = 400$ and perform $N = 100$ replications.

We compare the penalized estimated parametric component (SCAD-NWQR) with the unpenalized one (NWQR).
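The heteroscedasticity in Model (8) shows up directly in the conditional quantiles: the scale factor multiplies the exponential quantile $-\log(1-\tau)$, so the gap between upper and lower quantiles widens with $\beta_2' X$. The sketch below illustrates this. It assumes, as in Example 1, that the error is independent of the covariates (the slide does not say so explicitly), and the function names and seed are hypothetical.

```python
import math
import random

BETA1 = (1.0, 2.0, 0.0, 0.0, 0.0)
BETA2 = (1.0, 1.0, 1.0, 0.0, 0.0)


def dot(b, x):
    return sum(bi * xi for bi, xi in zip(b, x))


def scale(x):
    """Error scale (1 + beta_2' x)^2 / 4 from Model (8)."""
    return (1.0 + dot(BETA2, x)) ** 2 / 4.0


def simulate_example2(n, rng):
    """Draw (X, y) from Model (8) with exponential errors of mean 1."""
    X = [[rng.random() for _ in range(5)] for _ in range(n)]
    y = [math.sin(2.0 * math.pi * dot(BETA1, x)) + scale(x) * rng.expovariate(1.0)
         for x in X]
    return X, y


def true_quantile(x, tau):
    """Q_tau(Y|x): location plus scale times the Exp(1) quantile -log(1 - tau)."""
    return math.sin(2.0 * math.pi * dot(BETA1, x)) - scale(x) * math.log(1.0 - tau)


X, y = simulate_example2(400, random.Random(2))
low, high = [0.0] * 5, [1.0] * 5
spread_low = true_quantile(low, 0.9) - true_quantile(low, 0.1)
spread_high = true_quantile(high, 0.9) - true_quantile(high, 0.1)
print(f"0.9 - 0.1 quantile spread: {spread_low:.3f} at x = 0, {spread_high:.3f} at x = 1")
```

The scale is 1/4 at the origin and 4 at the opposite corner of the unit cube, so the spread differs by a factor of 16 between the two points.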


Numerical Studies: Example 2

τ                 0.1       0.25      0.5       0.75      0.9
size              1.06      1.12      1.08      1.18      1.35
                  (0.6485)  (0.3835)  (0.3075)  (0.4795)  (0.5751)
# corr. zeros     2.79      2.87      2.92      2.82      2.65
                  (0.4777)  (0.3667)  (0.3075)  (0.4795)  (0.5751)
# incor. zeros    0.15      0.01      0         0         0
                  (0.3589)  (0.1)     (0)       (0)       (0)

Table 4: Mean values and standard deviations (in parentheses) for the size and the number of correct and incorrect zeros of the estimated parametric component $\hat\beta^{SCAD}$ for Model (8).


Numerical Studies: Example 2

τ                                        0.1       0.25      0.5       0.75      0.9
$MSE(\hat\beta_1' X)$        NWQR        1.4696    0.4422    0.1259    0.1401    0.9802
                                         (0.9675)  (0.5094)  (0.2012)  (0.2992)  (4.0261)
                             SCAD-NWQR   0.3283    0.1094    0.0536    0.0239    0.0683
                                         (0.4606)  (0.2201)  (0.1486)  (0.0461)  (0.1468)
$R(\hat\beta)$               NWQR        2.3519    1.3141    0.8491    1.1313    2.3672
                             SCAD-NWQR   0.8644    0.4743    0.2843    0.2711    0.4935
$R_\tau(\hat Q^{LL}_\tau)$   NWQR        0.1486    0.2563    0.3406    0.3239    0.2118
                             SCAD-NWQR   0.1248    0.2264    0.3213    0.3130    0.2064

Table 5: Mean values and standard deviations (in parentheses) for the $MSE(\hat\beta_1' X)$, average $R(\hat\beta)$ and average $R_\tau(\hat Q^{LL}_\tau)$ for NWQR and SCAD-NWQR estimated parametric components for Model (8).


Numerical Studies: Boston Housing Data

506 observations.

14 variables; response: medv, the median value of owner-occupied homes in thousands of dollars; predictors: measurements on the 506 census tracts in suburban Boston from the 1970 census.

available in the MASS library in R.

Collinearity in the data set; Breiman and Friedman (1985) applied the ACE method and selected the four covariates RM, TAX, PTRATIO, and LSTAT.

Variable selection in QR: Wu and Liu (2009) considered SCAD and adaptive-LASSO penalties under the linear QR model. Alkenani and Yu (2013) considered the LASSO and adaptive-LASSO penalties under the SIQR model.


Numerical Studies: Boston Housing Data

exclude CHAS and RAD,

standardize the response and the remaining 11 predictor variables,

denote the predictor variables by $X_1, X_2, \dots, X_{11}$, and

consider the SIQR model:

$$Q_\tau(\text{medv}|X) = g(X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_{11} X_{11}|\beta), \qquad (9)$$

for $\tau = 0.1, 0.25, 0.5, 0.75, 0.9$, which assumes that RM is a significant predictor.
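The preprocessing steps above can be sketched as follows. This is an illustrative helper, not the authors' code: the function name `prepare_design` and the column-dictionary input are hypothetical, the variable names follow the capitalization used on the slides, and standardizing with the sample standard deviation (divisor n - 1) is an assumption, since the slides do not say which version was used.

```python
import statistics


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    m = statistics.mean(values)
    s = statistics.stdev(values)   # n - 1 divisor (an assumed choice)
    return [(v - m) / s for v in values]


def prepare_design(columns, response="medv", lead="RM", drop=("CHAS", "RAD")):
    """Drop excluded variables, standardize, and put the lead predictor first.

    columns maps a variable name to its list of observed values. The lead
    predictor becomes X_1, whose index coefficient is fixed at 1 in Model (9).
    """
    names = [k for k in columns if k != response and k not in drop]
    names.sort(key=lambda k: k != lead)          # stable sort: lead goes first
    scaled = [standardize(columns[k]) for k in names]
    rows = [list(r) for r in zip(*scaled)]       # one row per observation
    y = standardize(columns[response])
    return names, rows, y
```

Calling `prepare_design` on the 14 Boston columns would return 11 predictor names with RM first, an n-by-11 design matrix as a list of rows, and the standardized response.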


Numerical Studies: Boston Housing Data

τ      RM   CRIM      ZN   INDUS   NOX       AGE       DIS       TAX       PTRATIO   BLACK    LSTAT
0.1    1    -0.1047   0    0       0         -0.3385   -1.5e-5   -0.4494   -0.0897   0.2672   -0.0367
0.25   1    -0.2216   0    0       0         -0.2964   -0.0737   -0.3736   -0.2238   0.2876   -0.0048
0.5    1    -0.2187   0    0       0         0         -2.0e-5   -0.4243   -0.2986   0.4840   -0.5273
0.75   1    0         0    0       0         -0.1260   -4.7e-5   0         -0.2783   0.3893   0
0.9    1    -0.1080   0    0       -0.2015   0         -0.2220   0         -0.1908   0.1814   0

Table 6: Parameter estimates for the Boston housing data for fitting the SIQR model (9) to each quantile level.


Numerical Studies: Boston Housing Data

no. of zeros
τ             0.1      0.25     0.5      0.75     0.9
SCAD-NWQR     3        3        4        6        5
LASSO-AY      0        2        1        1        2
ALASSO-AY     4        2        1        1        2

$R_\tau(\hat Q^{LL}_\tau)$
τ             0.1      0.25     0.5      0.75     0.9
SCAD-NWQR     0.0641   0.1125   0.1455   0.1410   0.0882
LASSO-AY      0.0503   0.0990   0.1355   0.1299   0.0892
ALASSO-AY     0.0527   0.1092   0.1355   0.1288   0.0888

Tables 7 and 8: Number of zero coefficients and mean check-based absolute residuals, $R_\tau(\hat Q^{LL}_\tau)$, for SCAD-NWQR, LASSO-AY, and ALASSO-AY for fitting the SIQR model (9) to each quantile level.


Numerical Studies: Boston Housing Data

Comments:

different predictor variables are significant at different quantile levels,

the proposed methodology gives sparser models than the LASSO and adaptive-LASSO methodologies of Alkenani and Yu (2013) at most quantile levels (the exception is ALASSO-AY at τ = 0.1),

the mean check-based absolute residuals resulting from the proposed SCAD-NWQR are slightly larger than those of the other methods for all except the 90th quantile. The difference is probably due to the different penalty functions used by the different methods, but could also be explained by the different levels of sparsity attained.


Numerical Studies: Boston Housing Data

Figure: Estimated single index quantile regression for Boston housing data. Five panels, one per quantile level (τ = 0.1, 0.25, 0.5, 0.75, 0.9), each plotting medv against the index, with both axes running from -2 to 3.


Conclusions

conditional quantiles provide a more complete picture of the conditional distribution and share nice properties such as robustness.

consider a single index model to reduce the dimensionality and maintain some nonparametric flexibility.

iterative algorithms for estimating the parametric vector; convergence issues.

proposed a method that directly estimates the parametric component non-iteratively and have derived its asymptotic distribution under heteroscedastic errors.

for high-dimensional data, introduce penalty terms both for variable selection to estimate $Q_\tau(Y|x)$ and for producing the estimator of the parametric component of the SIQR model.


Future Work

extend the proposed methodology to censored data.

The proposed estimator is obtained by

$$\hat\beta = \arg\min_{b \in \Theta} S^{CD}_n(\tau, b),$$

where

$$S^{CD}_n(\tau, b) = \sum_{i=1}^{n} \frac{\Delta_i}{G_n(Z_i|X_i)}\, \rho_\tau\left(Z_i - \hat g^{NW}_{CD}(b_1' X_i|b)\right).$$

Christou and Akritas (2015); working paper.


References

A. Alkenani and K. Yu. Penalized single-index quantile regression. International Journal of Statistics and Probability, 2:12–30, 2013.

L. Breiman and J. H. Friedman. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80:580–598, 1985.

E. Christou and M. G. Akritas. Single index quantile regression for censored data. Working paper, 2015.

E. Guerre and C. Sabbah. Uniform bias study and Bahadur representation for local polynomial estimators of the conditional quantile function. Econometric Theory, 28(1):87–129, 2012.

R. Koenker. Additive models for quantile regression: Model selection and confidence bandaids. Brazilian Journal of Probability and Statistics, 25:239–262, 2011.

E. Kong and Y. Xia. A single-index quantile regression model and its estimation. Econometric Theory, 28:730–768, 2012.

J. D. Opsomer and D. Ruppert. A fully automated bandwidth selection for additive regression model. Journal of the American Statistical Association, 93:605–618, 1998.

H. Wang, G. Li and G. Jiang. Robust regression shrinkage and consistent variable selection through the LAD-Lasso. Journal of Business and Economic Statistics, 25:347–355, 2007.

Y. Wu and Y. Liu. Variable selection in quantile regression. Statistica Sinica, 19:801–817, 2009.

T. Z. Wu, K. Yu and Y. Yu. Single index quantile regression. Journal of Multivariate Analysis, 101(7):1607–1621, 2010.

K. Yu and Z. Lu. Local linear additive quantile regression. Scandinavian Journal of Statistics, 31:333–346, 2004.


Thank you!

Many thanks to my advisor, Prof. Michael Akritas.
