Download - MasterArbeit Statcon

An extension for smoothed empirical likelihoodconfidence intervals for extreme quantiles and

small sample sizesMaster Thesis

Author: Oliver Thunich1;2

Counseling: Sebastian Schoneberg2; Bertram Schafer2

Supervisors: Claus Weihs1; David Meintrup3

1TU Dortmund2Statcon GmbH3TH Ingolstadt

1. Motivation 1.0

Motivation

I Application: Calculating process capability without a distributionassumption.I Compute confidence intervals for ”extreme” quantiles (e.g. q = 0.01).I Using a non-parametric method.

I Problem: small sample sizes are desired but lead toI Infinite confidence intervals.I Bad coverage rates.

I Method: Smoothed empirical likelihood.I Capable of returning non symmetrical confidence intervals.I Smoothing reduces coverage error.

TU Dortmund; STATCON 2 / 16

2. Smoothed Empirical Likelihood 2.0

Existing Methods

I Empirical likelihood was first introduced by Owen (1988)

I When quantiles are considered, the log likelihood is dependent on theempirical distribution function Fn (Adimari, 1998):

l(Θ) = 2n[Fn(Θ)log(

Fn(Θ)

q) + (1−Fn(Θ))log(

1−Fn(Θ)

1−q)]

(1)

I Owen (1988) shows that the Wilks theorem is applicable to computeasymptotic confidence intervals: lim

n→∞P(l(Θ)≤ c) = P(χ2

1 ≤ c)

I As this results in a step function, several methods of smoothing havebeen proposed:I Smoothing using a kernel function. (Chen, Hall, 1993)I Linear smoothing of Fn (Adimari, 1998)



Linear Smoothing

I Let x(1) ≤ ...≤ x(n) be the ordered sample.

I The smoothing proposed by Adimari (1998) is achieved by using alinear smoothing F ∗ of Fn in equation 1.

F ∗n (Θ) =

0 if Θ < x(1)

H(Θ) if Θ ∈ [x(1),x(n))

1 if Θ≥ x(n)

where

H(Θ) =

{2i−1

2n if Θ = x(i); i ∈ {1, ..,n−1}(1−λ ) 2i−1

2n + λ2i+1

2n if Θ ∈ (x(i),x(i+1));λ =Θ−x(i)

x(i+1)−x(i); i ∈ {1, ..,n−1}

Problems:

I Constant likelihood values outside of the observed data can lead toinfinite CI’s.

I likelihood function still has two jumps (at x(1) and x(n)).



Distribution Function

−2 −1 0 1 2

0.0

0.2

0.4

0.6

0.8

1.0

Θ

F(Θ

)

FnF*

Figure: Smoothing of Fn for 11 observations from a standard normal distribution



Likelihood Function

−1 0 1 2

05

1015

Median

Θ

l(Θ)

not smoothed

Adimari

Chen and Hall

χ1,1−α2

true quantile

Figure: Smoothed empirical likelihood functions (n = 11;q = 0.5;α = 0.05)



Likelihood Function

−2.5 −2.0 −1.5 −1.0 −0.5

02

46

810

1%−Quantile

Θ

l(Θ)

not smoothed

Adimari

Chen and Hall

χ1,1−α2

true quantile

Figure: Smoothed empirical likelihood functions (n = 11;q = 0.01;α = 0.05)



Coverage Rates

0 100 200 300 400 500

0.6

0.7

0.8

0.9

1.0

n

cove

rage

rat

e

not smoothed

Adimari

Chen and Hall

normal distribution q=0.01

exponential distribution q=0.01

exponential distribution q=0.99

desired coverage rate (0.95)

minimal sample size

Figure: Estimated coverage rates based on 1000 samples


3. The Extension 3.0

The Extension

Idea:I Find an extension of the likelihood function for values outside of the

observed data so thatI finite confidence intervals are guaranteed.I the desired coverage rate is achieved.

1. extending the smoothed ecdf F ∗ as follows:

Fext(Θ) =

0 if Θ≤ x(1)−d1c1

2n −1

2n∗d1∗c (x(1)−Θ) if x(1)−d1c < Θ < x(1)

H(Θ) if x(1) ≤Θ≤ x(n)2n−1

2n + 12n∗d2∗c (Θ−x(n)) if x(n) < Θ < x(n) +d2c

1 if Θ≥ x(n) +d2c

Where c ≥ 1; d1 = 110 ∑

5i=1(x(i+1)−x(i)) and

d2 = 110 ∑

5i=1(x(n−i+1)−x(n−i))

2. linear extension of the likelihood function.TU Dortmund; STATCON 8 / 16


Visualising Fext

−2 −1 0 1 2

0.0

0.2

0.4

0.6

0.8

1.0

Θ

F(Θ

)

FnF*Fext (c=3)

Figure: Smoothing of Fn using Fext



Visualising Fext

−2 −1 0 1 2

0.0

0.2

0.4

0.6

0.8

1.0

Θ

F(Θ

)

FnF*Fext (c=3)Fext (c=7)

Figure: Visualising the influence of the parameter c on Fext



Visualising Fext

−2.5 −2.0 −1.5 −1.0 −0.5

0.0

0.1

0.2

0.3

0.4

0.5

Θ

F(Θ

)

FnF*Fext (c=3)Fext (c=7)

Figure: Visualising the influence of the parameter c on Fext



Visualising l(Θ)

−2.5 −2.0 −1.5 −1.0 −0.5

02

46

810

1%−Quantile

Θ

l(Θ)

not smoothed

Adimari

extension 1. step

χ1,1−α2

true quantile

Figure: First step of the extension (c = 3)



Visualising l(Θ)

−6 −5 −4 −3 −2 −1

02

46

810

1%−Quantile

Θ

l(Θ)

not smoothed

Adimari

extension(c=3)

χ1,1−α2

true quantile

Figure: Second step: further linear extension (c = 3)



Visualising l(Θ)

−6 −5 −4 −3 −2 −1

02

46

810

1%−Quantile

Θ

l(Θ)

not smoothedAdimariextension(c=3)extension(c=25)

Figure: Fully extended likelihood function for different values of c



Extension parameter

I The smallest value of c which results in a coverage rate of at least(1−α) is desired.

I The extension is necessary, if the confidence region extends beyondthe observed data.

I The necessity of the extension depends on the significance level α andthe product R:

R :=

{q ∗n if q ≤ 0.5

(1−q)∗n if q > 0.5

I Assumption:I The required value for c depends on q, n, R and α

I The required value for c does not depend on the distribution of thedata.



Modelling c

Simulation study:

I For different values of n;q and α choose the smallest value c ∈ Nthat produces a coverage of at least (1−α).

I Evaluate coverage using m = 5000 samples of size n from a normaldistribution.

I Try to find a model for c given q, n, R and α



Modelling c

0 1 2 3 4

05

1015

20

R

c

α=0.01α=0.025α=0.05α=0.1

Figure: Chosen value of c for different values of R



Modelling c

Simulation study:

I For different values of n,q and α choose the smallest value c ∈ Nthat produces a coverage of at least (1−α).

I Evaluate coverage using m = 5000 samples of size n from a normaldistribution.

I Model chosen:c = 12.344−7.082∗

√R−2.454∗ log(α)−75.125∗q−0.004∗n.

I (1−q) is used for q > 0.5

I adjusted R2 = 0.933



Modelling c

5 10 15 20

510

1520

simulated c

mod

eled

c

α=0.01α=0.025α=0.05α=0.1

Figure: Visualising the Model



Modelling c

−6 −5 −4 −3 −2 −1

02

46

810

1%−Quantile

Θ

l(Θ)

not smoothedAdimari(c=3)(c=25)modeled(c=16.55)

Figure: Example for the modeled value of c (n = 11, q = 0.01,α = 0.05)


4. Performance 4.0

Performance: coverage

0 100 200 300 400 500 600

0.80

0.85

0.90

0.95

1.00

Observations (n)

Cov

erag

e

q=0.005

q=0.01

q=0.05

desired coverage

Averages:Coverage: 0.952Width: 1.66

Figure: coverage rates based on 1000 samples for normally distributed data


4. Performance 4.0


0 100 200 300 400 500 600 700

0.80

0.85

0.90

0.95

1.00

Observations (n)

Cov

erag

e

q=0.005

q=0.01

q=0.05

desired coverage


Figure: coverage rates based on 1000 samples for exponentially distributed data


4. Performance 4.0


0 100 200 300 400 500 600 700

0.80

0.85

0.90

0.95

1.00

Observations (n)

Cov

erag

e

q=0.995

q=0.99

q=0.95

desired coverage


Figure: coverage rates based on 1000 samples for exponentially distributed data


4. Performance 4.0

Performance: discussion

Table: Average width if coverage is acceptable.q = 0.01,0.99, α = 0.05, based on 1000 samples

n= 10 n= 20 n= 50 n= 100 n= 200 n= 300 n= 400 n= 500g-NV Inf Inf Inf Inf 0.858 0.918 0.664A-NV Inf Inf Inf Inf 0.877 0.844 0.729H-NV 0.856 0.963 0.662

Af-NV 4.554 3.383 2.522 1.953 1.517 1.028 0.849 0.728g-Exp1 Inf Inf Inf Inf 0.021 0.020 0.016A-Exp1 Inf Inf Inf Inf 0.022 0.020 0.018H-Exp1 0.020 0.018 0.016

Af-Exp1 2.043 0.785 0.255 0.109 0.044 0.024 0.020 0.018g-Exp99 Inf Inf Inf Inf 2.479 2.688 1.886A-Exp99 Inf Inf Inf Inf 2.573 2.390 2.041H-Exp99 2.423 2.695 1.806

Af-Exp99 5.463 4.943 4.065 2.882 2.411 2.050

Legend: g-not smoothed; A- Adimari, H-Chen and Hall, Af- Adimari extended,NV-(standard) normal distribution, Exp1-exponential distribution q = 0.01,Exp99-exponential distribution q = 0.99


4. Performance 4.0


Table: Average width if coverage is acceptable.q = 0.01;0.99; α = 0.05, based on 1000 samples




Af-Exp99 5.463 4.943 4.065 2.882 2.411 2.050

Legend: g-not smoothed, A- Adimari, H-Chen and Hall, Af- Adimari extended,NV-(standard) normal distribution, Exp1-exponential distribution q = 0.01,Exp99-exponential distribution q = 0.99


4. Performance 4.0


Table: Average width if coverage is acceptable.q = 0.01,0.99, α = 0.05, based on 1000 samples




Af-Exp99 5.463 4.943 4.065 2.882 2.411 2.050

Legend: g-not smoothed, A- Adimari, H-Chen and Hall, Af- Adimari extended,NV-(standard) normal distribution, Exp1-exponential distribution q = 0.01,Exp99-exponential distribution q = 0.99


5. Outlook 5.0

Outlook

I Better account for the shape of the data by assigning higher weightsto extreme observations:

I Using a weighted mean for computing d1 and d2.

I Test of a semi parametric variation:I Assume a class of distributions (e.g. exponential) and Modell c using

samples from that distribution.


6. Outlook 6.0

References

Adimari, G. (1998). An empirical likelihood statistic for quantiles. Journalof Statistical Computation and Simulation, 60(1) pages 85-95.

Chen, S. X., Hall, P. (1993). Smoothed empirical likelihood confidenceintervals for quantiles. The Annals of Statistics, pages 1166-1181.

Owen, A. B. (1988). Empirical Likelihood Ratio Confidence IIntervals for aSingle Functional. Biometrika, Vol. 75, No. 2(Jun. 1988), pages237-249.

Zhu, H. (2007) Smoothed Empirical Likelihood for Quantiles and SomeVariations/Extension of Empirical Likelihood for Buckley-JamesEstimator, Ph.D. dissertation, University of Kentucky.