An extension for smoothed empirical likelihood confidence intervals for extreme quantiles and small sample sizes
Master Thesis
Author: Oliver Thunich (1, 2)
Counseling: Sebastian Schoneberg (2), Bertram Schafer (2)
Supervisors: Claus Weihs (1), David Meintrup (3)
(1) TU Dortmund, (2) Statcon GmbH, (3) TH Ingolstadt
1. Motivation
- Application: calculating process capability without a distribution assumption.
  - Compute confidence intervals for "extreme" quantiles (e.g. q = 0.01).
  - Use a non-parametric method.
- Problem: small sample sizes are desired but lead to
  - infinite confidence intervals,
  - bad coverage rates.
- Method: smoothed empirical likelihood.
  - Capable of returning asymmetric confidence intervals.
  - Smoothing reduces coverage error.
TU Dortmund; STATCON 2 / 16
2. Smoothed Empirical Likelihood
Existing Methods
- Empirical likelihood was first introduced by Owen (1988).
- When quantiles are considered, the log likelihood depends on the empirical distribution function $F_n$ (Adimari, 1998):
$$l(\Theta) = 2n\left[F_n(\Theta)\log\frac{F_n(\Theta)}{q} + \bigl(1-F_n(\Theta)\bigr)\log\frac{1-F_n(\Theta)}{1-q}\right] \tag{1}$$
- Owen (1988) shows that the Wilks theorem is applicable to compute asymptotic confidence intervals:
$$\lim_{n\to\infty} P\bigl(l(\Theta)\le c\bigr) = P\bigl(\chi^2_1 \le c\bigr)$$
- As this results in a step function, several methods of smoothing have been proposed:
  - Smoothing using a kernel function (Chen, Hall, 1993).
  - Linear smoothing of $F_n$ (Adimari, 1998).
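Equation (1) and the chi-square cutoff can be evaluated directly. The following is a minimal Python sketch, not the thesis implementation: the function names (`ecdf`, `el_statistic`, `chi2_1_quantile`, `in_confidence_region`) and the convention 0 · log 0 = 0 are assumptions made for illustration. The chi-square quantile with one degree of freedom is obtained as the square of a standard normal quantile, so only the standard library is needed.

```python
import math
from statistics import NormalDist


def ecdf(sample, theta):
    """Empirical distribution function F_n(theta): share of observations <= theta."""
    return sum(x <= theta for x in sample) / len(sample)


def xlogy(x, y):
    """x * log(y), using the convention 0 * log(0) = 0 (an assumption of this sketch)."""
    return 0.0 if x == 0.0 else x * math.log(y)


def el_statistic(f, n, q):
    """Equation (1), evaluated at F_n(theta) = f for sample size n and quantile level q."""
    return 2.0 * n * (xlogy(f, f / q) + xlogy(1.0 - f, (1.0 - f) / (1.0 - q)))


def chi2_1_quantile(level):
    """Quantile of the chi-square distribution with 1 degree of freedom."""
    # P(chi2_1 <= c) = 2 * Phi(sqrt(c)) - 1, so sqrt(c) is a normal quantile.
    return NormalDist().inv_cdf(0.5 + level / 2.0) ** 2


def in_confidence_region(sample, theta, q, alpha):
    """True if theta lies in the asymptotic (1 - alpha) confidence region."""
    stat = el_statistic(ecdf(sample, theta), len(sample), q)
    return stat <= chi2_1_quantile(1.0 - alpha)
```

With n = 11 and q = 0.01, every Θ below the smallest observation has F_n(Θ) = 0 and the same statistic value of about 0.221, which is below the cutoff of about 3.84 for α = 0.05. The region is therefore unbounded to the left, which is the infinite-interval problem named in the motivation.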
Linear Smoothing
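- Let $x_{(1)} \le \dots \le x_{(n)}$ be the ordered sample.
- The smoothing proposed by Adimari (1998) is achieved by using a linear smoothing $F_n^*$ of $F_n$ in equation (1):
$$F_n^*(\Theta) = \begin{cases} 0 & \text{if } \Theta < x_{(1)} \\ H(\Theta) & \text{if } \Theta \in [x_{(1)}, x_{(n)}) \\ 1 & \text{if } \Theta \ge x_{(n)} \end{cases}$$
where
$$H(\Theta) = \begin{cases} \dfrac{2i-1}{2n} & \text{if } \Theta = x_{(i)},\ i \in \{1,\dots,n-1\} \\[1ex] (1-\lambda)\dfrac{2i-1}{2n} + \lambda\dfrac{2i+1}{2n} & \text{if } \Theta \in (x_{(i)}, x_{(i+1)}),\ \lambda = \dfrac{\Theta - x_{(i)}}{x_{(i+1)} - x_{(i)}},\ i \in \{1,\dots,n-1\} \end{cases}$$
Problems:
- Constant likelihood values outside of the observed data can lead to infinite CIs.
- The likelihood function still has two jumps (at $x_{(1)}$ and $x_{(n)}$).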
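The piecewise definition of F* can be written in a few lines. This is a hedged Python sketch under the assumption that all observations are distinct; the name `f_star` is hypothetical and `bisect` is used only to locate the index i with x(i) ≤ Θ < x(i+1).

```python
import bisect


def f_star(sample, theta):
    """Linearly smoothed distribution function F*_n(theta) as defined above.

    Assumes distinct observations, so that x(i+1) - x(i) > 0.
    """
    xs = sorted(sample)
    n = len(xs)
    if theta < xs[0]:
        return 0.0
    if theta >= xs[-1]:
        return 1.0
    # i is the 1-based index with x(i) <= theta < x(i+1); here 1 <= i <= n - 1.
    i = bisect.bisect_right(xs, theta)
    lam = (theta - xs[i - 1]) / (xs[i] - xs[i - 1])
    # lam = 0 reproduces the value (2i - 1) / (2n) at theta = x(i).
    return (1.0 - lam) * (2 * i - 1) / (2 * n) + lam * (2 * i + 1) / (2 * n)
```

For the sample 1, 2, 3, 4 the function jumps from 0 to 1/8 at x(1) and from values just below 7/8 to 1 at x(n), which are the two jumps listed under "Problems".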
Distribution Function
Figure: Smoothing of Fn for 11 observations from a standard normal distribution (horizontal axis: Θ; vertical axis: F(Θ); curves: Fn and F*).
Likelihood Function
Figure: Smoothed empirical likelihood functions for the median (n = 11; q = 0.5; α = 0.05). Horizontal axis: Θ; vertical axis: l(Θ); curves: not smoothed, Adimari, Chen and Hall; reference lines: χ²(1, 1−α) and the true quantile.
Figure: Smoothed empirical likelihood functions for the 1% quantile (n = 11; q = 0.01; α = 0.05), with the same axes, curves and reference lines.
Coverage Rates
Figure: Estimated coverage rates based on 1000 samples. Horizontal axis: n (0 to 500); vertical axis: coverage rate; methods: not smoothed, Adimari, Chen and Hall; settings: normal distribution q = 0.01, exponential distribution q = 0.01, exponential distribution q = 0.99; reference lines: desired coverage rate (0.95) and minimal sample size.
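3. The Extension
Idea:
- Find an extension of the likelihood function for values outside of the observed data so that
  - finite confidence intervals are guaranteed,
  - the desired coverage rate is achieved.
1. Extending the smoothed ecdf $F^*$ as follows:
$$F_{\mathrm{ext}}(\Theta) = \begin{cases} 0 & \text{if } \Theta \le x_{(1)} - d_1 c \\[0.5ex] \dfrac{1}{2n} - \dfrac{1}{2n\,d_1\,c}\,\bigl(x_{(1)} - \Theta\bigr) & \text{if } x_{(1)} - d_1 c < \Theta < x_{(1)} \\[1ex] H(\Theta) & \text{if } x_{(1)} \le \Theta \le x_{(n)} \\[0.5ex] \dfrac{2n-1}{2n} + \dfrac{1}{2n\,d_2\,c}\,\bigl(\Theta - x_{(n)}\bigr) & \text{if } x_{(n)} < \Theta < x_{(n)} + d_2 c \\[1ex] 1 & \text{if } \Theta \ge x_{(n)} + d_2 c \end{cases}$$
where $c \ge 1$,
$$d_1 = \frac{1}{10}\sum_{i=1}^{5}\bigl(x_{(i+1)} - x_{(i)}\bigr) \quad\text{and}\quad d_2 = \frac{1}{10}\sum_{i=1}^{5}\bigl(x_{(n-i+1)} - x_{(n-i)}\bigr).$$
2. A further linear extension of the likelihood function.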
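Step 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the thesis code: the name `f_ext` is hypothetical, the observations are assumed distinct with n ≥ 6, and the value at Θ = x(n), which the definition of H leaves open, is set to (2n − 1)/(2n) by continuity. Step 2 is not covered because the slides do not specify it in detail.

```python
import bisect


def f_ext(sample, theta, c):
    """Extended smoothed distribution function F_ext(theta) for a given c >= 1."""
    xs = sorted(sample)
    n = len(xs)
    # d1, d2: one tenth of the sum of the five outermost spacings on each side.
    d1 = sum(xs[i + 1] - xs[i] for i in range(5)) / 10.0
    d2 = sum(xs[n - i] - xs[n - i - 1] for i in range(1, 6)) / 10.0

    if theta <= xs[0] - d1 * c:
        return 0.0
    if theta < xs[0]:
        # Linear ramp from 0 at x(1) - d1*c up to 1/(2n) at x(1).
        return 1.0 / (2 * n) - (xs[0] - theta) / (2 * n * d1 * c)
    if theta >= xs[-1] + d2 * c:
        return 1.0
    if theta > xs[-1]:
        # Linear ramp from (2n-1)/(2n) at x(n) up to 1 at x(n) + d2*c.
        return (2 * n - 1) / (2 * n) + (theta - xs[-1]) / (2 * n * d2 * c)
    if theta == xs[-1]:
        # Assumption: H is continued to x(n) by continuity.
        return (2 * n - 1) / (2 * n)
    # Interior: linear interpolation H(theta) between x(i) and x(i+1).
    i = bisect.bisect_right(xs, theta)
    lam = (theta - xs[i - 1]) / (xs[i] - xs[i - 1])
    return (1.0 - lam) * (2 * i - 1) / (2 * n) + lam * (2 * i + 1) / (2 * n)
```

A larger c stretches both ramps over a wider range, so F_ext reaches 0 and 1 farther away from the observed data.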
Visualising Fext
Figure: Smoothing of Fn using Fext (horizontal axis: Θ; vertical axis: F(Θ); curves: Fn, F*, Fext with c = 3).
Figure: Visualising the influence of the parameter c on Fext (curves: Fn, F*, Fext with c = 3 and c = 7), shown once over the full range and once zoomed in on the lower tail.
Visualising l(Θ)
Figure: First step of the extension (c = 3) for the 1% quantile. Horizontal axis: Θ; vertical axis: l(Θ); curves: not smoothed, Adimari, extension after step 1; reference lines: χ²(1, 1−α) and the true quantile.
Figure: Second step: further linear extension (c = 3), with the same axes and reference lines; curves: not smoothed, Adimari, extension (c = 3).
Figure: Fully extended likelihood function for different values of c (curves: not smoothed, Adimari, extension with c = 3, extension with c = 25).
Extension parameter
- The smallest value of c that results in a coverage rate of at least (1 − α) is desired.
- The extension is necessary if the confidence region extends beyond the observed data.
- Whether the extension is necessary depends on the significance level α and the product R:
$$R := \begin{cases} q\,n & \text{if } q \le 0.5 \\ (1-q)\,n & \text{if } q > 0.5 \end{cases}$$
- Assumptions:
  - The required value for c depends on q, n, R and α.
  - The required value for c does not depend on the distribution of the data.
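The role of α and R can be made concrete with a short calculation. The condition below is derived here from equation (1) and is not stated on the slides: outside the observed data on the side of the extreme quantile, the unextended distribution function is 0 (or 1), so l(Θ) is constant at −2n·log(1 − p) with p = min(q, 1 − q), which is roughly 2R for small p. The interval is unbounded whenever this constant does not exceed the chi-square cutoff. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def product_r(n, q):
    """R = q*n for q <= 0.5 and (1 - q)*n otherwise."""
    return q * n if q <= 0.5 else (1.0 - q) * n


def outside_value(n, q):
    """Constant value of l(theta) beyond the data on the side of the extreme quantile."""
    p = min(q, 1.0 - q)
    return -2.0 * n * math.log1p(-p)


def chi2_cutoff(alpha):
    """(1 - alpha) quantile of chi-square with 1 df, via a normal quantile."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2


def interval_unbounded(n, q, alpha):
    """True if the unextended confidence interval has no finite end on the extreme side."""
    return outside_value(n, q) <= chi2_cutoff(alpha)


def minimal_sample_size(q, alpha):
    """Smallest n for which the unextended interval is bounded."""
    p = min(q, 1.0 - q)
    return math.floor(chi2_cutoff(alpha) / (-2.0 * math.log1p(-p))) + 1
```

Under these assumptions, q = 0.01 and α = 0.05 give a minimal sample size of 192, which agrees with the `Inf` widths reported for n ≤ 100 in the performance table.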
Modelling c
Simulation study:
- For different values of n, q and α, choose the smallest value c ∈ ℕ that produces a coverage of at least (1 − α).
- Evaluate coverage using m = 5000 samples of size n from a normal distribution.
- Try to find a model for c given q, n, R and α.
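The two simulation steps above can be sketched as a coverage estimator and a search over c. This is an illustrative Python outline, not the code used in the thesis: the names `estimate_coverage` and `smallest_c` are hypothetical, and the statistic plugged in here is the unsmoothed one from equation (1) because the fully extended likelihood is not specified in enough detail on the slides. A sample counts as covering when the statistic at the true quantile does not exceed the chi-square cutoff.

```python
import math
import random
from statistics import NormalDist


def unsmoothed_statistic(sample, theta, q):
    """Equation (1) with the plain empirical distribution function."""
    n = len(sample)
    f = sum(x <= theta for x in sample) / n
    total = 0.0
    if f > 0.0:
        total += f * math.log(f / q)
    if f < 1.0:
        total += (1.0 - f) * math.log((1.0 - f) / (1.0 - q))
    return 2.0 * n * total


def estimate_coverage(statistic, n, q, alpha, m, rng):
    """Share of m standard normal samples whose region contains the true quantile."""
    cutoff = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    true_quantile = NormalDist().inv_cdf(q)
    hits = 0
    for _ in range(m):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if statistic(sample, true_quantile, q) <= cutoff:
            hits += 1
    return hits / m


def smallest_c(coverage_for_c, alpha, c_max=50):
    """Smallest integer c in 1..c_max with coverage >= 1 - alpha, else None."""
    for c in range(1, c_max + 1):
        if coverage_for_c(c) >= 1.0 - alpha:
            return c
    return None
```

In a full study, `coverage_for_c` would wrap `estimate_coverage` around the extended statistic for a fixed n, q and α; the upper bound `c_max` is an arbitrary safeguard of this sketch.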
Figure: Chosen value of c for different values of R (horizontal axis: R; vertical axis: c; series: α = 0.01, 0.025, 0.05, 0.1).
Model chosen from the simulation study:
$$c = 12.344 - 7.082\,\sqrt{R} - 2.454\,\log(\alpha) - 75.125\,q - 0.004\,n$$
- (1 − q) is used in place of q for q > 0.5.
- Adjusted R² = 0.933.
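The fitted model is easy to evaluate. The sketch below uses the coefficients from the slide; the function name `modeled_c` is hypothetical, and the use of the natural logarithm is an assumption, supported by the fact that it reproduces the value c = 16.55 quoted in the deck for n = 11, q = 0.01 and α = 0.05.

```python
import math


def modeled_c(n, q, alpha):
    """Modeled extension parameter c from the regression on the slide."""
    # For q > 0.5 the model uses (1 - q) in place of q, also inside R.
    p = q if q <= 0.5 else 1.0 - q
    r = p * n
    return (12.344
            - 7.082 * math.sqrt(r)
            - 2.454 * math.log(alpha)  # natural logarithm (assumption)
            - 75.125 * p
            - 0.004 * n)


print(round(modeled_c(11, 0.01, 0.05), 2))
```

Because the model is linear in n and √R, it is only meaningful inside the range covered by the simulation study; for large n or R it can fall below the lower bound c ≥ 1.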
Figure: Visualising the model (horizontal axis: simulated c; vertical axis: modeled c; series: α = 0.01, 0.025, 0.05, 0.1).
Figure: Example for the modeled value of c (n = 11, q = 0.01, α = 0.05) for the 1% quantile. Horizontal axis: Θ; vertical axis: l(Θ); curves: not smoothed, Adimari, extension with c = 3, c = 25 and the modeled c = 16.55.
4. Performance
Performance: coverage
Figure: Coverage rates based on 1000 samples for normally distributed data (q = 0.005, 0.01, 0.05; horizontal axis: observations n; vertical axis: coverage; reference line: desired coverage). Averages: coverage 0.952, width 1.66.
Figure: Coverage rates based on 1000 samples for exponentially distributed data (q = 0.005, 0.01, 0.05; same axes). Averages: coverage 0.97, width 0.183.
Figure: Coverage rates based on 1000 samples for exponentially distributed data (q = 0.995, 0.99, 0.95; same axes). Averages: coverage 0.941, width 3.89.
Performance: discussion
Table: Average width if coverage is acceptable (q = 0.01, 0.99; α = 0.05; based on 1000 samples).
Columns: n = 10, 20, 50, 100, 200, 300, 400, 500. Entries are listed in order of increasing n; a row with fewer than eight entries has no entry for the sample sizes at which coverage was not acceptable.
g-NV: Inf, Inf, Inf, Inf, 0.858, 0.918, 0.664
A-NV: Inf, Inf, Inf, Inf, 0.877, 0.844, 0.729
H-NV: 0.856, 0.963, 0.662
Af-NV: 4.554, 3.383, 2.522, 1.953, 1.517, 1.028, 0.849, 0.728
g-Exp1: Inf, Inf, Inf, Inf, 0.021, 0.020, 0.016
A-Exp1: Inf, Inf, Inf, Inf, 0.022, 0.020, 0.018
H-Exp1: 0.020, 0.018, 0.016
Af-Exp1: 2.043, 0.785, 0.255, 0.109, 0.044, 0.024, 0.020, 0.018
g-Exp99: Inf, Inf, Inf, Inf, 2.479, 2.688, 1.886
A-Exp99: Inf, Inf, Inf, Inf, 2.573, 2.390, 2.041
H-Exp99: 2.423, 2.695, 1.806
Af-Exp99: 5.463, 4.943, 4.065, 2.882, 2.411, 2.050
Legend: g = not smoothed; A = Adimari; H = Chen and Hall; Af = Adimari extended; NV = (standard) normal distribution; Exp1 = exponential distribution, q = 0.01; Exp99 = exponential distribution, q = 0.99.
5. Outlook
- Better account for the shape of the data by assigning higher weights to extreme observations:
  - Use a weighted mean for computing d1 and d2.
- Test a semiparametric variation:
  - Assume a class of distributions (e.g. exponential) and model c using samples from that distribution.
6. References
Adimari, G. (1998). An empirical likelihood statistic for quantiles. Journal of Statistical Computation and Simulation, 60(1), pages 85-95.
Chen, S. X., Hall, P. (1993). Smoothed empirical likelihood confidence intervals for quantiles. The Annals of Statistics, pages 1166-1181.
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2), pages 237-249.
Zhu, H. (2007). Smoothed Empirical Likelihood for Quantiles and Some Variations/Extension of Empirical Likelihood for Buckley-James Estimator. Ph.D. dissertation, University of Kentucky.