Charles University (founded 1348)
Faculty of Social Sciences (FSV UK), Institute of Economic Studies
STAKAN III
Jan Ámos Víšek
Econometrics
Tuesday, 12.30 – 13.50
Ninth Lecture (summer term)
Schedule of today's talk

A motivation for robust studies
Huber's versus Hampel's approach
Prohorov distance - qualitative robustness
Influence function - quantitative robustness
  • gross-error sensitivity
  • local shift sensitivity
  • rejection point
Breakdown point
Recalling the linear regression model
Scale and regression equivariance
The weighted least squares
Schedule of today's talk continued

Introducing robust estimators
Maximum likelihood(-like) estimators - M-estimators
Other types of estimators - L-estimators, R-estimators, minimum distance, minimum volume
Advanced requirements on the point estimators
AN EXAMPLE FROM READING THE MATH

Having explained what a limit is, an example was presented:
$$\lim_{x\to 8}\frac{1}{x-8} = \infty .$$
To be sure that the students really understand what is in question, they were asked to solve the exercise:
$$\lim_{x\to 5}\frac{1}{x-5} = \ ?$$
The answer was as follows: a "5" turned on its side - by analogy with the "8" turned into the infinity sign.
The Weighted Least Squares

The reasons for weighting (down) the residuals of observations.

An example – the diagonal elements of the hat matrix $X(X^TX)^{-1}X^T$.

Assuming an intercept in the model, $X^TX$ has in its first column (and row) the vector $n\bar{X}$, where
$$\bar{X} = \Big(1,\ \frac{1}{n}\sum_{i=1}^n X_{i2},\ \frac{1}{n}\sum_{i=1}^n X_{i3},\ \dots,\ \frac{1}{n}\sum_{i=1}^n X_{ip}\Big)^T.$$
From $X^TX(X^TX)^{-1} = I$ it follows that $n\,\bar{X}^T(X^TX)^{-1} = (1,0,0,\dots,0)$, hence
$$n\,\bar{X}^T(X^TX)^{-1}X_i = 1,\ \ i = 1,2,\dots,n,\qquad n\,\bar{X}^T(X^TX)^{-1}\bar{X} = 1.$$
Therefore
$$0 \le (X_i-\bar{X})^T(X^TX)^{-1}(X_i-\bar{X}) = X_i^T(X^TX)^{-1}X_i - \frac{1}{n},$$
i.e. each diagonal element of the hat matrix is at least $\frac{1}{n}$ (see the next slide for the geometry of the situation).
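A quick numerical check of the bound above - my own sketch (assuming numpy), not part of the lecture:

```python
import numpy as np

# Illustration: with an intercept column, every diagonal element of the hat matrix
# H = X (X^T X)^{-1} X^T is at least 1/n, and (see the following slides) they average to p/n.
rng = np.random.default_rng(0)
n, p = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

print(h.min() >= 1.0 / n)            # True: lower bound 1/n
print(np.isclose(h.mean(), p / n))   # True: trace(H)/n = p/n
```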
The Weighted Least Squares continued

The i-th diagonal element of the hat matrix:
$$(X_i-\bar{X})^T(X^TX)^{-1}(X_i-\bar{X}) = X_i^T(X^TX)^{-1}X_i - \frac{1}{n}.$$

[Figure: the geometry of the situation in the coordinates $X_1$, $X_2$.]
The Weighted Least Squares continued

Moreover, $X(X^TX)^{-1}X^T$ is idempotent, i.e.
$$X(X^TX)^{-1}X^T\,X(X^TX)^{-1}X^T = X(X^TX)^{-1}X^T,$$
hence
$$\mathrm{trace}\,X(X^TX)^{-1}X^T = \mathrm{rank}\,X(X^TX)^{-1}X^T = p,$$
so the "mean value" of the diagonal elements of $X(X^TX)^{-1}X^T$ is $\frac{p}{n}$.

For the case of random regressors, Chatterjee, S., A. S. Hadi (1988): Sensitivity Analysis in Linear Regression. New York: J. Wiley & Sons, gave an approximation of the 95% upper quantile:

  p larger than      approximation to the quantile
  2                  2.5 p/n
  6                  2 p/n
  12                 1.5 p/n

Denote this the 1st approximation to the critical values.
The Weighted Least Squares continued

D. A. Belsley, E. Kuh, R. E. Welsch (1980): Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: J. Wiley & Sons.

Theorem
Assumptions: If $\tilde{X} = (I - \frac{1}{n}\mathbf{1}\mathbf{1}^T)X$ (where $\mathbf{1} = (1,1,\dots,1)^T$) has i.i.d. rows with a p-dimensional normal d.f.,
Assertions: then
$$\frac{h_{ii} - \frac{1}{n}}{1 - h_{ii}}\cdot\frac{n-p}{p-1}\ \sim\ F_{p-1,\,n-p}.$$

Of course, if the rows of the matrix $X$ are independent, the rows of $\tilde{X} = (I - \frac{1}{n}\mathbf{1}\mathbf{1}^T)X$ can't be independent. But the correlation is of order $n^{-1}$, i.e. for large $n$ we can employ the result (a more precise analysis is in the monographs cited above). Then:
The Weighted Least Squares continued

Inverting the relation of the previous slide, the critical value of $h_{ii}$ at level $\alpha$ is
$$h^{critical}_{ii}(\alpha) = \frac{(p-1)\,F_{p-1,\,n-p}(\alpha) + \frac{n-p}{n}}{(p-1)\,F_{p-1,\,n-p}(\alpha) + (n-p)},$$
where $F_{p-1,\,n-p}(\alpha)$ denotes the corresponding quantile of the $F_{p-1,\,n-p}$ distribution. Denote this the 2nd approximation to the critical values.

Number of observations | 2nd approx. to critical values | 1st approx. to critical values

p = 3:
   40   0.155   0.188
   60   0.103   0.125
   80   0.078   0.094
  100   0.062   0.075
  150   0.041   0.050

p = 6:
   40   0.274   0.375
   60   0.187   0.250
   80   0.141   0.188
  100   0.113   0.150
  150   0.076   0.100

p = 7:
   70   0.181   0.200
  100   0.128   0.140
  130   0.099   0.108
  160   0.081   0.088
  190   0.068   0.074

p = 10:
  100   0.170   0.200
  130   0.132   0.154
  160   0.108   0.125
  200   0.086   0.100
  240   0.072   0.083
The Weighted Least Squares continued

Number of observations | 2nd approx. to critical values | 1st approx. to critical values

p = 14:
  140   0.160   0.200
  180   0.125   0.156
  220   0.103   0.127
  280   0.081   0.100
  340   0.067   0.082

p = 16:
  160   0.156   0.150
  200   0.126   0.120
  250   0.101   0.096
  300   0.084   0.080
  400   0.064   0.060

p = 20:
  200   0.151   0.150
  250   0.121   0.120
  300   0.101   0.100
  400   0.076   0.075
  500   0.061   0.060

p = 24:
  250   0.141   0.144
  300   0.118   0.120
  400   0.089   0.090
  500   0.071   0.072
  600   0.059   0.060

If the diagonal term of the hat matrix is larger than $\frac{2.5\,p}{n}$, or even $\frac{2\,p}{n}$, we should check whether the corresponding observation is an outlier or a leverage point. It can be a reason for weighting it down!
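As an illustration of the rule of thumb above (my own sketch, assuming numpy; the function name is mine):

```python
import numpy as np

def hat_diagonal(X: np.ndarray) -> np.ndarray:
    """Diagonal of H = X (X^T X)^{-1} X^T, via a linear solve for numerical stability."""
    B = np.linalg.solve(X.T @ X, X.T)          # B = (X^T X)^{-1} X^T
    return np.einsum("ij,ji->i", X, B)         # diag(X B)

rng = np.random.default_rng(1)
n, p = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
X[0, 1:] += 8.0                                 # plant one leverage point

h = hat_diagonal(X)
print(np.where(h > 2.0 * p / n)[0])             # candidates to inspect
print(np.where(h > 2.5 * p / n)[0])             # stricter threshold
```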
The Weighted Least Squares continued

The weighted least squares estimate (in Czech: odhad metodou vážených nejmenších čtverců).

Let $w_1, w_2, \dots, w_n \in [0,1]$. Then the weighted least squares estimator is given as follows:
$$\hat{\beta}^{(WLS,n)} = \mathop{\mathrm{arg\,min}}_{\beta\in R^p}\ \sum_{i=1}^n w_i\,(Y_i - X_i^T\beta)^2.$$
Putting $W = \mathrm{diag}\{w_1, w_2, \dots, w_n\}$, the normal equations are
$$X^TWY - X^TWX\beta = 0,$$
and finally
$$\hat{\beta}^{(WLS,n)} = (X^TWX)^{-1}X^TWY.$$
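A minimal sketch of the closed form above (my own illustration, assuming numpy):

```python
import numpy as np

def weighted_least_squares(X: np.ndarray, y: np.ndarray, w: np.ndarray) -> np.ndarray:
    """beta_hat = (X^T W X)^{-1} X^T W y with W = diag(w)."""
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
y[:5] += 20.0                                    # a few gross outliers

w = np.ones(n)
w[:5] = 0.0                                      # weight the suspicious observations down
print(weighted_least_squares(X, y, np.ones(n)))  # ordinary LS, pulled by the outliers
print(weighted_least_squares(X, y, w))           # WLS, close to (1, 2)
```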
Why should the robust methods also be used?

Fisher, R. A. (1922): On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. London Ser. A 222, pp. 309--368.

$$\lim_{n\to\infty}\frac{\mathrm{var}_{N(0,1)}(\bar{x}_n)}{\mathrm{var}_{t(\nu)}(\bar{x}_n)} = 1 - \frac{6}{\nu(\nu+1)},\qquad \lim_{n\to\infty}\frac{\mathrm{var}_{N(0,1)}(s_n^2)}{\mathrm{var}_{t(\nu)}(s_n^2)} = 1 - \frac{12}{\nu(\nu-1)}.$$
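Evaluating the two expressions above for ν = 9, 5, 3 (a plain numerical check of the displayed formulas, nothing more) reproduces the table on the next slide:

```python
for nu in (9, 5, 3):
    eff_mean = 1 - 6 / (nu * (nu + 1))
    eff_var = 1 - 12 / (nu * (nu - 1))
    print(f"t_{nu}:  x_bar -> {eff_mean:.2f},  s^2 -> {eff_var:.2f}")
# t_9: 0.93 and 0.83;  t_5: 0.80 and 0.40;  t_3: 0.50 and a negative value,
# reflecting that var(s_n^2) no longer exists under t_3.
```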
Why should the robust methods also be used? continued

               t_9     t_5     t_3
  \bar{x}_n    0.93    0.80    0.50
  s_n^2        0.83    0.40      -

Moreover,
$$\lim_{n\to\infty}\frac{\mathrm{var}_{N(0,1)}(T_n)}{\mathrm{var}_{t(\nu)}(T_n)} = 0\ !\qquad\text{for } T_n = s_n^2,\ \nu = 3,$$
i.e. $\mathrm{var}_{t_3}(s_n^2)$ is asymptotically infinitely larger than $\mathrm{var}_{N(0,1)}(s_n^2)$.
[Figure: the standard normal density and the Student density with 5 degrees of freedom.]

Is it easy to distinguish between the normal and the Student density?
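As a small numerical answer (my own sketch, assuming scipy), the maximal gap between the two densities is only about 0.02:

```python
import numpy as np
from scipy.stats import norm, t

x = np.linspace(-6, 6, 2001)
gap = np.abs(norm.pdf(x) - t.pdf(x, df=5))   # pointwise distance of the two densities
print(gap.max())                             # about 0.02 - hard to tell apart by eye
```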
Why should the robust methods also be used? continued

Huber, P. J. (1981): Robust Statistics. New York: J. Wiley & Sons.

$$d_n = \frac{1}{n}\sum_{i=1}^n |x_i - \bar{x}_n|,\qquad s_n = \Big(\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x}_n)^2\Big)^{\frac{1}{2}},$$
$$F_\varepsilon(x) = (1-\varepsilon)\,\Phi(x) + \varepsilon\,\Phi\!\Big(\frac{x}{3}\Big),$$
$$ARE(\varepsilon) = \lim_{n\to\infty}\frac{\mathrm{var}\,s_n\,/\,E^2 s_n}{\mathrm{var}\,d_n\,/\,E^2 d_n}.$$
Why should the robust methods also be used? continued

  ε          0        .001     .002     .05
  ARE(ε)     .876     .948     1.016    2.035

So, only 5% of contamination makes $d_n$ two times better than $s_n$.

Is 5% of contamination much or little? E.g. Switzerland has 6% of errors in mortality tables, see Hampel et al.:

Hampel, F. R., E. M. Ronchetti, P. J. Rousseeuw, W. A. Stahel (1986): Robust Statistics - The Approach Based on Influence Functions. New York: J. Wiley & Sons.
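A Monte Carlo sketch of the table above (my own illustration, assuming numpy; the sample size, number of replications and seed are arbitrary):

```python
import numpy as np

def relative_var(values: np.ndarray) -> float:
    """var(T) / (E T)^2, the quantity entering ARE(eps)."""
    return values.var() / values.mean() ** 2

rng = np.random.default_rng(5)
n, reps = 1000, 4000
for eps in (0.0, 0.001, 0.002, 0.05):
    s_vals, d_vals = [], []
    for _ in range(reps):
        scale = np.where(rng.random(n) < eps, 3.0, 1.0)   # contaminate by N(0, 3^2)
        x = rng.normal(size=n) * scale
        s_vals.append(np.sqrt(np.mean((x - x.mean()) ** 2)))
        d_vals.append(np.mean(np.abs(x - x.mean())))
    are = relative_var(np.array(s_vals)) / relative_var(np.array(d_vals))
    print(f"eps = {eps}: ARE approx {are:.2f}")
```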
Conclusion: We have developed efficient single-seaters (monoposts) which, however, work only on special F1 circuits.

A proposal: Let us use both. If both work, thank God - we are on an F1 circuit. If not, let us try to learn why. What about utilizing, if necessary, a comfortable sedan? It can "survive" even the usual roads.
Huber's approach

One of the possible frameworks for statistical problems is to consider a parameterized family of distribution functions.

Huber's proposal: Let us consider the same structure of the parameter space, but instead of each single distribution function let us consider a whole neighborhood of that d.f. Finally, let us employ the usual statistical technique for solving the problem in question.
Huber's approach continued - an example

Let us look for an (unbiased, consistent, etc.) estimator of location with minimal (asymptotic) variance for the family $F_\theta(x) = F(x-\theta)$.

For each $\theta$ let us define
$$Q_\theta = \big\{\,G_{H,\theta}(x) = (1-\varepsilon)\,F_\theta(x) + \varepsilon\,H(x)\ :\ H(x)\ \text{a d.f.}\,\big\},$$
i.e. consider instead of the single d.f. $F_\theta$ the whole family $Q_\theta$.

Let us look for an (unbiased, consistent, etc.) estimator of location with minimal (asymptotic) variance for the family of families $\{G_{H,\theta}\in Q_\theta\}_\theta$.

Finally, solve the same problem as at the beginning of the task.
Hampel's approach

The information in the data $(x_1, x_2, \dots, x_n)$ is the same as the information in the empirical d.f. $F_n$. An estimate of a parameter of a d.f. can then be considered as a functional $T_n(F_n)$, which frequently has a (theoretical) counterpart $T(F)$.

An example:
$$\bar{x}_n = \frac{1}{n}\sum_{i=1}^n x_i = \int x\,dF_n(x) = T_n(F_n),\qquad E X = \int x\,dF(x) = T(F).$$
Hampel's approach continued

Expanding the functional $T$ at $F$ in the direction to $F_n$, we obtain
$$T_n(F_n) = T(F) + \int T'(F,x)\,\big[dF_n(x) - dF(x)\big] + R_n,$$
where $T'(F,x)$ is e.g. the Fréchet derivative - details below.

Message: Hampel's approach is an infinitesimal one, employing a "differential calculus" for functionals. Local properties of $T_n(F_n)$ can be studied through the properties of $T'(F)$.
Qualitative robustness

Let us consider a sequence of "green" d.f. which coincide with the red one up to the distance $\frac{1}{n}$ from the Y-axis. Does the "green" sequence converge to the red d.f.?
Qualitative robustness continued

Let us consider the Kolmogorov-Smirnov distance, i.e.
$$d_{KS}(F, F_n) = \max_{x\in R}\,|F(x) - F_n(x)|.$$
The K-S distance of any "green" d.f. from the red one is equal to the length of the yellow segment.

CONCLUSION: Independently of $n$, the "green" sequence does not converge in the K-S metric to the red d.f., unfortunately.
Qualitative robustness continued

Prokhorov distance

In words: we look for the minimal length by which we have to move the green d.f. - to the left and up - so that it lies above the red one.

CONCLUSION: Now the sequence of the green d.f. converges to the red one.
Qualitative robustness

Prokhorov distance:
$$\pi(F,G) = \inf\big\{\,\varepsilon > 0\ :\ F(A) \le G(A^{\varepsilon}) + \varepsilon\ \text{ for all events } A\,\big\}.$$

DEFINITION: Let $x_1, x_2, \dots, x_n$ be i.i.d., $x_1 \sim F(x)$, and $\hat{\theta}_n = T_n(F_n)$. The sequence $\{T_n\}$ is qualitatively robust at $F$ if for every $\varepsilon > 0$ there is a $\delta > 0$ such that for every $G$
$$\pi(F, G) < \delta\ \Rightarrow\ \pi\big(\mathcal{L}_F(T_n), \mathcal{L}_G(T_n)\big) < \varepsilon .$$

In words: Qualitative robustness is continuity with respect to the Prohorov distance.

E.g., the arithmetic mean is qualitatively robust at the normal d.f. !?!

Conclusion: For practical purposes we need something "stronger" than qualitative robustness.
Quantitative robustness

$$T_n(F_n) = T(F) + \int IF(x, T, F)\,dF_n(x) + R_n,$$
$$\sqrt{n}\,\big(T_n(F_n) - T(F)\big) = \frac{1}{\sqrt{n}}\sum_{i=1}^n IF(x_i, T, F) + \sqrt{n}\,R_n .$$

Influence function:
$$IF(x, T, F) = \lim_{h\to 0}\,\frac{T\big((1-h)F + h\,\delta_x\big) - T(F)}{h}.$$
The influence function is defined where the limit exists.
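A small numerical illustration of the definition (my own sketch, assuming numpy): for the mean functional $T(F) = \int x\,dF$ the finite-h difference reproduces $IF(x,T,F) = x - T(F)$.

```python
import numpy as np

def T_mean(points: np.ndarray, weights: np.ndarray) -> float:
    """The mean functional T(G) = integral of x dG for a discrete d.f. G."""
    return float(np.sum(weights * points))

rng = np.random.default_rng(3)
sample = rng.normal(size=1000)                 # F_n of a N(0,1) sample
w = np.full(sample.size, 1.0 / sample.size)

x, h = 4.0, 1e-6                               # contaminating point and small h
T_contaminated = T_mean(np.append(sample, x),  # T((1-h) F_n + h * delta_x)
                        np.append((1 - h) * w, h))
IF_numeric = (T_contaminated - T_mean(sample, w)) / h
print(IF_numeric, x - sample.mean())           # the two numbers agree
```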
27
continuedQuantitative robustness
Characteristics derived from influence function
)F,T,x(IFsupRx
*
Gross-error sensitivity
)F,T,y(IF)F,T,x(IFsup{
yx
*
Local shift sensitivity
/ }yx
rxfor0)F,T,x(IF;0rinf*
Rejection point
Breakdown point

Definition - please, don't read it. (The definition is here only to show that the description of the breakdown point given below has a good mathematical basis.)
$$\varepsilon^{*}(\hat{\beta}^{(n)}, F) = \sup\Big\{\,\varepsilon \le 1\ :\ \exists\,K(\varepsilon)\subset R^{p},\ K(\varepsilon)\ \text{compact},\ \pi(F,G) < \varepsilon\ \Rightarrow\ G\big(\hat{\beta}^{(n)}\in K(\varepsilon)\big)\to 1\ \text{for } n\to\infty\,\Big\}.$$

In words: $\varepsilon^{*}(\hat{\beta}^{(n)}, F)$ is the smallest (asymptotic) ratio of contamination which can destroy the estimate, in the sense that the estimate tends (in absolute value) to infinity or to zero.

(High breakdown point "obsession" - especially in regression; discussion below.)
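An illustration of the breakdown behaviour (my own sketch, assuming numpy): replace a growing number of observations by a gross error and watch when the mean and the median are destroyed.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
x = rng.normal(size=n)

for m in (0, 1, 10, 49, 50, 51):
    y = x.copy()
    y[:m] = 1e6                     # m observations replaced by gross errors
    print(f"{m:>2} bad points: mean = {y.mean():12.1f}, median = {np.median(y):12.3f}")
# The mean is destroyed by a single bad point (breakdown point 0), while the median
# survives until roughly one half of the sample is replaced (breakdown point 1/2).
```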
Robust estimators of parameters - an introduction, motivation

Let us have a family $\{f_\theta(x)\}_{\theta\in\Theta}$ and data $x_1, x_2, \dots, x_n$. Of course, we want to estimate $\theta$.

Maximum likelihood estimators:
$$\hat{\theta} = \mathop{\mathrm{arg\,max}}_{\theta}\ \prod_{i=1}^n f_\theta(x_i) = \mathop{\mathrm{arg\,max}}_{\theta}\ \sum_{i=1}^n \log f_\theta(x_i).$$

What can cause a problem?
Robust estimators of parameters - what can cause a problem?

An example: consider the normal family with unit variance:
$$f_\theta(x) = (2\pi)^{-1/2}\exp\{-\tfrac{1}{2}(x-\theta)^{2}\},\qquad -2\log f_\theta(x) = \log(2\pi) + (x-\theta)^{2}$$
(notice that $\log(2\pi)$ does not depend on $\theta$). So we solve the extremal problem
$$\hat{\theta} = \mathop{\mathrm{arg\,max}}_{\theta\in R}\Big\{-\sum_{i=1}^n (x_i-\theta)^{2}\Big\} = \mathop{\mathrm{arg\,min}}_{\theta\in R}\ \sum_{i=1}^n (x_i-\theta)^{2} = \arg_{\theta\in R}\Big\{\sum_{i=1}^n (x_i-\theta) = 0\Big\} = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x}_n .$$
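Numerically (my own sketch, assuming numpy): under the quadratic criterion the estimate is the arithmetic mean, so a single wild observation shifts it without any bound.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=100)

for bad in (0.0, 10.0, 100.0, 1000.0):
    y = x.copy()
    y[0] = bad                       # one observation replaced
    print(f"x_1 = {bad:8.1f}  ->  mean = {y.mean():8.3f}")
# The shift of the mean is exactly (bad - x[0]) / n: one point has unbounded influence.
```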
Robust estimators of parameters - a proposal of a new estimator

Maximum likelihood-like estimators:

Once again: what caused the problem in the previous example ($\hat{\theta} = \frac{1}{n}\sum_{i=1}^n x_i$)? So what about
$$\rho(x) = \begin{cases} x^{2} & \text{for } |x|\le k,\\ k\,(2|x| - k) & \text{for } |x| > k\ ?\end{cases}$$
The estimator is then
$$\hat{\theta} = \mathop{\mathrm{arg\,min}}_{\theta}\ \sum_{i=1}^n \rho(x_i - \theta)\qquad\text{or}\qquad \hat{\theta} = \arg_{\theta}\Big\{\sum_{i=1}^n \psi(x_i - \theta) = 0\Big\},\ \ \psi = \tfrac{1}{2}\rho' .$$
Robust estimators of parameters continued

$$\rho(x) = \begin{cases} x^{2} & \text{for } |x|\le k \quad\text{(quadratic part)},\\ k\,(2|x| - k) & \text{for } |x| > k \quad\text{(linear part)},\end{cases}$$
$$\tfrac{1}{2}\,\rho'(x) = \begin{cases} x & \text{for } |x|\le k,\\ k\,\mathrm{sign}(x) & \text{for } |x| > k .\end{cases}$$
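A minimal computational sketch (my own, assuming numpy; iteratively reweighted means are one common way to compute such an estimator, not necessarily the lecture's algorithm) of the location M-estimator with the quadratic/linear ρ above:

```python
import numpy as np

def m_estimate_location(x: np.ndarray, k: float = 1.345, iters: int = 50) -> float:
    """Location M-estimator for the quadratic/linear rho, via reweighted means."""
    theta = np.median(x)                 # robust starting point
    for _ in range(iters):
        r = x - theta
        absr = np.abs(r)
        w = np.ones_like(r)              # weight 1 on the quadratic part ...
        big = absr > k
        w[big] = k / absr[big]           # ... and k/|r| on the linear part
        theta = np.sum(w * x) / np.sum(w)
    return float(theta)

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(size=95), rng.normal(loc=50.0, size=5)])  # 5% outliers
print(x.mean())                    # pulled towards the outliers
print(m_estimate_location(x))      # stays near 0
```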
Robust estimators of parameters - the most popular estimators

M-estimators (maximum likelihood-like estimators):
$$\hat{\theta} = \mathop{\mathrm{arg\,min}}_{\theta}\ \sum_{i=1}^n \rho(x_i - \theta)$$

L-estimators (based on order statistics):
$$\hat{\theta} = \mathop{\mathrm{arg\,min}}_{\theta}\ \sum_{i=1}^n w_i\,\rho(x_{(i)} - \theta)$$

R-estimators (based on rank statistics):
$$\hat{\theta} = \mathop{\mathrm{arg\,min}}_{\theta}\ \sum_{i=1}^n w_{R_i}\,\rho(x_i - \theta),\qquad R_i\ \text{the rank of } x_i .$$
Robust estimators of parameters - the less popular, but still well-known estimators

Minimal distance estimators - based on minimizing the distance between the empirical d.f. and the theoretical one:
$$\hat{\theta} = \mathop{\mathrm{arg\,min}}_{\theta}\ d(F_n, F_\theta).$$

Minimal volume estimators - based on minimizing the volume containing a given part of the data and applying a "classical" (robust) method:
$$\hat{\theta} = \mathop{\mathrm{arg\,min}}_{\theta}\ \sum_{i=1}^n w_i\,\rho(x_i - \theta),\qquad w_i = I\{x_i\in V\},$$
where $V$ is the minimal volume containing the given part of the data.
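A sketch of a minimal distance estimator (my own illustration, assuming numpy and scipy): the location parameter is estimated by minimizing the Kolmogorov-Smirnov distance between $F_n$ and $F_\theta(x) = \Phi(x-\theta)$ over a grid of θ values.

```python
import numpy as np
from scipy.stats import norm

def ks_distance(x: np.ndarray, theta: float) -> float:
    """max_x |F_n(x) - Phi(x - theta)|, evaluated at the jump points of F_n."""
    xs = np.sort(x)
    upper = np.arange(1, xs.size + 1) / xs.size   # F_n just after each jump
    lower = np.arange(0, xs.size) / xs.size       # F_n just before each jump
    F_theta = norm.cdf(xs - theta)
    return max(np.max(np.abs(upper - F_theta)), np.max(np.abs(lower - F_theta)))

rng = np.random.default_rng(8)
x = np.concatenate([rng.normal(loc=1.0, size=95), rng.normal(loc=20.0, size=5)])

grid = np.linspace(-5.0, 25.0, 601)
theta_hat = grid[np.argmin([ks_distance(x, t) for t in grid])]
print(theta_hat)                  # close to 1 despite the 5% of far-away observations
```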
Robust estimators of parameters

Algorithms for evaluating robust estimators

The classical estimator, e.g. the ML-estimator, typically has a formula that can be employed for evaluating it. The extremal problems by which robust estimators are defined typically do not have a solution in the form of a closed formula.

Firstly: we have to find an algorithm to evaluate an approximation to the precise solution.
Secondly: we have to find a trick to verify that the approximation is tight to the precise solution.
Hereafter let us have in mind that we speak implicitly about the high breakdown point "obsession" (especially in regression - discussion below).
Linear regression model

Recalling the model:
$$Y_i = \sum_{j=1}^{p} X_{ij}\,\beta_j^0 + \varepsilon_i = X_i^T\beta^0 + \varepsilon_i,\qquad i = 1,2,\dots,n,$$
where $X_i = (x_{i1}, x_{i2}, \dots, x_{ip})^T$.

Put $(X)_{ij} = x_{ij}$, $i = 1,2,\dots,n$, $j = 1,2,\dots,p$ ($x_{i1} = 1$, $i = 1,2,\dots,n$, if there is an intercept), $Y = (Y_1, Y_2, \dots, Y_n)^T$, $\beta^0 = (\beta^0_1, \beta^0_2, \dots, \beta^0_p)^T$ and $\varepsilon = (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)^T$. Then
$$Y = X\beta^0 + \varepsilon .$$
Linear regression model - recalling the model graphically

So we look for a model "reasonably" explaining the data.
Linear regression model - recalling the model graphically

[Figure: this is a leverage point and this is an outlier.]
Equivariance of regression estimators

We probably arrive easily at an agreement that the estimates of the parameters of the model should not depend on the system of coordinates. Formally it means:

Equivariance in scale (scale equivariant): if for data $(Y, X)$ the estimate is $\hat{\beta}$, then for data $(aY, X)$ the estimate is $a\hat{\beta}$.

Equivariance in regression (affine equivariant): if for data $(Y, X)$ the estimate is $\hat{\beta}$, then for data $(Y + X\gamma, X)$ the estimate is $\hat{\beta} + \gamma$.
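A quick numerical check of both properties for the ordinary least squares estimator (my own sketch, assuming numpy):

```python
import numpy as np

def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(9)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

beta = ols(X, y)
a, gamma = 3.0, np.array([0.2, 1.0, -1.0])
print(np.allclose(ols(X, a * y), a * beta))               # scale equivariance
print(np.allclose(ols(X, y + X @ gamma), beta + gamma))   # regression equivariance
```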
Advanced (modern?) requirements on the point estimator (still not exhaustive):

• Unbiasedness, consistency
• Asymptotic normality, low gross-error sensitivity
• Reasonably high efficiency, low local shift sensitivity
• Finite rejection point, controllable breakdown point
• Scale- and regression-equivariance
• Algorithm with acceptable complexity and reliability of evaluation
• The heuristics the estimator is based on has to really work
What is to be learnt from this lecture for the exam?

All you need is at http://samba.fsv.cuni.cz/~visek/

• Breakdown point
• Weighted least squares
• M-estimators and minimal distance estimators
• Main reasons for constructing robust estimators - influence of outliers in estimating mean and variance
• Influence function and indicators of robustness based on it