ROBUST STATISTICS
INTRODUCTION
• Robust statistics provides an alternative approach to classical statistical methods. The motivation is to produce estimators that are not excessively affected by small departures from model assumptions. These departures may include departures from the assumed sampling distribution or observations that deviate markedly from the rest of the data (i.e. outliers).
MEAN VS MEDIAN
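• A compromise between the mean and the median is the α-trimmed mean, which averages the order statistics left after discarding the m smallest and the m largest observations:
$$T_{m,n} = \frac{1}{n-2m}\sum_{j=m+1}^{n-m} X_{(j)}, \qquad m = \lfloor \alpha n \rfloor, \quad 0 \le \alpha < \tfrac{1}{2}.$$
Usually, 0.1 ≤ α ≤ 0.2. The α-Winsorized mean instead replaces the m smallest observations by X(m+1) and the m largest by X(n−m):
$$W_{m,n} = \frac{1}{n}\left[m\,X_{(m+1)} + \sum_{i=m+1}^{n-m} X_{(i)} + m\,X_{(n-m)}\right].$$
The mean (α = 0) and the median (α close to 1/2) are the two extremes of this family.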
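A small R sketch of the two estimators above, assuming m = floor(alpha * n), which is the same rounding base R's `mean(x, trim = )` uses. The function names `trimmed_mean` and `winsorized_mean` are hypothetical helpers, and the data are made up for illustration.

```r
# Hypothetical helpers: alpha-trimmed and alpha-Winsorized means, assuming m = floor(alpha * n).
trimmed_mean <- function(x, alpha = 0.1) {
  n <- length(x)
  m <- floor(alpha * n)              # observations dropped at each end
  xs <- sort(x)
  mean(xs[(m + 1):(n - m)])          # average of X_(m+1), ..., X_(n-m)
}

winsorized_mean <- function(x, alpha = 0.1) {
  n <- length(x)
  m <- floor(alpha * n)
  xs <- sort(x)
  if (m > 0) {
    xs[1:m] <- xs[m + 1]             # m smallest pulled up to X_(m+1)
    xs[(n - m + 1):n] <- xs[n - m]   # m largest pulled down to X_(n-m)
  }
  mean(xs)
}

x <- c(2.1, 2.4, 2.2, 2.8, 2.5, 2.3, 2.6, 2.7, 2.0, 50)   # one gross outlier
c(mean = mean(x), median = median(x),
  trimmed = trimmed_mean(x, 0.1), winsorized = winsorized_mean(x, 0.1))
mean(x, trim = 0.1)                  # base R's trimmed mean, for comparison
```

With one gross value out of ten, the mean is dragged above 7, while the median, trimmed and Winsorized means all stay at 2.45.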
ROBUST MEASURE OF VARIABILITY
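• The median absolute deviation (MAD) is a robust measure of variability:
$$\mathrm{MAD} = \operatorname*{median}_{i}\left|\,Y_i - \operatorname*{median}_{j} Y_j\,\right|$$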
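The MAD formula translates directly into R. Base R's `mad()` multiplies the raw MAD by a consistency constant (1.4826 by default) so that it estimates the standard deviation under normality; `constant = 1` returns the raw quantity defined above. `raw_mad` is a hypothetical helper and the data are illustrative.

```r
raw_mad <- function(y) median(abs(y - median(y)))   # hypothetical helper, as in the formula

y <- c(1, 2, 3, 4, 100)
raw_mad(y)               # |y - 3| = 2, 1, 0, 1, 97, so the median is 1
mad(y, constant = 1)     # same quantity from base R
mad(y)                   # scaled by 1.4826 (estimates the sd under normality)
sd(y)                    # classical sd, inflated by the single outlier
```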
ORDER STATISTICS AND ROBUSTNESS
• Order statistics and functions of them are usually somewhat robust (e.g. the median, MAD and IQR), but not all order statistics are robust (e.g. X(1), X(n) and the range R = X(n) − X(1)).
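To illustrate the contrast, the sketch below replaces a single observation of a simulated normal sample by a gross error and recomputes a few order-statistic summaries. The sample size, seed and size of the error are arbitrary choices, and `summ` is a hypothetical helper.

```r
set.seed(1)
x <- rnorm(20)                      # clean sample
x_bad <- x
x_bad[1] <- 1e6                     # one observation replaced by a gross error

summ <- function(v) c(median = median(v), MAD = mad(v), IQR = IQR(v),
                      min = min(v), max = max(v), range = diff(range(v)))
round(rbind(clean = summ(x), contaminated = summ(x_bad)), 3)
```

The median, MAD and IQR move only slightly, whereas the maximum and the range follow the bad value without limit.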
M-ESTIMATORS
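• An M-estimator of location μ minimizes a function ρ of the deviations; choosing ρ = −log f gives the maximum-likelihood estimator:
$$\min_{\mu}\sum_{i=1}^{n}\rho(Y_i-\mu) \;=\; \min_{\mu}\sum_{i=1}^{n}\bigl[-\log f(Y_i-\mu)\bigr].$$
Setting the derivative to zero, with ψ = ρ′, the estimate μ̂ solves
$$\sum_{i=1}^{n}\psi(Y_i-\hat\mu)=0 \quad\text{or}\quad \sum_{i=1}^{n} w_i\,(Y_i-\hat\mu)=0, \qquad\text{where } w_i=\frac{\psi(Y_i-\hat\mu)}{Y_i-\hat\mu},$$
so that μ̂ = Σ wᵢYᵢ / Σ wᵢ is a weighted mean.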
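The weighted-mean form suggests an iterative algorithm: compute weights from the current estimate, take the weighted mean, and repeat. A sketch for Huber's ψ, assuming residuals are standardized by a fixed scale s = `mad(y)` and a tuning constant k = 1.345 (the default of `MASS::psi.huber`). Holding s fixed is a simplification, and `huber_location` is a hypothetical name.

```r
# Hypothetical helper: Huber M-estimate of location by iterated weighted means.
huber_location <- function(y, k = 1.345, tol = 1e-10, maxit = 100) {
  mu <- median(y)                        # robust starting value
  s  <- mad(y)                           # scale, held fixed (assumed > 0)
  for (it in seq_len(maxit)) {
    r  <- (y - mu) / s                   # standardized residuals
    w  <- ifelse(abs(r) <= k, 1, k / abs(r))   # w_i = psi(r_i) / r_i for Huber's psi
    mu_new <- sum(w * y) / sum(w)        # solves sum w_i (Y_i - mu) = 0 for fixed w
    if (abs(mu_new - mu) < tol * s) return(mu_new)
    mu <- mu_new
  }
  mu
}

y <- c(2.1, 2.4, 2.2, 2.8, 2.5, 2.3, 2.6, 2.7, 2.0, 50)
c(mean = mean(y), median = median(y), huber = huber_location(y))
```

The outlier at 50 receives a weight of roughly 0.01, so the estimate stays close to the bulk of the data.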
• When an estimator is robust, no single observation has enough influence to shift the estimate substantially. A robust M-estimator should meet several requirements:
1. It should have a bounded influence function.
2. It should be unique.
Briefly, here are a few remarks on common choices of ρ:
• L2 (least-squares) estimators are not robust because their influence function is not bounded.
• L1 (absolute value) estimators are not stable because the ρ-function |x| is not strictly convex in x. Indeed, its second derivative at x = 0 is unbounded, and an indeterminate solution may result.
• L1-L2 estimators reduce the influence of large errors, but large errors still have some influence because the influence function has no cut-off point.
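The three cases can be compared through their ψ-functions (ψ = ρ′, up to a constant factor). This sketch assumes the common L1-L2 form ρ(x) = 2(√(1 + x²/2) − 1), which the text does not spell out, and takes ρ(x) = x²/2 for L2 so that ψ(x) = x.

```r
# psi = rho' (up to a constant) for the three rho-functions
psi_l2   <- function(x) x                       # rho(x) = x^2 / 2: unbounded influence
psi_l1   <- function(x) sign(x)                 # rho(x) = |x|: bounded, but rho not strictly convex
psi_l1l2 <- function(x) x / sqrt(1 + x^2 / 2)   # assumed rho(x) = 2 * (sqrt(1 + x^2/2) - 1)

x <- c(0.5, 2, 10, 100)
round(rbind(L2 = psi_l2(x), L1 = psi_l1(x), "L1-L2" = psi_l1l2(x)), 3)
```

The L2 ψ grows without bound; the L1-L2 ψ keeps increasing towards √2, so large residuals are damped but never ignored.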
EXAMPLES OF M-ESTIMATORS
• The mean corresponds to ρ(x) = x², and the median to ρ(x) = |x|. (For even n, any median will solve the problem.) The function
$$\psi(x) = \begin{cases} x, & |x| < c,\\ 0, & \text{otherwise},\end{cases}$$
corresponds to metric trimming: large outliers have no influence at all. The function
$$\psi(x) = \begin{cases} -c, & x < -c,\\ x, & |x| < c,\\ c, & x > c,\end{cases}$$
is known as metric Winsorizing and brings extreme observations in to μ ± c.
• The corresponding −log f is
$$\rho(x) = \begin{cases} x^2, & \text{if } |x| < c,\\ c\,(2|x| - c), & \text{otherwise},\end{cases}$$
and corresponds to a density with a Gaussian center and double-exponential tails. This estimator is due to Huber.
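A quick numerical check of the claims above, written as a sketch with hypothetical helper names: minimising Σρ(xᵢ − μ) over μ gives the mean for ρ(x) = x² and the median for ρ(x) = |x|, and Huber's ρ has derivative 2ψ for the Winsorizing ψ (the factor 2 comes from ρ(x) = x² in the center). The tuning constant c is written `k` and set to 1.345 only as an example value.

```r
x <- c(1, 2, 3, 4, 100)

# rho(x) = x^2 gives the mean, rho(x) = |x| gives the median
optimize(function(mu) sum((x - mu)^2), range(x))$minimum   # about mean(x) = 22
optimize(function(mu) sum(abs(x - mu)), range(x))$minimum  # about median(x) = 3

psi_trim  <- function(u, k) ifelse(abs(u) < k, u, 0)        # metric trimming
psi_huber <- function(u, k) pmin(pmax(u, -k), k)            # metric Winsorizing
rho_huber <- function(u, k) ifelse(abs(u) < k, u^2, k * (2 * abs(u) - k))

k <- 1.345
u <- c(-3, -0.7, 0.2, 2.5)
psi_trim(u, k)     # values beyond k get psi = 0
psi_huber(u, k)    # values beyond k are clipped to +/- k

# Central-difference check that rho' = 2 * psi
h <- 1e-6
num_deriv <- (rho_huber(u + h, k) - rho_huber(u - h, k)) / (2 * h)
cbind(num_deriv, two_psi = 2 * psi_huber(u, k))
```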
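• Tukey’s biweight has
$$\psi(t) = t\left[1 - \left(\frac{t}{R}\right)^2\right]_+^2,$$
where $[\,\cdot\,]_+$ denotes the positive part. This implements ‘soft’ trimming. The value R = 4.685 gives 95% efficiency at the normal.
• Hampel’s ψ has several linear pieces,
$$\psi(x) = \operatorname{sgn}(x)\times\begin{cases} |x|, & 0 < |x| < a,\\ a, & a < |x| < b,\\ a\,(c - |x|)/(c - b), & b < |x| < c,\\ 0, & c < |x|,\end{cases}$$
for example, with a = 2.2s, b = 3.7s, c = 5.9s, where s is a scale estimate.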
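Both redescending ψ-functions can be coded directly. This sketch assumes the argument has already been divided by the scale s, so the Hampel constants become a = 2.2, b = 3.7, c = 5.9; the function names are hypothetical.

```r
# Tukey's biweight, psi(t) = t * [1 - (t/R)^2]_+^2
psi_biweight <- function(t, R = 4.685) t * pmax(1 - (t / R)^2, 0)^2

# Hampel's three-part psi, with x already divided by the scale s
psi_hampel <- function(x, a = 2.2, b = 3.7, c = 5.9) {
  ax <- abs(x)
  sign(x) * ifelse(ax < a, ax,                          # linear part
            ifelse(ax < b, a,                           # flat part
            ifelse(ax < c, a * (c - ax) / (c - b), 0))) # descending part, then zero
}

u <- c(1, 3, 4.8, 7)
round(rbind(biweight = psi_biweight(u), hampel = psi_hampel(u)), 3)
```

Unlike Huber's ψ, both functions return to zero for large arguments, so gross outliers end up with no influence.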
ROBUST REGRESSION
• Robust regression procedures dampen the influence of outlying cases, as compared to ordinary LSE, in an effort to provide a better fit for the majority of cases.
• LEAST ABSOLUTE RESIDUALS (LAR) REGRESSION: Estimates the regression coefficients by minimizing the sum of absolute deviations of the Y observations from their means (the regression function):
$$\min_{\beta_0,\dots,\beta_{p-1}} \sum_{i=1}^{n} \left| Y_i - \beta_0 - \beta_1 X_{i,1} - \cdots - \beta_{p-1} X_{i,p-1} \right|$$
Since absolute deviations rather than squared ones are involved, LAR places less emphasis on outlying observations than does the method of LS. Residuals ordinarily will not sum to 0, and the solution for the estimated coefficients may not be unique.
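For a straight-line fit (p = 2), an exact LAR solution can be found by brute force: the L1 criterion is minimised by some line passing through two of the observations, so it suffices to try every pair. This is only a teaching sketch with O(n³) work; `lar_line` is a hypothetical name, and larger problems are normally solved as a linear program.

```r
# Hypothetical helper: LAR line by searching all lines through two data points.
lar_line <- function(x, y) {
  best <- c(Inf, NA, NA)                        # (criterion, intercept, slope)
  n <- length(x)
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    if (x[i] == x[j]) next                      # vertical line: skip
    b1 <- (y[j] - y[i]) / (x[j] - x[i])
    b0 <- y[i] - b1 * x[i]
    sar <- sum(abs(y - b0 - b1 * x))            # sum of absolute residuals
    if (sar < best[1]) best <- c(sar, b0, b1)
  }
  c(intercept = best[2], slope = best[3])
}

x <- 1:10
y <- 2 + 3 * x
y[10] <- 100                                    # one outlier in the y-direction
lar_line(x, y)
coef(lm(y ~ x))                                 # OLS, pulled towards the outlier
```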
• ITERATIVELY REWEIGHTED LEAST SQUARES (IRLS) ROBUST REGRESSION: This method uses a weighted least squares procedure,
$$\min_{\beta_0,\dots,\beta_{p-1}} \sum_{i=1}^{n} w_i \left( Y_i - \beta_0 - \beta_1 X_{i,1} - \cdots - \beta_{p-1} X_{i,p-1} \right)^2,$$
with weights based on how far outlying a case is, as measured by the residual for that case. The weights are revised with each iteration until a robust fit has been obtained.
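A minimal IRLS sketch for a straight line, assuming Huber weights wᵢ = min(1, k/|uᵢ|) with k = 1.345, standardized residuals uᵢ = rᵢ/s, and the scale s re-estimated at each step as median|rᵢ|/0.6745. These choices loosely mirror the defaults of `MASS::rlm`, but they are assumptions rather than the text's prescription; `irls_huber` is a hypothetical name.

```r
# Hypothetical helper: IRLS with Huber weights for a straight-line fit.
irls_huber <- function(x, y, k = 1.345, maxit = 50, tol = 1e-8) {
  fit <- lm(y ~ x)                              # start from ordinary least squares
  b <- coef(fit)
  for (it in seq_len(maxit)) {
    r <- residuals(fit)                         # unweighted residuals y - fitted
    s <- median(abs(r)) / 0.6745                # MAD-type scale of the residuals
    u <- abs(r) / max(s, .Machine$double.eps)   # standardized |residuals|
    w <- ifelse(u <= k, 1, k / u)               # Huber weights psi(u)/u
    fit <- lm(y ~ x, weights = w)               # weighted least squares step
    if (max(abs(coef(fit) - b)) < tol) break    # stop when coefficients settle
    b <- coef(fit)
  }
  list(coef = coef(fit), weights = w)           # w = weights used in the last WLS step
}

x <- 1:10
e <- c(0.3, -0.2, 0.1, -0.4, 0.2, 0, -0.1, 0.3, -0.3, 0)   # small made-up noise
y <- 2 + 3 * x + e
y[10] <- 100                                    # one gross outlier
res <- irls_huber(x, y)
rbind(OLS = coef(lm(y ~ x)), IRLS = res$coef)
round(res$weights, 3)
```

The outlying case ends with a weight close to zero, and the slope returns close to 3, whereas OLS roughly doubles it.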
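• LEAST MEDIAN OF SQUARES (LMS) REGRESSION: Replaces the sum of squared deviations of ordinary least squares with the median of the squared deviations, which is a robust estimator of location:
$$\min_{\beta_0,\dots,\beta_{p-1}} \operatorname*{median}_{i}\left( Y_i - \beta_0 - \beta_1 X_{i,1} - \cdots - \beta_{p-1} X_{i,p-1} \right)^2$$
• Other robust regression methods: Some involve trimming extreme squared deviations before applying LSE; others are based on ranks. Many robust regression procedures require intensive computing.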
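Exact LMS is computationally expensive, so practical algorithms search over small "elemental" subsets of the data. The sketch below, for a straight line, tries every line through two observations and keeps the one with the smallest median squared residual. It is an approximation in the spirit of the subset-search algorithms described by Rousseeuw and Leroy (1987), not an exact LMS solver, and `lms_line` is a hypothetical name.

```r
# Hypothetical helper: approximate LMS line over all lines through two points.
lms_line <- function(x, y) {
  best <- c(Inf, NA, NA)                        # (criterion, intercept, slope)
  n <- length(x)
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    if (x[i] == x[j]) next                      # vertical line: skip
    b1 <- (y[j] - y[i]) / (x[j] - x[i])
    b0 <- y[i] - b1 * x[i]
    crit <- median((y - b0 - b1 * x)^2)         # median of squared residuals
    if (crit < best[1]) best <- c(crit, b0, b1)
  }
  c(intercept = best[2], slope = best[3])
}

x <- 1:12
y <- 2 + 3 * x
y[10:12] <- c(80, 90, 100)                      # three gross outliers at the high end
lms_line(x, y)
coef(lm(y ~ x))
```

With 3 of 12 points contaminated, the pair search recovers the line through the clean points, while OLS is pulled strongly upward.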
EXAMPLE
• This data set gives n = 24 observations of the annual number of telephone calls made in Belgium (calls, in millions of calls), indexed by the last two digits of the year (year); see Rousseeuw and Leroy (1987) and Venables and Ripley (2002). As can be seen in the Figure, there are several outliers in the y-direction in the late 1960s.
• Let us start the analysis with the classical OLS fit.
```r
library(MASS)                    # provides the phones data and rlm()
data(phones)
attach(phones)
plot(year, calls)
fit.ols <- lm(calls ~ year)
summary(fit.ols, correlation = FALSE)
# ..
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) -260.059    102.607  -2.535   0.0189 *
# year           5.041      1.658   3.041   0.0060 **
# Residual standard error: 56.22 on 22 degrees of freedom
# Multiple R-Squared: 0.2959, Adjusted R-squared: 0.2639
# F-statistic: 9.247 on 1 and 22 DF, p-value: 0.005998
abline(fit.ols$coef)

# Diagnostics: residuals vs fitted, normal QQ-plot, Cook's distance,
# and Cook's distance against the leverage ratio h/(1-h)
par(mfrow = c(1, 4))
plot(fit.ols, 1:2)
plot(fit.ols, 4)
h.phone <- hat(model.matrix(fit.ols))    # leverages (diagonal of the hat matrix)
cook.d <- cooks.distance(fit.ols)
plot(h.phone / (1 - h.phone), cook.d, xlab = "h/(1-h)", ylab = "Cook distance")
```
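• To account for the observations with large residuals, i.e. the outliers in the late 1960s, consider a robust regression based on Huber-type estimates, with MAD scale (the default) and with Huber's proposal 2 scale:
```r
fit.hub  <- rlm(calls ~ year, maxit = 50)                 # Huber psi, MAD scale (default)
fit.hub2 <- rlm(calls ~ year, scale.est = "proposal 2")   # Huber psi, proposal 2 scale
summary(fit.hub, correlation = FALSE)
# ..
# Coefficients:
#                Value Std. Error t value
# (Intercept) -102.6222   26.6082  -3.8568
# year           2.0414    0.4299   4.7480
# Residual standard error: 9.032 on 22 degrees of freedom
summary(fit.hub2, correlation = FALSE)
# ..
# Coefficients:
#                Value Std. Error t value
# (Intercept) -227.9250  101.8740  -2.2373
# year           4.4530    1.6461   2.7052
# Residual standard error: 57.25 on 22 degrees of freedom

# Redraw the scatter plot (the panel layout was changed above) and add all three fits
par(mfrow = c(1, 1))
plot(year, calls)
abline(fit.ols$coef)               # OLS, solid
abline(fit.hub$coef, lty = 2)      # Huber with MAD scale, dashed
abline(fit.hub2$coef, lty = 3)     # Huber with proposal 2 scale, dotted
```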
• From these results, and from the previous plot, we note some differences from the OLS estimates, particularly for the Huber-type estimator with MAD scale. Consider again some classic diagnostic plots for the robust fit: the observed values versus the fitted values, the residuals versus the fitted values, the normal QQ-plot of the residuals, and the final weights of the robust estimator. Note that some observations with low Huber-type weights were not identified by the classical Cook's distance.
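The four diagnostic plots for the robust fit can be produced along these lines. The sketch assumes the MASS package (which provides `rlm()` and the `phones` data); `fit.hub$w` holds the weights from the final IWLS step, and the 0.5 cut-off used to list down-weighted years is an arbitrary illustration.

```r
library(MASS)                      # provides rlm() and the phones data
data(phones)
fit.hub <- rlm(calls ~ year, data = phones, maxit = 50)

op <- par(mfrow = c(1, 4))
plot(fitted(fit.hub), phones$calls, xlab = "Fitted values", ylab = "Observed calls")
abline(0, 1, lty = 2)                                   # perfect-fit reference line
plot(fitted(fit.hub), residuals(fit.hub), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(residuals(fit.hub))
qqline(residuals(fit.hub))
plot(phones$year, fit.hub$w, xlab = "year", ylab = "Huber weight", ylim = c(0, 1))
par(op)

# Years whose observations were down-weighted below 0.5
phones$year[fit.hub$w < 0.5]
```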