Half-Day 2: Robust Regression Estimation
Andreas Ruckstuhl, Institut für Datenanalyse und Prozessdesign, Zürcher Hochschule für Angewandte Wissenschaften
WBL Statistik 2018 — Robust Fitting
School of Engineering, IDP Institute of Data Analysis and Process Design, Zurich University of Applied Sciences
Outline:
Half-Day 1: Regression Model and the Outlier Problem; Measuring Robustness; Location M-Estimation; Inference; Regression M-Estimation; Example from Molecular Spectroscopy
Half-Day 2: General Regression M-Estimation; Regression MM-Estimation; Example from Finance; Robust Inference; Robust Estimation with GLM
Half-Day 3: Robust Estimation of the Covariance Matrix; Principal Component Analysis; Linear Discriminant Analysis; Baseline Removal: an application of robust fitting beyond theory
2.3 General Regression M-Estimation
Regression M-estimators fail in the presence of leverage points. To see this, compare the following examples (modified Air Quality data):
[Figure: two scatter plots of log(Daily Mean) vs. log(Annual Mean); the left panel compares the LS and robust fits, the right panel compares the LS and robust (M) fits on the modified data.]
or check the influence function:

\[ IF\langle x, y;\, \hat\theta_M, N\rangle = \psi\!\left\langle \frac{r}{\sigma} \right\rangle M\, x , \]

where the factor M x is unbounded in x.

⇒ Bound the total influence function!
A First Workaround: GM-Estimation
A simple modification of Huber's ψ-function can remedy this:
Either (Mallows)

\[ \sum_{i=1}^{n} \psi_c\!\left\langle \frac{r_i\langle\theta\rangle}{\sigma} \right\rangle x_i^{(k)}\, w\langle d\langle x_i\rangle\rangle = 0, \qquad k = 1, \dots, p, \]

or (Schweppe)

\[ \sum_{i=1}^{n} \psi_c\!\left\langle \frac{r_i\langle\theta\rangle}{\sigma \cdot w\langle d\langle x_i\rangle\rangle} \right\rangle x_i^{(k)}\, w\langle d\langle x_i\rangle\rangle \;=\; \sum_{i=1}^{n} \psi_{c \cdot w\langle d\langle x_i\rangle\rangle}\!\left\langle \frac{r_i\langle\theta\rangle}{\sigma} \right\rangle x_i^{(k)} = 0 , \]
where w⟨·⟩ is a suitable weight function and d⟨x_i⟩ is some measure of the "outlyingness" of x_i.

Examples for w⟨·⟩ and d⟨x_i⟩:
- d⟨x_i⟩ = (x_i − median⟨x_k⟩)/MAD⟨x_k⟩ or the Mahalanobis distance, with w⟨d⟨x_i⟩⟩ = Huber's weight function (cf. notes 2.3.b)
- w⟨x_i⟩ = 1 − H_ii or w⟨x_i⟩ = √(1 − H_ii), where H_ii is the leverage
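These ingredients can be sketched numerically. The following is a minimal Python illustration (the course's own code is R; the names `mad`, `outlyingness`, and `huber_weight` are ours, not from the slides), computing the MAD-based outlyingness d⟨x_i⟩ and Huber's weight for a one-dimensional predictor:

```python
def _median(v):
    s = sorted(v)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

def mad(v):
    # median absolute deviation, scaled to be consistent at the Gaussian
    m = _median(v)
    return 1.4826 * _median([abs(xi - m) for xi in v])

def outlyingness(v):
    # d<x_i> = (x_i - median<x_k>) / MAD<x_k>, as on the slide
    m, s = _median(v), mad(v)
    return [(xi - m) / s for xi in v]

def huber_weight(d, c=1.345):
    # Huber's weight function: full weight for |d| <= c, down-weighted beyond
    return 1.0 if abs(d) <= c else c / abs(d)

x = [1.0, 1.2, 0.9, 1.1, 1.0, 8.0]   # 8.0 plays the role of a leverage point
w = [huber_weight(d) for d in outlyingness(x)]
```

The bulk of the data keeps full weight 1, while the leverage point receives a weight close to 0, which is exactly the down-weighting the Mallows and Schweppe schemes exploit.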
Example Air Quality (modified)

x.h <- 1 - hat(model.matrix(y ~ x, AQ))
AQ.GMfit <- rlm(y ~ x, data=AQ, weights=x.h, wt.method="case")
[Figure: scatter plot of log(Daily Mean) vs. log(Annual Mean) with LS, robust (M), and robust (GM) fit lines.]
Example Air Quality (Initial Data)
[Figure: scatter plot of log(Daily Mean) vs. log(Annual Mean) with LS, robust (Mallows), and robust (GM tuned) fit lines.]
Example Air Quality: The Map
By using robust estimation methods, we
- are able to run a regression analysis every hour automatically, and
- obtain reliable estimates each time.

Hence robust methods provide a sound basis for the false-colour map! The outliers are identified and analysed separately; the result of this analysis is part of the false-colour map.
Breakdown Point of GM-Estimators
However, the maximum breakdown point of general regression M-estimators cannot exceed 1/p (p is the number of variables)!

Hence it is not possible to detect clusters of leverage points with the projection matrix H in residual analysis. The following example gives some insight into why this happens: look at the "Residuals vs Leverage" plots, where one, two, and three outliers are placed at the outlying leverage point:
[Figure: three "Residuals vs Leverage" plots (standardized residuals vs. leverage, with Cook's distance contours at 0.5 and 1) with one, two, and three outliers placed at the outlying leverage point.]
2.4 Robust Regression MM-Estimation
Regression M-Estimator with Redescending ψ

Computational experiments show: regression M-estimators are robust if distant outliers are rejected completely! Theoretically and computationally, it is more convenient to ignore the influence of distant outliers gradually, for example with a so-called redescending ψ-function like Tukey's biweight function (ψ-function (left) and corresponding weight function (right)):
[Figure: Tukey's biweight ψ-function (left) and the corresponding weight function (right).]
But the equation defining the M-estimator has many solutions, and only one may identify the outliers correctly. The solution depends on the starting value, so good initial values are required!
Robust Estimator With High Breakdown Point
Regression estimators with a high breakdown point are, e.g., the S-estimators. Instead of

\[ \sum_{i=1}^{n} \left( \frac{r_i\langle\theta\rangle}{\sigma} \right)^2 \;\overset{!}{=}\; \min_{\theta} \]

solve

\[ \frac{1}{n-p} \sum_{i=1}^{n} \rho\!\left\langle \frac{y_i - x_i^T \theta}{s} \right\rangle = 0.5 , \]

where s must be as small as possible (i.e., the equation should have a solution in θ).
A high breakdown point implies that ρ⟨·⟩ must be symmetric and bounded. The function ρ⟨·⟩ can be

\[ \rho_{b_o}\langle u\rangle := \begin{cases} 1 - \left(1 - \left(\dfrac{u}{b_o}\right)^2\right)^{3} & \text{if } |u| < b_o \\[4pt] 1 & \text{otherwise} \end{cases} \]

(its derivative is Tukey's bisquare function). To get a breakdown point of 0.5, the tuning constant b_o must be 1.548.
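This ρ-function is short enough to check numerically. A minimal Python sketch (not course code; the name `rho_bisquare` is ours) verifying that it is symmetric, vanishes at 0, and is bounded by 1 outside [−b_o, b_o]:

```python
def rho_bisquare(u, b0=1.548):
    # rho<u> = 1 - (1 - (u/b0)^2)^3 for |u| < b0, and 1 otherwise:
    # symmetric and bounded, as a high breakdown point requires
    if abs(u) >= b0:
        return 1.0
    return 1.0 - (1.0 - (u / b0) ** 2) ** 3

vals = [rho_bisquare(u) for u in (-3.0, -1.0, 0.0, 1.0, 3.0)]
```

Note the boundedness: unlike the least-squares ρ⟨u⟩ = u², a gross outlier contributes at most 1 to the objective.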
[Figure: ρ-function of the least squares estimator (left) and of Tukey's bisquare function (right).]
Pros:
- high breakdown point of (about) 0.5
- computable (at least approximately, for small data sets, i.e., a few thousand observations and about 20–30 predictor variables)

Cons:
- efficiency of just 28.7% at the Gaussian distribution!
- challenging computation, basically done by a random resampling algorithm. Such an approach may yield different solutions when the algorithm is run repeatedly on the same data but with different seeds.
Regression MM-Estimator
We have
- a redescending M-estimator, which is highly efficient but requires suitable starting values;
- an S-estimator, which is highly resistant but very inefficient.

Combining the strengths of both estimators yields the regression MM-estimator (modified M-estimator):
1. An S-estimator with breakdown point ε* = 1/2 (Tukey's bisquare ρ-function with b_o = 1.548) is used as the initial estimator, yielding θ^(o) and s_o.
2. The redescending regression M-estimator is then applied, using Tukey's bisquare ψ-function with b_1 = 4.687 and fixed scale parameter σ = s_o from the initial estimation; the starting value is θ^(o).

The regression MM-estimator has a breakdown point of ε* = 1/2, and an efficiency and an asymptotic distribution like the regression M-estimator.
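The two-stage idea can be sketched on a toy problem. The following Python sketch is NOT lmrob's algorithm: a resistant pairwise-median slope (our stand-in for the S-step) supplies the starting value and a fixed MAD scale, then a redescending M-step with Tukey bisquare weights is iterated; the function name `mm_fit_1d` and all implementation choices are ours.

```python
def mm_fit_1d(x, y, b1=4.687, iters=50):
    # Stage 1 stand-in: resistant initial fit for y = a + b*x
    n = len(x)
    slopes = sorted((y[j] - y[i]) / (x[j] - x[i])
                    for i in range(n) for j in range(i + 1, n) if x[j] != x[i])
    b = slopes[len(slopes) // 2]                       # median of pairwise slopes
    a = sorted(y[i] - b * x[i] for i in range(n))[n // 2]
    absr = sorted(abs(y[i] - a - b * x[i]) for i in range(n))
    s = 1.4826 * absr[n // 2]                          # fixed robust scale (MAD)
    if s == 0.0:
        s = 1.0
    def w(u):                                          # bisquare weight psi(u)/u
        return (1.0 - (u / b1) ** 2) ** 2 if abs(u) < b1 else 0.0
    for _ in range(iters):                             # Stage 2: IRLS, scale held fixed
        ws = [w((y[i] - a - b * x[i]) / s) for i in range(n)]
        sw = sum(ws)
        sx = sum(ws[i] * x[i] for i in range(n))
        sy = sum(ws[i] * y[i] for i in range(n))
        sxx = sum(ws[i] * x[i] ** 2 for i in range(n))
        sxy = sum(ws[i] * x[i] * y[i] for i in range(n))
        b = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
        a = (sy - b * sx) / sw
    return a, b

x = list(range(10))
y = [1.0 + 2.0 * xi for xi in x]
y[8], y[9] = -50.0, 100.0            # two gross outliers
a, b = mm_fit_1d(x, y)               # intercept and slope despite the outliers
```

The redescending weights give the two gross outliers weight 0, so the M-step effectively refits the clean observations; this is the behaviour the MM construction is designed to deliver.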
Example from Finance
Return-Based Style Analysis of Funds of Hedge Funds (joint work with P. Meier and his group)

A fund of hedge funds (FoHF) is a fund that invests in a portfolio of different hedge funds to diversify the risks associated with a single hedge fund. A hedge fund is an investment instrument that undertakes a wider range of investment and trading activities than traditional long-only investment funds.

One of the difficulties in risk monitoring of FoHF is their limited transparency:
- Many FoHF disclose only partial information on their underlying portfolio.
- The underlying investment strategy (the style of the FoHF), which is the crucial characterisation of a FoHF, is self-declared.

A return-based style analysis searches for the combination of indices of hedge-fund sub-styles that would most closely replicate the actual performance of the FoHF over a specified time period.
Such a style analysis is basically done by fitting a (in finance) so-called multifactor model:

\[ R_t = \alpha + \sum_{k=1}^{p} \beta_k\, I_{k,t} + E_t , \]

where
- R_t = return on the FoHF at time t
- α = the excess return (a constant) of the FoHF
- I_{k,t} = the index return of sub-style k (= factor) at time t
- β_k = the change in the return on the FoHF per unit change in factor k
- p = the number of sub-indices used
- E_t = residual (error) which cannot be explained by the factors
Residuals vs. Time

[Figure: residuals vs. time (1998–2006) with ±2.5 sd bands; LS fit using lm(...) (left) and robust MM-fit using lmrob(...) (right).]
The robust MM-fit clearly identifies two different investment periods: one before April 2000 and one afterwards.
Residual Analysis with MM-Fit: plot(FoHF.rlm)

[Figure: diagnostic plots of the MM-fit — standardized residuals vs. robust distances, normal Q–Q plot of the residuals, response vs. fitted values, residuals vs. fitted values, and square root of absolute residuals vs. fitted values.]
2.5 Robust Inference and Variable Selection

Outliers may also crucially influence the result of a classical test. It might happen that the null hypothesis is rejected because an interfering alternative H_I (outliers) is present; that is, the rejection of the null hypothesis is justified, but accepting the actual alternative H_A is not.

To understand such situations better, one can explore the effect of contamination on the level and power of hypothesis tests. Heritier and Ronchetti (1994) showed that the effects of contamination on both the level and the power of a test are inherited from the underlying estimator (= test statistic). That means that the test is robust if its test statistic is based on a robust estimator.
Asymptotic Distribution of the MM-Estimator
The regression MM-estimator is asymptotically Gaussian distributed with
- mean (expectation) θ, and
- covariance matrix σ²τ C⁻¹, where C = (1/n) ∑_i x_i x_i^T.

The covariance matrix of θ̂ is estimated by

\[ \hat V = \frac{s_o^2}{n}\, \hat\tau\, \hat C^{-1}, \quad \text{where} \quad \hat C = \frac{\frac{1}{n}\sum_i w_i\, x_i x_i^T}{\frac{1}{n}\sum_i w_i}, \qquad \hat\tau = \frac{\frac{1}{n}\sum_{i=1}^{n} \left(\psi_{b_1}\langle r_i\rangle\right)^2}{\left(\frac{1}{n}\sum_{i=1}^{n} \psi'\langle r_i\rangle\right)^2}, \]

with r_i := (y_i − x_i^T θ^(o))/s_o and w_i := ψ_{b_1}⟨r_i⟩/r_i. Note that θ^(o) and s_o come from the initial estimation.
A Further Modification to the MM-Estimator

- The estimated covariance matrix V̂ depends on quantities (θ^(o) and s_o) from the initial estimator.
- The initial S-estimator, however, is known to be very inefficient.

Koller and Stahel (2011, 2014) investigated the effects of this construction on the efficiency of the estimated confidence intervals, and, as a consequence, came up with an additional modification: extend the current regression MM-estimator by two additional steps:
- replace s_o by a more efficient scale estimator;
- then apply another M-estimator, but with a more slowly redescending ψ-function.

They called this estimation procedure the regression SMDM-estimator; it is implemented in lmrob(..., setting="KS2014").
Example from Finance with another target FoHF
Return-Based Style Analysis of Funds of Hedge Funds — RBSA2

lm(FoHF ~ ., data=FoHF2)  vs.  lmrob(FoHF ~ ., data=FoHF2, setting="KS2011")

              ------- lm -------        ------ lmrob ------
       Estimate     se  Pr(>|t|)   Estimate     se  Pr(>|t|)
(I)     -0.0019 0.0017    0.2610    -0.0030 0.0014    0.0381
RV       0.0062 0.3306    0.9850     0.3194 0.2803    0.2575
CA      -0.0926 0.1658    0.5780    -0.0671 0.1383    0.6288
FIA      0.0757 0.1472    0.6083    -0.0204 0.1279    0.8733
EMN      0.1970 0.1558    0.2094     0.2721 0.1328    0.0434
ED      -0.3010 0.1614    0.0655    -0.4763 0.1389    0.0009
EDD      0.0687 0.1301    0.5986     0.1019 0.1112    0.3621
EDRA     0.0735 0.1882    0.6971     0.0903 0.1583    0.5698
LSE      0.4407 0.1521    0.0047     0.5813 0.1295    0.0000
GM       0.1723 0.0822    0.0390    -0.0159 0.0747    0.8322
EM       0.1527 0.0667    0.0245     0.1968 0.0562    0.0007
SS       0.0282 0.0414    0.4973     0.0749 0.0356    0.0382

Residual standard error: 0.009315 (lm) and 0.007723 (lmrob)

The 95% confidence interval of β_SS is
0.028 ± 1.98 · 0.041 = [−0.053, 0.109] for lm, where 1.98 is the 0.975 quantile of t_108, and
0.075 ± 1.96 · 0.036 = [0.004, 0.146] for lmrob, where 1.96 is the 0.975 quantile of N⟨0, 1⟩.
Example from Finance: RBSA2 (cont.)
A fund of hedge funds may be classified by the style of its target funds into focussed directional, focussed non-directional, or diversified. If the FoHF under consideration is a focussed non-directional FoHF, then it should be invested in LSE, GM, EM, SS, and hence the other parameters should be zero.

Goal: We want to test the hypothesis that q < p of the p elements of the parameter vector θ are zero.

First, let's introduce some notation to express the results more easily. There is no real loss of generality if we suppose that the model is parameterized so that the null hypothesis can be expressed as H_0 : θ_1 = 0, where θ = (θ_1^T, θ_2^T)^T. Further, let V_11 be the quadratic submatrix containing the first q rows and columns of V.
The so-called Wald-type test statistic can now be expressed as

\[ W = \hat\theta_1^T \left(\hat V_{11}\right)^{-1} \hat\theta_1 . \]

It can be shown that this test statistic is asymptotically χ²-distributed with q degrees of freedom.

This test statistic also provides the basis for confidence intervals of a single parameter θ_k:

\[ \hat\theta_k \pm q^{N}_{1-\alpha/2} \cdot \sqrt{\hat V_{kk}} , \]

where q^N_{1−α/2} is the (1 − α/2) quantile of the standard Gaussian distribution.

Comments: Since we have estimated the covariance matrix Cov⟨θ̂⟩ by V̂, it may seem obvious from classical regression inference that replacing χ²_q by F_{q, n−p} is a reasonable adjustment of the distribution of the test statistic to account for estimating the variance σ². However, doing so has no formal justification, and it is better simply to avoid very small sample sizes (then (1/q) · χ²_q ≈ F_{q, n−p}), because all results are asymptotic in nature anyway.
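The confidence-interval formula is simple arithmetic; the following hedged Python sketch (the helper name `wald_ci` is ours) reproduces the robust β_SS interval from the RBSA2 slide:

```python
def wald_ci(theta_k, se_k, z=1.96):
    # theta_k +/- z * sqrt(V_kk); se_k stands for sqrt(V_kk),
    # and z = 1.96 is the 0.975 quantile of the standard Gaussian
    return theta_k - z * se_k, theta_k + z * se_k

# robust beta_SS estimate and standard error from the RBSA2 table
lo, hi = wald_ci(0.075, 0.036)
```

Rounded to three decimals, this gives the interval [0.004, 0.146] reported on the slide; since it excludes 0, the robust fit flags SS as significant while the LS interval [−0.053, 0.109] does not.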
For an MM-estimator, we may define a robust deviance by

\[ D\langle y, \hat\theta_{MM}\rangle = 2 \cdot s_o^2 \cdot \sum_{i=1}^{n} \rho\!\left\langle \frac{y_i - x_i^T \hat\theta_{MM}}{s_o} \right\rangle . \]

The robust generalisation of the F-test

\[ \frac{\left(SS_{reduced} - SS_{full}\right)/q}{SS_{full}/(n-p)} \;=\; \frac{\left(SS_{reduced} - SS_{full}\right)/q}{\hat\sigma^2} \]

is to replace the sums of squares by the robust deviance (similar to generalised linear models), so that we obtain the test statistic

\[ \Delta^* = \tau^* \cdot \frac{D\langle y, \hat\theta^{\,r}_{MM}\rangle - D\langle y, \hat\theta^{\,f}_{MM}\rangle}{s_o^2} \quad \text{with} \quad \tau^* = \left( \frac{1}{n}\sum_{i=1}^{n} \psi'_{b_1}\langle r_i\rangle \right) \Big/ \left( \frac{1}{n}\sum_{i=1}^{n} \left(\psi_{b_1}\langle r_i\rangle\right)^2 \right). \]

Then Δ* is asymptotically χ²_q-distributed under the null hypothesis.
Example from Finance — RBSA2

# Least squares estimator:
> FoHF2.lm1 <- lm(FoHF ~ ., data=FoHF2)
> FoHF2.lm2 <- lm(FoHF ~ LSE + GM + EM + SS, data=FoHF2)
> anova(FoHF2.lm2, FoHF2.lm1)
Analysis of Variance Table
Model 1: FoHF ~ LSE + GM + EM + SS
Model 2: FoHF ~ RV + CA + FIA + EMN + ED + EDD + EDRA + LSE + GM + EM + SS
  Res.Df       RSS Df  Sum of Sq      F Pr(>F)
1     96 0.0085024
2     89 0.0077231  7 0.00077937 1.2831 0.2679

# Robust with SMDM-estimator
> FoHF.rlm <- lmrob(FoHF ~ ., data=FoHF, setting="KS2011")
> anova(FoHF.rlm, FoHF ~ LSE + GM + EM + SS, test="Wald")
Robust Wald Test Table
Model 1: FoHF ~ RV + CA + FIA + EMN + ED + EDD + EDRA + LSE + GM + EM + SS
Model 2: FoHF ~ LSE + GM + EM + SS
Largest model fitted by lmrob(), i.e. SMDM
  pseudoDf Test.Stat Df Pr(>chisq)
1       89
2       96    24.956  7  0.0007727 ***
Example from Finance — RBSA2 (cont.)

# Robust with SMDM-estimator and Wald test
> FoHF.rlm <- lmrob(FoHF ~ ., data=FoHF, setting="KS2011")
> anova(FoHF.rlm, FoHF ~ LSE + GM + EM + SS, test="Wald")
Robust Wald Test Table
Model 1: FoHF ~ RV + CA + FIA + EMN + ED + EDD + EDRA + LSE + GM + EM + SS
Model 2: FoHF ~ LSE + GM + EM + SS
Largest model fitted by lmrob(), i.e. SMDM
  pseudoDf Test.Stat Df Pr(>chisq)
1       89
2       96    24.956  7  0.0007727 ***

# Robust with SMDM-estimator and Deviance test
> anova(FoHF.rlm, FoHF ~ LSE + GM + EM + SS, test="Deviance")
Robust Deviance Table
Model 1: FoHF ~ RV + CA + FIA + EMN + ED + EDD + EDRA + LSE + GM + EM + SS
Model 2: FoHF ~ LSE + GM + EM + SS
Largest model fitted by lmrob(), i.e. SMDM
  pseudoDf Test.Stat Df Pr(>chisq)
1       89
2       96    25.089  7  0.0007318 ***
Example from Finance — RBSA2 (i.e., Slide 21 again)

[The coefficient table comparing lm(FoHF ~ ., data=FoHF) with lmrob(FoHF ~ ., data=FoHF, setting="KS2011") is repeated here; see the earlier RBSA2 slide. Residual standard errors: 0.009315 (lm) and 0.007723 (lmrob).]
Example from Finance — RBSA2: Residuals vs. Time

[Figure: residuals vs. time (2000–2008) with ±2.5 sd bands; LS fit (left) and robust MM-fit (right).]
The robust MM-fit clearly identifies two outliers.
Example from Finance — RBSA2: Residual Analysis with LS-Fit

[Figure: standard lm diagnostic plots — Residuals vs Fitted, Normal Q–Q, Scale–Location, and Residuals vs Leverage (with Cook's distance contours at 0.5 and 1); the observations 2000-03, 2000-04, 2000-07, and 2001-01 are flagged.]
Example from Finance — RBSA2: Residual Analysis with Robust Fit

[Figure: robust diagnostic plots — standardized residuals vs. robust distances, normal Q–Q plot of the residuals, response vs. fitted values, residuals vs. fitted values, and square root of absolute residuals vs. fitted values.]
3 Generalised Linear Models
3.1 Unified Model Formulation
Generalized linear models were formulated by John Nelder and RobertWedderburn as a way of unifying various statistical regression models,including linear regression, logistic regression, Poisson regression and Gammaregression.
The generalization is based on a reformulation of the linear regression model. Instead of

\[ Y_i = \theta_0 + \theta_1 x_i^{(1)} + \dots + \theta_p x_i^{(p)} + E_i, \quad i = 1, \dots, n, \quad \text{with } E_i \text{ ind.} \sim \mathcal{N}\langle 0, \sigma^2\rangle , \]

use

\[ Y_i \text{ ind.} \sim \mathcal{N}\langle \mu_i, \sigma^2\rangle \quad \text{with} \quad \mu_i = \theta_0 + \theta_1 x_i^{(1)} + \dots + \theta_p x_i^{(p)} . \]

The expectation μ_i may be linked to the linear predictor η_i = θ_0 + θ_1 x_i^(1) + … + θ_p x_i^(p) by another function than the identity function. In general, we assume

\[ g\langle \mu_i \rangle = \eta_i . \]
Discrete Generalised Linear Models
The two discrete generalised linear models are the binary/binomial regression model and the Poisson regression model. Let Y_i, i = 1, …, n, be the response and η_i = x_i^T θ = ∑_{j=1}^p x_i^(j) · θ_j its linear predictor. Then:

Binary / Binomial Regression: Y_i indep. ∼ B⟨π_i, m_i⟩ with
E⟨Y_i/m_i | x_i⟩ = π_i,  Var⟨Y_i/m_i | x_i⟩ = π_i · (1 − π_i)/m_i.

Poisson Regression: Y_i indep. ∼ P⟨λ_i⟩ with
E⟨Y_i | x_i⟩ = λ_i,  Var⟨Y_i | x_i⟩ = λ_i.

The mean responses π_i and λ_i, respectively, are related to the linear predictor η_i by the link function g⟨·⟩: g⟨π_i⟩ = η_i. Popular choices for the link include
- g⟨π_i⟩ = log⟨π_i/(1 − π_i)⟩ — logit model
- g⟨π_i⟩ = Φ⁻¹⟨π_i⟩ — probit model
- g⟨π_i⟩ = log⟨−log⟨1 − π_i⟩⟩ — complementary log-log model
- g⟨λ_i⟩ = log⟨λ_i⟩ — log-linear model
- g⟨λ_i⟩ = λ_i — identity
- g⟨λ_i⟩ = √λ_i — square root
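As a quick numerical illustration of a link function (a Python sketch, not course code; the names `logit` and `inv_logit` are ours), the logit link maps a probability to the real line and its inverse recovers the probability from the linear predictor:

```python
import math

def logit(p):
    # g<pi> = log(pi / (1 - pi)), mapping (0, 1) onto the real line
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    # inverse link: recovers the probability from the linear predictor eta
    return 1.0 / (1.0 + math.exp(-eta))

p = inv_logit(logit(0.8))   # round trip: back to 0.8
```

The same pattern holds for every link listed above: g sends the mean onto the scale of the linear predictor, and its inverse maps the fit back to the mean scale.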
Gamma Regression
Let Y_i, i = 1, …, n, be the response and η_i = x_i^T θ = ∑_{j=1}^p x_i^(j) · θ_j its linear predictor. Then Y_i indep. ∼ Gamma⟨α_i, β_i⟩ with
E⟨Y_i | x_i⟩ = α_i/β_i,  Var⟨Y_i | x_i⟩ = α_i/β_i².
Common link functions are
- g⟨µ_i⟩ = 1/µ_i — inverse
- g⟨µ_i⟩ = log⟨µ_i⟩
- g⟨µ_i⟩ = µ_i — identity.

In GLM it is assumed that
- the response Y_i is independently distributed according to a distribution from the exponential family with expectation E⟨Y_i⟩ = µ_i;
- the expectation µ_i is linked by a function g to the linear predictor x_i^T β: g⟨µ_i⟩ = x_i^T β;
- the variance of the response depends on E⟨Y_i⟩: Var⟨Y_i⟩ = φ V⟨µ_i⟩. The so-called variance function V⟨µ_i⟩ is determined by the distribution; φ is the dispersion parameter.
Estimating Equation

The estimating equations of GLM can be written in a unified form,

\[ 0 = \sum_{i=1}^{n} \frac{y_i - \mu_i}{V\langle\mu_i\rangle}\, \mu_i'\, x_i \;=\; \sum_{i=1}^{n} \frac{y_i - \mu_i}{\sqrt{V\langle\mu_i\rangle}} \cdot \frac{\mu_i'}{\sqrt{V\langle\mu_i\rangle}}\, x_i , \]

where µ_i' = ∂µ⟨η_i⟩/∂η_i is the derivative of the inverse link function. The quantities (y_i − µ_i)/√V⟨µ_i⟩ are called Pearson residuals. If there are no leverage points, their variance is approximately constant:

\[ \operatorname{Var}\!\left\langle \frac{y_i - \mu_i}{\sqrt{V\langle\mu_i\rangle}} \right\rangle \approx \phi \sqrt{1 - H_{ii}} . \]

In R, use glm(Y ~ ..., family=..., data=...) to fit a GLM to data.
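Pearson residuals are easy to compute by hand for a concrete family. A minimal Python sketch for the Poisson case, where the variance function is V⟨µ⟩ = µ (the function name and the toy numbers are ours, not from the slides):

```python
import math

def pearson_residuals_poisson(y, mu):
    # (y_i - mu_i) / sqrt(V<mu_i>) with variance function V<mu> = mu for Poisson
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]

# toy counts y and fitted means mu
r = pearson_residuals_poisson([2, 0, 5, 1], [1.8, 0.5, 4.0, 1.2])
```

For another family, only the variance function changes, e.g. V⟨µ⟩ = µ(1 − µ)/m for binomial proportions.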
In GLM, we face similar problems with the standard estimator as in linear regression when contaminated data are present.

If only two observations are misclassified . . .
[Figure: binary data y vs. x with a fitted logistic curve, illustrating the effect of two misclassified observations.]
Mallows-Type (Robust) Quasi-Likelihood Estimator
Cantoni and Ronchetti (2001) suggested a Mallows-type robustification of the estimating equation of the GLM-estimator:

\[ 0 = \sum_{i=1}^{n} \left( \psi_c\langle r_i\rangle\, \frac{\mu_i'}{\sqrt{V\langle\mu_i\rangle}}\, x_i\, w\langle x_i\rangle \;-\; fc_c\langle\theta\rangle \right), \]

where ψ_c⟨·⟩ is the Huber function and the vector constant fc_c⟨θ⟩ ensures the Fisher consistency of the estimator.

The "weights" w⟨x_i⟩ can be used to down-weight leverage points:
- If w⟨x_i⟩ = 1 for all observations i, then the influence of position is not bounded (cf. regression M-estimator).
- To bound the total influence, one may, e.g., use w⟨x_i⟩ = √(1 − H_ii).
Since such an estimator will not yet have a high breakdown point, it is better to use the inverse of the Mahalanobis distance of x_i, based on a robust covariance estimator with high breakdown point (cf. later).
Inference and Implementation

The advantage of this approach is that standard inference is available, based both on the asymptotic Gaussian distribution of the estimator and on robust quasi-deviances. I do not dare to present the formulas here, because they look frightening. But they are implemented in R, and you can run the inference procedures just as with glm(...) and anova(...).

The theory and the implementation cover Poisson, binomial, Gamma, and Gaussian (i.e., linear regression GM-estimator) responses.

Fitting in R is done by
glmrob(Y ~ ..., family=..., data=..., weights.on.x=c("none", "hat", "robCov", "covMcd"))

Testing in R is done by
anova(Fit1, Fit2, test=c("Wald", "QD", "QDapprox"))
Take Home Message Half-Day 2
- Least-squares estimation is unreliable if contaminated observations are present.
- Better use a regression MM- (or SMDM-) estimator.

In the R package 'robustbase' you find:
- lmrob(...) — regression MM-estimator
- lmrob(..., setting="KS2011") — regression SMDM-estimator
- anova(...) — comparing models using robust procedures
- plot("lmrob object") — graphics for a residual analysis

Remark: lmrob(...) is based on an improved algorithm (since robustbase 0.9-2) and can handle both numeric and factor variables as explanatory variables.

Robust GM-estimators are also available for generalised linear models (GLMs). In the R package 'robustbase' you find:
- glmrob(...) — (Mallows-type) regression GM-estimators
- anova(...) — comparing models using robust procedures