8/2/2019 A Deterministic Algorithm for the MCD
SECTION OF STATISTICS
DEPARTMENT OF MATHEMATICS
KATHOLIEKE UNIVERSITEIT LEUVEN

TECHNICAL REPORT
TR-10-01

A DETERMINISTIC ALGORITHM FOR THE MCD

Hubert, M., Rousseeuw, P.J., Verdonck, T.

http://wis.kuleuven.be/stat/
A deterministic algorithm for the MCD

Mia Hubert
Department of Mathematics, Katholieke Universiteit Leuven
and
Peter J. Rousseeuw
Department of Mathematics, Katholieke Universiteit Leuven
and
Tim Verdonck
Department of Mathematics and Computer Science, Universiteit Antwerpen

January 4, 2010
Abstract
The minimum covariance determinant (MCD) method is a robust estimator of multivariate location and scatter (Rousseeuw, 1984). The MCD is highly resistant to outliers, and it is often applied by itself and as a building block for other robust multivariate methods. Computing the exact MCD is very hard, so in practice one resorts to approximate algorithms. Most often the FASTMCD algorithm of Rousseeuw and Van Driessen (1999) is used. This algorithm starts by drawing many random subsets, followed by so-called concentration steps. The FASTMCD algorithm is affine equivariant but not permutation invariant. In this article we present a deterministic algorithm, denoted as DetMCD, which does not use random subsets and is even faster. It is permutation invariant and very close to affine equivariant. We illustrate DetMCD on real and simulated data sets, with applications involving principal component analysis, multivariate regression, and classification. Supplemental material (Matlab code of the DetMCD algorithm and the data sets) is available online.
Keywords: affine equivariance, covariance, outliers, multivariate, robustness.
1 Introduction
The Minimum Covariance Determinant (MCD) method (Rousseeuw, 1984) is a highly robust
estimator of multivariate location and scatter. Given an n × p data matrix X = (x₁, . . . , xₙ)ᵀ with
xᵢ = (x_{i1}, . . . , x_{ip})ᵀ, its objective is to find the h observations (with n/2 ≤ h ≤ n) whose covariance matrix has the lowest determinant. The MCD estimate of location is then the average of these h points, and the scatter estimate is a multiple of their covariance matrix. Consistency and asymptotic normality of the MCD estimator have been shown by Butler et al. (1993) and Cator and Lopuhaä (2009). The MCD has a bounded influence function (Croux and Haesbroeck, 1999). The breakdown value is the smallest fraction of contamination that can have an arbitrarily large effect on the estimator. The MCD estimator has the highest possible breakdown value (i.e. 50%) when h = ⌊(n + p + 1)/2⌋ (Lopuhaä and Rousseeuw, 1991). In practice we often do not need the maximal breakdown value, and therefore typically h = 0.75n is chosen, yielding a breakdown value of 25%, which is sufficiently robust for most applications.
In addition to being highly resistant to outliers, the MCD is affine equivariant, i.e. the estimates behave properly under affine transformations of the data. To be precise, the estimators μ̂ and Σ̂ are affine equivariant if for any n × p data set X it holds that

    μ̂(XA + 1ₙvᵀ) = μ̂(X)A + vᵀ        (1)
    Σ̂(XA + 1ₙvᵀ) = Aᵀ Σ̂(X) A          (2)

for all nonsingular p × p matrices A and all p × 1 vectors v. The vector 1ₙ denotes (1, 1, . . . , 1)ᵀ with n entries. Affine equivariance makes the analysis independent of the measurement scales of the variables, as well as of translations and rotations of the data.
Although the MCD was already introduced in 1984, its practical use only became feasible with the introduction of the computationally efficient FASTMCD algorithm of Rousseeuw and Van Driessen (1999). Since then the MCD has been applied in various fields such as quality control, medicine, finance, image analysis, and chemistry; see e.g. Hubert et al. (2008) and Hubert and Debruyne (2009) for references. The MCD is also being used as a basis to develop robust and computationally efficient multivariate techniques, such as principal component analysis (Croux and Haesbroeck, 2000; Hubert et al., 2005), factor analysis (Pison et al., 2003), classification (Hubert and Van Driessen, 2004; Vanden Branden and Hubert, 2005), clustering (Hardin and Rocke, 2004), and multivariate regression (Rousseeuw et al., 2004). For a review, see Hubert et al. (2008).
The FASTMCD algorithm starts by drawing random subsets of size p+1. It needs to draw many
in order to obtain at least one that is outlier-free. Starting from each subset several iteration steps
are taken, as will be described in the next section. The overall computation time of FASTMCD
is thus roughly proportional to the number of initial subsets.
If one is willing to give up the affine equivariance requirement, certain robust covariance ma-
trices can be computed much faster. This is the idea behind the BACON algorithm (Billor et al.,
2000; Hadi et al., 2009), the spatial sign and rank covariance matrices (Visuri et al., 2000), and
the OGK estimator (Maronna and Zamar, 2002).
In this article we will present a deterministic algorithm for the MCD, denoted as DetMCD,
which does not use random subsets and runs even faster than FASTMCD. Unlike the latter it is
permutation invariant, i.e. the result does not depend on the order of the observations in the data
set. It starts from only a few well-chosen initial estimates. In Section 2 we give brief descriptions of
FASTMCD and the OGK estimator, since parts of both are used in Section 3 to construct the new
DetMCD algorithm. Section 4 reports on an extensive simulation study, showing that DetMCD is
as robust as FASTMCD. In Section 5 we show that DetMCD is permutation invariant and close
to affine equivariant. Section 6 illustrates the algorithm on several real data sets with applications
involving principal component analysis, multivariate regression, and discriminant analysis.
2 FASTMCD and OGK
In this section we briefly describe the FASTMCD algorithm and the OGK estimator, as our new algorithm DetMCD will use aspects of both. The observations will be denoted as xᵢ (i = 1, . . . , n), whereas the columns of our data matrix are denoted by Xⱼ (j = 1, . . . , p). For a data set X with estimated center μ̂ and scatter matrix Σ̂, the statistical distance of the i-th observation xᵢ will be written as

    D(xᵢ, μ̂, Σ̂) = √((xᵢ − μ̂)ᵀ Σ̂⁻¹ (xᵢ − μ̂)).

2.1 The FASTMCD algorithm
A major component of the FASTMCD algorithm is the concentration step (C-step), which works as follows. Given initial estimates μ̂_old for the center and Σ̂_old for the scatter matrix,

1. Compute the distances d_old(i) = D(xᵢ, μ̂_old, Σ̂_old) for i = 1, . . . , n.

2. Sort these distances, yielding a permutation π for which d_old(π(1)) ≤ d_old(π(2)) ≤ . . . ≤ d_old(π(n)), and set H = {π(1), π(2), . . . , π(h)}.
3. Compute μ̂_new = (1/h) ∑_{i∈H} xᵢ and Σ̂_new = (1/h) ∑_{i∈H} (xᵢ − μ̂_new)(xᵢ − μ̂_new)ᵀ.
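In outline, one C-step takes (μ̂_old, Σ̂_old) to the mean and covariance of the h observations with smallest statistical distance. The following NumPy sketch is our own illustration, not the paper's Matlab code:

```python
import numpy as np

def c_step(X, mu, Sigma, h):
    """One concentration step: keep the h observations with smallest
    statistical distance and recompute their mean and covariance."""
    diff = X - mu
    # squared statistical distances (x_i - mu)^T Sigma^{-1} (x_i - mu)
    d2 = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma), diff)
    H = np.argsort(d2)[:h]
    mu_new = X[H].mean(axis=0)
    Sigma_new = np.cov(X[H], rowvar=False, bias=True)   # 1/h normalization
    return mu_new, Sigma_new, H

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:10] += 10                        # ten planted outliers
mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)
for _ in range(10):                 # iterating can only decrease det(Sigma)
    mu, Sigma, H = c_step(X, mu, Sigma, h=75)
```

Each step lowers (or keeps) the determinant, so the loop stabilizes after a few iterations; here the planted outliers drop out of H immediately.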
In Theorem 1 of Rousseeuw and Van Driessen (1999) it was proved that det(Σ̂_new) ≤ det(Σ̂_old), with equality only if Σ̂_new = Σ̂_old. Therefore, if we apply C-steps iteratively, the sequence of determinants obtained in this way must converge in a finite number of steps (because there are only finitely many h-subsets). Since there is no guarantee that the final value of the iteration process is the global minimum of the MCD objective function, an approximate MCD solution is obtained by taking many initial h-subsets H₁ ⊂ {1, 2, . . . , n}, applying C-steps to each, and keeping the solution with the overall lowest determinant.

To construct an initial subset H₁, a random (p + 1)-subset J is drawn and μ̂₀ = (1/(p + 1)) ∑_{i∈J} xᵢ and Σ̂₀ = (1/(p + 1)) ∑_{i∈J} (xᵢ − μ̂₀)(xᵢ − μ̂₀)ᵀ are computed. (If Σ̂₀ is singular, random points are added to J until it becomes nonsingular.) Next, we apply the C-step to (μ̂₀, Σ̂₀), yielding (μ̂₁, Σ̂₁), etc. Since each C-step involves the calculation of a covariance matrix, its inverse, and the corresponding distances, we don't want to use too many. Therefore, the FASTMCD algorithm applies only two C-steps to each initial subset, and only on the ten subsets with lowest determinant are further C-steps taken until convergence. The raw FASTMCD estimates, μ̂_RAWMCD and Σ̂_RAWMCD, then correspond to the empirical mean and covariance matrix of the h-subset with the lowest determinant.
In order to increase the statistical efficiency while retaining high robustness, reweighted estimators are computed:

    μ̂_FASTMCD = ∑ᵢ wᵢ xᵢ / ∑ᵢ wᵢ

    Σ̂_FASTMCD = c₁ ∑ᵢ wᵢ (xᵢ − μ̂_FASTMCD)(xᵢ − μ̂_FASTMCD)ᵀ / (∑ᵢ wᵢ − 1)

where c₁ is a correction factor to obtain consistency when the data come from a multivariate normal distribution (Pison et al., 2002) and wᵢ is an appropriate weight function, e.g.

    wᵢ = 1 if D(xᵢ, μ̂_RAWMCD, Σ̂_RAWMCD) ≤ √(χ²_{p,0.975}), and wᵢ = 0 otherwise,

with χ²_{p,α} the α-quantile of the χ²_p distribution.
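A sketch of this reweighting step (our own illustration; c₁ is set to 1 for brevity, and SciPy supplies the χ² quantile):

```python
import numpy as np
from scipy.stats import chi2

def reweight(X, mu_raw, Sigma_raw, c1=1.0):
    """Reweighted location and scatter: w_i = 1 iff the statistical
    distance of x_i is at most sqrt(chi2_{p,0.975})."""
    n, p = X.shape
    diff = X - mu_raw
    d2 = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma_raw), diff)
    w = (d2 <= chi2.ppf(0.975, p)).astype(float)
    mu = (w[:, None] * X).sum(axis=0) / w.sum()
    centered = X - mu
    Sigma = c1 * (w[:, None] * centered).T @ centered / (w.sum() - 1)
    return mu, Sigma, w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
mu_rw, Sigma_rw, w = reweight(X, X.mean(axis=0), np.cov(X, rowvar=False))
```

On clean normal data about 97.5% of the weights equal 1, so the reweighted estimates stay close to the classical ones while downweighting flagged outliers.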
Implementations of the FASTMCD algorithm are available in the package S-PLUS (as the
built-in function cov.mcd ), in R (as part of the packages rrcov , robust and robustbase ), in
SAS/IML Version 7, and in SAS Version 9 (in PROC ROBUSTREG). FASTMCD is also
part of LIBRA, a Matlab LIBrary for Robust Analysis (Verboven and Hubert, 2005) as the function
mcdcov . Moreover, it is available in the PLS Toolbox of Eigenvector Research (Wise et al., 2006)
used in chemometrics.
2.2 The OGK estimator

Maronna and Zamar (2002) presented a general method to obtain positive definite and approximately affine equivariant robust scatter matrices starting from any pairwise robust scatter matrix. This method was applied to the robust covariance estimate of Gnanadesikan and Kettenring (1972). The resulting multivariate location and scatter estimates are called orthogonalized Gnanadesikan-Kettenring (OGK) estimates and are calculated as follows:

1. Let m(.) and s(.) be robust univariate estimators of location and scale.

2. Construct yᵢ = D⁻¹ xᵢ for i = 1, . . . , n, with D = diag(s(X₁), . . . , s(Xₚ)).

3. Compute the correlation matrix U of the variables of Y = (Y₁, . . . , Yₚ), given by u_jk = ¼ (s(Yⱼ + Yₖ)² − s(Yⱼ − Yₖ)²).

4. Compute the matrix E of eigenvectors of U and

   (a) project the data on these eigenvectors, i.e. V = Y E;

   (b) compute robust variances of V = (V₁, . . . , Vₚ), i.e. Λ = diag(s²(V₁), . . . , s²(Vₚ));

   (c) set μ̂(Y) = E m, where m = (m(V₁), . . . , m(Vₚ))ᵀ, and compute the positive definite matrix Σ̂(Y) = E Λ Eᵀ.

5. Transform back to X, i.e. μ̂_RAWOGK = D μ̂(Y) and Σ̂_RAWOGK = D Σ̂(Y) Dᵀ.
In the OGK algorithm, m(.) is a weighted mean and s(.) is the τ-scale of Yohai and Zamar (1988). Step 2 makes the estimate scale equivariant, whereas the following steps are a kind of principal components analysis that replaces the eigenvalues of U (which may be negative) by robust variances. As in the FASTMCD algorithm, the estimate is improved by a reweighting step, where the cutoff value in the weight function is now taken as c = χ²_{p,0.9} med(d₁, . . . , dₙ)/χ²_{p,0.5}, with dᵢ = D(xᵢ, μ̂_RAWOGK, Σ̂_RAWOGK). The reweighted estimates are denoted as μ̂_OGK and Σ̂_OGK.
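The steps above can be sketched in NumPy. For brevity this sketch takes m(·) = median and s(·) = MAD rather than the weighted mean and τ-scale of the actual OGK, so its numbers differ from the real estimator:

```python
import numpy as np

def mad(x):
    """Robust scale: the (consistency-corrected) median absolute deviation."""
    med = np.median(x, axis=0)
    return 1.4826 * np.median(np.abs(x - med), axis=0)

def ogk_raw(X):
    """Raw OGK-style estimate with m = median and s = MAD (the real OGK
    uses a weighted mean and the tau-scale of Yohai and Zamar)."""
    n, p = X.shape
    D = np.diag(mad(X))                      # step 2: robust scales
    Y = X @ np.linalg.inv(D)
    U = np.eye(p)                            # step 3: pairwise robust correlations
    for j in range(p):
        for k in range(j):
            u = 0.25 * (mad(Y[:, j] + Y[:, k])**2 - mad(Y[:, j] - Y[:, k])**2)
            U[j, k] = U[k, j] = u
    _, E = np.linalg.eigh(U)                 # step 4: eigenvectors of U
    V = Y @ E                                # 4(a): project
    L = np.diag(mad(V)**2)                   # 4(b): robust variances
    mu_Y = E @ np.median(V, axis=0)          # 4(c): back-transformed center
    Sigma_Y = E @ L @ E.T                    # positive definite by construction
    return D @ mu_Y, D @ Sigma_Y @ D.T       # step 5: undo the scaling

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4)) * np.array([1.0, 2.0, 3.0, 4.0])
mu_ogk, Sigma_ogk = ogk_raw(X)
```

Because the eigenvalues of U are replaced by the (nonnegative) robust variances in L, the resulting scatter matrix is positive definite even when U itself is not.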
3 Deterministic MCD algorithm
3.1 General procedure
In this section we present an alternative algorithm to calculate the MCD. First we standardize each variable Xⱼ by subtracting its median and dividing by the Qₙ scale estimator of Rousseeuw and Croux (1993). This standardization makes the algorithm location and scale equivariant, i.e. (1) and (2) hold for any non-singular diagonal matrix A. (We also looked into centering by the spatial median, but based on speed considerations and simulation results we stayed with the coordinatewise median.) The standardized data set is denoted by Z, with rows zᵢᵀ (i = 1, . . . , n) and columns Zⱼ (j = 1, . . . , p).
Next, we construct seven initial estimates μ̂ₖ(Z) and Σ̂ₖ(Z) (k = 1, . . . , 7) for the center and scatter of Z. Apart from the last one, each computes a preliminary estimate Sₖ of the covariance or correlation matrix of Z. They will be described in Section 3.2. As these Sₖ may have very inaccurate eigenvalues, we apply the following steps to each. Note that the first two steps are similar to steps 4(a) and 4(b) of the OGK algorithm:

1. Compute the matrix E of eigenvectors of Sₖ and put B = Z E.

2. Estimate the covariance of Z by Σ̂ₖ(Z) = E L Eᵀ, where L = diag(Q²ₙ(B₁), . . . , Q²ₙ(Bₚ)).

3. To estimate the center of Z we sphere the data, apply the coordinatewise median, and transform it back, i.e. μ̂ₖ(Z) = Σ̂ₖ(Z)^{1/2} med(Z Σ̂ₖ(Z)^{−1/2}).

For all seven estimates (μ̂ₖ(Z), Σ̂ₖ(Z)) we then compute the statistical distances

    d_{ik} = D(zᵢ, μ̂ₖ(Z), Σ̂ₖ(Z)).        (3)

For each initial estimate k we take the h observations with smallest d_{ik} and apply C-steps until convergence. The solution with the smallest determinant we call the raw DetMCD. Then we apply a reweighting step as in the FASTMCD algorithm, yielding the final DetMCD.
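A condensed sketch of this outer loop, using only two of the seven initial estimates (Spearman and classical), MAD standardization instead of Qₙ, and a plain coordinatewise-median center instead of the sphering step, purely to keep the illustration short:

```python
import numpy as np
from scipy.stats import rankdata

def mad(x):
    med = np.median(x, axis=0)
    return 1.4826 * np.median(np.abs(x - med), axis=0)

def c_steps(Z, mu, Sigma, h, max_iter=50):
    """Iterate C-steps; the determinant sequence is non-increasing and the
    number of h-subsets is finite, so this converges."""
    prev_det = np.inf
    for _ in range(max_iter):
        diff = Z - mu
        d2 = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma), diff)
        H = np.argsort(d2)[:h]
        mu = Z[H].mean(axis=0)
        Sigma = np.cov(Z[H], rowvar=False, bias=True)
        det = np.linalg.det(Sigma)
        if det >= prev_det:          # no further decrease: converged
            break
        prev_det = det
    return mu, Sigma, det

def detmcd_sketch(X, h):
    Z = (X - np.median(X, axis=0)) / mad(X)   # robust standardization
    inits = [
        np.corrcoef(rankdata(Z, axis=0), rowvar=False),  # S2: Spearman
        np.cov(Z, rowvar=False),                         # S7: classical
    ]
    mu0 = np.median(Z, axis=0)       # simplified center (the paper spheres first)
    results = [c_steps(Z, mu0, S, h) for S in inits]
    return min(results, key=lambda r: r[2])   # smallest determinant wins

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
X[:15] += 8.0                        # 15% shifted outliers
mu_best, Sigma_best, det_best = detmcd_sketch(X, h=75)
```

Even when the classical starting value is distorted by the outliers, the robust Spearman start reaches a clean h-subset, and taking the minimum-determinant solution keeps the result robust.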
3.2 Initial scatter estimates
(1) The first initial scatter estimate is obtained by computing the hyperbolic tangent (sigmoid) of each column of Z, i.e. Yⱼ = tanh(Zⱼ) for j = 1, . . . , p. This bounded function reduces
the effect of large coordinatewise outliers. Computing the classical correlation matrix of Y yields S₁ = corr(Y).

(2) Now let Rⱼ be the ranks of the column Zⱼ, and put S₂ = corr(R). This is the Spearman correlation matrix of Z. (Note that since the population Spearman correlation satisfies ρ_S = (6/π) sin⁻¹(ρ/2) for bivariate normal distributions with correlation coefficient ρ (Kendall, 1975), we also applied the inverse transformation to each element of S₂. As it did not improve the results, we did not retain this option.)

(3) For S₃ we compute normal scores from the ranks Rⱼ, namely Tⱼ = Φ⁻¹((Rⱼ − 1/3)/(n + 1/3)), where Φ(.) is the normal cumulative distribution function, and set S₃ = corr(T).

(4) The fourth scatter estimate is based on the spatial sign covariance matrix (Visuri et al., 2000). Define kᵢ = zᵢ/‖zᵢ‖ for all i and let S₄ = cov(K). (Note that this is not the usual spatial sign covariance matrix, because the zᵢ were centered by the coordinatewise median instead of the spatial median, to save computation time.) We also tried the spatial rank covariance matrix

    cov((1/n) ∑ⱼ (zᵢ − zⱼ)/‖zᵢ − zⱼ‖),

but this matrix requires O(n²) operations (for fixed p) whereas the other estimates only require O(n log n) time, and it did not improve the performance of the algorithm.

(5) For S₅ we take the first step of the BACON algorithm (Billor et al., 2000). Consider the ⌈n/2⌉ standardized observations zᵢ with smallest norm, and compute their mean and covariance matrix. (Note that the BACON algorithm starts with a smaller set.)

(6) The sixth scatter estimate is the raw OGK estimator. For m(.) and s(.) we used the median and Qₙ, for reasons of simplicity (no choice of tuning parameters) and to be consistent with the other components of DetMCD.

(7) Finally we consider the classical mean μ̂₇(Z) and covariance matrix Σ̂₇(Z) of the full data set. This initial estimate is not robust, but it is fast and accurate at uncontaminated data.
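The first four Sₖ can be written down directly; a sketch (our own code, with Z assumed already standardized as in Section 3.1):

```python
import numpy as np
from scipy.stats import rankdata, norm

def initial_scatters(Z):
    """S1-S4 of Section 3.2: tanh, Spearman, normal scores, spatial signs."""
    n, p = Z.shape
    S1 = np.corrcoef(np.tanh(Z), rowvar=False)        # (1) bounded transform
    R = rankdata(Z, axis=0)
    S2 = np.corrcoef(R, rowvar=False)                 # (2) Spearman correlation
    T = norm.ppf((R - 1.0 / 3) / (n + 1.0 / 3))       # (3) normal scores
    S3 = np.corrcoef(T, rowvar=False)
    K = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # (4) spatial signs of z_i
    S4 = np.cov(K, rowvar=False)
    return S1, S2, S3, S4

rng = np.random.default_rng(4)
Z = rng.normal(size=(200, 2))
Z[:, 1] = 0.9 * Z[:, 0] + 0.4 * rng.normal(size=200)  # correlated toy data
S1, S2, S3, S4 = initial_scatters(Z)
```

Each Sₖ is then refined by replacing its eigenvalues with robust variances, as in steps 1 and 2 of Section 3.1.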
4 Simulation study
We will compare the new DetMCD algorithm with FASTMCD.
4.1 Simulation design
The simulation is similar to the setup of Maronna and Zamar (2002). Because the DetMCD estimates are not fully affine equivariant, their behavior may depend on the covariance structure, hence we need to generate correlated data. These are obtained by first generating uncorrelated normal data yᵢ ∼ Nₚ(0, I) and applying an affine transformation xᵢ = G yᵢ to them, where G is the matrix with G_jj = 1 and G_jk = ρ for j ≠ k. If there is no contamination (ε = 0), X has covariance matrix G², and the squared multiple correlation ρ²_mult (which is the R² obtained by regressing any coordinate of X on all of the others) can be calculated as a function of ρ. In the simulations we have taken ρ such that ρ_mult = 0.75, which is a rather collinear situation.

Outliers were generated in y-space, and the same affine transformation G was applied to them. We considered three types of contamination: point contamination, cluster contamination, and radial contamination. In all cases yᵢ ∼ Nₚ(0, I) for i = 1, . . . , n − m, where m = εn and ε is the fraction of contamination. Point contamination was obtained as in Maronna and Zamar (2002) by generating yᵢ ∼ Nₚ(y₀, λ²I) for i = n − m + 1, . . . , n, with λ = 0.1 and y₀ = r a₀, where a₀ is a unit vector generated orthogonal to (1, 1, . . . , 1)ᵀ. The value of r, which determines the distance between the outliers and the main center, was varied with the data dimension as specified below. Cluster contamination was generated by shifting the center while using the same covariance matrix, i.e. yᵢ ∼ Nₚ([10, 10, 0_{p−2}]ᵀ, I). For radial contamination, many observations were generated from the distribution Nₚ(0, 5I), and as radial outliers we took the first m observations whose statistical distance exceeded the cutoff value √(χ²_{p,0.8}).

Different data sizes were considered, namely

    A: n = 100 and p = 2 (r = 50)
    B: n = 100 and p = 5 (r = 100)
    C: n = 200 and p = 10 (r = 150)
and different contamination levels were investigated, namely ε = 0%, 10%, and 20%. We always put h, the number of observations whose covariance determinant will be minimized, equal to the default value 0.75n in both FASTMCD and DetMCD, so that the algorithms can resist about 25% of outliers.
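For concreteness, point contamination under this design can be generated as follows. This is our own sketch: we fix a₀ = (1, −1, 0, . . .)ᵀ/√2 instead of drawing it randomly, and write λ = 0.1 for the outliers' dispersion:

```python
import numpy as np

def simulate_point_contamination(n, p, rho, eps, r, rng):
    """Correlated data x_i = G y_i with G_jj = 1, G_jk = rho, where the
    last eps*n of the y_i are outliers at distance r along a unit
    vector a0 orthogonal to (1,...,1)^T, with dispersion lambda = 0.1."""
    m = int(round(eps * n))
    G = np.full((p, p), rho)
    np.fill_diagonal(G, 1.0)
    Y = rng.normal(size=(n, p))
    a0 = np.zeros(p)
    a0[0], a0[1] = 1.0, -1.0                 # orthogonal to (1,...,1)^T
    a0 /= np.linalg.norm(a0)
    Y[n - m:] = r * a0 + 0.1 * rng.normal(size=(m, p))
    return Y @ G.T                           # rows are x_i = G y_i

rng = np.random.default_rng(5)
X = simulate_point_contamination(n=100, p=5, rho=0.5, eps=0.10, r=100, rng=rng)
```

Since a₀ ⊥ (1, . . . , 1)ᵀ and G = (1 − ρ)I + ρ11ᵀ, the transformed outliers sit at distance (1 − ρ)r from the origin.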
For both the FASTMCD and DetMCD algorithms we compute the raw and the reweighted location vectors μ̂_raw(X) and μ̂(X), and the raw and the reweighted scatter matrices Σ̂_raw(X) and Σ̂(X). The corresponding estimators for the data set Y are obtained by transforming back to μ̂(Y) = G⁻¹ μ̂(X) and Σ̂(Y) = G⁻¹ Σ̂(X) G⁻¹. The following performance measures were considered:

- The objective function of the raw scatter estimator, OBJ = det(Σ̂_raw(Y)).
- An error measure of the location estimator, given by e_μ = ‖μ̂(Y)‖².
- An error measure of the scatter estimate, defined as the logarithm of its condition number: e_Σ = log₁₀(cond(Σ̂(Y))).
- The computation time t (in seconds).
Each of these performance measures should be as close to zero as possible. All simulations were run in MATLAB R2007a (The MathWorks, Natick, MA). We wrote new code for DetMCD, whereas FASTMCD was obtained from the mcdcov function in the Matlab library LIBRA (Verboven and Hubert, 2005).
4.2 Simulation results
Table 1 shows the simulation results for clean data (without contamination) for the different data sizes. Each entry is the average (over 100 runs) of the performance measure in question. We see that the algorithms perform similarly for the first three performance criteria. The objective function attained by FASTMCD was on average slightly smaller than with DetMCD, but this difference is not significant (according to the Mann-Whitney test). Also the differences in e_μ and e_Σ are not significant. Moreover, we see that DetMCD is much faster. These results also hold when outliers are present in the data, irrespective of the type or fraction of contamination, as can be
Table 1: Simulation results for clean data.

          FASTMCD   DetMCD
A  OBJ     0.2658   0.2666
   e_μ     0.0236   0.0231
   e_Σ     0.1507   0.1445
   t       1.2832   0.0434
B  OBJ     0.1409   0.1421
   e_μ     0.0597   0.0592
   e_Σ     0.3606   0.3478
   t       1.3943   0.0616
C  OBJ     0.0743   0.0746
   e_μ     0.0560   0.0560
   e_Σ     0.3773   0.3734
   t       2.3281   0.1352
Table 2: Simulation results for data with 10% of contamination.

                Point                Cluster               Radial
          FASTMCD  DetMCD     FASTMCD  DetMCD     FASTMCD  DetMCD
A  OBJ     0.3873  0.3890      0.3873  0.3889      0.3873  0.3880
   e_μ     0.0231  0.0231      0.0231  0.0234      0.0231  0.0238
   e_Σ     0.1376  0.1375      0.1376  0.1361      0.1376  0.1370
   t       1.2812  0.0446      1.2917  0.0452      1.2962  0.0423
B  OBJ     0.2412  0.2425      0.2411  0.2426      0.2411  0.2426
   e_μ     0.0571  0.0570      0.0570  0.0566      0.0577  0.0565
   e_Σ     0.3403  0.3364      0.3404  0.3384      0.3403  0.3364
   t       1.3786  0.0583      1.3910  0.0585      1.3805  0.0567
C  OBJ     0.1490  0.1499      0.1490  0.1499      0.1490  0.1499
   e_μ     0.0579  0.0566      0.0578  0.0571      0.0576  0.0567
   e_Σ     0.3720  0.3693      0.3719  0.3694      0.3725  0.3695
   t       2.2429  0.1303      2.2359  0.1314      2.1922  0.1259
Table 3: Simulation results for data with 20% of contamination.

                Point                Cluster               Radial
          FASTMCD  DetMCD     FASTMCD  DetMCD     FASTMCD  DetMCD
A  OBJ     0.6258  0.6267      0.6258  0.6268      0.6258  0.6261
   e_μ     0.0256  0.0256      0.0256  0.0256      0.0306  0.0313
   e_Σ     0.1283  0.1283      0.1283  0.1283      0.1583  0.1597
   t       1.3077  0.0422      1.2945  0.0418      1.3042  0.0400
B  OBJ     0.4955  0.4968      0.4955  0.4965      0.4956  0.4963
   e_μ     0.0621  0.0622      0.0621  0.0622      0.0621  0.0621
   e_Σ     0.3302  0.3308      0.3302  0.3292      0.3302  0.3298
   t       1.3762  0.0536      1.3948  0.0565      1.3791  0.0532
C  OBJ     0.4001  0.4015      0.4001  0.4017      0.4002  0.4015
   e_μ     0.0633  0.0630      0.0633  0.0630      0.0633  0.0631
   e_Σ     0.3786  0.3772      0.3786  0.3780      0.3778  0.3781
   t       2.2410  0.1215      2.2495  0.1312      2.1975  0.1199
seen in Tables 2 and 3. We conclude that DetMCD is a fast and robust alternative to FASTMCD.

For DetMCD, Figure 1 shows how many times on average each initial subset led (after convergence) to the smallest value of the objective function. Figure 1(a) shows this for the uncontaminated case, whereas Figures 1(b) and (c) correspond to 10% and 20% of point contamination. For clustered and radial contamination we obtained similar figures, hence they are not included here. We immediately see that the first subset (using the hyperbolic tangent transformation) is often best in low dimensions. In higher dimensions the frequencies are more evenly distributed (except that at contaminated data, the initial estimate based on the classical mean and covariance matrix is rarely selected). Therefore, we kept all seven initial h-subsets in the algorithm. Figure 2 shows for each initial h-subset how many C-steps were needed on average to reach convergence. Typically 3 or 4 C-steps were sufficient, so the DetMCD algorithm used around 25 C-steps in all, compared to over 1000 in FASTMCD.
[Figure: three bar charts of the number of times selected versus h-subset 1-7, for n=100,p=2; n=100,p=5; n=200,p=10]

Figure 1: Number of times each of the seven initial subsets of DetMCD led to the best objective function, for (a) 0%, (b) 10%, and (c) 20% of point contamination.
5 Properties of DetMCD
5.1 Affine equivariance
DetMCD is no longer fully affine equivariant, due to the construction of the initial estimates. We will measure its deviation from affine equivariance as done in Maronna and Zamar (2002) for the OGK. Since DetMCD is clearly location equivariant, we can drop v from (1) and (2) and only consider non-singular matrices A. We generate such a p × p matrix A as the product of a random orthogonal matrix and a diagonal matrix diag(u₁, . . . , uₚ), where the uᵢ are independent and uniformly distributed on (0, 1). Let X_A = {Ax₁, . . . , Axₙ}. We then compare the original estimates μ̂_X = μ̂(X) and Σ̂_X = Σ̂(X) with μ̂_A = A⁻¹ μ̂(X_A) and Σ̂_A = A⁻¹ Σ̂(X_A) A⁻ᵀ.
[Figure: three bar charts of the average number of C-steps versus h-subset 1-7, for n=100,p=2; n=100,p=5; n=200,p=10]

Figure 2: For each initial subset of DetMCD, the average number of C-steps until convergence, for (a) 0%, (b) 10%, and (c) 20% of point contamination.
Maronna and Zamar (2002) measured the deviation from equivariance by d_μ = ‖μ̂_A − μ̂_X‖ and d_Σ = cond(Σ̂_X^{−1/2} Σ̂_A Σ̂_X^{−1/2}). Note that affine equivariant estimators satisfy d_μ = 0 and d_Σ = 1.
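These deviations are easy to compute for any estimator. A sketch (the helper names are ours), checked below on the classical mean/covariance, which is exactly affine equivariant:

```python
import numpy as np

def sqrtm_spd(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

def equivariance_deviation(estimator, X, rng):
    """d_mu and d_Sigma as in Maronna and Zamar (2002): transform the
    data by a random A, back-transform the estimates, and compare."""
    n, p = X.shape
    Q, _ = np.linalg.qr(rng.normal(size=(p, p)))       # random orthogonal matrix
    A = Q @ np.diag(rng.uniform(0.0, 1.0, size=p))
    mu_X, Sig_X = estimator(X)
    mu_A, Sig_A = estimator(X @ A.T)                   # rows become A x_i
    Ainv = np.linalg.inv(A)
    mu_back = Ainv @ mu_A
    Sig_back = Ainv @ Sig_A @ Ainv.T
    W = np.linalg.inv(sqrtm_spd(Sig_X))
    d_mu = np.linalg.norm(mu_back - mu_X)
    d_Sigma = np.linalg.cond(W @ Sig_back @ W)
    return d_mu, d_Sigma

classical = lambda X: (X.mean(axis=0), np.cov(X, rowvar=False))
rng = np.random.default_rng(6)
X = rng.normal(size=(60, 3))
d_mu, d_Sigma = equivariance_deviation(classical, X, rng)
```

For the classical estimator, d_μ and d_Σ − 1 vanish up to rounding error; for DetMCD or OGK they are small but nonzero.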
First, we study the deviation from affine equivariance on the ionospheric data taken from Bay (1999) and preprocessed as in Maronna and Zamar (2002), which has n = 225 observations and p = 31 variables. Table 4 shows the average d_μ and d_Σ over 100 matrices A. Note that DetMCD was even closer to affine equivariance than OGK. (The values for FASTMCD confirm its affine equivariance.) We also considered data sets from the simulation in Section 4.1. For 20 such data sets we generated 50 matrices A. Tables 5, 6 and 7 report d_μ and d_Σ for 0%, 10%, and 20% of contamination. In all cases DetMCD was closer to affine equivariance than OGK. Since Maronna and Zamar (2002) concluded that the OGK's deviation from affine equivariance was small enough
not to be concerned about, this holds even more so for DetMCD.
Table 4: Deviation from affine equivariance for the ionospheric data.

       FASTMCD       OGK   DetMCD
d_μ          0    0.3451   0.0133
d_Σ          1  177.5803   2.9954
Table 5: Deviation from affine equivariance for the reweighted estimators on simulated data without contamination.

           OGK   DetMCD
A  d_μ  0.0183   0.0002
   d_Σ  1.0397   1.0012
B  d_μ  0.0819   0.0030
   d_Σ  1.5229   1.0072
C  d_μ  0.2997   0.0423
   d_Σ  1.8086   1.1796
Table 6: Deviation from affine equivariance for the reweighted estimators on simulated data with 10% of contamination.

             Point             Cluster             Radial
           OGK  DetMCD      OGK  DetMCD      OGK  DetMCD
A  d_μ  0.0382  0.0009   0.0436  0.0000   0.0394  0.0012
   d_Σ  1.1840  1.0023   1.2240  1.0000   1.0995  1.0025
B  d_μ  0.0924  0.0337   0.0866  0.0416   0.1097  0.0274
   d_Σ  1.8610  1.0515   1.5662  1.0626   1.7491  1.0433
C  d_μ  0.2551  0.0369   0.1672  0.0261   0.1942  0.0309
   d_Σ  2.1819  1.1403   2.6909  1.1300   1.8949  1.1261
Table 7: Deviation from affine equivariance for the reweighted estimators on simulated data with 20% of contamination.

             Point             Cluster             Radial
           OGK  DetMCD      OGK  DetMCD      OGK  DetMCD
A  d_μ  0.0658  0.0000   0.0837  0.0000   0.0346  0.0000
   d_Σ  1.2592  1.0000   3.1184  1.0000   1.1487  1.0000
B  d_μ  0.1869  0.0000   0.1464  0.0000   0.1092  0.0000
   d_Σ  2.1709  1.0000   1.9857  1.0000   1.8884  1.0000
C  d_μ  0.2928  0.0005   0.3770  0.0830   0.1860  0.0000
   d_Σ  2.0009  1.0022  12.1472  4.5434   1.8217  1.0000
5.2 Permutation invariance

Another property we are interested in is permutation invariance. An estimator T(.) is said to be permutation invariant if T(PX) = T(X) for any data set X and any permutation matrix P. A permutation matrix is a square matrix that has a single entry 1 in each row and each column, and zeroes elsewhere; therefore PX simply permutes the rows of X. Note that FASTMCD is not permutation invariant, because the initial subsets (generated by a pseudorandom number generator with a fixed seed) will have the same case numbers but correspond to different observations. By contrast, all ingredients of DetMCD are permutation invariant. Analogous to the previous section, the deviation from permutation invariance can be measured by d_μ = ‖μ̂(PX) − μ̂(X)‖ and d_Σ = cond(Σ̂(X)^{−1/2} Σ̂(PX) Σ̂(X)^{−1/2}). For the ionospheric data, Table 8 shows the average d_μ and d_Σ over 100 matrices P. They confirm that FASTMCD is not permutation invariant, whereas OGK and DetMCD are.
Table 8: Deviation from permutation invariance for the ionospheric data.

       FASTMCD  OGK  DetMCD
d_μ     0.0410    0       0
d_Σ    12.4131    1       1
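Permutation invariance is easy to check empirically for any estimator; a sketch (our own code, using the classical mean/covariance, which is permutation invariant like DetMCD and OGK):

```python
import numpy as np

def permutation_deviation(estimator, X, rng):
    """d_mu = ||mu(PX) - mu(X)|| for a random row permutation P;
    a permutation invariant estimator gives 0."""
    perm = rng.permutation(X.shape[0])
    mu_X, _ = estimator(X)
    mu_P, _ = estimator(X[perm])     # X[perm] is PX: the rows permuted
    return np.linalg.norm(mu_P - mu_X)

classical = lambda X: (X.mean(axis=0), np.cov(X, rowvar=False))
rng = np.random.default_rng(7)
X = rng.normal(size=(80, 4))
d_mu = permutation_deviation(classical, X, rng)
```

For FASTMCD the analogous quantity is nonzero, because its random subsets pick the same row indices but different observations after permuting.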
5.3 Different values of h

We already noted that DetMCD is faster than FASTMCD when applying the algorithm once for a fixed value of h. As the number of outliers should be below n − h, it is commonly advised to set h ≈ 0.5n when a large proportion of outliers could occur, and h ≈ 0.75n otherwise. Alternatively, one could also compute the MCD for a whole range of h-values, and see whether at some h there is an important change in the objective function or the estimates. This is related to the forward search of Atkinson et al. (2004). With DetMCD it becomes very easy to compute the MCD for several h-values: since the seven initial estimates do not depend on h, we only need to store the resulting ordered distances (3), yielding the initial h-subset for any h. We will illustrate this feature on several examples in Section 6.
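The point is that a single sort of the distances (3) serves every h at once; a trivial sketch:

```python
import numpy as np

def initial_subsets_for_all_h(d, h_values):
    """Given the distances d_ik of one initial estimate, a single argsort
    yields the initial h-subset for every requested h."""
    order = np.argsort(d)
    return {h: order[:h] for h in h_values}

d = np.array([3.1, 0.4, 2.2, 0.9, 1.5, 2.8])
subsets = initial_subsets_for_all_h(d, [2, 4])
```

The subsets are nested, so scanning over h only ever adds observations; no work from smaller h is discarded.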
6 Real Examples
6.1 Philips data
Rousseeuw and Van Driessen (1999) illustrated FASTMCD on data provided by Philips, which produced diaphragm parts for television sets. When a new production line was started, the engineers measured 9 characteristics for each of the 677 parts. Applying FASTMCD with h = 0.75n yielded the robust distances in Figure 3(a). Many distances exceed the cutoff value √(χ²_{9,0.975}). In particular, the observations 491-565 are clearly different from the others, indicating that something happened in the production process. Figure 3(b) shows the robust distances using the DetMCD algorithm. They are almost identical to the FASTMCD results, with the same points being flagged as outlying. Moreover, the estimates for location and scatter were almost identical, i.e. d_μ = ‖μ̂_MCD − μ̂_DetMCD‖ = 0.0004 and d_Σ = cond(Σ̂_MCD^{−1/2} Σ̂_DetMCD (Σ̂_MCD^{−1/2})ᵀ) = 1.0488. Also the objective functions reached by the raw DetMCD and the raw FASTMCD were almost the same, since OBJ_MCD/OBJ_DetMCD = 0.9930. The optimal h-subsets that determine the raw estimates only differed in five observations. The main difference lies in the computation time: whereas FASTMCD took 4.8 seconds, DetMCD only needed 0.5 seconds.
[Figure: two index plots of robust distance (0-16) against observation index (1-677)]

Figure 3: Robust distances of the Philips data with (a) FASTMCD and (b) DetMCD.
6.2 Swiss bank notes data

As a second example we consider p = 6 measurements of n = 100 forged Swiss 1000-franc bills, from Flury and Riedwyl (1988). As shown in Salibian-Barrera et al. (2006), Pison and Van Aelst (2004), and Willems et al. (2009), this data set contains several outlying observations and highly correlated variables. Therefore it is appropriate to analyze the data with a robust PCA method. In Croux and Haesbroeck (2000) it is argued that the MCD estimator can be used for this purpose. The first three principal components are retained, because together they explain 92% of the variance. Figure 4 shows the resulting outlier maps based on FASTMCD and DetMCD using h = 75. On the horizontal axis they show the robust distance of each observation in the three-dimensional PCA subspace. The vertical axis shows the orthogonal distance of the observation to the PCA subspace. Such an outlier map makes it possible to classify observations into regular cases, good PCA leverage points, orthogonal outliers, and bad PCA leverage points (Hubert et al., 2005). We see that the outlier maps are very similar, and that the same observations are flagged as outlying. The final h-subsets obtained with both algorithms had h − 1 points in common. Again the computation times were quite different: FASTMCD took 1.5 seconds whereas DetMCD was 20 times faster.
6.3 Pulp fibre data
The MCD can also be used to perform a robust multivariate regression (Rousseeuw et al., 2004).
Denoting the q-dimensional response variable of the ith observation by y i , the goal of multivariate
[Figure: two outlier maps of orthogonal distance against score distance (3 LV), with outlying observations labeled]

Figure 4: Outlier map of the Swiss bank notes data using robust PCA with (a) FASTMCD and (b) DetMCD.
linear regression is to estimate the intercept vector α and the slope matrix B in the model

y_i = α + B x_i + ε_i.

The MCD regression estimates of α and B are obtained by matrix operations on the MCD location
and covariance estimates of the joint (x_i, y_i) data.
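These matrix operations amount to partitioning the joint location and scatter estimates. A minimal sketch (the function name is ours; in practice mu and Sigma would be the MCD estimates of the joint data, but any location/scatter pair works):

```python
import numpy as np

def regression_from_joint(mu, Sigma, p):
    """Slope and intercept of the multivariate regression y = alpha + B x + eps,
    derived from a joint (x, y) location mu and scatter Sigma.

    p: number of x-variables (the first p coordinates of mu and Sigma).
    Plugging in the MCD estimates yields a robust regression fit.
    """
    mu_x, mu_y = mu[:p], mu[p:]
    Sxx = Sigma[:p, :p]              # scatter of the regressors
    Syx = Sigma[p:, :p]              # cross-scatter of responses and regressors
    B = Syx @ np.linalg.inv(Sxx)     # q x p slope matrix
    alpha = mu_y - B @ mu_x          # q-dimensional intercept
    return alpha, B
```

This mirrors the classical least-squares formulas, with the robust location and scatter taking the place of the sample mean and covariance.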
To illustrate MCD regression we consider a dataset of Lee (1992) that contains properties of
n = 62 pulp fibres and the paper made from them. MCD regression is applied to predict the q = 4
paper properties from the p = 4 fibre characteristics. Figure 5 shows the norm of each slope B_j for
all 27 values of h from 35 to 61, obtained with the FASTMCD and the DetMCD algorithms. Note
that FASTMCD has to start from scratch for each h, whereas DetMCD only needs to compute
the seven initial estimates once. The algorithms yield identical results for all values of h. Figure 5
has a sizeable jump at h = 52. In fact, from h = 52 on the final h-subset contains bad leverage
points. It turned out that the most severe bad leverage points were produced from fir wood, and
that most of the outlying samples were obtained using different pulping processes. The 27 robust
regressions together took 53.2 seconds with FASTMCD, whereas the version with DetMCD only
needed 5.9 seconds.
[Curves of the norm of each of the four slopes against h, with overlapping FASTMCD and DetMCD lines for every slope.]
Figure 5: Norm of each slope of the pulp bre data, for different values of h.
6.4 Fruit data
Our last example is high-dimensional. The fruit data set contains spectra of three different cultivars
(with sizes 490, 106, and 500) of a type of cantaloupe, and was previously analyzed in Hubert and
Van Driessen (2004). All spectra were measured at 256 wavelengths, hence the data set contains
1096 observations and 256 variables. First, we performed a robust PCA using the ROBPCA
method of Hubert et al. (2005). ROBPCA mainly consists of two steps. In the first step a
robust subspace is constructed based on the Stahel-Donoho outlyingness (Stahel, 1981; Donoho
and Gasko, 1992; Debruyne and Hubert, 2009). Next, robust eigenvectors and eigenvalues are
found within this subspace by applying the MCD to the projected observations. We consider
the original ROBPCA method that uses FASTMCD in the second stage of the algorithm, and a
modified ROBPCA that applies DetMCD. From the scree plot we decided to retain two principal
components. Again FASTMCD and DetMCD gave identical results. We note a big group of
outliers in Figure 6, which corresponds to a change in the instrument's illumination system.
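The Stahel-Donoho outlyingness used in the first step of ROBPCA can be sketched as follows. ROBPCA samples candidate directions through pairs of data points; in this illustrative version (function name ours) the directions are simply passed in:

```python
import numpy as np

def sd_outlyingness(X, directions):
    """Stahel-Donoho outlyingness of each row of X.

    For every candidate direction v, project the data on v and measure how
    far each projection lies from the median, in units of the median
    absolute deviation (MAD); the outlyingness is the maximum over
    directions.
    """
    out = np.zeros(len(X))
    for v in directions:
        proj = X @ v
        med = np.median(proj)
        mad = np.median(np.abs(proj - med))
        if mad > 0:                       # skip degenerate directions
            out = np.maximum(out, np.abs(proj - med) / mad)
    return out
```

Observations with small outlyingness are then used to build the robust subspace.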
Next, we applied the robust quadratic discriminant rule RQDR (Hubert and Van Driessen,
2004) to the robust two-dimensional PCA scores for values of h from 550 to 1095. The RQDR
method first runs the MCD estimator on each of the groups. A new datum is then assigned to the
group for which it attains the largest discriminant score. Also membership probabilities for each
group are estimated, as the proportion of regular observations in each group. Figure 7 shows these
[Outlier map plotting the orthogonal distance against the score distance (2 LV), with a large group of outliers labeled by case number.]
Figure 6: Outlier map of the fruit data using ROBPCA.
membership probabilities obtained with both methods as a function of h. Also here FASTMCD
[Curves of the membership probability of each of the three groups against h, with overlapping FASTMCD and DetMCD lines for every group.]
Figure 7: Membership probabilities of each group of the fruit data, for different values of h.
and DetMCD give almost the same results. We see that the membership probabilities change
significantly at h = 700, hence there are a substantial number of outliers present. Therefore h
should be taken sufficiently small to obtain robust results. The entire analysis took 562 seconds
when using FASTMCD, whereas it only needed 111 seconds with DetMCD. The computation time
only went down by a factor of 5 here because the analyses have several parts in common, such as
the computation of the discriminant scores.
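The assignment step of such a quadratic discriminant rule can be sketched as below (Python/NumPy; the classical Gaussian log-density form stands in for the published RQDR score, whose details may differ, and the per-group mu and Sigma would be the MCD estimates):

```python
import numpy as np

def quadratic_assign(x, groups):
    """Assign x to the group with the largest quadratic discriminant score.

    groups: list of (mu_g, Sigma_g, pi_g) with a location, a scatter matrix,
    and a prior membership probability per group.  With MCD estimates
    plugged in, this gives a robust rule in the spirit of RQDR.
    """
    scores = []
    for mu, Sigma, pi in groups:
        d = x - mu
        score = (-0.5 * np.log(np.linalg.det(Sigma))   # penalize spread-out groups
                 - 0.5 * d @ np.linalg.solve(Sigma, d)  # robust squared distance
                 + np.log(pi))                          # prior membership probability
        scores.append(score)
    return int(np.argmax(scores))
```

Because the MCD center and scatter ignore the outliers within each group, outlying training points cannot distort the decision boundaries.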
7 Conclusions and outlook
DetMCD is a new algorithm for the MCD, which needs even less time than FASTMCD. It starts
from a few easily computed h-subsets, and then takes concentration steps until convergence. The
DetMCD algorithm is deterministic in that it does not use any random subsets. It is permutation
invariant and close to affine equivariant, and allows the analysis to be run for many values of h without
much additional computation. We illustrated DetMCD in the contexts of PCA, regression, and
classification.
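A single concentration step, the workhorse of both FASTMCD and DetMCD, can be sketched as follows (Rousseeuw and Van Driessen, 1999, showed that iterating it never increases the covariance determinant):

```python
import numpy as np

def c_step(X, h, mu, Sigma):
    """One concentration step: keep the h observations with smallest
    Mahalanobis distance relative to the current (mu, Sigma) and recompute
    the mean and covariance on them.  Iterating until the determinant of
    Sigma no longer decreases yields a (local) MCD solution."""
    diff = X - mu
    d2 = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma), diff)
    idx = np.argsort(d2)[:h]                 # h most central points
    Xh = X[idx]
    return Xh.mean(axis=0), np.cov(Xh, rowvar=False)
```

DetMCD differs from FASTMCD only in where the initial (mu, Sigma) pairs come from: a few deterministic, easily computed estimates instead of many random subsets.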
Also many other methods that directly or indirectly rely on the MCD (e.g. through its robust
distances) may benefit from the DetMCD approach, such as robust canonical correlation
(Croux and Dehon, 2002), robust regression with continuous and categorical regressors (Hubert
and Rousseeuw, 1996), robust errors-in-variables regression (Fekri and Ruiz-Gazen, 2004), robust
principal component regression (Hubert and Verboven, 2003), and robust partial least squares
(Hubert and Vanden Branden, 2003). In particular, on-line applications or procedures that require
the MCD to be computed many times, such as genetic algorithms (Wiegand et al., 2009), will
become more efficient. The cross-validation techniques of Hubert and Engelen (2007) and Engelen
and Hubert (2005) may benefit from the fact that DetMCD is easily updated when an observation
is added or removed. Following Copt and Victoria-Feser (2004) and Serneels and Verdonck (2008)
we will also investigate whether DetMCD can be extended to the missing data framework.

The DetMCD algorithm will be made available in Matlab as part of LIBRA (Verboven and
Hubert, 2005). Also an implementation in R will be provided.
The random sampling mechanism is currently used for many other high-breakdown robust
estimators. Our deterministic approach could improve on those algorithms as well. In particular we
intend to study a deterministic algorithm for S-estimators and τ-estimators, for which algorithms
in the spirit of FASTMCD were developed recently (Salibian-Barrera and Yohai, 2006; Salibian-
Barrera et al., 2008). We will also work on a deterministic algorithm for LTS regression, which is
typically computed with the FASTLTS algorithm (Rousseeuw and Van Driessen, 2006).
SUPPLEMENTAL MATERIALS
Matlab code for DetMCD algorithm: Matlab code to perform the DetMCD algorithm that
is proposed in this article. Note that this algorithm requires the Matlab library for Robust
Analysis LIBRA, which can be freely downloaded from
http://wis.kuleuven.be/stat/robust/LIBRA.html. (.m file)

Data sets: Matlab file that contains all the data sets used in this article. (.mat file)
References
Atkinson, A., Riani, M. and Cerioli, A. (2004). Exploring multivariate data with the forward
search. Springer-Verlag, New York.

Bay, S. (1999). The UCI KDD Archive, http://kdd.ics.uci.edu. Irvine, CA: University of
California, Department of Information and Computer Science.

Billor, N., Hadi, A. and Velleman, P. (2000). BACON: blocked adaptive computationally
efficient outlier nominators. Computational Statistics and Data Analysis 34(3) 279–298.

Butler, R., Davies, P. and Jhun, M. (1993). Asymptotics for the Minimum Covariance
Determinant estimator. The Annals of Statistics 21 1385–1400.

Cator, E. and Lopuhaä, H. (2009). Central limit theorem and influence function for the MCD
estimators at general multivariate distributions. Submitted.

Copt, S. and Victoria-Feser, M.-P. (2004). Fast algorithms for computing high breakdown
covariance matrices with missing data. In Theory and Applications of Recent Robust Methods
(M. Hubert, G. Pison, A. Struyf and S. Van Aelst, eds.). Statistics for Industry and Technology,
Birkhäuser, Basel.

Croux, C. and Dehon, C. (2002). Analyse canonique basée sur des estimateurs robustes de la
matrice de covariance. La Revue de Statistique Appliquée 2 5–26.

Croux, C. and Haesbroeck, G. (1999). Influence function and efficiency of the Minimum
Covariance Determinant scatter matrix estimator. Journal of Multivariate Analysis 71 161–190.
Croux, C. and Haesbroeck, G. (2000). Principal components analysis based on robust esti-
mators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika
87 603–618.

Debruyne, M. and Hubert, M. (2009). The influence function of the Stahel-Donoho covariance
estimator of smallest outlyingness. Statistics and Probability Letters 79 275–282.

Donoho, D. and Gasko, M. (1992). Breakdown properties of location estimates based on
halfspace depth and projected outlyingness. The Annals of Statistics 20 1803–1827.

Engelen, S. and Hubert, M. (2005). Fast model selection for robust calibration. Analytica
Chimica Acta 544 219–228.

Fekri, M. and Ruiz-Gazen, A. (2004). Robust weighted orthogonal regression in the errors-
in-variables model. Journal of Multivariate Analysis 88 89–108.

Flury, B. and Riedwyl, H. (1988). Multivariate statistics: a practical approach. Cambridge
University Press.

Gnanadesikan, R. and Kettenring, J. (1972). Robust estimates, residuals, and outlier de-
tection with multiresponse data. Biometrics 28 81–124.

Hadi, A., Rahmatullah Imon, H. and Werner, M. (2009). Detection of outliers. Wiley
Interdisciplinary Reviews: Computational Statistics 1 57–70.

Hardin, J. and Rocke, D. (2004). Outlier detection in the multiple cluster setting using the
minimum covariance determinant estimator. Computational Statistics and Data Analysis 44
625–638.

Hubert, M. and Debruyne, M. (2009). Minimum Covariance Determinant. Wiley Interdisci-
plinary Reviews: Computational Statistics, in press.

Hubert, M. and Engelen, S. (2007). Fast cross-validation for high-breakdown resampling
algorithms for PCA. Computational Statistics and Data Analysis 51 5013–5024.

Hubert, M. and Rousseeuw, P. (1996). Robust regression with both continuous and binary
regressors. Journal of Statistical Planning and Inference 57 153–163.
Hubert, M., Rousseeuw, P. and Van Aelst, S. (2008). High breakdown robust multivariate
methods. Statistical Science 23 92–119.

Hubert, M., Rousseeuw, P. and Vanden Branden, K. (2005). ROBPCA: a new approach
to robust principal components analysis. Technometrics 47 64–79.

Hubert, M. and Van Driessen, K. (2004). Fast and robust discriminant analysis. Computa-
tional Statistics and Data Analysis 45 301–320.

Hubert, M. and Vanden Branden, K. (2003). Robust methods for Partial Least Squares
Regression. Journal of Chemometrics 17 537–549.

Hubert, M. and Verboven, S. (2003). A robust PCR method for high-dimensional regressors.
Journal of Chemometrics 17 438–452.

Kendall, M. (1975). Multivariate Analysis. Griffin, London.

Lee, J. (1992). Relationships between Properties of Pulp-Fibre and Paper. Ph.D. thesis,
University of Toronto.

Lopuhaä, H. and Rousseeuw, P. (1991). Breakdown points of affine equivariant estimators of
multivariate location and covariance matrices. The Annals of Statistics 19 229–248.

Maronna, R. and Zamar, R. (2002). Robust estimates of location and dispersion for high-
dimensional data sets. Technometrics 44 307–317.

Pison, G., Rousseeuw, P., Filzmoser, P. and Croux, C. (2003). Robust factor analysis.
Journal of Multivariate Analysis 84 145–172.

Pison, G. and Van Aelst, S. (2004). Diagnostic plots for robust multivariate methods. Journal
of Computational and Graphical Statistics 13 310–329.

Pison, G., Van Aelst, S. and Willems, G. (2002). Small sample corrections for LTS and
MCD. Metrika 55 111–123.

Rousseeuw, P. (1984). Least median of squares regression. Journal of the American Statistical
Association 79 871–880.
Rousseeuw, P. and Croux, C. (1993). Alternatives to the median absolute deviation. Journal
of the American Statistical Association 88 1273–1283.

Rousseeuw, P., Van Aelst, S., Van Driessen, K. and Agulló, J. (2004). Robust multi-
variate regression. Technometrics 46 293–305.

Rousseeuw, P. and Van Driessen, K. (1999). A fast algorithm for the Minimum Covariance
Determinant estimator. Technometrics 41 212–223.

Rousseeuw, P. and Van Driessen, K. (2006). Computing LTS regression for large data sets.
Data Mining and Knowledge Discovery 12 29–45.

Salibian-Barrera, M., Van Aelst, S. and Willems, G. (2006). PCA based on multivariate
MM-estimators with fast and robust bootstrap. Journal of the American Statistical Association
101 1198–1211.

Salibian-Barrera, M., Willems, G. and Zamar, R. (2008). The fast-τ estimator for regres-
sion. Journal of Computational and Graphical Statistics 17 659–682.

Salibian-Barrera, M. and Yohai, V. J. (2006). A fast algorithm for S-regression estimates.
Journal of Computational and Graphical Statistics 15 414–427.

Serneels, S. and Verdonck, T. (2008). Principal component analysis for data containing
outliers and missing elements. Computational Statistics and Data Analysis 52 1712–1727.

Stahel, W. (1981). Robuste Schätzungen: infinitesimale Optimalität und Schätzungen von Ko-
varianzmatrizen. Ph.D. thesis, ETH Zürich.

Vanden Branden, K. and Hubert, M. (2005). Robust classification in high dimensions based
on the SIMCA method. Chemometrics and Intelligent Laboratory Systems 79 10–21.

Verboven, S. and Hubert, M. (2005). LIBRA: a Matlab library for robust analysis. Chemo-
metrics and Intelligent Laboratory Systems 75 127–136.

Visuri, S., Koivunen, V. and Oja, H. (2000). Sign and rank covariance matrices. Journal of
Statistical Planning and Inference 91 557–575.
Wiegand, P., Pell, R. and Comas, E. (2009). Simultaneous variable selection and outlier
detection using a robust genetic algorithm. Chemometrics and Intelligent Laboratory Systems
98 108–114.

Willems, G., Joe, H. and Zamar, R. (2009). Diagnosing multivariate outliers detected by
robust estimators. Journal of Computational and Graphical Statistics 18(1) 73–91.

Wise, B., Gallagher, N., Bro, R., Shaver, J., Windig, W. and Koch, R. (2006).
PLS Toolbox 4.0 for use with MATLAB. Software, Eigenvector Research, Inc.
URL http://software.eigenvector.com/

Yohai, V. and Zamar, R. (1988). High breakdown point estimates of regression by means of the
minimization of an efficient scale. Journal of the American Statistical Association 83 406–413.