
SECTION OF STATISTICS
DEPARTMENT OF MATHEMATICS
KATHOLIEKE UNIVERSITEIT LEUVEN

TECHNICAL REPORT TR-10-01

A DETERMINISTIC ALGORITHM FOR THE MCD

Hubert, M., Rousseeuw, P.J., Verdonck, T.

http://wis.kuleuven.be/stat/


    A deterministic algorithm for the MCD

Mia Hubert
Department of Mathematics, Katholieke Universiteit Leuven

and Peter J. Rousseeuw
Department of Mathematics, Katholieke Universiteit Leuven

and Tim Verdonck
Department of Mathematics and Computer Science, Universiteit Antwerpen

    January 4, 2010

    Abstract

The minimum covariance determinant (MCD) method is a robust estimator of multivariate location and scatter (Rousseeuw, 1984). The MCD is highly resistant to outliers, and it is often applied by itself and as a building block for other robust multivariate methods.

Computing the exact MCD is very hard, so in practice one resorts to approximate algorithms. Most often the FASTMCD algorithm of Rousseeuw and Van Driessen (1999) is used. This algorithm starts by drawing many random subsets, followed by so-called concentration steps. The FASTMCD algorithm is affine equivariant but not permutation invariant. In this article we present a deterministic algorithm, denoted as DetMCD, which does not use random subsets and is even faster. It is permutation invariant and very close to affine equivariant. We illustrate DetMCD on real and simulated data sets, with applications involving principal component analysis, multivariate regression, and classification. Supplemental material (Matlab code of the DetMCD algorithm and the data sets) is available online.

    Keywords: affine equivariance, covariance, outliers, multivariate, robustness.

    1 Introduction

    The Minimum Covariance Determinant (MCD) method (Rousseeuw, 1984) is a highly robust

estimator of multivariate location and scatter. Given an $n \times p$ data matrix $X = (x_1, \ldots, x_n)^T$ with $x_i = (x_{i1}, \ldots, x_{ip})^T$, its objective is to find $h$ observations (with $n/2 \leq h \leq n$) whose covariance

matrix has the lowest determinant. The MCD estimate of location is then the average of these $h$ points, and the scatter estimate is a multiple of their covariance matrix. Consistency and asymptotic normality of the MCD estimator have been shown by Butler et al. (1993) and Cator and Lopuhaä (2009). The MCD has a bounded influence function (Croux and Haesbroeck, 1999). The breakdown value is the smallest amount of contamination that can have an arbitrarily large effect on the estimator. The MCD estimator has the highest possible breakdown value (i.e. 50%) when $h = \lfloor (n + p + 1)/2 \rfloor$ (Lopuhaä and Rousseeuw, 1991). In practice we often do not need the maximal breakdown value, and therefore typically $h = 0.75n$ is chosen, yielding a breakdown value of 25%, which is sufficiently robust for most applications.

In addition to being highly resistant to outliers, the MCD is affine equivariant, i.e. the estimates behave properly under affine transformations of the data. To be precise, the estimators $\hat{\mu}$ and $\hat{\Sigma}$ are affine equivariant if for any $n \times p$ data set $X$ it holds that

$$\hat{\mu}(XA + 1_n v^T) = \hat{\mu}(X)A + v^T \qquad (1)$$

$$\hat{\Sigma}(XA + 1_n v^T) = A^T \hat{\Sigma}(X) A \qquad (2)$$

for all nonsingular $p \times p$ matrices $A$ and all $p \times 1$ vectors $v$. The vector $1_n$ denotes $(1, 1, \ldots, 1)^T$ with $n$ entries. Affine equivariance makes the analysis independent of the measurement scales of the variables, as well as of translations and rotations of the data.

Although the MCD was already introduced in 1984, its practical use only became feasible since the introduction of the computationally efficient FASTMCD algorithm of Rousseeuw and Van Driessen (1999). Since then the MCD has been applied in various fields such as quality control, medicine, finance, image analysis and chemistry; see e.g. Hubert et al. (2008) and Hubert and Debruyne (2009) for references. The MCD is also being used as a basis to develop robust and computationally efficient multivariate techniques, such as principal component analysis (Croux and Haesbroeck, 2000; Hubert et al., 2005), factor analysis (Pison et al., 2003), classification (Hubert and Van Driessen, 2004; Vanden Branden and Hubert, 2005), clustering (Hardin and Rocke, 2004), and multivariate regression (Rousseeuw et al., 2004). For a review see Hubert et al. (2008).

The FASTMCD algorithm starts by drawing random subsets of size $p+1$. It needs to draw many in order to obtain at least one that is outlier-free. Starting from each subset several iteration steps are taken, as will be described in the next section. The overall computation time of FASTMCD is thus roughly proportional to the number of initial subsets.

If one is willing to give up the affine equivariance requirement, certain robust covariance matrices can be computed much faster. This is the idea behind the BACON algorithm (Billor et al., 2000; Hadi et al., 2009), the spatial sign and rank covariance matrices (Visuri et al., 2000), and the OGK estimator (Maronna and Zamar, 2002).

In this article we present a deterministic algorithm for the MCD, denoted as DetMCD, which does not use random subsets and runs even faster than FASTMCD. Unlike the latter it is permutation invariant, i.e. the result does not depend on the order of the observations in the data set. It starts from only a few well-chosen initial estimates. In Section 2 we give brief descriptions of FASTMCD and the OGK estimator, since parts of both are used in Section 3 to construct the new DetMCD algorithm. Section 4 reports on an extensive simulation study, showing that DetMCD is as robust as FASTMCD. In Section 5 we show that DetMCD is permutation invariant and close to affine equivariant. Section 6 illustrates the algorithm on several real data sets with applications involving principal component analysis, multivariate regression, and discriminant analysis.

    2 FASTMCD and OGK

In this section we briefly describe the FASTMCD algorithm and the OGK estimator, as our new algorithm DetMCD will use aspects of both. The observations will be denoted as $x_i$ ($i = 1, \ldots, n$), whereas the columns of our data matrix are denoted by $X_j$ ($j = 1, \ldots, p$). For a data set $X$ with estimated center $\hat{\mu}$ and scatter matrix $\hat{\Sigma}$, the statistical distance of the $i$-th observation $x_i$ will be written as

$$D(x_i, \hat{\mu}, \hat{\Sigma}) = \sqrt{(x_i - \hat{\mu})^T \hat{\Sigma}^{-1} (x_i - \hat{\mu})}.$$

2.1 The FASTMCD algorithm

A major component of the FASTMCD algorithm is the concentration step (C-step), which works as follows. Given initial estimates $\hat{\mu}_{old}$ for the center and $\hat{\Sigma}_{old}$ for the scatter matrix,

1. Compute the distances $d_{old}(i) = D(x_i, \hat{\mu}_{old}, \hat{\Sigma}_{old})$ for $i = 1, \ldots, n$.

2. Sort these distances, yielding a permutation $\pi$ for which $d_{old}(\pi(1)) \leq d_{old}(\pi(2)) \leq \ldots \leq d_{old}(\pi(n))$, and set $H = \{\pi(1), \pi(2), \ldots, \pi(h)\}$.


3. Compute $\hat{\mu}_{new} = \frac{1}{h}\sum_{i \in H} x_i$ and $\hat{\Sigma}_{new} = \frac{1}{h}\sum_{i \in H} (x_i - \hat{\mu}_{new})(x_i - \hat{\mu}_{new})^T$.

In Theorem 1 of Rousseeuw and Van Driessen (1999) it was proved that $\det(\hat{\Sigma}_{new}) \leq \det(\hat{\Sigma}_{old})$, with equality only if $\hat{\Sigma}_{new} = \hat{\Sigma}_{old}$. Therefore, if we apply C-steps iteratively, the sequence of determinants obtained in this way must converge in a finite number of steps (because there are only finitely many $h$-subsets). Since there is no guarantee that the final value of the iteration process is the global minimum of the MCD objective function, an approximate MCD solution is obtained by taking many initial $h$-subsets $H_1 \subset \{1, 2, \ldots, n\}$, applying C-steps to each, and keeping the solution with the overall lowest determinant.
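To make the C-step concrete, here is a minimal sketch in Python/NumPy (the paper's own implementation is in Matlab); the function names `c_step` and `run_c_steps` are ours, and the convergence tolerance is an illustrative choice:

```python
import numpy as np

def c_step(X, mu, Sigma, h):
    """One concentration step: keep the h points closest to (mu, Sigma)."""
    Xc = X - mu
    d2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(Sigma), Xc)  # squared distances
    H = np.argsort(d2)[:h]                       # indices of the h smallest distances
    mu_new = X[H].mean(axis=0)
    Sigma_new = np.cov(X[H], rowvar=False, bias=True)  # 1/h-normalized covariance
    return mu_new, Sigma_new

def run_c_steps(X, mu, Sigma, h, tol=1e-12, max_iter=100):
    """Iterate C-steps; det(Sigma) is monotone non-increasing, so this converges."""
    det_old = np.inf
    for _ in range(max_iter):
        mu, Sigma = c_step(X, mu, Sigma, h)
        det_new = np.linalg.det(Sigma)
        if det_old - det_new < tol:
            break
        det_old = det_new
    return mu, Sigma
```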

To construct an initial subset $H_1$, a random $(p+1)$-subset $J$ is drawn and $\hat{\mu}_0 = \frac{1}{p+1}\sum_{i \in J} x_i$ and $\hat{\Sigma}_0 = \frac{1}{p+1}\sum_{i \in J} (x_i - \hat{\mu}_0)(x_i - \hat{\mu}_0)^T$ are computed. (If $\hat{\Sigma}_0$ is singular, random points are added to $J$ until it becomes nonsingular.) Next, we apply the C-step to $(\hat{\mu}_0, \hat{\Sigma}_0)$, yielding $(\hat{\mu}_1, \hat{\Sigma}_1)$, etc. Since each C-step involves the calculation of a covariance matrix, its inverse, and the corresponding distances, we don't want to use too many. Therefore, the FASTMCD algorithm applies only two C-steps to each initial subset, and carries out further C-steps until convergence only for the ten subsets with the lowest determinant. The raw FASTMCD estimates, $\hat{\mu}_{RAWMCD}$ and $\hat{\Sigma}_{RAWMCD}$, then correspond to the empirical mean and covariance matrix of the $h$-subset with the lowest determinant.

In order to increase the statistical efficiency while retaining high robustness, reweighted estimators are computed:

$$\hat{\mu}_{FASTMCD} = \frac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}$$

$$\hat{\Sigma}_{FASTMCD} = c_1 \frac{\sum_{i=1}^n w_i (x_i - \hat{\mu}_{FASTMCD})(x_i - \hat{\mu}_{FASTMCD})^T}{\sum_{i=1}^n w_i - 1}$$

where $c_1$ is a correction factor to obtain consistency when the data come from a multivariate normal distribution (Pison et al., 2002) and $w_i$ is an appropriate weight function, e.g.

$$w_i = \begin{cases} 1 & \text{if } D(x_i, \hat{\mu}_{RAWMCD}, \hat{\Sigma}_{RAWMCD}) \leq \sqrt{\chi^2_{p,0.975}} \\ 0 & \text{otherwise} \end{cases}$$

with $\chi^2_{p,\alpha}$ the $\alpha$-quantile of the $\chi^2_p$ distribution.
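A sketch of this reweighting step in Python/NumPy, under the same caveats as above (the consistency factor $c_1$ is passed in rather than derived, and the helper name `reweight` is ours):

```python
import numpy as np
from scipy.stats import chi2

def reweight(X, mu_raw, Sigma_raw, c1=1.0):
    """Hard-rejection reweighting based on the raw MCD distances.
    c1 is the consistency factor (Pison et al., 2002); set to 1 for brevity."""
    n, p = X.shape
    Xc = X - mu_raw
    d2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(Sigma_raw), Xc)
    w = (d2 <= chi2.ppf(0.975, p)).astype(float)   # w_i = 1 iff D_i <= sqrt(chi2_{p,.975})
    mu = (w[:, None] * X).sum(axis=0) / w.sum()
    Xm = X - mu
    Sigma = c1 * (w[:, None] * Xm).T @ Xm / (w.sum() - 1)
    return mu, Sigma
```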

Implementations of the FASTMCD algorithm are available in the package S-PLUS (as the built-in function cov.mcd), in R (as part of the packages rrcov, robust and robustbase), in SAS/IML Version 7, and in SAS Version 9 (in PROC ROBUSTREG). FASTMCD is also part of LIBRA, a Matlab LIBrary for Robust Analysis (Verboven and Hubert, 2005), as the function mcdcov. Moreover, it is available in the PLS Toolbox of Eigenvector Research (Wise et al., 2006) used in chemometrics.

    2.2 The OGK estimator

Maronna and Zamar (2002) presented a general method to obtain positive definite and approximately affine equivariant robust scatter matrices starting from any pairwise robust scatter matrix. This method was applied to the robust covariance estimate of Gnanadesikan and Kettenring (1972). The resulting multivariate location and scatter estimates are called orthogonalized Gnanadesikan-Kettenring (OGK) estimates and are calculated as follows:

1. Let $m(\cdot)$ and $s(\cdot)$ be robust univariate estimators of location and scale.

2. Construct $y_i = D^{-1} x_i$ for $i = 1, \ldots, n$ with $D = \mathrm{diag}(s(X_1), \ldots, s(X_p))$.

3. Compute the correlation matrix $U$ of the variables of $Y = (Y_1, \ldots, Y_p)$, given by $u_{jk} = \frac{1}{4}\left( s(Y_j + Y_k)^2 - s(Y_j - Y_k)^2 \right)$.

4. Compute the matrix $E$ of eigenvectors of $U$ and

(a) project the data on these eigenvectors, i.e. $V = YE$;

(b) compute robust variances of $V = (V_1, \ldots, V_p)$, i.e. $\Lambda = \mathrm{diag}(s^2(V_1), \ldots, s^2(V_p))$;

(c) set $\hat{\mu}(Y) = E\,m$ where $m = (m(V_1), \ldots, m(V_p))^T$, and compute the positive definite matrix $\hat{\Sigma}(Y) = E \Lambda E^T$.

5. Transform back to $X$, i.e. $\hat{\mu}_{RAWOGK} = D\,\hat{\mu}(Y)$ and $\hat{\Sigma}_{RAWOGK} = D\,\hat{\Sigma}(Y)\,D^T$.

In the OGK algorithm $m(\cdot)$ is a weighted mean and $s(\cdot)$ is the $\tau$-scale of Yohai and Zamar (1988). Step 2 makes the estimate scale equivariant, whereas the following steps are a kind of principal components that replace the eigenvalues of $U$ (which may be negative) by robust variances. As in the FASTMCD algorithm the estimate is improved by a reweighting step, where the cutoff value in the weight function is now taken as $c = \chi^2_{p,0.9}\,\mathrm{med}(d_1, \ldots, d_n)/\chi^2_{p,0.5}$ with $d_i = D(x_i, \hat{\mu}_{RAWOGK}, \hat{\Sigma}_{RAWOGK})$. The reweighted estimates are denoted as $\hat{\mu}_{OGK}$ and $\hat{\Sigma}_{OGK}$.
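The following sketch traces steps 1-5 in Python/NumPy. For brevity it substitutes the median and the normal-consistent MAD for the weighted mean and $\tau$-scale actually used in OGK, so its output will differ from the reference implementation:

```python
import numpy as np
from scipy.stats import median_abs_deviation

def raw_ogk(X, m=np.median, s=lambda a: median_abs_deviation(a, scale='normal')):
    """Raw OGK estimate with median/MAD substituted for the paper's m(.) and s(.)."""
    n, p = X.shape
    D = np.diag([s(X[:, j]) for j in range(p)])
    Y = X @ np.linalg.inv(D)                        # step 2: scale the variables
    U = np.eye(p)                                   # step 3: pairwise GK correlations
    for j in range(p):
        for k in range(j):
            U[j, k] = U[k, j] = 0.25 * (s(Y[:, j] + Y[:, k])**2
                                        - s(Y[:, j] - Y[:, k])**2)
    E = np.linalg.eigh(U)[1]                        # step 4: eigenvectors of U
    V = Y @ E                                       # (a) projected data
    Lam = np.diag([s(V[:, j])**2 for j in range(p)])   # (b) robust variances
    mu_Y = E @ np.array([m(V[:, j]) for j in range(p)])  # (c) center and scatter of Y
    Sigma_Y = E @ Lam @ E.T
    return D @ mu_Y, D @ Sigma_Y @ D.T              # step 5: transform back to X
```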


    3 Deterministic MCD algorithm

    3.1 General procedure

In this section we present an alternative algorithm to calculate the MCD. First we standardize each variable $X_j$ by subtracting its median and dividing by the $Q_n$ scale estimator of Rousseeuw and Croux (1993). This standardization makes the algorithm location and scale equivariant, i.e. (1) and (2) hold for any non-singular diagonal matrix $A$. (We also looked into centering by the spatial median, but based on speed considerations and simulation results we stayed with the coordinatewise median.) The standardized data set is denoted by $Z$ with rows $z_i^T$ ($i = 1, \ldots, n$) and columns $Z_j$ ($j = 1, \ldots, p$).

Next, we construct seven initial estimates $\hat{\mu}_k(Z)$ and $\hat{\Sigma}_k(Z)$ ($k = 1, \ldots, 7$) for the center and scatter of $Z$. Apart from the last one, each computes a preliminary estimate $S_k$ of the covariance or correlation matrix of $Z$. They will be described in Section 3.2. As these $S_k$ may have very inaccurate eigenvalues, we apply the following steps to each. Note that the first two steps are similar to steps 4(a) and 4(b) of the OGK algorithm:

1. Compute the matrix $E$ of eigenvectors of $S_k$ and put $B = ZE$.

2. Estimate the covariance of $Z$ by $\hat{\Sigma}_k(Z) = E L E^T$ where $L = \mathrm{diag}(Q_n^2(B_1), \ldots, Q_n^2(B_p))$.

3. To estimate the center of $Z$ we sphere the data, apply the coordinatewise median, and transform it back, i.e. $\hat{\mu}_k(Z) = \hat{\Sigma}_k^{1/2}(Z)\ \mathrm{med}(Z\,\hat{\Sigma}_k^{-1/2}(Z))$.

For all seven estimates $(\hat{\mu}_k(Z), \hat{\Sigma}_k(Z))$ we then compute the statistical distances

$$d_{ik} = D(z_i, \hat{\mu}_k(Z), \hat{\Sigma}_k(Z)). \qquad (3)$$

For each initial estimate $k$ we take the $h$ observations with smallest $d_{ik}$ and apply C-steps until convergence. The solution with the smallest determinant is called the raw DetMCD. Then we apply a reweighting step as in the FASTMCD algorithm, yielding the final DetMCD.
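A sketch of this refinement of one preliminary estimate $S_k$, in Python/NumPy. The paper uses the $Q_n$ scale estimator; since NumPy/SciPy ship no $Q_n$, the normal-consistent MAD is substituted as a stand-in, and the helper name `refine_initial` is ours:

```python
import numpy as np
from scipy.stats import median_abs_deviation as mad

def refine_initial(Z, S_k, h, scale=lambda a: mad(a, scale='normal')):
    """Turn a preliminary scatter S_k into (mu_k, Sigma_k) and an initial h-subset."""
    n, p = Z.shape
    E = np.linalg.eigh(S_k)[1]                     # step 1: eigenvectors of S_k
    B = Z @ E
    L = np.diag([scale(B[:, j])**2 for j in range(p)])
    Sigma = E @ L @ E.T                            # step 2: corrected covariance
    w, V = np.linalg.eigh(Sigma)                   # symmetric square roots of Sigma
    root = V @ np.diag(np.sqrt(w)) @ V.T
    root_inv = V @ np.diag(1 / np.sqrt(w)) @ V.T
    mu = root @ np.median(Z @ root_inv, axis=0)    # step 3: sphere, median, map back
    Zc = Z - mu
    d2 = np.einsum('ij,jk,ik->i', Zc, np.linalg.inv(Sigma), Zc)
    H0 = np.argsort(d2)[:h]                        # initial h-subset for the C-steps
    return mu, Sigma, H0
```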

    3.2 Initial scatter estimates

(1) The first initial scatter estimate is obtained by computing the hyperbolic tangent (sigmoid) of each column of $Z$, i.e. $Y_j = \tanh(Z_j)$ for $j = 1, \ldots, p$. This bounded function reduces the effect of large coordinatewise outliers. Computing the classical correlation matrix of $Y$ yields $S_1 = \mathrm{corr}(Y)$.

(2) Now let $R_j$ be the ranks of the column $Z_j$, and put $S_2 = \mathrm{corr}(R)$. This is the Spearman correlation matrix of $Z$. (Note that since the population Spearman correlation satisfies $\rho_S = \frac{6}{\pi}\sin^{-1}(\rho/2)$ for bivariate normal distributions with correlation coefficient $\rho$ (Kendall, 1975), we also applied the inverse transformation to each element of $S_2$. As it did not improve the results, we did not retain this option.)

(3) For $S_3$ we compute normal scores from the ranks $R_j$, namely $T_j = \Phi^{-1}((R_j - 1/3)/(n + 1/3))$ where $\Phi(\cdot)$ is the normal cumulative distribution function, and set $S_3 = \mathrm{corr}(T)$.

(4) The fourth scatter estimate is based on the spatial sign covariance matrix (Visuri et al., 2000). Define $k_i = z_i / \lVert z_i \rVert$ for all $i$ and let $S_4 = \mathrm{cov}(K)$. (Note that this is not the usual spatial sign covariance matrix because the $z_i$ were centered by the coordinatewise median instead of the spatial median, to save computation time.) We also tried the spatial rank covariance matrix, i.e. the covariance matrix of the spatial ranks $r_i = \frac{1}{n}\sum_{j=1}^n (z_i - z_j)/\lVert z_i - z_j \rVert$, but this matrix requires $O(n^2)$ operations (for fixed $p$) whereas the other estimates only require $O(n \log n)$ time, and it did not improve the performance of the algorithm.

(5) For $S_5$ we take the first step of the BACON algorithm (Billor et al., 2000). Consider the $n/2$ standardized observations $z_i$ with smallest norm, and compute their mean and covariance matrix. (Note that the BACON algorithm starts with a smaller set.)

(6) The sixth scatter estimate is the raw OGK estimator. For $m(\cdot)$ and $s(\cdot)$ we used the median and $Q_n$, for reasons of simplicity (no choice of tuning parameters) and to be consistent with the other components of DetMCD.

(7) Finally we consider the classical mean $\hat{\mu}_7(Z)$ and covariance matrix $\hat{\Sigma}_7(Z)$ of the full data set. This initial estimate is not robust, but it is fast and accurate for uncontaminated data.
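The first five preliminary scatter matrices each take only a few lines. A sketch in Python/NumPy/SciPy (the function name is ours; the OGK and classical estimates of items (6)-(7) are omitted since they appeared above):

```python
import numpy as np
from scipy.stats import norm, rankdata

def initial_scatters(Z):
    """Five cheap preliminary scatter estimates, computed on standardized data Z."""
    n, p = Z.shape
    S = []
    S.append(np.corrcoef(np.tanh(Z), rowvar=False))          # (1) tanh transform
    R = rankdata(Z, axis=0)
    S.append(np.corrcoef(R, rowvar=False))                   # (2) Spearman
    T = norm.ppf((R - 1/3) / (n + 1/3))
    S.append(np.corrcoef(T, rowvar=False))                   # (3) normal scores
    K = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    S.append(np.cov(K, rowvar=False))                        # (4) spatial signs
    idx = np.argsort(np.linalg.norm(Z, axis=1))[:n // 2]
    S.append(np.cov(Z[idx], rowvar=False))                   # (5) BACON first step
    return S
```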


    4 Simulation study

    We will compare the new DetMCD algorithm with FASTMCD.

    4.1 Simulation design

The simulation is similar to the setup of Maronna and Zamar (2002). Because the DetMCD estimates are not fully affine equivariant, their behavior may depend on the covariance structure, hence we need to generate correlated data. These are obtained by first generating uncorrelated normal data $y_i \sim N_p(0, I)$ and applying an affine transformation $x_i = G y_i$ to them, where $G$ is the matrix with $G_{jj} = 1$ and $G_{jk} = \rho$ for $j \neq k$. If there is no contamination ($\varepsilon = 0$), $X$ has covariance matrix $G^2$, and the squared multiple correlation $\rho^2_{mult}$ (which is the $R^2$ obtained by regressing any coordinate of $X$ on all of the others) can be calculated as a function of $\rho$. In the simulations we have taken $\rho$ such that $\rho_{mult} = 0.75$, which is a rather collinear situation.

Outliers were generated in $y$-space, and the same affine transformation $G$ was applied to them. We considered three types of contamination: point contamination, cluster contamination, and radial contamination. In all cases $y_i \sim N_p(0, I)$ for $i = 1, \ldots, n - m$ where $m = \varepsilon n$ and $\varepsilon$ is the percentage of contamination. Point contamination was obtained as in Maronna and Zamar (2002) by generating $y_i \sim N_p(y_0, \sigma^2 I)$ for $i = n - m + 1, \ldots, n$ with $\sigma = 0.1$ and $y_0 = r\,a_0$, where $a_0$ is a unit vector generated orthogonal to $(1, 1, \ldots, 1)^T$. The value of $r$, which determines the distance between the outliers and the main center, was varied with the data dimension as specified below. Cluster contamination was generated by shifting the center while using the same covariance matrix, i.e. $y_i \sim N_p([10, 10, 0_{p-2}], I)$. For radial contamination many observations were generated from the distribution $N_p(0, 5I)$ and as radial outliers we took the first $m$ observations whose statistical distance exceeded the cutoff value $\sqrt{\chi^2_{p,0.8}}$.

Different data sizes were considered, namely

A: $n = 100$ and $p = 2$ ($r = 50$)

B: $n = 100$ and $p = 5$ ($r = 100$)

C: $n = 200$ and $p = 10$ ($r = 150$)


and different contamination levels were investigated, namely $\varepsilon = 0\%$, 10%, and 20%. We always put $h$, the number of observations whose covariance determinant will be minimized, equal to the default value $0.75n$ in both FASTMCD and DetMCD, so that the algorithms can resist about 25% of outliers.
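As an illustration, a sketch that generates one such contaminated data set (Python/NumPy). The value of $\rho$ corresponding to $\rho_{mult} = 0.75$ is not derived here, and the fixed vector $a_0$ is just one convenient choice orthogonal to $(1, \ldots, 1)^T$ rather than a randomly generated one:

```python
import numpy as np

def generate_data(n, p, rho, eps, r, rng):
    """Clean y ~ N_p(0, I) plus point contamination, then x_i = G y_i."""
    m = int(round(eps * n))
    Y = rng.standard_normal((n, p))
    if m > 0:
        a0 = np.ones(p)
        a0[0] = -(p - 1)                 # orthogonal to (1,...,1)^T by construction
        a0 /= np.linalg.norm(a0)
        Y[n - m:] = r * a0 + 0.1 * rng.standard_normal((m, p))  # sigma = 0.1
    G = np.full((p, p), rho)
    np.fill_diagonal(G, 1.0)
    return Y @ G.T                       # affine transformation x_i = G y_i

# e.g. configuration B with 10% point contamination (rho = 0.5 is illustrative)
X = generate_data(n=100, p=5, rho=0.5, eps=0.10, r=100, rng=np.random.default_rng(0))
```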

For both the FASTMCD and DetMCD algorithms we compute the raw and the reweighted location vectors $\hat{\mu}_{raw}(X)$ and $\hat{\mu}(X)$, and the raw and the reweighted scatter matrices $\hat{\Sigma}_{raw}(X)$ and $\hat{\Sigma}(X)$. The corresponding estimators for the data set $Y$ are obtained by transforming back, i.e. $\hat{\mu}(Y) = G^{-1}\hat{\mu}(X)$ and $\hat{\Sigma}(Y) = G^{-1}\hat{\Sigma}(X)G^{-1}$. The following performance measures were considered:

- The objective function of the raw scatter estimator, $\mathrm{OBJ} = \det(\hat{\Sigma}_{raw}(Y))$.

- An error measure of the location estimator, given by $e_\mu = \lVert \hat{\mu}(Y) \rVert^2$.

- An error measure of the scatter estimate, defined as the logarithm of its condition number: $e_\Sigma = \log_{10}(\mathrm{cond}(\hat{\Sigma}(Y)))$.

- The computation time $t$ (in seconds).

Each of these performance measures should be as close to zero as possible. All simulations were run in MATLAB R2007a (The MathWorks, Natick, MA). We wrote new code for DetMCD, whereas the FASTMCD was obtained from the mcdcov function in the Matlab library LIBRA (Verboven and Hubert, 2005).
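For reference, the three accuracy measures are direct one-liners (Python/NumPy); the inputs are assumed to be the estimates already transformed back to $y$-space:

```python
import numpy as np

def performance(mu_Y, Sigma_Y, Sigma_raw_Y):
    """The three accuracy measures; each should be close to zero."""
    OBJ = np.linalg.det(Sigma_raw_Y)                 # objective of the raw scatter
    e_mu = np.linalg.norm(mu_Y)**2                   # location error (true center is 0)
    e_Sigma = np.log10(np.linalg.cond(Sigma_Y))      # log condition number of scatter
    return OBJ, e_mu, e_Sigma
```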

    4.2 Simulation results

Table 1 shows the simulation results for clean data (without contamination) for the different data sizes. Each entry is the average (over 100 runs) of the performance measure in question. We see that the algorithms perform similarly on the first three performance criteria. The objective function attained by FASTMCD was on average slightly smaller than that of DetMCD, but this difference is not significant (according to the Mann-Whitney test). Also the differences in $e_\mu$ and $e_\Sigma$ are not significant. Moreover we see that DetMCD is much faster. These results also hold when outliers are present in the data, irrespective of the type or fraction of contamination, as can be


Table 1: Simulation results for clean data.

                 FASTMCD   DetMCD
    A  OBJ        0.2658   0.2666
       e_mu       0.0236   0.0231
       e_Sigma    0.1507   0.1445
       t          1.2832   0.0434
    B  OBJ        0.1409   0.1421
       e_mu       0.0597   0.0592
       e_Sigma    0.3606   0.3478
       t          1.3943   0.0616
    C  OBJ        0.0743   0.0746
       e_mu       0.0560   0.0560
       e_Sigma    0.3773   0.3734
       t          2.3281   0.1352

Table 2: Simulation results for data with 10% of contamination.

                       Point               Cluster              Radial
                 FASTMCD  DetMCD     FASTMCD  DetMCD     FASTMCD  DetMCD
    A  OBJ        0.3873   0.3890     0.3873   0.3889     0.3873   0.3880
       e_mu       0.0231   0.0231     0.0231   0.0234     0.0231   0.0238
       e_Sigma    0.1376   0.1375     0.1376   0.1361     0.1376   0.1370
       t          1.2812   0.0446     1.2917   0.0452     1.2962   0.0423
    B  OBJ        0.2412   0.2425     0.2411   0.2426     0.2411   0.2426
       e_mu       0.0571   0.0570     0.0570   0.0566     0.0577   0.0565
       e_Sigma    0.3403   0.3364     0.3404   0.3384     0.3403   0.3364
       t          1.3786   0.0583     1.3910   0.0585     1.3805   0.0567
    C  OBJ        0.1490   0.1499     0.1490   0.1499     0.1490   0.1499
       e_mu       0.0579   0.0566     0.0578   0.0571     0.0576   0.0567
       e_Sigma    0.3720   0.3693     0.3719   0.3694     0.3725   0.3695
       t          2.2429   0.1303     2.2359   0.1314     2.1922   0.1259


Table 3: Simulation results for data with 20% of contamination.

                       Point               Cluster              Radial
                 FASTMCD  DetMCD     FASTMCD  DetMCD     FASTMCD  DetMCD
    A  OBJ        0.6258   0.6267     0.6258   0.6268     0.6258   0.6261
       e_mu       0.0256   0.0256     0.0256   0.0256     0.0306   0.0313
       e_Sigma    0.1283   0.1283     0.1283   0.1283     0.1583   0.1597
       t          1.3077   0.0422     1.2945   0.0418     1.3042   0.0400
    B  OBJ        0.4955   0.4968     0.4955   0.4965     0.4956   0.4963
       e_mu       0.0621   0.0622     0.0621   0.0622     0.0621   0.0621
       e_Sigma    0.3302   0.3308     0.3302   0.3292     0.3302   0.3298
       t          1.3762   0.0536     1.3948   0.0565     1.3791   0.0532
    C  OBJ        0.4001   0.4015     0.4001   0.4017     0.4002   0.4015
       e_mu       0.0633   0.0630     0.0633   0.0630     0.0633   0.0631
       e_Sigma    0.3786   0.3772     0.3786   0.3780     0.3778   0.3781
       t          2.2410   0.1215     2.2495   0.1312     2.1975   0.1199

seen in Tables 2 and 3. We conclude that DetMCD is a fast and robust alternative to FASTMCD.

For DetMCD, Figure 1 shows how many times on average each initial subset led (after convergence) to the smallest value of the objective function. Figure 1(a) shows this for the uncontaminated case, whereas Figures 1(b) and (c) correspond to 10% and 20% of point contamination. For clustered and radial contamination we obtained similar figures, hence they are not included here. We immediately see that the first subset (using the hyperbolic tangent transformation) is often best in low dimensions. In higher dimensions the frequencies are more evenly distributed (except that for contaminated data, the initial estimate based on the classical mean and covariance matrix is rarely selected). Therefore, we kept all seven initial $h$-subsets in the algorithm. Figure 2 shows for each initial $h$-subset how many C-steps were needed on average to reach convergence. Typically 3 or 4 C-steps were sufficient, so the DetMCD algorithm used around 25 C-steps in all, compared to over 1000 in FASTMCD.


[Figure 1: three bar charts, panels (a)-(c); horizontal axis: H-subset 1-7; vertical axis: number of times selected; series: n=100,p=2; n=100,p=5; n=200,p=10.]

    Figure 1: Number of times each of the seven initial subsets of DetMCD led to the best objective

    function, for (a) 0%, (b) 10%, and (c) 20% of point contamination.

    5 Properties of DetMCD

    5.1 Affine equivariance

DetMCD is no longer fully affine equivariant due to the construction of the initial estimates. We will measure its deviation from affine equivariance as was done in Maronna and Zamar (2002) for the OGK. Since DetMCD is clearly location equivariant, we can drop $v$ from (1) and (2) and only consider non-singular matrices $A$. We generate such a $p \times p$ matrix $A$ as the product of a random orthogonal matrix and a diagonal matrix $\mathrm{diag}(u_1, \ldots, u_p)$ where the $u_i$ are independent and uniformly distributed on $(0, 1)$. Let $X_A = \{Ax_1, \ldots, Ax_n\}$. We then compare the original estimates $\hat{\mu}_X = \hat{\mu}(X)$ and $\hat{\Sigma}_X = \hat{\Sigma}(X)$ with $\hat{\mu}_A = A^{-1}\hat{\mu}(X_A)$ and $\hat{\Sigma}_A = A^{-1}\hat{\Sigma}(X_A)A^{-T}$.


[Figure 2: three bar charts, panels (a)-(c); horizontal axis: H-subset 1-7; vertical axis: average number of C-steps; series: n=100,p=2; n=100,p=5; n=200,p=10.]

    Figure 2: For each initial subset of DetMCD, the average number of C-steps until convergence,

    for (a) 0%, (b) 10%, and (c) 20% of point contamination.

Maronna and Zamar (2002) measured the deviation from equivariance by $d_\mu = \lVert \hat{\mu}_A - \hat{\mu}_X \rVert$ and $d_\Sigma = \mathrm{cond}(\hat{\Sigma}_X^{-1/2}\,\hat{\Sigma}_A\,\hat{\Sigma}_X^{-1/2})$. Note that affine equivariant estimators satisfy $d_\mu = 0$ and $d_\Sigma = 1$.
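A sketch of this procedure for a single random $A$ (Python/NumPy); `estimator` stands for any function returning a location vector and a scatter matrix, an interface we assume for illustration:

```python
import numpy as np

def equivariance_deviation(estimator, X, rng):
    """Deviation from affine equivariance for one random transformation A."""
    n, p = X.shape
    Q = np.linalg.qr(rng.standard_normal((p, p)))[0]   # random orthogonal matrix
    A = Q @ np.diag(rng.uniform(0, 1, p))              # times diag(u_1,...,u_p)
    mu_X, Sig_X = estimator(X)
    mu_XA, Sig_XA = estimator(X @ A.T)                 # X_A = {A x_1, ..., A x_n}
    Ainv = np.linalg.inv(A)
    mu_A = Ainv @ mu_XA
    Sig_A = Ainv @ Sig_XA @ Ainv.T
    w, V = np.linalg.eigh(Sig_X)
    W = V @ np.diag(w**-0.5) @ V.T                     # Sigma_X^{-1/2}
    d_mu = np.linalg.norm(mu_A - mu_X)
    d_Sigma = np.linalg.cond(W @ Sig_A @ W)
    return d_mu, d_Sigma
```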

First, we study the deviation from affine equivariance on the ionospheric data taken from Bay (1999) and preprocessed as in Maronna and Zamar (2002), which has $n = 225$ observations and $p = 31$ variables. Table 4 shows the average $d_\mu$ and $d_\Sigma$ over 100 matrices $A$. Note that DetMCD was even closer to affine equivariance than OGK. (The values for FASTMCD confirm its affine equivariance.) We also considered data sets from the simulation in Section 4.1. For 20 such data sets we generated 50 matrices $A$. Tables 5, 6 and 7 report $d_\mu$ and $d_\Sigma$ for 0%, 10%, and 20% of contamination. In all cases DetMCD was closer to affine equivariance than OGK. Since Maronna and Zamar (2002) concluded that the OGK's deviation from affine equivariance was small enough


not to be concerned about, this holds even more so for DetMCD.

Table 4: Deviation from Affine Equivariance for the Ionospheric Data.

               FASTMCD      OGK      DetMCD
    d_mu         0         0.3451    0.0133
    d_Sigma      1       177.5803    2.9954

Table 5: Deviation from Affine Equivariance for the reweighted estimators on simulated data without contamination.

                  OGK     DetMCD
    A  d_mu      0.0183   0.0002
       d_Sigma   1.0397   1.0012
    B  d_mu      0.0819   0.0030
       d_Sigma   1.5229   1.0072
    C  d_mu      0.2997   0.0423
       d_Sigma   1.8086   1.1796

Table 6: Deviation from Affine Equivariance for the reweighted estimators on simulated data with 10% of contamination.

                      Point               Cluster              Radial
                  OGK     DetMCD      OGK     DetMCD      OGK     DetMCD
    A  d_mu      0.0382   0.0009     0.0436   0.0000     0.0394   0.0012
       d_Sigma   1.1840   1.0023     1.2240   1.0000     1.0995   1.0025
    B  d_mu      0.0924   0.0337     0.0866   0.0416     0.1097   0.0274
       d_Sigma   1.8610   1.0515     1.5662   1.0626     1.7491   1.0433
    C  d_mu      0.2551   0.0369     0.1672   0.0261     0.1942   0.0309
       d_Sigma   2.1819   1.1403     2.6909   1.1300     1.8949   1.1261


Table 7: Deviation from Affine Equivariance for the reweighted estimators on simulated data with 20% of contamination.

                      Point               Cluster              Radial
                  OGK     DetMCD      OGK     DetMCD      OGK     DetMCD
    A  d_mu      0.0658   0.0000     0.0837   0.0000     0.0346   0.0000
       d_Sigma   1.2592   1.0000     3.1184   1.0000     1.1487   1.0000
    B  d_mu      0.1869   0.0000     0.1464   0.0000     0.1092   0.0000
       d_Sigma   2.1709   1.0000     1.9857   1.0000     1.8884   1.0000
    C  d_mu      0.2928   0.0005     0.3770   0.0830     0.1860   0.0000
       d_Sigma   2.0009   1.0022    12.1472   4.5434     1.8217   1.0000

    5.2 Permutation invariance

Another property we are interested in is permutation invariance. An estimator $T(\cdot)$ is said to be permutation invariant if $T(PX) = T(X)$ for any data set $X$ and any permutation matrix $P$. A permutation matrix is a square matrix that has a single entry 1 in each row and each column, and zeroes elsewhere; therefore $PX$ simply permutes the rows of $X$. Note that FASTMCD is not permutation invariant because the initial subsets (generated by a pseudorandom number generator with a fixed seed) will have the same case numbers but correspond to different observations. By contrast, all ingredients of DetMCD are permutation invariant. Analogous to the previous section, the deviation from permutation invariance can be measured by $d_\mu = \lVert \hat{\mu}(PX) - \hat{\mu}(X) \rVert$ and $d_\Sigma = \mathrm{cond}(\hat{\Sigma}(X)^{-1/2}\,\hat{\Sigma}(PX)\,\hat{\Sigma}(X)^{-1/2})$. For the ionospheric data, Table 8 shows the average $d_\mu$ and $d_\Sigma$ over 100 matrices $P$. They confirm that FASTMCD is not permutation invariant, whereas OGK and DetMCD are.

Table 8: Deviation from Permutation Invariance for the Ionospheric Data.

               FASTMCD    OGK   DetMCD
    d_mu        0.0410     0      0
    d_Sigma    12.4131     1      1
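Checking permutation invariance empirically is analogous; a minimal sketch under the same assumed `estimator` interface as above (Python/NumPy):

```python
import numpy as np

def permutation_deviation(estimator, X, rng):
    """Deviation from permutation invariance for one random row permutation."""
    PX = X[rng.permutation(len(X))]                 # P X: permute the rows of X
    mu, Sigma = estimator(X)
    mu_P, Sigma_P = estimator(PX)
    w, V = np.linalg.eigh(Sigma)
    W = V @ np.diag(w**-0.5) @ V.T                  # Sigma(X)^{-1/2}
    d_mu = np.linalg.norm(mu_P - mu)
    d_Sigma = np.linalg.cond(W @ Sigma_P @ W)
    return d_mu, d_Sigma
```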


    5.3 Different values of h

We already noted that DetMCD is faster than FASTMCD when the algorithm is applied once for a fixed value of $h$. As the number of outliers should be below $n - h$, it is commonly advised to set $h \approx 0.5n$ when a large proportion of outliers could occur, and $h \approx 0.75n$ otherwise. Alternatively, one could also compute the MCD for a whole range of $h$-values, and see whether at some $h$ there is an important change in the objective function or the estimates. This is related to the forward search of Atkinson et al. (2004). With DetMCD it becomes very easy to compute the MCD for several $h$-values: since the seven initial estimates do not depend on $h$, we only need to store the resulting ordered distances (3), yielding the initial $h$-subset for any $h$. We will illustrate this feature on several examples in Section 6.
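A sketch of this reuse (Python/NumPy): given the $7 \times n$ matrix of initial distances from (3), one sort per initial estimate yields the starting $h$-subset for every $h$ in the range:

```python
import numpy as np

def h_subsets_for_all_h(dists, h_values):
    """dists: (7, n) array of initial distances d_ik from (3).
    Sorting each row once gives the starting h-subset for every requested h."""
    order = np.argsort(dists, axis=1)
    return {h: [order[k, :h] for k in range(order.shape[0])] for h in h_values}
```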

    6 Real Examples

    6.1 Philips data

Rousseeuw and Van Driessen (1999) illustrated FASTMCD on data provided by Philips, which produced diaphragm parts for television sets. When a new production line was started, the engineers measured 9 characteristics for each of the 677 parts. Applying FASTMCD with $h = 0.75n$ yielded the robust distances in Figure 3(a). Many distances exceed the cutoff value $\sqrt{\chi^2_{9,0.975}}$. In particular, the observations 491-565 are clearly different from the others, indicating that something happened in the production process. Figure 3(b) shows the robust distances using the DetMCD algorithm. They are almost identical to the FASTMCD results, with the same points being flagged as outlying. Moreover, the estimates for location and scatter were almost identical, i.e. $d_\mu = \lVert \hat{\mu}_{MCD} - \hat{\mu}_{DetMCD} \rVert = 0.0004$ and $d_\Sigma = \mathrm{cond}(\hat{\Sigma}_{MCD}^{-1/2}\,\hat{\Sigma}_{DetMCD}\,(\hat{\Sigma}_{MCD}^{-1/2})^T) = 1.0488$. Also the objective functions reached by the raw DetMCD and the raw FASTMCD were almost the same, since $\mathrm{OBJ}_{MCD}/\mathrm{OBJ}_{DetMCD} = 0.9930$. The optimal $h$-subsets that determine the raw estimates only differed in five observations. The main difference lies in the computation time: whereas FASTMCD took 4.8 seconds, DetMCD only needed 0.5 seconds.


[Figure 3: two scatter plots of robust distance versus index (0-700), panels (a) and (b).]

    Figure 3: Robust distances of the Philips data with (a) FASTMCD and (b) DetMCD.

6.2 Swiss bank notes data

As a second example we consider $p = 6$ measurements of $n = 100$ forged Swiss 1000 franc bills, from Flury and Riedwyl (1988). As shown in Salibian-Barrera et al. (2006), Pison and Van Aelst (2004), and Willems et al. (2009), this data set contains several outlying observations and highly correlated variables. Therefore it is appropriate to analyze the data with a robust PCA method. In Croux and Haesbroeck (2000) it is argued that the MCD estimator can be used for this purpose. The first three principal components are retained, because together they explain 92% of the variance. Figure 4 shows the resulting outlier maps based on FASTMCD and DetMCD using $h = 75$. On the horizontal axis they show the robust distance of each observation in the three-dimensional PCA subspace. The vertical axis shows the orthogonal distance of the observation to the PCA subspace. Such an outlier map allows one to classify observations into regular cases, good PCA leverage points, orthogonal outliers, and bad PCA leverage points (Hubert et al., 2005). We see that the outlier maps are very similar, and that the same observations are flagged as outlying. The final $h$-subsets obtained with both algorithms had $h - 1$ points in common. Again the computation times were quite different: FASTMCD took 1.5 seconds whereas DetMCD was 20 times faster.

6.3 Pulp fibre data

The MCD can also be used to perform a robust multivariate regression (Rousseeuw et al., 2004). Denoting the $q$-dimensional response variable of the $i$th observation by $y_i$, the goal of multivariate


[Figure 4: two outlier maps, panels (a) and (b); horizontal axis: score distance (3 LV); vertical axis: orthogonal distance; several outlying observations are labeled.]

    Figure 4: Outlier map of the Swiss bank notes data using robust PCA with (a) FASTMCD and

    (b) DetMCD.

linear regression is to estimate the intercept vector $\alpha$ and the slope matrix $B$ in the model

$$y_i = \alpha + B x_i + \varepsilon_i.$$

The MCD regression estimates $\hat{\alpha}$ and $\hat{B}$ are obtained by matrix operations on the MCD location and covariance estimates of the joint $(x_i, y_i)$ data.
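Concretely, with the joint location and scatter estimates partitioned into their $x$- and $y$-blocks, the slope and intercept follow from the usual regression identities; a sketch in Python/NumPy (function name ours):

```python
import numpy as np

def mcd_regression(mu, Sigma, p, q):
    """Multivariate regression from MCD estimates of the joint (x, y) data.
    mu: (p+q,) location; Sigma: (p+q, p+q) scatter; model y = alpha + B x + e."""
    Sxx = Sigma[:p, :p]                  # scatter of the predictors
    Sxy = Sigma[:p, p:]                  # cross-scatter of predictors and responses
    B = np.linalg.solve(Sxx, Sxy).T      # (q, p) slope matrix: Sigma_yx Sigma_xx^{-1}
    alpha = mu[p:] - B @ mu[:p]          # (q,) intercept vector
    return alpha, B
```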

To illustrate MCD regression we consider a data set of Lee (1992) that contains properties of $n = 62$ pulp fibres and the paper made from them. MCD regression is applied to predict the $q = 4$ paper properties from the $p = 4$ fibre characteristics. Figure 5 shows the norm of each slope $\hat{B}_j$ for all 27 values of $h$ from 35 to 61, obtained with the FASTMCD and the DetMCD algorithms. Note that FASTMCD has to start from scratch for each $h$, whereas DetMCD only needs to compute the seven initial estimates once. The algorithms yield identical results for all values of $h$. Figure 5 has a sizeable jump at $h = 52$. In fact, from $h = 52$ on the final $h$-subset contains bad leverage points. It turned out that the most severe bad leverage points were produced from fir wood, and that most of the outlying samples were obtained using different pulping processes. The 27 robust regressions together took 53.2 seconds with FASTMCD, whereas the version with DetMCD only needed 5.9 seconds.


[Figure 5: line plot; horizontal axis: h (35-60); vertical axis: norm(slope); series: FASTMCD (1st-4th slope) and DetMCD (1st-4th slope).]

    Figure 5: Norm of each slope of the pulp bre data, for different values of h.

    6.4 Fruit data

Our last example is high-dimensional. The fruit data set contains spectra of three different cultivars (with sizes 490, 106, and 500) of a type of cantaloupe, and was previously analyzed in Hubert and Van Driessen (2004). All spectra were measured at 256 wavelengths, hence the data set contains 1096 observations and 256 variables. First, we performed a robust PCA using the ROBPCA method of Hubert et al. (2005). ROBPCA mainly consists of two steps. In the first step a robust subspace is constructed based on the Stahel-Donoho outlyingness (Stahel, 1981; Donoho and Gasko, 1992; Debruyne and Hubert, 2009). Next, robust eigenvectors and eigenvalues are found within this subspace by applying the MCD to the projected observations. We consider the original ROBPCA method that uses FASTMCD in the second stage of the algorithm, and a modified ROBPCA that applies DetMCD. From the scree plot we decided to retain two principal components. Again FASTMCD and DetMCD gave identical results. We note a big group of outliers in Figure 6, which corresponds to a change in the instrument's illumination system.

Next, we applied the robust quadratic discriminant rule RQDR (Hubert and Van Driessen, 2004) to the robust two-dimensional PCA scores for values of $h$ from 550 to 1095. The RQDR method first runs the MCD estimator on each of the groups. A new datum is then assigned to the group for which it attains the largest discriminant score. Also membership probabilities for each group are estimated, as the proportion of regular observations in each group. Figure 7 shows these


[Figure 6: outlier map; horizontal axis: score distance (2 LV); vertical axis: orthogonal distance; a large group of outlying observations is labeled.]

    Figure 6: Outlier map of the fruit data using ROBPCA.

membership probabilities obtained with both methods as a function of $h$. Also here FASTMCD and DetMCD give almost the same results.

[Figure 7: line plot; horizontal axis: h (550-1095); vertical axis: membership probability for each group; series: FASTMCD and DetMCD curves for groups 1-3.]

Figure 7: Membership probabilities of each group of the fruit data, for different values of h.

We see that the membership probabilities change significantly at $h = 700$, hence there are a substantial number of outliers present. Therefore $h$ should be taken sufficiently small to obtain robust results. The entire analysis took 562 seconds when using FASTMCD, whereas it only needed 111 seconds with DetMCD. The computation time only went down by a factor of 5 here because the analyses have several parts in common, such as the computation of the discriminant scores.


    7 Conclusions and outlook

DetMCD is a new algorithm for the MCD, which needs even less time than FASTMCD. It starts from a few easily computed $h$-subsets, and then takes concentration steps until convergence. The DetMCD algorithm is deterministic in that it does not use any random subsets. It is permutation invariant and close to affine equivariant, and allows one to run the analysis for many values of $h$ without much additional computation. We illustrated DetMCD in the contexts of PCA, regression, and classification.

Also many other methods that directly or indirectly rely on the MCD (e.g. through its robust distances) may benefit from the DetMCD approach, such as robust canonical correlation (Croux and Dehon, 2002), robust regression with continuous and categorical regressors (Hubert and Rousseeuw, 1996), robust errors-in-variables regression (Fekri and Ruiz-Gazen, 2004), robust principal component regression (Hubert and Verboven, 2003), and robust partial least squares (Hubert and Vanden Branden, 2003). In particular, on-line applications or procedures that require the MCD to be computed many times, such as genetic algorithms (Wiegand et al., 2009), will become more efficient. The cross-validation techniques of Hubert and Engelen (2007) and Engelen and Hubert (2005) may benefit from the fact that DetMCD is easily updated when an observation is added or removed. Following Copt and Victoria-Feser (2004) and Serneels and Verdonck (2008) we will also investigate whether DetMCD can be extended to the missing data framework.

The DetMCD algorithm will be made available in Matlab as part of LIBRA (Verboven and Hubert, 2005). An implementation in R will also be provided.

The random sampling mechanism is currently used for many other high-breakdown robust estimators. Our deterministic approach could improve on those algorithms as well. In particular we intend to study a deterministic algorithm for S-estimators and $\tau$-estimators, for which algorithms in the spirit of FASTMCD were developed recently (Salibian-Barrera and Yohai, 2006; Salibian-Barrera et al., 2008). We will also work on a deterministic algorithm for LTS regression, which is typically computed with the FASTLTS algorithm (Rousseeuw and Van Driessen, 2006).

SUPPLEMENTAL MATERIALS

Matlab code for the DetMCD algorithm: Matlab code to perform the DetMCD algorithm that is proposed in this article. Note that this algorithm requires the Matlab library for Robust Analysis LIBRA, which can be freely downloaded from http://wis.kuleuven.be/stat/robust/LIBRA.html. (.m file)

Data sets: Matlab file that contains all the data sets used in this article. (.mat file)

    References

Atkinson, A., Riani, M. and Cerioli, A. (2004). Exploring Multivariate Data with the Forward Search. Springer-Verlag, New York.

Bay, S. (1999). The UCI KDD Archive, http://kdd.ics.uci.edu. Irvine, CA: University of California, Department of Information and Computer Science.

Billor, N., Hadi, A. and Velleman, P. (2000). BACON: blocked adaptive computationally efficient outlier nominators. Computational Statistics and Data Analysis 34(3) 279–298.

Butler, R., Davies, P. and Jhun, M. (1993). Asymptotics for the Minimum Covariance Determinant estimator. The Annals of Statistics 21 1385–1400.

Cator, E. and Lopuhaä, H. (2009). Central limit theorem and influence function for the MCD estimators at general multivariate distributions. Submitted.

Copt, S. and Victoria-Feser, M.-P. (2004). Fast algorithms for computing high breakdown covariance matrices with missing data. In Theory and Applications of Recent Robust Methods (M. Hubert, G. Pison, A. Struyf and S. Van Aelst, eds.). Statistics for Industry and Technology, Birkhäuser, Basel.

Croux, C. and Dehon, C. (2002). Analyse canonique basée sur des estimateurs robustes de la matrice de covariance. La Revue de Statistique Appliquée 2 5–26.

Croux, C. and Haesbroeck, G. (1999). Influence function and efficiency of the Minimum Covariance Determinant scatter matrix estimator. Journal of Multivariate Analysis 71 161–190.

Croux, C. and Haesbroeck, G. (2000). Principal components analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika 87 603–618.

Debruyne, M. and Hubert, M. (2009). The influence function of the Stahel-Donoho covariance estimator of smallest outlyingness. Statistics and Probability Letters 79 275–282.

Donoho, D. and Gasko, M. (1992). Breakdown properties of location estimates based on halfspace depth and projected outlyingness. The Annals of Statistics 20 1803–1827.

Engelen, S. and Hubert, M. (2005). Fast model selection for robust calibration. Analytica Chimica Acta 544 219–228.

Fekri, M. and Ruiz-Gazen, A. (2004). Robust weighted orthogonal regression in the errors-in-variables model. Journal of Multivariate Analysis 88 89–108.

Flury, B. and Riedwyl, H. (1988). Multivariate Statistics: A Practical Approach. Cambridge University Press.

Gnanadesikan, R. and Kettenring, J. (1972). Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28 81–124.

Hadi, A., Rahmatullah Imon, H. and Werner, M. (2009). Detection of outliers. Wiley Interdisciplinary Reviews: Computational Statistics 1 57–70.

Hardin, J. and Rocke, D. (2004). Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. Computational Statistics and Data Analysis 44 625–638.

Hubert, M. and Debruyne, M. (2009). Minimum Covariance Determinant. Wiley Interdisciplinary Reviews: Computational Statistics, in press.

Hubert, M. and Engelen, S. (2007). Fast cross-validation for high-breakdown resampling algorithms for PCA. Computational Statistics and Data Analysis 51 5013–5024.

Hubert, M. and Rousseeuw, P. (1996). Robust regression with both continuous and binary regressors. Journal of Statistical Planning and Inference 57 153–163.

Hubert, M., Rousseeuw, P. and Van Aelst, S. (2008). High breakdown robust multivariate methods. Statistical Science 23 92–119.

Hubert, M., Rousseeuw, P. and Vanden Branden, K. (2005). ROBPCA: a new approach to robust principal components analysis. Technometrics 47 64–79.

Hubert, M. and Van Driessen, K. (2004). Fast and robust discriminant analysis. Computational Statistics and Data Analysis 45 301–320.

Hubert, M. and Vanden Branden, K. (2003). Robust methods for Partial Least Squares Regression. Journal of Chemometrics 17 537–549.

Hubert, M. and Verboven, S. (2003). A robust PCR method for high-dimensional regressors. Journal of Chemometrics 17 438–452.

Kendall, M. (1975). Multivariate Analysis. Griffin, London.

Lee, J. (1992). Relationships between Properties of Pulp-Fibre and Paper. Ph.D. thesis, University of Toronto.

Lopuhaä, H. and Rousseeuw, P. (1991). Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. The Annals of Statistics 19 229–248.

Maronna, R. and Zamar, R. (2002). Robust estimates of location and dispersion for high-dimensional data sets. Technometrics 44 307–317.

Pison, G., Rousseeuw, P., Filzmoser, P. and Croux, C. (2003). Robust factor analysis. Journal of Multivariate Analysis 84 145–172.

Pison, G. and Van Aelst, S. (2004). Diagnostic plots for robust multivariate methods. Journal of Computational and Graphical Statistics 13 310–329.

Pison, G., Van Aelst, S. and Willems, G. (2002). Small sample corrections for LTS and MCD. Metrika 55 111–123.

Rousseeuw, P. (1984). Least median of squares regression. Journal of the American Statistical Association 79 871–880.

Rousseeuw, P. and Croux, C. (1993). Alternatives to the median absolute deviation. Journal of the American Statistical Association 88 1273–1283.

Rousseeuw, P., Van Aelst, S., Van Driessen, K. and Agulló, J. (2004). Robust multivariate regression. Technometrics 46 293–305.

Rousseeuw, P. and Van Driessen, K. (1999). A fast algorithm for the Minimum Covariance Determinant estimator. Technometrics 41 212–223.

Rousseeuw, P. and Van Driessen, K. (2006). Computing LTS regression for large data sets. Data Mining and Knowledge Discovery 12 29–45.

Salibian-Barrera, M., Van Aelst, S. and Willems, G. (2006). PCA based on multivariate MM-estimators with fast and robust bootstrap. Journal of the American Statistical Association 101 1198–1211.

Salibian-Barrera, M., Willems, G. and Zamar, R. (2008). The fast-τ estimator for regression. Journal of Computational and Graphical Statistics 17 659–682.

Salibian-Barrera, M. and Yohai, V. J. (2006). A fast algorithm for S-regression estimates. Journal of Computational and Graphical Statistics 15 414–427.

Serneels, S. and Verdonck, T. (2008). Principal component analysis for data containing outliers and missing elements. Computational Statistics and Data Analysis 52 1712–1727.

Stahel, W. (1981). Robuste Schätzungen: infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen. Ph.D. thesis, ETH Zürich.

Vanden Branden, K. and Hubert, M. (2005). Robust classification in high dimensions based on the SIMCA method. Chemometrics and Intelligent Laboratory Systems 79 10–21.

Verboven, S. and Hubert, M. (2005). LIBRA: a Matlab library for robust analysis. Chemometrics and Intelligent Laboratory Systems 75 127–136.

Visuri, S., Koivunen, V. and Oja, H. (2000). Sign and rank covariance matrices. Journal of Statistical Planning and Inference 91 557–575.

Wiegand, P., Pell, R. and Comas, E. (2009). Simultaneous variable selection and outlier detection using a robust genetic algorithm. Chemometrics and Intelligent Laboratory Systems 98 108–114.

Willems, G., Joe, H. and Zamar, R. (2009). Diagnosing multivariate outliers detected by robust estimators. Journal of Computational and Graphical Statistics 18(1) 73–91.

Wise, B., Gallagher, N., Bro, R., Shaver, J., Windig, W. and Koch, R. (2006). PLS Toolbox 4.0 for use with MATLAB. Software, Eigenvector Research, Inc. URL http://software.eigenvector.com/

Yohai, V. and Zamar, R. (1988). High breakdown point estimates of regression by means of the minimization of an efficient scale. Journal of the American Statistical Association 83 406–413.