  • A new depth-based approach for detecting outlying curves

    Mia Hubert, Department of Mathematics & LStat, KU Leuven
    Gerda Claeskens, ORSTAT & LStat, KU Leuven
    Bart De Ketelaere, BIOSYST-MeBioS & LStat, KU L.
    Kaveh Vakili, Department of Mathematics & LStat, KU Leuven

    Abstract. Depth functions are statistical tools, used to attribute a sensible ordering to observations in a sample from the center outwards. Recently several depth functions have been proposed for functional data. These depth functions can for example be used for robust classification and for the detection of outlying curves. A new depth function is presented, which can be applied to multivariate curves and which takes the local changes in the amount of variability in amplitude into account. It is illustrated on an industrial data set how this depth function can be useful to detect globally outlying curves as well as curves that are only outlying on parts of their domain. Several graphical representations of the curves and their degree of outlyingness are presented.

    Keywords. depth, functional data, outlyingness

    1 Introduction

    In functional data analysis (FDA) one is usually interested in the analysis of one set of curves. To be more precise, typical measurements consist of $N$ curves of the form $\{(t, Y_n(t))\}_{n=1}^{N}$ observed on an interval $U$. We assume that measurements are available in a discrete set of time points $t_1, t_2, \ldots, t_T$. Basic questions of interest are (i) the estimation of the central tendency of the curves, (ii) the estimation of the variability among the curves, (iii) the detection of outlying curves, as well as (iv) classification or clustering of such curves.

    In this paper we focus on the first three challenges in FDA. Our approach is based on the concept of depth. A depth function provides an ordering from the center outwards such that the most central object gets the highest depth value and the least central objects the smallest depth. Some well-known depth functions are halfspace depth, simplicial depth, projection depth, zonoid depth, among others [9]. Recently, many notions of depth have been proposed for functional data, such as the Fraiman and Muniz (FM) depth [6], random projection depth (RP) [3], band depth (BD) and modified band depth (MBD) [10], and half-region depth [11]. All these depth functions are computed on the original set of observed curves $\{(t, Y_n(t))\}_{n=1}^{N}$. The FM depth and MBD depth are quite similar, as they both consider a univariate depth function at each time point $t$ and define the functional depth as the average of these depth values over all time points.

    To better handle shape differences, Cuevas et al. [3] have proposed to consider the original set of curves $\{(t, Y_n(t))\}_{n=1}^{N}$ as well as their derivatives $\{(t, Y'_n(t))\}_{n=1}^{N}$. They consider a number of random projections, project both sets of curves on each direction, apply a multivariate depth function on the bivariate sample and finally average the depth values over the random projections. Adding this extra information through the use of the derivatives is frequently done in FDA.

    We generalize several of these ideas by constructing a depth function for $K$-variate curve samples $\{(t, \mathbf{Y}_n(t))\}_{n=1}^{N}$ (so each $\mathbf{Y}_n(t)$ is a $K$-variate vector). Our definition also averages a multivariate depth function over the time points, but in addition it includes a weight function which accounts for variability in amplitude. The definition and main properties of this depth function are given in Section 2. The resulting estimates for the central tendency and variability of the curves are illustrated in Section 3 on a real industrial data set which consists of acceleration signals over time. Here, we do not use the derivatives of the original set of curves, but the integrated curves as they represent the underlying velocity. In Section 4 we describe and illustrate how our new notion of depth can be used for the detection of outlying curves, and for the visualisation of outlying parts of the curves.

    2 Multivariate functional halfspace depth

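    We denote a given sample of $N$ curves, measured at $T$ time points, as $\mathbf{Y}_N = \{\mathbf{Y}_n\}_{n=1}^{N}$ where $\mathbf{Y}_n = (\mathbf{Y}_n(t_1), \mathbf{Y}_n(t_2), \ldots, \mathbf{Y}_n(t_T)) := (\mathbf{Y}_{n,1}, \mathbf{Y}_{n,2}, \ldots, \mathbf{Y}_{n,T})$ and each $\mathbf{Y}_n(t_j)$ is a $K$-variate vector. The $N$ $K$-variate measurements at time $t_j$ are denoted as $\mathbf{Y}_{N,j} = \{\mathbf{Y}_{n,j}\}_{n=1}^{N}$.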

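    Multivariate functional halfspace depth (MFHD) for a curve $\mathbf{Y}_m$ ($1 \le m \le N$) is obtained by weighting the time-wise halfspace depth according to the amplitude variability. More precisely (see [14], Ch. 5), for a sample of $N$ curves, measured at $T$ time points,

    $$\mathrm{MFHD}_{N,T}(\mathbf{Y}_m; \alpha) = \sum_{j=1}^{T} w_{\alpha,N}(t_j)\, \mathrm{HD}_{N,j}(\mathbf{Y}_{m,j}),$$

    where, using $t_{T+1} = t_T + 0.5(t_T - t_{T-1})$, the weight function is defined by

    $$w_{\alpha,N}(t_j) = \frac{(t_{j+1} - t_j)\, \mathrm{vol}[\{x \in \mathbb{R}^K : \mathrm{HD}_{N,j}(x) \ge \alpha\}]}{\sum_{j=1}^{T} (t_{j+1} - t_j)\, \mathrm{vol}[\{x \in \mathbb{R}^K : \mathrm{HD}_{N,j}(x) \ge \alpha\}]}. \qquad (1)$$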

    The sample halfspace depth of a $K$-variate vector $y$ at time $t_j$ is given by

    $$\mathrm{HD}_{N,j}(y) = \frac{1}{N} \min_{u,\ \|u\|=1} \#\{\mathbf{Y}_{n,j},\ n = 1, \ldots, N : u^t \mathbf{Y}_{n,j} \ge u^t y\}.$$
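    For univariate data ($K = 1$) the only unit directions are $u = +1$ and $u = -1$, so the minimum in this definition reduces to the smaller of two counts. The following is a minimal Python sketch of that special case only; the function name `halfspace_depth_1d` is illustrative and does not come from the paper or from the R packages it uses.

```python
def halfspace_depth_1d(sample, y):
    """Sample halfspace depth of the scalar y within a univariate sample."""
    n = len(sample)
    at_or_above = sum(1 for v in sample if v >= y)  # direction u = +1
    at_or_below = sum(1 for v in sample if v <= y)  # direction u = -1
    return min(at_or_above, at_or_below) / n


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(halfspace_depth_1d(sample, 3.0))   # central value: 3 of 5 points on each side
print(halfspace_depth_1d(sample, 1.0))   # extreme value: only itself on one side
print(halfspace_depth_1d(sample, 10.0))  # outside the sample range: depth 0
```

    The central value receives the largest depth and a point outside the range of the sample receives depth zero, in line with the center-outward ordering described above.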

    The weight at time $t_j$ is proportional to the volume of the depth region of level $\alpha$ (the region enclosed by the depth contour of that level), multiplied by the length of the time step, with $\alpha \in (0, 0.5]$ chosen such that the denominator in the weight function is not zero. The weight is introduced to downweight areas where all curves nearly coincide. Indeed, it is desired that the areas with large amplitude variability should dominate the ordering of the curves.
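    To make the role of the weights concrete, here is a small Python sketch of the MFHD for univariate curves ($K = 1$). It assumes that, in one dimension, the depth region of level $\alpha$ is the interval between the $k$-th smallest and the $k$-th largest value with $k = \lceil \alpha N \rceil$, so that its volume is an interval length. All function names are hypothetical, and this is not the implementation used for the analyses in the paper.

```python
import math


def halfspace_depth_1d(sample, y):
    """Univariate sample halfspace depth of y."""
    n = len(sample)
    at_or_below = sum(1 for v in sample if v <= y)
    at_or_above = sum(1 for v in sample if v >= y)
    return min(at_or_below, at_or_above) / n


def depth_region_length(sample, alpha):
    """Length of {x : HD(x) >= alpha} for a univariate sample."""
    n = len(sample)
    k = math.ceil(alpha * n)  # at least k points needed on each side
    ordered = sorted(sample)
    if k < 1 or k > n - k + 1:
        return 0.0  # empty region
    return ordered[n - k] - ordered[k - 1]


def mfhd_univariate(curves, times, alpha=0.25):
    """MFHD of every curve; curves[n][j] is the value of curve n at times[j]."""
    n_times = len(times)
    # Extra time point t_{T+1} = t_T + 0.5 * (t_T - t_{T-1}).
    t_ext = list(times) + [times[-1] + 0.5 * (times[-1] - times[-2])]
    cross_sections = [[curve[j] for curve in curves] for j in range(n_times)]
    raw = [
        (t_ext[j + 1] - t_ext[j]) * depth_region_length(cross_sections[j], alpha)
        for j in range(n_times)
    ]
    total = sum(raw)
    if total == 0:
        raise ValueError("all weights vanish; choose a smaller alpha")
    weights = [r / total for r in raw]
    depths = [
        sum(weights[j] * halfspace_depth_1d(cross_sections[j], curve[j])
            for j in range(n_times))
        for curve in curves
    ]
    return depths, weights


# Five toy curves that coincide at the first time point and then spread out.
times = [1.0, 2.0, 3.0]
curves = [[0.0, float(i), float(i)] for i in range(5)]
depths, weights = mfhd_univariate(curves, times, alpha=0.25)
print(weights)  # the first time point, where all curves coincide, gets weight 0
print(depths)   # the middle curve (index 2) is the deepest
```

    In this toy example the time point where all curves coincide contributes nothing to the ordering, which is the downweighting effect described above.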

    COMPSTAT 2012 Proceedings

    Exact computation of the MFHD can be done with fast algorithms for the halfspace depth up to dimension K = 4 [1] and the depth contours up to dimension at least K = 5 [7, 12]. In this paper we used the R packages depth and aplpack, which implement fast algorithms for bivariate data. Approximate halfspace depth in higher dimensions can be computed by means of the random Tukey depth [2].
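    The idea behind such an approximation can be sketched in a few lines of Python: project the data on a finite number of random directions and keep the smallest univariate depth found. This is only a rough illustration in the spirit of the random Tukey depth [2]; the number of directions, the seed and the function name are assumptions made here, and the result is an upper bound on the exact halfspace depth.

```python
import math
import random


def random_tukey_depth(sample, y, n_directions=200, seed=0):
    """Approximate halfspace depth of the point y in a K-variate sample."""
    rng = random.Random(seed)
    n = len(sample)
    dim = len(y)
    depth = 1.0
    for _ in range(n_directions):
        # Random direction on the unit sphere (normalised Gaussian vector).
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in u))
        if norm == 0.0:
            continue
        u = [c / norm for c in u]
        proj_y = sum(a * b for a, b in zip(u, y))
        projections = [sum(a * b for a, b in zip(u, x)) for x in sample]
        at_or_above = sum(1 for p in projections if p >= proj_y)
        at_or_below = sum(1 for p in projections if p <= proj_y)
        # Univariate halfspace depth of the projected point.
        depth = min(depth, at_or_above / n, at_or_below / n)
    return depth


# Toy bivariate sample on a 5 x 5 grid.
grid = [(float(i), float(j)) for i in range(5) for j in range(5)]
center_depth = random_tukey_depth(grid, (2.0, 2.0))
far_depth = random_tukey_depth(grid, (100.0, 100.0))
print(center_depth, far_depth)
```

    More directions give a tighter approximation at a higher cost, which is the usual trade-off of projection-based depths.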

    Multivariate functional halfspace depth satisfies several key properties such as affine invariance, maximality at the center and monotonicity relative to the deepest point.

    A c% trimmed mean of curves is defined by first assigning a depth value to each curve and next omitting the c% of curves with the lowest depth. A cross-sectional average of the remaining curves may then be computed. The deepest curve, also called the median, is defined as the curve with maximal MFHD. It can easily be shown that it corresponds to the curve Θ for which at each time point t, Θ(t) is a deepest value, obtained as the center of mass of the set of values with maximum halfspace depth at time t. Under some assumptions, the median can be shown to exist and to be continuous.
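    Given depth values for a set of sampled curves, both estimators are straightforward to compute. The sketch below assumes the depths are already available, takes the deepest curve to be the sample curve with maximal depth, and uses hypothetical function names; the toy depths are made up for illustration.

```python
def deepest_curve(curves, depths):
    """Sample curve with maximal depth."""
    best = max(range(len(curves)), key=lambda i: depths[i])
    return curves[best]


def trimmed_mean_curve(curves, depths, trim_fraction):
    """Pointwise mean after omitting the given fraction of lowest-depth curves."""
    n = len(curves)
    n_drop = int(trim_fraction * n)  # number of curves to omit
    order = sorted(range(n), key=lambda i: depths[i])  # increasing depth
    kept = order[n_drop:]
    n_times = len(curves[0])
    return [sum(curves[i][j] for i in kept) / len(kept) for j in range(n_times)]


# Four toy curves observed at two time points, with made-up depth values.
curves = [[10.0, 10.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
depths = [0.1, 0.5, 0.4, 0.3]
median = deepest_curve(curves, depths)
trimmed = trimmed_mean_curve(curves, depths, 0.25)
print(median)   # the curve with depth 0.5
print(trimmed)  # mean of the three curves left after dropping the first one
```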

    3 Central curve estimation

    We illustrate our new depth function on a data set from an industrial process that produces one part during each cycle [4]. The behavior of the cycle as monitored by an accelerometer provides a fingerprint of the cycle and, relatedly, of the quality of the produced part. If a deviating acceleration signal occurs, the process owner should be warned. Figure 1 shows the acceleration signal of N = 224 parts measured during 120 ms (in gray). Measurements are available every millisecond, hence the time signal ranges from $t_1 = 1$ up to $t_T = 120$. On this plot we see several curves with a deviating pattern, most prominently at the final stage of the production.

    To estimate the central pattern of the data, we first computed the mean curve, displayed in green on Figure 1. It gives a quite good representation of the main features of the curves, but it is clearly attracted by the outlying values during the last 30 ms of the cycle. Next, we computed the MFHD on this original set of curves, with α = 0.25. As we only have univariate measurements at each time point, this boils down to computing, for each curve, its univariate halfspace depth at each time point and taking the weighted average of these depth values. The curve which attains the maximal MFHD is depicted in Figure 1 in dark red. We see that this deepest curve is not attracted by the outlying values at the end of the cycle. Also the estimates in the valleys around time points 50 and 75 are lower than those of the mean curve, illustrating the robustness of the deepest curve towards the upward contamination in these regions. Finally we also consider the 25% trimmed mean curve, obtained by trimming the 25% of curves with lowest depth and taking the pointwise mean of the remaining curves (displayed in orange). This trimmed mean is hardly distinguishable from the deepest curve.
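    In this univariate setting the quantities entering the MFHD have simple closed forms, which may help to see what is computed at each time point. Writing $Y_{(1),j} \le \ldots \le Y_{(N),j}$ for the ordered values at time $t_j$ and $k = \lceil \alpha N \rceil$ (notation introduced here for illustration only), the halfspace depth and the depth region of level $\alpha$ become

    $$\mathrm{HD}_{N,j}(y) = \frac{1}{N} \min\bigl(\#\{n : Y_{n,j} \le y\},\ \#\{n : Y_{n,j} \ge y\}\bigr), \qquad \{x \in \mathbb{R} : \mathrm{HD}_{N,j}(x) \ge \alpha\} = [\,Y_{(k),j},\ Y_{(N-k+1),j}\,],$$

    so that the volume in (1) is the interval length $Y_{(N-k+1),j} - Y_{(k),j}$. With N = 224 and α = 0.25 this gives k = 56, that is, the interval between the 56th and the 169th ordered acceleration values at each time point.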

    Figure 1: Mean curve, deepest curve and 25% trimmed mean based on the univariate MFHD. (Plot titled "Univariate raw data": acceleration versus time, 0 to 120 ms.)

    Next, we performed a bivariate analysis on this data set. We could consider the derivatives of the curves as additional information, but in this example, we decided to use the integrated curves instead. As the velocity at time $t_j$ is $V(t_j) = \int_{-\infty}^{t_j} A(t)\,dt$, with $A(t)$ the acceleration at time $t$, we approximated the velocity by $V(t_j) \approx V(t_{j-1}) + (A(t_{j-1}) + A(t_j))/2$, starting with $V(t_1) = 0$. Note that the choice of the integration constant is not important here, due to the affine invariance of MFHD. The resulting velocity curves can be seen in Figure 2(b). Also here we see several curves whose velocity is unusual during a large part of the cycle. The mean curve is slightly affected by these outliers. Computing the MFHD on the bivariate data $(A(t), V(t))$ yields a deepest set of curves, again printed in dark red on Figure 2 for the acceleration and velocity curves. There are no huge differences with the deepest curve in Figure 1, but it lies closer to the (more efficient) trimmed mean.
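    The velocity approximation is the trapezoidal rule with a unit time step (one measurement per millisecond). A minimal Python sketch, with an illustrative function name and toy acceleration values that are not taken from the data set:

```python
def integrate_trapezoid(acceleration):
    """Cumulative trapezoidal integral with unit time step, starting at 0."""
    velocity = [0.0]  # V(t_1) = 0
    for j in range(1, len(acceleration)):
        step = (acceleration[j - 1] + acceleration[j]) / 2.0
        velocity.append(velocity[-1] + step)
    return velocity


acc = [0.0, 2.0, 2.0, 0.0]
vel = integrate_trapezoid(acc)
print(vel)  # one velocity value per acceleration value
```

    Starting from a different value of $V(t_1)$ would shift every velocity curve by the same constant, which leaves an affine invariant depth unchanged.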

    Figure 2: Mean curve, deepest curve and 25% trimmed mean based on the bivariate MFHD. (Panels titled "Bivariate analysis": (a) acceleration versus time, (b) velocity versus time.)

    Apart from estimating the global pattern of the curves, we are also interested in the variability of the curves. Our depth-based approach allows us to visualise this dispersion by means of the central regions, introduced in [15]. The β-central region consists of the band delimited by the proportion β of curves with highest depth. If we draw the 25%, 50% and 75% central regions, we obtain a representation of the data as in the enhanced functional boxplot of [15]. As before, we can construct these regions based on the original curves A(t) only, which yields Figure 3(a). Similarly the 10%, 50% and 90% central regions are depicted in Figure 3(b). It is obvious that the 90% central region contains outlying curves, which increases the volume of that central region. Based on the bivariate analysis, we obtain Figures 4(a)-(b) for the acceleration curves and (c)-(d) for the velocity curves. Now, we see a more important difference between the univariate and the bivariate analysis, as the 50% and 75% central regions of the acceleration curves are quite different between 40 ms and 60 ms.
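    A central region can be computed directly from the depth values as the pointwise envelope of the deepest curves. The sketch below assumes that the β-central region uses the ⌈βN⌉ deepest curves; the function name and the toy depths are illustrative only.

```python
import math


def central_region(curves, depths, beta):
    """Pointwise lower and upper envelope of the ceil(beta * N) deepest curves."""
    n = len(curves)
    n_keep = max(1, math.ceil(beta * n))
    order = sorted(range(n), key=lambda i: depths[i], reverse=True)
    kept = order[:n_keep]
    n_times = len(curves[0])
    lower = [min(curves[i][j] for i in kept) for j in range(n_times)]
    upper = [max(curves[i][j] for i in kept) for j in range(n_times)]
    return lower, upper


# Four toy curves at two time points, with made-up depth values.
curves = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0], [5.0, 5.0]]
depths = [0.2, 0.5, 0.4, 0.1]
lower, upper = central_region(curves, depths, 0.5)
print(lower, upper)  # band spanned by the two deepest curves
```

    Because the envelope is taken pointwise, a single curve with low depth that enters a large central region can widen the band considerably, as noted for the 90% region.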

    Figure 3: Central regions for the acceleration curves based on the univariate analysis. (Panel (a): 25% (orange), 50% (white) and 75% (blue) central regions; panel (b): 10% (red), 50% (white) and 90% (dark blue) central regions; acceleration versus time.)

    To understand the difference between the univariate and the bivariate analysis, we first compare the univariate and bivariate MFHD values for all curves, shown in Figure 5. We see a global monotone trend showing that curves with a low univariate MFHD also have a low bivariate MFHD, but the relation is certainly not strictly monotone.

    Let us focus on two specific curves, with labels 112 and 207, indicated in Figure 6 together with the deepest curve for comparison. Curve 207 clearly has a completely different acceleration and velocity pattern than the trend observed on the regular curves. Only in the beginning of the process are the acceleration and velocity small and comparable with the others. Not surprisingly, both the univariate and bivariate MFHD are small for this curve, as we observe in Figure 5. Curve 112 shows a different deviating pattern. From Figure 6(a) we notice that it attains larger acceleration values at the peaks around 47 ms and 57 ms, and one additional oscillation between 60 ms and 80 ms. Consequently its univariate depth, only based on this information, is somewhat lower but it is not extremely small. To be more precise, the univariate MFHD of curve 112 has rank 45 (out of 224). When we include the information given by the velocity curves, we see from Figure 6(b) that the velocity of curve 112 is outlying on almost the whole time domain. This yields a bivariate MFHD with rank 15, which is close to the rank of the bivariate MFHD of curve 207, which equals 12. As a result, the 75% central region based on the bivariate depth does not include the curves with large peaks around 47 ms and 57 ms, whereas the univariate-based central region does include them.

    Note that our definition of MFHD depends on the choice of the level α of the depth contours in the weighting function (1). However, we noticed that our analysis is usually not very sensitive to this choice, as long as α is not taken too small, so that outlying curves are not included in the depth contour. Figure 7 shows a scatterplot of the MFHD with α = 0.35 versus the depth values for α = 0.1. We see that they are very similar.

    Figure 4: Central regions for (a - b) the acceleration curves based on the bivariate analysis; and (c - d) central regions for the velocity curves. (Panels (a) and (c): 25% (orange), 50% (white) and 75% (blue) central regions; panels (b) and (d): 10% (red), 50% (white) and 90% (dark blue) central regions.)

    A referee suggested to define the multivariate functional depth of a curve as the average of its marginal functional depth in each dimension. If we denote $\mathbf{Y}_n(t) = (Y_{n,1}(t), \ldots, Y_{n,K}(t))$ and $(t, Y_{n,k}(t))$ the univariate set of curves for each $1 \le k \le K$, we can compute for each multivariate curve $\mathbf{Y}_m$ the functional depth of its corresponding univariate curves $(t, Y_{m,k}(t))$ for each $k$ and average these values. This approach has the computational advantage that no multivariate halfspace depth needs to be computed, but on the other hand it does not use the correlation between the $K$ components. Consequently, this analysis yields quite different results. In Figure 8(a) we display this marginal functional depth (MaFD) versus our bivariate MFHD. We see globally a monotone pattern as in Figure 5, but with quite some dispersion. Consequently some curves, such as curve 103 which is marked in red in Figure 8, are considered to be more centrally located based on MaFD than on MFHD, and vice versa, as for curve 182 (marked in blue). To understand this difference, we can e.g. look at the bivariate measurements at time t = 99, shown in Figure 8(b). At this time point the average of the marginal depth values for curve 182 has rank 76, mainly because its velocity is very normal. Its bivariate halfspace depth on the other hand has only rank 35, as the combination of its acceleration and velocity is more unusual. For curve 103, the opposite effects occur at several time points, such that finally its MFHD has a larger rank (namely 193) than its MaFD, whose rank equals 159.
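    The marginal approach can be sketched as follows. The text does not specify which univariate functional depth is averaged, so this sketch assumes an unweighted time average of the pointwise univariate halfspace depth; that choice, the function names and the toy data are all assumptions made for illustration.

```python
def halfspace_depth_1d(sample, y):
    """Univariate sample halfspace depth of y."""
    n = len(sample)
    at_or_below = sum(1 for v in sample if v <= y)
    at_or_above = sum(1 for v in sample if v >= y)
    return min(at_or_below, at_or_above) / n


def functional_depth_unweighted(univariate_curves):
    """Time average of the pointwise univariate halfspace depth (no weights)."""
    n_times = len(univariate_curves[0])
    cross = [[c[j] for c in univariate_curves] for j in range(n_times)]
    return [
        sum(halfspace_depth_1d(cross[j], c[j]) for j in range(n_times)) / n_times
        for c in univariate_curves
    ]


def marginal_functional_depth(curves):
    """MaFD: curves[n][j] is a tuple with the K components of curve n at time j."""
    n_components = len(curves[0][0])
    per_component = []
    for k in range(n_components):
        marginal_curves = [[point[k] for point in curve] for curve in curves]
        per_component.append(functional_depth_unweighted(marginal_curves))
    return [
        sum(d[n] for d in per_component) / n_components
        for n in range(len(curves))
    ]


# Three toy bivariate curves observed at two time points.
curves = [
    [(0.0, 5.0), (0.0, 5.0)],
    [(1.0, 4.0), (1.0, 4.0)],
    [(2.0, 9.0), (2.0, 9.0)],
]
mafd = marginal_functional_depth(curves)
print(mafd)
```

    Each component is ranked on its own, so a curve whose components are individually unremarkable but jointly unusual is not penalised, which is the limitation discussed above.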

    Figure 5: Univariate versus bivariate MFHD. (Scatterplot with the univariate MFHD depth on the horizontal axis and the bivariate MFHD on the vertical axis; curves 112 and 207 are labelled.)

    Figure 6: (a) Acceleration and (b) velocity curves with two outlying curves and the deepest curve. (Curves 112 and 207 are labelled in both panels.)

    4 Outlier detection

    To detect outlying curves, we can follow two strategies. First it can be argued that the curves with lowest MFHD are potential outliers. This is for example the approach considered in [5]. As depth provides an ordering of the curves from the center outwards, we indeed expect that outlying curves have a low depth. This was also empirically verified in Section 3. To visualise these potential outliers, we color all the curves according to their depth, yielding a so-called rainbow plot [8]. We first order the curves from maximal to minimal depth. Then we go from dark red for the deepest curve, to white for the curve with rank N/2, and move to dark blue for the curve with minimal depth. This yields Figure 9(a) based on the univariate MFHD, and Figure 9(b) and (c) based on the bivariate MFHD. We see that the extreme outlying curves are all colored dark blue, which is a confirmation that our depth measure gives them a low depth value. We also notice some differences between Figure 9(a) and (b) around the time points 47 ms and 57 ms, which can be explained as in Section 3.
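    The coloring scheme can be sketched as a piecewise linear interpolation between three colors along the depth ordering. The RGB values chosen for dark red and dark blue and the function name are assumptions for illustration; the paper does not state the exact palette.

```python
def rainbow_colors(depths, dark_red=(139, 0, 0), white=(255, 255, 255),
                   dark_blue=(0, 0, 139)):
    """RGB color per curve: deepest = dark red, middle = white, least deep = dark blue."""
    n = len(depths)
    order = sorted(range(n), key=lambda i: depths[i], reverse=True)
    colors = [None] * n
    for position, i in enumerate(order):
        s = position / (n - 1) if n > 1 else 0.0  # 0 = deepest, 1 = least deep
        if s <= 0.5:
            start, end, frac = dark_red, white, s / 0.5
        else:
            start, end, frac = white, dark_blue, (s - 0.5) / 0.5
        colors[i] = tuple(
            round(start[c] + frac * (end[c] - start[c])) for c in range(3)
        )
    return colors


colors = rainbow_colors([0.3, 0.1, 0.2])
print(colors)  # first curve dark red, third white, second dark blue
```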

    Figure 7: Influence of α on the MFHD values. (Scatterplot titled "MFHD and choice of α": MFHD(α=0.1) on the horizontal axis, MFHD(α=0.35) on the vertical axis.)

    We should however be cautious about this approach, as in any data set, even one which only contains regular curves, some of the curves will always be indicated as the ones with lowest depth. Moreover, as our functional depth measure averages the cross-sectional depth values, it might give a large depth to a curve which is strongly outlying on part of its domain. Hence we recommend not only to consider the global amount of outlyingness of a curve, measured by means of its MFHD, but also to consider its amount of local outlyingness. To this end, we reconsider the cross-sectional bivariate samples at the time points on which we have already computed the depth of each curve. As a by-product of these computations we can construct the bagplot, which is a bivariate extension of the boxplot [13]. An example is given in Figure 10 at time t = 57 ms. The bagplot consists of a bag, which contains the 50% of observations with largest depth, and a fence, which contains all the regular observations. Curves outside this fence can be flagged as outliers. We see that curves 112 and 207 are indeed flagged as being outlying at time t = 57, but each for a different reason. Curve 112 has an outlying acceleration value, whereas curve 207 has an outlying velocity. This can also clearly be seen from Figure 6.

    Next we can indicate for each curve at which time points it is flagged as a bivariate outlier. For curves 112 and 207, this is shown in Figure 11, where the dark blue parts of the curve indicate the regions where such a local outlyingness is detected (in contrast with the light blue parts where the curve lies within the fence of the bagplot).

    Finally we can compute for each curve the proportion of time points where it is marked as a local outlier. In Figure 12(a) we plot this proportion for all curves against their MFHD, which is more a global measure of outlyingness. We see that the curves with low MFHD also have many local regions of outlyingness. This provides more evidence that they really have an overall outlying behavior. On this plot we have added a vertical line through the 10% quantile of the MFHD values, and a horizontal line at 0.1. This clearly reveals the different types of curves. Those curves represented in the upper right corner are locally outlying in more than 10% of the time points but do not have an extremely low depth. This can be explained by the fact that MFHD also accounts for the amplitude variability whereas our local measure of outlyingness does not.
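    The diagnostic display combines two numbers per curve, which can be sketched as below. The sketch assumes the cross-sectional outlier flags (for instance from the bagplot fences) are already available as booleans, uses a simple empirical 10% quantile, and introduces its own labels for the four quadrants; none of these names come from the paper.

```python
import math


def local_outlyingness_proportion(flags):
    """Proportion of time points at which each curve is flagged."""
    return [sum(1 for f in row if f) / len(row) for row in flags]


def classify_curves(depths, flags, depth_quantile=0.10, local_cutoff=0.10):
    """Label each curve by its quadrant in the (MFHD, proportion) display."""
    n = len(depths)
    proportions = local_outlyingness_proportion(flags)
    ordered = sorted(depths)
    # Simple empirical quantile: the ceil(q * N)-th smallest depth.
    depth_cutoff = ordered[max(0, math.ceil(depth_quantile * n) - 1)]
    labels = []
    for depth, prop in zip(depths, proportions):
        low_depth = depth <= depth_cutoff
        locally_outlying = prop > local_cutoff
        if low_depth and locally_outlying:
            labels.append("low depth and locally outlying")
        elif locally_outlying:
            labels.append("locally outlying only")
        elif low_depth:
            labels.append("low depth only")
        else:
            labels.append("regular")
    return labels, proportions


# Ten toy curves at ten time points with made-up depths and flags.
depths = [0.01, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28]
flags = [[True] * 8 + [False] * 2, [True] * 3 + [False] * 7]
flags += [[False] * 10 for _ in range(8)]
labels, proportions = classify_curves(depths, flags)
print(labels[:3], proportions[:3])
```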

    Figure 8: (a) MFHD versus MaFD, (b) curves at time t = 99, (c) acceleration and (d) velocity curves with deepest curve and two curves indicated whose MFHD is highly different from MaFD. (Curves 103 and 182 are labelled in all panels.)

    Note that this diagnostic display is currently limited to K = 2, as the bagplot is only defined for bivariate data. It is however very well suited for a nonparametric approach as the bagplot does not make any parametric assumption about the data, apart from unimodality. A more powerful cross-sectional outlier identification procedure could of course be obtained if more assumptions (such as Gaussianity) can be made.

    Finally we compared our approach with the outliers found by the enhanced functional boxplot [16]. In that approach curves are globally flagged as outliers as soon as they exceed the fences at some time point; these fences are constructed based on the 50% central region as in the standard boxplot. First we derived the appropriate factor to inflate the central region as described in [16], which yielded the factor 1.5. The resulting outlying curves are indicated in Figure 12(b). We see that all flagged curves are also clearly visible in our diagnostic plot, either because their MFHD is very small, or because their proportion of local outlyingness is large. There are however some curves (36, 42 and 112) which are clearly outlying following our criteria, but which are not detected by means of the functional boxplot. The acceleration and velocity curves in Figure 12(c) and Figure 12(d) clearly show that these curves mainly have an outlying velocity behavior, which confirms our conclusion based on MFHD. The functional boxplot on the other hand is only based on the acceleration curves and apparently was not able to detect these deviations.
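    The fence rule of the functional boxplot can be sketched as follows, assuming depth values are given, that the 50% central region is the envelope of the ⌈N/2⌉ deepest curves, and that the inflation factor is 1.5 as reported above. The function name and the toy data are illustrative and this is not the code of [16].

```python
import math


def functional_boxplot_outliers(curves, depths, factor=1.5):
    """Indices of curves that leave the inflated 50% central region at any time."""
    n = len(curves)
    n_times = len(curves[0])
    n_central = max(1, math.ceil(0.5 * n))
    order = sorted(range(n), key=lambda i: depths[i], reverse=True)
    central = order[:n_central]
    lower = [min(curves[i][j] for i in central) for j in range(n_times)]
    upper = [max(curves[i][j] for i in central) for j in range(n_times)]
    # Inflate the central envelope by `factor` times its pointwise range.
    lower_fence = [lo - factor * (up - lo) for lo, up in zip(lower, upper)]
    upper_fence = [up + factor * (up - lo) for lo, up in zip(lower, upper)]
    outliers = [
        i for i in range(n)
        if any(curves[i][j] < lower_fence[j] or curves[i][j] > upper_fence[j]
               for j in range(n_times))
    ]
    return outliers, lower_fence, upper_fence


# Five toy curves at two time points; the last one jumps at the second point.
curves = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [2.0, 20.0]]
depths = [0.2, 0.4, 0.6, 0.4, 0.1]
outliers, lower_fence, upper_fence = functional_boxplot_outliers(curves, depths)
print(outliers, lower_fence, upper_fence)
```

    A curve is flagged as soon as it crosses a fence once, so this rule only sees the variables on which the fences are built, here a single univariate curve per observation.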

    Figure 9: All curves colored according to their (a) univariate, and (b - c) bivariate MFHD depth. (Panel (a), titled "Univariate raw data": acceleration versus time; panels (b) and (c), titled "Bivariate analysis": acceleration and velocity versus time.)

    Figure 10: Bagplot at time point 57. (Acceleration(t=57) on the horizontal axis, Velocity(t=57) on the vertical axis; curves 112 and 207 are labelled.)

    Figure 11: Local outlyingness for curves 112 and 207 shown on the (a) acceleration and the (b) velocity curves.

    Figure 12: (a) Percentage of local outlyingness versus MFHD for all curves; (b) same plot with outliers found by the enhanced functional boxplot indicated by red triangles; (c) acceleration and (d) velocity curves and some outlying curves according to MFHD. (Curves 112 and 207 are labelled in panel (a); curves 36, 42 and 112 in panels (b) to (d).)

    Bibliography

    [1] Bremner, D., Chen, D., Iacono, J., Langerman, S. and Morin, P. (2008) Output-sensitive algorithms for Tukey depth and related problems. Statistics and Computing, 18, 259–266.

    [2] Cuesta-Albertos, J.A. and Nieto-Reyes, A. (2008) The random Tukey depth. Computational Statistics and Data Analysis, 52, 4979–4988.

    [3] Cuevas, A., Febrero, M. and Fraiman, R. (2007) Robust estimation and classification for functional data via projection-based depth notions. Computational Statistics, 22, 481–496.

    [4] De Ketelaere, B., Mertens, K., Mathijs, F., Diaza, D.S. and De Baerdemaeker, J. (2011) Nonstationarity in statistical process control – issues, cases, ideas. Applied Stochastic Models in Business and Industry, 27, 367–376.

    [5] Febrero, M., Galeano, P. and González-Manteiga, W. (2008) Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels. Environmetrics, 19, 331–345.

    [6] Fraiman, R. and Muniz, G. (2001) Trimmed means for functional data. Test, 10, 419–440.

    [7] Hallin, M., Paindaveine, D. and Šiman, M. (2010) Multivariate quantiles and multiple-output regression quantiles: from L1 optimization to halfspace depth (with discussion). Annals of Statistics, 38, 635–669.

    [8] Hyndman, R.J. and Shang, H.L. (2010) Rainbow plots, bagplots, and boxplots for functional data. Journal of Computational and Graphical Statistics, 19, 29–45.

    [9] Liu, R., Serfling, R. and Souvaine, D. (2006) Data depth: robust multivariate analysis, computational geometry and applications. DIMACS Ser. Discrete Math. Theoret. Comput. Sci., American Mathematical Society, Providence, RI.

    [10] Lopez-Pintado, S. and Romo, J. (2009) On the concept of depth for functional data. Journal of the American Statistical Association, 104, 718–734.

    [11] Lopez-Pintado, S. and Romo, J. (2011) A half-region depth for functional data. Computational Statistics and Data Analysis, 55, 1679–1695.

    [12] Paindaveine, D. and Šiman, M. (2012) Computing multiple-output regression quantile regions. Computational Statistics and Data Analysis, 56, 840–853.

    [13] Rousseeuw, P.J., Ruts, I. and Tukey, J.W. (1999) The bagplot: a bivariate boxplot. The American Statistician, 53, 382–387.

    [14] Slaets, L. (2011) Analyzing Phase and Amplitude Variation of Functional Data, Ph.D. dissertation, KU Leuven, Faculty of Business and Economics.

    [15] Sun, Y. and Genton, M.G. (2011) Functional boxplots. Journal of Computational and Graphical Statistics, 20, 316–334.

    [16] Sun, Y. and Genton, M.G. (2012) Adjusted functional boxplots for spatio-temporal data visualization and outlier detection. Environmetrics, 23, 54–64.