
On Masking and Swamping Robustness of Leading Outlier Identifiers for Univariate Data

Shanshan Wang1 and Robert Serfling2

University of Texas at Dallas

February, 2013

1 Department of Mathematics, University of Texas at Dallas, Richardson, Texas 75080-3021, USA.

2 Department of Mathematics, University of Texas at Dallas, Richardson, Texas 75080-3021, USA. Email: [email protected]. Website: www.utdallas.edu/~serfling.

Abstract

In the wide-ranging scope of modern statistical data analysis, a key task is identification of outliers. In using an outlier identification procedure, one needs to know its robustness against masking (an “outlier” is undetected) and swamping (a “nonoutlier” is classified as an “outlier”), possibilities which can come about due to the presence of outliers. Study of these issues together is necessary but complex. Recently, Serfling and Wang (2012) developed a general framework providing foundations, tools, and criteria applicable in any data space. Application of this framework to particular outlier identifiers in particular types of data space requires, however, additional development of a nature specialized to the chosen setting. The present paper applies the general framework to the case of univariate data and evaluates masking and swamping robustness for two leading outlier identifiers, scaled deviation outlyingness and centered rank outlyingness. Our results shed new light on the choice between (Median, MAD) and (trimmed mean, trimmed standard deviation) in defining scaled deviation outlyingness. Also, our findings explain how the boxplot, a leading descriptive tool, acquires its excellent robustness by incorporating a scaled deviation outlier identification component alongside its quantile-based description of the central part of a data set.

AMS 2000 Subject Classification: Primary 62G35 Secondary 62-07

Key words and phrases: Nonparametric; Outlier detection; Masking robustness; Swamping robustness; Breakdown point; Boxplot.

1 Introduction

In the wide-ranging scope of modern statistical data analysis, a key task is identification of outliers and anomalies. Besides traditional contexts, new ones have arisen, such as fraud detection and intrusion detection. A basic feature of any outlier identification procedure is its robustness against two kinds of misclassification error: masking (some outliers are classified as nonoutliers) and swamping (some nonoutliers are classified as outliers). Unfortunately, the outliers themselves can interfere with the very process of identifying them. Masking and swamping robustness trade off against each other, and hence it is important to study them coherently within a single picture. For this purpose, Serfling and Wang (2012) recently developed a general theoretical framework providing foundations, tools, and criteria for studying masking and swamping robustness of outlier identifiers in an arbitrary data space. Implementation of these general results with a particular outlier identifier in a particular type of data space, however, requires nontrivial additional development specialized to the chosen setting. As a first application of the general framework, the present paper focuses on two leading outlier identification methods in the case of univariate data in the nonparametric setting of an arbitrary cdf $F$.

Our approach uses two robustness measures: the masking breakdown point (MBP) and the swamping breakdown point (SBP). These are the minimum fractions of points in a data set which, if arbitrarily replaced, can cause a given outlier detection procedure to mask arbitrarily extreme outliers, or to swamp arbitrarily central nonoutliers, respectively. The higher the MBP and SBP values, the better the robustness of an outlier detection procedure. It turns out that for each of MBP and SBP there are two complementary versions (Type A and Type B), making four robustness measures in all, with Type A MBP paired naturally with Type A SBP, and likewise for the Type B versions.

These four robustness measures are determined for two long-established outlyingness functions: scaled deviation outlyingness

\[ \left|\frac{x-\mu(F)}{\sigma(F)}\right|, \quad -\infty < x < \infty, \]

where $\mu(F)$ and $\sigma(F)$ are location and spread measures, respectively, and centered rank outlyingness

\[ |2F(x)-1|, \quad -\infty < x < \infty. \]

Note that each of these increases as $x$ moves outward from the “center”, $\mu(F)$ or $\mathrm{Median}(F)$, respectively.
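As a minimal sketch of these two functions on a finite sample, the following Python fragment evaluates both outlyingness values at a point. The function names and the toy data are assumptions made for illustration, and (Median, MAD) is only one of the $(\mu, \sigma)$ choices considered in this paper.

```python
import statistics
from bisect import bisect_right

def scaled_deviation_outlyingness(x, data):
    """|x - median| / MAD, using (Median, MAD) as one possible (mu, sigma) pair."""
    med = statistics.median(data)
    mad = statistics.median(abs(v - med) for v in data)
    return abs(x - med) / mad  # assumes MAD > 0

def centered_rank_outlyingness(x, data):
    """|2 F_n(x) - 1|, with F_n the empirical cdf of the data."""
    s = sorted(data)
    fn = bisect_right(s, x) / len(s)  # fraction of observations <= x
    return abs(2 * fn - 1)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical toy sample
```

For this toy sample the median is 5 and the MAD is 2, so `scaled_deviation_outlyingness(9, data)` evaluates to 2.0, while `centered_rank_outlyingness(9, data)` evaluates to 1.0.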

For scaled deviation outlyingness, we apply our MBP and SBP results to compare (Median, MAD) versus (trimmed mean, trimmed standard deviation) as choices for $(\mu(F), \sigma(F))$. Each pair has its appeal, but there are some differences in robustness performance. As another application, we use our MBP and SBP results for centered rank outlyingness together with those for scaled deviation outlyingness to obtain an explanation of how the boxplot, a popular tool for descriptive summary of a data set, achieves its excellent robustness.

Although the notion of (finite sample) breakdown point (BP) for estimators is well established and widely used, notions of masking and swamping BP are more problematic and have received only limited previous treatment. In the univariate parametric setting of the contaminated normal model, Davies and Gather (1993) treat certain notions of Type A MBP and Type B SBP for scaled deviation outlyingness. Becker and Gather (1999) treat Type A MBP for the Mahalanobis distance outlyingness in the setting of the multivariate contaminated normal model. Dang and Serfling (2010) treat Type A MBP for several depth-based outlier identifiers in the nonparametric multivariate setting. The results of the present paper provide a comprehensive parallel to Davies and Gather (1993) that covers both Type A and B MBP and SBP for scaled deviation outlyingness and centered rank outlyingness. In a future paper, the multivariate setting will be treated, unifying and extending the work of Becker and Gather (1999) and Dang and Serfling (2010) into a general treatment of Type A and B MBP and SBP for several leading multivariate outlier identifiers.

The present paper is organized as follows. Section 2 provides preliminaries from Serfling and Wang (2012). Type A and B MBP and SBP results are developed for scaled deviation outlyingness in Section 3 and for centered rank outlyingness in Section 4. Application to the boxplot is carried out in Section 5. A general discussion in Section 6 includes comparison of (Median, MAD) versus (trimmed mean, trimmed standard deviation) in defining the scaled deviation outlyingness. The proofs of our MBP and SBP results are provided in the Appendix.

2 Preliminaries

Here we provide needed preliminaries with a minimum of detail. See Serfling (2010) and Serfling and Wang (2012) for elaboration and discussion.

2.1 Outlyingness functions

Let $F$ be a probability distribution on $\mathbb{R}$. “Outliers” are points or groups of points which lie apart from the central part of $F$ or from the main body of the data, or which are unusual, anomalous, or suspicious in some sense. Associated with $F$, an outlyingness function $O(x, F)$ provides a center-outward ordering of points $x$ in $\mathbb{R}$, with higher values representing greater “outlyingness” relative to a “center” measuring location. We suppose that $O(x, F)$ has lower and upper limits

\[ \inf_x O(x, F) = 0, \quad \sup_x O(x, F) = 1. \tag{1} \]

For a data set $\mathbb{X}_n = \{X_1, \ldots, X_n\}$ from $F$, a sample version of $O(x, F)$ is denoted by $O(x, \mathbb{X}_n)$ and may be considered to estimate $O(x, F)$. We denote its lower and upper limits by $O_n^*$ and $O_n^{**}$:

\[ \inf_x O(x, \mathbb{X}_n) = O_n^* \ (\geq 0), \quad \sup_x O(x, \mathbb{X}_n) = O_n^{**} \ (\leq 1). \tag{2} \]

Although typically $O_n^* = 0$ and $O_n^{**} = 1$, for one of our outlyingness functions it turns out that $O_n^* = n^{-1}$ if $n$ is odd and 0 otherwise.

2.2 Nonparametric outlier identification

For a given outlyingness function $O(x, F)$, we define “$\lambda$ outlier regions” that represent the points of outlyingness greater than the threshold $\lambda$:

\[ \mathrm{out}(\lambda, F) = \{x : O(x, F) > \lambda\}, \quad 0 < \lambda < 1. \]

The goal is to classify, for given threshold $\lambda$, all points $x$ of $\mathbb{R}$ as belonging to $\mathrm{out}(\lambda, F)$ or not. For this purpose, we estimate the region $\mathrm{out}(\lambda, F)$ by the sample version

\[ \mathrm{OR}(\lambda, \mathbb{X}_n) = \{x : O(x, \mathbb{X}_n) > \lambda\}. \]

It is understood that $\mathrm{OR}(\lambda, \mathbb{X}_n)$ includes, in principle, “regular” points from $F$ as well as “contaminants” originating from other sources. In some cases, $\mathrm{OR}(\lambda, \mathbb{X}_n)$ is given by $\mathrm{out}(\lambda, F_n)$ with $F_n$ an empirical df.

2.3 Masking and swamping robustness

2.3.1 Masking robustness

Let $\overline{A}$ denote the complement of a set $A$. Key sets regarding masking are of the form

\[ M(\lambda, \gamma, \mathbb{X}_n, F) = \overline{\mathrm{OR}(\lambda, \mathbb{X}_n)} \cap \mathrm{out}(\gamma, F), \]

defined for any $\lambda$ and $\gamma$. Masking occurs if

\[ M(\lambda, \gamma, \mathbb{X}_n, F) \neq \emptyset, \tag{3} \]

which requires $\lambda > O_n^*$. In this case some $\gamma$ outliers of $F$ are included in the sample threshold $\lambda$ nonoutlier region. For fixed $\lambda$, masking becomes more severe as $\gamma \uparrow 1$. That is, increasingly extreme outliers of $F$ become masked as sample threshold $\lambda$ nonoutliers. This represents extreme Type A masking. On the other hand, for fixed $\gamma$ masking becomes more severe as $\lambda \downarrow O_n^*$. That is, some threshold $\gamma$ outliers of $F$ are included within an increasingly central sample nonoutlier region. This represents extreme Type B masking.

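Now consider all possible modified data sets $\mathbb{X}_{n,k}$ obtainable by replacing $k$ observations of $\mathbb{X}_n$ by arbitrarily positioned new values (“contaminants”). Corresponding to the “fixed $\lambda$” and “fixed $\gamma$” cases, respectively, two indices that measure in different ways the size of the masking effect are

\[ \begin{aligned} \gamma_M(\lambda, \mathbb{X}_n, k) &= \text{largest } \gamma \text{ for which (3) with fixed } \lambda \text{ holds subject to } k \text{ replacements} \\ &= \sup\{\gamma < 1 : \exists\, k \text{ replacements such that } M(\lambda, \gamma, \mathbb{X}_{n,k}, F) \neq \emptyset\}, \end{aligned} \]

and

\[ \begin{aligned} \lambda_M(\gamma, \mathbb{X}_n, k) &= \text{smallest } \lambda \text{ for which (3) with fixed } \gamma \text{ holds subject to } k \text{ replacements} \\ &= \inf\{\lambda > O_n^* : \exists\, k \text{ replacements such that } M(\lambda, \gamma, \mathbb{X}_{n,k}, F) \neq \emptyset\}. \end{aligned} \]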

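The quantity $\gamma_M(\lambda, \mathbb{X}_n, k)$ represents the largest degree of outlyingness relative to $F$ that is nonidentifiable at sample outlyingness threshold $\lambda$. The worst possible case, $\gamma_M(\lambda, \mathbb{X}_n, k) = 1$, denotes Type A masking breakdown due to $k$ replacements. Letting $k_M^{(A)}(\lambda, \mathbb{X}_n) = \min\{k : \gamma_M(\lambda, \mathbb{X}_n, k) = 1\}$, the Type A masking breakdown point of $\mathrm{OR}(\cdot, \mathbb{X}_n)$ at sample outlyingness threshold $\lambda$ is then given by

\[ \mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n) = k_M^{(A)}(\lambda, \mathbb{X}_n)/n. \]

On the other hand, the quantity $\lambda_M(\gamma, \mathbb{X}_n, k)$ represents the most central level at which a $\gamma$ outlier of $F$ can be masked; the smaller it is, the worse the masking robustness of $\mathrm{OR}(\cdot, \mathbb{X}_n)$, and the worst possible case, $\lambda_M(\gamma, \mathbb{X}_n, k) = O_n^*$, denotes Type B masking breakdown due to $k$ replacements. Letting $k_M^{(B)}(\gamma, \mathbb{X}_n) = \min\{k : \lambda_M(\gamma, \mathbb{X}_n, k) = O_n^*\}$, the Type B masking breakdown point of $\mathrm{OR}(\cdot, \mathbb{X}_n)$ at $F$ outlyingness threshold $\gamma$ is then given by

\[ \mathrm{MBP}^{(B)}(\gamma, \mathbb{X}_n) = k_M^{(B)}(\gamma, \mathbb{X}_n)/n. \]

The higher the values of $\mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n)$ for $O_n^* < \lambda < 1$ and $\mathrm{MBP}^{(B)}(\gamma, \mathbb{X}_n)$ for $0 < \gamma < 1$, the greater the masking robustness of the outlier identifier $\mathrm{OR}(\cdot, \mathbb{X}_n)$.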

2.3.2 Swamping robustness

Key sets regarding swamping are of the form

\[ S(\lambda, \gamma, \mathbb{X}_n, F) = \mathrm{OR}(\lambda, \mathbb{X}_n) \cap \overline{\mathrm{out}(\gamma, F)}, \]

defined for any $\lambda$ and $\gamma$, and swamping occurs if

\[ S(\lambda, \gamma, \mathbb{X}_n, F) \neq \emptyset, \tag{4} \]

for $\lambda < O_n^{**}$. In this case some $\gamma$ nonoutliers of $F$ are included in the sample threshold $\lambda$ outlier region. For fixed $\lambda$, the swamping becomes more severe as $\gamma \downarrow 0$, with increasingly central nonoutliers of $F$ becoming included in the sample threshold $\lambda$ outlier region (extreme Type A swamping). For fixed $\gamma$, swamping becomes more severe as $\lambda \uparrow O_n^{**}$, with threshold $\gamma$ nonoutliers of $F$ included within an increasingly extreme sample outlier region (extreme Type B swamping).

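Again consider the modifications $\mathbb{X}_{n,k}$ obtainable by replacing $k$ observations of $\mathbb{X}_n$ by “contaminants”. Corresponding to the “fixed $\lambda$” and “fixed $\gamma$” cases, respectively, two indices related to extreme instances of swamping are

\[ \begin{aligned} \gamma_S(\lambda, \mathbb{X}_n, k) &= \text{smallest } \gamma \text{ for which (4) with fixed } \lambda \text{ holds subject to } k \text{ replacements} \\ &= \inf\{\gamma > 0 : \exists\, k \text{ replacements such that } S(\lambda, \gamma, \mathbb{X}_{n,k}, F) \neq \emptyset\}, \end{aligned} \]

and

\[ \begin{aligned} \lambda_S(\gamma, \mathbb{X}_n, k) &= \text{largest } \lambda \text{ for which (4) with fixed } \gamma \text{ holds subject to } k \text{ replacements} \\ &= \sup\{\lambda < O_n^{**} : \exists\, k \text{ replacements such that } S(\lambda, \gamma, \mathbb{X}_{n,k}, F) \neq \emptyset\}. \end{aligned} \]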

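The quantity $\gamma_S(\lambda, \mathbb{X}_n, k)$ represents the most central level of nonoutlier of $F$ that can be swamped at sample outlier threshold $\lambda$. The worst possible case, $\gamma_S(\lambda, \mathbb{X}_n, k) = 0$, denotes Type A swamping breakdown due to $k$ replacements. Letting $k_S^{(A)}(\lambda, \mathbb{X}_n) = \min\{k : \gamma_S(\lambda, \mathbb{X}_n, k) = 0\}$, the Type A swamping breakdown point of $\mathrm{OR}(\cdot, \mathbb{X}_n)$ at sample outlyingness threshold $\lambda$ is given by

\[ \mathrm{SBP}^{(A)}(\lambda, \mathbb{X}_n) = k_S^{(A)}(\lambda, \mathbb{X}_n)/n. \]

On the other hand, $\lambda_S(\gamma, \mathbb{X}_n, k)$ represents the most extreme sample outlyingness threshold at which a $\gamma$ nonoutlier of $F$ can be swamped (by the presence of $k$ replacements in the data $\mathbb{X}_n$); the larger it is, the worse the swamping robustness of $\mathrm{OR}(\cdot, \mathbb{X}_n)$, and the worst possible case, $\lambda_S(\gamma, \mathbb{X}_n, k) = O_n^{**}$, denotes Type B swamping breakdown due to $k$ replacements. Letting $k_S^{(B)}(\gamma, \mathbb{X}_n) = \min\{k : \lambda_S(\gamma, \mathbb{X}_n, k) = O_n^{**}\}$, the Type B swamping breakdown point of $\mathrm{OR}(\cdot, \mathbb{X}_n)$ at $F$ outlyingness threshold $\gamma$ is given by

\[ \mathrm{SBP}^{(B)}(\gamma, \mathbb{X}_n) = k_S^{(B)}(\gamma, \mathbb{X}_n)/n. \]

The higher the values of $\mathrm{SBP}^{(A)}(\lambda, \mathbb{X}_n)$ for $0 < \lambda < O_n^{**}$ and $\mathrm{SBP}^{(B)}(\gamma, \mathbb{X}_n)$ for $0 < \gamma < 1$, the greater the swamping robustness of the outlier identifier $\mathrm{OR}(\cdot, \mathbb{X}_n)$.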

2.3.3 The four masking and swamping robustness measures

In exploring a data set $\mathbb{X}_n$ using $\mathrm{OR}(\lambda, \mathbb{X}_n)$ as an estimator of $\mathrm{out}(\lambda, F)$ for a specified outlyingness threshold $\lambda$, Type A MBP and SBP quite naturally go together as companion robustness measures. On the other hand, one might focus on $\mathrm{out}(\gamma, F)$ for some $\gamma$ and ask how centrally this outlier region can be masked using $\mathrm{OR}(\cdot, \mathbb{X}_n)$. Also, with focus on $\overline{\mathrm{out}(\gamma, F)}$ for some $\gamma$, one might want to know how extremely this nonoutlier region can be swamped using $\mathrm{OR}(\cdot, \mathbb{X}_n)$. For these purposes, the Type B MBP and SBP are companion robustness measures that play roles complementary to the Type A versions.

2.4 Basic lemmas

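Here we provide basic lemmas for evaluating $\mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n)$, $\mathrm{MBP}^{(B)}(\gamma, \mathbb{X}_n)$, $\mathrm{SBP}^{(A)}(\lambda, \mathbb{X}_n)$, and $\mathrm{SBP}^{(B)}(\gamma, \mathbb{X}_n)$ in applications. These reduce the problem to that of evaluating ordinary breakdown points of certain “inf” and “sup” statistics.

For a real-valued statistic $T(\mathbb{X}_n)$ taking values in $[0, 1]$ or $(-\infty, +\infty)$, for example, explosion breakdown of $T(\mathbb{X}_n)$ occurs with $k$ points of $\mathbb{X}_n$ replaced if

\[ \sup_{\mathbb{X}_{n,k}} |T(\mathbb{X}_{n,k})| = \sup_{\mathbb{X}_{n,n}} |T(\mathbb{X}_{n,n})| =: T^{**}, \tag{5} \]

with $\mathbb{X}_{n,k}$ as previously. Typical values of $T^{**}$ are 1 or $\infty$. With $k_{\mathrm{exp}}(T(\mathbb{X}_n))$ denoting the minimum $k$ such that (5) can occur, the explosion replacement breakdown point of $T(\mathbb{X}_n)$ is given by $\mathrm{RBP}_{\mathrm{exp}}(T(\mathbb{X}_n)) = k_{\mathrm{exp}}(T(\mathbb{X}_n))/n$.

Likewise, implosion breakdown of $T(\mathbb{X}_n)$ occurs with $k$ points of $\mathbb{X}_n$ replaced if

\[ \inf_{\mathbb{X}_{n,k}} |T(\mathbb{X}_{n,k})| = \inf_{\mathbb{X}_{n,n}} |T(\mathbb{X}_{n,n})| =: T^{*}. \tag{6} \]

The typical value of $T^{*}$ is 0. With obvious notation, the implosion replacement breakdown point of $T(\mathbb{X}_n)$ is given by $\mathrm{RBP}_{\mathrm{imp}}(T(\mathbb{X}_n)) = k_{\mathrm{imp}}(T(\mathbb{X}_n))/n$.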
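The explosion breakdown idea can be illustrated numerically. The sketch below is an illustration only, not part of the formal development: it applies one particular replacement scheme (moving the uppermost observations to a large value that stands in for $+\infty$) to a hypothetical sample of size 11, whereas the breakdown point itself is defined over all possible replacements. The helper name `replace_uppermost` and the constant `big` are assumptions.

```python
import statistics

def replace_uppermost(data, k, value):
    """Return a sorted copy of data with its k largest observations replaced by `value`."""
    s = sorted(data)
    return s[: len(s) - k] + [value] * k

data = list(range(1, 12))  # hypothetical clean sample, n = 11
big = 1e9                  # stands in for a contaminant moving off to +infinity

mean_1 = statistics.mean(replace_uppermost(data, 1, big))      # one replacement
median_5 = statistics.median(replace_uppermost(data, 5, big))  # five replacements
median_6 = statistics.median(replace_uppermost(data, 6, big))  # six replacements
```

With a single replacement the mean is already carried away by the contaminant, while the median stays at 6 after five replacements and follows the contaminant only once six of the eleven observations have been replaced.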

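Representations for $\mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n)$, $\mathrm{MBP}^{(B)}(\gamma, \mathbb{X}_n)$, $\mathrm{SBP}^{(A)}(\lambda, \mathbb{X}_n)$, and $\mathrm{SBP}^{(B)}(\gamma, \mathbb{X}_n)$ in terms of the above explosion and implosion RBPs are given in the following lemmas from Serfling and Wang (2012).

Lemma 1 Type A masking breakdown with replacement of $k$ sample values ($\gamma_M(\lambda, \mathbb{X}_n, k) = 1$) holds if and only if $\sup_{\mathbb{X}_{n,k}} \sup_{y \in \overline{\mathrm{OR}(\lambda, \mathbb{X}_{n,k})}} O(y, F) = 1$, and hence

\[ \mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n) = \mathrm{RBP}_{\mathrm{exp}}\left( \sup_{y \in \overline{\mathrm{OR}(\lambda, \mathbb{X}_n)}} O(y, F) \right). \tag{7} \]

Lemma 2 Type B masking breakdown with replacement of $k$ sample values ($\lambda_M(\gamma, \mathbb{X}_n, k) = O_n^*$) holds if and only if $\inf_{\mathbb{X}_{n,k}} \inf_{y \in \mathrm{out}(\gamma, F)} O(y, \mathbb{X}_{n,k}) = O_n^*$, and hence

\[ \mathrm{MBP}^{(B)}(\gamma, \mathbb{X}_n) = \mathrm{RBP}_{\mathrm{imp}}\left( \inf_{y \in \mathrm{out}(\gamma, F)} O(y, \mathbb{X}_n) \right). \tag{8} \]

Lemma 3 Type A swamping breakdown with replacement of $k$ sample values ($\gamma_S(\lambda, \mathbb{X}_n, k) = 0$) holds if and only if $\inf_{\mathbb{X}_{n,k}} \inf_{y \in \mathrm{OR}(\lambda, \mathbb{X}_{n,k})} O(y, F) = 0$, and hence

\[ \mathrm{SBP}^{(A)}(\lambda, \mathbb{X}_n) = \mathrm{RBP}_{\mathrm{imp}}\left( \inf_{y \in \mathrm{OR}(\lambda, \mathbb{X}_n)} O(y, F) \right). \tag{9} \]

Lemma 4 Type B swamping breakdown with replacement of $k$ sample values ($\lambda_S(\gamma, \mathbb{X}_n, k) = O_n^{**}$) holds if and only if $\sup_{\mathbb{X}_{n,k}} \sup_{y \in \overline{\mathrm{out}(\gamma, F)}} O(y, \mathbb{X}_{n,k}) = O_n^{**}$, and hence

\[ \mathrm{SBP}^{(B)}(\gamma, \mathbb{X}_n) = \mathrm{RBP}_{\mathrm{exp}}\left( \sup_{y \in \overline{\mathrm{out}(\gamma, F)}} O(y, \mathbb{X}_n) \right). \tag{10} \]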

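3 MBP and SBP Results for Scaled Deviation Outlyingness

Scaled deviation outlyingness functions have been popularized by Mosteller and Tukey (1977), for example. Let $\mu(F)$ and $\sigma(F)$ be any location and spread measures. The corresponding scaled deviation outlyingness function taking values in $[0, 1)$ is given by $O(x, F) = \tilde{O}(x, F)/(1 + \tilde{O}(x, F))$, with

\[ \tilde{O}(x, F) = \left|\frac{x-\mu(F)}{\sigma(F)}\right|, \]

and sample versions $O(x, \mathbb{X}_n)$ and $\tilde{O}(x, \mathbb{X}_n)$ are similarly defined using $\mu(\mathbb{X}_n)$ and $\sigma(\mathbb{X}_n)$. Note that we have $O_n^* = 0$ and $O_n^{**} = 1$. It is straightforward to express Lemmas 1-4 in terms of $\tilde{O}(x, F)$ and $\tilde{O}(x, \mathbb{X}_n)$, and we obtain

\[ \mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n) = \mathrm{RBP}_{\mathrm{exp}}\left( \sup_{y \in \overline{\mathrm{OR}(\lambda, \mathbb{X}_n)}} \tilde{O}(y, F) \right), \tag{11} \]

\[ \mathrm{MBP}^{(B)}(\gamma, \mathbb{X}_n) = \mathrm{RBP}_{\mathrm{imp}}\left( \inf_{y \in \mathrm{out}(\gamma, F)} \tilde{O}(y, \mathbb{X}_n) \right), \tag{12} \]

\[ \mathrm{SBP}^{(A)}(\lambda, \mathbb{X}_n) = \mathrm{RBP}_{\mathrm{imp}}\left( \inf_{y \in \mathrm{OR}(\lambda, \mathbb{X}_n)} \tilde{O}(y, F) \right), \tag{13} \]

\[ \mathrm{SBP}^{(B)}(\gamma, \mathbb{X}_n) = \mathrm{RBP}_{\mathrm{exp}}\left( \sup_{y \in \overline{\mathrm{out}(\gamma, F)}} \tilde{O}(y, \mathbb{X}_n) \right). \tag{14} \]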

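Below we treat each of these in turn. Comparative discussion with a table of the results together is deferred to Section 6.

For convenience, we put $\mu(\mathbb{X}_n) = \hat{\mu}$ and $\sigma(\mathbb{X}_n) = \hat{\sigma}$, and we abbreviate $\mu(F)$ and $\sigma(F)$ by $\mu$ and $\sigma$. Now, in terms of the $\tilde{O}$ versions, we have $\mathrm{out}(\gamma, F) = \{x : O(x, F) > \gamma\} = \{x : \tilde{O}(x, F) > \eta\}$, with $\eta = \gamma/(1 - \gamma)$, and $\mathrm{OR}(\lambda, \mathbb{X}_n) = \{x : O(x, \mathbb{X}_n) > \lambda\} = \{x : \tilde{O}(x, \mathbb{X}_n) > \beta\}$, with $\beta = \lambda/(1 - \lambda)$. Note that $\eta \uparrow \infty$ as $\gamma \uparrow 1$, and $\beta \uparrow \infty$ as $\lambda \uparrow 1$. Accordingly, the above inf and sup expressions involve the regions

\[ \overline{\mathrm{out}(\gamma, F)} = [\mu(F) - \eta\sigma(F),\ \mu(F) + \eta\sigma(F)] \]

\[ \overline{\mathrm{OR}(\lambda, \mathbb{X}_n)} = [\mu(\mathbb{X}_n) - \beta\sigma(\mathbb{X}_n),\ \mu(\mathbb{X}_n) + \beta\sigma(\mathbb{X}_n)] \]

and their complements.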

3.1 Results for MBP(A)(λ, Xn)

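We adopt

Assumption A. $\mathrm{RBP}_{\mathrm{exp}}(|\hat{\mu} + \beta\hat{\sigma}|, \mathbb{X}_n)$ and $\mathrm{RBP}_{\mathrm{exp}}(\hat{\mu}, \mathbb{X}_n)$ are invariant if $\mathbb{X}_n$ is replaced by $-\mathbb{X}_n$, i.e., if each observation $X_i$ is replaced by $-X_i$, $1 \leq i \leq n$.

Proposition 5 Under Assumption A,

\[ \mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n) = n^{-1}\min\{k_1, k_2\} = \min\{\mathrm{RBP}_{\mathrm{exp}}(\hat{\mu}),\ \mathrm{RBP}_{\mathrm{exp}}(\hat{\sigma} \mid \hat{\mu} \text{ bounded})\}. \tag{15} \]

Remark. Note that $\mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n)$ does not depend upon the threshold $\lambda$. In typical cases, we have

\[ \mathrm{RBP}_{\mathrm{exp}}(\hat{\sigma} \mid \hat{\mu} \text{ bounded}) \geq \mathrm{RBP}_{\mathrm{exp}}(\hat{\sigma}) \geq \mathrm{RBP}_{\mathrm{exp}}(\hat{\mu}), \]

in which case simply $\mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n) = \mathrm{RBP}_{\mathrm{exp}}(\hat{\mu})$.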

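Examples. (Assumption A is satisfied in each case.)

(i) Mean and Standard Deviation. We take $\hat{\mu} = \bar{X}$ and $\hat{\sigma} = S$. It is straightforward that $\mathrm{RBP}_{\mathrm{exp}}(\hat{\mu}) = n^{-1}$, the minimum possible, yielding $\mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n) = n^{-1} \approx 0$. (In passing, we note that $\mathrm{RBP}_{\mathrm{exp}}(\hat{\sigma} \mid \hat{\mu} \text{ bounded}) = 2n^{-1}$.)

(ii) Median and MAD. We take $\hat{\mu} = \mathrm{Med}(\mathbb{X}_n)$ and $\hat{\sigma} = \mathrm{MAD}(\mathbb{X}_n)$. For $n = 2m + 1$, to obtain $\mathrm{Med} \to \infty$, we require that $m + 1$ observations $\to \infty$. On the other hand, for $n = 2m$, to obtain $\mathrm{Med} \to \infty$, we require that $m$ observations $\to \infty$. In either case, we have $\mathrm{RBP}_{\mathrm{exp}}(\hat{\mu}) = n^{-1}\lfloor (n+1)/2 \rfloor$. For $n = 2m + 1$, to obtain $\mathrm{MAD} \to \infty$ with Med bounded, we require that $m$ observations $\to \infty$ and 1 observation $\to -\infty$, for a total of $m + 1$ replaced observations. On the other hand, for $n = 2m$, to obtain $\mathrm{MAD} \to \infty$ with Med bounded, we require that $m - 1$ observations $\to \infty$ and that 1 observation $\to -\infty$, for a total of $m$ replaced observations. In either case, we have $\mathrm{RBP}_{\mathrm{exp}}(\hat{\sigma} \mid \hat{\mu} \text{ bounded}) = n^{-1}\lfloor (n+1)/2 \rfloor$. This yields

\[ \mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n) = n^{-1}\left\lfloor \frac{n+1}{2} \right\rfloor \approx \frac{1}{2}. \]

(iii) α-Trimmed Mean and SD. Let $\mathbb{X}_{(n - 2\lfloor n\alpha \rfloor)}$ denote the $n - 2\lfloor n\alpha \rfloor$ observations that remain after trimming away the upper $\lfloor n\alpha \rfloor$ observations and the lower $\lfloor n\alpha \rfloor$ observations. Then take $\hat{\mu}$ to be the mean and $\hat{\sigma}$ the standard deviation of the data set $\mathbb{X}_{(n - 2\lfloor n\alpha \rfloor)}$. It is readily checked that $\mathrm{RBP}_{\mathrm{exp}}(\hat{\mu}) = (\lfloor n\alpha \rfloor + 1)/n$ and that $\mathrm{RBP}_{\mathrm{exp}}(\hat{\sigma} \mid \hat{\mu} \text{ bounded}) = 2(\lfloor n\alpha \rfloor + 1)/n$, yielding

\[ \mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n) = n^{-1}(\lfloor n\alpha \rfloor + 1) \approx \alpha. \]

Note that the result in (iii) approaches that in (i) as $\alpha \to 0$ and that in (ii) as $\alpha \to 1/2$. □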
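The three closed forms in these examples can be tabulated directly. The following is a minimal sketch with hypothetical function names; it simply transcribes the expressions $n^{-1}$, $n^{-1}\lfloor (n+1)/2 \rfloor$, and $n^{-1}(\lfloor n\alpha \rfloor + 1)$ stated above.

```python
from math import floor

def mbp_a_mean_sd(n):
    """Type A MBP for (mean, SD): 1/n."""
    return 1 / n

def mbp_a_median_mad(n):
    """Type A MBP for (Median, MAD): floor((n + 1) / 2) / n."""
    return ((n + 1) // 2) / n

def mbp_a_trimmed(n, alpha):
    """Type A MBP for the alpha-trimmed mean and SD: (floor(n * alpha) + 1) / n."""
    # floor(n * alpha) is computed in floating point; pick alpha so that n * alpha is exact.
    return (floor(n * alpha) + 1) / n

n = 100
values = (mbp_a_mean_sd(n), mbp_a_median_mad(n), mbp_a_trimmed(n, 0.25))
```

For $n = 100$ and $\alpha = 0.25$ this gives 0.01, 0.5, and 0.26 respectively, in line with the approximations 0, 1/2, and $\alpha$.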

3.2 Results for MBP(B)(γ, Xn)

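Proposition 6 We have

\[ \mathrm{MBP}^{(B)}(\gamma, \mathbb{X}_n) = \begin{cases} 0 & \text{if } \hat{\mu} \notin (\mu - \eta\sigma,\ \mu + \eta\sigma) \\ \min\{\mathrm{RBP}_{\mathrm{exp}}(\hat{\mu}),\ \mathrm{RBP}_{\mathrm{exp}}(\hat{\sigma} \mid \hat{\mu} \text{ bounded})\} & \text{if } \hat{\mu} \in (\mu - \eta\sigma,\ \mu + \eta\sigma) \end{cases} \]

\[ = \begin{cases} 0 & \text{if } \hat{\mu} \notin (\mu - \eta\sigma,\ \mu + \eta\sigma) \\ \mathrm{MBP}^{(A)}(\cdot, \mathbb{X}_n) & \text{if } \hat{\mu} \in (\mu - \eta\sigma,\ \mu + \eta\sigma). \end{cases} \]

Remark. Unlike $\mathrm{MBP}^{(A)}(\cdot, \mathbb{X}_n)$, $\mathrm{MBP}^{(B)}(\gamma, \mathbb{X}_n)$ does depend upon the threshold $\gamma$, but only weakly. When $\hat{\mu} \in (\mu - \eta\sigma, \mu + \eta\sigma)$, then $\mathrm{MBP}^{(B)}(\gamma, \mathbb{X}_n)$ takes the same value as $\mathrm{MBP}^{(A)}(\cdot, \mathbb{X}_n)$, which does not depend specifically on the threshold $\gamma$. When $\hat{\mu} \notin (\mu - \eta\sigma, \mu + \eta\sigma)$, then $\mathrm{MBP}^{(B)}(\gamma, \mathbb{X}_n)$ takes the value 0, which again does not depend specifically on the threshold $\gamma$. Note that the case $\hat{\mu} \notin (\mu - \eta\sigma, \mu + \eta\sigma)$ has decreasing probability as $n$ increases, in the typical case that $\hat{\mu}$ is a consistent estimator of $\mu$.

Examples. When $\mu(\mathbb{X}_n) \notin (\mu - \eta\sigma, \mu + \eta\sigma)$, we have $\mathrm{MBP}^{(B)}(\cdot, \mathbb{X}_n) = 0$. For the case $\mu(\mathbb{X}_n) \in (\mu - \eta\sigma, \mu + \eta\sigma)$, we obtain the same value as $\mathrm{MBP}^{(A)}(\cdot, \mathbb{X}_n)$. Thus, for the latter case and the examples previously considered, we have:

(i) Mean and Standard Deviation with $\mu(\mathbb{X}_n) \in (\mu - \eta\sigma, \mu + \eta\sigma)$.

\[ \mathrm{MBP}^{(B)}(\cdot, \mathbb{X}_n) = n^{-1} \approx 0. \]

(ii) Median and MAD with $\mu(\mathbb{X}_n) \in (\mu - \eta\sigma, \mu + \eta\sigma)$.

\[ \mathrm{MBP}^{(B)}(\cdot, \mathbb{X}_n) = n^{-1}\left\lfloor \frac{n+1}{2} \right\rfloor \approx \frac{1}{2}. \]

(iii) α-Trimmed Mean and SD with $\mu(\mathbb{X}_n) \in (\mu - \eta\sigma, \mu + \eta\sigma)$.

\[ \mathrm{MBP}^{(B)}(\cdot, \mathbb{X}_n) = n^{-1}(\lfloor n\alpha \rfloor + 1) \approx \alpha. \]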


3.3 Results for SBP(A)(λ, Xn)

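We adopt

Assumption B. $\mathrm{RBP}_{\mathrm{imp}}\left(|\hat{\mu} + \beta\hat{\sigma} - \mu| \, I(\mu \in (\hat{\mu} - \beta\hat{\sigma}, \hat{\mu} + \beta\hat{\sigma}))\right)$ and its counterpart $\mathrm{RBP}_{\mathrm{imp}}\left(|\hat{\mu} - \beta\hat{\sigma} - \mu| \, I(\mu \in (\hat{\mu} - \beta\hat{\sigma}, \hat{\mu} + \beta\hat{\sigma}))\right)$ are invariant if $\mathbb{X}_n$ is replaced by $-\mathbb{X}_n$, i.e., if each observation $X_i$ is replaced by $-X_i$, $1 \leq i \leq n$.

Proposition 7 Under Assumption B, we have

\[ \mathrm{SBP}^{(A)}(\lambda, \mathbb{X}_n) = \begin{cases} 0 & \text{if } \mu \notin (\hat{\mu} - \beta\hat{\sigma},\ \hat{\mu} + \beta\hat{\sigma}) \\ \min\{\mathrm{RBP}_{\mathrm{imp}}(\hat{\mu} + \beta\hat{\sigma} - \mu),\ \mathrm{RBP}_{\mathrm{exp}}(|\hat{\mu}| - \beta\hat{\sigma})\} & \text{if } \mu \in (\hat{\mu} - \beta\hat{\sigma},\ \hat{\mu} + \beta\hat{\sigma}), \end{cases} \]

where here $\mathrm{RBP}_{\mathrm{imp}}$ refers to implosion to 0 and $\mathrm{RBP}_{\mathrm{exp}}$ refers to explosion to $+\infty$.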

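Examples. When $\mu(F) \notin (\hat{\mu} - \beta\hat{\sigma}, \hat{\mu} + \beta\hat{\sigma})$, we have $\mathrm{SBP}^{(A)}(\cdot, \mathbb{X}_n) = 0$. For the case $\mu(F) \in (\hat{\mu} - \beta\hat{\sigma}, \hat{\mu} + \beta\hat{\sigma})$, we obtain the following results for the examples previously considered.

(i) Mean and Standard Deviation with $\mu(F) \in (\hat{\mu} - \beta\hat{\sigma}, \hat{\mu} + \beta\hat{\sigma})$. Note that for $\hat{\mu} + \beta\hat{\sigma} \to \mu$ we need that all of the $n$ data points be placed at $\mu$, yielding $\mathrm{RBP}_{\mathrm{imp}}(\hat{\mu} + \beta\hat{\sigma} - \mu) = 1$. Now, for $|\hat{\mu}| - \beta\hat{\sigma} \to \infty$, it is readily derived that if we place $k$ observations at $x^* \to \infty$, then (for these $\hat{\mu}$ and $\hat{\sigma}$) $\hat{\mu} \sim \frac{k}{n}x^*$ and $\hat{\sigma} \sim \sqrt{\frac{k}{n}\left(1 - \frac{k}{n}\right)}\,x^*$, in which case $|\hat{\mu}| - \beta\hat{\sigma} \to \infty$ if and only if $k > \frac{\beta^2}{1+\beta^2}n = \frac{\lambda^2}{\lambda^2 + (1-\lambda)^2}n$ and thus

\[ \mathrm{SBP}^{(A)}(\cdot, \mathbb{X}_n) = \mathrm{RBP}_{\mathrm{exp}}(|\hat{\mu}| - \beta\hat{\sigma}) = n^{-1}\left\lceil \frac{\lambda^2}{\lambda^2 + (1-\lambda)^2}\,n \right\rceil \approx \frac{\lambda^2}{\lambda^2 + (1-\lambda)^2}. \]

(ii) Median and MAD with $\mu(F) \in (\hat{\mu} - \beta\hat{\sigma}, \hat{\mu} + \beta\hat{\sigma})$. Similar steps as in (i) above yield (restricting to $\lambda < 1/2$ when $n$ is even)

\[ \mathrm{SBP}^{(A)}(\cdot, \mathbb{X}_n) = n^{-1}\left\lfloor \frac{n+1}{2} \right\rfloor \approx \frac{1}{2}, \]

which we note is the same as $\mathrm{MBP}^{(A)}(\cdot, \mathbb{X}_n)$ for the Median and MAD.

(iii) α-Trimmed Mean and SD with $\mu(F) \in (\hat{\mu} - \beta\hat{\sigma}, \hat{\mu} + \beta\hat{\sigma})$. Similar steps as in (i) now yield

\[ \mathrm{SBP}^{(A)}(\cdot, \mathbb{X}_n) = n^{-1}\left\lceil \frac{\lambda^2}{\lambda^2 + (1-\lambda)^2}\,(n - 2\lfloor n\alpha \rfloor) + \lfloor n\alpha \rfloor \right\rceil \approx \frac{\lambda^2}{\lambda^2 + (1-\lambda)^2}\,(1 - 2\alpha) + \alpha. \]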
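The threshold $k > \frac{\beta^2}{1+\beta^2}n$ from example (i) can be checked numerically. The sketch below rests on several assumptions: the function names are hypothetical, a large finite value plays the role of $x^* \to \infty$, and the divide-by-$n$ standard deviation is used so that the relation $\hat{\sigma} \sim \sqrt{(k/n)(1-k/n)}\,x^*$ holds without a small-sample factor (with the divide-by-$(n-1)$ version the boundary can shift for small $n$).

```python
import statistics
from math import ceil

def sbp_a_mean_sd(lam, n):
    """Closed form from example (i): ceil(lam^2 / (lam^2 + (1 - lam)^2) * n) / n."""
    frac = lam**2 / (lam**2 + (1 - lam) ** 2)
    return ceil(frac * n) / n

def mean_minus_beta_sd(data, k, x_star, lam):
    """|mean| - beta * SD after moving the k uppermost observations to x_star."""
    beta = lam / (1 - lam)
    contaminated = sorted(data)[: len(data) - k] + [x_star] * k
    # pstdev divides by n (an assumption made to match the asymptotic relation)
    return abs(statistics.mean(contaminated)) - beta * statistics.pstdev(contaminated)

data = list(range(1, 21))  # hypothetical clean sample, n = 20
lam = 0.6                  # beta = 1.5, threshold fraction 9/13, about 0.692
below = mean_minus_beta_sd(data, 13, 1e9, lam)  # k = 13 < 0.692 * 20
above = mean_minus_beta_sd(data, 14, 1e9, lam)  # k = 14 > 0.692 * 20
```

With $\lambda = 0.6$ and $n = 20$ the closed form gives $14/20 = 0.7$; correspondingly `below` is negative (the quantity heads to $-\infty$ as $x^*$ grows) while `above` is positive.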


3.4 Results for SBP(B)(γ, Xn)

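Let us adopt

Assumption C. $\mathrm{RBP}_{\mathrm{exp}}\left(\left|\frac{\mu + \eta\sigma - \hat{\mu}}{\hat{\sigma}}\right|\right)$ and $\mathrm{RBP}_{\mathrm{exp}}\left(\left|\frac{\mu - \eta\sigma - \hat{\mu}}{\hat{\sigma}}\right|\right)$ are invariant if $\mathbb{X}_n$ is replaced by $-\mathbb{X}_n$, i.e., if each observation $X_i$ is replaced by $-X_i$, $1 \leq i \leq n$.

Proposition 8 Under Assumption C, we have

\[ \mathrm{SBP}^{(B)}(\gamma, \mathbb{X}_n) = \min\{\mathrm{RBP}_{\mathrm{exp}}(|\hat{\mu}| \mid \hat{\sigma} = o(|\hat{\mu}|)),\ \mathrm{RBP}_{\mathrm{imp}}(\hat{\sigma} \mid \hat{\mu} \text{ bounded})\}. \]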

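Examples. (i) Mean and Standard Deviation. For $\hat{\mu} \to \infty$ with $\hat{\sigma} = o(|\hat{\mu}|)$, we need that all of the $n$ data points $\to \infty$ in a pattern with their spread about $\hat{\mu}$ not increasing as fast as $|\hat{\mu}|$. That is, for this $\hat{\mu}$ and $\hat{\sigma}$, we have

\[ \mathrm{RBP}_{\mathrm{exp}}(|\hat{\mu}| \mid \hat{\sigma} = o(|\hat{\mu}|)) = 1. \]

Now, for $\hat{\sigma} \to 0$ with $\hat{\mu}$ bounded, we need to move $n - 1$ observations to a chosen $n$th observation, yielding

\[ \mathrm{RBP}_{\mathrm{imp}}(\hat{\sigma} \mid \hat{\mu} \text{ bounded}) = \frac{n-1}{n} \]

and thus

\[ \mathrm{SBP}^{(B)}(\cdot, \mathbb{X}_n) = \frac{n-1}{n} \approx 1. \]

(ii) Median and MAD. For $n = 2m + 1$, we may take the Median and $m$ other observations to $\infty$, resulting in the MAD remaining bounded. For $n = 2m$, we may take the two middle observations and $m - 1$ other observations to $\infty$, again resulting in the MAD remaining bounded. Then, either way taking $m + 1$ to $\infty$, we have

\[ \mathrm{RBP}_{\mathrm{exp}}(|\hat{\mu}| \mid \hat{\sigma} = o(|\hat{\mu}|)) = n^{-1}\left\lfloor \frac{n+1}{2} \right\rfloor. \]

For $\hat{\sigma} \to 0$ with $\hat{\mu}$ bounded, whether $n = 2m + 1$ or $n = 2m$ we need to move $m$ observations to a common point. Namely, for $n$ odd we move the uppermost $m$ observations to Med, and for $n$ even we move the uppermost $m$ observations to the $m$th, yielding

\[ \mathrm{RBP}_{\mathrm{imp}}(\hat{\sigma} \mid \hat{\mu} \text{ bounded}) = n^{-1}\left\lfloor \frac{n}{2} \right\rfloor \]

and thus

\[ \mathrm{SBP}^{(B)}(\cdot, \mathbb{X}_n) = n^{-1}\left\lfloor \frac{n}{2} \right\rfloor \approx \frac{1}{2}. \]

(iii) α-Trimmed Mean and SD. For $\hat{\mu} \to \infty$ with $\hat{\sigma} = o(|\hat{\mu}|)$, we need to take $n - \lfloor n\alpha \rfloor$ observations to $\infty$, yielding

\[ \mathrm{RBP}_{\mathrm{exp}}(|\hat{\mu}| \mid \hat{\sigma} = o(|\hat{\mu}|)) = n^{-1}(n - \lfloor n\alpha \rfloor). \]

For $\hat{\sigma} \to 0$ with $\hat{\mu}$ bounded, we need to take $n - 2\lfloor n\alpha \rfloor - 1$ observations to a common point, yielding

\[ \mathrm{RBP}_{\mathrm{imp}}(\hat{\sigma} \mid \hat{\mu} \text{ bounded}) = n^{-1}(n - 2\lfloor n\alpha \rfloor - 1) \]

and thus

\[ \mathrm{SBP}^{(B)}(\cdot, \mathbb{X}_n) = n^{-1}(n - 2\lfloor n\alpha \rfloor - 1) \approx 1 - 2\alpha. \]

This result approaches that in (i) as $\alpha \to 0$ but does not approach that in (ii) as $\alpha \to 1/2$. This difference in the latter case is due to the difference between the trimmed SD and the MAD and also due to the fact that it is the influence of inliers which makes the SBP as low as $1 - 2\alpha$ instead of $1 - \alpha$, in which case (iii) would agree with (ii) as $\alpha \to 1/2$. □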
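To make the closing observation concrete, the closed forms of these examples can be evaluated side by side. This is a minimal sketch with hypothetical function names that only transcribes $(n-1)/n$, $n^{-1}\lfloor n/2 \rfloor$, and $n^{-1}(n - 2\lfloor n\alpha \rfloor - 1)$.

```python
from math import floor

def sbp_b_mean_sd(n):
    """Type B SBP for (mean, SD): (n - 1) / n."""
    return (n - 1) / n

def sbp_b_median_mad(n):
    """Type B SBP for (Median, MAD): floor(n / 2) / n."""
    return (n // 2) / n

def sbp_b_trimmed(n, alpha):
    """Type B SBP for the alpha-trimmed mean and SD: (n - 2 * floor(n * alpha) - 1) / n."""
    # floor(n * alpha) is computed in floating point; pick alpha so that n * alpha is exact.
    return (n - 2 * floor(n * alpha) - 1) / n

n = 100
light_trim = sbp_b_trimmed(n, 0.0)    # no trimming: agrees with the mean/SD value
heavy_trim = sbp_b_trimmed(n, 0.25)   # 25% trimming on each side
```

For $n = 100$, `light_trim` equals the mean/SD value 0.99, while `heavy_trim` is 0.49, already slightly below the Median/MAD value 0.5 and decreasing further as $\alpha$ grows toward 1/2.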

4 MBP and SBP Results for Centered Rank Outlyingness

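Let $F$ be a continuous distribution on $\mathbb{R}$. The corresponding centered rank outlyingness function taking values in $[0, 1]$ is given by

\[ O(x, F) = |2F(x) - 1|. \]

For the sample version, we employ the usual sample df $F_n(x) = n^{-1}\sum_{i=1}^{n} I(X_i \leq x)$, and define $O(x, \mathbb{X}_n) = |2F_n(x) - 1|$. It is readily checked that $O_n^{**} = 1$ whereas, due to the fact that $F_n(x)$ takes values only of the form $k/n$, $O_n^*$ is not strictly 0, but rather

\[ O_n^* = \left| \frac{2\lfloor (n+1)/2 \rfloor}{n} - 1 \right| = \begin{cases} 0, & n \text{ even} \\ \frac{1}{n}, & n \text{ odd.} \end{cases} \tag{16} \]

We have

\[ \overline{\mathrm{out}(\gamma, F)} = \overline{\{x : |2F(x) - 1| > \gamma\}} = \left[ F^{-1}\left(\frac{1-\gamma}{2}\right),\ F^{-1}\left(\frac{1+\gamma}{2}\right) \right] \]

\[ \overline{\mathrm{OR}(\lambda, \mathbb{X}_n)} = \left[ F_n^{-1}\left(\frac{1-\lambda}{2}\right),\ F_n^{-1}\left(\frac{1+\lambda}{2}\right) \right]. \]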
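Equation (16) can be confirmed on small samples. The sketch below is illustrative only; the function name is hypothetical and the data are assumed to have no ties. Because $F_n$ is a step function that takes each of its positive values at some data point, and equals 0 (outlyingness 1) below the smallest observation, it suffices to minimize over the data points.

```python
from bisect import bisect_right

def min_centered_rank_outlyingness(data):
    """Smallest value of |2 F_n(x) - 1| over real x, found by scanning the data points."""
    s = sorted(data)
    n = len(s)
    # bisect_right(s, x) / n is the empirical cdf F_n evaluated at x
    return min(abs(2 * bisect_right(s, x) / n - 1) for x in s)

odd_case = min_centered_rank_outlyingness([1, 2, 3, 4, 5])  # n = 5
even_case = min_centered_rank_outlyingness([1, 2, 3, 4])    # n = 4
```

For $n = 5$ the minimum is $1/5$ (up to floating-point rounding) and for $n = 4$ it is 0, as (16) states.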

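As with the scaled deviation outlyingness, we treat each of $\mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n)$, $\mathrm{MBP}^{(B)}(\gamma, \mathbb{X}_n)$, $\mathrm{SBP}^{(A)}(\lambda, \mathbb{X}_n)$, and $\mathrm{SBP}^{(B)}(\gamma, \mathbb{X}_n)$ in turn.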

4.1 Results for MBP(A)(λ, Xn)


\[ \mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n) = n^{-1}\left\lceil \frac{1-\lambda}{2}\,n \right\rceil \approx \frac{1-\lambda}{2}. \]

Remark. Note that $\mathrm{MBP}^{(A)}(\lambda, \mathbb{X}_n)$ depends upon the threshold $\lambda$ and decreases as $\lambda$ increases. Note also (from the proof) that Type A masking breakdown is attained by replacing observations in such a way that either the $\frac{1-\lambda}{2}$ sample quantile $\to -\infty$ or the $\frac{1+\lambda}{2}$ sample quantile $\to +\infty$, i.e., by explosion breakdown of either of these sample quantiles due to outliers.

4.2 Results for MBP(B)(γ, Xn)


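\[ \mathrm{MBP}^{(B)}(\gamma, \mathbb{X}_n) = \begin{cases} O_n^* & \text{if } F_n^{-1}\left(\tfrac{1}{2}\right) \notin \left( F^{-1}\left(\tfrac{1-\gamma}{2}\right),\ F^{-1}\left(\tfrac{1+\gamma}{2}\right) \right), \\[6pt] \min\left\{ n^{-1}\left\lceil \tfrac{n+1}{2} \right\rceil - F_n\left(F^{-1}\left(\tfrac{1-\gamma}{2}\right)\right),\ F_n\left(F^{-1}\left(\tfrac{1+\gamma}{2}\right)\right) - n^{-1}\left\lfloor \tfrac{n+1}{2} \right\rfloor \right\} & \text{if } F_n^{-1}\left(\tfrac{1}{2}\right) \in \left( F^{-1}\left(\tfrac{1-\gamma}{2}\right),\ F^{-1}\left(\tfrac{1+\gamma}{2}\right) \right) \end{cases} \]

\[ \approx \frac{\gamma}{2}. \]

Remark. Note that $\mathrm{MBP}^{(B)}(\gamma, \mathbb{X}_n)$ depends upon the threshold $\gamma$ and increases as $\gamma$ increases. The case $F_n^{-1}\left(\tfrac{1}{2}\right) \in \left( F^{-1}\left(\tfrac{1-\gamma}{2}\right), F^{-1}\left(\tfrac{1+\gamma}{2}\right) \right)$ has probability increasing to 1 as $n$ increases. Note also (from the proof, omitted) that Type B masking breakdown is attained by replacing observations in such a way that the sample median $\to$ either the $\frac{1-\gamma}{2}$ population quantile or the $\frac{1+\gamma}{2}$ population quantile, i.e., by implosion breakdown of the sample median due to inliers.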

4.3 Results for SBP(A)(λ, Xn)


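\[ \mathrm{SBP}^{(A)}(\lambda, \mathbb{X}_n) = \begin{cases} 0 & \text{if } F^{-1}\left(\tfrac{1}{2}\right) \notin \left( F_n^{-1}\left(\tfrac{1-\lambda}{2}\right),\ F_n^{-1}\left(\tfrac{1+\lambda}{2}\right) \right), \\[6pt] \min\left\{ F_n\left(F^{-1}\left(\tfrac{1}{2}\right)\right) - n^{-1}\left\lceil \tfrac{1-\lambda}{2}\,n \right\rceil + n^{-1} I\left(F^{-1}\left(\tfrac{1}{2}\right) \notin \mathbb{X}_n\right),\ n^{-1}\left\lceil \tfrac{1+\lambda}{2}\,n \right\rceil - F_n\left(F^{-1}\left(\tfrac{1}{2}\right)\right) \right\} & \text{if } F^{-1}\left(\tfrac{1}{2}\right) \in \left( F_n^{-1}\left(\tfrac{1-\lambda}{2}\right),\ F_n^{-1}\left(\tfrac{1+\lambda}{2}\right) \right) \end{cases} \]

\[ \approx \frac{\lambda}{2}. \]

Remark. Note that $\mathrm{SBP}^{(A)}(\lambda, \mathbb{X}_n)$ depends upon the threshold $\lambda$ and increases as $\lambda$ increases. For this outlyingness function, the severest swamping breakdown is due to inliers. Note that $\mathrm{SBP}^{(A)}(\lambda, \mathbb{X}_n)$ is affected (slightly) by whether a data point equals the population median, an event of probability 0 for $F$ continuous. The case $F^{-1}\left(\tfrac{1}{2}\right) \in \left( F_n^{-1}\left(\tfrac{1-\lambda}{2}\right), F_n^{-1}\left(\tfrac{1+\lambda}{2}\right) \right)$ has probability increasing to 1 as $n$ increases. Note also (from the proof, omitted) that Type A swamping breakdown is attained by replacing observations in such a way that either the $\frac{1-\lambda}{2}$ sample quantile or the $\frac{1+\lambda}{2}$ sample quantile $\to$ the population median, i.e., by implosion breakdown of either the $\frac{1-\lambda}{2}$ sample quantile or the $\frac{1+\lambda}{2}$ sample quantile, due to inliers.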

4.4 Results for SBP(B)(γ, Xn)


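\[ \mathrm{SBP}^{(B)}(\gamma, \mathbb{X}_n) = \begin{cases} 0 & \text{if } \left( F^{-1}\left(\tfrac{1-\gamma}{2}\right),\ F^{-1}\left(\tfrac{1+\gamma}{2}\right) \right] \not\subset (X_{1:n}, X_{n:n}), \\[6pt] \min\left\{ 1 - F_n\left(F^{-1}\left(\tfrac{1-\gamma}{2}\right)\right),\ F_n\left(F^{-1}\left(\tfrac{1-\gamma}{2}\right)\right),\ 1 - F_n\left(F^{-1}\left(\tfrac{1+\gamma}{2}\right)\right),\ F_n\left(F^{-1}\left(\tfrac{1+\gamma}{2}\right)\right) \right\} & \text{if } \left( F^{-1}\left(\tfrac{1-\gamma}{2}\right),\ F^{-1}\left(\tfrac{1+\gamma}{2}\right) \right] \subset (X_{1:n}, X_{n:n}) \end{cases} \]

\[ \approx \frac{1-\gamma}{2}. \]

Remark. We see that $\mathrm{SBP}^{(B)}(\gamma, \mathbb{X}_n)$ depends upon the threshold $\gamma$ and decreases as $\gamma$ increases, and it is affected by whether the range of the data set includes the $\left(\frac{1-\gamma}{2}\right)$th and $\left(\frac{1+\gamma}{2}\right)$th population quantiles, which event does occur with probability $\uparrow 1$ as $n \to \infty$. Note also (from the proof, omitted) that Type B swamping breakdown is attained by replacing observations in such a way that either the evaluation of the sample cdf at the $\frac{1-\gamma}{2}$ population quantile $\to 0$ or 1, or its evaluation at the $\frac{1+\gamma}{2}$ population quantile $\to 0$ or 1, i.e., by explosion breakdown of either $F_n\left(F^{-1}\left(\frac{1-\gamma}{2}\right)\right)$ or $F_n\left(F^{-1}\left(\frac{1+\gamma}{2}\right)\right)$, due to outliers.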
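The four results of this section can be summarized computationally. The sketch below uses hypothetical function names; it evaluates the finite-sample expression $n^{-1}\lceil \frac{1-\lambda}{2}n \rceil$ of Section 4.1 and the large-sample approximations $\frac{1-\lambda}{2}$, $\frac{\gamma}{2}$, $\frac{\lambda}{2}$, and $\frac{1-\gamma}{2}$ stated in Sections 4.1 to 4.4. Converting $\lambda$ to a rational number is an implementation choice made here to keep the ceiling free of floating-point artifacts.

```python
from fractions import Fraction
from math import ceil

def mbp_a_centered_rank(lam, n):
    """Finite-sample value ceil((1 - lam) / 2 * n) / n from Section 4.1."""
    lam = Fraction(lam).limit_denominator(10**6)  # e.g. 0.7 becomes exactly 7/10
    return float(Fraction(ceil((1 - lam) * n / 2), n))

def centered_rank_approximations(lam, gam):
    """Large-sample approximations stated in Sections 4.1 to 4.4."""
    return {
        "MBP_A": (1 - lam) / 2,
        "MBP_B": gam / 2,
        "SBP_A": lam / 2,
        "SBP_B": (1 - gam) / 2,
    }

approx = centered_rank_approximations(0.5, 0.5)
```

For example, `mbp_a_centered_rank(0.7, 100)` gives 0.15. In these approximations the Type A pair always sums to 1/2, as does the Type B pair, which makes the trade-off between masking and swamping robustness explicit for this outlyingness function.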

5 On the Robustness of the Boxplot

The sample boxplot may be viewed as an estimator of a population boxplot. Denote the 1st quartile, median, and 3rd quartile of $F$ by $Q_1$, $M$, and $Q_3$ and the sample versions by $\hat{Q}_1$, $\hat{M}$, and $\hat{Q}_3$, respectively. There are two aspects to the boxplot:

a) the box that represents the “middle half”,

b) upper and lower “fences” that mark “outlyingness” thresholds.

The box is based on the quartiles $\hat{Q}_1$, $\hat{M}$, and $\hat{Q}_3$, whereas the fences are defined by $\hat{Q}_1 - 1.5 \times (\hat{Q}_3 - \hat{Q}_1)$ and $\hat{Q}_3 + 1.5 \times (\hat{Q}_3 - \hat{Q}_1)$. Robustness of the sample boxplot is evaluated differently with respect to a) and b).

Regarding a), robustness is characterized simply by the usual RBPs of the relevant sample quantiles. For $\hat{Q}_1$, $\hat{M}$, and $\hat{Q}_3$, these RBPs are 1/4, 1/2, and 1/4, respectively, highly favorable values which strongly justify the sample “middle half” as a robust estimator of the population “middle half”.

Turning to b), we first note that, although it might appear otherwise at first glance, the fences involve an outlyingness function that is not quantile-based like the centered rank outlyingness, but rather is of scaled deviation type:

\[ O(x, F) = \begin{cases} \dfrac{Q_1 - x}{Q_3 - Q_1}, & x < Q_1 \\[6pt] 0, & Q_1 \leq x \leq Q_3 \\[6pt] \dfrac{x - Q_3}{Q_3 - Q_1}, & x > Q_3. \end{cases} \]

This is because $Q_1$ and $Q_3$ are measures of location, and $Q_3 - Q_1$ (the IQR) measures spread. In terms of this outlyingness function, the boxplot’s outlier region is based on threshold $\eta = 1.5$:

\[ \mathrm{out}(1.5, F) = \{x : O(x, F) > 1.5\}, \]

with sample version $\mathrm{OR}(1.5, \mathbb{X}_n)$. Let us now study the masking and swamping robustness of this outlier identifier, by evaluating $\mathrm{MBP}^{(A)}$, $\mathrm{MBP}^{(B)}$, $\mathrm{SBP}^{(A)}$, and $\mathrm{SBP}^{(B)}$ for the boxplot’s form of scaled deviation outlyingness, using the results of Propositions 5-8. It is straightforward to obtain

• $\mathrm{MBP}^{(A)} = \mathrm{MBP}^{(B)} = 1/4$, with both Type A and Type B masking breakdown occurring by taking either $\hat{Q}_1$ to $-\infty$ or $\hat{Q}_3$ to $+\infty$, i.e., due to replacement of 25% of the observations by extreme outliers in the same direction.

• $\mathrm{SBP}^{(A)} = \mathrm{SBP}^{(B)} = 1/2$, with Type A swamping breakdown occurring by taking both $\hat{Q}_1$ and $\hat{Q}_3$ either to $Q_1$ or to $Q_3$, and with Type B swamping breakdown occurring by making $\hat{Q}_1$ and $\hat{Q}_3$ coincide, i.e., due to replacement of the middle 50% of the observations by inliers at any single point (which must be either $Q_1$ or $Q_3$ in the case of Type A swamping breakdown).

Thus masking breakdown of the boxplot outlier identifier occurs only if at least 25% of the sample consists of extreme outliers in a single direction. This 25% MBP represents a rather satisfactory degree of robustness for most situations. If higher MBP is desired using a nonparametric outlier identifier, then one alternatively can use scaled deviation outlyingness with Median and MAD, achieving 50% MBP. It also is of interest that a lesser fraction of extreme outliers can be present without any masking effects whatsoever.

The boxplot is extremely strong with respect to swamping breakdown, with its 50% SBP. Further, the situations which would actually produce swamping breakdown of the boxplot are highly unlikely in reality. However, with a lesser fraction of inner replacements, which has some realistic possibility, we indeed can have noticeable swamping effects although without complete breakdown.

We illustrate masking and swamping effects for the boxplot by the following simple experiment:

• A “population” is created by taking a sample of size 500 from standard normal. This population has quartiles $Q_1 = -0.69$, $M = 0.02$, and $Q_3 = 0.67$.

• A sample of size 100 is taken without replacement from the “population”. This sample has quartiles $\hat{Q}_1 = -0.71$, $\hat{M} = -0.04$, and $\hat{Q}_3 = 0.41$. The sample boxplot represents an “estimator” of the population boxplot, both of the population “middle half” and the population “outlier region”.

• Eight modified samples are produced by replacing observations according to the following scenarios:

– replace the 10 uppermost observations by the outlying value “5”

– replace the 27 uppermost observations by the outlying value “5”

– replace 10, 27, or 54 inner observations by the population 3rd quartile $Q_3$

– replace 10, 27, or 54 inner observations by the population median $M$

To determine the masking and swamping effects associated with these eight replacement scenarios, we compare the respective upper outlyingness threshold (boxplot fence) for each modified sample with the population outlyingness threshold Q3 + 1.5(Q3 − Q1) = 2.71. The following table shows the respective upper outlyingness thresholds and their corresponding upper masking and swamping effects. (If an outlyingness threshold is greater than 2.71, then some population outliers are being masked. If it is less than 2.71, then some population nonoutliers are being swamped.)


Scenario                            Boxplot Fence   Masking   Swamping

All Figures
  Population                             2.71
  Sample                                 2.09       None      Light
Figure 1
  Outliers, 10 Replacements              2.09       None      Light
  Outliers, 27 Replacements             13.57       Heavy     None
Figure 2
  Inliers at Q3, 10 Replacements         2.74       Light     None
  Inliers at Q3, 27 Replacements         2.74       Light     None
  Inliers at Q3, 54 Replacements         0.96       None      Heavy
Figure 3
  Inliers at M, 10 Replacements          2.04       None      Light
  Inliers at M, 27 Replacements          1.12       None      Moderate
  Inliers at M, 54 Replacements          0.08       None      Heavy
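Several of the fences in the table can be checked by hand from the quartiles reported in the experiment. The short Python sketch below does this arithmetic. The function name `upper_fence` is illustrative, and the quartiles assumed for the contaminated samples (Q3 moved to 5 after 27 upper replacements, Q3 moved to the population value 0.67 after inner replacements at Q3, Q1 unchanged in both cases) are assumptions of this sketch rather than values stated in the paper.

```python
def upper_fence(q1: float, q3: float, k: float = 1.5) -> float:
    """Upper boxplot fence Q3 + k * (Q3 - Q1)."""
    return q3 + k * (q3 - q1)

# Quartiles as reported in the text, rounded to two decimals.
population = upper_fence(-0.69, 0.67)
sample = upper_fence(-0.71, 0.41)

# Assumption: 27 upper replacements by 5 move the sample Q3 to 5, Q1 unchanged.
outliers_27 = upper_fence(-0.71, 5.0)

# Assumption: inner replacements at the population Q3 move the sample Q3 to 0.67.
inliers_q3 = upper_fence(-0.71, 0.67)

for name, fence in [
    ("population", population),
    ("sample", sample),
    ("outliers, 27 replacements", outliers_27),
    ("inliers at Q3", inliers_q3),
]:
    print(f"{name}: {fence:.3f}")
```

Up to rounding in the second decimal, these values agree with the corresponding table entries.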

We also illustrate the foregoing in Figures 1-3.


Figure 1: Masking Effects Due to Outliers. Boxplots for: population; sample; sample with 10 uppermost points replaced by value 5; sample with 27 uppermost points replaced by value 5.

Comments on Figure 1. The sample boxplot does not perfectly estimate the population boxplot. Rather, the sample 3rd quartile is noticeably below the population Q3, resulting in the sample boxplot outlyingness threshold of 2.09 versus the population value 2.71, which produces swamping of population nonoutliers between 2.09 and 2.71 as sample outliers. Note that there is no masking. These effects represent the “luck of the draw” in the case of no contamination. Contamination produced by replacing the 10 uppermost sample values by the extreme outlier “5” yields no change in the masking and swamping effects. However, 27 replacements (exceeding the MBP threshold of 25%) do produce masking breakdown. The new sample outlyingness threshold of 13.57 produces extreme upper masking (but no longer is there any upper swamping).
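The masking behavior just described can be imitated with a small simulation. The following Python sketch is an illustration only: the seed, the helper names, and the use of `statistics.quantiles` with its default quartile rule are assumptions, so the numbers it prints will not match the paper's sample.

```python
import random
import statistics

def upper_fence(data, k=1.5):
    """Upper boxplot fence Q3 + k*(Q3 - Q1) computed from sample quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 + k * (q3 - q1)

def replace_uppermost(data, m, value):
    """Replace the m largest observations by a single outlying value."""
    ordered = sorted(data)
    return ordered[: len(ordered) - m] + [value] * m

rng = random.Random(2013)  # arbitrary seed (assumption)
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]

clean = upper_fence(sample)
with_10 = upper_fence(replace_uppermost(sample, 10, 5.0))
with_27 = upper_fence(replace_uppermost(sample, 27, 5.0))

print(f"no replacements: {clean:.2f}")
print(f"10 replacements: {with_10:.2f}")
print(f"27 replacements: {with_27:.2f}")
```

With 10 replacements the order statistics that determine the quartiles are untouched, so the fence is unchanged. With 27 replacements the 3rd quartile itself becomes 5 and the fence moves beyond the planted outliers.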


Figure 2: Swamping Effects Due to Inliers Placed at Q3. Boxplots for: population; sample; sample with 10 inner points replaced by value Q3; sample with 27 inner points replaced by value Q3; sample with 54 inner points replaced by value Q3.

Comments on Figure 2. As noted above, the sample itself produces swamping of population nonoutliers between 2.09 and 2.71 as sample outliers, but there is no masking. Contamination produced by replacing 10 inner sample values by the population Q3 yields no change in these masking and swamping effects, nor does replacement of 27 inner sample values by the population Q3 (although now the sample median shifts upward). However, 54 replacements (exceeding the SBP threshold of 50%) do produce swamping breakdown. The new sample outlyingness threshold of 0.96 produces extreme upper swamping (but no longer is there any upper masking).


Figure 3: Swamping Effects Due to Inliers Placed at Median M. Boxplots for: population; sample; sample with 10 inner points replaced by value M; sample with 27 inner points replaced by value M; sample with 54 inner points replaced by value M.

Comments on Figure 3. As noted already, the sample itself produces swamping of population nonoutliers between 2.09 and 2.71 as sample outliers, but there is no masking. Contamination produced by replacing 10 inner sample values by the population median M yields no change in these masking and swamping effects. However, replacement of 27 inner sample values by the population M yields a new sample outlyingness threshold of 1.12, which does produce significant swamping effects although not total breakdown (and no longer is there any upper masking). And, of course, replacement of 54 inner sample values (exceeding the SBP threshold of 50%) by the value M yields a new sample outlyingness threshold of 0.08 and complete swamping breakdown (but no longer is there any upper masking).
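The swamping breakdown caused by inliers can be sketched in the same way. In the Python illustration below, the helper names and the toy data are hypothetical, and the rule used to pick the "inner" observations (the most central order statistics) is an assumption, since the text does not specify how they were chosen. Intermediate cases such as 27 replacements therefore need not reproduce the table.

```python
import statistics

def upper_fence(data, k=1.5):
    """Upper boxplot fence Q3 + k*(Q3 - Q1) computed from sample quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 + k * (q3 - q1)

def replace_inner(data, m, value):
    """Replace the m most central order statistics by a single inlying value.

    Choosing the most central order statistics is an assumption of this sketch.
    """
    ordered = sorted(data)
    start = (len(ordered) - m) // 2
    return ordered[:start] + [value] * m + ordered[start + m:]

# Toy sample of size 100 (assumption): evenly spaced values from -5.0 to 4.9.
sample = [i / 10 for i in range(-50, 50)]
median_value = 0.0  # value at which the inliers are placed (assumption)

for m in (10, 54):
    fence = upper_fence(replace_inner(sample, m, median_value))
    print(f"{m} inner replacements: fence = {fence:.2f}")
```

Once more than half of the sample sits at a single point, both quartiles coincide with that point, the interquartile range vanishes, and the fence collapses onto it, so everything above that point is declared an outlier.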

Summary. The boxplot is highly robust, with 25% RBP for the 1st and 3rd quartiles, 50% RBP for the Median, 25% MBP, and 50% SBP. The favorable MBP and SBP are due to the boxplot employing scaled deviation outlyingness as its outlier identification component.

If instead the boxplot were to use a quantile-based threshold, extending its use of quantiles for the “middle half” description, then a natural choice for the upper fence would be the quantile of standard normal associated with Q3 + 1.5(Q3 − Q1) = 4Q3 = 2.70, namely the 0.9965 quantile with upper tail probability 0.0035. But this would now be using centered rank outlyingness with (as shown in the Discussion section below) corresponding thresholds

λ = γ = 1 − 2 × 0.0035 = 0.993,

and with associated masking and swamping robustness measures MBP(A) = (1 − λ)/2 = 0.0035, MBP(B) = γ/2 = 0.497, SBP(A) = λ/2 = 0.497, and SBP(B) = (1 − γ)/2 = 0.0035. One of the two MBP measures is very high but the other too low, and the same holds for the two SBP measures.

Thanks to John Tukey’s ingenuity and profound understanding, the boxplot combines the best features of quantile-based description and scaled deviation outlyingness. □

6 Discussion

Let us recall the interpretations associated with our four robustness measures:

I Suitable replacement of a fraction MBP(A)(λ, Xn) of the data Xn permits F outliers at arbitrarily extreme levels γ ↑ 1 to be masked as sample nonoutliers at level λ.

II Suitable replacement of a fraction MBP(B)(γ, Xn) of the data Xn permits γ outliers of F to be masked as sample nonoutliers at arbitrarily central levels λ ↓ $O_n^{*}$.

III Suitable replacement of a fraction SBP(A)(λ, Xn) of the data Xn permits arbitrarily central nonoutliers of F at levels γ ↓ 0 to be swamped as sample outliers at level λ.

IV Suitable replacement of a fraction SBP(B)(γ, Xn) of the data Xn permits γ nonoutliers of F to be swamped as sample outliers at arbitrarily extreme levels λ ↑ $O_n^{**}$.

The following table compares three scaled deviation outlyingness functions and the centered rank outlyingness function with respect to all four of MBP(A)(λ, Xn), MBP(B)(γ, Xn), SBP(A)(λ, Xn), and SBP(B)(γ, Xn), in terms of their limits as n increases.

Robustness Measure   Mean, SD   Med, MAD   α-trims, α < 1/2                      Centered Rank

MBP(A)(λ, Xn)        0          1/2        α                                     (1 − λ)/2

MBP(B)(γ, Xn)        0          1/2        α                                     γ/2

SBP(A)(λ, Xn)        1          1/2        [λ²/(λ² + (1 − λ)²)](1 − 2α) + α      λ/2

SBP(B)(γ, Xn)        1          1/2        1 − 2α                                (1 − γ)/2


The scaled deviation outlyingness function with (Mean, SD) is very robust with respect to swamping, attaining the highest possible value of 1 for SBP, but it achieves this at too great a cost with respect to masking robustness (MBP = 0), these values holding regardless of the choices of λ and γ. Clearly, this version of scaled deviation outlyingness should be dropped from consideration (as already is well known).

Comparison of (Median, MAD) and (Trimmed Mean, Trimmed SD).

(i) The use of (Med, MAD) in scaled deviation outlyingness results in a balancing of MBP and SBP equally at the value 1/2, regardless of the choices of λ and γ, yielding a very desirable version of scaled deviation outlyingness.

(ii) On the other hand, use of the α-trimmed Mean and SD permits trade-offs giving moderate priority to SBP without oversacrificing with respect to MBP. For example, with α = 1/3 and λ ≈ 1, we have MBP(A) = MBP(B) = SBP(B) = 1/3, with SBP(A) = 2/3. Likewise, with α = 1/4 and λ ≈ 1, we have MBP(A) = MBP(B) = 1/4, SBP(A) = 3/4, and SBP(B) = 1/2. We note, however, that the α-trimmed versions do not accommodate prioritizing MBP over SBP.

(iii) If, however, we allow α ↑ 1/2, then SBP(B) ≈ 0, even though (for any λ) MBP(A) = MBP(B) = SBP(A) ≈ 1/2, the poor showing of SBP(B) being a consequence of the α-trimmed versions being vulnerable to inliers.

(iv) If replacements are confined to just outliers, then SBP(B) becomes 1 − α instead of 1 − 2α, and we obtain MBP(A) = MBP(B) = SBP(A) = SBP(B) ≈ 1/2 as α ↑ 1/2. However, in the latter case, while the trimmed mean then becomes the Median, the trimmed SD does not become the MAD but rather becomes ≈ 0. Of course, for some data sets the MAD also can become 0, but in such cases modified versions of MAD are used (see Zuo, 2003, and Serfling and Mazumder, 2009).

(v) In sum, at moderate levels of α, the α-trimmed approach offers alternatives to (Med, MAD) allowing some prioritization of SBP over MBP. However, at level α ≈ 1/2, the α-trimmed approach is not an attractive alternative. □
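The trade-offs in items (ii) and (iii) can be checked numerically. The Python sketch below is an illustration only: the function name is hypothetical, and the SBP(A) expression is read from the comparison table as [λ²/(λ² + (1 − λ)²)](1 − 2α) + α, which should be treated as an assumption about the intended formula. Exact fractions are used so that the values can be compared directly with those quoted in the text, and λ ≈ 1 is represented by the limiting value λ = 1.

```python
from fractions import Fraction

def trimmed_limits(alpha, lam):
    """Limiting (MBP(A), MBP(B), SBP(A), SBP(B)) for alpha-trimmed mean and SD.

    The SBP(A) term follows the reading of the table assumed in this sketch.
    """
    weight = lam**2 / (lam**2 + (1 - lam) ** 2)
    sbp_a = weight * (1 - 2 * alpha) + alpha
    return (alpha, alpha, sbp_a, 1 - 2 * alpha)

# (Med, MAD) attains 1/2 for all four measures, according to the table.
MED_MAD_LIMITS = (Fraction(1, 2),) * 4

for alpha in (Fraction(1, 3), Fraction(1, 4)):
    values = trimmed_limits(alpha, Fraction(1))
    print(f"alpha = {alpha}: " + ", ".join(str(v) for v in values))
```

For α = 1/3 this gives (1/3, 1/3, 2/3, 1/3) and for α = 1/4 it gives (1/4, 1/4, 3/4, 1/2), matching item (ii).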

For the centered rank outlyingness function, the MBPs and SBPs depend explicitly on the thresholds γ and λ. This requires us to think carefully about our choices of these thresholds. In general, the sample outlier region OR(λ, Xn) estimates a population analogue, namely the outlier region out(λ, F). Thus γ should reflect some relatively high outlyingness level of interest in the F distribution and determine the same value for λ. For the centered rank outlyingness, it is straightforward to select γ using

|2F (x)− 1| = γ.

In this case the “contour” of equal outlyingness at level γ consists of the two points F⁻¹((1 − γ)/2) and F⁻¹((1 + γ)/2). Then, if γ should demarcate the outlyingness contour enclosing a proportion p of the distribution, F(x) should be taken equal to (1 − p)/2 or (1 + p)/2, in either case yielding

λ = γ = p.

In particular, for a typical choice such as p = 0.9, we then obtain MBP(A) = 0.05, MBP(B) = 0.45, SBP(A) = 0.45, and SBP(B) = 0.05. One of these two MBP measures is high, the other low, and the same holds for the two SBP measures.

We can make all four values equal by choosing λ = γ = 1/2, which are the levels corresponding to the 1st and 3rd quartiles of F and Fn, in which case MBP(A) = MBP(B) = SBP(A) = SBP(B) = 1/4. However, the 1st and 3rd quartiles are associated with description of the center of the distribution and are not typical outlyingness thresholds.
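The two choices of threshold discussed above can be evaluated directly from the centered rank column of the comparison table, namely (1 − λ)/2, γ/2, λ/2, and (1 − γ)/2. The following Python sketch does so. The function name is illustrative, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

def centered_rank_limits(lam, gam):
    """Limiting (MBP(A), MBP(B), SBP(A), SBP(B)) for centered rank outlyingness."""
    return ((1 - lam) / 2, gam / 2, lam / 2, (1 - gam) / 2)

for p in (Fraction(9, 10), Fraction(1, 2)):
    values = centered_rank_limits(p, p)  # lambda = gamma = p
    print(f"p = {p}: " + ", ".join(str(v) for v in values))
```

With p = 9/10 the four values are 1/20, 9/20, 9/20, and 1/20, and with p = 1/2 all four equal 1/4, in agreement with the text.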

Summary. For equal priority on MBP and SBP, the scaled deviation outlyingness with (Med, MAD) provides the best common value of 1/2. For moderate prioritization of SBP over MBP if desired, the α-trimmed versions are attractive. The centered rank outlyingness function has great descriptive appeal because it is closely associated with the quantile function, but its sample versions are not sufficiently robust at more extreme outlyingness thresholds. □

Acknowledgements

The authors gratefully acknowledge useful input from G. L. Thompson, Xin Dang, Satyaki Mazumder, Bo Hong, and Seoweon Jin. Also, support under National Science Foundation Grant DMS-1106691 is sincerely acknowledged.

Appendix

Two lemmas from Serfling and Wang (2012) are helpful in evaluating the RBPs of “inf” and “sup” statistics. The first treats breakdown of a statistic $S(X_n)$ when it is either the minimum or the maximum of certain other statistics $T_1(X_n), \ldots, T_J(X_n)$. Let $k^{(0)}_{\mathrm{exp}}, k^{(1)}_{\mathrm{exp}}, \ldots, k^{(J)}_{\mathrm{exp}}$ be the minimal numbers of data points which must be replaced in order to cause explosion breakdown of the respective statistics $S$ and $T_1, \ldots, T_J$, and let $k^{(0)}_{\mathrm{imp}}, k^{(1)}_{\mathrm{imp}}, \ldots, k^{(J)}_{\mathrm{imp}}$ be their counterparts for implosion breakdown. Also, let

$$T^{*} = \min\{T_1^{*}, \ldots, T_J^{*}\}, \qquad T^{**} = \max\{T_1^{**}, \ldots, T_J^{**}\}.$$

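Lemma 13 (i) Let $S = \min\{T_1, \ldots, T_J\}$. Then

$$\min\{k^{(1)}_{\mathrm{imp}}, \ldots, k^{(J)}_{\mathrm{imp}}\} \le k^{(0)}_{\mathrm{imp}} \le \max\{k^{(1)}_{\mathrm{imp}}, \ldots, k^{(J)}_{\mathrm{imp}}\}. \tag{17}$$

(ii) Let $S = \max\{T_1, \ldots, T_J\}$. Then

$$\min\{k^{(1)}_{\mathrm{exp}}, \ldots, k^{(J)}_{\mathrm{exp}}\} \le k^{(0)}_{\mathrm{exp}} \le \max\{k^{(1)}_{\mathrm{exp}}, \ldots, k^{(J)}_{\mathrm{exp}}\}. \tag{18}$$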

The next lemma treats breakdown of a statistic $S(X_n)$ when the event of breakdown due to $k$ replacements is related to the possible occurrences of certain events $E_1, \ldots, E_J$ as a consequence of $k$ replacements. Let $k_S$ be the minimal number of data points which must be replaced in order to cause breakdown (either implosion or explosion) of $S$, and let $k_1, \ldots, k_J$ be the minimal numbers of data points which must be replaced in order to cause occurrence of the respective events $E_1, \ldots, E_J$. It is assumed that $k_S$ and $k_1, \ldots, k_J$ are well-defined and belong to $\{1, 2, \ldots, n\}$.

Lemma 14 (i) If breakdown of $S$ is implied by occurrence of each one of the events $E_1, \ldots, E_J$, then

$$k_S \le \min\{k_1, \ldots, k_J\}. \tag{19}$$

(ii) If breakdown of $S$ implies occurrence of at least one of the events $E_1, \ldots, E_J$, then

$$k_S \ge \min\{k_1, \ldots, k_J\}. \tag{20}$$

(iii) If breakdown of $S$ is implied by occurrence of each one of the events $E_1, \ldots, E_J$ and also implies that at least one of $E_1, \ldots, E_J$ must occur, then

$$k_S = \min\{k_1, \ldots, k_J\}. \tag{21}$$

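Proof of Proposition 5. Using $OR(\lambda, X_n) = [\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma]$, it follows that

$$
\sup_{y \in OR(\lambda, X_n)} O(y, F)
= \begin{cases}
O(\hat\mu + \beta\hat\sigma, F) & \text{if } \mu(F) \le \hat\mu - \beta\hat\sigma \\
O(\hat\mu - \beta\hat\sigma, F) & \text{if } \mu(F) \ge \hat\mu + \beta\hat\sigma \\
\max\{O(\hat\mu + \beta\hat\sigma, F),\ O(\hat\mu - \beta\hat\sigma, F)\} & \text{otherwise}
\end{cases}
= \max\{O(\hat\mu + \beta\hat\sigma, F),\ O(\hat\mu - \beta\hat\sigma, F)\} \quad \text{(in all cases)}. \tag{22}
$$

It then follows from Lemma 13(ii) that

$$
\min\{\mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu + \beta\hat\sigma, F)),\ \mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu - \beta\hat\sigma, F))\}
\le \mathrm{MBP}^{(A)}(\lambda, X_n)
\le \max\{\mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu + \beta\hat\sigma, F)),\ \mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu - \beta\hat\sigma, F))\}. \tag{23}
$$

We next evaluate $\mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu + \beta\hat\sigma, F))$ and $\mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu - \beta\hat\sigma, F))$. It is readily checked that these are equal, respectively, to $\mathrm{RBP}_{\mathrm{exp}}(|\hat\mu + \beta\hat\sigma|)$ and $\mathrm{RBP}_{\mathrm{exp}}(|\hat\mu - \beta\hat\sigma|)$. Under Assumption A, $\mathrm{RBP}_{\mathrm{exp}}(|\hat\mu + \beta\hat\sigma|)$ and $\mathrm{RBP}_{\mathrm{exp}}(|\hat\mu - \beta\hat\sigma|)$ are equal, and we have

$$\mathrm{MBP}^{(A)}(\lambda, X_n) = \mathrm{RBP}_{\mathrm{exp}}(|\hat\mu + \beta\hat\sigma|). \tag{24}$$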


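Here we are using $\mathrm{RBP}_{\mathrm{exp}}$ with $T^{**} = \infty$. We now evaluate $\mathrm{RBP}_{\mathrm{exp}}(|\hat\mu + \beta\hat\sigma|)$, for which we apply Lemma 14 with some choice of $k$ and the events $S$, $E_1$, $E_2$, and $E_3$, where

$$S = \{\exists\, \{X_{n,k}\} \text{ such that } |\mu(X_{n,k}) + \beta\sigma(X_{n,k})| \to \infty\}$$

$$E_1 = \{\exists\, \{X_{n,k}\} \text{ such that } \mu(X_{n,k}) \to +\infty\}$$

$$E_2 = \{\exists\, \{X_{n,k}\} \text{ such that } |\mu(X_{n,k})| \text{ is bounded and } \sigma(X_{n,k}) \to \infty\}$$

$$E_3 = \{\exists\, \{X_{n,k}\} \text{ such that } \mu(X_{n,k}) \to -\infty \text{ and } |\mu(X_{n,k}) + \beta\sigma(X_{n,k})| \to \infty\}.$$

Note that (with $k$ fixed) each of $E_1$, $E_2$, $E_3$ implies $S$ and $S$ implies $E_1 \cup E_2 \cup E_3$. Then (21) yields

$$\mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu + \beta\hat\sigma, F)) = \mathrm{RBP}_{\mathrm{exp}}(|\hat\mu + \beta\hat\sigma|) = n^{-1} \min\{k_1, k_2, k_3\}, \tag{25}$$

where $k_1, k_2, k_3$ are the minimal values of $k$, respectively, for occurrence of $E_1$, $E_2$, $E_3$. Finally, we note that, under Assumption A, $k_3 \ge k_{\mathrm{exp}}(\mu) = k_1$.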

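Proof of Proposition 6. Using $\mathrm{out}(\gamma, F) = [\mu - \eta\sigma,\ \mu + \eta\sigma]$, it follows that (interpreting 0/0 as 0)

$$
\inf_{y \in \mathrm{out}(\gamma, F)} O(y, X_n)
= \inf_{y \in \mathrm{out}(\gamma, F)} \left|\frac{y - \hat\mu}{\hat\sigma}\right|
= \begin{cases}
0 & \text{if } \hat\mu \notin (\mu - \eta\sigma,\ \mu + \eta\sigma) \\
\min\left\{\left|\dfrac{\mu + \eta\sigma - \hat\mu}{\hat\sigma}\right|,\ \left|\dfrac{\mu - \eta\sigma - \hat\mu}{\hat\sigma}\right|\right\} & \text{if } \hat\mu \in (\mu - \eta\sigma,\ \mu + \eta\sigma)
\end{cases}
$$

$$
= \min\left\{\left|\frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma}\right| I(\hat\mu \in (\mu - \eta\sigma,\ \mu + \eta\sigma)),\ \left|\frac{\mu - \eta\sigma - \hat\mu}{\hat\sigma}\right| I(\hat\mu \in (\mu - \eta\sigma,\ \mu + \eta\sigma))\right\},
$$

where $I(A)$ denotes the indicator of the event $A$. It then follows from Lemma 13(i) that

$$
\min\left\{\mathrm{RBP}_{\mathrm{imp}}\left(\left|\frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma}\right| I(\hat\mu \in (\mu - \eta\sigma,\ \mu + \eta\sigma))\right),\ \mathrm{RBP}_{\mathrm{imp}}\left(\left|\frac{\mu - \eta\sigma - \hat\mu}{\hat\sigma}\right| I(\hat\mu \in (\mu - \eta\sigma,\ \mu + \eta\sigma))\right)\right\}
\le \mathrm{MBP}^{(B)}(\gamma, X_n)
$$

$$
\le \max\left\{\mathrm{RBP}_{\mathrm{imp}}\left(\left|\frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma}\right| I(\hat\mu \in (\mu - \eta\sigma,\ \mu + \eta\sigma))\right),\ \mathrm{RBP}_{\mathrm{imp}}\left(\left|\frac{\mu - \eta\sigma - \hat\mu}{\hat\sigma}\right| I(\hat\mu \in (\mu - \eta\sigma,\ \mu + \eta\sigma))\right)\right\}. \tag{26}
$$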


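Here we use $\mathrm{RBP}_{\mathrm{imp}}$ with $T^{*} = 0$. It is immediately evident from (26) that

$$\mathrm{MBP}^{(B)}(\gamma, X_n) = 0 \quad \text{if } \mu(X_n) \notin (\mu - \eta\sigma,\ \mu + \eta\sigma). \tag{27}$$

We now treat the case that $\mu(X_n) \in (\mu - \eta\sigma,\ \mu + \eta\sigma)$ and evaluate

$$\mathrm{RBP}_{\mathrm{imp}}\left(\left|\frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma}\right| I(\hat\mu \in (\mu - \eta\sigma,\ \mu + \eta\sigma))\right)$$

using Lemma 14 with some choice of $k$ and the events $S$, $E_1$ and $E_2$, where

$$S = \left\{\exists\, \{X_{n,k}\} \text{ such that } \left|\frac{\mu + \eta\sigma - \mu(X_{n,k})}{\sigma(X_{n,k})}\right| I(\mu(X_{n,k}) \in (\mu - \eta\sigma,\ \mu + \eta\sigma)) \to 0\right\}$$

$$E_1 = \{\exists\, \{X_{n,k}\} \text{ such that } \mu(X_{n,k}) \notin (\mu - \eta\sigma,\ \mu + \eta\sigma)\}$$

$$E_2 = \{\exists\, \{X_{n,k}\} \text{ such that } \mu(X_{n,k}) \in (\mu - \eta\sigma,\ \mu + \eta\sigma) \text{ holds and } \sigma(X_{n,k}) \to \infty\}.$$

Note that (with $k$ fixed) each of $E_1$ and $E_2$ implies $S$ and $S$ implies $E_1 \cup E_2$. Then (21) yields

$$\mathrm{RBP}_{\mathrm{imp}}\left(\left|\frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma}\right| I(\hat\mu \in (\mu - \eta\sigma,\ \mu + \eta\sigma))\right) = n^{-1} \min\{k_1, k_2\}, \tag{28}$$

where $k_1, k_2$ are the minimal values of $k$, respectively, for occurrence of $E_1$, $E_2$. Similarly, with a modified $S$ but the same $E_1$, $E_2$ and the same $k_1$, $k_2$, we obtain

$$\mathrm{RBP}_{\mathrm{imp}}\left(\left|\frac{\mu - \eta\sigma - \hat\mu}{\hat\sigma}\right| I(\hat\mu \in (\mu - \eta\sigma,\ \mu + \eta\sigma))\right) = n^{-1} \min\{k_1, k_2\}. \tag{29}$$

In particular, note that, again using $\mathrm{RBP}_{\mathrm{exp}}$ with $T^{**} = \infty$,

$$k_1/n = \mathrm{RBP}_{\mathrm{exp}}(\mu(X_n))$$

$$k_2/n = \mathrm{RBP}_{\mathrm{exp}}(\sigma(X_n) \mid \mu(X_n) \text{ bounded}).$$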

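Proof of Proposition 7. Using $OR(\lambda, X_n) = [\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma]$, it follows that (interpreting 0/0 as 0)

$$
\inf_{y \in OR(\lambda, X_n)} O(y, F)
= \inf_{y \in OR(\lambda, X_n)} \left|\frac{y - \mu(F)}{\sigma(F)}\right|
= \begin{cases}
0 & \text{if } \mu \notin (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma) \\
\min\left\{\left|\dfrac{\hat\mu + \beta\hat\sigma - \mu}{\sigma}\right|,\ \left|\dfrac{\hat\mu - \beta\hat\sigma - \mu}{\sigma}\right|\right\} & \text{if } \mu \in (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma)
\end{cases}
$$

$$
= \min\left\{\left|\frac{\hat\mu + \beta\hat\sigma - \mu}{\sigma}\right| I(\mu \in (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma)),\ \left|\frac{\hat\mu - \beta\hat\sigma - \mu}{\sigma}\right| I(\mu \in (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma))\right\},
$$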


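where again $I(A)$ denotes the indicator of the event $A$. It then follows from Lemma 13(i) that

$$
\min\{\mathrm{RBP}_{\mathrm{imp}}(|\hat\mu + \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma))),\ \mathrm{RBP}_{\mathrm{imp}}(|\hat\mu - \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma)))\}
\le \mathrm{SBP}^{(A)}(\lambda, X_n)
$$

$$
\le \max\{\mathrm{RBP}_{\mathrm{imp}}(|\hat\mu + \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma))),\ \mathrm{RBP}_{\mathrm{imp}}(|\hat\mu - \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma)))\}. \tag{30}
$$

Under Assumption B,

$$\mathrm{RBP}_{\mathrm{imp}}(|\hat\mu + \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma))) = \mathrm{RBP}_{\mathrm{imp}}(|\hat\mu - \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma)))$$

and via (30) we have

$$\mathrm{SBP}^{(A)}(\lambda, X_n) = \mathrm{RBP}_{\mathrm{imp}}(|\hat\mu + \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma))). \tag{31}$$

It is immediately evident that

$$\mathrm{SBP}^{(A)}(\lambda, X_n) = 0 \quad \text{if } \mu(F) \notin (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma). \tag{32}$$

We now treat the case that $\mu(F) \in (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma)$ and (under this assumption) evaluate

$$\mathrm{RBP}_{\mathrm{imp}}(|\hat\mu + \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma)))$$

using Lemma 14 with some choice of $k$ and the events $S$, $E_1$, $E_2$, and $E_3$, where

$$S = \{\exists\, \{X_{n,k}\} \text{ such that } |\mu(X_{n,k}) + \beta\sigma(X_{n,k}) - \mu|\, I(\mu \in (\mu(X_{n,k}) - \beta\sigma(X_{n,k}),\ \mu(X_{n,k}) + \beta\sigma(X_{n,k}))) \to 0\}$$

$$E_1 = \{\exists\, \{X_{n,k}\} \text{ such that } \mu(X_{n,k}) - \beta\sigma(X_{n,k}) \ge \mu\}$$

$$E_2 = \{\exists\, \{X_{n,k}\} \text{ such that } \mu(X_{n,k}) + \beta\sigma(X_{n,k}) \le \mu\}$$

$$E_3 = \{\exists\, \{X_{n,k}\} \text{ such that } \mu \in (\mu(X_{n,k}) - \beta\sigma(X_{n,k}),\ \mu(X_{n,k}) + \beta\sigma(X_{n,k})) \text{ and } \mu(X_{n,k}) + \beta\sigma(X_{n,k}) \to \mu\}.$$

Note that (with $k$ fixed) each of $E_1$, $E_2$, $E_3$ implies $S$ and $S$ implies $E_1 \cup E_2 \cup E_3$. It is of interest that $E_1$ and $E_2$ are associated with the presence of outliers, whereas $E_3$ is associated with the presence of inliers. Here we use $\mathrm{RBP}_{\mathrm{imp}}$ with $T^{*} = 0$ and $\mathrm{RBP}_{\mathrm{exp}}$ with $T^{**} = \infty$.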

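Proof of Proposition 8. Under Assumption C,

$$\mathrm{RBP}_{\mathrm{exp}}\left(\left|\frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma}\right|\right) = \mathrm{RBP}_{\mathrm{exp}}\left(\left|\frac{\mu - \eta\sigma - \hat\mu}{\hat\sigma}\right|\right)$$

and, by steps similar to those in the foregoing treatments, we obtain

$$\mathrm{SBP}^{(B)}(\gamma, X_n) = \mathrm{RBP}_{\mathrm{exp}}\left(\left|\frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma}\right|\right). \tag{33}$$

We now use Lemma 14 with some choice of $k$ and the events $S$, $E_1$, and $E_2$, where

$$S = \left\{\exists\, \{X_{n,k}\} \text{ such that } \left|\frac{\mu + \eta\sigma - \mu(X_{n,k})}{\sigma(X_{n,k})}\right| \to \infty\right\}$$

$$E_1 = \{\exists\, \{X_{n,k}\} \text{ such that } |\mu(X_{n,k})| \to \infty \text{ with } \sigma(X_{n,k}) = o(|\mu(X_{n,k})|)\}$$

$$E_2 = \{\exists\, \{X_{n,k}\} \text{ such that } |\mu(X_{n,k})| \text{ is bounded with } \sigma(X_{n,k}) \to 0\}.$$

Note that (with $k$ fixed) each of $E_1$, $E_2$ implies $S$ and $S$ implies $E_1 \cup E_2$. Also, $E_1$ is associated with the presence of outliers, whereas $E_2$ is associated with the presence of inliers.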

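Proof of Proposition 9. We have

$$
\sup_{y \in OR(\lambda, X_n)} O(y, F) = \sup_{y \in OR(\lambda, X_n)} |2F(y) - 1|
= \max\left\{\left|2F\left(F_n^{-1}\left(\tfrac{1 - \lambda}{2}\right)\right) - 1\right|,\ \left|2F\left(F_n^{-1}\left(\tfrac{1 + \lambda}{2}\right)\right) - 1\right|\right\}. \tag{34}
$$

It then follows from Lemma 13(ii) that

$$
\min\left\{\mathrm{RBP}_{\mathrm{exp}}\left(\left|2F\left(F_n^{-1}\left(\tfrac{1 - \lambda}{2}\right)\right) - 1\right|\right),\ \mathrm{RBP}_{\mathrm{exp}}\left(\left|2F\left(F_n^{-1}\left(\tfrac{1 + \lambda}{2}\right)\right) - 1\right|\right)\right\}
\le \mathrm{MBP}^{(A)}(\lambda, X_n)
$$

$$
\le \max\left\{\mathrm{RBP}_{\mathrm{exp}}\left(\left|2F\left(F_n^{-1}\left(\tfrac{1 - \lambda}{2}\right)\right) - 1\right|\right),\ \mathrm{RBP}_{\mathrm{exp}}\left(\left|2F\left(F_n^{-1}\left(\tfrac{1 + \lambda}{2}\right)\right) - 1\right|\right)\right\}. \tag{35}
$$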

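We now evaluate $\mathrm{RBP}_{\mathrm{exp}}\left(\left|2F\left(F_n^{-1}\left(\tfrac{1 - \lambda}{2}\right)\right) - 1\right|\right)$. Note that

$$\left|2F\left(F_n^{-1}\left(\tfrac{1 - \lambda}{2}\right)\right) - 1\right| \to 1$$

if and only if either $F_n^{-1}\left(\tfrac{1 - \lambda}{2}\right) \to \infty$ or $F_n^{-1}\left(\tfrac{1 - \lambda}{2}\right) \to -\infty$. Thus we apply Lemma 14 with some choice of $k$ and the events $S$, $E_1$ and $E_2$, where

$$S = \left\{\exists\, \{X_{n,k}\} \text{ such that } \left|2F\left(F_{n,k}^{-1}\left(\tfrac{1 - \lambda}{2}\right)\right) - 1\right| \to 1\right\}$$

$$E_1 = \left\{\exists\, \{X_{n,k}\} \text{ such that } F_{n,k}^{-1}\left(\tfrac{1 - \lambda}{2}\right) \to \infty\right\}$$

$$E_2 = \left\{\exists\, \{X_{n,k}\} \text{ such that } F_{n,k}^{-1}\left(\tfrac{1 - \lambda}{2}\right) \to -\infty\right\}.$$

Note that (with $k$ fixed) each of $E_1$, $E_2$ implies $S$ and $S$ implies $E_1 \cup E_2$. Here we use $\mathrm{RBP}_{\mathrm{exp}}$ with $T^{**} = 1$. Then (21) yields

$$\mathrm{RBP}_{\mathrm{exp}}\left(\left|2F\left(F_n^{-1}\left(\tfrac{1 - \lambda}{2}\right)\right) - 1\right|\right) = n^{-1} \min\{k_1, k_2\}, \tag{36}$$

where $k_1, k_2$ are the minimal values of $k$, respectively, for occurrence of $E_1$, $E_2$. Specifically, we find

$$k_1 = n - \left\lceil \frac{1 - \lambda}{2}\, n \right\rceil + 1 = \left\lfloor \frac{1 + \lambda}{2}\, n \right\rfloor + 1$$

and

$$k_2 = \left\lceil \frac{1 - \lambda}{2}\, n \right\rceil \le k_1.$$

Thus

$$\mathrm{RBP}_{\mathrm{exp}}\left(\left|2F\left(F_n^{-1}\left(\tfrac{1 - \lambda}{2}\right)\right) - 1\right|\right) = n^{-1} \left\lceil \frac{1 - \lambda}{2}\, n \right\rceil.$$

Similarly we obtain

$$\mathrm{RBP}_{\mathrm{exp}}\left(\left|2F\left(F_n^{-1}\left(\tfrac{1 + \lambda}{2}\right)\right) - 1\right|\right) = n^{-1} \left(\left\lfloor \frac{1 - \lambda}{2}\, n \right\rfloor + 1\right).$$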
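The replacement counts in this proof are easy to evaluate for a given n and λ. The following Python sketch computes k1 and k2 for the lower sample quantile and checks the ceiling/floor identity stated above. The function names are illustrative, and λ is assumed to be supplied as an exact fraction (or a decimal string) so that the ceiling and floor are not disturbed by floating-point rounding.

```python
import math
from fractions import Fraction

def replacement_counts(n, lam):
    """Return (k1, k2) for the lower sample quantile at level (1 - lam)/2.

    k1: replacements needed to send the quantile to +infinity.
    k2: replacements needed to send the quantile to -infinity.
    """
    lam = Fraction(lam)
    lower = (1 - lam) * n / 2
    upper = (1 + lam) * n / 2
    k1 = n - math.ceil(lower) + 1
    k2 = math.ceil(lower)
    # Identity used in the proof: n - ceil(a) + 1 == floor(n - a) + 1.
    assert k1 == math.floor(upper) + 1
    return k1, k2

def lower_quantile_rbp(n, lam):
    """Finite-sample explosion RBP n^{-1} * min(k1, k2) for the lower quantile."""
    k1, k2 = replacement_counts(n, lam)
    return Fraction(min(k1, k2), n)

k1, k2 = replacement_counts(100, Fraction(9, 10))
print(k1, k2, lower_quantile_rbp(100, Fraction(9, 10)))
```

For n = 100 and λ = 9/10 this gives k1 = 96, k2 = 5, and an RBP of 1/20, consistent with the limiting value (1 − λ)/2 = 0.05.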

Proofs of Propositions 10, 11 and 12. These are similar to the foregoing proof.

References

[1] Davies, L. and Gather, U. (1993). The identification of multiple outliers. Journal of the American Statistical Association 88 782–801.

[2] Donoho, D. L. and Huber, P. J. (1983). The notion of breakdown point. In A Festschrift for Erich L. Lehmann (P. J. Bickel, K. A. Doksum and J. L. Hodges, Jr., eds.) pp. 157–184, Wadsworth, Belmont, California.

[3] Mosteller, C. F. and Tukey, J. W. (1977). Data Analysis and Regression. Addison-Wesley, Reading, Mass.

[4] Serfling, R. and Mazumder, S. (2009). Exponential probability inequality and convergence results for the median absolute deviation and its modifications. Statistics and Probability Letters 79 1767–1773.

[5] Serfling, R. and Wang, S. (2012). General foundations for studying masking and swamping robustness of outlier identifiers. Submitted (available at www.utdallas.edu/∼serfling).

[6] Zuo, Y. (2003). Projection-based depth functions and associated medians. Annals of Statistics 31 1460–1490.
