On Masking and Swamping Robustness of Leading Outlier Identifiers for Univariate Data
Shanshan Wang1 and Robert Serfling2
University of Texas at Dallas
February, 2013
1Department of Mathematics, University of Texas at Dallas, Richardson, Texas 75080-3021, USA.
2Department of Mathematics, University of Texas at Dallas, Richardson, Texas 75080-3021, USA. Email: [email protected]. Website: www.utdallas.edu/∼serfling.
Abstract
In the wide-ranging scope of modern statistical data analysis, a key task is identification of outliers. In using an outlier identification procedure, one needs to know its robustness against masking (an “outlier” is undetected) and swamping (a “nonoutlier” is classified as an “outlier”), possibilities which can come about due to the presence of outliers. Study of these issues together is necessary but complex. Recently, Serfling and Wang (2012) developed a general framework providing foundations, tools, and criteria applicable in any data space. Application of this framework to particular outlier identifiers in particular types of data space requires, however, additional development of a nature specialized to the chosen setting. The present paper applies the general framework to the case of univariate data and evaluates masking and swamping robustness for two leading outlier identifiers, scaled deviation outlyingness and centered rank outlyingness. Our results shed new light on the choice between (Median, MAD) and (trimmed mean, trimmed standard deviation) in defining scaled deviation outlyingness. Also, our findings explain how the boxplot, a leading descriptive tool, acquires its excellent robustness by incorporating a scaled deviation outlier identification component alongside its quantile-based description of the central part of a data set.
AMS 2000 Subject Classification: Primary 62G35 Secondary 62-07
Key words and phrases: Nonparametric; Outlier detection; Masking robustness; Swamping robustness; Breakdown point; Boxplot.
1 Introduction
In the wide-ranging scope of modern statistical data analysis, a key task is identification of outliers and anomalies. Besides traditional contexts, new ones have arisen, such as fraud detection and intrusion detection. A basic feature of any outlier identification procedure is its robustness against two kinds of misclassification error: masking (some outliers are classified as nonoutliers) and swamping (some nonoutliers are classified as outliers). Unfortunately, the outliers themselves can interfere with the very process of identifying them. Masking and swamping robustness trade off against each other, and hence it is important to study them coherently within a single picture. For this purpose, Serfling and Wang (2012) recently developed a general theoretical framework providing foundations, tools, and criteria for studying masking and swamping robustness of outlier identifiers in an arbitrary data space. Implementation of these general results with a particular outlier identifier in a particular type of data space, however, requires nontrivial additional development specialized to the chosen setting. As a first application of the general framework, the present paper focuses on two leading outlier identification methods in the case of univariate data in the nonparametric setting of an arbitrary cdf F.
Our approach uses two robustness measures: the masking breakdown point (MBP) and the swamping breakdown point (SBP). These are the minimum fractions of points in a data set which, if arbitrarily replaced, can cause a given outlier detection procedure to mask arbitrarily extreme outliers, or to swamp arbitrarily central nonoutliers, respectively. The higher the MBP and SBP values, the better the robustness of an outlier detection procedure. It turns out that for each of MBP and SBP there are two complementary versions (Type A and Type B), making four robustness measures in all, with Type A MBP paired naturally with Type A SBP, and likewise for the Type B versions.
These four robustness measures are determined for two long-established outlyingness functions: scaled deviation outlyingness
$$\left|\frac{x-\mu(F)}{\sigma(F)}\right|, \qquad -\infty < x < \infty,$$
where µ(F) and σ(F) are location and spread measures, respectively, and centered rank outlyingness
$$|2F(x)-1|, \qquad -\infty < x < \infty.$$
Note that each of these increases as x moves outward from the “center”, µ(F) or Median(F), respectively.
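As an illustration, sample versions of these two outlyingness functions can be sketched in Python as follows. This is a minimal sketch rather than the authors' code: the function names are hypothetical, and the use of (Median, MAD) for (µ, σ) is only one of the choices considered in this paper.

```python
from statistics import median

def scaled_deviation_outlyingness(x, data):
    """|x - mu| / sigma with mu = sample median and sigma = sample MAD.

    The (Median, MAD) pair is an assumed choice for illustration;
    a zero MAD (more than half the data tied) would raise ZeroDivisionError.
    """
    mu = median(data)
    sigma = median(abs(v - mu) for v in data)  # unnormalized MAD
    return abs(x - mu) / sigma

def centered_rank_outlyingness(x, data):
    """|2 F_n(x) - 1| with F_n the empirical cdf of the data."""
    f_n = sum(v <= x for v in data) / len(data)
    return abs(2.0 * f_n - 1.0)
```

Both functions grow as x moves away from the sample median, the first without bound and the second up to 1.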
For scaled deviation outlyingness, we apply our MBP and SBP results to compare (Median, MAD) versus (trimmed mean, trimmed standard deviation) as choices for (µ(F), σ(F)). Each pair has its appeal, but there are some differences in robustness performance. As another application, we use our MBP and SBP results for centered rank outlyingness together with those for scaled deviation outlyingness to obtain an explanation of how the boxplot, a popular tool for descriptive summary of a data set, achieves its excellent robustness.
Although the notion of (finite sample) breakdown point (BP) for estimators is well established and widely used, notions of masking and swamping BP are more problematic and have received only limited previous treatment. In the univariate parametric setting of the contaminated normal model, Davies and Gather (1993) treat certain notions of Type A MBP and Type B SBP for scaled deviation outlyingness. Becker and Gather (1999) treat Type A MBP for the Mahalanobis distance outlyingness in the setting of the multivariate contaminated normal model. Dang and Serfling (2010) treat Type A MBP for several depth-based outlier identifiers in the nonparametric multivariate setting. The results of the present paper provide a comprehensive parallel to Davies and Gather (1993) that covers both Type A and B MBP and SBP for scaled deviation outlyingness and centered rank outlyingness. In a future paper, the multivariate setting will be treated, unifying and extending the work of Becker and Gather (1999) and Dang and Serfling (2010) into a general treatment of Type A and B MBP and SBP for several leading multivariate outlier identifiers.
The present paper is organized as follows. Section 2 provides preliminaries from Serfling and Wang (2012). Type A and B MBP and SBP results are developed for scaled deviation outlyingness in Section 3 and for centered rank outlyingness in Section 4. Application to the boxplot is carried out in Section 5. A general discussion in Section 6 includes comparison of (Median, MAD) versus (trimmed mean, trimmed standard deviation) in defining the scaled deviation outlyingness. The proofs of our MBP and SBP results are provided in the Appendix.
2 Preliminaries
Here we provide needed preliminaries with a minimum of detail. See Serfling (2010) and Serfling and Wang (2012) for elaboration and discussion.
2.1 Outlyingness functions
Let F be a probability distribution on R. “Outliers” are points or groups of points which lie apart from the central part of F or from the main body of the data, or which are unusual, anomalous, or suspicious in some sense. Associated with F, an outlyingness function O(x, F) provides a center-outward ordering of points x in R, with higher values representing greater “outlyingness” relative to a “center” measuring location. We suppose that O(x, F) has lower and upper limits
$$\inf_x O(x, F) = 0, \qquad \sup_x O(x, F) = 1. \tag{1}$$
For a data set Xn = {X1, . . . , Xn} from F, a sample version of O(x, F) is denoted by O(x, Xn) and may be considered to estimate O(x, F). We denote its lower and upper limits by $O^*_n$ and $O^{**}_n$:
$$\inf_x O(x, X_n) = O^*_n \ (\ge 0), \qquad \sup_x O(x, X_n) = O^{**}_n \ (\le 1). \tag{2}$$
Although typically $O^*_n = 0$ and $O^{**}_n = 1$, for one of our outlyingness functions it turns out that $O^*_n = n^{-1}$ if n is odd and 0 otherwise.
2.2 Nonparametric outlier identification
For a given outlyingness function O(x, F), we define “λ outlier regions” that represent the points of outlyingness greater than the threshold λ:
out(λ, F) = {x : O(x, F) > λ}, 0 < λ < 1.
The goal is to classify, for given threshold λ, all points x of R as belonging to out(λ, F) or not. For this purpose, we estimate the region out(λ, F) by the sample version
OR(λ, Xn) = {x : O(x, Xn) > λ}.
It is understood that OR(λ, Xn) includes, in principle, “regular” points from F as well as “contaminants” originating from other sources. In some cases, OR(λ, Xn) is given by out(λ, Fn) with Fn an empirical df.
2.3 Masking and swamping robustness
2.3.1 Masking robustness
Let $\overline{A}$ denote the complement of a set A. Key sets regarding masking are of the form
$$M(\lambda, \gamma, X_n, F) = \overline{\mathrm{OR}}(\lambda, X_n) \cap \mathrm{out}(\gamma, F),$$
defined for any λ and γ. Masking occurs if
$$M(\lambda, \gamma, X_n, F) \neq \emptyset, \tag{3}$$
which requires $\lambda > O^*_n$. In this case some γ outliers of F are included in the sample threshold λ nonoutlier region. For fixed λ, masking becomes more severe as γ ↑ 1. That is, increasingly extreme outliers of F become masked as sample threshold λ nonoutliers. This represents extreme Type A masking. On the other hand, for fixed γ masking becomes more severe as $\lambda \downarrow O^*_n$. That is, some threshold γ outliers of F are included within an increasingly central sample nonoutlier region. This represents extreme Type B masking.
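Now consider all possible modified data sets $X_{n,k}$ obtainable by replacing k observations of Xn by arbitrarily positioned new values (“contaminants”). Corresponding to the “fixed λ” and “fixed γ” cases, respectively, two indices that measure in different ways the size of the masking effect are
$$\begin{aligned}
\gamma_M(\lambda, X_n, k) &= \text{largest } \gamma \text{ for which (3) with fixed } \lambda \text{ holds subject to } k \text{ replacements} \\
&= \sup\{\gamma < 1 : \exists\ k \text{ replacements such that } M(\lambda, \gamma, X_{n,k}, F) \neq \emptyset\},
\end{aligned}$$
and
$$\begin{aligned}
\lambda_M(\gamma, X_n, k) &= \text{smallest } \lambda \text{ for which (3) with fixed } \gamma \text{ holds subject to } k \text{ replacements} \\
&= \inf\{\lambda > O^*_n : \exists\ k \text{ replacements such that } M(\lambda, \gamma, X_{n,k}, F) \neq \emptyset\}.
\end{aligned}$$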
The quantity $\gamma_M(\lambda, X_n, k)$ represents the largest degree of outlyingness relative to F that is nonidentifiable at sample outlyingness threshold λ. The worst possible case, $\gamma_M(\lambda, X_n, k) = 1$, denotes Type A masking breakdown due to k replacements. Letting $k^{(A)}_M(\lambda, X_n) = \min\{k : \gamma_M(\lambda, X_n, k) = 1\}$, the Type A masking breakdown point of OR(·, Xn) at sample outlyingness threshold λ is then given by
$$\mathrm{MBP}^{(A)}(\lambda, X_n) = k^{(A)}_M(\lambda, X_n)/n.$$
On the other hand, the quantity $\lambda_M(\gamma, X_n, k)$ represents the most central level at which a γ outlier of F can be masked; the smaller this value, the worse the masking robustness of OR(·, Xn). The worst possible case, $\lambda_M(\gamma, X_n, k) = O^*_n$, denotes Type B masking breakdown due to k replacements. Letting $k^{(B)}_M(\gamma, X_n) = \min\{k : \lambda_M(\gamma, X_n, k) = O^*_n\}$, the Type B masking breakdown point of OR(·, Xn) at F outlyingness threshold γ is then given by
$$\mathrm{MBP}^{(B)}(\gamma, X_n) = k^{(B)}_M(\gamma, X_n)/n.$$
The higher the values of $\mathrm{MBP}^{(A)}(\lambda, X_n)$ for $O^*_n < \lambda < 1$ and $\mathrm{MBP}^{(B)}(\gamma, X_n)$ for 0 < γ < 1, the greater the masking robustness of the outlier identifier OR(·, Xn).
2.3.2 Swamping robustness
Key sets regarding swamping are of the form
$$S(\lambda, \gamma, X_n, F) = \mathrm{OR}(\lambda, X_n) \cap \overline{\mathrm{out}}(\gamma, F),$$
defined for any λ and γ, and swamping occurs if
$$S(\lambda, \gamma, X_n, F) \neq \emptyset, \tag{4}$$
for $\lambda < O^{**}_n$. In this case some γ nonoutliers of F are included in the sample threshold λ outlier region. For fixed λ, the swamping becomes more severe as γ ↓ 0, with increasingly central nonoutliers of F becoming included in the sample threshold λ outlier region (extreme Type A swamping). For fixed γ, swamping becomes more severe as $\lambda \uparrow O^{**}_n$, with threshold γ nonoutliers of F included within an increasingly extreme sample outlier region (extreme Type B swamping).
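Again consider the modifications $X_{n,k}$ obtainable by replacing k observations of Xn by “contaminants”. Corresponding to the “fixed λ” and “fixed γ” cases, respectively, two indices related to extreme instances of swamping are
$$\begin{aligned}
\gamma_S(\lambda, X_n, k) &= \text{smallest } \gamma \text{ for which (4) with fixed } \lambda \text{ holds subject to } k \text{ replacements} \\
&= \inf\{\gamma > 0 : \exists\ k \text{ replacements such that } S(\lambda, \gamma, X_{n,k}, F) \neq \emptyset\},
\end{aligned}$$
and
$$\begin{aligned}
\lambda_S(\gamma, X_n, k) &= \text{largest } \lambda \text{ for which (4) with fixed } \gamma \text{ holds subject to } k \text{ replacements} \\
&= \sup\{\lambda < O^{**}_n : \exists\ k \text{ replacements such that } S(\lambda, \gamma, X_{n,k}, F) \neq \emptyset\}.
\end{aligned}$$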
The quantity $\gamma_S(\lambda, X_n, k)$ represents the most central level of nonoutlier of F that can be swamped at sample outlier threshold λ. The worst possible case, $\gamma_S(\lambda, X_n, k) = 0$, denotes Type A swamping breakdown due to k replacements. Letting $k^{(A)}_S(\lambda, X_n) = \min\{k : \gamma_S(\lambda, X_n, k) = 0\}$, the Type A swamping breakdown point of OR(·, Xn) at sample outlyingness threshold λ is given by
$$\mathrm{SBP}^{(A)}(\lambda, X_n) = k^{(A)}_S(\lambda, X_n)/n.$$
On the other hand, $\lambda_S(\gamma, X_n, k)$ represents the most extreme sample outlyingness threshold at which a γ nonoutlier of F can be swamped (by the presence of k replacements in the data Xn); the larger this value, the worse the swamping robustness of OR(·, Xn). The worst possible case, $\lambda_S(\gamma, X_n, k) = O^{**}_n$, denotes Type B swamping breakdown due to k replacements. Letting $k^{(B)}_S(\gamma, X_n) = \min\{k : \lambda_S(\gamma, X_n, k) = O^{**}_n\}$, the Type B swamping breakdown point of OR(·, Xn) at F outlyingness threshold γ is given by
$$\mathrm{SBP}^{(B)}(\gamma, X_n) = k^{(B)}_S(\gamma, X_n)/n.$$
The higher the values of $\mathrm{SBP}^{(A)}(\lambda, X_n)$ for $0 < \lambda < O^{**}_n$ and $\mathrm{SBP}^{(B)}(\gamma, X_n)$ for 0 < γ < 1, the greater the swamping robustness of the outlier identifier OR(·, Xn).
2.3.3 The four masking and swamping robustness measures
In exploring a data set Xn using OR(λ, Xn) as an estimator of out(λ, F) for a specified outlyingness threshold λ, Type A MBP and SBP quite naturally go together as companion robustness measures. On the other hand, one might focus on out(γ, F) for some γ and ask how centrally this outlier region can be masked using OR(·, Xn). Also, with focus on the complement of out(γ, F) for some γ, one might want to know how extremely this nonoutlier region can be swamped using OR(·, Xn). For these purposes the Type B MBP and SBP are companion robustness measures that play roles complementary to the Type A versions.
2.4 Basic lemmas
Here we provide basic lemmas for evaluating MBP(A)(λ, Xn), MBP(B)(γ, Xn), SBP(A)(λ, Xn), and SBP(B)(γ, Xn) in applications. These reduce the problem to that of evaluating ordinary breakdown points of certain “inf” and “sup” statistics.
For a real-valued statistic T(Xn) taking values in [0, 1] or (−∞, +∞), for example, explosion breakdown of T(Xn) occurs with k points of Xn replaced if
$$\sup_{X_{n,k}} |T(X_{n,k})| = \sup_{X_{n,n}} |T(X_{n,n})| =: T^{**}, \tag{5}$$
with $X_{n,k}$ as previously. Typical values of $T^{**}$ are 1 or ∞. With $k_{\exp}(T(X_n))$ denoting the minimum k such that (5) can occur, the explosion replacement breakdown point of T(Xn) is given by $\mathrm{RBP}_{\exp}(T(X_n)) = k_{\exp}(T(X_n))/n$.
Likewise, implosion breakdown of T(Xn) occurs with k points of Xn replaced if
$$\inf_{X_{n,k}} |T(X_{n,k})| = \inf_{X_{n,n}} |T(X_{n,n})| =: T^{*}. \tag{6}$$
The typical value of $T^{*}$ is 0. With obvious notation, the implosion replacement breakdown point of T(Xn) is given by $\mathrm{RBP}_{\mathrm{imp}}(T(X_n)) = k_{\mathrm{imp}}(T(X_n))/n$.
Representations for MBP(A)(λ, Xn), MBP(B)(γ, Xn), SBP(A)(λ, Xn), and SBP(B)(γ, Xn) in terms of the above explosion and implosion RBPs are given in the following lemmas from Serfling and Wang (2012).
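Lemma 1 Type A masking breakdown with replacement of k sample values ($\gamma_M(\lambda, X_n, k) = 1$) holds if and only if $\sup_{X_{n,k}} \sup_{y \in \overline{\mathrm{OR}}(\lambda, X_{n,k})} O(y, F) = 1$, and hence
$$\mathrm{MBP}^{(A)}(\lambda, X_n) = \mathrm{RBP}_{\exp}\Big(\sup_{y \in \overline{\mathrm{OR}}(\lambda, X_n)} O(y, F)\Big). \tag{7}$$
Lemma 2 Type B masking breakdown with replacement of k sample values ($\lambda_M(\gamma, X_n, k) = O^*_n$) holds if and only if $\inf_{X_{n,k}} \inf_{y \in \mathrm{out}(\gamma, F)} O(y, X_{n,k}) = O^*_n$, and hence
$$\mathrm{MBP}^{(B)}(\gamma, X_n) = \mathrm{RBP}_{\mathrm{imp}}\Big(\inf_{y \in \mathrm{out}(\gamma, F)} O(y, X_n)\Big). \tag{8}$$
Lemma 3 Type A swamping breakdown with replacement of k sample values ($\gamma_S(\lambda, X_n, k) = 0$) holds if and only if $\inf_{X_{n,k}} \inf_{y \in \mathrm{OR}(\lambda, X_{n,k})} O(y, F) = 0$, and hence
$$\mathrm{SBP}^{(A)}(\lambda, X_n) = \mathrm{RBP}_{\mathrm{imp}}\Big(\inf_{y \in \mathrm{OR}(\lambda, X_n)} O(y, F)\Big). \tag{9}$$
Lemma 4 Type B swamping breakdown with replacement of k sample values ($\lambda_S(\gamma, X_n, k) = O^{**}_n$) holds if and only if $\sup_{X_{n,k}} \sup_{y \in \overline{\mathrm{out}}(\gamma, F)} O(y, X_{n,k}) = O^{**}_n$, and hence
$$\mathrm{SBP}^{(B)}(\gamma, X_n) = \mathrm{RBP}_{\exp}\Big(\sup_{y \in \overline{\mathrm{out}}(\gamma, F)} O(y, X_n)\Big). \tag{10}$$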
3 MBP and SBP Results for Scaled Deviation Outlyingness
Scaled deviation outlyingness functions have been popularized by Mosteller and Tukey (1977), for example. Let µ(F) and σ(F) be any location and spread measures. The corresponding scaled deviation outlyingness function taking values in [0, 1) is given by $O(x, F) = \tilde{O}(x, F)/(1 + \tilde{O}(x, F))$, with
$$\tilde{O}(x, F) = \left|\frac{x - \mu(F)}{\sigma(F)}\right|,$$
and sample versions $O(x, X_n)$ and $\tilde{O}(x, X_n)$ are similarly defined using µ(Xn) and σ(Xn). Note that we have $O^*_n = 0$ and $O^{**}_n = 1$. It is straightforward to express Lemmas 1-4 in terms of $\tilde{O}(x, F)$ and $\tilde{O}(x, X_n)$, and we obtain
$$\mathrm{MBP}^{(A)}(\lambda, X_n) = \mathrm{RBP}_{\exp}\Big(\sup_{y \in \overline{\mathrm{OR}}(\lambda, X_n)} \tilde{O}(y, F)\Big), \tag{11}$$
$$\mathrm{MBP}^{(B)}(\gamma, X_n) = \mathrm{RBP}_{\mathrm{imp}}\Big(\inf_{y \in \mathrm{out}(\gamma, F)} \tilde{O}(y, X_n)\Big), \tag{12}$$
$$\mathrm{SBP}^{(A)}(\lambda, X_n) = \mathrm{RBP}_{\mathrm{imp}}\Big(\inf_{y \in \mathrm{OR}(\lambda, X_n)} \tilde{O}(y, F)\Big), \tag{13}$$
$$\mathrm{SBP}^{(B)}(\gamma, X_n) = \mathrm{RBP}_{\exp}\Big(\sup_{y \in \overline{\mathrm{out}}(\gamma, F)} \tilde{O}(y, X_n)\Big). \tag{14}$$
Below we treat each of these in turn. Comparative discussion with a table of the results together is deferred to Section 6.
For convenience, we put $\mu(X_n) = \hat\mu$ and $\sigma(X_n) = \hat\sigma$. Now, in terms of the $\tilde{O}$ versions, we have $\mathrm{out}(\gamma, F) = \{x : O(x, F) > \gamma\} = \{x : \tilde{O}(x, F) > \eta\}$, with η = γ/(1 − γ), and $\mathrm{OR}(\lambda, X_n) = \{x : O(x, X_n) > \lambda\} = \{x : \tilde{O}(x, X_n) > \beta\}$, with β = λ/(1 − λ). Note that η ↑ ∞ as γ ↑ 1, and β ↑ ∞ as λ ↑ 1. Accordingly, the above inf and sup expressions involve the regions
$$\overline{\mathrm{out}}(\gamma, F) = [\mu(F) - \eta\sigma(F),\ \mu(F) + \eta\sigma(F)],$$
$$\overline{\mathrm{OR}}(\lambda, X_n) = [\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma],$$
and their complements.
3.1 Results for MBP(A)(λ, Xn)
We adopt
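Assumption A. $\mathrm{RBP}_{\exp}(|\hat\mu + \beta\hat\sigma|, X_n)$ and $\mathrm{RBP}_{\exp}(\hat\mu, X_n)$ are invariant if Xn is replaced by −Xn, i.e., if each observation Xi is replaced by −Xi, 1 ≤ i ≤ n.
Proposition 5 Under Assumption A,
$$\mathrm{MBP}^{(A)}(\lambda, X_n) = n^{-1}\min\{k_1, k_2\} = \min\{\mathrm{RBP}_{\exp}(\hat\mu),\ \mathrm{RBP}_{\exp}(\hat\sigma \mid \hat\mu \text{ bounded})\}. \tag{15}$$
Remark. Note that $\mathrm{MBP}^{(A)}(\lambda, X_n)$ does not depend upon the threshold λ. In typical cases, we have
$$\mathrm{RBP}_{\exp}(\hat\sigma \mid \hat\mu \text{ bounded}) \ge \mathrm{RBP}_{\exp}(\hat\sigma) \ge \mathrm{RBP}_{\exp}(\hat\mu),$$
in which case simply $\mathrm{MBP}^{(A)}(\lambda, X_n) = \mathrm{RBP}_{\exp}(\hat\mu)$.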
Examples. (Assumption A is satisfied in each case.)
(i) Mean and Standard Deviation. We take $\hat\mu = \bar{X}$ and $\hat\sigma = S$. It is straightforward that $\mathrm{RBP}_{\exp}(\hat\mu) = n^{-1}$, the minimum possible, yielding $\mathrm{MBP}^{(A)}(\lambda, X_n) = n^{-1} \approx 0$. (In passing, we note that $\mathrm{RBP}_{\exp}(\hat\sigma \mid \hat\mu \text{ bounded}) = 2n^{-1}$.)
(ii) Median and MAD. We take $\hat\mu = \mathrm{Med}(X_n)$ and $\hat\sigma = \mathrm{MAD}(X_n)$. For n = 2m + 1, to obtain Med → ∞, we require that m + 1 observations → ∞. On the other hand, for n = 2m, to obtain Med → ∞, we require that m observations → ∞. In either case, we have $\mathrm{RBP}_{\exp}(\hat\mu) = n^{-1}\lfloor\frac{n+1}{2}\rfloor$. For n = 2m + 1, to obtain MAD → ∞ with Med bounded, we require that m observations → ∞ and 1 observation → −∞, for a total of m + 1 replaced observations. On the other hand, for n = 2m, to obtain MAD → ∞ with Med bounded, we require that m − 1 observations → ∞ and that 1 observation → −∞, for a total of m replaced observations. In either case, we have $\mathrm{RBP}_{\exp}(\hat\sigma \mid \hat\mu \text{ bounded}) = n^{-1}\lfloor\frac{n+1}{2}\rfloor$. This yields
$$\mathrm{MBP}^{(A)}(\lambda, X_n) = n^{-1}\left\lfloor\frac{n+1}{2}\right\rfloor \approx \frac{1}{2}.$$
(iii) α-Trimmed Mean and SD. Let $X^{(n-2\lfloor n\alpha\rfloor)}$ denote the $n - 2\lfloor n\alpha\rfloor$ observations that remain after trimming away the upper $\lfloor n\alpha\rfloor$ observations and the lower $\lfloor n\alpha\rfloor$ observations. Then take $\hat\mu$ to be the mean and $\hat\sigma$ the standard deviation of the data set $X^{(n-2\lfloor n\alpha\rfloor)}$. It is readily checked that $\mathrm{RBP}_{\exp}(\hat\mu) = (\lfloor n\alpha\rfloor + 1)/n$ and that $\mathrm{RBP}_{\exp}(\hat\sigma \mid \hat\mu \text{ bounded}) = 2(\lfloor n\alpha\rfloor + 1)/n$, yielding
$$\mathrm{MBP}^{(A)}(\lambda, X_n) = n^{-1}(\lfloor n\alpha\rfloor + 1) \approx \alpha.$$
Note that the result in (iii) approaches that in (i) as α → 0 and that in (ii) as α → 1/2. □
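The three example values can be tabulated, and the median case checked by brute force, with a short Python sketch. The function names and the toy data set below are hypothetical and serve only as illustration; the formulas are those stated in Examples (i)-(iii).

```python
from math import floor
from statistics import median

def mbp_a_scaled_deviation(n, choice, alpha=0.0):
    """Type A MBP for the three (location, spread) pairs in Examples (i)-(iii)."""
    if choice == "mean_sd":
        k = 1                        # one replaced point explodes the mean
    elif choice == "median_mad":
        k = (n + 1) // 2             # floor((n + 1) / 2)
    elif choice == "trimmed":
        k = floor(n * alpha) + 1     # floor(n * alpha) + 1; beware float rounding in n * alpha
    else:
        raise ValueError("unknown choice: " + choice)
    return k / n

def replace_uppermost(data, k, value):
    """Replace the k largest observations by a single contaminating value."""
    kept = sorted(data)[: len(data) - k]
    return kept + [value] * k

# Toy check with n = 11: the median survives 5 replacements but not 6.
sample = list(range(11))
median_after_5 = median(replace_uppermost(sample, 5, 1e9))
median_after_6 = median(replace_uppermost(sample, 6, 1e9))
```

With n = 11 the median stays at an original data value after 5 replacements and moves to the contaminating value after 6, in line with $\lfloor (n+1)/2 \rfloor = 6$.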
3.2 Results for MBP(B)(γ, Xn)
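Proposition 6 We have
$$\mathrm{MBP}^{(B)}(\gamma, X_n) =
\begin{cases}
0 & \text{if } \hat\mu \notin (\mu - \eta\sigma,\ \mu + \eta\sigma) \\
\min\{\mathrm{RBP}_{\exp}(\hat\mu),\ \mathrm{RBP}_{\exp}(\hat\sigma \mid \hat\mu \text{ bounded})\} & \text{if } \hat\mu \in (\mu - \eta\sigma,\ \mu + \eta\sigma)
\end{cases}
=
\begin{cases}
0 & \text{if } \hat\mu \notin (\mu - \eta\sigma,\ \mu + \eta\sigma) \\
\mathrm{MBP}^{(A)}(\cdot, X_n) & \text{if } \hat\mu \in (\mu - \eta\sigma,\ \mu + \eta\sigma).
\end{cases}$$
Remark. Unlike $\mathrm{MBP}^{(A)}(\cdot, X_n)$, $\mathrm{MBP}^{(B)}(\gamma, X_n)$ does depend upon the threshold γ, but only weakly. When $\hat\mu \in (\mu - \eta\sigma, \mu + \eta\sigma)$, then $\mathrm{MBP}^{(B)}(\gamma, X_n)$ takes the same value as $\mathrm{MBP}^{(A)}(\cdot, X_n)$, which does not depend specifically on the threshold γ. When $\hat\mu \notin (\mu - \eta\sigma, \mu + \eta\sigma)$, then $\mathrm{MBP}^{(B)}(\gamma, X_n)$ takes the value 0, which again does not depend specifically on the threshold γ. Note that the case $\hat\mu \notin (\mu - \eta\sigma, \mu + \eta\sigma)$ has decreasing probability as n increases, in the typical case that $\hat\mu$ is a consistent estimator of µ.
Examples. When $\mu(X_n) \notin (\mu - \eta\sigma, \mu + \eta\sigma)$, we have $\mathrm{MBP}^{(B)}(\cdot, X_n) = 0$. For the case $\mu(X_n) \in (\mu - \eta\sigma, \mu + \eta\sigma)$, we obtain the same value as $\mathrm{MBP}^{(A)}(\cdot, X_n)$. Thus, for the latter case and the examples previously considered, we have:
(i) Mean and Standard Deviation with $\mu(X_n) \in (\mu - \eta\sigma, \mu + \eta\sigma)$.
$$\mathrm{MBP}^{(B)}(\cdot, X_n) = n^{-1} \approx 0.$$
(ii) Median and MAD with $\mu(X_n) \in (\mu - \eta\sigma, \mu + \eta\sigma)$.
$$\mathrm{MBP}^{(B)}(\cdot, X_n) = n^{-1}\left\lfloor\frac{n+1}{2}\right\rfloor \approx \frac{1}{2}.$$
(iii) α-Trimmed Mean and SD with $\mu(X_n) \in (\mu - \eta\sigma, \mu + \eta\sigma)$.
$$\mathrm{MBP}^{(B)}(\cdot, X_n) = n^{-1}(\lfloor n\alpha\rfloor + 1) \approx \alpha.$$
Note that the result in (iii) approaches that in (i) as α → 0 and that in (ii) as α → 1/2. □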
3.3 Results for SBP(A)(λ, Xn)
We adopt
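Assumption B. $\mathrm{RBP}_{\mathrm{imp}}\big(|\hat\mu + \beta\hat\sigma - \mu|\ I(\mu \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma))\big)$ and its counterpart $\mathrm{RBP}_{\mathrm{imp}}\big(|\hat\mu - \beta\hat\sigma - \mu|\ I(\mu \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma))\big)$ are invariant if Xn is replaced by −Xn, i.e., if each observation Xi is replaced by −Xi, 1 ≤ i ≤ n.
We have
Proposition 7 Under Assumption B, we have
$$\mathrm{SBP}^{(A)}(\lambda, X_n) =
\begin{cases}
0 & \text{if } \mu \notin (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma) \\
\min\{\mathrm{RBP}_{\mathrm{imp}}(\hat\mu + \beta\hat\sigma - \mu),\ \mathrm{RBP}_{\exp}(|\hat\mu| - \beta\hat\sigma)\} & \text{if } \mu \in (\hat\mu - \beta\hat\sigma,\ \hat\mu + \beta\hat\sigma),
\end{cases}$$
where here $\mathrm{RBP}_{\mathrm{imp}}$ refers to implosion to 0 and $\mathrm{RBP}_{\exp}$ refers to explosion to +∞.
Examples. When $\mu(F) \notin (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma)$, we have $\mathrm{SBP}^{(A)}(\cdot, X_n) = 0$. For the case $\mu(F) \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma)$, we obtain the following results for the examples previously considered.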
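(i) Mean and Standard Deviation with $\mu(F) \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma)$. Note that for $\hat\mu + \beta\hat\sigma \to \mu$ we need that all of the n data points be placed at µ, yielding $\mathrm{RBP}_{\mathrm{imp}}(\hat\mu + \beta\hat\sigma - \mu) = 1$. Now, for $|\hat\mu| - \beta\hat\sigma \to \infty$, it is readily derived that if we place k observations at $x^* \to \infty$, then (for these $\hat\mu$ and $\hat\sigma$) $\hat\mu \sim \frac{k}{n}x^*$ and $\hat\sigma \sim \sqrt{\frac{k}{n}\left(1 - \frac{k}{n}\right)}\,x^*$, in which case $|\hat\mu| - \beta\hat\sigma \to \infty$ if and only if $k > \frac{\beta^2}{1+\beta^2}\,n = \frac{\lambda^2}{\lambda^2 + (1-\lambda)^2}\,n$, and thus
$$\mathrm{SBP}^{(A)}(\cdot, X_n) = \mathrm{RBP}_{\exp}(|\hat\mu| - \beta\hat\sigma) = n^{-1}\left\lceil\frac{\lambda^2}{\lambda^2 + (1-\lambda)^2}\,n\right\rceil \approx \frac{\lambda^2}{\lambda^2 + (1-\lambda)^2}.$$
(ii) Median and MAD with $\mu(F) \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma)$. Similar steps as in (i) above yield (restricting to λ < 1/2 when n is even)
$$\mathrm{SBP}^{(A)}(\cdot, X_n) = n^{-1}\left\lfloor\frac{n+1}{2}\right\rfloor \approx \frac{1}{2},$$
which we note is the same as $\mathrm{MBP}^{(A)}(\cdot, X_n)$ for the Median and MAD.
(iii) α-Trimmed Mean and SD with $\mu(F) \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma)$. Similar steps as in (i) now yield
$$\mathrm{SBP}^{(A)}(\cdot, X_n) = n^{-1}\left\lceil\frac{\lambda^2}{\lambda^2 + (1-\lambda)^2}\,(n - 2\lfloor n\alpha\rfloor) + \lfloor n\alpha\rfloor\right\rceil \approx \frac{\lambda^2}{\lambda^2 + (1-\lambda)^2}\,(1 - 2\alpha) + \alpha.$$
Note that the result in (iii) approaches that in (i) as α → 0 and that in (ii) as α → 1/2. □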
3.4 Results for SBP(B)(γ, Xn)
Let us adopt
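Assumption C. $\mathrm{RBP}_{\exp}\left(\left|\frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma}\right|\right)$ and $\mathrm{RBP}_{\exp}\left(\left|\frac{\mu - \eta\sigma - \hat\mu}{\hat\sigma}\right|\right)$ are invariant if Xn is replaced by −Xn, i.e., if each observation Xi is replaced by −Xi, 1 ≤ i ≤ n.
We have
Proposition 8 Under Assumption C, we have
$$\mathrm{SBP}^{(B)}(\gamma, X_n) = \min\{\mathrm{RBP}_{\exp}(|\hat\mu| \mid \hat\sigma = o(|\hat\mu|)),\ \mathrm{RBP}_{\mathrm{imp}}(\hat\sigma \mid \hat\mu \text{ bounded})\}.$$
Examples. (i) Mean and Standard Deviation. For $\hat\mu \to \infty$ with $\hat\sigma = o(|\hat\mu|)$, we need that all of the n data points → ∞ in a pattern with their spread about $\hat\mu$ not increasing as fast as $|\hat\mu|$. That is, for this $\hat\mu$ and $\hat\sigma$, we have
$$\mathrm{RBP}_{\exp}(|\hat\mu| \mid \hat\sigma = o(|\hat\mu|)) = 1.$$
Now, for $\hat\sigma \to 0$ with $\hat\mu$ bounded, we need to move n − 1 observations to a chosen nth observation, yielding
$$\mathrm{RBP}_{\mathrm{imp}}(\hat\sigma \mid \hat\mu \text{ bounded}) = \frac{n-1}{n}$$
and thus
$$\mathrm{SBP}^{(B)}(\cdot, X_n) = \frac{n-1}{n} \approx 1.$$
(ii) Median and MAD. For n = 2m + 1, we may take the Median and m other observations to ∞, resulting in the MAD remaining bounded. For n = 2m, we may take the two middle observations and m − 1 other observations to ∞, again resulting in the MAD remaining bounded. Then, either way taking m + 1 to ∞, we have
$$\mathrm{RBP}_{\exp}(|\hat\mu| \mid \hat\sigma = o(|\hat\mu|)) = n^{-1}\left\lfloor\frac{n+1}{2}\right\rfloor.$$
For $\hat\sigma \to 0$ with $\hat\mu$ bounded, whether n = 2m + 1 or n = 2m we need to move m observations to a common point. Namely, for n odd we move the uppermost m observations to Med, and for n even we move the uppermost m observations to the mth, yielding
$$\mathrm{RBP}_{\mathrm{imp}}(\hat\sigma \mid \hat\mu \text{ bounded}) = n^{-1}\left\lfloor\frac{n}{2}\right\rfloor$$
and thus
$$\mathrm{SBP}^{(B)}(\cdot, X_n) = n^{-1}\left\lfloor\frac{n}{2}\right\rfloor \approx \frac{1}{2}.$$
(iii) α-Trimmed Mean and SD. For $\hat\mu \to \infty$ with $\hat\sigma = o(|\hat\mu|)$, we need to take $n - \lfloor n\alpha\rfloor$ observations to ∞, yielding
$$\mathrm{RBP}_{\exp}(|\hat\mu| \mid \hat\sigma = o(|\hat\mu|)) = n^{-1}(n - \lfloor n\alpha\rfloor).$$
For $\hat\sigma \to 0$ with $\hat\mu$ bounded, we need to take $n - 2\lfloor n\alpha\rfloor - 1$ observations to a common point, yielding
$$\mathrm{RBP}_{\mathrm{imp}}(\hat\sigma \mid \hat\mu \text{ bounded}) = n^{-1}(n - 2\lfloor n\alpha\rfloor - 1)$$
and thus
$$\mathrm{SBP}^{(B)}(\cdot, X_n) = n^{-1}(n - 2\lfloor n\alpha\rfloor - 1) \approx 1 - 2\alpha.$$
This result approaches that in (i) as α → 0 but does not approach that in (ii) as α → 1/2. This difference in the latter case is due to the difference between the trimmed SD and the MAD, and also to the fact that it is the influence of inliers which makes the SBP as low as 1 − 2α instead of 1 − α, in which case (iii) would agree with (ii) as α → 1/2. □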
4 MBP and SBP Results for Centered Rank Outlyingness
Let F be a continuous distribution on R. The corresponding centered rank outlyingness function taking values in [0, 1] is given by
$$O(x, F) = |2F(x) - 1|.$$
For the sample version, we employ the usual sample df $F_n(x) = n^{-1}\sum_{i=1}^{n} I(X_i \le x)$, and define $O(x, X_n) = |2F_n(x) - 1|$. It is readily checked that $O^{**}_n = 1$ whereas, due to the fact that $F_n(x)$ takes values only of the form k/n, $O^*_n$ is not strictly 0, but rather
$$O^*_n = \left|\frac{2\lfloor\frac{n+1}{2}\rfloor}{n} - 1\right| =
\begin{cases}
0, & n \text{ even} \\
\frac{1}{n}, & n \text{ odd.}
\end{cases} \tag{16}$$
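Formula (16) can be confirmed by direct enumeration, since $F_n$ takes only the values k/n for k = 0, 1, ..., n. The following Python sketch uses exact rational arithmetic; the function names are hypothetical and introduced only for this check.

```python
from fractions import Fraction

def min_centered_rank_outlyingness(n):
    """Minimum of |2 k/n - 1| over the attainable values k/n of the empirical cdf."""
    return min(abs(Fraction(2 * k, n) - 1) for k in range(n + 1))

def closed_form(n):
    """Right-hand side of (16): |2 floor((n + 1) / 2) / n - 1|."""
    return abs(Fraction(2 * ((n + 1) // 2), n) - 1)
```

For instance, n = 7 gives 1/7 and n = 8 gives 0 under both functions.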
We have
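$$\overline{\mathrm{out}}(\gamma, F) = \{x : |2F(x) - 1| \le \gamma\} = \left[F^{-1}\!\left(\frac{1-\gamma}{2}\right),\ F^{-1}\!\left(\frac{1+\gamma}{2}\right)\right],$$
$$\overline{\mathrm{OR}}(\lambda, X_n) = \left[F_n^{-1}\!\left(\frac{1-\lambda}{2}\right),\ F_n^{-1}\!\left(\frac{1+\lambda}{2}\right)\right].$$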
As with the scaled deviation outlyingness, we treat each of MBP(A)(λ, Xn), MBP(B)(γ, Xn), SBP(A)(λ, Xn), and SBP(B)(γ, Xn) in turn.
4.1 Results for MBP(A)(λ, Xn)
Proposition 9 We have
$$\mathrm{MBP}^{(A)}(\lambda, X_n) = n^{-1}\left\lceil\frac{1-\lambda}{2}\,n\right\rceil \approx \frac{1-\lambda}{2}.$$
Remark. Note that $\mathrm{MBP}^{(A)}(\lambda, X_n)$ depends upon the threshold λ and decreases as λ increases. Note also (from the proof) that Type A masking breakdown is attained by replacing observations in such a way that either the $\frac{1-\lambda}{2}$ sample quantile → −∞ or the $\frac{1+\lambda}{2}$ sample quantile → +∞, i.e., by explosion breakdown of either of these sample quantiles due to outliers.
4.2 Results for MBP(B)(γ, Xn)
Proposition 10 We have
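$$\mathrm{MBP}^{(B)}(\gamma, X_n) =
\begin{cases}
O^*_n & \text{if } F_n^{-1}\!\left(\tfrac{1}{2}\right) \notin \left(F^{-1}\!\left(\tfrac{1-\gamma}{2}\right),\ F^{-1}\!\left(\tfrac{1+\gamma}{2}\right)\right), \\[4pt]
\min\left\{ n^{-1}\left\lceil\tfrac{n+1}{2}\right\rceil - F_n\!\left(F^{-1}\!\left(\tfrac{1-\gamma}{2}\right)\right),\ F_n\!\left(F^{-1}\!\left(\tfrac{1+\gamma}{2}\right)\right) - n^{-1}\left\lfloor\tfrac{n+1}{2}\right\rfloor \right\} & \text{if } F_n^{-1}\!\left(\tfrac{1}{2}\right) \in \left(F^{-1}\!\left(\tfrac{1-\gamma}{2}\right),\ F^{-1}\!\left(\tfrac{1+\gamma}{2}\right)\right)
\end{cases}
\ \approx \frac{\gamma}{2}.$$
Remark. Note that $\mathrm{MBP}^{(B)}(\gamma, X_n)$ depends upon the threshold γ and increases as γ increases. The case $F_n^{-1}\!\left(\tfrac{1}{2}\right) \in \left(F^{-1}\!\left(\tfrac{1-\gamma}{2}\right), F^{-1}\!\left(\tfrac{1+\gamma}{2}\right)\right)$ has probability increasing to 1 as n increases. Note also (from the proof, omitted) that Type B masking breakdown is attained by replacing observations in such a way that the sample median → either the $\frac{1-\gamma}{2}$ population quantile or the $\frac{1+\gamma}{2}$ population quantile, i.e., by implosion breakdown of the sample median due to inliers.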
4.3 Results for SBP(A)(λ, Xn)
Proposition 11 We have
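$$\mathrm{SBP}^{(A)}(\lambda, X_n) =
\begin{cases}
0 & \text{if } F^{-1}\!\left(\tfrac{1}{2}\right) \notin \left(F_n^{-1}\!\left(\tfrac{1-\lambda}{2}\right),\ F_n^{-1}\!\left(\tfrac{1+\lambda}{2}\right)\right), \\[4pt]
\min\left\{ F_n\!\left(F^{-1}\!\left(\tfrac{1}{2}\right)\right) - n^{-1}\left\lceil\tfrac{1-\lambda}{2}\,n\right\rceil + n^{-1} I\!\left(F^{-1}\!\left(\tfrac{1}{2}\right) \notin X_n\right),\ n^{-1}\left\lceil\tfrac{1+\lambda}{2}\,n\right\rceil - F_n\!\left(F^{-1}\!\left(\tfrac{1}{2}\right)\right) \right\} & \text{if } F^{-1}\!\left(\tfrac{1}{2}\right) \in \left(F_n^{-1}\!\left(\tfrac{1-\lambda}{2}\right),\ F_n^{-1}\!\left(\tfrac{1+\lambda}{2}\right)\right)
\end{cases}
\ \approx \frac{\lambda}{2}.$$
Remark. Note that $\mathrm{SBP}^{(A)}(\lambda, X_n)$ depends upon the threshold λ and increases as λ increases. For this outlyingness function, the severest swamping breakdown is due to inliers. Note that $\mathrm{SBP}^{(A)}(\lambda, X_n)$ is affected (slightly) by whether a data point equals the population median, an event of probability 0 for F continuous. The case $F^{-1}\!\left(\tfrac{1}{2}\right) \in \left(F_n^{-1}\!\left(\tfrac{1-\lambda}{2}\right), F_n^{-1}\!\left(\tfrac{1+\lambda}{2}\right)\right)$ has probability increasing to 1 as n increases. Note also (from the proof, omitted) that Type A swamping breakdown is attained by replacing observations in such a way that either the $\frac{1-\lambda}{2}$ sample quantile or the $\frac{1+\lambda}{2}$ sample quantile → the population median, i.e., by implosion breakdown of either the $\frac{1-\lambda}{2}$ sample quantile or the $\frac{1+\lambda}{2}$ sample quantile, due to inliers.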
4.4 Results for SBP(B)(γ, Xn)
Proposition 12 We have
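$$\mathrm{SBP}^{(B)}(\gamma, X_n) =
\begin{cases}
0 & \text{if } \left(F^{-1}\!\left(\tfrac{1-\gamma}{2}\right),\ F^{-1}\!\left(\tfrac{1+\gamma}{2}\right)\right] \not\subset (X_{1:n}, X_{n:n}), \\[4pt]
\min\left\{ 1 - F_n\!\left(F^{-1}\!\left(\tfrac{1-\gamma}{2}\right)\right),\ F_n\!\left(F^{-1}\!\left(\tfrac{1-\gamma}{2}\right)\right),\ 1 - F_n\!\left(F^{-1}\!\left(\tfrac{1+\gamma}{2}\right)\right),\ F_n\!\left(F^{-1}\!\left(\tfrac{1+\gamma}{2}\right)\right) \right\} & \text{if } \left(F^{-1}\!\left(\tfrac{1-\gamma}{2}\right),\ F^{-1}\!\left(\tfrac{1+\gamma}{2}\right)\right] \subset (X_{1:n}, X_{n:n})
\end{cases}
\ \approx \frac{1-\gamma}{2}.$$
Remark. We see that $\mathrm{SBP}^{(B)}(\gamma, X_n)$ depends upon the threshold γ and decreases as γ increases, and it is affected by whether the range of the data set includes the $\left(\frac{1-\gamma}{2}\right)$th and $\left(\frac{1+\gamma}{2}\right)$th population quantiles, which event does occur with probability ↑ 1 as n → ∞. Note also (from the proof, omitted) that Type B swamping breakdown is attained by replacing observations in such a way that either the evaluation of the sample cdf at the $\frac{1-\gamma}{2}$ population quantile → 0 or 1, or its evaluation at the $\frac{1+\gamma}{2}$ population quantile → 0 or 1, i.e., by explosion breakdown of either $F_n\!\left(F^{-1}\!\left(\tfrac{1-\gamma}{2}\right)\right)$ or $F_n\!\left(F^{-1}\!\left(\tfrac{1+\gamma}{2}\right)\right)$, due to outliers.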
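The four centered rank results of Propositions 9-12 can be collected in a small Python sketch, which makes the trade-offs in the thresholds easy to inspect. The function names and dictionary keys are hypothetical; the large-sample values are the approximations stated in the propositions, and only the Type A MBP is given in its exact finite-sample form.

```python
from math import ceil

def centered_rank_mbp_a(lam, n):
    """Exact Type A MBP from Proposition 9: ceil((1 - lam) / 2 * n) / n."""
    return ceil((1 - lam) / 2 * n) / n

def centered_rank_limits(lam, gamma):
    """Approximate (large n) values of the four breakdown points."""
    return {
        "MBP_A": (1 - lam) / 2,    # decreases in lam
        "MBP_B": gamma / 2,        # increases in gamma
        "SBP_A": lam / 2,          # increases in lam
        "SBP_B": (1 - gamma) / 2,  # decreases in gamma
    }
```

Within each type the two approximate values sum to 1/2, so raising the threshold trades masking robustness for swamping robustness (Type A) or the reverse (Type B).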
5 On the Robustness of the Boxplot
The sample boxplot may be viewed as an estimator of a population boxplot. Denote the 1st quartile, median, and 3rd quartile of F by $Q_1$, $M$, and $Q_3$ and the sample versions by $\hat{Q}_1$, $\hat{M}$, and $\hat{Q}_3$, respectively. There are two aspects to the boxplot:
a) the box that represents the “middle half”,
b) upper and lower “fences” that mark “outlyingness” thresholds.
The box is based on the quartiles $Q_1$, $M$, and $Q_3$, whereas the fences are defined by $Q_1 - 1.5 \times (Q_3 - Q_1)$ and $Q_3 + 1.5 \times (Q_3 - Q_1)$. Robustness of the sample boxplot is evaluated differently with respect to a) and b).
Regarding a), robustness is characterized simply by the usual RBPs of the relevant sample quantiles. For $\hat{Q}_1$, $\hat{M}$, and $\hat{Q}_3$, these RBPs are 1/4, 1/2, and 1/4, respectively, highly favorable values which strongly justify the sample “middle half” as a robust estimator of the population “middle half”.
Turning to b), we first note that, although it might appear otherwise at first glance, the fences involve an outlyingness function that is not quantile-based like the centered rank outlyingness, but rather is of scaled deviation type:
$$O(x, F) =
\begin{cases}
\dfrac{Q_1 - x}{Q_3 - Q_1}, & x < Q_1 \\[6pt]
0, & Q_1 \le x \le Q_3 \\[6pt]
\dfrac{x - Q_3}{Q_3 - Q_1}, & x > Q_3.
\end{cases}$$
This is because $Q_1$ and $Q_3$ are measures of location, and $Q_3 - Q_1$ (the IQR) measures spread. In terms of this outlyingness function, the boxplot’s outlier region is based on threshold η = 1.5:
out(1.5, F) = {x : O(x, F) > 1.5},
with sample version OR(1.5, Xn). Let us now study the masking and swamping robustness of this outlier identifier, by evaluating MBP(A), MBP(B), SBP(A), and SBP(B) for the boxplot’s form of scaled deviation outlyingness, using the results of Propositions 5-8. It is straightforward to obtain
• MBP(A) = MBP(B) = 1/4, with both Type A and Type B masking breakdown occurring by taking either $\hat{Q}_1$ to −∞ or $\hat{Q}_3$ to +∞, i.e., due to replacement of 25% of the observations by extreme outliers in the same direction.
• SBP(A) = SBP(B) = 1/2, with Type A swamping breakdown occurring by taking both $\hat{Q}_1$ and $\hat{Q}_3$ either to $Q_1$ or to $Q_3$, and with Type B swamping breakdown occurring by making $\hat{Q}_1$ and $\hat{Q}_3$ coincide, i.e., due to replacement of the middle 50% of the observations by inliers at any single point (which must be either $Q_1$ or $Q_3$ in the case of Type A swamping breakdown).
Thus masking breakdown of the boxplot outlier identifier occurs only if at least 25% of the sample consists of extreme outliers in a single direction. This 25% MBP represents a rather satisfactory degree of robustness for most situations. If higher MBP is desired using a nonparametric outlier identifier, then one alternatively can use scaled deviation outlyingness with Median and MAD, achieving 50% MBP. It also is of interest that a lesser fraction of extreme outliers can be present without any masking effects whatsoever.
The boxplot is extremely strong with respect to swamping breakdown, with its 50% SBP. Further, the situations which would actually produce swamping breakdown of the boxplot are highly unlikely in reality. However, with a lesser fraction of inner replacements, which has some realistic possibility, we indeed can have noticeable swamping effects although without complete breakdown.
We illustrate masking and swamping effects for the boxplot by the following simple experiment:
• A “population” is created by taking a sample of size 500 from standard normal. This population has quartiles $Q_1 = -0.69$, $M = 0.02$, and $Q_3 = 0.67$.
• A sample of size 100 is taken without replacement from the “population”. This sample has quartiles $\hat{Q}_1 = -0.71$, $\hat{M} = -0.04$, and $\hat{Q}_3 = 0.41$. The sample boxplot represents an “estimator” of the population boxplot, both of the population “middle half” and the population “outlier region”.
• Eight modified samples are produced by replacing observations accordingto the following scenarios:
– replace the 10 uppermost observations by the outlying value “5”
– replace the 27 uppermost observations by the outlying value “5”
– replace 10 inner observations by the population 3rd quartile Q3
– replace 27 inner observations by the population 3rd quartile Q3
– replace 54 inner observations by the population 3rd quartile Q3
– replace 10 inner observations by the population median M
– replace 27 inner observations by the population median M
– replace 54 inner observations by the population median M
To determine the masking and swamping effects associated with these eight replacement scenarios, we compare the respective upper outlyingness threshold (boxplot fence) for each modified sample with the population outlyingness threshold Q3 + 1.5(Q3 − Q1) = 2.71. The following table shows the respective upper outlyingness thresholds and their corresponding upper masking and swamping effects. (If an outlyingness threshold is greater than 2.71, then some population outliers are being masked. If it is less than 2.71, then some population nonoutliers are being swamped.)
                                    Boxplot Fence   Masking   Swamping
All Figures
  Population                        2.71
  Sample                            2.09            None      Light
Figure 1
  Outliers, 10 Replacements         2.09            None      Light
  Outliers, 27 Replacements         13.57           Heavy     None
Figure 2
  Inliers at Q3, 10 Replacements    2.74            Light     None
  Inliers at Q3, 27 Replacements    2.74            Light     None
  Inliers at Q3, 54 Replacements    0.96            None      Heavy
Figure 3
  Inliers at M, 10 Replacements     2.04            None      Light
  Inliers at M, 27 Replacements     1.12            None      Moderate
  Inliers at M, 54 Replacements     0.08            None      Heavy
We also illustrate the foregoing in Figures 1-3.
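The outlier-replacement part of the experiment can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the seed is arbitrary, the quartile rule is the "inclusive" interpolation of `statistics.quantiles`, and `replace_uppermost` is a hypothetical helper, so the fences it prints will not reproduce the values 2.09 and 13.57 in the table. The inlier scenarios are omitted because the text does not specify which "inner" observations were replaced.

```python
import random
import statistics


def upper_fence(xs):
    """Upper boxplot fence Q3 + 1.5 (Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def replace_uppermost(xs, k, value):
    """Return a sorted copy of xs with its k largest observations replaced by value."""
    ordered = sorted(xs)
    return ordered[: len(ordered) - k] + [value] * k


rng = random.Random(2013)  # arbitrary seed
population = [rng.gauss(0.0, 1.0) for _ in range(500)]
sample = rng.sample(population, 100)  # drawn without replacement

fence_sample = upper_fence(sample)
fence_10 = upper_fence(replace_uppermost(sample, 10, 5.0))
fence_27 = upper_fence(replace_uppermost(sample, 27, 5.0))

print(f"sample fence:          {fence_sample:.2f}")
print(f"10 uppermost set to 5: {fence_10:.2f}")  # quartiles untouched, fence unchanged
print(f"27 uppermost set to 5: {fence_27:.2f}")  # Q3 becomes 5, fence explodes
```

Replacing 10 points leaves the order statistics that define Q1 and Q3 untouched, whereas replacing 27 points (more than 25% of the sample) moves Q3 to the outlying value itself, which is the masking breakdown mechanism described above.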
Figure 1: Masking Effects Due to Outliers. Boxplots for: population; sample; sample with 10 uppermost points replaced by value 5; sample with 27 uppermost points replaced by value 5.
Comments on Figure 1. The sample boxplot does not perfectly estimate the population boxplot. Rather, the sample 3rd quartile (0.41) is noticeably below the population Q3 (0.67), resulting in the sample boxplot outlyingness threshold of 2.09 versus the population value 2.71, which produces swamping of population nonoutliers between 2.09 and 2.71 as sample outliers. Note that there is no masking. These effects represent the “luck of the draw” in the case of no contamination. Contamination produced by replacing the 10 uppermost sample values by the extreme outlier “5” yields no change in the masking and swamping effects. However, 27 replacements (exceeding the MBP threshold of 25%) do produce masking breakdown. The new sample outlyingness threshold of 13.57 produces extreme upper masking (but no longer is there any upper swamping).
Figure 2: Swamping Effects Due to Inliers Placed at Q3. Boxplots for: population; sample; sample with 10 inner points replaced by value Q3; sample with 27 inner points replaced by value Q3; sample with 54 inner points replaced by value Q3.
Comments on Figure 2. As noted above, the sample itself produces swamping of population nonoutliers between 2.09 and 2.71 as sample outliers, but there is no masking. Contamination produced by replacing 10 inner sample values by the population Q3 yields no change in these masking and swamping effects, nor does replacement of 27 inner sample values by the population Q3 (although now the sample median shifts upward). However, 54 replacements (exceeding the SBP threshold of 50%) do produce swamping breakdown. The new sample outlyingness threshold of 0.96 produces extreme upper swamping (but no longer is there any upper masking).
Figure 3: Swamping Effects Due to Inliers Placed at Median M. Boxplots for: population; sample; sample with 10 inner points replaced by value M; sample with 27 inner points replaced by value M; sample with 54 inner points replaced by value M.
Comments on Figure 3. As noted already, the sample itself produces swamping of population nonoutliers between 2.09 and 2.71 as sample outliers, but there is no masking. Contamination produced by replacing 10 inner sample values by the population median M yields no change in these masking and swamping effects. However, replacement of 27 inner sample values by the population M yields a new sample outlyingness threshold of 1.12, which does produce significant swamping effects although not total breakdown (and no longer is there any upper masking). And, of course, 54 replacements (exceeding the SBP threshold of 50%) by value M yields a new sample outlyingness threshold of 0.08 and complete swamping breakdown (but no longer is there any upper masking).
Summary. The boxplot is highly robust, with 25% RBP for the 1st and 3rd quartiles, 50% RBP for the Median, 25% MBP, and 50% SBP. The favorable MBP and SBP are due to the boxplot employing scaled deviation outlyingness as its outlier identification component.
If instead the boxplot were to use a quantile-based threshold, extending its use of quantiles for the “middle half” description, then a natural choice for the upper fence would be the quantile of standard normal associated with Q3 + 1.5(Q3 − Q1) = 4Q3 = 2.70, namely the 0.9965 quantile with upper tail probability 0.0035. But this would now be using centered rank outlyingness with (as shown in the Discussion section below) corresponding thresholds
λ = γ = 1− 2× 0.0035 = 0.993,
and with associated masking and swamping robustness measures MBP(A) = (1 − λ)/2 = 0.0035, MBP(B) = γ/2 = 0.497, SBP(A) = λ/2 = 0.497, and SBP(B) = (1 − γ)/2 = 0.0035. One of the two MBP measures is very high but the other too low, and the same holds for the two SBP measures.
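The standard normal figures quoted above (fence 4Q3 ≈ 2.70, upper tail probability ≈ 0.0035, threshold ≈ 0.993) can be checked with a short computation. This is only a numerical sanity check using Python's standard library; the variable names are illustrative and not from the paper.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

q3 = std_normal.inv_cdf(0.75)  # third quartile of the standard normal
q1 = -q3                       # first quartile, by symmetry
fence = q3 + 1.5 * (q3 - q1)   # equals 4 * q3

upper_tail = 1.0 - std_normal.cdf(fence)  # probability mass above the fence
lam = 1.0 - 2.0 * upper_tail              # centered rank level |2F(x) - 1| at the fence

print(f"Q3 = {q3:.4f}, fence = {fence:.3f}")
print(f"upper tail = {upper_tail:.4f}, lambda = gamma = {lam:.3f}")
```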
Thanks to John Tukey’s ingenuity and profound understanding, the boxplot combines the best features of quantile-based description and scaled deviation outlyingness. □
6 Discussion
Let us recall the interpretations associated with our four robustness measures:
I Suitable replacement of a fraction MBP(A)(λ, Xn) of the data Xn permits F outliers at arbitrarily extreme levels γ ↑ 1 to be masked as sample nonoutliers at level λ.
II Suitable replacement of a fraction MBP(B)(γ, Xn) of the data Xn permits γ outliers of F to be masked as sample nonoutliers at arbitrarily central levels $\lambda \downarrow O^{*}_{n}$.
III Suitable replacement of a fraction SBP(A)(λ, Xn) of the data Xn permits arbitrarily central nonoutliers of F at levels γ ↓ 0 to be swamped as sample outliers at level λ.
IV Suitable replacement of a fraction SBP(B)(γ, Xn) of the data Xn permits γ nonoutliers of F to be swamped as sample outliers at arbitrarily extreme levels $\lambda \uparrow O^{**}_{n}$.
The following table compares three scaled deviation outlyingness functions and the centered rank outlyingness function with respect to all four of MBP(A)(λ, Xn), MBP(B)(γ, Xn), SBP(A)(λ, Xn), and SBP(B)(γ, Xn), in terms of their limits as n increases.
Robustness Measure   Mean, SD   Med, MAD   α-trims, α < 1/2                    Centered Rank
MBP(A)(λ, Xn)        0          1/2        α                                   (1 − λ)/2
MBP(B)(γ, Xn)        0          1/2        α                                   γ/2
SBP(A)(λ, Xn)        1          1/2        [λ²/(λ² + (1 − λ)²)](1 − 2α) + α    λ/2
SBP(B)(γ, Xn)        1          1/2        1 − 2α                              (1 − γ)/2
The scaled deviation outlyingness function with (Mean, SD) is very robust with respect to swamping, attaining the highest possible value of 1 for SBP, but it achieves this at too great a cost with respect to masking robustness (MBP = 0), these values holding regardless of the choices of λ and γ. Clearly, this version of scaled deviation outlyingness should be dropped from consideration (as already is well known).
Comparison of (Median, MAD) and (Trimmed Mean, Trimmed SD).
(i) The use of (Med, MAD) in scaled deviation outlyingness results in a balancing of MBP and SBP equally at the value 1/2, regardless of the choices of λ and γ, yielding a very desirable version of scaled deviation outlyingness.
(ii) On the other hand, use of the α-trimmed Mean and SD permits trade-offs giving moderate priority to SBP without oversacrificing with respect to MBP. For example, with α = 1/3 and λ ≈ 1, we have MBP(A) = MBP(B) = SBP(B) = 1/3, with SBP(A) = 2/3. Likewise, with α = 1/4 and λ ≈ 1, we have MBP(A) = MBP(B) = 1/4, SBP(A) = 3/4, and SBP(B) = 1/2. We note, however, that the α-trimmed versions do not accommodate prioritizing MBP over SBP.
(iii) If, however, we allow α ↑ 1/2, then SBP(B) ≈ 0, even though (for any λ) MBP(A) = MBP(B) = SBP(A) ≈ 1/2, the poor showing of SBP(B) being a consequence of the α-trimmed versions being vulnerable to inliers.
(iv) If replacements are confined to just outliers, then SBP(B) becomes 1 − α instead of 1 − 2α, and we obtain MBP(A) = MBP(B) = SBP(A) = SBP(B) ≈ 1/2 as α ↑ 1/2. However, in the latter case, while the trimmed mean then becomes the Median, the trimmed SD does not become the MAD but rather becomes ≈ 0. Of course, for some data sets the MAD also can become 0, but in such cases modified versions of MAD are used (see Zuo, 2003, and Serfling and Mazumder, 2009).
(v) In sum, at moderate levels of α, the α-trimmed approach offers alternatives to (Med, MAD) allowing some prioritization of SBP over MBP. However, at level α ≈ 1/2, the α-trimmed approach is not an attractive alternative. □
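The worked values in (ii) follow directly from the limiting expressions in the α-trims column of the table. A minimal sketch of that arithmetic is given below; the function name `trimmed_limits` and the dictionary keys are illustrative assumptions, and λ = 1 is used as the limiting value for "λ ≈ 1".

```python
from fractions import Fraction


def trimmed_limits(alpha, lam):
    """Limiting breakdown points for alpha-trimmed scaled deviation outlyingness.

    Uses the expressions in the alpha-trims column of the table:
    MBP(A) = MBP(B) = alpha,
    SBP(A) = [lam^2 / (lam^2 + (1 - lam)^2)] (1 - 2 alpha) + alpha,
    SBP(B) = 1 - 2 alpha.
    """
    alpha, lam = Fraction(alpha), Fraction(lam)
    weight = lam**2 / (lam**2 + (1 - lam) ** 2)
    return {
        "MBP_A": alpha,
        "MBP_B": alpha,
        "SBP_A": weight * (1 - 2 * alpha) + alpha,
        "SBP_B": 1 - 2 * alpha,
    }


third = trimmed_limits(Fraction(1, 3), 1)    # alpha = 1/3, lambda at its limit 1
quarter = trimmed_limits(Fraction(1, 4), 1)  # alpha = 1/4, lambda at its limit 1

for name, values in (("alpha = 1/3", third), ("alpha = 1/4", quarter)):
    print(name, {key: str(val) for key, val in values.items()})
```

Exact rational arithmetic is used so that the outputs can be compared directly with the fractions quoted in the text.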
For the centered rank outlyingness function, the MBPs and SBPs depend explicitly on the thresholds γ and λ. This requires us to think carefully about our choices of these thresholds. In general, the sample outlier region OR(λ, Xn) estimates a population analogue, namely the outlier region out(λ, F). Thus γ should reflect some relatively high outlyingness level of interest in the F distribution and determine the same value for λ. For the centered rank outlyingness, it is straightforward to select γ using
|2F (x)− 1| = γ.
In this case the “contour” of equal outlyingness at level γ consists of the two points $F^{-1}\left(\frac{1-\gamma}{2}\right)$ and $F^{-1}\left(\frac{1+\gamma}{2}\right)$. Then, if γ should demark the outlyingness contour enclosing a proportion p of the distribution, F(x) should be taken equal to (1 − p)/2 or (1 + p)/2, in either case yielding
λ = γ = p.
In particular, for a typical choice such as p = 0.9, we then obtain MBP(A) = 0.05, MBP(B) = 0.45, SBP(A) = 0.45, and SBP(B) = 0.05. One of these two MBP measures is high, the other low, and the same holds for the two SBP measures.
We can make all four values equal by choosing λ = γ = 1/2, which are the levels corresponding to the 1st and 3rd quartiles of F and Fn, in which case MBP(A) = MBP(B) = SBP(A) = SBP(B) = 1/4. However, the 1st and 3rd quartiles are associated with description of the center of the distribution and are not typical outlyingness thresholds.
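The two numerical cases just discussed (p = 0.9 and λ = γ = 1/2) can be reproduced from the Centered Rank column of the table. The sketch below is illustrative only; the function name `centered_rank_limits` and the dictionary keys are assumptions introduced for the example.

```python
from fractions import Fraction


def centered_rank_limits(lam, gamma):
    """Limiting breakdown points for centered rank outlyingness.

    Uses the Centered Rank column of the table:
    MBP(A) = (1 - lam)/2, MBP(B) = gamma/2, SBP(A) = lam/2, SBP(B) = (1 - gamma)/2.
    """
    lam, gamma = Fraction(lam), Fraction(gamma)
    return {
        "MBP_A": (1 - lam) / 2,
        "MBP_B": gamma / 2,
        "SBP_A": lam / 2,
        "SBP_B": (1 - gamma) / 2,
    }


p = Fraction(9, 10)
typical = centered_rank_limits(p, p)                            # lambda = gamma = p = 0.9
balanced = centered_rank_limits(Fraction(1, 2), Fraction(1, 2))  # quartile levels

print({key: float(val) for key, val in typical.items()})
print({key: float(val) for key, val in balanced.items()})
```

The first call gives the unbalanced pattern (0.05, 0.45, 0.45, 0.05), and the second gives the common value 1/4 attained only at the quartile levels.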
Summary. For equal priority on MBP and SBP, the scaled deviation outlyingness with (Med, MAD) provides the best common value of 1/2. For moderate prioritization of SBP over MBP if desired, the α-trimmed versions are attractive. The centered rank outlyingness function has great descriptive appeal because it is closely associated with the quantile function, but its sample versions are not sufficiently robust at more extreme outlyingness thresholds. □
Acknowledgements
The authors gratefully acknowledge useful input from G. L. Thompson, Xin Dang, Satyaki Mazumder, Bo Hong, and Seoweon Jin. Also, support under National Science Foundation Grant DMS-1106691 is sincerely acknowledged.
Appendix
Two lemmas from Serfling and Wang (2012) are helpful in evaluating the RBPs of “inf” and “sup” statistics. The first treats breakdown of a statistic $S(X_n)$ when it is either the minimum or the maximum of certain other statistics $T_1(X_n), \ldots, T_J(X_n)$. Let $k^{(0)}_{\mathrm{exp}}, k^{(1)}_{\mathrm{exp}}, \ldots, k^{(J)}_{\mathrm{exp}}$ be the minimal numbers of data points which must be replaced in order to cause explosion breakdown of the respective statistics $S$ and $T_1, \ldots, T_J$, and let $k^{(0)}_{\mathrm{imp}}, k^{(1)}_{\mathrm{imp}}, \ldots, k^{(J)}_{\mathrm{imp}}$ be their counterparts for implosion breakdown. Also, let
\[
T^{*} = \min\{T^{*}_{1}, \ldots, T^{*}_{J}\}, \qquad T^{**} = \max\{T^{**}_{1}, \ldots, T^{**}_{J}\}.
\]
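Lemma 13 (i) Let $S = \min\{T_1, \ldots, T_J\}$. Then
\[
\min\{k^{(1)}_{\mathrm{imp}}, \ldots, k^{(J)}_{\mathrm{imp}}\} \le k^{(0)}_{\mathrm{imp}} \le \max\{k^{(1)}_{\mathrm{imp}}, \ldots, k^{(J)}_{\mathrm{imp}}\}. \tag{17}
\]
(ii) Let $S = \max\{T_1, \ldots, T_J\}$. Then
\[
\min\{k^{(1)}_{\mathrm{exp}}, \ldots, k^{(J)}_{\mathrm{exp}}\} \le k^{(0)}_{\mathrm{exp}} \le \max\{k^{(1)}_{\mathrm{exp}}, \ldots, k^{(J)}_{\mathrm{exp}}\}. \tag{18}
\]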
The next lemma treats breakdown of a statistic $S(X_n)$ when the event of breakdown due to $k$ replacements is related to the possible occurrences of certain events $E_1, \ldots, E_J$ as a consequence of $k$ replacements. Let $k_S$ be the minimal number of data points which must be replaced in order to cause breakdown (either implosion or explosion) of $S$, and let $k_1, \ldots, k_J$ be the minimal numbers of data points which must be replaced in order to cause occurrence of the respective events $E_1, \ldots, E_J$. It is assumed that $k_S$ and $k_1, \ldots, k_J$ are well-defined and belong to $\{1, 2, \ldots, n\}$.
Lemma 14 (i) If breakdown of $S$ is implied by occurrence of each one of the events $E_1, \ldots, E_J$, then
\[
k_S \le \min\{k_1, \ldots, k_J\}. \tag{19}
\]
(ii) If breakdown of $S$ implies occurrence of at least one of the events $E_1, \ldots, E_J$, then
\[
k_S \ge \min\{k_1, \ldots, k_J\}. \tag{20}
\]
(iii) If breakdown of $S$ is implied by occurrence of each one of the events $E_1, \ldots, E_J$ and also implies that at least one of $E_1, \ldots, E_J$ must occur, then
\[
k_S = \min\{k_1, \ldots, k_J\}. \tag{21}
\]
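Proof of Proposition 5. Using $\mathrm{OR}(\lambda, X_n) = [\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma]$, it follows that
\[
\begin{aligned}
\sup_{y \in \mathrm{OR}(\lambda, X_n)} O(y, F)
&= \begin{cases}
O(\hat\mu + \beta\hat\sigma, F) & \text{if } \mu(F) \le \hat\mu - \beta\hat\sigma, \\
O(\hat\mu - \beta\hat\sigma, F) & \text{if } \mu(F) \ge \hat\mu + \beta\hat\sigma, \\
\max\{O(\hat\mu + \beta\hat\sigma, F),\, O(\hat\mu - \beta\hat\sigma, F)\} & \text{otherwise}
\end{cases} \\
&= \max\{O(\hat\mu + \beta\hat\sigma, F),\, O(\hat\mu - \beta\hat\sigma, F)\} \quad \text{(in all cases)}.
\end{aligned} \tag{22}
\]
It then follows from Lemma 13(ii) that
\[
\begin{aligned}
\min\{\mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu + \beta\hat\sigma, F)),\, \mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu - \beta\hat\sigma, F))\}
&\le \mathrm{MBP}^{(A)}(\lambda, X_n) \\
&\le \max\{\mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu + \beta\hat\sigma, F)),\, \mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu - \beta\hat\sigma, F))\}.
\end{aligned} \tag{23}
\]
We next evaluate $\mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu + \beta\hat\sigma, F))$ and $\mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu - \beta\hat\sigma, F))$. It is readily checked that these are equal, respectively, to $\mathrm{RBP}_{\mathrm{exp}}(|\hat\mu + \beta\hat\sigma|)$ and $\mathrm{RBP}_{\mathrm{exp}}(|\hat\mu - \beta\hat\sigma|)$. Under Assumption A, $\mathrm{RBP}_{\mathrm{exp}}(|\hat\mu + \beta\hat\sigma|)$ and $\mathrm{RBP}_{\mathrm{exp}}(|\hat\mu - \beta\hat\sigma|)$ are equal, and we have
\[
\mathrm{MBP}^{(A)}(\lambda, X_n) = \mathrm{RBP}_{\mathrm{exp}}(|\hat\mu + \beta\hat\sigma|). \tag{24}
\]
Here we are using $\mathrm{RBP}_{\mathrm{exp}}$ with $T^{**} = \infty$. We now evaluate $\mathrm{RBP}_{\mathrm{exp}}(|\hat\mu + \beta\hat\sigma|)$, for which we apply Lemma 14 with some choice of $k$ and the events $S$, $E_1$, $E_2$, and $E_3$, where
\[
\begin{aligned}
S &= \{\exists \{X_{n,k}\} \text{ such that } |\mu(X_{n,k}) + \beta\sigma(X_{n,k})| \to \infty\}, \\
E_1 &= \{\exists \{X_{n,k}\} \text{ such that } \mu(X_{n,k}) \to +\infty\}, \\
E_2 &= \{\exists \{X_{n,k}\} \text{ such that } |\mu(X_{n,k})| \text{ is bounded and } \sigma(X_{n,k}) \to \infty\}, \\
E_3 &= \{\exists \{X_{n,k}\} \text{ such that } \mu(X_{n,k}) \to -\infty \text{ and } |\mu(X_{n,k}) + \beta\sigma(X_{n,k})| \to \infty\}.
\end{aligned}
\]
Note that (with $k$ fixed) each of $E_1$, $E_2$, $E_3$ implies $S$, and $S$ implies $E_1 \cup E_2 \cup E_3$. Then (21) yields
\[
\mathrm{RBP}_{\mathrm{exp}}(O(\hat\mu + \beta\hat\sigma, F)) = \mathrm{RBP}_{\mathrm{exp}}(|\hat\mu + \beta\hat\sigma|) = n^{-1} \min\{k_1, k_2, k_3\}, \tag{25}
\]
where $k_1, k_2, k_3$ are the minimal values of $k$, respectively, for occurrence of $E_1$, $E_2$, $E_3$. Finally, we note that, under Assumption A, $k_3 \ge k_{\mathrm{exp}}(\hat\mu) = k_1$.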
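Proof of Proposition 6. Using $\mathrm{out}(\gamma, F) = [\mu - \eta\sigma, \mu + \eta\sigma]$, it follows that (interpreting 0/0 as 0)
\[
\begin{aligned}
\inf_{y \in \mathrm{out}(\gamma, F)} O(y, X_n)
&= \inf_{y \in \mathrm{out}(\gamma, F)} \left| \frac{y - \hat\mu}{\hat\sigma} \right| \\
&= \begin{cases}
0 & \text{if } \hat\mu \notin (\mu - \eta\sigma, \mu + \eta\sigma), \\
\min\left\{ \left| \frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma} \right|, \left| \frac{\mu - \eta\sigma - \hat\mu}{\hat\sigma} \right| \right\} & \text{if } \hat\mu \in (\mu - \eta\sigma, \mu + \eta\sigma)
\end{cases} \\
&= \min\left\{ \left| \frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma} \right| I(\hat\mu \in (\mu - \eta\sigma, \mu + \eta\sigma)),\; \left| \frac{\mu - \eta\sigma - \hat\mu}{\hat\sigma} \right| I(\hat\mu \in (\mu - \eta\sigma, \mu + \eta\sigma)) \right\},
\end{aligned}
\]
where $I(A)$ denotes the indicator of the event $A$. It then follows from Lemma 13(i) that
\[
\begin{aligned}
&\min\left\{ \mathrm{RBP}_{\mathrm{imp}}\left( \left| \frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma} \right| I(\hat\mu \in (\mu - \eta\sigma, \mu + \eta\sigma)) \right),\; \mathrm{RBP}_{\mathrm{imp}}\left( \left| \frac{\mu - \eta\sigma - \hat\mu}{\hat\sigma} \right| I(\hat\mu \in (\mu - \eta\sigma, \mu + \eta\sigma)) \right) \right\} \\
&\qquad \le \mathrm{MBP}^{(B)}(\gamma, X_n) \\
&\qquad \le \max\left\{ \mathrm{RBP}_{\mathrm{imp}}\left( \left| \frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma} \right| I(\hat\mu \in (\mu - \eta\sigma, \mu + \eta\sigma)) \right),\; \mathrm{RBP}_{\mathrm{imp}}\left( \left| \frac{\mu - \eta\sigma - \hat\mu}{\hat\sigma} \right| I(\hat\mu \in (\mu - \eta\sigma, \mu + \eta\sigma)) \right) \right\}.
\end{aligned} \tag{26}
\]
Here we use $\mathrm{RBP}_{\mathrm{imp}}$ with $T^{*} = 0$. It is immediately evident from (26) that
\[
\mathrm{MBP}^{(B)}(\gamma, X_n) = 0 \quad \text{if } \mu(X_n) \notin (\mu - \eta\sigma, \mu + \eta\sigma). \tag{27}
\]
We now treat the case that $\mu(X_n) \in (\mu - \eta\sigma, \mu + \eta\sigma)$ and evaluate
\[
\mathrm{RBP}_{\mathrm{imp}}\left( \left| \frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma} \right| I(\hat\mu \in (\mu - \eta\sigma, \mu + \eta\sigma)) \right)
\]
using Lemma 14 with some choice of $k$ and the events $S$, $E_1$ and $E_2$, where
\[
\begin{aligned}
S &= \left\{ \exists \{X_{n,k}\} \text{ such that } \left| \frac{\mu + \eta\sigma - \mu(X_{n,k})}{\sigma(X_{n,k})} \right| I(\mu(X_{n,k}) \in (\mu - \eta\sigma, \mu + \eta\sigma)) \to 0 \right\}, \\
E_1 &= \{\exists \{X_{n,k}\} \text{ such that } \mu(X_{n,k}) \notin (\mu - \eta\sigma, \mu + \eta\sigma)\}, \\
E_2 &= \{\exists \{X_{n,k}\} \text{ such that } \mu(X_{n,k}) \in (\mu - \eta\sigma, \mu + \eta\sigma) \text{ holds and } \sigma(X_{n,k}) \to \infty\}.
\end{aligned}
\]
Note that (with $k$ fixed) each of $E_1$ and $E_2$ implies $S$, and $S$ implies $E_1 \cup E_2$. Then (21) yields
\[
\mathrm{RBP}_{\mathrm{imp}}\left( \left| \frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma} \right| I(\hat\mu \in (\mu - \eta\sigma, \mu + \eta\sigma)) \right) = n^{-1} \min\{k_1, k_2\}, \tag{28}
\]
where $k_1, k_2$ are the minimal values of $k$, respectively, for occurrence of $E_1$, $E_2$. Similarly, with a modified $S$ but the same $E_1$, $E_2$ and the same $k_1$, $k_2$, we obtain
\[
\mathrm{RBP}_{\mathrm{imp}}\left( \left| \frac{\mu - \eta\sigma - \hat\mu}{\hat\sigma} \right| I(\hat\mu \in (\mu - \eta\sigma, \mu + \eta\sigma)) \right) = n^{-1} \min\{k_1, k_2\}. \tag{29}
\]
In particular, note that, again using $\mathrm{RBP}_{\mathrm{exp}}$ with $T^{**} = \infty$,
\[
\begin{aligned}
k_1/n &= \mathrm{RBP}_{\mathrm{exp}}(\mu(X_n)), \\
k_2/n &= \mathrm{RBP}_{\mathrm{exp}}(\sigma(X_n) \mid \mu(X_n) \text{ bounded}).
\end{aligned}
\]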
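Proof of Proposition 7. Using $\mathrm{OR}(\lambda, X_n) = [\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma]$, it follows that (interpreting 0/0 as 0)
\[
\begin{aligned}
\inf_{y \in \mathrm{OR}(\lambda, X_n)} O(y, F)
&= \inf_{y \in \mathrm{OR}(\lambda, X_n)} \left| \frac{y - \mu(F)}{\sigma(F)} \right| \\
&= \begin{cases}
0 & \text{if } \mu \notin (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma), \\
\min\left\{ \left| \frac{\hat\mu + \beta\hat\sigma - \mu}{\sigma} \right|, \left| \frac{\hat\mu - \beta\hat\sigma - \mu}{\sigma} \right| \right\} & \text{if } \mu \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma)
\end{cases} \\
&= \min\left\{ \left| \frac{\hat\mu + \beta\hat\sigma - \mu}{\sigma} \right| I(\mu \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma)),\; \left| \frac{\hat\mu - \beta\hat\sigma - \mu}{\sigma} \right| I(\mu \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma)) \right\},
\end{aligned}
\]
where again $I(A)$ denotes the indicator of the event $A$. It then follows from Lemma 13(i) that
\[
\begin{aligned}
&\min\{ \mathrm{RBP}_{\mathrm{imp}}(|\hat\mu + \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma))),\; \mathrm{RBP}_{\mathrm{imp}}(|\hat\mu - \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma))) \} \\
&\qquad \le \mathrm{SBP}^{(A)}(\lambda, X_n) \\
&\qquad \le \max\{ \mathrm{RBP}_{\mathrm{imp}}(|\hat\mu + \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma))),\; \mathrm{RBP}_{\mathrm{imp}}(|\hat\mu - \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma))) \}.
\end{aligned} \tag{30}
\]
Under Assumption B,
\[
\mathrm{RBP}_{\mathrm{imp}}(|\hat\mu + \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma))) = \mathrm{RBP}_{\mathrm{imp}}(|\hat\mu - \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma)))
\]
and via (30) we have
\[
\mathrm{SBP}^{(A)}(\lambda, X_n) = \mathrm{RBP}_{\mathrm{imp}}(|\hat\mu + \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma))). \tag{31}
\]
It is immediately evident that
\[
\mathrm{SBP}^{(A)}(\lambda, X_n) = 0 \quad \text{if } \mu(F) \notin (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma). \tag{32}
\]
We now treat the case that $\mu(F) \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma)$ and (under this assumption) evaluate
\[
\mathrm{RBP}_{\mathrm{imp}}(|\hat\mu + \beta\hat\sigma - \mu|\, I(\mu \in (\hat\mu - \beta\hat\sigma, \hat\mu + \beta\hat\sigma)))
\]
using Lemma 14 with some choice of $k$ and the events $S$, $E_1$, $E_2$, and $E_3$, where
\[
\begin{aligned}
S &= \{\exists \{X_{n,k}\} \text{ such that } |\mu(X_{n,k}) + \beta\sigma(X_{n,k}) - \mu|\, I(\mu \in (\mu(X_{n,k}) - \beta\sigma(X_{n,k}), \mu(X_{n,k}) + \beta\sigma(X_{n,k}))) \to 0\}, \\
E_1 &= \{\exists \{X_{n,k}\} \text{ such that } \mu(X_{n,k}) - \beta\sigma(X_{n,k}) \ge \mu\}, \\
E_2 &= \{\exists \{X_{n,k}\} \text{ such that } \mu(X_{n,k}) + \beta\sigma(X_{n,k}) \le \mu\}, \\
E_3 &= \{\exists \{X_{n,k}\} \text{ such that } \mu \in (\mu(X_{n,k}) - \beta\sigma(X_{n,k}), \mu(X_{n,k}) + \beta\sigma(X_{n,k})) \text{ and } \mu(X_{n,k}) + \beta\sigma(X_{n,k}) \to \mu\}.
\end{aligned}
\]
Note that (with $k$ fixed) each of $E_1$, $E_2$, $E_3$ implies $S$, and $S$ implies $E_1 \cup E_2 \cup E_3$. It is of interest that $E_1$ and $E_2$ are associated with the presence of outliers, whereas $E_3$ is associated with the presence of inliers. Here we use $\mathrm{RBP}_{\mathrm{imp}}$ with $T^{*} = 0$ and $\mathrm{RBP}_{\mathrm{exp}}$ with $T^{**} = \infty$.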
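Proof of Proposition 8. Under Assumption C,
\[
\mathrm{RBP}_{\mathrm{exp}}\left( \left| \frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma} \right| \right) = \mathrm{RBP}_{\mathrm{exp}}\left( \left| \frac{\mu - \eta\sigma - \hat\mu}{\hat\sigma} \right| \right)
\]
and, by steps similar to those in the foregoing treatments, we obtain
\[
\mathrm{SBP}^{(B)}(\gamma, X_n) = \mathrm{RBP}_{\mathrm{exp}}\left( \left| \frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma} \right| \right). \tag{33}
\]
We now use Lemma 14 with some choice of $k$ and the events $S$, $E_1$, and $E_2$, where
\[
\begin{aligned}
S &= \left\{ \exists \{X_{n,k}\} \text{ such that } \left| \frac{\mu + \eta\sigma - \hat\mu}{\hat\sigma} \right| \to \infty \right\}, \\
E_1 &= \{\exists \{X_{n,k}\} \text{ such that } |\mu(X_{n,k})| \to \infty \text{ with } \sigma(X_{n,k}) = o(|\mu(X_{n,k})|)\}, \\
E_2 &= \{\exists \{X_{n,k}\} \text{ such that } |\mu(X_{n,k})| \text{ is bounded with } \sigma(X_{n,k}) \to 0\}.
\end{aligned}
\]
Note that (with $k$ fixed) each of $E_1$, $E_2$ implies $S$, and $S$ implies $E_1 \cup E_2$. Also, $E_1$ is associated with the presence of outliers, whereas $E_2$ is associated with the presence of inliers.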
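Proof of Proposition 9. We have
\[
\sup_{y \in \mathrm{OR}(\lambda, X_n)} O(y, F) = \sup_{y \in \mathrm{OR}(\lambda, X_n)} |2F(y) - 1|
= \max\left\{ \left| 2F\left( F_n^{-1}\left( \tfrac{1-\lambda}{2} \right) \right) - 1 \right|,\; \left| 2F\left( F_n^{-1}\left( \tfrac{1+\lambda}{2} \right) \right) - 1 \right| \right\}. \tag{34}
\]
It then follows from Lemma 13(ii) that
\[
\begin{aligned}
&\min\left\{ \mathrm{RBP}_{\mathrm{exp}}\left( \left| 2F\left( F_n^{-1}\left( \tfrac{1-\lambda}{2} \right) \right) - 1 \right| \right),\; \mathrm{RBP}_{\mathrm{exp}}\left( \left| 2F\left( F_n^{-1}\left( \tfrac{1+\lambda}{2} \right) \right) - 1 \right| \right) \right\} \\
&\qquad \le \mathrm{MBP}^{(A)}(\lambda, X_n) \\
&\qquad \le \max\left\{ \mathrm{RBP}_{\mathrm{exp}}\left( \left| 2F\left( F_n^{-1}\left( \tfrac{1-\lambda}{2} \right) \right) - 1 \right| \right),\; \mathrm{RBP}_{\mathrm{exp}}\left( \left| 2F\left( F_n^{-1}\left( \tfrac{1+\lambda}{2} \right) \right) - 1 \right| \right) \right\}.
\end{aligned} \tag{35}
\]
We now evaluate $\mathrm{RBP}_{\mathrm{exp}}\left( \left| 2F\left( F_n^{-1}\left( \tfrac{1-\lambda}{2} \right) \right) - 1 \right| \right)$. Note that
\[
\left| 2F\left( F_n^{-1}\left( \tfrac{1-\lambda}{2} \right) \right) - 1 \right| \to 1
\]
if and only if either $F_n^{-1}\left( \tfrac{1-\lambda}{2} \right) \to \infty$ or $F_n^{-1}\left( \tfrac{1-\lambda}{2} \right) \to -\infty$. Thus we apply Lemma 14 with some choice of $k$ and the events $S$, $E_1$ and $E_2$, where
\[
\begin{aligned}
S &= \left\{ \exists \{X_{n,k}\} \text{ such that } \left| 2F\left( F_{n,k}^{-1}\left( \tfrac{1-\lambda}{2} \right) \right) - 1 \right| \to 1 \right\}, \\
E_1 &= \left\{ \exists \{X_{n,k}\} \text{ such that } F_{n,k}^{-1}\left( \tfrac{1-\lambda}{2} \right) \to \infty \right\}, \\
E_2 &= \left\{ \exists \{X_{n,k}\} \text{ such that } F_{n,k}^{-1}\left( \tfrac{1-\lambda}{2} \right) \to -\infty \right\}.
\end{aligned}
\]
Note that (with $k$ fixed) each of $E_1$, $E_2$ implies $S$, and $S$ implies $E_1 \cup E_2$. Here we use $\mathrm{RBP}_{\mathrm{exp}}$ with $T^{**} = 1$. Then (21) yields
\[
\mathrm{RBP}_{\mathrm{exp}}\left( \left| 2F\left( F_n^{-1}\left( \tfrac{1-\lambda}{2} \right) \right) - 1 \right| \right) = n^{-1} \min\{k_1, k_2\}, \tag{36}
\]
where $k_1, k_2$ are the minimal values of $k$, respectively, for occurrence of $E_1$, $E_2$. Specifically, we find
\[
k_1 = n - \left\lceil \tfrac{1-\lambda}{2}\, n \right\rceil + 1 = \left\lfloor \tfrac{1+\lambda}{2}\, n \right\rfloor + 1
\]
and
\[
k_2 = \left\lceil \tfrac{1-\lambda}{2}\, n \right\rceil \le k_1.
\]
Thus
\[
\mathrm{RBP}_{\mathrm{exp}}\left( \left| 2F\left( F_n^{-1}\left( \tfrac{1-\lambda}{2} \right) \right) - 1 \right| \right) = n^{-1} \left\lceil \tfrac{1-\lambda}{2}\, n \right\rceil.
\]
Similarly we obtain
\[
\mathrm{RBP}_{\mathrm{exp}}\left( \left| 2F\left( F_n^{-1}\left( \tfrac{1+\lambda}{2} \right) \right) - 1 \right| \right) = n^{-1} \left( \left\lfloor \tfrac{1-\lambda}{2}\, n \right\rfloor + 1 \right).
\]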
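The replacement counts $k_1$ and $k_2$ for the lower quantile, and the identity between the ceiling and floor forms of $k_1$, can be checked for particular values of $n$ and $\lambda$. The sketch below is only an arithmetic check; the function name `lower_quantile_counts` is an assumption, and exact fractions are used to avoid floating-point rounding at the ceiling and floor.

```python
from fractions import Fraction
from math import ceil, floor


def lower_quantile_counts(n, lam):
    """Replacement counts for sending the sample (1 - lam)/2 quantile to +inf or -inf.

    k1 = n - ceil((1 - lam) n / 2) + 1  (event E1: quantile -> +infinity)
    k2 = ceil((1 - lam) n / 2)          (event E2: quantile -> -infinity)
    """
    lam = Fraction(lam)
    a = (1 - lam) / 2 * n
    k1 = n - ceil(a) + 1
    k2 = ceil(a)
    return k1, k2


for n in (100, 101):
    lam = Fraction(9, 10)
    k1, k2 = lower_quantile_counts(n, lam)
    k1_floor_form = floor((1 + lam) / 2 * n) + 1  # alternative expression for k1
    rbp_exp = Fraction(min(k1, k2), n)            # n^{-1} min{k1, k2}
    print(f"n={n}: k1={k1} (floor form {k1_floor_form}), k2={k2}, RBP_exp={rbp_exp}")
```

For n = 100 and λ = 0.9 this gives k2 = 5, so the finite-sample value 5/100 agrees with the limiting value (1 − λ)/2 = 0.05 quoted in the Discussion.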
Proofs of Propositions 10, 11 and 12. These are similar to the foregoing proof.
References
[1] Davies, L. and Gather, U. (1993). The identification of multiple outliers. Journal of the American Statistical Association 88 782–801.
[2] Donoho, D. L. and Huber, P. J. (1983). The notion of breakdown point. In A Festschrift for Erich L. Lehmann (P. J. Bickel, K. A. Doksum and J. L. Hodges, Jr., eds.) pp. 157–184, Wadsworth, Belmont, California.
[3] Mosteller, C. F. and Tukey, J. W. (1977). Data Analysis and Regression. Addison-Wesley, Reading, Mass.
[4] Serfling, R. and Mazumder, S. (2009). Exponential probability inequality and convergence results for the median absolute deviation and its modifications. Statistics and Probability Letters 79 1767–1773.
[5] Serfling, R. and Wang, S. (2012). General foundations for studying masking and swamping robustness of outlier identifiers. Submitted (available at www.utdallas.edu/~serfling).
[6] Zuo, Y. (2003). Projection-based depth functions and associated medians. Annals of Statistics 31 1460–1490.