International Journal of Advances in Engineering & Technology, May 2013.
©IJAET ISSN: 2231-1963
Vol. 6, Issue 2, pp. 573-582
EXAMINING OUTLIER DETECTION PERFORMANCE FOR
PRINCIPAL COMPONENTS ANALYSIS METHOD AND ITS
ROBUSTIFICATION METHODS
Nada Badr, Noureldien A. Noureldien
Department of Computer Science
University of Science and Technology, Omdurman, Sudan
ABSTRACT
Intrusion detection has attracted the attention of both commercial institutions and the academic research community. In this
paper PCA (Principal Components Analysis) is utilized as an unsupervised technique to detect multivariate
outliers in a dataset covering one hour of traffic. PCA is sensitive to outliers since it depends on non-robust
estimators. This led us to use MCD (Minimum Covariance Determinant) and PP (Projection Pursuit) as two
different robustification techniques for PCA. The results obtained from the experiments show that PCA
generates high false alarms due to masking and swamping effects, while the MCD and PP detection rates are much
more accurate, and both reveal the masking and swamping effects that the PCA method undergoes.
KEYWORDS: Multivariate Techniques, Robust Estimators, Principal Components, Minimum Covariance
Determinant, Projection Pursuit.
I. INTRODUCTION
Principal Components Analysis (PCA) is a multivariate statistical method concerned with
analyzing and understanding data in high dimensions; that is, the PCA method analyzes data sets
that represent observations described by several inter-correlated dependent variables.
PCA is one of the best known and most used multivariate exploratory analysis techniques
[5].
Several robust competitors to classical PCA estimators have been proposed in the literature. A natural
way to robustify PCA is to use robust location and scatter estimators instead of the PCA's sample
mean and sample covariance matrix when estimating the eigenvalues and eigenvectors of the
population covariance matrix. The minimum covariance determinant (MCD) method is a highly
robust estimator of multivariate location and scatter. Its objective is to find h observations out of n
whose covariance matrix has the lowest determinant. The MCD location estimate then is the mean of
these h points, and the estimate of scatter is their covariance matrix. Another robust method for
principal component analysis uses the Projection-Pursuit (PP) principle. Here, one projects the data on
a lower-dimensional space such that a robust measure of variance of the projected data will be
maximized.
In this paper we investigate the effectiveness of the robust estimators provided by MCD and PP, by
applying PCA on Abilene dataset and compare its detection performance of dataset outliers to MCD
and PP.
The rest of this paper is organized as follows. Section 2 is an overview of related work. Section 3 is
dedicated to classical PCA. The PCA robustification methods, MCD and PP, are discussed in section 4.
In section 5 the experiment results are shown, and conclusions and future work are drawn in section 6.
II. RELATED WORK
A number of researchers have utilized principal components analysis to reduce dimensionality and
to detect anomalous network traffic. The use of PCA to structure network traffic flows was introduced
by Lakhina [13] whereby principal components analysis is used to decompose the structure of Origin-
Destination flows from two backbone networks into three main constituents, namely periodic
trends, bursts and noise.
Labib [2] utilized PCA to reduce the dimension of the traffic data and to visualize and
identify attacks. Bouzida et al. [7] presented a performance study of two machine learning
algorithms, namely the nearest neighbors and decision trees algorithms, when used on traffic data with
or without PCA. They found that when PCA is applied to the KDD99 dataset to reduce the dimension
of the data, the algorithms' learning speed improved while accuracy remained the same.
Terrel [9] used principal components analysis on features of aggregated network traffic of a link
connecting a university campus to the Internet in order to detect anomalous traffic. Sastry [10]
proposed the use of singular value decomposition and wavelet transform for detecting anomalies in
self-similar network traffic data. Wong [12] proposed an anomaly intrusion detection model based on
PCA for monitoring network behaviors. The model utilizes PCA to reduce the dimensions of
historical data and to build the normal profile, represented by the first few principal
components. An anomaly is flagged when the distance between a new observation and the normal profile
exceeds a predefined threshold.
Mei-Ling [4] proposed an anomaly detection scheme based on robust principal components analysis. Two
classifiers were implemented to detect anomalies: one based on the major components that capture
most of the variations in the data, and the second based on the minor components, or residuals. A new
observation is considered an outlier, or anomalous, when the sum of squares of the weighted
principal components exceeds the threshold in either of the two classifiers.
Lakhina [6] applied principal components analysis to Origin-Destination (OD) flow traffic. The
traffic is separated into normal and anomalous spaces by projecting the data onto the resulting principal
components one at a time, ordered from high to low. Principal components (PCs) are added to the
normal space as long as a predefined threshold is not exceeded. When the threshold is exceeded, that
PC and all subsequent PCs are added to the anomalous space. New OD flow traffic is projected onto
the anomalous space, and an anomaly is flagged if the value of the squared prediction error, or Q-statistic,
exceeds a predefined limit.
Therefore PCA is widely used to identify lower dimensional structure in data, and is commonly
applied to high-dimensional data. PCA represents data by a small number of components that account
for the variability in the data. This dimension reduction step can be followed by other multivariate
methods, such as regression, discriminant analysis, cluster analysis, etc.
In classical PCA the sample mean and the sample covariance matrix are used to derive the principal
components. These two estimators are highly sensitive to outlying observations and render PCA
unreliable when outliers are encountered.
III. CLASSICAL PCA MODEL
The PCA detection model detects outliers by projecting the observations of the dataset on the newly
computed axes, known as PCs. The outliers detected by the PCA method are of two types: outliers detected
by the major PCs, and outliers detected by the minor PCs.
The basic goals of PCA [5] are to extract the important information from the data set, to compress the size of
the data set by keeping only this important information, and to simplify the description of the data and
analyze the structure of the observations and variables (finding patterns of similarity and
difference).
To achieve these goals PCA calculates new variables from the original variables, called Principal
Components (PCs). The computed variables are linear combinations of the original variables (chosen to
maximize the variance of the projected observations) and are uncorrelated. The first computed PCs, called the
major PCs, have the largest inertia (total variance in the data set), while the later PCs, called the
minor PCs, have the greatest residual inertia and are orthogonal to the first principal components.
The Principal Components define orthogonal directions in the space of observations. In other words,
PCA just makes a change of orthogonal reference frame, the original variables being replaced by the
Principal Components.
3.1 PCA Advantages
PCA common advantages are:
3.1.1 Exploratory Data Analysis
PCA is mostly used for making 2-dimensional plots of the data for visual examination and
interpretation. For this purpose, the data is projected on factorial planes that are spanned by pairs of
Principal Components chosen among the first ones (that is, the most significant ones). From these
plots, one tries to extract information about the data structure, such as the detection of outliers
(observations that are very different from the bulk of the data).
According to most research [8][11], PCA detects two types of outliers: type (1), outliers that inflate
variance, which are detected by the major PCs; and type (2), outliers that violate structure, which are
detected by the minor PCs.
3.1.2 Data Reduction Technique
All multivariate techniques are prone to the bias-variance tradeoff, which states that the
number of variables entering a model should be severely restricted. Data is often described
by many more variables than necessary for building the best model. PCA is better than
other statistical reduction techniques in that it selects and feeds the model with a reduced
number of variables.
3.1.3 Low Computational Requirement
PCA needs low computational effort since its algorithm consists of simple calculations.
3.2 PCA Disadvantages
It may be noted that PCA is based on the assumptions that the dimensionality of data can be
efficiently reduced by a linear transformation and that most information is contained in those directions
where the input data variance is maximum.
As is evident, these conditions are by no means always met. For example, if the points of an input set
are positioned on the surface of a hypersphere, no linear transformation can reduce dimension
(a nonlinear transformation, however, can easily cope with this task). From the above, the following
disadvantages of PCA are concluded.
3.2.1 Depending On Linear Algebra
It relies on simple linear algebra as its main mathematical engine, and is quite easy to interpret
geometrically. But this strength is also a weakness, for it might very well be that other synthetic
variables, more complex than just linear combinations of the original variables, would lead to a more
complex data description.
3.2.2 Smallest Principal Components Receive Little Attention in Statistical Techniques
The lack of interest is due to the fact that, compared with the largest principal components, which
contain most of the total variance in the data, the smallest principal components only contain the
noise of the data and therefore appear to contribute minimal information. However, because outliers
are a common source of noise, the smallest principal components should be useful for outlier
detection.
3.2.3 High False Alarms
Principal components are sensitive to outliers, since the principal components are determined by
their directions and are calculated from classical estimators such as the classical mean and the classical
covariance or correlation matrices.
IV. PCA ROBUSTIFICATION
In real datasets, it often happens that some observations are different from the majority; such
observations are called outliers, intrusions, discordant observations, etc. The classical PCA method can be
affected by outliers so that the PCA model cannot detect all the actual deviating observations; this is
known as the masking effect. In addition, some good data points might appear to be outliers, which is
known as the swamping effect.
Masking and swamping cause PCA to generate high false alarms. To reduce these false alarms,
the use of robust estimators has been proposed, since outlying points are less likely to enter into the
calculation of robust estimators.
The well-known PCA robustification methods are the minimum covariance determinant (MCD) and the
Projection-Pursuit (PP) principle. The objective of the raw MCD is to find h > n/2 observations out
of n whose covariance matrix has the smallest determinant. Its breakdown value is b_n = (n - h + 1)/n,
hence the number h determines the robustness of the estimator. In the Projection-Pursuit principle [3],
one projects the data on a lower-dimensional space such that a robust measure of variance of the
projected data is maximized. PP is applied where the number of variables or dimensions is very
large, so PP has an advantage over MCD, since MCD requires the number of dimensions of the dataset not
to exceed 50.
Principal Component Analysis (PCA) is an example of the PP approach, because both search for
directions with maximal dispersion of the data projected on them; but instead of using the variance as the
measure of dispersion, PP uses a robust scale estimator [4].
V. EXPERIMENTS AND RESULTS
In this section we show how we test PCA and its robustification methods MCD and PP on a dataset.
The data used consists of OD (Origin-Destination) flows which are collected and made
available by Zhang [1]. The dataset is an extraction of sixty minutes of traffic flows from the first week of
the traffic matrix on 2004-03-01, the traffic matrix Yin Zhang built from the Abilene
network. The dataset is available in offline mode, where it is extracted from the offline traffic
matrix.
5.1 PCA on Dataset
At first, the dataset or the traffic matrix is arranged into the data matrix X, where rows represent
observations and columns represent variables or dimensions.
X(144×12) = [ x_{1,1} ⋯ x_{1,12} ; ⋮ ⋱ ⋮ ; x_{144,1} ⋯ x_{144,12} ],
The following steps are considered in applying the PCA method on the dataset.
Centering the dataset to have zero mean: the mean vector is calculated from the following
equation:
μ = (1/n) Σ_{i=1}^{n} x_i (1)
and the mean is subtracted off each dimension.
The product of this step is a centered data matrix Y, which has the same size as the original dataset:
Y(n,p) = (x_{i,j} − μ(X)) (2)
The covariance matrix is calculated from the following equation:
C(X) or Σ(X) = (1/(n−1)) (X − μ(X))^T · (X − μ(X)) (3)
Finding the eigenvectors and eigenvalues of the covariance matrix, where the eigenvalues are the diagonal
elements of the resulting matrix, by the eigen-decomposition in equation (4):
E^{−1} × Σ(Y) × E = Λ (4)
where E is the matrix of eigenvectors and Λ holds the eigenvalues.
Ordering the eigenvalues in decreasing order and sorting the eigenvectors according to the ordered
eigenvalues; the sorted eigenvectors matrix is the loadings matrix.
Calculating the scores matrix (the dataset projected on the principal components), which describes the
relations between principal components and observations. The scores matrix is calculated from
the following equation:
scores(n,p) = Y(n,p) × loadings(p,p) (5)
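As an illustration, the centering, covariance, eigen-decomposition, and projection steps above can be sketched in NumPy; the random 144×12 matrix below is a hypothetical stand-in for the Abilene traffic matrix, not the paper's data:

```python
import numpy as np

# Sketch of equations (1)-(5) on a stand-in 144x12 data matrix
# (random data here; the paper uses the Abilene traffic matrix).
rng = np.random.default_rng(0)
X = rng.normal(size=(144, 12))

mu = X.mean(axis=0)                       # equation (1): mean vector
Y = X - mu                                # equation (2): centered data
C = Y.T @ Y / (X.shape[0] - 1)            # equation (3): sample covariance
eigvals, eigvecs = np.linalg.eigh(C)      # equation (4): eigen-decomposition

order = np.argsort(eigvals)[::-1]         # eigenvalues in decreasing order
loadings = eigvecs[:, order]              # sorted eigenvectors = loadings matrix
scores = Y @ loadings                     # equation (5): scores matrix

# fraction of total variance captured by the first two PCs
explained = eigvals[order][:2].sum() / eigvals.sum()
```

Because the loadings are orthonormal eigenvectors of C, the covariance of the scores is diagonal, i.e. the computed variables are uncorrelated as the text states.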
Applying the 97.5% tolerance ellipse to the bivariate datasets (data projected on the first PCs, data
projected on the minor PCs) to reveal outliers automatically. The ellipse is defined by the data
points whose distance equals the square root of the chi-square 97.5% quantile with 2 degrees of
freedom. The form of the distance is dist ≤ √(χ²_{2,0.975}) (6)
The screeplot was studied: the first and second principal components accounted for
98% of the total variance of the dataset, so the first two principal components were retained to represent the
dataset as a whole. Figure (1) shows the screeplot; the plotting of the data projected onto the first two
principal components, in order to reveal the outliers in the dataset visually, is shown in figure (2).
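For 2 degrees of freedom the chi-square quantile has a closed form, χ²₂(p) = −2 ln(1 − p), so the cutoff of equation (6) can be sketched without a statistics library; the bivariate score matrix below is a hypothetical stand-in for the data projected on two PCs:

```python
import numpy as np

# Cutoff of equation (6): sqrt of the 97.5% chi-square quantile with 2 dof.
# For 2 dof the quantile is -2*ln(1 - p), so no stats library is needed.
cutoff = np.sqrt(-2.0 * np.log(1.0 - 0.975))   # approx 2.716

# Flag bivariate scores whose Mahalanobis distance exceeds the cutoff.
rng = np.random.default_rng(1)
Z = rng.normal(size=(144, 2))                  # stand-in projections on two PCs
diff = Z - Z.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(Z, rowvar=False))
dist = np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))
outliers = np.flatnonzero(dist > cutoff)       # points outside the ellipse
```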
Figure 1: PCA Screeplot Figure 2: PCA Visual outliers
Figure (3) shows tolerance ellipse on major PCS, and figures (4) and (5) shows the visual recording of
outliers from scatter plots of data projected on robust minor principal components and the outliers
detected by robust minor principal components tuned by tolerance ellipse respectively.
Figure 3: PCA Tolerance Ellipse Figure 4: PCA type2 Outliers
Figure 5: Tuned Minor PCS
5.2 MCD on Dataset
Testing the robust MCD (Minimum Covariance Determinant) estimator yields a robust location
measure T_mcd and a robust dispersion Σ_mcd.
The following steps are applied to test MCD on the dataset in order to reach the robust principal
components.
The MCD distance measure is calculated from the formula: R = (x_i − T_mcd(X))^T · Σ_mcd(X)^{−1} · (x_i − T_mcd(X)), for i = 1 to n (7)
The robust location estimate T_mcd (or μ_mcd, with values on the order of 1.0e+006) and the robust
covariance matrix Σ_mcd (with values on the order of 1.0e+012) are computed from the dataset.
From the robust covariance matrix Σ_mcd the following are calculated:
* the robust eigenvalues, as a diagonal matrix, as in equation (4), replacing n with h;
* the robust eigenvectors, as a loadings matrix, as in equation (5).
The robust scores matrix is calculated in the following form:
robust scores(n,p) = Y(n,p) × loadings(p,p) (8)
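A hedged sketch of this MCD step using scikit-learn's MinCovDet, which is not the authors' implementation; the data below is a hypothetical stand-in with a few injected gross outliers:

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(2)
X = rng.normal(size=(144, 12))
X[:5] += 10.0                              # inject a few gross outliers

# Robust location T_mcd and scatter S_mcd from the h-subset with the
# smallest covariance determinant (h is set via support_fraction).
mcd = MinCovDet(support_fraction=0.75, random_state=0).fit(X)
T_mcd = mcd.location_
S_mcd = mcd.covariance_

# Robust PCA: eigen-decompose the robust covariance (cf. equation (4))
eigvals, eigvecs = np.linalg.eigh(S_mcd)
order = np.argsort(eigvals)[::-1]
robust_scores = (X - T_mcd) @ eigvecs[:, order]   # cf. equation (8)

# Squared robust distances R of equation (7), used to flag outliers
R = mcd.mahalanobis(X)
```

Because the injected outliers are excluded from the determinant-minimizing subset, their robust distances R come out far larger than those of the clean points, which is exactly how MCD avoids the masking effect.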
The robust screeplot, retaining the first two robust principal components which accounted for above
98% of the total variance, is shown in figure (6). Figures (7) and (8) show respectively the visual
recording of outliers from scatter plots of data projected on the robust major principal components, and
the outliers detected by the robust major principal components tuned by the tolerance ellipse. Figures (9)
and (10) show the visual recording of outliers from scatter plots of data projected on the robust minor
principal components and the outliers detected by the robust minor principal components tuned by the
tolerance ellipse, respectively.
Figure 6: MCD screeplot Figure 7: MCD Visual Outliers
Figure 8: MCD Tolerance Ellipse Figure 9: MCD type2 Outliers
Figure 10: MCD Tuned Minor PCs
5.3 Projection Pursuit on Dataset
Testing the projection pursuit method on the dataset comprises the following steps:
Center the data matrix X(n,p) around the L1-median to reach the centralized data matrix Y(n,p):
Y(n,p) = (X(n,p) − L1(X)) (9)
where L1(X) is a highly robust estimator of multivariate data location that resists up to 50% of outliers [11].
Construct the directions P_i as normalized rows of the matrix; this process includes the following:
PY = (Y[i,:])′ for i = 1 to n (10)
NPY = max(SVD(PY)) (11)
where SVD stands for singular value decomposition.
P_i = PY / NPY (12)
Project the whole dataset on all possible directions:
T_i = Y × (P_i)^t (13)
Calculate the robust scale estimator for all the projections and find the direction that maximizes the Qn
estimator:
q = max(qn(T_i)) (14)
Qn is a scale estimator; essentially it is the first quartile of all pairwise distances between data
points [5]. These steps yield the robust eigenvectors (PCs), and the squared values of the robust
scale estimator are the eigenvalues.
Project all data on the selected direction q to obtain the robust principal components as follows:
T_i = Y(n,p) × P_q^t (15)
Update the data matrix by its orthogonal complement as follows:
Y = Y − (P_q × P_q^t) · Y (16)
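The projection-pursuit loop above can be sketched as follows. This is an assumption-laden simplification: it uses the coordinate-wise median as a stand-in for the L1-median of equation (9) and the MAD as a stand-in for the Qn scale estimator, and a random matrix in place of the real dataset:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(144, 12))

center = np.median(X, axis=0)   # stand-in for the L1-median of eq. (9)
Y = X - center

def robust_scale(t):
    # MAD, used here as a simple stand-in for the Qn estimator of eq. (14)
    return np.median(np.abs(t - np.median(t)))

# eqs. (10)-(12): candidate directions = normalized rows of the centered data
norms = np.linalg.norm(Y, axis=1, keepdims=True)
P = Y / np.where(norms == 0.0, 1.0, norms)

# eq. (13): project on every direction; eq. (14): maximize the robust scale
scales = np.array([robust_scale(Y @ p) for p in P])
p_best = P[np.argmax(scales)]

# eq. (15): scores on the first robust PC; its squared scale is the eigenvalue
pc1_scores = Y @ p_best
eigval1 = robust_scale(pc1_scores) ** 2

# eq. (16): deflate Y by the orthogonal complement, then repeat for the next PC
Y = Y - np.outer(Y @ p_best, p_best)
```

After the deflation step the updated Y is orthogonal to the chosen direction, so repeating the search on it yields the next robust component.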
Project all data on the orthogonal complement,
𝑠𝑐𝑜𝑟𝑒𝑠 = 𝑌 × 𝑃𝑖 (17)
The plotting of the data projected on the first two robust principal components to detect outliers
visually is shown in figure (11), and the tuning of the first two robust principal components by the
tolerance ellipse is shown in figure (12). Figures (13) and (14) show respectively the plotting of
the data projected on the minor robust principal components to detect outliers visually, and the tuning
of the last robust principal components by the tolerance ellipse.
Figure 11: PP Visual Outliers Figure 12: PP Tolerance Ellipse
Figure 13: PP type2 Outliers Figure 14: PP Tuned Minor PCs
5.4 Results
Table (1) summarizes the outliers detected by each method. The table shows that PCA suffers from
both masking and swamping. The MCD and PP methods results reveal the effects of masking and
swamping of the PCA method. The PP method results are similar to MCD with slight difference
since we use 12 dimensions on the dataset.
Table 1: Outliers Detection

PCA outliers          MCD outliers          PP outliers           Masking   Swamping
(major & minor PCs)   (major & minor PCs)   (major & minor PCs)
66                    66                    66                    No        No
99                    99                    99                    No        No
100                   100                   100                   No        No
116                   116                   116                   No        No
117                   117                   117                   No        No
118                   118                   118                   No        No
119                   119                   119                   No        No
120                   120                   120                   No        No
129                   129                   129                   No        No
131                   131                   131                   No        No
135                   135                   135                   No        No
Normal                Normal                69                    Yes       No
Normal                Normal                70                    Yes       No
71                    Normal                Normal                No        Yes
76                    Normal                Normal                No        Yes
81                    Normal                Normal                No        Yes
101                   Normal                Normal                No        Yes
104                   Normal                Normal                No        Yes
111                   Normal                Normal                No        Yes
144                   Normal                Normal                No        Yes
Normal                84                    Normal                Yes       No
Normal                96                    Normal                Yes       No
Normal                97                    97                    Yes       No
Normal                98                    98                    Yes       No
VI. CONCLUSION AND FUTURE WORK
The study has examined the PCA and its robustification methods (MCD, PP) performance for
intrusion detection by presenting the bi-plots and extracted outlying observation that are very
different from the bulk of data. The study showed that tuned results are identical to visualized one.
The study returns the PCA false alarms shortness due to masking and swamping effect. The
comparison proved that PP results are similar to MCD with slight difference in outliers type 2 since
are considered as source of noise. Our future work will go into applying the hybrid method
(ROBPCA), which takes PP as reduction technique and MCD as robust measure for further
performance, and applying dynamic robust PCA model with regards to online intrusion detection.
REFERENCES
[1]. Abilene TMs, collected by Zhang. www.cs.utexas.edu/yzhang/research, visited on 13/07/2012.
[2]. Khalid Labib and V. Rao Vemuri. "An application of principal components analysis to the detection
and visualization of computer network attacks". Annals of Telecommunications, pages 218-234, 2005.
[3]. C. Croux, A. Ruiz-Gazen. "A fast algorithm for robust principal components based on projection
pursuit". COMPSTAT: Proceedings in Computational Statistics, Physica-Verlag, Heidelberg, 1996, 211-217.
[4]. Mei-Ling Shyu, Shu-Ching Chen, Kanoksri Sarinnapakorn, and LiWu Chang. "A novel anomaly detection
scheme based on principal components classifier". In proceedings of the IEEE Foundations and New
Directions of Data Mining workshop, in conjunction with the third IEEE international conference on data mining
(ICDM03).
[5]. J. Edward Jackson. "A user's guide to principal components". Wiley-Interscience, 1st edition 2003.
[6]. Anukool Lakhina, Mark Crovella, and Christophe Diot. "Diagnosing network-wide traffic anomalies".
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer
communication. ACM 2004.
[7]. Yacine Bouzida, Frederic Cuppens, Nora Cuppens-Boulahia, and Sylvain Gombault. "Efficient Intrusion
Detection Using Principal Component Analysis". La Londe, France, June 2004.
[8]. R. Gnanadesikan. "Methods for statistical data analysis of multivariate observations". Wiley-Interscience
publication, New York, 2nd edition 1997.
[9]. J. Terrell, K. Jeffay, L. Zhang, H. Shen, Z. Zhu, and A. Nobel. "Multivariate SVD analysis for network
anomaly detection". In proceedings of the ACM SIGCOMM Conference 2005.
[10]. Challa S. Sastry, Sanjay Rawat, Arun K. Pujari and V. P. Gulati. "Network traffic analysis using singular
value decomposition and multiscale transforms". Information Sciences: an international journal, 2007.
[11]. I. T. Jolliffe. "Principal components analysis". Springer Series in Statistics, Springer, 2nd edition
2007.
[12]. Wei Wang, Xiaohong Guan, and Xiangliang Zhang. "Processing of massive audit data streams for real
time anomaly intrusion detection". Computer Communications, Elsevier 2008.
[13]. A. Lakhina, K. Papagiannaki, M. Crovella, C. Diot, E. Kolaczyk, and N. Taft. "Structural Analysis of
network traffic flows". In proceedings of SIGMETRICS, New York, NY, USA, 2004.
AUTHORS BIOGRAPHIES
Nada Badr earned her B.Sc. in Mathematical and Computer Science at the University of
Gezira, Sudan. She received her M.Sc. in Computer Science at the University of Science and
Technology. She is pursuing her Ph.D. in Computer Science at the University of Science and
Technology, Omdurman, Sudan. She is currently serving as a lecturer at the University of
Science and Technology, Faculty of Computer Science and Information Technology.
Noureldien A. Noureldien is working as an associate professor in Computer Science,
department of Computer Science and Information Technology, University of Science and
Technology, Omdurman, Sudan. He received his B.Sc. and M.Sc. from the School of
Mathematical Sciences, University of Khartoum, and received his Ph.D. in Computer
Science in 2001 from the University of Science and Technology, Khartoum, Sudan. He has
many papers published in journals of repute. He is currently working as the dean of the
Faculty of Computer Science and Information Technology at the University of Science
and Technology, Omdurman, Sudan.