International Journal of Computational Intelligence and Information Security, June 2013, Vol. 4, No. 6, ISSN: 1837-7823
Significance of One-Class Classification in Outlier Detection
Anandkumar Prakasam 1
Nickolas Savarimuthu 2
1 ROOT Research Consultancy, Tiruchirappalli, Tamilnadu, India. [email protected]
2 Professor, National Institute of Technology, Tiruchirappalli, Tamilnadu, India.
Abstract
Outlier or novelty detection is one of the important aspects of the current learning scenario: it helps to discover unknown knowledge from the available data. Novelty detection is also called outlier detection, since in some areas these data are considered out of the ordinary and need to be eliminated. The current paper discusses various methods for detecting novelties and compares them to determine the best method for novelty detection. The paper considers the Generalized Extreme Studentized Deviate (GESD) test, an outlier detection method; SVM, a binary classifier; the Naïve Bayes classifier, a multi-class classifier; and a one-class classification method. Results reveal that the one-class classification method provides the best results in most scenarios, where the available training data for the outliers is minimal and sometimes not available at all.
Keywords
One-Class Classification; GESD; SVM; Naïve Bayes; Outlier Detection; Novelty Detection; Classification
1. Introduction

Multi-class classification is one of the most widely used techniques in data mining. However, sometimes it is not necessary to classify the data into multiple classes. If there is only one class that we are interested in, it is sufficient that this specific class is separated from the rest of the data. This kind of data mining is called one-class classification. Usually, in one-class classification, the data instances that do not belong to the normal data, or to the majority of the data, are separated out.
For example, credit card fraud [1] or intrusion detection [2] can be regarded as anomaly detection, while the detection of previously unobserved patterns in data can be regarded as novelty detection [3][4]. Such novelty detection techniques can be used, for example, for detecting a new discussion topic in newsgroups. The difference between anomaly and novelty detection is that the novelty detection method often incorporates the discovered novelty patterns into the model [5]. One-class classification can be considered, for example, when the number of anomalies is much smaller than the number of normal data instances [21][22].
1.1 Anomaly Detection Techniques

Anomaly detection techniques can be divided into four categories: classification-based, distance-based, statistical and other techniques. Classification-based anomaly detection techniques use a model to predict whether a test instance is normal or anomalous. In general, a set of training data is provided, and the system is then given actual data on which to perform the classification. If there exist multiple normal classes, the technique is considered multi-class [6]; if there is only one normal class, it is considered a one-class anomaly detection technique [5]. Both of these classifiers are trained with only the normal data instances, so they belong to the semi-supervised learning methods. Typical classification-based anomaly detection techniques are, for example, Bayesian networks [7], neural networks [8], rule-based techniques [7] and support vector machines [9]. Distance-based techniques use the distance between points as the basic measure for detecting anomalies. Statistical techniques fit statistical models to a given data set, use them to assign probabilities to test instances, and declare the instances anomalous or normal.
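As a concrete illustration of a classification-based detector trained on labelled data, the following is a minimal sketch using scikit-learn's Gaussian Naïve Bayes; the toy data and the 0/1 labelling are assumptions for illustration, not the paper's setup:

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Toy imbalanced data: class 1 plays the role of the anomalous class.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

nb = GaussianNB().fit(X[:800], y[:800])   # train on labelled instances
pred = nb.predict(X[800:])                # classify held-out instances
print((pred == 1).sum(), "test instances flagged as anomalous")
```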
The support vector machine (SVM) is a method used in pattern recognition and classification. It is a classifier that assigns patterns to one of two categories, for example fraudulent or non-fraudulent, and is therefore well suited to binary classification. Like any machine-learning tool, it has to be trained to obtain a learned model. SVM has been used in many classification and pattern recognition problems such as text categorization and face detection, and it builds on the foundations of non-parametric applied statistics, neural networks and machine learning [10][11][12][13].
Weighted SVM implements cost-sensitive learning. Like the standard SVM, the weighted SVM maximizes the margin of separation and minimizes the classification error, with the margin boundary separating the classes. In a cost-sensitive SVM (CS-SVM), different weights are assigned to the classes; an effective decision boundary is learned by adjusting these weights, which improves the prediction accuracy.
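A minimal cost-sensitive SVM sketch, assuming scikit-learn and a synthetic imbalanced problem; the specific weight values are illustrative, not taken from the paper:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Imbalanced toy problem: roughly 5% of instances belong to class 1 (outliers).
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

# class_weight raises the misclassification cost of the rare class, which
# shifts the learned margin boundary toward the majority class.
clf = SVC(kernel="rbf", class_weight={0: 1.0, 1: 19.0}).fit(X, y)
```

Passing class_weight="balanced" instead derives the weights automatically from the class frequencies.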
All anomaly detection techniques have their strengths and weaknesses, and no single technique is suitable for every situation; one has to analyse each technique's suitability for the particular application and choose accordingly.
In the current paper, we provide a comparative study of the GESD (Generalized Extreme Studentized Deviate) test, Naïve Bayes and SVM classifiers against one-class classifiers. The rest of this paper is organized as follows. Section 2 provides an overview of the one-class classification technique used for the comparison study, Section 3 describes the datasets used, Section 4 presents the experimental results and Section 5 concludes the study.
2. One-Class Classification: An Overview

Traditional classification methods have always been those that use data from all classes to build models. Such models are discriminatory in nature, since they learn to discriminate between classes. However, many real-world situations are such that it is only possible to have data from one class, the target class; data from the other classes, the outlier classes, is either very difficult or impossible to obtain. Examples of such problems include fraud detection, medicine, machine fault detection, wireless sensor networks, intrusion detection, and object recognition tasks such as face detection. These are the situations where one-class classification plays a major role in the detection of anomalies.
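The experiments in this paper use the simple one-class methods of [15]; as a self-contained illustration of the idea of training on the target class only, here is a sketch using a one-class SVM (scikit-learn assumed; the Gaussian toy data is hypothetical):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_target = rng.normal(0.0, 1.0, size=(500, 2))             # target class only
X_test = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)),    # unseen target points
                    rng.normal(6.0, 1.0, size=(5, 2))])    # far-away outliers

# nu upper-bounds the fraction of training points treated as outliers.
occ = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_target)
print(occ.predict(X_test))   # +1 = accepted as target, -1 = rejected as outlier
```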
…correctly, good performance can be achieved. These parameters are often called "magic parameters", because they often have a large influence on the final performance and no clear rules are given for how to set them. Such numbers cannot be given intuitively beforehand; a reasonable network size is found only by trial and error.
Computation and storage requirements: A final consideration is the computational requirements of the methods. Although computers offer more power and storage every year, methods that require several minutes for the evaluation of a single test object may be unusable in practice. Since training is often done off-line, training costs are often not very important; however, when the method must be adapted to a changing environment, these training costs can become very important. The most straightforward way to obtain a one-class classifier is to estimate the density of the training data and to set a threshold on this density.
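A minimal sketch of this density-plus-threshold approach, assuming scikit-learn's kernel density estimator; the bandwidth and the 5% rejection rate are illustrative choices:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_density_occ(X_train, rejection_rate=0.05, bandwidth=0.5):
    """Fit a density model on target-class data and pick a threshold so that
    roughly `rejection_rate` of the training data falls below it."""
    kde = KernelDensity(bandwidth=bandwidth).fit(X_train)
    log_density = kde.score_samples(X_train)
    threshold = np.quantile(log_density, rejection_rate)
    return kde, threshold

def predict_occ(kde, threshold, X):
    # +1 = target class (dense region), -1 = outlier (low-density region)
    return np.where(kde.score_samples(X) >= threshold, 1, -1)

rng = np.random.default_rng(0)
kde, thr = fit_density_occ(rng.normal(size=(300, 2)))
# A point near the data should be accepted; a far-away point rejected.
print(predict_occ(kde, thr, np.array([[0.0, 0.0], [8.0, 8.0]])))
```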
In our approach, we have used GESD for outlier detection, the SVM and Naïve Bayes classifiers for multi-class classification, and the one-class classification method described in [15] for the analysis of outliers.
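For reference, here is a sketch of the GESD test itself, following the standard formulation of the statistic and its critical value (NumPy/SciPy assumed; this is not the paper's exact implementation):

```python
import numpy as np
from scipy import stats

def gesd_outliers(x, max_outliers, alpha=0.05):
    """Generalized Extreme Studentized Deviate test. Returns indices of the
    flagged points, assuming the inliers are approximately normal."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    remaining = list(range(n))
    removed, significant = [], []

    for i in range(1, max_outliers + 1):
        sub = x[remaining]
        dev = np.abs(sub - sub.mean())
        j = int(np.argmax(dev))
        R_i = dev[j] / sub.std(ddof=1)       # test statistic R_i
        removed.append(remaining.pop(j))     # drop the most extreme point

        # Critical value lambda_i from the t-distribution
        p = 1.0 - alpha / (2.0 * (n - i + 1))
        t = stats.t.ppf(p, df=n - i - 1)
        lam = (n - i) * t / np.sqrt((n - i - 1 + t * t) * (n - i + 1))
        significant.append(R_i > lam)

    # Number of outliers = largest i for which R_i exceeds lambda_i
    k = max((i + 1 for i, s in enumerate(significant) if s), default=0)
    return removed[:k]

data = np.concatenate([np.random.default_rng(0).normal(size=100), [8.0, 9.5]])
print(gesd_outliers(data, max_outliers=5))   # indices of the flagged points
```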
3. Dataset Description

3.1 KEEL Dataset

The KEEL datasets [23] available under the Classification category were used for the analysis. Each dataset includes target instances as well as outlier instances. All the datasets are binary problems and contain no missing attribute values. No categorical attributes are considered; all attributes available in the datasets are numerical.
Table 1: Dataset description

Name            No. of instances
Banana          5300
Phoneme         5404
Appendicitis    106
Titanic         2201
Mammographic    830
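KEEL distributes these datasets as .dat files with an ARFF-like header; a minimal loader sketch, assuming the standard @attribute/@data header layout and comma-separated data rows (the helper name is ours):

```python
import pandas as pd

def load_keel_dat(path):
    """Load a KEEL .dat file, assuming '@attribute Name type' header lines
    followed by a '@data' marker and comma-separated rows."""
    names, data_start = [], 0
    with open(path) as f:
        for i, line in enumerate(f):
            if line.lower().startswith("@attribute"):
                names.append(line.split()[1])   # attribute name is the 2nd token
            elif line.lower().startswith("@data"):
                data_start = i + 1
                break
    return pd.read_csv(path, skiprows=data_start, header=None,
                       names=names, skipinitialspace=True)

# Example (file path hypothetical): df = load_keel_dat("banana.dat")
```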
4. Experimental Results and Discussion

4.1 Results on the KEEL Datasets

Analysis of the results on the various datasets shows that the one-class classifiers detect more outliers than the other classification methods. While the SVM shows almost similar performance, it remains slightly below the one-class classifiers. Figures 1-5 show the performance of GESD, Naïve Bayes, SVM and one-class classifiers on the various datasets.
Figure 1: Banana dataset (Y-axis: number of outliers detected; series: GESD, Naïve Bayes, SVM, OCC)

Figure 2: Phoneme dataset (Y-axis: number of outliers detected; series: GESD, Naïve Bayes, SVM, OCC)
Figure 3: Appendicitis dataset (Y-axis: number of outliers detected; series: GESD, Naïve Bayes, SVM, OCC)

Figure 4: Titanic dataset (Y-axis: number of outliers detected; series: GESD, Naïve Bayes, SVM, OCC)
Figure 5: Mammographic dataset (Y-axis: number of outliers detected; series: GESD, Naïve Bayes, SVM, OCC)
The experiments were then repeated while varying the number of outliers available during the training phase and observing the performance of each method. Since GESD is a statistics-based outlier detection method, its performance deteriorated only slightly. Naïve Bayes and SVM, by contrast, rely completely on the training data when classifying; hence, as the number of outliers in the training data is reduced, their detection rates drop drastically. The one-class classifier relies entirely on the normal data and not on the anomalies, so its drop in detection rate is very small compared to the other methods.
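A sketch of this protocol, assuming scikit-learn and synthetic data; the subsample_outliers helper is hypothetical, and the 100% to 10% sweep mirrors Figures 6-10:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

def subsample_outliers(X, y, frac, rng):
    """Keep every normal instance (class 0), keep only `frac` of the
    outlier class (class 1)."""
    normal = np.flatnonzero(y == 0)
    outlier = np.flatnonzero(y == 1)
    keep = rng.choice(outlier, size=max(1, int(frac * len(outlier))),
                      replace=False)
    idx = np.concatenate([normal, keep])
    return X[idx], y[idx]

rng = np.random.default_rng(0)
for pct in range(100, 0, -10):              # 100% .. 10% of the outliers
    X_tr, y_tr = subsample_outliers(X, y, pct / 100, rng)
    clf = SVC().fit(X_tr, y_tr)             # stand-in for each compared method
    detected = int((clf.predict(X) == 1).sum())
    print(pct, detected)                    # detection count vs. imbalance
```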
Figure 6: Banana dataset (X-axis: percentage of outliers in the training dataset; Y-axis: number of outliers detected)
Figures 6-10 show the detection rates of the various algorithms as the imbalance in the training dataset is increased. The analysis shows consistent performance for the one-class classifier, while deviations are observed for the other detection methods.
Figure 7: Phoneme dataset (X-axis: percentage of outliers in the training dataset; Y-axis: number of outliers detected)

Figure 8: Appendicitis dataset (X-axis: percentage of outliers in the training dataset; Y-axis: number of outliers detected)
Figure 9: Titanic dataset (X-axis: percentage of outliers in the training dataset; Y-axis: number of outliers detected)

Figure 10: Mammographic dataset (X-axis: percentage of outliers in the training dataset; Y-axis: number of outliers detected)
4.2 Discussion

From Figures 6-10, where the detection rates of the various methods are analysed and compared, we can infer from the slope of each line that the one-class classification approach shows little change even when the imbalance in the training dataset is increased to a great extent. SVM, a popular binary classifier, showed considerable performance initially but deteriorated once higher levels of imbalance were introduced. The slopes for GESD, which is a purely statistical approach, and for one-class classification, which does not rely on outlier training samples, remained nearly constant, showing that they are not affected by increasing imbalance in the training datasets. This suggests that the one-class classifier can be used as a reliable technique when the class imbalance in the training data is very high.
5. Conclusion

One-class classification becomes significant when the classes in a conventional classification problem are highly imbalanced in the (training) data, i.e. when one of the classes is severely under-represented because of the measuring costs for that class or its low frequency of occurrence. It may also be entirely unclear what the representative distribution of the data is. Multi-class classification methods in general require a training set containing samples of both the legitimate data and the anomalies; since most real-time data contain few or no instances of anomalies, these methods fail in most cases, and their accuracy deteriorates as the number of available anomalies decreases. The outlier detection methods, for their part, have high false positive rates and are not reliable in real-time scenarios. Hence, one-class classification proves to be the best approach for real-time scenarios where anomalies occur rarely and data about them cannot be obtained. Feature selection [17][18][20] can be incorporated into one-class classification to provide better, more nearly optimal results; it also helps to discard unimportant parameters and to increase the accuracy rate. Selective sampling [19] can likewise be used to reduce cost by labelling only the important instances.
References
[1] R. Brause, T. Langsdorf and M. Hepp, "Neural Data Mining for Credit Card Fraud Detection", in Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence, pages 103-106, Washington, DC, USA, 1999.
[2] F. A. González and D. Dasgupta, "Anomaly Detection Using Real-Valued Negative Selection", in Genetic Programming and Evolvable Machines, Volume 4, Issue 4, pages 383-403, Kluwer Academic Publishers, Hingham, MA, USA, December 2003.
[3] M. Markou and S. Singh, "Novelty Detection: A Review - Part 1: Statistical Approaches", Signal Processing, Volume 83, Issue 12, pages 2481-2497, December 2003.
[4] M. Markou and S. Singh, "Novelty Detection: A Review - Part 2: Neural Network Based Approaches", Signal Processing, Volume 83, Issue 12, pages 2499-2521, December 2003.
[5] V. Chandola, A. Banerjee and V. Kumar, "Anomaly Detection: A Survey", in ACM Computing Surveys, Volume 41, Number 3, Article 15, ACM, New York, USA, July 2009.
[6] D. Barbará, N. Wu and S. Jajodia, "Detecting novel network intrusions using Bayes estimators", in Proceedings of the First SIAM Conference on Data Mining, Chicago, April 2001.
[7] Ethem Alpaydin, Introduction to Machine Learning, 2nd edition, pages 109-112 and 489-493, The MIT Press, Cambridge, Massachusetts, London, England, 2010.
[8] Christopher M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1996.
[9] Corinna Cortes and Vladimir Vapnik, "Support-vector networks", Machine Learning 20, pages 273-297, 1995.
[10] Piyaphol Phoungphol, Yanqing Zhang, Yichuan Zha, and Bismita Srichandan, "Multiclass SVM with Ramp Loss for Imbalanced Data Classification", IEEE International Conference on Granular Computing, 2012.
[11] Yuchun Tang, Bo Jin, Yi Sun, and Yan-Qing Zhang, "Granular Support Vector Machines for Medical Binary Classification Problems", IEEE, 2004.
[12] Yuchun Tang, Yan-Qing Zhang, Nitesh V. Chawla, and Sven Krasser, "SVMs Modeling for Highly Imbalanced Classification", IEEE, 2009.
[13] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin, "A Practical Guide to Support Vector Classification", 2010.
[14] David Martinus Johannes Tax, One-Class Classification: Concept-Learning in the Absence of Counter-Examples, ISBN 90-75691-05-x, 2001.
[15] Zineb Noumir, Paul Honeine, Cédric Richard, "On Simple One-Class Classification Methods", 2012 IEEE International Symposium on Information Theory Proceedings, IEEE, 2012.
[16] Defeng Wang, Daniel S. Yeung, "Structured One-Class Classification", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 36, No. 6, December 2006.
[17] Young-Seon Jeong, In-Ho Kang, Myong-Kee Jeong, Dongjoon Kong, "A New Feature Selection Method for One-Class Classification Problems", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 42, No. 6, November 2012.
[18] George Gomes Cabral, Adriano Lorena Inácio de Oliveira, "A Novel One-Class Classification Method Based on Feature Analysis and Prototype Reduction", IEEE, 2011.
[19] Piotr Juszczak, Robert P. W. Duin, "Selective Sampling Methods in One-Class Classification Problems", O. Kaynak et al. (Eds.): ICANN/ICONIP 2003, LNCS 2714, pp. 140-148, 2003.
[20] David M. J. Tax, Klaus-R. Müller, "Feature Extraction for One-Class Classification", O. Kaynak et al. (Eds.): ICANN/ICONIP 2003, LNCS 2714, pp. 342-349, 2003.
[21] Kathryn Hempstalk, Eibe Frank, "Discriminating Against New Classes: One-Class versus Multi-Class Classification", W. Wobcke and M. Zhang (Eds.): AI 2008, LNAI 5360, pp. 325-336, 2008.
[22] Kenneth Kennedy, Brian Mac Namee, Sarah Jane Delany, "Learning without Default: A Study of One-Class Classification and the Low-Default Portfolio Problem", L. Coyle and J. Freyne (Eds.): AICS 2009, LNAI 6206, pp. 174-187, 2010.
[23] Knowledge Extraction based on Evolutionary Learning (KEEL) Datasets: http://sci2s.ugr.es/keel/datasets.php.