International Journal of Computational Intelligence and Information Security, June 2013, Vol. 4, No. 6, ISSN: 1837-7823
Significance of One-Class Classification in Outlier Detection
Anandkumar Prakasam 1
Nickolas Savarimuthu 2
1 ROOT Research Consultancy, Tiruchirappalli, Tamilnadu, India. [email protected]
2 Professor, National Institute of Technology, Tiruchirappalli, Tamilnadu, India.
Abstract
Outlier or novelty detection is one of the important aspects of the current learning scenario: it helps to discover unknown knowledge from the available data. Novelty detection is also called outlier detection, since in some areas these data are considered out of the ordinary and need to be eliminated. The current paper discusses various methods for detecting novelties and compares them to determine the best method for novelty detection. The paper considers the Generalized Extreme Studentized Deviate (GESD) test, an outlier detection method; SVM, a binary classifier; the Naïve Bayes classifier, a multi-class classifier; and a one-class classification method. Results reveal that the one-class classification method provides the best results in most scenarios, where the available training data for the outliers is minimal and sometimes not available at all.
Keywords
One-Class Classification; GESD; SVM; Naïve Bayes; Outlier Detection; Novelty Detection; Classification
1. Introduction

Multi-class classification is one of the most widely used techniques in data mining. However, sometimes it is not necessary to classify the data into multiple classes. If there is only one class that we are interested in, it is sufficient that this specific class is separated from the rest of the data. This kind of data mining is called one-class classification. Usually, in one-class classification, the data instances that do not belong to the normal data, or to the majority of the data, are separated out.
For example, credit card fraud [1] or intrusion detection [2] can be regarded as anomaly detection, while the detection of previously unobserved patterns in data can be regarded as novelty detection [3][4]. Such novelty detection techniques can be used, for example, for detecting a new discussion topic in newsgroups. The difference between anomaly and novelty detection is that the novelty detection method often incorporates the discovered novelty patterns into the model [5]. One-class classification can be considered, for example, when the number of anomalies is much smaller than the number of normal data instances [21][22].
1.1 Anomaly Detection Techniques

Anomaly detection techniques can be divided into four categories: classification-based, distance-based, statistical and other techniques. Classification-based anomaly detection techniques use a model to predict whether a test instance is normal or anomalous. In general, a set of training data is provided, and the system is then given actual data on which to perform the classification. If there exist multiple normal classes, the technique is considered multi-class [6]; if there is only one normal class, it is considered a one-class anomaly detection technique [5]. Both of these classifiers are trained with only the normal data instances, so they belong to the semi-supervised learning methods. Typical classification-based anomaly detection techniques are, for example, Bayesian networks [7], neural networks [8], rule-based techniques [7] and support vector machines [9]. Distance-based techniques use the distance between points as the basic measure for detecting anomalies. Statistical techniques fit statistical models to a given data set, use them to assign probabilities to test instances, and declare the instances anomalous or normal.
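As a concrete illustration of a classification-based detector trained on labelled data, the following is a minimal sketch using scikit-learn's Gaussian Naïve Bayes; the toy data and the 0/1 labelling are assumptions for illustration, not the paper's setup:

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Toy imbalanced data: class 1 plays the role of the anomalous class.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

nb = GaussianNB().fit(X[:800], y[:800])   # train on labelled instances
pred = nb.predict(X[800:])                # classify held-out instances
print((pred == 1).sum(), "test instances flagged as anomalous")
```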
The support vector machine (SVM) is a method used in pattern recognition and classification. It is a classifier that assigns patterns to one of two categories, for example fraudulent or non-fraudulent, and is therefore well suited to binary classification. Like any machine-learning tool, it has to be trained to obtain a learned model. SVM has been used in many classification and pattern recognition problems such as text categorization and face detection, and it builds on the foundations of non-parametric applied statistics, neural networks and machine learning [10][11][12][13].
Weighted SVM implements cost-sensitive learning. Like the standard SVM, the weighted SVM maximizes the margin of separation and minimizes the classification error, with the margin boundary separating the classes. In a cost-sensitive SVM (CS-SVM), different weights are assigned to the classes; an effective decision boundary is learned by adjusting these weights, which improves the prediction accuracy.
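A minimal cost-sensitive SVM sketch, assuming scikit-learn and a synthetic imbalanced problem; the specific weight values are illustrative, not taken from the paper:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Imbalanced toy problem: roughly 5% of instances belong to class 1 (outliers).
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

# class_weight raises the misclassification cost of the rare class, which
# shifts the learned margin boundary toward the majority class.
clf = SVC(kernel="rbf", class_weight={0: 1.0, 1: 19.0}).fit(X, y)
```

Passing class_weight="balanced" instead derives the weights automatically from the class frequencies.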
All anomaly detection techniques have their strengths and weaknesses, and no single technique is suitable for every situation; one has to analyse each technique's suitability for the particular application and choose accordingly.
In the current paper, we provide a comparative study of the GESD (Generalized Extreme Studentized Deviate) test, Naïve Bayes and SVM classifiers against one-class classifiers. The rest of this paper is organized as follows. Section 2 provides an overview of the one-class classification technique used for the comparison study, Section 3 describes the datasets used, Section 4 presents the experimental results and Section 5 concludes the study.
2. One-Class Classification: An Overview

Traditional classification methods have always been those that use data from all classes to build models. Such models are discriminatory in nature, since they learn to discriminate between classes. However, many real-world situations are such that it is only possible to have data from one class, the target class; data from the other classes, the outlier classes, is either very difficult or impossible to obtain. Examples of such problems include fraud detection, medicine, machine fault detection, wireless sensor networks, intrusion detection, and object recognition tasks such as face detection. These are the situations where one-class classification plays a major role in the detection of anomalies.
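The experiments in this paper use the simple one-class methods of [15]; as a self-contained illustration of the idea of training on the target class only, here is a sketch using a one-class SVM (scikit-learn assumed; the Gaussian toy data is hypothetical):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_target = rng.normal(0.0, 1.0, size=(500, 2))             # target class only
X_test = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)),    # unseen target points
                    rng.normal(6.0, 1.0, size=(5, 2))])    # far-away outliers

# nu upper-bounds the fraction of training points treated as outliers.
occ = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_target)
print(occ.predict(X_test))   # +1 = accepted as target, -1 = rejected as outlier
```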
…correctly, good performance can be achieved. These parameters are often called "magic parameters", because they often have a large influence on the final performance and no clear rules are given for how to set them. Such numbers cannot be given intuitively beforehand; a reasonable network size is found only by trial and error.
Computation and storage requirements: A final consideration is the computational requirements of the methods. Although computers offer more power and storage every year, methods that require several minutes for the evaluation of a single test object may be unusable in practice. Since training is often done off-line, training costs are often not very important; however, when the method must be adapted to a changing environment, these training costs can become very important. The most straightforward way to obtain a one-class classifier is to estimate the density of the training data and to set a threshold on this density.
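A minimal sketch of this density-plus-threshold approach, assuming scikit-learn's kernel density estimator; the bandwidth and the 5% rejection rate are illustrative choices:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_density_occ(X_train, rejection_rate=0.05, bandwidth=0.5):
    """Fit a density model on target-class data and pick a threshold so that
    roughly `rejection_rate` of the training data falls below it."""
    kde = KernelDensity(bandwidth=bandwidth).fit(X_train)
    log_density = kde.score_samples(X_train)
    threshold = np.quantile(log_density, rejection_rate)
    return kde, threshold

def predict_occ(kde, threshold, X):
    # +1 = target class (dense region), -1 = outlier (low-density region)
    return np.where(kde.score_samples(X) >= threshold, 1, -1)

rng = np.random.default_rng(0)
kde, thr = fit_density_occ(rng.normal(size=(300, 2)))
# A point near the data should be accepted; a far-away point rejected.
print(predict_occ(kde, thr, np.array([[0.0, 0.0], [8.0, 8.0]])))
```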
In our approach, we have used GESD for outlier detection, the SVM and Naïve Bayes classifiers for multi-class classification, and the one-class classification method described in [15] for the analysis of outliers.
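For reference, here is a sketch of the GESD test itself, following the standard formulation of the statistic and its critical value (NumPy/SciPy assumed; this is not the paper's exact implementation):

```python
import numpy as np
from scipy import stats

def gesd_outliers(x, max_outliers, alpha=0.05):
    """Generalized Extreme Studentized Deviate test. Returns indices of the
    flagged points, assuming the inliers are approximately normal."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    remaining = list(range(n))
    removed, significant = [], []

    for i in range(1, max_outliers + 1):
        sub = x[remaining]
        dev = np.abs(sub - sub.mean())
        j = int(np.argmax(dev))
        R_i = dev[j] / sub.std(ddof=1)       # test statistic R_i
        removed.append(remaining.pop(j))     # drop the most extreme point

        # Critical value lambda_i from the t-distribution
        p = 1.0 - alpha / (2.0 * (n - i + 1))
        t = stats.t.ppf(p, df=n - i - 1)
        lam = (n - i) * t / np.sqrt((n - i - 1 + t * t) * (n - i + 1))
        significant.append(R_i > lam)

    # Number of outliers = largest i for which R_i exceeds lambda_i
    k = max((i + 1 for i, s in enumerate(significant) if s), default=0)
    return removed[:k]

data = np.concatenate([np.random.default_rng(0).normal(size=100), [8.0, 9.5]])
print(gesd_outliers(data, max_outliers=5))   # indices of the flagged points
```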
3. Dataset Description

3.1 KEEL Dataset

The KEEL datasets [23] available under the Classification category were used for the analysis. Each dataset includes target instances as well as outlier instances. All the datasets are binary problems and contain no missing attribute values. No categorical attributes are considered; all attributes available in the datasets are numerical.
Table 1: Dataset description

Name            No. of instances
Banana          5300
Phoneme         5404
Appendicitis    106
Titanic         2201
Mammographic    830
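KEEL distributes these datasets as .dat files with an ARFF-like header; a minimal loader sketch, assuming the standard @attribute/@data header layout and comma-separated data rows (the helper name is ours):

```python
import pandas as pd

def load_keel_dat(path):
    """Load a KEEL .dat file, assuming '@attribute Name type' header lines
    followed by a '@data' marker and comma-separated rows."""
    names, data_start = [], 0
    with open(path) as f:
        for i, line in enumerate(f):
            if line.lower().startswith("@attribute"):
                names.append(line.split()[1])   # attribute name is the 2nd token
            elif line.lower().startswith("@data"):
                data_start = i + 1
                break
    return pd.read_csv(path, skiprows=data_start, header=None,
                       names=names, skipinitialspace=True)

# Example (file path hypothetical): df = load_keel_dat("banana.dat")
```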
4. Experimental Results and Discussion

4.1 Results on the KEEL Datasets

Analysis of the results on the various datasets shows that the one-class classifiers detect more outliers than the other classification methods. While the SVM shows almost similar performance, it remains slightly below the one-class classifiers. Figures 1-5 show the performance of GESD, Naïve Bayes, SVM and one-class classifiers on the various datasets.
Figure 1: Banana dataset (Y-axis: number of outliers detected; series: GESD, Naïve Bayes, SVM, OCC)

Figure 2: Phoneme dataset (Y-axis: number of outliers detected; series: GESD, Naïve Bayes, SVM, OCC)
Figure 3: Appendicitis dataset (Y-axis: number of outliers detected; series: GESD, Naïve Bayes, SVM, OCC)

Figure 4: Titanic dataset (Y-axis: number of outliers detected; series: GESD, Naïve Bayes, SVM, OCC)
Figure 5: Mammographic dataset (Y-axis: number of outliers detected; series: GESD, Naïve Bayes, SVM, OCC)
The experiments were then repeated while varying the number of outliers available during the training phase and observing the performance of each method. Since GESD is a statistics-based outlier detection method, its performance deteriorated only slightly. Naïve Bayes and SVM, by contrast, rely completely on the training data when classifying; hence, as the number of outliers in the training data is reduced, their detection rates drop drastically. The one-class classifier relies entirely on the normal data and not on the anomalies, so its drop in detection rate is very small compared to the other methods.
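A sketch of this protocol, assuming scikit-learn and synthetic data; the subsample_outliers helper is hypothetical, and the 100% to 10% sweep mirrors Figures 6-10:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

def subsample_outliers(X, y, frac, rng):
    """Keep every normal instance (class 0), keep only `frac` of the
    outlier class (class 1)."""
    normal = np.flatnonzero(y == 0)
    outlier = np.flatnonzero(y == 1)
    keep = rng.choice(outlier, size=max(1, int(frac * len(outlier))),
                      replace=False)
    idx = np.concatenate([normal, keep])
    return X[idx], y[idx]

rng = np.random.default_rng(0)
for pct in range(100, 0, -10):              # 100% .. 10% of the outliers
    X_tr, y_tr = subsample_outliers(X, y, pct / 100, rng)
    clf = SVC().fit(X_tr, y_tr)             # stand-in for each compared method
    detected = int((clf.predict(X) == 1).sum())
    print(pct, detected)                    # detection count vs. imbalance
```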
Figure 6: Banana dataset (X-axis: percentage of outliers in the training dataset; Y-axis: number of outliers detected)
Figures 6-10 show the detection rates of the various algorithms as the imbalance in the training dataset is increased. The analysis shows consistent performance for the one-class classifier, while deviations are observed for the other detection methods.
Figure 7: Phoneme dataset (X-axis: percentage of outliers in the training dataset; Y-axis: number of outliers detected)

Figure 8: Appendicitis dataset (X-axis: percentage of outliers in the training dataset; Y-axis: number of outliers detected)
Figure 9: Titanic dataset (X-axis: percentage of outliers in the training dataset; Y-axis: number of outliers detected)

Figure 10: Mammographic dataset (X-axis: percentage of outliers in the training dataset; Y-axis: number of outliers detected)
4.2 Discussion

From Figures 6-10, where the detection rates of the various methods are analysed and compared, we can infer from the slope of each line that the one-class classification approach shows little change even when the imbalance in the training dataset is increased to a great extent. SVM, a popular binary classifier, showed considerable performance initially but deteriorated once higher levels of imbalance were introduced. The slopes for GESD, which is a purely statistical approach, and for one-class classification, which does not rely on outlier training samples, remained nearly constant, showing that they are not affected by increasing imbalance in the training datasets. This suggests that the one-class classifier can be used as a reliable technique when the class imbalance in the training data is very high.
5. Conclusion

One-class classification becomes significant when the classes in a conventional classification problem are highly imbalanced in the (training) data, i.e. when one of the classes is severely under-represented because of the measuring costs for that class or its low frequency of occurrence. It may also be entirely unclear what the representative distribution of the data is. Multi-class classification methods in general require a training set containing samples of both the legitimate data and the anomalies; since most real-time data contain few or no instances of anomalies, these methods fail in most cases, and their accuracy deteriorates as the number of available anomalies decreases. The outlier detection methods, for their part, have high false positive rates and are not reliable in real-time scenarios. Hence, one-class classification proves to be the best approach for real-time scenarios where anomalies occur rarely and data about them cannot be obtained. Feature selection [17][18][20] can be incorporated into one-class classification to provide better, more nearly optimal results; it also helps to discard unimportant parameters and to increase the accuracy rate. Selective sampling [19] can likewise be used to reduce cost by labelling only the important instances.
References
[1] R. Brause, T. Langsdorf and M. Hepp, "Neural Data Mining for Credit Card Fraud Detection", in Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence, pages 103-106, Washington, DC, USA, 1999.
[2] F. A. González and D. Dasgupta, "Anomaly Detection Using Real-Valued Negative Selection", in Genetic Programming and Evolvable Machines, Volume 4, Issue 4, pages 383-403, Kluwer Academic Publishers, Hingham, MA, USA, December 2003.
[3] M. Markou and S. Singh, "Novelty Detection: A Review - Part 1: Statistical Approaches", Signal Processing, Volume 83, Issue 12, pages 2481-2497, December 2003.
[4] M. Markou and S. Singh, "Novelty Detection: A Review - Part 2: Neural Network Based Approaches", Signal Processing, Volume 83, Issue 12, pages 2499-2521, December 2003.
[5] V. Chandola, A. Banerjee and V. Kumar, "Anomaly Detection: A Survey", in ACM Computing Surveys, Volume 41, Number 3, Article 15, ACM, New York, USA, July 2009.
[6] D. Barbará, N. Wu and S. Jajodia, "Detecting novel network intrusions using Bayes estimators", in Proceedings of the First SIAM Conference on Data Mining, Chicago, April 2001.
[7] Ethem Alpaydin, Introduction to Machine Learning, 2nd edition, pages 109-112 and 489-493, The MIT Press, Cambridge, Massachusetts, London, England, 2010.
[8] Christopher M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1996.
[9] Corinna Cortes and Vladimir Vapnik, "Support-vector networks", Machine Learning 20, pages 273-297, 1995.
[10] Piyaphol Phoungphol, Yanqing Zhang, Yichuan Zha, and Bismita Srichandan, "Multiclass SVM with Ramp Loss for Imbalanced Data Classification", IEEE International Conference on Granular Computing, 2012.
[11] Yuchun Tang, Bo Jin, Yi Sun, and Yan-Qing Zhang, "Granular Support Vector Machines for Medical Binary Classification Problems", IEEE, 2004.
[12] Yuchun Tang, Yan-Qing Zhang, Nitesh V. Chawla, and Sven Krasser, "SVMs Modeling for Highly Imbalanced Classification", IEEE, 2009.
[13] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin, "A Practical Guide to Support Vector Classification", 2010.
[14] David Martinus Johannes Tax, One-Class Classification: Concept-Learning in the Absence of Counter-Examples, ISBN 90-75691-05-x, 2001.
[15] Zineb Noumir, Paul Honeine, Cédric Richard, "On Simple One-Class Classification Methods", 2012 IEEE International Symposium on Information Theory Proceedings, IEEE, 2012.
[16] Defeng Wang, Daniel S. Yeung, "Structured One-Class Classification", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 36, No. 6, December 2006.
[17] Young-Seon Jeong, In-Ho Kang, Myong-Kee Jeong, Dongjoon Kong, "A New Feature Selection Method for One-Class Classification Problems", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 42, No. 6, November 2012.
[18] George Gomes Cabral, Adriano Lorena Inácio de Oliveira, "A Novel One-Class Classification Method Based on Feature Analysis and Prototype Reduction", IEEE, 2011.
[19] Piotr Juszczak, Robert P. W. Duin, "Selective Sampling Methods in One-Class Classification Problems", O. Kaynak et al. (Eds.): ICANN/ICONIP 2003, LNCS 2714, pp. 140-148, 2003.
[20] David M. J. Tax, Klaus-R. Müller, "Feature Extraction for One-Class Classification", O. Kaynak et al. (Eds.): ICANN/ICONIP 2003, LNCS 2714, pp. 342-349, 2003.
[21] Kathryn Hempstalk, Eibe Frank, "Discriminating Against New Classes: One-Class versus Multi-Class Classification", W. Wobcke and M. Zhang (Eds.): AI 2008, LNAI 5360, pp. 325-336, 2008.
[22] Kenneth Kennedy, Brian Mac Namee, Sarah Jane Delany, "Learning without Default: A Study of One-Class Classification and the Low-Default Portfolio Problem", L. Coyle and J. Freyne (Eds.): AICS 2009, LNAI 6206, pp. 174-187, 2010.
[23] Knowledge Extraction based on Evolutionary Learning (KEEL) Datasets: http://sci2s.ugr.es/keel/datasets.php.