
Optimal Feature Selection for Cluster Based Ensemble Classifier Using Meta-Heuristic Function for Medical Disease Data Classification for Symptom Prediction

Deepesh Yadav
Department of Computer Science & Engineering, Samrat Ashok Technological Institute, Vidisha

Abstract. The diversity and applicability of data mining grow day by day in medical science, particularly for predicting the symptoms of disease. Data mining offers many techniques for mining data in several fields, including association rule mining, clustering, classification, and, more recently, ensemble classification. An ensemble classifier increases the classification rate and improves on the majority voting of individual classification algorithms such as KNN, decision trees, and support vector machines. A new paradigm of ensemble classification is the cluster-oriented ensemble technique.

This paper applies classification based on classifier selection to medical disease data and proposes a clustering-based classifier selection method. In this method, several clusters are selected for the ensemble process; the average performance of each classifier on the selected clusters is then calculated, and the classifier with the best average performance is chosen to classify the given data. A weighted average is used in this computation, with weights derived from the distances between the given data and each selected cluster. There are generally two types of multiple-classifier combination: classifier selection and classifier fusion. Classifier selection assumes that each classifier has expertise in some local regions of the feature space and attempts to find the classifier with the highest local accuracy in the vicinity of an unknown test sample; that classifier is then nominated to make the final decision of the system.

The performance of a classifier is usually the most important aspect of its value and is measured with a variety of well-known methods and metrics. The interpretability of a classifier, by contrast, is often treated as less important or even neglected. Yet it is vital for users, who trust a classifier more when they can understand how it works, and because additional knowledge about the relations in the observed data can be extracted from an interpretable classifier. Consequently, some earlier methods focus on interpretable learned classifiers or on transforming opaque classifiers into human-readable structures. There is a lack of algorithms that treat the accuracy and interpretability of classifiers as equally significant, turning classifier construction into a heuristic optimization problem. Such algorithms matter especially in domains where parts of the attribute space can be classified with high accuracy by an interpretable classifier while other parts require opaque classifiers to reach the required accuracy.

Combining different clustering outputs (cluster ensembles, or clustering aggregation) has emerged as an alternative approach for improving the quality of the results of clustering algorithms, building on the success of combining supervised classifiers. Given a set of objects, a cluster ensemble method consists of two principal steps: generation, the creation of a set of partitions of these objects, and the consensus function, which computes a new partition integrating all partitions obtained in the generation step. Over the past years, many clustering ensemble techniques have been proposed, opening new ways to approach the problem and new fields of application. Beyond presenting the main methods, introducing a taxonomy of the different tendencies and critically comparing the methods is


essential for giving a survey practical value. Thus, given the importance that clustering ensembles have gained in cluster analysis, we have made a critical study of the different approaches and the existing methods.

Feature selection is used to select a subset of relevant features from the dataset in order to build robust classification models; classification accuracy improves when the most irrelevant and redundant features are removed. An ensemble model is proposed to improve classification accuracy by combining the predictions of multiple classifiers. In this paper we use a cluster-based ensemble classifier. The performance of each classifier and of the ensemble model is evaluated with statistical measures such as accuracy, specificity, and sensitivity.

Classification of medical data is an important task in the prediction of any disease, and it can even help doctors in their diagnostic decisions. A cluster-oriented ensemble classifier generates a set of classifiers instead of a single classifier for the classification of a new object, in the hope that combining the answers of multiple classifications yields better performance.

We demonstrate the algorithmic use of the classification technique by extending SVM, one of the most popular binary classification algorithms. From the studies above, the key to improving a cluster-oriented classifier is to improve binary classification. In the final part of this work, we include an empirical evaluation aimed at understanding binary classification better in the context of ensemble learning.

Keywords: cluster and classifier ensembles, genetic algorithms, drifting data streams, mining data streams

I. INTRODUCTION

Clustering and classification play an important role in the data mining and machine learning paradigms, and ensemble classifiers offer a great advantage over binary and conventional classifiers. Prototype classification combines two or more methods of the same nature, and prototype classification of data leads naturally to the cluster-oriented ensemble classifier. Online transaction data calls for stream classification, which saves computation time and network storage. For medical disease data classification, various machine learning algorithms are used, such as clustering, classification, and neural networks. In the classification of a growing data stream, either the short-term or the long-term behaviour of the stream may be the more significant, and it often cannot be known a priori which one matters more. We must therefore decide the window, or horizon, of training data to use so as to obtain the best classification accuracy; for the proper selection of window and horizon we use an ensemble classifier supported by a clustering technique. In ensemble methods, the main strategy is to maintain a dynamic set of classifiers: when a decrease in performance is observed, new classifiers are fused into the ensemble while old and poorly performing ones are removed. For classification, the decisions of the members of the ensemble are combined, usually with a cluster scheme [10]. The advantage of ensembles over single classifiers in the data stream classification problem has been demonstrated empirically and theoretically [1, 2]. However, few ensemble methods have been designed with the problem of recurring contexts in mind [3, 4]. Specifically, in problems where concepts re-occur, models of the ensemble should be kept in memory even if they do not perform well on the latest batch of data. Moreover, every classifier should be specialized in a unique concept, meaning that it should be trained on data belonging to that concept and used for classifying similar data. In [5, 7], a methodology is described that identifies concepts by grouping classifiers of similar performance over specific time intervals. Clusters are then assigned to classifiers according to performance on the latest batch of data, and predictions are made by weighted averaging. Although this strategy fits the recurring contexts problem very well, it has an offline step for the discovery of concepts that is not suitable for data streams; in particular, the framework will probably be inaccurate on concepts that did not appear in the training set. Real-world datasets must be classified even when features from different classes overlap, and learning the class borders between overlapping class features is a hard problem. Extreme training of the base classifiers leads to an accurately trained decision border but results in overfitting, and thus in misclassifying instances of test data. On the other hand, learning


generalized boundaries avoids overfitting, but at the cost of always misclassifying some overlapping features. This problem of learning the class boundaries of overlapping features is inherent in all the base classifiers and propagates to the decision fusion stage, even when the base classifier errors are uncorrelated. This is where clustering comes in. Clustering partitions a data set into multiple groups, each containing data points that are close in Euclidean space, and the resulting clusters have well-defined, easy-to-learn boundaries. Assume the patterns are labeled with their cluster number [8]. If the base classifiers are trained on this modified data set, they will learn the cluster boundaries, and because those boundaries are well defined and easy to learn, the base classifiers can learn them with high accuracy. Clusters may still contain overlapping features from multiple classes. The cluster-oriented ensemble classifier thus performs a great role in classification, but the process of selecting clusters is not well defined. We remove this shortcoming with ant colony optimization, a well-known multi-objective meta-heuristic used in feature optimization; a minimal sketch of the relabeling idea follows.
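To make the relabeling idea concrete, here is a short sketch in Python. It is our illustration, not the paper's MATLAB implementation: it assumes scikit-learn, uses k-means to relabel training points with their cluster index, and fits a KNN base learner on those easier-to-learn cluster boundaries; the parameter values are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def fit_cluster_oriented_base(X_train, n_clusters=5, random_state=0):
    """Relabel training data with k-means cluster indices and fit a base
    classifier on the cluster labels (illustrative sketch only)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    cluster_labels = km.fit_predict(X_train)   # each point gets its cluster id
    base = KNeighborsClassifier(n_neighbors=3)
    base.fit(X_train, cluster_labels)          # base learner models cluster boundaries
    return km, base
```

A fusion stage would then map each predicted cluster, which may contain points of several classes, to a class decision, for example by majority vote over the class labels observed inside that cluster.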

The rest of this paper is organized as follows: Section II reviews previous work; Section III presents the proposed methodology for the cluster ensemble technique; Section IV reports the experimental results; and Section V concludes the paper.

II. RELATED WORK

Medical diagnosis is a challenging domain: complex relationships can exist among the diagnostic features that map to a resulting diagnosis of the disease condition. In different cases the disease condition itself can pass through stages in which the diagnostic symptoms or signs are subtle or differ from those of other stages, which means there is often no clean mapping between the diagnostic features and the diagnosis. Combining clusterings is a more challenging task than combining supervised classifications: in the absence of labeled training data, we face a difficult correspondence problem between cluster labels in different partitions of an ensemble. Recent studies have demonstrated that a consensus clustering can be found using graph-based, statistical, or information-theoretic methods without explicitly solving the label correspondence problem, and other empirical consensus functions have also been considered. However, the consensus clustering problem is known to be NP-complete. Clustering provides a widely used mechanism for organizing data into similar groupings, and its usage has been extended to classifiers and detection systems in order to improve detection and provide greater classification accuracy. Stability of a clustering algorithm with respect to small perturbations of the data and to different initializations is a desirable quality. Cluster ensembles, on the other hand, enforce and exploit some instability so that the ensemble comprises diverse clusterings; although built on unstable components, the ensemble is expected to be more accurate and robust than the individual clustering method. Stability-based validity indices are not bound to the clustering method used for partitioning the perturbed data and, more importantly, make no implicit assumption about the clusters' shape and size. This makes stability-based indices well suited to cluster ensembles, whose main claim is precisely that the obtained clusters can be of any shape and size. The problem is that the assumption that stability corresponds to high accuracy may not always hold. In recent years, various consensus functions have been defined and coupled with heuristic algorithms for maximizing them. Heuristic clustering ensemble algorithms commonly follow three main approaches: instance-based, cluster-based, and hybrid clustering ensembles. Instance-based methods require a distance measure, possibly exploiting information available from the ensemble, to group the data objects. Cluster-based methods follow the principle of "clustering the clusters", i.e. applying a clustering algorithm to the set of all clusters produced by the clustering solutions in the ensemble in order to compute a set of meta-clusters; finally, the consensus partition is computed by assigning each data object to the meta-cluster that maximizes some assignment criterion (e.g. majority voting). Hybrid methods combine ideas from both the instance-based and the cluster-based approaches. A common limitation of all these approaches is that most of the


consensus functions proposed in the literature weigh the various clustering solutions in the ensemble equally. This is clearly a weak assumption, for several reasons: an ensemble may comprise very different clusterings, and clusters that are somehow correlated with each other may appear in distinct clusterings of the ensemble. As a consequence, treating the constituent solutions of an ensemble equally and averaging over them to extract the consensus partition may not be effective.

2.1 Previous Work Done

A great deal of work has been done in the fields of clustering and ensemble classification, yet problems remain when dealing with huge amounts of data. Huge data volumes and drifting concepts are not unknown to the data mining community; one goal of traditional data mining algorithms is to mine models from large databases.

Brijesh Verma and Ashfaqur Rahman, in "Cluster-Oriented Ensemble Classifier: Impact of Multi-cluster Characterization on Ensemble Classifier Learning", present a novel cluster-oriented ensemble classifier [1]. It is based on the original concept of having cluster boundaries learned by the base classifiers and cluster confidences mapped to class decisions by a fusion classifier. In this approach an ensemble classifier is constructed from a set of base classifiers that learn the class boundaries separately over the patterns. Learning class boundaries between overlapping classes is difficult, because this problem is inherent in all the base classifiers; hence the concept of clustering comes into play. Clustering is the process of separating an item set into multiple groups. If the patterns are labeled with their cluster number and the base classifiers are trained on the modified data set, the base classifiers will learn the cluster boundaries. To obtain better accuracy from the ensemble classifier, the data are partitioned into multiple clusters and the cluster decisions produced by the base classifiers are combined into a class decision by a fusion classifier.

Anne-Laure Bianne-Bernard, Farès Menasri, Rami Al-Hajj Mohamad, Chafic Mokbel, Christopher Kermorvant, and Laurence Likforman-Sulem study the construction of an efficient word recognition system from the combination of three handwriting recognizers [12]. The main component of this combined system is an HMM-based recognizer that considers dynamic and contextual information for better modeling of writing units. Popular applications of handwriting recognition include bank check processing, reading mailed envelopes, and handwritten text recognition in documents and videos, for which different systems have been successfully developed. The scope of recognition can be extended to reading letters sent to companies or public offices, since there is demand to sort, search, and automatically answer mail based on document content. However, to read most words of a handwritten letter, recognition systems must cope with the inherent variability of handwriting and deal with a large number of word and character models. Stochastic modeling such as hidden Markov models (HMMs) has proven effective for modeling handwriting; HMMs handle unconstrained words well because they can cope with nonlinear distortions. HMM systems can be divided into two types, depending on their strategy: holistic or analytical. The aim of the study is to build an efficient word recognition system from the combination of three HMM-based recognizers. The first is a context-independent HMM-based system that includes dynamic information; the second is an original sliding-window system that aims at enhancing accuracy and reducing decoding time. The authors also introduce a novel approach to building efficient context-dependent word models within the HMM framework, whose key features are the use of dynamic features and state-based clustering.

Leo Breiman introduces bagging predictors [3], a method for generating multiple versions of a predictor and using these to obtain an aggregated predictor. The aggregation averages over the versions when


predicting a numerical outcome and takes a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets, using classification and regression trees and subset selection in linear regression, show that bagging can give substantial gains in accuracy.

Nandita Tripathi, Stefan Wermter, Chihli Hung, and Michael Oakes present a technique that uses the maximum significance value to detect a semantic subspace [7]. Subspace detection and processing is receiving increasing attention as a way to speed up search and reduce processing overload. Subspace learning algorithms try to detect low-dimensional subspaces in the data that minimize the intra-class separation while maximizing the inter-class separation; such methods are therefore increasingly researched and applied to web document classification, image recognition, and data clustering. This work explores semantic subspace learning with the overall objective of improving document retrieval in a vast document space.

III. PROPOSED METHODOLOGY FOR CLUSTER ENSEMBLE TECHNIQUE

Optimal cluster selection is an important part of a cluster-oriented classifier. The cluster-oriented ensemble classifier (COEC) is an efficient process for classifying a data set, and it generates better results than plain clustering methods when loss of data is taken into consideration; but as the scale of the data set increases, so does the complexity of classification, and it becomes difficult to select the optimal number of clusters for each individual classifier of the data set. We introduce a new feature subset selection method for finding a similarity matrix for clustering without altering the cluster-oriented classifier. The proposed method is based on ant colony optimization (ACO), a very well-known meta-heuristic search function, here used for finding the similarity of data. In this method, ants move continuously, gathering similar features and carrying dissimilar features to the next node; in this process ACO finds an optimal selection of the feature subset. Suppose the ants find similar features along a continuous route: every ant compares the property values of its features against the initial feature set. When deciding which data cluster to select, two factors should be considered: the importance degree and the easiness degree of the cluster. While walking, ants deposit pheromone on the ground according to the importance of the outlier and follow, with a probability depending on the pheromone previously laid by other ants and on the easiness degree of the noise. In the proposed work we discuss the COB model, ant colony optimization, and the binary classifier support vector machine.

3.1 Proposed Approach For Cluster Ensemble Classifier

The proposed clustering ensemble method is based on ant colony optimization, in which the consensus clustering is found by selecting an optimal cluster criterion function with an ant colony algorithm. The method uses a metric between clusterings based on the ants' movement between partitions, and a class-level method to solve the label correspondence problem. The search capabilities of ant colony algorithms allow these methods to explore partitions that are not easily found by other methods. A drawback, however, is that a solution is only better in comparison to another: such an algorithm has no concept of an optimal solution and no way to test whether a solution is optimal. The selection function combines partitions obtained with locally adaptive clustering (AECC) algorithms. When an AECC algorithm is applied to a set of objects X, it outputs a partition P = {C_1, C_2, ..., C_q}, which can also be identified by two sets {c_1, ..., c_q} and {w_1, ..., w_q}, where c_i and w_i are respectively the centroid and the confidence associated with cluster C_i. The AECC algorithms are designed to work with numerical data, i.e. the method assumes that the objects in the dataset are represented by numerical features: X = {x_1, ..., x_n} with x_j ∈ R^d, j = 1, ..., n; likewise c_i ∈ R^d and w_i ∈ R, i = 1, ..., k. The set of partitions P = {p_1, p_2, ..., p_m} is generated by applying the AECC algorithms m times with different parameter initializations. The ant colony algorithm then uses the ensemble to choose clusters, with the pheromone update parameters (increment and decrement of the deposit over an interval) controlling the sensitivity. For a given cluster ensemble, the problem of optimal cluster selection can be stated as follows: given the set of cluster indexes F over the data points, find


an optimal subset S ⊂ F with |S| = m < n such that the classification accuracy is maximized. The cluster-index selection representation exploited by the artificial ants includes the following:

1. The data points that constitute the cluster index set F.
2. The distribution of ants over the cluster index space (n_a ants).
3. τ_i, the intensity of the pheromone trail associated with cluster index f_i, which reflects prior knowledge about the importance of f_i.
4. For each ant j, a list R_j that contains the cluster subset selected so far.

An AECC evaluation measure estimates both the overall performance of a subset and the local importance of a cluster. A classification algorithm is used to estimate the performance of the selected optimal clusters, while the local importance of a given cluster is measured with a correlation-based evaluation function, a filter evaluation function. In the first iteration, each ant randomly chooses a cluster index per classifier. Only the best subsets, k < n_a, are used to update the pheromone trail and influence the subsets of the next move. In the second and following moves, each ant starts with m − p clusters chosen at random from the previously selected k best subsets, where p is a number between 1 and m − 1. In this way, the indexes that constitute the best subsets have a greater chance of being present in the subsets of the next iteration, while it remains possible for each ant to consider other cluster indexes as well. For a given ant j, those cluster indexes are the ones that achieve the best compromise between pheromone trail and local importance with respect to S_j, the subset of clusters the ant has already selected. The Updated Selection Measure (USM) is used for this purpose and is defined (reconstructed here from the surrounding description) as

    USM_i^j = (τ_i)^α (LI_i^j)^β / Σ_{g ∉ S_j} (τ_g)^α (LI_g^j)^β,  for f_i ∉ S_j,    (1)

where LI_i^j is the local importance of cluster index f_i given the subset S_j, and the parameters α and β control the effect of the pheromone trail intensity and of the local cluster importance, respectively. LI_i^j is measured using the correlation measure and defined as

    LI_i^j = C_iR / ( (1/|S_j|) Σ_{f_s ∈ S_j} C_is ),    (2)

where C_iR is the absolute value of the correlation between cluster index f_i and the response (class) variable, and C_is is the absolute value of the inter-correlation between cluster index f_i and a cluster index f_s that belongs to S_j.
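The two measures can be computed as in the following sketch. The functional form follows our reconstruction of Eqs. (1) and (2) above, with a small constant added to avoid division by zero; the variable names are illustrative.

```python
import numpy as np

def local_importance(i, S, C_R, C):
    """LI_i^j of candidate f_i given the ant's current subset S, Eq. (2):
    class correlation over mean inter-correlation with S (a reconstruction)."""
    if not S:
        return abs(C_R[i])
    inter = np.mean([abs(C[i, s]) for s in S])
    return abs(C_R[i]) / (inter + 1e-12)

def usm(tau, S, C_R, C, alpha=1.0, beta=1.0):
    """Updated Selection Measure, Eq. (1), over candidates not yet in S."""
    cand = [i for i in range(len(tau)) if i not in S]
    scores = np.array([(tau[i] ** alpha) * (local_importance(i, S, C_R, C) ** beta)
                       for i in cand])
    return dict(zip(cand, scores / scores.sum()))
```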

Below are the steps of the algorithm (symbols reconstructed from the description above):

1. Initialization:
   • Set τ_i = cc and Δτ_i = 0, where cc is a constant and Δτ_i is the amount of change in the pheromone trail quantity for cluster index f_i.
   • Assign the maximum number of moves.
   • Assign k, where the k best subsets will influence the subsets of the next iteration.
   • Assign p, where m − p is the number of cluster indexes each ant starts with in the second and following moves.
2. If this is the first iteration:
   • For j = 1 to n_a, randomly assign a subset of cluster indexes to S_j.
   • Go to step 4.
3. Select the remaining p cluster indexes for each ant: for mm = m − p + 1 to m, and for j = 1 to n_a, given subset S_j, choose the cluster index f_i that maximizes USM_i^j. Merge any duplicated subsets with randomly chosen indexes.
4. Evaluate the selected indexes of each ant using the chosen classification algorithm: for j = 1 to n_a, estimate the error E_j of the classification results obtained by classifying with the optimal clusters of S_j. Sort the subsets by error; update the minimum error (if achieved by any ant in this iteration) and store the corresponding cluster subset.
5. Using the subsets of the k best ants, update the pheromone trail intensity: for j = 1 to k, apply τ_i ← (1 − ρ) τ_i + Δτ_i, where ρ is a constant representing the evaporation of pheromone trails and Δτ_i grows with the quality of the best subsets containing f_i (reconstructed).
6. If the number of moves is less than the maximum, or the desired error E has not been achieved, initialize the subsets for the next iteration: for j = 1 to n_a, from the clusters selected by the best ants, randomly produce an (m − p)-index subset for ant j, store it in S_j, and go to step 3.
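Putting steps 1-6 together, a compact sketch of the search loop might look as follows. It reuses usm() from the previous sketch; eval_subset (e.g. cross-validated error of the ensemble restricted to the chosen clusters), the pheromone deposit rule, and all parameter values are our placeholders, not settings from the paper.

```python
import numpy as np

def aco_select(n, n_ants, m, p, k_best, max_moves, eval_subset, C_R, C,
               rho=0.2, cc=1.0, rng=np.random.default_rng(0)):
    """Illustrative ACO cluster-index selection; eval_subset(S) returns error."""
    tau = np.full(n, cc)                          # step 1: uniform pheromone
    ants = [list(rng.choice(n, size=m, replace=False)) for _ in range(n_ants)]
    best_err, best_S = np.inf, None
    for move in range(max_moves):
        if move > 0:                              # steps 3 and 6: reseed from k best
            seeds = sorted(range(n_ants), key=lambda j: errors[j])[:k_best]
            ants = [list(rng.choice(ants[rng.choice(seeds)], size=m - p,
                                    replace=False)) for _ in range(n_ants)]
            for S in ants:                        # greedily add p indexes by USM
                while len(S) < m:
                    scores = usm(tau, S, C_R, C)
                    S.append(max(scores, key=scores.get))
        errors = [eval_subset(S) for S in ants]   # step 4: evaluate each ant
        j_star = int(np.argmin(errors))
        if errors[j_star] < best_err:
            best_err, best_S = errors[j_star], list(ants[j_star])
        tau *= (1.0 - rho)                        # step 5: evaporation ...
        for j in sorted(range(n_ants), key=lambda j: errors[j])[:k_best]:
            for i in ants[j]:
                tau[i] += 1.0 / (1.0 + errors[j]) # ... and quality-based deposit
    return best_S, best_err
```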

Figure 1 shows the proposed model for optimal cluster selection and classification.


IV. EXPERIMENTAL RESULTS

The proposed method is implemented in MATLAB 7.8.0 and tested with well-known datasets from the UCI machine learning research center. In this work I measured the classification accuracy, mean absolute error, and execution time of the classification method. To evaluate these performance parameters I used six datasets from the UCI machine learning repository [6], namely the Wisconsin breast cancer (original), thyroid, heart disease, hepatitis, liver, and diabetes datasets. Of these six, two are small, namely the cancer and heart datasets; the remaining four are large.

4.1 Description Of Datasets

Six standard medical diagnosis datasets from the UCI Machine Learning Repository [5] were used for this experimental evaluation: the Wisconsin breast cancer dataset, the Cleveland heart disease dataset, the Pima Indian diabetes dataset, the hepatitis dataset, the liver dataset, and the thyroid dataset. Of the 699 cases in the breast cancer dataset, 16 records containing missing attributes were omitted, leaving 683 for use. In all cases the dataset was randomly split into a training set comprising approximately 70% of the data and an evaluation set consisting of the remaining 30%.
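As an illustration of this preprocessing, the sketch below loads the Wisconsin breast cancer (original) data from the UCI repository, drops the 16 records with missing attribute values, and makes the 70/30 split. The column names follow the repository's documentation; the random split will not reproduce the paper's exact partition.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "breast-cancer-wisconsin/breast-cancer-wisconsin.data")
cols = ["id", "clump", "size_unif", "shape_unif", "adhesion", "epi_size",
        "bare_nuclei", "chromatin", "nucleoli", "mitoses", "class"]

df = pd.read_csv(URL, names=cols, na_values="?")   # '?' marks missing attributes
df = df.dropna()                                   # 699 -> 683 records, as noted above
X, y = df[cols[1:-1]].values, df["class"].values
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=1)
```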

4.2 Performance Parameters

4.2.1 Classification Accuracy

I used the six medical datasets described above for testing the accuracy of the proposed method; the classification accuracy on all six datasets is increased significantly by it. Accuracy is computed as

    Accuracy rate = (number of correctly classified instances / total number of instances) × 100    (a)

4.2.2 Mean Absolute Error Rate

In classification, the mean absolute error (MAE) measures how close predictions are to the eventual outcomes. It is given by

    MAE = (1/n) Σ_{i=1}^{n} |ŷ_i − y_i|,    (b)

i.e. an average of the absolute errors |ŷ_i − y_i|, where ŷ_i is the prediction and y_i the true value. Alternative formulations may include relative frequencies as weight factors.

4.2.3 Relative Mean Error Rate

The relative error is written as

    Relative error = (absolute error / accepted value) × 100.    (c)
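For completeness, the three measures can be computed as in this short sketch (ours, not the paper's code):

```python
import numpy as np

def accuracy_pct(y_true, y_pred):
    """Accuracy rate, Eq. (a): correctly classified / total * 100."""
    return 100.0 * np.mean(np.asarray(y_true) == np.asarray(y_pred))

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (b)."""
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)))

def relative_error_pct(absolute_error, accepted_value):
    """Relative error, Eq. (c)."""
    return 100.0 * absolute_error / accepted_value
```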

4.3 Experimental Process Of Ensemble Clustering And Classification (KNN)

This section describes the result analysis of ensemble clustering and classification. KNN is used as the base classifier and k-means as the clustering technique, with ant colony optimization selecting the optimal clusters. The number of clusters is 5; an end-to-end sketch follows.
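Below is a hedged end-to-end sketch of this experiment: k-means with 5 clusters, one KNN per cluster, and routing of each test point to its nearest cluster's model as one plausible reading of the fusion stage. It reuses X_tr, y_tr, X_te, y_te and accuracy_pct from the earlier sketches and assumes every cluster holds at least a few training points of more than one class.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_tr)
models = {c: KNeighborsClassifier(n_neighbors=3)
              .fit(X_tr[km.labels_ == c], y_tr[km.labels_ == c])
          for c in range(5)}                      # one KNN per cluster

def predict(X):
    """Route each test point to the KNN of its nearest cluster."""
    cl = km.predict(X)
    return np.array([models[c].predict(x.reshape(1, -1))[0]
                     for c, x in zip(cl, X)])

print("accuracy: %.2f%%" % accuracy_pct(y_te, predict(X_te)))
```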


Figure 4.1 shows the classification result map for the fixed 5 clusters with the KNN base classifier. Figure 4.2 shows the classification result map for the fixed 5 clusters with the KNN base classifier and optimal selection of the number of classifiers. Table 4.1 shows the experimental result analysis on the different datasets in both cases.

Table 4.1. Experimental results on the six datasets (per dataset: upper row, fixed number of clusters; lower row, ACO-optimal cluster selection, following the comparison described in the surrounding text).

Dataset     MAE     MRE     Accuracy   Execution time
Cancer      5.45    12.39   82.85      12
            3.725   11.76   82.85      13.58
Diabetes    6.00    14.56   87.95      13.99
            5.0     16.93   8.95       13.51
Thyroid     4.5     12.6    92.95      13.79
            3.5     14.81   92.95      14.29
Liver       4.0     17.9    93.95      14.24
            3.0     13.81   93.95      14.05
Hepatitis   3.89    11.6    96.35      14.71
            2.89    12.18   96.35      14.74
Heart       1.56    10.34   97.45      14.21
            2.01    9.78    97.67      13.99

Figure 4.3 compares the fixed-number-of-clusters technique with optimal cluster selection for classification; the result graph shows that optimal cluster selection improves on the fixed number of clusters. Figure 4.4 makes the same comparison for error, showing that optimal cluster selection reduces both MAE and MRE.

4.4 Experimental Process Of Ensemble Clustering And Classification (SVM)

This section describes the result analysis of ensemble clustering and classification with SVM as the base classifier and k-means as the clustering technique; ant colony optimization again selects the optimal clusters. The number of clusters is 6; swapping the base learner is sketched below.
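Swapping the base learner for SVM in the earlier sketch is a small change (again illustrative; the kernel and parameters are our assumptions, not the paper's settings):

```python
from sklearn.cluster import KMeans
from sklearn.svm import SVC

km6 = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X_tr)
# One SVM per cluster; assumes every cluster contains at least two classes.
models = {c: SVC(kernel="rbf", C=1.0)
              .fit(X_tr[km6.labels_ == c], y_tr[km6.labels_ == c])
          for c in range(6)}
```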

Figure 4.1 shows the classification map result for the fixed number of clusters with the support vector machine classifier; the number of clusters used is 6.



Figure 4.5 shows the classification map result for the fixed number of clusters with the support vector machine classifier (6 clusters); the ACO process improves the selection of the optimal number of clusters over the fixed number. Table 4.2 shows the experimental result analysis on the different datasets in both cases.

Table 4.2. Experimental results on the six datasets with the SVM base classifier (per dataset: upper row, fixed number of clusters; lower row, ACO-optimal cluster selection).

Dataset     MAE     MRE     Accuracy   Execution time
Cancer      4.55    10.39   86.85      10
            3.725   9.76    90.85      12.45
Diabetes    5.23    8.56    87.95      13.99
            4.67    7.93    88.95      13.51
Thyroid     4.5     5.6     94.95      13.79
            3.5     4.81    95.95      14.29
Liver       4.0     7.94    94.95      14.24
            3.0     6.81    96.95      14.05
Hepatitis   3.89    7.6     97.35      14.71
            2.89    5.18    96.35      14.74
Heart       1.56    4.34    97.45      14.21
            2.01    5.78    98.67      13.99

Figure 4.6 shows the ensemble result analysis for optimal cluster selection with the support vector machine classifier; the results with optimal selection improve on those with a fixed number of classifiers. Figure 4.7 shows the comparative analysis of MAE and MRE for all six datasets.

4.5 Comparative Result Analysis As The Number Of Base Classifiers Is Varied In The Ensemble



Table 4.3. Performance evaluation of the cancer dataset for all classification methods.

Base classifiers   Accuracy (KNN / SVM / SVM-ANT)   Error (KNN / SVM / SVM-ANT)
3                  81.54 / 86.64 / 95.64            2.85 / 4.00 / 2.50
4                  81.54 / 86.64 / 95.64            2.61 / 4.00 / 2.50
5                  82.02 / 86.74 / 96.54            3.12 / 4.05 / 2.61

Table 4.4. Performance evaluation of the heart dataset for all classification methods.

Base classifiers   Accuracy (KNN / SVM / SVM-ANT)   Error (KNN / SVM / SVM-ANT)
3                  80.90 / 86.00 / 95.00            2.75 / 4.00 / 2.50
4                  80.90 / 86.00 / 95.00            2.50 / 4.00 / 2.50
5                  80.90 / 86.00 / 95.00            2.37 / 3.00 / 1.50

Table 4.5. Performance evaluation of the liver dataset for all classification methods.

Base classifiers   Accuracy (KNN / SVM / SVM-ANT)   Error (KNN / SVM / SVM-ANT)
3                  82.12 / 87.32 / 96.98            2.94 / 4.02 / 2.57
4                  83.22 / 88.33 / 97.32            2.70 / 4.56 / 2.85
5                  84.22 / 87.43 / 97.98            2.56 / 4.87 / 1.50

Table 4.6. Performance evaluation of the diabetes dataset for all classification methods.

Base classifiers   Accuracy (KNN / SVM / SVM-ANT)   Error (KNN / SVM / SVM-ANT)
3                  82.58 / 88.59 / 96.95            3.72 / 5.00 / 3.50
4                  82.65 / 89.56 / 97.05            3.15 / 4.00 / 2.50
5                  80.85 / 87.95 / 98.59            2.86 / 4.00 / 2.50


Table 4.7. Performance evaluation of the hepatitis dataset for all classification methods.

Base classifiers   Accuracy (KNN / SVM / SVM-ANT)   Error (KNN / SVM / SVM-ANT)
3                  82.67 / 87.73 / 96.86            3.68 / 5.00 / 3.50
4                  81.70 / 88.86 / 97.66            3.12 / 4.00 / 2.50
5                  82.00 / 88.68 / 97.99            3.68 / 5.00 / 3.50

Table 4.8. Performance evaluation of the thyroid dataset for all classification methods.

Base classifiers   Accuracy (KNN / SVM / SVM-ANT)   Error (KNN / SVM / SVM-ANT)
3                  87.08 / 92.18 / 98.18            4.56 / 6.00 / 4.50
4                  87.08 / 92.18 / 99.68            3.92 / 5.00 / 3.50
5                  87.08 / 92.18 / 99.07            3.53 / 5.00 / 3.50

4.6 Performance Evaluation Of All Six Medical Datasets As The Number Of Base Classifiers Is Varied

Figure 4.8 shows the comparative accuracy and error as the number of base classifiers is varied during ensemble cluster classification. Among the base classifiers, the support vector machine performs best. This result shows the cancer data classification for the ensemble process.


Figure 4.9 shows the comparative accuracy and error as the number of base classifiers is varied during ensemble cluster classification; again the support vector machine performs best among the base classifiers. This result shows the heart dataset classification for the ensemble process.

Figure 4.10 shows the same comparison for the diabetes data classification, with the support vector machine again performing best.

Figure 4.11 shows the same comparison for the hepatitis data classification, with the support vector machine again performing best among the base classifiers.

Figure 4.12 shows the same comparison for the liver data classification, and Figure 4.13 for the thyroid data classification; in both, the support vector machine performs best among the base classifiers.

V. CONCLUSIONS

This paper has proposed optimal feature selection of clusters for an ensemble classifier. The proposed ensemble classifier classifies medical diagnosis data for diseases such as cancer, liver problems, and heart disease, and it classifies the data better than the conventional cluster ensemble technique. Ant colony optimization is used to select the optimal number of clusters; as a multi-objective function, it yields the optimal number of clusters for the ensembling process. Two base classifiers, KNN and SVM, are used for classification, and the clustering itself is performed by a fixed clustering technique, the k-means algorithm. Optimal selection of the cluster ensemble improves the robustness, stability, and accuracy of unsupervised classification solutions; numerous works so far have contributed to finding consensus clusterings.


We compared the classification accuracy of the conventional and the proposed ensemble classification algorithms on the same datasets, investigating benchmark datasets from the UCI repository and medical disease data as well as real-world datasets. Both the average and the maximum classification accuracy of the two approaches were compared, and the comparison shows that on most datasets the proposed ensembles achieve better average and maximum classification accuracy than the conventional ones. The rates of improvement in average and maximum accuracy achieved by the proposed ensembles over the conventional ones were also presented; they confirm that the average classification accuracy of the proposed ensembles is usually better than that of the conventional ones. Therefore, using meta-heuristic optimization in classification ensembles can improve the overall classification accuracy.

REFERENCES
1. Brijesh Verma and Ashfaqur Rahman, "Cluster-Oriented Ensemble Classifier: Impact of Multi-cluster Characterization on Ensemble Classifier Learning," IEEE Transactions on Knowledge and Data Engineering, 2012.
2. Nayer M. Wanas, Rozita A. Dara, and Mohamed S. Kamel, "Adaptive Fusion and Co-operative Training for Classifier Ensembles," Pattern Analysis and Machine Intelligence Lab, University of Waterloo, 2006.
3. Giorgio Fumera, Fabio Roli, and Alessandra Serrau, "A Theoretical Analysis of Bagging as a Linear Combination of Classifiers," IEEE Transactions, 2008.
4. Albert Hung-Ren Ko and Robert Sabourin, "The Implication of Data Diversity for a Classifier-free Ensemble Selection in Random Subspaces," IEEE Transactions, 2008.
5. Juan J. Rodríguez and Jesús Maudes, "Boosting Recombined Weak Classifiers," ScienceDirect, 2007.
6. Oriol Pujol and David Masip, "Geometry-Based Ensembles: Toward a Structural Characterization of the Classification Boundary," IEEE Transactions, 2009.
7. Nandita Tripathi, Stefan Wermter, Chihli Hung, and Michael Oakes, "Semantic Subspace Learning with Conditional Significance Vectors," IEEE Transactions, 2010.
8. Valerio Grossi and Alessandro Sperduti, "Kernel-Based Selective Ensemble Learning for Streams of Trees," in Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, 2010.
9. Seyda Ertekin, Jian Huang, Léon Bottou, and C. Lee Giles, "Learning on the Border," 2007.
10. Piyasak Jeatrakul, Kok Wai Wong, and Chun Che Fung, "Data Cleaning for Classification Using Misclassification Analysis," 2010.
11. Amal S. Ghanem, Svetha Venkatesh, and Geoff West, "Multi-Class Pattern Classification in Imbalanced Data," in International Conference on Pattern Recognition, 2010.
12. Guoqiang Peter Zhang, "Neural Networks for Classification: A Survey," IEEE Transactions, 2000.
13. UCI Machine Learning Database, http://archive.ics.uci.edu/ml/, Feb. 2010.