
Improving the Performance of Entropy Ensembles of Neural Networks (EENNs) on Classification of Heart Disease Prediction

1 S. Silvia Priscila and 2 M. Hemalatha

1 Bharathiar University, Coimbatore. [email protected]

2 Bharathiar University, Coimbatore. [email protected]

Abstract

In the classification of high-dimensional data, feature selection is one of the most significant steps, particularly for the selection of gene expression in microarray data, which encompasses thousands of features. Being a wrapper method, the Entropy Ensemble of Neural Networks with Recursive Feature Elimination (EENN-RFE) algorithm is unique among influential feature selection techniques. In this work an EENN-RFE based feature selection algorithm is proposed that removes redundant features, based on the correlation computed between features, before applying EENN-RFE. Noisy and irrelevant features are removed first, and features are then eliminated until the correlation coefficient between each pair of remaining features is less than a predetermined threshold. In addition, this work proposes a new method for training Artificial Neural Networks (ANNs) that approximates an objective function with the arithmetic mean of an ensemble of randomly created NNs, and applies this EENN classifier to the heart disease classification problem. The EENN-RFE algorithm is applied to a heart disease dataset, and the evaluation results reveal that its feature-subset selection and class-wise classification capability are more effective than those of the existing Support Vector Machine with Recursive Feature Elimination (SVM-RFE). From the results it may be concluded that, after the feature selection process of the proposed EENN-RFE algorithm, an EENN classifier obtains a reasonable trade-off between accuracy and execution time.

Keywords: Data Mining, high-dimensional data, feature selection, Entropy Ensemble of Neural Networks with Recursive Feature Elimination (EENN-RFE), Artificial Neural Networks (ANNs), classification, heart disease prediction.


1. Introduction

Heart disease is a common condition with a high death rate. Generally, patients exhibit no symptoms in the initial stage and, as with cancer, are usually diagnosed at an advanced stage; expected survival is then hardly more than a year [1]. In each case, these data highlight the importance of fast, in-depth insight at the gene level of the particular malignant tumor. The development of microarray technology has enabled scientists to screen the mRNA expression levels of up to tens of thousands of genes in parallel within a single experiment on a solid surface made of either glass or silicon. Biologists consequently use the data to identify cancer-related genes by comparing their expression values in healthy and cancerous tissues. On the other hand, microarray data usually come with thousands of features but only dozens of samples. This universal challenge of big data necessitates efficient feature selection techniques [2]. In feature selection, a small subset of features with stronger classification capacity is selected. Two essential types of feature selection techniques are filter methods and wrapper methods. In the former, methods like the Fisher criterion [3], T-statistics [4] and mutual information [5] work correspondingly in estimating the state of systems [6]. Features are selected in accordance with some statistic that quantifies each feature's relevance to the class labels. The intrinsic statistical properties of the data drive the selected feature subset, which makes it independent of the learning method used. In the latter, the quality of features is evaluated in terms of their influence on the performance of a certain classifier. Comparison reveals that wrapper methods depend highly on the classification algorithm used and generate feature subsets that lead to higher classification accuracy, although they are more time consuming. Both kinds of feature selection methods have better explanatory power than feature extraction methods [7] such as Principal Components Analysis (PCA) [8].

Due to its high-quality generalization capability, the Support Vector Machine (SVM) remains an exceptional machine learning method [9-10] that has been successfully applied in various domains [11]-[12]. Kernel functions have further extended the applications of SVM. A backward elimination technique named Support Vector Machine with Recursive Feature Elimination (SVM-RFE) [13] was developed as an efficient wrapper method. It starts with all the features and gradually removes one feature at a time, chosen so that its removal has the least impact on the value of the objective function, until the final desired feature subset is attained. Though SVM-RFE effectively eliminates irrelevant features, the final feature subset generally contains redundant features that should have been removed in early iterations. Moreover, for SVM the classification itself is a time-consuming task. The standard approach to artificial neural networks (ANNs) is based on optimization theory [14]. An ANN is a composition of neurons (see the next section) whose behavior depends on the set of weights assigned to the edges of the network. In a pattern classification problem, the neural network is regarded as an approximation of an objective function at the training set.


To determine this approximation, an optimization problem over the space of parameters of the neural network is studied.

A novel method is proposed in this paper to solve the above problem. Initially, a correlation coefficient is employed to measure the redundancy among features. Redundant features are removed so that the correlation coefficient between every pair of remaining features stays below a predetermined threshold. After this filtering procedure, a new approach to training (ensembles of) artificial neural networks is introduced: instead of optimizing the parameters of a single neural network, an ensemble of neural networks with randomly chosen parameters is considered, and networks with an adequately small number of errors on the training set are selected. An averaged neural network is then introduced, which in the simplest case is the arithmetic mean of the selected neural networks. The resulting Entropy Ensemble of Neural Networks (EENN) forms an approximation of the objective function. The EENN method is evaluated on the heart disease dataset, and the results reveal that the EENN-RFE algorithm is capable of selecting a subset of features with better overall and class-wise classification capability than the baseline SVM-RFE. In addition, it is observed that following the feature selection process, an EENN classifier with an RFE kernel can achieve a reasonable trade-off between sensitivity and specificity.

2. Literature Review

Karthikeyan and Kanimozhi [15] proposed a heart disease prediction system that utilizes the Deep Belief Network (DBN) classification algorithm to predict the probable likelihood of heart-related diseases. The Deep Belief Network is an effective classification algorithm that employs a deep learning approach in a deep neural network. Their work compares Convolutional Neural Network (CNN) and DBN classification algorithms: the CNN provides 82% accuracy in the prediction of heart diseases, while the proposed Deep Belief Network algorithm provides 90% prediction accuracy, which in turn enhances the accuracy of the heart disease prediction system. The system was built in the MATLAB 8.1 development environment.

Kim and Kang [16] devised a neural network based prediction of coronary heart disease (CHD) risk using Feature Correlation Analysis (NN-FCA) in two stages. The first is the feature selection stage, in which features are ranked by their importance in predicting CHD risk; the second is the feature correlation analysis stage, in which the correlations between feature relations and the output of each NN predictor are determined. Of the 4146 individuals in the evaluated Korean dataset, 3031 had low CHD risk and 1115 had high CHD risk. The area under the Receiver Operating Characteristic (ROC) curve of the proposed model (0.749 ± 0.010) was larger than that of the Framingham Risk Score (FRS) (0.393 ± 0.010). The proposed NN-FCA, which utilizes feature correlation analysis, was found to be better than the FRS in terms of CHD


risk prediction. Furthermore, the proposed model yielded a larger area under the ROC curve and more accurate predictions of CHD risk in the Korean population than the FRS.

Sen [17] observed that researchers have accelerated their efforts to develop software, with the help of machine learning algorithms, to assist doctors in both the prediction and the diagnosis of heart disease. The main goal of that work is the prediction of heart disease using machine learning algorithms, and the comparative performance of the algorithms is interpreted through graphical representations.

Thomas and Princy [18] conducted a survey on various classification techniques used for predicting the risk level of an individual based on age, gender, blood pressure, cholesterol, pulse rate, etc. Data mining classification techniques such as Naive Bayes (NB), KNN, Decision Tree (DT) and Neural Network (NN) were used to classify the patient risk level. Classification accuracy was found to be higher when more attributes were used.

Shrivastava and Mehta [19] aimed to build a model for heart disease diagnosis and to increase its effectiveness. The artificial neural network model adopted there is based on symptoms and risk factors. A multilayer perceptron network was used to develop the model and trained on the Cleveland Clinic Foundation dataset. The implemented Back-Propagation Neural Network (BPNN) algorithm was trained, validated, and tested, and its performance recorded.

Turabieh [20] investigated the performance of two computational intelligence methods, namely Gray Wolf Optimization (GWO) and Artificial Neural Networks (ANN), by hybridizing them for the prediction of heart disease. The global search method GWO is more efficient than the local, gradient-based back propagation search. The proposed algorithm exploits the capability of the ANN to find a relationship between the input and output variables, while the stochastic search capability of GWO is used to find the initial optimal weights and biases of the ANN, thereby reducing the probability of the ANN getting stuck at local minima and converging slowly towards the global optimum. The performance of the hybrid model (ANN-GWO) was compared with a standard BPNN using the Root Mean Square Error (RMSE) for evaluation. The results revealed that the proposed model enhanced both the convergence speed and the prediction accuracy.

Choi et al. [21] explored whether using deep learning to model temporal relations among events in Electronic Health Records (EHRs) would improve performance in predicting an initial diagnosis of Heart Failure (HF) compared to conventional methods that ignore temporality. To detect relations among time-stamped events (e.g., disease diagnoses, medication orders, procedure orders), Recurrent Neural Network (RNN) models using Gated Recurrent Units (GRUs) were adapted with a 12- to 18-month observation window of cases and controls. Regular logistic regression, neural network, support vector machine, and K-nearest neighbor classifiers were used to compare the


performance metrics of the proposed model. Yazdani and Ramakrishnan [22] designed and implemented a clinical decision support system that helps in predicting the risk of heart disease. The optimal artificial neural network model, identified among several candidate models, was evaluated with accuracy measures on a standard heart disease database. To ease doctors in predicting the risk of heart disease, an interface was also developed based on the optimal model.

Sehgal and Sharma [23] focused on the prediction of heart disease using gradient-based techniques for neural networks. ANN-based models for the prediction of heart disease were developed using the neural network toolbox available in MATLAB. To develop the disease prediction models, the multilayer perceptron architecture with error back propagation, along with significant gradient-based methods, was employed. A comparative performance study of algorithms employing various important training methods is also provided. The intention is to provide a more effective and acceptable framework for the prediction and diagnosis of heart disease.

3. Proposed Work

Initially, the correlation coefficient is used to quantify the redundancy among features. Redundant features are removed so that the correlation coefficient between every pair of remaining features is less than a predetermined threshold.

A new approach to training (ensembles of) artificial neural networks is introduced after the filtering procedure. Here, instead of optimizing the parameters of a single neural network, an ensemble of neural networks with randomly chosen parameters is considered. Networks with an adequately small number of errors on the training set are selected, and an averaged neural network is then introduced, which in the simplest case is the arithmetic mean of the selected networks. The resulting Entropy Ensemble of Neural Networks (EENN) forms an approximation of the objective function.

The EENN method is experimented on the heart disease dataset, and the results reveal that the EENN-RFE algorithm is capable of selecting a subset of features with better overall and class-wise classification capability than the baseline SVM-RFE. In addition, it is observed that following the feature selection process, an EENN classifier with an RFE kernel can achieve a reasonable trade-off between sensitivity and specificity.

The overall framework of the proposed work is illustrated in Fig. 1.


Figure 1: Architecture Diagram. [The diagram shows the pipeline: heart disease / microarray dataset → preprocessing → feature selection (RFE) → classification (EENN-RFE) → classified results: no heart disease, heart disease, or both.]

By the theory of EENN-RFE, the method is capable of removing the features that contribute least to the classification, but it has no control over redundant features, which can lead to a severe decline in classifier performance. Suppose two features, f1 and f2, are proportional to each other and highly relevant to the class labels; both would survive EENN-RFE at the expense of missing potentially useful features. A neglected useful feature is generally ineffective by itself, due to its low correlation with the class labels, yet it can considerably improve classification accuracy in combination with other features. A typical example is the XOR problem. Consider the planar case as a coordinate system where each axis represents a feature: data points in the first and third quadrants belong to one class and those in the other two quadrants belong to the other class. Each feature is thus individually irrelevant to the class, while the two features together yield perfect classification, as the short numeric check below illustrates.
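This check is an illustration added here, not taken from the paper: points drawn uniformly in the plane and labeled by quadrant show near-zero per-feature correlation with the class, while the product of the two features separates it well.

```python
# Numeric check of the XOR remark: each feature alone is nearly
# uncorrelated with the class, yet the pair separates it perfectly.
import numpy as np

rng = np.random.default_rng(0)
f1 = rng.uniform(-1, 1, 1000)
f2 = rng.uniform(-1, 1, 1000)
label = ((f1 * f2) > 0).astype(float)       # quadrants 1 and 3 -> class 1

print(np.corrcoef(f1, label)[0, 1])         # ~0: f1 alone looks irrelevant
print(np.corrcoef(f2, label)[0, 1])         # ~0: f2 alone looks irrelevant
print(np.corrcoef(f1 * f2, label)[0, 1])    # strongly correlated together
```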

Preprocessing efficiently eliminates this systematic variation. To improve the performance of EENN-RFE, the redundant features are eliminated before running the baseline algorithm. A correlation coefficient is introduced to quantify the similarity between features. The correlation coefficient between features fi and fj is computed as follows:


$$r_{ij} = \frac{\sum_{k=1}^{n}(f_{ik}-\bar f_i)(f_{jk}-\bar f_j)}{\sqrt{\sum_{k=1}^{n}(f_{ik}-\bar f_i)^2}\;\sqrt{\sum_{k=1}^{n}(f_{jk}-\bar f_j)^2}} \qquad (1)$$

where $f_{ik}$ is the value of feature $f_i$ for the $k$th sample, and $\bar f_i$ is the mean value of feature $f_i$ over all $n$ samples. The value of $r_{ij}$ lies within $[-1, 1]$, and the larger $|r_{ij}|$ is, the more correlated $f_i$ and $f_j$ are. The preprocessing procedure is described below.

1) Initialize the predefined threshold as θ, and set the current feature indices i = 1 and j = 2.

2) Compute r_ij; if |r_ij| > θ, remove feature f_i and go to step 4.

3) Update j = j + 1 and go to step 2, until j = l (l is the number of original features).

4) Update i = i + 1, j = i + 1 and go to step 2, until i = l − 1.

A sketch of this filtering loop in code is given below.
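This is a minimal sketch of the filtering procedure, assuming a feature matrix X (samples × features) and a user-chosen threshold theta; the paper fixes neither the variable names nor the threshold value.

```python
# Correlation-based redundancy filter (Eq. 1 and steps 1-4 above).
import numpy as np

def filter_redundant_features(X, theta=0.9):
    """Drop features until every surviving pair has |r_ij| <= theta."""
    keep = list(range(X.shape[1]))
    i = 0
    while i < len(keep) - 1:
        removed = False
        for j in range(i + 1, len(keep)):
            # Pearson correlation (Eq. 1) between features keep[i], keep[j]
            r = np.corrcoef(X[:, keep[i]], X[:, keep[j]])[0, 1]
            if abs(r) > theta:
                del keep[i]     # step 2: remove f_i and move to next feature
                removed = True
                break
        if not removed:
            i += 1              # steps 3-4: advance the index
    return keep                 # indices of the retained features

# Example: kept = filter_redundant_features(np.random.rand(100, 20))
```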

Recursive Feature Elimination (RFE)

EENN-RFE is adopted to select the most discriminative features after the noisy, irrelevant information has been filtered out. In each iteration, the current feature subset is used to build the ENN model, the weight of each feature is computed, and the r% (0 < r < 100) of features with the smallest weights is eliminated. If r% of the current features amounts to fewer than one feature, only the single feature with the smallest weight is eliminated. The iteration continues until all features have been eliminated. In each iteration, the current feature subset is evaluated by k-fold cross-validation, and the subset with the highest cross-validation accuracy is the finally selected feature subset. A sketch of this elimination loop follows.
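The following sketch is not the paper's implementation: `make_model` and `feature_weights` are hypothetical stand-ins for the ensemble classifier and its per-feature weight accessor, and `r` is the elimination rate described above.

```python
# Recursive feature elimination with k-fold cross-validation scoring.
import numpy as np
from sklearn.model_selection import cross_val_score

def rfe(X, y, make_model, feature_weights, r=0.1, k=5):
    active = np.arange(X.shape[1])              # currently active features
    best_subset, best_score = active.copy(), -np.inf
    while active.size > 0:
        model = make_model().fit(X[:, active], y)
        # Evaluate the current subset by k-fold cross-validation
        score = cross_val_score(make_model(), X[:, active], y, cv=k).mean()
        if score > best_score:
            best_subset, best_score = active.copy(), score
        w = np.abs(feature_weights(model))      # weight of each active feature
        n_drop = max(1, int(r * active.size))   # drop at least one feature
        active = np.delete(active, np.argsort(w)[:n_drop])
    return best_subset, best_score
```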

An ENN is an ensemble of a finite number of NNs trained for the same task. The NNs in the ensemble are generally trained independently and their predictions are then combined [24]. Any single NN in the ENN could serve as a predictor for the task by itself, but better results may be obtained by combining them. The architecture of the ENN is illustrated in Fig. 2.

Figure 2: Architecture of the Ensemble Neural Network

The two main steps to construct an ENN are:


Step 1: Creating the component networks.

Step 2: Combining these component networks in the ENN.

In step 1, component networks that are both accurate and diverse must be created for good regression or classification. A number of training parameters can be manipulated to obtain networks that generalize differently, including the initial conditions, the training data, the topology of the nets, and the training algorithm [25]. Bagging and Boosting are the most widely used methods for creating the training data. Breiman proposed Bagging [26] based on bootstrap sampling [27], a resampling scheme that generates many samples from the one sample available. During resampling, randomly chosen, possibly repeated data points form the new training set, and a component network is trained on this new sample. The process is repeated until there are enough component networks in the ENN. Bagging is therefore constructive for problems with a shortage of data; a minimal sketch follows.
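This sketch makes assumptions the paper does not fix: `make_net` is a hypothetical factory returning a fresh component network with scikit-learn style `fit`/`predict` methods, and the combination rule is the arithmetic mean used elsewhere in the paper.

```python
# Bagging: train each component network on a bootstrap resample.
import numpy as np

def bagging_ensemble(X, y, make_net, n_components=10, seed=0):
    rng = np.random.default_rng(seed)
    nets = []
    for _ in range(n_components):
        # Bootstrap: sample n rows with replacement (repeats allowed)
        idx = rng.integers(0, len(X), size=len(X))
        nets.append(make_net().fit(X[idx], y[idx]))
    return nets

def ensemble_predict(nets, X):
    # Simple arithmetic-mean combination of the component outputs
    return np.mean([net.predict(X) for net in nets], axis=0)
```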

Ensemble Neural Networks Using the Entropy (EENN)

The entropy-based ENN (EENN) is capable of decreasing overfitting in the ENN. Variation in the initial random weights causes single networks with the same number of hidden nodes to perform differently on the training data. The proposed method first chooses each component network with the best initial structure, and then, considering both the network's accuracy and the model's complexity, uses the entropy to ensemble these best component networks. To make the ENN both accurate and stable, the EENN first reduces the error of each component network and subsequently balances their contributions to the ENN.

Creation of the component networks consists of two steps. In the first step, the training and testing data sets are created, using common ratios of training data to testing data; all training data are used for each component network. In the second step, the component networks are created: each component network is trained several times, and the best structure, namely the one with the smallest training MSE, is used in the ENN. High-quality regression ensemble members must be both accurate and diverse, so the training of each component network needs to be highly accurate, and different numbers of hidden nodes are used in different component networks, except in cases with limited data.

The method for defining the number of hidden nodes in each component network is similar to Zhao's method [28]: the best number of hidden nodes in a single NN is first worked out by trial and error. The best number of hidden nodes is the one giving the single NN with the smallest test MSE and the smallest AIC value, as these indicate sufficient training and a proper number of parameters; this number is selected as the maximum number of hidden nodes of the component networks. Subsequently, other networks with fewer hidden nodes than the maximum are added. The gap between any two


component networks should be as large as possible to increase diversity, and the minimum number of hidden nodes should be as small as possible while retaining sufficient accuracy. An upper bound on the number of parameters that can be incorporated in the model follows from the fact that one cannot determine more parameters than there are samples in the data set [29]. The bound on the number of hidden nodes is thus determined as (for the case with only one output node):

$$N_h < \frac{N_{tr} - 1}{N_i + 2} \qquad (2)$$

where $N_i$ is the number of input nodes of the component network, $N_h$ is the number of hidden nodes, and $N_{tr}$ is the number of training data. In this work, the scheme with different numbers of hidden nodes in the component networks is adopted whenever the training data are adequate. A sketch of the resulting hidden-node search is given below.
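This sketch assumes names the paper does not define: `train_net` is a hypothetical helper returning the test MSE and parameter count for a given hidden-layer size, and the AIC form used is the common Gaussian-error one, since the paper does not state its exact formula.

```python
# Hidden-node search: candidates capped by Eq. 2, best size chosen
# by smallest test MSE and smallest AIC.
import numpy as np

def max_hidden_nodes(n_train, n_inputs):
    # Largest integer strictly satisfying Eq. 2 (single output node)
    return int(np.ceil((n_train - 1) / (n_inputs + 2))) - 1

def pick_hidden_nodes(n_train, n_inputs, train_net):
    best = None
    for n_h in range(1, max_hidden_nodes(n_train, n_inputs) + 1):
        mse, n_params = train_net(n_h)                # hypothetical helper
        aic = n_train * np.log(mse) + 2 * n_params    # common AIC form
        if best is None or (mse, aic) < (best[1], best[2]):
            best = (n_h, mse, aic)
    return best[0]
```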

The entropy concept is used to attain an unbiased ENN, where three parts of the problem are optimized at the same time: maximize the entropy of the combining weights of the whole ENN; minimize the error between the mean output of the ENN and the mean target value; and minimize the difference between the standard deviation of the ENN output and the standard deviation of the target value. This is beneficial for the whole ENN; one way to set this up in code is sketched below.
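This is one hedged way to operationalize the three-part objective, not the paper's exact formulation: the weight entropy is maximized while the mean and standard-deviation mismatches are penalized, with assumed penalty coefficients alpha and beta.

```python
# Entropy-based combining weights for the ENN, solved as a single
# penalized optimization over the simplex of weights.
import numpy as np
from scipy.optimize import minimize

def entropy_weights(outputs, target, alpha=10.0, beta=10.0):
    """outputs: (m, n) array, one row per component network; target: (n,)."""
    m = outputs.shape[0]

    def loss(w):
        ens = w @ outputs                            # weighted ensemble output
        neg_entropy = np.sum(w * np.log(w + 1e-12))  # minimize -H(w)
        mean_err = (ens.mean() - target.mean()) ** 2
        std_err = (ens.std() - target.std()) ** 2
        return neg_entropy + alpha * mean_err + beta * std_err

    cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
    res = minimize(loss, np.full(m, 1.0 / m),
                   bounds=[(0, 1)] * m, constraints=cons)
    return res.x                                     # combining weights
```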

4. Experimentation Results

The data set is taken from the Data Mining Repository of the University of California, Irvine (UCI) [30]; the system is tested using the Cleveland data sets. Attributes such as age, sex, chest pain type, resting blood pressure, serum cholesterol in mg/dl, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise induced angina, ST depression, slope of the peak exercise ST segment, number of major vessels, thal and the diagnosis of heart disease are present. In the experimentation a total of 909 records with 15 medical attributes is used.

The dataset is taken from the Cleveland Heart Disease database [30]. The records are split into two categories: a training dataset (455 records) and a testing dataset (454 records), with the records for each category selected randomly. "Diagnosis" is the target attribute to be predicted: value "1" denotes patients with heart disease and value "0" patients with no heart disease. "PatientID" is used as the key; the rest are input attributes. It is assumed that problems such as missing data, inconsistent data, and duplicate data have all been resolved. Table 1 shows the attribute information of the Cleveland Heart Disease database.

There are multiple classes to be predicted: absence, moderate, and presence of heart disease in patients. A sketch of the data preparation follows.
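The file name below is hypothetical and the column names follow the text above; the 455/454 random split follows the description of the experiments.

```python
# Data preparation sketch: random 455/454 split of the 909 records.
import pandas as pd

df = pd.read_csv("cleveland_heart_disease.csv")   # hypothetical file name
train = df.sample(n=455, random_state=42)         # 455 random training records
test = df.drop(train.index)                       # remaining 454 for testing

X_train = train.drop(columns=["PatientID", "Diagnosis"])
y_train = train["Diagnosis"]                      # 1 = heart disease, 0 = none
X_test = test.drop(columns=["PatientID", "Diagnosis"])
y_test = test["Diagnosis"]
```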


Table I: Attribute information of the Cleveland Heart Disease database

Attribute | Description | Type
age       | age in years | Numerical
sex       | sex (1 = male; 0 = female) | Categorical
cp        | chest pain type (1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic) | Categorical
trestbps  | resting blood pressure (in mm Hg on admission to the hospital) | Numerical
chol      | serum cholesterol in mg/dl | Numerical
fbs       | fasting blood sugar > 120 mg/dl (1 = true; 0 = false) | Categorical
restecg   | resting electrocardiographic results (0 = normal; 1 = ST-T wave abnormality, i.e. T wave inversions and/or ST elevation or depression of > 0.05 mV; 2 = probable or definite left ventricular hypertrophy by Estes' criteria) | Categorical
thalach   | maximum heart rate achieved | Numerical
exang     | exercise induced angina (1 = yes; 0 = no) | Categorical
oldpeak   | ST depression induced by exercise relative to rest | Numerical
slope     | slope of the peak exercise ST segment (1 = upsloping; 2 = flat; 3 = downsloping) | Categorical
ca        | number of major vessels (0-3) colored by fluoroscopy | Categorical
thal      | 3 = normal; 6 = fixed defect; 7 = reversible defect | Categorical
num       | diagnosis of heart disease (1 = present; 0 = not present) | Categorical
obes      | obesity (1 = yes; 0 = no) | Categorical
smoke     | smoking status (1 = past; 2 = current; 3 = never) | Categorical

Accuracy is the measure typically used to evaluate the efficacy of clustering methods; it reckons how worthy and consistent the test was. To calculate these metrics, we first compute the terms True Positive (TP), True Negative (TN), False Negative (FN) and False Positive (FP) based on Table 2.

Table II: Confusion Matrix

                               Physician diagnosis
                               Positive    Negative
Clustering results  Positive   TP          FP
                    Negative   FN          TN

Precision and recall: Precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as


sensitivity) is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance.

$$P = \frac{TP}{TP + FP} \qquad (3)$$

$$R = \frac{TP}{TP + FN} \qquad (4)$$

Figures 3 and 4 show the precision and recall comparison results of the existing ANN and RNN-RFE classifiers and the proposed EENN-RFE classifier. The results show that the proposed EENN-RFE classifier produces higher precision and recall than the existing methods. Accuracy and F-measure are based on a combinatorial approach that considers each possible pair of objects.

Figure 3: Precision results comparison. Figure 4: Recall results comparison. Figure 5: F-measure results comparison. Figure 6: Accuracy results comparison. [Each plot compares ANN, RNN-RFE and EENN-RFE, with the number of samples (20-100) on the x-axis and the respective metric (%) on the y-axis.]

The F-measure is described by equation (5):

$$F = \frac{2PR}{P + R}, \quad \text{where } P = \frac{TP}{TP + FP} \text{ and } R = \frac{TP}{TP + FN} \qquad (5)$$

The classification accuracy for measuring the clustering results is computed as follows. Suppose that the final number of clusters is k; the clustering accuracy r is then defined by equation (6):

$$r = \frac{1}{n}\sum_{i=1}^{k} a_i \qquad (6)$$

where $a_i$ is the maximal number of instances occurring both in cluster i and in its corresponding class, and the error of the clustering is e = 1 − r. A small helper computing these metrics is sketched below.
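The helper below computes Eqs. (3)-(5) plus accuracy from the Table II counts; the counts in the example call are illustrative only, not results from the paper.

```python
# Precision, recall, F-measure and accuracy from confusion-matrix counts.
def metrics(tp, fp, fn, tn):
    p = tp / (tp + fp)                     # precision, Eq. (3)
    r = tp / (tp + fn)                     # recall, Eq. (4)
    f = 2 * p * r / (p + r)                # F-measure, Eq. (5)
    acc = (tp + tn) / (tp + fp + fn + tn)  # overall accuracy
    return p, r, f, acc

print(metrics(tp=80, fp=10, fn=12, tn=98))  # -> (0.889, 0.870, 0.879, 0.890)
```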


Table III: Results Comparison vs. Metrics

          Precision (%)                  Recall (%)
Samples   ANN     RNN-RFE  EENN-RFE     ANN     RNN-RFE  EENN-RFE
10        72      74.5     78.36        74.5    75.25    81.23
20        73.5    76.53    80.23        75.63   76.63    81.25
30        76      79.52    82.58        78.36   79.52    83.63
40        78.63   81.25    86.63        79.63   80.52    85.69
50        80.21   83.78    88.21        80.25   82.32    89.36

          F-measure (%)                 Accuracy (%)
Samples   ANN     RNN-RFE  EENN-RFE     ANN     RNN-RFE  EENN-RFE
10        73.25   74.875   79.795       78      82       88
20        74.565  76.58    80.74        80      84       88.63
30        77.18   79.52    83.105       82      86.6     89.82
40        79.13   80.885   86.16        83.5    88.52    90.25
50        80.23   83.05    88.785       85      89.5     91.28

5. Conclusion and Future Work

RFE, a feature selection method, has been extensively used to select important genes from microarray data. While RFE can remove most of the irrelevant features, it cannot eliminate redundant features effectively, which may deteriorate the algorithm's performance. To overcome this limitation, a correlation measure is introduced to filter out a large portion of the highly redundant features before the conventional RFE is used. After the filtering procedure, feature selection is performed using EENN and feature reduction is performed using RFE, hence the name EENN-RFE. This work improves the ENN in two ways: (1) instead of using component NNs directly, a preliminary selection process is used to obtain the best component NNs; (2) the entropy is used to determine the weights of the component NNs in the ENN. Using the entropy to combine these best component networks improves the performance of the EENN by balancing the contribution of each component network. The experimental results on the heart disease prediction dataset show that the proposed method outperforms the baseline RNN and ANN in terms of classification accuracy, precision, recall, and F-measure. These results also illustrate the potential of the proposed EENN to be applied to other kinds of problems, which is left as future work.

References

[1] Stathis A., Moore M.J., Advanced pancreatic carcinoma: current treatment and future challenges, Nature Reviews Clinical Oncology 7 (3) (2010), 163–172.

[2] Yin S., Kaynak O., Big data for modern industry: challenges and trends, Proceedings of the IEEE 103 (2) (2015), 143-146.


[3] Pavlidis P., Weston J., Cai J., Grundy W.N., Gene functional classification from heterogeneous data, In Proceedings of the fifth annual international conference on Computational biology (2001), 249–255.

[4] Duan K., Rajapakse J.C., A variant of SVM-RFE for gene selection in cancer classification with expression data, Proceedings of the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (2004), 49–55.

[5] Liu H., Sun J., Liu L., Zhang H., Feature selection with dynamic mutual information, Pattern Recognition 42 (7) (2009), 1330–1339.

[6] Yin S., Zhu X., Intelligent particle filter and its application on fault detection of nonlinear system, IEEE Transactions on Industrial Electronics 62 (6) (2015), 3852–3861.

[7] Guyon I., Gunn S., Nikravesh M., Zadeh L.A., Feature extraction: foundations and applications, Springer (2008).

[8] Yin S., Huang Z., Performance monitoring for vehicle suspension system via fuzzy positivistic c-means clustering based on accelerometer measurements, IEEE/ASME Transactions on Mechatronics 20 (5) (2015), 2613–2620.

[9] Li G., You J., Liu X., Support Vector Machine (SVM) based prestack AVO inversion and its applications, Journal of Applied Geophysics 120 (2015), 60–68.

[10] Surinta O., Karaaba M.F., Schomaker L.R., Wiering M.A., Recognition of handwritten characters using local gradient feature descriptors, Engineering Applications of Artificial Intelligence 45 (2015), 405–414.

[11] Ahmad S., Kalra A., Stephen H., Estimating soil moisture using remote sensing data: A machine learning approach, Advances in Water Resources 33 (1) (2010), 69–80.

[12] Cao L.J., Tay F.E., Support vector machine with adaptive parameters in financial time series forecasting, IEEE Transactions on Neural Networks 14 (6) (2003), 1506–1518.

[13] Guyon I., Weston J., Barnhill S., Vapnik V., Gene selection for cancer classification using support vector machines, Machine learning 46 (1-3) (2002), 389–422.

[14] Nikolenko S.I., Tulupiev A.L., Learning Systems, Moscow, MCCME (2009).

[15] Karthikeyan T., Kanimozhi V.A., Deep Learning Approach for Prediction of Heart Disease Using Data mining Classification


Algorithm Deep Belief Network, International Journal of Advanced Research in Science, Engineering and Technology 4 (1) (2017), 3194-3201.

[16] Kim J.K., Kang S., Neural Network-Based Coronary Heart Disease Risk Prediction Using Feature Correlation Analysis, Journal of Healthcare Engineering (2017), 1-13.

[17] Sen S.K., Predicting and Diagnosing of Heart Disease Using Machine Learning Algorithms, International Journal of Engineering and Computer Science (IJECS) 6 (6) (2017), 21623-21531.

[18] Thomas J., Princy R.T., Human heart disease prediction system using data mining techniques, International Conference on Circuit, Power and Computing Technologies (ICCPCT) (2016), 1-5.

[19] Shrivastava S., Mehta N., Diagnosis of Heart Disease using Neural Network, IRACST - International Journal of Computer Science and Information Technology & Security (IJCSITS) 6 (6) (2016), 14-20.

[20] Turabieh H., A Hybrid ANN-GWO Algorithm for Prediction of Heart Disease, American Journal of Operations Research 6 (2) (2016), 136-146.

[21] Choi E., Schuetz A., Stewart W.F., Sun J., Using recurrent neural network models for early detection of heart failure onset, Journal of the American Medical Informatics Association 24 (2) (2016), 361-370.

[22] Yazdani A., Ramakrishnan K., Performance Evaluation of Artificial Neural Network Models for the Prediction of the Risk of Heart Disease, In International Conference for Innovation in Biomedical Engineering and Life Sciences (2015), 179-182.

[23] Sehgal P., Sharma M., Employing Gradient Based Techniques of Neural Network for Predicting Heart Disease, International Journal of Research in Management, Science & Technology 4 (2) (2016), 108-113.

[24] Sollich P., Krogh A., Learning with ensembles: How over-fitting can be useful, in Touretzky D.S., Mozer M.C., Hasselmo M.E. (eds.), Advances in Neural Information Processing Systems 8, MIT Press, Cambridge, MA (1996), 190–196.

[25] Sharkey A., Combining artificial neural nets ensemble and modular multi-net systems. London: Springer (1999).

[26] Breiman L., Bagging predictors, Machine Learning 24 (2) (1996), 123–140.


[27] Efron B., Tibshirani R., An introduction to the bootstrap. New York: Chapman & Hall, 1993.

[28] Zhao Z.Y., Zhang Y., Liao H.J., Design of ensemble neural network using the Akaike information criterion, Eng Appl Artif Intel 21, (2008), 1182–1188.

[29] Ren L., Zhao Z., An optimal neural network and concrete strength modeling, Advances in Engineering Software 33 (3) (2002), 117-130.

[30] Gupta A., Kumar N., Bhatnagar V., Analysis of medical data using data mining and formal concept analysis, World Academy of Science, Engineering and Technology 11 (2005), 61-64.

[31] Rajesh M., Gnanasekar J.M., Congestion control in heterogeneous WANET using FRCC, Journal of Chemical and Pharmaceutical Sciences, ISSN 0974-2115 (2015).

[32] Rajesh M., Gnanasekar J.M., A systematic review of congestion control in ad hoc network, International Journal of Engineering Inventions 3 (11) (2014), 52-56.

[33] Rajesh M., Gnanasekar J.M., Annoyed Realm Outlook Taxonomy Using Twin Transfer Learning, International Journal of Pure and Applied Mathematics 116 (21) (2017), 547-558.

[34] Rajesh M., Gnanasekar J.M., Get-Up-And-Go Efficient Memetic Algorithm Based Amalgam Routing Protocol, International Journal of Pure and Applied Mathematics 116 (21) (2017), 537-547.

[35] Rajesh M., Gnanasekar J.M., Congestion Control Scheme for Heterogeneous Wireless Ad Hoc Networks Using Self-Adjust Hybrid Model, International Journal of Pure and Applied Mathematics 116 (21) (2017), 537-547.
