Improving the Performance of Entropy Ensembles of Neural Networks (EENNs) on Classification of Heart Disease Prediction
S. Silvia Priscila (1) and M. Hemalatha (2)
(1) Bharathiar University, Coimbatore.
(2) Bharathiar University, Coimbatore.
Abstract: In the classification of high-dimensional data, feature selection is a highly significant step, particularly for selecting gene expressions in microarray data, which comprise thousands of features. As a wrapper method, the Entropy Ensemble of Neural Networks with Recursive Feature Elimination (EENN-RFE) algorithm is distinctive among influential feature selection techniques. In this work, an EENN-RFE based feature selection algorithm is proposed that removes redundant features, based on the calculated correlation between features, before EENN-RFE is applied. Noisy and irrelevant features are removed first; redundant features are then eliminated so that the correlation coefficient between every pair of remaining features is less than a predetermined threshold. In addition, this work proposes a new method for training Artificial Neural Networks (ANNs) that approximates the objective function by the arithmetic mean of an ensemble of selected, randomly created NNs, and applies this EENN classifier to the heart disease classification problem. The EENN-RFE algorithm is applied to a heart disease dataset, and the evaluation results show that its selected feature subset and class-wise classification capability are more effective than those of the existing Support Vector Machine with Recursive Feature Elimination (SVM-RFE). The results suggest that, after the feature selection process of the proposed EENN-RFE algorithm, the EENN classifier obtains a reasonable trade-off between accuracy and execution time.
Keywords: Data Mining, high-dimensional data, Feature selection, Entropy Ensemble of Neural Networks with Recursive Feature Elimination (EENN-RFE), Artificial Neural Networks (ANNs), classification, heart disease prediction.
International Journal of Pure and Applied Mathematics, Volume 117 No. 7 2017, 371-386. ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version). url: http://www.ijpam.eu. Special Issue
1. Introduction
Heart disease has a high death rate and is a common target of prediction. In general, patients exhibit no symptoms in the initial stage and, in the case of cancer, are usually diagnosed at an advanced stage, after which their survival is usually less than a year [1]. In each case, these data highlight the importance of fast, in-depth insight at the gene level into this particular malignant tumor. The development of microarray technology has enabled scientists to screen the mRNA expression levels of up to tens of thousands of genes in parallel, within a single experiment on a solid surface made of either glass or silicon. Biologists then use the data to identify cancer-related genes by comparing their expression values in healthy and cancerous tissues. However, microarray data usually come with thousands of features but only dozens of samples. This challenge, common to big data, calls for efficient feature selection techniques [2]. In feature selection, a small subset of features with stronger classification capability is selected. The two essential types of feature selection techniques are filter methods and wrapper methods. In the former, methods such as the Fisher criterion [3], t-statistics [4] and mutual information [5] play a role analogous to estimating the state of a system [6]: features are selected according to a statistic that quantifies each feature's relevance to the class labels. The selected feature subset is driven by the intrinsic statistical properties of the data, which makes it independent of the learning method used. In the latter, the quality of features is evaluated in terms of their influence on the performance of a certain classifier. In comparison, wrapper methods depend heavily on the classification algorithm used and generate feature subsets that lead to higher classification accuracy, although they are more time consuming. Both types of feature selection methods have better explanatory power than feature extraction methods [7] such as Principal Components Analysis (PCA) [8].
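As a concrete illustration of a filter method, the sketch below scores each feature independently of any classifier with a two-class Fisher criterion, (mu_0 - mu_1)^2 / (var_0 + var_1), and ranks features by that score. The exact form of the Fisher score, the 0/1 class labels and the helper names are assumptions for illustration; the paper does not state which variant of [3] is meant.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher criterion for one feature (assumed variant):
    (mu_0 - mu_1)^2 / (var_0 + var_1). Larger means better class separation."""
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    denom = pvariance(class0) + pvariance(class1)
    if denom == 0:
        # Zero within-class spread: perfectly separating if the means differ.
        return float("inf") if mean(class0) != mean(class1) else 0.0
    return (mean(class0) - mean(class1)) ** 2 / denom


def rank_features(X, labels):
    """Filter step: rank feature indices by Fisher score, best first.
    X is a list of samples, each a list of feature values."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

No classifier is trained here, which is exactly what makes the ranking independent of the learning method, in contrast to the wrapper methods discussed next.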
Owing to its high-quality generalization capability, the Support Vector Machine (SVM) remains an outstanding machine learning method [9-10] that has been successfully applied in various domains [11]-[12], and kernel functions have further broadened its applications. A backward elimination technique named Support Vector Machine Recursive Feature Elimination (SVM-RFE) [13] was developed as an efficient wrapper method. It starts with all the features and removes one feature at a time, namely the one whose removal has the least impact on the value of the objective function, until the desired feature subset is obtained. Although SVM-RFE effectively eliminates irrelevant features, the final feature subset generally still contains redundant features that should have been removed in early iterations. Moreover, classification with SVM is time consuming. The standard approach to artificial neural networks (ANNs) is based on optimization theory [14]. An ANN is a composition of neurons (see the next section) and depends on the set of weights assigned to the edges of the network. In a pattern classification problem, a neural network is regarded as an approximation of an objective function on the training set, and determining this approximation requires solving an optimization problem in the space of the network's parameters.
A novel method is proposed in this paper to solve the above problem. First, a correlation coefficient is employed to measure the redundancy among features, and redundant features are removed so that the correlation coefficient between every pair of remaining features is less than a predetermined threshold. After this filtering procedure, a new approach to training (ensembles of) artificial neural networks is introduced. Instead of optimizing the parameters of a single neural network, an ensemble of selected neural networks with randomly chosen parameters is considered, and only networks with a sufficiently small number of errors on the training set are selected. An averaged neural network is then formed, which in the simplest case is the arithmetic mean of the selected networks. The resulting Entropy Ensemble of Neural Networks (EENN) serves as an approximation of the objective function. The EENN method is tested on a pancreatic cancer dataset, and the results show that the EENN-RFE algorithm can select a feature subset with better overall and class-wise classification capability than the baseline SVM-RFE. In addition, following the feature selection process, the EENN classifier with RFE achieves a reasonable trade-off between sensitivity and specificity.
2. Literature Review
Karthikeyan and Kanimozhi [15] developed a heart disease prediction system that uses the Deep Belief Network (DBN) classification algorithm to predict the likelihood of heart-related diseases. The DBN is an effective classification algorithm that applies a deep learning approach within a deep neural network. Their work compares the Convolutional Neural Network (CNN) and DBN classification algorithms: the CNN provides 82% accuracy in predicting heart disease, whereas the proposed DBN provides 90%, which in turn improves the accuracy of the heart disease prediction system. The system is implemented in the MATLAB 8.1 development environment.
Kim and Kang [16] devised an NN-based prediction of coronary heart disease (CHD) risk using Feature Correlation Analysis (NN-FCA) in two stages. In the first, feature selection stage, features are ranked according to their importance in predicting CHD risk; in the second, feature correlation analysis stage, the correlations between feature relations and the output of each NN predictor are determined. Of the 4146 individuals in the Korean dataset evaluated, 3031 had low CHD risk and 1115 had high CHD risk. The area under the Receiver Operating Characteristic (ROC) curve of the proposed model (0.749 ± 0.010) was larger than that of the Framingham Risk Score (FRS) (0.393 ± 0.010). The proposed NN-FCA, which utilizes feature correlation analysis, was therefore better than the FRS for CHD risk prediction, yielding a larger area under the ROC curve and more accurate predictions of CHD risk in the Korean population.
Sen [17] observed that researchers have accelerated their efforts to develop software, with the help of machine learning algorithms, that helps doctors both predict and diagnose heart disease. The main goal of that work is to predict heart disease using machine learning algorithms, and the comparative performance of the algorithms is presented through graphical representations.
Thomas and Princy [18] surveyed various classification techniques used for predicting an individual's risk level based on age, gender, blood pressure, cholesterol, pulse rate, etc. Data mining classification techniques such as Naive Bayes (NB), KNN, Decision Tree (DT) and Neural Network (NN) were used to classify the patient risk level. Classification accuracy was found to be higher when more attributes were used.
Shrivastava and Mehta [19] aimed to build a model for heart disease diagnosis and to increase its effectiveness. The artificial neural network model adopted there is based on symptoms and risk factors. A multilayer perceptron network is used to develop the model, which is trained on the Cleveland Clinic Foundation dataset. The Back-Propagation Neural Network (BPNN) algorithm is trained, validated and tested, and its performance is recorded.
Turabieh [20] investigated the performance of two computational intelligence methods, Gray Wolf Optimization (GWO) and Artificial Neural Networks (ANN), by hybridizing them for heart disease prediction. The global search method GWO is more efficient than the local, gradient-based back-propagation method. The proposed algorithm exploits the capability of the ANN to find a relationship between the input and output variables, while the stochastic search capability of GWO is used to find the initial optimal weights and biases of the ANN, thereby reducing the probability that the ANN gets stuck in local minima or converges slowly towards the global optimum. The performance of the hybrid model (ANN-GWO) was compared with a standard BPNN using the Root Mean Square Error (RMSE). The results showed that the proposed model improved both the convergence speed and the prediction accuracy.
Choi et al. [21] explored whether using deep learning to model temporal relations among events in Electronic Health Records (EHRs) would improve performance in predicting the initial diagnosis of Heart Failure (HF), compared with conventional methods that ignore temporality. To detect relations among time-stamped events (e.g., disease diagnoses, medication orders, procedure orders), Recurrent Neural Network (RNN) models using Gated Recurrent Units (GRUs) were adapted with a 12- to 18-month observation window of cases and controls. The performance metrics of the proposed model were compared with those of regular logistic regression, neural network, support vector machine and K-nearest neighbor classifiers.
Yazdani and Ramakrishnan [22] designed and implemented a clinical decision support system that helps predict the risk of heart disease. The optimal artificial neural network model, identified among several models, is evaluated for accuracy on a standard heart disease database. An interface based on the optimal model was also developed to help doctors predict the risk of heart disease.
Sehgal and Sharma [23] focused on predicting heart disease using gradient-based neural network techniques. ANN-based models for heart disease prediction were developed using the neural network toolbox available in MATLAB.
To develop the disease prediction models, the multilayer perceptron architecture with error back-propagation was employed together with significant gradient-based methods. A comparative performance study of algorithms using various important training methods is also provided. The intention is to provide a more effective and acceptable framework for the prediction and diagnosis of heart disease.
3. Proposed Work
First, the correlation coefficient is used to quantify the redundancy among features. Redundant features are removed so that the correlation coefficient between every pair of remaining features is less than a predetermined threshold.
After the filtering procedure, a new approach to training (ensembles of) artificial neural networks is introduced. Instead of optimizing the parameters of a single neural network, an ensemble of selected neural networks with randomly chosen parameters is considered. Only neural networks with a sufficiently small number of errors on the training set are selected. An averaged neural network is then formed, which in the simplest case is the arithmetic mean of the selected neural networks. The resulting Entropy Ensemble of Neural Networks (EENN) serves as an approximation of the objective function.
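The select-and-average idea can be sketched as follows: networks with random, never-optimized parameters are drawn, those with few training errors are kept, and the averaged network is their arithmetic mean. This is a minimal sketch under stated assumptions: labels are +1/-1, each network has a single tanh hidden layer with parameters drawn uniformly from [-1, 1], and a network is kept when it has at most `max_errors` training errors. All helper names are hypothetical, and the entropy weighting of the full EENN is omitted, which corresponds to the simplest arithmetic-mean case.

```python
import math
import random


def random_network(n_inputs, n_hidden, rng):
    """Hypothetical helper: a one-hidden-layer tanh network whose weights and
    biases are drawn uniformly from [-1, 1] and never optimised."""
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = rng.uniform(-1, 1)

    def forward(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(w_hidden, b_hidden)]
        return math.tanh(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

    return forward


def training_errors(net, X, y):
    """Count misclassified samples; labels are +1/-1 and the class is the output's sign."""
    return sum(1 for x, t in zip(X, y) if (1 if net(x) >= 0 else -1) != t)


def build_ensemble(X, y, n_hidden=4, n_candidates=2000, max_errors=0, seed=0):
    """Draw random networks and keep only those with at most max_errors training errors."""
    rng = random.Random(seed)
    selected = []
    for _ in range(n_candidates):
        net = random_network(len(X[0]), n_hidden, rng)
        if training_errors(net, X, y) <= max_errors:
            selected.append(net)
    return selected


def ensemble_output(selected, x):
    """Averaged network: arithmetic mean of the selected networks' outputs."""
    return sum(net(x) for net in selected) / len(selected)
```

On toy data such as X = [[-1, -1], [-1, -0.5], [1, 1], [1, 0.5]] with labels [-1, -1, 1, 1], every kept network classifies the training set correctly. The sign of their mean therefore does too, because averaging non-negative outputs stays non-negative and averaging negative outputs stays negative. If no candidate passes the threshold, `selected` is empty and `ensemble_output` would divide by zero, so `n_candidates` or `max_errors` would need to be raised.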
The EENN method is tested on a pancreatic cancer dataset, and the results show that the EENN-RFE algorithm can select a feature subset with better overall and class-wise classification capability than the baseline SVM-RFE. In addition, following the feature selection process, the EENN classifier with RFE achieves a reasonable trade-off between sensitivity and specificity.
The overall framework of the proposed work is illustrated in Fig. 1.
Figure 1: Architecture Diagram
By design, EENN-RFE removes the features that contribute least to classification, but it has no control over redundant features, which can severely degrade classifier performance. Suppose two features f1 and f2 are proportional to each other and highly relevant to the class labels: both would survive EENN-RFE, at the expense of potentially useful features that are discarded instead. A neglected useful feature is often ineffective by itself because of its low correlation with the class labels, yet it can considerably improve classification accuracy in combination with other features. A typical example is the XOR problem. Consider the planar case, with a coordinate system in which each axis represents a feature. Data points in the first and third quadrants belong to one class, and those in the other two quadrants belong to the other class. Each feature is therefore individually irrelevant to the class, while both features together give perfect classification. Preprocessing efficiently eliminates systematic variation. To improve the performance of EENN-RFE, redundant features are eliminated before the baseline algorithm is run, and the correlation coefficient is introduced to quantify the similarity between features. The correlation coefficient between features f_i and f_j is computed as follows:
[Figure 1 content: heart disease dataset and microarray dataset → perform pre-processing → feature selection (RFE) → selected features → classification (EENN-RFE) → classified results: heart disease / no heart disease / both]
$$r_{ij} = \frac{\sum_{k=1}^{n} (f_{ik} - \bar{f}_i)(f_{jk} - \bar{f}_j)}{\sqrt{\sum_{k=1}^{n} (f_{ik} - \bar{f}_i)^2}\,\sqrt{\sum_{k=1}^{n} (f_{jk} - \bar{f}_j)^2}} \qquad (1)$$
where f_ik is the value of feature f_i for the kth sample, and \bar{f}_i is the mean value of feature f_i over all n samples. The value of r_ij lies within [−1, 1], and the larger |r_ij| is, the more correlated f_i and f_j are. The preprocessing procedure is described below:
1) Initialize the predefined threshold θ, the index of the current feature i = 1, and j = 2.
2) Compute r_ij; if |r_ij| > θ, remove feature f_i and go to step 4.
3) Set j = j + 1 and go to step 2, until j = l (l is the number of original features).
4) Set i = i + 1 and j = i + 1 and go to step 2, until i = l − 1.
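The preprocessing steps above can be sketched in Python as follows. The function computes r_ij with equation (1) and drops f_i as soon as some later feature f_j has |r_ij| > θ, then moves on to the next i (steps 2 and 4). This is a sketch under stated assumptions: features are passed as columns, indices are 0-based rather than the 1-based i, j of the steps, every pair with i < j is visited, and a constant feature is treated as uncorrelated to avoid division by zero.

```python
import math


def pearson(a, b):
    """Correlation coefficient r_ij of equation (1) between two feature columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # assumption: a constant feature is treated as uncorrelated
    return cov / (sa * sb)


def remove_redundant(columns, theta):
    """Correlation filter: f_i is removed as soon as |r_ij| > theta for some j > i.
    Returns the (0-based) indices of the features that are kept."""
    l = len(columns)
    removed = set()
    for i in range(l - 1):          # steps 1 and 4: advance i
        for j in range(i + 1, l):   # step 3: advance j
            if abs(pearson(columns[i], columns[j])) > theta:
                removed.add(i)      # step 2: drop f_i ...
                break               # ... and jump to the next i
    return [k for k in range(l) if k not in removed]
```

For example, with f0 = [1, 2, 3, 4], its proportional copy f1 = [2, 4, 6, 8] and f2 = [1, -1, 1, -1], a threshold of 0.9 removes f0 (r_01 = 1) and keeps f1 and f2 (|r_12| ≈ 0.45), which is the proportional-features case discussed earlier.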
Recursive Feature Elimination (RFE)
EENN-RFE is adopted to select the most discriminative features after noisy and irrelevant information has been filtered out. In each iteration, the current feature subset is used to build the ENN model, the weight of each feature is computed, and the r% (0 < r < 100) of features with the smallest weights are eliminated. If r% of the current features amounts to fewer than one feature, only the single feature with the smallest weight is eliminated. The iteration continues until all features have been eliminated. In each iteration, the current feature subset is evaluated by k-fold cross-validation, and the subset with the highest cross-validation accuracy is finally selected. An ENN is an ensemble of a finite number of NNs trained for the same task. The NNs in the ensemble are generally trained independently, and their predictions are then combined [24]. Any single NN in the ENN could provide a predictor for the task by itself, but better results may be obtained by combining them. The architecture of the ENN is illustrated in Fig. 2.
Figure 2: Architecture of the Ensemble Neural Network
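The RFE loop described in this subsection can be sketched as follows. In each iteration it drops the r% of current features with the smallest weights (at least one), and at the end it returns the subset with the best cross-validation accuracy. The ENN training and k-fold cross-validation are abstracted into two hypothetical callbacks, `weight_fn` and `score_fn`. The weights are assumed to be non-negative importance scores; for signed weights one would rank by magnitude.

```python
def rfe_select(features, weight_fn, score_fn, r=10.0):
    """Recursive feature elimination with percentage-based removal.

    features  : list of feature identifiers
    weight_fn : hypothetical callback, subset -> {feature: weight}, e.g. from a trained ENN
    score_fn  : hypothetical callback, subset -> k-fold cross-validation accuracy
    r         : percentage of the current features removed per iteration (0 < r < 100)
    """
    current = list(features)
    best_subset, best_score = None, float("-inf")
    while current:
        score = score_fn(current)  # evaluate the current subset
        if score > best_score:
            best_subset, best_score = list(current), score
        weights = weight_fn(current)
        n_drop = max(1, int(len(current) * r / 100))  # remove at least one feature
        ranked = sorted(current, key=lambda f: weights[f])  # smallest weight first
        dropped = set(ranked[:n_drop])
        current = [f for f in current if f not in dropped]
    return best_subset, best_score
```

With five features and r = 20, for instance, one feature is removed per iteration, because 20% of five is exactly one and 20% of four or fewer falls below one, so the fallback of removing a single feature applies.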
The two main steps to construct an ENN are:
Step 1 : Creating component networks.
Step 2 : Combining these component networks in the ENN.
In step 1, component networks that are both accurate and diverse are created, since good regression or classification ensembles require both properties. Several training parameters can be manipulated to obtain networks that generalize differently: the initial conditions, the training data, the topology of the nets, and the training algorithm [25]. Bagging and Boosting are the most widely used methods for creating the training data. Breiman proposed Bagging [26] based on bootstrap sampling [27], a resampling technique that generates many samples from the one available sample. During resampling, data points are drawn at random with replacement to form a new training set, and a component network is trained on this new sample. This process is repeated until the ENN has enough component networks. Bagging is therefore useful for problems with a shortage of data.
Ensemble Neural Networks Using Entropy (EENN)
Entropy-based ENN (EENN) can reduce overfitting in the ENN. Because the initial weights are random, single networks with the same number of hidden nodes perform differently on the training data. The proposed method therefore first chooses the best initial structure for each component network, and then uses entropy to combine these best component networks, taking into account both each network's accuracy and the model's complexity. To make the ENN both accurate and stable, EENN first reduces the error of each component network and then balances their contributions to the ENN.
Creating the component networks consists of two steps: first the training and testing data sets are created, and then the component networks themselves. Some common ratios of training data to testing data are used in the analysis, and all training data are used for each component network. Each component network is created several times, but only the best structure is used in the ENN; the criterion for choosing it is the smallest training MSE. Because high-quality regression ensemble members must be both accurate and diverse, each component network needs to be trained to high accuracy, and different numbers of hidden nodes are used in different component networks, except when data are limited.
The number of hidden nodes in each component network is defined similarly to Zhao's method [28], in which the best number of hidden nodes for a single NN is first found by trial and error. The best number is the one that gives the single NN with the smallest test MSE and the smallest AIC value, since these indicate sufficient training and an appropriate number of parameters, and it is used as the maximum number of hidden nodes of the component networks. Other networks with fewer hidden nodes than this maximum are then added. The gap between any two component networks should be as large as possible to increase diversity, and the minimum number of hidden nodes should be as small as possible while still giving sufficient accuracy. An upper bound on the number of parameters that can be incorporated in the model follows from the fact that it is not possible to determine more parameters than there are samples in the data set [29]. A network with one output node, N_i inputs and N_h hidden nodes has N_h(N_i + 2) + 1 weights and biases; requiring this to be smaller than the number of training samples N_tr gives the bound on the number of hidden nodes:
$$N_h < \frac{N_{tr} - 1}{N_i + 2} \qquad (2)$$
where N_i is the number of input nodes of the component network, N_h is the number of hidden nodes, and N_tr is the number of training samples. In this work, component networks with different numbers of hidden nodes are used when the training data are sufficient.
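The helpers below turn equation (2) into a computation. The first uses integer arithmetic, since N_h < (N_tr − 1)/(N_i + 2) is equivalent to N_h(N_i + 2) ≤ N_tr − 2 for integers. As an illustration only, the last helper spaces the other component sizes evenly below the maximum, one simple way to keep the gaps between component networks large. The even-spacing rule, the function names and the example values (13 inputs with the 455 training records of Section 4) are assumptions, not settings reported in the paper.

```python
def max_hidden_nodes(n_train, n_inputs):
    """Largest integer N_h with N_h < (N_tr - 1) / (N_i + 2), equation (2).
    Integer form N_h * (N_i + 2) <= N_tr - 2 avoids floating-point edge cases."""
    return (n_train - 2) // (n_inputs + 2)


def n_parameters(n_inputs, n_hidden):
    """Weights plus biases of a one-hidden-layer network with a single output node."""
    return n_hidden * (n_inputs + 2) + 1


def hidden_node_schedule(n_max, n_min, n_members):
    """Hypothetical spacing rule: spread component sizes evenly from n_max down to
    n_min so that the gaps between component networks are as large as possible."""
    if n_members == 1:
        return [n_max]
    step = (n_max - n_min) / (n_members - 1)
    return sorted({round(n_max - k * step) for k in range(n_members)}, reverse=True)
```

With 455 training records and 13 inputs, for instance, `max_hidden_nodes` gives 30: 30 hidden nodes need 30 × 15 + 1 = 451 parameters, which is below 455, while 31 would need 466.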
The entropy concept is used to obtain an unbiased ENN, in which three objectives are optimized simultaneously: maximizing the entropy of the combining weights of the whole ENN; minimizing the error between the mean output of the ENN and the mean target value; and minimizing the difference between the standard deviation of the ENN output and the standard deviation of the target values. Together, these objectives benefit the ENN as a whole.
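The paper does not state how the three entropy objectives are combined or optimized. The sketch below makes two assumptions. First, it uses a weighted sum: the negative entropy of the combining weights plus penalties on the mean and standard-deviation mismatch, with hypothetical coefficients `lam_mean` and `lam_std`. Second, it searches crudely at random over non-negative weights that sum to one. A constrained optimizer would be the more usual choice in practice.

```python
import math
import random
from statistics import mean, pstdev


def entropy(weights):
    """Shannon entropy of the combining weights (maximal for uniform weights)."""
    return -sum(w * math.log(w) for w in weights if w > 0)


def combined_output(weights, member_outputs):
    """ENN output per sample: weighted sum of the component outputs.
    member_outputs[k][s] is the output of component network k on sample s."""
    n_samples = len(member_outputs[0])
    return [sum(w * out[s] for w, out in zip(weights, member_outputs))
            for s in range(n_samples)]


def objective(weights, member_outputs, targets, lam_mean=10.0, lam_std=10.0):
    """Assumed scalarisation (lower is better): maximise entropy, match the mean
    and match the standard deviation of the targets."""
    y = combined_output(weights, member_outputs)
    return (-entropy(weights)
            + lam_mean * abs(mean(y) - mean(targets))
            + lam_std * abs(pstdev(y) - pstdev(targets)))


def random_search_weights(member_outputs, targets, n_trials=5000, seed=0):
    """Crude random search over the probability simplex, starting from uniform weights."""
    rng = random.Random(seed)
    k = len(member_outputs)
    best = [1.0 / k] * k
    best_val = objective(best, member_outputs, targets)
    for _ in range(n_trials):
        raw = [rng.expovariate(1.0) for _ in range(k)]
        total = sum(raw)
        cand = [v / total for v in raw]  # normalised exponentials: uniform on the simplex
        val = objective(cand, member_outputs, targets)
        if val < best_val:
            best, best_val = cand, val
    return best, best_val
```

For two networks whose plain average reproduces the targets exactly, the uniform weights [0.5, 0.5] already maximize entropy with zero mean and spread error, so the search returns them unchanged. Any other weighting loses entropy and shifts the mean output.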
4. Experimental Results
The data set is taken from the Data Mining Repository of the University of California, Irvine (UCI) [30]. The system is tested using the Cleveland data set.
The attributes include age, sex, chest pain type, resting blood pressure, serum cholesterol in mg/dl, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, ST depression, slope of the peak exercise ST segment, number of major vessels, thal, and the diagnosis of heart disease. A total of 909 records with 15 medical attributes are used in the experiments.
This data set is taken from the Cleveland Heart Disease database [30]. The records are split into two categories: a training data set (455 records) and a testing data set (454 records), with the records for each category selected randomly. The "Diagnosis" attribute is the target predictable attribute: value "1" denotes patients with heart disease and value "0" patients without heart disease. "PatientID" is used as the key, and the remaining attributes are inputs. It is assumed that problems such as missing, inconsistent and duplicate data have all been resolved. Table 1 shows the attribute information of the Cleveland Heart Disease database.
There are multiple classes to be predicted: absence, moderate and presence of heart disease in patients.
Table I: Attribute information of Cleveland Heart Disease database
Attribute | Description | Type
age | Age in years | Numerical
sex | Sex (1 = male; 0 = female) | Categorical
cp | Chest pain type (value 1: typical angina; value 2: atypical angina; value 3: non-anginal pain; value 4: asymptomatic) | Categorical
trestbps | Resting blood pressure (in mm Hg on admission to the hospital) | Numerical
chol | Serum cholesterol in mg/dl | Numerical
fbs | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) | Categorical
restecg | Resting electrocardiographic results (value 0: normal; value 1: ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV); value 2: probable or definite left ventricular hypertrophy by Estes' criteria) | Categorical
thalach | Maximum heart rate achieved | Numerical
exang | Exercise induced angina (1 = yes; 0 = no) | Categorical
oldpeak | ST depression induced by exercise relative to rest | Numerical
slope | Slope of the peak exercise ST segment (value 1: upsloping; value 2: flat; value 3: downsloping) | Categorical
ca | Number of major vessels (0-3) colored by fluoroscopy | Categorical
thal | 3 = normal; 6 = fixed defect; 7 = reversible defect | Categorical
num | Diagnosis of heart disease (value 1: present; value 0: not present) | Categorical
obes | 1 = yes; 0 = no | Categorical
smoke | 1 = past; 2 = current; 3 = never | Categorical
Accuracy is the measure typically used to evaluate the efficacy of clustering methods; it indicates how reliable and consistent the test is. To calculate these metrics, the terms True Positive (TP), True Negative (TN), False Negative (FN) and False Positive (FP) are first computed based on Table 2.
Table II: Confusion Matrix
Result of the diagnostic test (clustering results) | Physician diagnosis: Positive | Physician diagnosis: Negative
Positive | TP | FP
Negative | FN | TN
Precision and recall: precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance.
$$P = \frac{TP}{TP + FP} \qquad (3)$$
$$R = \frac{TP}{TP + FN} \qquad (4)$$
Figures 3 and 4 show the precision and recall comparison results of the existing ANN and RNN-RFE classifiers and the proposed EENN-RFE classifier. The results show that the proposed EENN-RFE classifier achieves higher precision and recall than the existing methods.
Accuracy and F-measure are based on a combinatorial approach which considers
each possible pair of objects.
Figure 3: Precision results comparison
Figure 4: Recall results comparison
Figure 5: F-measure results comparison
Figure 6: Accuracy results comparison
F-measure is defined by equation (5):
$$F\text{-measure} = \frac{2PR}{P + R}, \quad \text{where } P = \frac{TP}{TP + FP} \text{ and } R = \frac{TP}{TP + FN} \qquad (5)$$
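Equations (3) to (5) can be computed directly from the four confusion-matrix counts of Table II, as in the sketch below. The accuracy line uses the standard definition (TP + TN) / (TP + FP + FN + TN), which is an assumption here because the paper gives no numbered equation for it. Returning 0.0 when a denominator is zero is also a convention chosen for this sketch.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision (3), recall (4), F-measure (5) and accuracy from Table II counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # standard definition (assumed)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}
```

For example, TP = 40, FP = 10, FN = 20 and TN = 30 give P = 0.8 and R ≈ 0.667, so the F-measure is 2 × 0.8 × 0.667 / 1.467 ≈ 0.727, which is the harmonic mean of P and R. The accuracy is 0.7.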
The classification accuracy of the clustering results is computed as follows. Suppose that the final number of clusters is k; the clustering accuracy r is then defined by equation (6):
$$r = \frac{\sum_{i=1}^{k} a_i}{n} \qquad (6)$$
where a_i is the number of instances that occur both in cluster i and in its corresponding class (the class with the maximum overlap), and n is the total number of instances. The clustering error is then e = 1 − r.
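Equation (6) can be sketched as follows. For each cluster, a_i is taken as the size of its largest overlap with any single class, and these overlaps are summed and divided by n. The function name and the choice to accept arbitrary hashable cluster and class labels are illustrative assumptions.

```python
from collections import Counter


def clustering_accuracy(cluster_ids, class_labels):
    """Equation (6): r = (sum_i a_i) / n, where a_i is the largest number of instances
    that cluster i shares with any single class. Returns (r, e) with e = 1 - r."""
    n = len(class_labels)
    per_cluster = {}
    for c, y in zip(cluster_ids, class_labels):
        per_cluster.setdefault(c, Counter())[y] += 1
    r = sum(max(counts.values()) for counts in per_cluster.values()) / n
    return r, 1 - r
```

For clusters [0, 0, 0, 1, 1, 1] and classes ["a", "a", "b", "b", "b", "a"], each cluster's majority class covers two of its three instances. This gives r = 4/6 ≈ 0.667 and e ≈ 0.333.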
[Figure panels: Precision (%), Recall (%), F-measure (%) and Accuracy (%) plotted against the number of samples for ANN, RNN-RFE and EENN-RFE]
Table III: Results Comparison vs. Metrics
Samples | Precision (%) ANN | Precision (%) RNN-RFE | Precision (%) EENN-RFE | Recall (%) ANN | Recall (%) RNN-RFE | Recall (%) EENN-RFE
10 | 72 | 74.5 | 78.36 | 74.5 | 75.25 | 81.23
20 | 73.5 | 76.53 | 80.23 | 75.63 | 76.63 | 81.25
30 | 76 | 79.52 | 82.58 | 78.36 | 79.52 | 83.63
40 | 78.63 | 81.25 | 86.63 | 79.63 | 80.52 | 85.69
50 | 80.21 | 83.78 | 88.21 | 80.25 | 82.32 | 89.36

Samples | F-measure (%) ANN | F-measure (%) RNN-RFE | F-measure (%) EENN-RFE | Accuracy (%) ANN | Accuracy (%) RNN-RFE | Accuracy (%) EENN-RFE
10 | 73.25 | 74.875 | 79.795 | 78 | 82 | 88
20 | 74.565 | 76.58 | 80.74 | 80 | 84 | 88.63
30 | 77.18 | 79.52 | 83.105 | 82 | 86.6 | 89.82
40 | 79.13 | 80.885 | 86.16 | 83.5 | 88.52 | 90.25
50 | 80.23 | 83.05 | 88.785 | 85 | 89.5 | 91.28
5. Conclusion and Future Work
RFE, a feature selection method, has been used extensively to select important genes in microarray data. Although RFE can remove most irrelevant features, it cannot eliminate redundant features effectively, which may degrade the algorithm's performance. To overcome this limitation, a correlation measure is introduced to filter out a set of highly redundant features before conventional RFE is applied. After the filtering procedure, feature selection is performed using EENN and feature reduction using RFE; hence the method is named EENN-RFE. This work improves the ENN in two ways: (1) instead of using the component NNs directly, a preliminary selection process is used to obtain the best component NNs; (2) entropy is used to determine the weights of the component NNs in the ENN. Using entropy to combine these best component networks improves the performance of the EENN by balancing the contribution of each component network. The experimental results on the heart disease prediction dataset show that the proposed method outperforms the baseline RNN and ANN in terms of classification accuracy, precision, recall and F-measure. These results also suggest that the proposed EENN could be applied to other kinds of problems, which is left as future work.
References
[1] Stathis A., Moore M.J., Advanced pancreatic carcinoma: current treatment and future challenges, Nature Reviews Clinical Oncology 7 (3) (2010), 163–172.
[2] Yin S., Kaynak O., Big data for modern industry: challenges and trends, Proceedings of the IEEE 103 (2) (2015), 143-146.
[3] Pavlidis P., Weston J., Cai J., Grundy W.N., Gene functional classification from heterogeneous data, In Proceedings of the fifth annual international conference on Computational biology (2001), 249–255.
[4] Duan K., Rajapakse J.C., A variant of SVM-RFE for gene selection in cancer classification with expression data, Proceedings of the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (2004), 49–55.
[5] Liu H., Sun J., Liu L., Zhang H., Feature selection with dynamic mutual information, Pattern Recognition 42 (7) (2009), 1330–1339.
[6] Yin S., Zhu X., Intelligent particle filter and its application on fault detection of nonlinear system, IEEE Transactions on Industrial Electronics 62 (6) (2015), 3852–3861.
[7] Guyon I., Gunn S., Nikravesh M., Zadeh L.A., Feature extraction: foundations and applications, Springer (2008).
[8] Yin S., Huang Z., Performance monitoring for vehicle suspension system via fuzzy positivistic c-means clustering based on accelerometer measurements, IEEE/ASME Transactions on Mechatronics 20 (5) (2015), 2613–2620.
[9] Li G., You J., Liu X., Support Vector Machine (SVM) based prestack AVO inversion and its applications, Journal of Applied Geophysics 120 (2015), 60–68.
[10] Surinta O., Karaaba M.F., Schomaker L.R., Wiering M.A., Recognition of handwritten characters using local gradient feature descriptors, Engineering Applications of Artificial Intelligence 45 (2015), 405–414.
[11] Ahmad S., Kalra A., Stephen H., Estimating soil moisture using remote sensing data: A machine learning approach, Advances in Water Resources 33 (1) (2010), 69–80.
[12] Cao L.J., Tay F.E., Support vector machine with adaptive parameters in financial time series forecasting, IEEE Transactions on Neural Networks 14 (6) (2003), 1506–1518.
[13] Guyon I., Weston J., Barnhill S., Vapnik V., Gene selection for cancer classification using support vector machines, Machine Learning 46 (1-3) (2002), 389–422.
[14] Nikolenko S.I., Tulupiev A.L., Learning Systems, Moscow, MCCME (2009).
[15] Karthikeyan T., Kanimozhi V.A., Deep Learning Approach for Prediction of Heart Disease Using Data Mining Classification Algorithm Deep Belief Network, International Journal of Advanced Research in Science, Engineering and Technology 4 (1) (2017), 3194–3201.
[16] Kim J.K., Kang S., Neural Network-Based Coronary Heart Disease Risk Prediction Using Feature Correlation Analysis, Journal of Healthcare Engineering (2017), 1–13.
[17] Sen S.K., Predicting and Diagnosing of Heart Disease Using Machine Learning Algorithms, International Journal of Engineering and Computer Science (IJECS) 6 (6) (2017), 21623–21531.
[18] Thomas J., Princy R.T., Human heart disease prediction system using data mining techniques, International Conference on Circuit, Power and Computing Technologies (ICCPCT) (2016), 1–5.
[19] Shrivastava S., Mehta N., Diagnosis of Heart Disease using Neural Network, IRACST - International Journal of Computer Science and Information Technology & Security (IJCSITS) 6 (6) (2016), 14–20.
[20] Turabieh H., A Hybrid ANN-GWO Algorithm for Prediction of Heart Disease, American Journal of Operations Research 6 (2) (2016), 136–146.
[21] Choi E., Schuetz A., Stewart W.F., Sun J., Using recurrent neural network models for early detection of heart failure onset, Journal of the American Medical Informatics Association 24 (2) (2016), 361–370.
[22] Yazdani A., Ramakrishnan K., Performance Evaluation of Artificial Neural Network Models for the Prediction of the Risk of Heart Disease, International Conference for Innovation in Biomedical Engineering and Life Sciences (2015), 179–182.
[23] Sehgal P., Sharma M., Employing Gradient Based Techniques of Neural Network for Predicting Heart Disease, International Journal of Research in Management, Science & Technology 4 (2) (2016), 108–113.
[24] Sollich P., Krogh A., Learning with ensembles: How over-fitting can be useful, in: Touretzky D.S., Mozer M.C., Hasselmo M.E. (Eds.), Advances in Neural Information Processing Systems 8 (Denver, CO), MIT Press, Cambridge, MA (1996), 190–196.
[25] Sharkey A., Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems, Springer, London (1999).
[26] Breiman L., Bagging predictors, Machine Learning 24 (2) (1996), 123–140.
[27] Efron B., Tibshirani R., An Introduction to the Bootstrap, Chapman & Hall, New York (1993).
[28] Zhao Z.Y., Zhang Y., Liao H.J., Design of ensemble neural network using the Akaike information criterion, Engineering Applications of Artificial Intelligence 21 (2008), 1182–1188.
[29] Ren L., Zhao Z., An optimal neural network and concrete strength modeling, Advances in Engineering Software 33 (3) (2002), 117–130.
[30] Gupta A., Kumar N., Bhatnagar V., Analysis of medical data using data mining and formal concept analysis, World Academy of Science, Engineering and Technology 11 (2005), 61–64.
[31] Rajesh M., Gnanasekar J.M., Congestion control in heterogeneous WANET using FRCC, Journal of Chemical and Pharmaceutical Sciences (2015), ISSN 0974-2115.
[32] Rajesh M., Gnanasekar J.M., A systematic review of congestion control in ad hoc network, International Journal of Engineering Inventions 3 (11) (2014), 52–56.
[33] Rajesh M., Gnanasekar J.M., Annoyed Realm Outlook Taxonomy Using Twin Transfer Learning, International Journal of Pure and Applied Mathematics 116 (21) (2017), 547–558.
[34] Rajesh M., Gnanasekar J.M., Get-Up-And-Go Efficient Memetic Algorithm Based Amalgam Routing Protocol, International Journal of Pure and Applied Mathematics 116 (21) (2017), 537–547.
[35] Rajesh M., Gnanasekar J.M., Congestion Control Scheme for Heterogeneous Wireless Ad Hoc Networks Using Self-Adjust Hybrid Model, International Journal of Pure and Applied Mathematics 116 (21) (2017), 537–547.