Analysis of Weka Data Mining Techniques for Heart Disease … ›...
Transcript of Analysis of Weka Data Mining Techniques for Heart Disease … ›...
Int J Med Rev 2020 Jan;7(1): 15-24
10.30491/ijmr.2020.221474.1078
Copyright © 2020 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://
creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
INTERNATIONAL JOURNAL OF
MEDICAL
REVIEWS
Narrative Review
Analysis of Weka Data Mining Techniques for Heart Disease
Prediction System
Basma Saleh *, Ahmed Saeidi, Ali al-Aqbi, Lamees Salman
Al-Mustansiriyah University, Baghdad, Iraq
* Corresponding Author: Basma Saleh Al-Mustansiriyah University, Baghdad, Iraq. Email: [email protected]
Received December 01, 2019; Accepted December 27, 2019; Online Published January 01, 2020
Abstract
Data mining is characterized as searching for useful information through very large data sets. Some of the key and most common techniques for
data mining are association rules, classification, clustering, prediction, and sequential models. For a wide range of applications, data mining
techniques are used. Data mining plays a significant role in disease detection in the health care industry. The patient should be needed to detect
a number of tests for the disease. However, the number of tests should be reduced by using data mining techniques. In time and performance, this
reduced test plays an important role. Heart disease is a cardiovascular disease that causes death. Health problems are enormous in this recent
situation because of the prediction and the classification of health problems in different situations. The data mining area included the prediction
and identification of abnormality and its risk rate in these domains. Today the health industry holds hidden information essential for decision-
making. For predicting heart problems, data extraction algorithms like K-star, J48, SMO, Naïve Bayes, MLP, Random Forest, Bayes Net, and
REPTREE are used for this study (Weka 3.8.3) software. The results of the predictive accuracy, the ROC curve, and the AUC value are combined
using a standard set of data and a collected dataset. By applying different data mining algorithms, the patient data can be used for diagnosis as
training samples. The main drawbacks of the previous studies are that they need accurate and the number of features. This paper surveys recent
data mining techniques applied to predict heart diseases. And Identifying the major risk factors of Heart Disease categorizing the risk factors in an
order which causes damages to the heart such as high blood cholesterol, diabetes, smoking, poor diet, obesity, hypertension, stress, etc. Data
mining functions and techniques are used to identify the level of risk factors to help the patients in taking precautions in advance to save their life.
Keywords: Data mining, Heart Disease, Prediction, Classification, Dataset, WEKA tool, Prediction techniques.
Saleh et al
16 | International Journal of Medical Reviews. 2020;7(1)15-24
Analysis of Weka Data Mining Techniques for Heart Disease Prediction System
International Journal of Medical Reviews. 2020;7(1):15-24 | 17
Saleh et al
18 | International Journal of Medical Reviews. 2020;7(1)15-24
Analysis of Weka Data Mining Techniques for Heart Disease Prediction System
International Journal of Medical Reviews. 2020;7(1):15-24 | 19
Saleh et al
20 | International Journal of Medical Reviews. 2020;7(1)15-24
Table 1. Commonly used Algorithms
Generic Name WEKA Name
Bayesian Network Naïve Bayes(NB)
Support Vector Machine SMO
C4.5 Decision Tree J48
K-Nearest Neighbor 1Bk
Workbench
Analysis of Weka Data Mining Techniques for Heart Disease Prediction System
International Journal of Medical Reviews. 2020;7(1):15-24 | 21
Figure 1. Weka GUI chooser.
Figure-2. The Explorer Interface
Saleh et al
22 | International Journal of Medical Reviews. 2020;7(1)15-24
Figure-3. The Knowledge Flow Interface
Figure-4. The Experimenter Interface
𝑀𝐶𝐶 =𝑇𝑃∗𝑇𝑁−𝐹𝑃∗𝐹𝑁
√(𝑇𝑃+𝐹𝑃)(𝑇𝑃+𝐹𝑁)(𝑇𝑁+𝐹𝑃)(𝑇𝑁+𝐹𝑁)
Analysis of Weka Data Mining Techniques for Heart Disease Prediction System
International Journal of Medical Reviews. 2020;7(1):15-24 | 23
Table-3. different outcomes of a two-class prediction
Actual class
Predicted class
YES No
YES True Positive (TP) False Negative (FN)
No False Positive (FP) True Negative (TN)
𝑇𝑃+𝑇𝑁
𝑇𝑃+𝑇𝑁+𝐹𝑃+𝐹𝑁
RMSE = √1
𝑛∑ (
𝜌𝑖,𝑗−𝜏𝑗
𝜏𝑗)2
𝑛𝑗=1
𝜌𝑖,𝑗
𝜏𝑗
1. Patel AK, Shukla DP. A data mining technique for prediction of
coronary heart disease using neuro-fuzzy integrated approach two
level Patel AK, Shukla DP. A data mining technique for prediction
of coronary heart disease using neuro-fuzzy integrated approach
two level. International Journal of Engineering and Computer
Science. 2013;2(09). 2663-2671.
2. Kundra D, Kaur N. Review on prediction system for heart diagnosis
using data mining techniques. Int. J. Latest Res. Eng. Technol. 2015
Oct;1(5):9-14.
3. Adamson SJ, Heather N, Morton V, Raistrick D. Initial preference
for drinking goal in the treatment of alcohol problems: II.
Treatment outcomes. Alcohol & Alcoholism. 2010 Mar
1;45(2):136-42. doi: 10.1093/alcalc/agq005
4. Lal, B Suresh. Diabetes: Causes, Symptoms And Treatments. Public
Health Environment and Social Issues in India. 1 st. : Serials
Publications, 2016, 5.
5. Shahsavarani AM, Azad Marz Abadi E, Hakimi Kalkhoran M.
Stress: Facts and theories through literature review. International
Journal of Medical Reviews. 2015 Jun 1;2(2):230-41.
6. Control, South Carolina Department of Health and Environmental.
What Is High Blood Pressure.
https://dc.statelibrary.sc.gov/handle/10827/25131. [Online] 2017.
[Cited: December 15, 2019.]
7. Sultana M, Haider A, Uddin MS. Analysis of data mining
techniques for heart disease prediction. In2016 3rd International
Conference on Electrical Engineering and Information
Communication Technology (ICEEICT) 2016 Sep 22 (pp. 1-5).
IEEE. doi: 10.1109/CEEICT.2016.7873142
8. Sujata, Joshi and Mydhili, K., Nair. Prediction of Heart Disease
Using Classification Based Data Mining Techniques. [book auth.]
L. Jain, et al., et al. Computational Intelligence in Data Mining -
Volume 2 Smart Innovation, Systems and Technologies. s.l. :
Springer, New Delhi, 2014, Vol. 32, pp. 503-511. doi:
10.1007/978-81-322-2208-8_46
9. Dangare C, Apte S. A data mining approach for prediction of heart
disease using neural networks. International Journal of Computer
Engineering and Technology (IJCET). 2012 Oct 14;3(3).
10. Heart Disease—General Info and Peer reviewed studies.
http://www.aristoloft.com. [Online] [Cited: December 15, 2019.]
11. Venkatalakshmi B, Shivsankar MV. Heart disease diagnosis using
predictive data mining. International Journal of Innovative
Research in Science, Engineering and Technology. 2014
Mar;3(3):1873-7. doi: 10.1109/ICECA.2017.8203643
12. Jabbar MA, Samreen S. Heart disease prediction system based on
hidden naïve bayes classifier. In 2016 International Conference
on Circuits, Controls, Communications and Computing (I4C) 2016
Oct 4 :pp. 1-5. doi: 10.1109/cimca.2016.8053261
13. Orphanou K, Dagliati A, Sacchi L, Stassopoulou A, Keravnou E,
Bellazzi R. Incorporating repeating temporal association rules in
naïve bayes classifiers for coronary heart disease diagnosis.
Journal of biomedical informatics. 2018 May 1;81:74-82. doi:
10.1016/j.jbi.2018.03.002
14. Alizadehsani R, Habibi J, Hosseini MJ, Mashayekhi H, Boghrati R,
Ghandeharioun A, Bahadorian B, Sani ZA. A data mining
approach for diagnosis of coronary artery disease. Computer
methods and programs in biomedicine. 2013 Jul 1;111(1):52-61.
doi: 10.1016/j.cmpb.2013.03.004
15. Ramotra AK, Mahajan A, Kumar R, Mansotra V. Comparative
Analysis of Data Mining Classification Techniques for Prediction
of Heart Disease Using the Weka and SPSS Modeler Tools. InSmart
Trends in Computing and Communications 2020 :pp. 89-97.
Springer, Singapore. doi: 10.1007/978-981-15-0077-0_10
16. Han J, Kamber M, Pei J. Data mining concepts and techniques
third edition. Morgan Kaufmann. 2011.[Book]
17. Aljaaf AJ, Al-Jumeily D, Hussain AJ, Dawson T, Fergus P, Al-
Jumaily M. Predicting the likelihood of heart failure with a multi
level risk assessment using decision tree. In2015 Third
International Conference on Technological Advances in Electrical,
Electronics and Computer Engineering (TAEECE) 2015 Apr 29 :pp.
101-106. doi: 10.1109/TAEECE.2015.7113608
18. Alexopoulos E, Dounias GD, Vemmos K. Medical diagnosis of
stroke using inductive machine learning. Machine Learning and
Applications: Machine Learning in Medical Applications. 1999
Jul:20-3.
19. Masetic Z, Subasi A. Congestive heart failure detection using
random forest classifier. Computer methods and programs in
biomedicine. 2016 ;130:54-64. doi: 10.1016/j.cmpb.2016.03.020
20. Lu H, Setiono R, Liu H. Effective data mining using neural
Saleh et al
24 | International Journal of Medical Reviews. 2020;7(1)15-24
networks. IEEE transactions on knowledge and data engineering.
1996 Dec;8(6):957-61. doi: 10.1109/69.553163
21. Gharehchopogh FS, Khalifelu ZA. Neural network application in
diagnosis of patient: a case study. InInternational Conference on
Computer Networks and Information Technology 2011 Jul 11: pp.
245–249. doi: 10.1109/ICCNIT.2011.6020937
22. Saxena K, Sharma R. Efficient heart disease prediction system.
Procedia Computer Science. 2016;85:962-9. doi:
10.1016/j.procs.2016.05.288
23. Nashif S, Raihan MR, Islam MR, Imam MH. Heart disease
detection by using machine learning algorithms and a real-time
cardiovascular health monitoring system. World Journal of
Engineering and Technology. 2018 Sep 12;6(4):854-73. doi:
10.4236/wjet.2018.64057
24. Jothikumar, R and Sivabalan, V. Analysis of Classification
Algorithms for Heart Disease Prediction and its Accuracies.
Middle-East Journal of Scientific Research. 2016; 24: pp. 200-206.
doi: 10.5829/idosi.mejsr.2016.24.RIETMA133
25. Sarangam Kodati DR. Analysis of heart disease using in data
mining tools Orange and Weka. Global Journal of Computer
Science and Technology. 2018; 18(1c): 16-22.
26. Platt J. Sequential minimal optimization: A fast algorithm for
training support vector machines. 1998; 98-14.
27. Oo AN, Naing T. Smo and Lazy Classifiers for Heart Disease
Prediction. International Journal of Advance Research and
Innovative Ideas in Education. 2019; 5: 2395-4396.
28. Latha CB, Jeeva SC. Improving the accuracy of prediction of heart
disease risk based on ensemble classification techniques.
Informatics in Medicine Unlocked. 2019 Jan 1;16: pp. 1-9. doi:
10.1016/j.imu.2019.100203
29. Mirmozaffari M, Alinezhad A, Gilanpour A. Data Mining Apriori
Algorithm for Heart Disease Prediction. Int'l Journal of Computing,
Communications & Instrumentation Engg. 2017;4(1):20-3.
30. Kalmegh S. Analysis of weka data mining algorithm reptree, simple
cart and randomtree for classification of indian news. International
Journal of Innovative Science, Engineering & Technology.
2015;2(2):438-46.
31. Witten, I. H. and Frank, E. Data Mining Practical Machine
Learning Tools and Techniques. Morgan Kaufman Publishers,
2005. p. 587.[Book]
32. Kirkby, R. WEKA Explorer User Guide for version 3-4. University
of Weikato, 2002. pp. 3-4.
33. Kumar MN, Koushik KV, Deepak K. Prediction of heart diseases
using data mining and machine learning algorithms and tools.
International Journal of Scientific Research in Computer Science,
Engineering and Information Technology. 2018;3(3): 887-898.
doi: 10.14419/ijet.v7i2.32.15714
34. Benjamin, H., Fredrick, David and S. Antony, Belcy. Heart Disease
Prediction Using Data Mining Techniques. Ictact Journal On Soft
Computing. 2018; 9. doi: 10.21917/ijsc.2018.0254
35. Sakr S, Elshawi R, Ahmed A, Qureshi WT, Brawner C, Keteyian S,
et al. Using machine learning on cardiorespiratory fitness data for
predicting hypertension: The Henry Ford ExercIse Testing (FIT)
Project. PLoS One. 2018;13(4). doi:
10.1371/journal.pone.0195344
36. Hari Ganesh S, Gajenthiran M. Comparative study of data mining
approaches for prediction heart diseases. IOSR Journal of
Engineering. 2014 Jul;4(7):36-9. doi: 10.9790/3021-04733639
37. Dbritto R, Srinivasaraghavan A, Joseph V. Comparative analysis of
accuracy on heart disease prediction using classification methods.
International Journal of Applied Information Systems.
2016;11(2):22-5. doi: 10.5120/ijais2016451578
38. Gupta N, Ahuja N, Malhotra S, Bala A, Kaur G. Intelligent heart
disease prediction in cloud environment through ensembling.
Expert Systems. 2017 Jun;34(3):e12207. doi: 10.1111/exsy.12207
39. Huda, K and Saria, E Cardiac Catheterization Procedure Prediction
Using Machine Learning and Data Mining Techniques. IOSR
Journal of Computer Engineering (IOSR-JCE). 2019; 21(1): 86-92.
doi: 10.9790/0661-2101018692
40. Vijayarani S, Sudha S. Comparative analysis of classification
function techniques for heart disease prediction. International
Journal of Innovative Research in Computer and Communication
Engineering. 2013 May;1(3):735-41.
41. Singhal S, Jena M. A study on WEKA tool for data preprocessing,
classification and clustering. International Journal of Innovative
technology and exploring engineering (IJItee). 2013;2(6):250-3.
42. Frank E, Hall M, Holmes G, Kirkby R, Pfahringer B, Witten IH,
Trigg L. Weka-a machine learning workbench for data mining. In
Data mining and knowledge discovery handbook. 2010: pp.
1269-1277.