Analysis of Weka Data Mining Techniques for Heart Disease … ›...

10
Int J Med Rev 2020 Jan;7(1): 15-24 10.30491/ijmr.2020.221474.1078 Copyright © 2020 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution License (http:// creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. INTERNATIONAL JOURNAL OF MEDICAL REVIEWS Narrative Review Analysis of Weka Data Mining Techniques for Heart Disease Prediction System Basma Saleh *, Ahmed Saeidi, Ali al-Aqbi, Lamees Salman Al-Mustansiriyah University, Baghdad, Iraq * Corresponding Author: Basma Saleh Al-Mustansiriyah University, Baghdad, Iraq. Email: [email protected] Received December 01, 2019; Accepted December 27, 2019; Online Published January 01, 2020 Abstract Data mining is characterized as searching for useful information through very large data sets. Some of the key and most common techniques for data mining are association rules, classification, clustering, prediction, and sequential models. For a wide range of applications, data mining techniques are used. Data mining plays a significant role in disease detection in the health care industry. The patient should be needed to detect a number of tests for the disease. However, the number of tests should be reduced by using data mining techniques. In time and performance, this reduced test plays an important role. Heart disease is a cardiovascular disease that causes death. Health problems are enormous in this recent situation because of the prediction and the classification of health problems in different situations. The data mining area included the prediction and identification of abnormality and its risk rate in these domains. Today the health industry holds hidden information essential for decision- making. For predicting heart problems, data extraction algorithms like K-star, J48, SMO, Naïve Bayes, MLP, Random Forest, Bayes Net, and REPTREE are used for this study (Weka 3.8.3) software. The results of the predictive accuracy, the ROC curve, and the AUC value are combined using a standard set of data and a collected dataset. By applying different data mining algorithms, the patient data can be used for diagnosis as training samples. The main drawbacks of the previous studies are that they need accurate and the number of features. This paper surveys recent data mining techniques applied to predict heart diseases. And Identifying the major risk factors of Heart Disease categorizing the risk factors in an order which causes damages to the heart such as high blood cholesterol, diabetes, smoking, poor diet, obesity, hypertension, stress, etc. Data mining functions and techniques are used to identify the level of risk factors to help the patients in taking precautions in advance to save their life. Keywords: Data mining, Heart Disease, Prediction, Classification, Dataset, WEKA tool, Prediction techniques.

Transcript of Analysis of Weka Data Mining Techniques for Heart Disease … ›...

Page 1: Analysis of Weka Data Mining Techniques for Heart Disease … › article_107189_0c249ea2d6de173e0592abc... · 2020-07-02 · 30. Kalmegh S. Analysis of weka data mining algorithm

Int J Med Rev 2020 Jan;7(1): 15-24

10.30491/ijmr.2020.221474.1078

Copyright © 2020 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://

creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly

cited.

INTERNATIONAL JOURNAL OF

MEDICAL

REVIEWS

Narrative Review

Analysis of Weka Data Mining Techniques for Heart Disease

Prediction System

Basma Saleh *, Ahmed Saeidi, Ali al-Aqbi, Lamees Salman

Al-Mustansiriyah University, Baghdad, Iraq

* Corresponding Author: Basma Saleh Al-Mustansiriyah University, Baghdad, Iraq. Email: [email protected]

Received December 01, 2019; Accepted December 27, 2019; Online Published January 01, 2020

Abstract

Data mining is characterized as searching for useful information through very large data sets. Some of the key and most common techniques for

data mining are association rules, classification, clustering, prediction, and sequential models. For a wide range of applications, data mining

techniques are used. Data mining plays a significant role in disease detection in the health care industry. The patient should be needed to detect

a number of tests for the disease. However, the number of tests should be reduced by using data mining techniques. In time and performance, this

reduced test plays an important role. Heart disease is a cardiovascular disease that causes death. Health problems are enormous in this recent

situation because of the prediction and the classification of health problems in different situations. The data mining area included the prediction

and identification of abnormality and its risk rate in these domains. Today the health industry holds hidden information essential for decision-

making. For predicting heart problems, data extraction algorithms like K-star, J48, SMO, Naïve Bayes, MLP, Random Forest, Bayes Net, and

REPTREE are used for this study (Weka 3.8.3) software. The results of the predictive accuracy, the ROC curve, and the AUC value are combined

using a standard set of data and a collected dataset. By applying different data mining algorithms, the patient data can be used for diagnosis as

training samples. The main drawbacks of the previous studies are that they need accurate and the number of features. This paper surveys recent

data mining techniques applied to predict heart diseases. And Identifying the major risk factors of Heart Disease categorizing the risk factors in an

order which causes damages to the heart such as high blood cholesterol, diabetes, smoking, poor diet, obesity, hypertension, stress, etc. Data

mining functions and techniques are used to identify the level of risk factors to help the patients in taking precautions in advance to save their life.

Keywords: Data mining, Heart Disease, Prediction, Classification, Dataset, WEKA tool, Prediction techniques.

Page 2: Analysis of Weka Data Mining Techniques for Heart Disease … › article_107189_0c249ea2d6de173e0592abc... · 2020-07-02 · 30. Kalmegh S. Analysis of weka data mining algorithm

Saleh et al

16 | International Journal of Medical Reviews. 2020;7(1)15-24

Page 3: Analysis of Weka Data Mining Techniques for Heart Disease … › article_107189_0c249ea2d6de173e0592abc... · 2020-07-02 · 30. Kalmegh S. Analysis of weka data mining algorithm

Analysis of Weka Data Mining Techniques for Heart Disease Prediction System

International Journal of Medical Reviews. 2020;7(1):15-24 | 17

Page 4: Analysis of Weka Data Mining Techniques for Heart Disease … › article_107189_0c249ea2d6de173e0592abc... · 2020-07-02 · 30. Kalmegh S. Analysis of weka data mining algorithm

Saleh et al

18 | International Journal of Medical Reviews. 2020;7(1)15-24

Page 5: Analysis of Weka Data Mining Techniques for Heart Disease … › article_107189_0c249ea2d6de173e0592abc... · 2020-07-02 · 30. Kalmegh S. Analysis of weka data mining algorithm

Analysis of Weka Data Mining Techniques for Heart Disease Prediction System

International Journal of Medical Reviews. 2020;7(1):15-24 | 19

Page 6: Analysis of Weka Data Mining Techniques for Heart Disease … › article_107189_0c249ea2d6de173e0592abc... · 2020-07-02 · 30. Kalmegh S. Analysis of weka data mining algorithm

Saleh et al

20 | International Journal of Medical Reviews. 2020;7(1)15-24

Table 1. Commonly used Algorithms

Generic Name WEKA Name

Bayesian Network Naïve Bayes(NB)

Support Vector Machine SMO

C4.5 Decision Tree J48

K-Nearest Neighbor 1Bk

Workbench

Page 7: Analysis of Weka Data Mining Techniques for Heart Disease … › article_107189_0c249ea2d6de173e0592abc... · 2020-07-02 · 30. Kalmegh S. Analysis of weka data mining algorithm

Analysis of Weka Data Mining Techniques for Heart Disease Prediction System

International Journal of Medical Reviews. 2020;7(1):15-24 | 21

Figure 1. Weka GUI chooser.

Figure-2. The Explorer Interface

Page 8: Analysis of Weka Data Mining Techniques for Heart Disease … › article_107189_0c249ea2d6de173e0592abc... · 2020-07-02 · 30. Kalmegh S. Analysis of weka data mining algorithm

Saleh et al

22 | International Journal of Medical Reviews. 2020;7(1)15-24

Figure-3. The Knowledge Flow Interface

Figure-4. The Experimenter Interface

𝑀𝐶𝐶 =𝑇𝑃∗𝑇𝑁−𝐹𝑃∗𝐹𝑁

√(𝑇𝑃+𝐹𝑃)(𝑇𝑃+𝐹𝑁)(𝑇𝑁+𝐹𝑃)(𝑇𝑁+𝐹𝑁)

Page 9: Analysis of Weka Data Mining Techniques for Heart Disease … › article_107189_0c249ea2d6de173e0592abc... · 2020-07-02 · 30. Kalmegh S. Analysis of weka data mining algorithm

Analysis of Weka Data Mining Techniques for Heart Disease Prediction System

International Journal of Medical Reviews. 2020;7(1):15-24 | 23

Table-3. different outcomes of a two-class prediction

Actual class

Predicted class

YES No

YES True Positive (TP) False Negative (FN)

No False Positive (FP) True Negative (TN)

𝑇𝑃+𝑇𝑁

𝑇𝑃+𝑇𝑁+𝐹𝑃+𝐹𝑁

RMSE = √1

𝑛∑ (

𝜌𝑖,𝑗−𝜏𝑗

𝜏𝑗)2

𝑛𝑗=1

𝜌𝑖,𝑗

𝜏𝑗

1. Patel AK, Shukla DP. A data mining technique for prediction of

coronary heart disease using neuro-fuzzy integrated approach two

level Patel AK, Shukla DP. A data mining technique for prediction

of coronary heart disease using neuro-fuzzy integrated approach

two level. International Journal of Engineering and Computer

Science. 2013;2(09). 2663-2671.

2. Kundra D, Kaur N. Review on prediction system for heart diagnosis

using data mining techniques. Int. J. Latest Res. Eng. Technol. 2015

Oct;1(5):9-14.

3. Adamson SJ, Heather N, Morton V, Raistrick D. Initial preference

for drinking goal in the treatment of alcohol problems: II.

Treatment outcomes. Alcohol & Alcoholism. 2010 Mar

1;45(2):136-42. doi: 10.1093/alcalc/agq005

4. Lal, B Suresh. Diabetes: Causes, Symptoms And Treatments. Public

Health Environment and Social Issues in India. 1 st. : Serials

Publications, 2016, 5.

5. Shahsavarani AM, Azad Marz Abadi E, Hakimi Kalkhoran M.

Stress: Facts and theories through literature review. International

Journal of Medical Reviews. 2015 Jun 1;2(2):230-41.

6. Control, South Carolina Department of Health and Environmental.

What Is High Blood Pressure.

https://dc.statelibrary.sc.gov/handle/10827/25131. [Online] 2017.

[Cited: December 15, 2019.]

7. Sultana M, Haider A, Uddin MS. Analysis of data mining

techniques for heart disease prediction. In2016 3rd International

Conference on Electrical Engineering and Information

Communication Technology (ICEEICT) 2016 Sep 22 (pp. 1-5).

IEEE. doi: 10.1109/CEEICT.2016.7873142

8. Sujata, Joshi and Mydhili, K., Nair. Prediction of Heart Disease

Using Classification Based Data Mining Techniques. [book auth.]

L. Jain, et al., et al. Computational Intelligence in Data Mining -

Volume 2 Smart Innovation, Systems and Technologies. s.l. :

Springer, New Delhi, 2014, Vol. 32, pp. 503-511. doi:

10.1007/978-81-322-2208-8_46

9. Dangare C, Apte S. A data mining approach for prediction of heart

disease using neural networks. International Journal of Computer

Engineering and Technology (IJCET). 2012 Oct 14;3(3).

10. Heart Disease—General Info and Peer reviewed studies.

http://www.aristoloft.com. [Online] [Cited: December 15, 2019.]

11. Venkatalakshmi B, Shivsankar MV. Heart disease diagnosis using

predictive data mining. International Journal of Innovative

Research in Science, Engineering and Technology. 2014

Mar;3(3):1873-7. doi: 10.1109/ICECA.2017.8203643

12. Jabbar MA, Samreen S. Heart disease prediction system based on

hidden naïve bayes classifier. In 2016 International Conference

on Circuits, Controls, Communications and Computing (I4C) 2016

Oct 4 :pp. 1-5. doi: 10.1109/cimca.2016.8053261

13. Orphanou K, Dagliati A, Sacchi L, Stassopoulou A, Keravnou E,

Bellazzi R. Incorporating repeating temporal association rules in

naïve bayes classifiers for coronary heart disease diagnosis.

Journal of biomedical informatics. 2018 May 1;81:74-82. doi:

10.1016/j.jbi.2018.03.002

14. Alizadehsani R, Habibi J, Hosseini MJ, Mashayekhi H, Boghrati R,

Ghandeharioun A, Bahadorian B, Sani ZA. A data mining

approach for diagnosis of coronary artery disease. Computer

methods and programs in biomedicine. 2013 Jul 1;111(1):52-61.

doi: 10.1016/j.cmpb.2013.03.004

15. Ramotra AK, Mahajan A, Kumar R, Mansotra V. Comparative

Analysis of Data Mining Classification Techniques for Prediction

of Heart Disease Using the Weka and SPSS Modeler Tools. InSmart

Trends in Computing and Communications 2020 :pp. 89-97.

Springer, Singapore. doi: 10.1007/978-981-15-0077-0_10

16. Han J, Kamber M, Pei J. Data mining concepts and techniques

third edition. Morgan Kaufmann. 2011.[Book]

17. Aljaaf AJ, Al-Jumeily D, Hussain AJ, Dawson T, Fergus P, Al-

Jumaily M. Predicting the likelihood of heart failure with a multi

level risk assessment using decision tree. In2015 Third

International Conference on Technological Advances in Electrical,

Electronics and Computer Engineering (TAEECE) 2015 Apr 29 :pp.

101-106. doi: 10.1109/TAEECE.2015.7113608

18. Alexopoulos E, Dounias GD, Vemmos K. Medical diagnosis of

stroke using inductive machine learning. Machine Learning and

Applications: Machine Learning in Medical Applications. 1999

Jul:20-3.

19. Masetic Z, Subasi A. Congestive heart failure detection using

random forest classifier. Computer methods and programs in

biomedicine. 2016 ;130:54-64. doi: 10.1016/j.cmpb.2016.03.020

20. Lu H, Setiono R, Liu H. Effective data mining using neural

Page 10: Analysis of Weka Data Mining Techniques for Heart Disease … › article_107189_0c249ea2d6de173e0592abc... · 2020-07-02 · 30. Kalmegh S. Analysis of weka data mining algorithm

Saleh et al

24 | International Journal of Medical Reviews. 2020;7(1)15-24

networks. IEEE transactions on knowledge and data engineering.

1996 Dec;8(6):957-61. doi: 10.1109/69.553163

21. Gharehchopogh FS, Khalifelu ZA. Neural network application in

diagnosis of patient: a case study. InInternational Conference on

Computer Networks and Information Technology 2011 Jul 11: pp.

245–249. doi: 10.1109/ICCNIT.2011.6020937

22. Saxena K, Sharma R. Efficient heart disease prediction system.

Procedia Computer Science. 2016;85:962-9. doi:

10.1016/j.procs.2016.05.288

23. Nashif S, Raihan MR, Islam MR, Imam MH. Heart disease

detection by using machine learning algorithms and a real-time

cardiovascular health monitoring system. World Journal of

Engineering and Technology. 2018 Sep 12;6(4):854-73. doi:

10.4236/wjet.2018.64057

24. Jothikumar, R and Sivabalan, V. Analysis of Classification

Algorithms for Heart Disease Prediction and its Accuracies.

Middle-East Journal of Scientific Research. 2016; 24: pp. 200-206.

doi: 10.5829/idosi.mejsr.2016.24.RIETMA133

25. Sarangam Kodati DR. Analysis of heart disease using in data

mining tools Orange and Weka. Global Journal of Computer

Science and Technology. 2018; 18(1c): 16-22.

26. Platt J. Sequential minimal optimization: A fast algorithm for

training support vector machines. 1998; 98-14.

27. Oo AN, Naing T. Smo and Lazy Classifiers for Heart Disease

Prediction. International Journal of Advance Research and

Innovative Ideas in Education. 2019; 5: 2395-4396.

28. Latha CB, Jeeva SC. Improving the accuracy of prediction of heart

disease risk based on ensemble classification techniques.

Informatics in Medicine Unlocked. 2019 Jan 1;16: pp. 1-9. doi:

10.1016/j.imu.2019.100203

29. Mirmozaffari M, Alinezhad A, Gilanpour A. Data Mining Apriori

Algorithm for Heart Disease Prediction. Int'l Journal of Computing,

Communications & Instrumentation Engg. 2017;4(1):20-3.

30. Kalmegh S. Analysis of weka data mining algorithm reptree, simple

cart and randomtree for classification of indian news. International

Journal of Innovative Science, Engineering & Technology.

2015;2(2):438-46.

31. Witten, I. H. and Frank, E. Data Mining Practical Machine

Learning Tools and Techniques. Morgan Kaufman Publishers,

2005. p. 587.[Book]

32. Kirkby, R. WEKA Explorer User Guide for version 3-4. University

of Weikato, 2002. pp. 3-4.

33. Kumar MN, Koushik KV, Deepak K. Prediction of heart diseases

using data mining and machine learning algorithms and tools.

International Journal of Scientific Research in Computer Science,

Engineering and Information Technology. 2018;3(3): 887-898.

doi: 10.14419/ijet.v7i2.32.15714

34. Benjamin, H., Fredrick, David and S. Antony, Belcy. Heart Disease

Prediction Using Data Mining Techniques. Ictact Journal On Soft

Computing. 2018; 9. doi: 10.21917/ijsc.2018.0254

35. Sakr S, Elshawi R, Ahmed A, Qureshi WT, Brawner C, Keteyian S,

et al. Using machine learning on cardiorespiratory fitness data for

predicting hypertension: The Henry Ford ExercIse Testing (FIT)

Project. PLoS One. 2018;13(4). doi:

10.1371/journal.pone.0195344

36. Hari Ganesh S, Gajenthiran M. Comparative study of data mining

approaches for prediction heart diseases. IOSR Journal of

Engineering. 2014 Jul;4(7):36-9. doi: 10.9790/3021-04733639

37. Dbritto R, Srinivasaraghavan A, Joseph V. Comparative analysis of

accuracy on heart disease prediction using classification methods.

International Journal of Applied Information Systems.

2016;11(2):22-5. doi: 10.5120/ijais2016451578

38. Gupta N, Ahuja N, Malhotra S, Bala A, Kaur G. Intelligent heart

disease prediction in cloud environment through ensembling.

Expert Systems. 2017 Jun;34(3):e12207. doi: 10.1111/exsy.12207

39. Huda, K and Saria, E Cardiac Catheterization Procedure Prediction

Using Machine Learning and Data Mining Techniques. IOSR

Journal of Computer Engineering (IOSR-JCE). 2019; 21(1): 86-92.

doi: 10.9790/0661-2101018692

40. Vijayarani S, Sudha S. Comparative analysis of classification

function techniques for heart disease prediction. International

Journal of Innovative Research in Computer and Communication

Engineering. 2013 May;1(3):735-41.

41. Singhal S, Jena M. A study on WEKA tool for data preprocessing,

classification and clustering. International Journal of Innovative

technology and exploring engineering (IJItee). 2013;2(6):250-3.

42. Frank E, Hall M, Holmes G, Kirkby R, Pfahringer B, Witten IH,

Trigg L. Weka-a machine learning workbench for data mining. In

Data mining and knowledge discovery handbook. 2010: pp.

1269-1277.