ISSN(Online) : 2319 - 8753
ISSN (Print) : 2347 - 6710
International Journal of Innovative Research in Science,
Engineering and Technology
(An ISO 3297: 2007 Certified Organization)
Vol. 4, Special Issue 6, May 2015
Copyright to IJIRSET www.ijirset.com 327
Prediction of Heart Disease Using Naïve Bayes Algorithm
R. Karthiyayini¹, S. Chithaara²
¹Assistant Professor, Department of Computer Applications, Anna University, BIT Campus, Tiruchirapalli, Tamilnadu, India
²PG Scholar, Department of Computer Applications, Anna University, BIT Campus, Tiruchirapalli, Tamilnadu, India
ABSTRACT: The healthcare environment is generally perceived as being “information rich” yet “knowledge poor”. There is a wealth of data available within healthcare systems, but a lack of effective analysis tools to discover hidden relationships and trends in that data. Knowledge discovery and data mining have found numerous applications in business and scientific domains, and valuable knowledge can likewise be discovered by applying data mining techniques to healthcare systems. The healthcare industry collects huge amounts of data which, unfortunately, are not “mined” to discover hidden information. This paper explores the potential use of the classification-based data mining technique Naïve Bayes on massive volumes of healthcare data: after data preprocessing, a Naïve Bayes classifier is used for effective decision making. Using medical profiles such as age, sex, blood pressure and blood sugar, it can predict the likelihood of patients getting a heart disease. The focus of this paper is to predict heart disease using the Naïve Bayes algorithm.
KEYWORDS: Heart Disease, Naïve Bayes, Data mining.
I. INTRODUCTION
Knowledge discovery in databases is a well-defined process consisting of several distinct steps. “Data mining is the non-trivial extraction of implicit, previously unknown and potentially useful information from data.” Data mining technology provides a user-oriented approach to discovering novel and hidden patterns in the data. The discovered knowledge can be used by healthcare administrators to improve the quality of service. A major challenge facing healthcare organizations (hospitals, medical centres) is the provision of quality services at affordable costs. Quality service implies diagnosing patients correctly and administering treatments that are effective. Hospitals must also minimize the cost of clinical tests. They can achieve these results by employing appropriate computer-based information and/or decision support systems. Healthcare data is massive: it includes patient-centric data, resource management data and transformed data. Healthcare organizations must have the ability to analyze this data. Treatment records of millions of patients can be stored and computerized, and data mining techniques may help in answering several important and critical questions related to healthcare.
With the availability of integrated information via huge patient repositories, the perception of clinicians, patients and payers is shifting away from qualitative visualization of clinical data toward a demand for more quantitative assessment of information, supported by all clinical and imaging data. Medical diagnosis is considered a significant yet intricate task that needs to be carried out precisely and efficiently. Clinical decisions are often made based on doctors' intuition and experience rather than on the knowledge-rich data hidden in the database. Data-driven support is promising here, as data modeling and analysis tools such as data mining have the potential to generate a knowledge-rich environment that can significantly improve the quality of clinical decisions.
EXISTING SYSTEM
The healthcare industry collects huge amounts of healthcare data which, unfortunately, are not “mined” to
discover hidden information. Clinical decisions are often made based on doctors' intuition and experience rather than
on the knowledge-rich data hidden in the database. This practice leads to unwanted biases, errors and excessive medical
costs, which affect the quality of service provided to patients. Many healthcare organizations struggle with the
utilization of the data they collect, because their online transaction processing (OLTP) systems are not integrated
for decision making and pattern analysis.
PROPOSED SYSTEM
Knowledge discovery in databases is a well-defined process consisting of several distinct steps.
1. Data mining is the core step, which results in the discovery of hidden but useful knowledge from massive
databases.
2. For a healthcare organization to succeed, it is important to empower management and staff with data
warehousing based on critical thinking. Data warehousing can be supported by decision support tools such as
data marts, OLAP and data mining tools.
3. With data stored in a two-dimensional format, OLAP makes it possible to analyze potentially large amounts of
data with very fast response times.
4. OLAP also lets users go through the data and drill down or roll up through various dimensions as
defined by the data structure.
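Drill-down and roll-up re-aggregate the same facts over more or fewer dimensions. The following is a minimal in-memory sketch in Python, not an OLAP engine; the record layout (year, gender, age group, diagnosis), the `aggregate` helper and the sample rows are hypothetical and are not data from this study.

```python
from collections import Counter

# Hypothetical fact records: (year, gender, age_group, has_heart_disease).
# Illustrative values only; not data from this study.
RECORDS = [
    (2013, "M", "40-49", True),
    (2013, "F", "50-59", False),
    (2014, "M", "50-59", True),
    (2014, "M", "40-49", False),
    (2014, "F", "60-69", True),
]

DIMENSIONS = ("year", "gender", "age_group")


def aggregate(records, dims):
    """Count positive heart-disease cases grouped by the chosen dimensions."""
    idx = [DIMENSIONS.index(d) for d in dims]
    counts = Counter()
    for rec in records:
        if rec[3]:  # only records labelled with heart disease are counted
            counts[tuple(rec[i] for i in idx)] += 1
    return dict(counts)


# Drill down: finer granularity (year x gender).
detailed = aggregate(RECORDS, ["year", "gender"])
# Roll up: drop the gender dimension and aggregate by year only.
summary = aggregate(RECORDS, ["year"])
```

Moving from `["year", "gender"]` to `["year"]` is a roll-up; adding a dimension back is a drill-down over the same facts.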
The proposed system consists of four phases:
- USER ENROLLMENT
- TRAINING SET MAINTENANCE
- STORAGE OF RELEVANT USER PROFILE
- REPORT GENERATION
1.1 USER ENROLLMENT:
In the healthcare environment, data are collected, mined and stored in particular locations. Users can access those
locations by performing certain operations and can view the patterns generated. The users are doctors, researchers and
other people who can enter the website and are allowed to view the records. Doctors are the authorized persons who can
update, insert, view and delete records. Hospitals must be registered properly; the details of their patients are then
mined and the records maintained. This lets users access patient details easily and keeps a complete, consistent record
of the patients and their respective diseases. Data mining is performed only for particular diseases and only for
registered hospitals. Users can also check the information provided, such as symptoms, causes, locations, number of
hospitals and brief descriptions.
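The access rules above (viewers may only view records; doctors may also insert, update and delete them) can be expressed as a small permission table. This Python sketch is illustrative only; the `Role` names, the `PERMISSIONS` table and the `is_allowed` helper are hypothetical, not part of the described website.

```python
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"  # researchers and other website users
    DOCTOR = "doctor"  # authorized doctors of registered hospitals


# Hypothetical permission table mirroring the enrollment rules.
PERMISSIONS = {
    Role.VIEWER: frozenset({"view"}),
    Role.DOCTOR: frozenset({"view", "insert", "update", "delete"}),
}


def is_allowed(role: Role, operation: str) -> bool:
    """Return True if the given role may perform the operation on a record."""
    return operation in PERMISSIONS.get(role, frozenset())
```

Keeping the rules in one table means a new role only needs a new entry rather than new branching logic.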
1.2 TRAINING SET MAINTENANCE:
Hospitals that wish to register need to contact the administrator, who provides login IDs for the doctors working
in registered hospitals. Doctors who want to become members of the website but work in non-registered hospitals can
also create their accounts directly. This does not mean there is no proper authentication; rather, it paves the way for
better knowledge discovery. Such outside doctors register their details and send them to the administrator. The
administrator then checks the details and, once verification is complete, sends the ID and password to the respective
doctor's email. If any other problem arises, the administrator is contacted immediately; all internal operations are
handled by the administrator alone.
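The approval flow above can be modelled as a request that moves from pending to approved or rejected. A minimal Python sketch, assuming a hypothetical `Registration` record and `review` function (neither is described in the paper); sending the credentials by email is left out.

```python
import secrets
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Registration:
    """A doctor's account request; field names are hypothetical."""
    doctor_name: str
    email: str
    hospital: str
    status: str = "pending"  # pending -> approved | rejected
    login_id: Optional[str] = None
    password: Optional[str] = None  # a real system should store only a salted hash


def review(reg: Registration, registered_hospitals: Set[str],
           details_verified: bool) -> Registration:
    """Administrator step: issue credentials or reject the request.

    Doctors of registered hospitals are approved directly; doctors of
    non-registered hospitals are approved only after their details are
    verified. E-mailing the credentials is outside the scope of this sketch.
    """
    if reg.status != "pending":
        raise ValueError("request has already been reviewed")
    if reg.hospital in registered_hospitals or details_verified:
        reg.status = "approved"
        reg.login_id = "DR-" + secrets.token_hex(4)  # 8 random hex characters
        reg.password = secrets.token_urlsafe(12)
    else:
        reg.status = "rejected"
    return reg
```

The `secrets` module is used rather than `random` because the generated values act as credentials.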
1.3 STORAGE OF RELEVANT USER PROFILE:
Doctors log in with their user names and are then able to view patient records. Patient records are inserted and
updated by the doctor, making an entry in the database; in this way each patient's record is maintained accurately.
Registered hospitals are given a registration ID, which allows users to view the registered hospitals, their doctors list
and patient details. These records are used to generate patterns with the NAIVE BAYESIAN classifier, showing the
occurrence of diseases and their effects.
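One way to hold the records described above is a relational table keyed by hospital. The sketch below uses Python's built-in `sqlite3` module; the table names, column names and the helper functions `open_db`, `add_hospital` and `insert_record` are hypothetical (the paper does not publish its schema), and the attributes are the ones named in the abstract.

```python
import sqlite3

# Hypothetical schema; attributes follow the abstract (age, sex,
# blood pressure, blood sugar) plus a heart-disease label.
SCHEMA = """
CREATE TABLE hospital (
    hospital_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE patient_record (
    record_id      INTEGER PRIMARY KEY,
    hospital_id    INTEGER NOT NULL REFERENCES hospital(hospital_id),
    year           INTEGER NOT NULL,
    age            INTEGER NOT NULL,
    sex            TEXT CHECK (sex IN ('M', 'F')),
    blood_pressure TEXT,  -- categorical, e.g. 'normal' / 'high'
    blood_sugar    TEXT,  -- categorical, e.g. 'normal' / 'high'
    heart_disease  INTEGER NOT NULL CHECK (heart_disease IN (0, 1))
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn


def add_hospital(conn, name):
    """Administrator operation: register a hospital and return its ID."""
    cur = conn.execute("INSERT INTO hospital (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def insert_record(conn, hospital_id, year, age, sex, bp, sugar, disease):
    """Doctor operation: add one patient record (parameterized to avoid SQL injection)."""
    cur = conn.execute(
        "INSERT INTO patient_record (hospital_id, year, age, sex, blood_pressure,"
        " blood_sugar, heart_disease) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (hospital_id, year, age, sex, bp, sugar, int(disease)),
    )
    conn.commit()
    return cur.lastrowid
```

Because each record references a hospital, a record for an unregistered hospital is rejected by the database itself.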
1.4 REPORT GENERATION:
The maintained patient records are then used to generate patterns. The death rate due to heart disease and its risk
factors are shown in these patterns. The patterns help the user make the right decision through effective mining of
data from a number of hospitals, together with the percentage attributed to each cause. The patterns are presented as
different charts:
- Year chart
- Age chart
- Gender chart
The probabilities of the states are computed and compared using the NAIVE BAYESIAN algorithm.
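The year, age and gender charts reduce to the share of heart-disease cases falling into each category. A minimal Python sketch, assuming hypothetical record fields, a hypothetical 10-year age binning in `age_band`, and illustrative sample rows (not results from this study):

```python
from collections import Counter

# Hypothetical sample rows; field names and values are illustrative only.
SAMPLE = [
    {"year": 2013, "age": 47, "sex": "M", "heart_disease": True},
    {"year": 2014, "age": 62, "sex": "F", "heart_disease": True},
    {"year": 2014, "age": 58, "sex": "M", "heart_disease": True},
    {"year": 2014, "age": 35, "sex": "F", "heart_disease": False},
]


def age_band(age):
    """Hypothetical 10-year age bins, e.g. 47 -> '40-49'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def chart_data(records, key):
    """Percentage of heart-disease cases that fall into each category of `key`."""
    cases = [r for r in records if r["heart_disease"]]
    if not cases:
        return {}
    counts = Counter(key(r) for r in cases)
    return {group: 100.0 * n / len(cases) for group, n in sorted(counts.items())}


year_chart = chart_data(SAMPLE, lambda r: r["year"])
age_chart = chart_data(SAMPLE, lambda r: age_band(r["age"]))
gender_chart = chart_data(SAMPLE, lambda r: r["sex"])
```

Each chart uses the same function with a different grouping key, so adding another chart only requires another key function.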
II. ARCHITECTURE
1. ALGORITHM:
1.1 PREDICTION ALGORITHM:
Bayesian classification is a supervised learning method as well as a statistical method for classification. It
assumes an underlying probabilistic model, which allows us to capture uncertainty about the model in a principled way
by determining the probabilities of the outcomes. It can solve diagnostic and predictive problems.
1.2 STEPS:
The Naive Bayes algorithm is based on Bayes' theorem. The steps of the algorithm are as follows:
1. Each data sample is represented by an n-dimensional feature vector, $X = (x_1, x_2, \ldots, x_n)$, depicting n
measurements made on the sample from n attributes, $A_1, A_2, \ldots, A_n$, respectively.
2. Suppose that there are m classes, $C_1, C_2, \ldots, C_m$. Given an unknown data sample X (i.e., one having no class
label), the classifier predicts that X belongs to the class having the highest posterior probability conditioned on X.
That is, X is assigned to class $C_i$ if and only if
$$P(C_i \mid X) > P(C_j \mid X) \quad \text{for all } 1 \le j \le m,\ j \ne i.$$
Thus we maximize $P(C_i \mid X)$. The class $C_i$ for which $P(C_i \mid X)$ is maximized is called the maximum a
posteriori hypothesis. By Bayes' theorem,
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}.$$
3. As $P(X)$ is constant for all classes, only $P(X \mid C_i)\,P(C_i)$ needs to be maximized. If the class prior
probabilities are not known, it is commonly assumed that the classes are equally likely, i.e.,
$P(C_1) = P(C_2) = \cdots = P(C_m)$, and we would therefore maximize $P(X \mid C_i)$. Otherwise, we maximize
$P(X \mid C_i)\,P(C_i)$.
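The steps above leave one ingredient implicit: Naive Bayes factorizes $P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)$, assuming the attributes are conditionally independent given the class. A minimal Python sketch of a categorical classifier follows. The attribute names and training rows are hypothetical, and the add-one (Laplace) smoothing and log-space scoring are assumptions added here, not details from the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training rows: (age_group, sex, blood_pressure, blood_sugar).
# Illustrative values only; not data from this study.
TRAIN_X = [
    ("50+", "M", "high", "high"),
    ("50+", "F", "high", "normal"),
    ("<50", "M", "high", "high"),
    ("<50", "F", "normal", "normal"),
    ("<50", "M", "normal", "normal"),
    ("50+", "F", "normal", "normal"),
]
TRAIN_Y = ["yes", "yes", "yes", "no", "no", "no"]


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes A1..An.

    Assumes class-conditional independence, P(X|Ci) = prod_k P(x_k|Ci),
    and applies add-one (Laplace) smoothing so that an unseen attribute
    value does not zero out P(X|Ci).
    """

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)  # priors P(Ci) are estimated from these
        self.n_samples = len(labels)
        # value_counts[c][k][v]: training samples of class c whose attribute k == v
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        self.domains = defaultdict(set)  # distinct values seen for attribute k
        for x, c in zip(samples, labels):
            for k, v in enumerate(x):
                self.value_counts[c][k][v] += 1
                self.domains[k].add(v)
        return self

    def log_score(self, x, c):
        """log(P(X|Ci) * P(Ci)); P(X) is omitted since it is the same for every class."""
        n_c = self.class_counts[c]
        score = math.log(n_c / self.n_samples)
        for k, v in enumerate(x):
            count = self.value_counts[c][k][v]
            score += math.log((count + 1) / (n_c + len(self.domains[k])))
        return score

    def predict(self, x):
        """Return the class Ci that maximizes P(X|Ci) * P(Ci)."""
        return max(self.class_counts, key=lambda c: self.log_score(x, c))


model = CategoricalNaiveBayes().fit(TRAIN_X, TRAIN_Y)
```

With these rows, `model.predict(("50+", "M", "high", "high"))` scores the "yes" class at 0.5 × 0.6 × 0.6 × 0.8 × 0.6 = 0.0864 against 0.5 × 0.4 × 0.4 × 0.2 × 0.2 = 0.0032 for "no", so it returns "yes". Summing logarithms instead of multiplying probabilities avoids floating-point underflow when there are many attributes.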
2. SCREENSHOTS:
[Screenshots: Heart Disease Prediction application; Age, Year and Gender charts]
III. CONCLUSION
This paper presents the problem of constraining and summarizing different data mining algorithms, focusing on the use
of different algorithms to predict combinations of several target attributes. In this paper, we have presented an
intelligent and effective heart attack prediction method using data mining. First, an efficient approach was presented
for extracting significant patterns from heart disease data warehouses for the efficient prediction of heart attack.
Based on the calculated significance weightage, the frequent patterns with values greater than a predefined threshold
were chosen for the prediction of heart attack. All these models could answer complex queries in predicting heart
attack. The system classifies the given data into different categories and also predicts the risk of heart disease when
an unknown sample is given as input. The system can serve as a training tool for medical students and as a helping hand
for doctors.