HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM

Post on 16-Apr-2017



HEART DISEASE PREDICTION USING NAÏVE BAYES CLASSIFIER

PRESENTED BY:-AMITESH GAURAV ASHOK RAJAKSHANU SONI

ABSTRACT

The main objective of this research is to develop an intelligent system using a data mining modeling technique, namely Naive Bayes.

It is implemented as a web-based application in which the user answers predefined questions.

It retrieves hidden data from a stored database and compares the user's values with a trained data set.

It can answer complex queries for diagnosing heart disease and thus assist healthcare practitioners to make intelligent clinical decisions which traditional decision support systems cannot.

By providing effective treatments, it also helps to reduce treatment costs.

INTRODUCTION

Bayes' theorem is named for THOMAS BAYES (1702-1761), who developed it.

The classifier is called “naive” because it is based on the assumption that the attributes are independent of each other.

Bayes' theorem describes what makes something "evidence" and how much evidence it is.

Bayesian Classifiers are statistical classifiers.

They can predict the probability that a data item is a member of a particular class.

Original Belief + Observation = New Belief

EXAMPLE

• 1% of women at age forty who participate in routine screening have breast cancer. 

• 80% of women with breast cancer will get positive mammographies.

• 9.6% of women without breast cancer will also get positive mammographies.

A woman in this age group had a positive mammography in a routine screening.  What is the probability that she actually has breast cancer?

WITHOUT BAYES THEOREM

• Create a large sample size and use probabilities given in the problem to work out the problem.

• Assume, for example, that 10,000 women participate in a routine screening for breast cancer. 1%, or 100 women, have breast cancer. 80% of the women with breast cancer, or 80 women, will get positive mammographies. 9.6% of the 9,900 women who don’t have breast cancer, or about 950 women, will also get positive mammographies.

• Create a table using the numbers obtained from the assumed sample size and determine the answer.

WITHOUT BAYES THEOREM CONTD.

Out of the 1030 women who get positive mammographies only 80 actually have breast cancer, therefore, the probability is 80/1030 or 7.767%
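The counting approach above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic on the slide; the variable names are illustrative and not part of the original presentation.

```python
# Frequency-count version of the mammography example (no Bayes' theorem).
population = 10_000
prevalence = 0.01             # 1% of screened women have breast cancer
sensitivity = 0.80            # 80% of women with cancer test positive
false_positive_rate = 0.096   # 9.6% of women without cancer test positive

with_cancer = round(population * prevalence)        # 100 women
without_cancer = population - with_cancer           # 9,900 women
true_positives = round(with_cancer * sensitivity)   # 80 women
# 9,900 * 0.096 = 950.4, rounded to whole women as on the slide
false_positives = round(without_cancer * false_positive_rate)

all_positives = true_positives + false_positives
probability = true_positives / all_positives

print(f"{true_positives}/{all_positives} = {probability:.5f}")
```

Rounding the false positives to a whole number of women is what gives the 1030 in the denominator.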

USING BAYES ALGORITHM

P(A│B) = P(B│A) P(A) / P(B)

where A and B are events…

• P(A) and P(B) are the probabilities of A and B without regard to each other.

• P(A│B), a conditional probability, is the probability of observing event A given that B is true.

• P(B│A) is the probability of observing event B given that A is true.

USING BAYES ALGORITHM CONTD.

• 1% of women at age forty who participate in routine screening have breast cancer. 

P(B)= 0.01

• 80% of women with breast cancer will get positive mammographies. P(A│B) = 0.8

• 9.6% of women without breast cancer will also get positive mammographies.  P(A│B’) = 0.096

• A woman in this age group had a positive mammography in a routine screening.  What is the probability that she actually has breast cancer?

Find P(B│A) ?

USING BAYES ALGORITHM CONTD.

P(B│A) = P(A│B) P(B) / P(A)

P(B), P(A│B), and P(A│B’) are known. P(A) is needed to find P(B│A).

P(A) = P(A│B) P(B) + P(A│B’) P(B’)

P(A) = (0.8) (0.01) + (0.096) (0.99)

P(A) = 0.1030

P(B│A) = (0.8) (0.01) / (0.1030)

P(B│A) = 0.07767
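As a quick check, the same calculation can be written as a small Python function. This is a minimal sketch; the function name `posterior` and its parameter names are chosen here for illustration and do not come from the presentation.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(B|A) given P(B), P(A|B) and P(A|B')."""
    # Total probability of a positive test: P(A) = P(A|B)P(B) + P(A|B')P(B')
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    # Bayes' theorem: P(B|A) = P(A|B)P(B) / P(A)
    return likelihood * prior / evidence


p_cancer_given_positive = posterior(
    prior=0.01,                 # P(B): has breast cancer
    likelihood=0.8,             # P(A|B): positive test given cancer
    false_positive_rate=0.096,  # P(A|B'): positive test given no cancer
)
print(round(p_cancer_given_positive, 4))
```

Without intermediate rounding, P(A) is 0.10304 and the result is about 0.07764. The slide's 0.07767 comes from rounding P(A) to 0.1030 before dividing. Either way, the probability is just under 7.8%.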

WHY PREFER NAÏVE BAYES ALGORITHM ?

Bayes’ rule, on which Naive Bayes is built, is the basis for many machine learning and data mining methods. The rule is used to create models with predictive capabilities. It provides new ways of exploring and understanding data.

When to prefer a Naive Bayes implementation:

1) When the dimensionality of the data is high.

2) When the attributes are independent of each other.

3) When we need more efficient output than other methods provide.

USES OF THE BAYES CLASSIFIER IN HEART DISEASE PREDICTION

Using medical profiles such as age, sex, blood pressure, blood sugar, chest pain and ECG results, it can predict the likelihood of patients getting heart disease.

It will be implemented in Python as an application which takes medical test parameters as input.

It can be used as a training tool to train nurses and medical students to diagnose patients with heart disease.

DATA SOURCE

Predictable attribute:

1. Diagnosis (value 0: <50% diameter narrowing (no heart disease); value 1: >50% diameter narrowing (has heart disease))

Input attributes:

1. Age in years

2. Sex (value 1: male; value 0: female)

3. Chest Pain Type (value 1: typical type 1 angina; value 2: typical type 2 angina; value 3: non-angina pain; value 4: asymptomatic)

4. Fasting Blood Sugar (value 1: >120 mg/dl; value 0: <120 mg/dl)

5. Restecg: resting electrocardiographic results (value 0: normal; value 1: having ST-T wave abnormality; value 2: showing probable or definite left ventricular hypertrophy)

6. Exang: exercise induced angina (value 1: yes; value 0: no)

7. Thalach: maximum heart rate achieved

8. Old peak: ST depression induced by exercise

9. Heart Disease Present (value 0: no; value 1: yes)
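One possible way to hold a record with these attributes in Python is sketched below. The class name `PatientRecord`, the field names and the sample values are hypothetical and chosen only for illustration. The coded values follow the attribute list above, and the presentation does not specify how records are actually stored.

```python
from dataclasses import dataclass

# Allowed codes for each categorical field, taken from the attribute list.
ALLOWED_CODES = {
    "sex": {0, 1},
    "chest_pain_type": {1, 2, 3, 4},
    "fasting_blood_sugar": {0, 1},
    "restecg": {0, 1, 2},
    "exang": {0, 1},
    "heart_disease": {0, 1},
}


@dataclass
class PatientRecord:
    age: int                  # age in years
    sex: int                  # 1 = male, 0 = female
    chest_pain_type: int      # 1-4, see attribute list
    fasting_blood_sugar: int  # 1 = >120 mg/dl, 0 = <120 mg/dl
    restecg: int              # 0 = normal, 1 = ST-T abnormality, 2 = LV hypertrophy
    exang: int                # exercise induced angina: 1 = yes, 0 = no
    thalach: int              # maximum heart rate achieved
    oldpeak: float            # ST depression induced by exercise
    heart_disease: int        # 0 = no, 1 = yes

    def __post_init__(self):
        # Reject codes that are not defined in the attribute list.
        for field_name, codes in ALLOWED_CODES.items():
            value = getattr(self, field_name)
            if value not in codes:
                raise ValueError(
                    f"{field_name}={value!r} is not one of {sorted(codes)}"
                )


# Made-up sample record, not taken from any real data set.
example = PatientRecord(
    age=54, sex=1, chest_pain_type=4, fasting_blood_sugar=0,
    restecg=0, exang=1, thalach=130, oldpeak=1.5, heart_disease=1,
)
print(example)
```

Validating the codes when the record is created keeps invalid answers from the questionnaire out of the training data.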

IMPLEMENTATION OF BAYESIAN CLASSIFICATION

The Naïve Bayes Classifier technique is mainly applicable when the dimensionality of the inputs is high.

Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods.

Naïve Bayes model recognizes the characteristics of patients with heart disease.

It shows the probability of each input attribute for the predictable state.
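The idea can be illustrated with a minimal from-scratch sketch for categorical attributes. This is not the authors' implementation. The function names, the choice of three attributes (sex, chest pain type, exang) and the eight training rows are made up for illustration, and the add-one (Laplace) smoothing is an assumption added here so that unseen values do not produce zero probabilities.

```python
from collections import Counter, defaultdict


def train(records, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    attr_values = defaultdict(set)       # attribute index -> values seen in training
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
            attr_values[i].add(value)
    return class_counts, value_counts, attr_values


def predict_proba(model, record):
    """Return P(class | record) for every class under the naive assumption."""
    class_counts, value_counts, attr_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(record):
            # P(attribute = value | class) with add-one smoothing (assumption)
            seen = value_counts[(i, label)][value]
            score *= (seen + 1) / (count + len(attr_values[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


# Made-up toy data: (sex, chest pain type, exang) -> heart disease present
records = [(1, 4, 1), (1, 4, 0), (0, 4, 1), (1, 2, 1),
           (0, 3, 0), (0, 2, 0), (1, 3, 0), (0, 1, 0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = train(records, labels)
proba = predict_proba(model, (1, 4, 1))
print({label: round(p, 3) for label, p in proba.items()})
```

Continuous attributes such as age, Thalach and Old peak would first need to be binned into categories or modeled with a continuous likelihood (for example a Gaussian). The sketch leaves that step out.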

CONCLUSION

A decision support system for heart disease prediction was developed using Naive Bayesian classification.

The system extracts hidden knowledge from a historical heart disease database.

The model can answer complex queries, and its strengths are ease of model interpretation, easy access to detailed information, and accuracy.

The system is expandable in the sense that more records or attributes can be incorporated and new significant rules can be generated using the underlying data mining technique.

At present the system uses 9 attributes of medical diagnosis. It can also incorporate other data mining techniques and additional attributes for prediction.