Web Mining Final Report

How Likely Am I to Have Diabetes?
Final Report, Web Mining
Ashwin Kumar Pitchai (akp73) & Suchait Mattoo (sm925)
04/28/2015

Contents
Description
Questions
Data Dictionary
Sample Data
Outlier Detection
Normalized Data
Association Rules
Performance Models
    Decision Tree
    Naïve Bayes
    Neural Network
    SVM
    Logistic Regression
    KNN
    Performance Summary
Evaluation Models
    Decision Tree
    Naïve Bayes
    Neural Net
    SVM
    Logistic Regression
    K Nearest Neighbor
    Evaluation Summary
Answers
References

Description:

Our dataset records whether a patient shows signs of diabetes according to World Health Organization criteria (i.e., whether the 2-hour post-load plasma glucose was at least 200 mg/dl at any survey examination, or whether diabetes was found during routine medical care).

The population lives near Phoenix, Arizona, USA. Several constraints were placed on the selection of these instances from a larger database; in particular, all patients here are females at least 21 years old of Pima Indian heritage. Each instance represents one patient: her medical attributes together with a diabetes classification.

Questions:

How likely is it that a particular patient has diabetes, given her medical parameters?

Data Dictionary

Attribute         Description                                                  Data Type       Range
Pregnancies       Number of Pregnancies                                        Numeric(17)     0-17
PG Concentration  Plasma glucose at 2 hours in an oral glucose tolerance test  Numeric(199)    0-199
Diastolic BP      Diastolic Blood Pressure (mm Hg)                             Numeric(122)    0-122
Tri Fold Thick    Triceps Skin Fold Thickness (mm)                             Numeric(52)     0-52
Serums Ins        2-Hour Serum Insulin (μU/ml)                                 Numeric(846)    0-846
BMI               Body Mass Index (weight in kg / (height in m)^2)             Decimal(53.2)   0-53.2
DP Function       Diabetes Pedigree Function                                   Decimal(1.353)  0.088-1.353
Age               Age (years)                                                  Numeric(66)     21-66
Diagnosis         Is the patient Sick or Healthy?                              Varchar(7)      Healthy or Sick
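
The dictionary above can also serve as an input check before a new record is scored. The sketch below is a minimal illustration, assuming a record arrives as a Python dict keyed by the attribute names in the table; the names RANGES, LABELS and out_of_range are hypothetical, the ranges are copied from the table, and the patient values are made up.

```python
# Documented ranges from the data dictionary (attribute -> (min, max)).
# RANGES, LABELS and out_of_range are hypothetical names for illustration.
RANGES = {
    "Pregnancies": (0, 17),
    "PG Concentration": (0, 199),
    "Diastolic BP": (0, 122),
    "Tri Fold Thick": (0, 52),
    "Serums Ins": (0, 846),
    "BMI": (0.0, 53.2),
    "DP Function": (0.088, 1.353),
    "Age": (21, 66),
}
LABELS = {"Healthy", "Sick"}


def out_of_range(record):
    """Return the attributes whose values fall outside the documented range."""
    bad = [name for name, (lo, hi) in RANGES.items()
           if not lo <= record[name] <= hi]
    if record["Diagnosis"] not in LABELS:
        bad.append("Diagnosis")
    return bad


# Made-up patient record, not a row from the dataset.
patient = {"Pregnancies": 2, "PG Concentration": 120, "Diastolic BP": 70,
           "Tri Fold Thick": 30, "Serums Ins": 100, "BMI": 30.5,
           "DP Function": 0.5, "Age": 35, "Diagnosis": "Healthy"}
print(out_of_range(patient))                 # []
print(out_of_range({**patient, "Age": 70}))  # ['Age']
```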

Sample Data:

Outlier Detection:
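
The transcript does not say which outlier test was applied. One common option, not necessarily the one used here, is Tukey's rule: flag values more than 1.5 interquartile ranges below the first quartile or above the third. A minimal sketch for a single attribute, using made-up insulin readings; the function name tukey_outliers is hypothetical.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up 2-hour serum insulin readings (μU/ml).
insulin = [94, 168, 88, 543, 900, 175, 230, 83, 96, 235]
print(tukey_outliers(insulin))  # [900]
```

Quartile conventions differ (statistics.quantiles supports 'exclusive' and 'inclusive'), so other tools may place the fences slightly differently.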

Normalized Data:
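
The normalization method is not named in the transcript. Min-max scaling to [0, 1] is one common choice and is sketched here under that assumption; min_max_column is a hypothetical name and the ages are made up.

```python
def min_max_column(values):
    """Rescale a column to [0, 1] via (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [21, 30, 39, 66]  # made-up ages spanning the documented 21-66 range
print(min_max_column(ages))  # [0.0, 0.2, 0.4, 1.0]
```

In practice the minimum and maximum would be taken from the training data and reused for new patients, so that a new record is scaled on the same basis as the data the models were built on.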

Association Rules:
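
The mined rules themselves are not recoverable from the transcript. As background, a rule X -> Y is usually scored by its support (the share of patients whose records contain both X and Y) and its confidence (the support of X and Y together divided by the support of X). The sketch below assumes patients were discretized into items such as "PG Concentration=high"; the item names, the data and the function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """supp(X ∪ Y) / supp(X); assumes X occurs in at least one transaction."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical discretized patients (one set of items per patient).
patients = [
    {"PG Concentration=high", "BMI=high", "Diagnosis=Sick"},
    {"PG Concentration=high", "BMI=low", "Diagnosis=Sick"},
    {"PG Concentration=low", "BMI=high", "Diagnosis=Healthy"},
    {"PG Concentration=high", "BMI=high", "Diagnosis=Healthy"},
]
X, Y = {"PG Concentration=high"}, {"Diagnosis=Sick"}
print(support(X | Y, patients))    # 0.5
print(confidence(X, Y, patients))  # 0.6666666666666666
```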

Performance Models:

Decision Tree:
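
The split criterion behind the decision tree is not reported. The core step of a tree learner such as CART is choosing the threshold that most reduces impurity; the sketch below does this for one numeric attribute using Gini impurity, on made-up glucose values (gini and best_threshold are hypothetical names). A full tree would repeat this step recursively on each side of the chosen split.

```python
def gini(labels):
    """Gini impurity 1 - p^2 - (1 - p)^2 for a two-class list of labels."""
    if not labels:
        return 0.0
    p = labels.count("Sick") / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_threshold(values, labels):
    """Return (t, weighted Gini) for the best split 'value <= t'."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(values))[:-1]:  # the largest value would leave the right side empty
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


glucose = [85, 90, 110, 140, 160, 180]  # made-up PG Concentration values
diagnosis = ["Healthy", "Healthy", "Healthy", "Sick", "Sick", "Sick"]
print(best_threshold(glucose, diagnosis))  # (110, 0.0)
```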

Naïve Bayes:
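
The transcript does not state which naïve Bayes variant was used. For numeric attributes like these, a common choice is Gaussian naïve Bayes, which treats each attribute as normally distributed within each class and independent of the others given the class. A minimal sketch under that assumption, on made-up (PG Concentration, BMI) pairs; fit_gnb, log_gauss and predict_gnb are hypothetical names.

```python
import math
from statistics import mean, pvariance


def fit_gnb(rows, labels):
    """For each class: prior and a (mean, variance) pair per feature."""
    model = {}
    for c in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        columns = list(zip(*members))
        # A tiny constant keeps every variance strictly positive.
        model[c] = (len(members) / len(rows),
                    [(mean(col), pvariance(col) + 1e-9) for col in columns])
    return model


def log_gauss(x, mu, var):
    """Log density of a normal distribution N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict_gnb(model, row):
    """Class maximizing log prior + sum of per-feature log likelihoods."""
    def score(c):
        prior, params = model[c]
        return math.log(prior) + sum(log_gauss(x, mu, var)
                                     for x, (mu, var) in zip(row, params))
    return max(model, key=score)


# Made-up (PG Concentration, BMI) pairs.
X = [(90, 24.0), (100, 26.5), (105, 22.0), (150, 35.0), (165, 38.5), (170, 33.0)]
y = ["Healthy", "Healthy", "Healthy", "Sick", "Sick", "Sick"]
model = fit_gnb(X, y)
print(predict_gnb(model, (95, 25.0)))   # Healthy
print(predict_gnb(model, (160, 36.0)))  # Sick
```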

Neural Network:
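
The network's architecture and learned weights are not given in the transcript. The sketch below shows only the forward pass of a network with one hidden layer of sigmoid units, i.e. how a trained network turns a normalized patient record into a score between 0 and 1. All weights are hypothetical placeholders rather than learned values, and training (backpropagation) is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, w2, b2):
    """One hidden layer: h = sigmoid(W1 x + b1), output = sigmoid(w2 . h + b2)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


# Hypothetical weights: 3 normalized inputs -> 2 hidden units -> 1 output.
W1 = [[1.5, 0.8, -0.3], [-1.0, 0.4, 0.9]]
b1 = [-0.5, 0.1]
w2 = [2.0, -1.2]
b2 = -0.4
print(forward([0.7, 0.5, 0.3], W1, b1, w2, b2))  # a value strictly between 0 and 1
```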

SVM:
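
The kernel and parameters behind the SVM result are not reported. Assuming a linear kernel and labels coded +1 for Sick and -1 for Healthy, the sketch shows the decision rule a linear SVM applies and the soft-margin objective (half the squared weight norm plus C times the hinge losses) that training minimizes. The weights and points are made up, the function names are hypothetical, and the optimizer itself is omitted.

```python
def decision(w, b, x):
    """Signed score w . x + b; the SVM predicts Sick (+1) when it is >= 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    return "Sick" if decision(w, b, x) >= 0 else "Healthy"


def svm_objective(w, b, X, y, C=1.0):
    """0.5 * ||w||^2 + C * sum(max(0, 1 - y_i * f(x_i))) with y_i in {+1, -1}."""
    hinge = sum(max(0.0, 1.0 - yi * decision(w, b, xi)) for xi, yi in zip(X, y))
    return 0.5 * sum(wi * wi for wi in w) + C * hinge


X = [[2.0], [-2.0], [0.5]]  # made-up 1-D points
y = [1, -1, 1]
print(svm_objective([1.0], 0.0, X, y))  # 1.0: only the point at 0.5 lies inside the margin
print(predict([1.0], 0.0, [0.5]))       # Sick
```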

Logistic Regression:
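
Logistic regression is perhaps the closest fit to the report's question, since its output can be read as a probability of being Sick. A minimal sketch, assuming batch gradient descent on the mean log-loss (the transcript does not say how the model was fitted) and a single made-up, standardized glucose feature; fit_logistic and predict_proba are hypothetical names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Estimated P(Sick | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean log-loss; y holds 1 (Sick) or 0 (Healthy)."""
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        errs = [predict_proba(w, b, x) - yi for x, yi in zip(X, y)]  # dLoss/dz per row
        w = [wj - lr * sum(e * x[j] for e, x in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b


# Made-up standardized PG Concentration values and diagnoses.
X = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [2.5]))   # close to 1
print(predict_proba(w, b, [-2.5]))  # close to 0
```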

KNN:
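
A k-nearest-neighbour classifier stores the training records and labels a new patient by majority vote among the k closest records. A minimal sketch with Euclidean distance on made-up normalized (PG Concentration, BMI) points; knn_predict and accuracy are hypothetical names. It also illustrates one possible reading of the 100% KNN figure in the Performance Summary, assuming (the report does not say) that those figures were measured on the training records: with k = 1, every stored record is its own nearest neighbour.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training points closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Share of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(test_X, test_y))
    return hits / len(test_y)


# Made-up normalized (PG Concentration, BMI) points; the last label is "noisy".
X = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.7, 0.8), (0.8, 0.6), (0.25, 0.2)]
y = ["Healthy", "Healthy", "Healthy", "Sick", "Sick", "Sick"]
print(accuracy(X, y, X, y, k=1))  # 1.0: each training point is its own nearest neighbour
print(accuracy(X, y, X, y, k=3))  # 0.8333333333333334: the noisy point is outvoted
```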

Performance Summary:

Model Name            Accuracy (%)
Decision Tree         73.18
Naïve Bayes           76.17
Neural Network        79.95
SVM                   77.73
Logistic Regression   76.43
KNN                   100

Evaluation Models:

Decision Tree:

Naïve Bayes:

Neural Net:

SVM:

Logistic Regression:

K Nearest Neighbor:

Evaluation Summary:

Model Name            Accuracy (%)
Decision Tree         71.86
Naïve Bayes           75.51
Neural Net            74.74
Logistic Regression   75.65
SVM                   76.95
K Nearest Neighbor    68.24
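
For reference, accuracy is the share of correctly classified patients, (TP + TN) / (TP + TN + FP + FN), with Sick treated as the positive class. The counts below are made up, not the report's results, and rates is a hypothetical name; the same counts also give sensitivity and specificity, which show how the errors split between missed and false diagnoses.

```python
def rates(tp, tn, fp, fn):
    """Accuracy, sensitivity (recall on Sick) and specificity (recall on Healthy)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Made-up confusion-matrix counts for 200 patients.
print(rates(tp=50, tn=100, fp=20, fn=30))
```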

Answers:

Using the SVM model, which had the highest evaluation accuracy of the six models (76.95%), the result above was generated.

Therefore, given the medical record of any patient with the attributes above, our model can predict whether the patient has diabetes with an evaluation accuracy of about 77% (76.95%).

References:

Professor’s class notes
YouTube
Wikipedia
WHO’s dataset
