Web Mining Final Report
Final Report How likely Am I to have Diabetes 04/28/2015
FINAL REPORT
WEB MINING
Ashwin Kumar Pitchai (akp73) & Suchait Mattoo (sm925) Page 1
Contents
Description
Questions
Data Dictionary
Sample Data
Outlier Detection
Normalized Data
Association Rules
Performance Models
    Naïve Bayes
    Neural Network
    SVM
    Logistic Regression
    KNN
Evaluation Models
    Decision Tree
    Naïve Bayes
    Neural Net
    SVM
    Logistic Regression
    K Nearest Neighbor
Answers
References
Description:
Our dataset records whether a patient shows signs of diabetes according to World Health Organization criteria (i.e., the 2-hour post-load plasma glucose was at least 200 mg/dl at any survey examination, or diabetes was found during routine medical care).
The population lives near Phoenix, Arizona, USA. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage. Each instance represents an individual patient and her medical attributes, along with the diabetes classification attribute.
Questions:
How likely is a particular patient to have diabetes, given her medical parameters?
Data Dictionary
Attribute        | Description                                                  | Data Type      | Range
Pregnancies      | Number of pregnancies                                        | Numeric(17)    | 0-17
PG Concentration | Plasma glucose at 2 hours in an oral glucose tolerance test | Numeric(199)   | 0-199
Diastolic BP     | Diastolic blood pressure (mm Hg)                             | Numeric(122)   | 0-122
Tri Fold Thick   | Triceps skin fold thickness (mm)                             | Numeric(52)    | 0-52
Serums Ins       | 2-hour serum insulin (mu U/ml)                               | Numeric(846)   | 0-846
BMI              | Body mass index (weight in kg / (height in m)^2)             | Decimal(53.2)  | 0-53.2
DP Function      | Diabetes pedigree function                                   | Decimal(1.353) | 0.088-1.353
Age              | Age (years)                                                  | Numeric(66)    | 21-66
Diagnosis        | Is the patient Sick or Healthy?                              | Varchar(7)     | Healthy or Sick
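As a minimal sketch of how the data dictionary could be used to check incoming records, the ranges above can be encoded and applied to one patient at a time. The key names (such as pg_concentration) and the function validate_record are hypothetical names chosen for this illustration, and the example record is made up rather than taken from the dataset.

```python
# Value ranges copied from the data dictionary above.
# The keys are hypothetical column names chosen for this sketch.
RANGES = {
    "pregnancies": (0, 17),
    "pg_concentration": (0, 199),
    "diastolic_bp": (0, 122),
    "tri_fold_thick": (0, 52),
    "serum_ins": (0, 846),
    "bmi": (0, 53.2),
    "dp_function": (0.088, 1.353),
    "age": (21, 66),
}
DIAGNOSES = {"Healthy", "Sick"}


def validate_record(record):
    """Return a list of problems found in one patient record (empty if none)."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} outside {low}-{high}")
    if record.get("diagnosis") not in DIAGNOSES:
        problems.append("diagnosis: expected Healthy or Sick")
    return problems


# Made-up record for illustration only; not a row from the dataset.
example = {
    "pregnancies": 2,
    "pg_concentration": 120,
    "diastolic_bp": 70,
    "tri_fold_thick": 30,
    "serum_ins": 100,
    "bmi": 28.5,
    "dp_function": 0.5,
    "age": 35,
    "diagnosis": "Healthy",
}
print(validate_record(example))
```

A record that passes returns an empty list; each out-of-range or missing attribute adds one message, which makes the check easy to run over a whole file before modeling.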
Sample Data:
Outlier Detection:
Normalized Data:
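The report does not state which normalization method was applied. As a minimal sketch, assuming min-max scaling to the interval [0, 1] with the attribute ranges from the data dictionary, the transformation could look like the following. The function name min_max_normalize is hypothetical.

```python
def min_max_normalize(value, low, high):
    """Scale value from the range [low, high] to [0, 1] (assumed method)."""
    if high == low:
        raise ValueError("range must have nonzero width")
    return (value - low) / (high - low)


# Age range taken from the data dictionary (21-66 years).
AGE_LOW, AGE_HIGH = 21, 66

print(min_max_normalize(21, AGE_LOW, AGE_HIGH))  # youngest age maps to 0.0
print(min_max_normalize(66, AGE_LOW, AGE_HIGH))  # oldest age maps to 1.0
```

Scaling every numeric attribute this way keeps wide-ranging attributes such as serum insulin from dominating distance-based models like KNN.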
Association Rules:
Performance Models:
Decision Tree:
Naïve Bayes:
Neural Network:
SVM:
Logistic Regression:
KNN:
Performance Summary:
Model Name          | Accuracy (%)
Decision Tree       | 73.18
Naïve Bayes         | 76.17
Neural Network      | 79.95
SVM                 | 77.73
Logistic Regression | 76.43
KNN                 | 100
Evaluation Models:
Decision Tree:
Naïve Bayes:
Neural Net:
SVM:
Logistic Regression:
K Nearest Neighbor:
Evaluation Summary:
Model Name          | Accuracy (%)
Decision Tree       | 71.86
Naïve Bayes         | 75.51
Neural Net          | 74.74
Logistic Regression | 75.65
SVM                 | 76.95
K Nearest Neighbor  | 68.24
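The comparison behind the two summary tables can be sketched in a few lines. The accuracies below are copied from the Performance Summary and Evaluation Summary; the dictionary and variable names are hypothetical, and the model labels are harmonized across the two tables (for example, "Neural Net" and "Neural Network" are treated as the same model) for this illustration.

```python
# Accuracies (%) copied from the Performance Summary table.
performance = {
    "Decision Tree": 73.18,
    "Naive Bayes": 76.17,
    "Neural Network": 79.95,
    "SVM": 77.73,
    "Logistic Regression": 76.43,
    "KNN": 100.0,
}

# Accuracies (%) copied from the Evaluation Summary table,
# with labels matched to the names used above.
evaluation = {
    "Decision Tree": 71.86,
    "Naive Bayes": 75.51,
    "Neural Network": 74.74,
    "Logistic Regression": 75.65,
    "SVM": 76.95,
    "KNN": 68.24,
}

# Model with the highest evaluation accuracy.
best_model = max(evaluation, key=evaluation.get)

# Drop in accuracy from the performance stage to the evaluation stage.
gaps = {
    name: round(performance[name] - evaluation[name], 2)
    for name in evaluation
}

print(best_model, evaluation[best_model])
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {gap}")
```

Ranking by evaluation accuracy rather than by the performance figures matters here: KNN scores 100 in the performance stage but has the lowest evaluation accuracy, while SVM loses less than one percentage point between the two stages.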
Answers:
Using the SVM model, the result above was generated with a prediction accuracy of 76.95%, the highest among the evaluated models.
Therefore, given the medical records of any patient with the attributes above, our model can diagnose the patient for diabetes with an accuracy of about 77%.
References:
Professor’s class notes
YouTube
Wikipedia
WHO’s dataset