CLASSIFICATION – NAIVE BAYES
![Page 2: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/2.jpg)
WHAT IS CLASSIFICATION?
§ A supervised learning task of determining the class of an instance; it is assumed that:
  § feature values for the given instance are known
  § the set of possible classes is known and given
§ Classes are given as nominal values; for instance:
  § classification of email messages: spam, not-spam
  § classification of news articles: politics, sport, culture, etc.
![Page 3: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/3.jpg)
ToPlayOtNotToPlay.arff dataset
Example 1
![Page 4: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/4.jpg)
Suppose you know that it is sunny outside
Then there is a 60% chance that Play = no
Sunny weather
![Page 5: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/5.jpg)
How well does outlook predict play?
![Page 6: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/6.jpg)
For each attribute…
How well does outlook predict play?
![Page 7: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/7.jpg)
Convert values to ratios
2 occurrences of Play = no where Outlook = rainy, out of 5 occurrences of Play = no: 2 / 5 = 0.40
Values to ratios
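The conversion from counts to ratios can be sketched in a few lines of Python. The count table below is an assumption: it is not printed in this transcript, and was chosen so that it reproduces the ratios quoted on the slides (for example 2 of the 5 Play = no instances having Outlook = rainy). The function name `to_ratios` is hypothetical.

```python
from fractions import Fraction

# Assumed counts for the Outlook attribute, chosen to match the slide ratios.
outlook_counts = {
    "sunny":    {"yes": 2, "no": 3},
    "overcast": {"yes": 4, "no": 0},
    "rainy":    {"yes": 3, "no": 2},
}

def to_ratios(counts):
    """Divide each count by the number of instances of its class."""
    totals = {}
    for per_class in counts.values():
        for label, n in per_class.items():
            totals[label] = totals.get(label, 0) + n
    return {
        value: {label: Fraction(n, totals[label]) for label, n in per_class.items()}
        for value, per_class in counts.items()
    }

outlook_ratios = to_ratios(outlook_counts)
for value, per_class in outlook_ratios.items():
    print(value, {label: round(float(r), 2) for label, r in per_class.items()})
```

Using `Fraction` keeps the ratios exact, so rounding happens only when they are displayed.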
![Page 8: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/8.jpg)
Calculate the likelihood that:
§ Outlook = sunny (0.22)
§ Temperature = cool (0.33)
§ Humidity = high (0.33)
§ Windy = true (0.33)
§ Play = yes (0.64)
0.22 x 0.33 x 0.33 x 0.33 x 0.64 = 0.0053
Likelihood of playing under these weather conditions
![Page 9: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/9.jpg)
Calculate the likelihood that:
§ Outlook = sunny (0.60)
§ Temperature = cool (0.20)
§ Humidity = high (0.80)
§ Windy = true (0.60)
§ Play = no (0.36)
0.60 x 0.20 x 0.80 x 0.60 x 0.36 = 0.0206
Likelihood of NOT playing under these weather conditions
![Page 10: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/10.jpg)
Given these weather conditions: Outlook = sunny, Temperature = cool, Humidity = high, Windy = true
Probability of Play = yes: 0.0053 / (0.0053 + 0.0206) = 20.5%
Probability of Play = no: 0.0206 / (0.0053 + 0.0206) = 79.5%
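The two likelihoods and their normalization can be checked with a short Python sketch. The fractions are an assumption: they are the exact counts that round to the ratios quoted on the slides (for example 2/9 ≈ 0.22 and 9/14 ≈ 0.64). Multiplying the rounded two-decimal values instead gives a slightly different product for Play = yes.

```python
from fractions import Fraction
from math import prod

# Assumed exact ratios behind the rounded slide values
# (9 "yes" and 5 "no" instances out of 14).
# Order: Outlook = sunny, Temperature = cool, Humidity = high,
# Windy = true, class prior.
factors = {
    "yes": [Fraction(2, 9), Fraction(3, 9), Fraction(3, 9), Fraction(3, 9), Fraction(9, 14)],
    "no":  [Fraction(3, 5), Fraction(1, 5), Fraction(4, 5), Fraction(3, 5), Fraction(5, 14)],
}

# Likelihood of each class: product of its per-attribute ratios and its prior.
likelihood = {label: prod(fs) for label, fs in factors.items()}

# Normalize so that the two values sum to 1.
total = sum(likelihood.values())
posterior = {label: value / total for label, value in likelihood.items()}

for label in ("yes", "no"):
    print(f"Play = {label}: likelihood {float(likelihood[label]):.4f}, "
          f"probability {float(posterior[label]):.1%}")
```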
The Bayes Theorem
![Page 11: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/11.jpg)
Calculate the likelihood that:
§ Outlook = overcast (0.00)
§ Temperature = cool (0.20)
§ Humidity = high (0.80)
§ Windy = true (0.60)
§ Play = no (0.36)
0.00 x 0.20 x 0.80 x 0.60 x 0.36 = 0.0000
Likelihood of NOT playing under these weather conditions
![Page 12: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/12.jpg)
Laplace estimator
Laplace estimator: Add 1 to each count
The original dataset
After the Laplace estimator
![Page 13: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/13.jpg)
Laplace estimator
Convert incremented counts to ratios after implementing the Laplace estimator
![Page 14: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/14.jpg)
Laplace estimator
Outlook = overcast, Temperature = cool, Humidity = high, Windy = true
Play = no: 0.13 x 0.25 x 0.71 x 0.57 x 0.36 = 0.0046
Play = yes: 0.42 x 0.33 x 0.36 x 0.36 x 0.64 = 0.0118
Probability of Play = no: 0.0046 / (0.0046 + 0.0118) = 28%
Probability of Play = yes: 0.0118 / (0.0046 + 0.0118) = 72%
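A minimal sketch of this calculation in Python is shown below. It assumes the underlying counts (they are not printed in this transcript) and follows the slide numbers in smoothing only the attribute ratios while leaving the class priors (0.36 and 0.64) unsmoothed. The helper name `laplace_ratio` is hypothetical.

```python
def laplace_ratio(count, class_total, n_values):
    """Add 1 to the count and the number of attribute values to the class total."""
    return (count + 1) / (class_total + n_values)

# Assumed counts for Outlook = overcast, Temperature = cool,
# Humidity = high, Windy = true, given as (count, number of attribute values).
observed = {
    "no":  [(0, 3), (1, 3), (4, 2), (3, 2)],   # 5 instances with Play = no
    "yes": [(4, 3), (3, 3), (3, 2), (3, 2)],   # 9 instances with Play = yes
}
class_totals = {"no": 5, "yes": 9}
n_instances = sum(class_totals.values())

likelihood = {}
for label, pairs in observed.items():
    value = class_totals[label] / n_instances  # unsmoothed prior, as on the slides
    for count, n_values in pairs:
        value *= laplace_ratio(count, class_totals[label], n_values)
    likelihood[label] = value

total = sum(likelihood.values())
for label, value in likelihood.items():
    print(f"Play = {label}: likelihood {value:.4f}, probability {value / total:.0%}")
```

Because every count is incremented, the zero count for Outlook = overcast with Play = no no longer forces the whole product to zero.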
![Page 15: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/15.jpg)
Laplace estimator
Under these weather conditions: Outlook = sunny, Temperature = cool, Humidity = high, Windy = true
NOT using the Laplace estimator: Play = no: 79.5%, Play = yes: 20.5%
Using the Laplace estimator: Play = no: 72.0%, Play = yes: 28.0%
The Laplace estimator has little effect as the sample size grows.
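The shrinking influence of the estimator can be illustrated with a small Python sketch. The counts are hypothetical: a ratio of 3 out of 5 for an attribute with 3 possible values is scaled up while keeping the same proportion, and the gap between the raw and the smoothed ratio is printed for each size.

```python
def raw_ratio(count, total):
    return count / total

def laplace_ratio(count, total, n_values):
    # Add 1 to the count and the number of attribute values to the total.
    return (count + 1) / (total + n_values)

gaps = []
for scale in (1, 10, 100, 1000):
    count, total = 3 * scale, 5 * scale  # hypothetical counts, same proportion
    raw = raw_ratio(count, total)
    smoothed = laplace_ratio(count, total, 3)
    gaps.append(abs(raw - smoothed))
    print(f"n = {total:5d}: raw = {raw:.4f}, smoothed = {smoothed:.4f}, gap = {gaps[-1]:.4f}")
```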
![Page 16: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/16.jpg)
Prediction rules
Repeat the previous calculation for all other combinations of weather conditions.
Calculate the rules for each pair.
Then throw out the rules with p < 0.5
![Page 17: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/17.jpg)
Prediction rules
Calculate probabilities for all 36 combinations
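One way to enumerate all 36 combinations (3 x 3 x 2 x 2) is sketched below in Python. The count table is an assumption chosen to match the ratios quoted on the slides; the Laplace estimator is applied to the attribute ratios and the class priors are left unsmoothed, as in the slide calculations. For each combination only the class with the higher probability is kept, which for two classes is the same as discarding the rule with p < 0.5.

```python
from itertools import product

# Assumed counts per attribute value, given as (Play = yes, Play = no).
counts = {
    "Outlook":  {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)},
    "Temp.":    {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "Humidity": {"high": (3, 4), "normal": (6, 1)},
    "Windy":    {"false": (6, 2), "true": (3, 3)},
}
labels = ("yes", "no")
class_totals = (9, 5)

def posterior(combo):
    """Return the normalized class probabilities for one combination."""
    scores = []
    for i, total in enumerate(class_totals):
        score = total / sum(class_totals)  # unsmoothed prior
        for values, value in zip(counts.values(), combo):
            # Laplace estimator: add 1 to the count, number of values to the total.
            score *= (values[value][i] + 1) / (total + len(values))
        scores.append(score)
    norm = sum(scores)
    return {label: s / norm for label, s in zip(labels, scores)}

rules = {}
for combo in product(*(values.keys() for values in counts.values())):
    probs = posterior(combo)
    best = max(probs, key=probs.get)
    rules[combo] = (best, probs[best])

for combo, (label, p) in rules.items():
    print(", ".join(combo), "->", f"Play = {label} (p = {p:.2f})")
```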
![Page 18: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/18.jpg)
Prediction rules
Rules predicting class for all combinations of attributes
Instance 6 is missing
![Page 19: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/19.jpg)
Comparing the prediction with the original data
![Page 20: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/20.jpg)
• Waikato Environment for Knowledge Analysis
• Java Software for data mining
• Set of algorithms for machine learning and data mining
• Developed at the University of Waikato, New Zealand
• Open-source
• Website: http://www.cs.waikato.ac.nz/ml/weka
Weka
![Page 21: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/21.jpg)
§ We use datasets from the Technology Forge:
http://www.technologyforge.net/Datasets
Datasets we use
![Page 22: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/22.jpg)
§ Attribute-Relation File Format – ARFF
§ Text file
@relation TPONTPNom

@attribute Outlook {sunny, overcast, rainy}
@attribute Temp. {hot, mild, cool}
@attribute Humidity {high, normal}
@attribute Windy {'false', 'true'}
@attribute Play {no, yes}

@data
sunny, hot, high, 'false', no
sunny, hot, high, 'true', no
overcast, hot, high, 'false', yes
...
ARFF file
Attributes can be:
• Numerical
• Nominal
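To make the file layout concrete, here is a minimal Python sketch that reads a nominal-only ARFF text into attribute definitions and data rows. It is not Weka's own parser: the function name `parse_nominal_arff` is hypothetical, and it assumes one declaration per line, nominal attributes only, and no commas inside quoted values. Only the three data rows shown on the slide are included.

```python
arff_text = """@relation TPONTPNom

@attribute Outlook {sunny, overcast, rainy}
@attribute Temp. {hot, mild, cool}
@attribute Humidity {high, normal}
@attribute Windy {'false', 'true'}
@attribute Play {no, yes}

@data
sunny, hot, high, 'false', no
sunny, hot, high, 'true', no
overcast, hot, high, 'false', yes
"""

def parse_nominal_arff(text):
    """Return (attributes, rows) for an ARFF text with nominal attributes only."""
    attributes, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([cell.strip().strip("'") for cell in line.split(",")])
        elif lower.startswith("@attribute"):
            name, spec = line.split(None, 2)[1:]
            values = [v.strip().strip("'") for v in spec.strip("{}").split(",")]
            attributes.append((name, values))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows

attributes, rows = parse_nominal_arff(arff_text)
print(attributes)
print(rows)
```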
![Page 23: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/23.jpg)
Classification in Weka
ToPlayOtNotToPlay.arff dataset
![Page 24: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/24.jpg)
Classification results
The Laplace estimator is automatically applied
![Page 25: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/25.jpg)
Classification results
Instance 6 is marked as an incorrectly classified instance
Probability of each instance in the dataset
![Page 26: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/26.jpg)
Precision, Recall, and F-Measure
True Positive Rate
False Positive Rate
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-measure = 2 * Precision * Recall / (Precision + Recall)
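The three measures can be computed directly from the counts, as in the Python sketch below. The counts used (8 true positives, 2 false positives, 1 false negative) are hypothetical and are not taken from the Weka output on the slide.

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of actual positives that were predicted as positive."""
    return tp / (tp + fn)

def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Hypothetical counts, for illustration only.
tp, fp, fn = 8, 2, 1

p = precision(tp, fp)
r = recall(tp, fn)
f = f_measure(p, r)
print(f"Precision = {p:.3f}, Recall = {r:.3f}, F-measure = {f:.3f}")
```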
![Page 27: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/27.jpg)
Confusion Matrix
TP = True Positive
FP = False Positive
TN = True Negative
FN = False Negative
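The four cells of the confusion matrix can be counted from a list of actual classes and a list of predicted classes. The sketch below uses hypothetical labels and treats "yes" as the positive class; the function name `confusion_counts` is an assumption for illustration.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="yes"):
    """Count TP, FP, TN and FN for a two-class problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Hypothetical actual and predicted classes.
actual    = ["yes", "yes", "no", "no",  "yes", "no"]
predicted = ["yes", "no",  "no", "yes", "yes", "no"]

cells = confusion_counts(actual, predicted)
print(dict(cells))
```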
![Page 28: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/28.jpg)
Example 2 – Eatable Mushrooms dataset
§ Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”
§ Hypothetical samples with descriptions corresponding to 23 species of mushrooms
§ There are 8124 instances with 22 nominal attributes describing mushroom characteristics, one of which indicates whether a mushroom is edible or not
§ Our goal is to predict whether a mushroom is edible or not
![Page 29: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/29.jpg)
Weka Tutorials and Assignments @ The Technology Forge
§ Link: http://www.technologyforge.net/WekaTutorials/
Thank you!
![Page 30: CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”](https://reader034.fdocuments.in/reader034/viewer/2022052518/5f0932b37e708231d425b0e2/html5/thumbnails/30.jpg)
(Anonymous) survey for your comments and suggestions
http://goo.gl/cqdp3I