An Exercise in Machine Learning cs573x/BBSIlab/2006/ Cornelia Caragea.
An Exercise in Machine Learning
http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/
Cornelia Caragea
Outline
• Machine Learning Software
• Preparing Data
• Building Classifiers
• Interpreting Results
Machine Learning Software
• Suites (General Purpose)
  • WEKA (Source: Java)
  • MLC++ (Source: C++)
  • SAS
  • List from KDNuggets (Various)
• Specific
  • Classification: C4.5, SVMlight
  • Association Rule Mining
  • Bayesian Net
  • …
• Commercial vs. Free
What does WEKA do?
• Implements state-of-the-art learning algorithms
• Main strengths are in classification; it also provides regression, association rule, and clustering algorithms
• Extensible, so new learning schemes can be tried
• Large variety of handy tools (dataset transformation, filters, visualization, etc.)
WEKA resources
• API documentation, tutorials, source code
• WEKA mailing list
• Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
• Weka-related projects:
  • Weka-Parallel - parallel processing for Weka
  • RWeka - linking R and Weka
  • YALE - Yet Another Learning Environment
  • Many others…
Preparing Data
ARFF Data Format
• Header – describing the attribute types
• Data – (instances, examples) comma-separated list
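To make the header/data split concrete, here is a minimal Python sketch that reads a tiny ARFF document. The `weather` relation, its values, and the helper name `parse_arff` are made up for illustration and are not from the original slides. The parser is deliberately simplified: it assumes no quoted values, no sparse rows, and no missing values.

```python
# A tiny ARFF document: a header (@relation, @attribute) followed by @data rows.
# The relation and its values are hypothetical, for illustration only.
ARFF_TEXT = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    Simplified sketch: ignores quoting, sparse instances and missing values.
    """
    relation = None
    attributes = []  # (name, type); type is a type name or a list of nominal labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, spec = rest.strip().partition(" ")
            spec = spec.strip()
            if spec.startswith("{"):
                # Nominal attribute: the allowed labels are listed in braces.
                labels = [s.strip() for s in spec.strip("{}").split(",")]
                attributes.append((name, labels))
            else:
                attributes.append((name, spec.lower()))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
```

Every value comes back as a string; converting numeric attributes is left out to keep the sketch short.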
Launching WEKA
java -jar weka.jar
Load Dataset into WEKA
Data Filters
• Useful support for data preprocessing: removing or adding attributes, resampling the dataset, removing examples, etc.
• Filters can also create stratified cross-validation folds of the given dataset, so that class distributions are approximately retained within each fold.
• Data is typically split 2/3 for training and 1/3 for testing.
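The idea of a stratified 2/3 vs. 1/3 split can be sketched in a few lines of Python. This is not WEKA's filter implementation; the function name `stratified_split`, the toy labels, and the fixed seed are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_indices, test_indices) with class proportions roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        # Shuffle within each class, then cut each class at the same fraction.
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Hypothetical class column: 6 "yes" and 3 "no" instances.
labels = ["yes"] * 6 + ["no"] * 3
train_idx, test_idx = stratified_split(labels)
```

Because each class is cut separately, the 2:1 ratio of "yes" to "no" in the toy data is kept in both parts.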
Building Classifiers
A classifier model is a mapping from dataset attributes to the class (target) attribute. How the model is created and what form it takes differ from one learning algorithm to another.
Decision Tree and Naïve Bayes Classifiers
Which one is the best? No Free Lunch!
(1) weka.classifiers.rules.ZeroR
• Class for building and using a 0-R classifier
• Majority class classifier
• Predicts the mean (for a numeric class) or the mode (for a nominal class)
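The behaviour described above fits in a few lines. The following Python sketch is an illustration of the idea only, not WEKA's code; the class name `ZeroRSketch` and its methods are hypothetical.

```python
from collections import Counter


class ZeroRSketch:
    """Baseline that ignores all attributes and predicts one constant value."""

    def fit(self, targets):
        if all(isinstance(t, (int, float)) for t in targets):
            # Numeric class: predict the mean of the training targets.
            self.prediction = sum(targets) / len(targets)
        else:
            # Nominal class: predict the most frequent label (the mode).
            self.prediction = Counter(targets).most_common(1)[0][0]
        return self

    def predict(self, n_instances):
        # The same value is returned for every instance.
        return [self.prediction] * n_instances


nominal_model = ZeroRSketch().fit(["yes", "yes", "no"])
numeric_model = ZeroRSketch().fit([1.0, 2.0, 3.0])
```

Such a baseline is mainly useful as a reference point: a real classifier should do better than it.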
Exercise 1
http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex1.html
(2) weka.classifiers.bayes.NaiveBayes
Class for building a Naive Bayes classifier
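As a rough illustration of what a Naive Bayes classifier computes, here is a small Python sketch for nominal attributes only. It is not WEKA's implementation: the function names, the toy data, and the use of add-one (Laplace) smoothing are assumptions chosen to keep the example short.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    attribute_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            attribute_values[i].add(value)
    return class_counts, value_counts, attribute_values


def predict_naive_bayes(model, row):
    """Pick the class maximizing log P(class) + sum_i log P(value_i | class)."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Add-one (Laplace) smoothing avoids zero probabilities.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(attribute_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (outlook, temperature) -> play
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
```

The "naive" part is the sum over attributes: each attribute is treated as independent of the others given the class.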
(3) weka.classifiers.trees.J48
Class for generating a pruned or unpruned C4.5 decision tree
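C4.5 chooses the attribute to split on by gain ratio. The Python sketch below shows that criterion for a single nominal attribute; it is an illustration under simplifying assumptions (no numeric attributes, no missing values), and the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition sizes
    return gain / split_info if split_info > 0 else 0.0


labels = ["no", "no", "yes", "yes"]
informative = ["sunny", "sunny", "rainy", "rainy"]  # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]                # tells us nothing about the class
```

A tree learner would compute this score for each candidate attribute and split on the best one, then repeat on each branch.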
Test Options
• Percentage Split (2/3 Training; 1/3 Testing)
• Cross-validation
  • estimates the generalization error by resampling when data is limited; the error estimate is averaged over the folds
  • stratified
  • 10-fold
  • leave-one-out (LOO)
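The cross-validation options above can be sketched as follows in Python. This is an illustration, not WEKA's procedure: the function names are hypothetical, the folds are dealt out deterministically without shuffling, and a majority-class baseline stands in for a real classifier. Leave-one-out is the special case where the number of folds equals the number of instances.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign each instance index to one of k folds, dealing out each class round-robin."""
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    position = 0
    for indices in by_class.values():
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def cross_validated_error(labels, k):
    """Average error of a majority-class baseline over k held-out folds (k <= len(labels))."""
    errors = []
    for fold in stratified_folds(labels, k):
        held_out = set(fold)
        # Train on everything outside the current fold.
        train = [label for i, label in enumerate(labels) if i not in held_out]
        majority = Counter(train).most_common(1)[0][0]
        # Test on the held-out fold only.
        wrong = sum(1 for i in fold if labels[i] != majority)
        errors.append(wrong / len(fold))
    return sum(errors) / len(errors)


# Hypothetical class column: 6 "yes" and 3 "no" instances.
labels = ["yes"] * 6 + ["no"] * 3
three_fold_error = cross_validated_error(labels, 3)
leave_one_out_error = cross_validated_error(labels, len(labels))
```

Each instance is used for testing exactly once, which is what makes the averaged estimate useful when data is limited.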
Understanding Output
Decision Tree Output (1)
Decision Tree Output (2)
Exercise 2
http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex2.html
Performance Measures
• Accuracy & Error rate
• Confusion matrix – contingency table
• True Positive rate & False Positive rate (Area under Receiver Operating Characteristic)
• Precision, Recall & F-Measure
• Sensitivity & Specificity
• For more information on these, see uisp09-Evaluation.ppt
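Most of these measures follow directly from the four cells of a two-class confusion matrix. The Python sketch below shows the standard definitions; the function name `binary_metrics` and the example counts are made up for illustration, and the code assumes no denominator is zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard measures derived from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)       # also called sensitivity or true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "specificity": specificity,
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
```

Note that the false positive rate is always 1 minus the specificity, so the ROC axes (true positive rate against false positive rate) carry the same information as sensitivity and specificity.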
Decision Tree Pruning
• Overcome over-fitting
• Pre-pruning and post-pruning
• Reduced error pruning
• Subtree raising with different confidence
• Comparing tree size and accuracy
Subtree replacement
• Bottom-up: a tree is considered for replacement once all its subtrees have been considered
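The bottom-up order can be sketched in Python as a recursive function that prunes the children first and only then considers the node itself. This is an illustration, not J48's code: it decides by counting errors on a held-out pruning set (in the spirit of reduced error pruning), whereas C4.5 normally uses a confidence-based error estimate. The dictionary tree layout and the function names are hypothetical.

```python
def classify(tree, row):
    """Follow the tree to a leaf; fall back to the node's majority class on unseen values."""
    while "children" in tree:
        child = tree["children"].get(row[tree["attribute"]])
        if child is None:
            return tree["majority"]
        tree = child
    return tree["label"]


def count_errors(tree, rows, labels):
    return sum(1 for row, label in zip(rows, labels) if classify(tree, row) != label)


def prune(tree, rows, labels):
    """Bottom-up subtree replacement using a held-out pruning set (modifies `tree` in place)."""
    if "children" not in tree:
        return tree
    # First consider every subtree, passing down only the instances that reach it.
    for value, child in list(tree["children"].items()):
        reaching = [(r, l) for r, l in zip(rows, labels) if r[tree["attribute"]] == value]
        sub_rows = [r for r, _ in reaching]
        sub_labels = [l for _, l in reaching]
        tree["children"][value] = prune(child, sub_rows, sub_labels)
    # Then consider replacing this node by a leaf predicting its majority class.
    leaf = {"label": tree["majority"]}
    if count_errors(leaf, rows, labels) <= count_errors(tree, rows, labels):
        return leaf
    return tree


# Hypothetical tree: split on attribute 0, then on attribute 1 under branch "b".
tree = {
    "attribute": 0,
    "majority": "yes",
    "children": {
        "a": {"label": "no"},
        "b": {
            "attribute": 1,
            "majority": "yes",
            "children": {"x": {"label": "yes"}, "y": {"label": "no"}},
        },
    },
}
# Hypothetical pruning set.
rows = [("a", "x"), ("b", "x"), ("b", "y"), ("b", "y")]
labels = ["no", "yes", "yes", "yes"]
pruned = prune(tree, rows, labels)
```

In this toy case the inner subtree under "b" is replaced by a single leaf, because the leaf makes fewer errors on the pruning set, while the root split is kept.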
Subtree Raising
• Deletes node and redistributes instances
• Slower than subtree replacement
Exercise 3
http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex3.html