An Exercise in Machine Learning
http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/
Cornelia Caragea
Outline
• Machine Learning Software
• Preparing Data
• Building Classifiers
• Interpreting Results
Machine Learning Software
• Suites (General Purpose)
  • WEKA (Source: Java)
  • MLC++ (Source: C++)
  • SAS
  • List from KDNuggets (Various)
• Specific
  • Classification: C4.5, SVMlight
  • Association Rule Mining
  • Bayesian Net
  • …
• Commercial vs. Free
What does WEKA do?
• Implements state-of-the-art learning algorithms
• Main strengths are in classification
• Also provides regression, association rule, and clustering algorithms
• Extensible, so new learning schemes can be tried
• Large variety of handy tools (transforming datasets, filters, visualization, etc.)
WEKA resources
• API documentation, tutorials, source code
• WEKA mailing list
• Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
• Weka-related projects:
  • Weka-Parallel: parallel processing for Weka
  • RWeka: linking R and Weka
  • YALE: Yet Another Learning Environment
  • Many others…
Preparing Data
ARFF Data Format
• Header: describes the attribute types
• Data: the instances (examples), each given as a comma-separated list of values
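To make the header/data layout concrete, here is a minimal Python sketch that renders a tiny dataset as ARFF text. The helper `to_arff`, the relation name `toy_weather`, and the attribute names and values are all hypothetical, chosen only for illustration; the sketch covers only numeric and nominal attributes.

```python
def to_arff(relation, attributes, rows):
    """Render a dataset as ARFF text.

    attributes: list of (name, kind) pairs, where kind is the string
    "numeric" or a list of allowed nominal values.
    rows: one tuple of values per instance, in attribute order.
    """
    # Header section: relation name followed by one line per attribute.
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append(f"@attribute {name} {kind}")
        else:
            # Nominal attributes list their allowed values in braces.
            lines.append(f"@attribute {name} {{{','.join(kind)}}}")
    # Data section: one comma-separated line per instance.
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical toy dataset (not from the lab material).
attributes = [
    ("temperature", "numeric"),
    ("outlook", ["sunny", "rainy"]),
    ("play", ["yes", "no"]),
]
rows = [(85, "sunny", "no"), (70, "rainy", "yes")]

arff_text = to_arff("toy_weather", attributes, rows)
print(arff_text)
```

By the usual WEKA convention the last attribute (`play` here) is treated as the class, although any attribute can be selected as the class in the tool.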
Launching WEKA
• java -jar weka.jar
Load Dataset into WEKA
Data Filters
• Useful support for data preprocessing
• Removing or adding attributes, resampling the dataset, removing examples, etc.
• Can create stratified cross-validation folds of the given dataset, so that class distributions are approximately retained within each fold
• Typically split the data as 2/3 for training and 1/3 for testing
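The idea behind stratified folds can be sketched in a few lines of Python. This is only an illustration of the principle, not WEKA's filter implementation; the function name `stratified_folds` and the toy labels are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign instance indices to k folds so that each fold keeps
    roughly the same class proportions as the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0
    # Deal the shuffled members of each class round-robin over the folds.
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


# Hypothetical class column: 9 "yes" and 6 "no" instances.
labels = ["yes"] * 9 + ["no"] * 6
folds = stratified_folds(labels, k=3)
for fold in folds:
    print(sorted(fold), Counter(labels[i] for i in fold))

# With k=3, holding out one fold gives a stratified 2/3 train, 1/3 test split.
train_indices = folds[0] + folds[1]
test_indices = folds[2]
```

Each fold here ends up with 3 "yes" and 2 "no" instances, which matches the 9:6 ratio of the full toy dataset.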
Building Classifiers
• A classifier model is a mapping from dataset attributes to the class (target) attribute. How the model is created and what form it takes differ between learning schemes.
• Decision Tree and Naïve Bayes Classifiers
• Which one is the best?
  • No Free Lunch!
Building Classifiers
(1) weka.classifiers.rules.ZeroR
• Class for building and using a 0-R classifier
• Majority class classifier
• Predicts the mean (for a numeric class) or the mode (for a nominal class)
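The 0-R rule is simple enough to sketch directly. The following Python function is an illustrative stand-in, not WEKA's `ZeroR` class; the name `zero_r` and the sample values are made up for the example.

```python
from collections import Counter
from statistics import mean


def zero_r(class_values):
    """Return the single prediction a 0-R classifier would always make:
    the mean for a numeric class, the mode for a nominal class."""
    if all(isinstance(value, (int, float)) for value in class_values):
        return mean(class_values)
    return Counter(class_values).most_common(1)[0][0]


# Nominal class: the majority value is predicted for every instance.
nominal_prediction = zero_r(["yes", "yes", "no"])

# Numeric class: the mean of the training values is predicted.
numeric_prediction = zero_r([1.0, 2.0, 6.0])

print(nominal_prediction, numeric_prediction)
```

Because it ignores every attribute, 0-R is mainly useful as a baseline: any other classifier should do at least as well as always predicting the majority class.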
Exercise 1
http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex1.html
(2) weka.classifiers.bayes.NaiveBayes
• Class for building a Naive Bayes classifier
(3) weka.classifiers.trees.J48
• Class for generating a pruned or unpruned C4.5 decision tree
Test Options
• Percentage Split (2/3 Training; 1/3 Testing)
• Cross-validation
  • Estimates the generalization error by resampling when data is limited; the error estimates from the folds are averaged
  • Stratified
  • 10-fold
  • Leave-one-out (LOO)
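As a rough sketch of how a cross-validated error estimate is computed, the Python code below trains on k-1 folds, tests on the held-out fold, and averages the per-fold error rates. The learner `train_majority` is a stand-in that always predicts the most common class; the function names and the toy data are assumptions for illustration. The folds here are simply interleaved rather than stratified, to keep the example short.

```python
from collections import Counter


def train_majority(train):
    """Stand-in learner: ignores the attributes and always predicts
    the most common class in the training examples."""
    mode = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda attributes: mode


def cross_validated_error(data, k, train_fn):
    """Average test error over k folds; fold i holds examples i, i+k, i+2k, ..."""
    fold_errors = []
    for i in range(k):
        test = data[i::k]
        train = [example for j, example in enumerate(data) if j % k != i]
        model = train_fn(train)
        wrong = sum(1 for attributes, label in test if model(attributes) != label)
        fold_errors.append(wrong / len(test))
    return sum(fold_errors) / k


# Hypothetical dataset of 12 (attributes, class) pairs: 8 "yes", 4 "no".
data = [((i,), "yes" if i % 3 else "no") for i in range(12)]

four_fold_error = cross_validated_error(data, k=4, train_fn=train_majority)
# Leave-one-out is the special case where k equals the number of examples.
loo_error = cross_validated_error(data, k=len(data), train_fn=train_majority)
print(four_fold_error, loo_error)
```

Swapping `train_majority` for any function that takes training examples and returns a predictor gives the same estimate for a different learning scheme.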
Understanding Output
Decision Tree Output (1)
Decision Tree Output (2)
Exercise 2
http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex2.html
Performance Measures
• Accuracy & Error rate
• Confusion matrix (contingency table)
• True Positive rate & False Positive rate (Area under Receiver Operating Characteristic)
• Precision, Recall & F-Measure
• Sensitivity & Specificity
• For more information on these, see uisp09-Evaluation.ppt
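For a two-class problem, the measures listed above all follow from the four cells of the confusion matrix. The Python sketch below uses the standard definitions; the function names and the example predictions are hypothetical, and the code assumes none of the denominators is zero.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def performance_measures(tp, fp, tn, fn):
    """Standard measures derived from a 2x2 confusion matrix.
    Assumes every denominator below is non-zero."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # same as TP rate and sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "tp_rate": recall,
        "fp_rate": fp / (fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
    }


# Hypothetical test-set labels and predictions.
actual = ["yes"] * 4 + ["no"] * 6
predicted = ["yes", "yes", "yes", "no", "yes", "no", "no", "no", "no", "no"]

counts = confusion_counts(actual, predicted, positive="yes")
measures = performance_measures(*counts)
print(counts)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that specificity is 1 minus the false positive rate, so the two always carry the same information.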
Decision Tree Pruning
• Overcome over-fitting
• Pre-pruning and post-pruning
• Reduced error pruning
• Subtree raising with different confidence
• Comparing tree size and accuracy
Subtree Replacement
• Bottom-up: a tree is considered for replacement once all its subtrees have been considered
Subtree Raising
• Deletes a node and redistributes its instances
• Slower than subtree replacement
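The bottom-up order of subtree replacement can be illustrated with a small reduced-error pruning sketch in Python. This is not J48's code: the nested-dictionary tree layout, the `majority` field (assumed to hold the majority training class at each node), and the toy tree and pruning set are all assumptions made for the example. A subtree is replaced by a leaf whenever the leaf makes no more errors on a held-out pruning set.

```python
def predict(tree, example):
    """Follow attribute tests down to a leaf; fall back to the node's
    majority class if the example's value has no matching branch."""
    while "label" not in tree:
        child = tree["children"].get(example[tree["attr"]])
        if child is None:
            return tree["majority"]
        tree = child
    return tree["label"]


def errors(tree, examples):
    """Number of examples the tree misclassifies."""
    return sum(1 for ex in examples if predict(tree, ex) != ex["class"])


def prune(tree, examples):
    """Reduced-error pruning by subtree replacement, bottom-up:
    children are pruned first, then the node itself is considered."""
    if "label" in tree:
        return tree
    attr = tree["attr"]
    pruned_children = {
        value: prune(child, [ex for ex in examples if ex[attr] == value])
        for value, child in tree["children"].items()
    }
    subtree = {"attr": attr, "majority": tree["majority"],
               "children": pruned_children}
    leaf = {"label": tree["majority"]}
    # Replace the subtree when a single leaf is at least as accurate.
    if errors(leaf, examples) <= errors(subtree, examples):
        return leaf
    return subtree


# Hypothetical over-fitted tree: the "windy" test under "sunny" is spurious.
tree = {
    "attr": "outlook", "majority": "yes",
    "children": {
        "rainy": {"label": "yes"},
        "sunny": {"attr": "windy", "majority": "no",
                  "children": {"true": {"label": "no"},
                               "false": {"label": "yes"}}},
    },
}
# Hypothetical held-out pruning set.
pruning_set = [
    {"outlook": "sunny", "windy": "false", "class": "no"},
    {"outlook": "sunny", "windy": "true", "class": "no"},
    {"outlook": "sunny", "windy": "false", "class": "no"},
    {"outlook": "rainy", "windy": "false", "class": "yes"},
    {"outlook": "rainy", "windy": "true", "class": "yes"},
]

pruned = prune(tree, pruning_set)
print(errors(tree, pruning_set), errors(pruned, pruning_set))
print(pruned)
```

In this toy case the `windy` subtree collapses into a single "no" leaf, so the pruned tree is both smaller and more accurate on the pruning set, which is the size-versus-accuracy comparison mentioned above.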
Exercise 3
http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex3.html