Machine Learning for Data Mining 1
Department of Computer Science, University of Waikato, New Zealand
Eibe Frank
WEKA: A Machine Learning Toolkit
The Explorer
• Classification and Regression
• Clustering
• Association Rules
• Attribute Selection
• Data Visualization
The Experimenter
The Knowledge Flow GUI
Conclusions
Machine Learning with WEKA
4/13/2012 University of Waikato 2
WEKA: the bird
Copyright: Martin Kramer
WEKA: the software
Machine learning/data mining software written in Java (distributed under the GNU General Public License)
Used for research, education, and applications
Complements “Data Mining” by Witten & Frank
Main features:
• Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
• Graphical user interfaces (incl. data visualization)
• Environment for comparing learning algorithms
WEKA: versions
There are several versions of WEKA:
• WEKA 3.0: “book version” compatible with description in data mining book
• WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
• WEKA 3.3: “development version” with lots of improvements
This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
WEKA only deals with “flat” files
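To make the flat-file layout concrete, here is a minimal Python sketch of a reader for the ARFF subset shown above (numeric and nominal attributes, "?" for missing values). It is an illustration only, not WEKA's own loader: the function name parse_arff and the SAMPLE string are made up for this example, and features such as quoted values, sparse data, string and date attributes are not handled.

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Hypothetical helper for illustration; covers only numeric and nominal
    attributes and comma-separated data lines.
    """
    relation = None
    attributes = []   # (name, kind); kind is "numeric" or a list of nominal labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == "?":               # "?" marks a missing value
                    row.append(None)
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):           # nominal: { label, label, ... }
                labels = [s.strip() for s in spec.strip("{}").split(",")]
                attributes.append((name, labels))
            else:
                attributes.append((name, spec.lower()))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), "attributes,", len(rows), "instances")
```

Every instance is one row with one value per declared attribute, which is what "flat" means here: no nested or relational structure.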
Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for: discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
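As a rough idea of what two of these filters do to a numeric attribute, the Python sketch below rescales values to [0, 1] and bins them into equal-width intervals. The function names normalize and discretize are hypothetical, the cholesterol values are taken from the ARFF example, and WEKA's actual filters offer more options (for instance supervised discretization) than this sketch assumes.

```python
def normalize(values):
    """Rescale numbers linearly to [0, 1]; None (missing) is passed through."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    result = []
    for v in values:
        if v is None:
            result.append(None)
        elif span == 0:            # constant attribute: map everything to 0.0
            result.append(0.0)
        else:
            result.append((v - lo) / span)
    return result


def discretize(values, bins):
    """Equal-width binning: map each number to a bin index 0..bins-1."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    result = []
    for v in values:
        if v is None:
            result.append(None)
        elif width == 0:
            result.append(0)
        else:
            # The maximum value would land in bin `bins`; clamp it into the last bin.
            result.append(min(int((v - lo) / width), bins - 1))
    return result


cholesterol = [233.0, 286.0, 229.0, None]   # last value is missing ("?")
print(normalize(cholesterol))
print(discretize(cholesterol, bins=3))
```

Missing values are left untouched here; replacing them would be the job of a separate filter.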
Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include: decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include: bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
Explorer: clustering data
• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation based on log-likelihood if clustering scheme produces a probability distribution
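For readers unfamiliar with k-means, the following Python sketch shows the basic assign-and-update loop on numeric points. It is not WEKA's implementation: the function name k_means is made up, and the initial centroids are simply the first k points so that the run is deterministic, which is an assumption of this example rather than how a real clusterer would normally be seeded.

```python
def k_means(points, k, iterations=100):
    """Plain k-means with squared Euclidean distance.

    Returns (centroids, assignment), where assignment[i] is the cluster
    index of points[i].
    """
    centroids = [list(p) for p in points[:k]]   # simplistic, deterministic init
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignment == assignment:        # converged: nothing moved
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                         # keep old centroid if cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
centroids, assignment = k_means(points, k=2)
print(assignment)
print(centroids)
```

If "true" cluster labels are available, the returned assignment can be compared against them, which is the kind of comparison the Explorer offers.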
Explorer: finding associations
• WEKA contains an implementation of the Apriori algorithm for learning association rules
• Works only with discrete data
• Can identify statistical dependencies between groups of attributes: milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence
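Support and confidence are easier to follow with a small worked example. The Python sketch below does a level-wise (Apriori-style) search for frequent itemsets and then derives rules from them. The function names, the toy baskets and the thresholds are all invented for illustration; it is a teaching sketch, not WEKA's Apriori class, and it treats the confidence threshold as "at least" rather than "strictly exceeds".

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune every
        # candidate that has an infrequent k-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(s) in level for s in combinations(union, len(a))
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for qualifying rules."""
    out = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # confidence = support(lhs and rhs together) / support(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    out.append((lhs, itemset - lhs, support, confidence))
    return out


baskets = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]
frequent = frequent_itemsets(baskets, min_support=2)
found = rules(frequent, min_confidence=0.6)
for lhs, rhs, support, confidence in found:
    if lhs == {"milk", "butter"} and rhs == {"bread", "eggs"}:
        print(sorted(lhs), "=>", sorted(rhs), support, round(confidence, 3))
```

In this toy data, milk and butter occur together in three baskets and all four items occur together in two, so the rule milk, butter ⇒ bread, eggs has support 2 and confidence 2/3.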
Explorer: attribute selection
• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  • A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  • An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
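One such combination is "ranking" as the search method with information gain as the evaluation method. The Python sketch below shows that pairing on nominal attributes only. The function names are hypothetical, the four rows are the nominal columns of the heart-disease instances from the ARFF example, and the code is a simplified illustration rather than WEKA's attribute selection classes.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Class entropy minus the expected class entropy after splitting on one attribute."""
    labels = [r[class_index] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, names):
    """Evaluate every attribute on its own and sort by score, best first."""
    scores = [(information_gain(rows, i), name) for i, name in enumerate(names)]
    return sorted(scores, reverse=True)


# Columns: sex, chest_pain_type, exercise_induced_angina, class
rows = [
    ("male", "typ_angina", "no", "not_present"),
    ("male", "asympt", "yes", "present"),
    ("male", "asympt", "yes", "present"),
    ("female", "non_anginal", "no", "not_present"),
]
names = ["sex", "chest_pain_type", "exercise_induced_angina"]
ranked = rank_attributes(rows, names)
for score, name in ranked:
    print(f"{score:.3f}  {name}")
```

With only four instances the scores mean little; the point is the separation of concerns, where the evaluator scores an attribute and the search method decides which attributes get scored and in what order.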
Explorer: data visualization
• Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  • To do: rotating 3-d visualizations (Xgobi-style)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function
Performing experiments
• Experimenter makes it easy to compare the performance of different learning schemes
• For classification and regression problems
• Results can be written into file or database
• Evaluation options: cross-validation, learning curve, hold-out
• Can also iterate over different parameter settings
• Significance-testing built in!
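The two ideas behind such a comparison, splitting the data into folds and testing whether per-fold score differences are significant, can be sketched in a few lines of Python. Everything here is illustrative: the function names are made up, the accuracy figures are invented numbers rather than results from any experiment, and the statistic is the textbook paired t statistic, which may differ from the test the Experimenter actually applies.

```python
from math import sqrt
from statistics import mean, stdev


def k_fold_indices(n, k):
    """Split range(n) into k (train, test) index lists with near-equal test folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the per-fold differences (paired t-test, len - 1 degrees of freedom)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


splits = k_fold_indices(10, 5)
print(splits[0])

# Invented per-fold accuracies of two hypothetical schemes on the same folds.
scheme_a = [0.80, 0.82, 0.78, 0.85, 0.81]
scheme_b = [0.75, 0.80, 0.76, 0.79, 0.78]
t = paired_t_statistic(scheme_a, scheme_b)
print(round(t, 2))
```

The resulting t value would then be compared with the critical value of the t distribution for the chosen significance level. Both schemes must be evaluated on the same folds for the pairing to be valid.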
The Knowledge Flow GUI
• New graphical user interface for WEKA
• Java-Beans-based interface for setting up and running machine learning experiments
• Data sources, classifiers, etc. are beans and can be connected graphically
• Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
• Layouts can be saved and loaded again later
Conclusion: try it yourself!
WEKA is available at
http://www.cs.waikato.ac.nz/ml/weka
Also has a list of projects based on WEKA
WEKA contributors:
Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang