Lecture 4: The Weka Package
Marina Santini, Uppsala University
Department of Linguistics and Philology, September 2013
Lec 4: The Weka Package
Machine Learning for Language Technology
Outline
Re: Witten & Frank (2005)
Introduction to Weka (Ch. 9)
Getting Started: The Explorer (Ch. 10)
The basic methods (4.3, 4.6, 4.7)
Implementations (6.1, 6.3, 6.4)
Evaluation (5.1-5.6)
Assignment 1
Introduction: What is Weka?
WEKA: Waikato Environment for Knowledge Analysis
Weka: the name of a flightless bird living in New Zealand
The Weka workbench is a collection of state-of-the-art machine learning algorithms and data preprocessing tools;
Open source code (GNU General Public License) written in Java
http://www.cs.waikato.ac.nz/ml/weka/downloading.html
The interface: The Explorer
Uploading the input (ARFF format);
Preprocessing
Building a classifier;
Tuning the parameters;
Examining the output (evaluation)
Uploading the input (2nd_set_7webgenres.arff)
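To make the ARFF input format concrete: an ARFF file is plain text with a header (an @relation line and one @attribute line per attribute) followed by an @data section with one comma-separated instance per line. The sketch below parses a tiny ARFF string in Python. The relation name, attribute names and values are invented for illustration and are not taken from 2nd_set_7webgenres.arff; the parser is a simplification that assumes dense rows, unquoted values and attribute names without spaces.

```python
# Hypothetical ARFF content; relation and attribute names are illustrative only.
SAMPLE_ARFF = """\
% Lines starting with % are comments
@relation toy_genres
@attribute word_count numeric
@attribute has_links {yes,no}
@attribute genre {blog,faq}
@data
120,yes,blog
45,no,faq
"""


def parse_arff(text):
    """Parse a simple dense ARFF string into (relation, attributes, rows)."""
    relation = None
    attributes = []  # (name, type) pairs; a nominal type becomes a list of values
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            # After @data, every line is one instance.
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()  # header keywords are case-insensitive
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, kind = rest.strip().partition(" ")
            kind = kind.strip()
            if kind.startswith("{") and kind.endswith("}"):
                kind = [v.strip() for v in kind[1:-1].split(",")]
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
```

Values are kept as strings here; by convention the last attribute (genre in this toy file) is the one a classifier is usually asked to predict.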
Preprocessing
Building a classifier
Methods & Implementations
Decision Trees
J4.8 is Weka’s implementation of C4.5 revision 8.
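C4.5 picks the attribute to split on by gain ratio: the information gain of the split divided by the entropy of the attribute's own value distribution. The Python sketch below computes that quantity for a single nominal attribute. It is only an illustration of the criterion, not Weka's code, and it leaves out what a full decision-tree learner also needs (numeric attributes, missing values, pruning). The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of symbols."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` on a nominal attribute `values`."""
    total = len(labels)
    # Group the class labels by attribute value.
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    # Expected entropy of the class after the split.
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's value distribution.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data: the first attribute separates the classes perfectly,
# the second one tells us nothing about the class.
classes = ["x", "x", "y", "y"]
informative = gain_ratio(["a", "a", "b", "b"], classes)
uninformative = gain_ratio(["a", "b", "a", "b"], classes)
```

On this toy data the informative attribute gets the highest possible ratio and the uninformative one gets zero, so a tree learner using this criterion would split on the first attribute.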
Instance-Based Learning
IBk is a k-nearest-neighbor classifier that uses the Euclidean distance by default; other options include Manhattan, Chebyshev and Minkowski distances. The number of nearest neighbors (default k=1) can be specified explicitly in the parameter window.
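The distance functions mentioned for IBk, and the effect of k, can be sketched in a few lines of Python. This is a minimal illustration of k-nearest-neighbor classification with a majority vote, not IBk itself. The function names and the toy training points are assumptions made for the example.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p between two numeric vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a, b):
    return minkowski(a, b, 2)  # Minkowski with p = 2


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))  # Minkowski with p = 1


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))  # largest coordinate difference


def knn_predict(train, query, k=1, distance=euclidean):
    """Return the majority label among the k training items closest to query.

    `train` is a list of (vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy training set (illustrative values only).
train = [((0, 0), "a"), ((1, 0), "a"), ((5, 5), "b"), ((6, 5), "b")]
near_a = knn_predict(train, (0.5, 0.2))                            # k=1, Euclidean
near_b = knn_predict(train, (4, 4), k=3, distance=manhattan)
```

With k=1 the prediction is simply the label of the single closest training instance; larger k smooths the decision by letting several neighbors vote.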
Linear Models
In VotedPerceptron, each weight vector contributes a certain number of votes.
SMO implements the sequential minimal optimization algorithm for training a support vector classifier (SVM), using polynomial or Gaussian kernels (Platt 1998, Keerthi et al. 2001).
Logistic builds linear logistic regression models.
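The voting idea behind the voted perceptron can be sketched as follows: every weight vector produced during training is kept together with the number of examples it classified correctly before the next mistake, and that count is its number of votes at prediction time. The Python code below is a minimal sketch under simplifying assumptions (labels are +1 or -1, no kernel, and the bias is handled by a constant 1.0 appended to each feature vector). It is not Weka's VotedPerceptron class, and the function names and data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sign(value):
    # Zero is mapped to -1 so that a prediction is always made.
    return 1 if value > 0 else -1


def train_voted_perceptron(examples, epochs=5):
    """Return a list of (weight_vector, votes) pairs.

    `examples` is a list of (features, label) pairs with label in {+1, -1}.
    """
    weights = [0.0] * len(examples[0][0])
    votes = 0
    voters = []
    for _ in range(epochs):
        for features, label in examples:
            if label * dot(weights, features) <= 0:
                # Mistake: retire the current vector with the votes it earned,
                # then apply the usual perceptron update.
                if votes > 0:
                    voters.append((weights, votes))
                weights = [w + label * x for w, x in zip(weights, features)]
                votes = 1
            else:
                votes += 1  # the current vector survives one more example
    voters.append((weights, votes))
    return voters


def predict(voters, features):
    """Weighted vote over the predictions of all stored weight vectors."""
    total = sum(votes * sign(dot(weights, features))
                for weights, votes in voters)
    return sign(total)


# Toy, linearly separable data; the last component is the constant bias input.
examples = [
    ((2.0, 1.0, 1.0), 1),
    ((1.5, 2.0, 1.0), 1),
    ((-1.0, -1.5, 1.0), -1),
    ((-2.0, -0.5, 1.0), -1),
]
voters = train_voted_perceptron(examples)
```

Weight vectors that survive for a long time get more say in the final decision than vectors that were replaced after a single example.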
Tuning Parameters
Evaluation
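One common way to estimate a classifier's accuracy is k-fold cross-validation: the data is split into k folds, and each fold in turn is held out for testing while the model is trained on the rest. The Python sketch below shows the procedure with a trivial majority-class baseline as the model. It is an illustration only: the folds are neither shuffled nor stratified, and all function names and data are assumptions made for the example.

```python
from collections import Counter


def k_fold_indices(n_items, k):
    """Split range(n_items) into k folds of near-equal size (no shuffling)."""
    return [list(range(i, n_items, k)) for i in range(k)]


def cross_validate(instances, labels, k, train_fn, predict_fn):
    """Return the accuracy over all instances, each predicted while held out."""
    correct = 0
    for test_idx in k_fold_indices(len(instances), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(instances) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        model = train_fn(train_x, train_y)
        correct += sum(predict_fn(model, instances[i]) == labels[i]
                       for i in test_idx)
    return correct / len(instances)


def train_majority(train_x, train_y):
    """Baseline 'model': just remember the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]


def predict_majority(model, instance):
    return model  # always predict the remembered label


# Toy data: 8 instances of class "a" and 2 of class "b".
instances = list(range(10))
labels = ["a"] * 8 + ["b"] * 2
accuracy = cross_validate(instances, labels, 5, train_majority, predict_majority)
```

A baseline like this gives a reference point: a real classifier is only interesting if its cross-validated accuracy is clearly above what always guessing the majority class achieves.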
Compare Results
Assignment 1
Classification: Decision Trees, Nearest Neighbors and a linear classifier of your choice;
Software package: Weka;
Data sets: German plural, English past tense
Send WRITTEN REPORT to: [email protected]
Report deadline Fri 4 Oct 2013, week 40.
Thank you and Good Luck!