Post on 04-Jun-2018
8/13/2019 Appendix Weka
Appendix: The WEKA Data Mining Software
http://www.cs.waikato.ac.nz/ml/weka/
WEKA: Introduction
WEKA (Waikato Environment for Knowledge Analysis) was developed at the University of Waikato, New Zealand.
History: 1st version (version 2.1), 1996; version 2.3, 1998; version 3.0, 1999; version 3.4, 2003; version 3.6, 2008.
WEKA provides a collection of data mining and machine learning algorithms and preprocessing tools. It includes algorithms for regression, classification, clustering, association rule mining and attribute selection.
It also has data visualization facilities.
WEKA is an environment for comparing learning algorithms.
With WEKA, researchers can implement new data mining algorithms and add them to WEKA.
WEKA is one of the best-known open-source data mining packages.
WEKA: Introduction
WEKA is written in Java. WEKA 3.4 consists of 271,477 lines of code; WEKA 3.6 consists of 509,903 lines of code.
It runs on Windows, Linux and Macintosh.
Users can access its components through Java programming or through a command-line interface.
It provides three main graphical user interfaces: Explorer, Experimenter and Knowledge Flow.
The easiest way to use WEKA is through the Explorer, the main graphical user interface.
Data can be loaded from various sources, including files, URLs and databases. Database access is provided through Java Database Connectivity (JDBC).
WEKA data format
WEKA stores data in flat files in ARFF (Attribute-Relation File Format).
It is easy to transform an Excel file to ARFF format.
An ARFF file consists of a list of instances.
An ARFF file can be created with a text editor such as Notepad (or Word, saving as plain text).
The name of the dataset is given with @relation.
Attribute information is given with @attribute.
The data follows @data.
Besides ARFF, WEKA also accepts the CSV, LibSVM and C4.5 formats.
WEKA ARFF format
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny, 85, 85, FALSE, no
sunny, 80, 90, TRUE, no
overcast, 83, 86, FALSE, yes
rainy, 70, 96, FALSE, yes
rainy, 68, 80, FALSE, yes
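The file above can also be generated programmatically. Below is a minimal sketch in plain Java with no WEKA dependency; the class name `ArffSketch` and the `buildArff` helper are illustrative, not part of WEKA's API.

```java
// Minimal sketch: assembling the weather dataset's ARFF text with plain Java.
// An ARFF document is @relation, then @attribute declarations, then @data rows.
public class ArffSketch {
    public static String buildArff() {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation weather\n\n");
        sb.append("@attribute outlook {sunny, overcast, rainy}\n");
        sb.append("@attribute temperature real\n");
        sb.append("@attribute humidity real\n");
        sb.append("@attribute windy {TRUE, FALSE}\n");
        sb.append("@attribute play {yes, no}\n\n");
        sb.append("@data\n");
        String[] rows = {
            "sunny, 85, 85, FALSE, no",
            "sunny, 80, 90, TRUE, no",
            "overcast, 83, 86, FALSE, yes",
            "rainy, 70, 96, FALSE, yes",
            "rainy, 68, 80, FALSE, yes"
        };
        for (String r : rows) sb.append(r).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildArff());
    }
}
```

Writing the returned string to a file with a .arff extension yields a dataset the Explorer can load directly.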
Explorer GUI
Consists of six panels, one for each data mining task:
Preprocess
Classify
Cluster
Associate
Select Attributes
Visualize
Preprocess: to use WEKA's data preprocessing tools (called filters) to transform the dataset in several ways.
WEKA contains filters for discretization, normalization, resampling, attribute selection, and transforming and combining attributes.
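As a concrete illustration of one such filter, here is a standalone Java sketch of min-max normalization, the rescaling of a numeric attribute into [0, 1] that WEKA's unsupervised Normalize filter performs by default; the class and method names here are made up for this example, not WEKA API.

```java
public class NormalizeSketch {
    // Min-max normalization: map each value of a numeric attribute into [0, 1].
    public static double[] normalize(double[] values) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) { min = Math.min(min, v); max = Math.max(max, v); }
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++)
            out[i] = range == 0 ? 0 : (values[i] - min) / range; // constant column maps to 0
        return out;
    }

    public static void main(String[] args) {
        // The temperature column of the weather data.
        double[] temps = {85, 80, 83, 70, 68};
        System.out.println(java.util.Arrays.toString(normalize(temps)));
    }
}
```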
Explorer (cont.)
Classify:
Regression techniques (predictors of continuous classes):
Linear regression
Logistic regression
Neural network
Support vector machine
Classification algorithms:
Decision trees: ID3, C4.5 (called J48 in WEKA)
Naïve Bayes, Bayes network
k-nearest neighbors
Rule learners: Ripper, Prism
Lazy rule learners
Meta learners (bagging, boosting)
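To make one entry in the list concrete, here is a minimal standalone sketch of the k-nearest-neighbors idea (implemented in WEKA as IBk). It handles only numeric attributes, and all names are illustrative rather than WEKA API.

```java
import java.util.*;

public class KnnSketch {
    // Classify a query point by majority vote among the k closest training points.
    public static String classify(double[][] train, String[] labels, double[] query, int k) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Sort training indices by Euclidean distance to the query.
        Arrays.sort(idx, Comparator.comparingDouble(i -> dist(train[i], query)));
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < k; i++) votes.merge(labels[idx[i]], 1, Integer::sum);
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    private static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        // (temperature, humidity) pairs from the weather data, labeled by "play".
        double[][] x = {{85, 85}, {80, 90}, {83, 86}, {70, 96}, {68, 80}};
        String[] y = {"no", "no", "yes", "yes", "yes"};
        System.out.println(classify(x, y, new double[]{69, 82}, 3));
    }
}
```

An odd k avoids ties in two-class problems; WEKA's IBk additionally supports distance weighting and attribute normalization.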
Clustering
Clustering algorithms:
K-Means, X-Means, FarthestFirst
Likelihood-based clustering: EM (Expectation-Maximization)
Cobweb (an incremental clustering algorithm)
Clusters can be visualized and compared to true clusters (if given).
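The k-means loop behind WEKA's SimpleKMeans can be sketched in a few lines; this one-dimensional, fixed-iteration version is a simplification (no convergence test, no random restarts), and its names are illustrative.

```java
import java.util.*;

public class KMeansSketch {
    // One-dimensional k-means: alternate assignment and mean-update steps.
    public static double[] cluster(double[] data, double[] centers, int iters) {
        double[] c = centers.clone();
        for (int it = 0; it < iters; it++) {
            double[] sum = new double[c.length];
            int[] cnt = new int[c.length];
            for (double x : data) {
                int best = 0; // assign x to its nearest center
                for (int j = 1; j < c.length; j++)
                    if (Math.abs(x - c[j]) < Math.abs(x - c[best])) best = j;
                sum[best] += x; cnt[best]++;
            }
            for (int j = 0; j < c.length; j++)
                if (cnt[j] > 0) c[j] = sum[j] / cnt[j]; // move center to cluster mean
        }
        return c;
    }

    public static void main(String[] args) {
        // Temperatures from the weather data split into two clusters.
        double[] temps = {85, 80, 83, 70, 68};
        System.out.println(Arrays.toString(cluster(temps, new double[]{60, 90}, 10)));
    }
}
```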
Attribute Selection: provides access to various methods for measuring the utility of attributes and identifying the most important attributes in a dataset.
Filter method: the attribute set is filtered to produce the most promising subset before learning begins.
A wide range of filtering criteria, including correlation-based feature selection, the chi-square statistic, gain ratio, information gain, and a support-vector-machine-based criterion.
A variety of search methods: forward and backward selection, best-first search, genetic search and random search.
PCA (principal component analysis) to reduce the dimensionality of a problem.
Discretizing numeric attributes.
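The information gain criterion mentioned above is easy to compute directly: the entropy of the class distribution minus the entropy remaining after splitting on the attribute. Below is a standalone Java sketch (class and method names are illustrative); on the weather sample's outlook attribute it yields about 0.97 bits, since outlook separates the classes perfectly there.

```java
import java.util.*;

public class InfoGainSketch {
    // Entropy of a label distribution, in bits.
    static double entropy(Collection<Integer> counts) {
        double total = 0, h = 0;
        for (int c : counts) total += c;
        for (int c : counts) {
            if (c == 0) continue;
            double p = c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Information gain of a nominal attribute with respect to the class labels.
    public static double infoGain(String[] attr, String[] labels) {
        Map<String, Integer> classCounts = new HashMap<>();
        for (String y : labels) classCounts.merge(y, 1, Integer::sum);
        double h = entropy(classCounts.values()); // entropy before the split
        // Class counts within each attribute value.
        Map<String, Map<String, Integer>> byValue = new HashMap<>();
        for (int i = 0; i < attr.length; i++)
            byValue.computeIfAbsent(attr[i], k -> new HashMap<>())
                   .merge(labels[i], 1, Integer::sum);
        double cond = 0; // weighted entropy after the split
        for (Map<String, Integer> sub : byValue.values()) {
            int n = sub.values().stream().mapToInt(Integer::intValue).sum();
            cond += (double) n / attr.length * entropy(sub.values());
        }
        return h - cond;
    }

    public static void main(String[] args) {
        String[] outlook = {"sunny", "sunny", "overcast", "rainy", "rainy"};
        String[] play = {"no", "no", "yes", "yes", "yes"};
        System.out.println(infoGain(outlook, play));
    }
}
```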
Explorer (cont.)
Association rule mining: the Apriori algorithm
Works only with discrete data
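The core of Apriori is counting the support of candidate itemsets. This much-simplified standalone Java sketch covers only the first two levels (single items and pairs) and omits rule generation; all names are illustrative rather than WEKA API.

```java
import java.util.*;

public class AprioriSketch {
    // Count how many transactions contain every item in the candidate set.
    static int support(List<Set<String>> tx, Set<String> items) {
        int n = 0;
        for (Set<String> t : tx) if (t.containsAll(items)) n++;
        return n;
    }

    // Frequent item pairs at a minimum support count: find frequent single
    // items first, then (Apriori pruning) only combine those into pairs.
    public static List<Set<String>> frequentPairs(List<Set<String>> tx, int minSupport) {
        Set<String> frequentItems = new TreeSet<>();
        for (Set<String> t : tx)
            for (String item : t)
                if (support(tx, Set.of(item)) >= minSupport) frequentItems.add(item);
        List<String> items = new ArrayList<>(frequentItems);
        List<Set<String>> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i++)
            for (int j = i + 1; j < items.size(); j++) {
                Set<String> pair = new TreeSet<>(List.of(items.get(i), items.get(j)));
                if (support(tx, pair) >= minSupport) result.add(pair);
            }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> tx = List.of(
            Set.of("bread", "milk"),
            Set.of("bread", "milk", "eggs"),
            Set.of("bread", "eggs"),
            Set.of("milk", "eggs"));
        System.out.println(frequentPairs(tx, 2));
    }
}
```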
Visualization
Scatter plots, ROC curves, trees, graphs
WEKA can visualize single attributes (1-d) and pairs of attributes (2-d).
Color-coded class values.
Zoom-in function.
[Screenshot: Explorer GUI, Classify panel]
WEKA Experimenter
This interface is designed to facilitate experimental comparisons of the performance of algorithms based on many different evaluation criteria.
Experiments can involve many algorithms that are run on multiple datasets.
Can also iterate over different parameter settings.
Experiments can also be distributed across different computer nodes in a network.
Once an experiment has been set up, it can be saved in either XML or binary form, so that it can be revisited.
Knowledge Flow Interface
The Explorer is designed for batch-based data processing: training data is loaded into memory and then processed.
However, WEKA has implemented some incremental algorithms.
The Knowledge Flow interface can handle incremental updates. It can load and preprocess individual instances before feeding them into incremental learning algorithms.
Knowledge Flow also provides nodes for visualization and evaluation.
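The incremental-update style used by WEKA's updateable learners (e.g. NaiveBayesUpdateable) can be sketched with a toy nearest-centroid classifier whose per-class means are refreshed one instance at a time, with no batch pass over the data; the class and method names here are invented for this example.

```java
import java.util.*;

public class IncrementalSketch {
    private final Map<String, double[]> sum = new HashMap<>();
    private final Map<String, Integer> count = new HashMap<>();

    // Fold a single labeled instance into the running per-class totals.
    public void update(double[] x, String label) {
        double[] s = sum.computeIfAbsent(label, k -> new double[x.length]);
        for (int i = 0; i < x.length; i++) s[i] += x[i];
        count.merge(label, 1, Integer::sum);
    }

    // Predict the class whose running mean is closest to the query.
    public String predict(double[] x) {
        String best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (String label : sum.keySet()) {
            double d = 0;
            int n = count.get(label);
            for (int i = 0; i < x.length; i++) {
                double diff = x[i] - sum.get(label)[i] / n;
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        IncrementalSketch model = new IncrementalSketch();
        // Instances arrive one at a time, as in a Knowledge Flow stream.
        model.update(new double[]{85, 85}, "no");
        model.update(new double[]{80, 90}, "no");
        model.update(new double[]{70, 96}, "yes");
        model.update(new double[]{68, 80}, "yes");
        System.out.println(model.predict(new double[]{69, 85}));
    }
}
```

Because each update touches only the running totals, memory use is independent of the number of instances seen, which is the point of incremental learning.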
Conclusions
Compared to R, WEKA is weaker in classical statistics but stronger in machine learning (data mining) algorithms.
WEKA has developed a set of extensions covering diverse areas, such as text mining, visualization and bioinformatics.
WEKA 3.6 includes support for importing PMML (Predictive Model Markup Language) models. PMML is an XML-based standard for expressing statistical and data mining models.
WEKA 3.6 can read and write data in the format used by the well-known LibSVM and SVM-Light support vector machine implementations.
WEKA has two limitations:
Most of the algorithms require all the data to be stored in main memory, which restricts application to small or medium-sized datasets.
The Java implementation is somewhat slower than an equivalent in C/C++.