Weka: a useful tool in data mining and machine learning

Team 5: Noha Elsherbiny, Huijun Xiong, and Bhanu Peddi

What does it really mean to mine data?

• Data mining is an experimental science.

• Data mining finds valuable information hidden in large volumes of data.

• Data mining is the analysis of data and the use of software techniques for finding patterns and regularities in sets of data.

• Data mining encompasses varied fields, including:
– Databases
– Statistics
– Machine Learning
– High Performance Computing
– Visualization
– Mathematics

How does WEKA come into play?

• "Drowning in Data yet Starving for Knowledge"

• No single machine learning scheme is suitable for all data mining problems.

• WEKA (Waikato Environment for Knowledge Analysis)

What is in WEKA that makes it special?

• It provides many different algorithms for data mining and machine learning.

• It is open source and freely available.

• It is platform-independent.

• It is easily usable by people who are not data mining specialists.

• It provides flexible facilities for scripting experiments.

• It has kept up to date, with new algorithms being added as they appear in the research literature.

How does one implement WEKA, then?

• Apply a learning method to a dataset and analyze its output to learn more about the data.

• Use learned models to generate predictions on new instances.

• Apply several different learners and compare their performance in order to choose the best one for prediction.
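
The three uses above can be sketched in a few lines of plain Python. This is not WEKA code and does not call the WEKA API. It is a minimal illustration that assumes a made-up six-row dataset (DATA), two hand-written learners (a majority-class baseline and a 1-nearest-neighbour learner), and leave-one-out accuracy as the comparison measure. All names in it are hypothetical.

```python
from collections import Counter

# Hypothetical toy dataset: (feature vector, class label) pairs.
DATA = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((3.0, 3.0), "b"),
    ((3.2, 2.9), "b"),
    ((2.8, 3.1), "b"),
]


def majority_learner(train):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbour_learner(train):
    """1-nearest-neighbour using squared Euclidean distance."""
    def predict(x):
        def dist(row):
            return sum((a - b) ** 2 for a, b in zip(row[0], x))
        return min(train, key=dist)[1]
    return predict


def leave_one_out_accuracy(learner, data):
    """Train on all rows but one, test on the held-out row, repeat."""
    correct = 0
    for i, (x, y) in enumerate(data):
        model = learner(data[:i] + data[i + 1:])
        correct += model(x) == y
    return correct / len(data)


def compare(learners, data):
    """Return ({name: accuracy}, name of the most accurate learner)."""
    scores = {name: leave_one_out_accuracy(fn, data)
              for name, fn in learners.items()}
    best = max(scores, key=scores.get)
    return scores, best


LEARNERS = {"majority": majority_learner, "1-nn": nearest_neighbour_learner}

# Compare the learners, then use the best one on a new instance.
scores, best = compare(LEARNERS, DATA)
model = LEARNERS[best](DATA)
prediction = model((3.1, 3.0))
print(scores, "best:", best, "prediction:", prediction)
```

On this toy data the nearest-neighbour learner classifies every held-out row correctly, while the majority baseline gets none right, because removing a row always leaves its class in the minority. In WEKA the same comparison would be run with its own classifiers and evaluation tools rather than hand-written ones.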

How do you actually use it?

• All algorithms take their input in the form of a single relational table in the ARFF format.

• The learning methods are called classifiers.
– weka.classifiers.lazy.IBk: k-nearest neighbour learner
– weka.classifiers.trees.J48: decision trees
– weka.classifiers.bayes.NaiveBayes: Naive Bayes with/without kernels
– weka.classifiers.functions.SMO: support vector machines

• There are also pre-processing tools, called filters
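
To make the ARFF input format concrete, here is a small Python sketch that writes a relational table as ARFF text: an @relation line, one @attribute line per column (numeric, or a set of nominal values in braces), then @data followed by comma-separated rows. The helper name to_arff is hypothetical, and the sketch assumes that names and values contain no spaces, commas, or quotes, so it does no escaping. The rows are a few illustrative lines in the style of the weather sample data.

```python
def to_arff(relation, attributes, rows):
    """Render a table as ARFF text.

    attributes: list of (name, kind), where kind is the string "numeric"
    or a list of nominal values. rows: value tuples in attribute order.
    Assumes no value needs quoting (no spaces or commas).
    """
    lines = ["@relation %s" % relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            # Numeric (or other built-in) attribute type.
            lines.append("@attribute %s %s" % (name, kind))
        else:
            # Nominal attribute: list the allowed values in braces.
            lines.append("@attribute %s {%s}" % (name, ",".join(kind)))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


ATTRIBUTES = [
    ("outlook", ["sunny", "overcast", "rainy"]),
    ("temperature", "numeric"),
    ("humidity", "numeric"),
    ("windy", ["TRUE", "FALSE"]),
    ("play", ["yes", "no"]),
]
ROWS = [
    ("sunny", 85, 85, "FALSE", "no"),
    ("sunny", 80, 90, "TRUE", "no"),
    ("overcast", 83, 86, "FALSE", "yes"),
]

text = to_arff("weather", ATTRIBUTES, ROWS)
print(text)
```

Saved to a file such as weather.arff (a hypothetical name), text like this can be opened in WEKA. By default WEKA treats the last attribute, here play, as the class to predict.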

Show us how to use it!

Decision Trees on Weka

Naïve Bayes on Weka

Support Vector Machines on Weka
