Slides for “Data Mining” by I. H. Witten and E. Frank.


From Naïve Bayes to Bayesian Network Classifiers


Naïve Bayes

Assumption: attributes conditionally independent given the class

Doesn’t hold in practice, but classification accuracy is often high nonetheless

However: performance is sometimes much worse than that of, e.g., a decision tree

Can we eliminate the assumption?
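
For reference, the assumption in symbols (standard naïve Bayes notation; not reproduced from the slides):

```latex
% Naive Bayes: attributes a_1,...,a_n assumed independent given the class c,
% so the class posterior factorizes into one term per attribute.
\Pr[c \mid a_1, \ldots, a_n] \;\propto\; \Pr[c] \prod_{i=1}^{n} \Pr[a_i \mid c]
```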


Enter Bayesian networks

Graphical models that can represent any probability distribution

Graphical representation: directed acyclic graph, one node for each attribute

Overall probability distribution factorized into component distributions

Graph’s nodes hold component distributions (conditional distributions)
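
To make the factorization concrete, here is a minimal sketch of one way to hold such a network in memory; the structure and names are illustrative, not from the slides (the outlook counts are the familiar 5/4/5 split of the 14-instance weather data, while the windy entries are made-up numbers):

```python
# A minimal Bayes-net representation: each attribute stores its parent list
# and a conditional probability table (CPT) indexed by the parents' values.
# Illustrative sketch only; structure and numbers are not from the slides.
network = {
    "outlook": {
        "parents": [],
        # No parents: the CPT holds a single unconditional distribution.
        "cpt": {(): {"sunny": 5/14, "overcast": 4/14, "rainy": 5/14}},
    },
    "windy": {
        "parents": ["outlook"],
        # One entry per combination of parent values (made-up probabilities).
        "cpt": {
            ("sunny",):    {"true": 0.4, "false": 0.6},
            ("overcast",): {"true": 0.5, "false": 0.5},
            ("rainy",):    {"true": 0.6, "false": 0.4},
        },
    },
}
```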

[Figure: Bayes net for the weather data]
[Figure: a second Bayes net for the weather data]

Computing the class probabilities

Two steps: computing a product of probabilities for each class, and normalization

For each class value:
Take all the attribute values and the class value
Look up the corresponding entries in the conditional probability distribution tables
Take the product of all these probabilities

Divide the product for each class by the sum of the products over all classes (normalization); a sketch of this computation follows below
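
A minimal sketch of these two steps in Python, assuming the dictionary representation sketched earlier (function and argument names are mine):

```python
def class_probabilities(network, instance, class_attr, class_values):
    """P(class | attribute values) for each class value: take the product
    of the relevant CPT entries for that class, then normalize.
    The class node itself is expected to appear in `network` too."""
    products = {}
    for c in class_values:
        assignment = dict(instance)
        assignment[class_attr] = c            # complete assignment incl. class
        prod = 1.0
        for attr, node in network.items():    # one CPT lookup per node
            pvals = tuple(assignment[p] for p in node["parents"])
            prod *= node["cpt"][pvals][assignment[attr]]
        products[c] = prod
    total = sum(products.values())            # normalization step
    return {c: v / total for c, v in products.items()}
```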


Why can we do this? (Part I)

Single assumption: the values of a node’s parents completely determine the probability distribution for the current node

This means that a node/attribute is conditionally independent of its non-descendants given its parents


Why can we do this? (Part II)

Chain rule from probability theory:

\Pr[a_1, a_2, \ldots, a_n] = \prod_{i=1}^{n} \Pr[a_i \mid a_{i-1}, \ldots, a_1]

Because of our assumption from the previous slide:

\Pr[a_1, a_2, \ldots, a_n] = \prod_{i=1}^{n} \Pr[a_i \mid a_i\text{’s parents}]


Learning Bayes nets

Basic components of algorithms for learning Bayes nets:

A method for evaluating the goodness of a given network: a measure based on the probability of the training data given the network (or the logarithm thereof); a sketch of such a scorer follows after this list

A method for searching through the space of possible networks: this amounts to searching through sets of edges, because the nodes are fixed
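
A sketch of the evaluation component, using the same dictionary representation as above (names mine): the log-likelihood is the sum, over instances and nodes, of the log of the CPT entry each node assigns to its observed value.

```python
import math

def log_likelihood(network, data):
    """Log-probability of the training data given the network: for each
    instance (a dict attribute -> value), add log P(value | parent values)
    for every node, then sum over all instances."""
    ll = 0.0
    for instance in data:
        for attr, node in network.items():
            pvals = tuple(instance[p] for p in node["parents"])
            ll += math.log(node["cpt"][pvals][instance[attr]])
    return ll
```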


Problem: overfitting

Can’t just maximize the probability of the training data, because then it is always better to add more edges (they fit the training data more closely)

Need to use cross-validation or some penalty for the complexity of the network

AIC measure: AIC score = −LL + K

MDL measure: MDL score = −LL + (K/2) log N

Here LL is the log-likelihood (the log of the probability of the data), K is the number of free parameters, and N is the number of instances

Another possibility: a Bayesian approach with a prior distribution over networks
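
The same penalties in code (a sketch; with these signs, lower scores are better):

```python
import math

def aic_score(ll, k):
    """AIC: negative log-likelihood plus the number of free parameters K."""
    return -ll + k

def mdl_score(ll, k, n):
    """MDL: negative log-likelihood plus (K/2) * log(N) for N instances."""
    return -ll + 0.5 * k * math.log(n)
```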


Searching for a good structure

The task can be simplified: each node can be optimized separately, because the probability of an instance is the product of the individual nodes’ probabilities

This also works for the AIC and MDL criteria, because the penalties just add up

A node can be optimized by adding or removing edges from other nodes

Must not introduce cycles! (an acyclicity test is sketched below)
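
Any edge the search considers must keep the graph acyclic, so a structure learner needs a quick test; here is a minimal ancestor-walk sketch (not from the slides, names mine):

```python
def creates_cycle(parents, frm, to):
    """Would adding the directed edge frm -> to create a cycle?
    `parents` maps each node to the set of its current parents; a cycle
    appears iff `to` is already an ancestor of `frm` (or frm == to)."""
    stack, seen = [frm], set()
    while stack:
        node = stack.pop()
        if node == to:
            return True                          # path to -> ... -> frm exists
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))  # climb to the ancestors
    return False
```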


The K2 algorithm

Starts with a given ordering of the nodes (attributes)

Processes each node in turn

Greedily tries adding edges from previously processed nodes to the current node

Moves on to the next node when the current node can’t be optimized further

The result depends on the initial order; a sketch of the search loop follows below
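
A sketch of the K2 loop with the scoring left abstract (all names are mine; `score_node(node, parent_set)` is assumed to return a higher-is-better local score, e.g. the node’s penalized log-likelihood contribution):

```python
def k2_search(ordering, score_node, max_parents=None):
    """Greedy K2 structure search over a fixed node ordering.
    Returns a dict mapping each node to its learned parent list."""
    parents = {node: [] for node in ordering}
    for i, node in enumerate(ordering):
        best = score_node(node, parents[node])
        improved = True
        while improved and (max_parents is None
                            or len(parents[node]) < max_parents):
            improved = False
            # Only nodes earlier in the ordering may become parents,
            # which guarantees acyclicity by construction.
            for cand in ordering[:i]:
                if cand in parents[node]:
                    continue
                s = score_node(node, parents[node] + [cand])
                if s > best:
                    best, best_cand, improved = s, cand, True
            if improved:
                parents[node].append(best_cand)   # keep the best new edge
    return parents
```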


Some tricks

Sometimes it helps to start the search with a naïve Bayes network

It can also help to ensure that every node is in the Markov blanket of the class node

The Markov blanket of a node includes all parents, children, and children’s parents of that node

Given values for its Markov blanket, a node is conditionally independent of all nodes outside the blanket

I.e. a node is irrelevant to classification if it is not in the Markov blanket of the class node (a helper for computing the blanket is sketched below)
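
Reading the blanket off a parents map is simple; a sketch (names mine):

```python
def markov_blanket(parents, node):
    """Parents, children, and the children's other parents of `node`.
    `parents` maps each node to the set of its parent nodes."""
    children = {n for n, ps in parents.items() if node in ps}
    blanket = set(parents.get(node, set())) | children
    for child in children:
        blanket |= set(parents[child])   # co-parents of each child
    blanket.discard(node)                # the node itself is excluded
    return blanket
```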


Other algorithms

Extending K2: consider greedily adding or deleting edges between any pair of nodes

A further step: consider inverting the direction of edges

TAN (Tree Augmented Naïve Bayes):
Starts with naïve Bayes
Considers adding a second parent to each node (apart from the class node)
An efficient algorithm exists (sketched below)
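
The slides leave the efficient TAN algorithm unspecified; the standard one (Friedman et al., building on Chow and Liu) weights each attribute pair by its conditional mutual information given the class and extracts a maximum-weight spanning tree, so every attribute gets at most one parent besides the class. A sketch under that assumption (names mine):

```python
import math
from collections import Counter
from itertools import combinations

def cond_mutual_info(data, i, j, c):
    """Estimate I(A_i; A_j | C) from a list of tuples `data`, where
    i and j index attributes and c indexes the class column."""
    n = len(data)
    n_ijc = Counter((r[i], r[j], r[c]) for r in data)
    n_ic = Counter((r[i], r[c]) for r in data)
    n_jc = Counter((r[j], r[c]) for r in data)
    n_c = Counter(r[c] for r in data)
    return sum((k / n) * math.log(k * n_c[vc] / (n_ic[(vi, vc)] * n_jc[(vj, vc)]))
               for (vi, vj, vc), k in n_ijc.items())

def tan_tree(data, attrs, c):
    """Chow-Liu step of TAN: maximum-weight spanning tree over the
    attributes, weighted by conditional mutual information given the class.
    Directing the tree away from an arbitrary root yields the extra parents."""
    edges = sorted(((cond_mutual_info(data, i, j, c), i, j)
                    for i, j in combinations(attrs, 2)), reverse=True)
    comp = {a: a for a in attrs}            # union-find for Kruskal's algorithm
    def find(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]         # path halving
            x = comp[x]
        return x
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # keep the edge if it joins components
            comp[ri] = rj
            tree.append((i, j))
    return tree
```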


Likelihood vs. conditional likelihood

In classification, what we really want is to maximize the probability of the class given the other attributes, not the probability of the instances

But: there is no closed-form solution for the probabilities in the nodes’ tables that maximize this

However: the conditional probability of the data given the network is easy to compute

This seems to work well when used for network scoring
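
In symbols (the standard conditional log-likelihood objective; notation mine), the quantity to maximize over the N training instances is:

```latex
% Score the class posterior rather than the joint probability of the data.
\mathrm{CLL} = \sum_{i=1}^{N} \log \Pr\!\left[ c^{(i)} \mid a_1^{(i)}, \ldots, a_n^{(i)} \right]
```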


Discussion

We have assumed: discrete data, no missing values, no new nodes

A different method of using Bayes nets for classification: Bayesian multinets, i.e. build one network for each class and make predictions using Bayes’ rule

A different class of learning methods: testing conditional independence assertions