Text Classification Using Naive Bayes

Page 1: Text Classification Using Naive Bayes

8/8/2019 Text Classification Using Naive Bayes

http://slidepdf.com/reader/full/text-classification-using-naive-bayes 1/26

Bayesian Classifiers Part 2

Contents

Simple Text Classification Using Naïve Bayes

Bayesian Belief Networks (Bayes Nets)

SIMPLE TEXT CLASSIFICATION USING NAÏVE BAYES

Learning to Classify Text

Learn_Naïve_Bayes_Text (Examples, V)
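
A minimal Python sketch of what a learner with this signature might look like, assuming the usual formulation: class priors are relative frequencies, and word probabilities use add-one smoothing, (n_k + 1) / (n + |Vocabulary|). The function name, the whitespace tokenizer, and the toy examples are assumptions made for illustration, not taken from the slides.

```python
from collections import Counter

def learn_naive_bayes_text(examples, target_values):
    """examples: list of (text, label) pairs; target_values: the label set V."""
    # Vocabulary: every distinct token seen in any training text.
    vocabulary = {word for text, _ in examples for word in text.lower().split()}
    priors, word_probs = {}, {}
    for v in target_values:
        docs_v = [text for text, label in examples if label == v]
        # P(v): fraction of training documents carrying label v.
        priors[v] = len(docs_v) / len(examples)
        # Count word occurrences over all documents of class v.
        counts = Counter(word for text in docs_v for word in text.lower().split())
        n = sum(counts.values())
        # P(w | v) with add-one smoothing (assumed estimator).
        word_probs[v] = {w: (counts[w] + 1) / (n + len(vocabulary)) for w in vocabulary}
    return vocabulary, priors, word_probs

# Toy training set, invented for illustration.
examples = [
    ("the kings traded a goaltender", "hockey"),
    ("great skater with a hard shot", "hockey"),
    ("the launch reached orbit", "space"),
]
vocabulary, priors, word_probs = learn_naive_bayes_text(examples, ["hockey", "space"])
```

With smoothing, every vocabulary word gets a non-zero probability in every class, so a single unseen word cannot zero out a class score at classification time.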

Classify_Naïve_Bayes_Text (Doc)
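
A hedged sketch of the classification step, assuming the standard Naive Bayes decision rule: pick the class v that maximizes P(v) times the product of P(w | v) over the document's word positions whose word is in the vocabulary. Log-probabilities are summed to avoid underflow. The function name and the hand-made model below are hypothetical.

```python
import math

def classify_naive_bayes_text(doc, vocabulary, priors, word_probs):
    # Keep only word positions whose token appears in the vocabulary.
    words = [w for w in doc.lower().split() if w in vocabulary]

    def log_score(v):
        # log P(v) + sum over positions of log P(word | v)
        return math.log(priors[v]) + sum(math.log(word_probs[v][w]) for w in words)

    return max(priors, key=log_score)

# Hand-made toy model; the numbers are illustrative, not learned from data.
vocabulary = {"goaltender", "orbit", "shot"}
priors = {"hockey": 0.5, "space": 0.5}
word_probs = {
    "hockey": {"goaltender": 0.5, "shot": 0.4, "orbit": 0.1},
    "space": {"goaltender": 0.1, "shot": 0.2, "orbit": 0.7},
}
label = classify_naive_bayes_text(
    "A mediocre goaltender with a hard shot", vocabulary, priors, word_probs
)
```

Words outside the vocabulary are simply skipped, so a document made only of unknown words is decided by the priors alone.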

Twenty Newsgroups (Joachims, 1996)

1,000 training documents from each of 20 groups, 20,000 documents in total

Use two thirds of them to learn to classify new documents according to which newsgroup they came from.

Newsgroups: comp.graphics, misc.forsale, comp.os.ms-windows.misc, rec.autos, comp.sys.ibm.pc.hardware, rec.motorcycles, comp.sys.mac.hardware, rec.sport.baseball, comp.windows.x, rec.sport.hockey, alt.atheism, sci.space, soc.religion.christian, sci.crypt, talk.religion.misc, sci.electronics, talk.politics.mideast, sci.med, talk.politics.misc, talk.politics.guns

Naive Bayes: 89% classification accuracy

Random guess: ?

An article from rec.sport.hockey

Path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.edu!ogicse!uwm.edu

From: [email protected] (John Doe)

Subject: Re: This year's biggest and worst (opinion)...

Date: 5 Apr 93 09:53:39 GMT

I can only comment on the Kings, but the most obvious candidate for pleasant surprise is Alex Zhitnik. He came highly touted as a defensive defenseman, but he's clearly much more than that. Great skater and hard shot (though wish he were more accurate). In fact, he pretty much allowed the Kings to trade away that huge defensive liability Paul Coffey. Kelly Hrudey is only the biggest disappointment if you thought he was any good to begin with. But, at best, he's only a mediocre goaltender. A better choice would be Tomas Sandstrom, though not through any fault of his own, but because some thugs in Toronto decided

Learning Curve for 20 Newsgroups

Accuracy vs. Training set size (1/3 withheld for test)

(Note that the x-axis is on a log scale.)

Problems In Classifying Text

Frequent words, e.g. the, of

Words with insignificant occurrence, e.g. fewer than three occurrences

Remove them from the vocabulary!
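
The pruning described above can be sketched as follows. This is an illustration under assumptions: the function name, the whitespace tokenizer, and the choice to drop the `num_most_frequent` top-ranked words are hypothetical; only the "fewer than three occurrences" threshold comes from the slide.

```python
from collections import Counter

def prune_vocabulary(documents, num_most_frequent=2, min_count=3):
    # Count every token across the whole collection.
    counts = Counter(w for doc in documents for w in doc.lower().split())
    # Very frequent words (e.g. "the", "of") carry little class information.
    most_frequent = {w for w, _ in counts.most_common(num_most_frequent)}
    # Keep words that occur at least min_count times and are not among the most frequent.
    return {w for w, c in counts.items() if c >= min_count and w not in most_frequent}

# Toy documents, invented for illustration.
documents = [
    "the puck of the game of the year",
    "the goalie of the kings stopped the puck",
    "the puck of the kings",
]
vocabulary = prune_vocabulary(documents)
```

Here "the" and "of" are dropped as the two most frequent words, and every word seen fewer than three times is dropped as too rare, which leaves only "puck".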

BAYESIAN BELIEF NETWORKS (BAYES NETS)

Overview

Bayesian Belief Network

Learning Bayesian Network

 - Data is fully observable and network structure is known

     Conditional probability tables are estimated from training data (as in the Naïve Bayes classifier)

 - Network structure is known, but data is partially observable

     Conditional probability tables can be obtained in a manner similar to learning neural network weights

     Another technique is the EM algorithm

 - Data is partially observable and network structure is unknown?

Bayesian Belief Networks

Interesting because:

 - The Naive Bayes assumption of conditional independence is too restrictive

 - But it's intractable without some such assumptions...

 - Bayesian belief networks describe conditional independence among subsets of variables

 - This allows combining prior knowledge about (in)dependencies among variables with observed training data

Also called Bayes Nets

Conditional Independence
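
For reference, a sketch of the standard definition in LaTeX. The symbols X, Y, Z and the attribute names A_1, A_2, V are notation assumed here for illustration.

```latex
% X is conditionally independent of Y given Z if, for all values x_i, y_j, z_k:
\[
  P(X = x_i \mid Y = y_j, Z = z_k) = P(X = x_i \mid Z = z_k)
\]
% More compactly: P(X \mid Y, Z) = P(X \mid Z).

% Naive Bayes uses this assumption to factor the likelihood of two attributes
% A_1, A_2 given the class V:
\[
  P(A_1, A_2 \mid V) = P(A_1 \mid A_2, V)\, P(A_2 \mid V) = P(A_1 \mid V)\, P(A_2 \mid V)
\]
```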

Bayesian Belief Network

Network represents a set of conditional independence assertions:

 - Each node is asserted to be conditionally independent of its nondescendants, given its immediate predecessors.

 - Directed acyclic graph
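
These independence assertions let the joint distribution factor into one term per node, P(y_1, ..., y_n) = product over i of P(y_i | Parents(Y_i)). The sketch below evaluates that product for a hypothetical three-node network, Storm -> Campfire <- BusTourGroup. The structure and all probabilities are made up for illustration.

```python
from itertools import product

# Hypothetical network: Storm -> Campfire <- BusTourGroup.
parents = {"Storm": (), "BusTourGroup": (), "Campfire": ("Storm", "BusTourGroup")}

# cpt[node][parent values] = P(node = True | parents); numbers are invented.
cpt = {
    "Storm": {(): 0.2},
    "BusTourGroup": {(): 0.5},
    "Campfire": {
        (True, True): 0.4, (True, False): 0.1,
        (False, True): 0.8, (False, False): 0.2,
    },
}

def joint_probability(assignment):
    """Probability of one full assignment: product of each node's CPT entry."""
    prob = 1.0
    for node, node_parents in parents.items():
        p_true = cpt[node][tuple(assignment[p] for p in node_parents)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob

# Sanity check: the joint sums to 1 over all 2^3 assignments.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product([True, False], repeat=len(parents))
)
```

The full joint over three boolean variables has 7 free parameters; this factorization needs only 6 here, and the saving grows quickly with larger, sparser networks.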

Inference in Bayesian Networks

How can one infer the values of one or more network variables, given observed values of others?

 - Bayes net contains all information needed for this inference

 - If only one variable has an unknown value, it is easy to infer it

 - In the general case, the problem is NP-hard

In practice, one can succeed in many cases

 - Exact inference methods work well for some network structures

 - Monte Carlo methods simulate the network randomly to calculate approximate solutions
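
One simple Monte Carlo method is rejection sampling: sample every variable in topological order, discard samples that contradict the evidence, and read the answer off the remaining samples. The sketch below estimates P(Storm | Campfire) on a hypothetical three-node network and compares it with exact enumeration. The network, its probabilities, and the function names are assumptions for illustration.

```python
import random

# Hypothetical network: Storm -> Campfire <- BusTourGroup; numbers are invented.
P_STORM = 0.2
P_BUS = 0.5
P_CAMPFIRE = {
    (True, True): 0.4, (True, False): 0.1,
    (False, True): 0.8, (False, False): 0.2,
}

def sample_network(rng):
    # Sample parents first, then the child given its sampled parents.
    storm = rng.random() < P_STORM
    bus = rng.random() < P_BUS
    campfire = rng.random() < P_CAMPFIRE[(storm, bus)]
    return storm, bus, campfire

def estimate_storm_given_campfire(num_samples, seed=0):
    rng = random.Random(seed)
    accepted = storm_count = 0
    for _ in range(num_samples):
        storm, _bus, campfire = sample_network(rng)
        if campfire:  # keep only samples consistent with the evidence
            accepted += 1
            storm_count += storm
    return storm_count / accepted

def exact_storm_given_campfire():
    def p_storm_and_campfire(storm):
        p_s = P_STORM if storm else 1 - P_STORM
        # Sum out the unobserved BusTourGroup variable.
        return p_s * sum(
            (P_BUS if bus else 1 - P_BUS) * P_CAMPFIRE[(storm, bus)]
            for bus in (True, False)
        )
    p_true, p_false = p_storm_and_campfire(True), p_storm_and_campfire(False)
    return p_true / (p_true + p_false)

approx = estimate_storm_given_campfire(100_000)
exact = exact_storm_given_campfire()
```

Rejection sampling wastes every sample that disagrees with the evidence, so it becomes inefficient when the evidence is unlikely; it is shown here only because it is the easiest sampler to read.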

Learning of Bayesian Networks

Several variants of this learning task

 - Network structure might be known or unknown

 - Training examples might provide values of all network variables, or just some

If structure is known and all variables are observed

 - Then it's as easy as training a Naive Bayes classifier

Suppose structure is known, variables partially observable

 - e.g., observe ForestFire, Storm, BusTourGroup, Thunder, but not Lightning, Campfire

 - Similar to training a neural network with hidden units

 - In fact, one can learn network conditional probability tables using gradient ascent!

 - Converge to network h that (locally) maximizes P(D|h)
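
For the easy case (known structure, all variables observed), each conditional probability table entry can be estimated as a relative frequency, much like the counts in Naive Bayes. A minimal sketch, with a hypothetical function name and invented records:

```python
from collections import Counter

def estimate_cpt(records, child, parent_names):
    """Return {parent values: P(child = True | parent values)} by counting."""
    totals, trues = Counter(), Counter()
    for record in records:
        key = tuple(record[p] for p in parent_names)
        totals[key] += 1
        trues[key] += record[child]  # True counts as 1, False as 0
    return {key: trues[key] / totals[key] for key in totals}

# Fully observed toy data, invented for illustration.
records = [
    {"Storm": True, "BusTourGroup": False, "Campfire": False},
    {"Storm": True, "BusTourGroup": False, "Campfire": False},
    {"Storm": False, "BusTourGroup": True, "Campfire": True},
    {"Storm": False, "BusTourGroup": True, "Campfire": True},
    {"Storm": False, "BusTourGroup": True, "Campfire": False},
    {"Storm": False, "BusTourGroup": True, "Campfire": True},
]
cpt = estimate_cpt(records, "Campfire", ["Storm", "BusTourGroup"])
```

Parent configurations that never occur in the data get no entry in this sketch, and raw frequencies can be exactly 0 or 1; a practical version would likely add smoothing.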

Learning of Bayesian Networks?

Maximization of P(D|h)

In principle, it is easy

 - Calculate P(D|h) for each h and return the h with maximum P(D|h)

In practice, h contains many, many continuous variables

 - Use a gradient descent (ascent) method

In general, h contains discrete variables, too. (?)

 - Use an algorithm for combinatorial optimization, such as simulated annealing

Gradient Ascent for Bayes Nets

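w_ijk denotes one entry in the conditional probability table for variable Y_i: the probability that Y_i takes its j-th value y_ij given that its parents U_i take their k-th combination of values u_ik.

\[ w_{ijk} = P_h(Y_i = y_{ij} \mid U_i = u_{ik}) \]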

Gradient Ascent for Bayes Nets

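\[
\begin{aligned}
\frac{\partial \ln P_h(D)}{\partial w_{ijk}}
&= \frac{\partial}{\partial w_{ijk}} \ln \prod_{d \in D} P_h(d)
 = \sum_{d \in D} \frac{\partial \ln P_h(d)}{\partial w_{ijk}} \\
&= \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial P_h(d)}{\partial w_{ijk}} \\
&= \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial}{\partial w_{ijk}}
   \sum_{j',k'} P_h(d \mid y_{ij'}, u_{ik'}) \, P_h(y_{ij'}, u_{ik'}) \\
&= \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial}{\partial w_{ijk}}
   \sum_{j',k'} P_h(d \mid y_{ij'}, u_{ik'}) \, P_h(y_{ij'} \mid u_{ik'}) \, P_h(u_{ik'}) \\
&= \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial}{\partial w_{ijk}}
   P_h(d \mid y_{ij}, u_{ik}) \, P_h(y_{ij} \mid u_{ik}) \, P_h(u_{ik})
\end{aligned}
\]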

Gradient Ascent for Bayes Nets

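\[
\begin{aligned}
\frac{\partial \ln P_h(D)}{\partial w_{ijk}}
&= \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial}{\partial w_{ijk}}
   P_h(d \mid y_{ij}, u_{ik}) \, P_h(y_{ij} \mid u_{ik}) \, P_h(u_{ik}) \\
&= \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial}{\partial w_{ijk}}
   P_h(d \mid y_{ij}, u_{ik}) \, w_{ijk} \, P_h(u_{ik}) \\
&= \sum_{d \in D} \frac{1}{P_h(d)} P_h(d \mid y_{ij}, u_{ik}) \, P_h(u_{ik}) \\
&= \sum_{d \in D} \frac{1}{P_h(d)}
   \frac{P_h(y_{ij}, u_{ik} \mid d) \, P_h(d) \, P_h(u_{ik})}{P_h(y_{ij}, u_{ik})} \\
&= \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d) \, P_h(u_{ik})}{P_h(y_{ij}, u_{ik})}
 = \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{P_h(y_{ij} \mid u_{ik})}
\end{aligned}
\]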

Gradient Ascent for Bayes Nets
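
A minimal sketch of one gradient ascent step, assuming the commonly stated update: add eta times the sum over d of P_h(y_ij, u_ik | d) / w_ijk to each entry w_ijk = P_h(y_ij | u_ik), then renormalize so that each table row sums to 1. The posterior over hidden variables is computed by brute-force enumeration, which only suits tiny networks. The two-node network, the data, the step size, and all names are hypothetical.

```python
from itertools import product

# Hypothetical network: Storm -> Campfire.
parents = {"Storm": (), "Campfire": ("Storm",)}

def joint(w, assignment):
    # w[node][parent values][value] = P(node = value | parents)
    prob = 1.0
    for node, pa in parents.items():
        prob *= w[node][tuple(assignment[p] for p in pa)][assignment[node]]
    return prob

def completions(record):
    # Every way of filling in the variables missing from a training case.
    hidden = [n for n in parents if n not in record]
    for values in product([True, False], repeat=len(hidden)):
        yield {**record, **dict(zip(hidden, values))}

def likelihood(w, data):
    result = 1.0
    for record in data:
        result *= sum(joint(w, full) for full in completions(record))
    return result

def gradient_ascent_step(w, data, eta=0.01):
    # expected[node][k][j] accumulates sum over d of P_h(y_ij, u_ik | d).
    expected = {n: {k: {v: 0.0 for v in dist} for k, dist in table.items()}
                for n, table in w.items()}
    for record in data:
        fulls = list(completions(record))
        weights = [joint(w, full) for full in fulls]
        z = sum(weights)
        for full, weight in zip(fulls, weights):
            for node, pa in parents.items():
                k = tuple(full[p] for p in pa)
                expected[node][k][full[node]] += weight / z
    new_w = {}
    for node, table in w.items():
        new_w[node] = {}
        for k, dist in table.items():
            # Gradient step on each entry, then renormalize the row.
            raised = {v: dist[v] + eta * expected[node][k][v] / dist[v] for v in dist}
            total = sum(raised.values())
            new_w[node][k] = {v: p / total for v, p in raised.items()}
    return new_w

uniform = {True: 0.5, False: 0.5}
w0 = {"Storm": {(): dict(uniform)},
      "Campfire": {(True,): dict(uniform), (False,): dict(uniform)}}
# Storm is unobserved in the third training case.
data = [{"Storm": True, "Campfire": True}, {"Storm": True, "Campfire": True},
        {"Campfire": True}, {"Storm": False, "Campfire": False}]
w1 = gradient_ascent_step(w0, data)
```

On this toy data a single step raises the likelihood of the training cases. Repeating the step moves toward a local maximum, but renormalizing by division is a heuristic rather than an exact projection, so a step is not guaranteed to improve the likelihood for every dataset and step size.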

More on Learning Bayes Nets

Summary: Bayesian Belief Networks

Combine prior knowledge with observed data

 - Q: how does prior knowledge enter the network?

Impact of prior knowledge (when correct!) is to lower the sample complexity

Active research area

 - Extend from boolean to real-valued variables

 - Parameterized distributions instead of tables

 - Extend to first-order instead of propositional systems

 - More effective inference methods

 - ...