CS340: Machine Learning

Naive Bayes classifiers

Kevin Murphy


Classifiers

• A classifier is a function f that maps input feature vectors x ∈ X to output class labels y ∈ {1, . . . , C}.

• X is the feature space, e.g. X = ℝ^p or X = {0, 1}^p (we can mix discrete and continuous features).

• We assume the class labels are unordered (categorical) and mutually exclusive. (If we allow an input to belong to multiple classes, this is called a multi-label problem.)

• Goal: to learn f from a labeled training set of N input-output pairs (x_i, y_i), i = 1 : N.

• We will focus our attention on probabilistic classifiers, i.e., methods that return p(y|x).

• The alternative is to learn a discriminant function f(x) = ŷ(x) to predict the most probable label.


Generative vs discriminative classifiers

• Discriminative: directly learn the function that computes the class posterior p(y|x). It discriminates between different classes given the input.

• Generative: learn the class-conditional density p(x|y) for each value of y, and learn the class priors p(y); then one can apply Bayes rule to compute the posterior

p(y|x) = p(x, y) / p(x) = p(x|y) p(y) / p(x)

where p(x) = ∑_{y'=1}^C p(x|y') p(y').


Generative classifiers

We usually use a plug-in approximation for simplicity

p(y = c|x, D) ≈ p(y = c|x, θ, π) = p(x|y = c, θ_c) p(y = c|π) / ∑_{c'} p(x|y = c', θ_{c'}) p(y = c'|π)

where D is the training data, π are the parameters of the class prior p(y), and θ are the parameters of the class-conditional densities p(x|y).


Class prior

• Class prior: p(y = c|π) = π_c

• MLE:

π_c = (∑_{i=1}^N I(y_i = c)) / N = N_c / N

where N_c is the number of training examples that have class label c.

• Posterior mean (using a Dirichlet prior):

π_c = (N_c + 1) / (N + C)
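Both estimators are one line each in code. A minimal sketch in Python/NumPy (the function name and the 1..C integer-label convention are illustrative assumptions, not from the slides):

```python
import numpy as np

def class_prior(y, C, smooth=False):
    """Estimate pi_c from labels y in {1,...,C}.
    smooth=False gives the MLE N_c / N; smooth=True gives the
    posterior mean (N_c + 1) / (N + C) under a uniform Dirichlet prior."""
    N = len(y)
    Nc = np.array([np.sum(y == c) for c in range(1, C + 1)])
    if smooth:
        return (Nc + 1) / (N + C)
    return Nc / N

# e.g. class_prior(np.array([1, 1, 2, 3]), C=3) -> [0.5, 0.25, 0.25]
```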


Gaussian class-conditional densities

Suppose x ∈ ℝ² represents the height and weight of adult Westerners (in inches and pounds respectively), and y ∈ {1, 2} represents male or female. A natural choice for the class-conditional density is a two-dimensional Gaussian

p(x|y = c, θ_c) = N(x|µ_c, Σ_c)

where the mean and covariance matrix depend on the class c.

[Figure: scatter plot of height (x-axis, 55–80 inches) vs. weight (y-axis, 80–280 pounds); dotted red circles = female, solid blue crosses = male]
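As a concrete sketch of fitting such class-conditional Gaussians with NumPy/SciPy (function and variable names are illustrative; the slides do not prescribe an implementation):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian_class_conditionals(X, y):
    """MLE of (mu_c, Sigma_c) for each class. X is N x p, y holds labels."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        Sigma = np.cov(Xc, rowvar=False, bias=True)  # bias=True: MLE (divide by N)
        params[c] = (mu, Sigma)
    return params

# class-conditional density of a test point x:
#   multivariate_normal.pdf(x, mean=mu, cov=Sigma)
```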


Naive Bayes assumption

Assume features are conditionally independent given class.

p(x|y = c, θ_c) = ∏_{d=1}^p p(x_d|y = c, θ_c) = ∏_{d=1}^p N(x_d|µ_cd, σ_cd)

This is equivalent to assuming that Σ_c is diagonal.

[Figure: the four fitted one-dimensional Gaussians: height|male, height|female, weight|male, weight|female]


Training

[Figure: fitted per-feature Gaussians, as on the previous slide: height|male, height|female, weight|male, weight|female]

h    w    y
67   125  m
79   210  m
71   150  m
68   140  f
67   142  f
60   120  f

         h                        w
m    µ = 71.66, σ = 3.13     µ = 175.62, σ = 32.40
f    µ = 65.07, σ = 3.19     µ = 129.69, σ = 18.67
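A sketch of the fitting step on the six rows shown (note the slide's µ/σ table was evidently computed from a larger sample, so these six rows alone will not reproduce it exactly):

```python
import numpy as np

# the six training rows from the table above: (height, weight, label)
data = [(67, 125, 'm'), (79, 210, 'm'), (71, 150, 'm'),
        (68, 140, 'f'), (67, 142, 'f'), (60, 120, 'f')]

X = np.array([(h, w) for h, w, _ in data], dtype=float)
y = np.array([lab for _, _, lab in data])

for c in ('m', 'f'):
    Xc = X[y == c]
    mu, sigma = Xc.mean(axis=0), Xc.std(axis=0)  # per-feature MLE (std uses 1/N)
    print(c, 'height: mu=%.2f sigma=%.2f' % (mu[0], sigma[0]),
             'weight: mu=%.2f sigma=%.2f' % (mu[1], sigma[1]))
```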


Testing

p(y = m|x) = p(x|y = m) p(y = m) / [p(x|y = m) p(y = m) + p(x|y = f) p(y = f)]

Let us assume p(y = m) = p(y = f) = 0.5. Then

p(y = m|x) = p(x|y = m) / [p(x|y = m) + p(x|y = f)]

= p(x_h|y = m) p(x_w|y = m) / [p(x_h|y = m) p(x_w|y = m) + p(x_h|y = f) p(x_w|y = f)]

= N(x_h; µ_mh, σ_mh) N(x_w; µ_mw, σ_mw) / [N(x_h; µ_mh, σ_mh) N(x_w; µ_mw, σ_mw) + N(x_h; µ_fh, σ_fh) N(x_w; µ_fw, σ_fw)]
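A sketch of this computation in Python/SciPy, plugging in the fitted parameters from the previous slide (the function name is an illustrative choice); the loop evaluates the three test points tabulated on the next slide:

```python
import numpy as np
from scipy.stats import norm

# per-class, per-feature Gaussian parameters (mu, sigma) from the table
params = {'m': {'h': (71.66, 3.13), 'w': (175.62, 32.40)},
          'f': {'h': (65.07, 3.19), 'w': (129.69, 18.67)}}

def p_male(h, w):
    """p(y = m | x) under equal class priors and the naive Bayes factorization."""
    lik = {c: norm.pdf(h, *params[c]['h']) * norm.pdf(w, *params[c]['w'])
           for c in ('m', 'f')}
    return lik['m'] / (lik['m'] + lik['f'])

for h, w in [(72, 180), (60, 100), (68, 155)]:
    print(h, w, p_male(h, w))
```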


Testing

[Figure: the same four fitted per-feature Gaussians: height|male, height|female, weight|male, weight|female]

         h                        w
m    µ = 71.66, σ = 3.13     µ = 175.62, σ = 32.40
f    µ = 65.07, σ = 3.19     µ = 129.69, σ = 18.67

h    w    p(y = m|x)
72   180
60   100
68   155


Binary data

• Suppose we want to classify email into spam vs non-spam.

• A simple way to represent a text document (such as email) is as a bag of words.

• Let xd = 1 if word d occurs in the document and xd = 0 otherwise.
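A minimal sketch of this encoding (the tiny vocabulary is a made-up example):

```python
# binary bag-of-words encoding against a fixed vocabulary
vocab = ['cheap', 'viagra', 'meeting', 'agenda']   # hypothetical word list

def to_binary_features(text, vocab):
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocab]

to_binary_features('Cheap viagra now', vocab)      # -> [1, 1, 0, 0]
```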

[Figure: the training set as a binary matrix, with words on the x-axis (0–700) and documents on the y-axis (0–900)]


Parameter estimation

The class-conditional density becomes a product of Bernoullis:

p(x⃗|Y = c, θ) = ∏_{d=1}^p θ_cd^{x_d} (1 − θ_cd)^{1−x_d}

MLE:

θ_cd = N_cd / N_c

where N_cd is the number of class-c examples in which word d occurs.

Posterior mean (with a Beta(1,1) prior):

θ_cd = (N_cd + 1) / (N_c + 2)
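A sketch of both estimators for binary features (the function name is assumed for illustration):

```python
import numpy as np

def fit_bernoulli_nb(X, y, smooth=True):
    """X is an N x p binary matrix, y holds class labels.
    Returns theta[c][d] = p(x_d = 1 | y = c)."""
    theta = {}
    for c in np.unique(y):
        Xc = X[y == c]
        Ncd, Nc = Xc.sum(axis=0), Xc.shape[0]
        # posterior mean under a Beta(1,1) prior, else the MLE
        theta[c] = (Ncd + 1) / (Nc + 2) if smooth else Ncd / Nc
    return theta
```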


Fitted class-conditional densities p(x = 1|y = c)

[Figure: fitted word occurrence probabilities plotted against word index (0–700), one panel per class: "word freq for class 1" and "word freq for class 2"]


Numerical issues

When computing

P(Y = c|x⃗) = P(x⃗|Y = c) P(Y = c) / ∑_{c'=1}^C P(x⃗|Y = c') P(Y = c')

you will often encounter numerical underflow, since p(x⃗, y = c) is very small. Take logs:

b_c := log[P(x⃗|Y = c) P(Y = c)]

log P(Y = c|x⃗) = b_c − log ∑_{c'=1}^C e^{b_{c'}}

but e^{b_c} will underflow!
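A quick NumPy demonstration of the failure mode (the log-likelihood values are made up, but of realistic magnitude for a long document):

```python
import numpy as np

b = np.array([-1000.0, -1001.0])   # b_c = log p(x, y=c) for two classes
p = np.exp(b)                       # both underflow to 0.0
print(p / p.sum())                  # -> [nan nan], since 0/0 is undefined
```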


Log-sum-exp trick

log(e^{−120} + e^{−121}) = log(e^{−120}(e^0 + e^{−1})) = log(e^0 + e^{−1}) − 120

In general,

log ∑_c e^{b_c} = log[(∑_c e^{b_c}) e^{−B} e^B] = log[(∑_c e^{b_c − B}) e^B] = log(∑_c e^{b_c − B}) + B

where B = max_c b_c. In matlab, use logsumexp.m.
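The same trick in Python, for readers not using matlab (this hand-rolled version mirrors what logsumexp.m computes; SciPy also ships scipy.special.logsumexp):

```python
import numpy as np

def logsumexp(b):
    """Stable log(sum_c exp(b_c)): shift by B = max_c b_c before exponentiating."""
    B = b.max()
    return np.log(np.exp(b - B).sum()) + B

b = np.array([-1000.0, -1001.0])
log_post = b - logsumexp(b)        # normalized log p(y=c|x)
print(np.exp(log_post))            # -> [0.7311, 0.2689], no underflow
```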


Softmax function

p(y = c|x⃗, θ, π) = p(x|y = c) p(y = c) / ∑_{c'} p(x|y = c') p(y = c')

= exp[log p(x|y = c) + log p(y = c)] / ∑_{c'} exp[log p(x|y = c') + log p(y = c')]

= exp[log π_c + ∑_d x_d log θ_cd] / ∑_{c'} exp[log π_{c'} + ∑_d x_d log θ_{c'd}]

Now define the vectors

x⃗ = [1, x_1, . . . , x_p]
β_c = [log π_c, log θ_c1, . . . , log θ_cp]

Hence

p(y = c|x⃗, β) = exp(β_c^T x⃗) / ∑_{c'} exp(β_{c'}^T x⃗)
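A sketch of this softmax form in NumPy (the function name and argument layout are illustrative assumptions):

```python
import numpy as np

def nb_posterior(x, log_pi, log_theta):
    """Softmax form of the naive Bayes posterior for binary features.
    log_pi has length C; log_theta is C x p with entries log theta_cd;
    x is a length-p binary vector."""
    xt = np.concatenate(([1.0], x))              # prepend the constant 1
    beta = np.column_stack((log_pi, log_theta))  # row c is beta_c
    a = beta @ xt                                # a_c = beta_c^T x
    a -= a.max()                                 # log-sum-exp shift for stability
    p = np.exp(a)
    return p / p.sum()
```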


Logistic function

If y is binary,

p(y = 1|x) = e^{β_1^T x} / (e^{β_1^T x} + e^{β_2^T x})

= 1 / (1 + e^{(β_2 − β_1)^T x})

= 1 / (1 + e^{−w^T x})

= σ(w^T x)

where we have defined w = β_1 − β_2 and σ(·) is the logistic or sigmoid function

σ(u) = 1 / (1 + e^{−u})
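A quick numerical check that the two-class softmax and the sigmoid form agree (the β vectors and x are made-up values):

```python
import numpy as np

sigma = lambda u: 1 / (1 + np.exp(-u))

beta1, beta2 = np.array([0.5, 2.0]), np.array([-0.3, 1.0])
x = np.array([1.0, 3.0])                     # [1, x_1], with the constant prepended

softmax_p1 = np.exp(beta1 @ x) / (np.exp(beta1 @ x) + np.exp(beta2 @ x))
w = beta1 - beta2
print(softmax_p1, sigma(w @ x))              # the two values agree
```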
