Gaussian Naïve Bayes (mgormley/courses/10601-s17/slides/lecture6-gnb.pdf)


Transcript of Gaussian Naïve Bayes (lecture6-gnb.pdf)

Page 1:

Gaussian Naïve Bayes

10-601 Introduction to Machine Learning

Matt Gormley
Lecture 6
February 6, 2017

Machine Learning Department
School of Computer Science
Carnegie Mellon University

Naïve Bayes Readings: "Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression" (Mitchell, 2016)

Murphy 3; Bishop --; HTF --; Mitchell 6.1-6.10

Optimization Readings (next lecture): Lecture notes from 10-600 (see Piazza note)

"Convex Optimization," Boyd and Vandenberghe (2009) [See Chapter 9. This advanced reading is entirely optional.]

Page 2:

Reminders

• Homework 2: Naive Bayes
  – Release: Wed, Feb. 1
  – Due: Mon, Feb. 13 at 5:30pm

• Homework 3: Linear / Logistic Regression
  – Release: Mon, Feb. 13
  – Due: Wed, Feb. 22 at 5:30pm


Page 3:

Naïve Bayes Outline

• Probabilistic (Generative) View of Classification
  – Decision rule for probability model
• Real-world Dataset
  – Economist vs. Onion articles
  – Document → bag-of-words → binary feature vector
• Naive Bayes: Model
  – Generating synthetic "labeled documents"
  – Definition of model
  – Naive Bayes assumption
  – Counting # of parameters with / without NB assumption
• Naïve Bayes: Learning from Data
  – Data likelihood
  – MLE for Naive Bayes
  – MAP for Naive Bayes
• Visualizing Gaussian Naive Bayes


["Last Lecture" / "This Lecture" are slide callouts marking which outline items were covered previously vs. today.]

Page 4:

Naive Bayes: Model

Whiteboard
– Generating synthetic "labeled documents"
– Definition of model
– Naive Bayes assumption
– Counting # of parameters with / without NB assumption
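For reference (standard definitions, not a transcription of the whiteboard): with label y and features x_1, ..., x_M, the Naive Bayes assumption is that the features are conditionally independent given the class, which factorizes the joint distribution:

    p(\mathbf{x}, y) = p(y) \prod_{m=1}^{M} p(x_m \mid y)

For M binary features and a binary label, a full joint p(x | y) requires 2^M - 1 free parameters per class; under the NB assumption, only M per class.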


Page 5:

What's wrong with the Naïve Bayes Assumption?

The features might not be independent!

• Example 1:
  – If a document contains the word "Donald", it's extremely likely to contain the word "Trump"
  – These are not independent!

• Example 2:
  – If the petal width is very high, the petal length is also likely to be very high

Page 6:

Naïve Bayes: Learning from Data

Whiteboard
– Data likelihood
– MLE for Naive Bayes
– MAP for Naive Bayes
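For reference (standard results, not a transcription of the whiteboard): for Bernoulli Naive Bayes with binary features, the MLE estimates are relative counts, and the MAP estimates under a Beta(2, 2) prior on each \theta_{k,m} give add-one smoothing:

    \hat{\pi}_k = \frac{N_k}{N}, \qquad \hat{\theta}_{k,m}^{\mathrm{MLE}} = \frac{N_{k,m}}{N_k}, \qquad \hat{\theta}_{k,m}^{\mathrm{MAP}} = \frac{N_{k,m} + 1}{N_k + 2}

where N is the number of training examples, N_k the number with y = k, and N_{k,m} the number with y = k and x_m = 1.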


Page 7:

VISUALIZING NAÏVE BAYES

Slides in this section from William Cohen (10-601B, Spring 2016)

Page 8:

Page 9:

Fisher Iris Dataset

Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris virginica (1), Iris versicolor (2), collected by Anderson (1936)

Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set

Species | Sepal Length (cm) | Sepal Width (cm) | Petal Length (cm) | Petal Width (cm)
0       | 4.3               | 3.0              | 1.1               | 0.1
0       | 4.9               | 3.6              | 1.4               | 0.1
0       | 5.3               | 3.7              | 1.5               | 0.2
1       | 4.9               | 2.4              | 3.3               | 1.0
1       | 5.7               | 2.8              | 4.1               | 1.3
1       | 6.3               | 3.3              | 4.7               | 1.6
1       | 6.7               | 3.0              | 5.0               | 1.7
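A minimal sketch (not part of the slides) of fitting Gaussian Naive Bayes to two iris features, assuming scikit-learn is available. Note that scikit-learn's label order (0 = setosa, 1 = versicolor, 2 = virginica) differs from the slide's:

# Fit Gaussian Naive Bayes to two iris features (illustrative sketch).
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X = iris.data[:, 2:4]   # petal length, petal width (cm)
y = iris.target         # 0 = setosa, 1 = versicolor, 2 = virginica

clf = GaussianNB().fit(X, y)  # estimates a per-class mean and variance per feature
print(clf.theta_)             # class-conditional means, shape (3, 2)
print(clf.var_)               # class-conditional variances (sigma_ in older scikit-learn)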

Page 10:

Slide from William Cohen

Page 11:

Slide from William Cohen

Page 12:

Plot the difference of the probabilities

Slide from William Cohen

z-axis is the difference of the posterior probabilities: p(y=1 | x) - p(y=0 | x)
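A hedged sketch of how such a surface can be computed, continuing the scikit-learn example above (the grid ranges and resolution are illustrative):

# Continue from the iris sketch above: restrict to two classes so the
# posterior difference p(y=1|x) - p(y=0|x) is well defined.
import numpy as np
from sklearn.naive_bayes import GaussianNB

mask = y != 2
clf2 = GaussianNB().fit(X[mask], y[mask])

# Evaluate the posterior difference on a grid over (petal length, petal width).
xx, yy = np.meshgrid(np.linspace(1, 7, 200), np.linspace(0, 3, 200))
grid = np.c_[xx.ravel(), yy.ravel()]
proba = clf2.predict_proba(grid)                 # shape: (n_points, 2)
diff = (proba[:, 1] - proba[:, 0]).reshape(xx.shape)
# "diff" is the plotted z-axis; the decision boundary is the diff == 0 contour.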

Page 13:

Question: what does the boundary between positive and negative look like for Naïve Bayes?

Slide from William Cohen (10-601B, Spring 2016)

Page 14:

Iris Data (2 classes)

Page 15:

Iris Data (sigma not shared)

Page 16:

Iris Data (sigma = 1)

Page 17:

Iris Data (3 classes)

Page 18:

Iris Data (sigma not shared)

Page 19:

Iris Data (sigma = 1)

Page 20:

Naïve Bayes has a linear decision boundary (if sigma is shared across classes)

Slide from William Cohen (10-601B, Spring 2016)
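A short justification of the claim above (a standard derivation, not transcribed from the slide): when each per-feature variance \sigma_m^2 is shared across classes, the quadratic terms in the Gaussian log-densities cancel, so the log-odds are linear in \mathbf{x} and the boundary is a hyperplane:

    \log \frac{p(y=1 \mid \mathbf{x})}{p(y=0 \mid \mathbf{x})} = \log \frac{\pi_1}{\pi_0} + \sum_{m=1}^{M} \left[ \frac{\mu_{1,m} - \mu_{0,m}}{\sigma_m^2} x_m + \frac{\mu_{0,m}^2 - \mu_{1,m}^2}{2 \sigma_m^2} \right]

With class-specific variances (the "sigma not shared" plots), the x_m^2 terms survive and the boundary is quadratic.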

Page 21:

Figure from William Cohen (10-601B, Spring 2016)

Page 22:

Why don't we drop the generative model and try to learn this hyperplane directly?

Figure from William Cohen (10-601B, Spring 2016)

Page 23:

Beyond the Scope of this Lecture

• Multinomial Naïve Bayes can be used for integer features (see the sketch below)

• Multi-class Naïve Bayes can be used if your classification problem has > 2 classes
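A minimal sketch (toy data, not from the slides) of Multinomial Naive Bayes on integer count features with scikit-learn:

# Multinomial Naive Bayes on integer (count) features -- toy example.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Rows are documents, columns are word counts for a 4-word vocabulary.
X = np.array([[3, 0, 1, 0],
              [2, 0, 2, 1],
              [0, 4, 0, 2],
              [0, 3, 1, 3]])
y = np.array([0, 0, 1, 1])     # two document classes

clf = MultinomialNB(alpha=1.0).fit(X, y)       # alpha=1.0 is add-one smoothing
print(clf.predict(np.array([[1, 0, 2, 0]])))   # -> [0]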


Page 24:

Summary

1. Naïve Bayes provides a framework for generative modeling

2. Choose p(x_m | y) appropriate to the data (e.g. Bernoulli for binary features, Gaussian for continuous features)

3. Train by MLE or MAP

4. Classify by maximizing the posterior (written out below)
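The decision rule in step 4, written out (standard form; the evidence p(x) is dropped because it does not depend on y):

    \hat{y} = \operatorname*{argmax}_{y} \; p(y \mid \mathbf{x}) = \operatorname*{argmax}_{y} \; p(y) \prod_{m=1}^{M} p(x_m \mid y)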
