CSC 411: Lecture 09: Naive Bayes
Class based on Raquel Urtasun & Rich Zemel’s lectures
Sanja Fidler
University of Toronto
Feb 8, 2015
Today
Classification – Multi-dimensional (Gaussian) Bayes classifier
Estimate probability densities from data
Naive Bayes classifier
Generative vs Discriminative
Two approaches to classification:
Discriminative classifiers estimate parameters of the decision boundary / class separator directly from labeled examples:
- learn p(y|x) directly (logistic regression models)
- learn mappings from inputs to classes (least-squares, neural nets)
Generative approach: model the distribution of inputs characteristic of the class (Bayes classifier):
- build a model of p(x|y)
- apply Bayes Rule
Bayes Classifier
Aim to diagnose whether a patient has diabetes: classify into one of two classes (yes C = 1; no C = 0)
Run battery of tests
Given the patient's results x = [x_1, x_2, · · · , x_d]^T, we want to update the class probabilities using Bayes Rule:
p(C|x) = p(x|C) p(C) / p(x)
More formally:
posterior = (class likelihood × prior) / evidence
How can we compute p(x) for the two class case?
p(x) = p(x|C = 0)p(C = 0) + p(x|C = 1)p(C = 1)
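To make the arithmetic concrete, here is a minimal Python sketch of the two-class case; the likelihood and prior values are made up for illustration and are not from the lecture:

```python
# Two-class Bayes rule with made-up numbers (purely illustrative)
likelihood = {0: 0.10, 1: 0.30}   # p(x | C = 0), p(x | C = 1) for some fixed observation x
prior = {0: 0.80, 1: 0.20}        # p(C = 0), p(C = 1)

evidence = likelihood[0] * prior[0] + likelihood[1] * prior[1]        # p(x)
posterior = {c: likelihood[c] * prior[c] / evidence for c in (0, 1)}  # p(C | x)
print(posterior)  # the two posteriors sum to 1
```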
Classification: Diabetes Example
Last class we had a single observation per patient: white blood cell count
p(C = 1|x = 48) = p(x = 48|C = 1) p(C = 1) / p(x = 48)
Add second observation: Plasma glucose value
Now our input x is 2-dimensional
Gaussian Discriminant Analysis (Gaussian Bayes Classifier)
Gaussian Discriminant Analysis in its general form assumes that p(x|t) is distributed according to a multivariate normal (Gaussian) distribution
Multivariate Gaussian distribution:
p(x|t = k) = 1 / ((2π)^{d/2} |Σ_k|^{1/2}) · exp[ −(1/2) (x − µ_k)^T Σ_k^{-1} (x − µ_k) ]
where |Σ_k| denotes the determinant of the matrix, and d is the dimension of x
Each class k has an associated mean vector µ_k and covariance matrix Σ_k
Typically the classes share a single covariance matrix Σ (“share” means the classes have the same covariance matrix): Σ = Σ_1 = · · · = Σ_k
Multivariate Data
Multiple measurements (sensors)
d inputs/features/attributes
N instances/observations/examples
X =
[ x_1^{(1)}  x_2^{(1)}  · · ·  x_d^{(1)} ]
[ x_1^{(2)}  x_2^{(2)}  · · ·  x_d^{(2)} ]
[     ⋮          ⋮       ⋱        ⋮      ]
[ x_1^{(N)}  x_2^{(N)}  · · ·  x_d^{(N)} ]
Multivariate Parameters
Mean: E[x] = [µ_1, · · · , µ_d]^T
Covariance
Σ = Cov(x) = E[(x − µ)(x − µ)^T] =
[ σ_1^2   σ_12    · · ·  σ_1d  ]
[ σ_12    σ_2^2   · · ·  σ_2d  ]
[   ⋮       ⋮      ⋱       ⋮   ]
[ σ_d1    σ_d2    · · ·  σ_d^2 ]
Correlation: Corr(x) is the covariance divided by the product of the standard deviations:
ρ_ij = σ_ij / (σ_i σ_j)
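These quantities are straightforward to estimate from an N × d data matrix; a small numpy sketch (with random stand-in data, since no dataset is given here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))        # stand-in for the N x d data matrix X

mu = X.mean(axis=0)                  # E[x], the length-d mean vector
Sigma = np.cov(X, rowvar=False)      # d x d covariance matrix (note: np.cov divides by N - 1)
std = np.sqrt(np.diag(Sigma))
Corr = Sigma / np.outer(std, std)    # rho_ij = sigma_ij / (sigma_i * sigma_j)
print(mu, Sigma, Corr, sep="\n")
```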
Multivariate Gaussian Distribution
x ∼ N (µ,Σ), a Gaussian (or normal) distribution defined as
p(x) = 1 / ((2π)^{d/2} |Σ|^{1/2}) · exp[ −(1/2) (x − µ)^T Σ^{-1} (x − µ) ]
The Mahalanobis distance (x − µ)^T Σ^{-1} (x − µ) measures the distance from x to µ in terms of Σ
It normalizes for differences in variances and correlations
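A minimal numpy sketch of this density, computing the Mahalanobis term with a linear solve rather than an explicit inverse (the function name and toy numbers are mine, chosen for illustration):

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Multivariate normal density N(x; mu, Sigma) for a d-dimensional x."""
    d = x.shape[0]
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)       # (x - mu)^T Sigma^{-1} (x - mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
print(gaussian_density(np.array([0.2, -0.1]), mu, Sigma))
```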
Bivariate Normal
Figure: probability density function and contour plot of the pdf for Σ = I, Σ = 0.5·I, and Σ = 2·I (I the 2×2 identity)
Bivariate Normal
Figure: probability density function and contour plot of the pdf for var(x_1) = var(x_2), var(x_1) > var(x_2), and var(x_1) < var(x_2)
Bivariate Normal
Figure: probability density function and contour plot of the pdf for Σ = [[1, 0], [0, 1]], Σ = [[1, 0.5], [0.5, 1]], and Σ = [[1, 0.8], [0.8, 1]]
Bivariate Normal
Figure: probability density function and contour plot of the pdf for Cov(x_1, x_2) = 0, Cov(x_1, x_2) > 0, and Cov(x_1, x_2) < 0
Gaussian Discriminant Analysis (Gaussian Bayes Classifier)
GDA (GBC) decision boundary is based on class posterior:
log p(t_k|x) = log p(x|t_k) + log p(t_k) − log p(x)
             = −(d/2) log(2π) − (1/2) log|Σ_k| − (1/2) (x − µ_k)^T Σ_k^{-1} (x − µ_k) + log p(t_k) − log p(x)
Decision: take the class with the highest posterior probability
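A small sketch of this decision rule; the −(d/2) log(2π) and − log p(x) terms are dropped because they are the same for every class (the function name and structure are mine):

```python
import numpy as np

def gda_predict(x, mus, Sigmas, priors):
    """Return argmax_k of log p(t_k | x), up to terms shared by all classes."""
    scores = []
    for mu, Sigma, prior in zip(mus, Sigmas, priors):
        diff = x - mu
        maha = diff @ np.linalg.solve(Sigma, diff)   # (x - mu_k)^T Sigma_k^{-1} (x - mu_k)
        _, logdet = np.linalg.slogdet(Sigma)         # log |Sigma_k|
        scores.append(-0.5 * logdet - 0.5 * maha + np.log(prior))
    return int(np.argmax(scores))
```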
Decision Boundary
Figure: class-conditional likelihoods, the posterior for t_1, and the discriminant P(t_1|x) = 0.5
Decision Boundary with a Shared Covariance Matrix
Learning
Learn the parameters using maximum likelihood
ℓ(φ, µ_0, µ_1, Σ) = − log ∏_{n=1}^N p(x^{(n)}, t^{(n)} | φ, µ_0, µ_1, Σ)
                  = − log ∏_{n=1}^N p(x^{(n)} | t^{(n)}, µ_0, µ_1, Σ) p(t^{(n)} | φ)
What have we assumed?
More on MLE
Assume the prior is Bernoulli (we have two classes)
p(t|φ) = φ^t (1 − φ)^{1−t}
You can compute the ML estimate in closed form
φ = (1/N) ∑_{n=1}^N 1[t^{(n)} = 1]

µ_0 = ( ∑_{n=1}^N 1[t^{(n)} = 0] · x^{(n)} ) / ( ∑_{n=1}^N 1[t^{(n)} = 0] )

µ_1 = ( ∑_{n=1}^N 1[t^{(n)} = 1] · x^{(n)} ) / ( ∑_{n=1}^N 1[t^{(n)} = 1] )

Σ = (1/N) ∑_{n=1}^N (x^{(n)} − µ_{t^{(n)}}) (x^{(n)} − µ_{t^{(n)}})^T
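These closed-form estimates translate directly into code; a sketch for the two-class, shared-covariance case (the function name is mine; labels are assumed to be 0/1):

```python
import numpy as np

def fit_gda_shared_cov(X, t):
    """ML estimates (phi, mu0, mu1, Sigma) for two-class GDA with a shared covariance.
    X: (N, d) array of inputs, t: (N,) array of labels in {0, 1}."""
    N = X.shape[0]
    phi = np.mean(t == 1)                              # fraction of class-1 examples
    mu0 = X[t == 0].mean(axis=0)
    mu1 = X[t == 1].mean(axis=0)
    diff = X - np.where(t[:, None] == 1, mu1, mu0)     # x^(n) - mu_{t^(n)}
    Sigma = diff.T @ diff / N
    return phi, mu0, mu1, Sigma
```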
Gaussian Discriminant Analysis vs Logistic Regression
If you examine p(t = 1|x) under GDA, you will find that it looks like this:
p(t = 1|x, φ, µ_0, µ_1, Σ) = 1 / (1 + exp(−w^T x))
where w is an appropriate function of (φ, µ_0, µ_1, Σ)
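For the shared-covariance case w can be written out explicitly. This is the standard derivation (not spelled out on the slide), with the bias term absorbed into w by appending a constant 1 to x, since the posterior above is written as a function of w^T x only:

p(t = 1|x) = p(x|t = 1) φ / ( p(x|t = 1) φ + p(x|t = 0) (1 − φ) ) = 1 / (1 + exp(−a(x))),
where a(x) = log[ p(x|t = 1) φ / ( p(x|t = 0) (1 − φ) ) ].
With a shared Σ the quadratic terms cancel, leaving
a(x) = (µ_1 − µ_0)^T Σ^{-1} x − (1/2) µ_1^T Σ^{-1} µ_1 + (1/2) µ_0^T Σ^{-1} µ_0 + log( φ / (1 − φ) ),
which is linear in x, i.e., of the form w^T x + b.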
So the decision boundary has the same form as logistic regression!
When should we prefer GDA to LR, and vice versa?
Gaussian Discriminant Analysis vs Logistic Regression
GDA makes a stronger modeling assumption: it assumes the class-conditional data are multivariate Gaussian
If this is true, GDA is asymptotically efficient (best model in limit of large N)
But LR is more robust, less sensitive to incorrect modeling assumptions
Many class-conditional distributions lead to a logistic classifier
When these distributions are non-Gaussian, in the limit of large N, LR beats GDA
Simplifying the Model
What if x is high-dimensional?
For the Gaussian Bayes Classifier, if the input x is high-dimensional, then the covariance matrix has many parameters
Save some parameters by using a shared covariance for the classes
Any other idea you can think of?
Naive Bayes
Naive Bayes is an alternative generative model: it assumes the features are independent given the class
p(x|t = k) = ∏_{i=1}^d p(x_i|t = k)
Assuming the likelihoods are Gaussian, how many parameters are required for the Naive Bayes classifier?
Important note: Naive Bayes does not assume a particular form for the per-feature distributions p(x_i|t = k); the conditional independence assumption is what defines it
Naive Bayes Classifier
Given:
- the prior p(t = k)
- the likelihood p(x_i|t = k) for each x_i, assuming the features are conditionally independent given the class
The decision rule
y = argmax_k p(t = k) ∏_{i=1}^d p(x_i|t = k)
If the assumption of conditional independence holds, NB is the optimal classifier
If not, it is a heavily regularized version of the full generative classifier
What's the regularization?
Note: NB's assumptions (conditional independence) typically do not hold in practice. However, the resulting algorithm still works well on many problems, and it typically serves as a decent baseline for more sophisticated models (a sketch of the decision rule with non-Gaussian likelihoods follows below)
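As a small illustration that the per-feature likelihoods need not be Gaussian, here is a sketch of the decision rule with two binary features and table-based (categorical) likelihoods; all numbers are made up:

```python
import numpy as np

# priors[k] = p(t = k); tables[k][i][v] = p(x_i = v | t = k)  (made-up numbers)
priors = [0.6, 0.4]
tables = [
    [{0: 0.7, 1: 0.3}, {0: 0.5, 1: 0.5}],   # class 0: two binary features
    [{0: 0.2, 1: 0.8}, {0: 0.9, 1: 0.1}],   # class 1
]

def nb_predict(x):
    """y = argmax_k p(t = k) * prod_i p(x_i | t = k), computed in log space for stability."""
    scores = [np.log(priors[k]) + sum(np.log(tables[k][i][v]) for i, v in enumerate(x))
              for k in range(len(priors))]
    return int(np.argmax(scores))

print(nb_predict([1, 0]))   # -> 1 with these made-up tables
```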
Gaussian Naive Bayes
Gaussian Naive Bayes classifier assumes that the likelihoods are Gaussian:
p(x_i|t = k) = 1 / (√(2π) σ_ik) · exp[ −(x_i − µ_ik)^2 / (2σ_ik^2) ]
(this is just a 1-dim Gaussian, one for each input dimension)
The model is the same as Gaussian Discriminant Analysis with a diagonal covariance matrix
Maximum likelihood estimate of parameters
µ_ik = ( ∑_{n=1}^N 1[t^{(n)} = k] · x_i^{(n)} ) / ( ∑_{n=1}^N 1[t^{(n)} = k] )

σ_ik^2 = ( ∑_{n=1}^N 1[t^{(n)} = k] · (x_i^{(n)} − µ_ik)^2 ) / ( ∑_{n=1}^N 1[t^{(n)} = k] )
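Putting the pieces together, a compact Gaussian Naive Bayes sketch (the function names are mine; labels are assumed to be 0, ..., K−1):

```python
import numpy as np

def fit_gaussian_nb(X, t, num_classes):
    """ML estimates: class priors plus per-class, per-dimension means and variances."""
    priors, means, variances = [], [], []
    for k in range(num_classes):
        Xk = X[t == k]
        priors.append(len(Xk) / len(X))
        means.append(Xk.mean(axis=0))      # mu_ik for i = 1..d
        variances.append(Xk.var(axis=0))   # sigma^2_ik (divides by N_k, matching the slide)
    return np.array(priors), np.array(means), np.array(variances)

def predict_gaussian_nb(x, priors, means, variances):
    """argmax_k [ log p(t = k) + sum_i log N(x_i; mu_ik, sigma^2_ik) ], in log space."""
    log_lik = -0.5 * np.log(2 * np.pi * variances) - (x - means) ** 2 / (2 * variances)
    return int(np.argmax(np.log(priors) + log_lik.sum(axis=1)))
```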
Decision Boundary: Shared Variances (between Classes)
(variances are shared between the classes but may differ across input dimensions)
Decision Boundary: isotropic
Same variance across all classes and input dimensions, all class priors equal
Classification only depends on distance to the mean. Why?
Decision Boundary: isotropic
In this case σ_{i,k} = σ (just one parameter), and the class priors are equal (e.g., p(t_k) = 0.5 in the 2-class case)
Going back to class posterior for GDA:
log p(t_k|x) = log p(x|t_k) + log p(t_k) − log p(x)
             = −(d/2) log(2π) − (1/2) log|Σ_k| − (1/2) (x − µ_k)^T Σ_k^{-1} (x − µ_k) + log p(t_k) − log p(x)
where we take Σ_k = σ^2 I and ignore terms that don't depend on k (they don't matter when we take the max over classes):
log p(t_k|x) = −(1/(2σ^2)) (x − µ_k)^T (x − µ_k) + const
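So with equal priors and Σ_k = σ^2 I, the prediction reduces to choosing the nearest class mean; a tiny sketch of that rule:

```python
import numpy as np

def predict_nearest_mean(x, mus):
    """With Sigma_k = sigma^2 I and equal priors, the most probable class is the
    one whose mean is closest to x in Euclidean distance."""
    return int(np.argmin([np.sum((x - mu) ** 2) for mu in mus]))
```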