
Bayesian Classification Methods

Suchit Mehrotra

North Carolina State University

[email protected]

October 24, 2014


How do you define probability?

There exists a fundamental difference between how Frequentists and Bayesians view probability:

Frequentist: Think of probability as choosing a random point in a Venn diagram and determining the percentage of times the point would fall in a certain set as you repeat this process an infinite number of times. Seeks to be "objective".

Bayesian: The probability assigned to an event varies from person to person. It is the condition under which a person would make a bet with their own money. Reflects a state of belief.


Conditional Probability and Bayes Rule

The fundamental mechanism through which a Bayesian updates their beliefs about a certain event is Bayes Rule. Let A and B be events.

Conditional Probability:

P(A|B) = P(A ∩ B) / P(B),  where P(B) > 0  =⇒  P(A ∩ B) = P(B|A)P(A)

Bayes Rule:

P(A|B) = P(B|A)P(A) / P(B)
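As a quick numeric illustration of the rule (the diagnostic-test numbers below are assumed for this sketch, not from the slides), Bayes Rule converts P(B|A) into P(A|B):

# Illustrative numbers, assumed for this sketch
p_A <- 0.01             # P(A): prevalence of a condition
p_B_given_A <- 0.95     # P(B|A): probability of a positive test given A
p_B_given_notA <- 0.05  # P(B|not A): false positive rate

# Law of total probability for P(B), then Bayes Rule for P(A|B)
p_B <- p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
p_A_given_B <- p_B_given_A * p_A / p_B
p_A_given_B             # roughly 0.16, despite the accurate test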


Posterior Distributions

Let θ be a parameter of interest and y be observed data. The Frequentist approach considers θ to be fixed and the data to be random, whereas Bayesians view θ as a random variable and y as fixed.

The posterior distribution for θ is an application of Bayes Rule:

π(θ|y) = fY|θ(y|θ) fΘ(θ) / fY(y)

fΘ(θ) encapsulates our prior belief about the parameter. Our prior beliefs are then updated by the data we observe.

Since θ is treated as a random variable, we can now make probability statements about it.


Primary Goals

Frequentist:

Calculate point estimates for the parameters and use standard errors to calculate confidence intervals.

Bayesian:

Derive the posterior distribution of the parameter.

Summarize this distribution with information about the mean, median, mode, quantiles, variance, etc.

Use credible intervals to show ranges which contain the parameter with high probability.


Bayesian Learning

The Bayesian framework provides us with a mathematically rigorous way to update our beliefs as we gather new information.

There are no restrictions on what can constitute prior information. Therefore, we can use the posterior distribution from time 1 as our prior for time 2.

π1(θ|y1) ∝ fY|θ(y1|θ) fΘ(θ)

π2(θ|y1, y2) ∝ fY|θ(y2|θ) π1(θ|y1)

...

πn(θ|y1, . . . , yn) ∝ fY|θ(yn|θ) πn−1(θ|y1, . . . , yn−1)
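A minimal sketch of this updating chain, assuming a conjugate Beta prior with Bernoulli (0/1) data so that each posterior is available in closed form (toy data, not from the slides):

# Sequential updating: yesterday's posterior is today's prior.
# With a Beta(a, b) prior and 0/1 data y, the posterior is
# Beta(a + sum(y), b + length(y) - sum(y)).
a <- 1; b <- 1                                       # flat Beta(1, 1) prior
batches <- list(c(1, 0, 1), c(1, 1, 1, 0), c(0, 1))  # assumed toy batches y1, y2, y3

for (y in batches) {
  a <- a + sum(y)
  b <- b + length(y) - sum(y)
  cat(sprintf("posterior: Beta(%g, %g), mean = %.3f\n", a, b, a / (a + b)))
}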


Overview of Differences [Gill (2008)]

Interpretation of Probability

Frequentist: Observed result from an infinite series of trials performed or imagined under identical conditions. The probabilistic quantity of interest is p(data|H0).

Bayesian: Probability is the researcher/observer "degree of belief" before or after the data are observed. The probabilistic quantity of interest is p(θ|data).

What is Fixed and Variable

Frequentist: Data are an iid random sample from a continuous stream. Parameters are fixed by nature.

Bayesian: Data are observed and so fixed by the sample generated. Parameters are unknown and described distributionally.

How are Results Summarized?

Frequentist: Point estimates and standard errors. 95% confidence intervals indicating that 19/20 times the interval covers the true parameter value, on average.

Bayesian: Descriptions of posteriors such as means and quantiles. Highest posterior density intervals indicating the region of highest probability.


Linear Regression (Frequentist)

Consider the linear model y = Xβ + e, where X is an n × k matrix with rank k, β is a k × 1 vector of coefficients, and y is an n × 1 vector of responses. e is a vector of errors, distributed iid N(0, σ²I).

Frequentist regression seeks point estimates by maximizing the likelihood function with respect to β and σ².

L(β, σ² | X, y) = (2πσ²)^(−n/2) exp[ −(1/(2σ²)) (y − Xβ)ᵀ(y − Xβ) ]

The unbiased OLS estimates are:

β̂ = (XᵀX)⁻¹Xᵀy        σ̂² = (y − Xβ̂)ᵀ(y − Xβ̂) / (n − k)
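A quick sketch verifying these closed forms against lm() on simulated data (the simulation and variable names are mine, for illustration):

set.seed(1)
n <- 100; k <- 3
X <- cbind(1, matrix(rnorm(n * (k - 1)), n, k - 1))  # n x k design with intercept
beta <- c(2, -1, 0.5)
y <- X %*% beta + rnorm(n, sd = 1.5)

beta.hat   <- solve(t(X) %*% X, t(X) %*% y)          # (X'X)^{-1} X'y
sigma2.hat <- sum((y - X %*% beta.hat)^2) / (n - k)  # unbiased variance estimate

cbind(manual = as.vector(beta.hat),
      lm = coef(lm(y ~ X - 1)))                      # the two columns agree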


Bayesian Regression

Bayesian regression, on the other hand, finds the posterior distribution of the parameters β and σ².

π(β, σ² | X, y) ∝ L(β, σ² | X, y) fβ,σ²(β, σ²)
              ∝ L(β, σ² | X, y) fβ|σ²(β | σ²) fσ²(σ²)

Setup           Prior                       Posterior
Conjugate       β | σ² ∼ N(b, σ²)           β | X ∼ t(n + a − k)
                σ² ∼ IG(a, b)               σ² | X ∼ IG(n + a − k, ½ σ̂²(n + a − k))
Uninformative   β ∝ c over [−∞, ∞]          β | X ∼ t(n − k)
                σ² ∝ 1/σ over [0, ∞]        σ² | X ∼ IG(½(n − k − 1), ½ σ̂²(n − k))

Gill (2008)

Additionally, the posterior distribution of β conditioned on σ² in the non-informative case is:

β | σ², y ∼ N(β̂, σ²Vβ)  where  Vβ = (XᵀX)⁻¹
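This conditional structure suggests a simple composition sampler for the non-informative case: draw σ² from its marginal posterior, then β | σ² from the normal above. A minimal sketch (my own function and variable names; the blinreg function used in the next example is based on this same two-step scheme):

# sigma^2 | y is scaled inverse-chi^2 with n - k df; then beta | sigma^2, y
# is N(beta.hat, sigma^2 * V) with V = (X'X)^{-1}
posterior_draws <- function(X, y, m = 5000) {
  n <- nrow(X); k <- ncol(X)
  V        <- solve(t(X) %*% X)
  beta.hat <- V %*% t(X) %*% y
  s2       <- sum((y - X %*% beta.hat)^2) / (n - k)
  sigma2   <- (n - k) * s2 / rchisq(m, df = n - k)   # inverse-chi^2 draws
  beta <- t(sapply(sigma2, function(s)
    beta.hat + t(chol(s * V)) %*% rnorm(k)))         # beta | sigma^2 ~ N(beta.hat, s * V)
  list(beta = beta, sigma = sqrt(sigma2))
}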


Example: Median Home Prices

From the MASS library: data regarding 506 housing values in suburbs of Boston.

Fit a model with medv (median value of owner-occupied homes) as the response and four predictors.

Implementation with the blinreg function in the LearnBayes package.

library(MASS)        # Boston data
library(LearnBayes)  # blinreg

boston.fit <- lm(medv ~ crim + indus + chas + rm, data = Boston,
                 x = TRUE, y = TRUE)
theta.sample <- blinreg(boston.fit$y, boston.fit$x, 100000)
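From here, posterior summaries are just summaries of the draws. Assuming blinreg's return value is a list with a matrix of β draws ($beta) and a vector of σ draws ($sigma), 95% credible intervals are:

# 95% credible intervals from the simulated posterior
apply(theta.sample$beta, 2, quantile, probs = c(0.025, 0.975))
quantile(theta.sample$sigma, probs = c(0.025, 0.975))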


Example: Posterior Samples of Parameters

[Figure: histograms of the posterior samples for β1 (Crime), β2 (Industry), β3 (Charles River), β4 (Rooms), and σ (Std. Deviation).]


Example: Confidence Intervals for Parameters

Bayesian regression with non-informative priors leads to estimates similar to OLS.

OLS Confidence Intervals:

              2.5 %   97.5 %
(Intercept)  -26.43   -15.29
crim          -0.26    -0.12
indus         -0.35    -0.18
chas           2.48     6.65
rm             6.62     8.25

Bayesian Credible Intervals:

        Intercept    crim   indus   chas     rm
2.5%       -26.42   -0.26   -0.35   2.48   6.62
97.5%      -15.30   -0.12   -0.18   6.65   8.25


Logistic Regression

Like Linear Regression, Logistic Regression is also a likelihood maximization problem in the frequentist setup. When the number of parameters is two, the log-likelihood function is:

ℓ(β0, β1 | y) = β0 ∑ᵢ yᵢ + β1 ∑ᵢ xᵢyᵢ − ∑ᵢ log(1 + e^(β0 + β1xᵢ)),  with sums over i = 1, …, n
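As a check, this log-likelihood is direct to code and maximize; a sketch with simulated data (mine, for illustration) that reproduces glm():

# Two-parameter logistic log-likelihood, exactly as written above
loglik <- function(b, x, y) {
  b[1] * sum(y) + b[2] * sum(x * y) - sum(log(1 + exp(b[1] + b[2] * x)))
}

set.seed(2)
x <- rnorm(200)                             # toy predictor
y <- rbinom(200, 1, plogis(-0.5 + 1.2 * x))

fit <- optim(c(0, 0), loglik, x = x, y = y,
             control = list(fnscale = -1))  # fnscale = -1 makes optim maximize
rbind(optim = fit$par, glm = coef(glm(y ~ x, family = binomial)))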

In the Bayesian setting, we incorporate prior information and find the posterior distribution of the parameters:

π(β | X, y) ∝ L(β | X, y) fβ(β)

The standard "weakly-informative" prior used in the arm package is the Student-t distribution with 1 degree of freedom (the Cauchy distribution).


Logistic Regression: Example with Cancer Data Set

Goal: Run Logistic Regression on the cancer data set from the Universityof Wisconsin using the bayesglm function in the arm package.

library(arm)  # bayesglm

cancer.bayesglm <- bayesglm(malignant ~ ., data = cancer,
                            family = binomial(link = "logit"))

Using this function, you can get point estimates (posterior modes) for your parameters and their standard errors. These estimates will be the same as the output given by glm in most cases without substantive prior information.

                Estimate   Std. Error
(Intercept)      -8.2409       1.2842
id                0.0000       0.0000
thickness         0.4522       0.1270
unif.size         0.0063       0.1676
unif.shape        0.3642       0.1876
adhesion          0.2002       0.1067
cell.size         0.1460       0.1458
bare.nuclei1     -1.1499       0.8526

Some estimates from bayesglm


Logistic Regression: Example with Cancer Data Set

[Figure: fitted probability of malignant (0 to 1) as a function of thickness (2.5 to 10).]


Logistic Regression: Separation

Separation occurs when a particular linear combination of predictors is always associated with a particular response.

Can lead to maximum likelihood estimates of ∞ for some parameters!

A common fix is to drop predictors from the model.

Can lead to the best predictors being dropped from the model [Zorn (2005)].

A 2008 paper by Gelman, Jakulin, Pittau & Su proposes a Bayesian fix.

Use the t-distribution as the "weakly-informative" prior. They created bayesglm in the arm (Applied Regression and Multilevel Modeling) package with the t-distribution as the default prior.

Using bayesglm instead of glm can solve this problem!
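A small sketch reproducing the phenomenon on simulated data (assumed here for illustration; glm() warns and returns inflated estimates, bayesglm() does not):

library(arm)

set.seed(3)
x <- rnorm(60, mean = 2, sd = 2)
y <- as.numeric(x > 2)                              # perfectly separated at x = 2

glm.fit      <- glm(y ~ x, family = binomial)       # huge coefficients and SEs
bayesglm.fit <- bayesglm(y ~ x, family = binomial)  # Cauchy prior keeps them finite

rbind(glm = coef(glm.fit), bayesglm = coef(bayesglm.fit))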


Logistic Regression: Separation Example

[Figure: separated data; y is 0 for all x below a threshold and 1 above it, with x ranging from −2 to 6.]


Logistic Regression: Separation Example

This problem is fixed using bayesglm:

[Figure: the same separated data with the smooth fitted probability curve from bayesglm.]


Logistic Regression: Separation Example

glm fit:

               Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)   -164.0925   69589.0320     -0.00     0.9981
x               75.6372   32263.2265      0.00     0.9981

bayesglm fit:

               Estimate   Std. Error
(Intercept)     -4.0144       0.9054
x                1.6192       0.3867


Naive Bayes: Overview

Like Linear Discriminant Analysis, Naive Bayes uses Bayes Rule to determine the probability that a particular outcome is in class Cℓ.

P(Y = Cℓ | X) = P(X | Y = Cℓ) P(Y = Cℓ) / P(X)

P(Y = Cℓ | X) is the posterior probability of the class given a set of predictors.

P(Y = Cℓ) is the prior probability of the outcome.

P(X) is the probability of the predictors.

P(X | Y = Cℓ) is the probability of observing a set of predictors given that Y belongs to class ℓ.


Naive Bayes: Independence Assumption

P(X | Y = Cℓ) is extremely difficult to calculate without a strong set of assumptions. The Naive Bayes model simplifies this calculation with the assumption that all predictors are mutually independent given Y.

P(Y = Cℓ | X1, …, Xn) = P(Y = Cℓ) P(X1, …, Xn | Y = Cℓ) / P(X1, …, Xn)
                      ∝ P(Y = Cℓ) P(X1, …, Xn | Y = Cℓ)
                      ∝ P(Y = Cℓ) P(X1 | Y = Cℓ) ⋯ P(Xn | Y = Cℓ)
                      ∝ P(Y = Cℓ) ∏ᵢ₌₁ⁿ P(Xi | Y = Cℓ)


Naive Bayes: Class Probabilities and Assignment

The classifier assigns the class which has the highest posterior probability:

argmax over Cℓ of  P(Y = Cℓ) ∏ᵢ₌₁ⁿ P(Xi = xi | Y = Cℓ)

We need to assume probabilities for the class priors. We have two basic options:

Equiprobable classes: P(Cℓ) = 1 / (number of classes)

Calculation from the training set: P(Cℓ) = (number of samples in class ℓ) / (total number of samples)

A third option is to use information based on prior beliefs about the probability that Y = Cℓ.
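A by-hand sketch of this argmax on the iris data, using Gaussian class-conditionals and training-set priors (illustrative; variable names are mine):

# log P(C) + sum_i log P(x_i | C), maximized over classes C
X   <- iris[, 1:4]
cls <- levels(iris$Species)
prior <- table(iris$Species) / nrow(iris)       # training-set class priors

log_post <- sapply(cls, function(cl) {
  Xc <- X[iris$Species == cl, ]
  log(prior[cl]) + rowSums(sapply(seq_along(X), function(j)
    dnorm(X[[j]], mean(Xc[[j]]), sd(Xc[[j]]), log = TRUE)))
})
pred <- cls[max.col(log_post)]                  # argmax over classes
table(actual = iris$Species, predicted = pred)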


Naive Bayes: Distributional Assumptions

Even though the independence assumption simplifies the classifier, we still need to make assumptions about the predictors.

In the case of continuous predictors, the class probabilities are calculated using a Normal Distribution by default.

The classifier estimates µℓ and σℓ for each predictor:

P(X = x | Y = Cℓ) = (1 / (σℓ√(2π))) exp[ −(x − µℓ)² / (2σℓ²) ]

Other choices include:

Multinomial

Bernoulli

Semi-supervised methods


Naive Bayes: Simple Example

Classify the Iris dataset using the NaiveBayes function in the klaR package.
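A minimal sketch of the fit (object names are mine; NaiveBayes uses Gaussian class-conditionals by default):

library(klaR)

iris.nb   <- NaiveBayes(Species ~ ., data = iris)
iris.pred <- predict(iris.nb, iris[, 1:4])       # predicted classes and posteriors
table(iris$Species, iris.pred$class)             # confusion matrix on the next slide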

[Figure: Iris Data — pairs plot of Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width.]


Naive Bayes: Simple Example

[Figure: estimated class-conditional densities of Petal.Length and Petal.Width for each species.]

             setosa   versicolor   virginica
setosa           50            0           0
versicolor        0           47           3
virginica         0            3          47


Naive Bayes: Non-Parametric Estimation

cancer.nb <- NaiveBayes(malignant ~ ., data = cancer, usekernel = TRUE)

usekernel = TRUE (97% correct):

       0     1
0    445     9
1     12   232

usekernel = FALSE (96% correct):

       0     1
0    434     6
1     23   235

[Figure: kernel density estimates of thickness (2 to 10) by class.]


Naive Bayes: Priors

Bayesian approaches allow for the inclusion of prior information into the model. In this context, priors are applied to the probability of belonging to a group.

P(0) = 0.1; P(1) = 0.9:

       0     1
0    442     3
1     15   238

P(0) = 0.25; P(1) = 0.75:

       0     1
0    443     4
1     14   237

P(0) = 0.5; P(1) = 0.5:

       0     1
0    444     7
1     13   234

P(0) = 0.9; P(1) = 0.1:

       0     1
0    445    18
1     12   223
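klaR's NaiveBayes accepts the class prior directly, so the matrices above correspond to refitting with different prior vectors. A sketch (the cancer data frame and its factor ordering are assumed):

library(klaR)

# prior follows the order of the factor levels of malignant: c(P(0), P(1))
cancer.nb.10 <- NaiveBayes(malignant ~ ., data = cancer,
                           prior = c(0.1, 0.9), usekernel = TRUE)
pred <- predict(cancer.nb.10, cancer)$class
table(actual = cancer$malignant, predicted = pred)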


Naive Bayes: Sepsis Example

Fit a Naive Bayes model on data regarding whether or not a child develops sepsis based on three predictors. Data from research done at Kaiser Permanente (courtesy of Dr. David Draper at UCSC).

Response is whether or not a child developed sepsis (66,846 observations).

This is an extremely bad outcome for a child. They either die or go into a permanent coma.

Two predictors, ANC and I2T, are derived from a blood test called the Complete Blood Count.

The third predictor is the baby's age in hours at which the blood test results were obtained.

In this scenario, our goal is likely to predict accurately which children will get sepsis. The overall classification rate is likely not as important.


Naive Bayes: Sepsis Example

[Figure: I2T vs. ANC scatterplots in three panels: Age ≤ 1, 1 < Age ≤ 4, and 4 < Age.]


Naive Bayes: Sepsis Example

As can be seen in the confusion matrices below, the overall classification rate of around 98.6% does not make up for the fact that we correctly classify the sepsis cases only 11% of the time.

usekernel = FALSE:

        0      1
0   65887    216
1     715     28

usekernel = TRUE:

        0      1
0   66589    239
1      13      5


Naive Bayes: Pros and Cons

Advantages:

Due to the independence assumption, there is no curse of dimensionality! We can fit models with p > n without any issues.

The independence assumption simplifies the math and speeds up the model-fitting process.

We can include prior information to increase accuracy.

We have the option of not making distributional assumptions about each predictor.

Disadvantages:

If the independence assumptions do not hold, this model can perform poorly.

The non-parametric approach can overfit the data.

Our prior information could be wrong.


Logistic Regression with Sepsis Data

How does Logistic regression fare with the Sepsis data?

Cutoff: 0.1, % Correct: 99.51%

        0      1
0   66510    233
1      92     11

Cutoff: 0.05, % Correct: 99.14%

        0      1
0   66245    217
1     357     27

Cutoff: 0.025, % Correct: 98.20%

        0      1
0   65582    180
1    1020     64

Cutoff: 0.01, % Correct: 94.91%

        0      1
0   63339    135
1    3263    109
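A sketch of how such a cutoff sweep might be produced (the sepsis data frame and its column names here are hypothetical):

# Assumed: data frame 'sepsis' with 0/1 response y and predictors ANC, I2T, age
sepsis.glm <- glm(y ~ ANC + I2T + age, data = sepsis, family = binomial)
p.hat <- predict(sepsis.glm, type = "response")

for (cutoff in c(0.1, 0.05, 0.025, 0.01)) {
  pred <- as.numeric(p.hat > cutoff)          # classify as sepsis above the cutoff
  cat(sprintf("Cutoff: %.3f, %% Correct: %.2f%%\n",
              cutoff, 100 * mean(pred == sepsis$y)))
  print(table(actual = sepsis$y, predicted = pred))
}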


The End

Questions?
