Bayesian Classification Methods
Suchit Mehrotra
North Carolina State University
October 24, 2014
Suchit Mehrotra (NCSU) Bayesian Classification October 24, 2014 1 / 33
How do you define probability?
There is a fundamental difference between how Frequentists and Bayesians view probability:

Frequentist: Think of probability as choosing a random point in a Venn diagram and determining the percentage of times the point would fall in a certain set as you repeat this process an infinite number of times. Seeks to be "objective".

Bayesian: The probability assigned to an event varies from person to person. It is the condition under which a person would make a bet with their own money. Reflects a state of belief.
Conditional Probability and Bayes Rule
The fundamental mechanism through which a Bayesian updates their beliefs about an event is Bayes' Rule. Let A and B be events.

Conditional Probability:

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \implies P(A \cap B) = P(B \mid A)\, P(A) \]

Bayes' Rule:

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \]
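Plugging numbers into Bayes' Rule makes the mechanics concrete. A minimal Python sketch, using a hypothetical diagnostic-test scenario (the function and numbers are illustrative, not from the talk):

```python
# Bayes' Rule: P(A|B) = P(B|A) P(A) / P(B), with P(B) expanded by the
# law of total probability. All probabilities here are hypothetical.
def bayes_rule(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B) from P(B|A), P(A), and P(B|not A)."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # total probability
    return p_b_given_a * p_a / p_b

# Example: a test with 99% sensitivity and a 5% false-positive rate,
# applied to an event with prior probability 1%.
posterior = bayes_rule(p_b_given_a=0.99, p_a=0.01, p_b_given_not_a=0.05)
print(round(posterior, 4))  # -> 0.1667
```

Even a highly sensitive test yields a modest posterior when the prior is small, which is exactly the kind of belief-updating the rule formalizes.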
Posterior Distributions
Let θ be a parameter of interest and y be observed data. The Frequentist approach considers θ to be fixed and the data to be random, whereas Bayesians view θ as a random variable and y as fixed.
The posterior distribution for θ is an application of Bayes Rule:
\[ \pi(\theta \mid y) = \frac{f_{Y \mid \theta}(y \mid \theta)\, f_{\Theta}(\theta)}{f_Y(y)} \]
fΘ(θ) encapsulates our prior belief about the parameter. Our prior beliefs are then updated by the data we observe.
Since θ is treated as a random variable, we can now make probabilitystatements about it.
Primary Goals
Frequentist:
Calculate point estimates for the parameters and use standard errors to calculate confidence intervals.
Bayesian:
Derive the posterior distribution of the parameter.
Summarize this distribution with information about the mean, median, mode, quantiles, variance, etc.

Use credible intervals to show ranges which contain the parameter with high probability.
Bayesian Learning
The Bayesian framework provides us with a mathematically rigorous way to update our beliefs as we gather new information.

There are no restrictions on what can constitute prior information. Therefore, we can use the posterior distribution from time 1 as our prior for time 2.
\[ \pi_1(\theta \mid y_1) \propto f_{Y \mid \theta}(y_1 \mid \theta)\, f_{\Theta}(\theta) \]
\[ \pi_2(\theta \mid y_1, y_2) \propto f_{Y \mid \theta}(y_2 \mid \theta)\, \pi_1(\theta \mid y_1) \]
\[ \vdots \]
\[ \pi_n(\theta \mid y_1, \ldots, y_n) \propto f_{Y \mid \theta}(y_n \mid \theta)\, \pi_{n-1}(\theta \mid y_1, \ldots, y_{n-1}) \]
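This sequential-updating chain has a closed form in conjugate models. A Python sketch with a hypothetical Beta-Bernoulli example (the prior and data are made up for illustration): each posterior becomes the next prior, and the result matches updating on all the data at once.

```python
def update(a, b, y):
    """One Bayesian update: a Beta(a, b) prior plus a Bernoulli observation y
    gives a Beta(a + y, b + 1 - y) posterior."""
    return a + y, b + (1 - y)

data = [1, 0, 1, 1, 0, 1]
a, b = 1, 1  # uniform Beta(1, 1) prior at time 0
for y in data:        # the posterior at step t becomes the prior at step t + 1
    a, b = update(a, b, y)

# Identical to one batch update: Beta(1 + sum(data), 1 + len(data) - sum(data))
print(a, b)  # -> 5 3
```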
Overview of Differences [Gill (2008)]
Interpretation of Probability
Frequentist: Observed result from an infinite series of trials performed or imagined under identical conditions. The probabilistic quantity of interest is p(data | H0).
Bayesian: Probability is the researcher/observer's "degree of belief" before or after the data are observed. The probabilistic quantity of interest is p(θ | data).

What is Fixed and Variable
Frequentist: Data are an iid random sample from a continuous stream. Parameters are fixed by nature.
Bayesian: Data are observed and so fixed by the sample generated. Parameters are unknown and described distributionally.

How are Results Summarized?
Frequentist: Point estimates and standard errors; 95% confidence intervals indicating that 19/20 times the interval covers the true parameter value, on average.
Bayesian: Descriptions of posteriors such as means and quantiles; highest posterior density intervals indicating the region of highest probability.
Linear Regression (Frequentist)
Consider the linear model y = Xβ + e, where X is an n × k matrix with rank k, β is a k × 1 vector of coefficients, and y is an n × 1 vector of responses. e is a vector of errors, iid N(0, σ²I).

Frequentist regression seeks point estimates by maximizing the likelihood function with respect to β and σ²:

\[ L(\beta, \sigma^2 \mid X, y) = (2\pi\sigma^2)^{-n/2} \exp\left[ -\frac{1}{2\sigma^2} (y - X\beta)^T (y - X\beta) \right] \]

The unbiased OLS estimates are:

\[ \hat{\beta} = (X^T X)^{-1} X^T y, \qquad \hat{\sigma}^2 = \frac{(y - X\hat{\beta})^T (y - X\hat{\beta})}{n - k} \]
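The closed-form estimates can be verified numerically. A NumPy sketch on simulated data (the design matrix and coefficients are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # includes intercept
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# OLS / MLE point estimate: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k)  # unbiased variance estimate divides by n - k

print(beta_hat.round(2), round(float(sigma2_hat), 2))
```

The residuals are orthogonal to the columns of X by construction, which is a quick sanity check that the normal equations were solved correctly.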
Bayesian Regression
Bayesian regression, on the other hand, finds the posterior distribution of the parameters β and σ²:

\[ \pi(\beta, \sigma^2 \mid X, y) \propto L(\beta, \sigma^2 \mid X, y)\, f_{\beta, \sigma^2}(\beta, \sigma^2) \propto L(\beta, \sigma^2 \mid X, y)\, f_{\beta \mid \sigma^2}(\beta \mid \sigma^2)\, f_{\sigma^2}(\sigma^2) \]
Setup          Prior                                          Posterior
Conjugate      β | σ² ~ N(b, σ²),  σ² ~ IG(a, b)              β | X ~ t(n + a − k),  σ² | X ~ IG(n + a − k, ½ σ̂²(n + a − k))
Uninformative  β ∝ c over (−∞, ∞),  σ² ∝ 1/σ over (0, ∞)      β | X ~ t(n − k),  σ² | X ~ IG(½(n − k − 1), ½ σ̂²(n − k))

[Gill (2008)]
Additionally, the posterior distribution of β conditioned on σ² in the non-informative case is:

\[ \beta \mid \sigma^2, y \sim N(\hat{\beta}, \sigma^2 V_\beta), \quad \text{where } V_\beta = (X^T X)^{-1} \]
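This conditional normal, combined with the inverse-gamma marginal for σ², suggests a simple two-step sampler: draw σ² from its marginal, then β given σ². A NumPy sketch under the non-informative prior (an illustrative reimplementation of the idea, not the LearnBayes source):

```python
import numpy as np

def posterior_sample(X, y, n_draws, seed=0):
    """Draw (beta, sigma^2) from the posterior under the non-informative prior:
    sigma^2 | y ~ IG((n-k)/2, (n-k) s^2 / 2), then
    beta | sigma^2, y ~ N(beta_hat, sigma^2 (X'X)^{-1})."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    s2 = (y - X @ beta_hat) @ (y - X @ beta_hat) / (n - k)
    # Inverse-gamma draws via 1 / Gamma: IG(a, b) = b / Gamma(a, scale=1)
    sigma2 = (n - k) * s2 / (2 * rng.gamma((n - k) / 2, 1.0, size=n_draws))
    chol = np.linalg.cholesky(XtX_inv)
    z = rng.normal(size=(n_draws, k))
    beta = beta_hat + np.sqrt(sigma2)[:, None] * (z @ chol.T)
    return beta, sigma2

# Usage on simulated data: the posterior mean of beta tracks the OLS fit.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)
beta, sigma2 = posterior_sample(X, y, 5000)
print(beta.mean(axis=0).round(1))  # close to the OLS estimate
```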
Example: Median Home Prices
From the MASS library: data regarding 506 housing values in suburbs of Boston.

Fit a model with medv (median value of owner-occupied homes) as the response and four predictors.

Implementation with the blinreg function in the LearnBayes package.
library(MASS)        # Boston housing data
library(LearnBayes)  # blinreg

boston.fit <- lm(medv ~ crim + indus + chas + rm, data = Boston, x = TRUE, y = TRUE)
theta.sample <- blinreg(boston.fit$y, boston.fit$x, 100000)
Example: Posterior Samples of Parameters
[Figure: histograms of posterior samples for β1 (Crime), β2 (Industry), β3 (Charles River), β4 (Rooms), and σ (Std. Deviation).]
Example: Confidence Intervals for Parameters
Bayesian regression with non-informative priors leads to similar estimates as OLS.

OLS Confidence Intervals:

              2.5 %   97.5 %
(Intercept)  -26.43   -15.29
crim          -0.26    -0.12
indus         -0.35    -0.18
chas           2.48     6.65
rm             6.62     8.25

Bayesian Credible Intervals:

         Intercept    crim   indus   chas     rm
2.5%        -26.42   -0.26   -0.35   2.48   6.62
97.5%       -15.30   -0.12   -0.18   6.65   8.25
Logistic Regression
Like linear regression, logistic regression is also a likelihood maximization problem in the frequentist setup. When the number of parameters is two, the log-likelihood function is:

\[ \ell(\beta_0, \beta_1 \mid y) = \beta_0 \sum_{i=1}^{n} y_i + \beta_1 \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} \log\left(1 + e^{\beta_0 + \beta_1 x_i}\right) \]
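The simplified form above is algebraically equal to the usual sum of Bernoulli log-probabilities, which is easy to confirm numerically. A NumPy sketch with made-up data and coefficients:

```python
import numpy as np

def loglik(b0, b1, x, y):
    """Logistic-regression log-likelihood in the simplified form:
    b0*sum(y) + b1*sum(x*y) - sum(log(1 + exp(b0 + b1*x)))."""
    return b0 * y.sum() + b1 * (x * y).sum() - np.log1p(np.exp(b0 + b1 * x)).sum()

x = np.array([-1.0, 0.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])

# Equivalent long form: sum of y*log(p) + (1-y)*log(1-p), p = logistic(b0 + b1*x)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))
long_form = (y * np.log(p) + (1 - y) * np.log(1 - p)).sum()
print(np.isclose(loglik(0.5, 1.2, x, y), long_form))  # -> True
```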
In the Bayesian setting, we incorporate prior information and find the posterior distribution of the parameters:

\[ \pi(\beta \mid X, y) \propto L(\beta \mid X, y)\, f_{\beta}(\beta) \]

The standard "weakly-informative" prior used in the arm package is the Student-t distribution with 1 degree of freedom (the Cauchy distribution).
Logistic Regression: Example with Cancer Data Set
Goal: Run logistic regression on the cancer data set from the University of Wisconsin using the bayesglm function in the arm package.
library(arm)  # bayesglm

cancer.bayesglm <- bayesglm(malignant ~ ., data = cancer,
                            family = binomial(link = "logit"))
Using this function, you can get point estimates (posterior modes) for your parameters and their standard errors. These estimates will be the same as the output given by glm in most cases without substantive prior information.
              Estimate   Std. Error
(Intercept)    -8.2409       1.2842
id              0.0000       0.0000
thickness       0.4522       0.1270
unif.size       0.0063       0.1676
unif.shape      0.3642       0.1876
adhesion        0.2002       0.1067
cell.size       0.1460       0.1458
bare.nuclei1   -1.1499       0.8526

Some estimates from bayesglm
Logistic Regression: Example with Cancer Data Set
[Figure: predicted probability of malignancy as a function of thickness.]
Logistic Regression: Separation
Separation occurs when a particular linear combination of predictors is always associated with a particular response.

Can lead to maximum likelihood estimates of ∞ for some parameters!

A common fix is to drop predictors from the model.

Can lead to the best predictors being dropped from the model [Zorn (2005)].

A 2008 paper by Gelman, Jakulin, Pittau & Su proposes a Bayesian fix:
Use the t-distribution as the "weakly-informative" prior.
They created bayesglm in the arm (Applied Regression and Multilevel Modeling) package with the t-distribution as the default prior.

Using bayesglm instead of glm can solve this problem!
Logistic Regression: Separation Example
[Figure: y vs. x for a data set exhibiting complete separation.]
Logistic Regression: Separation Example
This problem is fixed using bayesglm
[Figure: y vs. x with the fitted logistic curve from bayesglm.]
Logistic Regression: Separation Example
               Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)   -164.0925   69589.0320     -0.00     0.9981
x               75.6372   32263.2265      0.00     0.9981

glm fit

               Estimate   Std. Error
(Intercept)     -4.0144       0.9054
x                1.6192       0.3867

bayesglm fit
Naive Bayes: Overview
Like Linear Discriminant Analysis, Naive Bayes uses Bayes' Rule to determine the probability that a particular outcome is in class C_ℓ:

\[ P(Y = C_\ell \mid X) = \frac{P(X \mid Y = C_\ell)\, P(Y = C_\ell)}{P(X)} \]

P(Y = C_ℓ | X) is the posterior probability of the class given a set of predictors.

P(Y = C_ℓ) is the prior probability of the outcome.

P(X) is the probability of the predictors.

P(X | Y = C_ℓ) is the probability of observing a set of predictors given that Y belongs to class ℓ.
Naive Bayes: Independence Assumption
P(X | Y = C_ℓ) is extremely difficult to calculate without a strong set of assumptions. The Naive Bayes model simplifies this calculation with the assumption that all predictors are mutually independent given Y:

\[ P(Y = C_\ell \mid X_1, \ldots, X_n) = \frac{P(Y = C_\ell)\, P(X_1, \ldots, X_n \mid Y = C_\ell)}{P(X_1, \ldots, X_n)} \]
\[ \propto P(Y = C_\ell)\, P(X_1, \ldots, X_n \mid Y = C_\ell) \]
\[ \propto P(Y = C_\ell)\, P(X_1 \mid Y = C_\ell) \cdots P(X_n \mid Y = C_\ell) \]
\[ \propto P(Y = C_\ell) \prod_{i=1}^{n} P(X_i \mid Y = C_\ell) \]
Naive Bayes: Class Probabilities and Assignment
The classifier assigns the class which has the highest posterior probability:

\[ \operatorname*{argmax}_{C_\ell}\; P(Y = C_\ell) \prod_{i=1}^{n} P(X_i = x_i \mid Y = C_\ell) \]

We need to assume probabilities for the class priors. We have two basic options:

Equiprobable classes: P(C_ℓ) = 1/k, where k is the number of classes.

Calculation from the training set: P(C_ℓ) = (number of samples in class ℓ) / (total number of samples).

A third option is to use information based on prior beliefs about the probability that Y = C_ℓ.
Naive Bayes: Distributional Assumptions
Even though the independence assumption simplifies the classifier, we still need to make distributional assumptions about the predictors.

In the case of continuous predictors, the class probabilities are calculated using a Normal distribution by default. The classifier estimates µ_ℓ and σ_ℓ for each predictor:

\[ P(X = x \mid Y = C_\ell) = \frac{1}{\sigma_\ell \sqrt{2\pi}} \exp\left[ -\frac{1}{2\sigma_\ell^2} (x - \mu_\ell)^2 \right] \]

Other choices include:
Multinomial
Bernoulli
Semi-supervised methods
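The pieces of the classifier (training-fraction priors, per-class normal densities, and the argmax over classes) fit in a short from-scratch sketch. This Python version is illustrative only, not the klaR implementation, and the toy data are invented:

```python
import math

def fit(X, y):
    """Per class: prior = training fraction; per predictor: (mean, sd)."""
    model = {}
    for c in set(y):
        rows = [x for x, yi in zip(X, y) if yi == c]
        prior = len(rows) / len(y)
        stats = []
        for j in range(len(X[0])):
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            sd = (sum((v - mu) ** 2 for v in vals) / len(vals)) ** 0.5
            stats.append((mu, sd))
        model[c] = (prior, stats)
    return model

def predict(model, x):
    """argmax over classes of log P(C) + sum_j log N(x_j; mu_jc, sd_jc)."""
    def log_post(c):
        prior, stats = model[c]
        lp = math.log(prior)
        for xj, (mu, sd) in zip(x, stats):
            lp += -math.log(sd * math.sqrt(2 * math.pi)) - (xj - mu) ** 2 / (2 * sd ** 2)
        return lp
    return max(model, key=log_post)

X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 0.1], [3.2, -0.2], [2.9, 0.0]]
y = ["a", "a", "a", "b", "b", "b"]
m = fit(X, y)
print(predict(m, [1.1, 2.0]), predict(m, [3.1, 0.0]))  # -> a b
```

Working in log space avoids underflow when the product runs over many predictors, which is also what makes the p > n case tractable.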
Naive Bayes: Simple Example
Classify the Iris dataset using the NaiveBayes function in the klaR package.
[Figure: Iris Data, a scatterplot matrix of Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width.]
Naive Bayes: Simple Example
[Figure: estimated class-conditional densities for Petal.Length and Petal.Width by species.]

             setosa   versicolor   virginica
setosa           50            0           0
versicolor        0           47           3
virginica         0            3          47
Naive Bayes: Non-Parametric Estimation
library(klaR)  # NaiveBayes

cancer.nb <- NaiveBayes(malignant ~ ., data = cancer, usekernel = TRUE)
      0     1
0   445     9
1    12   232

usekernel = TRUE (97%)

      0     1
0   434     6
1    23   235

usekernel = FALSE (96%)

[Figure: kernel density estimate of thickness by class.]
Naive Bayes: Priors
Bayesian approaches allow for the inclusion of prior information into the model. In this context, priors are applied to the probability of belonging to a group.
      0     1
0   442     3
1    15   238

P(0) = 0.1; P(1) = 0.9

      0     1
0   443     4
1    14   237

P(0) = 0.25; P(1) = 0.75

      0     1
0   444     7
1    13   234

P(0) = 0.5; P(1) = 0.5

      0     1
0   445    18
1    12   223

P(0) = 0.9; P(1) = 0.1
Naive Bayes: Sepsis Example
Fit a Naive Bayes model on data regarding whether or not a child develops sepsis, based on three predictors. Data from research done at Kaiser Permanente (courtesy of Dr. David Draper at UCSC).

The response is whether or not a child developed sepsis (66,846 observations).

This is an extremely bad outcome for a child: they either die or go into a permanent coma.

Two predictors, ANC and I2T, are derived from a blood test called the Complete Blood Count.

The third predictor is the baby's age in hours at which the blood test results were obtained.

In this scenario, our goal is likely to predict accurately which children will get sepsis. The overall classification rate is likely not as important.
Naive Bayes: Sepsis Example
[Figure: I2T vs. ANC scatterplots in three panels: Age ≤ 1, 1 < Age ≤ 4, and 4 < Age.]
Naive Bayes: Sepsis Example
As can be seen from the confusion matrices below, the overall classification rate of around 98.6% does not make up for the fact that we correctly classify the sepsis cases only 11% of the time.

         0       1
0    65887     216
1      715      28

usekernel = FALSE

         0       1
0    66589     239
1       13       5

usekernel = TRUE
Naive Bayes: Pros and Cons
Advantages:
Due to the independence assumption, there is no curse of dimensionality! We can fit models with p > n without any issues.
The independence assumption simplifies the math and speeds up the model-fitting process.
We can include prior information to increase accuracy.
We have the option of not making distributional assumptions about each predictor.

Disadvantages:
If the independence assumption does not hold, the model can perform poorly.
The non-parametric approach can overfit the data.
Our prior information could be wrong.
Logistic Regression with Sepsis Data
How does Logistic regression fare with the Sepsis data?
         0       1
0    66510     233
1       92      11

Cutoff: 0.1, % Correct: 99.51%

         0       1
0    66245     217
1      357      27

Cutoff: 0.05, % Correct: 99.14%

         0       1
0    65582     180
1     1020      64

Cutoff: 0.025, % Correct: 98.20%

         0       1
0    63339     135
1     3263     109

Cutoff: 0.01, % Correct: 94.91%
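The pattern in these tables, where a lower cutoff buys sensitivity at the cost of overall accuracy, is mechanical to reproduce. A Python sketch on simulated rare-event scores (the score distributions are invented, not the sepsis data):

```python
import random

random.seed(0)
# Simulated rare-event data: (true label, model's predicted probability).
# Positives get higher scores on average, but the classes overlap.
cases = [(1, random.betavariate(2, 5)) for _ in range(50)] + \
        [(0, random.betavariate(1, 20)) for _ in range(5000)]

for cutoff in (0.1, 0.05, 0.025):
    pred = [(y, int(p >= cutoff)) for y, p in cases]
    acc = sum(y == yhat for y, yhat in pred) / len(pred)
    sens = sum(yhat for y, yhat in pred if y == 1) / 50  # true positives / positives
    print(f"cutoff={cutoff}: accuracy={acc:.3f}, sensitivity={sens:.3f}")
```

Lowering the cutoff can only convert predicted 0s into predicted 1s, so sensitivity is monotone in the cutoff while accuracy on a rare-event data set tends to fall.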
The End
Questions?