On Discriminative vs. Generative classifiers: Naïve Bayes
Presenter: Seung Hwan Bae
Andrew Y. Ng and Michael I. Jordan, Neural Information Processing Systems (NIPS), 2001
(slides adapted from Ke Chen, University of Manchester, and Yangqiu Song, MSRA)
Total citations: 831
Machine Learning
Generative vs. Discriminative Classifiers

Training classifiers involves estimating f: X -> Y, or P(Y|X)
– X: training data, Y: labels

Discriminative classifiers:
– Assume some functional form for P(Y|X)
– Estimate the parameters of P(Y|X) directly from the training data

Generative classifiers (also called "informative" by Rubinstein & Hastie):
– Assume some functional form for P(X|Y), P(Y)
– Estimate the parameters of P(X|Y), P(Y) directly from the training data
– Use Bayes' rule to calculate P(Y|X)
Bayes Formula

P(C|X) = P(X|C) P(C) / P(X)   (Posterior = Likelihood × Prior / Evidence)

Generative Model

[Figure: a generative model of the class-conditional feature distribution P(X|C), over features such as color, size, texture, weight, …]
Discriminative Model

[Figure: a discriminative model of the posterior P(C|X), e.g., logistic regression, over the same features: color, size, texture, weight, …]
Comparison

Generative models:
– Assume some functional form for P(X|Y), P(Y)
– Estimate the parameters of P(X|Y), P(Y) directly from the training data
– Use Bayes' rule to calculate P(Y|X = x)

Discriminative models:
– Directly assume some functional form for P(Y|X)
– Estimate the parameters of P(Y|X) directly from the training data

[Figure: two graphical models over Y, X1, X2: naïve Bayes (generative), with arrows from Y to X1 and X2; logistic regression (discriminative), with arrows from X1 and X2 to Y]
Probability Basics

Prior, conditional, and joint probability for random variables:
– Prior probability: P(X)
– Conditional probability: P(X1|X2), P(X2|X1)
– Joint probability: X = (X1, X2), P(X) = P(X1, X2)
– Relationship: P(X1, X2) = P(X2|X1) P(X1) = P(X1|X2) P(X2)
– Independence: P(X2|X1) = P(X2), P(X1|X2) = P(X1), P(X1, X2) = P(X1) P(X2)

Bayesian rule:
P(C|X) = P(X|C) P(C) / P(X), i.e., Posterior = Likelihood × Prior / Evidence
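As a quick numerical illustration (not from the slides; the priors and likelihoods below are made up), Bayes' rule turns class-conditional likelihoods into posteriors:

```python
# Bayes' rule on a two-class toy problem. All numbers are illustrative
# assumptions, not values from the slides.
prior = {"c1": 0.6, "c2": 0.4}        # P(C)
likelihood = {"c1": 0.2, "c2": 0.5}   # P(x | C) for one observed x

# Evidence: P(x) = sum over classes of P(x|c) P(c)
evidence = sum(likelihood[c] * prior[c] for c in prior)

# Posterior: P(c|x) = P(x|c) P(c) / P(x)
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}
print(posterior)  # {'c1': 0.375, 'c2': 0.625}
```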
Probabilistic Classification

Establishing a probabilistic model for classification:
– Discriminative model: P(C|X), with C = c1, …, cL and X = (X1, …, Xn)

[Figure: a single discriminative probabilistic classifier takes an input x = (x1, x2, …, xn) and outputs the posteriors P(c1|x), P(c2|x), …, P(cL|x)]
Probabilistic Classification

Establishing a probabilistic model for classification (cont.):
– Generative model: P(X|C), with C = c1, …, cL and X = (X1, …, Xn)

[Figure: one generative probabilistic model per class; the model for class i takes x = (x1, x2, …, xn) and outputs the class-conditional likelihood P(x|ci), for i = 1, …, L]
Probabilistic Classification

MAP classification rule:
– MAP: Maximum A Posteriori
– Assign x to c* if P(C = c*|X = x) > P(C = c|X = x), for all c ≠ c*, c ∈ {c1, …, cL}

Generative classification with the MAP rule:
– Apply the Bayesian rule to convert the models into posterior probabilities:
  P(C = ci|X = x) = P(X = x|C = ci) P(C = ci) / P(X = x)
                  ∝ P(X = x|C = ci) P(C = ci), for i = 1, 2, …, L
– Then apply the MAP rule
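Since the evidence P(X = x) is shared by every class, the MAP decision only needs the numerator. A minimal sketch of that argmax (with assumed numbers):

```python
# Generative classification with the MAP rule: argmax_c P(x|c) P(c).
# The priors and likelihoods below are illustrative assumptions.
prior = {"c1": 0.5, "c2": 0.3, "c3": 0.2}          # P(C = ci)
likelihood = {"c1": 0.10, "c2": 0.40, "c3": 0.25}  # P(X = x | C = ci)

# P(X = x) is constant across classes, so it can be dropped from the argmax.
c_star = max(prior, key=lambda c: likelihood[c] * prior[c])
print(c_star)  # 'c2': 0.40 * 0.3 = 0.12 beats 0.05 (c1) and 0.05 (c3)
```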
Naïve Bayes

Bayes classification:
P(C|X) ∝ P(X|C) P(C) = P(X1, …, Xn|C) P(C)
– Difficulty: learning the joint probability P(X1, …, Xn|C)
– If the number of features n is large, or a feature can take on a large number of values, then basing such a model on probability tables is infeasible: with n binary attributes alone, the table for P(X1, …, Xn|C) needs on the order of 2^n entries per class.
Naïve Bayes

Naïve Bayes classification:
– Assume that all input attributes are conditionally independent given the class!
  P(X1, X2, …, Xn|C) = P(X1|X2, …, Xn; C) P(X2, …, Xn|C)
                     = P(X1|C) P(X2, …, Xn|C)
                     = P(X1|C) P(X2|C) ⋯ P(Xn|C)
– MAP classification rule: for x = (x1, x2, …, xn), assign x to c* if
  [P(x1|c*) ⋯ P(xn|c*)] P(c*) > [P(x1|c) ⋯ P(xn|c)] P(c), c ≠ c*, c ∈ {c1, …, cL}
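Under this assumption the class-conditional likelihood collapses into a simple product, as in this toy sketch (the factor values are assumed):

```python
from math import prod

# Naive assumption: P(x1, x2, x3 | c) = P(x1|c) P(x2|c) P(x3|c).
# The per-attribute factors below are illustrative assumptions.
factors = [0.8, 0.3, 0.6]   # P(x1|c), P(x2|c), P(x3|c)
joint = prod(factors)       # class-conditional likelihood under independence
print(joint)                # 0.144
```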
Naïve Bayes

Naïve Bayes algorithm (for discrete input attributes):
– Learning phase: given a training set S,
  For each target value ci (i = 1, …, L):
    P̂(C = ci) ← estimate P(C = ci) from the examples in S
    For every attribute value xjk of each attribute Xj (j = 1, …, n; k = 1, …, Nj):
      P̂(Xj = xjk|C = ci) ← estimate P(Xj = xjk|C = ci) from the examples in S
  Output: conditional probability tables; Nj × L elements for each Xj
– Test phase: given an unknown instance X' = (a1, …, an),
  look up the tables to assign the label c* to X' if
  [P̂(a1|c*) ⋯ P̂(an|c*)] P̂(c*) > [P̂(a1|c) ⋯ P̂(an|c)] P̂(c), c ≠ c*, c ∈ {c1, …, cL}
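The two phases above translate almost directly into code. Here is a minimal sketch (not the presenter's implementation) for discrete attributes, estimating all probabilities by plain relative frequencies:

```python
from collections import Counter

def train_nb(examples):
    """Learning phase: estimate P(C = ci) and P(Xj = xjk | C = ci) by counting.

    examples: list of (attribute_tuple, label) pairs.
    """
    class_counts = Counter(label for _, label in examples)
    cond_counts = Counter()  # (j, value, label) -> count
    for attrs, label in examples:
        for j, value in enumerate(attrs):
            cond_counts[(j, value, label)] += 1
    prior = {c: class_counts[c] / len(examples) for c in class_counts}
    cond = {key: count / class_counts[key[2]] for key, count in cond_counts.items()}
    return prior, cond

def classify_nb(prior, cond, attrs):
    """Test phase: MAP rule over the factorized likelihood."""
    def score(c):
        s = prior[c]
        for j, value in enumerate(attrs):
            s *= cond.get((j, value, c), 0.0)  # unseen value -> zero probability
        return s
    return max(prior, key=score)
```

The `cond.get(..., 0.0)` fallback is exactly the zero conditional probability problem discussed under Relevant Issues below; the m-estimate shown there is the usual fix.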
Example

Example: Play Tennis
[Figure: the 14-day PlayTennis training set, with attributes Outlook, Temperature, Humidity, Wind and the label Play]
Learning phase

Outlook      Play=Yes  Play=No
Sunny         2/9       3/5
Overcast      4/9       0/5
Rain          3/9       2/5

Temperature  Play=Yes  Play=No
Hot           2/9       2/5
Mild          4/9       2/5
Cool          3/9       1/5

Humidity     Play=Yes  Play=No
High          3/9       4/5
Normal        6/9       1/5

Wind         Play=Yes  Play=No
Strong        3/9       3/5
Weak          6/9       2/5

P(Play=Yes) = 9/14
P(Play=No) = 5/14
Test phase:
– Given a new instance x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables:

P(Outlook=Sunny|Play=Yes) = 2/9       P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9    P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9       P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9         P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14                    P(Play=No) = 5/14

– MAP rule:
P(Yes|x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No|x') ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206

Since P(Yes|x') < P(No|x'), we label x' as "No".
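The arithmetic is easy to verify, e.g. with exact fractions (an illustrative check, reusing the table values above):

```python
from fractions import Fraction as F

# Unnormalized posterior scores for x' = (Sunny, Cool, High, Strong)
yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9) * F(9, 14)
no = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5) * F(5, 14)
print(round(float(yes), 4), round(float(no), 4))  # 0.0053 0.0206
print("No" if no > yes else "Yes")                # -> No
```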
Relevant Issues

Violation of the independence assumption:
– For many real-world tasks, P(X1, …, Xn|C) ≠ P(X1|C) ⋯ P(Xn|C)
– Nevertheless, naïve Bayes works surprisingly well anyway!

Zero conditional probability problem:
– If no example contains the attribute value Xj = ajk, then P̂(Xj = ajk|C = ci) = 0
– In this circumstance, during test, P̂(x1|ci) ⋯ P̂(ajk|ci) ⋯ P̂(xn|ci) = 0 regardless of the other factors
– For a remedy, conditional probabilities are estimated with the m-estimate:
  P̂(Xj = ajk|C = ci) = (nc + m·p) / (n + m)
  where
  n: number of training examples for which C = ci
  nc: number of training examples for which C = ci and Xj = ajk
  p: prior estimate (usually p = 1/t for t possible values of Xj)
  m: weight given to the prior (number of "virtual" examples, m ≥ 1)
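A small sketch of the m-estimate as a drop-in replacement for the raw relative frequency (the parameter names follow the definitions above; the example counts are assumed):

```python
def m_estimate(n_c, n, t, m=1.0):
    """m-estimate of P(Xj = ajk | C = ci).

    n_c: training examples with C = ci and Xj = ajk
    n:   training examples with C = ci
    t:   number of possible values of Xj (so the prior is p = 1/t)
    m:   weight of the prior, i.e. the number of "virtual" examples
    """
    p = 1.0 / t
    return (n_c + m * p) / (n + m)

# An attribute value never seen with this class no longer gets probability 0:
print(m_estimate(n_c=0, n=5, t=3))  # (0 + 1/3) / (5 + 1) ≈ 0.056
```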
Relevant Issues

Continuous-valued input attributes:
– An attribute can take on infinitely many values, so probability tables no longer apply
– The conditional probability is instead modeled with the normal distribution:
  P̂(Xj|C = ci) = 1/(√(2π) σji) · exp(−(Xj − μji)² / (2σji²))
  μji: mean (average) of the attribute values Xj of the examples for which C = ci
  σji: standard deviation of the attribute values Xj of the examples for which C = ci
– Learning phase: for X = (X1, …, Xn), C = c1, …, cL, output n × L normal distributions and P(C = ci), i = 1, …, L
– Test phase: for X' = (X1', …, Xn'),
  • Calculate the conditional probabilities with all the normal distributions
  • Apply the MAP rule to make a decision
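A minimal Gaussian naïve-Bayes sketch along these lines (illustrative, not the presenter's code; it assumes at least two examples per class so the standard deviation is defined):

```python
import math
from statistics import mean, stdev

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def train_gnb(examples):
    """Learning phase: one (mu, sigma) per class and attribute, plus priors."""
    by_class = {}
    for attrs, label in examples:
        by_class.setdefault(label, []).append(attrs)
    prior = {c: len(rows) / len(examples) for c, rows in by_class.items()}
    params = {c: [(mean(col), stdev(col)) for col in zip(*rows)]
              for c, rows in by_class.items()}
    return prior, params

def classify_gnb(prior, params, attrs):
    """Test phase: MAP rule with a product of normal densities."""
    def score(c):
        s = prior[c]
        for x, (mu, sigma) in zip(attrs, params[c]):
            s *= normal_pdf(x, mu, sigma)
        return s
    return max(prior, key=score)
```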
Advantages of Naïve Bayes

Because naïve Bayes rests on the independence assumption:
– A small amount of training data suffices to estimate the parameters (the means and variances of the variables)
– Only the variances of the variables for each class need to be determined, not the entire covariance matrix
– Testing is straightforward: just look up tables, or calculate conditional probabilities with the normal distributions
Conclusion

– Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated
– Many successful applications, e.g., spam mail filtering
– A good candidate for a base learner in ensemble learning
– Apart from classification, naïve Bayes can do more…