Chapter 2 (Part 2): Bayesian Decision Theory (Sections 2.3-2.5)
Pattern Classification, Chapter 2 (Part 2)
Pattern Classification
All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000 with the permission of the authors and the publisher
Chapter 2 (Part 2): Bayesian Decision Theory
(Sections 2.3-2.5)
• Minimum-Error-Rate Classification
• Classifiers, Discriminant Functions and Decision Surfaces
• The Normal Density
Minimum-Error-Rate Classification
• Actions are decisions on classes. If action α_i is taken and the true state of nature is ω_j, then:

the decision is correct if i = j and in error if i ≠ j

• Seek a decision rule that minimizes the probability of error, which is the error rate
• Introduction of the zero-one loss function:

$$\lambda(\alpha_i, \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \qquad i, j = 1, \dots, c$$

Therefore, the conditional risk is:

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)$$

"The risk corresponding to this loss function is the average probability of error."
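As a minimal sketch, the conditional risk under the zero-one loss can be computed as a matrix–vector product; the posterior values below are illustrative, not from the text:

```python
import numpy as np

# Hypothetical posteriors P(w_j | x) for a 3-class problem (illustrative numbers).
posteriors = np.array([0.6, 0.3, 0.1])

# Zero-one loss: lambda(alpha_i, w_j) = 0 if i == j, else 1.
c = len(posteriors)
loss = np.ones((c, c)) - np.eye(c)

# Conditional risk R(alpha_i | x) = sum_j lambda(alpha_i, w_j) * P(w_j | x)
risks = loss @ posteriors

# Each entry equals 1 - P(w_i | x), as the formula above predicts.
print(risks)  # [0.4 0.7 0.9]
```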
• Minimizing the risk requires maximizing P(ω_i | x)

(since R(α_i | x) = 1 – P(ω_i | x))

• For minimum error rate:

• Decide ω_i if P(ω_i | x) > P(ω_j | x) for all j ≠ i
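The rule above reduces to an argmax over posteriors; a minimal sketch with illustrative posterior values:

```python
import numpy as np

def decide(posteriors):
    """Minimum-error-rate rule: decide w_i when P(w_i | x) exceeds every
    other posterior, i.e. pick the index of the largest posterior."""
    return int(np.argmax(posteriors))

# Illustrative posteriors P(w_j | x); class index 1 has the largest posterior.
print(decide([0.2, 0.5, 0.3]))  # 1
```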
• Regions of decision with a general loss function, therefore:

Let $\theta_\lambda = \dfrac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \dfrac{P(\omega_2)}{P(\omega_1)}$; then decide $\omega_1$ if:

$$\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} > \theta_\lambda$$

• If λ is the zero-one loss function, which means:

$$\lambda = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{then} \quad \theta_a = \frac{P(\omega_2)}{P(\omega_1)}$$

$$\text{If } \lambda = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix} \quad \text{then} \quad \theta_b = \frac{2\,P(\omega_2)}{P(\omega_1)}$$
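The threshold θ_λ can be sketched as a small helper; the loss entries and the equal priors passed in below are illustrative, chosen to reproduce the two special cases above:

```python
def theta_lambda(l11, l12, l21, l22, p1, p2):
    """theta_lambda = (l12 - l22) / (l21 - l11) * P(w2) / P(w1).
    Decide w1 when the likelihood ratio P(x|w1)/P(x|w2) exceeds this."""
    return (l12 - l22) / (l21 - l11) * (p2 / p1)

# Zero-one loss: threshold is just the prior ratio P(w2)/P(w1).
print(theta_lambda(0, 1, 1, 0, 0.5, 0.5))  # 1.0
# Doubling the loss for deciding w1 when w2 is true doubles the threshold.
print(theta_lambda(0, 2, 1, 0, 0.5, 0.5))  # 2.0
```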
Classifiers, Discriminant Functions and Decision Surfaces
• The multi-category case

• Set of discriminant functions g_i(x), i = 1, …, c

• The classifier assigns a feature vector x to class ω_i

if: g_i(x) > g_j(x) for all j ≠ i
• Let g_i(x) = −R(α_i | x)

(max. discriminant corresponds to min. risk!)

• For the minimum error rate, we take g_i(x) = P(ω_i | x)

(max. discriminant corresponds to max. posterior!)

Equivalent choices (any monotonically increasing function of P(ω_i | x) gives the same decision):

g_i(x) = P(x | ω_i) P(ω_i)

g_i(x) = ln P(x | ω_i) + ln P(ω_i)

(ln: natural logarithm!)
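Since the logarithm is monotonically increasing, the product form and the log form rank the classes identically; a quick check with illustrative likelihood and prior values:

```python
import numpy as np

# Illustrative class-conditional densities P(x | w_i) at some fixed x, and priors.
likelihoods = np.array([0.05, 0.20])
priors = np.array([0.7, 0.3])

# Two equivalent discriminants: the product and its logarithm.
g_prod = likelihoods * priors
g_log = np.log(likelihoods) + np.log(priors)

# Both pick the same class, because ln is monotonically increasing.
print(np.argmax(g_prod), np.argmax(g_log))  # 1 1
```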
• Feature space is divided into c decision regions:

if g_i(x) > g_j(x) for all j ≠ i, then x is in R_i

(R_i means: assign x to ω_i)

• The two-category case

• A classifier is a "dichotomizer" that has two discriminant functions g_1 and g_2

Let g(x) ≡ g_1(x) – g_2(x)

Decide ω_1 if g(x) > 0; otherwise decide ω_2
• The computation of g(x):

$$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)$$

$$g(x) = \ln \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$$
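A minimal sketch of the two-class discriminant in its log-ratio form; the density and prior values passed in are illustrative:

```python
import math

def g(px_w1, px_w2, p1, p2):
    """Two-class discriminant g(x) = ln[P(x|w1)/P(x|w2)] + ln[P(w1)/P(w2)].
    Decide w1 when g(x) > 0, otherwise w2."""
    return math.log(px_w1 / px_w2) + math.log(p1 / p2)

# With equal priors the decision reduces to comparing the likelihoods.
print(g(0.3, 0.1, 0.5, 0.5) > 0)  # True  -> decide w1
print(g(0.1, 0.3, 0.5, 0.5) > 0)  # False -> decide w2
```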
The Normal Density
• Univariate density

• Density which is analytically tractable
• Continuous density
• Many processes are asymptotically Gaussian
• Handwritten characters and speech sounds are ideal or prototype patterns corrupted by random processes (central limit theorem)

$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$

where: μ = mean (or expected value) of x, σ² = expected squared deviation or variance
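The univariate density can be sketched directly from the formula, using only the standard library:

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density:
    P(x) = 1/(sqrt(2*pi)*sigma) * exp(-0.5 * ((x - mu)/sigma)**2)."""
    coeff = 1.0 / (math.sqrt(2 * math.pi) * sigma)
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# The peak, at x = mu, equals 1/(sqrt(2*pi)*sigma); for sigma = 1 that is
# about 0.3989.
print(normal_pdf(0.0, 0.0, 1.0))
```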
• Multivariate density

• The multivariate normal density in d dimensions is:

$$P(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu)^t \Sigma^{-1} (x-\mu)\right]$$

where:

x = (x₁, x₂, …, x_d)^t (t stands for the transpose vector form)

μ = (μ₁, μ₂, …, μ_d)^t is the mean vector, Σ is the d×d covariance matrix

|Σ| and Σ⁻¹ are its determinant and inverse, respectively
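A minimal NumPy sketch of this density, for illustration only (a production implementation would avoid the explicit inverse and determinant in favor of a Cholesky factorization):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density in d dimensions:
    P(x) = 1/((2*pi)^(d/2) |Sigma|^(1/2)) * exp(-0.5 (x-mu)^t Sigma^-1 (x-mu))."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

# With d = 2, zero mean, and identity covariance, the density at the mean
# is 1/(2*pi), about 0.1592.
x = np.zeros(2)
mu = np.zeros(2)
Sigma = np.eye(2)
print(mvn_pdf(x, mu, Sigma))
```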
Appendix
• Variance:

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

• Standard deviation: $S = \sqrt{S^2}$
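The appendix formula, with its n − 1 (Bessel) divisor, can be sketched as follows; the data values are illustrative:

```python
def sample_variance(xs):
    """Sample variance S^2 = (1/(n-1)) * sum_i (x_i - mean)^2."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

data = [2.0, 4.0, 6.0]
print(sample_variance(data))         # 4.0
print(sample_variance(data) ** 0.5)  # standard deviation S = 2.0
```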
Bayes' theorem

|      | A        | ¬A        |
|------|----------|-----------|
| B    | A and B  | ¬A and B  |
| ¬B   | A and ¬B | ¬A and ¬B |

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\neg A)\,P(B \mid \neg A)}$$

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$
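Bayes' theorem with the total-probability expansion of P(B) can be sketched as follows; the probability values are illustrative:

```python
def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) via the total-probability expansion:
    P(B) = P(A) P(B|A) + P(not A) P(B|not A)."""
    p_not_a = 1.0 - p_a
    p_b = p_a * p_b_given_a + p_not_a * p_b_given_not_a
    return p_a * p_b_given_a / p_b

# Illustrative numbers: P(A) = 0.01, P(B|A) = 0.9, P(B|not A) = 0.1.
# Even with P(B|A) = 0.9, the small prior keeps P(A|B) modest.
print(bayes(0.01, 0.9, 0.1))  # ≈ 0.0833
```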