CPE542 - 3 - Classifiers_Bayes
-
Upload
ronald-santos -
Category
Documents
-
view
213 -
download
0
Transcript of CPE542 - 3 - Classifiers_Bayes
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
1/72
1
Sergios Theodoridis
Konstantinos Koutroumbas
Version 2
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
2/72
2
PATTERN RECOGNITIONPATTERN RECOGNITION
Typical application areas
Machine vision Character recognition (OCR)
Computer aided diagnosis
Speech recognition
Face recognition
Biometrics
mage !ata Base retrieval
!ata mining
Bion"ormatics
The tas#$ %ssign un#no&n o'ects patterns into thecorrect class* This is #no&n as classi+cation*
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
3/72
,
Features$ These are measura'le -uantities o'tained
"rom the patterns. and the classi+cation tas# is 'asedon their respective values*
Feature vectors$ % num'er o" "eatures
constitute the "eature vector
Feature vectors are treated as random vectors*
,,...,1 l x x
[ ] l T l R x x x ∈= ,...,1
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
4/72
/
%n e0ample$
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
5/72
The classi+er consists o" a set o" "unctions. &hosevalues. computed at . determine the class to&hich the corresponding pattern 'elongs
Classi+cation system overvie&
x
sensor
"eaturegeneration
"eatureselection
classi+erdesign
systemevaluation
atterns
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
6/72
3
Supervised unsupervised pattern recognition$
The t&o maor directions
Supervised$ atterns &hose class is #no&n a4priori are used "or training*
nsupervised$ The num'er o" classes is (ingeneral) un#no&n and no training patterns areavaila'le*
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
7/72
5
C!ASSI"IERS #ASE$ ON #A%ESC!ASSI"IERS #ASE$ ON #A%ES
$ECISION T&EOR% $ECISION T&EOR%
Statistical nature o" "eature vectors
%ssign the pattern represented 'y "eaturevectorto the most pro'a'le o" the availa'le classes
That is ma0imum
[ ]T l 21 x ,..., x , x x =
x
M ω ω ω ,...,, 21
)(: x P x ii ω ω →
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
8/72
6
Computation o" a4posteriori pro'a'ilities %ssume #no&n
7 a4priori pro'a'ilities
7
This is also #no&n as the li#elihood o"
)()...,(),(21 M
P P P ω ω ω
M i x p i ...2,1,)( =ω
... i to r w x ω
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
9/72
8
∑=
=
=
⇒=
2
1
)()()(
)(
)()(
)(
)()()()(
i
ii
ii
i
iii
P x p x p
x p
P x p
x P
P x p x P x p
ω ω
ω ω
ω
ω ω ω
The Bayes rule ( Μ 92)
&here
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
10/72
1:
The Bayes classi+cation rule ("or t&o classes M 92)
;iven classi"y it according to the rule
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
11/72
11
)()(2211
ω ω →→ R R and
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
12/72
12
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
13/72
1,
ndeed$ Moving the threshold the total shadedarea ?CR
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
14/72
1/
The Bayes classi+cation rule "or many (M2)classes$
;iven classi"y it to i"$
Such a choice also minimies the classi+cationerror pro'a'ility
Minimiing the average ris# For each &rong decision. a penalty term is assigned
since some decisions are more sensitive than others
i j x P x P ji ≠∀> )()( ω ω
x iω
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
15/72
1
For M 92
7 !e+ne the loss matri0
7 penalty term "or deciding class .although the pattern 'elongs to . etc*
Ris# &ith respect to
)(2221
1211
λ λ
λ λ = L
12λ
1ω
1ω
xd x p xd x pr R R
)()( 112111121
ω λ ω λ ∫ ∫ +=
2ω
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
16/72
13
Ris# &ith respect to
%verage ris#
2ω
xd x p xd x pr R R
)()( 222221221
ω λ ω λ ∫ ∫ +=
)()( 2211 ω ω P r P r r +=
⇒ ro'a'ilities o" &rongdecisions. &eighted 'y thepenalty terms
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
17/72
15
Choose and so that r is minimied
Then assign to i"
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
18/72
16
y probabiliterror
tionclassificaMinimumif
)()( if
)()( if
0and
2
1)()(
1221
21
12122
12
21211
221121
⇒=
>→
>→
====
λ λ
λ λ ω ω ω
λ
λ ω ω ω
λ λ ω ω
x P x P x
x P x P x
P P "
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
19/72
18
%n e0ample$
=−
==−
−−=−
−=−
00.1
5.00
2
1)()(
))1(exp(1
)(
)exp(1
)(
21
2
2
2
1
L
P P
x x p
x x p
ω ω
π ω
π ω
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
20/72
2:
Then the threshold value is$
Threshold "or minimum r
2
1
))1(exp()exp( :: minimumfor
0
22
0
0
=
⇒−−=−
x
x x x P x e
2
1
2
)21(ˆ
))1((exp2)(exp :ˆ
0
22
0
<−
=⇒−−=−
n x
x x x
0ˆ x
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
21/72
21
Thus moves to the le"t o"
(DEG)0
ˆ x 02
1 x=
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
22/72
22
$ISCRI'INANT "NCTIONS$ISCRI'INANT "NCTIONS
$ECISION SR"ACES$ECISION SR"ACES
" are contiguous$
is the sur"ace separating the regions* On oneside is positive (H). on the other is negative(4)* t is #no&n as !ecision Sur"ace
)()( :
)()( :
x P x P R
x P x P R
i j j
jii
ω ω
ω ω
>
>
ji R R , 0)()()( =−≡ x P x P x g ji ω ω
H
4 0 x g =)(
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
23/72
2,
" f (.) monotonic. the rule remains the same i" &euse$
is a dis(riminant )un(tion
n general. discriminant "unctions can 'e de+nedindependent o" the Bayesian rule* They lead tosu'optimal solutions. yet i" chosen appropriately.can 'e computationally more tracta'le*
ji x P f x P f x jii ≠∀>→ ))(())(( :if ω ω ω
))(()( x P f x g ii ω ≡
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
24/72
2/
#A%ESIAN C!ASSI"IER "OR NOR'A!#A%ESIAN C!ASSI"IER "OR NOR'A!
$ISTRI#TIONS$ISTRI#TIONS
Multivariate ;aussian pd"
called covariance matri0
[ ]
[ ] ))((
inmatrix
)()(2
1exp
)2(
1)( 1
2
12
Τ
−Τ
−−=Σ
×=
−Σ−−
Σ=
iii
ii
iii
i
i
x x E
x E
x x x p
µ µ
ω µ
µ µ
π
ω
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
25/72
2
is monotonic* !e+ne$
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
26/72
23
That is. is -uadratic and thesur"aces
-uadrics. ellipsoids. para'olas.hyper'olas.pairs o" lines*
For e0ample$
iiii
iii
C P
x x x x x g
+++−
+++−=
)ln()(
2
1
)(1
)(2
1)(
2
2
2
12
22112
2
2
2
12
ω µ µ
σ
µ µ σ σ
)( x g i0)()( =− x g x g ji
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
27/72
25
!ecision Eyperplanes
Iuadratic terms$
" %== (the same) the -uadraticterms are not o" interest* They are notinvolved in comparisons* Then.
e-uivalently. &e can &rite$
!iscriminant "unctions are =?
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
28/72
26
=et in addition$
7
7
7
7 22
02
2
)(
)(ln)(
2
1
,
)(
0)()()(
1)(
!en .
ji
ji
j
i
jio
ji
o
T
jiij
i
T
i
i
P
P x
w
x xw
x g x g x g
w x x g
! Σ
µ µ
µ µ
ω
ω σ µ µ
µ µ
µ
σ
σ
−
−−+=
−=
−=
=−=
+=
=
?ondiagonal$ ΙσΣ 2≠
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
29/72
28
?ondiagonal$
7
7
7
!ecision hyperplane
Ι σ Σ ≠
0)()( 0 =−= x xw x g T
ij
)(1 ji
w µ µ −Σ= −
2
1
1
20
)(
)
)(
)((n)(
2
1
1
1
x x x
P
P x
T
ji
ji
j
i
ji
−Σ
Σ≡
−
−−+=
−
−Σ µ µ
µ µ
ω
ω µ µ
)(tonormal
tonormalnot
1
ji
ji
µ µ
µ µ
−Σ
−−
Minimum !istance Classi+ers
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
30/72
,:
Minimum !istance Classi+ers
e-uipro'a'le
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
31/72
,1
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
32/72
,2
:tionclassifica$ayesianusin#2.2
0.1 %ector !eclassify t
&.1'.0
'.01.1
,'
'
,0
0
),,()(
),,()(and)()( :,i%en
2122
112121
=
=Σ
=
==
==
x
Σ # x p
Σ # x p P P
µ µ µ ω
µ ω ω ω ω ω
−
−=•55.015.015.0&5.0 1 Σ
[ ]
[ ] *+2.'.0
0.2
.0,0.2,&52.22.2
0.1
2.2,0.1:,fromsMa!alanobi-ompute
1
2,
21
1,2
21
=
−
−
Σ−−==
=•
−−"
""
d Σ
d d µ µ
1,,21 t!atbser%e .-lassify E E d d x
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
33/72
,,
Ma0imum =i#elihood
{ }
:met!od!e
to/.r.of ieli!ood t!easno/nis/!ic!
)(
),...,()(
,...,
)()( : parameter
ctor unno/n %eaninno/n /it!)(et
tindependenandno/n,....,,et
1
21
21
21
$
x p
x x x p $ p
x x x $
x p x p
x p
x x x
%
#
%
#
#
#
θ
θ
θ θ
θ θ
=Π=
≡=
≡
ESTI'ATION O" NKNO*N PRO#A#I!IT%$ENSIT% "NCTIONS S u p
p o r t S l
i d e
ide
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
34/72
,/
0)(
)(
)(
1
)(
)( :ˆ
)(ln)(ln)(
)(maxar# :ˆ
1
1
1M
=∂
∂Σ=
∂∂
Σ=≡
Π
=
=
=
θ
θ
θ θ
θ θ
θ θ θ
θ θ θ
%
%
#
% ML
%
#
%
%
&
%
x p
x p
L
x p $ p L
x p
S u p p o r
t S l i d e
ide
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
35/72
,
S u p p o r
t S l i d e
lide
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
36/72
,3
%symptotically un'iased and consistent
0ˆlim
34lim
t!en,)()(
suc! t!ataist!ereindeed,If,
2
0 5
0 5
0
0
=−
=
=
∞→
→∝
θ θ
θ θ
θ
θ
ML
ML
E
E
x p x p
S u p p o r
t S l i d e
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
37/72
,5
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
38/72
,6
Ma0imum %posteriori ro'a'ility
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
39/72
,8
The method$
ML M'P
M'P
M'P
p
$ p P
$ p
θ θ θ
θ θ θ
θ
θ θ θ
≅
∂∂
=
ˆenou#! broadoruniformis)(If
))()((:ˆ
or )(maxar#ˆ
S u p p o r
t S l i d e
Slide
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
40/72
/:
S u p p o r
t S l i d e
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
41/72
/1
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
42/72
/2
Bayesian n"erence
8o/99
) p(estimate :#oal!e
)(and)(,,...,; :i%en
follo/ed.isrootdifferenta 8ere
.forestimatesin#leaM"7M,
1
$ x
p x p x x $ # θ θ
θ
=
⇒
S u p p o r
t S l i d e
∫ Slide
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
43/72
/,
∑=
=+
=++
=
→•
→•
→•
#
%
% # #
# #
x #
x # #
x #
# $ p
# p
# x p Let
122
0
2
0
22
22
0
0
22
0
2
2
00
2
1 ,,
),()( :out t!atIt turns
),()(
),()(
exampleanainsi#!t %imore bit"
σ σ
σ σ σ
σ σ
µ σ σ µ
σ µ µ
σ µ µ
σ µ µ
)()(
)()(
)()(
)(
)()()(
)()(
1θ θ
θ θ θ
θ θ θ θ θ
θ θ θ
%
#
% x p $ p
d p $ p
p $ p
$ p
p $ p $ p
d $ p x p $ x p
=Π=
==
=
∫
∫ )( S u p
p o r t S l
i d
Slide
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
44/72
//
The a'ove is a se-uence o" ;aussians as
Ma0imum
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
45/72
/
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
46/72
/3
%ssume parametric modeling. i*e*.
The goal is to estimate
given a set
Dhy not M=G %s 'e"oreG
∑ ∫
∑
=
=
==
=
M
j
j
+
j
j
xd j x p P
P j x p x p
1 x
1
1)(,1
)()(
) j x( p θ
j P P P ,...,, and 21θ { } # 21 x ,..., x , x $ =
),...,,(max1,...,,
ji%
#
% P P
P P x P ji
θ θ
=Π
S u p p o r
t S l i d
Slid e
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
47/72
/5
This is a nonlinear pro'lem due to themissing la'el in"ormation* This is a typicalpro'lem &ith an incomplete data set*
The
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
48/72
/6
7 =et
7 Dhat &e need is to compute
7 But are not o'served* Eere comes the
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
49/72
/8
g
7
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
50/72
:
Jn#no&n parameters
7
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
51/72
1
2
2
x3 x3
x3 +−
0,,0
if ,as)()(ˆ , continuous)(If
→∞→→
∞→→
#
% % 2
# x p x p x p
# # #
2ˆ,
1)ˆ(ˆ)(ˆ
totalin
x x
#
%
x p x p
# %
# % P
#
# #
≤=≡
≈
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
52/72
2
aren Dindo&s
!ivide the multidimensional space inhypercu'es
!e+ne
≤11
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
53/72
,
7 That is. it is 1 inside a unit side hypercu'e centered at :
7
7
7 The pro'lem$
7 aren &indo&s4#ernels4potential "unctions
))(1
(1
)(ˆ1
∑=
−=
#
i
i
l
x x
# x p ϕ
x
#
atcentered!ypercubeside!an
inside pointsof number>
1
>%olume
1
ousdiscontinu (.)
continuous )(
ϕ
x p
∫ =≥
x
xd x x
x
1)( ,0)(
smoot!is)(
ϕ ϕ
ϕ
≤=ot!er/ise
2
1
0
1)( iji
x xϕ
Mean valuet Sl
i d e
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
54/72
/
7
7
7
7
Eence un'iased in the limit
∫ ∑ −=
−=
= =1
=)=()=
(1
)3)(41
(1
)3(ˆ4 x
l
#
i
i
l xd x p
2
x x
22
x x E
# 2 x p E ϕ ϕ
∞→→l 2
1 ,02
0)=
(of /idt!t!e0
→
−
→
x x ϕ
∫ =−
1)=
(1
xd 2
x x
2l ϕ
)()(1
0 x
2
x
2
2l
δ ϕ →→
∫ =−==
)(=)=()=()3(ˆ4 x
x p xd x p x x x p E δ
S u p p o r
t S l
Kariance
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
55/72
7 The smaller the the higher the variance
40.1, #41000 40.5, #41000
40.1, #410000
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
56/72
3
The higher the # the 'etter the accuracy
" 2 0
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
57/72
5
7
7
7
asymptotically un'iased
The method
7 Remem'er$
7
∞→
∞→
→
&
2
#
2 0
θ λ λ
λ λ
ω
ω
ω
ω ≡
−
−><−
−
∑
∑
=
= #
i
i
l
#
i
i
l
2
x x
2 #
2
x x
2 #
CJRS< OF !M
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
58/72
6
CJRS< OF !M
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
59/72
8
?%K< B%
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
60/72
3:
N ?earest ?eigh'or !ensity
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
61/72
31
7
θ )( 11
22
22
11 >
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
62/72
32
g
Choose % out o" the # training vectors. identi"ythe % nearest ones to x
Out o" these % identi"y % i that 'elong to class :i
The simplest version
%41 AAA
For large # this is not 'ad* t can 'e sho&n that$i" P ; is the optimal Bayesian error pro'a'ility.
then$
ji% % → "ssi#n ω
2)1
2( ; ; ; ## ; P P M
M P P P ≤
−−≤≤
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
63/72
3,
For small P ;$
%
P 2 P P P ## ;%## ; +≤≤
;%## P P ,% →∞→
2
' )('
2
; ; ##
; ##
P P P
P P
+≅≅
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
64/72
3/
Koronoi tesselation
{ } ji x xd x xd x R jii ≠
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
65/72
3
Bayes ro'a'ility Chain Rule
%ssume no& that the conditional dependence
"or each xi is limited to a su'set o" the "eaturesappearing in each o" the product terms* That is$
&here
)()@(...
...),...,@(),...,@(),...,,(
112
1211121
x p x x p
x x x p x x x p x x x p
⋅⋅
⋅⋅= −−−
∏=
⋅=
2
121 )@()(),...,,(i
ii ' x p x p x x x p
{ }121 ,...,, x x x ' iii −−⊆
S u p p o r
t
ort S l i d
e
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
66/72
33
For e0ample. i" ℓ93. then &e could assume$
Then$
The a'ove is a generaliation o" the ?ave Bayes* For the ?ave Bayes the assumptionis$
'i B C, for iB1, 2, 7, 6
),@(),...,@( D5*15* x x x p x x x p =
{ } { }15D5* ,...,, x x x x ' ⊆=
S u p p o r
t
ort S l i d
e
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
67/72
35
% graphical &ay to portray conditionaldependencies is given 'elo&
%ccording to this +gure&e have that$
E x* is conditionally
dependent on xD , x5.
E x=
on x>
E xD on x1 , x2
E x? on x2
E x1 , x2 are conditionally
independent on other
varia'les*
For this case$)()()@()@(),@(),...,,( 122'D5D5**21 x p x p x x p x x p x x x p x x x p ⋅⋅⋅⋅=
S u p p o r
t
Bayesian ?et&or#s port S l i
d e
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
68/72
36
Bayesian ?et&or#s
$e+nition, % Bayesian ?et&or# is a directedacyclic graph (!%;) &here the nodescorrespond to random varia'les*
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
69/72
38
The +gure 'elo& is an e0ample o" a Bayesian?et&or# corresponding to a paradigm "rom the
medical applications +eld* This Bayesiannet&or# modelsconditionaldependencies "or an
e0ample concerningsmo#ers (S).tendencies todevelop cancer (C)and heart disease
(E). together &ithvaria'lescorresponding toheart (E1. E2) andcancer (C1. C2)
medical tests*
S u p p o r
O !%; h ' t t d th i tpor t S l
i d e
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
70/72
5:
Once a !%; has 'een constructed. the ointpro'a'ility can 'e o'tained 'y multiplying themarginal (root nodes) and the conditional (non4
root nodes) pro'a'ilities*
Training$ Once a topology is given. pro'a'ilitiesare estimated via the training data set* Thereare also methods that learn the topology*
ro'a'ility n"erence$ This is the most commontas# that Bayesian net&or#s help us to solveeLciently* ;iven the values o" some o" the
varia'les in the graph. #no&n as evidence. thegoal is to compute the conditional pro'a'ilities"or some o" the other varia'les. given theevidence*
S u p p o r
por t S
l i d e
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
71/72
51
-
8/19/2019 CPE542 - 3 - Classifiers_Bayes
72/72
For a). a set o" calculations are re-uired thatpropagate "rom node x to node w* t turns out that P (w0@ x1) B 0.*'*
For '). the propagation is reversed in direction* tturns out that P ( x0@w1) B 0.D.
n general. the re-uired in"erence in"ormation iscomputed via a com'ined process o" @messagepassingA among the nodes o" the !%;*
Comple0ity$For singly connected graphs. message passing
algorithms amount to a comple0ity linear in thenum'er o" nodes*
S u p p o r