PCA finds the most accurate data representation in a lower-dimensional space
Project data in the directions of maximum variance
Fisher Linear Discriminant: project to a line which preserves the direction useful for data classification
Data Representation vs. Data Classification
However, the directions of maximum variance may be useless for classification
[Figure: applying PCA to two example datasets — after projecting onto the maximum-variance direction, one dataset remains separable and the other does not]
Fisher Linear Discriminant
Main idea: find a projection to a line s.t. samples from different classes are well separated
Example in 2D
[Figure: a bad line to project to (classes are mixed up) vs. a good line to project to (classes are well separated)]
Fisher Linear Discriminant
Suppose we have 2 classes and d-dimensional samples $x_1, \dots, x_n$, where $n_1$ samples come from the first class and $n_2$ samples come from the second class
Consider projection onto a line
Let the line direction be given by the unit vector $v$
The scalar $v^t x_i$ is the distance of the projection of $x_i$ from the origin
Thus $v^t x_i$ is the projection of $x_i$ onto a one-dimensional subspace
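As a quick illustration (not part of the original slides), here is a minimal NumPy sketch of this projection; the sample point and direction below are made up for illustration:

    import numpy as np

    # A made-up 2D sample and a direction vector
    x_i = np.array([3.0, 4.0])
    v = np.array([1.0, 1.0])
    v = v / np.linalg.norm(v)   # make v a unit vector

    # The scalar v^t x_i: signed distance of the projection of x_i from the origin
    y_i = v @ x_i
    print(y_i)                  # approximately 4.95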
Fisher Linear Discriminant
How to measure separation between projections of different classes?
Thus the projection of sample $x_i$ onto a line in direction $v$ is given by $v^t x_i$
Let $\mu_1$ and $\mu_2$ be the means of classes 1 and 2:
$$\mu_1 = \frac{1}{n_1}\sum_{x_i \in \text{Class 1}} x_i \qquad \mu_2 = \frac{1}{n_2}\sum_{x_i \in \text{Class 2}} x_i$$
Let $\tilde\mu_1$ and $\tilde\mu_2$ be the means of the projections of classes 1 and 2:
$$\tilde\mu_1 = \frac{1}{n_1}\sum_{x_i \in \text{Class 1}} v^t x_i = v^t\left(\frac{1}{n_1}\sum_{x_i \in \text{Class 1}} x_i\right) = v^t \mu_1, \qquad \text{similarly } \tilde\mu_2 = v^t \mu_2$$
$|\tilde\mu_1 - \tilde\mu_2|$ seems like a good measure of separation
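A small NumPy check (mine, not from the slides) of the identity $\tilde\mu = v^t\mu$ above, i.e. that the mean of the projections equals the projection of the mean; the data here is made up:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))            # made-up class samples (one per row)
    v = np.array([0.6, 0.8])                # unit direction

    mu = X.mean(axis=0)                     # class mean (before projection)
    mu_tilde = (X @ v).mean()               # mean of the projected samples
    print(np.isclose(mu_tilde, v @ mu))     # True: mean of v^t x_i equals v^t mu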
Fisher Linear Discriminant
How good is $|\tilde\mu_1 - \tilde\mu_2|$ as a measure of separation? The larger $|\tilde\mu_1 - \tilde\mu_2|$, the better the expected separation
[Figure: two classes in 2D, projected onto the horizontal and the vertical axis]
The vertical axis is a better line than the horizontal axis to project to for class separability
However, the distance between the projected means is larger on the horizontal axis than on the vertical axis
Fisher Linear Discriminant
The problem with $|\tilde\mu_1 - \tilde\mu_2|$ is that it does not consider the variance of the classes
[Figure: same example — the projection onto the horizontal axis has large variance within each class, the projection onto the vertical axis has small variance]
Fisher Linear Discriminant
We need to normalize $|\tilde\mu_1 - \tilde\mu_2|$ by a factor which is proportional to variance
Have samples $z_1, \dots, z_n$. The sample mean is
$$\mu_z = \frac{1}{n}\sum_{i=1}^{n} z_i$$
Define their scatter as
$$s = \sum_{i=1}^{n}\left(z_i - \mu_z\right)^2$$
Thus scatter is just the sample variance multiplied by $n$
Scatter measures the same thing as variance, the spread of data around the mean; scatter is just on a different scale than variance
[Figure: two 1D samples around the same mean — one with larger scatter, one with smaller scatter]
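A one-line check of the relation above (scatter = n times the population variance), using made-up numbers:

    import numpy as np

    z = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # made-up samples
    n = len(z)
    scatter = np.sum((z - z.mean()) ** 2)
    print(np.isclose(scatter, n * z.var()))   # True: np.var is the population variance, scatter = n * var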
Fisher Linear Discriminant
Fisher Solution: normalize $|\tilde\mu_1 - \tilde\mu_2|$ by scatter
Let $y_i = v^t x_i$, i.e. the $y_i$'s are the projected samples
Scatter for the projected samples of class 1 is
$$\tilde s_1^2 = \sum_{y_i \in \text{Class 1}}\left(y_i - \tilde\mu_1\right)^2$$
Scatter for the projected samples of class 2 is
$$\tilde s_2^2 = \sum_{y_i \in \text{Class 2}}\left(y_i - \tilde\mu_2\right)^2$$
Fisher Linear Discriminant
We need to normalize by both the scatter of class 1 and the scatter of class 2
Thus the Fisher linear discriminant is to project onto the line in the direction $v$ which maximizes
$$J(v) = \frac{\left(\tilde\mu_1 - \tilde\mu_2\right)^2}{\tilde s_1^2 + \tilde s_2^2}$$
The numerator wants the projected means to be far from each other
The denominator wants the scatter in class 1 to be as small as possible, i.e. the samples of class 1 cluster around the projected mean $\tilde\mu_1$, and likewise the scatter in class 2 to be as small as possible, i.e. the samples of class 2 cluster around the projected mean $\tilde\mu_2$
Fisher Linear Discriminant
$$J(v) = \frac{\left(\tilde\mu_1 - \tilde\mu_2\right)^2}{\tilde s_1^2 + \tilde s_2^2}$$
If we find $v$ which makes $J(v)$ large, we are guaranteed that the classes are well separated:
the projected means are far from each other,
a small $\tilde s_1$ implies that the projected samples of class 1 are clustered around the projected mean $\tilde\mu_1$,
a small $\tilde s_2$ implies that the projected samples of class 2 are clustered around the projected mean $\tilde\mu_2$
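To make the criterion concrete, here is a small sketch (mine, not from the slides) that evaluates $J(v)$ for a candidate direction $v$; it uses the two-class data from the example later in these slides:

    import numpy as np

    def fisher_criterion(X1, X2, v):
        """J(v) = (mu1~ - mu2~)^2 / (s1~^2 + s2~^2) for projected samples y = X v."""
        v = np.asarray(v, dtype=float)
        y1, y2 = X1 @ v, X2 @ v                                    # projected samples of each class
        mu1, mu2 = y1.mean(), y2.mean()                            # projected means
        s1, s2 = np.sum((y1 - mu1) ** 2), np.sum((y2 - mu2) ** 2)  # projected scatters
        return (mu1 - mu2) ** 2 / (s1 + s2)

    X1 = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)
    X2 = np.array([[1, 0], [2, 1], [3, 1], [3, 2], [5, 3], [6, 5]], dtype=float)
    print(fisher_criterion(X1, X2, [1, 0]))   # project onto the x-axis: small J
    print(fisher_criterion(X1, X2, [0, 1]))   # project onto the y-axis: much larger J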
Fisher Linear Discriminant Derivation
All we need to do now is to express $J$ explicitly as a function of $v$ and maximize it
$$J(v) = \frac{\left(\tilde\mu_1 - \tilde\mu_2\right)^2}{\tilde s_1^2 + \tilde s_2^2}$$
This is straightforward but needs linear algebra and calculus
Define the separate class scatter matrices $S_1$ and $S_2$ for classes 1 and 2. These measure the scatter of the original samples $x_i$ (before projection):
$$S_1 = \sum_{x_i \in \text{Class 1}}\left(x_i - \mu_1\right)\left(x_i - \mu_1\right)^t \qquad S_2 = \sum_{x_i \in \text{Class 2}}\left(x_i - \mu_2\right)\left(x_i - \mu_2\right)^t$$
Fisher Linear Discriminant Derivation
Now define the within-the-class scatter matrix
$$S_W = S_1 + S_2$$
Recall that
$$\tilde s_1^2 = \sum_{y_i \in \text{Class 1}}\left(y_i - \tilde\mu_1\right)^2$$
Using $y_i = v^t x_i$ and $\tilde\mu_1 = v^t \mu_1$:
$$\tilde s_1^2 = \sum_{y_i \in \text{Class 1}}\left(v^t x_i - v^t \mu_1\right)^2 = \sum_{y_i \in \text{Class 1}}\left(v^t\left(x_i - \mu_1\right)\right)\left(v^t\left(x_i - \mu_1\right)\right)^t = \sum_{y_i \in \text{Class 1}} v^t\left(x_i - \mu_1\right)\left(x_i - \mu_1\right)^t v = v^t S_1 v$$
Fisher Linear Discriminant Derivation
Similarly, $\tilde s_2^2 = v^t S_2 v$
Therefore
$$\tilde s_1^2 + \tilde s_2^2 = v^t S_1 v + v^t S_2 v = v^t S_W v$$
Let's rewrite the separation of the projected means:
$$\left(\tilde\mu_1 - \tilde\mu_2\right)^2 = \left(v^t\mu_1 - v^t\mu_2\right)^2 = v^t\left(\mu_1 - \mu_2\right)\left(\mu_1 - \mu_2\right)^t v = v^t S_B v$$
Define the between-the-class scatter matrix
$$S_B = \left(\mu_1 - \mu_2\right)\left(\mu_1 - \mu_2\right)^t$$
$S_B$ measures the separation between the means of the two classes (before projection)
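A short sketch (my own helper, not from the slides) computing $S_1$, $S_2$, $S_W$ and $S_B$ directly from two sample matrices with one sample per row:

    import numpy as np

    def scatter_matrices(X1, X2):
        """Return the within-class scatter S_W = S1 + S2 and the between-class scatter S_B."""
        mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
        S1 = (X1 - mu1).T @ (X1 - mu1)      # sum over class 1 of (x - mu1)(x - mu1)^t
        S2 = (X2 - mu2).T @ (X2 - mu2)      # sum over class 2 of (x - mu2)(x - mu2)^t
        S_W = S1 + S2
        d = (mu1 - mu2).reshape(-1, 1)
        S_B = d @ d.T                       # (mu1 - mu2)(mu1 - mu2)^t
        return S_W, S_B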
Fisher Linear Discriminant Derivation
Thus our objective function can be written:
$$J(v) = \frac{\left(\tilde\mu_1 - \tilde\mu_2\right)^2}{\tilde s_1^2 + \tilde s_2^2} = \frac{v^t S_B v}{v^t S_W v}$$
Maximize $J(v)$ by taking the derivative w.r.t. $v$ and setting it to 0:
$$\frac{d}{dv}J(v) = \frac{\left(\frac{d}{dv}\left(v^t S_B v\right)\right)\left(v^t S_W v\right) - \left(v^t S_B v\right)\left(\frac{d}{dv}\left(v^t S_W v\right)\right)}{\left(v^t S_W v\right)^2} = \frac{2 S_B v\left(v^t S_W v\right) - \left(v^t S_B v\right) 2 S_W v}{\left(v^t S_W v\right)^2} = 0$$
Fisher Linear Discriminant Derivation
Need to solve
$$\left(v^t S_W v\right) S_B v - \left(v^t S_B v\right) S_W v = 0$$
Divide through by $v^t S_W v$:
$$\frac{v^t S_W v}{v^t S_W v}\, S_B v - \frac{v^t S_B v}{v^t S_W v}\, S_W v = 0 \;\Longrightarrow\; S_B v - \lambda S_W v = 0, \qquad \lambda = \frac{v^t S_B v}{v^t S_W v}$$
$$S_B v = \lambda S_W v$$
This is a generalized eigenvalue problem
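In practice the generalized problem $S_B v = \lambda S_W v$ can be handed to a numerical routine; a minimal sketch using scipy.linalg.eig, which accepts a second matrix for the generalized problem (the matrices below are stand-ins just to show the call, in real use they would come from the data as defined above):

    import numpy as np
    from scipy.linalg import eig

    rng = np.random.default_rng(0)
    A = rng.normal(size=(2, 2))
    S_W = A @ A.T + np.eye(2)                  # positive-definite stand-in for the within-class scatter
    d = rng.normal(size=(2, 1))
    S_B = d @ d.T                              # rank-1 stand-in, like (mu1 - mu2)(mu1 - mu2)^t

    eigvals, eigvecs = eig(S_B, S_W)           # solves S_B v = lambda S_W v
    v = eigvecs[:, np.argmax(eigvals.real)]    # direction with the largest eigenvalue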
Fisher Linear Discriminant Derivation
If $S_W$ has full rank (the inverse exists), we can convert this to a standard eigenvalue problem:
$$S_B v = \lambda S_W v \;\Longrightarrow\; S_W^{-1} S_B v = \lambda v$$
But $S_B x$, for any vector $x$, points in the same direction as $\mu_1 - \mu_2$:
$$S_B x = \left(\mu_1 - \mu_2\right)\left(\mu_1 - \mu_2\right)^t x = \left(\mu_1 - \mu_2\right)\left(\left(\mu_1 - \mu_2\right)^t x\right) = \alpha\left(\mu_1 - \mu_2\right)$$
Thus we can solve the eigenvalue problem immediately:
$$v = S_W^{-1}\left(\mu_1 - \mu_2\right)$$
Check: $S_W^{-1} S_B\left[S_W^{-1}\left(\mu_1 - \mu_2\right)\right] = S_W^{-1}\left[\alpha\left(\mu_1 - \mu_2\right)\right] = \alpha\, S_W^{-1}\left(\mu_1 - \mu_2\right) = \lambda v$, so $v$ is indeed an eigenvector
Fisher Linear Discriminant Example
Data:
Class 1 has 5 samples: c1 = [(1,2), (2,3), (3,3), (4,5), (5,5)]
Class 2 has 6 samples: c2 = [(1,0), (2,1), (3,1), (3,2), (5,3), (6,5)]
Arrange the data in 2 separate matrices:
$$c_1 = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 3 \\ 4 & 5 \\ 5 & 5 \end{bmatrix} \qquad c_2 = \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 3 & 1 \\ 3 & 2 \\ 5 & 3 \\ 6 & 5 \end{bmatrix}$$
Notice that PCA performs very poorly on this data because the direction of largest variance is not helpful for classification
Fisher Linear Discriminant Example
First compute the mean for each class:
$$\mu_1 = \text{mean}(c_1) = \begin{bmatrix} 3 & 3.6 \end{bmatrix} \qquad \mu_2 = \text{mean}(c_2) = \begin{bmatrix} 3.3 & 2 \end{bmatrix}$$
Compute the scatter matrices $S_1$ and $S_2$ for each class:
$$S_1 = 4\,\text{cov}(c_1) = \begin{bmatrix} 10 & 8 \\ 8 & 7.2 \end{bmatrix} \qquad S_2 = 5\,\text{cov}(c_2) = \begin{bmatrix} 17.3 & 16 \\ 16 & 16 \end{bmatrix}$$
Within-the-class scatter:
$$S_W = S_1 + S_2 = \begin{bmatrix} 27.3 & 24 \\ 24 & 23.2 \end{bmatrix}$$
It has full rank, so we don't have to solve for eigenvalues
The inverse of $S_W$ is
$$S_W^{-1} = \text{inv}(S_W) = \begin{bmatrix} 0.39 & -0.41 \\ -0.41 & 0.47 \end{bmatrix}$$
Finally, the optimal line direction is
$$v = S_W^{-1}\left(\mu_1 - \mu_2\right) = \begin{bmatrix} -0.79 \\ 0.89 \end{bmatrix}$$
Fisher Linear Discriminant Example
Notice, as long as the line has the right direction, its exact position does not matter
Last step is to compute the actual 1D vector $y$. Let's do it separately for each class (here the direction is rescaled to roughly unit length, $v^t \approx \begin{bmatrix} -0.65 & 0.73 \end{bmatrix}$; only the direction matters):
$$Y_1 = v^t c_1^t = \begin{bmatrix} -0.65 & 0.73 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 3 & 5 & 5 \end{bmatrix} = \begin{bmatrix} 0.81 & 0.89 & 0.24 & 1.05 & 0.40 \end{bmatrix}$$
$$Y_2 = v^t c_2^t = \begin{bmatrix} -0.65 & 0.73 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 & 3 & 5 & 6 \\ 0 & 1 & 1 & 2 & 3 & 5 \end{bmatrix} = \begin{bmatrix} -0.65 & -0.57 & -1.22 & -0.49 & -1.06 & -0.25 \end{bmatrix}$$
The projected samples of class 1 are all positive and those of class 2 are all negative, so the classes are well separated on this line
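The whole example can be reproduced in a few lines of NumPy; this is my own sketch, and the printed numbers should match the slide values up to the rounding of intermediate results:

    import numpy as np

    c1 = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)
    c2 = np.array([[1, 0], [2, 1], [3, 1], [3, 2], [5, 3], [6, 5]], dtype=float)

    mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)   # [3.0, 3.6] and [3.33, 2.0]
    S1 = (c1 - mu1).T @ (c1 - mu1)                # [[10, 8], [8, 7.2]]
    S2 = (c2 - mu2).T @ (c2 - mu2)                # [[17.3, 16], [16, 16]]
    S_W = S1 + S2                                 # [[27.3, 24], [24, 23.2]]

    v = np.linalg.inv(S_W) @ (mu1 - mu2)          # [-0.79, 0.89]
    v = v / np.linalg.norm(v)                     # scale is irrelevant; normalize for readability

    Y1, Y2 = c1 @ v, c2 @ v                       # projected 1D samples
    print(v, Y1, Y2)                              # class 1 projects to positive values, class 2 to negative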
Multiple Discriminant Analysis (MDA)
Can generalize FLD to multiple classes
In the case of $c$ classes, can reduce dimensionality to $1, 2, 3, \dots, c-1$ dimensions
Project sample $x_i$ to a linear subspace: $y_i = V^t x_i$, where $V$ is called the projection matrix
Multiple Discriminant Analysis (MDA)
Objective function:
$$J(V) = \frac{\det\left(V^t S_B V\right)}{\det\left(V^t S_W V\right)}$$
Let $n_i$ be the number of samples of class $i$, $\mu_i$ be the sample mean of class $i$, and $\mu$ be the total mean of all samples:
$$\mu_i = \frac{1}{n_i}\sum_{x \in \text{class } i} x \qquad \mu = \frac{1}{n}\sum_{i} x_i$$
The within-the-class scatter matrix $S_W$ is
$$S_W = \sum_{i=1}^{c} S_i = \sum_{i=1}^{c}\sum_{x_k \in \text{class } i}\left(x_k - \mu_i\right)\left(x_k - \mu_i\right)^t$$
The between-the-class scatter matrix $S_B$ is
$$S_B = \sum_{i=1}^{c} n_i\left(\mu_i - \mu\right)\left(\mu_i - \mu\right)^t$$
Its maximum rank is $c - 1$
Multiple Discriminant Analysis (MDA)
$$J(V) = \frac{\det\left(V^t S_B V\right)}{\det\left(V^t S_W V\right)}$$
First solve the generalized eigenvalue problem:
$$S_B v = \lambda S_W v$$
There are at most $c - 1$ distinct solution eigenvalues
Let $v_1, v_2, \dots, v_{c-1}$ be the corresponding eigenvectors
The optimal projection matrix $V$ to a subspace of dimension $k$ is given by the eigenvectors corresponding to the largest $k$ eigenvalues
Thus we can project to a subspace of dimension at most $c - 1$
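A minimal multi-class sketch of this procedure (my own, assuming NumPy/SciPy are available; the function name mda and its arguments are made up for illustration): build $S_W$ and $S_B$ over the $c$ classes, solve the generalized eigenvalue problem, and keep the $k$ eigenvectors with the largest eigenvalues.

    import numpy as np
    from scipy.linalg import eig

    def mda(X, labels, k):
        """Project the d-dimensional rows of X to k <= c-1 dimensions with multiple discriminant analysis."""
        classes = np.unique(labels)
        d = X.shape[1]
        mu = X.mean(axis=0)                                  # total mean of all samples
        S_W = np.zeros((d, d))
        S_B = np.zeros((d, d))
        for c in classes:
            Xc = X[labels == c]
            mu_c = Xc.mean(axis=0)
            S_W += (Xc - mu_c).T @ (Xc - mu_c)               # within-class scatter
            diff = (mu_c - mu).reshape(-1, 1)
            S_B += len(Xc) * (diff @ diff.T)                 # between-class scatter
        eigvals, eigvecs = eig(S_B, S_W)                     # S_B v = lambda S_W v
        order = np.argsort(eigvals.real)[::-1]               # largest eigenvalues first
        V = eigvecs[:, order[:k]].real                       # projection matrix (d x k)
        return X @ V                                         # projected samples y_i = V^t x_i (as rows)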
FDA and MDA Drawbacks
Reduces dimension only to $k = c - 1$ (unlike PCA)
For complex data, projection to even the best line may result in inseparable projected samples
Will fail:
1. If $J(v)$ is always 0: happens if $\mu_1 = \mu_2$ (PCA performs reasonably well here)
2. If $J(v)$ is always small: the classes have large overlap when projected to any line (PCA also fails here)