PCA finds the most accurate data representation in a lower-dimensional space
Project data in the directions of maximum variance
Fisher Linear Discriminant: project to a line which preserves the direction useful for data classification
Data Representation vs. Data Classification
However, the directions of maximum variance may be useless for classification
[Figure: applying PCA to two example datasets — after projecting onto the maximum-variance direction, one dataset remains separable and the other does not]
Fisher Linear Discriminant
Main idea: find a projection to a line s.t. samples from different classes are well separated
Example in 2D
[Figure: a bad line to project to (classes are mixed up) vs. a good line to project to (classes are well separated)]
Fisher Linear Discriminant
Suppose we have 2 classes and d-dimensional samples $x_1, \dots, x_n$, where $n_1$ samples come from the first class and $n_2$ samples come from the second class
Consider projection onto a line
Let the line direction be given by the unit vector $v$
The scalar $v^t x_i$ is the distance of the projection of $x_i$ from the origin
Thus $v^t x_i$ is the projection of $x_i$ onto a one-dimensional subspace
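As a quick illustration (not part of the original slides), here is a minimal NumPy sketch of this projection; the sample point and direction below are made up for illustration:

    import numpy as np

    # A made-up 2D sample and a direction vector
    x_i = np.array([3.0, 4.0])
    v = np.array([1.0, 1.0])
    v = v / np.linalg.norm(v)   # make v a unit vector

    # The scalar v^t x_i: signed distance of the projection of x_i from the origin
    y_i = v @ x_i
    print(y_i)                  # approximately 4.95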
Fisher Linear Discriminant
How to measure separation between projections of different classes?
Thus the projection of sample $x_i$ onto a line in direction $v$ is given by $v^t x_i$
Let $\mu_1$ and $\mu_2$ be the means of classes 1 and 2:
$$\mu_1 = \frac{1}{n_1}\sum_{x_i \in \text{Class 1}} x_i \qquad \mu_2 = \frac{1}{n_2}\sum_{x_i \in \text{Class 2}} x_i$$
Let $\tilde\mu_1$ and $\tilde\mu_2$ be the means of the projections of classes 1 and 2:
$$\tilde\mu_1 = \frac{1}{n_1}\sum_{x_i \in \text{Class 1}} v^t x_i = v^t\left(\frac{1}{n_1}\sum_{x_i \in \text{Class 1}} x_i\right) = v^t \mu_1, \qquad \text{similarly } \tilde\mu_2 = v^t \mu_2$$
$|\tilde\mu_1 - \tilde\mu_2|$ seems like a good measure of separation
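A small NumPy check (mine, not from the slides) of the identity $\tilde\mu = v^t\mu$ above, i.e. that the mean of the projections equals the projection of the mean; the data here is made up:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))            # made-up class samples (one per row)
    v = np.array([0.6, 0.8])                # unit direction

    mu = X.mean(axis=0)                     # class mean (before projection)
    mu_tilde = (X @ v).mean()               # mean of the projected samples
    print(np.isclose(mu_tilde, v @ mu))     # True: mean of v^t x_i equals v^t mu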
Fisher Linear Discriminant
How good is $|\tilde\mu_1 - \tilde\mu_2|$ as a measure of separation? The larger $|\tilde\mu_1 - \tilde\mu_2|$, the better the expected separation
[Figure: two classes in 2D, projected onto the horizontal and the vertical axis]
The vertical axis is a better line than the horizontal axis to project to for class separability
However, the distance between the projected means is larger on the horizontal axis than on the vertical axis
Fisher Linear Discriminant
The problem with $|\tilde\mu_1 - \tilde\mu_2|$ is that it does not consider the variance of the classes
[Figure: same example — the projection onto the horizontal axis has large variance within each class, the projection onto the vertical axis has small variance]
Fisher Linear Discriminant
We need to normalize $|\tilde\mu_1 - \tilde\mu_2|$ by a factor which is proportional to variance
Have samples $z_1, \dots, z_n$. The sample mean is
$$\mu_z = \frac{1}{n}\sum_{i=1}^{n} z_i$$
Define their scatter as
$$s = \sum_{i=1}^{n}\left(z_i - \mu_z\right)^2$$
Thus scatter is just the sample variance multiplied by $n$
Scatter measures the same thing as variance, the spread of data around the mean; scatter is just on a different scale than variance
[Figure: two 1D samples around the same mean — one with larger scatter, one with smaller scatter]
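A one-line check of the relation above (scatter = n times the population variance), using made-up numbers:

    import numpy as np

    z = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # made-up samples
    n = len(z)
    scatter = np.sum((z - z.mean()) ** 2)
    print(np.isclose(scatter, n * z.var()))   # True: np.var is the population variance, scatter = n * var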
Fisher Linear Discriminant
Fisher Solution: normalize $|\tilde\mu_1 - \tilde\mu_2|$ by scatter
Let $y_i = v^t x_i$, i.e. the $y_i$'s are the projected samples
Scatter for the projected samples of class 1 is
$$\tilde s_1^2 = \sum_{y_i \in \text{Class 1}}\left(y_i - \tilde\mu_1\right)^2$$
Scatter for the projected samples of class 2 is
$$\tilde s_2^2 = \sum_{y_i \in \text{Class 2}}\left(y_i - \tilde\mu_2\right)^2$$
Fisher Linear Discriminant
We need to normalize by both the scatter of class 1 and the scatter of class 2
Thus the Fisher linear discriminant is to project onto the line in the direction $v$ which maximizes
$$J(v) = \frac{\left(\tilde\mu_1 - \tilde\mu_2\right)^2}{\tilde s_1^2 + \tilde s_2^2}$$
The numerator wants the projected means to be far from each other
The denominator wants the scatter in class 1 to be as small as possible, i.e. the samples of class 1 cluster around the projected mean $\tilde\mu_1$, and likewise the scatter in class 2 to be as small as possible, i.e. the samples of class 2 cluster around the projected mean $\tilde\mu_2$
Fisher Linear Discriminant
$$J(v) = \frac{\left(\tilde\mu_1 - \tilde\mu_2\right)^2}{\tilde s_1^2 + \tilde s_2^2}$$
If we find $v$ which makes $J(v)$ large, we are guaranteed that the classes are well separated:
the projected means are far from each other,
a small $\tilde s_1$ implies that the projected samples of class 1 are clustered around the projected mean $\tilde\mu_1$,
a small $\tilde s_2$ implies that the projected samples of class 2 are clustered around the projected mean $\tilde\mu_2$
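To make the criterion concrete, here is a small sketch (mine, not from the slides) that evaluates $J(v)$ for a candidate direction $v$; it uses the two-class data from the example later in these slides:

    import numpy as np

    def fisher_criterion(X1, X2, v):
        """J(v) = (mu1~ - mu2~)^2 / (s1~^2 + s2~^2) for projected samples y = X v."""
        v = np.asarray(v, dtype=float)
        y1, y2 = X1 @ v, X2 @ v                                    # projected samples of each class
        mu1, mu2 = y1.mean(), y2.mean()                            # projected means
        s1, s2 = np.sum((y1 - mu1) ** 2), np.sum((y2 - mu2) ** 2)  # projected scatters
        return (mu1 - mu2) ** 2 / (s1 + s2)

    X1 = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)
    X2 = np.array([[1, 0], [2, 1], [3, 1], [3, 2], [5, 3], [6, 5]], dtype=float)
    print(fisher_criterion(X1, X2, [1, 0]))   # project onto the x-axis: small J
    print(fisher_criterion(X1, X2, [0, 1]))   # project onto the y-axis: much larger J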
Fisher Linear Discriminant Derivation
All we need to do now is to express $J$ explicitly as a function of $v$ and maximize it
$$J(v) = \frac{\left(\tilde\mu_1 - \tilde\mu_2\right)^2}{\tilde s_1^2 + \tilde s_2^2}$$
This is straightforward but needs linear algebra and calculus
Define the separate class scatter matrices $S_1$ and $S_2$ for classes 1 and 2. These measure the scatter of the original samples $x_i$ (before projection):
$$S_1 = \sum_{x_i \in \text{Class 1}}\left(x_i - \mu_1\right)\left(x_i - \mu_1\right)^t \qquad S_2 = \sum_{x_i \in \text{Class 2}}\left(x_i - \mu_2\right)\left(x_i - \mu_2\right)^t$$
Fisher Linear Discriminant Derivation
Now define the within-the-class scatter matrix
$$S_W = S_1 + S_2$$
Recall that
$$\tilde s_1^2 = \sum_{y_i \in \text{Class 1}}\left(y_i - \tilde\mu_1\right)^2$$
Using $y_i = v^t x_i$ and $\tilde\mu_1 = v^t \mu_1$:
$$\tilde s_1^2 = \sum_{y_i \in \text{Class 1}}\left(v^t x_i - v^t \mu_1\right)^2 = \sum_{y_i \in \text{Class 1}}\left(v^t\left(x_i - \mu_1\right)\right)\left(v^t\left(x_i - \mu_1\right)\right)^t = \sum_{y_i \in \text{Class 1}} v^t\left(x_i - \mu_1\right)\left(x_i - \mu_1\right)^t v = v^t S_1 v$$
Fisher Linear Discriminant Derivation
Similarly, $\tilde s_2^2 = v^t S_2 v$
Therefore
$$\tilde s_1^2 + \tilde s_2^2 = v^t S_1 v + v^t S_2 v = v^t S_W v$$
Let's rewrite the separation of the projected means:
$$\left(\tilde\mu_1 - \tilde\mu_2\right)^2 = \left(v^t\mu_1 - v^t\mu_2\right)^2 = v^t\left(\mu_1 - \mu_2\right)\left(\mu_1 - \mu_2\right)^t v = v^t S_B v$$
Define the between-the-class scatter matrix
$$S_B = \left(\mu_1 - \mu_2\right)\left(\mu_1 - \mu_2\right)^t$$
$S_B$ measures the separation between the means of the two classes (before projection)
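A short sketch (my own helper, not from the slides) computing $S_1$, $S_2$, $S_W$ and $S_B$ directly from two sample matrices with one sample per row:

    import numpy as np

    def scatter_matrices(X1, X2):
        """Return the within-class scatter S_W = S1 + S2 and the between-class scatter S_B."""
        mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
        S1 = (X1 - mu1).T @ (X1 - mu1)      # sum over class 1 of (x - mu1)(x - mu1)^t
        S2 = (X2 - mu2).T @ (X2 - mu2)      # sum over class 2 of (x - mu2)(x - mu2)^t
        S_W = S1 + S2
        d = (mu1 - mu2).reshape(-1, 1)
        S_B = d @ d.T                       # (mu1 - mu2)(mu1 - mu2)^t
        return S_W, S_B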
Fisher Linear Discriminant Derivation
Thus our objective function can be written:
$$J(v) = \frac{\left(\tilde\mu_1 - \tilde\mu_2\right)^2}{\tilde s_1^2 + \tilde s_2^2} = \frac{v^t S_B v}{v^t S_W v}$$
Maximize $J(v)$ by taking the derivative w.r.t. $v$ and setting it to 0:
$$\frac{d}{dv}J(v) = \frac{\left(\frac{d}{dv}\left(v^t S_B v\right)\right)\left(v^t S_W v\right) - \left(v^t S_B v\right)\left(\frac{d}{dv}\left(v^t S_W v\right)\right)}{\left(v^t S_W v\right)^2} = \frac{2 S_B v\left(v^t S_W v\right) - \left(v^t S_B v\right) 2 S_W v}{\left(v^t S_W v\right)^2} = 0$$
Fisher Linear Discriminant Derivation
Need to solve
$$\left(v^t S_W v\right) S_B v - \left(v^t S_B v\right) S_W v = 0$$
Divide through by $v^t S_W v$:
$$\frac{v^t S_W v}{v^t S_W v}\, S_B v - \frac{v^t S_B v}{v^t S_W v}\, S_W v = 0 \;\Longrightarrow\; S_B v - \lambda S_W v = 0, \qquad \lambda = \frac{v^t S_B v}{v^t S_W v}$$
$$S_B v = \lambda S_W v$$
This is a generalized eigenvalue problem
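In practice the generalized problem $S_B v = \lambda S_W v$ can be handed to a numerical routine; a minimal sketch using scipy.linalg.eig, which accepts a second matrix for the generalized problem (the matrices below are stand-ins just to show the call, in real use they would come from the data as defined above):

    import numpy as np
    from scipy.linalg import eig

    rng = np.random.default_rng(0)
    A = rng.normal(size=(2, 2))
    S_W = A @ A.T + np.eye(2)                  # positive-definite stand-in for the within-class scatter
    d = rng.normal(size=(2, 1))
    S_B = d @ d.T                              # rank-1 stand-in, like (mu1 - mu2)(mu1 - mu2)^t

    eigvals, eigvecs = eig(S_B, S_W)           # solves S_B v = lambda S_W v
    v = eigvecs[:, np.argmax(eigvals.real)]    # direction with the largest eigenvalue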
Fisher Linear Discriminant Derivation
If $S_W$ has full rank (the inverse exists), we can convert this to a standard eigenvalue problem:
$$S_B v = \lambda S_W v \;\Longrightarrow\; S_W^{-1} S_B v = \lambda v$$
But $S_B x$, for any vector $x$, points in the same direction as $\mu_1 - \mu_2$:
$$S_B x = \left(\mu_1 - \mu_2\right)\left(\mu_1 - \mu_2\right)^t x = \left(\mu_1 - \mu_2\right)\left(\left(\mu_1 - \mu_2\right)^t x\right) = \alpha\left(\mu_1 - \mu_2\right)$$
Thus we can solve the eigenvalue problem immediately:
$$v = S_W^{-1}\left(\mu_1 - \mu_2\right)$$
Check: $S_W^{-1} S_B\left[S_W^{-1}\left(\mu_1 - \mu_2\right)\right] = S_W^{-1}\left[\alpha\left(\mu_1 - \mu_2\right)\right] = \alpha\, S_W^{-1}\left(\mu_1 - \mu_2\right) = \lambda v$, so $v$ is indeed an eigenvector
Fisher Linear Discriminant Example
Data:
Class 1 has 5 samples: c1 = [(1,2), (2,3), (3,3), (4,5), (5,5)]
Class 2 has 6 samples: c2 = [(1,0), (2,1), (3,1), (3,2), (5,3), (6,5)]
Arrange the data in 2 separate matrices:
$$c_1 = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 3 \\ 4 & 5 \\ 5 & 5 \end{bmatrix} \qquad c_2 = \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 3 & 1 \\ 3 & 2 \\ 5 & 3 \\ 6 & 5 \end{bmatrix}$$
Notice that PCA performs very poorly on this data because the direction of largest variance is not helpful for classification
Fisher Linear Discriminant Example
First compute the mean for each class:
$$\mu_1 = \text{mean}(c_1) = \begin{bmatrix} 3 & 3.6 \end{bmatrix} \qquad \mu_2 = \text{mean}(c_2) = \begin{bmatrix} 3.3 & 2 \end{bmatrix}$$
Compute the scatter matrices $S_1$ and $S_2$ for each class:
$$S_1 = 4\,\text{cov}(c_1) = \begin{bmatrix} 10 & 8 \\ 8 & 7.2 \end{bmatrix} \qquad S_2 = 5\,\text{cov}(c_2) = \begin{bmatrix} 17.3 & 16 \\ 16 & 16 \end{bmatrix}$$
Within-the-class scatter:
$$S_W = S_1 + S_2 = \begin{bmatrix} 27.3 & 24 \\ 24 & 23.2 \end{bmatrix}$$
It has full rank, so we don't have to solve for eigenvalues
The inverse of $S_W$ is
$$S_W^{-1} = \text{inv}(S_W) = \begin{bmatrix} 0.39 & -0.41 \\ -0.41 & 0.47 \end{bmatrix}$$
Finally, the optimal line direction is
$$v = S_W^{-1}\left(\mu_1 - \mu_2\right) = \begin{bmatrix} -0.79 \\ 0.89 \end{bmatrix}$$
Fisher Linear Discriminant Example
Notice, as long as the line has the right direction, its exact position does not matter
Last step is to compute the actual 1D vector $y$. Let's do it separately for each class (here the direction is rescaled to roughly unit length, $v^t \approx \begin{bmatrix} -0.65 & 0.73 \end{bmatrix}$; only the direction matters):
$$Y_1 = v^t c_1^t = \begin{bmatrix} -0.65 & 0.73 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 3 & 5 & 5 \end{bmatrix} = \begin{bmatrix} 0.81 & 0.89 & 0.24 & 1.05 & 0.40 \end{bmatrix}$$
$$Y_2 = v^t c_2^t = \begin{bmatrix} -0.65 & 0.73 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 & 3 & 5 & 6 \\ 0 & 1 & 1 & 2 & 3 & 5 \end{bmatrix} = \begin{bmatrix} -0.65 & -0.57 & -1.22 & -0.49 & -1.06 & -0.25 \end{bmatrix}$$
The projected samples of class 1 are all positive and those of class 2 are all negative, so the classes are well separated on this line
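The whole example can be reproduced in a few lines of NumPy; this is my own sketch, and the printed numbers should match the slide values up to the rounding of intermediate results:

    import numpy as np

    c1 = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)
    c2 = np.array([[1, 0], [2, 1], [3, 1], [3, 2], [5, 3], [6, 5]], dtype=float)

    mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)   # [3.0, 3.6] and [3.33, 2.0]
    S1 = (c1 - mu1).T @ (c1 - mu1)                # [[10, 8], [8, 7.2]]
    S2 = (c2 - mu2).T @ (c2 - mu2)                # [[17.3, 16], [16, 16]]
    S_W = S1 + S2                                 # [[27.3, 24], [24, 23.2]]

    v = np.linalg.inv(S_W) @ (mu1 - mu2)          # [-0.79, 0.89]
    v = v / np.linalg.norm(v)                     # scale is irrelevant; normalize for readability

    Y1, Y2 = c1 @ v, c2 @ v                       # projected 1D samples
    print(v, Y1, Y2)                              # class 1 projects to positive values, class 2 to negative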
Multiple Discriminant Analysis (MDA)
Can generalize FLD to multiple classes
In the case of $c$ classes, can reduce dimensionality to $1, 2, 3, \dots, c-1$ dimensions
Project sample $x_i$ to a linear subspace: $y_i = V^t x_i$, where $V$ is called the projection matrix
Multiple Discriminant Analysis (MDA)
Objective function:
$$J(V) = \frac{\det\left(V^t S_B V\right)}{\det\left(V^t S_W V\right)}$$
Let $n_i$ be the number of samples of class $i$, $\mu_i$ be the sample mean of class $i$, and $\mu$ be the total mean of all samples:
$$\mu_i = \frac{1}{n_i}\sum_{x \in \text{class } i} x \qquad \mu = \frac{1}{n}\sum_{i} x_i$$
The within-the-class scatter matrix $S_W$ is
$$S_W = \sum_{i=1}^{c} S_i = \sum_{i=1}^{c}\sum_{x_k \in \text{class } i}\left(x_k - \mu_i\right)\left(x_k - \mu_i\right)^t$$
The between-the-class scatter matrix $S_B$ is
$$S_B = \sum_{i=1}^{c} n_i\left(\mu_i - \mu\right)\left(\mu_i - \mu\right)^t$$
Its maximum rank is $c - 1$
Multiple Discriminant Analysis (MDA)
$$J(V) = \frac{\det\left(V^t S_B V\right)}{\det\left(V^t S_W V\right)}$$
First solve the generalized eigenvalue problem:
$$S_B v = \lambda S_W v$$
There are at most $c - 1$ distinct solution eigenvalues
Let $v_1, v_2, \dots, v_{c-1}$ be the corresponding eigenvectors
The optimal projection matrix $V$ to a subspace of dimension $k$ is given by the eigenvectors corresponding to the largest $k$ eigenvalues
Thus we can project to a subspace of dimension at most $c - 1$
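A minimal multi-class sketch of this procedure (my own, assuming NumPy/SciPy are available; the function name mda and its arguments are made up for illustration): build $S_W$ and $S_B$ over the $c$ classes, solve the generalized eigenvalue problem, and keep the $k$ eigenvectors with the largest eigenvalues.

    import numpy as np
    from scipy.linalg import eig

    def mda(X, labels, k):
        """Project the d-dimensional rows of X to k <= c-1 dimensions with multiple discriminant analysis."""
        classes = np.unique(labels)
        d = X.shape[1]
        mu = X.mean(axis=0)                                  # total mean of all samples
        S_W = np.zeros((d, d))
        S_B = np.zeros((d, d))
        for c in classes:
            Xc = X[labels == c]
            mu_c = Xc.mean(axis=0)
            S_W += (Xc - mu_c).T @ (Xc - mu_c)               # within-class scatter
            diff = (mu_c - mu).reshape(-1, 1)
            S_B += len(Xc) * (diff @ diff.T)                 # between-class scatter
        eigvals, eigvecs = eig(S_B, S_W)                     # S_B v = lambda S_W v
        order = np.argsort(eigvals.real)[::-1]               # largest eigenvalues first
        V = eigvecs[:, order[:k]].real                       # projection matrix (d x k)
        return X @ V                                         # projected samples y_i = V^t x_i (as rows)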
FDA and MDA Drawbacks
Reduces dimension only to $k = c - 1$ (unlike PCA)
For complex data, projection to even the best line may result in inseparable projected samples
Will fail:
1. If $J(v)$ is always 0: happens if $\mu_1 = \mu_2$ (PCA performs reasonably well here)
2. If $J(v)$ is always small: the classes have large overlap when projected to any line (PCA also fails here)