PRINCIPAL COMPONENT ANALYSIS

Page 1: PRINCIPAL COMPONENT ANALYSIS

Machine Intelligence lab - School of Electronics Engineering - Kyungpook National University

ELEC801 Pattern Recognition

PRINCIPAL COMPONENT ANALYSIS

ELEC801 Pattern Recognition, Fall 2017, KNU

Instructor: Gil-Jin Jang

Slide credits:

Srinivasa Narasimhan, CS, CMU; James J. Cochran, Louisiana Tech University;

Barnabás Póczos, University of Alberta

Page 2: PRINCIPAL COMPONENT ANALYSIS

• Example:
  – Input data type: 53 blood and urine measurements (wet chemistry)
  – Samples: 65 people (33 alcoholics, 32 non-alcoholics)
• Data given in a matrix format

Problem: Hi-Dimensional Features

Slide credit: S. Narasimhan

      H-WBC    H-RBC    H-Hgb     H-Hct     H-MCV     H-MCH    H-MCHC
A1    8.0000   4.8200   14.1000   41.0000    85.0000  29.0000  34.0000
A2    7.3000   5.0200   14.7000   43.0000    86.0000  29.0000  34.0000
A3    4.3000   4.4800   14.1000   41.0000    91.0000  32.0000  35.0000
A4    7.5000   4.4700   14.9000   45.0000   101.0000  33.0000  33.0000
A5    7.3000   5.5200   15.4000   46.0000    84.0000  28.0000  33.0000
A6    6.9000   4.8600   16.0000   47.0000    97.0000  33.0000  34.0000
A7    7.8000   4.6800   14.7000   43.0000    92.0000  31.0000  34.0000
A8    8.6000   4.8200   15.8000   42.0000    88.0000  33.0000  37.0000
A9    5.1000   4.7100   14.0000   43.0000    92.0000  30.0000  32.0000

Page 3: PRINCIPAL COMPONENT ANALYSIS

[Figures: a univariate plot (H-Bands per person), a bivariate scatter plot (C-LDH vs. C-Triglycerides), and a trivariate scatter plot (M-EPI vs. C-Triglycerides and C-LDH)]

Data Representation

Slide credit: S. Narasimhan

Multivariate data distributions can be represented by visualizing distributions of pairs or triplets.

Page 4: PRINCIPAL COMPONENT ANALYSIS

• Issues
  – Is there a better representation?
  – Do we need all 53 dimensions?
  – How can we find the BEST lower-dimensional space that conveys maximum useful information?
• One answer: find PRINCIPAL COMPONENTS
  – Goal: explain/summarize the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables.

Data Representation

Page 5: PRINCIPAL COMPONENT ANALYSIS

Principal Component Analysis

Given N points in a p-dimensional space (N large), how does one project onto a 1-dimensional space while preserving broad trends in the data and allowing it to be visualized?

Choose a line that fits the data so the points are spread out well along it. Such a line
– maximizes the variance of the projected data (purple line)
– minimizes the mean squared distance between each data point and its projection (sum of the blue lines)

Page 6: PRINCIPAL COMPONENT ANALYSIS

• Minimize the sum of squared distances to the line.
  – Minimizing the sum of squared distances to the line is the same as maximizing the sum of squared projections onto that line, thanks to Pythagoras.

Algebraic Interpretation – 1D

Page 7: PRINCIPAL COMPONENT ANALYSIS

PCA: Mathematical Formulation

Let us say we have data points x_i, i = 1, ..., N, in p dimensions (p is large). If we want to represent the data set by a single point x_0, a natural criterion is

$$J_0(\mathbf{x}_0) = \sum_{i=1}^{N} \|\mathbf{x}_0 - \mathbf{x}_i\|^2$$

Can we justify this choice mathematically? It turns out that minimizing J_0 yields the sample mean:

$$\mathbf{x}_0 = \mathbf{m} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i$$

Source: Chapter 3 of [DHS]
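The justification, left implicit on the slide, is one line of calculus: set the gradient of J_0 with respect to x_0 to zero,

$$\nabla_{\mathbf{x}_0} J_0 = 2\sum_{i=1}^{N}(\mathbf{x}_0 - \mathbf{x}_i) = \mathbf{0} \;\Rightarrow\; N\mathbf{x}_0 = \sum_{i=1}^{N}\mathbf{x}_i \;\Rightarrow\; \mathbf{x}_0 = \mathbf{m}$$

and since J_0 is convex in x_0, this stationary point is the minimizer.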

Page 8: PRINCIPAL COMPONENT ANALYSIS

PCA: An Intuitive Approach

Representing the data set x_i, i = 1, ..., N by its mean alone is quite uninformative. So let's try to represent the data by a straight line of the form

$$\mathbf{x} = \mathbf{m} + a\,\mathbf{e}$$

This is the equation of a straight line passing through m; e is a unit vector along the line, and a is the signed distance of a point x from m. The individual points projected onto this line are

$$\mathbf{x}_i = \mathbf{m} + a_i\,\mathbf{e}, \qquad i = 1, \dots, N$$

Page 9: PRINCIPAL COMPONENT ANALYSIS

PCA: An Intuitive Approach

The squared-error objective for the line is

$$J_1(a_1,\dots,a_N,\mathbf{e}) \equiv \sum_{i=1}^{N} \|(\mathbf{m}+a_i\mathbf{e})-\mathbf{x}_i\|^2 = \sum_{i=1}^{N} a_i^2\|\mathbf{e}\|^2 - 2\sum_{i=1}^{N} a_i\,\mathbf{e}^T(\mathbf{x}_i-\mathbf{m}) + \sum_{i=1}^{N}\|\mathbf{x}_i-\mathbf{m}\|^2$$

Let's now determine the a_i's. Partially differentiating with respect to a_i (and using ||e|| = 1) we get:

$$a_i = \mathbf{e}^T(\mathbf{x}_i-\mathbf{m})$$

Plugging this expression for a_i into J_1 we get:

$$J_1(\mathbf{e}) = -\mathbf{e}^T S\,\mathbf{e} + \sum_{i=1}^{N}\|\mathbf{x}_i-\mathbf{m}\|^2, \qquad S = \sum_{i=1}^{N} (\mathbf{x}_i-\mathbf{m})(\mathbf{x}_i-\mathbf{m})^T$$

where S is called the scatter matrix (in this case, proportional to the sample covariance matrix).

Page 10: PRINCIPAL COMPONENT ANALYSIS

PCA: An Intuitive Approach…

So minimizing J_1 is equivalent to maximizing

$$\mathbf{e}^T S\,\mathbf{e}$$

subject to the constraint that e is a unit vector:

$$\mathbf{e}^T\mathbf{e} = 1$$

Use the Lagrange multiplier method to form the objective function

$$\mathbf{e}^T S\,\mathbf{e} - \lambda(\mathbf{e}^T\mathbf{e} - 1)$$

and differentiate to obtain the equation

$$2S\mathbf{e} - 2\lambda\mathbf{e} = \mathbf{0} \quad\text{or}\quad S\mathbf{e} = \lambda\mathbf{e}$$

The solution: e is the eigenvector of S corresponding to the largest eigenvalue.

Page 11: PRINCIPAL COMPONENT ANALYSIS

Eigensystem

• For a generic vector x, Sx points in some other direction than x.
• e is an eigenvector and λ an eigenvalue if

$$S\mathbf{e} = \lambda\mathbf{e}$$

Page 12: PRINCIPAL COMPONENT ANALYSIS

PCA: Extension to Multi-dimensions

The preceding analysis can be extended in the following way. Instead of projecting the data points onto a straight line, we may now want to project them onto a d-dimensional plane of the form

$$\mathbf{x} = \mathbf{m} + a_1\mathbf{e}_1 + \dots + a_d\mathbf{e}_d$$

where d is much smaller than the original dimension p. In this case one can form the objective function

$$J_d = \sum_{i=1}^{N} \Big\|\Big(\mathbf{m} + \sum_{k=1}^{d} a_{ki}\mathbf{e}_k\Big) - \mathbf{x}_i\Big\|^2$$

It can also be shown that the vectors e_1, e_2, …, e_d are the d eigenvectors corresponding to the d largest eigenvalues of the scatter matrix

$$S = \sum_{i=1}^{N} (\mathbf{x}_i-\mathbf{m})(\mathbf{x}_i-\mathbf{m})^T$$

Page 13: PRINCIPAL COMPONENT ANALYSIS

Eigenvalues and Eigenvectors

• Given a square matrix S, a non-zero vector e is an eigenvector with eigenvalue λ (a scalar) if it satisfies (here 0 denotes the zero vector)

$$S\mathbf{e} = \lambda\mathbf{e} \;\leftrightarrow\; (S - \lambda I)\mathbf{e} = \mathbf{0}$$

• Characteristics
  – There are n eigenvectors for a non-singular n × n matrix S
  – the solutions of the characteristic equation det(S − λI) = 0 give the eigenvalues (MATLAB function 'eig')
  – example: 2×2 case

$$S = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \quad \det(S-\lambda I) = (2-\lambda)^2 - 1 = 0 \;\Rightarrow\; \lambda = 1,\; 3$$
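As a quick sanity check, the 2×2 example can be reproduced with the 'eig' function mentioned above (a minimal MATLAB/Octave sketch):

S = [2 1; 1 2];
[V, D] = eig(S);    % columns of V are eigenvectors, diag(D) the eigenvalues
disp(diag(D)')      % prints 1 3
disp(V)             % unit eigenvectors, +/-[1 -1]/sqrt(2) and [1 1]/sqrt(2)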

Page 14: PRINCIPAL COMPONENT ANALYSIS

Eigenvectors on Covariance Matrix

• If S is a covariance matrix (S = E[xx^T]) and e is a unit eigenvector (|e| = 1), then E[(e^T x)^2] = λ, since e^T S e = λ e^T e = λ.
• W = [e_1, e_2, …, e_n] decorrelates x, i.e., zeros the off-diagonal terms of the covariance matrix of y = W^T x:

$$E[\mathbf{y}\mathbf{y}^T] = E[W^T\mathbf{x}\mathbf{x}^T W] = W^T E[\mathbf{x}\mathbf{x}^T]\,W = W^T S\,W = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$
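A small numeric check of the decorrelation claim, reusing the 2×2 matrix from the previous page (MATLAB/Octave sketch):

S = [2 1; 1 2];        % covariance matrix from the 2x2 example
[W, D] = eig(S);       % columns of W are unit eigenvectors
disp(W' * S * W)       % diagonal matrix diag(1, 3), up to rounding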

Page 15: PRINCIPAL COMPONENT ANALYSIS

Dimension Reduction

• Suppose each data point is a vector of dimension d.
• The variance captured along eigenvector e_k is its eigenvalue:

$$E[y_k^2] = E[\mathbf{e}_k^T\mathbf{x}\mathbf{x}^T\mathbf{e}_k] = \mathbf{e}_k^T E[\mathbf{x}\mathbf{x}^T]\,\mathbf{e}_k = \mathbf{e}_k^T S\,\mathbf{e}_k = \lambda_k$$

  – The eigenvectors of S define a new coordinate system
    » the eigenvector with the largest eigenvalue captures the most variation among the training vectors x
    » the eigenvector with the smallest eigenvalue captures the least variation
  – We can compress the data (represent it with little error) by using only the top few eigenvectors
    » this corresponds to choosing a "linear subspace": points are represented on a line, plane, or "hyper-plane"
    » these eigenvectors are known as the principal components

Page 16: PRINCIPAL COMPONENT ANALYSIS

Code Example: pca.m

function [W, eigvector, eigvalue] = pca(R)
% PCA  Principal component analysis
%   [W, EIGVECTOR, EIGVALUE] = PCA(R)
%   R:         covariance matrix
%   EIGVECTOR: each column is an eigenvector of R
%   EIGVALUE:  eigenvalues of R, sorted in descending order
%   W:         whitening transform, W*R*W' = I (i.e., E[(Wx)(Wx)'] = I)

[v, d] = eig(R);
eigvector = v;
eigvalue = diag(d);

% Sort in descending order
[~, index] = sort(-eigvalue);    % or use sort(eigvalue, 'descend')
eigvalue = eigvalue(index);
eigvector = eigvector(:, index);

% Scale each eigenvector by 1/sqrt(eigenvalue) to whiten
N = length(eigvalue);
W = zeros(N, N);
for m = 1:N
    W(m,:) = eigvector(:,m)' / sqrt(eigvalue(m));
end
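A minimal usage sketch for the function above (the random data matrix X is an assumption for illustration; note that a pca.m in the current folder shadows the Statistics Toolbox function of the same name):

X  = randn(100, 5);                            % assumed data: 100 samples, 5 dims
Xc = X - repmat(mean(X, 1), size(X, 1), 1);    % center the data
R  = (Xc' * Xc) / size(X, 1);                  % sample covariance matrix
[W, V, lam] = pca(R);
Y  = Xc * V(:, 1:2);                           % project onto top-2 principal components
Z  = Xc * W(1:2, :)';                          % or whitened top-2 coordinates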

Page 17: PRINCIPAL COMPONENT ANALYSIS

PCA Applications:

Face recognition

Facial expression recognition

Barnabás Póczos

University of Alberta

Page 18: PRINCIPAL COMPONENT ANALYSIS

Challenge: Face Recognition

Task: identify a specific person based on a facial image, regardless of glasses, lighting, …

Can we use all of the given 256 × 256 pixels? That is 2^16 = 64K pixels, i.e., 64 KB per image for 256 gray levels.

Slide credit: Barnabás Póczos

Page 19: PRINCIPAL COMPONENT ANALYSIS

• An image is a point in a high-dimensional space
  – An N × M image is a point in R^{NM}
  – We can define vectors in this space as we did in the 2D case

The Space of Faces

[Figure: two face images added together as vectors in image space]

Page 20: PRINCIPAL COMPONENT ANALYSIS

Key Idea

• Images in the possible set of faces χ = {x̂} ⊂ R^P are highly correlated.
• So, compress them to a low-dimensional subspace that captures the key appearance characteristics.
• EIGENFACES [Turk and Pentland]: USE PCA!

Page 21: PRINCIPAL COMPONENT ANALYSIS

Eigenfaces


Eigenfaces look somewhat like generic faces.

Page 22: PRINCIPAL COMPONENT ANALYSIS

• Eigenfaces are the eigenvectors of the covariance matrix of the vector space of human faces
• Eigenfaces are the 'standardized face ingredients' derived from the statistical analysis of many pictures of human faces
• A human face may be considered to be a combination of these standardized faces

Eigenfaces – verbal summary

Page 23: PRINCIPAL COMPONENT ANALYSIS

1. A large set of images of human faces is taken.
2. The images are normalized to line up the eyes, mouths, and other features.
3. The eigenvectors of the covariance matrix of the face image vectors are then extracted.
4. These eigenvectors are called eigenfaces.

Generating Eigenfaces

Page 24: PRINCIPAL COMPONENT ANALYSIS

• When properly weighted, eigenfaces can be summed together to create an approximate gray-scale rendering of a human face.
• Remarkably few eigenvector terms are needed to give a fair likeness of most people's faces.
• Hence eigenfaces provide a means of applying data compression to faces for identification purposes.

Eigenfaces for Face Recognition

Page 25: PRINCIPAL COMPONENT ANALYSIS

Dimensionality Reduction

The set of faces is a "subspace" of the set of images
– Suppose it is K-dimensional
– We can find the best subspace using PCA
– This is like fitting a "hyper-plane" to the set of faces, spanned by vectors v_1, v_2, ..., v_K

Any face: x ≈ x̄ + a_1 v_1 + a_2 v_2 + ... + a_K v_K

Page 26: PRINCIPAL COMPONENT ANALYSIS

Projecting onto the Eigenfaces

• The eigenfaces v_1, ..., v_K span the space of faces
  – A face x is converted to eigenface coordinates by projecting onto each eigenface: a_k = v_k^T (x − x̄), k = 1, …, K

Page 27: PRINCIPAL COMPONENT ANALYSIS

Choosing the Dimension K

[Plot: eigenvalues λ_i, i = 1, …, NM, in decreasing order, with a cutoff at K]

• How many eigenfaces to use? Find the "knee" in the eigenvalue curve.
• Look at the decay of the eigenvalues
  – the eigenvalue tells you the amount of variance "in the direction" of that eigenface
  – ignore eigenfaces with low variance

Page 28: PRINCIPAL COMPONENT ANALYSIS

Applying PCA: Eigenfaces

• Example data set: images of faces
  – Eigenface approach [Turk & Pentland], [Sirovich & Kirby]
• Each face x is an array of 256 × 256 luminance values
  » x in R^{256×256}, viewed as a 64K-dimensional vector
• Form the centered data matrix X = [x_1, …, x_m], one 64K-dimensional column per face for m faces
• Compute Σ = XX^T
• Problem: Σ is 64K × 64K … HUGE!!!

Method A: Build a PCA subspace for each person and check which subspace can reconstruct the test image the best.
Method B: Build one PCA database for the whole dataset and then classify based on the weights.

Slide credit: Barnabás Póczos

Page 29: PRINCIPAL COMPONENT ANALYSIS

Computational Complexity

• Suppose we have m instances, each of size N
  – Eigenfaces: m = 500 faces, each of size N = 64K
• Given the N×N covariance matrix Σ, we can compute
  – all N eigenvectors/eigenvalues in O(N^3)
  – the first k eigenvectors/eigenvalues in O(kN^2)
• But if N = 64K, this often becomes computationally intractable

Slide credit: Barnabás Póczos

Page 30: PRINCIPAL COMPONENT ANALYSIS

A Clever Workaround

• Note that m << 64K
• Use L = X^T X (m × m) instead of Σ = XX^T (N × N)
• If v is an eigenvector of L, then Xv is an eigenvector of Σ

Proof:  L v = γ v
        X^T X v = γ v
        X (X^T X v) = X (γ v) = γ Xv
        (X X^T) X v = γ (Xv)
        Σ (Xv) = γ (Xv)

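A minimal MATLAB/Octave sketch of this trick (X as on the slide; the unit-length normalization step is an assumption):

L = X' * X;                    % small m-by-m matrix, X is N-by-m and centered
[V, D] = eig(L);               % eigenvectors v of L
U = X * V;                     % columns X*v are eigenvectors of Sigma = X*X'
U = U ./ repmat(sqrt(sum(U.^2, 1)), size(U, 1), 1);   % normalize to unit length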

Slide credit: Barnabás Póczos

Page 31: PRINCIPAL COMPONENT ANALYSIS

Happiness subspace (Method A)

Method A: Build a PCA subspace for each person and check which subspace can reconstruct the test image the best.

Slide credit: Barnabás Póczos

Page 32: PRINCIPAL COMPONENT ANALYSIS

Disgust subspace (Method A)

Method A: Build a PCA subspace for each person and check which subspace can reconstruct the test image the best.

Slide credit: Barnabás Póczos

Page 33: PRINCIPAL COMPONENT ANALYSIS

Principal Components (Method B)

Slide credit: Barnabás Póczos

Method B: Build one PCA database for the whole dataset and then classify based on the weights.

Page 34: PRINCIPAL COMPONENT ANALYSIS

Classification with Eigenfaces (Method B)

1. Process the image database (set of images with labels)
   – Run PCA: compute eigenfaces
   – Calculate the K coefficients for each image
2. Given a new image (to be recognized) x, calculate its K coefficients
3. Detect whether x is a face
4. If it is a face, who is it?
   • Find the closest labeled face in the database
   • nearest neighbor in the K-dimensional coefficient space (see the sketch below)
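A hedged sketch of steps 2 and 4 (the names U, xbar, A, and labels are assumptions for illustration, not from the slide):

% U: 64K-by-K eigenfaces, xbar: mean face, A: K-by-n stored coefficients
a = U' * (x - xbar);                               % K coefficients of image x
d = sum((A - repmat(a, 1, size(A, 2))).^2, 1);     % squared distances to database
[~, j] = min(d);                                   % nearest neighbor
identity = labels(j);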

Page 35: PRINCIPAL COMPONENT ANALYSIS

Reconstructing… (Method B)

• Reconstruction is faster if we train with…
  – only people without glasses
  – the same lighting conditions

Page 36: PRINCIPAL COMPONENT ANALYSIS

• Advantages
  – PCA is completely knowledge-free
  – Reduction in computation and memory requirements
• Shortcomings
  – PCA requires carefully controlled data:
    » all faces centered in the frame and of the same size; the method is sensitive to viewing angle
  – Alternative:
    » "learn" one set of PCA vectors for each angle
    » use the one with the lowest error

Summary

Slide credit: Barnabás Póczos

Page 37: PRINCIPAL COMPONENT ANALYSIS

LINEAR DISCRIMINANT ANALYSIS

PCA vs LDA

LDA for two classes and multi-classes

Example problem

Applications of LDA

Page 38: PRINCIPAL COMPONENT ANALYSIS

Principal Component Analysis (PCA)

Find a transformation w such that the projection

$$Y = \mathbf{w}^T X$$

is dispersed the most (maximum variance).

Page 39: PRINCIPAL COMPONENT ANALYSIS

Linear Discriminant Analysis (LDA)

Find a transformation w such that the projected classes

$$Y_1 = \mathbf{w}^T X_1, \qquad Y_2 = \mathbf{w}^T X_2$$

are maximally separated and each class is minimally dispersed (maximum separation).

Page 40: PRINCIPAL COMPONENT ANALYSIS

Linear Discriminant Analysis (LDA)

• Performs dimensionality reduction "while preserving as much of the class discriminatory information as possible".
• Seeks to find directions along which the classes are best separated.
• Takes into consideration not only the within-class scatter but also the between-class scatter.
• More capable of distinguishing image variation due to identity from variation due to other sources such as illumination and expression.

Page 41: PRINCIPAL COMPONENT ANALYSIS

• Separation by mean, variance, …

Linear Discriminant Analysis (LDA)

[Figure: three cases of two class distributions, separated by mean, by variance, or by both]

Page 42: PRINCIPAL COMPONENT ANALYSIS

Linear Discriminant Analysis (LDA)

• Two classes: ω_1, ω_2
• Introduce
  • S_B: between-class scatter matrix
  • S_W: within-class scatter matrix

Class mean:

$$\boldsymbol{\mu}_c = \frac{1}{N_c}\sum_{\mathbf{x}\in\omega_c} \mathbf{x}$$

Class covariance (scatter of each class):

$$S_c = \sum_{\mathbf{x}\in\omega_c} (\mathbf{x}-\boldsymbol{\mu}_c)(\mathbf{x}-\boldsymbol{\mu}_c)^T$$

Within-class scatter (sum of the covariances):

$$S_W = S_1 + S_2$$

Between-class scatter (scatter of the means):

$$S_B = (\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)^T$$

Page 43: PRINCIPAL COMPONENT ANALYSIS

LDA

Project the data: Y = w^T X. The objective function

$$J(\mathbf{w}) = \frac{\mathbf{w}^T S_B\,\mathbf{w}}{\mathbf{w}^T S_W\,\mathbf{w}}$$

maximizes the projected between-class scatter of y while minimizing the projected within-class scatter of y.

Projected class mean:

$$\tilde{\mu}_c = \frac{1}{N_c}\sum_{y\in\omega_c} y = \mathbf{w}^T\boldsymbol{\mu}_c$$

Projected between-class scatter:

$$\tilde{S}_B = (\tilde{\mu}_1-\tilde{\mu}_2)(\tilde{\mu}_1-\tilde{\mu}_2)^T = \mathbf{w}^T(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)^T\mathbf{w} = \mathbf{w}^T S_B\,\mathbf{w}$$

Projected class scatter:

$$\tilde{S}_c = \sum_{y\in\omega_c} (y-\tilde{\mu}_c)(y-\tilde{\mu}_c)^T = \sum_{\mathbf{x}\in\omega_c} (\mathbf{w}^T\mathbf{x}-\mathbf{w}^T\boldsymbol{\mu}_c)(\mathbf{w}^T\mathbf{x}-\mathbf{w}^T\boldsymbol{\mu}_c)^T = \mathbf{w}^T S_c\,\mathbf{w}$$

so that

$$\tilde{S}_1 + \tilde{S}_2 = \mathbf{w}^T S_W\,\mathbf{w}$$

Page 44: PRINCIPAL COMPONENT ANALYSIS

LDA by Eigenvector

How do we maximize

$$J(\mathbf{w}) = \frac{\mathbf{w}^T S_B\,\mathbf{w}}{\mathbf{w}^T S_W\,\mathbf{w}}\;?$$

Equivalently, minimize

$$-\tfrac{1}{2}\,\mathbf{w}^T S_B\,\mathbf{w} \quad\text{such that}\quad \mathbf{w}^T S_W\,\mathbf{w} = 1$$

Form the Lagrangian and differentiate:

$$\Lambda(\mathbf{w},\lambda) = -\tfrac{1}{2}\,\mathbf{w}^T S_B\,\mathbf{w} + \tfrac{1}{2}\,\lambda\,(\mathbf{w}^T S_W\,\mathbf{w} - 1)$$

$$\frac{\partial\Lambda}{\partial\mathbf{w}} = -S_B\,\mathbf{w} + \lambda S_W\,\mathbf{w} = \mathbf{0}$$

$$S_B\,\mathbf{w} = \lambda S_W\,\mathbf{w} \quad\Rightarrow\quad S_W^{-1} S_B\,\mathbf{w} = \lambda\,\mathbf{w}$$

An eigenvalue problem!

Page 45: PRINCIPAL COMPONENT ANALYSIS

LDA Example

• Example
  • X1 = {(4,1), (2,4), (2,3), (3,6), (4,4)}
  • X2 = {(9,10), (6,8), (9,5), (8,7), (10,8)}

• Class statistics:

$$\boldsymbol{\mu}_1 = [3.0\;\;3.6], \qquad \boldsymbol{\mu}_2 = [8.4\;\;7.6]$$

$$S_1 = \begin{pmatrix} 0.8 & -0.4 \\ -0.4 & 2.64 \end{pmatrix}, \qquad S_2 = \begin{pmatrix} 1.84 & -0.04 \\ -0.04 & 2.64 \end{pmatrix}$$

• Within- and between-class scatter:

$$S_W = \begin{pmatrix} 2.64 & -0.44 \\ -0.44 & 5.28 \end{pmatrix}, \qquad S_B = \begin{pmatrix} 29.16 & 21.60 \\ 21.60 & 16.00 \end{pmatrix}$$

• Solve the eigenvalue problem (a MATLAB sketch follows):

$$S_W^{-1} S_B\,\mathbf{w} = \lambda\,\mathbf{w}$$
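A minimal MATLAB/Octave sketch reproducing these numbers (the scatter matrices are normalized by N = 5, matching the values above):

X1 = [4 1; 2 4; 2 3; 3 6; 4 4];       % class 1, rows are samples
X2 = [9 10; 6 8; 9 5; 8 7; 10 8];     % class 2
m1 = mean(X1); m2 = mean(X2);         % class means [3.0 3.6], [8.4 7.6]
D1 = X1 - repmat(m1, 5, 1); S1 = D1' * D1 / 5;
D2 = X2 - repmat(m2, 5, 1); S2 = D2' * D2 / 5;
SW = S1 + S2;                          % within-class scatter
SB = (m1 - m2)' * (m1 - m2);           % between-class scatter
[V, D] = eig(SW \ SB);                 % solve inv(SW)*SB*w = lambda*w
[~, k] = max(diag(D));
w = V(:, k);                           % LDA projection direction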

Page 46: PRINCIPAL COMPONENT ANALYSIS

LDA on more than 2 classes

$$J(\mathbf{w}) = \frac{\tilde{S}_B}{\tilde{S}_W} = \frac{\mathbf{w}^T S_B\,\mathbf{w}}{\mathbf{w}^T S_W\,\mathbf{w}}$$

Page 47: PRINCIPAL COMPONENT ANALYSIS

LDA on C > 2 classes

• For C classes, with n_i samples X_j^i in class i and overall mean μ:

$$S_W = \sum_{i=1}^{C}\sum_{j=1}^{n_i} (X_j^i - \mu_i)(X_j^i - \mu_i)^T$$

$$S_B = \sum_{i=1}^{C} (\mu_i - \mu)(\mu_i - \mu)^T$$

• Collect all samples in a data matrix U = [X_1^1 X_2^1 … X_{n_C}^C] and solve the generalized eigenvalue problem

$$S_B V = \lambda\,S_W V$$

• The projection matrix stacks the top k eigenvectors: W = [V_1 … V_k]

Page 48: PRINCIPAL COMPONENT ANALYSIS

Comparison of PCA and LDA

• PCA finds axes of maximal variance
  – Computed by eigenvalue decomposition
  – Eigenfaces when applied to face recognition
• LDA finds axes of maximal separation
  – Often referred to as Fisher's linear discriminant
  – Fisherfaces when applied to face recognition

Page 49: PRINCIPAL COMPONENT ANALYSIS

Which set is the Eigenfaces, and which the Fisherfaces?

Images from Wikipedia.org

Page 50: PRINCIPAL COMPONENT ANALYSIS

Eigenfaces vs. Fisherfaces

• Independent Comparative Study of PCA and LDA on the FERET Data Set, by Kresimir Delac, Mislav Grgic, Sonja Grgic
  – PCA: blurred, like average faces
  – LDA finds more discriminant features from face images

Page 51: PRINCIPAL COMPONENT ANALYSIS

References

• Peter N. Belhumeur, João P. Hespanha, and David J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, July 1997.
• Kresimir Delac, Mislav Grgic, Sonja Grgic, "Independent Comparative Study of PCA, ICA, and LDA on the FERET Data Set."
• M. Turk, A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, 3(1), pp. 71-86, 1991.
• Gregory Shakhnarovich, Baback Moghaddam, "Face Recognition in Subspaces," Mitsubishi TR2004-041, May 2004.
• Naotoshi Seo, Project: Eigenfaces and Fisherfaces, http://note.sonots.com/SciSoftware/FaceRecognition.html
• http://www.face-rec.org/algorithms/

Page 52: PRINCIPAL COMPONENT ANALYSIS

PCA Applications:

Image Compression and Denoising

Barnabás Póczos

University of Alberta

Page 53: PRINCIPAL COMPONENT ANALYSIS

Original Image

• Divide the original 372×492 image into patches: each patch is an instance that contains 12×12 pixels on a grid
• View each patch as a 144-D vector

Slide credit: Barnabás Póczos
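A hedged MATLAB/Octave sketch of this patch-based setup and the compression used on the following pages (the file name and the /N covariance scaling are assumptions):

I = double(imread('image.png'));          % grayscale image, e.g. 372x492
P = [];
for r = 1:12:(size(I,1) - 11)
    for c = 1:12:(size(I,2) - 11)
        patch = I(r:r+11, c:c+11);
        P(:, end+1) = patch(:);           % each 12x12 patch as a 144-D column
    end
end
m  = mean(P, 2);
Pc = P - repmat(m, 1, size(P, 2));        % center the patches
[V, D] = eig(Pc * Pc' / size(P, 2));      % 144x144 covariance matrix
[~, idx] = sort(-diag(D)); V = V(:, idx); % sort eigenvectors by eigenvalue
k = 60;                                   % keep the top-60 components
Prec = V(:,1:k) * (V(:,1:k)' * Pc) + repmat(m, 1, size(P, 2));  % 144D -> 60D -> 144D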

Page 54: PRINCIPAL COMPONENT ANALYSIS

L2 error and PCA dimension

Slide credit: Barnabás Póczos

Page 55: PRINCIPAL COMPONENT ANALYSIS

Looks like the discrete cosine bases of JPEG!…

60 most important eigenvectors

Page 56: PRINCIPAL COMPONENT ANALYSIS

2D Discrete Cosine Basis

http://en.wikipedia.org/wiki/Discrete_cosine_transform

Page 57: PRINCIPAL COMPONENT ANALYSIS

PCA compression: 144D to 60D

Slide credit: Barnabás Póczos

Page 58: PRINCIPAL COMPONENT ANALYSIS

16 most important eigenvectors

[Figure: the 16 leading eigen-patches, each displayed as a 12×12 image]

Slide credit: Barnabás Póczos

Page 59: PRINCIPAL COMPONENT ANALYSIS

PCA compression: 144D to 16D

Slide credit: Barnabás Póczos

Page 60: PRINCIPAL COMPONENT ANALYSIS

6 most important eigenvectors

[Figure: the 6 leading eigen-patches, each displayed as a 12×12 image]

Slide credit: Barnabás Póczos

Page 61: PRINCIPAL COMPONENT ANALYSIS

PCA compression: 144D to 6D

Slide credit: Barnabás Póczos

Page 62: PRINCIPAL COMPONENT ANALYSIS

3 most important eigenvectors

[Figure: the 3 leading eigen-patches, each displayed as a 12×12 image]

Slide credit: Barnabás Póczos

Page 63: PRINCIPAL COMPONENT ANALYSIS

PCA compression: 144D to 3D

Slide credit: Barnabás Póczos

Page 64: PRINCIPAL COMPONENT ANALYSIS

Noise Filtering by Auto-Encoder

x x’

U x
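A hedged sketch of this PCA "auto-encoder" step (the names U and m are assumptions: the leading eigen-patches and the mean patch from the compression example; 15 components matches the result two pages ahead):

a  = U(:, 1:15)' * (x - m);    % encode: project the noisy patch x onto 15 components
xr = U(:, 1:15) * a + m;       % decode: the reconstruction x' suppresses the noise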

Page 65: PRINCIPAL COMPONENT ANALYSIS

Noisy image

Slide credit: Barnabás Póczos

Page 66: PRINCIPAL COMPONENT ANALYSIS

Denoised: 15 PCA components

Page 67: PRINCIPAL COMPONENT ANALYSIS

Kernel PCA

Page 68: PRINCIPAL COMPONENT ANALYSIS

Motivation

[Figures: two 2-D scatter plots, u vs. z and v vs. z, with clearly nonlinear structure; onto which direction should we project?]

Page 69: PRINCIPAL COMPONENT ANALYSIS

Motivation

Linear projections will not detect the pattern.

Page 70: PRINCIPAL COMPONENT ANALYSIS

Nonlinear PCA

Three popular methods are available:
1) Neural-network based PCA (E. Oja, 1982)
2) Method of Principal Curves (T. J. Hastie and W. Stuetzle, 1989)
3) Kernel based PCA (B. Schölkopf, A. Smola, and K. Müller, 1998)

Page 71: PRINCIPAL COMPONENT ANALYSIS

[Figure: PCA fits a straight line to the data, while nonlinear PCA (NPCA) fits a curve]

Page 72: PRINCIPAL COMPONENT ANALYSIS

KPCA: Basic Idea

Page 73: PRINCIPAL COMPONENT ANALYSIS

Kernel PCA Formulation…

• Let C be the scatter matrix of the centered mapping φ(x):

$$C = \sum_{i=1}^{N} \phi(\mathbf{x}_i)\,\phi(\mathbf{x}_i)^T$$

• Let w be an eigenvector of C; then w can be written as a linear combination:

$$\mathbf{w} = \sum_{k=1}^{N} \alpha_k\,\phi(\mathbf{x}_k)$$

• Also, we have:

$$C\mathbf{w} = \lambda\mathbf{w}$$

• Combining, we get:

$$\sum_{i=1}^{N} \phi(\mathbf{x}_i)\,\phi(\mathbf{x}_i)^T \sum_{k=1}^{N}\alpha_k\,\phi(\mathbf{x}_k) = \lambda \sum_{k=1}^{N}\alpha_k\,\phi(\mathbf{x}_k)$$

Page 74: PRINCIPAL COMPONENT ANALYSIS

Kernel PCA Formulation…

Multiplying both sides by φ(x_l)^T, for l = 1, 2, …, N:

$$\sum_{i=1}^{N}\sum_{k=1}^{N} \alpha_k\,\phi(\mathbf{x}_l)^T\phi(\mathbf{x}_i)\,\phi(\mathbf{x}_i)^T\phi(\mathbf{x}_k) = \lambda \sum_{k=1}^{N} \alpha_k\,\phi(\mathbf{x}_l)^T\phi(\mathbf{x}_k)$$

In matrix form, with the kernel (Gram) matrix K defined by K_{ij} = φ(x_i)^T φ(x_j):

$$K^2\boldsymbol{\alpha} = \lambda K\boldsymbol{\alpha} \quad\Rightarrow\quad K\boldsymbol{\alpha} = \lambda\boldsymbol{\alpha}$$

Page 75: PRINCIPAL COMPONENT ANALYSIS

Kernel PCA Formulation…

From the eigen equation

$$K\boldsymbol{\alpha} = \lambda\boldsymbol{\alpha}$$

and the fact that the eigenvector w is normalized to 1, we obtain:

$$\|\mathbf{w}\|^2 = \Big(\sum_{i=1}^{N}\alpha_i\phi(\mathbf{x}_i)\Big)^T\Big(\sum_{i=1}^{N}\alpha_i\phi(\mathbf{x}_i)\Big) = \boldsymbol{\alpha}^T K \boldsymbol{\alpha} = \lambda\,\boldsymbol{\alpha}^T\boldsymbol{\alpha} = 1 \;\Rightarrow\; \boldsymbol{\alpha}^T\boldsymbol{\alpha} = \frac{1}{\lambda}$$

Page 76: PRINCIPAL COMPONENT ANALYSIS

KPCA Algorithm

Step 1: Compute the Gram matrix: K_{ij} = k(x_i, x_j), i, j = 1, …, N

Step 2: Compute the (eigenvalue, eigenvector) pairs of K: (λ_l, α^l), l = 1, …, M

Step 3: Normalize the eigenvectors:

$$\boldsymbol{\alpha}^l \leftarrow \frac{\boldsymbol{\alpha}^l}{\sqrt{\lambda_l}}$$

Thus, an eigenvector w^l of C is now represented as:

$$\mathbf{w}^l = \sum_{k=1}^{N} \alpha_k^l\,\phi(\mathbf{x}_k)$$

To project a test feature φ(x) onto w^l we need to compute:

$$(\mathbf{w}^l)^T\phi(\mathbf{x}) = \sum_{k=1}^{N}\alpha_k^l\,\phi(\mathbf{x}_k)^T\phi(\mathbf{x}) = \sum_{k=1}^{N}\alpha_k^l\,k(\mathbf{x}_k,\mathbf{x})$$

So, we never need φ explicitly.
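A minimal MATLAB/Octave sketch of Steps 1-3 and the projection, using the Gram-matrix centering described on the next page (the Gaussian kernel and the data matrix X are assumptions):

N = size(X, 1);                              % X: N-by-p data, rows are samples
kf = @(a, b) exp(-norm(a - b)^2 / 2);        % assumed kernel function
K = zeros(N, N);
for i = 1:N
    for j = 1:N
        K(i, j) = kf(X(i,:), X(j,:));        % Step 1: Gram matrix
    end
end
J = ones(N) / N;
Kc = K - J*K - K*J + J*K*J;                  % center features in kernel space
[A, D] = eig(Kc);                            % Step 2: eigenpairs of K
[lam, idx] = sort(diag(D), 'descend');
A = A(:, idx);
for l = 1:3
    A(:, l) = A(:, l) / sqrt(lam(l));        % Step 3: normalize alpha^l
end
Y = Kc * A(:, 1:3);                          % projections onto top-3 components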

Page 77: PRINCIPAL COMPONENT ANALYSIS

Feature Map Centering

So far we assumed that the feature map φ(x) is centered for the data points x_1, …, x_N, i.e.,

$$\sum_{i=1}^{N}\phi(\mathbf{x}_i) = \mathbf{0}$$

Actually, this centering can be done on the Gram matrix without ever explicitly computing the feature map φ(x):

$$\tilde{K} = (I - \mathbf{1}\mathbf{1}^T/N)\,K\,(I - \mathbf{1}\mathbf{1}^T/N)$$

is the kernel matrix for centered features. A similar expression exists for projecting test features onto the feature eigenspace.

Schölkopf, Smola, Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Technical Report #44, Max Planck Institute, 1996.

Page 78: PRINCIPAL COMPONENT ANALYSIS

KPCA: USPS Digit Recognition

Kernel function: k(x, y) = (x^T y)^d

Classifier: linear SVM with kernel principal components as features. N = 3000, p = 16-by-16 images.

[Table: recognition results for varying polynomial degree d; d = 1 corresponds to linear PCA]

Schölkopf, Smola, Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Technical Report #44, Max Planck Institute, 1996.

Page 79: PRINCIPAL COMPONENT ANALYSIS

Input points before kernel PCA

http://en.wikipedia.org/wiki/Kernel_principal_component_analysis

Page 80: PRINCIPAL COMPONENT ANALYSIS

Output after kernel PCA

The three groups are distinguishable using the first component only

Page 81: PRINCIPAL COMPONENT ANALYSIS

• PCA
  – finds an orthonormal basis for the data
  – sorts dimensions in order of "importance"
  – discards low-significance dimensions
• Uses:
  – get a compact description
  – ignore noise
  – improve classification (hopefully)
• Not magic:
  – doesn't know class labels
  – can only capture linear variations
• One of many tricks to reduce dimensionality!

PCA Conclusions

Page 82: PRINCIPAL COMPONENT ANALYSIS

*Matrix and Vector Derivatives

Matrix and vector derivatives are obtained first by element-wise derivatives and then by reforming them into matrices and vectors.

Slide credit: Tae-Kyun Kim

Page 83: PRINCIPAL COMPONENT ANALYSIS

*Matrix and Vector Derivatives

Slide credit: Tae-Kyun Kim
