Principal Component Analysis CMPUT 466/551 Nilanjan Ray.


Page 1: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

Principal Component Analysis

CMPUT 466/551

Nilanjan Ray

Page 2: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

Overview

• Principal component analysis (PCA) is a way to reduce data dimensionality

• PCA projects high dimensional data to a lower dimension

• PCA projects the data in the least-squares sense: it captures the big (principal) variability in the data and ignores the small variability

Page 3: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

PCA: An Intuitive Approach

Let us say we have $\mathbf{x}_i$, $i = 1 \ldots N$ data points in $p$ dimensions ($p$ is large).

If we want to represent the data set by a single point $\mathbf{x}_0$, then

$$\mathbf{x}_0 = \mathbf{m} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i \quad \text{(sample mean)}$$

Can we justify this choice mathematically? It turns out that if you minimize

$$J_0(\mathbf{x}_0) = \sum_{i=1}^{N}\|\mathbf{x}_0 - \mathbf{x}_i\|^2,$$

you get the above solution, viz., the sample mean.

Source: Chapter 3 of [DHS]
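
As a quick numerical check of this claim (a minimal NumPy sketch; the random data and array shapes are illustrative assumptions, not from the slides), $J_0$ evaluated at the sample mean is never larger than $J_0$ at any other candidate point:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))      # N = 100 points in p = 5 dimensions (illustrative)
    m = X.mean(axis=0)                 # sample mean

    def J0(x0):
        # sum of squared distances from candidate point x0 to all data points
        return np.sum(np.linalg.norm(X - x0, axis=1) ** 2)

    # the sample mean minimizes J0
    assert all(J0(m) <= J0(rng.normal(size=5)) for _ in range(1000))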

Page 4: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

PCA: An Intuitive Approach…

Representing the data set $\mathbf{x}_i$, $i = 1 \ldots N$ by its mean alone is quite uninformative. So let's try to represent the data by a straight line of the form:

$$\mathbf{x} = \mathbf{m} + a\mathbf{e}$$

This is the equation of a straight line that passes through $\mathbf{m}$; $\mathbf{e}$ is a unit vector along the straight line, and the signed distance of a point $\mathbf{x}$ from $\mathbf{m}$ is $a$.

The training points projected onto this straight line would be

$$\mathbf{x}_i = \mathbf{m} + a_i\mathbf{e}, \quad i = 1 \ldots N$$

Page 5: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

PCA: An Intuitive Approach…

Writing the projected points as $\mathbf{m} + a_i\mathbf{e}$, the squared-error objective is

$$J_1(a_1,\ldots,a_N,\mathbf{e}) = \sum_{i=1}^{N}\|(\mathbf{m} + a_i\mathbf{e}) - \mathbf{x}_i\|^2 = \sum_{i=1}^{N}a_i^2\|\mathbf{e}\|^2 - 2\sum_{i=1}^{N}a_i\,\mathbf{e}^T(\mathbf{x}_i - \mathbf{m}) + \sum_{i=1}^{N}\|\mathbf{x}_i - \mathbf{m}\|^2$$

Let's now determine the $a_i$'s. Partially differentiating with respect to $a_i$ we get:

$$a_i = \mathbf{e}^T(\mathbf{x}_i - \mathbf{m})$$

Plugging this expression for $a_i$ into $J_1$ we get:

$$J_1(\mathbf{e}) = -\mathbf{e}^T S\,\mathbf{e} + \sum_{i=1}^{N}\|\mathbf{x}_i - \mathbf{m}\|^2$$

where

$$S = \sum_{i=1}^{N}(\mathbf{x}_i - \mathbf{m})(\mathbf{x}_i - \mathbf{m})^T$$

is called the scatter matrix.

Page 6: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

PCA: An Intuitive Approach…

So minimizing $J_1$ is equivalent to maximizing:

$$\mathbf{e}^T S\,\mathbf{e}$$

subject to the constraint that $\mathbf{e}$ is a unit vector:

$$\mathbf{e}^T\mathbf{e} = 1$$

Use the Lagrange multiplier method to form the objective function:

$$\mathbf{e}^T S\,\mathbf{e} - \lambda(\mathbf{e}^T\mathbf{e} - 1)$$

Differentiate to obtain the equation:

$$2S\mathbf{e} - 2\lambda\mathbf{e} = 0 \quad \text{or} \quad S\mathbf{e} = \lambda\mathbf{e}$$

Solution is that e is the eigenvector of S corresponding to the largest eigenvalue
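
A minimal NumPy sketch of this result (the data matrix layout, one row per point, and all variable names are illustrative assumptions): the first principal direction is the unit eigenvector of the scatter matrix $S$ with the largest eigenvalue.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))     # N = 200 points in p = 10 dimensions (illustrative)
    m = X.mean(axis=0)
    Xc = X - m                         # centered data, one row per point

    S = Xc.T @ Xc                      # scatter matrix: sum_i (x_i - m)(x_i - m)^T
    evals, evecs = np.linalg.eigh(S)   # S is symmetric; eigenvalues returned in ascending order
    e = evecs[:, -1]                   # unit eigenvector with the largest eigenvalue

    a = Xc @ e                         # signed distances a_i = e^T (x_i - m) along the line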

Page 7: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

PCA: An Intuitive Approach…

The preceding analysis can be extended in the following way. Instead of projecting the data points onto a straight line, we may now want to project them onto a $d$-dimensional plane of the form:

$$\mathbf{x} = \mathbf{m} + a_1\mathbf{e}_1 + \cdots + a_d\mathbf{e}_d$$

$d$ is much smaller than the original dimension $p$. In this case one can form the objective function:

$$J_d = \sum_{i=1}^{N}\left\|\left(\mathbf{m} + \sum_{k=1}^{d}a_{ki}\mathbf{e}_k\right) - \mathbf{x}_i\right\|^2$$

It can also be shown that the vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_d$ are the $d$ eigenvectors corresponding to the $d$ largest eigenvalues of the scatter matrix

$$S = \sum_{i=1}^{N}(\mathbf{x}_i - \mathbf{m})(\mathbf{x}_i - \mathbf{m})^T$$

Page 8: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

PCA: Visually

Data points are represented in a rotated orthogonal coordinate system: the origin is the mean of the data points and the axes are provided by the eigenvectors.

Page 9: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

Computation of PCA

• In practice we compute PCA via SVD (singular value decomposition)

• Form the centered data matrix:

$$X_{p\times N} = [\mathbf{x}_1 - \mathbf{m}, \ldots, \mathbf{x}_N - \mathbf{m}]$$

• Compute its SVD:

$$X_{p\times N} = U_{p\times p}\, D_{p\times p}\, V_{N\times p}^T$$

• $U$ and $V$ have orthonormal columns and $D$ is a diagonal matrix of singular values

Page 10: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

Computation of PCA…

• Note that the scatter matrix can be written as:

$$S = XX^T = UD^2U^T$$

• So the eigenvectors of $S$ are the columns of $U$ and the eigenvalues of $S$ are the diagonal elements of $D^2$

• Take only a few significant eigenvalue-eigenvector pairs, $d \ll p$; the new reduced-dimension representation becomes:

$$\widetilde{(\mathbf{x}_i - \mathbf{m})} = U_{p\times d}\,U_{p\times d}^T(\mathbf{x}_i - \mathbf{m})$$
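
A sketch of this recipe with NumPy (the data, the variable names, and the choice $d = 2$ are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    pts = rng.normal(size=(500, 20))                      # N = 500 points in p = 20 dimensions
    m = pts.mean(axis=0)
    X = (pts - m).T                                       # centered data matrix, p x N

    U, dvals, Vt = np.linalg.svd(X, full_matrices=False)  # X = U diag(dvals) V^T

    # eigenvalues of S = X X^T are the squared singular values
    S = X @ X.T
    assert np.allclose(np.sort(dvals ** 2), np.linalg.eigvalsh(S))

    d = 2                                                 # keep only d << p components
    Ud = U[:, :d]
    reduced = Ud.T @ X                                    # d x N reduced-dimension representation
    approx = Ud @ reduced                                 # rank-d approximation of x_i - m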

Page 11: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

Computation of PCA…

• Sometimes we are given only a few high-dimensional data points, i.e., $p \gg N$

• In such cases compute the SVD of $X^T$:

$$X^T_{N\times p} = V_{N\times N}\, D_{N\times N}\, U_{p\times N}^T$$

• So that we get:

$$X_{p\times N} = U_{p\times N}\, D_{N\times N}\, V_{N\times N}^T$$

• Then proceed as before, choosing only $d < N$ significant eigenvalues for the data representation:

$$\widetilde{(\mathbf{x}_i - \mathbf{m})} = U_{p\times d}\,U_{p\times d}^T(\mathbf{x}_i - \mathbf{m})$$
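
A sketch of the same computation when $p \gg N$ (illustrative sizes; the economy-size SVD of the small $N \times p$ matrix $X^T$ is cheap):

    import numpy as np

    rng = np.random.default_rng(0)
    N, p = 50, 4096                                   # few points, high dimension (illustrative)
    pts = rng.normal(size=(N, p))
    m = pts.mean(axis=0)
    X = (pts - m).T                                   # p x N centered data matrix

    # SVD of the small N x p matrix: X^T = V D U^T, so X = U D V^T
    V, dvals, Ut = np.linalg.svd(X.T, full_matrices=False)
    U = Ut.T                                          # p x N; columns are eigenvectors of S = X X^T

    d = 10                                            # keep d < N components
    reduced = U[:, :d].T @ X                          # d x N reduced representation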

Page 12: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

PCA: A Gaussian Viewpoint

$$\mathbf{x} \sim \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right) = \prod_{i=1}^{p}\frac{1}{\sqrt{2\pi\lambda_i}}\exp\left(-\frac{\left(\mathbf{u}_i^T(\mathbf{x}-\boldsymbol{\mu})\right)^2}{2\lambda_i}\right),$$

where the covariance matrix is estimated from the scatter matrix as $\Sigma = (1/N)S$, and the $\mathbf{u}_i$'s and $\lambda_i$'s are respectively the eigenvectors and eigenvalues of $\Sigma$.

If $p$ is large, then we need an even larger number of data points to estimate the covariance matrix. So, when a limited number of training data points is available, the estimation of the covariance matrix goes quite wrong. This is known as the curse of dimensionality in this context.

To combat the curse of dimensionality, we discard the smaller eigenvalues and are content with:

$$\mathbf{x} \sim \prod_{i=1}^{d}\frac{1}{\sqrt{2\pi\lambda_i}}\exp\left(-\frac{\left(\mathbf{u}_i^T(\mathbf{x}-\boldsymbol{\mu})\right)^2}{2\lambda_i}\right), \quad \text{where } d < \min(p, N)$$
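
A sketch of evaluating this truncated density in NumPy (the training data, the estimate of $\boldsymbol{\mu}$, and the choice of $d$ are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    train = rng.normal(size=(100, 30))                 # N = 100 points in p = 30 dimensions
    mu = train.mean(axis=0)
    S = (train - mu).T @ (train - mu)                  # scatter matrix

    lam, U = np.linalg.eigh(S)
    lam, U = lam[::-1] / len(train), U[:, ::-1]        # eigenvalues of (1/N) S, largest first

    d = 5                                              # keep only d < min(p, N) directions

    def log_density(x):
        # log of the truncated Gaussian density, using the d largest eigen-directions
        proj = U[:, :d].T @ (x - mu)                   # u_i^T (x - mu), i = 1..d
        return -0.5 * np.sum(proj ** 2 / lam[:d]) - 0.5 * np.sum(np.log(2 * np.pi * lam[:d]))

    print(log_density(train[0]))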

Page 13: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

PCA Examples

• Image compression example

• Novelty detection example

Page 14: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

Kernel PCA

• Assumption behind PCA is that the data points x are multivariate Gaussian

• Often this assumption does not hold

• However, it may still be possible that a transformed feature $\Phi(\mathbf{x})$ is Gaussian; then we can perform PCA in the space of $\Phi(\mathbf{x})$

• Kernel PCA performs exactly this PCA; however, because of the "kernel trick," it never computes the mapping $\Phi(\mathbf{x})$ explicitly!

Page 15: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

KPCA: Basic Idea

Page 16: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

Kernel PCA Formulation

• We need the following fact:

• Let $\mathbf{v}$ be an eigenvector of the scatter matrix:

• Then $\mathbf{v}$ belongs to the linear space spanned by the data points $\mathbf{x}_i$, $i = 1, 2, \ldots, N$.

• Proof:

$$S = \sum_{i=1}^{N}\mathbf{x}_i\mathbf{x}_i^T$$

$$\lambda\mathbf{v} = S\mathbf{v} = \sum_{i=1}^{N}\mathbf{x}_i\mathbf{x}_i^T\mathbf{v} = \sum_{i=1}^{N}(\mathbf{x}_i^T\mathbf{v})\,\mathbf{x}_i,$$

so $\mathbf{v}$ is a linear combination of the $\mathbf{x}_i$'s.
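
A quick numerical illustration of this fact (an illustrative check, not from the slides): with fewer points than dimensions, an eigenvector of $S$ with nonzero eigenvalue can be written exactly as a linear combination of the data points.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 20))              # N = 5 points in p = 20 dimensions, rows are x_i
    S = X.T @ X                               # scatter matrix: sum_i x_i x_i^T

    lam, V = np.linalg.eigh(S)
    v = V[:, -1]                              # eigenvector with the largest (nonzero) eigenvalue

    # least-squares coefficients beta with v = sum_i beta_i x_i; residual should be ~0
    beta, *_ = np.linalg.lstsq(X.T, v, rcond=None)
    assert np.allclose(X.T @ beta, v)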

Page 17: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

Kernel PCA Formulation…

• Let $C$ be the scatter matrix of the centered mapping $\Phi(\mathbf{x})$:

$$C = \sum_{i=1}^{N}\Phi(\mathbf{x}_i)\Phi(\mathbf{x}_i)^T$$

• Let $\mathbf{w}$ be an eigenvector of $C$; then $\mathbf{w}$ can be written as a linear combination:

$$\mathbf{w} = \sum_{k=1}^{N}\alpha_k\Phi(\mathbf{x}_k)$$

• Also, we have:

$$C\mathbf{w} = \lambda\mathbf{w}$$

• Combining, we get:

$$\left(\sum_{i=1}^{N}\Phi(\mathbf{x}_i)\Phi(\mathbf{x}_i)^T\right)\sum_{k=1}^{N}\alpha_k\Phi(\mathbf{x}_k) = \lambda\sum_{k=1}^{N}\alpha_k\Phi(\mathbf{x}_k)$$

Page 18: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

Kernel PCA Formulation…

Multiplying both sides of the combined equation on the left by $\Phi(\mathbf{x}_l)^T$ gives

$$\Phi(\mathbf{x}_l)^T\left(\sum_{i=1}^{N}\Phi(\mathbf{x}_i)\Phi(\mathbf{x}_i)^T\right)\sum_{k=1}^{N}\alpha_k\Phi(\mathbf{x}_k) = \lambda\,\Phi(\mathbf{x}_l)^T\sum_{k=1}^{N}\alpha_k\Phi(\mathbf{x}_k), \quad l = 1, 2, \ldots, N,$$

which in matrix form is

$$K^2\boldsymbol{\alpha} = \lambda K\boldsymbol{\alpha}, \quad \text{i.e.,} \quad K\boldsymbol{\alpha} = \lambda\boldsymbol{\alpha},$$

where $K_{ij} = \Phi(\mathbf{x}_i)^T\Phi(\mathbf{x}_j)$ is the kernel or Gram matrix.

Page 19: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

Kernel PCA Formulation…

From the eigen equation

$$K\boldsymbol{\alpha} = \lambda\boldsymbol{\alpha}$$

and the fact that the eigenvector $\mathbf{w}$ is normalized to 1, we obtain:

$$\|\mathbf{w}\|^2 = \left(\sum_{i=1}^{N}\alpha_i\Phi(\mathbf{x}_i)\right)^T\left(\sum_{i=1}^{N}\alpha_i\Phi(\mathbf{x}_i)\right) = \boldsymbol{\alpha}^T K\boldsymbol{\alpha} = \lambda\,\boldsymbol{\alpha}^T\boldsymbol{\alpha} = 1$$

Page 20: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

KPCA Algorithm

Step 1: Compute the Gram matrix: $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$, $i, j = 1, \ldots, N$

Step 2: Compute the (eigenvalue, eigenvector) pairs of $K$: $(\lambda_l, \boldsymbol{\alpha}^l)$, $l = 1, \ldots, M$

Step 3: Normalize the eigenvectors: $\boldsymbol{\alpha}^l \leftarrow \boldsymbol{\alpha}^l / \sqrt{\lambda_l}$

Thus, an eigenvector $\mathbf{w}^l$ of $C$ is now represented as:

$$\mathbf{w}^l = \sum_{k=1}^{N}\alpha_k^l\,\Phi(\mathbf{x}_k)$$

To project a test feature $\Phi(\mathbf{x})$ onto $\mathbf{w}^l$ we need to compute:

$$\Phi(\mathbf{x})^T\mathbf{w}^l = \sum_{k=1}^{N}\alpha_k^l\,\Phi(\mathbf{x})^T\Phi(\mathbf{x}_k) = \sum_{k=1}^{N}\alpha_k^l\,k(\mathbf{x}, \mathbf{x}_k)$$

So, we never need $\Phi(\mathbf{x})$ explicitly.
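
A compact NumPy sketch of these three steps plus the test projection (all variable names are mine; the polynomial kernel matches the later USPS slide, and the Gram-matrix centering of the next slide is omitted here for brevity):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 16))                    # N = 100 training points (illustrative)

    def kernel(A, B, degree=2):
        # polynomial kernel k(x, y) = (x^T y)^degree
        return (A @ B.T) ** degree

    # Step 1: Gram matrix (assumed centered in feature space; see the next slide)
    K = kernel(X, X)

    # Step 2: (eigenvalue, eigenvector) pairs of K
    lam, alpha = np.linalg.eigh(K)
    lam, alpha = lam[::-1], alpha[:, ::-1]            # largest eigenvalues first

    # Step 3: normalize so each w^l has unit norm: alpha^l <- alpha^l / sqrt(lambda_l)
    keep = lam > 1e-10                                # drop numerically zero eigenvalues
    lam, alpha = lam[keep], alpha[:, keep] / np.sqrt(lam[keep])

    # project test features onto the first d kernel principal components:
    # phi(x)^T w^l = sum_k alpha_k^l k(x, x_k)
    d = 5
    X_test = rng.normal(size=(10, 16))
    projections = kernel(X_test, X) @ alpha[:, :d]    # one row of d projections per test point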

Page 21: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

Feature Map Centering

So far we assumed that the feature map $\Phi(\mathbf{x})$ is centered for the data points $\mathbf{x}_1, \ldots, \mathbf{x}_N$.

Actually, this centering can be done on the Gram matrix without ever explicitly computing the feature map (x).

$$\tilde{K} = (I - \mathbf{1}\mathbf{1}^T/N)\,K\,(I - \mathbf{1}\mathbf{1}^T/N)$$

$\tilde{K}$ is the kernel matrix for centered features, i.e., for features satisfying $\sum_{i=1}^{N}\Phi(\mathbf{x}_i) = 0$.

A similar expression exists for projecting test features onto the feature eigenspace.

Scholkopf, Smola, Muller, "Nonlinear component analysis as a kernel eigenvalue problem," Technical report #44, Max Planck Institute, 1996.
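
A sketch of this centering in NumPy ($K$ is any precomputed $N \times N$ Gram matrix; the helper name center_gram is mine):

    import numpy as np

    def center_gram(K):
        # K_tilde = (I - 11^T/N) K (I - 11^T/N): the Gram matrix of implicitly centered features
        N = K.shape[0]
        H = np.eye(N) - np.ones((N, N)) / N
        return H @ K @ H

    # e.g., K = center_gram(kernel(X, X)) before the eigen-decomposition in the KPCA sketch above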

Page 22: Principal Component Analysis CMPUT 466/551 Nilanjan Ray.

KPCA: USPS Digit Recognition

Scholkopf, Smola, Muller, “Nonlinear component analysis as a kernel eigenvalue problem,” Technical report #44, Max Plank Institute, 1996.

Kernel function: $k(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^T\mathbf{y})^d$

Classifier: Linear SVM with kernel principal components as features. $N = 3000$, $p$ = 16-by-16 images.

(Figure: USPS recognition results for linear PCA and kernel PCA with polynomial degree $d$.)