
Principal Component Analysis

Yihui Saw

Massachusetts Institute of Technology

[email protected]

April 25, 2013


Overview

1 Introduction

2 Linear Algebra Background

3 Statistics Background

4 Principal Components

5 Example

6 Applications


What and why? - Principal Component Analysis (PCA)

Map high-dimensional feature vectors into lower-dimensional ones that capture the important variation in our data

Apply an orthogonal transformation to convert the set of observations into a set of values called principal components


Linear Algebra Definitions

Definition (Eigenvectors and Eigenvalues)

Let A be a square matrix. A non-zero vector C is an eigenvector of A iff there exists a number λ (real or complex) such that

AC = λC

If such a λ exists, it is called an eigenvalue of A.
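As a quick numerical check of this definition, here is a minimal NumPy sketch; the matrix A below is an arbitrary example, not taken from the slides:

```python
import numpy as np

# An arbitrary example matrix (not from the slides).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# np.linalg.eig returns the eigenvalues and unit-length
# eigenvectors (stored as the columns of the second result).
eigenvalues, eigenvectors = np.linalg.eig(A)

# Check the defining property A C = lambda C for each pair.
for lam, C in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ C, lam * C)
```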


Linear Algebra Definitions

Definition (Matrix Symmetry)

A matrix A is symmetric iff A = Aᵀ.

Definition (Orthogonally Diagonalizable)

Let A be an n × n matrix. A is orthogonally diagonalizable if there is an orthogonal matrix S such that S⁻¹AS is diagonal.

Theorem (Spectral Theorem)

Let A be an n × n matrix. A is orthogonally diagonalizable iff A is symmetric.
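A minimal NumPy sketch of the theorem in action, using an arbitrary symmetric example matrix (np.linalg.eigh is NumPy's eigensolver specialized for symmetric matrices):

```python
import numpy as np

# An arbitrary symmetric example matrix: A == A.T.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh returns real eigenvalues and orthonormal eigenvectors
# as the columns of S, so S is an orthogonal matrix.
eigenvalues, S = np.linalg.eigh(A)

assert np.allclose(S.T @ S, np.eye(3))                  # S^-1 = S^T
assert np.allclose(S.T @ A @ S, np.diag(eigenvalues))   # S^-1 A S is diagonal
```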


Statistics Definitions

Definition (Covariance)

Let E(u) be the mean of a random variable u. Then the covariance cov[x, y] of random variables x, y is defined as cov[x, y] = E(xy) − E(x)E(y).

Definition (Variance)

The variance of a random variable x is var[x] = cov[x, x].

Definition (Covariance Matrix)

The covariance matrix Σ of a random vector X = (x1, ..., xp) is the matrix with entries Σij = cov[xi, xj].
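These identities are easy to check numerically; a minimal sketch on synthetic data (note that np.cov divides by n − 1 by default, so bias=True is passed to match the E(xy) − E(x)E(y) form with sample means):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)

# cov[x, y] = E(xy) - E(x)E(y), with E(.) estimated by sample means.
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)

# np.cov(..., bias=True) also divides by n, so the two agree;
# the [0, 1] entry of the 2x2 result is cov[x, y].
assert np.allclose(cov_xy, np.cov(x, y, bias=True)[0, 1])

# var[x] = cov[x, x] is the [0, 0] entry.
assert np.allclose(np.mean(x * x) - np.mean(x) ** 2,
                   np.cov(x, y, bias=True)[0, 0])
```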


Derivation and Definition of Principal Components

It turns out that the principal components are the eigenvectors of the covariance matrix. Why? Set up the derivation as follows:

Input: Vector x of p variables.

Goal: Maximize variance. Minimize correlation.

Outcome: Vector y of m ≤ p variables.
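Sketch: for a unit vector a, the variance of the projection aᵀx is var[aᵀx] = aᵀΣa. Maximizing aᵀΣa subject to aᵀa = 1 with a Lagrange multiplier λ gives 2Σa − 2λa = 0, i.e. Σa = λa, so each stationary point is an eigenvector of the covariance matrix Σ, and the variance achieved equals the corresponding eigenvalue. Taking eigenvectors in order of decreasing eigenvalue therefore maximizes variance, and their orthogonality keeps the resulting components uncorrelated.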


An algorithm - The Covariance Method

1 Represent the samples xi as the columns of a p × n matrix X.

2 Center the data so that its mean is 0:

µ = (1/n)(x1 + ... + xn)

B = X − µh

where h is a 1 × n row vector of all 1s, so µh subtracts the mean from every column of X.

3 Find the covariance matrix:

S = (1/(n − 1)) BBᵀ

4 Find the eigenvalues and eigenvectors of the covariance matrix:

S = VΛVᵀ


An algorithm - The Covariance Method

5 Sort the columns of V in order of decreasing eigenvalue and select the first k columns, V′. Project the centered data onto the new space:

Y = V′ᵀB

6 Optional: reconstruct the data set from the k principal components (a NumPy sketch of steps 1-6 follows):

X′ = V′Y + µh
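A minimal NumPy sketch of steps 1-6 under the conventions above (samples as columns; the names pca_covariance and reconstruct are my own, not from the slides):

```python
import numpy as np

def pca_covariance(X, k):
    """Covariance-method PCA. X is p x n with samples as columns;
    returns the k x n scores Y, the p x k basis V', and the mean."""
    n = X.shape[1]
    mu = X.mean(axis=1, keepdims=True)     # step 2: mean vector (p x 1)
    B = X - mu                             # B = X - mu h, via broadcasting
    S = (B @ B.T) / (n - 1)                # step 3: covariance matrix (p x p)
    eigvals, V = np.linalg.eigh(S)         # step 4: S = V Lambda V^T (ascending order)
    order = np.argsort(eigvals)[::-1]      # step 5: decreasing eigenvalues
    V_k = V[:, order[:k]]                  # keep the first k eigenvectors
    Y = V_k.T @ B                          # project onto the new space
    return Y, V_k, mu

def reconstruct(Y, V_k, mu):
    """Step 6: map the scores back to the original space."""
    return V_k @ Y + mu
```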


Example

Suppose we have 10 individual observations of dimension 2; each sample is a vector [x1, x2].

Data x1    Data x2    Adjusted x1    Adjusted x2
  2.5        2.4         0.69           0.49
  0.5        0.7        -1.31          -1.21
  2.2        2.9         0.39           0.99
  1.9        2.2         0.09           0.29
  3.1        3.0         1.29           1.09
  2.3        2.7         0.49           0.79
  2.0        1.6         0.19          -0.31
  1.0        1.1        -0.81          -0.81
  1.5        1.6        -0.31          -0.31
  1.1        0.9        -0.71          -1.01
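Feeding this data to the pca_covariance sketch above should reproduce the adjusted values and, up to the sign of each eigenvector (eigenvectors are only defined up to sign), the transformed values on the next slide:

```python
import numpy as np

# The 10 observations above, as the columns of a 2 x 10 matrix.
data = np.array([[2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1],
                 [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]])

Y, V_k, mu = pca_covariance(data, k=2)   # from the sketch above
# mu is [[1.81], [1.91]], so data - mu gives the "Adjusted" columns;
# Y[0] holds each sample's score along the first principal component.
```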


Plot of Original Data


Example

A plot of the new data points after applying PCA using both eigenvectors.

Transformed Data x    Transformed Data y
    -0.827970186          -0.175115307
     1.77758033            0.142857227
    -0.992197494           0.384374989
    -0.274210416           0.130417207
    -1.67580142           -0.209498461
    -0.912949103           0.175282444
     0.0991094375         -0.349824698
     1.14457216            0.0464172582
     0.438046137           0.0177646297
     1.22382056           -0.162675287


Plot of Adjusted Data


Plot of Reconstructed Data

Reconstruct data using only the most significant eigenvector.
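Continuing the sketch above, this corresponds to calling it with k = 1:

```python
# Keep only the top eigenvector and map the scores back (k = 1).
Y1, V1, mu = pca_covariance(data, k=1)
X_rec = reconstruct(Y1, V1, mu)   # every point lies on the line through mu along V1
```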


Application : Facial Recognition

Idea: Reduce dimensionality with PCA, then classify with K-medoids.

The Original Data


Application : Facial Recognition

The Data using 1 Principal Component


Application : Facial Recognition

The Data using 10 Principal Components


Application : Facial Recognition

The Data using 100 Principal Components


Application : Facial Recognition

The Data using All Principal Components
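The progression on these slides can be imitated with the earlier pca_covariance/reconstruct sketch; the faces array below is a random stand-in for a real face dataset, so the shapes and values here are assumptions for illustration only:

```python
import numpy as np

# Random stand-in for a real face dataset: n images of size h x w.
faces = np.random.rand(40, 32, 32)

n, h, w = faces.shape
X = faces.reshape(n, h * w).T            # p x n: one flattened image per column

for k in (1, 10, 100):
    Y, V_k, mu = pca_covariance(X, k)    # k-dimensional code for each face
    approx = reconstruct(Y, V_k, mu)     # rank-k approximation of every image
    images = approx.T.reshape(n, h, w)   # back to image shape for display
```

In practice p is much larger than n for face images, so eigenfaces implementations usually obtain the leading eigenvectors from the n × n matrix BᵀB (or an SVD of B) rather than the p × p covariance matrix; the k-dimensional scores Y, not the raw pixels, are then what a classifier such as K-medoids operates on.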


The End
