Principal Components

• Karl Pearson

Principal Components (PC)

• Objective: given a data matrix of dimensions n×p (n elements and p variables), represent these data with r variables (r < p) with minimum loss of information

We want to find a new set of p variables, Z, which are linear combinations of the original variables X, such that:

• r of them contain all the information
• The remaining p − r variables are noise
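A minimal pure-Python sketch of this objective follows. It assumes, as is usual, that information is measured by variance. It computes the covariance matrix, extracts the leading r eigenvalues by power iteration with deflation (a simple technique chosen here only to keep the example self-contained, not necessarily how the slides compute them), and reports the share of total variance the first r components keep. The names `covariance`, `top_eigenpairs` and `retained_fraction` and the data are hypothetical.

```python
import math

def covariance(X):
    """Covariance matrix (divisor n) of the rows of X, a list of n lists of length p."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    D = [[row[j] - means[j] for j in range(p)] for row in X]  # centred data
    return [[sum(d[i] * d[k] for d in D) / n for k in range(p)] for i in range(p)]

def top_eigenpairs(S, r, iters=2000):
    """Leading r eigenpairs of a symmetric PSD matrix via power iteration and deflation."""
    p = len(S)
    S = [row[:] for row in S]  # work on a copy
    pairs = []
    for _ in range(r):
        v = [1.0 / (j + 1) for j in range(p)]  # arbitrary start, assumed not orthogonal to the target
        lam = 0.0
        for _ in range(iters):
            w = [sum(S[i][k] * v[k] for k in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining eigenvalues are (numerically) zero
                break
            v, lam = [x / norm for x in w], norm
        pairs.append((lam, v))
        # Deflation: subtract lam * v v' so the next pass finds the next eigenpair.
        S = [[S[i][k] - lam * v[i] * v[k] for k in range(p)] for i in range(p)]
    return pairs

def retained_fraction(X, r):
    """Share of the total variance (trace of S) kept by the first r components."""
    S = covariance(X)
    total = sum(S[j][j] for j in range(len(S)))
    return sum(lam for lam, _ in top_eigenpairs(S, r)) / total

# The third variable is the sum of the first two: p = 3, but r = 2 carries everything.
X = [[2, 1, 3], [-1, 2, 1], [0, -3, -3], [3, 0, 3], [-4, 0, -4]]
print(retained_fraction(X, 1), retained_fraction(X, 2))  # about 0.795 and 1.0
```

Here the p − r = 1 leftover component has zero variance; with real data it would be small rather than exactly zero.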

First interpretation of principal components: optimal data representation

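[Figure: point $x_i$, direction $a$, projection $z_i$ and residual $r_i$]

Projection of a point in direction $a$ (with $a^\top a = 1$): $z_i$ is the projection of $x_i$ on $a$ and $r_i$ is the residual, orthogonal to $a$, so by Pythagoras

$x_i^\top x_i = r_i^\top r_i + z_i^\top z_i$

Since $\sum_i x_i^\top x_i$ does not depend on $a$, minimizing the squared distances $\sum_i r_i^\top r_i$ is equivalent to maximizing $\sum_i z_i^\top z_i$, which is $n$ times the variance of the projections (assuming zero-mean variables).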
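The equivalence between minimizing the squared distance and maximizing the variance can be checked numerically. In this pure-Python sketch (the data and the names `project` and `squared_parts` are illustrative assumptions), every unit direction $a$, with $z_i = a^\top x_i$ and $r_i = x_i - z_i a$, splits the same fixed total $\sum_i x_i^\top x_i$ into a residual part and a projected part.

```python
import math

def project(x, a):
    """Scalar projection z = a'x of point x on unit vector a, and the residual r = x - z a."""
    z = sum(xi * ai for xi, ai in zip(x, a))
    r = [xi - z * ai for xi, ai in zip(x, a)]
    return z, r

def squared_parts(X, a):
    """Return (sum of r_i'r_i, sum of z_i^2) over all points for direction a."""
    res = proj = 0.0
    for x in X:
        z, r = project(x, a)
        res += sum(ri * ri for ri in r)
        proj += z * z
    return res, proj

# Zero-mean 2-D data (each column sums to 0); the sum of x_i'x_i is 44.
X = [[2, 1], [-1, 2], [0, -3], [3, 0], [-4, 0]]
for theta in (0.0, 0.5, 1.0, 1.5):
    a = (math.cos(theta), math.sin(theta))
    res, proj = squared_parts(X, a)
    print(f"theta={theta:.1f}  residual={res:.3f}  projected={proj:.3f}  total={res + proj:.3f}")
```

Because the total never changes, the direction with the smallest residual is always the one with the largest projected sum of squares.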

Optimal Prediction

Find a new variable $z_i = a^\top x_i$ which is optimal to predict the value of $x_i$ in each element.

In general, find $r$ variables, $z_i = A_r x_i$, which are optimal to forecast all of $x_i$ under the least-squares criterion.

It is easy to see that the solution requires $z_i = a^\top x_i$ to have maximum variance (under the constraint $a^\top a = 1$).
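For two variables the maximum-variance direction has a closed form: the angle $\theta$ of the first eigenvector of the 2×2 covariance matrix satisfies $\tan 2\theta = 2s_{12}/(s_{11} - s_{22})$. The pure-Python sketch below (illustrative data, hypothetical function names) computes that direction and checks by brute force over a grid of angles that no other unit direction predicts the centred points with a smaller mean squared error when each point is predicted by $z_i a$.

```python
import math

def centred(X):
    """Subtract the column means from 2-D data given as (x, y) pairs."""
    n = len(X)
    mx = sum(x for x, _ in X) / n
    my = sum(y for _, y in X) / n
    return [(x - mx, y - my) for x, y in X]

def first_direction(X):
    """Unit eigenvector of the largest eigenvalue of the 2x2 covariance matrix (divisor n)."""
    D, n = centred(X), len(X)
    s11 = sum(x * x for x, _ in D) / n
    s22 = sum(y * y for _, y in D) / n
    s12 = sum(x * y for x, y in D) / n
    theta = 0.5 * math.atan2(2 * s12, s11 - s22)  # angle of the major axis
    return (math.cos(theta), math.sin(theta))

def prediction_mse(X, a):
    """Mean squared error when each centred point d_i is predicted by z_i * a, z_i = a'd_i."""
    D = centred(X)
    err = 0.0
    for dx, dy in D:
        z = a[0] * dx + a[1] * dy
        err += (dx - z * a[0]) ** 2 + (dy - z * a[1]) ** 2
    return err / len(D)

X = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
a1 = first_direction(X)
grid = [(math.cos(t), math.sin(t)) for t in (k * math.pi / 360 for k in range(360))]
print(a1, prediction_mse(X, a1), min(prediction_mse(X, a) for a in grid))
```

The minimum error left over equals the smaller eigenvalue, i.e. the variance the first component does not capture.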

Second interpretation of PC

The lines that minimize the sum of squared orthogonal distances to the data give the axes of the ellipsoid.

Third interpretation of PC

Find the optimal direction to represent the data: the axis of the ellipsoid that contains the data.

This is the idea of Pearson's orthogonal regression.
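Pearson's orthogonal regression line passes through the mean along the first principal direction, whereas ordinary least squares of $y$ on $x$ minimizes vertical distances. The pure-Python sketch below (illustrative data, hypothetical function names) compares the two slopes and the sum of squared orthogonal distances each line leaves.

```python
import math

def moments(X):
    """Means and covariance entries (divisor n) of 2-D data given as (x, y) pairs."""
    n = len(X)
    mx = sum(x for x, _ in X) / n
    my = sum(y for _, y in X) / n
    s11 = sum((x - mx) ** 2 for x, _ in X) / n
    s22 = sum((y - my) ** 2 for _, y in X) / n
    s12 = sum((x - mx) * (y - my) for x, y in X) / n
    return mx, my, s11, s22, s12

def ols_slope(X):
    """Slope of the least-squares regression of y on x (vertical distances)."""
    _, _, s11, _, s12 = moments(X)
    return s12 / s11

def orthogonal_slope(X):
    """Slope of the first principal axis; assumes that axis is not vertical."""
    _, _, s11, s22, s12 = moments(X)
    return math.tan(0.5 * math.atan2(2 * s12, s11 - s22))

def orthogonal_sse(X, m):
    """Sum of squared perpendicular distances to the line of slope m through the mean."""
    mx, my, *_ = moments(X)
    return sum(((y - my) - m * (x - mx)) ** 2 for x, y in X) / (1 + m * m)

X = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
for name, m in (("OLS", ols_slope(X)), ("orthogonal", orthogonal_slope(X))):
    print(f"{name}: slope={m:.3f}, orthogonal SSE={orthogonal_sse(X, m):.3f}")
```

The two lines differ because they minimize different distances; only the orthogonal one treats both variables symmetrically.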

Second component

Properties of PC
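The second component is the direction of maximum variance among those orthogonal to the first; in two dimensions that is simply the perpendicular direction. The pure-Python sketch below (illustrative data, hypothetical names) computes both components in closed form and checks three standard properties: the directions are orthogonal, the component scores are uncorrelated with variances equal to the eigenvalues, and those variances add up to the total variance of the original variables.

```python
import math

def pca_2d(X):
    """Eigenvalues, unit eigenvectors and scores for 2-D data (covariance with divisor n)."""
    n = len(X)
    mx = sum(x for x, _ in X) / n
    my = sum(y for _, y in X) / n
    D = [(x - mx, y - my) for x, y in X]
    s11 = sum(x * x for x, _ in D) / n
    s22 = sum(y * y for _, y in D) / n
    s12 = sum(x * y for x, y in D) / n
    mid, rad = (s11 + s22) / 2, math.hypot((s11 - s22) / 2, s12)
    theta = 0.5 * math.atan2(2 * s12, s11 - s22)
    a1 = (math.cos(theta), math.sin(theta))    # first component
    a2 = (-math.sin(theta), math.cos(theta))   # second component, orthogonal to a1
    scores = [(a1[0] * dx + a1[1] * dy, a2[0] * dx + a2[1] * dy) for dx, dy in D]
    return (mid + rad, mid - rad), (a1, a2), scores, s11 + s22

X = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
(lam1, lam2), (a1, a2), Z, total = pca_2d(X)
n = len(Z)
var1 = sum(z1 * z1 for z1, _ in Z) / n   # scores have zero mean, so no centring needed
var2 = sum(z2 * z2 for _, z2 in Z) / n
cov12 = sum(z1 * z2 for z1, z2 in Z) / n
print(lam1, lam2, var1, var2, cov12, total)
```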

Standardized PC
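Standardized principal components are the components of the standardized variables, that is, the eigenvectors of the correlation matrix instead of the covariance matrix. The sketch below (pure Python, illustrative data, hypothetical names) shows why this matters: multiplying one variable by 100, as a change of units would, swings the covariance-based first direction towards that variable, while the direction computed from standardized data does not move.

```python
import math

def first_angle(X):
    """Angle of the first principal direction of 2-D data (covariance, divisor n)."""
    n = len(X)
    mx = sum(x for x, _ in X) / n
    my = sum(y for _, y in X) / n
    s11 = sum((x - mx) ** 2 for x, _ in X) / n
    s22 = sum((y - my) ** 2 for _, y in X) / n
    s12 = sum((x - mx) * (y - my) for x, y in X) / n
    return 0.5 * math.atan2(2 * s12, s11 - s22)

def standardize(X):
    """Rescale each column to zero mean and unit variance (divisor n)."""
    n = len(X)
    out = []
    for col in zip(*X):
        m = sum(col) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / n)
        out.append([(v - m) / sd for v in col])
    return list(zip(*out))

X = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
X_scaled = [(x, 100 * y) for x, y in X]  # same data, y measured in different units
print(first_angle(X), first_angle(X_scaled))                            # covariance-based
print(first_angle(standardize(X)), first_angle(standardize(X_scaled)))  # standardized
```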

Example Inves

Example Medifis

Example Mundodes


Example for image analysis

The analysis has been done with 16 images. Instead of sending 16 matrices of $N^2$ pixels, PC allows us to send a $16 \times 3$ matrix with the values of the components and a $3 \times N^2$ matrix with the values of the new variables. We save a fraction $1 - (16 \cdot 3 + 3N^2)/(16N^2)$ of the data, about 81% for large $N$.

If, instead of 16 images, we have 100 images, we save 95%.
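The storage count in this example is simple arithmetic: $n$ images of $N^2$ pixels need $nN^2$ numbers, while keeping $r$ components needs an $n \times r$ matrix plus an $r \times N^2$ matrix. The sketch below computes the saved fraction exactly with Python's `fractions` module. The function name and the image size in the example ($N = 256$) are illustrative assumptions, since the slides do not give $N$.

```python
from fractions import Fraction

def saved_fraction(n_images, r, pixels):
    """Fraction of numbers saved by sending n_images x r coefficients plus r x pixels
    component values instead of n_images full images of `pixels` values each."""
    original = n_images * pixels
    compressed = n_images * r + r * pixels
    return 1 - Fraction(compressed, original)

s = saved_fraction(16, 3, 256 ** 2)
print(s, float(s))  # slightly below 1 - 3/16 = 0.8125
```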
