
The Principal Components Analysis

Slava Vaisman

The University of Queensland

r.vaisman@uq.edu.au

November 30, 2016


Overview

1 PCA review

2 Understanding the PCA

3 Computing the PCA


A short overview of the PCA

PCA is a powerful feature reduction (feature extraction) mechanism that helps us handle high-dimensional data with too many features.

In particular, PCA is a method for compressing a lot of data into something smaller that captures the essence of the original data.

1 PCA looks for a related set of variables in our data that explain most of the variance, and combines them into the first principal component.

2 Next, it does the same with the next group of variables that explain most of the remaining variance, and constructs the second principal component. And so on...

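As a quick illustration of this idea in practice (not part of the original slides), here is a minimal sketch using scikit-learn's PCA on a synthetic data set; the array shapes and the choice of two components are assumptions made only for the example:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 500 observations of 10 correlated features. Note: scikit-learn expects samples
# as rows, whereas the slides below store samples as columns.
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10))

pca = PCA(n_components=2)
Z = pca.fit_transform(X)                 # project onto the first two principal components
print(Z.shape)                           # (500, 2)
print(pca.explained_variance_ratio_)     # fraction of variance captured by each component
```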



Example


Rotation – a linear transformation of our data.



Understanding the PCA – the general setting

Our m × n data matrix X is given by

$$
X =
\begin{pmatrix}
X_{1,1} & \dots & \dots & X_{1,n} \\
X_{2,1} & \dots & \dots & X_{2,n} \\
\vdots  & \dots & \ddots & \vdots \\
X_{m,1} & \dots & \dots & X_{m,n}
\end{pmatrix},
$$

where

m — the number of measurement types,

n — the number of observations.

Note that each data sample is a column vector of X. Each sample lives in m-dimensional space.
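A small sketch of this setup in NumPy, with the n observations stored as the columns of an m × n matrix (the sizes are made up for illustration; the centring step anticipates the zero-mean assumption used below):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 200                        # m measurement types, n observations
X = rng.normal(size=(m, n))          # each column is one sample in m-dimensional space

# PCA works with centred data: subtract each row's (measurement's) mean.
Xc = X - X.mean(axis=1, keepdims=True)
print(Xc.shape, Xc.mean(axis=1))     # (5, 200), row means are now ~0
```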


Understanding the PCA – our objective

$$
X =
\begin{pmatrix}
X_{1,1} & \dots & \dots & X_{1,n} \\
X_{2,1} & \dots & \dots & X_{2,n} \\
\vdots  & \dots & \ddots & \vdots \\
X_{m,1} & \dots & \dots & X_{m,n}
\end{pmatrix}
$$

Redundancy: It might happen that our system has k << m degrees of freedom (the number of independent ways in which a dynamic system can move), but takes up the entire m-dimensional space in our original data set X.

Data Redundancy

1 Essentially, we would like to know if the rows of X are correlated.

2 If they are, we might be able to perform the desired dimensionality reduction. Namely, we would like to remove the redundancy!


A reminder: The Variance and the Covariance

First, let us formalize the concept of redundancy.

Suppose that we are given two data vectors (let us suppose that their means are zero)

$$\mathbf{x} = (x_1, \dots, x_n), \quad \text{and} \quad \mathbf{y} = (y_1, \dots, y_n).$$

Then, the variance is given by (an inner product):

$$
\sigma_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} x_i \, x_i = \frac{1}{n-1}\,\mathbf{x}\,\mathbf{x}^T,
\qquad
\sigma_y^2 = \frac{1}{n-1}\,\mathbf{y}\,\mathbf{y}^T.
$$

The covariance measures the statistical relationship between x and y:

$$
\sigma_{xy}^2 = \frac{1}{n-1}\,\mathbf{x}\,\mathbf{y}^T = \frac{1}{n-1}\,\mathbf{y}\,\mathbf{x}^T = \sigma_{yx}^2.
$$

If I observe x, can I say something about y?
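These formulas translate directly into inner products; a small check in NumPy (the vectors are synthetic, and np.cov uses the same 1/(n-1) normalisation by default):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
y = 0.8 * x + 0.2 * rng.normal(size=n)    # y shares information with x
x, y = x - x.mean(), y - y.mean()         # zero-mean, as assumed on the slide

var_x  = x @ x / (n - 1)
var_y  = y @ y / (n - 1)
cov_xy = x @ y / (n - 1)

print(var_x, var_y, cov_xy)
print(np.cov(x, y))                       # 2x2 matrix with the same entries
```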


Things to remember about the covariance

$\sigma_{xy}^2 \approx 0$ ⇒ x and y are (almost) statistically independent

$\sigma_{xy}^2 \neq 0$ ⇒ x and y share some information ⇒ REDUNDANCY

$\sigma_{xy}^2 = \sigma_{yx}^2$

Constructing a covariance matrix from our data

Recall that X is given by

$$
X =
\begin{pmatrix}
X_{1,1} & \dots & \dots & X_{1,n} \\
X_{2,1} & \dots & \dots & X_{2,n} \\
\vdots  & \dots & \ddots & \vdots \\
X_{m,1} & \dots & \dots & X_{m,n}
\end{pmatrix}
=
\begin{pmatrix}
\mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_m
\end{pmatrix}.
$$

The covariance matrix is given by $C_X = \frac{1}{n-1} X X^T$. In particular,

$$
C_X =
\begin{pmatrix}
\sigma_{X_1}^2     & \sigma_{X_1 X_2}^2 & \dots  & \sigma_{X_1 X_m}^2 \\
\sigma_{X_2 X_1}^2 & \sigma_{X_2}^2     & \dots  & \sigma_{X_2 X_m}^2 \\
\vdots             & \dots              & \ddots & \vdots \\
\sigma_{X_m X_1}^2 & \dots              & \dots  & \sigma_{X_m}^2
\end{pmatrix}.
$$
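The same construction in NumPy, on synthetic data with the rows (measurements) centred first; np.cov with its default rowvar=True computes exactly this matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 500
X = rng.normal(size=(m, n))
Xc = X - X.mean(axis=1, keepdims=True)    # centre each measurement (row)

C_X = Xc @ Xc.T / (n - 1)                 # m x m covariance matrix
print(np.allclose(C_X, np.cov(X)))        # True: np.cov treats rows as variables
print(np.allclose(C_X, C_X.T))            # True: C_X is symmetric
```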


Properties of the covariance matrix

$$
C_X =
\begin{pmatrix}
\sigma_{X_1}^2     & \sigma_{X_1 X_2}^2 & \dots  & \sigma_{X_1 X_m}^2 \\
\sigma_{X_2 X_1}^2 & \sigma_{X_2}^2     & \dots  & \sigma_{X_2 X_m}^2 \\
\vdots             & \dots              & \ddots & \vdots \\
\sigma_{X_m X_1}^2 & \dots              & \dots  & \sigma_{X_m}^2
\end{pmatrix}.
$$

The diagonal elements are the variances of the rows of our data matrix X.

The off-diagonal elements are the corresponding covariances:

$$C_X(i,j) = \sigma_{X_i X_j}^2 = \sigma_{X_j X_i}^2 = C_X(j,i).$$

$C_X = \frac{1}{n-1} X X^T$ is symmetric.

Intuitively: small off-diagonal entries ⇒ statistical independence.

Intuitively: not-so-small off-diagonal entries ⇒ REDUNDANCY!


Intuition

Non-zero off-diagonal entries ⇒ REDUNDANCY!

So, what do we want to achieve?

We want the covariance matrix to look like this. Why?

$$
C_X =
\begin{pmatrix}
\sigma_{X_1}^2 & 0              & \dots  & 0 \\
0              & \sigma_{X_2}^2 & \dots  & 0 \\
\vdots         & \dots          & \ddots & \vdots \\
0              & \dots          & 0      & \sigma_{X_m}^2
\end{pmatrix}.
$$

Because, in this case we will have

NO CORRELATION = NO REDUNDANCY!

What is this? DIAGONALIZATION!

DIAGONALIZATION OF C_X = NO REDUNDANCY


More intuition

Basically, we want to find a new way to look at our system (a change of basis – a linear transformation), such that C_X becomes diagonal.

Suppose that we achieved the desired diagonalization:

$$
C_X =
\begin{pmatrix}
\sigma_{X_1}^2 & 0              & \dots  & 0 \\
0              & \sigma_{X_2}^2 & \dots  & 0 \\
\vdots         & \dots          & \ddots & \vdots \\
0              & \dots          & \dots  & \sigma_{X_m}^2
\end{pmatrix}.
$$

Now, we make an assumption.

An assumption

Larger values of $\sigma_{X_i}^2$ are much more interesting than the smaller ones. (Namely, most of the system dynamics happens where the variance is relatively big.)


Even more intuition

$$
C_X =
\begin{pmatrix}
\sigma_{X_1}^2 & 0              & \dots  & 0 \\
0              & \sigma_{X_2}^2 & \dots  & 0 \\
\vdots         & \dots          & \ddots & \vdots \\
0              & \dots          & \dots  & \sigma_{X_m}^2
\end{pmatrix}.
$$

Suppose that I order the variances such that

$$\sigma_{X_1}^2 > \sigma_{X_2}^2 > \dots > \sigma_{X_m}^2.$$

In this case $\sigma_{X_1}^2$ captures the strongest dynamics of the system — this is the first principal component.

$\sigma_{X_2}^2$ captures less of the system dynamics, and forms the second principal component.

And so on...


The diagonalization (1)

1 Recall that X is our data matrix.

2 Compute a non-normalized covariance via $X X^T$.

3 We saw that $X X^T$ is symmetric; that is, it has real eigenvalues and all of its eigenvectors are orthogonal to each other.

4 For such matrices, we can always perform the eigenvalue decomposition.

Eigenvalue Decomposition

$$X X^T = S \Lambda S^{-1},$$

where $\Lambda$ is a diagonal matrix, and $S$ is a matrix of eigenvectors of $X X^T$. ($S$'s columns are the normalized right eigenvectors of $X X^T$.)

5 Since the eigenvectors are orthonormal, $S^{-1} = S^T$!

6 $\Lambda$ is a diagonal matrix with the eigenvalues of $X X^T$!
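A sketch of this decomposition with NumPy's eigh, which is intended for symmetric matrices (the data are synthetic; eigh returns the eigenvalues in ascending order):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 4, 500
X = rng.normal(size=(m, n))
Xc = X - X.mean(axis=1, keepdims=True)

A = Xc @ Xc.T                                    # non-normalised covariance X X^T (symmetric)
lam, S = np.linalg.eigh(A)                       # Lambda's diagonal (ascending) and eigenvectors

print(np.allclose(S @ np.diag(lam) @ S.T, A))    # A = S Lambda S^T
print(np.allclose(S.T @ S, np.eye(m)))           # eigenvectors are orthonormal: S^{-1} = S^T
```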


The diagonalization (2)

Recall that when we started to work with our data X, we were actually working in a somewhat arbitrary coordinate system.

We would like to figure out which basis we should use in order to obtain a diagonal covariance matrix instead of our original one (which has a bunch of correlated data measurements).

So, let us create a new set of measurements Y, related to the old set of measurements X as follows:

$$Y = S^T X.$$

(Note that this is just a linear transformation!) And we would like to work in this new basis from now on.


The diagonalization (3)

Let us calculate the covariance of Y now.

$$
C_Y = \frac{1}{n-1} Y Y^T
    = \frac{1}{n-1} (S^T X)(S^T X)^T
    = \frac{1}{n-1} S^T \underbrace{X X^T}_{S \Lambda S^T} S
    = \frac{1}{n-1} S^T S \, \Lambda \, S^T S
    = \frac{1}{n-1} \Lambda.
$$

What is $\Lambda$? — a diagonal matrix!

To conclude:

If we work in the $S^T$ basis, the covariance matrix $C_Y$ of $Y = S^T X$ is diagonal ⇒ NO REDUNDANCY!

Effectively, we have figured out the right way to look at our problem.
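Putting the pieces together: change the basis with Y = S^T X and check numerically that the covariance of Y is diagonal. An end-to-end sketch on synthetic data (the injected redundancy between two rows is an assumption for the example):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 4, 2000
X = rng.normal(size=(m, n))
X[2] = 0.9 * X[0] + 0.1 * rng.normal(size=n)    # inject redundancy between rows 0 and 2
Xc = X - X.mean(axis=1, keepdims=True)

lam, S = np.linalg.eigh(Xc @ Xc.T)               # eigen-decomposition of X X^T
Y = S.T @ Xc                                     # new measurements in the S^T basis

C_Y = Y @ Y.T / (n - 1)
off_diag = C_Y - np.diag(np.diag(C_Y))
print(np.max(np.abs(off_diag)))                  # ~0: no correlation left
print(np.allclose(np.diag(C_Y), lam / (n - 1)))  # diagonal of C_Y is Lambda/(n-1)
```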


How many principal components should I use?

$$
C_Y =
\begin{pmatrix}
\sigma_{Y_1}^2 & 0              & \dots  & 0 \\
0              & \sigma_{Y_2}^2 & \dots  & 0 \\
\vdots         & \dots          & \ddots & \vdots \\
0              & \dots          & \dots  & \sigma_{Y_m}^2
\end{pmatrix}.
$$

Suppose that I order the variances such that

$$\sigma_{Y_1}^2 > \sigma_{Y_2}^2 > \dots > \sigma_{Y_m}^2.$$

The Percentage of Variance Explained (PVE) by principal component $i$ is defined by

$$\mathrm{PVE}_i = \frac{\sigma_{Y_i}^2}{\sum_{j=1}^{m} \sigma_{Y_j}^2}.$$

So, the first $k \leq m$ principal components explain

$$\frac{\sum_{i=1}^{k} \sigma_{Y_i}^2}{\sum_{j=1}^{m} \sigma_{Y_j}^2}$$

of the system variance.
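A small sketch of computing the PVE and picking k from the eigenvalues of the covariance matrix (the 95% threshold is only an example choice, not a rule from the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 6, 1000
# Rows with decreasing scales, so the leading components dominate.
X = rng.normal(size=(m, n)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.1])[:, None]
Xc = X - X.mean(axis=1, keepdims=True)

lam, S = np.linalg.eigh(Xc @ Xc.T / (n - 1))
var = lam[::-1]                                  # sigma^2_{Y_1} > ... > sigma^2_{Y_m}

pve = var / var.sum()                            # percentage of variance explained per PC
cum = np.cumsum(pve)
k = int(np.searchsorted(cum, 0.95) + 1)          # smallest k explaining >= 95% of variance
print(pve, cum, k)
```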



Image compression

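The original slide shows an image-compression demo (the figure is not reproduced here). As a rough illustration of the idea, here is a hedged sketch that treats the rows of a grayscale image as observations and keeps only the top k principal components; the image and the value of k are made up for the example:

```python
import numpy as np

def pca_compress(img, k):
    """Rank-k PCA reconstruction of a 2-D grayscale image (rows = observations)."""
    mean = img.mean(axis=0, keepdims=True)
    Xc = img - mean                               # centre each column (pixel position)
    C = Xc.T @ Xc / (Xc.shape[0] - 1)             # width x width covariance matrix
    lam, S = np.linalg.eigh(C)                    # eigenvalues in ascending order
    S_k = S[:, ::-1][:, :k]                       # top-k principal directions
    return (Xc @ S_k) @ S_k.T + mean              # project, then reconstruct

# Hypothetical usage on a random "image":
img = np.random.default_rng(7).random((256, 256))
approx = pca_compress(img, k=32)
print(np.linalg.norm(img - approx) / np.linalg.norm(img))   # relative reconstruction error
```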

PCA Assumptions

1 Data linearity is assumed. (If the relationship between measurements is nonlinear, consider Kernel PCA.)

$$
X_A = \begin{pmatrix} X_1 \\ 3 \cdot X_1 + 8 \end{pmatrix},
\qquad
X_B = \begin{pmatrix} X_1 \\ (X_1)^2 \end{pmatrix}
$$

2 We assume that bigger variances correspond to more important dynamics.

3 The principal components are assumed to be orthogonal.

4 We assume that the data points come from a Gaussian distribution.
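For data like X_B, where the two measurements are nonlinearly related, plain PCA cannot remove the redundancy; a kernelised variant is one workaround. A minimal sketch with scikit-learn's KernelPCA (the RBF kernel, gamma value, and component count are only example choices):

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(8)
x1 = rng.uniform(-1, 1, size=500)
X_B = np.column_stack([x1, x1 ** 2])      # nonlinearly related measurements, samples as rows

kpca = KernelPCA(n_components=1, kernel="rbf", gamma=2.0)
Z = kpca.fit_transform(X_B)               # 1-D nonlinear summary of the two measurements
print(Z.shape)                            # (500, 1)
```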


PCA — conclusions

+
1 A simple method — no parameters to tweak and no coefficients to adjust.
2 A dramatic reduction in data size.
3 Easy to compute.
4 Very powerful for many practical applications.

−
1 How do we incorporate prior knowledge?
2 Too expensive for many applications — $O(n^3)$ complexity.
3 Problems with outliers.
4 Assumes linearity.


The End
