Transcript of Bayesian belief networks 2. PCA and ICA

Page 1: Bayesian belief networks 2. PCA and ICA

1. Bayesian belief networks
2. PCA and ICA

Peter Andras

[email protected]

Page 2: Bayesian belief networks 2. PCA and ICA

Principal component analysis (PCA) 1.

Idea: the high-dimensional data might be situated on a lower-dimensional surface.

Page 3: Bayesian belief networks 2. PCA and ICA

PCA 2.

How to find the lower-dimensional surface?

We look for linear surfaces, i.e., hyperplanes.

We decompose the correlation matrix of the data according to its eigenvectors.

Page 4: Bayesian belief networks 2. PCA and ICA

PCA 3.

The eigenvectors are called principal component vectors.

The new data vectors are formed by the projections of the original data vectors onto the principal component vectors.

Page 5: Bayesian belief networks 2. PCA and ICA

PCA 4.

The data vectors are $x_i$, $i = 1, \dots, n$.

The correlation matrix is:

$R = E(x x^T) \approx \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T$

Page 6: Bayesian belief networks 2. PCA and ICA

PCA 5.

The eigenvectors are determined by the equation:

$Rv = \lambda v$, where $\lambda$ is a real number.

Example with two eigenvectors:

Page 7: Bayesian belief networks 2. PCA and ICA

PCA 6.

In principle we should find d eigenvectors if the dimensionality of the data vectors is d.

If the data vectors are situated on a lower-dimensional linear surface, we find fewer than d eigenvectors (i.e., the determinant of the correlation matrix is zero).

Page 8: Bayesian belief networks 2. PCA and ICA

PCA 7.

If v1, v2, …, vm, m<d, are the eigenvectors of R then the new, transformed data vectors are calculated as:

$y_i^k = x_i^T v_k, \qquad y_i = (y_i^1, \dots, y_i^m)$
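A minimal NumPy sketch of the procedure so far (correlation matrix, eigen-decomposition, projection onto the top m eigenvectors); the function name pca_transform, the choice of m, and the toy data are illustrative, not from the slides:

```python
import numpy as np

def pca_transform(X, m):
    """Project the rows of the n x d data matrix X onto the m principal
    component vectors, i.e. the eigenvectors of R = (1/n) * sum_i x_i x_i^T."""
    n, d = X.shape
    R = (X.T @ X) / n                      # correlation matrix of the data
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]      # sort eigenvalues in decreasing order
    V = eigvecs[:, order[:m]]              # v_1, ..., v_m as columns
    Y = X @ V                              # y_i^k = x_i^T v_k
    return Y, V, eigvals[order]

# Toy example: 2-D data lying close to a 1-D linear surface.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2.0 * t + 0.05 * rng.normal(size=(200, 1))])
Y, V, lam = pca_transform(X, m=1)          # the second eigenvalue is near zero here
```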

Page 9: Bayesian belief networks 2. PCA and ICA

PCA 8.

How to calculate the eigenvectors of R?

First method: use standard matrix algebra methods.

(it is very laborious)

Second method: iterative calculation of the eigenvectors inspired by artificial neural networks.

Page 10: Bayesian belief networks 2. PCA and ICA

PCA 9. Iterative calculation of the eigenvectors

Let $w_1 \in \mathbb{R}^d$ be a randomly chosen vector such that $\|w_1\| = 1$.

Perform iteratively the calculation:

$w_{1,\mathrm{new}} = w_1 + \gamma\, y_i (x_i - y_i w_1)$

where $y_i = w_1^T x_i$ and $\gamma$ is a learning constant.

The algorithm converges to the eigenvector corresponding to the largest eigenvalue ($\lambda$).
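A sketch of this iterative rule in NumPy; the learning constant gamma, the number of sweeps over the data, and the random initialisation are choices of this sketch rather than values given on the slide:

```python
import numpy as np

def first_eigenvector(X, gamma=0.01, sweeps=200, seed=0):
    """Iteratively estimate the eigenvector of R with the largest eigenvalue
    using w <- w + gamma * y_i * (x_i - y_i * w), with y_i = w^T x_i."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)                 # start from a random unit vector
    for _ in range(sweeps):
        for x_i in X:
            y_i = w @ x_i
            w = w + gamma * y_i * (x_i - y_i * w)
    return w / np.linalg.norm(w)
```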

Page 11: Bayesian belief networks 2. PCA and ICA

PCA 10.

To calculate the following eigenvectors we modify the iterative algorithm. Now we use the calculation formula:

$w_{k,\mathrm{new}} = w_k + \gamma\, y_i (x_i^* - y_i w_k)$

where $x_i^* = x_i - \sum_{j=1}^{k-1} u_{ji} w_j$ and $u_{ji} = w_j^T x_i$.

This iterative algorithm converges to $w_k$, the k-th eigenvector.
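A sketch of this modified rule; it assumes the previously found eigenvectors are stacked as rows of W_prev, and it takes $y_i = w_k^T x_i^*$ by analogy with the previous slide (the slide does not restate the definition of $y_i$):

```python
import numpy as np

def next_eigenvector(X, W_prev, gamma=0.01, sweeps=200, seed=1):
    """Estimate the k-th eigenvector given the first k-1 eigenvectors
    (rows of W_prev): the earlier directions are removed from each x_i
    via x_i* = x_i - sum_j u_ji w_j, with u_ji = w_j^T x_i."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    for _ in range(sweeps):
        for x_i in X:
            u = W_prev @ x_i                 # u_ji = w_j^T x_i
            x_star = x_i - W_prev.T @ u      # deflated data vector x_i*
            y_i = w @ x_star
            w = w + gamma * y_i * (x_star - y_i * w)
    return w / np.linalg.norm(w)
```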

Page 12: Bayesian belief networks 2. PCA and ICA

PCA 11.

If the algorithm doesn't converge, the situation can be:

a. the vector enters a cycle;

b. the values don't form any cycle.

If we have a cycle, all the vectors of the cycle are eigenvectors, and their corresponding eigenvalues are very close.

If we have no convergence and no cycle, that means that there are no more eigenvectors that can be determined.

Page 13: Bayesian belief networks 2. PCA and ICA

PCA 12.

How to use the PCA for dimension reduction?

Select the important eigenvectors.

Often all of the eigenvectors can be determined, but only some of them are important.

The importance of the eigenvectors is shown by their associated eigenvalue.

Page 14: Bayesian belief networks 2. PCA and ICA

PCA 13.

Selecting the important eigenvectors.

1. Graphical method:

Page 15: Bayesian belief networks 2. PCA and ICA

PCA 14. Selecting the important eigenvectors.

2. Relative power: keep the eigenvector $v_k$ if

$\frac{\lambda_k}{\sum_{j=1}^{m} \lambda_j} \geq 0.02 \ldots 0.1$

3. Cumulative power: keep the first $k$ eigenvectors if

$\frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{m} \lambda_j} \geq 0.8 \ldots 0.9$
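A small sketch of both selection rules applied to a list of eigenvalues; the particular thresholds (0.05 and 0.9) are example values taken from the ranges above:

```python
import numpy as np

def select_eigenvectors(eigvals, rel_thresh=0.05, cum_thresh=0.9):
    """Apply the relative-power and cumulative-power rules to the eigenvalues."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # decreasing order
    power = lam / lam.sum()                                # lambda_k / sum_j lambda_j
    keep_relative = np.where(power >= rel_thresh)[0]       # rule 2: relative power
    keep_cumulative = int(np.searchsorted(np.cumsum(power), cum_thresh) + 1)  # rule 3
    return keep_relative, keep_cumulative

# Example: the first two eigenvectors carry most of the power.
rel_idx, k = select_eigenvectors([4.1, 2.3, 0.4, 0.1, 0.05])
```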

Page 16: Bayesian belief networks 2. PCA and ICA

PCA 15. Summary

The PCA is used for dimensionality reduction.

The data vectors are projected on the eigenvectors of their correlation matrix to obtain the transformed data vectors.

To calculate the PCA easily, we can use the iterative algorithm.

To reduce the data dimension we consider only the important eigenvectors.

Page 17: Bayesian belief networks 2. PCA and ICA

Independent component analysis (ICA) 1.

The idea: if the data vectors are linear combinations of statistically independent data components, they should be separable into their components.

This is true if the component vectors have a non-Gaussian distribution, with a sharper or flatter peak than the Gaussian.

Page 18: Bayesian belief networks 2. PCA and ICA

ICA 2.

Suppose $x_i = A s_i$, where the $x_i$ are the data vectors and the $s_i$ are the vectors of statistically independent components ($s_i^j$).

Our goal is to find the matrix A (more precisely, the rows of its inverse, since the components are recovered as $s_i = A^{-1} x_i$).

Example: the ‘cocktail-party’ effect: many independent voices are recorded together; the goal is to separate the independent voices; the recorded mixture is a linear mixture.

Page 19: Bayesian belief networks 2. PCA and ICA

ICA 3.

How to find the independent components?

Optimize (for whitened data):

$J(w) = \mathrm{kurt}(w^T x) + F(\|w\|^2) = E\{(w^T x)^4\} - 3\|w\|^4 + F(\|w\|^2)$

where $F$ is a penalty term on the norm of $w$.

All solution vectors $w$ are local minima of this objective, and each corresponds to one of the independent components, i.e., to one of the components of the $s_i$ vectors.

Page 20: Bayesian belief networks 2. PCA and ICA

ICA 4. How to do it practically?

FastICA algorithm (Hyvarinen and Oja):

It calculates the w vectors by iteration.

The calculation formula is:

$w_{\mathrm{new}} = \frac{1}{n} \sum_{i=1}^{n} x_i (w^T x_i)^3 - 3w$

w converges to one of the vectors corresponding to one of the independent components.
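A sketch of this fixed-point update in NumPy. It assumes the data matrix Z has already been centred and whitened (FastICA relies on whitening, although the slides do not show that preprocessing step), and it renormalises w after every update:

```python
import numpy as np

def fastica_one_unit(Z, iters=200, tol=1e-6, seed=0):
    """One-unit FastICA with the kurtosis nonlinearity on whitened data Z (n x d):
    w_new = (1/n) * sum_i x_i (w^T x_i)^3 - 3 w, followed by renormalisation."""
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    for _ in range(iters):
        y = Z @ w                                # projections w^T x_i
        w_new = (Z.T @ (y ** 3)) / n - 3 * w     # fixed-point update
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:      # converged (up to a sign flip)
            return w_new
        w = w_new
    return w
```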

Page 21: Bayesian belief networks 2. PCA and ICA

ICA 5.

In practice we have to calculate several w vectors. To test whether the generated independent components are really independent we can use statistical tests.

Let us consider $s_{1i} = w_1^T x_i$ and $s_{2i} = w_2^T x_i$.

Then we can test the independence of $s_1$ and $s_2$ by calculating their correlation, and we can test whether they have an identical origin with the F-test (they may not be strongly correlated but at the same time they may have an identical origin).

If the testing accepts the independence of the two series we may accept w2 as a new vector that corresponds to a separate independent component.
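A sketch of such a check with SciPy. The correlation part is straightforward; the F-test is implemented here as a variance-ratio test, which is one possible reading of the "identical origin" test mentioned above, since the slides do not specify it further:

```python
import numpy as np
from scipy import stats

def check_new_component(s1, s2, alpha=0.05):
    """Test whether two recovered series look independent: low correlation
    and (by a variance-ratio F-test) no evidence of an identical origin."""
    r, p_corr = stats.pearsonr(s1, s2)               # correlation and its p-value
    f = np.var(s1, ddof=1) / np.var(s2, ddof=1)      # variance ratio
    df1, df2 = len(s1) - 1, len(s2) - 1
    p_f = 2.0 * min(stats.f.cdf(f, df1, df2), stats.f.sf(f, df1, df2))
    uncorrelated = p_corr > alpha                    # cannot reject zero correlation
    same_origin = p_f > alpha                        # variances not significantly different
    return uncorrelated, same_origin
```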

Page 22: Bayesian belief networks 2. PCA and ICA

ICA 6.

Remarks

By calculating the independent components we get a new representation of the data, which has the property that the components contain minimum mutual information.

We can use the ICA to select the independent non-Gaussian components, but we cannot separate mixtures of Gaussian components.