Introduction to Machine Learning 10701
Independent Component Analysis
Barnabás Póczos & Aarti Singh
Independent Component Analysis
Independent Component Analysis
[Figure: original signals → observations (mixtures) → ICA model → ICA estimated signals]
Independent Component Analysis
Model: x(t) = A s(t), with independent sources s(t).
We observe only x(t).
We want y(t) = W x(t) ≈ s(t).
Goal: estimate the demixing matrix W ≈ A⁻¹.
The Cocktail Party Problem: Solving with PCA
Sources s(t) → mixing x(t) = A s(t) (observation) → PCA estimation y(t) = W x(t)
The Cocktail Party Problem: Solving with ICA
Sources s(t) → mixing x(t) = A s(t) (observation) → ICA estimation y(t) = W x(t)
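The mixing/demixing model x(t) = A s(t), y(t) = W x(t) can be sketched numerically. A minimal sketch, assuming made-up sources and a made-up 2×2 mixing matrix, and cheating with the oracle W = A⁻¹ (estimating W from x alone is exactly what ICA does):

```python
import numpy as np

# Two "source" signals, 500 samples each (rows = sources).
t = np.linspace(0, 1, 500)
s = np.vstack([np.sin(2 * np.pi * 5 * t),             # sinusoid
               np.sign(np.sin(2 * np.pi * 3 * t))])   # square wave

# Mixing: x(t) = A s(t), with an arbitrary invertible A.
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
x = A @ s

# Demixing: y(t) = W x(t). With the oracle W = A^{-1}, y recovers s.
W = np.linalg.inv(A)
y = W @ x
assert np.allclose(y, s)
```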
ICA vs PCA: Similarities
• Both perform linear transformations.
• Both are matrix factorizations:
  PCA: X = U S, a low-rank factorization (S has M < N rows) used for compression.
  ICA: X = A S, a full-rank factorization (M = N) used to remove dependency among the rows of S.
• Columns of U = PCA vectors; columns of A = ICA vectors.
ICA vs PCA: Differences
• PCA: X = U S with UᵀU = I. ICA: X = A S with A invertible.
• PCA does compression (M < N). ICA does not compress: the number of features stays the same (M = N).
• PCA removes only correlations, not higher-order dependence. ICA removes correlations and higher-order dependence.
• PCA: some components are more important than others (based on eigenvalues). ICA: components are equally important.
ICA vs PCA
Note:
• PCA vectors are orthogonal.
• ICA vectors are not necessarily orthogonal.
ICA basis vectors extracted from natural images
They resemble Gabor wavelets, edge detectors, the receptive fields of V1 cells, and filters learned by deep neural networks.
PCA basis vectors extracted from natural images
Some ICA Applications
Static:
• Image denoising
• Microarray data processing
• Decomposing the spectra of galaxies
• Face recognition
• Facial expression recognition
• Feature extraction
• Clustering
• Classification
• Deep neural networks
Temporal:
• Medical signal processing (fMRI, ECG, EEG)
• Brain-computer interfaces
• Modeling of the hippocampus and place cells
• Modeling of the visual cortex
• Time series analysis
• Financial applications
• Blind deconvolution
ICA Application: Removing Artifacts from EEG
EEG is like a neural cocktail party. EEG activity is severely contaminated by:
• eye movements
• blinks
• muscle activity
• heart (ECG) artifacts
• vessel pulse
• electrode noise
• line noise from alternating current (60 Hz)
ICA can improve the signal: it effectively detects, separates, and removes activity from a wide variety of artifactual sources in EEG records (Jung, Makeig, Bell, and Sejnowski). The ICA weights (mixing matrix) help find the locations of the sources.
[Figure: removing artifacts from EEG; fig. from Jung]
[Figure: removing artifacts from EEG, continued; fig. from Jung]
ICA for Image Denoising (Hoyer, Hyvärinen)
[Figure panels: original, noisy, Wiener filtered, median filtered, ICA denoised]
ICA for Motion Style Components (Mori & Hoshino 2002; Shapiro et al. 2006; Cao et al. 2003)
A method for the analysis and synthesis of human motion from motion-capture data. It provides perceptually meaningful "style" components. 109 markers give 327-dimensional data; motion capture ⇒ data matrix.
Goal: find motion style components. ICA ⇒ 6 independent components (emotion, content, …).
[Figure panels: walk, sneaky, walk with sneaky style, sneaky with walk style]
ICA Theory
Statistical (In)dependence
Definition (Independence): X and Y are independent if p(x, y) = p(x) p(y).
Definition (Shannon entropy): H(X) = −∑ₓ p(x) log p(x).
Definition (KL divergence): KL(p ‖ q) = ∑ₓ p(x) log(p(x)/q(x)).
Definition (Mutual information): I(X; Y) = KL(p(x, y) ‖ p(x) p(y)); I(X; Y) = 0 if and only if X and Y are independent.
Solving the ICA problem with i.i.d. sources
Solving the ICA problem
Whitening
(We assume centered data, E[x] = 0.)
Compute the covariance C = E[xxᵀ] and set z = C^{−1/2} x, so that E[zzᵀ] = I.
Whitening
We have x = A s, so z = C^{−1/2} x = (C^{−1/2} A) s. If E[ssᵀ] = I, then the new mixing matrix Ã = C^{−1/2} A is orthogonal: E[zzᵀ] = Ã E[ssᵀ] Ãᵀ = Ã Ãᵀ = I.
Whitening Solves Half of the ICA Problem
[Figure panels: original, mixed, whitened]
Note: after whitening, the remaining mixing matrix is orthogonal, and an N×N orthogonal matrix has only N(N−1)/2 free parameters, about half of the N² parameters of a general matrix
⇒ whitening solves half of the ICA problem.
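Whitening can be sketched in a few lines of numpy. The sources and mixing matrix below are illustrative; the whitening transform itself is the standard C^{−1/2} via eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(0)

# Mixed data: x = A s with uniform (non-Gaussian) sources.
s = rng.uniform(-1, 1, size=(2, 10000))
A = np.array([[2.0, 1.0],
              [1.0, 1.5]])
x = A @ s
x = x - x.mean(axis=1, keepdims=True)   # center: E[x] = 0

# Whitening: eigendecompose the sample covariance C = E[x x^T]
# and map z = C^{-1/2} x, so that E[z z^T] = I.
C = x @ x.T / x.shape[1]
d, E = np.linalg.eigh(C)
C_inv_sqrt = E @ np.diag(d ** -0.5) @ E.T
z = C_inv_sqrt @ x

cov_z = z @ z.T / z.shape[1]
assert np.allclose(cov_z, np.eye(2), atol=1e-8)
```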
Solving ICA
ICA task: given x, find y (the estimate of s) and W (the estimate of A⁻¹). ICA solution: y = W x.
1. Remove the mean: E[x] = 0.
2. Whiten: E[xxᵀ] = I.
3. Find an orthogonal W optimizing an objective function, e.g. via a sequence of 2-D Jacobi (Givens) rotations.
[Figure panels: original, mixed, whitened, rotated (demixed)]
Optimization Using Jacobi Rotation Matrices
A Jacobi (Givens) rotation G(p, q, θ) equals the identity except in rows and columns p and q, where it applies a 2-D rotation: G[p,p] = G[q,q] = cos θ, G[p,q] = −sin θ, G[q,p] = sin θ. The objective is optimized one (p, q) plane at a time.
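A minimal sketch of such a rotation matrix (the dimension, plane, and angle below are arbitrary):

```python
import numpy as np

def givens(n, p, q, theta):
    """Jacobi (Givens) rotation: identity except a 2-D rotation
    in the (p, q) coordinate plane."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[p, p] = c
    G[q, q] = c
    G[p, q] = -s
    G[q, p] = s
    return G

G = givens(4, 1, 3, 0.7)
# Rotations are orthogonal, so composing them keeps W orthogonal.
assert np.allclose(G @ G.T, np.eye(4))
assert np.isclose(np.linalg.det(G), 1.0)
```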
ICA Cost Functions
Lemma (proof: homework): for y = W x, the mutual information satisfies I(y₁, …, y_N) = ∑ᵢ H(yᵢ) − H(x) − log |det W|.
Therefore, for whitened data and orthogonal W (log |det W| = 0), minimizing the mutual information of the components is equivalent to minimizing ∑ᵢ H(yᵢ).
ICA Cost Functions
For whitened data, the covariance of y is fixed at I. Which distribution has the largest entropy under this constraint? The Gaussian. Therefore, to minimize the marginal entropies H(yᵢ)
⇒ go away from the normal distribution.
Central Limit Theorem
The sum of independent random variables converges to the normal distribution, so mixtures are more Gaussian than the original sources.
⇒ For separation, go far away from the normal distribution ⇒ maximize negentropy or |kurtosis|.
(Figs. from Ata Kaban)
ICA Algorithms
Maximum Likelihood ICA Algorithm (David J. C. MacKay, 1997)
Model x = A s, estimate y = W x, where the wᵢᵀ are the rows of W. The log-likelihood of one sample is log p(x) = log |det W| + ∑ᵢ log pᵢ(wᵢᵀ x), and gradient ascent on it gives ∇_W log p = W⁻ᵀ − φ(y) xᵀ, with φᵢ(u) = −pᵢ′(u)/pᵢ(u).
Maximum Likelihood ICA Algorithm
ICA Algorithm Based on Kurtosis Maximization
Kurtosis is the 4th-order cumulant: kurt(y) = E[y⁴] − 3 (E[y²])². It measures
• the distance from normality (kurt = 0 for a Gaussian), and
• the degree of peakedness.
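The definition above is easy to check empirically; a sketch using sampled distributions (the sample sizes and seed are arbitrary):

```python
import numpy as np

def kurtosis(y):
    """Excess kurtosis: the 4th-order cumulant kurt(y) = E[y^4] - 3 E[y^2]^2."""
    y = y - y.mean()
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

rng = np.random.default_rng(0)
n = 200_000
k_gauss = kurtosis(rng.standard_normal(n))   # ~ 0 for a Gaussian
k_unif = kurtosis(rng.uniform(-1, 1, n))     # < 0: sub-Gaussian (flat)
k_lap = kurtosis(rng.laplace(size=n))        # > 0: super-Gaussian (peaked)

assert abs(k_gauss) < 0.1
assert k_unif < 0 < k_lap
```

|kurtosis| is largest far from the Gaussian in either direction, which is why ICA can maximize its absolute value.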
The FastICA Algorithm (Hyvärinen)
Probably the most famous ICA algorithm. Maximize E[G(wᵀz)] subject to ‖w‖² = 1; the stationarity condition is F(w) = E[z g(wᵀz)] − λ w = 0 (λ is a Lagrange multiplier, g = G′).
Solve this equation by the Newton–Raphson method.
Newton method for finding a root
Newton Method for Finding a Root
Goal: find x with f(x) = 0.
Linear approximation (1st-order Taylor expansion): f(x + Δ) ≈ f(x) + f′(x) Δ.
Therefore, setting the approximation to zero gives Δ = −f(x)/f′(x), i.e. the iteration x_{n+1} = x_n − f(x_n)/f′(x_n).
Illustration of Newton's Method
Goal: finding a root. In each step we linearize at the current point x and jump to the zero of the tangent line.
Example: Finding a Root
http://en.wikipedia.org/wiki/Newton%27s_method
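The iteration can be sketched in a few lines; as an illustrative example (not from the slides) we find √2 as the root of f(x) = x² − 2:

```python
import math

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson: repeatedly jump to the root of the tangent line."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
assert abs(root - math.sqrt(2)) < 1e-10
```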
Newton Method for Finding a Root: Multivariate Case
The method generalizes to multivariate functions f: ℝᴺ → ℝᴺ by linearizing with the Jacobian, f(x + Δ) ≈ f(x) + J_f(x) Δ.
Therefore, x_{n+1} = x_n − J_f(x_n)⁻¹ f(x_n) (use the pseudoinverse if the Jacobian has no inverse).
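A sketch of the multivariate iteration; the test system (unit circle intersected with the line y = x) is an illustrative example, not from the slides:

```python
import numpy as np

def newton_nd(f, jac, x0, tol=1e-12, max_iter=50):
    """Multivariate Newton: x <- x - J^{-1} f(x); pinv covers a singular J."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.pinv(jac(x)) @ f(x)
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# f(x, y) = (x^2 + y^2 - 1, x - y) has a root at (1/sqrt(2), 1/sqrt(2)).
f = lambda v: np.array([v[0] ** 2 + v[1] ** 2 - 1, v[0] - v[1]])
jac = lambda v: np.array([[2 * v[0], 2 * v[1]], [1.0, -1.0]])
root = newton_nd(f, jac, [1.0, 0.5])
assert np.allclose(root, [np.sqrt(0.5), np.sqrt(0.5)], atol=1e-8)
```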
Newton method for FastICA
The FastICA Algorithm (Hyvärinen)
Solve: F(w) = E[z g(wᵀz)] − λ w = 0.
The derivative of F: J_F(w) = E[z zᵀ g′(wᵀz)] − λ I.
Note: for whitened data, E[z zᵀ g′(wᵀz)] ≈ E[z zᵀ] E[g′(wᵀz)] = E[g′(wᵀz)] I.
The FastICA Algorithm (Hyvärinen)
Therefore, J_F(w) ≈ (E[g′(wᵀz)] − λ) I: the Jacobian matrix becomes diagonal and can easily be inverted. The resulting Newton step, after rescaling and renormalizing w, yields the FastICA update
w ← E[z g(wᵀz)] − E[g′(wᵀz)] w,  w ← w / ‖w‖.
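The FastICA one-unit update w ← E[z g(wᵀz)] − E[g′(wᵀz)] w, followed by normalization, can be sketched as below. This is a toy sketch with g(u) = tanh(u) on a made-up whitened mixture of uniform sources; a production implementation would need more careful convergence handling:

```python
import numpy as np

rng = np.random.default_rng(0)

# Whitened mixture of two uniform (sub-Gaussian) sources.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 20000))
A = np.array([[2.0, 1.0], [1.0, 1.5]])
x = A @ s
x -= x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(x @ x.T / x.shape[1])
z = E @ np.diag(d ** -0.5) @ E.T @ x

g = np.tanh                                # nonlinearity g
gprime = lambda u: 1 - np.tanh(u) ** 2     # its derivative g'

w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(100):
    wx = w @ z
    w_new = (z * g(wx)).mean(axis=1) - gprime(wx).mean() * w
    w_new /= np.linalg.norm(w_new)
    converged = abs(w_new @ w) > 1 - 1e-10   # compare up to sign
    w = w_new
    if converged:
        break

assert np.isclose(np.linalg.norm(w), 1.0)
```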
Other Nonlinearities
Other Nonlinearities
The Newton derivation works for any smooth nonlinearity g; common choices include g(u) = tanh(u), g(u) = u exp(−u²/2), and g(u) = u³.
Algorithm: repeat w ← E[z g(wᵀz)] − E[g′(wᵀz)] w, then w ← w / ‖w‖, until convergence.
FastICA for Several Units
To estimate several components, run the one-unit algorithm for w₁, …, w_M and decorrelate the estimated vectors after each iteration, e.g. by deflation (Gram-Schmidt orthogonalization against the rows already found).
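The deflation scheme can be sketched as follows. This is a minimal illustration on a made-up whitened 3-D mixture (function name, sample sizes, and seeds are my own, not from the slides):

```python
import numpy as np

def fastica_deflation(z, n_components, n_iter=200, seed=0):
    """Estimate several ICA units one by one; after each update,
    project out the rows already found (Gram-Schmidt deflation)."""
    rng = np.random.default_rng(seed)
    g, gp = np.tanh, lambda u: 1 - np.tanh(u) ** 2
    W = []
    for _ in range(n_components):
        w = rng.standard_normal(z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            wx = w @ z
            w = (z * g(wx)).mean(axis=1) - gp(wx).mean() * w
            for wj in W:                 # deflation step
                w = w - (w @ wj) * wj
            w /= np.linalg.norm(w)
        W.append(w)
    return np.array(W)

# Demo on a whitened 3-D mixture of uniform sources.
rng = np.random.default_rng(1)
s = rng.uniform(-1, 1, size=(3, 20000))
A = rng.standard_normal((3, 3))
x = A @ s
x -= x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(x @ x.T / x.shape[1])
z = E @ np.diag(d ** -0.5) @ E.T @ x

W = fastica_deflation(z, 3)
# Rows are orthonormal by construction (whitened data => orthogonal W).
assert np.allclose(W @ W.T, np.eye(3), atol=1e-6)
```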