Independent Component Analysis for Blind Source Separation
Tatsuya Yokota
Tokyo Institute of Technology
Jan. 31, 2012
Jan. 31, 2012 1/28
Outline
1. Blind Source Separation
2. Independent Component Analysis
3. Experiments
4. Summary
What Is Blind Source Separation?
Blind Source Separation (BSS) is a method for estimating the original signals from observed signals, which consist of a mixture of the original signals plus noise.
Example of BSS
BSS is often used in speech analysis and image analysis.
Example of BSS (cont’d)
BSS is also very important for brain signal analysis.
Model Formalization
The BSS problem is formalized as follows. The matrix
$$X \in \mathbb{R}^{m \times d} \qquad (1)$$
denotes the original signals, where $m$ is the number of original signals and $d$ is the dimension (length) of each signal. We assume that the observed signals $Y \in \mathbb{R}^{n \times d}$ are given by the linear mixing system
$$Y = AX + E, \qquad (2)$$
where $A \in \mathbb{R}^{n \times m}$ is the unknown mixing matrix and $E \in \mathbb{R}^{n \times d}$ denotes noise. Basically, $n \ge m$. The goal of BSS is to estimate $A$ and $X$ so that $X$ recovers the unknown original signals as closely as possible.
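As a concrete illustration of model (2), the following Python/NumPy sketch builds synthetic data. The sizes, the two source waveforms, the random mixing matrix and the noise level are all illustrative assumptions, not values from the original experiments.

```python
import numpy as np

# Illustrative sizes (assumptions): m = 2 sources, n = 3 sensors, d = 1000 samples.
m, n, d = 2, 3, 1000
rng = np.random.default_rng(0)

# Original signals X (m x d): a sine wave and a square wave.
t = np.linspace(0.0, 8.0 * np.pi, d)
X = np.vstack([np.sin(t), np.sign(np.sin(3.0 * t))])

# Unknown mixing matrix A (n x m) and small Gaussian noise E (n x d).
A = rng.normal(size=(n, m))
E = 0.01 * rng.normal(size=(n, d))

# Observed signals, Eq. (2): Y = AX + E.
Y = A @ X + E
print(Y.shape)  # (3, 1000)
```

Only `Y` would be available in practice; BSS has to recover `X` (and `A`) from it.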
Kinds of BSS Methods
The BSS model has too many degrees of freedom to determine A and X uniquely, because a huge number of pairs (A, X) satisfy Y = AX + E. Therefore, some constraint is needed to solve the BSS problem, for example:
PCA : orthogonal constraint
SCA : sparsity constraint
NMF : non-negativity constraint
ICA : independence constraint
In this way, there are many methods for the BSS problem, depending on the constraint, and which one to use depends on the subject matter. Non-negative Matrix Factorization (NMF) was introduced in my previous seminar; its solution can be obtained by the alternating least squares algorithm. Today, I will introduce another method: Independent Component Analysis (ICA).
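The non-uniqueness of (A, X) is easy to check numerically. In this sketch (random data; the matrix `M` is an arbitrary invertible choice made for illustration), any invertible M turns one solution (A, X) into another, (AM, M^{-1}X), with exactly the same product AX, which is why an extra constraint is required:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 2))        # one candidate mixing matrix (n = 3, m = 2)
X = rng.normal(size=(2, 100))      # one candidate source matrix (d = 100)

# Any invertible m x m matrix M yields a second pair (A M, M^{-1} X).
M = np.array([[2.0, 1.0], [0.5, 3.0]])   # arbitrary invertible choice (det = 5.5)
A2 = A @ M
X2 = np.linalg.solve(M, X)               # computes M^{-1} X without forming the inverse

print(np.allclose(A @ X, A2 @ X2))       # True: the product is unchanged
print(np.allclose(X, X2))                # False: the sources differ
```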
Independent Component Analysis
The Cocktail Party Problem
$$x_1(t) = a_{11}s_1(t) + a_{12}s_2(t) + a_{13}s_3(t) \qquad (3)$$
$$x_2(t) = a_{21}s_1(t) + a_{22}s_2(t) + a_{23}s_3(t) \qquad (4)$$
$$x_3(t) = a_{31}s_1(t) + a_{32}s_2(t) + a_{33}s_3(t) \qquad (5)$$
$x$ is an observed signal, and $s$ is an original signal. We assume that $\{s_1, s_2, s_3\}$ are statistically independent of each other.
The Model of ICA
Independent Component Analysis (ICA) estimates the independent components $s(t)$ from the observations $x(t)$, where
$$x(t) = As(t). \qquad (6)$$
Approach
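Hypotheses of ICA
1. The sources $\{s_i\}$ are statistically independent of each other:
$$p(s_1, s_2, \ldots, s_n) = p(s_1)\,p(s_2) \cdots p(s_n). \qquad (7)$$
2. The sources $\{s_i\}$ follow non-Gaussian distributions. If the sources are Gaussian, ICA is impossible, because any rotation of independent unit-variance Gaussians is again independent Gaussian, so $A$ cannot be identified (strictly, at most one source may be Gaussian).
3. $A$ is a regular (invertible) matrix. Therefore, we can rewrite the model as
$$s(t) = Bx(t), \qquad (8)$$
where $B = A^{-1}$. It is only necessary to estimate $B$ so that the $\{s_i\}$ are independent.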
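The following sketch illustrates hypothesis 3 in the noise-free cocktail-party setting of Eqs. (3)-(6): if A were known, B = A^{-1} would recover the sources exactly. The three source distributions and the 3×3 matrix are illustrative assumptions; in real ICA, A is unknown and B must be estimated from x(t) alone.

```python
import numpy as np

rng = np.random.default_rng(2)
# Three independent, non-Gaussian sources (uniform, Laplace, binary), 1000 samples each.
s = np.vstack([
    rng.uniform(-1.0, 1.0, 1000),
    rng.laplace(size=1000),
    rng.choice([-1.0, 1.0], size=1000),
])
A = np.array([[1.0, 0.5, 0.3],
              [0.4, 1.0, 0.2],
              [0.3, 0.6, 1.0]])      # illustrative regular mixing matrix (det != 0)
x = A @ s                            # Eq. (6), no noise

B = np.linalg.inv(A)                 # Eq. (8): B = A^{-1}
s_hat = B @ x
print(np.allclose(s_hat, s))         # True
```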
Whitening and ICA
Definition of a White Signal
A white signal is any $z$ that satisfies
$$E[z] = 0, \quad E[zz^T] = I. \qquad (9)$$
First, we show an example of original independent signals and the corresponding observed signals:
(a) source (s1, s2) (b) observed (x1, x2)
The observed signals are given by $x(t) = As(t)$, and ICA recovers the original signals by $s(t) = Bx(t)$.
Whitening and ICA (cont’d)
Whitening is a useful preprocessing step for ICA. First, we apply whitening to the observed signals $x(t)$.
(c) observed (x1, x2) (d) whitening (z1, z2)
The whitened signals are denoted $(z_1, z_2)$ and are given by
$$z(t) = Vx(t), \qquad (10)$$
where $V$ is a whitening matrix for $x$. The model becomes
$$s(t) = Uz(t) = UVx(t) = Bx(t), \qquad (11)$$
where $U$ is an orthogonal matrix: if the sources are scaled to unit variance, then $I = E[ss^T] = U\,E[zz^T]\,U^T = UU^T$. Whitening therefore simplifies the ICA problem, since only the orthogonal matrix $U$ remains to be estimated.
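One common way to build the whitening matrix V of Eq. (10) uses the eigendecomposition of the sample covariance, C = Q Λ Q^T, and V = Λ^{-1/2} Q^T. The sketch below assumes this choice (any orthogonal matrix times V would whiten equally well) and checks that the whitened data have zero mean and identity covariance; the sources and mixing matrix are illustrative:

```python
import numpy as np

def whiten(x):
    """Whiten x (shape (n, d): n signals, d samples).

    Returns (z, V) with z = V (x - mean), so that the sample mean of z is 0 and
    its sample covariance is I. Uses C = Q diag(lam) Q^T and V = diag(lam)^(-1/2) Q^T.
    """
    xc = x - x.mean(axis=1, keepdims=True)   # enforce E[z] = 0
    C = xc @ xc.T / xc.shape[1]              # sample covariance (n x n)
    lam, Q = np.linalg.eigh(C)               # C is symmetric positive definite here
    V = np.diag(1.0 / np.sqrt(lam)) @ Q.T
    return V @ xc, V

rng = np.random.default_rng(3)
s = rng.uniform(-1.0, 1.0, size=(2, 5000))   # independent sources
A = np.array([[2.0, 1.0], [1.0, 1.0]])       # illustrative mixing matrix
z, V = whiten(A @ s)
print(np.round(z @ z.T / z.shape[1], 6))     # approximately the 2x2 identity
```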
Non-Gaussianity and ICA
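Non-Gaussianity serves as a measure of independence. By the central limit theorem, a sum of independent random variables tends to be closer to Gaussian than the individual variables, so each mixture in $x(t)$ is more Gaussian than the sources $s(t)$. Now let $b_i^T$ be a row of the unmixing matrix $B$, so that $s_i(t) = b_i^T x(t)$. We maximize the non-Gaussianity of $b_i^T x(t)$; such a $b_i$ gives one row of the solution $B$. For example, given two candidate vectors $b'$ and $b$, $b$ is the better one if $b^T x(t)$ is less Gaussian than $b'^T x(t)$.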
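The central-limit argument can be checked with sample kurtosis (Eq. (13), computed after standardizing): a normalized sum of two independent uniform sources has kurtosis closer to 0, i.e. it is more Gaussian, than either source. The uniform sources and the sample size are illustrative assumptions.

```python
import numpy as np

def kurt(y):
    """Sample kurtosis of y after standardization (Eq. (13)): E[y^4] - 3."""
    y = (y - y.mean()) / y.std()
    return np.mean(y**4) - 3.0

rng = np.random.default_rng(4)
s1 = rng.uniform(-1.0, 1.0, 100_000)
s2 = rng.uniform(-1.0, 1.0, 100_000)
x = (s1 + s2) / np.sqrt(2.0)     # a mixture of two independent sources

# Theory: about -1.2 for a uniform source and about -0.6 for this mixture.
print(round(kurt(s1), 2), round(kurt(x), 2))
```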
Maximization of Kurtosis
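Kurtosis is a measure of non-Gaussianity. It is defined by
$$\mathrm{kurt}(y) = E[y^4] - 3\left(E[y^2]\right)^2. \qquad (12)$$
If $y$ is white (i.e., $E[y] = 0$, $E[y^2] = 1$), then
$$\mathrm{kurt}(y) = E[y^4] - 3. \qquad (13)$$
The kurtosis is zero for a Gaussian variable, negative for sub-Gaussian variables, and positive for super-Gaussian ones. We can solve the ICA problem by
$$b = \arg\max_b \left|\mathrm{kurt}(b^T x(t))\right|. \qquad (14)$$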
Figure: Kurtosis
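Eq. (13) is straightforward to estimate from samples. This sketch (distributions and sample size are illustrative assumptions) shows the sign convention: roughly 0 for Gaussian data, negative for sub-Gaussian (uniform) data, and positive for super-Gaussian (Laplace) data.

```python
import numpy as np

def kurt(y):
    """Sample kurtosis of y after standardization (Eq. (13)): E[y^4] - 3."""
    y = (y - y.mean()) / y.std()
    return np.mean(y**4) - 3.0

rng = np.random.default_rng(5)
N = 200_000
k_gauss = kurt(rng.normal(size=N))           # theory: 0
k_unif = kurt(rng.uniform(-1.0, 1.0, N))     # theory: -1.2 (sub-Gaussian)
k_lap = kurt(rng.laplace(size=N))            # theory: +3 (super-Gaussian)
print(round(k_gauss, 2), round(k_unif, 2), round(k_lap, 2))
```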
Fast ICA algorithm based on Kurtosis
Let $z$ be a white signal obtained from $x$. We maximize the absolute value of the kurtosis:
$$\text{maximize } \left|\mathrm{kurt}(w^T z)\right| \quad \text{s.t. } w^T w = 1. \qquad (15)$$
The gradient of $|\mathrm{kurt}(w^T z)|$ is
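$$\frac{\partial\,|\mathrm{kurt}(w^T z)|}{\partial w} = \frac{\partial}{\partial w}\left| E\{(w^T z)^4\} - 3\left[E\{(w^T z)^2\}\right]^2 \right| \qquad (16)$$
$$= \frac{\partial}{\partial w}\left| E\{(w^T z)^4\} - 3\|w\|^4 \right| \quad (\text{because } E\{zz^T\} = I) \qquad (17)$$
$$= 4\,\mathrm{sign}[\mathrm{kurt}(w^T z)]\left[ E\{z(w^T z)^3\} - 3w\|w\|^2 \right]. \qquad (18)$$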
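Derivations like Eqs. (16)-(18) can be sanity-checked numerically. The sketch below compares the analytic gradient of the expression inside |·| in Eq. (17), i.e. Eq. (18) without the sign factor, with central finite differences; the random data and the step size `eps` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
z = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(3, 2000))  # approximately white samples
w = rng.normal(size=3)

def f(w):
    """Expression inside |.| in Eq. (17), with E{.} replaced by a sample mean."""
    y = w @ z
    return np.mean(y**4) - 3.0 * (w @ w) ** 2

def grad_f(w):
    """Eq. (18) without the factor sign[kurt(w^T z)]."""
    y = w @ z
    return 4.0 * (np.mean(z * y**3, axis=1) - 3.0 * w * (w @ w))

eps = 1e-6
numeric = np.array([(f(w + eps * e) - f(w - eps * e)) / (2.0 * eps) for e in np.eye(3)])
print(np.allclose(numeric, grad_f(w), rtol=1e-5, atol=1e-6))  # True
```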
Fast ICA algorithm based on Kurtosis (cont’d)
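Applying the gradient method (gradient ascent) gives the following algorithm.
Gradient algorithm based on Kurtosis
$$w \leftarrow w + \Delta w, \qquad (19)$$
$$w \leftarrow \frac{w}{\|w\|}, \qquad (20)$$
$$\Delta w \propto \mathrm{sign}[\mathrm{kurt}(w^T z)]\left[E\{z(w^T z)^3\} - 3w\right]. \qquad (21)$$
Here $\|w\| = 1$ after the normalization (20), so the factor $\|w\|^2$ of (18) is dropped.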
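A minimal one-unit sketch of the gradient algorithm (19)-(21) follows. The step size `mu`, the iteration count and the random initialization are assumptions (the slide only states Δw up to proportionality). To keep it short, the test data are unit-variance uniform sources rotated by an orthogonal matrix, which is already approximately white; real data would be whitened first.

```python
import numpy as np

def kurtosis_gradient_ica(z, mu=0.1, n_iter=500, seed=0):
    """One-unit gradient ascent on |kurt(w^T z)| for white z (shape (n, d))."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z
        k = np.mean(y**4) - 3.0                                   # Eq. (13): ||w|| = 1, z white
        dw = np.sign(k) * (np.mean(z * y**3, axis=1) - 3.0 * w)   # Eq. (21)
        w = w + mu * dw                                           # Eq. (19)
        w /= np.linalg.norm(w)                                    # Eq. (20)
    return w

# Usage: two unit-variance uniform sources, rotated by an illustrative angle.
rng = np.random.default_rng(8)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 10_000))
theta = 0.6
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
z = R @ s
y = kurtosis_gradient_ica(z) @ z
print([abs(np.corrcoef(y, si)[0, 1]) for si in s])  # one value close to 1
```

The estimated component matches one source only up to sign and scale, which is the usual ICA ambiguity.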
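The algorithm converges when $w \propto \Delta w$, i.e., when the update no longer changes the direction of $w$. Since $w$ and $-w$ are equivalent solutions, the sign factor can be dropped and $w$ can be set directly to the (scaled) gradient, which gives another algorithm:
Fast ICA algorithm based on Kurtosis
$$w \leftarrow E\{z(w^T z)^3\} - 3w, \qquad (22)$$
$$w \leftarrow \frac{w}{\|w\|}. \qquad (23)$$
This algorithm is well known for its fast convergence in ICA; its convergence is cubic [Hyvarinen and Oja, 1997].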
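The fixed-point iteration (22)-(23) is even shorter. In this sketch, convergence is tested up to sign, since w and −w are equivalent; the tolerance, the iteration cap and the test data (rotated unit-variance uniform sources, approximately white) are illustrative assumptions.

```python
import numpy as np

def fastica_kurtosis(z, n_iter=100, tol=1e-10, seed=0):
    """One-unit fixed-point ICA based on kurtosis, Eqs. (22)-(23), for white z."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z
        w_new = np.mean(z * y**3, axis=1) - 3.0 * w   # Eq. (22)
        w_new /= np.linalg.norm(w_new)                # Eq. (23)
        converged = abs(abs(w_new @ w) - 1.0) < tol   # same direction up to sign
        w = w_new
        if converged:
            break
    return w

rng = np.random.default_rng(10)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 10_000))
theta = 0.6
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
z = R @ s
y = fastica_kurtosis(z) @ z
print([abs(np.corrcoef(y, si)[0, 1]) for si in s])  # one value close to 1
```

Unlike the gradient version, there is no step size to tune, which is one reason the fixed-point form is popular.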
Example
Figure: Example of ICA, (a) subgaussian, (b) supergaussian
Issue of Kurtosis
Kurtosis has a serious weakness: it is very sensitive to outliers, because it is a fourth-order statistic. The following figure shows the result of kurtosis-based ICA with outliers; the outlier rate is only 2 %.
Figure: With outliers (20 : 1000)
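The outlier problem can be reproduced in a few lines. In this sketch, 20 of 1000 samples (2 %, the same ratio as above) of a sub-Gaussian signal are replaced by hypothetical outliers at ±10; the kurtosis estimate then flips from about −1.2 to a large positive value, so the criterion is dominated by the outliers.

```python
import numpy as np

def kurt(y):
    """Sample kurtosis of y after standardization (Eq. (13)): E[y^4] - 3."""
    y = (y - y.mean()) / y.std()
    return np.mean(y**4) - 3.0

rng = np.random.default_rng(7)
clean = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), 1000)  # sub-Gaussian, kurtosis near -1.2
dirty = clean.copy()
dirty[:10] = 10.0      # hypothetical outliers: 20 of 1000 samples (2 %)
dirty[10:20] = -10.0
print(round(kurt(clean), 2), round(kurt(dirty), 2))
```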
Neg-entropy based ICA
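Because kurtosis is very sensitive to outliers, neg-entropy is often used for ICA instead. Strictly speaking, an approximation of neg-entropy is used, because it is robust to outliers. Neg-entropy is defined by
$$J(y) = H(y_{\mathrm{Gauss}}) - H(y), \qquad (24)$$
where
$$H(y) = -\int p_y(\eta) \log p_y(\eta)\, d\eta \qquad (25)$$
is the differential entropy, and $y_{\mathrm{Gauss}}$ is a Gaussian random variable with mean $\mu = E(y)$ and standard deviation $\sigma = \sqrt{E((y - \mu)^2)}$.
If $y$ follows a Gaussian distribution, then $J(y) = 0$; otherwise $J(y) > 0$, because the Gaussian has the largest entropy among all distributions with the same variance.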
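For intuition about Eqs. (24)-(25), the neg-entropy of two standard unit-variance distributions has a closed form, using the known differential entropies ½ log(2πeσ²) (Gaussian), log(2a) (uniform on [−a, a]) and 1 + log(2b) (Laplace with scale b). This small sketch evaluates them; both values are positive, as the theory requires.

```python
import math

def gaussian_entropy(var):
    """Differential entropy of a Gaussian with variance var (in nats)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

# Uniform on [-a, a]: variance a^2 / 3, entropy log(2a). Choose unit variance.
a = math.sqrt(3.0)
J_uniform = gaussian_entropy(a**2 / 3.0) - math.log(2.0 * a)

# Laplace with scale b: variance 2 b^2, entropy 1 + log(2b). Choose unit variance.
b = 1.0 / math.sqrt(2.0)
J_laplace = gaussian_entropy(2.0 * b**2) - (1.0 + math.log(2.0 * b))

print(round(J_uniform, 4), round(J_laplace, 4))  # 0.1765 0.0724
```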
Fast ICA algorithm based on Neg-entropy
The approximation of neg-entropy is complex to derive, so it is omitted here. We only introduce the fast ICA algorithm based on neg-entropy.
Fast ICA algorithm based on Neg-entropy
$$w \leftarrow E[z\,g(w^T z)] - E[g'(w^T z)]\,w, \qquad (26)$$
$$w \leftarrow \frac{w}{\|w\|}, \qquad (27)$$
where the functions $g$ and $g'$ can be selected from
1. $g_1(y) = \tanh(a_1 y)$ and $g_1'(y) = a_1\left(1 - \tanh^2(a_1 y)\right)$, with $1 \le a_1 \le 2$,
2. $g_2(y) = y \exp(-y^2/2)$ and $g_2'(y) = (1 - y^2)\exp(-y^2/2)$,
3. $g_3(y) = y^3$ and $g_3'(y) = 3y^2$.
Note that $(g_3, g_3')$ is equivalent to kurtosis-based ICA: for white $z$ and $\|w\| = 1$, $E[g_3'(w^T z)] = 3E[(w^T z)^2] = 3$, so (26) reduces to (22).
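A one-unit sketch of the neg-entropy fixed-point iteration (26)-(27) with g₁ follows. The choice a₁ = 1, the iteration cap, the tolerance and the test data (unit-variance Laplace, i.e. super-Gaussian, sources rotated by an orthogonal matrix, hence approximately white) are illustrative assumptions.

```python
import numpy as np

def fastica_negentropy(z, a1=1.0, n_iter=200, tol=1e-10, seed=0):
    """One-unit fixed-point ICA with g1(y) = tanh(a1 y), Eqs. (26)-(27), for white z."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z
        g = np.tanh(a1 * y)                   # g1
        g_prime = a1 * (1.0 - g**2)           # g1'
        w_new = np.mean(z * g, axis=1) - np.mean(g_prime) * w   # Eq. (26)
        w_new /= np.linalg.norm(w_new)        # Eq. (27)
        converged = abs(abs(w_new @ w) - 1.0) < tol   # same direction up to sign
        w = w_new
        if converged:
            break
    return w

rng = np.random.default_rng(11)
s = rng.laplace(scale=1.0 / np.sqrt(2.0), size=(2, 10_000))   # unit-variance Laplace
theta = 0.6
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
z = R @ s
y = fastica_negentropy(z) @ z
print([abs(np.corrcoef(y, si)[0, 1]) for si in s])  # one value close to 1
```

Because tanh grows only linearly for large arguments, a single extreme sample has far less influence on the update than it does through the cubic term of Eq. (22).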
Examples
We can see that neg-entropy based ICA is robust to outliers.
Figure: With outliers (20 : 1000), (a) Kurtosis based, (b) Neg-entropy based (using g1)
Experiments: Real Image 1
Figure: Original Signals, (a) newyork, (b) shanghai
Figure: Observed Signals, (a) ob 1, (b) ob 2
Figure: Estimated Signals, (a) estimated signal 1, (b) estimated signal 2
Experiments: Real Image 2
Figure: Original Signals, (a) buta, (b) kobe
Figure: Observed Signals, (a) ob 1, (b) ob 2
Figure: Estimated Signals, (a) estimated signal 1, (b) estimated signal 2
Experiments: Real Image 2 (using filtering)
Figure: Original Signals, (a) buta, (b) kobe
Figure: Observed Signals, (a) ob 1, (b) ob 2
Figure: Estimated Signals, (a) estimated signal 1, (b) estimated signal 2
Experiments: Real Image 3 (using filtering)
Figure: Original and Observed Signals, (a) nyc, (b) sha, (c) rock, (d) pig, (e) obs1, (f) obs2, (g) obs3, (h) obs4
Figure: Estimated Signals, (a) estimated signal 1, (b) estimated signal 2, (c) estimated signal 3, (d) estimated signal 4
Approaches of ICA
Many methods for ICA have been studied and proposed in this research area:
1. Criteria of ICA [Hyvarinen et al., 2001]
   - Non-Gaussianity based ICA*
     - Kurtosis based ICA*
     - Neg-entropy based ICA*
   - MLE based ICA
   - Mutual information based ICA
   - Non-linear ICA
   - Tensor ICA
2. Solving algorithms for ICA
   - Gradient method*
   - Fast fixed-point algorithm* [Hyvarinen and Oja, 1997]
(Items marked '*' were introduced today.)
Summary
I introduced the BSS problem and basic ICA techniques (kurtosis and neg-entropy).
Kurtosis is sensitive to outliers.
Neg-entropy (in practice, its approximation) is used as a robust measure of non-Gaussianity.
I conducted ICA experiments on image data.
In some cases, the results were poor.
This issue was solved by applying a differential filter to the observed images before ICA.
This technique was proposed in [Hyvarinen, 1998].
The experiments showed that the differential filter is very effective for ICA.
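The differential-filter trick relies on one property: a linear filter applied separately to each signal commutes with the mixing, so filter(AX) = A·filter(X), and an unmixing matrix estimated from the filtered observations also unmixes the raw ones. The sketch below checks this property with a simple first-difference filter on flattened images; the filter choice, the random stand-in images and the mixing matrix are assumptions, not the exact setup of the experiments.

```python
import numpy as np

def diff_filter(X):
    """Hypothetical differential filter: first difference along the sample axis.

    X has shape (m, d), one flattened image per row.
    """
    return np.diff(X, axis=1)

rng = np.random.default_rng(9)
h, w_px = 32, 32
# Two stand-in "images" (random here), each flattened into one row of X (m = 2, d = h*w).
images = rng.random(size=(2, h, w_px))
X = images.reshape(2, -1)
A = np.array([[1.0, 0.6], [0.4, 1.0]])    # illustrative mixing matrix
Y = A @ X                                  # observed images, Eq. (2) without noise

# Filtering acts linearly on each row, so it commutes with the mixing:
# diff_filter(A X) == A diff_filter(X). A matrix B estimated by ICA on
# diff_filter(Y) can therefore be applied to the unfiltered Y as B @ Y.
print(np.allclose(diff_filter(Y), A @ diff_filter(X)))   # True
```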
Bibliography I
[Hyvarinen, 1998] Hyvarinen, A. (1998). Independent component analysis for time-dependent stochastic processes.
[Hyvarinen et al., 2001] Hyvarinen, A., Karhunen, J., and Oja, E. (2001). Independent Component Analysis. Wiley.
[Hyvarinen and Oja, 1997] Hyvarinen, A. and Oja, E. (1997). A fast fixed-point algorithm for independent component analysis. Neural Computation, 9:1483–1492.
Thank you for listening