Separation theorem for independent subspace analysis and its consequences


Pattern Recognition 45 (2012) 1782–1791



Zoltán Szabó a,*, Barnabás Póczos b, András Lőrincz a

a Faculty of Informatics, Eötvös Loránd University, Pázmány Péter sétány 1/C, H-1117 Budapest, Hungary
b Carnegie Mellon University, Robotics Institute, 5000 Forbes Ave, Pittsburgh, PA 15213, United States

Article info

Article history:

Received 7 December 2010

Received in revised form 26 July 2011

Accepted 15 September 2011

Available online 22 September 2011

Keywords:

Separation principles

Independent subspace analysis

Linear systems

Controlled models

Post nonlinear systems

Complex valued models

Partially observed systems

Nonparametric source dynamics


Abstract

Independent component analysis (ICA) – the theory of mixed, independent, non-Gaussian sources – has a central role in signal processing, computer vision and pattern recognition. One of the most fundamental conjectures of this research field is that independent subspace analysis (ISA) – the extension of the ICA problem, where groups of sources are independent – can be solved by traditional ICA followed by grouping the ICA components. The conjecture, called the ISA separation principle, (i) has recently been rigorously proven for some distribution types, (ii) forms the basis of the state-of-the-art ISA solvers, (iii) enables one to estimate the unknown number and the dimensions of the sources efficiently, and (iv) can be extended to generalizations of the ISA task, such as different linear-, controlled-, post nonlinear-, complex valued-, partially observed problems, as well as to problems dealing with nonparametric source dynamics. Here, we review the advances in this field.


1. Introduction

Independent component analysis (ICA) [1–3] has received considerable attention in signal processing, computer vision and pattern recognition, e.g., in face representation and recognition [4,5], information theoretical image matching [6], fMRI analysis [7], feature extraction of natural images [8], texture segmentation [9], artifact separation in MEG recordings, and the exploration of hidden factors in financial data [10]. One may consider ICA as a cocktail party problem: we have some speakers (sources) and some microphones (sensors), which measure the mixed signals emitted by the sources. The task is to recover the original sources from the mixed observations. For a recent review about ICA, see [11–13].

Traditional ICA algorithms are one-dimensional in the sense that all sources are assumed to be independent real valued random variables. Nonetheless, applications in which only certain groups of the hidden sources are independent may be highly relevant in practice, because one cannot expect that all source components are statistically independent. In this case, the independent sources can be multidimensional. For instance, consider the generalization of the cocktail-party problem where independent groups of musicians are playing at the party. The separation task requires an extension of ICA, which is called multidimensional ICA [14], independent subspace analysis (ISA) [15], independent feature subspace analysis [16], subspace ICA [17], or group ICA [18] in the literature. We will use the ISA abbreviation throughout this paper. The several successful applications and the large number of different ISA algorithms – the authors are aware of more than 30 "different" ISA approaches (in terms of the applied cost function and the optimization technique) – show the importance of this field. Below, we list a few successful applications of ISA in signal processing, computer vision, and pattern recognition.

ECG analysis: An important task of ECG signal processing is to estimate fetal ECG signals from ECG recordings measured on the mother's skin (cutaneous recordings) [19,14,20,18,17]. The cardiac waveform of the fetal ECG can provide useful information for detecting certain diseases. Potential measurements on the mother's skin are the results of numerous bioelectric phenomena, such as maternal and fetal heart activity, respiration, stomach activity, and some other noise terms. The electric activity of the fetal and maternal hearts can be considered as independent multidimensional sources, and the ECG measurements on the mother's skin are the mixture of these bioelectric and noise signals, where the transfer from the bioelectric sources to the electrodes on the body surface can be approximated by an unknown linear mixing. Our goal is to estimate this unknown linear mixing and the fetal heart activity.

fMRI and MEG data processing: In fMRI data processing, our goal is to detect and extract task-related signal components, artifacts, and noise terms from voxel activities. The main principles that allow us to perform these tasks are localization and connectionism, which state that different brain parts are responsible for different cognitive tasks and these areas are spatially distributed. According to these principles, one may assume that the time series (a.k.a. time courses) of the voxels are linear mixtures of independent components [7], and it is of great importance to recover these independent signals (component maps) from the voxels. The component maps show the brain areas related to the independent components. Recently, [21] has shown that the assumption that all of these components are independent might be too strong in practice, and hence the application of ISA instead of ICA can give physiologically more meaningful results in certain cases.

Similarly, one might assume that there are hidden independent sources belonging to MEG measurements. Ref. [22] has also shown that the full independence assumption might be too restrictive in this case as well, and better results can be achieved if we also allow dependent sources by using ISA instead of ICA.

Natural image analysis, texture classification: It has been demonstrated several times that ICA on natural images leads to image filters that resemble the simple cells in the V1 visual cortical area of the brain: they are localized, oriented, and selective only to certain frequencies (bandpass filters) [23]. If we use ISA instead of ICA on natural images, i.e., if we allow dependencies between some of the components, then ISA provides independent subspaces that show phase- and shift-invariant properties [15,24,25]. ISA is naturally able to group similar components (with respect to frequency and orientation) into the same subspace. By exploiting this invariance property and selecting only one element from each subspace, Ref. [26] showed that one can obtain results similar to other state-of-the-art methods in the texture classification task using a much smaller filter bank.

Action recognition: Another successful ISA application is action recognition in movies [27]. Here the key idea was the observation that ISA on spatiotemporal datasets provides subspaces that contain velocity-selective invariant features. This ISA based approach outperformed several state-of-the-art methods using hand-crafted features (Harris3D, Cuboids, Hessian, HOG/HOF, HOG3D, extended SURF, etc.).

Learning of face view-subspaces: Multi-view face detection and recognition are very challenging problems. ICA, ISA and topographic ICA [28] – a generalization of ISA – can be used to learn view-specific feature representations [29]. In turn, ICA on multi-view face data sets leads to view-specific feature components, and ISA can organize these feature vectors into facial view-specific groups (subspaces). In addition, topographic ICA is able to arrange these subspaces in a topographically consistent way as well.

Single-channel source separation: An important problem in audio scene analysis is single-channel source separation. In this setting there are several independent sound sources and a microphone records the mixture of these sounds. The goal is to estimate the original signals from the microphone recording. Ref. [30] proposed an approach in which ISA is applied to the Fourier-transformed windowed observations (spectrogram). Using this approach on a Beethoven string quartet, they found that the method was able to separate the independent instruments into different subspaces.

Motion segmentation: Multibody motion segmentation is an important problem in computer vision. By observing the trajectories of certain points of a few objects, our goal is to decide which points belong to which objects. Assuming a linear camera model, multibody motion segmentation reduces to an ISA problem, where each subspace belongs to a single object [31].

Gene expression analysis: Gene clustering is a valuable tool for describing the characteristic patterns of cells and understanding the properties of unknown genes. Linear latent variable models seem to outperform standard clustering approaches in this area. In this framework it is assumed that the gene expression profiles are the results of several biological processes, where each process affects only a few genes. Since there are independent as well as highly interconnected processes, Ref. [32] proposed to use ISA for processing gene expression data. Their approach led to biologically valuable and gene-ontologically interpretable results.

One of the most exciting and fundamental hypotheses of ICA research is due to Cardoso [14], who conjectured that the ISA task can be solved by ICA preprocessing and then clustering the ICA elements into statistically independent groups. While the extent of this conjecture, the ISA separation principle, is still an open issue, it has recently been rigorously proven for some distribution types [33], and for this reason we call it the ISA Separation Theorem. This principle (i) forms the basis of many state-of-the-art ISA algorithms, (ii) can be used to design algorithms that scale well and efficiently estimate the dimensions of the hidden sources, and (iii) can be extended to different linear-, controlled-, post nonlinear-, complex valued-, partially observed systems, as well as to systems with nonparametric source dynamics. Here, we review such consequences of the theorem.

Beyond ISA, there exist numerous other exciting directions that relax the traditional assumptions of ICA (one-dimensional sources, sources i.i.d. in time, instantaneous mixture, complete observation). Below we list a few of these directions. We will see in the subsequent sections that the ISA separation technique can be extended to these models.

Linear systems: One may relax the ICA assumptions by assuming sources that have linear dynamics (e.g., autoregressive ones [34]); echoes (moving average dynamics) may also be present, leading to the blind source deconvolution (BSD) problem [35].

Post nonlinear models: The linear mixing restriction of ICA can be relaxed by assuming that an unknown component-wise nonlinear function is superimposed on the linear mixture. This ICA generalization also has many successful applications, e.g., in sensor array processing, data processing in biological systems, and satellite communications. For an excellent review, see [36].

Complex valued sources and complex mixing: In the complex ICA problem, both the sources and the mixing are realized in the complex domain. Complex values naturally emerge in fMRI data processing, where, in addition to the magnitude, the phase information can also be important. Complex-valued computations have been present from the "birth" of ICA [2,3] and show nice potential in the analysis of biomedical signals (EEG, fMRI), see e.g., [37–39].

Incomplete observations: In this setting certain parts (coordinates/time instants) of the mixture are not available for observation [40,41].

Nonparametric dynamics: The general case of sources with unknown, nonparametric dynamics is quite challenging, and very few works have focused on this direction [18,42].

The paper is structured as follows: we define the ISA model in Section 2. We discuss the ISA Separation Theorem and its known sufficient conditions in Section 3. Section 4 is about the extensions of the ISA separation principle. Corollaries of the separation principles are summarized in Section 5. Numerical illustrations of these corollaries are presented in Section 6. Conclusions are drawn in Section 7. For the sake of convenience, we list the abbreviations of the paper in the Appendix (Table A1).

2. The independent subspace analysis (ISA) model

Here we review the basics of the ISA model and the related ISA cost function (Section 2.1). We elaborate on the ambiguities of the ISA task in Section 2.2 and present an ISA performance measure that can be applied to general sources of different dimensions.

2.1. The ISA equations and cost function

We define the independent subspace analysis (ISA) model. Assume that we have an observation ($x \in \mathbb{R}^{D_x}$), which is an instantaneous linear mixture ($A$) of the hidden source ($e$), that is,

$$x_t = A e_t, \qquad (1)$$

where (i) the unknown mixing matrix $A \in \mathbb{R}^{D_x \times D_e}$ has full column rank, (ii) source $e_t = [e_t^1; \ldots; e_t^M] \in \mathbb{R}^{D_e}$ is the concatenation (using Matlab notation ';') of components $e_t^m \in \mathbb{R}^{d_m}$ ($D_e = \sum_{m=1}^M d_m$), subject to the following conditions:

1. $e_t$ is assumed to be i.i.d. (independent and identically distributed) in time $t$,

2. there is at most one Gaussian variable among the $e^m$s; this assumption will be referred to as the "non-Gaussian" assumption,

3. the $e^m$s are independent, that is, $I(e^1, \ldots, e^M) = 0$, where $I$ stands for mutual information [43]. The mutual information of $\{e^m\}_{m=1}^M$ is non-negative, and it is zero if and only if the random variables $\{e^m\}_{m=1}^M$ are (jointly) independent.

The goal of the ISA problem is to eliminate the effect of the mixing ($A$) with a suitable demixing matrix $W \in \mathbb{R}^{D_e \times D_x}$ and estimate the original source components $e^m$ by using observations $\{x_t\}_{t=1}^T$ only ($\hat{e} = Wx$). If all the $e^m$ source components are one-dimensional ($d_m = 1$, $\forall m$), then the ICA task is recovered. For $D_x > D_e$ the problem is called undercomplete, while the case $D_x = D_e$ is regarded as complete.¹

In ISA, it can be assumed without any loss of generality – applying zero mean normalization and principal component analysis [47] – that (i) $x$ and $e$ are white, i.e., their expected value is zero and their covariance matrix is the identity matrix ($I$), (ii) the mixing matrix $A$ is orthogonal, that is $A^T A = I$, where superscript $T$ stands for transposition, and (iii) the task is complete ($D = D_x = D_e$). In what follows, this assumption will be referred to as "whiteness".
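For concreteness, the whiteness preprocessing admits a very short implementation; the following numpy sketch (variable names are ours) performs zero-mean normalization and PCA whitening so that the transformed observation has identity covariance:

```python
import numpy as np

def whiten(x):
    """Zero-mean normalization and PCA whitening of a T x D observation array."""
    xc = x - x.mean(axis=0)                     # zero mean normalization
    cov = np.cov(xc, rowvar=False)              # D x D sample covariance
    eigval, eigvec = np.linalg.eigh(cov)
    # whitening matrix C^{-1/2}: after the transform the covariance is the identity
    w_mat = eigvec @ np.diag(1.0 / np.sqrt(eigval + 1e-12)) @ eigvec.T
    return xc @ w_mat.T, w_mat
```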

The estimation of the demixing matrix $W = A^{-1}$ is equivalent to the minimization of the mutual information between the estimated components, or equivalently to the minimization of the sum of the entropies of the estimated source components [48]:

$$J_I(W) := I(y^1, \ldots, y^M), \qquad J_H(W) := \sum_{m=1}^{M} H(y^m), \qquad (2)$$

where $y = Wx$, $y = [y^1; \ldots; y^M]$, $y^m \in \mathbb{R}^{d_m}$, and $H$ denotes Shannon's multidimensional differential entropy [43]. One can easily prove that, due to the whiteness assumption, the optimization of the cost functions can be restricted to the orthogonal group ($W \in \mathcal{O}^D$). In the rest of the paper we consider the $J_H$ ISA cost function. In the special case when every hidden source component $e^m$ is one-dimensional ($d_m = 1$, $\forall m$), the ICA problem/cost function is recovered [49]. Other equivalent entropy- and mutual-information-based forms of the ISA cost function are given in [33].
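The cost $J_H$ can be estimated from samples once a multidimensional entropy estimator is chosen. The sketch below uses the Kozachenko–Leonenko k-nearest-neighbor estimator, which is only one of several possible choices (not necessarily the estimator used in the cited works); `groups` lists the coordinate indices of each candidate subspace and is our own notation:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(y, k=3):
    """Kozachenko-Leonenko estimate of the differential entropy H(y); y: T x d samples."""
    t_len, d = y.shape
    dist, _ = cKDTree(y).query(y, k=k + 1)          # k+1: the nearest "neighbor" is the point itself
    eps = dist[:, -1]                                # distance to the k-th genuine neighbor
    log_vd = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)   # log volume of the unit d-ball
    return digamma(t_len) - digamma(k) + log_vd + d * np.mean(np.log(eps + 1e-12))

def isa_cost(y, groups, k=3):
    """J_H(W) of Eq. (2): sum of the estimated entropies of the subspaces y^m."""
    return sum(knn_entropy(y[:, idx], k) for idx in groups)
```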

2.2. The ISA ambiguities and an ISA performance measure

Below, we list the ISA ambiguities, which can be used to define a general performance measure for the ISA task.

¹ We shall not treat the overcomplete case ($D_x < D_e$), but some efforts have been devoted to this area [44], extending the topographic ICA model [28] to subspaces with the quasi-orthogonal prior construction of [46].

Identification of the ISA model is ambiguous. However, the ambiguities of the model are simple: hidden components can be determined up to permutation of the subspaces and up to invertible linear transformations² within the subspaces [50,51]. Therefore, in the ideal case, the product of the estimated ISA demixing matrix $\hat{W}_{ISA}$ and the ISA mixing matrix $A$, i.e., the matrix

$$G = \hat{W}_{ISA} A \qquad (3)$$

is a block-permutation matrix (also called a block-scaling matrix [18]). This property can be measured for source components of different dimensions by a simple extension of the Amari-index [52]. Namely, assume that we have a weight matrix $V \in \mathbb{R}^{M \times M}$ made of positive elements. Loosely speaking, we shrink the $d_i \times d_j$ sized blocks of matrix $G$ according to the weights of matrix $V$ and apply the traditional Amari-index to the resulting matrix. Formally, (i) assume without loss of generality that the component dimensions and their estimates are ordered increasingly ($d_1 \le \cdots \le d_M$, $\hat{d}_1 \le \cdots \le \hat{d}_M$), (ii) decompose $G$ into $d_i \times d_j$ sized blocks ($G = [G^{ij}]_{i,j=1,\ldots,M}$) and define $g^{ij}$ as the sum of the absolute values of the elements of the matrix $G^{ij} \in \mathbb{R}^{d_i \times d_j}$, weighted with $V_{ij}$:

$$g^{ij} = V_{ij} \sum_{k=1}^{d_i} \sum_{l=1}^{d_j} \left|(G^{ij})_{k,l}\right|. \qquad (4)$$

Then the Amari-index with parameters $V$ can be adapted to the ISA task of possibly different component dimensions as follows:

$$r_V(G) := \frac{1}{2M(M-1)} \left[ \sum_{i=1}^{M} \left( \frac{\sum_{j=1}^{M} g^{ij}}{\max_j g^{ij}} - 1 \right) + \sum_{j=1}^{M} \left( \frac{\sum_{i=1}^{M} g^{ij}}{\max_i g^{ij}} - 1 \right) \right]. \qquad (5)$$

One can see that $0 \le r_V(G) \le 1$ for any matrix $G$, and $r_V(G) = 0$ if and only if $G$ is a block-permutation matrix with $d_i \times d_j$ sized blocks. $r_V(G) = 1$ in the worst case, i.e., when all the $g^{ij}$ elements are equal. Note that measure (5) is invariant to multiplication by a positive constant: $r_{cV} = r_V$ ($\forall c > 0$). The weight matrix $V$ can be uniform ($V_{ij} = 1$) [53], or one can weight according to the size of the subspaces: $V_{ij} = 1/(d_i d_j)$.
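A direct implementation of the performance measure (4)–(5) is short; the sketch below assumes that the block dimensions of the true and estimated components coincide and are given in `dims`:

```python
import numpy as np

def amari_index(g, dims, v=None):
    """Amari-index r_V(G) of Eq. (5) for a square matrix G with blocks of sizes dims."""
    m = len(dims)
    if v is None:
        v = np.ones((m, m))                           # uniform weighting V_ij = 1
    edges = np.concatenate(([0], np.cumsum(dims)))    # block boundaries
    gij = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            block = g[edges[i]:edges[i + 1], edges[j]:edges[j + 1]]
            gij[i, j] = v[i, j] * np.abs(block).sum()           # Eq. (4)
    rows = (gij.sum(axis=1) / gij.max(axis=1) - 1.0).sum()
    cols = (gij.sum(axis=0) / gij.max(axis=0) - 1.0).sum()
    return (rows + cols) / (2.0 * m * (m - 1))
```

The size-based weighting $V_{ij} = 1/(d_i d_j)$ corresponds to passing `v = 1.0 / np.outer(dims, dims)`.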

3. The ISA Separation Theorem

This section is about the ISA Separation Theorem, which targets one of the most relevant open conjectures of ICA research, dating back to 1998 [14]. The conjecture is the cornerstone of many state-of-the-art ISA solvers and has a number of implications. We show in Section 4 how to extend the conjecture to more general models, such as non-i.i.d. linear-, controlled-, post nonlinear-, complex valued-, partially observed systems, as well as to problems with nonparametric source dynamics. Corollaries of these extensions are discussed in Section 5.

² The condition of invertible linear transformations simplifies to orthogonal transformations in the "white" case.

Fig. 2. Sufficient conditions for the ISA Separation Theorem. For details, see Theorem 1.

According to the ISA Separation Theorem, the solution of the ISA task, i.e., the global optimum of the ISA cost function, can be found by properly grouping the ICA elements; that is, "ISA = ICA followed by permutation search". In other words, one may think of the ISA problem (see $J_H$, Eq. (2)) as an ICA task with $d_m = 1$ ($\forall m$). In this case, cost function $J_H$ is the sum of entropies of one-dimensional variables, which – given their one-dimensional nature – can be estimated efficiently [49]. Then, if the ISA Separation Theorem holds, it is sufficient to permute the ICA elements (i.e., cluster them into statistically independent groups) to find the global solution of the ISA problem. One can provide sufficient conditions for the ISA Separation Theorem by using the cost function $J_H$ of the ISA task:

Theorem 1 (ISA Separation Theorem, Szabó et al. [33]). Let $y = [y_1; \ldots; y_D] = Wx \in \mathbb{R}^D$, where $W \in \mathcal{O}^D$, $x \in \mathbb{R}^D$ is the whitened observation of the ISA model, and $D = \sum_{m=1}^M d_m$. Let $S^{d_m}_{\mathbb{R}}$ denote the surface of the $d_m$-dimensional unit sphere, that is, $S^{d_m}_{\mathbb{R}} := \{w \in \mathbb{R}^{d_m} : \sum_{i=1}^{d_m} w_i^2 = 1\}$. Presume that the sources $v := e^m \in \mathbb{R}^{d_m}$ ($m = 1, \ldots, M$) of the ISA model satisfy the condition

$$H\left(\sum_{i=1}^{d_m} w_i v_i\right) \ge \sum_{i=1}^{d_m} w_i^2 H(v_i), \qquad \forall w \in S^{d_m}_{\mathbb{R}}, \qquad (6)$$

and that the ICA cost function $J_{ICA}(W) = \sum_{i=1}^{D} H(y_i)$ has a minimum over the orthogonal matrices at $W_{ICA}$. Then it is sufficient to search for the solution of the ISA task as a permutation of the solution of the ICA task. Using the concept of demixing matrices, it is sufficient to explore forms

$$W_{ISA} = P W_{ICA}, \qquad (7)$$

where $P \in \mathbb{R}^{D \times D}$ is a permutation matrix to be determined and $W_{ISA}$ is the ISA demixing matrix.

The general question of whether a certain source satisfies the ISA Separation Theorem is now partially answered, since Eq. (6) provides a sufficient condition. Eq. (6) holds, e.g., for variables ($v := e^m$) satisfying the so-called w-EPI condition

$$e^{2H\left(\sum_{i=1}^{d_m} w_i v_i\right)} \ge \sum_{i=1}^{d_m} e^{2H(w_i v_i)}, \qquad \forall w \in S^{d_m}_{\mathbb{R}}, \qquad (8)$$

where EPI is a three letter acronym for the entropy power inequality [43].

The w-EPI condition is fulfilled, e.g., by spherical variables [54], whose distributions are invariant to orthogonal transformations. One can show that in the two-dimensional case ($d_m = 2$), invariance to 90° rotation, a condition weaker than invariance to spherical transformations, is sufficient. A special case of this requirement is invariance to permutation and sign changes, which also includes distributions having constant density over the spheres of the $L^p$-space, the so-called $L^p$-spherical variables [55]. The case $p = 2$ corresponds to spherical variables. For an illustration of distributions with 90° rotation and sign change invariance, see Fig. 1.

Takano has shown [56] that the w-EPI condition is satisfied by certain weakly dependent variables, subject to the dimensionality constraint $d_m = 2$.

These sufficient conditions of the ISA Separation Theorem, i.e., conditions ensuring that the global optimum can be found by ICA followed by clustering of the ICA elements, are summarized schematically in Fig. 2.

It is intriguing that if (6) is satisfied, then this simple separation principle provides the global minimum of the ISA cost function. Concerning joint block diagonalization (JBD), Abed-Meraim and Belouchrani [57] have recently put forth a similar conjecture: the JBD of a finite matrix set can be obtained by the joint diagonalization of the set up to permutation. JBD based ISA solvers [18,58,51,59] make efficient use of this conjecture in practice. We also note that this principle was justified for local minimum points in [51].

Fig. 1. Illustration: density functions (for variables $e^m$) invariant to 90° rotation or permutation and sign changes. (a) and (c): the density function $f$ takes identical values at the arrowheads; matrices $R$ and $M$ are the 90° counter-clockwise rotation and the reflection to axis $x$, respectively. (b) and (d): density functions (illustrated as 2D images) for (a) and (c), respectively.

4. Extensions of the ISA separation principle

Below we review the extensions of the ISA separation principle. The principle is extended to different linear (Section 4.1), post nonlinear (Section 4.2), complex valued (Section 4.3), controlled, and partially observed models, as well as to nonparametric source dynamics (Section 4.4). These different methods can also be used in combination. It is important to note that the separation principle is valid for all of these models, and thus (i) the dimensions of the source components ($d_m$) can be different and unknown in all of these models (see also the clustering algorithms in Section 5), and (ii) the Amari-index detailed in Section 2.2 can be applied as a performance measure to all of them. Traditionally, the ISA problem considers the instantaneous linear mixture of independent and identically distributed (i.i.d.) hidden sources (see Section 2.1). These constraints will be alleviated below. The corresponding general problem family will be referred to as independent process analysis (IPA). The relationships of the different generalizations and separation principles are illustrated in Fig. 4(a) and (b).

4.1. Linear systems

In this section we focus on linear models: Section 4.1.1 is about autoregressive models, and Section 4.1.2 treats moving average (convolutive) models.

4.1.1. The AR-IPA model

In the AR-IPA (autoregressive IPA) task [60], the traditional i.i.d. assumption on the sources is generalized to AR time series: the hidden sources ($s^m \in \mathbb{R}^{d_m}$) are not necessarily independent, only their driving noises ($e^m \in \mathbb{R}^{d_m}$) are. The observation ($x \in \mathbb{R}^D$, $D = \sum_{m=1}^M d_m$) is an instantaneous linear mixture ($A$) of the source $s$:

$$x_t = A s_t, \qquad s_t = \sum_{i=1}^{L_s} F_i s_{t-i} + e_t, \qquad (9)$$

where $L_s$ is the order of the AR process, and $s_t = [s_t^1; \ldots; s_t^M]$ and $e_t = [e_t^1; \ldots; e_t^M] \in \mathbb{R}^D$ denote the hidden sources and the hidden driving noises, respectively. Eq. (9) can be rewritten in the concise form $x = As$, $F[z]s = e$, using the polynomial of the time-shift operator $F[z] := I - \sum_{i=1}^{L_s} F_i z^i \in \mathbb{R}[z]^{D \times D}$ [61]. We assume that (i) the polynomial matrix $F[z]$ is stable, that is, $\det(F[z]) \ne 0$ for all $z \in \mathbb{C}$, $|z| \le 1$, (ii) the mixing matrix $A \in \mathbb{R}^{D \times D}$ is invertible, and (iii) $e$ satisfies the ISA assumptions (see Section 2.1). The aim of the AR-IPA task is to estimate the hidden sources $s^m$, the dynamics $F[z]$, the driving noises $e^m$, and the mixing matrix $A$ or its inverse $W$, given observations $\{x_t\}_{t=1}^T$.

For the special case $L_s = 0$, the ISA task is obtained.

Making use of the basis transformation rule of AR processes, it can be shown that the observation process $x$ is also AR,

$$x_t = \sum_{i=1}^{L_s} (A F_i A^{-1}) x_{t-i} + n_t, \qquad (10)$$

with innovation $n_t = A e_t$, whose marginals are approximately Gaussian according to the d-dependent central limit theorem [62]. Using this form and the fact that the sources $e^m$ are independent according to our assumptions, the AR-IPA estimation can be carried out by (i) applying an AR fit to the observation $x$, and (ii) performing ISA on $\hat{n}_t$, the estimated innovation of $x$. AR identification can be performed, e.g., by the methods detailed in [63,64]. The pseudocode of this AR-IPA solution can be found in Table A2. The presented approach extends [34] to multidimensional ($d_m \ge 1$) sources. We note that in the one-dimensional case ($d_m = 1$), simple temporal differencing might be sufficient for the reduction step [65].
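The two-step AR-IPA estimation summarized in Table A2 can be sketched as follows. The AR fit is an ordinary least-squares VAR estimate, FastICA from scikit-learn stands in for any ICA routine, and the final step, clustering the ICA coordinates into subspaces (Section 5), is left open; none of these choices is claimed to be the one used in the cited implementations:

```python
import numpy as np
from sklearn.decomposition import FastICA

def var_innovation(x, order):
    """Least-squares VAR(order) fit; returns the estimated innovation n_t of Eq. (10)."""
    t_len, d = x.shape
    # regressors: concatenated lagged observations [x_{t-1}, ..., x_{t-order}]
    lagged = np.hstack([x[order - i - 1:t_len - i - 1] for i in range(order)])
    target = x[order:]
    coef, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    return target - lagged @ coef                    # estimated innovation of x

def ar_ipa_step(x, order, n_components):
    """AR fit followed by ICA on the innovation; clustering into subspaces follows (Section 5)."""
    innov = var_innovation(x, order)                 # step (i): AR estimation, innovation
    ica = FastICA(n_components=n_components, max_iter=1000)
    return ica.fit_transform(innov), ica             # step (ii): ICA elements of the innovation
```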

4.1.2. The MA-IPA model and its extensions

In this section, the instantaneous linear mixture assumption of the ISA model is weakened to convolutions. This problem is called moving average independent process analysis (MA-IPA), also known as blind subspace deconvolution [33]. We describe this task for the undercomplete case. Assume that the convolutive mixture of the hidden sources $e^m \in \mathbb{R}^{d_m}$ is available for observation ($x \in \mathbb{R}^{D_x}$):

$$x_t = \sum_{l=0}^{L_e} H_l e_{t-l}, \qquad (11)$$

where (i) $D_x > D_e$ (undercomplete, $D_e = \sum_{m=1}^M d_m$), (ii) the polynomial matrix $H[z] = \sum_{l=0}^{L_e} H_l z^l \in \mathbb{R}[z]^{D_x \times D_e}$ has a (polynomial matrix) left inverse,³ and (iii) the source $e = [e^1; \ldots; e^M] \in \mathbb{R}^{D_e}$ satisfies the conditions of ISA. The goal of this undercomplete MA-IPA problem (uMA-IPA problem, where "u" stands for undercomplete) is to estimate the original sources $e^m$ by using observations $\{x_t\}_{t=1}^T$ only. The case $L_e = 0$ corresponds to the ISA task; in the blind source deconvolution problem [35], $d_m = 1$ ($\forall m$) and $L_e$ is a non-negative integer. We note that in the ISA task the full column rank of matrix $H_0$ was presumed, which is equivalent to the assumption that $H_0$ has a left inverse. This left-inverse assumption is extended in the uMA-IPA model to the polynomial matrix $H[z]$.

The separation principle below claims that, by applying temporal concatenation (TCC) on the observation, one can reduce the uMA-IPA estimation problem to ISA.

³ One can show for $D_x > D_e$ that under mild conditions $H[z]$ has a left inverse with probability 1 [66]; e.g., when the matrix $[H_0, \ldots, H_{L_e}]$ is drawn from a continuous distribution.

Theorem 2 (uMA-IPA via TCC, Szabó et al. [33]). Let $L_0$ be such that $D_x L_0 \ge D_e(L_e + L_0)$ is fulfilled. Then, upon applying temporal concatenation of depth $L_e + L_0$ and $L_0$ on the sources and the observations, respectively, we end up with an $X_t = \mathcal{A} E_t$ ISA task with an ($H[z]$-dependent) Toeplitz matrix $\mathcal{A} \in \mathbb{R}^{D_x L_0 \times D_e(L_e + L_0)}$.
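The temporal concatenation of Theorem 2 is a purely mechanical rearrangement of the data; a minimal sketch (our own variable names) that stacks consecutive observations so that a standard ISA solver can be applied to the result:

```python
import numpy as np

def temporal_concatenation(x, depth):
    """Stack `depth` consecutive observations: X_t = [x_t; x_{t+1}; ...; x_{t+depth-1}].
    x: T x D_x array; returns a (T - depth + 1) x (D_x * depth) array."""
    t_len = x.shape[0]
    return np.hstack([x[i:t_len - depth + 1 + i] for i in range(depth)])
```

For the observations, `depth` plays the role of $L_0$, which is chosen so that $D_x L_0 \ge D_e(L_e + L_0)$ holds.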

Choosing the minimal value for $L_0$, the dimension of the obtained ISA task is $D_{\min} = D_e(L_e + L_0) = D_e(L_e + \lceil D_e L_e/(D_x - D_e)\rceil)$. Unfortunately, $D_{\min}$ can easily become too large. This dimensionality problem can be alleviated by the linear prediction approximation (LPA) approach, which is formulated in the following theorem.

Theorem 3 (uMA-IPA via LPA, Szabó et al. [67]). In the uMA-IPA task, the observation process $x_t$ is autoregressive (of finite but unknown order), and its innovation $\tilde{x}_t := x_t - E[x_t \mid x_{t-1}, x_{t-2}, \ldots]$ is $H_0 e_t$, where $E[\cdot\mid\cdot]$ denotes the conditional expectation. Consequently, there is a polynomial matrix $W^{LPA}_{AR}[z] \in \mathbb{R}[z]^{D_x \times D_x}$ such that $W^{LPA}_{AR}[z] x = H_0 e$, and thus the solution becomes an AR-IPA task.

In the undercomplete case, one can extend the LPA approach to the solution of the more general ARIMA-IPA (integrated autoregressive moving average IPA) [68] as well as to the complete MA-IPA [69] problem in an asymptotically consistent way. The ARIMA-IPA problem allows both AR and MA terms in the evolution of the hidden sources $s_t$, and it also defines a non-stationary process by means of an $r$th order temporal difference.

In the one-dimensional ($d_m = 1$) case, it has been shown that the uMA-IPA problem can be solved by means of spatio-temporal decorrelation [70] and by the TCC technique [71]. Furthermore, the uMA-IPA and uARMA-IPA problems can be reduced to ICA by LPA [72,73].

4.2. Post nonlinear models

Below, the linear mixing assumption of the ISA model is alleviated by presenting the post nonlinear ISA (PNL-ISA) problem [74]. Assume that the observations ($x \in \mathbb{R}^D$) are post nonlinear mixtures ($f(A\,\cdot)$) of multidimensional independent sources ($e \in \mathbb{R}^D$):

$$x_t = f(A e_t), \qquad (12)$$

where (i) the unknown function $f: \mathbb{R}^D \to \mathbb{R}^D$ is a component-wise transformation, i.e., $f(v) = [f_1(v_1); \ldots; f_D(v_D)]$, and $f$ is invertible, and (ii) the mixing matrix $A \in \mathbb{R}^{D \times D}$ and the hidden source $e$ satisfy the ISA assumptions of Section 2.1. The PNL-ISA problem is to estimate the hidden source components $e^m$ knowing only the observations $\{x_t\}_{t=1}^T$. For $d_m = 1$, we get back the PNL-ICA problem [75] (for a review see [36]), whereas $f = \text{identity}$ leads to the ISA task.

Under certain technical conditions, one can carry out the estimation of the hidden source $e$ on the basis of the mirror structure of the PNL-ISA system (12). Formally, we use the equation $\hat{e} = [\hat{e}^1; \ldots; \hat{e}^M] = W g(x) = W g(f(Ae))$, where we need to estimate $W$ and $g$. In the ideal case, the component-wise acting transformation $g$ and the matrix $W$ invert the function $f$ and the matrix $A$, respectively. Independence of the estimated $\hat{e}^m$s implies the recovery of the sources $e^m$ up to permutation and invertible affine transformations within each subspace. According to the d-dependent central limit theorem [62], the marginals of $Ae$ can be considered approximately Gaussian. Therefore, the nonlinearity $g$ can be optimized to make the distribution of the observation $x$ as similar to a Gaussian distribution as possible. This is called "gaussianization" [76,77]. In these works the one-dimensional ($d_m = 1$) PNL-ICA special case was treated; however, the ideas can be generalized to PNL-ISA in the following way [74]: after gaussianization, the next step is to estimate $W$ by means of linear ISA. This second step transforms the result of the gaussianization transformation [$g(x)$] in the opposite direction, i.e., toward the most non-Gaussian directions.
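The gaussianization step admits a simple rank-based realization: each coordinate is pushed through its empirical CDF and then through the inverse Gaussian CDF. This is one common way to gaussianize and is not necessarily the exact estimator of [76,77]:

```python
import numpy as np
from scipy.stats import norm, rankdata

def gaussianize(x):
    """Component-wise rank-based gaussianization g(x) of a T x D observation array."""
    t_len = x.shape[0]
    u = rankdata(x, axis=0) / (t_len + 1.0)     # empirical CDF values, strictly inside (0, 1)
    return norm.ppf(u)                          # map each coordinate to standard normal marginals
```

Linear ISA is then run on `gaussianize(x)` to estimate $W$.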

4.3. Complex valued models

We summarize a few basic concepts for complex variables. An excellent review on this topic can be found in [78]. Define the mappings $\varphi_v: \mathbb{C}^L \to \mathbb{R}^{2L}$, $\varphi_M: \mathbb{C}^{L_1 \times L_2} \to \mathbb{R}^{2L_1 \times 2L_2}$ as

$$\varphi_v(v) = v \otimes \begin{bmatrix} \Re(\cdot) \\ \Im(\cdot) \end{bmatrix}, \qquad \varphi_M(M) = M \otimes \begin{bmatrix} \Re(\cdot) & -\Im(\cdot) \\ \Im(\cdot) & \Re(\cdot) \end{bmatrix}, \qquad (13)$$

where $\otimes$ is the Kronecker product, $\Re$ stands for the real part and $\Im$ for the imaginary part. Subscripts $v$ and $M$ denote vector and matrix, respectively. Independence of complex random variables $v^m \in \mathbb{C}^{d_m}$ ($m = 1, \ldots, M$) is defined as the independence of the variables $\varphi_v(v^m)$. The entropy of a complex variable $v \in \mathbb{C}^d$ is $H(v) := H(\varphi_v(v))$.

By this definition of independence for complex random variables, the complex valued ISA (C-ISA) task [79,80] can be defined similarly to the real case (Section 2.1) as $x = Ae$.

We review two approaches to solve the complex ISA problem. First, suppose that the "non-Gaussian" assumption is made in the C-ISA model for the variables $\varphi_v(e^m) \in \mathbb{R}^{2d_m}$. Now, applying $\varphi_v$ to the complex ISA equation (Eq. (1)), one gets

$$\varphi_v(x) = \varphi_M(A)\,\varphi_v(e). \qquad (14)$$

Given that (i) the independence of $e^m \in \mathbb{C}^{d_m}$ is equivalent to that of $\varphi_v(e^m) \in \mathbb{R}^{2d_m}$, and (ii) the existence of the inverse of $\varphi_M(A)$ is inherited from $A$, we end up with a real valued ISA task with observation $\varphi_v(x)$ and $M$ hidden components $\varphi_v(e^m)$ of dimension $2d_m$. The consideration can be extended to the complex variants of the linear models of Section 4.1, including the ARIMA-IPA model.
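The mappings $\varphi_v$ and $\varphi_M$ of Eq. (13) have a direct numpy realization; the sketch below also checks the identity $\varphi_M(A)\varphi_v(e) = \varphi_v(Ae)$, which is what makes the reduction to a real valued ISA task work:

```python
import numpy as np

def phi_v(v):
    """phi_v: C^L -> R^{2L}; interleave real and imaginary parts of a complex vector."""
    return np.column_stack([v.real, v.imag]).ravel()

def phi_m(a):
    """phi_M: C^{L1 x L2} -> R^{2L1 x 2L2}; each entry becomes a 2 x 2 real block."""
    return (np.kron(a.real, np.array([[1.0, 0.0], [0.0, 1.0]]))
            + np.kron(a.imag, np.array([[0.0, -1.0], [1.0, 0.0]])))

# sanity check of Eq. (14): phi_M(A) phi_v(e) equals phi_v(A e)
rng = np.random.default_rng(0)
a = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
e = rng.normal(size=3) + 1j * rng.normal(size=3)
assert np.allclose(phi_m(a) @ phi_v(e), phi_v(a @ e))
```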

Another possible solution is to use the ISA Separation Theorem, which remains valid even for complex variables [33] when the following condition holds for the hidden sources $v = e^m \in \mathbb{C}^{d_m}$:

$$H\left(\sum_{i=1}^{d_m} w_i v_i\right) \ge \sum_{i=1}^{d_m} |w_i|^2 H(v_i), \qquad \forall w \in S^{d_m}_{\mathbb{C}}. \qquad (15)$$

Sources $v = e^m \in \mathbb{C}^{d_m}$ that satisfy the complex w-EPI property – which is similar to (8), but with "$2H$" replaced by "$H$" and "$S^{d_m}_{\mathbb{R}}$" by "$S^{d_m}_{\mathbb{C}}$" – also satisfy the sufficient condition (15). Complex spherical variables [81], whose distributions are invariant to unitary transformations, are one example. The relation of these sufficient conditions is illustrated in Fig. 3.

Fig. 3. Sufficient conditions for the complex ISA Separation Theorem. For details, see Section 4.3.

4.4. Controlled, partially observed models, and nonparametric source dynamics

In what follows we briefly review the generalization of the IPA problem to controlled (ARX-IPA, "X" stands for exogenous input) and partially observed problems (mAR-IPA, "m" denotes missing observations), as well as to problems with nonparametric source dynamics (fAR-IPA, "f" means functional). All three problems can be solved with the tricks used for solving the AR-IPA problem (Section 4.1.1). Formally, the ARX-IPA [82], mAR-IPA [83], and fAR-IPA [53] problems are defined as follows:

$$x_t = A s_t, \qquad s_t = \sum_{i=1}^{L_s} F_i s_{t-i} + \sum_{j=1}^{L_u} B_j u_{t+1-j} + e_t, \qquad (16)$$

$$y_t = M_t(x_t), \qquad x_t = A s_t, \qquad s_t = \sum_{i=1}^{L_s} F_i s_{t-i} + e_t, \qquad (17)$$

$$x_t = A s_t, \qquad s_t = f(s_{t-1}, \ldots, s_{t-L_s}) + e_t, \qquad (18)$$

where the notation is explained below. In the ARX-IPA problem (Eq. (16)), the AR-IPA assumptions hold (Eq. (9)), but the time evolution of the hidden source $s$ can be influenced via the control variable $u_t \in \mathbb{R}^{D_u}$ through the matrices $B_j \in \mathbb{R}^{D \times D_u}$. The goal is to estimate the hidden source $s$, the driving noise $e$, the parameters of the dynamics and the control matrices ($F_i$ and $B_j$), as well as the mixing matrix $A$ or its inverse $W$, by using observations $\{x_t\}_{t=1}^T$. In the special case $L_u = 0$, the ARX-IPA task reduces to AR-IPA.

In the mAR-IPA problem (Eq. (17)), the AR-IPA assumptions (Eq. (9)) are relaxed by allowing a few coordinates of the mixed AR sources $x_t \in \mathbb{R}^D$ to be missing at certain time instants. Formally, we observe $y_t \in \mathbb{R}^D$ instead of $x_t$, where the "mask mappings" $M_t: \mathbb{R}^D \to \mathbb{R}^D$ represent the coordinates and the time indices of the non-missing observations. Our task is the estimation of the hidden source $s$, its driving noise $e$, the parameters of the dynamics $F[z]$, and the mixing matrix $A$ (or its inverse $W$) from the observations $\{y_t\}_{t=1}^T$. The special case $M_t = \text{identity}$ corresponds to the AR-IPA task.

In the fAR-IPA problem, the parametric assumption on the dynamics of the hidden sources is circumvented by fAR sources (18). The goal is the same as before: we are to estimate the hidden sources $s^m \in \mathbb{R}^{d_m}$, including their dynamics $f$ and their driving innovations $e^m \in \mathbb{R}^{d_m}$, as well as the mixing matrix $A$ (or its inverse $W$), given observations $\{x_t\}_{t=1}^T$. If we knew the parametric form of $f$ and if it were linear, then the problem would be AR-IPA.

By exploiting the fact that linear invertible transformations ($A$) of fAR, ARX, and AR (see Eq. (10)) processes also belong to the family of fAR, ARX, and AR processes with innovation $n_t = A e_t$,⁴ we can see that a reasonable approach is to fit ARX, mAR, or fAR processes to the observations and then use ISA on the estimated innovations $\hat{n}_t$.

The parameter estimation of ARX processes can be done either by recent active learning methods [85] or by more traditional approaches [86,87]. The mAR fit can be accomplished, e.g., by the maximum likelihood principle [86], the subspace technique [87], or in a Bayesian framework [88]. For the identification of fAR processes, one can use nonparametric regression [89,90].
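For the fAR-IPA case, the innovation can be estimated by nonparametric regression of $x_t$ on its past. A minimal Nadaraya–Watson sketch for a single lag ($L_s = 1$), with a Gaussian kernel and a bandwidth scaled by a parameter `beta` (our notation, loosely mirroring the bandwidth parameter of the experiments in Section 6); the $O(T^2)$ pairwise computation is meant only for modest sample sizes:

```python
import numpy as np

def far_innovation(x, beta=0.5):
    """Nadaraya-Watson estimate of E[x_t | x_{t-1}] and the corresponding innovation.
    x: T x D observation array; returns a (T-1) x D innovation estimate."""
    past, present = x[:-1], x[1:]
    # pairwise squared distances between the conditioning points x_{t-1}
    d2 = ((past[:, None, :] - past[None, :, :]) ** 2).sum(axis=-1)
    bandwidth = beta * np.median(np.sqrt(d2) + 1e-12)      # simple data-driven bandwidth
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(k, 0.0)                               # leave-one-out: do not predict x_t from itself
    pred = (k @ present) / (k.sum(axis=1, keepdims=True) + 1e-12)
    return present - pred
```

ISA (ICA plus clustering) is then applied to the estimated innovation, exactly as in the AR-IPA case.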

5. Consequences of the separation principles

The ISA Separation Theorem (Section 3) and its extensions (Section 4) have a number of important consequences that we discuss here.

According to the ISA Separation Theorem, the ISA task can be solved by finding the optimal permutation of the ICA elements, grouping the elements into statistically dependent subspaces. State-of-the-art solvers use this approach, since it scales well and also enables estimation when the number and dimensions of the components ($\{d_m\}_{m=1}^M$, $M$) are unknown. These properties are detailed below.

⁴ We note that these tricks can be used for other processes as well that have this property, such as Markov-switching AR-IPA processes. For Markov-switching AR processes, see [84].

Fig. 4. Illustration of the IPA problem family. (a) Connections between the problems; arrows point to special cases. (b) Separation principles. Prefix "u" denotes the undercomplete case. According to the figure, complex linear models (Section 4.3) can be reduced to linear models (Section 4.1) using the $\varphi_v, \varphi_M$ transformations. Similarly, PNL problems (Section 4.2) can be reduced to linear problems (Section 4.1) using gaussianization, etc.

First, assume that the dimensions ($d_m$) of the hidden sources are given. Then, according to the ISA Separation Theorem, cost function (2) can be minimized by considering all demixing matrices $W = PW_{ICA}$, where $P$ denotes a permutation. Below we list a few possibilities for finding $P$.

Exhaustive way: The number of all permutations, i.e., the number of $P$ matrices, is $D!$, where "!" denotes the factorial function. Considering that the ISA cost function is invariant to the exchange of elements within the subspaces (see, e.g., (2)), the number of relevant permutations decreases to $D!/(\prod_{m=1}^M d_m!)$. This number can still be enormous, and the related computations can be formidable, justifying the search for efficient approximations that we detail below.

Greedy way [79]: We exchange two estimated ICA components belonging to different subspaces if the exchange decreases the value of the ISA cost $J$, as long as such pairs exist.

"Global" way: Our experience shows that greedy permutation search is often sufficient for the estimation of the ISA subspaces. However, if the greedy approach cannot find the true ISA subspaces, then a global permutation search method of higher computational burden may become necessary [91]: the cross-entropy solution suggested for the traveling salesman problem [92] can be adapted to this case.
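A sketch of the greedy exchange strategy described above; `cost` can be, e.g., the k-nearest-neighbor entropy sum shown after Eq. (2), and `groups` is the current allocation of ICA coordinates to subspaces (both names are ours):

```python
def greedy_permutation(y, groups, cost):
    """Greedy ISA permutation search: swap coordinates between subspaces while the cost drops.
    y: T x D ICA output; groups: list of lists of coordinate indices; cost(y, groups) -> float."""
    best = cost(y, groups)
    improved = True
    while improved:
        improved = False
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                for i in range(len(groups[a])):
                    for j in range(len(groups[b])):
                        groups[a][i], groups[b][j] = groups[b][j], groups[a][i]      # tentative swap
                        trial = cost(y, groups)
                        if trial < best - 1e-12:
                            best, improved = trial, True                             # keep the swap
                        else:
                            groups[a][i], groups[b][j] = groups[b][j], groups[a][i]  # undo
    return groups, best
```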

Now, let us assume that the source dimensions ($d_m$) are not known in advance. The lack of such knowledge causes a combinatorial difficulty, in the sense that one should try all possible $D = d_1 + \cdots + d_M$ ($d_m > 0$, $M \le D$) dimension allocations to the subspace ($e^m$) dimensions, where $D$ is the dimension of the hidden source $e$. The number $f(D)$ of these possibilities grows quickly with the argument; its asymptotic behavior is known [93,94]: $f(D) \sim \exp(\pi\sqrt{2D/3})/(4D\sqrt{3})$ as $D \to \infty$. An efficient method with good scaling properties has been put forth in [68] for searching the permutation group for the ISA Separation Theorem (see Table A3). This approach builds upon the fact that the mutual information between different ISA subspaces $e^m$ is zero, due to the assumption of independence. The method assumes that the coordinates of $e^m$ that fall into the same subspace can be paired by using the pairwise dependence of the coordinates. For the clustering of the coordinates, one may apply different approaches:

Greedy solutions: The AR-IPA task (Section 4.1.1) can be solved by greedy solutions, as in [95] for $L_s = 1$ and [96] for $L_s \ge 1$, assuming that $F[z]$ is block-diagonal, i.e., the sources $s^m$ are not coupled through the dynamics. Using the basis transformation rule of AR processes, we can see that, after ICA preprocessing of the estimated innovation of the observations, it is sufficient to jointly block-diagonalize the coefficient matrices of the polynomial matrix $\hat{F}^s[z] := \hat{W}_{ICA}\hat{F}[z]\hat{W}_{ICA}^{-1}$. The optimal permutation yielding block-diagonal matrices can be estimated using, e.g., greedy clustering of the coordinates. Similarly, greedy considerations can be applied in the ISA problem by replacing the matrix $[|\hat{F}^s_{ij}|]_{i,j=1}^{D} \in \mathbb{R}^{D \times D}$, for example, with generalized variance [97] or cumulant based [51] matrices.

Robust approaches: It has been reported that the previous greedy approach is not robust enough in certain applications, and more robust clustering methods have been proposed to overcome this difficulty. These robust approaches include hierarchical clustering [20,98], tree-structured clustering [99], deterministic annealing [30], and spectral clustering methods [68]. We note that spectral clustering methods scale well; for those ISA problems that satisfy the conditions detailed in Table A3, a single general desktop computer can handle about a million observations (in our case estimated ICA elements) within several minutes [100]. Spectral clustering thus fits large-scale applications, and for large ISA tasks ICA is the main bottleneck.
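The scalable approach of Table A3 amounts to building a similarity matrix from pairwise dependencies of the ICA coordinates and clustering it. A rough sketch, using a histogram-based mutual information estimate and scikit-learn's spectral clustering as stand-ins for the estimators used in [68,100]:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def pairwise_mi(e, bins=16):
    """Histogram estimate of the mutual information between all pairs of ICA coordinates."""
    t_len, d = e.shape
    s = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            pxy, _, _ = np.histogram2d(e[:, i], e[:, j], bins=bins)
            pxy /= t_len
            px, py = pxy.sum(axis=1), pxy.sum(axis=0)
            nz = pxy > 0
            s[i, j] = s[j, i] = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    return s

def cluster_ica_elements(e_ica, n_subspaces, bins=16):
    """Group the ICA coordinates into subspaces by spectral clustering of the MI similarity matrix."""
    s = pairwise_mi(e_ica, bins)
    labels = SpectralClustering(n_clusters=n_subspaces, affinity='precomputed').fit_predict(s)
    return [list(np.where(labels == m)[0]) for m in range(n_subspaces)]
```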

It is worth noting that one can construct examples where algorithms that use only pairwise dependencies cannot work well [101,91].

For all of the problems defined in Sections 3 and 4, the dimensions of the source components may differ or may even be unknown. The quality of the solution can be measured by the Amari-index detailed in Section 2.2. Thanks to the separation principle, these problems can be reduced to well-known subtasks, such as ICA, clustering, estimation of mutual information between one-dimensional random variables, gaussianization, estimation of linear models, principal component analysis, and nonparametric regression, enabling the exploitation of the well-studied solution techniques of these subproblems.

6. Numerical illustrations

In this section we provide three numerical experiments to demonstrate some of the algorithms presented above.

In our first experiment we compare the ISA and AR-IPA methods on a facial dataset [60]. Separating mixed facial images is a common application of ISA. We chose six different facial images ($M = 6$) of $50 \times 50$ pixels (Fig. 5(a)). The pixel values were linearly scaled and truncated to integers such that their sum was 100,000. Then we scanned the images from left to right and from top to bottom and took the 2D coordinates of the pixels as samples, each as many times as the value of the corresponding pixel. When we mixed these two-dimensional sources, ISA was not able to find the proper subspaces, because the sampling is very far from being temporally independent (Fig. 5(c) and (e)). Nevertheless, the AR-IPA method was able to estimate the subspaces of the faces (Fig. 5(d) and (f)).

In our second experiment we compare the TCC and LPA methods on the separation of a convolutive mixture of stereo Beatles songs (A Hard Day's Night, Can't Buy Me Love) [67]. The sources are not i.i.d., and their dimension is $d_m = 2$ (stereo songs). We studied the $D_x = 2D_e$ case of the uMA-IPA problem, using sample sizes $1000 \le T \le 75{,}000$ and convolution lengths $1 \le L_e \le 30$. The performances measured by the Amari-index are shown in Fig. 6 (averages over 50 independent experiments). Fig. 6(a) demonstrates that TCC performed well when the sample size was $T \ge 50{,}000$. The LPA method provided good results for $T \ge 30{,}000$ (Fig. 6(b)), and for this sample size it worked better than TCC. For larger convolution parameters $L_e$, the LPA method is even more superior: for $T = 75{,}000$ and $L_e = 1, 2, 5, 10, 20, 30$, Fig. 6(c) shows that on average LPA performs 1.50, 2.24, 4.33, 4.42, 9.03, and 11.13 times better than TCC, respectively.

In our third experiment we compared the AR-IPA and fAR-IPA methods on the ikeda dataset [53]. Here, the hidden sources $s^m_t = [s^m_{t,1}, s^m_{t,2}] \in \mathbb{R}^2$ ($M = 2$) are realized by the ikeda map

$$s^m_{t+1,1} = 1 + \lambda^m [s^m_{t,1}\cos(w^m_t) - s^m_{t,2}\sin(w^m_t)], \qquad s^m_{t+1,2} = \lambda^m [s^m_{t,1}\sin(w^m_t) + s^m_{t,2}\cos(w^m_t)],$$

where $\lambda^m$ is a parameter of the dynamical system and $w^m_t = 0.4 - 6/(1 + (s^m_{t,1})^2 + (s^m_{t,2})^2)$; see Fig. 7(a). We mixed these sources by a random mixing matrix $A$; this formed our observations $x_t$. The results of 10 independent experiments can be seen in Fig. 7(b). As the results show, the standard AR-IPA method could not find the proper subspaces, but the fAR-IPA method was able to estimate the subspaces for sample sizes $T \ge 10{,}000$.

Fig. 5. AR-IPA vs ISA illustration on the facial dataset. (a) Original facial images, the hidden sources $s_t$. (b) Mixed sources, the observation ($x_t$). (c) and (d) Independent sources ($\hat{s}_t$) estimated by ISA and AR-IPA, respectively. (e) and (f) Hinton diagram of $G$, the product of the estimated demixing matrix and the mixing matrix, for ISA and AR-IPA, respectively. When the separation is perfect, it is a block-permutation matrix with $2 \times 2$ blocks.

Fig. 6. LPA vs TCC illustration on convolved Beatles songs. (a) Hinton diagram of $G$ using the TCC method with $L_e = 5$; when the separation is perfect this is a block-permutation matrix of two blocks. (b) LPA performance (Amari-index, log scale) as a function of the sample size ($T$) and the convolution length ($L_e$). (c) Same as (b), but showing the quotient of the TCC and LPA Amari-indices (how many times LPA is better than TCC).

Fig. 7. AR-IPA vs fAR-IPA illustration on the ikeda dataset. (a) Hidden sources $s_t$. (b) Amari-index as a function of the sample number for the AR-IPA and the fAR-IPA method; $\beta \in (0,1)$: kernel regression bandwidth parameter. (c) Observation $x_t$. (d) Hinton diagram of $G$ with average Amari-index; when the separation is perfect it is a block-permutation matrix of $2 \times 2$ sized blocks. (e) Estimated subspaces ($\hat{s}_t$) using the fAR-IPA method ($\beta = 1/2$, $T = 20{,}000$).
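For completeness, ikeda-map sources of this kind can be generated in a few lines; the $\lambda$ values below are illustrative choices slightly below 1 (keeping the map in its chaotic regime) and are not necessarily those used in [53]:

```python
import numpy as np

def ikeda_source(t_len, lam, s0=(1.0, 0.0)):
    """Generate one two-dimensional ikeda-map source s_t (Section 6, third experiment)."""
    s = np.empty((t_len, 2))
    s[0] = s0
    for t in range(t_len - 1):
        s1, s2 = s[t]
        w = 0.4 - 6.0 / (1.0 + s1 ** 2 + s2 ** 2)
        s[t + 1, 0] = 1.0 + lam * (s1 * np.cos(w) - s2 * np.sin(w))
        s[t + 1, 1] = lam * (s1 * np.sin(w) + s2 * np.cos(w))
    return s

# two independent 2-D sources (M = 2, d_m = 2), mixed by a random 4 x 4 matrix
rng = np.random.default_rng(1)
sources = np.hstack([ikeda_source(20000, 0.9994), ikeda_source(20000, 0.998)])  # illustrative lambdas
x = sources @ rng.normal(size=(4, 4)).T
```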

7. Conclusions

We have reviewed known results on several different generalizations of independent subspace analysis and the ISA separation principle. According to this principle, the ISA task can be solved by applying ICA and then clustering the ICA components into statistically dependent subspaces. The theorem has recently been rigorously proven for some distribution types. Joint block diagonalization based methods take an analogous approach with a similar separation principle. The Separation Theorem enables one to construct ISA methods that scale well with the dimensions even if the dimensions of the subspaces differ or are unknown. It also makes it possible to extend the ISA problem to different linear-, controlled-, post nonlinear-, complex valued-, and partially observed systems, as well as to systems with nonparametric source dynamics.

Acknowledgments

The Project is supported by the European Union and co-financed by the European Social Fund (grant agreements no. TAMOP 4.2.1/B-09/1/KMR-2010-0003 and KMOP-1.1.2-08/1-2008-0002). The research was partly supported by the Department of Energy (grant number DESC0002607).

Appendix A. Abbreviations

We summarize the notations in Table A1. The first part of the list contains the acronyms; it is followed by a few mathematical notations.


Table A1. Acronyms and mathematical notations.

Acronyms:
AR, MA, ARMA – AutoRegressive, Moving Average, AutoRegressive Moving Average
ARX, ARIMA – AR with eXogenous input, Integrated ARMA
fAR, mAR – Functional AR, AR with Missing values
PNL, "u-" – Post NonLinear, prefix for "undercomplete"
ECG, EEG – Electro-Cardiography, Electro-Encephalography
EPI – Entropy Power Inequality
FIR – Finite Impulse Response
fMRI – Functional Magnetic Resonance Imaging
ICA/ISA/IPA – Independent Component/Subspace/Process Analysis
i.i.d. – Independent Identically Distributed
JBD – Joint Block Diagonalization
LPA – Linear Prediction Approximation
TCC – Temporal Concatenation

Mathematical notations:
$\mathbb{R}$, $\mathbb{R}^L$, $\mathbb{R}^{L_1 \times L_2}$, $\mathbb{R}[z]^{L_1 \times L_2}$; $\mathbb{C}$, $\mathbb{C}^L$, $\mathbb{C}^{L_1 \times L_2}$ – real and complex numbers, $L$-dimensional vectors, $L_1 \times L_2$ sized matrices, polynomial matrices
$\mathcal{O}^D$, $\mathcal{U}^D$ – $D \times D$ sized orthogonal and unitary matrices
$H$, $I$ – entropy, mutual information
$S^d_{\mathbb{R}}$, $S^d_{\mathbb{C}}$ – $d$-dimensional unit sphere over $\mathbb{R}$ and $\mathbb{C}$
$\otimes$, $\Re(\cdot)$, $\Im(\cdot)$ – Kronecker product, real and imaginary part

Table A2. AR-IPA algorithm (pseudocode).

Input of the algorithm:
  Observation ($\{x_t\}_{t=1}^T$), AR order ($L_s$)
  Optional (depending on the applied ISA solver): number of components ($M$) and source dimensions ($\{d_m\}_{m=1}^M$)
Optimization:
  AR estimation of order $L_s$ on $x$ $\Rightarrow$ $\hat{F}_{AR}[z]$
  Estimation of the innovation of $x$ $\Rightarrow$ $\hat{n} = \hat{F}_{AR}[z]\,x$
  ISA on the estimated innovation $\hat{n}$ $\Rightarrow$ ISA demixing matrix $\hat{W}_{ISA}$; optional: $\hat{M}$, $\{\hat{d}_m\}_{m=1}^{\hat{M}}$
Output of the algorithm:
  Estimated mixing matrix, hidden source: $\hat{A} = \hat{W}_{ISA}^{-1}$, $\hat{s} = \hat{W}_{ISA}\,x$
  Source dynamics, driving noise: $\hat{F}[z] = \hat{W}_{ISA}\hat{F}_{AR}[z]\hat{W}_{ISA}^{-1}$, $\hat{e} = \hat{W}_{ISA}\,\hat{n}$
  Optional: number of components and dimensions of sources: $\hat{M}$, $\{\hat{d}_m\}_{m=1}^{\hat{M}}$

Table A3. Approximation that scales well for the permutation search task in the ISA Separation Theorem.

Construct an undirected graph with nodes corresponding to the ICA coordinates and edge weights (similarities) defined by the pairwise statistical dependencies, i.e., the mutual information of the estimated ICA elements: $\hat{S} = [\hat{I}(\hat{e}_{ICA,i}, \hat{e}_{ICA,j})]_{i,j=1}^{D}$. Cluster the ICA elements, i.e., the nodes, using the similarity matrix $\hat{S}$.


References

[1] C. Jutten, J. Herault, Blind separation of sources: an adaptive algorithm based on neuromimetic architecture, Signal Processing 24 (1991) 1–10.
[2] J. Cardoso, A. Souloumiac, Blind beamforming for non-Gaussian signals, IEE Proceedings F 140 (6) (1993) 362–370.
[3] P. Comon, Independent component analysis, a new concept? Signal Processing 36 (3) (1994) 287–314.
[4] M.S. Bartlett, J.R. Movellan, T.J. Sejnowski, Face recognition by independent component analysis, IEEE Transactions on Neural Networks 13 (6) (2002) 1450–1464.
[5] P.C. Yuen, J.H. Lai, Face representation using independent component analysis, Pattern Recognition 35 (6) (2002) 1247–1257.
[6] H. Neemuchwala, A. Hero, P. Carson, Image matching using alpha-entropy measures and entropic graphs, Signal Processing 85 (2) (2005) 277–296.
[7] M.J. McKeown, S. Makeig, G.G. Brown, T.-P. Jung, S.S. Kindermann, A.J. Bell, T.J. Sejnowski, Analysis of fMRI data by blind separation into independent spatial components, Human Brain Mapping 6 (1998) 160–188.
[8] A.J. Bell, T.J. Sejnowski, The 'independent components' of natural scenes are edge filters, Vision Research 37 (1997) 3327–3338.
[9] R. Jenssen, T. Eltoft, Independent component analysis for texture segmentation, Pattern Recognition 36 (13) (2003) 2301–2315.
[10] A. Hyvarinen, E. Oja, Independent component analysis: algorithms and applications, Neural Networks 13 (4-5) (2000) 411–430.
[11] A. Cichocki, S. Amari, Adaptive Blind Signal and Image Processing, Wiley, 2002.
[12] A. Hyvarinen, J. Karhunen, E. Oja, Independent Component Analysis, Wiley, 2001.
[13] S. Choi, A. Cichocki, H. Park, S. Lee, Blind source separation and independent component analysis, Neural Information Processing: Letters and Reviews 6 (2005) 1–57.
[14] J. Cardoso, Multidimensional independent component analysis, in: ICASSP, 1998, pp. 1941–1944.
[15] A. Hyvarinen, P.O. Hoyer, Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces, Neural Computation 12 (2000) 1705–1720.
[16] H. Kim, S. Choi, S. Bang, Membership scoring via independent feature subspace analysis for grouping co-expressed genes, in: IJCNN, 2003, pp. 1690–1695.
[17] A. Sharma, K.K. Paliwal, Subspace independent component analysis using vector kurtosis, Pattern Recognition 39 (2006) 2227–2232.
[18] F.J. Theis, Blind signal separation into groups of dependent signals using joint block diagonalization, in: ISCAS, 2005, pp. 5878–5881.
[19] L.D. Lathauwer, B.D. Moor, J. Vandewalle, Fetal electrocardiogram extraction by blind source subspace separation, IEEE Transactions on Biomedical Engineering 47 (5) (2000) 567–572.
[20] H. Stogbauer, A. Kraskov, S.A. Astakhov, P. Grassberger, Least dependent component analysis based on mutual information, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 70 (6) (2004) 066123.
[21] S. Ma, X.-L. Li, N. Correa, T. Adali, V. Calhoun, Independent subspace analysis with prior information for fMRI data, in: ICASSP, 2010, pp. 1922–1925.
[22] F. Kohl, G. Wubbeler, D. Kolossa, C. Elster, M. Bar, R. Orglmeister, Non-independent BSS: a model for evoked MEG signals with controllable dependencies, in: ICA, 2009, pp. 443–450.
[23] B.A. Olshausen, D.J. Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature 381 (1996) 607–609.
[24] Y. Nishimori, S. Akaho, M.D. Plumbley, Riemannian optimization method on the flag manifold for independent subspace analysis, in: ICA, 2006, pp. 295–302.
[25] H. Choi, S. Choi, Relative gradient learning for independent subspace analysis, in: IJCNN, 2006, pp. 3919–3924.
[26] C.S. Santos, J.E. Kogler, E.D.M. Hernandez, Using independent subspace analysis for selecting filters used in texture processing, in: ICIP, 2005, pp. 465–468.
[27] Q. Le, W. Zou, S. Yeung, A. Ng, Stacked convolutional independent subspace analysis for action recognition, in: CVPR, 2011, pp. 3361–3368.
[28] A. Hyvarinen, P.O. Hoyer, M. Inki, Topographic independent component analysis, Neural Computation 13 (7) (2001) 1527–1558.
[29] S.Z. Li, X. Lv, H. Zhang, View-subspace analysis of multi-view face patterns, in: RATFG-RTS, 2001, pp. 125–132.
[30] M.A. Casey, A. Westner, Separation of mixed audio sources by independent subspace analysis, in: ICMC, 2000, pp. 154–161.
[31] Z. Fan, J. Zhou, Y. Wu, Motion segmentation based on independent subspace analysis, in: ACCV, 2004.
[32] J.K. Kim, S. Choi, Tree-dependent components of gene expression data for clustering, in: ICANN, 2006, pp. 837–846.
[33] Z. Szabo, B. Poczos, A. Lőrincz, Undercomplete blind subspace deconvolution, Journal of Machine Learning Research 8 (2007) 1063–1095.
[34] A. Hyvarinen, Independent component analysis for time-dependent stochastic processes, in: ICANN, 1998, pp. 541–546.
[35] M.S. Pedersen, J. Larsen, U. Kjems, L.C. Parra, A survey of convolutive blind source separation methods, in: Springer Handbook of Speech Processing, Springer, 2007.
[36] C. Jutten, J. Karhunen, Advances in blind source separation (BSS) and independent component analysis (ICA) for nonlinear systems, International Journal of Neural Systems 14 (5) (2004) 267–292.
[37] J. Anemuller, T.J. Sejnowski, S. Makeig, Complex independent component analysis of frequency-domain electroencephalographic data, Neural Networks 16 (2003) 1311–1323.
[38] V. Calhoun, T. Adali, G. Pearlson, P. van Zijl, J. Pekar, Independent component analysis of fMRI data in the complex domain, Magnetic Resonance in Medicine 48 (2002) 180–192.
[39] J. Anemuller, J. Duann, T.J. Sejnowski, S. Makeig, Spatio-temporal dynamics in fMRI recordings revealed with complex independent component analysis, Neurocomputing 69 (13-15) (2006) 1502–1512.
[40] K. Chan, T.-W. Lee, T.J. Sejnowski, Variational Bayesian learning of ICA with missing data, Neural Computation 15 (8) (2003) 1991–2011.
[41] A.T. Cemgil, C. Fevotte, S.J. Godsill, Variational and stochastic inference for Bayesian source separation, Digital Signal Processing 17 (2007) 891–913.


[42] J. Anemuller, Second-order separation of multidimensional sources with constrained mixing system, in: ICA, 2006, pp. 16–23.

[43] T.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley, New York, USA, 1991.

[44] L. Ma, L. Zhang, Bayesian estimation of overcomplete independent feature subspaces for natural images, in: ICA, 2007, pp. 746–753.

[46] A. Hyvarinen, M. Inki, Estimating overcomplete independent component bases for image windows, Journal of Mathematical Imaging and Vision 17 (2) (2002) 139–152.

[47] H. Hotelling, Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology 24 (1933) 417–441.

[48] B. Poczos, A. Lőrincz, Independent subspace analysis using k-nearest neighborhood distances, in: ICANN, 2005, pp. 163–168.

[49] E.G. Learned-Miller, J.W. Fisher III, ICA using spacings estimates of entropy, Journal of Machine Learning Research 4 (2003) 1271–1295.

[50] F.J. Theis, Uniqueness of complex and multidimensional independent component analysis, Signal Processing 84 (5) (2004) 951–956.

[51] F.J. Theis, Towards a general independent subspace analysis, in: NIPS, 2007, pp. 1361–1368.

[52] S. Amari, A. Cichocki, H.H. Yang, A new learning algorithm for blind signal separation, in: NIPS, 1996, pp. 757–763.

[53] Z. Szabo, B. Poczos, Nonparametric independent process analysis, in: EUSIPCO, 2011, pp. 1718–1722.

[54] K.-T. Fang, S. Kotz, K.W. Ng, Symmetric Multivariate and Related Distributions, Chapman and Hall, 1990.

[55] A.K. Gupta, D. Song, Lp-norm spherical distribution, Journal of Statistical Planning and Inference 60 (2) (1997) 241–260.

[56] S. Takano, The inequalities of Fisher information and entropy power for dependent variables, in: Proceedings of the Seventh Japan-Russia Symposium on Probability Theory and Mathematical Statistics, 1995.

[57] K. Abed-Meraim, A. Belouchrani, Algorithms for joint block diagonalization, in: EUSIPCO, 2004, pp. 209–212.

[58] F.J. Theis, Multidimensional independent component analysis using characteristic functions, in: EUSIPCO, 2005.

[59] K. Nordhausen, H. Oja, Independent subspace analysis using three scatter matrices, Austrian Journal of Statistics 40 (1–2) (2011) 93–101.

[60] B. Poczos, B. Takacs, A. Lőrincz, Independent subspace analysis on innovations, in: ECML, 2005, pp. 698–706.

[61] R.H. Lambert, Multichannel Blind Deconvolution: FIR Matrix Algebra and Separation of Multipath Mixtures, Ph.D. Thesis, University of Southern California, 1996.

[62] V. Petrov, Central limit theorem for m-dependent variables, in: Proceedings of the All-Union Conference on Probability Theory and Mathematical Statistics, 1958, pp. 38–44.

[63] A. Neumaier, T. Schneider, Estimation of parameters and eigenmodes of multivariate autoregressive models, ACM Transactions on Mathematical Software 27 (1) (2001) 27–57.

[64] T. Schneider, A. Neumaier, Algorithm 808: ARfit—a Matlab package for the estimation of parameters and eigenmodes of multivariate autoregressive models, ACM Transactions on Mathematical Software 27 (1) (2001) 58–65.

[65] S. Choi, Differential learning algorithms for decorrelation and independent component analysis, Neural Networks 19 (10) (2006) 1558–1567.

[66] R. Rajagopal, L.C. Potter, Multivariate MIMO FIR inverses, IEEE Transactions on Image Processing 12 (2003) 458–465.

[67] Z. Szabo, B. Poczos, A. Lőrincz, Undercomplete blind subspace deconvolution via linear prediction, in: ECML, 2007, pp. 740–747.

[68] B. Poczos, Z. Szabo, M. Kiszlinger, A. Lőrincz, Independent process analysis without a priori dimensional information, in: ICA, 2007, pp. 252–259.

[69] Z. Szabo, Complete blind subspace deconvolution, in: ICA, 2009, pp. 138–145.

[70] S. Choi, A. Cichocki, Blind signal deconvolution by spatio-temporal decorrelation and demixing, Neural Networks for Signal Processing 7 (1997) 426–435.

[71] A. Mansour, C. Jutten, P. Loubaton, Subspace method for blind separation of sources in convolutive mixture, in: EUSIPCO, 1996, pp. 2081–2084.

[72] S. Icart, R. Gautier, Blind separation of convolutive mixtures using second and fourth order moments, in: ICASSP, 1996, pp. 3018–3021.

[73] A. Gorokhov, P. Loubaton, Multiple-input multiple-output ARMA systems: second order blind identification for signal extractions, in: SSAP, 1996, pp. 348–351.

[74] Z. Szabo, B. Poczos, G. Szirtes, A. Lőrincz, Post nonlinear independent subspace analysis, in: ICANN, 2007, pp. 677–686.

[75] A. Taleb, C. Jutten, Source separation in post-nonlinear mixtures, IEEE Transactions on Signal Processing 10 (47) (1999) 2807–2820.

[76] A. Ziehe, M. Kawanabe, S. Harmeling, K.-R. Muller, Blind separation of post-nonlinear mixtures using linearizing transformations and temporal decorrelation, Journal of Machine Learning Research 4 (7–8) (2003) 1319–1338.

[77] J. Sole-Casals, C. Jutten, D. Pham, Fast approximation of nonlinearities for improving inversion algorithms of PNL mixtures and Wiener systems, Signal Processing 85 (2005) 1780–1786.

[78] J. Eriksson, Complex random vectors and ICA models: identifiability, uniqueness and separability, IEEE Transactions on Information Theory 52 (3) (2006) 1017–1029.

[79] Z. Szabo, A. Lőrincz, Real and complex independent subspace analysis by generalized variance, in: ICARN, 2006, pp. 85–88.

[80] Y. Nishimori, S. Akaho, M.D. Plumbley, Natural conjugate gradient on complex flag manifolds for complex independent subspace analysis, in: ICANN, 2008, pp. 165–174.

[81] P. Krishnaiah, J. Lin, Complex elliptically symmetric distributions, Communications in Statistics 15 (12) (1986) 3693–3718.

[82] Z. Szabo, A. Lőrincz, Towards independent subspace analysis in controlled dynamical systems, in: ICARN, 2008, pp. 9–12.

[83] Z. Szabo, Autoregressive independent process analysis with missing observations, in: ESANN, 2010, pp. 159–164.

[84] H.-M. Krolzig, Markov Switching Vector Autoregressions. Modelling, Statistical Inference and Application to Business Cycle Analysis, Springer, 1997.

[85] B. Poczos, A. Lőrincz, Identification of recurrent neural networks by Bayesian interrogation techniques, Journal of Machine Learning Research 10 (2009) 515–554.

[86] J.T. Lomba, Estimation of dynamic econometric models with errors in variables, Lecture Notes in Economics and Mathematical Systems, vol. 339, Springer, 1990.

[87] A. García-Hiernaux, J. Casals, M. Jerez, Fast estimation methods for time series models in state-space form, Journal of Statistical Computation and Simulation 79 (2) (2009) 121–134.

[88] K.R. Kadiyala, S. Karlsson, Numerical methods for estimation and inference in Bayesian VAR-models, Journal of Applied Economics 12 (1997) 99–132.

[89] D. Bosq, Nonparametric statistics for stochastic processes: estimation and prediction, in: Lecture Notes in Statistics, Springer, 1998.

[90] N. Hilgert, B. Portier, Strong Uniform Consistency and Asymptotic Normality of a Kernel Based Error Density Estimator in Functional Autoregressive Models, Technical Report, 2009, http://arxiv.org/abs/0905.2327v1.

[91] Z. Szabo, B. Poczos, A. Lőrincz, Cross-entropy optimization for independent process analysis, in: ICA, 2006, pp. 909–916.

[92] R.Y. Rubinstein, D.P. Kroese, The Cross-Entropy Method, Springer, 2004.

[93] G.H. Hardy, S.I. Ramanujan, Asymptotic formulae in combinatory analysis, Proceedings of the London Mathematical Society 17 (1) (1918) 75–115.

[94] J.V. Uspensky, Asymptotic formulae for numerical functions which occur in the theory of partitions, Bulletin of the Russian Academy of Sciences 14 (6) (1920) 199–218.

[95] B. Poczos, A. Lőrincz, Non-combinatorial estimation of independent autoregressive sources, Neurocomputing Letters 69 (2006) 2416–2419.

[96] Z. Szabo, B. Poczos, A. Lőrincz, Auto-regressive independent process analysis without combinatorial efforts, Pattern Analysis & Applications 13 (2010) 1–13.

[97] Z. Szabo, A. Lőrincz, Independent subspace analysis can cope with the "curse of dimensionality", Acta Cybernetica (+Symp. Intell. Syst. 2006) 18 (2007) 213–221.

[98] P. Gruber, H.W. Gutch, F.J. Theis, Hierarchical extraction of independent subspaces of unknown dimensions, in: ICA, 2009, pp. 259–266.

[99] F.R. Bach, M.I. Jordan, Beyond independent components: trees and clusters, Journal of Machine Learning Research 4 (2003) 1205–1233.

[100] D. Yan, L. Huang, M.I. Jordan, Fast approximate spectral clustering, in: KDD, 2009, pp. 907–916.

[101] B. Poczos, A. Lőrincz, Independent subspace analysis using geodesic spanning trees, in: ICML, 2005, pp. 673–680.

Zoltan Szabo received an M.Sc. in Applied Mathematics, 2006 and a Ph.D. in Computer Science, 2009. He is a Ph.D. candidate (Applied Mathematics) and a research fellow at the Eotvos Lorand University. His research interests include independent subspace analysis and its extensions, information theory and kernel methods.

Barnabas Poczos received an M.Sc. in Applied Mathematics, 2002 and a Ph.D. in Computer Science, 2007. He is a postdoctoral fellow at the School of Computer Science, Carnegie Mellon University. His current research interests lie in the areas of unsupervised learning, manifold learning, Bayesian methods, entropy and mutual information estimation.

Andras Lőrincz received an M.Sc. in Physics, 1975; a Ph.D. in Solid State Physics, 1978; a C.Sc. in Molecular Physics, 1986; a Laser Physics habilitation, 1998; and an Information Technology habilitation, 2009. He works on intelligent systems and human-computer collaboration at the Eotvos Lorand University.