Survey of Sparse and Non-Sparse Methods in Source Separation


8/14/2019 Survey of Sparse and Non-Sparse Methods in Source Separation

Survey of Sparse and Non-Sparse Methods in Source Separation

Paul D. O'Grady,1 Barak A. Pearlmutter,1 Scott T. Rickard2

1 Hamilton Institute, National University of Ireland Maynooth, Co. Kildare, Ireland. Ollscoil na hÉireann Má Nuad, Éire

2 University College Dublin, Ireland. An Coláiste Ollscoile, Baile Átha Cliath, Éire

    Received 20 September 2004; accepted 4 April 2005

ABSTRACT: Source separation arises in a variety of signal processing applications, ranging from speech processing to medical image analysis. The separation of a superposition of multiple signals is accomplished by taking into account the structure of the mixing process and by making assumptions about the sources. When the information about the mixing process and sources is limited, the problem is called blind. By assuming that the sources can be represented sparsely in a given basis, recent research has demonstrated that solutions to previously problematic blind source separation problems can be obtained. In some cases, solutions are possible to problems intractable by previous non-sparse methods. Indeed, sparse methods provide a powerful approach to the separation of linear mixtures of independent data. This paper surveys the recent arrival of sparse blind source separation methods and the previously existing non-sparse methods, providing insights and appropriate hooks into the literature along the way. © 2005 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 15, 18–33, 2005; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ima.20035

Key words: blind source separation; sparse methods; non-negative matrix factorization

I. INTRODUCTION

When presented with a set of observations from sensors such as microphones, the process of extracting the underlying sources is called source separation. Doing so without strong additional information about the individual sources or constraints on the mixing process is called blind source separation (BSS). Generally the problem is stated as follows: given M linear mixtures of N sources mixed via an unknown M × N mixing matrix A, estimate the underlying sources from the mixtures. When M ≥ N, this can be achieved by constructing an unmixing matrix W, where W = A⁻¹ up to permutation and scaling of the rows. The conditions that must be satisfied to guarantee separation are given by Darmois' Theorem (Darmois, 1953), which states that the sources must be non-Gaussian and statistically independent. The dimensionality of the mixing process influences the complexity of source separation. If M = N, the mixing process A is defined by an even-determined (i.e. square) matrix and, provided that it is non-singular, the underlying sources can be estimated by a linear transformation. If M > N, the mixing process A is defined by an over-determined matrix and, provided that it is full rank, the underlying sources can be estimated by least-squares optimisation or a linear transformation involving matrix pseudo-inversion. If M < N, then the mixing process is defined by an under-determined matrix and consequently source estimation becomes more involved and is usually achieved by some non-linear technique.

Environmental assumptions about the surroundings in which the sensor observations are made also influence the complexity of the problem. BSS of acoustic signals is often referred to as the cocktail party problem (Cherry, 1953); that is, the separation of individual voices from a myriad of voices in an uncontrolled acoustic environment such as a cocktail party. Sensor observations in a natural environment are confounded by signal reverberations, and consequently, the estimated unmixing process needs to identify a source arriving from multiple directions at different times as one individual source. Generally, BSS techniques depart from this difficult real-world scenario and make less realistic assumptions about the environment so as to make the problem more tractable. There are typically three assumptions that are made about the environment. The most rudimentary of these is the instantaneous case, where sources arrive instantly at the sensors but with differing signal intensity. An extension of the previous assumption, where arrival delays between sensors are also considered, is known as the anechoic case. The anechoic case can be further extended by considering multiple paths between each source and each sensor, which results in the echoic case, sometimes known as convolutional mixing. Each case can be extended to incorporate linear additive noise, which is usually assumed to be white and Gaussian.

Assumptions can also be made about the nature of the sources. Such assumptions form the basis for most source separation algorithms and include statistical properties such as independence and stationarity. One increasingly popular and powerful assumption is that the sources have a parsimonious representation in a given basis.

Correspondence to: Paul D. O'Grady; e-mail: [email protected]
Supported by the Higher Education Authority of Ireland (An tÚdarás Um Ard-Oideachas) and Science Foundation Ireland grant 00/PI.1/C067.

© 2005 Wiley Periodicals, Inc.


These methods have come to be known as sparse methods. A signal is said to be sparse when it is zero or nearly zero more often than might be expected from its variance. Such a signal has a probability density function or distribution of values with a sharp peak at zero and fat tails. This shape can be contrasted with a Gaussian distribution, which would have a smaller peak and tails that taper quite rapidly. A standard sparse distribution is the Laplacian distribution (p(c) ∝ e^{−|c|}), which has led to the sparseness assumption being sometimes referred to as a Laplacian prior. The advantage of a sparse signal representation is that the probability of two or more sources being simultaneously active is low. Thus, sparse representations lend themselves to good separability because most of the energy in a basis coefficient at any time instant belongs to a single source. This statistical property of the sources results in a nicely defined structure being imposed by the mixing process on the resultant mixtures, which can be exploited to make estimating the mixing process much easier. Additionally, sparsity can be used in many instances to perform source separation in the case when there are more sources than sensors. A sparse representation of an acoustic signal can often be achieved by a transformation into a Fourier, Gabor or Wavelet basis.
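The contrast between the Laplacian and Gaussian shapes can be checked numerically. The following sketch is our own illustration, not from the paper; all names and constants are ours. It draws unit-variance samples from each distribution and compares a crude sparsity measure (the fraction of near-zero values) with the excess kurtosis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Unit-variance samples: a Laplacian with scale b has variance 2*b^2.
laplacian = rng.laplace(scale=1 / np.sqrt(2), size=n)
gaussian = rng.standard_normal(n)

def near_zero_fraction(x, eps=0.1):
    """Fraction of samples within eps of zero: a crude sparsity measure."""
    return np.mean(np.abs(x) < eps)

def excess_kurtosis(x):
    """Fourth standardised moment minus 3: zero for a Gaussian, 3 for a Laplacian."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

# The Laplacian shows both the sharper peak at zero and the heavier tails.
print(near_zero_fraction(laplacian), near_zero_fraction(gaussian))
print(excess_kurtosis(laplacian), excess_kurtosis(gaussian))
```

Despite having the same variance, the Laplacian samples sit near zero noticeably more often, which is exactly the property the separation methods below exploit.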

The sparse representation of an acoustic signal has an interpretation in information theoretic terms, where the representation of a signal using a small number of coefficients corresponds to transmission of information using a code utilising a small number of bits. Sparse representation of information is a phenomenon that also occurs in the natural world. In the brain, neurons are said to encode data in a sparse way if their firing pattern is characterised by long periods of inactivity (Földiák and Young, 1995; Körding et al., 2002), and recent work (DeWeese et al., 2003) indicates that sparse representations exist in the auditory cortex.

Blind source separation has been studied for nearly two decades. The earliest approach traces back to Hérault and Jutten (1986), whose goal was to separate an instantaneous linear even-determined mixture of non-Gaussian independent sources. They proposed a solution that used a recurrent artificial neural network to separate the unknown sources, the crucial assumption being that the underlying signals were independent. This early work led to the pioneering adaptive algorithm of Jutten and Hérault (1991). Linsker (1989) proposed unsupervised learning rules based on information theory that maximise the average mutual information between the inputs and outputs of an artificial neural network. Comon (1994) proposed that mutual information was the most natural measure of independence and showed that maximising the non-Gaussianity of the source signals was equivalent to minimising the mutual information between them. He also dubbed the concept of determining underlying sources by maximising independence independent component analysis (ICA). Bell and Sejnowski (1995) developed a BSS algorithm called BS-Infomax, which is similar in spirit to that of Linsker and uses an elegant stochastic gradient learning rule that was proposed by Amari et al. (1996). The idea of non-Gaussianity of sources was used by Hyvärinen and Oja (1997) to develop their fICA algorithm. As an alternative approach to separation using mutual information, Gaeta and Lacoume (1990) proposed maximum likelihood estimation, an approach elaborated by Pham et al. (1992), although Pearlmutter and Parra (1996) and Cardoso (1997) later demonstrated that the BS-Infomax algorithm and maximum likelihood estimation are essentially equivalent. The early years of BSS research concentrated on solutions for even-determined and over-determined mixing processes. It was not until recent years that a solution for the under-determined case was proposed, when Belou-

Table I. A Comparison of Various Source Separation Techniques.

Algorithm | Parameter Estimation | Separation
Wiener (1949) filter | Power spectrum reshaping | Linear filtering
Hérault and Jutten (1986) | Independence maximisation | Adaptive feedback network
JADE (Cardoso and Souloumiac, 1993) | Joint cumulant diagonalisation | Linear transformation
ICA (Comon, 1994) | Mutual information minimisation | Linear transformation
BS-InfoMax (Bell and Sejnowski, 1995) | Entropy maximisation | Linear transformation
Lambert (1995) | Multichannel blind deconvolution | Current estimate feedback
Lin et al. (1997)* | Hough transform | Hard assignment
SOBI (Belouchrani et al., 1997) | Joint unitary diagonalisation | Linear transformation
fICA (Hyvärinen and Oja, 1997) | Non-Gaussianity maximisation | Linear transformation
Lee et al. (1999)* | Gradient ascent learning | Maximum a posteriori
DUET (Jourjine et al., 2000)* | 2D histogram clustering | Binary time-frequency masking
Bofill and Zibulevsky (2000)* | Potential function clustering | L1-norm minimisation
Zibulevsky and Pearlmutter (2001)* | MAP with Laplacian prior | Joint optimisation
Roweis (2001) | Hidden Markov models | Spectral masking and filtering
Pearlmutter and Zador (2004)* | Known HRTF and dictionary | L1-norm minimisation

* Sparse methods. (The original table also records, per algorithm, the number of sources N (≤ M or > M), the number of sensors M (1, 2 or > 2), the mixing model (instantaneous, anechoic or echoic) and the representation (time, frequency or signal dictionary); those per-row entries are not recoverable from this transcript.)

Vol. 15, 18–33 (2005)


chrani and Cardoso (1994) presented a maximum a posteriori (MAP) probability approach for discrete QAM sources. An approach for sparse sources was later proposed by Lewicki and Sejnowski (1998). The first practical algorithm for separation in an anechoic environment was the DUET algorithm, initially proposed by Jourjine et al. (2000) and further explored by Yilmaz and Rickard (2004). The first algorithms for the anechoic separation of moving speakers were presented by Rickard et al. (2001) and Anemüller and Kollmeier (2003). A selection of sparse and non-sparse source separation algorithms and their characteristics are presented in Table I.

Blind source separation techniques are not confined to acoustic signals. BSS has also been applied to the decomposition of functional brain imaging data such as electroencephalography (Jung et al., 1999, 2000), functional magnetic resonance imaging (McKeown et al., 1998), and magnetoencephalography (Tang et al., 2000; Vigário et al., 2000; Wübbeler et al., 2000; Ziehe et al., 2000; Pearlmutter and Jaramillo, 2003). BSS has also been applied to such diverse areas as real-time robot audition (Nakadai et al., 2002), digital watermark attacks (Du et al., 2002), and financial time series analysis (Roth and Baram, 1996; Back and Weigend, 1997). It has even been conjectured that BSS will have a role in the analysis of the Cosmic Microwave Background (Cardoso et al., 2003), potentially helping to elucidate the very origins of the universe.

This paper presents an overview of BSS techniques and is organised as follows. The mixing models specific to each of the three cases specified earlier are presented in Section II, and then a staged algorithm approach is discussed; staged algorithms perform mixing parameter estimation and source separation independently. Because these stages are separate, any method from stage one can be combined with any method in stage two to form a source separation algorithm. Thus, in Section III, we present a selection of mixing parameter estimation techniques, and in Section IV, we present the unmixing methods for these staged algorithms. Within these sections, we divide the techniques presented into instantaneous, anechoic and echoic mixing model subsections. Alternatively, some source separation techniques estimate mixing parameters and sources simultaneously; these methods are presented in Section V.

II. MIXING MODELS

The generative model for the BSS problem can be presented as follows: a set of T observations of M sensors

\[
X = [\,\mathbf{x}_1 \mid \cdots \mid \mathbf{x}_T\,] =
\begin{bmatrix}
x_1(0) & x_1(p) & \cdots & x_1((T-1)p) \\
x_2(0) & x_2(p) & \cdots & x_2((T-1)p) \\
\vdots  &        & \ddots & \vdots \\
x_M(0) & x_M(p) & \cdots & x_M((T-1)p)
\end{bmatrix}
\]

    consist of a linear mixture of N source signals

\[
S = [\,\mathbf{s}_1 \mid \cdots \mid \mathbf{s}_T\,] =
\begin{bmatrix}
s_1(0) & s_1(p) & \cdots & s_1((T-1)p) \\
s_2(0) & s_2(p) & \cdots & s_2((T-1)p) \\
\vdots &        & \ddots & \vdots \\
s_N(0) & s_N(p) & \cdots & s_N((T-1)p)
\end{bmatrix}
\]

by way of an unknown linear mixing process characterised by the M × N mixing matrix A

\[
A =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2N} \\
\vdots &        & \ddots & \vdots \\
a_{M1} & a_{M2} & \cdots & a_{MN}
\end{bmatrix}
\]

    yielding the equation

\[
\mathbf{x}(t) = A \circledast \mathbf{s}(t) + \boldsymbol{\varepsilon}(t), \qquad (1)
\]

where ε(t) is noise (usually white and Gaussian), p is the sample period, and ⊛ in Eq. (1) denotes the model-dependent linear operator. The form of the elements of the mixing matrix, a_ij, and the linear operator in the above equation are mixing model dependent and define whether the mixing process is instantaneous, anechoic or echoic. Table II presents the linear operators and mixing matrix elements specific to the three cases of BSS, where the operator δ(t − τ_ij) is used to denote a delay between source j and sensor i, c_ij is a scalar attenuation factor between source j and sensor i, τ_ij^k and c_ij^k are the delay and attenuation parameters for the k-th arrival path, and L is the number of paths the sources can take to the sensors.

The following subsections detail the three cases of BSS. In a slight abuse of conventional notation, the temporal index t is used for the relevant domain, which will be made explicit only when necessary.

A. Instantaneous Mixing. For the instantaneous case, the mixing matrix A simply consists of scalars representing signal amplitudes. Taking a simple example where there are two sources and two mixtures, the generative model takes the form

\[
\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} s_1(t) \\ s_2(t) \end{bmatrix}, \qquad (2)
\]

and can be described as a linear mixture of N linear subspaces in M-space. This linear mixing imposes a structure on the resultant mixtures, which becomes apparent when the mixtures have a sparse representation (see Fig. 1). The existence of this structure can be explained as follows: from Eq. (2) it is evident that if only one source is active, say s₁, then the resultant mixtures would be

\[
\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = s_1(t) \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix}.
\]

Therefore, the points on the scatter plot of x₁(t) versus x₂(t) would lie on the line through the origin whose direction is given by the vector [a₁₁ a₂₁]ᵀ. When the sources are sparse, making it unusual for more than one source to be active at the same time, the scatter plot of coefficients would constitute a mixture of lines, with the lines broadened because of noise and the occasional simultaneous activity. These line orientations are unique to each source and correspond to the columns of the mixing matrix A; therefore, the essence of the sparse approach is the identification of line orienta-

Table II. Mixing Model Specific Linear Operators and Mixing Parameters.

Mixing Model | Linear Operator | Generative Model | a_ij
Instantaneous | Matrix multiply | x(t) = A s(t) | c_ij
Anechoic | Convolution | x(t) = A ∗ s(t) | c_ij δ(t − τ_ij)
Echoic | Convolution | x(t) = A ∗ s(t) | Σ_{k=1}^{L} c_ij^k δ(t − τ_ij^k)



tion vectors (also known as basis vectors) from the observed data. In contrast, traditional non-sparse approaches exploit the statistics of the sources as opposed to the structure of the mixtures and employ methods such as mutual information minimisation and independence maximisation to identify the mixing process (see Table I).
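The line structure that sparsity imposes on the scatter plot is easy to reproduce numerically. The sketch below is our own construction (the mixing angles, activity probability and thresholds are arbitrary choices): three sparse sources are mixed into two observations, and the fraction of energetic scatter points lying near a column direction of A is compared against the same pipeline run on dense Gaussian sources:

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 10_000, 3
angles = np.deg2rad([15.0, 75.0, 135.0])
A = np.vstack([np.cos(angles), np.sin(angles)])        # unit columns, 2 x 3 mixing

def orientation_spread(S, thresh=np.deg2rad(5)):
    """Fraction of energetic mixture points within `thresh` of a column of A."""
    X = A @ S
    keep = np.linalg.norm(X, axis=0) > 0.5             # ignore near-silent points
    theta = np.arctan2(X[1, keep], X[0, keep]) % np.pi # line orientation in [0, pi)
    d = np.abs(theta[:, None] - angles[None, :])
    d = np.minimum(d, np.pi - d).min(axis=1)           # wrap-around angular distance
    return np.mean(d < thresh)

# Sparse sources: mostly zero, occasionally Laplacian-active.
sparse = rng.laplace(size=(N, T)) * (rng.random((N, T)) < 0.1)
dense = rng.standard_normal((N, T))

print(orientation_spread(sparse), orientation_spread(dense))
```

With sparse sources the scatter collapses onto the three column directions; with dense sources the same mixture scatters its orientations broadly, which is why the non-sparse methods must fall back on source statistics instead.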

B. Anechoic Mixing. Anechoic mixing is an extension of instantaneous mixing in which source transmission delays between sensors are also considered. Taking a simple example with two mixtures and N sources, the generative model is

\[
x_1(t) = \sum_{j=1}^{N} a_{1j}\, s_j(t - \tau_{1j}), \qquad (3)
\]
\[
x_2(t) = \sum_{j=1}^{N} a_{2j}\, s_j(t - \tau_{2j}), \qquad (4)
\]

where the attenuation and delay of source j to sensor i would be determined by the physical position of the source relative to the sensors. Some algorithms make the assumption that the distance between the sensors is smaller than half the wavelength of the highest frequency of interest. This assumption, referred to in the array signal processing literature as the narrowband assumption, makes it possible for a delay between sensors to be represented unambiguously as a phase shift of a signal. The problem of anechoic signal unmixing is therefore to identify the attenuation factor and relative delay associated with each source. An illustration of the anechoic case is provided in Figure 2.
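Under the narrowband assumption, the inter-sensor delay can be read directly off the phase of a DFT ratio. A minimal sketch (our own; the signal frequency, sample rate and delay are arbitrary choices):

```python
import numpy as np

fs = 8000.0
n = 1024
delay = 3                               # relative delay in samples between sensors

# A narrowband source observed at two sensors, the second lagging by `delay`.
s = np.sin(2 * np.pi * 440.0 * np.arange(2 * n) / fs)
x1 = s[delay:delay + n]                 # leading sensor
x2 = s[:n]                              # lagging sensor: x2(t) = x1(t - delay)

win = np.hanning(n)
X1, X2 = np.fft.rfft(win * x1), np.fft.rfft(win * x2)
k = np.argmax(np.abs(X1))               # dominant frequency bin

# The delay appears as a pure phase shift: X2[k] ~= exp(-i*2*pi*k*delay/n) * X1[k].
phase = np.angle(X2[k] / X1[k])
delay_hat = -phase / (2 * np.pi * k / n)
print(delay_hat)
```

The recovered value is close to 3 samples; note that the phase is only unambiguous while it stays within ±π, which is exactly the half-wavelength spacing condition stated above.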

C. Echoic Mixing. The echoic case of BSS considers not only transmission delays but reverberations too. This results in a more involved generative model that in turn makes finding a solution more difficult. A parametric mixing model with two sources and two mixtures takes the form

\[
x_1(t) = \sum_{k=1}^{L} a_{11}^{k}\, s_1(t - \tau_{11}^{k}) + a_{12}^{k}\, s_2(t - \tau_{12}^{k}), \qquad (5)
\]
\[
x_2(t) = \sum_{k=1}^{L} a_{21}^{k}\, s_1(t - \tau_{21}^{k}) + a_{22}^{k}\, s_2(t - \tau_{22}^{k}), \qquad (6)
\]

where L is the number of paths the source signal can take to the sensors. The mixing matrix A can be thought of as a matrix of finite impulse response (FIR) filters that are defined by the environment

Figure 1. Scatter plot of two linear mixtures of three zero-mean sources in both the time domain (left) and the transform domain (right). The sparse transform domain consists of the coefficients of 512-point windowed FFTs. The figure's axes are measured in arbitrary units of transform domain mixture coefficients.

Figure 2. Anechoic mixing: speech is observed at the microphones with differing intensity and arrival times (because of propagation delays) but with no reverberations.



repeated over all data points, where each stochastic gradient algorithm converges to an individual line orientation; these are then adjoined to form the estimated mixing matrix, Â = [v₁ | ⋯ | v_N]. This algorithm is analogous to k-Means, where line orientations and distances from a line replace the cluster centres and distances from cluster centres of conventional k-Means. This idea can be modified to include soft data assignments and a batch mode of operation, where the stochastic gradient algorithm is replaced by an eigenvector decomposition of the covariance matrix of the data set (O'Grady and Pearlmutter, 2004b). This approach is an expectation maximisation (EM) algorithm (Dempster et al., 1976), where the E-step calculates posterior probabilities assigning data points to lines and the M-step repositions the lines to match the points assigned to them. The separation of 6 sources from 4 mixtures using this EM based approach is presented in Figure 4. Theis (2001) presents a similar approach and Kearns et al. (1997) contains a useful discussion of hard and soft assignments.
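A hard-assignment version of this line-orientation procedure can be sketched in a few lines. This is our own simplified illustration of the k-Means-style E/M iteration described above, not the authors' implementation; initialising near the true orientations is a convenience that keeps the demo away from bad local minima, not part of the algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 5000, 3
true_angles = np.deg2rad([20.0, 80.0, 140.0])
A = np.vstack([np.cos(true_angles), np.sin(true_angles)])  # true 2 x 3 mixing matrix

# Sparse sources concentrate the mixture scatter along the columns of A.
S = rng.laplace(size=(N, T)) * (rng.random((N, T)) < 0.15)
X = A @ S + 0.01 * rng.standard_normal((2, T))
X = X[:, np.linalg.norm(X, axis=0) > 0.3]                  # discard near-silent points

# Initial unit line-orientation vectors (perturbed truth, for demo stability).
V = np.vstack([np.cos(true_angles + 0.2), np.sin(true_angles + 0.2)]).T  # N x 2

for _ in range(25):
    # E-step (hard): assign each point to the line with the largest |projection|,
    # i.e. the nearest line through the origin.
    assign = np.argmax(np.abs(V @ X), axis=0)
    # M-step: reposition each line along the principal eigenvector of its points.
    for j in range(N):
        P = X[:, assign == j]
        V[j] = np.linalg.eigh(P @ P.T)[1][:, -1]

A_hat = V.T                                    # estimated mixing matrix, columns = lines
est_angles = np.sort(np.arctan2(V[:, 1], V[:, 0]) % np.pi)
print(np.rad2deg(est_angles))
```

The eigenvector M-step here is the batch replacement for the stochastic gradient update mentioned in the text; replacing the hard argmax with posterior weights gives the soft-assignment EM variant.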

Feature detection techniques from image processing have also been used to locate line orientations in a scatter plot. Lin et al. (1997) present an algorithm that uses a Hough transform to identify lines. The scatter plot data are partitioned into bins, providing an image representation; the image is then convolved with an edge detection operator, and the resultant image is normalised and thresholded to form a binary image. This image is then Hough transformed using line orientations as the feature of interest, and the line orientation vectors of the scatter plot are identified as peaks in the Hough transform space. The line orientation parameters associated with each peak in Hough space are then adjoined to form the estimated mixing matrix.

Bofill and Zibulevsky (2001) present a method that is restricted to two observations. A radial grid of K lines that covers the space of all possible directions (i.e. from 0 to π radians) is imposed on the scatter plot, the difference between the polar phase of all data points and one radial grid line is calculated, and each value is used to parameterise a local kernel density function (sometimes called a potential function). Each of the T local kernel estimates is then combined to give a single kernel estimate for that grid line. Grid lines that are close to scatter plot line orientations are highly activated compared with those that are not. The procedure is repeated for all grid lines and the result is a global kernel density function spanning the K grid lines, where the local maxima of the function correspond to the columns of the mixing matrix.

Figure 4. Separation of 6 sources from 4 mixtures using the Soft-LOST algorithm (O'Grady and Pearlmutter, 2004b). The plots above show 10-s clips of six acoustic sources s₁, …, s₆, 4 mixtures x₁, …, x₄ and 6 source estimates ŝ₁, …, ŝ₆, where sound wave pressure is plotted against time in units of seconds. The signal-to-noise ratios of the estimated sources in dB are as follows: 10.5845, 13.3895, 13.7402, 8.9102, 15.2732 and 11.3213.



Source separation techniques specific to the separation of sources within a stereophonic music recording also exist. Barry et al. (2004) present an algorithm that exploits the fact that stereophonic music recordings are mixed using a panoramic potentiometer, which positions individual sources (instruments) within the stereo field. Gain-scaling is applied to one mixture so that one source's intensity becomes equal in both the left and right mixture. Subtraction of the mixtures will then cause the source to cancel out because of phase cancellation. The appropriate gain required for the gain-scaling operation is determined by creating a frequency-azimuth spectrogram; both mixtures are transformed using a discrete Fourier transform (DFT) of a specified frame size, and the left mixture is scaled by a range of values corresponding to 0° (far left) to 180° (far right) azimuth. The resultant frames are subtracted from the right mixture frame and the results are used to construct the frequency-azimuth spectrogram. This procedure is repeated for the left mixture. As can be seen from Figure 5, nulls or local minima are formed in the frequency bands specific to each source. These nulls represent the gain needed to cancel each source. In this way, the position (mixing parameters) of each source can be determined by visual inspection or by using some optimisation technique.

A.2. Non-Sparse Methods. The following methods do not make the sparseness assumption and instead estimate the mixing process by exploiting the statistics of the observations in the time domain. These methods are restricted to the even-determined case.

The BS-Infomax algorithm of Bell and Sejnowski (1995) shows that, for signals with a positive kurtosis such as speech, minimising the mutual information between the source estimates and maximising the entropy of the source estimates are equivalent. Entropy maximisation can then be implemented using a stochastic gradient ascent rule. There are a couple of interesting aspects to the approach. First, the model is parameterised not by the mixing matrix, as is natural for a forward model, but rather by its inverse, i.e. the unmixing matrix. Although the inverse contains the same information, it makes for a much simpler update rule and superior numeric conditioning. The second interesting aspect is that the algorithm takes a naive stochastic gradient descent algorithm and modifies it by multiplying the gradient by a particular positive-definite matrix, WᵀW, where W is the current estimate of the unmixing matrix. While Amari et al. (1996) provide a theoretical justification of this based on information geometry, a naive perspective would be that the multiplier chosen fortuitously eliminates a matrix inversion, making each step of the algorithm much faster and the convergence rate independent of the condition number of the unmixing matrix.
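The resulting update is the well-known natural-gradient rule. A batch sketch (our own, not the original implementation; the tanh nonlinearity is a common choice for positive-kurtosis sources, and the learning rate and iteration count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 5_000
S = rng.laplace(size=(2, T))                # positive-kurtosis (super-Gaussian) sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                  # mixing matrix, treated as unknown
X = A @ S

W = np.eye(2)                               # unmixing estimate
eta = 0.05
for _ in range(500):
    Y = W @ X
    # Natural-gradient update: the plain gradient post-multiplied by W^T W,
    # which removes the matrix inversion from each step.
    W += eta * (np.eye(2) - (np.tanh(Y) @ Y.T) / T) @ W

Y = W @ X                                   # source estimates, up to scale/permutation
```

Note that the update never inverts W, which is precisely the speed-up the multiplier buys; the fixed point satisfies E[tanh(y) yᵀ] = I, so the estimates come out in an arbitrary order and scale.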

Parra and Sajda (2003) formalise the problem of finding the unmixing matrix as a generalised eigenvector decomposition of two matrices, where one of the matrices is the covariance matrix of the observations and the other is a symmetric matrix whose form depends on the statistical assumptions on the sources (non-stationary decorrelated, non-white decorrelated or non-Gaussian independent). This formulation combines subspace analysis and mixing matrix construction into a single computation, making for simple implementation.
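For the non-stationary decorrelated case, this reduces to a generalised eigenproblem on the covariances of two epochs. A sketch under that assumption (our own construction; the epoch split, power profiles and mixing matrix are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 20_000
half = np.arange(T) < T // 2
# Non-stationary sources: each source's power profile differs between the halves.
s1 = rng.standard_normal(T) * np.where(half, 3.0, 0.5)
s2 = rng.standard_normal(T) * np.where(half, 0.5, 3.0)
S = np.vstack([s1, s2])
A = np.array([[1.0, 0.7],
              [0.5, 1.0]])
X = A @ S

# Two symmetric matrices: observation covariances over the two epochs.
R1 = np.cov(X[:, : T // 2])
R2 = np.cov(X[:, T // 2 :])

# Rows of the unmixing matrix are generalised eigenvectors: R1 w = lambda R2 w.
_, eigvecs = np.linalg.eig(np.linalg.solve(R2, R1))
W = np.real(eigvecs).T
Y = W @ X
```

Because the generalised eigenvectors diagonalise both epoch covariances at once, Y recovers the sources up to permutation and scaling whenever the sources' power ratios across the epochs are distinct, which is the single-computation property noted above.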

Algorithms that rely solely on second order statistics have also been developed; one such algorithm is the SOBI algorithm (Belouchrani et al., 1997). SOBI is based on the unitary diagonalisation of the whitened data covariance matrix. This idea is extended to exploit the time coherence of the original sources, where multiple whitened covariance matrices are diagonalised simultaneously, and each matrix is calculated from observation data taken at different time delays. The algorithm considers the effect of noise on the observations by using a whitening matrix calculation that improves robustness to noise. Unitary diagonalisation can be explained as follows: given a whitening matrix V and the observations X, the covariance matrix of the whitened observations is E[VXXᴴVᴴ] = V R_X Vᴴ = V A R_S Aᴴ Vᴴ = I, where superscript H denotes the complex conjugate transpose of a matrix, R_X denotes the observed data covariance matrix, and R_S denotes the covariance matrix of the original sources. The source signals are assumed to have unit variance and to be uncorrelated, therefore R_S = I. It turns out that VA is a unitary matrix, therefore VA = U and the mixing process can be factored as A = V⁻¹U. The problem is then to find a unitary matrix that diagonalises the whitened covariance matrix, and for the case where there are multiple covariance matrices, the problem is to find a unitary matrix that simultaneously diagonalises all matrices generated from the different time delays. This joint approximate diagonalisation operation can be computed efficiently using a generalisation of the Jacobi technique for the exact diagonalisation of a single Hermitian matrix.
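The single-lag special case of this procedure (essentially the AMUSE algorithm rather than full SOBI, which jointly diagonalises several lags) can be sketched as follows; the sources, mixing matrix and lag are our own arbitrary choices:

```python
import numpy as np

T = 20_000
t = np.arange(T)
# Temporally coherent sources with distinct autocorrelations at the chosen lag.
S = np.vstack([np.sin(2 * np.pi * 0.01 * t),
               np.sign(np.sin(2 * np.pi * 0.003 * t))])
A = np.array([[1.0, 0.8],
              [0.3, 1.0]])
X = A @ S

# Whitening matrix V, so that the whitened covariance V R_X V^T is the identity.
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
V = np.diag(d ** -0.5) @ E.T
Z = V @ Xc

# A lagged covariance of the whitened data has the form U Lambda U^T with U
# unitary, so diagonalising its symmetrised estimate recovers U, and W = U^T V.
tau = 10
C = (Z[:, :-tau] @ Z[:, tau:].T) / (T - tau)
U = np.linalg.eigh(0.5 * (C + C.T))[1]
W = U.T @ V
Y = W @ Xc
```

SOBI proper replaces the single eigendecomposition with a Jacobi-style joint approximate diagonalisation over several lags, which is what buys its robustness when any one lag gives nearly equal eigenvalues.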

Sometimes a distinction is made between BSS and ICA techniques; for the purposes of this paper, we consider ICA to be a restrictive case of BSS where only the even-determined or over-determined case is considered, the mixing process is instantaneous, and the observed data are pre-whitened. Consequently the mixing matrix A is factorised into A = VR, where V is the whitening matrix and R is an orthogonal rotation. The factor R is determined by assuming independence between the original sources and selecting a matrix that will maximise independence in the observations, using some independence criterion. The fICA algorithm of Hyvärinen and Oja (1997) uses an independence criterion that forces non-Gaussianity of the estimated sources. Other criteria such as maximum likelihood (Gaeta and Lacoume, 1990) and mutual information (Comon, 1994) are presented in the literature; for an in-depth discussion of techniques particular to our definition of ICA, see Hyvärinen (1999) and Cardoso (1998).
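The role of whitening in this factorisation can be illustrated directly. In the sketch below (our own; sources, mixing matrix and constants are arbitrary), V is computed from the data covariance, after which the residual factor VA is approximately an orthogonal rotation, which is exactly the part an independence criterion must still resolve:

```python
import numpy as np

rng = np.random.default_rng(6)
# Unit-variance Laplacian sources (scale b gives variance 2*b^2).
S = rng.laplace(scale=1 / np.sqrt(2), size=(2, 50_000))
A = np.array([[2.0, 0.5],
              [0.3, 1.0]])
X = A @ S

# Whitening matrix from the eigendecomposition of the data covariance.
d, E = np.linalg.eigh(np.cov(X))
V = np.diag(d ** -0.5) @ E.T
Z = V @ X                     # whitened data: covariance is (close to) the identity

# What whitening leaves unresolved: V A is (close to) an orthogonal rotation R.
R = V @ A
print(np.round(np.cov(Z), 2))
print(np.round(R @ R.T, 2))
```

Second-order statistics alone cannot determine R, since any rotation of white data is still white; that is why an independence (e.g. non-Gaussianity) criterion is needed to finish the job.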

Figure 5. The frequency-azimuth spectrogram for the right channel. Two synthetic sources comprising 5 non-overlapping partials are used. The arrows indicate frequency dependent nulls caused by phase cancellation. (From Barry et al., 2004, Figure 1, with permission.)



B. Anechoic Mixing

B.1. Sparse Methods. A simple online approach to anechoic unmixing is presented by Aoki et al. (2001). Two sensor observations are transformed into the Fourier domain, and the amplitude and phase differences between the frequency bins of each observation are calculated. The amplitude and phase associated with each source is known in advance, and if the difference between these ideal values and the current calculated values is smaller than some specified threshold, then the associated frequency bin coefficient is assigned to that source. Although this method could not be described as a blind method, it gives a good insight into how sources can be separated in an anechoic environment.

    A more sophisticated blind approach called the DUET algorithm is presented in Jourjine et al. (2000) and further explored in Yilmaz and Rickard (2004). The observations of each sensor are transformed into the time–frequency domain using a short-time windowed Fourier transform, and the relative attenuation and delay values between the two observations are calculated from the ratio of corresponding time–frequency points. The main assumption is that, because of the sparsity of speech in the time–frequency domain, almost all mixture time–frequency points with significant magnitude are in fact due to only one of the original sources. Thus, each point in the stereo mixture is essentially a mixture of at most one source, and the goal is to calculate to which source each time–frequency point belongs. To do this, the relative attenuation and delay from each ratio of time–frequency points are placed into a 2D power-weighted histogram. Each source will cause a peak in the histogram, and these peaks correspond to the relative attenuation and delay mixing parameters of the sources. An attenuation–delay histogram that indicates the existence of six sources and their associated attenuation and delay parameters can be seen in Figure 6. Rickard et al. (2001) present an online algorithm based on DUET that uses a maximum-likelihood-based gradient search method to track the mixing parameters. Choi (2003) presents a real-time approach to anechoic unmixing using a method similar to DUET, where the 2D histogram of estimates is replaced by a k-means clustering algorithm.
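    As a concrete illustration (not the authors' implementation), the following sketch computes DUET-style relative attenuation and delay features using SciPy's STFT. The symmetric attenuation α = a − 1/a and the power weighting follow Yilmaz and Rickard (2004); the function name, sample rate, and window length are arbitrary choices.

    ```python
    import numpy as np
    from scipy.signal import stft

    def duet_features(x1, x2, fs=16000, nperseg=1024, eps=1e-12):
        """DUET-style relative attenuation and delay per time-frequency point.

        x1, x2 : the two anechoic sensor observations (1-D arrays).
        Returns the symmetric attenuation alpha = a - 1/a, the relative
        delay delta, and a power weighting for each TF point.
        """
        f, _, X1 = stft(x1, fs=fs, nperseg=nperseg)
        _, _, X2 = stft(x2, fs=fs, nperseg=nperseg)
        R = (X2 + eps) / (X1 + eps)            # ratio of corresponding TF points
        a = np.abs(R)                          # relative attenuation
        alpha = a - 1.0 / a                    # symmetric attenuation
        omega = 2 * np.pi * np.maximum(f, eps)[:, None]
        delta = -np.angle(R) / omega           # relative delay (seconds)
        weight = np.abs(X1 * X2)               # power weighting
        return alpha.ravel(), delta.ravel(), weight.ravel()

    # The peaks of a 2-D power-weighted histogram of (alpha, delta) give the
    # mixing parameters of the individual sources, e.g.:
    # H, a_edges, d_edges = np.histogram2d(alpha, delta, bins=50, weights=weight)
    ```

    For a single source observed with attenuation a and no delay, the weighted features concentrate at α = a − 1/a and δ = 0, which is exactly the histogram-peak behaviour described above.
    
    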

    Scatter plot representations can also be used in the anechoic case. Bofill (2002) presents a method for anechoic unmixing that uses the same line orientation identification approach used in the instantaneous case. The generative model is modified slightly so that the attenuation and delay parameters of the mixing process are segregated into two matrices. Observed data are transformed using a DFT, and the magnitude of the resultant complex data is used to create a scatter plot. The line orientations of the scatter plot are identified by the kernel density estimation method of Bofill and Zibulevsky (2001) (discussed in Section IIIA), and the resultant line orientation vectors are adjoined to form the attenuation matrix. The estimated delay matrix is formed by taking the real and imaginary coefficients assigned to source j in the previous operation and iteratively refining the delay parameter until the kernel function of the data is maximised. The procedure is repeated for the N sources, and the resultant delay parameters form the estimated delay matrix.

    B.2. Non-Sparse Methods. A number of modifications that allow existing instantaneous-case algorithms to be used in anechoic situations have been proposed. Platt and Faggin (1992) apply the mathematics of adaptive noise cancelling networks (Widrow et al., 1975) to the source separation network of Hérault and Jutten (1986). The adaptation of noise cancellation networks is based on an elegant notion: if a signal is corrupted by noise, it will have higher power than when it is not, because the noise power adds to the signal power. This principle, known as the minimum output power principle, is used to adapt an extension of the Hérault–Jutten network that contains delays in the feedback path. These delays cancel out interference from the delayed versions of other sources in the mixtures, and the new network is then capable of unmixing delayed mixtures. This model can also be extended to the echoic case.

    The BS-Infomax algorithm has also been extended to the anechoic case. Torkkola (1996b) extends BS-Infomax using the adaptive delay architecture of Platt and Faggin (1992). The resultant network maximises the entropy at the output of the network with respect to the weights and the delays. One problem with this approach is that the network is sometimes unable to converge to a proper separating solution. It is therefore crucial for convergence that the initial delay estimates are approximately in the correct range, which can be achieved by using prior knowledge of the recording situation. Torkkola (1996a) also extends the algorithm to the echoic case.

    The simultaneous diagonalisation technique of the SOBI algorithm has also been transposed to the anechoic case. Yeredor (2001) uses joint diagonalisation of spectral matrices to estimate the mixing coefficients as well as the delays. An analysis is provided in the frequency domain, resulting in a diagonalisation of a set of spectral matrices at different selected frequencies. The problem is to find a diagonalising matrix B, which can be factored into a matrix of mixing coefficients and delays. This problem can be solved by extending Yeredor (2000), which solves the joint diagonalisation problem by alternately minimising a least squares cost function with respect to a matrix of mixing coefficients A, delays D, and source spectra C. The algorithm iterates over three steps: first minimise the least squares cost function with respect to C with A and D fixed; then minimise

    Figure 6. The DUET smoothed power-weighted histogram for a six-speech-source mixture. Each peak corresponds to one source, and the peak locations correspond to the associated sources' mixing parameters. (From Yilmaz and Rickard, 2004, Figure 8, with permission.)


    with respect to each column of A, assuming the other columns and D and C are fixed; and finally minimise with respect to each column of D, assuming the other columns and A and C are fixed. The parameter estimates after convergence define the mixing process.

    C. Echoic Mixing. Lambert (1996) develops an FIR matrix algebra that can be used to solve problems involving echoic mixtures. The idea is to extend the algebra of scalar matrices to matrices of filters (time domain) or polynomials (frequency domain). Lambert presents techniques that compute the inverse of a matrix of FIR filters. These techniques are required to determine the inverse mixing process and have enjoyed wide deployment in separation methods for echoic mixtures.

    Westner and Bove (1999) discuss the separation of over-determined mixtures (more sensors than sources). A technique is presented that involves taking the impulse response of the room in which the observations are made and then parameterising the FIR filters in the mixing process with the inverse of the impulse response. As described in Lambert (1996), it is possible to use standard scalar matrix operations to invert FIR polynomial matrices, and in the case of an over-determined system, it is possible to use a pseudo-inverse to determine the unmixing matrix. The motivation for using more sensors than sources is quality of separation: results are presented showing that the signal-to-noise ratio of the separated sources improves as the number of sensors increases. This approach, where impulse responses are measured, diverges from the blind separation model. However, the technique indicates the information needed to estimate sources from echoic mixtures.

    A computationally efficient way to determine the inverse mixing process is to estimate the parameters of the mixing process in the frequency domain, where convolution becomes scalar multiplication: f(t) * g(t) ↔ f(ω) g(ω). It is possible to transform an FIR filter matrix into an FIR polynomial matrix by performing a frequency transform on its elements; FIR polynomial matrices are matrices whose elements are complex-valued polynomials (Lambert, 1996). A consequence of the transformation is that the frequency components observed at the sensors are in fact instantaneous mixtures of the original frequency components of the sources, i.e., each frequency bin of x(t) is an instantaneous mixture. The mixing process can be observed in the frequency domain by performing an FFT on x(t). When the mixing process is transformed in this way, the resultant FIR polynomial matrix can be modelled by a collection of complex-valued matrices, one per frequency, and the problem can therefore be viewed as solving multiple instantaneous mixing problems in the frequency domain.
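    The per-bin view can be checked numerically. In this hedged sketch, two sources are mixed through short hypothetical FIR filters; because the filters are much shorter than the STFT window, each STFT bin of the mixtures is, to a good approximation, an instantaneous complex mixture whose mixing matrix is the filters' frequency response.

    ```python
    import numpy as np
    from scipy.signal import stft, lfilter

    rng = np.random.default_rng(1)
    s = rng.standard_normal((2, 32768))               # two white sources

    # Hypothetical short FIR mixing filters A[i][j] (much shorter than the window).
    A = np.array([[[1.0, 0.5, 0.25], [0.3, 0.2, 0.1]],
                  [[0.4, 0.1, 0.05], [1.0, 0.6, 0.3]]])

    # Convolutive mixtures: x_i = sum_j A_ij * s_j.
    x = np.stack([sum(lfilter(A[i, j], [1.0], s[j]) for j in range(2))
                  for i in range(2)])

    nper = 1024
    _, _, S = stft(s, nperseg=nper)                   # shape (2, F, T)
    _, _, X = stft(x, nperseg=nper)

    # Frequency response of each filter at the STFT bin frequencies.
    Af = np.fft.rfft(A, n=nper, axis=-1)              # shape (2, 2, F)

    # Each bin is (approximately) an instantaneous complex mixture.
    X_pred = np.einsum('ijf,jft->ift', Af, S)
    err = np.linalg.norm(X - X_pred) / np.linalg.norm(X)
    print(err)   # small: the instantaneous-per-bin approximation holds
    ```

    The residual error comes from the windowing, which does not commute exactly with convolution; it shrinks as the window grows relative to the filter length.
    
    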

    Smaragdis (1998) uses this approach and presents an algorithm that employs BS-Infomax. The BS-Infomax learning rules are reformulated for complex-domain data and are used to learn the unmixing process at each frequency bin. One major problem with this approach is that of permutation and scaling: the columns of the resultant mixing matrices may not all correspond to the same source, which results in incorrect source estimation. Although a number of heuristics to alleviate this problem have been presented (Anemüller and Kollmeier, 2003), it remains an open problem. A flow diagram of this algorithm is shown in Figure 7.

    Murata et al. (2001) also formulated the problem of convolutive mixture separation in the frequency domain. The instantaneous mixtures at each frequency bin are separated using simultaneous diagonalisation similar to the SOBI algorithm. To solve the permutation problem, a method based on the temporal structure of signals that exploits the non-stationarity of speech is introduced. The separated components at each bin are combined with the components in the other bins that exhibit a similar envelope over some specified time. This method makes use of the fact that non-stationary signals exhibit correlation between the frequency bins of the signal's spectrum.

    Sahlin and Broman (1998) present an algorithm that is applied to the problem of separating a signal from noise. A mobile phone is fitted with two microphones, and the noise performance of the algorithm is tested in a range of echoic environments. Echoic separation is formulated as an optimisation problem in which the cost function is the sum of squared cross-correlations between the separated signals; the goal is to estimate appropriate filter weights that will achieve separation. Minimisation of the cost function is performed using a Gauss–Newton search, and when separation is achieved, the separated sources are mutually uncorrelated. One problem that arises in convolutive mixture separation is determining the filter order required for separation. One way to tackle this problem is to use a technique known as leaking (Ljung and Sjöberg, 1992), which involves modifying the cost function, with the benefit that there is no performance degradation due to over-parameterisation of the algorithm.

    Convolutive mixtures also appear in digital information transmission. A typical assumption in this area is that the number of sources is less than the number of observations. Icart and Gautier (1996) present a method in which the generative model of Eq. (5) is modelled using an auto-regressive moving average (ARMA) model. The first stage of the algorithm is to estimate the convolutive mixture using a linear predictor, which is used to create a filter that provides an instantaneous mixture of the source signals. Sources can then be estimated from this new mixture using any instantaneous mixture method. A similar approach is also presented in Gorokhov et al. (1996).

    The problem of blind separation of convolutive sources can be compared with that of blind channel equalisation (sometimes called blind deconvolution). Blind equalisation is an adaptive filtering technique that can restore a signal corrupted by a multipath channel. Lambert (1995) extended this idea to the multichannel problem of blind separation, where separation is achieved by decorrelating multiple source signals. Lambert formulates a multichannel blind equalisation cost function and makes use of the Bussgang property (Bussgang, 1952) (also discussed in Fiori (2002)) to achieve separation (Lambert and Bell, 1997).

    Figure 7. Flow diagram of the Smaragdis frequency-domain algorithm for echoic separation. (From Smaragdis, 1998, Figure 3, with permission.)


    The Bussgang property can be stated as follows: two random variables x_i and x_j are independent if and only if E[f(x_i) g(x_j)] = E[f(x_i)] E[g(x_j)] for all functions f(·) and g(·).
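    This factorisation criterion is easy to check numerically; the sketch below (with arbitrary choices of f and g) contrasts an independent pair of variables with a fully dependent one.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    xi = rng.standard_normal(200_000)
    xj = rng.standard_normal(200_000)        # drawn independently of xi

    f = np.tanh                              # arbitrary example nonlinearities
    g = lambda u: u ** 3

    # Independent pair: the expectation factorises (up to sampling noise).
    lhs = np.mean(f(xi) * g(xj))
    rhs = np.mean(f(xi)) * np.mean(g(xj))

    # Fully dependent pair (x_j = x_i): the factorisation clearly fails.
    lhs_dep = np.mean(f(xi) * g(xi))
    rhs_dep = np.mean(f(xi)) * np.mean(g(xi))

    print(abs(lhs - rhs), abs(lhs_dep - rhs_dep))
    ```

    A single (f, g) pair only tests a necessary condition; independence requires the factorisation to hold for all function pairs, which is what separation criteria built on this property exploit.
    
    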

    Baxter and McWhirter (2003) present an algorithm that is a generalisation of instantaneous ICA, where hidden rotation matrix estimation is replaced by estimation of a Hidden Paraunitary Matrix (HPM). An initial process of strong decorrelation and spectral equalisation (whitening), based entirely on second order statistics, is followed by the identification of an HPM, which necessitates the use of fourth order statistics. The concept of an Elementary Paraunitary Matrix (EPM) is introduced; this takes the form of a number of delays applied to one of the channels, followed by a rotation. The HPM is built up as a sequence of EPMs. The parameters required for the HPM are determined using a novel method termed the sequential best rotation algorithm. At each iteration of the algorithm, a delay and rotation angle are computed, and between them they specify an EPM. These parameters are chosen to maximise statistical independence (using any instantaneous ICA method) at the output of that stage. The number of stages is not specified in advance; instead, the algorithm continues until no further improvement can be achieved, which differs from previous methods (Vaidyanathan, 1993) of paraunitary matrix estimation.

    Monaural source separation of convolutive sources has also received some attention. A biologically inspired technique that exploits spectral cues is presented in Pearlmutter and Zador (2004). When sound reaches an organism's inner ear, the sound's spectrum is coloured by the head and the shape of the ears. This spectral colouring or filtering is known as the head-related transfer function (HRTF) and is defined by the direction of the sound and the acoustic properties of the ear. When using the HRTF, it is assumed that each source has a unique position in space. The auditory scene is segregated into a number of different locations, each with an associated HRTF filter. Sources coming from these locations will be coloured by the associated filter, which indicates the source's position; thus, identification of the HRTF filter applied to each source will lead to separation of the sources. This can be achieved by sparsely representing the sources in an over-complete (under-determined) dictionary, which is performed as follows: a known HRTF is used with a given signal dictionary, where each dictionary element is filtered by the HRTF filter at each location in the auditory scene, resulting in an over-complete signal dictionary. The monaural mixture signal is then decomposed into these dictionary elements by L1-norm minimisation (see Section IVA), and the resultant coefficients indicate the location of the sources in the signal. The dictionary elements of the identified locations are then scaled by their coefficients and linearly combined to create the estimated sources. By estimating the coefficients using a post-HRTF (sensor space) dictionary but reconstructing using a pre-HRTF (source space) dictionary, separation and deconvolution can be achieved simultaneously.

    Careful placement of microphones can greatly simplify the problem of separating echoic sources. Sanchis et al. (2004) discuss how to convert echoic mixtures to instantaneous mixtures using two stereophonic microphones: the microphones are positioned at the same point in space (making the observations instantaneous at each microphone) but pointed in different directions. The directional properties of the microphones cause the mixtures observed at the microphones to have a differing contribution from each source (if regular non-directional microphones were used, each microphone would record the same observation). Thus, the problem is converted to the instantaneous case. In some practical situations, the mixtures are not truly instantaneous because of reverberation and are considered to be still echoic; the advantage of the microphone arrangement in this case is that fewer taps are needed for the separation filters. Similar work is presented by Katayama et al. (2004).

    IV. SEPARATION TECHNIQUES

    Subsequent to the estimation of the mixing matrix, separation of the underlying sources can be performed. The complexity of the separation process is influenced by the mixing model used and the relative number of sources and sensors. The next section presents a number of techniques used in the separation stage of BSS algorithms.

    A. Instantaneous Separation. Separation in the even-determined case can be achieved by a linear transformation using the unmixing matrix W,

    ŝ(t) = W x(t),    (8)

    where ŝ(t) holds the estimated sources at time t, Â is the estimated mixing matrix, and W = Â⁻¹ up to permutation and scaling of the rows.

    For the under-determined case, which is usually restricted to sparse methods (see Table I), a linear transformation is not possible, since Â ŝ(t) = x(t) has more unknowns in ŝ than knowns in x and is therefore non-invertible. Consequently, some non-linear technique is needed to estimate the sources. Usually, these techniques involve assigning observed data x(t) to the columns of Â that characterise each source. The most rudimentary technique is to hard-assign each data point to only one source based on some measure of proximity to the columns of Â (Vielva et al., 2000, 2002; Lin et al., 1997). A logical extension of this technique is the partial assignment of each data point to multiple sources. This is generally done by minimisation of the L1-norm (sometimes referred to as the shortest-path algorithm (Bofill and Zibulevsky, 2000) or basis pursuit (Chen et al., 1998)). L1-norm minimisation is a piecewise linear operation that partially assigns the energy of x(t) to the M columns of Â that form a cone around x(t) in R^M space. The remaining N − M columns are assigned zero coefficients; therefore, L1-norm minimisation provides more accurate source estimates when the sources are sparse, in this case, when the number of sources active at any one time is less than or equal to M. L1-norm minimisation can be accomplished by formulating the problem as a linear program

    minimise ‖ŝ(t)‖₁ subject to Â ŝ(t) = x(t),    (9)

    where the observed data point x(t) is in an appropriate domain and the ŝ(t) coefficients, properly arranged, constitute the estimated sources, Ŝ = [ŝ(1) | ⋯ | ŝ(T)]. A detailed discussion of signal recovery using L1-norm minimisation is presented by Takigawa et al. (2004). When using complex data, as in the case of an FFT representation, the real and imaginary parts are treated separately, thus doubling the number of coefficients.
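    The linear program of Eq. (9) can be posed in standard form by splitting ŝ into its positive and negative parts. The sketch below uses SciPy's linprog; the mixing matrix and source point are hypothetical illustrations, not drawn from the surveyed papers.

    ```python
    import numpy as np
    from scipy.optimize import linprog

    def l1_separate(A, x):
        """Recover a sparse source point s from x = A s by L1 minimisation.

        Writing s = u - v with u, v >= 0 gives ||s||_1 = sum(u) + sum(v),
        so basis pursuit becomes a standard linear program.
        """
        M, N = A.shape
        c = np.ones(2 * N)                         # minimise sum(u) + sum(v)
        A_eq = np.hstack([A, -A])                  # A u - A v = x
        res = linprog(c, A_eq=A_eq, b_eq=x, bounds=[(0, None)] * (2 * N))
        u, v = res.x[:N], res.x[N:]
        return u - v

    # Hypothetical 2-sensor, 4-source mixing matrix with unit-norm columns.
    r = 1.0 / np.sqrt(2.0)
    A = np.array([[1.0, 0.0, r,  r],
                  [0.0, 1.0, r, -r]])
    s_true = np.array([0.0, 0.0, 2.0, 0.0])        # a single active source
    x = A @ s_true
    s_hat = l1_separate(A, x)                      # recovers s_true
    ```

    With unit-norm columns and fewer active sources than sensors, the L1 solution coincides with the true sparse point, as the text suggests; with more simultaneously active sources the recovery degrades.
    
    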

    Source separation can be formalised in a probabilistic framework: given x(t) and Â, a standard approach for reconstructing s(t) is maximum a posteriori estimation, which finds an ŝ(t) that maximises the posterior probability. If the prior probability P(s(t)) is Laplacian


    (P(s(t)) ∝ e^{−|s(t)|}), then this probabilistic approach becomes the L1-norm minimisation of Eq. (9).

    B. Anechoic Separation. For algorithms that are constrained to the even-determined case (Yeredor, 2001), separation can be achieved by a linear transformation in the frequency domain. Observations are transformed using a DFT, the underlying sources are estimated by multiplying by the inverse of the product of the amplitude and delay matrices, and the sources are then transformed back to the time domain.

    For algorithms such as DUET (Jourjine et al., 2000) that use time–frequency representations of the observations, one powerful approach, known as time–frequency masking (Yilmaz and Rickard, 2004), is to partition the time–frequency domain into regions corresponding to individual sources. These regions are defined by proximity to the estimated delay and amplitude parameters, and each source is then estimated by synthesising the coefficients contained in its region. This technique relies on the assumption of source disjointness; sources are said to be disjoint if the intersection of the supports of the sources is empty, i.e., each source occupies an exclusive region in the time–frequency domain. Although this condition is not fully met in practice, particularly in acoustics, it has proven a very useful and practical approximation (Rickard and Yilmaz, 2002).
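    A minimal sketch of binary time–frequency masking in this spirit (not the DUET implementation itself): peak locations in the attenuation–delay plane are assumed to be given, e.g. from the histogram, and every time–frequency point of the first mixture is hard-assigned to its closest peak.

    ```python
    import numpy as np
    from scipy.signal import stft, istft

    def mask_and_resynthesise(x1, x2, peaks, fs=16000, nperseg=1024, eps=1e-12):
        """Binary time-frequency masking sketch (DUET-style demixing).

        peaks : list of (alpha, delta) mixing-parameter estimates, assumed
                given (e.g. the attenuation-delay histogram peaks).
        Each TF point is assigned to the closest peak, and each source is
        resynthesised from the first mixture's coefficients in its region.
        """
        f, _, X1 = stft(x1, fs=fs, nperseg=nperseg)
        _, _, X2 = stft(x2, fs=fs, nperseg=nperseg)
        R = (X2 + eps) / (X1 + eps)                 # ratio of TF points
        a = np.abs(R)
        alpha = a - 1.0 / a                         # symmetric attenuation
        omega = 2 * np.pi * np.maximum(f, eps)[:, None]
        delta = -np.angle(R) / omega                # relative delay

        # Distance of every TF point to every candidate (alpha, delta) peak.
        d = np.stack([(alpha - pa) ** 2 + (delta - pd) ** 2 for pa, pd in peaks])
        labels = np.argmin(d, axis=0)               # hard (binary) assignment

        sources = []
        for k in range(len(peaks)):
            _, sk = istft(np.where(labels == k, X1, 0), fs=fs, nperseg=nperseg)
            sources.append(sk)
        return sources
    ```

    When the sources are (approximately) disjoint in the time–frequency plane, each resynthesised region contains essentially one source, which is the disjointness assumption discussed above.
    
    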

    Bofill (2002) presents a method that reduces the anechoic case to the instantaneous case, resulting in individual matrices for the amplitudes and delays. Sources can then be estimated using the methods presented in the previous subsection. Bofill uses a technique similar to L1-norm minimisation called second order conic programming, the difference being that the latter can include the magnitudes of complex numbers in the L1-norm, thereby eliminating the need for, and the approximation made by, putting the real and imaginary components separately into the L1-norm.

    C. Echoic Separation. Separation of echoic mixtures is generally only attempted in the even-determined case, where it can be achieved by convolving the mixture observations with an inverse mixing process W,

    ŝ(t) = W * x(t).    (10)

    The most widely deployed method to obtain an inverse mixing process is the FIR matrix inversion of Lambert (1995). Matrices of filters have inverses of the same form as matrices of reals, and a convolutive mixing process with two mixtures and two sources would have an inverse of the form

    W = Â⁻¹ = (Ã_11 Ã_22 − Ã_12 Ã_21)⁻¹ [ Ã_22  −Ã_12 ; −Ã_21  Ã_11 ].    (11)

    Thus, in the low-noise even-determined case, estimation of the convolutive mixing process allows the construction of an unmixing process.
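    One convenient way to realise such an inverse is bin-by-bin in the frequency domain, where Eq. (11) reduces to a scalar 2 × 2 inversion per bin. The following sketch (with hypothetical, well-conditioned filters) illustrates this; it is an approximation, since the exact inverse of an FIR matrix is generally IIR.

    ```python
    import numpy as np

    def fir_matrix_inverse(A, n_fft=512):
        """Invert a 2x2 matrix of FIR filters bin-by-bin in the frequency domain.

        A : array of shape (2, 2, L) holding the FIR taps of each A_ij.
        Returns W of shape (2, 2, n_fft): an n_fft-tap (circular)
        approximation to the generally IIR inverse, obtained from the
        scalar 2x2 inverse at each frequency bin.
        """
        Af = np.fft.fft(A, n=n_fft, axis=-1)
        det = Af[0, 0] * Af[1, 1] - Af[0, 1] * Af[1, 0]
        Wf = np.empty_like(Af)
        Wf[0, 0], Wf[1, 1] = Af[1, 1] / det, Af[0, 0] / det
        Wf[0, 1], Wf[1, 0] = -Af[0, 1] / det, -Af[1, 0] / det
        return np.fft.ifft(Wf, axis=-1).real   # real filters for real mixing

    # Hypothetical well-conditioned mixing filters.
    A = np.array([[[1.0, 0.3], [0.2, 0.1]],
                  [[0.1, 0.2], [1.0, 0.4]]])
    W = fir_matrix_inverse(A)
    ```

    The quality of the unmixing filters depends on the determinant filter having no zeros near the unit circle; near-singular bins inflate the inverse, which is one practical difficulty of echoic unmixing.
    
    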

    For algorithms such as Icart and Gautier (1996) and Gorokhov et al. (1996), which reduce the problem to one of instantaneous mixing, the methods described in Sections IIIA and IVA are sufficient to unmix the original convolved sources. For the blind equalisation approach of Lambert (1995), source estimates can be obtained by making use of feedback, where the current best estimates of Ã_21 and Ã_12 are used to obtain estimates of the sources. The equations for the two-sensor, two-source case are as follows:

    ŝ_1(t) = x_1(t) − ŝ_2(t) * Ã_21,
    ŝ_2(t) = x_2(t) − ŝ_1(t) * Ã_12.    (12)

    V. JOINT ESTIMATION

    The previous two sections outlined a staged approach to BSS, where source separation is performed after estimating the mixing process. It is also possible to estimate the sources and mixing process concurrently by selecting an appropriate cost function and defining the problem as an optimisation problem. Such a cost function can be constructed using a measure of distance between X and the product AS. One typical measure is the square of the Euclidean distance between X and AS:

    C(X; A, S) = ‖X − AS‖² = Σ_ij (X_ij − (AS)_ij)².    (13)

    This is insufficient to fully constrain the solution, since an optimum can be transformed into another equivalent optimum by S ↦ US, A ↦ AU⁻¹, where U is any invertible matrix. However, we are now in a position to formulate X = AS as an optimisation problem. We will focus our discussion on the emerging field of non-negative matrix factorisation (NMF) (Lee and Seung, 1999), which is an optimisation problem of the following form:

    minimise C(X; A, S) subject to A, S ≥ 0.

    The cost function is convex in A alone or in S alone, but not in both variables together. Therefore, it is unrealistic to expect an algorithm to find a global minimum for both variables together, and algorithms usually alternate updates of A and S.
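    A common instance of such alternating updates is the multiplicative rule pair of Lee and Seung for the Euclidean cost; the following is a minimal sketch, with an arbitrary random test matrix rather than real data.

    ```python
    import numpy as np

    def nmf(X, rank, n_iter=500, eps=1e-9, seed=0):
        """Alternating multiplicative updates (Lee-Seung, Euclidean cost)
        for min ||X - AS||^2 subject to A, S >= 0."""
        rng = np.random.default_rng(seed)
        m, n = X.shape
        A = rng.random((m, rank)) + eps
        S = rng.random((rank, n)) + eps
        for _ in range(n_iter):
            S *= (A.T @ X) / (A.T @ A @ S + eps)   # update S with A fixed
            A *= (X @ S.T) / (A @ S @ S.T + eps)   # update A with S fixed
        return A, S

    # Each update cannot increase the cost, but only a local minimum can be
    # expected, since the cost is not jointly convex in (A, S).
    X = np.random.default_rng(1).random((20, 30))
    A, S = nmf(X, rank=5)
    ```

    Because the updates are multiplicative, entries initialised non-negative stay non-negative, which is how the constraint is enforced without explicit projection.
    
    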

    A. Non-Negative Matrix Factorisation. Non-negative matrix factorisation is a technique for the decomposition of multivariate data. It is based on the observation that in many data processing tasks, negative numbers are physically meaningless and contradict physical realities. For example, when considering grey-scale images, the principal component analysis decomposition of an image may result in basis vectors that have both positive and negative components. The image is then represented by a linear combination of these basis vectors weighted by both positive and negative coefficients, with some basis vectors being cancelled out by others. Since a negative basis has no real-world interpretation in a grey-scale image context, this has led researchers to argue that the search for a basis should be

    ŝ(t) = argmax_{Â s(t) = x(t)} P(s(t) | x(t), Â)
         = argmax_{Â s(t) = x(t)} P(x(t) | s(t), Â) P(s(t))
         = argmax_{Â s(t) = x(t)} P(s(t))
         = argmax_{Â s(t) = x(t)} exp(−(|s_1(t)| + ⋯ + |s_N(t)|))
         = argmin_{Â s(t) = x(t)} (|s_1(t)| + ⋯ + |s_N(t)|)
         = argmin_{Â s(t) = x(t)} ‖s(t)‖₁.


    replaced by a projected gradient descent calculation, with the resultant update elements set to zero if negative and each column normalised. An experiment using simple 3 × 3 pixel images, in which each image is treated as a 9-element vector, is presented (see Fig. 8). A data matrix X with 12 data images in its columns is constructed from random non-negative data S_orig using the generative model X = A_orig S_orig. An over-complete basis of 10 images is manually constructed from a complete basis of 6 images, with the remaining 4 images constructed from combinations of 2 images from the complete basis; these images are then placed in the columns of A_orig. NSC is compared with NMF, and the results show that, provided the dimensions of A are appropriate, NSC correctly identifies all the over-complete basis vectors of the data set, while NMF identifies only the complete basis vectors. The images in X can be perfectly described by the complete subset of the over-complete basis vectors, and the advantage of identifying the entire over-complete basis is that a sparser description of the data can be achieved.

    An interesting application of NSC to monaural sound separation using a temporal continuity assumption is presented in Virtanen (2003). The temporal continuity assumption is motivated by the human auditory system (Bregman, 1990) and assumes that the mixture sources have constant spectra with time-varying gain. This algorithm considers only monaural signals; therefore, a different generative model is used. The observed signal is split into T frames of F samples, the power spectrum within each frame is calculated using a DFT, and these frames are adjoined to form X. The matrix of constant source spectra S has N × F dimensions, with the spectra contained in its rows. A is a T × N matrix containing time-varying spectral gain information and defines the mixing of the constant spectra in S for each frame. It is also assumed that the sound sources are inactive most of the time, resulting in a sparse A matrix. The observations X are generated by X = AS. The additional temporal continuity requirement can be included as an additional constraint on the NSC cost function of Eq. (17):

    C(X; A, S) = w_1 ‖X − AS‖₂² + w_2 ‖A‖₁ + w_3 Σ_{t=1}^{T} Σ_{i=1}^{M} |a_{t−1,i} − a_{t,i}|,    (18)

    where w_1, w_2 and w_3 are constants of proportionality that balance the cost function between accuracy of reconstruction, sparseness, and temporal continuity, respectively. Temporal continuity is achieved by the third term of Eq. (18), which ensures that the gain differences between consecutive frames are minimised. It is also worth noting that the sparseness constraint has shifted from S to A. The optimisation algorithm used is similar to Hoyer (2002) and is applied to the task of separation and transcription of drum sources from real-world polyphonic music signals. A similar method that makes use of temporal structure in a different way is presented by Smaragdis (2004).
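    The three-term cost of Eq. (18) is straightforward to evaluate; the sketch below uses hypothetical weights and the shapes described in the text (this is only the objective, not Virtanen's optimisation procedure).

    ```python
    import numpy as np

    def nsc_temporal_cost(X, A, S, w=(1.0, 0.1, 0.1)):
        """Evaluate the cost of Eq. (18): weighted reconstruction error,
        sparseness of the gains A, and temporal continuity of the gains.

        Shapes follow the text: X is T x F, A is T x N, S is N x F;
        row t of A holds the source gains for frame t. Weights are
        hypothetical.
        """
        w1, w2, w3 = w
        recon = w1 * np.sum((X - A @ S) ** 2)             # ||X - AS||_2^2
        sparse = w2 * np.sum(np.abs(A))                   # ||A||_1
        contin = w3 * np.sum(np.abs(np.diff(A, axis=0)))  # sum |a_{t-1,i} - a_{t,i}|
        return recon + sparse + contin

    # With X = AS exactly and time-constant gains, only the sparseness
    # term contributes to the cost.
    A = np.ones((3, 2))
    S = np.ones((2, 4))
    out = nsc_temporal_cost(A @ S, A, S)
    ```

    Increasing w_3 penalises frame-to-frame gain jumps, which is how the continuity assumption steers the factorisation toward slowly varying source activations.
    
    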

    Non-negative constraints have also found their way into a traditional ICA framework. Plumbley (2003) presents an algorithm that places a constraint of non-negativity on the sources. It is assumed that the observations are noiseless and the sources are well-grounded, i.e., they have a non-vanishing probability density function in the neighbourhood of zero. It is shown that the combination of pre-whitening and calculation of an appropriate orthonormal rotation is both necessary and sufficient for finding the set of underlying sources, provided that the sources are well-grounded. Two cost functions are presented that can be used to determine the rotation matrix; these cost functions can then be used by any suitable optimisation algorithm. One interesting insight that emerges is that, in contrast with NMF, there is no requirement for the unwhitened inputs and mixing matrix to be non-negative. Although the discussion concentrates on sources with well-grounded distributions, it is expected that the algorithm will work particularly well with sparse distributions. This method diverges from the joint estimation approach because sources are estimated by a linear transformation, but it illustrates how non-negativity constraints can be introduced into an ICA framework. Non-negativity constraints have also been applied to ICA for hyperspectral image analysis (Parra et al., 2000).

    VI. FURTHER READING

    The authors would like to recommend the following books, which they have found useful in their research. For a detailed discussion of ICA techniques and principles, consult Roberts and Everson (2001) and Hyvärinen et al. (2001). For an introduction to blind deconvolution and associated algorithms, including Bussgang techniques, consult Haykin (1994). A modern overview of the different approaches taken (including psychology, physiology, engineering and computer science) to the cocktail party problem is presented in Divenyi (2005). For background material on blind signal processing, including multipath blind deconvolution, refer to Cichocki and Amari (2002). For a solid entry-level introduction to machine learning principles and techniques, consult Duda et al. (2001).

    VII. CONCLUSION

    The problem of BSS is a challenging and interesting one, which has enjoyed much attention over the last two decades. The complexity of the problem is influenced by environmental considerations, the number of sensor observations, and the number of underlying sources to be separated. This survey presented an overview of the techniques used and the different approaches taken to solve the problem. The problem can be solved by a staged approach, in which the mixing process is estimated first and the sources are estimated subsequently. Alternatively, the problem can be viewed as an optimisation problem in which the mixing process and sources are estimated jointly. The adoption of an assumption of sparsity, by way of a suitable basis transformation, ameliorates the problem somewhat by enabling algorithms to efficiently exploit the structure of the mixing process. The sparsity assumption also provides for a solution to the under-determined case. It is envisaged that the ability of sparse representations to efficiently capture useful statistical structure in a signal will lead to an increased number of practical signal processing algorithms in which sparse representations play a key role.

    ACKNOWLEDGMENTS

    Higher Education Authority of Ireland (An tÚdarás Um Ard-Oideachas) and Science Foundation Ireland grant 00/PI.1/C067.

    REFERENCES

    S. Amari, A. Cichocki, and H.H. Yang, A new learning algorithm for blind signal separation, In Advances in neural information processing systems, Vol. 8, MIT Press, Cambridge, MA, 1996.

    J. Anemüller and B. Kollmeier, Adaptive separation of acoustic sources for anechoic conditions: A constrained frequency domain approach, Speech Commun 39 (2003), 79–95.


    M. Aoki, M. Okamoto, S. Aoki, H. Matsui, T. Sakurai, and Y. Kaneda, Sound source segregation based on estimating incident angle of each frequency component of input signals acquired by multiple microphones, Acoust Sci Technol 22 (2001), 149–157.

    A.D. Back and A.S. Weigend, A first application of independent component analysis to extracting structure from stock returns, Int J Neural Syst 8 (1997), 473–484.

    D. Barry, B. Lawlor, and E. Coyle, Sound source separation: Azimuth discrimination and resynthesis, In Proc 7th Int Conf on Digital Audio Effects (DAFX-04), 2004, pp. 240–244.

    P.D. Baxter and J.G. McWhirter, Blind signal separation of convolutive mixtures, In The Thirty-Seventh Asilomar Conf on Signals, Systems and Computers, 2003, pp. 124–128.

    A.J. Bell and T.J. Sejnowski, An information-maximization approach to blind separation and blind deconvolution, Neural Comput 7 (1995), 1129–1159.

    A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and É. Moulines, A blind source separation technique based on second order statistics, IEEE Trans Signal Proc 45 (1997), 434–444.

    A. Belouchrani and J.-F. Cardoso, Maximum likelihood source separation for discrete sources, In Proc EUSIPCO, 1994, pp. 768–771.

    P. Boll, Underdetermined blind separation of delayed sound sources in thefrequency domain. Technical Report UPC-DAC-2001-14, Universitat Poli-tecnica de Catalunya, Barcelona, 2002.

    P. Bofill and M. Zibulevsky, Blind separation of more sources than mixtures using the sparsity of the short-time Fourier transform, In 2nd Int Workshop on Independent Component Analysis and Blind Signal Separation, Helsinki, Finland, June 19–20, 2000, pp. 87–92.

    P. Bofill and M. Zibulevsky, Underdetermined blind source separation using sparse representations, Signal Process 81 (2001), 2353–2362.

    A.S. Bregman, Auditory scene analysis: The perceptual organization of sound, MIT Press, Cambridge, MA, 1990. ISBN 0-262-02297-4.

    J.J. Bussgang, Crosscorrelation functions of amplitude-distorted Gaussian signals, Technical Report 216, MIT Research Laboratory of Electronics, Cambridge, MA, 1952.

    J.-F. Cardoso, Eigen-structure of the fourth-order cumulant tensor with application to the blind source separation problem, In Proc IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP90), Albuquerque, NM, 1990, pp. 2655–2658.

    J.-F. Cardoso, Infomax and maximum likelihood for blind source separation, IEEE Signal Process Lett 4 (1997), 112–114.

    J.-F. Cardoso, Blind signal separation: Statistical principles, Proc IEEE 86(10) (1998), 2009–2025. Available at ftp://sig.enst.fr/pub/jfc/Papers/ProcIEEE.us.ps.gz.

    J.-F. Cardoso, J. Delabrouille, and G. Patanchon, Independent component analysis of the cosmic microwave background, In Fourth Int Symp on Independent Component Analysis and Blind Signal Separation, Nara, Japan, April 1–4, 2003, pp. 1111–1116.

    J.-F. Cardoso and A. Souloumiac, Blind beamforming for non-Gaussian signals, IEE Proc F 140(6) (1993), 362–370. Available at ftp://tsi.enst.fr/pub/jfc/Papers/iee.ps.gz.

    S.S. Chen, D.L. Donoho, and M.A. Saunders, Atomic decomposition by basis pursuit, SIAM J Sci Comput 20 (1998), 33–61.

    C.E. Cherry, Some experiments in the recognition of speech, with one and two ears, J Acoust Soc Am 25 (1953), 975–979.

    C. Choi, Real-time binaural blind source separation, In Fourth Int Symp on Independent Component Analysis and Blind Signal Separation, Nara, Japan, April 1–4, 2003, pp. 567–572.

    A. Cichocki and S.-I. Amari, Adaptive blind signal and image processing: Learning algorithms and applications, John Wiley & Sons, New York, 2002. ISBN 0-471-60791-6.

    P. Comon, Independent component analysis: A new concept, Signal Process 36 (1994), 287–314.

    G. Darmois, Analyse générale des liaisons stochastiques, Rev Inst Int Stat 21 (1953), 2–8.

    A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J R Stat Soc Ser B 39 (1977), 1–38.

    M.R. DeWeese, M. Wehr, and A.M. Zador, Binary spiking in auditory cortex, J Neurosci 23 (2003), 7940–7949.

    P. Divenyi (Editor), Speech separation by humans and machines, Kluwer Academic Publishers, Boston, 2005. ISBN 1-4020-8001-8.

    D. Donoho and V. Stodden, When does non-negative matrix factorization give a correct decomposition into parts? In Advances in neural information processing systems, Vol. 16, MIT Press, Cambridge, MA, 2004. Available at http://books.nips.cc/papers/files/nips16/NIPS2003_LT10.pdf.

    J. Du, C.-H. Lee, H.-K. Lee, and Y.-H. Suh, BSS: A new approach for watermark attack, In Proc 4th Int Symp on Multimedia Software Engineering (ISMSE 2002), 2002, pp. 182–187.

    R.O. Duda, P.E. Hart, and D.G. Stork, Pattern classification, 2nd edition, John Wiley & Sons, New York, 2001.

    S. Fiori, Notes on cost functions and estimators for Bussgang adaptive blind equalization, Eur Trans Telecommun (ETT) 13 (2002), 631–634.

    P. Földiák and M. Young, Sparse coding in the primate cortex, In The handbook of brain theory and neural networks, MIT Press, Cambridge, MA, 1995, pp. 895–898.

    M. Gaeta and J.-L. Lacoume, Source separation without prior knowledge: The maximum likelihood solution, In Proc EUSIPCO90, 1990, pp. 621–624.

    A. Gorokhov, P. Loubaton, and E. Moulines, Second order blind equalization in multiple input multiple output FIR systems: A weighted least squares approach, In Proc IEEE Int Conf on Acoustics, Speech and Signal Processing, Atlanta, GA, May 7–10, 1996, pp. 2415–2418.

    S.S. Haykin, Blind deconvolution, In Information and system sciences, Prentice-Hall, Englewood Cliffs, NJ, 1994. ISBN 0-13-087362-4.

    J. Herault and C. Jutten, Space or time adaptive signal processing by neural models, In Proc AIP Conf on Neural Networks for Computing, American Institute of Physics, 1986, pp. 206–211.

    P.O. Hoyer, Non-negative sparse coding, In IEEE Workshop on Neural Networks for Signal Processing, 2002.

    M. van Hulle, Kernel-based equiprobabilistic topographic map formation, Neural Comput 10 (1998), 1847–1871.

    A. Hyvärinen, Survey on independent component analysis, Neural Comput Surv 2 (1999), 94–128.

    A. Hyvärinen, J. Karhunen, and E. Oja, Independent component analysis, John Wiley & Sons, New York, 2001.

    A. Hyvärinen and E. Oja, A fast fixed-point algorithm for independent component analysis, Neural Comput 9 (1997), 1483–1492.

    S. Icart and R. Gautier, Blind separation of convolutive mixtures using second and fourth order moments, In Proc IEEE Int Conf on Acoustics, Speech, and Signal Processing (ICASSP96), Atlanta, GA, USA, May 1996, pp. 3018–3021.

    A. Jourjine, S. Rickard, and O. Yilmaz, Blind separation of disjoint orthogonal signals: Demixing N sources from 2 mixtures, In Proc IEEE Conf on Acoustics, Speech, and Signal Processing (ICASSP2000), June 2000, Vol. 5, pp. 2985–2988.

    T.-P. Jung, C. Humphries, T.-W. Lee, M.J. McKeown, V. Iragui, S. Makeig, and T.J. Sejnowski, Removing electroencephalographic artifacts by blind source separation, Psychophysiology 37 (2000), 163–178.

    T.-P. Jung, S. Makeig, M. Westerfield, J. Townsend, E. Courchesne, and T.J. Sejnowski, Analyzing and visualizing single-trial event-related potentials, In Advances in neural information processing systems, Vol. 11, MIT Press, Cambridge, MA, 1999, pp. 118–124.


    C. Jutten and J. Herault, Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture, Signal Process 24 (1991), 1–10.

    Y. Katayama, M. Ito, A.K. Barros, Y. Takeuchi, T. Matsumoto, H. Kudo, N. Ohnishi, and T. Mukai, Closely arranged directional microphone for source separation: Effectiveness in reduction of the number of taps and preventing factors, In Fifth Int Conf on Independent Component Analysis, Granada, Spain, Sept. 22–24, 2004, Lecture Notes in Computer Science, Vol. 3195, Springer-Verlag.

    M. Kearns, Y. Mansour, and A.Y. Ng, An information-theoretic analysis of hard and soft assignment methods for clustering, In Proc Thirteenth Conf on Uncertainty in Artificial Intelligence, 1997, pp. 282–293.

    K.P. Körding, P. König, and D.J. Klein, Learning of sparse auditory receptive fields, In Int Joint Conf on Neural Networks, 2002. Available at http://www.koerding.com/pubs/KKKijcnn01.pdf.

    R.H. Lambert, A new method for source separation, In Proc IEEE Conf on Acoustics, Speech, and Signal Processing, Detroit, MI, 1995, pp. 2116–2119.

    R.H. Lambert, Multichannel blind deconvolution: FIR matrix algebra and separation of multipath mixtures, Ph.D. thesis, Univ of Southern California, 1996.

    R. Lambert and A. Bell, Blind separation of multiple speakers in a multipath environment, In Proc IEEE Int Conf on Acoustics, Speech, and Signal Processing (ICASSP97), Munich, Germany, April 1997, pp. 423–426.

    D.D. Lee and H.S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (1999), 788–791.

    D.D. Lee and H.S. Seung, Algorithms for non-negative matrix factorization, In Advances in neural information processing systems, Vol. 13, MIT Press, Cambridge, MA, 2001, pp. 556–562. Available at citeseer.ist.psu.edu/lee00algorithms.html.

    T.-W. Lee, M.S. Lewicki, M. Girolami, and T.J. Sejnowski, Blind source separation of more sources than mixtures using overcomplete representations, IEEE Signal Process Lett 4 (1999), 87–90.

    M. Lewicki and T. Sejnowski, Learning overcomplete representations, In Advances in neural information processing systems, Vol. 10, MIT Press, Cambridge, MA, 1998, pp. 556–562.

    J.K. Lin, D.G. Grier, and J.D. Cowan, Feature extraction approach to blind source separation, In IEEE Workshop on Neural Networks for Signal Processing (NNSP), 1997, pp. 398–405.

    R. Linsker, An application of the principle of maximum information preservation to linear systems, In Advances in neural information processing systems, Morgan Kaufmann, San Francisco, CA, 1989, pp. 186–194.

    L. Ljung and J. Sjöberg, A comment on leaking in adaptive algorithms, In 4th IFAC Int Symp on Adaptive Systems in Control and Signal Processing, ACASP92, 1992, pp. 377–382.

    J. MacQueen, Some methods for classification and analysis of multivariate observations, In L.M. Le Cam and J. Neyman (Editors), Proc of the Fifth Berkeley Symp on Mathematical Statistics and Probability, Berkeley, Vol. 1, University of California Press, 1967, pp. 281–297.

    M.J. McKeown, S. Makeig, G.G. Brown, T.P. Jung, S.S. Kindermann, A.J. Bell, and T.J. Sejnowski, Analysis of fMRI data by blind separation into independent spatial components, Hum Brain Mapp 6 (1998), 160–188.

    N. Murata, S. Ikeda, and A. Ziehe, An approach to blind source separation based on temporal structure of speech signals, Neurocomputing 41 (2001), 1–24.

    K. Nakadai, H.G. Okuno, and H. Kitano, Real-time sound source localization and separation for robot audition, In Proc of 2002 Int Conf on Spoken Language Processing (ICSLP-2002), 2002, pp. 193–196.

    P.D. O'Grady and B.A. Pearlmutter, Hard-LOST: Modified k-means for oriented lines, In Proc Irish Signals and Systems Conf, Belfast, June 30–July 2, 2004a, pp. 247–252.

    P.D. O'Grady and B.A. Pearlmutter, Soft-LOST: EM on a mixture of oriented lines, In Fifth Int Conf on Independent Component Analysis, Granada, Spain, Sept. 22–24, 2004b, Lecture Notes in Computer Science, Vol. 3195, Springer-Verlag, pp. 430–436.

    L. Parra, K.-R. Mueller, C. Spence, A. Ziehe, and P. Sajda, Unmixing hyperspectral data, In Advances in neural information processing systems, Vol. 13, MIT Press, Cambridge, MA, 2000, pp. 942–948.

    L. Parra and P. Sajda, Blind source separation via generalized eigenvalue decomposition, J Machine Learn Res 4 (2003), 1261–1269.

    B.A. Pearlmutter and S. Jaramillo, Progress in blind separation of magnetoencephalographic data, In Independent component analyses, wavelets, and neural networks, Proceedings of the SPIE, Vol. 5102, 2003, pp. 129–134.

    B.A. Pearlmutter and L.C. Parra, A context-sensitive generalization of ICA, In Int Conf on Neural Information Processing, Hong Kong, Sept. 24–27, 1996, Springer-Verlag, pp. 151–157. Available at http://www-bcl.cs.may.ie/~barak/papers/iconip-96-cica.ps.gz.

    B.A. Pearlmutter and A.M. Zador, Monaural source separation using spectral cues, In Fifth Int Conf on Independent Component Analysis, Granada, Spain, Sept. 22–24, 2004, Lecture Notes in Computer Science, Vol. 3195, Springer-Verlag, pp. 478–485.

    K. Pearson, On lines and planes of closest fit to systems of points in space, Philos Mag 2 (1901), 559–572.

    D.T. Pham, P. Garat, and C. Jutten, Separation of a mixture of independent sources through a maximum likelihood approach, In European Signal Processing Conf, 1992, pp. 771–774.

    J.C. Platt and F. Faggin, Networks for the separation of sources that are superimposed and delayed, In Advances in neural information processing systems, Vol. 4, Morgan Kaufmann, San Francisco, CA, 1992, pp. 730–737.

    M.D. Plumbley, Algorithms for non-negative independent component analysis, IEEE Trans Neural Netw 14 (2003), 534–543.

    S. Rickard, R. Balan, and J. Rosca, Real-time time-frequency based blind source separation, In 3rd Int Conf on Independent Component Analysis and Blind Source Separation (ICA2001), Dec 2001.

    S. Rickard and O. Yilmaz, On the approximate W-disjoint orthogonality of speech, In J. Principe and H. Bourlard (Editors), Proc IEEE Int Conf on Acoustics, Speech and Signal Processing, 2002, pp. 529–532.

    S.J. Roberts and R.M. Everson (Editors), Independent component analysis: Principles and practice, Cambridge University Press, Cambridge, UK, 2001.

    Z. Roth and Y. Baram, Multi-dimensional density shaping by sigmoids, IEEE Trans Neural Netw 7 (1996), 1291–1298.

    S.T. Roweis, One microphone source separation, In Advances in neural information processing systems, Vol. 13, MIT Press, Cambridge, MA, 2001, pp. 793–799.

    H. Sahlin and H. Broman, Separation of real-world signals, Signal Process 64 (1998), 103–113.

    J.M. Sanchis, F. Castells, and J.J. Rieta, Convolutive acoustic mixtures approximation to an instantaneous model using a stereo boundary microphone configuration, In Fifth Int Conf on Independent Component Analysis, Granada, Spain, Sept. 22–24, 2004, Lecture Notes in Computer Science, Vol. 3195, Springer-Verlag, pp. 816–823.

    P. Smaragdis, Blind separation of convolved mixtures in the frequency domain, Neurocomputing 22 (1998), 21–34.

    P. Smaragdis, Non-negative matrix factor deconvolution; extraction of multiple sound sources from monophonic inputs, In Fifth Int Conf on Independent Component Analysis, Granada, Spain, Sept. 22–24, 2004, Lecture Notes in Computer Science, Vol. 3195, Springer-Verlag, pp. 494–499.

    I. Takigawa, M. Kudo, A. Nakamura, and J. Toyama, On the minimum l1-norm signal recovery in underdetermined source separation, In Fifth Int Conf on Independent Component Analysis, Granada, Spain, Sept. 22–24, 2004, Lecture Notes in Computer Science, Vol. 3195, Springer-Verlag, pp. 193–200.


    A.C. Tang, B.A. Pearlmutter, M. Zibulevsky, and S.A. Carter, Blind separation of multichannel neuromagnetic responses, Neurocomputing 32–33 (2000), 1115–1120.

    F.J. Theis, A geometric algorithm for overcomplete linear ICA, In Workshop Report I of the Graduiertenkolleg, Windberg, 2001, pp. 67–76.

    K. Torkkola, Blind separation of convolved sources based on information maximization, In Proc IEEE Workshop on Neural Networks for Signal Processing (NNSP96), Kyoto, Japan, 1996a, pp. 423–432.

    K. Torkkola, Blind separation of delayed sources based on information maximization, In Proc IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP96), Atlanta, GA, 1996b, pp. 3509–3512.

    P.P. Vaidyanathan, Multirate systems and filter banks, Prentice-Hall, Englewood Cliffs, NJ, 1993.

    M. van Hulle, Clustering approach to square and non-square blind source separation, In IEEE Workshop on Neural Networks for Signal Processing (NNSP), 1999, pp. 315–323.

    L. Vielva, D. Erdogmus, C. Pantaleon, I. Santamaria, J. Pereda, and J. Principe, Underdetermined blind source separation in a time-varying environment, In J. Principe and H. Bourlard (Editors), Proc IEEE Int Conf on Acoustics, Speech and Signal Processing, Vol. 3, 2002, pp. 3049–3052.

    L. Vielva, D. Erdogmus, and J. Principe, Underdetermined blind source separation using a probabilistic source sparsity model, In 2nd Int Workshop on Independent Component Analysis and Blind Signal Separation, Helsinki, Finland, June 19–20, 2000, pp. 675–679.

    R. Vigário, J. Särelä, V. Jousmäki, M. Hämäläinen, and E. Oja, Independent component approach to the analysis of EEG and MEG recordings, IEEE Trans Biomed Eng 47 (2000), 589–593.

    T. Virtanen, Sound source separation using sparse coding with temporal continuity objective, In Proc Int Computer Music Conf (ICMC 2003), 2003.

    A. Westner and V.M. Bove, Blind separation of real world audio signals using overdetermined mixtures, In First Int Conf on Independent Component Analysis, 1999.

    B. Widrow, J. Glover, J. McCool, J. Kaunitz, C. Williams, R. Hearn, J. Zeidler, E. Dong, and R. Goodlin, Adaptive noise cancelling: Principles and applications, Proc IEEE 63(12) (1975), 1692–1716.

    N. Wiener, Extrapolation, interpolation and smoothing of stationary time series, MIT Press, Cambridge, MA, 1949.

    G. Wübbeler, A. Ziehe, B.-M. Mackert, K.-R. Müller, L. Trahms, and G. Curio, Independent component analysis of non-invasively recorded cortical magnetic DC-fields in humans, IEEE Trans Biomed Eng 47 (2000), 594–599.

    A. Yeredor, Approximate joint diagonalization using non-orthogonal matrices, Proc ICA 2000, June 2000, pp. 33–38.

    A. Yeredor, Blind source separation with pure delay mixtures, In Int Workshop on Independent Component Analysis and Blind Source Separation Conference (ICA), San Diego, CA, Dec. 9–12, 2001.

    O. Yilmaz and S. Rickard, Blind separation of speech mixtures via time-frequency masking, IEEE Trans Signal Process 52 (2004), 1830–1847.

    M. Zibulevsky, P. Kisilev, Y.Y. Zeevi, and B.A. Pearlmutter, Blind source separation via multinode sparse representation, In Advances in neural information processing systems, Vol. 14, MIT Press, Cambridge, MA, 2002, pp. 1049–1056.

    M. Zibulevsky and B.A. Pearlmutter, Blind source separation by sparse decomposition in a signal dictionary, Neural Comput 13 (2001), 863–882.

    A. Ziehe, K.-R. Müller, G. Nolte, B.-M. Mackert, and G. Curio, Artifact reduction in magnetoneurography based on time-delayed second order correlations, IEEE Trans Biomed Eng 47 (2000), 75–87.
