2nd-Order Statistics + a 3rd Data Dimension: Just Weight and See!

Arie Yeredor

Sept. 28th, 2010

Outline

• Introduction

• A Better Outline (better motivated & better detailed…)

The Classical Mixture Model

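• Static, Square-Invertible, Real-Valued, Noiseless:

$$\mathbf{X} = \mathbf{A}\mathbf{S}, \qquad \mathbf{x}[n] = \mathbf{A}\,\mathbf{s}[n], \quad n = 1, 2, \ldots, N$$

• All sources $s_1, s_2, \ldots, s_K$ are zero-mean, mutually independent.

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \vdots \\ \mathbf{x}_K^T \end{bmatrix}, \qquad \mathbf{S} = \begin{bmatrix} \mathbf{s}_1^T \\ \mathbf{s}_2^T \\ \vdots \\ \mathbf{s}_K^T \end{bmatrix} \qquad (K \times N)$$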

SOS and ICA

• Traditionally, Second-Order Statistics have played “Second Fiddle” to Higher-Order Statistics in ICA.

• In classical ICA, where the sources have i.i.d. time-structures (or any different temporal structures which are ignored), SOS are insufficient for separation:

• SOS do about “half” of the job, enabling spatial whitening (“sphering”), but unable to resolve a possible residual orthogonal mixing.

SOS and ICA

• However, when the sources have diverse temporal statistical structures, SOS can be sufficient for separation.

• Moreover, if the sources are Gaussian, SOS-based separation can even be optimal (in the sense of minimum residual Interference to Source Ratio (ISR)).

SOS and ICA

• Classical SOS approaches can roughly be divided into two categories:

– Approaches exploiting the special congruence structure of the mixtures’ correlation matrices through (approximate) joint diagonalization;

– Approaches based on the principle of Maximum Likelihood (which only in the Gaussian case can be based on SOS alone).

• Yet, some other approaches offer an interesting insight into the borderline between the two.

The sources’ temporal SOS structure

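Let

$$\mathbf{s}_k = \begin{bmatrix} s_k[1] & s_k[2] & \cdots & s_k[N] \end{bmatrix}^T$$

denote the vector of the observed segment of the $k$-th source, and let

$$\mathbf{C}_k = E\left[\mathbf{s}_k \mathbf{s}_k^T\right] = \begin{bmatrix} E\left[s_k^2[1]\right] & E\left[s_k[1]s_k[2]\right] & \cdots & E\left[s_k[1]s_k[N]\right] \\ E\left[s_k[2]s_k[1]\right] & E\left[s_k^2[2]\right] & \cdots & E\left[s_k[2]s_k[N]\right] \\ \vdots & \vdots & \ddots & \vdots \\ E\left[s_k[N]s_k[1]\right] & E\left[s_k[N]s_k[2]\right] & \cdots & E\left[s_k^2[N]\right] \end{bmatrix} \qquad (N \times N)$$

denote its correlation (covariance) matrix.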

A stationary white source

A stationary MA source

A stationary AR source

A block-stationary source,iid in each block

A block-stationary source,MA in each block

A cyclostationary source

A general nonstationary source

Most existing classical methods address the case of stationary sources:

Joint-Diagonalization Based Approaches

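• First(?*) there was AMUSE (Tong et al., 1990):

– Estimate $\hat{\mathbf{R}}_x[0] \triangleq \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}[n]\,\mathbf{x}^T[n]$;

– Estimate $\hat{\mathbf{R}}_x[\tau] \triangleq \frac{1}{2(N-\tau)}\sum_{n=1}^{N-\tau} \left(\mathbf{x}[n+\tau]\,\mathbf{x}^T[n] + \mathbf{x}[n]\,\mathbf{x}^T[n+\tau]\right)$ for some $\tau \neq 0$;

– Obtain a consistent estimate of $\mathbf{A}$ from the exact joint diagonalization of $\hat{\mathbf{R}}_x[0]$ and $\hat{\mathbf{R}}_x[\tau]$.

*) also proposed two years earlier by Fêty and Van Uffelen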
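The AMUSE steps above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming zero-mean mixtures stored as a $K \times N$ array; the function name `amuse` and the default lag are illustrative choices, not part of the talk. The exact joint diagonalization is carried out here by whitening with the zero-lag matrix and then eigen-decomposing the whitened, symmetrized lagged matrix.

```python
import numpy as np

def amuse(X, lag=1):
    """Sketch of AMUSE: return a demixing matrix for zero-mean mixtures X (K x N)."""
    K, N = X.shape
    R0 = X @ X.T / N                                  # zero-lag correlation estimate
    R1 = X[:, lag:] @ X[:, :-lag].T / (N - lag)       # lagged correlation estimate
    R1 = 0.5 * (R1 + R1.T)                            # symmetrized version
    d, U = np.linalg.eigh(R0)
    W = (U / np.sqrt(d)).T                            # whitening: W R0 W^T = I
    _, V = np.linalg.eigh(W @ R1 @ W.T)               # residual orthogonal factor
    return V.T @ W                                    # jointly diagonalizes R0 and R1
```

The separation is only defined up to the usual scale, sign and permutation ambiguities, and it requires the sources' normalized correlations at the chosen lag to be distinct.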

Joint-Diagonalization Based Approaches

• SOBI (Belouchrani et al., 1997), considered the approximate joint diagonalization of several estimated correlation matrices (at several lags);

• TDSEP (Ziehe and Müller, 1998) considered the joint diagonalization of linear combinations of estimated correlation matrices.

Likelihood Based Approaches

• QML (Pham and Garat, 1997): Solve a nonlinear set of “estimating equations” constructed from correlation matrices between filtered versions of the mixtures;

• EML (Dégerine and Zaïdi, 2004): Assume that the sources are all Auto-Regressive (AR), and jointly estimate their AR parameters and the mixing matrix, resulting in “estimating equations” similar in form to QML.

“Borderline” Approaches

• GMI (Pham, 2001): Joint diagonalization of cross-spectral matrices in the frequency-domain, using a special off-diagonality criterion (which reflects the Kullback-Leibler divergence between these matrices and their diagonal forms).

• WASOBI (Weights-Adapted SOBI, Yeredor, 2000; Tichavský et al., 2006): Apply optimal weighting to the approximate joint diagonalization used in SOBI. Can be regarded (asymptotically) as Maximum-Likelihood estimation with respect to the estimated correlation matrices, which are asymptotically Gaussian.

What about non-stationary sources?

• A few particular forms of non-stationarity have been considered, e.g.:

– Block-stationary sources, iid in each block: BGL (Pham and Cardoso, 2001) – resulting in joint diagonalization of zero-lag correlation matrices estimated over different segments.

– Block-stationary sources, AR in each block: BARBI (Tichavský et al., 2009) – weighted AJD of lagged correlation matrices estimated in different blocks.

– Cyclostationary sources – e.g., Liang et al., 1997; Abed-Meraim et al., 2001; Ferréol, 2004; Pham, 2007; CheViet et al., 2008…

– Sources with distinct time-frequency representations: Belouchrani & Amin, 1998; Zhang & Amin, 2000; Giulieri et al., 2003; Fadaili et al., 2005…

But…

• Quite often, the sources are neither:

– Stationary

– Block-stationary

– Cyclostationary

– Sparse in Time-Frequency

– …

• Then what?...

(this is the main subject of this talk)

We shall assume:

• Each ($k$-th) source has its own general covariance matrix $\mathbf{C}_k$, which is not specially structured in any way;

• All sources are jointly Gaussian and uncorrelated with each other (hence, mutually independent).

A “Fully-Blind” Scenario

• The mixing matrix, the sources and their covariance matrices are unknown.

A “Semi-Blind” Scenario

• The mixing matrix and the sources are unknown.

• The sources’ covariance matrices are known.

The Mixture Model Again

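• The zero-mean sources $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_K$ are mutually uncorrelated, $E\left[\mathbf{s}_k \mathbf{s}_\ell^T\right] = \mathbf{0}$ for $k \neq \ell$, and each has its own general $N \times N$ covariance matrix $\mathbf{C}_k = E\left[\mathbf{s}_k \mathbf{s}_k^T\right]$.

$$\mathbf{X} = \mathbf{A}\mathbf{S}$$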

We are now ready for the “better” outline (better motivated, better detailed…)

A “Better” Outline

• The induced Cramér-Rao Lower Bound (iCRLB) on the attainable Interference to Source Ratios (ISRs)

• The semi-blind case:

– Derivation of the iCRLB

– ML estimation leading to “Hybrid Exact / Approximate joint Diagonalization” (HEAD)

– An example

• The fully-blind case:

– The iCRLB is essentially the same

– Iterative Quasi-ML estimation based on consistent estimation of the sources’ covariance matrices, e.g. from multiple “snapshots”

• Comparative performance demonstration

• Things worth weighting for?

What is an “Induced” CRLB?

• The CRLB is a well-known lower bound on the mean square error (MSE) attainable in unbiased estimation of a vector of (deterministic) parameters.

• In the context of ICA the full set of parameters includes the elements of the mixing matrix $\mathbf{A}$ or, equivalently, of the “demixing matrix” $\mathbf{B} = \mathbf{A}^{-1}$, and, in a fully-blind scenario, also some parameters related to the sources’ distributions.

• However, the MSE in the estimation of these parameters is usually of very little interest.

What is an “Induced” CRLB?

• A more interesting measure of performance is the Interference to Source Ratio (ISR), measuring the residual relative energy of other sources present in each separated source.

• Let $\hat{\mathbf{B}}(\mathbf{X})$ denote the estimated separation matrix.

• We define the “contamination matrix” as the matrix

$$\mathbf{T}(\mathbf{X}) \triangleq \hat{\mathbf{B}}(\mathbf{X})\,\mathbf{A}$$

What is an “Induced” CRLB?

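• We have, for the reconstructed sources,

$$\hat{\mathbf{S}}(\mathbf{X}) = \hat{\mathbf{B}}(\mathbf{X})\,\mathbf{X} = \hat{\mathbf{B}}(\mathbf{X})\,\mathbf{A}\mathbf{S} = \mathbf{T}(\mathbf{X})\,\mathbf{S}$$

so $\mathbf{T}(\mathbf{X})$ is the “residual mixing” matrix after separation.

• Thus, the ratio

$$\frac{T_{k,\ell}(\mathbf{X})}{T_{k,k}(\mathbf{X})}$$

describes the ratio between the gain of the $\ell$-th source and that of the $k$-th source in the reconstruction of the latter.

What is an “Induced” CRLB?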

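• The $(k,\ell)$-th element of the ISR matrix is defined as the mean square value of this ratio, multiplied by the ratio of energies of the $\ell$-th and $k$-th sources:

$$\mathrm{ISR}_{k,\ell} \triangleq E\left[\frac{T_{k,\ell}^2(\mathbf{X})}{T_{k,k}^2(\mathbf{X})}\right] \cdot \frac{E\left[\mathbf{s}_\ell^T \mathbf{s}_\ell\right]}{E\left[\mathbf{s}_k^T \mathbf{s}_k\right]}, \qquad k \neq \ell$$

– Note that this quantity is no longer data-dependent.

– Also, it is insensitive to any scaling ambiguity in the rows of $\mathbf{T}(\mathbf{X})$.

What is an “Induced” CRLB?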

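• Under a “small errors” assumption, we have $\hat{\mathbf{A}}(\mathbf{X}) \approx \mathbf{A}$, and therefore $\hat{\mathbf{B}}(\mathbf{X}) \approx \mathbf{A}^{-1}$, so $\mathbf{T}(\mathbf{X}) \approx \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$, namely $T_{k,\ell}(\mathbf{X}) \approx 0$ (for $k \neq \ell$) and $T_{k,k}(\mathbf{X}) \approx 1$.

The ISR can therefore be closely approximated as

$$\mathrm{ISR}_{k,\ell} \approx E\left[T_{k,\ell}^2(\mathbf{X})\right] \cdot \frac{E\left[\mathbf{s}_\ell^T \mathbf{s}_\ell\right]}{E\left[\mathbf{s}_k^T \mathbf{s}_k\right]}, \qquad k \neq \ell$$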
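To make the ISR definition concrete, the following sketch evaluates the squared gain ratio for a single trial, weighted by the sources' energy ratio; averaging its output over many independent trials would give an empirical ISR matrix. The function name `single_trial_isr` and its argument layout are assumptions made for this illustration.

```python
import numpy as np

def single_trial_isr(B_hat, A, C_list):
    """Single-trial ISR terms: (T[k,l] / T[k,k])^2 * Trace(C_l) / Trace(C_k)."""
    T = B_hat @ A                                      # contamination matrix T = B_hat A
    energy = np.array([np.trace(C) for C in C_list])   # E[s_k^T s_k] = Trace(C_k)
    ratio = T / np.diag(T)[:, None]                    # gain of source l relative to source k
    isr = ratio**2 * energy[None, :] / energy[:, None]
    np.fill_diagonal(isr, 0.0)                         # diagonal terms are not interference
    return isr
```

Because each row of the contamination matrix is normalized by its own diagonal element, rescaling the rows of `B_hat` leaves the result unchanged.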

What is an “Induced” CRLB?

Since $\mathbf{T}(\mathbf{X}) = \hat{\mathbf{B}}(\mathbf{X})\,\mathbf{A}$, each element of $\mathbf{T}(\mathbf{X})$ is a linear function of $\hat{\mathbf{B}}(\mathbf{X})$, and therefore the second moment of each element of $\mathbf{T}(\mathbf{X})$ can be determined from the second moments of the elements of $\hat{\mathbf{B}}(\mathbf{X})$, which are all bounded by the CRLB.

Therefore, the CRLB on estimation of the elements of $\mathbf{B}$ “induces” a bound on the ISR through this linear relation.

Derivation of the iCRLB

• A key property of the iCRLB is its equivariance with respect to the mixing matrix.

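• An equivariant estimator of $\mathbf{B}$ is any estimator $\hat{\mathbf{B}}(\mathbf{X})$ satisfying $\hat{\mathbf{B}}(\mathbf{Q}\mathbf{X}) = \hat{\mathbf{B}}(\mathbf{X})\,\mathbf{Q}^{-1}$ for any nonsingular $\mathbf{Q}$, since it then follows that

$$\mathbf{T}(\mathbf{X}) = \hat{\mathbf{B}}(\mathbf{X})\,\mathbf{A} = \hat{\mathbf{B}}(\mathbf{A}\mathbf{S})\,\mathbf{A} = \hat{\mathbf{B}}(\mathbf{S})\,\mathbf{A}^{-1}\mathbf{A} = \hat{\mathbf{B}}(\mathbf{S})$$

so the ISR matrix depends only on properties of the sources’ distributions.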

Derivation of the iCRLB

• Most (but certainly not all) popular ICA algorithms lead to equivariant separation.

• Does this mean that the iCRLB is equivariant?

– Generally – no;

– However, it can be shown that the ML separator is equivariant, and since the CRLB (and therefore also the iCRLB) is attained (at least asymptotically) by the ML separator, it follows that the iCRLB is equivariant as well.

Derivation of the iCRLB

• An appealing consequence of the equivariance is that we may compute the iCRLB for any value of the mixing-matrix, knowing that the same result applies to any (nonsingular) mixing matrix.

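• We choose the convenient non-mixing condition $\mathbf{A} = \mathbf{B} = \mathbf{I}$.

• In order to compute the iCRLB we rearrange the mixing relation:

$$\mathbf{X} = \mathbf{A}\mathbf{S} \quad \Longleftrightarrow \quad \mathbf{x} = (\mathbf{A} \otimes \mathbf{I})\,\mathbf{s}$$

where $\mathbf{x}$ and $\mathbf{s}$ stack the transposed rows of $\mathbf{X}$ and $\mathbf{S}$, respectively (shown here for $K = 4$):

$$\mathbf{x} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \\ \mathbf{x}_4 \end{bmatrix}, \qquad \mathbf{s} = \begin{bmatrix} \mathbf{s}_1 \\ \mathbf{s}_2 \\ \mathbf{s}_3 \\ \mathbf{s}_4 \end{bmatrix}$$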

Derivation of the iCRLB (contd.)

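• The covariance matrix of the zero-mean sources vector $\mathbf{s}$ is given by the block-diagonal matrix

$$\mathbf{C}_s = \begin{bmatrix} \mathbf{C}_1 & & & \\ & \mathbf{C}_2 & & \\ & & \ddots & \\ & & & \mathbf{C}_K \end{bmatrix}$$

• The covariance matrix of the zero-mean observations vector $\mathbf{x}$ is therefore given by

$$\mathbf{C}_x = (\mathbf{A} \otimes \mathbf{I})\,\mathbf{C}_s\,(\mathbf{A}^T \otimes \mathbf{I})$$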

Derivation of the iCRLB (contd.)

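• Therefore, under our Gaussian model assumption the observations vector is also a zero-mean Gaussian vector,

$$\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}_x)$$

Note that

$$\mathbf{C}_x = (\mathbf{A} \otimes \mathbf{I})\,\mathbf{C}_s\,(\mathbf{A}^T \otimes \mathbf{I}) \quad \Longrightarrow \quad \mathbf{C}_x^{-1} = (\mathbf{B}^T \otimes \mathbf{I})\,\mathbf{C}_s^{-1}\,(\mathbf{B} \otimes \mathbf{I})$$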

Derivation of the iCRLB (contd.)

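• Beginning with the semi-blind scenario, the only unknown parameters are the elements of $\mathbf{B}$, and the respective elements of the Fisher Information Matrix (FIM) are well-known to be given in this case by

$$J_{B_{k,\ell},B_{p,q}} = \frac{1}{2}\mathrm{Trace}\left( \mathbf{C}_x^{-1}\,\frac{\partial \mathbf{C}_x}{\partial B_{k,\ell}}\,\mathbf{C}_x^{-1}\,\frac{\partial \mathbf{C}_x}{\partial B_{p,q}} \right)$$

(a $K^2 \times K^2$ matrix).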

Derivation of the iCRLB (contd.)

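• Using the relation

$$\mathbf{C}_x = (\mathbf{A} \otimes \mathbf{I})\,\mathbf{C}_s\,(\mathbf{A}^T \otimes \mathbf{I}) \quad \Longrightarrow \quad \mathbf{C}_x^{-1} = (\mathbf{B}^T \otimes \mathbf{I})\,\mathbf{C}_s^{-1}\,(\mathbf{B} \otimes \mathbf{I})$$

and the non-mixing assumption $\mathbf{A} = \mathbf{B} = \mathbf{I}$, these elements can be conveniently expressed as

$$J_{B_{k,\ell},B_{p,q}} = \begin{cases} \mathrm{Trace}\left(\mathbf{C}_\ell \mathbf{C}_k^{-1}\right) & (p,q) = (k,\ell),\ k \neq \ell \\ N & (q,p) = (k,\ell),\ k \neq \ell \\ 2N & k = \ell = p = q \\ 0 & \text{otherwise} \end{cases}$$

Let us define $\phi_{k,\ell} \triangleq \frac{1}{N}\mathrm{Trace}\left(\mathbf{C}_k \mathbf{C}_\ell^{-1}\right)$.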

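Derivation of the iCRLB (contd.)

• Thus, with particular ordering of the elements of $\mathbf{B}$, the FIM can take a block-diagonal form, with $K(K-1)/2$ blocks of $2 \times 2$ and $K$ diagonal terms (shown here for $K = 3$):

$$\boldsymbol{\theta} = \begin{bmatrix} B_{1,2} & B_{2,1} & B_{1,3} & B_{3,1} & B_{2,3} & B_{3,2} & B_{1,1} & B_{2,2} & B_{3,3} \end{bmatrix}^T$$

$$\mathbf{J}_{\boldsymbol{\theta}} = N \cdot \begin{bmatrix} \phi_{2,1} & 1 & & & & & & & \\ 1 & \phi_{1,2} & & & & & & & \\ & & \phi_{3,1} & 1 & & & & & \\ & & 1 & \phi_{1,3} & & & & & \\ & & & & \phi_{3,2} & 1 & & & \\ & & & & 1 & \phi_{2,3} & & & \\ & & & & & & 2 & & \\ & & & & & & & 2 & \\ & & & & & & & & 2 \end{bmatrix}$$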

Derivation of the iCRLB (contd.)

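• So

$$\mathrm{cov}\begin{bmatrix} \hat{B}_{k,\ell} \\ \hat{B}_{\ell,k} \end{bmatrix} \geq \frac{1}{N}\begin{bmatrix} \phi_{\ell,k} & 1 \\ 1 & \phi_{k,\ell} \end{bmatrix}^{-1}, \qquad \mathrm{var}\left[\hat{B}_{k,k}\right] \geq \frac{1}{2N}$$

and therefore

$$\mathrm{var}\left[\hat{B}_{k,\ell}\right] \geq \frac{1}{N} \cdot \frac{\phi_{k,\ell}}{\phi_{k,\ell}\,\phi_{\ell,k} - 1}, \qquad k \neq \ell$$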

The iCRLB

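• Under the non-mixing condition $\mathbf{A} = \mathbf{I}$ we have $\mathbf{T}(\mathbf{X}) = \hat{\mathbf{B}}(\mathbf{X})\,\mathbf{A} = \hat{\mathbf{B}}(\mathbf{X})$, and therefore the CRLB induces the following iCRLB:

$$\mathrm{ISR}_{k,\ell} \approx \frac{E\left[\mathbf{s}_\ell^T \mathbf{s}_\ell\right]}{E\left[\mathbf{s}_k^T \mathbf{s}_k\right]} \cdot \mathrm{var}\left[\hat{B}_{k,\ell}\right] \geq \frac{1}{N} \cdot \frac{\phi_{k,\ell}}{\phi_{k,\ell}\,\phi_{\ell,k} - 1} \cdot \frac{\mathrm{Trace}(\mathbf{C}_\ell)}{\mathrm{Trace}(\mathbf{C}_k)}$$

(with $\phi_{k,\ell} \triangleq \frac{1}{N}\mathrm{Trace}\left(\mathbf{C}_k \mathbf{C}_\ell^{-1}\right)$).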
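As a numerical illustration of the bound, the sketch below evaluates the iCRLB for every ordered pair of sources from their covariance matrices, using $\phi_{k,\ell} = \frac{1}{N}\mathrm{Trace}(\mathbf{C}_k \mathbf{C}_\ell^{-1})$. The function name `icrlb` and the small numerical threshold used to flag (near-)proportional covariance matrices are assumptions made for this example.

```python
import numpy as np

def icrlb(C_list):
    """K x K matrix of iCRLB values on ISR[k, l]; the diagonal is left as NaN."""
    K = len(C_list)
    N = C_list[0].shape[0]
    energy = np.array([np.trace(C) for C in C_list])
    phi = np.empty((K, K))                        # phi[k, l] = Trace(C_k C_l^{-1}) / N
    for l in range(K):
        C_l_inv = np.linalg.inv(C_list[l])
        for k in range(K):
            phi[k, l] = np.trace(C_list[k] @ C_l_inv) / N
    bound = np.full((K, K), np.nan)
    for k in range(K):
        for l in range(K):
            if k == l:
                continue
            gap = phi[k, l] * phi[l, k] - 1.0
            if gap <= 1e-12:                      # proportional covariances: infinite bound
                bound[k, l] = np.inf
            else:
                bound[k, l] = phi[k, l] / gap / N * energy[l] / energy[k]
    return bound
```

Rescaling any single covariance matrix leaves the output unchanged, and two proportional covariance matrices produce an infinite bound for that pair.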

The iCRLB – Key Properties

• Invariance with respect to the mixing matrix;

• Invariance with respect to other sources: The bound on $\mathrm{ISR}_{k,\ell}$ depends only on the covariance matrices of the $k$-th and $\ell$-th sources, and is unaffected by the other sources;

• Invariance with respect to scale: The ISR bound is invariant to any scaling of the sources. Note that this property is not shared by the bound on the variance of elements of $\mathbf{T}(\mathbf{X})$ alone.

The iCRLB – Key Properties

• Non-identifiability condition: If sources $k$ and $\ell$ have similar covariance matrices (i.e., $\mathbf{C}_k$ is a scaled version of $\mathbf{C}_\ell$), then $\phi_{k,\ell}\,\phi_{\ell,k} = 1$, implying an infinite bound on $\mathrm{ISR}_{k,\ell}$ and on $\mathrm{ISR}_{\ell,k}$, which in turn implies non-identifiability of elements of $\mathbf{A}$. Otherwise, it can be shown that $\phi_{k,\ell}\,\phi_{\ell,k} > 1$, so if no two sources have similar covariance matrices, all ISR bounds are finite and $\mathbf{A}$ is identifiable.

– Recall, however, that this bound was developed for Gaussian sources. With non-Gaussian sources this condition can be shown to be applicable to estimation based exclusively on SOS; and yet, when this condition is breached, the mixture may still be identifiable using HOS.

The iCRLB – Key Properties

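• Resemblance to other bounds: Assuming equal-energy sources, we have:

$$\mathrm{ISR}_{k,\ell} \geq \frac{1}{N} \cdot \frac{\phi_{k,\ell}}{\phi_{k,\ell}\,\phi_{\ell,k} - 1}, \qquad \phi_{k,\ell} = \frac{1}{N}\mathrm{Trace}\left(\mathbf{C}_k \mathbf{C}_\ell^{-1}\right)$$

The same general form is shared by the ISR bound obtained, e.g. by Tichavský et al. (2006) and by Ollila et al. (2008) for unit-variance sources with iid temporal structures:

$$\mathrm{ISR}_{k,\ell} \geq \frac{1}{N} \cdot \frac{\kappa_\ell}{\kappa_k\,\kappa_\ell - 1}, \qquad \kappa_k \triangleq \mathrm{var}\left[\frac{f_k'(s)}{f_k(s)}\right]$$

where $f_k(s)$ is the pdf of the $k$-th source.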

But can the iCRLB be reached?

Yes, asymptotically, using ML separation

(still considering the semi-blind case)

ML Separation (Semi-Blind Case)

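• Recalling the notation $\mathbf{x} = (\mathbf{A} \otimes \mathbf{I})\,\mathbf{s}$, the log-likelihood of $\mathbf{x}$ is given by

$$L(\mathbf{x};\mathbf{B}) = c + N\log\left|\det\mathbf{B}\right| - \frac{1}{2}\,\mathbf{x}^T\left(\mathbf{B}^T \otimes \mathbf{I}_N\right)\mathbf{C}_s^{-1}\left(\mathbf{B} \otimes \mathbf{I}_N\right)\mathbf{x}$$

• Differentiating with respect to each element of $\mathbf{B}$ and equating zero, we obtain the likelihood equations:

$$N\hat{A}_{\ell,k} - \sum_{m=1}^{K} \mathbf{x}_\ell^T\,\mathbf{C}_k^{-1}\,\mathbf{x}_m\,\hat{B}_{k,m} = 0, \qquad \forall k,\ell \in [1,K]$$

(with $\hat{\mathbf{A}} = \hat{\mathbf{B}}^{-1}$).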

ML Separation (Semi-Blind Case)

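• Define the “generalized” correlation matrices

$$\hat{\mathbf{R}}_k \triangleq \frac{1}{N}\,\mathbf{X}\,\mathbf{C}_k^{-1}\,\mathbf{X}^T, \qquad k = 1,\ldots,K$$

• With slight manipulations, the same likelihood equations can be written as

$$\hat{\mathbf{B}}\hat{\mathbf{R}}_k\hat{\mathbf{B}}^T\mathbf{e}_k = \mathbf{e}_k \quad \Longleftrightarrow \quad \mathbf{e}_\ell^T\hat{\mathbf{B}}\hat{\mathbf{R}}_k\hat{\mathbf{B}}^T\mathbf{e}_k = \delta_{k,\ell}, \qquad \forall k,\ell \in [1,K]$$

where $\mathbf{e}_k$ is the $k$-th column of the $K \times K$ identity matrix $\mathbf{I}$, and $\delta_{k,\ell}$ is Kronecker’s delta.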

The HEAD Problem

• The set of equations

$$\mathbf{e}_\ell^T\hat{\mathbf{B}}\hat{\mathbf{R}}_k\hat{\mathbf{B}}^T\mathbf{e}_k = \delta_{k,\ell}, \qquad \forall k,\ell \in [1,K]$$

can be seen as a “hybrid” exact-approximate joint diagonalization condition, termed the “HEAD” problem (Yeredor, 2009) and also, in a different context, termed “Structured Joint Congruence transformation” – StJoCo (Song et al., 2010).

What is HEAD?

• “Classical” Approximate Joint Diagonalization: Given a set of $M$ “target matrices” $\hat{\mathbf{R}}_1, \ldots, \hat{\mathbf{R}}_M$, each of dimensions $K \times K$, find a matrix $\mathbf{B}$, such that the “transformed” matrices $\mathbf{B}\hat{\mathbf{R}}_m\mathbf{B}^T$ are “as diagonal as possible” (often subject to some scaling constraints).

• HEAD: The number of matrices in the set equals the matrices’ dimension ($M = K$). The $k$-th transformed matrix $\mathbf{B}\hat{\mathbf{R}}_k\mathbf{B}^T$ is exactly “diagonal” in its $k$-th row and column, with the scaling constraint of being $1$ at the $(k,k)$-th location. All other values are irrelevant.

HEAD (contd.)

• Note that HEAD is a set of nonlinear equations, not an optimization problem.

• It has been shown (Yeredor, 2009) that if all (symmetric) target-matrices are positive-definite, then a solution of HEAD must exist.

• The HEAD problem has already been encountered in the context of ICA (in slightly different forms) in QML (by Pham & Garat, 1997) and in exact ML separation of AR sources (Dégerine & Zaïdi, 2004).

• Different iterative solutions have been proposed by Pham & Garat, 1997, by Dégerine & Zaïdi, 2004, by Yeredor, 2009 and by Song et al., 2010.

HEAD for General Matrices

HEAD for “Nearly Jointly Diagonalizable” Matrices

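Summary of ML Separation (the semi-blind scenario)

• Inputs: Observed mixtures: $\mathbf{X}$ ($K \times N$); Sources’ covariance matrices: $\mathbf{C}_1, \mathbf{C}_2, \ldots, \mathbf{C}_K$ (each $N \times N$)

• Construct “generalized” correlation matrices:

$$\hat{\mathbf{R}}_k = \frac{1}{N}\,\mathbf{X}\,\mathbf{C}_k^{-1}\,\mathbf{X}^T, \qquad k = 1,\ldots,K$$

• Obtain the estimated demixing matrix $\hat{\mathbf{B}}$ as the solution to the HEAD problem,

$$\hat{\mathbf{B}}\hat{\mathbf{R}}_k\hat{\mathbf{B}}^T\mathbf{e}_k = \mathbf{e}_k, \qquad k = 1,\ldots,K$$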
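The two steps of this summary can be sketched as follows. The talk lists several published iterative HEAD solvers without detailing them here, so the solver below is only one possible illustration: it repeatedly applies the small-error linearization discussed later in the talk (pairwise $2 \times 2$ systems), and it assumes the target matrices are symmetric positive-definite and already close to jointly diagonal, with $K \geq 2$. The names `generalized_correlations` and `solve_head` are hypothetical.

```python
import numpy as np

def generalized_correlations(X, C_list):
    """R_k = (1/N) X C_k^{-1} X^T for each known source covariance C_k (N x N)."""
    N = X.shape[1]
    return np.stack([X @ np.linalg.solve(C, X.T) / N for C in C_list])

def solve_head(R, n_iter=100, tol=1e-12):
    """Iterative sketch: find B such that (B R[k] B^T)[:, k] equals e_k for every k."""
    K = R.shape[0]
    B = np.eye(K)
    for _ in range(n_iter):
        T = np.stack([B @ Rk @ B.T for Rk in R])
        # Enforce the scaling constraint T[k][k, k] = 1 by rescaling row k of B.
        s = 1.0 / np.sqrt(np.array([T[k, k, k] for k in range(K)]))
        B = s[:, None] * B
        T = T * s[None, :, None] * s[None, None, :]
        resid = max(abs(T[k, k, l]) for k in range(K) for l in range(K) if l != k)
        if resid < tol:
            break
        # First-order correction B <- (I - E) B, solved pair by pair.
        E = np.zeros((K, K))
        for k in range(K):
            for l in range(k + 1, K):
                M = np.array([[T[k, l, l], T[k, k, k]],
                              [T[l, l, l], T[l, k, k]]])
                rhs = np.array([T[k, k, l], T[l, k, l]])
                E[k, l], E[l, k] = np.linalg.solve(M, rhs)
        B = (np.eye(K) - E) @ B
    return B
```

A typical call would be `B_hat = solve_head(generalized_correlations(X, C_list))`. Convergence is not guaranteed for badly mixed data, in which case an initial separation (e.g. by a classical method) would be applied first.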

An Example

• To capture the essence of the performance improvement relative to classical methods, we first consider two sources with parametrically-controlled temporal and spectral diversity.

Experiment Setup

• We generated two MA sources of length $N = 200$, multiplied by temporal envelopes.

• The zeros are at:

– Source 1: $0.8,\ 0.9,\ 0.9e^{j\theta}$

– Source 2: $0.9,\ 0.8,\ 0.9e^{j\theta}$

• The envelope is a Laplacian window of nominal half-width $W$, centered around:

– $n = 50$ for source 1;

– $n = 150$ for source 2.

Experiment Setup

Dependence of spectral diversity on $\theta$:

Experiment Setup

Dependence of spectral diversity on $W$:

Comparative Performance Demonstration

• We compare the following approaches:

– SOBI: Using ordinary correlation matrices up to lag 4. Can only capture the spectral diversity;

– BGL: Using zero-lag correlation matrices taken over two blocks, one for $n = 1{:}100$, one for $n = 101{:}200$. Can only capture the temporal diversity;

– “SOBGL”: Jointly diagonalize the SOBI and BGL matrices. Can capture both, but is ad-hoc and sub-optimal;

– ML: optimally captures both.

Comparative Performance Demonstration

[Figure: $\mathrm{ISR}_{1,2}$ [dB] vs. the zeros' angle [degs.] (one panel) and vs. the effective half-width [samples] (another panel); curves: iCRLB, SOBI, BGL, SOBGL, ML]

Now let’s consider the fully-blind case

The Fully-Blind Case

• Obviously, the semi-blind scenario is often non-realistic.

• We therefore turn to consider the fully-blind scenario, where the sources’ covariance matrices are unknown.

• Sometimes in a fully-blind scenario estimation of the unknown covariance matrices from the data can be made possible.

The Fully-Blind Case

• If the sources’ covariance matrices are succinctly parameterized in any way, and the sources exhibit sufficient ergodicity, these parameters may be estimated from a single realization of each source.

• For example, if the sources are stationary AR processes, the AR parameters may be consistently estimated from a single (sufficiently long) realization of each source.

• An iterative approach can then be taken: apply any initial (consistent) separation, estimate the covariance matrix for each separated source, and then plug the estimated covariance matrices into the semi-blind ML estimation. Repeat to refine, if necessary.

• This approach is taken, e.g. in WASOBI for AR sources.

The Fully-Blind Case

• For non-stationary sources, estimation of the covariance from a single realization is usually impossible (depending on the parameterization, if any).

• However, sometimes repeated realizations (“snapshots”) of the nonstationary mixture may be available, each realization being triggered by some external stimulus.

The Fully-Blind Case

• This is a “3rd Data Dimension”.

• It is then possible to take a similar iterative approach, where following initial separation (of all the mixtures’ realizations), the covariance matrix of each source is estimated from the estimated ensemble of sources’ realizations.

• If the covariance matrices can be succinctly parameterized, the required number of snapshots may be relatively modest.

What About the Bound?

• In the fully-blind case the vector of unknown parameters is augmented with parameters related to the unknown covariance matrices (at most $N(N+1)/2$ per each matrix, but possibly fewer).

• However, it can be shown (Yeredor, 2010) that if the determinants of the sources’ covariance matrices are all known (namely, do not depend on the unknown parameters), then the resulting FIM is block-diagonal, with the two distinct blocks accounting for the $K^2$ elements of the mixing (or demixing) matrix and for the unknown parameters of the covariance matrices.

What About the Bound?

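• This implies that if the determinants of the covariance matrices are known, then the CRLB on estimation of all elements of the demixing (or mixing) matrices is the same in the fully-blind as in the semi-blind case.

$$\mathbf{J}_{\mathbf{B},\boldsymbol{\theta}} = \begin{bmatrix} \mathbf{J}_{\mathbf{B}} & \mathbf{0} \\ \mathbf{0} & \mathbf{J}_{\boldsymbol{\theta}} \end{bmatrix}$$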

What About the Bound?

• Moreover, it can further be shown, that for the iCRLB to be the same in the semi- and fully-blind scenarios, knowledge of the determinants is not necessary.

• When the determinants are unknown, the only off-block-diagonal elements in the FIM under a non-mixing condition ($\mathbf{A} = \mathbf{I}$) are those involving the diagonal elements of $\mathbf{B}$, whose variance does not affect the iCRLB.

What About the Bound?

• Therefore, the iCRLB is indifferent to the knowledge of the sources’ covariance matrices.

• Of course, this does not mean in general that the same ISR attainable in the semi-blind case is always attainable in the fully-blind case as well.

• Nevertheless, it does imply that in scenarios involving multiple independent snapshots, the ML estimate, attaining the iCRLB asymptotically (in the number of snapshots) would indeed exhibit the same (optimal) asymptotic performance in the fully-blind as in the semi-blind cases.

“Blind-ML” Separation (the fully-blind scenario)

• Inputs: Snapshots of the observed mixtures: $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_M$ ($M$ matrices, each $K \times N$)

• Apply some initial, consistent separation, obtaining estimates of the sources’ snapshots:

$$\hat{\mathbf{S}}_m = \hat{\mathbf{B}}\,\mathbf{X}_m, \qquad m = 1,\ldots,M$$

• From the obtained snapshots of estimated sources, estimate the covariance matrices of each source, either directly or via some succinct parameterization, obtaining: $\hat{\mathbf{C}}_1, \hat{\mathbf{C}}_2, \ldots, \hat{\mathbf{C}}_K$ ($K$ matrices, each $N \times N$)

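“Blind-ML” Separation (the fully-blind scenario)

• Using the estimated covariance matrices, construct “generalized” correlation matrices:

$$\hat{\mathbf{R}}_k = \frac{1}{MN}\sum_{m=1}^{M} \hat{\mathbf{S}}_m\,\hat{\mathbf{C}}_k^{-1}\,\hat{\mathbf{S}}_m^T, \qquad k = 1,\ldots,K$$

• Obtain the estimated residual demixing matrix $\hat{\mathbf{B}}_{res}$ as the solution to the HEAD problem,

$$\hat{\mathbf{B}}_{res}\,\hat{\mathbf{R}}_k\,\hat{\mathbf{B}}_{res}^T\,\mathbf{e}_k = \mathbf{e}_k, \qquad k = 1,\ldots,K$$

• The estimated overall demixing matrix is $\hat{\mathbf{B}}_{res}\hat{\mathbf{B}}$.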
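The covariance-estimation and correlation-building steps of the blind-ML procedure can be sketched as below, assuming the initially separated snapshots are stacked in an array of shape $(M, K, N)$. The direct sample-covariance estimate used here is only the simplest option (it needs at least $N$ snapshots to be invertible); a succinct parameterization would replace it when fewer snapshots are available. Function names and the $1/(MN)$ normalization are assumptions of this sketch.

```python
import numpy as np

def estimate_source_covariances(S_hat):
    """Direct estimate: C_hat[k] = (1/M) * sum_m s_hat[m, k] s_hat[m, k]^T."""
    M = S_hat.shape[0]
    return np.einsum('mkn,mkp->knp', S_hat, S_hat) / M

def snapshot_generalized_correlations(S_hat, C_hat):
    """R[k] = (1/(M N)) * sum_m S_hat[m] C_hat[k]^{-1} S_hat[m]^T."""
    M, K, N = S_hat.shape
    R = np.zeros((K, K, K))
    for k in range(K):
        C_inv = np.linalg.inv(C_hat[k])
        for m in range(M):
            R[k] += S_hat[m] @ C_inv @ S_hat[m].T
    return R / (M * N)
```

The resulting matrices would then be passed to a HEAD solver to obtain the residual demixing matrix, and the whole procedure may be repeated to refine the estimates.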

Simulation Results

For the fully-blind case

A Single-Snapshot (nearly) Fully-Blind Experiment

• We first consider a case of cyclostationary sources, where the covariance matrix can be consistently estimated from a single-snapshot.

• We generated four nonstationary AR sources, driven by periodically-modulated driving sequences.

A Single-Snapshot Experiment (contd.)

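$v_k[n]$ is a Gaussian white-noise process.

Let

$$\tilde{v}_k[n] = \left(1 + A_k\cos\left(\frac{2\pi n}{T_k} + \theta_k\right)\right) v_k[n]$$

and then

$$s_k[n] = -a_1^{(k)} s_k[n-1] - a_2^{(k)} s_k[n-2] + \tilde{v}_k[n], \qquad k = 1,2,3,4$$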

A Single-Snapshot Experiment (contd.)

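We used the following parameters (note that $s_3$ and $s_4$ are stationary):

| | $s_1$ | $s_2$ | $s_3$ | $s_4$ |
|---|---|---|---|---|
| $A_k$ | 0.5 | 0.6 | 0 | 0 |
| $\theta_k$ | 145° | 55° | - | - |
| $T_k$ | 50 | 70 | - | - |
| Poles’ magnitudes | 0.75 | 0.7 | 0.7 | 0.75 |
| Poles’ phases | -/+95° | -/+85° | -/+95° | -/+85° |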

[Figure: images of the four sources’ covariance matrices $\mathbf{C}_1$, $\mathbf{C}_2$, $\mathbf{C}_3$ and $\mathbf{C}_4$ (both axes span 1 to 200)]

A Single-Snapshot Experiment (contd.)

• Consistent (though sub-optimal) covariance matrices estimation (via estimation of the parameters):

– Attain a near-separation condition;

– Apply Yule-Walker equations to each separated source to obtain estimates of its AR parameters;

– Use inverse filtering to recover (estimate) the driving sequences;

– Using a DTFT of the squared estimated driving sequences obtain an estimate of the period $T_k$;

– Using a linear LS fit of the squared amplitude, obtain estimates of $A_k$ and $\theta_k$.

A Single-Snapshot Experiment (contd.)

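• The estimated parameters are then used for constructing the estimated covariance matrix (for each source):

$$\hat{\mathbf{H}}_k = \begin{bmatrix} 1 & 0 & \cdots & & 0 \\ \hat{a}_1^{(k)} & 1 & 0 & & \vdots \\ \hat{a}_2^{(k)} & \hat{a}_1^{(k)} & 1 & \ddots & \\ & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & \hat{a}_2^{(k)} & \hat{a}_1^{(k)} & 1 \end{bmatrix}^{-1} \begin{bmatrix} \hat{d}_k[1] & & & \\ & \hat{d}_k[2] & & \\ & & \ddots & \\ & & & \hat{d}_k[N] \end{bmatrix}$$

with

$$\hat{d}_k[n] = 1 + \hat{A}_k\cos\left(\frac{2\pi n}{\hat{T}_k} + \hat{\theta}_k\right)$$

and then

$$\hat{\mathbf{C}}_k = \hat{\mathbf{H}}_k\hat{\mathbf{H}}_k^T \qquad (k = 1,2,3,4)$$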
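The construction above can be sketched numerically as follows, assuming a unit-variance driving noise, zero initial conditions, and the sign convention $s[n] + a_1 s[n-1] + a_2 s[n-2] = \tilde{v}[n]$. The function name `cyclo_ar_covariance` and its argument names are illustrative.

```python
import numpy as np

def cyclo_ar_covariance(a1, a2, A, T, theta, N):
    """Covariance C = H H^T of an AR(2) source driven by periodically modulated noise."""
    n = np.arange(1, N + 1)
    d = 1.0 + A * np.cos(2.0 * np.pi * n / T + theta)             # modulation d[n]
    L = np.eye(N) + a1 * np.eye(N, k=-1) + a2 * np.eye(N, k=-2)   # banded AR matrix
    H = np.linalg.solve(L, np.diag(d))                            # H = L^{-1} diag(d)
    return H @ H.T
```

For a conjugate pole pair of magnitude $r$ and phase $\pm\omega$, this convention gives $a_1 = -2r\cos\omega$ and $a_2 = r^2$; setting `A=0` recovers the covariance of an unmodulated AR(2) source started from rest.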

A Single-Snapshot Experiment: Results

The results are presented in terms of each element of the ISR matrix, vs. the observation length $N$.

[Figure: each off-diagonal element of the ISR matrix vs. the observation length $N$ (from $10^2$ to $10^4$); curves: iCRLB, ML, WASOBI, Blind ML]

A Multiple-Snapshot (nearly) Fully-Blind Experiment

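• Next, we consider time-varying AR (TVAR) processes (of order 4):

$$s_k[n] = -\sum_{p=1}^{4} a_p^{(k)}[n]\,s_k[n-p] + v_k[n], \qquad k = 1,2,3,4,5$$

with

$$1 + \sum_{p=1}^{4} a_p^{(k)}[n]\,z^{-p} = \prod_{p=1}^{4}\left(1 - \rho_p^{(k)}[n]\,z^{-1}\right)$$

such that the “instantaneous poles” $\rho_p^{(k)}[n]$ drift linearly:

$$\rho_p^{(k)}[n] = \left(1 - \frac{n}{N}\right)\rho_p^{(k)}[0] + \frac{n}{N}\,\rho_p^{(k)}[N]$$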
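The linear pole drift can be turned into time-varying AR coefficients with a short sketch such as the one below. It assumes the poles are supplied in conjugate pairs (so that the coefficients are real) and uses the convention $1 + \sum_p a_p[n] z^{-p} = \prod_p (1 - \rho_p[n] z^{-1})$; the function name `tvar_coefficients` is illustrative.

```python
import numpy as np

def tvar_coefficients(poles_start, poles_end, N):
    """Row n-1 holds a_1[n], ..., a_P[n] for poles drifting linearly over n = 1..N."""
    poles_start = np.asarray(poles_start, dtype=complex)
    poles_end = np.asarray(poles_end, dtype=complex)
    a = np.empty((N, len(poles_start)))
    for n in range(1, N + 1):
        w = n / N
        poles = (1.0 - w) * poles_start + w * poles_end   # instantaneous poles at time n
        a[n - 1] = np.poly(poles)[1:].real                # drop the leading coefficient 1
    return a
```

With these coefficients a source realization would be generated by running the recursion $s[n] = -\sum_p a_p[n]\,s[n-p] + v[n]$ sample by sample.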

A Multiple-Snapshot (nearly) Fully-Blind Experiment

• The sources’ covariance matrices are estimated via estimation of the time-varying AR parameters for each source from multiple snapshots, using the Dym-Gohberg algorithm (1981), which is roughly based on a “local Yule-Walker equations” approach.

• The information on the linear poles’ variation is not exploited; no particular relation between the TVAR parameters is assumed.

• Note that covariance estimation under the TVAR model requires the estimation of roughly $NP$ parameters, rather than $N(N+1)/2$ parameters per source, thus requiring considerably fewer snapshots for reliable estimation.

A Multiple-Snapshot (nearly) Fully-Blind Experiment

• Results are presented in terms of the overall mean ISR (for all five sources) vs. the number of snapshots $M$. The observation length was $N = 200$.

[Figure: overall mean ISR vs. the number of snapshots $M$ (10 to 400); curves: iCRLB, ML, WASOBI, BML]

Things Worth Weighting For…?

• We have seen that the asymptotically-optimal ML separation applies the HEAD solution to a set of $K$ “generalized” correlation matrices.

• For AR sources of order $P$, WASOBI applies optimally-weighted AJD to a set of $P+1$ “ordinary” correlation matrices.

• Are these two (apparently) different approaches related in any way?

Let’s Compare

• Denote the “generalized” correlation matrices $\mathbf{R}_1, \mathbf{R}_2, \ldots, \mathbf{R}_K$

• Denote the “ordinary” correlation matrices $\hat{\mathbf{R}}[0], \hat{\mathbf{R}}[1], \ldots, \hat{\mathbf{R}}[P]$

Let’s Compare

• Assume nearly-separated sources, so

$$\hat{\mathbf{A}} = \mathbf{I} + \hat{\mathbf{E}} \quad \Longrightarrow \quad \hat{\mathbf{B}} = \hat{\mathbf{A}}^{-1} \approx \mathbf{I} - \hat{\mathbf{E}}$$

where $\hat{\mathbf{E}}$ is a “small” matrix, to be estimated from the data.

• In addition, since the sources are nearly separated, all “generalized” and “ordinary” correlation matrices are nearly diagonal, meaning that their off-diagonal elements are generally much smaller than their diagonal elements.

Let’s Compare: HEAD

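• We need to solve the HEAD equations

$$\left[\hat{\mathbf{B}}\mathbf{R}_k\hat{\mathbf{B}}^T\right]_{k,\ell} = \delta_{k,\ell}, \qquad \forall k,\ell \in [1,K]$$

but (neglecting small terms to second order)

$$\hat{\mathbf{B}}\mathbf{R}_k\hat{\mathbf{B}}^T \approx (\mathbf{I} - \hat{\mathbf{E}})\,\mathbf{R}_k\,(\mathbf{I} - \hat{\mathbf{E}})^T \approx \mathbf{R}_k - \hat{\mathbf{E}}\mathbf{R}_k - \mathbf{R}_k\hat{\mathbf{E}}^T$$

so (still neglecting small terms to second order)

$$\left[\hat{\mathbf{B}}\mathbf{R}_k\hat{\mathbf{B}}^T\right]_{k,\ell} \approx [\mathbf{R}_k]_{k,\ell} - \hat{E}_{k,\ell}\,[\mathbf{R}_k]_{\ell,\ell} - [\mathbf{R}_k]_{k,k}\,\hat{E}_{\ell,k}, \qquad k \neq \ell$$

so that $\hat{E}_{k,\ell}$, $\hat{E}_{\ell,k}$ are only relevant for $\left[\hat{\mathbf{B}}\mathbf{R}_k\hat{\mathbf{B}}^T\right]_{k,\ell}$ and $\left[\hat{\mathbf{B}}\mathbf{R}_\ell\hat{\mathbf{B}}^T\right]_{\ell,k}$.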

Let’s Compare: HEAD

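• We end up solving in pairs (for each $k \neq \ell$):

$$[\mathbf{R}_k]_{k,\ell} - \hat{E}_{k,\ell}\,[\mathbf{R}_k]_{\ell,\ell} - [\mathbf{R}_k]_{k,k}\,\hat{E}_{\ell,k} = 0$$

$$[\mathbf{R}_\ell]_{k,\ell} - \hat{E}_{k,\ell}\,[\mathbf{R}_\ell]_{\ell,\ell} - [\mathbf{R}_\ell]_{k,k}\,\hat{E}_{\ell,k} = 0$$

or

$$\begin{bmatrix} [\mathbf{R}_k]_{\ell,\ell} & [\mathbf{R}_k]_{k,k} \\ [\mathbf{R}_\ell]_{\ell,\ell} & [\mathbf{R}_\ell]_{k,k} \end{bmatrix} \begin{bmatrix} \hat{E}_{k,\ell} \\ \hat{E}_{\ell,k} \end{bmatrix} = \begin{bmatrix} [\mathbf{R}_k]_{k,\ell} \\ [\mathbf{R}_\ell]_{k,\ell} \end{bmatrix}$$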

Let’s Compare: WASOBI

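• The LS AJD fit requires:

$$\hat{\mathbf{R}}[p] \approx \hat{\mathbf{A}}\hat{\mathbf{D}}[p]\hat{\mathbf{A}}^T = (\mathbf{I} + \hat{\mathbf{E}})\,\hat{\mathbf{D}}[p]\,(\mathbf{I} + \hat{\mathbf{E}})^T, \qquad p = 0,1,\ldots,P$$

where $\hat{\mathbf{D}}[p] = \mathrm{Ddiag}\left(\hat{\mathbf{R}}[p]\right)$.

• Neglecting small terms to second-order,

$$\hat{\mathbf{R}}[p] \approx \hat{\mathbf{D}}[p] + \hat{\mathbf{E}}\hat{\mathbf{D}}[p] + \hat{\mathbf{D}}[p]\hat{\mathbf{E}}^T$$

$$\hat{R}_{k,\ell}[p] \approx \hat{E}_{k,\ell}\,\hat{R}_{\ell,\ell}[p] + \hat{R}_{k,k}[p]\,\hat{E}_{\ell,k}, \qquad \forall k \neq \ell \in [1,K]$$

• The equations for a given pair $(k,\ell)$ (for all $p$) involve only $\hat{E}_{k,\ell}$ and $\hat{E}_{\ell,k}$.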

Let’s Compare: WASOBI

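• The LS equations can therefore be decoupled for each pair, taking the form:

$$\begin{bmatrix} \hat{R}_{\ell,\ell}[0] & \hat{R}_{k,k}[0] \\ \hat{R}_{\ell,\ell}[1] & \hat{R}_{k,k}[1] \\ \vdots & \vdots \\ \hat{R}_{\ell,\ell}[P] & \hat{R}_{k,k}[P] \end{bmatrix} \begin{bmatrix} \hat{E}_{k,\ell} \\ \hat{E}_{\ell,k} \end{bmatrix} \approx \begin{bmatrix} \hat{R}_{k,\ell}[0] \\ \hat{R}_{k,\ell}[1] \\ \vdots \\ \hat{R}_{k,\ell}[P] \end{bmatrix} \quad \Longleftrightarrow \quad \begin{bmatrix} \hat{\mathbf{r}}_{\ell,\ell} & \hat{\mathbf{r}}_{k,k} \end{bmatrix} \begin{bmatrix} \hat{E}_{k,\ell} \\ \hat{E}_{\ell,k} \end{bmatrix} \approx \hat{\mathbf{r}}_{k,\ell}$$

• But in order to apply optimal weighting, we generally need the joint covariance matrix of all the vectors $\hat{\mathbf{r}}_{k,\ell}$, which, for convenience, we shall refer to as Off-Diagonal Terms (ODIT) vectors.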

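Rearrangements of Elements: The ODIT Vectors

[Figure: the off-diagonal elements of the correlation matrices rearranged into the ODIT vectors $\hat{\mathbf{r}}_{1,2}$, $\hat{\mathbf{r}}_{1,3}$, $\hat{\mathbf{r}}_{1,4}$, $\hat{\mathbf{r}}_{2,3}$, $\hat{\mathbf{r}}_{2,4}$, $\hat{\mathbf{r}}_{3,4}$]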

Let’s Compare: WASOBI (contd.)

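• Fortunately, however, if the sources are (nearly) separated, this covariance matrix can be easily shown to be block-diagonal, and the optimally weighted LS equations can be decoupled as well:

$$\begin{bmatrix} \hat{\mathbf{r}}_{\ell,\ell} & \hat{\mathbf{r}}_{k,k} \end{bmatrix} \begin{bmatrix} \hat{E}_{k,\ell} \\ \hat{E}_{\ell,k} \end{bmatrix} \approx \hat{\mathbf{r}}_{k,\ell}$$

each with a weight matrix $\mathbf{W}_{k,\ell} = \mathrm{Cov}\left[\hat{\mathbf{r}}_{k,\ell}\right]^{-1}$, leading to

$$\begin{bmatrix} \hat{E}_{k,\ell} \\ \hat{E}_{\ell,k} \end{bmatrix} = \left( \begin{bmatrix} \hat{\mathbf{r}}_{\ell,\ell} & \hat{\mathbf{r}}_{k,k} \end{bmatrix}^T \mathbf{W}_{k,\ell} \begin{bmatrix} \hat{\mathbf{r}}_{\ell,\ell} & \hat{\mathbf{r}}_{k,k} \end{bmatrix} \right)^{-1} \begin{bmatrix} \hat{\mathbf{r}}_{\ell,\ell} & \hat{\mathbf{r}}_{k,k} \end{bmatrix}^T \mathbf{W}_{k,\ell}\,\hat{\mathbf{r}}_{k,\ell}$$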
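The decoupled weighted LS solution for a single pair can be sketched as follows. The function name `odit_wls` is hypothetical; the inputs are the three vectors holding the $(k,\ell)$, $(k,k)$ and $(\ell,\ell)$ elements of the ordinary correlation matrices over all lags, and a weight matrix `W` (ideally the inverse covariance of the ODIT vector).

```python
import numpy as np

def odit_wls(r_kl, r_kk, r_ll, W):
    """Weighted LS estimate of (E_kl, E_lk) from [r_ll r_kk] [E_kl, E_lk]^T ~ r_kl."""
    G = np.column_stack([r_ll, r_kk])          # (P+1) x 2 regressor matrix
    GW = G.T @ W                               # 2 x (P+1)
    E_kl, E_lk = np.linalg.solve(GW @ G, GW @ r_kl)
    return E_kl, E_lk
```

With `W` set to the identity this reduces to the ordinary (unweighted) LS fit of the pair.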

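Let’s Compare: WASOBI (contd.)

• But this equation

$$\begin{bmatrix} \hat{E}_{k,\ell} \\ \hat{E}_{\ell,k} \end{bmatrix} = \left( \begin{bmatrix} \hat{\mathbf{r}}_{\ell,\ell} & \hat{\mathbf{r}}_{k,k} \end{bmatrix}^T \mathbf{W}_{k,\ell} \begin{bmatrix} \hat{\mathbf{r}}_{\ell,\ell} & \hat{\mathbf{r}}_{k,k} \end{bmatrix} \right)^{-1} \begin{bmatrix} \hat{\mathbf{r}}_{\ell,\ell} & \hat{\mathbf{r}}_{k,k} \end{bmatrix}^T \mathbf{W}_{k,\ell}\,\hat{\mathbf{r}}_{k,\ell}$$

can also be written as:

$$\begin{bmatrix} \hat{\mathbf{r}}_{\ell,\ell}^T\mathbf{W}_{k,\ell}\hat{\mathbf{r}}_{\ell,\ell} & \hat{\mathbf{r}}_{\ell,\ell}^T\mathbf{W}_{k,\ell}\hat{\mathbf{r}}_{k,k} \\ \hat{\mathbf{r}}_{k,k}^T\mathbf{W}_{k,\ell}\hat{\mathbf{r}}_{\ell,\ell} & \hat{\mathbf{r}}_{k,k}^T\mathbf{W}_{k,\ell}\hat{\mathbf{r}}_{k,k} \end{bmatrix} \begin{bmatrix} \hat{E}_{k,\ell} \\ \hat{E}_{\ell,k} \end{bmatrix} = \begin{bmatrix} \hat{\mathbf{r}}_{\ell,\ell}^T\mathbf{W}_{k,\ell}\hat{\mathbf{r}}_{k,\ell} \\ \hat{\mathbf{r}}_{k,k}^T\mathbf{W}_{k,\ell}\hat{\mathbf{r}}_{k,\ell} \end{bmatrix}$$

• Recall the HEAD equation

$$\begin{bmatrix} [\mathbf{R}_k]_{\ell,\ell} & [\mathbf{R}_k]_{k,k} \\ [\mathbf{R}_\ell]_{\ell,\ell} & [\mathbf{R}_\ell]_{k,k} \end{bmatrix} \begin{bmatrix} \hat{E}_{k,\ell} \\ \hat{E}_{\ell,k} \end{bmatrix} = \begin{bmatrix} [\mathbf{R}_k]_{k,\ell} \\ [\mathbf{R}_\ell]_{k,\ell} \end{bmatrix}$$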

Let’s Compare: Conclusion

• Indeed, it can be shown that when all the sources are Gaussian AR processes of maximal order $P$, the two sets of equations become asymptotically equivalent, supporting the claim of general optimality of the optimal WLS fit of the estimated “ordinary” correlation matrices $\hat{\mathbf{R}}[0], \hat{\mathbf{R}}[1], \ldots, \hat{\mathbf{R}}[P]$.

• In other cases (nonstationary / non-AR / higher order AR sources) the “ordinary” correlation matrices are not a sufficient statistic, and therefore even if optimal weighting is used, optimal separation cannot be attained without the explicit use of the appropriate “generalized” correlation matrices $\mathbf{R}_1, \mathbf{R}_2, \ldots, \mathbf{R}_K$.

Let’s Compare: Conclusion

• However, in general fully-blind scenarios, when the sources’ covariance matrices are unknown and the “generalized” correlation matrices cannot be constructed, it might make sense to use AJD of “ordinary” correlations instead (compromising optimality).

• Rather than having to estimate the $K$ sources’ covariance matrices (each $N \times N$), it would suffice to estimate the covariance matrices of the $K(K-1)/2$ ODIT vectors $\hat{\mathbf{r}}_{k,\ell}$ (each $K \times K$) – which would be rather easy using the multiple snapshots (if available).

An Example

• We used the same TVAR sources as before, modifying WASOBI to use an empirical “ODIT weighting” approach, which employs the empirical estimates of the covariance of each ODIT vector $\hat{\mathbf{r}}_{k,\ell}$, based on the multiple snapshots.

[Figure: overall mean ISR vs. the number of snapshots $M$ (10 to 400); curves: iCRLB, ML, WASOBI, BML, ODIT Weighting]

Conclusion

• We considered the framework of SOS-based separation for Gaussian sources with arbitrary temporal covariance structures;

• We derived the iCRLB on the attainable ISR;

• For the semi-blind scenario we have shown that the asymptotically-optimal ML separation is attained by solving the HEAD problem for “generalized” correlation matrices;

• For the fully-blind case the iCRLB remains the same, and asymptotically-optimal “blind-ML” separation is still possible if the sources’ covariance matrices can be consistently estimated (e.g., from multiple snapshots);

Conclusion

• We explored the relation between applying HEAD to “generalized” correlation matrices and applying weighted AJD to “ordinary” correlation matrices;

• A sub-optimal alternative to estimating the sources’ full covariance matrices in a fully-blind scenario is to estimate the covariance matrices of the much smaller ODIT vectors, and use them as weights in the AJD process (following an initial separation stage).

A “Better” Conclusion?

“Good things come to those who Weight”…

Thank You!