
Performance Analysis for Sparse Support Recovery

Gongguo Tang and Arye Nehorai

ESE, Washington University

April 21st 2009

Gongguo Tang and Arye Nehorai (Institute) Performance Analysis for Sparse Support Recovery April 21st 2009 1 / 41

Outline

1 Background and Motivation
2 Research Overview
3 Mathematical Model
4 Theoretical Analysis
5 Conclusions
6 Future Work


Background and Motivation



Background: Basic Concepts and Notations

Sparse signals refer to a set of signals that have only a few nonzero components under a common basis/dictionary.

The set of indices corresponding to the nonzero components is called the support of the signal.

If several sparse signals share a common support, we call them jointly sparse.

Sparse signal support recovery aims at identifying the true support of jointly sparse signals through their noisy linear measurements.

Suppose that $S$ is an index set. Then for a vector $x \in \mathbb{F}^N$, $x_S$ denotes the vector formed by those components of $x$ indicated by $S$; for a matrix $A \in \mathbb{F}^{M \times N}$, $A_S$ denotes the matrix formed by those columns of $A$ indicated by $S$.
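As a small illustration of the $x_S$ and $A_S$ notation, the following NumPy sketch uses made-up toy values (the sizes and entries are assumptions, not taken from the slides). Note that NumPy indices are 0-based.

```python
import numpy as np

# Hypothetical toy sizes: N = 6 signal components, M = 3 measurements.
x = np.array([0.0, 1.5, 0.0, -2.0, 0.0, 0.7])
A = np.arange(18, dtype=float).reshape(3, 6)

S = np.flatnonzero(x)   # support of x (0-based indices of nonzero entries)
x_S = x[S]              # components of x indicated by S
A_S = A[:, S]           # columns of A indicated by S

# Because x vanishes outside S, the product A x equals A_S x_S.
same_product = np.allclose(A @ x, A_S @ x_S)
```

The last line shows why the support matters: only the columns of $A$ indexed by $S$ contribute to the measurements.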


Background: Review of Compressive Sensing

Long-established paradigm for digital data acquisition
  sample data at Nyquist rate (2x bandwidth)
  compress data (signal-dependent, nonlinear)
  brick wall to resolution/performance

This slide is adapted from R. Baraniuk, J. Romberg and M. Wakin's "Tutorial on Compressive Sensing".

"Why go to so much effort to acquire all the data when most of what we get will be thrown away? Can't we just directly measure the part that won't end up being thrown away?"

- David L. Donoho



Directly acquire "compressed" data
Replace samples by more general "measurements"

$K < M \ll N$





When data is sparse/compressible, we can directly acquire a condensed representation with no/little information loss
Random projection will work




Background: Previous Assumptions

When there is measurement noise, there are different criteria for measuring the recovery performance

  various $\ell_p$ norms $E\|\hat{x} - x\|_p$, especially $\ell_2$ and $\ell_1$
  predictive power (e.g., $E\|y - \hat{y}\|_2^2$, where $\hat{y}$ is the estimate of $y$ based on $\hat{x}$)
  0-1 loss associated with the event of recovering the correct support $S$

Assumptions on noise

  bounded noise
  sparse noise
  Gaussian noise


Background: Previous Assumptions

Assumptions on sparse signal

  deterministic with unknown support but known component values
  deterministic with unknown support and unknown component values
  random with unknown support

Assumptions on measurement matrix

  standard Gaussian ensemble
  Bernoulli ensemble
  random but with a structure such as Toeplitz
  deterministic


Motivation: Why Support Recovery?

The support of a sparse signal has physical significance

the timing of events

the locations of objects or anomalies

  Compressive Radar Imaging
  Compressive Sensor Network

the frequency components

Compressive Spectrum Analysis

the existence of certain substances such as chemicals and mRNAs

Compressed Sensing DNA Microarrays


Motivation: Theoretical Consideration

After the recovery of the support, the magnitudes of the nonzero components can be obtained by solving a least-squares problem
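The least-squares step can be sketched as follows. This is a minimal illustration under assumed toy values: the sizes, the support `S`, the coefficients, and the noise level are all hypothetical, and the support is taken as already known.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 20, 50                       # hypothetical problem sizes
A = rng.standard_normal((M, N))     # assumed real Gaussian measurement matrix
S = np.array([4, 17, 33])           # support, assumed already recovered

# Ground-truth sparse vector and one noisy measurement (toy values).
x = np.zeros(N)
x[S] = [1.0, -2.0, 0.5]
y = A @ x + 0.01 * rng.standard_normal(M)

# Least squares restricted to the columns of A indexed by S.
coef, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
x_hat = np.zeros(N)
x_hat[S] = coef
```

Restricting to `A[:, S]` turns an underdetermined problem with 50 unknowns into an overdetermined one with 3 unknowns, which is why the estimate is stable once the support is known.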


Motivation: Other Applications Leading to Support Recovery

Consider the parameter estimation problem associated with the following widely applied model,

$$y(t) = A(\theta)\, x(t) + w(t), \quad t = 1, \ldots, T,$$

where $A(\theta) = [\varphi(\theta_1)\ \varphi(\theta_2)\ \cdots\ \varphi(\theta_K)]$ and $\theta_1, \theta_2, \ldots, \theta_K$ are the true parameters.

In order to solve this problem, we sample the parameter space to $\{\tilde{\theta}_1, \tilde{\theta}_2, \ldots, \tilde{\theta}_N\}$ and form

$$A(\tilde{\theta}) = [\varphi(\tilde{\theta}_1)\ \varphi(\tilde{\theta}_2)\ \cdots\ \varphi(\tilde{\theta}_N)].$$

Define the vector $\tilde{x}(t)$ by setting its components to those of $x(t)$ when their locations correspond to true parameters and to zero otherwise. Then we have transformed a traditional parameter estimation problem into one of support recovery.
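The gridding step can be illustrated with a small sketch. The atom `phi` below (a complex sinusoid) is purely an assumption chosen for illustration, since the slides leave $\varphi$ unspecified, and the true parameters are assumed to lie exactly on the sampling grid.

```python
import numpy as np

def phi(theta, M):
    """Hypothetical atom: complex sinusoid with normalized frequency theta."""
    return np.exp(2j * np.pi * theta * np.arange(M))

M, N = 8, 32
grid = np.arange(N) / N            # sampled parameter space
true_idx = np.array([3, 10])       # grid positions of the true parameters
theta = grid[true_idx]             # true parameters (assumed on the grid)
x = np.array([1.0 + 0.5j, -2.0])   # toy amplitudes for one snapshot

A_theta = np.column_stack([phi(th, M) for th in theta])   # M x K
A_grid = np.column_stack([phi(th, M) for th in grid])     # M x N

# Sparse vector on the grid: copies of x at the true locations, zero elsewhere.
x_grid = np.zeros(N, dtype=complex)
x_grid[true_idx] = x

same_signal = np.allclose(A_theta @ x, A_grid @ x_grid)
```

Both models produce the same noiseless signal, so estimating the parameters is equivalent to finding the support `true_idx` of `x_grid`.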


Research Overview



Research Overview

Introduce hypothesis testing problems for sparse signal support recovery

Derive an upper bound for the probability of error (PoE) for a general measurement matrix

Study the effect of different parameters

Analyze the PoE for multiple hypothesis testing and its implications for system design


Mathematical Model



Mathematical Model: Measurement Model

We will focus on the following model:

$$y(t) = A x(t) + w(t), \quad t = 1, \ldots, T, \tag{1}$$

or in matrix form

$$Y = AX + W.$$

Here we have $x(t) \in \mathbb{F}^N$, $w(t) \in \mathbb{F}^M$, $y(t) \in \mathbb{F}^M$ with $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$.

$X$, $W$, $Y$ are matrices with columns formed by $\{x(t)\}_{t=1}^T$, $\{w(t)\}_{t=1}^T$, $\{y(t)\}_{t=1}^T$, respectively.

Our analysis involves a constant $\kappa$, which is $1/2$ for $\mathbb{F} = \mathbb{R}$ and $1$ for $\mathbb{F} = \mathbb{C}$.

Generally $M$ is the dimension of hardware while $T$ is the number of time samples. Hence increasing $M$ is more expensive than increasing $T$.


Mathematical Model: Assumptions on Signal and Noise

We have the following assumptions:

$\{x(t)\}_{t=1}^T$ are jointly sparse signals with a common support $S = \mathrm{supp}(X)$.

$\{x_S(t)\}_{t=1}^T$ follow i.i.d. $\mathbb{F}\mathcal{N}(0, I_K)$.

$\{w(t)\}_{t=1}^T$ follow i.i.d. $\mathbb{F}\mathcal{N}(0, \sigma^2 I_M)$ and are independent of $\{x(t)\}_{t=1}^T$. Note that the noise variance $\sigma^2$ can be viewed as $1/\mathrm{SNR}$.
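A minimal simulation of the model $Y = AX + W$ under these assumptions might look like the sketch below. It covers only the real case $\mathbb{F} = \mathbb{R}$; the function name `simulate` and all sizes are hypothetical choices for illustration.

```python
import numpy as np

def simulate(A, S, T, sigma2, rng):
    """Draw Y = A X + W for the real case, with supp(X) = S (0-based indices)."""
    M, N = A.shape
    X = np.zeros((N, T))
    X[S, :] = rng.standard_normal((len(S), T))          # x_S(t) ~ N(0, I_K), i.i.d. over t
    W = np.sqrt(sigma2) * rng.standard_normal((M, T))   # w(t) ~ N(0, sigma2 * I_M)
    return A @ X + W, X

rng = np.random.default_rng(1)
M, N, T = 10, 30, 5                  # hypothetical sizes
A = rng.standard_normal((M, N))      # assumed Gaussian measurement matrix
S = np.array([2, 11, 25])            # common support, K = 3
Y, X = simulate(A, S, T, sigma2=0.25, rng=rng)

# Rows of X that are not identically zero give back the common support.
recovered_rows = np.flatnonzero(np.any(X != 0, axis=1))
```

Every column of `X` shares the same nonzero rows, which is exactly the joint sparsity assumption.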


Mathematical Model: Assumptions on Measurement Matrix

We consider two types of measurement matrices:

1 Non-degenerate measurement matrix: we say that a general measurement matrix $A \in \mathbb{F}^{M \times N}$ is non-degenerate if every $M \times M$ submatrix of $A$ is nonsingular.

2 Gaussian measurement matrix: the elements $a_{ij}$ of $A$ are i.i.d. $\mathbb{F}\mathcal{N}(0, 1)$.
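For very small matrices, the non-degeneracy condition can be checked by brute force. The helper `is_non_degenerate` below is a hypothetical name, and the approach enumerates all $\binom{N}{M}$ column subsets, so it is only practical for toy sizes.

```python
from itertools import combinations

import numpy as np

def is_non_degenerate(A):
    """Return True if every M x M column submatrix of A has full rank M."""
    M, N = A.shape
    return all(
        np.linalg.matrix_rank(A[:, list(cols)]) == M
        for cols in combinations(range(N), M)
    )

# Any two distinct columns of this matrix are linearly independent.
good = np.array([[1.0, 1.0, 1.0],
                 [1.0, 2.0, 3.0]])

# Here the second column is twice the first, so one 2 x 2 submatrix is singular.
bad = np.array([[1.0, 2.0, 1.0],
                [2.0, 4.0, 3.0]])

good_ok = is_non_degenerate(good)
bad_ok = is_non_degenerate(bad)
```

A matrix with i.i.d. Gaussian entries is non-degenerate with probability one, so the second matrix type is a special case of the first almost surely.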


Mathematical Model: Hypothesis Testing

We focus on two hypothesis testing problems:

1 Binary hypothesis testing (BHT) with $|S_0| = |S_1|$:

$$\begin{cases} H_0: \mathrm{supp}(X) = S_0 \\ H_1: \mathrm{supp}(X) = S_1 \end{cases}$$

2 Multiple hypothesis testing (MHT):

$$\begin{cases} H_1: \mathrm{supp}(X) = S_1 \\ \quad\vdots \\ H_L: \mathrm{supp}(X) = S_L \end{cases}$$

where the $S_i$'s are candidate supports with the same cardinality $|S_i| = K$.


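Mathematical Model: Probability of Error

Our aim is to calculate an accurate upper bound for the PoE and analyze the effect of $M$, $T$, and the noise variance $\sigma^2$.

$$p_{\mathrm{err}}(A) = \frac{1}{2} \int_{H_1} \Pr(Y \mid H_0)\, dY + \frac{1}{2} \int_{H_0} \Pr(Y \mid H_1)\, dY$$

for BHT and

$$p_{\mathrm{err}}(A) = \sum_{i=1}^{L} \frac{1}{L} \int_{H_j:\, j \neq i} \Pr(Y \mid H_i)\, dY$$

for MHT. Here an integral over $H_i$ is taken over the set of $Y$ for which the decision rule selects $H_i$.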


Theoretical Analysis



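Theoretical Analysis: Optimal Decision Rule for BHT

$$Y = AX + W$$

The BHT problem is equivalent to deciding between two distributions of $Y$:

$$Y \mid H_0 \sim \mathbb{F}\mathcal{N}_{M,T}(0, \Sigma_0 \otimes I_T) \quad \text{or} \quad Y \mid H_1 \sim \mathbb{F}\mathcal{N}_{M,T}(0, \Sigma_1 \otimes I_T),$$

where $\Sigma_i = \sigma^2 I_M + A_{S_i} A_{S_i}^\dagger$.

With equal prior probabilities of $S_0$ and $S_1$, the optimal decision rule is given by the likelihood ratio test:

$$\frac{f(Y \mid H_1)}{f(Y \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} 1 \iff \mathrm{tr}\left[ Y^\dagger \left( \Sigma_1^{-1} - \Sigma_0^{-1} \right) Y \right] \underset{H_0}{\overset{H_1}{\lessgtr}} T \log \frac{|\Sigma_0|}{|\Sigma_1|}$$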
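The likelihood ratio test above can be sketched in NumPy as follows. The function names, sizes, supports, and noise level are hypothetical; the demo uses the real case and generates one data set under each hypothesis.

```python
import numpy as np

def sigma_matrix(A, S, sigma2):
    """Sigma_i = sigma2 * I_M + A_S A_S^dagger for a candidate support S."""
    A_S = A[:, S]
    return sigma2 * np.eye(A.shape[0]) + A_S @ A_S.conj().T

def lrt_decide(Y, A, S0, S1, sigma2):
    """Return 1 if the test decides H1 (statistic below threshold), else 0."""
    Sig0 = sigma_matrix(A, S0, sigma2)
    Sig1 = sigma_matrix(A, S1, sigma2)
    D = np.linalg.inv(Sig1) - np.linalg.inv(Sig0)
    stat = np.real(np.sum(Y.conj() * (D @ Y)))     # tr[Y^dagger D Y]
    _, logdet0 = np.linalg.slogdet(Sig0)
    _, logdet1 = np.linalg.slogdet(Sig1)
    threshold = Y.shape[1] * (logdet0 - logdet1)   # T log(|Sigma_0| / |Sigma_1|)
    return 1 if stat < threshold else 0

def draw(A, S, T, sigma2, rng):
    """Real-valued data from Y = A X + W with supp(X) = S."""
    M, N = A.shape
    X = np.zeros((N, T))
    X[S, :] = rng.standard_normal((len(S), T))
    return A @ X + np.sqrt(sigma2) * rng.standard_normal((M, T))

rng = np.random.default_rng(3)
M, N, T, sigma2 = 12, 20, 50, 0.1     # hypothetical sizes and noise level
A = rng.standard_normal((M, N))
S0, S1 = [0, 1, 2], [3, 4, 5]

decision_h0 = lrt_decide(draw(A, S0, T, sigma2, rng), A, S0, S1, sigma2)
decision_h1 = lrt_decide(draw(A, S1, T, sigma2, rng), A, S0, S1, sigma2)
```

Using `slogdet` instead of `det` avoids overflow or underflow in the determinant ratio when $M$ is large.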


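Theoretical Analysis: Calculation of PoE for BHT

Due to the symmetry of $H_0$ and $H_1$, we can just compute the probability of false alarm

$$\begin{aligned}
p_{FA} &= \Pr\{H_1 \mid H_0\} \\
&= \Pr\left\{ \mathrm{tr}\left[ Y^\dagger \left( \Sigma_1^{-1} - \Sigma_0^{-1} \right) Y \right] < T \log \frac{|\Sigma_0|}{|\Sigma_1|} \,\Big|\, H_0 \right\} \\
&= \Pr\left\{ \mathrm{tr}\left[ Z^\dagger \left( \Sigma_0^{1/2} \Sigma_1^{-1} \Sigma_0^{1/2} - I_M \right) Z \right] < T \log \frac{|\Sigma_0|}{|\Sigma_1|} \,\Big|\, H_0 \right\},
\end{aligned}$$

where $Z = \Sigma_0^{-1/2} Y \sim \mathbb{F}\mathcal{N}(0, I_M \otimes I_T)$.

We define $H = \Sigma_0^{1/2} \Sigma_1^{-1} \Sigma_0^{1/2}$ with $\Sigma_i = A_{S_i} A_{S_i}^\dagger + \sigma^2 I_M$, which is a fundamental matrix in our analysis.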
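The whitening step $Z = \Sigma_0^{-1/2} Y$ and the resulting identity between the two trace expressions can be checked numerically. The sketch below uses hypothetical toy sizes in the real case; `sym_sqrt` is a helper introduced here for illustration.

```python
import numpy as np

def sym_sqrt(P):
    """Square root of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(P)
    return (V * np.sqrt(w)) @ V.conj().T

rng = np.random.default_rng(2)
M, N, T, sigma2 = 6, 12, 4, 0.5       # hypothetical sizes and noise level
A = rng.standard_normal((M, N))
S0, S1 = [0, 1, 2], [2, 3, 4]

Sig0 = A[:, S0] @ A[:, S0].T + sigma2 * np.eye(M)
Sig1 = A[:, S1] @ A[:, S1].T + sigma2 * np.eye(M)

R0 = sym_sqrt(Sig0)                   # Sigma_0^{1/2}
H = R0 @ np.linalg.inv(Sig1) @ R0     # H = Sigma_0^{1/2} Sigma_1^{-1} Sigma_0^{1/2}

Y = rng.standard_normal((M, T))       # any test matrix works for the identity
Z = np.linalg.solve(R0, Y)            # Z = Sigma_0^{-1/2} Y

lhs = np.trace(Y.T @ (np.linalg.inv(Sig1) - np.linalg.inv(Sig0)) @ Y)
rhs = np.trace(Z.T @ (H - np.eye(M)) @ Z)
```

The two traces agree up to rounding, which is the algebraic step that moves the problem from $Y$ to the whitened variable $Z$.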


Theoretical Analysis: Calculation of PoE for BHT

Suppose the ordered eigenvalues of $H$ are

$$\sigma_1 < \sigma_2 < \cdots < \sigma_{k_1} < 1 = 1 = \cdots = 1 < \lambda_1 < \lambda_2 < \cdots < \lambda_{k_0},$$

and $H$ can be diagonalized by an orthogonal/unitary matrix $Q$. Then the transformation $Z = QN$ gives

$$p_{FA} = \Pr\left\{ \sum_{i=1}^{k_0} (\lambda_i - 1) \sum_{t=1}^{T} |N_{it}|^2 - \sum_{i=1}^{k_1} (1 - \sigma_i) \sum_{t=1}^{T} \left| N_{(i+k_0)t} \right|^2 < T \log \frac{|\Sigma_0|}{|\Sigma_1|} \,\Big|\, H_0 \right\}.$$


Theoretical Analysis: Eigenvalue Structure of H

The eigenvalue structure of $H$, especially the eigenvalues that are greater than 1, determines the performance of the measurement matrix $A$ in distinguishing between different supports. We study the structure of $H$ in a slightly more general setting where the sizes of the two candidate supports might not be equal.

Problem

1 How many eigenvalues of $H$ are less than 1, greater than 1, and equal to 1? Is there a general rule?

2 Can we give tight lower bounds on the eigenvalues that are greater than 1? The bounds should have a nice distribution that can be handled easily.



$M = 200$, $|S_0 \cap S_1| = 20$, $|S_0 \setminus S_1| = 80$, $|S_1 \setminus S_0| = 60$, and the elements of $A$ are i.i.d. real Gaussian.

Note that $|S_1 \setminus S_0| = 60$ eigenvalues of $H$ are less than 1, $|S_0 \setminus S_1| = 80$ are greater than 1, and $M - (|S_0 \setminus S_1| + |S_1 \setminus S_0|) = 60$ are identical to 1.



Theorem

Suppose $k_i = |S_0 \cap S_1|$, $k_0 = |S_0 \setminus S_1|$, $k_1 = |S_1 \setminus S_0|$, and $M > k_0 + k_1$. For a general non-degenerate measurement matrix, $k_0$ eigenvalues of the matrix $H$ are greater than 1, $k_1$ are less than 1, and $M - (k_0 + k_1)$ are equal to 1.
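The eigenvalue counts in this theorem can be checked numerically on a small example. The sketch below uses a real Gaussian matrix (non-degenerate with probability one) and hypothetical sizes $M = 20$, $k_i = 2$, $k_0 = 5$, $k_1 = 4$; the tolerance `tol` is an arbitrary choice for deciding when an eigenvalue counts as equal to 1.

```python
import numpy as np

def sym_sqrt(P):
    """Square root of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(P)
    return (V * np.sqrt(w)) @ V.conj().T

def h_matrix(A, S0, S1, sigma2):
    """H = Sigma_0^{1/2} Sigma_1^{-1} Sigma_0^{1/2} for candidate supports S0, S1."""
    M = A.shape[0]
    Sig0 = A[:, S0] @ A[:, S0].conj().T + sigma2 * np.eye(M)
    Sig1 = A[:, S1] @ A[:, S1].conj().T + sigma2 * np.eye(M)
    R0 = sym_sqrt(Sig0)
    return R0 @ np.linalg.inv(Sig1) @ R0

rng = np.random.default_rng(4)
M, N, sigma2 = 20, 40, 1.0
A = rng.standard_normal((M, N))

common = [0, 1]               # S0 ∩ S1, k_i = 2
only0 = [2, 3, 4, 5, 6]       # S0 \ S1, k_0 = 5
only1 = [7, 8, 9, 10]         # S1 \ S0, k_1 = 4
S0, S1 = common + only0, common + only1

H = h_matrix(A, S0, S1, sigma2)
eig = np.linalg.eigvalsh((H + H.T) / 2)   # symmetrize to suppress rounding asymmetry

tol = 1e-8
counts = (
    int(np.sum(eig > 1 + tol)),           # expected k_0
    int(np.sum(eig < 1 - tol)),           # expected k_1
    int(np.sum(np.abs(eig - 1) <= tol)),  # expected M - (k_0 + k_1)
)
```

With these sizes the theorem predicts 5 eigenvalues above 1, 4 below 1, and 11 equal to 1.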

Note that from the bound we present later, $\sqrt{\prod_{i=1}^{k_0} \lambda_i \prod_{i=1}^{k_1} (1/\sigma_i)}$ determines the performance of the optimal BHT decision rule. Hence, generally and quite intuitively, the larger the difference set $S_0 \Delta S_1$, the easier it is to distinguish between the two candidate supports.



Theorem

For a Gaussian measurement matrix, the sorted eigenvalues of $H$ that are greater than 1 are lower bounded by those of $I_{k_0} + \frac{1}{\sigma^2} V$ with probability one, where $V$ is a matrix obtained from the measurement matrix $A$ and $V$ follows the Wishart distribution $\mathcal{W}_{k_0}(I_{k_0}, 2\kappa(M - k_1 - k_i))$.

We comment that generally the larger $M - k_1 - k_i = M - |S_1|$, the larger the eigenvalues of $I_{k_0} + \frac{1}{\sigma^2} V$, and hence the better we can distinguish the true support from the false one.


Theoretical Analysis: A Lower Bound on Eigenvalues

$M = 200$, $|S_0 \cap S_1| = 20$, $|S_0 \setminus S_1| = 80$, $|S_1 \setminus S_0| = 60$, $\sigma^2 = 4$, and the elements of $A$ are i.i.d. real Gaussian.

The blue line represents the true sorted eigenvalues of $H$ that are greater than 1 and the red line represents the lower bound.


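Theoretical Analysis: Bound on PoE

Theorem

The probability of false alarm can be bounded by

$$p_{FA} = \Pr(S_1 \mid H_0) \le \left\{ \left[ \frac{\lambda_g(S_0, S_1)}{4} \right]^{k_d/2} \left[ \frac{\lambda_g(S_1, S_0)}{4} \right]^{k_d/2} \right\}^{-\kappa T},$$

where $k_d = |S_0 \setminus S_1|$ and $\lambda_g(S_0, S_1) = \sqrt[k_d]{\prod_{j=1}^{k_d} \lambda_j}$, with the $\lambda_j$'s the eigenvalues of

$$H = \left( A_{S_0} A_{S_0}^\dagger + \sigma^2 I_M \right)^{1/2} \left( A_{S_1} A_{S_1}^\dagger + \sigma^2 I_M \right)^{-1} \left( A_{S_0} A_{S_0}^\dagger + \sigma^2 I_M \right)^{1/2}$$

that are greater than one.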
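The bound can be evaluated numerically in the log domain, as in the sketch below. The function names, sizes, supports, and noise level are hypothetical; the example uses the real case ($\kappa = 1/2$) and equal-size supports, and the `1e-8` cutoff for "greater than one" is an arbitrary numerical tolerance.

```python
import numpy as np

def sym_sqrt(P):
    """Square root of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(P)
    return (V * np.sqrt(w)) @ V.conj().T

def log_lambda_g(A, Sa, Sb, sigma2):
    """Log of the geometric mean of the eigenvalues of H(Sa, Sb) that exceed 1."""
    M = A.shape[0]
    Sig_a = A[:, Sa] @ A[:, Sa].conj().T + sigma2 * np.eye(M)
    Sig_b = A[:, Sb] @ A[:, Sb].conj().T + sigma2 * np.eye(M)
    Ra = sym_sqrt(Sig_a)
    H = Ra @ np.linalg.inv(Sig_b) @ Ra
    eig = np.linalg.eigvalsh((H + H.conj().T) / 2)
    big = eig[eig > 1 + 1e-8]
    return float(np.mean(np.log(big)))

def log_pfa_bound(A, S0, S1, sigma2, T, kappa):
    """Logarithm of the upper bound on p_FA (assumes |S0| = |S1|)."""
    k_d = len(set(S0) - set(S1))
    l01 = log_lambda_g(A, S0, S1, sigma2)
    l10 = log_lambda_g(A, S1, S0, sigma2)
    return -kappa * T * k_d * (0.5 * (l01 + l10) - np.log(4.0))

rng = np.random.default_rng(5)
M, N, sigma2, kappa = 10, 20, 0.01, 0.5   # hypothetical values, real case
A = rng.standard_normal((M, N))
S0, S1 = [0, 1, 2], [2, 3, 4]             # k_d = 2

log_b1 = log_pfa_bound(A, S0, S1, sigma2, T=1, kappa=kappa)
log_b10 = log_pfa_bound(A, S0, S1, sigma2, T=10, kappa=kappa)
```

Working with logarithms keeps the computation stable when the bound is extremely small, and makes the linear dependence of the exponent on $T$ explicit.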


Theoretical Analysis: Implications of the Bound

The bound can be equivalently written as

$$\left( \frac{\sqrt[2k_d]{\prod_{i=1}^{k_d} \lambda_i \prod_{i=1}^{k_d} (1/\sigma_i)}}{4} \right)^{-\kappa k_d T}$$

with the $\lambda_i$'s and $\sigma_i$'s the eigenvalues of $H$ that are greater and less than 1, respectively. Hence these eigenvalues determine the system's ability to distinguish two supports.

As we will see, the minimum of all the $\lambda_g(S_i, S_j)$'s determines the system's ability to distinguish all candidate supports, and can be viewed as a measure of incoherence.

The logarithm of the bound can be approximated by $-\kappa k_d T \left[ \frac{1}{2} \log\left( \lambda_g(S_0, S_1)\, \lambda_g(S_1, S_0) \right) - \log 4 \right]$. Hence, if we can guarantee that $\lambda_g(S_0, S_1)\, \lambda_g(S_1, S_0)$ of our measurement matrix is greater than some constant (larger than 16, so that the bracket is positive), then we can make $p_{FA}$ arbitrarily small by taking more temporal samples.


Theoretical Analysis: Multiple Hypothesis Testing

Now we turn to the MHT problem

$$\begin{cases} H_1: \mathrm{supp}(X) = S_1 \\ \quad\vdots \\ H_L: \mathrm{supp}(X) = S_L \end{cases}$$

where the $S_i$'s are candidate supports with the same cardinality $|S_i| = K$ and $L = \binom{N}{K}$ is the total number of candidate supports of size $K$.


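Theoretical Analysis: PoE for MHT

Theorem

Denote $\lambda_{\min} = \min_{i \neq j} \lambda_g(S_i, S_j)$. Then the total PoE for MHT can be bounded by

$$p_{\mathrm{err}} \le C \exp\left\{ -\kappa T \left[ \log \lambda_{\min} - \log\left( 4 \left[ K(N-K) \right]^{\frac{1}{\kappa T}} \right) \right] \right\}.$$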
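To get a feel for this bound, the sketch below evaluates the threshold $4[K(N-K)]^{1/(\kappa T)}$ and the exponent of the bound. The function names are hypothetical, and since the constant $C$ is not specified here, only $\log(p_{\mathrm{err}}\text{ bound}/C)$ is computed.

```python
import math

def mht_threshold(N, K, T, kappa):
    """The quantity 4 [K (N - K)]^{1 / (kappa T)} that lambda_min must exceed."""
    return 4.0 * (K * (N - K)) ** (1.0 / (kappa * T))

def mht_log_bound_over_C(lam_min, N, K, T, kappa):
    """Exponent of the MHT bound, i.e. log(bound / C); C itself is left unspecified."""
    return -kappa * T * (math.log(lam_min) - math.log(mht_threshold(N, K, T, kappa)))

# Toy values (assumptions for illustration): N = 100, K = 5, complex case kappa = 1.
thr_10 = mht_threshold(100, 5, T=10, kappa=1.0)
thr_1000 = mht_threshold(100, 5, T=1000, kappa=1.0)

decaying = mht_log_bound_over_C(8.0, 100, 5, T=10, kappa=1.0)      # lam_min above threshold
not_decaying = mht_log_bound_over_C(5.0, 100, 5, T=10, kappa=1.0)  # lam_min below threshold
```

The threshold decreases toward 4 as $T$ grows, so any measurement matrix with $\lambda_{\min} > 4$ eventually yields a negative exponent for large enough $T$.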


Theoretical Analysis: Multiple Hypothesis Testing

Theorem

For $T = O\left( \frac{\log N}{\log[K \log(N/K)]} \right)$ and $M = O(K \log(N/K))$,

$$\Pr\left\{ \lambda_{\min} > 4 \left[ K(N-K) \right]^{\frac{1}{\kappa T}} \right\} \to 1$$

as $N, K, M \to \infty$.


Theoretical Analysis: Discussion

$M = O(K \log(N/K))$ is the same as in conventional compressive sensing. We need $MT$ samples in total. When $K$ is sufficiently small compared with $N$, this value is still much smaller than $N$.

Actually the value of $T$ is not very large. For example, for $N = 10^{100}$, $K = 10^5$, we have $\frac{\log N}{\log[K \log(N/K)]} \approx 13$; for $N = 10^{100}$, $K = 10^{98}$, we have $\frac{\log N}{\log[K \log(N/K)]} \approx 1$.

After we recover the support, we can get the component values by solving a least-squares problem.
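The two numerical values of $\log N / \log[K \log(N/K)]$ quoted above can be reproduced with a few lines of Python. Natural logarithms are assumed here, and the function name `t_order` is a hypothetical label for the expression inside the $O(\cdot)$.

```python
import math

def t_order(N, K):
    """Evaluate log N / log[K log(N / K)] using natural logarithms."""
    return math.log(N) / math.log(K * math.log(N / K))

t_small_K = t_order(10**100, 10**5)    # roughly 13.6
t_large_K = t_order(10**100, 10**98)   # roughly 1.01
```

Python integers have arbitrary precision, so `10**100` can be passed directly; `math.log` accepts such large integers.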


Theoretical Analysis: Implications of the Bound

In practice, given $N$ and $K$, we take $M = O(K \log(N/K))$ and $T = O\left( \frac{\log N}{\log[K \log(N/K)]} \right)$ and generate the measurement matrix $A$. Then with large probability, we will get $\lambda_{\min} > 4 \left[ K(N-K) \right]^{\frac{1}{\kappa T}}$. For safety, we can

  compute $\lambda_{\min}$
  find $T$ large enough such that $\lambda_{\min} > 4 \left[ K(N-K) \right]^{\frac{1}{\kappa T}}$
  continue to increase $T$ so that $p_{\mathrm{err}} < \alpha$.
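The second step of this procedure has a closed form: $\lambda_{\min} > 4[K(N-K)]^{1/(\kappa T)}$ is equivalent to $T > \log[K(N-K)] / (\kappa \log(\lambda_{\min}/4))$ whenever $\lambda_{\min} > 4$. The sketch below computes the smallest such integer $T$; the function name `min_T` and the example values are hypothetical, and $\lambda_{\min}$ is assumed to have been computed already. The final step (driving $p_{\mathrm{err}}$ below $\alpha$) is not covered because it depends on the constant $C$ in the bound.

```python
import math

def min_T(lam_min, N, K, kappa):
    """Smallest integer T with lam_min > 4 [K (N - K)]^{1 / (kappa T)}."""
    if lam_min <= 4.0:
        # The threshold never drops to 4 or below, so no finite T can work.
        raise ValueError("lam_min must exceed 4 for the condition to be attainable")
    bound = math.log(K * (N - K)) / (kappa * math.log(lam_min / 4.0))
    return math.floor(bound) + 1   # strict inequality: T > bound

# Toy values (assumptions): lam_min = 8, N = 100, K = 5.
T_complex = min_T(8.0, 100, 5, kappa=1.0)   # complex case
T_real = min_T(8.0, 100, 5, kappa=0.5)      # real case needs twice the exponent
```

For these toy values the condition first holds at $T = 9$ in the complex case and $T = 18$ in the real case, reflecting the factor $\kappa$ in the exponent.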


Conclusions

Hypothesis testing for sparse signal support recovery

  BHT
  MHT

Bound for PoE non-degenerate measurement matrix

The behavior of critical quantity

Implications in system design

Another dimension of data collection gives us more flexibility


Future Work

Design measurement system with optimal λmin.

Establish a necessary condition imposed on $M$ and $T$

Analyze the behavior of $\lambda(S_0, S_1)$ and $\lambda_{\min}$ for other measurement matrix structures.

Devise an efficient algorithm for support recovery and compare its performance with the optimal one

The performance of the $\ell_1$ minimization algorithm

Develop an algorithm to compute $\lambda_{\min}$ for a given measurement matrix

Explore the relationship between $\lambda_{\min}$ and the Restricted Isometry Property (RIP).

Apply this result to the design of transmitted signals in Compressive Radar Imaging


Thank you!

