Reconstruction of Signals Drawn from a Gaussian Mixture via Noisy Compressive Measurements

Francesco Renna, Member, IEEE, Robert Calderbank, Fellow, IEEE, Lawrence Carin, Fellow, IEEE, and Miguel R. D. Rodrigues, Member, IEEE

Abstract—This paper determines to within a single measurement the minimum number of measurements required to successfully reconstruct a signal drawn from a Gaussian mixture model in the low-noise regime. The method is to develop upper and lower bounds that are a function of the maximum dimension of the linear subspaces spanned by the Gaussian mixture components. The method not only reveals the existence or absence of a minimum mean-squared error (MMSE) error floor (phase transition) but also provides insight into the MMSE decay via multivariate generalizations of the MMSE dimension and the MMSE power offset, which are a function of the interaction between the geometrical properties of the kernel and the Gaussian mixture. These results apply not only to standard linear random Gaussian measurements but also to linear kernels that minimize the MMSE. It is shown that optimal kernels do not change the number of measurements associated with the MMSE phase transition; rather, they affect the sensed power required to achieve a target MMSE in the low-noise regime. Overall, our bounds are tighter and sharper than standard bounds on the minimum number of measurements needed to recover sparse signals associated with a union of subspaces model, as they are not asymptotic in the signal dimension or signal sparsity.

Index Terms—Compressive sensing, Gaussian mixtures, reconstruction, classification, MMSE, MMSE decay, MMSE power offset, phase transition, kernel design

EDICS: SAM-CSSM, SSP-PERF. This paper was presented in part at the 2013 IEEE Global Conference on Signal and Information Processing. The work of F. Renna was supported by Fundação para a Ciência e a Tecnologia through the research project CMU-PT/SIA/0026/2009. The work of M. R. D. Rodrigues was supported by the EPSRC through the research grant EP/K503459/1. The work of R. Calderbank and L. Carin was partially supported by the Defense Advanced Research Projects Agency (DARPA) under the KeCom program, and also by the Office of Naval Research (ONR) and by the Air Force Office of Scientific Research under the Complex Networks Program. This work was also supported by the Royal Society International Exchanges Scheme IE120996.
F. Renna is with the Instituto de Telecomunicações and the Departamento de Ciência de Computadores, Faculdade de Ciências da Universidade do Porto, Porto, Portugal (e-mail: [email protected]).
R. Calderbank and L. Carin are with the Department of Electrical and Computer Engineering, Duke University, Durham NC, USA (e-mail: {robert.calderbank, lcarin}@duke.edu).
M. R. D. Rodrigues is with the Department of E&EE, University College London, London, UK (e-mail: [email protected]).

I. INTRODUCTION

The foundation of the digital revolution is the Shannon-Nyquist theorem, which provides a theoretical basis for digital processing of analog signals: it states that the sampling rate should be at least twice the Fourier bandwidth of the signal. The discrete-time representation of a continuous-time signal, which lies at the heart of analogue-to-digital conversion, offers the means to resilient data communication, storage and processing.

It has been recognized recently that the so-called Nyquist rate can be excessive in various emerging applications [1]–[3]: this – in addition to representing a burden to analog-to-digital converters [4], [5] – can also lead to a huge number of samples that compromise communications, storage and processing resources. Modern acquisition systems thus adopt a two-step approach that involves both an analog-to-digital conversion operation, whose purpose is to convert the information-bearing signal from the analogue to the digital domain, and a compression operation, whose purpose is to offer succinct (near lossless) representations of the data.

There has been a recent emergence of a new sensing modality, emblematically known as Compressive Sensing (CS) [6]–[8], that offers the means to simultaneously sense and compress a signal without any loss of information (under appropriate conditions on the signal model and measurement process). The sensing process is based on the projection of the signal of interest onto a set of vectors, which can be either constituted randomly [6]–[10] or designed [11], [12], and the recovery process is based on the resolution of an inverse problem. It is well known that the reconstruction of an $n$-dimensional signal that admits an $s$-sparse representation in some orthonormal basis or frame, via $\ell_0$-pseudonorm minimization algorithms, requires only $s + 1$ noiseless measurements [13], [14]. However, there are no tractable algorithms able to solve such a minimization problem. On the other hand, $\ell_1$ minimization methods [15] or iterative methods, like greedy matching pursuit [16]–[18], provide reliable reconstruction with overwhelming probability with only $O(s \log(n/s))$ linear random measurements or projections [6], [8], [10].

However, even in early CS studies, it has been recognized that it is possible to derive better compression performance, in terms of the minimum number of measurements necessary to achieve perfect or nearly perfect reconstruction, by leveraging the fact that signals often obey models with additional structure beyond conventional sparsity. Some popular models that capture such additional structure include the union of subspaces [19]–[22], wavelet trees [19], [23] or manifolds [24], [25]. Within the union of subspaces model, the source signal is assumed to belong to one out of a collection of $K$ subspaces with dimension less than or equal to $s$. Recovery of a signal in a union of subspaces is equivalent to the reconstruction of a block-sparse signal, when the individual subspaces in the union of subspaces model are decomposable as the direct sum of a given number of lower dimensional subspaces [21].



The minimum number of measurements required for reliable reconstruction in such scenarios has been shown to be of the order $O(s + \log(2K))$ [19] when using mixed $\ell_2/\ell_1$-norm minimization [21]. On the other hand, tree models, where the non-zero coefficients of the source signal are known to be gathered into a rooted, connected, tree structure, can describe the most relevant wavelet coefficients of piecewise smooth signals or images [26]. In this case, the number of measurements required for reliable reconstruction is of order $O(s)$ [23] by using a model-based version of the CoSaMP algorithm [27]. Finally, in the case of Riemannian manifolds, the minimum number of random projection measurements needed for reliable reconstruction has been derived to be of the order $O(s \log(n V R \tau^{-1}))$, where $s$ is the dimension of the manifold, $\tau^{-1}$ being its condition number, $V$ being its volume and $R$ being its geodesic covering regularity [24].

Another very useful structured model is the Gaussian mixture model (GMM) [25], [28]–[30]. GMMs are typically used in conjunction with the Bayesian CS formalism that entails the use of statistical descriptions of the source and a statistical description of the measurement system in order to perform reconstruction [31]. One important feature of these models relates to the existence of efficient and optimal inversion procedures, which can be expressed analytically in closed form [25]. The other important feature – in addition to the approximation of any distribution with arbitrary precision [32] – relates to the fact that these models have also been shown to provide state-of-the-art results in various practical problems [29]. These include problems in image processing such as interpolation, zooming, deblurring [28]–[30] and dictionary learning [25].

A GMM also relates to the other well-known structured models in the literature [19]–[22], [24], [25]. For example, the GMM can be seen as a Bayesian counterpart of the union of subspaces model (assuming each GMM mixture component has a near-low-rank covariance matrix). In fact, a signal drawn from a GMM lies in a union of subspaces, where each subspace corresponds to the image of each class-conditioned covariance matrix in the model¹. In addition, a low-rank GMM can also be seen as an approximation to a compact manifold. Compact manifolds can be covered by a finite collection of topological disks that can be represented by high-probability ellipsoids living on the principal hyperplanes corresponding to the different components of a low-rank GMM [25]. However, we emphasize that adopting a GMM in lieu of the other structured models has a specific advantage: reconstruction of a signal drawn from a GMM from compressive linear measurements in Gaussian noise can be very effectively performed via a closed-form inversion formula [25].

¹More generally, a signal drawn from a GMM model lies in a union of affine spaces rather than linear subspaces, where each affine space is associated with the mean and covariance of each class in the GMM model.

As such, and also in view of its practical relevance, this paper studies in detail the behavior of the minimum mean-squared error (MMSE) associated with the reconstruction of a signal drawn from a GMM, based on a set of linear and noisy compressive measurements. We consider the asymptotic regime of low noise, which is relevant in various signal and image processing scenarios [33], [34]. The emphasis is to understand, as a function of the properties of the linear measurement kernel and the Gaussian mixture, whether the MMSE converges or does not converge to zero as the noise power converges to zero, i.e. the MMSE phase transition. The main contributions are:

• A bound on the number of linear random measurements that are necessary to reconstruct perfectly a signal drawn from a GMM in the low-noise regime;

• A bound on the number of linear random measurements that are sufficient to reconstruct perfectly a signal drawn from a GMM in the low-noise regime, by analyzing the MMSE performance of a (sub-optimal) classification and reconstruction strategy;

• Generalization of the bounds on the number of measurements that are necessary and/or sufficient to reconstruct a signal drawn from a GMM based on a set of noisy compressive measurements, considering the scenario where the linear measurement kernel is constituted randomly and then extended to the case for which the linear measurement kernel is designed to minimize the mean-squared error;

• Characterization, whenever possible, of a more refined behavior of the low-noise asymptotics of the MMSE, that portrays the existence or absence of an MMSE error floor (the phase transition) as well as the MMSE decay, as a function of the geometry of the kernel and the geometry of the Gaussian mixture.

Overall, this contribution offers an analysis of the reconstruction performance and associated phase transitions that, in contrast with other results in the CS literature (e.g. [6], [9], [19], [21]–[24], [35]–[40]), is non-asymptotic in the signal dimension or the signal sparsity. Recent works have also proposed the use of message passing and belief propagation methods to increase the speed of reconstruction algorithms [41]–[45]. However, these approaches have also been studied under the large-system assumption. In particular, message passing methods are proved to be computationally efficient in large-scale applications, while guaranteeing reliable reconstruction with a number of measurements of the same order as $\ell_1$-norm minimization [41], [43]. In fact, message passing algorithms are shown to be equivalent to MMSE estimation in the asymptotic large-system limit, when the projection matrix is sparse and the inputs are independent identically distributed (i.i.d.) with arbitrary distribution [42].

On the other hand, the analysis proposed in this paper is based on models that naturally incorporate memory rather than memoryless models, as opposed to previous contributions in the literature on the information-theoretic characterization of CS [38]–[40]. Memoryless models, however, have been characterized in terms of the reconstruction MMSE in the large-system limit. In particular, [46] shows that, in the large-system limit, the overall mean-squared error (MSE) can be decoupled into the MSE relative to the reconstruction of the individual elements of the sparse signal, and such value admits a single-letter characterization. This analysis is justified on the basis of a heuristic method from statistical physics, the replica method, but it can be proved rigorously for the case of sparse measurement matrices. The replica method has been shown to provide bounds that are in agreement with the exact analysis for the case of sparsity-pattern recovery in the low-noise regime [47], and it has been used also to evaluate the MSE associated with the maximum a posteriori (MAP) estimator in the large-system limit [48].

The remainder of the paper is organized as follows: Section II introduces the system model and the main performance quantities and definitions. The impact of a random linear measurement kernel on the behavior of the MMSE and its phase transition is investigated first for a signal drawn from a Gaussian distribution in Section III, to develop essential intuition; we then consider a signal drawn from a mixture of Gaussian distributions in Section IV. The impact on the phase transition of designing the measurement kernel is then illustrated in Section V. Section VI exhibits various numerical results both with synthetic and real data, illustrating the main operational features of the problem. Section VII summarizes the main contributions. For completeness, Appendix A collects a number of useful lemmas that are relevant for the proofs of the main results reported in Appendices B and C.

We use the following notation: boldface upper-case letters denote matrices ($X$) and boldface lower-case letters denote column vectors ($x$); the context defines whether the quantities are deterministic or random. The symbols $I_n$ and $0_{m \times n}$ represent the identity matrix of dimension $n \times n$ and the all-zero-entries matrix of dimension $m \times n$, respectively (subscripts will be dropped whenever the dimensions are clear from the context). The expressions $(\cdot)^\dagger$, $\mathrm{tr}(\cdot)$, $\mathrm{rank}(\cdot)$ represent the transpose, trace and the rank operators, respectively. $\mathrm{Im}(\cdot)$ and $\mathrm{Null}(\cdot)$ denote the (column) image and null space of a matrix, respectively, $(\cdot)^\perp$ denotes the orthogonal complement of a linear subspace, and $\dim(\cdot)$ denotes the dimension of a linear subspace. $\mathbb{E}[\cdot]$ represents the expectation operator. The Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$ is denoted by $\mathcal{N}(\mu, \Sigma)$.

II. SYSTEM MODEL

We study the problem associated with the reconstruction of a source signal $x \in \mathbb{R}^n$ from a set of $\ell < n$ noisy, linear projections $y \in \mathbb{R}^\ell$ where

$$y = \Phi x + w, \qquad (1)$$

and $\Phi \in \mathbb{R}^{\ell \times n}$ is the linear measurement kernel² and $w \sim \mathcal{N}(0, \sigma^2 \cdot I_\ell)$ is zero-mean, additive white Gaussian noise (AWGN). We consider in Sections III and IV random measurement kernel designs, where the entries of $\Phi$ are drawn i.i.d. from a zero-mean, fixed-variance, Gaussian distribution, which is common in the CS literature [6], [8]. However, we also consider in Section V design of the measurement kernel $\Phi$, that aims to minimize the reconstruction error.

²Throughout the paper, we will refer to $\Phi$ as the sensing matrix, measurement matrix and kernel, interchangeably.

In this work we also concentrate on two particular distributions for the source vector: a Gaussian distribution and a GMM distribution (of course, the former is a special case of the latter). For a Gaussian source vector, $x \sim \mathcal{N}(\mu_x, \Sigma_x)$,

where $\mu_x$ represents the mean and $\Sigma_x$ represents the (possibly rank deficient) covariance matrix. We denote the eigenvalue decomposition of the positive semidefinite covariance matrix by

$$\Sigma_x = U_x \Lambda_x U_x^\dagger = U_x\, \mathrm{diag}(\lambda_{x_1}, \ldots, \lambda_{x_s}, 0, \ldots, 0)\, U_x^\dagger, \qquad (2)$$

where the orthogonal matrix $U_x$ contains the eigenvectors of $\Sigma_x$, the diagonal matrix $\Lambda_x = \mathrm{diag}(\lambda_{x_1}, \ldots, \lambda_{x_s}, 0, \ldots, 0)$ contains the eigenvalues of $\Sigma_x$, $\lambda_{x_1} \geq \ldots \geq \lambda_{x_s} > 0$, and $s = \mathrm{rank}(\Sigma_x)$ represents the rank of $\Sigma_x$.

For a GMM source, $x \sim \sum_k p_k\, \mathcal{N}(\mu_x^{(k)}, \Sigma_x^{(k)})$, so that the source vector is assumed to be drawn from one out of $K$ different classes with probability $p_k$, $k = 1, \ldots, K$, where the distribution of the source vector conditioned on the class $k$ is Gaussian with mean $\mu_x^{(k)}$ and (possibly rank deficient) covariance matrix $\Sigma_x^{(k)}$. We let $s_k = \mathrm{rank}(\Sigma_x^{(k)})$, $s_{km} = \mathrm{rank}(\Sigma_x^{(k)} + \Sigma_x^{(m)})$ and $s_{\max} = \max_k s_k$.

A low-rank modeling approach, where $s_k < n$ for some or all $k$, is the basis of the theory developed in Sections III, IV and V. This implies that a realization of the source signal lies on one out of the $K$ affine subspaces associated with the translation by the mean vector $\mu_x^{(k)}$ of the subspaces corresponding to the images of the class-conditioned covariance matrices $\Sigma_x^{(k)}$. Therefore, a signal drawn from a GMM model can also be seen to lie on a union of subspaces (or affine subspaces) [21]. However, the fact that natural signals and images are not always exactly low-rank but rather "approximately" low-rank is also discussed in the sequel (see Section VI-B).

We use the MMSE to assess the level of distortion incurred in the reconstruction of the original source vector $x$ from the projections vector $y$ in the CS model in (1), which is given by:

$$\mathrm{MMSE} = \mathbb{E}\left[\|x - \hat{x}(y)\|^2\right], \qquad (3)$$

where

$$\hat{x}(y) = \mathbb{E}[x \mid y]. \qquad (4)$$

We focus on the characterization of the behavior of the MMSE in the low-noise regime, i.e., for $\sigma^2 \to 0$, which represents the regime with most operational relevance in many signal and image processing applications (the noise magnitude typically considered for these applications is $\sigma^2 = -60$ dB [33], [34]). This includes the characterization of an asymptotic expansion of the MMSE as $\sigma^2 \to 0$ together with the characterization of the phase transition of the MMSE as $\sigma^2 \to 0$. The MMSE phase transition, in line with other results in the literature [37], [38], corresponds to the minimum number of measurements that guarantee perfect reconstruction in the low-noise regime.
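To make the system model concrete, the following minimal sketch simulates (1) with a low-rank GMM source. It is illustrative only: the dimensions, priors, helper names and seed are our arbitrary choices, not values from the paper.

```python
import numpy as np

# Minimal sketch of the measurement model (1) with a low-rank GMM source.
rng = np.random.default_rng(0)
n, ell, K, s = 8, 4, 2, 3          # signal dim, measurements, classes, rank

means = [rng.standard_normal(n) for _ in range(K)]
covs = []
for _ in range(K):
    A = rng.standard_normal((n, s))
    covs.append(A @ A.T)           # rank(Sigma_x^(k)) = s almost surely
p = np.full(K, 1.0 / K)            # class priors p_k

def draw_gmm(rng):
    """Draw x ~ sum_k p_k N(mu_x^(k), Sigma_x^(k))."""
    k = rng.choice(K, p=p)
    return k, rng.multivariate_normal(means[k], covs[k])

# Random Gaussian kernel Phi and noisy measurements y = Phi x + w
sigma2 = 1e-4
Phi = rng.standard_normal((ell, n)) / np.sqrt(n)   # i.i.d. zero-mean entries
k, x = draw_gmm(rng)
y = Phi @ x + np.sqrt(sigma2) * rng.standard_normal(ell)
```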

III. GAUSSIAN SOURCES

We first consider the characterization of the MMSE phase transition and a low-noise MMSE expansion associated with a linearly and compressively measured signal drawn from a Gaussian source. Such characterizations, which can be crisply expressed in terms of the geometry of the measurement kernel, the geometry of the source and their interplay, pave the way to the characterization of the MMSE phase transition associated with GMM sources.

For a Gaussian source with mean $\mu_x$ and covariance matrix $\Sigma_x$, in the presence of zero-mean, additive Gaussian noise with covariance matrix $\sigma^2 I$, the conditional mean estimator can be expressed as follows [49]:

$$\hat{x}(y) = \mathrm{W}(y) = \mu_x + W(y - \Phi\mu_x), \qquad (5)$$

where $W = \Sigma_x \Phi^\dagger (\sigma^2 I + \Phi \Sigma_x \Phi^\dagger)^{-1}$ corresponds to the Wiener filter associated with a Gaussian source with mean zero and covariance $\Sigma_x$, and the MMSE can also be expressed in closed form as follows [49]:

$$\mathrm{MMSE}^{G}(\sigma^2) = \mathrm{tr}\left(\Sigma_x - \Sigma_x \Phi^\dagger \left(\sigma^2 I + \Phi \Sigma_x \Phi^\dagger\right)^{-1} \Phi \Sigma_x\right), \qquad (6)$$

where we explicitly highlight the dependence of the MMSE on the measurement noise variance $\sigma^2$, assuming a fixed Gaussian source with covariance $\Sigma_x$. The following theorem now reveals the asymptotic behavior of the MMSE in the low-noise regime. We define

$$\Sigma = \Sigma_x^{\frac{1}{2}} \Phi^\dagger \Phi\, \Sigma_x^{\frac{1}{2}} = U \Lambda U^\dagger, \qquad (7)$$

where $\Sigma_x^{\frac{1}{2}}$ is the (positive semidefinite) matrix square root of $\Sigma_x$, $U$ is an orthogonal matrix that contains the eigenvectors of $\Sigma$, $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_{\ell'}, 0, \ldots, 0)$ is a diagonal matrix that contains the eigenvalues of $\Sigma$, $\lambda_1 \geq \ldots \geq \lambda_{\ell'} > 0$ and $\ell' = \mathrm{rank}(\Sigma)$. Note also that, when the entries of the measurement kernel $\Phi$ are drawn i.i.d. from a zero-mean, fixed-variance, Gaussian distribution, the rank of $\Sigma$ is equal to $\ell' = \mathrm{rank}(\Sigma) = \min\{s, \ell\}$, with probability 1.

Theorem 1: Consider the linear measurement model in (1) where $x \sim \mathcal{N}(\mu_x, \Sigma_x)$, $s = \mathrm{rank}(\Sigma_x)$ and $\ell$ is the number of measurements. The first-order, low-noise expansion of the MMSE is given by:

$$\mathrm{MMSE}^{G}(\sigma^2) = M_\infty^{G} + D^{G} \cdot \sigma^2 + o(\sigma^2). \qquad (8)$$

The zero-order term in the expansion, which relates to the MMSE floor, is given by:

$$M_\infty^{G} = \lim_{\sigma^2 \to 0} \mathrm{MMSE}^{G}(\sigma^2) = \sum_{i=\ell'+1}^{s} u_i^\dagger \Sigma_x u_i, \qquad (9)$$

where the vectors $u_{\ell'+1}, \ldots, u_s$ form an orthonormal basis of the linear subspace $\mathrm{Null}(\Sigma) \cap \mathrm{Null}(\Sigma_x)^\perp$, and the coefficient of the first-order term in the expansion is given by:

$$D^{G} = \sum_{i=1}^{\ell'} \frac{1}{\lambda_i}\, u_i^\dagger \Sigma_x u_i, \qquad (10)$$

where $u_1, \ldots, u_{\ell'}$ are eigenvectors of $\Sigma$ corresponding to the positive eigenvalues $\lambda_1, \ldots, \lambda_{\ell'}$.

Proof: The proof of this Theorem is provided in Appendix B.

The value of the zero-order term is clearly zero if $\ell \geq s$ and it is non-zero otherwise, with its value dictated by the interaction of the geometry of the source with the geometry of the measurement kernel, as portrayed via (9) and (7). The value of the coefficient of the first-order term is non-zero, with its value depending as well on the interaction of the description of the source and of the kernel.

Theorem 1 and (8), (9) and (10) then lead to the following conclusions (see also the numerical sketch after this list):

• When $\ell < s$ the number of measurements is not sufficient to capture the full range of the source, so that the reconstruction is not perfect ($M_\infty^{G} \neq 0$). On the other hand, when $\ell \geq s$ such a number of measurements captures completely the source information, leading to perfect reconstruction ($M_\infty^{G} = 0$) in the low-noise regime. That is, one requires the number of linear random measurements to be greater than or equal to the dimension of the subspace spanned by the source for the phase transition to occur.

• When $\ell \geq s$, the rate of decay of the MMSE is $O(\sigma^2)$ as $\sigma^2 \to 0$, as in the scalar case [50]. On the other hand, the power offset of the MMSE on a $10 \cdot \log_{10} \frac{1}{\sigma^2}$ scale is dictated by the quantity $10 \cdot \log_{10} D^{G}$. In fact, the quantity $D^{G}$, which represents the multivariate Gaussian counterpart of the MMSE dimension put forth in [50], distinguishes MMSE expansions associated with different realizations of the measurement kernel and different source covariances.

• Of particular interest, the presence or absence of an MMSE floor depends only on the relation between the number of measurements $\ell$ and the rank of the source covariance $s$. On the other hand, the exact value of the MMSE floor (when $\ell < s$) and the MMSE power offset (when $\ell \geq s$) depends on the relation between the geometry of the random measurement kernel and the geometry of the source.
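As a numerical sanity check on Theorem 1, the sketch below (dimensions are our arbitrary choices) evaluates the closed form (6) at decreasing noise levels and compares it with $M_\infty^G + D^G \sigma^2$ computed from the eigendecomposition of $\Sigma$ in (7). Since a numerical eigendecomposition returns an arbitrary basis of $\mathrm{Null}(\Sigma)$, the floor is accumulated over the whole null space; directions in $\mathrm{Null}(\Sigma_x)$ contribute zero, so this equals (9).

```python
import numpy as np

rng = np.random.default_rng(1)
n, s, ell = 6, 4, 3                      # ell < s: an MMSE floor is expected

A = rng.standard_normal((n, s))
Sigma_x = A @ A.T                        # rank-s source covariance
Phi = rng.standard_normal((ell, n))      # random Gaussian kernel

# Eigendecomposition of Sigma = Sigma_x^{1/2} Phi' Phi Sigma_x^{1/2}, eq. (7)
w, V = np.linalg.eigh(Sigma_x)
Sx_half = (V * np.sqrt(np.clip(w, 0, None))) @ V.T
lam, U = np.linalg.eigh(Sx_half @ Phi.T @ Phi @ Sx_half)
lam, U = lam[::-1], U[:, ::-1]           # sort eigenvalues in decreasing order
ell_p = min(s, ell)                      # ell' = rank(Sigma) with probability 1

# Floor (9), accumulated over all of Null(Sigma), and slope (10)
M_floor = sum(U[:, i] @ Sigma_x @ U[:, i] for i in range(ell_p, n))
D_slope = sum(U[:, i] @ Sigma_x @ U[:, i] / lam[i] for i in range(ell_p))

for sigma2 in (1e-2, 1e-4, 1e-6):
    G = Sigma_x @ Phi.T @ np.linalg.inv(sigma2 * np.eye(ell) + Phi @ Sigma_x @ Phi.T)
    mmse = np.trace(Sigma_x - G @ Phi @ Sigma_x)       # closed form (6)
    print(sigma2, mmse, M_floor + D_slope * sigma2)    # should agree as sigma2 -> 0
```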

IV. GMM SOURCES

We are now ready to consider the characterization of MMSE phase transitions associated with a linearly and compressively measured signal drawn from a GMM source. In particular, and in view of the lack of a closed-form tractable MMSE expression, we derive necessary and sufficient conditions for the phase transitions to occur via bounds to the MMSE for GMM sources, which will be denoted by the symbol $\mathrm{MMSE}^{GM}(\sigma^2)$.

A. Necessary condition

The necessary condition on the number of random linear measurements (components of $\Phi$ constituted at random) for the MMSE phase transition to occur is based on the analysis of a lower bound to the MMSE. We express the lower bound in terms of the Gaussian MMSE associated with the actual class from which each signal realization is drawn. Namely, we have

$$\mathrm{MMSE}^{GM}(\sigma^2) = \mathbb{E}\left[\|x - \hat{x}(y)\|^2\right] \qquad (11)$$
$$= \sum_{k=1}^{K} p_k\, \mathbb{E}\left[\|x - \hat{x}(y)\|^2 \mid c = k\right] \qquad (12)$$
$$\geq \sum_{k} p_k\, \mathrm{MMSE}^{G}_k(\sigma^2) = \mathrm{MSE}^{LB}(\sigma^2), \qquad (13)$$

where $\mathrm{MMSE}^{G}_k(\sigma^2)$ denotes the MMSE associated with the reconstruction of Gaussian signals $x$ in class $c = k$ from the measurement vector $y$. Note that the equality in (12) is due to the total probability formula and the inequality in (13) follows from the optimality of the MMSE estimator (5) for a single Gaussian source.

Via the analysis of $\mathrm{MSE}^{LB}(\sigma^2)$, we obtain immediately the following necessary condition on the number of random linear measurements for the true MMSE to approach zero.

Theorem 2: Consider the linear measurement model in (1) where $x \sim \sum_k p_k \mathcal{N}(\mu_x^{(k)}, \Sigma_x^{(k)})$, $s_{\max} = \max_k s_k$, with $s_k = \mathrm{rank}(\Sigma_x^{(k)})$, and $\ell$ is the number of measurements. Then, with probability 1, it follows that:

$$\lim_{\sigma^2 \to 0} \mathrm{MMSE}^{GM}(\sigma^2) = 0 \;\Rightarrow\; \ell \geq s_{\max}. \qquad (14)$$

Proof: It is evident that if $\mathrm{MMSE}^{G}_k(\sigma^2) \to 0$ as $\sigma^2 \to 0$, for all $k$, then $\mathrm{MSE}^{LB}(\sigma^2) \to 0$ as $\sigma^2 \to 0$. Theorem 1 proves that a necessary and sufficient condition for $\lim_{\sigma^2 \to 0} \mathrm{MMSE}^{G}_k(\sigma^2) = 0$ to hold with probability 1 is that $\ell \geq s_k$; therefore, a necessary condition for $\lim_{\sigma^2 \to 0} \mathrm{MMSE}^{GM}(\sigma^2) = 0$ to hold with probability 1 is that $\ell \geq s_{\max} = \max_k s_k$.

An immediate corollary is the following first-order, low-noise expansion for the lower bound of the MMSE for GMM inputs:

$$\mathrm{MSE}^{LB}(\sigma^2) = \sum_k p_k M_{\infty,k}^{G} + \left(\sum_k p_k D_k^{G}\right)\sigma^2 + o(\sigma^2), \qquad (15)$$

where $M_{\infty,k}^{G}$ and $D_k^{G}$ are the zero and first-order terms of the expansion of the Gaussian MMSE corresponding to class $k$.
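The lower bound (13) is directly computable, since each per-class term is the Gaussian closed form (6); a minimal sketch (the helper names are ours, not the paper's):

```python
import numpy as np

def gaussian_mmse(Sigma_x, Phi, sigma2):
    """Closed-form Gaussian MMSE, eq. (6)."""
    ell = Phi.shape[0]
    S = sigma2 * np.eye(ell) + Phi @ Sigma_x @ Phi.T
    return np.trace(Sigma_x - Sigma_x @ Phi.T @ np.linalg.solve(S, Phi @ Sigma_x))

def mse_lb(p, covs, Phi, sigma2):
    """Lower bound MSE^LB in (13): prior-weighted per-class Gaussian MMSEs."""
    return sum(pk * gaussian_mmse(Sk, Phi, sigma2) for pk, Sk in zip(p, covs))
```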

B. Sufficient condition

The sufficient condition on the number of random linear measurements for the MMSE phase transition to occur is based instead on the analysis of an upper bound to the MMSE. We construct such an MMSE upper bound – denoted as $\mathrm{MSE}^{CR}(\sigma^2)$ – by using a (sub-optimal) classify and reconstruct procedure³:

1) First, we obtain an estimate of the signal class by the MAP classifier as follows:

$$\hat{c} = \arg\max_k\, p(y \mid c = k)\, p_k, \qquad (16)$$

where the variable $\hat{c}$ represents the estimate of the signal class, the random variable $c$ represents the actual signal class and $p(y \mid c = k)$ denotes the conditioned probability density function (pdf) of the measurement vector $y$ given the signal class $k$;

2) Then, we reconstruct the source vector $x$ from the measurement vector $y$ by using the conditional mean estimator associated with the class estimate $\hat{c}$ as follows:

$$\hat{x}(y, \hat{c} = c) = \mathrm{W}_c(y) = \mu_x^{(c)} + W_c(y - \Phi\mu_x^{(c)}). \qquad (17)$$

Note that a similar approach has been shown to offer state-of-the-art performance in the reconstruction of signals drawn from GMM sources from compressive measurements [29], [30].

The optimality of the conditional mean estimator together with the (in general) sub-optimality of the classify and reconstruct approach leads immediately to the fact that:

$$\mathrm{MMSE}^{GM}(\sigma^2) \leq \mathrm{MSE}^{CR}(\sigma^2). \qquad (18)$$

³Note that the classify and reconstruct procedure is presented here as a mathematical tool to determine an upper bound to the number of measurements needed to obtain perfect reconstruction when using the optimal conditional mean estimator.
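A minimal sketch of the classify and reconstruct procedure (16)-(17), using the fact that, under (1), $y$ conditioned on class $k$ is Gaussian with mean $\Phi\mu_x^{(k)}$ and covariance $\Phi\Sigma_x^{(k)}\Phi^\dagger + \sigma^2 I$; function and variable names are ours:

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify_and_reconstruct(y, Phi, p, means, covs, sigma2):
    """Classify-and-reconstruct sketch, eqs. (16)-(17)."""
    ell = Phi.shape[0]
    # MAP classification (16): y | c=k ~ N(Phi mu_k, Phi Sigma_k Phi' + sigma2 I)
    scores = []
    for pk, mu, Sig in zip(p, means, covs):
        Sy = Phi @ Sig @ Phi.T + sigma2 * np.eye(ell)
        scores.append(np.log(pk) + multivariate_normal.logpdf(y, Phi @ mu, Sy))
    c_hat = int(np.argmax(scores))

    # Class-conditioned Wiener reconstruction (17)
    mu, Sig = means[c_hat], covs[c_hat]
    Sy = Phi @ Sig @ Phi.T + sigma2 * np.eye(ell)
    W = Sig @ Phi.T @ np.linalg.inv(Sy)
    return c_hat, mu + W @ (y - Phi @ mu)
```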

The analysis of the classify and reconstruct MMSE, which is aided by recent results on the characterization of the performance of GMM classification problems from noisy compressive measurements [51], then leads to the following sufficient condition on the number of random linear measurements for the true MMSE to approach zero.

Theorem 3: Consider the linear measurement model in (1) where $x \sim \sum_k p_k \mathcal{N}(\mu_x^{(k)}, \Sigma_x^{(k)})$, $s_{\max} = \max_k s_k$, with $s_k = \mathrm{rank}(\Sigma_x^{(k)})$, and $\ell$ is the number of measurements. Then, with probability 1, it holds:

$$\ell > s_{\max} \;\Rightarrow\; \lim_{\sigma^2 \to 0} \mathrm{MMSE}^{GM}(\sigma^2) = 0. \qquad (19)$$

Proof: The proof of this Theorem is provided in Appendix C.

C. The MMSE phase transition

The characterization of the MMSE phase transition follows by combining the results encapsulated in Theorems 2 and 3. In particular, it is possible to construct a sharp characterization of the transition that is accurate within one measurement, where:

• When $\ell < s_{\max}$, the function $\mathrm{MMSE}^{GM}(\sigma^2)$ converges to an error floor as $\sigma^2 \to 0$;

• When $\ell > s_{\max}$, the function $\mathrm{MMSE}^{GM}(\sigma^2)$ converges to zero as $\sigma^2 \to 0$;

• When $\ell = s_{\max}$, the function $\mathrm{MMSE}^{GM}(\sigma^2)$ may or may not approach zero as $\sigma^2 \to 0$, depending on the exact class dependent source covariances (see Section VI).

Note that – akin to the Gaussian result – one requires the number of linear random measurements to be greater than the largest of the dimensions of the subspaces spanned by the class dependent source covariances for the phase transition to occur. This is due to the fact that – as reported in Appendix C – with such a number of measurements one is able to classify perfectly and thereby to reconstruct perfectly in the low-noise regime. Note also that the classify and reconstruct procedure is nearly "phase transition" optimal: the number of measurements required by such a procedure differs at most by one measurement from the number of measurements required by the optimal conditional mean estimation strategy. This also provides a rationale for the state-of-the-art results reported in [29], [30], which are based on the use of the class conditioned Wiener filters for reconstruction and the detection of the a posteriori most probable class of the signal.

V. FROM RANDOM TO DESIGNED KERNELS

The emphasis of Sections III and IV has been on the derivation of necessary and sufficient conditions on the number of linear random measurements for the MMSE phase transition to occur. However, in view of recent interest on the design of linear measurements in the literature [11], [12], [33], [34], [52], it is also natural to ask whether designed kernels have an impact on such bounds.

In particular, we seek to characterize the impact on the MMSE phase transition of kernels designed via the following optimization problem:

$$\min_{\Phi}\; \mathrm{MMSE}(\sigma^2, \Phi) \quad \text{subject to} \quad \mathrm{tr}\left(\Phi\Phi^\dagger\right) \leq \ell, \qquad (20)$$

where the constraint guarantees that, on average, the rows of the designed kernel have unit $\ell_2$-norm. Note that, for the sake of clarity, we now express explicitly that the MMSE is a function of the linear measurement kernel.

We start by showcasing the optimal linear kernel design for a Gaussian source, where "optimality" is defined by the optimization problem in (20), which follows immediately by leveraging results on the joint optimization of transmitter and receiver for coherent multiple input multiple output (MIMO) transmission [53].

Theorem 4: Consider the linear measurement model in (1) where $x \sim \mathcal{N}(\mu_x, \Sigma_x)$, $s = \mathrm{rank}(\Sigma_x)$ and $\ell$ is the number of measurements. Then, the measurement kernel $\Phi^\star$ that solves the optimization problem in (20) can be expressed as follows:

$$\Phi^\star = \left[\mathrm{diag}\left(\sqrt{\lambda^\star_{\Phi,1}}, \ldots, \sqrt{\lambda^\star_{\Phi,\ell}}\right) \;\; 0_{\ell \times (n-\ell)}\right] U_x^\dagger, \qquad (21)$$

where the squared singular values of $\Phi^\star$ are obtained through the water-filling principle [54] as follows:

$$\lambda^\star_{\Phi,i} = \left[\eta - \frac{\sigma^2}{\lambda_{x,i}}\right]^+, \qquad (22)$$

and $\eta > 0$ is such that $\sum_{i=1}^{\ell} \lambda^\star_{\Phi,i} \leq \ell$ and $[x]^+ = \max\{x, 0\}$.

Proof: The problem of finding the projection matrix which minimizes the reconstruction MMSE for Gaussian input signals can be mapped, with appropriate modifications, to the problem of finding the linear precoder which minimizes the MMSE of a MIMO transmission system. In particular, the objective function of this minimization problem is a Schur-concave function of the MMSE matrix and the optimal linear precoder is shown to diagonalize the MIMO channel matrix [53, Theorem 1]. This implies that, in our scenario, the designed kernel right singular vectors correspond to the source covariance eigenvectors, i.e., that the measurement kernel that minimizes the MMSE exposes the modes of the source covariance⁴. Then, the fact that the squared singular values of $\Phi^\star$ obey the water-filling type of interpretation in (22) follows from the Karush-Kuhn-Tucker (KKT) conditions associated with the optimization problem yielded by taking the kernel right singular vectors to correspond to the source covariance eigenvectors.

⁴In general, when the additive Gaussian noise $w$ has a non diagonal covariance matrix $\Sigma_w$, the measurement kernel $\Phi^\star$ can be shown to expose and align the modes of both the input source and the noise, as observed for the measurement kernel which maximizes the mutual information between $x$ and $y$ [33].

It is now straightforward to show that kernel design does not impact the phase transition of the MMSE associated with Gaussian sources. However, and despite the fact that kernel design does not affect the number of measurements necessary to observe the phase transition, we also show that there is value in using designed kernels in lieu of random ones, because one can thus improve reconstruction performance in terms both of a lower error floor (if present) and a lower power offset.

Theorem 5: Consider the linear measurement model in (1) where $x \sim \mathcal{N}(\mu_x, \Sigma_x)$, $s = \mathrm{rank}(\Sigma_x)$, $\ell$ is the number of measurements and $\Phi = \Phi^\star$, where $\Phi^\star$ solves the optimization problem in (20). Then, the first-order, low-noise expansion of the MMSE is given by:

$$\mathrm{MMSE}^{G}(\sigma^2, \Phi^\star) = M_\infty^{GD} + D^{GD} \cdot \sigma^2 + o(\sigma^2), \qquad (23)$$

where

$$M_\infty^{GD} = \sum_{i=\ell'+1}^{s} \lambda_{x,i}, \qquad D^{GD} = (\ell')^2/\ell \qquad (24)$$

and $\ell' = \min\{s, \ell\}$.

Proof: On substituting the expression of $\Phi^\star$ in (21) into (6), it is possible to expand the MMSE associated with the optimal linear kernel design as follows:

$$\mathrm{MMSE}^{G}(\sigma^2, \Phi^\star) = \sum_{i=1}^{\ell'} \frac{\lambda_{x,i}}{1 + \frac{1}{\sigma^2}\lambda_{x,i}\lambda^\star_{\Phi,i}} + \sum_{i=\ell'+1}^{s} \lambda_{x,i}. \qquad (25)$$

Observe that in the limit $\sigma^2 \to 0$, it follows from (22) that $\lambda^\star_{\Phi,i} = \ell/\ell'$ for $i = 1, \ldots, \ell'$ and $\lambda^\star_{\Phi,i} = 0$ for $i = \ell'+1, \ldots, \ell$. Moreover, notice also that the second term in (25) is identically equal to zero if and only if $\ell \geq s$.

Note that when $\ell < s$ the error floor corresponds to the source power that the kernel fails to capture in view of its compressive nature. On the other hand, when $\ell \geq s$ the MMSE decay associated with an optimal kernel is equal to that of a random one, i.e. $O(\sigma^2)$, but the MMSE power offset is lower: it is only a function of the number of measurements and the dimension of the subspace spanned by the source, independently of the exact form of the eigenvectors or eigenvalues of the source covariance.

Finally, it is also straightforward to show that kernel design does not impact the phase transition of the MMSE associated with GMM sources. The minimum number of measurements required to perfectly reconstruct the signal in the low-noise regime is also $\ell > s_{\max}$ in this case. However, careful kernel design can increase the system performance by guaranteeing lower error floors and power offsets.

The method leveraged to prove this result is also based on the analysis of lower and upper bounds to the MMSE associated with the reconstruction of signals drawn from a GMM source that are sensed now via the optimal linear kernel design. In particular, we consider a lower bound – which we denote by $\mathrm{MSE}^{LBD}(\sigma^2)$ – which is expressed in terms of the value of the MMSE corresponding to the actual Gaussian class from which each realization of the input signal is drawn and the corresponding optimal kernel $\Phi^\star_k$ in (21). Then,

$$\mathrm{MMSE}^{GM}(\sigma^2, \Phi^\star) \geq \mathrm{MSE}^{LBD}(\sigma^2) \qquad (26)$$
$$= \sum_k p_k\, \mathrm{MMSE}^{G}_k(\sigma^2, \Phi^\star_k). \qquad (27)$$

We also consider a trivial upper bound: the MMSE associated with the optimal kernel design can always be upper bounded by the MMSE associated with a random kernel design.

Theorem 6: Consider the linear measurement model in (1) where $x \sim \sum_k p_k \mathcal{N}(\mu_x^{(k)}, \Sigma_x^{(k)})$, $s_{\max} = \max_k s_k$, with $s_k = \mathrm{rank}(\Sigma_x^{(k)})$, $\ell$ is the number of measurements and $\Phi = \Phi^\star$, where $\Phi^\star$ solves the optimization problem in (20). Then, it holds

$$\lim_{\sigma^2 \to 0} \mathrm{MMSE}^{GM}(\sigma^2, \Phi^\star) = 0 \;\Rightarrow\; \ell \geq s_{\max}, \qquad (28)$$

and

$$\ell > s_{\max} \;\Rightarrow\; \lim_{\sigma^2 \to 0} \mathrm{MMSE}^{GM}(\sigma^2, \Phi^\star) = 0. \qquad (29)$$

Proof: It is possible to prove the sufficient condition (29) by observing that

$$\mathrm{MMSE}^{GM}(\sigma^2, \Phi^\star) \leq \mathrm{MMSE}^{GM}(\sigma^2, \Phi) \qquad (30)$$

for all possible measurement matrices $\Phi$ that verify the trace constraint in (20). Among them, we can consider random kernels with Gaussian, zero-mean and fixed-variance, i.i.d. entries, and we can obtain the sufficient condition by leveraging directly the result in Theorem 3. On the other hand, in order to prove the necessary condition (28), consider the lower bound in (27) and observe that

$$\mathrm{MMSE}^{G}_k(\sigma^2, \Phi^\star_k) \leq \mathrm{MMSE}^{G}_k(\sigma^2, \Phi) \qquad (31)$$

for all possible measurement matrices $\Phi$ that verify the trace constraint in (20). Also in this case, we can consider random kernels with Gaussian, zero-mean and fixed-variance, i.i.d. entries. Then, it is possible to prove (28) by leveraging the necessary condition embedded in Theorem 5, that is,

$$\lim_{\sigma^2 \to 0} \mathrm{MMSE}^{G}_k(\sigma^2, \Phi) = 0 \;\Rightarrow\; \ell \geq s_k. \qquad (32)$$

In view of (23) and (27), the low-noise expansion of the proposed lower bound is given by:

$$\mathrm{MSE}^{LBD}(\sigma^2) = \sum_k p_k M_{\infty,k}^{GD} + \left(\sum_k p_k D_k^{GD}\right)\sigma^2 + o(\sigma^2), \qquad (33)$$

where $M_{\infty,k}^{GD}$ and $D_k^{GD}$ are the zero-order term and the coefficient of the first-order term of the expansion of the Gaussian MMSE corresponding to class $k$. We also conclude that, though kernel design may not increase the MMSE decay rate beyond $O(\sigma^2)$, it can have an impact on the MMSE power offset associated with GMM sources.

It is important to note though that this analysis has concentrated on offline designs where the $\ell$ measurements are designed concurrently [33], [52], rather than online kernel designs that entail a sequential design of the measurements by leveraging information derived from previous measurements [30], [33], [55]. It is possible that such online designs have an impact on the phase transition.

VI. NUMERICAL RESULTS

We now provide results both with synthetic and real data to illustrate the theory. Recovery is based upon the conditional expectation, which is analytic for the case of GMM priors, and optimal in terms of mean-squared error.

Fig. 1. MMSE vs. $1/\sigma^2$ for different numbers of random measurements $\ell = 2, 3, 4, 5$ for a Gaussian source with $n = 5$ and $s = 4$. Actual MMSE (solid lines) and low-noise, first-order expansions (dashed lines).
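For reference, the analytic conditional expectation for a GMM prior is the posterior-weighted combination of the class-conditioned Wiener estimates in (5); a minimal sketch with our helper names:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_conditional_mean(y, Phi, p, means, covs, sigma2):
    """E[x | y] for a GMM prior: posterior-weighted class-conditioned Wiener estimates."""
    ell = Phi.shape[0]
    logw, xhat = [], []
    for pk, mu, Sig in zip(p, means, covs):
        Sy = Phi @ Sig @ Phi.T + sigma2 * np.eye(ell)    # covariance of y given class k
        logw.append(np.log(pk) + multivariate_normal.logpdf(y, Phi @ mu, Sy))
        W = Sig @ Phi.T @ np.linalg.inv(Sy)              # class-k Wiener filter, as in (5)
        xhat.append(mu + W @ (y - Phi @ mu))
    logw = np.asarray(logw)
    w = np.exp(logw - np.logaddexp.reduce(logw))         # posterior P(c = k | y)
    return sum(wk * xk for wk, xk in zip(w, xhat))
```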

A. Synthetic data

Figs. 1 and 2 showcase results pertaining to random kernels, where the entries of the kernel are realizations of i.i.d. Gaussian random variables with zero mean and unit variance, which are subsequently normalized by a scaling factor that guarantees that $\mathrm{tr}(\Phi\Phi^\dagger) \leq \ell$.

Fig. 1 shows the MMSE vs. $1/\sigma^2$ for a Gaussian source with dimension $n = 5$ and rank $s = 4$. We confirm that the MMSE phase transition occurs with $\ell = 4$ measurements. We also confirm that the first-order expansion in (8) captures well the behavior of the MMSE, both in the presence and absence of an error floor, for values of $1/\sigma^2$ larger than 20-30 dB. In fact, such noise amplitudes are already well below $1/\sigma^2 = 60$ dB, which is a noise level with operational significance for various image processing applications [33], [34]. Note also that by taking the number of measurements to be greater than the number of measurements that achieves the phase transition we do not affect the MMSE decay but only the MMSE power offset.

Fig. 2 now shows the values of the MMSE for a two-class GMM input with $n = 4$, $p_1 = p_2 = 0.5$, means $\mu_x^{(1)} = \mu_x^{(2)} = 0$, and covariances drawn from a central Wishart distribution [56, p. 84] with dimension 4 and degrees of freedom 2, so that $s_1 = s_2 = 2$. We report the actual value of $\mathrm{MMSE}^{GM}(\sigma^2)$, the lower bound $\mathrm{MSE}^{LB}(\sigma^2)$, and the classify and reconstruct upper bound $\mathrm{MSE}^{CR}(\sigma^2)$. We also report another MMSE upper bound associated with a linear estimator, i.e., the linear minimum mean-squared error (LMMSE) estimator [57]. We notice that when $\ell = 2$ the lower bound converges to zero as $\sigma^2 \to 0$, whereas the classify and reconstruct upper bound converges to an error floor. However, it appears that the classify and reconstruct upper bound captures better the features of the actual MMSE, which also exhibits an error floor.

Fig. 2. MMSE vs. $1/\sigma^2$ for different numbers of random measurements $\ell = 2, 3, 4$ for a two-class GMM source with $s_1 = s_2 = 2$. Actual MMSE (solid lines), lower bound (dashed lines), CR upper bound (dashed-dotted lines) and LMMSE upper bound (triangles).

On the other hand, notice that when $\ell \geq 3$ both the lower and upper bound converge to zero as $\sigma^2 \to 0$, as expected. It is also interesting to observe that the LMMSE upper bound does not describe crisply the MMSE phase transition: it is in fact possible to show, by leveraging the previous machinery, that such a sub-optimal estimator requires the number of random measurements to be larger than or equal to the dimension of the union of subspaces spanned by the different signals in the different classes for an MMSE phase transition to occur.

Another interesting feature relates to the fact that when $\ell = 3$ both the upper bound to the MMSE and the actual MMSE are $O(\sigma)$ as $\sigma^2 \to 0$, whereas when $\ell > 3$ the upper and lower bound to the MMSE and the actual MMSE are $O(\sigma^2)$ as $\sigma^2 \to 0$. This behavior appears to be related to the fact that when $\ell = 3$ or $\ell > 3$ the misclassification probability of the optimal MAP classifier, which is also used in the classify and reconstruct procedure, approaches zero as $\sigma^2 \to 0$ with different decays [51]. This behavior stands in contrast to the scalar case [50] and implies that the MMSE dimension corresponding to the mixture of Gaussian vectors is not equal to the mixture of the MMSE dimensions of the individual Gaussian vectors: this is only true when the first-order expansions of the lower and the upper bound coincide.

B. Real data

The phase transition phenomena can also be observed in the reconstruction of real imagery data. As an example, we consider a $256 \times 256$ cropped version of the image "barbara". The input $x$, in this case, represents $8 \times 8$ non-overlapping patches extracted from the image. The source is described by a 20-class GMM prior that is obtained by training the non-parametric, Bayesian, dictionary learning algorithm described in [25] over 100,000 patches randomly extracted from 500 images in the Berkeley Segmentation Dataset⁵. Note that the image "barbara" is not in the training ensemble. The so-obtained GMM prior has full-rank, class conditioned input covariance matrices $\Sigma_x^{(k)}$. In order to fit the trained GMM to the low-rank model (or, equivalently, to the model of a union of subspaces), which is the basis of our theory, only the first $s_{\max} = 14$ principal components of each class-conditioned input covariance matrix are retained, and the remaining 50 eigenvalues of each covariance matrix are set to be equal to zero. Moreover, our test image is modified by projecting each patch extracted from the image "barbara" onto the 14-dimensional subspace corresponding to the low-rank input covariance matrix of the class associated with that particular patch. Note that projecting the image onto the lower dimensional subspaces does not introduce substantial distortion. In fact, the peak signal-to-noise ratio (PSNR) of the projected image with respect to the original ground truth in this case is equal to 77.3 dB⁶. This is a manifestation of the fact that natural images are well represented by "almost low-rank" GMM priors, and the eigenvalues of the corresponding class conditioned input covariance matrices decay rapidly. This underscores that the low-rank GMM representation is a good model for patches extracted from natural imagery, and therefore of significant practical value.

Fig. 3. PSNR vs. $1/\sigma^2$. Image "barbara.png", projected onto a GMM model with $s_{\max} = 14$.

Reconstruction of the vectors $x$ from the compressive measurements $y$ is performed by the conditional mean estimator corresponding to the trained GMM prior after taking the 14 principal components, which can be written in closed form [25].
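The low-rank fitting step described above amounts to truncating the eigenvalues of each class covariance and projecting each patch onto the retained subspace. The sketch below is our illustration; in particular, including the class mean translation in the projection is our reading of the affine-space view in footnote 1.

```python
import numpy as np

def truncate_covariance(Sigma, s_max):
    """Keep the s_max principal components of Sigma; zero the remaining eigenvalues."""
    lam, U = np.linalg.eigh(Sigma)
    lam[:-s_max] = 0.0                         # eigh sorts ascending: drop the smallest
    return (U * lam) @ U.T

def project_patch(x, mu, Sigma_lr):
    """Project a patch onto the affine subspace mu + Im(Sigma_lr)."""
    lam, U = np.linalg.eigh(Sigma_lr)
    B = U[:, lam > 1e-12]                      # basis of the retained subspace
    return mu + B @ (B.T @ (x - mu))
```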

Fig. 3 shows the PSNR vs. $1/\sigma^2$ for different numbers of compressive measurements. It is possible to clearly observe the phase transition when $\ell > s_{\max}$ and that, similarly to what has been noted in Fig. 2, the PSNR increases approximately as $O(1/\sigma)$ when $\ell = 15$, and as $O(1/\sigma^2)$ when $\ell > 15$. Finally, some reconstruction examples are reported in Fig. 4. The right hand column contains the image "barbara" projected onto the union of 14-dimensional subspaces that characterize the GMM prior. The three rows correspond to $\ell = 8, 12, 16$ random measurements, from top to bottom. The first three columns, from left to right, correspond to the noise levels $1/\sigma^2 = 20, 40, 60$ dB. When the number of measurements is below the phase transition, perfect reconstruction is not possible, even when $\sigma^2 \to 0$: this is particularly evident by observing the high-frequency content of the image. On the other hand, when $\ell = 16$, a clear phase transition is observable.

⁵http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html

⁶The PSNR values obtained by choosing $s_{\max} = 13$ and $s_{\max} = 15$ are 76.9 dB and 78 dB, respectively. On the other hand, setting $s_{\max} = 10$ reduces the PSNR to 75.2 dB and it induces visible distortion effects.

Fig. 4. Barbara. The right hand column contains the non-compressed image "barbara.png" after projection onto a GMM model with $s_{\max} = 14$. The three rows correspond to $\ell = 8, 12, 16$ random measurements, from top to bottom. The first three columns, from left to right, correspond to the noise levels $1/\sigma^2 = 20, 40, 60$ dB.

It is also relevant to reflect further on the fact that our theory applies only to low-rank rather than the so-called "approximately" low-rank models. Arguably, a GMM model trained with natural images is not exactly low-rank, as real signals do not perfectly lie on the union of low-dimensional subspaces [25]. In fact, it is typical to describe the class conditioned covariance matrices associated with a GMM model as follows:

$$\Sigma_x^{(k)} = \bar{\Sigma}_x^{(k)} + \varepsilon\, I_n, \qquad (34)$$

where the matrix $\bar{\Sigma}_x^{(k)}$ is exactly low-rank, and the matrix $\varepsilon\, I_n$ accounts for model mismatch between real data and their projection onto the principal components that contain the large majority of the information associated with the data. Given the fact that the compressive sensing model in (1) with $w \sim \mathcal{N}(0, \sigma^2 I)$ and full-rank GMM with class conditioned covariances $\Sigma_x^{(k)}$ is mathematically equivalent to the compressive sensing model in (1) with $w \sim \mathcal{N}(0, \varepsilon\,\Phi\Phi^\dagger + \sigma^2 I)$ and low-rank GMM with class conditioned covariances $\bar{\Sigma}_x^{(k)}$, it is possible to appreciate the impact of model mismatch on the theory.

For example, consider our earlier testing model with images that are not projected onto lower-dimensional unions of subspaces. It is evident that the reconstruction PSNR would now be upper bounded as $\sigma^2 \to 0$ for all $\ell < n$, in view of the noise amplification from $\sigma^2$ to roughly⁷ $\sigma^2 + \varepsilon$. This also leads to the conclusion that the performance of the "approximately" low-rank model as $\sigma^2 \to 0$ is comparable to the performance of the low-rank model with $1/\sigma^2 = 1/\varepsilon$. That is, operating an "approximately" low-rank model at a certain $\sigma^2$ leads to a performance that is comparable to that of operating a low-rank model at $\sigma^2 + \varepsilon$.

⁷In fact, the noise power depends also on the value of $\mathrm{tr}(\Phi\Phi^\dagger)$. However, when the entries of $\Phi$ are i.i.d., zero-mean, Gaussian, the matrix $\Phi\Phi^\dagger$ approximates well the identity matrix.

Nonetheless, it is worth noting that natural images projected on low-rank unions of subspaces turn out to be very good approximations of the original images, as was shown for the case of the image "barbara" (see Fig. 4), thus validating the effectiveness of this model in representing real data.

VII. CONCLUSION

The principal contribution is a non-asymptotic (in the number of dimensions) characterization of MMSE phase transitions associated with the reconstruction of GMM signals from noisy compressive measurements. In particular, it has been shown that, with either random or optimal kernel designs, it is sufficient to take the number of measurements to be strictly greater than the largest dimension of the subspaces spanned by the class-conditioned signals in the GMM model – a dual of sparsity – for a phase transition to occur. It has also been shown that additional measurements and/or designed measurements translate only into (occasionally) an improved MMSE decay or an improved MMSE power offset in the low-noise regime. Another interesting by-product of the contribution is the fact that a (sub-optimal) classify and reconstruct procedure is nearly phase-transition optimal. This also provides a rationale for the state-of-the-art performance of similar reconstruction procedures, as reported in the recent literature [29], [30].

APPENDIX AUSEFUL LEMMAS

Lemma 1: Let A ∈ ℝ^{n×n} be a positive semidefinite matrix. Then, for all x ∈ ℝⁿ,

    A x = 0 \;\Leftrightarrow\; x^\dagger A x = 0.    (35)

Proof: If x†Ax = 0, then ‖A^{1/2}x‖² = 0, in which A^{1/2} is the positive semidefinite matrix square root of A. Therefore, A^{1/2}x = 0, which also implies Ax = 0. The opposite implication in (35) is straightforward.

Lemma 2: Given two positive semidefinite matrices A₁, A₂ ∈ ℝ^{n×n}, with ranks s₁ and s₂, respectively, and s₁₂ = rank(A₁ + A₂), then

    \frac{s_1 + s_2}{2} = s_{12} \;\Leftrightarrow\; \mathrm{Null}(A_1) = \mathrm{Null}(A_2) \;\Leftrightarrow\; \mathrm{Im}(A_1) = \mathrm{Im}(A_2).    (36)

Proof: First, it is easy to show that

    \mathrm{Null}(A_1 + A_2) = \mathrm{Null}(A_1) \cap \mathrm{Null}(A_2),    (37)

by leveraging Lemma 1 and considering that x†(A₁ + A₂)x = 0 if and only if x†A₁x = 0 and x†A₂x = 0. Therefore, dim Null(A₁ + A₂) ≤ dim Null(A₁) and dim Null(A₁ + A₂) ≤ dim Null(A₂), which immediately implies s₁₂ ≥ max{s₁, s₂} and, in our case, s₁ = s₂ = s₁₂. As a consequence, we have that

    \dim \mathrm{Null}(A_1) = \dim \mathrm{Null}(A_2) = \dim \mathrm{Null}(A_1 + A_2),    (38)

which can be combined with (37) to give

    \mathrm{Null}(A_1) = \mathrm{Null}(A_2).    (39)

Moreover, the last part of (36) can be easily obtained by observing that the image of a positive semidefinite matrix is the orthogonal complement of its null space.
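A quick numerical illustration of the rank condition in Lemma 2 (a sketch with toy dimensions; the names U, V, A1, A2, A3 are assumptions chosen here for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, s = 6, 3
r = np.linalg.matrix_rank

# Two PSD matrices with a common s-dimensional image.
U = np.linalg.qr(rng.standard_normal((n, s)))[0]
A1 = U @ np.diag(rng.uniform(1, 2, s)) @ U.T
A2 = U @ np.diag(rng.uniform(1, 2, s)) @ U.T
assert (r(A1) + r(A2)) / 2 == r(A1 + A2) == s     # condition (36) holds

# A PSD matrix with a different image: the rank condition fails.
V = np.linalg.qr(rng.standard_normal((n, s)))[0]
A3 = V @ np.diag(rng.uniform(1, 2, s)) @ V.T
assert (r(A1) + r(A3)) / 2 < r(A1 + A3)           # images differ w.p. 1
```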

Lemma 3: Let A ∈ ℝ^{n×n} be a positive semidefinite matrix with s = rank(A) and let x ∈ ℝⁿ. Then,

    \mathrm{rank}(A + x x^\dagger) = s + 1 \;\Leftrightarrow\; x \notin \mathrm{Im}(A).    (40)

Proof: First, note that rank(A + xx†) ≤ s + 1 [58, §0.4.5.d]. Then, the proof is based on expressing A in terms of its eigenvalues λ_{A,i} and eigenvectors u_{A,i} as A = Σ_i λ_{A,i} u_{A,i} u_{A,i}†.
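Lemma 3 can also be checked numerically (a sketch with assumed toy dimensions; the names A, x_in, x_out are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, s = 6, 3
U = np.linalg.qr(rng.standard_normal((n, s)))[0]
A = U @ U.T                                    # PSD with rank s and Im(A) = span(U)

x_in = U @ rng.standard_normal(s)              # x in Im(A): no rank increase
x_out = rng.standard_normal(n)                 # generic x lies outside Im(A) w.p. 1

r = np.linalg.matrix_rank
assert r(A + np.outer(x_in, x_in)) == s
assert r(A + np.outer(x_out, x_out)) == s + 1  # eq. (40)
```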

APPENDIX B
PROOF OF THEOREM 1

We can rewrite the expression of the MMSE for Gaussian input sources in (6) by making use of the cyclic property of the trace and the matrix inversion lemma [58, §0.7.4]

    I - A^\dagger (A A^\dagger + c^{-1} I)^{-1} A = (I + c \, A^\dagger A)^{-1},    (41)

with A = ΦΣ_x^{1/2}, as

    \mathrm{MMSE_G}(\sigma^2) = \mathrm{tr}\Big( \Sigma_x \big( I + \tfrac{1}{\sigma^2} \Sigma_x^{1/2} \Phi^\dagger \Phi \Sigma_x^{1/2} \big)^{-1} \Big)    (42)
    = \mathrm{tr}\Big( \Sigma_x U \big( I + \tfrac{1}{\sigma^2} \Lambda \big)^{-1} U^\dagger \Big)    (43)
    = \mathrm{tr}\big( \Sigma_x U \tilde{\Lambda} U^\dagger \big),    (44)

where we have used the eigenvalue decomposition of Σ in (7) and introduced the diagonal matrix \tilde{\Lambda} = \mathrm{diag}\big( \tfrac{1}{1+\lambda_1/\sigma^2}, \ldots, \tfrac{1}{1+\lambda_{\ell'}/\sigma^2}, 1, \ldots, 1 \big). Note now that the eigenvalue decomposition of a positive semidefinite matrix can also be written as

    A = U_A \Lambda_A U_A^\dagger = \sum_i \lambda_{A,i} \, u_{A,i} u_{A,i}^\dagger,    (45)

in which u_{A,i} is the i-th column of U_A. Therefore, the MMSE for Gaussian inputs can be expressed as

    \mathrm{MMSE_G}(\sigma^2) = \sum_{i=1}^{\ell'} \frac{1}{1+\lambda_i/\sigma^2} \, u_i^\dagger \Sigma_x u_i + \sum_{i=\ell'+1}^{n} u_i^\dagger \Sigma_x u_i,    (46)

where u₁, …, u_{ℓ'} are the eigenvectors of Σ corresponding to the positive eigenvalues λ₁, …, λ_{ℓ'}, and the vectors u_{ℓ'+1}, …, u_n can form any orthonormal basis of the null space Null(Σ), which has dimension dim(Null(Σ)) = n − ℓ'. The first term in (46) tends to zero when σ² → 0, so that the phase transition of the MMSE is determined by the second term, which can be characterized on the basis of the description of the two null spaces Null(Σ) and Null(Σ_x). In particular, note that

    \mathrm{Null}(\Sigma_x) \subseteq \mathrm{Null}(\Sigma),    (47)

as for each v ∈ ℝⁿ such that Σ_x v = 0, it also holds that Σv = 0, since Σ_x^{1/2} v = 0. Therefore, we can choose the basis u_{ℓ'+1}, …, u_n as follows. The last n − s vectors u_{s+1}, …, u_n form an orthonormal basis of the space Null(Σ_x), and the remaining vectors u_{ℓ'+1}, …, u_s form an orthonormal basis of Null(Σ) ∩ Null(Σ_x)^⊥. Then, we can write (46) as

    \mathrm{MMSE_G}(\sigma^2) = \sum_{i=1}^{\ell'} \frac{1}{1+\lambda_i/\sigma^2} \, u_i^\dagger \Sigma_x u_i + \sum_{i=\ell'+1}^{s} u_i^\dagger \Sigma_x u_i.    (48)

Observe that, when ℓ ≥ s, then ℓ' = s, and the second term in (48) is identically equal to zero, as the linear space Null(Σ) ∩ Null(Σ_x)^⊥ contains only the zero vector, and the Gaussian MMSE tends to zero when σ² → 0. On the other hand, if ℓ < s, by Lemma 1, u_i† Σ_x u_i > 0 for i = ℓ'+1, …, s, and the MMSE is characterized by an error floor in the low-noise regime. Specifically, we can expand the Gaussian MMSE as

    \mathrm{MMSE_G}(\sigma^2) = M_{\mathrm{G}}^{\infty} + D_{\mathrm{G}} \cdot \sigma^2 + o(\sigma^2),    (49)

where

    M_{\mathrm{G}}^{\infty} = \sum_{i=\ell'+1}^{s} u_i^\dagger \Sigma_x u_i, \qquad D_{\mathrm{G}} = \sum_{i=1}^{\ell'} \frac{1}{\lambda_i} \, u_i^\dagger \Sigma_x u_i.    (50)
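The chain (42)-(50) can be verified numerically. The following sketch (illustrative parameters only, with ℓ < s so that a floor appears; the names mmse_g, quad, M_inf are assumptions made here) checks the eigen-expansion (46) against the closed-form Gaussian MMSE and the floor M_G^∞ of (50):

```python
import numpy as np

rng = np.random.default_rng(4)
n, ell, s = 8, 2, 4                         # ell < s, so an error floor is expected
B = rng.standard_normal((n, s))
Sigma_x = B @ B.T                           # rank-s input covariance
Phi = rng.standard_normal((ell, n)) / np.sqrt(n)

def mmse_g(sigma2):
    """Closed-form Gaussian MMSE, as in (6)/(42)."""
    G = Phi @ Sigma_x @ Phi.T + sigma2 * np.eye(ell)
    return np.trace(Sigma_x - Sigma_x @ Phi.T @ np.linalg.solve(G, Phi @ Sigma_x))

# Eigen-expansion (46) with Sigma = Sigma_x^{1/2} Phi^T Phi Sigma_x^{1/2}.
w, V = np.linalg.eigh(Sigma_x)
Sx_half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
lam, U = np.linalg.eigh(Sx_half @ Phi.T @ Phi @ Sx_half)
lam = np.clip(lam, 0, None)
quad = np.einsum('ji,jk,ki->i', U, Sigma_x, U)   # u_i^T Sigma_x u_i

sigma2 = 1e-6
assert np.isclose(mmse_g(sigma2), np.sum(quad / (1 + lam / sigma2)), rtol=1e-4)

# Error floor M_G^infty of (49)-(50): zero-eigenvalue directions of Sigma.
M_inf = quad[lam < 1e-9 * lam.max()].sum()
assert np.isclose(mmse_g(1e-12), M_inf, rtol=1e-3)
```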

APPENDIX C
PROOF OF THEOREM 3

The proof is based on the analysis of the upper bound on MMSE_GM(σ²) obtained by considering the mean-squared error associated with the classify-and-reconstruct decoder described in Section IV-B. The value of such an upper bound is given by

    \mathrm{MSE^{CR}}(\sigma^2) = \sum_k p_k \sum_m p_{\hat{c}|c}(m|k) \, \mathbb{E}\big[ \| x - W_m(y) \|^2 \,\big|\, \hat{c} = m, c = k \big]    (51)
    = \sum_k p_k \, p_{\hat{c}|c}(k|k) \, \mathbb{E}\big[ \| x - W_k(y) \|^2 \,\big|\, \hat{c} = c = k \big]
      + \sum_k p_k \sum_{m \neq k} p_{\hat{c}|c}(m|k) \, \mathbb{E}\big[ \| x - W_m(y) \|^2 \,\big|\, \hat{c} = m, c = k \big],    (52)

where we have denoted by p_{ĉ|c}(m|k) the probability that the MAP classifier yields ĉ = m given that the actual input class is c = k. The first term can then be upper bounded by using the law of total probability, and we obtain

    \mathrm{MSE^{CR}}(\sigma^2) \le \sum_k p_k \, \mathbb{E}\big[ \| x - W_k(y) \|^2 \,\big|\, c = k \big]
      + \sum_k p_k \sum_{m \neq k} p_{\hat{c}|c}(m|k) \, \mathbb{E}\big[ \| x - W_m(y) \|^2 \,\big|\, \hat{c} = m, c = k \big]    (53)
    = \mathrm{MSE^{LB}}(\sigma^2) + \sum_k p_k \sum_{m \neq k} p_{\hat{c}|c}(m|k) \, \mathbb{E}\big[ \| x - W_m(y) \|^2 \,\big|\, \hat{c} = m, c = k \big],    (54)

where MSE^{LB}(σ²) is the lower bound to the MMSE, which has been shown to approach zero when σ² → 0, since we are assuming here that ℓ > s_max.

Then, we need to show that, when m ≠ k,

    \lim_{\sigma^2 \to 0} p_{\hat{c}|c}(m|k) \, \mathbb{E}\big[ \| x - W_m(y) \|^2 \,\big|\, \hat{c} = m, c = k \big] = 0,    (55)

and we consider separately, in the remainder of this Appendix, the two cases in which the subspaces corresponding to the images of the input covariance matrices of classes k and m completely overlap or not. In other terms, we consider separately the cases in which the affine spaces spanned by the signals in class k and class m differ only by a fixed translation or not.
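As an illustration of the decomposition (51)-(52), the Monte Carlo sketch below simulates a two-class GMM, classifies each measurement with a MAP rule, and reconstructs with the Wiener filter of the selected class. It is a simplified stand-in for the decoder of Section IV-B (not the paper's implementation); the two-class setup, equal priors, and the names w_est and map_class are all assumptions made here:

```python
import numpy as np

rng = np.random.default_rng(5)
n, ell, s, sigma2, T = 8, 5, 3, 1e-4, 4000   # ell > s_max = s

Phi = rng.standard_normal((ell, n)) / np.sqrt(n)
B = [rng.standard_normal((n, s)) for _ in range(2)]      # class subspace factors
mu = [rng.standard_normal(n) for _ in range(2)]
Sig = [b @ b.T for b in B]                               # Sigma_x^{(k)}, rank s
Sy = [sigma2 * np.eye(ell) + Phi @ S @ Phi.T for S in Sig]
W = [S @ Phi.T @ np.linalg.inv(sy) for S, sy in zip(Sig, Sy)]  # Wiener matrices

def w_est(y, m):
    """Conditional-mean estimator W_m(y) for (assumed) class m."""
    return mu[m] + W[m] @ (y - Phi @ mu[m])

def map_class(y):
    """MAP classification of y under equiprobable classes."""
    ll = [-0.5 * ((y - Phi @ mu[k]) @ np.linalg.solve(Sy[k], y - Phi @ mu[k])
                  + np.linalg.slogdet(Sy[k])[1]) for k in range(2)]
    return int(np.argmax(ll))

mse = 0.0
for _ in range(T):
    k = int(rng.integers(2))
    x = mu[k] + B[k] @ rng.standard_normal(s)            # x | c = k
    y = Phi @ x + np.sqrt(sigma2) * rng.standard_normal(ell)
    mse += np.sum((x - w_est(y, map_class(y))) ** 2) / T
print(mse)  # small: the sum (51) is dominated by the correctly classified term
```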

A. Non-overlapping case: (s_k + s_m)/2 < s_{km}

In this case, given that ℓ > s_max, by leveraging the results in [51, Theorem 2], we can state that

    \lim_{\sigma^2 \to 0} p_{\hat{c}|c}(m|k) = 0.    (56)

The misclassification probability p_{ĉ|c}(m|k) is the measure of the set representing the decision region of the MAP classifier associated with class m with respect to the Gaussian measure corresponding to the Gaussian distribution

    \mathcal{N}\!\left( \begin{bmatrix} \mu_x^{(k)} \\ 0 \end{bmatrix}, \begin{bmatrix} \Sigma_x^{(k)} & 0 \\ 0 & \sigma^2 \cdot I \end{bmatrix} \right).    (57)

Moreover, it can be shown that, in the limit σ² → 0, the product in (55) is upper bounded by the integral of a measurable function over a set of measure zero, which is then equal to zero.

B. Overlapping case: (s_k + s_m)/2 = s_{km}

Observe that, in this case, Lemma 2 states that Im(Σ_x^{(k)} + Σ_x^{(m)}) = Im(Σ_x^{(k)}) = Im(Σ_x^{(m)}). Then, we further consider two separate cases: i) the difference of the mean vectors of classes k and m lies in the subspace spanned by the covariance matrices of the two classes, and ii) the difference of the mean vectors of classes k and m does not lie in that subspace. We consider first the case in which

    \mu_x^{(k)} - \mu_x^{(m)} \notin \mathrm{Im}\big( \Sigma_x^{(k)} + \Sigma_x^{(m)} \big).    (58)

For simplicity of notation, we introduce the symbol M_{km} = (μ_x^{(k)} − μ_x^{(m)})(μ_x^{(k)} − μ_x^{(m)})†, and we can show that, with probability 1,

    \mathrm{rank}\big( \Phi (\Sigma_x^{(k)} + \Sigma_x^{(m)} + M_{km}) \Phi^\dagger \big) = s_{km} + 1,    (59)

as, by Lemma 3, rank(Σ_x^{(k)} + Σ_x^{(m)} + M_{km}) = s_{km} + 1 and ℓ > s_{km}. This implies that, using again Lemma 3, we have

    \Phi (\mu_x^{(k)} - \mu_x^{(m)}) \notin \mathrm{Im}\big( \Phi (\Sigma_x^{(k)} + \Sigma_x^{(m)}) \Phi^\dagger \big).    (60)

Therefore, by using the result in [51, Theorem 3], we can state that lim_{σ²→0} p_{ĉ|c}(m|k) = 0 and, by a proof similar to that presented in the previous paragraphs, we can show that (55) is satisfied also in this case.
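A numerical check of (59)-(60), under a toy construction with overlapping class images and a generic mean difference so that (58) holds with probability 1 (the names S_k, S_m, d are assumptions made here for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n, ell, s = 8, 5, 3
r = np.linalg.matrix_rank

U = np.linalg.qr(rng.standard_normal((n, s)))[0]       # shared image of both classes
S_k = U @ np.diag(rng.uniform(1, 2, s)) @ U.T          # overlapping case: s_km = s
S_m = U @ np.diag(rng.uniform(1, 2, s)) @ U.T
d = rng.standard_normal(n)                             # mu_x^(k) - mu_x^(m); (58) holds w.p. 1
M_km = np.outer(d, d)
Phi = rng.standard_normal((ell, n)) / np.sqrt(n)

s_km = r(S_k + S_m)
assert r(Phi @ (S_k + S_m + M_km) @ Phi.T) == s_km + 1   # eq. (59)

# Eq. (60): adding the outer product of Phi d raises the rank (Lemma 3),
# so Phi d does not lie in Im(Phi (S_k + S_m) Phi^T).
G = Phi @ (S_k + S_m) @ Phi.T
assert r(G + np.outer(Phi @ d, Phi @ d)) == s_km + 1
```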

Consider now the case for which

    \mu_x^{(k)} - \mu_x^{(m)} \in \mathrm{Im}\big( \Sigma_x^{(k)} + \Sigma_x^{(m)} \big) = \mathrm{Im}\big( \Sigma_x^{(m)} \big).    (61)


Here, the misclassification probability is not guaranteed to approach zero when σ² → 0, and we have to resort to a different proof technique. The rationale behind this proof is that the mismatched mean-squared error also approaches zero in the low-noise regime, provided that the signals in the actual class k and the mismatched class m span the same space.

By using the law of total probability, we can write

    p_{\hat{c}|c}(m|k) \, \mathbb{E}\big[ \| x - W_m(y) \|^2 \,\big|\, \hat{c} = m, c = k \big] \le \mathrm{MSE^{MIS}_{km}}(\sigma^2),    (62)

where MSE^{MIS}_{km}(σ²) = E[‖x − W_m(y)‖² | c = k] is the mismatched mean-squared error incurred when estimating signals drawn from the Gaussian class k with the conditional-mean estimator associated with class m.

Observe that, on denoting by Σ_y^{(k)} = σ²I + ΦΣ_x^{(k)}Φ† the covariance matrix of the measurement vector y conditioned on class k, we can write

    \mathrm{MSE^{MIS}_{km}}(\sigma^2) = \mathbb{E}\big[ \mathrm{tr}\big( (x - W_m(y))(x - W_m(y))^\dagger \big) \,\big|\, c = k \big]
    = \mathrm{tr}\big( \Sigma_x^{(k)} \big) - 2\,\mathrm{tr}\big( W_k \Sigma_y^{(k)} W_m^\dagger \big) + \mathrm{tr}\big( W_m \Sigma_y^{(k)} W_m^\dagger \big)
      + \mathrm{tr}(M_{km}) - 2\,\mathrm{tr}\big( M_{km} \Phi^\dagger W_m^\dagger \big) + \mathrm{tr}\big( W_m \Phi M_{km} \Phi^\dagger W_m^\dagger \big).    (63)

In order to prove that MSE^{MIS}_{km}(σ²) approaches zero when σ² → 0, we can show the following four identities:

    \lim_{\sigma^2 \to 0} \mathrm{tr}\big( W_k \Sigma_y^{(k)} W_m^\dagger \big) = \mathrm{tr}\big( \Sigma_x^{(k)} \big);    (64)

    \lim_{\sigma^2 \to 0} \mathrm{tr}\big( W_m \Sigma_y^{(k)} W_m^\dagger \big) = \mathrm{tr}\big( \Sigma_x^{(k)} \big);    (65)

    \lim_{\sigma^2 \to 0} \mathrm{tr}\big( M_{km} \Phi^\dagger W_m^\dagger \big) = \mathrm{tr}(M_{km});    (66)

    \lim_{\sigma^2 \to 0} \mathrm{tr}\big( W_m \Phi M_{km} \Phi^\dagger W_m^\dagger \big) = \mathrm{tr}(M_{km}).    (67)

Specifically, we can leverage the inversion lemma [58]

    A (I c^{-1} + B A)^{-1} B = I - (I + c A B)^{-1},    (68)

with A = Φ† and B = ΦΣ_x^{(m)}, and write

    \mathrm{tr}\big( W_k \Sigma_y^{(k)} W_m^\dagger \big) = \mathrm{tr}\big( \Sigma_x^{(k)} \Phi^\dagger ( I \sigma^2 + \Phi \Sigma_x^{(m)} \Phi^\dagger )^{-1} \Phi \Sigma_x^{(m)} \big)    (69)
    = \mathrm{tr}\big( \Sigma_x^{(k)} \big) - \mathrm{tr}\Big( \Sigma_x^{(k)} \big( I + \tfrac{1}{\sigma^2} \Phi^\dagger \Phi \Sigma_x^{(m)} \big)^{-1} \Big).    (70)

Then, by noting that the matrix Φ†ΦΣ_x^{(m)} is diagonalizable with probability 1, and by following steps similar to those adopted in the proof of Theorem 1, we are able to prove (64). Finally, (65), (66), and (67) are proved by following a completely similar approach.
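The four limits (64)-(67) can also be observed numerically. The sketch below (an illustrative toy model assumed here, with overlapping class images and a mean difference satisfying (61)) prints the four gaps, which shrink as σ² → 0:

```python
import numpy as np

rng = np.random.default_rng(7)
n, ell, s = 8, 5, 3
U = np.linalg.qr(rng.standard_normal((n, s)))[0]
S_k = U @ np.diag(rng.uniform(1, 2, s)) @ U.T     # Im(S_k) = Im(S_m): overlapping case
S_m = U @ np.diag(rng.uniform(1, 2, s)) @ U.T
d = U @ rng.standard_normal(s)                    # mu_x^(k) - mu_x^(m) in Im(S_k + S_m), eq. (61)
M_km = np.outer(d, d)
Phi = rng.standard_normal((ell, n)) / np.sqrt(n)
tr = np.trace

for sigma2 in (1e-2, 1e-4, 1e-6):
    Sy_k = sigma2 * np.eye(ell) + Phi @ S_k @ Phi.T
    Sy_m = sigma2 * np.eye(ell) + Phi @ S_m @ Phi.T
    W_k = S_k @ Phi.T @ np.linalg.inv(Sy_k)       # Wiener matrices of classes k and m
    W_m = S_m @ Phi.T @ np.linalg.inv(Sy_m)
    gaps = (tr(W_k @ Sy_k @ W_m.T) - tr(S_k),                 # (64)
            tr(W_m @ Sy_k @ W_m.T) - tr(S_k),                 # (65)
            tr(M_km @ Phi.T @ W_m.T) - tr(M_km),              # (66)
            tr(W_m @ Phi @ M_km @ Phi.T @ W_m.T) - tr(M_km))  # (67)
    print(sigma2, [f"{g:.2e}" for g in gaps])     # each gap vanishes as sigma2 -> 0
```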

REFERENCES

[1] D. Healy and D. Brady, "Compression at the physical interface," IEEE Signal Processing Mag., vol. 25, no. 2, pp. 67–71, 2008.

[2] M. Lustig, D. Donoho, J. Santos, and J. Pauly, "Compressed sensing MRI," IEEE Signal Processing Mag., vol. 25, no. 2, pp. 72–82, 2008.

[3] M. Duarte, M. Davenport, D. Takhar, J. Laska, T. Sun, K. Kelly, and R. Baraniuk, "Single-pixel imaging via compressive sampling," IEEE Signal Processing Mag., vol. 25, no. 2, pp. 83–91, 2008.

[4] "A/D converters," Analog Devices Corp., 2009. [Online]. Available: http://www.analog.com/en/analog-to-digital-converters/ad-converters/products/index.html.

[5] "Data converters," Texas Instruments Corp., 2009. [Online]. Available: http://focus.ti.com/analog/docs/dataconvertershome.tsp.

[6] E. Candes, J. K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Comm. Pure Appl. Math., vol. 59, no. 8, pp. 1207–1223, 2006.

[7] E. Candes and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?" IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, 2006.

[8] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.

[9] E. Candes and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, 2005.

[10] R. G. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Constr. Approx., vol. 28, pp. 253–263, Dec. 2008.

[11] A. Ashok, P. Baheti, and M. Neifeld, "Compressive imaging system design using task-specific information," Applied Optics, vol. 47, no. 25, pp. 4457–4471, 2008.

[12] P. Baheti and M. Neifeld, "Recognition using information-optimal adaptive feature-specific imaging," J. Opt. Soc. Amer. A, vol. 26, no. 4, pp. 1055–1070, 2009.

[13] R. Venkataramani and Y. Bresler, "Further results on spectrum blind sampling of 2-D signals," in IEEE Int. Conf. Image Proc. (ICIP), vol. 2, 1998, pp. 752–756.

[14] D. Baron, M. B. Wakin, M. F. Duarte, S. Sarvotham, and R. G. Baraniuk, "Distributed compressed sensing," available online at http://dsp.rice.edu/cs/, 2005.

[15] E. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, 2006.

[16] S. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415, 1993.

[17] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, 1998.

[18] J. A. Tropp and S. J. Wright, "Computational methods for sparse solution of linear inverse problems," Proc. IEEE, vol. 98, no. 6, pp. 948–958, 2010.

[19] T. Blumensath and M. Davies, "Sampling theorems for signals from the union of finite-dimensional linear subspaces," IEEE Trans. Inf. Theory, vol. 55, no. 4, pp. 1872–1882, 2009.

[20] M. Stojnic, F. Parvaresh, and B. Hassibi, "On the reconstruction of block-sparse signals with an optimal number of measurements," IEEE Trans. Signal Process., vol. 57, no. 8, pp. 3075–3085, 2009.

[21] Y. C. Eldar and M. Mishali, "Robust recovery of signals from a structured union of subspaces," IEEE Trans. Inf. Theory, vol. 55, no. 11, pp. 5302–5316, 2009.

[22] Y. Eldar, P. Kuppinger, and H. Bolcskei, "Block-sparse signals: Uncertainty relations and efficient recovery," IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3042–3054, 2010.

[23] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, "Model-based compressive sensing," IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1982–2001, 2010.

[24] R. G. Baraniuk and M. B. Wakin, "Random projections of smooth manifolds," Found. of Comput. Math., vol. 9, no. 1, pp. 51–77, 2009.

[25] M. Chen, J. Silva, J. Paisley, D. Dunson, and L. Carin, "Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: Algorithm and performance bounds," IEEE Trans. Signal Process., vol. 58, no. 12, pp. 6140–6155, 2010.

[26] M. Crouse, R. Nowak, and R. Baraniuk, "Wavelet-based statistical signal processing using hidden Markov models," IEEE Trans. Signal Process., vol. 46, no. 4, pp. 886–902, 1998.

[27] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Computat. Harmon. Anal., vol. 26, no. 3, pp. 301–321, 2009.

[28] G. Yu and G. Sapiro, "Statistical compressed sensing of Gaussian mixture models," IEEE Trans. Signal Process., vol. 59, no. 12, pp. 5842–5858, 2011.


[29] G. Yu, G. Sapiro, and S. Mallat, "Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity," IEEE Trans. Image Process., vol. 21, no. 5, pp. 2481–2499, 2012.

[30] J. Duarte-Carvajalino, G. Yu, L. Carin, and G. Sapiro, "Task-driven adaptive statistical compressive sensing of Gaussian mixture models," IEEE Trans. Signal Process., vol. 61, no. 3, pp. 585–600, 2013.

[31] S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346–2356, 2008.

[32] H. W. Sorenson and D. L. Alspach, "Recursive Bayesian estimation using Gaussian sums," Automatica, vol. 7, no. 4, pp. 465–479, 1971.

[33] W. R. Carson, M. Chen, M. R. D. Rodrigues, R. Calderbank, and L. Carin, "Communications-inspired projection design with applications to compressive sensing," SIAM J. Imag. Sciences, vol. 5, no. 4, pp. 1185–1212, 2012.

[34] M. Chen, W. Carson, M. R. D. Rodrigues, R. Calderbank, and L. Carin, "Communications inspired linear discriminant analysis," in Int. Conf. Machine Learn. (ICML), Jun.–Jul. 2012.

[35] E. Candes and T. Tao, "The Dantzig selector: Statistical estimation when p is much larger than n," Ann. Statist., vol. 35, no. 6, pp. 2313–2351, 2007.

[36] D. Donoho and J. Tanner, "Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing," Phil. Trans. Roy. Soc. A, vol. 367, no. 1906, pp. 4273–4293, 2009.

[37] D. Donoho, A. Maleki, and A. Montanari, "The noise-sensitivity phase transition in compressed sensing," IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 6920–6941, 2011.

[38] Y. Wu and S. Verdu, "Optimal phase transitions in compressed sensing," IEEE Trans. Inf. Theory, vol. 58, no. 10, pp. 6241–6263, 2012.

[39] G. Reeves and M. Gastpar, "The sampling rate-distortion tradeoff for sparsity pattern recovery in compressed sensing," IEEE Trans. Inf. Theory, vol. 58, no. 5, pp. 3065–3092, 2012.

[40] A. Tulino, G. Caire, S. Shamai, and S. Verdu, "Support recovery with sparsely sampled free random matrices," in IEEE Int. Symp. Information Theory (ISIT), Jul.–Aug. 2011.

[41] D. Baron, S. Sarvotham, and R. Baraniuk, "Bayesian compressive sensing via belief propagation," IEEE Trans. Signal Process., vol. 58, no. 1, pp. 269–280, 2010.

[42] D. Guo and C.-C. Wang, "Asymptotic mean-square optimality of belief propagation for sparse linear systems," in IEEE Inf. Theory Workshop, Oct. 2006.

[43] D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18914–18919, Nov. 2009.

[44] M. Bayati and A. Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 764–785, 2011.

[45] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," in IEEE Int. Symp. Information Theory (ISIT), Jul.–Aug. 2011.

[46] D. Guo, D. Baron, and S. Shamai, "A single-letter characterization of optimal noisy compressed sensing," in 47th Annual Allerton Conf. Commun., Control, and Comput., Sep. 2009.

[47] G. Reeves and M. Gastpar, "Compressed sensing phase transitions: Rigorous bounds versus replica predictions," in Conf. Inf. Sci. Syst. (CISS), Mar. 2012.

[48] S. Rangan, A. Fletcher, and V. Goyal, "Asymptotic analysis of MAP estimation via the replica method and applications to compressed sensing," IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1902–1923, 2012.

[49] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation. Upper Saddle River, NJ: Prentice Hall, 2000.

[50] Y. Wu and S. Verdu, "MMSE dimension," IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 4857–4879, 2011.

[51] H. Reboredo, F. Renna, R. Calderbank, and M. R. D. Rodrigues, "Compressive classification," in IEEE Int. Symp. Information Theory (ISIT), Jul. 2013.

[52] J. Duarte-Carvajalino, G. Yu, L. Carin, and G. Sapiro, "Adapted statistical compressive sensing: Learning to sense Gaussian mixture models," in IEEE Int. Conf. on Acoustics, Speech and Sig. Process. (ICASSP), Mar. 2012.

[53] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, "Joint Tx-Rx beamforming design for multicarrier MIMO channels: A unified framework for convex optimization," IEEE Trans. Signal Process., vol. 51, no. 9, pp. 2381–2401, 2003.

[54] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York, NY: Wiley, 1991.

[55] J. Duarte-Carvajalino, G. Sapiro, G. Yu, and L. Carin, "Online adaptive statistical compressed sensing of Gaussian mixture models," arXiv preprint arXiv:1112.5895, 2011.

[56] M. S. Srivastava and C. G. Khatri, An Introduction to Multivariate Statistics. Amsterdam: North-Holland, 1979.

[57] J. T. Flam, S. Chatterjee, K. Kansanen, and T. Ekman, "Minimum mean square error estimation under Gaussian mixture statistics," arXiv preprint arXiv:1108.3410, 2011.

[58] R. Horn and C. Johnson, Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985.
