
arXiv:1412.6724v3 [cs.IT] 25 Sep 2015

Performance of Compressive Parameter Estimation via K-Median Clustering

Dian Mo, Student Member, IEEE, and Marco F. Duarte, Senior Member, IEEE

Abstract

In recent years, compressive sensing (CS) has attracted significant attention in parameter estimation tasks,

including frequency estimation, time delay estimation, and localization. In order to use CS in parameter estimation,

parametric dictionaries (PDs) collect observations for a sampling of the parameter space and yield sparse representations for signals of interest when the sampling is sufficiently dense. While this dense sampling can lead to high

coherence in the dictionary, it is possible to leverage structured sparsity models to prevent highly coherent dictionary

elements from appearing simultaneously in the signal representations, alleviating these coherence issues. However,

the resulting approaches depend heavily on a careful setting of the maximum allowable coherence; furthermore, their

guarantees applied on the coefficient recovery do not translate in general to the parameter estimation task. In this paper,

we propose the use of the earth mover’s distance (EMD), as applied to a pair of true and estimated coefficient vectors,

to measure the error of the parameter estimation. We formally analyze the connection between the aforementioned

EMD and the parameter estimation error. We theoretically show that the EMD provides a better-suited metric for

the performance of PD-based parameter estimation than the commonly used Euclidean distance. Additionally, we

leverage the previously described relationship between K-median clustering and EMD-based sparse approximation to

develop improved PD-based parameter estimation algorithms. Finally, we present numerical experiments that verify

our theoretical results and show the performance improvements obtained from the proposed compressive parameter

estimation algorithms.

Index Terms

compressive sensing, parameter estimation, parametric dictionary, earth mover’s distance, K-median clustering

I. INTRODUCTION

Compressive sensing (CS) simultaneously acquires and compresses signals via random projections, and recovers

the signals if there exists a basis or dictionary in which the signals can be expressed sparsely [2–4]. Recently, the

application of CS has been extended from signal recovery to parameter estimation through the design of parametric

dictionaries (PDs) that contain signal observations for a sampling of the parameter space [5–16]. The resulting

connection between parameter estimation and sparse signal recovery has made it possible for compressive parameter

An early version of this work appeared at Proceedings of SPIE Wavelets and Sparsity XV, August 2013 [1].

The authors are with the Department of Electrical and Computer Engineering, University of Massachusetts Amherst, Amherst, MA 01003.

Email: [email protected], [email protected].


estimation to be implemented via standard CS recovery algorithms, where the dictionary coefficients obtained from

signal recovery can be interpreted by matching the nonzero coefficient locations with the parameter estimates. This

CS approach has been previously formulated for landmark compressive parameter estimation problems, including

localization and bearing estimation [5–10], time delay estimation [11, 12], and frequency estimation (also known

as line spectral estimation) [13–17].

Unfortunately, PD-based compressive parameter estimation can be perfect only in the contrived case in which the unknown parameters are all contained in the sampling set of the parameter space, so dense sampling of the parameter space is needed to improve the parameter estimation resolution [6]. This dense sampling introduces highly

coherent elements in the PD, and previous approaches to address the resulting coherence issue need to carefully set

the value of the maximum allowable coherence among the chosen PD elements in the sparse approximations [6, 12,

15–17]. In practice, this approach restricts the minimum separation between any two parameters that can be observed

simultaneously in a signal, and can potentially exclude a large class of observable signals from consideration.

Additionally, the guarantee of bounded error between the true and the estimated coefficient vectors, measured via

the Euclidean distance in almost all existing methods, has a very limited impact on the performance of compressive

parameter estimation. In contrast, the earth mover’s distance (EMD) [18–20] is a very attractive option to measure

the error of PD-based parameter estimation due to the fact that the EMD of two sparse PD coefficient vectors is

indicative of the parameter estimation error when the entries of the PD coefficient vectors or the PD elements are

sorted by the value of the corresponding parameters.

In this paper, we propose a new method for compressive parameter estimation that uses PDs and sparsity and

replaces the hard thresholding operator with K-median clustering. This modification is motivated by the fact that

K-median clustering solves the EMD-sparse approximation problem [19, 20]. We also provide an analysis of

the connection between the EMD between a pair of true and estimated coefficient vectors and the corresponding

parameter estimation error. Some of our contributions can be detailed as follows. First, we theoretically show that

the EMD between the sparse PD coefficient vectors provides an upper bound for the parameter estimation error,

motivating the use of EMD-based algorithms to minimize the parameter estimation error. Second, we formulate

theorems that provide performance guarantees for PD-based parameter estimation when clustering methods are used;

these guarantees refer to the correlation function of the parametric signal model, which measures the magnitude of

the inner product between signal observations corresponding to different parameters. Third, we analyze the effect

of the decay of the parametric signal’s correlation function on the performance of clustering parameter estimation

methods, and relate these guarantees to the performance of the clustering methods under CS compression and signal

noise. Finally, we introduce and analyze the joint use of thresholding and clustering methods to address performance

loss resulting from compression and noise. Although this paper focuses on the parameter estimation problem with

1-dimensional parameters, our work can be easily extended to 2-D and higher-dimensional parameter estimation.

Previous works have considered the use of the EMD to measure the sparse recovery error [19, 20], but have not

focused on its application with PDs and the resulting impact on parameter estimation performance. Furthermore,

these previous works require a specially tailored CS measurement matrix and assume that sparsity is present in the


canonical representation. Finally, the recovery algorithm is implemented via optimization, in contrast to the greedy

algorithm that is proposed in this paper, and we are not aware of prior work that leverages the previously known

connection between EMD and K-median clustering in sparse recovery algorithms.

This paper is organized as follows. We provide a summary of CS and compressive parameter estimation and the

issues present in existing work in Section II. In Section III, we present and analyze the use of EMD and clustering

methods for PD-based parameter estimation; furthermore, we formulate and analyze an algorithm for PD-based

sparse approximation in the EMD sense that employs K-median clustering and provides increased accuracy for

parameter estimation. In Section IV, we present numerical simulations that verify our results for the clustering

method in the example applications of time delay estimation and frequency estimation. Finally, we provide a

discussion and conclusions in Section V.

II. BACKGROUND

A. Compressive Sensing

Compressive sensing (CS) has emerged as a technique integrating sensing and compression for signals that are

known to be sparse or compressible in some basis. A discrete signal x ∈ C^N is K-sparse in a basis or a frame Ψ when the signal can be represented by the basis or frame as x = Ψc with ‖c‖_0 ≤ K, where the ℓ_0 “norm” ‖·‖_0 counts the number of nonzero entries. The dimension-reducing measurement matrix Φ ∈ R^{M×N} compresses the signal x to obtain the measurements y = Φx ∈ C^M. Though, in general, it is ill-posed to recover the signal x from the measurements y when M < N since Φ has a nontrivial null space, CS theory shows that it is possible

to recover the signal from a small number of measurements when the signal is sparse and the measurement matrix

satisfies the restricted isometry property [3, 22], a property that has been proven in the literature for matrices with

random independent entries with M = O(K log N) [23].

Given these conditions, the CS recovery problem usually can be solved via optimization methods, such as basis

pursuit [24], or greedy algorithms, such as Iterative Hard Thresholding [25], Compressive Sampling Matching

Pursuit [26], Subspace Pursuit [27], and Orthogonal Matching Pursuit [28, 29]. Many of the mentioned greedy

algorithms rely on the hard thresholding operator, which sets all entries of an input vector to zero except those

with the largest magnitudes. The sparse vector resulting from the thresholding operator provides the optimal K-sparse

approximation for the input vector in the sense that the output vector has minimum Euclidean distance to the input

vector among all possible K-sparse vectors.
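For concreteness, a minimal sketch of this hard thresholding operator (our own helper function, not code from the cited algorithms) could look as follows:

```python
import numpy as np

def hard_threshold(v, K):
    """Euclidean-optimal K-sparse approximation: keep the K largest-magnitude
    entries of v and set every other entry to zero."""
    v = np.asarray(v)
    out = np.zeros_like(v)
    if K > 0:
        keep = np.argsort(np.abs(v))[-K:]   # indices of the K largest magnitudes
        out[keep] = v[keep]
    return out

print(hard_threshold(np.array([0.1, -2.0, 0.5, 3.0]), 2))   # [ 0. -2.  0.  3.]
```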

While classical CS processes signals by exploiting the fact that they can be described as sparse in some basis or

frame, the locations of the nonzero entries of the coefficient vectors often have additional underlying structure. To

capture this additional structure, model-based CS replaces the thresholding operator by a corresponding structured

sparse approximation operator, which, similarly, finds the optimal structured sparse approximation for an input

vector in the sense that the output vector exhibits the desired structure and is closest to the input vector among all

possible structured sparse vectors [15, 17, 30, 31].


B. Parameter Estimation

Parameter estimation problems are usually defined in terms of a parametric signal class, which is defined via a

mapping ψ : Θ → X from the parameter space Θ ⊂ R^D to the signal space X ⊂ C^N. The signal observed in a parameter estimation problem contains K unknown parametric components, x = ∑_{i=1}^{K} c_i ψ(θ_i), and the goal is to obtain estimates of the parameters θ_i from the signal x. Examples include time delay estimation, where the signal ψ(θ_i) corresponds to a known waveform with time delay θ_i, and line spectral estimation, where the parameter θ_i = f_i and the signal ψ(θ_i) = ψ(f_i) corresponds to a complex exponential of unit norm and frequency f_i.

One can introduce a parametric dictionary (PD) as a collection of samples from the signal space

$$\Psi = \left[\psi(\theta_1), \psi(\theta_2), \ldots, \psi(\theta_L)\right] \subseteq \psi(\Theta),$$

which corresponds to a set of samples from the parameter space Θ̂ = {θ_1, θ_2, . . . , θ_L} ⊆ Θ. In this way, the signal can be expressed as a linear combination of the PD elements x = Ψc when all the unknown parameters are contained in the sampling set of the parameter space, i.e., θ_i ∈ Θ̂ for each i = 1, . . . , K. Therefore, finding the unknown parameters reduces to finding the PD elements appearing in the signal representation or, equivalently, finding the nonzero entries or support of the sparse PD coefficient vector c, for which we indeed have y = ΦΨc. The search for the vector c can be performed using CS recovery.

PD-based compressive parameter estimation can be perfect only if the parameter sample set Θ̂ is dense and large enough to contain all of the unknown parameters {θ_i}_{i=1}^{K}. If this stringent case is not met for some unknown parameter θ_k, a denser sampling of the parameter space decreases the difference between the unknown parameter θ_k and the nearest parameter sample θ_l, so that we can approximate the parametric signal ψ(θ_k) with the parametric signal ψ(θ_l). However, highly dense sampling increases the similarity between adjacent PD elements and by

extension the PD coherence [32], corresponding to the maximum normalized inner product of PD elements:

$$\mu(\Psi) = \max_{1 \le i \ne j \le L} \frac{\left|\langle \psi(\theta_i), \psi(\theta_j)\rangle\right|}{\left\|\psi(\theta_i)\right\|_2 \left\|\psi(\theta_j)\right\|_2}. \qquad (1)$$
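As an illustration, the coherence (1) of a PD stored as a matrix with one column per element can be computed with a few lines of NumPy; the following is a sketch of ours, not code from the referenced works:

```python
import numpy as np

def pd_coherence(Psi):
    """Coherence of a dictionary Psi, cf. (1): the largest normalized inner
    product magnitude between two distinct columns."""
    Psi = np.asarray(Psi, dtype=complex)
    cols = Psi / np.linalg.norm(Psi, axis=0, keepdims=True)   # normalize columns
    gram = np.abs(cols.conj().T @ cols)                       # normalized inner product magnitudes
    np.fill_diagonal(gram, 0.0)                               # exclude the i = j terms
    return gram.max()
```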

Additionally, dense sampling increases the difficulty of distinguishing PD elements and severely hampers the

performance of compressive parameter estimation [33, 34]. Prior work addressed such issues by using a coherence-inhibiting structured sparse approximation where the resulting K nonzero entries of the coefficient vector correspond to PD elements that have sufficiently low coherence, i.e.,

$$\frac{\left|\langle \psi(\theta_i), \psi(\theta_j)\rangle\right|}{\left\|\psi(\theta_i)\right\|_2 \left\|\psi(\theta_j)\right\|_2} \le \nu \quad \text{for } i, j = 1, \ldots, K,$$

in order to inhibit the highly coherent PD elements from appearing in signal representation simultaneously; this

approach is known as band exclusion [6, 12, 15–17]. The maximum allowed coherence ν that defines the restriction

on the choice of PD elements is essential to successful performance: setting its value too large results in the selection

of coherent PD elements, while setting its value too small tightens up requirements on the minimum separation of

the parameters.

Another issue is that existing CS recovery algorithms commonly used in this setting can only guarantee stable

recovery of the sparse PD coefficient vectors when the error is measured by the ℓ_2 norm; in other words, the


estimated coefficient vector is close to the true coefficient vector in Euclidean distance. Such a guarantee is linked

to the core hard thresholding operation, which returns the optimal sparse approximation to the input vector with

respect to the ℓ_2 norm. However, the guarantee provides control on the performance of parameter estimation only in the most demanding case of exact recovery, when parameter estimation would be perfect. Otherwise, if exact coefficient vector recovery cannot be achieved, such a recovery guarantee is meaningless for parameter estimation since

the ℓ2 norm cannot precisely measure the difference between the supports of the sparse PD coefficient vectors. For

an illustrative example, consider a simple frequency estimation problem where the PD collects complex sinusoids

at frequencies {f_1, f_2, . . . , f_L} and there is only one unknown frequency f_i. In other words, the coefficient vector is c = e_i, where e_i denotes the canonical vector that is equal to 1 at its i-th entry and 0 elsewhere. Although two estimated coefficient vectors ĉ_1 = e_j and ĉ_2 = e_k have the same ℓ_2 distance to the true coefficient vector c, the frequency estimate f_j from ĉ_1 has smaller error than the frequency estimate f_k from ĉ_2 when the PD elements are sorted so that f_i < f_j < f_k.

Alternatively, the earth mover’s distance (EMD) has recently been used in CS to measure the distance between

coefficient vectors in terms of the similarity between their supports [19, 20]. In particular, the EMD between two vectors with the same ℓ_1 norm optimizes the work of the flow (i.e., the amount of the flow and the distance of the flow) among one vector to make the two vectors equivalent.¹ In our illustrative example, if the values of the frequency samples of the PD increase monotonically at a regular interval ∆ so that f_i = ∆ · i + f_0, the EMDs between the true coefficient vector c and the estimated coefficient vectors ĉ_1 and ĉ_2 are proportional to the frequency errors

|fi − fj| and |fi − fk|, respectively. Based on the fact that the work of the flow between any two entries of a PD

coefficient vector is isometric to the distance between the two corresponding parameters, it is reasonable that the

EMD between pairs of PD coefficient vectors efficiently measures the corresponding error of parameter estimation.

We elaborate our study of this property in Section III-A.

C. K-Median Clustering

Cluster analysis partitions a set of data points based on the similarity information between each pair of points,

which is usually expressed in terms of a distance [36, 37]. Clustering is the task of partitioning a set of points into

different groups in such a way that the points in the same group, which is called a cluster, are more similar to each

other than to those in other groups. The greater the similarity within a group and the greater the difference among groups, the better or more distinct the clustering is.

The goal of clustering L points {p_1, p_2, . . . , p_L} associated with weights w_1, w_2, . . . , w_L and mutual similarities d(p_i, p_j) into K clusters is to find the K centroids {q_1, q_2, . . . , q_K} of the clusters such that each cluster contains all points that are more similar (i.e., closer) to their centroid than to other centroids:

$$C_i = \{p_l : d(p_l, q_i) \le d(p_l, q_j) \text{ for all } j = 1, \ldots, K\}. \qquad (2)$$

¹The standard definition of EMD assumes that the two vectors have the same ℓ_1 norm. Nonetheless, one can add an additional cost due to norm mismatch that is equal to the mismatch times the length of the signal [35].


One can define a clustering quality measure as the total sum of weighted similarity between points and centroids:

$$J = \sum_{i=1}^{K} \sum_{p_j \in C_i} w_j \, d(q_i, p_j). \qquad (3)$$

Different choices of mutual similarity d(p_i, p_j) can result in different procedures to obtain the centroids [37]. If the squared Euclidean distance (ℓ_2) is used, i.e., d(p_i, p_j) = ‖p_i − p_j‖_2^2, then each cluster's centroid will be the mean of its elements, and so the clustering is called K-means clustering.

If the Manhattan distance (ℓ_1) is used, i.e., d(p_i, p_j) = ‖p_i − p_j‖_1, then each cluster's centroid will be the median of its elements, and so the clustering is called K-median clustering. In the special case where all points p_i ∈ R, i.e., all the points are along a line, and the absolute value is used as a distance, i.e., d(p_i, p_j) = |p_i − p_j|, the objective function defined in (3) becomes

$$J = \sum_{i=1}^{K} \sum_{p_j \in C_i} w_j \, |q_i - p_j|. \qquad (4)$$

One can solve for the centroids by setting the derivative of the measure function with respect to each of the centroids to zero; the result is

$$\sum_{p_j \in C_i} w_j \, \mathrm{sign}(q_i - p_j) = 0, \qquad (5)$$

for i = 1, . . . , K, where sign(x) returns the sign of x. Equation (5) illustrates that the resulting centroids are the medians of the elements in each cluster and the points on the two sides of the centroids have maximally balanced weight: for each i = 1, . . . , K,

$$\sum_{j:\, p_j \in C_i,\, p_j \le q_i} w_j \;\ge\; \sum_{j:\, p_j \in C_i,\, p_j > q_i} w_j, \qquad \sum_{j:\, p_j \in C_i,\, p_j < q_i} w_j \;\le\; \sum_{j:\, p_j \in C_i,\, p_j \ge q_i} w_j. \qquad (6)$$
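The balance condition (6) is exactly what a weighted median computes. A small sketch of ours (for points on a line, sorted internally before the cumulative sum) is:

```python
import numpy as np

def weighted_median(points, weights):
    """Return the point q satisfying the balance property (6): the smallest point
    whose cumulative weight reaches half of the total weight."""
    order = np.argsort(points)
    p = np.asarray(points, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cumw = np.cumsum(w)
    return p[np.searchsorted(cumw, 0.5 * cumw[-1])]

print(weighted_median([1.0, 2.0, 10.0], [1.0, 1.0, 5.0]))   # 10.0: the heavy point dominates
```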

D. Earth Mover’s Distance

The earth mover’s distance (EMD) between two vectors (c, ĉ) relies on the notion of mass assigned to each

entry of the involved vectors, with the mass of each entry being equal to its magnitude. The goal of EMD is to

compute the lowest transfer of mass among the entries of the first vector needed in order to match the mass of

the entries of the second vector. We assume that the vectors c and ĉ are nonnegative and sparse, with nonzero entries {c_1, c_2, . . . , c_K} and {ĉ_1, ĉ_2, . . . , ĉ_K}, and with supports S = {s_1, s_2, . . . , s_K} and Ŝ = {ŝ_1, ŝ_2, . . . , ŝ_K}, respectively. EMD(c, ĉ) represents the distance between the two vectors by finding the minimum sum of mass flows f_{ij} from entry c_i to entry ĉ_j multiplied by the distance d_{ij} = |s_i − ŝ_j| that can be applied to the first vector c to


yield the second vector ĉ. This is a typical linear programming problem that can be written as

$$\mathrm{EMD}(c, \hat{c}) = \min_{f_{ij}} \sum_{i,j} f_{ij} d_{ij} \quad \text{such that} \quad \sum_{j=1}^{K} f_{ij} = c_i \ (i = 1, \ldots, K), \quad \sum_{i=1}^{K} f_{ij} = \hat{c}_j \ (j = 1, \ldots, K), \quad f_{ij} \ge 0. \qquad (7)$$
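The linear program (7) can be handed to an off-the-shelf LP solver. A minimal sketch using SciPy follows (it treats c and ĉ as nonnegative vectors of equal ℓ_1 norm; the helper name and indexing scheme are our own choices):

```python
import numpy as np
from scipy.optimize import linprog

def emd(c, c_hat):
    """Earth mover's distance (7) between two nonnegative vectors of equal l1 norm."""
    s = np.flatnonzero(c)                             # support of c
    s_hat = np.flatnonzero(c_hat)                     # support of c_hat
    K, Kh = len(s), len(s_hat)
    D = np.abs(s[:, None] - s_hat[None, :]).ravel()   # ground distances d_ij
    # Equality constraints: row sums equal c_i, column sums equal c_hat_j.
    A_eq = np.zeros((K + Kh, K * Kh))
    for i in range(K):
        A_eq[i, i * Kh:(i + 1) * Kh] = 1.0
    for j in range(Kh):
        A_eq[K + j, j::Kh] = 1.0
    b_eq = np.concatenate([c[s], c_hat[s_hat]])
    res = linprog(D, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# Example: a unit spike at index 3 versus one at index 7 has EMD |3 - 7| = 4.
c = np.zeros(10); c[3] = 1.0
c_hat = np.zeros(10); c_hat[7] = 1.0
print(emd(c, c_hat))   # 4.0
```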

E. EMD-Optimal Sparse Approximation

EMD-optimal sparse approximation plays a crucial role in our proposed compressive parameter estimation approach. This process makes it easy to integrate EMD into a CS framework to formulate a new compressive parameter

estimation algorithm [15].

Assume that I = {1, 2, . . . , L} and S = {s_1, s_2, . . . , s_K} is a K-element subset of I. Consider the problem of finding the K-sparse vector c ∈ C^L with support S that has the smallest EMD to an arbitrary vector v ∈ C^L. The minimum flow work defined in the EMD is achieved if and only if the flow is active between each entry of the vector v and its nearest nonzero entry s_i of the vector c. In other words, the nonzero entries of c partition the entries of v into K different groups. Denote by V_i the set of indices of the entries of v that are matched to the nonzero entry s_i of c; this set can be written as

$$V_i = \{l \in I : |l - s_i| \le |l - s_j|,\ s_j \ne s_i,\ s_j \in S\}. \qquad (8)$$

The EMD defined in (7) can be written as

$$\mathrm{EMD}(v, c) = \sum_{i=1}^{K} \sum_{j \in V_i} |v_j| \, |j - s_i|. \qquad (9)$$

It is important to note that (9) has the same form as (3)–(4), which is the objective function used in K-median clustering. Thus, one can pose a K-median clustering problem to minimize the value of (9) over all possible supports S. To that end, define L points in a one-dimensional space with weights |v_1|, |v_2|, . . . , |v_L| and locations 1, 2, . . . , L. It is easy to see that if we denote the set of centroid positions obtained by performing K-median clustering for this problem as S, then the set S corresponds to the support of the K-sparse signal that is closest to the vector v when measured with the EMD. One can then simply compute the sets in (8) and define the EMD-optimal K-sparse approximation c to the vector v as c_{s_i} = ∑_{j∈V_i} v_j for s_i ∈ S, with all other entries equal to zero. Thus, it is computationally feasible to provide sparse approximations in the EMD sense [19, 20].
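To make the construction above concrete, the following small sketch (our own helper) builds the EMD-optimal sparse approximation from a given support by matching each entry of v to its nearest support index as in (8) and aggregating; the clustering step that produces the support itself is sketched after Algorithm 1 below.

```python
import numpy as np

def emd_sparse_approx(v, support):
    """EMD-optimal sparse approximation of v with the given support S:
    each index l is matched to its nearest s_i (cf. (8)) and the matched
    entries of v are summed into c_{s_i}."""
    v = np.asarray(v)
    S = np.sort(np.asarray(support))
    idx = np.arange(len(v))
    nearest = S[np.argmin(np.abs(idx[:, None] - S[None, :]), axis=1)]
    c = np.zeros_like(v)
    for s in S:
        c[s] = v[nearest == s].sum()
    return c

print(emd_sparse_approx(np.array([0.1, 0.8, 0.3, 0.0, 0.6]), [1, 4]))   # [0.  1.2 0.  0.  0.6]
```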

III. CLUSTERING METHODS FOR PARAMETER ESTIMATION

A. Estimation Error

We wish to obtain parameter estimates such that the total error between the unknown parameters and the estimated parameters is minimized. Computing the estimation error between a set of K one-dimensional true parameters


θ = {θ_1, θ_2, . . . , θ_K} ⊂ R and a set of K one-dimensional estimates θ̂ = {θ̂_1, θ̂_2, . . . , θ̂_K} ⊂ R is an assignment problem that minimizes the cost of assigning each true parameter to a parameter estimate, when the cost of assigning the true parameter θ_i ∈ θ to the parameter estimate θ̂_j ∈ θ̂ is the absolute distance between the two values, t_{ij} = |θ_i − θ̂_j|. The resulting parameter estimation error can be obtained by solving the following linear program,

which is a relaxation of the formal integer program optimization [38, Corollary 19.2a]:

$$\mathrm{PEE}(\theta, \hat{\theta}) = \min_{\{g_{ij}\}} \sum_{i,j} g_{ij} t_{ij} \quad \text{such that} \quad \sum_{j=1}^{K} g_{ij} = 1 \ (i = 1, \ldots, K), \quad \sum_{i=1}^{K} g_{ij} = 1 \ (j = 1, \ldots, K), \quad g_{ij} \ge 0. \qquad (10)$$

In words, the output g for the optimization above encodes the matching between true parameter values and their estimates that minimizes the average parameter estimation error.
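Because (10) is the LP relaxation of an assignment problem whose optimum is attained at a one-to-one matching (per the cited corollary), the PEE can be evaluated with a standard assignment solver. A minimal sketch using SciPy (the wrapper and variable names are ours):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def parameter_estimation_error(theta, theta_hat):
    """PEE as in (10): the cost of the optimal one-to-one matching between
    true parameters and estimates under the absolute-distance costs t_ij."""
    T = np.abs(np.asarray(theta)[:, None] - np.asarray(theta_hat)[None, :])
    rows, cols = linear_sum_assignment(T)     # optimal assignment
    return T[rows, cols].sum()

# |1.0 - 1.3| + |5.0 - 4.8| = 0.5
print(parameter_estimation_error([1.0, 5.0], [4.8, 1.3]))
```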

When the sampling interval of the parameter space that generates the PD is constant and equal to ∆ (so that θ_i = ∆ · i + θ_0, with i denoting the index for the corresponding PD element ψ(θ_i)), it is easy to see that for a pair

of true and estimated coefficient vectors and the corresponding true and estimated parameters, the computations

of the parameter estimation error (10) and the EMD (7) are similar and obey t_{ij} = ∆ · d_{ij}. This straightforward

comparison demonstrates the close relationship between the EMD and parameter estimation error, which is formally

stated in the following theorem and proven in Appendix A.

Theorem 1. Assume that ∆ is the sampling interval of the parameter space that generates the PD used for parameter estimation. If c and ĉ are two K-sparse PD coefficient vectors corresponding to two sets of parameters θ and θ̂, then the EMD between the two coefficient vectors provides an upper bound of the parameter estimation error between the two sets of parameters:

$$\mathrm{PEE}(\theta, \hat{\theta}) \le \frac{\Delta}{c_{\min}} \, \mathrm{EMD}(c, \hat{c}), \qquad (11)$$

where c_min is the smallest component magnitude among the nonzero entries of c and ĉ.

Theorem 1 clearly shows that the EMD between sparse PD coefficient vectors provides an upper bound of the

parameter estimation error for the corresponding parameters, with a scaling factor proportional to the PD parameter

sampling rate∆. For a particular estimation problem where the PD sampling interval is fixed, Theorem 1 gives the

intuition that designing an algorithm that minimizes the EMD-measured error of the coefficient vector estimation

will consequently also minimize the parameter estimation error.

It is worth noting from the proof of Theorem 1 that the relationship between EMD and parameter estimation

error becomes strictly linear when c_min = c_max, i.e., all component magnitudes have the same value. If the dynamic


range of component magnitudes is defined as

$$r = \max_{i \ne j} \frac{c_i}{c_j}, \qquad (12)$$

then r reflects how close (11) can be to equality. The dynamic range of component magnitudes is an important

condition in parameter estimation, as we will show in the sequel.

B. The Role of Correlation in PD-based Parameter Estimation

We follow the convention of greedy algorithms for CS, where a proxy of the coefficient vector is obtained via the correlation of the observations with the PD elements, i.e., v = Ψ^H x, where Ψ^H denotes the Hermitian (conjugate transpose) of Ψ. The resulting proxy vector v can be expressed as a linear combination of shifted correlation

functions. The magnitudes of these components will be proportional to the magnitude of the corresponding signal

components.

The correlation value between the PD elements corresponding to parameters θ_1 and θ_2 is defined as

$$\lambda(\omega) = \lambda(\theta_2 - \theta_1) = \langle \psi(\theta_2), \psi(\theta_1)\rangle = \psi^H(\theta_1)\psi(\theta_2), \qquad (13)$$

where ω = θ_2 − θ_1 measures the difference between parameters, and we assume for simplicity that the correlation

depends only on the parameter difference ω. In many parameter estimation problems, such as frequency estimation and time delay estimation, the correlation function has bounded variation such that the cumulative correlation function, defined as

$$\Lambda(\theta) = \sum_{\omega \in \Theta :\, \omega \le \theta} |\lambda(\omega)|, \qquad (14)$$

is bounded. The cumulative correlation function is a nondecreasing function with infimum Λ(−∞) = 0 and supremum Λ(∞) = ∑_{ω∈Θ} |λ(ω)|.

As shown in Figure 1, the correlation function λ(ω) achieves its maximum when the parameter difference ω = 0 and decreases as |ω| increases, finally vanishing when |ω| → ∞. In words, the larger the parameter difference is, the smaller the similarity of the corresponding PD elements is. Due to the even nature of the correlation function, i.e., λ(ω) = λ(−ω), the cumulative correlation function is rotationally symmetric, i.e., Λ(θ) + Λ(−θ) = 2Λ(0) = Λ(∞),

as shown in Figure 1. Both figures also indicate that the correlation function for time delay estimation decays much

faster than that for frequency estimation, which is indicative of the increased difficulty for frequency estimation with

respect to time delay estimation that will be shown in the sequel.

For convenience of analysis, we assume that the correlation function is real and nonnegative, while noting that

the experimental results match our theory even when the assumption does not hold. When the signal is measured

directly without CS (i.e., the measurement matrix is the identity or Φ = I) in a noiseless setting, the observations

exactly match the sparse signal and can be written as

$$y = x = \sum_{i=1}^{K} c_i \psi(\theta_i). \qquad (15)$$


Fig. 1: Examples of correlation in PD constructions. (Left) Correlation function λ(ω) and (right) normalized cumulative correlation function Λ(ω)/Λ(∞) as a function of the discretized parameter ω/∆ for time delay estimation (TDE) and frequency estimation (FE).

Therefore, the proxy entries v_j = Ψ_j^H x correspond to inner products between the observation vector y and the PD elements ψ(θ_j) corresponding to the sampled parameters θ_j ∈ Θ̂, and thus can be expressed as a linear combination of shifted correlation functions. For j = 1, 2, . . . , L, the j-th entry of the proxy vector is

$$v_j = v(\theta_j) = \langle x, \psi(\theta_j)\rangle = \sum_{i=1}^{K} c_i \langle \psi(\theta_i), \psi(\theta_j)\rangle = \sum_{i=1}^{K} c_i \lambda(\theta_j - \theta_i). \qquad (16)$$

It is easy to see that the proxy function will feature local maxima at the sampled parameter values from the PD that

are closest to the true parameter values. Thus, the goal of parameter estimation ultimately reduces to the search for

local maxima in the proxy vector over all parameter samples represented in the PD, which often is addressed via

optimal sparse approximation of the proxy.
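As a toy illustration of the proxy (16) and its local maxima, the following sketch builds a small frequency-style PD and a two-component signal; the sizes, frequencies, and amplitudes are arbitrary choices of ours:

```python
import numpy as np

N, L = 128, 512
n = np.arange(N)
grid = np.linspace(0.0, 0.5, L, endpoint=False)               # sampled parameters (cycles/sample)
Psi = np.exp(2j * np.pi * np.outer(n, grid)) / np.sqrt(N)     # PD: one unit-norm column per sample

theta = np.array([0.11, 0.32])                                # true parameters (off the grid)
c = np.array([1.0, 0.7])                                      # component magnitudes
x = (np.exp(2j * np.pi * np.outer(n, theta)) / np.sqrt(N)) @ c

v = Psi.conj().T @ x          # proxy, cf. (16): a sum of shifted copies of the correlation function
best = grid[np.argmax(np.abs(v))]
print(best)                   # a grid parameter near 0.11, the strongest component
```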

When the correlation function has fast decay (usually the case when the coherence of the PD is very small), it

is possible to find the local maxima of the proxy via the hard thresholding operator, as used in standard greedy

algorithms for sparse signal recovery. However, when the correlation function decays slowly (usually the case when

the PD elements are highly coherent), the thresholding operator will unavoidably focus its search around the peak

of the proxy with the largest magnitude, unless additional approaches like band exclusion are implemented. In

contrast, EMD-optimal sparse approximation identifies the local maxima of the proxy directly by exploiting the fact that these local maxima correspond to the K-median clustering centroids, when certain conditions (to be defined in Theorem 2 in the next section) are met. Thus, we can propose an EMD and PD-based parameter estimation algorithm, shown as Algorithm 1, which leverages a standard iterative, Lloyd-style K-median clustering algorithm

[36]. This algorithm will repeatedly assign each entry of the input to the cluster whose median is closest to the

entry and then update each median by finding the weighted median of the cluster using the balance property (6).


Algorithm 1 EMD-Optimal Sparse Parameter Estimator θ̂ = C(v, Θ̂, K)

Input: PD proxy vector v, set of PD parameter values Θ̂, target sparsity K
Output: parameter estimates θ̂, sampled indices S

1: Initialize: set L = |Θ̂|, choose S as a random K-element subset of {1, . . . , L}
2: repeat
3:   g_i = argmin_{j=1,...,K} |i − s_j| for each i = 1, . . . , L  {label parameter samples}
4:   s_j = median{(i, |v_i|) : g_i = j} for each j = 1, . . . , K  {update cluster medians}
5:   θ̂ = Θ̂_S  {find parameter estimates}
6: until S does not change or maximum number of iterations is reached
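A NumPy sketch of Algorithm 1 follows: a Lloyd-style weighted 1-D K-median iteration on the proxy magnitudes. The random initialization, tie-breaking, and iteration cap are our own choices, and `positions` is assumed to hold the sampled parameter values in increasing order.

```python
import numpy as np

def kmedian_estimate(v, positions, K, max_iter=100, seed=0):
    """Algorithm 1 sketch: weighted 1-D K-median clustering of |v| with entry j
    located at positions[j]; returns the indices S of the K cluster medians
    (the parameter estimates are then positions[S])."""
    rng = np.random.default_rng(seed)
    w = np.abs(np.asarray(v))
    pos = np.asarray(positions, dtype=float)
    S = np.sort(rng.choice(len(w), size=K, replace=False))      # random initial medians
    for _ in range(max_iter):
        # Step 3: label every entry with its nearest median.
        labels = np.argmin(np.abs(pos[:, None] - pos[S][None, :]), axis=1)
        S_new = S.copy()
        for j in range(K):
            members = np.flatnonzero(labels == j)
            if members.size:
                cumw = np.cumsum(w[members])
                # Step 4: weighted median of the cluster, cf. the balance property (6).
                S_new[j] = members[np.searchsorted(cumw, 0.5 * cumw[-1])]
        S_new = np.sort(S_new)
        if np.array_equal(S_new, S):     # step 6: stop when S does not change
            break
        S = S_new
    return S
```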

C. Performance Analysis

There are some conditions that the signal x should satisfy to minimize the estimation error when using Algorithm 1.

• Minimum Parameter Separation: If two parameters θ_i and θ_j are too close to each other, the similarity of ψ(θ_i) and ψ(θ_j) makes it difficult to distinguish them. Therefore, our first condition considers the minimum separation distance:

$$\zeta = \min_{i \ne j} |\theta_i - \theta_j|. \qquad (17)$$

• Parameter Range: Any parameter observed should be sufficiently far away from the bounds of the parameter range. It is often convenient to restrict the feasible parameters and sampled parameters to a small range, i.e., Θ is bounded. According to (6), which implies a balance of the proxy v around the local maxima when using a clustering method, there will be a bias in the estimation due to the missed portion of the symmetric correlation function λ when the unknown parameter is too close to the bound of the parameter range. Therefore, the condition should also consider the minimum off-bound distance ε, formally written as

$$\epsilon = \min_{1 \le i \le K} \{\min(\theta_i - \min(\Theta),\ \max(\Theta) - \theta_i)\}. \qquad (18)$$

• Dynamic Range: If the magnitudes of some components in the signal are too small, they may be dwarfed by

larger components and ignored by the greedy algorithms. Thus, we need to pose an additional condition on

the dynamic range of the component magnitudes as defined in (12).

With these conditions, we can formulate the following theorem (proven in Appendix B) that guarantees the

performance of our clustering-based method for compressive parameter estimation.

Theorem 2. Assume that the sampling interval of the parameter space ∆ → 0, and that the signal x given in (15) involving K parameters θ_1, θ_2, . . . , θ_K has a dynamic range of the component magnitudes equal to r, as defined in (12). For any allowed error σ > 0, if the minimum separation distance (17) satisfies

$$\zeta \ge 2\Lambda^{-1}\!\left(2\Lambda(0)\left(1 - \frac{\Lambda(\sigma)/\Lambda(0) - 1}{(2K-2)r + 1}\right)\right) + 2\sigma, \qquad (19)$$


and the minimum off-bound distance (18) satisfies

$$\epsilon \ge \Lambda^{-1}\!\left(2\Lambda(0)\left(1 - \frac{\Lambda(\sigma)/\Lambda(0) - 1}{2Kr}\right)\right), \qquad (20)$$

then the estimates θ̂_1, θ̂_2, . . . , θ̂_K returned from Algorithm 1 have estimation error

$$\mathrm{PEE}(\theta, \hat{\theta}) \le K\sigma. \qquad (21)$$

Theorem 2 provides a guarantee on the parameter estimation error obtained from K-median clustering of the proxy v for the PD coefficients. Theorem 2 also provides a connection between the conditions described earlier and

the performance of PD-based parameter estimation, providing a guide for the design and evaluation of PDs that can

achieve the required estimation error for practical problems. For example, in time delay estimation problems, instead

of increasing the minimum separation distance required for recovery, one can try to design a transmitted waveform

that improves the other conditions cited above (such as one that increases the rate of decay of the cumulative

correlation functions, as will be discussed in the sequel) to achieve small estimation error.

Additionally, Theorem 2 makes explicit the linear relationship between the normalized cumulative correlation

for the minimum separation distance Λ(ζ)/Λ(∞) and the normalized cumulative correlation for the maximum observed error Λ(σ)/Λ(0), cf. Section IV. The required minimum separation distances will be dependent on the specific parameter estimation problem even when the maximum allowed error is kept constant. This illustrates the

wide difference in performances between time delay estimation and frequency estimation: the minimum separation

distance required by time delay estimation is much smaller than that of frequency estimation, due to the contrasting

rates of decay of the function Λ, cf. Figure 1. Although Theorem 2 is derived for the case of a nonnegative real correlation function and is asymptotic in the parameter sampling interval ∆, our numerical simulations in the sequel show that the predicted relationship between ζ and σ is observed in practical problems of modest sizes.

D. Effect of Compression and Measurement Noise

The addition of CS and measurement noise makes the estimation problem even harder, since in both of these cases there is a decrease in the rate of decay of the cumulative correlation function Λ. When the measurement matrix Φ is used to obtain the observed measurements y from the signal x such that y = Φx = ∑_{i=1}^{K} c_i Φψ(θ_i), the proxy

becomes

$$v_j = v(\theta_j) = \langle \Phi^H y, \psi(\theta_j)\rangle = \langle y, \Phi\psi(\theta_j)\rangle = \sum_{i=1}^{K} c_i \langle \Phi\psi(\theta_i), \Phi\psi(\theta_j)\rangle. \qquad (22)$$

Only if Φ^H Φ = I can (22) be identical to (16). We define the compressed correlation function as

$$\lambda_\Phi(\omega) = \lambda_\Phi(\theta_1 - \theta_2) = \langle \Phi\psi(\theta_2), \Phi\psi(\theta_1)\rangle = \psi^H(\theta_1)\Phi^H\Phi\psi(\theta_2). \qquad (23)$$


The proxy can again be expressed as the linear combination of shifted copies of the redefined correlation function (23):

$$v_j = v(\theta_j) = \sum_{i=1}^{K} c_i \lambda_\Phi(\theta_j - \theta_i). \qquad (24)$$

Although in general we will have λ ≠ λ_Φ, we can use the preservation property of inner products through random projections [39]. That is, when Φ has independent and identically distributed (i.i.d.) random entries and sufficiently many rows, there exists a constant δ > 0 such that for all pairs (θ_i, θ_j) of interest we have

$$(1 - \delta)\lambda(\theta_i - \theta_j) \le \lambda_\Phi(\theta_i - \theta_j) \le (1 + \delta)\lambda(\theta_i - \theta_j). \qquad (25)$$

The parameter δ decays as the compression rate κ = M/N increases, and the manifolds λ(·) we consider here are

known to be amenable to large amounts of compression [40]. Such a relationship indicates that the compression can

affect the correlation function and the performance of clustering methods for compressive parameter estimation.
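A quick numerical sketch of this effect compares λ from (13) with λ_Φ from (23) under an i.i.d. Gaussian Φ; the signal family, sizes, and offsets below are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 256, 64
n = np.arange(N)
psi = lambda f: np.exp(2j * np.pi * f * n / N) / np.sqrt(N)   # unit-norm PD element
Phi = rng.standard_normal((M, N)) / np.sqrt(M)                # i.i.d. Gaussian measurement matrix

offsets = np.linspace(0.0, 5.0, 11)                           # parameter differences (in grid bins)
lam = np.array([np.abs(np.vdot(psi(0.0), psi(w))) for w in offsets])                   # cf. (13)
lam_Phi = np.array([np.abs(np.vdot(Phi @ psi(0.0), Phi @ psi(w))) for w in offsets])   # cf. (23)
print(np.round(lam, 3))
print(np.round(lam_Phi, 3))   # follows lam on average, with fluctuations that slow the effective decay
```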

E. Quantifying the Role of Correlation Decay

We choose to focus on simple bounds for the correlation function λ to analyze its role in the performance of

EMD and PD-based parameter estimation. Similarly to [41], we use bounding functions to measure and control the

decay rate of the correlation function λ. We approximate the correlation function λ_Φ(ω) with an exponential function λ̄(ω) = exp(−a|ω|) that provides an upper bound of the actual correlation function, i.e., λ_Φ(ω) ≤ λ̄(ω). The performance obtained from the exponential function approximation λ̄(·) provides an upper bound of the performance from the real correlation function λ(·). In the exponential function, a is the parameter that controls the decay rate: the larger a is, the faster the correlation function decays. It is easy to see that the decay rate of the compressed correlation function (23) will be smaller than that of the original correlation function (13). We assume that λ(ω) = exp(−a|ω|) and that a bound λ_Φ(ω) ≤ exp(−b|ω|) exists; in this case, b < a due to the fact that

(25) provides us with the following upper bound:

$$\lambda_\Phi(\omega) \le (1 + \delta)\lambda(\omega) \le (1 + \delta)\exp(-a|\omega|) \le \exp\!\left(-\left(a - \frac{\ln(1+\delta)}{|\omega|}\right)|\omega|\right),$$

where ln(1 + δ)/|ω| > 0 when δ > 0. This shows that CS reduces the decay speed of the correlation function and

increases the necessary minimum separation distance and minimum off-bound distance to guarantee the preservation

of parameter estimation performance. This dependence is also manifested in the experimental results of Section IV

when the correlation function does not follow an exact exponential decay.

We observe in practice that the issues with slow-decaying correlation functions arise whenever the sum of the

copies of the correlation functions far from their peaks becomes comparable to the peak of any given copy. Thus,

one can use operators such as thresholding functions to prevent this effect from appearing in Algorithm 1. We can write a hard-thresholded version of the proxy from (24) as

$$v_t(\theta_i) = \begin{cases} v(\theta_i), & |v(\theta_i)| > t, \\ 0, & |v(\theta_i)| \le t. \end{cases} \qquad (26)$$


As demonstrated by the following theorem, proven in Appendix C, the thresholding operator reduces the required

minimum separation distance for accurate estimation.

Theorem 3. Under the setup of Theorem 2, assume that the correlation function defined in (23) is given by

λ_Φ(ω) = exp(−a|ω|). For any allowed error σ > 0, if t is the threshold given in (26), the dynamic range given in (12) is equal to r, and the minimum separation distance given in (17) satisfies

$$\zeta \ge \frac{1}{a}\ln\!\left(\sqrt{\frac{8r^2}{t^2/(r c_{\min})^2 - \exp(-2a\sigma)}} + 1\right), \qquad (27)$$

where c_min is the minimum component magnitude, then the estimates θ̂_1, θ̂_2, . . . , θ̂_K returned from performing Algorithm 1 on the thresholded proxy v_t in (26) have estimation error

$$\mathrm{PEE}(\theta, \hat{\theta}) \le K\sigma. \qquad (28)$$

Theorem 3 extends Theorem 2 by including the use of thresholding as a tool to combat the slow decay of the

correlation function, due to an ill-posed estimation problem or the use of CS. One can also intuitively see that

the presence of noise in the measurements will also slow the decay of the correlation function, which according

to the theorem will require larger minimum separation or careful thresholding. In practice, the decay coefficient

a can usually be obtained by finding the minimum value such that the exponential function exp(−a|ω|) provides a tight upper bound for the correlation function λ(ω). Although Theorem 3 is based on an approximation of the actual compressive parameter estimation problem setup, our numerical results in the sequel show its validity in practical settings for time delay estimation and frequency estimation.

IV. NUMERICAL EXPERIMENTS

In order to test the performance of the clustering parameter estimation method on different problems, we present

a number of numerical simulations involving time delay estimation and frequency estimation. Before detailing our

experimental setups, we define the parametric signals and the parametric dictionaries (PDs) involved in these two

example applications.

For time delay estimation, the parametric signal model describes a sampled version of a chirp as shown in (29),

where T = 1 µs is the length of the chirp, f_c = 1 MHz is the chirp's starting frequency, f_a = 20 MHz is the chirp's frequency sweep, f_s = 1/T_s = 50 MHz is the sampling frequency of the discrete version of the chirp, and N = 500 samples are taken. The parameter space range goes from θ_min = 0 to θ_max = 10 µs. The PD for time delay estimation contains all chirp signals corresponding to the sampled parameters, Ψ = [ψ(0), ψ(∆), . . . , ψ(θ_max)], with sampling interval ∆.

For frequency estimation, the parametric signals are the N-dimensional signals with entries

$$\psi(\theta)[n] = \frac{\exp\!\left(j 2\pi \theta \frac{n}{N}\right)}{\sqrt{N}}, \quad n = 0, 1, \ldots, N - 1. \qquad (30)$$

The parameter space range goes from θ_min = 0 to θ_max = 500 Hz. As before, the PD for frequency estimation contains all parametric signals corresponding to the sampled parameters, Ψ = [ψ(0), ψ(∆), . . . , ψ(θ_max)], with sampling interval ∆.


$$\psi(\theta)[n] = \begin{cases} \sqrt{\dfrac{2}{3 T f_s}} \, \exp\!\left(j 2\pi \left(f_c + \dfrac{n T_s - \theta}{T} f_a\right)(n T_s - \theta)\right)\left(1 + \cos\!\left(2\pi \dfrac{n T_s - \theta}{T}\right)\right), & n T_s - \theta \in [0, T], \\ 0, & \text{otherwise}. \end{cases} \qquad (29)$$

In both time delay estimation and frequency estimation, the number of unknown parameters is set to

K = 4.
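A sketch of the two parametric signal models, following (29) and (30) with the sampling values listed above (the function names, the default arguments, and the example grids are our own conventions):

```python
import numpy as np

def chirp_psi(theta, N=500, T=1e-6, fc=1e6, fa=20e6, fs=50e6):
    """Time delay estimation signal, cf. (29): a raised-cosine-windowed chirp delayed by theta."""
    Ts = 1.0 / fs
    t = np.arange(N) * Ts - theta                       # n*Ts - theta
    psi = np.zeros(N, dtype=complex)
    on = (t >= 0) & (t <= T)
    psi[on] = (np.sqrt(2.0 / (3.0 * T * fs))
               * np.exp(2j * np.pi * (fc + t[on] / T * fa) * t[on])
               * (1.0 + np.cos(2.0 * np.pi * t[on] / T)))
    return psi

def exp_psi(theta, N=500):
    """Frequency estimation signal, cf. (30): a unit-norm complex exponential."""
    n = np.arange(N)
    return np.exp(2j * np.pi * theta * n / N) / np.sqrt(N)

# Example PDs built on uniform parameter grids with an illustrative sampling interval Delta.
tde_pd = np.column_stack([chirp_psi(th) for th in np.arange(0.0, 10e-6, 0.02e-6)])
fe_pd = np.column_stack([exp_psi(th) for th in np.arange(0.0, 500.0, 0.5)])
```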

In the first experiment, we illustrate the relationship between minimum separation distance and the maximum

allowable error described in Theorem 2. We measure the performance for each minimum separation distance ζ by the maximum estimation error σ over 1000 signals as shown in (15) with randomly chosen parameter values that are spaced by at least ζ. For time delay estimation, the sampling step of the parameter space for all experiments is ∆ = 0.001 µs (unless otherwise specified), so that the PD contains observations for 10001 parameter samples, and we let the minimum separation ζ ∈ [0.07 µs, 0.1 µs]. For frequency estimation, the sampling step of the parameter space for all experiments is ∆ = 0.05 Hz, so that the PD contains observations for 10001 parameter samples, and we let the minimum separation ζ ∈ [35 Hz, 70 Hz].

Figure 2 shows the normalized cumulative correlation for the maximum error Λ(σ)/Λ(0) as a function of the normalized cumulative correlation for the minimum separation distance Λ(ζ)/Λ(∞) for both time delay estimation and frequency estimation; recall that Λ(0) = Λ(∞)/2. The figure also shows the relationship between the minimum separation ζ and the maximum error σ without the use of the cumulative correlation function for both example cases. The approximately linear relationship between Λ(ζ)/Λ(∞) and Λ(σ)/Λ(0) for both the time delay

estimation and frequency estimation cases numerically verifies the result of Theorem 2. The difference between

the performance results as well as the relationship between ζ/∆ and σ/∆ validates the conclusion that frequency

estimation requires a significantly larger minimum separation than time delay estimation. From Figure 2, we know

that it is impossible to get an arbitrarily small estimation error even if the minimum separation keeps increasing, as the estimation error cannot be smaller than the parameter sampling step ∆ (as observed in the figure). In fact, the figure shows that the relationship between Λ(ζ)/Λ(∆) and Λ(σ)/Λ(∆) ceases to be linear exactly when the value of the error becomes equal to the parameter sampling resolution ∆ for both application examples. To achieve more precise estimation, the use of additional methods such as interpolation is needed [42]. Nonetheless, by employing the function Λ(·) (which is dependent on the parameter estimation problem), we can consistently obtain a linear

relationship between the minimum separation distance and the parameter estimation error across the parameter

estimation problems considered (cf. Figure 2).

In the second experiment, we illustrate the application of Theorem 3 in the time delay estimation problem. We

vary the chirp’s frequency sweep f_a between 2 Hz and 20 Hz to generate different rates of decay of the correlation function and obtain the decay parameter a as the smallest value that enables the exponential function exp(−a|ω|) to bound the correlation function λ(ω). We then measure the performance of time delay estimation in the same


Fig. 2: Performance results for two parameter estimation problems as a function of the parameter separation. Left: time delay estimation; right: frequency estimation. Top: Normalized cumulative correlation function for the maximum estimation error Λ(σ)/Λ(0) as a function of the normalized cumulative correlation function for the minimum separation Λ(ζ)/Λ(∞), showing a linear dependence. Bottom: Normalized maximum error σ/∆ as a function of normalized minimum separation ζ/∆.

Fig. 3: Performance results for time delay estimation for a variety of rates of decay of the parametric signal's correlation function. The figures show the normalized minimum separation necessary for accurate estimation (σ ≤ ∆) as a function of (left) decay coefficient, (middle) dynamic range, and (right) threshold value.

manner as before (maximum error over 1000 randomly drawn signals) by determining the minimum separation ζ for which the maximum observed estimation error σ is equal to the PD parameter sampling step ∆ = 0.02 µs. The results in Figure 3 show the reciprocal relationship between the normalized minimum separation ζ/∆ and the decay parameter a. Additionally, Figure 3 shows the logarithmic relationship between the normalized minimum separation ζ/∆ and the dynamic range of the component magnitudes r when f_a = 10 Hz and t = 0.9. Finally, Figure 3 shows the negative logarithmic relationship between the normalized minimum separation ζ/∆ and the threshold t with f_a = 10 Hz and r = 1. All figures are indicative of agreement between Theorem 3 and practical results. Perhaps the most important application of Theorem 3 focuses on the choice of threshold t for the particular problem of

interest, which can improve the performance of the clustering method in compressive parameter estimation. To deal

with the problems of slow decay or large dynamic range, one can try to increase the threshold value on the proxy

rather than increasing the minimum separation to improve the estimation performance.

Our third and fourth experiments test the performance of clustering methods in compressive parameter estimation.


Algorithm 2 Clustering Subspace Pursuit (CSP)

Input: measurement vector y, measurement matrix Φ, sparsity K, set of sampled parameters Θ̂, threshold t
Output: estimated signal x, estimated parameter values θ̂

1: Initialize: x = 0, S = ∅, generate PD Ψ from Θ̂.
2: repeat
3:   y_r = y − Φx  {Compute residual}
4:   v = (ΦΨ)^H y_r  {Obtain proxy from residual}
5:   v(|v| ≤ t) = 0  {Threshold proxy}
6:   S = S ∪ C(v, Θ̂, K)  {Augment parameter estimates from proxy}
7:   c = (ΦΨ_S)^+ y  {Obtain proxy on parameter estimates}
8:   S = C(c, Θ̂_S, K)  {Refine parameter estimates}
9:   x = Ψ_S c_S  {Assemble signal estimate}
10:  θ̂ = Θ̂_S  {Assemble parameter estimates}
11: until a convergence criterion is met
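A compact NumPy sketch of CSP, reusing the `kmedian_estimate` sketch of Algorithm 1 given earlier; the least-squares re-fit after the refinement step, the fixed iteration count, and the variable names are our own choices. The CISP variant would additionally apply polar interpolation around the returned estimates, as described in the text.

```python
import numpy as np

def csp(y, Phi, Psi, theta_grid, K, t=0.0, n_iter=20):
    """Clustering Subspace Pursuit (Algorithm 2) sketch; kmedian_estimate is the
    Algorithm 1 sketch given earlier and theta_grid holds the sampled parameters."""
    A = Phi @ Psi                                   # compressed PD
    x = np.zeros(Psi.shape[0], dtype=complex)
    S = np.array([], dtype=int)
    for _ in range(n_iter):
        yr = y - Phi @ x                            # step 3: residual
        v = A.conj().T @ yr                         # step 4: proxy from residual
        v = np.where(np.abs(v) > t, v, 0.0)         # step 5: threshold proxy
        S = np.union1d(S, kmedian_estimate(v, theta_grid, K))       # step 6: augment support
        c = np.linalg.pinv(A[:, S]) @ y             # step 7: coefficients on candidate support
        S = S[kmedian_estimate(c, theta_grid[S], K)]                # step 8: refine to K indices
        x = Psi[:, S] @ (np.linalg.pinv(A[:, S]) @ y)               # step 9: signal estimate
    return x, theta_grid[S]                         # step 10: parameter estimates
```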

K-median clustering is incorporated into subspace pursuit, a standard sparse recovery algorithm introduced in [27], by replacing each instance of hard thresholding in subspace pursuit with an instance of Algorithm 1. The resulting clustering subspace pursuit (CSP), as shown in Algorithm 2, is compared with the band-exclusion subspace pursuit (BSP) used in [12, 16] for time delay estimation and frequency estimation.² Similarly, CSP repeatedly computes

the proxy for the coefficient vectors from the measurement residual and then obtains the indices of the potential

parameter estimates from the thresholded proxy using Algorithm 1. A second clustering refines the estimation

after the potential estimates are merged with the previous estimates in order to maintain a set of K estimates. CSP

can also be armed with polar interpolation to significantly improve the estimation precision in a manner similar to

band-excluded interpolating subspace pursuit (BISP) [1, 42]; we call the resulting algorithm clustering interpolating

subspace pursuit (CISP).

Our third experiment tests the CSP, BSP, CISP, and BISP algorithms on 1000 independent randomly generated time delay estimation problems with minimum separation ζ = 0.2 µs where CS measurements are taken under additive white Gaussian noise (AWGN). We use a parametric dictionary with parameter space sampling step ∆ = 0.02 µs. The maximum allowed coherence for BSP and BISP is chosen as ν = 0.001 via grid search, and the threshold for CSP and CISP is set as t = 0 (i.e., no thresholding takes place). Figure 4 shows the parameter estimation error as a function of the CS compression rate κ = M/N when no noise is added. The results indicate that clustering-based

algorithms match the performance of their band-exclusion counterparts for most compression rates, without the

need to carefully tune a band exclusion parameter. Additionally, Figure 4 shows the parameter estimation error

as a function of the measurement’s signal-to-noise ratio (SNR) when the compression rate κ = 0.4, cf. Figure 3.

²In Algorithm 2, M⁺ denotes the pseudoinverse of M.


Fig. 4: Performance of compressive parameter estimation for the time delay estimation problem, as measured by the average parameter error, as a function of (left) the CS compression rate κ = M/N and (right) the measurement SNR.

Fig. 5: Performance of compressive parameter estimation for the frequency estimation problem, as measured by the average parameter error, as a function of (left) the CS compression rate κ = M/N with noiseless measurements and (right) the measurement SNR with κ = 0.4.

CSP and CISP are shown to achieve the same noise robustness as BSP and BISP, respectively, as their curves match almost exactly; we emphasize that our algorithm did not need to perform careful parameter setting, since the threshold level is t = 0.

In our fourth experiment, we repeat the third experiment on the frequency estimation problem instead, with 1000 independent randomly drawn signals for each setup with minimum separation ζ = 5 Hz and parameter sampling step ∆ = 0.5 Hz. The maximum allowed coherence for BSP and BISP is ν = 0.2, and the threshold for CSP and CISP is set to t = 0.4. As in the third experiment, Figure 5 shows the estimation error as a function of the compression rate and the SNR: CSP can match the performance of BSP in this challenging problem with the

proper threshold, even under the presence of compression and noise.

In the last experiment, we apply our proposed clustering-based algorithm for compressive parameter estimation with a real-world signal. The lynx signal has N = 114 samples and is used to test the performance of line

spectral estimation algorithms in [43]. It is well approximated by a sum of complex sinusoids with small minimum

separation distance and large dynamic range among the component magnitudes. We increase the size of the signal


Fig. 6: Performance of compressive parameter estimation algorithm for the real frequency estimation problem. Dash-dot, dashed, and dotted lines represent the average relative estimation errors when SNR = 10, 30, 50 dB, and lines with circles and crosses are errors of CISP and BISP.

by a factor of 10 using interpolation and obtain CS measurements of the resulting signal with a random matrix for

various compression rates under several levels of measurement noise. The maximum allowed coherence in BISP

is set to ν = 0.2 after a grid search to optimize the algorithm's performance, while the threshold level in CISP is set to t = 0. Figure 6 shows the average relative estimation error between the estimates from CISP and BISP and

those obtained from root MUSIC (a line spectral estimation algorithm with high accuracy [43]) when applied to

the full length signal. The results show that CISP without thresholding has closer performance to root MUSIC than

the best configuration of BISP.

V. DISCUSSION AND CONCLUSIONS

In this paper, we have introduced and analyzed the relationship between the EMD applied to PD coefficient vectors

and the parameter estimation error obtained from sparse approximation methods applied to the PD representation.

We also leveraged the relationship between EMD-based sparse approximation and K-median clustering algorithms

in the design of new compressive parameter estimation algorithms. Based on the relationship between the EMD and

the parameter estimation error, we have analytically shown that the EMD between PD coefficient vectors provides

an upper bound of the parameter estimation error obtained from methods that use PDs and EMD. Furthermore,

we leveraged the known connection between EMD-sparse approximation and K-median clustering to formulate new algorithms that employ sparse approximation in terms of the EMD; we then derived three theoretical results

that provide performance guarantees for EMD-based parameter estimation algorithms under certain requirements

for the signals observed, in contrast to existing work that does not provide similar guarantees. Our experimental

results show the validation of our analysis in several practical settings, and provides methods to control the effect

of coherence, compression, and noise in the performance of compressive parameter estimation.

In our new compressive parameter estimation algorithms, we use K-median clustering rather than the more predominant K-means clustering to obtain the sparse approximation or the local maxima of the proxy. The main difference between K-median clustering and K-means clustering is that their criteria in (3) differ: the former uses the Manhattan distance, while the latter uses the squared Euclidean distance. This difference prevents K-means clustering, in general, from returning the EMD-optimal sparse approximation.
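To make the distinction concrete, the following is a minimal weighted one-dimensional K-median sketch (not the algorithm listing from the paper): assignments use the Manhattan distance and each centroid is updated to the weighted median of its cluster, whereas K-means would instead use the weighted mean.

```python
import numpy as np

def weighted_median(points, weights):
    """Minimizer of sum_i w_i |p_i - m| over m: the weighted median."""
    order = np.argsort(points)
    p, w = points[order], weights[order]
    cdf = np.cumsum(w)
    return p[np.searchsorted(cdf, 0.5 * cdf[-1])]

def kmedian_1d(points, weights, K, iters=50, rng=None):
    """Lloyd-style alternation for weighted 1-D K-median clustering.

    Each centroid is the weighted median of its cluster, which is what
    distinguishes K-median (Manhattan criterion) from K-means."""
    rng = np.random.default_rng() if rng is None else rng
    centroids = rng.choice(points, size=K, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(points[:, None] - centroids[None, :]), axis=1)
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = weighted_median(points[labels == k], weights[labels == k])
    return np.sort(centroids)

# Toy proxy: grid locations weighted by proxy magnitudes; the K centroids
# play the role of the estimated parameters.
grid = np.linspace(0, 100, 201)
proxy = np.exp(-np.abs(grid - 20)) + 0.5 * np.exp(-np.abs(grid - 70))
print(kmedian_1d(grid, proxy, K=2))
```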

Our experiments have shown that our clustering-based algorithms for compressive parameter estimation can achieve the same performance as those based on band exclusion. Though both methods use additional parameters to improve performance, the clustering method is preferable, as it does not need to rely on parameter tuning in a noiseless setting or for simpler problems, while band exclusion is highly dependent on the allowed coherence in all cases. The threshold level is needed only in cases where the estimation problem is particularly ill-posed, the CS compression is aggressive, measurement noise is present, or the correlation function otherwise decays slowly. As shown in Figure 4, the clustering method without thresholding has the same performance as the band-exclusion method with optimally tuned maximum coherence. Interested readers can refer to [1], where we have further studied the different sensitivities of these two methods to their additional parameters.

Atomic norm minimization has recently been proposed to apply sparsity concepts to line spectral estimation for fully sampled and subsampled signals [44, 45], as well as to the super-resolution problem [46, 47]. Both our clustering method and atomic norm minimization exploit the concept of sparse representations for the signals of interest. In atomic norm minimization, the sparsity is enforced by minimizing the atomic norm of the recovered signal, while in our method a parametric dictionary is used to obtain a sparse coefficient vector. Additionally, it is not easy to extend atomic norm minimization to cases where the observations cannot be obtained via subsampling, since the equivalent semidefinite programming form of the atomic norm will not exist. In contrast, our PD-based compressive parameter estimation algorithm can be applied straightforwardly to these tasks by using a CS measurement matrix Φ containing only the rows of the identity matrix corresponding to the samples taken.
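For illustration, the sketch below constructs the two kinds of measurement matrices mentioned here: a row-subsampled identity when the observations are plain subsamples, and a dense random Gaussian matrix for more general measurements. The helper names are ours, not the paper's.

```python
import numpy as np

def subsampling_matrix(N, sample_idx):
    """CS matrix containing only the rows of the N x N identity that
    correspond to the time samples actually taken."""
    Phi = np.zeros((len(sample_idx), N))
    Phi[np.arange(len(sample_idx)), sample_idx] = 1.0
    return Phi

def gaussian_matrix(M, N, rng=None):
    """Dense random CS matrix for measurements that are not plain subsamples."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.standard_normal((M, N)) / np.sqrt(M)

# Subsampled observations: keep 40 of N = 100 samples at random.
rng = np.random.default_rng(1)
N = 100
idx = np.sort(rng.choice(N, size=40, replace=False))
Phi_sub = subsampling_matrix(N, idx)      # usable by the PD-based estimator
Phi_rand = gaussian_matrix(40, N, rng)    # also usable, unlike atomic-norm SDPs
```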

ACKNOWLEDGEMENTS

We thank Armin Eftekhari and Michael Wakin for helpful comments and for pointing us to [40].

APPENDIX A

PROOF OF THEOREM 1

We first consider the case where the two vectors $c$ and $\hat{c}$ have the same $\ell_1$ norm, so that the standard definition of the EMD applies. Let $f^* = [f^*_{11}, f^*_{12}, \ldots, f^*_{1K}, f^*_{21}, \ldots, f^*_{KK}]^T$ be the vector containing all $f^*_{ij}$ that solves the optimization problem (7), $g^*$ be the similarly-defined binary vector that solves the optimization problem (10), and $r = f^* - c_{\min} g^*$ be a flow residual. Similarly, let $d$ and $t$ be the similarly-defined vectors collecting all ground distances $d_{ij}$ and $t_{ij}$. Then from (10) and (7), we have
$$\mathrm{EMD}(c, \hat{c}) = d^T f^* = d^T (c_{\min} g^* + r) = c_{\min} d^T g^* + d^T r = \frac{c_{\min}}{\Delta}\, t^T g^* + d^T r = \frac{c_{\min}}{\Delta}\, \mathrm{PEE}(\theta, \hat{\theta}) + d^T r. \quad (31)$$


Note that the first term in (31) is the value of the objective function in the optimization problem (7) when all entries of both $c$ and $\hat{c}$ have magnitude $c_{\min}$. The second term corresponds to the contribution to the objective function due to magnitudes that are larger than $c_{\min}$. We now show that this second term is nonnegative. When the magnitude of the entry $\theta_i$ of $c$ increases from its baseline value of $c_{\min}$ to $c_i$, at least one of the outgoing flows $f_{ij}$ will need to increase. This implies that the corresponding flow $f^*_{ij} \ge c_{\min} g^*_{ij}$; thus, for such an increased flow, the residual $r_{ij} = f^*_{ij} - c_{\min} g^*_{ij} \ge 0$. Having shown that $r$ is nonnegative, we have that $d^T r \ge 0$. Then one can rewrite (31) as $\mathrm{EMD}(c, \hat{c}) \ge \frac{c_{\min}}{\Delta}\, \mathrm{PEE}(\theta, \hat{\theta})$, proving the theorem. The result is still valid when $\|c\|_1 \ne \|\hat{c}\|_1$, as the added mismatch penalty further increases the EMD value. $\blacksquare$
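As a small numerical sanity check of the bound above, the following sketch evaluates both sides of Theorem 1 on a toy example. It assumes the EMD uses grid-index ground distances and that PEE is the optimal one-to-one matching cost between true and estimated parameters, which matches the role PEE plays in (31); the formal definitions appear earlier in the paper, and the specific values below are arbitrary.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

Delta = 0.1                                  # parameter grid step
theta_grid = np.arange(0, 10, Delta)         # sampled parameter space
N = theta_grid.size

# True and estimated K-sparse PD coefficient vectors with equal l1 norm.
c = np.zeros(N);     c[[10, 30]] = [1.0, 2.0]
c_hat = np.zeros(N); c_hat[[12, 29]] = [1.5, 1.5]
c_min = min(c[c > 0].min(), c_hat[c_hat > 0].min())

# 1-D EMD with grid-index ground distance: the l1 distance between CDFs.
emd = np.abs(np.cumsum(c) - np.cumsum(c_hat)).sum()

# Parameter estimation error: optimal one-to-one matching of the supports.
theta, theta_hat = theta_grid[c > 0], theta_grid[c_hat > 0]
cost = np.abs(theta[:, None] - theta_hat[None, :])
rows, cols = linear_sum_assignment(cost)
pee = cost[rows, cols].sum()

print(emd, (c_min / Delta) * pee)            # Theorem 1: emd >= (c_min/Delta) * pee
assert emd >= (c_min / Delta) * pee - 1e-9
```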

APPENDIX B

PROOF OF THEOREM 2

Asymptotically, when the sampling step of the parameter $\Delta \to 0$, the proxy defined in (16) becomes a continuous function such that
$$v(\omega) = \sum_{i=1}^{K} c_i\, \lambda(\omega - \theta_i) \quad (32)$$
for all $\omega \in \Theta$. In addition, the balanced weight property around the cluster centroid $\hat{\theta}_i$, as defined in (6), reduces to the equality
$$\int_{p \in C_i,\, p \le \hat{\theta}_i} w(p)\, dp = \int_{p \in C_i,\, p \ge \hat{\theta}_i} w(p)\, dp, \quad (33)$$
where $p$ is the position function and $w$ is the weight function. Additionally, the cumulative correlation function in (14) converges to the integral
$$\Lambda(\theta) = \int_{-\infty}^{\theta} \lambda(\omega)\, d\omega. \quad (34)$$
Without loss of generality, if $\theta_{\min} = \min(\Theta)$ and $\theta_{\max} = \max(\Theta)$, assume that the parameter values are sorted so that
$$\theta_{\min} + \epsilon \le \theta_1 < \theta_2 < \cdots < \theta_K \le \theta_{\max} - \epsilon. \quad (35)$$
When the entries of the proxy $v$ are clustered into $K$ groups according to the centroids $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_K$, as shown in Algorithm ??, the point $(\hat{\theta}_i + \hat{\theta}_{i+1})/2$ is the upper bound for cluster $i$ and the lower bound for cluster $i+1$, since it has the same distance to both centroids. We will show how large the minimum separation $\zeta$ and minimum off-bound distance $\epsilon$ need to be so that the maximum estimation error is $e$, i.e., $\max_k |\hat{\theta}_k - \theta_k| = e$.

We first consider the cases $2 \le k \le K-1$: the $k$-th cluster with centroid $\hat{\theta}_k$ includes the parameter range $\left[\frac{\hat{\theta}_{k-1}+\hat{\theta}_k}{2}, \frac{\hat{\theta}_k+\hat{\theta}_{k+1}}{2}\right]$. According to the weight balance property (33), we need that the proxy function in (32) have the same sum over the ranges $\left[\frac{\hat{\theta}_{k-1}+\hat{\theta}_k}{2}, \hat{\theta}_k\right]$ and $\left[\hat{\theta}_k, \frac{\hat{\theta}_k+\hat{\theta}_{k+1}}{2}\right]$, i.e.,
$$\int_{\frac{\hat{\theta}_{k-1}+\hat{\theta}_k}{2}}^{\hat{\theta}_k} v(\omega)\, d\omega = \int_{\hat{\theta}_k}^{\frac{\hat{\theta}_k+\hat{\theta}_{k+1}}{2}} v(\omega)\, d\omega,$$
$$2\int_{-\infty}^{\hat{\theta}_k} v(\omega)\, d\omega = \int_{-\infty}^{\frac{\hat{\theta}_{k-1}+\hat{\theta}_k}{2}} v(\omega)\, d\omega + \int_{-\infty}^{\frac{\hat{\theta}_k+\hat{\theta}_{k+1}}{2}} v(\omega)\, d\omega,$$
$$2\sum_{i=1}^{K} c_i \Lambda(\hat{\theta}_k - \theta_i) = \sum_{i=1}^{K} c_i \Lambda\!\left(\frac{\hat{\theta}_{k-1}+\hat{\theta}_k}{2} - \theta_i\right) + \sum_{i=1}^{K} c_i \Lambda\!\left(\frac{\hat{\theta}_k+\hat{\theta}_{k+1}}{2} - \theta_i\right),$$
$$2 c_k \Lambda(\hat{\theta}_k - \theta_k) = \sum_{i=1}^{K} c_i \Lambda\!\left(\frac{\hat{\theta}_{k-1}+\hat{\theta}_k}{2} - \theta_i\right) + \sum_{i=1}^{K} c_i \Lambda\!\left(\frac{\hat{\theta}_k+\hat{\theta}_{k+1}}{2} - \theta_i\right) - 2\sum_{i \ne k} c_i \Lambda(\hat{\theta}_k - \theta_i). \quad (36)$$

Since $\hat{\theta}_k - \theta_k \ge -e$ and $\theta_{k+1} - \theta_k \ge \zeta$ for $k = 2, 3, \cdots, K-1$, we obtain a lower bound on the left hand side of (36) by repeatedly using the fact that $\Lambda(\omega)$ is nondecreasing:
$$\begin{aligned}
&\sum_{i=1}^{K} c_i \Lambda\!\left(\tfrac{\hat{\theta}_{k-1}+\hat{\theta}_k}{2} - \theta_i\right) + \sum_{i=1}^{K} c_i \Lambda\!\left(\tfrac{\hat{\theta}_k+\hat{\theta}_{k+1}}{2} - \theta_i\right) - 2\sum_{i \ne k} c_i \Lambda(\hat{\theta}_k - \theta_i)\\
&\ge \sum_{i=1}^{k-1} c_i \Lambda\!\left(\tfrac{\hat{\theta}_{k-1}+\hat{\theta}_k}{2} - \theta_{k-1}\right) + \sum_{i=k}^{K} c_i \Lambda\!\left(\tfrac{\hat{\theta}_{k-1}+\hat{\theta}_k}{2} - \theta_K\right) + \sum_{i=1}^{k} c_i \Lambda\!\left(\tfrac{\hat{\theta}_k+\hat{\theta}_{k+1}}{2} - \theta_k\right)\\
&\quad + \sum_{i=k+1}^{K} c_i \Lambda\!\left(\tfrac{\hat{\theta}_k+\hat{\theta}_{k+1}}{2} - \theta_K\right) - 2\sum_{i=1}^{k-1} c_i \Lambda(\hat{\theta}_k - \theta_1) - 2\sum_{i=k+1}^{K} c_i \Lambda(\hat{\theta}_k - \theta_{k+1})\\
&\ge \sum_{i=1}^{k-1} c_i \Lambda\!\left(\tfrac{(\hat{\theta}_{k-1}-\theta_{k-1}) + (\hat{\theta}_k-\theta_k) + (\theta_k-\theta_{k-1})}{2}\right) + \sum_{i=k}^{K} c_i \Lambda(-\infty) + \sum_{i=1}^{k} c_i \Lambda\!\left(\tfrac{(\hat{\theta}_k-\theta_k) + (\hat{\theta}_{k+1}-\theta_{k+1}) + (\theta_{k+1}-\theta_k)}{2}\right)\\
&\quad + \sum_{i=k+1}^{K} c_i \Lambda(-\infty) - 2\sum_{i=1}^{k-1} c_i \Lambda(\infty) - 2\sum_{i=k+1}^{K} c_i \Lambda\big((\hat{\theta}_k-\theta_k) + (\theta_k-\theta_{k+1})\big)\\
&\ge \sum_{i=1}^{k-1} c_i \Lambda\!\left(\tfrac{\zeta}{2} - e\right) + \sum_{i=1}^{k} c_i \Lambda\!\left(\tfrac{\zeta}{2} - e\right) - 2\sum_{i=1}^{k-1} c_i \Lambda(\infty) - 2\sum_{i=k+1}^{K} c_i \Lambda\!\left(-\tfrac{\zeta}{2} + e\right)\\
&\ge \sum_{i=1}^{k-1} c_i \Lambda\!\left(\tfrac{\zeta}{2} - e\right) + \sum_{i=1}^{k} c_i \Lambda\!\left(\tfrac{\zeta}{2} - e\right) - 2\sum_{i=1}^{k-1} c_i \Lambda(\infty) - 2\sum_{i=k+1}^{K} c_i \left(\Lambda(\infty) - \Lambda\!\left(\tfrac{\zeta}{2} - e\right)\right)\\
&\ge -2\sum_{i \ne k} \frac{c_i}{c_k}\, c_k \left(\Lambda(\infty) - \Lambda\!\left(\tfrac{\zeta}{2} - e\right)\right) + c_k \Lambda\!\left(\tfrac{\zeta}{2} - e\right) \ge -2(K-1)\, r\, c_k \left(\Lambda(\infty) - \Lambda\!\left(\tfrac{\zeta}{2} - e\right)\right) + c_k \Lambda\!\left(\tfrac{\zeta}{2} - e\right)\\
&\ge (2(K-1)r + 1)\, c_k \Lambda\!\left(\tfrac{\zeta}{2} - e\right) - 2(K-1)\, r\, c_k \Lambda(\infty).
\end{aligned}$$
Plugging in this lower bound, we have that
$$2\Lambda(\hat{\theta}_k - \theta_k) \ge (2(K-1)r + 1)\,\Lambda\!\left(\tfrac{\zeta}{2} - e\right) - 2(K-1)\, r\,\Lambda(\infty). \quad (37)$$
Similarly, the upper bound on the left hand side of (36) is
$$2\Lambda(\hat{\theta}_k - \theta_k) \le (2(K-1)r + 2)\,\Lambda(\infty) - (2(K-1)r + 1)\,\Lambda\!\left(\tfrac{\zeta}{2} - e\right). \quad (38)$$


If $\zeta$ satisfies
$$\Lambda\!\left(\frac{\zeta}{2} - e\right) \ge \Lambda(\infty)\left(1 - \frac{\Lambda(\sigma)/\Lambda(0) - 1}{2(K-1)r + 1}\right), \quad (39)$$
it is easy to verify that
$$2\Lambda(\hat{\theta}_k - \theta_k) \ge (2(K-1)r + 1)\,\Lambda\!\left(\tfrac{\zeta}{2} - e\right) - 2(K-1)\, r\,\Lambda(\infty) \ge (2(K-1)r + 1)\,\Lambda(\infty) - 2\Lambda(\sigma) + \Lambda(\infty) - 2(K-1)\, r\,\Lambda(\infty) \ge 2\Lambda(\infty) - 2\Lambda(\sigma) \ge 2\Lambda(-\sigma), \quad (40)$$
and
$$2\Lambda(\hat{\theta}_k - \theta_k) \le (2(K-1)r + 2)\,\Lambda(\infty) - (2(K-1)r + 1)\,\Lambda\!\left(\tfrac{\zeta}{2} - e\right) \le (2(K-1)r + 2)\,\Lambda(\infty) - (2(K-1)r + 1)\,\Lambda(\infty) + 2\Lambda(\sigma) - \Lambda(\infty) = 2\Lambda(\sigma), \quad (41)$$
which imply that $-\sigma \le \hat{\theta}_k - \theta_k \le \sigma$ for $k = 2, 3, \cdots, K-1$.

Next, we consider the first cluster with centroid $\hat{\theta}_1$, which includes the parameter range $\left[\theta_{\min}, (\hat{\theta}_1 + \hat{\theta}_2)/2\right]$. From the weight balance property, we have
$$\int_{\theta_{\min}}^{\hat{\theta}_1} v(\omega)\, d\omega = \int_{\hat{\theta}_1}^{\frac{\hat{\theta}_1+\hat{\theta}_2}{2}} v(\omega)\, d\omega,$$
$$2\int_{-\infty}^{\hat{\theta}_1} v(\omega)\, d\omega = \int_{-\infty}^{\theta_{\min}} v(\omega)\, d\omega + \int_{-\infty}^{\frac{\hat{\theta}_1+\hat{\theta}_2}{2}} v(\omega)\, d\omega,$$
$$2\sum_{i=1}^{K} c_i \Lambda(\hat{\theta}_1 - \theta_i) = \sum_{i=1}^{K} c_i \Lambda(\theta_{\min} - \theta_i) + \sum_{i=1}^{K} c_i \Lambda\!\left(\frac{\hat{\theta}_1+\hat{\theta}_2}{2} - \theta_i\right),$$
$$2 c_1 \Lambda(\hat{\theta}_1 - \theta_1) = \sum_{i=1}^{K} c_i \Lambda(\theta_{\min} - \theta_i) + \sum_{i=1}^{K} c_i \Lambda\!\left(\frac{\hat{\theta}_1+\hat{\theta}_2}{2} - \theta_i\right) - 2\sum_{i=2}^{K} c_i \Lambda(\hat{\theta}_1 - \theta_i). \quad (42)$$

If $\epsilon$ satisfies
$$\Lambda(\epsilon) \ge \Lambda(\infty)\left(1 - \frac{\Lambda(\sigma)/\Lambda(0) - 1}{2Kr}\right), \quad (43)$$

then we have the following result from (42):
$$\begin{aligned}
2\Lambda(\hat{\theta}_1 - \theta_1) &\le \sum_{i=1}^{K} \frac{c_i}{c_1} \Lambda(\theta_{\min} - \theta_i) + \sum_{i=1}^{K} \frac{c_i}{c_1} \Lambda\!\left(\frac{\hat{\theta}_1+\hat{\theta}_2}{2} - \theta_i\right) - 2\sum_{i=2}^{K} \frac{c_i}{c_1} \Lambda(\hat{\theta}_1 - \theta_i)\\
&\le \sum_{i=1}^{K} \frac{c_i}{c_1} \Lambda(\theta_{\min} - \theta_1) + \Lambda\!\left(\frac{\hat{\theta}_1+\hat{\theta}_2}{2} - \theta_1\right) + \sum_{i=2}^{K} \frac{c_i}{c_1} \Lambda\!\left(\frac{\hat{\theta}_1+\hat{\theta}_2}{2} - \theta_2\right) - 2\sum_{i=2}^{K} \frac{c_i}{c_1} \Lambda(\hat{\theta}_1 - \theta_K)\\
&\le Kr\,\Lambda(-\epsilon) + \Lambda(\infty) + (K-1)\, r\,\Lambda\!\left(-\frac{\zeta}{2} + e\right)\\
&\le Kr\left(\Lambda(\infty) - \Lambda(\epsilon)\right) + \Lambda(\infty) + (K-1)\, r\left(\Lambda(\infty) - \Lambda\!\left(\frac{\zeta}{2} - e\right)\right)\\
&\le Kr\,\frac{2\Lambda(\sigma) - \Lambda(\infty)}{2Kr} + \Lambda(\infty) + (K-1)\, r\,\frac{2\Lambda(\sigma) - \Lambda(\infty)}{2(K-1)r + 1} \le 2\Lambda(\sigma).
\end{aligned}$$


It can be similarly shown for the last cluster centroid that $2\Lambda(\hat{\theta}_K - \theta_K) \ge 2\Lambda(-\sigma)$. In summary, when all estimation errors are smaller than $\sigma$, it is straightforward to replace the $e$ in (39) by $\sigma$ to obtain the expected condition on $\zeta$:
$$\Lambda\!\left(\frac{\zeta}{2} - \sigma\right) \ge \Lambda(\infty)\left(1 - \frac{\Lambda(\sigma)/\Lambda(0) - 1}{2(K-1)r + 1}\right). \quad (44)$$

APPENDIX C

PROOF OF THEOREM 3

When the redefined correlation function is $\lambda_\Phi(\omega) = \exp(-a|\omega|)$, the proxy function given in (24) is
$$v(\omega) = \sum_{i=1}^{K} c_i \exp(-a|\omega - \theta_i|). \quad (45)$$
Without loss of generality, assume that the parameter values are sorted so that $\theta_1 < \theta_2 < \cdots < \theta_K$ and that all component magnitudes are no smaller than 1.

Assume that only the proxy in the parameter range $[l_j, u_j]$ around each $\theta_j$ will be preserved after thresholding with level $t$. We have $l_1 < \theta_1 < u_1 < \cdots < \theta_{j-1} < u_{j-1} < l_j < \theta_j < u_j < \cdots < \theta_K$. Thus the proxy at $\omega = u_j$ has value equal to the threshold, i.e.,

$$\begin{aligned}
\frac{t}{c_j} = \frac{v(u_j)}{c_j} &= \sum_{i=1}^{K} \frac{c_i}{c_j} \exp(-a|u_j - \theta_i|),\\
\frac{t}{c_j} &= \sum_{i=1}^{j-1} \frac{c_i}{c_j} \exp(-a(u_j - \theta_i)) + \exp(-a(u_j - \theta_j)) + \sum_{i=j+1}^{K} \frac{c_i}{c_j} \exp(-a(\theta_i - u_j)),\\
\frac{t}{c_j} &= \exp(-a(u_j - \theta_j)) \sum_{i=1}^{j-1} \frac{c_i}{c_j} \exp(-a(\theta_j - \theta_i)) + \exp(-a(u_j - \theta_j)) + \exp(-a(\theta_j - u_j)) \sum_{i=j+1}^{K} \frac{c_i}{c_j} \exp(-a(\theta_i - \theta_j)),\\
T_j &= A_j \frac{1}{U_j} + \frac{1}{U_j} + B_j U_j, \quad (46)
\end{aligned}$$
where $U_j = \exp(a(u_j - \theta_j)) > 1$,
$$0 \le A_j = \sum_{i=1}^{j-1} \frac{c_i}{c_j} \exp(-a(\theta_j - \theta_i)) \le r \sum_{i=1}^{j-1} \exp(-a i \zeta) \le r\,\frac{1 - \exp(-a\zeta j)}{\exp(a\zeta) - 1} \le \frac{r}{\exp(a\zeta) - 1},$$
$$0 \le B_j = \sum_{i=j+1}^{K} \frac{c_i}{c_j} \exp(-a(\theta_i - \theta_j)) \le r \sum_{i=1}^{K-j} \exp(-a i \zeta) \le r\,\frac{1 - \exp(-a\zeta(K-j+1))}{\exp(a\zeta) - 1} \le \frac{r}{\exp(a\zeta) - 1},$$
and $T_j = t/c_j$. $A_j$ and $B_j$ satisfy
$$\max\big((1 + A_j) B_j,\; A_j (1 + B_j)\big) \le \left(1 + \frac{r}{\exp(a\zeta) - 1}\right) \frac{r}{\exp(a\zeta) - 1} \le \frac{2 r^2}{(\exp(a\zeta) - 1)^2}. \quad (47)$$


One solution of the quadratic equation (46) is
$$U_j = \frac{T_j - \sqrt{T_j^2 - 4(1 + A_j) B_j}}{2 B_j}. \quad (48)$$
In this solution we can see that $U_j$ decreases as $T_j$ increases. The alternative solution is omitted, since in it $U_j$ would increase as $T_j$ increases.

Similarly,
$$\begin{aligned}
\frac{t}{c_j} = \frac{v(l_j)}{c_j} &= \sum_{i=1}^{K} \frac{c_i}{c_j} \exp(-a|l_j - \theta_i|),\\
\frac{t}{c_j} &= \sum_{i=1}^{j-1} \frac{c_i}{c_j} \exp(-a(l_j - \theta_i)) + \exp(-a(\theta_j - l_j)) + \sum_{i=j+1}^{K} \frac{c_i}{c_j} \exp(-a(\theta_i - l_j)),\\
\frac{t}{c_j} &= \exp(-a(l_j - \theta_j)) \sum_{i=1}^{j-1} \frac{c_i}{c_j} \exp(-a(\theta_j - \theta_i)) + \exp(-a(\theta_j - l_j)) + \exp(-a(\theta_j - l_j)) \sum_{i=j+1}^{K} \frac{c_i}{c_j} \exp(-a(\theta_i - \theta_j)),\\
T_j &= A_j L_j + \frac{1}{L_j} + B_j \frac{1}{L_j}, \quad (49)
\end{aligned}$$
where $L_j = \exp(a(\theta_j - l_j)) > 1$, and the solution is
$$L_j = \frac{T_j - \sqrt{T_j^2 - 4 A_j (1 + B_j)}}{2 A_j}. \quad (50)$$

For the solutions (48) and (50) to be real-valued, $T_j$ (equivalently, $t$) should be large enough. Only if
$$t \ge \frac{2 c_{\max} \sqrt{r + r^2}}{\exp(a\zeta) - 1} \quad (51)$$
can we have the relationship
$$T_j^2 \ge \left(\frac{t}{c_j}\right)^2 \ge \left(\frac{t}{c_{\max}}\right)^2 \ge \frac{4r}{\exp(a\zeta) - 1} + \frac{4r^2}{(\exp(a\zeta) - 1)^2} \ge \max\big(4 A_j + 4 A_j B_j,\; 4 B_j + 4 A_j B_j\big), \quad (52)$$
so that both (48) and (50) are real-valued.

Let $\hat{\theta}_j \in [l_j, u_j]$ be the estimated parameter for $\theta_j$. Asymptotically, when the sampling step of the parameter space $\Delta$ goes to zero, the balanced weight property implies
$$\int_{l_j}^{\hat{\theta}_j} \sum_{i=1}^{K} c_i \exp(-a|\omega - \theta_i|)\, d\omega = \int_{\hat{\theta}_j}^{u_j} \sum_{i=1}^{K} c_i \exp(-a|\omega - \theta_i|)\, d\omega. \quad (53)$$


When $\hat{\theta}_j \le \theta_j$, we have
$$\begin{aligned}
\frac{a}{c_j} \int_{l_j}^{\hat{\theta}_j} \sum_{i=1}^{K} c_i \exp(-a|\omega - \theta_i|)\, d\omega
&= a \sum_{i=1}^{j-1} \frac{c_i}{c_j} \int_{l_j}^{\hat{\theta}_j} \exp(-a(\omega - \theta_i))\, d\omega + a \int_{l_j}^{\hat{\theta}_j} \exp(-a(\theta_j - \omega))\, d\omega + a \sum_{i=j+1}^{K} \frac{c_i}{c_j} \int_{l_j}^{\hat{\theta}_j} \exp(-a(\theta_i - \omega))\, d\omega\\
&= a A_j \int_{l_j}^{\hat{\theta}_j} \exp(-a(\omega - \theta_j))\, d\omega + a \int_{l_j}^{\hat{\theta}_j} \exp(-a(\theta_j - \omega))\, d\omega + a B_j \int_{l_j}^{\hat{\theta}_j} \exp(-a(\theta_j - \omega))\, d\omega\\
&= A_j (L_j - E_j) + (B_j + 1)\left(\frac{1}{E_j} - \frac{1}{L_j}\right)\\
&= -A_j E_j + B_j \frac{1}{E_j} + \frac{1}{E_j} + A_j L_j - \frac{1}{L_j} - B_j \frac{1}{L_j},
\end{aligned} \quad (54)$$
where $E_j = 1/\lambda(\hat{\theta}_j - \theta_j) = \exp(a|\hat{\theta}_j - \theta_j|) = \exp(a(\theta_j - \hat{\theta}_j)) \ge 1$, and

$$\begin{aligned}
\frac{a}{c_j} \int_{\hat{\theta}_j}^{u_j} \sum_{i=1}^{K} c_i \exp(-a|\omega - \theta_i|)\, d\omega
&= a \sum_{i=1}^{j-1} \frac{c_i}{c_j} \int_{\hat{\theta}_j}^{u_j} \exp(-a(\omega - \theta_i))\, d\omega + a \int_{\hat{\theta}_j}^{u_j} \exp(-a|\omega - \theta_j|)\, d\omega + a \sum_{i=j+1}^{K} \frac{c_i}{c_j} \int_{\hat{\theta}_j}^{u_j} \exp(-a(\theta_i - \omega))\, d\omega\\
&= a \sum_{i=1}^{j-1} \frac{c_i}{c_j} \int_{\hat{\theta}_j}^{u_j} \exp(-a(\omega - \theta_i))\, d\omega + a \int_{\hat{\theta}_j}^{\theta_j} \exp(-a(\theta_j - \omega))\, d\omega + a \int_{\theta_j}^{u_j} \exp(-a(\omega - \theta_j))\, d\omega\\
&\quad + a \sum_{i=j+1}^{K} \frac{c_i}{c_j} \int_{\hat{\theta}_j}^{u_j} \exp(-a(\theta_i - \omega))\, d\omega\\
&= A_j \left(E_j - \frac{1}{U_j}\right) + 1 - \frac{1}{E_j} + 1 - \frac{1}{U_j} + B_j \left(U_j - \frac{1}{E_j}\right)\\
&= A_j E_j - B_j \frac{1}{E_j} + 2 - \frac{1}{E_j} - A_j \frac{1}{U_j} - \frac{1}{U_j} + B_j U_j.
\end{aligned} \quad (55)$$

After plugging (54) and (55) into (53), moving all terms with $L_j$ or $U_j$ to the right side and moving the other terms to the left side, we obtain
$$\begin{aligned}
2 A_j E_j - 2 B_j \frac{1}{E_j} + 2 - \frac{2}{E_j}
&= A_j L_j - \frac{1}{L_j} - B_j \frac{1}{L_j} + A_j \frac{1}{U_j} + \frac{1}{U_j} - B_j U_j\\
&= A_j L_j + A_j L_j - T_j + T_j - B_j U_j - B_j U_j\\
&= 2 A_j L_j - 2 B_j U_j\\
&= \left(T_j - \sqrt{T_j^2 - 4 A_j (1 + B_j)}\right) - \left(T_j - \sqrt{T_j^2 - 4 (1 + A_j) B_j}\right)\\
&= \sqrt{T_j^2 - 4 (1 + A_j) B_j} - \sqrt{T_j^2 - 4 A_j (1 + B_j)}\\
&= \frac{\left(T_j^2 - 4 (1 + A_j) B_j\right) - \left(T_j^2 - 4 A_j (1 + B_j)\right)}{\sqrt{T_j^2 - 4 A_j (1 + B_j)} + \sqrt{T_j^2 - 4 (1 + A_j) B_j}}\\
&= \frac{4 (A_j - B_j)}{\sqrt{T_j^2 - 4 A_j (1 + B_j)} + \sqrt{T_j^2 - 4 (1 + A_j) B_j}}. \quad (56)
\end{aligned}$$

One can show a similar result when $\hat{\theta}_j > \theta_j$ and $E_j = \exp(a|\hat{\theta}_j - \theta_j|) = \exp(a(\hat{\theta}_j - \theta_j))$, so that (53) reduces to
$$\frac{A_j - B_j}{S_j} = \begin{cases} A_j E_j - B_j \dfrac{1}{E_j} + 1 - \dfrac{1}{E_j} & \text{if } \hat{\theta}_j \le \theta_j,\\[2mm] A_j \dfrac{1}{E_j} - B_j E_j - 1 + \dfrac{1}{E_j} & \text{if } \hat{\theta}_j > \theta_j, \end{cases} \quad (57)$$

where
$$S_j = \frac{1}{2}\left(\sqrt{T_j^2 - 4 A_j (1 + B_j)} + \sqrt{T_j^2 - 4 (1 + A_j) B_j}\right) \le T_j \le \frac{t}{c_{\min}} \le 1 \le E_j, \quad (58)$$
and
$$S_j = \frac{1}{2}\left(\sqrt{T_j^2 - 4 A_j (1 + B_j)} + \sqrt{T_j^2 - 4 (1 + A_j) B_j}\right) \ge \sqrt{\left(\frac{t}{r c_{\min}}\right)^2 - 8\left(\frac{r}{\exp(a\zeta) - 1}\right)^2} \quad (59)$$
by the relationship (47). We now show that if
$$S_j \ge \sqrt{\left(\frac{t}{r c_{\min}}\right)^2 - 8\left(\frac{r}{\exp(a\zeta) - 1}\right)^2} \ge \exp(-a\sigma), \quad (60)$$
or, equivalently,
$$\zeta \ge \frac{1}{a} \ln\left(\sqrt{\frac{8 r^2}{t^2/(r c_{\min})^2 - \exp(-2 a \sigma)}} + 1\right), \quad (61)$$
then, as expected, the estimation error is small.

When $\hat{\theta}_j \le \theta_j$,
$$\frac{A_j - B_j}{S_j} = A_j E_j - B_j \frac{1}{E_j} + 1 - \frac{1}{E_j} \ge \frac{A_j - B_j}{E_j}, \quad (62)$$
which requires $A_j \ge B_j$ due to the fact that $S_j \le E_j$. So
$$A_j E_j - B_j \frac{1}{E_j} + 1 - \frac{1}{E_j} = \frac{A_j - B_j}{S_j} \le (A_j - B_j) \exp(a\sigma) \le A_j \exp(a\sigma) - B_j \exp(-a\sigma) + 1 - \exp(-a\sigma), \quad (63)$$
since $\exp(-a\sigma) \le 1 \le \exp(a\sigma)$. This implies that $E_j \le \exp(a\sigma)$ and $\theta_j - \hat{\theta}_j \le \sigma$. Similarly, when $\hat{\theta}_j > \theta_j$, we obtain $A_j \le B_j$ and $\hat{\theta}_j - \theta_j \le \sigma$, so that $|\hat{\theta}_j - \theta_j| \le \sigma$ in both cases. $\blacksquare$


REFERENCES

[1] D. Mo and M. F. Duarte, “Compressive Parameter Estimation with Earth Mover’s Distance via K-Median Clustering,” in Proc. SPIE Wavelets and Sparsity XV, vol. 8858, San Diego, CA, Aug. 2013.
[2] R. G. Baraniuk, “Compressive Sensing,” IEEE Signal Proc. Mag., vol. 24, no. 4, pp. 118–121, Jul. 2007.
[3] E. J. Candes, “Compressive Sampling,” in Proc. Int. Congr. Math. (ICM), vol. 3, Madrid, Spain, Aug. 2006, pp. 1433–1452.
[4] D. L. Donoho, “Compressed Sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[5] V. Cevher, A. C. Gurbuz, J. H. McClellan, and R. Chellappa, “Compressive Wireless Arrays for Bearing Estimation,” in IEEE Int. Conf. Acoustics, Speech and Signal Proc. (ICASSP), Las Vegas, NV, Apr. 2008, pp. 2497–2500.
[6] M. F. Duarte, “Localization and Bearing Estimation via Structured Sparsity Models,” in IEEE Stat. Signal Proc. Workshop (SSP), Ann Arbor, MI, Aug. 2012, pp. 333–336.
[7] M. A. Herman and T. Strohmer, “High Resolution Radar via Compressed Sensing,” IEEE Trans. Inf. Theory, vol. 57, no. 6, pp. 2275–2284, Jun. 2009.
[8] H. J. Rad and G. Leus, “Sparsity-Aware TDOA Localization of Multiple Sources,” IEEE Trans. Signal Proc., vol. 61, no. 19, pp. 4021–4025, Oct. 2013.
[9] S. Sen, G. Tang, and A. Nehorai, “Multiobjective Optimization of OFDM Radar Waveform for Target Detection,” IEEE Trans. Inf. Theory, vol. 59, no. 2, pp. 639–652, Feb. 2011.
[10] I. Stojanovic, M. Cetin, and W. C. Karl, “Compressed Sensing of Monostatic and Multistatic SAR,” IEEE Geosci. Remote Sens. Lett., vol. 10, no. 6, pp. 1444–1448, 2013.
[11] A. Eftekhari, J. Romberg, and M. B. Wakin, “Matched Filtering From Limited Frequency Samples,” IEEE Trans. Inf. Theory, vol. 59, no. 6, pp. 3475–3496, Jun. 2013.
[12] K. Fyhn, M. F. Duarte, and S. H. Jensen, “Compressive Time Delay Estimation Using Interpolation,” in IEEE Global Conf. Signal and Info. Processing (GlobalSIP), Austin, TX, Dec. 2013, pp. 624–624.

[13] C. D. Austin, R. L. Moses, J. N. Ash, and E. Ertin, “On the Relation Between Sparse Reconstruction and Parameter Estimation With Model Order Selection,” IEEE J. Sel. Topics in Signal Proc., vol. 4, no. 3, pp. 560–570, Jun. 2010.
[14] S. Bourguignon, H. Carfantan, and J. Idier, “A Sparsity-Based Method for the Estimation of Spectral Lines From Irregularly Sampled Data,” IEEE J. Sel. Topics in Signal Proc., vol. 1, no. 4, pp. 575–585, Dec. 2007.
[15] M. F. Duarte and R. G. Baraniuk, “Spectral Compressive Sensing,” Appl. and Comput. Harmonic Anal., vol. 35, no. 1, pp. 111–129, Jan. 2013.
[16] K. Fyhn, H. Dadkhahi, and M. F. Duarte, “Spectral Compressive Sensing with Polar Interpolation,” in IEEE Int. Conf. Acoustics, Speech and Signal Proc. (ICASSP), Vancouver, Canada, May 2013, pp. 6225–6229.
[17] A. Fannjiang and W. Liao, “Coherence Pattern-Guided Compressive Sensing with Unresolved Grids,” SIAM J. Imaging Sci., vol. 5, no. 1, pp. 179–202, Feb. 2012.
[18] Y. Rubner, C. Tomasi, and L. J. Guibas, “A Metric for Distributions with Applications to Image Databases,” in Int. Conf. Comput. Vision (ICCV), Bombay, India, Jan. 1998, pp. 59–66.
[19] R. Gupta, P. Indyk, and E. Price, “Sparse Recovery for Earth Mover Distance,” in 48th Ann. Allerton Conf. Commun., Control, and Computing, Monticello, IL, Sep. 2010, pp. 1742–1744.
[20] P. Indyk and E. Price, “K-Median Clustering, Model-Based Compressive Sensing, and Sparse Recovery for Earth Mover Distance,” in ACM Symp. Theory of Computing (STOC), San Jose, CA, Jun. 2011, pp. 627–636.
[21] D. Mo and M. F. Duarte, “Performance of Compressive Parameter Estimation with Earth Mover’s Distance via K-Median Clustering,” University of Massachusetts, Amherst, MA, Tech. Rep., Dec. 2014, available at http://arxiv.org/pdf/1412.6724.
[22] E. J. Candes and T. Tao, “Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?” IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[23] R. G. Baraniuk, M. Davenport, R. DeVore, and M. B. Wakin, “A Simple Proof of the Restricted Isometry Property for Random Matrices,” Constructive Approximation, vol. 28, no. 3, pp. 253–263, Dec. 2008.
[24] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic Decomposition by Basis Pursuit,” SIAM J. Sci. Computing, vol. 43, no. 1, pp. 129–159, Jan. 2001.
[25] T. Blumensath and M. E. Davies, “Iterative Thresholding for Sparse Approximations,” J. Fourier Analysis and Applicat., vol. 14, no. 5, pp. 629–654, Dec. 2008.

[26] D. Needell and J. A. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Appl. and Comput. Harmonic Anal., vol. 26, no. 3, pp. 301–321, May 2009.
[27] W. Dai and O. Milenkovic, “Subspace Pursuit for Compressive Sensing Signal Reconstruction,” IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2230–2249, May 2009.
[28] S. G. Mallat and Z. Zhang, “Matching Pursuits With Time-Frequency Dictionaries,” IEEE Trans. Inf. Theory, vol. 41, no. 12, pp. 3397–3415, Dec. 1993.
[29] J. A. Tropp and A. C. Gilbert, “Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit,” IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.
[30] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, “Model-Based Compressive Sensing,” IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1982–2001, Apr. 2010.
[31] T. Blumensath and M. E. Davies, “Sampling Theorems for Signals From the Union of Finite-Dimensional Linear Subspaces,” IEEE Trans. Inf. Theory, vol. 55, no. 4, pp. 1872–1882, Apr. 2009.
[32] D. L. Donoho and M. Elad, “Optimally Sparse Representation in General (Non-Orthogonal) Dictionaries via ℓ1 Minimization,” in Proc. Natl. Acad. Sci. U.S.A., vol. 100, no. 5, Mar. 2003, pp. 2197–2202.
[33] J. A. Tropp, “Greed is Good: Algorithmic Results for Sparse Approximation,” IEEE Trans. Inf. Theory, vol. 50, no. 10, pp. 2231–2242, Oct. 2004.
[34] H. Rauhut, K. Schnass, and P. Vandergheynst, “Compressed Sensing and Redundant Dictionaries,” IEEE Trans. Inf. Theory, vol. 54, pp. 2210–2219, May 2008.
[35] O. Pele and M. Werman, “Fast and Robust Earth Mover’s Distance,” in Int. Conf. Comput. Vision (ICCV), Kyoto, Japan, Sep. 2009, pp. 460–467.
[36] P. S. Bradley, O. L. Mangasarian, and W. N. Street, “Clustering via Concave Minimization,” in Advances in Neural Inf. Proc. Systems (NIPS), vol. 9, Denver, CO, Dec. 1996, pp. 368–374.
[37] P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, 2nd ed. Pearson Education, Limited, Dec. 2014.
[38] A. Schrijver, Theory of Linear and Integer Programming. New York, NY: John Wiley & Sons, 1998.
[39] S. S. Vempala, The Random Projection Method. American Math. Soc., 2005.
[40] A. Eftekhari and M. B. Wakin, “New Analysis of Manifold Embeddings and Signal Recovery from Compressive Measurements,” Appl. and Comput. Harmonic Anal., vol. 39, no. 1, pp. 67–109, Jul. 2014.
[41] G. Tang and B. Recht, “Atomic Decomposition of Mixtures of Translation-Invariant Signals,” in Comput. Advances in Multi-Sensor Adaptive Proc. (CAMSAP), Saint Martin, France, Dec. 2013.
[42] C. Ekanadham, D. Tranchina, and E. P. Simoncelli, “Recovery of Sparse Translation-Invariant Signals With Continuous Basis Pursuit,” IEEE Trans. Inf. Theory, vol. 59, no. 10, pp. 4735–4744, Oct. 2011.
[43] P. Stoica and R. L. Moses, Spectral Analysis of Signals. Prentice Hall, 2005.
[44] B. N. Bhaskar, G. Tang, and B. Recht, “Atomic Norm Denoising With Applications to Line Spectral Estimation,” IEEE Trans. Signal Proc., vol. 61, no. 23, pp. 5987–5999, Dec. 2013.
[45] G. Tang, B. N. Bhaskar, P. Shah, and B. Recht, “Compressed Sensing Off the Grid,” IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7465–7490, Nov. 2013.
[46] E. J. Candes and C. Fernandez-Granda, “Super Resolution via Transform-Invariant Group-Sparse Regularization,” in Int. Conf. Comput. Vision (ICCV), Sydney, Australia, Dec. 2013, pp. 3336–3343.
[47] ——, “Super-Resolution from Noisy Data,” J. Fourier Analysis and Applicat., vol. 19, no. 6, pp. 1229–1254, Dec. 2013.