
One-bit Compressed Sensing: Provable Support and Vector Recovery

Sivakant Gopi [email protected]

IIT Bombay, Mumbai, India

Praneeth Netrapalli [email protected]

The University of Texas at Austin, Austin, TX, 78705 USA

Prateek Jain [email protected]

Microsoft Research India, Bangalore, India

Aditya Nori [email protected]

Microsoft Research India, Bangalore, India

Abstract

In this paper, we study the problem of one-bit compressed sensing (1-bit CS), where the goal is to design a measurement matrix A and a recovery algorithm such that a k-sparse unit vector x∗ can be efficiently recovered from the sign of its linear measurements, i.e., b = sign(Ax∗). This is an important problem for signal acquisition and has several learning applications as well, e.g., multi-label classification (Hsu et al., 2009). We study this problem in two settings: a) support recovery: recover the support of x∗, b) approximate vector recovery: recover a unit vector x such that ‖x − x∗‖₂ ≤ ε. For support recovery, we propose two novel and efficient solutions based on two combinatorial structures: union free families of sets and expanders. In contrast to existing methods for support recovery, our methods are universal, i.e., a single measurement matrix A can recover all the signals. For approximate recovery, we propose the first method to recover a sparse vector using a near optimal number of measurements. We also empirically validate our algorithms and demonstrate that our algorithms recover the true signal using fewer measurements than the existing methods.

Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28. Copyright 2013 by the author(s).

1. Introduction

Several machine learning tasks require estimating a large number of parameters using a small number of training samples. In general, this problem is degenerate, as many parameter vectors can be consistent with the same training data. However, recent works in the area of compressed sensing as well as high-dimensional statistics have shown that if the true parameter vector has certain structure (for example, sparsity or low rank), then the estimation problem can be solved efficiently (Candès & Tao, 2005; Candès & Recht, 2009; Negahban et al., 2009).

The above problem can be studied in a number of different settings such as compressed sensing, statistical learning, etc. In this paper, we mostly focus on the popular compressed sensing setting, where the goal is to design measurement matrices and recovery algorithms to estimate sparse vectors using a few linear measurements (Baraniuk et al., 2010). While the key application of this problem has been in the area of signal acquisition, it has also found applications in several learning related problems (Hsu et al., 2009; Duarte et al., 2008; Wright et al., 2010).

In compressive sensing, a k-sparse signal x∗ ∈ R^n is encoded as b = Ax∗, A ∈ R^{m×n}, so that given b and A, the sparse signal x∗ can be recovered exactly. In this paper, we mostly focus on recovering sparse signals; we briefly discuss extensions to other compressible signals in Section 4.1. Several results in compressive sensing (Candès & Tao, 2005; Garg & Khandekar, 2009) have shown that x∗ can be recovered using only m = O(k log n) linear measurements.


Table 1. Support Recovery: Comparison of different algorithms for 1-bit CS support recovery in terms of the number of measurements required, running time, and universality. Note that our methods require slightly more measurements but are universal, which is critical for compressed sensing algorithms, as sampling a new matrix A for each signal is practically infeasible.

Algorithm                      | No. of measurements (m) | Running Time  | Universal | Negative x∗ allowed
HB (Haupt & Baraniuk, 2011)    | O(k log n)              | O(n log n)    | No        | Yes
UFF (Algorithm 2)              | O(k² log n)             | O(nk log n)   | Yes       | No
Expanders (Algorithm 3)        | O(k³ log n)             | O(nk log n)   | Yes       | Yes

Table 2. Approximate Vector Recovery: Comparison of different algorithms for 1-bit CS approximate vector recovery. Note that both our methods have a dependence on the error (ε) that is near optimal, while existing methods require a significantly larger number of measurements and, in typical settings, higher running time complexity as well.

Algorithm                                      | No. of measurements (m)  | Running Time                            | Universal
Plan and Vershynin (Plan & Vershynin, 2011)    | O((1/ε⁵) k log²(n/k))    | O((1/ε⁵) k n⁴ log²(n/k))                | Yes
Plan and Vershynin (Plan & Vershynin, 2012)    | O((1/ε⁶) k log(n/k))     | O((1/ε⁶) nk log(n/k))                   | Yes
Two-stage Algorithm (Algorithm 6)              | Õ((1/ε) k log(n/k))      | Õ(nk log(n/k) + (1/ε⁵)(k log(n/k))⁵)    | Yes
S-Approx (Algorithm 7)                         | Õ(k³ log(n/k) + k/ε)     | Õ(nk log(n/k) + k⁵/ε⁵)                  | Yes

However, all the above approaches require the measurements b to be known exactly (up to infinite precision). Naturally, this requirement is not practical, e.g., image sensors cannot store measurements up to arbitrary accuracy. Furthermore, arbitrarily quantized measurements might lead to large errors in recovery.

To address this issue and to simplify the signal acquisition process, (Boufounos & Baraniuk, 2008) introduced the problem of one-bit compressed sensing (or 1-bit CS) where only one bit of the linear measurements, specifically their signs, is observed. In particular, given A and

b = sign(Ax∗), (1)

we need to recover the k-sparse signal x∗. Apart from the ease of their implementation using comparators, the above measurements are also known to be more robust to noise and non-linearity, and in certain situations perform better than standard compressive sensing (Laska & Baraniuk, 2012).
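As a concrete illustration of the measurement model (1), the following minimal Python sketch generates a random k-sparse unit vector and its one-bit measurements. The Gaussian choice of A and the function name are ours for illustration only; the paper studies several different constructions of A.

```python
import numpy as np

def one_bit_measurements(n=1000, k=10, m=200, seed=0):
    """Generate a random k-sparse unit vector x_star and b = sign(A x_star).

    A Gaussian A is used here purely for illustration; the paper considers
    several constructions of A (UFF-based, expander-based, Gaussian, products).
    """
    rng = np.random.default_rng(seed)
    x_star = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x_star[support] = rng.standard_normal(k)
    x_star /= np.linalg.norm(x_star)      # unit norm; the norm is unrecoverable anyway
    A = rng.standard_normal((m, n))
    b = np.sign(A @ x_star)               # one bit per linear measurement
    return A, b, x_star, set(support.tolist())
```

Later sketches in this section reuse this setup.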

Note that using 1-bit measurements (1), we cannot recover the norm of x∗ from b because scaling x∗ does not change the measurements. Similarly, a small perturbation in x∗ may not change b. Therefore, exact recovery of x∗ is in general not possible, even when x∗ is a unit vector.

Instead, 1-bit CS is typically studied in these two settings:
Support recovery: recover the support of x∗.
Approximate vector recovery: recover x that is close to x∗ (up to normalization), i.e., ‖x/‖x‖₂ − x∗/‖x∗‖₂‖₂ ≤ ε, where ε > 0 is a given approximation factor.

For both of the above problems, a solution is evaluated on the following three critical parameters: 1) number of measurements (m), 2) running time of the recovery algorithm, and 3) universality of the measurement matrix. A 1-bit CS method is universal if a fixed design matrix A can be used to recover all sparse signals. Note that universality is a crucial requirement, as it is practically infeasible in several 1-bit CS applications (for instance, a single-pixel camera) to construct a new A for each signal.

In this paper, we study one-bit compressive sensing in both the above mentioned settings and improve upon the state-of-the-art results in those settings.

1.1. Support Recovery

Existing work: The best known solution for support recovery is by (Haupt & Baraniuk, 2011), which uses O(k log n) measurements. However, their solution is not universal; universality is crucial for several real-world applications.
Our Results: We propose the first universal measurement matrices for support recovery in the 1-bit compressed sensing problem. Our solutions are based on two combinatorial structures, called union free families of sets and expander graphs. Compared to existing work, our measurement schemes however require a factor of O(k) and O(k²) more measurements respectively. See Table 1 for a comparison of our methods with the method by (Haupt & Baraniuk, 2011) with respect to the above mentioned critical problem parameters. We would like to note that while expanders have previously been used in compressed sensing (Jafarpour et al., 2009), to the best of our knowledge, union free families have so far not been used in this domain and might have applications to other related tasks as well.


1.2. Approximate Recovery

Existing work: (Plan & Vershynin, 2011) and (Plan & Vershynin, 2012) provide provable and efficient recovery methods for this problem. In particular, (Plan & Vershynin, 2012) provides a powerful framework for recovering a large class of compressible signals using only one-bit measurements. However, the number of measurements required by both (Plan & Vershynin, 2011) and (Plan & Vershynin, 2012) is sub-optimal in the dependence on ε (O(ε⁻⁵) and O(ε⁻⁶) respectively).

Our Results: We propose a novel solution that exploits well-known results for the standard compressed sensing problem to guarantee recovery using a number of measurements with optimal dependence on ε, i.e., O(ε⁻¹) (see the supplementary material for a lower bound). See Table 2 for a comparison of our proposed method with the existing methods.

Finally, our experimental results show that our methods are also empirically competitive with existing methods. Since the focus of this paper is on practical and provable methods for 1-bit CS, we draw a comparison only against known state-of-the-art provable methods.

Notation: We denote vectors using bold-faced letters (e.g., x) and matrices using capital letters (e.g., A). xi denotes the i-th element of x, and a(i) denotes the i-th row of A. x(S) denotes the elements of x restricted to the set S, and A(S) denotes the columns of A from the set S. A ∈ R^{m×n} denotes a design matrix, and x∗ ∈ R^n denotes the true signal. ‖x‖p denotes the ℓp norm of x, and ‖x‖₀ denotes the number of non-zeros in x. supp(x) denotes the set of non-zero elements of x. We use Õ(·) to ignore poly(log k + log log n) factors. Wlog stands for without loss of generality, and s.t. stands for such that.

2. Related Work

Compressive sensing (CS) using precise linear measurements is a well-studied problem, and several methods (Candès & Tao, 2005; Tropp & Gilbert, 2007; Jafarpour et al., 2009) are known to achieve efficient recovery using a near optimal number of measurements. In comparison, the problem of 1-bit CS is relatively new and the state-of-the-art algorithms still lack in certain regards. For support recovery, existing algorithms are not universal, while for approximate recovery they do not have information-theoretically optimal measurement complexity bounds (see the previous section for a more detailed discussion).

Apart from provable recovery methods, several heuristics have also been proposed for this problem (Boufounos, 2009; Laska et al., 2011; Jacques et al., 2011); these methods have good empirical performance but lack theoretical guarantees. Apart from the standard 1-bit CS problem, several variants/extensions have also been studied. For instance, (Davenport et al., 2012) recently studied a similar problem called 1-bit matrix completion. (Ai et al., 2012) recently extended recovery results to measurement matrices A sampled from more general sub-Gaussian distributions.

3. Support Recovery

Problem statement: Design a measurement matrix A ∈ R^{m×n} and a recovery algorithm for the following problem: given b = sign(Ax∗) with x∗ ∈ R^n, ‖x∗‖₀ ≤ k, find supp(x∗).

For this problem, we propose two different approaches based on: a) a union free family (UFF) of sets, b) expander graphs. For both these approaches, we provide the design matrix as well as the corresponding recovery algorithm.

3.1. Support Recovery using Union Free Family (UFF)

In this section, we describe an efficient algorithm that recovers the support of any non-negative vector using O(k² log n) measurements.

3.1.1. UFF Background

Let U be a fixed set, and let Bi ⊆ U, 1 ≤ i ≤ n. Then, the family of sets F = {B1, · · · , Bn} is said to be k-union free if no Bi lies in the union of any other k sets from F.

Definition 1. A family of sets F := {B1, · · · , Bn} with underlying set U = ∪_{i=1}^n Bi is called a k-union-free family (k-UFF) iff B_{i0} ⊄ B_{i1} ∪ · · · ∪ B_{ik} for all distinct B_{i0}, B_{i1}, · · · , B_{ik} ∈ F.

Definition 2. A k-UFF is called a d-regular k-union-free family ((d, k)-UFF) if |Bi| = d for all Bi ∈ F.

The following theorem from (Erdős et al., 1982) guarantees the existence of a large (d, k)-UFF.

Theorem 1. (Erdős et al., 1982) Let n(m, k, d) denote the maximum cardinality of a (d, k)-UFF over an m-element underlying set (|U| = m). Let h = ⌈d/k⌉. Then,

n(m, k, d) ≥ (m choose h) / (kh choose h)².

It is well known that such a family can be constructed using a randomized method: form subsets Bi, 1 ≤ i ≤ n, by selecting d elements of the underlying set U uniformly at random. The algorithm is formally given in Algorithm 1.


Algorithm 1 Probabilistic construction of a size-n (d, k)-UFF from an m-element underlying set.
input m, n, d
1: U ← {1, 2, . . . , m}, F ← ∅
2: for i = 1, · · · , n do
3:   Obtain Bi by sampling d elements of U independently and uniformly at random (with replacement)
4:   F ← F ∪ {Bi}
5: end for
output F

The following theorem shows that, for an appropriate choice of the parameters m and d, Algorithm 1 outputs a (d, k)-UFF with high probability.

Theorem 2. For m = 10k² log(3n/δ) and d = k log(3n/δ), Algorithm 1 outputs a (d, k)-UFF with probability greater than 1 − δ.

We provide a proof of Theorem 2 in the supplementary material.

3.1.2. UFF based Sensing Matrix

We now provide a novel method for constructing a measurement matrix A using a given (d, k)-UFF.

Using the randomized construction in Algorithm 1, construct a (k log(3n/δ), k)-UFF F with |F| = n and underlying set U = {1, 2, . . . , m}. Note that, from Theorem 2, we can choose m = 10k² log(3n/δ).

Next, the sensing matrix A is defined as follows:

Aij = 1{i ∈ Bj}. (2)

That is, the j-th column of A is the incidence vector of Bj. Also, if x∗ ∈ R^n is a non-negative vector, then Ax∗ ≥ 0. Hence, the i-th measurement bi = sign(a(i)x∗) is given by:

bi = 1{Σ_{j: i∈Bj} x∗_j > 0}. (3)

Note that A ∈ {0, 1}^{m×n} and b ∈ {0, 1}^m, where m = O(k² log n).

Support Recovery Algorithm: We now present our support recovery algorithm that estimates the support set S of x∗ using measurements b constructed by the above described (d, k)-UFF based design matrix A. Our algorithm proceeds in n steps: at the j-th step (1 ≤ j ≤ n), we add element j to the support set S if min_{i∈Bj} bi is positive. See Algorithm 2 for detailed pseudo-code.


Algorithm 2 UFF based Support Recovery (UFF)
input A : measurement matrix, b : measurement vector (b = sign(Ax∗))
1: S ← ∅
2: for j = 1, · · · , n do
3:   if min_{i∈Bj} bi > 0 then
4:     S ← S ∪ {j}
5:   end if
6: end for
output S
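A minimal sketch of the UFF-based sensing matrix (2) and the recovery rule of Algorithm 2, assuming a non-negative signal and the build_uff helper sketched earlier (the function names are ours):

```python
import numpy as np

def uff_sensing_matrix(F, m, n):
    """A_ij = 1 iff i is in B_j, i.e. column j is the incidence vector of B_j (eq. (2))."""
    A = np.zeros((m, n))
    for j, Bj in enumerate(F):
        A[list(Bj), j] = 1.0
    return A

def uff_support_recovery(F, b):
    """Algorithm 2: keep j if every measurement indexed by B_j is positive."""
    return {j for j, Bj in enumerate(F) if min(b[i] for i in Bj) > 0}

# Usage sketch (x_star must be non-negative for Theorem 3 to apply):
# F, m, d = build_uff(n=500, k=5)
# A = uff_sensing_matrix(F, m, 500)
# b = (A @ x_star > 0).astype(int)      # binary measurements as in eq. (3)
# S = uff_support_recovery(F, b)
```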

The following theorem, proved in the supplementary material, establishes the correctness of Algorithm 2.

Theorem 3. Suppose x∗ ∈ R^n is a non-negative vector s.t. ‖x∗‖₀ ≤ k, A is a sensing matrix constructed according to (2), and b is computed using (3). Then, the set S returned by Algorithm 2 satisfies S = supp(x∗).

3.1.3. Discussion

Above, we described our UFF based algorithm for the support recovery problem. Note that the algorithm is universal, i.e., one design matrix A can be used to recover the support of every k-sparse non-negative vector. Furthermore, the algorithm is efficient, with time complexity O(nk log n), and is easy to implement.

Also, note that the measurements given by (3) are binary measurements (i.e., in {0, 1}^m) rather than signed measurements (i.e., in {−1, 1}^m), but they are essentially of the same nature.

Robustness to noise: Note that Algorithm 2 requires exact measurements without any noise. However, we can easily extend our method to handle arbitrary adversarial noise in the measurements b, that is, the case where the values of a small number of the bi's can be flipped arbitrarily. To this end, we use the following robust version of UFFs:

Definition 3. A family of sets F = {B1, B2, · · · , Bn} is called a (d, k, ε)-UFF if |B_{i0} ∩ (B_{i1} ∪ B_{i2} ∪ · · · ∪ B_{ik})| < ε|B_{i0}| holds for all distinct B_{i0}, B_{i1}, · · · , B_{ik} ∈ F and each set in F has size d.

Theorem 4. (de Wolf, 2012) There exists a (d, k, ε)-UFF F over an m-element underlying set such that |F| = n, m = O(k² log n / ε²), and d = O(k log n / ε).

Using a (d, k, ε)-UFF as in Theorem 4, Algorithm 2 can be modified to make it robust up to (1/2 − ε)d adversarial errors, i.e., (1/2 − ε)d arbitrarily flipped measurements. See Algorithm 8 and Theorem 8 in the supplementary material for further details.

Handling Vectors with Negative Elements: One drawback of our UFF based algorithm is that it cannot handle cases where the underlying signal x∗ has negative entries. A solution to this problem is to select each non-zero Aij uniformly at random from [0, 1] (instead of fixing it to be 1). This will ensure the following with probability 1:

• if ℓ ∈ S∗, then Σ_{j: i∈Bj} Aij x∗_j ≠ 0 for all i ∈ Bℓ, and
• if ℓ ∉ S∗, then there exists i∗ ∈ Bℓ such that Σ_{j: i∗∈Bj} A_{i∗j} x∗_j = 0.

The above observations, along with the proof of Theorem 3, show that we can recover the support of x∗ even if it has negative entries. However, a drawback of this solution is that the resulting algorithm is not universal, as the values of A need to be sampled afresh for each x∗. In the next section, we present a solution based on expanders that can handle vectors with negative elements and is universal as well (although with higher measurement complexity).

3.2. Support Recovery using Expanders

We now describe an expander-based algorithm that recovers the support using O(k³ log n) measurements.

3.2.1. Expanders Background

A left-regular bipartite graph is called an expander if every small enough subset of the nodes on the left has a large enough neighborhood on the right.

Definition 4. A d-left-regular bipartite graph (U, V), s.t. |U| = n, |V| = m, is an (n, m, d, k, ε)-expander if for all S ⊆ U, |S| ≤ k ⇒ |N(S)| > (1 − ε)d|S|, where N(S) is the neighborhood of the nodes in the set S.

Expanders are closely related to UFFs and hence can be constructed in the same way as in Algorithm 1. For the sake of completeness, we recall the following result that establishes the existence of good expanders.

Lemma 1. (Claim 1, (Berinde & Indyk, 2008)) For any n/2 ≥ k ≥ 1 and ε > 0, there exists an (n, m, d, k, ε)-expander with d = O(log(n/k)/ε) and m = O(k log(n/k)/ε²).

3.2.2. Expander based Sensing Matrix

In this section, we present a method to construct a sensing matrix A using a given expander. We first construct an (n, m, d, k + 1, ε)-expander with ε = 1/(16k). Using Lemma 1, we can choose m = O(k³ log(n/k)) and d = O(k log(n/k)). Let A ∈ R^{m×n} be the adjacency matrix of the expander:

Aij = 1 if (i, j) is an edge of the expander, and 0 otherwise.

Then, we use A as the sensing matrix and observe b = sign(Ax∗), with x∗ ∈ R^n, ‖x∗‖₀ ≤ k.

Support Recovery Algorithm: We now present our support recovery algorithm that estimates the support set S of x∗ using measurements b constructed using the above described design matrix A. Our algorithm proceeds in n steps: at the j-th step (1 ≤ j ≤ n), we add element j to S if the measurements corresponding to at least half of the neighbors of j (i.e., N(j)) are non-zero, i.e., |N(j) ∩ supp(b)| > d/2. See Algorithm 3 for detailed pseudo-code.

Algorithm 3 Support recovery algorithm when A is constructed from an (n, m, d, k, ε)-expander.
input A : measurement matrix, b : measurement vector (b = sign(Ax∗))
1: S ← ∅
2: for j = 1, · · · , n do
3:   if |N(j) ∩ supp(b)| > d/2 then
4:     S ← S ∪ {j}
5:   end if
6: end for
output S

The following theorem, proved in the supplementary material, shows the correctness of Algorithm 3.

Theorem 5. Let b = sign(Ax∗) with a k-sparse x∗ ∈ R^n and A as constructed in Section 3.2.2. Then Algorithm 3 correctly identifies S∗ = supp(x∗), i.e., S = S∗.

Discussion: Note that Algorithm 3 can exactly recover x∗ ∈ {−1, 0, 1}^n: first recover supp(x∗) and then set the sign of each element xj (j ∈ S) to be the sign of the majority of the elements in N(j) ∩ supp(b).

Robustness: For our choice of parameters, the algorithm can tolerate up to d/4 adversarial bit flips in b. Robustness up to d adversarial errors can be obtained by choosing graphs with a better expansion property.

Finally, observe that the computational complexity of Algorithm 3 is O(nk log(n/k)).

3.2.3. Divide and Conquer

In this section, we present a "Divide and Conquer" approach that, in conjunction with our support recovery algorithms, achieves even lower measurement complexity. However, the resulting approach is no longer universal.

The key idea is to first partition the n coordinates of x∗ into k disjoint random sets of equal size; Wlog we can assume that k divides n. Since the sparsity of x∗ is k, on average each of the random partitions has sparsity 1. Using standard concentration bounds, with high probability each of the partitions has at most O(log k) non-zeros. We can then use our algorithms from Sections 3.1 or 3.2 to recover the support of each of the k subsets.


Algorithm 4 Measurements for Algorithm 5
input m, n, d, k
1: P ← random permutation matrix
2: Generate A′1, A′2, . . . , A′k using Algorithm 1 with input (m/k, n/k, d)
3: B ← block diagonal matrix with blocks A′1, . . . , A′k as in (4)
output B · P

Algorithm 5 Support Recovery for Divide and Conquer Approach
input A′ℓ : ℓ-th block UFF-based sensing matrix, B : block matrix (4), P : permutation matrix, b : measurement vector (b = sign(B · Px∗))
1: S ← ∅
2: for ℓ = 1, · · · , k do
3:   bℓ ← b((ℓ − 1)m/k, · · · , ℓm/k − 1), i.e., the ℓ-th block of b
4:   Run Algorithm 2 on bℓ and A′ℓ to recover Sℓ
5:   S ← S ∪ P⁻¹(Sℓ)
6: end for
output S

Similar to the result by (Haupt & Baraniuk, 2011), we can show that the number of measurements needed by this approach is optimal up to poly(log k), although the obtained approach is not universal. Algorithm 4 provides pseudo-code for generating the sensing matrix, and Algorithm 5 provides the recovery algorithm.

Construction of the measurement matrix: The measurement matrix A is given by A = B · P, where P is a random permutation matrix and B is a block diagonal matrix with k equal blocks, each block being a UFF-based matrix A′ℓ constructed using Algorithm 1 with parameters (m/k, n/k, d):

B = diag(A′1, . . . , A′k), (4)

where A′ℓ = Algorithm 1(m/k, n/k, d).

Theorem 6. Suppose x∗ ∈ R^n_+ s.t. ‖x∗‖₀ ≤ k, and A = B · P is a sensing matrix as in Algorithm 4 with m = O(k log(n/k)) and d = O(log k log(n/k)). Then, Algorithm 5 returns supp(x∗) in time O(n log(n/k)) with probability at least 1 − e^{−Ω̃(log k)}.

See the supplementary material for a detailed proof.

4. Approximate Vector Recovery

Problem Statement: Design a matrix A ∈ R^{m×n} and an algorithm to solve: given b = sign(Ax∗) (where x∗ ∈ R^n, ‖x∗‖₀ ≤ k and ‖x∗‖₂ = 1), output x such that

‖x/‖x‖₂ − x∗‖₂ ≤ ε,

where ε > 0 is a given tolerance parameter. Note that assuming x∗ to be of unit norm entails no loss of generality since scaling x∗ does not change b. In particular, we can never recover ‖x∗‖₂ from b.

For this problem, we propose two novel solutions, both of which are universal, and provide their measurement and time complexities. Our first solution is based on combining standard compressed sensing techniques with Gaussian measurements (see Section 4.1). Our second method first recovers the true support using the methods of Section 3 and then uses Gaussian measurements to approximately recover the elements of x∗ (see Section 4.1).

4.1. Two-stage Approximate Recovery

In this section, we present our first approach, two-stage approximate recovery, which exploits existing compressed sensing methods. Broadly, we design the measurement matrix A as a product of two matrices A1 ∈ R^{m′×n} and A2 ∈ R^{m×m′} (i.e., A = A2A1). We select A1 to be a standard compressed sensing matrix and A2 to be an iid Gaussian matrix. So the measurements are:

b = sign(A2A1x∗).

Now, let z∗ = A1x∗, so that b = sign(A2z∗). The main idea is that given b and A2, we can find a vector z that satisfies each of the measurements, i.e., sign(A2z) = b. Furthermore, using Theorem 10 (Theorem 2, Jacques et al. (2011)), z should closely approximate z∗. Next, given z and A1, using standard compressed sensing algorithms we estimate x, which should be a close approximation to x∗.

Construction of the measurement matrix: Let A = A2 · A1 where A1 ∈ R^{m′×n} and A2 ∈ R^{m×m′}. A1 is a matrix that satisfies the restricted isometry property (RIP) (Candès & Tao, 2005) with δ2k < 1/6. A1 is said to satisfy k-RIP with constant δk if, for all x ∈ R^n with ‖x‖₀ ≤ k:

(1 − δk)‖x‖₂² ≤ ‖A1x‖₂² ≤ (1 + δk)‖x‖₂².

Also, if m′ = O(k log(n/k)) and each entry of A1 is sampled from a centered sub-Gaussian distribution, then A1 satisfies 2k-RIP with constant δ2k < 1/6 (Candès & Tao, 2005).

Next, select m = O((1/ε) m′ log(m′/ε)) = O((1/ε) k log(n/k)) and sample each entry of A2 independently from N(0, 1). Using Theorem 10 (supplementary material) by (Jacques et al., 2011), with high probability such a measurement matrix ensures that:

for all x, y: sign(A2x) = sign(A2y) ⇒ ‖x/‖x‖₂ − y/‖y‖₂‖₂ ≤ ε.


Algorithm 6 Two-stage Approximate Recovery
input A1, A2 : measurement matrices (see Section 4.1), b : measurement vector (b = sign(A2A1x∗))
1: Stage 1: Run an LP solver for the following LP: find z s.t. bi a2(i) z > 0, ∀i.
2: Stage 2: Run the GraDeS algorithm (Garg & Khandekar, 2009) (supplementary material) with inputs z and A1 to obtain x
output x


Algorithm for approximate recovery: In this section, we present our two-stage algorithm for approximate recovery. As mentioned earlier, the algorithm first uses a half-space learning algorithm to obtain an estimate z of A1x∗ and then uses the GraDeS algorithm, a robust compressed sensing algorithm by (Garg & Khandekar, 2009), on z to obtain an estimate x of x∗. See Algorithm 6 for a pseudo-code of our approach. For completeness, we provide the GraDeS algorithm in the supplementary material.

Below, we provide proof of correctness of Algorithm 6.

Theorem 7. Let x∗ ∈ R^n be a k-sparse vector with ‖x∗‖₂ = 1 and let A1 and A2 be chosen as described in the previous section. Also, let b = sign(A2A1x∗). Then the x returned by Algorithm 6 satisfies ‖x/‖x‖₂ − x∗‖₂ ≤ 20ε, where 0 < ε < 1/4.

See the supplementary material for a detailed proof.

Note that the computational complexity of solving the LP in Stage 1 of our algorithm can be bounded by O(((k log n)/ε)⁵).

Remarks: The above algorithm can be made robust to classification noise by repeating each measurement a fixed number of times and taking a majority vote. For instance, suppose each measurement is correct with probability 1/2 + p and is flipped with probability 1/2 − p. Then, repeating each measurement O(log m / p) times, we can argue that a majority vote of the measurements gives us the true measurements (with high probability). We can then use Algorithm 6 to recover x∗.
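A small sketch of this repetition-plus-majority-vote denoising step; the repetition count is left as an argument rather than the constant from the analysis, and the function name is ours.

```python
import numpy as np

def majority_vote_measurements(A, x_star, flip_prob, repeats, seed=0):
    """Repeat each 1-bit measurement `repeats` times under random sign flips
    and return the majority vote for each of the m measurements."""
    rng = np.random.default_rng(seed)
    clean = np.sign(A @ x_star)                  # true measurements, shape (m,)
    flips = rng.random((repeats, clean.size)) < flip_prob
    noisy = np.where(flips, -clean, clean)       # each row is one noisy repetition
    return np.sign(noisy.sum(axis=0))            # majority vote per measurement
```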

Extension to Other Compressible Signals: Note that the second stage of Algorithm 6 is essentially just a "standard" compressed sensing module, whose goal is to recover x∗ from a "standard" (noisy) linear measurement of x∗, i.e., z = A1x∗ + η. Hence, we can modify our second stage to recover other compressible signals as well, by directly using the corresponding recovery method. Examples of such compressible signals include low-rank matrices, low-rank + sparse matrices, wavelet based sparse vectors, etc.

Algorithm 7 Support Recovery based Approximate Recovery (S-Approx)
input A1 and A2, b = [b1; b2] = sign([A1x∗; A2x∗])
1: Stage 1: Run the Expanders algorithm (Algorithm 3) with inputs b1 = sign(A1x∗) and A1 to output S.
2: x ← 0_{n×1}
3: Stage 2: Run an LP solver for the following LP: find x s.t. b2(i) a2(i)(S) x(S) > 0, ∀ 1 ≤ i ≤ m′.
output x
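A sketch of Algorithm 7's second stage, reusing the stage1_sign_consistent LP helper from the Algorithm 6 sketch: the LP is solved only over the coordinates in the recovered support S, and the remaining coordinates stay zero. Names are ours.

```python
import numpy as np

def s_approx_stage2(A2, b2, S, n):
    """After Stage 1 returns a support estimate S, solve the sign-consistency LP
    restricted to the columns of A2 indexed by S (a low-dimensional LP over |S| variables)."""
    S = sorted(S)
    x = np.zeros(n)
    x[S] = stage1_sign_consistent(A2[:, S], b2)
    return x / np.linalg.norm(x)
```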


The framework by (Plan & Vershynin, 2012) can also recover a large class of compressible signals. However, as in the case of sparse vectors, their dependence on the error ε is ε⁻⁶ while ours is only ε⁻¹. Furthermore, (Plan & Vershynin, 2012) needs to compute the "Gaussian width" for each of these classes of signals; in contrast, we can directly use the existing results for these classes of signals to provide measurement complexity.

Support Recovery based Approximate Recovery: In this section, we present another approach for approximate vector recovery that first recovers the support of x∗ using our Expanders algorithm (Algorithm 3) and then solves the resulting low dimensional problem. That is, we choose the design matrix to be A = [A1; A2] (A1 stacked above A2), where A1 is a design matrix based on expanders (as in Section 3.2) and A2 is an iid standard Gaussian matrix. Using the measurements corresponding to A1, we can first recover the support using Algorithm 3.

Once we have the support, we can solve an LP restricted to the support to obtain x that is consistent with the measurements, i.e., sign(A2x) = sign(A2x∗). Again using Theorem 10 (supplementary material) by (Jacques et al., 2011), we can conclude that ‖x/‖x‖₂ − x∗‖₂ < ε. See Algorithm 7 for a pseudo-code of our approach.

Now, the first step of support recovery requires O(k³ log(n/k)) measurements (Theorem 5) and O(nk log(n/k)) time. The second step needs O(k/ε) measurements and O(k⁵/ε⁵) time for recovery. So overall, the algorithm needs O(k³ log(n/k) + k/ε) measurements and O(nk log(n/k) + k⁵/ε⁵) time.


Figure 1. (a), (b): Error (|S∗ Δ S|) incurred by various support-recovery methods (n = 3000, varying k, m). (c), (d): Error (‖x/‖x‖₂ − x∗/‖x∗‖₂‖₂) incurred by various approximate recovery methods with fixed n = 3000 but varying k, m. TwoStage (Algorithm 6) and PV incur comparable error, while S-Approx (Algorithm 7) is significantly more accurate.

Figure 2. Phase transition diagrams for different methods when applied to support recovery (a, b) and approximate recovery (c, d, e). Each figure plots probability of success in 100 trials for different values of n and k. Red represents high probability of success (see plot (a) for color coding). Clearly, UFF recovers the support in a larger regime as compared to HB. For approximate recovery, S-Approx performs better in a larger regime of the parameters as compared to both TwoStage and PV, while TwoStage slightly outperforms PV.

5. Experiments

In this section, we present empirical results for our algorithms for support recovery as well as approximate vector recovery. For support recovery, we evaluate our UFF algorithm (Algorithm 2) against the sketch based algorithm by (Haupt & Baraniuk, 2011) (HB). For approximate recovery, we evaluate our TwoStage algorithm and S-Approx algorithm against the algorithm by (Plan & Vershynin, 2012) (PV).

Support Recovery: For these experiments, we generate a k-sparse signal x∗ ∈ {0, 1}^n randomly and estimate its support using the linear measurements proposed by each of the algorithms. We report the L1 error in support estimation: ErrorSupport(S, S∗) = |S∗ Δ S|.

We first compare the recovery properties of different methods using phase transition diagrams that are commonly used in the compressive sensing literature (see Figure 2 (a), (b)). For this, we fix the number of measurements (m = 500) while varying n and k. For each problem size (k, n, m) we generate 20 synthetic problems and plot the probability of exact support recovery; probability values in Figure 2 are color coded, with red representing a high probability of recovery and blue representing a low probability of recovery. Figure 2 (a), (b) show the phase transition diagrams of our UFF method (Algorithm 2) and the HB method, respectively. Note that UFF is able to recover the support for a significantly larger fraction of problems than HB.

Next, we study the error incurred by different methods when the number of measurements is not enough for recovery. First, we fix n = 3000, m = 500 and vary k. Figure 1(a) compares the error incurred by our UFF algorithm against the HB algorithm. Clearly, our UFF based algorithm incurs smaller error than HB for large k. For example, for k = 20, UFF is able to recover the support exactly, while HB incurs around 20% error. Next, we fix n = 3000, k = 20 while varying m. Figure 1(b) shows that UFF is able to achieve exact recovery with around 400 measurements, while HB requires around 800 measurements.

Approximate Recovery: Here, we generate k-sparse signals x∗ ∈ R^n where the non-zeros are sampled using the standard k-variate Gaussian. We report the error in recovery, i.e., ErrorApprox = ‖x/‖x‖₂ − x∗/‖x∗‖₂‖₂.

Here again, we first plot phase transition diagrams for different methods. We fix m = 500 and vary n, k; for each problem size (m, n, k) we measure the probability of success (out of 20 runs), where a method is considered to be successful for an instance if the error incurred is less than 0.3. Figures 2 (c), (d), (e) clearly show that S-Approx is significantly better than both TwoStage and PV; TwoStage is also marginally better than PV.

Next, we fix n = 3000 and m = 500, while varying k. Figure 1(c) compares the TwoStage and S-Approx algorithms with the PV algorithm. Here again, TwoStage and PV are comparable, while S-Approx incurs significantly less error for k < 24. For larger k, TwoStage and PV are significantly better than S-Approx. Finally, we fix n = 3000 and k = 20, while varying m. Here again, for a small number of measurements, S-Approx incurs more error compared to TwoStage and PV. But for a larger number of measurements, it is significantly more accurate.


References

Ai, A., Lapanowski, A., Plan, Y., and Vershynin, R. One-bit compressed sensing with non-Gaussian measurements. arXiv preprint arXiv:1208.6279, 2012.

Baraniuk, Richard G., Cevher, Volkan, Duarte, Marco F., and Hegde, Chinmay. Model-based compressive sensing. IEEE Transactions on Information Theory, 56(4):1982–2001, 2010.

Berinde, Radu and Indyk, Piotr. Sparse recovery using sparse random matrices, 2008.

Boufounos, Petros and Baraniuk, Richard G. 1-bit compressive sensing. In CISS, pp. 16–21, 2008.

Boufounos, Petros T. Greedy sparse signal reconstruction from sign measurements. In Proceedings of the 43rd Asilomar Conference on Signals, Systems and Computers, pp. 1305–1309, 2009.

Candès, Emmanuel J. and Recht, Benjamin. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, December 2009.

Candès, Emmanuel J. and Tao, Terence. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215, 2005.

Davenport, M. A., Plan, Y., Berg, E., and Wootters, M. 1-bit matrix completion. arXiv preprint arXiv:1209.3672, 2012.

de Wolf, Ronald. Efficient data structures from union-free families of sets. http://homepages.cwi.nl/~rdewolf/unionfree_datastruc.pdf, 2012.

Duarte, M. F., Davenport, M. A., Takhar, D., Laska, J. N., Sun, Ting, Kelly, K. F., and Baraniuk, R. G. Single-pixel imaging via compressive sampling. IEEE Signal Processing Magazine, 25(2):83–91, March 2008.

Erdős, Peter L., Frankl, Peter, and Füredi, Zoltán. Families of finite sets in which no set is covered by the union of two others. Journal of Combinatorial Theory, Series A, 33(2):158–166, 1982.

Garg, Rahul and Khandekar, Rohit. Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property. In ICML, 2009.

Haupt, Jarvis and Baraniuk, Richard G. Robust support recovery using sparse compressive sensing matrices. In CISS, pp. 1–6, 2011.

Hsu, D., Kakade, S. M., Langford, J., and Zhang, T. Multi-label prediction via compressed sensing. In Advances in Neural Information Processing Systems, 2009.

Jacques, Laurent, Laska, Jason N., Boufounos, Petros, and Baraniuk, Richard G. Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors. CoRR, abs/1104.3160, 2011.

Jafarpour, Sina, Xu, Weiyu, Hassibi, Babak, and Calderbank, A. Robert. Efficient and robust compressed sensing using optimized expander graphs. IEEE Transactions on Information Theory, 55(9):4299–4308, 2009.

Laska, Jason N. and Baraniuk, Richard G. Regime change: Bit-depth versus measurement-rate in compressive sensing. IEEE Transactions on Signal Processing, 60(7):3496–3505, 2012.

Laska, Jason N., Wen, Zaiwen, Yin, Wotao, and Baraniuk, Richard G. Trust, but verify: Fast and accurate signal recovery from 1-bit compressive measurements. IEEE Transactions on Signal Processing, 59(11):5289–5301, 2011.

Negahban, Sahand, Ravikumar, Pradeep D., Wainwright, Martin J., and Yu, Bin. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In NIPS, pp. 1348–1356, 2009.

Plan, Y. and Vershynin, R. One-bit compressed sensing by linear programming. arXiv preprint arXiv:1109.4299, 2011.

Plan, Yaniv and Vershynin, Roman. Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach. CoRR, abs/1202.1212, 2012.

Tropp, Joel A. and Gilbert, Anna C. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12):4655–4666, 2007.

Wright, J., Ma, Yi, Mairal, J., Sapiro, G., Huang, T. S., and Yan, Shuicheng. Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98(6):1031–1044, June 2010.


A. Proofs of UFF section

In this section, we give proofs for the theorems in the UFF section and introduce a robust version of UFF for the case when the measurements have adversarial noise.

Proof of Theorem 2. Consider k + 1 distinct elements of F: B0, B1, · · · , Bk. Let us define the bad event E as

E = {B0 ⊆ ∪_{i=1}^{k} Bi}.

The cardinality of ∪_{i=1}^{k} Bi is at most kd. Since the elements of B0 are chosen independently and uniformly at random from [m], we have:

P[E] ≤ (kd choose d) / (m choose d) ≤ (kd / (m − d + 1))^d.

The total number of choices for the sets B0, B1, · · · , Bk is n·(n−1 choose k). Using the union bound, the probability that Algorithm 1 does not return a (d, k)-UFF is

P[Err] ≤ n (n−1 choose k) (kd / (m − d + 1))^d
       ≤ n (e(n−1)/k)^k (kd / (m − d + 1))^d
       ≤ n^{2k} e^k (1/9)^{k log(3n/δ)}
       = n^{2k} e^k n^{−k log 9} 3^{−k log 9} δ^{k log 9} < δ.

This finishes the proof.

Proof of Theorem 3. Let S∗ = supp(x∗). We know that |S∗| ≤ k. Wlog, assume that the non-zero elements are in the first k dimensions of x∗, i.e., S∗ = {1, 2, · · · , |S∗|}.

Proof of S ⊆ S∗: Consider any ℓ ∉ S∗. As A is constructed from F, which is a (d, k)-UFF (see Definition 1):

Bℓ ⊄ B1 ∪ B2 ∪ · · · ∪ B_{|S∗|}.

Therefore, there exists 1 ≤ i′ ≤ m s.t. i′ ∈ Bℓ and i′ ∉ Bj for all j ∈ S∗. Furthermore, b_{i′} = 1{Σ_{j: i′∈Bj} x∗_j > 0} = 0. Therefore, min_{i∈Bℓ} bi = 0, i.e., ℓ ∉ S (see Step 4 of Algorithm 2), and it follows that S ⊆ S∗.

Proof of S∗ ⊆ S: Now consider any ℓ ∈ S∗. For all i ∈ Bℓ, we have bi = 1{Σ_{j: i∈Bj} x∗_j > 0} ≥ 1{x∗_ℓ > 0} = 1. Therefore, min_{i∈Bℓ} bi > 0, and by Step 5 of Algorithm 2, ℓ ∈ S. Hence, S∗ ⊆ S.

In the presence of arbitrary adversarial noise, the measurements no longer satisfy (3) but are given by

b = sign(Ax∗ + η), (5)

where η ∈ R^m is a sparse vector of outliers and ‖η‖₀ is the number of adversarial errors. In the case of adversarial errors, we use a (d, k, ε)-UFF to construct the measurement matrix as in (2) and the following algorithm to reconstruct x∗.

Algorithm 8 Support recovery algorithm when A is constructed from a (d, k, ε)-UFF
input A : measurement matrix, ε : robustness parameter, b : measurement vector (b = sign(Ax∗ + η))
1: S ← ∅
2: for j = 1, · · · , n do
3:   if |supp(b) ∩ Bj| > |Bj|/2 then
4:     S ← S ∪ {j}
5:   end if
6: end for
output S

Theorem 8 shows that Algorithm 8 recovers supp(x∗) even in the presence of at most (1/2 − ε)d adversarial errors.

Theorem 8. Suppose x∗ ∈ R^n_{≥0} is a vector of non-negative elements s.t. ‖x∗‖₀ ≤ k, A is a sensing matrix constructed according to (2), and the measurements are according to (5). Suppose further that the underlying UFF is a (d, k, ε)-UFF and there are up to (1/2 − ε)d adversarial errors in the measurements (i.e., ‖η‖₀ ≤ (1/2 − ε)d, where η is as in (5)). Then, the set S returned by Algorithm 8 satisfies S = supp(x∗).

Proof. The proof of this theorem is along the lines of the proof of Theorem 3. Let S∗ = supp(x∗). We know that |S∗| ≤ k. Wlog, assume that the non-zero elements are in the first k dimensions of x∗, i.e., S∗ = {1, 2, · · · , |S∗|}.

We show S = S∗ by first proving S ⊆ S∗ and then S∗ ⊆ S.

Proof of S ⊆ S∗: Consider any ℓ ∉ S∗. Since A is constructed from F, which is a (d, k, ε)-UFF (see Definition 3):

|Bℓ ∩ (B1 ∪ B2 ∪ · · · ∪ B_{|S∗|})| < ε|Bℓ| = εd.

Since there are at most (1/2 − ε)d adversarial errors, we have

|supp(b) ∩ Bℓ| < εd + (1/2 − ε)d = d/2 = |Bℓ|/2.

So, by Step 4 of Algorithm 8, we have ℓ ∉ S. Hence, S ⊆ S∗.


Proof of S∗ ⊆ S: Now consider any ℓ ∈ S∗. For every i ∈ Bℓ \ supp(η), we have

bi = 1{Σ_{j: i∈Bj} x∗_j > 0} ≥ 1{x∗_ℓ > 0} = 1.

So, |supp(b) ∩ Bℓ| > (1 − ε)d − (1/2 − ε)d = d/2 = |Bℓ|/2, and by Step 5 of Algorithm 8, ℓ ∈ S. Hence, S∗ ⊆ S.

B. Proofs of Expanders section

In this section, we prove Theorem 5, for which we need the following lemma.

Lemma 2. With the sensing matrix A constructed as in Section 3.2.2 and b = sign(Ax∗), where x∗ is a k-sparse vector, we have |supp(b)| > (1 − 2ε)d|S∗|, where S∗ = supp(x∗).

Proof of Lemma 2. Since |S∗| < k + 1, we have |N(S∗)| > (1 − ε)d|S∗| by the expansion property. Now, N(S∗) can be partitioned into N1(S∗) and N>1(S∗), where N1(S∗) are the vertices in N(S∗) with only one neighbor in S∗ and N>1(S∗) are the vertices in N(S∗) with at least two neighbors in S∗.

So the number of edges between S∗ and N(S∗) is d|S∗| ≥ |N1(S∗)| + 2|N>1(S∗)|. Also, |N(S∗)| = |N1(S∗)| + |N>1(S∗)| > (1 − ε)d|S∗|. Eliminating |N>1(S∗)|, we obtain |N1(S∗)| > (1 − 2ε)d|S∗|. Also, N1(S∗) ⊆ supp(b). Hence, |supp(b)| > (1 − 2ε)d|S∗|.

Proof of Theorem 5. We first prove S∗ ⊆ S. Let j ∈ supp(x∗). Then |N(j) ∪ supp(b)| ≤ |N(S∗ ∪ {j})| ≤ d|S∗|. Using Lemma 2 with the above inequality, we get |N(j) ∩ supp(b)| > (1 − 2ε)d|S∗| − d(|S∗| − 1). As ε < 1/(8k), |N(j) ∩ supp(b)| > 3d/4. Hence, Step 4 of Algorithm 3 will add j to S, and hence S∗ ⊆ S.

We now prove S ⊆ S∗. Let j ∉ S∗; then |S∗ ∪ {j}| ≤ k + 1. Using the expansion property,

(1 − ε)d(|S∗| + 1) < |N(S∗ ∪ {j})| ≤ |N(S∗)| + |N(j)| − |N(j) ∩ N(S∗)| ≤ d|S∗| + d − |N(j) ∩ N(S∗)|,

which implies |N(j) ∩ N(S∗)| < εd(|S∗| + 1) ≤ εd(k + 1) < d/4.

As supp(b) ⊆ N(S∗), |N(j) ∩ supp(b)| < d/4. Hence, Step 4 of Algorithm 3 will not add j to S. Hence, S ⊆ S∗.

C. Proof of the Divide and Conquer Approach

Proof of Theorem 6. Let r = log k, z = Px∗, and zℓ = z((ℓ − 1)n/k, · · · , ℓn/k − 1), i.e., the ℓ-th block of z. Now,

Pr[‖zℓ‖₀ > r] ≤ (k choose r) (1/k)^r ≤ (e/r)^r,

where the second inequality follows from Stirling's approximation. By the union bound, we have

Pr[∃ℓ : ‖zℓ‖₀ > r] ≤ k (e/r)^r = e^{−Ω(log k)}.

So ‖zℓ‖₀ is at most O(log k) for all ℓ with probability at least 1 − e^{−Ω(log k)}. The theorem now follows using Theorem 3.

D. GraDeS

This section is almost entirely from (Garg & Khandekar, 2009), presented here for the sake of completeness. Before we present the GraDeS algorithm, we have the following definition:

Definition 5. Let Hk : R^n → R^n be the function that sets all but the k largest coordinates in absolute value to zero. More precisely, for x ∈ R^n, let π be a permutation of [n] such that |x_{π(1)}| ≥ |x_{π(2)}| ≥ · · · ≥ |x_{π(n)}|. Then Hk(x) is the vector x′ where x′_{π(i)} = x_{π(i)} for i ≤ k and x′_{π(i)} = 0 for i ≥ k + 1.

Algorithm 9 GraDeS (Garg & Khandekar, 2009)
input z, A1, γ and ε
1: Initialize x ← 0
2: while ‖z − A1x‖₂ > ε do
3:   x ← Hk(x + (1/γ) A1ᵀ(z − A1x))
4: end while
output x

The following theorem, which shows the correctness of Algorithm 9, is a restatement of Theorem 2.3 from (Garg & Khandekar, 2009).

Theorem 9. Suppose x∗ is a k-sparse vector satisfying z = A1x∗ + e for an error vector e ∈ R^{m′}, and the isometry constant of the matrix A1 satisfies δ2k < 1/3. There exists a constant D > 0 that depends only on δ2k such that Algorithm 9 with γ = 1 + δ2k computes a k-sparse vector x ∈ R^n satisfying ‖x∗ − x‖ ≤ D‖e‖ in at most

(1 / log((1 − δ2k)/(4δ2k))) · log(‖z‖₂ / ‖e‖₂)

iterations. Moreover, for δ2k < 1/6, we can choose the constant D to be 6.


E. Recovery using Gaussian Measurements

Here we state a theorem from (Jacques et al., 2011) which guarantees that all unit vectors which agree with the 1-bit measurements obtained from a random Gaussian matrix must be very close to each other.

Theorem 10 (Theorem 2 of (Jacques et al., 2011)). Let A ∈ R^{m×n} be a matrix generated as A ∼ N^{m×n}(0, 1). Fix 0 < η ≤ 1 and ε > 0. If the number of measurements m satisfies

m > (8/ε) k log(16n/(εη)),

then with probability 1 − η, for all k-sparse vectors x and y:

sign(Ax) = sign(Ay) ⇒ ‖x/‖x‖₂ − y/‖y‖₂‖₂ ≤ ε.

F. Proof of the Two-stage Algorithm (Algorithm 6)

Here we prove Theorem 7, which establishes the correctness of the Two-stage algorithm (Algorithm 6).

Proof of Theorem 7. We prove the theorem by analyzing both stages of our algorithm.

Stage 1: Let z∗ = A1x∗. As b = sign(A2z∗), the pairs (a2(i), bi) for all i are linearly separable, and hence using linear programming we can find a vector z consistent with the measurements b, i.e., b = sign(A2z). Using Theorem 10,

‖z∗/‖z∗‖₂ − z/‖z‖₂‖₂ < ε. (6)

Stage 2: In Stage 2 of Algorithm 6, we run GraDeS with inputs z/‖z‖₂ and A1. Now, using (6):

z/‖z‖₂ = A1 x∗/‖A1x∗‖₂ + η,

where ‖η‖₂ ≤ ε. Also, since A1 satisfies RIP with δ2k < 1/6, using the recovery result for GraDeS (Theorem 9, Appendix D), the recovered vector x satisfies:

‖x − x∗/‖A1x∗‖₂‖₂ ≤ 6ε.

That is,

‖x‖₂² + ‖x∗‖₂²/‖A1x∗‖₂² − 2xᵀx∗/‖A1x∗‖₂ ≤ 36ε².

Dividing by ‖x‖₂‖x∗‖₂/‖A1x∗‖₂,

‖x‖₂‖A1x∗‖₂/‖x∗‖₂ + ‖x∗‖₂/(‖x‖₂‖A1x∗‖₂) − 2xᵀx∗/(‖x‖₂‖x∗‖₂) ≤ 36ε² ‖A1x∗‖₂/(‖x‖₂‖x∗‖₂).

Using the fact that t + 1/t ≥ 2 and using RIP,

2 − 2xᵀx∗/(‖x‖₂‖x∗‖₂) ≤ 36ε²(1 + δ2k)/‖x‖₂.

Also, ‖x‖₂ ≥ ‖x∗/‖A1x∗‖₂‖₂ − 6ε ≥ 1/(1 + δ2k) − 6ε. So we have

‖x∗/‖x∗‖₂ − x/‖x‖₂‖₂² < 36ε²(1 + δ2k)/(1/(1 + δ2k) − ε)

⇒ ‖x∗/‖x∗‖₂ − x/‖x‖₂‖₂ < 20ε

for ε < 1/4.

G. Lower Bound on Reconstruction Error

The following is a lower bound on the reconstruction error of any approximate recovery algorithm, from (Jacques et al., 2011).

Theorem 11 (Theorem 1 of (Jacques et al., 2011)). Let ‖x∗‖₀ ≤ k, ‖x∗‖₂ = 1, b = sign(Ax∗), A ∈ R^{m×n}, and let x = Δ_{1bit}(b, A, k) be the unit vector reconstructed by some recovery algorithm Δ_{1bit} based on b, A, k. Then the worst case reconstruction error satisfies sup_{x∗} ‖x − x∗‖₂ ≥ k/(em).