Sparse Solutions of Underdetermined Linear Equations by Linear Programming
David Donoho & Jared Tanner
Stanford University, Department of Statistics; University of Utah, Department of Mathematics
Arizona State University, March 6th, 2006
Underdetermined systems, dictionary perspective

Ax = b, A ∈ R^{d×n}, d < n

▸ Least squares solution via the "canonical dual", x = A^T(AA^T)^{-1}b
  • Linear reconstruction, not signal adaptive
  • Solution vector is full: n nonzero entries in x (see the sketch after this slide)
▸ Eschew redundancy; find a simple model of the data from A
▸ Seek the sparsest solution, ‖x‖_{ℓ0} := # nonzero entries:
  min ‖x‖_{ℓ0} subject to Ax = b
▸ Combinatorial cost for the naive approach
▸ Efficient nonlinear (signal-adaptive) methods
  • Greedy (local) and Basis Pursuit (global)
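A minimal NumPy sketch of the first two bullets, under illustrative assumptions (random Gaussian A, a k-sparse generator x0, names chosen here, not in the talk): the minimum ℓ2-norm solution x = A^T(AA^T)^{-1}b, computed via the pseudoinverse, comes back generically full even though b was generated by a 5-sparse vector.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n, k = 50, 200, 5                                   # underdetermined: d < n
    A = rng.standard_normal((d, n))
    x0 = np.zeros(n)
    x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)   # k-sparse generator
    b = A @ x0

    # minimum l2-norm solution x = A^T (A A^T)^{-1} b, here via the pseudoinverse
    x_dense = np.linalg.pinv(A) @ b
    print(np.count_nonzero(x0))                            # 5
    print(np.count_nonzero(np.abs(x_dense) > 1e-10))       # 200: full, not sparse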
Greedy [Temlyakov, DeVore, Tropp, ...]

▸ Orthogonal Matching Pursuit (sketched in code after this slide): initialize r = b, A_S = [ ]; while r ≠ 0:
  j := arg max_i |a_i^T r|  (the ℓ∞ maximizer of A^T r),
  A_S := [A_S a_j],  r := b − A_S(A_S^T A_S)^{-1}A_S^T b
▸ Nonlinear selection of a basis, x = (A_S^T A_S)^{-1}A_S^T b; ‖x‖_{ℓ0} ≤ d
▸ Highly redundant dictionaries often give fast decay of the residual
▸ Recovery of the sparsest solution?
  • examples of arbitrary sub-optimality for a fixed dictionary A [Temlyakov, DeVore, S. Chen, Tropp, ...]
  • residual nonzero for fewer than d steps, regardless of sparsity [Chen]
▸ Recovers the sparsest solution if sufficiently sparse, O(√d) [Tropp]
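A minimal NumPy sketch of the OMP iteration above; the helper name omp, the stopping tolerance, and the synthetic test are illustrative assumptions, not part of the talk.

    import numpy as np

    def omp(A, b, tol=1e-10):
        """Orthogonal Matching Pursuit: greedily grow a column set A_S,
        refit on A_S by least squares, repeat until the residual vanishes."""
        d, n = A.shape
        support, r = [], b.copy()
        coef = np.zeros(0)
        while np.linalg.norm(r) > tol and len(support) < d:
            j = int(np.argmax(np.abs(A.T @ r)))            # l_inf maximizer of A^T r
            if j in support:                               # already selected: no progress
                break
            support.append(j)
            A_S = A[:, support]
            coef, *_ = np.linalg.lstsq(A_S, b, rcond=None) # (A_S^T A_S)^{-1} A_S^T b
            r = b - A_S @ coef                             # residual orthogonal to span(A_S)
        x = np.zeros(n)
        x[support] = coef
        return x

    # tiny recovery test: k = 5 nonzeros, d = 50 measurements, n = 200 columns
    rng = np.random.default_rng(1)
    d, n, k = 50, 200, 5
    A = rng.standard_normal((d, n))
    A /= np.linalg.norm(A, axis=0)                         # unit-norm columns
    x0 = np.zeros(n)
    x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    print(np.allclose(omp(A, A @ x0), x0, atol=1e-8))      # expect True for such sparse x0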
Basis Pursuit

▸ Rather than solve ℓ0 (combinatorial), solve ℓ1 (use LP):
  min ‖x‖_{ℓ1} subject to Ax = b
  • Global basis selection rather than greedy local selection (see the LP sketch after this slide)
▸ Example: A = [A1 A2], two ONBs with coherence µ := max_{i,j} |⟨a_i, a_j⟩|
  • If ‖x‖_{ℓ0} ≲ .914(1 + µ^{-1}) then ℓ1 → ℓ0 [Elad, Bruckstein]
  • Coherence µ ≥ 1/√d [Candes, Romberg], so success only for highly sparse, O(√d), signals
▸ Is the story over? Can the O(√d) threshold be overcome? Yes!
▸ Examples of success: partial Fourier and Laplace, ‖x‖_{ℓ0} ≲ ⌊d/2⌋
▸ More to come for typical (random) matrices
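A minimal sketch of Basis Pursuit as a linear program, assuming SciPy's linprog (HiGHS backend) is available: split x = u − v with u, v ≥ 0, so min ‖x‖_{ℓ1} subject to Ax = b becomes a standard-form LP. The helper name basis_pursuit is illustrative.

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(A, b):
        """min ||x||_1 s.t. Ax = b, recast as an LP via x = u - v, u, v >= 0."""
        d, n = A.shape
        c = np.ones(2 * n)                       # 1^T u + 1^T v equals ||x||_1 at the optimum
        A_eq = np.hstack([A, -A])                # A u - A v = b
        res = linprog(c, A_eq=A_eq, b_eq=b,
                      bounds=[(0, None)] * (2 * n), method="highs")
        u, v = res.x[:n], res.x[n:]
        return u - v

On the same kind of synthetic pair (A, b = Ax0) as in the OMP sketch, basis_pursuit returns x0 whenever x0 is sparse enough, which is exactly the threshold question addressed in the rest of the talk.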
Sparsity threshold and the sampling matrix, A

Deterministic:
▸ Worst-case (coherence) bounds give O(√d), that is, success only for highly sparse signals
▸ Some special cases of success, partial Fourier and Laplace, ‖x‖_{ℓ0} ≲ ⌊d/2⌋ [Donoho, T]

Random: overcome the O(√d) threshold for most A
▸ Recent order bounds for random ortho-projectors
  • ℓ1 → ℓ0 if ‖x‖_{ℓ0} ≲ O(d/log(n/d)) [Candes, Tao, Romberg; Vershynin, Rudelson]
  • OMP → ℓ0 if ‖x‖_{ℓ0} ≲ O(d/log(n)) [Tropp]
▸ What is the precise ℓ1 sparsity threshold for random matrices?
▸ Computing random inner products, "correlation with noise"

Why solve this problem? Are there applications?
Motivation for systems with random A

Compressed Sensing [Donoho; Candes, Tao]:
▸ Transform Φ in which the signal has sparse coefficients x, signal s = Φx. Can s be recovered from few measurements?
▸ Yes: from nonadaptive measurements, recover the sparse coefficients. Sample the signal with AΦ, where A is random d × n, d < n:
  min ‖x‖_{ℓ1} subject to measurements AΦx = b
  (a small end-to-end sketch follows this slide)
▸ Coming to a digital camera near you [Baraniuk]

Phase transition as a function of measurements (aspect ratio):
▸ Fix the aspect ratio δ = d/n ∈ (0, 1), where A ∈ R^{d×n}; sparsity threshold ‖x‖_{ℓ0} ≤ ρ(δ)d, ρ(δ) ∈ (0, 1)
▸ Phase transition as n → ∞: with overwhelming probability ℓ1 → ℓ0
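A small end-to-end sketch of this sampling scheme, under illustrative assumptions not stated in the talk: the signal is sparse in an orthonormal DCT basis Φ, A is a random Gaussian d × n matrix, and recovery uses the same LP reformulation as the Basis Pursuit sketch above (repeated here so the block stands alone).

    import numpy as np
    from scipy.fft import idct
    from scipy.optimize import linprog

    rng = np.random.default_rng(2)
    n, d, k = 128, 48, 5
    Phi = idct(np.eye(n), axis=0, norm="ortho")        # columns: orthonormal DCT basis vectors
    x0 = np.zeros(n)
    x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)   # sparse coefficients
    signal = Phi @ x0

    A = rng.standard_normal((d, n)) / np.sqrt(d)       # random, nonadaptive sampling matrix
    b = A @ signal                                     # d < n measurements of the signal

    M = A @ Phi                                        # effective matrix on the coefficients
    c = np.ones(2 * n)                                 # min ||x||_1 s.t. M x = b, via x = u - v
    res = linprog(c, A_eq=np.hstack([M, -M]), b_eq=b,
                  bounds=[(0, None)] * (2 * n), method="highs")
    x_hat = res.x[:n] - res.x[n:]
    print(np.linalg.norm(Phi @ x_hat - signal))        # expect ~0: k/d is far below threshold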
Neighborliness and constrained ℓ1 minimization

Theorem. Let A be a d × n matrix, d < n. The following two properties of A are equivalent:
▸ The polytope AT has n vertices and is outwardly k-neighborly;
▸ Whenever y = Ax has a nonnegative solution x0 having at most k nonzeros, x0 is the unique nonnegative solution to y = Ax, and so the unique solution to the constrained ℓ1 minimization problem.

Lemma (Neighborliness and face numbers). Suppose the polytope P = AT has n vertices and is outwardly k-neighborly. Then
  ∀ ℓ = 0, ..., k − 1, ∀ F ∈ F_ℓ(T^{n−1}): AF ∈ F_ℓ(AT).
Conversely, suppose the above equation holds; then P = AT has n vertices and is outwardly k-neighborly.
Strong threshold, random A and all x0

Expected number of faces under a random ortho-projector [Affentranger, Schneider]:
  E f_k(AT) = f_k(T) − 2 ∑_{s≥0} ∑_{F∈F_k(T)} ∑_{G∈F_{d+1+2s}(T)} β(F, G) γ(G, T),
where β and γ are the internal and external angles, respectively.

Theorem (Strong threshold). Let ρ < ρ_N(δ) and let A = A_{d,n} be a uniformly-distributed random projection from R^n to R^d, with d ≥ δn. Then
  Prob{f_ℓ(AT^{n−1}) = f_ℓ(T^{n−1}), ℓ = 0, ..., ⌊ρd⌋} → 1, as n → ∞.

⇒ P is k-neighborly for k = ⌊(ρ_N(δ) − ε)d⌋
⇒ With overwhelming probability on A (f_ℓ(T^{n−1}) − E f_ℓ(AT^{n−1}) ≤ πn e^{−εn}), for every x0 with ‖x0‖_{ℓ0} ≤ ⌊(ρ_N(δ) − ε)d⌋, y = Ax0 generates an instance of the constrained ℓ1 minimization problem with x0 as its unique solution.
Phase Transition, Strong (non-negative)
ℓ1 → ℓ0 if ‖x0‖_{ℓ0} ≤ ⌊(ρ_N(δ) − ε)d⌋ and x0 ≥ 0
[Figure: strong threshold ρ_N(δ) plotted against δ ∈ (0, 1)]
▸ As δ → 0, ρ_N(δ) ∼ [2e log(1/δ)]^{−1}
Weak threshold, random A and most x0

Theorem (Vershik-Sporyshev). Let d = d(n) ∼ δn and let A = A_{d,n} be a uniform random projection from R^n to R^d. Then for a sequence k = k(n) with k/d ∼ ρ, ρ < ρ_VS(δ), we have
  f_k(AT^{n−1}) = f_k(T^{n−1})(1 + o_P(1)).

Theorem. Let A be a d × n matrix, d < n, in general position. For 1 ≤ k ≤ d − 1, these two properties of A are equivalent:
▸ The polytope P = AT has at least (1 − ε) times as many zero-free (k − 1)-faces as T;
▸ Among all problem instances (y, A) generated by some nonnegative vector x0 with at most k nonzeros, constrained ℓ1 minimization recovers the sparsest solution, except in a fraction ≤ ε of instances.
Phase Transition, Weak (non-negative)
ℓ1 → ℓ0 if ‖x‖_{ℓ0} ≤ ⌊(ρ_VS(δ) − ε)d⌋ and x ≥ 0
[Figure: weak threshold ρ_VS(δ) plotted against δ ∈ (0, 1)]
▸ Asymptotic limit of empirical tests (example shown later)
▸ As δ → 0, ρ_VS(δ) ∼ [2 log(1/δ)]^{−1}
▸ Typically (weak case) an e-times less strict sparsity requirement than the strong case as δ → 0
Phase Transitions, ℓ1 → ℓ0 if ‖x‖_{ℓ0} < ρ(δ)d

Two modalities from the random sampling perspective:
▸ Weak threshold: random signal, drawn independently of the measurement matrix
▸ Strong threshold: worst-case signal for a given measurement matrix
[Figure: weak and strong thresholds ρ(δ) versus δ ∈ (0, 1); non-negative signal x [Donoho, T]]
[Figure: same axes; solid for the non-negative case, dashed for a signed signal [Donoho]]
Some precise numbers and implications

           δ = .1     δ = .25    δ = .5     δ = .75    δ = .9
  ρ_N^+    .060131    .087206    .133457    .198965    .266558
  ρ_W^+    .240841    .364970    .558121    .765796    .902596
  ρ_N^±    .048802    .065440    .089416    .117096    .140416
  ρ_W^±    .188327    .266437    .384803    .532781    .677258

▸ For most A: measuring 1/10 of a non-negative signal (δ = .1) recovers every signal that is 6% sparse, and most signals that are 24% sparse (nonzeros counted relative to d).
▸ Half 'under-sampling', i.e., δ = 1/2: apply ℓ1; if a non-negative solution has fewer than 55% · d nonzeros, it is typically the sparsest solution.
▸ Encode (1 − δ)n bits of information in a signal of length n. Can recover with fewer than δ·ρ_W^±(δ)·n accidental errors, or δ·ρ_N^±(δ)·n malicious errors.
  • At twice redundancy, tolerate 19% random error, 4.4% malicious (arithmetic below).
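A quick check of the last bullet's percentages, under the reading that "twice redundancy" means n/((1 − δ)n) = 2, i.e. δ = 1/2, using the signed-case rows of the table:

\[
\delta\,\rho^{\pm}_{W}(\tfrac12)\,n = \tfrac12 \times 0.384803\,n \approx 0.19\,n \quad\text{(accidental errors)},
\qquad
\delta\,\rho^{\pm}_{N}(\tfrac12)\,n = \tfrac12 \times 0.089416\,n \approx 0.044\,n \quad\text{(malicious errors)}.
\]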
Empirical verification of weak transitions
[Figure: empirical fraction of successful ℓ1 → ℓ0 recoveries on a (δ, ρ) grid; left panel: non-negative signal, right panel: signed signal]
n = 200, 40 × 40 mesh with 60 random tests per node.
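A coarse Monte Carlo sketch of the experiment summarized in the caption above (non-negative case), assuming SciPy's linprog is available: a 5 × 5 grid instead of the 40 × 40 mesh, with the same 60 random trials per node. The grid, tolerances, and the helper name recovers are illustrative.

    import numpy as np
    from scipy.optimize import linprog

    def recovers(d, n, k, rng):
        """One trial: random Gaussian A, non-negative k-sparse x0,
        solve min 1^T x s.t. Ax = b, x >= 0, and test exact recovery."""
        A = rng.standard_normal((d, n))
        x0 = np.zeros(n)
        x0[rng.choice(n, k, replace=False)] = rng.random(k) + 0.1   # strictly positive entries
        b = A @ x0
        res = linprog(np.ones(n), A_eq=A, b_eq=b,
                      bounds=[(0, None)] * n, method="highs")
        return res.success and np.linalg.norm(res.x - x0) < 1e-6 * (1 + np.linalg.norm(x0))

    n, trials = 200, 60
    rng = np.random.default_rng(3)
    for delta in (0.1, 0.3, 0.5, 0.7, 0.9):
        d = int(round(delta * n))
        for rho in (0.1, 0.3, 0.5, 0.7, 0.9):
            k = max(1, int(round(rho * d)))
            frac = np.mean([recovers(d, n, k, rng) for _ in range(trials)])
            print(f"delta={delta:.1f} rho={rho:.1f} success={frac:.2f}")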
Ingredients of the proofs, non-negative

Proof (main ideas):
▸ Given x0 ≥ 0 with ‖x0‖_{ℓ1} = 1 and ‖x0‖_{ℓ0} = k
▸ x0 lies on a (k − 1)-face (say F) of the unit simplex T^{n−1}
▸ b = Ax0 is either on the boundary (AF ∈ F_{k−1}(AT)) or inside AT
▸ If on the boundary, then x0 is the unique solution with ‖x‖_{ℓ1} ≤ 1 and Ax = b, so ℓ1 → ℓ0
▸ If b is in the interior of P = AT, then ∃ x with ‖x‖_{ℓ1} < 1 and Ax = b
▸ If every face of T^{n−1} of dimension at most k − 1 survives as a face of P, then ℓ1 → ℓ0 for all ‖x‖_{ℓ0} ≤ k

ρ_N: Prob(f_ℓ(AT^{n−1}) = f_ℓ(T^{n−1}), ℓ = 0, ..., ⌊ρ_N d⌋) → 1;
that is, f_ℓ(T^{n−1}) − E f_ℓ(AT^{n−1}) ≤ πn e^{−εn}

ρ_W: E f_ℓ(AT^{n−1}) ≥ (1 − ε) f_ℓ(T^{n−1}), ℓ = 0, ..., ⌊ρ_W d⌋

Robustness: if there is a nearby sparse solution, ‖Ax0 − b‖_2 ≤ ε, then solve
  min ‖x^{1,ε}‖_{ℓ1} such that ‖Ax^{1,ε} − b‖_2 ≤ ε.
Then ‖x0 − x^{1,ε}‖_2 ≤ C(k, A)ε, where k = ‖x0‖_{ℓ0} (a small convex-programming sketch follows).
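A sketch of the robustness program in the last line, assuming the cvxpy package is available (the talk does not prescribe a solver); the helper name bpdn and its arguments are illustrative.

    import numpy as np
    import cvxpy as cp

    def bpdn(A, b, eps):
        """min ||x||_1  subject to  ||A x - b||_2 <= eps  (noise-tolerant Basis Pursuit)."""
        x = cp.Variable(A.shape[1])
        problem = cp.Problem(cp.Minimize(cp.norm1(x)),
                             [cp.norm(A @ x - b, 2) <= eps])
        problem.solve()
        return x.value

For ‖Ax0 − b‖_2 ≤ ε with a sufficiently sparse x0, the returned x lies within C(k, A)·ε of x0, as the robustness statement above asserts.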
Summary

▸ Underdetermined system, Ax = b, with A ∈ R^{d×n} and d < n
▸ To obtain the sparsest solution, min ‖x‖_{ℓ0}, solve the constrained min ‖x‖_{ℓ1}
▸ Precise sparsity phase transitions ρ(d/n) are available for ℓ1 → ℓ0
▸ That is, if ‖x‖_{ℓ0} < ρ(d/n) · d then min ‖x‖_{ℓ1} → min ‖x‖_{ℓ0}
▸ Surprisingly large transition; hence the effectiveness of Basis Pursuit (ℓ1)

Associated papers for the non-negative case [Donoho, T]:
▸ Sparse Nonnegative Solution of Underdetermined Linear Equations by Linear Programming, Proc. Natl. Acad. Sci.
▸ Neighborliness of Randomly-Projected Simplices in High Dimensions, Proc. Natl. Acad. Sci.
  • See also work by Donoho; Candes, Romberg, Tao; Tropp
Thank you for your time