
Constrained optimization

DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis
http://www.cims.nyu.edu/~cfgranda/pages/OBDA_fall17/index.html

Carlos Fernandez-Granda


Compressed sensing

Convex constrained problems

Analyzing optimization-based methods


Magnetic resonance imaging

[Figure: 2D DFT (magnitude) and 2D DFT (log of magnitude)]


Magnetic resonance imaging

Data: Samples from spectrum

Problem: Sampling is time-consuming (annoying, kids move...)

Images are compressible (sparse in wavelet basis)

Can we recover compressible signals from less data?


2D wavelet transform



Full DFT matrix



Regular subsampling



Random subsampling



Toy example


Regular subsampling


Random subsampling


Linear inverse problems

Linear inverse problem

A~x = ~y

Linear measurements, A ∈ R^{m×n}:

~y[i] = ⟨A_{i:}, ~x⟩,  1 ≤ i ≤ m

Aim: Recover the signal ~x ∈ R^n from the data ~y ∈ R^m

We need m ≥ n, otherwise the problem is underdetermined

If m < n there are infinitely many solutions ~x + ~w, where ~w ∈ null(A)


Sparse recovery

Aim: Recover sparse ~x from linear measurements

A~x = ~y

When is the problem well posed?

There should not be two distinct sparse vectors ~x1 and ~x2 such that A~x1 = A~x2

Spark

The spark of a matrix is the size of the smallest subset of columns that is linearly dependent

Let ~y := A~x*, where A ∈ R^{m×n}, ~y ∈ R^m and ~x* ∈ R^n is a sparse vector with s nonzero entries

The vector ~x* is the only vector with sparsity s consistent with the data, i.e. it is the solution of

min_{~x} ||~x||_0 subject to A~x = ~y

for any choice of ~x* if and only if

spark(A) > 2s
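The spark condition can be checked by brute force on very small examples. Below is a minimal sketch (my own illustration, not part of the slides) that verifies spark(A) > 2s by testing that every set of 2s columns is linearly independent; the cost grows combinatorially, consistent with the hardness noted later in the deck.

```python
# Brute-force check that spark(A) > k, i.e. every subset of k columns of A
# is linearly independent. Only feasible for tiny matrices.
import itertools
import numpy as np

def spark_exceeds(A, k):
    m, n = A.shape
    if k > m:
        return False  # any k > m columns are linearly dependent
    return all(
        np.linalg.matrix_rank(A[:, cols]) == k
        for cols in itertools.combinations(range(n), k)
    )

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 10))
s = 2
print(spark_exceeds(A, 2 * s))  # True almost surely for a Gaussian matrix
```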

Proof

Equivalent statements:

▸ For any ~x*, ~x* is the only vector with sparsity s consistent with the data

▸ For any pair of distinct s-sparse vectors ~x1 and ~x2, A(~x1 − ~x2) ≠ ~0

▸ For any pair of subsets of s indices T1 and T2, A_{T1∪T2} ~α ≠ ~0 for any nonzero ~α ∈ R^{|T1∪T2|}

▸ All submatrices with at most 2s columns have no nonzero vectors in their null space

▸ All submatrices with at most 2s columns are full rank


Restricted-isometry property

Robust version of spark

If two s-sparse vectors ~x1, ~x2 are far, then A~x1, A~x2 should be far

The linear operator should preserve distances (be an isometry) when restricted to act upon sparse vectors

Restricted-isometry property

A satisfies the restricted isometry property (RIP) with constant κ_s if

(1 − κ_s) ||~x||_2 ≤ ||A~x||_2 ≤ (1 + κ_s) ||~x||_2

for any s-sparse vector ~x

If A satisfies the RIP for sparsity level 2s, then for any s-sparse ~x1, ~x2

||~y2 − ~y1||_2 = ||A(~x2 − ~x1)||_2 ≥ (1 − κ_{2s}) ||~x2 − ~x1||_2

Regular subsampling

Correlation with column 20

[Figure: correlation of column 20 with the other columns (regular subsampling); correlation values range from 0.0 to 1.0]

Random subsampling

Correlation with column 20

[Figure: correlation of column 20 with the other columns (random subsampling)]


Restricted-isometry property

Deterministic matrices tend to not satisfy the RIP

It is NP-hard to check whether the spark or RIP conditions hold

Random matrices satisfy the RIP with high probability

We prove it for Gaussian iid matrices; the ideas in the proof for random Fourier matrices are similar

Restricted-isometry property for Gaussian matrices

Let A ∈ R^{m×n} be a random matrix with iid standard Gaussian entries

(1/√m) A satisfies the RIP for a constant κ_s with probability 1 − C_2/n as long as

m ≥ (C_1 s / κ_s²) log(n/s)

for two fixed constants C_1, C_2 > 0


Proof

For a fixed support T of size s, the bounds follow from bounds on the singular values of Gaussian matrices

Singular values of an m × s Gaussian matrix, s = 100

[Figure: σ_i/√m for i = 1, ..., s and m/s ∈ {2, 5, 10, 20, 50, 100, 200}; the singular values concentrate around 1 as m/s grows]

Singular values of an m × s Gaussian matrix, s = 1000

[Figure: σ_i/√m for i = 1, ..., s and m/s ∈ {2, 5, 10, 20, 50, 100, 200}; the concentration around 1 is tighter than for s = 100]
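The two figures can be reproduced numerically. Here is a small sketch (my own, assuming numpy) showing how σ_i/√m clusters around 1 as the aspect ratio m/s grows:

```python
# Singular values of an m x s iid Gaussian matrix, scaled by 1/sqrt(m):
# they cluster around 1 as the aspect ratio m/s increases.
import numpy as np

rng = np.random.default_rng(0)
s = 100
for ratio in (2, 5, 20, 100):
    m = ratio * s
    A = rng.standard_normal((m, s))
    sv = np.linalg.svd(A, compute_uv=False) / np.sqrt(m)
    print(f"m/s = {ratio:3d}: largest = {sv[0]:.3f}, smallest = {sv[-1]:.3f}")
```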

Proof

For a fixed submatrix the singular values are bounded by

√m (1 − κ_s) ≤ σ_s ≤ σ_1 ≤ √m (1 + κ_s)

with probability at least

1 − 2 (12/κ_s)^s exp(−m κ_s² / 32)

For any vector ~x with support T

√(1 − κ_s) ||~x||_2 ≤ (1/√m) ||A~x||_2 ≤ √(1 + κ_s) ||~x||_2

Union bound

For any events S_1, S_2, ..., S_n in a probability space

P(∪_i S_i) ≤ Σ_{i=1}^n P(S_i)

Proof

Number of different supports of size s:

(n choose s) ≤ (en/s)^s

By the union bound,

√(1 − κ_s) ||~x||_2 ≤ (1/√m) ||A~x||_2 ≤ √(1 + κ_s) ||~x||_2

holds for any s-sparse vector ~x with probability at least

1 − 2 (en/s)^s (12/κ_s)^s exp(−m κ_s² / 32)

= 1 − exp( log 2 + s + s log(n/s) + s log(12/κ_s) − m κ_s² / 32 )

≥ 1 − C_2/n

as long as m ≥ (C_1 s / κ_s²) log(n/s)

Sparse recovery via ℓ1-norm minimization

ℓ0-"norm" minimization is intractable

(As usual) we can minimize the ℓ1 norm instead: the estimate ~x_{ℓ1} is the solution to

min_{~x} ||~x||_1 subject to A~x = ~y
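As a concrete illustration, a sketch using the cvxpy modeling package (an assumption; any convex solver would do): ℓ1-norm minimization typically recovers a sparse vector exactly from random Gaussian measurements.

```python
# Sparse recovery by l1-norm minimization: min ||x||_1 subject to Ax = y.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 40, 100, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x_true

x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y]).solve()
print("recovery error:", np.linalg.norm(x.value - x_true))  # close to 0
```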

Minimum ℓ2-norm solution (regular subsampling)

[Figure: true signal and minimum ℓ2 solution]

Minimum ℓ1-norm solution (regular subsampling)

[Figure: true signal and minimum ℓ1 solution]

Minimum ℓ2-norm solution (random subsampling)

[Figure: true signal and minimum ℓ2 solution]

Minimum ℓ1-norm solution (random subsampling)

[Figure: true signal and minimum ℓ1 solution]

Geometric intuition

[Figure: the ℓ2-norm ball vs. the ℓ1-norm ball]

Sparse recovery via ℓ1-norm minimization

If the signal is sparse in a transform domain W, then we solve

min_{~c} ||~c||_1 subject to AW~c = ~y

If we want to recover the original coefficients ~c*, then AW should satisfy the RIP

However, we might be fine with any ~c′ such that W~c′ = ~x*

Regular subsampling

Minimum ℓ2-norm solution (regular subsampling)

Minimum ℓ1-norm solution (regular subsampling)

Random subsampling

Minimum ℓ2-norm solution (random subsampling)

Minimum ℓ1-norm solution (random subsampling)


Compressed sensing

Convex constrained problems

Analyzing optimization-based methods

Convex sets

A convex set S is any set such that for any ~x, ~y ∈ S and θ ∈ (0, 1)

θ~x + (1 − θ)~y ∈ S

The intersection of convex sets is convex

Convex vs nonconvex

[Figure: a nonconvex set and a convex set]

Epigraph

[Figure: a function f and its epigraph epi(f)]

A function is convex if and only if its epigraph is convex

Projection onto convex set

The projection of any vector ~x onto a non-empty closed convex set S,

P_S(~x) := argmin_{~y ∈ S} ||~x − ~y||_2,

exists and is unique
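For some convex sets the projection has a closed form. Two standard examples (my own sketch, not from the slides): projecting onto the ℓ∞-norm ball is entrywise clipping, and projecting onto the ℓ2-norm ball is rescaling.

```python
import numpy as np

def project_linf_ball(x, radius=1.0):
    # Clip every entry to [-radius, radius]
    return np.clip(x, -radius, radius)

def project_l2_ball(x, radius=1.0):
    # Rescale x onto the sphere if it lies outside the ball
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

x = np.array([2.0, -0.5, 1.5])
print(project_linf_ball(x))  # [ 1.  -0.5  1. ]
print(project_l2_ball(x))    # x scaled to unit l2 norm
```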

Proof

Assume there are two distinct projections ~y1 ≠ ~y2

Consider

~y′ := (~y1 + ~y2)/2

~y′ belongs to S (why?)

Proof

⟨~x − ~y′, ~y1 − ~y′⟩ = ⟨~x − (~y1 + ~y2)/2, ~y1 − (~y1 + ~y2)/2⟩

= ⟨(~x − ~y1)/2 + (~x − ~y2)/2, (~x − ~y1)/2 − (~x − ~y2)/2⟩

= (1/4) (||~x − ~y1||_2² − ||~x − ~y2||_2²)

= 0 because ||~x − ~y1||_2 = ||~x − ~y2||_2

By Pythagoras' theorem

||~x − ~y1||_2² = ||~x − ~y′||_2² + ||~y1 − ~y′||_2²

= ||~x − ~y′||_2² + ||(~y1 − ~y2)/2||_2²

> ||~x − ~y′||_2²

which contradicts the assumption that ~y1 is a projection

Convex combination

Given n vectors ~x1, ~x2, ..., ~xn ∈ R^n,

~x := Σ_{i=1}^n θ_i ~x_i

is a convex combination of ~x1, ~x2, ..., ~xn if

θ_i ≥ 0, 1 ≤ i ≤ n,  and  Σ_{i=1}^n θ_i = 1

Convex hull

The convex hull of S is the set of convex combinations of points in S

The ℓ1-norm ball is the convex hull of the intersection between the ℓ0-"norm" ball and the ℓ∞-norm ball

ℓ1-norm ball

[Figure: the ℓ1-norm ball]

B_ℓ1 ⊆ C(B_ℓ0 ∩ B_ℓ∞)

Let ~x ∈ B_ℓ1

Set θ_i := |~x[i]| for 1 ≤ i ≤ n, and θ_0 := 1 − Σ_{i=1}^n θ_i

Σ_{i=0}^n θ_i = 1 by construction, θ_i ≥ 0, and

θ_0 = 1 − Σ_{i=1}^n θ_i = 1 − ||~x||_1 ≥ 0 because ~x ∈ B_ℓ1

~x ∈ C(B_ℓ0 ∩ B_ℓ∞) because

~x = Σ_{i=1}^n θ_i sign(~x[i]) ~e_i + θ_0 ~0

C(B_ℓ0 ∩ B_ℓ∞) ⊆ B_ℓ1

Let ~x ∈ C(B_ℓ0 ∩ B_ℓ∞), then

~x = Σ_{i=1}^m θ_i ~y_i  for some ~y_i ∈ B_ℓ0 ∩ B_ℓ∞

||~x||_1 ≤ Σ_{i=1}^m θ_i ||~y_i||_1   by the triangle inequality

≤ Σ_{i=1}^m θ_i ||~y_i||_∞   each ~y_i only has one nonzero entry

≤ Σ_{i=1}^m θ_i

≤ 1

Convex optimization problem

f_0, f_1, ..., f_m, h_1, ..., h_p : R^n → R

minimize    f_0(~x)
subject to  f_i(~x) ≤ 0,  1 ≤ i ≤ m,
            h_i(~x) = 0,  1 ≤ i ≤ p

Definitions

▸ A feasible vector is a vector that satisfies all the constraints

▸ A solution is any vector ~x* such that for all feasible vectors ~x, f_0(~x) ≥ f_0(~x*)

▸ If a solution exists, f_0(~x*) is the optimal value or optimum of the problem

Convex optimization problem

The optimization problem is convex if

▸ f_0 is convex

▸ f_1, ..., f_m are convex

▸ h_1, ..., h_p are affine, i.e. h_i(~x) = ~a_i^T ~x + b_i for some ~a_i ∈ R^n and b_i ∈ R

Linear program

minimize    ~a^T ~x
subject to  ~c_i^T ~x ≤ d_i,  1 ≤ i ≤ m
            A~x = ~b

ℓ1-norm minimization as an LP

The optimization problem

minimize    ||~x||_1
subject to  A~x = ~b

can be recast as the LP

minimize    Σ_{i=1}^n ~t[i]
subject to  ~t[i] ≥ ~e_i^T ~x
            ~t[i] ≥ −~e_i^T ~x
            A~x = ~b
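A sketch of this recast with scipy.optimize.linprog (an assumption: any LP solver works), stacking the variables as z = [~x; ~t] ∈ R^{2n}:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 20, 50
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true

c = np.concatenate([np.zeros(n), np.ones(n)])   # objective: sum_i t[i]
I = np.eye(n)
A_ub = np.block([[I, -I], [-I, -I]])            # x - t <= 0 and -x - t <= 0
b_ub = np.zeros(2 * n)
A_eq = np.hstack([A, np.zeros((m, n))])         # A x = b (no constraint on t)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=[(None, None)] * (2 * n), method="highs")
x_lp = res.x[:n]
print("l1 norm of solution:", np.abs(x_lp).sum())
```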

Proof

Solution to the ℓ1-norm min. problem: ~x^{ℓ1}

Solution to the linear program: (~x^{lp}, ~t^{lp})

Set ~t^{ℓ1}[i] := |~x^{ℓ1}[i]|

(~x^{ℓ1}, ~t^{ℓ1}) is feasible for the linear program, so

||~x^{ℓ1}||_1 = Σ_{i=1}^n ~t^{ℓ1}[i]

≥ Σ_{i=1}^n ~t^{lp}[i]   by optimality of ~t^{lp}

≥ ||~x^{lp}||_1

~x^{lp} is a solution to the ℓ1-norm min. problem

Proof

Set ~t^{ℓ1}[i] := |~x^{ℓ1}[i]|

Σ_{i=1}^n ~t^{ℓ1}[i] = ||~x^{ℓ1}||_1

≤ ||~x^{lp}||_1   by optimality of ~x^{ℓ1}

≤ Σ_{i=1}^n ~t^{lp}[i]

(~x^{ℓ1}, ~t^{ℓ1}) is a solution to the linear program

Quadratic program

For a positive semidefinite matrix Q ∈ R^{n×n}

minimize    ~x^T Q ~x + ~a^T ~x
subject to  ~c_i^T ~x ≤ d_i,  1 ≤ i ≤ m,
            A~x = ~b

ℓ1-norm regularized least squares as a QP

The optimization problem

minimize ||A~x − ~y||_2² + α ||~x||_1

can be recast as the QP

minimize    ~x^T A^T A ~x − 2~y^T A~x + α Σ_{i=1}^n ~t[i]
subject to  ~t[i] ≥ ~e_i^T ~x
            ~t[i] ≥ −~e_i^T ~x
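A sketch of the same problem solved directly with cvxpy (an assumption; equivalently one could hand the QP above to a quadratic-programming solver):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 60
A = rng.standard_normal((m, n))
x_true = np.concatenate([np.ones(4), np.zeros(n - 4)])
y = A @ x_true + 0.1 * rng.standard_normal(m)

alpha = 1.0
x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.sum_squares(A @ x - y) + alpha * cp.norm1(x))).solve()
print("estimated nonzeros:", int(np.sum(np.abs(x.value) > 1e-6)))
```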

Lagrangian

The Lagrangian of a canonical optimization problem is

L(~x, ~α, ~ν) := f_0(~x) + Σ_{i=1}^m ~α[i] f_i(~x) + Σ_{j=1}^p ~ν[j] h_j(~x),

where ~α ∈ R^m, ~ν ∈ R^p are called Lagrange multipliers or dual variables

If ~x is feasible and ~α[i] ≥ 0 for 1 ≤ i ≤ m, then

L(~x, ~α, ~ν) ≤ f_0(~x)

Lagrange dual function

The Lagrange dual function of the problem is

l(~α, ~ν) := inf_{~x ∈ R^n} f_0(~x) + Σ_{i=1}^m ~α[i] f_i(~x) + Σ_{j=1}^p ~ν[j] h_j(~x)

Let p* be an optimum of the optimization problem; then

l(~α, ~ν) ≤ p*

as long as ~α[i] ≥ 0 for 1 ≤ i ≤ m

Dual problem

The dual problem of the (primal) optimization problem is

maximize    l(~α, ~ν)
subject to  ~α[i] ≥ 0,  1 ≤ i ≤ m

The dual problem is always convex, even if the primal isn't!

Maximum/supremum of convex functions

The pointwise maximum of m convex functions f_1, ..., f_m,

f_max(x) := max_{1≤i≤m} f_i(x),

is convex

The pointwise supremum of a family of convex functions indexed by a set I,

f_sup(x) := sup_{i∈I} f_i(x),

is convex

Proof

For any 0 ≤ θ ≤ 1 and any ~x, ~y,

f_sup(θ~x + (1 − θ)~y) = sup_{i∈I} f_i(θ~x + (1 − θ)~y)

≤ sup_{i∈I} { θ f_i(~x) + (1 − θ) f_i(~y) }   by convexity of the f_i

≤ θ sup_{i∈I} f_i(~x) + (1 − θ) sup_{j∈I} f_j(~y)

= θ f_sup(~x) + (1 − θ) f_sup(~y)


Weak duality

If p∗ is a primal optimum and d∗ a dual optimum

d∗ ≤ p∗

Strong duality

For convex problems

d* = p*

under very weak conditions

LPs: the primal optimum is finite

General convex programs (Slater's condition): there exists a point that is strictly feasible,

f_i(~x) < 0,  1 ≤ i ≤ m

ℓ1-norm minimization

The dual problem of

min_{~x} ||~x||_1 subject to A~x = ~y

is

max_{~ν} ~y^T ~ν subject to ||A^T ~ν||_∞ ≤ 1
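This primal-dual pair can be checked numerically. A sketch (assuming cvxpy) that solves both problems and compares the optimal values, which coincide by strong duality:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
m, n = 15, 40
A = rng.standard_normal((m, n))
y = A @ np.concatenate([rng.standard_normal(3), np.zeros(n - 3)])

x = cp.Variable(n)
primal_val = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y]).solve()

nu = cp.Variable(m)
dual_val = cp.Problem(cp.Maximize(y @ nu), [cp.norm_inf(A.T @ nu) <= 1]).solve()
print(primal_val, dual_val)  # equal up to solver tolerance
```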

Proof

Lagrangian: L(~x, ~ν) = ||~x||_1 + ~ν^T (~y − A~x)

Lagrange dual function:

l(~ν) := inf_{~x ∈ R^n} ||~x||_1 − (A^T~ν)^T ~x + ~ν^T ~y

What if |(A^T~ν)[i]| > 1? We can set ~x[i] → ±∞ so that l(~ν) → −∞

What if ||A^T~ν||_∞ ≤ 1? Then

(A^T~ν)^T ~x ≤ ||~x||_1 ||A^T~ν||_∞ ≤ ||~x||_1

so l(~ν) = ~ν^T ~y

Strong duality

The solution ~ν* to

max_{~ν} ~y^T ~ν subject to ||A^T ~ν||_∞ ≤ 1

satisfies

(A^T ~ν*)[i] = sign(~x*[i]) for all ~x*[i] ≠ 0

for all solutions ~x* to the primal problem

min_{~x} ||~x||_1 subject to A~x = ~y

Dual solution

[Figure: true signal sign and minimum ℓ1 dual solution]

Proof

By strong duality

||~x*||_1 = ~y^T ~ν* = (A~x*)^T ~ν* = (~x*)^T (A^T ~ν*) = Σ_{i=1}^n (A^T ~ν*)[i] ~x*[i]

By Hölder's inequality

||~x*||_1 ≥ Σ_{i=1}^n (A^T ~ν*)[i] ~x*[i]

with equality if and only if

(A^T ~ν*)[i] = sign(~x*[i]) for all ~x*[i] ≠ 0

Another algorithm for sparse recovery

Aim: Find the nonzero locations of a sparse vector ~x from ~y = A~x

Insight: We have access to the inner product of ~x and A^T~w for any ~w:

~y^T ~w = (A~x)^T ~w = ~x^T (A^T ~w)

Idea: Maximize ~y^T ~w, bounding the magnitude of the entries of A^T~w by 1

Entries where ~x is nonzero should saturate to 1 or −1


Compressed sensing

Convex constrained problems

Analyzing optimization-based methods


Analyzing optimization-based methods

Best case scenario: Primal solution has closed form

Otherwise: Use dual solution to characterize primal solution

Minimum ℓ2-norm solution

Let A ∈ R^{m×n} be a full-rank matrix with m < n

For any ~y ∈ R^m, the solution to the optimization problem

argmin_{~x} ||~x||_2 subject to A~x = ~y

is

~x* := V S^{−1} U^T ~y = A^T (A A^T)^{−1} ~y

where A = U S V^T is the SVD of A
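A sketch verifying the closed form (assuming numpy): the SVD expression, the formula A^T (A A^T)^{−1} ~y, and numpy's minimum-norm least-squares solution all coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 12
A = rng.standard_normal((m, n))   # full rank with probability 1
y = rng.standard_normal(m)

U, S, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ y) / S)                   # V S^{-1} U^T y
x_formula = A.T @ np.linalg.solve(A @ A.T, y)    # A^T (A A^T)^{-1} y
x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]   # minimum-norm solution
print(np.allclose(x_svd, x_formula), np.allclose(x_svd, x_lstsq))  # True True
```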

Proof

~x = P_{row(A)} ~x + P_{row(A)⊥} ~x

Since A is full rank, P_{row(A)} ~x = V~c for some vector ~c ∈ R^m

A~x = A P_{row(A)} ~x = U S V^T V ~c = U S ~c

so A~x = ~y is equivalent to U S ~c = ~y, i.e. ~c = S^{−1} U^T ~y

Proof

For all feasible vectors ~x,

P_{row(A)} ~x = V S^{−1} U^T ~y

By Pythagoras' theorem, minimizing ||~x||_2 is equivalent to minimizing

||~x||_2² = ||P_{row(A)} ~x||_2² + ||P_{row(A)⊥} ~x||_2²,

which is achieved by setting P_{row(A)⊥} ~x = ~0


Regular subsampling

Minimum ℓ2-norm solution (regular subsampling)

[Figure: true signal and minimum ℓ2 solution]

Regular subsampling

A := (1/√2) [F_{m/2}  F_{m/2}],   F_{m/2}^* F_{m/2} = F_{m/2} F_{m/2}^* = I

~x := [~x_up; ~x_down]

Regular subsampling

~x_{ℓ2} = argmin_{A~x=~y} ||~x||_2

= A^T (A A^T)^{−1} ~y

= (1/√2) [F_{m/2}^*; F_{m/2}^*] ( (1/√2) [F_{m/2} F_{m/2}] (1/√2) [F_{m/2}^*; F_{m/2}^*] )^{−1} (1/√2) [F_{m/2} F_{m/2}] [~x_up; ~x_down]

= (1/2) [F_{m/2}^*; F_{m/2}^*] ( (1/2) (F_{m/2} F_{m/2}^* + F_{m/2} F_{m/2}^*) )^{−1} (F_{m/2} ~x_up + F_{m/2} ~x_down)

= (1/2) [F_{m/2}^*; F_{m/2}^*] I^{−1} (F_{m/2} ~x_up + F_{m/2} ~x_down)

= (1/2) [F_{m/2}^* (F_{m/2} ~x_up + F_{m/2} ~x_down); F_{m/2}^* (F_{m/2} ~x_up + F_{m/2} ~x_down)]

= (1/2) [~x_up + ~x_down; ~x_up + ~x_down]
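A quick numerical check of this computation (a sketch; assuming scipy, whose dft with scale="sqrtn" gives a unitary DFT matrix): the minimum ℓ2-norm solution averages the two halves of the signal.

```python
import numpy as np
from scipy.linalg import dft

half = 8
F = dft(half, scale="sqrtn")              # unitary DFT matrix, F F^* = I
A = np.hstack([F, F]) / np.sqrt(2)
x = np.random.default_rng(0).standard_normal(2 * half)
y = A @ x

x_l2 = A.conj().T @ np.linalg.solve(A @ A.conj().T, y)
avg = (x[:half] + x[half:]) / 2
print(np.allclose(x_l2, np.concatenate([avg, avg])))  # True
```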

Minimum ℓ1-norm solution

Problem: argmin_{A~x=~y} ||~x||_1 does not have a closed form

Instead, we can use a dual variable to certify optimality

Dual solution

The solution ~ν* to

max_{~ν} ~y^T ~ν subject to ||A^T ~ν||_∞ ≤ 1

satisfies

(A^T ~ν*)[i] = sign(~x*[i]) for all ~x*[i] ≠ 0,

where ~x* is a solution to the primal problem

min_{~x} ||~x||_1 subject to A~x = ~y

Dual certificate

If there exists a vector ~ν ∈ R^m such that

(A^T ~ν)[i] = sign(~x*[i])   if ~x*[i] ≠ 0

|(A^T ~ν)[i]| < 1   if ~x*[i] = 0

then ~x* is the unique solution to the primal problem

min_{~x} ||~x||_1 subject to A~x = ~y

as long as the submatrix A_T, whose columns are indexed by the support T of ~x*, is full rank
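These conditions are easy to check numerically for a candidate ~ν. A sketch (plain numpy, my own helper, not from the slides):

```python
import numpy as np

def is_dual_certificate(A, nu, x_star, tol=1e-8):
    """Check the dual-certificate conditions for a candidate nu and primal x_star."""
    T = np.abs(x_star) > tol                 # support of x_star
    corr = A.T @ nu
    on_support = np.allclose(corr[T], np.sign(x_star[T]), atol=1e-6)
    off_support = np.all(np.abs(corr[~T]) < 1 - tol)
    full_rank = np.linalg.matrix_rank(A[:, T]) == T.sum()
    return on_support and off_support and full_rank
```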

Proof 1

~ν is feasible for the dual problem, so for any primal feasible ~x

||~x||_1 ≥ ~y^T ~ν = (A~x*)^T ~ν = (~x*)^T (A^T ~ν) = Σ_{i∈T} ~x*[i] sign(~x*[i]) = ||~x*||_1

so ~x* must be a solution

Proof 2

A^T~ν is a subgradient of the ℓ1 norm at ~x*

For any other feasible vector ~x

||~x||_1 ≥ ||~x*||_1 + (A^T~ν)^T (~x − ~x*) = ||~x*||_1 + ~ν^T (A~x − A~x*) = ||~x*||_1


Random subsampling

Minimum ℓ1-norm solution (random subsampling)

[Figure: true signal and minimum ℓ1 solution]

Exact sparse recovery via ℓ1-norm minimization

Assumption: There exists a signal ~x* ∈ R^n with s nonzeros such that

A~x* = ~y

for a random A ∈ R^{m×n} (random Fourier, Gaussian iid, ...)

Exact recovery: If the number of measurements satisfies

m ≥ C′ s log n

the solution of the problem

minimize ||~x||_1 subject to A~x = ~y

is the original signal with probability at least 1 − 1/n

Proof

Show that a dual certificate always exists

We need

A_T^T ~ν = sign(~x*_T)   (s constraints)

||A_{T^c}^T ~ν||_∞ < 1

Idea: Impose A_T^T ~ν = sign(~x*_T) and minimize ||A_{T^c}^T ~ν||_∞

Problem: No closed-form solution

How about minimizing the ℓ2 norm?

Proof of exact recovery

Prove that a dual certificate exists for any s-sparse ~x*

Dual-certificate candidate: solution of

minimize    ||~v||_2
subject to  A_T^T ~v = sign(~x*_T)

Closed-form solution: ~ν_{ℓ2} := A_T (A_T^T A_T)^{−1} sign(~x*_T)

A_T^T A_T is invertible with high probability

We need to prove that ~ν_{ℓ2} satisfies ||(A^T ~ν_{ℓ2})_{T^c}||_∞ < 1
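A sketch of this construction on a random Gaussian instance (assuming numpy), checking that the off-support correlations stay strictly below 1:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, s = 60, 200, 4
A = rng.standard_normal((m, n)) / np.sqrt(m)
T = rng.choice(n, size=s, replace=False)
signs = rng.choice([-1.0, 1.0], size=s)

A_T = A[:, T]
nu = A_T @ np.linalg.solve(A_T.T @ A_T, signs)   # nu_l2 = A_T (A_T^T A_T)^{-1} sign
corr = A.T @ nu
print("on support:", np.allclose(corr[T], signs))            # True by construction
print("max off support:", np.abs(np.delete(corr, T)).max())  # < 1 with high prob.
```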

Dual certificate

[Figure: sign pattern and dual function]

Proof of exact recovery

To control (A^T ~ν_{ℓ2})_{T^c}, we need to bound

A_i^T A_T (A_T^T A_T)^{−1} sign(~x*_T)   for i ∈ T^c

Let ~w := A_T (A_T^T A_T)^{−1} sign(~x*_T)

|A_i^T ~w| can be bounded using the independence of A_i and ~w

The result then follows from the union bound