7/30/2019 Data Compressing
1/163
Lecture Note (Data Compression-CS) by Prof. Byeungwoo Jeon (2012 Fall)
Digital Media Lab.
Data Compression
(ECE 5546-41)
Introduction to Compressive Sensing
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
2012 Fall
Course Introduction
- Data compression (before)
  - Main text: Introduction to Data Compression (3rd Ed) (K. Sayood)
- Main topics
  - Mathematical preliminaries for lossless compression
  - Huffman coding
  - Arithmetic coding
  - Dictionary techniques
  - Context-based compression
  - Lossless image compression
- Data compression (this semester): Compressed Sensing
- Texts:
  - R. Baraniuk, M. Davenport, M. Duarte, C. Hegde, An Introduction to Compressive Sensing, Connexions Web site, http://cnx.org/content/col11133/1.5/, Apr 2, 2011.
  - Compressed Sensing: Theory and Applications, edited by Y. C. Eldar and G. Kutyniok
  - Lecture Note, Introduction to Compressed Sensing, Spring 2011, by Prof. Heung-No Lee (http://infonet.gist.ac.kr)
  - Selected papers
Major Subjects to Cover
How to Study
- Basic framework of the course
  - Lecture (2 hours)
  - Paper investigation (1 hour): student presentation
    - Each student should study thoroughly and present at least one paper.
    - It should be completely understood by the presenter before the presentation.
    - A list of papers will be provided by the instructor; however, students may suggest a preferred paper.
- Grading policy
  - Attendance 10%
  - Project/Presentation 20%
  - Homework 10%
  - Exam (Midterm 30% + Final 30%) 60%
Very Brief Introduction to CS
(modified from a file by Igor Carron (version 2, draft) at
https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxpZ29yY2Fycm9uMnxneDoxYmNkZjU5MWQ2NmJkOGUy)
Solving linear equations
- Solving linear equations (Y: measured; X: unknown; A: from the signal model):

  Y = AX

- Solving for X is straightforward unless the matrix A is non-invertible.
- A nonlinear system can be approximated by a linear system of equations.
- A continuous system can be discretized into a linear system of equations.
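As a minimal sketch of the well-posed case above (square, invertible A; the matrix and vector values here are made up for illustration):

```python
import numpy as np

# Hypothetical 3x3 example: A comes from a signal model, Y is measured.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
X_true = np.array([1.0, -2.0, 0.5])
Y = A @ X_true                 # the measurement

# When A is square and invertible, Y = AX has a unique solution.
X = np.linalg.solve(A, Y)
print(np.allclose(X, X_true))  # True
```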
Rf: Regularization (3)
- In statistics and machine learning, regularization is used to prevent overfitting. Typical examples of regularization in statistical machine learning include ridge regression, the lasso, and the L2-norm in support vector machines.
- Regularization methods are also used for model selection, where they work by implicitly or explicitly penalizing models based on the number of their parameters. For example, Bayesian learning methods make use of a prior probability that (usually) gives lower probability to more complex models. Well-known model selection techniques include the Akaike information criterion (AIC), minimum description length (MDL), and the Bayesian information criterion (BIC). Alternative methods of controlling overfitting that do not involve regularization include cross-validation.

http://en.wikipedia.org/wiki/Regularization_(mathematics)
Compressed Sensing
- An instance of an underdetermined system of linear equations is a compressed sensing system:

  Y = AX

  - Y ~ compressed measurements (few)
  - A ~ sensing (in the form of linear combinations)
  - X ~ original information (what we would like to find)
- The recovery of a sparse solution to an underdetermined system of linear equations is performed using compressed sensing reconstruction techniques/solvers.
- Key question: do all underdetermined systems of linear equations admit a very sparse and unique solution?
  - Answer: some systems do, under a condition (RIP, NSP, ...).
- Issues to study (this semester):
  - Mathematical background
  - Checking the condition
  - Recovery algorithms
  - Implementing the algorithms
  - Applications
Data Compression
(ECE 5546-41)
2012 Fall
Ch2. Sparse and Compressible Signal Models
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
What we like to cover in this class
(Algorithms for sparse analysis, Lecture I: Background on sparse approximation, by Anna C. Gilbert, Department of Mathematics, University of Michigan)
Underdetermined linear equations
- Solving linear equations (Y: measured; X: unknown; A: from the signal model):

  Y = AX

- Underdetermined case:
  - Too few equations and too many unknowns means an infinite number of solutions (i.e., matrix A cannot be inverted as in the square case).
- CS tries to solve this under the condition of sparseness.
  - Compressed sensing reconstruction techniques allow one to find a solution that is sparse, i.e., one that has very few non-zero elements (the rest of the elements are zeros).
- We need to evaluate the fitness of a solution: this requires a measure (~ norm).
- We need to formally define the signal model, sparseness, compressibility, etc.
  - This requires many concepts from linear algebra.
Vector in 2-D (or 3-D) World
- Vector: a directed line segment (direction & magnitude), from an initial point to a terminal point
  - In 2-space
  - In 3-space
- Vector addition and scalar multiplication
- Vector norm: if v is a vector, then the magnitude of the vector is called the norm of the vector and is denoted ||v||. Furthermore, if v is a vector in 2-space (or in 3-space), then

  ||v|| = sqrt(v1^2 + v2^2)  (in 2-space);   ||v|| = sqrt(v1^2 + v2^2 + v3^2)  (in 3-space)

- Dot product: if u and v are two vectors in 2-space (or 3-space), and the angle between them is θ, then the dot product is defined as

  u · v = ||u|| ||v|| cos(θ)

  - It is sometimes called the scalar product or Euclidean inner product.
Extension to N-space (1)
- Definition of n-space: for a given positive integer n, an ordered n-tuple is a sequence of n real numbers denoted by (a1, a2, ..., an). The complete set of all ordered n-tuples is called n-space and is denoted by R^n.
  - It is a natural extension of 2-space and 3-space.
- Definition of arithmetic operations in n-space.
Extension to N-space (2)
- Definition of the Euclidean inner product: for two vectors u = (u1, u2, ..., un) and v = (v1, v2, ..., vn) in R^n, the Euclidean inner product is defined as

  u · v = <u, v> = sum_{i=1}^{n} ui vi

  - It is a natural extension of the dot product in 2-space.
  - It can be written in matrix form as follows (suppose u and v are column vectors):

  u · v = v^T u

- Note that when we add addition, scalar multiplication, and the Euclidean inner product to n-space, it is often called Euclidean n-space.
Extension to N-space (3)
- Let's extend the concepts of norm and distance to n-space.
- Definition: for a vector u = (u1, u2, ..., un) in R^n, the Euclidean norm is defined as

  ||u|| = sqrt(u · u) = sqrt(sum_{i=1}^{n} ui^2)

- Definition: for two vectors u, v in R^n, the Euclidean distance between the two points indicated by the two vectors is defined as

  d(u, v) = ||u - v|| = sqrt(sum_{i=1}^{n} (ui - vi)^2)
Generalization to Vector Space
- Up to now, we have had a good geometric analogy, especially in 2-space (or 3-space), coming from the notion that a vector is interpreted as a directed line segment.
- A vector, however, is a much more general concept, and it doesn't necessarily have to represent a directed line segment as before.
  - For example, a vector can be a matrix or a function, and those are only a couple of the possibilities for vectors.
  - Nor does a vector have to be one of the vectors we looked at in R^n (that is, a vector may not be in R^n; it is a more general object).
- The concept of n-space is now generalized into a vector space.
  - A vector space is nothing more than a collection of vectors (whatever those now are) that satisfies a set of axioms.
  - Once we get the general definitions of a vector and a vector space out of the way, we'll look at many of the important ideas that come with vector spaces.
Vector Space (1)
- Definition: let V be a set on which addition and scalar multiplication are defined (this means that if u and v are objects in V and c is a scalar, then we've defined u + v and cu in some way). If the following axioms hold for all objects u, v, w in V and all scalars c and k, then V is called a vector space and the objects in V are called vectors.
  (a) u + v is in V (closure under addition).
  (b) cu is in V (closure under scalar multiplication).
  (c) u + v = v + u (commutativity of addition).
  (d) u + (v + w) = (u + v) + w (associativity of addition).
  (e) There is a special object in V, denoted 0 and called the zero vector, such that for all u in V we have u + 0 = u.
  (f) For every u in V there is another object in V, denoted -u and called the negative of u, such that u + (-u) = 0.
  (g) c(u + v) = cu + cv (distributivity).
  (h) (c + k)u = cu + ku.
  (i) c(ku) = (ck)u.
  (j) 1u = u.
- A vector space is simply a collection of vectors satisfying the axioms above.
Vector Space (2)
- Notes
  - There is no need to be locked into the standard ways of defining addition and scalar multiplication. For the most part we will be doing addition and scalar multiplication in a fairly standard way, but there will be the occasional example where we won't.
  - In order for something to be a vector space, it simply must have an addition and a scalar multiplication that meet the above axioms, and it doesn't matter how strange the addition or scalar multiplication might be.
  - When the scalars in the definition are complex numbers, it is called a complex vector space. In the same way, when we restrict the scalars to real numbers we generally call the vector space a real vector space.
- Ex1: if n is any positive integer, then the set V = R^n with the standard addition and scalar multiplication as defined in the Euclidean n-space section is a vector space.
- Ex2: show that the set V = R^2 with the standard scalar multiplication and an addition defined as

  (u1, u2) + (v1, v2) = (u1 + 2v1, u2 + v2)

  is not a vector space.
Rf: Signal and Vector Space
- Many natural and man-made systems can be modeled well as linear.
- We model such linear structure using a linear model by treating a signal as a vector in a vector space.
  - The vector space model can capture the linear structure well.
  - This modeling allows us to apply intuitions and tools from the geometry of 3-space, such as length, distance, and angles.
  - This is useful when the signal lives in high-dimensional or infinite-dimensional spaces.
Inner Product
- Generalization of the concept of the dot product (or inner product) in n-space to a general vector space.
Norm in Vector Space
- A norm is a function that assigns a strictly positive length or size to all vectors in a vector space, other than the zero vector (which has zero length assigned to it).
  - A simple example is the 2-dimensional Euclidean space R^2 equipped with the Euclidean norm. The Euclidean norm assigns to each vector the length of the vector. Because of this, the Euclidean norm is often known as the magnitude.
  - A vector space with a norm is called a normed vector space.
- Definition of norm: given a vector space V over a subfield F of the complex numbers, a norm on V is a function p: V → R with the following properties: for all a in F and all u, v in V,
  - P1: p(av) = |a| p(v) (positive homogeneity or positive scalability).
  - P2: p(u + v) ≤ p(u) + p(v) (triangle inequality).
  - P3: if p(v) = 0, then v is the zero vector (separates points).
- A simple consequence of the first two axioms, positive homogeneity and the triangle inequality, is p(0) = 0 and thus p(v) ≥ 0 (positivity).
Examples of Norm (2)
- Zero norm: in signal processing and statistics, David Donoho referred to the zero "norm" with quotation marks:

  ||u||_0 = |supp(u)|,  where supp(u) = {i : ui ≠ 0}

  - supp(x): the support of x (the set of indices of the non-zero components of x).
- Following Donoho's notation, the zero "norm" of x is simply the number of non-zero coordinates of x, or the Hamming distance of the vector from zero.
  - When this "norm" is localized to a bounded set, it is the limit of the p-norms as p approaches 0.
  - Of course, the zero "norm" is not a B-norm, because it is not positively homogeneous. It is not even an F-norm, because it is discontinuous, jointly and severally, with respect to the scalar argument in scalar-vector multiplication and with respect to its vector argument.
  - Abusing terminology, some engineers omit Donoho's quotation marks and inappropriately call the number-of-nonzeros function the L0 norm (sic), also misusing the notation for the Lebesgue space of measurable functions.
http://en.wikipedia.org/wiki/Norm_(mathematics)
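A quick numeric illustration of the definition above (the vector is made up):

```python
import numpy as np

x = np.array([0.0, 3.0, 0.0, -1.5, 0.0])

support = np.flatnonzero(x)   # supp(x): indices of the non-zero entries
zero_norm = support.size      # ||x||_0 = |supp(x)|
print(zero_norm)              # 2
```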
Examples of Norm (3)
- p-norm (for p ≥ 1, a real number):

  ||u||_p = ( sum_{i=1}^{n} |ui|^p )^(1/p)

  - Note that for p = 1 we get the taxicab norm, for p = 2 we get the Euclidean norm, and as p approaches infinity the p-norm approaches the infinity norm or maximum norm.
  - This definition is still of some interest for 0 < p < 1, but the resulting function does not define a norm, because it violates the triangle inequality.
Examples of Norm (4)
- Lp norm: for p ∈ [1, ∞],

  ||u||_p = ( sum_{i=1}^{n} |ui|^p )^(1/p),   p ∈ [1, ∞)
  ||u||_∞ = max_{i=1,...,n} |ui|,             p = ∞
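The three special cases named above can be checked numerically (the example vector is made up):

```python
import numpy as np

u = np.array([3.0, -4.0, 0.0])

l1   = np.linalg.norm(u, 1)       # taxicab norm: |3| + |-4| + |0| = 7
l2   = np.linalg.norm(u, 2)       # Euclidean norm: sqrt(9 + 16) = 5
linf = np.linalg.norm(u, np.inf)  # maximum norm: 4
print(l1, l2, linf)
```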
Properties of Norms
- The concept of the unit circle (the set of all vectors of norm 1) is different in different norms.
  - For the 1-norm, the unit circle in R^2 is a square.
  - For the 2-norm (Euclidean norm), it is the well-known unit circle.
  - For the infinity norm, it is a different square.
  - For any p-norm it is a superellipse (with congruent axes).
  - Due to the definition of the norm, the unit circle is always convex and centrally symmetric (therefore, the unit ball may be a rectangle but cannot be a triangle).
- Illustration of unit circles in different norms.
2.2 BASES AND FRAMES
Linear Independence
- Linear independence
- A finite set of vectors that contains the zero vector will be linearly dependent.
- Suppose that S = {v1, ..., vk} is a set of vectors in R^n. If k > n, then the set of vectors is linearly dependent.
Orthogonality and Basis (2)
- Any vector in an inner product space with an orthogonal/orthonormal basis can be easily represented as a linear combination of those basis vectors.
Orthogonal Complement (1)
- Definition of orthogonal complement: suppose that W is a subspace of an inner product space V. We say that a vector u from V is orthogonal to W if it is orthogonal to every vector in W. The set of all vectors that are orthogonal to W is called the orthogonal complement of W and is denoted by W^⊥.
  - We say that W and W^⊥ are orthogonal complements.
- Theorem
Orthogonal Complement (2)
- Extension of projection
- Theorem
In Matrix Form
- Given a basis set {φi}_{i=1}^{n}, any vector x in R^n is uniquely represented as

  x = sum_{i=1}^{n} ci φi

- Form an n×n matrix Φ with columns given by the φi, and let c denote the length-n vector with entries ci; the matrix representation is

  x = Φc

- An orthonormal basis should satisfy <φi, φj> = δ(i - j).
- Therefore, ci = <x, φi>.
- In matrix form, c = Φ^T x (note that orthonormality means Φ^T Φ = I).
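A small sketch of the relations x = Φc and c = Φ^T x, using a hypothetical 2-D orthonormal basis (a 45° rotation of the standard basis):

```python
import numpy as np

# Hypothetical orthonormal basis of R^2: the standard basis rotated by 45 degrees.
s = 1.0 / np.sqrt(2.0)
Phi = np.array([[s, -s],
                [s,  s]])      # columns are the basis vectors phi_i

x = np.array([2.0, 0.0])
c = Phi.T @ x                  # c_i = <x, phi_i>, since Phi^T Phi = I
x_rec = Phi @ c                # x = Phi c recovers the signal exactly

print(np.allclose(Phi.T @ Phi, np.eye(2)))  # True: orthonormality
print(np.allclose(x_rec, x))                # True: perfect reconstruction
```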
Dictionary
- A dictionary in R^n is a collection Φ = {φi}_{i=1}^{N} of unit-norm vectors: ||φi||_2 = 1.
  - Each element is called an atom.
- If {φi} spans R^n, the dictionary is complete.
- If the {φi} are linearly dependent, the dictionary is redundant.
- In the sparse approximation literature, it is also common for a basis or frame to be referred to as a dictionary or an over-complete dictionary, respectively, with the dictionary elements being called atoms.
2.3 SPARSE REPRESENTATION
K-Sparse
- Definition of K-sparse: a signal is called K-sparse if it has at most K non-zero components, i.e., ||x||_0 ≤ K.
- Note that even if a signal x itself does not look K-sparse, we may still refer to x as being K-sparse, with the understanding that x can be expressed K-sparsely through a linear transformation (x = Ψα with ||α||_0 ≤ K).
- Ex: x(t) = cos(ωt)
  - Time domain
  - Fourier domain
  - DCT domain
Ex: Sparse representation of images
- Sparse representation of an image via a multiscale wavelet transform.
  - Note that most of the wavelet coefficients are close to zero.
- (a) Original image; (b) wavelet representation (larger coefficient → lighter pixel).
- Fig. 1.3 (Compressed Sensing by Y. Eldar et al.)
Ex: Sparse approx. of a natural image
- Sparse approximation of a natural image.
- (a) Original image; (b) approximation keeping only the largest 10% of the wavelet coefficients.
- Fig. 1.4 (Compressed Sensing by Y. Eldar et al.)
Set of K-Sparse Signals
- The set of all K-sparse signals:

  S_K = {x : ||x||_0 ≤ K}

- Q: is the set S_K a linear space?
  - That is, for any pair of vectors x, z in S_K, does x + z also belong to S_K?
  - See Fig. 1.5 (Compressed Sensing by Y. Eldar et al.).
Sparseness of Images
- Most natural images are characterized by large smooth or textured regions with relatively few sharp edges.
  - Signals with this structure are known to be very nearly sparse when represented using a multiscale wavelet approximation.
- K-term approximation
  - We need a measure (i.e., an appropriate norm) to quantify the approximation error.
  - This kind of approximation is non-linear (since the choice of which coefficients to keep depends on the signal itself).
2.4 COMPRESSIBLE SIGNALS
Compressible vs. Sparse
- Few real-world signals are truly sparse; rather, they are compressible (meaning that they can be well approximated by sparse signals).
  - These terms mean the same concept: compressible, approximately sparse, relatively sparse.
- Compressibility is quantified by calculating the error incurred by approximating a signal x by some x̂ ∈ S_K:

  σ_K(x)_p = min_{x̂ ∈ S_K} ||x - x̂||_p

- If x ∈ S_K, then σ_K(x)_p = 0 for any p.
- Thresholding (keeping only the K largest coefficients) gives the optimal approximation for all p.
- Choose a basis set such that the coefficients obey a power-law decay.
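The thresholding rule above can be sketched as follows (the test vector and K are made up):

```python
import numpy as np

def best_k_term(x, K):
    """Keep the K largest-magnitude entries of x and zero out the rest."""
    xk = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-K:]   # indices of the K largest magnitudes
    xk[idx] = x[idx]
    return xk

x = np.array([5.0, -0.1, 2.0, 0.3, -4.0])
xk = best_k_term(x, 2)                 # keeps 5.0 and -4.0
err = np.linalg.norm(x - xk, 2)        # sigma_K(x)_2 for K = 2
print(xk, err)
```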
Compressibility (1)
- Definition of compressibility: a signal is called compressible if its sorted coefficient magnitudes in a basis Ψ decay rapidly: x = Ψα where |α1| ≥ |α2| ≥ ... ≥ |αn|.
- Power-law decay: suppose there exist C1 and q > 0 such that

  |α_s| ≤ C1 s^(-q),   s = 1, 2, ...

  - A larger q means faster magnitude decay, and the more compressible a signal is.
  - Under power-law decay, a signal can be approximated quite well with K ≪ n terms.
Compressibility (2)
- Depending on the space (referred to by Ψ), the signal can be either compressible or not.
  - Therefore, a proper choice of the space is important.
- Q: for such a compressible (K-approximated) signal, there exist constants C2 and r > 0, depending only on C1 and q, such that

  σ_K(x)_2 ≤ C2 K^(-r)
K-term Approximation
- Only the K largest coefficients are kept, while the others are set to zero, to represent the given signal.
- K-term approximation error:

  σ_K(x)_2 = min_{α ∈ S_K} ||x - Ψα||_2
-- end --
Data Compression
(ECE 5546-41)
2012 Fall
Ch 3. Sensing Matrices
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
What are we doing?
Sparse vs. Compressible (1)
- Recall that we call a signal x K-sparse if it has at most K non-zeros:

  S_K = {x : ||x||_0 ≤ K},   where ||x||_0 = |supp(x)| = lim_{p→0} ||x||_p^p

- A K-sparse signal may not itself be sparse, but it admits a sparse representation in some basis:

  x = Ψs

- But few real-world signals are truly sparse.
  - Most signals can be represented as compressible signals.
- Example: K = 4 (x = Ψs, where s has 4 non-zero entries).
Sparse vs. Compressible (2)
- A compressible signal means that the vector of coefficients in a certain basis has few large coefficients, while the other coefficients have small values.
  - If we set the small coefficients to zero, the remaining large coefficients can represent the original signal with hardly noticeable perceptual loss.
Compressed Sensing (1)
- Compressed sensing measurement process:

  y = Φx

  - y: measurement vector (M×1)
  - Φ: measurement (sensing) matrix (M×N)
  - x: input signal vector in its original domain (e.g., time or spatial) (N×1)
  - (CS is also possible for continuous-time signals.)
- Φ represents a dimensionality reduction (it maps R^N into R^M, with M < N).
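A minimal sketch of the measurement process y = Φx (the dimensions, seed, and Gaussian choice of Φ are illustrative assumptions; the deck has not yet specified how Φ should be designed):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 128, 32, 4                   # ambient dim, measurements, sparsity

x = np.zeros(N)                        # a K-sparse signal
x[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # random Gaussian sensing matrix
y = Phi @ x                            # compressed measurement: M << N numbers

print(y.shape)                         # (32,)
```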
Compressed Sensing (3)
- Measurement process with a sparse x:
  - There is a small number of columns corresponding to the nonzero coefficients.
  - The measurement vector y is a linear combination of these columns.
- Ex: the following is an underdetermined system:
  - 4-sparse, N unknowns
  - Fewer equations than unknowns (M < N)
Questions to Answer (2)
- We further need to study many issues:
- Q1: how to design an M×N sensing matrix Φ to ensure that it preserves the information in the signal x?
  - A sensing matrix is designed to reduce the number of measurements as much as possible while allowing the recovery of a wide class of signals x from their measurements.
  - The sensing matrix design problem.
- Q2: how small an M can we choose, given K and N?
- Q3: how sparse does the signal have to be at a given M?
- Q4: how do we recover the original signal x from the measurements y?
  - Look for fast and robust algorithms: the signal recovery problem.
- Q5: when will the L1 convex relaxation solution attain the L0 solution?
Questions to Answer (3)
- After answering the previous questions, we further need to investigate yet other issues:
- How do we incorporate measurement noise into the signal model y = Φx?
  - What would happen to the L1-minimization signal recovery?
  - The reliability issue in signal recovery.
- What would happen if there is a model mismatch?
  - If the signal is not exactly K-sparse, what kind of results do we expect under such an assumption?
Design of Sensing Matrix
1. Null Space Property
2. Restricted Isometry Property
3. Bounded Coherence Property
1. Null Space Conditions
- Q: we would like to design Φ so that we can recover all sparse signals x corresponding to the measurements y. What condition on Φ do we need?
- Definition of the null space of Φ:

  N(Φ) = {z : Φz = 0}

- Uniqueness condition for y = Φx:

  Φ uniquely represents all x ∈ S_K  ⟺  N(Φ) contains no vector in S_2K

- Proof idea: distinct x must produce distinct measurement vectors. If two distinct K-sparse signals x, x' gave the same y, then Φ(x - x') = 0 with x - x' ∈ S_2K; in that case there would be no way to recover all signals x from the measurements y.
Spark
- Definition of spark: the spark of a given matrix Φ is the smallest number of columns of Φ that are linearly dependent.
- Spark:
  - A term coined by Donoho & Elad (2003).
  - It is a way of characterizing the null space of Φ using the L0 norm.
  - It is very complex to obtain (compared to a rank), since it calls for a combinatorial search over all possible subsets of columns of Φ.
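That combinatorial search can be written down directly; this brute-force sketch is only practical for tiny matrices (the 2×3 example is made up):

```python
import numpy as np
from itertools import combinations

def spark(Phi, tol=1e-10):
    """Smallest number of linearly dependent columns of Phi (brute force)."""
    M, N = Phi.shape
    for k in range(1, N + 1):
        for cols in combinations(range(N), k):
            # A subset of k columns is dependent iff its rank is below k.
            if np.linalg.matrix_rank(Phi[:, list(cols)], tol=tol) < k:
                return k
    return N + 1   # no dependent subset: columns in general position

# Made-up 2x3 example: the third column is the sum of the first two.
Phi = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0]])
print(spark(Phi))   # 3
```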
More on Spark (1)
- Solving an underdetermined equation y = Φx (Φ ~ M×N, N >> M): writing y (M×1) as the matrix [φ1 φ2 ... φN] times (x1, x2, ..., xN)^T,

  y = Φx = sum_i φi xi = sum_k φk xk + sum_j φj xj
                         (term A)        (term B)

  - Term A: its corresponding columns of Φ are linearly dependent.
  - Term B: its corresponding columns of Φ are linearly independent.
More on Spark (2)
- Note that any vector x can be represented as x = x_A + x_B, where x_A ∈ N(Φ).
- Note that (the zero vector is not considered, since it is a trivial case):

  2 ≤ spark(Φ) ≤ M + 1
Spark Condition
- Theorem: for any vector y ∈ R^M, there exists at most one signal x ∈ S_K such that y = Φx if and only if spark(Φ) > 2K.
  - (This is an equivalent way of characterizing the null space condition.)
- This theorem guarantees uniqueness of representation for K-sparse signals.
  - It has combinatorial computational complexity, since it must verify that all sets of columns of a certain size are linearly independent.
- (Proved by D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization".)
- The spark provides a complete characterization of when sparse recovery is possible. However, when dealing with approximately sparse signals (i.e., compressible signals), we must consider somewhat more restrictive conditions on the null space of Φ.
Proof of the Spark Condition
- (⇐) Suppose spark(Φ) > 2K. Assume that for some y there exist x, x' ∈ S_K such that y = Φx = Φx'. Letting h = x - x', we can write this as Φh = 0. Since spark(Φ) > 2K, all sets of up to 2K columns of Φ are linearly independent, and therefore h = 0, i.e., x = x'.

Corollary to the Spark Condition
- Corollary: M ≥ 2K.
- Proof: 2 ≤ spark(Φ) ≤ M + 1 and spark(Φ) > 2K together give 2K < M + 1, i.e., 2K ≤ M.
More on the Spark Condition
- The spark provides a complete characterization of when sparse recovery is possible:

  for any y, there exists at most one x ∈ S_K s.t. y = Φx  ⟺  spark(Φ) > 2K

- However, when dealing with approximately sparse signals (i.e., compressible signals), we must consider somewhat more restrictive conditions on the null space of Φ.
  - We must also ensure that N(Φ) does not contain any vectors that are too compressible, in addition to vectors that are sparse: the null space property.
- Notation:
  - Λ: a subset of the indices {1, 2, ..., N}; Λ^C = {1, 2, ..., N} \ Λ.
  - x_Λ: the length-N vector obtained by setting to 0 the entries of x indexed by Λ^C.
  - Φ_{Λ^C}: the M×N matrix obtained by setting to zero the columns indexed by Λ.
2. Null Space Property (NSP)
- Definition of the null space property (NSP) of order K:
  - A matrix Φ satisfies the null space property (NSP) of order K if there exists a constant C > 0 such that

  ||h_Λ||_2 ≤ C ||h_{Λ^C}||_1 / sqrt(K)

  holds for all h ∈ N(Φ) and for all Λ such that |Λ| ≤ K.
- Rf: for h = (h1, h2, h3, h4, ..., hn)^T and Λ = {1, 3}:

  h_Λ = (h1, 0, h3, 0, ..., 0)^T,   h_{Λ^C} = (0, h2, 0, h4, ..., hn)^T,   h = h_Λ + h_{Λ^C}
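The index-set splitting h = h_Λ + h_{Λ^C} used in the definition can be sketched as follows (the vector, index set, and constant C are made up; note the 0-based indices in code):

```python
import numpy as np

h = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Lam = [0, 2]                       # index set Lambda (0-based here)

h_L = np.zeros_like(h)
h_L[Lam] = h[Lam]                  # h_Lambda: keep the entries in Lambda
h_Lc = h - h_L                     # h_{Lambda^C}: the remaining entries

K, C = len(Lam), 2.0               # C is a made-up NSP constant
lhs = np.linalg.norm(h_L, 2)                    # ||h_Lambda||_2
rhs = C * np.linalg.norm(h_Lc, 1) / np.sqrt(K)  # C ||h_{Lambda^C}||_1 / sqrt(K)
print(h_L, h_Lc, lhs <= rhs)
```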
Null Space Property (NSP)
- The NSP

  ||h_Λ||_2 ≤ C ||h_{Λ^C}||_1 / sqrt(K)

  implies that vectors in the null space of Φ should not be too concentrated on a small subset of indices.
- If a vector h ∈ N(Φ) is exactly K-sparse, then there exists a Λ such that h_{Λ^C} = 0, and hence ||h_{Λ^C}||_1 = 0. Therefore, the NSP indicates that h_Λ = 0 as well, and thus ||h||_2 = 0.
- This means that if a matrix Φ satisfies the NSP, then the only K-sparse vector in N(Φ) is h = 0.
NSP and Sparse Recovery
- How do we measure the performance of sparse recovery algorithms when dealing with general non-sparse x?
- The following relationship, under the NSP, not only guarantees exact recovery of all possible K-sparse signals but also ensures a degree of robustness to non-sparse signals that directly depends on how well the signals are approximated by K-sparse vectors:

  ||Δ(Φx) - x||_2 ≤ C σ_K(x)_1 / sqrt(K)

- where Δ: R^M → R^N represents a specific recovery method, and

  σ_K(x)_p = min_{x̂ ∈ S_K} ||x - x̂||_p
NSP Theorem
- Theorem: for a sensing matrix Φ: R^N → R^M and an arbitrary recovery algorithm Δ: R^M → R^N, if the pair (Φ, Δ) satisfies

  ||Δ(Φx) - x||_2 ≤ C σ_K(x)_1 / sqrt(K)

  then Φ satisfies the NSP of order 2K.
Proof of NSP Theorem
- Suppose h ∈ N(Φ), and let Λ be the indices corresponding to the 2K largest entries of h. Split Λ into Λ0 and Λ1, where |Λ0| = |Λ1| = K.
- Set x = h_{Λ1} + h_{Λ^C} and x' = -h_{Λ0}, so that h = x - x'.
- Since by construction x' ∈ S_K, we can apply the guarantee to x' to obtain x' = Δ(Φx'). Moreover, since h ∈ N(Φ), we have Φh = Φx - Φx' = 0, so that Φx' = Φx and hence x' = Δ(Φx). Finally,

  ||h_Λ||_2 ≤ ||h||_2 = ||x - x'||_2 = ||x - Δ(Φx)||_2 ≤ C σ_K(x)_1 / sqrt(K) ≤ sqrt(2) C ||h_{Λ^C}||_1 / sqrt(2K)

- If a matrix Φ satisfies the NSP, then the only 2K-sparse vector in N(Φ) is h = 0.
Restricted Isometry Property (RIP)
- When measurements are contaminated with noise, or have been corrupted by some error such as quantization, it is useful to consider somewhat stronger conditions.
- Candès and Tao introduced the restricted isometry condition on matrices A and established its important role in CS.
- In mathematics, an isometry is a distance-preserving map between metric spaces. Geometric figures which can be related by an isometry are called congruent.
RestrictedRestricted IsometryIsometry Property (RIP)Property (RIP)
n Definition of RIPn A matrix F satisfies the restricted isometry property(RIP) of order Kif
there exists a such that
holds for all .
(0,1)Kd 2 2 2
2 2 2(1 ) (1 )K Kx xd d- F +
{ }0|Kx x x KS =
Digital Media Lab.
- If a matrix $\Phi$ satisfies the RIP of order $2K$, then $\Phi$ approximately preserves the distance between any pair of $K$-sparse vectors. This has fundamental implications concerning robustness to noise.
- If a matrix $\Phi$ satisfies the RIP of order $K$ with constant $\delta_K$, then for any $K' < K$ we automatically have that $\Phi$ satisfies the RIP of order $K'$ with constant $\delta_{K'} \le \delta_K$.
- If a matrix $\Phi$ satisfies the RIP of order $K$ with a sufficiently small constant, then it will also automatically satisfy the RIP of order $\gamma K$ for certain $\gamma$, albeit with a somewhat worse constant.
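Because the RIP constant is defined over all $K$-sparse supports, it can be computed exactly for small matrices by examining every $K$-column submatrix. The sketch below is our own illustration (the function `rip_constant` is not from the lecture); the exhaustive search is exponential in $N$, which is exactly why verifying the RIP is impractical at realistic sizes.

```python
import itertools
import numpy as np

def rip_constant(Phi, K):
    # delta_K is the smallest delta with (1-delta)||x||^2 <= ||Phi x||^2
    # <= (1+delta)||x||^2 for all K-sparse x; equivalently, the largest
    # deviation of the squared singular values of any K-column submatrix from 1.
    N = Phi.shape[1]
    delta = 0.0
    for support in itertools.combinations(range(N), K):
        s = np.linalg.svd(Phi[:, list(support)], compute_uv=False)
        delta = max(delta, s.max() ** 2 - 1.0, 1.0 - s.min() ** 2)
    return delta
```

For an orthonormal matrix the constant is 0, and the monotonicity property above ($\delta_{K'} \le \delta_K$ for $K' < K$) can be observed directly on random examples.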
The RIP and Stability
- Definition of C-stable: Let $\Phi : \mathbb{R}^N \to \mathbb{R}^M$ denote a sensing matrix and $\Delta : \mathbb{R}^M \to \mathbb{R}^N$ denote a recovery algorithm. The pair $(\Phi, \Delta)$ is called C-stable if for any $x \in \Sigma_K$ and any $e \in \mathbb{R}^M$, we have
$$\|\Delta(\Phi x + e) - x\|_2 \le C\,\|e\|_2.$$
- It says that if we add a small amount of noise to the measurements, then the impact of this on the recovered signal should not be arbitrarily large.
- As $C \to 1$, $\Phi$ must satisfy the lower bound of $(1 - \delta_K)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_K)\|x\|_2^2$ with $\delta_K = 1 - 1/C^2 \to 0$.
- Thus, if we desire to reduce the impact of noise in the recovered signal, we must adjust $\Phi$ so that it satisfies the lower bound of the above inequality with a tighter constant.
The RIP and Stability
- Theorem: If a pair $(\Phi, \Delta)$ is C-stable, then for all $x \in \Sigma_{2K}$,
$$\frac{1}{C}\,\|x\|_2 \le \|\Phi x\|_2.$$
- It demonstrates that the existence of any decoding algorithm that can stably recover from noisy measurements requires that $\Phi$ satisfy the lower bound of the RIP with a constant determined by $C$.
End of Lecture 3
(Chapter 3 continues next week)
Data Compression (ECE 5546-41)
2012 Fall
Ch 3. Sensing Matrices
Byeungwoo Jeon, Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
How many measurements are necessary to achieve the RIP? (Measurement Bound)
Measurement bound (1)
- Lemma: For $K$ and $N$ satisfying $K < N/2$, there exists a subset $X$ of $\Sigma_K$ such that for any $x \in X$ we have $\|x\|_2 \le \sqrt{K}$, and for any distinct $x, z \in X$,
$$\|x - z\|_2 \ge \sqrt{K/2} \quad \text{and} \quad \log|X| \ge \frac{K}{2}\log\frac{N}{K}.$$
- Proof:
Measurement bound (2)
- Theorem: Let $\Phi$ be an $M \times N$ matrix that satisfies the RIP of order $2K$ with constant $\delta \in (0, 0.5]$. Then
$$M \ge C\,K\log\frac{N}{K}, \quad \text{where} \quad C = \frac{1}{2\log(\sqrt{24} + 1)} \approx 0.28.$$
- Proof:
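The bound in the theorem is easy to evaluate numerically. The helper below is our own sketch (not from the slides) computing the smallest integer $M$ consistent with $M \ge C K \log(N/K)$:

```python
import math

# C = 1 / (2*log(sqrt(24) + 1)) ~ 0.28, per the theorem above
C = 1.0 / (2.0 * math.log(math.sqrt(24) + 1.0))

def min_measurements(N, K):
    # Smallest integer M satisfying the RIP measurement bound M >= C*K*log(N/K)
    return math.ceil(C * K * math.log(N / K))
```

For example, recovering a 10-sparse signal of length 1000 requires at least $M \approx 0.28 \cdot 10 \cdot \log(100) \approx 13$ measurements by this bound (the bound is a necessary condition, not a sufficient one).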
Measurement bound (3)
- Johnson-Lindenstrauss lemma: a set of $p$ points in high-dimensional Euclidean space can be embedded into $M$ dimensions, with all pairwise distances preserved up to a factor $(1 \pm \varepsilon)$, provided
$$M \ge \frac{c_0 \log p}{\varepsilon^2} \quad \text{for some constant } c_0 > 0.$$
How to design the sensing matrix?
RIP and NSP (1)
- Theorem: Suppose $\Phi$ satisfies the RIP of order $2K$ with $\delta_{2K} < \sqrt{2} - 1$. Then $\Phi$ satisfies the NSP of order $2K$ with constant
$$C = \frac{\sqrt{2}\,\delta_{2K}}{1 - (1 + \sqrt{2})\,\delta_{2K}}.$$
RIP and NSP (2)
- Lemma: Suppose $u \in \Sigma_K$. Then
$$\frac{\|u\|_1}{\sqrt{K}} \le \|u\|_2 \le \sqrt{K}\,\|u\|_\infty.$$
- Lemma: Suppose that $\Phi$ satisfies the RIP of order $2K$, and let $h \in \mathbb{R}^N$, $h \ne 0$, be arbitrary. Let $\Lambda_0$ be any subset of $\{1, 2, \dots, N\}$ such that $|\Lambda_0| \le K$. Define $\Lambda_1$ as the index set corresponding to the $K$ entries of $h_{\Lambda_0^c}$ with largest magnitude, and set $\Lambda = \Lambda_0 \cup \Lambda_1$. Then
$$\|h_\Lambda\|_2 \le \alpha\,\frac{\|h_{\Lambda_0^c}\|_1}{\sqrt{K}} + \beta\,\frac{|\langle \Phi h_\Lambda, \Phi h\rangle|}{\|h_\Lambda\|_2},$$
where
$$\alpha = \frac{\sqrt{2}\,\delta_{2K}}{1 - \delta_{2K}}, \qquad \beta = \frac{1}{1 - \delta_{2K}}.$$
Matrix Design Satisfying RIP
- Q: How to construct a matrix satisfying the RIP?
- Methods:
  1. Deterministic method
  2. Randomization method
     - Method without a specified $\delta_{2K}$ (just assume $\delta_{2K} > 0$)
     - Method with a specified $\delta_{2K}$ (a particular value of $\delta_{2K}$ is specified)
- (Recall) Definition of RIP: A matrix $\Phi$ satisfies the restricted isometry property (RIP) of order $K$ if there exists a $\delta_K \in (0,1)$ such that, for all $x \in \Sigma_K = \{x : \|x\|_0 \le K\}$,
$$(1 - \delta_K)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_K)\|x\|_2^2.$$
- (Recall) Theorem on RIP and NSP: Suppose $\Phi$ satisfies the RIP of order $2K$ with $\delta_{2K} < \sqrt{2} - 1$. Then $\Phi$ satisfies the NSP of order $2K$ with constant
$$C = \frac{\sqrt{2}\,\delta_{2K}}{1 - (1 + \sqrt{2})\,\delta_{2K}}.$$
Deterministic Matrix Design
- Idea: deterministically construct matrices of size $M \times N$ that satisfy the RIP of order $K$.
- It requires $M$ to be relatively large. (Ex) Requires $M = O(K^2 \log N)$ in [62]; $M = O(KN^\alpha)$ in [115].
- In real-world problems, these results lead to an unacceptably large $M$.
Randomization Matrix Design (1)
- Idea: choose random numbers for the matrix entries. For given $M$ and $N$, generate random matrices $\Phi$ by choosing the entries $\phi_{ij}$ as independent realizations from some PDF.
- Randomization method without a specified $\delta_{2K}$ (just assume $\delta_{2K} > 0$): set $M = 2K$ and draw $\Phi$ according to a Gaussian PDF. With probability 1, any subset of $2K$ columns is linearly independent, and hence all subsets of $2K$ columns will be bounded below by $(1 - \delta_{2K})$, where $\delta_{2K} > 0$.
- Problem: how to know the value of $\delta_{2K}$? One would need to search all combinations of $K$-dimensional subspaces of $\mathbb{R}^N$. Considering realistic values of $N$ and $K$, such a search requires prohibitively much computation.
Randomization Matrix Design (2)
- Randomization method with a specified value of $\delta_{2K}$: we would like to achieve the RIP of order $2K$ for a specified constant $\delta_{2K}$. It can be achieved by imposing two additional conditions on the PDF.
- Cond 1: the PDF yields a matrix that is norm-preserving, that is,
$$E(\phi_{ij}^2) = \frac{1}{M}.$$
Under this condition, the variance of the PDF is $1/M$.
- Cond 2: the PDF is sub-Gaussian, that is, there exists a constant $c > 0$ such that
$$E\big(e^{\phi_{ij} t}\big) \le e^{c^2 t^2 / 2} \quad \text{for all } t \in \mathbb{R}.$$
Note that the moment-generating function of the PDF is dominated by that of a Gaussian PDF, which is equivalent to requiring that the tails of the PDF decay at least as fast as the tails of a Gaussian PDF.
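As an illustration (our own sketch, not from the slides), a matrix with i.i.d. $N(0, 1/M)$ entries satisfies both conditions: Gaussians are sub-Gaussian, and the variance is $1/M$ by construction. The norm-preserving property $E\|\Phi x\|_2^2 = \|x\|_2^2$ can then be checked empirically:

```python
import numpy as np

rng = np.random.default_rng(42)
M, N = 128, 512
# i.i.d. N(0, 1/M) entries: E[phi_ij^2] = 1/M (Cond 1), Gaussian tails (Cond 2)
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))

x = rng.normal(size=N)
# E||Phi x||^2 = ||x||^2, so this ratio concentrates around 1 as M grows
ratio = np.linalg.norm(Phi @ x) ** 2 / np.linalg.norm(x) ** 2
```

The concentration corollary below quantifies exactly how tightly this ratio clusters around 1.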
Randomization Matrix Design (3)
- Examples of sub-Gaussian PDFs: Gaussian, Bernoulli (taking values $\pm 1$, suitably scaled), and more generally any PDF with bounded support.
- Strictly sub-Gaussian: a PDF satisfying
$$E\big(e^{\phi_{ij} t}\big) \le e^{c^2 t^2 / 2} \quad \text{for all } t \in \mathbb{R}$$
with the constant $c^2 = E(\phi_{ij}^2)$.
- Corollary: Suppose that $\Phi$ is an $M \times N$ matrix whose entries $\phi_{ij}$ are i.i.d., drawn according to a strictly sub-Gaussian PDF with $c^2 = 1/M$. Let $Y = \Phi x$ for $x \in \mathbb{R}^N$. Then for any $\varepsilon > 0$ and any $x \in \mathbb{R}^N$,
$$E\big(\|Y\|_2^2\big) = \|x\|_2^2 \quad \text{and} \quad P\big(\big|\|Y\|_2^2 - \|x\|_2^2\big| \ge \varepsilon\,\|x\|_2^2\big) \le 2\exp\!\Big(-\frac{M\varepsilon^2}{\kappa^*}\Big),$$
with $\kappa^* = \frac{2}{1 - \log 2} \approx 6.52$.
- Note that the norm of a sub-Gaussian random vector strongly concentrates about its mean.
Randomization Matrix Design (4)
- Theorem: Fix $\delta \in (0,1)$. Let $\Phi$ be an $M \times N$ random matrix whose entries $\phi_{ij}$ are drawn according to a strictly sub-Gaussian PDF with $c^2 = 1/M$. If
$$M \ge \kappa_1 K \log\frac{N}{K},$$
then $\Phi$ satisfies the RIP of order $K$ with the prescribed $\delta$ with probability exceeding $1 - 2e^{-\kappa_2 M}$, where $\kappa_1$ is arbitrary and $\kappa_2 = \frac{\delta^2}{2\kappa^*} - \frac{\log(42e/\delta)}{\kappa_1}$.
- Note that this measurement bound matches the optimal number of measurements (up to a constant).
Why is the randomized method better?
- One can show that for the random construction the measurements are democratic: it is possible to recover a signal using any sufficiently large subset of the measurements. Thus, by using a random $\Phi$, one can be robust to the loss or corruption of a small fraction of the measurements.
- Universality: random designs can easily accommodate other bases. In practice, we are often more interested in the setting where $x$ is sparse with respect to some basis $\Psi$; in this case, what is actually required is the RIP of the product $\Phi\Psi$.
  - In a deterministic design, the design process must take $\Psi$ into account.
  - In a randomized design, $\Phi$ can be designed independently of $\Psi$.
- If $\Phi$ is Gaussian and $\Psi$ is orthonormal, note that $\Phi\Psi$ is also Gaussian. Furthermore, for sufficiently large $M$, $\Phi\Psi$ will satisfy the RIP with high probability.
Practical Situation
- In practical implementations, the fully random matrix design may sometimes be impractical to build in hardware. Therefore it is possible to:
  - use a reduced amount of randomness, or
  - model the architecture via matrices $\Phi$ that have significantly more structure than a fully random matrix. Examples: random demodulator [192], random filtering [194], modulated wideband converter [147], random convolution [2, 166], compressive multiplier [179].
- Although not quite as easy as in the fully random case, one can prove that many of these constructions also satisfy the RIP.
Coherence
- Definition: The coherence of a matrix $\Phi$, $\mu(\Phi)$, is the largest absolute inner product between any two columns $\phi_i$, $\phi_j$ of $\Phi$:
$$\mu(\Phi) = \max_{1 \le i < j \le N} \frac{|\langle \phi_i, \phi_j \rangle|}{\|\phi_i\|_2\,\|\phi_j\|_2}.$$
- Note that the coherence satisfies the relation
$$\sqrt{\frac{N - M}{M(N - 1)}} \le \mu(\Phi) \le 1.$$
- The lower bound is called the Welch bound. When $N \gg M$, the lower bound is approximately $\mu(\Phi) \ge 1/\sqrt{M}$.
- Coherence is related to the spark, the NSP, and the RIP.
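These relations can be checked numerically. In the sketch below (our own illustration; `coherence` is not a library function), the coherence of a random matrix always lands between the Welch bound and 1, and for $N \gg M$ the Welch bound is close to $1/\sqrt{M}$:

```python
import numpy as np

def coherence(Phi):
    # mu(Phi): largest |<phi_i, phi_j>| over distinct normalized columns
    G = Phi / np.linalg.norm(Phi, axis=0)   # normalize each column
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)             # exclude <phi_i, phi_i> = 1
    return gram.max()

rng = np.random.default_rng(1)
M, N = 16, 64
mu = coherence(rng.normal(size=(M, N)))
welch = np.sqrt((N - M) / (M * (N - 1)))    # Welch lower bound
```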
Coherence and Spark
- Theorem (Gershgorin): The eigenvalues of an $N \times N$ matrix $M$ with entries $m_{ij}$, $1 \le i, j \le N$, lie in the union of $N$ discs $d_i = d_i(c_i, r_i)$, $1 \le i \le N$, centered at $c_i = m_{ii}$ and with radius
$$r_i = \sum_{j \ne i} |m_{ij}|.$$
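The disc theorem is easy to sanity-check numerically (our own sketch, not from the slides):

```python
import numpy as np

def eigs_in_gershgorin_discs(M):
    # Every eigenvalue must lie in some disc centered at m_ii with
    # radius r_i = sum over j != i of |m_ij|
    centers = np.diag(M)
    radii = np.sum(np.abs(M), axis=1) - np.abs(centers)
    return all(np.any(np.abs(lam - centers) <= radii + 1e-9)
               for lam in np.linalg.eigvals(M))

A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
```

Applied to the Gram matrix of a unit-norm $\Phi$, whose diagonal entries are 1 and whose off-diagonal entries are bounded by $\mu(\Phi)$, this theorem yields the spark lemma below.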
- Lemma: For any matrix $\Phi$,
$$\mathrm{spark}(\Phi) \ge 1 + \frac{1}{\mu(\Phi)}.$$
Coherence and NSP
- Theorem: If
$$K < \frac{1}{2}\Big(1 + \frac{1}{\mu(\Phi)}\Big),$$
then for each measurement vector $y \in \mathbb{R}^M$ there exists at most one signal $x \in \Sigma_K$ such that $y = \Phi x$.
- Lemma: If $\Phi$ has unit-norm columns and coherence $\mu = \mu(\Phi)$, then $\Phi$ satisfies the RIP of order $K$ with $\delta = (K - 1)\mu$ for all $K < 1/\mu$.
Data Compression (ECE 5546-41)
2012 Fall
Ch 4. Sparse Signal Recovery via L1 Minimization
Byeungwoo Jeon, Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
How to recover a sparse signal from a small number of linear measurements?
Sparse Signal Recovery (1)
- Problem: For $y = \Phi x$, with $x$ assumed to be sparse (or compressible), find $\hat{x}$ satisfying
$$\hat{x} = \arg\min_z \|z\|_0 \quad \text{subject to} \quad z \in B(y),$$
where $B(y)$ ensures that $\hat{x}$ is consistent with the measurements $y$.
- Under the assumption of $x$ being sparse (or compressible), find the $x$ corresponding to measurement $y$ under the L0 optimality condition. The solution seeks the sparsest signal in $B(y)$.
- Consistent with measurements: depending on the existence of measurement noise, $B(y)$ has two cases:
$$B(y) = \{z : \Phi z = y\} \ \text{(noise-free case)}, \qquad B(y) = \{z : \|\Phi z - y\|_2 \le \varepsilon\} \ \text{(noisy case)}.$$
Sparse Signal Recovery (2)
- The framework also holds when $x$ is not itself sparse. In that case, suppose $x = \Psi\alpha$; then the problem is
$$\hat{\alpha} = \arg\min_z \|z\|_0 \quad \text{subject to} \quad z \in B(y),$$
where
$$B(y) = \{z : \Phi\Psi z = y\} \ \text{(noise-free case)}, \qquad B(y) = \{z : \|\Phi\Psi z - y\|_2 \le \varepsilon\} \ \text{(noisy case)}.$$
- Note that when $\Psi$ is an orthonormal basis, it is possible to assume $\Psi = I$ without loss of generality.
Sparse Signal Recovery (3)
- How to solve the L0 minimization problem?
$$\hat{x} = \arg\min_z \|z\|_0 \quad \text{subject to} \quad z \in B(y).$$
- Note that $\|\cdot\|_0$ is a non-convex function, so this minimization problem is potentially very complex (NP-hard) to solve.
- L0 solution via L1 minimization:
$$\hat{x} = \arg\min_z \|z\|_1 \quad \text{subject to} \quad z \in B(y).$$
- If $B(y)$ is convex, this problem becomes computationally tractable! The solution still prefers a sparse solution in $B(y)$.
- Big question: will the L1 solution be similar to the L0 solution?
Why is L1 Minimization Preferred?
- Intuitively:
  - L1 minimization promotes sparsity.
  - There are a variety of reasons to suspect that L1 minimization will provide an accurate method for sparse signal recovery.
  - L1 minimization provides a computationally tractable approach to signal recovery.
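In the noise-free case the L1 problem is a linear program, so a generic LP solver suffices. The sketch below is our own illustration using SciPy's `linprog` (with these dimensions, exact recovery of the 3-sparse signal is expected, though not guaranteed in general); it uses the standard split $z = u - v$ with $u, v \ge 0$:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    # min ||z||_1 s.t. Phi z = y, as an LP: z = u - v with u, v >= 0,
    # minimize sum(u) + sum(v) subject to [Phi, -Phi][u; v] = y
    N = Phi.shape[1]
    res = linprog(np.ones(2 * N), A_eq=np.hstack([Phi, -Phi]), b_eq=y,
                  bounds=(0, None), method="highs")
    return res.x[:N] - res.x[N:]

rng = np.random.default_rng(0)
M, N = 25, 50
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
x = np.zeros(N)
x[[3, 17, 41]] = [2.0, -1.5, 1.0]        # a 3-sparse signal
x_hat = basis_pursuit(Phi, Phi @ x)       # recovers x from M = 25 measurements
```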
Analysis of the L1 Minimization Solution
$$\hat{x} = \arg\min_z \|z\|_1 \quad \text{subject to} \quad z \in B(y)$$
Noise-free Signal Recovery (1)
- Lemma: Let $\Phi$ be a matrix that satisfies the RIP of order $2K$ with constant $\delta_{2K} < \sqrt{2} - 1$. Let $x, \hat{x} \in \mathbb{R}^N$ be given and define $h = \hat{x} - x$. Let $\Lambda_0$ denote the index set corresponding to the $K$ entries of $x$ with largest magnitude and $\Lambda_1$ the index set corresponding to the $K$ entries of $h_{\Lambda_0^c}$ with largest magnitude. Set $\Lambda = \Lambda_0 \cup \Lambda_1$. If $\|\hat{x}\|_1 \le \|x\|_1$, then
$$\|h\|_2 \le C_0\,\frac{\sigma_K(x)_1}{\sqrt{K}} + C_1\,\frac{|\langle \Phi h_\Lambda, \Phi h\rangle|}{\|h_\Lambda\|_2},$$
where
$$C_0 = 2\,\frac{1 - (1 - \sqrt{2})\,\delta_{2K}}{1 - (1 + \sqrt{2})\,\delta_{2K}}, \qquad C_1 = \frac{2}{1 - (1 + \sqrt{2})\,\delta_{2K}}.$$
- It gives an error bound for the class of L1 minimization algorithms when combined with a measurement matrix $\Phi$ satisfying the RIP.
- For specific bounds for concrete examples of $B(y)$, we need to examine how requiring $\hat{x} \in B(y)$ affects $|\langle \Phi h_\Lambda, \Phi h\rangle|$.
Noise-free Signal Recovery (2)
- Proof: (self-study)
Noise-free Signal Recovery (3)
- Theorem: Let $\Phi$ be a matrix that satisfies the RIP of order $2K$ with constant $\delta_{2K} < \sqrt{2} - 1$. When $B(y) = \{z : \Phi z = y\}$, the solution $\hat{x}$ to the L1 minimization obeys
$$\|\hat{x} - x\|_2 \le C_0\,\frac{\sigma_K(x)_1}{\sqrt{K}}.$$
- For $x \in \Sigma_K = \{x : \|x\|_0 \le K\}$ and $\Phi$ satisfying the RIP, $\|\hat{x} - x\|_2 = 0$.
- Note that L1 minimization exactly provides the solution given by L0 minimization. In other words, for as few as $O(K\log(N/K))$ measurements, we can exactly recover any $K$-sparse signal using L1 minimization.
- This can be shown to be stable also under noisy measurements.
Noise-free Signal Recovery (4)
- Proof: For $x, \hat{x} \in B(y)$ we have $y = \Phi x = \Phi\hat{x}$ and $\|\hat{x}\|_1 \le \|x\|_1$, so the lemma can be applied to obtain, for $h = \hat{x} - x$,
$$\|h\|_2 \le C_0\,\frac{\sigma_K(x)_1}{\sqrt{K}} + C_1\,\frac{|\langle \Phi h_\Lambda, \Phi h\rangle|}{\|h_\Lambda\|_2}.$$
Since $h = \hat{x} - x$, we have $\Phi h = \Phi\hat{x} - \Phi x = 0$, so the second term vanishes and
$$\|h\|_2 \le C_0\,\frac{\sigma_K(x)_1}{\sqrt{K}}. \quad \text{---Q.E.D.---}$$
Noisy Signal Recovery (1)
- Theorem: Let $\Phi$ be a matrix that satisfies the RIP of order $2K$ with constant $\delta_{2K} < \sqrt{2} - 1$. Let $y = \Phi x + e$ with $\|e\|_2 \le \varepsilon$ (that is, bounded noise). Then, for $B(y) = \{z : \|\Phi z - y\|_2 \le \varepsilon\}$, the L1 solution $\hat{x}$ obeys
$$\|\hat{x} - x\|_2 \le C_0\,\frac{\sigma_K(x)_1}{\sqrt{K}} + C_2\,\varepsilon,$$
where
$$C_0 = 2\,\frac{1 - (1 - \sqrt{2})\,\delta_{2K}}{1 - (1 + \sqrt{2})\,\delta_{2K}}, \qquad C_2 = 4\,\frac{\sqrt{1 + \delta_{2K}}}{1 - (1 + \sqrt{2})\,\delta_{2K}}.$$
- This provides a bound on the worst-case performance for uniformly bounded noise.
Noisy Signal Recovery (2)
- Proof: (self-study)
Noisy Signal Recovery (3)
- What is the bound on the recovery error if the noise is Gaussian?
- Corollary: Let $\Phi$ be a sensing matrix that satisfies the RIP of order $2K$ with $\delta_{2K} < \sqrt{2} - 1$, and obtain measurements $y = \Phi x + e$, where the entries of $e$ are i.i.d. $N(0, \sigma^2)$. Then, when $B(y) = \{z : \|\Phi z - y\|_2 \le 2\sqrt{M}\,\sigma\}$, the solution to L1 minimization obeys
$$\|\hat{x} - x\|_2 \le C_0\,\frac{\sigma_K(x)_1}{\sqrt{K}} + 8\,\frac{\sqrt{1 + \delta_{2K}}}{1 - (1 + \sqrt{2})\,\delta_{2K}}\,\sqrt{M}\,\sigma$$
with probability at least $1 - e^{-c_0 M}$.
How to recover a non-sparse signal from a small number of linear measurements?
Instance-optimal Guarantee (1)
- Theorem: Let $\Phi$ be an $M \times N$ matrix and $\Delta : \mathbb{R}^M \to \mathbb{R}^N$ a recovery algorithm satisfying
$$\|x - \Delta(\Phi x)\|_2 \le C\,\sigma_K(x)_2 \quad \text{for some } K \ge 1.$$
Then
$$M > \Big(1 - \sqrt{1 - 1/C^2}\Big)\,N.$$
- In order to make the bound hold for all signals $x$ with a constant $C \approx 1$, then regardless of what recovery algorithm is being used, one needs to take $M \approx N$ measurements.
Rf: Instance-Optimal?
- The theorem speaks not only about exact recovery of all possible $K$-sparse signals; it also ensures a degree of robustness to non-sparse signals that directly depends on how well the signals are approximated by $K$-sparse vectors. This is an instance-optimal guarantee (i.e., it guarantees optimal performance for each instance of $x$).
- Cf. a guarantee that only holds for some subset of possible signals, such as compressible or sparse signals (the quality of the guarantee adapts to the particular choice of $x$).
- In that sense, instance-optimal guarantees are also commonly referred to as uniform guarantees, since they hold uniformly for all $x$.
Instance-optimal Guarantee (2)
- Theorem: Fix $\delta \in (0,1)$. Let $\Phi$ be an $M \times N$ random matrix whose entries $\phi_{ij}$ are i.i.d., drawn according to a strictly sub-Gaussian distribution with $c^2 = 1/M$. If
$$M \ge \kappa_1 K \log\frac{N}{K},$$
then $\Phi$ satisfies the RIP of order $K$ with the prescribed $\delta$ with probability exceeding $1 - 2e^{-\kappa_2 M}$, where $\kappa_1$ is arbitrary and $\kappa_2 = \frac{\delta^2}{2\kappa^*} - \frac{\log(42e/\delta)}{\kappa_1}$.
Instance-optimal Guarantee (3)
- Theorem: Let $x \in \mathbb{R}^N$ be fixed. Set $\delta_{2K} < \sqrt{2} - 1$, and suppose that $\Phi$ is an $M \times N$ sub-Gaussian random matrix with
$$M \ge \kappa_1 K \log\frac{N}{K},$$
and the measurement is $y = \Phi x$. Set $\varepsilon = 2\,\sigma_K(x)_2$. Then, with probability exceeding $1 - 2e^{-\kappa_2 M} - e^{-\kappa_3 M}$, when $B(y) = \{z : \|\Phi z - y\|_2 \le \varepsilon\}$, the L1 solution $\hat{x}$ obeys
$$\|\hat{x} - x\|_2 \le 8\,\frac{\sqrt{1 + \delta_{2K}}}{1 - (1 + \sqrt{2})\,\delta_{2K}}\,\sigma_K(x)_2.$$
End of Chapter 4
Data Compression (ECE 5546-41)
2012 Fall
Ch 5. Algorithms for Sparse Recovery
Part 1
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
Various recovery algorithms for compressively sensed sparse signals
Sparse Signal Recovery (1)
(From Chapter 4)
- Problem: For $y = \Phi x$, with $x$ assumed to be sparse (or compressible), find $\hat{x}$ satisfying
$$\hat{x} = \arg\min_z \|z\|_0 \quad \text{subject to} \quad z \in B(y),$$
where $B(y)$ ensures that $\hat{x}$ is consistent with the measurements $y$.
- Under the assumption of $x$ being sparse (or compressible), find the $x$ corresponding to measurement $y$ under the L0 optimality condition. The solution seeks the sparsest signal in $B(y)$.
- Consistent with measurements: depending on the existence of measurement noise, $B(y)$ has two cases:
$$B(y) = \{z : \Phi z = y\} \ \text{(noise-free case)}, \qquad B(y) = \{z : \|\Phi z - y\|_2 \le \varepsilon\} \ \text{(noisy case)}.$$
- A loss (cost) function other than the Euclidean distance may also be appropriate.
Sparse Signal Recovery (2)
(From Chapter 4)
- How to solve the L0 minimization problem?
$$\hat{x} = \arg\min_z \|z\|_0 \quad \text{subject to} \quad z \in B(y),$$
with
$$B(y) = \{z : \Phi\Psi z = y\} \ \text{(noise-free case)}, \qquad B(y) = \{z : \|\Phi\Psi z - y\|_2 \le \varepsilon\} \ \text{(noisy case)}.$$
- Note that $\|\cdot\|_0$ is a non-convex function: potentially very complex (NP-hard) to solve.
- L0 solution via L1 minimization:
$$\hat{x} = \arg\min_z \|z\|_1 \quad \text{subject to} \quad z \in B(y).$$
If $B(y)$ is convex, this problem becomes computationally tractable! The solution still prefers a sparse solution in $B(y)$.
- Big question: will the L1 solution be similar to the L0 solution?
Use of Different Norms
- Solve the underdetermined system $y = \Phi x$, where $\Phi \in \mathbb{R}^{M \times N}$, $x \in \mathbb{R}^N$, $y \in \mathbb{R}^M$, $M \ll N$.
- L2 norm ($p = 2$): small penalty on small residuals, strong penalty on large residuals.
- Regularized formulation:
$$\min_x \big\{J(x) + \mu\,H(\Phi x, y)\big\}.$$
- The parameter $\mu$ can be found by trial and error, or by a statistical technique such as cross-validation.
- Actually, the decision of a proper value of $\mu$ is a research problem in itself.
Convex optimization-based method (2)
- Ex: $J(x) = \|x\|_p$
  - $p = 0$ (L0 norm): directly measures sparsity (but hard to solve):
$$\min_x \big\{\|x\|_0 \ : \ y = \Phi x\big\}.$$
  - $p = 1$ (L1 norm): gives robustness against outliers.
- Ex: $H(\Phi x, y) = \|\Phi x - y\|_p$, for example $p = 2$.
- Ex: the noisy case can be modified in several ways:
$$\min_x \|\Phi x - y\|_2 \ \text{subject to} \ \|x\|_0 \le K,$$
$$\min_x \frac{1}{2}\|\Phi x - y\|_2^2 + \mu\|x\|_0, \quad \mu > 0,$$
$$\min_x \|x\|_0 \ \text{subject to} \ \|\Phi x - y\|_2 \le \varepsilon.$$
Convex optimization-based method (3)
- Standard optimization packages cannot be used for real applications of CS, since the number of unknowns (that is, the dimension of $x$) is very large.
- If there are no restrictions on the sensing matrix $\Phi$ and the signal $x$, the sparse approximation problem is very complex (NP-hard). In practice, sparse approximation algorithms tend to be slow unless the sensing matrix $\Phi$ admits a fast matrix-vector multiply (like a fast-transform algorithm utilizing matrix structure).
- For compressible signals, which need some transformation first, fast multiplication is possible when both the (random) sensing matrix and the sparsity basis are structured.
- Then, the question is how to incorporate more sophisticated signal constraints into sparsity models.
- [Need 1-2 volunteers to investigate fast computation utilizing the structure of the sensing matrix.]
L0 Approach (1)
- The L0 norm explicitly counts the number of nonzero components of the given data; it is directly related to the sparsity of a signal.
- Define the function $\mathrm{card}(x)$: cardinality. For scalar $x$: $\mathrm{card}(x) = 0$ ($x = 0$) and $1$ ($x \ne 0$).
- $\mathrm{card}(x)$ has no convexity properties.
- Note, however, that it is quasiconcave on $\mathbb{R}^n_+$, since for $x, y \ge 0$,
$$\mathrm{card}(x + y) \ge \min\{\mathrm{card}(x), \mathrm{card}(y)\}.$$
(From Prof. S. Boyd (EE364a, b), Stanford Univ.)
Rf: Quasiconvexity (1)
- Quasiconvex function: a real-valued function defined on an interval or on a convex subset of a real vector space such that the inverse image of any set of the form $(-\infty, a)$ is a convex set.
- Informally, along any stretch of the curve, the highest point is one of the endpoints.
- The negative of a quasiconvex function is said to be quasiconcave.
- [Figures: a quasiconvex function that is not convex; a function that is not quasiconvex, where the set of points in the domain for which the function values are below the dashed red line is the union of two red intervals, which is not a convex set.]
(http://en.wikipedia.org/wiki/Quasiconvex_function)
Rf: Quasiconvexity (2)
- Def: A function $f : S \to \mathbb{R}$ defined on a convex subset $S$ of a real vector space is quasiconvex if for all $x, y \in S$ and $\lambda \in [0,1]$,
$$f(\lambda x + (1 - \lambda)y) \le \max\{f(x), f(y)\}.$$
- Note that the points $x$ and $y$, and the point directly between them, can be points on a line or, more generally, points in $n$-dimensional space. In words, if $f$ is such that a point directly between two other points never gives a higher value of the function than both of the other points, then $f$ is quasiconvex.
- An alternative way of defining a quasiconvex function is to require that each sub-level set $S_a(f) = \{x : f(x) \le a\}$ is a convex set.
- A concave function can be quasiconvex; for example, $\log(x)$ is concave and quasiconvex.
- Any monotonic function is both quasiconvex and quasiconcave. More generally, a function which decreases up to a point and increases from that point on is quasiconvex (compare unimodality).
(http://en.wikipedia.org/wiki/Quasiconvex_function)
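The defining inequality can be spot-checked numerically. The helper below is our own illustration (not from the slides): it samples random pairs and convex combinations, and a single violation disproves quasiconvexity on the sampled domain.

```python
import math
import random

def looks_quasiconvex(f, points, trials=2000):
    # Empirically test f(l*x + (1-l)*y) <= max(f(x), f(y)) on random pairs;
    # passing is only evidence, but one violation is a definitive counterexample
    rnd = random.Random(0)
    for _ in range(trials):
        x, y = rnd.choice(points), rnd.choice(points)
        lam = rnd.random()
        if f(lam * x + (1 - lam) * y) > max(f(x), f(y)) + 1e-12:
            return False
    return True

pts = [0.1 * k for k in range(1, 100)]   # sample points in (0, 10)
```

On these samples, `math.log` (monotone, hence quasiconvex) and the convex parabola pass, while a concave bump such as $-(t-5)^2$ fails, matching the definitions above.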
Rf: Quasiconvexity (3)
- Quasiconvexity is a generalization of convexity: all convex functions are also quasiconvex, but not all quasiconvex functions are convex.
- A function that is both quasiconvex and quasiconcave is quasilinear.
- [Figures: the probability density function of the normal distribution is quasiconcave but not concave; a quasilinear function is both quasiconvex and quasiconcave.]
(http://en.wikipedia.org/wiki/Quasiconvex_function)
L0 Approach (2)
- General convex-cardinality problem: a problem that would be convex, except for the appearance of $\mathrm{card}(\cdot)$ in the objective or constraints. Examples ($f$, $C$: convex):
$$\text{minimize } \mathrm{card}(x) \ \text{subject to} \ x \in C,$$
$$\text{minimize } f(x) \ \text{subject to} \ x \in C, \ \mathrm{card}(x) \le K.$$
- Solving a convex-cardinality problem, for $x \in \mathbb{R}^n$: fix a sparsity pattern of $x$ (i.e., which entries are zero/nonzero), then solve the resulting convex problem. If we solve the $2^n$ convex problems associated with all possible sparsity patterns, the convex-cardinality problem is solved completely.
- However, this is practically possible only for $n \le 10$ or so; the general convex-cardinality problem is NP-hard.
(From Prof. S. Boyd (EE364a, b), Stanford Univ.)
L0 Approach (3)
- Many forms of the optimization problem:
$$\text{minimize } \|\Phi x - y\|_2 \ \text{subject to} \ \|x\|_0 \le K,$$
$$\text{minimize } \|x\|_0 \ \text{subject to} \ \|\Phi x - y\|_2 \le \varepsilon,$$
$$\text{minimize } \|\Phi x - y\|_2 + \lambda\|x\|_0.$$
- L1-norm heuristic: replace $\|x\|_0$ with $\lambda\|x\|_1$, or add the regularization term $\lambda\|x\|_1$ to the objective function; $\lambda$ is a parameter used to achieve the desired sparsity.
- More sophisticated versions use $\sum_i w_i |x_i|$ or $\sum_i w_i (x_i)_+ + \sum_i v_i (x_i)_-$, where $w$ and $v$ are positive weights.
(From Prof. S. Boyd (EE364a, b), Stanford Univ.)
Rf: Reweighted L1 algorithm (1)
- (Joint work of E. Candès, M. Wakin and S. Boyd.)
- Minimum L0 recovery requires minimal oversampling but is intractable:
$$\min \|x\|_0 = \sum_i 1_{\{x_i \ne 0\}} \quad \text{subject to} \quad y = \Phi x.$$
- Observation: if $x^*$ is the solution to the combinatorial search and
$$w_i^* = \begin{cases} 1/|x_i^*|, & x_i^* \ne 0, \\ \infty, & x_i^* = 0, \end{cases}$$
then $x^*$ is also the solution to
$$\min \sum_i w_i^* |x_i| \quad \text{subject to} \quad y = \Phi x.$$
(From CS Theory Lecture Notes by E. Candès, 2007.)
Rf: Reweighted L1 algorithm (2)
- Initial step: $w_i^{(0)} = 1$ for all $i$.
- Loop: for $j = 1, 2, 3, \dots$
  - Solve
$$x^{(j)} = \arg\min \sum_i w_i^{(j-1)} |x_i| \quad \text{such that} \quad y = \Phi x.$$
  - Update
$$w_i^{(j)} = \frac{1}{|x_i^{(j)}| + \epsilon}.$$
  - Until convergence (typically 2-5 iterations).
- Intuition: down-weight large entries of $x$ to mimic the magnitude-insensitive L0 penalty.
(From CS Theory Lecture Notes by E. Candès, 2007.)
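The loop above can be sketched directly on top of a weighted basis-pursuit LP. This is our own illustration using SciPy's `linprog` (with these dimensions the sparse signal is typically recovered, though this is not guaranteed in general):

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1(Phi, y, w):
    # min sum_i w_i |x_i| s.t. Phi x = y (LP via x = u - v, u, v >= 0)
    N = Phi.shape[1]
    res = linprog(np.concatenate([w, w]), A_eq=np.hstack([Phi, -Phi]),
                  b_eq=y, bounds=(0, None), method="highs")
    return res.x[:N] - res.x[N:]

def reweighted_l1(Phi, y, iters=4, eps=0.1):
    w = np.ones(Phi.shape[1])            # w_i^(0) = 1: first pass is plain L1
    for _ in range(iters):
        x = weighted_l1(Phi, y, w)
        w = 1.0 / (np.abs(x) + eps)      # down-weight large entries
    return x

rng = np.random.default_rng(3)
M, N = 24, 60
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
x0 = np.zeros(N)
x0[[5, 22, 40]] = [1.5, -2.0, 0.8]
x_hat = reweighted_l1(Phi, Phi @ x0)
```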
Rf: Reweighted L1 algorithm (3)
- Empirical performance: (figure; from CS Theory Lecture Notes by E. Candès, 2007)
L1 Approach (1)
- Connection between the L1 norm and sparsity: known for a long time (early '70s), and mainly studied in geophysics (literature on sparse spike trains). The key rough empirical fact is that L1 returns sparse solutions.
- Replace the combinatorial L0 function with the L1 norm, yielding a convex optimization problem: it makes the problem tractable!
- There can be several variants of the problem.
(From CS Theory Lecture Notes by E. Candès, 2007.)
L1 Approach (2)
- The L1 minimization problem:
$$\min_{x \in \mathbb{R}^N} \|x\|_1 \quad \text{subject to} \quad \Phi x = y.$$
There is always a solution with at most $M$ non-zero terms; in general, the solution is unique.
- Similarly, for
$$\min_{x \in \mathbb{R}^N} \|\Phi x - y\|_1,$$
there is always a solution whose residual $r = y - \Phi x$ has at most $N - M$ non-zero terms; in general, the solution is unique.
(From CS Theory Lecture Notes by E. Candès, 2007.)
L1 Approach (3)
- Variant: start with the minimum-cardinality problem ($C$: convex):
$$\text{minimize } \mathrm{card}(x) \ \text{subject to} \ x \in C.$$
Apply the heuristic to obtain the L1-norm minimization problem:
$$\text{minimize } \|x\|_1 \ \text{subject to} \ x \in C.$$
- Variant: start with the cardinality-constrained problem ($f$, $C$: convex):
$$\text{minimize } f(x) \ \text{subject to} \ x \in C, \ \mathrm{card}(x) \le K.$$
Apply the heuristic to obtain the L1-norm constrained problem
$$\text{minimize } f(x) \ \text{subject to} \ x \in C, \ \|x\|_1 \le \beta,$$
or the L1-regularized problem
$$\text{minimize } f(x) + \lambda\|x\|_1 \ \text{subject to} \ x \in C.$$
- $\beta$, $\lambda$ are adjusted so that $\mathrm{card}(x) \le K$.
(From Prof. S. Boyd (EE364a, b), Stanford Univ.)
L1 Approach (4)
- Variant with polishing:
  - Use the L1 heuristic to find an estimate $\hat{x}$ with the required sparsity.
  - Fix the sparsity pattern of $\hat{x}$.
  - Re-solve the (convex) optimization problem with this sparsity pattern to obtain the final (heuristic) solution.
(From Prof. S. Boyd (EE364a, b), Stanford Univ.)
Some examples: convex optimization
(From "Computational Methods for Sparse ...," 2010 IEEE Proceedings, by J.A. Tropp and J. Wright.)
Equality-constrained Problem
- Equality-constrained problem: among all $x$ consistent with the measurements, pick the one with minimum L1 norm:
$$\min_x \|x\|_1 \quad \text{subject to} \quad y = \Phi x. \quad (C1)$$
(From "Computational Methods for Sparse ...," 2010 IEEE Proceedings, by J.A. Tropp and J. Wright.)
Convex Relaxation Method
- Convex relaxation method:
$$\min_x \frac{1}{2}\|\Phi x - y\|_2^2 + \mu\|x\|_1, \quad \mu > 0. \quad (C2)$$
- $\mu$ is a regularization parameter: it governs the sparsity of the solution; a large $\mu$ typically produces sparser results.
- How to choose $\mu$? One often needs to solve the problem repeatedly for different choices of $\mu$, or to trace systematically the path of solutions as $\mu$ decreases towards zero.
(From "Computational Methods for Sparse ...," 2010 IEEE Proceedings, by J.A. Tropp and J. Wright.)
LASSO
- Least Absolute Shrinkage and Selection Operator (LASSO) method:
$$\min_x \|\Phi x - y\|_2^2 \quad \text{subject to} \quad \|x\|_1 \le \beta. \quad (C3)$$
- It is equivalent to the convex relaxation method (C2) in the sense that the path of solutions of (C3), parameterized by positive $\beta$, matches the solution path of (C2) as $\mu$ varies.
- Rf: its L0 version:
$$\min_x \|\Phi x - y\|_2 \quad \text{subject to} \quad \|x\|_0 \le K.$$
One can interpret this as fitting the vector $y$ as a linear combination of $K$ regressors (chosen from $N$ possible regressors) ~ feature selection (in statistics); i.e., choose a subset of $K$ regressors that together best fit or explain $y$. In principle it can be solved by trying all choices.
- Rf: an independent variable is also known as a "predictor variable", "regressor", "controlled variable", "manipulated variable", "explanatory variable", "feature" (see machine learning and pattern recognition), or an "input variable".
(From CS Tutorial at ITA 2008 by Baraniuk, Romberg, and Wakin.)
Others
- Quadratic relaxation (LASSO), with explicit parameterization of the error norm:
$$\min_x \|x\|_1 \quad \text{subject to} \quad \|\Phi x - y\|_2 \le \varepsilon. \quad (C4)$$
- Dantzig selector (with residual correlation constraints):
$$\min_x \|x\|_1 \quad \text{subject to} \quad \|\Phi^T(\Phi x - y)\|_\infty \le \varepsilon.$$
(From CS Tutorial at ITA 2008 by Baraniuk, Romberg, and Wakin.)
Further study (Volunteer?)
- Other optimization algorithms:
  - Interior point methods (slow, but extremely accurate)
  - Homotopy methods (fast and accurate for small-scale problems)
Gradient Method (1)
- (Also known as first-order methods.) Iteratively solve the following problem:
$$\min_x \frac{1}{2}\|\Phi x - y\|_2^2 + \mu\|x\|_1, \quad \mu > 0. \quad (C2)$$
- Similar methods under this category: operator splitting [65], iterative splitting and thresholding (IST) [66], fixed-point iteration [67], sparse reconstruction via separable approximation (SpaRSA) [68], TwIST [70], GPSR [71].
(From "Computational Methods for Sparse ...," 2010 IEEE Proceedings, by J.A. Tropp and J. Wright.)
Gradient Method (2)
- Gradient-descent framework
  - Input: a signal $y \in \mathbb{R}^M$, sensing matrix $\Phi \in \mathbb{R}^{M \times N}$, regularization parameter $\mu > 0$, and initial estimate $x_0$. Output: coefficient vector $x \in \mathbb{R}^N$.
  - Algorithm:
    (1) Initialize: set $k = 1$.
    (2) Subproblem: choose $\alpha_k > 0$ and compute
$$z_k := \arg\min_z \ \frac{\alpha_k}{2}\|z - x_k\|_2^2 + \langle \Phi^*(\Phi x_k - y),\ z - x_k\rangle + \mu\|z\|_1.$$
    If an acceptance test on $z_k$ is not passed, increase $\alpha_k$ by some factor and repeat.
    (3) Line search: choose $\gamma_k \in (0, 1]$ and obtain $x_{k+1}$ from
$$x_{k+1} := x_k + \gamma_k\,(z_k - x_k).$$
    (4) Test: if a stopping criterion holds, terminate with $x = x_{k+1}$. Otherwise, set $k \leftarrow k + 1$ and go to (2).
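With a fixed step $\alpha_k = \|\Phi\|_2^2$ and $\gamma_k = 1$, the subproblem has a closed-form soft-thresholding solution, giving the classic ISTA iteration. The sketch below is our own illustration of that special case (a simplification, not the exact framework on the slide):

```python
import numpy as np

def soft(v, t):
    # Soft-thresholding: closed-form minimizer of the L1 subproblem
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(Phi, y, mu, iters=500):
    # Fixed alpha = ||Phi||_2^2 (largest squared singular value), gamma_k = 1:
    # x_{k+1} = soft(x_k - Phi^T (Phi x_k - y) / alpha, mu / alpha)
    alpha = np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        x = soft(x - Phi.T @ (Phi @ x - y) / alpha, mu / alpha)
    return x

rng = np.random.default_rng(7)
M, N = 40, 100
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
x0 = np.zeros(N)
x0[[10, 33, 70, 85]] = [1.0, -1.2, 0.7, 2.0]
y = Phi @ x0
x_hat = ista(Phi, y, mu=0.01)   # approximately minimizes (C2)
```

Each iteration costs only two matrix-vector multiplies, which is why first-order methods scale to the large problem sizes discussed earlier.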
Gradient Method (3)
- This gradient-based method works well on sparse signals when the dictionary $\Phi$ satisfies the RIP.
- It benefits from warm starting: the work required to identify a solution can be reduced dramatically when the initial estimate of $x$ is close to the solution.
- Continuation strategy: solve the optimization problem (C2) for a decreasing sequence of $\mu$ values, using the approximate solution for each value as the starting point for the next sub-problem.
(From "Computational Methods for Sparse ...," 2010 IEEE Proceedings, by J.A. Tropp and J. Wright.)
Review: Convex Optimization
References (1)
- Introduction to Optimization: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-079-introduction-to-convex-optimization-fall-2009/index.htm
References (2)
n Convex Optimization (EE364a by Prof. Boyd)n http://www.stanford.edu/class/ee364a/lectures.htmln Video lecture is also available Introduction
Convex sets
Convex functions
Convex optimization problems
DualityApproximation and fitting
Statistical estimation
Geometric roblems
Digital Media Lab. 36
Numerical linear algebra background
Unconstrained minimization
Equality constrained minimization
Interior-point methods
Conclusions
Lecture slides in one file.
Additional lecture slides:
Convex optimization examples
Stochastic programming
Chance constrained optimization
Filter design and equalizationDisciplined convex programming and CVX
Two lectures from EE364b:
methods for convex-cardinality problems
methods for convex-cardinality problems, part II
Mathematical Optimization Problem
n Optimization problem
From Prof. S. Boyd (EE364a, b), Stanford Univ.
Solving Optimization Problem
n General optimization problem

      min f_0(x) subject to f_i(x) ≤ y_i,  i = 1, ..., m

n Very difficult to solve
n Methods involve some compromise, e.g., very long computation time, or not always finding the solution
n Exceptions: certain problem classes can be solved efficiently and reliably
n Least-squares problems

      min_x ||Φx − y||₂²

n Analytical solution: x* = (Φ^T Φ)^(−1) Φ^T y
n Linear programming problems

      min c^T x subject to a_i^T x ≤ y_i,  i = 1, ..., m

n No analytical formula for the solution
n Reliable and efficient algorithms and software ~ a mature technology
n Convex optimization problems
n Objective and constraint functions are convex
n Includes least-squares problems and linear programming as special cases
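The least-squares analytical solution can be checked numerically. A small sketch assuming NumPy (`ls_solve` is an illustrative name); it solves the normal equations rather than forming the inverse explicitly:

```python
import numpy as np

def ls_solve(Phi, y):
    # Analytical least-squares solution x* = (Phi^T Phi)^{-1} Phi^T y,
    # valid when Phi has full column rank; solve() avoids the explicit inverse.
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
```

The result agrees with NumPy's own least-squares routine, `np.linalg.lstsq`, on well-conditioned problems.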
Optimization problem in standard form
Optimal & locally optimal points
Implicit constraints
Convex Set and Others (1)
n Def: A set Ω is convex if and only if for any x1, x2 ∈ Ω and any λ with 0 ≤ λ ≤ 1, the convex combination x = λx1 + (1−λ)x2 ∈ Ω.
n Example: (figures omitted) convex set / non-convex set / non-convex set
n Def: Convex combination of x1, ..., xk: any point x of the form x = θ1x1 + θ2x2 + ... + θkxk with θ1 + ... + θk = 1, θj ≥ 0.
n Def: Convex hull (conv S): the set of all convex combinations of points in S.
Convex Set and Others (2)
n Def: Conic (nonnegative) combination of x1 and x2: any point of the form x = θ1x1 + θ2x2 with θ1 ≥ 0, θ2 ≥ 0.
n Def: Convex cone: a set that contains all conic combinations of points in the set.
Convex function
n Def: A function f(x): Ω → R is convex if and only if every convex combination x = λx1 + (1−λ)x2, for x1, x2 ∈ Ω and λ with 0 ≤ λ ≤ 1, satisfies f(λx1 + (1−λ)x2) ≤ λf(x1) + (1−λ)f(x2).
n Note that f is concave if (−f) is convex.
n A function f is strictly convex iff f(λx1 + (1−λ)x2) < λf(x1) + (1−λ)f(x2) for x1 ≠ x2 and 0 < λ < 1.
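The defining inequality is easy to probe numerically. A small sketch (the helper name is illustrative): the gap f(λx1 + (1−λ)x2) − [λf(x1) + (1−λ)f(x2)] must be non-positive for every λ ∈ [0, 1] when f is convex on the segment:

```python
def jensen_gap(f, x1, x2, lam):
    # Convexity gap: f(lam*x1 + (1-lam)*x2) - [lam*f(x1) + (1-lam)*f(x2)].
    # Non-positive for all lam in [0, 1] iff f is convex on the segment [x1, x2].
    return f(lam * x1 + (1 - lam) * x2) - (lam * f(x1) + (1 - lam) * f(x2))
```

For example, f(t) = t² gives a non-positive gap everywhere, and f(t) = |t| on the segment [−1, 1] with λ = 0.5 gives a gap of exactly −1.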
1st-order Condition
n Def: A function f is differentiable if dom f is open and the gradient

      ∇f(x) = (∂f(x)/∂x_1, ..., ∂f(x)/∂x_n)

  exists at each x ∈ dom f.
n Def: 1st-order condition: a differentiable function f with convex domain is convex iff

      f(y) ≥ f(x) + ∇f(x)^T (y − x)  for all x, y ∈ dom f
2nd-order Condition
n Def: A function f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ S^n exists at each x ∈ dom f:

      (∇²f(x))_ij = ∂²f(x) / (∂x_i ∂x_j)  for 1 ≤ i, j ≤ n

n Def: 2nd-order condition: a twice-differentiable function f with convex domain is convex if and only if

      ∇²f(x) ⪰ 0  for all x ∈ dom f

n Strict convexity: no equality sign (∇²f(x) ≻ 0 for all x ∈ dom f implies f is strictly convex)
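A small numerical check of the 2nd-order condition, assuming NumPy (the helper name is illustrative): sample points of the domain and verify that every Hessian eigenvalue is nonnegative:

```python
import numpy as np

def hessian_psd_on(hess, points, tol=1e-10):
    # Numerically check the 2nd-order condition: the Hessian must be
    # positive semidefinite (all eigenvalues >= 0) at each sampled point.
    return all(np.linalg.eigvalsh(hess(x)).min() >= -tol for x in points)
```

For f(x) = x1⁴ + x2² the Hessian diag(12x1², 2) passes at every point, while the saddle f(x) = x1² − x2² with Hessian diag(2, −2) fails.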
Data Compression (ECE 5546-41)
2012 Fall
Ch 5. Algorithms for Sparse Recovery, Part 2
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
Recovery Algorithms (1)
n Category 1: Convex optimization approach (or convex relaxation)
n Replace the combinatorial problem with a convex optimization problem.
n Solve the convex optimization problem with algorithms that can exploit the problem structure.
n Category 2: Greedy algorithms
n Greedy pursuits
n Iteratively refine a sparse solution by successively identifying one or more components that yield the greatest improvement in quality.
n In general very fast and applicable to very large datasets; however, theoretical performance guarantees are typically weaker than those of some other methods.
n Thresholding algorithms
n These methods alternate between element selection and element pruning steps. They are often very easy to implement and can be relatively fast.
n They have theoretical performance guarantees that rival those derived for convex optimization-based approaches.
Recovery Algorithms (2)
n Category 3: Bayesian framework
n Assume a prior distribution for the unknown coefficients that favors sparsity.
n Develop a maximum a posteriori estimator incorporating the observation.
n Identify a region of significant posterior mass, or average over the most probable models.
n Category 4: Other approaches
n Non-convex optimization methods: relax the L0 problem to a related non-convex problem and attempt to identify a stationary point.
n Brute-force methods: search through all possible support sets, possibly using cutting-plane methods to reduce the number of possibilities.
n Heuristic methods: based on belief-propagation and message-passing techniques developed in graphical models and coding theory.
Greedy Algorithms

"A greedy algorithm is an algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum. In many problems, a greedy strategy does not in general produce an optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that approximate a global optimal solution in a reasonable time."
http://en.wikipedia.org/wiki/Greedy_algorithm
Greedy Algorithm (1)
n Starting at A, a greedy algorithm (GA) will find the local maximum at "m", instead of the global maximum at "M".
n (Figure: searching for the global maximum starting from A)
http://en.wikipedia.org/wiki/Greedy_algorithm
Greedy Algorithm (2)
n Ex: How to pay 36 cents using only coins with values {1, 5, 10, 20}?
n The greedy algorithm determines the minimum number of coins to give while making change. These are the steps a human would take to emulate a greedy algorithm to represent 36 cents using only coins with values {1, 5, 10, 20}.
n The coin of the highest value, not exceeding the remaining change owed, is the local optimum.
n (Note that in general the change-making problem requires dynamic programming or integer programming to find an optimal solution.)
n However, most currency systems, including the Euro (pictured) and US Dollar, are special cases where the greedy strategy does find an optimal solution.
http://en.wikipedia.org/wiki/Greedy_algorithm
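The change-making steps above can be sketched directly (the function name is ours): repeatedly take the largest coin that does not exceed the remaining amount.

```python
def greedy_change(amount, coins=(20, 10, 5, 1)):
    # Greedy change-making: always take the largest coin that does not
    # exceed the remaining amount (the locally optimal choice).
    result = []
    for c in sorted(coins, reverse=True):
        while amount >= c:
            result.append(c)
            amount -= c
    return result
```

For {1, 5, 10, 20} this is optimal (36 → 20 + 10 + 5 + 1), but, as the slide notes, not for every coin system: with coins {1, 10, 25}, paying 30 cents greedily takes six coins (25 + five 1s) while three 10s suffice.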
Sparse Signal Recovery via Greedy Algorithm
n Problem: For y = Φx, with x assumed to be sparse (or compressible), find x̂ satisfying

      x̂ = argmin_z ||z||₀ subject to z ∈ B(y)

  where B(y) ensures that x̂ is consistent with the measurements y:

      B(y) = {z | Φz = y}              (noise-free case)
      B(y) = {z | ||Φz − y||₂ ≤ ε}     (noisy case)

n This problem can be re-written as

      min |Ω| subject to y = Σ_{i∈Ω} x_i φ_i

  where Ω denotes a particular subset of the indices i = 1, ..., N, and φ_i denotes the i-th column of Φ.
n Use a greedy algorithm to find the index set Ω.
Rf: Greedy Algorithms
n Greedy algorithms have been called by different names in other fields
n Statistics: Forward stepwise regression
n Nonlinear approximation: Pure greedy algorithm
n Signal processing: Matching pursuit
n Radio astronomy: CLEAN algorithm
Basic idea of Pursuit algorithm (3)
n Pre-normalization: assume that all the columns of Φ are normalized by multiplying by a normalizing matrix W:

      Φ(normalizing) = ΦW,  where Φ ∈ R^(M×N), W ∈ R^(N×N),
      W = diag(1/||φ1||₂, ..., 1/||φN||₂)

n Under this pre-normalization, the solution of the pursuit algorithm can easily be found by identifying the column maximizing (φ_j^T y)².
n As a final step, the solution vector x should be post-normalized: x → Wx.
n A theorem tells us that the normalization does not change the solution.
n From now on, assume pre-normalization without loss of generality.
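A small sketch of the pre-normalization and column-selection step, assuming NumPy (function names are illustrative):

```python
import numpy as np

def pre_normalize(Phi):
    # Build W = diag(1/||phi_1||_2, ..., 1/||phi_N||_2); Phi @ W then has
    # unit-norm columns. Returns both the normalized matrix and W.
    W = np.diag(1.0 / np.linalg.norm(Phi, axis=0))
    return Phi @ W, W

def best_column(Phi_n, y):
    # Index j maximizing (phi_j^T y)^2 over the normalized columns.
    return int(np.argmax((Phi_n.T @ y) ** 2))
```

After solving with the normalized matrix, the coefficient vector would be post-normalized by W, as described above.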
Basic idea of Pursuit algorithm (4)
n Suppose K > 1: since y is a linear combination of K columns of Φ, the problem is to find a subset of Φ consisting of K columns.
n Need to enumerate (N choose K) ~ O(N^K) combinations.
n Greedy algorithm (pursuit-based methods): instead of the exhaustive search, select columns one by one in favor of the local optimum.
n Starting from x(0) = 0 (residual r(0) = y), it iteratively constructs a K-term approximation, expanding the active set by one additional column at a time.
n The additional column at each stage is the one which maximally reduces the residual error (in the L2 sense) in approximating the measurement y using the currently active columns.
n Residual: the as-yet unexplained portion of the measurement.
n After constructing an approximation including a new column, a new residual vector is computed by subtracting the approximation represented by the newly selected column from the current residual.
n A new residual L2 error is evaluated: if it falls below a threshold, the algorithm terminates. Otherwise, it looks for another column.
Various Pursuit Algorithms
n Matching Pursuit (MP) ~ also known as the pure greedy algorithm
n Orthogonal Matching Pursuit (OMP)
n Weak Matching Pursuit
n These algorithms all belong to greedy algorithms (GA). Variants include
n Pure GA (PGA)
n Orthogonal GA (OGA)
n Relaxed GA (RGA)
n Weak GA (WGA)
n Rf: "At this point, it is not fully clear what role greedy pursuit algorithms will ultimately play in practice." (From "Computational Methods for Sparse Solution of Linear Inverse Problems," Proceedings of the IEEE, 2010, by J.A. Tropp and S.J. Wright)
Category 2: Greedy algorithms
n Greedy pursuits
n Thresholding algorithms
Matching Pursuit (MP) (1)
n First proposed by Mallat and Zhang*: an iterative greedy algorithm that decomposes a signal into a linear combination of elements from a dictionary (i.e., the sensing matrix).

  Inputs: sensing matrix Φ, measurement vector y, error threshold ε0
  Outputs: a sparse signal x̂
  Initialize: Set k = 0, index set Ω = ∅, and residual r(0) = y.
  Main Iteration: Increment k by 1 and perform the following:
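A minimal MP sketch consistent with the description so far, assuming NumPy and unit-norm columns (the pre-normalization assumption); the function name is illustrative:

```python
import numpy as np

def matching_pursuit(Phi, y, eps=1e-6, max_iter=100):
    # Matching pursuit sketch: pick the column most correlated with the
    # residual, update its coefficient, subtract, repeat until ||r|| <= eps.
    x = np.zeros(Phi.shape[1])
    r = y.astype(float).copy()            # residual r(0) = y
    for _ in range(max_iter):
        if np.linalg.norm(r) <= eps:      # stopping criterion on residual energy
            break
        corr = Phi.T @ r                  # correlations with current residual
        j = int(np.argmax(np.abs(corr)))  # most correlated column
        x[j] += corr[j]                   # optimal 1-D update for unit-norm columns
        r -= corr[j] * Phi[:, j]          # subtract the newly explained part
    return x
```

For an orthonormal Φ, MP recovers the coefficients exactly in at most N iterations; for general dictionaries the same column may be selected more than once, which is what distinguishes MP from OMP.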