7/30/2019 Data Compressing
1/163
Lecture Note (Data Compression-CS) by Prof. Byeungwoo Jeon (2012 Fall)
Digital Media Lab.
Data Compression
(ECE 5546-41)
Introduction to Compressive Sensing
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
2012 Fall
Course Introduction
- Data compression (before)
  - Main text: Introduction to Data Compression (3rd Ed) (K. Sayood)
- Main topics
  - Mathematical preliminaries for lossless compression
  - Huffman coding
  - Arithmetic coding
  - Dictionary techniques
  - Context-based compression
  - Lossless image compression
- Data compression (this semester): Compressed Sensing
- Texts:
  - R. Baraniuk, M. Davenport, M. Duarte, C. Hegde, An Introduction to Compressive Sensing, Connexions Web site, http://cnx.org/content/col11133/1.5/, Apr 2, 2011.
  - Compressed Sensing: Theory and Applications, edited by Y. C. Eldar and G. Kutyniok
  - Lecture Note, Introduction to Compressed Sensing, Spring 2011, by Prof. Heung-No Lee (http://infonet.gist.ac.kr)
  - Selected papers
Major Subjects to Cover
How to Study
- Basic framework of the course
  - Lecture (2 hours)
  - Paper investigation (1 hour): student presentation
    - Each student should study thoroughly and present at least one paper.
    - It should be completely understood by the presenter before the presentation.
    - A list of papers will be provided by the instructor; however, students may suggest a preferred paper.
- Grading policy
  - Attendance 10%
  - Project/Presentation 20%
  - Homework 10%
  - Exam (Midterm 30% + Final 30%) 60%
Very Brief Introduction to CS
(modified from a file by Igor Carron (version 2, draft) at
https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxpZ29yY2Fycm9uMnxneDoxYmNkZjU5MWQ2NmJkOGUy)
Solving linear equations
- Solving linear equations (Y: measured; X: unknown; A: from the signal model):

  Y = AX

- Solving for X is straightforward unless the matrix A is non-invertible.
- A nonlinear system can be approximated by a linear system of equations.
- A continuous system can be discretized into a linear system of equations.
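As a minimal sketch of the well-posed case above (square, invertible A; the matrix and vector values here are made up for illustration):

```python
import numpy as np

# Hypothetical 3x3 example: A comes from a signal model, Y is measured.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
X_true = np.array([1.0, -2.0, 0.5])
Y = A @ X_true                 # the measurement

# When A is square and invertible, Y = AX has a unique solution.
X = np.linalg.solve(A, Y)
print(np.allclose(X, X_true))  # True
```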
Rf: Regularization (3)
- In statistics and machine learning, regularization is used to prevent overfitting. Typical examples of regularization in statistical machine learning include ridge regression, the lasso, and the L2-norm in support vector machines.
- Regularization methods are also used for model selection, where they work by implicitly or explicitly penalizing models based on the number of their parameters. For example, Bayesian learning methods make use of a prior probability that (usually) gives lower probability to more complex models. Well-known model selection techniques include the Akaike information criterion (AIC), minimum description length (MDL), and the Bayesian information criterion (BIC). Alternative methods of controlling overfitting that do not involve regularization include cross-validation.

http://en.wikipedia.org/wiki/Regularization_(mathematics)
Compressed Sensing
- An instance of an underdetermined system of linear equations is a compressed sensing system:

  Y = AX

  - Y ~ compressed measurements (few)
  - A ~ sensing (in the form of linear combinations)
  - X ~ original information (what we would like to find)
- The recovery of a sparse solution to an underdetermined system of linear equations is performed using compressed sensing reconstruction techniques/solvers.
- Key question: do all underdetermined systems of linear equations admit a very sparse and unique solution?
  - Answer: some systems do, under a condition (RIP, NSP, ...).
- Issues to study (this semester):
  - Mathematical background
  - Checking the condition
  - Recovery algorithms
  - Implementing the algorithms
  - Applications
Data Compression
(ECE 5546-41)
2012 Fall
Ch2. Sparse and Compressible Signal Models
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
What we like to cover in this class
(Algorithms for sparse analysis, Lecture I: Background on sparse approximation, by Anna C. Gilbert, Department of Mathematics, University of Michigan)
Underdetermined linear equations
- Solving linear equations (Y: measured; X: unknown; A: from the signal model):

  Y = AX

- Underdetermined case:
  - Too few equations and too many unknowns means an infinite number of solutions (i.e., matrix A cannot be inverted as in the square case).
- CS tries to solve this under the condition of sparseness.
  - Compressed sensing reconstruction techniques allow one to find a solution that is sparse, i.e., one that has very few non-zero elements (the rest of the elements are zeros).
- We need to evaluate the fitness of a solution: this requires a measure (~ norm).
- We need to formally define the signal model, sparseness, compressibility, etc.
  - This requires many concepts from linear algebra.
Vector in 2-D (or 3-D) World
- Vector: a directed line segment (direction & magnitude), from an initial point to a terminal point
  - In 2-space
  - In 3-space
- Vector addition and scalar multiplication
- Vector norm: if v is a vector, then the magnitude of the vector is called the norm of the vector and is denoted ||v||. Furthermore, if v is a vector in 2-space (or in 3-space), then

  ||v|| = sqrt(v1^2 + v2^2)  (in 2-space);   ||v|| = sqrt(v1^2 + v2^2 + v3^2)  (in 3-space)

- Dot product: if u and v are two vectors in 2-space (or 3-space), and the angle between them is θ, then the dot product is defined as

  u · v = ||u|| ||v|| cos(θ)

  - It is sometimes called the scalar product or Euclidean inner product.
Extension to N-space (1)
- Definition of n-space: for a given positive integer n, an ordered n-tuple is a sequence of n real numbers denoted by (a1, a2, ..., an). The complete set of all ordered n-tuples is called n-space and is denoted by R^n.
  - It is a natural extension of 2-space and 3-space.
- Definition of arithmetic operations in n-space.
Extension to N-space (2)
- Definition of the Euclidean inner product: for two vectors u = (u1, u2, ..., un) and v = (v1, v2, ..., vn) in R^n, the Euclidean inner product is defined as

  u · v = <u, v> = sum_{i=1}^{n} ui vi

  - It is a natural extension of the dot product in 2-space.
  - It can be written in matrix form as follows (suppose u and v are column vectors):

  u · v = v^T u

- Note that when we add addition, scalar multiplication, and the Euclidean inner product to n-space, it is often called Euclidean n-space.
Extension to N-space (3)
- Let's extend the concepts of norm and distance to n-space.
- Definition: for a vector u = (u1, u2, ..., un) in R^n, the Euclidean norm is defined as

  ||u|| = sqrt(u · u) = sqrt(sum_{i=1}^{n} ui^2)

- Definition: for two vectors u, v in R^n, the Euclidean distance between the two points indicated by the two vectors is defined as

  d(u, v) = ||u - v|| = sqrt(sum_{i=1}^{n} (ui - vi)^2)
Generalization to Vector Space
- Up to now, we have had a good geometric analogy, especially in 2-space (or 3-space), coming from the notion that a vector is interpreted as a directed line segment.
- A vector, however, is a much more general concept, and it doesn't necessarily have to represent a directed line segment as before.
  - For example, a vector can be a matrix or a function, and those are only a couple of the possibilities for vectors.
  - Nor does a vector have to be one of the vectors we looked at in R^n (that is, a vector may not be in R^n; it is a more general object).
- The concept of n-space is now generalized into a vector space.
  - A vector space is nothing more than a collection of vectors (whatever those now are) that satisfies a set of axioms.
  - Once we get the general definitions of a vector and a vector space out of the way, we'll look at many of the important ideas that come with vector spaces.
Vector Space (1)
- Definition: let V be a set on which addition and scalar multiplication are defined (this means that if u and v are objects in V and c is a scalar, then we've defined u + v and cu in some way). If the following axioms hold for all objects u, v, w in V and all scalars c and k, then V is called a vector space and the objects in V are called vectors.
  (a) u + v is in V (closure under addition).
  (b) cu is in V (closure under scalar multiplication).
  (c) u + v = v + u (commutativity of addition).
  (d) u + (v + w) = (u + v) + w (associativity of addition).
  (e) There is a special object in V, denoted 0 and called the zero vector, such that for all u in V we have u + 0 = u.
  (f) For every u in V there is another object in V, denoted -u and called the negative of u, such that u + (-u) = 0.
  (g) c(u + v) = cu + cv (distributivity).
  (h) (c + k)u = cu + ku.
  (i) c(ku) = (ck)u.
  (j) 1u = u.
- A vector space is simply a collection of vectors satisfying the axioms above.
Vector Space (2)
- Notes
  - There is no need to be locked into the standard ways of defining addition and scalar multiplication. For the most part we will be doing addition and scalar multiplication in a fairly standard way, but there will be the occasional example where we won't.
  - In order for something to be a vector space, it simply must have an addition and a scalar multiplication that meet the above axioms, and it doesn't matter how strange the addition or scalar multiplication might be.
  - When the scalars in the definition are complex numbers, it is called a complex vector space. In the same way, when we restrict the scalars to real numbers we generally call the vector space a real vector space.
- Ex1: if n is any positive integer, then the set V = R^n with the standard addition and scalar multiplication as defined in the Euclidean n-space section is a vector space.
- Ex2: show that the set V = R^2 with the standard scalar multiplication and an addition defined as

  (u1, u2) + (v1, v2) = (u1 + 2v1, u2 + v2)

  is not a vector space.
Rf: Signal and Vector Space
- Many natural and man-made systems can be modeled well as linear.
- We model such linear structure using a linear model by treating a signal as a vector in a vector space.
  - The vector space model can capture the linear structure well.
  - This modeling allows us to apply intuitions and tools from the geometry of 3-space, such as length, distance, and angles.
  - This is useful when the signal lives in high-dimensional or infinite-dimensional spaces.
Inner Product
- Generalization of the concept of the dot product (or inner product) in n-space to a general vector space.
Norm in Vector Space
- A norm is a function that assigns a strictly positive length or size to all vectors in a vector space, other than the zero vector (which has zero length assigned to it).
  - A simple example is the 2-dimensional Euclidean space R^2 equipped with the Euclidean norm. The Euclidean norm assigns to each vector the length of the vector. Because of this, the Euclidean norm is often known as the magnitude.
  - A vector space with a norm is called a normed vector space.
- Definition of norm: given a vector space V over a subfield F of the complex numbers, a norm on V is a function p: V → R with the following properties: for all a in F and all u, v in V,
  - P1: p(av) = |a| p(v) (positive homogeneity or positive scalability).
  - P2: p(u + v) ≤ p(u) + p(v) (triangle inequality).
  - P3: if p(v) = 0, then v is the zero vector (separates points).
- A simple consequence of the first two axioms, positive homogeneity and the triangle inequality, is p(0) = 0 and thus p(v) ≥ 0 (positivity).
Examples of Norm (2)
- Zero norm: in signal processing and statistics, David Donoho referred to the zero "norm" with quotation marks:

  ||u||_0 = |supp(u)|,  where supp(u) = {i : ui ≠ 0}

  - supp(x): the support of x (the set of indices of the non-zero components of x).
- Following Donoho's notation, the zero "norm" of x is simply the number of non-zero coordinates of x, or the Hamming distance of the vector from zero.
  - When this "norm" is localized to a bounded set, it is the limit of the p-norms as p approaches 0.
  - Of course, the zero "norm" is not a B-norm, because it is not positively homogeneous. It is not even an F-norm, because it is discontinuous, jointly and severally, with respect to the scalar argument in scalar-vector multiplication and with respect to its vector argument.
  - Abusing terminology, some engineers omit Donoho's quotation marks and inappropriately call the number-of-nonzeros function the L0 norm (sic), also misusing the notation for the Lebesgue space of measurable functions.
http://en.wikipedia.org/wiki/Norm_(mathematics)
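A quick numeric illustration of the definition above (the vector is made up):

```python
import numpy as np

x = np.array([0.0, 3.0, 0.0, -1.5, 0.0])

support = np.flatnonzero(x)   # supp(x): indices of the non-zero entries
zero_norm = support.size      # ||x||_0 = |supp(x)|
print(zero_norm)              # 2
```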
Examples of Norm (3)
- p-norm (for p ≥ 1, a real number):

  ||u||_p = ( sum_{i=1}^{n} |ui|^p )^(1/p)

  - Note that for p = 1 we get the taxicab norm, for p = 2 we get the Euclidean norm, and as p approaches infinity the p-norm approaches the infinity norm or maximum norm.
  - This definition is still of some interest for 0 < p < 1, but the resulting function does not define a norm, because it violates the triangle inequality.
Examples of Norm (4)
- Lp norm: for p ∈ [1, ∞],

  ||u||_p = ( sum_{i=1}^{n} |ui|^p )^(1/p),   p ∈ [1, ∞)
  ||u||_∞ = max_{i=1,...,n} |ui|,             p = ∞
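The three special cases named above can be checked numerically (the example vector is made up):

```python
import numpy as np

u = np.array([3.0, -4.0, 0.0])

l1   = np.linalg.norm(u, 1)       # taxicab norm: |3| + |-4| + |0| = 7
l2   = np.linalg.norm(u, 2)       # Euclidean norm: sqrt(9 + 16) = 5
linf = np.linalg.norm(u, np.inf)  # maximum norm: 4
print(l1, l2, linf)
```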
Properties of Norms
- The concept of the unit circle (the set of all vectors of norm 1) is different in different norms.
  - For the 1-norm, the unit circle in R^2 is a square.
  - For the 2-norm (Euclidean norm), it is the well-known unit circle.
  - For the infinity norm, it is a different square.
  - For any p-norm it is a superellipse (with congruent axes).
  - Due to the definition of the norm, the unit circle is always convex and centrally symmetric (therefore, the unit ball may be a rectangle but cannot be a triangle).
- Illustration of unit circles in different norms.
2.2 BASES AND FRAMES
Linear Independence
- Linear independence
- A finite set of vectors that contains the zero vector will be linearly dependent.
- Suppose that S = {v1, ..., vk} is a set of vectors in R^n. If k > n, then the set of vectors is linearly dependent.
Orthogonality and Basis (2)
- Any vector in an inner product space with an orthogonal/orthonormal basis can be easily represented as a linear combination of those basis vectors.
Orthogonal Complement (1)
- Definition of orthogonal complement: suppose that W is a subspace of an inner product space V. We say that a vector u from V is orthogonal to W if it is orthogonal to every vector in W. The set of all vectors that are orthogonal to W is called the orthogonal complement of W and is denoted by W^⊥.
  - We say that W and W^⊥ are orthogonal complements.
- Theorem
Orthogonal Complement (2)
- Extension of projection
- Theorem
In Matrix Form
- Given a basis set {φi}_{i=1}^{n}, any vector x in R^n is uniquely represented as

  x = sum_{i=1}^{n} ci φi

- Form an n×n matrix Φ with columns given by the φi, and let c denote the length-n vector with entries ci; the matrix representation is

  x = Φc

- An orthonormal basis should satisfy <φi, φj> = δ(i - j).
- Therefore, ci = <x, φi>.
- In matrix form, c = Φ^T x (note that orthonormality means Φ^T Φ = I).
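A small sketch of the relations x = Φc and c = Φ^T x, using a hypothetical 2-D orthonormal basis (a 45° rotation of the standard basis):

```python
import numpy as np

# Hypothetical orthonormal basis of R^2: the standard basis rotated by 45 degrees.
s = 1.0 / np.sqrt(2.0)
Phi = np.array([[s, -s],
                [s,  s]])      # columns are the basis vectors phi_i

x = np.array([2.0, 0.0])
c = Phi.T @ x                  # c_i = <x, phi_i>, since Phi^T Phi = I
x_rec = Phi @ c                # x = Phi c recovers the signal exactly

print(np.allclose(Phi.T @ Phi, np.eye(2)))  # True: orthonormality
print(np.allclose(x_rec, x))                # True: perfect reconstruction
```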
Dictionary
- A dictionary in R^n is a collection Φ = {φi}_{i=1}^{N} of unit-norm vectors: ||φi||_2 = 1.
  - Each element is called an atom.
- If {φi} spans R^n, the dictionary is complete.
- If the {φi} are linearly dependent, the dictionary is redundant.
- In the sparse approximation literature, it is also common for a basis or frame to be referred to as a dictionary or an over-complete dictionary, respectively, with the dictionary elements being called atoms.
2.3 SPARSE REPRESENTATION
K-Sparse
- Definition of K-sparse: a signal is called K-sparse if it has at most K non-zero components, i.e., ||x||_0 ≤ K.
- Note that even if a signal x itself does not look K-sparse, we may still refer to x as being K-sparse, with the understanding that x can be expressed K-sparsely through a linear transformation (x = Ψα with ||α||_0 ≤ K).
- Ex: x(t) = cos(ωt)
  - Time domain
  - Fourier domain
  - DCT domain
Ex: Sparse representation of images
- Sparse representation of an image via a multiscale wavelet transform.
  - Note that most of the wavelet coefficients are close to zero.
- (a) Original image; (b) wavelet representation (larger coefficient → lighter pixel).
- Fig. 1.3 (Compressed Sensing by Y. Eldar et al.)
Ex: Sparse approx. of a natural image
- Sparse approximation of a natural image.
- (a) Original image; (b) approximation keeping only the largest 10% of the wavelet coefficients.
- Fig. 1.4 (Compressed Sensing by Y. Eldar et al.)
Set of K-Sparse Signals
- The set of all K-sparse signals:

  S_K = {x : ||x||_0 ≤ K}

- Q: is the set S_K a linear space?
  - That is, for any pair of vectors x, z in S_K, does x + z also belong to S_K?
  - See Fig. 1.5 (Compressed Sensing by Y. Eldar et al.).
Sparseness of Images
- Most natural images are characterized by large smooth or textured regions with relatively few sharp edges.
  - Signals with this structure are known to be very nearly sparse when represented using a multiscale wavelet approximation.
- K-term approximation
  - We need a measure (i.e., an appropriate norm) to quantify the approximation error.
  - This kind of approximation is non-linear (since the choice of which coefficients to keep depends on the signal itself).
2.4 COMPRESSIBLE SIGNALS
Compressible vs. Sparse
- Few real-world signals are truly sparse; rather, they are compressible (meaning that they can be well approximated by sparse signals).
  - These terms mean the same concept: compressible, approximately sparse, relatively sparse.
- Compressibility is quantified by calculating the error incurred by approximating a signal x by some x̂ ∈ S_K:

  σ_K(x)_p = min_{x̂ ∈ S_K} ||x - x̂||_p

- If x ∈ S_K, then σ_K(x)_p = 0 for any p.
- Thresholding (keeping only the K largest coefficients) gives the optimal approximation for all p.
- Choose a basis set such that the coefficients obey a power-law decay.
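The thresholding rule above can be sketched as follows (the test vector and K are made up):

```python
import numpy as np

def best_k_term(x, K):
    """Keep the K largest-magnitude entries of x and zero out the rest."""
    xk = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-K:]   # indices of the K largest magnitudes
    xk[idx] = x[idx]
    return xk

x = np.array([5.0, -0.1, 2.0, 0.3, -4.0])
xk = best_k_term(x, 2)                 # keeps 5.0 and -4.0
err = np.linalg.norm(x - xk, 2)        # sigma_K(x)_2 for K = 2
print(xk, err)
```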
Compressibility (1)
- Definition of compressibility: a signal is called compressible if its sorted coefficient magnitudes in a basis Ψ decay rapidly: x = Ψα where |α1| ≥ |α2| ≥ ... ≥ |αn|.
- Power-law decay: suppose there exist C1 and q > 0 such that

  |α_s| ≤ C1 s^(-q),   s = 1, 2, ...

  - A larger q means faster magnitude decay, and the more compressible a signal is.
  - Under power-law decay, a signal can be approximated quite well with K ≪ n terms.
Compressibility (2)
- Depending on the space (referred to by Ψ), the signal can be either compressible or not.
  - Therefore, a proper choice of the space is important.
- Q: for such a compressible (K-approximated) signal, there exist constants C2 and r > 0, depending only on C1 and q, such that

  σ_K(x)_2 ≤ C2 K^(-r)
K-term Approximation
- Only the K largest coefficients are kept, while the others are set to zero, to represent the given signal.
- K-term approximation error:

  σ_K(x)_2 = min_{α ∈ S_K} ||x - Ψα||_2
-- end --
Data Compression
(ECE 5546-41)
2012 Fall
Ch 3. Sensing Matrices
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
What are we doing?
Sparse vs. Compressible (1)
- Recall that we call a signal x K-sparse if it has at most K non-zeros:

  S_K = {x : ||x||_0 ≤ K},   where ||x||_0 = |supp(x)| = lim_{p→0} ||x||_p^p

- A K-sparse signal may not itself be sparse, but it admits a sparse representation in some basis:

  x = Ψs

- But few real-world signals are truly sparse.
  - Most signals can be represented as compressible signals.
- Example: K = 4 (x = Ψs, where s has 4 non-zero entries).
Sparse vs. Compressible (2)
- A compressible signal means that the vector of coefficients in a certain basis has few large coefficients, while the other coefficients have small values.
  - If we set the small coefficients to zero, the remaining large coefficients can represent the original signal with hardly noticeable perceptual loss.
Compressed Sensing (1)
- Compressed sensing measurement process:

  y = Φx

  - y: measurement vector (M×1)
  - Φ: measurement (sensing) matrix (M×N)
  - x: input signal vector in its original domain (e.g., time or spatial) (N×1)
  - (CS is also possible for continuous-time signals.)
- Φ represents a dimensionality reduction (it maps R^N into R^M, with M < N).
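A minimal sketch of the measurement process y = Φx (the dimensions, seed, and Gaussian choice of Φ are illustrative assumptions; the deck has not yet specified how Φ should be designed):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 128, 32, 4                   # ambient dim, measurements, sparsity

x = np.zeros(N)                        # a K-sparse signal
x[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # random Gaussian sensing matrix
y = Phi @ x                            # compressed measurement: M << N numbers

print(y.shape)                         # (32,)
```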
Compressed Sensing (3)
- Measurement process with a sparse x:
  - There is a small number of columns corresponding to the nonzero coefficients.
  - The measurement vector y is a linear combination of these columns.
- Ex: the following is an underdetermined system:
  - 4-sparse, N unknowns
  - Fewer equations than unknowns (M < N)
Questions to Answer (2)
- We further need to study many issues:
- Q1: how to design an M×N sensing matrix Φ to ensure that it preserves the information in the signal x?
  - A sensing matrix is designed to reduce the number of measurements as much as possible while allowing the recovery of a wide class of signals x from their measurements.
  - The sensing matrix design problem.
- Q2: how small an M can we choose, given K and N?
- Q3: how sparse does the signal have to be at a given M?
- Q4: how do we recover the original signal x from the measurements y?
  - Look for fast and robust algorithms: the signal recovery problem.
- Q5: when will the L1 convex relaxation solution attain the L0 solution?
Questions to Answer (3)
- After answering the previous questions, we further need to investigate yet other issues:
- How do we incorporate measurement noise into the signal model y = Φx?
  - What would happen to the L1-minimization signal recovery?
  - The reliability issue in signal recovery.
- What would happen if there is a model mismatch?
  - If the signal is not exactly K-sparse, what kind of results do we expect under such an assumption?
Design of Sensing Matrix
1. Null Space Property
2. Restricted Isometry Property
3. Bounded Coherence Property
1. Null Space Conditions
- Q: we would like to design Φ so that we can recover all sparse signals x corresponding to the measurements y. What condition on Φ do we need?
- Definition of the null space of Φ:

  N(Φ) = {z : Φz = 0}

- Uniqueness condition for y = Φx:

  Φ uniquely represents all x ∈ S_K  ⟺  N(Φ) contains no vector in S_2K

- Proof idea: distinct x must produce distinct measurement vectors. If two distinct K-sparse signals x, x' gave the same y, then Φ(x - x') = 0 with x - x' ∈ S_2K; in that case there would be no way to recover all signals x from the measurements y.
Spark
- Definition of spark: the spark of a given matrix Φ is the smallest number of columns of Φ that are linearly dependent.
- Spark:
  - A term coined by Donoho & Elad (2003).
  - It is a way of characterizing the null space of Φ using the L0 norm.
  - It is very complex to obtain (compared to a rank), since it calls for a combinatorial search over all possible subsets of columns of Φ.
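That combinatorial search can be written down directly; this brute-force sketch is only practical for tiny matrices (the 2×3 example is made up):

```python
import numpy as np
from itertools import combinations

def spark(Phi, tol=1e-10):
    """Smallest number of linearly dependent columns of Phi (brute force)."""
    M, N = Phi.shape
    for k in range(1, N + 1):
        for cols in combinations(range(N), k):
            # A subset of k columns is dependent iff its rank is below k.
            if np.linalg.matrix_rank(Phi[:, list(cols)], tol=tol) < k:
                return k
    return N + 1   # no dependent subset: columns in general position

# Made-up 2x3 example: the third column is the sum of the first two.
Phi = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0]])
print(spark(Phi))   # 3
```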
More on Spark (1)
- Solving an underdetermined equation y = Φx (Φ ~ M×N, N >> M): writing y (M×1) as the matrix [φ1 φ2 ... φN] times (x1, x2, ..., xN)^T,

  y = Φx = sum_i φi xi = sum_k φk xk + sum_j φj xj
                         (term A)        (term B)

  - Term A: its corresponding columns of Φ are linearly dependent.
  - Term B: its corresponding columns of Φ are linearly independent.
More on Spark (2)
- Note that any vector x can be represented as x = x_A + x_B, where x_A ∈ N(Φ).
- Note that (the zero vector is not considered, since it is a trivial case):

  2 ≤ spark(Φ) ≤ M + 1
Spark Condition
- Theorem: for any vector y ∈ R^M, there exists at most one signal x ∈ S_K such that y = Φx if and only if spark(Φ) > 2K.
  - (This is an equivalent way of characterizing the null space condition.)
- This theorem guarantees uniqueness of representation for K-sparse signals.
  - It has combinatorial computational complexity, since it must verify that all sets of columns of a certain size are linearly independent.
- (Proved by D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization".)
- The spark provides a complete characterization of when sparse recovery is possible. However, when dealing with approximately sparse signals (i.e., compressible signals), we must consider somewhat more restrictive conditions on the null space of Φ.
Proof of the Spark Condition
- (⇐) Suppose spark(Φ) > 2K. Assume that for some y there exist x, x' ∈ S_K such that y = Φx = Φx'. Letting h = x - x', we can write this as Φh = 0. Since spark(Φ) > 2K, all sets of up to 2K columns of Φ are linearly independent, and therefore h = 0, i.e., x = x'.

Corollary to the Spark Condition
- Corollary: M ≥ 2K.
- Proof: 2 ≤ spark(Φ) ≤ M + 1 and spark(Φ) > 2K together give 2K < M + 1, i.e., 2K ≤ M.
More on the Spark Condition
- The spark provides a complete characterization of when sparse recovery is possible:

  for any y, there exists at most one x ∈ S_K s.t. y = Φx  ⟺  spark(Φ) > 2K

- However, when dealing with approximately sparse signals (i.e., compressible signals), we must consider somewhat more restrictive conditions on the null space of Φ.
  - We must also ensure that N(Φ) does not contain any vectors that are too compressible, in addition to vectors that are sparse: the null space property.
- Notation:
  - Λ: a subset of the indices {1, 2, ..., N}; Λ^C = {1, 2, ..., N} \ Λ.
  - x_Λ: the length-N vector obtained by setting to 0 the entries of x indexed by Λ^C.
  - Φ_{Λ^C}: the M×N matrix obtained by setting to zero the columns indexed by Λ.
2. Null Space Property (NSP)
- Definition of the null space property (NSP) of order K:
  - A matrix Φ satisfies the null space property (NSP) of order K if there exists a constant C > 0 such that

  ||h_Λ||_2 ≤ C ||h_{Λ^C}||_1 / sqrt(K)

  holds for all h ∈ N(Φ) and for all Λ such that |Λ| ≤ K.
- Rf: for h = (h1, h2, h3, h4, ..., hn)^T and Λ = {1, 3}:

  h_Λ = (h1, 0, h3, 0, ..., 0)^T,   h_{Λ^C} = (0, h2, 0, h4, ..., hn)^T,   h = h_Λ + h_{Λ^C}
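The index-set splitting h = h_Λ + h_{Λ^C} used in the definition can be sketched as follows (the vector, index set, and constant C are made up; note the 0-based indices in code):

```python
import numpy as np

h = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Lam = [0, 2]                       # index set Lambda (0-based here)

h_L = np.zeros_like(h)
h_L[Lam] = h[Lam]                  # h_Lambda: keep the entries in Lambda
h_Lc = h - h_L                     # h_{Lambda^C}: the remaining entries

K, C = len(Lam), 2.0               # C is a made-up NSP constant
lhs = np.linalg.norm(h_L, 2)                    # ||h_Lambda||_2
rhs = C * np.linalg.norm(h_Lc, 1) / np.sqrt(K)  # C ||h_{Lambda^C}||_1 / sqrt(K)
print(h_L, h_Lc, lhs <= rhs)
```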
Null Space Property (NSP)
- The NSP

  ||h_Λ||_2 ≤ C ||h_{Λ^C}||_1 / sqrt(K)

  implies that vectors in the null space of Φ should not be too concentrated on a small subset of indices.
- If a vector h ∈ N(Φ) is exactly K-sparse, then there exists a Λ such that h_{Λ^C} = 0, and hence ||h_{Λ^C}||_1 = 0. Therefore, the NSP indicates that h_Λ = 0 as well, and thus ||h||_2 = 0.
- This means that if a matrix Φ satisfies the NSP, then the only K-sparse vector in N(Φ) is h = 0.
NSP and Sparse Recovery
- How do we measure the performance of sparse recovery algorithms when dealing with general non-sparse x?
- The following relationship, under the NSP, not only guarantees exact recovery of all possible K-sparse signals but also ensures a degree of robustness to non-sparse signals that directly depends on how well the signals are approximated by K-sparse vectors:

  ||Δ(Φx) - x||_2 ≤ C σ_K(x)_1 / sqrt(K)

- where Δ: R^M → R^N represents a specific recovery method, and

  σ_K(x)_p = min_{x̂ ∈ S_K} ||x - x̂||_p
NSP Theorem
- Theorem: for a sensing matrix Φ: R^N → R^M and an arbitrary recovery algorithm Δ: R^M → R^N, if the pair (Φ, Δ) satisfies

  ||Δ(Φx) - x||_2 ≤ C σ_K(x)_1 / sqrt(K)

  then Φ satisfies the NSP of order 2K.
Proof of NSP Theorem
- Suppose h ∈ N(Φ), and let Λ be the indices corresponding to the 2K largest entries of h. Split Λ into Λ0 and Λ1, where |Λ0| = |Λ1| = K.
- Set x = h_{Λ1} + h_{Λ^C} and x' = -h_{Λ0}, so that h = x - x'.
- Since by construction x' ∈ S_K, we can apply the guarantee to x' to obtain x' = Δ(Φx'). Moreover, since h ∈ N(Φ), we have Φh = Φx - Φx' = 0, so that Φx' = Φx and hence x' = Δ(Φx). Finally,

  ||h_Λ||_2 ≤ ||h||_2 = ||x - x'||_2 = ||x - Δ(Φx)||_2 ≤ C σ_K(x)_1 / sqrt(K) ≤ sqrt(2) C ||h_{Λ^C}||_1 / sqrt(2K)

- If a matrix Φ satisfies the NSP, then the only 2K-sparse vector in N(Φ) is h = 0.
Restricted Isometry Property (RIP)
- When measurements are contaminated with noise, or have been corrupted by some error such as quantization, it is useful to consider somewhat stronger conditions.
- Candès and Tao introduced the restricted isometry condition on matrices A and established its important role in CS.
- In mathematics, an isometry is a distance-preserving map between metric spaces. Geometric figures which can be related by an isometry are called congruent.
RestrictedRestricted IsometryIsometry Property (RIP)Property (RIP)
n Definition of RIPn A matrix F satisfies the restricted isometry property(RIP) of order Kif
there exists a such that
holds for all .
(0,1)Kd 2 2 2
2 2 2(1 ) (1 )K Kx xd d- F +
{ }0|Kx x x KS =
Digital Media Lab.
- If a matrix $\Phi$ satisfies the RIP of order $2K$, then $\Phi$ approximately preserves the distance between any pair of $K$-sparse vectors. This has fundamental implications concerning robustness to noise.
- If a matrix $\Phi$ satisfies the RIP of order $K$ with constant $\delta_K$, then for any $K' < K$ we automatically have that $\Phi$ satisfies the RIP of order $K'$ with constant $\delta_{K'} \le \delta_K$.
- If a matrix $\Phi$ satisfies the RIP of order $K$ with a sufficiently small constant, then it will also automatically satisfy the RIP of order $\gamma K$ for certain $\gamma$, albeit with a somewhat worse constant.
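Because the RIP constant is defined over all $K$-sparse supports, it can be computed exactly for small matrices by examining every $K$-column submatrix. The sketch below is our own illustration (the function `rip_constant` is not from the lecture); the exhaustive search is exponential in $N$, which is exactly why verifying the RIP is impractical at realistic sizes.

```python
import itertools
import numpy as np

def rip_constant(Phi, K):
    # delta_K is the smallest delta with (1-delta)||x||^2 <= ||Phi x||^2
    # <= (1+delta)||x||^2 for all K-sparse x; equivalently, the largest
    # deviation of the squared singular values of any K-column submatrix from 1.
    N = Phi.shape[1]
    delta = 0.0
    for support in itertools.combinations(range(N), K):
        s = np.linalg.svd(Phi[:, list(support)], compute_uv=False)
        delta = max(delta, s.max() ** 2 - 1.0, 1.0 - s.min() ** 2)
    return delta
```

For an orthonormal matrix the constant is 0, and the monotonicity property above ($\delta_{K'} \le \delta_K$ for $K' < K$) can be observed directly on random examples.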
The RIP and Stability
- Definition of C-stable: Let $\Phi : \mathbb{R}^N \to \mathbb{R}^M$ denote a sensing matrix and $\Delta : \mathbb{R}^M \to \mathbb{R}^N$ denote a recovery algorithm. The pair $(\Phi, \Delta)$ is called C-stable if for any $x \in \Sigma_K$ and any $e \in \mathbb{R}^M$, we have
$$\|\Delta(\Phi x + e) - x\|_2 \le C\,\|e\|_2.$$
- It says that if we add a small amount of noise to the measurements, then the impact of this on the recovered signal should not be arbitrarily large.
- As $C \to 1$, $\Phi$ must satisfy the lower bound of $(1 - \delta_K)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_K)\|x\|_2^2$ with $\delta_K = 1 - 1/C^2 \to 0$.
- Thus, if we desire to reduce the impact of noise in the recovered signal, we must adjust $\Phi$ so that it satisfies the lower bound of the above inequality with a tighter constant.
The RIP and Stability
- Theorem: If a pair $(\Phi, \Delta)$ is C-stable, then for all $x \in \Sigma_{2K}$,
$$\frac{1}{C}\,\|x\|_2 \le \|\Phi x\|_2.$$
- It demonstrates that the existence of any decoding algorithm that can stably recover from noisy measurements requires that $\Phi$ satisfy the lower bound of the RIP with a constant determined by $C$.
End of Lecture 3
(Chapter 3 continues next week)
Data Compression (ECE 5546-41)
2012 Fall
Ch 3. Sensing Matrices
Byeungwoo Jeon, Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
How many measurements are necessary to achieve the RIP? (Measurement Bound)
Measurement bound (1)
- Lemma: For $K$ and $N$ satisfying $K < N/2$, there exists a subset $X$ of $\Sigma_K$ such that for any $x \in X$ we have $\|x\|_2 \le \sqrt{K}$, and for any distinct $x, z \in X$,
$$\|x - z\|_2 \ge \sqrt{K/2} \quad \text{and} \quad \log|X| \ge \frac{K}{2}\log\frac{N}{K}.$$
- Proof:
Measurement bound (2)
- Theorem: Let $\Phi$ be an $M \times N$ matrix that satisfies the RIP of order $2K$ with constant $\delta \in (0, 0.5]$. Then
$$M \ge C\,K\log\frac{N}{K}, \quad \text{where} \quad C = \frac{1}{2\log(\sqrt{24} + 1)} \approx 0.28.$$
- Proof:
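The bound in the theorem is easy to evaluate numerically. The helper below is our own sketch (not from the slides) computing the smallest integer $M$ consistent with $M \ge C K \log(N/K)$:

```python
import math

# C = 1 / (2*log(sqrt(24) + 1)) ~ 0.28, per the theorem above
C = 1.0 / (2.0 * math.log(math.sqrt(24) + 1.0))

def min_measurements(N, K):
    # Smallest integer M satisfying the RIP measurement bound M >= C*K*log(N/K)
    return math.ceil(C * K * math.log(N / K))
```

For example, recovering a 10-sparse signal of length 1000 requires at least $M \approx 0.28 \cdot 10 \cdot \log(100) \approx 13$ measurements by this bound (the bound is a necessary condition, not a sufficient one).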
Measurement bound (3)
- Johnson-Lindenstrauss lemma: a set of $p$ points in high-dimensional Euclidean space can be embedded into $M$ dimensions, with all pairwise distances preserved up to a factor $(1 \pm \varepsilon)$, provided
$$M \ge \frac{c_0 \log p}{\varepsilon^2} \quad \text{for some constant } c_0 > 0.$$
How to design the sensing matrix?
RIP and NSP (1)
- Theorem: Suppose $\Phi$ satisfies the RIP of order $2K$ with $\delta_{2K} < \sqrt{2} - 1$. Then $\Phi$ satisfies the NSP of order $2K$ with constant
$$C = \frac{\sqrt{2}\,\delta_{2K}}{1 - (1 + \sqrt{2})\,\delta_{2K}}.$$
RIP and NSP (2)
- Lemma: Suppose $u \in \Sigma_K$. Then
$$\frac{\|u\|_1}{\sqrt{K}} \le \|u\|_2 \le \sqrt{K}\,\|u\|_\infty.$$
- Lemma: Suppose that $\Phi$ satisfies the RIP of order $2K$, and let $h \in \mathbb{R}^N$, $h \ne 0$, be arbitrary. Let $\Lambda_0$ be any subset of $\{1, 2, \dots, N\}$ such that $|\Lambda_0| \le K$. Define $\Lambda_1$ as the index set corresponding to the $K$ entries of $h_{\Lambda_0^c}$ with largest magnitude, and set $\Lambda = \Lambda_0 \cup \Lambda_1$. Then
$$\|h_\Lambda\|_2 \le \alpha\,\frac{\|h_{\Lambda_0^c}\|_1}{\sqrt{K}} + \beta\,\frac{|\langle \Phi h_\Lambda, \Phi h\rangle|}{\|h_\Lambda\|_2},$$
where
$$\alpha = \frac{\sqrt{2}\,\delta_{2K}}{1 - \delta_{2K}}, \qquad \beta = \frac{1}{1 - \delta_{2K}}.$$
Matrix Design Satisfying RIP
- Q: How to construct a matrix satisfying the RIP?
- Methods:
  1. Deterministic method
  2. Randomization method
     - Method without a specified $\delta_{2K}$ (just assume $\delta_{2K} > 0$)
     - Method with a specified $\delta_{2K}$ (a particular value of $\delta_{2K}$ is specified)
- (Recall) Definition of RIP: A matrix $\Phi$ satisfies the restricted isometry property (RIP) of order $K$ if there exists a $\delta_K \in (0,1)$ such that, for all $x \in \Sigma_K = \{x : \|x\|_0 \le K\}$,
$$(1 - \delta_K)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_K)\|x\|_2^2.$$
- (Recall) Theorem on RIP and NSP: Suppose $\Phi$ satisfies the RIP of order $2K$ with $\delta_{2K} < \sqrt{2} - 1$. Then $\Phi$ satisfies the NSP of order $2K$ with constant
$$C = \frac{\sqrt{2}\,\delta_{2K}}{1 - (1 + \sqrt{2})\,\delta_{2K}}.$$
Deterministic Matrix Design
- Idea: deterministically construct matrices of size $M \times N$ that satisfy the RIP of order $K$.
- It requires $M$ to be relatively large. (Ex) Requires $M = O(K^2 \log N)$ in [62]; $M = O(KN^\alpha)$ in [115].
- In real-world problems, these results lead to an unacceptably large $M$.
Randomization Matrix Design (1)
- Idea: choose random numbers for the matrix entries. For given $M$ and $N$, generate random matrices $\Phi$ by choosing the entries $\phi_{ij}$ as independent realizations from some PDF.
- Randomization method without a specified $\delta_{2K}$ (just assume $\delta_{2K} > 0$): set $M = 2K$ and draw $\Phi$ according to a Gaussian PDF. With probability 1, any subset of $2K$ columns is linearly independent, and hence all subsets of $2K$ columns will be bounded below by $(1 - \delta_{2K})$, where $\delta_{2K} > 0$.
- Problem: how to know the value of $\delta_{2K}$? One would need to search all combinations of $K$-dimensional subspaces of $\mathbb{R}^N$. Considering realistic values of $N$ and $K$, such a search requires prohibitively much computation.
Randomization Matrix Design (2)
- Randomization method with a specified value of $\delta_{2K}$: we would like to achieve the RIP of order $2K$ for a specified constant $\delta_{2K}$. It can be achieved by imposing two additional conditions on the PDF.
- Cond 1: the PDF yields a matrix that is norm-preserving, that is,
$$E(\phi_{ij}^2) = \frac{1}{M}.$$
Under this condition, the variance of the PDF is $1/M$.
- Cond 2: the PDF is sub-Gaussian, that is, there exists a constant $c > 0$ such that
$$E\big(e^{\phi_{ij} t}\big) \le e^{c^2 t^2 / 2} \quad \text{for all } t \in \mathbb{R}.$$
Note that the moment-generating function of the PDF is dominated by that of a Gaussian PDF, which is equivalent to requiring that the tails of the PDF decay at least as fast as the tails of a Gaussian PDF.
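As an illustration (our own sketch, not from the slides), a matrix with i.i.d. $N(0, 1/M)$ entries satisfies both conditions: Gaussians are sub-Gaussian, and the variance is $1/M$ by construction. The norm-preserving property $E\|\Phi x\|_2^2 = \|x\|_2^2$ can then be checked empirically:

```python
import numpy as np

rng = np.random.default_rng(42)
M, N = 128, 512
# i.i.d. N(0, 1/M) entries: E[phi_ij^2] = 1/M (Cond 1), Gaussian tails (Cond 2)
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))

x = rng.normal(size=N)
# E||Phi x||^2 = ||x||^2, so this ratio concentrates around 1 as M grows
ratio = np.linalg.norm(Phi @ x) ** 2 / np.linalg.norm(x) ** 2
```

The concentration corollary below quantifies exactly how tightly this ratio clusters around 1.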
Randomization Matrix Design (3)
- Examples of sub-Gaussian PDFs: Gaussian, Bernoulli (taking values $\pm 1$, suitably scaled), and more generally any PDF with bounded support.
- Strictly sub-Gaussian: a PDF satisfying
$$E\big(e^{\phi_{ij} t}\big) \le e^{c^2 t^2 / 2} \quad \text{for all } t \in \mathbb{R}$$
with the constant $c^2 = E(\phi_{ij}^2)$.
- Corollary: Suppose that $\Phi$ is an $M \times N$ matrix whose entries $\phi_{ij}$ are i.i.d., drawn according to a strictly sub-Gaussian PDF with $c^2 = 1/M$. Let $Y = \Phi x$ for $x \in \mathbb{R}^N$. Then for any $\varepsilon > 0$ and any $x \in \mathbb{R}^N$,
$$E\big(\|Y\|_2^2\big) = \|x\|_2^2 \quad \text{and} \quad P\big(\big|\|Y\|_2^2 - \|x\|_2^2\big| \ge \varepsilon\,\|x\|_2^2\big) \le 2\exp\!\Big(-\frac{M\varepsilon^2}{\kappa^*}\Big),$$
with $\kappa^* = \frac{2}{1 - \log 2} \approx 6.52$.
- Note that the norm of a sub-Gaussian random vector strongly concentrates about its mean.
Randomization Matrix Design (4)
- Theorem: Fix $\delta \in (0,1)$. Let $\Phi$ be an $M \times N$ random matrix whose entries $\phi_{ij}$ are drawn according to a strictly sub-Gaussian PDF with $c^2 = 1/M$. If
$$M \ge \kappa_1 K \log\frac{N}{K},$$
then $\Phi$ satisfies the RIP of order $K$ with the prescribed $\delta$ with probability exceeding $1 - 2e^{-\kappa_2 M}$, where $\kappa_1$ is arbitrary and $\kappa_2 = \frac{\delta^2}{2\kappa^*} - \frac{\log(42e/\delta)}{\kappa_1}$.
- Note that this measurement bound matches the optimal number of measurements (up to a constant).
Why is the randomized method better?
- One can show that for the random construction the measurements are democratic: it is possible to recover a signal using any sufficiently large subset of the measurements. Thus, by using a random $\Phi$, one can be robust to the loss or corruption of a small fraction of the measurements.
- Universality: random designs can easily accommodate other bases. In practice, we are often more interested in the setting where $x$ is sparse with respect to some basis $\Psi$; in this case, what is actually required is the RIP of the product $\Phi\Psi$.
  - In a deterministic design, the design process must take $\Psi$ into account.
  - In a randomized design, $\Phi$ can be designed independently of $\Psi$.
- If $\Phi$ is Gaussian and $\Psi$ is orthonormal, note that $\Phi\Psi$ is also Gaussian. Furthermore, for sufficiently large $M$, $\Phi\Psi$ will satisfy the RIP with high probability.
Practical Situation
- In practical implementations, the fully random matrix design may sometimes be impractical to build in hardware. Therefore it is possible to:
  - use a reduced amount of randomness, or
  - model the architecture via matrices $\Phi$ that have significantly more structure than a fully random matrix. Examples: random demodulator [192], random filtering [194], modulated wideband converter [147], random convolution [2, 166], compressive multiplier [179].
- Although not quite as easy as in the fully random case, one can prove that many of these constructions also satisfy the RIP.
Coherence
- Definition: The coherence of a matrix $\Phi$, $\mu(\Phi)$, is the largest absolute inner product between any two columns $\phi_i$, $\phi_j$ of $\Phi$:
$$\mu(\Phi) = \max_{1 \le i < j \le N} \frac{|\langle \phi_i, \phi_j \rangle|}{\|\phi_i\|_2\,\|\phi_j\|_2}.$$
- Note that the coherence satisfies the relation
$$\sqrt{\frac{N - M}{M(N - 1)}} \le \mu(\Phi) \le 1.$$
- The lower bound is called the Welch bound. When $N \gg M$, the lower bound is approximately $\mu(\Phi) \ge 1/\sqrt{M}$.
- Coherence is related to the spark, the NSP, and the RIP.
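These relations can be checked numerically. In the sketch below (our own illustration; `coherence` is not a library function), the coherence of a random matrix always lands between the Welch bound and 1, and for $N \gg M$ the Welch bound is close to $1/\sqrt{M}$:

```python
import numpy as np

def coherence(Phi):
    # mu(Phi): largest |<phi_i, phi_j>| over distinct normalized columns
    G = Phi / np.linalg.norm(Phi, axis=0)   # normalize each column
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)             # exclude <phi_i, phi_i> = 1
    return gram.max()

rng = np.random.default_rng(1)
M, N = 16, 64
mu = coherence(rng.normal(size=(M, N)))
welch = np.sqrt((N - M) / (M * (N - 1)))    # Welch lower bound
```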
Coherence and Spark
- Theorem (Gershgorin): The eigenvalues of an $N \times N$ matrix $M$ with entries $m_{ij}$, $1 \le i, j \le N$, lie in the union of $N$ discs $d_i = d_i(c_i, r_i)$, $1 \le i \le N$, centered at $c_i = m_{ii}$ and with radius
$$r_i = \sum_{j \ne i} |m_{ij}|.$$
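The disc theorem is easy to sanity-check numerically (our own sketch, not from the slides):

```python
import numpy as np

def eigs_in_gershgorin_discs(M):
    # Every eigenvalue must lie in some disc centered at m_ii with
    # radius r_i = sum over j != i of |m_ij|
    centers = np.diag(M)
    radii = np.sum(np.abs(M), axis=1) - np.abs(centers)
    return all(np.any(np.abs(lam - centers) <= radii + 1e-9)
               for lam in np.linalg.eigvals(M))

A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
```

Applied to the Gram matrix of a unit-norm $\Phi$, whose diagonal entries are 1 and whose off-diagonal entries are bounded by $\mu(\Phi)$, this theorem yields the spark lemma below.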
- Lemma: For any matrix $\Phi$,
$$\mathrm{spark}(\Phi) \ge 1 + \frac{1}{\mu(\Phi)}.$$
Coherence and NSP
- Theorem: If
$$K < \frac{1}{2}\Big(1 + \frac{1}{\mu(\Phi)}\Big),$$
then for each measurement vector $y \in \mathbb{R}^M$ there exists at most one signal $x \in \Sigma_K$ such that $y = \Phi x$.
- Lemma: If $\Phi$ has unit-norm columns and coherence $\mu = \mu(\Phi)$, then $\Phi$ satisfies the RIP of order $K$ with $\delta = (K - 1)\mu$ for all $K < 1/\mu$.
Data Compression (ECE 5546-41)
2012 Fall
Ch 4. Sparse Signal Recovery via L1 Minimization
Byeungwoo Jeon, Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
How to recover a sparse signal from a small number of linear measurements?
Sparse Signal Recovery (1)
- Problem: For $y = \Phi x$, with $x$ assumed to be sparse (or compressible), find $\hat{x}$ satisfying
$$\hat{x} = \arg\min_z \|z\|_0 \quad \text{subject to} \quad z \in B(y),$$
where $B(y)$ ensures that $\hat{x}$ is consistent with the measurements $y$.
- Under the assumption of $x$ being sparse (or compressible), find the $x$ corresponding to measurement $y$ under the L0 optimality condition. The solution seeks the sparsest signal in $B(y)$.
- Consistent with measurements: depending on the existence of measurement noise, $B(y)$ has two cases:
$$B(y) = \{z : \Phi z = y\} \ \text{(noise-free case)}, \qquad B(y) = \{z : \|\Phi z - y\|_2 \le \varepsilon\} \ \text{(noisy case)}.$$
Sparse Signal Recovery (2)
- The framework also holds when $x$ is not itself sparse. In that case, suppose $x = \Psi\alpha$; then the problem is
$$\hat{\alpha} = \arg\min_z \|z\|_0 \quad \text{subject to} \quad z \in B(y),$$
where
$$B(y) = \{z : \Phi\Psi z = y\} \ \text{(noise-free case)}, \qquad B(y) = \{z : \|\Phi\Psi z - y\|_2 \le \varepsilon\} \ \text{(noisy case)}.$$
- Note that when $\Psi$ is an orthonormal basis, it is possible to assume $\Psi = I$ without loss of generality.
Sparse Signal Recovery (3)
- How to solve the L0 minimization problem?
$$\hat{x} = \arg\min_z \|z\|_0 \quad \text{subject to} \quad z \in B(y).$$
- Note that $\|\cdot\|_0$ is a non-convex function, so this minimization problem is potentially very complex (NP-hard) to solve.
- L0 solution via L1 minimization:
$$\hat{x} = \arg\min_z \|z\|_1 \quad \text{subject to} \quad z \in B(y).$$
- If $B(y)$ is convex, this problem becomes computationally tractable! The solution still prefers a sparse solution in $B(y)$.
- Big question: will the L1 solution be similar to the L0 solution?
Why is L1 Minimization Preferred?
- Intuitively:
  - L1 minimization promotes sparsity.
  - There are a variety of reasons to suspect that L1 minimization will provide an accurate method for sparse signal recovery.
  - L1 minimization provides a computationally tractable approach to signal recovery.
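In the noise-free case the L1 problem is a linear program, so a generic LP solver suffices. The sketch below is our own illustration using SciPy's `linprog` (with these dimensions, exact recovery of the 3-sparse signal is expected, though not guaranteed in general); it uses the standard split $z = u - v$ with $u, v \ge 0$:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    # min ||z||_1 s.t. Phi z = y, as an LP: z = u - v with u, v >= 0,
    # minimize sum(u) + sum(v) subject to [Phi, -Phi][u; v] = y
    N = Phi.shape[1]
    res = linprog(np.ones(2 * N), A_eq=np.hstack([Phi, -Phi]), b_eq=y,
                  bounds=(0, None), method="highs")
    return res.x[:N] - res.x[N:]

rng = np.random.default_rng(0)
M, N = 25, 50
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
x = np.zeros(N)
x[[3, 17, 41]] = [2.0, -1.5, 1.0]        # a 3-sparse signal
x_hat = basis_pursuit(Phi, Phi @ x)       # recovers x from M = 25 measurements
```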
Analysis of the L1 Minimization Solution
$$\hat{x} = \arg\min_z \|z\|_1 \quad \text{subject to} \quad z \in B(y)$$
Noise-free Signal Recovery (1)
- Lemma: Let $\Phi$ be a matrix that satisfies the RIP of order $2K$ with constant $\delta_{2K} < \sqrt{2} - 1$. Let $x, \hat{x} \in \mathbb{R}^N$ be given and define $h = \hat{x} - x$. Let $\Lambda_0$ denote the index set corresponding to the $K$ entries of $x$ with largest magnitude and $\Lambda_1$ the index set corresponding to the $K$ entries of $h_{\Lambda_0^c}$ with largest magnitude. Set $\Lambda = \Lambda_0 \cup \Lambda_1$. If $\|\hat{x}\|_1 \le \|x\|_1$, then
$$\|h\|_2 \le C_0\,\frac{\sigma_K(x)_1}{\sqrt{K}} + C_1\,\frac{|\langle \Phi h_\Lambda, \Phi h\rangle|}{\|h_\Lambda\|_2},$$
where
$$C_0 = 2\,\frac{1 - (1 - \sqrt{2})\,\delta_{2K}}{1 - (1 + \sqrt{2})\,\delta_{2K}}, \qquad C_1 = \frac{2}{1 - (1 + \sqrt{2})\,\delta_{2K}}.$$
- It gives an error bound for the class of L1 minimization algorithms when combined with a measurement matrix $\Phi$ satisfying the RIP.
- For specific bounds for concrete examples of $B(y)$, we need to examine how requiring $\hat{x} \in B(y)$ affects $|\langle \Phi h_\Lambda, \Phi h\rangle|$.
Noise-free Signal Recovery (2)
- Proof: (self-study)
Noise-free Signal Recovery (3)
- Theorem: Let $\Phi$ be a matrix that satisfies the RIP of order $2K$ with constant $\delta_{2K} < \sqrt{2} - 1$. When $B(y) = \{z : \Phi z = y\}$, the solution $\hat{x}$ to the L1 minimization obeys
$$\|\hat{x} - x\|_2 \le C_0\,\frac{\sigma_K(x)_1}{\sqrt{K}}.$$
- For $x \in \Sigma_K = \{x : \|x\|_0 \le K\}$ and $\Phi$ satisfying the RIP, $\|\hat{x} - x\|_2 = 0$.
- Note that L1 minimization exactly provides the solution given by L0 minimization. In other words, for as few as $O(K\log(N/K))$ measurements, we can exactly recover any $K$-sparse signal using L1 minimization.
- This can be shown to be stable also under noisy measurements.
Noise-free Signal Recovery (4)
- Proof: For $x, \hat{x} \in B(y)$ we have $y = \Phi x = \Phi\hat{x}$ and $\|\hat{x}\|_1 \le \|x\|_1$, so the lemma can be applied to obtain, for $h = \hat{x} - x$,
$$\|h\|_2 \le C_0\,\frac{\sigma_K(x)_1}{\sqrt{K}} + C_1\,\frac{|\langle \Phi h_\Lambda, \Phi h\rangle|}{\|h_\Lambda\|_2}.$$
Since $h = \hat{x} - x$, we have $\Phi h = \Phi\hat{x} - \Phi x = 0$, so the second term vanishes and
$$\|h\|_2 \le C_0\,\frac{\sigma_K(x)_1}{\sqrt{K}}. \quad \text{---Q.E.D.---}$$
Noisy Signal Recovery (1)
- Theorem: Let $\Phi$ be a matrix that satisfies the RIP of order $2K$ with constant $\delta_{2K} < \sqrt{2} - 1$. Let $y = \Phi x + e$ with $\|e\|_2 \le \varepsilon$ (that is, bounded noise). Then, for $B(y) = \{z : \|\Phi z - y\|_2 \le \varepsilon\}$, the L1 solution $\hat{x}$ obeys
$$\|\hat{x} - x\|_2 \le C_0\,\frac{\sigma_K(x)_1}{\sqrt{K}} + C_2\,\varepsilon,$$
where
$$C_0 = 2\,\frac{1 - (1 - \sqrt{2})\,\delta_{2K}}{1 - (1 + \sqrt{2})\,\delta_{2K}}, \qquad C_2 = 4\,\frac{\sqrt{1 + \delta_{2K}}}{1 - (1 + \sqrt{2})\,\delta_{2K}}.$$
- This provides a bound on the worst-case performance for uniformly bounded noise.
Noisy Signal Recovery (2)
- Proof: (self-study)
Noisy Signal Recovery (3)
- What is the bound on the recovery error if the noise is Gaussian?
- Corollary: Let $\Phi$ be a sensing matrix that satisfies the RIP of order $2K$ with $\delta_{2K} < \sqrt{2} - 1$, and obtain measurements $y = \Phi x + e$, where the entries of $e$ are i.i.d. $N(0, \sigma^2)$. Then, when $B(y) = \{z : \|\Phi z - y\|_2 \le 2\sqrt{M}\,\sigma\}$, the solution to L1 minimization obeys
$$\|\hat{x} - x\|_2 \le C_0\,\frac{\sigma_K(x)_1}{\sqrt{K}} + 8\,\frac{\sqrt{1 + \delta_{2K}}}{1 - (1 + \sqrt{2})\,\delta_{2K}}\,\sqrt{M}\,\sigma$$
with probability at least $1 - e^{-c_0 M}$.
How to recover a non-sparse signal from a small number of linear measurements?
Instance-optimal Guarantee (1)
- Theorem: Let $\Phi$ be an $M \times N$ matrix and $\Delta : \mathbb{R}^M \to \mathbb{R}^N$ a recovery algorithm satisfying
$$\|x - \Delta(\Phi x)\|_2 \le C\,\sigma_K(x)_2 \quad \text{for some } K \ge 1.$$
Then
$$M > \Big(1 - \sqrt{1 - 1/C^2}\Big)\,N.$$
- In order to make the bound hold for all signals $x$ with a constant $C \approx 1$, then regardless of what recovery algorithm is being used, one needs to take $M \approx N$ measurements.
Rf: Instance-Optimal?
- The theorem speaks not only about exact recovery of all possible $K$-sparse signals; it also ensures a degree of robustness to non-sparse signals that directly depends on how well the signals are approximated by $K$-sparse vectors. This is an instance-optimal guarantee (i.e., it guarantees optimal performance for each instance of $x$).
- Cf. a guarantee that only holds for some subset of possible signals, such as compressible or sparse signals (the quality of the guarantee adapts to the particular choice of $x$).
- In that sense, instance-optimal guarantees are also commonly referred to as uniform guarantees, since they hold uniformly for all $x$.
Instance-optimal Guarantee (2)
- Theorem: Fix $\delta \in (0,1)$. Let $\Phi$ be an $M \times N$ random matrix whose entries $\phi_{ij}$ are i.i.d., drawn according to a strictly sub-Gaussian distribution with $c^2 = 1/M$. If
$$M \ge \kappa_1 K \log\frac{N}{K},$$
then $\Phi$ satisfies the RIP of order $K$ with the prescribed $\delta$ with probability exceeding $1 - 2e^{-\kappa_2 M}$, where $\kappa_1$ is arbitrary and $\kappa_2 = \frac{\delta^2}{2\kappa^*} - \frac{\log(42e/\delta)}{\kappa_1}$.
Instance-optimal Guarantee (3)
- Theorem: Let $x \in \mathbb{R}^N$ be fixed. Set $\delta_{2K} < \sqrt{2} - 1$, and suppose that $\Phi$ is an $M \times N$ sub-Gaussian random matrix with
$$M \ge \kappa_1 K \log\frac{N}{K},$$
and the measurement is $y = \Phi x$. Set $\varepsilon = 2\,\sigma_K(x)_2$. Then, with probability exceeding $1 - 2e^{-\kappa_2 M} - e^{-\kappa_3 M}$, when $B(y) = \{z : \|\Phi z - y\|_2 \le \varepsilon\}$, the L1 solution $\hat{x}$ obeys
$$\|\hat{x} - x\|_2 \le 8\,\frac{\sqrt{1 + \delta_{2K}}}{1 - (1 + \sqrt{2})\,\delta_{2K}}\,\sigma_K(x)_2.$$
End of Chapter 4
Data Compression (ECE 5546-41)
2012 Fall
Ch 5. Algorithms for Sparse Recovery
Part 1
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
Various recovery algorithms for compressively sensed sparse signals
Sparse Signal Recovery (1)
(From Chapter 4)
- Problem: For $y = \Phi x$, with $x$ assumed to be sparse (or compressible), find $\hat{x}$ satisfying
$$\hat{x} = \arg\min_z \|z\|_0 \quad \text{subject to} \quad z \in B(y),$$
where $B(y)$ ensures that $\hat{x}$ is consistent with the measurements $y$.
- Under the assumption of $x$ being sparse (or compressible), find the $x$ corresponding to measurement $y$ under the L0 optimality condition. The solution seeks the sparsest signal in $B(y)$.
- Consistent with measurements: depending on the existence of measurement noise, $B(y)$ has two cases:
$$B(y) = \{z : \Phi z = y\} \ \text{(noise-free case)}, \qquad B(y) = \{z : \|\Phi z - y\|_2 \le \varepsilon\} \ \text{(noisy case)}.$$
- A loss (cost) function other than the Euclidean distance may also be appropriate.
Sparse Signal Recovery (2)
(From Chapter 4)
- How to solve the L0 minimization problem?
$$\hat{x} = \arg\min_z \|z\|_0 \quad \text{subject to} \quad z \in B(y),$$
with
$$B(y) = \{z : \Phi\Psi z = y\} \ \text{(noise-free case)}, \qquad B(y) = \{z : \|\Phi\Psi z - y\|_2 \le \varepsilon\} \ \text{(noisy case)}.$$
- Note that $\|\cdot\|_0$ is a non-convex function: potentially very complex (NP-hard) to solve.
- L0 solution via L1 minimization:
$$\hat{x} = \arg\min_z \|z\|_1 \quad \text{subject to} \quad z \in B(y).$$
If $B(y)$ is convex, this problem becomes computationally tractable! The solution still prefers a sparse solution in $B(y)$.
- Big question: will the L1 solution be similar to the L0 solution?
Use of Different Norms
- Solve the underdetermined system $y = \Phi x$, where $\Phi \in \mathbb{R}^{M \times N}$, $x \in \mathbb{R}^N$, $y \in \mathbb{R}^M$, $M \ll N$.
- L2 norm ($p = 2$): small penalty on small residuals, strong penalty on large residuals.
- Regularized formulation:
$$\min_x \big\{J(x) + \mu\,H(\Phi x, y)\big\}.$$
- The parameter $\mu$ can be found by trial and error, or by a statistical technique such as cross-validation.
- Actually, the decision of a proper value of $\mu$ is a research problem in itself.
Convex optimization-based method (2)
- Ex: $J(x) = \|x\|_p$
  - $p = 0$ (L0 norm): directly measures sparsity (but hard to solve):
$$\min_x \big\{\|x\|_0 \ : \ y = \Phi x\big\}.$$
  - $p = 1$ (L1 norm): gives robustness against outliers.
- Ex: $H(\Phi x, y) = \|\Phi x - y\|_p$, for example $p = 2$.
- Ex: the noisy case can be modified in several ways:
$$\min_x \|\Phi x - y\|_2 \ \text{subject to} \ \|x\|_0 \le K,$$
$$\min_x \frac{1}{2}\|\Phi x - y\|_2^2 + \mu\|x\|_0, \quad \mu > 0,$$
$$\min_x \|x\|_0 \ \text{subject to} \ \|\Phi x - y\|_2 \le \varepsilon.$$
Convex optimization-based method (3)
- Standard optimization packages cannot be used for real applications of CS, since the number of unknowns (that is, the dimension of $x$) is very large.
- If there are no restrictions on the sensing matrix $\Phi$ and the signal $x$, the sparse approximation problem is very complex (NP-hard). In practice, sparse approximation algorithms tend to be slow unless the sensing matrix $\Phi$ admits a fast matrix-vector multiply (like a fast-transform algorithm utilizing matrix structure).
- For compressible signals, which need some transformation first, fast multiplication is possible when both the (random) sensing matrix and the sparsity basis are structured.
- Then, the question is how to incorporate more sophisticated signal constraints into sparsity models.
- [Need 1-2 volunteers to investigate fast computation utilizing the structure of the sensing matrix.]
L0 Approach (1)
- The L0 norm explicitly counts the number of nonzero components of the given data; it is directly related to the sparsity of a signal.
- Define the function $\mathrm{card}(x)$: cardinality. For scalar $x$: $\mathrm{card}(x) = 0$ ($x = 0$) and $1$ ($x \ne 0$).
- $\mathrm{card}(x)$ has no convexity properties.
- Note, however, that it is quasiconcave on $\mathbb{R}^n_+$, since for $x, y \ge 0$,
$$\mathrm{card}(x + y) \ge \min\{\mathrm{card}(x), \mathrm{card}(y)\}.$$
(From Prof. S. Boyd (EE364a, b), Stanford Univ.)
Rf: Quasiconvexity (1)
- Quasiconvex function: a real-valued function defined on an interval or on a convex subset of a real vector space such that the inverse image of any set of the form $(-\infty, a)$ is a convex set.
- Informally, along any stretch of the curve, the highest point is one of the endpoints.
- The negative of a quasiconvex function is said to be quasiconcave.
- [Figures: a quasiconvex function that is not convex; a function that is not quasiconvex, where the set of points in the domain for which the function values are below the dashed red line is the union of two red intervals, which is not a convex set.]
(http://en.wikipedia.org/wiki/Quasiconvex_function)
Rf: Quasiconvexity (2)
- Def: A function $f : S \to \mathbb{R}$ defined on a convex subset $S$ of a real vector space is quasiconvex if for all $x, y \in S$ and $\lambda \in [0,1]$,
$$f(\lambda x + (1 - \lambda)y) \le \max\{f(x), f(y)\}.$$
- Note that the points $x$ and $y$, and the point directly between them, can be points on a line or, more generally, points in $n$-dimensional space. In words, if $f$ is such that a point directly between two other points never gives a higher value of the function than both of the other points, then $f$ is quasiconvex.
- An alternative way of defining a quasiconvex function is to require that each sub-level set $S_a(f) = \{x : f(x) \le a\}$ is a convex set.
- A concave function can be quasiconvex; for example, $\log(x)$ is concave and quasiconvex.
- Any monotonic function is both quasiconvex and quasiconcave. More generally, a function which decreases up to a point and increases from that point on is quasiconvex (compare unimodality).
(http://en.wikipedia.org/wiki/Quasiconvex_function)
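The defining inequality can be spot-checked numerically. The helper below is our own illustration (not from the slides): it samples random pairs and convex combinations, and a single violation disproves quasiconvexity on the sampled domain.

```python
import math
import random

def looks_quasiconvex(f, points, trials=2000):
    # Empirically test f(l*x + (1-l)*y) <= max(f(x), f(y)) on random pairs;
    # passing is only evidence, but one violation is a definitive counterexample
    rnd = random.Random(0)
    for _ in range(trials):
        x, y = rnd.choice(points), rnd.choice(points)
        lam = rnd.random()
        if f(lam * x + (1 - lam) * y) > max(f(x), f(y)) + 1e-12:
            return False
    return True

pts = [0.1 * k for k in range(1, 100)]   # sample points in (0, 10)
```

On these samples, `math.log` (monotone, hence quasiconvex) and the convex parabola pass, while a concave bump such as $-(t-5)^2$ fails, matching the definitions above.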
Rf: Quasiconvexity (3)
- Quasiconvexity is a generalization of convexity: all convex functions are also quasiconvex, but not all quasiconvex functions are convex.
- A function that is both quasiconvex and quasiconcave is quasilinear.
- [Figures: the probability density function of the normal distribution is quasiconcave but not concave; a quasilinear function is both quasiconvex and quasiconcave.]
(http://en.wikipedia.org/wiki/Quasiconvex_function)
L0 Approach (2)
- General convex-cardinality problem: a problem that would be convex, except for the appearance of $\mathrm{card}(\cdot)$ in the objective or constraints. Examples ($f$, $C$: convex):
$$\text{minimize } \mathrm{card}(x) \ \text{subject to} \ x \in C,$$
$$\text{minimize } f(x) \ \text{subject to} \ x \in C, \ \mathrm{card}(x) \le K.$$
- Solving a convex-cardinality problem, for $x \in \mathbb{R}^n$: fix a sparsity pattern of $x$ (i.e., which entries are zero/nonzero), then solve the resulting convex problem. If we solve the $2^n$ convex problems associated with all possible sparsity patterns, the convex-cardinality problem is solved completely.
- However, this is practically possible only for $n \le 10$ or so; the general convex-cardinality problem is NP-hard.
(From Prof. S. Boyd (EE364a, b), Stanford Univ.)
L0 Approach (3)
- Many forms of the optimization problem:
$$\text{minimize } \|\Phi x - y\|_2 \ \text{subject to} \ \|x\|_0 \le K,$$
$$\text{minimize } \|x\|_0 \ \text{subject to} \ \|\Phi x - y\|_2 \le \varepsilon,$$
$$\text{minimize } \|\Phi x - y\|_2 + \lambda\|x\|_0.$$
- L1-norm heuristic: replace $\|x\|_0$ with $\lambda\|x\|_1$, or add the regularization term $\lambda\|x\|_1$ to the objective function; $\lambda$ is a parameter used to achieve the desired sparsity.
- More sophisticated versions use $\sum_i w_i |x_i|$ or $\sum_i w_i (x_i)_+ + \sum_i v_i (x_i)_-$, where $w$ and $v$ are positive weights.
(From Prof. S. Boyd (EE364a, b), Stanford Univ.)
Rf: Reweighted L1 algorithm (1)
- (Joint work of E. Candès, M. Wakin and S. Boyd.)
- Minimum L0 recovery requires minimal oversampling but is intractable:
$$\min \|x\|_0 = \sum_i 1_{\{x_i \ne 0\}} \quad \text{subject to} \quad y = \Phi x.$$
- Observation: if $x^*$ is the solution to the combinatorial search and
$$w_i^* = \begin{cases} 1/|x_i^*|, & x_i^* \ne 0, \\ \infty, & x_i^* = 0, \end{cases}$$
then $x^*$ is also the solution to
$$\min \sum_i w_i^* |x_i| \quad \text{subject to} \quad y = \Phi x.$$
(From CS Theory Lecture Notes by E. Candès, 2007.)
Rf: Reweighted L1 algorithm (2)
- Initial step: $w_i^{(0)} = 1$ for all $i$.
- Loop: for $j = 1, 2, 3, \dots$
  - Solve
$$x^{(j)} = \arg\min \sum_i w_i^{(j-1)} |x_i| \quad \text{such that} \quad y = \Phi x.$$
  - Update
$$w_i^{(j)} = \frac{1}{|x_i^{(j)}| + \epsilon}.$$
  - Until convergence (typically 2-5 iterations).
- Intuition: down-weight large entries of $x$ to mimic the magnitude-insensitive L0 penalty.
(From CS Theory Lecture Notes by E. Candès, 2007.)
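The loop above can be sketched directly on top of a weighted basis-pursuit LP. This is our own illustration using SciPy's `linprog` (with these dimensions the sparse signal is typically recovered, though this is not guaranteed in general):

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1(Phi, y, w):
    # min sum_i w_i |x_i| s.t. Phi x = y (LP via x = u - v, u, v >= 0)
    N = Phi.shape[1]
    res = linprog(np.concatenate([w, w]), A_eq=np.hstack([Phi, -Phi]),
                  b_eq=y, bounds=(0, None), method="highs")
    return res.x[:N] - res.x[N:]

def reweighted_l1(Phi, y, iters=4, eps=0.1):
    w = np.ones(Phi.shape[1])            # w_i^(0) = 1: first pass is plain L1
    for _ in range(iters):
        x = weighted_l1(Phi, y, w)
        w = 1.0 / (np.abs(x) + eps)      # down-weight large entries
    return x

rng = np.random.default_rng(3)
M, N = 24, 60
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
x0 = np.zeros(N)
x0[[5, 22, 40]] = [1.5, -2.0, 0.8]
x_hat = reweighted_l1(Phi, Phi @ x0)
```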
Rf: Reweighted L1 algorithm (3)
- Empirical performance: (figure; from CS Theory Lecture Notes by E. Candès, 2007)
L1 Approach (1)
- Connection between the L1 norm and sparsity: known for a long time (early '70s), and mainly studied in geophysics (literature on sparse spike trains). The key rough empirical fact is that L1 returns sparse solutions.
- Replace the combinatorial L0 function with the L1 norm, yielding a convex optimization problem: it makes the problem tractable!
- There can be several variants of the problem.
(From CS Theory Lecture Notes by E. Candès, 2007.)
L1 Approach (2)
- The L1 minimization problem:
$$\min_{x \in \mathbb{R}^N} \|x\|_1 \quad \text{subject to} \quad \Phi x = y.$$
There is always a solution with at most $M$ non-zero terms; in general, the solution is unique.
- Similarly, for
$$\min_{x \in \mathbb{R}^N} \|\Phi x - y\|_1,$$
there is always a solution whose residual $r = y - \Phi x$ has at most $N - M$ non-zero terms; in general, the solution is unique.
(From CS Theory Lecture Notes by E. Candès, 2007.)
L1 Approach (3)
- Variant: start with the minimum-cardinality problem ($C$: convex):
$$\text{minimize } \mathrm{card}(x) \ \text{subject to} \ x \in C.$$
Apply the heuristic to obtain the L1-norm minimization problem:
$$\text{minimize } \|x\|_1 \ \text{subject to} \ x \in C.$$
- Variant: start with the cardinality-constrained problem ($f$, $C$: convex):
$$\text{minimize } f(x) \ \text{subject to} \ x \in C, \ \mathrm{card}(x) \le K.$$
Apply the heuristic to obtain the L1-norm constrained problem
$$\text{minimize } f(x) \ \text{subject to} \ x \in C, \ \|x\|_1 \le \beta,$$
or the L1-regularized problem
$$\text{minimize } f(x) + \lambda\|x\|_1 \ \text{subject to} \ x \in C.$$
- $\beta$, $\lambda$ are adjusted so that $\mathrm{card}(x) \le K$.
(From Prof. S. Boyd (EE364a, b), Stanford Univ.)
L1 Approach (4)
- Variant with polishing:
  - Use the L1 heuristic to find an estimate $\hat{x}$ with the required sparsity.
  - Fix the sparsity pattern of $\hat{x}$.
  - Re-solve the (convex) optimization problem with this sparsity pattern to obtain the final (heuristic) solution.
(From Prof. S. Boyd (EE364a, b), Stanford Univ.)
Some examples: convex optimization
(From "Computational Methods for Sparse ...," 2010 IEEE Proceedings, by J.A. Tropp and J. Wright.)
Equality-constrained Problem
- Equality-constrained problem: among all $x$ consistent with the measurements, pick the one with minimum L1 norm:
$$\min_x \|x\|_1 \quad \text{subject to} \quad y = \Phi x. \quad (C1)$$
(From "Computational Methods for Sparse ...," 2010 IEEE Proceedings, by J.A. Tropp and J. Wright.)
Convex Relaxation Method
- Convex relaxation method:
$$\min_x \frac{1}{2}\|\Phi x - y\|_2^2 + \mu\|x\|_1, \quad \mu > 0. \quad (C2)$$
- $\mu$ is a regularization parameter: it governs the sparsity of the solution; a large $\mu$ typically produces sparser results.
- How to choose $\mu$? One often needs to solve the problem repeatedly for different choices of $\mu$, or to trace systematically the path of solutions as $\mu$ decreases towards zero.
(From "Computational Methods for Sparse ...," 2010 IEEE Proceedings, by J.A. Tropp and J. Wright.)
LASSO
- Least Absolute Shrinkage and Selection Operator (LASSO) method:
$$\min_x \|\Phi x - y\|_2^2 \quad \text{subject to} \quad \|x\|_1 \le \beta. \quad (C3)$$
- It is equivalent to the convex relaxation method (C2) in the sense that the path of solutions of (C3), parameterized by positive $\beta$, matches the solution path of (C2) as $\mu$ varies.
- Rf: its L0 version:
$$\min_x \|\Phi x - y\|_2 \quad \text{subject to} \quad \|x\|_0 \le K.$$
One can interpret this as fitting the vector $y$ as a linear combination of $K$ regressors (chosen from $N$ possible regressors) ~ feature selection (in statistics); i.e., choose a subset of $K$ regressors that together best fit or explain $y$. In principle it can be solved by trying all choices.
- Rf: an independent variable is also known as a "predictor variable", "regressor", "controlled variable", "manipulated variable", "explanatory variable", "feature" (see machine learning and pattern recognition), or an "input variable".
(From CS Tutorial at ITA 2008 by Baraniuk, Romberg, and Wakin.)
Others
- Quadratic relaxation (LASSO), with explicit parameterization of the error norm:
$$\min_x \|x\|_1 \quad \text{subject to} \quad \|\Phi x - y\|_2 \le \varepsilon. \quad (C4)$$
- Dantzig selector (with residual correlation constraints):
$$\min_x \|x\|_1 \quad \text{subject to} \quad \|\Phi^T(\Phi x - y)\|_\infty \le \varepsilon.$$
(From CS Tutorial at ITA 2008 by Baraniuk, Romberg, and Wakin.)
Further study (Volunteer?)
- Other optimization algorithms:
  - Interior point methods (slow, but extremely accurate)
  - Homotopy methods (fast and accurate for small-scale problems)
Gradient Method (1)
- (Also known as first-order methods.) Iteratively solve the following problem:
$$\min_x \frac{1}{2}\|\Phi x - y\|_2^2 + \mu\|x\|_1, \quad \mu > 0. \quad (C2)$$
- Similar methods under this category: operator splitting [65], iterative splitting and thresholding (IST) [66], fixed-point iteration [67], sparse reconstruction via separable approximation (SpaRSA) [68], TwIST [70], GPSR [71].
(From "Computational Methods for Sparse ...," 2010 IEEE Proceedings, by J.A. Tropp and J. Wright.)
Gradient Method (2)
- Gradient-descent framework
  - Input: a signal $y \in \mathbb{R}^M$, sensing matrix $\Phi \in \mathbb{R}^{M \times N}$, regularization parameter $\mu > 0$, and initial estimate $x_0$. Output: coefficient vector $x \in \mathbb{R}^N$.
  - Algorithm:
    (1) Initialize: set $k = 1$.
    (2) Subproblem: choose $\alpha_k > 0$ and compute
$$z_k := \arg\min_z \ \frac{\alpha_k}{2}\|z - x_k\|_2^2 + \langle \Phi^*(\Phi x_k - y),\ z - x_k\rangle + \mu\|z\|_1.$$
    If an acceptance test on $z_k$ is not passed, increase $\alpha_k$ by some factor and repeat.
    (3) Line search: choose $\gamma_k \in (0, 1]$ and obtain $x_{k+1}$ from
$$x_{k+1} := x_k + \gamma_k\,(z_k - x_k).$$
    (4) Test: if a stopping criterion holds, terminate with $x = x_{k+1}$. Otherwise, set $k \leftarrow k + 1$ and go to (2).
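With a fixed step $\alpha_k = \|\Phi\|_2^2$ and $\gamma_k = 1$, the subproblem has a closed-form soft-thresholding solution, giving the classic ISTA iteration. The sketch below is our own illustration of that special case (a simplification, not the exact framework on the slide):

```python
import numpy as np

def soft(v, t):
    # Soft-thresholding: closed-form minimizer of the L1 subproblem
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(Phi, y, mu, iters=500):
    # Fixed alpha = ||Phi||_2^2 (largest squared singular value), gamma_k = 1:
    # x_{k+1} = soft(x_k - Phi^T (Phi x_k - y) / alpha, mu / alpha)
    alpha = np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        x = soft(x - Phi.T @ (Phi @ x - y) / alpha, mu / alpha)
    return x

rng = np.random.default_rng(7)
M, N = 40, 100
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
x0 = np.zeros(N)
x0[[10, 33, 70, 85]] = [1.0, -1.2, 0.7, 2.0]
y = Phi @ x0
x_hat = ista(Phi, y, mu=0.01)   # approximately minimizes (C2)
```

Each iteration costs only two matrix-vector multiplies, which is why first-order methods scale to the large problem sizes discussed earlier.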
Gradient Method (3)
- This gradient-based method works well on sparse signals when the dictionary $\Phi$ satisfies the RIP.
- It benefits from warm starting: the work required to identify a solution can be reduced dramatically when the initial estimate of $x$ is close to the solution.
- Continuation strategy: solve the optimization problem (C2) for a decreasing sequence of $\mu$ values, using the approximate solution for each value as the starting point for the next sub-problem.
(From "Computational Methods for Sparse ...," 2010 IEEE Proceedings, by J.A. Tropp and J. Wright.)
Review: Convex Optimization
References (1)
- Introduction to Optimization: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-079-introduction-to-convex-optimization-fall-2009/index.htm
References (2)
n Convex Optimization (EE364a by Prof. Boyd)n http://www.stanford.edu/class/ee364a/lectures.htmln Video lecture is also available Introduction
Convex sets
Convex functions
Convex optimization problems
DualityApproximation and fitting
Statistical estimation
Geometric roblems
Digital Media Lab. 36
Numerical linear algebra background
Unconstrained minimization
Equality constrained minimization
Interior-point methods
Conclusions
Lecture slides in one file.
Additional lecture slides:
Convex optimization examples
Stochastic programming
Chance constrained optimization
Filter design and equalizationDisciplined convex programming and CVX
Two lectures from EE364b:
methods for convex-cardinality problems
methods for convex-cardinality problems, part II
Mathematical Optimization Problem
n Optimization problem
From Prof. S. Boyd (EE364a, b), Stanford Univ.
Solving Optimization Problem
n General optimization problem

      min f_0(x) subject to f_i(x) ≤ y_i,  i = 1, ..., m

n Very difficult to solve
n Methods involve some compromise, e.g., very long computation time, or not always finding the solution
n Exceptions: certain problem classes can be solved efficiently and reliably
n Least-squares problems

      min_x ||Φx − y||₂²

n Analytical solution: x* = (Φ^T Φ)^(−1) Φ^T y
n Linear programming problems

      min c^T x subject to a_i^T x ≤ y_i,  i = 1, ..., m

n No analytical formula for the solution
n Reliable and efficient algorithms and software ~ a mature technology
n Convex optimization problems
n Objective and constraint functions are convex
n Includes least-squares problems and linear programming as special cases
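The least-squares analytical solution can be checked numerically. A small sketch assuming NumPy (`ls_solve` is an illustrative name); it solves the normal equations rather than forming the inverse explicitly:

```python
import numpy as np

def ls_solve(Phi, y):
    # Analytical least-squares solution x* = (Phi^T Phi)^{-1} Phi^T y,
    # valid when Phi has full column rank; solve() avoids the explicit inverse.
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
```

The result agrees with NumPy's own least-squares routine, `np.linalg.lstsq`, on well-conditioned problems.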
Optimization problem in standard form
Optimal & locally optimal points
Implicit constraints
Convex Set and Others (1)
n Def: A set Ω is convex if and only if for any x1, x2 ∈ Ω and any λ with 0 ≤ λ ≤ 1, the convex combination x = λx1 + (1−λ)x2 ∈ Ω.
n Example: (figures omitted) convex set / non-convex set / non-convex set
n Def: Convex combination of x1, ..., xk: any point x of the form x = θ1x1 + θ2x2 + ... + θkxk with θ1 + ... + θk = 1, θj ≥ 0.
n Def: Convex hull (conv S): the set of all convex combinations of points in S.
Convex Set and Others (2)
n Def: Conic (nonnegative) combination of x1 and x2: any point of the form x = θ1x1 + θ2x2 with θ1 ≥ 0, θ2 ≥ 0.
n Def: Convex cone: a set that contains all conic combinations of points in the set.
Convex function
n Def: A function f(x): Ω → R is convex if and only if every convex combination x = λx1 + (1−λ)x2, for x1, x2 ∈ Ω and λ with 0 ≤ λ ≤ 1, satisfies f(λx1 + (1−λ)x2) ≤ λf(x1) + (1−λ)f(x2).
n Note that f is concave if (−f) is convex.
n A function f is strictly convex iff f(λx1 + (1−λ)x2) < λf(x1) + (1−λ)f(x2) for x1 ≠ x2 and 0 < λ < 1.
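The defining inequality is easy to probe numerically. A small sketch (the helper name is illustrative): the gap f(λx1 + (1−λ)x2) − [λf(x1) + (1−λ)f(x2)] must be non-positive for every λ ∈ [0, 1] when f is convex on the segment:

```python
def jensen_gap(f, x1, x2, lam):
    # Convexity gap: f(lam*x1 + (1-lam)*x2) - [lam*f(x1) + (1-lam)*f(x2)].
    # Non-positive for all lam in [0, 1] iff f is convex on the segment [x1, x2].
    return f(lam * x1 + (1 - lam) * x2) - (lam * f(x1) + (1 - lam) * f(x2))
```

For example, f(t) = t² gives a non-positive gap everywhere, and f(t) = |t| on the segment [−1, 1] with λ = 0.5 gives a gap of exactly −1.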
1st-order Condition
n Def: A function f is differentiable if dom f is open and the gradient

      ∇f(x) = (∂f(x)/∂x_1, ..., ∂f(x)/∂x_n)

  exists at each x ∈ dom f.
n Def: 1st-order condition: a differentiable function f with convex domain is convex iff

      f(y) ≥ f(x) + ∇f(x)^T (y − x)  for all x, y ∈ dom f
2nd-order Condition
n Def: A function f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ S^n exists at each x ∈ dom f:

      (∇²f(x))_ij = ∂²f(x) / (∂x_i ∂x_j)  for 1 ≤ i, j ≤ n

n Def: 2nd-order condition: a twice-differentiable function f with convex domain is convex if and only if

      ∇²f(x) ⪰ 0  for all x ∈ dom f

n Strict convexity: no equality sign (∇²f(x) ≻ 0 for all x ∈ dom f implies f is strictly convex)
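A small numerical check of the 2nd-order condition, assuming NumPy (the helper name is illustrative): sample points of the domain and verify that every Hessian eigenvalue is nonnegative:

```python
import numpy as np

def hessian_psd_on(hess, points, tol=1e-10):
    # Numerically check the 2nd-order condition: the Hessian must be
    # positive semidefinite (all eigenvalues >= 0) at each sampled point.
    return all(np.linalg.eigvalsh(hess(x)).min() >= -tol for x in points)
```

For f(x) = x1⁴ + x2² the Hessian diag(12x1², 2) passes at every point, while the saddle f(x) = x1² − x2² with Hessian diag(2, −2) fails.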
Data Compression (ECE 5546-41)
2012 Fall
Ch 5. Algorithms for Sparse Recovery, Part 2
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; [email protected]
Recovery Algorithms (1)
n Category 1: Convex optimization approach (or convex relaxation)
n Replace the combinatorial problem with a convex optimization problem.
n Solve the convex optimization problem with algorithms that can exploit the problem structure.
n Category 2: Greedy algorithms
n Greedy pursuits
n Iteratively refine a sparse solution by successively identifying one or more components that yield the greatest improvement in quality.
n In general very fast and applicable to very large datasets; however, theoretical performance guarantees are typically weaker than those of some other methods.
n Thresholding algorithms
n These methods alternate between element selection and element pruning steps. They are often very easy to implement and can be relatively fast.
n They have theoretical performance guarantees that rival those derived for convex optimization-based approaches.
Recovery Algorithms (2)
n Category 3: Bayesian framework
n Assume a prior distribution for the unknown coefficients that favors sparsity.
n Develop a maximum a posteriori estimator incorporating the observation.
n Identify a region of significant posterior mass, or average over the most probable models.
n Category 4: Other approaches
n Non-convex optimization methods: relax the L0 problem to a related non-convex problem and attempt to identify a stationary point.
n Brute-force methods: search through all possible support sets, possibly using cutting-plane methods to reduce the number of possibilities.
n Heuristic methods: based on belief-propagation and message-passing techniques developed in graphical models and coding theory.
Greedy Algorithms

"A greedy algorithm is an algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum. In many problems, a greedy strategy does not in general produce an optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that approximate a global optimal solution in a reasonable time."
http://en.wikipedia.org/wiki/Greedy_algorithm
Greedy Algorithm (1)
n Starting at A, a greedy algorithm (GA) will find the local maximum at "m", instead of the global maximum at "M".
n (Figure: searching for the global maximum starting from A)
http://en.wikipedia.org/wiki/Greedy_algorithm
Greedy Algorithm (2)
n Ex: How to pay 36 cents using only coins with values {1, 5, 10, 20}?
n The greedy algorithm determines the minimum number of coins to give while making change. These are the steps a human would take to emulate a greedy algorithm to represent 36 cents using only coins with values {1, 5, 10, 20}.
n The coin of the highest value, not exceeding the remaining change owed, is the local optimum.
n (Note that in general the change-making problem requires dynamic programming or integer programming to find an optimal solution.)
n However, most currency systems, including the Euro (pictured) and US Dollar, are special cases where the greedy strategy does find an optimal solution.
http://en.wikipedia.org/wiki/Greedy_algorithm
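The change-making steps above can be sketched directly (the function name is ours): repeatedly take the largest coin that does not exceed the remaining amount.

```python
def greedy_change(amount, coins=(20, 10, 5, 1)):
    # Greedy change-making: always take the largest coin that does not
    # exceed the remaining amount (the locally optimal choice).
    result = []
    for c in sorted(coins, reverse=True):
        while amount >= c:
            result.append(c)
            amount -= c
    return result
```

For {1, 5, 10, 20} this is optimal (36 → 20 + 10 + 5 + 1), but, as the slide notes, not for every coin system: with coins {1, 10, 25}, paying 30 cents greedily takes six coins (25 + five 1s) while three 10s suffice.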
Sparse Signal Recovery via Greedy Algorithm
n Problem: For y = Φx, with x assumed to be sparse (or compressible), find x̂ satisfying

      x̂ = argmin_z ||z||₀ subject to z ∈ B(y)

  where B(y) ensures that x̂ is consistent with the measurements y:

      B(y) = {z | Φz = y}              (noise-free case)
      B(y) = {z | ||Φz − y||₂ ≤ ε}     (noisy case)

n This problem can be re-written as

      min |Ω| subject to y = Σ_{i∈Ω} x_i φ_i

  where Ω denotes a particular subset of the indices i = 1, ..., N, and φ_i denotes the i-th column of Φ.
n Use a greedy algorithm to find the index set Ω.
Rf: Greedy Algorithms
n Greedy algorithms have been called by different names in other fields
n Statistics: Forward stepwise regression
n Nonlinear approximation: Pure greedy algorithm
n Signal processing: Matching pursuit
n Radio astronomy: CLEAN algorithm
Basic idea of Pursuit algorithm (3)
n Pre-normalization: assume that all the columns of Φ are normalized by multiplying by a normalizing matrix W:

      Φ(normalizing) = ΦW,  where Φ ∈ R^(M×N), W ∈ R^(N×N),
      W = diag(1/||φ1||₂, ..., 1/||φN||₂)

n Under this pre-normalization, the solution of the pursuit algorithm can easily be found by identifying the column maximizing (φ_j^T y)².
n As a final step, the solution vector x should be post-normalized: x → Wx.
n A theorem tells us that the normalization does not change the solution.
n From now on, assume pre-normalization without loss of generality.
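A small sketch of the pre-normalization and column-selection step, assuming NumPy (function names are illustrative):

```python
import numpy as np

def pre_normalize(Phi):
    # Build W = diag(1/||phi_1||_2, ..., 1/||phi_N||_2); Phi @ W then has
    # unit-norm columns. Returns both the normalized matrix and W.
    W = np.diag(1.0 / np.linalg.norm(Phi, axis=0))
    return Phi @ W, W

def best_column(Phi_n, y):
    # Index j maximizing (phi_j^T y)^2 over the normalized columns.
    return int(np.argmax((Phi_n.T @ y) ** 2))
```

After solving with the normalized matrix, the coefficient vector would be post-normalized by W, as described above.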
Basic idea of Pursuit algorithm (4)
n Suppose K > 1: since y is a linear combination of K columns of Φ, the problem is to find a subset of Φ consisting of K columns.
n Need to enumerate (N choose K) ~ O(N^K) combinations.
n Greedy algorithm (pursuit-based methods): instead of the exhaustive search, select columns one by one in favor of the local optimum.
n Starting from x(0) = 0 (residual r(0) = y), it iteratively constructs a K-term approximation, expanding the active set by one additional column at a time.
n The additional column at each stage is the one which maximally reduces the residual error (in the L2 sense) in approximating the measurement y using the currently active columns.
n Residual: the as-yet unexplained portion of the measurement.
n After constructing an approximation including a new column, a new residual vector is computed by subtracting the approximation represented by the newly selected column from the current residual.
n A new residual L2 error is evaluated: if it falls below a threshold, the algorithm terminates. Otherwise, it looks for another column.
Various Pursuit Algorithms
n Matching Pursuit (MP) ~ also known as the pure greedy algorithm
n Orthogonal Matching Pursuit (OMP)
n Weak Matching Pursuit
n These algorithms all belong to greedy algorithms (GA). Variants include
n Pure GA (PGA)
n Orthogonal GA (OGA)
n Relaxed GA (RGA)
n Weak GA (WGA)
n Rf: "At this point, it is not fully clear what role greedy pursuit algorithms will ultimately play in practice." (From "Computational Methods for Sparse Solution of Linear Inverse Problems," Proceedings of the IEEE, 2010, by J.A. Tropp and S.J. Wright)
Category 2: Greedy algorithms
n Greedy pursuits
n Thresholding algorithms
Matching Pursuit (MP) (1)
n First proposed by Mallat and Zhang*: an iterative greedy algorithm that decomposes a signal into a linear combination of elements from a dictionary (i.e., the sensing matrix).

  Inputs: sensing matrix Φ, measurement vector y, error threshold ε0
  Outputs: a sparse signal x̂
  Initialize: Set k = 0, index set Ω = ∅, and residual r(0) = y.
  Main Iteration: Increment k by 1 and perform the following:
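A minimal MP sketch consistent with the description so far, assuming NumPy and unit-norm columns (the pre-normalization assumption); the function name is illustrative:

```python
import numpy as np

def matching_pursuit(Phi, y, eps=1e-6, max_iter=100):
    # Matching pursuit sketch: pick the column most correlated with the
    # residual, update its coefficient, subtract, repeat until ||r|| <= eps.
    x = np.zeros(Phi.shape[1])
    r = y.astype(float).copy()            # residual r(0) = y
    for _ in range(max_iter):
        if np.linalg.norm(r) <= eps:      # stopping criterion on residual energy
            break
        corr = Phi.T @ r                  # correlations with current residual
        j = int(np.argmax(np.abs(corr)))  # most correlated column
        x[j] += corr[j]                   # optimal 1-D update for unit-norm columns
        r -= corr[j] * Phi[:, j]          # subtract the newly explained part
    return x
```

For an orthonormal Φ, MP recovers the coefficients exactly in at most N iterations; for general dictionaries the same column may be selected more than once, which is what distinguishes MP from OMP.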