GRADIENT PROJECTION FOR SPARSE RECONSTRUCTION:
APPLICATION TO COMPRESSED SENSING AND OTHER INVERSE PROBLEMS
MÁRIO A. T. FIGUEIREDO
ROBERT D. NOWAK
STEPHEN J. WRIGHT
BACKGROUND
PREVIOUS ALGORITHMS
Interior-point method
SparseLab: a MATLAB software package designed to find sparse solutions to systems of linear equations
L1_ls: a MATLAB implementation of the interior-point method for ℓ1-regularized least squares
L1-MAGIC: a collection of MATLAB routines for solving the convex optimization programs central to compressive sampling
GRADIENT PROJECTION FOR SPARSE RECONSTRUCTION
Formulation
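The equation on this slide did not survive transcription. From the paper, GPSR addresses the ℓ2-ℓ1 problem

\[ \min_x \; \tfrac{1}{2}\|y - Ax\|_2^2 + \tau\|x\|_1 \]

and recasts it as a bound-constrained quadratic program (BCQP) by splitting x = u − v with u ≥ 0, v ≥ 0:

\[ \min_{z \ge 0} \; F(z) \equiv c^T z + \tfrac{1}{2} z^T B z, \qquad z = \begin{bmatrix} u \\ v \end{bmatrix}, \quad c = \tau \mathbf{1} + \begin{bmatrix} -A^T y \\ A^T y \end{bmatrix}, \quad B = \begin{bmatrix} A^T A & -A^T A \\ -A^T A & A^T A \end{bmatrix}. \]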
GRADIENT DESCENT
Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.
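As a concrete illustration, here is a generic gradient descent sketch (not from the paper; the objective, step size, and tolerance are illustrative placeholders):

import numpy as np

def gradient_descent(grad_f, x0, step=0.1, tol=1e-8, max_iter=1000):
    # Repeatedly step in the direction of the negative gradient
    # until the gradient (approximately) vanishes.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - step * g
    return x

# Example: minimize f(x) = ||x - 1||^2, whose gradient is 2(x - 1).
x_min = gradient_descent(lambda x: 2 * (x - 1), x0=np.zeros(3))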
BASIC GRADIENT PROJECTION
GPSR-BASIC
GPSR-BB
An approach due to Barzilai and Borwein (BB) in which F may not decrease at every iteration.
Standard non-monotone method:
Eliminate the backtracking line search step
Monotone method:
Choose the step along the projected direction by an exact line search over [0, 1], so that F decreases at every iteration (a sketch follows this list).
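A minimal sketch of the monotone BB variant for the BCQP min c^T z + (1/2) z^T B z subject to z ≥ 0, assuming an explicit (small) matrix B; GPSR itself applies B implicitly through products with A and A^T:

import numpy as np

def gpsr_bb_monotone(B, c, z0, alpha_min=1e-30, alpha_max=1e30,
                     tol=1e-8, max_iter=500):
    # Monotone Barzilai-Borwein gradient projection for
    # min F(z) = c'z + 0.5 z'Bz  subject to z >= 0.
    z = np.maximum(z0, 0.0)
    alpha = 1.0
    for _ in range(max_iter):
        grad = c + B @ z
        # Projected step and feasible direction.
        delta = np.maximum(z - alpha * grad, 0.0) - z
        if np.linalg.norm(delta) < tol:
            break
        gamma = delta @ (B @ delta)
        # Exact line search over [0, 1] (closed form for a quadratic);
        # this is what makes the variant monotone.
        if gamma > 0:
            lam = min(max(-(grad @ delta) / gamma, 0.0), 1.0)
        else:
            lam = 1.0
        z = z + lam * delta
        # Barzilai-Borwein update of the step length alpha.
        alpha = alpha_max if gamma <= 0 else np.clip(
            (delta @ delta) / gamma, alpha_min, alpha_max)
    return z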
CONVERGENCE
Theorem 1: The sequence of iterates {z(k)} generated by either the GPSR-Basic or GPSR-BB algorithm either terminates at a solution or converges to a solution of the BCQP at an R-linear rate.
T. Serafini, G. Zanghirati, L. Zanni. “Gradient projection methods for large quadratic programs and applications in training support vector machines,” Optimization Methods and Software, vol. 20, pp. 353–378, 2004.
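For reference, R-linear convergence to a limit point \( \bar{z} \) means there exist constants \( C > 0 \) and \( \rho \in (0, 1) \) such that

\[ \| z^{(k)} - \bar{z} \| \le C \rho^k \quad \text{for all } k. \]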
TERMINATION
Several termination criteria are presented, but no single option performs well on all data sets.
The one used in the paper is motivated by perturbation results for the linear complementarity problem (LCP).
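Concretely, z solves the BCQP exactly when z ≥ 0, ∇F(z) ≥ 0, and z^T ∇F(z) = 0, which is an LCP. A residual-based criterion of the kind motivated here is

\[ \| \min(z, \nabla F(z)) \| \le \mathrm{tolP}, \]

where the minimum is taken componentwise and tolP is a small tolerance; the residual vanishes exactly at solutions.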
DEBIASING
The ℓ1-regularized objective biases the solution relative to the underlying least-squares problem. We can therefore fix the zero components and solve a standard least-squares problem over the remaining components to obtain a debiased solution (see the sketch below).
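A minimal sketch of the debiasing step, assuming x_gpsr is the GPSR output, A the sensing matrix, and y the observations (names are illustrative; the paper solves this inner problem with conjugate gradients, while a dense solver is used here for brevity):

import numpy as np

def debias(A, y, x_gpsr, support_tol=1e-10):
    # Fix the zero components of the GPSR solution and re-solve an
    # unregularized least-squares problem over the remaining columns.
    support = np.abs(x_gpsr) > support_tol
    x_debiased = np.zeros_like(x_gpsr)
    x_debiased[support], *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    return x_debiased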
It is also worth pointing out that debiasing is not always desirable. Shrinking the selected coefficients can mitigate unusually large noise deviations, a desirable effect that may be undone by debiasing.
WARM STARTING AND CONTINUATION
After solving the problem for a given τ, we can use the solution to initialize GPSR for a nearby value of τ (warm starting).
It has been noted recently that the speed of GPSR may degrade considerably for smaller values of the regularization parameter τ. However, we can run GPSR with a larger value of τ and then decrease τ in steps toward its desired value, warm-starting each run with the previous solution (continuation; see the sketch below).
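A sketch of the continuation strategy, assuming a hypothetical solver handle gpsr(A, y, tau, x0) that accepts a warm-start point. It uses the fact, noted in the paper, that the solution is x = 0 whenever τ ≥ ||A^T y||_∞:

import numpy as np

def gpsr_continuation(A, y, tau_target, gpsr, n_steps=5):
    # Solve for a geometrically decreasing sequence of tau values,
    # warm-starting each run with the previous solution.
    tau_max = np.linalg.norm(A.T @ y, np.inf)  # above this, x = 0 is optimal
    taus = np.geomspace(tau_max, tau_target, n_steps)
    x = np.zeros(A.shape[1])
    for tau in taus:
        x = gpsr(A, y, tau, x)  # warm start from the previous solution
    return x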
To benefit from a warm start, IP methods require the initial point to be not only close to the solution but also sufficiently interior to the feasible set and close to a “central path,” which is difficult to satisfy in practice.
EXPERIMENTS
CONCLUSIONS
Significantly faster than the state-of-the-art algorithms in experimental comparisons
Performance degrades when the regularization parameter τ is small, but the continuation heuristic recovers efficient practical performance.
It is not yet well understood why GPSR performs so well.