Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of...
Transcript of Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of...
![Page 1: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/1.jpg)
Fast first-order methods for convex optimization
with line search
Katya Scheinberg Lehigh University
(joint work with X. Bai, D. Goldfarb and S. Ma)
TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAAAAAAAAAAAA
6/26/12 ERGO seminar, The University of Ediburgh
![Page 2: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/2.jpg)
l The field of convex optimization has been extensively developed since Khachian showed in 1979 that ellipsoid method has polynomial complexity when applied to LP.
l General theory of interior point algorithms for convex optimization was developed by Nesterov and Nemirovskii.
l Any convex optimization problem can be solved in polynomial time by an IPM. For some known classes (LP, QP, SDP) the IPMs are readily available.
l For decades optimization methods relied of the fact that the problem data, when large, is typically sparse.
l Second-order methods (IPM) have good convergence rate, but high per iteration complexity. They exploit sparsity structure to facilitate linear algebra.
l First-order methods (gradient based) have slow convergence and were considered inefficient.
Introduction
6/26/12 ERGO seminar, The University of Ediburgh
![Page 3: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/3.jpg)
l At the core of many statistical machine learning problems lies an optimization problem, often convex, from a well-studied class (LP, QP, SDP).
l These problems are very large and dense in terms of data. l IPMs are often too expensive to use. ML community initially
assumed that traditional optimization methods have to be abandoned.
l However, often structure (sparsity) is present in the solution. l This structure can be well exploited by first-order
approaches to convex optimization. l Recent advances in complexity results give rise to very
significant interest in first-order methods.
Introduction
6/26/12 ERGO seminar, The University of Ediburgh
![Page 4: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/4.jpg)
l Problem:
l Assumptions:
Problem under consideration
Many applications involve optimization of this form
6/26/12 ERGO seminar, The University of Ediburgh
![Page 5: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/5.jpg)
l Breast cancer diagnostics l Test results of a group of patients, some have been
diagnosed with cancer, other do not have it. Find how the test can predict high risk patients.
l Spam filter l From a list of spam and nonspam labeled emails learn to
detect spam automatically.
l Genetic disease l find away to identify high risk individuals based on gene
expression data. l Target customer groups
l By demographic data and past purchases find customers most likely to buy certain products.
Support vector machine classification
6/26/12 ERGO seminar, The University of Ediburgh
![Page 6: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/6.jpg)
l Problem:
Support Vector Machines
+ -
6/26/12 ERGO seminar, The University of Ediburgh
![Page 7: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/7.jpg)
l Problem:
Linear Support Vector Machines
+ -
6/26/12 ERGO seminar, The University of Ediburgh
![Page 8: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/8.jpg)
l Compresses sensing, MRI l Recover sparse signal x, which satisfies Ax=b.
l Sparse least square regression (Lasso) l Find linear regression models while selecting important
features.
l Regression models using polynomials with variable selection l birthweight dataset from Hosmer and
Lemeshow (1989), weight of 189 babies and 8 variables per mother. Predictive models for birthweight.
Sparse models
6/26/12 ERGO seminar, The University of Ediburgh
![Page 9: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/9.jpg)
l Problem:
Lasso regression
6/26/12 ERGO seminar, The University of Ediburgh
![Page 10: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/10.jpg)
l Problem:
Group Lasso regression
• Assume that columns of A form groups of correlated features. • Find sparse vector x where nonzeros are selected according to groups • xi is a subvector of x corresponding to the i-th group of features.
6/26/12 ERGO seminar, The University of Ediburgh
![Page 11: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/11.jpg)
FMRI Analysis and schizophrenia prediction
Network Extracted
Correlation Matrix
(N2=2x1010)Thresholded Matrix
MR
Signal
M1
V1
PP
1 N1
N
-0.5
0
0.5
1
1 N1
N
Measuring blood oxigination in voxels of the brain.
Construct predictive models based on FMRI data, use to predict/diagnose schizophrenia or classify “states of mind”.
6/26/12 ERGO seminar, The University of Ediburgh
![Page 12: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/12.jpg)
l Problem:
l Formulation:
Sparse Inverse Covariance Selection
6/26/12 ERGO seminar, The University of Ediburgh
![Page 13: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/13.jpg)
l Lasso or CS:
l Group Lasso or MMV
l Matrix Completion
l Robust PCA
l SICS
Summary and add’l examples
6/26/12 ERGO seminar, The University of Ediburgh
![Page 14: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/14.jpg)
First-order methods applied to problems of the form f(x)+g(x)
6/26/12 ERGO seminar, The University of Ediburgh
![Page 15: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/15.jpg)
l Consider:
l Quadratic upper approximation
Prox method with nonsmooth term
Assume that g(y) is such that the above function is easy to optimize over y
6/26/12 ERGO seminar, The University of Ediburgh
![Page 16: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/16.jpg)
l Minimize quadratic upper approximation on each iteration
l O(L/²) complexity: If ¯/L· µ· 1/L then in k iterations finds solution
ISTA/prox gradient projection
6/26/12 ERGO seminar, The University of Ediburgh
![Page 17: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/17.jpg)
l Minimize upper approximation at a “shifted” point.
l complexity: If ¯/L·µ· 1/L then in k iterations finds solution
Fast first-order method Nesterov, Beck & Teboulle
6/26/12 ERGO seminar, The University of Ediburgh
![Page 18: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/18.jpg)
Specifically for CS setting and Lasso
f(x) g(x)
2 matrix/vector multiplications + shrinkage operator per iteration
iteration bound
6/26/12 ERGO seminar, The University of Ediburgh
![Page 19: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/19.jpg)
Choosing prox parameter via backtracking
6/26/12 ERGO seminar, The University of Ediburgh
![Page 20: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/20.jpg)
• Minimize quadratic upper relaxation on each iteration
• Using backtracking find µk such that
• In k iterations finds solution
Beck&Teboulle, Tseng, Auslender&Teboulle, 2008
Iterative Shrinkage Threshholding Algorithm (ISTA)
6/26/12 ERGO seminar, The University of Ediburgh
![Page 21: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/21.jpg)
Minimize quadratic upper relaxation on each iteration
Using backtracking find µk · µk-1 such that
In k iterations finds solution
Fast Iterative Shrinkage Threshholding Algorithm (FISTA)
Beck&Teboulle, Tseng, 2008 6/26/12 ERGO seminar, The University of Ediburgh
![Page 22: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/22.jpg)
Minimize quadratic upper relaxation on each iteration
Using backtracking find µk · µk-1 such that
In k iterations finds solution
Fast Iterative Shrinkage Threshholding Algorithm (FISTA)
6/26/12 ERGO seminar, The University of Ediburgh
![Page 23: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/23.jpg)
Find µk · µk-1 such that
Cycle to find µk
Convergence rate:
Need to compute Ax-b
6/26/12 ERGO seminar, The University of Ediburgh
![Page 24: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/24.jpg)
Find µk such that
Goldfarb & S. 2010
To allow for larger µk we need to
reduce and vice versa
6/26/12 ERGO seminar, The University of Ediburgh
![Page 25: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/25.jpg)
Find µk such that
Goldfarb & S. 2010
FISTA with full backtracking
Cycle to find µ and t
6/26/12 ERGO seminar, The University of Ediburgh
![Page 26: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/26.jpg)
Find µk such that
Goldfarb & S. 2010
FISTA with full backtracking
Cycle to find µ and t
6/26/12 ERGO seminar, The University of Ediburgh
Bla-bla-bla….
![Page 27: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/27.jpg)
Computational Results
6/26/12 ERGO seminar, The University of Ediburgh
![Page 28: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/28.jpg)
Computational results
6/26/12 ERGO seminar, The University of Ediburgh
![Page 29: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/29.jpg)
Computational results
6/26/12 ERGO seminar, The University of Ediburgh
![Page 30: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/30.jpg)
Computational results
6/26/12 ERGO seminar, The University of Ediburgh
![Page 31: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/31.jpg)
Computational results
6/26/12 ERGO seminar, The University of Ediburgh
![Page 32: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/32.jpg)
Computational results
6/26/12 ERGO seminar, The University of Ediburgh
![Page 33: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/33.jpg)
Complexity bounds on alternating linearization methods
6/26/12 ERGO seminar, The University of Ediburgh
![Page 34: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/34.jpg)
l Consider:
l Relax constraints via Augmented Lagrangian technique
Alternating directions method
Assume that f(x) and g(y) are both such that the above functions are easy to optimize in x or y
6/26/12 ERGO seminar, The University of Ediburgh
![Page 35: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/35.jpg)
Sparse Inverse Covariance Selection
Shrinkage O(n2) ops
Eigenvalue decomposition O(n3) ops. Same as one gradient of f(X)
f(x) g(x)
6/26/12 ERGO seminar, The University of Ediburgh
![Page 36: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/36.jpg)
Lasso or group Lasso
Shrinkage O(n2) ops
Matrix inverse, can take O(n3) ops. But can also be O(n ln n) for special A.
f(x) g(x)
6/26/12 ERGO seminar, The University of Ediburgh
![Page 37: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/37.jpg)
Alternating direction method (ADM)
Widely used method without complexity bounds
6/26/12 ERGO seminar, The University of Ediburgh
![Page 38: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/38.jpg)
A slight modification of ADM
This turns out to be equivalent to……
Goldfarb, Ma, S, ’09-’10
6/26/12 ERGO seminar, The University of Ediburgh
![Page 39: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/39.jpg)
Alternating linearization method (ALM) Goldfarb, Ma, S, ’09-’10
6/26/12 ERGO seminar, The University of Ediburgh
![Page 40: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/40.jpg)
Fast ALM (FALM) Goldfarb, Ma, S, ’09-’10
6/26/12 ERGO seminar, The University of Ediburgh
![Page 41: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/41.jpg)
Complexity results
FISTA
FALM
6/26/12 ERGO seminar, The University of Ediburgh
![Page 42: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/42.jpg)
Experiments on SICS
6/26/12 ERGO seminar, The University of Ediburgh
Gene expression networks using the five data sets from Li and Toh(2010) (1) Lymph node status (2) Estrogen receptor; (3) Arabidopsis thaliana; (4) Leukemia; (5) Hereditary breast cancer.
PSM by Duchi et al (2008) and VSM by Lu (2009)
![Page 43: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/43.jpg)
Experiments in CS
6/26/12 ERGO seminar, The University of Ediburgh
Comparison of algorithms on image recovery problem. Here matrix inverse take O(n ln n) ops, as do mat-vec multiplications.
PSM by Duchi et al (2008) and VSM by Lu (2009)
![Page 44: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/44.jpg)
FALM with backtracking Goldfarb, S, ‘10
6/26/12 ERGO seminar, The University of Ediburgh
![Page 45: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/45.jpg)
Conclusion and Future work
l Performing backtracking carefully is possible and desirable in accelerated first order methods.
l The trade-offs are different and need to be explored for particular applications beyond CS.
l Accelerated alternating direction methods can utilize the same ideas.
l Careful implementation is being considered. l Combining backtracking with inexact evaluations
may be beneficial. l Seeking problems where average behavior differs
greatly from the worst case. 6/26/12 ERGO seminar, The University of Ediburgh
![Page 46: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,](https://reader033.fdocuments.in/reader033/viewer/2022042116/5e93c8a798d8132d9f59c1db/html5/thumbnails/46.jpg)
Thank you!
6/26/12 ERGO seminar, The University of Ediburgh