Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global...

32
Non-convex optimization Issam Laradji

Transcript of Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global...

Page 1: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Non-convex optimization

Issam Laradji

Page 2: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Strongly Convex

f(x)

x

Objective function

Page 3: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Strongly Convex

Assumptionsf(x)

x

Objective function

Gradient Lipschitz continuous

Strongly convex

Page 4: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Strongly Convex

Assumptionsf(x)

x

Objective function

Gradient Lipschitz continuous

Strongly convex

Randomized coordinate descent

Page 5: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Non-strongly Convex optimization

Assumptions

Gradient Lipschitz continuous

Convergence rate

Compared to the strongly convex convergence rate

Page 6: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Non-strongly Convex optimization

Page 7: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Non-Strongly Convex

Assumptions

Objective function

Lipschitz continuous

Restricted secant inequality

Randomized coordinate descent

Page 8: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Invex functions (a generalization of convex function)

Assumptions

Objective function

Lipschitz continuous

Polyak [1963]

This inequality simply requires that the gradient grows faster than a linear function as we move away from the optimal function value.

Invex function (one global minimum)

Page 9: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Invex functions (a generalization of convex function)

Assumptions

Objective function

Lipschitz continuous

Polyak [1963] - for invex functions where this holds

Randomized coordinate descent

Invex function (one global minimum)

Page 10: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Invex functions (a generalization of convex function)

Assumptions

Objective function

Lipschitz continuous

Polyak [1963] - for invex functions where this holds

Randomized coordinate descent

PolyakConvex

Invex

Venn diagram

Page 11: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Non-convex functions

Page 12: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Non-convex functionslocal maxima

Global minimum Local minima

Page 13: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Non-convex functions

Global minimum Local minima

Strategy 1: local optimization of the non-convex function

Page 14: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Non-convex functionslocal maxima

Global minimum Local minima

Assumptions for local non-convex optimization

Lipschitz continuous Locally convex

Page 15: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Non-convex functions

Global minimum

Local minima

Strategy 1: local optimization of the non-convex function

All convex functions rates apply.

Local randomized coordinate descentlocal maxima

Page 16: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Non-convex functions

Global minimum Local minima

Strategy 1: local optimization of the non-convex function

Issue: dealing with saddle points

local maxima

Page 17: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Non-convex functions

Global minimum

Local minima

Strategy 2: Global optimization of the non-convex function

Issue: Exponential number of saddle points

Page 18: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Local non-convex optimization● Gradient Descent

○ Difficult to define a proper step size

● Newton method○ Newton method solves the slowness problem by rescaling the gradients in

each direction with the inverse of the corresponding eigenvalues of the hessian

○ can result in moving in the wrong direction (negative eigenvalues)

● Saddle-Free Newton’s method○ rescales gradients by the absolute value of the inverse Hessian and the

Hessian’s Lanczos vectors.

Page 19: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Local non-convex optimization● Random stochastic gradient descent

○ Sample noise r uniformly from unit sphere○ Escapes saddle points but step size is difficult to determine

● Cubic regularization [Nesterov 2006]

Gradient Lipschitz continuous

Hessian Lipschitz continuous

Page 20: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Local non-convex optimization● Random stochastic gradient descent

○ Sample noise r uniformly from unit sphere○ Escapes saddle points but step size is difficult to determine

● Momentum○ can help escape saddle points (rolling ball)

Page 21: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

● Matrix completion problem [De Sa et al. 2015]

Global non-convex optimization

● Applications (usually for datasets with missing data)○ matrix completion ○ Image reconstruction○ recommendation systems.

Page 22: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

● Matrix completion problem [De Sa et al. 2015]

● Reformulate it as (unconstrained non-convex problem)

● Gradient descent

Global non-convex optimization

Page 23: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

● Reformulate it as (unconstrained non-convex problem)

● Gradient descent

● Using the Riemannian manifold, we can derive the following

● This widely-used algorithm converges globally, using only random initialization

Global non-convex optimization

Page 24: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Convex relaxation of non-convex functions optimization

● Convex Neural Networks [Bengio et al. 2006]○ Single-hidden layer network

original neural networks non-convex problem

Page 25: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Convex relaxation of non-convex functions optimization

● Convex Neural Networks [Bengio et al. 2006]○ Single-hidden layer network

○ Use alternating minimization

○ Potential issues with the activation function

Page 26: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Bayesian optimization (global non-convex optimization)

● Typically used for finding optimal parameters ○ Determining the step size of # hidden layers for neural networks○ The parameter values are bounded ?

● Other methods include sampling the parameter values random uniformly○ Grid-search

Page 27: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Bayesian optimization (global non-convex optimization)

● Fit Gaussian process on the observed data (purple shade)○ Probability distribution on the function values

Page 28: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Bayesian optimization (global non-convex optimization)

● Fit Gaussian process on the observed data (purple shade)○ Probability distribution on the function values

● Acquisition function (green shade)○ a function of

■ the objective value (exploitation) in the Gaussian density function; and■ the uncertainty in the prediction value (exploration).

Page 29: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Bayesian optimization

● Slower than grid-search with low level of smoothness (illustrate)● Faster than grid-search with high level of smoothness (illustrate)

Page 30: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Bayesian optimization

● Slower than grid-search with low level of smoothness (illustrate)● Faster than grid-search with high level of smoothness (illustrate)

Grid-search Bayesian optimization

Page 31: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Bayesian optimization

● Slower than grid-search with low level of smoothness (illustrate)● Faster than grid-search with high level of smoothness (illustrate)

Grid-search Bayesian optimization

A measure of smoothness

Page 32: Non-convex optimizationVenn diagram. Non-convex functions. Non-convex functions local maxima Global minimum Local minima. Non-convex functions Global minimum ... Global minimum Local

Summary

Non-convex optimization

● Strategy 1: Local non-convex optimization○ Convexity convergence rates apply

○ Escape saddle points using, for example, cubic regularization and saddle-free newton

update

● Strategy 2: Relaxing the non-convex problem to a convex problem○ Convex neural networks

● Strategy 3: Global non-convex optimization○ Bayesian optimization○ Matrix completion