Page 1: LijunZhang zlj@nju.edu.cn

Convex Optimization

Lijun Zhang

zlj@nju.edu.cn

http://cs.nju.edu.cn/zlj

Modification of http://stanford.edu/~boyd/cvxbook/bv_cvxslides.pdf

Page 2: LijunZhang zlj@nju.edu.cn

Outline

Introduction

Convex Sets & Functions

Convex Optimization Problems

Duality

Convex Optimization Methods

Summary

Page 3: LijunZhang zlj@nju.edu.cn

Mathematical Optimization

Optimization Problem
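A general optimization problem is commonly written in the form below. This is a sketch in the notation of the Boyd and Vandenberghe slides that this deck adapts; the symbols $x$, $f_0$, $f_i$, and $b_i$ are assumed from that source.

```latex
\begin{aligned}
\text{minimize}\quad   & f_0(x) \\
\text{subject to}\quad & f_i(x) \le b_i, \quad i = 1, \dots, m
\end{aligned}
```

Here $x \in \mathbb{R}^n$ is the optimization variable, $f_0$ is the objective function, and $f_1, \dots, f_m$ are the constraint functions.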

Page 4: LijunZhang zlj@nju.edu.cn

Applications

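Dimensionality Reduction (PCA): $\max_{w \in \mathbb{R}^d} \; w^\top C w \quad \text{s.t.} \quad \|w\|_2 = 1$

Clustering (NMF): $\min_{U \in \mathbb{R}^{n \times k},\, V \in \mathbb{R}^{d \times k}} \; \|X - U V^\top\|_F^2 \quad \text{s.t.} \quad U \ge 0,\ V \ge 0$

Classification (SVM): $\min_{w \in \mathbb{R}^d,\, b \in \mathbb{R}} \; \sum_{i=1}^n \max\big(0,\, 1 - y_i (w^\top x_i + b)\big) + \frac{\lambda}{2} \|w\|_2^2$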

Page 5: LijunZhang zlj@nju.edu.cn

Least-squares

The Problem

Given an input vector, predict its target by a linear function of the input

Properties
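A least-squares fit can be sketched in a few lines. The example below is a minimal illustration, assuming the problem has the usual form $\min_w \|Aw - b\|_2^2$ and solving it through the normal equations $(A^\top A) w = A^\top b$; the function name `least_squares` and the toy data are hypothetical, not taken from the slides.

```python
def least_squares(A, b):
    """Solve min_w ||A w - b||_2^2 via the normal equations (A^T A) w = A^T b."""
    n, d = len(A), len(A[0])
    # Build the d x d Gram matrix A^T A and the right-hand side A^T b.
    G = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(d)]
         for i in range(d)]
    r = [sum(A[k][i] * b[k] for k in range(n)) for i in range(d)]
    # Gaussian elimination with partial pivoting on the augmented system [G | r].
    M = [G[i] + [r[i]] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda i: abs(M[i][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A^T A is singular: columns of A are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, d):
            factor = M[row][col] / M[col][col]
            for j in range(col, d + 1):
                M[row][j] -= factor * M[col][j]
    # Back substitution on the upper-triangular system.
    w = [0.0] * d
    for i in reversed(range(d)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, d))
        w[i] = (M[i][d] - s) / M[i][i]
    return w


# Toy data lying exactly on y = 1 + 2x; each row of A is (1, x),
# so w holds (intercept, slope).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 3.0, 5.0, 7.0]
w = least_squares(A, b)
```

For larger or ill-conditioned problems a QR- or SVD-based solver is usually preferred over forming $A^\top A$ explicitly.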

Page 6: LijunZhang zlj@nju.edu.cn

Linear Programming

The Problem

Properties
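For reference, a linear program is typically stated as follows. The symbols $c$, $a_i$, and $b_i$ follow the Boyd and Vandenberghe notation and are assumptions here.

```latex
\begin{aligned}
\text{minimize}\quad   & c^\top x \\
\text{subject to}\quad & a_i^\top x \le b_i, \quad i = 1, \dots, m
\end{aligned}
```

Both the objective and the constraints are linear in $x$, which is what separates this class from general nonlinear problems.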

Page 7: LijunZhang zlj@nju.edu.cn

Convex Optimization Problem

The Problem

Conditions
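As a sketch, assuming the Boyd and Vandenberghe notation (minimize $f_0(x)$ subject to $f_i(x) \le b_i$), the problem is convex when the objective and every constraint function satisfy the inequality below.

```latex
f_i(\alpha x + \beta y) \le \alpha f_i(x) + \beta f_i(y)
\quad \text{whenever } \alpha + \beta = 1,\ \alpha \ge 0,\ \beta \ge 0,
\qquad i = 0, 1, \dots, m
```

Least-squares and linear programs are special cases: their objective and constraint functions satisfy this inequality.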

Page 8: LijunZhang zlj@nju.edu.cn

Convex Optimization Problem

The Problem

Properties

Page 9: LijunZhang zlj@nju.edu.cn

Nonlinear Optimization

Definition

The objective or constraint functions are not linear

Could be convex or nonconvex

Page 10: LijunZhang zlj@nju.edu.cn

Outline

Introduction

Convex Sets & Functions

Convex Optimization Problems

Duality

Convex Optimization Methods

Summary

Page 11: LijunZhang zlj@nju.edu.cn

Affine Set

Page 12: LijunZhang zlj@nju.edu.cn

Convex Set

Page 13: LijunZhang zlj@nju.edu.cn

Convex Cone
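The affine set, convex set, and convex cone definitions differ only in which combinations of two points must remain in the set. The comparison below uses the standard textbook statements; the symbols $C$, $\theta$, $\theta_1$, $\theta_2$ are assumed notation.

```latex
\begin{aligned}
\text{affine set:}\quad  & x_1, x_2 \in C,\ \theta \in \mathbb{R}
  &&\Longrightarrow\ \theta x_1 + (1 - \theta) x_2 \in C \\
\text{convex set:}\quad  & x_1, x_2 \in C,\ 0 \le \theta \le 1
  &&\Longrightarrow\ \theta x_1 + (1 - \theta) x_2 \in C \\
\text{convex cone:}\quad & x_1, x_2 \in C,\ \theta_1, \theta_2 \ge 0
  &&\Longrightarrow\ \theta_1 x_1 + \theta_2 x_2 \in C
\end{aligned}
```

Every affine set is convex, since the line through two points contains the segment between them.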

Page 14: LijunZhang zlj@nju.edu.cn

Some Examples (1)

Page 15: LijunZhang zlj@nju.edu.cn

Some Examples (2)

Page 16: LijunZhang zlj@nju.edu.cn

Some Examples (3)

Page 17: LijunZhang zlj@nju.edu.cn

Operations that Preserve Convexity

Page 18: LijunZhang zlj@nju.edu.cn

Convex Functions
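A function $f$ is convex if its domain is convex and $f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y)$ for all $x, y$ in the domain and $0 \le \theta \le 1$. The sketch below samples this inequality on a grid for scalar functions; the helper name `violates_convexity` is hypothetical. A sampled check can only disprove convexity, never prove it.

```python
import math


def violates_convexity(f, points, thetas, tol=1e-9):
    """Return a triple (x, y, theta) at which
    f(theta*x + (1-theta)*y) > theta*f(x) + (1-theta)*f(y),
    or None if no sampled triple violates the inequality."""
    for x in points:
        for y in points:
            for t in thetas:
                lhs = f(t * x + (1 - t) * y)
                rhs = t * f(x) + (1 - t) * f(y)
                if lhs > rhs + tol:
                    return (x, y, t)
    return None


points = [i / 2 for i in range(-8, 9)]   # grid on [-4, 4]
thetas = [i / 10 for i in range(11)]     # 0.0, 0.1, ..., 1.0

# x^2 and exp(x) are convex, so no violation should be found.
square_ok = violates_convexity(lambda x: x * x, points, thetas) is None
exp_ok = violates_convexity(math.exp, points, thetas) is None
# sin(x) is not convex on [-4, 4], so a witness triple exists.
sin_witness = violates_convexity(math.sin, points, thetas)
```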

Page 19: LijunZhang zlj@nju.edu.cn

Examples on $\mathbb{R}$

Page 20: LijunZhang zlj@nju.edu.cn

Examples on $\mathbb{R}^n$ and $\mathbb{R}^{m \times n}$

Page 21: LijunZhang zlj@nju.edu.cn

Restriction of a Convex Function to a Line

Page 22: LijunZhang zlj@nju.edu.cn

First-order Conditions

Page 23: LijunZhang zlj@nju.edu.cn

Second-order Conditions
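The first-order and second-order conditions for convexity are usually stated as below, for a function $f$ with convex domain that is differentiable (first-order) or twice differentiable (second-order). The notation is the standard textbook one and is assumed here.

```latex
\begin{aligned}
\text{first-order:}\quad  & f(y) \ge f(x) + \nabla f(x)^\top (y - x)
  \quad \text{for all } x, y \in \operatorname{dom} f \\
\text{second-order:}\quad & \nabla^2 f(x) \succeq 0
  \quad \text{for all } x \in \operatorname{dom} f
\end{aligned}
```

The first-order condition says that the tangent plane at any point is a global underestimator of $f$.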

Page 24: LijunZhang zlj@nju.edu.cn

Examples

Page 25: LijunZhang zlj@nju.edu.cn

Operations that Preserve Convexity

Page 26: LijunZhang zlj@nju.edu.cn

Positive Weighted Sum & Composition with Affine Function

Page 27: LijunZhang zlj@nju.edu.cn

Pointwise Maximum

Hinge loss: $\ell(x) = \max(0, 1 - x)$

Page 28: LijunZhang zlj@nju.edu.cn

The Conjugate Function
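In the standard notation (assumed here), the conjugate of a function $f$ is defined as:

```latex
f^*(y) = \sup_{x \in \operatorname{dom} f} \left( y^\top x - f(x) \right)
```

Because $f^*$ is a pointwise supremum of functions that are affine in $y$, it is convex even when $f$ is not.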

Page 29: LijunZhang zlj@nju.edu.cn

Examples

Page 30: LijunZhang zlj@nju.edu.cn

Outline

Introduction

Convex Sets & Functions

Convex Optimization Problems

Duality

Convex Optimization Methods

Summary

Page 31: LijunZhang zlj@nju.edu.cn

Optimization Problem in Standard Form
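A sketch of the standard form, using the Boyd and Vandenberghe notation (the symbols $f_i$, $h_i$, $m$, $p$, and $p^*$ are assumed from that source):

```latex
\begin{aligned}
\text{minimize}\quad   & f_0(x) \\
\text{subject to}\quad & f_i(x) \le 0, \quad i = 1, \dots, m \\
                       & h_i(x) = 0,  \quad i = 1, \dots, p
\end{aligned}
```

The optimal value is $p^* = \inf \{ f_0(x) \mid f_i(x) \le 0,\ h_i(x) = 0 \}$, with $p^* = \infty$ if the problem is infeasible.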

Page 32: LijunZhang zlj@nju.edu.cn

Optimal and Locally Optimal Points

Page 33: LijunZhang zlj@nju.edu.cn

Implicit Constraints

Page 34: LijunZhang zlj@nju.edu.cn

Convex Optimization Problem

Page 35: LijunZhang zlj@nju.edu.cn

Example

Page 36: LijunZhang zlj@nju.edu.cn

Local and Global Optima

Page 37: LijunZhang zlj@nju.edu.cn

Optimality Criterion for Differentiable $f_0$

Page 38: LijunZhang zlj@nju.edu.cn

Examples

Page 39: LijunZhang zlj@nju.edu.cn

Popular Convex Problems

Linear Program (LP)

Linear-fractional Program

Quadratic Program (QP)

Quadratically Constrained Quadratic Program (QCQP)

Second-order Cone Programming (SOCP)

Geometric Programming (GP)

Semidefinite Program (SDP)

Page 40: LijunZhang zlj@nju.edu.cn

Outline

Introduction

Convex Sets & Functions

Convex Optimization Problems

Duality

Convex Optimization Methods

Summary

Page 41: LijunZhang zlj@nju.edu.cn

Lagrangian

Page 42: LijunZhang zlj@nju.edu.cn

Lagrangian
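For a problem with inequality constraints $f_i(x) \le 0$ and equality constraints $h_i(x) = 0$, the Lagrangian is conventionally written as below; the multiplier names $\lambda$ and $\nu$ are assumed from Boyd and Vandenberghe.

```latex
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x)
```

It is a weighted sum of the objective and the constraint functions, with $\lambda_i$ attached to the $i$-th inequality and $\nu_i$ to the $i$-th equality.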

Page 43: LijunZhang zlj@nju.edu.cn

Lagrange Dual Function

Page 44: LijunZhang zlj@nju.edu.cn

Lagrange Dual Function
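In the same assumed notation (multipliers $\lambda$, $\nu$, problem domain $\mathcal{D}$, optimal value $p^*$), the Lagrange dual function and its lower-bound property can be sketched as:

```latex
\begin{aligned}
g(\lambda, \nu) &= \inf_{x \in \mathcal{D}}
  \Big( f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x) \Big) \\
g(\lambda, \nu) &\le p^* \quad \text{whenever } \lambda \succeq 0
\end{aligned}
```

The bound holds because, for any feasible $\tilde{x}$ and $\lambda \succeq 0$, the two sums are nonpositive and zero respectively, so the infimum is at most $f_0(\tilde{x})$.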

Page 45: LijunZhang zlj@nju.edu.cn

Least-norm Solution of Linear Equations
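The derivation for this example can be sketched as follows, assuming the problem is the one from the Boyd and Vandenberghe slides: minimize $x^\top x$ subject to $Ax = b$.

```latex
\begin{aligned}
L(x, \nu) &= x^\top x + \nu^\top (A x - b) \\
\nabla_x L(x, \nu) &= 2x + A^\top \nu = 0
  \quad \Longrightarrow \quad x = -\tfrac{1}{2} A^\top \nu \\
g(\nu) &= \tfrac{1}{4} \nu^\top A A^\top \nu - \tfrac{1}{2} \nu^\top A A^\top \nu - b^\top \nu
        = -\tfrac{1}{4} \nu^\top A A^\top \nu - b^\top \nu
\end{aligned}
```

The dual function is concave in $\nu$, and the lower-bound property gives $p^* \ge -\tfrac{1}{4} \nu^\top A A^\top \nu - b^\top \nu$ for every $\nu$.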

Page 46: LijunZhang zlj@nju.edu.cn

Lagrange Dual and Conjugate Function

Page 47: LijunZhang zlj@nju.edu.cn

The Dual Problem

Page 48: LijunZhang zlj@nju.edu.cn

Weak and Strong Duality

Page 49: LijunZhang zlj@nju.edu.cn

Weak and Strong Duality
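With $g$ denoting the Lagrange dual function, $p^*$ the primal optimal value, and $d^*$ the dual optimal value (assumed notation), the dual problem and the two notions of duality read:

```latex
\begin{aligned}
\text{dual problem:}\quad   & \text{maximize } g(\lambda, \nu) \quad \text{subject to } \lambda \succeq 0 \\
\text{weak duality:}\quad   & d^* \le p^* \quad \text{(always holds)} \\
\text{strong duality:}\quad & d^* = p^* \quad \text{(usually holds for convex problems under a constraint qualification)}
\end{aligned}
```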

Page 50: LijunZhang zlj@nju.edu.cn

Slater’s Constraint Qualification

Page 51: LijunZhang zlj@nju.edu.cn

Complementary Slackness

Page 52: LijunZhang zlj@nju.edu.cn

Karush-Kuhn-Tucker (KKT) Conditions
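For a problem with differentiable $f_i$ and $h_i$, the four KKT conditions are commonly listed as below; the notation is assumed from Boyd and Vandenberghe.

```latex
\begin{aligned}
\text{primal feasibility:}\quad      & f_i(x) \le 0,\ i = 1, \dots, m; \qquad h_i(x) = 0,\ i = 1, \dots, p \\
\text{dual feasibility:}\quad        & \lambda \succeq 0 \\
\text{complementary slackness:}\quad & \lambda_i f_i(x) = 0,\ i = 1, \dots, m \\
\text{stationarity:}\quad            & \nabla f_0(x) + \sum_{i=1}^{m} \lambda_i \nabla f_i(x)
                                       + \sum_{i=1}^{p} \nu_i \nabla h_i(x) = 0
\end{aligned}
```

If strong duality holds and $x$, $\lambda$, $\nu$ are optimal, they must satisfy all four conditions.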

Page 53: LijunZhang zlj@nju.edu.cn

KKT Conditions for Convex Problem

Page 54: LijunZhang zlj@nju.edu.cn

An Example—SVM (1)

The Optimization Problem

Define the hinge loss as

Its Conjugate Function is
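As a worked sketch, assume the hinge loss is written as the scalar function $\ell(x) = \max(0, 1 - x)$ (an assumption about the slide's notation). Its conjugate follows by splitting the supremum at $x = 1$:

```latex
\begin{aligned}
\ell^*(y) &= \sup_{x} \big( y x - \max(0, 1 - x) \big) \\
x \ge 1:\quad & y x \text{ is bounded above only if } y \le 0, \text{ with supremum } y \text{ at } x = 1 \\
x \le 1:\quad & (y + 1) x - 1 \text{ is bounded above only if } y \ge -1, \text{ with supremum } y \text{ at } x = 1 \\
\ell^*(y) &=
\begin{cases}
y       & \text{if } -1 \le y \le 0 \\
+\infty & \text{otherwise}
\end{cases}
\end{aligned}
```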

Page 55: LijunZhang zlj@nju.edu.cn

An Example—SVM (2)

The Optimization Problem becomes

It is Equivalent to

The Lagrangian is

Page 56: LijunZhang zlj@nju.edu.cn

An Example—SVM (3)

The Lagrange Dual Function is

Minimize one by one

Page 57: LijunZhang zlj@nju.edu.cn

An Example—SVM (4)

Finally, We Obtain

The Dual Problem is

Page 58: LijunZhang zlj@nju.edu.cn

An Example—SVM (5)

Karush-Kuhn-Tucker (KKT) Conditions

Page 59: LijunZhang zlj@nju.edu.cn

An Example—SVM (5)

Karush-Kuhn-Tucker (KKT) Conditions

Can be used to recover the primal solution from the dual solution

Page 60: LijunZhang zlj@nju.edu.cn

An Example—SVM (5)

Karush-Kuhn-Tucker (KKT) Conditions

Can be used to recover the primal solution from the dual solution

Page 61: LijunZhang zlj@nju.edu.cn

Outline

Introduction

Convex Sets & Functions

Convex Optimization Problems

Duality

Convex Optimization Methods

Summary

Page 62: LijunZhang zlj@nju.edu.cn

More Assumptions

Lipschitz continuous

Strong Convexity

Smooth
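These three assumptions are usually formalized as below. The constants $G$, $\lambda$, and $L$ are assumed names, and the norm is the Euclidean norm.

```latex
\begin{aligned}
\text{Lipschitz continuous:}\quad & |f(x) - f(y)| \le G \, \|x - y\|_2 \\
\text{strongly convex:}\quad      & f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \tfrac{\lambda}{2} \|y - x\|_2^2 \\
\text{smooth:}\quad               & f(y) \le f(x) + \nabla f(x)^\top (y - x) + \tfrac{L}{2} \|y - x\|_2^2
\end{aligned}
```

Strong convexity bounds the function from below by a quadratic, and smoothness bounds it from above by one.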

Page 63: LijunZhang zlj@nju.edu.cn

Performance Measure

The Problem

Convergence Rate

Iteration Complexity

Page 64: LijunZhang zlj@nju.edu.cn

Gradient-based Methods

The Convergence Rate

GD: Gradient Descent

AGD: Nesterov’s Accelerated Gradient Descent [Nesterov, 2005, Nesterov, 2007, Tseng, 2008]

EGD: Epoch Gradient Descent [Hazan and Kale, 2011]

SGD: SGD with α-suffix Averaging [Rakhlin et al., 2012]

Page 65: LijunZhang zlj@nju.edu.cn

Gradient Descent (1)

Move along the opposite direction of gradients
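The update can be sketched as $w_{t+1} = w_t - \eta \nabla f(w_t)$. The code below is a minimal illustration with a fixed step size; the function name `gradient_descent`, the step size $\eta = 0.1$, and the quadratic objective are all hypothetical choices, not taken from the slides.

```python
def gradient_descent(grad, w0, step_size, num_iters):
    """Iterate w <- w - step_size * grad(w) and return the final point."""
    w = list(w0)
    for _ in range(num_iters):
        g = grad(w)
        # Move along the opposite direction of the gradient.
        w = [wi - step_size * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical objective f(w) = (w_0 - 3)^2 + 2 * (w_1 + 1)^2,
# whose unique minimizer is (3, -1).
def grad_f(w):
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]


w_final = gradient_descent(grad_f, w0=[0.0, 0.0], step_size=0.1, num_iters=200)
```

With this step size each coordinate error shrinks by a constant factor per iteration (0.8 and 0.6 respectively), so the iterate converges linearly to the minimizer.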

Page 66: LijunZhang zlj@nju.edu.cn

Gradient Descent (2)

Gradient Descent with Projection

Projection Operator
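When the domain is constrained, each gradient step is followed by a Euclidean projection back onto the feasible set. The sketch below assumes the feasible set is an $\ell_2$ ball of a given radius, for which the projection has a closed form; the function names and the example objective are hypothetical.

```python
import math


def project_onto_ball(w, radius):
    """Euclidean projection onto the set {w : ||w||_2 <= radius}."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm <= radius:
        return list(w)  # already feasible
    return [wi * radius / norm for wi in w]  # rescale onto the boundary


def projected_gradient_descent(grad, w0, step_size, num_iters, radius):
    """Gradient step followed by projection, repeated num_iters times."""
    w = project_onto_ball(w0, radius)
    for _ in range(num_iters):
        g = grad(w)
        stepped = [wi - step_size * gi for wi, gi in zip(w, g)]
        w = project_onto_ball(stepped, radius)
    return w


# Hypothetical objective f(w) = (w_0 - 3)^2 + (w_1 - 4)^2 over the unit ball.
# The unconstrained minimizer (3, 4) is infeasible; the constrained
# minimizer is its projection (0.6, 0.8).
def grad_f(w):
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] - 4.0)]


w_proj = projected_gradient_descent(grad_f, [0.0, 0.0], 0.1, 100, 1.0)
```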

Page 67: LijunZhang zlj@nju.edu.cn

Analysis (1)

Page 68: LijunZhang zlj@nju.edu.cn

Analysis (2)

Page 69: LijunZhang zlj@nju.edu.cn

Analysis (3)

Page 70: LijunZhang zlj@nju.edu.cn

A Key Step (1)

Evaluate the Gradient or Subgradient

Logit loss

Page 71: LijunZhang zlj@nju.edu.cn

A Key Step (1)

Evaluate the Gradient or Subgradient

Logit loss

Hinge loss

Page 72: LijunZhang zlj@nju.edu.cn

A Key Step (2)

Evaluate the Gradient or Subgradient

Logit loss

Hinge loss

Page 73: LijunZhang zlj@nju.edu.cn

A Key Step (3)

Evaluate the Gradient or Subgradient

Logit loss

Hinge loss
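The two evaluations can be sketched in code. This is an illustration under stated assumptions: the logit loss is taken to be $\log(1 + \exp(-y\, x^\top w))$ and the hinge loss $\max(0, 1 - y\, x^\top w)$ for a single example $(x, y)$ with $y \in \{-1, +1\}$; the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sigmoid_neg(m):
    """Numerically stable evaluation of 1 / (1 + exp(m))."""
    if m >= 0:
        e = math.exp(-m)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(m))


def logit_loss_grad(w, x, y):
    """Gradient of log(1 + exp(-y * <x, w>)) with respect to w."""
    margin = y * dot(x, w)
    coeff = -y * sigmoid_neg(margin)
    return [coeff * xi for xi in x]


def hinge_loss_subgrad(w, x, y):
    """One subgradient of max(0, 1 - y * <x, w>) with respect to w."""
    margin = y * dot(x, w)
    if margin < 1:
        return [-y * xi for xi in x]
    # The loss is flat for margin > 1; at the kink (margin == 1)
    # the zero vector is still a valid subgradient.
    return [0.0 for _ in x]
```

The logit loss is differentiable everywhere, so its gradient is unique; the hinge loss has a kink at margin 1, where any convex combination of the two one-sided gradients would also be acceptable.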

Page 74: LijunZhang zlj@nju.edu.cn

Outline

Introduction

Convex Sets & Functions

Convex Optimization Problems

Duality

Convex Optimization Methods

Summary

Page 75: LijunZhang zlj@nju.edu.cn

Summary

Convex Sets & Functions: Definitions, Operations that Preserve Convexity

Convex Optimization Problems: Definitions, Optimality Criterion

Duality: Lagrange, Dual Problem, KKT Conditions

Convex Optimization Methods: Gradient-based Methods

Page 76: LijunZhang zlj@nju.edu.cn

Reference (1)

Hazan, E. and Kale, S. (2011). Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization. In Proceedings of the 24th Annual Conference on Learning Theory, pages 421–436.

Nesterov, Y. (2005). Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152.

Nesterov, Y. (2007). Gradient methods for minimizing composite objective function. Core discussion papers.

Page 77: LijunZhang zlj@nju.edu.cn

Reference (2)

Tseng, P. (2008). On accelerated proximal gradient methods for convex-concave optimization. Technical report, University of Washington.

Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.

Rakhlin, A., Shamir, O., and Sridharan, K. (2012). Making gradient descent optimal for strongly convex stochastic optimization. In Proceedings of the 29th International Conference on Machine Learning, pages 449–456.