Introduction of Java Agent Development Environment (JADE) 余萍 [email protected].
LijunZhang [email protected]
Transcript of LijunZhang [email protected]
![Page 1: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/1.jpg)
Convex Optimization
Lijun [email protected]://cs.nju.edu.cn/zlj
Modification of http://stanford.edu/~boyd/cvxbook/bv_cvxslides.pdf
![Page 2: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/2.jpg)
Outline
Introduction
Convex Sets & Functions
Convex Optimization Problems
Duality
Convex Optimization Methods
Summary
![Page 3: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/3.jpg)
Mathematical Optimization
Optimization Problem
![Page 4: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/4.jpg)
Applications
Dimensionality Reduction (PCA)
Clustering (NMF)
Classification (SVM)
max∈s. t. 1
min∈ , ∈
min∈ , ∈
s. t. 0, 0
![Page 5: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/5.jpg)
Least-squares
The Problem
Given , predict by Properties
![Page 6: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/6.jpg)
Linear Programming
The Problem
Properties
![Page 7: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/7.jpg)
Convex Optimization Problem
The Problem
Conditions
![Page 8: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/8.jpg)
Convex Optimization Problem
The Problem
Properties
![Page 9: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/9.jpg)
Nonlinear Optimization Definition
The objective or constraint functions are not linear Could be convex or nonconvex
![Page 10: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/10.jpg)
Outline
Introduction
Convex Sets & Functions
Convex Optimization Problems
Duality
Convex Optimization Methods
Summary
![Page 11: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/11.jpg)
Affine Set
![Page 12: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/12.jpg)
Convex Set
![Page 13: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/13.jpg)
Convex Cone
![Page 14: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/14.jpg)
Some Examples (1)
![Page 15: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/15.jpg)
Some Examples (2)
![Page 16: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/16.jpg)
Some Examples (3)
![Page 17: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/17.jpg)
Operations that Preserve Convexity
![Page 18: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/18.jpg)
Convex Functions
![Page 19: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/19.jpg)
Examples on
![Page 20: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/20.jpg)
Examples on and
![Page 21: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/21.jpg)
Restriction of a Convex Function to a Line
![Page 22: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/22.jpg)
First-order Conditions
![Page 23: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/23.jpg)
Second-order Conditions
![Page 24: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/24.jpg)
Examples
![Page 25: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/25.jpg)
Operations that Preserve Convexity
![Page 26: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/26.jpg)
Positive Weighted Sum & Composition with Affine Function
![Page 27: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/27.jpg)
Pointwise Maximum
Hinge loss: ℓ max 0,1
![Page 28: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/28.jpg)
The Conjugate Function
![Page 29: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/29.jpg)
Examples
![Page 30: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/30.jpg)
Outline
Introduction
Convex Sets & Functions
Convex Optimization Problems
Duality
Convex Optimization Methods
Summary
![Page 31: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/31.jpg)
Optimization Problem in Standard Form
![Page 32: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/32.jpg)
Optimal and Locally Optimal Points
![Page 33: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/33.jpg)
Implicit Constraints
![Page 34: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/34.jpg)
Convex Optimization Problem
![Page 35: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/35.jpg)
Example
![Page 36: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/36.jpg)
Local and Global Optima
![Page 37: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/37.jpg)
Optimality Criterion for Differentiable
![Page 38: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/38.jpg)
Examples
![Page 39: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/39.jpg)
Popular Convex Problems
Linear Program (LP) Linear-fractional Program Quadratic Program (QP) Quadratically Constrained Quadratic
program (QCQP) Second-order Cone Programming
(SOCP) Geometric Programming (GP) Semidefinite Program (SDP)
![Page 40: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/40.jpg)
Outline
Introduction
Convex Sets & Functions
Convex Optimization Problems
Duality
Convex Optimization Methods
Summary
![Page 41: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/41.jpg)
Lagrangian
![Page 42: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/42.jpg)
Lagrangian
![Page 43: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/43.jpg)
Lagrange Dual Function
![Page 44: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/44.jpg)
Lagrange Dual Function
![Page 45: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/45.jpg)
Least-norm Solution of Linear Equations
![Page 46: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/46.jpg)
Lagrange Dual and Conjugate Function
![Page 47: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/47.jpg)
The Dual Problem
![Page 48: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/48.jpg)
Weak and Strong Duality
![Page 49: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/49.jpg)
Weak and Strong Duality
![Page 50: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/50.jpg)
Slater’s Constraint Qualification
![Page 51: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/51.jpg)
Complementary Slackness
![Page 52: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/52.jpg)
Karush-Kuhn-Tucker (KKT) Conditions
![Page 53: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/53.jpg)
KKT Conditions for Convex Problem
![Page 54: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/54.jpg)
An Example—SVM (1)
The Optimization Problem
Define the hinge loss as
Its Conjugate Function is
![Page 55: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/55.jpg)
An Example—SVM (2)
The Optimization Problem becomes
It is Equivalent to
The Lagrangian is
![Page 56: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/56.jpg)
An Example—SVM (3)
The Lagrange Dual Function is
Minimize one by one
![Page 57: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/57.jpg)
An Example—SVM (4)
Finally, We Obtain
The Dual Problem is
![Page 58: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/58.jpg)
An Example—SVM (5)
Karush-Kuhn-Tucker (KKT) Conditions
![Page 59: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/59.jpg)
An Example—SVM (5)
Karush-Kuhn-Tucker (KKT) Conditions
Can be used to recover ∗ from ∗
![Page 60: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/60.jpg)
An Example—SVM (5)
Karush-Kuhn-Tucker (KKT) Conditions
Can be used to recover ∗ from ∗
![Page 61: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/61.jpg)
Outline
Introduction
Convex Sets & Functions
Convex Optimization Problems
Duality
Convex Optimization Methods
Summary
![Page 62: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/62.jpg)
More Assumptions
Lipschitz continuous
Strong Convexity
Smooth1 1 1 2
,
,
![Page 63: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/63.jpg)
Performance Measure
The Problem
Convergence Rate
Iteration Complexity
![Page 64: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/64.jpg)
Gradient-based Methods
The Convergence Rate
GD—Gradient Descent AGD—Nesterov’s Accelerated Gradient
Descent [Nesterov, 2005, Nesterov, 2007, Tseng, 2008]
EGD—Epoch Gradient Descent [Hazan and Kale, 2011]
SGD—SGD with -suffix Averaging [Rakhlin
et al., 2012]
![Page 65: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/65.jpg)
Gradient Descent (1)
Move along the opposite direction of gradients
![Page 66: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/66.jpg)
Gradient Descent (2)
Gradient Descent with Projection
Projection Operator
![Page 67: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/67.jpg)
Analysis (1)
![Page 68: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/68.jpg)
Analysis (2)
![Page 69: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/69.jpg)
Analysis (3)
![Page 70: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/70.jpg)
A Key Step (1)
Evaluate the Gradient or Subgradient Logit loss
![Page 71: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/71.jpg)
A Key Step (1)
Evaluate the Gradient or Subgradient Logit loss
Hinge loss
![Page 72: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/72.jpg)
A Key Step (2)
Evaluate the Gradient or Subgradient Logit loss
Hinge loss
![Page 73: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/73.jpg)
A Key Step (3)
Evaluate the Gradient or Subgradient Logit loss
Hinge loss
![Page 74: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/74.jpg)
Outline
Introduction
Convex Sets & Functions
Convex Optimization Problems
Duality
Convex Optimization Methods
Summary
![Page 75: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/75.jpg)
Summary
Convex Sets & Functions Definitions, Operations that Preserve
Convexity Convex Optimization Problems Definitions, Optimality Criterion
Duality Lagrange, Dual Problem, KKT Conditions
Convex Optimization Methods Gradient-based Methods
![Page 76: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/76.jpg)
Reference (1) Hazan, E. and Kale, S. (2011)
Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization. In Proceedings of the 24th Annual Conference on Learning Theory, pages 421–436.
Nesterov, Y. (2005)Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152.
Nesterov, Y. (2007).Gradient methods for minimizing composite objective function. Core discussion papers.
![Page 77: LijunZhang zlj@nju.edu.cn](https://reader030.fdocuments.in/reader030/viewer/2022012512/618ab182d71ad775610cee90/html5/thumbnails/77.jpg)
Reference (2) Tseng, P. (2008).
On acclerated proximal gradient methods for convex-concave optimization. Technical report, University of Washington.
Boyd, S. and Vandenberghe, L. (2004).Convex Optimization. Cambridge University Press.
Rakhlin, A., Shamir, O., and Sridharan, K. (2012)Making gradient descent optimal for strongly convex stochastic optimization. In Proceedings of the 29th International Conference on Machine Learning, pages 449–456.