CS 59000 Statistical Machine Learning, Lecture 18


Page 1

CS 59000 Statistical Machine Learning, Lecture 18

Yuan (Alan) Qi, Purdue CS

Oct. 30, 2008

Page 2

Outline

• Review of Support Vector Machines for Linearly Separable Case

• Support Vector Machines for Overlapping Class Distributions

• Support Vector Machines for Regression

Page 3

Support Vector Machines

Support Vector Machines: motivated by statistical learning theory.

Maximum margin classifiers

Margin: the smallest distance between the decision boundary and any of the samples

Page 4

Maximizing Margin

Since scaling \( \mathbf{w} \) and \( b \) together does not change this ratio, we can set \( t_n\big(\mathbf{w}^\top\phi(\mathbf{x}_n)+b\big) = 1 \) for the point closest to the decision boundary.
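The ratio in question is presumably the standard maximum-margin objective (following Bishop, PRML §7.1):

\[ \arg\max_{\mathbf{w},\,b}\ \left\{ \frac{1}{\|\mathbf{w}\|}\, \min_n\, \Big[\, t_n \big( \mathbf{w}^\top \phi(\mathbf{x}_n) + b \big) \Big] \right\}. \]

With this canonical scaling, every data point satisfies \( t_n\big(\mathbf{w}^\top\phi(\mathbf{x}_n)+b\big) \ge 1 \).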

In the case of data points for which the equality holds, the constraints are said to be active, whereas for the remainder they are said to be inactive.

Page 5

Optimization Problem

Quadratic programming: minimize a quadratic objective subject to the linear margin constraints, as sketched below.
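A sketch of the primal, assuming the canonical hard-margin form (maximizing the margin \( 1/\|\mathbf{w}\| \) is equivalent to minimizing \( \|\mathbf{w}\|^2 \)):

\[ \min_{\mathbf{w},\,b}\ \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad t_n\big(\mathbf{w}^\top\phi(\mathbf{x}_n)+b\big) \ge 1, \quad n = 1,\dots,N. \]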

Page 6

Lagrange Multiplier

Maximize \( f(\mathbf{x}) \)

Subject to \( g(\mathbf{x}) = 0 \)

Gradient of constraint: \( \nabla g(\mathbf{x}) \) is orthogonal to the constraint surface \( g(\mathbf{x}) = 0 \).
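At a constrained stationary point, \( \nabla f \) must be parallel to \( \nabla g \), which motivates the Lagrangian (a sketch of the standard argument):

\[ \nabla f(\mathbf{x}) + \lambda\, \nabla g(\mathbf{x}) = 0, \qquad L(\mathbf{x}, \lambda) \equiv f(\mathbf{x}) + \lambda\, g(\mathbf{x}), \]

where the multiplier \( \lambda \) may take either sign for an equality constraint.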

Page 7

Geometrical Illustration of Lagrange Multiplier

Page 8

Lagrange Multiplier with Inequality Constraints

Page 9

Karush-Kuhn-Tucker (KKT) condition
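For maximizing \( f(\mathbf{x}) \) subject to \( g(\mathbf{x}) \ge 0 \), the slide presumably lists the standard conditions:

\[ g(\mathbf{x}) \ge 0, \qquad \lambda \ge 0, \qquad \lambda\, g(\mathbf{x}) = 0. \]

Either the constraint is inactive (\( g(\mathbf{x}) > 0 \), so \( \lambda = 0 \)) or it is active (\( g(\mathbf{x}) = 0 \), with \( \lambda > 0 \) allowed).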

Page 10

Lagrange Function for SVM

Quadratic programming: minimize \( \tfrac{1}{2}\|\mathbf{w}\|^2 \) subject to \( t_n\big(\mathbf{w}^\top\phi(\mathbf{x}_n)+b\big) \ge 1 \).

Lagrange function:
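A sketch, introducing one multiplier \( a_n \ge 0 \) per constraint (cf. PRML §7.1):

\[ L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{n=1}^{N} a_n \big\{ t_n\big(\mathbf{w}^\top\phi(\mathbf{x}_n)+b\big) - 1 \big\}. \]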

Page 11

Dual Variables

Setting the derivatives of \( L \) with respect to \( \mathbf{w} \) and \( b \) to zero:
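From the Lagrangian sketched above:

\[ \mathbf{w} = \sum_{n=1}^{N} a_n t_n\, \phi(\mathbf{x}_n), \qquad 0 = \sum_{n=1}^{N} a_n t_n. \]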

Page 12

Dual Problem
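Substituting these conditions back into \( L \) eliminates \( \mathbf{w} \) and \( b \), giving the dual (a sketch, with kernel \( k(\mathbf{x},\mathbf{x}') = \phi(\mathbf{x})^\top\phi(\mathbf{x}') \)):

\[ \text{maximize } \tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m t_n t_m\, k(\mathbf{x}_n,\mathbf{x}_m) \quad \text{subject to } a_n \ge 0,\ \sum_{n=1}^{N} a_n t_n = 0. \]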

Page 13

Prediction
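New inputs are classified by the sign of the kernel expansion that follows from \( \mathbf{w} = \sum_n a_n t_n \phi(\mathbf{x}_n) \):

\[ y(\mathbf{x}) = \sum_{n=1}^{N} a_n t_n\, k(\mathbf{x}, \mathbf{x}_n) + b. \]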

Page 14

KKT Condition, Support Vectors, and Bias

The KKT conditions imply that for every data point either \( a_n = 0 \) or \( t_n\, y(\mathbf{x}_n) = 1 \). The data points in the latter case are known as support vectors. We can then solve for the bias term as follows:
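Averaging \( t_n\, y(\mathbf{x}_n) = 1 \) over the set \( S \) of support vectors gives a numerically stable estimate (a sketch):

\[ b = \frac{1}{N_S} \sum_{n \in S} \Big( t_n - \sum_{m \in S} a_m t_m\, k(\mathbf{x}_n, \mathbf{x}_m) \Big). \]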

Page 15

Computational Complexity

Quadratic programming: the primal is a QP over the \( M \) parameters of \( \mathbf{w} \) (plus \( b \)), while the dual is a QP over the \( N \) dual variables \( a_n \).

When the dimension \( M \) is smaller than the number of data points \( N \), solving the dual problem is more costly than solving the primal.

However, the dual representation allows the use of kernels, whose feature spaces can have high (even infinite) dimensionality.

Page 16

Example: SVM Classification
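The original slide shows a figure; below is a minimal scikit-learn sketch of the same kind of example. The dataset, kernel width, and C are illustrative assumptions, not the lecture's values.

# A minimal sketch: fit a Gaussian-kernel SVM on toy 2-D data and
# inspect its support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two separable 2-D Gaussian clusters with labels in {0, 1}.
X, t = make_blobs(n_samples=40, centers=2, random_state=0)

# A very large C approximates the hard-margin SVM of this lecture;
# kernel="rbf" is the Gaussian kernel.
clf = SVC(kernel="rbf", C=1e6, gamma=0.5)
clf.fit(X, t)

# Only points with nonzero dual variables a_n become support vectors.
print("number of support vectors:", len(clf.support_vectors_))
print("signed dual coefficients (a_n * t_n):", clf.dual_coef_)

Typically only a handful of the 40 points end up as support vectors, illustrating the sparsity of the solution.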

Page 17

Classification for Overlapping Classes

Soft Margin:
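A sketch of the relaxed constraints, with one slack variable \( \xi_n \ge 0 \) per data point:

\[ t_n\, y(\mathbf{x}_n) \ge 1 - \xi_n, \qquad \xi_n \ge 0, \qquad n = 1,\dots,N, \]

so that \( \xi_n = 0 \) for points on or beyond the correct margin boundary, \( 0 < \xi_n \le 1 \) for points inside the margin but correctly classified, and \( \xi_n > 1 \) for misclassified points.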

Page 18

New Cost Function

To maximize the margin while softly penalizing points that lie on the wrong side of the margin boundary (not the decision boundary), we minimize
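presumably the standard soft-margin objective, where \( C > 0 \) controls the trade-off between the slack penalty and the margin:

\[ C \sum_{n=1}^{N} \xi_n + \frac{1}{2}\|\mathbf{w}\|^2. \]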

Page 19

Lagrange Function
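A sketch of the soft-margin Lagrangian (cf. PRML §7.1.1):

\[ L = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{n=1}^{N} \xi_n - \sum_{n=1}^{N} a_n \big\{ t_n\, y(\mathbf{x}_n) - 1 + \xi_n \big\} - \sum_{n=1}^{N} \mu_n \xi_n, \]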

where we have Lagrange multipliers \( a_n \ge 0 \) and \( \mu_n \ge 0 \).

Page 20

KKT Condition
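One set per data point, presumably:

\[ a_n \ge 0, \qquad t_n\, y(\mathbf{x}_n) - 1 + \xi_n \ge 0, \qquad a_n \big( t_n\, y(\mathbf{x}_n) - 1 + \xi_n \big) = 0, \]
\[ \mu_n \ge 0, \qquad \xi_n \ge 0, \qquad \mu_n\, \xi_n = 0. \]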

Page 21

Gradients
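Setting the derivatives of the soft-margin Lagrangian to zero (a sketch):

\[ \frac{\partial L}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{n=1}^{N} a_n t_n\, \phi(\mathbf{x}_n), \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{n=1}^{N} a_n t_n = 0, \qquad \frac{\partial L}{\partial \xi_n} = 0 \;\Rightarrow\; a_n = C - \mu_n. \]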

Page 22

Dual Lagrangian

Since \( a_n = C - \mu_n \) and \( \sum_n a_n t_n = 0 \), substituting the gradient conditions into \( L \) cancels the slack terms, and we recover the same dual Lagrangian \( \tilde{L}(\mathbf{a}) \) as in the separable case.

Page 23

Dual Lagrangian with Constraints

Maximize the dual Lagrangian subject to box constraints on the multipliers, as sketched below.
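A sketch of the resulting dual:

\[ \tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m t_n t_m\, k(\mathbf{x}_n,\mathbf{x}_m), \qquad \text{subject to } 0 \le a_n \le C,\ \sum_{n=1}^{N} a_n t_n = 0. \]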

Page 24

Support Vectors

Two cases of support vectors: if \( a_n < C \), then \( \mu_n > 0 \) and \( \xi_n = 0 \), so the point lies exactly on the margin; if \( a_n = C \), the point lies inside the margin and is correctly classified when \( \xi_n \le 1 \) and misclassified when \( \xi_n > 1 \).

Page 25

Solve Bias Term

Support vectors with \( 0 < a_n < C \) have \( \mu_n > 0 \) and hence \( \xi_n = 0 \): they lie exactly on the margin, so \( t_n\, y(\mathbf{x}_n) = 1 \) can be solved for the bias term.
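Averaging over the set \( M \) of such margin support vectors gives, as in the separable case (a sketch):

\[ b = \frac{1}{N_M} \sum_{n \in M} \Big( t_n - \sum_{m \in S} a_m t_m\, k(\mathbf{x}_n, \mathbf{x}_m) \Big). \]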

Page 26

Interpretation from Regularization Framework
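A sketch of the equivalent view (cf. PRML §7.1.2): the soft-margin objective can be rewritten as a regularized hinge loss,

\[ \sum_{n=1}^{N} E_{SV}\big( y_n t_n \big) + \lambda \|\mathbf{w}\|^2, \qquad E_{SV}(yt) = \big[\, 1 - yt \,\big]_{+}, \qquad \lambda = (2C)^{-1}, \]

where \( [\,\cdot\,]_{+} \) denotes the positive part and \( y_n = y(\mathbf{x}_n) \).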

Page 27

Regularized Logistic Regression

For logistic regression, we have an analogous regularized error function.
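A sketch, assuming the standard form:

\[ E_{LR}(yt) = \ln\big( 1 + \exp(-yt) \big), \]

minimized as \( \sum_{n=1}^{N} E_{LR}(y_n t_n) + \lambda \|\mathbf{w}\|^2 \). Unlike the hinge loss, \( E_{LR} \) is nowhere exactly zero, so the solution is not sparse.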

Page 28

Visualization of Hinge Error Function

Page 29

SVM for Regression

Using a sum-of-squares error, we obtain regularized least squares (ridge regression), as sketched below.

However, the ridge regression solution is not sparse: every training point contributes to the prediction.
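Presumably the regularized sum-of-squares objective:

\[ \frac{1}{2} \sum_{n=1}^{N} \big( y(\mathbf{x}_n) - t_n \big)^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2. \]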

Page 30

ε-insensitive Error Function

Minimize the regularized ε-insensitive error, as sketched below.
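A sketch of the standard form (cf. PRML §7.1.4): the error function

\[ E_\epsilon\big( y(\mathbf{x}) - t \big) = \begin{cases} 0, & |y(\mathbf{x}) - t| < \epsilon, \\ |y(\mathbf{x}) - t| - \epsilon, & \text{otherwise}, \end{cases} \]

is flat inside a tube of half-width \( \epsilon \) around the prediction, and the objective is

\[ C \sum_{n=1}^{N} E_\epsilon\big( y(\mathbf{x}_n) - t_n \big) + \frac{1}{2}\|\mathbf{w}\|^2. \]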

Page 31

Slack Variables

How many slack variables do we need? Two per data point: \( \xi_n \ge 0 \) for targets above the \( \epsilon \)-tube and \( \hat{\xi}_n \ge 0 \) for targets below it.

Minimize the corresponding objective, sketched below.
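A sketch with both slacks in place:

\[ C \sum_{n=1}^{N} \big( \xi_n + \hat{\xi}_n \big) + \frac{1}{2}\|\mathbf{w}\|^2, \qquad \text{subject to } t_n \le y(\mathbf{x}_n) + \epsilon + \xi_n, \quad t_n \ge y(\mathbf{x}_n) - \epsilon - \hat{\xi}_n, \quad \xi_n, \hat{\xi}_n \ge 0. \]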

Page 32

Visualization of SVM Regression

Page 33

Support Vectors for Regression

Which points will be support vectors for regression? Those that lie on or outside the boundary of the ε-tube.

Why? Points strictly inside the tube incur zero error, so their Lagrange multipliers vanish and they drop out of the prediction.

Page 34

Sparsity Revisited

Discussion: sparsity can come either from the error function (as in the SVM) or from the regularizer (as in the Lasso).