CS 59000 Statistical Machine Learning, Lecture 18


Page 1

CS 59000 Statistical Machine Learning, Lecture 18

Yuan (Alan) Qi, Purdue CS

Oct. 30, 2008

Page 2

Outline

• Review of Support Vector Machines for Linearly Separable Case

• Support Vector Machines for Overlapping Class Distributions

• Support Vector Machines for Regression

Page 3

Support Vector Machines

Support Vector Machines: motivated by statistical learning theory.

Maximum margin classifiers

Margin: the smallest distance between the decision boundary and any of the samples

Page 4

Maximizing Margin

Since scaling \( \mathbf{w} \) and \( b \) together does not change this ratio, we can set \( t_n\big(\mathbf{w}^\top\phi(\mathbf{x}_n)+b\big) = 1 \) for the point closest to the decision boundary.
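The ratio in question is presumably the standard maximum-margin objective (following Bishop, PRML §7.1):

\[ \arg\max_{\mathbf{w},\,b}\ \left\{ \frac{1}{\|\mathbf{w}\|}\, \min_n\, \Big[\, t_n \big( \mathbf{w}^\top \phi(\mathbf{x}_n) + b \big) \Big] \right\}. \]

With this canonical scaling, every data point satisfies \( t_n\big(\mathbf{w}^\top\phi(\mathbf{x}_n)+b\big) \ge 1 \).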

In the case of data points for which the equality holds, the constraints are said to be active, whereas for the remainder they are said to be inactive.

Page 5

Optimization Problem

Quadratic programming: minimize a quadratic objective subject to the linear margin constraints, as sketched below.
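A sketch of the primal, assuming the canonical hard-margin form (maximizing the margin \( 1/\|\mathbf{w}\| \) is equivalent to minimizing \( \|\mathbf{w}\|^2 \)):

\[ \min_{\mathbf{w},\,b}\ \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad t_n\big(\mathbf{w}^\top\phi(\mathbf{x}_n)+b\big) \ge 1, \quad n = 1,\dots,N. \]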

Page 6

Lagrange Multiplier

Maximize \( f(\mathbf{x}) \)

Subject to \( g(\mathbf{x}) = 0 \)

Gradient of constraint: \( \nabla g(\mathbf{x}) \) is orthogonal to the constraint surface \( g(\mathbf{x}) = 0 \).
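At a constrained stationary point, \( \nabla f \) must be parallel to \( \nabla g \), which motivates the Lagrangian (a sketch of the standard argument):

\[ \nabla f(\mathbf{x}) + \lambda\, \nabla g(\mathbf{x}) = 0, \qquad L(\mathbf{x}, \lambda) \equiv f(\mathbf{x}) + \lambda\, g(\mathbf{x}), \]

where the multiplier \( \lambda \) may take either sign for an equality constraint.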

Page 7

Geometrical Illustration of Lagrange Multiplier

Page 8

Lagrange Multiplier with Inequality Constraints

Page 9

Karush-Kuhn-Tucker (KKT) condition
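For maximizing \( f(\mathbf{x}) \) subject to \( g(\mathbf{x}) \ge 0 \), the slide presumably lists the standard conditions:

\[ g(\mathbf{x}) \ge 0, \qquad \lambda \ge 0, \qquad \lambda\, g(\mathbf{x}) = 0. \]

Either the constraint is inactive (\( g(\mathbf{x}) > 0 \), so \( \lambda = 0 \)) or it is active (\( g(\mathbf{x}) = 0 \), with \( \lambda > 0 \) allowed).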

Page 10

Lagrange Function for SVM

Quadratic programming: minimize \( \tfrac{1}{2}\|\mathbf{w}\|^2 \) subject to \( t_n\big(\mathbf{w}^\top\phi(\mathbf{x}_n)+b\big) \ge 1 \).

Lagrange function:
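A sketch, introducing one multiplier \( a_n \ge 0 \) per constraint (cf. PRML §7.1):

\[ L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{n=1}^{N} a_n \big\{ t_n\big(\mathbf{w}^\top\phi(\mathbf{x}_n)+b\big) - 1 \big\}. \]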

Page 11

Dual Variables

Setting the derivatives of \( L \) with respect to \( \mathbf{w} \) and \( b \) to zero:
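From the Lagrangian sketched above:

\[ \mathbf{w} = \sum_{n=1}^{N} a_n t_n\, \phi(\mathbf{x}_n), \qquad 0 = \sum_{n=1}^{N} a_n t_n. \]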

Page 12

Dual Problem
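Substituting these conditions back into \( L \) eliminates \( \mathbf{w} \) and \( b \), giving the dual (a sketch, with kernel \( k(\mathbf{x},\mathbf{x}') = \phi(\mathbf{x})^\top\phi(\mathbf{x}') \)):

\[ \text{maximize } \tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m t_n t_m\, k(\mathbf{x}_n,\mathbf{x}_m) \quad \text{subject to } a_n \ge 0,\ \sum_{n=1}^{N} a_n t_n = 0. \]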

Page 13

Prediction
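New inputs are classified by the sign of the kernel expansion that follows from \( \mathbf{w} = \sum_n a_n t_n \phi(\mathbf{x}_n) \):

\[ y(\mathbf{x}) = \sum_{n=1}^{N} a_n t_n\, k(\mathbf{x}, \mathbf{x}_n) + b. \]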

Page 14

KKT Condition, Support Vectors, and Bias

The KKT conditions imply that for every data point either \( a_n = 0 \) or \( t_n\, y(\mathbf{x}_n) = 1 \). The data points in the latter case are known as support vectors. We can then solve for the bias term as follows:
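Averaging \( t_n\, y(\mathbf{x}_n) = 1 \) over the set \( S \) of support vectors gives a numerically stable estimate (a sketch):

\[ b = \frac{1}{N_S} \sum_{n \in S} \Big( t_n - \sum_{m \in S} a_m t_m\, k(\mathbf{x}_n, \mathbf{x}_m) \Big). \]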

Page 15

Computational Complexity

Quadratic programming: the primal is a QP over the \( M \) parameters of \( \mathbf{w} \) (plus \( b \)), while the dual is a QP over the \( N \) dual variables \( a_n \).

When the dimension \( M \) is smaller than the number of data points \( N \), solving the dual problem is more costly than solving the primal.

However, the dual representation allows the use of kernels, whose feature spaces can have high (even infinite) dimensionality.

Page 16

Example: SVM Classification
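The original slide shows a figure; below is a minimal scikit-learn sketch of the same kind of example. The dataset, kernel width, and C are illustrative assumptions, not the lecture's values.

# A minimal sketch: fit a Gaussian-kernel SVM on toy 2-D data and
# inspect its support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two separable 2-D Gaussian clusters with labels in {0, 1}.
X, t = make_blobs(n_samples=40, centers=2, random_state=0)

# A very large C approximates the hard-margin SVM of this lecture;
# kernel="rbf" is the Gaussian kernel.
clf = SVC(kernel="rbf", C=1e6, gamma=0.5)
clf.fit(X, t)

# Only points with nonzero dual variables a_n become support vectors.
print("number of support vectors:", len(clf.support_vectors_))
print("signed dual coefficients (a_n * t_n):", clf.dual_coef_)

Typically only a handful of the 40 points end up as support vectors, illustrating the sparsity of the solution.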

Page 17

Classification for Overlapping Classes

Soft Margin:
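A sketch of the relaxed constraints, with one slack variable \( \xi_n \ge 0 \) per data point:

\[ t_n\, y(\mathbf{x}_n) \ge 1 - \xi_n, \qquad \xi_n \ge 0, \qquad n = 1,\dots,N, \]

so that \( \xi_n = 0 \) for points on or beyond the correct margin boundary, \( 0 < \xi_n \le 1 \) for points inside the margin but correctly classified, and \( \xi_n > 1 \) for misclassified points.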

Page 18

New Cost Function

To maximize the margin while softly penalizing points that lie on the wrong side of the margin boundary (not the decision boundary), we minimize
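presumably the standard soft-margin objective, where \( C > 0 \) controls the trade-off between the slack penalty and the margin:

\[ C \sum_{n=1}^{N} \xi_n + \frac{1}{2}\|\mathbf{w}\|^2. \]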

Page 19

Lagrange Function
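A sketch of the soft-margin Lagrangian (cf. PRML §7.1.1):

\[ L = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{n=1}^{N} \xi_n - \sum_{n=1}^{N} a_n \big\{ t_n\, y(\mathbf{x}_n) - 1 + \xi_n \big\} - \sum_{n=1}^{N} \mu_n \xi_n, \]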

where we have Lagrange multipliers \( a_n \ge 0 \) and \( \mu_n \ge 0 \).

Page 20

KKT Condition
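One set per data point, presumably:

\[ a_n \ge 0, \qquad t_n\, y(\mathbf{x}_n) - 1 + \xi_n \ge 0, \qquad a_n \big( t_n\, y(\mathbf{x}_n) - 1 + \xi_n \big) = 0, \]
\[ \mu_n \ge 0, \qquad \xi_n \ge 0, \qquad \mu_n\, \xi_n = 0. \]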

Page 21

Gradients
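Setting the derivatives of the soft-margin Lagrangian to zero (a sketch):

\[ \frac{\partial L}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{n=1}^{N} a_n t_n\, \phi(\mathbf{x}_n), \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{n=1}^{N} a_n t_n = 0, \qquad \frac{\partial L}{\partial \xi_n} = 0 \;\Rightarrow\; a_n = C - \mu_n. \]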

Page 22

Dual Lagrangian

Since \( a_n = C - \mu_n \) and \( \sum_n a_n t_n = 0 \), substituting the gradient conditions into \( L \) cancels the slack terms, and we recover the same dual Lagrangian \( \tilde{L}(\mathbf{a}) \) as in the separable case.

Page 23

Dual Lagrangian with Constraints

Maximize the dual Lagrangian subject to box constraints on the multipliers, as sketched below.
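A sketch of the resulting dual:

\[ \tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m t_n t_m\, k(\mathbf{x}_n,\mathbf{x}_m), \qquad \text{subject to } 0 \le a_n \le C,\ \sum_{n=1}^{N} a_n t_n = 0. \]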

Page 24

Support Vectors

Two cases of support vectors: if \( a_n < C \), then \( \mu_n > 0 \) and \( \xi_n = 0 \), so the point lies exactly on the margin; if \( a_n = C \), the point lies inside the margin and is correctly classified when \( \xi_n \le 1 \) and misclassified when \( \xi_n > 1 \).

Page 25

Solve Bias Term

Support vectors with \( 0 < a_n < C \) have \( \mu_n > 0 \) and hence \( \xi_n = 0 \): they lie exactly on the margin, so \( t_n\, y(\mathbf{x}_n) = 1 \) can be solved for the bias term.
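Averaging over the set \( M \) of such margin support vectors gives, as in the separable case (a sketch):

\[ b = \frac{1}{N_M} \sum_{n \in M} \Big( t_n - \sum_{m \in S} a_m t_m\, k(\mathbf{x}_n, \mathbf{x}_m) \Big). \]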

Page 26

Interpretation from Regularization Framework
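A sketch of the equivalent view (cf. PRML §7.1.2): the soft-margin objective can be rewritten as a regularized hinge loss,

\[ \sum_{n=1}^{N} E_{SV}\big( y_n t_n \big) + \lambda \|\mathbf{w}\|^2, \qquad E_{SV}(yt) = \big[\, 1 - yt \,\big]_{+}, \qquad \lambda = (2C)^{-1}, \]

where \( [\,\cdot\,]_{+} \) denotes the positive part and \( y_n = y(\mathbf{x}_n) \).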

Page 27

Regularized Logistic Regression

For logistic regression, we have an analogous regularized error function.
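A sketch, assuming the standard form:

\[ E_{LR}(yt) = \ln\big( 1 + \exp(-yt) \big), \]

minimized as \( \sum_{n=1}^{N} E_{LR}(y_n t_n) + \lambda \|\mathbf{w}\|^2 \). Unlike the hinge loss, \( E_{LR} \) is nowhere exactly zero, so the solution is not sparse.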

Page 28

Visualization of Hinge Error Function

Page 29

SVM for Regression

Using a sum-of-squares error, we obtain regularized least squares (ridge regression), as sketched below.

However, the ridge regression solution is not sparse: every training point contributes to the prediction.
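Presumably the regularized sum-of-squares objective:

\[ \frac{1}{2} \sum_{n=1}^{N} \big( y(\mathbf{x}_n) - t_n \big)^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2. \]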

Page 30

ε-insensitive Error Function

Minimize the regularized ε-insensitive error, as sketched below.
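A sketch of the standard form (cf. PRML §7.1.4): the error function

\[ E_\epsilon\big( y(\mathbf{x}) - t \big) = \begin{cases} 0, & |y(\mathbf{x}) - t| < \epsilon, \\ |y(\mathbf{x}) - t| - \epsilon, & \text{otherwise}, \end{cases} \]

is flat inside a tube of half-width \( \epsilon \) around the prediction, and the objective is

\[ C \sum_{n=1}^{N} E_\epsilon\big( y(\mathbf{x}_n) - t_n \big) + \frac{1}{2}\|\mathbf{w}\|^2. \]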

Page 31

Slack Variables

How many slack variables do we need? Two per data point: \( \xi_n \ge 0 \) for targets above the \( \epsilon \)-tube and \( \hat{\xi}_n \ge 0 \) for targets below it.

Minimize the corresponding objective, sketched below.
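A sketch with both slacks in place:

\[ C \sum_{n=1}^{N} \big( \xi_n + \hat{\xi}_n \big) + \frac{1}{2}\|\mathbf{w}\|^2, \qquad \text{subject to } t_n \le y(\mathbf{x}_n) + \epsilon + \xi_n, \quad t_n \ge y(\mathbf{x}_n) - \epsilon - \hat{\xi}_n, \quad \xi_n, \hat{\xi}_n \ge 0. \]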

Page 32

Visualization of SVM Regression

Page 33

Support Vectors for Regression

Which points will be support vectors for regression? Those that lie on or outside the boundary of the ε-tube.

Why? Points strictly inside the tube incur zero error, so their Lagrange multipliers vanish and they drop out of the prediction.

Page 34

Sparsity Revisited

Discussion: sparsity can come either from the error function (as in the SVM) or from the regularizer (as in the Lasso).