Support Vector Machines and their Applications

Purushottam Kar

Department of Computer Science and Engineering, Indian Institute of Technology Kanpur.

Summer School on “Expert Systems And Their Applications”, Indian Institute of Information Technology Allahabad.

June 14, 2009

Outline: Introduction, SVMs, Generalization Bounds, The Kernel Trick, Implementations, Applications

Support Vector Machines

What is being “supported” ?

How can vectors support anything ?

Wait !! Machines ?? - Is this a Mechanical Engineering Lecture ?

The Learning Methodology

Is it possible to write an algorithm to distinguish between ...

a well-formed and an ill-formed C++ program ?

a palindrome and a non-palindrome ?

a graph with and without cliques of size bigger than 1000 ?

The Learning Methodology

Is it possible to write an algorithm to distinguish between ...

a handwritten 4 and a handwritten 9 ?

a spam and a non-spam e-mail ?

a positive movie review and a negative movie review ?

Statistical Machine Learning

“Synthesize” a program based on training data

Assume training data that is randomly generated from some unknown but fixed distribution and a target function

Give probabilistic error bounds

In other words be probably-approximately-correct

The motto - Let the data decide the algorithm

Expert Systems

... a computing system capable of representing and reasoning about some knowledge rich domain, such as internal medicine or geology ...

Introduction to Expert Systems, Peter Jackson, Addison Wesley Publishing Company, 1986.

Linear Machines

Arguably the simplest of classifiers acting on vectorial data

Numerous Learning Algorithms - Perceptron, SVM

Support Vector Machines

A “special” hyperplane - with the maximum margin

Margin of a point measures how far it is from the hyperplane
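
As a rough illustration of the margin of a point, the following Python sketch computes the signed distance of a labelled point from a hyperplane ⟨w · x⟩ + b = 0. The function name `geometric_margin`, the hyperplane and the sample points are assumptions made for this example and do not come from the slides.

```python
import math

def geometric_margin(w, b, x, y):
    """Signed distance of point x (label y in {-1, +1}) from the hyperplane <w, x> + b = 0.

    The value is positive when x lies on the side that matches its label.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    functional = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return functional / norm_w

# Hypothetical toy data: the hyperplane x1 + x2 - 1 = 0 in the plane.
w, b = [1.0, 1.0], -1.0
points = [([2.0, 2.0], +1), ([0.0, 0.0], -1), ([1.0, 1.0], -1)]
margins = [geometric_margin(w, b, x, y) for x, y in points]

# Margin of the hyperplane on the whole sample: the smallest margin of any point.
dataset_margin = min(margins)
```

A negative value marks a point on the wrong side of the hyperplane. The maximum margin hyperplane is the one for which the smallest margin over the training set is as large as possible.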

Learning the Maximum Margin Classifier

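$$\text{minimize}_{\vec{w},\, b} \quad \|\vec{w}\|^2 \quad \text{subject to} \quad y_i \left( \langle \vec{w} \cdot \vec{x}_i \rangle + b \right) \ge 1, \quad i = 1, \dots, l.$$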

A Linearly-constrained Quadratic program

Solvable in polynomial time - several algorithms known

Does not give us much insight into the nature of the hyperplane
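
One step the slide leaves implicit is why minimizing the norm of the weight vector gives the maximum margin. A short derivation, assuming the data are linearly separable and that (w, b) is rescaled so that the closest training points meet the constraint with equality (the symbol γ for the margin is borrowed from the later bound):

```latex
\[
\gamma(\vec{w}, b)
= \min_{1 \le i \le l} \frac{y_i \left( \langle \vec{w} \cdot \vec{x}_i \rangle + b \right)}{\|\vec{w}\|}
= \frac{1}{\|\vec{w}\|}
\quad \text{whenever} \quad
\min_{1 \le i \le l} y_i \left( \langle \vec{w} \cdot \vec{x}_i \rangle + b \right) = 1 ,
\]
\[
\text{so maximizing } \gamma(\vec{w}, b)
\text{ is equivalent to minimizing } \|\vec{w}\|^2
\text{ subject to } y_i \left( \langle \vec{w} \cdot \vec{x}_i \rangle + b \right) \ge 1, \quad i = 1, \dots, l .
\]
```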

Non-linearly Separable Data

Use slack variables to allow points to lie on the “wrong” side of the hyperplane

Can still be solved as a quadratic program (QP), since the constraints remain linear

Learning the Soft Margin Classifier

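$$\text{minimize}_{\vec{w},\, b} \quad \|\vec{w}\|^2 + C \sum_{i=1}^{l} \xi_i^2 \quad \text{subject to} \quad y_i \left( \langle \vec{w} \cdot \vec{x}_i \rangle + b \right) \ge 1 - \xi_i, \quad i = 1, \dots, l.$$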

Again a Linearly-constrained Quadratic program

More insight gained by looking at the dual program
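
To see how the slack variables enter the soft margin objective, the sketch below evaluates it for a fixed (w, b). It relies on the observation that, with a squared penalty, the cheapest feasible slack for each point is max(0, 1 - y_i(⟨w · x_i⟩ + b)). The function name and the one-dimensional sample are hypothetical.

```python
def soft_margin_objective(w, b, X, y, C):
    """Return (||w||^2 + C * sum(xi_i^2), slacks) using the smallest feasible slacks.

    For fixed (w, b), the constraint y_i (<w, x_i> + b) >= 1 - xi_i is met most
    cheaply by xi_i = max(0, 1 - y_i (<w, x_i> + b)).
    """
    slacks = []
    for x_i, y_i in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, x_i)) + b
        slacks.append(max(0.0, 1.0 - y_i * score))
    norm_sq = sum(wj * wj for wj in w)
    return norm_sq + C * sum(s * s for s in slacks), slacks

# Hypothetical 1-D sample; the last point sits on the "wrong" side.
X = [[2.0], [1.0], [-1.0], [0.5]]
y = [+1, +1, -1, -1]
value, slacks = soft_margin_objective([1.0], 0.0, X, y, C=10.0)
```

Only the last point needs a non-zero slack here, and the constant C controls how heavily that violation is charged against a small norm.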

The Dual Program for the Hard Margin SVM

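$$\text{maximize} \quad \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j \langle \vec{x}_i \cdot \vec{x}_j \rangle \quad \text{subject to} \quad \sum_{i=1}^{l} y_i \alpha_i = 0, \quad \alpha_i \ge 0, \quad i = 1, \dots, l.$$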

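Some properties of the optimum $(\vec{w}^*, b^*, \vec{\alpha}^*)$:

$$\alpha_i^* \left[ y_i \left( \langle \vec{w}^* \cdot \vec{x}_i \rangle + b^* \right) - 1 \right] = 0, \quad i = 1, \dots, l$$

$$\vec{w}^* = \sum_{i=1}^{l} y_i \alpha_i^* \vec{x}_i$$

$$f(\vec{x}, \vec{\alpha}^*, b^*) = \sum_{i=1}^{l} y_i \alpha_i^* \langle \vec{x}_i \cdot \vec{x} \rangle + b^*$$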
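
The properties of the optimum can be checked on a two-point toy problem that is small enough to solve by hand. The data set and the helper names below are assumptions chosen for illustration. The dual solution is worked out analytically in the comments, not computed by a QP solver.

```python
# Toy problem (hypothetical): two points on the x-axis with opposite labels.
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [+1, -1]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alpha):
    """W(alpha) = sum_i alpha_i - 1/2 sum_{i,j} y_i y_j alpha_i alpha_j <x_i, x_j>."""
    n = len(X)
    quad = sum(y[i] * y[j] * alpha[i] * alpha[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

# The constraint sum_i y_i alpha_i = 0 forces alpha_1 = alpha_2 = a, so
# W = 2a - 2a^2, which is maximized at a = 1/2.
alpha = [0.5, 0.5]

# Recover the primal solution: w* = sum_i y_i alpha_i x_i.
w = [sum(y[i] * alpha[i] * X[i][k] for i in range(len(X))) for k in range(2)]

# Complementary slackness on a point with alpha_i > 0 gives
# y_i (<w*, x_i> + b*) = 1, hence b* = y_i - <w*, x_i> because y_i is +1 or -1.
b = y[0] - dot(w, X[0])

def f(x):
    """Decision function expressed only through inner products with the sample."""
    return sum(y[i] * alpha[i] * dot(X[i], x) for i in range(len(X))) + b

# alpha_i * [y_i (<w*, x_i> + b*) - 1] should vanish for every i.
kkt_residuals = [alpha[i] * (y[i] * (dot(w, X[i]) + b) - 1.0) for i in range(len(X))]
```

Both points end up with a non-zero multiplier, and the decision function never needs the weight vector explicitly, only inner products with the training points.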

The Support

Consider each vector applying a force of $\alpha_i$ on the hyperplane in the direction $y_i \frac{\vec{w}}{\|\vec{w}\|_2}$.

The optimality conditions exactly correspond to the net force and the net torque on the hyperplane being zero

Hence the vectors lying on the margin, for which $\alpha_i \neq 0$, “support” the hyperplane.
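
The correspondence can be spelled out as follows. This is a sketch that assumes the i-th force acts at the point x_i, that torque is measured about the origin, and that the cross product is read in its three-dimensional sense (in other dimensions it would be replaced by the wedge product):

```latex
\[
\text{Net force:} \qquad
\sum_{i=1}^{l} \alpha_i \, y_i \, \frac{\vec{w}}{\|\vec{w}\|_2}
= \frac{\vec{w}}{\|\vec{w}\|_2} \sum_{i=1}^{l} y_i \alpha_i
= \vec{0}
\quad \text{because} \quad \sum_{i=1}^{l} y_i \alpha_i = 0 ,
\]
\[
\text{Net torque:} \qquad
\sum_{i=1}^{l} \vec{x}_i \times \left( \alpha_i \, y_i \, \frac{\vec{w}}{\|\vec{w}\|_2} \right)
= \left( \sum_{i=1}^{l} y_i \alpha_i \vec{x}_i \right) \times \frac{\vec{w}}{\|\vec{w}\|_2}
= \vec{w} \times \frac{\vec{w}}{\|\vec{w}\|_2}
= \vec{0}
\quad \text{because} \quad \vec{w} = \sum_{i=1}^{l} y_i \alpha_i \vec{x}_i .
\]
```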

Luckiness and Generalization

Essentially the idea is to show that the probability of the training sample misleading us into learning an erroneous classifier is small

To do this, model selection has to be done carefully

To prevent overfitting, the classifier family should not be too powerful

Bounds for Linear Classifiers

Theorem (Vapnik and Chervonenkis)

For any probability distribution on the input domain $\mathbb{R}^d$, and any target function $g$, with probability no less than $1 - \delta$, any linear hyperplane $f$ classifying a randomly chosen training set of size $l$ perfectly cannot disagree with the target function on more than an $\varepsilon$ fraction of the input domain (with respect to the underlying distribution), where

$$\varepsilon = \frac{2}{l} \left( (d + 1) \log \frac{2el}{d + 1} + \log \frac{2}{\delta} \right)$$
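
To get a feel for the numbers, the bound can be evaluated directly. The sketch below is a plain transcription of the formula. The function name is hypothetical, and since the slide does not state the base of the logarithm, the natural logarithm used by default is an assumption (a different base can be passed in).

```python
import math

def vc_error_bound(d, l, delta, log=math.log):
    """epsilon = (2 / l) * ((d + 1) * log(2 e l / (d + 1)) + log(2 / delta)).

    d: input dimension, l: training-set size, delta: allowed failure probability.
    The logarithm base is an assumption; pass log=math.log2 for base 2.
    """
    vc_dim = d + 1  # VC dimension of hyperplanes in R^d
    return (2.0 / l) * (vc_dim * log(2.0 * math.e * l / vc_dim) + log(2.0 / delta))

# Hypothetical setting: 10-dimensional inputs, 95% confidence.
for l in (1_000, 10_000, 100_000):
    print(l, round(vc_error_bound(d=10, l=l, delta=0.05), 4))
```

The bound shrinks as the sample grows and loosens as the dimension grows, which is the dependence that a margin-based bound avoids.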

Bounds for Large Margin Classifiers

Theorem (Vapnik)

For any probability distribution on the input domain - a ball of radius $R$, and any target function $g$, with probability no less than $1 - \delta$, any linear hyperplane $f$ classifying a randomly chosen training set of size $l$ perfectly with margin $\ge \gamma$ cannot disagree with the target function on more than an $\varepsilon$ fraction of the input domain (with respect to the underlying distribution), where

$$\varepsilon = O\left( \frac{1}{l} \left( \frac{R^2}{\gamma^2} + \log \frac{1}{\delta} \right) \right)$$

NOTE : Error bound independent of the dimension !

The XOR problem

The feature map $\Phi : (x, y) \mapsto (x^2, y^2, \sqrt{2}\,xy)$ makes the problem a linearly separable one.

But do we need to perform the actual feature map ?

No ! Since all that the SVM training algorithm requires are the values of $\langle \Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j) \rangle$

But $\langle \Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j) \rangle = \langle \vec{x}_i \cdot \vec{x}_j \rangle^2$
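
Both claims are easy to check numerically. The sketch below uses XOR points with +1/-1 coordinates, which is an encoding assumed for this example. It verifies that the inner product of the mapped points equals the squared inner product of the originals, and that a hyperplane in the feature space separates the two classes.

```python
import math

def phi(p):
    """Explicit feature map (x, y) -> (x^2, y^2, sqrt(2) * x * y)."""
    x, y = p
    return (x * x, y * y, math.sqrt(2.0) * x * y)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def quadratic_kernel(p, q):
    """The same inner product, computed in the input space: <p, q>^2."""
    return dot(p, q) ** 2

# XOR with +/-1 coordinates: label +1 when the coordinates agree, -1 otherwise.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [+1, +1, -1, -1]

# 1. The kernel equals the inner product of the mapped points (up to rounding).
kernel_matches = all(
    math.isclose(dot(phi(p), phi(q)), quadratic_kernel(p, q), abs_tol=1e-9)
    for p in points for q in points
)

# 2. In the feature space, the hyperplane through the origin with normal
#    (0, 0, 1) puts the two classes on opposite sides.
w = (0.0, 0.0, 1.0)
scores = [dot(w, phi(p)) for p in points]
separated = all(s * label > 0 for s, label in zip(scores, labels))
```

The kernel evaluation works on two-dimensional inputs only, so the three-dimensional feature vectors never have to be built during training.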

The Kernel Trick

Many algorithms admit the Kernel trick - SVM, SVM-regression, Kernel-PCA, Kernel-clustering, Perceptron

However, the kernel trick is not used for the Perceptron algorithm - why ?

Note that not all kernels correspond to feature maps
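
To make the Perceptron entry in the list concrete, here is a hedged sketch of a dual-form Perceptron that touches the data only through kernel evaluations. The function names, the absence of a bias term and the XOR data with +1/-1 coordinates are all choices made for this example and are not taken from the slides.

```python
def kernel_perceptron(X, y, kernel, max_epochs=100):
    """Dual-form Perceptron: alpha[i] counts the mistakes made on example i.

    The hypothesis is f(x) = sum_i alpha[i] * y[i] * kernel(X[i], x), so the
    data enter only through kernel evaluations. No bias term is used.
    """
    n = len(X)
    alpha = [0] * n
    for _ in range(max_epochs):
        mistakes = 0
        for j in range(n):
            score = sum(alpha[i] * y[i] * kernel(X[i], X[j]) for i in range(n))
            if y[j] * score <= 0:   # misclassified, or exactly on the boundary
                alpha[j] += 1
                mistakes += 1
        if mistakes == 0:           # a full pass without errors: stop
            break
    return alpha

def predict(alpha, X, y, kernel, x):
    score = sum(a * y_i * kernel(x_i, x) for a, y_i, x_i in zip(alpha, y, X))
    return 1 if score > 0 else -1

def quadratic_kernel(p, q):
    return (p[0] * q[0] + p[1] * q[1]) ** 2

# XOR data with +/-1 coordinates (encoding assumed for this example).
X = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
y = [+1, +1, -1, -1]
alpha = kernel_perceptron(X, y, quadratic_kernel)
predictions = [predict(alpha, X, y, quadratic_kernel, x) for x in X]
```

The non-zero entries of alpha mark the training examples on which the learned function depends.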

Implementations

Many techniques known - Ellipsoid method, Gradient descent, Sequential Minimal Optimization - the last is tuned to the QP problem

LIBSVM http://www.csie.ntu.edu.tw/~cjlin/libsvm/ - supports multi-class classification and regression

SVMlight http://svmlight.joachims.org/ - good scalability

General Convex Optimization Solvers - CVX, SeDuMi - compatible with Matlab
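
Of the techniques listed, gradient descent is the simplest to sketch. The toy example below is not how LIBSVM or SVMlight work internally and uses none of their APIs. It is plain-Python gradient descent on a soft margin objective with squared slacks, with the slack variables eliminated. The function name, step size, iteration count and data are assumptions chosen so that the example converges.

```python
def train_soft_margin_gd(X, y, C=1.0, lr=0.005, steps=2000):
    """Gradient descent on J(w, b) = ||w||^2 + C * sum_i max(0, 1 - y_i (<w, x_i> + b))^2.

    This is a soft margin objective with the slacks eliminated. Because the
    slack penalty is squared, J is differentiable everywhere.
    """
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w = [2.0 * wj for wj in w]      # gradient of ||w||^2
        grad_b = 0.0
        for x_i, y_i in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, x_i)) + b
            slack = max(0.0, 1.0 - y_i * score)
            # Gradient of C * slack^2; it vanishes for points outside the margin.
            for k in range(dim):
                grad_w[k] -= 2.0 * C * slack * y_i * x_i[k]
            grad_b -= 2.0 * C * slack * y_i
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical, linearly separable toy data.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [+1, +1, +1, -1, -1, -1]
w, b = train_soft_margin_gd(X, y)
predictions = [1 if sum(wj * xj for wj, xj in zip(w, x_i)) + b > 0 else -1 for x_i in X]
```

For anything beyond toy data a dedicated solver such as the packages named above is the more sensible choice, since a fixed step size like this one has to be tuned by hand.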

Handwritten Digit Recognition

Boser-Guyon-Vapnik - first real-world application of SVMs

Two datasets, USPS and NIST, were used for training and testing

Performance was stable across a range of kernel choices

Even simple polynomial kernels gave improvements (3.2% error) over MSE techniques like backpropagation or ridge regression (12.7% error) on the USPS dataset.
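For readers unfamiliar with polynomial kernels, the sketch below checks numerically that the degree-2 kernel K(x, z) = (x·z + 1)² equals an inner product under an explicit feature map in two dimensions. The slide does not state the exact kernel form or degree used in the digit experiments, so this particular kernel and the input vectors are assumptions for illustration only.

```python
import math


def poly_kernel(x, z, degree=2):
    """Inhomogeneous polynomial kernel (x.z + 1)^degree."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** degree


def phi(x):
    """Explicit feature map for the degree-2 kernel on 2-D inputs."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)


x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly_kernel(x, z)
explicit_value = sum(a * b for a, b in zip(phi(x), phi(z)))
print(kernel_value, explicit_value)
```

The kernel needs one inner product in 2 dimensions, while the explicit map needs 6 features; the gap widens quickly with input dimension and degree.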

Text Categorization

Joachims - used insights from information retrieval research

Reuters-21578, a collection of labeled news stories, was used.

A native kernel was used.

Performance was measured using the Precision-Recall break-even point

Aggregate performance of Bayes, k-NN and C4.5 was around or below 80%, whereas even the native kernel achieved > 84%

Use of RBF kernels improved performance to > 86%
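The Precision-Recall break-even point is the value at which precision and recall coincide. One common way to compute it, sketched below, is to rank documents by classifier score and label the top k as positive, where k is the number of truly positive documents; precision and recall are then equal by construction. The slide does not give the exact procedure used in the experiments (for instance any interpolation or averaging across categories), and the scores and labels here are made up.

```python
def precision_recall_breakeven(scores, labels):
    """Break-even point: precision (= recall) when exactly as many documents
    are predicted positive as are truly positive. Labels are 1 or 0."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive example is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = sum(label for _, label in ranked[:n_pos])
    return true_positives / n_pos


# Hypothetical classifier scores and ground-truth labels for six documents.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 1, 0, 0]

bep = precision_recall_breakeven(scores, labels)
print(bep)
```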

Image based Gender Identification

Varma-Babu - an application of kernel learning

Learn the kernel as a linear combination of base kernels

Performance gains of 5-10% observed over other kernel-based learning techniques
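The idea of a combined kernel can be sketched as follows: a non-negative weighted sum of valid kernels is again a valid kernel. The base kernels, the `gamma` value and the fixed weights below are hypothetical; in kernel learning the weights are optimized from data, and that optimization is not shown here.

```python
import math


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def combine_kernels(kernels, weights):
    """Return K(x, z) = sum_k weights[k] * kernels[k](x, z).
    Weights must be non-negative so that the result stays positive semi-definite."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per base kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    def combined(x, z):
        return sum(w * k(x, z) for w, k in zip(weights, kernels))

    return combined


# Fixed, hand-picked weights stand in for learned ones.
combined = combine_kernels([linear_kernel, rbf_kernel], [0.3, 0.7])
x, z = (1.0, 2.0), (0.0, 1.0)
value = combined(x, z)
print(value)
```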

Topic Drift in Page-ranking Algorithms

Karnick-Saradhi - an application of multi-class Support Vector Data Description (SVDD)

Use SVDD to obtain representative pages for a topic and prune irrelevant pages

Use a kernel based on both link and content information

Performance gains in terms of precision and recall observed over other existing topic distillation techniques

Bibliography

An Introduction to Support Vector Machines, Nello Cristianini and John Shawe-Taylor, Cambridge University Press, 2000.

Learning with Kernels, Bernhard Schölkopf and Alexander J. Smola, The MIT Press, 2002.

http://www.support-vector.net/
