
Classification IV

Lecturer: Dr. Bo Yuan

E-mail: yuanb@sz.tsinghua.edu.cn

Overview

Support Vector Machines


Linear Classifier

The classifier splits the input space into a region where $w \cdot x + b > 0$ and a region where $w \cdot x + b < 0$, separated by the hyperplane $w \cdot x + b = 0$.

$$f(x) = \operatorname{sign}(g(x)) = \operatorname{sign}(w \cdot x + b), \qquad w \cdot x = \sum_{i=1}^{n} w_i x_i$$

Just in case: for any two points $x^{(1)}$ and $x^{(2)}$ on the hyperplane, $w \cdot x^{(1)} + b = w \cdot x^{(2)} + b = 0$, so $w \cdot (x^{(1)} - x^{(2)}) = 0$: the weight vector $w$ is normal to the hyperplane.

Distance to Hyperplane

Write any point $x$ as $x = x_p + r \, \frac{w}{\|w\|}$, where $x_p$ is its projection onto the hyperplane $g(x) = w \cdot x + b = 0$ and $r$ is its signed distance from the plane.

$$g(x) = w \cdot x + b = w \cdot x_p + b + r \, \frac{w \cdot w}{\|w\|} = r \, \|w\| \quad\Longrightarrow\quad |r| = \frac{|g(x)|}{\|w\|}$$

In particular, the distance from the origin to the hyperplane is $|b| / \|w\|$.
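A quick numeric check of these formulas (a minimal sketch; the hyperplane and test point below are made-up values for illustration):

```python
import numpy as np

# Hypothetical hyperplane g(x) = w.x + b and a test point.
w = np.array([1.0, 1.0])
b = -1.0
x = np.array([3.0, 2.0])

g = w @ x + b                            # g(x) = 4.0
r = g / np.linalg.norm(w)                # signed distance of x from the plane
origin_dist = abs(b) / np.linalg.norm(w)

print(r)            # 2.828... = 4 / sqrt(2)
print(origin_dist)  # 0.707... = 1 / sqrt(2)
```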

Selection of Classifiers

Which classifier is the best?

All have the same training error.

How about generalization?

Unknown Samples

[Figure: the same unknown samples fall on different sides of classifiers A and B.]

Classifier B divides the space more consistently (unbiased).

Margins

[Figure: a maximum-margin classifier; the points lying on the margin boundaries are the Support Vectors.]

Margins

The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point.

Intuitively, it is safer to choose a classifier with a larger margin: a wider buffer zone for mistakes.

The hyperplane is decided by only a few data points, the Support Vectors; the others can be discarded.

Select the classifier with the maximum margin: Linear Support Vector Machines (LSVM). This works very well in practice.

How to specify the margin formally?

Margins

[Figure: the "Predict Class = +1" zone and the "Predict Class = -1" zone, separated by the parallel hyperplanes $w \cdot x + b = 1$, $w \cdot x + b = 0$ and $w \cdot x + b = -1$; $x^+$ and $x^-$ are points on the two margin boundaries, and M is the margin width.]

Since $w \cdot x^+ + b = 1$ and $w \cdot x^- + b = -1$, subtracting gives $w \cdot (x^+ - x^-) = 2$; projecting onto the unit normal $w / \|w\|$ yields the margin width

$$M = \frac{2}{\|w\|}$$

Objective Function

Correctly classify all data points:

$$w \cdot x_i + b \ge +1 \ \text{ if } y_i = +1, \qquad w \cdot x_i + b \le -1 \ \text{ if } y_i = -1,$$

or compactly, $y_i (w \cdot x_i + b) - 1 \ge 0$.

Maximize the margin $M = 2 / \|w\|$, which is equivalent to minimizing $\frac{1}{2}\|w\|^2$: a quadratic optimization problem.

Minimize $\Phi(w) = \frac{1}{2} w^T w$

Subject to $y_i (w \cdot x_i + b) \ge 1$ for all $i$.

Lagrange Multipliers

Introduce a multiplier $\alpha_i \ge 0$ for each constraint. The primal Lagrangian is

$$L_P = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{l} \alpha_i \big[ y_i (w \cdot x_i + b) - 1 \big]$$

Setting its derivatives to zero:

$$\frac{\partial L_P}{\partial w} = 0 \ \Rightarrow\ w = \sum_{i=1}^{l} \alpha_i y_i x_i, \qquad \frac{\partial L_P}{\partial b} = 0 \ \Rightarrow\ \sum_{i=1}^{l} \alpha_i y_i = 0$$

Substituting back gives the dual form

$$L_D = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j = \sum_{i} \alpha_i - \frac{1}{2} \alpha^T H \alpha, \qquad H_{ij} = y_i y_j \, x_i \cdot x_j$$

subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$.

Dual Problem

A quadratic optimization problem again, now in the multipliers $\alpha_i$ rather than in $w$ and $b$.
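Since the dual is just a quadratic program, it can be handed to any constrained optimizer. A minimal sketch with SciPy (an assumption: SciPy is available; real SVM packages use specialized solvers such as SMO instead):

```python
import numpy as np
from scipy.optimize import minimize

def solve_dual(X, y):
    """Maximize L_D = sum(alpha) - 1/2 * alpha^T H alpha
    subject to alpha_i >= 0 and sum_i alpha_i y_i = 0."""
    H = (y[:, None] * y[None, :]) * (X @ X.T)  # H_ij = y_i y_j (x_i . x_j)
    l = len(y)
    res = minimize(
        lambda a: -(a.sum() - 0.5 * a @ H @ a),              # negated dual
        x0=np.zeros(l),
        bounds=[(0.0, None)] * l,                            # alpha_i >= 0
        constraints={'type': 'eq', 'fun': lambda a: a @ y},  # sum alpha_i y_i = 0
    )
    return res.x
```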

Solutions of w & b

From the optimality conditions, $w = \sum_{i=1}^{l} \alpha_i y_i x_i$, so

$$g(x) = \sum_{i=1}^{l} \alpha_i y_i \, (x_i \cdot x) + b$$

which depends on the data only through inner products.

Support Vectors: the samples with positive $\alpha_i$.

For any support vector $x_s$ ($s \in S$), $y_s (w \cdot x_s + b) = 1$; multiplying by $y_s$ and using $y_s^2 = 1$ gives

$$b = y_s - \sum_{m \in S} \alpha_m y_m \, (x_m \cdot x_s)$$

Averaging over all $N_s$ support vectors for numerical stability:

$$b = \frac{1}{N_s} \sum_{s \in S} \Big( y_s - \sum_{m \in S} \alpha_m y_m \, (x_m \cdot x_s) \Big)$$

An Example

Two training samples: $x_1 = (1, 1)$ with label $y_1 = +1$, and $x_2 = (0, 0)$ with label $y_2 = -1$.

$$H = \begin{pmatrix} y_1 y_1 \, x_1 \cdot x_1 & y_1 y_2 \, x_1 \cdot x_2 \\ y_2 y_1 \, x_2 \cdot x_1 & y_2 y_2 \, x_2 \cdot x_2 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}$$

$$L_D = \alpha_1 + \alpha_2 - \alpha_1^2 \qquad \text{subject to} \qquad \sum_i \alpha_i y_i = \alpha_1 - \alpha_2 = 0$$

So $\alpha_1 = \alpha_2$ and $L_D = 2\alpha_1 - \alpha_1^2$, which is maximized at $\alpha_1 = \alpha_2 = 1$.

$$w = \sum_i \alpha_i y_i x_i = 1 \cdot 1 \cdot [1\ \ 1] + 1 \cdot (-1) \cdot [0\ \ 0] = [1\ \ 1], \qquad b = y_1 - w \cdot x_1 = 1 - 2 = -1$$

The decision boundary is $x_1 + x_2 - 1 = 0$ (in the plane's coordinates), and the margin is $M = 2 / \|w\| = \sqrt{2}$.
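A few lines of NumPy reproduce this closed-form answer (a sketch; the `solve_dual` snippet shown earlier would return the same alphas):

```python
import numpy as np

X = np.array([[1.0, 1.0], [0.0, 0.0]])  # x1 = (1,1), x2 = (0,0)
y = np.array([1.0, -1.0])
alpha = np.array([1.0, 1.0])            # maximizer of L_D found above

w = (alpha * y) @ X                     # [1. 1.]
b = y[0] - w @ X[0]                     # -1.0, from support vector x1
M = 2 / np.linalg.norm(w)               # sqrt(2) ~ 1.414

print(w, b, M)
print(y * (X @ w + b))                  # [1. 1.]: both points sit exactly on the margin
```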

Soft Margin

[Figure: hyperplanes $w \cdot x + b = 1, 0, -1$, with slack variables $\varepsilon_2$, $\varepsilon_7$, $\varepsilon_{11}$ marking the points that violate the margin.]

Relax the constraints with slack variables: $y_i (w \cdot x_i + b) \ge 1 - \varepsilon_i$, with $\varepsilon_i \ge 0$.

Minimize

$$\Phi(w) = \frac{1}{2} w^T w + C \sum_{i=1}^{l} \varepsilon_i$$

where $C$ controls the trade-off between a large margin and few violations. The primal Lagrangian becomes

$$L_P = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \varepsilon_i - \sum_{i=1}^{l} \alpha_i \big[ y_i (w \cdot x_i + b) - 1 + \varepsilon_i \big] - \sum_{i=1}^{l} \mu_i \varepsilon_i$$

Soft Margin

Setting the derivatives of $L_P$ to zero:

$$\frac{\partial L_P}{\partial w} = 0 \ \Rightarrow\ w = \sum_{i=1}^{l} \alpha_i y_i x_i, \qquad \frac{\partial L_P}{\partial b} = 0 \ \Rightarrow\ \sum_{i=1}^{l} \alpha_i y_i = 0, \qquad \frac{\partial L_P}{\partial \varepsilon_i} = 0 \ \Rightarrow\ C = \alpha_i + \mu_i$$

The dual has exactly the same form as before:

$$L_D = \sum_{i} \alpha_i - \frac{1}{2} \alpha^T H \alpha \qquad \text{s.t.} \quad 0 \le \alpha_i \le C \ \text{ and } \ \sum_i \alpha_i y_i = 0$$

The only change is the box constraint $0 \le \alpha_i \le C$ on the multipliers.
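In practice, C is the knob you tune in an off-the-shelf soft-margin SVM. A minimal sketch with scikit-learn (assuming it is installed; the data and the flipped "noise" points are synthetic):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
X[:3] = -X[:3]                 # push a few points to the wrong side: label noise

# Small C tolerates margin violations (wide margin, many support vectors);
# large C penalizes them heavily and fits the noise harder.
for C in (0.01, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print(C, clf.n_support_)   # support-vector count per class
```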

Non-linear SVMs

[Figure: one-dimensional data on the x axis that cannot be separated by a single threshold becomes separable after mapping each point to $(x, x^2)$.]

Feature Space

[Figure: the mapping $\Phi: x \to \varphi(x)$ takes the input space with axes $x_1, x_2$ to a feature space with axes $x_1^2, x_2^2$, where the classes become linearly separable.]

Feature Space

[Figure: under $\Phi: x \to \varphi(x)$, a nonlinear decision boundary in the input space (axes $x_1, x_2$) corresponds to a linear one in the feature space.]

Quadratic Basis Functions

$$\Phi(x) = \Big( 1,\ \ \sqrt{2}\,x_1, \dots, \sqrt{2}\,x_m,\ \ x_1^2, \dots, x_m^2,\ \ \sqrt{2}\,x_1 x_2,\ \sqrt{2}\,x_1 x_3, \dots, \sqrt{2}\,x_{m-1} x_m \Big)^T$$

Constant term; linear terms; pure quadratic terms; quadratic cross-terms.

Number of terms: $\binom{m+2}{2} = \frac{(m+2)(m+1)}{2} \approx \frac{m^2}{2}$.

Calculation of Φ(xi)·Φ(xj)

For two vectors $a$ and $b$ under the quadratic basis above:

$$\Phi(a) \cdot \Phi(b) = 1 + 2 \sum_{i=1}^{m} a_i b_i + \sum_{i=1}^{m} a_i^2 b_i^2 + 2 \sum_{i=1}^{m} \sum_{j=i+1}^{m} a_i a_j b_i b_j$$

Computing this explicitly costs $O(m^2)$.

It turns out …

$$(a \cdot b + 1)^2 = (a \cdot b)^2 + 2 (a \cdot b) + 1 = \sum_{i=1}^{m} \sum_{j=1}^{m} a_i a_j b_i b_j + 2 \sum_{i=1}^{m} a_i b_i + 1$$

$$= \sum_{i=1}^{m} a_i^2 b_i^2 + 2 \sum_{i=1}^{m} \sum_{j=i+1}^{m} a_i a_j b_i b_j + 2 \sum_{i=1}^{m} a_i b_i + 1 = \Phi(a) \cdot \Phi(b)$$

So $K(a, b) = (a \cdot b + 1)^2 = \Phi(a) \cdot \Phi(b)$: evaluating the kernel costs only $O(m)$, versus $O(m^2)$ for the explicit feature map.
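A numeric sanity check of this identity (a sketch; `phi` spells out the quadratic basis from the Quadratic Basis Functions slide):

```python
import numpy as np
from itertools import combinations

def phi(x):
    """Quadratic basis: 1, sqrt(2)*x_i, x_i^2, sqrt(2)*x_i*x_j for i < j."""
    cross = [np.sqrt(2) * x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return np.concatenate([[1.0], np.sqrt(2) * x, x ** 2, cross])

a = np.array([0.5, -1.0, 2.0])
b = np.array([1.5, 0.3, -0.7])

print((a @ b + 1) ** 2)   # kernel evaluation: O(m)
print(phi(a) @ phi(b))    # explicit feature map: O(m^2), same value
```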

Kernel Trick

The linear classifier relies on dot products between vectors: $x_i \cdot x_j$.

If every data point is mapped into a high-dimensional space via some transformation $\Phi: x \to \varphi(x)$, the dot product becomes $\varphi(x_i) \cdot \varphi(x_j)$.

A kernel function is a function that corresponds to an inner product in some expanded feature space: $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$.

Example: $x = [x_1, x_2]$, $K(x_i, x_j) = (1 + x_i \cdot x_j)^2$:

$$K(x_i, x_j) = (1 + x_{i1} x_{j1} + x_{i2} x_{j2})^2 = 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2}$$

$$= [1,\ x_{i1}^2,\ \sqrt{2}\,x_{i1} x_{i2},\ x_{i2}^2,\ \sqrt{2}\,x_{i1},\ \sqrt{2}\,x_{i2}] \cdot [1,\ x_{j1}^2,\ \sqrt{2}\,x_{j1} x_{j2},\ x_{j2}^2,\ \sqrt{2}\,x_{j1},\ \sqrt{2}\,x_{j2}] = \varphi(x_i) \cdot \varphi(x_j)$$

where $\varphi(x) = [1,\ x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2]$.

Kernels

Polynomial: $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$

Gaussian: $K(x_i, x_j) = \exp\left( -\dfrac{\|x_i - x_j\|^2}{2\sigma^2} \right)$

Hyperbolic tangent: $K(x_i, x_j) = \tanh(\kappa \, x_i \cdot x_j + c)$
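The three kernels written out as plain functions (a sketch; the hyperparameters d, sigma, kappa and c are free choices, not values from the slides):

```python
import numpy as np

def polynomial(xi, xj, d=2):
    return (xi @ xj + 1) ** d

def gaussian(xi, xj, sigma=1.0):
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2 * sigma ** 2))

def hyperbolic_tangent(xi, xj, kappa=1.0, c=-1.0):
    # Unlike the other two, this kernel is not positive semi-definite
    # for every choice of kappa and c.
    return np.tanh(kappa * (xi @ xj) + c)
```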

String Kernel

A kernel can also measure similarity between text strings through their shared subsequences: Car vs. Custard. With a decay factor $\lambda$ penalizing gapped matches:

$$K(\text{car}, \text{car}) = K(\text{cat}, \text{cat}) = 2\lambda^4 + \lambda^6, \qquad K(\text{car}, \text{cat}) = \lambda^4$$

Solutions of w & b

With a kernel, the solution keeps the same form, with inner products replaced by kernel evaluations:

$$w = \sum_{i=1}^{l} \alpha_i y_i \, \varphi(x_i)$$

$$g(x) = w \cdot \varphi(x) + b = \sum_{i=1}^{l} \alpha_i y_i \, K(x_i, x) + b$$

$$b = \frac{1}{N_s} \sum_{s \in S} \Big( y_s - \sum_{m \in S} \alpha_m y_m \, K(x_m, x_s) \Big)$$

Note that $w$ lives in the feature space and never needs to be formed explicitly; both training and prediction require only kernel evaluations.
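To see these formulas at work, one can rebuild g(x) from a fitted model's support vectors and compare it with the library's own decision function (a sketch assuming scikit-learn, whose `dual_coef_` stores the products alpha_i * y_i):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)  # nonlinear labels

gamma = 0.5
clf = SVC(kernel='rbf', gamma=gamma).fit(X, y)

def g(x):
    # g(x) = sum over support vectors of (alpha_s y_s) K(x_s, x), plus b
    K = np.exp(-gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
    return clf.dual_coef_[0] @ K + clf.intercept_[0]

x0 = np.array([0.3, -0.2])
print(g(x0), clf.decision_function([x0])[0])  # the two values agree
```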

Decision Boundaries

[Figures: decision boundaries learned by kernel SVMs.]

More Maths …


Quadratic Optimization

Lagrange Duality

Karush–Kuhn–Tucker Conditions

SVM Roadmap


Linear Classifier + maximum margin → Linear SVM

Linear SVM + noise → Soft Margin

Linear SVM + nonlinear problem → feature space mapping a·b → Φ(a)·Φ(b)

Explicit mapping + high computational cost → Kernel Trick: K(a, b) = Φ(a)·Φ(b)

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.

Online Resources

http://www.kernel-machines.org
http://www.support-vector-machines.org
http://www.tristanfletcher.co.uk/SVM%20Explained.pdf
http://www.csie.ntu.edu.tw/~cjlin/libsvm

A list of papers uploaded to the web learning portal

Wikipedia & Google


Review

What is the definition of margin in a linear classifier?

Why do we want to maximize the margin?

What is the mathematical expression of margin?

How to solve the objective function in SVM?

What are support vectors?

What is soft margin?

How does SVM solve nonlinear problems?

What is the so-called "kernel trick"?

What are the commonly used kernels?


Next Week's Class Talk

Volunteers are required for next week's class talk.

Topic: SVM in Practice

Hints:

Applications

Demos

Multi-Class Problems

Software: a very popular toolbox is LIBSVM

Any other interesting topics beyond this lecture

Length: 20 minutes plus question time.


  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w & b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi)·Φ(xj)
  • It turns out …
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w & b (2)
  • Decision Boundaries
  • More Maths …
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Week's Class Talk


Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w amp b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi )Φ(xj )
  • It turns out hellip
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w amp b (2)
  • Decision Boundaries
  • More Maths hellip
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk

Lagrange Multipliers

11

l

i

l

iiiiiP bxwywL

1 1

2 )(||||2

1

00

0

1

1

l

iii

p

l

iiii

p

yb

L

xyww

L

0amp0

2

1

2

1

ii

ii

jijiijT

ii

jijijiji

iiD

ytosubject

xxyyHwhereH

xxyyL

Dual Problem

Quadratic problem again

Solutions of w amp b

12

bxxyxgl

iiii

1

)(

inner product

positivewithSamplesVectorsSupport

Ss Smsmmms

s

Smsmmms

ssmmSm

ms

Smsmmms

ss

xxyyN

b

xxyyb

ybxxyy

bxxyy

bwxy

)(1

)(

1)(

1)(

2

An Example

13

(1 1 +1)

(0 0 -1)

1

2

212

2

11 00

iii y

00

02

22221212

21211111

2221

1211

xxyyxxyy

xxyyxxyy

HH

HHH

211

2

12

121 2

2

1

HLi

iD

11 21

1)(

1121

]11[]00[)1(1]11[11

21

1

2

1

xxbwxxg

wxb

xywi

iii

0121 xx

22

22

wM

x1

x2

Soft Margin

14

wx+b=1

wx+b=0

wx+b=-

1

e7

e11 e2 01)( iii bwxy

0

2

1)(

i

ii

t Cwww

l

i

l

iii

l

iiiiiiP bxwyCwL

1 11

2]1)([

2

1

Soft Margin

15

iii

p

l

iii

p

l

iiii

p

uCL

yb

L

xyww

L

0

00

0

1

1

i

iiiT

iiD yandCtsHL 00

2

1

l

i

l

iii

l

iiiiiiP bxwyCwL

1 11

2]1)([

2

1

Non-linear SVMs

16

0 x

0 x

x2

x

Feature Space

17

Φ x rarr φ(x)

x1

x2

x12

x22

Feature Space

18

Φ x rarr φ(x)

x2

x1

Quadratic Basis Functions

19

mm

m

m

m

m

xx

xx

xx

xx

xx

xx

x

x

x

x

x

x

x

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

)(

Constant Terms

Linear Terms

Pure Quadratic Terms

Quadratic Cross-Terms

Number of terms

22

)1)(2( 22

2

mmmCm

Calculation of Φ(xi )Φ(xj )

20

mm

m

m

m

m

mm

m

m

m

m

bb

bb

bb

bb

bb

bb

b

b

b

b

b

b

aa

aa

aa

aa

aa

aa

a

a

a

a

a

a

ba

1

2

32

1

31

21

2

22

21

2

1

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

2

2

2

2

2

2

2

2

2

1

)()(

1

m

iiiba

1

2

m

iii ba

1

22

1

1 1

2m

ijiji

m

ij

bbaa

)()( jiji xxxx

It turns out hellip

21

m

i

m

i

m

i

m

ijjijiiiii bbaabababa

1 1

1

1 1

22 221)()(

1

1 111

2

1 1 1

1

2

1

22

122)(

12

12)(12)()1(

m

i

m

iii

m

ijjjii

m

iii

m

i

m

j

m

iiijjii

m

iii

m

iii

babababa

bababa

bababababa

)()()1()( 2 bababaK )(mO )( 2mO

Kernel Trick

The linear classifier relies on dot products between vectors xixj

If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)

A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)

Example x=[x1 x2] K(xi xj) = (1 + xi xj)2

22

]2221[)( where)(x)(x

]2221[]2221[

2221)1()(

212221

21ji

212221

2121

2221

21

221122

222211

21

21

2

xxxxxxx

xxxxxxxxxxxx

xxxxxxxxxxxxxxxxK

jjjjjjiiiiii

jijijijijijijiji

Kernels

23

)tanh()(

2exp)(

)1()(

2

2

cxxxxKTangentHyperbolic

xxxxKGaussian

xxxxKPolynomial

jiji

ji

ji

djiji

String Kernel

24

64

4

2)()(

)(

catcatKcarcarK

catcarK

Similarity between text strings Car vs Custard

Solutions of w amp b

25

bxxKyxg

xxKyyN

xxyyN

b

xxKyxxyxw

xyw

l

iiii

Ss Smsmmms

sSs Smsmmms

s

l

i

l

ijiiijiiij

l

iiii

1

1 1

1

)()(

))((1

))()((1

)()()()(

)(

bxxybxwxgl

iiii

1

)(

Decision Boundaries

26

More Maths hellip

27

Quadratic Optimization

Lagrange Duality

KarushndashKuhnndashTucker Conditions

SVM Roadmap

28

Kernel Trick

K(ab)=Φ(a)Φ(b)

ab rarr Φ(a)Φ(b)High Computational Cost

Soft MarginNonlinear Problem

Linear SVMNoise

Linear ClassifierMaximum Margin

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000

Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w amp b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi )Φ(xj )
  • It turns out hellip
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w amp b (2)
  • Decision Boundaries
  • More Maths hellip
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk

Solutions of w amp b

12

bxxyxgl

iiii

1

)(

inner product

positivewithSamplesVectorsSupport

Ss Smsmmms

s

Smsmmms

ssmmSm

ms

Smsmmms

ss

xxyyN

b

xxyyb

ybxxyy

bxxyy

bwxy

)(1

)(

1)(

1)(

2

An Example

13

(1 1 +1)

(0 0 -1)

1

2

212

2

11 00

iii y

00

02

22221212

21211111

2221

1211

xxyyxxyy

xxyyxxyy

HH

HHH

211

2

12

121 2

2

1

HLi

iD

11 21

1)(

1121

]11[]00[)1(1]11[11

21

1

2

1

xxbwxxg

wxb

xywi

iii

0121 xx

22

22

wM

x1

x2

Soft Margin

14

wx+b=1

wx+b=0

wx+b=-

1

e7

e11 e2 01)( iii bwxy

0

2

1)(

i

ii

t Cwww

l

i

l

iii

l

iiiiiiP bxwyCwL

1 11

2]1)([

2

1

Soft Margin

15

iii

p

l

iii

p

l

iiii

p

uCL

yb

L

xyww

L

0

00

0

1

1

i

iiiT

iiD yandCtsHL 00

2

1

l

i

l

iii

l

iiiiiiP bxwyCwL

1 11

2]1)([

2

1

Non-linear SVMs

16

0 x

0 x

x2

x

Feature Space

17

Φ x rarr φ(x)

x1

x2

x12

x22

Feature Space

18

Φ x rarr φ(x)

x2

x1

Quadratic Basis Functions

19

mm

m

m

m

m

xx

xx

xx

xx

xx

xx

x

x

x

x

x

x

x

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

)(

Constant Terms

Linear Terms

Pure Quadratic Terms

Quadratic Cross-Terms

Number of terms

22

)1)(2( 22

2

mmmCm

Calculation of Φ(xi )Φ(xj )

20

mm

m

m

m

m

mm

m

m

m

m

bb

bb

bb

bb

bb

bb

b

b

b

b

b

b

aa

aa

aa

aa

aa

aa

a

a

a

a

a

a

ba

1

2

32

1

31

21

2

22

21

2

1

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

2

2

2

2

2

2

2

2

2

1

)()(

1

m

iiiba

1

2

m

iii ba

1

22

1

1 1

2m

ijiji

m

ij

bbaa

)()( jiji xxxx

It turns out hellip

21

m

i

m

i

m

i

m

ijjijiiiii bbaabababa

1 1

1

1 1

22 221)()(

1

1 111

2

1 1 1

1

2

1

22

122)(

12

12)(12)()1(

m

i

m

iii

m

ijjjii

m

iii

m

i

m

j

m

iiijjii

m

iii

m

iii

babababa

bababa

bababababa

)()()1()( 2 bababaK )(mO )( 2mO

Kernel Trick

The linear classifier relies on dot products between vectors xixj

If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)

A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)

Example x=[x1 x2] K(xi xj) = (1 + xi xj)2

22

]2221[)( where)(x)(x

]2221[]2221[

2221)1()(

212221

21ji

212221

2121

2221

21

221122

222211

21

21

2

xxxxxxx

xxxxxxxxxxxx

xxxxxxxxxxxxxxxxK

jjjjjjiiiiii

jijijijijijijiji

Kernels

23

)tanh()(

2exp)(

)1()(

2

2

cxxxxKTangentHyperbolic

xxxxKGaussian

xxxxKPolynomial

jiji

ji

ji

djiji

String Kernel

24

64

4

2)()(

)(

catcatKcarcarK

catcarK

Similarity between text strings Car vs Custard

Solutions of w amp b

25

bxxKyxg

xxKyyN

xxyyN

b

xxKyxxyxw

xyw

l

iiii

Ss Smsmmms

sSs Smsmmms

s

l

i

l

ijiiijiiij

l

iiii

1

1 1

1

)()(

))((1

))()((1

)()()()(

)(

bxxybxwxgl

iiii

1

)(

Decision Boundaries

26

More Maths hellip

27

Quadratic Optimization

Lagrange Duality

KarushndashKuhnndashTucker Conditions

SVM Roadmap

28

Kernel Trick

K(ab)=Φ(a)Φ(b)

ab rarr Φ(a)Φ(b)High Computational Cost

Soft MarginNonlinear Problem

Linear SVMNoise

Linear ClassifierMaximum Margin

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000

Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w amp b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi )Φ(xj )
  • It turns out hellip
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w amp b (2)
  • Decision Boundaries
  • More Maths hellip
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk

An Example

13

(1 1 +1)

(0 0 -1)

1

2

212

2

11 00

iii y

00

02

22221212

21211111

2221

1211

xxyyxxyy

xxyyxxyy

HH

HHH

211

2

12

121 2

2

1

HLi

iD

11 21

1)(

1121

]11[]00[)1(1]11[11

21

1

2

1

xxbwxxg

wxb

xywi

iii

0121 xx

22

22

wM

x1

x2

Soft Margin

14

wx+b=1

wx+b=0

wx+b=-

1

e7

e11 e2 01)( iii bwxy

0

2

1)(

i

ii

t Cwww

l

i

l

iii

l

iiiiiiP bxwyCwL

1 11

2]1)([

2

1

Soft Margin

15

iii

p

l

iii

p

l

iiii

p

uCL

yb

L

xyww

L

0

00

0

1

1

i

iiiT

iiD yandCtsHL 00

2

1

l

i

l

iii

l

iiiiiiP bxwyCwL

1 11

2]1)([

2

1

Non-linear SVMs

16

0 x

0 x

x2

x

Feature Space

17

Φ x rarr φ(x)

x1

x2

x12

x22

Feature Space

18

Φ x rarr φ(x)

x2

x1

Quadratic Basis Functions

19

mm

m

m

m

m

xx

xx

xx

xx

xx

xx

x

x

x

x

x

x

x

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

)(

Constant Terms

Linear Terms

Pure Quadratic Terms

Quadratic Cross-Terms

Number of terms

22

)1)(2( 22

2

mmmCm

Calculation of Φ(xi )Φ(xj )

20

mm

m

m

m

m

mm

m

m

m

m

bb

bb

bb

bb

bb

bb

b

b

b

b

b

b

aa

aa

aa

aa

aa

aa

a

a

a

a

a

a

ba

1

2

32

1

31

21

2

22

21

2

1

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

2

2

2

2

2

2

2

2

2

1

)()(

1

m

iiiba

1

2

m

iii ba

1

22

1

1 1

2m

ijiji

m

ij

bbaa

)()( jiji xxxx

It turns out hellip

21

m

i

m

i

m

i

m

ijjijiiiii bbaabababa

1 1

1

1 1

22 221)()(

1

1 111

2

1 1 1

1

2

1

22

122)(

12

12)(12)()1(

m

i

m

iii

m

ijjjii

m

iii

m

i

m

j

m

iiijjii

m

iii

m

iii

babababa

bababa

bababababa

)()()1()( 2 bababaK )(mO )( 2mO

Kernel Trick

The linear classifier relies on dot products between vectors xixj

If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)

A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)

Example x=[x1 x2] K(xi xj) = (1 + xi xj)2

22

]2221[)( where)(x)(x

]2221[]2221[

2221)1()(

212221

21ji

212221

2121

2221

21

221122

222211

21

21

2

xxxxxxx

xxxxxxxxxxxx

xxxxxxxxxxxxxxxxK

jjjjjjiiiiii

jijijijijijijiji

Kernels

23

)tanh()(

2exp)(

)1()(

2

2

cxxxxKTangentHyperbolic

xxxxKGaussian

xxxxKPolynomial

jiji

ji

ji

djiji

String Kernel

24

64

4

2)()(

)(

catcatKcarcarK

catcarK

Similarity between text strings Car vs Custard

Solutions of w amp b

25

bxxKyxg

xxKyyN

xxyyN

b

xxKyxxyxw

xyw

l

iiii

Ss Smsmmms

sSs Smsmmms

s

l

i

l

ijiiijiiij

l

iiii

1

1 1

1

)()(

))((1

))()((1

)()()()(

)(

bxxybxwxgl

iiii

1

)(

Decision Boundaries

26

More Maths hellip

27

Quadratic Optimization

Lagrange Duality

KarushndashKuhnndashTucker Conditions

SVM Roadmap

28

Kernel Trick

K(ab)=Φ(a)Φ(b)

ab rarr Φ(a)Φ(b)High Computational Cost

Soft MarginNonlinear Problem

Linear SVMNoise

Linear ClassifierMaximum Margin

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000

Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w amp b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi )Φ(xj )
  • It turns out hellip
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w amp b (2)
  • Decision Boundaries
  • More Maths hellip
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk

Soft Margin

14

wx+b=1

wx+b=0

wx+b=-

1

e7

e11 e2 01)( iii bwxy

0

2

1)(

i

ii

t Cwww

l

i

l

iii

l

iiiiiiP bxwyCwL

1 11

2]1)([

2

1

Soft Margin

15

iii

p

l

iii

p

l

iiii

p

uCL

yb

L

xyww

L

0

00

0

1

1

i

iiiT

iiD yandCtsHL 00

2

1

l

i

l

iii

l

iiiiiiP bxwyCwL

1 11

2]1)([

2

1

Non-linear SVMs

16

0 x

0 x

x2

x

Feature Space

17

Φ x rarr φ(x)

x1

x2

x12

x22

Feature Space

18

Φ x rarr φ(x)

x2

x1

Quadratic Basis Functions

19

mm

m

m

m

m

xx

xx

xx

xx

xx

xx

x

x

x

x

x

x

x

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

)(

Constant Terms

Linear Terms

Pure Quadratic Terms

Quadratic Cross-Terms

Number of terms

22

)1)(2( 22

2

mmmCm

Calculation of Φ(xi )Φ(xj )

20

mm

m

m

m

m

mm

m

m

m

m

bb

bb

bb

bb

bb

bb

b

b

b

b

b

b

aa

aa

aa

aa

aa

aa

a

a

a

a

a

a

ba

1

2

32

1

31

21

2

22

21

2

1

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

2

2

2

2

2

2

2

2

2

1

)()(

1

m

iiiba

1

2

m

iii ba

1

22

1

1 1

2m

ijiji

m

ij

bbaa

)()( jiji xxxx

It turns out hellip

21

m

i

m

i

m

i

m

ijjijiiiii bbaabababa

1 1

1

1 1

22 221)()(

1

1 111

2

1 1 1

1

2

1

22

122)(

12

12)(12)()1(

m

i

m

iii

m

ijjjii

m

iii

m

i

m

j

m

iiijjii

m

iii

m

iii

babababa

bababa

bababababa

)()()1()( 2 bababaK )(mO )( 2mO

Kernel Trick

The linear classifier relies on dot products between vectors xixj

If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)

A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)

Example x=[x1 x2] K(xi xj) = (1 + xi xj)2

22

]2221[)( where)(x)(x

]2221[]2221[

2221)1()(

212221

21ji

212221

2121

2221

21

221122

222211

21

21

2

xxxxxxx

xxxxxxxxxxxx

xxxxxxxxxxxxxxxxK

jjjjjjiiiiii

jijijijijijijiji

Kernels

23

)tanh()(

2exp)(

)1()(

2

2

cxxxxKTangentHyperbolic

xxxxKGaussian

xxxxKPolynomial

jiji

ji

ji

djiji

String Kernel

24

64

4

2)()(

)(

catcatKcarcarK

catcarK

Similarity between text strings Car vs Custard

Solutions of w amp b

25

bxxKyxg

xxKyyN

xxyyN

b

xxKyxxyxw

xyw

l

iiii

Ss Smsmmms

sSs Smsmmms

s

l

i

l

ijiiijiiij

l

iiii

1

1 1

1

)()(

))((1

))()((1

)()()()(

)(

bxxybxwxgl

iiii

1

)(

Decision Boundaries

26

More Maths hellip

27

Quadratic Optimization

Lagrange Duality

KarushndashKuhnndashTucker Conditions

SVM Roadmap

28

Kernel Trick

K(ab)=Φ(a)Φ(b)

ab rarr Φ(a)Φ(b)High Computational Cost

Soft MarginNonlinear Problem

Linear SVMNoise

Linear ClassifierMaximum Margin

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000

Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w amp b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi )Φ(xj )
  • It turns out hellip
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w amp b (2)
  • Decision Boundaries
  • More Maths hellip
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk

Soft Margin

15

iii

p

l

iii

p

l

iiii

p

uCL

yb

L

xyww

L

0

00

0

1

1

i

iiiT

iiD yandCtsHL 00

2

1

l

i

l

iii

l

iiiiiiP bxwyCwL

1 11

2]1)([

2

1

Non-linear SVMs

16

0 x

0 x

x2

x

Feature Space

17

Φ x rarr φ(x)

x1

x2

x12

x22

Feature Space

18

Φ x rarr φ(x)

x2

x1

Quadratic Basis Functions

19

mm

m

m

m

m

xx

xx

xx

xx

xx

xx

x

x

x

x

x

x

x

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

)(

Constant Terms

Linear Terms

Pure Quadratic Terms

Quadratic Cross-Terms

Number of terms

22

)1)(2( 22

2

mmmCm

Calculation of Φ(xi )Φ(xj )

20

mm

m

m

m

m

mm

m

m

m

m

bb

bb

bb

bb

bb

bb

b

b

b

b

b

b

aa

aa

aa

aa

aa

aa

a

a

a

a

a

a

ba

1

2

32

1

31

21

2

22

21

2

1

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

2

2

2

2

2

2

2

2

2

1

)()(

1

m

iiiba

1

2

m

iii ba

1

22

1

1 1

2m

ijiji

m

ij

bbaa

)()( jiji xxxx

It turns out hellip

21

m

i

m

i

m

i

m

ijjijiiiii bbaabababa

1 1

1

1 1

22 221)()(

1

1 111

2

1 1 1

1

2

1

22

122)(

12

12)(12)()1(

m

i

m

iii

m

ijjjii

m

iii

m

i

m

j

m

iiijjii

m

iii

m

iii

babababa

bababa

bababababa

)()()1()( 2 bababaK )(mO )( 2mO

Kernel Trick

The linear classifier relies on dot products between vectors xixj

If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)

A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)

Example x=[x1 x2] K(xi xj) = (1 + xi xj)2

22

]2221[)( where)(x)(x

]2221[]2221[

2221)1()(

212221

21ji

212221

2121

2221

21

221122

222211

21

21

2

xxxxxxx

xxxxxxxxxxxx

xxxxxxxxxxxxxxxxK

jjjjjjiiiiii

jijijijijijijiji

Kernels

23

)tanh()(

2exp)(

)1()(

2

2

cxxxxKTangentHyperbolic

xxxxKGaussian

xxxxKPolynomial

jiji

ji

ji

djiji

String Kernel

24

64

4

2)()(

)(

catcatKcarcarK

catcarK

Similarity between text strings Car vs Custard

Solutions of w amp b

25

bxxKyxg

xxKyyN

xxyyN

b

xxKyxxyxw

xyw

l

iiii

Ss Smsmmms

sSs Smsmmms

s

l

i

l

ijiiijiiij

l

iiii

1

1 1

1

)()(

))((1

))()((1

)()()()(

)(

bxxybxwxgl

iiii

1

)(

Decision Boundaries

26

More Maths hellip

27

Quadratic Optimization

Lagrange Duality

KarushndashKuhnndashTucker Conditions

SVM Roadmap

28

Kernel Trick

K(ab)=Φ(a)Φ(b)

ab rarr Φ(a)Φ(b)High Computational Cost

Soft MarginNonlinear Problem

Linear SVMNoise

Linear ClassifierMaximum Margin

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000

Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w amp b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi )Φ(xj )
  • It turns out hellip
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w amp b (2)
  • Decision Boundaries
  • More Maths hellip
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk

Non-linear SVMs

16

0 x

0 x

x2

x

Feature Space

17

Φ x rarr φ(x)

x1

x2

x12

x22

Feature Space

18

Φ x rarr φ(x)

x2

x1

Quadratic Basis Functions

19

mm

m

m

m

m

xx

xx

xx

xx

xx

xx

x

x

x

x

x

x

x

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

)(

Constant Terms

Linear Terms

Pure Quadratic Terms

Quadratic Cross-Terms

Number of terms

22

)1)(2( 22

2

mmmCm

Calculation of Φ(xi )Φ(xj )

20

mm

m

m

m

m

mm

m

m

m

m

bb

bb

bb

bb

bb

bb

b

b

b

b

b

b

aa

aa

aa

aa

aa

aa

a

a

a

a

a

a

ba

1

2

32

1

31

21

2

22

21

2

1

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

2

2

2

2

2

2

2

2

2

1

)()(

1

m

iiiba

1

2

m

iii ba

1

22

1

1 1

2m

ijiji

m

ij

bbaa

)()( jiji xxxx

It turns out hellip

21

m

i

m

i

m

i

m

ijjijiiiii bbaabababa

1 1

1

1 1

22 221)()(

1

1 111

2

1 1 1

1

2

1

22

122)(

12

12)(12)()1(

m

i

m

iii

m

ijjjii

m

iii

m

i

m

j

m

iiijjii

m

iii

m

iii

babababa

bababa

bababababa

)()()1()( 2 bababaK )(mO )( 2mO

Kernel Trick

The linear classifier relies on dot products between vectors xixj

If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)

A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)

Example x=[x1 x2] K(xi xj) = (1 + xi xj)2

22

]2221[)( where)(x)(x

]2221[]2221[

2221)1()(

212221

21ji

212221

2121

2221

21

221122

222211

21

21

2

xxxxxxx

xxxxxxxxxxxx

xxxxxxxxxxxxxxxxK

jjjjjjiiiiii

jijijijijijijiji

Kernels

23

)tanh()(

2exp)(

)1()(

2

2

cxxxxKTangentHyperbolic

xxxxKGaussian

xxxxKPolynomial

jiji

ji

ji

djiji

String Kernel

24

64

4

2)()(

)(

catcatKcarcarK

catcarK

Similarity between text strings Car vs Custard

Solutions of w amp b

25

bxxKyxg

xxKyyN

xxyyN

b

xxKyxxyxw

xyw

l

iiii

Ss Smsmmms

sSs Smsmmms

s

l

i

l

ijiiijiiij

l

iiii

1

1 1

1

)()(

))((1

))()((1

)()()()(

)(

bxxybxwxgl

iiii

1

)(

Decision Boundaries

26

More Maths hellip

27

Quadratic Optimization

Lagrange Duality

KarushndashKuhnndashTucker Conditions

SVM Roadmap

28

Kernel Trick

K(ab)=Φ(a)Φ(b)

ab rarr Φ(a)Φ(b)High Computational Cost

Soft MarginNonlinear Problem

Linear SVMNoise

Linear ClassifierMaximum Margin

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000

Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w amp b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi )Φ(xj )
  • It turns out hellip
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w amp b (2)
  • Decision Boundaries
  • More Maths hellip
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk

Feature Space

17

Φ x rarr φ(x)

x1

x2

x12

x22

Feature Space

18

Φ x rarr φ(x)

x2

x1

Quadratic Basis Functions

19

mm

m

m

m

m

xx

xx

xx

xx

xx

xx

x

x

x

x

x

x

x

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

)(

Constant Terms

Linear Terms

Pure Quadratic Terms

Quadratic Cross-Terms

Number of terms

22

)1)(2( 22

2

mmmCm

Calculation of Φ(xi )Φ(xj )

20

mm

m

m

m

m

mm

m

m

m

m

bb

bb

bb

bb

bb

bb

b

b

b

b

b

b

aa

aa

aa

aa

aa

aa

a

a

a

a

a

a

ba

1

2

32

1

31

21

2

22

21

2

1

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

2

2

2

2

2

2

2

2

2

1

)()(

1

m

iiiba

1

2

m

iii ba

1

22

1

1 1

2m

ijiji

m

ij

bbaa

)()( jiji xxxx

It turns out hellip

21

m

i

m

i

m

i

m

ijjijiiiii bbaabababa

1 1

1

1 1

22 221)()(

1

1 111

2

1 1 1

1

2

1

22

122)(

12

12)(12)()1(

m

i

m

iii

m

ijjjii

m

iii

m

i

m

j

m

iiijjii

m

iii

m

iii

babababa

bababa

bababababa

)()()1()( 2 bababaK )(mO )( 2mO

Kernel Trick

The linear classifier relies on dot products between vectors xixj

If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)

A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)

Example x=[x1 x2] K(xi xj) = (1 + xi xj)2

22

]2221[)( where)(x)(x

]2221[]2221[

2221)1()(

212221

21ji

212221

2121

2221

21

221122

222211

21

21

2

xxxxxxx

xxxxxxxxxxxx

xxxxxxxxxxxxxxxxK

jjjjjjiiiiii

jijijijijijijiji

Kernels

23

)tanh()(

2exp)(

)1()(

2

2

cxxxxKTangentHyperbolic

xxxxKGaussian

xxxxKPolynomial

jiji

ji

ji

djiji

String Kernel

24

64

4

2)()(

)(

catcatKcarcarK

catcarK

Similarity between text strings Car vs Custard

Solutions of w amp b

25

bxxKyxg

xxKyyN

xxyyN

b

xxKyxxyxw

xyw

l

iiii

Ss Smsmmms

sSs Smsmmms

s

l

i

l

ijiiijiiij

l

iiii

1

1 1

1

)()(

))((1

))()((1

)()()()(

)(

bxxybxwxgl

iiii

1

)(

Decision Boundaries

26

More Maths hellip

27

Quadratic Optimization

Lagrange Duality

KarushndashKuhnndashTucker Conditions

SVM Roadmap

28

Kernel Trick

K(ab)=Φ(a)Φ(b)

ab rarr Φ(a)Φ(b)High Computational Cost

Soft MarginNonlinear Problem

Linear SVMNoise

Linear ClassifierMaximum Margin

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000

Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w amp b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi )Φ(xj )
  • It turns out hellip
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w amp b (2)
  • Decision Boundaries
  • More Maths hellip
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk

Feature Space

18

Φ x rarr φ(x)

x2

x1

Quadratic Basis Functions

19

mm

m

m

m

m

xx

xx

xx

xx

xx

xx

x

x

x

x

x

x

x

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

)(

Constant Terms

Linear Terms

Pure Quadratic Terms

Quadratic Cross-Terms

Number of terms

22

)1)(2( 22

2

mmmCm

Calculation of Φ(xi )Φ(xj )

20

mm

m

m

m

m

mm

m

m

m

m

bb

bb

bb

bb

bb

bb

b

b

b

b

b

b

aa

aa

aa

aa

aa

aa

a

a

a

a

a

a

ba

1

2

32

1

31

21

2

22

21

2

1

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

2

2

2

2

2

2

2

2

2

1

)()(

1

m

iiiba

1

2

m

iii ba

1

22

1

1 1

2m

ijiji

m

ij

bbaa

)()( jiji xxxx

It turns out hellip

21

m

i

m

i

m

i

m

ijjijiiiii bbaabababa

1 1

1

1 1

22 221)()(

1

1 111

2

1 1 1

1

2

1

22

122)(

12

12)(12)()1(

m

i

m

iii

m

ijjjii

m

iii

m

i

m

j

m

iiijjii

m

iii

m

iii

babababa

bababa

bababababa

)()()1()( 2 bababaK )(mO )( 2mO

Kernel Trick

The linear classifier relies on dot products between vectors xixj

If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)

A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)

Example x=[x1 x2] K(xi xj) = (1 + xi xj)2

22

]2221[)( where)(x)(x

]2221[]2221[

2221)1()(

212221

21ji

212221

2121

2221

21

221122

222211

21

21

2

xxxxxxx

xxxxxxxxxxxx

xxxxxxxxxxxxxxxxK

jjjjjjiiiiii

jijijijijijijiji

Kernels

23

)tanh()(

2exp)(

)1()(

2

2

cxxxxKTangentHyperbolic

xxxxKGaussian

xxxxKPolynomial

jiji

ji

ji

djiji

String Kernel

24

64

4

2)()(

)(

catcatKcarcarK

catcarK

Similarity between text strings Car vs Custard

Solutions of w amp b

25

bxxKyxg

xxKyyN

xxyyN

b

xxKyxxyxw

xyw

l

iiii

Ss Smsmmms

sSs Smsmmms

s

l

i

l

ijiiijiiij

l

iiii

1

1 1

1

)()(

))((1

))()((1

)()()()(

)(

bxxybxwxgl

iiii

1

)(

Decision Boundaries

26

More Maths hellip

27

Quadratic Optimization

Lagrange Duality

KarushndashKuhnndashTucker Conditions

SVM Roadmap

28

Kernel Trick

K(ab)=Φ(a)Φ(b)

ab rarr Φ(a)Φ(b)High Computational Cost

Soft MarginNonlinear Problem

Linear SVMNoise

Linear ClassifierMaximum Margin

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000

Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w amp b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi )Φ(xj )
  • It turns out hellip
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w amp b (2)
  • Decision Boundaries
  • More Maths hellip
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk

Quadratic Basis Functions

19

mm

m

m

m

m

xx

xx

xx

xx

xx

xx

x

x

x

x

x

x

x

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

)(

Constant Terms

Linear Terms

Pure Quadratic Terms

Quadratic Cross-Terms

Number of terms

22

)1)(2( 22

2

mmmCm

Calculation of Φ(xi )Φ(xj )

20

mm

m

m

m

m

mm

m

m

m

m

bb

bb

bb

bb

bb

bb

b

b

b

b

b

b

aa

aa

aa

aa

aa

aa

a

a

a

a

a

a

ba

1

2

32

1

31

21

2

22

21

2

1

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

2

2

2

2

2

2

2

2

2

1

)()(

1

m

iiiba

1

2

m

iii ba

1

22

1

1 1

2m

ijiji

m

ij

bbaa

)()( jiji xxxx

It turns out hellip

21

m

i

m

i

m

i

m

ijjijiiiii bbaabababa

1 1

1

1 1

22 221)()(

1

1 111

2

1 1 1

1

2

1

22

122)(

12

12)(12)()1(

m

i

m

iii

m

ijjjii

m

iii

m

i

m

j

m

iiijjii

m

iii

m

iii

babababa

bababa

bababababa

)()()1()( 2 bababaK )(mO )( 2mO

Kernel Trick

The linear classifier relies on dot products between vectors xixj

If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)

A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)

Example x=[x1 x2] K(xi xj) = (1 + xi xj)2

22

]2221[)( where)(x)(x

]2221[]2221[

2221)1()(

212221

21ji

212221

2121

2221

21

221122

222211

21

21

2

xxxxxxx

xxxxxxxxxxxx

xxxxxxxxxxxxxxxxK

jjjjjjiiiiii

jijijijijijijiji

Kernels

23

)tanh()(

2exp)(

)1()(

2

2

cxxxxKTangentHyperbolic

xxxxKGaussian

xxxxKPolynomial

jiji

ji

ji

djiji

String Kernel

24

64

4

2)()(

)(

catcatKcarcarK

catcarK

Similarity between text strings Car vs Custard

Solutions of w amp b

25

bxxKyxg

xxKyyN

xxyyN

b

xxKyxxyxw

xyw

l

iiii

Ss Smsmmms

sSs Smsmmms

s

l

i

l

ijiiijiiij

l

iiii

1

1 1

1

)()(

))((1

))()((1

)()()()(

)(

bxxybxwxgl

iiii

1

)(

Decision Boundaries

26

More Maths hellip

27

Quadratic Optimization

Lagrange Duality

KarushndashKuhnndashTucker Conditions

SVM Roadmap

28

Kernel Trick

K(ab)=Φ(a)Φ(b)

ab rarr Φ(a)Φ(b)High Computational Cost

Soft MarginNonlinear Problem

Linear SVMNoise

Linear ClassifierMaximum Margin

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000

Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w amp b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi )Φ(xj )
  • It turns out hellip
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w amp b (2)
  • Decision Boundaries
  • More Maths hellip
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk

Calculation of Φ(xi )Φ(xj )

20

mm

m

m

m

m

mm

m

m

m

m

bb

bb

bb

bb

bb

bb

b

b

b

b

b

b

aa

aa

aa

aa

aa

aa

a

a

a

a

a

a

ba

1

2

32

1

31

21

2

22

21

2

1

1

2

32

1

31

21

2

22

21

2

1

2

2

2

2

2

2

2

2

2

1

2

2

2

2

2

2

2

2

2

1

)()(

1

m

iiiba

1

2

m

iii ba

1

22

1

1 1

2m

ijiji

m

ij

bbaa

)()( jiji xxxx

It turns out hellip

21

m

i

m

i

m

i

m

ijjijiiiii bbaabababa

1 1

1

1 1

22 221)()(

1

1 111

2

1 1 1

1

2

1

22

122)(

12

12)(12)()1(

m

i

m

iii

m

ijjjii

m

iii

m

i

m

j

m

iiijjii

m

iii

m

iii

babababa

bababa

bababababa

)()()1()( 2 bababaK )(mO )( 2mO

Kernel Trick

The linear classifier relies on dot products between vectors xixj

If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)

A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)

Example x=[x1 x2] K(xi xj) = (1 + xi xj)2

22

]2221[)( where)(x)(x

]2221[]2221[

2221)1()(

212221

21ji

212221

2121

2221

21

221122

222211

21

21

2

xxxxxxx

xxxxxxxxxxxx

xxxxxxxxxxxxxxxxK

jjjjjjiiiiii

jijijijijijijiji

Kernels

23

)tanh()(

2exp)(

)1()(

2

2

cxxxxKTangentHyperbolic

xxxxKGaussian

xxxxKPolynomial

jiji

ji

ji

djiji

String Kernel

24

64

4

2)()(

)(

catcatKcarcarK

catcarK

Similarity between text strings Car vs Custard

Solutions of w amp b

25

bxxKyxg

xxKyyN

xxyyN

b

xxKyxxyxw

xyw

l

iiii

Ss Smsmmms

sSs Smsmmms

s

l

i

l

ijiiijiiij

l

iiii

1

1 1

1

)()(

))((1

))()((1

)()()()(

)(

bxxybxwxgl

iiii

1

)(

Decision Boundaries

26

More Maths hellip

27

Quadratic Optimization

Lagrange Duality

KarushndashKuhnndashTucker Conditions

SVM Roadmap

28

Kernel Trick

K(ab)=Φ(a)Φ(b)

ab rarr Φ(a)Φ(b)High Computational Cost

Soft MarginNonlinear Problem

Linear SVMNoise

Linear ClassifierMaximum Margin

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000

Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w amp b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi )Φ(xj )
  • It turns out hellip
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w amp b (2)
  • Decision Boundaries
  • More Maths hellip
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk

It turns out hellip

21

m

i

m

i

m

i

m

ijjijiiiii bbaabababa

1 1

1

1 1

22 221)()(

1

1 111

2

1 1 1

1

2

1

22

122)(

12

12)(12)()1(

m

i

m

iii

m

ijjjii

m

iii

m

i

m

j

m

iiijjii

m

iii

m

iii

babababa

bababa

bababababa

)()()1()( 2 bababaK )(mO )( 2mO

Kernel Trick

The linear classifier relies on dot products between vectors xixj

If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)

A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)

Example x=[x1 x2] K(xi xj) = (1 + xi xj)2

22

]2221[)( where)(x)(x

]2221[]2221[

2221)1()(

212221

21ji

212221

2121

2221

21

221122

222211

21

21

2

xxxxxxx

xxxxxxxxxxxx

xxxxxxxxxxxxxxxxK

jjjjjjiiiiii

jijijijijijijiji

Kernels

23

)tanh()(

2exp)(

)1()(

2

2

cxxxxKTangentHyperbolic

xxxxKGaussian

xxxxKPolynomial

jiji

ji

ji

djiji

String Kernel

24

64

4

2)()(

)(

catcatKcarcarK

catcarK

Similarity between text strings Car vs Custard

Solutions of w amp b

25

bxxKyxg

xxKyyN

xxyyN

b

xxKyxxyxw

xyw

l

iiii

Ss Smsmmms

sSs Smsmmms

s

l

i

l

ijiiijiiij

l

iiii

1

1 1

1

)()(

))((1

))()((1

)()()()(

)(

bxxybxwxgl

iiii

1

)(

Decision Boundaries

26

More Maths hellip

27

Quadratic Optimization

Lagrange Duality

KarushndashKuhnndashTucker Conditions

SVM Roadmap

28

Kernel Trick

K(ab)=Φ(a)Φ(b)

ab rarr Φ(a)Φ(b)High Computational Cost

Soft MarginNonlinear Problem

Linear SVMNoise

Linear ClassifierMaximum Margin

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000

Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w amp b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi )Φ(xj )
  • It turns out hellip
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w amp b (2)
  • Decision Boundaries
  • More Maths hellip
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk

Kernel Trick

The linear classifier relies on dot products between vectors xixj

If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)

A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)

Example x=[x1 x2] K(xi xj) = (1 + xi xj)2

22

]2221[)( where)(x)(x

]2221[]2221[

2221)1()(

212221

21ji

212221

2121

2221

21

221122

222211

21

21

2

xxxxxxx

xxxxxxxxxxxx

xxxxxxxxxxxxxxxxK

jjjjjjiiiiii

jijijijijijijiji

Kernels

23

)tanh()(

2exp)(

)1()(

2

2

cxxxxKTangentHyperbolic

xxxxKGaussian

xxxxKPolynomial

jiji

ji

ji

djiji

String Kernel

24

64

4

2)()(

)(

catcatKcarcarK

catcarK

Similarity between text strings Car vs Custard

Solutions of w amp b

25

bxxKyxg

xxKyyN

xxyyN

b

xxKyxxyxw

xyw

l

iiii

Ss Smsmmms

sSs Smsmmms

s

l

i

l

ijiiijiiij

l

iiii

1

1 1

1

)()(

))((1

))()((1

)()()()(

)(

bxxybxwxgl

iiii

1

)(

Decision Boundaries

26

More Maths hellip

27

Quadratic Optimization

Lagrange Duality

KarushndashKuhnndashTucker Conditions

SVM Roadmap

28

Kernel Trick

K(ab)=Φ(a)Φ(b)

ab rarr Φ(a)Φ(b)High Computational Cost

Soft MarginNonlinear Problem

Linear SVMNoise

Linear ClassifierMaximum Margin

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000

Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w amp b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi )Φ(xj )
  • It turns out hellip
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w amp b (2)
  • Decision Boundaries
  • More Maths hellip
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk

Kernels

23

)tanh()(

2exp)(

)1()(

2

2

cxxxxKTangentHyperbolic

xxxxKGaussian

xxxxKPolynomial

jiji

ji

ji

djiji

String Kernel

24

64

4

2)()(

)(

catcatKcarcarK

catcarK

Similarity between text strings Car vs Custard

Solutions of w amp b

25

bxxKyxg

xxKyyN

xxyyN

b

xxKyxxyxw

xyw

l

iiii

Ss Smsmmms

sSs Smsmmms

s

l

i

l

ijiiijiiij

l

iiii

1

1 1

1

)()(

))((1

))()((1

)()()()(

)(

bxxybxwxgl

iiii

1

)(

Decision Boundaries

26

More Maths hellip

27

Quadratic Optimization

Lagrange Duality

KarushndashKuhnndashTucker Conditions

SVM Roadmap

28

Kernel Trick

K(ab)=Φ(a)Φ(b)

ab rarr Φ(a)Φ(b)High Computational Cost

Soft MarginNonlinear Problem

Linear SVMNoise

Linear ClassifierMaximum Margin

Reading Materials

Text Book

Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000

Online Resources

httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg

httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm

A list of papers uploaded to the web learning portal

Wikipedia amp Google

29

Review

What is the definition of margin in a linear classifier

Why do we want to maximize the margin

What is the mathematical expression of margin

How to solve the objective function in SVM

What are support vectors

What is soft margin

How does SVM solve nonlinear problems

What is so called ldquokernel trickrdquo

What are the commonly used kernels

30

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic SVM in Practice

Hints

Applications

Demos

Multi-Class Problems

Softwarebull A very popular toolbox Libsvm

Any other interesting topics beyond this lecture

Length 20 minutes plus question time

31

  • Classification IV
  • Overview
  • Linear Classifier
  • Distance to Hyperplane
  • Selection of Classifiers
  • Unknown Samples
  • Margins
  • Margins (2)
  • Margins (3)
  • Objective Function
  • Lagrange Multipliers
  • Solutions of w & b
  • An Example
  • Soft Margin
  • Soft Margin (2)
  • Non-linear SVMs
  • Feature Space
  • Feature Space (2)
  • Quadratic Basis Functions
  • Calculation of Φ(xi)·Φ(xj)
  • It turns out …
  • Kernel Trick
  • Kernels
  • String Kernel
  • Solutions of w & b (2)
  • Decision Boundaries
  • More Maths …
  • SVM Roadmap
  • Reading Materials
  • Review
  • Next Week's Class Talk
