Classification IV. Lecturer: Dr. Bo Yuan. E-mail: [email protected].
Classification IV
Lecturer: Dr. Bo Yuan
E-mail: yuanb@sz.tsinghua.edu.cn
Overview
Support Vector Machines
Linear Classifier
The decision boundary is the hyperplane w·x + b = 0; points with w·x + b > 0 are classified as +1 and points with w·x + b < 0 as -1.

f(x) = sign(g(x)) = sign(w·x + b), where w·x = Σ_i w_i x_i

Just in case: for any two points x1, x2 on the hyperplane, w·x1 + b = w·x2 + b = 0, so w·(x1 - x2) = 0, i.e. w is normal to the hyperplane.
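As a minimal sketch (the weight vector and bias below are hypothetical values, not from the lecture), the decision rule can be written directly:

```python
# Hypothetical parameters for illustration only.
w = [1.0, 1.0]
b = -1.0

def g(x):
    """Signed score of the linear classifier: g(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def f(x):
    """Decision rule: f(x) = sign(g(x))."""
    return 1 if g(x) > 0 else -1

print(f([1.0, 1.0]), f([0.0, 0.0]))  # 1 -1
```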
Distance to Hyperplane
g(x) = w·x + b

Decompose x = x_p + r·w/||w||, where x_p is the projection of x onto the hyperplane (so g(x_p) = 0) and r is the signed distance. Then

g(x) = w·x_p + b + r·(w·w)/||w|| = r·||w||

so the distance from x to the hyperplane is |r| = |g(x)| / ||w||. In particular, the distance from the origin to the hyperplane is |b| / ||w||.
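A quick numeric sketch of the distance formula above, using a hypothetical hyperplane 3x1 + 4x2 - 5 = 0 chosen only for illustration:

```python
import math

# Hypothetical hyperplane parameters: 3*x1 + 4*x2 - 5 = 0.
w = [3.0, 4.0]
b = -5.0

def distance_to_hyperplane(x):
    """r = |g(x)| / ||w|| with g(x) = w.x + b."""
    g = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(g) / norm_w

# Distance from the origin is |b| / ||w|| = 5/5 = 1.
print(distance_to_hyperplane([0.0, 0.0]))  # 1.0
```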
Selection of Classifiers
Which classifier is the best?
All have the same training error.
How about generalization?
Unknown Samples
Two candidate classifiers, A and B, are shown against unseen samples. Classifier B divides the space more consistently (unbiased).
Margins
(Figure: a maximum-margin boundary with the support vectors on each side highlighted.)
Margins
The margin of a linear classifier is defined as the width by which the boundary could be increased before hitting a data point.
Intuitively, it is safer to choose a classifier with a larger margin: a wider buffer zone for mistakes.
The hyperplane is decided by only a few data points, the support vectors; the others can be discarded.
Selecting the classifier with the maximum margin gives Linear Support Vector Machines (LSVM), which work very well in practice.
How to specify the margin formally?
Margins
The "Predict Class = +1" zone lies beyond the plane wx + b = 1 and the "Predict Class = -1" zone beyond wx + b = -1, with the decision boundary wx + b = 0 in between. Taking a support vector x+ on the plus-plane and x- on the minus-plane, the margin width is

M = 2 / ||w||
Objective Function
Correctly classify all data points:
w·x_i + b ≥ 1 if y_i = +1
w·x_i + b ≤ -1 if y_i = -1
equivalently, y_i(w·x_i + b) - 1 ≥ 0.

Maximize the margin: max M = 2/||w|| is equivalent to min Φ(w) = (1/2)wᵀw.

Quadratic Optimization Problem:
Minimize (1/2)wᵀw
Subject to y_i(w·x_i + b) ≥ 1
Lagrange Multipliers
Primal Lagrangian:
L_P = (1/2)||w||² - Σ_i α_i [y_i(w·x_i + b) - 1], i = 1, …, l

Setting ∂L_P/∂w = 0 and ∂L_P/∂b = 0 gives
w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0

Substituting back yields the dual:
L_D = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i·x_j) = Σ_i α_i - (1/2) αᵀHα, where H_ij = y_i y_j (x_i·x_j)
subject to α_i ≥ 0 and Σ_i α_i y_i = 0.

Dual Problem: a quadratic problem again.
Solutions of w & b
w = Σ_i α_i y_i x_i (only the support vectors, the samples with positive α, contribute)

For any support vector x_s, y_s(w·x_s + b) = 1, i.e. y_s(Σ_{m∈S} α_m y_m (x_m·x_s) + b) = 1, so
b = y_s - Σ_{m∈S} α_m y_m (x_m·x_s)

Averaging over all N_S support vectors:
b = (1/N_S) Σ_{s∈S} (y_s - Σ_{m∈S} α_m y_m (x_m·x_s))

Classification function:
g(x) = Σ_i α_i y_i (x_i·x) + b
Note that only inner products appear.
An Example
Training data: x1 = (1, 1) with y1 = +1; x2 = (0, 0) with y2 = -1.

H_11 = y1 y1 (x1·x1) = 2, H_12 = H_21 = y1 y2 (x1·x2) = 0, H_22 = y2 y2 (x2·x2) = 0

L_D = α1 + α2 - α1², subject to α1 ≥ 0, α2 ≥ 0 and α1 - α2 = 0

Substituting α2 = α1 and maximizing 2α1 - α1² gives α1 = α2 = 1.

w = Σ_i α_i y_i x_i = 1·(+1)·[1, 1] + 1·(-1)·[0, 0] = [1, 1]
b = y1 - w·x1 = 1 - 2 = -1

Decision boundary: x1 + x2 - 1 = 0 (writing x = (x1, x2))
Margin: M = 2/||w|| = 2/√2 = √2
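The worked example above can be checked numerically; this sketch just plugs the analytically found multipliers back into the formulas for w, b and M:

```python
import math

# The slide's toy data set: (1,1) labeled +1 and (0,0) labeled -1.
X = [[1.0, 1.0], [0.0, 0.0]]
y = [1.0, -1.0]
alpha = [1.0, 1.0]  # multipliers found analytically above

# w = sum_i alpha_i * y_i * x_i
w = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(2)]
# b = y_s - w.x_s for a support vector (here x1)
b = y[0] - sum(wk * xk for wk, xk in zip(w, X[0]))
# Margin width M = 2 / ||w||
M = 2 / math.sqrt(sum(wk * wk for wk in w))

print(w, b, M)  # [1.0, 1.0] -1.0 1.4142...
```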
Soft Margin
Slack variables ε_i ≥ 0 allow some points (e.g. ε2, ε7, ε11 in the figure) to violate the margin:
y_i(w·x_i + b) ≥ 1 - ε_i, ε_i ≥ 0

New objective:
Minimize Φ(w) = (1/2)wᵀw + C Σ_i ε_i

Primal Lagrangian:
L_P = (1/2)||w||² + C Σ_i ε_i - Σ_i α_i [y_i(w·x_i + b) - 1 + ε_i] - Σ_i μ_i ε_i
Soft Margin
Setting the derivatives of L_P to zero:
∂L_P/∂w = 0 → w = Σ_i α_i y_i x_i
∂L_P/∂b = 0 → Σ_i α_i y_i = 0
∂L_P/∂ε_i = 0 → C = α_i + μ_i

The dual is the same as before,
L_D = Σ_i α_i - (1/2) αᵀHα, s.t. 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0
with the only change that each α_i is now upper-bounded by C.
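A quick sketch of the soft-margin objective above; the toy data and parameters are illustrative (the third point is deliberately placed inside the margin so its slack is nonzero):

```python
def soft_margin_objective(w, b, X, y, C):
    """Primal objective (1/2)*||w||^2 + C * sum_i eps_i,
    where eps_i = max(0, 1 - y_i*(w.x_i + b)) is the slack of point i."""
    obj = 0.5 * sum(wk * wk for wk in w)
    for xi, yi in zip(X, y):
        g = sum(wk * xk for wk, xk in zip(w, xi)) + b
        obj += C * max(0.0, 1.0 - yi * g)
    return obj

# Toy data: the third point sits on the wrong side of the margin,
# so it contributes a positive slack term.
X = [[1.0, 1.0], [0.0, 0.0], [0.9, 0.9]]
y = [1.0, -1.0, -1.0]
print(soft_margin_objective([1.0, 1.0], -1.0, X, y, C=1.0))  # ~2.8
```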
Non-linear SVMs
(Figure: one-dimensional data on the x axis that are not linearly separable become separable after adding the feature x².)
Feature Space
Φ: x → φ(x)
(Figure: input coordinates x1, x2 are mapped to features x1², x2².)
Feature Space
Φ: x → φ(x)
(Figure: a nonlinear boundary in the input space (x1, x2) becomes linear in the feature space.)
Quadratic Basis Functions
Φ(x) = (1, √2·x_1, …, √2·x_m, x_1², …, x_m², √2·x_1x_2, √2·x_1x_3, …, √2·x_{m-1}x_m)

Constant term: 1
Linear terms: √2·x_i
Pure quadratic terms: x_i²
Quadratic cross-terms: √2·x_i x_j (i < j)

Number of terms: 1 + m + m + C(m, 2) = (m + 2)(m + 1)/2 ≈ m²/2
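The expansion and its term count can be sketched directly; the helper below simply builds the four groups of terms listed above and checks the count against (m+2)(m+1)/2:

```python
import math

def quadratic_phi(x):
    """Quadratic basis expansion from the slide: constant 1, linear
    sqrt(2)*x_i, pure quadratic x_i^2, cross-terms sqrt(2)*x_i*x_j (i<j)."""
    m = len(x)
    terms = [1.0]
    terms += [math.sqrt(2) * xi for xi in x]
    terms += [xi * xi for xi in x]
    terms += [math.sqrt(2) * x[i] * x[j]
              for i in range(m) for j in range(i + 1, m)]
    return terms

m = 3
phi = quadratic_phi([1.0, 2.0, 3.0])
print(len(phi), (m + 2) * (m + 1) // 2)  # 10 10
```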
Calculation of Φ(a)·Φ(b)
Multiplying the two expanded vectors term by term:

Φ(a)·Φ(b) = 1 + 2 Σ_i a_i b_i + Σ_i a_i² b_i² + 2 Σ_i Σ_{j>i} a_i a_j b_i b_j

Computed this way, via the explicit feature vectors, the cost is O(m²).
It turns out …

(a·b + 1)² = (a·b)² + 2(a·b) + 1
 = Σ_i Σ_j a_i a_j b_i b_j + 2 Σ_i a_i b_i + 1
 = Σ_i a_i² b_i² + 2 Σ_i Σ_{j>i} a_i a_j b_i b_j + 2 Σ_i a_i b_i + 1
 = Φ(a)·Φ(b)

So K(a, b) = (a·b + 1)² = Φ(a)·Φ(b): the O(m²) feature-space inner product can be computed in O(m).
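The identity can be verified numerically; this sketch compares the O(m) kernel value against the explicit O(m²) feature-space inner product on random vectors:

```python
import math
import random

def quadratic_phi(x):
    """Explicit quadratic feature map (O(m^2) components)."""
    m = len(x)
    return ([1.0]
            + [math.sqrt(2) * xi for xi in x]
            + [xi * xi for xi in x]
            + [math.sqrt(2) * x[i] * x[j]
               for i in range(m) for j in range(i + 1, m)])

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(5)]
b = [random.uniform(-1, 1) for _ in range(5)]

lhs = (dot(a, b) + 1) ** 2                     # O(m) kernel evaluation
rhs = dot(quadratic_phi(a), quadratic_phi(b))  # O(m^2) explicit inner product
print(abs(lhs - rhs) < 1e-9)  # True
```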
Kernel Trick
The linear classifier relies on dot products between vectors, x_i·x_j.
If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes φ(x_i)·φ(x_j).
A kernel function is a function that corresponds to an inner product in some expanded feature space: K(x_i, x_j) = φ(x_i)·φ(x_j).
Example: for x = [x1, x2], K(x_i, x_j) = (1 + x_i·x_j)².
K(x_i, x_j) = (1 + x_i·x_j)²
 = 1 + x_i1² x_j1² + 2 x_i1 x_j1 x_i2 x_j2 + x_i2² x_j2² + 2 x_i1 x_j1 + 2 x_i2 x_j2
 = [1, x_i1², √2·x_i1 x_i2, x_i2², √2·x_i1, √2·x_i2] · [1, x_j1², √2·x_j1 x_j2, x_j2², √2·x_j1, √2·x_j2]
 = φ(x_i)·φ(x_j), where φ(x) = [1, x1², √2·x1x2, x2², √2·x1, √2·x2]
Kernels
Polynomial: K(x_i, x_j) = (1 + x_i·x_j)^d
Gaussian: K(x_i, x_j) = exp(-||x_i - x_j||² / 2σ²)
Hyperbolic tangent: K(x_i, x_j) = tanh(κ·x_i·x_j + c)
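The three kernels above are one-liners; the hyperparameter values (d, sigma, kappa, c) below are illustrative defaults, not values from the lecture:

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def polynomial_kernel(xi, xj, d=2):
    # (1 + xi.xj)^d
    return (1.0 + dot(xi, xj)) ** d

def gaussian_kernel(xi, xj, sigma=1.0):
    # exp(-||xi - xj||^2 / (2*sigma^2))
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq / (2.0 * sigma ** 2))

def tanh_kernel(xi, xj, kappa=1.0, c=0.0):
    # tanh(kappa * xi.xj + c)
    return math.tanh(kappa * dot(xi, xj) + c)

x, z = [1.0, 0.0], [0.0, 1.0]
print(polynomial_kernel(x, z), gaussian_kernel(x, x))  # 1.0 1.0
```

Note that the Gaussian kernel of any point with itself is always 1, and it decays with distance, which is why it behaves like a similarity measure.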
String Kernel
A kernel can also be defined directly on strings, measuring similarity through shared substrings, e.g. comparing K(car, cat) against the self-similarities K(car, car) and K(cat, cat).
Similarity between text strings: "Car" vs "Custard"?
Solutions of w & b
With a kernel, w = Σ_i α_i y_i φ(x_i) lives in the feature space and need not be computed explicitly.

b = (1/N_S) Σ_{s∈S} (y_s - Σ_{m∈S} α_m y_m K(x_m, x_s))

g(x) = Σ_i α_i y_i K(x_i, x) + b

Everything is expressed through kernel evaluations; φ(x) itself is never needed.
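The kernelized decision function g(x) above is easy to sketch; the support vectors, multipliers and b below are illustrative placeholders, not the output of an actual solver:

```python
import math

def gaussian_kernel(xi, xj, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq / (2.0 * sigma ** 2))

def g(x, sv_x, sv_y, sv_alpha, b, kernel=gaussian_kernel):
    """g(x) = sum over support vectors of alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * yi * kernel(xi, x)
               for a, yi, xi in zip(sv_alpha, sv_y, sv_x)) + b

# Toy support-vector set (illustrative values, not solver output).
sv_x = [[1.0, 1.0], [0.0, 0.0]]
sv_y = [1.0, -1.0]
sv_alpha = [1.0, 1.0]
b = 0.0

score = g([1.0, 1.0], sv_x, sv_y, sv_alpha, b)
print(score > 0)  # True: classified as +1
```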
Decision Boundaries
(Figure: decision boundaries obtained with different kernels.)
More Maths …
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
Linear classifier → maximum margin → Linear SVM
Noise → soft margin
Nonlinear problem → feature mapping: a, b → Φ(a), Φ(b)
High computational cost → kernel trick: K(a, b) = Φ(a)·Φ(b)
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
Online Resources
http://www.kernel-machines.org
http://www.support-vector-machines.org
http://www.tristanfletcher.co.uk/SVM%20Explained.pdf
http://www.csie.ntu.edu.tw/~cjlin/libsvm
A list of papers uploaded to the web learning portal
Wikipedia & Google
Review
What is the definition of margin in a linear classifier?
Why do we want to maximize the margin?
What is the mathematical expression of the margin?
How to solve the objective function in SVM?
What are support vectors?
What is soft margin?
How does SVM solve nonlinear problems?
What is the so-called "kernel trick"?
What are the commonly used kernels?
Next Week's Class Talk
Volunteers are required for next week's class talk.
Topic: SVM in Practice
Hints:
Applications
Demos
Multi-Class Problems
Software: a very popular toolbox, LIBSVM
Any other interesting topics beyond this lecture
Length: 20 minutes plus question time
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w & b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(x_i)·Φ(x_j)
- It turns out …
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w & b (2)
- Decision Boundaries
- More Maths …
- SVM Roadmap
- Reading Materials
- Review
- Next Week's Class Talk
-
Overview
Support Vector Machines
2
Linear Classifier
3
wx + b gt0
wx + b lt0
wx + b =0
)(
))(()(
bxwsign
xgsignbwxf
w
n
iiixwxw
caseinJust
1
0)( 21
21
xxw
bxwbxw
Distance to Hyperplane
4
x
x
bxwxg )(
ww
wwbxw
bwxwxg
wxx
)()(
||||
|)(||||||)(|
||||||||
w
xg
ww
wxg
wxxM
||||
||
w
b
Selection of Classifiers
5
Which classifier is the best
All have the same training error
How about generalization
Unknown Samples
6
A
B
Classifier B divides the space more consistently (unbiased)
Margins
7
Support Vectors Support Vectors
Margins
The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point
Intuitively it is safer to choose a classifier with a larger margin
Wider buffer zone for mistakes
The hyperplane is decided by only a few data points
Support Vectors
Others can be discarded
Select the classifier with the maximum margin Linear Support Vector Machines (LSVM)
Works very well in practice
How to specify the margin formally
8
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Linear Classifier
3
wx + b gt0
wx + b lt0
wx + b =0
)(
))(()(
bxwsign
xgsignbwxf
w
n
iiixwxw
caseinJust
1
0)( 21
21
xxw
bxwbxw
Distance to Hyperplane
4
x
x
bxwxg )(
ww
wwbxw
bwxwxg
wxx
)()(
||||
|)(||||||)(|
||||||||
w
xg
ww
wxg
wxxM
||||
||
w
b
Selection of Classifiers
5
Which classifier is the best
All have the same training error
How about generalization
Unknown Samples
6
A
B
Classifier B divides the space more consistently (unbiased)
Margins
7
Support Vectors Support Vectors
Margins
The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point
Intuitively it is safer to choose a classifier with a larger margin
Wider buffer zone for mistakes
The hyperplane is decided by only a few data points
Support Vectors
Others can be discarded
Select the classifier with the maximum margin Linear Support Vector Machines (LSVM)
Works very well in practice
How to specify the margin formally
8
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Distance to Hyperplane
4
x
x
bxwxg )(
ww
wwbxw
bwxwxg
wxx
)()(
||||
|)(||||||)(|
||||||||
w
xg
ww
wxg
wxxM
||||
||
w
b
Selection of Classifiers
5
Which classifier is the best
All have the same training error
How about generalization
Unknown Samples
6
A
B
Classifier B divides the space more consistently (unbiased)
Margins
7
Support Vectors Support Vectors
Margins
The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point
Intuitively it is safer to choose a classifier with a larger margin
Wider buffer zone for mistakes
The hyperplane is decided by only a few data points
Support Vectors
Others can be discarded
Select the classifier with the maximum margin Linear Support Vector Machines (LSVM)
Works very well in practice
How to specify the margin formally
8
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Selection of Classifiers
5
Which classifier is the best
All have the same training error
How about generalization
Unknown Samples
6
A
B
Classifier B divides the space more consistently (unbiased)
Margins
7
Support Vectors Support Vectors
Margins
The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point
Intuitively it is safer to choose a classifier with a larger margin
Wider buffer zone for mistakes
The hyperplane is decided by only a few data points
Support Vectors
Others can be discarded
Select the classifier with the maximum margin Linear Support Vector Machines (LSVM)
Works very well in practice
How to specify the margin formally
8
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Unknown Samples
6
A
B
Classifier B divides the space more consistently (unbiased)
Margins
7
Support Vectors Support Vectors
Margins
The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point
Intuitively it is safer to choose a classifier with a larger margin
Wider buffer zone for mistakes
The hyperplane is decided by only a few data points
Support Vectors
Others can be discarded
Select the classifier with the maximum margin Linear Support Vector Machines (LSVM)
Works very well in practice
How to specify the margin formally
8
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out …

(a·b + 1)² = (a·b)² + 2(a·b) + 1
           = Σi Σj ai aj bi bj + 2 Σi ai bi + 1
           = Σi ai²bi² + 2 Σi Σj>i ai aj bi bj + 2 Σi ai bi + 1
           = Φ(a)·Φ(b)

So K(a, b) = (a·b + 1)² = Φ(a)·Φ(b): the left-hand side costs O(m) operations, the right-hand side O(m²).
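The identity (a·b + 1)² = Φ(a)·Φ(b) can be verified numerically against the explicit quadratic basis. A minimal sketch (helper names are ours):

```python
import math

def quad_basis(x):
    # [1, sqrt(2)x_i ..., x_i^2 ..., sqrt(2)x_i x_j (i<j) ...]
    m = len(x)
    return ([1.0]
            + [math.sqrt(2) * xi for xi in x]
            + [xi * xi for xi in x]
            + [math.sqrt(2) * x[i] * x[j]
               for i in range(m) for j in range(i + 1, m)])

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

a = [1.0, 2.0, -0.5]
b = [0.5, -1.0, 3.0]

explicit = dot(quad_basis(a), quad_basis(b))   # O(m^2) terms
kernel = (dot(a, b) + 1.0) ** 2                # O(m) work
print(explicit, kernel)                        # both 4.0 (up to rounding)
```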
Kernel Trick

- The linear classifier relies on dot products between vectors: xi·xj.
- If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes φ(xi)·φ(xj).
- A kernel function is a function that corresponds to an inner product in some expanded feature space: K(xi, xj) = φ(xi)·φ(xj).
- Example: x = [x1, x2], K(xi, xj) = (1 + xi·xj)²
Working out the example for x = [x1, x2]:

K(xi, xj) = (1 + xi·xj)²
          = 1 + xi1²xj1² + 2 xi1xj1xi2xj2 + xi2²xj2² + 2 xi1xj1 + 2 xi2xj2
          = φ(xi)·φ(xj),  where φ(x) = [1, x1², √2x1x2, x2², √2x1, √2x2]
Kernels

- Polynomial: K(xi, xj) = (xi·xj + 1)^d
- Gaussian: K(xi, xj) = exp(-||xi - xj||² / 2σ²)
- Hyperbolic tangent: K(xi, xj) = tanh(xi·xj + c)
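The three kernels above are one-liners. A minimal sketch (the function names, parameter names, and default values d=2, sigma=1, c=-1 are our own choices for illustration):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z, d=2):
    # Polynomial: (x . z + 1)^d
    return (dot(x, z) + 1.0) ** d

def gaussian_kernel(x, z, sigma=1.0):
    # Gaussian (RBF): exp(-||x - z||^2 / (2 sigma^2))
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

def tanh_kernel(x, z, c=-1.0):
    # Hyperbolic tangent: tanh(x . z + c)
    return math.tanh(dot(x, z) + c)

x, z = [1.0, 0.0], [0.0, 1.0]
print(poly_kernel(x, z), gaussian_kernel(x, z), tanh_kernel(x, z))
```

Note that the Gaussian kernel of any point with itself is 1, and larger sigma makes it decay more slowly with distance.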
String Kernel

- Kernels can also be defined directly on non-vector data, e.g., the similarity between text strings ("Car" vs. "Custard").
- Example: comparing K(car, car), K(cat, cat) and K(car, cat) — a pair of strings sharing more substructure gets a larger kernel value.
Solutions of w & b

With a kernel, w = Σi αi yi φ(xi) never needs to be computed explicitly:

w·φ(x) = Σi αi yi (φ(xi)·φ(x)) = Σi αi yi K(xi, x)

g(x) = Σi αi yi K(xi, x) + b

b = (1/Ns) Σs∈S (ys - Σm∈S αm ym K(xm, xs))
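The kernelized decision function is easy to sketch once the multipliers are known. Below we reuse the two-point toy example from earlier (α1 = α2 = 1, b = -1, our own illustrative code); with a plain linear kernel it reproduces g(x) = x1 + x2 - 1:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(x, z):
    return dot(x, z)

# Support vectors, labels, and multipliers from the toy example:
# ((1,1), +1) and ((0,0), -1) with alpha_1 = alpha_2 = 1 and b = -1.
SV = [(1.0, 1.0), (0.0, 0.0)]
ys = [1.0, -1.0]
alphas = [1.0, 1.0]
b = -1.0

def g(x, kernel=linear_kernel):
    # g(x) = sum_i alpha_i y_i K(x_i, x) + b
    return sum(a * yi * kernel(sv, x)
               for a, yi, sv in zip(alphas, ys, SV)) + b

print(g((1.0, 1.0)), g((0.0, 0.0)), g((0.5, 0.5)))  # 1.0 -1.0 0.0
```

Swapping linear_kernel for any other kernel changes the decision surface without touching w.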
Decision Boundaries

[Figure: examples of the nonlinear decision boundaries produced by kernel SVMs.]

More Maths …

- Quadratic Optimization
- Lagrange Duality
- Karush–Kuhn–Tucker Conditions
SVM Roadmap

- Linear Classifier + Maximum Margin → Linear SVM
- Linear SVM + Noise → Soft Margin
- Soft Margin + Nonlinear Problem → map the data: a, b → Φ(a)·Φ(b)
- Φ(a)·Φ(b) has High Computational Cost → Kernel Trick: K(a, b) = Φ(a)·Φ(b)
Reading Materials

Text Book
- Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.

Online Resources
- http://www.kernel-machines.org
- http://www.support-vector-machines.org
- http://www.tristanfletcher.co.uk/SVM%20Explained.pdf
- http://www.csie.ntu.edu.tw/~cjlin/libsvm
- A list of papers uploaded to the web learning portal
- Wikipedia & Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Week's Class Talk

- Volunteers are required for next week's class talk.
- Topic: SVM in Practice
- Hints:
  - Applications
  - Demos
  - Multi-Class Problems
  - Software: a very popular toolbox, LIBSVM
  - Any other interesting topics beyond this lecture
- Length: 20 minutes plus question time
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Margins
7
Support Vectors Support Vectors
Margins
The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point
Intuitively it is safer to choose a classifier with a larger margin
Wider buffer zone for mistakes
The hyperplane is decided by only a few data points
Support Vectors
Others can be discarded
Select the classifier with the maximum margin Linear Support Vector Machines (LSVM)
Works very well in practice
How to specify the margin formally
8
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Margins
The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point
Intuitively it is safer to choose a classifier with a larger margin
Wider buffer zone for mistakes
The hyperplane is decided by only a few data points
Support Vectors
Others can be discarded
Select the classifier with the maximum margin Linear Support Vector Machines (LSVM)
Works very well in practice
How to specify the margin formally
8
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
An Example

Training data: x1 = (1, 1) with y1 = +1 and x2 = (0, 0) with y2 = −1.

H_ij = y_i y_j x_i·x_j:
H11 = (+1)(+1)(x1·x1) = 2,  H12 = H21 = (+1)(−1)(x1·x2) = 0,  H22 = (−1)(−1)(x2·x2) = 0

L_D = α1 + α2 − ½ αᵀHα = α1 + α2 − α1²

The constraint Σ_i α_i y_i = 0 gives α1 = α2, so L_D = 2α1 − α1², which is maximized at α1 = α2 = 1.

w = Σ_i α_i y_i x_i = 1·(+1)·(1, 1) + 1·(−1)·(0, 0) = (1, 1)

From the support vector (1, 1): y1(w·x1 + b) = 1 ⇒ 2 + b = 1 ⇒ b = −1.

Decision boundary: x1 + x2 − 1 = 0, i.e. g(x) = x1 + x2 − 1.

Margin: M = 2/‖w‖ = 2/√2 = √2
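As a sanity check, the worked example can be reproduced in a few lines of plain Python (a sketch, not part of the original slides): the reduced dual 2α − α² is maximized on a grid, then w, b, and the margin are recomputed.

```python
import math

# Reduced dual after applying alpha1 = alpha2 (from sum_i alpha_i y_i = 0):
# L_D(alpha) = 2*alpha - alpha**2, maximized at alpha = 1.
best_alpha = max((k / 1000.0 for k in range(0, 3001)), key=lambda a: 2 * a - a * a)

x = [(1.0, 1.0), (0.0, 0.0)]
y = [1.0, -1.0]
alpha = [best_alpha, best_alpha]

# w = sum_i alpha_i * y_i * x_i
w = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, x)) for k in range(2)]

# b from the support vector x1 = (1, 1): y1 * (w . x1 + b) = 1
b = 1.0 / y[0] - sum(wk * xk for wk, xk in zip(w, x[0]))

# Margin M = 2 / ||w||
M = 2.0 / math.sqrt(sum(wk * wk for wk in w))

print(best_alpha, w, b, M)  # alpha = 1.0, w = [1.0, 1.0], b = -1.0, M = sqrt(2)
```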
Soft Margin

(Figure: hyperplanes w·x + b = 1, 0, −1; points with slacks ε2, ε7, ε11 lie inside or beyond the margin.)

Slack variables ε_i allow individual points to violate the margin:

y_i(w·x_i + b) ≥ 1 − ε_i,  ε_i ≥ 0

minimize  ½ w·w + C Σ_{i=1}^{l} ε_i

L_P = ½‖w‖² + C Σ_{i=1}^{l} ε_i − Σ_{i=1}^{l} α_i [ y_i(w·x_i + b) − 1 + ε_i ] − Σ_{i=1}^{l} μ_i ε_i
Soft Margin

L_P = ½‖w‖² + C Σ_{i=1}^{l} ε_i − Σ_{i=1}^{l} α_i [ y_i(w·x_i + b) − 1 + ε_i ] − Σ_{i=1}^{l} μ_i ε_i

Setting the derivatives of L_P to zero:

∂L_P/∂w = 0  ⇒  w = Σ_{i=1}^{l} α_i y_i x_i
∂L_P/∂b = 0  ⇒  Σ_{i=1}^{l} α_i y_i = 0
∂L_P/∂ε_i = 0  ⇒  C = α_i + μ_i

Dual problem:  L_D = Σ_i α_i − ½ αᵀHα,  subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0
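To make the role of the slack variables concrete, here is a small sketch (toy data and fixed w, b are assumptions, not from the slides) that evaluates ε_i = max(0, 1 − y_i(w·x_i + b)) and the soft-margin primal objective:

```python
def hinge_slacks(w, b, X, y):
    # epsilon_i = max(0, 1 - y_i * (w . x_i + b)); zero for points that
    # already satisfy the hard-margin constraint.
    return [max(0.0, 1.0 - yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b))
            for xi, yi in zip(X, y)]

def primal_objective(w, b, X, y, C):
    # (1/2) * ||w||^2 + C * sum_i epsilon_i
    return 0.5 * sum(wk * wk for wk in w) + C * sum(hinge_slacks(w, b, X, y))

# Toy data: the third point sits on the decision boundary, so it has
# slack 1 even though it is not strictly misclassified.
X = [(2.0, 2.0), (1.0, 1.0), (0.5, 0.5), (0.0, 0.0)]
y = [1.0, 1.0, 1.0, -1.0]
w, b = [1.0, 1.0], -1.0
eps = hinge_slacks(w, b, X, y)            # [0.0, 0.0, 1.0, 0.0]
obj = primal_objective(w, b, X, y, 10.0)  # 0.5*2 + 10*1 = 11.0
```

Larger C penalizes the slacks more heavily, pushing the solution back toward the hard-margin classifier.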
Non-linear SVMs

(Figure: 1-D data that cannot be separated by a single threshold on x become linearly separable after the mapping x → (x, x²).)

Feature Space

Φ: x → φ(x)

(Figure: input space with axes x1, x2 mapped to a feature space with axes x1², x2², where the classes become linearly separable.)

Feature Space

Φ: x → φ(x)

(Figure: a nonlinear decision boundary in the input space (x1, x2) corresponds to a linear boundary in the feature space.)
Quadratic Basis Functions

φ(x) = [ 1,
         √2·x1, √2·x2, …, √2·xm,
         x1², x2², …, xm²,
         √2·x1x2, √2·x1x3, …, √2·x_{m−1}xm ]

Constant Term: 1
Linear Terms: √2·x_i
Pure Quadratic Terms: x_i²
Quadratic Cross-Terms: √2·x_i x_j (i < j)

Number of terms: 1 + m + m + C(m, 2) = (m + 2)(m + 1)/2 ≈ m²/2
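A direct implementation of this basis (a sketch; the ordering of terms is an assumption) confirms the dimension count:

```python
import math

def quadratic_phi(x):
    # [constant] + [linear] + [pure quadratic] + [cross terms], as on the slide
    s = math.sqrt(2.0)
    m = len(x)
    return ([1.0]
            + [s * xi for xi in x]
            + [xi * xi for xi in x]
            + [s * x[i] * x[j] for i in range(m) for j in range(i + 1, m)])

# Dimension is 1 + m + m + m*(m-1)/2 = (m + 2)(m + 1)/2 for every m.
dims = [len(quadratic_phi([0.0] * m)) for m in range(1, 8)]
```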
Calculation of Φ(a)·Φ(b)

Φ(a)·Φ(b) = [1, √2·a1, …, √2·am, a1², …, am², √2·a1a2, …, √2·a_{m−1}am]
          · [1, √2·b1, …, √2·bm, b1², …, bm², √2·b1b2, …, √2·b_{m−1}bm]

          = 1 + 2 Σ_{i=1}^{m} a_i b_i + Σ_{i=1}^{m} a_i² b_i² + 2 Σ_{i=1}^{m} Σ_{j=i+1}^{m} a_i a_j b_i b_j

Evaluating the feature-space inner product term by term costs O(m²).
It turns out …

(a·b + 1)² = (a·b)² + 2(a·b) + 1
           = Σ_{i=1}^{m} Σ_{j=1}^{m} a_i a_j b_i b_j + 2 Σ_{i=1}^{m} a_i b_i + 1
           = Σ_{i=1}^{m} a_i² b_i² + 2 Σ_{i=1}^{m} Σ_{j=i+1}^{m} a_i a_j b_i b_j + 2 Σ_{i=1}^{m} a_i b_i + 1
           = Φ(a)·Φ(b)

So K(a, b) = Φ(a)·Φ(b) = (a·b + 1)² can be evaluated in O(m) time instead of O(m²).
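The identity can be verified numerically; the sketch below compares the O(m) evaluation (a·b + 1)² against the O(m²) term-by-term expansion from the previous slide:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def expanded_inner_product(a, b):
    # 1 + 2*sum_i a_i b_i + sum_i a_i^2 b_i^2 + 2*sum_{i<j} a_i a_j b_i b_j
    m = len(a)
    return (1.0
            + 2.0 * dot(a, b)
            + sum((a[i] * b[i]) ** 2 for i in range(m))
            + 2.0 * sum(a[i] * a[j] * b[i] * b[j]
                        for i in range(m) for j in range(i + 1, m)))

a = [1.0, -2.0, 0.5, 3.0]
b = [0.0, 1.0, 4.0, -1.0]
fast = (dot(a, b) + 1.0) ** 2        # O(m) kernel evaluation
slow = expanded_inner_product(a, b)  # O(m^2) feature-space expansion
```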
Kernel Trick

The linear classifier relies on dot products between vectors: x_i·x_j.
If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes φ(x_i)·φ(x_j).
A kernel function is a function that corresponds to an inner product in some expanded feature space: K(x_i, x_j) = φ(x_i)·φ(x_j).
Example: for x = [x1, x2], let K(x_i, x_j) = (1 + x_i·x_j)².
K(x_i, x_j) = (1 + x_i·x_j)²
            = 1 + x_i1² x_j1² + 2 x_i1 x_j1 x_i2 x_j2 + x_i2² x_j2² + 2 x_i1 x_j1 + 2 x_i2 x_j2
            = [1, x_i1², √2·x_i1 x_i2, x_i2², √2·x_i1, √2·x_i2] · [1, x_j1², √2·x_j1 x_j2, x_j2², √2·x_j1, √2·x_j2]
            = φ(x_i)·φ(x_j),  where φ(x) = [1, x1², √2·x1x2, x2², √2·x1, √2·x2]
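For the 2-D case, the six-dimensional map φ can be written out and checked directly (a sketch mirroring the slide's expansion):

```python
import math

def phi(x):
    # phi(x) = [1, x1^2, sqrt(2)*x1*x2, x2^2, sqrt(2)*x1, sqrt(2)*x2]
    x1, x2 = x
    s = math.sqrt(2.0)
    return [1.0, x1 * x1, s * x1 * x2, x2 * x2, s * x1, s * x2]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

xi, xj = [1.0, 2.0], [3.0, 0.5]
k_direct = (1.0 + dot(xi, xj)) ** 2   # (1 + 4)^2 = 25
k_feature = dot(phi(xi), phi(xj))     # same value via the feature map
```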
Kernels

Polynomial:          K(x_i, x_j) = (1 + x_i·x_j)^d
Gaussian:            K(x_i, x_j) = exp(−‖x_i − x_j‖² / 2σ²)
Hyperbolic Tangent:  K(x_i, x_j) = tanh(x_i·x_j + c)
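These three kernels are straightforward to implement; a minimal sketch (parameter names d, sigma, c follow the formulas above, and the default values are illustrative assumptions):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def polynomial(u, v, d=2):
    # (1 + u.v)^d
    return (1.0 + dot(u, v)) ** d

def gaussian(u, v, sigma=1.0):
    # exp(-||u - v||^2 / (2 * sigma^2))
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

def hyperbolic_tangent(u, v, c=-1.0):
    # tanh(u.v + c)
    return math.tanh(dot(u, v) + c)

u, v = [1.0, 2.0], [3.0, 0.5]
```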
String Kernel

Measures similarity between text strings, e.g. "Car" vs. "Custard": strings that share more subsequences, such as "car" and "cat", receive larger kernel values, so K(car, cat) > 0 while K(car, car) and K(cat, cat) are larger still.
Solutions of w & b

With a kernel, w = Σ_{i=1}^{l} α_i y_i φ(x_i) need not be computed explicitly; only inner products are required:

w·φ(x) = Σ_{i=1}^{l} α_i y_i K(x_i, x)

b = (1/N_S) Σ_{s∈S} ( y_s − Σ_{m∈S} α_m y_m K(x_m, x_s) )

g(x) = Σ_{i=1}^{l} α_i y_i K(x_i, x) + b

Here S is the set of support vectors (samples with positive α) and N_S = |S|.
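A sketch of the kernelized decision function and the averaged bias (assuming the α values have already been obtained from the dual):

```python
def decision_function(x, sv, alphas, ys, b, kernel):
    # g(x) = sum_i alpha_i * y_i * K(x_i, x) + b, summed over support vectors
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alphas, ys, sv)) + b

def bias(sv, alphas, ys, kernel):
    # b = (1/N_S) * sum_s ( y_s - sum_m alpha_m y_m K(x_m, x_s) )
    total = 0.0
    for xs, y_s in zip(sv, ys):
        total += y_s - sum(a * yi * kernel(xm, xs) for a, yi, xm in zip(alphas, ys, sv))
    return total / len(sv)

# With a linear kernel and the earlier worked example (alpha1 = alpha2 = 1),
# this recovers b = -1 and g(x) = x1 + x2 - 1.
linear = lambda u, v: sum(p * q for p, q in zip(u, v))
sv = [(1.0, 1.0), (0.0, 0.0)]
alphas, ys = [1.0, 1.0], [1.0, -1.0]
b = bias(sv, alphas, ys, linear)
```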
Decision Boundaries

(Figure: nonlinear decision boundaries produced by kernel SVMs.)

More Maths …

Quadratic Optimization
Lagrange Duality
Karush–Kuhn–Tucker Conditions
SVM Roadmap

Linear Classifier + Maximum Margin → Linear SVM
Linear SVM + Noise → Soft Margin
Soft Margin + Nonlinear Problem → mapping: a·b → Φ(a)·Φ(b)
Φ(a)·Φ(b) + High Computational Cost → Kernel Trick: K(a, b) = Φ(a)·Φ(b)
Reading Materials

Text Book
Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.

Online Resources
http://www.kernel-machines.org
http://www.support-vector-machines.org
http://www.tristanfletcher.co.uk/SVM%20Explained.pdf
http://www.csie.ntu.edu.tw/~cjlin/libsvm
A list of papers uploaded to the web learning portal
Wikipedia & Google
Review

What is the definition of margin in a linear classifier?
Why do we want to maximize the margin?
What is the mathematical expression of the margin?
How do we solve the objective function in SVM?
What are support vectors?
What is a soft margin?
How does an SVM solve nonlinear problems?
What is the so-called "kernel trick"?
What are the commonly used kernels?
Next Week's Class Talk

Volunteers are required for next week's class talk.

Topic: SVM in Practice
Hints:
- Applications
- Demos
- Multi-Class Problems
- Software: a very popular toolbox is LIBSVM
- Any other interesting topics beyond this lecture
Length: 20 minutes plus question time
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w & b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi)·Φ(xj)
- It turns out …
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w & b (2)
- Decision Boundaries
- More Maths …
- SVM Roadmap
- Reading Materials
- Review
- Next Week's Class Talk
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
- Classification IV
- Overview
- Linear Classifier
- Distance to Hyperplane
- Selection of Classifiers
- Unknown Samples
- Margins
- Margins (2)
- Margins (3)
- Objective Function
- Lagrange Multipliers
- Solutions of w amp b
- An Example
- Soft Margin
- Soft Margin (2)
- Non-linear SVMs
- Feature Space
- Feature Space (2)
- Quadratic Basis Functions
- Calculation of Φ(xi )Φ(xj )
- It turns out hellip
- Kernel Trick
- Kernels
- String Kernel
- Solutions of w amp b (2)
- Decision Boundaries
- More Maths hellip
- SVM Roadmap
- Reading Materials
- Review
- Next Weekrsquos Class Talk
-
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap

28

Linear Classifier → Maximum Margin → Linear SVM
Linear SVM → Noise → Soft Margin
Soft Margin → Nonlinear Problem → map a, b to Φ(a), Φ(b)
Φ(a)·Φ(b) → High Computational Cost → Kernel Trick: K(a, b) = Φ(a)·Φ(b)
Reading Materials

Text Book
Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.

Online Resources
http://www.kernel-machines.org
http://www.support-vector-machines.org
http://www.tristanfletcher.co.uk/SVM%20Explained.pdf
http://www.csie.ntu.edu.tw/~cjlin/libsvm
A list of papers uploaded to the web learning portal
Wikipedia & Google

29
Review

What is the definition of margin in a linear classifier?
Why do we want to maximize the margin?
What is the mathematical expression of the margin?
How to solve the objective function in SVM?
What are support vectors?
What is soft margin?
How does SVM solve nonlinear problems?
What is the so-called "kernel trick"?
What are the commonly used kernels?

30
Next Week's Class Talk

Volunteers are required for next week's class talk.

Topic: SVM in Practice

Hints:
Applications
Demos
Multi-Class Problems
Software: a very popular toolbox, LibSVM

Any other interesting topics beyond this lecture

Length: 20 minutes plus question time

31