UVA CS 4501: Machine Learning
Lecture 11: Support Vector Machine (Basics)
Dr. Yanjun Qi
University of Virginia
Department of Computer Science
Where are we? → Five major sections of this course
• Regression (supervised)
• Classification (supervised)
• Unsupervised models
• Learning theory
• Graphical models
10/18/18 · Dr. Yanjun Qi, UVA
Today
• Support Vector Machine (SVM)
  - History of SVM
  - Large-margin linear classifier
  - Define margin (M) in terms of model parameters
  - Optimization to learn model parameters (w, b)
  - Linearly non-separable case
  - Optimization with dual form
  - Nonlinear decision boundary
  - Multiclass SVM
History of SVM
• SVM is inspired by statistical learning theory [3]
• SVM was first introduced in 1992 [1]
• SVM became popular because of its success in handwritten digit recognition (1994): a 1.1% test error rate for SVM, the same as the error rate of a carefully constructed neural network, LeNet 4. See [2] or the discussion in [3] for details.
• Regarded as an important example of "kernel methods", arguably the hottest area in machine learning 20 years ago
[1] B.E. Boser et al. A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory 5 144-152, Pittsburgh, 1992.
[2] L. Bottou et al. Comparison of classifier methods: a case study in handwritten digit recognition. Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 2, pp. 77-82, 1994.
[3] V. Vapnik. The Nature of Statistical Learning Theory. 2nd edition, Springer, 1999.
Theoretically sound / Impactful
Handwritten digit recognition
Applications of SVMs
• Computer vision
• Text categorization
• Ranking (e.g., Google searches)
• Handwritten character recognition
• Time-series analysis
• Bioinformatics
• ...
→ Lots of very successful applications!
A Dataset for Binary Classification
• Data / points / instances / examples / samples / records: [rows]
• Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns, except the last]
• Target / outcome / response / label / dependent variable: the special column to be predicted [last column]
Output as binary class: only two possibilities.
Linear Classifiers (f: x → y_est)
(figure: data points labeled +1 and -1)
How would you classify this data?
Credit: Prof. Moore
Any of these boundaries would be fine... but which is best?
Classifier Margin
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a data point.
Maximum Margin
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM, for linear SVM).
Support vectors are the data points that the margin pushes up against.
The decision function of the linear SVM:
f(x, w, b) = sign(wTx + b)
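The decision rule f(x, w, b) = sign(wTx + b) can be sketched in a few lines of Python; the parameter values and test points below are toy assumptions, not the lecture's data:

```python
# Minimal sketch of the linear SVM decision rule f(x, w, b) = sign(w^T x + b).
# w, b, and the sample points are made-up illustrative values.

def predict(w, b, x):
    """Return +1 or -1 according to sign(w^T x + b)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

w, b = [1.0, -1.0], 0.5           # hypothetical parameters
print(predict(w, b, [2.0, 0.0]))  # a point on the +1 side -> 1
print(predict(w, b, [0.0, 3.0]))  # a point on the -1 side -> -1
```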
Max margin classifiers
• Instead of fitting all points, focus on boundary points
• Learn a boundary that leads to the largest margin from both sets of points
From all the possible boundary lines, this leads to the largest margin on both sides.
Why MAX margin?
• Intuitive, 'makes sense'
• Some theoretical support (using VC dimension)
• Works well in practice
That is why they are also known as linear support vector machines (SVMs): the data points on the margin are the vectors 'supporting' the boundary.
How to represent a Linear Decision Boundary?
Review: Affine Hyperplanes
• https://en.wikipedia.org/wiki/Hyperplane
• Any hyperplane can be given in coordinates as the solution set of a single linear (algebraic) equation of degree 1.
Q: How does this connect to linear regression?
Max-margin & Decision Boundary
• The decision boundary should be as far away from the data of both classes as possible.
• w is a p-dimensional vector; b is a scalar.
Specifying a max margin classifier
Three parallel planes:
• wTx + b = +1 (class +1 plane)
• wTx + b = 0 (boundary)
• wTx + b = -1 (class -1 plane)
Classify as +1 if wTx + b >= 1; classify as -1 if wTx + b <= -1; undefined if -1 < wTx + b < 1.
Is the linear separation assumption realistic?
We will deal with this shortly, but let's assume it for now.
Maximizing the margin
• Let's define the width of the margin as M.
• How can we encode our goal of maximizing M in terms of our parameters (w and b)?
• Let's start with a few observations.
(Concrete derivation of M: see the Extra slides.)
Finding the optimal parameters
M = 2 / sqrt(wTw)
We can now search for the optimal parameters by finding a solution that:
1. Correctly classifies all points
2. Maximizes the margin (or, equivalently, minimizes wTw)
Several optimization methods can be used: gradient descent, or SMO (see the extra slides).
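The two conditions above can be checked directly for a candidate (w, b): verify that every training point is correctly classified with functional margin at least 1, and report the geometric margin M = 2/sqrt(wTw) that minimizing wTw would maximize. The toy dataset and candidate separator below are assumptions for illustration:

```python
# Sketch: check a candidate (w, b) against the two conditions on the slide,
# using an assumed toy dataset (not the lecture's data).
import math

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def satisfies_constraints(w, b, X, y):
    # y_i (w^T x_i + b) >= 1 for every training sample
    return all(yi * score(w, b, xi) >= 1 for xi, yi in zip(X, y))

def margin(w):
    # geometric margin M = 2 / sqrt(w^T w)
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
y = [+1, +1, -1, -1]
w, b = [0.5, 0.5], -1.0                    # hypothetical separator x1 + x2 = 2
print(satisfies_constraints(w, b, X, y))   # True: all points classified with margin
print(margin(w))                           # 2 / sqrt(0.5), about 2.828
```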
Optimization Step, i.e., learning the optimal parameters for the SVM
M = 2 / sqrt(wTw)
Min (wTw)/2 subject to the following constraints:
• For all x in class +1: wTx + b >= 1
• For all x in class -1: wTx + b <= -1
→ A total of n constraints if we have n training samples.
Optimization Reformulation
Min (wTw)/2 subject to the following constraints:
• For all x in class +1: wTx + b >= 1
• For all x in class -1: wTx + b <= -1
(A total of n constraints if we have n input samples.)
Written compactly:
argmin_{w,b} Σ_{i=1..p} w_i^2
subject to ∀ x_i ∈ D_train : y_i (wT x_i + b) ≥ 1
This is quadratic programming, i.e., a quadratic objective with linear constraints.
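The compact constraint y_i (wT x_i + b) >= 1 is equivalent to the two per-class constraints: multiplying by the label flips the inequality for the -1 class. A small check, with assumed toy values:

```python
# Sketch showing why the two per-class constraints collapse into the single
# form y_i (w^T x_i + b) >= 1 used in the QP. Toy values are assumptions.

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def per_class_ok(w, b, x, yi):
    # original formulation: separate rule per class
    return score(w, b, x) >= 1 if yi == +1 else score(w, b, x) <= -1

def unified_ok(w, b, x, yi):
    # compact formulation: one rule for both classes
    return yi * score(w, b, x) >= 1

w, b = [1.0, 2.0], -3.0
samples = [([4.0, 0.0], +1), ([0.0, 0.5], -1), ([2.0, 1.0], +1)]
for x, yi in samples:
    assert per_class_ok(w, b, x, yi) == unified_ok(w, b, x, yi)
print("both formulations agree on every sample")
```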
What Next?
• Support Vector Machine (SVM): linearly non-separable case (soft SVM), optimization with dual form, nonlinear decision boundary, practical guide.
Support Vector Machine (summary)
• Task: classification
• Representation: kernel trick, K(x, z) := Φ(x)T Φ(z)
• Score function: margin + hinge loss
• Search/Optimization: QP with dual form
• Models, Parameters: dual weights, w = Σ_i α_i y_i x_i
Soft-margin primal:
argmin_{w,b} Σ_{i=1..p} w_i^2 + C Σ_{i=1..n} ε_i
subject to ∀ x_i ∈ D_train : y_i (x_i·w + b) ≥ 1 - ε_i
Dual:
max_α Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j x_iT x_j
subject to Σ_i α_i y_i = 0, α_i ≥ 0 ∀i
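The primal-dual link on the summary slide, w = Σ_i α_i y_i x_i, can be illustrated numerically. The alphas, labels, and points below are made-up values (only support vectors get α_i > 0), chosen so the dual constraint Σ_i α_i y_i = 0 holds:

```python
# Sketch of recovering the primal weight vector from dual weights:
# w = sum_i alpha_i * y_i * x_i. All values below are assumed toy numbers.

def recover_w(alphas, ys, X):
    dim = len(X[0])
    w = [0.0] * dim
    for a, yi, xi in zip(alphas, ys, X):
        for d in range(dim):
            w[d] += a * yi * xi[d]
    return w

alphas = [0.5, 0.0, 0.5]          # alpha_i = 0 for the non-support vector
ys     = [+1, +1, -1]
X      = [[2.0, 0.0], [5.0, 5.0], [0.0, 2.0]]
# dual constraint: sum_i alpha_i y_i = 0
assert abs(sum(a * yi for a, yi in zip(alphas, ys))) < 1e-12
print(recover_w(alphas, ys, X))   # [1.0, -1.0]
```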
References
• Big thanks to Prof. Ziv Bar-Joseph and Prof. Eric Xing @ CMU for allowing me to reuse some of their slides
• The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman
• Prof. Andrew Moore @ CMU's slides
• Tutorial slides from Dr. Tie-Yan Liu, MSR Asia
• A Practical Guide to Support Vector Classification, by Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin, 2003-2010
• Tutorial slides from Stanford "Convex Optimization I", Boyd & Vandenberghe
EXTRA
The gradient points in the direction of the greatest rate of increase of the function, and its magnitude is the slope of the graph in that direction.
How to define the width of the margin M (EXTRA)
• Let's define the width of the margin as M.
• How can we encode our goal of maximizing M in terms of our parameters (w and b)?
• Let's start with a few observations.
Margin M
• wT x+ + b = +1
• wT x- + b = -1
• M = |x+ - x-| = ?
Maximizing the margin: observation 1
• Observation 1: the vector w is orthogonal to the +1 plane.
• Why? Let u and v be two points on the +1 plane; then for the vector defined by u and v we have wT(u - v) = 0.
• Corollary: the vector w is orthogonal to the -1 plane.
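Observation 1 can be checked numerically: for any two points u, v on the plane wTx + b = +1, the difference u - v is orthogonal to w. The values of w, b, u, and v below are assumed for illustration:

```python
# Numerical sketch of Observation 1: (u - v) is orthogonal to w for any
# two points u, v on the +1 plane. w, b, u, v are assumed toy values.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

w, b = [3.0, 4.0], -2.0
# Two points on w^T x + b = +1, i.e. 3*x1 + 4*x2 = 3:
u = [1.0, 0.0]             # 3*1 + 4*0 = 3
v = [-1.0, 1.5]            # 3*(-1) + 4*1.5 = 3
diff = [ui - vi for ui, vi in zip(u, v)]
print(dot(w, diff))        # 0.0 -> (u - v) is orthogonal to w
```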
Review: Vector Product, Orthogonality, and Norm
For two vectors x and y, xTy is called the (inner) vector product.
The square root of the product of a vector with itself, sqrt(xTx), is called the 2-norm (|x|2), and can also be written |x|.
x and y are called orthogonal if xTy = 0.
Maximizing the margin: observation 2
• Observation 1: the vector w is orthogonal to the +1 and -1 planes.
• Observation 2: if x+ is a point on the +1 plane and x- is the closest point to x+ on the -1 plane, then x+ = λw + x-.
Since w is orthogonal to both planes, we need to 'travel' some distance along w to get from x- to x+.
Putting it together
• wT x+ + b = +1
• wT x- + b = -1
• M = |x+ - x-| = ?
• x+ = λw + x-
Putting it together
• wT x+ + b = +1
• wT x- + b = -1
• x+ = λw + x-
• |x+ - x-| = M
We can now define M in terms of w and b:
wT x+ + b = +1
=> wT(λw + x-) + b = +1
=> wT x- + b + λwTw = +1
=> -1 + λwTw = +1
=> λ = 2/wTw
Putting it together
• wT x+ + b = +1
• wT x- + b = -1
• x+ = λw + x-
• |x+ - x-| = M
• λ = 2/wTw
We can now define M in terms of w and b:
M = |x+ - x-| = |λw| = λ|w| = λ sqrt(wTw)
=> M = (2/wTw) · sqrt(wTw) = 2/sqrt(wTw)
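The derivation above can be verified numerically: with toy values of w and b (assumptions, not the lecture's), pick an x- on the -1 plane, set x+ = λw + x- with λ = 2/(wTw), and confirm that x+ lands on the +1 plane and that |x+ - x-| equals M = 2/sqrt(wTw):

```python
# Numerical check of the margin derivation. w, b, and x- are assumed
# toy values; x+ and M are computed from the formulas derived above.
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

w, b = [2.0, 0.0], 1.0
x_minus = [-1.0, 0.0]                      # w^T x- + b = -2 + 1 = -1
lam = 2.0 / dot(w, w)                      # lambda = 2 / w^T w = 0.5
x_plus = [xm + lam * wi for xm, wi in zip(x_minus, w)]
print(dot(w, x_plus) + b)                  # +1: x+ lies on the +1 plane
M = math.sqrt(sum((p - m) ** 2 for p, m in zip(x_plus, x_minus)))
print(M, 2.0 / math.sqrt(dot(w, w)))       # both equal 1.0 here
```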