Non-Parametric Methods and Support Vector Machines
Shan-Hung Wu ([email protected])
Department of Computer Science, National Tsing Hua University, Taiwan
Machine Learning
Shan-Hung Wu (CS, NTHU) Non-Parametric Methods & SVM Machine Learning 1 / 42
Outline

1. Non-Parametric Methods: K-NN, Parzen Windows, Local Models
2. Support Vector Machines: SVC, Slacks, Nonlinear SVC, Dual Problem, Kernel Trick
K-NN Methods I

The K-nearest neighbor (K-NN) methods are a straightforward, yet fundamentally different, way to predict the label of a data point x:
1. Choose the number K and a distance metric
2. Find the K nearest neighbors of the given point x
3. Predict the label of x by the majority vote (in classification) or the average (in regression) of the neighbors' labels

Distance metric? E.g., the Euclidean distance d(x^(i), x) = ‖x^(i) − x‖
Training algorithm? Simply "remember" X in storage
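The three steps above can be sketched in a few lines of numpy (a minimal sketch; the function name and toy data are illustrative, not from the slides):

```python
import numpy as np

def knn_predict(X_train, y_train, x, K=3):
    """Predict the label of x by majority vote among its K nearest
    training points under the Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)   # d(x^(i), x) = ||x^(i) - x||
    nn = np.argsort(dists)[:K]                    # indices of the K nearest neighbors
    votes = y_train[nn]
    # Majority vote (for regression, return votes.mean() instead)
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

# "Training" is just remembering X and y in storage
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([-1, -1, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.1]), K=3))   # -1
```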
K-NN Methods II

The resulting decision boundary can be very complex; K is a hyperparameter controlling the model complexity.
Non-Parametric Methods

The K-NN method is a special case of non-parametric (or memory-based) methods:
- Non-parametric in the sense that f is not described by only a few parameters
- Memory-based in that all data (rather than just parameters) need to be memorized during the training process

K-NN is also a lazy method, since the prediction function f is obtained only right before the prediction. This motivates the development of other local models.
Pros & Cons

Pros:
- Almost no assumption on f other than smoothness
- High capacity/complexity
- High accuracy given a large training set
- Supports online training (by simply memorizing)
- Readily extensible to multi-class and regression problems

Cons:
- Storage demanding
- Sensitive to outliers
- Sensitive to irrelevant data features (vs. decision trees)
- Needs to deal with missing data (e.g., special distances)
- Computationally expensive: O(ND) time for making each prediction (can speed up with an index and/or approximation)
Parzen Windows and Kernels

Binary K-NN classifier:
f(x) = sign(∑_{i: x^(i) ∈ KNN(x)} y^(i))
The "radius" of the voter boundary depends on the input x.

We can instead use a Parzen window with a fixed radius R:
f(x) = sign(∑_i y^(i) 1(‖x^(i) − x‖ ≤ R))

Parzen windows can also replace the hard boundary with a soft one:
f(x) = sign(∑_i y^(i) k(x^(i), x))
where k(x^(i), x) is a radial basis function (RBF) kernel whose value decreases as the distance from x grows.
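Both the hard-radius and soft variants can be sketched in numpy (a minimal sketch; the function names are illustrative, not from the slides):

```python
import numpy as np

def parzen_hard(X, y, x, R=1.0):
    """Fixed-radius Parzen window: sum the votes of all training points
    within distance R of x, i.e. sign(sum_i y_i * 1(||x_i - x|| <= R))."""
    inside = np.linalg.norm(X - x, axis=1) <= R
    return np.sign(np.sum(y[inside]))

def parzen_soft(X, y, x, gamma=1.0):
    """Soft boundary: weight each vote by an RBF kernel k(x_i, x),
    i.e. sign(sum_i y_i * k(x_i, x))."""
    k = np.exp(-gamma * np.sum((X - x) ** 2, axis=1))
    return np.sign(np.sum(y * k))
```

A query point near the positive example gets a positive vote in both variants; the soft version never discards a voter outright, it only down-weights distant ones.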
Common RBF Kernels

How to act like a soft K-NN?

Gaussian RBF kernel:
k(x^(i), x) = N(x^(i) − x; 0, σ² I)
or simply
k(x^(i), x) = exp(−γ‖x^(i) − x‖²)

γ ≥ 0 (or σ²) is a hyperparameter controlling the smoothness of f.
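Up to a normalizing constant, the Gaussian form corresponds to the exp form with γ = 1/(2σ²); larger γ (smaller σ²) gives a narrower window and a less smooth f. A quick numerical sketch (function name illustrative):

```python
import numpy as np

def rbf_kernel(xi, x, gamma=1.0):
    """k(xi, x) = exp(-gamma * ||xi - x||^2).
    Larger gamma -> narrower window -> less smooth f (closer to 1-NN)."""
    d = np.asarray(xi) - np.asarray(x)
    return np.exp(-gamma * np.sum(d ** 2))

# The unnormalized Gaussian N(xi - x; 0, sigma^2 I) matches exp(-gamma ||.||^2)
# with gamma = 1 / (2 sigma^2)
sigma = 0.5
gamma = 1.0 / (2 * sigma ** 2)          # = 2.0
print(rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma))   # exp(-2) ~ 0.1353
```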
Locally Weighted Linear Regression

In addition to majority voting and averaging, we can define local models for lazy predictions.

E.g., in (eager) linear regression, we find w ∈ R^(D+1) that minimizes the SSE:
argmin_w ∑_i (y^(i) − w^T x^(i))²

Local model: find the w minimizing the SSE local to the point x we want to predict:
argmin_w ∑_i k(x^(i), x)(y^(i) − w^T x^(i))²
where k(·, ·) ∈ R is an RBF kernel.
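The local objective is a weighted least-squares problem with the standard closed form w = (XᵀKX)⁻¹XᵀKy, where K = diag(k(x^(i), x)). A minimal sketch (illustrative names; assumes X already carries a bias column):

```python
import numpy as np

def locally_weighted_lr(X, y, x_query, gamma=1.0):
    """Fit w minimizing sum_i k(x_i, x)(y_i - w^T x_i)^2 at prediction time
    (lazy), then predict w^T x_query. X must include a bias column."""
    k = np.exp(-gamma * np.sum((X - x_query) ** 2, axis=1))  # RBF weights
    K = np.diag(k)
    # Weighted least squares: w = (X^T K X)^{-1} X^T K y
    w = np.linalg.solve(X.T @ K @ X, X.T @ K @ y)
    return w @ x_query

# Exactly linear data y = 1 + 2x: the local fit recovers it at any query point
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(locally_weighted_lr(X, y, np.array([1.0, 1.5])))   # ~4.0
```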
Kernel Machines

Kernel machines:
f(x) = ∑_{i=1}^N c_i k(x^(i), x) + c_0

For example:
- Parzen windows: c_i = y^(i) and c_0 = 0
- Locally weighted linear regression: c_i = (y^(i) − w^T x^(i))² and c_0 = 0

The variable c ∈ R^N can be learned in either an eager or a lazy manner.
Pros: complex, but highly accurate if regularized well.
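The generic form above can be evaluated directly (a sketch with an RBF kernel; names are illustrative). The usage line instantiates the Parzen-window special case c_i = y^(i), c_0 = 0:

```python
import numpy as np

def kernel_machine(X, c, c0, x, gamma=1.0):
    """f(x) = sum_{i=1}^N c_i k(x^(i), x) + c_0, with an RBF kernel."""
    k = np.exp(-gamma * np.sum((X - x) ** 2, axis=1))
    return c @ k + c0

# Parzen windows as a special case: c_i = y_i, c_0 = 0
X = np.array([[0.0], [2.0]])
y = np.array([1.0, -1.0])
print(np.sign(kernel_machine(X, y, 0.0, np.array([0.2]))))   # 1.0
```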
Sparse Kernel Machines

To make a prediction, we need to store all examples. This may be infeasible due to:
- a large dataset (N)
- time limits
- space limits

Can we make c sparse? I.e., make c_i ≠ 0 for only a small fraction of the examples, called support vectors. How?
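If c is sparse, prediction only ever touches the examples with c_i ≠ 0, so storage and per-query cost scale with the number of support vectors rather than N. A sketch (illustrative names):

```python
import numpy as np

def sparse_predict(X, c, c0, x, gamma=1.0):
    """Evaluate sign(sum_i c_i k(x^(i), x) + c_0) using only the
    support vectors, i.e. the examples with c_i != 0."""
    sv = c != 0                                       # support-vector mask
    k = np.exp(-gamma * np.sum((X[sv] - x) ** 2, axis=1))
    return np.sign(c[sv] @ k + c0)

# The middle example has c_i = 0, so it need not be stored at all
X = np.array([[0.0], [5.0], [2.0]])
c = np.array([1.0, 0.0, -1.0])
print(sparse_predict(X, c, 0.0, np.array([0.1])))   # 1.0
```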
Separating Hyperplane I

Model: F = {f : f(x; w, b) = w^T x + b}, a collection of hyperplanes.

Prediction: y = sign(f(x))

Training: find w and b such that
w^T x^(i) + b ≥ 0 if y^(i) = 1
w^T x^(i) + b ≤ 0 if y^(i) = −1
or simply
y^(i)(w^T x^(i) + b) ≥ 0
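The combined condition can be checked in one vectorized line (illustrative sketch):

```python
import numpy as np

def separates(w, b, X, y):
    """True iff (w, b) satisfies y^(i) (w^T x^(i) + b) >= 0 for all i."""
    return bool(np.all(y * (X @ w + b) >= 0))

X = np.array([[0.0, 0.0], [2.0, 2.0]])
y = np.array([-1, 1])
print(separates(np.array([1.0, 1.0]), -2.0, X, y))   # True
```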
Separating Hyperplane II

There are many feasible w's and b's when the classes are linearly separable
Which hyperplane is the best?
Support Vector Classification

Support vector classifier (SVC) picks the one with the largest margin:

  y^(i)(w^T x^(i) + b) ≥ a for all i
  Margin: 2a/‖w‖ [Homework]

Without loss of generality, we let a = 1 and solve the problem:

  argmin_{w,b} (1/2)‖w‖²
  subject to y^(i)(w^T x^(i) + b) ≥ 1, ∀i
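The geometric margin 2a/‖w‖, with a = min_i y^(i)(w^T x^(i) + b), is easy to check numerically; the hyperplane and data below are assumed toy values:

```python
import numpy as np

def margin(X, y, w, b):
    """Geometric margin of a separating hyperplane: 2a/||w||,
    where a = min_i y^(i)(w^T x^(i) + b)."""
    a = np.min(y * (X @ w + b))
    assert a > 0, "hyperplane must separate the data"
    return 2 * a / np.linalg.norm(w)

# Toy separable data (illustrative).
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.array([1.0, 1.0]), 0.0
m = margin(X, y, w, b)   # here a = 2 and ||w|| = sqrt(2)
```

Fixing a = 1 and minimizing ‖w‖ is equivalent to maximizing this quantity.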
Overlapping Classes

In practice, classes may be overlapping
  Due to, e.g., noise or outliers
The problem

  argmin_{w,b} (1/2)‖w‖²
  subject to y^(i)(w^T x^(i) + b) ≥ 1, ∀i

has no solution in this case. How to fix this?
Slacks

SVC tolerates slacks: examples that fall outside of the regions they ought to be in
Problem:

  argmin_{w,b,ξ} (1/2)‖w‖² + C ∑_{i=1}^N ξ_i
  subject to y^(i)(w^T x^(i) + b) ≥ 1 − ξ_i and ξ_i ≥ 0, ∀i

Favors a large margin but also fewer slacks
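At the optimum, each slack takes its smallest feasible value ξ_i = max(0, 1 − y^(i)(w^T x^(i) + b)), i.e., the hinge loss. A quick check with assumed toy numbers:

```python
import numpy as np

def slacks(X, y, w, b):
    """Optimal slack values: xi_i = max(0, 1 - y^(i)(w^T x^(i) + b))."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def objective(X, y, w, b, C):
    """Soft-margin SVC objective: (1/2)||w||^2 + C * sum_i xi_i."""
    return 0.5 * np.dot(w, w) + C * slacks(X, y, w, b).sum()

# Toy data (illustrative); the second point sits inside the margin.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
w, b = np.array([1.0, 0.0]), 0.0
xi = slacks(X, y, w, b)   # [0, 0.5, 0]
```

Points that satisfy the margin constraint get ξ_i = 0 and contribute nothing to the penalty term.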
Hyperparameter C

  argmin_{w,b,ξ} (1/2)‖w‖² + C ∑_{i=1}^N ξ_i

The hyperparameter C controls the tradeoff between
  Maximizing the margin
  Minimizing the number of slacks
Provides a geometric explanation of weight decay
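Since the optimal slacks equal the hinge losses, the constrained problem can be rewritten as an unconstrained one, (1/2)‖w‖² + C ∑_i max(0, 1 − y^(i)(w^T x^(i) + b)), which a plain subgradient method can minimize. This is a minimal sketch (learning rate, step count, and data are all assumed), not the solver the slides use:

```python
import numpy as np

def fit_svc(X, y, C=1.0, lr=0.01, steps=2000):
    """Subgradient descent on the unconstrained soft-margin objective
    (1/2)||w||^2 + C * sum_i max(0, 1 - y^(i)(w^T x^(i) + b))."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        viol = y * (X @ w + b) < 1           # active hinge terms
        gw = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        gb = -C * y[viol].sum()
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy separable data (illustrative).
X = np.array([[1.5, 0.0], [2.5, 1.0], [-1.5, 0.0], [-2.5, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = fit_svc(X, y, C=1.0)
```

Larger C weights the hinge term more heavily, shrinking the margin to reduce violations; smaller C behaves like stronger weight decay on w.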
Nonlinearly Separable Classes

In practice, classes may be nonlinearly separable
SVC (with slacks) gives "bad" hyperplanes due to underfitting
How to make it nonlinear?
Feature Augmentation

Recall that in polynomial regression, we augment data features to make a linear regressor nonlinear
We can define a function Φ(·) that maps each data point to a high-dimensional space:

  argmin_{w,b,ξ} (1/2)‖w‖² + C ∑_i ξ_i
  subject to y^(i)(w^T Φ(x^(i)) + b) ≥ 1 − ξ_i and ξ_i ≥ 0, ∀i
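As a concrete illustration of such a Φ (one possible degree-2 map, chosen here for the example), a circular decision boundary becomes a hyperplane in the augmented space:

```python
import numpy as np

def phi(X):
    """Map 2-D points to a degree-2 feature space:
    (x1, x2) -> (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([x1, x2, x1**2, x2**2, x1 * x2], axis=1)

# Inside vs. outside the unit circle is not linearly separable in the
# original space, but x1^2 + x2^2 = 1 is a hyperplane in Phi-space.
X = np.array([[0.1, 0.2], [-0.3, 0.1], [2.0, 0.0], [0.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])       # +1 inside, -1 outside
w = np.array([0.0, 0.0, -1.0, -1.0, 0.0])  # f(x) = 1 - x1^2 - x2^2
b = 1.0
pred = np.sign(phi(X) @ w + b)
```

A linear SVC trained on Φ(x) would recover a hyperplane of this form.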
Time Complexity

Nonlinear SVC:

  argmin_{w,b,ξ} (1/2)‖w‖² + C ∑_i ξ_i
  subject to y^(i)(w^T Φ(x^(i)) + b) ≥ 1 − ξ_i and ξ_i ≥ 0, ∀i

The higher the augmented feature dimension, the more variables in w to solve
Can we solve for w in time complexity that is independent of the mapped dimension?
Dual Problem

Primal problem:

  argmin_{w,b,ξ} (1/2)‖w‖² + C ∑_i ξ_i
  subject to y^(i)(w^T Φ(x^(i)) + b) ≥ 1 − ξ_i and ξ_i ≥ 0, ∀i

Dual problem:

  argmax_{α,β} min_{w,b,ξ} L(w, b, ξ, α, β)
  subject to α ≥ 0, β ≥ 0

where L(w, b, ξ, α, β) = (1/2)‖w‖² + C ∑_i ξ_i + ∑_i α_i(1 − y^(i)(w^T Φ(x^(i)) + b) − ξ_i) + ∑_i β_i(−ξ_i)

The primal problem is convex, so strong duality holds
Solving Dual Problem I

L(w, b, ξ, α, β) = (1/2)‖w‖² + C ∑_i ξ_i + ∑_i α_i(1 − y^(i)(w^T Φ(x^(i)) + b) − ξ_i) + ∑_i β_i(−ξ_i)

The inner problem

  min_{w,b,ξ} L(w, b, ξ, α, β)

is convex in terms of w, b, and ξ
Let's solve it analytically:

  ∂L/∂w = w − ∑_i α_i y^(i) Φ(x^(i)) = 0 ⇒ w = ∑_i α_i y^(i) Φ(x^(i))
  ∂L/∂b = −∑_i α_i y^(i) = 0 ⇒ ∑_i α_i y^(i) = 0
  ∂L/∂ξ_i = C − α_i − β_i = 0 ⇒ β_i = C − α_i
Solving Dual Problem II

L(w, b, ξ, α, β) = (1/2)‖w‖² + C ∑_i ξ_i + ∑_i α_i(1 − y^(i)(w^T Φ(x^(i)) + b) − ξ_i) + ∑_i β_i(−ξ_i)

Substituting w = ∑_i α_i y^(i) Φ(x^(i)) and β_i = C − α_i into L(w, b, ξ, α, β):

  L(w, b, ξ, α, β) = ∑_i α_i − (1/2) ∑_{i,j} α_i α_j y^(i) y^(j) Φ(x^(i))^T Φ(x^(j)) − b ∑_i α_i y^(i),

  min_{w,b,ξ} L(w, b, ξ, α, β) =
    { ∑_i α_i − (1/2) ∑_{i,j} α_i α_j y^(i) y^(j) Φ(x^(i))^T Φ(x^(j)),  if ∑_i α_i y^(i) = 0,
    { −∞,  otherwise

Outer maximization problem:

  argmax_α 1^T α − (1/2) α^T K α
  subject to 0 ≤ α ≤ C1 and y^T α = 0

  K_{i,j} = y^(i) y^(j) Φ(x^(i))^T Φ(x^(j))
  β_i = C − α_i ≥ 0 implies α_i ≤ C
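The matrix K and the dual objective 1^T α − (1/2) α^T K α are direct to compute. This sketch uses the linear kernel (Φ the identity) and a made-up feasible α, just to show the arithmetic:

```python
import numpy as np

# Toy data (illustrative).
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# K_ij = y^(i) y^(j) Phi(x^(i))^T Phi(x^(j)); here Phi is the identity,
# so the inner products are just X X^T.
K = np.outer(y, y) * (X @ X.T)

def dual_objective(alpha):
    """1^T alpha - (1/2) alpha^T K alpha, to be maximized
    subject to 0 <= alpha <= C1 and y^T alpha = 0."""
    return alpha.sum() - 0.5 * alpha @ K @ alpha

alpha = np.array([0.5, 0.5, 0.5, 0.5])   # feasible: y^T alpha = 0
val = dual_objective(alpha)
```

Note that only inner products of examples enter the objective; Φ itself never has to be materialized.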
Solving Dual Problem III

Dual minimization problem of SVC:

  argmin_α (1/2) α^T K α − 1^T α
  subject to 0 ≤ α ≤ C1 and y^T α = 0

Number of variables to solve? N, instead of the augmented feature dimension
In practice, this problem is solved by specialized solvers such as sequential minimal optimization (SMO) [3]
  As K is usually ill-conditioned
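The idea behind SMO is to optimize one pair (α_i, α_j) at a time in closed form, which keeps y^T α = 0 and the box constraints satisfied at every step. Below is a heavily simplified sketch in the spirit of Platt's simplified SMO (fixed partner choice j = i+1, linear kernel, toy data); a real solver uses smarter heuristics:

```python
import numpy as np

def smo_dual(X, y, C=10.0, tol=1e-5):
    """Simplified SMO sketch: optimize pairs (alpha_i, alpha_j)
    analytically, keeping y^T alpha = 0 and 0 <= alpha <= C."""
    N = len(y)
    K = X @ X.T                                # linear kernel
    alpha, b = np.zeros(N), 0.0
    f = lambda k: (alpha * y) @ K[:, k] + b    # current decision value
    for _ in range(500):                       # hard cap on sweeps
        changed = 0
        for i in range(N):
            Ei = f(i) - y[i]
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = (i + 1) % N                # naive partner choice
                Ej = f(j) - y[j]
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if L == H or eta >= 0:
                    continue
                alpha[j] = np.clip(aj - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj) < 1e-8:
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                b1 = b - Ei - y[i] * (alpha[i] - ai) * K[i, i] - y[j] * (alpha[j] - aj) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai) * K[i, j] - y[j] * (alpha[j] - aj) * K[j, j]
                b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
                changed += 1
        if changed == 0:
            break
    return alpha, b

# Toy separable data (illustrative).
X = np.array([[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha, b = smo_dual(X, y, C=10.0)
pred = np.sign((alpha * y) @ (X @ X.T) + b)
```

Each pair update needs only two rows of K, which is why SMO scales to problems where forming or factoring the full (ill-conditioned) K would be impractical.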
Making Predictions

Prediction: y = sign(f(x)) = sign(w^T Φ(x) + b)
We have w = ∑_i α_i y^(i) Φ(x^(i))
How to obtain b?
By the complementary slackness of the KKT conditions, we have:

  α_i(1 − y^(i)(w^T Φ(x^(i)) + b) − ξ_i) = 0 and β_i(−ξ_i) = 0

For any x^(i) having 0 < α_i < C, we have

  β_i = C − α_i > 0 ⇒ ξ_i = 0,
  1 − y^(i)(w^T Φ(x^(i)) + b) − ξ_i = 0 ⇒ b = y^(i) − w^T Φ(x^(i))

In practice, we usually take the average over all x^(i)'s having 0 < α_i < C to avoid numeric error
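The averaging step looks like this in NumPy. The α values below are the known dual optimum for this particular toy set (the two points at x1 = ±1 are the margin support vectors), stated rather than derived here:

```python
import numpy as np

# Toy data (illustrative), linear kernel (Phi = identity).
X = np.array([[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C = 10.0
alpha = np.array([0.5, 0.0, 0.5, 0.0])    # assumed dual optimum

w = (alpha * y) @ X                        # w = sum_i alpha_i y^(i) x^(i)
sv = (alpha > 1e-9) & (alpha < C - 1e-9)   # margin SVs: 0 < alpha_i < C
b = np.mean(y[sv] - X[sv] @ w)             # average of y^(i) - w^T x^(i)
```

Examples with α_i = 0 drop out of both w and b, which is exactly the sparsity that makes SVC a sparse kernel machine.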
Kernel as Inner Product

We need to evaluate Φ(x^(i))^T Φ(x^(j)) when
  Solving the dual problem of SVC, where K_{i,j} = y^(i) y^(j) Φ(x^(i))^T Φ(x^(j))
  Making a prediction, where f(x) = w^T Φ(x) + b = ∑_i α_i y^(i) Φ(x^(i))^T Φ(x) + b
Time complexity?
If we choose Φ carefully, we can evaluate Φ(x^(i))^T Φ(x) = k(x^(i), x) efficiently
Polynomial kernel: k(a, b) = (a^T b / α + β)^γ
  E.g., let α = 1, β = 1, γ = 2 and a ∈ R², then
  Φ(a) = [1, √2 a_1, √2 a_2, a_1², a_2², √2 a_1 a_2]^T ∈ R⁶
Gaussian RBF kernel: k(a, b) = exp(−γ‖a − b‖²), γ ≥ 0

  k(a, b) = exp(−γ‖a‖² + 2γ a^T b − γ‖b‖²)
          = exp(−γ‖a‖² − γ‖b‖²) (1 + (2γ a^T b)/1! + (2γ a^T b)²/2! + ···)

  Let a ∈ R², then
  Φ(a) = exp(−γ‖a‖²) [1, √(2γ/1!) a_1, √(2γ/1!) a_2, √((2γ)²/2!) a_1², √((2γ)²/2!) a_2², √(2(2γ)²/2!) a_1 a_2, ···]^T ∈ R^∞
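The degree-2 polynomial case above can be checked numerically: the O(D) kernel evaluation and the inner product of the explicit 6-dimensional features agree exactly (the input vectors are arbitrary test values):

```python
import numpy as np

def poly_kernel(a, b):
    """Degree-2 polynomial kernel with alpha = beta = 1: (a^T b + 1)^2."""
    return (a @ b + 1.0) ** 2

def phi(a):
    """Explicit feature map whose inner product reproduces poly_kernel."""
    a1, a2 = a
    s = np.sqrt(2.0)
    return np.array([1.0, s * a1, s * a2, a1**2, a2**2, s * a1 * a2])

a = np.array([0.3, -1.2])
b = np.array([2.0, 0.5])
lhs = poly_kernel(a, b)   # O(D) work
rhs = phi(a) @ phi(b)     # O(D^2) features, same value
```

For the RBF kernel no finite check is possible, since Φ maps into an infinite-dimensional space, but the kernel evaluation is still O(D).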
Kernel Trick

If we choose Φ induced by the polynomial or Gaussian RBF kernel, then

K_i,j = y^(i) y^(j) k(x^(i), x^(j))

takes only O(D) time to evaluate, and

f(x) = ∑_i α_i y^(i) k(x^(i), x) + b

takes O(ND) time.
Independent of the augmented feature dimension.
α, β, and γ are new hyperparameters.
Shan-Hung Wu (CS, NTHU) Non-Parametric Methods & SVM Machine Learning 37 / 42
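The O(ND) prediction can be sketched in plain NumPy (the support vectors and dual weights below are hypothetical toy values, not a fitted model):

```python
import numpy as np

def rbf_kernel(X, x, gamma=0.5):
    # k(x_i, x) = exp(-gamma * ||x_i - x||^2) for every row x_i of X
    return np.exp(-gamma * np.sum((X - x) ** 2, axis=1))

def decision(x, X_sv, y_sv, alpha, b, gamma=0.5):
    # f(x) = sum_i alpha_i y_i k(x_i, x) + b -- O(ND) per query,
    # independent of the (here infinite) dimension of Phi
    return float(np.sum(alpha * y_sv * rbf_kernel(X_sv, x, gamma)) + b)

# Hypothetical support vectors and dual weights, for illustration only:
X_sv = np.array([[0.0, 0.0], [1.0, 1.0]])
y_sv = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
print(decision(np.array([0.0, 0.0]), X_sv, y_sv, alpha, b=0.0))
```

Note that the cost depends only on N and D, never on the dimensionality of the feature space Φ maps into.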
Sparse Kernel Machines

SVC is a kernel machine:

f(x) = ∑_i α_i y^(i) k(x^(i), x) + b

It is surprising that SVC works like K-NN in some sense.
However, SVC is a sparse kernel machine: only the examples with α_i > 0 (the support vectors) contribute to f.
Shan-Hung Wu (CS, NTHU) Non-Parametric Methods & SVM Machine Learning 38 / 42
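A tiny NumPy sketch of this sparsity (the dual weights and kernel values are made-up toy numbers): terms with α_i = 0 vanish, so the prediction sum only needs the support vectors.

```python
import numpy as np

# Hypothetical dual solution and precomputed kernel values k_i = k(x_i, x):
alpha = np.array([0.0, 0.8, 0.0, 0.0, 1.0])
y     = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
k     = np.array([0.9, 0.5, 0.2, 0.7, 0.4])
b     = 0.1

sv = alpha > 0  # only support vectors contribute to f(x)
f_full   = np.sum(alpha * y * k) + b
f_sparse = np.sum(alpha[sv] * y[sv] * k[sv]) + b
assert np.isclose(f_full, f_sparse)  # the non-SV terms vanish
print(int(sv.sum()), "of", len(alpha), "examples are support vectors")
```

This is the contrast with K-NN, which must keep (and scan) the entire training set at prediction time.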
KKT Conditions and Types of SVs

By the KKT conditions, we have:
- Primal feasibility: y^(i)(w^⊤Φ(x^(i)) + b) ≥ 1 − ξ_i and ξ_i ≥ 0
- Complementary slackness: α_i (1 − y^(i)(w^⊤Φ(x^(i)) + b) − ξ_i) = 0 and β_i(−ξ_i) = 0

Depending on the value of α_i, each example x^(i) can be a:
- Non-SV (α_i = 0): y^(i)(w^⊤Φ(x^(i)) + b) ≥ 1 (usually strict)
  1 − y^(i)(w^⊤Φ(x^(i)) + b) − ξ_i ≤ 0
  Since β_i = C − α_i ≠ 0, we have ξ_i = 0
- Free SV (0 < α_i < C): y^(i)(w^⊤Φ(x^(i)) + b) = 1
  1 − y^(i)(w^⊤Φ(x^(i)) + b) − ξ_i = 0
  Since β_i = C − α_i ≠ 0, we have ξ_i = 0
- Bounded SV (α_i = C): y^(i)(w^⊤Φ(x^(i)) + b) ≤ 1 (usually strict)
  1 − y^(i)(w^⊤Φ(x^(i)) + b) − ξ_i = 0
  Since β_i = 0, we have ξ_i ≥ 0
Shan-Hung Wu (CS, NTHU) Non-Parametric Methods & SVM Machine Learning 39 / 42
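The three cases can be read directly off the dual variables. Below is a small helper (a hypothetical utility, with toy α values) that labels each example by comparing α_i against 0 and C:

```python
import numpy as np

def sv_types(alpha, C, tol=1e-8):
    # Classify each example by its dual variable alpha_i:
    #   alpha_i = 0      -> non-SV     (xi = 0, strictly outside the margin)
    #   0 < alpha_i < C  -> free SV    (xi = 0, exactly on the margin)
    #   alpha_i = C      -> bounded SV (xi >= 0, on or inside the margin)
    return np.where(alpha <= tol, "non-SV",
                    np.where(alpha >= C - tol, "bounded SV", "free SV"))

alpha = np.array([0.0, 0.3, 1.0, 0.7, 1.0])  # hypothetical dual solution
print(sv_types(alpha, C=1.0))
```

In practice, b is recovered from the free SVs, since for them y^(i)(w^⊤Φ(x^(i)) + b) = 1 holds exactly.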
Remarks I

Pros of SVC:
- Global optimality (convex problem)
- Works with different kernels (linear, polynomial, Gaussian RBF, etc.)
- Works well with small training sets

Cons:
- Nonlinear SVC is not scalable to large tasks
  Takes O(N²) ∼ O(N³) time to train using SMO in LIBSVM [1]
  On the other hand, linear SVC takes O(ND) time
- Kernel matrix K requires O(N²) space
  In practice, we cache only a small portion of K in memory
- Sensitive to irrelevant data features (vs. decision trees)
- Non-trivial hyperparameter tuning
  The effect of a (C, γ) combination is unknown in advance
  Usually done by grid search
- Separates only 2 classes
  Usually wrapped by the 1-vs-1 technique for multi-class classification
Shan-Hung Wu (CS, NTHU) Non-Parametric Methods & SVM Machine Learning 40 / 42
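The grid search mentioned above is just an exhaustive sweep over (C, γ) pairs scored by validation accuracy. A minimal sketch, where `cv_score` is a toy surrogate standing in for an actual cross-validated SVC run:

```python
from itertools import product

def cv_score(C, gamma):
    # Stand-in for the k-fold cross-validation accuracy of an RBF SVC
    # trained with (C, gamma); a real run would fit and score a model here.
    return -(abs(C - 1.0) + abs(gamma - 0.1))  # toy surrogate, peaks at (1.0, 0.1)

# Grids are typically log-spaced, since C and gamma act multiplicatively:
C_grid = [0.1, 1.0, 10.0, 100.0]
gamma_grid = [0.01, 0.1, 1.0]
best_C, best_gamma = max(product(C_grid, gamma_grid),
                         key=lambda p: cv_score(*p))
print(best_C, best_gamma)
```

Because each grid point is independent, the sweep parallelizes trivially; this is how libraries implement it.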
Remarks II

Does nonlinear SVC always perform better than linear SVC? No.

Choose linear SVC (e.g., LIBLINEAR [2]) when:
- N is large (since nonlinear SVC does not scale), or
- D is large (since the classes may already be linearly separable)
Shan-Hung Wu (CS, NTHU) Non-Parametric Methods & SVM Machine Learning 41 / 42
Reference I

[1] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.

[2] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.

[3] John Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. 1998.
Shan-Hung Wu (CS, NTHU) Non-Parametric Methods & SVM Machine Learning 42 / 42