Support Vector Machines
Transcript of lecture slides: jun.hansung.ac.kr/ML/docs-slides-Lecture12-kr.pdf (2016)


Support Vector Machines

Optimization objective

Machine Learning

Andrew Ng

Alternative view of logistic regression

Hypothesis: h_θ(x) = 1 / (1 + e^(−θᵀx)).

If y = 1, we want h_θ(x) ≈ 1, i.e. θᵀx ≫ 0.
If y = 0, we want h_θ(x) ≈ 0, i.e. θᵀx ≪ 0.


Alternative view of logistic regression

Cost of a single example: −( y log h_θ(x) + (1 − y) log(1 − h_θ(x)) ).

If y = 1 (want θᵀx ≫ 0): the term −log(1/(1 + e^(−θᵀx))) is replaced by a piecewise-linear approximation, cost_1(θᵀx).
If y = 0 (want θᵀx ≪ 0): the term −log(1 − 1/(1 + e^(−θᵀx))) is replaced by a piecewise-linear approximation, cost_0(θᵀx).


Support vector machine

Logistic regression:
min_θ (1/m) Σ_{i=1..m} [ y^(i) (−log h_θ(x^(i))) + (1 − y^(i)) (−log(1 − h_θ(x^(i)))) ] + (λ/2m) Σ_{j=1..n} θ_j²

Support vector machine:
min_θ C Σ_{i=1..m} [ y^(i) cost_1(θᵀx^(i)) + (1 − y^(i)) cost_0(θᵀx^(i)) ] + (1/2) Σ_{j=1..n} θ_j²


SVM hypothesis

Hypothesis: h_θ(x) = 1 if θᵀx ≥ 0, and 0 otherwise.
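The slides define cost_1 and cost_0 only through their plots. A minimal Octave sketch of the objective above, assuming the usual hinge-shaped choice (the names cost1, cost0 and svm_cost are mine, not from the lecture), might look like this:

% Piecewise-linear costs drawn on the earlier slide (hinge form, an assumption):
cost1 = @(z) max(0, 1 - z);    % cost used when y = 1; zero once z = theta'*x >= 1
cost0 = @(z) max(0, 1 + z);    % cost used when y = 0; zero once z <= -1

% SVM objective for parameters theta, data X (m x n, one example per row,
% first column all ones) and labels y (m x 1, values 0 or 1):
svm_cost = @(theta, X, y, C) ...
    C * sum(y .* cost1(X * theta) + (1 - y) .* cost0(X * theta)) ...
    + 0.5 * sum(theta(2:end) .^ 2);    % the intercept theta(1) is not regularized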

Large Margin Intuition


Support Vector Machine

If y = 1, we want θᵀx ≥ 1 (not just ≥ 0).
If y = 0, we want θᵀx ≤ −1 (not just < 0).

[Plots of cost_1(z) and cost_0(z): cost_1 is zero for z ≥ 1, cost_0 is zero for z ≤ −1.]


SVM Decision Boundary

With C very large, the optimization is driven to make the bracketed cost term zero, which effectively leaves:

min_θ (1/2) Σ_{j=1..n} θ_j²
subject to  θᵀx^(i) ≥ 1   whenever y^(i) = 1
            θᵀx^(i) ≤ −1  whenever y^(i) = 0

[Plots of cost_1(z) and cost_0(z) with the thresholds z = 1 and z = −1 marked.]


SVM Decision Boundary: Linearly separable case

[Figure: linearly separable data in the (x1, x2) plane with the large-margin decision boundary.]

Large margin classifier


Large margin classifier in presence of outliers

[Figure: data in the (x1, x2) plane with a single outlier; the slide contrasts the boundary obtained with a very large C against the one obtained with a more moderate C.]

The mathematics behind large margin classification (optional)


Vector Inner Product

For u = [u1; u2] and v = [v1; v2]: ||u|| = sqrt(u1² + u2²) is the length of u, and
uᵀv = u1 v1 + u2 v2 = p · ||u|| = vᵀu,
where p is the signed length of the projection of v onto u (negative when the angle between them exceeds 90°).
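A quick numeric check of this identity in Octave (the vectors are my own example, not the slide's):

% Verify u'*v = p * ||u|| for a small example.
u = [2; 3];
v = [4; 1];
p = v' * (u / norm(u));      % signed length of the projection of v onto u
disp(u' * v)                 % 11
disp(p * norm(u))            % 11 as well, matching u'*v = p * ||u||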


SVM Decision Boundary

min_θ (1/2) Σ_{j=1..n} θ_j² = (1/2) ||θ||²
subject to  θᵀx^(i) ≥ 1 if y^(i) = 1,  θᵀx^(i) ≤ −1 if y^(i) = 0   (taking θ_0 = 0 for simplicity).

Writing θᵀx^(i) = p^(i) · ||θ||, where p^(i) is the projection of x^(i) onto θ, the constraints become p^(i) ||θ|| ≥ 1 (or ≤ −1). To satisfy them while keeping ||θ|| small, the projections p^(i) must be large in magnitude, which is why the optimal boundary leaves a large margin.

Kernels I

Non-linear decision boundary

[Figure: non-linearly separable data in the (x1, x2) plane.]

Predict y = 1 if θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1 x_2 + θ_4 x_1² + θ_5 x_2² + … ≥ 0, i.e. use high-order polynomial features f_1 = x_1, f_2 = x_2, f_3 = x_1 x_2, f_4 = x_1², …

Is there a different / better choice of the features f_1, f_2, f_3, …?


Kernel

[Figure: three landmarks l^(1), l^(2), l^(3) marked in the (x1, x2) plane.]

Given x, compute new features depending on proximity to the landmarks l^(1), l^(2), l^(3):
f_1 = similarity(x, l^(1)),  f_2 = similarity(x, l^(2)),  f_3 = similarity(x, l^(3)).


Kernels and Similarity

f_1 = similarity(x, l^(1)) = exp( −||x − l^(1)||² / (2σ²) )   (the Gaussian kernel)

If x ≈ l^(1): f_1 ≈ 1.  If x is far from l^(1): f_1 ≈ 0.


Example:

[Figure: surface and contour plots of f_1 = exp(−||x − l^(1)||² / (2σ²)) for several values of σ²; a larger σ² gives a wider, smoother bump around l^(1).]
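A small Octave illustration of that effect (the landmark and test points are my own, not the slide's):

% How sigma^2 changes the Gaussian similarity f1 = exp(-||x - l1||^2 / (2*sigma^2)).
l1 = [3; 5];                            % an arbitrary landmark
x_near = [3.5; 5];                      % a point close to l1
x_far  = [7; 1];                        % a point far from l1
for sigma2 = [0.5, 1, 3]
  f_near = exp(-sum((x_near - l1) .^ 2) / (2 * sigma2));
  f_far  = exp(-sum((x_far  - l1) .^ 2) / (2 * sigma2));
  fprintf('sigma^2 = %.1f: f(near) = %.3f, f(far) = %.6f\n', sigma2, f_near, f_far);
end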



Kernels II

Choosing the landmarks

[Figure: landmarks l^(1), l^(2), l^(3) in the (x1, x2) plane.]

Given x: f_1 = similarity(x, l^(1)), f_2 = similarity(x, l^(2)), f_3 = similarity(x, l^(3)).
Predict y = 1 if θ_0 + θ_1 f_1 + θ_2 f_2 + θ_3 f_3 ≥ 0.

Where to get l^(1), l^(2), l^(3), …?


SVM with Kernels

Given (x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m)), choose the landmarks l^(1) = x^(1), l^(2) = x^(2), …, l^(m) = x^(m).

Given example x: f_1 = similarity(x, l^(1)), f_2 = similarity(x, l^(2)), …, collected into f = [f_0; f_1; …; f_m] with f_0 = 1.

For training example (x^(i), y^(i)): compute f^(i) in the same way (its i-th entry is similarity(x^(i), l^(i)) = 1).


SVM with Kernels

Hypothesis: Given x, compute features f ∈ ℝ^(m+1). Predict “y = 1” if θᵀf ≥ 0.

Training:
min_θ C Σ_{i=1..m} [ y^(i) cost_1(θᵀf^(i)) + (1 − y^(i)) cost_0(θᵀf^(i)) ] + (1/2) Σ_{j=1..m} θ_j²
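A minimal Octave sketch of the feature mapping described above, using the training examples themselves as the landmarks as chosen on the previous slide (the helper name gaussian_features and the code layout are mine, not from the slides):

% Map examples onto Gaussian-kernel features, using the rows of L as landmarks.
% X is m x n (one example per row); returns F of size m x (k+1), first column = 1.
function F = gaussian_features(X, L, sigma)
  m = size(X, 1);
  k = size(L, 1);
  F = zeros(m, k);
  for i = 1:m
    for j = 1:k
      F(i, j) = exp(-sum((X(i, :) - L(j, :)) .^ 2) / (2 * sigma^2));
    end
  end
  F = [ones(m, 1), F];    % prepend the intercept feature f_0 = 1
end

% Usage sketch: with landmarks = training examples and a trained theta of length m+1,
%   F = gaussian_features(X_train, X_train, sigma);     % training features
%   f = gaussian_features(x_new(:)', X_train, sigma);   % 1 x (m+1) features for a new x
%   prediction = (f * theta) >= 0;                      % predict y = 1 if theta'*f >= 0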


SVM parameters:

C ( = 1/λ ).
  Large C: lower bias, high variance.
  Small C: higher bias, low variance.

σ².
  Large σ²: the features f_i vary more smoothly. Higher bias, lower variance.
  Small σ²: the features f_i vary less smoothly. Lower bias, higher variance.


Using an SVM


Use an SVM software package (e.g. liblinear, libsvm, …) to solve for the parameters θ.

Need to specify:
  Choice of parameter C.
  Choice of kernel (similarity function), e.g.:
    No kernel (“linear kernel”): predict “y = 1” if θᵀx ≥ 0.
    Gaussian kernel: f_i = exp( −||x − l^(i)||² / (2σ²) ), where l^(i) = x^(i). Need to choose σ².
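As a hedged sketch of what such a call can look like with libsvm's Octave/MATLAB interface (svmtrain/svmpredict; option strings may differ across versions, and the variable names are mine):

% X is m x n (one example per row); y and ytest are column vectors of labels.
C = 1;                          % the regularization parameter C from the slides
sigma = 0.3;                    % Gaussian kernel width
gamma = 1 / (2 * sigma^2);      % libsvm writes the RBF kernel as exp(-gamma * ||x - l||^2)

% '-s 0' = C-SVC, '-t 2' = RBF (Gaussian) kernel; '-t 0' would be the linear ("no") kernel.
model = svmtrain(y, X, sprintf('-s 0 -t 2 -c %g -g %g', C, gamma));

% Predict on new data (ytest is used only to let svmpredict report accuracy).
pred = svmpredict(ytest, Xtest, model);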


Kernel (similarity) functions:

function f = kernel(x1, x2)
  % Gaussian (RBF) similarity between the feature vectors x1 and x2.
  sigma = 1.0;                                      % kernel width; tune via cross-validation
  f = exp(-sum((x1 - x2) .^ 2) / (2 * sigma^2));
end

Note: perform feature scaling before using the Gaussian kernel.
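A minimal sketch of that preprocessing step (my own code, using simple mean/variance scaling):

% Scale each feature of X (m x n) to zero mean and unit standard deviation
% before computing Gaussian-kernel features, so that no single feature
% (e.g. house size in square feet vs. number of bedrooms) dominates ||x - l||^2.
mu = mean(X);                  % 1 x n vector of per-feature means
s  = std(X);                   % 1 x n vector of per-feature standard deviations
X_scaled = (X - mu) ./ s;      % relies on Octave's automatic broadcasting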



Other choices of kernel

Note: not every similarity function similarity(x, l) makes a valid kernel.

(It needs to satisfy a technical condition called “Mercer’s Theorem”, so that the optimization inside SVM packages runs correctly and does not diverge.)

Many off-the-shelf kernels are available:

- Polynomial kernel: k(x, l) = (xᵀl + constant)^degree, e.g. (xᵀl)², (xᵀl + 1)³.

- More esoteric: string kernel, chi-square kernel, histogram intersection kernel, …


Multi-class classification

Many SVM packages already have built-in multi-class classification functionality.

Otherwise, use the one-vs-all method:

Train K SVMs, one to distinguish y = i from the rest, for i = 1, 2, …, K, and get θ^(1), θ^(2), …, θ^(K).

Pick the class i with the largest (θ^(i))ᵀ x.
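A small Octave sketch of that prediction rule (Theta and x are my own notation, not from the slides):

% Theta is K x (n+1): row i holds theta^(i) from the SVM trained for class i vs. the rest.
% x is an (n+1) x 1 feature vector whose first entry is the intercept term 1.
scores = Theta * x;                    % (theta^(i))' * x for every class i
[~, predicted_class] = max(scores);    % pick the class with the largest score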


Logistic regression vs. SVMs

n = number of features (x ∈ ℝ^(n+1)), m = number of training examples.

If n is large (relative to m): use logistic regression, or an SVM without a kernel (“linear kernel”).

If n is small and m is intermediate: use an SVM with a Gaussian kernel.

If n is small and m is large: create/add more features, then use logistic regression or an SVM without a kernel.

A neural network is likely to work well for most of these settings, but may be slower to train.