Support vector machines: Maximum-margin linear classifiers
Topics we’ll cover
1. The margin of a linear classifier
2. Maximizing the margin
3. A convex optimization problem
4. Support vectors
Improving upon the Perceptron
For a linearly separable data set there are, in general, many possible separating hyperplanes, and the Perceptron is guaranteed to find one of them.
Is there a better, more systematic choice of separator? The one with the most buffer around it, for instance?
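For reference, here is a minimal sketch of the Perceptron mentioned above, in its textbook form (the data matrix `X` and label vector `y` in $\{-1,+1\}$ are assumed inputs; learning rate 1 is the classic choice):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Classic Perceptron: returns some separating (w, b) if one exists."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            # A point is misclassified when y * (w.x + b) <= 0.
            if y[i] * (X[i] @ w + b) <= 0:
                w += y[i] * X[i]   # nudge the hyperplane toward the point
                b += y[i]
                mistakes += 1
        if mistakes == 0:          # converged: every point strictly separated
            break
    return w, b
```

For separable data this stops at *some* separator; which one depends on the order of the updates, which is exactly the arbitrariness the margin criterion below removes.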
The learning problem
Given: training data $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)}) \in \mathbb{R}^d \times \{-1, +1\}$.
Find: $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$ such that $y^{(i)}(w \cdot x^{(i)} + b) > 0$ for all $i$.
By scaling $w$ and $b$, we can equivalently ask for

$$y^{(i)}(w \cdot x^{(i)} + b) \ge 1 \quad \text{for all } i.$$
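The scaling step, spelled out: a strictly separating $(w, b)$ can be divided by the smallest value it attains on the training set, which makes that smallest value exactly 1:

$$\gamma_0 := \min_i\, y^{(i)}\bigl(w \cdot x^{(i)} + b\bigr) > 0, \qquad (w', b') := \Bigl(\tfrac{w}{\gamma_0}, \tfrac{b}{\gamma_0}\Bigr) \;\Longrightarrow\; y^{(i)}\bigl(w' \cdot x^{(i)} + b'\bigr) \ge 1 \ \text{for all } i.$$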
Maximizing the margin
Given: training data $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)}) \in \mathbb{R}^d \times \{-1, +1\}$.
Find: $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$ such that $y^{(i)}(w \cdot x^{(i)} + b) \ge 1$ for all $i$.
[Figure: the separating hyperplane $w \cdot x + b = 0$ with the parallel boundaries $w \cdot x + b = 1$ and $w \cdot x + b = -1$; the margin $\gamma$ is the distance from the separator to either boundary.]
Maximize the margin γ.
A formula for the margin
Close-up of a point $z$ on the positive boundary.

[Figure: close-up of the hyperplanes $w \cdot x + b = 0$ and $w \cdot x + b = 1$, with $z$ on the positive boundary at distance $\gamma$ from the separator.]
A quick calculation shows that $\gamma = 1/\|w\|$. In short: to maximize the margin, minimize $\|w\|$.
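Spelled out, the calculation takes one line. Write $z = x_0 + \gamma\, w/\|w\|$, where $x_0$ is the projection of $z$ onto the separator and $w/\|w\|$ is the unit normal:

$$1 = w \cdot z + b = \underbrace{(w \cdot x_0 + b)}_{=\,0} + \gamma\,\frac{w \cdot w}{\|w\|} = \gamma\,\|w\| \quad\Longrightarrow\quad \gamma = \frac{1}{\|w\|}.$$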
Maximum-margin linear classifier
• Given $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)}) \in \mathbb{R}^d \times \{-1, +1\}$:

$$\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R}} \;\|w\|^2 \quad \text{s.t. } y^{(i)}(w \cdot x^{(i)} + b) \ge 1 \text{ for all } i = 1, 2, \ldots, n$$
• This is a convex optimization problem:
  • Convex objective function
  • Linear constraints
• This means that:
  • the optimal solution can be found efficiently
  • duality gives us information about the solution
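To make this concrete, here is a minimal sketch that hands the quadratic program above to a generic convex solver, assuming the cvxpy library and a made-up 2-D toy data set:

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data: two points per class in R^2.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# min ||w||^2  s.t.  y_i (w.x_i + b) >= 1  -- convex objective, linear constraints.
constraints = [cp.multiply(y, X @ w + b) >= 1]
problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
problem.solve()

print("w =", w.value, "b =", b.value)
print("margin =", 1 / np.linalg.norm(w.value))  # gamma = 1 / ||w||
```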
Support vectors
Support vectors: training points right on the margin, i.e. $y^{(i)}(w \cdot x^{(i)} + b) = 1$.
[Figure: the separating hyperplane $w \cdot x + b = 0$, with the support vectors highlighted; $\alpha_i$ is nonzero only for these support vectors.]
The optimal weight vector is $w = \sum_{i=1}^{n} \alpha_i y^{(i)} x^{(i)}$, where the $\alpha_i$ are the dual variables, so $w$ is a function of just the support vectors.
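To see this property concretely, a short sketch with scikit-learn: for a linear kernel, `SVC` exposes the support vectors directly, and its `dual_coef_` attribute stores the products $\alpha_i y^{(i)}$, so $w$ can be rebuilt from the support vectors alone (the blob data is a made-up stand-in for a separable training set):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two well-separated Gaussian blobs -> linearly separable data.
X, y = make_blobs(n_samples=40, centers=2, random_state=0)
y = 2 * y - 1  # relabel {0,1} -> {-1,+1}

clf = SVC(kernel="linear", C=1e6)  # very large C approximates the hard margin
clf.fit(X, y)

# w = sum_i alpha_i y_i x_i, summed over the support vectors only.
w_from_sv = clf.dual_coef_ @ clf.support_vectors_
print("support vectors:", len(clf.support_vectors_), "of", len(X))
print("w recovered from support vectors:", np.allclose(w_from_sv, clf.coef_))
```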
Small example: Iris data set
Fisher’s iris data
150 data points from three classes:
• iris setosa
• iris versicolor
• iris virginica
Four measurements: petal width/length, sepal width/length
Small example: Iris data set
Two features: sepal width, petal width. Two classes: setosa (red circles), versicolor (black triangles).
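A sketch of this example with scikit-learn, assuming the standard iris encoding (classes 0 = setosa, 1 = versicolor; sepal width and petal width are feature columns 1 and 3):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
mask = iris.target < 2                 # keep setosa (0) and versicolor (1)
X = iris.data[mask][:, [1, 3]]         # sepal width, petal width
y = 2 * iris.target[mask] - 1          # relabel {0,1} -> {-1,+1}

clf = SVC(kernel="linear", C=1e6)      # these two classes are linearly separable
clf.fit(X, y)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("margin =", 1 / np.linalg.norm(clf.coef_))
```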
Soft maximum-margin linear classifier
• Given $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)}) \in \mathbb{R}^d \times \{-1, +1\}$:

$$\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R},\; \xi \in \mathbb{R}^n} \;\|w\|^2 + C \sum_i \xi_i \quad \text{s.t. } y^{(i)}(w \cdot x^{(i)} + b) \ge 1 - \xi_i \text{ for all } i = 1, 2, \ldots, n, \qquad \xi \ge 0$$
• Allows for violation of constraints:
  • Model pays for violations via the slack variables $\xi_i$
  • Works with non-separable data!
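A minimal sketch of the soft margin in practice: the `C` passed to scikit-learn's `SVC` plays the same trade-off role as the $C$ in the objective above, and the overlapping blob data is made up purely to force nonzero slacks:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Overlapping blobs: no hyperplane separates these perfectly.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=3.0, random_state=0)
y = 2 * y - 1  # relabel {0,1} -> {-1,+1}

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 1 / np.linalg.norm(clf.coef_)
    # Small C tolerates violations (wider margin, more support vectors);
    # large C punishes them (narrower margin, fewer support vectors).
    print(f"C={C:>6}: margin={margin:.3f}, "
          f"support vectors={len(clf.support_vectors_)}")
```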