Support vector machines: Maximum-margin linear classifiers
(Source: cseweb.ucsd.edu/classes/wi19/cse151-b/SVM.pdf)


Topics we’ll cover

1 The margin of a linear classifier

2 Maximizing the margin

3 A convex optimization problem

4 Support vectors


Improving upon the Perceptron

For a linearly separable data set, there are in general many possible separating hyperplanes, and the Perceptron is guaranteed to find one of them.

Is there a better, more systematic choice of separator? The one with the most buffer around it, for instance?


The learning problem

Given: training data $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)}) \in \mathbb{R}^d \times \{-1, +1\}$.
Find: $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$ such that $y^{(i)}(w \cdot x^{(i)} + b) > 0$ for all $i$.

By rescaling $w$ and $b$, we can equivalently ask for

$$y^{(i)}(w \cdot x^{(i)} + b) \geq 1 \quad \text{for all } i.$$
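Why the rescaling is harmless (a short step the slide leaves implicit): if $(w, b)$ strictly separates the data, divide both parameters by the smallest constraint value over the training set.

```latex
% Sketch of the rescaling argument: m is the smallest (positive) constraint value.
m = \min_{i}\; y^{(i)}\bigl(w \cdot x^{(i)} + b\bigr) > 0
\quad\Longrightarrow\quad
y^{(i)}\!\Bigl(\tfrac{w}{m} \cdot x^{(i)} + \tfrac{b}{m}\Bigr)
  = \tfrac{1}{m}\, y^{(i)}\bigl(w \cdot x^{(i)} + b\bigr) \;\ge\; 1
\quad \text{for all } i.
```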



Maximizing the margin

Given: training data $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)}) \in \mathbb{R}^d \times \{-1, +1\}$.
Find: $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$ such that

$$y^{(i)}(w \cdot x^{(i)} + b) \geq 1 \quad \text{for all } i.$$

[Figure: a separating hyperplane w · x + b = 0 with parallel margin boundaries w · x + b = 1 and w · x + b = −1; the margin γ is the distance from the hyperplane to either boundary.]

Maximize the margin γ.



A formula for the margin

Close-up of a point z on the positive boundary.

[Figure: close-up of the point z on the boundary w · x + b = 1 and its distance to the hyperplane w · x + b = 0.]

A quick calculation shows that γ = 1/‖w‖. In short: to maximize the margin, minimize ‖w‖.
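As a sketch of that calculation (a step the slide leaves implicit): the margin is the distance from z, which satisfies w · z + b = 1, to the hyperplane w · x + b = 0, using the standard point-to-hyperplane distance formula.

```latex
% Margin calculation (sketch): distance from z on the positive boundary to the hyperplane.
\gamma \;=\; \frac{\lvert w \cdot z + b \rvert}{\lVert w \rVert}
       \;=\; \frac{1}{\lVert w \rVert}.
```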


Maximum-margin linear classifier

• Given $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)}) \in \mathbb{R}^d \times \{-1, +1\}$, solve

$$\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R}} \;\; \lVert w \rVert^2 \qquad \text{s.t. } y^{(i)}(w \cdot x^{(i)} + b) \geq 1 \text{ for all } i = 1, 2, \ldots, n.$$

• This is a convex optimization problem:

  • Convex objective function
  • Linear constraints

• This means that:

  • the optimal solution can be found efficiently (a solver sketch follows below)
  • duality gives us information about the solution
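The slides stop at the formulation; as one possible illustration of "the optimal solution can be found efficiently", here is a minimal sketch that hands this exact convex program to a generic solver. The cvxpy library and the toy data are my own choices, not part of the lecture.

```python
# A minimal sketch (illustration only, not from the slides) of solving the
# hard-margin SVM as the convex program above, using the cvxpy library.
import numpy as np
import cvxpy as cp

# Toy linearly separable data: x in R^2, labels y in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])
n, d = X.shape

w = cp.Variable(d)
b = cp.Variable()

# Objective: minimize ||w||^2; constraints: y_i (w . x_i + b) >= 1 for all i.
objective = cp.Minimize(cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, " b =", b.value)
print("margin = 1/||w|| =", 1.0 / np.linalg.norm(w.value))
```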




Support vectors

Support vectors: training points that lie right on the margin, i.e. $y^{(i)}(w \cdot x^{(i)} + b) = 1$.

[Figure: the decision boundary w · x + b = 0 with its margin boundaries; the dual coefficient α_i is nonzero only for these support vectors.]

$w = \sum_{i=1}^{n} \alpha_i y^{(i)} x^{(i)}$ is a function of just the support vectors.
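The slides do not show code for this; as a hedged illustration, the sketch below assumes scikit-learn's SVC, whose dual_coef_ attribute stores the products α_i y^(i) for the support vectors, so the weight vector can be rebuilt from the support vectors alone.

```python
# Sketch (assumed library: scikit-learn): fit a linear SVM and confirm that the
# weight vector is a combination of the support vectors only.
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin problem on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("support vectors:\n", clf.support_vectors_)
# dual_coef_ holds alpha_i * y_i for each support vector, so the sum below
# reconstructs w = sum_i alpha_i y_i x_i using the support vectors alone.
w_from_duals = clf.dual_coef_ @ clf.support_vectors_
print("w (from duals):", w_from_duals.ravel())
print("w (from model):", clf.coef_.ravel())
```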


Small example: Iris data set

Fisher’s iris data

150 data points from three classes:

• iris setosa

• iris versicolor

• iris virginica

Four measurements: petal width/length, sepal width/length
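As a quick, optional illustration (assuming scikit-learn's bundled copy of the data set, which the slides do not mention), loading it confirms the 150 × 4 layout described above.

```python
# Sketch: load Fisher's iris data (assumed source: scikit-learn's bundled copy).
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)      # (150, 4): four measurements per flower
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica']
print(iris.feature_names)   # sepal/petal length and width (cm)
```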


Small example: Iris data set

Two features: sepal width, petal width. Two classes: setosa (red circles), versicolor (black triangles).
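The slide shows this subset as a scatter plot; as a stand-in for the missing figure, the sketch below (again assuming scikit-learn; the feature columns and class relabeling are my reading of the slide) sets up the same two-feature, two-class problem and fits a linear SVM.

```python
# Sketch: the two-feature, two-class subset from the slide, with a linear SVM fit.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
mask = iris.target < 2                        # keep setosa (0) and versicolor (1)
X = iris.data[mask][:, [1, 3]]                # columns 1, 3: sepal width, petal width
y = np.where(iris.target[mask] == 0, -1, 1)   # relabel classes as {-1, +1}

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin
print("w =", clf.coef_.ravel(), " b =", clf.intercept_[0])
print("number of support vectors:", len(clf.support_vectors_))
```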



Soft maximum-margin linear classifier

• Given $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)}) \in \mathbb{R}^d \times \{-1, +1\}$, solve

$$\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R},\; \xi \in \mathbb{R}^n} \;\; \lVert w \rVert^2 + C \sum_i \xi_i \qquad \text{s.t. } y^{(i)}(w \cdot x^{(i)} + b) \geq 1 - \xi_i \text{ for all } i = 1, 2, \ldots, n, \quad \xi \geq 0.$$

• Allows for violation of constraints:
  • Model pays for violation via slack variables $\xi_i$
  • Works with non-separable data! (see the sketch below)
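As with the hard-margin program, this can be handed directly to a generic convex solver. The sketch below reuses the assumed cvxpy setup with made-up data containing one deliberately misplaced point, so the slack variables become visible.

```python
# Sketch: soft-margin SVM as the convex program above (assumed library: cvxpy).
import numpy as np
import cvxpy as cp

# Toy data that is NOT linearly separable: the last point is on the wrong side.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-1.0, -1.0], [-2.0, -1.5], [2.0, 1.5]])
y = np.array([1, 1, 1, -1, -1, -1])
n, d = X.shape
C = 1.0                                   # price paid per unit of slack

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(n, nonneg=True)          # slack variables, one per point

objective = cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi]
cp.Problem(objective, constraints).solve()

print("w =", w.value, " b =", b.value)
print("slacks =", np.round(xi.value, 3))  # nonzero only where the margin is violated
```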
