Lecture 12 – Part 01: Support Vector Machines
Transcript of lecture slides (sierra.ucsd.edu/cse151a-2020-sp/lectures/12-svm/build/lecture.pdf)
Lecture 12 – Part 01: Support Vector Machines
Linear Classifiers
▶ Prediction rule: H(x) = w · Aug(x)
▶ Predict class 1 if H(x) > 0
▶ Predict class −1 if H(x) < 0
Decision Boundary
▶ Aug(x) · w is proportional to the distance from the boundary.
Recall: Logistic Regression
▶ Prediction Rule: H(x) = σ(w · Aug(x))
▶ Find w by maximizing the log-likelihood.
▶ Predict class 1 if H(x) > 0.5, class −1 otherwise.
▶ But σ(w · Aug(x)) > 0.5 ⟺ w · Aug(x) > 0
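Since the sigmoid crosses 0.5 exactly where the linear score crosses 0, both rules make the same predictions. A quick numerical check (function and variable names here are illustrative, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def augment(x):
    """Aug(x): prepend a 1 for the bias term."""
    return np.concatenate(([1.0], x))

def predict_logistic(w, x):
    """Class 1 if sigma(w . Aug(x)) > 0.5, else class -1."""
    return 1 if sigmoid(w @ augment(x)) > 0.5 else -1

def predict_linear(w, x):
    """Class 1 if w . Aug(x) > 0, else class -1."""
    return 1 if w @ augment(x) > 0 else -1
```

On any input, the two prediction functions agree, which is the equivalence stated above.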
Recall: the Perceptron
▶ Prediction Rule: H(x) = w · Aug(x)
▶ Find w by minimizing the perceptron risk.
▶ Theorem: if the training data is linearly separable, the perceptron algorithm finds a dividing hyperplane.
Perceptron Problems
The learned perceptron may have a small margin.
We prefer large margins for generalization.
Maximum Margin Classifiers
▶ Assume: linear separability (for now).
▶ Many possible boundaries with zero error.
▶ Goal: find the linear boundary with the largest margin w.r.t. the training data.
Observation
▶ Training data: {(x^(i), y_i)}
▶ Classification is correct when:

    w · Aug(x^(i)) > 0, if y_i = 1
    w · Aug(x^(i)) < 0, if y_i = −1

▶ Equivalently, classification is correct if:

    y_i w · Aug(x^(i)) > 0
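The two-case condition and the single-product condition can be checked against each other numerically; a minimal sketch (names are illustrative):

```python
import numpy as np

def augment(x):
    """Aug(x): prepend a 1 for the bias term."""
    return np.concatenate(([1.0], x))

def correct_two_cases(w, x, y):
    """The two-case correctness condition from the slide."""
    score = w @ augment(x)
    return score > 0 if y == 1 else score < 0

def correct_one_line(w, x, y):
    """The equivalent single condition: y * (w . Aug(x)) > 0."""
    return y * (w @ augment(x)) > 0
```

Multiplying by y_i = ±1 flips the inequality exactly when the label is −1, which is why the two forms agree.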
Recall
▶ y_i w · Aug(x^(i)) is proportional to the distance from the boundary.
▶ Our goal: find the w that maximizes the smallest distance:

    w_best = argmax_{w ∈ ℝ^(d+1)} min_{i ∈ 1,…,n} [ y_i w · Aug(x^(i)) ]
▶ This looks hard. But there is a trick.
Another Observation
▶ If linearly separable, then there is a w such that

    y_i w · Aug(x^(i)) > 0

for all i = 1, …, n.
▶ Actually, linearly separable ⟹ there is a w such that

    y_i w · Aug(x^(i)) ≥ 1

for all i = 1, …, n.
Why?
▶ Suppose w separates, but y_i w · Aug(x^(i)) = 0.01.
▶ Define w̃ = (1/0.01) w = 100 w.
▶ Then y_i w̃ · Aug(x^(i)) = 1.
▶ Note: ‖w̃‖ is large!
Why?
▶ Suppose w separates, but y_i w · Aug(x^(i)) = 0.5.
▶ Define w̃ = (1/0.5) w = 2 w.
▶ Then y_i w̃ · Aug(x^(i)) = 1.
▶ Note: ‖w̃‖ is smaller!
The Trick
▶ We will demand that
    y_i w · Aug(x^(i)) ≥ 1

▶ The larger ‖w‖, the smaller the margin.
▶ New Goal: minimize ‖w‖² subject to y_i w · Aug(x^(i)) ≥ 1 for all i.
Optimization
▶ Minimize ‖w‖² subject to y_i w · Aug(x^(i)) ≥ 1 for all i.
▶ This is a convex, quadratic optimization problem.
▶ It can be solved efficiently with quadratic programming.
▶ But there is no exact general formula for the solution.
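As a sketch of what a solver does here, the constrained problem can be handed to a general-purpose optimizer such as SciPy's SLSQP; a dedicated quadratic-programming solver would be the usual choice in practice, and the data and function names below are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def hard_margin_svm(X, y):
    """Minimize ||w||^2 subject to y_i * (w . Aug(x_i)) >= 1 for all i.

    A sketch using SciPy's SLSQP; assumes the data is linearly separable.
    """
    A = np.hstack([np.ones((len(X), 1)), X])        # rows are Aug(x_i)
    constraint = {"type": "ineq", "fun": lambda w: y * (A @ w) - 1}
    result = minimize(lambda w: w @ w, x0=np.zeros(A.shape[1]),
                      method="SLSQP", constraints=[constraint])
    return result.x
```

On separable toy data this recovers a separating w with every margin at least 1; the points whose constraints hold with equality are the support vectors.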
Support Vectors
▶ A support vector is a training point x^(i) such that

    y_i w · Aug(x^(i)) = 1
Support Vector Machines (SVMs)
▶ The maximum margin solution w is a linear combination of the support vectors.
▶ Let S be the set of support vectors. Then

    w = Σ_{i ∈ S} y_i α_i Aug(x^(i))
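For a linear kernel, scikit-learn exposes exactly this decomposition: `dual_coef_` stores the products y_i α_i for the support vectors, so the weight vector is their combination (scikit-learn keeps the bias term separate, so no Aug is involved). A small check on illustrative toy data:

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data (illustrative).
X = np.array([[-2., 0.], [-1., -1.], [2., 0.], [1., 1.]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)      # very large C ~ hard margin

# dual_coef_ holds y_i * alpha_i for each support vector, so the weight
# vector is a linear combination of the support vectors alone.
w = clf.dual_coef_ @ clf.support_vectors_
```

The recovered `w` matches the fitted `coef_`, confirming that only the support vectors determine the boundary.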
Example: Irises
▶ 3 classes: iris setosa, iris versicolor, iris virginica
▶ 4 measurements: petal length/width, sepal length/width
Example: Irises
▶ Using only sepal width/petal width
▶ Two classes: versicolor (black), setosa (red)
Lecture 12 – Part 02: Soft-Margin SVMs
Non-Separability
▶ So far we’ve assumed data is linearly separable.
▶ What if it isn’t?
The Problem
▶ Old Goal: minimize ‖w‖² subject to y_i w · Aug(x^(i)) ≥ 1 for all i.
▶ This no longer makes sense.
Cut Some Slack
▶ Idea: allow classification i to be ξ_i wrong, but not too wrong.
Cut Some Slack
▶ New problem. Fix some number C ≥ 0:

    min_{w ∈ ℝ^(d+1), ξ ∈ ℝ^n}  ‖w‖² + C Σ_{i=1}^{n} ξ_i

    subject to y_i w · Aug(x^(i)) ≥ 1 − ξ_i for all i, and ξ ≥ 0.
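For any fixed w, the smallest feasible slacks can be read off directly from the constraints, which makes the objective easy to evaluate. A minimal sketch (function names are illustrative):

```python
import numpy as np

def optimal_slacks(w, X, y):
    """Smallest feasible slacks for a fixed w:
    xi_i = max(0, 1 - y_i * (w . Aug(x_i)))."""
    A = np.hstack([np.ones((len(X), 1)), X])     # rows are Aug(x_i)
    return np.maximum(0.0, 1.0 - y * (A @ w))

def soft_margin_objective(w, X, y, C):
    """||w||^2 + C * sum_i xi_i, with the slacks at their optimum."""
    return w @ w + C * optimal_slacks(w, X, y).sum()
```

Points with margin at least 1 need no slack at all; only margin violations contribute to the penalty.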
The Slack Parameter, C
▶ C controls how much slack is given:

    min_{w ∈ ℝ^(d+1), ξ ∈ ℝ^n}  ‖w‖² + C Σ_{i=1}^{n} ξ_i

    subject to y_i w · Aug(x^(i)) ≥ 1 − ξ_i for all i, and ξ ≥ 0.

▶ Large C: don't give much slack. Avoid misclassifications.
▶ Small C: allow more slack, at the cost of more misclassifications.
Example: Small C
Example: Large C
Soft and Hard Margins
▶ The max-margin SVM from before has a hard margin.
▶ Now: the soft-margin SVM.
▶ As 𝐶 → ∞, the margin hardens.
Another View: Loss Functions
▶ Recall our problem:

    min_{w ∈ ℝ^(d+1), ξ ∈ ℝ^n}  ‖w‖² + C Σ_{i=1}^{n} ξ_i

    subject to y_i w · Aug(x^(i)) ≥ 1 − ξ_i for all i, and ξ ≥ 0.

▶ Note: if x^(i) is misclassified (or within the margin), then

    ξ_i = 1 − y_i w · Aug(x^(i))
Another View: Loss Functions
▶ New problem:

    min_{w ∈ ℝ^(d+1)}  ‖w‖² + C Σ_{i=1}^{n} max{0, 1 − y_i w · Aug(x^(i))}

▶ max{0, 1 − y_i w · Aug(x^(i))} is called the hinge loss.
Another Way to Optimize
▶ We can use subgradient descent to minimize the SVM risk.
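A minimal sketch of subgradient descent on the objective ‖w‖² + C Σ max{0, 1 − y_i w · Aug(x^(i))}; the step size, iteration count, and toy data are illustrative and would need tuning in practice:

```python
import numpy as np

def svm_subgradient_descent(X, y, C=1.0, lr=0.01, iters=2000):
    """Subgradient descent on ||w||^2 + C * sum_i hinge(y_i w . Aug(x_i))."""
    A = np.hstack([np.ones((len(X), 1)), X])     # rows are Aug(x_i)
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        margins = y * (A @ w)
        violators = margins < 1                  # hinge loss is active here
        # Subgradient: 2w from the norm term, plus -C * y_i Aug(x_i)
        # for each point violating the margin.
        grad = 2 * w - C * (y[violators, None] * A[violators]).sum(axis=0)
        w -= lr * grad
    return w
```

The hinge loss is not differentiable at margin 1, but any subgradient works there, which is why plain subgradient steps suffice.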
Lecture 12 – Part 03: Sentiment Analysis
Why use linear predictors?
▶ Linear classifiers appear to be very simple.
▶ That can be both good and bad.
▶ Good: the math is tractable, and they are less likely to overfit.
▶ Bad: they may be too simple and underfit.
▶ They can work surprisingly well.
Sentiment Analysis
▶ Given: a piece of text.
▶ Determine: whether it is positive or negative in tone.
▶ Example: “Needless to say, I wasted my money.”
The Data
▶ Sentences from reviews on Amazon, Yelp, IMDB.
▶ Each labeled (by a human) positive or negative.
▶ Examples:
▶ "Needless to say, I wasted my money."
▶ "I have to jiggle the plug to get it to line up right."
▶ "Will order from them again!"
▶ "He was very impressed when going from the original battery to the extended battery."
The Plan
▶ We’ll train a soft-margin SVM.
▶ Problem: SVMs take fixed-length vectors asinputs, not sentences.
Bags of Words
To turn a document into a fixed-length vector:
▶ First, choose a dictionary of words:
▶ E.g.: ["wasted", "impressed", "great", "bad", "again"]
▶ Then count the number of occurrences of each dictionary word in the document:
▶ "It was bad. So bad that I was impressed at how bad it was." → (0, 1, 0, 3, 0)ᵀ
▶ This is called a bag of words representation.
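A bag-of-words vector can be computed in a few lines; this sketch uses naive tokenization (lowercasing and stripping punctuation), which is an assumption, not the lecture's exact preprocessing:

```python
def bag_of_words(document, dictionary):
    """Count occurrences of each dictionary word in the document.

    Naive tokenization (an assumption): lowercase, strip periods/commas.
    """
    words = document.lower().replace(".", " ").replace(",", " ").split()
    return [words.count(term) for term in dictionary]

dictionary = ["wasted", "impressed", "great", "bad", "again"]
doc = "It was bad. So bad that I was impressed at how bad it was."
print(bag_of_words(doc, dictionary))  # [0, 1, 0, 3, 0]
```

This reproduces the slide's example: "bad" occurs three times, "impressed" once, and the other dictionary words not at all.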
Choosing the Dictionary
▶ There are many ways of choosing the dictionary.
▶ Easiest: take all of the words in the training set.
▶ Perhaps throw out stop words like "the", "a", etc.
▶ The resulting dimensionality of the feature vectors is large.
Experiment
▶ Bag of words features with 4500 word dictionary.
▶ 2500 training sentences, 500 test sentences.
▶ Train a soft margin SVM.
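A sketch of this pipeline using scikit-learn's `CountVectorizer` and `LinearSVC`; the corpus below is a tiny illustrative stand-in for the actual 2500-sentence training set:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Tiny illustrative corpus; the lecture's dataset has 2500 training
# sentences drawn from Amazon, Yelp, and IMDB reviews.
sentences = [
    "Needless to say, I wasted my money.",
    "Will order from them again!",
    "I was very impressed with this battery.",
    "So bad that I wasted the whole evening.",
]
labels = [-1, 1, 1, -1]

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(sentences)   # bag-of-words counts
clf = LinearSVC(C=0.32).fit(features, labels)    # soft-margin linear SVM
```

`CountVectorizer` builds the dictionary from the training set automatically, which is the "easiest" dictionary choice described above.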
Choosing C
▶ We have to choose the slack parameter, 𝐶.
▶ Use cross validation!
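One way to choose C by cross validation is scikit-learn's `GridSearchCV`; the synthetic features below are a stand-in for the bag-of-words vectors:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic stand-in features; in the lecture these would be the
# 4500-dimensional bag-of-words vectors of the training sentences.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=100) > 0, 1, -1)

# 5-fold cross validation over a grid of slack parameters C.
search = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 0.32, 1.0, 10.0]}, cv=5)
search.fit(X, y)
best_C = search.best_params_["C"]
```

Each candidate C is scored on held-out folds, and the best-scoring value is refit on the full training set.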
Cross Validation
Results
▶ With C = 0.32, the test error is ≈ 15.6%.