CMPU 365 · Artificial Intelligence
Transcript of lecture slides: cs365/lectures/2020-11-04.pdf (4 November 2020)
Perceptrons
4 November 2020
CMPU 365 · Artificial Intelligence
https://twitter.com/Adequate_Scott/status/1323978722731544577
Where are we?
Hello,
Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just
# free : 2 YOUR_NAME : 0 MISSPELLED : 2 FROM_FRIEND : 0 ...
SPAM or +
PIXEL-7,12 : 1 PIXEL-7,13 : 0 ... NUM_LOOPS : 1 ...
“2”
x (inputs) → f(x) (features) → y (outputs)
Data: Labeled instances – inputs and outputs (x, y)
Features: attribute–value pairs that characterize each input x.
Parameters: Numeric parts of the model that determine how each x is mapped to an output y.
Experimentation cycle: Learn (train) parameters on training set
(Tune hyperparameters on held-out set)
Evaluate on test set
E.g., accuracy: fraction of instances predicted correctly
Training data
Held-out data
Test data
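The three-way split and the accuracy measure above can be sketched as follows (the 80/10/10 proportions and the function names are illustrative assumptions, not from the lecture):

```python
import random

def split_data(instances, train_frac=0.8, heldout_frac=0.1, seed=0):
    """Shuffle labeled (x, y) instances and split into train / held-out / test."""
    instances = list(instances)
    random.Random(seed).shuffle(instances)
    n = len(instances)
    n_train = int(n * train_frac)
    n_heldout = int(n * heldout_frac)
    train = instances[:n_train]
    heldout = instances[n_train:n_train + n_heldout]
    test = instances[n_train + n_heldout:]
    return train, heldout, test

def accuracy(predict, data):
    """Fraction of instances predicted correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)
```

Train on `train`, tune hyperparameters on `heldout`, and report `accuracy` on `test` only once, at the end.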
Aside: Alternative ways of splitting data
What if you choose an unlucky split into training and testing, where the data sets are very different?
Aren’t we wasting data?
Alternative: k-fold cross validation! Repeat k times:
Partition into train (n − n/k) and test (n/k) data sets
Train on training set; test on test set
Average results across k choices of test set
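The k-fold procedure above can be sketched as a few lines of Python (a minimal version, with my own function names, that ignores shuffling and stratification):

```python
def k_fold_splits(data, k):
    """Yield (train, test) pairs: each fold of size ~n/k serves once as the test set."""
    n = len(data)
    fold = n // k
    for i in range(k):
        test = data[i * fold:(i + 1) * fold]          # n/k test instances
        train = data[:i * fold] + data[(i + 1) * fold:]  # remaining n - n/k
        yield train, test

def cross_validate(train_fn, eval_fn, data, k=5):
    """Average evaluation results across the k choices of test set."""
    scores = [eval_fn(train_fn(tr), te) for tr, te in k_fold_splits(data, k)]
    return sum(scores) / k
```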
Naïve Bayes
Bayes’ Rule lets us do diagnostic queries with causal probabilities.
The Naïve Bayes assumption takes all features Fi to be independent given the class label, Y.
We can build classifiers out of a Naïve Bayes model using training data.
[Graphical model: class variable Y with feature children F1, F2, …, Fn]
P(Y ∣ f1, …, fn) ∝ P(Y) ∏i P(fi ∣ Y)
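The classification rule P(Y ∣ f1, …, fn) ∝ P(Y) ∏i P(fi ∣ Y) can be sketched in code; working in log space avoids numerical underflow when many probabilities are multiplied (the function and its dict-based argument layout are illustrative assumptions):

```python
from math import log

def nb_predict(priors, likelihoods, features):
    """Return argmax_y P(y) * prod_i P(f_i | y), computed in log space.
    priors[y] = P(y); likelihoods[y][f] = P(f | y).
    Assumes every observed feature has an entry for each label."""
    best_label, best_score = None, float("-inf")
    for y, prior in priors.items():
        score = log(prior) + sum(log(likelihoods[y][f]) for f in features)
        if score > best_score:
            best_label, best_score = y, score
    return best_label
```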
Generalization and overfitting
We want a classifier that does well on test data.
Dangers:
Underfitting: fitting the training set poorly.
Overfitting: fitting the training data very closely, but not generalizing well.
Overfitting
[Plot: noisy training data points; x-axis 0–20, y-axis −15–30]
Overfitting
[Plot: the same axes, now showing a degree-15 polynomial fit to the data]
Example: Overfitting
“2” wins!
Posteriors determined by relative probabilities (odds ratios):
P(W ∣ ham) / P(W ∣ spam):
"south-west": inf  "nation": inf  "morally": inf  "nicely": inf  "extent": inf  "seriously": inf ...
P(W ∣ spam) / P(W ∣ ham):
"screens": inf  "minute": inf  "guaranteed": inf  "$205.00": inf  "delivery": inf  "signature": inf ...
What went wrong here?
Generalization and overfitting
Relative frequency parameters will overfit the training data!
Just because we never saw a 3 with pixel (15, 15) on during training doesn’t mean we won’t see it at test time.
Unlikely that every occurrence of “minute” is 100% spam.
Unlikely that every occurrence of “seriously” is 100% ham.
What about all the words that don’t occur in the training set at all?
In general, we can’t go around giving unseen events zero probability.
Generalization and overfitting
As an extreme case, imagine using the entire email as the only feature.
Would get the training data perfect (if deterministic labeling).
Wouldn’t generalize at all.
Just making the bag-of-words assumption gives us some generalization, but isn’t enough.
To generalize better, we need to smooth or regularize the estimates.
Parameter estimation
Maximum likelihood estimation (MLE) infers the parameter value (θ) that maximizes the probability of the observed data.
Maximum likelihood estimation typically assumes:
Each sample is drawn from the same distribution; that is, each instance is identically distributed.
Each sample is conditionally independent of the others given the parameters of our distribution.
Together, the samples are independent and identically distributed (i.i.d.). This is a strong assumption, but it simplifies the problem of MLE and generally works in practice.
All possible values of θ are equally likely before we’ve seen any data (a uniform prior).
Imagine you have a bag of blue and red balls. You draw samples by taking a ball out, recording its color, and then putting the ball back.
If you sample red, red, blue, that suggests that 2/3 of the balls in the bag are red and 1/3 are blue.
Sample: r r b
P_ML(r) = 2/3, P_ML(b) = 1/3
Consider our spam classifier.
The maximum likelihood estimate for the parameter P(Wi = 1 | Y = ham) corresponds to just counting the number of legitimate emails that word i appears in and dividing it by the total number of legitimate emails.
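The counting estimate described above, sketched under the assumption that each email is represented as a set of its words (the function name and data layout are hypothetical):

```python
def mle_word_given_label(emails, word, label):
    """P_ML(W_word = 1 | Y = label): the fraction of emails with the given
    label that contain the word. emails is a list of (set_of_words, label)."""
    labeled = [words for words, y in emails if y == label]
    return sum(word in words for words in labeled) / len(labeled)
```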
Smoothing
Laplace smoothing
Pretend you saw every outcome once more than you actually did:
P_LAP(x) = (count(x) + 1) / (N + |X|)
Sample: r r b → P_LAP(r) = 3/5, P_LAP(b) = 2/5
Laplace smoothing
Laplace’s estimate (extended): pretend you saw every outcome k extra times:
P_LAP,k(x) = (count(x) + k) / (N + k|X|)
What’s Laplace with k = 0? (Just the maximum likelihood estimate.)
k is the strength of the prior.
Laplace for conditionals: smooth each condition independently:
P_LAP,k(x ∣ y) = (count(x, y) + k) / (count(y) + k|X|)
Sample: r r b
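A minimal sketch of Laplace’s estimate, assuming every possible outcome appears as a key in the count table (with count 0 if unseen):

```python
def laplace(counts, k=1):
    """Laplace-smoothed estimate: pretend each outcome was seen k extra times.
    counts maps outcome -> observed count; len(counts) plays the role of |X|."""
    n = sum(counts.values())
    num_outcomes = len(counts)
    return {x: (c + k) / (n + k * num_outcomes) for x, c in counts.items()}
```

With the r r b sample this gives P_LAP(r) = (2 + 1) / (3 + 2) = 3/5; with k = 0 it reduces to the relative-frequency (maximum likelihood) estimate 2/3.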
Real Naïve Bayes: Smoothing
For real classification problems, smoothing is critical
New odds ratios:
P(W ∣ ham) / P(W ∣ spam):
"helvetica": 11.4  "seems": 10.8  "group": 10.2  "ago": 8.4  "areas": 8.3 ...
P(W ∣ spam) / P(W ∣ ham):
"verdana": 28.8  "Credit": 28.4  "ORDER": 27.2  "<FONT>": 26.9  "money": 26.5 ...
Do these make more sense?
Tuning on held-out data
Now we’ve got two kinds of unknowns:
Parameters: the probabilities P(X ∣ Y), P(Y)
Hyperparameters: e.g., the amount and type of smoothing to do, k
What should we learn where? We learn parameters from the training data.
We tune the hyperparameters on a different set, the held-out data.
Choose the best value and do a final test on the test data.
Error-driven classification
Examples of errors
Dear GlobalSCAPE Customer, GlobalSCAPE has partnered with ScanSoft to offer you the latest version of OmniPage Pro, for just $99.99* - the regular list price is $499! The most common question we've received about this offer is - Is this genuine? We would like to assure you that this offer is authorized by ScanSoft, is genuine and valid. You can get the . . .
. . . To receive your $30 Amazon.com promotional certificate, click through to http://www.amazon.com/apparel and see the prominent link for the $30 offer. All details are there. We hope you enjoyed receiving this message. However, if you'd rather not receive future e-mails announcing new store launches, please click . . .
What to do about errors?
Problem: There’s still spam in your inbox.
Need more features – words aren’t enough! Have you emailed the sender before?
Have a million other people just gotten the same email?
Is the sending information consistent?
Is the email in ALL CAPS?
Do inline URLs point where they say they point?
Does the email address you by name?
Naïve Bayes models can incorporate a variety of features, but they tend to do best in homogeneous cases, e.g., all features are word occurrences.
Linear classifiers
Feature vectors
Hello,
Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just
# free : 2 YOUR_NAME : 0 MISSPELLED : 2 FROM_FRIEND : 0 ...
SPAM or +
PIXEL-7,12 : 1 PIXEL-7,13 : 0 ... NUM_LOOPS : 1 ...
“2”
x (inputs) → f(x) (features) → y (outputs)
Loose inspiration
[Diagram: human neuron, labeling the dendrites, cell body (soma), nucleus, axon, axonal arborization, synapses, and an axon arriving from another cell]
Human neuron vs. artificial neurons
Linear classifiers
Inputs are feature values, fi
Each feature has a weight, wi
Weighted sum is the activation:
If the activation is > 0, output label “+1”; if < 0, output “−1”.
[Diagram: inputs f1, f2, f3 feed a summation node Σ through weights w1, w2, w3; the sum is then tested against > 0]
activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)
Recall: Dot product
Given two vectors of equal length, the dot product (or scalar product) produces a scalar (a single number) by multiplying the corresponding entries in the vectors: a = [a1, a2, …, an]
b = [b1, b2, …, bn]
a · b = a1b1 + a2b2 + ⋯ + anbn
a = [1, 2, 3]
b = [4, 5, 6]
# Version 1
sum(map(lambda x: x[0] * x[1], zip(a, b)))
# Version 2
sum([x * y for x, y in zip(a, b)])
Weights
Binary case: Compare features to a weight vector
Learning: Figure out the weight vector from examples
f(x1): # free : 2  YOUR_NAME : 0  MISSPELLED : 2  FROM_FRIEND : 0 ...
w:     # free : 4  YOUR_NAME : -1  MISSPELLED : 1  FROM_FRIEND : -3 ...
f(x2): # free : 0  YOUR_NAME : 1  MISSPELLED : 1  FROM_FRIEND : 1 ...
w · f being positive means the positive class/label
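Plugging the slide’s vectors into the dot product (which listed vector is w and which are f(x1), f(x2) is my reading of the slide, and the arithmetic below is mine):

```python
w    = {"free": 4, "YOUR_NAME": -1, "MISSPELLED": 1, "FROM_FRIEND": -3}
f_x1 = {"free": 2, "YOUR_NAME": 0, "MISSPELLED": 2, "FROM_FRIEND": 0}
f_x2 = {"free": 0, "YOUR_NAME": 1, "MISSPELLED": 1, "FROM_FRIEND": 1}

def dot(w, f):
    return sum(w[k] * f[k] for k in w)

# w . f(x1) = 4*2 + (-1)*0 + 1*2 + (-3)*0 = 10 > 0  -> positive label (spam)
# w . f(x2) = 4*0 + (-1)*1 + 1*1 + (-3)*1 = -3 < 0  -> negative label (ham)
```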
![Page 54: CMPU 365 · Artificial Intelligencecs365/lectures/2020-11-04.pdf · 2020. 11. 4. · Generalization and over!tting Relative frequency parameters will over!t the training data! Just](https://reader033.fdocuments.in/reader033/viewer/2022061001/60b0691a29425a75fd32a4ed/html5/thumbnails/54.jpg)
Decision rules
Key idea: decision boundary
The boundary at which the label changes.
[Plot: positive (+) and negative (−) examples in feature space, separated by the decision boundary]
[Plot: weight vector w in the f(x1)–f(x2) plane, with the decision boundary perpendicular to w]
Angle between f(x) and w is < 90 degrees: x is classified as positive (+) for the label.
Angle between f(x) and w is > 90 degrees: x is classified as negative (−) for the label.
Binary decision rule
In the space of feature vectors
Examples are points
Any weight vector defines a hyperplane decision boundary
One side corresponds to Y = +1
Other corresponds to Y = −1
Add a bias feature (value always 1) to allow shifted decision boundaries.
BIAS : -3  free : 4  money : 2 ...
[Plot: in the free–money feature plane, the line f · w = 0 separates the +1 = spam side from the −1 = ham side; w points toward the spam side]
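A sketch of the decision rule with a bias feature, using the slide’s illustrative weights (BIAS −3, free 4, money 2); the helper names are my own:

```python
# Illustrative weights from the slide.
w = {"BIAS": -3, "free": 4, "money": 2}

def features(counts):
    """Add the always-1 bias feature so the boundary need not pass through the origin."""
    f = dict(counts)
    f["BIAS"] = 1
    return f

def classify(w, f):
    """+1 (spam) if w . f >= 0, else -1 (ham)."""
    score = sum(w[k] * f.get(k, 0) for k in w)
    return 1 if score >= 0 else -1
```

For an email with free = 1 and money = 1, the score is −3 + 4 + 2 = 3, so it lands on the +1 = spam side; an email with neither word scores −3 and is classified −1 = ham.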
Weight updates
Learning: Binary perceptron
Start with weights = 0
Cycle through training examples repeatedly.
For each example: Classify with current weights:
If correct (i.e., y = y*), no change!
If wrong, adjust the weight vector by adding or subtracting the feature vector. Subtract if y* is −1.
w = w + y* ⋅ f(x)
y = {+1 if w ⋅ f(x) ≥ 0−1 if w ⋅ f(x) < 0
w
f(x)
![Page 67: CMPU 365 · Artificial Intelligencecs365/lectures/2020-11-04.pdf · 2020. 11. 4. · Generalization and over!tting Relative frequency parameters will over!t the training data! Just](https://reader033.fdocuments.in/reader033/viewer/2022061001/60b0691a29425a75fd32a4ed/html5/thumbnails/67.jpg)
Learning: Binary perceptron
Start with weights = 0
Cycle through training examples repeatedly.
For each example: Classify with current weights:
If correct (i.e., y = y*), no change!
If wrong, adjust the weight vector by adding or subtracting the feature vector. Subtract if y* is −1.
w = w + y* ⋅ f(x)
y = {+1 if w ⋅ f(x) ≥ 0−1 if w ⋅ f(x) < 0
w
f(x)y* ⋅ f(x)
![Page 68: CMPU 365 · Artificial Intelligencecs365/lectures/2020-11-04.pdf · 2020. 11. 4. · Generalization and over!tting Relative frequency parameters will over!t the training data! Just](https://reader033.fdocuments.in/reader033/viewer/2022061001/60b0691a29425a75fd32a4ed/html5/thumbnails/68.jpg)
Learning: Binary perceptron
Start with weights = 0
Cycle through training examples repeatedly.
For each example: Classify with current weights:
If correct (i.e., y = y*), no change!
If wrong, adjust the weight vector by adding or subtracting the feature vector. Subtract if y* is −1.
w = w + y* ⋅ f(x)
y = {+1 if w ⋅ f(x) ≥ 0−1 if w ⋅ f(x) < 0
w
f(x)y* ⋅ f(x)
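The update rule above can be sketched in a few lines. This is a minimal illustration, not the course's reference code: it assumes feature vectors are dicts mapping feature names to counts (as in the spam example), and the function names are mine.

```python
def perceptron_classify(w, f):
    """Classify: +1 if w · f(x) >= 0, else -1. Missing features count as 0."""
    return 1 if sum(w.get(k, 0.0) * v for k, v in f.items()) >= 0 else -1

def perceptron_train(examples, passes=10):
    """Binary perceptron. `examples` is a list of (f, y_star) pairs,
    where f is a feature dict and y_star is +1 or -1."""
    w = {}  # start with weights = 0
    for _ in range(passes):            # cycle through the examples repeatedly
        for f, y_star in examples:
            y = perceptron_classify(w, f)   # classify with current weights
            if y != y_star:                 # if wrong: w = w + y* · f(x)
                for k, v in f.items():
                    w[k] = w.get(k, 0.0) + y_star * v
    return w

# Hypothetical two-example spam set: "free" pushes toward spam (+1).
train = [({"BIAS": 1, "free": 2, "YOUR_NAME": 0}, +1),
         ({"BIAS": 1, "free": 0, "YOUR_NAME": 1}, -1)]
w = perceptron_train(train)
```

On this tiny separable set, a few passes suffice for the learned weights to classify both examples correctly.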
Examples: Perceptron

[Figures: a sequence of binary perceptron update steps on example data; not recoverable from the transcript.]
Multi-class decision rule

If we have more than two classes, keep a weight vector wy for each class.

Score (activation) of a class y:

    wy · f(x)

Prediction: the highest score wins:

    y = argmax_y  wy · f(x)

Binary is multi-class where the negative class has weight zero.

[Figure: the feature space partitioned into regions where w1 · f(x), w2 · f(x), or w3 · f(x) is biggest.]
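The decision rule can be sketched directly; this is a minimal version assuming dict-based feature and weight vectors (function names are mine, not the slides'):

```python
def score(w_y, f):
    # Activation of class y: w_y · f(x); missing features count as 0.
    return sum(w_y.get(k, 0.0) * v for k, v in f.items())

def predict(weights, f):
    # Highest score wins: y = argmax_y  w_y · f(x)
    return max(weights, key=lambda y: score(weights[y], f))

# With only SPORTS carrying a nonzero (bias) weight, SPORTS wins on any input:
weights = {"SPORTS": {"BIAS": 1.0}, "POLITICS": {}, "TECH": {}}
f = {"BIAS": 1, "win": 1, "vote": 1, "the": 1}
```

Here `predict(weights, f)` returns "SPORTS", since its activation (1) beats the zero activations of the other classes.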
Learning: Multi-class perceptron

Start with all weights = 0.
Pick up training examples one by one. Predict with current weights:

    y = argmax_y  wy · f(x)

If correct, no change!
If wrong, use the feature vector to lower the score of the wrong label (y) and to raise the score of the right label (y*):

    wy = wy − f(x)
    wy* = wy* + f(x)

[Figure: the wrong label's weight vector wy moves away from f(x); the right label's wy* moves toward it.]
Example: Multiclass perceptron

Initial weights:

    wSPORTS   = [BIAS: 1, win: 0, game: 0, vote: 0, the: 0]
    wPOLITICS = [BIAS: 0, win: 0, game: 0, vote: 0, the: 0]
    wTECH     = [BIAS: 0, win: 0, game: 0, vote: 0, the: 0]

Example 1: x = "win the vote", y* (true) = POLITICS
    f(x) = [BIAS: 1, win: 1, game: 0, vote: 1, the: 1]
    Scores: SPORTS = 1, POLITICS = 0, TECH = 0, so y (prediction) = SPORTS. Wrong!
    Update: wSPORTS = wSPORTS − f(x), wPOLITICS = wPOLITICS + f(x):
    wSPORTS   = [BIAS: 0, win: −1, game: 0, vote: −1, the: −1]
    wPOLITICS = [BIAS: 1, win: 1, game: 0, vote: 1, the: 1]

Example 2: x = "win the election", y* (true) = POLITICS
    f(x) = [BIAS: 1, win: 1, game: 0, vote: 0, the: 1]
    Scores: SPORTS = −2, POLITICS = 3, TECH = 0, so y (prediction) = POLITICS. Correct; no change.

Example 3: x = "win the game", y* (true) = SPORTS
    f(x) = [BIAS: 1, win: 1, game: 1, vote: 0, the: 1]
    Scores: SPORTS = −2, POLITICS = 3, TECH = 0, so y (prediction) = POLITICS. Wrong!
    Update: wPOLITICS = wPOLITICS − f(x), wSPORTS = wSPORTS + f(x):
    wSPORTS   = [BIAS: 1, win: 0, game: 1, vote: −1, the: 0]
    wPOLITICS = [BIAS: 0, win: 0, game: −1, vote: 1, the: 0]
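The worked example above can be reproduced with a short sketch. As before, this is illustrative code of my own (dict feature vectors; one pass over the three sentences), with wSPORTS seeded with the bias weight 1 that the slides start from:

```python
def score(w, f):
    # Activation: w · f(x); missing features count as 0.
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def multiclass_perceptron(examples, weights):
    """One pass of the multi-class perceptron over (f, y_star) pairs.
    `weights` maps each class label to its weight dict; updated in place."""
    for f, y_star in examples:
        # Predict with current weights: y = argmax_y w_y · f(x)
        y = max(weights, key=lambda c: score(weights[c], f))
        if y != y_star:
            # Lower the wrong label's score, raise the right label's:
            # w_y = w_y - f(x);  w_{y*} = w_{y*} + f(x)
            for k, v in f.items():
                weights[y][k] = weights[y].get(k, 0.0) - v
                weights[y_star][k] = weights[y_star].get(k, 0.0) + v
    return weights

weights = {"SPORTS": {"BIAS": 1.0}, "POLITICS": {}, "TECH": {}}
data = [
    ({"BIAS": 1, "win": 1, "vote": 1, "the": 1}, "POLITICS"),  # "win the vote"
    ({"BIAS": 1, "win": 1, "the": 1}, "POLITICS"),             # "win the election"
    ({"BIAS": 1, "win": 1, "game": 1, "the": 1}, "SPORTS"),    # "win the game"
]
multiclass_perceptron(data, weights)
```

After the pass, the weights match the final column of the trace: wSPORTS = [BIAS: 1, win: 0, game: 1, vote: −1, the: 0] and wPOLITICS = [BIAS: 0, win: 0, game: −1, vote: 1, the: 0].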
Properties of perceptrons

Separability: true if some setting of the parameters classifies the training set perfectly.

Convergence: if the training data are separable, the perceptron will eventually converge (binary case).

Mistake bound: the maximum number of mistakes (binary case) is related to the margin, or degree of separability:

    mistakes < k / δ²

where k is the number of features and δ is the margin (the thickness of the thickest separating line).

[Figures: a separable data set vs. a non-separable one.]
Classification: Comparison

Naïve Bayes:
    Builds a model of the training data
    Gives prediction probabilities
    Strong assumptions about feature independence
    One pass through the data (counting)

Perceptrons:
    Make fewer assumptions about the data
    Error-driven learning
    Multiple passes through the data (prediction)
    Often more accurate
Next time: Improving the perceptron
Acknowledgments
The lecture incorporates material from: Dan Klein and Pieter Abbeel, University of California, Berkeley; ai.berkeley.edu
George Konidaris, Brown University
Peter Norvig and Stuart Russell, Artificial Intelligence: A Modern Approach
Ketrina Yim (illustrations)