PERCEPTRON LEARNING David Kauchak CS 451 – Fall 2013.
Transcript of PERCEPTRON LEARNING, David Kauchak, CS 451 – Fall 2013.

PERCEPTRON LEARNING
David Kauchak, CS 451 – Fall 2013
Admin
Assignment 1 solution available online
Assignment 2 Due Sunday at midnight
Assignment 2 competition site setup later today
Machine learning models
Some machine learning approaches make strong assumptions about the data
If the assumptions are true this can often lead to better performance
If the assumptions aren’t true, they can fail miserably
Other approaches don’t make many assumptions about the data
This can allow us to learn from more varied data. But they are more prone to overfitting and generally require more training data
What is the data generating distribution?
Actual model
Model assumptions
If you don’t have strong assumptions about the model, it can take you longer to learn
Assume now that our model of the blue class is two circles
What is the data generating distribution?
Actual model
What is the data generating distribution?
Knowing the model beforehand can drastically improve the learning and the number of examples required
Make sure your assumption is correct, though!
Machine learning models
What were the model assumptions (if any) that k-NN and decision trees make about the data?
Are there data sets that could never be learned correctly by either?
k-NN model
k = 1
Decision tree model
[figure: regions marked label 1, label 2, label 3]
Axis-aligned splits/cuts of the data
Bias
The “bias” of a model is how strong the model assumptions are.
Low-bias classifiers make minimal assumptions about the data (k-NN and DT are generally considered low bias)
high-bias classifiers make strong assumptions about the data
Linear models
A strong high-bias assumption is linear separability:
in 2 dimensions, can separate classes by a line
in higher dimensions, need hyperplanes
A linear model is a model that assumes the data is linearly separable
Hyperplanes
A hyperplane is a line/plane in a high-dimensional space
What defines a line? What defines a hyperplane?
Defining a line
Any pair of values (w1, w2) defines a line through the origin:
w1·f1 + w2·f2 = 0
[plotted in the (f1, f2) plane]
Defining a line
Any pair of values (w1, w2) defines a line through the origin, e.g. 1·f1 + 2·f2 = 0:

f1: -2, -1, 0, 1, 2
f2: 1, 0.5, 0, -0.5, -1
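The table above can be checked numerically. A minimal sketch, using the weights w = (1, 2) that appear on the surrounding slides:

```python
# For weights (w1, w2), points (f1, f2) on the line satisfy w1*f1 + w2*f2 = 0,
# so f2 = -(w1 / w2) * f1 (when w2 != 0).
w1, w2 = 1, 2  # example weight vector from the slides

for f1 in [-2, -1, 0, 1, 2]:
    f2 = -(w1 / w2) * f1              # solve for the f2 that puts us on the line
    print(f1, f2, w1 * f1 + w2 * f2)  # the last value is always 0.0
```

This reproduces the f2 row of the table (1, 0.5, 0, -0.5, -1).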
Defining a line
Any pair of values (w1,w2) defines a line through the origin:
We can also view it as the line perpendicular to the weight vector
w = (1, 2)
Classifying with a line
w = (1, 2)
Mathematically, how can we classify points based on a line?
[plot: (1, 1) on the BLUE side, (1, -1) on the RED side]
Classifying with a line
w = (1, 2)
Mathematically, how can we classify points based on a line?
(1, 1): 1·1 + 2·1 = 3
(1, -1): 1·1 + 2·(-1) = -1
The sign indicates which side of the line the point falls on
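The sign check above can be sketched as a small function (the name `side_of_line` is mine, not from the slides):

```python
def side_of_line(w, point):
    """Return the sign of w · point, i.e. which side of the line the point is on."""
    score = sum(wi * fi for wi, fi in zip(w, point))
    return 1 if score > 0 else -1

w = (1, 2)
print(side_of_line(w, (1, 1)))   # 1·1 + 2·1 = 3  -> 1 (positive side)
print(side_of_line(w, (1, -1)))  # 1·1 + 2·(-1) = -1 -> -1 (negative side)
```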
Defining a line
Any pair of values (w1,w2) defines a line through the origin:
How do we move the line off of the origin?
Defining a line
Adding a constant term moves the line off of the origin, e.g. 1·f1 + 2·f2 = -1:

f1: -2, -1, 0, 1, 2
f2: 0.5, 0, -0.5, -1, -1.5

Now intersects the f1 axis at -1
Linear models
A linear model in n-dimensional space (i.e. n features) is defined by n+1 weights:

In two dimensions, a line: w1·f1 + w2·f2 + b = 0 (where b = -a)
In three dimensions, a plane: w1·f1 + w2·f2 + w3·f3 + b = 0
In n dimensions, a hyperplane: b + Σj wj·fj = 0
Classifying with a linear model
We can classify with a linear model by checking the sign of b + Σj wj·fj given features f1, f2, …, fn:
positive → positive example
negative → negative example
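A minimal sketch of that decision rule, with the bias term b included (function and variable names are my own, not from the slides):

```python
def classify(w, b, features):
    """Linear classifier: "positive" if b + sum_j w_j * f_j > 0, else "negative"."""
    score = b + sum(wj * fj for wj, fj in zip(w, features))
    return "positive" if score > 0 else "negative"

# The line 1·f1 + 2·f2 = -1 from the earlier slide corresponds to b = -a = 1.
w, b = (1, 2), 1
print(classify(w, b, (0, 0)))   # score = 1  -> "positive"
print(classify(w, b, (-2, 0)))  # score = -1 -> "negative"
```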
Learning a linear model
Geometrically, we know what a linear model represents
Given a linear model (i.e. a set of weights and b) we can classify examples
[training data (data with labels) → learn → model]
How do we learn a linear model?
Positive or negative? NEGATIVE
Positive or negative? NEGATIVE
Positive or negative? POSITIVE
Positive or negative? NEGATIVE
Positive or negative? POSITIVE
Positive or negative? POSITIVE
Positive or negative? NEGATIVE
Positive or negative? POSITIVE
[each question refers to a different example point shown on the slide]
A method to the madness
blue = positive
yellow triangles = positive
all others negative
How is this learning setup different than the learning we’ve done before?
When might this arise?
Online learning algorithm
Labeled data → learn
Only get to see one example at a time!
[labels revealed one example at a time: 0, 0, 1, 1, …]
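The online setup above can be sketched as a loop that sees one labeled example at a time and updates the model after each one. This sketch assumes the mistake-driven perceptron update the rest of the lecture builds toward (add the example, times its ±1 label, to the weights), and the names are mine:

```python
def perceptron_online(stream, n_features):
    """Process labeled examples one at a time, updating the model only on mistakes."""
    w = [0.0] * n_features
    b = 0.0
    for features, label in stream:     # label is +1 or -1
        score = b + sum(wj * fj for wj, fj in zip(w, features))
        prediction = 1 if score > 0 else -1
        if prediction != label:        # mistake: nudge the model toward the example
            w = [wj + label * fj for wj, fj in zip(w, features)]
            b += label
    return w, b

stream = [((-1, 1), 1), ((1, -1), -1), ((-1, -1), 1)]
print(perceptron_online(stream, 2))  # -> ([-1.0, 1.0], 1.0)
```

Unlike the batch setup from earlier lectures, the model never gets to revisit an example unless the stream repeats it.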
Learning a linear classifier
w = (1, 0)
What does this model currently say?
Learning a linear classifier
w = (1, 0)
[the f1 > 0 side is POSITIVE, the f1 < 0 side is NEGATIVE]
Learning a linear classifier
w = (1, 0)
(-1, 1)
Is our current guess right or wrong?
Learning a linear classifier
w = (1, 0)
(-1, 1): 1·(-1) + 0·1 = -1 → predicts negative, wrong
How should we update the model?
Learning a linear classifier
w = (1, 0)
(-1, 1)
Should move this direction
A closer look at why we got it wrong
w1·f1 + w2·f2 = 1·(-1) + 0·1 = -1
We’d like this value to be positive since it’s a positive example
(-1, 1, positive)
Which of these contributed to the mistake?
A closer look at why we got it wrong
w1·f1 + w2·f2 = 1·(-1) + 0·1 = -1
We’d like this value to be positive since it’s a positive example
(-1, 1, positive)
w1: contributed in the wrong direction
w2: could have contributed (positive feature), but didn’t
How should we change the weights?
A closer look at why we got it wrong
w1·f1 + w2·f2 = 1·(-1) + 0·1 = -1
We’d like this value to be positive since it’s a positive example
(-1, 1, positive)
w1 contributed in the wrong direction → decrease: 1 → 0
w2 could have contributed (positive feature), but didn’t → increase: 0 → 1
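That "decrease w1, increase w2" fix is exactly what adding the example (times its ±1 label) to the weights does. A sketch with this slide's numbers (the helper name is mine):

```python
def update_on_mistake(w, features, label):
    """Perceptron update: w_j += label * f_j for every feature (label is +1 or -1)."""
    return [wj + label * fj for wj, fj in zip(w, features)]

w = [1, 0]                              # current weights
w = update_on_mistake(w, [-1, 1], +1)   # misclassified positive example (-1, 1)
print(w)  # [0, 1]: w1 decreased 1 -> 0, w2 increased 0 -> 1
```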
Learning a linear classifier
w = (0, 1)
(-1, 1)
Graphically, this also makes sense!
Learning a linear classifier
w = (0, 1)
(1, -1)
Is our current guess right or wrong?
Learning a linear classifier
w = (0, 1)
(1, -1): 0·1 + 1·(-1) = -1 → predicts negative, correct
How should we update the model?
Learning a linear classifier
w = (0, 1)
(1, -1)
Already correct… don’t change it!
Learning a linear classifier
w = (0, 1)
(-1, -1)
Is our current guess right or wrong?
Learning a linear classifier
w = (0, 1)
(-1, -1): 0·(-1) + 1·(-1) = -1 → predicts negative, wrong
How should we update the model?
Learning a linear classifier
w = (0, 1)
(-1, -1)
Should move this direction
A closer look at why we got it wrong
w1·f1 + w2·f2 = 0·(-1) + 1·(-1) = -1
We’d like this value to be positive since it’s a positive example
(-1, -1, positive)
Which of these contributed to the mistake?
A closer look at why we got it wrong
w1·f1 + w2·f2 = 0·(-1) + 1·(-1) = -1
We’d like this value to be positive since it’s a positive example
(-1, -1, positive)
w1: didn’t contribute, but could have
w2: contributed in the wrong direction
How should we change the weights?
A closer look at why we got it wrong
w1·f1 + w2·f2 = 0·(-1) + 1·(-1) = -1
We’d like this value to be positive since it’s a positive example
(-1, -1, positive)
w1 didn’t contribute, but could have → decrease: 0 → -1
w2 contributed in the wrong direction → decrease: 1 → 0
Learning a linear classifier
f1, f2, label:
-1, -1, positive
-1,  1, positive
 1,  1, negative
 1, -1, negative
w = (-1, 0)
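Running the same mistake-driven updates over this data set until a full pass makes no mistakes reaches the w = (-1, 0) shown above. A sketch, assuming the bias is omitted (as in the walkthrough) and labels are encoded ±1:

```python
def train_perceptron(data, w):
    """Repeat passes over the data, updating w on each mistake, until a clean pass."""
    changed = True
    while changed:
        changed = False
        for features, label in data:   # label is +1 or -1
            score = sum(wj * fj for wj, fj in zip(w, features))
            if (1 if score > 0 else -1) != label:
                w = [wj + label * fj for wj, fj in zip(w, features)]
                changed = True
    return w

data = [((-1, -1), 1), ((-1, 1), 1), ((1, 1), -1), ((1, -1), -1)]
print(train_perceptron(data, [1, 0]))  # -> [-1, 0]
```

The resulting w = (-1, 0) classifies all four examples correctly: the two points with f1 = -1 score positive, the two with f1 = 1 score negative.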