k-Nearest Neighbors + Model Selection
10-601 Introduction to Machine Learning
Matt Gormley
Lecture 5
Jan. 29, 2020
Machine Learning Department, School of Computer Science, Carnegie Mellon University
Q&A
Q: Why don’t my entropy calculations match those on the slides?
A: H(Y) is conventionally reported in “bits” and computed using log base 2, e.g., H(Y) = -P(Y=0) log2 P(Y=0) - P(Y=1) log2 P(Y=1)
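A quick way to check hand calculations against the slides is to compute H(Y) in bits directly. A minimal sketch (`entropy_bits` is an illustrative helper name, not course code):

```python
import math
from collections import Counter

def entropy_bits(labels):
    """H(Y) = -sum_y P(Y=y) * log2 P(Y=y), reported in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
```

A uniform binary label gives exactly 1 bit, and a pure label gives 0 bits.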
Q: Why is entropy based on a sum of p(.) log p(.) terms?
A: We don’t have time for a full treatment of why it has to be this, but we can develop the right intuition with a few examples…
Q&A
Q: How do we deal with ties in k-Nearest Neighbors (e.g. even k or equidistant points)?
A: I would ask you all for a good solution!
Q: How do we define a distance function when the features are categorical (e.g. weather takes values {sunny, rainy, overcast})?
A:
Step 1: Convert from categorical attributes to numeric features (e.g. binary)
Step 2: Select an appropriate distance function (e.g. Hamming distance)
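The two steps above can be sketched directly (`one_hot` and `hamming` are illustrative helper names):

```python
def one_hot(value, categories):
    """Step 1: convert a categorical value to binary indicator features."""
    return [1 if value == c else 0 for c in categories]

def hamming(a, b):
    """Step 2: Hamming distance counts positions where two vectors differ."""
    return sum(x != y for x, y in zip(a, b))

weather = ["sunny", "rainy", "overcast"]
```

Note that with one-hot encoding, any two distinct category values end up at Hamming distance 2.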
Reminders
• Homework 2: Decision Trees
– Out: Wed, Jan. 22
– Due: Wed, Feb. 05 at 11:59pm
• Today’s Poll: – http://p5.mlcourse.org
Moss Cheat Checker
What is Moss?
• Moss (Measure Of Software Similarity) is an automatic system for determining the similarity of programs. To date, the main application of Moss has been in detecting plagiarism in programming classes.
• Moss reports:
– The Andrew IDs associated with the file submissions
– The number of lines matched
– The percent of lines matched
– Color-coded submissions where similarities are found
What is Moss?
At first glance, the submissions may look different
What is Moss?
Moss can quickly find the similarities
OVERFITTING (FOR DECISION TREES)
Decision Tree Generalization
Question: Which of the following would generalize best to unseen examples?
A. Small tree with low training accuracy
B. Large tree with low training accuracy
C. Small tree with high training accuracy
D. Large tree with high training accuracy
Answer:
Overfitting and Underfitting

Underfitting
• The model…
– is too simple
– is unable to capture the trends in the data
– exhibits too much bias
• Example: majority-vote classifier (i.e. depth-zero decision tree)
• Example: a toddler (who has not attended medical school) attempting to carry out medical diagnosis

Overfitting
• The model…
– is too complex
– is fitting the noise in the data, or fitting random statistical fluctuations inherent in the “sample” of training data
– does not have enough bias
• Example: our “memorizer” algorithm responding to an “orange shirt” attribute
• Example: a medical student who simply memorizes patient case studies, but does not understand how to apply knowledge to new patients
Overfitting
• Consider a hypothesis h and its…
…error rate over all training data: error(h, Dtrain)
…error rate over all test data: error(h, Dtest)
…true error over all data: errortrue(h)
• We say h overfits the training data if errortrue(h) > error(h, Dtrain)
• Amount of overfitting = errortrue(h) - error(h, Dtrain)
• In practice, errortrue(h) is unknown
Slide adapted from Tom Mitchell
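Since errortrue(h) is unknown in practice, a common stand-in for the amount of overfitting is the gap between held-out test error and training error. A minimal sketch under that assumption (`error_rate` is an illustrative helper, not course code):

```python
def error_rate(h, xs, ys):
    """Fraction of examples the hypothesis h misclassifies."""
    return sum(h(x) != y for x, y in zip(xs, ys)) / len(ys)

# errortrue(h) is unknown, so held-out test error stands in for it:
# overfitting_estimate = error_rate(h, test_x, test_y) - error_rate(h, train_x, train_y)
```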
Overfitting in Decision Tree Learning
Figure from Tom Mitchell
How to Avoid Overfitting?
For Decision Trees…
1. Do not grow tree beyond some maximum depth
2. Do not split if splitting criterion (e.g. mutual information) is below some threshold
3. Stop growing when the split is not statistically significant
4. Grow the entire tree, then prune
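Strategies 1 and 2 above can be sketched as early-stopping checks in a recursive ID3-style learner. A minimal illustration (the names `grow_tree`, `min_mi`, etc. are hypothetical, not code from the course):

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(data, attr, labels):
    """I(Y; X_attr) = H(Y) - H(Y | X_attr) for one categorical attribute."""
    n = len(labels)
    cond = 0.0
    for v in set(x[attr] for x in data):
        subset = [y for x, y in zip(data, labels) if x[attr] == v]
        cond += (len(subset) / n) * entropy(subset)
    return entropy(labels) - cond

def grow_tree(data, labels, attrs, depth=0, max_depth=2, min_mi=1e-3):
    """Stop at max depth (strategy 1) or when mutual information falls
    below a threshold (strategy 2); otherwise split greedily."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or not attrs or entropy(labels) == 0.0:
        return majority
    best = max(attrs, key=lambda a: mutual_information(data, a, labels))
    if mutual_information(data, best, labels) < min_mi:
        return majority
    children = {}
    for v in set(x[best] for x in data):
        keep = [i for i, x in enumerate(data) if x[best] == v]
        children[v] = grow_tree([data[i] for i in keep],
                                [labels[i] for i in keep],
                                [a for a in attrs if a != best],
                                depth + 1, max_depth, min_mi)
    return (best, children, majority)

def predict(node, x):
    """Walk the tree; fall back to a node's majority label for unseen values."""
    while isinstance(node, tuple):
        attr, children, majority = node
        node = children.get(x[attr], majority)
    return node
```

Setting `max_depth=0` degenerates to the majority-vote classifier mentioned earlier, which makes the depth limit a direct knob for trading off under- and overfitting.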
Split data into training and validation set
Create tree that classifies training set correctly
Slide from Tom Mitchell
IMPORTANT! Later this lecture we’ll learn that doing pruning on test data is the wrong thing to do. Instead, use a third “validation” dataset.
Decision Trees (DTs) in the Wild
• DTs are one of the most popular classification methods for practical applications
– Reason #1: The learned representation is easy to explain to a non-ML person
– Reason #2: They are efficient in both computation and memory
• DTs can be applied to a wide variety of problems including classification, regression, density estimation, etc.
• Applications of DTs include medicine, molecular biology, text classification, manufacturing, astronomy, agriculture, and many others
• Decision Forests learn many DTs from random subsets of features; the result is a very powerful example of an ensemble method (discussed later in the course)
DT Learning Objectives
You should be able to…
1. Implement Decision Tree training and prediction
2. Use effective splitting criteria for Decision Trees and be able to define entropy, conditional entropy, and mutual information / information gain
3. Explain the difference between memorization and generalization [CIML]
4. Describe the inductive bias of a decision tree
5. Formalize a learning problem by identifying the input space, output space, hypothesis space, and target function
6. Explain the difference between true error and training error
7. Judge whether a decision tree is "underfitting" or "overfitting"
8. Implement a pruning or early stopping method to combat overfitting in Decision Tree learning
K-NEAREST NEIGHBORS
Classification
Chalkboard:
– Binary classification
– 2D examples
– Decision rules / hypotheses
k-Nearest Neighbors
Chalkboard:
– Nearest Neighbor classifier
– KNN for binary classification
KNN: Remarks
Distance Functions:
• KNN requires a distance function
• The most common choice is Euclidean distance
• But other choices are just fine (e.g. Manhattan distance)
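Putting the pieces together, a brute-force KNN classifier parameterized by a distance function might look like the following sketch (`knn_predict` is an illustrative name, not course code):

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.dist(a, b)

def manhattan(a, b):
    """Sum of per-coordinate absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_x, train_y, query, k=3, dist=euclidean):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

Swapping `dist=manhattan` (or any other metric) changes only the neighborhood geometry, not the voting rule.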
KNN: Remarks
In-Class Exercises
1. How can we handle ties for even values of k?
2. What is the inductive bias of KNN?
Answer(s) Here:
1)
– Consider another point
– Remove farthest of k points
– Weight votes by distance
– Consider another distance metric
2)
KNN: Remarks

Inductive Bias:
1. Similar points should have similar labels
2. All dimensions are created equally!

Example: two features for KNN, plotted as length (cm) vs. width (cm) and again as length (cm) vs. width (m); big problem: feature scale could dramatically influence classification results
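The feature-scale problem above is easy to reproduce: merely recording the width in meters instead of centimeters can flip which training point is nearest. A small sketch with hypothetical numbers:

```python
def nearest_index(train, query):
    """Index of the training point closest to the query (squared Euclidean)."""
    return min(range(len(train)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], query)))

# The same two flowers, widths recorded in cm vs. in m:
train_cm = [(10.0, 50.0), (20.0, 10.0)]   # (length cm, width cm)
train_m  = [(10.0, 0.50), (20.0, 0.10)]   # (length cm, width m)
query_cm, query_m = (12.0, 14.0), (12.0, 0.14)
```

In cm the width dimension dominates the distance, while in m the length dimension dominates, so the nearest neighbor changes even though the underlying flowers did not; standardizing each feature is the usual fix.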
KNN: Remarks

Computational Efficiency:
• Suppose we have N training examples, and each one has M features
• Computational complexity for the special case where k = 1:

Task                       | Naive  | k-d Tree
Train                      | O(1)   | ~O(M N log N)
Predict (one test example) | O(MN)  | ~O(2^M log N) on average

Problem: Very fast for small M, but very slow for large M
In practice: use stochastic approximations (very fast, and empirically often as good)
KNN: Remarks

Theoretical Guarantees:
Cover & Hart (1967): Let h(x) be a Nearest Neighbor (k=1) binary classifier. As the number of training examples N goes to infinity…
errortrue(h) < 2 × (Bayes Error Rate)
“In this sense, it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.”
Very informally, the Bayes Error Rate can be thought of as ‘the best you could possibly do’
Decision Boundary Example

In-Class Exercise
Dataset: Outputs {+,-}; Features x1 and x2

Question 1:
A. Can a k-Nearest Neighbor classifier with k=1 achieve zero training error on this dataset?
B. If ‘Yes’, draw the learned decision boundary. If ‘No’, why not?

Question 2:
A. Can a Decision Tree classifier achieve zero training error on this dataset?
B. If ‘Yes’, draw the learned decision boundary. If ‘No’, why not?
KNN ON FISHER IRIS DATA
Fisher Iris Dataset
Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris virginica (1), Iris versicolor (2), collected by Anderson (1936)

Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set

Species | Sepal Length | Sepal Width | Petal Length | Petal Width
0       | 4.3          | 3.0         | 1.1          | 0.1
0       | 4.9          | 3.6         | 1.4          | 0.1
0       | 5.3          | 3.7         | 1.5          | 0.2
1       | 4.9          | 2.4         | 3.3          | 1.0
1       | 5.7          | 2.8         | 4.1          | 1.3
1       | 6.3          | 3.3         | 4.7          | 1.6
1       | 6.7          | 3.0         | 5.0          | 1.7
Fisher Iris Dataset

Deleted two of the four features, so that the input space is 2D:

Species | Sepal Length | Sepal Width
0       | 4.3          | 3.0
0       | 4.9          | 3.6
0       | 5.3          | 3.7
1       | 4.9          | 2.4
1       | 5.7          | 2.8
1       | 6.3          | 3.3
1       | 6.7          | 3.0
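With only the two remaining features, the k = 1 special case can be run directly on the sample rows shown above (`nearest_neighbor` is an illustrative helper, not course code):

```python
import math

# (species, sepal length, sepal width) rows from the 2D version of the table
rows = [(0, 4.3, 3.0), (0, 4.9, 3.6), (0, 5.3, 3.7),
        (1, 4.9, 2.4), (1, 5.7, 2.8), (1, 6.3, 3.3), (1, 6.7, 3.0)]

def nearest_neighbor(query):
    """Return the species of the training row closest to query (k = 1)."""
    return min(rows, key=lambda r: math.dist(r[1:], query))[0]
```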
KNN on Fisher Iris Data

[Series of plots: KNN decision boundaries on the 2D Fisher Iris data, beginning with the special case of Nearest Neighbor (k = 1) and ending with the special case of Majority Vote.]
KNN ON GAUSSIAN DATA
[Series of plots: KNN decision boundaries on two-dimensional Gaussian data.]
KNN on Gaussian Data
91
![Page 82: Machine Learning Department School of Computer Science …mgormley/courses/10601/slides/lecture5... · 2020-04-29 · k-Nearest Neighbors + Model Selection 1 10-601 Introduction to](https://reader033.fdocuments.in/reader033/viewer/2022050510/5f9b217342834004ee16db79/html5/thumbnails/82.jpg)
K-NEAREST NEIGHBORS
93
![Page 83: Machine Learning Department School of Computer Science …mgormley/courses/10601/slides/lecture5... · 2020-04-29 · k-Nearest Neighbors + Model Selection 1 10-601 Introduction to](https://reader033.fdocuments.in/reader033/viewer/2022050510/5f9b217342834004ee16db79/html5/thumbnails/83.jpg)
Questions
• How could k-Nearest Neighbors (KNN) be applied to regression?
• Can we do better than majority vote? (e.g. distance-weighted KNN)
• Where does the Cover & Hart (1967) Bayes error rate bound come from?
94
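The distance-weighted variant raised in the second question can be sketched in a few lines. This is a minimal illustration using one common scheme (inverse-distance weights); the function name and the `eps` smoothing constant are invented here, not prescribed by the course:

```python
import numpy as np

def knn_predict_weighted(X_train, y_train, x_query, k=3, eps=1e-12):
    """Distance-weighted k-NN: each of the k nearest neighbors votes
    with weight 1/(distance + eps) instead of a flat majority vote."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances, O(N)
    nearest = np.argsort(dists)[:k]                    # indices of the k closest points
    votes = {}
    for i in nearest:
        w = 1.0 / (dists[i] + eps)                     # closer neighbors count more
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)                   # label with largest total weight
```

A side effect worth noting: because the summed weights are rarely exactly equal, this scheme also resolves most of the ties that plain majority vote hits with even k.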
![Page 84: Machine Learning Department School of Computer Science …mgormley/courses/10601/slides/lecture5... · 2020-04-29 · k-Nearest Neighbors + Model Selection 1 10-601 Introduction to](https://reader033.fdocuments.in/reader033/viewer/2022050510/5f9b217342834004ee16db79/html5/thumbnails/84.jpg)
KNN Learning Objectives
You should be able to…
• Describe a dataset as points in a high dimensional space [CIML]
• Implement k-Nearest Neighbors with O(N) prediction
• Describe the inductive bias of a k-NN classifier and relate it to feature scale [à la CIML]
• Sketch the decision boundary for a learning algorithm (compare k-NN and DT)
• State Cover & Hart (1967)'s large sample analysis of a nearest neighbor classifier
• Invent "new" k-NN learning algorithms capable of dealing with even k
• Explain computational and geometric examples of the curse of dimensionality
95
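Two of the objectives above (k-NN for regression, and O(N) prediction) can be illustrated together in a short hypothetical sketch: predict the mean target of the k nearest neighbors, using a partial sort so each query costs O(N) rather than O(N log N). The function name is invented for illustration:

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """k-NN for regression: predict the mean of the k nearest targets.
    np.argpartition finds the k smallest distances in O(N), so the
    whole prediction is O(N) in the training set size per query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # O(N) distance pass
    nearest = np.argpartition(dists, k - 1)[:k]        # k closest, no full sort
    return float(np.mean(y_train[nearest]))            # average their targets
```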
![Page 85: Machine Learning Department School of Computer Science …mgormley/courses/10601/slides/lecture5... · 2020-04-29 · k-Nearest Neighbors + Model Selection 1 10-601 Introduction to](https://reader033.fdocuments.in/reader033/viewer/2022050510/5f9b217342834004ee16db79/html5/thumbnails/85.jpg)
MODEL SELECTION
96
![Page 86: Machine Learning Department School of Computer Science …mgormley/courses/10601/slides/lecture5... · 2020-04-29 · k-Nearest Neighbors + Model Selection 1 10-601 Introduction to](https://reader033.fdocuments.in/reader033/viewer/2022050510/5f9b217342834004ee16db79/html5/thumbnails/86.jpg)
Model Selection
WARNING:
• In some sense, our discussion of model selection is premature.
• The models we have considered thus far are fairly simple.
• The models and the many decisions available to the data scientist wielding them will grow to be much more complex than what we’ve seen so far.
97
![Page 87: Machine Learning Department School of Computer Science …mgormley/courses/10601/slides/lecture5... · 2020-04-29 · k-Nearest Neighbors + Model Selection 1 10-601 Introduction to](https://reader033.fdocuments.in/reader033/viewer/2022050510/5f9b217342834004ee16db79/html5/thumbnails/87.jpg)
Model Selection
Statistics
• Def: a model defines the data generation process (i.e. a set or family of parametric probability distributions)
• Def: model parameters are the values that give rise to a particular probability distribution in the model family
• Def: learning (aka. estimation) is the process of finding the parameters that best fit the data
• Def: hyperparameters are the parameters of a prior distribution over parameters

Machine Learning
• Def: (loosely) a model defines the hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. search for good parameters)
• Def: hyperparameters are the tunable aspects of the model, that the learning algorithm does not select
98
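One concrete procedure matching these definitions is to tune a hyperparameter (here, k for k-NN) by accuracy on a held-out validation split. The following is a minimal sketch under assumed function names, and a simple train/validation split is only one option (cross-validation is a more robust alternative):

```python
import numpy as np

def select_k_by_validation(X, y, candidate_ks, val_frac=0.2, seed=0):
    """Model selection for k-NN: the learning algorithm does not choose k,
    so we pick it by accuracy on a held-out validation split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(val_frac * len(X))
    val, train = idx[:n_val], idx[n_val:]          # disjoint index sets

    def predict(xq, k):
        d = np.linalg.norm(X[train] - xq, axis=1)  # distances to training points
        nn = np.argpartition(d, k - 1)[:k]         # k nearest, O(N)
        labels, counts = np.unique(y[train][nn], return_counts=True)
        return labels[np.argmax(counts)]           # majority vote

    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        acc = np.mean([predict(X[i], k) == y[i] for i in val])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```

The key design point is that k is chosen on data the k-NN "learner" (here, just memorizing the training set) never sees, so the selected k is not rewarded for memorization.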