Introduction to Machine Learning
Lecture 7: Instance Based Learning
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lecture 6
LET'S START WITH DATA CLASSIFICATION
Recap of Lecture 6
Data set → classification model: how?
We are going to deal with:
• Data described by nominal and continuous attributes
• Data that may have instances with missing values
Recap of Lecture 6
We want to build decision trees
How can I automatically generate these types of trees?
• Decide which attribute we should put in each node
• Decide a split point
Rely on information theory
We also saw many other improvements
Today’s Agenda
• Classification without building a model
• k-Nearest Neighbor (kNN)
• Effect of k
• Distance functions
• Variants of kNN
• Strengths and weaknesses
Classification without Building a Model
Forget about a global model! Simply store all the training examples
Build a local model for each new test instance
Referred to as lazy learners
Some approaches to IBL:
• Nearest neighbors
• Locally weighted regression
• Case-based reasoning
k-Nearest Neighbors: Algorithm
Store all the training data
Given a new test instance:
• Recover the k nearest neighbors of the test instance
• Predict the majority class among the neighbors
Voronoi cells: the feature space is decomposed into several cells, e.g., for k=1
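To make the procedure concrete, here is a minimal Python sketch of this lazy prediction step (the helper names and toy data are illustrative, not part of the lecture):

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train, query, k=3):
    """Predict the majority class among the k nearest stored instances.

    `train` is a list of (vector, label) pairs; all the work happens here,
    at prediction time, which is what makes the learner "lazy".
    """
    neighbors = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy usage: two clusters, query near the first one
train = [([1.0, 1.0], 'a'), ([1.2, 0.9], 'a'), ([5.0, 5.1], 'b'), ([4.8, 5.3], 'b')]
print(knn_predict(train, [1.1, 1.0], k=3))  # -> 'a'
```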
k-Nearest Neighbors: But, where is the learning process?
Is selecting the k neighbors and returning the majority class learning?
No, that's just retrieving
But still, some important issues:
• Which k should I use?
• Which distance function should I use?
• Should I maintain all instances of the training data set?
Which k Should I Use?
The effect of k
[Figure: decision boundaries of 15-NN vs. 1-NN]
Do you remember the discussion about overfitting in C4.5?
Apply the same concepts here!
Which k Should I Use?
Some experimental results on the use of different k
[Figure: test error as a function of the number of neighbors, with 7-NN marked]
Notice that the test error decreases as k increases but, at k ≈ 5-7, it starts increasing again
Rule of thumb: k=3, k=5, and k=7 seem to work well in the majority of problems
Distance Functions
Distance functions must be able to handle:
• Nominal attributes
• Continuous attributes
• Missing values
The key: they must return a low value for similar objects and a high value for different objects
Seems obvious, right? But still, it is domain dependent
There are many of them. Let's see some of the most used
Distance Functions
Distance between two points in the same space: d(x, y)
Some properties expected to be satisfied in general:
• d(x, y) ≥ 0 and d(x, x) = 0
• d(x, y) = d(y, x) (symmetry)
• d(x, y) + d(y, z) ≥ d(x, z) (triangle inequality)
Distances for Continuous Variables
Given x = (x_1, …, x_n)' and y = (y_1, …, y_n)':
Euclidean: $d_E(x, y) = \left[ \sum_{i=1}^{n} (x_i - y_i)^2 \right]^{1/2}$
Minkowski: $d_M(x, y) = \left[ \sum_{i=1}^{n} |x_i - y_i|^q \right]^{1/q}$
Absolute-value distance: $d_{ABS}(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
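As a quick illustration of these definitions, a small Python sketch of the Minkowski family, where q = 2 recovers the Euclidean distance and q = 1 the absolute-value (Manhattan) distance (function name and test vectors are illustrative):

```python
def minkowski(x, y, q=2):
    """Minkowski distance of order q between two numeric vectors."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

x, y = [0.0, 3.0], [4.0, 0.0]
print(minkowski(x, y, q=2))  # Euclidean: 5.0
print(minkowski(x, y, q=1))  # absolute value: 7.0
```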
Distances for Continuous Variables
What if attributes are measured over different scales?
• Attribute 1 ranging in [0, 1]
• Attribute 2 ranging in [0, 1000]
Can you detect any potential problem in the aforementioned distance functions?
[Figure: scatter plots comparing x in [0, 1], y in [0, 1000] against x in [0, 1000], y in [0, 1000]]
Distances for Continuous Variables
The larger the scale, the larger the influence of the attribute in the distance function
Solution: normalize each attribute
How:
• Normalization by means of the range:
$d_{norm}(ex_1^a, ex_2^a) = \frac{d(ex_1^a, ex_2^a)}{\max_a - \min_a}$
• Normalization by means of the standard deviation:
$d_{norm}(ex_1^a, ex_2^a) = \frac{d(ex_1^a, ex_2^a)}{4 \sigma_a}$
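A small sketch of both options in Python; it rescales the attribute values themselves, which for the range option yields the same per-attribute differences as dividing the distance by max_a − min_a (function names are illustrative):

```python
def range_normalize(column):
    """Rescale a list of attribute values to [0, 1] using the range."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # guard against a constant attribute
    return [(v - lo) / span for v in column]

def sd_normalize(column):
    """Divide values by 4 standard deviations (the slide's second option)."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / n) ** 0.5 or 1.0
    return [v / (4 * sd) for v in column]

print(range_normalize([0, 250, 1000]))  # -> [0.0, 0.25, 1.0]
```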
Distances for Nominal Attributes
Several metrics to deal with nominal attributes:
Overlap distance function
Idea: two nominal values are equal only if they have the same value, i.e., the per-attribute distance is 0 when the values match and 1 otherwise
Distances for Nominal Attributes
Several metrics to deal with nominal attributes:
Value difference metric (VDM)
• C = number of classes
• P(a, ex_i, c) = conditional probability that the output class is c given that the attribute a has the value ex_i
With these, the per-attribute distance is $vdm_a(ex_1, ex_2) = \sum_{c=1}^{C} |P(a, ex_1, c) - P(a, ex_2, c)|^q$
Idea: two nominal values are similar if they have more similar correlations with the output classes
See (Wilson & Martinez) for more distance functions
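A rough Python sketch of estimating the per-attribute VDM from data, assuming q = 2 and empirical frequencies as probability estimates (all names and the toy data are illustrative):

```python
from collections import Counter, defaultdict

def vdm(values, labels, v1, v2, q=2):
    """Value Difference Metric between two values of one nominal attribute,
    comparing the empirical P(class | value) distributions class by class."""
    counts = Counter(values)       # how often each value occurs
    joint = defaultdict(Counter)   # value -> class -> co-occurrence count
    for v, c in zip(values, labels):
        joint[v][c] += 1
    return sum(abs(joint[v1][c] / counts[v1] - joint[v2][c] / counts[v2]) ** q
               for c in set(labels))

# Toy data: one nominal attribute and the class of each example
vals = ['red', 'red', 'blue', 'blue', 'green']
cls  = ['pos', 'pos', 'neg', 'neg', 'pos']
print(vdm(vals, cls, 'red', 'green'))  # 0.0: same class correlations
print(vdm(vals, cls, 'red', 'blue'))   # 2.0: opposite class correlations
```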
Distances for Heterogeneous Attributes
What if my data set is described by both nominal and continuous attributes?
Apply the same distance function:
• Use nominal distance functions for nominal attributes
• Use continuous distance functions for continuous attributes
Variants of kNN
Different variants of kNN:
• Distance-weighted kNN
• Attribute-weighted kNN
Distance-Weighted kNN
Inference of original kNN:
The k nearest neighbors vote for the class
Shouldn't the closest examples have a higher influence in the decision process?
Weight the contribution of each of the k neighbors w.r.t. their distance
E.g., for classification:
$\hat{f}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i)), \quad \text{where } w_i = \frac{1}{d(x_q, x_i)^2}$
For real-valued target functions, the weighted average is used instead:
$\hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$
More robust to noisy instances and outliers
E.g.: Shepard's method (Shepard, 1968)
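A minimal sketch of the distance-weighted vote in Python, using the 1/d² weights above; an exact match is returned directly to avoid division by zero (this convention and the names are assumptions):

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def weighted_knn_predict(train, query, k=5):
    """Each of the k nearest neighbors adds w_i = 1/d(x_q, x_i)^2 to the
    score of its class; the highest-scoring class is returned."""
    neighbors = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    scores = {}
    for vec, label in neighbors:
        d = euclidean(vec, query)
        if d == 0.0:
            return label  # exact match: infinite weight, so it decides alone
        scores[label] = scores.get(label, 0.0) + 1.0 / d ** 2
    return max(scores, key=scores.get)
```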
Attribute-Weighted kNN
What if some attributes are irrelevant or misleading?
• If irrelevant: cost increases, but accuracy is not affected
• If misleading: cost increases and accuracy may decrease
Weight attributes:
$d_w(x, y) = \sum_{i=1}^{n} w_i (x_i - y_i)^2$
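A one-function sketch of this weighted distance in Python, in the squared form of the formula above (taking the square root would not change the neighbor ordering; the weights in the example are hypothetical):

```python
def weighted_distance(x, y, w):
    """Attribute-weighted squared Euclidean distance; w_i = 0 effectively
    discards an irrelevant attribute."""
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y))

# Down-weighting the second (hypothetically misleading) attribute
print(weighted_distance([1.0, 900.0], [2.0, 100.0], w=[1.0, 0.001]))  # 641.0
```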
How to determine the weights?
• Option 1: The expert provides us with the weights
• Option 2: Use a machine learning approach
More will be said in the next lecture!
Strengths and Weaknesses
Strengths of kNN:
• It builds a new local model for each test instance
• Learning has no cost
• Empirical results show that the method is highly accurate w.r.t. other machine learning techniques
Weaknesses:
• Retrieving approach, but it does not learn
• No global model; the knowledge is not legible
• Test cost increases linearly with the number of stored instances
• No generalization
• Curse of dimensionality: what happens if we have many attributes?
• Noise and outliers may have a very negative effect
Next Class
From instance-based to case-based reasoning
A little bit more on learning:
• Distance functions
• Prototype selection