Introduction to Machine Learning
Lecture 7: Instance Based Learning
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lecture 6
LET'S START WITH DATA CLASSIFICATION
Recap of Lecture 6
Data set → classification model: how?
We are going to deal with:
• Data described by nominal and continuous attributes
• Data that may have instances with missing values
Recap of Lecture 6
We want to build decision trees
How can I automatically generate these types of trees?
• Decide which attribute we should put in each node
• Decide a split point
Rely on information theory
We also saw many other improvements
Today’s Agenda
• Classification without building a model
• k-Nearest Neighbor (kNN)
• Effect of k
• Distance functions
• Variants of kNN
• Strengths and weaknesses
Classification without Building a Model
Forget about a global model! Simply store all the training examples
Build a local model for each new test instance
Referred to as lazy learners
Some approaches to IBL:
• Nearest neighbors
• Locally weighted regression
• Case-based reasoning
k-Nearest Neighbors: Algorithm
Store all the training data
Given a new test instance:
• Recover the k nearest neighbors of the test instance
• Predict the majority class among the neighbors
Voronoi cells: the feature space is decomposed into several cells, e.g., for k=1
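To make the procedure concrete, here is a minimal Python sketch of this lazy prediction step (the helper names and toy data are illustrative, not part of the lecture):

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train, query, k=3):
    """Predict the majority class among the k nearest stored instances.

    `train` is a list of (vector, label) pairs; all the work happens here,
    at prediction time, which is what makes the learner "lazy".
    """
    neighbors = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy usage: two clusters, query near the first one
train = [([1.0, 1.0], 'a'), ([1.2, 0.9], 'a'), ([5.0, 5.1], 'b'), ([4.8, 5.3], 'b')]
print(knn_predict(train, [1.1, 1.0], k=3))  # -> 'a'
```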
k-Nearest Neighbors: But, where is the learning process?
Is selecting the k neighbors and returning the majority class learning?
No, that's just retrieving
But still, some important issues:
• Which k should I use?
• Which distance function should I use?
• Should I maintain all instances of the training data set?
Which k Should I Use?
The effect of k
[Figure: decision boundaries of 15-NN vs. 1-NN]
Do you remember the discussion about overfitting in C4.5?
Apply the same concepts here!
Which k Should I Use?
Some experimental results on the use of different k
[Figure: test error as a function of the number of neighbors, with 7-NN marked]
Notice that the test error decreases as k increases but, at k ≈ 5-7, it starts increasing again
Rule of thumb: k=3, k=5, and k=7 seem to work well in the majority of problems
Distance Functions
Distance functions must be able to handle:
• Nominal attributes
• Continuous attributes
• Missing values
The key: they must return a low value for similar objects and a high value for different objects
Seems obvious, right? But still, it is domain dependent
There are many of them. Let's see some of the most used
Distance Functions
Distance between two points in the same space: d(x, y)
Some properties expected to be satisfied in general:
• d(x, y) ≥ 0 and d(x, x) = 0
• d(x, y) = d(y, x) (symmetry)
• d(x, y) + d(y, z) ≥ d(x, z) (triangle inequality)
Distances for Continuous Variables
Given x = (x_1, …, x_n)' and y = (y_1, …, y_n)':
Euclidean: $d_E(x, y) = \left[ \sum_{i=1}^{n} (x_i - y_i)^2 \right]^{1/2}$
Minkowski: $d_M(x, y) = \left[ \sum_{i=1}^{n} |x_i - y_i|^q \right]^{1/q}$
Absolute-value distance: $d_{ABS}(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
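As a quick illustration of these definitions, a small Python sketch of the Minkowski family, where q = 2 recovers the Euclidean distance and q = 1 the absolute-value (Manhattan) distance (function name and test vectors are illustrative):

```python
def minkowski(x, y, q=2):
    """Minkowski distance of order q between two numeric vectors."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

x, y = [0.0, 3.0], [4.0, 0.0]
print(minkowski(x, y, q=2))  # Euclidean: 5.0
print(minkowski(x, y, q=1))  # absolute value: 7.0
```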
Distances for Continuous Variables
What if attributes are measured over different scales?
• Attribute 1 ranging in [0, 1]
• Attribute 2 ranging in [0, 1000]
Can you detect any potential problem in the aforementioned distance functions?
[Figure: scatter plots comparing x in [0, 1], y in [0, 1000] against x in [0, 1000], y in [0, 1000]]
Distances for Continuous Variables
The larger the scale, the larger the influence of the attribute in the distance function
Solution: normalize each attribute
How:
• Normalization by means of the range:
$d_{norm}(ex_1^a, ex_2^a) = \frac{d(ex_1^a, ex_2^a)}{\max_a - \min_a}$
• Normalization by means of the standard deviation:
$d_{norm}(ex_1^a, ex_2^a) = \frac{d(ex_1^a, ex_2^a)}{4 \sigma_a}$
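A small sketch of both options in Python; it rescales the attribute values themselves, which for the range option yields the same per-attribute differences as dividing the distance by max_a − min_a (function names are illustrative):

```python
def range_normalize(column):
    """Rescale a list of attribute values to [0, 1] using the range."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # guard against a constant attribute
    return [(v - lo) / span for v in column]

def sd_normalize(column):
    """Divide values by 4 standard deviations (the slide's second option)."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / n) ** 0.5 or 1.0
    return [v / (4 * sd) for v in column]

print(range_normalize([0, 250, 1000]))  # -> [0.0, 0.25, 1.0]
```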
Distances for Nominal Attributes
Several metrics to deal with nominal attributes:
Overlap distance function
Idea: two nominal values are equal only if they have the same value, i.e., the per-attribute distance is 0 when the values match and 1 otherwise
Distances for Nominal Attributes
Several metrics to deal with nominal attributes:
Value difference metric (VDM)
• C = number of classes
• P(a, ex_i, c) = conditional probability that the output class is c given that the attribute a has the value ex_i
With these, the per-attribute distance is $vdm_a(ex_1, ex_2) = \sum_{c=1}^{C} |P(a, ex_1, c) - P(a, ex_2, c)|^q$
Idea: two nominal values are similar if they have more similar correlations with the output classes
See (Wilson & Martinez) for more distance functions
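A rough Python sketch of estimating the per-attribute VDM from data, assuming q = 2 and empirical frequencies as probability estimates (all names and the toy data are illustrative):

```python
from collections import Counter, defaultdict

def vdm(values, labels, v1, v2, q=2):
    """Value Difference Metric between two values of one nominal attribute,
    comparing the empirical P(class | value) distributions class by class."""
    counts = Counter(values)       # how often each value occurs
    joint = defaultdict(Counter)   # value -> class -> co-occurrence count
    for v, c in zip(values, labels):
        joint[v][c] += 1
    return sum(abs(joint[v1][c] / counts[v1] - joint[v2][c] / counts[v2]) ** q
               for c in set(labels))

# Toy data: one nominal attribute and the class of each example
vals = ['red', 'red', 'blue', 'blue', 'green']
cls  = ['pos', 'pos', 'neg', 'neg', 'pos']
print(vdm(vals, cls, 'red', 'green'))  # 0.0: same class correlations
print(vdm(vals, cls, 'red', 'blue'))   # 2.0: opposite class correlations
```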
Distances for Heterogeneous Attributes
What if my data set is described by both nominal and continuous attributes?
Apply the same distance function:
• Use nominal distance functions for nominal attributes
• Use continuous distance functions for continuous attributes
Variants of kNN
Different variants of kNN:
• Distance-weighted kNN
• Attribute-weighted kNN
Distance-Weighted kNN
Inference of original kNN:
The k nearest neighbors vote for the class
Shouldn't the closest examples have a higher influence in the decision process?
Weight the contribution of each of the k neighbors w.r.t. their distance
E.g., for classification:
$\hat{f}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i)), \quad \text{where } w_i = \frac{1}{d(x_q, x_i)^2}$
For real-valued target functions, the weighted average is used instead:
$\hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$
More robust to noisy instances and outliers
E.g.: Shepard's method (Shepard, 1968)
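A minimal sketch of the distance-weighted vote in Python, using the 1/d² weights above; an exact match is returned directly to avoid division by zero (this convention and the names are assumptions):

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def weighted_knn_predict(train, query, k=5):
    """Each of the k nearest neighbors adds w_i = 1/d(x_q, x_i)^2 to the
    score of its class; the highest-scoring class is returned."""
    neighbors = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    scores = {}
    for vec, label in neighbors:
        d = euclidean(vec, query)
        if d == 0.0:
            return label  # exact match: infinite weight, so it decides alone
        scores[label] = scores.get(label, 0.0) + 1.0 / d ** 2
    return max(scores, key=scores.get)
```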
Attribute-Weighted kNN
What if some attributes are irrelevant or misleading?
• If irrelevant: cost increases, but accuracy is not affected
• If misleading: cost increases and accuracy may decrease
Weight attributes:
$d_w(x, y) = \sum_{i=1}^{n} w_i (x_i - y_i)^2$
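A one-function sketch of this weighted distance in Python, in the squared form of the formula above (taking the square root would not change the neighbor ordering; the weights in the example are hypothetical):

```python
def weighted_distance(x, y, w):
    """Attribute-weighted squared Euclidean distance; w_i = 0 effectively
    discards an irrelevant attribute."""
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y))

# Down-weighting the second (hypothetically misleading) attribute
print(weighted_distance([1.0, 900.0], [2.0, 100.0], w=[1.0, 0.001]))  # 641.0
```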
How to determine the weights?
• Option 1: The expert provides us with the weights
• Option 2: Use a machine learning approach
More will be said in the next lecture!
Strengths and Weaknesses
Strengths of kNN:
• It builds a new local model for each test instance
• Learning has no cost
• Empirical results show that the method is highly accurate w.r.t. other machine learning techniques
Weaknesses:
• Retrieving approach, but it does not learn
• No global model; the knowledge is not legible
• Test cost increases linearly with the number of stored instances
• No generalization
• Curse of dimensionality: what happens if we have many attributes?
• Noise and outliers may have a very negative effect
Next Class
From instance-based to case-based reasoning
A little bit more on learning:
• Distance functions
• Prototype selection