
Finding Predictors: Nearest Neighbor

Modern Motivations: Be Lazy!

Classification
Regression
Choosing the right number of neighbors

Some Optimizations

Other types of lazy algorithms

Compendium slides for “Guide to Intelligent Data Analysis”, Springer 2011. © Michael R. Berthold, Christian Borgelt, Frank Höppner, Frank Klawonn and Iris Adä


Motivation: A Zoo

Given: Information about animals in the zoo.
How can we classify new animals?


Remember?

Unsupervised vs. Supervised Learning

Unsupervised: No class information given. Goal: detect unknown patterns (e.g. clusters, association rules).

Supervised: Class information exists / is provided by a supervisor. Goal: learn the class structure for future unclassified/unknown data.


Motivation: Expert/Legal Systems/Streber

How do Expert Systems work?

Find most similar case(s).

How does the American Justice System work?

Find most similar case(s).

How does the nerd (“Streber”) learn?

He learns by heart.

Compendium slides for “Guide to Intelligent Data Analysis”, Springer 2011.c©Michael R. Berthold, Christian Borgelt, Frank Hoppner, Frank Klawonn and Iris Ada 4 / 49

Page 9: Finding Predictors: Nearest Neighbor Modern Motivations ... · Choosing the right number of neighbors Some Optimizations Other types of lazy algorithms Compendium slides for \Guide

Motivation: Expert/Legal Systems/Streber

How do Expert Systems work?

Find most similar case(s).

How does the American Justice System work?

Find most similar case(s).

How does the nerd (“Streber”) learn?

He learns by heart.


Eager vs. Lazy Learners


Lazy: Save all data from training and use it for classifying. (The learner was lazy; the classifier has to do the work.)

Eager: Builds a (compact) model/structure during training and uses the model for classification. (The learner was eager / worked harder; the classifier has a simple life.)


Nearest neighbour predictors

Nearest neighbour predictors are a special case of instance-based learning.

Instead of constructing a model that generalizes beyond the training data, the training examples are merely stored.

Predictions for new cases are derived directly from these stored examples and their (known) classes or target values.


Simple nearest neighbour predictor

For a new instance, use the target value of the closest neighbour in the training set.

[Figure: simple nearest-neighbour prediction for classification (left) and regression (right), output plotted over input.]
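As a concrete illustration (a sketch of my own, not from the slides), a minimal 1-nearest-neighbour predictor for numeric feature vectors with Euclidean distance could look like this in Python:

```python
import numpy as np

def predict_1nn(X_train, y_train, query):
    """Return the class or target value of the training point closest to `query`.

    X_train: (n, d) array of inputs; y_train: (n,) array of classes or target values.
    """
    dists = np.linalg.norm(X_train - query, axis=1)  # Euclidean distance to every training point
    return y_train[np.argmin(dists)]                 # copy the target of the nearest neighbour
```

The same function covers classification and regression, since it simply copies the stored target of the closest example.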


Nearest Neighbour Predictor: Issues

Noisy Data is a problem:

How can we fix this?


k-nearest neighbour predictor

Instead of relying for the prediction on only one instance (the single nearest neighbour), usually k (k > 1) nearest neighbours are taken into account, leading to the k-nearest neighbour predictor.

Classification: Choose the majority class among the k nearest neighbours for prediction.

Regression: Take the mean value of the k nearest neighbours for prediction.

Disadvantage:

All k nearest neighbours have the same influence on the prediction.

Closer nearest neighbours should have higher influence on the prediction.
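A minimal sketch of this plain k-NN rule (my own illustration, assuming numeric features and Euclidean distance):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3, regression=False):
    """Plain (unweighted) k-nearest-neighbour prediction with Euclidean distance."""
    dists = np.linalg.norm(X_train - query, axis=1)
    nn = np.argsort(dists)[:k]                        # indices of the k nearest neighbours
    if regression:
        return y_train[nn].mean()                     # regression: mean of the neighbours' targets
    return Counter(y_train[nn]).most_common(1)[0][0]  # classification: majority class
```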


Ingredients for the k-nearest neighbour predictor

Distance Metric: The distance metric, together with a possible task-specific scaling or weighting of the attributes, determines which of the training examples are nearest to a query data point and thus selects the training example(s) used to produce a prediction.

Number of Neighbours: The number of neighbours of the query point that are considered can range from only one (the basic nearest neighbour approach) through a few (as in k-nearest neighbour approaches) to, in principle, all data points as an extreme case (would that be a good idea?).


Ingredients for the k-nearest neighbour predictor

Weighting function for the neighbours: A weighting function defined on the distance of a neighbour from the query point, which yields higher values for smaller distances.

Prediction function: For multiple neighbours, one needs a procedure to compute the prediction from the (generally differing) classes or target values of these neighbours, since they may not yield a unique prediction directly.


k-nearest neighbour predictor

[Figure: k-nearest-neighbour regression in one dimension (output over input): simple average of the 3 nearest neighbours (left) vs. distance-weighted prediction from the 2 nearest neighbours (right).]


Nearest neighbour predictor

Choosing the “ingredients”

Distance metric: Problem dependent. Often Euclidean distance (after normalisation).

Number of neighbours: Very often chosen on the basis of cross-validation. Choose the k that leads to the best cross-validation performance.

Weighting function for the neighbours: e.g. the tricubic weighting function

    w(s_i, q, k) = (1 − (d(s_i, q) / d_max(q, k))^3)^3

where
    q            query point
    s_i          (input vector of) the i-th nearest neighbour of q in the training data set
    k            number of considered neighbours
    d            employed distance function
    d_max(q, k)  maximum of the distances of the k nearest neighbours to the query point q


Nearest neighbour predictor

Choosing the “ingredients”

prediction function

Regression:

Compute the weighted average of the target values of the nearest neighbours.

Classification:

Sum up the weights for each class among the nearest neighbours. Choose the class with the highest value (or incorporate a cost matrix and interpret the summed weights for the classes as likelihoods).
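To make the weighting and prediction functions concrete, here is a small Python sketch of my own (not the book's implementation) that combines the tricubic weighting function with a weighted average (regression) or a weighted vote (classification):

```python
import numpy as np

def tricubic_weight(d, d_max):
    """Tricubic weighting function from the slides: (1 - (d / d_max)^3)^3."""
    return (1.0 - (d / d_max) ** 3) ** 3

def weighted_knn_predict(X_train, y_train, query, k=3, regression=False):
    """Distance-weighted k-NN prediction: weighted average (regression) or weighted vote."""
    dists = np.linalg.norm(X_train - query, axis=1)
    nn = np.argsort(dists)[:k]
    d_max = dists[nn].max() + 1e-12                    # guard against division by zero
    w = tricubic_weight(dists[nn], d_max) + 1e-12      # tiny floor so the weights never sum to 0
    if regression:
        return np.average(y_train[nn], weights=w)      # weighted average of the target values
    classes, scores = np.unique(y_train[nn]), {}
    for c in classes:
        scores[c] = w[y_train[nn] == c].sum()          # sum of weights per class
    return max(scores, key=scores.get)                 # class with the highest summed weight
```

Note that under tricubic weighting the farthest of the k neighbours receives (almost) zero weight, which is why a small floor is added in this sketch.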


Kernel functions

A k-nearest neighbour predictor with a weighting function can be interpreted as an n-nearest neighbour predictor with a modified weighting function, where n is the number of (training) data points.

The modified weighting function simply assigns the weight 0 to all instances that do not belong to the k nearest neighbours.

More general approach: Use a general kernel function that assigns a distance-dependent weight to all instances in the training data set.


Kernel functions

Such a kernel function K, assigning a weight to each data point that depends on its distance d to the query point, should satisfy the following properties:

K(d) ≥ 0

K(0) = 1 (or at least, K has its mode at 0)

K(d) decreases monotonically with increasing d.


Kernel functions

Typical examples for kernel functions (σ > 0 is a predefined constant):

K_rect(d)     = 1 if d ≤ σ, 0 otherwise
K_triangle(d) = K_rect(d) · (1 − d/σ)
K_tricubic(d) = K_rect(d) · (1 − d^3/σ^3)^3
K_gauss(d)    = exp(−d^2 / (2σ^2))
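These kernels are straightforward to implement; a short sketch of my own, assuming `d` is a numpy array of distances and `sigma` is the bandwidth σ:

```python
import numpy as np

def k_rect(d, sigma):
    """Rectangular kernel: weight 1 within distance sigma, 0 beyond."""
    return (np.asarray(d, dtype=float) <= sigma).astype(float)

def k_triangle(d, sigma):
    return k_rect(d, sigma) * (1.0 - np.asarray(d) / sigma)

def k_tricubic(d, sigma):
    return k_rect(d, sigma) * (1.0 - np.asarray(d) ** 3 / sigma ** 3) ** 3

def k_gauss(d, sigma):
    return np.exp(-np.asarray(d) ** 2 / (2.0 * sigma ** 2))
```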


Locally weighted (polynomial) regression

For regression problems: So far, weighted averaging of the target values.

Instead of a simple weighted average, one can also compute a (local) regression function at the query point, taking the weights into account.


[Figure: Kernel-weighted regression (left) vs. distance-weighted 4-nearest-neighbour regression (tricubic weighting function, right) in one dimension.]
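As an illustration of this idea (my own sketch; a local linear fit rather than a higher-degree polynomial, with Gaussian kernel weights):

```python
import numpy as np

def locally_weighted_linear_regression(X_train, y_train, query, sigma=1.0):
    """Fit a weighted linear model around `query` and evaluate it at the query point.

    The weights come from a Gaussian kernel on the distance to the query point.
    """
    d = np.linalg.norm(X_train - query, axis=1)
    w = np.exp(-d ** 2 / (2.0 * sigma ** 2))                 # kernel weights
    A = np.hstack([np.ones((len(X_train), 1)), X_train])     # design matrix with intercept column
    W = np.diag(w)
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y_train)   # weighted least squares
    return np.r_[1.0, query] @ beta                          # evaluate the local model at the query
```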


Adjusting the distance function

The choice of the distance function is crucial for the success of a nearest neighbour approach.

One can try to adapt the distance function.

One way to adapt the distance function is to introduce feature weights that put a stronger emphasis on those features that are more important.

A configuration of feature weights can be evaluated based on cross-validation.

The optimisation of the feature weights can then be carried out based on some heuristic strategy like hill climbing, simulated annealing, or evolutionary algorithms.
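A minimal sketch of this optimisation loop (my own illustration; leave-one-out error stands in for a full cross-validation, and plain hill climbing for the heuristic search):

```python
import numpy as np

def loo_error(weights, X, y, k):
    """Leave-one-out error of k-NN with a feature-weighted Euclidean distance
    (a cheap stand-in for a full cross-validation)."""
    Xw = X * weights                          # scale each feature by its weight
    errors = 0
    for i in range(len(X)):
        d = np.linalg.norm(Xw - Xw[i], axis=1)
        d[i] = np.inf                         # never pick the query point itself
        nn = np.argsort(d)[:k]
        pred = np.bincount(y[nn]).argmax()    # majority class (assumes integer class labels)
        errors += int(pred != y[i])
    return errors / len(X)

def hill_climb_feature_weights(X, y, k=3, steps=50, seed=0):
    """Plain hill climbing: keep a random tweak of the weights only if it lowers the error."""
    rng = np.random.default_rng(seed)
    w = np.ones(X.shape[1])
    best = loo_error(w, X, y, k)
    for _ in range(steps):
        cand = np.clip(w + rng.normal(scale=0.1, size=w.shape), 0.0, None)
        err = loo_error(cand, X, y, k)
        if err < best:
            w, best = cand, err
    return w
```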


Data set reduction, prototype building

Advantage of the nearest neighbour approach: No time for training is needed – at least when no feature weight adaptation is carried out or the number of nearest neighbours is fixed in advance.

Disadvantage: Calculation of the predicted class or value can take long when the data set is large.

Possible solutions:

Finding a smaller subset of the training set for the nearest neighbour predictor.

Building prototypes by merging (close) instances, for instance by averaging.

Can be carried out based on cross-validation and using heuristic optimisation strategies.
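One possible realisation of the prototype idea (my own sketch, not the method from the book): run a small k-means per class and use the resulting centres as prototypes for the nearest neighbour predictor.

```python
import numpy as np

def class_prototypes(X, y, per_class=5, iters=20, seed=0):
    """Replace the training set by a few prototypes per class (a simple k-means per class)."""
    rng = np.random.default_rng(seed)
    proto_X, proto_y = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        m = min(per_class, len(Xc))
        centers = Xc[rng.choice(len(Xc), m, replace=False)].astype(float)
        for _ in range(iters):
            # assign every instance of the class to its closest prototype
            assign = np.argmin(((Xc[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
            for j in range(m):
                if np.any(assign == j):
                    centers[j] = Xc[assign == j].mean(axis=0)  # merge close instances by averaging
        proto_X.append(centers)
        proto_y.append(np.full(m, c))
    return np.vstack(proto_X), np.concatenate(proto_y)
```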


Choice of parameter k

[Figures: a linear classification problem (with some noise) and the resulting nearest-neighbour decision regions for k = 1, 2, 5, 50, 470, 480, and 500.]

Choice of Parameter k

k = 1 yields a piecewise constant labelling

“too small” k: very sensitive to outliers

“too large” k: many objects from other clusters (classes) in the decision set

k = N predicts a globally constant (majority) label

The selection of k depends on various input “parameters”:

the size n of the data set

the quality of the data

...

Choice of Parameter k: cont.

[Figures: a simple data set, the Voronoi tessellation of the input space induced by the 1-nearest-neighbour classifier and the resulting classification, and the classifications for k = 1, 2, and 3. Concept, images, and analysis from Peter Flach.]

Choice of Parameter k

k = 1: highly localized classifier, perfectly fits separable training data

k > 1:

the instance space partition refines
more segments are labelled with the same local model


Choice of Parameter k - Cross Validation

k is mostly determined manually or heuristically

One heuristic: Cross Validation

1. Select a cross-validation method (e.g. q-fold cross-validation with D = D_1 ∪ ... ∪ D_q, the D_i pairwise disjoint)

2. Select a range for k (e.g. 1 < k ≤ k_max)

3. Select an evaluation measure, e.g.
   E(k) = Σ_{i=1..q} Σ_{x ∈ D_i} p(x is correctly classified | D \ D_i)

4. Use the k with the best score: k_best = argmax_k E(k)
   (argmin instead, if E measures the error)

Can we do this in KNIME?...
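The same recipe can also be written down quickly outside KNIME; a Python/scikit-learn sketch of my own that scores each k by q-fold cross-validation accuracy:

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def choose_k(X, y, k_max=25, q=10):
    """Return the k with the best mean q-fold cross-validation accuracy (and all scores)."""
    scores = {}
    for k in range(1, k_max + 1):
        clf = KNeighborsClassifier(n_neighbors=k)
        scores[k] = cross_val_score(clf, X, y, cv=q).mean()  # mean accuracy over the q folds
    return max(scores, key=scores.get), scores
```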


kNN Classifier: Summary

Instance Based Classifier: Remembers all training cases

Sensitive to neighborhood:

Distance Function
Neighborhood Weighting
Prediction (Aggregation) Function


Food for Thought: 1-NN Classifier

Bias of the Learning Algorithm?

No variations in search: simply store all examples

Model Bias?

Classification via Nearest Neighbor

Hypothesis Space?

One hypothesis only: Voronoi partitioning of space


Again: Lazy vs. Eager Learners

kNN learns a local model at query time

Previous algorithms (k-means, ID3, ...) learn a global model before query time.

Lazy Algorithms:

do nothing during training (just store examples)
generate a new hypothesis for each query (“class A!” in the case of kNN!)

Eager Algorithms:

do as much as possible during training (ideally: extract the one relevant rule!)
generate one global hypothesis (or a set, see Candidate-Elimination) once.
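
In code terms the difference is simply where the work happens. A rough sketch, assuming numerical feature vectors; the class LazyKNN is illustrative only:

    import numpy as np

    # lazy learner: training is trivial, all the effort is spent per query
    class LazyKNN:
        def fit(self, X, y):
            self.X, self.y = X, y              # "do nothing" during training: just store the examples
            return self

        def predict_one(self, x, k=5):
            d = ((self.X - x) ** 2).sum(axis=1)
            nn = np.argsort(d)[:k]             # a new local hypothesis is built for each query
            values, counts = np.unique(self.y[nn], return_counts=True)
            return values[np.argmax(counts)]

    # an eager learner (e.g., a decision tree) does the opposite:
    # heavy work in fit() to build one global hypothesis, trivial work per prediction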


Other Types of Lazy Learners


Lazy Decision Trees

Can we use a decision tree algorithm in a lazy mode?

Sure: only create the branch that contains the test case.

Better: do beam search instead of greedy “branch” building!

Works for essentially all model-building algorithms (but makes sense for “partitioning”-style algorithms only).
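
A rough sketch of such a lazy, query-driven tree descent, assuming discrete (binned) features and an entropy criterion; all names are illustrative, and the split score only considers the branch that the query falls into:

    # hypothetical sketch of a lazy decision-tree query
    import numpy as np

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    def lazy_tree_predict(X, y, query, min_leaf=5):
        # X: (n, d) matrix of discrete features, y: class labels, query: (d,) instance
        idx = np.arange(len(y))
        used = set()
        while len(idx) > min_leaf and len(np.unique(y[idx])) > 1 and len(used) < X.shape[1]:
            base = entropy(y[idx])
            best_gain, best_f = 0.0, None
            for f in range(X.shape[1]):
                if f in used:
                    continue
                mask = X[idx, f] == query[f]       # only the query's branch is ever built
                if mask.sum() == 0:
                    continue
                gain = base - entropy(y[idx[mask]])
                if gain > best_gain:
                    best_gain, best_f = gain, f
            if best_f is None:
                break
            used.add(best_f)
            idx = idx[X[idx, best_f] == query[best_f]]
        # majority vote in the node the query ended up in
        values, counts = np.unique(y[idx], return_counts=True)
        return values[np.argmax(counts)]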


Lazy(?) Neural Networks

Specht introduced Probabilistic Neural Networks in 1990.

A special type of neural network, optimized for classification tasks.

One neuron per training instance.

Actually long known as “Parzen Window Classifier” in statistics.

Parzen Window

The Parzen window method is a non-parametric procedure that synthesizes an estimate of a probability density function (pdf) by superposition of a number of windows, replicas of a function (often the Gaussian).
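
A minimal sketch of such a classifier, assuming Gaussian windows of width σ and one “neuron” (window) per training instance; pnn_predict is an illustrative name:

    # minimal Parzen-window / PNN-style classifier sketch with Gaussian windows
    import numpy as np

    def pnn_predict(X_train, y_train, x, sigma=1.0):
        scores = {}
        for c in np.unique(y_train):
            Xc = X_train[y_train == c]
            d2 = ((Xc - x) ** 2).sum(axis=1)                 # squared distances to class-c instances
            scores[c] = np.exp(-d2 / (2 * sigma ** 2)).sum() # sum of Gaussian window responses
        return max(scores, key=scores.get)                   # class with the largest density estimate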


Probabilistic Neural Networks

Probabilistic Neural Networks are powerful predictors (but the σ-adjustment is problematic).

Efficient (and eager!) training algorithms exist that introduce more general neurons, covering more than just one training instance.

The usual problems of distance-based classifiers apply.


Compendium slides for “Guide to Intelligent Data Analysis”, Springer 2011. © Michael R. Berthold, Christian Borgelt, Frank Höppner, Frank Klawonn and Iris Adä