Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded...

67
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Nearest Neighbor Rule Error Rate of NN Rule Large Sample Size Geometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms . . Nearest Neighbor Rule Robert M. Haralick Computer Science, Graduate Center City University of New York

Transcript of Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded...

Page 1: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.

......Nearest Neighbor Rule

Robert M. Haralick

Computer Science, Graduate CenterCity University of New York

Page 2: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Outline

...1 The Nearest Neighbor Rule

...2 Error Rate of NN Rule

...3 Large Sample Size

...4 Geometry of a Bounded High Dimensional Space

...5 Max and Euclidean Distances

...6 Projection Based Algorithms

Page 3: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Nearest Neighbor Rule

The nearest neighbor rule uses ancient common sensewisdom..Definition..

......Assign a new pattern to the class of the pattern in the trainingset closest to it.

Page 4: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Voroni Tesselation

Let a set of points be given. Associate with each point the setof points that are closer to it than any other of the given points.This tesselation is called the Voroni Tesselation.

Page 5: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Formal Statement

Let the set of classes be C = {c1, . . . , cK} and let X be the setof all possible measurements. We assume that there is a metricd defined on X .

Let the training data set be < (x1, c1), . . . , (xN , cN) > whereeach xn is a measurement vector and its corresponding cn isthe class label of xn.

Let x be the new measurement vector. The NN rule assigns xto class cm where d(x , xm) ≤ d(x , xn),n = 1, . . . ,N.

Page 6: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Nearest Neighbor Probability Distribution Assumption

Let the training data set be < (x1, c1), . . . , (xN , cN) > whereeach xn is a measurement vector and its corresponding cn isthe class label of xn.Let N (xm) denote the nearest neighbor set associated with xm.

N (xm) = {x | d(xm, x) ≤ d(xn, x), n = 1, . . . ,N}

Then,

P(c | x) =

1 if m is the smallest index such that

x ∈ N (xm) and cm = c0 otherwise

Page 7: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Nearest Neighbor Probability Distribution Assumption

Let {m | x ∈ N (xm)} = {m1, . . . ,mK}Let p1, . . . , pK satisfy

pk ≥ 0, k = 1, . . . ,KK∑

k=1

pk = 1

Then,

P(cmk | x) = pk , k = 1, . . . ,KP(c | x) = 0, c ̸= cmk for some k

Page 8: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. The Random Sampling Process

The training set is created by a twofold random samplingprocess.

First a class is sampled in accordance with the class priorprobabilities P(c1), . . . ,P(cK ).Suppose that the randomly sampled class for the nth

sample is cn. Then the measurement xn is randomlysampled from the class conditional distribution P(xn | cn).

This two fold sampling is done independently for n = 1, . . . ,N.

Page 9: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. The Random Sampling Process

The training set is created by a twofold random samplingprocess. Hence,

P(c1, . . . , cN) =N∏

n=1

P(cn)

P(x1, . . . , xN | c1, . . . , cN) =N∏

n=1

P(xn | cn)

Page 10: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Conditional IndependenceLet a new pair (x , c) be sampled in accordance with the randomsampling process. However the true class c is not made available tothe decision rule. Suppose that xm is the nearest neighbor to x .Consider, P(c, cm | x , xm)

P(c, cm | x , xm) =P(x , xm | c, cm)P(c, cm)

P(x , xm)

=P(x | c)P(xm | cm)P(c)P(cm)

P(x , xm)

=

P(c | x)P(x)P(c)

P(cm | xm)P(xm)P(cm) P(c)P(cm)

P(x , xm)

=P(c | x)P(x)P(cm | xm)P(xm)

P(x , xm)

Page 11: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Conditional Independence

P(x , xm) =K∑

i=1

K∑j=1

P((c i , x), (c j , xm))

=K∑

i=1

K∑j=1

P(x , xm | c i , c j)P(c i , c j)

=K∑

i=1

K∑j=1

P(x |c i)P(xm | c j)P(c i , c j)

Page 12: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Conditional Independence

P(x , xm) =K∑

i=1

K∑j=1

P(x |c i)P(xm | c j)P(c i)P(c j)

=K∑

i=1

P(x |c i)P(c i)K∑

j=1

P(xm | c j)P(c j)

=K∑

i=1

P(x , c i)K∑

j=1

P(xm, c j)

= P(x)P(xm)

Page 13: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Conditional Independence

P(c, cm | x , xm) =P(c | x)P(x)P(cm | xm)P(xm)

P(x , xm)

P(x , xm) = P(x)P(xm)

P(c, cm | x , xm) =P(c | x)P(x)P(cm | xm)P(xm)

P(x)P(xm)

= P(c | x)P(cm | xm)

Page 14: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Probability of NN Rule Error

Let PN(e | x) be the probability of error of a NN rule based on atraining set sample size of N. Let xm be the nearest neighbor tox .

PN(e |x) =PN(e, x)

P(x)

=

∫PN(e, xm, x)

P(x)dxm

=

∫P(e | x , xm)PN(x , xm)

P(x)dxm

=

∫P(e |x , xm)PN(xm | x)dxm

Page 15: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Probability of NN Rule Error

P(e | x , xm) = 1 −K∑

k=1

P(ck , ck | x , xm)

= 1 −K∑

k=1

P(ck |x)P(ck | xm)

Therefore,

PN(e |x) =

∫P(e |x , xm)PN(xm | x)dxm

=

∫ (1 −

K∑k=1

P(ck |x)P(ck | xm)

)PN(xm | x)dxm

Page 16: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Large Sample Size

Given x , consider what happens to its nearest neighbor xm asthe sample size gets large. We assume the mixture densityfunction P to be continuous and P(x) ̸= 0. Let S be ahypersphere centered at x and with small radius r . Let PS bethe probability that a measurement sampled from the mixturedensity function falls into S. Then 0 < PS < 1.

The probability that all of the N independently sampled trainingmeasurements are outside of the hypersphere is (1 − PS)

N .

limN→∞(1 − PS)N = 0

Page 17: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Large Sample Size

limN→∞(1 − PS)N = 0

This implies that the nearest neighbor xm to x converges to x inprobability. Thus we can write

limN→∞

PN(xm | x) → δ(x − xm)

Page 18: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Asymptotic Probability of NN Rule Error

Now, as the training set gets large, N → ∞, andPN(xm | x) → δ(x − xm). Hence,

limN→∞

PN(e | x) = limN→∞

∫ (1 −

K∑k=1

P(ck |x)P(ck | xm)

)PN(xm | x)dxm

=

∫ (1 −

K∑k=1

P(ck |x)P(ck | xm)

)lim

N→∞PN(xm | x)dxm

=

∫ (1 −

K∑k=1

P(ck |x)P(ck | xm)

)δ(x − xm)dxm

Page 19: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Asymptotic Probability of NN Rule Error

P(e | x) be the asymptotic probability of error of a NN Rule.

P(e | x) = limN→∞

PN(e | x)

=

∫ (1 −

K∑k=1

P(ck |x)P(ck | xm)

)δ(x − xm)dxm

= 1 −K∑

k=1

P(ck |x)P(ck | x)

= 1 −K∑

k=1

P2(ck |x)

Page 20: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Asymptotic Probability of NN Rule Error

Let x be given and let Q(e | x) be the error of a Bayes rule onmeasurement x . Let cm be the true class of x . Then

Q(e | x) = 1 − P(cm | x)

K∑k=1

P2(ck |x) = P2(cm | x) +K∑

k=1k ̸=m

P2(ck |x)

= (1 − Q(e | x))2 +K∑

k=1k ̸=m

P2(ck |x)

Page 21: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Inequality

Given,

K∑k=1

k ̸=m

P(ck |x) = 1 − P(cm | x)

= Q(e | x)

What is the smallestK∑

k=1k ̸=m

P2(ck |x)

can be over all P(ck | x)?

Page 22: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Inequality

MinimizeJ∑

j=1

z2j

under the constraint thatJ∑

j=1

zj = b

The minimum is achieved when zj = b/J, j = 1, . . . , J. In this case,

J∑j=1

z2j = J

b2

J2 =b2

J

Page 23: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Inequality

Page 24: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Inequality

When J = 2, the minimum is b2

2 and z1 = z2 = b√2.

Page 25: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Asymptotic Probability of NN Rule Error

Let x be given and let Q(e |x) be the error of a Bayes rule.Then Q(e | x) = 1 − P(c | x)

P(e | x) = 1 −K∑

k=1

P2(ck | x) = 1 − P2(cm | x)−K∑

k=1k ̸=m

P2(ck | x)

≤ 1 − (1 − Q(e | x))2 − Q2(e | x)K − 1

≤ 1 − (1 − 2Q(e | x) + Q2(e | x))− Q2(e | x)K − 1

≤ 2Q(e | x)− Q2(e | x)− Q2(e | x)/(K − 1)

≤ 2Q(e | x)− Q2(e | x)K

K − 1≤ 2Q(e | x)

Page 26: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Asymptotic Probability of NN Rule Error

Let Pe be the asymptotic probability of error of the NN Rule. LetQe be the probability of error of the Bayes rule.

P(e | x) ≤ 2Q(e | x)

Pe =

∫P(e, x)dx

=

∫P(e | x)P(x)dx

≤ 2∫

Q(e | x)P(x)dx

≤ 2Qe

Page 27: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. The Surprising Geometry in High Dimension Spaces

Consider the volume V(d,r) of a sphere of radius r in a space ofdimension d .

V (d , r) =πd/2rd

Γ(d/2 + 1)

As dimension d gets large the volume decreases to 0.

Page 28: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Volume of a Sphere in a High Dimension Space

Figure: Volume of a sphere of radius 1 as a function of d , thedimension of the space.

Page 29: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Volume of HyperSphere to Volume of HyperBox

Figure: Ratio of the volume of a hypersphere of radius r to a hyperboxof side 2r as a function of d , the dimension of the space. Ratio isbelow 10% when the dimension d ≥ 6.

Page 30: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

..

Volume of Smaller HyperSphere to LargerHyperSphere

Figure: Ratio of the volume of a hypersphere of radius .9 to ahypersphere of radius 1 as a function of d , the dimension of thespace. Ratio is below 10% when the dimension d ≥ 20.

Page 31: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Geometry in Bounded High Dimensional Spaces

A hypersphere in an N Dimensional space has volume β(N)RN .Consider the fraction f of the volume in a shell of width ∆r .

f (N;∆r) =β(N)(r +∆r)N − β(N)rN

β(N)(r +∆r)N

= 1 − rN

(r +∆r)N

"!# &%'$

Page 32: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Geometry in Bounded High Dimensional Spaces

Take ∆r to be fixed and take the limit of this shell volumefraction as N → ∞.

limN→∞

f (N;∆r) = limN→∞

1 − rN

(r +∆r)N

= 1 − limN→∞

(r

r +∆r

)N

= 1

As the dimension of the space increases a greater fraction ofthe volume of the hypersphere is in the shell.

Page 33: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Geometry in Bounded High Dimensional Spaces

Let α be a fixed small fraction like .01, for example. Determinethe shell width ∆r such that

f (N;∆r) =(1 +∆r)N − rN

(r +∆r)N = α

1 − rN

(r +∆r(α;N))N = α

1 − α =

(r

r +∆r(α;N)

)N

(1 − α)1N =

rr +∆r(α;N)

Page 34: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Geometry in Bounded High Dimensional Spaces

(1 − α)1N =

rr +∆r(α;N)

(1 − α)1N (r +∆r(α;N)) = r

(1 − α)1N ∆r(α;N) = r − r(1 − α)

1N

∆r(α;N) = r1 − (1 − α)

1N

(1 − α)1N

= r

(1

(1 − α)1N

− 1

)

Page 35: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Geometry in Bounded High Dimensional Spaces

∆r(α;N) = r

(1

(1 − α)1N

− 1

)Consider what happens as the dimension N of the spaceincreases.

limN→∞

∆r(α;N) = limN→∞

r

(1

(1 − α)1N

− 1

)

= r limN→∞

(1

(1 − α)1N

− 1

)= 0

Page 36: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Geometry in Bounded High Dimensional Spaces

limN→∞

∆r(α;N) = 0

As the dimension N of the space increases, the width of theshell required to keep the volume of the shell a fixed fraction ofthe volume of the hypersphere decreases to 0.

Page 37: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Gaussian Distribution in High Dimensional Space

In a univariate Gaussian Distribution, 90% of a random samplewill fall in the interval [-1.65,1.65]. This fraction decreases tozero as the dimension of the space increases. By d = 10, thefraction is less than 1%.

Page 38: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Gaussian Distribution in High Dimensional Space

Figure: Fraction of a random sample that will fall into a sphere ofradius 1.65 as a function of d , the dimension of the space. Thefraction is less than 1% when dimension d = 10.

Page 39: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Nearest Neighbors

Let d be the dimension of the spaceLet x be a given pointLet Dmind be the distance of the nearest neighbor to xLet Dmaxd be the distance of the furthers neighbor to x

Suppose limd→∞ var( ||Xd ||E [||Xd || = 0

ThenDmaxd − Dmind

Dmind

→p 0

Page 40: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Nearest Neighbors

Dmaxd − Dmind

Dmind

→p 0

means poor discrimination of the nearest and farthest pointswith respect to the query point.

Page 41: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Minkowski Distance

For k ≥ 1

ρ((x1, . . . , xd), (y1, . . . , yd)) =

(d∑

i=1

|xi − yi |k)1/k

The norm is the Lk norm.Max Distance: k → ∞Euclidean Distance: k = 2Manhattan Distance: k = 1

Page 42: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Comparison of Max and Euclidean Distance

ρMax(x , y) = maxn

n=1,...N

|xn − yn|

ρEuclidean(x , y) =

√√√√ N∑n=1

(xn − yn)2

Page 43: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Comparison of Max and Euclidean Distance

It is always the case that ρEuclidean(x , y) ≥ ρMax(x , y). Supposethat (xm − ym)

2 ≥ (xn − yn)2, n = 1, . . . ,N

ρEuclidean(x , y) =

√√√√ N∑n=1

(xn − yn)2

=

√√√√√(xm − ym)2 +N∑

n=1n ̸=m

(xn − yn)2

≥√

(xm − ym)2

≥ ρMax(x , y)

Page 44: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Comparison of Max and Euclidean Distance

ρMax(x , y) ≤ ρEuclidean(x , y)ρMax(x , y) < ρMax(x , z)

ρEuclidean(x , z) < ρEuclidean(x , y)

Page 45: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Comparison of Max and Euclidean Distance

In an N Dimensional Space

ρMax(x , y) ≤ ρEuclidean(x , y) ≤√

NρMax(x , y)

Page 46: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Fractional Distance

For 0 ≤ k ≤ 1

ρ((x1, . . . , xd), (y1, . . . , yd)) =

(d∑

i=1

|xi − yi |k)

Nearest Neighbor using the fractional distance measuresperform better than Euclidean or Manhattan distances.

Page 47: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Geometry of a Bounded High Dimension Space

The higher the dimension the more each point is about thesame distance away from another pointDistance between point x and point y has little informationregarding distance between x and any nearest neighbor ofy

Page 48: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Experimental Protocol

Unit HypercubeSet S of 1000 vectors uniformly distributedDimension 10 - 200Max distanceEuclidean distance10,000 Trials

Page 49: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Mean Euclidean Distance Between Points

As the dimension of the space increases:the mean Euclidean distance between pairs of points of Sincreasesthe standard deviation of the Euclidean distance betweenpairs of points of S is about constantthe ratio of the mean Euclidean distance to the standarddeviation Euclidean distance increases

Page 50: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Mean Euclidean Distance Between Points

Dimension = 2 Mean/Std = 2.095

Page 51: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Mean Euclidean Distance Between Points

Dimension = 10 Mean/Std = 5.18

Page 52: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Mean Euclidean Distance Between Points

Dimension = 200 Mean/Std = 23.8

Page 53: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Mean Max Distance Between Points

As the dimension of the space increases:the mean Max distance between pairs of points of Sincreasesthe standard deviation of the Max distance between pairsof points of S is decreasesthe ratio of the mean Max distance to the standarddeviation Max distance increases

Page 54: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Mean Max Distance Between Points

Dimension = 2 Mean/Std = 2.106

Page 55: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Mean Max Distance Between Points

Dimension = 10 Mean/Std = 5.46

Page 56: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Mean Max Distance Between Points

Dimension = 200 Mean/Std = 45.88

Page 57: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Mean Max Distance to Nearest Neighbor

Page 58: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

..

Standard Deviation of Max Distance to NearestNeighbor

Page 59: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Random y to NN in S

Dimension = 200

Page 60: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. NN Between Points of S

Dimension = 200

Page 61: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Distance to a Point Has Little Information

If the distance between a point x to a point p is d , then p is thenearest neighbor to x when the nearest neighbor to p has agreater distance than 2d .

Page 62: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Distance to a Point Has Little Information

But the typical distance d of x to a point p is about the same asthe distance of x to its nearest neighbor. So knowing thedistance of x to p provides no information about whether p is anearest neighbor of x .

Page 63: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. NN Max Distance Algorithm

x and y are measurement vectors of dimension Ndimdmin is the minimum distance found so fard is the current state of the Max distanceTerminate calculation as soon as the distance is greaterthan the minimum distance dmin

d=0.f;i=0;while(i < Ndim && d <= dmin)

{d=max(d,fabs(x[i]-y[i]));i++;}

Page 64: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Projection Based Algorithms

Page 65: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Projection Based Algorithms

Let unit vectors w1, . . . ,wJ and a distance d be given. Let x bethe vector whose nearest neighbor needs to be found. Definefor each j , j = 1, . . . , J

Sj(d) = {n | w ′j x − d ≤ w ′

j xn ≤ w ′j x + d}

δ = minj

minn∈Sj (d)

ρ(x , xn)

Then the nearest neighbor to x must be in the set

J∩j=1

Sj(δ)

Page 66: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. Projection Based Algorithms

Let wj , j = 1, . . . ,N project onto the N coordinate axes.

Sj(d) = {n | w ′j x − d ≤ w ′

j xn ≤ w ′j x + d}

Find the smallest d such thatJ∩

j=1

Sj(d) ̸= ∅

Then

minn

n=1,...,N

ρMax(x , xn) = d

Page 67: Nearest Neighbor Rule - Robert Haralickharalick.org/ML/nearest_neighbor.pdfGeometry of a Bounded High Dimensional Space Max and Euclidean Distances Projection Based Algorithms. Nearest

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

The Nearest Neighbor RuleError Rate of NN Rule

Large Sample SizeGeometry of a Bounded High Dimensional Space

Max and Euclidean DistancesProjection Based Algorithms

.. The K-Nearest Neighbor Rule.Definition..

......

Let K be a fixed positive integer. Given a measurement x to beclassified, the K-NN rule finds the K nearest neighbors to x andassigns the class of x to be the class associated with themajority of the K nearest neighbors.