Nearest Neighbor Rule - Robert Haralick (haralick.org/ML/nearest_neighbor.pdf)
The Nearest Neighbor Rule | Error Rate of NN Rule | Large Sample Size | Geometry of a Bounded High Dimensional Space | Max and Euclidean Distances | Projection Based Algorithms
Nearest Neighbor Rule
Robert M. Haralick
Computer Science, Graduate Center
City University of New York
Outline
1. The Nearest Neighbor Rule
2. Error Rate of NN Rule
3. Large Sample Size
4. Geometry of a Bounded High Dimensional Space
5. Max and Euclidean Distances
6. Projection Based Algorithms
Nearest Neighbor Rule
The nearest neighbor rule uses ancient common sense wisdom.
Definition: Assign a new pattern to the class of the pattern in the training set closest to it.
Voronoi Tessellation
Let a set of points be given. Associate with each point the set of points that are closer to it than to any other of the given points. The resulting partition of the space is called the Voronoi tessellation.
Formal Statement
Let the set of classes be $C = \{c_1, \ldots, c_K\}$ and let $X$ be the set of all possible measurements. We assume that there is a metric $d$ defined on $X$.
Let the training data set be $\langle (x_1, c_1), \ldots, (x_N, c_N) \rangle$, where each $x_n$ is a measurement vector and $c_n$ is the class label of $x_n$.
Let $x$ be the new measurement vector. The NN rule assigns $x$ to class $c_m$, where $d(x, x_m) \le d(x, x_n)$ for $n = 1, \ldots, N$.
Nearest Neighbor Probability Distribution Assumption
Let the training data set be $\langle (x_1, c_1), \ldots, (x_N, c_N) \rangle$, where each $x_n$ is a measurement vector and $c_n$ is the class label of $x_n$. Let $\mathcal{N}(x_m)$ denote the nearest neighbor set associated with $x_m$:
\[ \mathcal{N}(x_m) = \{ x \mid d(x_m, x) \le d(x_n, x),\ n = 1, \ldots, N \} \]
Then,
\[
P(c \mid x) =
\begin{cases}
1 & \text{if } m \text{ is the smallest index such that } x \in \mathcal{N}(x_m) \text{ and } c_m = c \\
0 & \text{otherwise}
\end{cases}
\]
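Let $\{ m \mid x \in \mathcal{N}(x_m) \} = \{ m_1, \ldots, m_K \}$. Let $p_1, \ldots, p_K$ satisfy
\[ p_k \ge 0,\ k = 1, \ldots, K, \qquad \sum_{k=1}^{K} p_k = 1 \]
Then,
\[ P(c_{m_k} \mid x) = p_k,\ k = 1, \ldots, K, \qquad P(c \mid x) = 0 \ \text{ if } c \ne c_{m_k} \text{ for all } k \]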
The Random Sampling Process
The training set is created by a twofold random sampling process.
First, a class is sampled in accordance with the class prior probabilities $P(c_1), \ldots, P(c_K)$.
Suppose that the randomly sampled class for the $n$th sample is $c_n$. Then the measurement $x_n$ is randomly sampled from the class conditional distribution $P(x_n \mid c_n)$.
This twofold sampling is done independently for $n = 1, \ldots, N$.
The training set is created by a twofold random sampling process. Hence,
\[ P(c_1, \ldots, c_N) = \prod_{n=1}^{N} P(c_n) \]
\[ P(x_1, \ldots, x_N \mid c_1, \ldots, c_N) = \prod_{n=1}^{N} P(x_n \mid c_n) \]
Conditional Independence
Let a new pair $(x, c)$ be sampled in accordance with the random sampling process. However, the true class $c$ is not made available to the decision rule. Suppose that $x_m$ is the nearest neighbor to $x$. Consider $P(c, c_m \mid x, x_m)$:
\[
\begin{aligned}
P(c, c_m \mid x, x_m) &= \frac{P(x, x_m \mid c, c_m) \, P(c, c_m)}{P(x, x_m)} \\
&= \frac{P(x \mid c) \, P(x_m \mid c_m) \, P(c) \, P(c_m)}{P(x, x_m)} \\
&= \frac{\dfrac{P(c \mid x) P(x)}{P(c)} \, \dfrac{P(c_m \mid x_m) P(x_m)}{P(c_m)} \, P(c) \, P(c_m)}{P(x, x_m)} \\
&= \frac{P(c \mid x) \, P(x) \, P(c_m \mid x_m) \, P(x_m)}{P(x, x_m)}
\end{aligned}
\]
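Expanding the denominator over all pairs of classes:
\[
\begin{aligned}
P(x, x_m) &= \sum_{i=1}^{K} \sum_{j=1}^{K} P((c^i, x), (c^j, x_m)) \\
&= \sum_{i=1}^{K} \sum_{j=1}^{K} P(x, x_m \mid c^i, c^j) \, P(c^i, c^j) \\
&= \sum_{i=1}^{K} \sum_{j=1}^{K} P(x \mid c^i) \, P(x_m \mid c^j) \, P(c^i, c^j)
\end{aligned}
\]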
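Because the classes are sampled independently, $P(c^i, c^j) = P(c^i) P(c^j)$, so
\[
\begin{aligned}
P(x, x_m) &= \sum_{i=1}^{K} \sum_{j=1}^{K} P(x \mid c^i) \, P(x_m \mid c^j) \, P(c^i) \, P(c^j) \\
&= \sum_{i=1}^{K} P(x \mid c^i) P(c^i) \sum_{j=1}^{K} P(x_m \mid c^j) P(c^j) \\
&= \sum_{i=1}^{K} P(x, c^i) \sum_{j=1}^{K} P(x_m, c^j) \\
&= P(x) \, P(x_m)
\end{aligned}
\]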
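Combining the two results,
\[ P(c, c_m \mid x, x_m) = \frac{P(c \mid x) \, P(x) \, P(c_m \mid x_m) \, P(x_m)}{P(x, x_m)}, \qquad P(x, x_m) = P(x) \, P(x_m) \]
gives
\[
\begin{aligned}
P(c, c_m \mid x, x_m) &= \frac{P(c \mid x) \, P(x) \, P(c_m \mid x_m) \, P(x_m)}{P(x) \, P(x_m)} \\
&= P(c \mid x) \, P(c_m \mid x_m)
\end{aligned}
\]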
Probability of NN Rule Error
Let $P_N(e \mid x)$ be the probability of error of an NN rule based on a training set of size $N$. Let $x_m$ be the nearest neighbor to $x$.
\[
\begin{aligned}
P_N(e \mid x) &= \frac{P_N(e, x)}{P(x)} \\
&= \frac{\int P_N(e, x_m, x) \, dx_m}{P(x)} \\
&= \frac{\int P(e \mid x, x_m) \, P_N(x, x_m) \, dx_m}{P(x)} \\
&= \int P(e \mid x, x_m) \, P_N(x_m \mid x) \, dx_m
\end{aligned}
\]
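An error occurs unless $x$ and $x_m$ carry the same class, so
\[
\begin{aligned}
P(e \mid x, x_m) &= 1 - \sum_{k=1}^{K} P(c_k, c_k \mid x, x_m) \\
&= 1 - \sum_{k=1}^{K} P(c_k \mid x) \, P(c_k \mid x_m)
\end{aligned}
\]
Therefore,
\[
\begin{aligned}
P_N(e \mid x) &= \int P(e \mid x, x_m) \, P_N(x_m \mid x) \, dx_m \\
&= \int \left( 1 - \sum_{k=1}^{K} P(c_k \mid x) \, P(c_k \mid x_m) \right) P_N(x_m \mid x) \, dx_m
\end{aligned}
\]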
Large Sample Size
Given $x$, consider what happens to its nearest neighbor $x_m$ as the sample size gets large. We assume the mixture density function $P$ to be continuous and $P(x) \ne 0$. Let $S$ be a hypersphere centered at $x$ with small radius $r$. Let $P_S$ be the probability that a measurement sampled from the mixture density function falls into $S$. Then $0 < P_S < 1$.
The probability that all of the $N$ independently sampled training measurements are outside of the hypersphere is $(1 - P_S)^N$, and
\[ \lim_{N \to \infty} (1 - P_S)^N = 0 \]
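To get a feel for how quickly this probability vanishes, here is a small worked example. The value $P_S = 0.01$ is an assumption chosen for illustration, not a figure from the slides.

```latex
\[
(1 - P_S)^N = 0.99^{N}: \qquad
0.99^{100} \approx 0.366, \qquad
0.99^{1000} \approx 4.3 \times 10^{-5}
\]
```

So even when only 1% of the probability mass lies in $S$, a training set of 1000 samples almost surely places at least one point inside it.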
Since
\[ \lim_{N \to \infty} (1 - P_S)^N = 0 \]
holds for every radius $r > 0$, the nearest neighbor $x_m$ to $x$ converges to $x$ in probability. Thus we can write
\[ \lim_{N \to \infty} P_N(x_m \mid x) = \delta(x - x_m) \]
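Asymptotic Probability of NN Rule Error
Now, as the training set gets large, $N \to \infty$ and $P_N(x_m \mid x) \to \delta(x - x_m)$. Hence,
\[
\begin{aligned}
\lim_{N \to \infty} P_N(e \mid x) &= \lim_{N \to \infty} \int \left( 1 - \sum_{k=1}^{K} P(c_k \mid x) \, P(c_k \mid x_m) \right) P_N(x_m \mid x) \, dx_m \\
&= \int \left( 1 - \sum_{k=1}^{K} P(c_k \mid x) \, P(c_k \mid x_m) \right) \lim_{N \to \infty} P_N(x_m \mid x) \, dx_m \\
&= \int \left( 1 - \sum_{k=1}^{K} P(c_k \mid x) \, P(c_k \mid x_m) \right) \delta(x - x_m) \, dx_m
\end{aligned}
\]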
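Let $P(e \mid x)$ be the asymptotic probability of error of the NN rule.
\[
\begin{aligned}
P(e \mid x) &= \lim_{N \to \infty} P_N(e \mid x) \\
&= \int \left( 1 - \sum_{k=1}^{K} P(c_k \mid x) \, P(c_k \mid x_m) \right) \delta(x - x_m) \, dx_m \\
&= 1 - \sum_{k=1}^{K} P(c_k \mid x) \, P(c_k \mid x) \\
&= 1 - \sum_{k=1}^{K} P^2(c_k \mid x)
\end{aligned}
\]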
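Let $x$ be given and let $Q(e \mid x)$ be the error of a Bayes rule on measurement $x$. Let $c_m$ be the class that the Bayes rule assigns to $x$, that is, the class with the largest posterior probability $P(c_k \mid x)$. Then
\[ Q(e \mid x) = 1 - P(c_m \mid x) \]
\[
\begin{aligned}
\sum_{k=1}^{K} P^2(c_k \mid x) &= P^2(c_m \mid x) + \sum_{\substack{k=1 \\ k \ne m}}^{K} P^2(c_k \mid x) \\
&= (1 - Q(e \mid x))^2 + \sum_{\substack{k=1 \\ k \ne m}}^{K} P^2(c_k \mid x)
\end{aligned}
\]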
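Inequality
Given
\[ \sum_{\substack{k=1 \\ k \ne m}}^{K} P(c_k \mid x) = 1 - P(c_m \mid x) = Q(e \mid x), \]
what is the smallest value that
\[ \sum_{\substack{k=1 \\ k \ne m}}^{K} P^2(c_k \mid x) \]
can take over all $P(c_k \mid x)$, $k \ne m$?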
Minimize
\[ \sum_{j=1}^{J} z_j^2 \]
under the constraint that
\[ \sum_{j=1}^{J} z_j = b \]
The minimum is achieved when $z_j = b/J$, $j = 1, \ldots, J$. In this case,
\[ \sum_{j=1}^{J} z_j^2 = J \, \frac{b^2}{J^2} = \frac{b^2}{J} \]
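One way to see why the equal split is optimal is a Lagrange multiplier argument, sketched here. The multiplier $\lambda$ and the function $L$ are notation introduced for this sketch and do not appear in the slides.

```latex
\[
L(z_1, \ldots, z_J, \lambda) = \sum_{j=1}^{J} z_j^2 - \lambda \left( \sum_{j=1}^{J} z_j - b \right)
\]
\[
\frac{\partial L}{\partial z_j} = 2 z_j - \lambda = 0
\quad \Longrightarrow \quad
z_j = \frac{\lambda}{2} \ \text{ for every } j
\]
\[
\sum_{j=1}^{J} z_j = \frac{J \lambda}{2} = b
\quad \Longrightarrow \quad
z_j = \frac{b}{J}
\]
```

Because the objective is convex and the constraint is linear, this stationary point is the minimum rather than a maximum or saddle.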
When $J = 2$, the minimum is $b^2/2$, attained at $z_1 = z_2 = b/2$.
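Asymptotic Probability of NN Rule Error
Let $x$ be given and let $Q(e \mid x)$ be the error of a Bayes rule. Then $Q(e \mid x) = 1 - P(c_m \mid x)$. The $K - 1$ posteriors $P(c_k \mid x)$, $k \ne m$, sum to $Q(e \mid x)$, so by the minimization result with $J = K - 1$ and $b = Q(e \mid x)$ the sum of their squares is at least $Q^2(e \mid x)/(K - 1)$. Hence
\[
\begin{aligned}
P(e \mid x) &= 1 - \sum_{k=1}^{K} P^2(c_k \mid x) = 1 - P^2(c_m \mid x) - \sum_{\substack{k=1 \\ k \ne m}}^{K} P^2(c_k \mid x) \\
&\le 1 - (1 - Q(e \mid x))^2 - \frac{Q^2(e \mid x)}{K - 1} \\
&= 1 - \left( 1 - 2 Q(e \mid x) + Q^2(e \mid x) \right) - \frac{Q^2(e \mid x)}{K - 1} \\
&= 2 Q(e \mid x) - Q^2(e \mid x) - \frac{Q^2(e \mid x)}{K - 1} \\
&= 2 Q(e \mid x) - Q^2(e \mid x) \, \frac{K}{K - 1} \\
&\le 2 Q(e \mid x)
\end{aligned}
\]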
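A quick numerical check of the chain of inequalities may help. The posterior values below are assumptions picked for illustration: $K = 3$ classes with posteriors $(0.7, 0.2, 0.1)$ at some point $x$.

```latex
\[
Q(e \mid x) = 1 - 0.7 = 0.3, \qquad
P(e \mid x) = 1 - (0.7^2 + 0.2^2 + 0.1^2) = 1 - 0.54 = 0.46
\]
\[
2 Q(e \mid x) - Q^2(e \mid x) \, \frac{K}{K - 1}
= 0.6 - 0.09 \cdot \frac{3}{2} = 0.465, \qquad
2 Q(e \mid x) = 0.6
\]
\[
0.46 \le 0.465 \le 0.6
\]
```

If the two non-Bayes posteriors were equal, $(0.7, 0.15, 0.15)$, the NN error would be $1 - (0.49 + 0.0225 + 0.0225) = 0.465$, which meets the tighter bound exactly, as the minimization argument predicts.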
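Let $P_e$ be the asymptotic probability of error of the NN rule. Let $Q_e$ be the probability of error of the Bayes rule. Since $P(e \mid x) \le 2 Q(e \mid x)$,
\[
\begin{aligned}
P_e &= \int P(e, x) \, dx \\
&= \int P(e \mid x) \, P(x) \, dx \\
&\le 2 \int Q(e \mid x) \, P(x) \, dx \\
&= 2 Q_e
\end{aligned}
\]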
The Surprising Geometry in High Dimension Spaces
Consider the volume $V(d, r)$ of a sphere of radius $r$ in a space of dimension $d$.
\[ V(d, r) = \frac{\pi^{d/2} \, r^d}{\Gamma(d/2 + 1)} \]
For any fixed radius $r$, the volume tends to 0 as the dimension $d$ gets large.
Volume of a Sphere in a High Dimension Space
Figure: Volume of a sphere of radius 1 as a function of $d$, the dimension of the space.
Volume of Hypersphere to Volume of Hyperbox
Figure: Ratio of the volume of a hypersphere of radius $r$ to the volume of a hyperbox of side $2r$ as a function of $d$, the dimension of the space. The ratio is below 10% when the dimension $d \ge 6$.
Volume of Smaller Hypersphere to Larger Hypersphere
Figure: Ratio of the volume of a hypersphere of radius 0.9 to the volume of a hypersphere of radius 1 as a function of $d$, the dimension of the space. The ratio equals $0.9^d$, which is about 12% at $d = 20$ and falls below 10% when the dimension $d \ge 22$.
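Geometry in Bounded High Dimensional Spaces
A hypersphere of radius $r$ in an $N$-dimensional space has volume $\beta(N) r^N$. Consider the fraction $f$ of the volume of a hypersphere of radius $r + \Delta r$ that lies in the outer shell of width $\Delta r$.
\[
\begin{aligned}
f(N; \Delta r) &= \frac{\beta(N) (r + \Delta r)^N - \beta(N) r^N}{\beta(N) (r + \Delta r)^N} \\
&= 1 - \frac{r^N}{(r + \Delta r)^N}
\end{aligned}
\]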
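Take $\Delta r$ to be fixed and take the limit of this shell volume fraction as $N \to \infty$.
\[
\begin{aligned}
\lim_{N \to \infty} f(N; \Delta r) &= \lim_{N \to \infty} \left( 1 - \frac{r^N}{(r + \Delta r)^N} \right) \\
&= 1 - \lim_{N \to \infty} \left( \frac{r}{r + \Delta r} \right)^N \\
&= 1
\end{aligned}
\]
As the dimension of the space increases, a greater fraction of the volume of the hypersphere is in the shell.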
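Let $\alpha$ be a fixed small fraction, for example 0.01. Determine the shell width $\Delta r(\alpha; N)$ such that
\[ f(N; \Delta r) = \frac{(r + \Delta r)^N - r^N}{(r + \Delta r)^N} = \alpha \]
\[ 1 - \frac{r^N}{(r + \Delta r(\alpha; N))^N} = \alpha \]
\[ 1 - \alpha = \left( \frac{r}{r + \Delta r(\alpha; N)} \right)^N \]
\[ (1 - \alpha)^{1/N} = \frac{r}{r + \Delta r(\alpha; N)} \]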
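Solving for the shell width:
\[
\begin{aligned}
(1 - \alpha)^{1/N} &= \frac{r}{r + \Delta r(\alpha; N)} \\
(1 - \alpha)^{1/N} \, (r + \Delta r(\alpha; N)) &= r \\
(1 - \alpha)^{1/N} \, \Delta r(\alpha; N) &= r - r (1 - \alpha)^{1/N} \\
\Delta r(\alpha; N) &= r \, \frac{1 - (1 - \alpha)^{1/N}}{(1 - \alpha)^{1/N}} \\
&= r \left( \frac{1}{(1 - \alpha)^{1/N}} - 1 \right)
\end{aligned}
\]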
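\[ \Delta r(\alpha; N) = r \left( \frac{1}{(1 - \alpha)^{1/N}} - 1 \right) \]
Consider what happens as the dimension $N$ of the space increases.
\[
\begin{aligned}
\lim_{N \to \infty} \Delta r(\alpha; N) &= \lim_{N \to \infty} r \left( \frac{1}{(1 - \alpha)^{1/N}} - 1 \right) \\
&= r \lim_{N \to \infty} \left( \frac{1}{(1 - \alpha)^{1/N}} - 1 \right) = 0
\end{aligned}
\]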
\[ \lim_{N \to \infty} \Delta r(\alpha; N) = 0 \]
As the dimension $N$ of the space increases, the width of the shell required to keep the volume of the shell a fixed fraction of the volume of the hypersphere decreases to 0.
Gaussian Distribution in High Dimensional Space
In a univariate Gaussian distribution, 90% of a random sample will fall in the interval [-1.65, 1.65]. The fraction of a sample falling within a sphere of radius 1.65 decreases to zero as the dimension of the space increases. By $d = 10$, the fraction is only about 1%.
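The decay can be made concrete under one assumption that the slide does not state explicitly: the $d$ coordinates are independent standard normal variables. Then the squared distance from the origin follows a chi-square distribution with $d$ degrees of freedom, and for even $d$ the fraction has a closed form. The symbols $F(d)$ and $t$ are introduced here for the sketch.

```latex
\[
F(d) = P\left( \chi^2_d \le 1.65^2 \right)
     = 1 - e^{-t} \sum_{j=0}^{d/2 - 1} \frac{t^j}{j!},
\qquad t = \frac{1.65^2}{2} = 1.36125 \quad (d \text{ even})
\]
\[
F(2) = 1 - e^{-1.36125} \approx 0.744,
\qquad
F(10) \approx 0.013
\]
```

Going from one dimension to two already drops the covered fraction from 90% to about 74%, and by ten dimensions the same radius captures roughly one sample in a hundred.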
Figure: Fraction of a random sample that will fall into a sphere of radius 1.65 as a function of $d$, the dimension of the space. The fraction is about 1% when the dimension $d = 10$.
Nearest Neighbors
Let $d$ be the dimension of the space.
Let $x$ be a given point.
Let $D\!\min_d$ be the distance from $x$ to its nearest neighbor.
Let $D\!\max_d$ be the distance from $x$ to its farthest neighbor.
Suppose
\[ \lim_{d \to \infty} \operatorname{var}\left( \frac{\|X_d\|}{E[\|X_d\|]} \right) = 0 \]
Then
\[ \frac{D\!\max_d - D\!\min_d}{D\!\min_d} \to_p 0 \]
The convergence
\[ \frac{D\!\max_d - D\!\min_d}{D\!\min_d} \to_p 0 \]
means poor discrimination between the nearest and farthest points with respect to the query point.
Minkowski Distance
For $k \ge 1$,
\[ \rho((x_1, \ldots, x_d), (y_1, \ldots, y_d)) = \left( \sum_{i=1}^{d} |x_i - y_i|^k \right)^{1/k} \]
The norm is the $L_k$ norm.
Max distance: $k \to \infty$
Euclidean distance: $k = 2$
Manhattan distance: $k = 1$
Comparison of Max and Euclidean Distance
\[ \rho_{Max}(x, y) = \max_{n = 1, \ldots, N} |x_n - y_n| \]
\[ \rho_{Euclidean}(x, y) = \sqrt{\sum_{n=1}^{N} (x_n - y_n)^2} \]
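It is always the case that $\rho_{Euclidean}(x, y) \ge \rho_{Max}(x, y)$. Suppose that $(x_m - y_m)^2 \ge (x_n - y_n)^2$, $n = 1, \ldots, N$. Then
\[
\begin{aligned}
\rho_{Euclidean}(x, y) &= \sqrt{\sum_{n=1}^{N} (x_n - y_n)^2} \\
&= \sqrt{(x_m - y_m)^2 + \sum_{\substack{n=1 \\ n \ne m}}^{N} (x_n - y_n)^2} \\
&\ge \sqrt{(x_m - y_m)^2} \\
&= \rho_{Max}(x, y)
\end{aligned}
\]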
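Although $\rho_{Max}(x, y) \le \rho_{Euclidean}(x, y)$, the two distances need not rank points in the same order. It is possible to have both
\[ \rho_{Max}(x, y) < \rho_{Max}(x, z) \]
\[ \rho_{Euclidean}(x, z) < \rho_{Euclidean}(x, y) \]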
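A two-dimensional example shows that such a reversal really occurs. The three points below are chosen for illustration and are not taken from the slides.

```latex
\[
x = (0, 0), \qquad y = (3, 3), \qquad z = (4, 0)
\]
\[
\rho_{Max}(x, y) = 3 < 4 = \rho_{Max}(x, z)
\]
\[
\rho_{Euclidean}(x, z) = 4 < \sqrt{18} \approx 4.243 = \rho_{Euclidean}(x, y)
\]
```

Under the Max distance $y$ is the nearer point to $x$, while under the Euclidean distance $z$ is, so the nearest neighbor can depend on which of the two distances is used.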
In an $N$-dimensional space,
\[ \rho_{Max}(x, y) \le \rho_{Euclidean}(x, y) \le \sqrt{N} \, \rho_{Max}(x, y) \]
Fractional Distance
For $0 < k < 1$,
\[ \rho((x_1, \ldots, x_d), (y_1, \ldots, y_d)) = \sum_{i=1}^{d} |x_i - y_i|^k \]
Nearest neighbor search using fractional distance measures can perform better in high dimensional spaces than search using the Euclidean or Manhattan distance.
Geometry of a Bounded High Dimension Space
- The higher the dimension, the more nearly each point is about the same distance away from every other point.
- The distance between point $x$ and point $y$ carries little information about the distance between $x$ and any nearest neighbor of $y$.
Experimental Protocol
- Unit hypercube
- Set S of 1000 vectors uniformly distributed
- Dimension 10 to 200
- Max distance
- Euclidean distance
- 10,000 trials
Mean Euclidean Distance Between Points
As the dimension of the space increases:
- the mean Euclidean distance between pairs of points of S increases
- the standard deviation of the Euclidean distance between pairs of points of S is about constant
- the ratio of the mean Euclidean distance to the standard deviation of the Euclidean distance increases
Dimension = 2 Mean/Std = 2.095
Dimension = 10 Mean/Std = 5.18
Dimension = 200 Mean/Std = 23.8
Mean Max Distance Between Points
As the dimension of the space increases:
- the mean Max distance between pairs of points of S increases
- the standard deviation of the Max distance between pairs of points of S decreases
- the ratio of the mean Max distance to the standard deviation of the Max distance increases
Dimension = 2 Mean/Std = 2.106
Dimension = 10 Mean/Std = 5.46
Dimension = 200 Mean/Std = 45.88
Figure: Mean Max Distance to Nearest Neighbor
Figure: Standard Deviation of Max Distance to Nearest Neighbor
Random y to NN in S
Dimension = 200
NN Between Points of S
Dimension = 200
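Distance to a Point Has Little Information
If the distance from a point $x$ to a point $p$ is $d$, then $p$ is guaranteed to be the nearest neighbor of $x$ whenever the distance from $p$ to its own nearest neighbor is greater than $2d$. By the triangle inequality, every other point $q$ then satisfies $\rho(x, q) \ge \rho(p, q) - \rho(x, p) > 2d - d = d$.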
But in a high dimensional space the typical distance $d$ from $x$ to a point $p$ is about the same as the distance from $x$ to its nearest neighbor. So knowing the distance from $x$ to $p$ provides no information about whether $p$ is a nearest neighbor of $x$.
NN Max Distance Algorithm
- x and y are measurement vectors of dimension Ndim
- dmin is the minimum distance found so far
- d is the current state of the Max distance
- Terminate the calculation as soon as the distance is greater than the minimum distance dmin

    d = 0.f;
    i = 0;
    /* stop early once the partial Max distance exceeds dmin */
    while (i < Ndim && d <= dmin) {
        d = fmaxf(d, fabsf(x[i] - y[i]));   /* fmaxf, fabsf from <math.h> */
        i++;
    }
Projection Based Algorithms
Let unit vectors $w_1, \ldots, w_J$ and a distance $d$ be given. Let $x$ be the vector whose nearest neighbor needs to be found. Define for each $j$, $j = 1, \ldots, J$,
\[ S_j(d) = \{ n \mid w_j' x - d \le w_j' x_n \le w_j' x + d \} \]
\[ \delta = \min_{j} \min_{n \in S_j(d)} \rho(x, x_n) \]
Then the nearest neighbor to $x$ must be in the set
\[ \bigcap_{j=1}^{J} S_j(\delta) \]
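The reason the intersection cannot miss the nearest neighbor can be sketched in one line. The sketch assumes that $\rho$ is the Euclidean distance (the slide does not say which metric is meant) and writes $x_*$ for the nearest neighbor of $x$, a symbol introduced here.

```latex
\[
\left| w_j' x_* - w_j' x \right|
= \left| w_j' (x_* - x) \right|
\le \| w_j \| \, \| x_* - x \|
= \rho(x, x_*)
\le \delta
\qquad \text{for every } j = 1, \ldots, J
\]
```

The first inequality is Cauchy-Schwarz with $\| w_j \| = 1$. The second holds because $\delta$ is the distance from $x$ to some training vector, so it cannot be smaller than the nearest neighbor distance. Hence $x_*$ satisfies the defining condition of every $S_j(\delta)$.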
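Let $w_j$, $j = 1, \ldots, J$, be the unit vectors along the coordinate axes, so that $J$ is the dimension of the space and $w_j' x$ is the $j$th coordinate of $x$.
\[ S_j(d) = \{ n \mid w_j' x - d \le w_j' x_n \le w_j' x + d \} \]
Find the smallest $d$ such that
\[ \bigcap_{j=1}^{J} S_j(d) \ne \emptyset \]
Then
\[ \min_{n = 1, \ldots, N} \rho_{Max}(x, x_n) = d \]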
The K-Nearest Neighbor Rule
Definition: Let $K$ be a fixed positive integer. Given a measurement $x$ to be classified, the K-NN rule finds the $K$ nearest neighbors to $x$ and assigns $x$ to the class associated with the majority of the $K$ nearest neighbors.
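The definition can be sketched as a scan that keeps the K closest training vectors seen so far and then counts votes. Everything named below (`knn_classify`, the two size caps, integer labels in a fixed range, Euclidean distance, and the tie-breaking choices) is an assumption made for this illustration; the slides do not specify these details.

```c
#include <stddef.h>

#define KNN_MAX_K 32        /* illustrative cap on K */
#define KNN_MAX_CLASSES 16  /* illustrative cap on the number of classes */

/* Classifies x by a vote among its k nearest training vectors
 * (Euclidean distance). train is row-major, n_train rows of dim values;
 * labels are integers in [0, n_classes). Vote ties go to the smallest
 * class index. Returns -1 if the arguments are outside the caps. */
int knn_classify(const double *train, const int *labels, size_t n_train,
                 size_t dim, const double *x, size_t k, int n_classes)
{
    if (k == 0 || k > KNN_MAX_K || k > n_train ||
        n_classes <= 0 || n_classes > KNN_MAX_CLASSES)
        return -1;

    double best_dist[KNN_MAX_K];   /* sorted ascending, first `found` valid */
    int best_label[KNN_MAX_K];
    size_t found = 0;

    for (size_t n = 0; n < n_train; n++) {
        double dist = 0.0;
        for (size_t i = 0; i < dim; i++) {
            double diff = train[n * dim + i] - x[i];
            dist += diff * diff;   /* squared distance preserves the order */
        }
        if (found == k && dist >= best_dist[k - 1])
            continue;              /* not among the k closest so far */
        /* insert into the sorted list, shifting larger entries up */
        size_t pos = (found < k) ? found++ : k - 1;
        while (pos > 0 && best_dist[pos - 1] > dist) {
            best_dist[pos] = best_dist[pos - 1];
            best_label[pos] = best_label[pos - 1];
            pos--;
        }
        best_dist[pos] = dist;
        best_label[pos] = labels[n];
    }

    int votes[KNN_MAX_CLASSES] = {0};
    for (size_t j = 0; j < k; j++)
        votes[best_label[j]]++;
    int winner = 0;
    for (int c = 1; c < n_classes; c++)
        if (votes[c] > votes[winner])
            winner = c;
    return winner;
}
```

With `k = 1` this reduces to the nearest neighbor rule. An odd `k` avoids vote ties in two-class problems, which is one reason odd values are a common choice.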