
Learning Vector Quantization and K-Nearest Neighbor

Jia Li
Department of Statistics
The Pennsylvania State University
Email: [email protected]
http://www.stat.psu.edu/~jiali


Learning Vector Quantization

- Developed by Kohonen. A package with documentation is available at http://www.cis.hut.fi/nnrc/nnrc-programs.html.

- When positioning the prototypes, LVQ uses the information given by class labels, in contrast to k-means, which selects prototypes without using class labels.

- It often works better than k-means.

- The idea is to move a prototype toward training samples of its own class and away from samples of other classes.


The Algorithm

1. Start from a set of initial prototypes with classes assigned. Denote the M prototypes by Z = {z1, ..., zM} and their associated classes by C(zm), m = 1, 2, ..., M.

   - The initial prototypes can be provided by k-means.

2. Sweep through the training samples and update zm after visiting each sample.

   - Suppose xi is assigned to the mth prototype zm by the nearest-neighbor rule: ‖xi − zm‖ ≤ ‖xi − zm′‖, ∀m′ ≠ m, 1 ≤ m′ ≤ M.

   - If gi = C(zm), move zm towards the training sample: zm ← zm + ε(xi − zm), where ε is the learning rate.

   - If gi ≠ C(zm), move zm away from the training sample: zm ← zm − ε(xi − zm).

3. Step 2 can be repeated a number of times (a code sketch of one pass follows).
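A minimal sketch of one LVQ pass, assuming NumPy arrays; the function and variable names (lvq_sweep, proto_labels, etc.) are illustrative and not taken from the Kohonen package:

```python
import numpy as np

def lvq_sweep(X, y, prototypes, proto_labels, eps=0.1):
    """One LVQ pass: visit each training sample and nudge its nearest prototype.

    X: (n, p) training samples; y: (n,) class labels.
    prototypes: (M, p) initial prototypes (e.g., from k-means);
    proto_labels: (M,) classes assigned to the prototypes.
    """
    Z = np.asarray(prototypes, dtype=float).copy()
    for xi, gi in zip(X, y):
        # Nearest-prototype rule: find m minimizing ||xi - zm||.
        m = np.argmin(np.linalg.norm(Z - xi, axis=1))
        if proto_labels[m] == gi:
            Z[m] += eps * (xi - Z[m])   # move toward a same-class sample
        else:
            Z[m] -= eps * (xi - Z[m])   # move away from an other-class sample
    return Z
```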


Experiments

- Use the diabetes data set.

- Use prototypes obtained by k-means as the initial prototypes.

- Use LVQ with ε = 0.1 (a usage sketch follows this list).

- Results obtained after 1, 2, and 5 passes are shown below.

- Classification is not guaranteed to improve after adjusting the prototypes.

- One pass with a small ε usually helps, but don't overdo it.
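As a hypothetical usage of the lvq_sweep sketch above (kmeans_prototypes, proto_labels, X_train, and y_train are placeholders for the diabetes data and its k-means prototypes):

```python
# Placeholder setup: prototypes and their labels would come from k-means on the training data.
Z = kmeans_prototypes.copy()
for _ in range(5):   # run 1, 2, or 5 passes over the training samples
    Z = lvq_sweep(X_train, y_train, Z, proto_labels, eps=0.1)
```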

Comments:

- Fine-tuning often helps:

  - Select initial prototypes.
  - Adjust the learning rate ε.

- Read the package documentation for details.


[Figure omitted. Error rates: 27.74% and 27.61%.]


[Figure omitted. Error rates: 27.86% and 32.37%.]


K-Nearest Neighbor Classifiers

- Given a query point x0, find the k training samples x(r), r = 1, ..., k, closest in distance to x0, and then classify by majority vote among the k neighbors (a code sketch follows this list).

- Feature normalization is often performed in pre-processing.

- Classification boundaries become smoother as k increases.
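A minimal k-nearest-neighbor sketch in NumPy, assuming features have already been normalized; the names knn_classify, X_train, and y_train are illustrative:

```python
import numpy as np
from collections import Counter

def knn_classify(x0, X_train, y_train, k=5):
    """Classify query x0 by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x0, axis=1)   # Euclidean distances to all samples
    neighbors = np.argsort(dists)[:k]              # indices of the k closest samples
    return Counter(np.asarray(y_train)[neighbors]).most_common(1)[0][0]
```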


A Comparative Study (ElemStatLearn)

- Two simulated problems.

- There are 10 independent features Xj, each uniformly distributed on [0, 1].

- The two-class 0-1 target variable is defined as follows (a data-generation sketch follows the definitions):

  problem 1: “easy”

      Y = I(X1 > 1/2);

  problem 2: “difficult”

      Y = I( sign( ∏_{j=1}^{3} (Xj − 1/2) ) > 0 ).
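A sketch of the data generation for the two problems, assuming NumPy; the function name simulate is illustrative:

```python
import numpy as np

def simulate(n, problem="easy", p=10, rng=np.random.default_rng(0)):
    """n samples with p independent features, each uniform on [0, 1]."""
    X = rng.uniform(0.0, 1.0, size=(n, p))
    if problem == "easy":
        y = (X[:, 0] > 0.5).astype(int)                                  # Y = I(X1 > 1/2)
    else:
        y = (np.sign(np.prod(X[:, :3] - 0.5, axis=1)) > 0).astype(int)   # "difficult"
    return X, y

# One run of the study: 100 training and 1000 test samples.
X_train, y_train = simulate(100, "difficult")
X_test, y_test = simulate(1000, "difficult")
```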


- For problem 1, all features except X1 are “noise”; for problem 2, all features except X1, X2, and X3 are “noise”.

- The Bayes error rates are zero.

- In each run, 100 samples are used for training and 1000 for testing.


- The figure shows the mean and standard deviation of the misclassification error for nearest neighbors, K-means, and LVQ over 10 realizations (10 simulated data sets), as the tuning parameters change.

- K-means and LVQ give almost identical results.

- For the first problem, K-means and LVQ outperform nearest neighbors, assuming the best choice of tuning parameters for each. For the second problem, they perform similarly.

- The optimal k for k-nearest-neighbor classification differs significantly between the two problems.


Adaptive Nearest-Neighbor Methods

- When the dimension is high, data become relatively sparse.

- Implicit in nearest-neighbor classification is the assumption that the class probabilities are roughly constant in the neighborhood, and hence simple averages give good estimates.

- In a high-dimensional space, the neighborhood represented by the few nearest samples may not be local.

- Consider N data points uniformly distributed in the unit cube [−1/2, 1/2]^p. Let R be the radius of a 1-nearest-neighborhood centered at the origin. Then

      median(R) = v_p^{−1/p} (1 − (1/2)^{1/N})^{1/p},

  where v_p r^p is the volume of the sphere of radius r in p dimensions.

- The median radius quickly approaches 0.5, the distance to the edge of the cube, as the dimension increases (a numerical check follows this list).
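A quick numerical check of this formula (a sketch; the helper name median_radius is illustrative):

```python
from math import gamma, pi

def median_radius(p, N):
    """median(R) = v_p^(-1/p) * (1 - (1/2)^(1/N))^(1/p),
    where v_p = pi^(p/2) / Gamma(p/2 + 1) is the volume of the unit-radius sphere in p dimensions."""
    v_p = pi ** (p / 2) / gamma(p / 2 + 1)
    return v_p ** (-1.0 / p) * (1.0 - 0.5 ** (1.0 / N)) ** (1.0 / p)

# The median radius grows rapidly with p, quickly reaching the order of 0.5,
# the distance to the edge of the cube.
for p in (1, 2, 5, 10, 20):
    print(p, round(median_radius(p, N=500), 3))
```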


- Adjust the distance metric locally, so that the resulting neighborhoods stretch out in directions for which the class probabilities don't change much.


Discriminant Adaptive Nearest-Neighbor (DANN)

- At each query point, a neighborhood of, say, 50 points is formed.

- Class probabilities are NOT assumed constant in this neighborhood. The neighborhood is used only to decide how to define the adapted metric.

- After the metric is decided, a normal k-nearest-neighbor rule is applied to classify the query.

- The metric changes with the query.


- The DANN metric at a query point x0 is defined by

      D(x, x0) = (x − x0)^T Σ (x − x0),

  where

      Σ = W^{−1/2} [(W^{−1/2})^T B W^{−1/2} + εI] (W^{−1/2})^T
        = W^{−1/2} [B* + εI] (W^{−1/2})^T.

- W is the pooled within-class covariance matrix and B is the between-class covariance matrix (a code sketch follows).
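A sketch of computing Σ and the resulting distance, assuming NumPy and using the symmetric square root of W (so W^{−1/2} equals its transpose); the names dann_metric and dann_distance are illustrative, and the default ε is only a placeholder:

```python
import numpy as np

def dann_metric(W, B, eps=1.0):
    """Sigma = W^{-1/2} [W^{-1/2} B W^{-1/2} + eps*I] W^{-1/2}.
    W: pooled within-class covariance in the query's neighborhood (e.g., ~50 points);
    B: between-class covariance in that neighborhood."""
    vals, vecs = np.linalg.eigh(W)                              # eigen-decomposition of W
    W_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T   # symmetric inverse square root
    B_star = W_inv_sqrt @ B @ W_inv_sqrt
    return W_inv_sqrt @ (B_star + eps * np.eye(W.shape[0])) @ W_inv_sqrt

def dann_distance(x, x0, Sigma):
    d = x - x0
    return d @ Sigma @ d   # D(x, x0) = (x - x0)^T Sigma (x - x0)
```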


Intuition

- We compute W and B as in LDA.

- Recall that we did a similar computation when deriving discriminant coordinates.

- Eigen-decomposition of B*: B* = V* D_B V*^T. Then

      Σ = W^{−1/2} [B* + εI] (W^{−1/2})^T
        = W^{−1/2} [V* D_B V*^T + εI] (W^{−1/2})^T
        = W^{−1/2} V* (D_B + εI) (W^{−1/2} V*)^T.

- Note that the column vectors of W^{−1/2} V* are simply the discriminant coordinates.

- εI is added to avoid using samples far away from the query point.


Geometric interpretation:

- To compute the DANN metric, x − x0 is projected onto the discriminant coordinates.

- The projection values on the significant discriminant coordinates are magnified; those on the insignificant DCs are shrunk.

- The implication is that the neighborhood is stretched in the directions of the insignificant DCs and squeezed in those of the significant ones.

- (W^{−1/2})^T is introduced to sphere the data so that the within-class covariance matrix is the identity matrix.


- Significant discriminant coordinates represent directions in which class probabilities change substantially. Hence, when we form a neighborhood, we want it to have a small span in these directions. Conversely, we want the neighborhood to have a large span in directions in which class probabilities don't change much.

- To summarize: we want to form a neighborhood that contains as many samples as possible but at the same time has approximately constant class probabilities.


Global Dimension Reduction

- At each training sample xi, the between-centroids sum-of-squares matrix Bi is computed, using a neighborhood of, say, 50 points.

- Average Bi over all training samples:

      B̄ = (1/N) ∑_{i=1}^{N} Bi.


- Let v1, v2, ..., vp be the eigenvectors of the matrix B̄, ordered from largest to smallest eigenvalues θl. Then a rank-L (L < p) approximation to B̄ is

      B̄[L] = ∑_{l=1}^{L} θl vl vl^T.

  B̄[L] is optimal in the sense of solving the least-squares problem

      min_{rank(M)=L} trace[(B̄ − M)^2],

  and hence solves (see the sketch below)

      min_{rank(M)=L} ∑_{i=1}^{N} trace[(Bi − M)^2].
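A sketch of the rank-L approximation to B̄ with NumPy; the function name rank_L_approx is illustrative:

```python
import numpy as np

def rank_L_approx(B_bar, L):
    """Best rank-L approximation to B_bar, keeping the top-L eigen-pairs:
    B_bar[L] = sum_{l=1}^{L} theta_l v_l v_l^T."""
    vals, vecs = np.linalg.eigh(B_bar)    # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:L]    # indices of the L largest eigenvalues
    theta, V = vals[order], vecs[:, order]
    return V @ np.diag(theta) @ V.T, V    # the approximation and the L leading directions

# B_bar is the average of the local between-centroids matrices Bi;
# the columns of V span the reduced subspace used for global dimension reduction.
```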
