GRASP
Learning a Kernel Matrix for Nonlinear Dimensionality Reduction
Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul
ICML’04
Department of Computer and Information Science
The Big Picture
Given high-dimensional data sampled from a low-dimensional manifold,
how can we compute a faithful embedding?
Outline
• Part I: Kernel PCA
• Part II: Manifold Learning
• Part III: Algorithm
• Part IV: Experimental Results
Part I. Kernel PCA
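Problem:
Input: N points in a high-dimensional space of dimension D
Output: N points in a low-dimensional space of dimension d < D
Embedding: Nearby points remain nearby, distant points remain distant. Estimate d.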
Subspaces
D=3, d=2
D=2, d=1
Principal Component Analysis
Project data into the subspace of maximum variance.
This can be solved as an eigenvalue problem.
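The slide's equations did not survive extraction. As a minimal sketch of the textbook procedure (not necessarily the notation used on the slide), PCA can be computed by centering the data, forming the covariance matrix, and keeping its top eigenvectors. The function name `pca` and the toy data below are illustrative assumptions.

```python
import numpy as np

def pca(X, d):
    """Project the rows of X (N x D) onto the d directions of maximum variance."""
    Xc = X - X.mean(axis=0)                # center the data
    C = Xc.T @ Xc / X.shape[0]             # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return Xc @ eigvecs[:, :d], eigvals

# Toy data (assumption): points close to a 2-D plane inside 3-D space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
A = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
X = latent @ A + 1e-3 * rng.normal(size=(200, 3))
Y, eigvals = pca(X, d=2)
```

On this toy data the third eigenvalue is tiny compared with the first two, which is the kind of gap that indicates d=2.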
Using the kernel trick
Do PCA in a higher-dimensional feature space.
The feature space can be defined implicitly through a kernel matrix.
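A hedged sketch of kernel PCA operating directly on a kernel matrix: center the matrix, take its eigendecomposition, and scale the top eigenvectors by the square roots of their eigenvalues. The function name `kernel_pca` and the test data are assumptions made for illustration.

```python
import numpy as np

def kernel_pca(K, d):
    """Embed N points in d dimensions given only their N x N kernel matrix K."""
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N     # centering matrix
    Kc = H @ K @ H                          # kernel matrix of the centered features
    eigvals, eigvecs = np.linalg.eigh(Kc)   # ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    top = np.clip(eigvals[:d], 0.0, None)   # guard against tiny negative round-off
    return eigvecs[:, :d] * np.sqrt(top), eigvals

# With a linear kernel this reduces to ordinary PCA (up to rotation and sign).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
K = X @ X.T
Y, eigvals = kernel_pca(K, d=3)
```

Because only K is needed, the same routine works for any kernel matrix, including one that is learned rather than computed from a fixed formula.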
Common Kernels
• Linear
• Gaussian
• Polynomial
Do very well for classification. How about manifold learning?
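The kernel formulas on this slide were lost in extraction. The sketch below uses common textbook parameterizations, which are assumptions here: a Gaussian kernel with width `sigma`, and a polynomial kernel of the form (x·y + c)^degree.

```python
import numpy as np

def linear_kernel(X):
    """K_ij = x_i . x_j"""
    return X @ X.T

def gaussian_kernel(X, sigma=1.0):
    """K_ij = exp(-|x_i - x_j|^2 / (2 sigma^2))"""
    sq = np.sum(X ** 2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2 * sigma ** 2))

def polynomial_kernel(X, degree=2, c=1.0):
    """K_ij = (x_i . x_j + c)^degree"""
    return (X @ X.T + c) ** degree

# Two orthogonal unit vectors as a tiny example.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
K_lin = linear_kernel(X)
K_rbf = gaussian_kernel(X, sigma=1.0)
K_poly = polynomial_kernel(X, degree=2, c=1.0)
```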
Linear Kernel
Gaussian Kernels
Feature vectors span as many dimensions as the number of spheres of radius σ needed to enclose the input vectors.
Polynomial Kernels
Part II. Manifold Learning via Semidefinite Programming
Local Isometry
A smooth, invertible mapping that preserves distances and looks locally like a rotation plus translation.
Discretized manifolds
Neighborhood graph: connect each point to its k nearest neighbors.
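A minimal sketch of building such a neighborhood graph as a boolean indicator matrix. The brute-force distance computation and the choice to symmetrize the graph are assumptions; the slide only states the k-nearest-neighbor rule.

```python
import numpy as np

def knn_indicator(X, k):
    """Boolean N x N matrix: True where i and j are linked in the k-NN graph."""
    sq = np.sum(X ** 2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(sq_dists, np.inf)              # a point is not its own neighbor
    nearest = np.argsort(sq_dists, axis=1)[:, :k]   # indices of the k closest points
    eta = np.zeros(sq_dists.shape, dtype=bool)
    rows = np.repeat(np.arange(X.shape[0]), k)
    eta[rows, nearest.ravel()] = True
    return eta | eta.T                              # symmetrize (assumption)

# Four points on a line; with k=1 the symmetrized graph is a chain.
X = np.array([[0.0], [1.0], [3.0], [10.0]])
eta = knn_indicator(X, k=1)
```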
Preserve local distances
Approximation of local isometry:
Constraint: distances between neighboring points are preserved, with neighbors selected by a neighborhood indicator.
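The constraint formula itself was lost in extraction. One way to make it concrete, under the assumption that the outputs are represented by their Gram matrix K (K_ij = y_i · y_j), is the identity |y_i - y_j|^2 = K_ii + K_jj - 2 K_ij. The helper names and the chain-shaped indicator below are illustrative.

```python
import numpy as np

def gram_sq_dists(K):
    """Squared distances implied by a Gram matrix: K_ii + K_jj - 2 K_ij."""
    diag = np.diag(K)
    return diag[:, None] + diag[None, :] - 2 * K

def max_local_violation(K, X, eta):
    """Largest mismatch between input and output squared distances on neighbor pairs."""
    target = gram_sq_dists(X @ X.T)
    return float(np.max(np.abs(gram_sq_dists(K) - target)[eta]))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
# Illustrative indicator: each point is linked to the next one in the list.
eta = np.eye(6, k=1, dtype=bool) | np.eye(6, k=-1, dtype=bool)

# A rotation plus translation preserves all distances, so the violation is ~0.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Y = X @ R.T + np.array([3.0, -1.0])
violation_isometry = max_local_violation(Y @ Y.T, X, eta)

# Uniform stretching by a factor of 2 breaks the constraint.
violation_stretch = max_local_violation((2 * X) @ (2 * X).T, X, eta)
```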
Objective Function?
• Goal: Find minimum-rank kernel matrix
• Problem: Computationally hard
• Heuristic: Maximize pairwise distances
Objective Function? (Cont’d)
What happens if we maximize the pairwise distances?
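A short derivation, in assumed notation (outputs y_i with Gram matrix K_ij = y_i · y_j, N points), of why maximizing pairwise distances amounts to maximizing the trace of the kernel matrix once the outputs are centered:

```latex
\begin{align*}
\frac{1}{2N}\sum_{i,j}\lVert y_i - y_j\rVert^2
  &= \frac{1}{2N}\sum_{i,j}\left(K_{ii} + K_{jj} - 2K_{ij}\right) \\
  &= \operatorname{Tr}(K) - \frac{1}{N}\sum_{i,j}K_{ij} \\
  &= \operatorname{Tr}(K) \quad \text{when } \textstyle\sum_{i,j} K_{ij} = 0 .
\end{align*}
```

The trace is also the sum of the eigenvalues of K, that is, the total variance of the embedding.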
Semidefinite Programming Problem:
Maximize: trace of the kernel matrix (unfold manifold)
subject to:
• Preserve local neighborhoods
• Center output
• Kernel matrix is positive semidefinite
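The equations for this program did not survive extraction. A plausible reconstruction, consistent with the labels on this slide and with the later description of a centered, locally isometric dot-product matrix with maximal trace, is given below. The symbols K (kernel matrix), x_i (inputs) and η_ij (neighborhood indicator) are assumed notation.

```latex
\begin{align*}
\text{maximize}\quad & \operatorname{Tr}(K) \\
\text{subject to}\quad
  & K_{ii} + K_{jj} - 2K_{ij} = \lVert x_i - x_j\rVert^2
    \quad \text{for all } (i,j) \text{ with } \eta_{ij} = 1, \\
  & \textstyle\sum_{i,j} K_{ij} = 0, \\
  & K \succeq 0 .
\end{align*}
```

The objective and all constraints except the last are linear in the entries of K, and the last restricts K to the cone of positive semidefinite matrices, which is what makes this a semidefinite program.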
Part III. Semidefinite Embedding in three easy steps
(Also known as “Maximum Variance Unfolding” [Sun, Boyd, Xiao, Diaconis])
Step 1: k-Nearest Neighbors
Compute nearest neighbors and the Gram matrix for each neighborhood
Step 2: Semidefinite Programming
Compute centered, locally isometric dot-product matrix with maximal trace
Step 3: Kernel PCA
Estimate d from the eigenvalue spectrum. The top eigenvectors give the embedding.
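The slides do not say how d is read off the spectrum. One simple possibility, shown here purely as an assumption, is to pick the smallest number of leading eigenvalues whose share of the total variance reaches a chosen threshold. The function name, the threshold value and the example eigenvalues are all hypothetical.

```python
import numpy as np

def estimate_dimension(eigvals, threshold=0.95):
    """Smallest d whose leading eigenvalues reach `threshold` of the total variance."""
    vals = np.clip(np.sort(eigvals)[::-1], 0.0, None)   # descending, non-negative
    fractions = vals / vals.sum()                       # variance share per eigenvalue
    d = int(np.searchsorted(np.cumsum(fractions), threshold)) + 1
    return min(d, len(vals)), fractions

# Hypothetical spectrum dominated by two eigenvalues.
eigvals = np.array([6.0, 3.0, 0.5, 0.3, 0.2])
d, fractions = estimate_dimension(eigvals, threshold=0.85)
```

The `fractions` array corresponds to a "% variance" bar chart of the kind shown in the experimental results.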
Part IV. Experimental Results
Trefoil Knot
N=539, k=4, D=3, d=2
[Bar chart: % variance (0% to 100%) for RBF, Polynomial, Linear, SDE]
Teapot (full rotation)
N=400, k=4, D=23028, d=2
[Bar chart: % variance (0% to 100%), Before vs. After]
Teapot (half rotation)
N=200, k=4, D=23028, d=2
[Bar chart: % variance (0% to 100%), Before vs. After]
Faces
N=1000, k=4, D=540, d=2
[Bar chart: % variance (0% to 100%), Before vs. After]
Twos vs. Threes
N=953, k=3, D=256, d=2
[Bar chart: values from 0.0 to 1.0 for RBF, Polynomial, Linear, SDE]
Part V. Supervised Experimental Results
Large Margin Classification
• SDE Kernel used in SVM
• Task: Binary Digit Classification
• Input: USPS Data Set
• Training / Testing set: 810/90
• Neighborhood Size: k=4
SVM Kernel
[Bar chart: values from 0 to 12 for the tasks 1 vs 2, 1 vs 3, 2 vs 8, 8 vs 9; kernels: Linear, Polynomial, Gaussian, SDE]
SDE is not well-suited for SVMs
SVM Kernel (cont’d)
[Figure panels: Non-linear decision boundary, Linear decision boundary]
Unfolding does not necessarily help classification
• Reducing the dimensionality is counter-intuitive.
• Needs linear decision boundary on manifold.
Part VI. Conclusion
Previous Work
Isomap and LLE can both be seen from a kernel view [Jihun Ham et al., ICML’04]
Previous Work (Isomap)
Isomap and LLE can both be seen from a kernel view [Jihun Ham et al., ICML’04]
Matrix not necessarily positive semidefinite
[Figure panels: SDE, Isomap]
Previous Work (LLE)
Isomap and LLE can both be seen from a kernel view [Jihun Ham et al., ICML’04]
Eigenvalues do not reveal true dimensionality
[Figure panels: SDE, LLE]
Conclusion
Semidefinite Embedding (SDE)
+ extends kernel PCA to do manifold learning
+ uses semidefinite programming
+ has a guaranteed unique solution
- not well suited for support vector machines
- exact solution (so far) limited to N=2000
Semidefinite Programming Problem:
Maximize: trace of the kernel matrix (unfold manifold)
subject to:
• Preserve local neighborhoods
• Center output
• Kernel matrix is positive semidefinite
Introduce slack
Swiss Roll
N=800, k=4, D=3, d=2
[Bar charts: % variance (0% to 100%), Before vs. After]
Applications
• Visualization of Data
• Natural Language Processing
Trefoil Knot
N=539, k=4, D=3, d=2
[Bar chart: % variance (0% to 100%) for RBF, Polynomial, Linear, SDE]
[Figure panels: RBF, Polynomial, SDE]
Motivation
• Similar vectorized pictures lie on a non-linear manifold
• Linear methods don’t work here