Manifold learning Jan Kamenický


Page 1:

Manifold learning
Jan Kamenický

Page 2:

Many features ⇒ many dimensions

Dimensionality reduction
◦ Feature extraction (useful representation)
◦ Classification
◦ Visualization

Nonlinear dimensionality reduction

Page 3:

What manifold?
◦ Low-dimensional embedding of high-dimensional data lying on a smooth nonlinear manifold

Linear methods fail
◦ e.g. PCA

Manifold learning

Page 4:

Unsupervised methods
◦ Without any a priori knowledge

ISOMAP
◦ Isometric mapping

LLE
◦ Locally linear embedding

Manifold learning

Page 5:

Core idea
◦ Use geodesic distances on the manifold instead of Euclidean distances

Classical MDS
◦ Maps the data to the lower-dimensional space

ISOMAP

Page 6:

Select neighbours
◦ K-nearest neighbours
◦ ε-distance neighbourhood

Create a weighted neighbourhood graph
◦ Weights = Euclidean distances

Estimate the geodesic distances as shortest paths in the weighted graph
◦ Dijkstra’s algorithm (see the sketch after this slide)

Estimating geodesic distances
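A minimal Python sketch of these steps, assuming only NumPy and SciPy and the k-nearest-neighbour rule for neighbour selection (the function name and parameters are illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def geodesic_distances(X, k=10):
    """Estimate geodesic distances as shortest paths in a k-NN graph."""
    # Pairwise Euclidean distances between all points
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    N = D.shape[0]
    # Keep only edges to the k nearest neighbours; edge weights = Euclidean distances
    W = np.zeros_like(D)
    for i in range(N):
        nn = np.argsort(D[i])[1:k + 1]      # skip the point itself
        W[i, nn] = D[i, nn]
    W = np.maximum(W, W.T)                  # symmetrise -> undirected graph
    # Shortest paths in the weighted graph (Dijkstra from every node)
    return shortest_path(csr_matrix(W), method='D', directed=False)
```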

Page 7:

Dijkstra’s algorithm
1) Set distances (0 for the initial node, ∞ for all other nodes), mark all nodes as unvisited
2) Select the unvisited node with the smallest distance as active
3) Update all unvisited neighbours of the active node (if the newly computed distance is smaller)
4) Mark the active node as visited (it now has its minimal distance), repeat from 2) as necessary
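The same steps as a small self-contained Python sketch; a binary heap stands in for the Fibonacci heap mentioned on the next slide, and the adjacency-list format is an assumption:

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths; adj maps a node to a list of (neighbour, weight) pairs."""
    # 1) Distance 0 for the initial node, infinity for all others; all nodes unvisited
    dist = {v: float('inf') for v in adj}
    dist[source] = 0.0
    visited = set()
    queue = [(0.0, source)]                 # binary heap as the priority queue
    while queue:
        # 2) Select the unvisited node with the smallest distance as active
        d, u = heapq.heappop(queue)
        if u in visited:
            continue
        visited.add(u)                      # 4) its distance is now final
        # 3) Update all unvisited neighbours of the active node
        for v, w in adj[u]:
            if v not in visited and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(queue, (dist[v], v))
    return dist
```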

Page 8:

Time complexity
◦ O(|E|·Tdec + |V|·Tmin), where Tdec and Tmin are the costs of the priority queue’s decrease-key and extract-min operations

Implementation
◦ Sparse edges
◦ Fibonacci heap as a priority queue
◦ O(|E| + |V| log |V|)

Geodesic distances in ISOMAP
◦ O(N² log N)

Dijkstra’s algorithm

Page 9:

Input
◦ Dissimilarities (distances)

Output
◦ Data in a low-dimensional embedding, with distances corresponding to the dissimilarities

Many types of MDS
◦ Classical
◦ Metric / non-metric (number of dissimilarity matrices, symmetry, etc.)

Multidimensional scaling (MDS)

Page 10:

Quantitative similarity (input)
Euclidean distances (output)
One distance matrix (symmetric)

Minimizing the stress function

Classical MDS

Page 11:

We can optimize directly
◦ Compute the double-centred distance matrix B
◦ Note: for Euclidean distances B equals the centred Gram matrix
◦ Perform SVD of B
◦ Compute the final data (see the sketch after this slide)

Classical MDS
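A minimal NumPy sketch of these steps, assuming the standard classical-MDS formulas (double centring B = −½·J·D⁽²⁾·J with J = I − (1/N)·11ᵀ, then the top eigenpairs of B; the slide's exact formulas are not reproduced in the transcript):

```python
import numpy as np

def classical_mds(D, q=2):
    """Classical MDS of a distance matrix D into q dimensions."""
    N = D.shape[0]
    # Double-centred distance matrix: B = -1/2 * J * D^2 * J, with J = I - (1/N) * 1 1^T
    J = np.eye(N) - np.ones((N, N)) / N
    B = -0.5 * J @ (D ** 2) @ J
    # Eigendecomposition of the symmetric B (equivalent to its SVD here)
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]          # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    # Final data: top-q eigenvectors scaled by the square roots of the eigenvalues
    return vecs[:, :q] * np.sqrt(np.maximum(vals[:q], 0.0))
```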

Page 12:

Covariance matrix S of the centred data

Classical MDS of Euclidean distances gives the projection of the centred X onto the eigenvectors of N·S, i.e. the result of the PCA of X (see the check after this slide)

MDS and PCA correspondence
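A small numerical check of this correspondence on synthetic data (a sketch, not from the slides): for Euclidean distances the double-centred matrix B equals Xc·Xcᵀ, so classical MDS reproduces the PCA scores up to a sign per axis.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)                     # centred data
N = len(Xc)

# PCA: project the centred X onto the top eigenvectors of the covariance matrix S
S = Xc.T @ Xc / N                           # covariance matrix (so N*S = Xc^T Xc)
_, V = np.linalg.eigh(S)
pca_scores = Xc @ V[:, ::-1][:, :2]

# Classical MDS on Euclidean distances: B = Xc Xc^T is the double-centred matrix
lam, U = np.linalg.eigh(Xc @ Xc.T)
mds_scores = U[:, ::-1][:, :2] * np.sqrt(lam[::-1][:2])

# Equal up to a sign per coordinate axis
print(np.allclose(np.abs(pca_scores), np.abs(mds_scores), atol=1e-6))
```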

Page 13:

ISOMAP

Page 14:

ISOMAP

Page 15:

How many dimensions to use?
◦ Residual variance (see the sketch after this slide)

Short-circuiting
◦ Too large a neighbourhood (not enough data)
◦ Non-isometric mapping
◦ Totally destroys the final embedding

ISOMAP
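One possible way to compute the residual-variance criterion (a hedged sketch; a common definition is 1 − R² between the geodesic distances and the Euclidean distances in the q-dimensional embedding, evaluated over increasing q):

```python
import numpy as np

def residual_variance(D_geo, Y):
    """1 - R^2 between geodesic distances and distances in the embedding Y."""
    D_emb = np.sqrt(((Y[:, None] - Y[None, :]) ** 2).sum(-1))
    r = np.corrcoef(D_geo.ravel(), D_emb.ravel())[0, 1]
    return 1.0 - r ** 2

# Typical usage: embed for q = 1, 2, 3, ... and look for the "elbow" where adding
# another dimension no longer reduces the residual variance.
```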

Page 16:

Conformal ISOMAP
◦ Modified weights in the geodesic distance estimate (see the sketch after this slide)
◦ Magnifies regions with high density
◦ Shrinks regions with low density

ISOMAP modifications
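The slide's weight formula is not reproduced in this transcript; assuming the usual C-Isomap weighting w(i, j) = d(i, j) / sqrt(M(i)·M(j)), with M(i) the mean distance from point i to its k nearest neighbours, a minimal sketch (names and the `neighbours` format are illustrative):

```python
import numpy as np

def conformal_weights(D, neighbours):
    """Rescale k-NN edge weights: w(i, j) = d(i, j) / sqrt(M(i) * M(j)),
    where M(i) is the mean distance from i to its k nearest neighbours."""
    M = np.array([D[i, nn].mean() for i, nn in enumerate(neighbours)])
    W = np.zeros_like(D)
    for i, nn in enumerate(neighbours):
        # Dense regions (small M) get relatively larger weights, i.e. are magnified
        W[i, nn] = D[i, nn] / np.sqrt(M[i] * M[nn])
    return np.maximum(W, W.T)               # symmetrise the neighbourhood graph
```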

Page 17:

C-ISOMAP

Page 18:

Landmark ISOMAP
◦ Use only geodesic distances from several landmark points (on the manifold)
◦ Use Landmark-MDS for finding the embedding (involves triangulation of the non-landmark data)
◦ Significantly faster, but higher chance of “short-circuiting”; the number of landmarks has to be chosen carefully

ISOMAP modifications

Page 19:

Kernel ISOMAP
◦ Ensures that B (the double-centred distance matrix) is positive semidefinite by the constant-shifting method

ISOMAP modifications

Page 20:

Core idea
◦ Estimate each point as a linear combination of its neighbours and find the best such weights
◦ The same linear representation will hold in the low-dimensional space

Locally linear embedding

Page 21:

Find the weights Wij by constrained minimization of the reconstruction error (Wij = 0 for non-neighbours, and the weights of each point sum to one)

Neighbourhood-preserving mapping

LLE

Page 22:

Low-dimensional representation Y
◦ We take the eigenvectors of M = (I − W)ᵀ(I − W) corresponding to its q+1 smallest eigenvalues (the bottom, constant eigenvector is discarded)
◦ In practice, different algebra is used to improve numerical stability and speed (see the sketch after this slide)

LLE
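A compact sketch of both LLE steps in the direct formulation (as the slide notes, practical implementations use more careful linear algebra; the function name and the regularisation constant are illustrative assumptions):

```python
import numpy as np

def lle(X, k=10, q=2, reg=1e-3):
    """Locally linear embedding: reconstruction weights, then spectral embedding."""
    N = X.shape[0]
    D = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    W = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(D[i])[1:k + 1]              # k nearest neighbours of x_i
        Z = X[nn] - X[i]                            # neighbours centred on x_i
        G = Z @ Z.T                                 # local Gram matrix
        G += reg * np.trace(G) * np.eye(k)          # regularise for numerical stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nn] = w / w.sum()                      # constrained weights: rows sum to 1
    # Embedding: eigenvectors of M = (I - W)^T (I - W) for the q+1 smallest eigenvalues;
    # the very smallest (constant) eigenvector is dropped, leaving q coordinates
    I = np.eye(N)
    M = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:q + 1]
```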

Page 23:

LLE

Page 24:

LLE

Page 25:

ISOMAP
◦ Preserves global geometric properties (geodesic distances), especially for faraway points

LLE
◦ Preserves only local neighbourhood correspondence
◦ Overcomes non-isometric mappings
◦ The manifold is not explicitly required
◦ Difficult to estimate q (the number of dimensions)

ISOMAP vs LLE

Page 26:

The end