
Isomap: Isometric feature mapping

Drew Gonsalves, Yangdi Lyu

CAP6617 - Adv. Machine Learning, 9/1/17

Isomap

Isometric feature mapping

A nonlinear dimensionality reduction technique that preserves pairwise distances (isometric) and generates features while transforming data from a higher-dimensional to a lower-dimensional metric space.

Data Problem

Main problems faced with high-dimensional data

1. Visualization of high dimensional data (e.g. N>3)

2. Feature selection (e.g. classification)

Example: Visualization

Visualize the relationship between height and weight (N=2)

Easy or hard?

Example: Visualization

Visualize the relationship between these images?

Easy or hard?

Example: Visualization/Feature Selection

Problem: Identify a smaller subspace for a set of images of the same face [1]

• Original dimensionality = 4096
• True dimensionality = 3
  • Up-down pose
  • Left-right pose
  • Lighting direction

Use new space to do…!

[1]

3D output using Isomap on N=698 image set

Note: The above graph is the output of Isomap. (I think) the first dimension ‘happened’ to correspond to left-right pose, the second dimension to up-down pose, etc. To put it in ‘PCA terms’, we might have said something like “the first principal axis corresponded to left-right pose...”.

What is Isomap attempting to do?

Learn a lower-dimensional, non-intersecting manifold. It assumes the data is densely sampled and resides on a manifold.

Swiss roll: a 2D surface embedded in 3D. [1] Boy’s surface: an intersecting surface. [2]

How could we use this for classification?

For example, an SVM may find some boundary

[4]

Suppose we have 2 classes on a manifold in 3D.

Utilizing Isomap first, we may find a 2D subspace where the data lies, in which the SVM can find a better decision boundary.

[4]

Let’s use an SVM!

How does Isomap work?

Steps
1. Constructs a local neighborhood graph for all data points
2. Computes geodesic distances between all data points
   • Geodesic distance - the summative path distance along a manifold
3. Constructs a lower-dimensional (d << N) embedding

(A minimal end-to-end sketch of these steps follows.)
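Below is a minimal NumPy/SciPy sketch of the three steps (an illustration under assumptions, not the presenters' code). X is an (n_samples, n_features) array; k and d are illustrative parameter choices.

```python
# Minimal Isomap sketch following Steps 1-3 above (illustrative, not the
# presenters' implementation). Assumes the neighborhood graph is connected.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap(X, k=10, d=2):
    # Step 1: k-nearest-neighbor graph weighted by Euclidean distances
    G = kneighbors_graph(X, n_neighbors=k, mode='distance')

    # Step 2: geodesic distances = all-pairs shortest paths along the graph
    D = shortest_path(G, method='auto', directed=False)

    # Step 3: classical MDS on the geodesic distance matrix
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * H @ (D ** 2) @ H                    # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)           # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:d]            # keep the d largest
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))
```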

Step 1: Construct local graphs

Free parameters: K or ϵ
• K - number of nearest neighbors
• ϵ - max Euclidean search distance (for an arbitrary number of neighbors)

Note: The selection of K and ϵ is critical to reduce the chances of a ‘short circuit’

Local graph

Example: Use ϵ to construct local graphs

Or by adjacency matrix...
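As an illustration of the ϵ variant (the data and radius below are placeholders, not values from the slides), scikit-learn can build the weighted adjacency matrix directly:

```python
# Step 1 sketch, ϵ (radius) variant: connect every pair of points within
# Euclidean distance eps; store the result as a sparse adjacency matrix.
import numpy as np
from sklearn.neighbors import radius_neighbors_graph

X = np.random.rand(100, 3)                      # toy data sampled from some surface
A = radius_neighbors_graph(X, radius=0.3,       # ϵ-neighborhood graph
                           mode='distance')     # edge weights = Euclidean distances
print(A.shape)                                  # (100, 100) sparse matrix; unstored entries = no edge
```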

Combine local graphs

Step 2: Develop distances

Geodesic distances between all pairs
• NOTE: NOT Euclidean

Intuition - the graph is made up of small hops; combining hops estimates the geodesic distance.

[Figure: geodesic vs. Euclidean distance on the manifold]

Distance algorithms

All-pairs shortest paths (a SciPy sketch follows):
• Floyd-Warshall algorithm [5]
  • All pairs: O(V³)
• Dijkstra (run V times)
  • Single source: O(V²)
  • All pairs: O(V²·V) = O(V³)
• Bellman-Ford (run V times)
  • Single source: O(V·E) = O(V³)
  • All pairs: O(V·V·E) = O(V⁴)
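For reference, SciPy exposes two of the tabulated approaches; this sketch (with made-up data mirroring the earlier ϵ-graph example) computes the geodesic distance matrix both ways.

```python
# Step 2 sketch: geodesic distances as all-pairs shortest paths on the
# neighborhood graph. Methods 'FW' and 'D' correspond to the algorithms above.
import numpy as np
from sklearn.neighbors import radius_neighbors_graph
from scipy.sparse.csgraph import shortest_path

X = np.random.rand(100, 3)
A = radius_neighbors_graph(X, radius=0.3, mode='distance')   # graph from Step 1

D_fw = shortest_path(A, method='FW', directed=False)  # Floyd-Warshall, O(V^3)
D_dj = shortest_path(A, method='D', directed=False)   # Dijkstra run from every source
# Entries equal to np.inf mean the graph is disconnected (ϵ or K chosen too small).
```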

Isomap

Parallel vs. non-parallel versions….

Best: O(n)

Floyd-Warshall

• You have V vertices labelled V = {V_1, V_2, ...}
• You want to find all-pairs shortest paths.
• There are k = V-2 subgraph sets S_i, for i = 1..k
• For each k = 1..V:
  Find all-pairs shortest paths by only pivoting through the subset of V, S_k = {V_1, ..., V_k}

Update Equation:
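(The equation on this slide was an image that did not survive extraction; the standard Floyd-Warshall recurrence it refers to [5] is:)

$$d^{(k)}_{ij} = \min\left(d^{(k-1)}_{ij},\; d^{(k-1)}_{ik} + d^{(k-1)}_{kj}\right)$$

That is, the shortest path from i to j using only pivots in S_k either avoids V_k or passes through it.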

Example: Floyd-Warshall

k = 2
• Find all-pairs shortest paths using set S_2 = {V_1, V_2} as the only pivot nodes (Note: V_1 was already considered at k = 1)
• Update: path 1 -> 4 is shorter by considering 1 -> 2 -> 4 from S_2, with distance 2 + 1 = 3, versus 1 -> 4 = 5.
• Other updates:
  • 4 -> 2 -> 5 (d = 5, from 58)
  • 3 -> 2 -> 1 (d = 16, from inf)
  • 3 -> 2 -> 5 (d = 18, from 34)
  • 5 -> 2 -> 1 (d = 6, from inf)
  • 3 -> 2 -> 4 (d = 15, from ???)
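A plain triple-loop implementation of the update above; the toy distance matrix is illustrative and is not the graph from the slides.

```python
# Floyd-Warshall: repeatedly relax every pair (i, j) through pivot vertex k,
# exactly as in the update equation above. Toy 4-node graph, not the slides' example.
import numpy as np

INF = np.inf
D = np.array([[0.0, 3.0, INF, 7.0],
              [3.0, 0.0, 1.0, INF],
              [INF, 1.0, 0.0, 2.0],
              [7.0, INF, 2.0, 0.0]])

V = D.shape[0]
for k in range(V):               # pivot vertex allowed on the path
    for i in range(V):
        for j in range(V):
            # keep the shorter of the current path and the path through k
            D[i, j] = min(D[i, j], D[i, k] + D[k, j])

print(D)                         # all-pairs shortest-path distances
```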

Best Algorithm

• Best parallel: Floyd pipelined 2-D block
• How it works:

Best Algorithm

Floyd pipelined 2-D block

How it works:
• Requires V² parallel processes
• Requires interprocess communication

Each process p covers a region of the distance matrix D; call its portion D_p.

Floyd pipelined 2-D block

Iteration k-1:
For each process at iteration k-1:
• Update its distances
• Pass them to the required processes

Step 3: Transform to lower dimension

Output of all pairs, shortest path (from Floyd)

Multidimensional Scaling (MDS)

Multidimensional Scaling (MDS)

• Geometry: solve a triangle given its 3 sides a, b, c

[Figure: triangle with sides a, b, c]

Multidimensional Scaling (MDS)

PCA!

?

Multidimensional Scaling (MDS)

ISOMAP

Why H?

Classical MDS
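(The equations on the MDS slides above were images that did not survive extraction. As a reconstruction, not the recovered slide content, the standard classical MDS construction they refer to, which is what the centering matrix H is for, is:)

$$H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}, \qquad B = -\tfrac{1}{2}\, H\, D^{(2)}\, H$$

where D^(2) is the matrix of squared (for Isomap, geodesic) distances. B is the centered Gram matrix; its top d eigenvectors, scaled by the square roots of their eigenvalues, give the d-dimensional embedding. Because this is an eigendecomposition of a centered Gram matrix, it is essentially PCA on the implied coordinates, which is presumably the connection the "PCA!" slide is drawing.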

Metric multidimensional scaling

Metric multidimensional scaling

• Construct a map from a city distance matrix

SMACOF (Scaling by MAjorizing a COmplicated Function)

Majorizing


Majorizing

SMACOF
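(The SMACOF and majorizing slides also relied on images. As a standard definition rather than recovered slide content, SMACOF minimizes the stress:)

$$\sigma(X) = \sum_{i<j} w_{ij}\,\bigl(\delta_{ij} - d_{ij}(X)\bigr)^{2}$$

where δ_ij are the input dissimilarities, d_ij(X) the Euclidean distances in the embedding X, and w_ij optional weights. Majorization replaces σ with a simpler quadratic function that upper-bounds it and touches it at the current iterate; minimizing that surrogate in closed form at each iteration drives the stress monotonically downward.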

Non-metric multidimensional scaling

Non-metric multidimensional scaling

• Example: Consider a small example with 4 objects based on the car marks data set. (from http://sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/tutorials/mvahtmlnode100.html)

Scatterplot of dissimilarities against distances

Example: Handwritten Digits

Estimate a lower dimensionality (d << N) for the MNIST digit set consisting of images of the number “2”, with original dimensionality N = 4096.

Handwritten Digits

1. Develop local graphs
2. Estimate geodesic distances
3. Use MDS to produce the mapping
4. Utilize residual variance over a set of candidate d (a sketch of this computation follows below)

The ‘best’ lower d << N is uncertain; d ≈ 6-10.
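A sketch of step 4 under the definition used in [1] (residual variance = 1 - R² between the geodesic distances and the pairwise distances of the embedding). D_geo, X, and isomap() refer to the earlier sketches and are assumptions, not the authors' code.

```python
# Residual variance, as defined in the Isomap paper [1]: 1 - R^2 between the
# geodesic distance matrix and the Euclidean distances of the d-dim embedding.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def residual_variance(D_geo, Y):
    d_embed = squareform(pdist(Y))                          # distances in the embedding
    r = np.corrcoef(D_geo.ravel(), d_embed.ravel())[0, 1]   # linear correlation
    return 1.0 - r ** 2

# Sweep candidate dimensions and look for the "elbow" in the curve:
# for d in range(1, 11):
#     Y = isomap(X, k=10, d=d)          # isomap() from the earlier sketch
#     print(d, residual_variance(D_geo, Y))
```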

[Plot: residual variance vs. dimension d. Key: triangle = PCA, open circle = MDS, closed circle = Isomap]

Handwritten Digits

Result: top d=2 from MDS

By visually inspecting the output, the authors determined that the major ‘features’ differentiating all the “2”s are top-arch and bottom-loop articulation.

How can we use Isomap for classification?

“A way”:
• Choose the top k Isomap features
• Verify discriminability in 2D/3D mappings
• Use SVM, k-NN, or some other network
(A sketch of this workflow follows.)
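A hedged sketch of that workflow using scikit-learn's built-in Isomap (this is not the presenters' code; X, y, and the parameter values are placeholders):

```python
# Isomap features followed by an SVM, as in the workflow above.
# n_neighbors, n_components, and the kernel are illustrative choices.
from sklearn.pipeline import make_pipeline
from sklearn.manifold import Isomap
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

clf = make_pipeline(
    Isomap(n_neighbors=10, n_components=3),   # top-d Isomap features
    SVC(kernel='rbf'),                        # classifier on the embedded features
)
# scores = cross_val_score(clf, X, y, cv=5)   # X: samples, y: class labels
# print(scores.mean())
```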

NOTE: It is not immediately clear why or how this works for d > 2 data with two or more classes (or whether it is any better than not using Isomap). No assumptions are made about the distribution of class data on the manifold.

Questions for the audience - How does Isomap deal with:
1. Too small an ϵ (a disconnected graph)

2. Multiple manifolds

Special cases

Thank you

References

[1] Tenenbaum, Joshua B., Vin De Silva, and John C. Langford. "A global geometric framework for nonlinear dimensionality reduction." Science 290.5500 (2000): 2319-2323.

[2] https://en.wikipedia.org/wiki/Boy%27s_surface

[3] Roweis, Sam T., and Lawrence K. Saul. "Nonlinear dimensionality reduction by locally linear embedding." Science 290.5500 (2000): 2323-2326.

[4] Lee, George, Carlos Rodriguez, and Anant Madabhushi. "Investigating the efficacy of nonlinear dimensionality reduction schemes in classifying gene and protein expression studies." IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 5.3 (2008): 368-384.

[5] https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm

[6] V. Kumar, A. Grama, A. Gupta, G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms (Benjamin/Cummings, Redwood City, CA, 1994), pp. 257-297.