Estimation of the Intrinsic Dimension

Nonlinear Dimensionality Reduction, John A. Lee, Michel Verleysen, Chapter 3: Estimation of the Intrinsic Dimension. Amirkabir University of Technology (Tehran Polytechnic).


Transcript of Estimation of the Intrinsic Dimension

Page 1: Estimation of the Intrinsic Dimension

• Nonlinear Dimensionality Reduction, John A. Lee, Michel Verleysen, Chapter 3


Estimation of the Intrinsic Dimension

Amirkabir University of Technology (Tehran Polytechnic)

Page 2: Estimation of the Intrinsic Dimension

Overview


Introduce the concept of intrinsic dimension along with several techniques that can estimate it

Estimators based on fractal geometry

Estimators related to PCA

Trial and error approach

Page 3: Estimation of the Intrinsic Dimension

q-dimension

The q-dimension of a measure μ is defined as

D_q = (1 / (q − 1)) lim_{ε→0} (log Σ_i p_i^q) / (log ε)

Page 4: Estimation of the Intrinsic Dimension

q-dimension (Cont.)

The support of μ is covered with a (multidimensional) grid of cubes with edge length ε.

Let N(ε) be the number of cubes that intersect the support of μ.

Let the natural measures of these cubes be p1, p2, . . . , pN(ε).

pi may be seen as the probability that these cubes are populated.

Page 5: Estimation of the Intrinsic Dimension

q-dimension (Cont.)

For q ≥ 0, q ≠ 1, these limits do not depend on the choice of the ε-grid and give the same values.

Page 6: Estimation of the Intrinsic Dimension

Capacity dimension

Setting q equal to zero gives the capacity dimension:

d_cap = lim_{ε→0} log N(ε) / log(1/ε)

In this definition, d_cap does not depend on the natural measures pi.

d_cap is also known as the 'box-counting' dimension.

Page 7: Estimation of the Intrinsic Dimension

Capacity dimension (Cont.)


When the manifold is not known analytically and only a few data points are available, the capacity dimension is quite easy to estimate:
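In practice, this means counting occupied grid cells at several scales and fitting the slope of log N(ε) versus log(1/ε). A minimal numpy sketch (the function name and the test curve below are illustrative, not the book's code):

```python
import numpy as np

def capacity_dimension(points, epsilons):
    """Box-counting estimate: slope of log N(eps) vs. log(1/eps)."""
    log_inv_eps, log_counts = [], []
    for eps in epsilons:
        # Assign each point to a grid cube of edge length eps.
        boxes = np.floor(points / eps).astype(int)
        # N(eps) = number of distinct occupied cubes.
        n_boxes = len(np.unique(boxes, axis=0))
        log_inv_eps.append(np.log(1.0 / eps))
        log_counts.append(np.log(n_boxes))
    # Least-squares slope of the log-log curve approximates d_cap.
    slope, _ = np.polyfit(log_inv_eps, log_counts, 1)
    return slope

# Points sampled on a straight 1D segment embedded in 3D:
# the estimated capacity dimension should be close to 1.
t = np.linspace(0.0, 1.0, 5000)
curve = np.column_stack([t, 2.0 * t, -t])
d_cap = capacity_dimension(curve, epsilons=[0.1, 0.05, 0.02, 0.01])
```

The range of ε must be chosen so that the boxes are small enough to follow the manifold but still contain several points each.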

Page 8: Estimation of the Intrinsic Dimension

Intuitive interpretation of the capacity dimension

Assume a three-dimensional space divided into small cubic boxes with a fixed edge length ε.

The number of occupied boxes grows
- for a growing one-dimensional object, proportionally to the object length;
- for a growing two-dimensional object, proportionally to the object surface;
- for a growing three-dimensional object, proportionally to the object volume.

Generalizing to a P-dimensional object, like a P-manifold embedded in R^D: the number of occupied boxes grows proportionally to its P-dimensional volume.

Page 9: Estimation of the Intrinsic Dimension

Correlation dimension

Setting q equal to two gives the correlation dimension:

d_cor = lim_{ε→0} log Σ_i p_i² / log ε

The term 'correlation' refers to the fact that the probabilities or natural measures pi are squared.

Page 10: Estimation of the Intrinsic Dimension

Correlation dimension (Cont.)

C2(ε) is the number of pairs of points lying closer to each other than a certain threshold ε.

This number grows as a length for a 1D object, as a surface for a 2D object, as a volume for a 3D object, and so forth.

Page 11: Estimation of the Intrinsic Dimension

Correlation Dim. (Cont.)

When the manifold or fractal object is only known by a countable set of points
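The practical estimator then computes the correlation sum C2(ε) over the sample pairs and fits the slope of log C2(ε) versus log ε. A hedged numpy/scipy sketch (function name, ε values, and test data are illustrative):

```python
import numpy as np
from scipy.spatial.distance import pdist

def correlation_dimension(points, epsilons):
    """Slope of log C2(eps) vs. log eps, where C2(eps) is the
    fraction of point pairs closer to each other than eps."""
    dists = pdist(points)              # all pairwise Euclidean distances
    log_eps, log_c2 = [], []
    for eps in epsilons:
        c2 = np.mean(dists < eps)      # correlation sum C2(eps)
        if c2 > 0:
            log_eps.append(np.log(eps))
            log_c2.append(np.log(c2))
    slope, _ = np.polyfit(log_eps, log_c2, 1)
    return slope

# Uniform samples on a 2D patch embedded (linearly) in 3D:
# the estimate should be close to 2.
rng = np.random.default_rng(0)
xy = rng.uniform(size=(2000, 2))
patch = np.column_stack([xy, xy[:, 0] + xy[:, 1]])
d_cor = correlation_dimension(patch, epsilons=[0.05, 0.1, 0.2])
```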

Page 12: Estimation of the Intrinsic Dimension

Practical estimation

In practice, only a finite number of points is known.

The capacity and correlation dimensions must then be estimated from these data.

However, the limit ε → 0 cannot be computed in practice; the dimension is instead estimated as the slope of the corresponding log-log curve over a suitable range of ε.

Page 13: Estimation of the Intrinsic Dimension

Practical estimation (Cont.)

Page 14: Estimation of the Intrinsic Dimension

The slope of the curve is almost constant between ε1 ≈ exp(−6) ≈ 0.0025 and ε2 ≈ exp(0) = 1.

Page 15: Estimation of the Intrinsic Dimension

Dimension estimators based on PCA

The model of PCA is linear.

The estimator works only for manifolds containing linear dependencies (linear subspaces).

For more complex manifolds, PCA gives at best an estimate of the global dimensionality of an object (e.g., 2D for a spiral manifold: a macroscopic effect).
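A common way to read a dimension off global PCA is to count how many principal components are needed to retain a chosen fraction of the total variance. A minimal sketch (the function name and the 0.97 threshold are illustrative assumptions):

```python
import numpy as np

def pca_dimension(points, variance_fraction=0.97):
    """Number of principal components needed to retain the
    requested fraction of the total variance."""
    centered = points - points.mean(axis=0)
    # Covariance eigenvalues, sorted largest first.
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(centered, rowvar=False)))[::-1]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, variance_fraction) + 1)

# A noisy 2D linear subspace embedded in 5D: global PCA recovers 2.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))
basis = np.array([[1.0, 0.0, 1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 1.0, 0.0]]) / np.sqrt(2.0)
data = latent @ basis + 0.01 * rng.normal(size=(500, 5))
d_pca = pca_dimension(data)
```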

Page 16: Estimation of the Intrinsic Dimension

Local Methods

Decompose the space into small patches, or 'space windows'.

Example: a nonlinear generalization of PCA:

1. Windows are determined by clustering the data (vector quantization).

2. PCA is carried out locally, on each space window.

3. A weighted average is computed over the localities.
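The three steps above can be sketched as follows; the clustering routine (scipy's kmeans2) and the 0.97 variance fraction are illustrative choices, not the book's exact procedure:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def local_pca_dimension(points, n_windows, variance_fraction=0.97):
    """1. cluster into space windows, 2. PCA per window,
    3. average local dimensions weighted by window size."""
    _, labels = kmeans2(points, n_windows, minit='points')
    dims, weights = [], []
    for k in range(n_windows):
        window = points[labels == k]
        if len(window) <= points.shape[1]:
            continue  # too few points for a reliable local PCA
        centered = window - window.mean(axis=0)
        eigvals = np.sort(np.linalg.eigvalsh(np.cov(centered, rowvar=False)))[::-1]
        cumulative = np.cumsum(eigvals) / eigvals.sum()
        dims.append(np.searchsorted(cumulative, variance_fraction) + 1)
        weights.append(len(window))
    return float(np.average(dims, weights=weights))

# A 2D spiral: global PCA would report 2, but small windows see
# almost-straight 1D pieces, so the local estimate stays near 1.
np.random.seed(0)  # kmeans2 initialization is random
t = np.linspace(0.0, 4.0 * np.pi, 2000)
spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])
d_local = local_pca_dimension(spiral, n_windows=30)
```

The innermost, most curved windows may still report 2, which is why the weighted average can sit slightly above 1.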

Page 17: Estimation of the Intrinsic Dimension

The fraction of the total variance spanned by the first principal component of each cluster or space window, and the corresponding dimensionality (computed by piecewise linear interpolation) for three variance fractions (0.97, 0.98, and 0.99).

Page 18: Estimation of the Intrinsic Dimension

Properties

The dimension given by local PCA is scale-dependent, like the correlation dimension.

A low number of space windows -> large windows -> macroscopic structure of the spiral (2D).

An optimal number of windows -> small pieces of the spiral (1D).

A high number of space windows -> too-small windows -> noise scale (2D).

Page 19: Estimation of the Intrinsic Dimension

Properties

Local PCA requires more data samples to yield an accurate estimate (it divides the manifold into non-overlapping patches).

If PCA is repeated for many different numbers of space windows, the computation time grows.

Page 20: Estimation of the Intrinsic Dimension

Trial and error

1. For a manifold embedded in a D-dimensional space, reduce the dimensionality successively to P = 1, 2, ..., D.

2. Plot Ecodec as a function of P.

3. Choose a threshold, and determine the lowest value of P such that Ecodec goes below it (an elbow).
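These steps can be sketched with PCA standing in as the codec (a real run would use the nonlinear DR method under study; the function name and threshold are illustrative):

```python
import numpy as np

def trial_and_error_dimension(points, threshold=0.05):
    """Reduce to P = 1..D, record the normalized reconstruction
    error E_codec, and return the lowest P whose error falls
    below the threshold."""
    centered = points - points.mean(axis=0)
    # The SVD yields the optimal rank-P linear reconstruction for all P.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    total = (centered ** 2).sum()
    errors = []
    for p in range(1, points.shape[1] + 1):
        recon = (u[:, :p] * s[:p]) @ vt[:p]
        errors.append(((centered - recon) ** 2).sum() / total)
    errors = np.asarray(errors)
    return int(np.argmax(errors < threshold) + 1), errors

# 3D latent data embedded linearly in 5D plus slight noise:
# the error curve shows an elbow at P = 3.
rng = np.random.default_rng(2)
latent = rng.normal(size=(500, 3))
basis = np.array([[1.0, 0.0, 0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0, 0.0, 0.0]])
data = latent @ basis + 0.01 * rng.normal(size=(500, 5))
p_hat, errors = trial_and_error_dimension(data)
```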

Page 21: Estimation of the Intrinsic Dimension

Additional refinement


Using statistical estimation methods like cross validation or bootstrapping:

Ecodec is computed by dimensionality reduction on several subsets that are randomly drawn from the available data.

This results in a better estimation of the reconstruction errors, and therefore in a more faithful estimation of the dimensionality at the elbow.

Huge computational requirements.

Page 22: Estimation of the Intrinsic Dimension

Comparisons: Data Set

10D data set
Intrinsic dimension: 3
100, 1,000, and 10,000 observations
White Gaussian noise, with std 0.01

Page 23: Estimation of the Intrinsic Dimension

PCA estimator

The number of observations does not greatly influence the results; the nonlinear dependencies hidden in the data sets remain undetected.

Page 24: Estimation of the Intrinsic Dimension

Correlation Dimension

Much more sensitive to the number of available observations.

Page 25: Estimation of the Intrinsic Dimension

The correlation dimension is much slower than PCA but yields higher-quality results.

Edge effects appear: the dimensionality is slightly underestimated.

The noise dimensionality appears more clearly as the number of observations grows.

Page 26: Estimation of the Intrinsic Dimension

Local PCA estimator

For large windows, the estimate reflects the nonlinear shape of the underlying manifold.

Page 27: Estimation of the Intrinsic Dimension

Local PCA estimator (cont.)

With too-small windows, samples become rare: PCA is no longer reliable, because the windows do not contain enough points.

Page 28: Estimation of the Intrinsic Dimension

Local PCA estimator (cont.)

Local PCA yields the right dimensionality.

The largest three normalized eigenvalues remain high for any number of windows, while the fourth and subsequent ones are negligible.

It is noteworthy that for a single window the result of local PCA is trivially the same as for PCA applied globally; as the number of windows increases, the fourth normalized eigenvalue decreases slowly.

Local PCA is obviously much slower than global PCA, but still faster than the correlation dimension.

Page 29: Estimation of the Intrinsic Dimension

Trial and error

The number of points does not play an important role.

The DR method slightly overestimates the dimensionality.

Although the method relies on a nonlinear model, the manifold may still be too curved to achieve a perfect embedding in a space having the same dimension as the exact manifold dimensionality.

The overestimation observed for PCA does not disappear but is only attenuated when switching to an NLDR method.

Page 30: Estimation of the Intrinsic Dimension

Concluding remarks


PCA applied globally on the whole data set remains the simplest and fastest one.

Its results are not very convincing: the dimension is almost always overestimated if data do not perfectly fit the PCA model.

The method relying on a nonlinear model (trial and error) is very slow.

The overestimation that was observed with PCA does not disappear totally.

Page 31: Estimation of the Intrinsic Dimension

Concluding remarks


Local PCA runs fast if the number of windows does not sweep a wide interval.

local PCA has given the right dimensionality for the studied data sets.

The correlation dimension clearly appears as the best method to estimate the intrinsic dimensionality.

It is not the fastest of the four methods, but its results are the best and most detailed ones, giving the dimension on all scales.

Page 32: Estimation of the Intrinsic Dimension

• Nonlinear Dimensionality Reduction, John A. Lee, Michel Verleysen, Chapter 4

Distance Preservation

Amirkabir University of Technology (Tehran Polytechnic)

Page 33: Estimation of the Intrinsic Dimension

The motivation behind distance preservation is that any manifold can be fully described by pairwise distances.

Preserving the geometrical structure.

Page 34: Estimation of the Intrinsic Dimension

Outline

Metric space & most common distance measures

Metric multidimensional scaling

Geodesic and graph distances

Nonlinear DR methods

Page 35: Estimation of the Intrinsic Dimension

Spatial distances: metric space

A space Y with a distance function d(a, b) between two points a, b ∈ Y is said to be a metric space if the distance function respects the following axioms:

Nondegeneracy

d(a, b) = 0 if and only if a = b.

Triangle inequality

d(a, b) ≤ d(a, c) + d(c, b).

Nonnegativity and symmetry.

Page 36: Estimation of the Intrinsic Dimension

In the usual Cartesian vector space R^D, the most-used distance functions are derived from the Minkowski norm:

d(a, b) = ‖a − b‖_p = (Σ_i |a_i − b_i|^p)^(1/p)

Manhattan distance (p = 1), Euclidean distance (p = 2), dominance distance (p = ∞).

The Mahalanobis distance is a straightforward generalization of the Euclidean distance.
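The special cases of the Minkowski norm can be checked with a few lines of code (the helper name is illustrative):

```python
import numpy as np

def minkowski(a, b, p):
    """d(a, b) = (sum_i |a_i - b_i|^p)^(1/p); p = inf gives the
    dominance (Chebyshev) distance, the limit of the p-norms."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    if np.isinf(p):
        return float(diff.max())
    return float((diff ** p).sum() ** (1.0 / p))

a, b = [0.0, 0.0], [3.0, 4.0]
d1 = minkowski(a, b, 1)         # Manhattan distance
d2 = minkowski(a, b, 2)         # Euclidean distance
dinf = minkowski(a, b, np.inf)  # dominance distance
```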

Page 37: Estimation of the Intrinsic Dimension

Metric multidimensional scaling

Classical metric MDS is not a true distance-preserving method: it preserves pairwise scalar products instead of pairwise distances (the two are closely related). It is not a nonlinear DR method.

Instead of pairwise distances, we can use pairwise 'similarities'.

When the distances are Euclidean, metric MDS is equivalent to PCA.

Page 38: Estimation of the Intrinsic Dimension

Metric MDS

Generative model: y = W x, where the components of x are independent or uncorrelated, and W is a D-by-P matrix such that W^T W = I.

The scalar products between observations are collected in the Gram matrix S = Y^T Y.

Both Y and X are unknown; only the matrix S of pairwise scalar products, the Gram matrix, is given.

Page 39: Estimation of the Intrinsic Dimension

Metric MDS (Cont.)

Eigenvalue decomposition of the Gram matrix: S = U Λ U^T.

The P-dimensional latent variables are estimated as X̂ = I_{P×N} Λ^(1/2) U^T.

The criterion of metric MDS is the preservation of the pairwise scalar products.
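Putting the pieces together, classical metric MDS can be sketched as follows; double centering converts squared distances into the Gram matrix (the function name and test data are illustrative):

```python
import numpy as np

def classical_mds(distances, p):
    """Double centering turns squared distances into a Gram matrix;
    the top-p eigenvectors give the P-dimensional coordinates."""
    n = len(distances)
    j = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    gram = -0.5 * j @ (distances ** 2) @ j    # S = -1/2 J D^(2) J
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:p]
    # Coordinates U_p Lambda_p^(1/2), one row per observation.
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# With exact Euclidean distances from a 2D configuration, metric MDS
# recovers the configuration up to rotation/translation, so the
# pairwise distances of the embedding match the inputs.
rng = np.random.default_rng(3)
pts = rng.normal(size=(50, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
emb = classical_mds(dist, p=2)
dist_emb = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
```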

Page 40: Estimation of the Intrinsic Dimension

Metric MDS (Cont.)

Metric MDS and PCA give the same solution.

When the data consist only of distances or similarities, PCA cannot be applied, but metric MDS can.

When the coordinates are known, PCA spends fewer memory resources than MDS.

Page 41: Estimation of the Intrinsic Dimension

Experiments


Page 42: Estimation of the Intrinsic Dimension

Geodesic distance

Assuming that very short Euclidean distances are preserved, longer distances are considerably stretched along the manifold.

The idea is to measure the distance along the manifold and not through the embedding space.

Page 43: Estimation of the Intrinsic Dimension

Geodesic distance

Distance along a manifold: in the case of a one-dimensional manifold M, which depends on a single latent variable x, the arc length is obtained by integrating along the curve.

Page 44: Estimation of the Intrinsic Dimension

Geodesic distance-Multi Dim. manifold


Page 45: Estimation of the Intrinsic Dimension

Geodesic distance (Cont.)

The integral then has to be minimized over all possible paths that connect the starting and ending points.

Such a minimization is intractable, since it is a functional minimization.

Moreover, the parametric equations of M (and P) are unknown; only some (noisy) points of M are available.

Page 46: Estimation of the Intrinsic Dimension

Graph dist.

Lack of analytical information -> reformulation of the problem.

Instead of minimizing an arc length between two points on a manifold, minimize the length of a path (i.e., a broken line).

The path should be constrained to follow the underlying manifold.

In order to obtain a good approximation of the true arc length, a fine discretization of the manifold is needed.

Only the smallest jumps are permitted (K-rule, ε-rule).
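The K-rule graph construction followed by shortest paths can be sketched with scipy (the function name, neighborhood size, and test curve are illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def graph_distances(points, k):
    """K-rule: connect each point to its k nearest neighbours, then
    approximate geodesic distances by Dijkstra shortest paths."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    graph = np.zeros_like(d)
    for i in range(len(points)):
        nearest = np.argsort(d[i])[1:k + 1]   # skip the point itself
        graph[i, nearest] = d[i, nearest]
    graph = np.maximum(graph, graph.T)        # make the graph undirected
    return shortest_path(csr_matrix(graph), method='D', directed=False)

# Points on a quarter circle: the graph distance between the endpoints
# approximates the arc length pi/2, which is longer than the straight
# (Euclidean) distance sqrt(2) through the embedding space.
theta = np.linspace(0.0, np.pi / 2, 100)
arc = np.column_stack([np.cos(theta), np.sin(theta)])
geo = graph_distances(arc, k=3)
endpoint_dist = geo[0, -1]
```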

Page 47: Estimation of the Intrinsic Dimension

Graph dist.


Page 48: Estimation of the Intrinsic Dimension


Page 49: Estimation of the Intrinsic Dimension

Graph dist.

How can the shortest paths in a weighted graph be computed? With Dijkstra's algorithm.

It can be proved that the graph distance approximates the true geodesic distance in an appropriate way.

Page 50: Estimation of the Intrinsic Dimension

Isomap

Isomap is an NLDR method that uses the graph distance as an approximation of the geodesic distance.

Isomap inherits one of the major shortcomings of MDS: a very rigid model.

If the distances are not Euclidean, it is implicitly assumed that the replacement metric yields distances equal to Euclidean distances measured in some transformed hyperplane.

Page 51: Estimation of the Intrinsic Dimension

Isomap Algorithm


Page 52: Estimation of the Intrinsic Dimension

Intrinsic dimensionality

x̂(i) is the ith column of X̂ = I_{P×N} Λ^(1/2) U^T.

An elbow indicates the right dimensionality

Page 53: Estimation of the Intrinsic Dimension

Experiments


The first two eigenvalues clearly dominate the others.

Page 54: Estimation of the Intrinsic Dimension

Experiments

The open box is not a developable manifold, and Isomap does not embed it in a satisfying way.

The first three eigenvalues dominate the others.

Like MDS, Isomap does not succeed in detecting that the intrinsic dimensionality of the box is two.

Page 55: Estimation of the Intrinsic Dimension

Kernel PCA

The first idea of KPCA consists of reformulating PCA into its metric MDS equivalent.

KPCA works as metric MDS does, i.e., with the matrix of pairwise scalar products S = Y^T Y.

The second idea of KPCA is to “linearize” the underlying manifold M.

Page 56: Estimation of the Intrinsic Dimension

Kernel PCA (Cont.)


As a unique hypothesis, KPCA assumes that the mapping φ is such that the mapped data span a linear subspace of the Q-dimensional space, with Q > D.

KPCA thus starts by increasing the data dimensionality!

Page 57: Estimation of the Intrinsic Dimension

Kernel PCA

Choose the mapping φ.

Compute the pairwise scalar products for the mapped data and store them in the N-by-N matrix Φ.

The symmetric matrix Φ has to be decomposed in eigenvalues and eigenvectors.

This operation will not yield the expected result unless Φ is positive semidefinite.
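A compact sketch with a Gaussian kernel (the kernel choice, its width sigma, and the test data are illustrative assumptions; centering the kernel matrix corresponds to centering the mapped data, and the centered Gaussian kernel matrix is positive semidefinite):

```python
import numpy as np

def kernel_pca(points, p, sigma):
    """KPCA with a Gaussian kernel k(a, b) = exp(-||a-b||^2 / (2 sigma^2)):
    build the kernel matrix, center it, keep the leading eigenvectors."""
    sq = ((points[:, None] - points[None, :]) ** 2).sum(-1)
    k = np.exp(-sq / (2.0 * sigma ** 2))
    n = len(points)
    j = np.eye(n) - np.ones((n, n)) / n
    k_centered = j @ k @ j                    # plays the role of Phi
    eigvals, eigvecs = np.linalg.eigh(k_centered)
    order = np.argsort(eigvals)[::-1][:p]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

rng = np.random.default_rng(4)
pts = rng.normal(size=(30, 3))
emb = kernel_pca(pts, p=2, sigma=2.0)
```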

Page 58: Estimation of the Intrinsic Dimension

Experiments- kernel PCA


It aims at embedding the manifold into a space where an MDS-based projection would be more successful than in the initial space.

No guarantee is provided that this goal can be reached.

Page 59: Estimation of the Intrinsic Dimension

Experiments- kernel PCA (Cont.)


Tuning the parameters of the kernel is tedious

In other methods using an EVD, like metric MDS and Isomap, the variance remains concentrated within the first three eigenvalues, whereas KPCA spreads it out in most cases.

In order to concentrate the variance within a minimal number of eigenvalues, the width of the Gaussian kernel may be increased, but then the benefit of using a kernel is lost and KPCA tends to yield the same result as metric MDS: a linear projection.

Page 60: Estimation of the Intrinsic Dimension

Advantages and drawbacks


KPCA can deal with nonlinear manifolds. And actually, the theory hidden behind KPCA is a beautiful and powerful work of art.

KPCA is not used much in dimensionality reduction. The reasons are that the method is not motivated by geometrical arguments and that the geometrical interpretation of the various kernels (e.g., Gaussian kernels) remains difficult.

The main difficulty in KPCA, as highlighted in the example, is the choice of an appropriate kernel along with the right values for its parameters.

Page 61: Estimation of the Intrinsic Dimension

LLE


Page 62: Estimation of the Intrinsic Dimension

LLE Step 1

Suppose the data consist of N real-valued vectors y(i), each of dimensionality D, sampled from some underlying manifold.

We expect each data point and its neighbors to lie on or close to a locally linear patch of the manifold.

The idea of LLE is to replace each point y(i) with a linear combination of its neighbors.

Page 63: Estimation of the Intrinsic Dimension

LLE Step 2

Characterize the local geometry of these patches by linear coefficients that reconstruct each data point from its neighbors.

Reconstruction errors are measured by the cost function

Page 64: Estimation of the Intrinsic Dimension

LLE Step 2 (cont.)

Minimize the cost function subject to two constraints: each data point is reconstructed only from its neighbors (Wij = 0 if y(j) is not a neighbor of y(i)), and the rows of the weight matrix sum to one (Σj Wij = 1).

For any particular data point y(i), the weights are invariant to rotations, rescalings, and translations of that data point and its neighbors.

The reconstruction weights Wij therefore reflect intrinsic geometric properties of the data that are invariant to exactly such transformations.

Page 65: Estimation of the Intrinsic Dimension

LLE Step 3

Each high-dimensional observation y(i) is mapped to a low-dimensional vector representing global internal coordinates on the manifold.

This is done by choosing p-dimensional coordinate to minimize the embedding cost function

This cost function, like the previous one, is based on locally linear reconstruction errors, but here we fix the weights while optimizing the coordinates.

The low-dimensional coordinates are denoted x̂(i).
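The three steps can be condensed into a short sketch (the neighborhood size and regularization constant are illustrative choices):

```python
import numpy as np

def lle(points, n_neighbors, p, reg=1e-3):
    """Step 1: find neighbours. Step 2: solve for reconstruction
    weights W (rows sum to one). Step 3: embed with the bottom
    eigenvectors of M = (I - W)^T (I - W)."""
    n = len(points)
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    w = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d[i])[1:n_neighbors + 1]
        z = points[nbrs] - points[i]          # neighbours, shifted to y(i)
        c = z @ z.T                           # local Gram matrix
        c += reg * np.trace(c) * np.eye(len(nbrs))   # regularization
        wi = np.linalg.solve(c, np.ones(len(nbrs)))
        w[i, nbrs] = wi / wi.sum()            # enforce sum-to-one constraint
    m = (np.eye(n) - w).T @ (np.eye(n) - w)
    eigvals, eigvecs = np.linalg.eigh(m)
    # Discard the constant eigenvector (eigenvalue 0), keep the next p.
    return eigvecs[:, 1:p + 1]

# A gently curved 2D surface embedded in 3D.
rng = np.random.default_rng(5)
latent = rng.uniform(size=(200, 2))
surface = np.column_stack([latent, latent[:, 0] * latent[:, 1]])
emb = lle(surface, n_neighbors=8, p=2)
```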

Page 66: Estimation of the Intrinsic Dimension

Experiments

Once the parameters are correctly set, the embedding looks rather good: there are no tears, and the box is deformed smoothly, without superpositions.

The only problem for the open box is that at least one lateral face is completely crushed.

Page 67: Estimation of the Intrinsic Dimension

Any questions?