Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science...
![Page 1: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/1.jpg)
Similarities, Distances and Manifold Learning
Prof. Richard C. Wilson
Dept. of Computer Science, University of York
![Page 2: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/2.jpg)
Background
• Typically objects are characterised by features
– Face images
– SIFT features
– Object spectra
– ...
• If we measure n features → n-dimensional space
• The arena for our problem is an n-dimensional vector space
![Page 3: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/3.jpg)
Background
• Example: Eigenfaces
• Raw pixel values: n by m gives nm features
• Feature space is space of all n by m images
![Page 4: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/4.jpg)
Background
• The space of all face-like images is smaller than the space of all images
• Assumption is faces lie on a smaller manifold embedded in the global space
All images
Face images
![Page 5: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/5.jpg)
Manifold: A space which locally looks Euclidean
Manifold learning: Finding the manifold representing the objects we are interested in
All objects should be on the manifold, non-objects outside
![Page 6: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/6.jpg)
Part I: Euclidean Space
Position, Similarity and Distance
Manifold Learning in Euclidean space
Some famous techniques
Part II: Non-Euclidean Manifolds
Assessing Data
Nature and Properties of Manifolds
Data Manifolds
Learning some special types of manifolds
Part III: Advanced Techniques
Methods for intrinsically curved manifolds
Thanks to Edwin Hancock, Eliza Xu, Bob Duin for contributions
And support from the EU SIMBAD project
![Page 7: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/7.jpg)
Part I: Euclidean Space
![Page 8: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/8.jpg)
Position
The main arena for pattern recognition and machine learning problems is vector space
– A set of n well defined features collected into a vector in $\mathbb{R}^n$
Also defined are addition of vectors and multiplication by a scalar
Feature vector → position
![Page 9: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/9.jpg)
Similarity
To make meaningful progress, we need a notion of similarity
Inner product
• The inner-product $\langle \mathbf{x}, \mathbf{y} \rangle$ can be considered to be a similarity between x and y
$$\langle \mathbf{x}, \mathbf{y} \rangle = \sum_i x_i y_i$$
![Page 10: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/10.jpg)
Induced norm
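• The self-similarity $\langle \mathbf{x}, \mathbf{x} \rangle$ is the square of the ‘size’ of x and gives rise to the induced norm, or the length of x:
$$\|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}$$
• Finally, the length of x allows the definition of a distance in our vector space as the length of the vector joining x and y
$$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\| = \sqrt{\langle \mathbf{x} - \mathbf{y}, \mathbf{x} - \mathbf{y} \rangle}$$
• Inner product also gets us distance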
![Page 11: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/11.jpg)
Euclidean space
• If we have a vector space for features, and the usual inner product, all three are connected:
Position: $\mathbf{x}, \mathbf{y}$
Similarity: $\langle \mathbf{x}, \mathbf{y} \rangle$
Distance: $d(\mathbf{x}, \mathbf{y})$
![Page 12: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/12.jpg)
non-Euclidean Inner Product
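• If the inner-product has the form
$$\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T \mathbf{y} = \sum_i x_i y_i$$
• Then the vector space is Euclidean
• Note we recover all the expected stuff for Euclidean space, i.e.
$$\|\mathbf{x}\|^2 = x_1^2 + x_2^2 + \dots + x_n^2$$
$$d(\mathbf{x}, \mathbf{y})^2 = (x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2$$
• The inner-product doesn’t have to be like this; for example in Einstein’s special relativity, the inner-product of spacetime is
$$\langle \mathbf{x}, \mathbf{y} \rangle = x_1 y_1 + x_2 y_2 + x_3 y_3 - x_4 y_4$$
(taking $x_4$ as the time coordinate; the opposite overall sign convention is also in use)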
![Page 13: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/13.jpg)
The Golden Trio
• In Euclidean space, the concepts of position, similarity and distance are elegantly connected
[Diagram: Position X ↔ Similarity K ↔ Distance D]
![Page 14: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/14.jpg)
Point position matrix
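• In a normal manifold learning problem, we have a set of samples $X = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$
• These can be collected together in a matrix X
$$X = \begin{pmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \vdots \\ \mathbf{x}_m^T \end{pmatrix}$$
I use this convention (one sample per row), but others may write them vertically (one sample per column)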
![Page 15: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/15.jpg)
Centring
A common and important operation is centring – moving the mean to the origin
– Centred points behave better
$JX/m$ is the mean matrix, so $X - JX/m$ is the centred matrix
– J is the all-ones matrix
This can be done with C
$$C = I - J/m, \qquad CX = X - JX/m$$
– C is the centring matrix (and is symmetric, $C = C^T$)
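The centring operation can be sketched in a few lines of NumPy. This is only an illustration: the data below are made up, and the helper name `centring_matrix` is not from the slides.

```python
import numpy as np

def centring_matrix(m):
    """Return C = I - J/m, where J is the m x m all-ones matrix."""
    return np.eye(m) - np.ones((m, m)) / m

# Hypothetical data: m = 5 samples with n = 3 features, one sample per row.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3)) + 10.0

C = centring_matrix(X.shape[0])
X_centred = C @ X  # equivalent to subtracting the column means from X
```

In practice `X - X.mean(axis=0)` is cheaper; the explicit matrix C is mainly useful because it also appears in the similarity and distance formulas.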
![Page 16: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/16.jpg)
Position-Similarity
• The similarity matrix K is defined as
$$K_{ij} = \langle \mathbf{x}_i, \mathbf{x}_j \rangle$$
• From the definition of X, we simply get
$$K = XX^T$$
• The Gram matrix is the similarity matrix of the centred points (from the definition of X)
$$K_c = CXX^TC^T = CKC$$
– i.e. a centring operation on K
• $K_c$ is really a kernel matrix for the points (linear kernel)
[Diagram: Position X → Similarity K]
![Page 17: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/17.jpg)
Position-Similarity
• To go from K to X, we need to consider the eigendecomposition of K
$$K = U \Lambda U^T, \qquad K = XX^T$$
• As long as we can take the square root of Λ then we can find X as
$$X = U \Lambda^{1/2}$$
[Diagram: Similarity K → Position X]
![Page 18: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/18.jpg)
Kernel embedding
First manifold learning method – kernel embedding
Finds a Euclidean manifold from object similarities
• Embeds a kernel matrix into a set of points in Euclidean space (the points are automatically centred)
• K must have no negative eigenvalues, i.e. it is a kernel matrix (Mercer condition)
$$K = U \Lambda U^T, \qquad X = U \Lambda^{1/2}$$
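A minimal sketch of kernel embedding, assuming a symmetric input matrix. The function name, the tolerance and the example data are illustrative choices, not part of the slides.

```python
import numpy as np

def kernel_embedding(K, tol=1e-10):
    """Embed a PSD kernel matrix K as points X = U Lambda^(1/2), one point per row."""
    K = (K + K.T) / 2                      # guard against round-off asymmetry
    lam, U = np.linalg.eigh(K)             # eigenvalues in ascending order
    if lam.min() < -tol * max(1.0, np.abs(lam).max()):
        raise ValueError("K has negative eigenvalues, so it is not a kernel matrix")
    order = np.argsort(lam)[::-1]          # largest eigenvalue first
    lam, U = np.clip(lam[order], 0.0, None), U[:, order]
    return U * np.sqrt(lam), lam

# Hypothetical example: linear kernel of 6 centred points in 2-D.
rng = np.random.default_rng(1)
P = rng.normal(size=(6, 2))
P = P - P.mean(axis=0)
K = P @ P.T
X, lam = kernel_embedding(K)
```

Only the first two columns of `X` carry information here, because the kernel was built from 2-D points; the remaining eigenvalues are zero up to round-off.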
![Page 19: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/19.jpg)
Similarity-Distance
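[Diagram: Similarity K → Distance D]
$$d(\mathbf{x}_i, \mathbf{x}_j)^2 = \langle \mathbf{x}_i - \mathbf{x}_j, \mathbf{x}_i - \mathbf{x}_j \rangle = \langle \mathbf{x}_i, \mathbf{x}_i \rangle + \langle \mathbf{x}_j, \mathbf{x}_j \rangle - 2 \langle \mathbf{x}_i, \mathbf{x}_j \rangle$$
$$D_{s,ij} = K_{ii} + K_{jj} - 2K_{ij}$$
• We can easily determine $D_s$ (the matrix of squared distances) from K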
![Page 20: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/20.jpg)
Similarity-Distance
What about finding K from $D_s$?
$$D_{s,ij} = K_{ii} + K_{jj} - 2K_{ij}$$
Looking at the equation above, we might imagine that $K = -\frac{1}{2} D_s$ is a suitable choice
• Not centred; the relationship is actually
$$K = -\frac{1}{2} C D_s C$$
![Page 21: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/21.jpg)
Classic MDS
• Classic Multidimensional Scaling embeds a (squared) distance matrix into Euclidean space
• Using what we have so far, the algorithm is simple
1. Compute the kernel: $K = -\frac{1}{2} C D_s C$
2. Eigendecompose the kernel: $K = U \Lambda U^T$
3. Embed the kernel: $X = U \Lambda^{1/2}$
• This is MDS
[Diagram: Distance D → Position X]
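The three MDS steps can be sketched as follows. The function name `classical_mds` and the test points are assumptions made for illustration; negative eigenvalues, if any, are simply clipped to zero here.

```python
import numpy as np

def classical_mds(Ds, dim):
    """Classical MDS on a matrix of squared distances Ds, keeping `dim` dimensions."""
    m = Ds.shape[0]
    C = np.eye(m) - np.ones((m, m)) / m          # centring matrix
    K = -0.5 * C @ Ds @ C                        # step 1: compute the kernel
    lam, U = np.linalg.eigh((K + K.T) / 2)       # step 2: eigendecompose
    order = np.argsort(lam)[::-1][:dim]          # keep the largest eigenvalues
    lam, U = lam[order], U[:, order]
    return U * np.sqrt(np.clip(lam, 0.0, None))  # step 3: X = U Lambda^(1/2)

# Hypothetical check: squared distances between 7 points in the plane.
rng = np.random.default_rng(3)
P = rng.normal(size=(7, 2))
Ds = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)

X = classical_mds(Ds, dim=2)
Ds_rec = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
```

The recovered points agree with the originals only up to a rotation, reflection and translation, which is why the check compares distance matrices rather than coordinates.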
![Page 22: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/22.jpg)
The Golden Trio
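[Diagram: Position X ↔ Similarity K ↔ Distance D]
Similarity K → Position X: kernel embedding
Distance D → Position X: MDS
Similarity K → Distance D: $D_{s,ij} = K_{ii} + K_{jj} - 2K_{ij}$
Distance D → Similarity K: $K = -\frac{1}{2} C D_s C$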
![Page 23: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/23.jpg)
Kernel methods
• A kernel is a function k(i,j) which computes an inner-product
$$k(i,j) = \langle \mathbf{x}_i, \mathbf{x}_j \rangle$$
– But without needing to know the actual points (the space is implicit)
• Using a kernel function we can directly compute K without knowing X
[Diagram: the kernel function gives Similarity K directly, bypassing Position X; Distance D follows from K]
![Page 24: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/24.jpg)
Kernel methods
• The implied space may be very high dimensional, but a true kernel will always produce a positive semidefinite K and the implied space will be Euclidean
• Many (most?) PR algorithms can be kernelized
– Made to use K rather than X or D
• The trick is to note that any interesting vector should lie in the space spanned by the examples we are given
• Hence it can be written as a linear combination
$$\mathbf{u} = \alpha_1 \mathbf{x}_1 + \alpha_2 \mathbf{x}_2 + \dots + \alpha_m \mathbf{x}_m = X^T \boldsymbol{\alpha}$$
• Look for α instead of u
![Page 25: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/25.jpg)
Kernel PCA
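• What about PCA? PCA solves the following problem (over unit vectors u, where n here is the number of samples and the points are centred)
$$\mathbf{u}^* = \arg\max_{\mathbf{u}} \mathbf{u}^T \Sigma \mathbf{u} = \arg\max_{\mathbf{u}} \frac{1}{n} \mathbf{u}^T X^T X \mathbf{u}$$
• Let’s kernelize, writing $\mathbf{u} = X^T \boldsymbol{\alpha}$:
$$\frac{1}{n} \mathbf{u}^T X^T X \mathbf{u} = \frac{1}{n} (X^T \boldsymbol{\alpha})^T X^T X (X^T \boldsymbol{\alpha}) = \frac{1}{n} \boldsymbol{\alpha}^T X X^T X X^T \boldsymbol{\alpha} = \frac{1}{n} \boldsymbol{\alpha}^T K^2 \boldsymbol{\alpha}$$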
![Page 26: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/26.jpg)
Kernel PCA
• $K^2$ has the same eigenvectors as K, so the eigenvectors of PCA are the same as the eigenvectors of K
• The eigenvalues of PCA are related to the eigenvalues of K by
$$\lambda_{PCA} = \frac{1}{n} \lambda_K^2$$
• Kernel PCA is a kernel embedding with an externally provided kernel matrix
![Page 27: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/27.jpg)
Kernel PCA
• So kernel PCA gives the same solution as kernel embedding
– The eigenvalues are modified a bit
• They are essentially the same thing in Euclidean space
• MDS uses the kernel and kernel embedding
• MDS and PCA are essentially the same thing in Euclidean space
• Kernel embedding, MDS and PCA all give the same answer for a set of points in Euclidean space
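The claim that PCA and kernel embedding agree for points in Euclidean space can be checked numerically. The sketch below uses made-up data and compares the PCA projections with the kernel-embedding coordinates; the two can differ by a sign flip per axis, so absolute values are compared.

```python
import numpy as np

# Hypothetical data: 8 samples, 3 features with distinct variances.
rng = np.random.default_rng(4)
X = rng.normal(size=(8, 3)) * np.array([3.0, 2.0, 1.0])
Xc = X - X.mean(axis=0)                      # centred points
m = Xc.shape[0]

# Ordinary PCA: eigenvectors of the (3 x 3) covariance matrix.
Sigma = Xc.T @ Xc / m
evals, V = np.linalg.eigh(Sigma)
V = V[:, np.argsort(evals)[::-1]]
Y_pca = Xc @ V                               # projections onto principal axes

# Kernel embedding: eigenvectors of the (8 x 8) linear kernel matrix.
K = Xc @ Xc.T
lam, U = np.linalg.eigh(K)
order = np.argsort(lam)[::-1][:3]
Y_ker = U[:, order] * np.sqrt(lam[order])    # X = U Lambda^(1/2)

same_up_to_sign = np.allclose(np.abs(Y_pca), np.abs(Y_ker), atol=1e-8)
```

Which route is cheaper depends on whether the covariance matrix (features by features) or the kernel matrix (samples by samples) is the smaller of the two.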
![Page 28: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/28.jpg)
Some useful observations
• Your similarity matrix is Euclidean iff it has no negative eigenvalues (i.e. it is a kernel matrix and PSD)
• By similar reasoning, your distance matrix is Euclidean iff the similarity matrix derived from it is PSD
• If the feature space is small but the number of samples is large, then the covariance matrix is small and it is better to do normal PCA (on the covariance matrix)
• If the feature space is large and the number of samples is small, then the kernel matrix will be small and it is better to do kernel embedding
![Page 29: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/29.jpg)
Part II: Non-Euclidean Manifolds
![Page 30: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/30.jpg)
Non-linear data
• Much of the data in computer vision lies in a high-dimensional feature space but is constrained in some way
– The space of all images of a face is a subspace of the space of all possible images
– The subspace is highly non-linear but low dimensional (described by a few parameters)
![Page 31: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/31.jpg)
Non-linear data
• This cannot be exploited by the linear subspace methods like PCA
– These assume that the subspace is a Euclidean space as well
• A classic example is the ‘swiss roll’ data:
![Page 32: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/32.jpg)
‘Flat’ Manifolds
• Fundamentally different types of data, for example:
• The embedding of this data into the high-dimensional space is highly curved
– This is called extrinsic curvature, the curvature of the manifold with respect to the embedding space
• Now imagine that this manifold was a piece of paper; you could unroll the paper into a flat plane without distorting it
– No intrinsic curvature, in fact it is homeomorphic to Euclidean space
![Page 33: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/33.jpg)
• This manifold is different:
• It must be stretched to map it onto a plane
– It has non-zero intrinsic curvature
• A flatlander living on this manifold can tell that it is curved, for example by measuring the ratio of the radius to the circumference of a circle
• In the first case, we might still hope to find a Euclidean embedding
• We can never find a distortion-free Euclidean embedding of the second (in the sense that the distances will always have errors)
[Figure: curved manifold]
![Page 34: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/34.jpg)
Intrinsically Euclidean Manifolds
• We cannot use the previous methods on the second type of manifold, but there is still hope for the first
• The manifold is embedded in Euclidean space, but Euclidean distance is not the correct way to measure distance
• The Euclidean distance ‘shortcuts’ the manifold
• The geodesic distance calculates the shortest path along the manifold
![Page 35: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/35.jpg)
Geodesics
• The geodesic generalizes the concept of distance to curved manifolds
– The shortest path joining two points which lies completely within the manifold
• If we can correctly compute the geodesic distances, and the manifold is intrinsically flat, we should get Euclidean distances which we can plug into our Euclidean geometry machine
[Diagram: Geodesic Distances → Distance D ↔ Similarity K ↔ Position X]
![Page 36: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/36.jpg)
ISOMAP
• ISOMAP is exactly such an algorithm
• Approximate geodesic distances are computed for the points from a graph
• Nearest neighbours graph
– For neighbours, Euclidean distance ≈ geodesic distance
– For non-neighbours, geodesic distance approximated by shortest distance in graph
• Once we have distances D, we can use MDS to find a Euclidean embedding
![Page 37: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/37.jpg)
ISOMAP
• ISOMAP:
– Neighbourhood graph
– Shortest path algorithm
– MDS
• ISOMAP is distance-preserving – embedded distances should be close to geodesic distances
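The three ISOMAP stages can be sketched with NumPy alone. This is a small illustration rather than a reference implementation: it uses a k-nearest-neighbour graph and a dense Floyd–Warshall shortest-path step (cubic in the number of points), and the function name, parameters and arc data are all assumptions.

```python
import numpy as np

def isomap(X, n_neighbours, dim):
    """ISOMAP sketch: neighbourhood graph -> shortest paths -> classical MDS."""
    m = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

    # 1. Neighbourhood graph: keep Euclidean distances to the nearest neighbours.
    G = np.full((m, m), np.inf)
    np.fill_diagonal(G, 0.0)
    nn = np.argsort(D, axis=1)[:, 1:n_neighbours + 1]   # column 0 is the point itself
    for i in range(m):
        G[i, nn[i]] = D[i, nn[i]]
    G = np.minimum(G, G.T)                               # make the graph undirected

    # 2. Shortest paths (Floyd-Warshall) approximate the geodesic distances.
    for k in range(m):
        G = np.minimum(G, G[:, k, None] + G[None, k, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected")

    # 3. Classical MDS on the squared geodesic distances.
    C = np.eye(m) - np.ones((m, m)) / m
    K = -0.5 * C @ (G ** 2) @ C
    lam, U = np.linalg.eigh((K + K.T) / 2)
    order = np.argsort(lam)[::-1][:dim]
    return U[:, order] * np.sqrt(np.clip(lam[order], 0.0, None)), G

# Hypothetical data: 40 points along three quarters of a unit circle (a 1-D manifold).
t = np.linspace(0.0, 1.5 * np.pi, 40)
X = np.column_stack([np.cos(t), np.sin(t)])
Y, G = isomap(X, n_neighbours=2, dim=1)
chord = np.linalg.norm(X[0] - X[-1])   # Euclidean 'shortcut' between the two ends
```

On this arc the graph distance between the end points is close to the arc length, while `chord` is much shorter, which is the shortcut effect described earlier.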
![Page 38: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/38.jpg)
Laplacian Eigenmap
• The Laplacian Eigenmap is another graph-based method of embedding non-linear manifolds into Euclidean space
• As with ISOMAP, form a neighbourhood graph for the datapoints
• Find the graph Laplacian as follows
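• The adjacency matrix A is
$$A_{ij} = \begin{cases} e^{-d_{ij}^2 / t} & \text{if } i \text{ and } j \text{ are connected} \\ 0 & \text{otherwise} \end{cases}$$
• The ‘degree’ matrix D is the diagonal matrix
$$D_{ii} = \sum_j A_{ij}$$
• The normalized graph Laplacian is
$$L = I - D^{-1/2} A D^{-1/2}$$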
![Page 39: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/39.jpg)
Laplacian Eigenmap
• We find the Laplacian eigenmap embedding using the eigendecomposition of L
$$L = U \Lambda U^T$$
• The embedded positions are
$$X = D^{-1/2} U$$
• Similar to ISOMAP
– Structure preserving not distance preserving
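A sketch of the Laplacian eigenmap under some assumptions the slides leave open: the graph is a symmetrised k-nearest-neighbour graph, and the embedding keeps the eigenvectors with the smallest non-zero eigenvalues, skipping the trivial one with eigenvalue 0. The function name and the arc data are illustrative.

```python
import numpy as np

def laplacian_eigenmap(X, n_neighbours, t, dim):
    """Laplacian eigenmap sketch using the normalized graph Laplacian."""
    m = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)   # squared distances

    # Symmetric k-nearest-neighbour graph (assumed construction).
    nn = np.argsort(d2, axis=1)[:, 1:n_neighbours + 1]
    connected = np.zeros((m, m), dtype=bool)
    connected[np.repeat(np.arange(m), n_neighbours), nn.ravel()] = True
    connected |= connected.T

    A = np.where(connected, np.exp(-d2 / t), 0.0)   # heat-kernel adjacency
    deg = A.sum(axis=1)                             # diagonal of the degree matrix
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(m) - D_inv_sqrt @ A @ D_inv_sqrt     # normalized Laplacian

    lam, U = np.linalg.eigh(L)                      # ascending eigenvalues
    # Assumption: drop the first eigenvector (eigenvalue ~ 0), keep the next `dim`.
    return D_inv_sqrt @ U[:, 1:dim + 1], lam

# Hypothetical data: 40 points along an arc of the unit circle.
t_param = np.linspace(0.0, 1.5 * np.pi, 40)
X = np.column_stack([np.cos(t_param), np.sin(t_param)])
Y, lam = laplacian_eigenmap(X, n_neighbours=2, t=1.0, dim=1)
```

The embedded coordinate orders the points along the arc but does not reproduce the arc-length distances, which matches the "structure preserving, not distance preserving" remark.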
![Page 40: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/40.jpg)
Locally-Linear Embedding
• Locally-linear Embedding is another classic method which also begins with a neighbourhood graph
• We make point i (in the original data) from a weighted sum of the neighbouring points
$$\hat{\mathbf{x}}_i = \sum_j W_{ij} \mathbf{x}_j$$
• $W_{ij}$ is 0 for any point j not in the neighbourhood (and for i=j)
• We find the weights by minimising the reconstruction error
$$\min \sum_i |\mathbf{x}_i - \hat{\mathbf{x}}_i|^2$$
– Subject to the constraints that the weights are non-negative and sum to 1
$$W_{ij} \ge 0, \qquad \sum_j W_{ij} = 1$$
• Gives a relatively simple closed-form solution
![Page 41: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/41.jpg)
Locally-Linear Embedding
• These weights encode how well a point j represents a point i and can be interpreted as the adjacency between i and j
• A low dimensional embedding is found by then finding points to minimise the error
$$\min \sum_i |\mathbf{y}_i - \hat{\mathbf{y}}_i|^2, \qquad \hat{\mathbf{y}}_i = \sum_j W_{ij} \mathbf{y}_j$$
• In other words, we find a low-dimensional embedding which preserves the adjacency relationships
• The solution to this embedding problem turns out to be simply the eigenvectors of the matrix M
$$M = (I - W)^T (I - W)$$
• LLE is scale-free: the final points have the covariance matrix I
– Unit scale
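A compact LLE sketch follows. It departs from the slide in one labelled respect: the weights are obtained from the usual regularised closed-form solution with only the sum-to-one constraint, so non-negativity is not enforced. The function name, the regularisation constant and the choice of the smallest non-trivial eigenvectors of M are assumptions.

```python
import numpy as np

def lle(X, n_neighbours, dim, reg=1e-3):
    """LLE sketch: reconstruction weights W, then bottom eigenvectors of M."""
    m = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, 1:n_neighbours + 1]

    W = np.zeros((m, m))
    for i in range(m):
        Z = X[nn[i]] - X[i]                       # neighbours relative to point i
        G = Z @ Z.T                               # local Gram matrix
        G = G + reg * np.trace(G) * np.eye(n_neighbours)   # regularise (assumed)
        w = np.linalg.solve(G, np.ones(n_neighbours))
        W[i, nn[i]] = w / w.sum()                 # enforce sum-to-one only

    I = np.eye(m)
    M = (I - W).T @ (I - W)
    lam, U = np.linalg.eigh(M)                    # ascending eigenvalues
    # Assumption: skip the bottom eigenvector and keep the next `dim`,
    # scaled by sqrt(m) so that Y^T Y / m is the identity (unit scale).
    return U[:, 1:dim + 1] * np.sqrt(m), W

# Hypothetical data: 40 points along an arc of the unit circle.
t = np.linspace(0.0, 1.5 * np.pi, 40)
X = np.column_stack([np.cos(t), np.sin(t)])
Y, W = lle(X, n_neighbours=4, dim=1)
```

Because of the unit-scale normalisation, the output carries no information about the original distance scale, only about the local neighbourhood structure.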
![Page 42: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/42.jpg)
Comparison
• LLE might seem like quite a different process to the previous two, but it is actually very similar
• We can interpret the process as producing a kernel matrix followed by scale-free kernel embedding
$$K = (k - 1) I - \frac{k}{n} J + W + W^T - W^T W$$
$$K = U \Lambda U^T, \qquad X = U$$

| | ISOMAP | Lap. Eigenmap | LLE |
|---|---|---|---|
| Representation | Neighbourhood graph | Neighbourhood graph | Neighbourhood graph |
| Similarity matrix | From geodesic distances | Graph Laplacian | Reconstruction weights |
| Embedding | $X = U \Lambda^{1/2}$ | $X = D^{-1/2} U$ | $X = U$ |
![Page 43: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/43.jpg)
Comparison
• ISOMAP is the only method which directly computes and uses the geodesic distances
– The other two depend indirectly on the distances through local structure
• LLE is scale-free, so the original distance scale is lost, but the local structure is preserved
• Computing the necessary local dimensionality to find the correct nearest neighbours is a problem for all such methods
![Page 45: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/45.jpg)
Non-Euclidean data
• Data is Euclidean iff K is psd
• Unless you are using a kernel function, this is often not true
• Why does this happen?
![Page 46: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/46.jpg)
What type of data do I have?
• Starting point: distance matrix
• However we do not know a priori if our measurements are representable on a manifold
– We will call them dissimilarities
• Our starting point to answer the question “What type of data do I have?” will be a matrix of dissimilarities D between objects
• Types of dissimilarities
– Euclidean (no intrinsic curvature)
– Non-Euclidean, metric (curved manifold)
– Non-metric (no point-like manifold representation)
![Page 47: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/47.jpg)
Causes
• Example: Chicken pieces data
• Distance by alignment
• Global alignment of everything could find Euclidean distances
• Only local alignments are practical
![Page 48: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/48.jpg)
Causes
Dissimilarities may also be non-metric
The data is metric if it obeys the metric conditions
1. $D_{ij} \ge 0$ (non-negativity)
2. $D_{ij} = 0$ iff i=j (identity of indiscernibles)
3. $D_{ij} = D_{ji}$ (symmetry)
4. $D_{ij} \le D_{ik} + D_{kj}$ (triangle inequality)
Reasonable dissimilarities should meet 1 & 2
![Page 49: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/49.jpg)
Causes
• Symmetry Dij= Dji
• May not be symmetric by definition
• Alignment: i→j may find a better solution than j→i
![Page 50: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/50.jpg)
Causes
• Triangle violations of $D_{ij} \le D_{ik} + D_{kj}$
• ‘Extended objects’
[Figure: objects i, j and k, with $D_{ik} = 0$, $D_{kj} = 0$, $D_{ij} > 0$]
• Finally, noise in the measure of D can cause all of these effects
![Page 51: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/51.jpg)
Tests(1)
• Find the similarity matrix
$$K = -\frac{1}{2} C D_s C$$
• The data is Euclidean iff K is positive semidefinite (no negative eigenvalues)
– K is a kernel, explicit embedding from kernel embedding
• We can then use K in a kernel algorithm
![Page 52: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/52.jpg)
Tests(2)
• Negative eigenfraction (NEF)
$$\mathrm{NEF} = \frac{\sum_{\lambda_i < 0} |\lambda_i|}{\sum_i |\lambda_i|}$$
• Between 0 and 0.5
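A small sketch of the negative eigenfraction, assuming the input is a symmetric matrix of (unsquared) dissimilarities. The function name and the two test cases are illustrative: points in the plane, and four equally spaced points on a circle measured by arc length.

```python
import numpy as np

def negative_eigenfraction(D):
    """NEF of a symmetric dissimilarity matrix D (distances, not squared)."""
    m = D.shape[0]
    C = np.eye(m) - np.ones((m, m)) / m
    K = -0.5 * C @ (D ** 2) @ C                    # similarity matrix from D_s
    lam = np.linalg.eigvalsh((K + K.T) / 2)
    return np.abs(lam[lam < 0]).sum() / np.abs(lam).sum()

# Euclidean case: distances between hypothetical points in the plane.
rng = np.random.default_rng(5)
P = rng.normal(size=(6, 2))
D_euclid = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))
nef_euclid = negative_eigenfraction(D_euclid)      # zero up to round-off

# Non-Euclidean case: arc-length distances between 4 points on a circle.
D_circle = np.array([[0.0, 1.0, 2.0, 1.0],
                     [1.0, 0.0, 1.0, 2.0],
                     [2.0, 1.0, 0.0, 1.0],
                     [1.0, 2.0, 1.0, 0.0]])
nef_circle = negative_eigenfraction(D_circle)      # eigenvalues of K are 2, 2, 0, -1
```

For the circle example the kernel has eigenvalues 2, 2, 0 and −1, so the NEF is 1/5: the arc-length distances are metric but not Euclidean.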
![Page 53: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/53.jpg)
Tests(3)
1. $D_{ij} \ge 0$ (non-negativity)
2. $D_{ij} = 0$ iff i=j (identity of indiscernibles)
3. $D_{ij} = D_{ji}$ (symmetry)
4. $D_{ij} \le D_{ik} + D_{kj}$ (triangle inequality)
– Check these for your data (the 4th involves checking all triples)
– Metric data is embeddable on a (curved) Riemannian manifold
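The four conditions can be checked directly on a dissimilarity matrix. The sketch below is one possible way to do it; the function name, the tolerance and the example matrices are assumptions. The triangle check builds an m × m × m array, so it is only practical for modest m.

```python
import numpy as np

def check_metric(D, tol=1e-12):
    """Report which of the four metric conditions hold for a square matrix D."""
    off_diag = ~np.eye(D.shape[0], dtype=bool)
    return {
        "non_negative": bool((D >= -tol).all()),
        "identity": bool(np.allclose(np.diag(D), 0.0, atol=tol)
                         and (D[off_diag] > tol).all()),
        "symmetric": bool(np.allclose(D, D.T, atol=tol)),
        # D[i, j] <= D[i, k] + D[k, j] for every triple (i, k, j)
        "triangle": bool((D[:, None, :] <= D[:, :, None] + D[None, :, :] + tol).all()),
    }

# Arc-length distances between 4 points on a circle: metric.
D_circle = np.array([[0.0, 1.0, 2.0, 1.0],
                     [1.0, 0.0, 1.0, 2.0],
                     [2.0, 1.0, 0.0, 1.0],
                     [1.0, 2.0, 1.0, 0.0]])

# Hypothetical 'extended object' case: object 2 is close to both 0 and 1,
# but 0 and 1 are far apart, which violates the triangle inequality.
D_extended = np.array([[0.0, 5.0, 0.1],
                       [5.0, 0.0, 0.1],
                       [0.1, 0.1, 0.0]])

report_circle = check_metric(D_circle)
report_extended = check_metric(D_extended)
```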
![Page 54: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/54.jpg)
Corrections
• If the data is non-metric or non-Euclidean, we can ‘correct it’
• Symmetry violations
– Average: $D_{ij} = D_{ji} \leftarrow \frac{1}{2}(D_{ij} + D_{ji})$
– For min-cost distances, the minimum may be more appropriate: $D_{ij} = D_{ji} \leftarrow \min(D_{ij}, D_{ji})$
• Triangle violations
– Constant offset: $D_{ij} \leftarrow D_{ij} + c \quad (i \ne j)$
– This will also remove non-Euclidean behaviour for large enough c
• Euclidean violations
– Discard negative eigenvalues
• There are many other approaches*
* “On Euclidean corrections for non-Euclidean dissimilarities”, Duin, Pekalska, Harol, Lee and Bunke, S+SSPR 08
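The corrections listed above can be sketched as small helpers. All names are hypothetical. The smallest offset that repairs every triangle violation is derived here, not taken from the slides: adding c to all three sides turns $D_{ij} \le D_{ik} + D_{kj}$ into $c \ge D_{ij} - D_{ik} - D_{kj}$.

```python
import numpy as np

def symmetrise(D, how="average"):
    """Fix symmetry violations by averaging or by taking the minimum."""
    return (D + D.T) / 2 if how == "average" else np.minimum(D, D.T)

def smallest_triangle_offset(D):
    """Smallest c >= 0 such that D_ij + c satisfies the triangle inequality."""
    return float((D[:, None, :] - D[:, :, None] - D[None, :, :]).max())

def add_offset(D, c):
    """Add the constant c to every off-diagonal entry."""
    return D + c * (1.0 - np.eye(D.shape[0]))

def discard_negative_eigenvalues(K):
    """Fix Euclidean violations by zeroing the negative eigenvalues of K."""
    lam, U = np.linalg.eigh((K + K.T) / 2)
    return (U * np.clip(lam, 0.0, None)) @ U.T

# Hypothetical examples.
D_asym = np.array([[0.0, 1.0], [3.0, 0.0]])
D_avg = symmetrise(D_asym)                  # off-diagonal entries become 2
D_min = symmetrise(D_asym, how="min")       # off-diagonal entries become 1

D_extended = np.array([[0.0, 5.0, 0.1],
                       [5.0, 0.0, 0.1],
                       [0.1, 0.1, 0.0]])
c = smallest_triangle_offset(D_extended)    # 5 - 0.1 - 0.1
D_fixed = add_offset(D_extended, c)

K_indef = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues 3 and -1
K_psd = discard_negative_eigenvalues(K_indef)
```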
![Page 55: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/55.jpg)
Part III: Advanced techniques for non-Euclidean Embeddings
![Page 56: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/56.jpg)
Known Manifolds
• Sometimes we have data which lies on a known but non-Euclidean manifold
• Examples in Computer Vision
– Surface normals
– Rotation matrices
– Flow tensors (DT-MRI)
• This is not Manifold Learning, as we already know what the manifold is
• What tools do we need to be able to process data like this?
– As before, distances are the key
![Page 57: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/57.jpg)
Example: 2D direction
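Direction of an edge in an image, encoded as a unit vector
[Figure: two unit direction vectors $\mathbf{x}_1$ and $\mathbf{x}_2$ and their average $\bar{\mathbf{x}}$]
$$\bar{\mathbf{x}} = \frac{1}{n} \sum_i \mathbf{x}_i$$
The average of the direction vectors isn’t even a direction vector (not unit length), let alone the correct ‘average’ direction
The normal definition of mean is not correct
– Because the manifold is curved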
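A two-line numerical illustration of the problem, using made-up directions at 10° and 170°: their ordinary average is far from unit length.

```python
import numpy as np

# Two hypothetical edge directions, encoded as unit vectors.
theta = np.deg2rad([10.0, 170.0])
x = np.column_stack([np.cos(theta), np.sin(theta)])

mean = x.mean(axis=0)                 # ordinary (Euclidean) average
mean_length = np.linalg.norm(mean)    # equals sin(10 degrees), well below 1
```

The averaged vector happens to point along the vertical axis here, but its length is only sin 10°, so it is not a point on the manifold of unit directions.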
![Page 58: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/58.jpg)
Tangent space
• The tangent space ($T_P$) is the Euclidean space which is parallel to the manifold (M) at a particular point (P)
• The tangent space is a very useful tool because it is Euclidean
[Figure: manifold M with tangent space $T_P$ touching it at the point P]
![Page 59: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/59.jpg)
Exponential Map
• Exponential map:
$$\mathrm{Exp}_P : T_P \to M, \qquad A = \mathrm{Exp}_P X$$
• $\mathrm{Exp}_P$ maps a point X on the tangent plane onto a point A on the manifold
– P is the centre of the mapping and is at the origin on the tangent space
– The mapping is one-to-one in a local region of P
– The most important property of the mapping is that the distances to the centre P are preserved
$$d_{T_P}(X, P) = d_M(A, P)$$
– The geodesic distance on the manifold equals the Euclidean distance on the tangent plane (for distances to the centre only)
![Page 60: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/60.jpg)
Exponential map
• The log map goes the other way, from manifold to tangent plane
$$\mathrm{Log}_P : M \to T_P, \qquad X = \mathrm{Log}_P A$$
![Page 61: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/61.jpg)
Exponential Map
• Example on the circle: Embed the circle in the complex plane
• A point on the circle is a complex number with magnitude 1 and can be written $x + iy = \exp(i\theta)$
[Figure: unit circle in the complex plane (axes Re, Im) with the point $P = e^{i\theta_P}$]
![Page 62: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/62.jpg)
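• In this case it turns out that the map is related to the normal exp and log functions
[Figure: circle M with tangent line $T_P$ at $P = e^{i\theta_P}$, a second point $A = e^{i\theta_A}$, and its image X on the tangent line]
$$X = \mathrm{Log}_P A = -i \log \frac{A}{P} = -i \log \frac{e^{i\theta_A}}{e^{i\theta_P}} = \theta_A - \theta_P$$
$$A = \mathrm{Exp}_P X = P \exp(iX) = \exp(i\theta_P) \exp\big(i(\theta_A - \theta_P)\big) = \exp(i\theta_A)$$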
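The circle maps translate almost directly into Python's `cmath`. The function names `log_map` and `exp_map` and the example angles are illustrative; the complex logarithm returns the principal value, so the tangent coordinate lies in (−π, π].

```python
import cmath

def log_map(P, A):
    """X = -i log(A / P): signed angle from P to A on the unit circle."""
    return (-1j * cmath.log(A / P)).real

def exp_map(P, X):
    """A = P exp(iX): move an angle X around the circle starting from P."""
    return P * cmath.exp(1j * X)

# Hypothetical points at angles 0.3 and 1.1 radians.
P = cmath.exp(1j * 0.3)
A = cmath.exp(1j * 1.1)

X = log_map(P, A)          # tangent coordinate, theta_A - theta_P
A_back = exp_map(P, X)     # mapping back recovers A
```

The magnitude of `X` is the arc length between P and A, which is the distance-preserving property of the exponential map for distances to the centre.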
![Page 63: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/63.jpg)
Intrinsic mean
• The mean of a set of samples is usually defined as the sum of the samples divided by the number
– This is only true in Euclidean space
• A more general formula
$$\bar{\mathbf{x}} = \arg\min_{\mathbf{x}} \sum_i d_g^2(\mathbf{x}, \mathbf{x}_i)$$
• Minimises the sum of squared (geodesic) distances from the mean to the samples (equivalent to the usual mean in Euclidean space)
![Page 64: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/64.jpg)
Intrinsic mean
• We can compute this intrinsic mean using the exponential map
• If we knew what the mean was, we could use it as the centre of the map
• From the properties of the Exp-map, the distances are the same
• So the mean on the tangent plane is equal to the mean on the manifold
$$X_i = \mathrm{Log}_M A_i, \qquad d_e(X_i, M) = d_g(A_i, M)$$
![Page 65: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/65.jpg)
Intrinsic mean
• Start with a guess at the mean and move towards correct answer
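• This gives us the following algorithm
– Guess at a mean $M_0$
1. Map the samples on to the tangent plane at the current estimate $M_k$
2. Compute the mean on the tangent plane and map it back to get the new estimate $M_{k+1}$
– Repeat steps 1 and 2 until the estimate stops changing
$$M_{k+1} = \mathrm{Exp}_{M_k}\left(\frac{1}{n}\sum_i \mathrm{Log}_{M_k} A_i\right)$$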
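The iteration above can be sketched on the unit circle, reusing the circle Exp and Log maps from the earlier example. This is only an illustration under stated assumptions: the function names, the sample angles, the iteration cap and the tolerance are all hypothetical choices, not part of the slides.

```python
import cmath

def log_map(p, a):
    """Log_P A = -i log(A / P): signed angle from P to A (tangent coordinate)."""
    return (-1j * cmath.log(a / p)).real

def exp_map(p, x):
    """Exp_P X = P exp(iX): move from P around the circle by angle X."""
    return p * cmath.exp(1j * x)

def intrinsic_mean(points, m0, iters=50, tol=1e-12):
    """Iterate M_{k+1} = Exp_{M_k}( (1/n) sum_i Log_{M_k} A_i )."""
    m = m0
    for _ in range(iters):
        # Mean of the samples in the tangent plane at the current estimate.
        step = sum(log_map(m, a) for a in points) / len(points)
        m = exp_map(m, step)
        if abs(step) < tol:  # assumed stopping rule
            break
    return m

# Hypothetical samples, given by their angles on the circle.
angles = [0.1, 0.4, 1.0]
points = [cmath.exp(1j * t) for t in angles]
mean = intrinsic_mean(points, m0=points[0])
mean_angle = cmath.phase(mean)
```

For samples clustered on a short arc like these, the result coincides with the ordinary average of the angles. Samples spread widely around the circle may not converge to a unique answer, in line with the caveat that convergence is not always guaranteed.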
![Page 66: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/66.jpg)
Intrinsic Mean
• For many manifolds, this procedure will converge to the intrinsic mean
– Convergence is not always guaranteed
• Other statistics and probability distributions on manifolds are problematic
– Can hypothesise a normal distribution on the tangent plane, but distortions are inevitable
![Page 67: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/67.jpg)
Some useful manifolds and maps
• Some useful manifolds and exponential maps
• Directional vectors (surface normals etc.)
• a, p unit vectors, x lies in an (n-1)D space
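$$\langle \mathbf{a}, \mathbf{a} \rangle = 1$$
$$\mathbf{x} = \frac{\theta}{\sin\theta}\left(\mathbf{a} - \mathbf{p}\cos\theta\right) \quad \text{(Log map)}$$
$$\mathbf{a} = \mathbf{p}\cos\theta + \frac{\sin\theta}{\theta}\,\mathbf{x} \quad \text{(Exp map)}$$
• Here $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{p}$, so $\cos\theta = \langle \mathbf{a}, \mathbf{p} \rangle$ and $\theta = \|\mathbf{x}\|$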
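A minimal sketch of the unit-vector Exp and Log maps, written with plain Python lists. The function names and the small-angle cut-off are assumptions made for illustration; the Log map is not defined for antipodal points, where $\sin\theta = 0$ with $\theta = \pi$.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sphere_log(p, a):
    """Log map: tangent vector x at p with |x| equal to the angle between a and p."""
    c = max(-1.0, min(1.0, dot(a, p)))  # clamp against rounding error
    theta = math.acos(c)
    if theta < 1e-12:                   # a coincides with p (assumed cut-off)
        return [0.0] * len(p)
    k = theta / math.sin(theta)         # fails for antipodal points
    return [k * (ai - pi * c) for ai, pi in zip(a, p)]

def sphere_exp(p, x):
    """Exp map: unit vector reached from p by moving along tangent vector x."""
    theta = math.sqrt(dot(x, x))
    if theta < 1e-12:
        return list(p)
    k = math.sin(theta) / theta
    return [pi * math.cos(theta) + k * xi for pi, xi in zip(p, x)]

# Round trip: north pole to a point on the equator and back.
p = [0.0, 0.0, 1.0]
a = [1.0, 0.0, 0.0]
x = sphere_log(p, a)
a_back = sphere_exp(p, x)
```

The tangent vector `x` is orthogonal to `p`, which is why it effectively lives in an (n-1)-dimensional space.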
![Page 68: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/68.jpg)
Some useful manifolds and maps
• Symmetric positive definite matrices (covariance, flow tensors etc)
• A is symmetric positive definite, X is just symmetric
• log is the matrix log defined as a generalized matrix function
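$$\mathbf{u}^T A \mathbf{u} > 0 \quad \forall\, \mathbf{u} \neq 0$$
$$X = P^{1/2}\log\left(P^{-1/2} A P^{-1/2}\right)P^{1/2} \quad \text{(Log map)}$$
$$A = P^{1/2}\exp\left(P^{-1/2} X P^{-1/2}\right)P^{1/2} \quad \text{(Exp map)}$$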
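A sketch of the symmetric positive definite Exp and Log maps using NumPy. It assumes the matrix functions can be evaluated through an eigendecomposition, which holds for symmetric matrices; the helper name `sym_fun` and the test matrices are hypothetical.

```python
import numpy as np

def sym_fun(S, f):
    """Apply a scalar function f to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T

def _roots(P):
    """Return P^{1/2} and P^{-1/2} for a symmetric positive definite P."""
    return sym_fun(P, np.sqrt), sym_fun(P, lambda w: 1.0 / np.sqrt(w))

def spd_log(P, A):
    """Log map: X = P^{1/2} log(P^{-1/2} A P^{-1/2}) P^{1/2}."""
    Ph, Pih = _roots(P)
    M = Pih @ A @ Pih
    M = 0.5 * (M + M.T)  # remove rounding asymmetry before eigh
    return Ph @ sym_fun(M, np.log) @ Ph

def spd_exp(P, X):
    """Exp map: A = P^{1/2} exp(P^{-1/2} X P^{-1/2}) P^{1/2}."""
    Ph, Pih = _roots(P)
    M = Pih @ X @ Pih
    M = 0.5 * (M + M.T)
    return Ph @ sym_fun(M, np.exp) @ Ph

# Hypothetical 2x2 example: map A to the tangent plane at P and back.
P = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 0.2], [0.2, 3.0]])
X = spd_log(P, A)
A_back = spd_exp(P, X)
```

`X` comes out symmetric but not necessarily positive definite, matching the statement that the tangent vectors are "just symmetric".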
![Page 69: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/69.jpg)
Some useful manifolds and maps
• Orthogonal matrices (rotation matrices, eigenvector matrices)
• A orthogonal, X antisymmetric ($X + X^T = 0$)
• These are the matrix exp and log functions as before
• In fact there are multiple solutions to the matrix log
– Only one is the required real antisymmetric matrix; not easy to find
– Rest are complex
$$A A^T = I$$
$$X = \log\left(P^T A\right) \quad \text{(Log map)}$$
$$A = P\exp(X) \quad \text{(Exp map)}$$
![Page 70: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/70.jpg)
![Page 71: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/71.jpg)
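Embedding on $S^n$
• On $S^2$ (surface of a sphere in 3D) the following parameterisation is well known
$$\mathbf{x} = (r\sin\theta\cos\phi,\; r\sin\theta\sin\phi,\; r\cos\theta)^T$$
• The distance between two points (the length of the geodesic) is
$$d_{xy} = r\cos^{-1}\left[\sin\theta_x\sin\theta_y\cos(\phi_x - \phi_y) + \cos\theta_x\cos\theta_y\right]$$
[Figure: sphere with points x and y joined by a geodesic of length $d_{xy}$]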
![Page 72: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/72.jpg)
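More Spherical Geometry
• But on a sphere, the distance is the highlighted arc-length
– Much neater to use inner-product
– And works in any number of dimensions
[Figure: points x and y on the sphere separated by the angle $\theta_{xy}$, with the arc of length $r\theta_{xy}$ highlighted]
$$\langle \mathbf{x}, \mathbf{y} \rangle = r^2\cos\theta_{xy}$$
$$d_{xy} = r\theta_{xy} = r\cos^{-1}\frac{\langle \mathbf{x}, \mathbf{y} \rangle}{r^2}$$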
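As a quick check that the two ways of computing the geodesic distance agree, the sketch below evaluates both the angle-based formula on $S^2$ and the inner-product formula for the same pair of points. The function names and the sample angles are hypothetical.

```python
import math

def clamp(c):
    """Keep a cosine inside [-1, 1] despite rounding error."""
    return max(-1.0, min(1.0, c))

def to_cartesian(r, theta, phi):
    """Standard parameterisation of S^2 with polar angle theta and azimuth phi."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def dist_from_angles(r, theta_x, phi_x, theta_y, phi_y):
    """Geodesic length from the spherical-coordinate formula (S^2 only)."""
    c = (math.sin(theta_x) * math.sin(theta_y) * math.cos(phi_x - phi_y)
         + math.cos(theta_x) * math.cos(theta_y))
    return r * math.acos(clamp(c))

def dist_from_inner_product(x, y, r):
    """Geodesic length d = r * arccos(<x, y> / r^2); works in any dimension."""
    inner = sum(xi * yi for xi, yi in zip(x, y))
    return r * math.acos(clamp(inner / r ** 2))

r = 2.0
x_angles = (0.3, 1.1)    # (theta, phi) of x, arbitrary example values
y_angles = (1.2, -0.4)   # (theta, phi) of y
d_angles = dist_from_angles(r, *x_angles, *y_angles)
d_inner = dist_from_inner_product(to_cartesian(r, *x_angles),
                                  to_cartesian(r, *y_angles), r)
```

The inner-product version needs no coordinate chart, which is what makes it usable on hyperspheres of any dimension.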
![Page 73: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/73.jpg)
Spherical Embedding
• Say we had the distances between some objects (dij), measured on the surface of a [hyper]sphere of dimension n
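• The sphere (and objects) can be embedded into an n+1 dimensional space
– Let X be the matrix of point positions
• $Z = XX^T$ is a kernel matrix
• But $Z_{ij} = \langle \mathbf{x}_i, \mathbf{x}_j \rangle$
• And $d_{xy} = r\cos^{-1}\left(\langle \mathbf{x}, \mathbf{y} \rangle / r^2\right)$, so
$$Z_{ij} = \langle \mathbf{x}_i, \mathbf{x}_j \rangle = r^2\cos\frac{d_{ij}}{r}$$
• We can compute Z from D and find the spherical embedding!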
![Page 74: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/74.jpg)
Spherical Embedding
• But wait, we don’t know what r is!
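• The distances D are non-Euclidean, and if we use the wrong radius, Z is not a kernel matrix
– Negative eigenvalues
• Use this to find the radius
– Choose r to minimise the negative eigenvalues
$$r^* = \arg\min_r \left|\lambda_0[Z(r)]\right|$$
– Here $\lambda_0$ is taken to be the most negative eigenvalue of $Z(r)$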
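The radius search and the embedding can be sketched as follows, assuming the criterion is the magnitude of the smallest eigenvalue of $Z(r) = r^2\cos(D/r)$ and that a coarse grid of candidate radii is acceptable. The function names, the synthetic data and the candidate grid are hypothetical; a practical version would use a proper one-dimensional optimiser.

```python
import numpy as np

def kernel_from_distances(D, r):
    """Z_ij = r^2 cos(d_ij / r) for geodesic distances D on a sphere of radius r."""
    return r ** 2 * np.cos(D / r)

def smallest_eigenvalue(D, r):
    # eigvalsh returns eigenvalues in ascending order
    return np.linalg.eigvalsh(kernel_from_distances(D, r))[0]

def estimate_radius(D, candidates):
    """Pick the candidate radius whose Z(r) is closest to having no negative eigenvalue."""
    scores = [abs(smallest_eigenvalue(D, r)) for r in candidates]
    return candidates[int(np.argmin(scores))]

def embed(D, r, dim):
    """Point positions from the top `dim` eigenpairs of Z (kernel embedding)."""
    w, V = np.linalg.eigh(kernel_from_distances(D, r))
    idx = np.argsort(w)[::-1][:dim]
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))

# Synthetic data: 10 points on a sphere of radius 2 in 3D.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(10, 3))
X_true = 2.0 * X_true / np.linalg.norm(X_true, axis=1, keepdims=True)
D = 2.0 * np.arccos(np.clip(X_true @ X_true.T / 4.0, -1.0, 1.0))

candidates = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
r_star = estimate_radius(D, candidates)
Y = embed(D, r_star, dim=3)
```

The recovered positions `Y` match the original points only up to a rotation or reflection, since the distances carry no information about orientation.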
![Page 75: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/75.jpg)
Example: Texture Mapping
• As an alternative to unwrapping an object onto a plane and texture-mapping the plane
• Embed onto a sphere and texture-map the sphere
[Figure panels: Plane, Sphere]
![Page 76: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/76.jpg)
Backup slides
![Page 77: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/77.jpg)
Laplacian and related processes
• As well as embedding objects onto manifolds, we can model many interesting processes on manifolds
• Example: the way ‘heat’ flows across a manifold can be very informative
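$$\frac{du}{dt} = \nabla^2 u \quad \text{(heat equation)}$$
• $\nabla^2$ is the Laplacian and in 3D Euclidean space it is
$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$$
• On a sphere it is
$$\nabla^2 = \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\phi^2}$$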
![Page 78: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/78.jpg)
Heat flow
• Heat flow allows us to do interesting things on a manifold
• Smoothing: Heat flow is a diffusion process (will smooth the data)
• Characterising the manifold (heat content, heat kernel coefficients...)
• The Laplacian depends on the geometry of the manifold
– We may not know this
– It may be hard to calculate explicitly
• Graph Laplacian
![Page 79: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/79.jpg)
Graph Laplacian
• Given a set of datapoints on the manifold, describe them by a graph
– Vertices are datapoints, edges are adjacency relation
• Adjacency matrix (for example)
$$A_{ij} = \exp\left(-d_{ij}^2/\sigma^2\right)$$
• Then the graph Laplacian is
$$V_{ii} = \sum_j A_{ij}, \qquad L = V - A$$
• The graph Laplacian is a discrete approximation of the manifold Laplacian
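A small sketch of building the graph Laplacian $L = V - A$ from a distance matrix, assuming a Gaussian weighting of the pairwise distances with a width parameter `sigma` and no self-loops. Both of those choices, and the example points, are assumptions for illustration.

```python
import numpy as np

def graph_laplacian(D, sigma):
    """Graph Laplacian L = V - A with Gaussian edge weights (assumed form)."""
    A = np.exp(-(D ** 2) / sigma ** 2)
    np.fill_diagonal(A, 0.0)        # assumption: no self-loops
    V = np.diag(A.sum(axis=1))      # V_ii = sum_j A_ij
    return V - A

# Hypothetical data: four points on a line, with their pairwise distances.
pts = np.array([0.0, 1.0, 2.5, 4.0])
D = np.abs(pts[:, None] - pts[None, :])
L = graph_laplacian(D, sigma=1.0)
```

Each row of `L` sums to zero, so constant functions are left unchanged by it, mirroring the behaviour of the continuous Laplacian.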
![Page 80: Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York.](https://reader038.fdocuments.in/reader038/viewer/2022110401/56649e025503460f94aec229/html5/thumbnails/80.jpg)
Heat Kernel
• Using the graph Laplacian, we can easily implement heat-flow methods on the manifold using the heat-kernel
$$\frac{d\mathbf{u}}{dt} = -L\mathbf{u} \quad \text{(heat equation)}$$
$$H = \exp(-Lt) \quad \text{(heat kernel)}$$
• Can diffuse a function on the manifold by
$$\mathbf{f}' = H\mathbf{f}$$
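Diffusing a function with the heat kernel can be sketched as below, assuming the kernel is $H = \exp(-Lt)$ for a symmetric graph Laplacian and evaluating the matrix exponential through an eigendecomposition. The path graph, the diffusion time and the function names are hypothetical.

```python
import numpy as np

def heat_kernel(L, t):
    """H = exp(-L t) for a symmetric Laplacian, via its eigendecomposition."""
    w, Phi = np.linalg.eigh(L)
    return (Phi * np.exp(-w * t)) @ Phi.T

# Hypothetical graph: a path on 4 vertices with unit edge weights.
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
L = np.diag(A.sum(axis=1)) - A

f = np.array([1.0, 0.0, 0.0, 0.0])   # all the 'heat' starts on vertex 0
H = heat_kernel(L, t=0.5)
f_smooth = H @ f                      # f' = H f
```

The total amount of heat is conserved while the peak is spread over neighbouring vertices, which is the smoothing behaviour expected of a diffusion process.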