Convergence of Graph Laplacians
Daniel Ting 1
joint work withLing Huang 2 and Michael Jordan 3
1Dept. of Statistics, UC Berkeley
2Intel Research, Berkeley
3EECS and Dept. of Statistics, UC Berkeley
July 29, 2011
Overview
What the talk is about
My general goal: understand methods that rely on the manifold assumption.

Graph Laplacians are used in:
- Clustering
- Semi-supervised learning
- Non-linear dimensionality reduction

Goals:
- Better understanding of Laplacian or "Laplacian-like" methods
- Analysis of kNN graphs
- Better choices when constructing graphs
Graph Laplacian
Definition (Graph Laplacian)
W = edge weight matrix
D = diagonal degree matrix

There are three commonly used versions of the graph Laplacian:

L_u  = D − W                     (unnormalized)
L_n  = I − D^{−1/2} W D^{−1/2}   (normalized)
L_rw = I − D^{−1} W              (random walk).
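The three variants above can be sketched in a few lines of NumPy (a minimal illustration of the definitions, assuming a symmetric weight matrix `W` with no isolated vertices; the function name `graph_laplacians` is chosen here, not from the talk):

```python
import numpy as np

def graph_laplacians(W):
    """Return (L_u, L_n, L_rw) for a symmetric weight matrix W."""
    d = W.sum(axis=1)                    # degree of each vertex
    D = np.diag(d)
    I = np.eye(len(d))
    L_u = D - W                          # unnormalized: D - W
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    L_n = I - D_isqrt @ W @ D_isqrt      # normalized: I - D^{-1/2} W D^{-1/2}
    L_rw = I - np.diag(1.0 / d) @ W      # random walk: I - D^{-1} W
    return L_u, L_n, L_rw
```

Note that the rows of both L_u and L_rw sum to zero, so constant functions lie in their null space.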
A visual example: kNN non-linear embedding
[Figure: (A) Gaussian manifold; (B) Kernel Laplacian embedding; (C) Raw kNN Laplacian embedding]
Two graph constructions:
1) Why do they behave differently?
2) Is it fixable?
Constructing graphs on a manifold
Convert Euclidean distance to edge weights: existing theory
1. Choose a smooth kernel K (e.g. Gaussian)
2. Choose a bandwidth h
3. Choose a weighting/normalization exponent 1/d(x)^α

w_ij = 1 / (d(x_i)^α d(x_j)^α) · K(‖x_i − x_j‖ / h)

Overly restrictive conditions! They fail to cover kNN graphs, which matters since kNN graphs
- are sparse and have good computational properties
- show better empirical performance in SSL applications
Constructing graphs on a manifold
Generalizing edge weight choices:
1. Allow a non-smooth kernel K
2. Use a location-dependent bandwidth h_R
3. Use arbitrary weight functions ω

w_ij = ω(x_i) ω(x_j) K(‖x_i − x_j‖ / h_R(x_i, x_j))

This covers most geometric graph constructions of interest, including kNN graphs (via an indicator kernel).
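As a sketch of how the indicator kernel covers kNN graphs, the following hypothetical helper builds the undirected (OR-rule) kNN weight matrix directly from pairwise distances, using each point's kNN radius as the location-dependent bandwidth:

```python
import numpy as np

def knn_indicator_weights(X, k):
    """Undirected (OR-rule) kNN graph as an indicator-kernel weight matrix.

    w_ij = 1 iff x_j is among the k nearest neighbors of x_i, or vice versa;
    i.e. K is the indicator kernel and the bandwidth R(x_i, x_j) is the
    larger of the two kNN radii (a sketch, not the talk's exact code)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)              # exclude self-distances
    # kNN radius of each point: distance to its k-th nearest neighbor
    radius = np.sort(dist, axis=1)[:, k - 1]
    R = np.maximum(radius[:, None], radius[None, :])   # OR rule
    W = (dist <= R).astype(float)
    np.fill_diagonal(W, 0.0)
    return W
```

Because the OR rule symmetrizes the neighborhoods, W is symmetric even though kNN relations themselves are directed.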
Overview: The details
Analysis
- Introduce a kernel-free framework for analysis using diffusion theory
- Identify the limiting operator and corresponding smoothness functional
  - Effect of the density
  - Effect of the graph construction method
- Discuss how to choose a good graph construction

Application
- kNN graphs
- "self-tuning" graphs of Zelnik-Manor & Perona (2004)
- Locally Linear Embedding (LLE)
Setting and Notation
Setting
- Compact, smooth m-dimensional manifold M without boundary, embedded in R^k
- Density p
- Graph construction method (e.g. kNN) with shrinking neighborhood sizes
- Examine whether L_u^{(n)} f converges (pointwise convergence in the strong operator topology)

Notation
- Data points x, y ∈ R^k
- Smooth C^3 function f : M → R
- Normal coordinates s ∈ R^m for y in a neighborhood of x
- Canonical measure η on M
- Base kernel function K(·) : R_+ → R_+
- Degree function d_n(·)
Motivation for a Kernel-free framework
Previous question
Here's a graph construction method. Does a limit exist, and what is it?

Broader questions
What characterizes the limit?
What are sufficient conditions for a limit to exist?
Answer: Diffusion processes
Characterizations
- The Laplacian is a (negative) infinitesimal generator for a random walk
- Drift µ and diffusion σσ^T terms characterize a diffusion process

dX_t = µ(X_t) dt + σ(X_t) dW_t

The infinitesimal generator of a diffusion process is

Gf = (1/2) Σ_{i,j} (σσ^T)_{ij} ∂²f / (∂s_i ∂s_j) + Σ_i µ_i ∂f/∂s_i.
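To make the drift and diffusion terms concrete, a 1-D diffusion can be simulated with a plain Euler-Maruyama step (a hypothetical illustration; `mu` and `sigma` here are user-supplied functions, not quantities from the talk):

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T=1.0, n_steps=1000, rng=None):
    """Simulate dX_t = mu(X_t) dt + sigma(X_t) dW_t (1-D sketch)."""
    rng = np.random.default_rng(rng)
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        dW = rng.normal(scale=np.sqrt(dt))          # Brownian increment
        x[t + 1] = x[t] + mu(x[t]) * dt + sigma(x[t]) * dW
    return x

# Example: drift pulls the process toward 0 (Ornstein-Uhlenbeck style)
path = euler_maruyama(mu=lambda x: -x, sigma=lambda x: 0.5, x0=2.0, rng=0)
```

The drift term biases where the process moves; the diffusion term sets the local noise scale — exactly the two quantities the limiting graph Laplacian is identified with below.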
Interpreting the drift and diffusion
A special elliptic operator: Laplace-Beltrami
If σσ^T = g(·)I and µ/g is a conservative vector field with µ/g = 2∇log q, then

G = (1/2) g∆ + ⟨µ, ∇⟩ = (1/2) g ∆_q.

This is the continuous analog of the graph Laplacian, with smoothness functional

∫ ⟨f, −∆_q f⟩ dη = ∫ ⟨∇f, ∇f⟩ q dη = ‖∇f‖²_{L²(q)}.
Main result
Key assumptions
- Kernel K of bounded variation
- Bandwidth scaling h_n → 0 and h_n^{m+2} n / log n → ∞
- Weight ω_n(x) and bandwidth functions R_n(x, y) with Taylor-like expansions:

R_n(x, x + ε) = h_n r(x) + ε d_r(x, sign(u(x)^T ε)) + o(h_n)

Result
Expressed in normal coordinates, the drift and diffusion defined by L_rw are

µ_s(x) = r(x)² ( ∇p(x)/p(x) + ∇ω(x)/ω(x) + ((m+2)/2) · d_r(x)/r(x) ),

σ_s(x) σ_s(x)^T = r(x)² I,

where d_r(x) = (1/2) (d_r(x, 1) + d_r(x, −1)) u_s(x).
Remarks
Alternative form (for self-adjoint Laplacians)
If R_n(x, y)/h_n = √(r(x) r(y)) + o(h_n) e.v., then the asymptotic limit of the graph Laplacian is

−c_n L_rw → r² ∆_q
−c'_n L_u → (q/p) ∆_q

where q = p² ω² r^{m+2}. This gives a change of measure from p to q.
Proof sketch
The proof is based on one cute trick: write a kernel of bounded variation as a weighted sum of indicator kernels,

K(x) = ∫ I(x < z) dν₊(z) − ∫ I(x < z) dν₋(z).

All the calculations involving non-random quantities then reduce to finding moments for a sphere.

A location-dependent bandwidth makes the indicator kernel behave as if it were shifted by ((m+2)/2) h² d_r. The shift introduces drift; its effect on the diffusion term is of smaller order than the leading term.

[Figure: ball of radius r(x)h shifted by (m+2) r'(x) h²/2]

Use the Bernstein inequality and Borel-Cantelli to get almost sure convergence of the random quantities.
Application to kernel graphs
Weights

w_ij = K(‖x_i − x_j‖ / h_n) / (d_n(x_i)^α d_n(x_j)^α)

ω_n(x) = 1/d_n(x)^α → 1/p(x)^α

Asymptotics (known)
R_n(x, y) = 1, and q = p² ω² r^{m+2} = p^{2−2α}.

The asymptotic random walk and unnormalized Laplacians are

c_n L_rw → ∆_{p^{2−2α}}
c'_n L_u → p^{1−2α} ∆_{p^{2−2α}}
If α < 1, drift towards high density regions.
Application to kNN
Weights

w_ij = I(‖x_i − x_j‖ < R_n(x_i, x_j)), with k_n/n → 0 and k_n^{1+2/m} / (n^{2/m} log n) → ∞.

Undirected kNN graph (OR rule)
R_n(x, y)/h_n ≈ max{p(x)^{−1/m}, p(y)^{−1/m}}
d_r(x) = (1/2) ∇p(x)^{−1/m}

The asymptotic random walk and unnormalized Laplacians are

c_n L_rw = c'_n L_u → (1/p^{2/m}) ∆_{p^{1−2/m}}
Drift away from high density regions if m = 1!
kNN non-linear embedding example
[Figure: (A) Gaussian manifold; (B) Kernel Laplacian embedding; (C) Raw kNN Laplacian embedding]
The fix:
- kNN limit: (1/p) ∆_{p⁰} (m = 2 here)
- kernel limit: ∆_p
- Take ω = p^{1/2} ≈ r^{−1}
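The ω = p^{1/2} rescaling can be sketched with a kNN pilot density estimate (a hypothetical implementation; the estimator p̂(x) ∝ k/(n r_k(x)^m) and the function name are illustrative, and normalizing constants are dropped since the Laplacian is rescaled anyway):

```python
import numpy as np

def rescaled_knn_weights(X, k):
    """OR-rule kNN graph with weights omega(x_i) omega(x_j), omega = p_hat^{1/2}.

    Sketch: uses the kNN density estimate p_hat(x) ~ k / (n * r_k(x)^m)
    as a pilot to build the rescaled weight matrix."""
    n, m = X.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    r_k = np.sort(dist, axis=1)[:, k - 1]          # kNN radius of each point
    p_hat = k / (n * r_k ** m)                     # pilot density estimate
    omega = np.sqrt(p_hat)                         # omega = p_hat^{1/2}
    # OR-rule kNN adjacency (indicator kernel, location-dependent bandwidth)
    A = (dist <= np.maximum(r_k[:, None], r_k[None, :])).astype(float)
    np.fill_diagonal(A, 0.0)
    return omega[:, None] * A * omega[None, :]
```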
kNN non-linear embedding example
[Figure: (A) Gaussian manifold; (B) Kernel Laplacian embedding; (C) Raw kNN Laplacian embedding; (D) Rescaled kNN Laplacian embedding]
Application to “self-tuning” graphs
"self-tuning" kernel (Zelnik-Manor & Perona (2004))

K(x, y) = exp(−‖x − y‖² / (σ_x σ_y))

where σ_x is the distance between x and its k-th nearest neighbor.

Asymptotics
r(x, y) = √(p(x)^{−1/m} p(y)^{−1/m})
d_r(x) = (1/2) ∇p(x)^{−1/m} (same as undirected kNN)

Same asymptotic Laplace-Beltrami operator as for kNN:

c_n L_rw = c'_n L_u → (1/p^{1+2/m}) ∆_{p^{1−2/m}}
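The self-tuning weights are straightforward to implement; a minimal sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def self_tuning_weights(X, k):
    """Self-tuning kernel of Zelnik-Manor & Perona (2004):
    w_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j)),
    where sigma_i is the distance from x_i to its k-th nearest neighbor."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_sorted = np.sort(dist, axis=1)
    sigma = d_sorted[:, k]            # column 0 is the self-distance 0
    W = np.exp(-dist ** 2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(W, 0.0)
    return W
```

Because each bandwidth adapts to the local kNN radius, sparse and dense regions get comparable effective neighborhood sizes, which is why the limit matches the kNN graph.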
Application of kernel-free approach to LLE
Reminder: LLE
Goal: find weights and coordinates that minimize reconstruction error.

Find weights (local regression):       min_W  Y^T (I − W)^T (I − W) Y
Find coordinates (eigendecomposition): min_z  z^T (I − W)^T (I − W) z

Behavior
Belkin & Niyogi (2003) give a heuristic derivation that LLE should behave like the Laplace-Beltrami operator, but... this is not completely true!
Application of kernel-free approach to LLE
Normal coordinates
Converting normal coordinates s to extrinsic coordinates y:

y = x + H_x^T s + L_{⊥x}^T (s s^T) + O(‖s‖³)

Rough analysis
Examine the "drift" and "diffusion" terms for the LLE matrix I − W:

µ_{R^k}(x) = H_x^T E s  [drift µ(x)]  +  L_{⊥x}^T E(s s^T)  [diffusion σσ^T(x)]

The quantities in normal coordinates satisfy

µ = 0,   L_{⊥x}^T (σσ^T) = 0.
Application of kernel-free approach to LLE
Implications for behavior
µ = 0:
- No effect of density on drift

L_{⊥x}^T (σσ^T) = 0:
- Curvature of the manifold affects the diffusion term
- No well-defined limit when L is not full rank
- Does not behave like any elliptic operator if L is full rank

Behavior in practice
Weights are regularized
⇒ favors constant weights
⇒ like the kNN Laplacian if regularization is large
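The regularized local-regression step can be sketched as follows (a hypothetical implementation; scaling the ridge term by the trace of the local Gram matrix is one common convention, not necessarily the talk's):

```python
import numpy as np

def lle_weights(X, k, reg=1e-3):
    """Local regression step of LLE: reconstruct each point from its k
    nearest neighbors, with a regularized local Gram matrix.

    Larger `reg` pushes each row toward equal (constant) weights, which
    is why heavily regularized LLE behaves like a kNN Laplacian."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[:k]
        Z = X[nbrs] - X[i]                   # neighbors centered at x_i
        G = Z @ Z.T                          # local Gram matrix
        G += reg * np.trace(G) * np.eye(k)   # ridge regularization
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()             # reconstruction weights sum to 1
    return W
```

Sending `reg` to a large value makes the ridge term dominate the Gram matrix, so every neighbor gets weight 1/k — the constant-weight behavior described above.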
Explaining weird behavior of LLE
[Figure: (A) Toroidal helix; (B) Laplacian; (C) LLE w/ regularization 1e−3; (D) LLE w/ regularization 1e−9]
Explaining weird behavior of LLE
[Figure: (A) Toroidal helix; (B) Laplacian; (C) LLE w/ regularization 1e−3; (D) LLE w/ projection step]
Effect of regularization and boundary in LLE
[Figure: Swiss hole; LLE w/ regularization 1e−0; LLE w/ regularization 1e−2; LLE w/ regularization 1e−3]
Effect of regularization and boundary in LLE
[Figure: Swiss hole; LLE w/ regularization 1e−9; LLE w/ projection; LLE w/ regularization 1e−3]
Other good properties: Convergence of Eigenvectors
Importance
- Construction of valid basis functions
- Possibly of relevance for spectral clustering

Graph Laplacians and spectrum (von Luxburg et al. (2008)):

L_u f(·) = d(·) f(·) − ∫ K(·, y) f(y) dP(y)

No convergence for eigenvalues in rng(d).
Good spectral properties: choose d so that rng(d) = {1}.
Graph construction and spectrum
Asymptotic degree operator and weighting density:

d(x) = p(x) ω(x)² r(x)^m  (set = 1)
q(x) = p(x)² ω(x)² r(x)^{m+2} = p(x) r(x)²

Any density q is possible.

Possible "good" symmetric graph constructions:

              | w/o loc. dep. bandwidth           | w/ loc. dep. bandwidth
Unnormalized  | ∆_p                               | (q/p) ∆_q
Normalized    | p^{1/2−α} ∆_{p^{2−2α}} p^{−1/2+α} | g ((q/p) ∆_q) g^{−1}
Conclusion
What we introduce
- Kernel-free framework using diffusion processes
- Analysis for a general class of graph constructions: location-dependent bandwidth + arbitrary weights + non-smooth kernels

Practical implications
- Better understanding of kNN graphs
- LLE:
  - no "drift" component
  - affected by curvature of the manifold
- Construction of arbitrary first-order smoothness functionals ‖∇f‖²_{L²(q)}, not just q = p^α
- Pilot density estimates lead to "better" graph constructions
References
Belkin, M. and Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

Donoho, D. L. and Grimes, C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10):5591, 2003.

von Luxburg, U., Belkin, M., and Bousquet, O. Consistency of spectral clustering. Annals of Statistics, 36(2):555–586, 2008.

Zelnik-Manor, L. and Perona, P. Self-tuning spectral clustering. In NIPS 17, 2004.

Zhang, Z. and Zha, H. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing, 26:313, 2004.